From w.bryant at ucl.ac.uk  Mon Jun  1 04:06:58 2009
From: w.bryant at ucl.ac.uk (Will Bryant)
Date: Mon, 01 Jun 2009 09:06:58 +0100
Subject: [Bioperl-l] Extract genomic data from GenBank
Message-ID: <4A238C22.9090604@ucl.ac.uk>

I'm trying to retrieve the complete GenBank format sequence file for a
specified bacterium using get_Seq_by_gi, but I keep getting 'gi does not
exist' errors, even when trying the example gi '405830'.  The script was
running fine in September last year, but when I came back to it this week
it wasn't working.  Am I missing something obvious?

In case it's important, I'm using ActivePerl 5.10.0, bioperl 1.5.2_100

Code:

#!/usr/bin/perl -w

use strict;
use Bio::Perl;
use Bio::DB::GenBank;

my $gb = new Bio::DB::GenBank(-db => 'genome', -format => 'genbank');

my $straincomp = $gb->get_Seq_by_gi('405830');

# Write the record to a file named after the gi passed on the command
# line. Bio::SeqIO is loaded by Bio::Perl, so no string eval is needed.
my $seqout = Bio::SeqIO->new(
    -format => 'genbank',
    -file   => '>c:\\phd\\modelling\\working\\gi' . $ARGV[0] . '_data.gb',
);

$seqout->write_seq($straincomp);


Error:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: gi does not exist
STACK: Error::throw
STACK: Bio::Root::Root::throw c:/perl/site/lib/Bio/Root/Root.pm:359
STACK: Bio::DB::WebDBSeqI::get_Seq_by_gi 
c:/perl/site/lib/Bio/DB/WebDBSeqI.pm:209
STACK: c:\phd\modelling\perl_scripts\retrieve_genome_data.pl:12
-----------------------------------------------------------

Many thanks,

Will Bryant.

From David.Messina at sbc.su.se  Mon Jun  1 05:04:40 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Mon, 1 Jun 2009 11:04:40 +0200
Subject: [Bioperl-l] Extract genomic data from GenBank
In-Reply-To: <4A238C22.9090604@ucl.ac.uk>
References: <4A238C22.9090604@ucl.ac.uk>
Message-ID: <628aabb70906010204y46139e1dy702fd53380adecf7@mail.gmail.com>

Hey Will,
I think GenBank's remote query interface has changed since BioPerl
1.5.2_100 was written. Try upgrading to BioPerl 1.6 and see if that
works for you.

(Note that I've only glanced at your code -- I'm assuming that's not the
problem since it worked fine for you before.)


Dave
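
Before upgrading, it can help to confirm which BioPerl the script is actually loading. The following is a minimal sketch, not part of the original exchange: installed_version() is a hypothetical helper, and it only assumes that BioPerl exposes its version through Bio::Root::Version.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the version of an installed module, or undef if the module
# cannot be loaded. installed_version() is a hypothetical helper written
# for this note; it is not part of BioPerl.
sub installed_version {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # Bio::Root::Version -> Bio/Root/Version.pm
    return unless eval { require $file; 1 };
    return $module->VERSION;
}

my $version = installed_version('Bio::Root::Version');
print defined $version
    ? "BioPerl (Bio::Root::Version) $version\n"
    : "BioPerl does not appear to be installed for this perl\n";
```

Running it with the same perl that runs the failing script shows whether an upgrade has really taken effect.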

From fontanez at fas.harvard.edu  Mon Jun  1 08:41:06 2009
From: fontanez at fas.harvard.edu (Kristina Fontanez)
Date: Mon, 1 Jun 2009 08:41:06 -0400
Subject: [Bioperl-l] problem with bioperl install
In-Reply-To: <C00A2D77-4B41-4FF0-ACE5-1A4F6D46F27A@illinois.edu>
References: <BCF27AD6-75B3-4CE8-83AE-C7BAF99AE800@fas.harvard.edu>
	<2023E087846042178215CF9EBDE12C75@NewLife>
	<4A205502.2030701@sendu.me.uk>
	<024B0302-7885-4005-851D-5D582122ED06@fas.harvard.edu>
	<4A205D46.4090105@sendu.me.uk>
	<C00A2D77-4B41-4FF0-ACE5-1A4F6D46F27A@illinois.edu>
Message-ID: <855163D8-6B40-4DF4-84B6-C14611D1CA42@fas.harvard.edu>

Hey everyone-

Thanks for all the advice. I reinstalled Xcode tools, installed Fink  
and downloaded bioperl successfully. It's now working smoothly.

Thanks again,
Kristina
---------------------------------------------------------------
Kristina Fontanez
PhD candidate
Department of Organismic and Evolutionary Biology
Cavanaugh lab
Harvard University
16 Divinity Ave.
Cambridge, MA 02138

tel: 617-495-1138
fax: 617-496-6933
email: fontanez at fas.harvard.edu


On May 29, 2009, at 10:40 PM, Chris Fields wrote:

Kristina,

You aren't running as superuser:

> term dump:
>
> dhcp-0019353043-25-35:BioPerl-1.6.0 kristinafontanez$ cpan

You'll need to run cpan using 'sudo cpan' if installing modules  
anywhere requiring superuser permissions.

chris

On May 29, 2009, at 5:10 PM, Sendu Bala wrote:

> Kristina Fontanez wrote:
>> Hello everyone-
>> Sendu - I took your advice but doing Install Bundle::CPAN did not  
>> take care of the dependencies. It still failed. See attached txt  
>> file with my terminal output. Does anyone have any idea how this  
>> might be?
>
> From reading the output it seems like perhaps you don't have 'make'  
> or there is something wrong when using it. If you're on a mac you  
> may need to install the dev tools. Someone else want to jump in here  
> with advice?
>
> Also, check your CPAN configuration to ensure it is trying to use  
> the correct make commands. ('o conf' etc.)
>
>
>> If I wanted to wipe all perl from my computer and simply start  
>> over, how might this be accomplished?
>
> Don't do that. At least not until you know you have a working make  
> setup.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
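
The "do you have a working make" question raised above can be checked from Perl itself. This is a rough sketch under stated assumptions: find_in_dirs() is a hypothetical helper, and the make program name is taken from perl's own build configuration, which may differ from what cpan is configured to use (see 'o conf').

```perl
use strict;
use warnings;
use Config;
use File::Spec;

# Look for an executable called $prog in the given directories and
# return its full path, or nothing if it is not there.
# find_in_dirs() is a hypothetical helper, not a CPAN.pm function.
sub find_in_dirs {
    my ($prog, @dirs) = @_;
    for my $dir (@dirs) {
        my $candidate = File::Spec->catfile($dir, $prog);
        return $candidate if -f $candidate && -x _;
    }
    return;
}

my $make  = $Config{make} || 'make';    # the make program perl was built with
my $found = find_in_dirs($make, File::Spec->path);
print $found
    ? "found $make at $found\n"
    : "no '$make' on PATH; install the developer tools first\n";
```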


From maj at fortinbras.us  Mon Jun  1 10:55:50 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 1 Jun 2009 10:55:50 -0400
Subject: [Bioperl-l] a HOWTO for Tiling
Message-ID: <13190185F84E43BDA99993CEB44394C4@NewLife>

Hi All 
Please peruse http://www.bioperl.org/wiki/HOWTO:Tiling for an exhibition of B::S::Tiling, use cases, code snippets, design, implementation and algorithm discussions. We're just about ready to port over to core from bioperl-dev; please shout out if this is not a good idea. 
cheers and thanks for all input--
Mark

From cjfields at illinois.edu  Mon Jun  1 11:21:30 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 1 Jun 2009 10:21:30 -0500
Subject: [Bioperl-l] problem with bioperl install
In-Reply-To: <FD5880DBC2054B5FA9E89FB5CA8B139B@NewLife>
References: <BCF27AD6-75B3-4CE8-83AE-C7BAF99AE800@fas.harvard.edu>
	<2023E087846042178215CF9EBDE12C75@NewLife>
	<BB7756FB-7A18-4E23-9DBB-34B6A2FEE27F@fas.harvard.edu>
	<FD5880DBC2054B5FA9E89FB5CA8B139B@NewLife>
Message-ID: <8739E8BC-9586-4EA2-9377-23D13CC6ED20@illinois.edu>

An autogenerated pass-through Makefile.PL is included with the
distribution:

http://cpansearch.perl.org/src/CJFIELDS/BioPerl-1.6.0/Makefile.PL

We may remove that in future releases, but it should work regardless
(i.e. it calls Module::Build and Build.PL).  I'm pretty convinced that the
issue was permissions-based at heart.  Note that Kristina ran 'cpan'
instead of 'sudo cpan' to invoke the shell, so the shell is installing
as the current user instead of as the superuser.  You need to use
'sudo' to install anything into /Library/Perl on a Mac (unless you are
already 'root', but on recent OS X versions logging in as 'root' is
turned off).

I just noticed nothing is mentioned along these lines in the  
installation docs, so we'll need to update those.

chris
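
A quick way to see whether a plain 'cpan' session can install anything is to test the install directories for write access. This is only a sketch: can_install_to() is a hypothetical helper, and the three %Config keys are the usual site install locations, which may not be the ones your cpan configuration actually targets.

```perl
use strict;
use warnings;
use Config;
use File::Basename qw(dirname);

# Report whether the current user can write to a directory. If the path
# does not exist yet, check the nearest existing parent instead.
# can_install_to() is a hypothetical helper.
sub can_install_to {
    my ($dir) = @_;
    while (!-e $dir) {
        my $parent = dirname($dir);
        last if $parent eq $dir;    # reached the top of the tree
        $dir = $parent;
    }
    return (-d $dir && -w _) ? 1 : 0;
}

for my $key (qw(installsitelib installsitearch installsitebin)) {
    my $dir = $Config{$key} or next;
    printf "%-16s %s  %s\n", $key, $dir,
        can_install_to($dir) ? 'writable' : 'needs sudo (or a local lib)';
}
```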

On May 29, 2009, at 4:08 PM, Mark A. Jensen wrote:

> Hi Kristina,
>
> [Don't forget to reply-all, so the list stays in the loop. Many many  
> more helpers
> there.]
>
> Apparently cpan can't make the Makefile, but can download and expand  
> the
> library directories, in your .cpan directory (see edited highlights  
> below).
>
> Let's appeal to the BioPerl brethren/sestren---answers?
>
> MAJ
>
>
> term dump:
>
> dhcp-0019353043-25-35:BioPerl-1.6.0 kristinafontanez$ cpan
> Terminal does not support AddHistory.
>
> cpan shell -- CPAN exploration and modules installation (v1.7602)
> ReadLine support available (try 'install Bundle::CPAN')
>
> cpan> install Test::Harness
> CPAN: Storable loaded ok
> Going to read /Users/kristinafontanez/.cpan/Metadata
> Database was generated on Fri, 29 May 2009 11:27:00 GMT
> Running install for module Test::Harness
> Running make for A/AN/ANDYA/Test-Harness-3.17.tar.gz
> CPAN: Digest::MD5 loaded ok
> CPAN: Compress::Zlib loaded ok
> Checksum for /Users/kristinafontanez/.cpan/sources/authors/id/A/AN/ 
> ANDYA/Test-Harness-3.17.tar.gz ok
> Scanning cache /Users/kristinafontanez/.cpan/build for sizes
> Test-Harness-3.17/
> Test-Harness-3.17/Build.PL
> ...
> Test-Harness-3.17/xt/perls/sample-tests/
> Test-Harness-3.17/xt/perls/sample-tests/perl_version
> Removing previously used /Users/kristinafontanez/.cpan/build/Test- 
> Harness-3.17
>
> CPAN.pm: Going to build A/AN/ANDYA/Test-Harness-3.17.tar.gz
>
> Checking if your kit is complete...
> Looks good
> Writing Makefile for Test::Harness
>   -- NOT OK
> Running make test
> Can't test without successful make
> Running make install
> make had returned bad status, install seems impossible
>
> cpan> install File::HomeDir
> ...[more of same]...
>
>
> ----- Original Message ----- From: "Kristina Fontanez" <fontanez at fas.harvard.edu 
> >
> To: "Mark A. Jensen" <maj at fortinbras.us>
> Sent: Friday, May 29, 2009 3:56 PM
> Subject: Re: [Bioperl-l] problem with bioperl install
>
>
>> Mr. Jensen-
>>
>> Thank you for your help but unfortunately the installation of
>> Test::Harness etc didn't work. I copied my terminal output and
>> attached the file. Any advice on what's still going wrong?
>>
>> Thanks,
>> Kristina
>>
>
>
> --------------------------------------------------------------------------------
>
>
>>
>>
>>
>>
>> ---------------------------------------------------------------
>> Kristina Fontanez
>> PhD candidate
>> Department of Organismic and Evolutionary Biology
>> Cavanaugh lab
>> Harvard University
>> 16 Divinity Ave.
>> Cambridge, MA 02138
>>
>> tel: 617-495-1138
>> fax: 617-496-6933
>> email: fontanez at fas.harvard.edu
>>
>>
>>
>> On May 29, 2009, at 3:35 PM, Mark A. Jensen wrote:
>>
>> The message says you are first updating your CPAN.pm.
>> That module needs modules you don't have, so
>>
>> use cpan to install the dependencies you don't have, viz.
>>>   Test::Harness
>>>   File::HomeDir
>>
>> $ cpan
>>> install Test::Harness
>> etc.
>> Then install CPAN.pm again (or run the Bioperl install again).
>>
>> Lather, rinse, repeat the install of Bioperl until it completes
>> without errors.
>>
>> ----- Original Message ----- From: "Kristina Fontanez" <fontanez at fas.harvard.edu
>> >
>> To: <bioperl-l at bioperl.org>
>> Sent: Friday, May 29, 2009 3:07 PM
>> Subject: [Bioperl-l] problem with bioperl install
>>
>>
>>> Hello-
>>>
>>> I am trying to install bioperl and I ran into some problems. See
>>> list  below.
>>>
>>>
>>> CPAN.pm: Going to build A/AN/ANDK/CPAN-1.94.tar.gz
>>>
>>> Checking if your kit is complete...
>>> Looks good
>>> Warning: prerequisite File::HomeDir 0.69 not found.
>>> Warning: prerequisite Test::Harness 2.62 not found. We have 2.56.
>>> Writing Makefile for CPAN
>>> ---- Unsatisfied dependencies detected during [A/AN/ANDK/
>>> CPAN-1.94.tar.gz] -----
>>>   Test::Harness
>>>   File::HomeDir
>>>
>>>
>>> How can I fix this?
>>>
>>>
>>> Thanks,
>>> Kristina
>>> ---------------------------------------------------------------
>>> Kristina Fontanez
>>> PhD candidate
>>> Department of Organismic and Evolutionary Biology
>>> Cavanaugh lab
>>> Harvard University
>>> 16 Divinity Ave.
>>> Cambridge, MA 02138
>>>
>>> tel: 617-495-1138
>>> fax: 617-496-6933
>>> email: fontanez at fas.harvard.edu
>>>
>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>
>>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at illinois.edu  Mon Jun  1 12:14:07 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 1 Jun 2009 11:14:07 -0500
Subject: [Bioperl-l] a HOWTO for Tiling
In-Reply-To: <13190185F84E43BDA99993CEB44394C4@NewLife>
References: <13190185F84E43BDA99993CEB44394C4@NewLife>
Message-ID: <6DCD3564-6756-4416-899A-F32DC7310AD2@illinois.edu>

As long as it doesn't significantly impact SearchIO performance
(from reading the HOWTO I can't see how it would), I say commit away.
In fact, I consider this a bug fix that should be in the next 1.6
point release. We should add deprecation warnings where needed for
1.7...

chris



From dan.bolser at gmail.com  Mon Jun  1 12:27:30 2009
From: dan.bolser at gmail.com (Dan Bolser)
Date: Mon, 1 Jun 2009 17:27:30 +0100
Subject: [Bioperl-l] problem with bioperl install
In-Reply-To: <8739E8BC-9586-4EA2-9377-23D13CC6ED20@illinois.edu>
References: <BCF27AD6-75B3-4CE8-83AE-C7BAF99AE800@fas.harvard.edu>
	<2023E087846042178215CF9EBDE12C75@NewLife>
	<BB7756FB-7A18-4E23-9DBB-34B6A2FEE27F@fas.harvard.edu>
	<FD5880DBC2054B5FA9E89FB5CA8B139B@NewLife>
	<8739E8BC-9586-4EA2-9377-23D13CC6ED20@illinois.edu>
Message-ID: <2c8757af0906010927x2ff3bce7r62522f108b43c414@mail.gmail.com>

2009/6/1 Chris Fields <cjfields at illinois.edu>:

...
> for installation.  You need to use 'sudo' to install anything /Library/Perl
> on Mac (unless you are already 'root', but on recent OS X version logging in
...

local::lib is supposed to take care of this. Is this broken on Mac?
Building stuff as root is generally considered to be bad.


> I just noticed nothing is mentioned along these lines in the installation
> docs, so we'll need to update those.

I tried to write down a clear 'recipe' for getting things installed
(this was actually on the GMod wiki). I really think the install docs
could be improved. Sometimes less verbose is better.

Dan



From cjfields at illinois.edu  Mon Jun  1 13:15:42 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 1 Jun 2009 12:15:42 -0500
Subject: [Bioperl-l] problem with bioperl install
In-Reply-To: <2c8757af0906010927x2ff3bce7r62522f108b43c414@mail.gmail.com>
References: <BCF27AD6-75B3-4CE8-83AE-C7BAF99AE800@fas.harvard.edu>
	<2023E087846042178215CF9EBDE12C75@NewLife>
	<BB7756FB-7A18-4E23-9DBB-34B6A2FEE27F@fas.harvard.edu>
	<FD5880DBC2054B5FA9E89FB5CA8B139B@NewLife>
	<8739E8BC-9586-4EA2-9377-23D13CC6ED20@illinois.edu>
	<2c8757af0906010927x2ff3bce7r62522f108b43c414@mail.gmail.com>
Message-ID: <87C523DD-6F47-4CFF-8907-566A18A0A08E@illinois.edu>


On Jun 1, 2009, at 11:27 AM, Dan Bolser wrote:

> 2009/6/1 Chris Fields <cjfields at illinois.edu>:
>
> ...
>> for installation.  You need to use 'sudo' to install anything / 
>> Library/Perl
>> on Mac (unless you are already 'root', but on recent OS X version  
>> logging in
> ...
>
> local::lib is supposed to take care of this. Is this broken on Mac?
> Building stuff as root is generally considered to be bad.

You can install to a local lib, yes, but cpan needs to be manually
configured to do this; I don't think it is automatically configured to
do so in OS X, i.e. it defaults to /Library/Perl.

Frankly, I sidestep the whole issue with my own custom perl  
installation, but that's me.
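
For reference, a local install tree can be sketched as below. Everything specific here is an assumption: the base directory is a placeholder, local_lib_dirs() is a hypothetical helper, and the cpan settings in the comments depend on the CPAN.pm version (older shells may not know mbuildpl_arg).

```perl
use strict;
use warnings;
use Config;
use File::Spec;

# Assumed cpan shell settings for installing under a private base
# directory (check 'o conf' on your own version first):
#   o conf makepl_arg "INSTALL_BASE=/Users/you/perl5"
#   o conf mbuildpl_arg "--install_base /Users/you/perl5"
#   o conf commit

# Directories that such an install would populate under $base.
# local_lib_dirs() is a hypothetical helper.
sub local_lib_dirs {
    my ($base) = @_;
    my $lib = File::Spec->catdir($base, 'lib', 'perl5');
    return ($lib, File::Spec->catdir($lib, $Config{archname}));
}

my $base = File::Spec->catdir($ENV{HOME} || '.', 'perl5');    # assumed location
my @dirs = local_lib_dirs($base);

# Roughly what 'use lib' does: search the local tree before the system one.
unshift @INC, @dirs;
print "$_\n" for @dirs;
```

Scripts then need the same directories on @INC, either through 'use lib' or the PERL5LIB environment variable.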

>> I just noticed nothing is mentioned along these lines in the  
>> installation
>> docs, so we'll need to update those.
>
> I tried to write down a clear 'recipe' for getting things installed
> (this was actually on the GMod wiki). I really think the install docs
> could be improved. Sometimes less verbose is better.
>
> Dan

True, but I would much rather have reasonable instructions that  
outline most installation issues than ones that aren't detailed enough.

My thought is to strip the INSTALL doc that comes with BioPerl
down to the essentials and point to the wiki for the more detailed
instructions (including problems encountered).  It's too hard to
maintain both and backport the wiki into plain text.

chris

From maj at fortinbras.us  Mon Jun  1 15:03:05 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 1 Jun 2009 15:03:05 -0400
Subject: [Bioperl-l] a HOWTO for Tiling
In-Reply-To: <6DCD3564-6756-4416-899A-F32DC7310AD2@illinois.edu>
References: <13190185F84E43BDA99993CEB44394C4@NewLife>
	<6DCD3564-6756-4416-899A-F32DC7310AD2@illinois.edu>
Message-ID: <AABEFA992F2345548C861ADDFDC50132@NewLife>

Thanks, Chris--

Bio::Search::Tiling is now ported to core; the snapshot of the ported version is 
in bioperl-dev/tags/tiling-port-to-core-060109.
Bunch o' tests performed by t/SearchIO/Tiling.t; bunch more if one sets 
BIOPERL_TILING_EXHAUSTIVE_TESTS .

Cry 'Havoc!' and let slip the dogs of war...

MAJ



From koenvanderdrift at gmail.com  Mon Jun  1 18:22:23 2009
From: koenvanderdrift at gmail.com (Koen van der Drift)
Date: Mon, 1 Jun 2009 18:22:23 -0400
Subject: [Bioperl-l] problem with bioperl install
In-Reply-To: <87C523DD-6F47-4CFF-8907-566A18A0A08E@illinois.edu>
References: <BCF27AD6-75B3-4CE8-83AE-C7BAF99AE800@fas.harvard.edu>
	<2023E087846042178215CF9EBDE12C75@NewLife>
	<BB7756FB-7A18-4E23-9DBB-34B6A2FEE27F@fas.harvard.edu>
	<FD5880DBC2054B5FA9E89FB5CA8B139B@NewLife>
	<8739E8BC-9586-4EA2-9377-23D13CC6ED20@illinois.edu>
	<2c8757af0906010927x2ff3bce7r62522f108b43c414@mail.gmail.com>
	<87C523DD-6F47-4CFF-8907-566A18A0A08E@illinois.edu>
Message-ID: <2E5C7781-D115-415F-BA28-120613B221C3@gmail.com>


On Jun 1, 2009, at 1:15 PM, Chris Fields wrote:

> My thought is to strip down the INSTALL doc that comes with BioPerl  
> down to the essentials and point to the wiki for the more detailed  
> ones (including problems encountered).  It's too hard to maintain  
> both and backport the wiki into plain text.


Good idea, please then also update the file PLATFORMS. It has a link  
to a very outdated website for the installation of bioperl on OS X.  
And maybe a line + link to the bioperl wiki can be added that  
recommends the use of fink as an alternative to cpan?

cheers,

- Koen.


From cjfields at illinois.edu  Mon Jun  1 19:27:32 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 1 Jun 2009 18:27:32 -0500
Subject: [Bioperl-l] problem with bioperl install
In-Reply-To: <2E5C7781-D115-415F-BA28-120613B221C3@gmail.com>
References: <BCF27AD6-75B3-4CE8-83AE-C7BAF99AE800@fas.harvard.edu>
	<2023E087846042178215CF9EBDE12C75@NewLife>
	<BB7756FB-7A18-4E23-9DBB-34B6A2FEE27F@fas.harvard.edu>
	<FD5880DBC2054B5FA9E89FB5CA8B139B@NewLife>
	<8739E8BC-9586-4EA2-9377-23D13CC6ED20@illinois.edu>
	<2c8757af0906010927x2ff3bce7r62522f108b43c414@mail.gmail.com>
	<87C523DD-6F47-4CFF-8907-566A18A0A08E@illinois.edu>
	<2E5C7781-D115-415F-BA28-120613B221C3@gmail.com>
Message-ID: <98605D05-706B-4ACB-B444-4F0A9CEC879D@illinois.edu>


On Jun 1, 2009, at 5:22 PM, Koen van der Drift wrote:

>
> On Jun 1, 2009, at 1:15 PM, Chris Fields wrote:
>
>> My thought is to strip down the INSTALL doc that comes with BioPerl  
>> down to the essentials and point to the wiki for the more detailed  
>> ones (including problems encountered).  It's too hard to maintain  
>> both and backport the wiki into plain text.
>
>
> Good idea, please then also update the file PLATFORMS. It has a link  
> to a very outdated website for the installation of bioperl on OS X.  
> And maybe a line + link to the bioperl wiki can be added that  
> recommends the use of fink as an alternative to cpan?
>
> cheers,
>
> - Koen.

Done. I've added a ticket on bugzilla for tracking this so it doesn't  
get lost:

http://bugzilla.open-bio.org/show_bug.cgi?id=2846

chris

From shalabh.sharma7 at gmail.com  Tue Jun  2 10:44:25 2009
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Tue, 2 Jun 2009 10:44:25 -0400
Subject: [Bioperl-l] Refseq Hits
Message-ID: <9fcc48c70906020744x1301136fm852443d1fe96941b@mail.gmail.com>

Hi All,
This is not really a BioPerl query, but I am really confused and
need some help.
I blasted some sequences against the RefSeq database (locally). After parsing
the BLAST results, I noticed that some description fields contain two hit
names, like:
hit_name ->    gi|71082715|ref|YP_265434.1|
Description -> ubiquitin binding protein [Candidatus Pelagibacter ubique
HTCC1062] gi|91762865|ref|ZP_01264830.1| possible ubiquitin binding protein
[Candidatus Pelagibacter ubique HTCC1002]

So besides giving me the description for hit_name (HTCC1062), it's also
giving me the one for HTCC1002.
I would really appreciate it if someone could help me out.

Thanks
Shalabh
_________________________________________________
Shalabh Sharma
Scientific Computing Professional Associate
Department of Marine Sciences
University of Georgia
Athens, GA 30602-3636

phone: 706-542-0341
email: ssharmai at uga.edu

From jonathancrabtree at gmail.com  Tue Jun  2 11:04:33 2009
From: jonathancrabtree at gmail.com (Jonathan Crabtree)
Date: Tue, 2 Jun 2009 11:04:33 -0400
Subject: [Bioperl-l] Refseq Hits
In-Reply-To: <9fcc48c70906020744x1301136fm852443d1fe96941b@mail.gmail.com>
References: <9fcc48c70906020744x1301136fm852443d1fe96941b@mail.gmail.com>
Message-ID: <8e5b8bf80906020804s5e8f1737je6539365c38a9226@mail.gmail.com>

Hi Shalabh-

I believe RefSeq is a non-redundant database, in which sequence entries with
identical sequences are merged and their descriptions are concatenated in
the FASTA defline.  If you look up the two accession numbers/gi numbers from
your search results I think you'll see that both are valid matches because
their polypeptide sequences are identical:

http://www.ncbi.nlm.nih.gov/protein/71082715
http://www.ncbi.nlm.nih.gov/protein/91762865

You're just getting a single match with two descriptions instead of two
matches with one description each, but the sequence is the same and so,
therefore, are the BLAST alignments.

Jonathan
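
If the individual entries are needed separately, the merged description can be split back apart. This is a minimal sketch, assuming every additional entry begins with a "gi|number|" style identifier as in the example above; split_merged_hit() is a hypothetical helper, and in a SearchIO loop its two arguments would typically come from $hit->name and $hit->description.

```perl
use strict;
use warnings;

# Split a merged BLAST hit into one [id, description] pair per source
# record. split_merged_hit() is a hypothetical helper; it assumes each
# extra entry starts with a "gi|number|" style identifier.
sub split_merged_hit {
    my ($name, $desc) = @_;
    my @parts = split /\s+(?=gi\|\d+\|)/, "$name $desc";
    return map { [ split /\s+/, $_, 2 ] } @parts;
}

my @entries = split_merged_hit(
    'gi|71082715|ref|YP_265434.1|',
    'ubiquitin binding protein [Candidatus Pelagibacter ubique HTCC1062] '
  . 'gi|91762865|ref|ZP_01264830.1| possible ubiquitin binding protein '
  . '[Candidatus Pelagibacter ubique HTCC1002]',
);

# One line per underlying record: identifier, tab, description.
printf "%s\t%s\n", @$_ for @entries;
```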


From shalabh.sharma7 at gmail.com  Tue Jun  2 11:15:45 2009
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Tue, 2 Jun 2009 11:15:45 -0400
Subject: [Bioperl-l] Refseq Hits
In-Reply-To: <8e5b8bf80906020804s5e8f1737je6539365c38a9226@mail.gmail.com>
References: <9fcc48c70906020744x1301136fm852443d1fe96941b@mail.gmail.com>
	<8e5b8bf80906020804s5e8f1737je6539365c38a9226@mail.gmail.com>
Message-ID: <9fcc48c70906020815x2295ee9ay8023d521a50238ca@mail.gmail.com>

Hi Jonathan,
Your information is really helpful. Thanks a lot.

-Shalabh



From tristan.lefebure at gmail.com  Tue Jun  2 12:24:21 2009
From: tristan.lefebure at gmail.com (Tristan Lefebure)
Date: Tue, 2 Jun 2009 12:24:21 -0400
Subject: [Bioperl-l] Creating a fastq format file?
In-Reply-To: <ddde1f420904270238w2bad577fq49def99607597793@mail.gmail.com>
References: <2c8757af0904220349v4e4a9b89xe374a46f9e82cfdc@mail.gmail.com>
	<ddde1f420904262242s533bd5abqeb9db75463d5a8f2@mail.gmail.com>
	<ddde1f420904270238w2bad577fq49def99607597793@mail.gmail.com>
Message-ID: <200906021224.21439.tristan.lefebure@gmail.com>

On Monday 27 April 2009 05:38:40 Heikki Lehvaslaiho wrote:
> I convinced at least myself to the degree that I wrote
> the range_convert() method - with plenty of tests. I
> mention this now so that no-one else need to start
> thinking through all the edge values.
>
> :)
>
> I'll contribute it to the code base once there is a
> consensus of best way forward.
>

Heikki,

This thread has been quiet for a while, but I don't see 
anything new in Bio::Seq::Quality. Did we reach a consensus 
or are you waiting for some more discussion on the subject?

(I'm pretty impatient to see bioperl handling both sanger 
and illumina ranges on the fly!)

--Tristan
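
For readers following the ranges discussed in the quoted thread below, here is a minimal sketch of re-encoding a quality string between the two ASCII offsets mentioned there (value + 64 for Illumina, value + 33 for Sanger). It assumes the Illumina scores are already Phred-scaled, so only the offset differs. `reoffset_quality` is a hypothetical helper name; it is not part of Bio::Seq::Quality and is not the range_convert() method Heikki describes.

```perl
use strict;
use warnings;

# Hypothetical helper: shift every quality character from one ASCII
# offset to another (e.g. 64 -> 33). Assumes the scores themselves are
# already Phred-scaled, so only the offset differs.
sub reoffset_quality {
    my ($qual, $from, $to) = @_;
    my @out;
    for my $ch (split //, $qual) {
        my $q = ord($ch) - $from;
        die "quality value $q is below zero for offset $from\n" if $q < 0;
        push @out, chr($q + $to);
    }
    return join '', @out;
}

# 'h' is ASCII 104, i.e. Q40 at offset 64; Q40 at offset 33 is 'I' (ASCII 73).
print reoffset_quality('hhh@', 64, 33), "\n";
```

The older Illumina range (-5..40) would need a real score conversion as well as an offset change, which this sketch does not attempt.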

>     -Heikki
>
> 2009/4/27 Heikki Lehvaslaiho <heikki.lehvaslaiho at gmail.com>:
> >> I have tried to summarise this in a central place:
> >> http://en.wikipedia.org/wiki/FASTQ_format
> >
> > Torsten,
> >
> > Thanks for putting this together. Very helpful.
> >
> > Do you have a plan of action?  Let me propose one for
> > BioPerl. It based on following assumptions:
> >
> > 1. There is multitude of different ways of coding
> >    quality values out there.
> > 2. Bio::Seq::Quality is agnostic of any quality value
> >    range rules
> > 3. The emerging open standard is the Sanger fastq
> >    specification
> > 4. Open source programs use the Sanger fastq specs
> >
> >
> > From these it follows that:
> >
> >
> > 1. BioPerl should support Sanger fastq standard
> >
> > 1.1. it already does and there are other SeqIO modules
> > for dealing with other non-fastq formats.
> >
> > 2. BioPerl should offer simple ways of converting
> > between quality range rules
> >
> > 2.1. Have a generic method accessible from
> > Bio::Seq::Quality with preset versions of the method
> > for converting between known variants (Sanger fastq and
> > the two Illumina versions)
> >
> > For example:
> >
> > range_convert ($from_lower, $from_upper, $to_lower,
> >                $to_upper, $value)
> >   throw if $value < $from_lower or $value > $from_upper
> >   return $newvalue
> >
> > range_convert_illumina2fastq(),
> > range_convert_fastq2illumina(),
> > range_convert_fastq2phred(),
> >  range_convert_phred2fastq()....
> >
> > (assuming that illumina 1.3 eq phred)
> >
> > 2.2. Bio::SeqIO::Fastq::next_seq methods should convert
> > Illumina qualities into Sanger fastq on the fly
> >
> > 2.2.1 Bio::SeqIO::Fastq::next_seq should detect the
> > incoming stream of quality value range either
> > automatically or be given a keyword parameter
> > indicating the range.
> >
> > 2.2.2. Bio::SeqIO::Fastq::next_seq should throw an
> > error if it detects a quality value out of range.
> >
> > 2.2.3. Bio::SeqIO::Fastq::write_seq should throw an
> > error if it detects a quality value out of range.
> >
> > 2.2.4. It would be useful but not absolutely necessary
> > for Bio::SeqIO::Fastq::write_seq to be able to write
> > out in Illumina ranges
> >
> >
> > What do you think?
> >
> >    -Heikki
> >
> > 2009/4/26 Torsten Seemann <torsten.seemann at infotech.monash.edu.au>:
> >>> > This might be a good place to ask the question:
> >>> > having looked at the fastq.pm page, is the fastq
> >>> > format defined (only) by a "@'" followed by a
> >>> > sequence line and a "+" header followed by a
> >>> > quality line and the two headers have to agree? Now
> >>> > that Illumina is using phred scaling, are 'Sanger'
> >>> > and 'Illumina' versions the same?
> >>>
> >>> No they aren't the same, Illumina still encodes the
> >>> ascii as value + 64 and Sanger as value + 33.
> >>
> >> Illumina have now CHANGED how they calculate the
> >> quality value however in the last month or so... Their
> >> Q range used to be -5..40 mapped to ASCII 64+, but now
> >> they produce Q >= 0 and it is unclear if they start at
> >> 69 or 64 now...
> >>
> >> I have tried to summarise this in a central place:
> >>
> >> http://en.wikipedia.org/wiki/FASTQ_format
> >>
> >> Corrections welcome!
> >>
> >>
> >> --Torsten Seemann
> >> --Victorian Bioinformatics Consortium, Dept.
> >> Microbiology, Monash University, AUSTRALIA
> >
> > --
> >    -Heikki
> > Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
> > cell: +27 (0)714328090
> > Sent from Claremont, WC, South Africa


From Russell.Smithies at agresearch.co.nz  Tue Jun  2 16:56:26 2009
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Wed, 3 Jun 2009 08:56:26 +1200
Subject: [Bioperl-l] Refseq Hits
In-Reply-To: <9fcc48c70906020815x2295ee9ay8023d521a50238ca@mail.gmail.com>
References: <9fcc48c70906020744x1301136fm852443d1fe96941b@mail.gmail.com>
	<8e5b8bf80906020804s5e8f1737je6539365c38a9226@mail.gmail.com>
	<9fcc48c70906020815x2295ee9ay8023d521a50238ca@mail.gmail.com>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32493EB1D18@exchsth.agresearch.co.nz>

The identifiers are separated by a Ctrl-A char ("\001") in the original non-redundant fasta header so you should be able to split them up again - assuming BioPerl didn't munge them.

--Russell
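
A minimal sketch of that split, assuming the Ctrl-A characters survived parsing. The sample string is built from the two entries in Shalabh's message, and `split_defline` is a hypothetical helper, not a BioPerl method.

```perl
use strict;
use warnings;

# Hypothetical helper: split a merged non-redundant defline on Ctrl-A
# ("\001") and return one [id, description] pair per original entry.
sub split_defline {
    my ($defline) = @_;
    my @entries;
    for my $part (split /\001/, $defline) {
        my ($id, $desc) = split /\s+/, $part, 2;
        push @entries, [ $id, defined $desc ? $desc : '' ];
    }
    return @entries;
}

my $defline = join "\001",
    'gi|71082715|ref|YP_265434.1| ubiquitin binding protein [Candidatus Pelagibacter ubique HTCC1062]',
    'gi|91762865|ref|ZP_01264830.1| possible ubiquitin binding protein [Candidatus Pelagibacter ubique HTCC1002]';

for my $entry (split_defline($defline)) {
    print "$entry->[0]\t$entry->[1]\n";
}
```

If the Ctrl-A characters have already been turned into spaces, as the description in the original post suggests, this split will return a single entry and another delimiter would be needed.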

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of shalabh sharma
> Sent: Wednesday, 3 June 2009 3:16 a.m.
> To: Jonathan Crabtree
> Cc: bioperl-l
> Subject: Re: [Bioperl-l] Refseq Hits
> 
> Hi Jonathan,
> Your information is really helpful. Thanks a lot.
> 
> -Shalabh
> 
> 
> On Tue, Jun 2, 2009 at 11:04 AM, Jonathan Crabtree <
> jonathancrabtree at gmail.com> wrote:
> 
> >
> > Hi Shalabh-
> >
> > I believe RefSeq is a non-redundant database, in which sequence entries
> > with identical sequences are merged and their descriptions are concatenated
> > in the FASTA defline.  If you look up the two accession numbers/gi numbers
> > from your search results I think you'll see that both are valid matches
> > because their polypeptide sequences are identical:
> >
> > http://www.ncbi.nlm.nih.gov/protein/71082715
> > http://www.ncbi.nlm.nih.gov/protein/91762865
> >
> > You're just getting a single match with two descriptions instead of two
> > matches with one description, but the sequence is the same and so, therefore
> > are the blast alignments.
> >
> > Jonathan
> >
> > On Tue, Jun 2, 2009 at 10:44 AM, shalabh sharma <shalabh.sharma7 at gmail.com
> > > wrote:
> >
> >> Hi All,
> >>          This is not really a bioperl query, but i am really confused and
> >> need some help.
> >> I blasted some sequences against refseq database (locally). After parsing
> >> the blast result what i noticed that some description fields contain two
> >> hit
> >> names like:
> >> hit_name ->    gi|71082715|ref|YP_265434.1|
> >> Description -> ubiquitin binding protein [Candidatus Pelagibacter ubique
> >> HTCC1062] gi|91762865|ref|ZP_01264830.1| possible ubiquitin binding
> >> protein
> >> [Candidatus Pelagibacter ubique HTCC1002]
> >>
> >> So besides giving me description for hit_name (HTCC 1062) its also giving
> >> me
> >> HTCC 1002.
> >> I will really appreciate if someone can help me out.
> >>
> >> Thanks
> >> Shalabh
> >> _________________________________________________
> >> Shalabh Sharma
> >> Scientific Computing Professional Associate
> >> Department of Marine Sciences
> >> University of Georgia
> >> Athens, GA 30602-3636
> >>
> >> phone: 706-542-0341
> >> email: ssharmai at uga.edu
=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================


From maj at fortinbras.us  Tue Jun  2 17:05:03 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 2 Jun 2009 17:05:03 -0400
Subject: [Bioperl-l] Bio::Search::Tiling
Message-ID: <B006036D760941179148C9F8E2AD7E05@NewLife>

All-
Bio::Search::Tiling is now in bioperl-live, passes all tests.
Thanks, 
Mark

From shalabh.sharma7 at gmail.com  Wed Jun  3 13:27:59 2009
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Wed, 3 Jun 2009 13:27:59 -0400
Subject: [Bioperl-l] gbf to gff
Message-ID: <9fcc48c70906031027h3381a93fya349fba8e5ba464c@mail.gmail.com>

Hi all,
I am working on Roseobacters. I have often converted gbk files from
GenBank to gff format, but one genome, "Silicibacter lacuscaerulensis",
does not have a gbk file; instead it has two gbf files:

https://research.venterinstitute.org/moore/SingleOrganism.do?speciesTag=SL1157&pageAttr=pageMain

So how can I convert this genome to one gff file so I can use it in
gbrowse?
I would really appreciate it if anyone could help me out.

Thanks

From scott at scottcain.net  Wed Jun  3 14:11:54 2009
From: scott at scottcain.net (Scott Cain)
Date: Wed, 3 Jun 2009 14:11:54 -0400
Subject: [Bioperl-l] gbf to gff
In-Reply-To: <9fcc48c70906031027h3381a93fya349fba8e5ba464c@mail.gmail.com>
References: <9fcc48c70906031027h3381a93fya349fba8e5ba464c@mail.gmail.com>
Message-ID: <536f21b00906031111l4b02a846o6f281c536b77460d@mail.gmail.com>

Hi Shalabh,

Do you want them combined onto a single reference sequence?  I'm
guessing this is a circular microbial genome in two segments.  Do you
know how the coordinates in one GenBank file relate to the other
(or are you willing to make something up)?  I imagine the way I would
do it would be to convert both files to gff and then write a quick
script to convert the coordinates and reference sequence name (column
1) of one file to be consistent with the other.

Scott
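
A rough sketch of the kind of quick script described above, once both files are already in gff. The offset, the sequence names, and the helper name `shift_gff_line` are all made up for illustration; the real offset depends on how the two segments relate to each other.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite one GFF line so that it sits on another
# reference sequence, shifted by a fixed offset. Comment lines and blank
# lines are passed through unchanged.
sub shift_gff_line {
    my ($line, $new_ref, $offset) = @_;
    return $line if $line =~ /^#/ or $line !~ /\S/;
    my @col = split /\t/, $line, -1;
    $col[0]  = $new_ref;   # column 1: reference sequence name
    $col[3] += $offset;    # column 4: start
    $col[4] += $offset;    # column 5: end
    return join "\t", @col;
}

# Made-up example: place a feature from the second file after a first
# segment that is assumed to be 1,000,000 bp long.
my $line = join "\t", 'segment2', 'GenBank', 'gene', 100, 900, '.', '+', '.', 'ID=gene0001';
print shift_gff_line($line, 'segment1', 1_000_000), "\n";
```

Running every line of the second gff file through the helper and appending the result to the first file would give a single file on one reference sequence.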


On Wed, Jun 3, 2009 at 1:27 PM, shalabh sharma
<shalabh.sharma7 at gmail.com> wrote:
> Hi all, I am working on Roseobacters. Many times I've
> converted gbk file from GenBank to gff format but now one genome
> "Silicibacter lacuscaerulensis" does not have a gbk file instead it has two
> gbf files:
>
> https://research.venterinstitute.org/moore/SingleOrganism.do?speciesTag=SL1157&pageAttr=pageMain
>
> So now how i can convert this genome to one gff file so i can use it in
> gbrowse?
> I would really appreciate if anyone can help me out.
>
> Thanks


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From alperyilmaz at gmail.com  Fri Jun  5 14:50:46 2009
From: alperyilmaz at gmail.com (Alper Yilmaz)
Date: Fri, 5 Jun 2009 14:50:46 -0400
Subject: [Bioperl-l] GBroswe2 - feature details
Message-ID: <dac81b0d0906051150o116c8bcvd747b71189d7a722@mail.gmail.com>

Dear all,

I have a question about utilizing the tag/value pairs in the 9th
column of a GFF file. If my 9th column is like this:

ID=BS_1;BS_Seq=cacatg;BS_Color=Purple;Name=AtMYC2 BS in RD22

How can I use the BS_Seq and BS_Color tags, say, in a balloon? If I want
to print the name and sequence of a BindingSite, what do I need to put in
place of the question marks below?

balloon hover = <font size=small color=red>Motif name: $name,
Sequence: ???????</font>


The manual mentions that it's possible to use user-defined
tag/value pairs, but I couldn't figure out how. The manual says
that:
 [feature_type:details]
 tag1 = formatting rule
 tag2 = formatting rule
 tag3 = formatting rule

can be used to adjust the formatting of a tag, but I don't see how this
can be used to assign a value to a tag. I tried:
[cis-elements:details]
bs_seq = <b>$value</b>     (I didn't use BS_Seq, since it was
mentioned, tags are case-insensitive)
 OR
$bs_seq = <b>$value</b>

but I cannot use $bs_seq in the hover link option after doing this. What
am I doing wrong?

thanks,

Alper Yilmaz
Post-doctoral Researcher
Plant Biotechnology Center
The Ohio State University
1060 Carmack Rd
Columbus, OH 43210
(614)688-4954
www.grassius.org

From cjfields at illinois.edu  Fri Jun  5 16:43:04 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 5 Jun 2009 15:43:04 -0500
Subject: [Bioperl-l] [Bioperl-guts-l] Bug in genbank.pm?
In-Reply-To: <52A1BBE501D0BD40AF1EC72C14EA255D0BE06A@MAILBOX-31.home.ku.edu>
References: <002b01c9e567$e09b0de0$a1d129a0$@edu>
	<A145C0B1-D2B3-47CB-BA46-DCCDD693D05F@illinois.edu>
	<52A1BBE501D0BD40AF1EC72C14EA255D0BE06A@MAILBOX-31.home.ku.edu>
Message-ID: <C29B8160-5682-48AF-BD9E-A5FF26EC679F@illinois.edu>

(Just so this is going to the correct list)

Marcos,

I'll look into it.  This may have been fixed in between the releases,  
though.

There isn't a PPM available for 1.6 yet (several prereqs were missing  
at the time of the 1.6 release, such as Graphviz and so on).  A bug  
report is in the queue for this, though, as a reminder.  I think those  
are now available, though, so we should *theoretically* be capable of  
getting a PPM ready.  I say 'theoretically' b/c I don't have easy  
access to a PC running Windows (I have moved to OS X).  I'll see what  
I can do about that in the next few weeks.

In the meantime, if you need it you can download 1.6 or the 'nightly  
build' version (nightly snapshots of svn code) and add it to PERL5LIB  
or "use lib 'PATH_TO_BIOPERL';" in your scripts; it should work.

Nightly builds:

http://bioperl.org/DIST/nightly_builds/
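
A small sketch of the "use lib" route mentioned above, with a check of which BioPerl actually gets loaded. The path is a placeholder for wherever the download was unpacked.

```perl
use strict;
use warnings;

# Placeholder path: replace with wherever the unpacked BioPerl lives.
use lib '/path/to/bioperl';

# Bio::Root::Version carries the distribution version; the eval keeps the
# script from dying if BioPerl is not found at all.
if (eval { require Bio::Root::Version; 1 }) {
    my $version = Bio::Root::Version->VERSION;
    print "BioPerl version: $version\n";
    print "Loaded from: $INC{'Bio/Root/Version.pm'}\n";
} else {
    print "BioPerl not found in \@INC\n";
}
```

Because "use lib" puts its directory at the front of @INC, the unpacked copy should be found before an older PPM-installed one.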

chris

On Jun 4, 2009, at 10:17 PM, Barbeitos, Marcos wrote:

> OK, I attached the first record for both files.  These are GenBank  
> flat files that were emailed to us and transferred from Macs to PCs,  
> so I am not sure if the encoding/line terminations got messed up at  
> some point.  I converted the line terminations to Unix and the  
> encoding to Western European Windows, still, it didn't work. May be  
> worth it mention that BioEdit did understand the format after I  
> fixed the encoding.
>
> The data was erased because my boss is kind of finicky about sharing  
> information.  However, I tested the files attached to this email and  
> got the same results.
>
> I am still using Bio-Perl 1.5.2_100 in a PC, PPM has not flagged the  
> availability of an upgrade from CPAN, are you releasing the PPD as  
> well?
>
> Thanks!
>
> -----Original Message-----
> From: Chris Fields [mailto:cjfields at illinois.edu]
> Sent: Thu 6/4/2009 8:05 PM
> To: Barbeitos, Marcos
> Cc: bioperl-guts-l at lists.open-bio.org
> Subject: Re: [Bioperl-guts-l] Bug in genbank.pm?
>
> Marcos,
>
> We need the GenBank file (or the accession) you are attempting to
> parse.  Also, what version are you using?  We have released v. 1.6 on
> CPAN, and I intend on releasing 1.6.1 soon.
>
> chris
>
> On Jun 4, 2009, at 5:57 PM, Marcos S. Barbeitos wrote:
>
>> Hello.  I am trying to parse the Info from GeneBank flat files using
>> Bio::SeqIO.  I got two file which are virtually identical and one of
>> them
>> gets parsed just fine.  However, in the case of the other, the  
>> program
>> croaks when trying to parse the features and gives me:
>>
>>
>>
>> -------------------- WARNING ---------------------
>>
>> MSG: Unexpected error in feature table for  Skipping feature,
>> attempting to
>> recover
>>
>> ---------------------------------------------------
>>
>>
>>
>> I noticed that it does that after it reads the entry '/organism' in
>> Features.  The only difference I can see between the two files is the
>> presence of the feature ' /organelle' and of the line BASE COUNT in
>> one of
>> them, but the error persists even after I remove these lines.  Apart
>> from
>> that, there are the number of white spaces that precede the
>> beginning of
>> each line.   Any ideas?
>>
>>
>>
>> Thanks!
>>
>>
>>
>> Marcos S. Barbeitos
>>
>> Post-Doc Fellow
>>
>> The University of Kansas
>> Department of Ecology and Evolutionary Biology
>> 2041 Haworth Hall
>> 1200 Sunnyside Avenue
>> Lawrence, Kansas 66045
>> p: 785.864.5887
>> f: 785.864.5860
>>
>>
>>
>> _______________________________________________
>> Bioperl-guts-l mailing list
>> Bioperl-guts-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-guts-l
>
>
>
> <BioPerlTest.gb>


From Russell.Smithies at agresearch.co.nz  Sun Jun  7 16:32:27 2009
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Mon, 8 Jun 2009 08:32:27 +1200
Subject: [Bioperl-l] GBroswe2 - feature details
In-Reply-To: <dac81b0d0906051150o116c8bcvd747b71189d7a722@mail.gmail.com>
References: <dac81b0d0906051150o116c8bcvd747b71189d7a722@mail.gmail.com>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32493F1CA41@exchsth.agresearch.co.nz>

For the first part of your question, you can use a sub to access values in your annotations:

balloon hover = sub{my $f = shift;
			my %a = $f->attributes;
			my $name = $f->name;
			my $seq = $a{'BS_Seq'};
			return "<font size=small color=red>Motif name: $name, Sequence: $seq</font>" if defined $seq;
			return "<font size=small color=red>Motif name: $name, No sequence defined</font>";
			}


For the second bit, here are the formatting rules I'm using to create hyperlinks:

[Dbxref:DETAILS]
URL = sub {
      my ($tag,$value)=@_;
      if ($value =~ /NCBI_gi:(.+)/){
       return "http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?val=$1";
       }
      if ($value =~ /NCBI_Gene:(.+)/){
       return "http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=Retrieve&db=gene&list_uids=$1";
       }
       return;
     }

And this is what the gff looks like:
BTA10	refseq	mRNA	10011147	10176454	0	-	.	ID=NM_001076052;Name=NM_001076052;Index=1;Alias=HOMER1;Note=homer homolog 1 (Drosophila);Dbxref=NCBI_gi:115496957;Dbxref=NCBI_Gene:535311;
BTA10	refseq	mRNA	10241506	10301142	0	+	.	ID=NM_001046361;Name=NM_001046361;Index=1;Alias=PAPD4,MGC138008;Note=PAP associated domain containing 4;Dbxref=NCBI_gi:114052221;Dbxref=NCBI_Gene:533862;

Hopefully, this will get you going :-)


Russell Smithies 

Bioinformatics Applications Developer 
T +64 3 489 9085 
E  russell.smithies at agresearch.co.nz 

Invermay Research Centre 
Puddle Alley, 
Mosgiel, 
New Zealand 
T  +64 3 489 3809 
F  +64 3 489 9174 
www.agresearch.co.nz 


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Alper Yilmaz
> Sent: Saturday, 6 June 2009 6:51 a.m.
> To: BioPerl List
> Subject: [Bioperl-l] GBroswe2 - feature details
> 
> Dear all,
> 
> I have a question about utilizing the tag/value pairs that were used
> in 9th of GFF. If my 9th column is like this:
> 
> ID=BS_1;BS_Seq=cacatg;BS_Color=Purple;Name=AtMYC2 BS in RD22
> 
> How can I use BS_Seq, BS_Color tags, say, in a balloon? If want to
> print name and sequence of a BindingSite, what do I need to replace
> question marks below?
> 
> balloon hover = <font size=small color=red>Motif name: $name,
> Sequence: ???????</font>
> 
> 
> The manual is mentioning that it's possible to use user defined
> tag/value pairs, but I couldn't figure out how. The manual is
> mentioning:
>  [feature_type:details]
>  tag1 = formatting rule
>  tag2 = formatting rule
>  tag3 = formatting rule
> 
> can be used to adjust formatting of a tag, but I don't how this can be
> used to assign value to a tag? I tried ;
> [cis-elements:details]
> bs_seq = <b>$value</b>     (I didn't use BS_Seq, since it was
> mentioned, tags are case-insensitive)
>  OR
> $bs_seq = <b>$value</b>
> 
> but, I cannot use $bs_seq in hover link option after doing this. What
> am I doing wrong?
> 
> thanks,
> 
> Alper Yilmaz
> Post-doctoral Researcher
> Plant Biotechnology Center
> The Ohio State University
> 1060 Carmack Rd
> Columbus, OH 43210
> (614)688-4954
> www.grassius.org


From bernd.jagla at pasteur.fr  Mon Jun  8 12:24:12 2009
From: bernd.jagla at pasteur.fr (Bernd Jagla)
Date: Mon, 8 Jun 2009 18:24:12 +0200
Subject: [Bioperl-l] Bio:Das 1.11 installation problem
Message-ID: <7A76102A29824B1C9DDB7F3B1DC45D41@zillumina>

Hi,

I am working on a Mac (OS X 10.5.7) and trying to install Bio::Das 1.11 using

perl -MCPAN -e 'install Bio::Das'

This is perl, v5.8.9 built for darwin-2level
(please let me know if you need anything else).

I get the following error:

not ok 3
not ok 4
Can't call method "description" on an undefined value at t/01das.t line 62.

When going into the sources for 01das.t and printing out $db I get:

$VAR1 = \bless( {
                   'autotypes' => undef,
                   'default_dsn' => undef,
                   'autocategories' => undef,
                   'sockets' => {},
                   'aggregators' => [
                                      bless( {
                                               'sub_parts' => [
                                                                'coding_exon'
                                                              ],
                                               'require_whole_object' => undef,
                                               'main_method' => 'CDS',
                                               'method' => 'alignment'
                                             }, 'Bio::DB::GFF::Aggregator' ),
                                      bless( {
                                               'sub_parts' => [
                                                                'EST_match'
                                                              ],
                                               'require_whole_object' => undef,
                                               'main_method' => 'alignment',
                                               'method' => 'alignment'
                                             }, 'Bio::DB::GFF::Aggregator' )
                                    ],
                   'timeout' => undef,
                   'oldstyle_api' => 1,
                   'default_server' => 'http://www.wormbase.org/db/seq/das'
                 }, 'Bio::Das' );

@sources is empty, and test(3,@sources) fails.

Please advise.

Thanks,

Bernd

 
From lincoln.stein at gmail.com  Mon Jun  8 13:00:48 2009
From: lincoln.stein at gmail.com (Lincoln Stein)
Date: Mon, 8 Jun 2009 13:00:48 -0400
Subject: [Bioperl-l] Bio:Das 1.11 installation problem
In-Reply-To: <7A76102A29824B1C9DDB7F3B1DC45D41@zillumina>
References: <7A76102A29824B1C9DDB7F3B1DC45D41@zillumina>
Message-ID: <6dce9a0b0906081000j7c489fd3t3c5974794ad3e3e7@mail.gmail.com>

Hi,

The regression tests require an active Internet connection, as well as the
DAS test server being up and running. It may be there was a temporary
failure of one of those two. I just tested on my end and the regression
tests ran ok, so could you try it again?

Lincoln

On Mon, Jun 8, 2009 at 12:24 PM, Bernd Jagla <bernd.jagla at pasteur.fr> wrote:

> Hi,
>
>
>
> I am working on a MAC 10.5.7; try to install Bio::Das using perl -MCPAN -e
> 'install Bio::Das'
> This is perl, v5.8.9 built for darwin-2level
> (please let me know if you need anything else)
>
>
>
> I am trying to install Bio::Das 1.11
>
>
>
> I get the following error:
>
>
>
> not ok 3
>
> not ok 4
>
> Can't call method "description" on an undefined value at t/01das.t line 62.
>
>
>
> When going into the sources for 01das.t and printing out $db I get:
>
>
>
> $VAR1 = \bless( {
>
>                   'autotypes' => undef,
>
>                   'default_dsn' => undef,
>
>                   'autocategories' => undef,
>
>                   'sockets' => {},
>
>                   'aggregators' => [
>
>                                      bless( {
>
>                                               'sub_parts' => [
>
>
> 'coding_exon'
>
>                                                              ],
>
>                                               'require_whole_object' =>
> undef,
>
>                                               'main_method' => 'CDS',
>
>                                               'method' => 'alignment'
>
>                                             }, 'Bio::DB::GFF::Aggregator'
> ),
>
>                                      bless( {
>
>                                               'sub_parts' => [
>
>                                                                'EST_match'
>
>                                                              ],
>
>                                               'require_whole_object' =>
> undef,
>
>                                               'main_method' => 'alignment',
>
>                                               'method' => 'alignment'
>
>                                             }, 'Bio::DB::GFF::Aggregator' )
>
>                                    ],
>
>                   'timeout' => undef,
>
>                   'oldstyle_api' => 1,
>
>                   'default_server' => 'http://www.wormbase.org/db/seq/das'
>
>                 }, 'Bio::Das' );
>
>
>
>
>
> @sources is empty
>
> And test(3, at sources) fails.
>
>
>
> Please advise.
>
>
>
> Thanks,
>
>
>
> Bernd
>
>
>
>
>


-- 
Lincoln D. Stein
Director, Informatics and Biocomputing Platform
Ontario Institute for Cancer Research
101 College St., Suite 800
Toronto, ON, Canada M5G0A3
416 673-8514
Assistant: Renata Musa <Renata.Musa at oicr.on.ca>

From lsbrath at gmail.com  Mon Jun  8 16:28:46 2009
From: lsbrath at gmail.com (lsbrath at gmail.com)
Date: Mon, 08 Jun 2009 20:28:46 +0000
Subject: [Bioperl-l] fasta conversion
Message-ID: <000e0cd6aa4cd53993046bdc1675@google.com>

Hello!

I am running into trouble while trying to convert a text file to fasta. It  
should be simple enough but I am getting a weird error message.

This is my script:

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
use File::Copy;
use Bio::SeqIO;


my $maid_dir = "C:/Documents and Settings/mgavi.brathwaite/Desktop/msa";
my $maid = '13063';

opendir my $dh, "$maid_dir"; # directory to search
my @files = readdir $dh;
#find the _fasta file
for my $f (@files){
my $fa = $maid_dir."/".$maid."_hu_1kb.fa";
my $r = $maid_dir."/".$maid."_hu_1kb.txt";
open (my $in,$r);
if($f=~ m/^(\d+)_hu_1kb/){ # convert to fasta

print Dumper($f);
my $hu_1kb = $maid.'_hu_1kb'; #file to convert
my $in = Bio::SeqIO->new(-file => $r,
-format => 'raw');
my $out = Bio::SeqIO->new(-file => ">$fa",
-format => 'Fasta');
while ( my $seq = $in->next_seq()) {
$out->write_seq($seq);
}
}
}

I keep getting the following error message:

-------------------- WARNING ---------------------
MSG: seq doesn't validate, mismatch is 13063
---------------------------------------------------

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Attempting to set the sequence to [13063HU] which does not look healthy
STACK: Error::throw
STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359
STACK: Bio::PrimarySeq::seq C:/Perl/site/lib/Bio/PrimarySeq.pm:258
STACK: Bio::PrimarySeq::new C:/Perl/site/lib/Bio/PrimarySeq.pm:210
STACK: Bio::Seq::new C:/Perl/site/lib/Bio/Seq.pm:484
STACK: Bio::Seq::SeqFactory::create  
C:/Perl/site/lib/Bio/Seq/SeqFactory.pm:116
STACK: C:/Perl/site/lib/Bio\SeqIO\raw.pm:119
-----------------------------------------------------------

Anyone out there that can help me solve this?
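
One possible reading of the error, offered as a guess rather than a diagnosis: the 'raw' format treats every line of the input as a sequence, so a line that is really a name (and contains digits) would fail validation. Below is a minimal sketch under that assumption, where the first line of the text file is taken to be a name and the remaining lines the sequence. `text_to_fasta` and the sample input are hypothetical, and the sketch uses plain Perl rather than Bio::SeqIO.

```perl
use strict;
use warnings;

# Hypothetical helper: treat the first non-blank line as the record name
# and all following lines as sequence, then emit one FASTA record
# wrapped at 60 characters per line.
sub text_to_fasta {
    my ($text) = @_;
    my @lines = grep { /\S/ } split /\r?\n/, $text;
    die "empty input\n" unless @lines;
    my $name = shift @lines;
    $name =~ s/^\s+|\s+$//g;
    $name =~ s/\s+/_/g;
    my $seq = join '', @lines;
    $seq =~ s/\s+//g;
    die "unexpected characters in sequence\n" if $seq =~ /[^A-Za-z*-]/;
    my $fasta = ">$name\n";
    $fasta .= "$1\n" while $seq =~ /(.{1,60})/g;
    return $fasta;
}

# Made-up input: a name line followed by two sequence lines.
print text_to_fasta("13063 HU\nACGTACGTAC\nGGTTAACC\n");
```

If the file really has no name line, the assumption does not hold and the cause lies elsewhere.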

From kjaja27 at yahoo.com  Fri Jun  5 19:42:13 2009
From: kjaja27 at yahoo.com (kayj)
Date: Fri, 5 Jun 2009 16:42:13 -0700 (PDT)
Subject: [Bioperl-l]  finding SNPs in a given region
Message-ID: <23897107.post@talk.nabble.com>


Hi All,

Is there a way to find the SNPs in a given region? I have the start and
end base pair positions, and I would like to download the SNPs in several
different regions. Is that possible?
This is my first time using BioPerl and any help will be greatly
appreciated.

Thanks

-- 
View this message in context: http://www.nabble.com/finding-SNPs-in-a-given-region-tp23897107p23897107.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From kjaja27 at yahoo.com  Mon Jun  8 09:49:24 2009
From: kjaja27 at yahoo.com (kayj)
Date: Mon, 8 Jun 2009 06:49:24 -0700 (PDT)
Subject: [Bioperl-l]  How to extract SNPs
Message-ID: <23924432.post@talk.nabble.com>


Hi All,
I have several regions on the genome, each defined by its start and
end base pair positions. I am looking into using HapMap
http://hapmart.hapmap.org/BioMart/martview

to extract the SNPs in these regions for a given population. I am new to
BioPerl and any help will be greatly appreciated.


-- 
View this message in context: http://www.nabble.com/How-to-extract-SNPs-tp23924432p23924432.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From bernd at pasteur.fr  Mon Jun  8 16:31:57 2009
From: bernd at pasteur.fr (bernd at pasteur.fr)
Date: Mon, 8 Jun 2009 22:31:57 +0200 (CEST)
Subject: [Bioperl-l] Bio:Das 1.11 installation problem
In-Reply-To: <6dce9a0b0906081000j7c489fd3t3c5974794ad3e3e7@mail.gmail.com>
References: <7A76102A29824B1C9DDB7F3B1DC45D41@zillumina>
	<6dce9a0b0906081000j7c489fd3t3c5974794ad3e3e7@mail.gmail.com>
Message-ID: <47482.157.99.64.103.1244493117.squirrel@php.pasteur.fr>

I tested the connection with wget and everything works fine.
I suspect that our proxy might be the problem, but all variables are set
correctly (ftp_proxy, http_proxy and many more). I am not sure which
environment variables are being used.
I am not too familiar with all this and don't know where to look for the
right configuration.

Thanks,

Bernd

> Hi,
>
> The regression tests require an active Internet connection, as well as the
> DAS test server being up and running. It may be there was a temporary
> failure of one of those two. I just tested on my end and the regression
> tests ran ok, so could you try it again?
>
> Lincoln
>
> On Mon, Jun 8, 2009 at 12:24 PM, Bernd Jagla <bernd.jagla at pasteur.fr>
> wrote:
>
>> Hi,
>>
>> I am working on a MAC 10.5.7; try to install Bio::Das using
>> perl -MCPAN -e 'install Bio::Das'
>> This is perl, v5.8.9 built for darwin-2level
>> (please let me know if you need anything else)
>>
>> I am trying to install Bio::Das 1.11
>>
>> I get the following error:
>>
>> not ok 3
>> not ok 4
>> Can't call method "description" on an undefined value at t/01das.t line 62.
>>
>> When going into the sources for 01das.t and printing out $db I get:
>>
>> $VAR1 = \bless( {
>>                   'autotypes' => undef,
>>                   'default_dsn' => undef,
>>                   'autocategories' => undef,
>>                   'sockets' => {},
>>                   'aggregators' => [
>>                                      bless( {
>>                                               'sub_parts' => [
>>                                                                'coding_exon'
>>                                                              ],
>>                                               'require_whole_object' => undef,
>>                                               'main_method' => 'CDS',
>>                                               'method' => 'alignment'
>>                                             }, 'Bio::DB::GFF::Aggregator' ),
>>                                      bless( {
>>                                               'sub_parts' => [
>>                                                                'EST_match'
>>                                                              ],
>>                                               'require_whole_object' => undef,
>>                                               'main_method' => 'alignment',
>>                                               'method' => 'alignment'
>>                                             }, 'Bio::DB::GFF::Aggregator' )
>>                                    ],
>>                   'timeout' => undef,
>>                   'oldstyle_api' => 1,
>>                   'default_server' => 'http://www.wormbase.org/db/seq/das'
>>                 }, 'Bio::Das' );
>>
>> @sources is empty
>> And test(3,@sources) fails.
>>
>> Please advise.
>>
>> Thanks,
>>
>> Bernd
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
>
>
> --
> Lincoln D. Stein
> Director, Informatics and Biocomputing Platform
> Ontario Institute for Cancer Research
> 101 College St., Suite 800
> Toronto, ON, Canada M5G0A3
> 416 673-8514
> Assistant: Renata Musa <Renata.Musa at oicr.on.ca>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From maj at fortinbras.us  Mon Jun  8 17:12:03 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 8 Jun 2009 17:12:03 -0400
Subject: [Bioperl-l] fasta conversion
In-Reply-To: <000e0cd6aa4cd53993046bdc1675@google.com>
References: <000e0cd6aa4cd53993046bdc1675@google.com>
Message-ID: <4737A1AB29FA47AF8FF4913448F5FAA3@NewLife>

You're getting the sequence descriptor rather than the sequence in the return
from $in->next_seq. Read up on what the 'raw' format actually entails in the
Bio::SeqIO POD.
cheers MAJ
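
To make the point above concrete: as far as I understand it, the 'raw'
format treats every line of the input as one sequence, so a descriptor line
such as "13063HU" is handed to the sequence object as if it were residues.
Below is a minimal plain-Perl sketch of one workaround, assuming the .txt
file holds a descriptor line followed by sequence lines. The file layout,
the looks_like_sequence and lines_to_fasta helpers and the character class
are assumptions for illustration, not BioPerl API.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: true if a line contains only characters that
# plausibly belong to a sequence (letters, gap characters, '*').
sub looks_like_sequence {
    my ($line) = @_;
    return $line =~ /^[A-Za-z\-.*]+$/ ? 1 : 0;
}

# Turn the lines of a plain-text file into one FASTA record.
# Lines that do not look like sequence (e.g. "13063HU") are skipped
# instead of being treated as residues.
sub lines_to_fasta {
    my ($id, @lines) = @_;
    my $seq = '';
    for my $line (@lines) {
        $line =~ s/\s+//g;              # strip whitespace and line endings
        next unless length $line;
        next unless looks_like_sequence($line);
        $seq .= $line;
    }
    my $fasta = ">$id\n";
    $fasta .= "$1\n" while $seq =~ /(.{1,60})/g;   # wrap at 60 columns
    return $fasta;
}

# Example input: a descriptor line followed by two sequence lines.
my @example = ("13063HU\n", "ACGTACGT\n", "TTGACC\n");
print lines_to_fasta('13063_hu_1kb', @example);
```

If the descriptor is meant to become the FASTA header, it could be captured
rather than skipped; that depends on what the input file really contains.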
----- Original Message ----- 
From: <lsbrath at gmail.com>
To: <bioperl-l at lists.open-bio.org>
Sent: Monday, June 08, 2009 4:28 PM
Subject: [Bioperl-l] fasta conversion


> Hello!
>
> I am running into trouble while trying to convert a text file to fasta. It
> should be simple enough but I am getting a weird error message.
>
> This is my script:
>
> #!/usr/bin/perl
> use strict;
> use warnings;
> use Data::Dumper;
> use File::Copy;
> use Bio::SeqIO;
>
>
> my $maid_dir = "C:/Documents and Settings/mgavi.brathwaite/Desktop/msa";
> my $maid = '13063';
>
> opendir my $dh, "$maid_dir"; # directory to search
> my @files = readdir $dh;
> #find the _fasta file
> for my $f (@files){
> my $fa = $maid_dir."/".$maid."_hu_1kb.fa";
> my $r = $maid_dir."/".$maid."_hu_1kb.txt";
> open (my $in,$r);
> if($f=~ m/^(\d+)_hu_1kb/){ # convert to fasta
>
> print Dumper($f);
> my $hu_1kb = $maid.'_hu_1kb'; #file to convert
> my $in = Bio::SeqIO->new(-file => $r,
> -format => 'raw');
> my $out = Bio::SeqIO->new(-file => ">$fa",
> -format => 'Fasta');
> while ( my $seq = $in->next_seq()) {
> $out->write_seq($seq);
> }
> }
> }
>
> I keep getting the following error message:
>
> -------------------- WARNING ---------------------
> MSG: seq doesn't validate, mismatch is 13063
> ---------------------------------------------------
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Attempting to set the sequence to [13063HU] which does not look healthy
> STACK: Error::throw
> STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359
> STACK: Bio::PrimarySeq::seq C:/Perl/site/lib/Bio/PrimarySeq.pm:258
> STACK: Bio::PrimarySeq::new C:/Perl/site/lib/Bio/PrimarySeq.pm:210
> STACK: Bio::Seq::new C:/Perl/site/lib/Bio/Seq.pm:484
> STACK: Bio::Seq::SeqFactory::create 
> C:/Perl/site/lib/Bio/Seq/SeqFactory.pm:116
> STACK: C:/Perl/site/lib/Bio\SeqIO\raw.pm:119
> -----------------------------------------------------------
>
> Anyone out there that can help me solve this?
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> 


From stefan.kirov at bms.com  Mon Jun  8 17:26:17 2009
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Mon, 08 Jun 2009 17:26:17 -0400
Subject: [Bioperl-l] Bio:Das 1.11 installation problem
In-Reply-To: <47482.157.99.64.103.1244493117.squirrel@php.pasteur.fr>
References: <7A76102A29824B1C9DDB7F3B1DC45D41@zillumina>
	<6dce9a0b0906081000j7c489fd3t3c5974794ad3e3e7@mail.gmail.com>
	<47482.157.99.64.103.1244493117.squirrel@php.pasteur.fr>
Message-ID: <4A2D81F9.8060509@bms.com>

bernd at pasteur.fr wrote:
Try adding this line:
-proxy => 'http:<YOUR PROXY HERE>',
in t/01das.t where the Bio::Das object is created (I think it is line 41).
I hope this works for you; it did for me.
Stefan
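
A rough sketch of how the constructor arguments might be assembled so that
-proxy is only passed when a proxy is configured. The -proxy option is the
one suggested above; the -source option is quoted from memory of the
Bio::Das synopsis, and the proxy_from_env and das_args helpers and the
fallback order are assumptions. Bio::Das itself is not loaded, so the
snippet runs without the module and without network access.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first proxy URL found in the given
# environment hash, or undef if neither variable is set.
sub proxy_from_env {
    my ($env) = @_;
    for my $name (qw(http_proxy HTTP_PROXY)) {
        return $env->{$name}
            if defined $env->{$name} && length $env->{$name};
    }
    return;
}

# Build the constructor arguments, adding -proxy only when one is found.
# (-source is an assumption taken from the Bio::Das synopsis.)
sub das_args {
    my ($server, $env) = @_;
    my @args  = (-source => $server);
    my $proxy = proxy_from_env($env);
    push @args, (-proxy => $proxy) if defined $proxy;
    return @args;
}

my @args = das_args('http://www.wormbase.org/db/seq/das', \%ENV);
print "@args\n";
# With Bio::Das installed, the list would then be passed on, e.g.
#   my $das = Bio::Das->new(@args);
```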
> I tested the connection with wget and everything works fine.
> I suspect that our proxy might be the problem but all variables are set
> correctly (ftp_proxy, http_proxy and many more) I am not sure which
> environment variable are being used...
> I am not too familiar with all this and don't know where to look for the
> right configurations.
>
> Thanks,
>
> Bernd


From bernd.jagla at pasteur.fr  Tue Jun  9 03:05:47 2009
From: bernd.jagla at pasteur.fr (Bernd Jagla)
Date: Tue, 9 Jun 2009 09:05:47 +0200
Subject: [Bioperl-l] Bio:Das 1.11 installation problem
In-Reply-To: <4A2D81F9.8060509@bms.com>
References: <7A76102A29824B1C9DDB7F3B1DC45D41@zillumina><6dce9a0b0906081000j7c489fd3t3c5974794ad3e3e7@mail.gmail.com><47482.157.99.64.103.1244493117.squirrel@php.pasteur.fr>
	<4A2D81F9.8060509@bms.com>
Message-ID: <19FC487A25B6478FA4DE91B81A1FC52C@zillumina>

Great, that works!!!
But since I am using Bio::Das within GBrowse, I can't (and don't want to) change
those sources. I tried setting some environment variables, but that doesn't
seem to work either...
So far I have set the following:
FTP_PROXY=http://...
HTTP_PROXY=http://...
PROXYFTP=http://...
PROXYHTTP=http://...
ftp_proxy=http://...
http_proxy=http://...
PROXY=http://...

Any suggestions are welcome.

Thanks,

Bernd




From awitney at sgul.ac.uk  Tue Jun  9 07:20:35 2009
From: awitney at sgul.ac.uk (Adam Witney)
Date: Tue, 9 Jun 2009 12:20:35 +0100
Subject: [Bioperl-l] fetch gene sequence with EUtilities.pm
Message-ID: <78771A58-8FCC-46ED-BE4F-8B0915BD324D@sgul.ac.uk>

Hi,

I have been experimenting with the Bio::DB::EUtilities module, with  
help from the Cookbook. But I can't seem to figure out how to get the  
DNA sequence of a gene; all the examples seem to be fetching protein  
sequence.

How would I go about fetching a sequence using an Entrez GeneID?

thanks for any help

adam

From Kevin.M.Brown at asu.edu  Tue Jun  9 11:25:45 2009
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Tue, 9 Jun 2009 08:25:45 -0700
Subject: [Bioperl-l] Bio:Das 1.11 installation problem
In-Reply-To: <19FC487A25B6478FA4DE91B81A1FC52C@zillumina>
References: <7A76102A29824B1C9DDB7F3B1DC45D41@zillumina><6dce9a0b0906081000j7c489fd3t3c5974794ad3e3e7@mail.gmail.com><47482.157.99.64.103.1244493117.squirrel@php.pasteur.fr><4A2D81F9.8060509@bms.com>
	<19FC487A25B6478FA4DE91B81A1FC52C@zillumina>
Message-ID: <1A4207F8295607498283FE9E93B775B40604FBFB@EX02.asurite.ad.asu.edu>

Dumb question, but are you exporting the variables after you set them?

FTP_PROXY=http://...
HTTP_PROXY=http://...
export FTP_PROXY HTTP_PROXY 
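
One way to check whether the variables actually reach the Perl process is to
look at %ENV from inside Perl, since only exported variables show up there.
A minimal sketch: the variable names are the ones Bernd listed, and the
visible_proxy_vars helper is hypothetical.

```perl
use strict;
use warnings;

# Names taken from Bernd's list; which of them a given library actually
# consults is not something this snippet can tell you.
my @candidates = qw(FTP_PROXY HTTP_PROXY PROXYFTP PROXYHTTP
                    ftp_proxy http_proxy PROXY);

# Hypothetical helper: return name => value for every candidate that is
# present in the given environment hash.
sub visible_proxy_vars {
    my ($env, @names) = @_;
    return map { $_ => $env->{$_} } grep { defined $env->{$_} } @names;
}

my %seen = visible_proxy_vars(\%ENV, @candidates);
if (%seen) {
    print "$_=$seen{$_}\n" for sort keys %seen;
} else {
    print "no proxy variables visible to this process\n";
}
```

Running this from the same shell (or the same account and launcher) that
starts the program in question shows what that program would inherit.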



From cjfields at illinois.edu  Tue Jun  9 12:08:46 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 9 Jun 2009 11:08:46 -0500
Subject: [Bioperl-l] use of no_* to mean 'number_of', negative booleans
Message-ID: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu>

All,

I've noticed a few methods in BioPerl with names like 'no_Foo' that
mean 'number of Foo' (such as SimpleAlign's no_sequences).  The
problem I foresee is ambiguity, particularly with negative
boolean checks (e.g. 'no_Foo' could also mean 'this instance contains no
Foo'), a naming pattern that BioPerl also uses for various settings.

I suggest we alias these as num_* to disambiguate.  There's no
easy way to change flag settings that are already in place without going
through a deprecation cycle, but we can promote using positive booleans where
possible (e.g. 'is_foo' or 'has_foo' instead of 'no_foo').  We can leave
the older 'no_*' methods as they are for the time being and maybe deprecate
them later.

If no one has objections, I'll add these in as needed.
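
A rough sketch of the aliasing idea, using a toy class rather than
SimpleAlign itself. My::Align, num_sequences, has_sequences and the object
internals are made up for illustration; only the no_sequences name comes
from the discussion above.

```perl
use strict;
use warnings;

package My::Align;    # toy stand-in, not Bio::SimpleAlign

sub new {
    my ($class, @seqs) = @_;
    return bless { seqs => [@seqs] }, $class;
}

# New, unambiguous name: returns the number of sequences.
sub num_sequences {
    my ($self) = @_;
    return scalar @{ $self->{seqs} };
}

# Keep the old name working by pointing it at the same code.
{
    no warnings 'once';
    *no_sequences = \&num_sequences;
}

# Positive boolean instead of a 'no_*' flag.
sub has_sequences {
    my ($self) = @_;
    return $self->num_sequences > 0 ? 1 : 0;
}

package main;

my $aln = My::Align->new(qw(ACGT AC-T));
print $aln->num_sequences, " ", $aln->no_sequences, "\n";
```

Because both names refer to the same subroutine, existing callers of the
old name keep working unchanged.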

chris

From SMarkel at accelrys.com  Tue Jun  9 12:26:08 2009
From: SMarkel at accelrys.com (Scott Markel)
Date: Tue, 9 Jun 2009 12:26:08 -0400
Subject: [Bioperl-l] use of no_* to mean 'number_of', negative booleans
In-Reply-To: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu>
References: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu>
Message-ID: <1F1240778FB0AF46B4E5A72C44D2C7472A636328@exch1-hi.accelrys.net>

Chris,

I just checked our code for the Sequence Analysis Collection in
Pipeline Pilot.  We've got a few places we'd need to make code
changes, but we like your suggestion.  So, no objections from us.

Scott

Scott Markel, Ph.D.
Principal Bioinformatics Architect  email:  smarkel at accelrys.com
Accelrys (SciTegic R&D)             mobile: +1 858 205 3653
10188 Telesis Court, Suite 100      voice:  +1 858 799 5603
San Diego, CA 92121                 fax:    +1 858 799 5222
USA                                 web:    http://www.accelrys.com

http://www.linkedin.com/in/smarkel
Vice President, Board of Directors:
    International Society for Computational Biology
Co-chair: ISCB Publications Committee
Associate Editor: PLoS Computational Biology
Editorial Board: Briefings in Bioinformatics


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Chris Fields
> Sent: Tuesday, 09 June 2009 9:09 AM
> To: BioPerl List
> Subject: [Bioperl-l] use of no_* to mean 'number_of', negative booleans
> 
> All,
> 
> I've noticed a few methods in bioperl with names like 'no_Foo' that
> mean 'number of Foo' (such as SimpleAlign's no_sequences).  The
> problem I foresee are possible ambiguities, particularly with negative
> boolean checks (eg 'no_Foo' could also mean 'this instance contains no
> Foo'), something that BioPerl also has with various settings.
> 
> I suggest we alias these as num_* to disambiguate that.  There's no
> easy way to change already in-place flag setting w/o going through a
> deprecation cycle, but we can promote using positive booleans where
> possible (eg 'is_foo' or 'has_foo' instead of 'no_foo').  We can leave
> the older 'no_*' methods as is for the time being and maybe deprecate
> them later.
> 
> If no one has objections I'll add these in as needed.
> 
> chris
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at illinois.edu  Tue Jun  9 13:03:16 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 9 Jun 2009 12:03:16 -0500
Subject: [Bioperl-l] use of no_* to mean 'number_of', negative booleans
In-Reply-To: <1F1240778FB0AF46B4E5A72C44D2C7472A636328@exch1-hi.accelrys.net>
References: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu>
	<1F1240778FB0AF46B4E5A72C44D2C7472A636328@exch1-hi.accelrys.net>
Message-ID: <A5461F02-AA81-4A02-88DA-181B33EE41FE@illinois.edu>

I don't think it would require code changes right away; for the time  
being no_* will just alias num_*.  We can probably have deprecation  
warnings activate when we reach a particular version.
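
A sketch of how such a version-gated warning could work, in plain Perl. The
version numbers, the $WARN_FROM variable and the toy My::Align class are all
hypothetical; BioPerl's own deprecation machinery may look quite different.

```perl
use strict;
use warnings;

package My::Align;    # toy class, not part of BioPerl

our $VERSION   = '1.006';   # hypothetical current version
our $WARN_FROM = '1.007';   # hypothetical version at which warnings start

sub new { my ($class, @seqs) = @_; return bless { seqs => [@seqs] }, $class }

sub num_sequences { my ($self) = @_; return scalar @{ $self->{seqs} } }

# Old name: forwards to the new one, and warns once the installed
# version has reached $WARN_FROM.
sub no_sequences {
    my ($self) = @_;
    if ($VERSION >= $WARN_FROM) {
        warn "no_sequences() is deprecated; use num_sequences() instead\n";
    }
    return $self->num_sequences;
}

package main;

my $aln = My::Align->new(qw(ACGT ACGA));
print $aln->no_sequences, "\n";            # silent while $VERSION < $WARN_FROM

{
    local $My::Align::VERSION = '1.007';   # pretend a later release is installed
    local $SIG{__WARN__} = sub { print "warning: $_[0]" };
    print $aln->no_sequences, "\n";        # same result, now with a warning
}
```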

chris

On Jun 9, 2009, at 11:26 AM, Scott Markel wrote:

> Chris,
>
> I just checked our code for the Sequence Analysis Collection in
> Pipeline Pilot.  We've got a few places we'd need to make code
> changes, but we like your suggestion.  So, no objections from us.
>
> Scott
>
> Scott Markel, Ph.D.
> Principal Bioinformatics Architect  email:  smarkel at accelrys.com
> Accelrys (SciTegic R&D)             mobile: +1 858 205 3653
> 10188 Telesis Court, Suite 100      voice:  +1 858 799 5603
> San Diego, CA 92121                 fax:    +1 858 799 5222
> USA                                 web:    http://www.accelrys.com
>
> http://www.linkedin.com/in/smarkel
> Vice President, Board of Directors:
>    International Society for Computational Biology
> Co-chair: ISCB Publications Committee
> Associate Editor: PLoS Computational Biology
> Editorial Board: Briefings in Bioinformatics
>
>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>> bounces at lists.open-bio.org] On Behalf Of Chris Fields
>> Sent: Tuesday, 09 June 2009 9:09 AM
>> To: BioPerl List
>> Subject: [Bioperl-l] use of no_* to mean 'number_of', negative  
>> booleans
>>
>> All,
>>
>> I've noticed a few methods in bioperl with names like 'no_Foo' that
>> mean 'number of Foo' (such as SimpleAlign's no_sequences).  The
>> problem I foresee are possible ambiguities, particularly with  
>> negative
>> boolean checks (eg 'no_Foo' could also mean 'this instance contains  
>> no
>> Foo'), something that BioPerl also has with various settings.
>>
>> I suggest we alias these as num_* to disambiguate that.  There's no
>> easy way to change already in-place flag setting w/o going through a
>> deprecation cycle, but we can promote using positive booleans where
>> possible (eg 'is_foo' or 'has_foo' instead of 'no_foo').  We can  
>> leave
>> the older 'no_*' methods as is for the time being and maybe deprecate
>> them later.
>>
>> If no one has objections I'll add these in as needed.
>>
>> chris
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From maj at fortinbras.us  Tue Jun  9 12:32:51 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 9 Jun 2009 12:32:51 -0400
Subject: [Bioperl-l] use of no_* to mean 'number_of', negative booleans
In-Reply-To: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu>
References: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu>
Message-ID: <4BA7FB5466B34B59B7C455E1173C1FA7@NewLife>

+1, absolutely- MAJ
----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "BioPerl List" <bioperl-l at lists.open-bio.org>
Sent: Tuesday, June 09, 2009 12:08 PM
Subject: [Bioperl-l] use of no_* to mean 'number_of', negative booleans


> All,
> 
> I've noticed a few methods in bioperl with names like 'no_Foo' that  
> mean 'number of Foo' (such as SimpleAlign's no_sequences).  The  
> problem I foresee are possible ambiguities, particularly with negative  
> boolean checks (eg 'no_Foo' could also mean 'this instance contains no  
> Foo'), something that BioPerl also has with various settings.
> 
> I suggest we alias these as num_* to disambiguate that.  There's no  
> easy way to change already in-place flag setting w/o going through a  
> deprecation cycle, but we can promote using positive booleans where  
> possible (eg 'is_foo' or 'has_foo' instead of 'no_foo').  We can leave  
> the older 'no_*' methods as is for the time being and maybe deprecate  
> them later.
> 
> If no one has objections I'll add these in as needed.
> 
> chris
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
>

From hlapp at gmx.net  Tue Jun  9 13:18:05 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 9 Jun 2009 13:18:05 -0400
Subject: [Bioperl-l] use of no_* to mean 'number_of', negative booleans
In-Reply-To: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu>
References: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu>
Message-ID: <B23E5145-0DBB-4C3D-8536-063B3E9711DF@gmx.net>

Great suggestions, I'm all for it.

	-hilmar

On Jun 9, 2009, at 12:08 PM, Chris Fields wrote:

> All,
>
> I've noticed a few methods in bioperl with names like 'no_Foo' that  
> mean 'number of Foo' (such as SimpleAlign's no_sequences).  The  
> problem I foresee are possible ambiguities, particularly with  
> negative boolean checks (eg 'no_Foo' could also mean 'this instance  
> contains no Foo'), something that BioPerl also has with various  
> settings.
>
> I suggest we alias these as num_* to disambiguate that.  There's no  
> easy way to change already in-place flag setting w/o going through a  
> deprecation cycle, but we can promote using positive booleans where  
> possible (eg 'is_foo' or 'has_foo' instead of 'no_foo').  We can  
> leave the older 'no_*' methods as is for the time being and maybe  
> deprecate them later.
>
> If no one has objections I'll add these in as needed.
>
> chris

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From florent.angly at gmail.com  Tue Jun  9 14:41:51 2009
From: florent.angly at gmail.com (Florent Angly)
Date: Tue, 09 Jun 2009 11:41:51 -0700
Subject: [Bioperl-l] use of no_* to mean 'number_of', negative booleans
In-Reply-To: <B23E5145-0DBB-4C3D-8536-063B3E9711DF@gmx.net>
References: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu>
	<B23E5145-0DBB-4C3D-8536-063B3E9711DF@gmx.net>
Message-ID: <4A2EACEF.3090809@gmail.com>

Agree! no_* is prone to misunderstandings.
Also, some BioPerl code uses nof_*, which I quite like.
Florent

Hilmar Lapp wrote:
> Great suggestions, I'm all for it.
>
>     -hilmar
>
> On Jun 9, 2009, at 12:08 PM, Chris Fields wrote:
>
>> All,
>>
>> I've noticed a few methods in bioperl with names like 'no_Foo' that 
>> mean 'number of Foo' (such as SimpleAlign's no_sequences).  The 
>> problem I foresee are possible ambiguities, particularly with 
>> negative boolean checks (eg 'no_Foo' could also mean 'this instance 
>> contains no Foo'), something that BioPerl also has with various 
>> settings.
>>
>> I suggest we alias these as num_* to disambiguate that.  There's no 
>> easy way to change already in-place flag setting w/o going through a 
>> deprecation cycle, but we can promote using positive booleans where 
>> possible (eg 'is_foo' or 'has_foo' instead of 'no_foo').  We can 
>> leave the older 'no_*' methods as is for the time being and maybe 
>> deprecate them later.
>>
>> If no one has objections I'll add these in as needed.
>>
>> chris
>


From cjfields at illinois.edu  Tue Jun  9 14:55:48 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 9 Jun 2009 13:55:48 -0500
Subject: [Bioperl-l] use of no_* to mean 'number_of', negative booleans
In-Reply-To: <4A2EACEF.3090809@gmail.com>
References: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu>
	<B23E5145-0DBB-4C3D-8536-063B3E9711DF@gmx.net>
	<4A2EACEF.3090809@gmail.com>
Message-ID: <FF3C4540-ED73-4878-92B9-5D6A610D3C4E@illinois.edu>

We could probably alias nof_* with num_* just for consistency, but  
leave nof_* as is and not deprecate it (I don't think anyone would  
confuse nof* with no*).
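
For illustration, a rough sketch of what such an alias could look like.
The class My::Align below is a hypothetical stand-in (not the real
Bio::SimpleAlign), and the typeglob assignment is just one common Perl
way to give a method a second name; the real change may well be done
differently.

```perl
use strict;
use warnings;

# Hypothetical stand-in class, used only to demonstrate the aliasing idea.
package My::Align;

sub new {
    my ($class, @seqs) = @_;
    return bless { seqs => [@seqs] }, $class;
}

# Preferred, unambiguous name: "number of sequences".
sub num_sequences {
    my $self = shift;
    return scalar @{ $self->{seqs} };
}

# Keep the older name as an alias so existing callers continue to work.
{
    no warnings 'once';
    *no_sequences = \&num_sequences;
}

package main;

my $aln = My::Align->new(qw(seq1 seq2 seq3));
print "num_sequences: ", $aln->num_sequences, "\n";
print "no_sequences:  ", $aln->no_sequences,  "\n";
```

Both names resolve to the same code reference, so there is only one
implementation to maintain while the old name is still around.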

chris

On Jun 9, 2009, at 1:41 PM, Florent Angly wrote:

> Agree! no_* is prone to misunderstandings.
> Also, some BioPerl code uses nof_*, which I quite like.
> Florent
>
> Hilmar Lapp wrote:
>> Great suggestions, I'm all for it.
>>
>>    -hilmar
>>
>> On Jun 9, 2009, at 12:08 PM, Chris Fields wrote:
>>
>>> All,
>>>
>>> I've noticed a few methods in bioperl with names like 'no_Foo'  
>>> that mean 'number of Foo' (such as SimpleAlign's no_sequences).   
>>> The problem I foresee are possible ambiguities, particularly with  
>>> negative boolean checks (eg 'no_Foo' could also mean 'this  
>>> instance contains no Foo'), something that BioPerl also has with  
>>> various settings.
>>>
>>> I suggest we alias these as num_* to disambiguate that.  There's  
>>> no easy way to change already in-place flag setting w/o going  
>>> through a deprecation cycle, but we can promote using positive  
>>> booleans where possible (eg 'is_foo' or 'has_foo' instead of  
>>> 'no_foo').  We can leave the older 'no_*' methods as is for the  
>>> time being and maybe deprecate them later.
>>>
>>> If no one has objections I'll add these in as needed.
>>>
>>> chris
>>
>


From mauricio at open-bio.org  Tue Jun  9 15:33:18 2009
From: mauricio at open-bio.org (Mauricio Herrera Cuadra)
Date: Tue, 09 Jun 2009 14:33:18 -0500
Subject: [Bioperl-l] Project Help
In-Reply-To: <146497.36250.qm@web8407.mail.in.yahoo.com>
References: <146497.36250.qm@web8407.mail.in.yahoo.com>
Message-ID: <4A2EB8FE.4080402@open-bio.org>

Hi Chirag,

The OBF applied for the GSoC 2009 but unfortunately we were not 
accepted. However, other organizations/projects made their way into it 
and have been kind enough to adopt some of the ideas originally proposed 
under the OBF's initiative. I'm Cc'ing this to the BioPerl mailing list 
so the people involved with those projects can give you more details.

Regards,
Mauricio.


chirag matkar wrote:
> Hello,
> This is Chirag Matkar, wanting to know whether there were any GSoC 2009 projects underway in the Open Bioinformatics Foundation.
> Also, as I am myself a Perl developer, can I get some stipend or internship for building Perl modules?
> 
> Thanking You,
> Regards Chirag.
> 

From rmb32 at cornell.edu  Tue Jun  9 15:12:54 2009
From: rmb32 at cornell.edu (Robert Buels)
Date: Tue, 09 Jun 2009 12:12:54 -0700
Subject: [Bioperl-l] use of no_* to mean 'number_of', negative booleans
In-Reply-To: <FF3C4540-ED73-4878-92B9-5D6A610D3C4E@illinois.edu>
References: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu>	<B23E5145-0DBB-4C3D-8536-063B3E9711DF@gmx.net>	<4A2EACEF.3090809@gmail.com>
	<FF3C4540-ED73-4878-92B9-5D6A610D3C4E@illinois.edu>
Message-ID: <4A2EB436.8020506@cornell.edu>

Why not just add deprecation warnings now?  Or you could add deprecation 
warnings now that only print if $Bio::Root::Version::VERSION >= 
something.  Best to do it while one is thinking about it, I always say.
'Cause I always forget to do it later.  ;-)

Rob

Chris Fields wrote:
> We could probably alias nof_* with num_* just for consistency, but leave 
> nof_* as is and not deprecate it (I don't think anyone would confuse 
> nof* with no*).
> 
> chris
> 
> On Jun 9, 2009, at 1:41 PM, Florent Angly wrote:
> 
>> Agree! no_* is prone to misunderstandings.
>> Also, some BioPerl code uses nof_*, which I quite like.
>> Florent
>>
>> Hilmar Lapp wrote:
>>> Great suggestions, I'm all for it.
>>>
>>>    -hilmar
>>>
>>> On Jun 9, 2009, at 12:08 PM, Chris Fields wrote:
>>>
>>>> All,
>>>>
>>>> I've noticed a few methods in bioperl with names like 'no_Foo' that 
>>>> mean 'number of Foo' (such as SimpleAlign's no_sequences).  The 
>>>> problem I foresee are possible ambiguities, particularly with 
>>>> negative boolean checks (eg 'no_Foo' could also mean 'this instance 
>>>> contains no Foo'), something that BioPerl also has with various 
>>>> settings.
>>>>
>>>> I suggest we alias these as num_* to disambiguate that.  There's no 
>>>> easy way to change already in-place flag setting w/o going through a 
>>>> deprecation cycle, but we can promote using positive booleans where 
>>>> possible (eg 'is_foo' or 'has_foo' instead of 'no_foo').  We can 
>>>> leave the older 'no_*' methods as is for the time being and maybe 
>>>> deprecate them later.
>>>>
>>>> If no one has objections I'll add these in as needed.
>>>>
>>>> chris
>>>
>>


-- 
Robert Buels
Bioinformatics Analyst, Sol Genomics Network
Boyce Thompson Institute for Plant Research
Tower Rd
Ithaca, NY  14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu

From cjfields at illinois.edu  Tue Jun  9 16:19:03 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 9 Jun 2009 15:19:03 -0500
Subject: [Bioperl-l] use of no_* to mean 'number_of', negative booleans
In-Reply-To: <4A2EB436.8020506@cornell.edu>
References: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu>	<B23E5145-0DBB-4C3D-8536-063B3E9711DF@gmx.net>	<4A2EACEF.3090809@gmail.com>
	<FF3C4540-ED73-4878-92B9-5D6A610D3C4E@illinois.edu>
	<4A2EB436.8020506@cornell.edu>
Message-ID: <EFB0C00A-49CA-49BD-A22D-4B23D0C09E7B@illinois.edu>

On Jun 9, 2009, at 2:12 PM, Robert Buels wrote:

> Why not just add deprecation warnings now?  Or you could add  
> deprecation warnings now that only print if  
> $Bio::Root::Version::VERSION >= something.  Best to do it while one  
> is thinking about it, I always say.  Cause I always forget to do it  
> later.  ;-)
>
> Rob

Actually, that's one thing I want to implement within Root, namely the  
ability to do this:

$self->deprecated(-message     => 'method Foo is deprecated',
                   -start_ver   => $version1,
                   -throw_ver   => $version2
);

So it's essentially a no-op and invisible up to start_ver (at which  
point it warns), then throws after, well, throw_ver.  I could probably  
finagle that in w/o destroying things...
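
To make the intended behaviour concrete, here is a minimal sketch under
some assumptions: My::Root is a hypothetical stand-in (not the real
Bio::Root::Root), $VERSION is an assumed current version, and the
choice to compare with >= at both thresholds is a guess at the
semantics rather than anything settled.

```perl
use strict;
use warnings;

package My::Root;    # hypothetical stand-in, not the real Bio::Root::Root

our $VERSION = 1.006;    # assumed "current" library version

sub new { return bless {}, shift }

# Silent below -start_ver, warns from -start_ver on, dies from -throw_ver on.
sub deprecated {
    my ($self, %args) = @_;
    my $msg   = $args{-message} || 'this method is deprecated';
    my $start = $args{-start_ver};
    my $throw = $args{-throw_ver};

    die "$msg\n" if defined $throw && $VERSION >= $throw;
    if (!defined $start || $VERSION >= $start) {
        warn "$msg\n";
        return 1;    # deprecation is active: warned
    }
    return 0;        # no-op: deprecation not yet active
}

package main;

# Collect warnings instead of printing them, so the result is easy to inspect.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

my $obj = My::Root->new;
my $msg = 'method Foo is deprecated';

my $silent = $obj->deprecated(-message => $msg, -start_ver => 1.007, -throw_ver => 1.008);
my $warned = $obj->deprecated(-message => $msg, -start_ver => 1.005, -throw_ver => 1.008);
my $lived  = eval {
    $obj->deprecated(-message => $msg, -start_ver => 1.004, -throw_ver => 1.006);
    1;
};

print "silent=$silent warned=$warned threw=", ($lived ? 0 : 1), "\n";
```

The same call can then sit in trunk and in a release branch; only the
version it is compared against decides whether anything is visible.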

chris

> Chris Fields wrote:
>> We could probably alias nof_* with num_* just for consistency, but  
>> leave nof_* as is and not deprecate it (I don't think anyone would  
>> confuse nof* with no*).
>> chris
>> On Jun 9, 2009, at 1:41 PM, Florent Angly wrote:
>>> Agree! no_* is prone to misunderstandings.
>>> Also, some BioPerl code uses nof_*, which I quite like.
>>> Florent
>>>
>>> Hilmar Lapp wrote:
>>>> Great suggestions, I'm all for it.
>>>>
>>>>   -hilmar
>>>>
>>>> On Jun 9, 2009, at 12:08 PM, Chris Fields wrote:
>>>>
>>>>> All,
>>>>>
>>>>> I've noticed a few methods in bioperl with names like 'no_Foo'  
>>>>> that mean 'number of Foo' (such as SimpleAlign's no_sequences).   
>>>>> The problem I foresee are possible ambiguities, particularly  
>>>>> with negative boolean checks (eg 'no_Foo' could also mean 'this  
>>>>> instance contains no Foo'), something that BioPerl also has with  
>>>>> various settings.
>>>>>
>>>>> I suggest we alias these as num_* to disambiguate that.  There's  
>>>>> no easy way to change already in-place flag setting w/o going  
>>>>> through a deprecation cycle, but we can promote using positive  
>>>>> booleans where possible (eg 'is_foo' or 'has_foo' instead of  
>>>>> 'no_foo').  We can leave the older 'no_*' methods as is for the  
>>>>> time being and maybe deprecate them later.
>>>>>
>>>>> If no one has objections I'll add these in as needed.
>>>>>
>>>>> chris
>>>>
>>>
>
>
> -- 
> Robert Buels
> Bioinformatics Analyst, Sol Genomics Network
> Boyce Thompson Institute for Plant Research
> Tower Rd
> Ithaca, NY  14853
> Tel: 503-889-8539
> rmb32 at cornell.edu
> http://www.sgn.cornell.edu


From cjfields at illinois.edu  Tue Jun  9 16:45:37 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 9 Jun 2009 15:45:37 -0500
Subject: [Bioperl-l] deprecated(), was Re:  use of no_* to mean 'number_of',
	negative booleans
In-Reply-To: <EFB0C00A-49CA-49BD-A22D-4B23D0C09E7B@illinois.edu>
References: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu>	<B23E5145-0DBB-4C3D-8536-063B3E9711DF@gmx.net>	<4A2EACEF.3090809@gmail.com>
	<FF3C4540-ED73-4878-92B9-5D6A610D3C4E@illinois.edu>
	<4A2EB436.8020506@cornell.edu>
	<EFB0C00A-49CA-49BD-A22D-4B23D0C09E7B@illinois.edu>
Message-ID: <E2D3CA55-A48A-424D-9AC3-24F7C44A9548@illinois.edu>

On Jun 9, 2009, at 3:19 PM, Chris Fields wrote:

> On Jun 9, 2009, at 2:12 PM, Robert Buels wrote:
>
>> Why not just add deprecation warnings now?  Or you could add  
>> deprecation warnings now that only print if  
>> $Bio::Root::Version::VERSION >= something.  Best to do it while one  
>> is thinking about it, I always say.  Cause I always forget to do it  
>> later.  ;-)
>>
>> Rob
>
> Actually, that's one thing I want to implement within Root, namely  
> the ability to do this:
>
> $self->deprecated(-message     => 'method Foo is deprecated',
>                  -start_ver   => $version1,
>                  -throw_ver   => $version2
> );
>
> So it's essentially a noop and invisible up to start_ver (upon where  
> it warns), then throws after, well, throw_ver.  I could probably  
> finagle that in w/o destroying things...
>
> chris

Just to note, this is mainly to allow us devs the opportunity to add  
these to main trunk w/o having to worry about merges over to the 1.6  
branch (where the version is different).  We don't want the dep  
warnings showing up there right away, but maybe in a point release or  
minor version.

chris

From hlapp at gmx.net  Tue Jun  9 19:09:26 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 9 Jun 2009 19:09:26 -0400
Subject: [Bioperl-l] Project Help
In-Reply-To: <4A2EB8FE.4080402@open-bio.org>
References: <146497.36250.qm@web8407.mail.in.yahoo.com>
	<4A2EB8FE.4080402@open-bio.org>
Message-ID: <74C0D011-A5A4-4DF1-93D8-13401A18E29A@gmx.net>

Hi Chirag,

check out the Bio{Perl,Python,Ruby}-related projects (go to 'Accepted  
Projects') at

http://hackathon.nescent.org/Phyloinformatics_Summer_of_Code_2009

	-hilmar

On Jun 9, 2009, at 3:33 PM, Mauricio Herrera Cuadra wrote:

> Hi Chirag,
>
> The OBF applied for the GSoC 2009 but unfortunately we were not  
> accepted. However, other organizations/projects made their way into  
> it and have been kind enough to adopt some of the ideas originally  
> proposed under the OBF's initiative. I'm Cc'ing this to the BioPerl  
> mailing list so the people involved with those projects can give you  
> more details.
>
> Regards,
> Mauricio.
>
>
> chirag matkar wrote:
>> Hello,
>> This is Chirag Matkar, wanting to know whether there were any GSoC  
>> 2009 projects underway in the Open Bioinformatics Foundation.
>> Also, as I am myself a Perl developer, can I get some stipend or  
>> internship for building Perl modules?
>> Thanking You,
>> Regards Chirag.

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From rmb32 at cornell.edu  Tue Jun  9 21:13:36 2009
From: rmb32 at cornell.edu (Robert Buels)
Date: Tue, 09 Jun 2009 18:13:36 -0700
Subject: [Bioperl-l] deprecated(),
 was Re:  use of no_* to mean 'number_of', negative booleans
In-Reply-To: <E2D3CA55-A48A-424D-9AC3-24F7C44A9548@illinois.edu>
References: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu>	<B23E5145-0DBB-4C3D-8536-063B3E9711DF@gmx.net>	<4A2EACEF.3090809@gmail.com>	<FF3C4540-ED73-4878-92B9-5D6A610D3C4E@illinois.edu>	<4A2EB436.8020506@cornell.edu>	<EFB0C00A-49CA-49BD-A22D-4B23D0C09E7B@illinois.edu>
	<E2D3CA55-A48A-424D-9AC3-24F7C44A9548@illinois.edu>
Message-ID: <4A2F08C0.3010609@cornell.edu>

Chris Fields wrote:
>> Actually, that's one thing I want to implement within Root, namely the 
>> ability to do this:
>>
>> $self->deprecated(-message     => 'method Foo is deprecated',
>>                  -start_ver   => $version1,
>>                  -throw_ver   => $version2
>> );

Here's a patch with tests against the svn trunk head.  Is this what you 
had in mind?

-- 
Rob
-------------- next part --------------
A non-text attachment was scrubbed...
Name: deprecated.patch
Type: text/x-diff
Size: 5601 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20090609/431738da/attachment-0001.bin>

From cjfields at illinois.edu  Tue Jun  9 22:54:47 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 9 Jun 2009 21:54:47 -0500
Subject: [Bioperl-l] deprecated(),
	was Re:  use of no_* to mean 'number_of', negative booleans
In-Reply-To: <4A2F08C0.3010609@cornell.edu>
References: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu>	<B23E5145-0DBB-4C3D-8536-063B3E9711DF@gmx.net>	<4A2EACEF.3090809@gmail.com>	<FF3C4540-ED73-4878-92B9-5D6A610D3C4E@illinois.edu>	<4A2EB436.8020506@cornell.edu>	<EFB0C00A-49CA-49BD-A22D-4B23D0C09E7B@illinois.edu>
	<E2D3CA55-A48A-424D-9AC3-24F7C44A9548@illinois.edu>
	<4A2F08C0.3010609@cornell.edu>
Message-ID: <20652B6B-1BF3-477C-9619-4149748E5B9B@illinois.edu>

On Jun 9, 2009, at 8:13 PM, Robert Buels wrote:

> Chris Fields wrote:
>>> Actually, that's one thing I want to implement within Root, namely  
>>> the ability to do this:
>>>
>>> $self->deprecated(-message     => 'method Foo is deprecated',
>>>                 -start_ver   => $version1,
>>>                 -throw_ver   => $version2
>>> );
>
> Here's a patch with tests against the svn trunk head.  Is this what  
> you had in mind?
>
> -- 
> Rob

Funny, I had written up almost exactly the same code, just a little  
rearranged.  I've modified mine to follow your use of -warn_version (I  
also had -throw_version as a synonym of -version, JIC).  Also, for the  
tests I created a temp class in the tests and ran tests off that.   
Thanks for the patch!

chris

From maj at fortinbras.us  Wed Jun 10 00:10:12 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 10 Jun 2009 00:10:12 -0400
Subject: [Bioperl-l] announcing bioperl-max, a public AMI
Message-ID: <57C9F3FC7A4A42189608BD882C03DB9C@NewLife>

Hi All, 

I've built a public Amazon machine image, loaded with many many 
goodies, including the most recent (r15747) trunks of 
- bioperl-live
- bioperl-run
- bioperl-db/biosql
The base machine is a public clean install of Ubuntu 8.04 Hardy/32-bit
by Alestic. Many fave tools, including blast, hmmer, hyphy, phyml, 
emboss, and more are all there (and most even pass bioperl-run tests), and 
perl is loaded with Moose, XML::LibXML, XML::Compile, Bio::Phylo
(r1071) and others. This is *not* a lean mean fighting machine. 

Please give it a try if you're so inclined. Fuller details (including 
image id and ssh-rsa signature) are at http://fortinbras.us/bioperl-max.

Ping me if it doesn't work.

Cheers, 
Mark


From cjfields at illinois.edu  Wed Jun 10 00:36:40 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 9 Jun 2009 23:36:40 -0500
Subject: [Bioperl-l] announcing bioperl-max, a public AMI
In-Reply-To: <57C9F3FC7A4A42189608BD882C03DB9C@NewLife>
References: <57C9F3FC7A4A42189608BD882C03DB9C@NewLife>
Message-ID: <3F9CAC33-EE29-4843-8E21-532DC7569A7D@illinois.edu>

I'll be trying that out, particularly re: bioperl-run. For bioperl-db  
do you have mysql or pg?

Heh, I see Moose is installed.  Just need svn'd parrot and git updated  
rakudo and we could do some damage...

chris

On Jun 9, 2009, at 11:10 PM, Mark A. Jensen wrote:

> Hi All,
>
> I've built a public Amazon machine image, loaded with many many
> goodies, including the most recent (r15747) trunks of
> - bioperl-live
> - bioperl-run
> - bioperl-db/biosql
> The base machine is a public clean install of Ubuntu 8.04 Hardy/32-bit
> by Alestic. Many fave tools, including blast, hmmer, hyphy, phyml,
> emboss, and more are all there (and most even pass bioperl-run  
> tests), and
> perl is loaded with Moose, XML::LibXML, XML::Compile, Bio::Phylo
> (r1071) and others. This is *not* a lean mean fighting machine.
>
> Please give it a try if you're so inclined. Fuller details (including
> image id and ssh-rsa signature) are at http://fortinbras.us/bioperl-max 
> .
>
> Ping me if it doesn't work.
>
> Cheers,
> Mark
>
>
>


From maj at fortinbras.us  Wed Jun 10 00:39:36 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 10 Jun 2009 00:39:36 -0400
Subject: [Bioperl-l] announcing bioperl-max, a public AMI
In-Reply-To: <3F9CAC33-EE29-4843-8E21-532DC7569A7D@illinois.edu>
References: <57C9F3FC7A4A42189608BD882C03DB9C@NewLife>
	<3F9CAC33-EE29-4843-8E21-532DC7569A7D@illinois.edu>
Message-ID: <6A7D85B8037848F090C35A639C84D870@NewLife>

----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "Mark A. Jensen" <maj at fortinbras.us>
Cc: "BioPerl List" <bioperl-l at bioperl.org>
Sent: Wednesday, June 10, 2009 12:36 AM
Subject: Re: [Bioperl-l] announcing bioperl-max, a public AMI


> I'll be trying that out, particularly re: bioperl-run. For bioperl-db  
> do you have mysql or pg?

-both (I'm all about options...)


> 
> Heh, I see Moose is installed.  Just need svn'd parrot and git updated  
> rakudo and we could do some damage...
> 

bioperl-max-0.1.1, here we come...


> chris
> 

cheers MAJ

> On Jun 9, 2009, at 11:10 PM, Mark A. Jensen wrote:
> 
>> Hi All,
>>
>> I've built a public Amazon machine image, loaded with many many
>> goodies, including the most recent (r15747) trunks of
>> - bioperl-live
>> - bioperl-run
>> - bioperl-db/biosql
>> The base machine is a public clean install of Ubuntu 8.04 Hardy/32-bit
>> by Alestic. Many fave tools, including blast, hmmer, hyphy, phyml,
>> emboss, and more are all there (and most even pass bioperl-run  
>> tests), and
>> perl is loaded with Moose, XML::LibXML, XML::Compile, Bio::Phylo
>> (r1071) and others. This is *not* a lean mean fighting machine.
>>
>> Please give it a try if you're so inclined. Fuller details (including
>> image id and ssh-rsa signature) are at http://fortinbras.us/bioperl-max 
>> .
>>
>> Ping me if it doesn't work.
>>
>> Cheers,
>> Mark
>>
>>
>>
> 
> 
>

From bernd.jagla at pasteur.fr  Wed Jun 10 03:43:47 2009
From: bernd.jagla at pasteur.fr (Bernd Jagla)
Date: Wed, 10 Jun 2009 09:43:47 +0200
Subject: [Bioperl-l] Bio:Das 1.11 installation problem
In-Reply-To: <1A4207F8295607498283FE9E93B775B40604FBFB@EX02.asurite.ad.asu.edu>
References: <7A76102A29824B1C9DDB7F3B1DC45D41@zillumina><6dce9a0b0906081000j7c489fd3t3c5974794ad3e3e7@mail.gmail.com><47482.157.99.64.103.1244493117.squirrel@php.pasteur.fr><4A2D81F9.8060509@bms.com><19FC487A25B6478FA4DE91B81A1FC52C@zillumina>
	<1A4207F8295607498283FE9E93B775B40604FBFB@EX02.asurite.ad.asu.edu>
Message-ID: <7F2215CBC16B48BE8C548BB69E131890@zillumina>

I wrote a small test program to test the environment variables and I have
them:

          'SSH_CLIENT' => '157.
          'FTP_PROXY' => 'http://
          'HTTP_PROXY' => 'http://cache.past
          'SSH_TTY' => '/dev/ttys002',
          'ftp_proxy' => 'http://
          'http_proxy' => 'http://

Using the "-proxy" option works; without it, it doesn't.

(and yes, I export the variables..)
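
For anyone wanting to repeat the check, a small sketch of such a test
program follows. The helper name proxy_vars is made up for this
example; it simply lists whatever proxy-related variables the Perl
process can actually see, which is what matters when the shell and the
script disagree.

```perl
use strict;
use warnings;

# Hypothetical helper: return the sorted names of proxy-related
# variables found in the given environment hash.
sub proxy_vars {
    my ($env) = @_;
    my @names  = grep { /proxy/i } keys %$env;
    my @sorted = sort @names;
    return @sorted;
}

# Print what this Perl process actually inherited from the shell.
for my $name (proxy_vars(\%ENV)) {
    print "$name => $ENV{$name}\n";
}
```

If a variable shows up here but the library still ignores it, the
problem is in how the library reads its configuration, not in the
shell setup.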

Thanks for any suggestions.

Bernd

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Kevin Brown
Sent: Tuesday, June 09, 2009 5:26 PM
Cc: bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] Bio:Das 1.11 installation problem

Dumb question, but are you exporting the variables after you set them?

FTP_PROXY=http://...
HTTP_PROXY=http://...
export FTP_PROXY HTTP_PROXY 

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Bernd Jagla
> Sent: Tuesday, June 09, 2009 12:06 AM
> To: 'Stefan Kirov'; bernd at pasteur.fr
> Cc: 'Lincoln Stein'; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bio:Das 1.11 installation problem
> 
> Great, that works!!!
> But since I am using Bio::Das within GBrowse I can't/don't 
> want to  change
> those sources. I tried setting some environment variable but 
> that doesn't
> seem to work either...
> So far I have the set the following:
> FTP_PROXY=http://...
> HTTP_PROXY=http://...
> PROXYFTP=http://...
> PROXYHTTP=http://...
> ftp_proxy=http://...
> http_proxy=http://...
> PROXY=http://...
> 
> Any suggestions are welcome.
> 
> Thanks,
> 
> Bernd
> 
> 
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Stefan Kirov
> Sent: Monday, June 08, 2009 11:26 PM
> To: bernd at pasteur.fr
> Cc: Lincoln Stein; Bernd Jagla; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bio:Das 1.11 installation problem
> 
> bernd at pasteur.fr wrote:
> Try to add this line
> -proxy => 'http:<YOUR PROXY HERE>',
> in t/01das.t where the Bio::Das object is created (I think line 41).
> Hope this works for you, it did for me.
> Stefan
> > I tested the connection with wget and everything works fine.
> > I suspect that our proxy might be the problem but all 
> variables are set
> > correctly (ftp_proxy, http_proxy and many more) I am not sure which
> > environment variable are being used...
> > I am not too familiar with all this and don't know where to 
> look for the
> > right configurations.
> >
> > Thanks,
> >
> > Bernd
> >
> >   
> >> Hi,
> >>
> >> The regression tests require an active Internet 
> connection, as well as
> the
> >> DAS test server being up and running. It may be there was 
> a temporary
> >> failure of one of those two. I just tested on my end and 
> the regression
> >> tests ran ok, so could you try it again?
> >>
> >> Lincoln
> >>
> >> On Mon, Jun 8, 2009 at 12:24 PM, Bernd Jagla 
> <bernd.jagla at pasteur.fr>
> >> wrote:
> >>
> >>     
> >>> Hi,
> >>>
> >>>
> >>>
> >>> I am working on a MAC 10.5.7; try to install Bio::Das 
> using perl -MCPAN
> >>> -e
> >>> 'install Bio::Das'
> >>> This is perl, v5.8.9 built for darwin-2level
> >>> (please let me know if you need anything else)
> >>>
> >>>
> >>>
> >>> I am trying to install Bio::Das 1.11
> >>>
> >>>
> >>>
> >>> I get the following error:
> >>>
> >>>
> >>>
> >>> not ok 3
> >>>
> >>> not ok 4
> >>>
> >>> Can't call method "description" on an undefined value at 
> t/01das.t line
> >>> 62.
> >>>
> >>>
> >>>
> >>> When going into the sources for 01das.t and printing out 
> $db I get:
> >>>
> >>>
> >>>
> >>> $VAR1 = \bless( {
> >>>
> >>>                   'autotypes' => undef,
> >>>
> >>>                   'default_dsn' => undef,
> >>>
> >>>                   'autocategories' => undef,
> >>>
> >>>                   'sockets' => {},
> >>>
> >>>                   'aggregators' => [
> >>>
> >>>                                      bless( {
> >>>
> >>>                                               'sub_parts' => [
> >>>
> >>>
> >>> 'coding_exon'
> >>>
> >>>                                                              ],
> >>>
> >>>                                               
> 'require_whole_object' =>
> >>> undef,
> >>>
> >>>                                               
> 'main_method' => 'CDS',
> >>>
> >>>                                               'method' => 
> 'alignment'
> >>>
> >>>                                             },
> >>> 'Bio::DB::GFF::Aggregator'
> >>> ),
> >>>
> >>>                                      bless( {
> >>>
> >>>                                               'sub_parts' => [
> >>>
> >>>
> 'EST_match'
> >>>
> >>>                                                              ],
> >>>
> >>>                                               
> 'require_whole_object' =>
> >>> undef,
> >>>
> >>>                                               'main_method' =>
> >>> 'alignment',
> >>>
> >>>                                               'method' => 
> 'alignment'
> >>>
> >>>                                             },
> >>> 'Bio::DB::GFF::Aggregator' )
> >>>
> >>>                                    ],
> >>>
> >>>                   'timeout' => undef,
> >>>
> >>>                   'oldstyle_api' => 1,
> >>>
> >>>                   'default_server' =>
> >>> 'http://www.wormbase.org/db/seq/das'
> >>>
> >>>                 }, 'Bio::Das' );
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> @sources is empty
> >>>
> >>> And test(3, at sources) fails.
> >>>
> >>>
> >>>
> >>> Please advise.
> >>>
> >>>
> >>>
> >>> Thanks,
> >>>
> >>>
> >>>
> >>> Bernd
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>       
> >>
> >> --
> >> Lincoln D. Stein
> >> Director, Informatics and Biocomputing Platform
> >> Ontario Institute for Cancer Research
> >> 101 College St., Suite 800
> >> Toronto, ON, Canada M5G0A3
> >> 416 673-8514
> >> Assistant: Renata Musa <Renata.Musa at oicr.on.ca>
> >>
> >>     
> >
> >


From bernd.jagla at pasteur.fr  Wed Jun 10 04:16:08 2009
From: bernd.jagla at pasteur.fr (Bernd Jagla)
Date: Wed, 10 Jun 2009 10:16:08 +0200
Subject: [Bioperl-l] Bio:Das 1.11 installation problem
In-Reply-To: <1A4207F8295607498283FE9E93B775B40604FBFB@EX02.asurite.ad.asu.edu>
References: <7A76102A29824B1C9DDB7F3B1DC45D41@zillumina><6dce9a0b0906081000j7c489fd3t3c5974794ad3e3e7@mail.gmail.com><47482.157.99.64.103.1244493117.squirrel@php.pasteur.fr><4A2D81F9.8060509@bms.com><19FC487A25B6478FA4DE91B81A1FC52C@zillumina>
	<1A4207F8295607498283FE9E93B775B40604FBFB@EX02.asurite.ad.asu.edu>
Message-ID: <F5844533CFCB425DA400C888A9995F70@zillumina>

To whom it may concern:

I added
  $self->proxy($ENV{'HTTP_PROXY'}) if $ENV{'HTTP_PROXY'};

around line 72 of Das.pm, just before:
  $self->proxy($proxy) if $proxy;

This did the trick.

For completeness I also edited Fetch.pm, adding around line 134:
  $proxy = $ENV{'HTTP_PROXY'} if $ENV{'HTTP_PROXY'};
just before:
  my $dest = $proxy || $request->url;
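
The idea behind the Das.pm change can be sketched in isolation. The
helper pick_proxy below is hypothetical (it is not part of Bio::Das),
and the extra check of the lower-case http_proxy is an assumption
added for illustration; an explicitly passed proxy wins, and the
environment is only a fallback.

```perl
use strict;
use warnings;

# Hypothetical helper: choose a proxy URL, preferring an explicit
# setting over anything found in the environment.
sub pick_proxy {
    my ($explicit, $env) = @_;
    $env ||= \%ENV;

    return $explicit if defined $explicit && length $explicit;

    # Assumption: accept both spellings of the variable name.
    for my $name (qw(HTTP_PROXY http_proxy)) {
        my $value = $env->{$name};
        return $value if defined $value && length $value;
    }
    return;    # no proxy configured
}

# Example with a made-up environment rather than the real %ENV.
my %fake_env = (http_proxy => 'http://proxy.example.org:3128');

print pick_proxy(undef, \%fake_env), "\n";
print pick_proxy('http://other.example.org:8080', \%fake_env), "\n";
```

Keeping the explicit argument first means existing callers that pass
-proxy see no change in behaviour.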

Best,

Bernd

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Kevin Brown
Sent: Tuesday, June 09, 2009 5:26 PM
Cc: bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] Bio:Das 1.11 installation problem

Dumb question, but are you exporting the variables after you set them?

FTP_PROXY=http://...
HTTP_PROXY=http://...
export FTP_PROXY HTTP_PROXY 

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Bernd Jagla
> Sent: Tuesday, June 09, 2009 12:06 AM
> To: 'Stefan Kirov'; bernd at pasteur.fr
> Cc: 'Lincoln Stein'; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bio:Das 1.11 installation problem
> 
> Great, that works!!!
> But since I am using Bio::Das within GBrowse I can't/don't want to change
> those sources. I tried setting some environment variables, but that doesn't
> seem to work either...
> So far I have set the following:
> FTP_PROXY=http://...
> HTTP_PROXY=http://...
> PROXYFTP=http://...
> PROXYHTTP=http://...
> ftp_proxy=http://...
> http_proxy=http://...
> PROXY=http://...
> 
> Any suggestions are welcome.
> 
> Thanks,
> 
> Bernd
> 
> 
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Stefan Kirov
> Sent: Monday, June 08, 2009 11:26 PM
> To: bernd at pasteur.fr
> Cc: Lincoln Stein; Bernd Jagla; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bio:Das 1.11 installation problem
> 
> bernd at pasteur.fr wrote:
> Try to add this line
> -proxy => 'http:<YOUR PROXY HERE>',
> in t/01das.t where the Bio::Das object is created (I think line 41).
> Hope this works for you, it did for me.
> Stefan
> > I tested the connection with wget and everything works fine.
> > I suspect that our proxy might be the problem but all variables are set
> > correctly (ftp_proxy, http_proxy and many more) I am not sure which
> > environment variable are being used...
> > I am not too familiar with all this and don't know where to look for the
> > right configurations.
> >
> > Thanks,
> >
> > Bernd
> >
> >   
> >> Hi,
> >>
> >> The regression tests require an active Internet connection, as well as the
> >> DAS test server being up and running. It may be there was a temporary
> >> failure of one of those two. I just tested on my end and the regression
> >> tests ran ok, so could you try it again?
> >>
> >> Lincoln
> >>
> >> On Mon, Jun 8, 2009 at 12:24 PM, Bernd Jagla <bernd.jagla at pasteur.fr>
> >> wrote:
> >>
> >>     
> >>> Hi,
> >>>
> >>> I am working on a MAC 10.5.7; try to install Bio::Das using
> >>> perl -MCPAN -e 'install Bio::Das'
> >>> This is perl, v5.8.9 built for darwin-2level
> >>> (please let me know if you need anything else)
> >>>
> >>> I am trying to install Bio::Das 1.11
> >>>
> >>> I get the following error:
> >>>
> >>> not ok 3
> >>> not ok 4
> >>> Can't call method "description" on an undefined value at t/01das.t line 62.
> >>>
> >>> When going into the sources for 01das.t and printing out $db I get:
> >>>
> >>> $VAR1 = \bless( {
> >>>     'autotypes' => undef,
> >>>     'default_dsn' => undef,
> >>>     'autocategories' => undef,
> >>>     'sockets' => {},
> >>>     'aggregators' => [
> >>>         bless( {
> >>>             'sub_parts' => [
> >>>                 'coding_exon'
> >>>             ],
> >>>             'require_whole_object' => undef,
> >>>             'main_method' => 'CDS',
> >>>             'method' => 'alignment'
> >>>         }, 'Bio::DB::GFF::Aggregator' ),
> >>>         bless( {
> >>>             'sub_parts' => [
> >>>                 'EST_match'
> >>>             ],
> >>>             'require_whole_object' => undef,
> >>>             'main_method' => 'alignment',
> >>>             'method' => 'alignment'
> >>>         }, 'Bio::DB::GFF::Aggregator' )
> >>>     ],
> >>>     'timeout' => undef,
> >>>     'oldstyle_api' => 1,
> >>>     'default_server' => 'http://www.wormbase.org/db/seq/das'
> >>> }, 'Bio::Das' );
> >>>
> >>> @sources is empty
> >>> And test(3,@sources) fails.
> >>>
> >>> Please advise.
> >>>
> >>> Thanks,
> >>>
> >>> Bernd
> >>
> >> --
> >> Lincoln D. Stein
> >> Director, Informatics and Biocomputing Platform
> >> Ontario Institute for Cancer Research
> >> 101 College St., Suite 800
> >> Toronto, ON, Canada M5G0A3
> >> 416 673-8514
> >> Assistant: Renata Musa <Renata.Musa at oicr.on.ca>


From ron at ron.dk  Wed Jun 10 03:35:09 2009
From: ron at ron.dk (Rasmus Ory Nielsen)
Date: Wed, 10 Jun 2009 09:35:09 +0200
Subject: [Bioperl-l] Bio::Restriction::Analysis. Exception when using rebase
	file.
Message-ID: <4A2F622D.5060500@ron.dk>

Hi,

This is my first time using bioperl for restriction analysis, so please bear 
with me, if this is a FAQ.

I downloaded withrefm.906 from ftp://ftp.neb.com/pub/rebase/ and created the 
script shown at the bottom of the mail.
My bioperl version is bioperl-live nightly from 09-Jun-2009.

The script throws an exception (see below). But if I comment out the 
'-enzymes' argument, so that it uses the built-in collection of enzymes, it works.

My problem is that I need to use some of the enzymes that are only available 
in REBASE. So how do I get this working?

Thanks for your attention.

Best regards,
Rasmus Ory Nielsen


############################################################
Output from the script:
############################################################

[roni at ksdhcp ~]$ ./restriction_test.pl

--------------------- WARNING ---------------------
MSG: The enzyme name CviKI-1 was changed to CviKI-I
---------------------------------------------------

------------- EXCEPTION -------------
MSG: Bad end parameter (11). End must be less than the total length of 
sequence (total=7)
STACK Bio::PrimarySeq::subseq 
/usr/local/lib/perl5/site_perl/5.10.0/Bio/PrimarySeq.pm:401
STACK Bio::Restriction::Analysis::_enzyme_sites 
/usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:900
STACK Bio::Restriction::Analysis::_cuts 
/usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:801
STACK Bio::Restriction::Analysis::cut 
/usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:379
STACK Bio::Restriction::Analysis::fragment_maps 
/usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:515
STACK toplevel ./restriction_test.pl:30
-------------------------------------

[roni at ksdhcp ~]$


############################################################
Output from the script with the '-enzymes' argument commented out
############################################################

[roni at ksdhcp ~]$ ./restriction_test.pl

--------------------- WARNING ---------------------
MSG: The enzyme name CviKI-1 was changed to CviKI-I
---------------------------------------------------
$VAR1 = [
           {
             'seq' => 'CTCGACCGTTAGCAA',
             'end' => 15,
             'start' => '1'
           },
           {
             'seq' => 'AGCTTTCTACCGTTATCGT',
             'end' => 34,
             'start' => '16'
           }
         ];
[roni at ksdhcp ~]$

############################################################

#!/usr/bin/perl
use strict;
use warnings;
use Bio::PrimarySeq;
use Bio::Restriction::IO;
use Bio::Restriction::Analysis;
use Data::Dumper;

# create seq obj
my $seqobj = new Bio::PrimarySeq(
     -seq        => 'CTCGACCGTTAGCAAAGCTTTCTACCGTTATCGT',
     -primary_id => 'test',
     -molecule   => 'dna'
);

# read rebase file
my $rebase_io = Bio::Restriction::IO->new(
     -file   => 'withrefm.906',
     -format => 'withrefm',
);
my $rebase_collection = $rebase_io->read;

# start restriction analysis
my $restriction_analysis = Bio::Restriction::Analysis->new(
     -seq     => $seqobj,
     -enzymes => $rebase_collection,    # it works with this line commented out
);

# retrieve fragment maps
my @fragment_maps = $restriction_analysis->fragment_maps('HindIII');
print Dumper \@fragment_maps;

From awitney at sgul.ac.uk  Wed Jun 10 07:19:55 2009
From: awitney at sgul.ac.uk (Adam Witney)
Date: Wed, 10 Jun 2009 12:19:55 +0100
Subject: [Bioperl-l] EUtilities Cookbook example fails
Message-ID: <40B53E84-2EBC-471A-9261-CED1973C7A0C@sgul.ac.uk>

Hi,

I am going through the EUtilities Cookbook, but the last example (in  
section 2.3.1) fails with:

Can't use an undefined value as an ARRAY reference at /usr/lib/perl5/ 
site_perl/5.8.8/Bio/Tools/EUtilities/Link/LinkSet.pm line 470.

This is with BioPerl 1.6.0, perl v5.8.8

thanks for any help

adam

From hlapp at gmx.net  Wed Jun 10 08:08:54 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 10 Jun 2009 08:08:54 -0400
Subject: [Bioperl-l] announcing bioperl-max, a public AMI
In-Reply-To: <57C9F3FC7A4A42189608BD882C03DB9C@NewLife>
References: <57C9F3FC7A4A42189608BD882C03DB9C@NewLife>
Message-ID: <4B3BCEA2-DA96-46B5-9BA2-F4EDDACC3A96@gmx.net>

Very cool! -hilmar

On Jun 10, 2009, at 12:10 AM, Mark A. Jensen wrote:

> Hi All,
>
> I've built a public Amazon machine image, loaded with many many
> goodies, including the most recent (r15747) trunks of
> - bioperl-live
> - bioperl-run
> - bioperl-db/biosql
> The base machine is a public clean install of Ubuntu 8.03 Hardy/32-bit
> by Alestic. Many fave tools, including blast, hmmer, hyphy, phyml,
> emboss, and more are all there (and most even pass bioperl-run  
> tests), and
> perl is loaded with Moose, XML::LibXML, XML::Compile, Bio::Phylo
> (r1071) and others. This is *not* a lean mean fighting machine.
>
> Please give it a try if you're so inclined. Fuller details (including
> image id and ssh-rsa signature) are at http://fortinbras.us/bioperl-max 
> .
>
> Ping me if it doesn't work.
>
> Cheers,
> Mark

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at illinois.edu  Wed Jun 10 08:28:44 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 10 Jun 2009 07:28:44 -0500
Subject: [Bioperl-l] EUtilities Cookbook example fails
In-Reply-To: <40B53E84-2EBC-471A-9261-CED1973C7A0C@sgul.ac.uk>
References: <40B53E84-2EBC-471A-9261-CED1973C7A0C@sgul.ac.uk>
Message-ID: <1AB3CE57-4114-4440-870F-C15B39F42D77@illinois.edu>

I can reproduce that; I'll look into it.

chris

On Jun 10, 2009, at 6:19 AM, Adam Witney wrote:

> Hi,
>
> I am going through the EUtilities Cookbook, but the last example (in  
> section 2.3.1) fails with:
>
> Can't use an undefined value as an ARRAY reference at /usr/lib/perl5/ 
> site_perl/5.8.8/Bio/Tools/EUtilities/Link/LinkSet.pm line 470.
>
> This is with BioPerl 1.6.0, perl v5.8.8
>
> thanks for any help
>
> adam


From cjfields at illinois.edu  Wed Jun 10 09:20:43 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 10 Jun 2009 08:20:43 -0500
Subject: [Bioperl-l] fetch gene sequence with EUtilities.pm
In-Reply-To: <78771A58-8FCC-46ED-BE4F-8B0915BD324D@sgul.ac.uk>
References: <78771A58-8FCC-46ED-BE4F-8B0915BD324D@sgul.ac.uk>
Message-ID: <9B52E71A-3183-412A-81E7-803C28B8082E@illinois.edu>

EntrezGene doesn't contain the sequence information; I believe it just  
links to the sequence in a specified nuc record with given  
coordinates.  You can get to it, but it takes a little trickery; in  
essence you need to use the UID to get the gene summary information,  
extract that, then grab the sequence record using seqstart, seqend,  
and seqstrand.

A dump of esummary info for UID 18131, for instance, (using
$eutil->print_all) gives this info (abbreviated somewhat):

UID                 :18131
Name                :Notch3
Description         :Notch gene homolog 3 (Drosophila)
Orgname             :Mus musculus
...
GenomicInfo
     GenomicInfoType
         ChrLoc      :17
         ChrAccVer   :NC_000083.5
         ChrStart    :32303796
         ChrStop     :32257837
GeneWeight          :23049

The genomic info section gives the accession.version, start, end, and  
(implicitly) the strand (ChrStop is less than ChrStart). I have added  
an example to the cookbook:

http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook#How_do_I_retrieve_the_DNA_sequence_using_EntrezGene_IDs.3F
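
As a rough sketch of the "implicit strand" point, independent of BioPerl:
the helper name infer_strand is hypothetical, and the 1/2 values assume
the plus/minus convention of efetch's strand parameter. The coordinates
are the Notch3 ones from the dump above, used exactly as reported.

```perl
use strict;
use warnings;

# Hypothetical helper: infer the strand from a ChrStart/ChrStop pair.
# Returns 1 (plus) when start <= stop and 2 (minus) otherwise, followed
# by the two coordinates ordered from low to high.
sub infer_strand {
    my ($chr_start, $chr_stop) = @_;
    my $strand = $chr_start > $chr_stop ? 2 : 1;
    my ($low, $high) = sort { $a <=> $b } ($chr_start, $chr_stop);
    return ($strand, $low, $high);
}

# Notch3 (UID 18131): ChrStop is smaller than ChrStart, so minus strand.
my ($strand, $low, $high) = infer_strand(32303796, 32257837);
print "strand=$strand range=$low..$high\n";
```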

chris

On Jun 9, 2009, at 6:20 AM, Adam Witney wrote:

> Hi,
>
> I have been experimenting with the Bio::DB::EUtilities module, with  
> help from the Cookbook. But I can't seem to figure out how to get  
> the DNA sequence of a gene; all the examples seem to be fetching  
> protein sequence.
>
> How would i go about fetching a sequence using an Entrez GeneID?
>
> thanks for any help
>
> adam


From cjfields at illinois.edu  Wed Jun 10 09:33:51 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 10 Jun 2009 08:33:51 -0500
Subject: [Bioperl-l] EUtilities Cookbook example fails
In-Reply-To: <98E59238-260F-49A1-BA78-51DF94FF55A8@sgul.ac.uk>
References: <40B53E84-2EBC-471A-9261-CED1973C7A0C@sgul.ac.uk>
	<1AB3CE57-4114-4440-870F-C15B39F42D77@illinois.edu>
	<98E59238-260F-49A1-BA78-51DF94FF55A8@sgul.ac.uk>
Message-ID: <10B8484F-AE84-4E0A-964F-0DC964F5156C@illinois.edu>

Adam,

Okay, fixed that and the previous issue with 'use an undefined value  
as an ARRAY reference'.  The previous issue appears to be due to a  
change in the XML output from NCBI (it used to give the IDs at one  
point).  Also made the wiki changes for this; didn't take long to find  
everything.

Thanks for pointing that out!  If you find any more issues feel free  
to make the necessary changes on the wiki or point them out if they're  
in code.

chris

On Jun 10, 2009, at 8:12 AM, Adam Witney wrote:

> Hi Chris,
>
> not sure if I should start a new thread for this, but it is related  
> to the EUtilities Cookbook and LinkSet.pm.
>
> There are several references in the Cookbook to the method  
> "get_linkname", however this seems to have changed in the recent  
> version of LinkSet.pm to "get_link_name". But one reference to the  
> old method name still exists in LinkSet.pm, as shown by this patch:
>
> --- /usr/local/lib/perl5/site_perl/5.8.9/Bio/Tools/EUtilities/Link/LinkSet.pm	2009-02-20 12:36:37.000000000 +0000
> +++ /Users/adam/Desktop/LinkSet.pm	2009-06-10 13:58:49.000000000 +0100
> @@ -220,7 +220,7 @@
> =cut
>
> sub get_link_name {
> -    return ($_[0]->get_linknames)[0];
> +    return ($_[0]->get_link_names)[0];
> }
>
> =head2 get_submitted_ids
>
> If i haven't got this all wrong entirely, I could go through and fix  
> the Cookbook entries if that was useful?
>
> adam


From awitney at sgul.ac.uk  Wed Jun 10 09:12:05 2009
From: awitney at sgul.ac.uk (Adam Witney)
Date: Wed, 10 Jun 2009 14:12:05 +0100
Subject: [Bioperl-l] EUtilities Cookbook example fails
In-Reply-To: <1AB3CE57-4114-4440-870F-C15B39F42D77@illinois.edu>
References: <40B53E84-2EBC-471A-9261-CED1973C7A0C@sgul.ac.uk>
	<1AB3CE57-4114-4440-870F-C15B39F42D77@illinois.edu>
Message-ID: <98E59238-260F-49A1-BA78-51DF94FF55A8@sgul.ac.uk>


Hi Chris,

not sure if I should start a new thread for this, but it is related to  
the EUtilities Cookbook and LinkSet.pm.

There are several references in the Cookbook to the method  
"get_linkname", however this seems to have changed in the recent  
version of LinkSet.pm to "get_link_name". But one reference to the old  
method name still exists in LinkSet.pm, as shown by this patch:

--- /usr/local/lib/perl5/site_perl/5.8.9/Bio/Tools/EUtilities/Link/LinkSet.pm	2009-02-20 12:36:37.000000000 +0000
+++ /Users/adam/Desktop/LinkSet.pm	2009-06-10 13:58:49.000000000 +0100
@@ -220,7 +220,7 @@
  =cut

  sub get_link_name {
-    return ($_[0]->get_linknames)[0];
+    return ($_[0]->get_link_names)[0];
  }

  =head2 get_submitted_ids

If I haven't got this entirely wrong, I could go through and fix  
the Cookbook entries, if that would be useful?

adam


On 10 Jun 2009, at 13:28, Chris Fields wrote:

> I can reproduce that; I'll look into it.
>
> chris


From awitney at sgul.ac.uk  Wed Jun 10 10:10:21 2009
From: awitney at sgul.ac.uk (Adam Witney)
Date: Wed, 10 Jun 2009 15:10:21 +0100
Subject: [Bioperl-l] fetch gene sequence with EUtilities.pm
In-Reply-To: <9B52E71A-3183-412A-81E7-803C28B8082E@illinois.edu>
References: <78771A58-8FCC-46ED-BE4F-8B0915BD324D@sgul.ac.uk>
	<9B52E71A-3183-412A-81E7-803C28B8082E@illinois.edu>
Message-ID: <B41AD5E8-138D-4B8D-AD4C-7DEA230FD46F@sgul.ac.uk>


Thanks for the pointers Chris.

The new example on the Cookbook doesn't quite work for me as ChrStart  
seems to appear in the DocSum twice, thus  
get_contents_by_name('ChrStart') returns a list of two values (which  
writes the second ChrStart into $end). Also the $start and $end seem  
to be out by 1, so I needed to change it to this:

my ($acc) = ($docsum->get_contents_by_name('ChrAccVer'));
my ($start) = ($docsum->get_contents_by_name('ChrStart'));
my ($end) = ($docsum->get_contents_by_name('ChrStop'));

  $start += 1;
  $end += 1;
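
To show why the one-name-per-assignment form matters, here is a
stand-alone sketch with no BioPerl involved. contents_by_name is a
hypothetical stand-in for get_contents_by_name, the coordinates are
made-up numbers, and the "flat list" form is only my guess at the kind of
assignment that gets tripped up by the duplicate ChrStart.

```perl
use strict;
use warnings;

# Made-up data mimicking a DocSum in which ChrStart occurs twice.
my %docsum = (
    ChrStart => [100, 100],
    ChrStop  => [250],
);

# Hypothetical stand-in for get_contents_by_name: returns every value
# stored under the given name, as a list.
sub contents_by_name { return @{ $docsum{ $_[0] } || [] } }

# One flat list: the second ChrStart ends up in $end.
my ($start, $end) = (contents_by_name('ChrStart'), contents_by_name('ChrStop'));
print "flat list: start=$start end=$end\n";

# One list assignment per name: only the first value of each is kept.
my ($start_ok) = contents_by_name('ChrStart');
my ($end_ok)   = contents_by_name('ChrStop');
print "per name:  start=$start_ok end=$end_ok\n";
```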

Ah, looking at this further there appears to be something going on in  
the response from Entrez. Compare these two gene records:

http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=gene&id=18131    (your example below)
http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=gene&id=2861733  (my gene)

In both cases you can see that ChrStart appears twice, once as part of  
the GenomicInfo list and once on its own at the bottom. In my example  
above the two ChrStart values match, but in the Notch3 example you  
posted the 2nd ChrStart seems to be the same as the ChrStop in the  
GenomicInfo list. Do you know if the second ChrStart has a separate  
meaning?

I guess in the Cookbook example we would need to make sure that the  
get_contents_by_name('ChrStart') picks up the value from the  
GenomicInfo list, is this possible?

thanks again

adam


On 10 Jun 2009, at 14:20, Chris Fields wrote:

> EntrezGene doesn't contain the sequence information; I believe it  
> just links to the sequence in a specified nuc record with given  
> coordinates.  You can get to it, but it takes a little trickery; in  
> essence you need to use the UID to get the gene summary information,  
> extract that, then grab the sequence record using seqstart, seqend,  
> and seqstrand.
>
> A dump of esummary info for UID 18131, for instance, (using
> $eutil->print_all) gives this info (abbreviated somewhat):
>
> UID                 :18131
> Name                :Notch3
> Description         :Notch gene homolog 3 (Drosophila)
> Orgname             :Mus musculus
> ...
> GenomicInfo
>    GenomicInfoType
>        ChrLoc      :17
>        ChrAccVer   :NC_000083.5
>        ChrStart    :32303796
>        ChrStop     :32257837
> GeneWeight          :23049
>
> The genomic info section gives the accession.version, start, end,  
> and (implicitly) the strand (ChrStop is less than ChrStart). I have  
> added an example to the cookbook:
>
> http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook#How_do_I_retrieve_the_DNA_sequence_using_EntrezGene_IDs.3F
>
> chris


From cjfields at illinois.edu  Wed Jun 10 13:56:46 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 10 Jun 2009 12:56:46 -0500
Subject: [Bioperl-l] fetch gene sequence with EUtilities.pm
In-Reply-To: <B41AD5E8-138D-4B8D-AD4C-7DEA230FD46F@sgul.ac.uk>
References: <78771A58-8FCC-46ED-BE4F-8B0915BD324D@sgul.ac.uk>
	<9B52E71A-3183-412A-81E7-803C28B8082E@illinois.edu>
	<B41AD5E8-138D-4B8D-AD4C-7DEA230FD46F@sgul.ac.uk>
Message-ID: <CD8513A6-0872-4174-9333-94D76D5711F8@illinois.edu>

Adam,

That's really odd that they do that (both the duplication of ChrStart
and the coordinates being off by one, which suggests they are 0-based).
It's possible that the second ChrStart is meant to represent the actual
first base of the gene irrespective of start/end.  My example is on the
opposite strand, so the second ChrStart == end.

The fact that they use the same element name is slightly annoying (and  
seemingly redundant), but there is a workaround.  We grab only the  
layered information specifically; in this case we want everything  
below 'GenomicInfoType':

GenomicInfo
     GenomicInfoType
         ChrLoc      :17
         ChrAccVer   :NC_000083.5
         ChrStart    :32303796
         ChrStop     :32257837

So, we can do this in the DocSum loop (that appears to work for your  
example):

############################

while (my $docsum = $eutil->next_DocSum) {
     # to ensure we grab the right ChrStart information, we grab the Item
     # above it in the Item hierarchy (visible via print_all from the
     # eutil instance)
     my ($item) = $docsum->get_Items_by_name('GenomicInfoType');

     my %item_data = map {$_ => 0} qw(ChrAccVer ChrStart ChrStop);

     while (my $sub_item = $item->next_subItem) {
         if (exists $item_data{$sub_item->get_name}) {
             $item_data{$sub_item->get_name} = $sub_item->get_content;
         }
     }
     # check to make sure everything is set
     for my $check (qw(ChrAccVer ChrStart ChrStop)) {
         die "$check not set" unless $item_data{$check};
     }

     my $strand = $item_data{ChrStart} > $item_data{ChrStop} ? 2 : 1;
     $fetcher->set_parameters(-id => $item_data{ChrAccVer},
                              -seq_start => $item_data{ChrStart} + 1,
                              -seq_stop  => $item_data{ChrStop} + 1,
                              -strand    => $strand);
     print $fetcher->get_Response->content;
}

############################

That's to retain compatibility with 1.6; I'll update the wiki.  I can  
add some common Item container methods to grab information for any  
Items contained in the current instance (be it a DocSum or another  
Item).  I'll add that in bioperl-live.

chris

On Jun 10, 2009, at 9:10 AM, Adam Witney wrote:

> Thanks for the pointers Chris.
>
> The new example on the Cookbook doesn't quite work for me as  
> ChrStart seems to appear in the DocSum twice, thus  
> get_contents_by_name('ChrStart') returns a list of two values (which  
> writes the second ChrStart into $end). Also the $start and $end seem  
> to be out by 1, so I needed to change it to this:
>
> my ($acc) = ($docsum->get_contents_by_name('ChrAccVer'));
> my ($start) = ($docsum->get_contents_by_name('ChrStart'));
> my ($end) = ($docsum->get_contents_by_name('ChrStop'));
>
> $start += 1;
> $end += 1;
>
> Ah, looking at this further there appears to be something going on  
> in the response from Entrez. Compare these two gene records:
>
> http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=gene&id=18131 
> 		(your example below)
> http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=gene&id=2861733 
> 		(my gene)
>
> In both cases you can see that ChrStart appears twice, once as part  
> of the GenomicInfo list and once on its own at the bottom. In my  
> example above the two ChrStart values match, but in the Notch3  
> example you posted the 2nd ChrStart seems to be the same as the  
> ChrStop in the GenomicInfo list. Do you know if the second ChrStart  
> has a separate meaning?
>
> I guess in the Cookbook example we would need to make sure that the  
> get_contents_by_name('ChrStart') picks up the value from the  
> GenomicInfo list, is this possible?
>
> thanks again
>
> adam


From maj at fortinbras.us  Thu Jun 11 07:36:40 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 11 Jun 2009 07:36:40 -0400
Subject: [Bioperl-l] 1.6 on doc.bioperl.org?
Message-ID: <17AD00895AFD43E1A1436D1065092BAC@NewLife>

Hi Chris and list-
Will documentation for release 1.6 be available in pdoc on doc.bioperl.org?
I notice also that autogenerated documentation for bioperl-live doesn't contain
new modules (or HIVQuery & Tiling, anyway ;) )--
cheers, Mark

From maj at fortinbras.us  Thu Jun 11 09:17:25 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 11 Jun 2009 09:17:25 -0400
Subject: [Bioperl-l] Bio::Restriction::Analysis. Exception when using
	rebasefile.
In-Reply-To: <4A2F622D.5060500@ron.dk>
References: <4A2F622D.5060500@ron.dk>
Message-ID: <2F52B1CED1374763822BF3AD1D283B3B@NewLife>

Rasmus et al-

This looks like a bug. A quick debug shows it's barfing on 'AarI' (as it
cycles through all enzymes, apparently creating a global cut map). AarI
has a recognition sequence of

CACCTGC (in $enz->seq->seq)

but a cut site of

CACCTGCNNNN^ (in $enz->seq->site)

The bad parm '11' refers to the end of the cut site sequence, but the routine
B:R:Analysis::_cuts is attempting to split the 7-symbol recognition sequence,
and so throws.

This surprises me. Core, let me know if you want me to take this on, or
if the module author can fix it quicker.
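
For the record, the mismatch can be checked without BioPerl at all. A
minimal sketch, assuming the two strings above are exactly what the
enzyme object holds; cut_position is a hypothetical helper, not a
Bio::Restriction method.

```perl
use strict;
use warnings;

# Hypothetical helper: position of the '^' cut marker in a site string,
# counted as the number of symbols to its left (undef if there is none).
sub cut_position {
    my ($site) = @_;
    my $pos = index($site, '^');
    return $pos < 0 ? undef : $pos;
}

my $recognition = 'CACCTGC';         # as seen in $enz->seq->seq
my $site        = 'CACCTGCNNNN^';    # as seen in $enz->seq->site

my $cut = cut_position($site);
my $len = length $recognition;
print "cut after symbol $cut, recognition length $len\n";
print "cut lies outside the recognition sequence\n" if $cut > $len;
```

The two numbers it reports, 11 and 7, are the same pair that shows up in
the exception message.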

cheers,
Mark

----- Original Message ----- 
From: "Rasmus Ory Nielsen" <ron at ron.dk>
To: <bioperl-l at lists.open-bio.org>
Sent: Wednesday, June 10, 2009 3:35 AM
Subject: [Bioperl-l] Bio::Restriction::Analysis. Exception when using 
rebasefile.


> Hi,
>
> This is my first time using bioperl for restriction analysis, so please bear 
> with me, if this is a FAQ.
>
> I downloaded withrefm.906 from ftp://ftp.neb.com/pub/rebase/ and created the 
> script shown at the bottom of the mail.
> My bioperl version is bioperl-live nightly from 09-Jun-2009.
>
> The scripts throws an exception - see below. But, if I comment out the 
> '-enzymes' argument, so it uses the built-in collection of enzymes, it works.
>
> My problem is, that I need to use some of the enzymes that are only available 
> in rebase. So how do I get this working?
>
> Thanks for your attention.
>
> Best regards,
> Rasmus Ory Nielsen
>
>
> ############################################################
> Output from the script:
> ############################################################
>
> [roni at ksdhcp ~]$ ./restriction_test.pl
>
> --------------------- WARNING ---------------------
> MSG: The enzyme name CviKI-1 was changed to CviKI-I
> ---------------------------------------------------
>
> ------------- EXCEPTION -------------
> MSG: Bad end parameter (11). End must be less than the total length of sequence (total=7)
> STACK Bio::PrimarySeq::subseq /usr/local/lib/perl5/site_perl/5.10.0/Bio/PrimarySeq.pm:401
> STACK Bio::Restriction::Analysis::_enzyme_sites /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:900
> STACK Bio::Restriction::Analysis::_cuts /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:801
> STACK Bio::Restriction::Analysis::cut /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:379
> STACK Bio::Restriction::Analysis::fragment_maps /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:515
> STACK toplevel ./restriction_test.pl:30
> -------------------------------------
>
> [roni at ksdhcp ~]$
>
>
> ############################################################
> Output from the script with the '-enzymes' argument commented out
> ############################################################
>
> [roni at ksdhcp ~]$ ./restriction_test.pl
>
> --------------------- WARNING ---------------------
> MSG: The enzyme name CviKI-1 was changed to CviKI-I
> ---------------------------------------------------
> $VAR1 = [
>           {
>             'seq' => 'CTCGACCGTTAGCAA',
>             'end' => 15,
>             'start' => '1'
>           },
>           {
>             'seq' => 'AGCTTTCTACCGTTATCGT',
>             'end' => 34,
>             'start' => '16'
>           }
>         ];
> [roni at ksdhcp ~]$
>
> ############################################################
>
> #!/usr/bin/perl
> use strict;
> use warnings;
> use Bio::PrimarySeq;
> use Bio::Restriction::IO;
> use Bio::Restriction::Analysis;
> use Data::Dumper;
>
> # create seq obj
> my $seqobj = new Bio::PrimarySeq(
>     -seq        => 'CTCGACCGTTAGCAAAGCTTTCTACCGTTATCGT',
>     -primary_id => 'test',
>     -molecule   => 'dna'
> );
>
> # read rebase file
> my $rebase_io = Bio::Restriction::IO->new(
>     -file   => 'withrefm.906',
>     -format => 'withrefm',
> );
> my $rebase_collection = $rebase_io->read;
>
> # start restriction analysis
> my $restriction_analysis = Bio::Restriction::Analysis->new(
>     -seq     => $seqobj,
>     -enzymes => $rebase_collection,    # it works with this line commented out
> );
>
> # retrieve fragment maps
> my @fragment_maps = $restriction_analysis->fragment_maps('HindIII');
> print Dumper \@fragment_maps;
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> 


From cjfields at illinois.edu  Thu Jun 11 10:19:51 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 11 Jun 2009 09:19:51 -0500
Subject: [Bioperl-l] Bio::Restriction::Analysis. Exception when using
	rebasefile.
In-Reply-To: <2F52B1CED1374763822BF3AD1D283B3B@NewLife>
References: <4A2F622D.5060500@ron.dk>
	<2F52B1CED1374763822BF3AD1D283B3B@NewLife>
Message-ID: <0F0E9DCC-BE3A-4918-8226-B9988B022FEB@illinois.edu>

Mark,

Feel free to take it up.  It's probably a good idea to start a bug  
report for tracking if it proves to be thornier to fix than expected.

chris



From maj at fortinbras.us  Thu Jun 11 10:26:19 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 11 Jun 2009 10:26:19 -0400
Subject: [Bioperl-l] Bio::Restriction::Analysis. Exception when using
	rebasefile.
In-Reply-To: <0F0E9DCC-BE3A-4918-8226-B9988B022FEB@illinois.edu>
References: <4A2F622D.5060500@ron.dk>
	<2F52B1CED1374763822BF3AD1D283B3B@NewLife>
	<0F0E9DCC-BE3A-4918-8226-B9988B022FEB@illinois.edu>
Message-ID: <CD6C392C39CD4287B3619FCDBC1D19CF@NewLife>

All-righty-- thanks MAJ


From mauricio at open-bio.org  Thu Jun 11 12:46:35 2009
From: mauricio at open-bio.org (Mauricio Herrera Cuadra)
Date: Thu, 11 Jun 2009 11:46:35 -0500
Subject: [Bioperl-l] 1.6 on doc.bioperl.org?
In-Reply-To: <17AD00895AFD43E1A1436D1065092BAC@NewLife>
References: <17AD00895AFD43E1A1436D1065092BAC@NewLife>
Message-ID: <4A3134EB.4080702@open-bio.org>

Hi Mark,

I'll take a look into this sometime between today and tomorrow. Will 
keep you posted. Thanks for the heads up :)

Mauricio.


Mark A. Jensen wrote:
> Hi Chris and list-
> Will documentation for release 1.6 be available in pdoc on doc.bioperl.org?
> I notice also that autogenerated documentation for bioperl-live doesn't contain
> new modules (or HIVQuery & Tiling, anyway ;) )--
> cheers, Mark

From maj at fortinbras.us  Thu Jun 11 14:41:26 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 11 Jun 2009 14:41:26 -0400
Subject: [Bioperl-l] 1.6 on doc.bioperl.org?
In-Reply-To: <4A3134EB.4080702@open-bio.org>
References: <17AD00895AFD43E1A1436D1065092BAC@NewLife>
	<4A3134EB.4080702@open-bio.org>
Message-ID: <A53006055C854297AAA58F6650F4F867@NewLife>

cheers Mauricio! MAJ


From Xianjun.Dong at bccs.uib.no  Fri Jun 12 16:38:50 2009
From: Xianjun.Dong at bccs.uib.no (Xianjun Dong)
Date: Fri, 12 Jun 2009 22:38:50 +0200
Subject: [Bioperl-l] background layer is not supported in Bioperl 1.6 for
	Bio::Graphics::Glyph
Message-ID: <4A32BCDA.4080605@ii.uib.no>

Hi,

I am not sure whether this is the right place to ask for help.

I have been stuck on a problem for several days: I want to highlight
parts of regions in my track using a different background color. To do
that, I defined a glyph named "background", based on the
'Bio::Graphics::Glyph::generic' module, and overrode its
draw_component() method with code like this:

$gd->filledRectangle($left,0,$right,$gd->height,
                     $self->factory->translate_color($color));

# the full script is pasted at the end

This draws a rectangle with top=0 and bottom=$gd->height. I turned the
highlight regions into a list of features and added them with
add_track and -glyph=>'background' (see the script test.pl below). With
Bioperl 1.2.3 this works as I expect: it adds a colored block behind
all tracks in the panel (including the ruler arrow). You can see the
output image in the attached file "test.bioperl1.2.3.png".

Now the problem: when I switch to Bioperl 1.5 (or 1.6), it no longer
works as before. The highlight is drawn, but it shrinks to a low height
instead of covering all tracks in the panel. That output is attached as
"test.bioperl1.6.png".

I tried to work out the reason. The 'background' module is based on the
generic module, so what can cause the difference? Is it because
$gd->height is different, or because the tracks that follow the
'background' track cannot draw from the first position?

I could stick with Bioperl 1.2.3 to avoid the problem ("Smart person
solve problem, wise person avoid problem"...), but that raises another
problem: Bio::Graphics in Bioperl 1.2.3 does not support the
$panel->create_web_map() function. So I need a higher version if I want
to create a web map for my graphics, but then I have to give up the
highlighted background.

OK, this is long enough for my first submission here. I hope someone
can give me a clue.

Thanks in advance!

Xianjun


==================== test.pl =======================
#!/usr/bin/perl

use strict;
use lib "$ENV{HOME}/lib";

use Bio::Graphics;
use Bio::Graphics::Feature;
my $ftr = 'Bio::Graphics::Feature';

# processed_transcript
my $trans1 = $ftr->new(-start=>50,-end=>10,-name=>'ZK154.1',-type=>"3'-UTR");
my $trans2 = $ftr->new(-start=>100,-end=>50,-name=>'ZK154.2',-type=>'CDS');
my $trans3 = $ftr->new(-start=>350,-end=>225,-name=>'ZK154.3',-type=>'CDS',
                       -source=>'a');
my $trans4 = $ftr->new(-start=>650,-end=>500,-name=>'ZK154.3',-type=>'CDS',
                       -source=>'a');
my $trans5 = $ftr->new(-start=>700,-end=>650,-name=>'ZK154.3',-type=>"5'-UTR");
my $trans  = $ftr->new(-segments=>[$trans1,$trans2,$trans3,$trans4,$trans5]);

# highlight
my $trans31 = $ftr->new(-start=>240,-end=>450,-name=>'hightlight1',
                        -type=>'background',-source=>'a');
my $trans41 = $ftr->new(-start=>650,-end=>600,-name=>'hightlight2',
                        -type=>'multihourglass',-source=>'b');

my $panel = Bio::Graphics::Panel->new(-width=>1200,
                                      -length=>1050,
                                      -start =>0,
                                      -pad_left=>12,
                                      -pad_right=>12);

# the following track works as I expected in bioperl 1.2.3, but not in
# 1.5 and 1.6
$panel->add_track([$trans41,$trans31],
                  -glyph   => 'background',
                  -block_bgcolor => sub{return (shift->source eq 'a')?'#cccccc':'#fffc22'},
                  );

$panel->add_track($ftr->new(-start=>100,-end=>1000),
                  -glyph=>'arrow',
                  -double=>1,
                  -tick=>2);

$panel->add_track($trans,
                  -glyph   => 'transcript2', # 'transcript2', #process_5utr',
                  -fgcolor => 'darkred',
                  -bgcolor => 'darkred',
                  -title => '$source',
                  -link => 'http://www.ensembl.org/Homo_sapiens/transview?transcript=$name',  #EnsEMBL
                  );

print $panel->png;

# the following part works in bioperl 1.5 and 1.6, but does not work in
# Bioperl 1.2.3
my $map = $panel->create_web_map("image");
$panel->finished();

1;

==================== background.pm =======================
package Bio::Graphics::Glyph::background;
 
use strict;
use base 'Bio::Graphics::Glyph::generic';
sub pad_top{
  return 0;
}

sub draw_component {
  my $self = shift;
  #$self->SUPER::draw_component(@_);
  my ($gd,$dx,$dy) = @_;
  my ($left,$top,$right,$bottom) = $self->bounds($dx,$dy);
 
  # fill the highlighted region from the top of the image (y=0) to its bottom
  my $color = $self->option('block_bgcolor') || '#cccccc';
  $gd->filledRectangle($left,0,$right,$gd->height,
                       $self->factory->translate_color($color));
}
 
1;

-- 
==========================================
Xianjun Dong
PhD student, Lenhard group
Computational Biology Unit
Bergen Center for Computational Science
University of Bergen
Hoyteknologisenteret, Thormohlensgate 55
N-5008 Bergen, Norway
E-mail: xianjun.dong at bccs.uib.no
Tel.: +47 555 84022
Fax : +47 555 84295
==========================================

-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.bioperl1.2.3.png
Type: image/png
Size: 2789 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20090612/9cdc621a/attachment.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.bioperl1.6.png
Type: image/png
Size: 2365 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20090612/9cdc621a/attachment-0001.png>

From scott at scottcain.net  Fri Jun 12 21:29:09 2009
From: scott at scottcain.net (Scott Cain)
Date: Fri, 12 Jun 2009 21:29:09 -0400
Subject: [Bioperl-l] background layer is not supported in Bioperl 1.6
	for Bio::Graphics::Glyph
In-Reply-To: <4A32BCDA.4080605@ii.uib.no>
References: <4A32BCDA.4080605@ii.uib.no>
Message-ID: <536f21b00906121829o7a5076f6hc57eb9a4dd815e48@mail.gmail.com>

Hello Xianjun,

I don't think that approach will work.  What you almost certainly need
to do is write a postgrid callback that draws the highlighted region.
For example code showing how to do this, take a look at the
make_postgrid_callback subroutine in GBrowse 1.69.  -postgrid is an
option of Bio::Graphics::Panel->new.
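
A minimal sketch of such a callback follows, offered under stated
assumptions rather than as a tested fix. The helper name
make_highlight_callback is hypothetical; the panel and GD calls it makes
(pad_left, bottom, location2pixel, translate_color, filledRectangle)
follow the pattern of GBrowse's hilite_regions_closure. Whether the
result spans all tracks under Bioperl 1.6 has not been verified here.

```perl
use strict;
use warnings;

# Hypothetical helper: builds a callback suitable for the -postgrid option
# of Bio::Graphics::Panel. Each region is [start, end, color] in sequence
# coordinates; the color defaults to 'lightgrey' when omitted.
sub make_highlight_callback {
    my @regions = @_;
    return sub {
        my ($gd, $panel) = @_;
        my $left   = $panel->pad_left;
        my $bottom = $panel->bottom;
        for my $r (@regions) {
            my ($start, $end, $color) = @$r;
            my ($x1, $x2) = $panel->location2pixel($start, $end);
            # Draw from y=0 so the block starts at the very top of the image.
            $gd->filledRectangle($left + $x1, 0, $left + $x2, $bottom,
                                 $panel->translate_color($color || 'lightgrey'));
        }
    };
}

# Usage sketch (requires Bio::Graphics; the regions are borrowed from the
# test.pl posted earlier in this thread):
#
# my $panel = Bio::Graphics::Panel->new(
#     -width     => 1200,
#     -length    => 1050,
#     -start     => 0,
#     -pad_left  => 12,
#     -pad_right => 12,
#     -postgrid  => make_highlight_callback([240, 450, '#cccccc'],
#                                           [600, 650, '#fffc22']),
# );
```

Because the callback is run by the panel itself instead of by a glyph,
the highlight does not need a track of its own.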

Scott



-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From scott at scottcain.net  Sat Jun 13 09:27:39 2009
From: scott at scottcain.net (Scott Cain)
Date: Sat, 13 Jun 2009 09:27:39 -0400
Subject: [Bioperl-l] background layer is not supported in Bioperl 1.6
	for Bio::Graphics::Glyph
In-Reply-To: <4A339621.2060702@ii.uib.no>
References: <4A32BCDA.4080605@ii.uib.no>
	<536f21b00906121829o7a5076f6hc57eb9a4dd815e48@mail.gmail.com>
	<4A339621.2060702@ii.uib.no>
Message-ID: <4536f7700906130627r41c90de5r311e58f1d718f9ee@mail.gmail.com>

Hi Xianjun,

I understand what you want to do, as the current version of gbrowse,
which uses bioperl 1.6, does this.  Without digging through the code,
I can't tell you exactly how this works, and you didn't send your code
that uses this callback, so I can't try it either.

One thing that is different between your code and gbrowse is that in
gbrowse each of the tracks is actually a separate panel (to allow track
dragging), so it is possible that this sort of callback doesn't work
for Bio::Graphics any more.

Scott

On Saturday, June 13, 2009, Xianjun Dong <Xianjun.Dong at bccs.uib.no> wrote:
> Hi, Scott
>
> Thanks for your reply.
>
> I still have a question. I dug the code out of GBrowse (pasted below). The make_postgrid_callback method collects all highlight regions and then uses the hilite_regions_closure function to draw them with the following GD call:
>
> $gd->filledRectangle($left+$start,0,$left+$end,$bottom,
>                      $panel->translate_color($h_color));
>
> where $bottom=$panel->bottom. This is the only difference from my code, where I use $gd->height. I guess they are almost the same (except for pad_bottom), as can be seen in the code at http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Graphics/Panel.html#CODE22
>
> Anyway, I changed my highlight regions to use $panel->bottom instead of $gd->height. The output is the same when using the Bioperl 1.6 (or 1.5) library. You can see the attached image ("test.bioperl1.6.png").
>
> I might not have explained my question clearly. It is this: with bioperl 1.2.3 (actually the Bio::Graphics in bioperl 1.2.3), I get the image I want (see the attached file "test.bioperl1.2.3.png"), where the highlighted range runs from the top of the panel to the bottom. In bioperl 1.5 (or 1.6), I only see the highlighted region in its own track, not across the whole panel. Is that clearer now? You can see the difference between the two images.
>
> [I am not sure whether the mailing list allows image attachments, so I have also put them at the following links:
> test.bioperl1.6.png:    http://translog.genereg.net/test.bioperl1.6.png
> test.bioperl1.2.3.png:    http://translog.genereg.net/test.bioperl1.2.3.png ]
>
> Could you test it and see the difference, if you have both 1.2.3 and 1.6 on your computer?
>
> I would really like to know how this works in bioperl 1.2.3 (even if it turns out to be a bug in that version).
>
> Thanks
>
> Xianjun
> =============================================
>
> # this generates the callback for highlighting a region
> sub make_postgrid_callback {
>   my $settings = shift;
>   return unless ref $settings->{h_region};
>
>   my @h_regions = map {
>     my ($h_ref,$h_start,$h_end,$h_color) = /^(.+):(\d+)\.\.(\d+)(?:@(\S+))?/;
>     defined($h_ref) && $h_ref eq $settings->{ref}
>                 ? [$h_start,$h_end,$h_color||'lightgrey']
>                 : ()
>   }
>     @{$settings->{h_region}};
>
>   return unless @h_regions;
>   return hilite_regions_closure(@h_regions);
> }
>
> # this subroutine generates a Bio::Graphics::Panel callback closure
> # suitable for hilighting a region of a panel.
> # The args are a list of [start,end,color]
> sub hilite_regions_closure {
>   my @h_regions = @_;
>
>   return sub {
>     my $gd     = shift;
>     my $panel  = shift;
>     my $left   = $panel->pad_left;
>     my $top    = $panel->top;
>     my $bottom = $panel->bottom;
>     for my $r (@h_regions) {
>       my ($h_start,$h_end,$h_color) = @$r;
>       my ($start,$end) = $panel->location2pixel($h_start,$h_end);
>       if ($end-$start <= 1) { $end++; $start-- } # so that we always see something
>       # assuming top is 0 so as to ignore top padding
>       $gd->filledRectangle($left+$start,0,$left+$end,$bottom,
>                            $panel->translate_color($h_color));
>     }
>   };
> }
>
>
> Scott Cain wrote:
>
> Hello Xianjun,
>
> I don't think that approach will work.  What you almost certainly need
> to do is a postgrid callback that does the drawing of the highlighted
> region.  For example code of how to do this, take a look at the
> make_postgrid_callback subroutine in GBrowse 1.69.  -postgrid is an
> option to Bio::Graphics::Panel->new.
>
> Scott
>
>
>
>
> On Fri, Jun 12, 2009 at 4:38 PM, Xianjun Dong<Xianjun.Dong at bccs.uib.no> wrote:
>
>
> HI,
>
> I am not sure this is the right place I can get help.
>
> I've suffered by a problem for several days: I want to highlight parts of
> regions in my track, using a different background color. To do that, I
> defined a glyph named "background", based on the
> 'Bio::Graphics::Glyph::generic' module. I override the draw_component()
> method, by adding code like below:
>
> $gd->filledRectangle($left,0,$right,$gd->height,
> $self->factory->translate_color($color));
>
> # the script is pasted at the end
>
> This will draw a rectangle with top=0, bottom=$gd->height. I made the
> highlight regions into a list of features, and add_track with
> -glyph=>'background'. (see the following script, test.pl) This really works
> as I expect, which will add a colored block at background of all tracks in a
> panel (including the ruler arrow). You can see the output image in attached
> file "test.bioperl1.2.3.png"
>
> Now, the problem comes: when I switch to Bioperl 1.5 (or 1.6), it does not
> work. Well, it works, but the highlight part only shrink to a low height,
> instead of covering all tracks in the panel. I also attached the output
> here, see the file "test.bioperl1.6.png".
>
> I tried to think about the reason, the 'background' module is based on the
> generic module. What can cause the difference? Is it because $gd->height is
> different, or the tracks followed with 'background' track can not draw from
> the first position?
>
> Well. I can stick to use Bioperl 1.2.3 to avoid the problem. ("Smart person
> solve problem, wise person avoid problem"...) But another problem is coming:
> Bio::Graphics in Bioperl 1.2.3 does not support $panel->create_web_map()
> function, which means I have to use some higher version if I want to create
> web map for my graphics, but then I have to give up using highlight
> background.
>
> OK. It's long enough for my first-time submission here. Hope someone can
> throw me some clue.
>
> Thanks ahead!!
>
> Xianjun
>
>
> ==================== test.pl =======================
> #!/usr/bin/perl
>
> use strict;
> use lib "$ENV{HOME}/lib";
>
> use Bio::Graphics;
> use Bio::Graphics::Feature;
> my $ftr= 'Bio::Graphics::Feature';
>
> # processed_transcript
> my $trans1 =
> $ftr->new(-start=>50,-end=>10,-name=>'ZK154.1',-type=>"3'-UTR");
> my $trans2 = $ftr->new(-start=>100,-end=>50,-name=>'ZK154.2',-type=>'CDS');
> my $trans3 = $ftr->new(-start=>350,-end=>225,-name=>'ZK154.3',-type=>'CDS',
> -source=>'a');
> my $trans4 = $ftr->new(-start=>650,-end=>500,-name=>'ZK154.3',-type=>'CDS',
> -source=>'a');
> my $trans5 =
> $ftr->new(-start=>700,-end=>650,-name=>'ZK154.3',-type=>"5'-UTR");
> my $trans  =
> $ftr->new(-segments=>[$trans1,$trans2,$trans3,$trans4,$trans5]);
>
> # hightlight
> my $trans31 =
> $ftr->new(-start=>240,-end=>450,-name=>'hightlight1',-type=>'background',
> -source=>'a');
> my $trans41 =
> $ftr->new(-start=>650,-end=>600,-name=>'hightlight2',-type=>'multihourglass',
> -source=>'b');
>
> my $panel= Bio::Graphics::Panel->new(-width=>1200,
>                                      -length=>1050,
>                                      -start =>0,
>                                      -pad_left=>12,
>                                      -pad_right=>12);
>
> # the following track works as I expected in bioperl 1.2.3, but not in 1.5
> # and 1.6
> $panel->add_track([$trans41,$trans31],
>                 -glyph   => 'background',
>                 -block_bgcolor => sub{return (shift->source eq 'a')?'#cccccc':'#fffc22'},
>                 );
>
> $panel->add_track($ftr->new(-start=>100,-end=>1000),
>                 -glyph=>'arrow',
>                 -double=>1,
>                 -tick=>2);
>
> $panel->add_track($trans,
>                 -glyph   => 'transcript2', # 'transcript2', #process_5utr',
>                 -fgcolor => 'darkred',
>                 -bgcolor => 'darkred',
>                 -title => '$source',
>                 -link => 'http://www.ensembl.org/Homo_sapiens/transview?transcript=$name',  #EnsEMBL
>                 );
> print $panel->png;
>
> # the following part works in bioperl 1.5 and 1.6, but not work in Bioperl
> # 1.2.3
> my $map = $panel->create_web_map("image");
> $panel->finished();
>
> 1;
>
> ==================== background.pm =======================
> package Bio::Graphics::Glyph::background;
>
> use strict;
> use base 'Bio::Graphics::Glyph::generic';
> sub pad_top{
>  return 0;
> }
>
> sub draw_component {
>  my $self = shift;
>  #$self->SUPER::draw_component(@_);
>  my ($gd,$dx,$dy) = @_;
>  my ($left,$top,$right,$bottom) = $self->bounds($dx,$dy);
>
>  # fill a rectangle spanning the full image height behind the feature
>  my $color = $self->option('block_bgcolor') || '#cccccc';
>  $gd->filledRectangle($left,0,$right,$gd->height,
>                       $self->factory->translate_color($color));
> }
>
> 1;
>
> --
> ==========================================
> Xianjun Dong
> PhD student, Lenhard group
> Computational Biology Unit
> Bergen Center for Computational Science
> University of Bergen
> Hoyteknologisenteret, Thormohlensgate 55
> N-5008 Bergen, Norway
> E-mail: xianjun.dong at bccs.uib.no
> Tel.: +47 555 84022
> Fax : +47 555 84295
> ==========================================
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From Xianjun.Dong at bccs.uib.no  Sat Jun 13 12:48:16 2009
From: Xianjun.Dong at bccs.uib.no (Xianjun Dong)
Date: Sat, 13 Jun 2009 18:48:16 +0200
Subject: [Bioperl-l] background layer is not supported in Bioperl 1.6
 for Bio::Graphics::Glyph
In-Reply-To: <4536f7700906130627r41c90de5r311e58f1d718f9ee@mail.gmail.com>
References: <4A32BCDA.4080605@ii.uib.no>	<536f21b00906121829o7a5076f6hc57eb9a4dd815e48@mail.gmail.com>	<4A339621.2060702@ii.uib.no>
	<4536f7700906130627r41c90de5r311e58f1d718f9ee@mail.gmail.com>
Message-ID: <4A33D850.1020203@ii.uib.no>

Hi, Scott

Before I give up on my own solution entirely and switch to GBrowse, I'd
like to bother you once more.

As you suggested, I now pass the -postgrid option when creating the panel,
so that a callback function draws the background. The code below is copied
almost verbatim from the online POD of Bio::Graphics::Panel (see
http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Graphics/Panel.html
)

But it still does not work. Could you have a look? I paste it below.
(BTW, on that POD page the option reads -postgrid=>\&draw_gap, while the
gap-drawing function is named gap_it, not draw_gap. I guess that is a typo,
or not?)

  my $panel = Bio::Graphics::Panel->new(-segment=>$segment,
                                        -grid=>1,
                                        -width=>600,
                                        -postgrid=> \&draw_gap);
  sub gap_it {
     my $gd    = shift;
     my $panel = shift;
     my ($gap_start,$gap_end) = $panel->location2pixel(500,600);
     my $top                  = $panel->top;
     my $bottom               = $panel->bottom;
     my $gray                 = $panel->translate_color('gray');
     $gd->filledRectangle($gap_start,$top,$gap_end,$bottom,$gray);
}

Thanks

Xianjun

-----------------------------------------------

#!/usr/bin/perl
 
use strict;
use lib "$ENV{HOME}/lib";
 
use Bio::Graphics;
use Bio::Graphics::Feature;
my $ftr= 'Bio::Graphics::Feature';
 
# processed_transcript
my $trans1 = 
$ftr->new(-start=>50,-end=>10,-name=>'ZK154.1',-type=>"3'-UTR");
my $trans2 = $ftr->new(-start=>100,-end=>50,-name=>'ZK154.2',-type=>'CDS');
my $trans3 = 
$ftr->new(-start=>350,-end=>225,-name=>'ZK154.3',-type=>'CDS', 
-source=>'a');
my $trans4 = 
$ftr->new(-start=>650,-end=>500,-name=>'ZK154.3',-type=>'CDS', 
-source=>'a');
my $trans5 = 
$ftr->new(-start=>700,-end=>650,-name=>'ZK154.3',-type=>"5'-UTR");
my $trans  = 
$ftr->new(-segments=>[$trans1,$trans2,$trans3,$trans4,$trans5]);

# highlight
my $trans31 = 
$ftr->new(-start=>240,-end=>450,-name=>'hightlight1',-type=>'background', 
-source=>'a');
my $trans41 = 
$ftr->new(-start=>650,-end=>600,-name=>'hightlight2',-type=>'multihourglass', 
-source=>'b');
 
my $panel= Bio::Graphics::Panel->new(-width=>1200,
                                             -length=>1050,
                                             -start =>0,
                                             -pad_left=>12,
                                             -pad_right=>12
                                             -postgrid=>\&gap_it);

sub gap_it {
     my $gd    = shift;
     my $panel = shift;
     my ($gap_start,$gap_end) = $panel->location2pixel(500,600);
     my $top                  = $panel->top;
     my $bottom               = $gd->height, #panel->bottom;
     my $gray                 = $panel->translate_color('red');
     $gd->filledRectangle($gap_start,$top,$gap_end,$bottom,$gray);
}
# the following track works as I expected in bioperl 1.2.3, but not in
# 1.5 and 1.6
#$panel->add_track([$trans41,$trans31],
#                  -glyph   => 'background',
#                  -block_bgcolor => sub{return (shift->source eq 'a')?'#cccccc':'#fffc22'},
#                  );

$panel->add_track($ftr->new(-start=>100,-end=>1000),
                  -glyph=>'arrow',
                  -double=>1,
                  -tick=>2);

$panel->add_track($trans,
                  -glyph   => 'transcript2', # 'transcript2', #process_5utr',
                  -fgcolor => 'darkred',
                  -bgcolor => 'darkred',
                  -title => '$source',
                  -link => 'http://www.ensembl.org/Homo_sapiens/transview?transcript=$name',  #EnsEMBL
                  );
   
print $panel->png;

# the following part works in bioperl 1.5 and 1.6, but does not work in
# Bioperl 1.2.3
my $map = $panel->create_web_map("image");
$panel->finished();
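
One detail in the script above may be worth checking before anything else: the argument list passed to Bio::Graphics::Panel->new has no comma between -pad_right=>12 and -postgrid=>\&gap_it. The sketch below uses only core Perl (no BioPerl) to show how Perl parses such a list. The names $callback, @without_comma and @with_comma are made up for the illustration, and the missing comma may not be the only reason the highlight does not appear.

```perl
use strict;
use warnings;

# Stand-in for \&gap_it; any code reference behaves the same way here.
my $callback = sub { return 'drawn' };

my (@without_comma, @with_comma);
{
    # Silence the "isn't numeric" warning for this demonstration only.
    no warnings 'numeric';

    # No comma after 12: Perl reads "12 - 'postgrid'", which is 12, so the
    # option name disappears and the code ref is left without a key.
    @without_comma = (-pad_right => 12
                      -postgrid  => $callback);
}

# With the comma, -postgrid and its callback arrive as a proper pair.
@with_comma = (-pad_right => 12,
               -postgrid  => $callback);

printf "without comma: %d elements (%s)\n",
    scalar(@without_comma),
    join(', ', map { ref($_) ? 'CODE ref' : $_ } @without_comma);
printf "with comma:    %d elements (%s)\n",
    scalar(@with_comma),
    join(', ', map { ref($_) ? 'CODE ref' : $_ } @with_comma);
```

Without the comma the list has three elements instead of four, so a constructor that turns its arguments into a hash would never see a -postgrid key. Adding `use warnings;` to the original script would likely have flagged the non-numeric subtraction.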


Scott Cain wrote:
> Hi Xianjun,
>
> I understand what you want to do, as the current version of gbrowse
> does this, which uses bioperl 1.6.  Without digging through the code,
> I can't tell you exactly how this works and you didn't send your code
> that uses this callback, so I can't try it either.
>
> One thing that is different between your code and gbrowse is that each
> of the tracks is actually a seperate panel (to allow track dragging),
> so it possible that this sort of callback doesn't work for
> Bio::Graphics any more.
>
> Scott
>

-- 
==========================================
Xianjun Dong
PhD student, Lenhard group
Computational Biology Unit
Bergen Center for Computational Science
University of Bergen
Hoyteknologisenteret, Thormohlensgate 55
N-5008 Bergen, Norway
E-mail: xianjun.dong at bccs.uib.no
Tel.: +47 555 84022
Fax : +47 555 84295
==========================================


From maj at fortinbras.us  Sun Jun 14 00:35:18 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Sun, 14 Jun 2009 00:35:18 -0400
Subject: [Bioperl-l] Bio::Restriction::Analysis. Exception when
	usingrebasefile.
In-Reply-To: <0F0E9DCC-BE3A-4918-8226-B9988B022FEB@illinois.edu>
References: <4A2F622D.5060500@ron.dk><2F52B1CED1374763822BF3AD1D283B3B@NewLife>
	<0F0E9DCC-BE3A-4918-8226-B9988B022FEB@illinois.edu>
Message-ID: <A9819F7FF3894C768CF89C36CB689942@NewLife>

All-

I'm finding this is requiring a pretty substantial refactor and
rationalization. I have opened a branch at
REPOS/bioperl-live/branches/restriction-refactor
and am making commits at will there (won't Rob be pleased!).
When it appears to be passing tests, I'll let Chris know (on list),
and he can decide on its mergability, and brave users could try
it out by downloading Bio/Restriction (deeply) via subversion.

My running commentary is at Bug #2855.
MAJ
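
For anyone following along, the mismatch behind the original exception (described in the quoted thread below) can be reproduced without BioPerl. This is only a sketch: checked_subseq is a hypothetical stand-in that mimics the kind of bounds check reported by Bio::PrimarySeq::subseq, not the library routine itself.

```perl
use strict;
use warnings;

# Values taken from the AarI case discussed in this thread.
my $recognition = 'CACCTGC';        # 7-symbol recognition sequence
my $site        = 'CACCTGCNNNN^';   # cut site; '^' marks the cut position

# Hypothetical helper: a subseq-like function with the same kind of bounds
# check that produced "Bad end parameter (11)".
sub checked_subseq {
    my ($seq, $start, $end) = @_;
    die sprintf("Bad end parameter (%d). End must be less than the total "
              . "length of sequence (total=%d)\n", $end, length($seq))
        if $end > length($seq);
    return substr($seq, $start - 1, $end - $start + 1);
}

# The cut falls after 11 symbols, counted on the site string.
my $cut = index($site, '^');

# Splitting the 7-symbol recognition sequence at position 11 has to fail.
my $left = eval { checked_subseq($recognition, 1, $cut) };
my $error = $@;
print "recognition sequence: ", ($error || "ok: $left\n");

# Splitting the site string (caret removed) at the same position works.
(my $site_seq = $site) =~ s/\^//;
print "site sequence: ", checked_subseq($site_seq, 1, $cut), "\n";
```

The point of the sketch is only that a cut offset measured on the site string cannot be applied to the shorter recognition sequence; how the refactored code resolves this is not shown here.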

----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "Mark A. Jensen" <maj at fortinbras.us>
Cc: <bioperl-l at lists.open-bio.org>; "Rasmus Ory Nielsen" <ron at ron.dk>
Sent: Thursday, June 11, 2009 10:19 AM
Subject: Re: [Bioperl-l] Bio::Restriction::Analysis. Exception when 
usingrebasefile.


> Mark,
>
> Feel free to take it up.  It's probably a good idea to start a bug  report for 
> tracking if it proves to be thornier to fix than expected.
>
> chris
>
> On Jun 11, 2009, at 8:17 AM, Mark A. Jensen wrote:
>
>> Rasmus et al-
>>
>> This looks like a bug. A quick debug shows it's barfing on  'AarI' (as it 
>> cycles through
>> all enzymes apparently creating a global cut map). AarI has a  recognition 
>> sequence of
>>
>> CACCTGC (in $enz->seq->seq)
>>
>> but a cut site of
>>
>> CACCTGCNNNN^ (in $enz->seq->site)
>>
>> The bad parm '11' refers to the end of the cut site sequence, but  the 
>> routine
>> B:R:Analysis::_cuts is attempting to split the 7-symbol recognition 
>> sequence,
>> and so throws.
>>
>> This surprises me. Core, let me know if you want me to take this on,  or
>> if the module author can fix it quicker.
>>
>> cheers,
>> Mark
>>
>> ----- Original Message ----- From: "Rasmus Ory Nielsen" <ron at ron.dk>
>> To: <bioperl-l at lists.open-bio.org>
>> Sent: Wednesday, June 10, 2009 3:35 AM
>> Subject: [Bioperl-l] Bio::Restriction::Analysis. Exception when  using 
>> rebasefile.
>>
>>
>>> Hi,
>>>
>>> This is my first time using bioperl for restriction analysis, so  please 
>>> bear with me, if this is a FAQ.
>>>
>>> I downloaded withrefm.906 from ftp://ftp.neb.com/pub/rebase/ and  created 
>>> the script shown at the bottom of the mail.
>>> My bioperl version is bioperl-live nightly from 09-Jun-2009.
>>>
>>> The scripts throws an exception - see below. But, if I comment out  the 
>>> '-enzymes' argument, so it uses the built-in collection of  enzymes, it 
>>> works.
>>>
>>> My problem is, that I need to use some of the enzymes that are only 
>>> available in rebase. So how do I get this working?
>>>
>>> Thanks for your attention.
>>>
>>> Best regards,
>>> Rasmus Ory Nielsen
>>>
>>>
>>> ############################################################
>>> Output from the script:
>>> ############################################################
>>>
>>> [roni at ksdhcp ~]$ ./restriction_test.pl
>>>
>>> --------------------- WARNING ---------------------
>>> MSG: The enzyme name CviKI-1 was changed to CviKI-I
>>> ---------------------------------------------------
>>>
>>> ------------- EXCEPTION -------------
>>> MSG: Bad end parameter (11). End must be less than the total length  of 
>>> sequence (total=7)
>>> STACK Bio::PrimarySeq::subseq /usr/local/lib/perl5/site_perl/5.10.0/ 
>>> Bio/PrimarySeq.pm:401
>>> STACK Bio::Restriction::Analysis::_enzyme_sites /usr/local/lib/ 
>>> perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:900
>>> STACK Bio::Restriction::Analysis::_cuts /usr/local/lib/perl5/ 
>>> site_perl/5.10.0/Bio/Restriction/Analysis.pm:801
>>> STACK Bio::Restriction::Analysis::cut /usr/local/lib/perl5/ 
>>> site_perl/5.10.0/Bio/Restriction/Analysis.pm:379
>>> STACK Bio::Restriction::Analysis::fragment_maps /usr/local/lib/ 
>>> perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:515
>>> STACK toplevel ./restriction_test.pl:30
>>> -------------------------------------
>>>
>>> [roni at ksdhcp ~]$
>>>
>>>
>>> ############################################################
>>> Output from the script with the '-enzymes' argument commented out
>>> ############################################################
>>>
>>> [roni at ksdhcp ~]$ ./restriction_test.pl
>>>
>>> --------------------- WARNING ---------------------
>>> MSG: The enzyme name CviKI-1 was changed to CviKI-I
>>> ---------------------------------------------------
>>> $VAR1 = [
>>>          {
>>>            'seq' => 'CTCGACCGTTAGCAA',
>>>            'end' => 15,
>>>            'start' => '1'
>>>          },
>>>          {
>>>            'seq' => 'AGCTTTCTACCGTTATCGT',
>>>            'end' => 34,
>>>            'start' => '16'
>>>          }
>>>        ];
>>> [roni at ksdhcp ~]$
>>>
>>> ############################################################
>>>
>>> #!/usr/bin/perl
>>> use strict;
>>> use warnings;
>>> use Bio::PrimarySeq;
>>> use Bio::Restriction::IO;
>>> use Bio::Restriction::Analysis;
>>> use Data::Dumper;
>>>
>>> # create seq obj
>>> my $seqobj = new Bio::PrimarySeq(
>>>    -seq        => 'CTCGACCGTTAGCAAAGCTTTCTACCGTTATCGT',
>>>    -primary_id => 'test',
>>>    -molecule   => 'dna'
>>> );
>>>
>>> # read rebase file
>>> my $rebase_io = Bio::Restriction::IO->new(
>>>    -file   => 'withrefm.906',
>>>    -format => 'withrefm',
>>> );
>>> my $rebase_collection = $rebase_io->read;
>>>
>>> # start restriction analysis
>>> my $restriction_analysis = Bio::Restriction::Analysis->new(
>>>    -seq     => $seqobj,
>>>    -enzymes => $rebase_collection,    # it works with this line  commented 
>>> out
>>> );
>>>
>>> # retrieve fragment maps
>>> my @fragment_maps = $restriction_analysis->fragment_maps('HindIII');
>>> print Dumper \@fragment_maps;


From rmb32 at cornell.edu  Sun Jun 14 21:57:45 2009
From: rmb32 at cornell.edu (Robert Buels)
Date: Sun, 14 Jun 2009 18:57:45 -0700
Subject: [Bioperl-l] Bio::Restriction::Analysis. Exception
	when	usingrebasefile.
In-Reply-To: <A9819F7FF3894C768CF89C36CB689942@NewLife>
References: <4A2F622D.5060500@ron.dk><2F52B1CED1374763822BF3AD1D283B3B@NewLife>	<0F0E9DCC-BE3A-4918-8226-B9988B022FEB@illinois.edu>
	<A9819F7FF3894C768CF89C36CB689942@NewLife>
Message-ID: <4A35AA99.2080305@cornell.edu>

Mark A. Jensen wrote:
> I'm finding this is requiring a pretty substantial refactor and
> rationalization. I have opened a branch at
> REPOS/bioperl-live/branches/restriction-refactor
> and am making commits at will there (won't Rob be pleased!).
Oh Mark, you are so agile!

> When it appears to be passing tests, I'll let Chris know (on list),
> and he can decide on its mergability, and brave users could try
> it out by downloading Bio/Restriction (deeply) via subversion.
If it's passing tests but still has bugs, make sure you add tests for 
the additional bugs you find!

Rob

From maj at fortinbras.us  Sun Jun 14 22:02:37 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Sun, 14 Jun 2009 22:02:37 -0400
Subject: [Bioperl-l] Bio::Restriction::Analysis.
	Exceptionwhen	usingrebasefile.
In-Reply-To: <4A35AA99.2080305@cornell.edu>
References: <4A2F622D.5060500@ron.dk><2F52B1CED1374763822BF3AD1D283B3B@NewLife>	<0F0E9DCC-BE3A-4918-8226-B9988B022FEB@illinois.edu><A9819F7FF3894C768CF89C36CB689942@NewLife>
	<4A35AA99.2080305@cornell.edu>
Message-ID: <FFDC29BB104149BE95840F1AD1B61827@NewLife>


----- Original Message ----- 
From: "Robert Buels" <rmb32 at cornell.edu>
To: "BioPerl List" <bioperl-l at lists.open-bio.org>
Sent: Sunday, June 14, 2009 9:57 PM
Subject: Re: [Bioperl-l] Bio::Restriction::Analysis. Exceptionwhen 
usingrebasefile.


> Mark A. Jensen wrote:
>> I'm finding this is requiring a pretty substantial refactor and
>> rationalization. I have opened a branch at
>> REPOS/bioperl-live/branches/restriction-refactor
>> and am making commits at will there (won't Rob be pleased!).
> Oh Mark, you are so agile!
ha!
>
>> When it appears to be passing tests, I'll let Chris know (on list),
>> and he can decide on its mergability, and brave users could try
>> it out by downloading Bio/Restriction (deeply) via subversion.
> If it's passing tests but still has bugs, make sure you add tests for the 
> additional bugs you find!

mais, bien sur; plenty new tests coming-- thanks Rob-
MAJ

>
> Rob


From shalabh.sharma7 at gmail.com  Mon Jun 15 16:06:31 2009
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Mon, 15 Jun 2009 16:06:31 -0400
Subject: [Bioperl-l] sub sampling
Message-ID: <9fcc48c70906151306k5df96b69k1f4d0f1466204c5a@mail.gmail.com>

Hi All,
I was just wondering whether there is any module in BioPerl
that does subsampling?
I have a file like this:

369859  0477    93
163417  1348    92
228122  0176    88
232792  0050    93
239636  1850    95
300069  0048    96
244108  0046    91
199087  0055    93
206209  0048    96
-              -         -
-              -         -

which contains around 100,000 lines, and I want to take a 25% sample
from this file. Is there any way I can do this in BioPerl?

Thanks
Shalabh

From maj at fortinbras.us  Mon Jun 15 19:49:58 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 15 Jun 2009 19:49:58 -0400
Subject: [Bioperl-l] Bio::Restriction refactor [Was:
	Bio::Restriction::Analysis. Exception when using rebasefile.]
In-Reply-To: <4A2F622D.5060500@ron.dk>
References: <4A2F622D.5060500@ron.dk>
Message-ID: <E80E6C1BC08D4E338739148BFE9BFAC0@NewLife>

Dear All,

The revamped Bio::Restriction::* in branch

REPOS/bioperl-live/branches/restriction-refactor

passes all existing tests, including those in t/Restriction.
New tests will be added within the next day or so.
The original bug occurred because only a subset of
the possible rebase withrefm-formatted enzymes were
handled; it choked on freshly-downloaded rebase
files because of this.

The refactored version now handles *all* rebase types,
including those of rebase forms

XXX^X                [ intrasite cutters, the main types
                               built in to base.pm]
XXXX(m/n)          [ right-end extrasite cutters ]
(s/t)XXXX            [ left-end ditto ]
(s/t)XXXX(m/n)    [ double-end ditto],

palindromic and non-palindromic, as well as multisite
enzymes that string together combinations of these
forms. Much rationalization (well, seems rational to me
anyway) and cruft removal in the affected code has also
occurred. itype2.pm has been updated as well, to
conform to the refactoring.
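
As a rough illustration of the four site forms listed above (this is
not the code in the branch; the name classify_site and the example
site strings are assumptions made up for this sketch), a plain-Perl
classifier might look something like:

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::Restriction: label a rebase-style
# site string by which of the four forms it matches.
sub classify_site {
    my ($site) = @_;
    my $cut = qr{\(-?\d+/-?\d+\)};   # an extrasite cut spec such as (4/5)
    my $rec = qr{[A-Za-z]+};         # the recognition sequence itself
    return 'double-end extrasite' if $site =~ /^$cut$rec$cut\z/;
    return 'left-end extrasite'   if $site =~ /^$cut$rec\z/;
    return 'right-end extrasite'  if $site =~ /^$rec$cut\z/;
    return 'intrasite'            if $site =~ /^[A-Za-z]*\^[A-Za-z]*\z/;
    return 'unknown';
}

# illustrative inputs only
print "$_ => ", classify_site($_), "\n"
    for ( 'A^AGCTT', 'GGATC(4/5)', '(8/13)GACGC',
          '(10/12)CGANNNNNNTGC(12/10)' );
```

Multisite enzymes, which string several of these forms together, are
deliberately left out of the sketch.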

If you're dying to try this now, get a working copy
of the branch like so

$ svn co svn://code.open-bio.org/bioperl/bioperl-live/branches/restriction-refactor bioperl-rr
$ cd bioperl-rr
$ perl Build.PL
$ ./Build test
$ ./Build install

This will only hammer your current installation in the
$SITE_LIB/Bio/Restriction path; I worked only on
a sparse checkout of the necessary files. To revert to your
old install, do

$ cd $MY_OLD_BIOPERL_WORKINGDIR
$ ./Build install

[In the possible event that these instructions are in error,
there will be a response on this list in a matter of
milliseconds, so stand by.]

Happy coding-
Mark


----- Original Message ----- 
From: "Rasmus Ory Nielsen" <ron at ron.dk>
To: <bioperl-l at lists.open-bio.org>
Sent: Wednesday, June 10, 2009 3:35 AM
Subject: [Bioperl-l] Bio::Restriction::Analysis. Exception when using 
rebasefile.


> Hi,
>
> This is my first time using bioperl for restriction analysis, so please bear 
> with me, if this is a FAQ.
>
> I downloaded withrefm.906 from ftp://ftp.neb.com/pub/rebase/ and created the 
> script shown at the bottom of the mail.
> My bioperl version is bioperl-live nightly from 09-Jun-2009.
>
> The scripts throws an exception - see below. But, if I comment out the 
> '-enzymes' argument, so it uses the built-in collection of enzymes, it works.
>
> My problem is, that I need to use some of the enzymes that are only available 
> in rebase. So how do I get this working?
>
> Thanks for your attention.
>
> Best regards,
> Rasmus Ory Nielsen
>
>
> ############################################################
> Output from the script:
> ############################################################
>
> [roni at ksdhcp ~]$ ./restriction_test.pl
>
> --------------------- WARNING ---------------------
> MSG: The enzyme name CviKI-1 was changed to CviKI-I
> ---------------------------------------------------
>
> ------------- EXCEPTION -------------
> MSG: Bad end parameter (11). End must be less than the total length of 
> sequence (total=7)
> STACK Bio::PrimarySeq::subseq 
> /usr/local/lib/perl5/site_perl/5.10.0/Bio/PrimarySeq.pm:401
> STACK Bio::Restriction::Analysis::_enzyme_sites 
> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:900
> STACK Bio::Restriction::Analysis::_cuts 
> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:801
> STACK Bio::Restriction::Analysis::cut 
> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:379
> STACK Bio::Restriction::Analysis::fragment_maps 
> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:515
> STACK toplevel ./restriction_test.pl:30
> -------------------------------------
>
> [roni at ksdhcp ~]$
>
>
> ############################################################
> Output from the script with the '-enzymes' argument commented out
> ############################################################
>
> [roni at ksdhcp ~]$ ./restriction_test.pl
>
> --------------------- WARNING ---------------------
> MSG: The enzyme name CviKI-1 was changed to CviKI-I
> ---------------------------------------------------
> $VAR1 = [
>           {
>             'seq' => 'CTCGACCGTTAGCAA',
>             'end' => 15,
>             'start' => '1'
>           },
>           {
>             'seq' => 'AGCTTTCTACCGTTATCGT',
>             'end' => 34,
>             'start' => '16'
>           }
>         ];
> [roni at ksdhcp ~]$
>
> ############################################################
>
> #!/usr/bin/perl
> use strict;
> use warnings;
> use Bio::PrimarySeq;
> use Bio::Restriction::IO;
> use Bio::Restriction::Analysis;
> use Data::Dumper;
>
> # create seq obj
> my $seqobj = new Bio::PrimarySeq(
>     -seq        => 'CTCGACCGTTAGCAAAGCTTTCTACCGTTATCGT',
>     -primary_id => 'test',
>     -molecule   => 'dna'
> );
>
> # read rebase file
> my $rebase_io = Bio::Restriction::IO->new(
>     -file   => 'withrefm.906',
>     -format => 'withrefm',
> );
> my $rebase_collection = $rebase_io->read;
>
> # start restriction analysis
> my $restriction_analysis = Bio::Restriction::Analysis->new(
>     -seq     => $seqobj,
>     -enzymes => $rebase_collection,    # it works with this line commented out
> );
>
> # retrieve fragment maps
> my @fragment_maps = $restriction_analysis->fragment_maps('HindIII');
> print Dumper \@fragment_maps;
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> 


From maj at fortinbras.us  Mon Jun 15 20:07:21 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 15 Jun 2009 20:07:21 -0400
Subject: [Bioperl-l] sub sampling
In-Reply-To: <9fcc48c70906151306k5df96b69k1f4d0f1466204c5a@mail.gmail.com>
References: <9fcc48c70906151306k5df96b69k1f4d0f1466204c5a@mail.gmail.com>
Message-ID: <A030148C139446DAB1DEE791A4EC2D3B@NewLife>

Shalabh
If you want to do sampling with replacement
this is not bad (if you trust rand() ):

 # open your file into $my_infile, then
 my @lines = <$my_infile>;

 my $num_samps = 10;
 my $sample_size_pc = 0.25;
 my @samples;

 # each sample is a list of random line indices, drawn with replacement
 for (1..$num_samps) {
    push @samples, [ map { int( @lines * rand ) }
                     ( 1..int($sample_size_pc * @lines) ) ];
 }

# now, do something, fr'instance (here: mean of column 3 per sample)
 my @sample_pc;
 foreach (@samples) {
    my $pct=0;
    foreach my $line (@lines[ @$_ ]) {
        my @a = split(/\s+/,$line);
        $pct += $a[2];
    }
    $pct /= @$_;
    push @sample_pc, $pct;
 }
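
If what is wanted is a single 25% sample with no line picked twice
(sampling without replacement), a minimal sketch using the core
List::Util module could be the following. The sub name subsample is
made up for the example, and it assumes the whole file fits in memory:

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Return a random subset of @$lines containing $fraction of them,
# each line chosen at most once (sampling without replacement).
sub subsample {
    my ($lines, $fraction) = @_;
    my $n = int( $fraction * @$lines );
    return () unless $n > 0;
    my @picked = ( shuffle 0 .. $#$lines )[ 0 .. $n - 1 ];
    # sort the indices so the sample keeps the original file order
    return @{$lines}[ sort { $a <=> $b } @picked ];
}

# usage, assuming the data file is named on the command line:
# my @lines = <>;
# print subsample( \@lines, 0.25 );
```

Shuffling the indices and keeping the first n is simple but touches
every line; for 100,000 lines that should not matter much.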

R's just better for some things, ain't it?
MAJ


----- Original Message ----- 
From: "shalabh sharma" <shalabh.sharma7 at gmail.com>
To: "bioperl-l" <Bioperl-l at lists.open-bio.org>
Sent: Monday, June 15, 2009 4:06 PM
Subject: [Bioperl-l] sub sampling


> Hi All,           I was just wondering that is there any module is bioperl
> that do subsampling?
> I have a file like this:
>
> 369859  0477    93
> 163417  1348    92
> 228122  0176    88
> 232792  0050    93
> 239636  1850    95
> 300069  0048    96
> 244108  0046    91
> 199087  0055    93
> 206209  0048    96
> -              -         -
> -              -         -
>
> which contain around 100,000 lines and i want to take out a sample of 25%
> from this file. Is there any way i can do this in Bioperl?
>
> Thanks
> Shalabh
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> 


From Xianjun.Dong at bccs.uib.no  Sat Jun 13 08:05:53 2009
From: Xianjun.Dong at bccs.uib.no (Xianjun Dong)
Date: Sat, 13 Jun 2009 14:05:53 +0200
Subject: [Bioperl-l] background layer is not supported in Bioperl 1.6
 for Bio::Graphics::Glyph
In-Reply-To: <536f21b00906121829o7a5076f6hc57eb9a4dd815e48@mail.gmail.com>
References: <4A32BCDA.4080605@ii.uib.no>
	<536f21b00906121829o7a5076f6hc57eb9a4dd815e48@mail.gmail.com>
Message-ID: <4A339621.2060702@ii.uib.no>

Hi, Scott

Thanks for your reply.

I still have a question. I dug the code out of GBrowse (pasted 
below). The make_postgrid_callback method collects all the highlight 
regions and then uses the hilite_regions_closure function to draw them, 
with the following GD call:

$gd->filledRectangle($left+$start,0,$left+$end,$bottom,
                           $panel->translate_color($h_color));

where $bottom=$panel->bottom. This is the only difference from my 
code, where I use $gd->height. I guess the two are almost the same (apart 
from pad_bottom), as can be seen in the code at 
http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Graphics/Panel.html#CODE22

Anyway, I changed my highlight regions to use $panel->bottom instead of 
$gd->height. The output is the same when using the Bioperl 1.6 (or 1.5) 
library; see the attached image ("test.bioperl1.6.png").

I may not have explained my question clearly, so here it is again: 
with bioperl 1.2.3 (actually the Bio::Graphics in bioperl 1.2.3), I 
get the image I want (see the attached file 
"test.bioperl1.2.3.png"), where the highlight range goes from the 
roof to the floor. With bioperl 1.5 (or 1.6), I can only see the 
highlight region in its own track, not across the whole panel. Is that 
clearer now? You can see the difference between the two images.

[I am not sure whether the mailing list allows image attachments, so I 
have also put them at the following links:
test.bioperl1.6.png:    http://translog.genereg.net/test.bioperl1.6.png
test.bioperl1.2.3.png:  http://translog.genereg.net/test.bioperl1.2.3.png ]

Could you test it and see the difference, if you have both 1.2.3 and 1.6 
on your computer?

I would really like to know how this works in bioperl 1.2.3 (even if it 
turns out to be a bug in that version).

Thanks

Xianjun
=============================================

# this generates the callback for highlighting a region
sub make_postgrid_callback {
  my $settings = shift;
  return unless ref $settings->{h_region};

  my @h_regions = map {
    my ($h_ref,$h_start,$h_end,$h_color) = 
/^(.+):(\d+)\.\.(\d+)(?:@(\S+))?/;
    defined($h_ref) && $h_ref eq $settings->{ref}
                 ? [$h_start,$h_end,$h_color||'lightgrey']
                 : ()
  }
    @{$settings->{h_region}};

  return unless @h_regions;
  return hilite_regions_closure(@h_regions);
}

# this subroutine generates a Bio::Graphics::Panel callback closure
# suitable for hilighting a region of a panel.
# The args are a list of [start,end,color]
sub hilite_regions_closure {
  my @h_regions = @_;

  return sub {
    my $gd     = shift;
    my $panel  = shift;
    my $left   = $panel->pad_left;
    my $top    = $panel->top;
    my $bottom = $panel->bottom;
    for my $r (@h_regions) {
      my ($h_start,$h_end,$h_color) = @$r;
      my ($start,$end) = $panel->location2pixel($h_start,$h_end);
      # so that we always see something
      if ($end-$start <= 1) { $end++; $start-- }
      # assuming top is 0 so as to ignore top padding
      $gd->filledRectangle($left+$start,0,$left+$end,$bottom,
                           $panel->translate_color($h_color));
    }
  };
}
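
To make the parsing step in make_postgrid_callback easier to try on its
own, here is a stand-alone sketch of just that part. The sub name
parse_h_regions and the sample strings in the usage comment are
assumptions for illustration; the regex is the one from the GBrowse
code above, with the @ escaped:

```perl
use strict;
use warnings;

# Hypothetical stand-alone version of the parsing step: turn strings like
# "ref:start..end@color" into [start, end, color] for one reference
# sequence, skipping entries for other references or malformed entries.
sub parse_h_regions {
    my ($ref, @specs) = @_;
    return map {
        my ($h_ref, $h_start, $h_end, $h_color) =
            /^(.+):(\d+)\.\.(\d+)(?:\@(\S+))?/;
        defined($h_ref) && $h_ref eq $ref
            ? [ $h_start, $h_end, $h_color || 'lightgrey' ]
            : ();
    } @specs;
}

# usage:
# my @regions = parse_h_regions( 'chr1', 'chr1:240..450@yellow',
#                                'chr1:600..650' );
# each element can then be handed to hilite_regions_closure(@regions)
```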


Scott Cain wrote:
> Hello Xianjun,
>
> I don't think that approach will work.  What you almost certainly need
> to do is a postgrid callback that does the drawing of the highlighted
> region.  For example code of how to do this, take a look at the
> make_postgrid_callback subroutine in GBrowse 1.69.  The option
> -postgrid is a method of Bio::Graphics::Panel.
>
> Scott
>
>
>
>
> On Fri, Jun 12, 2009 at 4:38 PM, Xianjun Dong<Xianjun.Dong at bccs.uib.no> wrote:
>   
>> HI,
>>
>> I am not sure this is the right place I can get help.
>>
>> I've suffered by a problem for several days: I want to highlight parts of
>> regions in my track, using a different background color. To do that, I
>> defined a glyph named "background", based on the
>> 'Bio::Graphics::Glyph::generic' module. I override the draw_component()
>> method, by adding code like below:
>>
>> $gd->filledRectangle($left,0,$right,$gd->height,
>> $self->factory->translate_color($color));
>>
>> # the script is pasted at the end
>>
>> This will draw a rectangle with top=0, bottom=$gd->height. I made the
>> highlight regions into a list of features, and add_track with
>> -glyph=>'background'. (see the following script, test.pl) This really works
>> as I expect, which will add a colored block at background of all tracks in a
>> panel (including the ruler arrow). You can see the output image in attached
>> file "test.bioperl1.2.3.png"
>>
>> Now, the problem comes: when I switch to Bioperl 1.5 (or 1.6), it does not
>> work. Well, it works, but the highlight part only shrink to a low height,
>> instead of covering all tracks in the panel. I also attached the output
>> here, see the file "test.bioperl1.6.png".
>>
>> I tried to think about the reason, the 'background' module is based on the
>> generic module. What can cause the difference? Is it because $gd->height is
>> different, or the tracks followed with 'background' track can not draw from
>> the first position?
>>
>> Well. I can stick to use Bioperl 1.2.3 to avoid the problem. ("Smart person
>> solve problem, wise person avoid problem"...) But another problem is coming:
>> Bio::Graphics in Bioperl 1.2.3 does not support $panel->create_web_map()
>> function, which means I have to use some higher version if I want to create
>> web map for my graphics, but then I have to give up using highlight
>> background.
>>
>> OK. It's long enough for my first-time submission here. Hope someone can
>> throw me some clue.
>>
>> Thanks ahead!!
>>
>> Xianjun
>>
>>
>> ==================== test.pl =======================
>> #!/usr/bin/perl
>>
>> use strict;
>> use lib "$ENV{HOME}/lib";
>>
>> use Bio::Graphics;
>> use Bio::Graphics::Feature;
>> my $ftr= 'Bio::Graphics::Feature';
>>
>> # processed_transcript
>> my $trans1 =
>> $ftr->new(-start=>50,-end=>10,-name=>'ZK154.1',-type=>"3'-UTR");
>> my $trans2 = $ftr->new(-start=>100,-end=>50,-name=>'ZK154.2',-type=>'CDS');
>> my $trans3 = $ftr->new(-start=>350,-end=>225,-name=>'ZK154.3',-type=>'CDS',
>> -source=>'a');
>> my $trans4 = $ftr->new(-start=>650,-end=>500,-name=>'ZK154.3',-type=>'CDS',
>> -source=>'a');
>> my $trans5 =
>> $ftr->new(-start=>700,-end=>650,-name=>'ZK154.3',-type=>"5'-UTR");
>> my $trans  =
>> $ftr->new(-segments=>[$trans1,$trans2,$trans3,$trans4,$trans5]);
>>
>> # hightlight
>> my $trans31 =
>> $ftr->new(-start=>240,-end=>450,-name=>'hightlight1',-type=>'background',
>> -source=>'a');
>> my $trans41 =
>> $ftr->new(-start=>650,-end=>600,-name=>'hightlight2',-type=>'multihourglass',
>> -source=>'b');
>>
>> my $panel= Bio::Graphics::Panel->new(-width=>1200,
>>                                            -length=>1050,
>>                                            -start =>0,
>>                                            -pad_left=>12,
>>                                            -pad_right=>12);
>>
>> # the following track works as I expected in bioperl 1.2.3, but not in 1.5
>> # and 1.6
>> $panel->add_track([$trans41,$trans31],
>>         -glyph   => 'background',
>>                 -block_bgcolor => sub{return (shift->source eq
>> 'a')?'#cccccc':'#fffc22'},
>>                 );
>>
>> $panel->add_track($ftr->new(-start=>100,-end=>1000),
>>                 -glyph=>'arrow',
>>                 -double=>1,
>>                 -tick=>2);
>>
>> $panel->add_track($trans,
>>         -glyph   => 'transcript2', # 'transcript2', #process_5utr',
>>                 -fgcolor => 'darkred',
>>                 -bgcolor => 'darkred',
>>                 -title => '$source',
>>                 -link =>
>> 'http://www.ensembl.org/Homo_sapiens/transview?transcript=$name',  #EnsEMBL
>>                 );
>>  print $panel->png;
>>
>> # the following part works in bioperl 1.5 and 1.6, but not work in Bioperl
>> # 1.2.3
>> my $map = $panel->create_web_map("image");
>> $panel->finished();
>>
>> 1;
>>
>> ==================== background.pm =======================
>> package Bio::Graphics::Glyph::background;
>>
>> use strict;
>> use base 'Bio::Graphics::Glyph::generic';
>> sub pad_top{
>>  return 0;
>> }
>>
>> sub draw_component {
>>  my $self = shift;
>>  #$self->SUPER::draw_component(@_);
>>  my ($gd,$dx,$dy) = @_;
>>  my ($left,$top,$right,$bottom) = $self->bounds($dx,$dy);
>>
>>  # draw an arrow to indicate the direction of transcript
>>  my $color = $self->option('block_bgcolor') || '#cccccc';
>>  $gd->filledRectangle($left,0,$right,$gd->height,
>> $self->factory->translate_color($color));
>> }
>>
>> 1;
>>
>> --
>> ==========================================
>> Xianjun Dong
>> PhD student, Lenhard group
>> Computational Biology Unit
>> Bergen Center for Computational Science
>> University of Bergen
>> Hoyteknologisenteret, Thormohlensgate 55
>> N-5008 Bergen, Norway
>> E-mail: xianjun.dong at bccs.uib.no
>> Tel.: +47 555 84022
>> Fax : +47 555 84295
>> ==========================================
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>     
>
>
>
>   

-- 
==========================================
Xianjun Dong
PhD student, Lenhard group
Computational Biology Unit
Bergen Center for Computational Science
University of Bergen
Hoyteknologisenteret, Thormohlensgate 55
N-5008 Bergen, Norway
E-mail: xianjun.dong at bccs.uib.no
Tel.: +47 555 84022
Fax : +47 555 84295
==========================================

-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.bioperl1.2.3.png
Type: image/png
Size: 2789 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20090613/3cf5d9c2/attachment.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.bioperl1.6.png
Type: image/png
Size: 2365 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20090613/3cf5d9c2/attachment-0001.png>

From malcolm.cook at gmail.com  Tue Jun 16 04:06:36 2009
From: malcolm.cook at gmail.com (Malcolm Cook)
Date: Tue, 16 Jun 2009 03:06:36 -0500
Subject: [Bioperl-l]  Alignment->slice() issue?
Message-ID: <cfebe0020906160106s5f76501au9d67d8ee3062d166@mail.gmail.com>

Kevin,

I'm getting struck by this old issue you once coded around.

      http://bioperl.org/pipermail/bioperl-l/2007-January/024665.html

Any chance you could share your implementation with a fellow traveller?

Thanks,

Malcolm Cook
Stowers Institute for Medical Research

From remi.planel at free.fr  Tue Jun 16 10:57:27 2009
From: remi.planel at free.fr (Remi Planel)
Date: Tue, 16 Jun 2009 16:57:27 +0200
Subject: [Bioperl-l] Hits Object
Message-ID: <4A37B2D7.70807@free.fr>

Hi all,

Starting from a Bio::Search::Result::ResultI object (obtained after 
parsing a blast report), I couldn't find a way to filter out some of the 
associated hsps. By filter I mean eliminating, for each hit, the hsps I'm 
not interested in.

Can I modify the Result object directly?
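
To make concrete what I mean by filtering: short of changing the Result
object itself, one could select the HSPs while iterating over the hits.
A minimal sketch follows; kept_hsps is a made-up name, and the only
BioPerl calls assumed are the hsps() method on hit objects and, in the
usage comment, next_hit() and evalue():

```perl
use strict;
use warnings;

# Hypothetical helper: return the HSPs of one hit that pass a test.
# $hit is expected to behave like a Bio::Search::Hit::HitI; $keep is a
# code ref that receives one HSP object and returns true to keep it.
sub kept_hsps {
    my ($hit, $keep) = @_;
    return grep { $keep->($_) } $hit->hsps;
}

# usage, assuming $result came from Bio::SearchIO's next_result:
# while ( my $hit = $result->next_hit ) {
#     my @good = kept_hsps( $hit, sub { $_[0]->evalue <= 1e-10 } );
#     # work with @good here; the Result object is left untouched
# }
```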

Thanks,


From lsbrath at gmail.com  Tue Jun 16 11:42:37 2009
From: lsbrath at gmail.com (Mgavi Brathwaite)
Date: Tue, 16 Jun 2009 11:42:37 -0400
Subject: [Bioperl-l] error message: can't call method "next_hit" on and
	undefined value
Message-ID: <69367b8f0906160842y39c89b86m8733e5b18e5334e2@mail.gmail.com>

Hello,
My method produces an error message stating that it can't call a "next_hit"
method on an undefined value.

sub hu_bl2seq_parser{
	my ($maid, $maid_dir) = @_;
	# Get the report
	my $in = new Bio::SearchIO(-format => 'blast',
                           -file   => ">".$maid_dir."\\".$maid."aln_hu.aln",
						   -report_type => 'blastn');
	#open(my $out, ">$maid_dir/".$maid."aln_hu_parsed.out");					
	#my $out = Bio::AlignIO->newFh(-format => 'clustalw' );
	my $result=$in->next_result;
	my($hu_aln,$hu_mismatches);
	# Get info about the first hit
	my $hit = $result->next_hit;
	my $name = $hit->name;
	# get info about the first hsp of the first hit
	my $hsp = $hit->next_hsp;
	# get the alignment object
	my $aln = $hsp->get_aln;
	#my $percent_id = $hsp->percent_identity;
	#my $aln_length = $hsp->length('total');
	my @mismatches = $hsp->seq_inds('query','nomatch');
	my $aln_str="";
	# access the alignment string
	my $strIO=IO::String->new($aln_str);
	#  write the string alignio in clustalw format
	my $alnio = Bio::AlignIO->new(-format => 'clustalw', -fh=>$strIO);
	# now the actual alignment string is accessible for printing or, in
	# this case, moving to a db table
	$alnio->write_aln($aln);
	$hu_aln=$aln_str;
	$hu_mismatches = scalar @mismatches;
	return($hu_aln, $hu_mismatches);
}

The problem is at "my $hit = $result->next_hit;"
Any help will be appreciated.
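
One thing that may be worth checking, offered as a guess and not a
confirmed diagnosis: the ">" in front of the -file value. As far as I
understand, BioPerl's IO classes follow Perl's open() convention, so a
leading ">" would open the report for writing instead of reading, and
next_result would then have nothing to return. A minimal sketch of a
more defensive start to the sub (report_path and first_hit are
hypothetical helper names):

```perl
use strict;
use warnings;
use File::Spec;

# Build the report path with no leading '>' so it is opened for reading.
sub report_path {
    my ($maid, $maid_dir) = @_;
    return File::Spec->catfile( $maid_dir, $maid . 'aln_hu.aln' );
}

# Return the first hit of the first result, or undef if the report
# produced no result (or the result has no hits).
sub first_hit {
    my ($in) = @_;
    my $result = $in->next_result;
    return unless defined $result;
    return $result->next_hit;
}

# usage, assuming Bio::SearchIO is loaded:
# my $in  = Bio::SearchIO->new( -format => 'blast',
#                               -file   => report_path( $maid, $maid_dir ) );
# my $hit = first_hit($in) or die "no hits parsed from report\n";
```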
LomSpace

From cjfields at illinois.edu  Tue Jun 16 14:14:18 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 16 Jun 2009 13:14:18 -0500
Subject: [Bioperl-l] Bio::Restriction refactor
	[Was:Bio::Restriction::Analysis. Exception when using rebasefile.]
In-Reply-To: <D4FBF1054F5C48C0ACBF81873A81E7AB@NewLife>
References: <4A2F622D.5060500@ron.dk>
	<E80E6C1BC08D4E338739148BFE9BFAC0@NewLife>
	<D4FBF1054F5C48C0ACBF81873A81E7AB@NewLife>
Message-ID: <9A7FE5B3-29A2-4FAE-AE5A-945064DD8DB6@illinois.edu>

I'll check out the branch sometime today and run tests on it.  Thanks  
for the hard work Mark!

chris

On Jun 16, 2009, at 12:58 PM, Mark A. Jensen wrote:

> Dear All,
>
> There are tests for the new functionality of Bio::Restriction
> now in t/Restriction on the branch, along with the withrefm.906
> in t/data that revealed the bug in RON's post. All tests pass without
> warnings on my machine (which is bioperl live, perl 5.10.10,
> under Vista/cygwin - yes, I still don't have a real computer).
> We're ready for a merge on my end.
>
> Thanks all for your silent assent to these machinations.
> cheers
> Mark
>
> ----- Original Message ----- From: "Mark A. Jensen"  
> <maj at fortinbras.us>
> To: "Rasmus Ory Nielsen" <ron at ron.dk>; <bioperl-l at lists.open-bio.org>
> Sent: Monday, June 15, 2009 7:49 PM
> Subject: [Bioperl-l] Bio::Restriction refactor  
> [Was:Bio::Restriction::Analysis. Exception when using rebasefile.]
>
>
>> Dear All,
>>
>> The revamped Bio::Restriction::* in branch
>>
>> REPOS/bioperl-live/branches/restriction-refactor
>>
>> passes all existing tests, including those in t/Restriction.
>> New tests will be added within the next day or so.
>> The original bug occurred because only a subset of
>> the possible rebase withrefm-formatted enzymes were
>> handled; it choked on freshly-downloaded rebase
>> files because of this.
>>
>> The refactored version now handles *all* rebase types,
>> including those of rebase forms
>>
>> XXX^X                [ intrasite cutters, the main types
>>                              built in to base.pm]
>> XXXX(m/n)          [ right-end extrasite cutters ]
>> (s/t)XXXX            [ left-end ditto ]
>> (s/t)XXXX(m/n)    [ double-end ditto],
>>
>> palindromic and non-palindromic, as well as multisite
>> enzymes that string together combinations of these
>> forms. Much rationalization (well, seems rational to me
>> anyway) and cruft removal in the affected code has also
>> occurred. itype2.pm has been updated as well, to
>> conform to the refactoring.
>>
>> If you're dying to try this now, get a working copy
>> of the branch like so
>>
>> $ svn co svn://code.open-bio.org/bioperl/bioperl-live/branches/restriction-refactor bioperl-rr
>> $ cd bioperl-rr
>> $ perl Build.PL
>> $ ./Build test
>> $ ./Build install
>>
>> This will only hammer your current installation in the
>> $SITE_LIB/Bio/Restriction path; I worked only on
>> a sparse checkout of the necessary files. To revert to your
>> old install, do
>>
>> $ cd $MY_OLD_BIOPERL_WORKINGDIR
>> $ ./Build install
>>
>> [In the possible event that these instructions are in error,
>> there will be a response on this list in a matter of
>> milliseconds, so stand by.]
>>
>> Happy coding-
>> Mark
>>
>>
>>
>>
>> ----- Original Message ----- From: "Rasmus Ory Nielsen" <ron at ron.dk>
>> To: <bioperl-l at lists.open-bio.org>
>> Sent: Wednesday, June 10, 2009 3:35 AM
>> Subject: [Bioperl-l] Bio::Restriction::Analysis. Exception when  
>> using rebasefile.
>>
>>
>>> Hi,
>>>
>>> This is my first time using bioperl for restriction analysis, so  
>>> please bear with me, if this is a FAQ.
>>>
>>> I downloaded withrefm.906 from ftp://ftp.neb.com/pub/rebase/ and  
>>> created the script shown at the bottom of the mail.
>>> My bioperl version is bioperl-live nightly from 09-Jun-2009.
>>>
>>> The scripts throws an exception - see below. But, if I comment out  
>>> the '-enzymes' argument, so it uses the built-in collection of  
>>> enzymes, it works.
>>>
>>> My problem is, that I need to use some of the enzymes that are  
>>> only available in rebase. So how do I get this working?
>>>
>>> Thanks for your attention.
>>>
>>> Best regards,
>>> Rasmus Ory Nielsen
>>>
>>>
>>> ############################################################
>>> Output from the script:
>>> ############################################################
>>>
>>> [roni at ksdhcp ~]$ ./restriction_test.pl
>>>
>>> --------------------- WARNING ---------------------
>>> MSG: The enzyme name CviKI-1 was changed to CviKI-I
>>> ---------------------------------------------------
>>>
>>> ------------- EXCEPTION -------------
>>> MSG: Bad end parameter (11). End must be less than the total  
>>> length of sequence (total=7)
>>> STACK Bio::PrimarySeq::subseq /usr/local/lib/perl5/site_perl/ 
>>> 5.10.0/Bio/PrimarySeq.pm:401
>>> STACK Bio::Restriction::Analysis::_enzyme_sites /usr/local/lib/ 
>>> perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:900
>>> STACK Bio::Restriction::Analysis::_cuts /usr/local/lib/perl5/ 
>>> site_perl/5.10.0/Bio/Restriction/Analysis.pm:801
>>> STACK Bio::Restriction::Analysis::cut /usr/local/lib/perl5/ 
>>> site_perl/5.10.0/Bio/Restriction/Analysis.pm:379
>>> STACK Bio::Restriction::Analysis::fragment_maps /usr/local/lib/ 
>>> perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:515
>>> STACK toplevel ./restriction_test.pl:30
>>> -------------------------------------
>>>
>>> [roni at ksdhcp ~]$
>>>
>>>
>>> ############################################################
>>> Output from the script with the '-enzymes' argument commented out
>>> ############################################################
>>>
>>> [roni at ksdhcp ~]$ ./restriction_test.pl
>>>
>>> --------------------- WARNING ---------------------
>>> MSG: The enzyme name CviKI-1 was changed to CviKI-I
>>> ---------------------------------------------------
>>> $VAR1 = [
>>>          {
>>>            'seq' => 'CTCGACCGTTAGCAA',
>>>            'end' => 15,
>>>            'start' => '1'
>>>          },
>>>          {
>>>            'seq' => 'AGCTTTCTACCGTTATCGT',
>>>            'end' => 34,
>>>            'start' => '16'
>>>          }
>>>        ];
>>> [roni at ksdhcp ~]$
>>>
>>> ############################################################
>>>
>>> #!/usr/bin/perl
>>> use strict;
>>> use warnings;
>>> use Bio::PrimarySeq;
>>> use Bio::Restriction::IO;
>>> use Bio::Restriction::Analysis;
>>> use Data::Dumper;
>>>
>>> # create seq obj
>>> my $seqobj = new Bio::PrimarySeq(
>>>    -seq        => 'CTCGACCGTTAGCAAAGCTTTCTACCGTTATCGT',
>>>    -primary_id => 'test',
>>>    -molecule   => 'dna'
>>> );
>>>
>>> # read rebase file
>>> my $rebase_io = Bio::Restriction::IO->new(
>>>    -file   => 'withrefm.906',
>>>    -format => 'withrefm',
>>> );
>>> my $rebase_collection = $rebase_io->read;
>>>
>>> # start restriction analysis
>>> my $restriction_analysis = Bio::Restriction::Analysis->new(
>>>    -seq     => $seqobj,
>>>    -enzymes => $rebase_collection,    # it works with this line  
>>> commented out
>>> );
>>>
>>> # retrieve fragment maps
>>> my @fragment_maps = $restriction_analysis->fragment_maps('HindIII');
>>> print Dumper \@fragment_maps;
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>


From maj at fortinbras.us  Tue Jun 16 13:58:56 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 16 Jun 2009 13:58:56 -0400
Subject: [Bioperl-l] Bio::Restriction refactor
	[Was:Bio::Restriction::Analysis. Exception when using rebasefile.]
In-Reply-To: <E80E6C1BC08D4E338739148BFE9BFAC0@NewLife>
References: <4A2F622D.5060500@ron.dk>
	<E80E6C1BC08D4E338739148BFE9BFAC0@NewLife>
Message-ID: <D4FBF1054F5C48C0ACBF81873A81E7AB@NewLife>

Dear All,

There are tests for the new functionality of Bio::Restriction
now in t/Restriction on the branch, along with the withrefm.906
in t/data that revealed the bug in RON's post. All tests pass without
warnings on my machine (which is bioperl live, perl 5.10.10,
under Vista/cygwin - yes, I still don't have a real computer).
We're ready for a merge on my end.

Thanks all for your silent assent to these machinations.
cheers
Mark

----- Original Message ----- 
From: "Mark A. Jensen" <maj at fortinbras.us>
To: "Rasmus Ory Nielsen" <ron at ron.dk>; <bioperl-l at lists.open-bio.org>
Sent: Monday, June 15, 2009 7:49 PM
Subject: [Bioperl-l] Bio::Restriction refactor [Was:Bio::Restriction::Analysis. 
Exception when using rebasefile.]


> Dear All,
>
> The revamped Bio::Restriction::* in branch
>
> REPOS/bioperl-live/branches/restriction-refactor
>
> passes all existing tests, including those in t/Restriction.
> New tests will be added within the next day or so.
> The original bug occurred because only a subset of
> the possible rebase withrefm-formatted enzymes were
> handled; it choked on freshly-downloaded rebase
> files because of this.
>
> The refactored version now handles *all* rebase types,
> including those of rebase forms
>
> XXX^X                [ intrasite cutters, the main types
>                               built in to base.pm]
> XXXX(m/n)          [ right-end extrasite cutters ]
> (s/t)XXXX            [ left-end ditto ]
> (s/t)XXXX(m/n)    [ double-end ditto],
>
> palindromic and non-palindromic, as well as multisite
> enzymes that string together combinations of these
> forms. Much rationalization (well, seems rational to me
> anyway) and cruft removal in the affected code has also
> occurred. itype2.pm has been updated as well, to
> conform to the refactoring.
>
> If you're dying to try this now, get a working copy
> of the branch like so
>
> $ svn co svn://code.open-bio.org/bioperl/bioperl-live/branches/restriction-refactor bioperl-rr
> $ cd bioperl-rr
> $ perl Build.PL
> $ ./Build test
> $ ./Build install
>
> This will only hammer your current installation in the
> $SITE_LIB/Bio/Restriction path; I worked only on
> a sparse checkout of the necessary files. To revert to your
> old install, do
>
> $ cd $MY_OLD_BIOPERL_WORKINGDIR
> $ ./Build install
>
> [In the possible event that these instructions are in error,
> there will be a response on this list in a matter of
> milliseconds, so stand by.]
>
> Happy coding-
> Mark
>
>
>
>
> ----- Original Message ----- 
> From: "Rasmus Ory Nielsen" <ron at ron.dk>
> To: <bioperl-l at lists.open-bio.org>
> Sent: Wednesday, June 10, 2009 3:35 AM
> Subject: [Bioperl-l] Bio::Restriction::Analysis. Exception when using 
> rebasefile.
>
>
>> Hi,
>>
>> This is my first time using bioperl for restriction analysis, so please bear 
>> with me, if this is a FAQ.
>>
>> I downloaded withrefm.906 from ftp://ftp.neb.com/pub/rebase/ and created the 
>> script shown at the bottom of the mail.
>> My bioperl version is bioperl-live nightly from 09-Jun-2009.
>>
>> The script throws an exception - see below. But, if I comment out the 
>> '-enzymes' argument, so it uses the built-in collection of enzymes, it works.
>>
>> My problem is, that I need to use some of the enzymes that are only available 
>> in rebase. So how do I get this working?
>>
>> Thanks for your attention.
>>
>> Best regards,
>> Rasmus Ory Nielsen
>>
>>
>> ############################################################
>> Output from the script:
>> ############################################################
>>
>> [roni at ksdhcp ~]$ ./restriction_test.pl
>>
>> --------------------- WARNING ---------------------
>> MSG: The enzyme name CviKI-1 was changed to CviKI-I
>> ---------------------------------------------------
>>
>> ------------- EXCEPTION -------------
>> MSG: Bad end parameter (11). End must be less than the total length of 
>> sequence (total=7)
>> STACK Bio::PrimarySeq::subseq 
>> /usr/local/lib/perl5/site_perl/5.10.0/Bio/PrimarySeq.pm:401
>> STACK Bio::Restriction::Analysis::_enzyme_sites 
>> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:900
>> STACK Bio::Restriction::Analysis::_cuts 
>> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:801
>> STACK Bio::Restriction::Analysis::cut 
>> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:379
>> STACK Bio::Restriction::Analysis::fragment_maps 
>> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:515
>> STACK toplevel ./restriction_test.pl:30
>> -------------------------------------
>>
>> [roni at ksdhcp ~]$
>>
>>
>> ############################################################
>> Output from the script with the '-enzymes' argument commented out
>> ############################################################
>>
>> [roni at ksdhcp ~]$ ./restriction_test.pl
>>
>> --------------------- WARNING ---------------------
>> MSG: The enzyme name CviKI-1 was changed to CviKI-I
>> ---------------------------------------------------
>> $VAR1 = [
>>           {
>>             'seq' => 'CTCGACCGTTAGCAA',
>>             'end' => 15,
>>             'start' => '1'
>>           },
>>           {
>>             'seq' => 'AGCTTTCTACCGTTATCGT',
>>             'end' => 34,
>>             'start' => '16'
>>           }
>>         ];
>> [roni at ksdhcp ~]$
>>
>> ############################################################
>>
>> #!/usr/bin/perl
>> use strict;
>> use warnings;
>> use Bio::PrimarySeq;
>> use Bio::Restriction::IO;
>> use Bio::Restriction::Analysis;
>> use Data::Dumper;
>>
>> # create seq obj
>> my $seqobj = new Bio::PrimarySeq(
>>     -seq        => 'CTCGACCGTTAGCAAAGCTTTCTACCGTTATCGT',
>>     -primary_id => 'test',
>>     -molecule   => 'dna'
>> );
>>
>> # read rebase file
>> my $rebase_io = Bio::Restriction::IO->new(
>>     -file   => 'withrefm.906',
>>     -format => 'withrefm',
>> );
>> my $rebase_collection = $rebase_io->read;
>>
>> # start restriction analysis
>> my $restriction_analysis = Bio::Restriction::Analysis->new(
>>     -seq     => $seqobj,
>>     -enzymes => $rebase_collection,    # it works with this line commented out
>> );
>>
>> # retrieve fragment maps
>> my @fragment_maps = $restriction_analysis->fragment_maps('HindIII');
>> print Dumper \@fragment_maps;
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> 


From maj at fortinbras.us  Tue Jun 16 13:51:14 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 16 Jun 2009 13:51:14 -0400
Subject: [Bioperl-l] Hits Object
In-Reply-To: <4A37B2D7.70807@free.fr>
Message-ID: <3766B1A38606458EB5FA24D24371433D@NewLife>

Remi- have a look at http://www.bioperl.org/wiki/HOWTO:SearchIO and maybe
http://www.bioperl.org/wiki/Parsing_BLAST_HSPs; perhaps your questions will 
be answered there-
cheers, Mark

From cjfields at illinois.edu  Tue Jun 16 14:31:10 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 16 Jun 2009 13:31:10 -0500
Subject: [Bioperl-l] Bio::Restriction refactor
	[Was:Bio::Restriction::Analysis. Exception when using rebasefile.]
In-Reply-To: <D4FBF1054F5C48C0ACBF81873A81E7AB@NewLife>
References: <4A2F622D.5060500@ron.dk>
	<E80E6C1BC08D4E338739148BFE9BFAC0@NewLife>
	<D4FBF1054F5C48C0ACBF81873A81E7AB@NewLife>
Message-ID: <A800F5EC-C7E2-4BE4-9B45-3E71FB60AC2E@illinois.edu>

Everything passes on my end (Mac OS X 10.5, perl 5.10.0).  +1 on the  
merge.

Also (as mentioned some time back w/ Hilmar among others), we can  
probably delete this branch seeing as the code will be merged to trunk  
(it being a feature branch and all).  Worth doing the same for a few  
other feature branches as well.

chris

On Jun 16, 2009, at 12:58 PM, Mark A. Jensen wrote:

> Dear All,
>
> There are tests for the new functionality of Bio::Restriction
> now in t/Restriction on the branch, along with the withrefm.906
> in t/data that revealed the bug in RON's post. All tests pass without
> warnings on my machine (which is bioperl live, perl 5.10.10,
> under Vista/cygwin - yes, I still don't have a real computer).
> We're ready for a merge on my end.
>
> Thanks all for your silent assent to these machinations.
> cheers
> Mark
>
> ----- Original Message ----- From: "Mark A. Jensen"  
> <maj at fortinbras.us>
> To: "Rasmus Ory Nielsen" <ron at ron.dk>; <bioperl-l at lists.open-bio.org>
> Sent: Monday, June 15, 2009 7:49 PM
> Subject: [Bioperl-l] Bio::Restriction refactor  
> [Was:Bio::Restriction::Analysis. Exception when using rebasefile.]
>
>
>> Dear All,
>>
>> The revamped Bio::Restriction::* in branch
>>
>> REPOS/bioperl-live/branches/restriction-refactor
>>
>> passes all existing tests, including those in t/Restriction.
>> New tests will be added within the next day or so.
>> The original bug occurred because only a subset of
>> the possible rebase withrefm-formatted enzymes were
>> handled; it choked on freshly-downloaded rebase
>> files because of this.
>>
>> The refactored version now handles *all* rebase types,
>> including those of rebase forms
>>
>> XXX^X                [ intrasite cutters, the main types
>>                              built in to base.pm]
>> XXXX(m/n)          [ right-end extrasite cutters ]
>> (s/t)XXXX            [ left-end ditto ]
>> (s/t)XXXX(m/n)    [ double-end ditto],
>>
>> palindromic and non-palindromic, as well as multisite
>> enzymes that string together combinations of these
>> forms. Much rationalization (well, seems rational to me
>> anyway) and cruft removal in the affected code has also
>> occurred. itype2.pm has been updated as well, to
>> conform to the refactoring.
>>
>> If you're dying to try this now, get a working copy
>> of the branch like so
>>
>> $ svn co svn://code.open-bio.org/bioperl/bioperl-live/branches/restriction-refactor bioperl-rr
>> $ cd bioperl-rr
>> $ perl Build.PL
>> $ ./Build test
>> $ ./Build install
>>
>> This will only hammer your current installation in the
>> $SITE_LIB/Bio/Restriction path; I worked only on
>> a sparse checkout of the necessary files. To revert to your
>> old install, do
>>
>> $ cd $MY_OLD_BIOPERL_WORKINGDIR
>> $ ./Build install
>>
>> [In the possible event that these instructions are in error,
>> there will be a response on this list in a matter of
>> milliseconds, so stand by.]
>>
>> Happy coding-
>> Mark
>>
>>
>>
>>
>> ----- Original Message ----- From: "Rasmus Ory Nielsen" <ron at ron.dk>
>> To: <bioperl-l at lists.open-bio.org>
>> Sent: Wednesday, June 10, 2009 3:35 AM
>> Subject: [Bioperl-l] Bio::Restriction::Analysis. Exception when  
>> using rebasefile.
>>
>>
>>> Hi,
>>>
>>> This is my first time using bioperl for restriction analysis, so  
>>> please bear with me, if this is a FAQ.
>>>
>>> I downloaded withrefm.906 from ftp://ftp.neb.com/pub/rebase/ and  
>>> created the script shown at the bottom of the mail.
>>> My bioperl version is bioperl-live nightly from 09-Jun-2009.
>>>
>>> The script throws an exception - see below. But, if I comment out
>>> the '-enzymes' argument, so it uses the built-in collection of  
>>> enzymes, it works.
>>>
>>> My problem is, that I need to use some of the enzymes that are  
>>> only available in rebase. So how do I get this working?
>>>
>>> Thanks for your attention.
>>>
>>> Best regards,
>>> Rasmus Ory Nielsen
>>>
>>>
>>> ############################################################
>>> Output from the script:
>>> ############################################################
>>>
>>> [roni at ksdhcp ~]$ ./restriction_test.pl
>>>
>>> --------------------- WARNING ---------------------
>>> MSG: The enzyme name CviKI-1 was changed to CviKI-I
>>> ---------------------------------------------------
>>>
>>> ------------- EXCEPTION -------------
>>> MSG: Bad end parameter (11). End must be less than the total  
>>> length of sequence (total=7)
>>> STACK Bio::PrimarySeq::subseq /usr/local/lib/perl5/site_perl/5.10.0/Bio/PrimarySeq.pm:401
>>> STACK Bio::Restriction::Analysis::_enzyme_sites /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:900
>>> STACK Bio::Restriction::Analysis::_cuts /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:801
>>> STACK Bio::Restriction::Analysis::cut /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:379
>>> STACK Bio::Restriction::Analysis::fragment_maps /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:515
>>> STACK toplevel ./restriction_test.pl:30
>>> -------------------------------------
>>>
>>> [roni at ksdhcp ~]$
>>>
>>>
>>> ############################################################
>>> Output from the script with the '-enzymes' argument commented out
>>> ############################################################
>>>
>>> [roni at ksdhcp ~]$ ./restriction_test.pl
>>>
>>> --------------------- WARNING ---------------------
>>> MSG: The enzyme name CviKI-1 was changed to CviKI-I
>>> ---------------------------------------------------
>>> $VAR1 = [
>>>          {
>>>            'seq' => 'CTCGACCGTTAGCAA',
>>>            'end' => 15,
>>>            'start' => '1'
>>>          },
>>>          {
>>>            'seq' => 'AGCTTTCTACCGTTATCGT',
>>>            'end' => 34,
>>>            'start' => '16'
>>>          }
>>>        ];
>>> [roni at ksdhcp ~]$
>>>
>>> ############################################################
>>>
>>> #!/usr/bin/perl
>>> use strict;
>>> use warnings;
>>> use Bio::PrimarySeq;
>>> use Bio::Restriction::IO;
>>> use Bio::Restriction::Analysis;
>>> use Data::Dumper;
>>>
>>> # create seq obj
>>> my $seqobj = new Bio::PrimarySeq(
>>>    -seq        => 'CTCGACCGTTAGCAAAGCTTTCTACCGTTATCGT',
>>>    -primary_id => 'test',
>>>    -molecule   => 'dna'
>>> );
>>>
>>> # read rebase file
>>> my $rebase_io = Bio::Restriction::IO->new(
>>>    -file   => 'withrefm.906',
>>>    -format => 'withrefm',
>>> );
>>> my $rebase_collection = $rebase_io->read;
>>>
>>> # start restriction analysis
>>> my $restriction_analysis = Bio::Restriction::Analysis->new(
>>>    -seq     => $seqobj,
>>>    -enzymes => $rebase_collection,    # it works with this line commented out
>>> );
>>>
>>> # retrieve fragment maps
>>> my @fragment_maps = $restriction_analysis->fragment_maps('HindIII');
>>> print Dumper \@fragment_maps;
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>


From cjfields at illinois.edu  Tue Jun 16 15:07:44 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 16 Jun 2009 14:07:44 -0500
Subject: [Bioperl-l] Alignment->slice() issue?
In-Reply-To: <cfebe0020906160106s5f76501au9d67d8ee3062d166@mail.gmail.com>
References: <cfebe0020906160106s5f76501au9d67d8ee3062d166@mail.gmail.com>
Message-ID: <FB532969-4EE0-4D37-B14B-5BC806A95FFE@illinois.edu>

Sounds to me like a BioPerl bug.  Do you have some example data  
demonstrating the problem?

chris

On Jun 16, 2009, at 3:06 AM, Malcolm Cook wrote:

> Kevin,
>
> I'm getting struck by this old issue you once coded around.
>
>      http://bioperl.org/pipermail/bioperl-l/2007-January/024665.html
>
> Any chance you could share your implementation with  fellow  
> traveller...
>
> ??
>
> Thanks,
>
> Malcolm Cook
> Stowers Institute for Medical Research
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From maj at fortinbras.us  Tue Jun 16 15:32:02 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 16 Jun 2009 15:32:02 -0400
Subject: [Bioperl-l] error message: can't call method "next_hit" on
	and undefined value
In-Reply-To: <69367b8f0906160842y39c89b86m8733e5b18e5334e2@mail.gmail.com>
References: <69367b8f0906160842y39c89b86m8733e5b18e5334e2@mail.gmail.com>
Message-ID: <91AC45F45A0F43D292323A711F0D5BDA@NewLife>

lomspace-
The leading ">" in your -file argument asks for the file to be opened
for writing, not reading. So this

my $in = new Bio::SearchIO(-format      => 'blast',
                           -file        => ">".$maid_dir."\\".$maid."aln_hu.aln",
                           -report_type => 'blastn');

should be

my $in = new Bio::SearchIO(-format      => 'blast',
                           -file        => $maid_dir."\\".$maid."aln_hu.aln",
                           -report_type => 'blastn');

if you're reading the file. Then $in->next_result will hand back a result
object for $result instead of undef.
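
To see why the ">" matters, here is a minimal sketch in core Perl only (no
BioPerl needed). It assumes BioPerl passes the -file string to Perl's open
more or less as-is, which is my understanding but worth checking against
Bio::Root::IO in your install. The temp file and its contents are made up
for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small stand-in "report" file (hypothetical content).
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
print {$tmp_fh} "BLASTN report line\n";
close $tmp_fh or die "close failed: $!";
my $size_before = (-s $path) || 0;

# Two-argument open with a leading '>' opens the file for WRITING
# and truncates it, even if all you wanted was to read it.
open(my $wrong, ">$path") or die "open failed: $!";
close $wrong or die "close failed: $!";
my $size_after = (-s $path) || 0;

print "before: $size_before bytes, after: $size_after bytes\n";
```

If that is what happened to your aln_hu.aln, the report itself may now be
empty, so you may need to regenerate it before parsing.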

cheers, MAJ
----- Original Message ----- 
From: "Mgavi Brathwaite" <lsbrath at gmail.com>
To: <Bioperl-l at lists.open-bio.org>
Sent: Tuesday, June 16, 2009 11:42 AM
Subject: [Bioperl-l] error message: can't call method "next_hit" on and undefined 
value


> Hello,
> My method produces an error message stating that it can't call a "next_hit"
> method on an undefined value.
>
> sub hu_bl2seq_parser{
> my ($maid, $maid_dir) = @_;
> # Get the report
> my $in = new Bio::SearchIO(-format => 'blast',
>                           -file   => ">".$maid_dir."\\".$maid."aln_hu.aln",
>    -report_type => 'blastn');
> #open(my $out, ">$maid_dir/".$maid."aln_hu_parsed.out");
> #my $out = Bio::AlignIO->newFh(-format => 'clustalw' );
> my $result=$in->next_result;
> my($hu_aln,$hu_mismatches);
> # Get info about the first hit
> my $hit = $result->next_hit;
> my $name = $hit->name;
> # get info about the first hsp of the first hit
> my $hsp = $hit->next_hsp;
> # get the alignment object
> my $aln = $hsp->get_aln;
> #my $percent_id = $hsp->percent_identity;
> #my $aln_length = $hsp->length('total');
> my @mismatches = $hsp->seq_inds('query','nomatch');
> my $aln_str="";
> # access the alignment string
> my $strIO=IO::String->new($aln_str);
> #  write the string alignio in clustalw format
> my $alnio = Bio::AlignIO->new(-format => 'clustalw', -fh=>$strIO);
> # now the actual alignment string is accessible for printing or, in
> # this case, moving to a db table
> $alnio->write_aln($aln);
> $hu_aln=$aln_str;
> $hu_mismatches = scalar @mismatches;
> return($hu_aln, $hu_mismatches);
> }
>
> The problem is at "my $hit = $result->next_hit;"
> Any help will be appreciated.
> LomSpace
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> 


From rmb32 at cornell.edu  Tue Jun 16 15:46:40 2009
From: rmb32 at cornell.edu (Robert Buels)
Date: Tue, 16 Jun 2009 12:46:40 -0700
Subject: [Bioperl-l] error message: can't call method "next_hit" on and
 undefined value
In-Reply-To: <69367b8f0906160842y39c89b86m8733e5b18e5334e2@mail.gmail.com>
References: <69367b8f0906160842y39c89b86m8733e5b18e5334e2@mail.gmail.com>
Message-ID: <4A37F6A0.1080907@cornell.edu>

Mgavi Brathwaite wrote:
> Hello,
> My method produces an error message stating that it can't call a "next_hit"
> method on an undefined value.

Your proximate problem seems to be that you are prepending a '>' to the 
filename in your invocation of Bio::SearchIO::new, which I think causes 
it to open the file for writing instead of reading.  With nothing to 
parse, next_result() returns undef, and calling next_hit on that undef 
is what gives you the "can't call method next_hit on an undefined value" 
error.  More generally, you probably want to use next_result and 
next_hit in while loops, since they return undef when there are no more 
results or hits to parse.

by while loops, I mean something like:

while( my $result = $in->next_result ) {
    while( my $hit = $result->next_hit ) {
        # insert the rest of your operations here
    }
}
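
If it helps to see the pattern in isolation, here is a small sketch in
core Perl. ToyIterator is a made-up stand-in, not a BioPerl class; it only
mimics the "return the next item, or undef when there is nothing left"
behaviour described above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for a BioPerl-style iterator: each call to
# next_item returns the next element, or undef once the list is exhausted.
package ToyIterator;
sub new       { my ($class, @items) = @_; return bless { items => [@items] }, $class }
sub next_item { my ($self) = @_; return shift @{ $self->{items} } }

package main;

# An iterator with nothing in it, like a report with no results.
my $empty = ToyIterator->new();
my $first = $empty->next_item;    # undef, nothing to return
# Calling a method on $first at this point would die with
# "Can't call method ... on an undefined value".

# The while loop only runs its body when a real item came back.
my $iter = ToyIterator->new('hit1', 'hit2');
my @seen;
while ( defined( my $item = $iter->next_item ) ) {
    push @seen, $item;
}
print "saw: @seen\n";
```

The same shape applies to next_result, next_hit and next_hsp: test what
came back before calling anything on it.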

Hope this helps.

Rob

> sub hu_bl2seq_parser{
> 	my ($maid, $maid_dir) = @_;
> 	# Get the report
> 	my $in = new Bio::SearchIO(-format => 'blast',
>                            -file   => ">".$maid_dir."\\".$maid."aln_hu.aln",
> 						   -report_type => 'blastn');
> 	#open(my $out, ">$maid_dir/".$maid."aln_hu_parsed.out");					
> 	#my $out = Bio::AlignIO->newFh(-format => 'clustalw' );
> 	my $result=$in->next_result;
> 	my($hu_aln,$hu_mismatches);
> 	# Get info about the first hit
> 	my $hit = $result->next_hit;
> 	my $name = $hit->name;
> 	# get info about the first hsp of the first hit
> 	my $hsp = $hit->next_hsp;
> 	# get the alignment object
> 	my $aln = $hsp->get_aln;
> 	#my $percent_id = $hsp->percent_identity;
> 	#my $aln_length = $hsp->length('total');
> 	my @mismatches = $hsp->seq_inds('query','nomatch');
> 	my $aln_str="";
> 	# access the alignment string
> 	my $strIO=IO::String->new($aln_str);
> 	#  write the string alignio in clustalw format
> 	my $alnio = Bio::AlignIO->new(-format => 'clustalw', -fh=>$strIO);
> 	# now the actual alignment string is accessible for printing or, in
> 	# this case, moving to a db table
> 	$alnio->write_aln($aln);
> 	$hu_aln=$aln_str;
> 	$hu_mismatches = scalar @mismatches;
> 	return($hu_aln, $hu_mismatches);
> }
> 
> The problem is at "my $hit = $result->next_hit;"
> Any help will be appreciated.
> LomSpace
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 

From maj at fortinbras.us  Tue Jun 16 16:10:34 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 16 Jun 2009 16:10:34 -0400
Subject: [Bioperl-l] Bio::Restriction
	refactor[Was:Bio::Restriction::Analysis. Exception when using
	rebasefile.]
In-Reply-To: <A800F5EC-C7E2-4BE4-9B45-3E71FB60AC2E@illinois.edu>
References: <4A2F622D.5060500@ron.dk><E80E6C1BC08D4E338739148BFE9BFAC0@NewLife><D4FBF1054F5C48C0ACBF81873A81E7AB@NewLife>
	<A800F5EC-C7E2-4BE4-9B45-3E71FB60AC2E@illinois.edu>
Message-ID: <61179C22E04F479686C7F5CFEC496FB0@NewLife>

Right; will remove branch. Will go ahead with merge at 21:20 UTC.
cheers MAJ
----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "Mark A. Jensen" <maj at fortinbras.us>
Cc: <bioperl-l at lists.open-bio.org>; "Rasmus Ory Nielsen" <ron at ron.dk>
Sent: Tuesday, June 16, 2009 2:31 PM
Subject: Re: [Bioperl-l] Bio::Restriction 
refactor[Was:Bio::Restriction::Analysis. Exception when using rebasefile.]


> Everything passes on my end (Mac OS X 10.5, perl 5.10.0).  +1 on the  merge.
>
> Also (as mentioned some time back w/ Hilmar among others), we can  probably 
> delete this branch seeing as the code will be merged to trunk  (it being a 
> feature branch and all).  Worth doing the same for a few  other feature 
> branches as well.
>
> chris
>
> On Jun 16, 2009, at 12:58 PM, Mark A. Jensen wrote:
>
>> Dear All,
>>
>> There are tests for the new functionality of Bio::Restriction
>> now in t/Restriction on the branch, along with the withrefm.906
>> in t/data that revealed the bug in RON's post. All tests pass without
>> warnings on my machine (which is bioperl live, perl 5.10.10,
>> under Vista/cygwin - yes, I still don't have a real computer).
>> We're ready for a merge on my end.
>>
>> Thanks all for your silent assent to these machinations.
>> cheers
>> Mark
>>
>> ----- Original Message ----- From: "Mark A. Jensen"  <maj at fortinbras.us>
>> To: "Rasmus Ory Nielsen" <ron at ron.dk>; <bioperl-l at lists.open-bio.org>
>> Sent: Monday, June 15, 2009 7:49 PM
>> Subject: [Bioperl-l] Bio::Restriction refactor 
>> [Was:Bio::Restriction::Analysis. Exception when using rebasefile.]
>>
>>
>>> Dear All,
>>>
>>> The revamped Bio::Restriction::* in branch
>>>
>>> REPOS/bioperl-live/branches/restriction-refactor
>>>
>>> passes all existing tests, including those in t/Restriction.
>>> New tests will be added within the next day or so.
>>> The original bug occurred because only a subset of
>>> the possible rebase withrefm-formatted enzymes were
>>> handled; it choked on freshly-downloaded rebase
>>> files because of this.
>>>
>>> The refactored version now handles *all* rebase types,
>>> including those of rebase forms
>>>
>>> XXX^X                [ intrasite cutters, the main types
>>>                              built in to base.pm]
>>> XXXX(m/n)          [ right-end extrasite cutters ]
>>> (s/t)XXXX            [ left-end ditto ]
>>> (s/t)XXXX(m/n)    [ double-end ditto],
>>>
>>> palindromic and non-palindromic, as well as multisite
>>> enzymes that string together combinations of these
>>> forms. Much rationalization (well, seems rational to me
>>> anyway) and cruft removal in the affected code has also
>>> occurred. itype2.pm has been updated as well, to
>>> conform to the refactoring.
>>>
>>> If you're dying to try this now, get a working copy
>>> of the branch like so
>>>
>>> $ svn co svn://code.open-bio.org/bioperl/bioperl-live/branches/restriction-refactor bioperl-rr
>>> $ cd bioperl-rr
>>> $ perl Build.PL
>>> $ ./Build test
>>> $ ./Build install
>>>
>>> This will only hammer your current installation in the
>>> $SITE_LIB/Bio/Restriction path; I worked only on
>>> a sparse checkout of the necessary files. To revert to your
>>> old install, do
>>>
>>> $ cd $MY_OLD_BIOPERL_WORKINGDIR
>>> $ ./Build install
>>>
>>> [In the possible event that these instructions are in error,
>>> there will be a response on this list in a matter of
>>> milliseconds, so stand by.]
>>>
>>> Happy coding-
>>> Mark
>>>
>>>
>>>
>>>
>>> ----- Original Message ----- From: "Rasmus Ory Nielsen" <ron at ron.dk>
>>> To: <bioperl-l at lists.open-bio.org>
>>> Sent: Wednesday, June 10, 2009 3:35 AM
>>> Subject: [Bioperl-l] Bio::Restriction::Analysis. Exception when  using 
>>> rebasefile.
>>>
>>>
>>>> Hi,
>>>>
>>>> This is my first time using bioperl for restriction analysis, so  please 
>>>> bear with me, if this is a FAQ.
>>>>
>>>> I downloaded withrefm.906 from ftp://ftp.neb.com/pub/rebase/ and  created 
>>>> the script shown at the bottom of the mail.
>>>> My bioperl version is bioperl-live nightly from 09-Jun-2009.
>>>>
>>>> The script throws an exception - see below. But, if I comment out the 
>>>> '-enzymes' argument, so it uses the built-in collection of  enzymes, it 
>>>> works.
>>>>
>>>> My problem is, that I need to use some of the enzymes that are  only 
>>>> available in rebase. So how do I get this working?
>>>>
>>>> Thanks for your attention.
>>>>
>>>> Best regards,
>>>> Rasmus Ory Nielsen
>>>>
>>>>
>>>> ############################################################
>>>> Output from the script:
>>>> ############################################################
>>>>
>>>> [roni at ksdhcp ~]$ ./restriction_test.pl
>>>>
>>>> --------------------- WARNING ---------------------
>>>> MSG: The enzyme name CviKI-1 was changed to CviKI-I
>>>> ---------------------------------------------------
>>>>
>>>> ------------- EXCEPTION -------------
>>>> MSG: Bad end parameter (11). End must be less than the total  length of 
>>>> sequence (total=7)
>>>> STACK Bio::PrimarySeq::subseq /usr/local/lib/perl5/site_perl/5.10.0/Bio/PrimarySeq.pm:401
>>>> STACK Bio::Restriction::Analysis::_enzyme_sites /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:900
>>>> STACK Bio::Restriction::Analysis::_cuts /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:801
>>>> STACK Bio::Restriction::Analysis::cut /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:379
>>>> STACK Bio::Restriction::Analysis::fragment_maps /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:515
>>>> STACK toplevel ./restriction_test.pl:30
>>>> -------------------------------------
>>>>
>>>> [roni at ksdhcp ~]$
>>>>
>>>>
>>>> ############################################################
>>>> Output from the script with the '-enzymes' argument commented out
>>>> ############################################################
>>>>
>>>> [roni at ksdhcp ~]$ ./restriction_test.pl
>>>>
>>>> --------------------- WARNING ---------------------
>>>> MSG: The enzyme name CviKI-1 was changed to CviKI-I
>>>> ---------------------------------------------------
>>>> $VAR1 = [
>>>>          {
>>>>            'seq' => 'CTCGACCGTTAGCAA',
>>>>            'end' => 15,
>>>>            'start' => '1'
>>>>          },
>>>>          {
>>>>            'seq' => 'AGCTTTCTACCGTTATCGT',
>>>>            'end' => 34,
>>>>            'start' => '16'
>>>>          }
>>>>        ];
>>>> [roni at ksdhcp ~]$
>>>>
>>>> ############################################################
>>>>
>>>> #!/usr/bin/perl
>>>> use strict;
>>>> use warnings;
>>>> use Bio::PrimarySeq;
>>>> use Bio::Restriction::IO;
>>>> use Bio::Restriction::Analysis;
>>>> use Data::Dumper;
>>>>
>>>> # create seq obj
>>>> my $seqobj = new Bio::PrimarySeq(
>>>>    -seq        => 'CTCGACCGTTAGCAAAGCTTTCTACCGTTATCGT',
>>>>    -primary_id => 'test',
>>>>    -molecule   => 'dna'
>>>> );
>>>>
>>>> # read rebase file
>>>> my $rebase_io = Bio::Restriction::IO->new(
>>>>    -file   => 'withrefm.906',
>>>>    -format => 'withrefm',
>>>> );
>>>> my $rebase_collection = $rebase_io->read;
>>>>
>>>> # start restriction analysis
>>>> my $restriction_analysis = Bio::Restriction::Analysis->new(
>>>>    -seq     => $seqobj,
>>>>    -enzymes => $rebase_collection,    # it works with this line commented out
>>>> );
>>>>
>>>> # retrieve fragment maps
>>>> my @fragment_maps = $restriction_analysis->fragment_maps('HindIII');
>>>> print Dumper \@fragment_maps;
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> 


From MEC at stowers.org  Tue Jun 16 16:13:33 2009
From: MEC at stowers.org (Cook, Malcolm)
Date: Tue, 16 Jun 2009 15:13:33 -0500
Subject: [Bioperl-l] Alignment->slice() issue?
In-Reply-To: <FB532969-4EE0-4D37-B14B-5BC806A95FFE@illinois.edu>
References: <cfebe0020906160106s5f76501au9d67d8ee3062d166@mail.gmail.com>
	<FB532969-4EE0-4D37-B14B-5BC806A95FFE@illinois.edu>
Message-ID: <BD62CBAC4395B94096109020651BE2EC12B471A389@exchmb-02.stowers-institute.org>

Chris!

erm, yeah, I do....

... and I will schedule some time to code up a test and add it to AlignI's suite....

Malcolm
  

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Chris Fields
> Sent: Tuesday, June 16, 2009 2:08 PM
> To: Malcolm Cook
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Alignment->slice() issue?
> 
> Sounds to me like a BioPerl bug.  Do you have some example 
> data demonstrating the problem?
> 
> chris
> 
> On Jun 16, 2009, at 3:06 AM, Malcolm Cook wrote:
> 
> > Kevin,
> >
> > I'm getting struck by this old issue you once coded around.
> >
> >      http://bioperl.org/pipermail/bioperl-l/2007-January/024665.html
> >
> > Any chance you could share your implementation with  fellow 
> > traveller...
> >
> > ??
> >
> > Thanks,
> >
> > Malcolm Cook
> > Stowers Institute for Medical Research
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 

From maj at fortinbras.us  Tue Jun 16 22:47:39 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 16 Jun 2009 22:47:39 -0400
Subject: [Bioperl-l] Bio::Restriction refactor
	[Was:Bio::Restriction::Analysis. Exception when using rebasefile.]
In-Reply-To: <E80E6C1BC08D4E338739148BFE9BFAC0@NewLife>
References: <4A2F622D.5060500@ron.dk>
	<E80E6C1BC08D4E338739148BFE9BFAC0@NewLife>
Message-ID: <9B199A62F5A741CCBC0B927D10DF1A0D@NewLife>

Dear All,

The refactored Bio::Restriction::* has been merged to trunk, with all
tests passing. [Anyone got a cigarette?]

cheers,
Mark

----- Original Message ----- 
From: "Mark A. Jensen" <maj at fortinbras.us>
To: "Rasmus Ory Nielsen" <ron at ron.dk>; <bioperl-l at lists.open-bio.org>
Sent: Monday, June 15, 2009 7:49 PM
Subject: [Bioperl-l] Bio::Restriction refactor [Was:Bio::Restriction::Analysis. 
Exception when using rebasefile.]


> Dear All,
>
> The revamped Bio::Restriction::* in branch
>
> REPOS/bioperl-live/branches/restriction-refactor
>
> passes all existing tests, including those in t/Restriction.
> New tests will be added within the next day or so.
> The original bug occurred because only a subset of
> the possible rebase withrefm-formatted enzymes were
> handled; it choked on freshly-downloaded rebase
> files because of this.
>
> The refactored version now handles *all* rebase types,
> including those of rebase forms
>
> XXX^X                [ intrasite cutters, the main types
>                               built in to base.pm]
> XXXX(m/n)          [ right-end extrasite cutters ]
> (s/t)XXXX            [ left-end ditto ]
> (s/t)XXXX(m/n)    [ double-end ditto],
>
> palindromic and non-palindromic, as well as multisite
> enzymes that string together combinations of these
> forms. Much rationalization (well, seems rational to me
> anyway) and cruft removal in the affected code has also
> occurred. itype2.pm has been updated as well, to
> conform to the refactoring.
>
> If you're dying to try this now, get a working copy
> of the branch like so
>
> $ svn co svn://code.open-bio.org/bioperl/bioperl-live/branches/restriction-refactor bioperl-rr
> $ cd bioperl-rr
> $ perl Build.PL
> $ ./Build test
> $ ./Build install
>
> This will only hammer your current installation in the
> $SITE_LIB/Bio/Restriction path; I worked only on
> a sparse checkout of the necessary files. To revert to your
> old install, do
>
> $ cd $MY_OLD_BIOPERL_WORKINGDIR
> $ ./Build install
>
> [In the possible event that these instructions are in error,
> there will be a response on this list in a matter of
> milliseconds, so stand by.]
>
> Happy coding-
> Mark
>
>
>
>
> ----- Original Message ----- 
> From: "Rasmus Ory Nielsen" <ron at ron.dk>
> To: <bioperl-l at lists.open-bio.org>
> Sent: Wednesday, June 10, 2009 3:35 AM
> Subject: [Bioperl-l] Bio::Restriction::Analysis. Exception when using 
> rebasefile.
>
>
>> Hi,
>>
>> This is my first time using bioperl for restriction analysis, so please bear 
>> with me, if this is a FAQ.
>>
>> I downloaded withrefm.906 from ftp://ftp.neb.com/pub/rebase/ and created the 
>> script shown at the bottom of the mail.
>> My bioperl version is bioperl-live nightly from 09-Jun-2009.
>>
>> The script throws an exception - see below. But if I comment out the
>> '-enzymes' argument, so that it uses the built-in collection of enzymes, it works.
>>
>> My problem is that I need to use some of the enzymes that are only available
>> in rebase. So how do I get this working?
>>
>> Thanks for your attention.
>>
>> Best regards,
>> Rasmus Ory Nielsen
>>
>>
>> ############################################################
>> Output from the script:
>> ############################################################
>>
>> [roni at ksdhcp ~]$ ./restriction_test.pl
>>
>> --------------------- WARNING ---------------------
>> MSG: The enzyme name CviKI-1 was changed to CviKI-I
>> ---------------------------------------------------
>>
>> ------------- EXCEPTION -------------
>> MSG: Bad end parameter (11). End must be less than the total length of 
>> sequence (total=7)
>> STACK Bio::PrimarySeq::subseq 
>> /usr/local/lib/perl5/site_perl/5.10.0/Bio/PrimarySeq.pm:401
>> STACK Bio::Restriction::Analysis::_enzyme_sites 
>> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:900
>> STACK Bio::Restriction::Analysis::_cuts 
>> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:801
>> STACK Bio::Restriction::Analysis::cut 
>> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:379
>> STACK Bio::Restriction::Analysis::fragment_maps 
>> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:515
>> STACK toplevel ./restriction_test.pl:30
>> -------------------------------------
>>
>> [roni at ksdhcp ~]$
>>
>>
>> ############################################################
>> Output from the script with the '-enzymes' argument commented out
>> ############################################################
>>
>> [roni at ksdhcp ~]$ ./restriction_test.pl
>>
>> --------------------- WARNING ---------------------
>> MSG: The enzyme name CviKI-1 was changed to CviKI-I
>> ---------------------------------------------------
>> $VAR1 = [
>>           {
>>             'seq' => 'CTCGACCGTTAGCAA',
>>             'end' => 15,
>>             'start' => '1'
>>           },
>>           {
>>             'seq' => 'AGCTTTCTACCGTTATCGT',
>>             'end' => 34,
>>             'start' => '16'
>>           }
>>         ];
>> [roni at ksdhcp ~]$
>>
>> ############################################################
>>
>> #!/usr/bin/perl
>> use strict;
>> use warnings;
>> use Bio::PrimarySeq;
>> use Bio::Restriction::IO;
>> use Bio::Restriction::Analysis;
>> use Data::Dumper;
>>
>> # create seq obj
>> my $seqobj = Bio::PrimarySeq->new(
>>     -seq        => 'CTCGACCGTTAGCAAAGCTTTCTACCGTTATCGT',
>>     -primary_id => 'test',
>>     -molecule   => 'dna'
>> );
>>
>> # read rebase file
>> my $rebase_io = Bio::Restriction::IO->new(
>>     -file   => 'withrefm.906',
>>     -format => 'withrefm',
>> );
>> my $rebase_collection = $rebase_io->read;
>>
>> # start restriction analysis
>> my $restriction_analysis = Bio::Restriction::Analysis->new(
>>     -seq     => $seqobj,
>>     -enzymes => $rebase_collection,    # it works with this line commented out
>> );
>>
>> # retrieve fragment maps
>> my @fragment_maps = $restriction_analysis->fragment_maps('HindIII');
>> print Dumper \@fragment_maps;


From Russell.Smithies at agresearch.co.nz  Tue Jun 16 23:21:22 2009
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Wed, 17 Jun 2009 15:21:22 +1200
Subject: [Bioperl-l] Bio::Restriction
	refactor	[Was:Bio::Restriction::Analysis. Exception when
	using rebasefile.]
In-Reply-To: <9B199A62F5A741CCBC0B927D10DF1A0D@NewLife>
References: <4A2F622D.5060500@ron.dk>
	<E80E6C1BC08D4E338739148BFE9BFAC0@NewLife>
	<9B199A62F5A741CCBC0B927D10DF1A0D@NewLife>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF3297FF3E2E4@exchsth.agresearch.co.nz>

Cigarettes are post-coitus and pre-firing squad.
What you'd be needing is a cigar (proud father)

;-)



From e.stupka at ucl.ac.uk  Wed Jun 17 07:29:08 2009
From: e.stupka at ucl.ac.uk (Elia Stupka)
Date: Wed, 17 Jun 2009 12:29:08 +0100
Subject: [Bioperl-l] Next-gen modules
Message-ID: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>

Dear all,

after several years of absence I am slowly coming back to Bioperl, and  
hope to contribute again to its development.

One area that I was thinking of starting from, since we are actively
involved with it, is to improve BioPerl's support for next-gen
sequencing data, tools, etc. Since I am sure I have missed out on a
lot of recent developments, do let me know if/what is useful.

One example that comes to mind is that the conversion of various
formats to/from FASTQ does not seem to be supported. Some code can be
found within Li Heng's script: http://maq.sourceforge.net/fq_all2std.pl
but it would be good if it could make its way into
SeqIO? And similarly, potentially, for other next-gen sequence formats?

Similarly, there seems to be little in bioperl-run to support tools  
that have been developed in this area, such as Maq, BowTie, TopHat, etc?

Do let me know if there is a past thread on this, or other people  
actively developing, etc. so that I can find out what priorities are.

thanks and best regards to all (old friends and new),

Elia

---
Senior Lecturer, Bioinformatics
UCL Cancer Institute
Paul O' Gorman Building
University College London
Gower Street
WC1E 6BT
London
UK

Office (UCL): +44 207 679 6493
Office (ICMS): +44 0207 8822374

Mobile: +44 7597 566 194
Mobile (Italy): +39 338 8448801


From maj at fortinbras.us  Wed Jun 17 08:19:04 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 17 Jun 2009 08:19:04 -0400
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
Message-ID: <4C3D793879C64A5E84C67FE313C86FA4@NewLife>

[ and here's a new wikipage: http://www.bioperl.org/wiki/Nextgen_in_Bioperl ]

From biopython at maubp.freeserve.co.uk  Wed Jun 17 08:21:17 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 17 Jun 2009 13:21:17 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
Message-ID: <320fb6e00906170521m7d997334j321d92fda2da4114@mail.gmail.com>

On Wed, Jun 17, 2009 at 12:29 PM, Elia Stupka<e.stupka at ucl.ac.uk> wrote:
>
> Dear all,
>
> after several years of absence I am slowly coming back to Bioperl, and hope
> to contribute again to its development.
>
> One area that I was thinking of starting from, since we are actively
> involved with it, is to improve BioPerl's support for next-gen sequencing
> data, tools, etc. Since I am sure I have missed out on a lot of recent
> developments, do let me know if/what is useful.
>
> One example that comes to mind is that the conversion of various formats
> to/from FASTQ does not seem to be supported. Some code can be found within
> Li Heng's script: http://maq.sourceforge.net/fq_all2std.pl but it would be
> good if it could make its way into SeqIO? And similarly, potentially, for
> other next-gen sequence formats?

If you do add FASTQ support to BioPerl's SeqIO (and I think that is a
good idea), please could you follow the format names used by Biopython
- as this time we got there first ;)

I'm asking this as Biopython's SeqIO tries to use the same format
names as BioPerl's SeqIO and EMBOSS, see
http://biopython.org/wiki/SeqIO

Specifically,
* "fastq" in Biopython means the original Sanger standard FASTQ files
encoding PHRED qualities using an ASCII offset of 33.
* "fastq-solexa" in Biopython means the early Solexa/Illumina style
FASTQ files which encode Solexa qualities using an ASCII offset of 64.
* "fastq-illumina" in Biopython will mean recent Solexa/Illumina style
FASTQ files (from pipeline version 1.3+) which encode PHRED qualities
using an ASCII offset of 64. This is in the Biopython repository, but
hasn't been released yet - so the name "fastq-illumina" isn't set in
stone yet.

For good quality reads, PHRED and Solexa scores are approximately
equal, so the "fastq-solexa" and "fastq-illumina" variants are almost
equivalent.
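
To make the offsets and the "approximately equal" remark above concrete,
here is a minimal sketch in plain Python (standard library only). It is
not Biopython's or BioPerl's actual code; the function names and the
OFFSETS table are made up for illustration. The Solexa-to-PHRED formula
follows from the two definitions (PHRED uses the error probability p,
Solexa uses the odds p/(1-p)).

```python
import math

# ASCII offsets as described in the list above (illustrative table).
OFFSETS = {"fastq": 33, "fastq-solexa": 64, "fastq-illumina": 64}

def decode_quality(qual_string, fmt):
    """Turn an encoded quality string into a list of integer scores."""
    offset = OFFSETS[fmt]
    return [ord(ch) - offset for ch in qual_string]

def solexa_to_phred(q_solexa):
    """Convert a Solexa score to a PHRED score (as a float).

    PHRED  = -10 * log10(p)
    Solexa = -10 * log10(p / (1 - p))
    which gives PHRED = 10 * log10(10 ** (Solexa / 10) + 1).
    """
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)

# The gap between the two scales shrinks as quality goes up.
for q in (-5, 0, 10, 20, 40):
    print(q, round(solexa_to_phred(q), 3))
```

At a Solexa score of 40 the two scales differ by well under 0.001, while
near zero they differ by about 3, which is why the variants only diverge
noticeably on poor-quality bases.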

> Similarly, there seems to be little in bioperl-run to support tools that
> have been developed in this area, such as Maq, BowTie, TopHat, etc?
>
> Do let me know if there is a past thread on this, or other people actively
> developing, etc. so that I can find out what priorities are.

Have you seen these recent threads?:
http://lists.open-bio.org/pipermail/bioperl-l/2009-May/029970.html
http://lists.open-bio.org/pipermail/bioperl-l/2009-May/029911.html
http://lists.open-bio.org/pipermail/bioperl-l/2009-June/030187.html

Regards,

Peter (at Biopython)

From maj at fortinbras.us  Wed Jun 17 08:02:11 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 17 Jun 2009 08:02:11 -0400
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
Message-ID: <92C15E3391F64BAF801754E924122540@NewLife>

Elia--
I say a definite +1; in fact, this sounds like it should be a Hot Topic 
(see http://www.bioperl.org/wiki/Category:Hot_Topics for some others
you might have missed in your hiatus...). I will create a page that 
can be a central point for wish lists, discussion, etc.

There has been much discussion of late about FASTQ 
http://lists.open-bio.org/pipermail/bioperl-l/2009-June/030187.html
http://lists.open-bio.org/pipermail/bioperl-l/2009-May/029970.html
http://lists.open-bio.org/pipermail/bioperl-l/2009-May/029911.html
http://lists.open-bio.org/pipermail/bioperl-l/2009-April/029765.html

cheers from a newbie, 
Mark


From cjfields at illinois.edu  Wed Jun 17 08:57:52 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 17 Jun 2009 07:57:52 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
Message-ID: <BF526BE6-AFA8-4F66-9665-D6328F9B4FFA@illinois.edu>

Elia,

As Mark indicated, we recently discussed the lack of support for
next-gen on list, at least re: fastq.  I may be hit with the same thing in
a few months' time myself, and I recall Jason and a few others also
mentioning the same.  Heikki wrote some code for Illumina FASTQ for
SeqIO and related modules but I don't believe it has been committed to
trunk yet, so maybe he can answer.

 From prior discussions IIRC the issues were:

1) distinguishing the various FASTQ versions (Sanger, Illumina 1.0,  
Illumina 1.3) from one another (so maybe some optional validation), and
2) having a way for the Seq object to either 'know' what format is  
contained, or we use phred score and convert back and forth from that  
(I think the latter makes more sense).

Peter's suggestions also are reasonable, though does biopython have a  
separate module for each of these variations?  Our version (I believe)  
mainly varied the conversion within Bio::SeqIO::fastq itself based on  
the fastq variant passed in as a separate named argument.

As for the wrappers, we would most certainly welcome them!

chris



From e.stupka at ucl.ac.uk  Wed Jun 17 08:54:22 2009
From: e.stupka at ucl.ac.uk (Elia Stupka)
Date: Wed, 17 Jun 2009 13:54:22 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <92C15E3391F64BAF801754E924122540@NewLife>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<92C15E3391F64BAF801754E924122540@NewLife>
Message-ID: <E59DEB0A-9D02-4922-A300-0F81B809D6D4@ucl.ac.uk>

Dear Mark,

thanks a lot for the pointers.

With regards to FASTQ parsing:

-my understanding from reading past threads is that we work on a single
format, i.e. FASTQ, and interpret the quality "flavours" as just
quality conversions, right?

-However, I assume we would still want to support a simple way for the  
user to say format => 'fastq-solexa' using the nomenclature adopted in  
BioPython suggested by Peter, right?

-I also saw Heikki's "long essay", but did not yet compare to Heng  
Li's code at http://maq.sourceforge.net/fq_all2std.pl, I guess we  
would hope they would produce identical outputs, will be a good check.

Finally, I saw Tristan's reply to Heikki's thread, so what is the  
status quo? Is it moving forward?

cheers

Elia




From biopython at maubp.freeserve.co.uk  Wed Jun 17 09:25:59 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 17 Jun 2009 14:25:59 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <BF526BE6-AFA8-4F66-9665-D6328F9B4FFA@illinois.edu>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<BF526BE6-AFA8-4F66-9665-D6328F9B4FFA@illinois.edu>
Message-ID: <320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com>

On Wed, Jun 17, 2009 at 1:57 PM, Chris Fields<cjfields at illinois.edu> wrote:
>
> Elia,
>
> As Mark indicated, we recently discussed the lack of support for next-gen on
> list, at least re: fastq. I may be hit with the same thing in a few months
> time myself, and I recall Jason and a few others also mentioning the same.
> Heikki wrote some code for Illumina FASTQ for SeqIO and related modules but
> I don't believe it has been committed to trunk yet, so maybe he can answer.
>
> From prior discussions IIRC the issues were:
>
> 1) distinguishing the various FASTQ versions (Sanger, Illumina 1.0, Illumina
> 1.3) from one another (so maybe some optional validation), and

Following the python rule of thumb for being explicit, Biopython makes
the user specify which FASTQ variant is being used. I don't think you
can do anything else. Any attempted validation would have to be
heuristic based on the ASCII characters found, and would risk false
positive warnings.
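
A small sketch of why such validation can only be heuristic, written in
plain Python for illustration. The function name is hypothetical, and
the thresholds rest on two assumptions not stated in this thread: that
Solexa scores go no lower than -5 (ASCII 59 with offset 64) and that
PHRED scores are never negative. The lowest character seen can rule
variants out, but it can never confirm one.

```python
def possible_variants(quality_strings):
    """Return the FASTQ variants consistent with the lowest quality character."""
    lowest = min(ord(ch) for qual in quality_strings for ch in qual)
    if lowest < 33:
        raise ValueError("character below ASCII 33 is not valid in any variant")
    if lowest < 59:
        # Too low for an offset of 64, even allowing Solexa scores down to -5.
        return ["fastq"]
    if lowest < 64:
        # Would be a negative score with offset 64, so not PHRED+64.
        return ["fastq", "fastq-solexa"]
    # High-quality data: every variant is still possible.
    return ["fastq", "fastq-solexa", "fastq-illumina"]

print(possible_variants(["IIIIIIII9IG9IC"]))
print(possible_variants(["hhhhhhhhhhhh"]))
```

A file of uniformly good reads falls into the last branch, so asking the
user to name the variant explicitly remains the only reliable option.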

> 2) having a way for the Seq object to either 'know' what format is
> contained, or we use phred score and convert back and forth from that (I
> think the latter makes more sense).

I think it could make sense for BioPerl to convert Solexa scores to/from
PHRED scores on the fly (especially now that Illumina is abandoning
the Solexa score system). Python style tries to avoid implicit conversions,
so Biopython doesn't automatically do a conversion from Solexa to
PHRED scores on parsing (but will on writing if the requested output
format requires this).
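
The "convert on writing" step can be sketched as a re-encoding of the
quality string, here from the early Solexa style (Solexa scores, offset
64) to the Sanger style (PHRED scores, offset 33). This is plain Python
for illustration only, with a made-up function name; rounding to the
nearest integer is an assumption about how one might handle the
fractional result, not a description of what Biopython does.

```python
import math

def solexa_qual_to_sanger(qual_string):
    """Re-encode a Solexa+64 quality string as a PHRED+33 quality string."""
    out = []
    for ch in qual_string:
        q_solexa = ord(ch) - 64                          # decode Solexa score
        q_phred = 10 * math.log10(10 ** (q_solexa / 10) + 1)
        out.append(chr(int(round(q_phred)) + 33))        # encode PHRED score
    return "".join(out)

# 'h' is Solexa 40, '@' is Solexa 0, ';' is Solexa -5.
print(solexa_qual_to_sanger("h@;"))
```

The conversion loses information at the low end, since several Solexa
scores can round to the same PHRED score.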

> Peter's suggestions also are reasonable, though does biopython have a
> separate module for each of these variations? Our version (I believe)
> mainly varied the conversion within Bio::SeqIO::fastq itself based on the
> fastq variant passed in as a separate named argument.

Biopython's SeqIO gives the three FASTQ variants their own unique
names. This format name is a required argument for parsing/writing
(we don't try and guess the file format from the data contents). Internally
we have three separate FASTQ parsers/writers although they do share
code.

Other issues to keep in mind:

(3) There should be no warning parsing files where the optional repeated
title is missing on the "+" lines (as discussed earlier on the BioPerl list).

(4) When writing FASTQ files should BioPerl omit the optional repeated
title on the "+" line? Biopython omits this as I understand this to be
common practice, and can make a big difference to file sizes - especially
on short read data from Solexa/Illumina.

(5) Also test reading and writing files with an optional description (as well
as an identifier) on the "@" (and "+") lines. See the NCBI SRA for examples,
e.g.

@SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
GGGTGATGGCCGCTGCCGATGGCGTCAAATCCCACC
+SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
IIIIIIIIIIIIIIIIIIIIIIIIIIIIII9IG9IC


(6) Test reading and writing files where the encoded quality string starts
with a "@" or a "+" character, e.g.
http://lists.open-bio.org/pipermail/bioperl-l/2009-May/029911.html
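
One way to cope with issues (3) and (6) is to stop treating "@" as a record
boundary and instead read quality lines until as many characters as
sequence letters have been collected. The following is a rough Python
sketch of that idea under that assumption; parse_fastq is a hypothetical
name, and this is not how Bio::SeqIO::fastq or Biopython is actually
implemented.

```python
def parse_fastq(lines):
    """Yield (title, sequence, quality) tuples from an iterable of lines."""
    it = iter(lines)
    for line in it:
        line = line.rstrip("\r\n")
        if not line:
            continue  # tolerate blank lines between records
        if not line.startswith("@"):
            raise ValueError("expected '@' title line, got %r" % line)
        title = line[1:]

        # Sequence lines run until the '+' separator line.
        seq_parts = []
        for line in it:
            line = line.rstrip("\r\n")
            if line.startswith("+"):
                plus_title = line[1:]
                break
            seq_parts.append(line)
        else:
            raise ValueError("missing '+' line for record %r" % title)
        seq = "".join(seq_parts)

        # The repeated title is optional; only complain if it disagrees.
        if plus_title and plus_title != title:
            raise ValueError("'+' title does not match '@' title %r" % title)

        # Read quality by length, so a leading '@' or '+' is harmless.
        qual_parts = []
        qual_len = 0
        while qual_len < len(seq):
            try:
                line = next(it).rstrip("\r\n")
            except StopIteration:
                raise ValueError("truncated quality for %r" % title) from None
            qual_parts.append(line)
            qual_len += len(line)
        qual = "".join(qual_parts)
        if len(qual) != len(seq):
            raise ValueError("quality/sequence length mismatch in %r" % title)
        yield title, seq, qual
```

Counting characters rather than lines also handles line-wrapped records,
at the cost of rejecting files whose quality and sequence lengths differ.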

Peter


From tristan.lefebure at gmail.com  Wed Jun 17 09:27:12 2009
From: tristan.lefebure at gmail.com (Tristan Lefebure)
Date: Wed, 17 Jun 2009 09:27:12 -0400
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <92C15E3391F64BAF801754E924122540@NewLife>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<92C15E3391F64BAF801754E924122540@NewLife>
Message-ID: <200906170927.13273.tristan.lefebure@gmail.com>

Hello,
Regarding next-gen sequences and bioperl, in my 
experience another issue is bioperl speed. For example, if 
you want to trim bad quality bases at ends of 1E6 Solexa 
reads using Bio::SeqIO::fastq and some methods in 
Bio::Seq::Quality, well, you've got to be patient (but 
maybe I missed some shortcuts...).

A pure perl solution will be between 100 and 1000x faster... 
Would it be possible to have an ultra-light quality object 
with few simple methods for next-gen reads?
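
As a rough illustration of what such a light-weight routine might look
like, here is a sketch in Python (not BioPerl) that trims on plain strings
without building any objects. It assumes Sanger-style encoding (ASCII
offset 33) unless told otherwise; the function name, the default threshold
and the parameter names are all made up for this example.

```python
def trim_low_quality(seq, qual, min_score=20, offset=33):
    """Trim bases scoring below min_score from both ends of a read.

    seq and qual are plain strings of equal length; offset is the ASCII
    offset of the quality encoding (33 for Sanger, 64 for Solexa/Illumina).
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ")
    scores = [ord(c) - offset for c in qual]
    start, end = 0, len(scores)
    # Walk in from the left, then from the right, past low-scoring bases.
    while start < end and scores[start] < min_score:
        start += 1
    while end > start and scores[end - 1] < min_score:
        end -= 1
    return seq[start:end], qual[start:end]


if __name__ == "__main__":
    print(trim_low_quality("ACGTACGT", "##IIII##"))
```

No claim is made here about how much faster this is than the object-based
route; it only shows how little state the operation needs.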

I can contribute some tests if that sounds like an important 
point.

-Tristan


On Wednesday 17 June 2009 08:02:11 Mark A. Jensen wrote:
> Elia--
> I say a definite +1; in fact, this sounds like it should
> be a Hot Topic (see
> http://www.bioperl.org/wiki/Category:Hot_Topics for some
> others you might have missed in your hiatus...). I will
> create a page that can be a central point for wish lists,
> discussion, etc.
>
> There has been much discussion of late about FASTQ:
> http://lists.open-bio.org/pipermail/bioperl-l/2009-June/030187.html
> http://lists.open-bio.org/pipermail/bioperl-l/2009-May/029970.html
> http://lists.open-bio.org/pipermail/bioperl-l/2009-May/029911.html
> http://lists.open-bio.org/pipermail/bioperl-l/2009-April/029765.html
>
> cheers from a newbie,
> Mark
>
> ----- Original Message -----
> From: "Elia Stupka" <e.stupka at ucl.ac.uk>
> To: <bioperl-l at lists.open-bio.org>
> Sent: Wednesday, June 17, 2009 7:29 AM
> Subject: [Bioperl-l] Next-gen modules
>
> > Dear all,
> >
> > after several years of absence I am slowly coming back
> > to Bioperl, and hope to contribute again to its
> > development.
> >
> > One area that I was thinking of starting from, since we
> > are actively involved with it, is to improve Bioperl's
> > support for next-gen sequencing data, tools, etc. Since
> > I am sure I have missed out on a lot of recent
> > developments, do let me know if/what is useful.
> >
> > One example that comes to mind is that the conversion
> > of various formats to/from FASTQ does not seem to be
> > supported. Some code can be found within Li Heng's
> > script: http://maq.sourceforge.net/fq_all2std.pl but
> > it would be good if it could make its way into SeqIO?
> > And similarly, potentially, for other next-gen sequence
> > formats?
> >
> > Similarly, there seems to be little in bioperl-run to
> > support tools that have been developed in this area,
> > such as Maq, BowTie, TopHat, etc?
> >
> > Do let me know if there is a past thread on this, or
> > other people actively developing, etc. so that I can
> > find out what priorities are.
> >
> > thanks and best regards to all (old friends and new),
> >
> > Elia
> >
> > ---
> > Senior Lecturer, Bioinformatics
> > UCL Cancer Institute
> > Paul O' Gorman Building
> > University College London
> > Gower Street
> > WC1E 6BT
> > London
> > UK
> >
> > Office (UCL): +44 207 679 6493
> > Office (ICMS): +44 0207 8822374
> >
> > Mobile: +44 7597 566 194
> > Mobile (Italy): +39 338 8448801
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From biopython at maubp.freeserve.co.uk  Wed Jun 17 09:54:45 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 17 Jun 2009 14:54:45 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <E59DEB0A-9D02-4922-A300-0F81B809D6D4@ucl.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<92C15E3391F64BAF801754E924122540@NewLife>
	<E59DEB0A-9D02-4922-A300-0F81B809D6D4@ucl.ac.uk>
Message-ID: <320fb6e00906170654m735dc054iaf94fa2f86647002@mail.gmail.com>

On Wed, Jun 17, 2009 at 1:54 PM, Elia Stupka<e.stupka at ucl.ac.uk> wrote:
>
> Dear Mark,
>
> thanks a lot for the pointers.
>
> With regards to FASTQ parsing:
>
> -my understanding by reading past threads is to work on a single format,
> i.e. FASTQ and to interpret the quality "flavours" as just quality
> conversions, right?
> -However, I assume we would still want to support a simple way for the user
> to say format => 'fastq-solexa' using the nomenclature adopted in BioPython
> suggested by Peter, right?

I think you will need a way for the user to say they have a Solexa, or
an Illumina 1.3+, or an original Sanger standard FASTQ file.

From reading the http://bioperl.org/wiki/HOWTO:SeqIO wiki page, I
assumed BioPerl's SeqIO just had formats (e.g. the "chadoxml" format
and the variant "flybase_chadoxml" format). Does BioPerl's SeqIO format
system have any concept of flavour that I am not aware of?

> -I also saw Heikki's "long essay", but did not yet compare to Heng Li's code
> at http://maq.sourceforge.net/fq_all2std.pl, I guess we would hope they
> would produce identical outputs, will be a good check.

Heng Li's code at http://maq.sourceforge.net/fq_all2std.pl is a useful
guide (although it doesn't yet cope with the new Illumina 1.3+ variant),
but I don't trust it 100%. See e.g.
http://lists.open-bio.org/pipermail/biopython/2009-June/005208.html
http://lists.open-bio.org/pipermail/biopython/2009-June/005209.html

Peter

From john.marshall at sanger.ac.uk  Wed Jun 17 09:28:12 2009
From: john.marshall at sanger.ac.uk (John Marshall)
Date: Wed, 17 Jun 2009 14:28:12 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
Message-ID: <76D5EDD5-6217-438E-87A5-1B7571D14FFE@sanger.ac.uk>

On 17 Jun 2009, at 12:29, Elia Stupka wrote:
> Similarly, there seems to be little in bioperl-run to support tools  
> that have been developed in this area, such as Maq, BowTie, TopHat,  
> etc?

FYI I have a Bio::Tools::Run::Velvet wrapper [1] that I plan to submit  
in the not too distant future.  (First it needs some "blah blah"  
replaced with actual documentation and a test suite.)

Cheers,

     John

[1] http://www.ebi.ac.uk/~zerbino/velvet/



From Kevin.M.Brown at asu.edu  Wed Jun 17 11:41:18 2009
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Wed, 17 Jun 2009 08:41:18 -0700
Subject: [Bioperl-l] Alignment->slice() issue?
In-Reply-To: <cfebe0020906160106s5f76501au9d67d8ee3062d166@mail.gmail.com>
References: <cfebe0020906160106s5f76501au9d67d8ee3062d166@mail.gmail.com>
Message-ID: <1A4207F8295607498283FE9E93B775B4060B9718@EX02.asurite.ad.asu.edu>

Warning: This is very ugly code and makes a few assumptions, such as that
the alignment objects are stored in order of their start position. I made
this assumption because that is how I put them into the object to begin with.

=head1 C<slice>

Function to slice up an alignment based on start and end parameters.
The new alignment object is stored through the C<$new_align> reference;
the function itself returns 1.

slice(\$alignment, $start, $end, \$new_align)

=cut

use POSIX qw(floor);           # floor() for the binary-search midpoint
use List::Util qw(max min);    # max()/min() for clamping coordinates
use Bio::SimpleAlign;
use Bio::LocatableSeq;

sub slice
{
	# $alignment and $new_align are references to scalars holding
	# Bio::SimpleAlign objects (hence the $$ dereferences below)
	my ($alignment, $start, $end, $new_align) = @_;

	$$new_align = new Bio::SimpleAlign;
	print $$alignment->no_sequences() . "\n";

	# the first sequence is always kept, cut down to the requested window
	$$new_align->add_seq(
		new Bio::LocatableSeq(
			-seq => substr(
				$$alignment->get_seq_by_pos(1)->seq(),
				$start - 1, $end - $start + 1
			),
			-id    => $$alignment->get_seq_by_pos(1)->display_id(),
			-start => max($$alignment->get_seq_by_pos(1)->start - $start + 1, 1),
			-end   => min(
				$$alignment->get_seq_by_pos(1)->end - $start + 1,
				$end - $start + 1
			),
			-alphabet => 'dna',
			-strand   => $$alignment->get_seq_by_pos(1)->strand()
		)
	);

	# implement a binary search to determine a decent offset into the alignment
	my $probe;

	if ($$alignment->no_sequences() <= 2) {
		$probe = $$alignment->no_sequences();
	}
	else {
		my ($L, $R) = (1, $$alignment->no_sequences());
		while (($R - $L) > 1)
		{
			$probe = floor(($R + $L) / 2);

			# gotta watch this.  Had the check backwards and so was never going
			# in the right direction for the search.  If I reverse these two
			# variables, then I have to either reverse the conditions or change
			# the > to a <.
			if ($$alignment->get_seq_by_pos($probe)->start() > $start)
			{
				$R = $probe;
			}
			else
			{
				$L = $probe;
			}
		}
	}
	# now go through the results that are after that point
	for (my $i = $probe ; $i <= $$alignment->no_sequences() ; $i++)
	{
		my $seq = $$alignment->get_seq_by_pos($i);
		last if ($seq->start() > $end);

		# Only concern ourselves with primers that land inside the desired region
		# other primers will show up in the image maps for each gene.
		if ($seq->start() >= $start && $seq->end() <= $end)
		{

			# values for the substr pullout of a given sequence
			# (not used below: a sequence that passes the test above lies
			# wholly inside the region, so the full $seq->seq() is kept)
			my $offset = max($start - $seq->start(), 0);
			my $length =
			  min($end, $seq->end()) - max($start, $seq->start()) + 1;
			$$new_align->add_seq(
				new Bio::LocatableSeq(
					-seq      => $seq->seq(),
					-id       => $seq->display_id(),
					-start    => max($seq->start - $start + 1, 1),
					-end      => min($seq->end - $start + 1, $end - $start + 1),
					-alphabet => 'dna',
					-strand   => $seq->strand()
				)
			);
		}
	}
	return 1;
}
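
The lookup step in slice() depends on the entries being sorted by start
position. As a language-neutral illustration of that step only, here is a
small Python sketch that works on plain (start, end, id) tuples instead of
Bio::SimpleAlign objects; the function name and the tuple layout are
assumptions made for this example. It uses the standard bisect module to
find the first entry starting at or after the window start, then keeps
entries lying wholly inside the window and shifts them to window-relative
coordinates.

```python
from bisect import bisect_left


def features_within(features, start, end):
    """Return features lying wholly inside [start, end], re-based to 1.

    features must be a list of (start, end, id) tuples sorted by start.
    """
    # In real use the list of starts would be built once and reused.
    starts = [f[0] for f in features]
    first = bisect_left(starts, start)  # first feature with f.start >= start
    selected = []
    for f_start, f_end, f_id in features[first:]:
        if f_start > end:
            break  # sorted input: nothing further can overlap the window
        if f_end <= end:
            selected.append((f_start - start + 1, f_end - start + 1, f_id))
    return selected


if __name__ == "__main__":
    primers = [(1, 10, "a"), (5, 12, "b"), (20, 30, "c"), (25, 50, "d")]
    print(features_within(primers, 5, 40))
```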

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Malcolm Cook
> Sent: Tuesday, June 16, 2009 1:07 AM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Alignment->slice() issue?
> 
> Kevin,
> 
> I'm getting struck by this old issue you once coded around.
> 
>       http://bioperl.org/pipermail/bioperl-l/2007-January/024665.html
> 
> Any chance you could share your implementation with  fellow 
> traveller...
> 
> ??
> 
> Thanks,
> 
> Malcolm Cook
> Stowers Institute for Medical Research
> 


From maj at fortinbras.us  Wed Jun 17 12:47:38 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 17 Jun 2009 12:47:38 -0400
Subject: [Bioperl-l] bioperl-dev or branch? : redux
In-Reply-To: <D719EC86-CB16-4875-833B-2818C26030C8@duke.edu>
References: <991fb8210905210826v2a7990c0u90fcb3256f54b7d7@mail.gmail.com>
	<D719EC86-CB16-4875-833B-2818C26030C8@duke.edu>
Message-ID: <6DF025D32D664F61BC64B49184A2E6DD@NewLife>


Hi All, 

I thought I'd revisit this thread, since in the last couple of weeks I
have used both techniques (bioperl-dev and branch from trunk) to
produce completed projects. My thoughts:

Using bioperl-dev was very nice for creating Bio::Search::Tiling, a
new addition to the core api. There was no pressure to conform to the
existing api there. In particular, there was no implicit insistence to
make things work through Bio::Search::Utils, and I was free to factor
it out. The Tiling api was definitely unstable until the end, when it
was ported to the core. As I made regular reports to bioperl-l,
everything was transparent and up front, and I received excellent
suggestions there (as usual). 

For Bio::Restriction, using the branch was just as natural. Here, the
existing structure was well established, and all the work needed to
happen beneath the api. All old t/Restriction tests needed to pass,
and additional ones had to be created for the new functionality. So here,
using bioperl-dev wasn't natural, even though some "experiments" needed to
be tried (some succeeded and some failed, as you can see in the
commentary at Bug #2855). Even though the new code turned out to
require substantial effort, the effort was required to fix a true bug
in the working core, and any fixes needed to work transparently with
respect to the users for whom this bug had not been an issue. Using
the branch made it relatively easy to merge quickly back into the core
when done, and an open branch also provides a certain psychological
pressure, which is helpful.

Hilmar raised the very good point in the previous discussion that
(essentially) bioperl-dev shouldn't become a sandbox with lots of
unfinished code scraps and derelict stuff that doesn't work. My view
is bioperl-dev will become a sandbox only if we treat it like
one. I've filled out the Bioperl-dev page on the wiki
(http://www.bioperl.org/wiki/Bioperl-dev) with this in mind. Providing
some recognition to devs there whose modules become part of the
core may be a better way to ensure that projects that are started on
bioperl-dev actually get finished, than to prescribe beforehand what
kinds of projects may get started. I believe this follows the adage of
liberality on what is accepted, and strictness on what is emitted.

cheers, 
MAJ


----- Original Message ----- 
From: "Hilmar Lapp" <hlapp at duke.edu>
To: "Chase Miller" <chmille4 at gmail.com>
Cc: "BioPerl List" <bioperl-l at lists.open-bio.org>
Sent: Thursday, May 21, 2009 4:00 PM
Subject: Re: [Bioperl-l] bioperl-dev or branch?


> Moving this question to the BioPerl list, which is where we need to  
> discuss this I think. Can someone refresh my memory on what the  
> Bioperl-dev repository is or was meant for? It doesn't seem documented  
> on the wiki.
> 
> My (admittedly vague) recollection is that bioperl-dev is basically  
> for highly experimental changes or functionality.
> 
> I'm not clear why everything else shouldn't go either into the main  
> trunk or into a branch. If there is a realistic expectation for  
> something to be folded into the main trunk sooner or later, what would  
> be the reasons for not putting it into a branch of the main  
> repository? If we are putting it into a separate repository, we're  
> waiving a lot of svn's support for merging and resolving concurrent  
> edits.
> 
> I would actually go a step further and suggest that even if  
> this GSoC project starts out on a branch (which I can see good reasons  
> for, such as eliminating the fear of disrupting something), there should  
> be a plan to move to main trunk before the end of the project. We've had a  
> good tradition in BioPerl of developing directly on the main trunk. It  
> sometimes leads to occasional disruptions when lots of tests seem to be  
> failing, but it also encourages development discipline and lets new  
> code melt into the BioPerl code base without requiring any extra  
> steps by someone.
> 
> Any and all thoughts or comments welcome and appreciated!
> 
> -hilmar
> 
> On May 21, 2009, at 11:26 AM, Chase Miller wrote:
> 
>> This brings me to a question about where I should have my code  
>> repository.  Originally, I was going to use Bioperl-dev, but it was  
>> brought to my attention that that repository does not normally  
>> receive daily updates and it might not be the right place for my day  
>> to day development.  An alternative would be to use something like  
>> google code on a daily basis and commit to Bioperl-dev on a weekly  
>> basis.
> 
> -- 
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:- hlapp at duke dot edu :
> ===========================================================
> 
> 
> 
> 
> 
>

From cjfields at illinois.edu  Wed Jun 17 13:06:44 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 17 Jun 2009 12:06:44 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<BF526BE6-AFA8-4F66-9665-D6328F9B4FFA@illinois.edu>
	<320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com>
Message-ID: <1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>


On Jun 17, 2009, at 8:25 AM, Peter wrote:

> On Wed, Jun 17, 2009 at 1:57 PM, Chris Fields<cjfields at illinois.edu> wrote:
>>
>> Elia,
>>
>> As Mark indicated, we recently discussed the lack of support for next-gen on
>> list, at least re: fastq.  I may be hit with the same thing in a few months
>> time myself, and I recall Jason and a few others also mentioning the same.
>>  Heikki wrote some code for Illumina FASTQ for SeqIO and related modules but
>> I don't believe it has been committed to trunk yet, so maybe he can answer.
>>
>> From prior discussions IIRC the issues were:
>>
>> 1) distinguishing the various FASTQ versions (Sanger, Illumina 1.0, Illumina
>> 1.3) from one another (so maybe some optional validation), and
>
> Following the python rule of thumb for being explicit, Biopython makes
> the user specify which FASTQ variant is being used. I don't think you
> can do anything else. Any attempted validation would have to be
> heuristic based on the ASCII characters found, and would risk false
> positive warnings.

Right; I'm thinking along the same lines.  If anything the most we  
would allow is some level of validation, so if there were a degree of  
uncertainty about the format one could set a validation flag to check  
bounds during the parse and warn if they are exceeded.
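
A rough sketch of what such an optional bounds check could look like, in
Python rather than BioPerl. The ASCII ranges below are the commonly quoted
ones for the three variants (Sanger: offset 33, PHRED 0 to 93; Solexa:
offset 64, scores -5 to 62; Illumina 1.3+: offset 64, PHRED 0 to 62); the
variant keys and the function name are assumptions for this example.

```python
import warnings

# Allowed ASCII code ranges (inclusive) per FASTQ variant.
ASCII_BOUNDS = {
    "sanger": (33, 126),
    "solexa": (59, 126),
    "illumina": (64, 126),
}


def check_quality_bounds(quality, variant):
    """Warn and return False if quality has characters outside the range."""
    low, high = ASCII_BOUNDS[variant]
    bad = sorted({c for c in quality if not low <= ord(c) <= high})
    if bad:
        warnings.warn(
            "quality characters %r are outside the %s range" % (bad, variant)
        )
    return not bad


if __name__ == "__main__":
    print(check_quality_bounds("IIII9IG9IC", "sanger"))
    print(check_quality_bounds("IIII9IG9IC", "illumina"))
```

A check like this can only reject a declared variant; a string that stays
within 64 to 126 is valid under all three, so it cannot identify the
variant on its own.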

>> 2) having a way for the Seq object to either 'know' what format is
>> contained, or we use phred score and convert back and forth from that (I
>> think the latter makes more sense).
>
> I think it could make sense for BioPerl to convert Solexa scores to/from
> PHRED scores on the fly (especially now that Illumina is abandoning
> the Solexa score system). Python style tries to avoid implicit conversions,
> so Biopython doesn't automatically do a conversion from Solexa to
> PHRED scores on parsing (but will on writing if the requested output
> format requires this).
>
>> Peter's suggestions also are reasonable, though does biopython have a
>> separate module for each of these variations?  Our version (I believe)
>> mainly varied the conversion within Bio::SeqIO::fastq itself based on the
>> fastq variant passed in as a separate named argument.
>
> Biopython's SeqIO gives the three FASTQ variants their own unique
> names. This format name is a required argument for parsing/writing
> (we don't try and guess the file format from the data contents). Internally
> we have three separate FASTQ parsers/writers although they do share
> code.

We could easily do the same if others agree.  Actually, if we  
specified that shorthand for a variant on a format would be designated  
as -format => 'format-variant', I think we could easily hack SeqIO to  
deal with that by splitting on '-' and passing everything to the  
constructor as (-format => 'format', -variant => 'variant').  Very  
little repeated code in this case, just an additional named parameter  
indicating the format variant (and the SeqIO class can do the type  
checking on that within the constructor).
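
The splitting step is small enough to sketch. The following is Python, not
the eventual SeqIO change, and split_format is a hypothetical helper name.
It splits on the first '-' only, so names that use an underscore (such as
'flybase_chadoxml') pass through unchanged.

```python
def split_format(format_name):
    """Split 'format-variant' into (format, variant); variant may be None."""
    fmt, _, variant = format_name.lower().partition("-")
    return fmt, (variant or None)


if __name__ == "__main__":
    for name in ("fastq-solexa", "fastq", "flybase_chadoxml"):
        print(name, split_format(name))
```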

> Other issues to keep in mind:
>
> (3) There should be no warning parsing files where the optional repeated
> title is missing on the "+" lines (as discussed earlier on the BioPerl list).

Agreed, though we'll have to check the current fastq parser to see if  
that's currently the case.  I thought that was fixed but maybe not?

> (4) When writing FASTQ files should BioPerl omit the optional repeated
> title on the "+" line? Biopython omits this as I understand this to be
> common practice, and can make a big difference to file sizes - especially
> on short read data from Solexa/Illumina.

Agreed, particularly if it's commonly encountered.
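
To make the size argument concrete, here is a small Python sketch of a
record writer with the repeated title off by default. The function name
and the repeat_title flag are invented for illustration and do not
correspond to an existing BioPerl or Biopython option.

```python
def format_fastq(title, seq, qual, repeat_title=False):
    """Return one FASTQ record as a four-line string."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ")
    plus = "+" + title if repeat_title else "+"
    return "@%s\n%s\n%s\n%s\n" % (title, seq, plus, qual)


if __name__ == "__main__":
    title = "read1 some description"
    short = format_fastq(title, "ACGT", "IIII")
    full = format_fastq(title, "ACGT", "IIII", repeat_title=True)
    # The saving per record is exactly the length of the title.
    print(len(full) - len(short), len(title))
```

For short reads the title can be as long as the sequence itself, so
dropping the repeat removes a sizeable fraction of each record.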

> (5) Also test reading and writing files with an optional description (as well
> as an identifier) on the "@" (and "+") lines. See the NCBI SRA for examples,
> e.g.
>
> @SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
> GGGTGATGGCCGCTGCCGATGGCGTCAAATCCCACC
> +SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
> IIIIIIIIIIIIIIIIIIIIIIIIIIIIII9IG9IC

Should be easy enough to implement with a simple regex.
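
Something along these lines, shown in Python for brevity (the pattern
itself carries over to Perl more or less unchanged). It assumes the
identifier is everything up to the first whitespace and the rest of the
line is the description; split_title is a made-up name.

```python
import re

# '@', then a non-whitespace identifier, then an optional description.
TITLE_RE = re.compile(r"^@(\S+)(?:\s+(.*))?$")


def split_title(line):
    """Split a FASTQ '@' line into (identifier, description)."""
    match = TITLE_RE.match(line.rstrip("\r\n"))
    if match is None:
        raise ValueError("not a FASTQ title line: %r" % line)
    return match.group(1), match.group(2) or ""


if __name__ == "__main__":
    print(split_title("@SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36"))
```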

> (6) Test reading and writing files where the encoded quality string starts
> with a "@" or a "+" character, e.g.
> http://lists.open-bio.org/pipermail/bioperl-l/2009-May/029911.html
>
> Peter

Mark, getting all that? ;>

chris


From cjfields at illinois.edu  Wed Jun 17 13:09:54 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 17 Jun 2009 12:09:54 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <200906170927.13273.tristan.lefebure@gmail.com>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<92C15E3391F64BAF801754E924122540@NewLife>
	<200906170927.13273.tristan.lefebure@gmail.com>
Message-ID: <F8B12CF8-7A35-430F-A233-6DBA2992D19F@illinois.edu>


On Jun 17, 2009, at 8:27 AM, Tristan Lefebure wrote:

> Hello,
> Regarding next-gen sequences and bioperl, in my
> experience another issue is bioperl speed. For example, if
> you want to trim bad quality bases at ends of 1E6 Solexa
> reads using Bio::SeqIO::fastq and some methods in
> Bio::Seq::Quality, well, you've got to be patient (but
> maybe I missed some shortcuts...).

The key issues affecting speed in bioperl are contained object  
instantiation and inheritance (and between those two, the latter much  
more so as it plays a role with contained objects as well as the  
container).

http://www.bioperl.org/wiki/Why_BioPerl_is_slow

Moose/Perl6 roles/traits are one way around that issue, but we are a  
ways off from getting that running.  I think to get that working  
decently would be a from-ground-up endeavor (see my past posts on  
biomoose/bioperl6).

> A pure perl solution will be between 100 and 1000x faster...
> Would it be possible to have an ultra-light quality object
> with few simple methods for next-gen reads?
>
> I can contribute some tests if that sounds like an important
> point.
>
> -Tristan

The quality objects themselves I don't think are that heavy; I think  
the main impediment is inheritance.  One could get around that a bit  
by using a direct_new method to create a blessed hash directly, then  
reimplement methods to lazily create any objects contained on the fly.
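
The pattern, sketched loosely in Python since the idea is not
Perl-specific: a class-level fast constructor that skips the normal
initialisation path, plus an accessor that builds the contained object
only on first use. All class and method names here are hypothetical
stand-ins, not BioPerl classes.

```python
class Quality:
    """Stand-in for a heavier contained object."""

    def __init__(self, encoded, offset=33):
        self.scores = [ord(c) - offset for c in encoded]


class LightRead:
    __slots__ = ("read_id", "seq", "_qual_str", "_quality")

    def __init__(self, read_id, seq, qual_str):
        # Normal path: validates, and builds the Quality object up front.
        if len(seq) != len(qual_str):
            raise ValueError("sequence and quality lengths differ")
        self.read_id, self.seq, self._qual_str = read_id, seq, qual_str
        self._quality = Quality(qual_str)

    @classmethod
    def direct_new(cls, read_id, seq, qual_str):
        # Fast path: bypass __init__ and just fill in the slots.
        self = cls.__new__(cls)
        self.read_id, self.seq, self._qual_str = read_id, seq, qual_str
        self._quality = None
        return self

    @property
    def quality(self):
        # Contained object is created lazily, on first access only.
        if self._quality is None:
            self._quality = Quality(self._qual_str)
        return self._quality
```

Reads that are only filtered on their sequence never pay for a Quality
object at all, which is the point of the lazy accessor.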

chris


From bill at genenformics.com  Wed Jun 17 13:03:16 2009
From: bill at genenformics.com (bill at genenformics.com)
Date: Wed, 17 Jun 2009 10:03:16 -0700
Subject: [Bioperl-l] Alignment->slice() issue?
In-Reply-To: <1A4207F8295607498283FE9E93B775B4060B9718@EX02.asurite.ad.asu.edu>
References: <cfebe0020906160106s5f76501au9d67d8ee3062d166@mail.gmail.com>
	<1A4207F8295607498283FE9E93B775B4060B9718@EX02.asurite.ad.asu.edu>
Message-ID: <92dadb76ce7d7b8eeb4644b47ef1a81f.squirrel@mail.dreamhost.com>

Hopefully this is helpful.

http://www.ncbi.nlm.nih.gov/IEB/ToolBox/CPP_DOC/lxr/source/src/objects/seqalign/Dense_seg.cpp#L648

Bill at genenformics



From maj at fortinbras.us  Wed Jun 17 13:13:23 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 17 Jun 2009 13:13:23 -0400
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk><BF526BE6-AFA8-4F66-9665-D6328F9B4FFA@illinois.edu><320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com>
	<1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>
Message-ID: <129A87FC74254873A6CEB1CEB2ADAF6F@NewLife>

I'm on the case! (but maybe not in realtime, today!)

----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "Peter" <biopython at maubp.freeserve.co.uk>
Cc: "BioPerl List" <bioperl-l at lists.open-bio.org>; "Elia Stupka" 
<e.stupka at ucl.ac.uk>; "Heikki Lehvaslaiho" <heikki at sanbi.ac.za>
Sent: Wednesday, June 17, 2009 1:06 PM
Subject: Re: [Bioperl-l] Next-gen modules


>
> On Jun 17, 2009, at 8:25 AM, Peter wrote:
>
>> On Wed, Jun 17, 2009 at 1:57 PM, Chris Fields<cjfields at illinois.edu>  wrote:
>>>
>>> Elia,
>>>
>>> As Mark indicated, we recently discussed the lack of support for  next-gen 
>>> on
>>> list, at least re: fastq.  I may be hit with the same thing in a  few months
>>> time myself, and I recall Jason and a few others also mentioning  the same.
>>>  Heikki wrote some code for Illumina FASTQ for SeqIO and related  modules 
>>> but
>>> I don't believe it has been committed to trunk yet, so maybe he can  answer.
>>>
>>> From prior discussions IIRC the issues were:
>>>
>>> 1) distinguishing the various FASTQ versions (Sanger, Illumina 1.0, 
>>> Illumina
>>> 1.3) from one another (so maybe some optional validation), and
>>
>> Following the python rule of thumb for being explicit, Biopython makes
>> the user specify which FASTQ variant is being used. I don't think you
>> can do anything else. Any attempted validation would have to be
>> heuristic based on the ASCII characters found, and would risk false
>> positive warnings.
>
> Right; I'm thinking along the same lines.  If anything the most we  would 
> allow is some level of validation, so if there were a degree of  uncertainty 
> about the format one could set a validation flag to check  bounds during the 
> parse and warn if they are exceeded.
>
>>> 2) having a way for the Seq object to either 'know' what format is
>>> contained, or we use phred score and convert back and forth from  that (I
>>> think the latter makes more sense).
>>
>> I think it could make sense for BioPerl to convert Solexa scores to/ from
>> PHRED scores on the fly (especially now that Illumina is abandoning
>> the Solexa score system). Python style tries to avoid implicit  conversions,
>> so Biopython doesn't automatically do a conversion from Solexa to
>> PHRED scores on parsing (but will on writing if the requested output
>> format requires this).
>>
>>> Peter's suggestions also are reasonable, though does biopython have a
>>> separate module for each of these variations?  Our version (I  believe)
>>> mainly varied the conversion within Bio::SeqIO::fastq itself based  on the
>>> fastq variant passed in as a separate named argument.
>>
>> Biopython's SeqIO gives the three FASTQ variants their own unique
>> names. This format name is a required argument for parsing/writing
>> (we don't try and guess the file format from the data contents).  Internally
>> we have three separate FASTQ parsers/writers although they do share
>> code.
>
> We could easily do the same if others agree.  Actually, if we  specified that 
> shorthand for a variant on a format would be designated  as -format => 
> 'format-variant', I think we could easily hack SeqIO to  deal with that by 
> splitting on '-' and passing everything to the  constructor as (-format => 
> 'format', -variant => 'variant').  Very  little repeated code in this case, 
> just an additional named parameter  indicating the format variant (and the 
> SeqIO class can do the type  checking on that within the constructor).
>
>> Other issues to keep in mind:
>>
>> (3) There should be no warning parsing files where the optional repeated
>> title is missing on the "+" lines (as discussed earlier on the BioPerl
>> list).
>
> Agreed, though we'll have to check the current fastq parser to see if that's
> currently the case.  I thought that was fixed but maybe not?
>
>> (4) When writing FASTQ files should BioPerl omit the optional repeated
>> title on the "+" line? Biopython omits this as I understand this to be
>> common practice, and it can make a big difference to file sizes, especially
>> on short read data from Solexa/Illumina.
>
> Agreed, particularly if it's commonly encountered.
>
>> (5) Also test reading and writing files with an optional description (as
>> well as an identifier) on the "@" (and "+") lines. See the NCBI SRA for
>> examples, e.g.
>>
>> @SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
>> GGGTGATGGCCGCTGCCGATGGCGTCAAATCCCACC
>> +SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
>> IIIIIIIIIIIIIIIIIIIIIIIIIIIIII9IG9IC
>
> Should be easy enough to implement with a simple regex.
>
>> (6) Test reading and writing files where the encoded quality string  starts
>> with a "@" or a "+" character, e.g.
>> http://lists.open-bio.org/pipermail/bioperl-l/2009-May/029911.html
>>
>> Peter
>
> Mark, getting all that? ;>
>
> chris
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> 


From e.stupka at ucl.ac.uk  Wed Jun 17 13:49:38 2009
From: e.stupka at ucl.ac.uk (Elia Stupka)
Date: Wed, 17 Jun 2009 18:49:38 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <F8B12CF8-7A35-430F-A233-6DBA2992D19F@illinois.edu>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<92C15E3391F64BAF801754E924122540@NewLife>
	<200906170927.13273.tristan.lefebure@gmail.com>
	<F8B12CF8-7A35-430F-A233-6DBA2992D19F@illinois.edu>
Message-ID: <F995F500-2E4B-40B8-9C2D-2355E91DF65E@ucl.ac.uk>

I would suggest developing the "standard" version first, then moving  
onto potential optimizations.

When we went through a similar argument in Ensembl about 8 years ago  
we ended up dropping Bio::Root completely...

If one is truly after performance for these large next-gen projects,  
it'd be down to pure piping, shell, and worrying about location and  
copying of files, sticking to systems-level as much as possible, and  
quite far from Bioperl altogether, so I think it's a whole different  
level of optimization issues, probably outside the scope of Bioperl.

Elia

On 17 Jun 2009, at 18:09, Chris Fields wrote:

>
> On Jun 17, 2009, at 8:27 AM, Tristan Lefebure wrote:
>
>> Hello,
>> Regarding next-gen sequences and bioperl, following my
>> experience, another issue is bioperl speed. For example, if
>> you want to trim bad quality bases at ends of 1E6 Solexa
>> reads using Bio::SeqIO::fastq and some methods in
>> Bio::Seq::Quality, well, you've got to be patient (but may
>> be I missed some shortcuts...).
>
> The key issues affecting speed in bioperl are contained object  
> instantiation and inheritance (and between those two, the latter  
> much more so as it plays a role with contained objects as well as  
> the container).
>
> http://www.bioperl.org/wiki/Why_BioPerl_is_slow
>
> Moose/Perl6 roles/traits are one way around that issue, but we are a  
> ways off from getting that running.  I think to get that working  
> decently would be a from-ground-up endeavor (see my past posts on  
> biomoose/bioperl6).
>
>> A pure perl solution will be between 100 to 1000x faster...
>> Would it be possible to have an ultra-light quality object
>> with few simple methods for next-gen reads?
>>
>> I can contribute some tests if that sounds like an important
>> point.
>>
>> -Tristan
>
> The quality objects themselves I don't think are that heavy; I think  
> the main impediment is inheritance.  One could get around that a bit  
> by using a direct_new method to create a blessed hash directly, then  
> reimplement methods to lazily create any objects contained on the fly.
>
> chris
>
>
>

---
Senior Lecturer, Bioinformatics
UCL Cancer Institute
Paul O' Gorman Building
University College London
Gower Street
WC1E 6BT
London
UK

Office (UCL): +44 207 679 6493
Office (ICMS): +44 0207 8822374

Mobile: +44 7597 566 194
Mobile (Italy): +39 338 8448801


From cjfields at illinois.edu  Wed Jun 17 13:52:49 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 17 Jun 2009 12:52:49 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <129A87FC74254873A6CEB1CEB2ADAF6F@NewLife>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk><BF526BE6-AFA8-4F66-9665-D6328F9B4FFA@illinois.edu><320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com>
	<1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>
	<129A87FC74254873A6CEB1CEB2ADAF6F@NewLife>
Message-ID: <16E48B50-88FC-4B6D-9FDD-CF7FDE6BAEAA@illinois.edu>

I think this is a top priority for a fall BioPerl release, maybe 1.6.2  
(I am planning on a summer 1.6.1 release still).  Made it into a bug  
report for tracking:

http://bugzilla.open-bio.org/show_bug.cgi?id=2857

If no one works on this I may take it up after the 1.6.1 release.

chris

On Jun 17, 2009, at 12:13 PM, Mark A. Jensen wrote:

> I'm on the case! (but maybe not in realtime, today!)
>
> ----- Original Message ----- From: "Chris Fields" <cjfields at illinois.edu 
> >
> To: "Peter" <biopython at maubp.freeserve.co.uk>
> Cc: "BioPerl List" <bioperl-l at lists.open-bio.org>; "Elia Stupka" <e.stupka at ucl.ac.uk 
> >; "Heikki Lehvaslaiho" <heikki at sanbi.ac.za>
> Sent: Wednesday, June 17, 2009 1:06 PM
> Subject: Re: [Bioperl-l] Next-gen modules
>
>
>>
>> On Jun 17, 2009, at 8:25 AM, Peter wrote:
>>
>>> On Wed, Jun 17, 2009 at 1:57 PM, Chris  
>>> Fields<cjfields at illinois.edu>  wrote:
>>>>
>>>> Elia,
>>>>
>>>> As Mark indicated, we recently discussed the lack of support for   
>>>> next-gen on
>>>> list, at least re: fastq.  I may be hit with the same thing in a   
>>>> few months
>>>> time myself, and I recall Jason and a few others also mentioning   
>>>> the same.
>>>> Heikki wrote some code for Illumina FASTQ for SeqIO and related   
>>>> modules but
>>>> I don't believe it has been committed to trunk yet, so maybe he  
>>>> can  answer.
>>>>
>>>> From prior discussions IIRC the issues were:
>>>>
>>>> 1) distinguishing the various FASTQ versions (Sanger, Illumina  
>>>> 1.0, Illumina
>>>> 1.3) from one another (so maybe some optional validation), and
>>>
>>> Following the python rule of thumb for being explicit, Biopython  
>>> makes
>>> the user specify which FASTQ variant is being used. I don't think  
>>> you
>>> can do anything else. Any attempted validation would have to be
>>> heuristic based on the ASCII characters found, and would risk false
>>> positive warnings.
>>
>> Right; I'm thinking along the same lines.  If anything the most we   
>> would allow is some level of validation, so if there were a degree  
>> of  uncertainty about the format one could set a validation flag to  
>> check  bounds during the parse and warn if they are exceeded.
>>
>>>> 2) having a way for the Seq object to either 'know' what format is
>>>> contained, or we use phred score and convert back and forth from   
>>>> that (I
>>>> think the latter makes more sense).
>>>
>>> I think it could make sense for BioPerl to convert Solexa scores  
>>> to/ from
>>> PHRED scores on the fly (especially now that Illumina is abandoning
>>> the Solexa score system). Python style tries to avoid implicit   
>>> conversions,
>>> so Biopython doesn't automatically do a conversion from Solexa to
>>> PHRED scores on parsing (but will on writing if the requested output
>>> format requires this).
>>>
>>>> Peter's suggestions also are reasonable, though does biopython  
>>>> have a
>>>> separate module for each of these variations?  Our version (I   
>>>> believe)
>>>> mainly varied the conversion within Bio::SeqIO::fastq itself  
>>>> based  on the
>>>> fastq variant passed in as a separate named argument.
>>>
>>> Biopython's SeqIO gives the three FASTQ variants their own unique
>>> names. This format name is a required argument for parsing/writing
>>> (we don't try and guess the file format from the data contents).   
>>> Internally
>>> we have three separate FASTQ parsers/writers although they do share
>>> code.
>>
>> We could easily do the same if others agree.  Actually, if we specified
>> that shorthand for a variant on a format would be designated as
>> -format => 'format-variant', I think we could easily hack SeqIO to deal
>> with that by splitting on '-' and passing everything to the constructor
>> as (-format => 'format', -variant => 'variant').  Very little repeated
>> code in this case, just an additional named parameter indicating the
>> format variant (and the SeqIO class can do the type checking on that
>> within the constructor).
>>
>>> Other issues to keep in mind:
>>>
>>> (3) There should be no warning parsing files where the optional   
>>> repeated
>>> title is missing on the "+" lines (as discussed earlier on the   
>>> BioPerl list).
>>
>> Agreed, though we'll have to check the current fastq parser to see  
>> if  that's currently the case.  I thought that was fixed but maybe  
>> not?
>>
>>> (4) When writing FASTQ files should BioPerl omit the optional repeated
>>> title on the "+" line? Biopython omits this as I understand this to be
>>> common practice, and it can make a big difference to file sizes,
>>> especially on short read data from Solexa/Illumina.
>>
>> Agreed, particularly if it's commonly encountered.
>>
>>> (5) Also test reading and writing files with an optional  
>>> description  (as well
>>> as an identifier) on the "@" (and "+") lines. See the NCBI SRA  
>>> for  examples,
>>> e.g.
>>>
>>> @SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
>>> GGGTGATGGCCGCTGCCGATGGCGTCAAATCCCACC
>>> +SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
>>> IIIIIIIIIIIIIIIIIIIIIIIIIIIIII9IG9IC
>>
>> Should be easy enough to implement with a simple regex.
>>
>>> (6) Test reading and writing files where the encoded quality  
>>> string  starts
>>> with a "@" or a "+" character, e.g.
>>> http://lists.open-bio.org/pipermail/bioperl-l/2009-May/029911.html
>>>
>>> Peter
>>
>> Mark, getting all that? ;>
>>
>> chris
>>
>>


From e.stupka at ucl.ac.uk  Wed Jun 17 14:01:28 2009
From: e.stupka at ucl.ac.uk (Elia Stupka)
Date: Wed, 17 Jun 2009 19:01:28 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <16E48B50-88FC-4B6D-9FDD-CF7FDE6BAEAA@illinois.edu>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk><BF526BE6-AFA8-4F66-9665-D6328F9B4FFA@illinois.edu><320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com>
	<1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>
	<129A87FC74254873A6CEB1CEB2ADAF6F@NewLife>
	<16E48B50-88FC-4B6D-9FDD-CF7FDE6BAEAA@illinois.edu>
Message-ID: <E0FAC5DB-470E-48E1-A30F-B64E2E63EB86@ucl.ac.uk>

If we reach a consensus on how/who/what, I will be happy to contribute  
some coding time in the coming days.

Would it be a good starting point to start adding the different  
formats as named in BioPython, and test support for reading/writing  
them? I could start playing with that.
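
To make that concrete, here is a minimal pure-Perl sketch of the variant handling discussed in this thread. It is only an illustration, not existing BioPerl code: the names split_format, solexa_to_phred and decode_quality are hypothetical, the variant names mirror Biopython's fastq-sanger / fastq-solexa / fastq-illumina, and the ASCII offsets (33, 64, 64) are the commonly documented ones and should be checked against real data.

```perl
use strict;
use warnings;

# Assumed properties of the three FASTQ variants (verify before relying on them).
my %VARIANT = (
    'sanger'   => { offset => 33, type => 'phred'  },
    'solexa'   => { offset => 64, type => 'solexa' },
    'illumina' => { offset => 64, type => 'phred'  },
);

# Split 'fastq-illumina' into ('fastq', 'illumina'). A bare 'fastq' is
# treated as Sanger here, which is an assumption of this sketch.
sub split_format {
    my ($spec) = @_;
    my ($format, $variant) = split /-/, lc($spec), 2;
    $variant = 'sanger' unless defined $variant;
    die "unknown $format variant '$variant'\n"
        unless exists $VARIANT{$variant};
    return ($format, $variant);
}

# Solexa score -> PHRED score:
#   Q_phred = 10 * log10(10 ** (Q_solexa / 10) + 1)
sub solexa_to_phred {
    my ($q) = @_;
    return 10 * log(10 ** ($q / 10) + 1) / log(10);
}

# Decode an encoded quality string into a list of (rounded) PHRED scores.
sub decode_quality {
    my ($qual, $variant) = @_;
    my $v = $VARIANT{$variant} or die "unknown variant '$variant'\n";
    my @scores = map { ord($_) - $v->{offset} } split //, $qual;
    @scores = map { sprintf '%.0f', solexa_to_phred($_) } @scores
        if $v->{type} eq 'solexa';
    return @scores;
}

my ($format, $variant) = split_format('fastq-sanger');
my @q = decode_quality('II9IG9IC', $variant);
print "$format/$variant: @q\n";    # prints "fastq/sanger: 40 40 24 40 38 24 40 34"
```

Keeping the variant as data (a lookup table) rather than as three separate parsers is one way to get the "very little repeated code" mentioned earlier; whether that is the right design for Bio::SeqIO is still open.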

regards,

Elia

On 17 Jun 2009, at 18:52, Chris Fields wrote:

> I think this is a top priority for a fall BioPerl release, maybe  
> 1.6.2 (I am planning on a summer 1.6.1 release still).  Made it into  
> a bug report for tracking:
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2857
>
> If no one works on this I may take it up after the 1.6.1 release.
>
> chris



From tristan.lefebure at gmail.com  Wed Jun 17 14:09:42 2009
From: tristan.lefebure at gmail.com (Tristan Lefebure)
Date: Wed, 17 Jun 2009 14:09:42 -0400
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <F995F500-2E4B-40B8-9C2D-2355E91DF65E@ucl.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<F8B12CF8-7A35-430F-A233-6DBA2992D19F@illinois.edu>
	<F995F500-2E4B-40B8-9C2D-2355E91DF65E@ucl.ac.uk>
Message-ID: <200906171409.42558.tristan.lefebure@gmail.com>

Thanks both for the light.

That probably means that the place bioperl will take in the 
handling of next-gen sequencing raw data (i.e. reads) is 
very limited, no? (at least until bioperl6). A single GA2 
Solexa lane generates about 9 million reads, and I would 
really not call that a big project...

BTW, is there a simple way to see object instantiation and 
inheritance, as well as the time consumed by each, when one 
calls next_seq() (or any other method)?

-Tristan

On Wednesday 17 June 2009 13:49:38 Elia Stupka wrote:
> I would suggest developing the "standard" version first,
> then moving onto potential optimizations.
>
> When we went through a similar argument in Ensembl about
> 8 years ago we ended up dropping Bio::Root completely...
>
> If one is truly after performance for these large
> next-gen projects, it'd be down to pure piping, shell,
> and worrying about location and copying of files,
> sticking to systems-level as much as possible, and quite
> far from Bioperl altogether, so I think it's a whole
> different level of optimization issues, probably outside
> the scope of Bioperl.
>
> Elia


From bix at sendu.me.uk  Wed Jun 17 14:20:00 2009
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 17 Jun 2009 19:20:00 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <200906170927.13273.tristan.lefebure@gmail.com>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>	<92C15E3391F64BAF801754E924122540@NewLife>
	<200906170927.13273.tristan.lefebure@gmail.com>
Message-ID: <4A3933D0.4040808@sendu.me.uk>

Tristan Lefebure wrote:
> Hello,
> Regarding next-gen sequences and bioperl, following my 
> experience, another issue is bioperl speed. For example, if 
> you want to trim bad quality bases at ends of 1E6 Solexa 
> reads using Bio::SeqIO::fastq and some methods in 
> Bio::Seq::Quality, well, you've got to be patient (but may 
> be I missed some shortcuts...).

This is my concern as well. Or, rather, is there actually a significant 
set of users out there who are dealing with next-gen sequencing and 
would consider using BioPerl for their work?

I'm working with all the 1000-genomes data at the Sanger, and we at 
least are probably never going to use BioPerl for the work.


> A pure perl solution will be between 100 to 1000x faster... 
> Would it be possible to have an ultra-light quality object 
> with few simple methods for next-gen reads?

The fastq parser itself already seems pretty fast. The way to get the 
speedup is to not create any Bio::Seq* objects but just return the data 
directly. At that point it's not taking much advantage of BioPerl. But 
certainly it could be done...
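
A rough sketch of what "just return the data directly" could look like in plain Perl, with no Bio::Seq* objects at all. This is an illustration only, not the Bio::SeqIO::fastq code: next_raw_fastq is a hypothetical name, and the sketch assumes unwrapped four-line records (which also means a quality line starting with "@" or "+" is never mistaken for a header).

```perl
use strict;
use warnings;

# Read one record from a filehandle and return it as a plain hash ref,
# or undef at end of input. Assumes exactly four lines per record.
sub next_raw_fastq {
    my ($fh) = @_;
    my $head = <$fh>;
    return unless defined $head;
    my $seq  = <$fh>;
    my $plus = <$fh>;
    my $qual = <$fh>;
    die "truncated FASTQ record\n" unless defined $qual;
    chomp($head, $seq, $plus, $qual);
    die "bad header line: $head\n"    unless $head =~ s/^\@//;
    die "bad separator line: $plus\n" unless $plus =~ /^\+/;
    die "sequence/quality length mismatch for $head\n"
        unless length($seq) == length($qual);
    my ($id, $desc) = split ' ', $head, 2;
    return { id => $id, desc => $desc, seq => $seq, qual => $qual };
}

# Demo on the SRA record quoted in this thread, via an in-memory filehandle.
my $data = <<'END';
@SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
GGGTGATGGCCGCTGCCGATGGCGTCAAATCCCACC
+SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
IIIIIIIIIIIIIIIIIIIIIIIIIIIIII9IG9IC
END
open my $fh, '<', \$data or die "cannot open in-memory handle: $!";
while (my $rec = next_raw_fastq($fh)) {
    printf "%s\t%d bases\n", $rec->{id}, length $rec->{seq};
}
close $fh;
```

The caller gets a hash ref per read and can decide later whether any of it needs to become a full object.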

From e.stupka at ucl.ac.uk  Wed Jun 17 14:39:08 2009
From: e.stupka at ucl.ac.uk (Elia Stupka)
Date: Wed, 17 Jun 2009 19:39:08 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <200906171409.42558.tristan.lefebure@gmail.com>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<F8B12CF8-7A35-430F-A233-6DBA2992D19F@illinois.edu>
	<F995F500-2E4B-40B8-9C2D-2355E91DF65E@ucl.ac.uk>
	<200906171409.42558.tristan.lefebure@gmail.com>
Message-ID: <8C661293-DF7D-4262-970A-92AF0015BB04@ucl.ac.uk>

We are using bioperl for simple pre- and post-processing of data for  
full Solexa runs, and although it might not be ideal, the scripting  
with Bioperl is not a major killer. When I was referring to large,  
heavy pipelines I was thinking of pipelines that deal with many Solexa  
runs as one project (e.g. 1000 genomes) and really cannot afford any  
bottleneck, because that directly affects their storage.

cheers

Elia


On 17 Jun 2009, at 19:09, Tristan Lefebure wrote:

> Thanks both for the light.
>
> That probably means that the place bioperl will take in the
> handling of next-gen sequencing raw data (i.e. reads) is
> very limited, no? (at least until bioperl6). A single GA2
> Solexa lane generates about 9 million reads, and I would
> really not call that a big project...
>
> BTW, is there a simple way to see object instantiation and
> inheritance, as well as the time consumed by each, when one
> calls next_seq() (or any other method)?
>
> -Tristan



From cjfields at illinois.edu  Wed Jun 17 14:40:05 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 17 Jun 2009 13:40:05 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <200906171409.42558.tristan.lefebure@gmail.com>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<F8B12CF8-7A35-430F-A233-6DBA2992D19F@illinois.edu>
	<F995F500-2E4B-40B8-9C2D-2355E91DF65E@ucl.ac.uk>
	<200906171409.42558.tristan.lefebure@gmail.com>
Message-ID: <63B608B2-8DE0-4FD1-9E15-339FD226D7AB@illinois.edu>

On Jun 17, 2009, at 1:09 PM, Tristan Lefebure wrote:

> Thanks both for the light.
>
> That probably means that the place bioperl will take in the
> handling of next-gen sequencing raw data (i.e. reads) is
> very limited, no? (at least until bioperl6). A single GA2
> Solexa lane generates about 9 million reads, and I would
> really not call that a big project...

I don't think it's impossible.  If you parse any very long list of  
sequences in order it will be very slow, yes, but if they were indexed  
or loaded into a DB, lookups would of course be orders of magnitude faster.

We already have perl-based indexing for fastq (Bio::Index::Fastq), so  
maybe something could be built on top of that. I haven't looked but we  
can also wrap other C/C++-based parsers as well. BioLib, for instance,  
has bindings to io_lib, so maybe that could be (ab)used in some way.
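
For illustration, the basic idea behind an offset index can be sketched in a few lines of core Perl. This is not how Bio::Index::Fastq is implemented (I have not checked its internals here); build_offset_index and fetch_record are hypothetical names, the index lives in memory rather than in a DBM file, and unwrapped four-line records are assumed.

```perl
use strict;
use warnings;

# Map each read ID to the byte offset where its record starts.
sub build_offset_index {
    my ($fh) = @_;
    my %offset_of;
    seek $fh, 0, 0 or die "seek failed: $!";
    while (1) {
        my $pos  = tell $fh;
        my $head = <$fh>;
        last unless defined $head;
        my ($id) = $head =~ /^\@(\S+)/
            or die "bad header at byte $pos\n";
        $offset_of{$id} = $pos;
        # Skip the sequence, "+" and quality lines of this record.
        for (1 .. 3) {
            defined(my $line = <$fh>) or die "truncated record for $id\n";
        }
    }
    return \%offset_of;
}

# Jump straight to one record instead of scanning the whole file.
sub fetch_record {
    my ($fh, $index, $id) = @_;
    my $pos = $index->{$id};
    return unless defined $pos;
    seek $fh, $pos, 0 or die "seek failed: $!";
    my @lines = map { scalar <$fh> } 1 .. 4;
    chomp @lines;
    return { id => $id, seq => $lines[1], qual => $lines[3] };
}

# Demo with two made-up reads; the second has a quality string that
# starts with '@', which is harmless because lines are counted, not guessed.
my $data = "\@read1\nACGT\n+\nIIII\n\@read2\nTTGA\n+\n\@+9I\n";
open my $fh, '<', \$data or die "cannot open in-memory handle: $!";
my $index = build_offset_index($fh);
my $rec   = fetch_record($fh, $index, 'read2');
print "$rec->{id} $rec->{seq} $rec->{qual}\n";    # prints "read2 TTGA @+9I"
```

The index is built in one sequential pass; after that each lookup costs a single seek, which is where the speedup over in-order parsing would come from.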

> BTW, is there a simple way to see object instantiation and
> inheritance, as well as the time consumed by each, when one
> calls next_seq() (or any other method)?
>
> -Tristan

As a simple benchmark, at one point all feature tag information was  
converted into Bio::Annotations.  I reverted that behavior to be  
simple tag/value again and had a pretty decent bump:

http://www.bioperl.org/wiki/Feature_Annotation_rollback#Simple_Benchmark

Also, I tried reimplementing some parsers as generic 'event'-based  
driver/handler and they were slightly faster, the key roadblock being  
instantiation again.  If I didn't create Features/Annotations I saw a  
significant speedup.  That's not entirely unexpected, as SeqFeatures  
also contain Locations (in turn that can contain subLocations) and  
(until recently) tag-based Bio::Annotation by default.  Annotations  
are collected in an Annotation::Collection and can contain other  
objects I believe (Ontology terms, etc).

The overall lesson is, if you don't have very heavy objects being  
created the overhead is actually quite small; it's only when you  
greedily instantiate everything that you run into problems.
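
A minimal sketch of that idea, assuming a hypothetical LightRead class (not part of BioPerl) with a direct_new constructor along the lines mentioned earlier in the thread: the hash is blessed directly, and the decoded quality scores are only built the first time someone asks for them. Sanger-style encoding (ASCII offset 33) is assumed.

```perl
use strict;
use warnings;

package LightRead;    # hypothetical class, for illustration only

# Bless a plain hash directly: no inherited constructor chain and no
# contained objects created up front.
sub direct_new {
    my ($class, %fields) = @_;
    my $self = { %fields };
    return bless $self, $class;
}

sub id  { $_[0]{id} }
sub seq { $_[0]{seq} }

# Decode quality scores lazily on first request, then cache them.
sub qual {
    my ($self) = @_;
    $self->{_qual} ||= [ map { ord($_) - 33 } split //, $self->{raw_qual} ];
    return $self->{_qual};
}

package main;

my $read = LightRead->direct_new(
    id => 'SRR001666.1', seq => 'GGGTGATG', raw_qual => 'II9IG9IC',
);
print $read->id, ": no scores decoded yet\n" unless exists $read->{_qual};
my $scores = $read->qual;
print "first score: $scores->[0], cached: ",
      (exists $read->{_qual} ? 'yes' : 'no'), "\n";
```

A read that is only filtered on its ID or length never pays for the score array at all, which is the cheap case described above.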

chris

From cjfields at illinois.edu  Wed Jun 17 15:05:03 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 17 Jun 2009 14:05:03 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <F995F500-2E4B-40B8-9C2D-2355E91DF65E@ucl.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<92C15E3391F64BAF801754E924122540@NewLife>
	<200906170927.13273.tristan.lefebure@gmail.com>
	<F8B12CF8-7A35-430F-A233-6DBA2992D19F@illinois.edu>
	<F995F500-2E4B-40B8-9C2D-2355E91DF65E@ucl.ac.uk>
Message-ID: <E92652A7-7622-4183-8DC3-596E6593C587@illinois.edu>

On Jun 17, 2009, at 12:49 PM, Elia Stupka wrote:

> I would suggest developing the "standard" version first, then moving  
> onto potential optimizations.

Yes, agreed.

> When we went through a similar argument in Ensembl about 8 years ago  
> we ended up dropping Bio::Root completely...

They (strangely enough) still use it in a few modules and require  
bioperl 1.2.3, but (in my experience) the latest bioperl works just  
fine.  I asked about that and never got a response.

> If one is truly after performance for these large next-gen projects,  
> it'd be down to pure piping, shell, and worrying about location and  
> copying of files, sticking to systems-level as much as possible, and  
> quite far from Bioperl altogether, so I think it's a whole different  
> level of optimization issues, probably outside the scope of Bioperl.
>
> Elia

In the end I don't think we can run it using perl alone, no, and I  
believe using BioPerl by itself will not be the optimal solution, but  
it can probably interface with something that is.

chris

From e.stupka at ucl.ac.uk  Wed Jun 17 15:14:04 2009
From: e.stupka at ucl.ac.uk (Elia Stupka)
Date: Wed, 17 Jun 2009 20:14:04 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <76D5EDD5-6217-438E-87A5-1B7571D14FFE@sanger.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<76D5EDD5-6217-438E-87A5-1B7571D14FFE@sanger.ac.uk>
Message-ID: <9AC2CFC1-D7E7-4B93-9671-65C30E5AA285@ucl.ac.uk>

Excellent, I was thinking of working on Maq and BowTie as priorities.

Elia

On 17 Jun 2009, at 14:28, John Marshall wrote:

> On 17 Jun 2009, at 12:29, Elia Stupka wrote:
>> Similarly, there seems to be little in bioperl-run to support tools  
>> that have been developed in this area, such as Maq, BowTie, TopHat,  
>> etc?
>
> FYI I have a Bio::Tools::Run::Velvet wrapper [1] that I plan to  
> submit in the not too distant future.  (First it needs some "blah  
> blah" replaced with actual documentation and a test suite.)
>
> Cheers,
>
>    John
>
> [1] http://www.ebi.ac.uk/~zerbino/velvet/
>
>
> -- 
> The Wellcome Trust Sanger Institute is operated by Genome Research
> Limited, a charity registered in England with number 1021457 and a
> company registered in England with number 2742969, whose registered
> office is 215 Euston Road, London, NW1 2BE.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

---
Senior Lecturer, Bioinformatics
UCL Cancer Institute
Paul O' Gorman Building
University College London
Gower Street
WC1E 6BT
London
UK

Office (UCL): +44 207 679 6493
Office (ICMS): +44 0207 8822374

Mobile: +44 7597 566 194
Mobile (Italy): +39 338 8448801


From michael.watson at bbsrc.ac.uk  Wed Jun 17 15:15:20 2009
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Wed, 17 Jun 2009 20:15:20 +0100
Subject: [Bioperl-l] Next-gen modules
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>	<92C15E3391F64BAF801754E924122540@NewLife>
	<200906170927.13273.tristan.lefebure@gmail.com>
	<4A3933D0.4040808@sendu.me.uk>
Message-ID: <8975119BCD0AC5419D61A9CF1A923E9508B291F1@iahce2ksrv1.iah.bbsrc.ac.uk>

In answer to your question, yes!  We have 6 Illumina datasets which we
have searched against sequence databases using fasta, and I used
SearchIO to parse the results.  This is where BioPerl comes into its
own: wrapped around fast, optimised solutions written in C or Java.
Sure, I could have written something in sed/awk/pure perl/C etc. to
parse out the information I needed faster, but the SearchIO solution
took only a few minutes to parse a huge fasta results file, and for me
(and many others, I suspect) a few minutes is not a problem.

 
________________________________

From: bioperl-l-bounces at lists.open-bio.org on behalf of Sendu Bala
Sent: Wed 17/06/2009 7:20 PM
To: tristan.lefebure at gmail.com
Cc: bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] Next-gen modules


Tristan Lefebure wrote:
> Hello,
> Regarding next-gen sequences and bioperl, following my
> experience, another issue is bioperl speed. For example, if
> you want to trim bad quality bases at ends of 1E6 Solexa
> reads using Bio::SeqIO::fastq and some methods in
> Bio::Seq::Quality, well, you've got to be patient (but may
> be I missed some shortcuts...).

This is my concern as well. Or, rather, is there actually a significant
set of users out there who are dealing with next-gen sequencing and
would consider using BioPerl for their work?

I'm working with all the 1000-genomes data at the Sanger, and we at
least are probably never going to use BioPerl for the work.


> A pure perl solution will be between 100 to 1000x faster...
> Would it be possible to have an ultra-light quality object
> with few simple methods for next-gen reads?

The fastq parser itself already seems pretty fast. The way to get the
speedup is to not create any Bio::Seq* objects but just return the data
directly. At that point it's not taking much advantage of BioPerl. But
certainly it could be done...


From cjfields at illinois.edu  Wed Jun 17 15:30:15 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 17 Jun 2009 14:30:15 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <4A3933D0.4040808@sendu.me.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>	<92C15E3391F64BAF801754E924122540@NewLife>
	<200906170927.13273.tristan.lefebure@gmail.com>
	<4A3933D0.4040808@sendu.me.uk>
Message-ID: <8E010693-30B2-4E54-81CA-97755FC6CAE5@illinois.edu>

On Jun 17, 2009, at 1:20 PM, Sendu Bala wrote:

> Tristan Lefebure wrote:
>> Hello,
>> Regarding next-gen sequences and bioperl, following my experience,  
>> another issue is bioperl speed. For example, if you want to trim  
>> bad quality bases at ends of 1E6 Solexa reads using  
>> Bio::SeqIO::fastq and some methods in Bio::Seq::Quality, well,  
>> you've got to be patient (but may be I missed some shortcuts...).
>
> This is my concern as well. Or, rather, is there actually a  
> significant set of users out there who are dealing with next-gen  
> sequencing and would consider using BioPerl for their work?
>
> I'm working with all the 1000-genomes data at the Sanger, and we at  
> least are probably never going to use BioPerl for the work.

Are you using pure perl or (gasp) something else?  ;>

Judging by the feedback there is definitely a set of users who would
like to integrate nextgen into bioperl somehow, probably to take
advantage of other aspects of bioperl.

>> A pure perl solution will be between 100 to 1000x faster... Would  
>> it be possible to have an ultra-light quality object with few  
>> simple methods for next-gen reads?
>
> The fastq parser itself already seems pretty fast. The way to get  
> the speedup is to not create any Bio::Seq* objects but just return  
> the data directly. At that point it's not taking much advantage of  
> BioPerl. But certainly it could be done...

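A minimal sketch of the "return the data directly" idea quoted above,
offered only as an illustration and not as existing BioPerl code. The
function name fastq_iterator is hypothetical, and the sketch assumes the
simple four-lines-per-record FASTQ layout (no wrapped sequence lines).
Each record comes back as a plain array reference, so no Bio::Seq*
object is ever built:

```perl
use strict;
use warnings;

# Return an iterator (a closure) over an open FASTQ filehandle.
# Each call yields [ $id, $seq, $qual ] as plain strings, or an empty
# return at end of input. Assumes exactly four lines per record.
sub fastq_iterator {
    my ($fh) = @_;
    return sub {
        defined( my $header = <$fh> ) or return;
        my $seq  = <$fh>;
        my $plus = <$fh>;
        my $qual = <$fh>;
        defined $qual or die "Truncated FASTQ record\n";
        chomp( $header, $seq, $plus, $qual );
        $header =~ s/^@// or die "Bad FASTQ header: $header\n";
        $plus =~ /^\+/    or die "Missing '+' line after $header\n";
        length($seq) == length($qual)
            or die "Sequence/quality length mismatch in $header\n";
        return [ $header, $seq, $qual ];
    };
}

# Usage example, reading from an in-memory filehandle.
my $data = "\@read1\nACGT\n+\nIIII\n\@read2\nGGCC\n+\nHHHH\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my $next = fastq_iterator($fh);
while ( my $rec = $next->() ) {
    my ( $id, $seq, $qual ) = @$rec;
    print join( "\t", $id, $seq, $qual ), "\n";
}
```

The trade-off is the one noted in the thread: this is quick, but it
takes little advantage of the rest of BioPerl.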

I suppose the best way to assess what needs to be done is come up with  
a set of 'use cases' specifying what users want so we can design  
around them, otherwise we're shooting in the dark.

I'm personally wondering if this could be done as a sequence database,
something similar in theme to Lincoln's SeqFeature::Store, but
sequence only, returning quality objects in a similar manner (a la
Storable)?  Not sure whether that's feasible, but it appears at
least scalable.

chris


From e.stupka at ucl.ac.uk  Wed Jun 17 15:37:26 2009
From: e.stupka at ucl.ac.uk (Elia Stupka)
Date: Wed, 17 Jun 2009 20:37:26 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <4C3D793879C64A5E84C67FE313C86FA4@NewLife>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<4C3D793879C64A5E84C67FE313C86FA4@NewLife>
Message-ID: <540FFE96-A177-4A56-A574-50052569F39E@ucl.ac.uk>

Dear all,

I tried to summarize today's discussion with what seems to be the  
"shaping consensus" on the Wiki page:

http://www.bioperl.org/wiki/Nextgen_in_Bioperl

good night,

Elia


On 17 Jun 2009, at 13:19, Mark A. Jensen wrote:

> [ and here's a new wikipage: http://www.bioperl.org/wiki/Nextgen_in_Bioperl ]
> ----- Original Message ----- From: "Elia Stupka" <e.stupka at ucl.ac.uk>
> To: <bioperl-l at lists.open-bio.org>
> Sent: Wednesday, June 17, 2009 7:29 AM
> Subject: [Bioperl-l] Next-gen modules
>
>
>> Dear all,
>> after several years of absence I am slowly coming back to Bioperl,
>> and hope to contribute again to its development.
>> One area that I was thinking of starting from, since we are actively
>> involved with it, is to improve Bioperl's support for next-gen
>> sequencing data, tools, etc. Since I am sure I have missed out on a
>> lot of recent developments, do let me know if/what is useful.
>> One example that comes to mind is that the conversion of various
>> formats to/from FASTQ does not seem to be supported. Some code can be
>> found within Li Heng's script:
>> http://maq.sourceforge.net/fq_all2std.pl
>> but it would be good if it could make its way into SeqIO? And
>> similarly, potentially, for other next-gen sequence formats?
>> Similarly, there seems to be little in bioperl-run to support tools
>> that have been developed in this area, such as Maq, BowTie, TopHat,
>> etc?
>> Do let me know if there is a past thread on this, or other people
>> actively developing, etc. so that I can find out what priorities are.
>> thanks and best regards to all (old friends and new),
>> Elia



From e.stupka at ucl.ac.uk  Wed Jun 17 16:06:35 2009
From: e.stupka at ucl.ac.uk (Elia Stupka)
Date: Wed, 17 Jun 2009 21:06:35 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <8E010693-30B2-4E54-81CA-97755FC6CAE5@illinois.edu>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>	<92C15E3391F64BAF801754E924122540@NewLife>
	<200906170927.13273.tristan.lefebure@gmail.com>
	<4A3933D0.4040808@sendu.me.uk>
	<8E010693-30B2-4E54-81CA-97755FC6CAE5@illinois.edu>
Message-ID: <0500F8EB-F3BD-44E8-801A-7A45783DA40F@ucl.ac.uk>

Interesting that you mention the database issue. We found that for
specific memory/CPU-intensive things we also switched to using dbs. For
example, after many years of loyal use of disconnected_ranges we
switched to a simple SQL implementation of it, because of the large
performance gains it would give us.  Similarly, in Ensembl as well as
in the old days of bioperl-db, we opted for doing subseq within SQL
where possible.

Some lean way of SQL'izing specific components could be less  
"disruptive" than avoiding object creation and provide significant  
gains in performance. Could be set as an optional flag, and could use  
temporary ad hoc SQL databases?

Still, the priority now is to make SeqIO compliant with all those
formats; then we can worry about performance :)

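As one illustration of the kind of format conversion in question, here
is a hedged sketch of turning a Solexa-style quality string into a
Sanger-style one. It is not taken from SeqIO or from fq_all2std.pl; the
function name solexa_to_sanger_qual is hypothetical, and the sketch
assumes Solexa-scaled scores stored with ASCII offset 64 and Sanger
Phred scores stored with offset 33. The conversion used is
Q_phred = 10 * log10(10^(Q_solexa/10) + 1), rounded to the nearest
integer:

```perl
use strict;
use warnings;

# Convert a Solexa-encoded quality string (ASCII offset 64, Solexa
# scale) to a Sanger-encoded one (ASCII offset 33, Phred scale).
sub solexa_to_sanger_qual {
    my ($qual) = @_;
    return join '', map {
        my $solexa_q = ord($_) - 64;
        # Perl's log() is the natural log, so divide by log(10).
        my $phred_q = 10 * log( 1 + 10**( $solexa_q / 10 ) ) / log(10);
        chr( int( $phred_q + 0.5 ) + 33 );
    } split //, $qual;
}

# Usage example: Solexa scores 40, 40, 40, 0 and -5.
print solexa_to_sanger_qual('hhh@;'), "\n";    # prints III$"
```

A real SeqIO implementation would also have to deal with the sequence
and header lines and with the other quality variants; this only shows
the per-character score mapping.
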
Elia



From maj at fortinbras.us  Wed Jun 17 16:29:31 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 17 Jun 2009 16:29:31 -0400
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <540FFE96-A177-4A56-A574-50052569F39E@ucl.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk><4C3D793879C64A5E84C67FE313C86FA4@NewLife>
	<540FFE96-A177-4A56-A574-50052569F39E@ucl.ac.uk>
Message-ID: <1C89D353AD0B4D219515BF1EAAA1FFB5@NewLife>

Thanks Elia for those wiki notes--
[I would say you received an enthusiastic 'welcome back'!]
cheers, 
Mark

From cjfields at illinois.edu  Wed Jun 17 16:35:38 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 17 Jun 2009 15:35:38 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <0500F8EB-F3BD-44E8-801A-7A45783DA40F@ucl.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>	<92C15E3391F64BAF801754E924122540@NewLife>
	<200906170927.13273.tristan.lefebure@gmail.com>
	<4A3933D0.4040808@sendu.me.uk>
	<8E010693-30B2-4E54-81CA-97755FC6CAE5@illinois.edu>
	<0500F8EB-F3BD-44E8-801A-7A45783DA40F@ucl.ac.uk>
Message-ID: <3C32922E-690B-4246-BD44-7600153E57C7@illinois.edu>

So, #1 priority is to get fastq up-to-speed, then maybe assess other  
options.

Illuminating discussion, thanks Elia!

urgh, excuse unintended bad pun above...

chris



From e.stupka at ucl.ac.uk  Wed Jun 17 16:36:31 2009
From: e.stupka at ucl.ac.uk (Elia Stupka)
Date: Wed, 17 Jun 2009 21:36:31 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <3C32922E-690B-4246-BD44-7600153E57C7@illinois.edu>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>	<92C15E3391F64BAF801754E924122540@NewLife>
	<200906170927.13273.tristan.lefebure@gmail.com>
	<4A3933D0.4040808@sendu.me.uk>
	<8E010693-30B2-4E54-81CA-97755FC6CAE5@illinois.edu>
	<0500F8EB-F3BD-44E8-801A-7A45783DA40F@ucl.ac.uk>
	<3C32922E-690B-4246-BD44-7600153E57C7@illinois.edu>
Message-ID: <69C726B6-41DC-41EC-9BC9-DFEC0267CD3B@ucl.ac.uk>

Better than colorspaced discussions for sure ;)

Elia

On 17 Jun 2009, at 21:35, Chris Fields wrote:

> So, #1 priority is to get fastq up-to-speed, then maybe assess other  
> options.
>
> Illuminating discussion, thanks Elia!
>
> urgh, excuse unintended bad pun above...
>
> chris


From maj at fortinbras.us  Wed Jun 17 16:54:00 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 17 Jun 2009 16:54:00 -0400
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <3C32922E-690B-4246-BD44-7600153E57C7@illinois.edu>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>	<92C15E3391F64BAF801754E924122540@NewLife><200906170927.13273.tristan.lefebure@gmail.com><4A3933D0.4040808@sendu.me.uk><8E010693-30B2-4E54-81CA-97755FC6CAE5@illinois.edu><0500F8EB-F3BD-44E8-801A-7A45783DA40F@ucl.ac.uk>
	<3C32922E-690B-4246-BD44-7600153E57C7@illinois.edu>
Message-ID: <2B2A7A587B0F488DAA18E80A1BFD671B@NewLife>

unintended! Does that mean your delete key's broke...?
----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "Elia Stupka" <e.stupka at ucl.ac.uk>
Cc: <bioperl-l at lists.open-bio.org>; <tristan.lefebure at gmail.com>
Sent: Wednesday, June 17, 2009 4:35 PM
Subject: Re: [Bioperl-l] Next-gen modules


> So, #1 priority is to get fastq up-to-speed, then maybe assess other  
> options.
> 
> Illuminating discussion, thanks Elia!
> 
> urgh, excuse unintended bad pun above...
> 
> chris

From hartzell at alerce.com  Wed Jun 17 16:40:03 2009
From: hartzell at alerce.com (George Hartzell)
Date: Wed, 17 Jun 2009 13:40:03 -0700
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <4A3933D0.4040808@sendu.me.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<92C15E3391F64BAF801754E924122540@NewLife>
	<200906170927.13273.tristan.lefebure@gmail.com>
	<4A3933D0.4040808@sendu.me.uk>
Message-ID: <19001.21667.127519.462899@already.dhcp.gene.com>

Sendu Bala writes:
 > Tristan Lefebure wrote:
 > > Hello,
 > > Regarding next-gen sequences and bioperl, following my 
 > > experience, another issue is bioperl speed. For example, if 
 > > you want to trim bad quality bases at ends of 1E6 Solexa 
 > > reads using Bio::SeqIO::fastq and some methods in 
 > > Bio::Seq::Quality, well, you've got to be patient (but may 
 > > be I missed some shortcuts...).
 > 
 > This is my concern as well. Or, rather, is there actually a significant 
 > set of users out there who are dealing with next-gen sequencing and 
 > would consider using BioPerl for their work?
 > 
 > I'm working with all the 1000-genomes data at the Sanger, and we at 
 > least are probably never going to use BioPerl for the work.
 > [...]

Is it purely a speed issue, or are there other issues (e.g. stability,
correctness, compatibility) that are contributing to your decision?

What *are* you using?

g.


From bix at sendu.me.uk  Wed Jun 17 18:10:57 2009
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 17 Jun 2009 23:10:57 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <8E010693-30B2-4E54-81CA-97755FC6CAE5@illinois.edu>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>	<92C15E3391F64BAF801754E924122540@NewLife>
	<200906170927.13273.tristan.lefebure@gmail.com>
	<4A3933D0.4040808@sendu.me.uk>
	<8E010693-30B2-4E54-81CA-97755FC6CAE5@illinois.edu>
Message-ID: <4A3969F1.8080002@sendu.me.uk>

Chris Fields wrote:
> On Jun 17, 2009, at 1:20 PM, Sendu Bala wrote:
> 
>> Tristan Lefebure wrote:
>>> Hello,
>>> Regarding next-gen sequences and bioperl, following my experience, 
>>> another issue is bioperl speed. For example, if you want to trim bad 
>>> quality bases at ends of 1E6 Solexa reads using Bio::SeqIO::fastq and 
>>> some methods in Bio::Seq::Quality, well, you've got to be patient 
>>> (but may be I missed some shortcuts...).
>>
>> This is my concern as well. Or, rather, is there actually a 
>> significant set of users out there who are dealing with next-gen 
>> sequencing and would consider using BioPerl for their work?
>>
>> I'm working with all the 1000-genomes data at the Sanger, and we at 
>> least are probably never going to use BioPerl for the work.
> 
> Are you using pure perl or (gasp) something else?  ;>

We use some perl stuff, some C stuff. My own stuff is OO perl, but much 
lighter weight than BioPerl. Absolute minimal object creation.
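
As a rough illustration of what "minimal object creation" can look like, here is a sketch (not the code used at the Sanger) that reads four-line FASTQ records as plain array refs and trims low-quality bases from the 3' end only. The sub names, the cutoff of 20 and the ASCII offset of 64 are assumptions made for the example; Solexa scores are also not exactly Phred-scaled, which the sketch ignores.

```perl
use strict;
use warnings;

# Read one four-line FASTQ record from a filehandle and return it as a
# plain array ref [id, sequence, quality string]; no objects are created.
# Assumes unwrapped (single-line) sequence and quality fields.
sub next_read {
    my ($fh) = @_;
    defined(my $head = <$fh>) or return;
    defined(my $seq  = <$fh>) or return;
    defined(my $plus = <$fh>) or return;
    defined(my $qual = <$fh>) or return;
    chomp($head, $seq, $qual);
    $head =~ s/^\@//;
    return [$head, $seq, $qual];
}

# Trim bases from the 3' end while their score is below $min_q.
# $offset is the ASCII offset of the quality encoding (assumed 64 here).
sub trim_3prime {
    my ($read, $min_q, $offset) = @_;
    my ($id, $seq, $qual) = @$read;
    my $keep = length $qual;
    $keep-- while $keep > 0
        && ord(substr($qual, $keep - 1, 1)) - $offset < $min_q;
    return [$id, substr($seq, 0, $keep), substr($qual, 0, $keep)];
}

# Small in-memory example instead of a real file.
my $fastq = "\@read1\nACGTACGT\n+\nhhhhhhBB\n";
open my $fh, '<', \$fastq or die "cannot open in-memory FASTQ: $!";
while (my $read = next_read($fh)) {
    my $trimmed = trim_3prime($read, 20, 64);
    print join("\t", @$trimmed), "\n";
}
close $fh;
```

Everything stays as strings in a three-element array, so the per-read cost is a few substr calls rather than the construction of a sequence object.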


>>> A pure perl solution will be between 100 to 1000x faster... Would it 
>>> be possible to have an ultra-light quality object with few simple 
>>> methods for next-gen reads?
>>
>> The fastq parser itself already seems pretty fast. The way to get the 
>> speedup is to not create any Bio::Seq* objects but just return the 
>> data directly. At that point it's not taking much advantage of 
>> BioPerl. But certainly it could be done...
> 
> I suppose the best way to assess what needs to be done is come up with a 
> set of 'use cases' specifying what users want so we can design around 
> them, otherwise we're shooting in the dark.

Indeed. Though at least I think we can all agree it would be nice to 
have the functionality there even if it's slow. There will always be at 
least some use-cases where the run speed doesn't matter.


> I'm personally wondering if this could be done as a sequence database, 
> something similar in theme to Lincoln's SeqFeature::Store, but sequence 
> only, and returns quality objects in a similar manner (ala Storable)?  
> Not sure whether that's feasible, but it appears at least scalable.

I think not. Well, at least SeqFeature::Store doesn't scale. Try storing 
millions of features in a database and watch it crawl to complete 
unusability. I can't imagine a db scaling to holding hundreds of TB of 
data either. I'm also not sure what the benefit is. There are already 
high-speed ways of indexing your fastq or bam files.
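
For what it's worth, the simplest form of such an index is just a table of byte offsets, one per read ID, so that a record can be fetched with a single seek. The sketch below only shows the idea; it is not how any particular existing tool works, the sub names are made up, and it assumes four-line FASTQ records. A real index would be written to disk rather than held in a hash.

```perl
use strict;
use warnings;

# Build an in-memory index: read ID => byte offset of the record's '@' line.
sub build_index {
    my ($fh) = @_;
    my %offset_of;
    seek($fh, 0, 0);
    while (1) {
        my $pos = tell($fh);
        defined(my $head = <$fh>) or last;
        my ($id) = $head =~ /^\@(\S+)/
            or die "bad FASTQ header at byte $pos";
        $offset_of{$id} = $pos;
        scalar <$fh> for 1 .. 3;    # skip sequence, '+' and quality lines
    }
    return \%offset_of;
}

# Fetch one record (four chomped lines) by ID with a single seek.
sub fetch_read {
    my ($fh, $index, $id) = @_;
    my $pos = $index->{$id};
    return unless defined $pos;
    seek($fh, $pos, 0) or die "seek failed: $!";
    my @lines = map { scalar <$fh> } 1 .. 4;
    chomp @lines;
    return \@lines;
}

# Small in-memory example instead of a real file.
my $fastq = "\@r1\nACGT\n+\nhhhh\n\@r2\nTTGA\n+\nhhBB\n";
open my $fh, '<', \$fastq or die "cannot open in-memory FASTQ: $!";
my $index = build_index($fh);
my $rec   = fetch_read($fh, $index, 'r2');
print join("  ", @{$rec}[0, 1, 3]), "\n";
close $fh;
```

Lookups then cost one hash access and one seek, independent of where the read sits in the file.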

From bix at sendu.me.uk  Wed Jun 17 18:24:50 2009
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 17 Jun 2009 23:24:50 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <19001.21667.127519.462899@already.dhcp.gene.com>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>	<92C15E3391F64BAF801754E924122540@NewLife>	<200906170927.13273.tristan.lefebure@gmail.com>	<4A3933D0.4040808@sendu.me.uk>
	<19001.21667.127519.462899@already.dhcp.gene.com>
Message-ID: <4A396D32.5070909@sendu.me.uk>

George Hartzell wrote:
> Sendu Bala writes:
>  > Tristan Lefebure wrote:
>  > > Hello,
>  > > Regarding next-gen sequences and bioperl, following my 
>  > > experience, another issue is bioperl speed. For example, if 
>  > > you want to trim bad quality bases at ends of 1E6 Solexa 
>  > > reads using Bio::SeqIO::fastq and some methods in 
>  > > Bio::Seq::Quality, well, you've got to be patient (but may 
>  > > be I missed some shortcuts...).
>  > 
>  > This is my concern as well. Or, rather, is there actually a significant 
>  > set of users out there who are dealing with next-gen sequencing and 
>  > would consider using BioPerl for their work?
>  > 
>  > I'm working with all the 1000-genomes data at the Sanger, and we at 
>  > least are probably never going to use BioPerl for the work.
>  > [...]
> 
> Is it purely a speed issue, or are there other issues (e.g. stability,
> correctness, compatibility) that are contributing to your decision?

Too heavy-weight, too slow, too memory intensive, missing too much 
functionality in any case. If I have to write new parsers and wrappers, 
I may as well make them fast (which means they don't "fit" into BioPerl).


> What *are* you using?

There are already great tools written in C that do all the heavy lifting 
and the rest is done in perl written for speed and low memory.

From cjfields at illinois.edu  Wed Jun 17 18:38:26 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 17 Jun 2009 17:38:26 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <4A3969F1.8080002@sendu.me.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>	<92C15E3391F64BAF801754E924122540@NewLife>
	<200906170927.13273.tristan.lefebure@gmail.com>
	<4A3933D0.4040808@sendu.me.uk>
	<8E010693-30B2-4E54-81CA-97755FC6CAE5@illinois.edu>
	<4A3969F1.8080002@sendu.me.uk>
Message-ID: <550FACEA-FE90-4160-AA44-F2706C1F4CB9@illinois.edu>

On Jun 17, 2009, at 5:10 PM, Sendu Bala wrote:

> Chris Fields wrote:
>> On Jun 17, 2009, at 1:20 PM, Sendu Bala wrote:
>>> Tristan Lefebure wrote:
>>>> Hello,
>>>> Regarding next-gen sequences and bioperl, following my  
>>>> experience, another issue is bioperl speed. For example, if you  
>>>> want to trim bad quality bases at ends of 1E6 Solexa reads using  
>>>> Bio::SeqIO::fastq and some methods in Bio::Seq::Quality, well,  
>>>> you've got to be patient (but may be I missed some shortcuts...).
>>>
>>> This is my concern as well. Or, rather, is there actually a  
>>> significant set of users out there who are dealing with next-gen  
>>> sequencing and would consider using BioPerl for their work?
>>>
>>> I'm working with all the 1000-genomes data at the Sanger, and we  
>>> at least are probably never going to use BioPerl for the work.
>> Are you using pure perl or (gasp) something else?  ;>
>
> We use some perl stuff, some C stuff. My own stuff is OO perl, but  
> much lighter weight than BioPerl. Absolute minimal object creation.

Makes sense.

>>>> A pure perl solution will be between 100 to 1000x faster... Would  
>>>> it be possible to have an ultra-light quality object with few  
>>>> simple methods for next-gen reads?
>>>
>>> The fastq parser itself already seems pretty fast. The way to get  
>>> the speedup is to not create any Bio::Seq* objects but just return  
>>> the data directly. At that point it's not taking much advantage of  
>>> BioPerl. But certainly it could be done...
>> I suppose the best way to assess what needs to be done is come up  
>> with a set of 'use cases' specifying what users want so we can  
>> design around them, otherwise we're shooting in the dark.
>
> Indeed. Though at least I think we can all agree it would be nice to  
> have the functionality there even if it's slow. There will always be  
> at least some use-cases where the run speed doesn't matter.

Agreed.

>> I'm personally wondering if this could be done as a sequence  
>> database, something similar in theme to Lincoln's  
>> SeqFeature::Store, but sequence only, and returns quality objects  
>> in a similar manner (ala Storable)?  Not sure whether that's  
>> feasible, but it appears at least scalable.
>
> I think not. Well, at least SeqFeature::Store doesn't scale. Try  
> storing millions of features in a database and watch it crawl to  
> complete unusability. I can't imagine a db scaling to holding  
> hundreds of TB of data either. I'm also not sure what the benefit  
> is. There are already high-speed ways of indexing your fastq or bam  
> files.

Interesting that you ran into issues with SF::Store; wonder if object  
storage is the limiting factor there, or if it is something else.  
Anyone else having this issue?

chris


From cjfields at illinois.edu  Wed Jun 17 21:08:55 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 17 Jun 2009 20:08:55 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <4A396D32.5070909@sendu.me.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>	<92C15E3391F64BAF801754E924122540@NewLife>	<200906170927.13273.tristan.lefebure@gmail.com>	<4A3933D0.4040808@sendu.me.uk>
	<19001.21667.127519.462899@already.dhcp.gene.com>
	<4A396D32.5070909@sendu.me.uk>
Message-ID: <03A96F40-27CD-4D38-9A4A-04AB4CECC8DE@illinois.edu>

On Jun 17, 2009, at 5:24 PM, Sendu Bala wrote:

> George Hartzell wrote:
>> Sendu Bala writes:
>> > Tristan Lefebure wrote:
>> > > Hello,
>> > > Regarding next-gen sequences and bioperl, following my
>> > > experience, another issue is bioperl speed. For example, if
>> > > you want to trim bad quality bases at ends of 1E6 Solexa
>> > > reads using Bio::SeqIO::fastq and some methods in
>> > > Bio::Seq::Quality, well, you've got to be patient (but may
>> > > be I missed some shortcuts...).
>> >
>> > This is my concern as well. Or, rather, is there actually a significant
>> > set of users out there who are dealing with next-gen sequencing and
>> > would consider using BioPerl for their work?
>> >
>> > I'm working with all the 1000-genomes data at the Sanger, and we at
>> > least are probably never going to use BioPerl for the work.
>> > [...]
>> Is it purely a speed issue, or are there other issues (e.g. stability,
>> correctness, compatibility) that are contributing to your decision?
>
> Too heavy-weight, too slow, too memory intensive, missing too much  
> functionality in any case. If I have to write new parsers and  
> wrappers, I may as well make them fast (which means they don't "fit"  
> into BioPerl).

That's (unfortunately) true.  It may be easy to whip up something that  
works, but it probably won't be fast.

>> What *are* you using?
>
> There are already great tools written in C that do all the heavy  
> lifting and the rest is done in perl written for speed and low memory.

Like this one?

http://www.sanger.ac.uk/Users/lh3/parsefastq.shtml

I suppose if one were inclined, this could be wrapped with SWIG in  
BioLib, but would it be worth it (maybe beyond grabbing the file  
indices)?

chris

From jbarrick at msu.edu  Wed Jun 17 23:10:43 2009
From: jbarrick at msu.edu (Jeffrey Barrick)
Date: Wed, 17 Jun 2009 23:10:43 -0400
Subject: [Bioperl-l] svn error
Message-ID: <7C1A481F-275E-4E08-AA1B-036BC708D5E1@msu.edu>

Hi all,

I've been trying to download the latest version of "bioperl-live"
through svn as per the instructions at
http://www.bioperl.org/wiki/Using_Subversion
and I keep getting an "svn: Found malformed header in revision file"
error when it gets to "bioperl-live/t/RemoteDB/EMBL.t", causing it to
stop prematurely.

I also get the error when trying to browse that directory, for example:
http://code.open-bio.org/svnweb/index.cgi/bioperl/browse/bioperl-live/trunk/t/RemoteDB

Any ideas?

Thanks,
   --Jeff

From hlapp at gmx.net  Wed Jun 17 21:51:16 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 17 Jun 2009 20:51:16 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <0500F8EB-F3BD-44E8-801A-7A45783DA40F@ucl.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>	<92C15E3391F64BAF801754E924122540@NewLife>
	<200906170927.13273.tristan.lefebure@gmail.com>
	<4A3933D0.4040808@sendu.me.uk>
	<8E010693-30B2-4E54-81CA-97755FC6CAE5@illinois.edu>
	<0500F8EB-F3BD-44E8-801A-7A45783DA40F@ucl.ac.uk>
Message-ID: <C8873056-793B-4FEE-94EE-3341087478D1@gmx.net>


On Jun 17, 2009, at 3:06 PM, Elia Stupka wrote:

> Similarly in Ensembl as well as in the old days of bioperl-db we  
> opted for doing subseq within SQL where possible.


BTW Bioperl-db still lazy-loads sequences, and does subseq in SQL,  
unless you manipulate the sequence, or make it a non-persistent object.

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From bix at sendu.me.uk  Thu Jun 18 02:45:17 2009
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 18 Jun 2009 07:45:17 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <550FACEA-FE90-4160-AA44-F2706C1F4CB9@illinois.edu>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>	<92C15E3391F64BAF801754E924122540@NewLife>
	<200906170927.13273.tristan.lefebure@gmail.com>
	<4A3933D0.4040808@sendu.me.uk>
	<8E010693-30B2-4E54-81CA-97755FC6CAE5@illinois.edu>
	<4A3969F1.8080002@sendu.me.uk>
	<550FACEA-FE90-4160-AA44-F2706C1F4CB9@illinois.edu>
Message-ID: <4A39E27D.9040807@sendu.me.uk>

Chris Fields wrote:
> On Jun 17, 2009, at 5:10 PM, Sendu Bala wrote:
 >
>>> I'm personally wondering if this could be done as a sequence 
>>> database, something similar in theme to Lincoln's SeqFeature::Store, 
>>> but sequence only, and returns quality objects in a similar manner 
>>> (ala Storable)?  Not sure whether that's feasible, but it appears 
>>> at least scalable.
>>
>> I think not. Well, at least SeqFeature::Store doesn't scale. Try 
>> storing millions of features in a database and watch it crawl to 
>> complete unusability. I can't imagine a db scaling to holding hundreds 
>> of TB of data either. I'm also not sure what the benefit is. There are 
>> already high-speed ways of indexing your fastq or bam files.
> 
> Interesting that you ran into issues with SF::Store; wonder if object 
> storage is the limiting factor there, or if it is something else.

Object storage certainly was an issue, which is why I patched it to 
(optionally) not store objects. That helped a great deal, but ultimately 
only increased the number of features you could store before it slowed 
down; it didn't solve the problem completely.

From Xianjun.Dong at bccs.uib.no  Thu Jun 18 06:15:47 2009
From: Xianjun.Dong at bccs.uib.no (Xianjun Dong)
Date: Thu, 18 Jun 2009 12:15:47 +0200
Subject: [Bioperl-l] background layer is not supported in Bioperl 1.6
 for Bio::Graphics::Glyph
In-Reply-To: <4A33D850.1020203@ii.uib.no>
References: <4A32BCDA.4080605@ii.uib.no>	<536f21b00906121829o7a5076f6hc57eb9a4dd815e48@mail.gmail.com>	<4A339621.2060702@ii.uib.no>
	<4536f7700906130627r41c90de5r311e58f1d718f9ee@mail.gmail.com>
	<4A33D850.1020203@ii.uib.no>
Message-ID: <4A3A13D3.7050208@ii.uib.no>

Hi, Scott,

Would you mind taking a look at the code (below my signature) to check 
whether I am using the -postgrid callback correctly?
I still cannot get the background drawn across the whole panel.

Thanks

Xianjun


Xianjun Dong wrote:
> Hi, Scott
>
> Before I give up on my own solution and switch to GBrowse, I'd like to 
> bother you once more:
>
> As you suggested, I added the -postgrid option when creating the panel; 
> it calls a function that draws the background. The code below is almost 
> copied from the online POD of Bio::Graphics::Panel (see 
> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Graphics/Panel.html 
> )
>
> But it still does not work. Could you take a look? I paste it 
> below. (BTW, on that POD page the option is -postgrid=>\&draw_gap, while 
> the gap-drawing function is named gap_it, not draw_gap. I guess it's a 
> typo, or not?)
>
> Thanks
>
> Xianjun
>
> ----------------------------------------------- mytestcode.pl 
> --------------------------
>
> #!/usr/bin/perl
>
> use strict;
> use lib "$ENV{HOME}/lib";
>
> use Bio::Graphics;
> use Bio::Graphics::Feature;
> my $ftr= 'Bio::Graphics::Feature';
>
> # processed_transcript
> my $trans1 = 
> $ftr->new(-start=>50,-end=>10,-name=>'ZK154.1',-type=>"3'-UTR");
> my $trans2 = 
> $ftr->new(-start=>100,-end=>50,-name=>'ZK154.2',-type=>'CDS');
> my $trans3 = 
> $ftr->new(-start=>350,-end=>225,-name=>'ZK154.3',-type=>'CDS', 
> -source=>'a');
> my $trans4 = 
> $ftr->new(-start=>650,-end=>500,-name=>'ZK154.3',-type=>'CDS', 
> -source=>'a');
> my $trans5 = 
> $ftr->new(-start=>700,-end=>650,-name=>'ZK154.3',-type=>"5'-UTR");
> my $trans  = 
> $ftr->new(-segments=>[$trans1,$trans2,$trans3,$trans4,$trans5]);
>
> # highlight
> my $trans31 = 
> $ftr->new(-start=>240,-end=>450,-name=>'hightlight1',-type=>'background', 
> -source=>'a');
> my $trans41 = 
> $ftr->new(-start=>650,-end=>600,-name=>'hightlight2',-type=>'multihourglass', 
> -source=>'b');
>
> my $panel= Bio::Graphics::Panel->new(-width=>1200,
>                                             -length=>1050,
>                                             -start =>0,
>                                             -pad_left=>12,
>                                             -pad_right=>12
>                                             -postgrid=>\&gap_it);
>
> sub gap_it {
>     my $gd    = shift;
>     my $panel = shift;
>     my ($gap_start,$gap_end) = $panel->location2pixel(500,600);
>     my $top                  = $panel->top;
>     my $bottom               = $gd->height, #panel->bottom;
>     my $gray                 = $panel->translate_color('red');
>     $gd->filledRectangle($gap_start,$top,$gap_end,$bottom,$gray);
> }
> # the following track works as I expected in bioperl 1.2.3, but not in 1.5 and 1.6
> #$panel->add_track([$trans41,$trans31],
> #          -glyph   => 'background',
> #                  -block_bgcolor => sub{return (shift->source eq 'a')?'#cccccc':'#fffc22'},
> #                  );
>
> $panel->add_track($ftr->new(-start=>100,-end=>1000),
>                  -glyph=>'arrow',
>                  -double=>1,
>                  -tick=>2);
>
> $panel->add_track($trans,
>          -glyph   => 'transcript2', # 'transcript2', #process_5utr',
>                  -fgcolor => 'darkred',
>                  -bgcolor => 'darkred',
>                  -title => '$source',
>                  -link => 
> 'http://www.ensembl.org/Homo_sapiens/transview?transcript=$name',  
> #EnsEMBL
>                  );
>   print $panel->png;
>
> # the following part works in bioperl 1.5 and 1.6, but does not work in Bioperl 1.2.3
> my $map = $panel->create_web_map("image");
> $panel->finished();
>
> Scott Cain wrote:
>> Hi Xianjun,
>>
>> I understand what you want to do, as the current version of gbrowse
>> does this, which uses bioperl 1.6.  Without digging through the code,
>> I can't tell you exactly how this works and you didn't send your code
>> that uses this callback, so I can't try it either.
>>
>> One thing that is different between your code and gbrowse is that each
>> of the tracks is actually a separate panel (to allow track dragging),
>> so it is possible that this sort of callback doesn't work for
>> Bio::Graphics any more.
>>
>> Scott
>>
>> On Saturday, June 13, 2009, Xianjun Dong <Xianjun.Dong at bccs.uib.no> 
>> wrote:
>>  
>>> Hi, Scott
>>>
>>> Thanks for your reply first.
>>>
>>> I still have a question: I dug the code out of GBrowse (pasted 
>>> below). The make_postgrid_callback method collects all highlight 
>>> regions and then uses the hilite_regions_closure function to draw 
>>> them, using the following GD call:
>>>
>>> $gd->filledRectangle($left+$start,0,$left+$end,$bottom,
>>>                           $panel->translate_color($h_color));
>>>
>>> where $bottom=$panel->bottom. This is the only difference from 
>>> my code, where I use $gd->height. I guess they are almost the same 
>>> (except for pad_bottom), as can be seen in the code at 
>>> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Graphics/Panel.html#CODE22 
>>>
>>>
>>> OK. Anyway, I changed my code to use $panel->bottom instead of 
>>> $gd->height for my highlight regions. The output is the same when 
>>> using the Bioperl 1.6 (or 1.5) library. You can see the attached 
>>> image ("test.bioperl1.6.png").
>>>
>>> I may not have explained my question clearly. My question is this: 
>>> with bioperl 1.2.3 (actually the Bio::Graphics in bioperl 1.2.3), I 
>>> get the image I want (see the attached file 
>>> "test.bioperl1.2.3.png"), where the highlighted range runs from the 
>>> top of the panel to the bottom. With bioperl 1.5 (or 1.6), I can 
>>> only see the highlighted region in its own track, not across the 
>>> whole panel. Is that clearer now? You can see the difference between 
>>> the two images.
>>>
>>> [I am not sure whether the mailing list allows image attachments, 
>>> so I have also put them at the following links:
>>> test.bioperl1.6.png:    http://translog.genereg.net/test.bioperl1.6.png
>>> test.bioperl1.2.3.png:    
>>> http://translog.genereg.net/test.bioperl1.2.3.png ]
>>>
>>> Could you test it and see the difference, if you have both 1.2.3 
>>> and 1.6 on your computer?
>>>
>>> I would really like to know how this works in bioperl 1.2.3 (even 
>>> though it might be a bug in that version, or whatever).
>>>
>>> Thanks
>>>
>>> Xianjun
>>> =============================================
>>>
>>> # this generates the callback for highlighting a region
>>> sub make_postgrid_callback {
>>>  my $settings = shift;
>>>  return unless ref $settings->{h_region};
>>>
>>>  my @h_regions = map {
>>>    my ($h_ref,$h_start,$h_end,$h_color) = 
>>> /^(.+):(\d+)\.\.(\d+)(?:@(\S+))?/;
>>>    defined($h_ref) && $h_ref eq $settings->{ref}
>>>                 ? [$h_start,$h_end,$h_color||'lightgrey']
>>>                 : ()
>>>  }
>>>    @{$settings->{h_region}};
>>>
>>>  return unless @h_regions;
>>>  return hilite_regions_closure(@h_regions);
>>> }
>>>
>>> # this subroutine generates a Bio::Graphics::Panel callback closure
>>> # suitable for hilighting a region of a panel.
>>> # The args are a list of [start,end,color]
>>> sub hilite_regions_closure {
>>>  my @h_regions = @_;
>>>
>>>  return sub {
>>>    my $gd     = shift;
>>>    my $panel  = shift;
>>>    my $left   = $panel->pad_left;
>>>    my $top    = $panel->top;
>>>    my $bottom = $panel->bottom;
>>>    for my $r (@h_regions) {
>>>      my ($h_start,$h_end,$h_color) = @$r;
>>>      my ($start,$end) = $panel->location2pixel($h_start,$h_end);
>>>      if ($end-$start <= 1) { $end++; $start-- } # so that we always see something
>>>      # assuming top is 0 so as to ignore top padding
>>>      $gd->filledRectangle($left+$start,0,$left+$end,$bottom,
>>>                           $panel->translate_color($h_color));
>>>    }
>>>  };
>>> }
>>>
>>>
>>> Scott Cain wrote:
>>>
>>> Hello Xianjun,
>>>
>>> I don't think that approach will work.  What you almost certainly need
>>> to do is a postgrid callback that does the drawing of the highlighted
>>> region.  For example code of how to do this, take a look at the
>>> make_postgrid_callback subroutine in GBrowse 1.69.  The option
>>> -postgrid is a method of Bio::Graphics::Panel.
>>>
>>> Scott
>>>
>>>
>>>
>>>
>>> On Fri, Jun 12, 2009 at 4:38 PM, Xianjun 
>>> Dong<Xianjun.Dong at bccs.uib.no> wrote:
>>>
>>>
>>> HI,
>>>
>>> I am not sure this is the right place I can get help.
>>>
>>> I've been struggling with a problem for several days: I want to
>>> highlight some regions in my track using a different background
>>> color. To do that, I defined a glyph named "background", based on the
>>> 'Bio::Graphics::Glyph::generic' module. I override the draw_component()
>>> method, adding code like this:
>>>
>>> $gd->filledRectangle($left,0,$right,$gd->height,
>>> $self->factory->translate_color($color));
>>>
>>> # the script is pasted at the end
>>>
>>> This draws a rectangle with top=0 and bottom=$gd->height. I turned the
>>> highlight regions into a list of features and added them with
>>> add_track, using -glyph=>'background' (see the script test.pl below).
>>> This works as I expect: it adds a colored block behind all tracks in
>>> the panel (including the ruler arrow). You can see the output image in
>>> the attached file "test.bioperl1.2.3.png".
>>>
>>> Now the problem: when I switch to Bioperl 1.5 (or 1.6), it does not
>>> work. Well, it works, but the highlighted part shrinks to a low height
>>> instead of covering all tracks in the panel. I have also attached that
>>> output; see the file "test.bioperl1.6.png".
>>>
>>> I tried to think about the reason. The 'background' module is based on
>>> the generic module, so what can cause the difference? Is it because
>>> $gd->height is different, or because the tracks that follow the
>>> 'background' track cannot draw from the first position?
>>>
>>> I could stick with Bioperl 1.2.3 to avoid the problem ("Smart people
>>> solve problems, wise people avoid them"...). But that raises another
>>> problem: Bio::Graphics in Bioperl 1.2.3 does not support the
>>> $panel->create_web_map() function, which means I have to use a later
>>> version if I want to create a web map for my graphics, and then I have
>>> to give up the highlighted background.
>>>
>>> OK, this is long enough for my first post here. I hope someone can
>>> give me a clue.
>>>
>>> Thanks ahead!!
>>>
>>> Xianjun
>>>
>>>
>>> ==================== test.pl =======================
>>> #!/usr/bin/perl
>>>
>>> use strict;
>>> use lib "$ENV{HOME}/lib";
>>>
>>> use Bio::Graphics;
>>> use Bio::Graphics::Feature;
>>> my $ftr= 'Bio::Graphics::Feature';
>>>
>>> # processed_transcript
>>> my $trans1 =
>>> $ftr->new(-start=>50,-end=>10,-name=>'ZK154.1',-type=>"3'-UTR");
>>> my $trans2 = 
>>> $ftr->new(-start=>100,-end=>50,-name=>'ZK154.2',-type=>'CDS');
>>> my $trans3 = 
>>> $ftr->new(-start=>350,-end=>225,-name=>'ZK154.3',-type=>'CDS',
>>> -source=>'a');
>>> my $trans4 = 
>>> $ftr->new(-start=>650,-end=>500,-name=>'ZK154.3',-type=>'CDS',
>>> -source=>'a');
>>> my $trans5 =
>>> $ftr->new(-start=>700,-end=>650,-name=>'ZK154.3',-type=>"5'-UTR");
>>> my $trans  =
>>> $ftr->new(-segments=>[$trans1,$trans2,$trans3,$trans4,$trans5]);
>>>
>>> # highlight
>>> my $trans31 =
>>> $ftr->new(-start=>240,-end=>450,-name=>'hightlight1',-type=>'background', 
>>>
>>> -source=>'a');
>>> my $trans41 =
>>> $ftr->new(-start=>650,-end=>600,-name=>'hightlight2',-type=>'multihourglass', 
>>>
>>> -source=>'b');
>>>
>>> my $panel= Bio::Graphics::Panel->new(-width=>1200,
>>>                                            -length=>1050,
>>>                                            -start =>0,
>>>                                            -pad_left=>12,
>>>                                            -pad_right=>12);
>>>
>>> # the following track works as I expected in bioperl 1.2.3, but not in 1.5 and 1.6
>>> $panel->add_track([$trans41,$trans31],
>>>         -glyph   => 'background',
>>>                 -block_bgcolor => sub{return (shift->source eq
>>> 'a')?'#cccccc':'#fffc22'},
>>>                 );
>>>
>>> $panel->add_track($ftr->new(-start=>100,-end=>1000),
>>>                 -glyph=>'arrow',
>>>                 -double=>1,
>>>                 -tick=>2);
>>>
>>> $panel->add_track($trans,
>>>         -glyph   => 'transcript2', # 'transcript2', #process_5utr',
>>>                 -fgcolor => 'darkred',
>>>                 -bgcolor => 'darkred',
>>>                 -title => '$source',
>>>                 -link =>
>>> 'http://www.ensembl.org/Homo_sapiens/transview?transcript=$name',  
>>> #EnsEMBL
>>>                 );
>>>  print $panel->png;
>>>
>>> # the following part works in bioperl 1.5 and 1.6, but does not work in Bioperl 1.2.3
>>> my $map = $panel->create_web_map("image");
>>> $panel->finished();
>>>
>>> 1;
>>>
>>> ==================== background.pm =======================
>>> package Bio::Graphics::Glyph::background;
>>>
>>> use strict;
>>> use base 'Bio::Graphics::Glyph::generic';
>>> sub pad_top{
>>>  return 0;
>>> }
>>>
>>> sub draw_component {
>>>  my $self = shift;
>>>  #$self->SUPER::draw_component(@_);
>>>  my ($gd,$dx,$dy) = @_;
>>>  my ($left,$top,$right,$bottom) = $self->bounds($dx,$dy);
>>>
>>>  # draw an arrow to indicate the direction of transcript
>>>  my $color = $self->option('block_bgcolor') || '#cccccc';
>>>  $gd->filledRectangle($left,0,$right,$gd->height,
>>> $self->factory->translate_color($color));
>>> }
>>>
>>> 1;
>>>
>>> -- 
>>> ==========================================
>>> Xianjun Dong
>>> PhD student, Lenhard group
>>> Computational Biology Unit
>>> Bergen Center for Computational Science
>>> University of Bergen
>>> Hoyteknologisenteret, Thormohlensgate 55
>>> N-5008 Bergen, Norway
>>> E-mail: xianjun.dong at bccs.uib.no
>>> Tel.: +47 555 84022
>>> Fax : +47 555 84295
>>> ==========================================
>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>
>>   
>

-- 
==========================================
Xianjun Dong
PhD student, Lenhard group
Computational Biology Unit
Bergen Center for Computational Science
University of Bergen
Hoyteknologisenteret, Thormohlensgate 55
N-5008 Bergen, Norway
E-mail: xianjun.dong at bccs.uib.no
Tel.: +47 555 84022
Fax : +47 555 84295
==========================================
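
One detail in mytestcode.pl as quoted above may be worth checking: there is no comma between -pad_right=>12 and -postgrid=>\&gap_it in the Panel constructor call. Perl does not treat that as a syntax error; it reads it as 12 minus the string 'postgrid' (numerically 0), so the argument list contains no -postgrid key at all and the callback arrives as a stray extra argument. That could be why the callback never fires, though this is untested against Bio::Graphics itself. The parsing effect can be reproduced without BioPerl; parse_options below is a made-up stand-in for a constructor that takes -key => value pairs.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a constructor that takes -key => value pairs.
sub parse_options {
    my @args = @_;
    my %opt;
    while (@args) {
        my $key = shift @args;
        $opt{$key} = shift @args;
    }
    return \%opt;
}

my $callback = sub { return 'called' };

my $broken;
{
    # Silence the "isn't numeric in subtraction" warning for this demo only.
    no warnings 'numeric';
    # Missing comma after 12: parsed as 12 - 'postgrid', so no -postgrid key.
    $broken = parse_options(-pad_left  => 12,
                            -pad_right => 12
                            -postgrid  => $callback);
}

# With the comma in place the callback is stored under -postgrid.
my $fixed = parse_options(-pad_left  => 12,
                          -pad_right => 12,
                          -postgrid  => $callback);

print "broken has -postgrid: ", (exists $broken->{-postgrid} ? "yes" : "no"), "\n";
print "fixed has -postgrid:  ", (exists $fixed->{-postgrid}  ? "yes" : "no"), "\n";
```

Running the original script with "use warnings" (or perl -w) should make Perl report the non-numeric subtraction, which is a quick way to spot this kind of slip.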


From charles.tilford at bms.com  Thu Jun 18 09:38:34 2009
From: charles.tilford at bms.com (Charles Tilford)
Date: Thu, 18 Jun 2009 09:38:34 -0400
Subject: [Bioperl-l] Bio::SeqIO::scf traces scrambled?
Message-ID: <4A3A435A.8000505@bms.com>

Nutshell: Bio::SeqIO::scf seems to be mixing up A/C/G/T trace channels. 
Can anyone confirm?

Hi all,

I'm using the SCF Bio::SeqIO module to parse trace data out of 
chromatograms. The SCF files are being produced by phred using the "-cd" 
parameter. The traces come out great, and the corresponding base calls 
from the .phd files align with the peaks wonderfully when I visualize 
them on a rendered trace. However, only the A bases align to the 
appropriate trace channel, the rest are mixed up. I find that if I do 
the following re-mapping, the phred base calls match the

SeqIO : Remapped
A : A
C : G
G : T
T : C

The relevant part of Bio::SeqIO::scf is here:

http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/SeqIO/scf.html#CODE9

... which indicates that it expects the pack()ed trace data to be in 
order ATGC. The base call parsing code is here:

http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/SeqIO/scf.html#CODE8

... which is unpacking in order ACGT. As far as I can tell, the relevant 
official SCF documentation is here:

http://staden.sourceforge.net/manual/formats_unix_4.html

... which indicates that both trace and base order should be ACGT 
(matching the SeqIO unpack() for bases, but not traces). My empirical 
channel unscrambling mapping implies order ACTG, which is different from 
either of the two orders above. The sequence from the SCF file (should 
be that from original AB1 file, I think) is not perfectly identical to 
that called by phred, but is very similar (to be expected); that is, I 
don't need to remap C, G and T to get it to align with the phred data.

So it looks like the SeqIO module is not mapping the sections of the 
packed trace data to the appropriate bases. The unpack order is 
different than the staden documentation ... but so is the order I impose 
to correct the problem. I am still unclear as to the differences between 
V2 and V3 of the format. The major difference appears to be coding the 
trace absolutely (V2) or relatively to prior values (V3); I'd expect if 
I was using one format and SeqIO was trying to parse the other that I 
would get garbage out. Running in verbose reports "scf.pm is working 
with a version 2 scf."
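
For reference, here is a rough sketch of what "relative" coding means, 
assuming (from my reading of the io_lib description, so please check it 
against the SCF specification) that V3 stores each channel as second-order 
differences wrapped to the sample width. The function names and trace 
values are made up for illustration; this is not the Bio::SeqIO::scf code.

```perl
use strict;
use warnings;

# Rebuild absolute samples from delta-delta values: two passes of a
# running sum, each wrapped to 16 bits (assumed sample width).
sub undelta_samples {
    my @s = @_;
    for my $pass (1 .. 2) {
        my $prev = 0;
        for my $v (@s) {              # $v aliases the array element
            $v    = ($v + $prev) & 0xFFFF;
            $prev = $v;
        }
    }
    return @s;
}

# Inverse operation, used here only to produce test input.
sub delta_samples {
    my @s = @_;
    for my $pass (1 .. 2) {
        my $prev = 0;
        for my $v (@s) {
            my $cur = $v;
            $v    = ($cur - $prev) & 0xFFFF;
            $prev = $cur;
        }
    }
    return @s;
}

my @absolute = (0, 5, 20, 60, 40, 10);    # made-up trace values
my @stored   = delta_samples(@absolute);  # what a V3-style file would hold
my @decoded  = undelta_samples(@stored);
print "@decoded\n";                       # prints 0 5 20 60 40 10
```

Reading delta-coded samples as if they were absolute would give values 
like those in @stored, which look nothing like a trace, so a V2/V3 mix-up 
should indeed show up as garbage rather than as swapped channels.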

Thoughts on this would be appreciated - can anyone confirm a problem 
with trace extraction from SCF?

I'm hoping that once I convince our admin to (properly) install 
staden::read that I can work directly with the ab1 files, but I need to 
stopgap on SCF for the time being....

-CAT

From cjfields at illinois.edu  Thu Jun 18 11:31:08 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 18 Jun 2009 10:31:08 -0500
Subject: [Bioperl-l] Bio::SeqIO::scf traces scrambled?
In-Reply-To: <4A3A435A.8000505@bms.com>
References: <4A3A435A.8000505@bms.com>
Message-ID: <49F38F2D-4C57-4309-BB2C-3ED53E1ED9B5@illinois.edu>

Charles,

The best way to make sure this is addressed is to file a ticket (bug  
report) on it so we can properly track it.  I have a local  
installation of io_lib and I believe we also have Geneious installed  
locally (both of which read SCF), so I can work on confirming that.   
If it stays on the list it may not get answered and a possible bug  
report will be lost (to possibly bite someone else later).

AFAIK this module doesn't use staden::read but is pure perl.  You are  
more than welcome to try out Bio::SeqIO::staden::read, but I have to  
warn you that most of us are looking at replacing its functionality  
at some point with BioLib bindings to io_lib (more stable) and so we  
don't intend to follow up with bug fixes.

Note: there is also Bio::SCF (non-bp):

http://search.cpan.org/~lds/Bio-SCF-1.01/

chris



From MEC at stowers.org  Thu Jun 18 11:42:48 2009
From: MEC at stowers.org (Cook, Malcolm)
Date: Thu, 18 Jun 2009 10:42:48 -0500
Subject: [Bioperl-l] Bio::SeqIO::scf traces scrambled?
In-Reply-To: <4A3A435A.8000505@bms.com>
References: <4A3A435A.8000505@bms.com>
Message-ID: <BD62CBAC4395B94096109020651BE2EC12B471A44D@exchmb-02.stowers-institute.org>

Charles,

Another possible stopgap that might work for you, if you're working with AB1 chromatograms and have ABI's kb-basecaller turned on, is to use Bio::Trace::ABIF

	http://search.cpan.org/dist/Bio-Trace-ABIF/lib/Bio/Trace/ABIF.pm

It works great and includes an implementation of ABI's algorithm, which lets you (re)compute trace clear ranges using the kb-basecaller's quality scores and any windowing/quality parameters.

It's not in the BioPerl project, but it is an easy install from CPAN.

I am familiar with staden::read installation woes.  

Below is a quick script I wrote that employs it... it could be better parameterized, but it might be useful to you "out of the box"....

Malcolm Cook
Stowers Institute for Medical Research - Kansas City, Missouri
  

#!/usr/bin/env perl

# PURPOSE: extract from AB1 files into fasta format the sequence in
# the 'clear range' defined by 3 parameters.  If there is no clear
# range, emit warning and skip the sequence.  The fasta 'defline'
# identifier is taken as the sample name.  Other useful attributes are
# also embedded into the defline using attribute=value syntax.

# USAGE: ABIFqtrim $window_width $bad_bases_threshold $quality_threshold f1.ab1 ... fn.ab1

# NOTE: 20 4 20 is ABI default settings

# EXAMPLE:
# ABIFqtrim 20 4 20 /n/facility/Bioinformatics/Software/ABI/ab1_test_files/*.ab1 > ab1_test_files_trimmed.fasta

# AUTHOR: malcolm_cook at stowers-institute.org

use strict;
use warnings;
use Bio::Trace::ABIF;
use Text::Wrap qw(wrap);
$Text::Wrap::columns = 72;	# wrap the sequence

use File::Basename;
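# Take the three trimming parameters off the front of @ARGV;
# whatever remains in @ARGV is the list of AB1 files to process.
my ($window_width,
    $bad_bases_threshold,
    $quality_threshold) = splice(@ARGV, 0, 3);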

my $abif = Bio::Trace::ABIF->new();

sub main {
  foreach (@ARGV) {
    $abif->open_abif($_) or die "error opening $_ as ABIF";
    my ($clear_range_start,$clear_range_stop) = $abif->clear_range($window_width,
								   $bad_bases_threshold,
								   $quality_threshold
								  );
    my $sample_score = $abif->sample_score(
					   $window_width,
					   $bad_bases_threshold,
					   $quality_threshold
					  );
    #    my $contiguous_read_length = $abif->contiguous_read_length($window_width,
    #							       $quality_threshold,
    #							       0, # ==> trim_ends
    #							      );
    #    my $length_of_read = $abif->length_of_read(
    #				    $window_width,
    #				    $quality_threshold,
    #				    # $method
    #				   );
    my $defline = 
      join "\t", 
	$abif->sample_name,
	  #basename($_,qw(.ab1 .abi)), # use this to use the filename's basename in the defline
	  #$abif->container_identifier . ':' . $abif->well_id,  # or this, for container:well_id formatted defline identifiers
	  (map {my $method = $_;
		"$method=". ($abif->$method() || '')}
	   qw(sample_name comment run_name well_id container_identifier sequence_length )), #comment
	     # sample_tracking_id - don't use this - it is internal to ABI software
	     "clear_range_start=$clear_range_start",
	       "clear_range_stop=$clear_range_stop",
		 "sample_score=$sample_score",
		   #"contiguous_read_length=$contiguous_read_length",
		   #"length_of_read=$length_of_read",
		   ;
    if ($clear_range_start == -1) {
      warn "NO CLEAR RANGE! SKIPPING $_:\n\t$defline";
      next;
    }
    my $seq = wrap('','',substr($abif->sequence, $clear_range_start + 1, ($clear_range_stop + 1) - ($clear_range_start + 1) + 1));
    print ">$defline\n$seq\n";
    $abif->close_abif();

  }
}

main ();



From carze at som.umaryland.edu  Thu Jun 18 13:51:43 2009
From: carze at som.umaryland.edu (Cesar Arze)
Date: Thu, 18 Jun 2009 10:51:43 -0700 (PDT)
Subject: [Bioperl-l]  Problems parsing scientific name from a Genbank file
Message-ID: <24095355.post@talk.nabble.com>


Hi all,
   I've searched through the mailing list and bug tracker for any mention
of what I presume to be a bug that I have been encountering when parsing
certain GenBank files with Bio::SeqIO::genbank, but have yet to find
anything. I apologize in advance if this is something that has already been
addressed.

When parsing these files and extracting the scientific name it seems that
line breaks are causing the lineage info found in the ORGANISM section to be
captured as part of the scientific name. An example of this is accession
NC_005945:

  ORGANISM  Bacillus anthracis str. Sterne
            Bacteria; Firmicutes; Bacillales; Bacillaceae; Bacillus;
            Bacillus cereus group.

The lineage entry "Bacillus cereus group" is split by a line break, which
causes the scientific name to capture "Bacteria; Firmicutes; Bacillales;
Bacillaceae; Bacillus; Bacillus", ending up with "Bacillus anthracis str.
Sterne Bacteria; Firmicutes; Bacillales; Bacillaceae; Bacillus; Bacillus"
as the final scientific name.

I'm not sure whether anyone has run into this problem before, but I would
very much appreciate any help or direction.
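
To make the expected behaviour concrete, here is a small sketch of how the
ORGANISM block could be split. It is not the Bio::SeqIO::genbank parser;
split_organism_block is a hypothetical helper, and the rule it uses (the
lineage starts at the first continuation line that contains ';' or ends
with '.') is only a heuristic I am assuming for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: split an ORGANISM block into the scientific name
# and a list of lineage entries, using the heuristic described above.
sub split_organism_block {
    my ($block) = @_;
    my (@name, @lineage);
    for my $line (split /\n/, $block) {
        $line =~ s/^\s*(?:ORGANISM\s+)?//;    # drop keyword and indentation
        $line =~ s/\s+$//;
        next unless length $line;
        if (@lineage or (@name and ($line =~ /;/ or $line =~ /\.$/))) {
            push @lineage, $line;             # lineage, possibly wrapped
        }
        else {
            push @name, $line;                # name, possibly wrapped
        }
    }
    my $lineage = join ' ', @lineage;
    $lineage =~ s/\.$//;                      # lineage ends with a period
    return (join(' ', @name), [ split /;\s*/, $lineage ]);
}

my $block = <<'END';
  ORGANISM  Bacillus anthracis str. Sterne
            Bacteria; Firmicutes; Bacillales; Bacillaceae; Bacillus;
            Bacillus cereus group.
END

my ($name, $lineage) = split_organism_block($block);
print "$name\n";                  # Bacillus anthracis str. Sterne
print join('|', @$lineage), "\n";
```

With this input the wrapped "Bacillus cereus group" entry stays in the
lineage and the name is just the first line.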


From charles.tilford at bms.com  Thu Jun 18 15:59:01 2009
From: charles.tilford at bms.com (Charles Tilford)
Date: Thu, 18 Jun 2009 15:59:01 -0400
Subject: [Bioperl-l] Bio::SeqIO::scf traces scrambled?
In-Reply-To: <49F38F2D-4C57-4309-BB2C-3ED53E1ED9B5@illinois.edu>
References: <4A3A435A.8000505@bms.com>
	<49F38F2D-4C57-4309-BB2C-3ED53E1ED9B5@illinois.edu>
Message-ID: <4A3A9C85.4000603@bms.com>

Chris Fields wrote:
> Charles,
>
> The best way to make sure this is addressed is to file a ticket (bug  
> report) on it so we can properly track it.
Ok, I'll put that in.
>
> AFAIK this module doesn't use staden::read but is pure perl. 
Yes, that's my understanding too. I'm using the SeqIO module because of 
ongoing hiccups with the staden installation.
> Note: there is also Bio::SCF (non-bp):
>
> http://search.cpan.org/~lds/Bio-SCF-1.01/
>   
I have that installed, but have not tried it out yet.

Thanks!
-CAT

From charles.tilford at bms.com  Thu Jun 18 16:02:53 2009
From: charles.tilford at bms.com (Charles Tilford)
Date: Thu, 18 Jun 2009 16:02:53 -0400
Subject: [Bioperl-l] Bio::SeqIO::scf traces scrambled?
In-Reply-To: <BD62CBAC4395B94096109020651BE2EC12B471A44D@exchmb-02.stowers-institute.org>
References: <4A3A435A.8000505@bms.com>
	<BD62CBAC4395B94096109020651BE2EC12B471A44D@exchmb-02.stowers-institute.org>
Message-ID: <4A3A9D6D.2010106@bms.com>

Cook, Malcolm wrote:
> Charles,
>
> Another possible stopgap that might work for you, if you're working with AB1 chromatograms and have ABI's kb-basecaller turned on, is to use Bio::Trace::ABIF
>
> 	http://search.cpan.org/dist/Bio-Trace-ABIF/lib/Bio/Trace/ABIF.pm
>
> It works great and includes an implementation of ABI's algorithm, which lets you (re)compute trace clear ranges using the kb-basecaller's quality scores and any windowing/quality parameters.
>
> It's not in the BioPerl project, but it is an easy install from CPAN.
>   
Thanks - we installed that a few weeks ago, and it was on my list of 
things to try, but I had not gotten to it yet since I was getting data 
out of the SCF SeqIO module. Even though the SeqIO::scf data looks ok, 
the fact that I need to unscramble it makes me nervous... Thanks, too, 
for the example code. I'll try out the Bio::Trace::ABIF module and see 
if it works with our files.

Thanks,
CAT

From cjfields at illinois.edu  Thu Jun 18 16:27:02 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 18 Jun 2009 15:27:02 -0500
Subject: [Bioperl-l] Bio::SeqIO::scf traces scrambled?
In-Reply-To: <4A3A9D6D.2010106@bms.com>
References: <4A3A435A.8000505@bms.com>
	<BD62CBAC4395B94096109020651BE2EC12B471A44D@exchmb-02.stowers-institute.org>
	<4A3A9D6D.2010106@bms.com>
Message-ID: <2A9A3AB7-7773-48F1-993C-A679495D0B95@illinois.edu>


On Jun 18, 2009, at 3:02 PM, Charles Tilford wrote:

> Cook, Malcolm wrote:
>> Charles,
>>
>> Another possible stopgap that might work for you, if you're working  
>> with AB1 chromatograms and have ABI's kb-basecaller turned on, is to  
>> use Bio::Trace::ABIF
>>
>> 	http://search.cpan.org/dist/Bio-Trace-ABIF/lib/Bio/Trace/ABIF.pm
>>
>> It works great and includes an implementation of ABI's algorithm,  
>> which lets you (re)compute trace clear ranges using the  
>> kb-basecaller's quality scores and any windowing/quality parameters.
>>
>> It's not in the BioPerl project, but it is an easy install from CPAN.
>>
> Thanks - we installed that a few weeks ago, and it was on my list of  
> things to try, but I had not gotten to it yet since I was getting  
> data out of the SCF SeqIO module. Even though the SeqIO::scf data  
> looks ok, the fact that I need to unscramble it makes me nervous...  
> Thanks, too, for the example code. I'll try out the Bio::Trace::ABIF  
> module and see if it works with our files.
>
> Thanks,
> CAT

You definitely shouldn't need to unscramble it; my guess is this is a  
legit bug that just has gone unnoticed.  I see that you have filed a  
ticket on it so we can at least track it.  Thanks!

chris

From scott at scottcain.net  Thu Jun 18 23:25:35 2009
From: scott at scottcain.net (Scott Cain)
Date: Thu, 18 Jun 2009 23:25:35 -0400
Subject: [Bioperl-l] background layer is not supported in Bioperl 1.6
	for Bio::Graphics::Glyph
In-Reply-To: <4A3A13D3.7050208@ii.uib.no>
References: <4A32BCDA.4080605@ii.uib.no>
	<536f21b00906121829o7a5076f6hc57eb9a4dd815e48@mail.gmail.com>
	<4A339621.2060702@ii.uib.no>
	<4536f7700906130627r41c90de5r311e58f1d718f9ee@mail.gmail.com>
	<4A33D850.1020203@ii.uib.no> <4A3A13D3.7050208@ii.uib.no>
Message-ID: <4536f7700906182025m1d67afa2y2a62a30d6cc9b19d@mail.gmail.com>

Hi Xianjun,

The attached script (which is not too different from yours--I only did
a little clean up and made the padding consistent) makes the attached
image, which is what I think you want.  I'm using bioperl-live.
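
One general pitfall worth checking in long option lists such as the
Panel->new(...) call: if a comma is missing between two
-option => value pairs, Perl does not complain under strict but silently
drops the second option name. The sketch below uses plain Perl only; the
option names are borrowed from this thread and the callback is a dummy.

```perl
use strict;
use warnings;
no warnings 'numeric';    # the typo below would otherwise warn

sub callback { }          # dummy stand-in for a -postgrid callback

my @with_comma    = (-pad_right => 12, -postgrid => \&callback);

# Without the comma, "12 -postgrid" is parsed as the subtraction
# 12 - "postgrid", which evaluates to 12, so the option name vanishes.
my @without_comma = (-pad_right => 12 -postgrid => \&callback);

print scalar(@with_comma), "\n";       # prints 4
print scalar(@without_comma), "\n";    # prints 3
```

In the second list the callback is still passed, but no longer as the
value of -postgrid, so the panel would never call it.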

Scott


On Thu, Jun 18, 2009 at 6:15 AM, Xianjun Dong<Xianjun.Dong at bccs.uib.no> wrote:
> Hi, Scott,
>
> Would you mind having a look at the code (below my signature) to check
> whether I am using the -postgrid callback correctly?
> I still cannot get the background for the whole panel.
>
> Thanks
>
> Xianjun
>
>
> Xianjun Dong wrote:
>>
>> Hi, Scott
>>
>> Before I give up on my own solution and switch to GBrowse, I would like to
>> bother you once more:
>>
>> As you suggested, I pass the -postgrid option when creating the panel, which
>> calls a function to draw the background. The code below is almost copied from
>> the online POD of Bio::Graphics::Panel (see
>> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Graphics/Panel.html
>> )
>>
>> But it still does not work. Could you have a look? I paste it
>> below. (BTW, on the above POD page the option is -postgrid=>\&draw_gap, while
>> the gap-drawing function is named gap_it, not draw_gap. I guess that's a typo?)
>>
>> Thanks
>>
>> Xianjun
>>
>> ----------------------------------------------- mytestcode.pl
>> --------------------------
>>
>> #!/usr/bin/perl
>>
>> use strict;
>> use lib "$ENV{HOME}/lib";
>>
>> use Bio::Graphics;
>> use Bio::Graphics::Feature;
>> my $ftr= 'Bio::Graphics::Feature';
>>
>> # processed_transcript
>> my $trans1 =
>> $ftr->new(-start=>50,-end=>10,-name=>'ZK154.1',-type=>"3'-UTR");
>> my $trans2 =
>> $ftr->new(-start=>100,-end=>50,-name=>'ZK154.2',-type=>'CDS');
>> my $trans3 =
>> $ftr->new(-start=>350,-end=>225,-name=>'ZK154.3',-type=>'CDS',
>> -source=>'a');
>> my $trans4 =
>> $ftr->new(-start=>650,-end=>500,-name=>'ZK154.3',-type=>'CDS',
>> -source=>'a');
>> my $trans5 =
>> $ftr->new(-start=>700,-end=>650,-name=>'ZK154.3',-type=>"5'-UTR");
>> my $trans  =
>> $ftr->new(-segments=>[$trans1,$trans2,$trans3,$trans4,$trans5]);
>>
>> # highlight
>> my $trans31 =
>> $ftr->new(-start=>240,-end=>450,-name=>'hightlight1',-type=>'background',
>> -source=>'a');
>> my $trans41 =
>> $ftr->new(-start=>650,-end=>600,-name=>'hightlight2',-type=>'multihourglass',
>> -source=>'b');
>>
>> my $panel= Bio::Graphics::Panel->new(-width=>1200,
>>                                      -length=>1050,
>>                                      -start =>0,
>>                                      -pad_left=>12,
>>                                      -pad_right=>12,
>>                                      -postgrid=>\&gap_it);
>>
>> sub gap_it {
>>    my $gd    = shift;
>>    my $panel = shift;
>>    my ($gap_start,$gap_end) = $panel->location2pixel(500,600);
>>    my $top                  = $panel->top;
>>    my $bottom               = $gd->height; # $panel->bottom;
>>    my $gray                 = $panel->translate_color('red');
>>    $gd->filledRectangle($gap_start,$top,$gap_end,$bottom,$gray);
>> }
>> # the following track works as I expected in bioperl 1.2.3, but not in 1.5
>> # and 1.6
>> #$panel->add_track([$trans41,$trans31],
>> #          -glyph   => 'background',
>> #          -block_bgcolor => sub{return (shift->source eq
>> #                  'a')?'#cccccc':'#fffc22'},
>> #          );
>>
>> $panel->add_track($ftr->new(-start=>100,-end=>1000),
>>                 -glyph=>'arrow',
>>                 -double=>1,
>>                 -tick=>2);
>>
>> $panel->add_track($trans,
>>                 -glyph   => 'transcript2', # 'transcript2', #process_5utr',
>>                 -fgcolor => 'darkred',
>>                 -bgcolor => 'darkred',
>>                 -title => '$source',
>>                 -link =>
>> 'http://www.ensembl.org/Homo_sapiens/transview?transcript=$name',  #EnsEMBL
>>                 );
>> print $panel->png;
>>
>> # the following part works in bioperl 1.5 and 1.6, but does not work in
>> # Bioperl 1.2.3
>> my $map = $panel->create_web_map("image");
>> $panel->finished();
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> Scott Cain wrote:
>>>
>>> Hi Xianjun,
>>>
>>> I understand what you want to do, as the current version of gbrowse
>>> does this, which uses bioperl 1.6.  Without digging through the code,
>>> I can't tell you exactly how this works and you didn't send your code
>>> that uses this callback, so I can't try it either.
>>>
>>> One thing that is different between your code and gbrowse is that each
>>> of the tracks is actually a separate panel (to allow track dragging),
>>> so it is possible that this sort of callback doesn't work for
>>> Bio::Graphics any more.
>>>
>>> Scott
>>>
>>> On Saturday, June 13, 2009, Xianjun Dong <Xianjun.Dong at bccs.uib.no>
>>> wrote:
>>>
>>>>
>>>> Hi, Scott
>>>>
>>>> Thanks for your reply first.
>>>>
>>>> I still have a question: I dug out the code from GBrowse (pasted
>>>> below). The make_postgrid_callback method collects all highlight regions
>>>> and then uses hilite_regions_closure to draw them, with the following GD
>>>> call:
>>>>
>>>> $gd->filledRectangle($left+$start,0,$left+$end,$bottom,
>>>>                          $panel->translate_color($h_color));
>>>>
>>>> where $bottom=$panel->bottom. This is the only difference from my
>>>> code, where I use $gd->height. I guess they are almost the same (except
>>>> for pad_bottom), as can be seen in the code at
>>>> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Graphics/Panel.html#CODE22
>>>>
>>>> Anyway, I changed my code to use $panel->bottom instead of $gd->height
>>>> for my highlight regions. The output is the same when using the Bioperl
>>>> 1.6 (or 1.5) library. You can see the attached image
>>>> ("test.bioperl1.6.png").
>>>>
>>>> I might not have explained my question clearly. My question is:
>>>> if I use bioperl 1.2.3 (actually the Bio::Graphics in bioperl 1.2.3), I
>>>> get the image I want (see the attached file "test.bioperl1.2.3.png"),
>>>> where the highlight range goes from the roof to the floor. In
>>>> bioperl 1.5 (or 1.6), I can only see the highlight region in its own
>>>> track, not across the whole panel. Is that clearer now? You can see the
>>>> difference between the two images.
>>>>
>>>> [I am not sure whether the mailing list allows image attachments, so I
>>>> have also put them at the following links:
>>>> test.bioperl1.6.png:    http://translog.genereg.net/test.bioperl1.6.png
>>>> test.bioperl1.2.3.png:  http://translog.genereg.net/test.bioperl1.2.3.png ]
>>>>
>>>> Could you test it and see the difference, if you have both 1.2.3 and 1.6
>>>> on your computer?
>>>>
>>>> I would really like to know how this works in bioperl 1.2.3 (even though
>>>> it might be a bug in that version, or whatever).
>>>>
>>>> Thanks
>>>>
>>>> Xianjun
>>>> =============================================
>>>>
>>>> # this generates the callback for highlighting a region
>>>> sub make_postgrid_callback {
>>>>   my $settings = shift;
>>>>   return unless ref $settings->{h_region};
>>>>
>>>>   my @h_regions = map {
>>>>     my ($h_ref,$h_start,$h_end,$h_color) = /^(.+):(\d+)\.\.(\d+)(?:@(\S+))?/;
>>>>     defined($h_ref) && $h_ref eq $settings->{ref}
>>>>       ? [$h_start,$h_end,$h_color||'lightgrey']
>>>>       : ()
>>>>   } @{$settings->{h_region}};
>>>>
>>>>   return unless @h_regions;
>>>>   return hilite_regions_closure(@h_regions);
>>>> }
>>>>
>>>> # this subroutine generates a Bio::Graphics::Panel callback closure
>>>> # suitable for hilighting a region of a panel.
>>>> # The args are a list of [start,end,color]
>>>> sub hilite_regions_closure {
>>>>   my @h_regions = @_;
>>>>
>>>>   return sub {
>>>>     my $gd     = shift;
>>>>     my $panel  = shift;
>>>>     my $left   = $panel->pad_left;
>>>>     my $top    = $panel->top;
>>>>     my $bottom = $panel->bottom;
>>>>     for my $r (@h_regions) {
>>>>       my ($h_start,$h_end,$h_color) = @$r;
>>>>       my ($start,$end) = $panel->location2pixel($h_start,$h_end);
>>>>       if ($end-$start <= 1) { $end++; $start-- } # so that we always see something
>>>>       # assuming top is 0 so as to ignore top padding
>>>>       $gd->filledRectangle($left+$start,0,$left+$end,$bottom,
>>>>                            $panel->translate_color($h_color));
>>>>     }
>>>>   };
>>>> }
>>>>
>>>>
>>>> Scott Cain wrote:
>>>>
>>>> Hello Xianjun,
>>>>
>>>> I don't think that approach will work.  What you almost certainly need
>>>> to do is a postgrid callback that does the drawing of the highlighted
>>>> region.  For example code of how to do this, take a look at the
>>>> make_postgrid_callback subroutine in GBrowse 1.69.  -postgrid is an
>>>> option to the Bio::Graphics::Panel constructor.
>>>>
>>>> Scott
>>>>
>>>>
>>>>
>>>>
>>>> On Fri, Jun 12, 2009 at 4:38 PM, Xianjun Dong<Xianjun.Dong at bccs.uib.no>
>>>> wrote:
>>>>
>>>>
>>>> HI,
>>>>
>>>> I am not sure this is the right place to get help.
>>>>
>>>> I've been struggling with a problem for several days: I want to
>>>> highlight some regions in my track using a different background color.
>>>> To do that, I defined a glyph named "background", based on the
>>>> 'Bio::Graphics::Glyph::generic' module. I override the draw_component()
>>>> method, adding code like this:
>>>>
>>>> $gd->filledRectangle($left,0,$right,$gd->height,
>>>> $self->factory->translate_color($color));
>>>>
>>>> # the script is pasted at the end
>>>>
>>>> This draws a rectangle with top=0, bottom=$gd->height. I turned the
>>>> highlight regions into a list of features and called add_track with
>>>> -glyph=>'background' (see the script test.pl below). This works as I
>>>> expect: it adds a colored block behind all the tracks in the panel
>>>> (including the ruler arrow). You can see the output image in the
>>>> attached file "test.bioperl1.2.3.png".
>>>>
>>>> Now the problem: when I switch to Bioperl 1.5 (or 1.6), it does not
>>>> work. Well, it works, but the highlight shrinks to a small height
>>>> instead of covering all the tracks in the panel. I have also attached
>>>> that output; see the file "test.bioperl1.6.png".
>>>>
>>>> I tried to think about the reason. The 'background' module is based on
>>>> the generic module, so what can cause the difference? Is it because
>>>> $gd->height is different, or because the tracks that follow the
>>>> 'background' track cannot be drawn from the first position?
>>>>
>>>> I could stick with Bioperl 1.2.3 to avoid the problem ("Smart people
>>>> solve problems, wise people avoid them"...), but then another problem
>>>> arises: Bio::Graphics in Bioperl 1.2.3 does not support the
>>>> $panel->create_web_map() function, which means I have to use a later
>>>> version if I want to create a web map for my graphics, and then I have
>>>> to give up the highlighted background.
>>>>
>>>> That's long enough for a first post here. I hope someone can give me
>>>> a clue.
>>>>
>>>> Thanks ahead!!
>>>>
>>>> Xianjun
>>>>
>>>>
>>>> ==================== test.pl =======================
>>>> #!/usr/bin/perl
>>>>
>>>> use strict;
>>>> use lib "$ENV{HOME}/lib";
>>>>
>>>> use Bio::Graphics;
>>>> use Bio::Graphics::Feature;
>>>> my $ftr= 'Bio::Graphics::Feature';
>>>>
>>>> # processed_transcript
>>>> my $trans1 = $ftr->new(-start=>50,-end=>10,-name=>'ZK154.1',-type=>"3'-UTR");
>>>> my $trans2 = $ftr->new(-start=>100,-end=>50,-name=>'ZK154.2',-type=>'CDS');
>>>> my $trans3 = $ftr->new(-start=>350,-end=>225,-name=>'ZK154.3',-type=>'CDS',-source=>'a');
>>>> my $trans4 = $ftr->new(-start=>650,-end=>500,-name=>'ZK154.3',-type=>'CDS',-source=>'a');
>>>> my $trans5 = $ftr->new(-start=>700,-end=>650,-name=>'ZK154.3',-type=>"5'-UTR");
>>>> my $trans  = $ftr->new(-segments=>[$trans1,$trans2,$trans3,$trans4,$trans5]);
>>>>
>>>> # highlight
>>>> my $trans31 = $ftr->new(-start=>240,-end=>450,-name=>'hightlight1',-type=>'background',-source=>'a');
>>>> my $trans41 = $ftr->new(-start=>650,-end=>600,-name=>'hightlight2',-type=>'multihourglass',-source=>'b');
>>>>
>>>> my $panel = Bio::Graphics::Panel->new(-width=>1200,
>>>>                                       -length=>1050,
>>>>                                       -start =>0,
>>>>                                       -pad_left=>12,
>>>>                                       -pad_right=>12);
>>>>
>>>> # the following track works as I expected in bioperl 1.2.3, but not in 1.5 and 1.6
>>>> $panel->add_track([$trans41,$trans31],
>>>>                 -glyph => 'background',
>>>>                 -block_bgcolor => sub{return (shift->source eq 'a')?'#cccccc':'#fffc22'},
>>>>                 );
>>>>
>>>> $panel->add_track($ftr->new(-start=>100,-end=>1000),
>>>>                 -glyph=>'arrow',
>>>>                 -double=>1,
>>>>                 -tick=>2);
>>>>
>>>> $panel->add_track($trans,
>>>>                 -glyph   => 'transcript2', # 'transcript2', #process_5utr',
>>>>                 -fgcolor => 'darkred',
>>>>                 -bgcolor => 'darkred',
>>>>                 -title   => '$source',
>>>>                 -link    => 'http://www.ensembl.org/Homo_sapiens/transview?transcript=$name',  #EnsEMBL
>>>>                 );
>>>> print $panel->png;
>>>>
>>>> # the following part works in bioperl 1.5 and 1.6, but does not work in Bioperl 1.2.3
>>>> my $map = $panel->create_web_map("image");
>>>> $panel->finished();
>>>>
>>>> 1;
>>>>
>>>> ==================== background.pm =======================
>>>> package Bio::Graphics::Glyph::background;
>>>>
>>>> use strict;
>>>> use base 'Bio::Graphics::Glyph::generic';
>>>> sub pad_top{
>>>>   return 0;
>>>> }
>>>>
>>>> sub draw_component {
>>>>   my $self = shift;
>>>>   #$self->SUPER::draw_component(@_);
>>>>   my ($gd,$dx,$dy) = @_;
>>>>   my ($left,$top,$right,$bottom) = $self->bounds($dx,$dy);
>>>>
>>>>   # fill the feature's horizontal extent over the full height of the image
>>>>   my $color = $self->option('block_bgcolor') || '#cccccc';
>>>>   $gd->filledRectangle($left,0,$right,$gd->height,
>>>>                        $self->factory->translate_color($color));
>>>> }
>>>>
>>>> 1;
>>>>
>>>> --
>>>> ==========================================
>>>> Xianjun Dong
>>>> PhD student, Lenhard group
>>>> Computational Biology Unit
>>>> Bergen Center for Computational Science
>>>> University of Bergen
>>>> Hoyteknologisenteret, Thormohlensgate 55
>>>> N-5008 Bergen, Norway
>>>> E-mail: xianjun.dong at bccs.uib.no
>>>> Tel.: +47 555 84022
>>>> Fax : +47 555 84295
>>>> ==========================================
>>>>
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>
>>>
>>
>
> --
> ==========================================
> Xianjun Dong
> PhD student, Lenhard group
> Computational Biology Unit
> Bergen Center for Computational Science
> University of Bergen
> Hoyteknologisenteret, Thormohlensgate 55
> N-5008 Bergen, Norway
> E-mail: xianjun.dong at bccs.uib.no
> Tel.: +47 555 84022
> Fax : +47 555 84295
> ==========================================
>
>


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research
-------------- next part --------------
A non-text attachment was scrubbed...
Name: postgrid.pl
Type: application/x-perl
Size: 2140 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20090618/0bee0f33/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: postgrid_highlight.png
Type: image/png
Size: 7195 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20090618/0bee0f33/attachment.png>
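
Since the postgrid.pl attachment was scrubbed from the archive, the
following is NOT that script. It is only a minimal sketch of a -postgrid
highlight callback, modelled on the GBrowse hilite_regions_closure quoted
in this thread. The name make_highlight_callback is made up for
illustration; the panel and GD methods it calls are the ones that appear
in the quoted GBrowse code.

```perl
use strict;
use warnings;

# Build a closure suitable for the -postgrid option of
# Bio::Graphics::Panel->new. Each region is [start, end, color] in
# sequence coordinates; color is optional.
# (make_highlight_callback is a hypothetical helper name.)
sub make_highlight_callback {
    my @regions = @_;
    return sub {
        my ($gd, $panel) = @_;
        my $left   = $panel->pad_left;
        my $bottom = $panel->bottom;
        for my $r (@regions) {
            my ($h_start, $h_end, $h_color) = @$r;
            my ($x1, $x2) = $panel->location2pixel($h_start, $h_end);
            # Widen very narrow regions so they stay visible.
            if ($x2 - $x1 <= 1) { $x2++; $x1-- }
            # y starts at 0 so the band also covers the top padding.
            $gd->filledRectangle($left + $x1, 0, $left + $x2, $bottom,
                $panel->translate_color($h_color || 'lightgrey'));
        }
    };
}
```

With the regions from test.pl it would be wired in roughly like this
(note the comma after -pad_right):

    my $panel = Bio::Graphics::Panel->new(
        -width => 1200, -length => 1050, -start => 0,
        -pad_left => 12, -pad_right => 12,
        -postgrid => make_highlight_callback([240, 450, '#cccccc'],
                                             [600, 650, '#fffc22']));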

From scott at scottcain.net  Thu Jun 18 23:30:37 2009
From: scott at scottcain.net (Scott Cain)
Date: Thu, 18 Jun 2009 23:30:37 -0400
Subject: [Bioperl-l] background layer is not supported in Bioperl 1.6
	for Bio::Graphics::Glyph
In-Reply-To: <4536f7700906182025m1d67afa2y2a62a30d6cc9b19d@mail.gmail.com>
References: <4A32BCDA.4080605@ii.uib.no>
	<536f21b00906121829o7a5076f6hc57eb9a4dd815e48@mail.gmail.com>
	<4A339621.2060702@ii.uib.no>
	<4536f7700906130627r41c90de5r311e58f1d718f9ee@mail.gmail.com>
	<4A33D850.1020203@ii.uib.no> <4A3A13D3.7050208@ii.uib.no>
	<4536f7700906182025m1d67afa2y2a62a30d6cc9b19d@mail.gmail.com>
Message-ID: <4536f7700906182030n74f4293k60ad04ea62b97476@mail.gmail.com>

Actually, to be clear, that's bioperl-live and Bio::Graphics version
1.96 from CPAN.

On Thu, Jun 18, 2009 at 11:25 PM, Scott Cain<scott at scottcain.net> wrote:
> Hi Xianjun,
>
> The attached script (which is not too different from yours--I only did
> a little clean up and made the padding consistent) makes the attached
> image, which is what I think you want.  I'm using bioperl-live.
>
> Scott
>
>
> On Thu, Jun 18, 2009 at 6:15 AM, Xianjun Dong<Xianjun.Dong at bccs.uib.no> wrote:
>> Hi, Scott,
>>
>> Would you mind having a look at the code (below my signature) to check
>> whether I am using the -postgrid callback correctly?
>> I still cannot get the background for the whole panel.
>>
>> Thanks
>>
>> Xianjun
>>
>>
>> Xianjun Dong wrote:
>>>
>>> Hi, Scott
>>>
>>> Before I give up my own solution and switch to GBrowse, I would like to
>>> bother you once more:
>>>
>>> As you suggested, I added the -postgrid option when creating the panel;
>>> it calls a function to draw the background. The code below is almost
>>> copied from the online POD of Bio::Graphics::Panel (see
>>> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Graphics/Panel.html
>>> )
>>>
>>> But it still does not work. Could you have a look? I have pasted it
>>> below. (BTW, on the POD page above, the option is -postgrid=>\&draw_gap,
>>> while the gap drawing function is called gap_it, not draw_gap. I guess
>>> it's a typo?)
>>>
>>> Thanks
>>>
>>> Xianjun
>>>
>>> ----------------------- mytestcode.pl -----------------------
>>>
>>> #!/usr/bin/perl
>>>
>>> use strict;
>>> use lib "$ENV{HOME}/lib";
>>>
>>> use Bio::Graphics;
>>> use Bio::Graphics::Feature;
>>> my $ftr= 'Bio::Graphics::Feature';
>>>
>>> # processed_transcript
>>> my $trans1 = $ftr->new(-start=>50,-end=>10,-name=>'ZK154.1',-type=>"3'-UTR");
>>> my $trans2 = $ftr->new(-start=>100,-end=>50,-name=>'ZK154.2',-type=>'CDS');
>>> my $trans3 = $ftr->new(-start=>350,-end=>225,-name=>'ZK154.3',-type=>'CDS',-source=>'a');
>>> my $trans4 = $ftr->new(-start=>650,-end=>500,-name=>'ZK154.3',-type=>'CDS',-source=>'a');
>>> my $trans5 = $ftr->new(-start=>700,-end=>650,-name=>'ZK154.3',-type=>"5'-UTR");
>>> my $trans  = $ftr->new(-segments=>[$trans1,$trans2,$trans3,$trans4,$trans5]);
>>>
>>> # highlight
>>> my $trans31 = $ftr->new(-start=>240,-end=>450,-name=>'hightlight1',-type=>'background',-source=>'a');
>>> my $trans41 = $ftr->new(-start=>650,-end=>600,-name=>'hightlight2',-type=>'multihourglass',-source=>'b');
>>>
>>> my $panel = Bio::Graphics::Panel->new(-width=>1200,
>>>                                       -length=>1050,
>>>                                       -start =>0,
>>>                                       -pad_left=>12,
>>>                                       -pad_right=>12,
>>>                                       -postgrid=>\&gap_it);
>>>
>>> sub gap_it {
>>>     my $gd    = shift;
>>>     my $panel = shift;
>>>     my ($gap_start,$gap_end) = $panel->location2pixel(500,600);
>>>     my $top                  = $panel->top;
>>>     my $bottom               = $gd->height; # $panel->bottom;
>>>     my $gray                 = $panel->translate_color('red');
>>>     $gd->filledRectangle($gap_start,$top,$gap_end,$bottom,$gray);
>>> }
>>> # the following track works as I expected in bioperl 1.2.3, but not in 1.5 and 1.6
>>> #$panel->add_track([$trans41,$trans31],
>>> #                -glyph => 'background',
>>> #                -block_bgcolor => sub{return (shift->source eq 'a')?'#cccccc':'#fffc22'},
>>> #                );
>>>
>>> $panel->add_track($ftr->new(-start=>100,-end=>1000),
>>>                 -glyph=>'arrow',
>>>                 -double=>1,
>>>                 -tick=>2);
>>>
>>> $panel->add_track($trans,
>>>                 -glyph   => 'transcript2', # 'transcript2', #process_5utr',
>>>                 -fgcolor => 'darkred',
>>>                 -bgcolor => 'darkred',
>>>                 -title   => '$source',
>>>                 -link    => 'http://www.ensembl.org/Homo_sapiens/transview?transcript=$name',  #EnsEMBL
>>>                 );
>>> print $panel->png;
>>>
>>> # the following part works in bioperl 1.5 and 1.6, but does not work in Bioperl 1.2.3
>>> my $map = $panel->create_web_map("image");
>>> $panel->finished();
>>
>> --
>> ==========================================
>> Xianjun Dong
>> PhD student, Lenhard group
>> Computational Biology Unit
>> Bergen Center for Computational Science
>> University of Bergen
>> Hoyteknologisenteret, Thormohlensgate 55
>> N-5008 Bergen, Norway
>> E-mail: xianjun.dong at bccs.uib.no
>> Tel.: +47 555 84022
>> Fax : +47 555 84295
>> ==========================================
>>
>>
>
>
>
> --
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                                   scott at scottcain dot net
> GMOD Coordinator (http://gmod.org/)                     216-392-3087
> Ontario Institute for Cancer Research
>


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From roy.chaudhuri at gmail.com  Fri Jun 19 06:34:24 2009
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Fri, 19 Jun 2009 11:34:24 +0100
Subject: [Bioperl-l] Problems parsing scientific name from a Genbank file
In-Reply-To: <24095355.post@talk.nabble.com>
References: <24095355.post@talk.nabble.com>
Message-ID: <4A3B69B0.8080305@gmail.com>

Hi Cesar,

I can replicate this using an old Bioperl (version 1.5.2), but it 
appears to be fixed in version 1.6 and bioperl-live - the 
scientific_name method returns "Bacillus anthracis str. Sterne".

Hope this helps.
Roy.

Cesar Arze wrote:
> Hi all,
>    I've searched through the mailing list and bug-tracker looking for any
> indication of this (what I presume to be) bug I have been encountering when
> parsing certain Genbank files using SeqIO::GenBank but have yet to find
> anything. I apologize in advance if this is something that has already been
> addressed.
> 
> When parsing these files and extracting the scientific name it seems that
> line breaks are causing the lineage info found in the ORGANISM section to be
> captured as part of the scientific name. An example of this is accession
> NC_005945:
> 
>   ORGANISM  Bacillus anthracis str. Sterne
>             Bacteria; Firmicutes; Bacillales; Bacillaceae; Bacillus; Bacillus
>             cereus group.
> 
> "Bacillus cereus group" is broken across two lines, which causes the
> scientific name to capture "Bacteria; Firmicutes; Bacillales; Bacillaceae;
> Bacillus; Bacillus", ending up with "Bacillus anthracis str. Sterne
> Bacteria; Firmicutes; Bacillales; Bacillaceae; Bacillus; Bacillus" as the
> final scientific name.
> 
> Not sure if anyone has ever run into this problem, but I would very much
> appreciate any help or direction.
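
For anyone who cannot upgrade straight away, the ORGANISM block can be
split by hand. The sketch below is not the Bio::SeqIO::genbank parser and
does not reproduce what it does internally; split_organism_block is a
made-up helper. It assumes the organism name sits entirely on the first
line and that every following line belongs to the semicolon-separated
lineage, which does not hold for records whose name itself wraps.

```perl
use strict;
use warnings;

# Split the text of a GenBank ORGANISM block into (name, \@lineage).
# Assumption: the name occupies the first line only; all later lines
# are the lineage, which ends with a full stop.
# (split_organism_block is a hypothetical helper name.)
sub split_organism_block {
    my ($block) = @_;
    my @lines;
    for my $line (split /\n/, $block) {
        $line =~ s/^\s+//;
        $line =~ s/\s+$//;
        push @lines, $line if length $line;
    }
    return unless @lines;

    my $name = shift @lines;
    $name =~ s/^ORGANISM\s+//;

    # Re-join wrapped lineage lines with a space before splitting on ';',
    # so a taxon broken across two lines is restored as one entry.
    my $lineage = join ' ', @lines;
    $lineage =~ s/\.$//;
    my @taxa = grep { length } split /\s*;\s*/, $lineage;

    return ($name, \@taxa);
}
```

Joining the lineage lines before splitting on ';' is what keeps
"Bacillus cereus group" together as a single taxon.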


From cjfields at illinois.edu  Fri Jun 19 16:57:36 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 19 Jun 2009 15:57:36 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <69C726B6-41DC-41EC-9BC9-DFEC0267CD3B@ucl.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>	<92C15E3391F64BAF801754E924122540@NewLife>
	<200906170927.13273.tristan.lefebure@gmail.com>
	<4A3933D0.4040808@sendu.me.uk>
	<8E010693-30B2-4E54-81CA-97755FC6CAE5@illinois.edu>
	<0500F8EB-F3BD-44E8-801A-7A45783DA40F@ucl.ac.uk>
	<3C32922E-690B-4246-BD44-7600153E57C7@illinois.edu>
	<69C726B6-41DC-41EC-9BC9-DFEC0267CD3B@ucl.ac.uk>
Message-ID: <E53B7AAC-9128-4637-AB0F-AC6C0C195D01@illinois.edu>

So, to follow up (and make sure we don't have any overlapping tuits)  
we should probably determine who wants to work on what (i.e. fastq  
updating, etc). I think it's possible to quickly add in Solexa/ 
Illumina/Sanger fastq similar to BioPython, just don't want to step on  
anyone's toes if they are halfway through doing this.

chris
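
As a point of comparison for the speed discussion quoted below, here is
a rough sketch of the "return the data directly" approach: a pure-Perl
reader that creates no Bio::Seq* objects, plus a 3' quality trim. It is
not the Bio::SeqIO::fastq implementation; next_fastq and trim_3prime are
made-up names. It assumes four-line records (no wrapped sequence or
quality lines) and Phred-scaled scores, with an offset of 33 for
Sanger-style files and 64 for Illumina 1.3+ files. Older Solexa files use
a different (log-odds) scale and would need converting first.

```perl
use strict;
use warnings;

# Read one four-line FASTQ record from a filehandle and return
# (id, sequence, quality string), or an empty list at end of file.
# Assumption: sequence and quality are not wrapped over several lines.
sub next_fastq {
    my ($fh) = @_;
    defined(my $header = <$fh>) or return;
    my $seq  = <$fh>;
    my $plus = <$fh>;
    my $qual = <$fh>;
    die "truncated FASTQ record\n" unless defined $qual;
    chomp($header, $seq, $plus, $qual);
    die "bad FASTQ header: $header\n" unless $header =~ /^@(\S+)/;
    my $id = $1;
    die "length mismatch in $id\n" unless length($seq) == length($qual);
    return ($id, $seq, $qual);
}

# Trim trailing bases whose Phred score is below $min_q.
# $offset is 33 for Sanger-style files and 64 for Illumina 1.3+ files.
sub trim_3prime {
    my ($seq, $qual, $min_q, $offset) = @_;
    my $keep = length $qual;
    $keep-- while $keep > 0
        && ord(substr($qual, $keep - 1, 1)) - $offset < $min_q;
    return (substr($seq, 0, $keep), substr($qual, 0, $keep));
}
```

A caller would loop with while (my ($id, $seq, $qual) = next_fastq($fh))
and only build BioPerl objects for the reads it decides to keep.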

On Jun 17, 2009, at 3:36 PM, Elia Stupka wrote:

> Better than colorspaced discussions for sure ;)
>
> Elia
>
> On 17 Jun 2009, at 21:35, Chris Fields wrote:
>
>> So, #1 priority is to get fastq up-to-speed, then maybe assess  
>> other options.
>>
>> Illuminating discussion, thanks Elia!
>>
>> urgh, excuse unintended bad pun above...
>>
>> chris
>>
>> On Jun 17, 2009, at 3:06 PM, Elia Stupka wrote:
>>
>>> Interesting that you mention the database issue. We found that for  
>>> specific memory/CPU intensive things we also switch to using dbs.  
>>> For example, after many years of loyal use of disconnected_ranges  
>>> we switched to a simple SQL implementation of it, because of the  
>>> large performance gains it would give us.  Similarly in Ensembl as  
>>> well as in the old days of bioperl-db we opted for doing subseq  
>>> within SQL where possible.
>>>
>>> Some lean way of SQL'izing specific components could be less  
>>> "disruptive" than avoiding object creation and provide significant  
>>> gains in performance. Could be set as an optional flag, and could  
>>> use temporary ad hoc SQL databases?
>>>
>>> Still, priority now is to make SeqIO compliant with all those  
>>> formats, then we can worry about performance :)
>>>
>>> Elia
>>>
>>> On 17 Jun 2009, at 20:30, Chris Fields wrote:
>>>
>>>> On Jun 17, 2009, at 1:20 PM, Sendu Bala wrote:
>>>>
>>>>> Tristan Lefebure wrote:
>>>>>> Hello,
>>>>>> Regarding next-gen sequences and bioperl, following my  
>>>>>> experience, another issue is bioperl speed. For example, if you  
>>>>>> want to trim bad quality bases at ends of 1E6 Solexa reads  
>>>>>> using Bio::SeqIO::fastq and some methods in Bio::Seq::Quality,  
>>>>>> well, you've got to be patient (but maybe I missed some
>>>>>> shortcuts...).
>>>>>
>>>>> This is my concern as well. Or, rather, is there actually a  
>>>>> significant set of users out there who are dealing with next-gen  
>>>>> sequencing and would consider using BioPerl for their work?
>>>>>
>>>>> I'm working with all the 1000-genomes data at the Sanger, and we  
>>>>> at least are probably never going to use BioPerl for the work.
>>>>
>>>> Are you using pure perl or (gasp) something else?  ;>
>>>>
>>>> Judging by the feedback there are definitely a set of users who  
>>>> would like to integrate nextgen into bioperl somehow, probably to  
>>>> take advantage of other aspects of bioperl.
>>>>
>>>>>> A pure perl solution will be between 100 to 1000x faster...  
>>>>>> Would it be possible to have an ultra-light quality object with  
>>>>>> few simple methods for next-gen reads?
>>>>>
>>>>> The fastq parser itself already seems pretty fast. The way to  
>>>>> get the speedup is to not create any Bio::Seq* objects but just  
>>>>> return the data directly. At that point it's not taking much  
>>>>> advantage of BioPerl. But certainly it could be done...
>>>>
>>>>
>>>> I suppose the best way to assess what needs to be done is come up  
>>>> with a set of 'use cases' specifying what users want so we can  
>>>> design around them, otherwise we're shooting in the dark.
>>>>
>>>> I'm personally wondering if this could be done as a sequence  
>>>> database, something similar in theme to Lincoln's  
>>>> SeqFeature::Store, but sequence only, and returns quality objects  
>>>> in a similar manner (ala Storable)?  Not sure whether that's  
>>>> feasible, but it appears at least scalable.
>>>>
>>>> chris
>>>>
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>> ---
>>> Senior Lecturer, Bioinformatics
>>> UCL Cancer Institute
>>> Paul O' Gorman Building
>>> University College London
>>> Gower Street
>>> WC1E 6BT
>>> London
>>> UK
>>>
>>> Office (UCL): +44 207 679 6493
>>> Office (ICMS): +44 0207 8822374
>>>
>>> Mobile: +44 7597 566 194
>>> Mobile (Italy): +39 338 8448801


From biopython at maubp.freeserve.co.uk  Sat Jun 20 04:46:31 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sat, 20 Jun 2009 09:46:31 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<BF526BE6-AFA8-4F66-9665-D6328F9B4FFA@illinois.edu>
	<320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com>
	<1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>
Message-ID: <320fb6e00906200146t547a0492r23d5f123e01098e8@mail.gmail.com>

On Wed, Jun 17, 2009 at 6:06 PM, Chris Fields<cjfields at illinois.edu> wrote:
>
>
> On Jun 17, 2009, at 8:25 AM, Peter wrote:
>
>>> Peter's suggestions also are reasonable, though does biopython have a
>>> separate module for each of these variations? Our version (I believe)
>>> mainly varied the conversion within Bio::SeqIO::fastq itself based on the
>>> fastq variant passed in as a separate named argument.
>>
>> Biopython's SeqIO gives the three FASTQ variants their own unique
>> names. This format name is a required argument for parsing/writing
>> (we don't try and guess the file format from the data contents).
>> Internally we have three separate FASTQ parsers/writers although
>> they do share code.
>
> We could easily do the same if others agree. Actually, if we specified that
> shorthand for a variant on a format would be designated as -format =>
> 'format-variant', I think we could easily hack SeqIO to deal with that by
> splitting on '-' and passing everything to the constructor as (-format =>
> 'format', -variant => 'variant'). Very little repeated code in this case,
> just an additional named parameter indicating the format variant (and the
> SeqIO class can do the type checking on that within the constructor).

Yes, when I started using names like "fastq-solexa" I did have in mind
"main-variant" naming convention, and potentially Biopython may one
day actually use this structure when allocating a Bio.SeqIO job to the
appropriate parser or writer.

For now, the Biopython list of formats is fairly short (and there are
relatively few of these sub-formats) so to keep things simple we just
have a flat mapping from the format name (e.g. "fasta", "fastq",
"fastq-solexa") to the parser/writer code.
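
As a rough illustration of the "main-variant" naming discussed above, here
is a minimal Perl sketch. The helper names split_format and handler_for and
the %HANDLERS table are hypothetical (invented for this example, not part of
BioPerl or Biopython); the sketch only shows splitting on the first '-' and
a flat name-to-handler lookup.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl API): split "format-variant" on the
# first '-' only, so "fastq-solexa" becomes ("fastq", "solexa").
sub split_format {
    my ($name) = @_;
    my ($format, $variant) = split /-/, lc $name, 2;
    return ($format, $variant);
}

# Flat mapping from the full format name to a handler label (assumed names).
my %HANDLERS = (
    'fasta'          => 'fasta parser',
    'fastq'          => 'fastq parser (Sanger)',
    'fastq-solexa'   => 'fastq parser (Solexa)',
    'fastq-illumina' => 'fastq parser (Illumina 1.3+)',
);

sub handler_for {
    my ($name) = @_;
    return $HANDLERS{lc $name};
}

for my $name (qw(fasta fastq-solexa)) {
    my ($format, $variant) = split_format($name);
    printf "%s => -format => '%s', -variant => %s\n",
        $name, $format, defined $variant ? "'$variant'" : 'undef';
}
```

With the flat table a new variant is one extra entry; with the split form
the variant could instead be validated inside a single fastq constructor.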

Peter


From e.stupka at ucl.ac.uk  Sat Jun 20 16:12:18 2009
From: e.stupka at ucl.ac.uk (Elia Stupka)
Date: Sat, 20 Jun 2009 21:12:18 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <E53B7AAC-9128-4637-AB0F-AC6C0C195D01@illinois.edu>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>	<92C15E3391F64BAF801754E924122540@NewLife>
	<200906170927.13273.tristan.lefebure@gmail.com>
	<4A3933D0.4040808@sendu.me.uk>
	<8E010693-30B2-4E54-81CA-97755FC6CAE5@illinois.edu>
	<0500F8EB-F3BD-44E8-801A-7A45783DA40F@ucl.ac.uk>
	<3C32922E-690B-4246-BD44-7600153E57C7@illinois.edu>
	<69C726B6-41DC-41EC-9BC9-DFEC0267CD3B@ucl.ac.uk>
	<E53B7AAC-9128-4637-AB0F-AC6C0C195D01@illinois.edu>
Message-ID: <F99E2F7F-05F7-462B-A3ED-96E09746994B@ucl.ac.uk>

Hi Chris,

I agree. I have not written a single line of code so far, while Heikki  
has some (but has been silent for a while) and you have perhaps some  
code ready to roll. I am happy to help where needed, just let me know  
what you'd like me to focus on. If you want to go ahead and implement  
the fastq stuff discussed I can focus on bioperl-run.

cheers

Elia


On 19 Jun 2009, at 21:57, Chris Fields wrote:

> So, to follow up (and make sure we don't have any overlapping tuits)  
> we should probably determine who wants to work on what (i.e. fastq  
> updating, etc). I think it's possible to quickly add in Solexa/ 
> Illumina/Sanger fastq similar to BioPython, just don't want to step  
> on anyone's toes if they are halfway through doing this.
>
> chris
>
> On Jun 17, 2009, at 3:36 PM, Elia Stupka wrote:
>
>> Better than colorspaced discussions for sure ;)
>>
>> Elia
>>
>> On 17 Jun 2009, at 21:35, Chris Fields wrote:
>>
>>> So, #1 priority is to get fastq up-to-speed, then maybe assess  
>>> other options.
>>>
>>> Illuminating discussion, thanks Elia!
>>>
>>> urgh, excuse unintended bad pun above...
>>>
>>> chris
>>>
>>> On Jun 17, 2009, at 3:06 PM, Elia Stupka wrote:
>>>
>>>> Interesting that you mention the database issue. We found that  
>>>> for specific memory/CPU intensive things we also switch to using
>>>> dbs. For example, after many years of loyal use of  
>>>> disconnected_ranges we switched to a simple SQL implementation of  
>>>> it, because of the large performance gains it would give us.   
>>>> Similarly in Ensembl as well as in the old days of bioperl-db we  
>>>> opted for doing subseq within SQL where possible.
>>>>
>>>> Some lean way of SQL'izing specific components could be less  
>>>> "disruptive" than avoiding object creation and provide  
>>>> significant gains in performance. Could be set as an optional  
>>>> flag, and could use temporary ad hoc SQL databases?
>>>>
>>>> Still, priority now is to make SeqIO compliant with all those  
>>>> formats, then we can worry about performance :)
>>>>
>>>> Elia
>>>>
>>>> On 17 Jun 2009, at 20:30, Chris Fields wrote:
>>>>
>>>>> On Jun 17, 2009, at 1:20 PM, Sendu Bala wrote:
>>>>>
>>>>>> Tristan Lefebure wrote:
>>>>>>> Hello,
>>>>>>> Regarding next-gen sequences and bioperl, following my  
>>>>>>> experience, another issue is bioperl speed. For example, if  
>>>>>>> you want to trim bad quality bases at ends of 1E6 Solexa reads  
>>>>>>> using Bio::SeqIO::fastq and some methods in Bio::Seq::Quality,  
>>>>>>> well, you've got to be patient (but maybe I missed some
>>>>>>> shortcuts...).
>>>>>>
>>>>>> This is my concern as well. Or, rather, is there actually a  
>>>>>> significant set of users out there who are dealing with next- 
>>>>>> gen sequencing and would consider using BioPerl for their work?
>>>>>>
>>>>>> I'm working with all the 1000-genomes data at the Sanger, and  
>>>>>> we at least are probably never going to use BioPerl for the work.
>>>>>
>>>>> Are you using pure perl or (gasp) something else?  ;>
>>>>>
>>>>> Judging by the feedback there are definitely a set of users who  
>>>>> would like to integrate nextgen into bioperl somehow, probably  
>>>>> to take advantage of other aspects of bioperl.
>>>>>
>>>>>>> A pure perl solution will be between 100 to 1000x faster...  
>>>>>>> Would it be possible to have an ultra-light quality object  
>>>>>>> with few simple methods for next-gen reads?
>>>>>>
>>>>>> The fastq parser itself already seems pretty fast. The way to  
>>>>>> get the speedup is to not create any Bio::Seq* objects but just  
>>>>>> return the data directly. At that point it's not taking much  
>>>>>> advantage of BioPerl. But certainly it could be done...
>>>>>
>>>>>
>>>>> I suppose the best way to assess what needs to be done is come  
>>>>> up with a set of 'use cases' specifying what users want so we  
>>>>> can design around them, otherwise we're shooting in the dark.
>>>>>
>>>>> I'm personally wondering if this could be done as a sequence  
>>>>> database, something similar in theme to Lincoln's  
>>>>> SeqFeature::Store, but sequence only, and returns quality  
>>>>> objects in a similar manner (ala Storable)?  Not sure whether  
>>>>> that's feasible, but it appears at least scalable.
>>>>>
>>>>> chris
>>>>>

---
Senior Lecturer, Bioinformatics
UCL Cancer Institute
Paul O' Gorman Building
University College London
Gower Street
WC1E 6BT
London
UK

Office (UCL): +44 207 679 6493
Office (ICMS): +44 0207 8822374

Mobile: +44 7597 566 194
Mobile (Italy): +39 338 8448801


From lincoln.stein at gmail.com  Sat Jun 20 17:01:43 2009
From: lincoln.stein at gmail.com (Lincoln Stein)
Date: Sat, 20 Jun 2009 17:01:43 -0400
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
Message-ID: <6dce9a0b0906201401j40175dbdscd71360396fe9f7a@mail.gmail.com>

Hi All,

Apropos of this, I am about to release to CPAN a BioPerl interface to SAM
and BAM files. The documentation is still in progress, but you can get CVS
access here:

% cvs -d :pserver:anonymous@gmod.cvs.sourceforge.net:/cvsroot/gmod co gbrowse-adaptors/Bio-SamTools

Lincoln

On Wed, Jun 17, 2009 at 7:29 AM, Elia Stupka <e.stupka at ucl.ac.uk> wrote:

> Dear all,
>
> after several years of absence I am slowly coming back to Bioperl, and hope
> to contribute again to its development.
>
> One area that I was thinking of starting from, since we are actively
> involved with it, is to improve BioPerl's support for next-gen sequencing
> data, tools, etc. Since I am sure I have missed out on a lot of recent
> developments, do let me know if/what is useful.
>
> One example that comes to mind is that the conversion of various formats
> to/from FASTQ does not seem to be supported. Some code can be found within
> Li Heng's script: http://maq.sourceforge.net/fq_all2std.pl but it would be
> good if it could make its way into SeqIO? And similarly, potentially, for
> other next-gen sequence formats?
>
> Similarly, there seems to be little in bioperl-run to support tools that
> have been developed in this area, such as Maq, BowTie, TopHat, etc?
>
> Do let me know if there is a past thread on this, or other people actively
> developing, etc. so that I can find out what priorities are.
>
> thanks and best regards to all (old friends and new),
>
> Elia


-- 
Lincoln D. Stein
Director, Informatics and Biocomputing Platform
Ontario Institute for Cancer Research
101 College St., Suite 800
Toronto, ON, Canada M5G0A3
416 673-8514
Assistant: Renata Musa <Renata.Musa at oicr.on.ca>

From hartzell at alerce.com  Mon Jun 22 09:18:20 2009
From: hartzell at alerce.com (George Hartzell)
Date: Mon, 22 Jun 2009 06:18:20 -0700
Subject: [Bioperl-l] Anyone at YAPC?
Message-ID: <19007.33948.411442.197063@already.dhcp.gene.com>


I'm in Pittsburgh at YAPC 10 and would be happy to say hello to folks.

g.


From cjfields1 at gmail.com  Mon Jun 22 10:05:56 2009
From: cjfields1 at gmail.com (Chris Fields)
Date: Mon, 22 Jun 2009 09:05:56 -0500
Subject: [Bioperl-l] changing parameters in Bio::Tools::Run::RemoteBlast
In-Reply-To: <F52FFB80A7304749B467C46E10A2869D@jonas>
References: <F52FFB80A7304749B467C46E10A2869D@jonas>
Message-ID: <67ABC7E3-216E-4F5A-B18E-A775A6B4D8F7@gmail.com>

Jonas,

The best place to send questions is to the mail list (which I've  
cc'd).  If you reply make sure to keep the mail list in the reply-to.

There are two ways to set the parameters you want.  I'll show you what  
I consider the best, but I have no way to test it ATM.

$factory->submit_parameter($foo => 'bar')

is the syntax for setting PUT parameters.  Sad to see they didn't  
provide you with the exact PUT parameter names (as follows):

Max target sequences = 100 # MAX_NUM_SEQ
Expect threshold = 10  # EXPECT
Gap Costs = Existence 11 Extension 1   # GAPCOSTS
Compositional adjustments = Conditional compositional score matrix  
adjustment # COMPOSITION_BASED_STATISTICS

'Compositional adjustments' is as follows (from command-line blastall):

   -C  Use composition-based score adjustments for blastp or tblastn:
       As first character:
       D or d: default (equivalent to T)
       0 or F or f: no composition-based statistics
       1: Composition-based statistics as in NAR 29:2994-3005, 2001
       2 or T or t: Composition-based score adjustments as in
           Bioinformatics 21:902-911, 2005, conditioned on sequence properties
       3: Composition-based score adjustment as in Bioinformatics 21:902-911,
           2005, unconditionally
       For programs other than tblastn, must either be absent or be D, F or 0.
       As second character, if first character is equivalent to 1, 2, or 3:

After the factory line and prior to the BLAST call you can add in the  
following (completely untested, excuse any possible mistakes) code:

my %put = (
    MAX_NUM_SEQ => 100,
    EXPECT      => 10,
    GAPCOSTS    => '11 1',
    COMPOSITION_BASED_STATISTICS => 2 # could be 1 as well
);

for my $putName (keys %put) {
    # call the method on the RemoteBlast factory object created above
    $factory->submit_parameter($putName,$put{$putName});
}


chris

On Jun 22, 2009, at 8:14 AM, Jonas Schaer wrote:

> Hi there,
> I hope it's OK to ask you a question about the bio perl module   
> Bio::Tools::Run::RemoteBlast.
> My problem is that I get different results using this Perl script:
>
> #######################################################################################################################################################################################
>  use Bio::Seq::SeqFactory;
>  use Bio::Tools::Run::RemoteBlast;
>  use strict;
>  my @blast_report;
>  my $prog = 'blastp';
>  my $db   = 'nr';
>  my $e_val= '1e-10';
>  my @params = ( '-prog' => $prog,
>         '-data' => $db,
>         '-expect' => $e_val,
>         '-readmethod' => 'SearchIO' );
>  my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
>  #my $input = @_;
>  my $blast_seq = 'MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTAMATGPDPDDEYE';
>  #$v is just to turn on and off the messages
>  my $v = 1;
>  my $seqbuilder = Bio::Seq::SeqFactory->new('-type' =>  
> 'Bio::PrimarySeq');
>  my $seq = $seqbuilder->create(-seq =>$blast_seq, -display_id =>  
> "$blast_seq");
>  my $filename='temp2.out';
>  my $r = $factory->submit_blast($seq);
>  print STDERR "waiting..." if( $v > 0 );
>    while ( my @rids = $factory->each_rid )
>    {
>        foreach my $rid ( @rids )
>        {
>            my $rc = $factory->retrieve_blast($rid);
>            if( !ref($rc) )
>            {
>                if( $rc < 0 )
>                {
>                    $factory->remove_rid($rid);
>                }
>                print STDERR "." if ( $v > 0 );
>            }
>                else
>                {
>                    my $result = $rc->next_result();
>                    $factory->save_output($filename);
>                    $factory->remove_rid($rid);
>                    print "\nQuery Name: ", $result->query_name(),  
> "\n";
>                    while ( my $hit = $result->next_hit )
>                    {
>                        next unless ( $v > 0);
>                        print "\thit name is ", $hit->name, "\n";
>                        while( my $hsp = $hit->next_hsp )
>                        {
>                            print "\t\tscore is ", $hsp->score, "\n";
>                        }
>                    }
>                }
>        }
>
>
>    }
> @blast_report = get_file_data ($filename);
> return @blast_report;
>
>
> sub get_file_data
> {
>    use strict;
>    my($filename) = @_;
>    use strict;
>    use warnings;
>    # Initialize variables
>    my @filedata = ( );
>    unless( open(GET_FILE_DATA, $filename) )
>    {
>        print STDERR "Cannot open file \"$filename\"\n\n";
>        exit;
>    }
>    @filedata = <GET_FILE_DATA>;
>    close GET_FILE_DATA;
>    print @filedata;
>    return @filedata;
> }
>
> #######################################################################################################################################################################################
>
> ... and the blastp on the ncbi-homepage. The people from NCBI wrote  
> me that I have to change some parameters:
> ""
> You need to have the following:
>
>
> Max target sequences = 100
> Expect threshold = 10
> Gap Costs = Existence 11 Extension 1
> Compositional adjustments = Conditional compositional score matrix  
> adjustment""
>
> Could you please tell me exactly how to change these parameters
> within my Perl script? I think I have to use the "put" command, but
> I just cannot find out how...
>
> Regards and thank you so much in advance :),
>
> Jonas Schaer


From biopython at maubp.freeserve.co.uk  Mon Jun 22 10:24:55 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 22 Jun 2009 15:24:55 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<BF526BE6-AFA8-4F66-9665-D6328F9B4FFA@illinois.edu>
	<320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com>
	<1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>
Message-ID: <320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com>

On Wed, Jun 17, 2009 at 6:06 PM, Chris Fields wrote:
> Peter wrote:
>> Other issues to keep in mind:
>>
>> (3) There should be no warning parsing files where the optional repeated
>> title is missing on the "+" lines (as discussed earlier on the BioPerl
>> list).
>
> Agreed, though we'll have to check the current fastq parser to see if that's
> currently the case. I thought that was fixed but maybe not?
>
>> (4) When writing FASTQ files should BioPerl omit the optional repeated
>> title on the "+" line? Biopython omits this as I understand this to be
>> common practice, and can make a big difference to file sizes - especially
>> on short read data from Solexa/Illumina.
>
> Agreed, particularly if it's commonly encountered.
>
>> (5) Also test reading and writing files with an optional description (as
>> well as an identifier) on the "@" (and "+") lines. See the NCBI SRA
>> for examples, e.g.
>>
>> @SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
>> GGGTGATGGCCGCTGCCGATGGCGTCAAATCCCACC
>> +SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
>> IIIIIIIIIIIIIIIIIIIIIIIIIIIIII9IG9IC
>
> Should be easy enough to implement with a simple regex.
>
>> (6) Test reading and writing files where the encoded quality string starts
>> with a "@" or a "+" character, e.g.
>> http://lists.open-bio.org/pipermail/bioperl-l/2009-May/029911.html
>>
>> Peter
>
> Mark, getting all that? ;>
>
> chris
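
Points (3) to (6) above can be sketched in plain Perl as follows. This is
only an illustration under stated assumptions, not the Bio::SeqIO::fastq
implementation: the function names read_fastq and format_fastq are
hypothetical, the sample records are made up, and sequence and quality are
assumed to sit on one line each (no line wrapping).

```perl
use strict;
use warnings;

# Read 4-line FASTQ records (assumes no line wrapping). Each line is taken
# by position, so a quality string starting with '@' or '+' is never
# mistaken for a title line.
sub read_fastq {
    my ($fh) = @_;
    my @records;
    while (defined(my $title = <$fh>)) {
        chomp $title;
        next if $title eq '';
        my ($seq, $plus, $qual) = map { my $l = <$fh>; defined $l ? $l : '' } 1 .. 3;
        chomp($seq, $plus, $qual);
        # Identifier, then an optional free-text description.
        my ($id, $desc) = $title =~ /^\@(\S+)(?:\s+(.*))?$/
            or die "Bad title line: $title\n";
        $plus =~ /^\+(.*)$/ or die "Bad '+' line for $id\n";
        # The repeated title is optional; complain only if present and different.
        die "Title mismatch for $id\n" if $1 ne '' && $1 ne substr($title, 1);
        die "Length mismatch for $id\n" if length($seq) != length($qual);
        push @records, { id => $id, desc => $desc, seq => $seq, qual => $qual };
    }
    return @records;
}

# Write a record without repeating the title on the '+' line.
sub format_fastq {
    my ($rec) = @_;
    my $title = $rec->{id};
    $title .= " $rec->{desc}" if defined $rec->{desc} && $rec->{desc} ne '';
    return "\@$title\n$rec->{seq}\n+\n$rec->{qual}\n";
}

# Made-up sample data: one record with a repeated title and description,
# one with a bare '+' line and a quality string starting with '@'.
my $data = <<'END';
@read1 lane=7 length=8
ACGTACGT
+read1 lane=7 length=8
IIII9IG9
@read2
ACGT
+
@+II
END

open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my @records = read_fastq($fh);
close $fh;
print format_fastq($_) for @records;
```

Returning plain hashes rather than Bio::Seq objects also matches the earlier
remark in this thread that the speedup comes from skipping object creation.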

Another couple of points that I should have remembered earlier,
related to converting between PHRED scores and Solexa scores.
On the bright side, with Illumina abandoning the Solexa scores
in pipeline 1.3+, these issues will go away with time:

(7) If BioPerl will be converting Solexa scores to/from PHRED
scores as integers automatically (as discussed earlier), make
sure you round to the nearest whole number (don't just truncate
with a call to int!). MAQ does this by adding 0.5 before calling
int (while in Biopython I just use Python's round function).

(8) When asked to write out an old Solexa style FASTQ file,
what will you do if given a standard Sanger FASTQ file (or a
new Illumina 1.3+ FASTQ file) containing a base with PHRED
quality zero? This maps to a Solexa quality of minus infinity...
Right now the development version of Biopython will throw an
error in this situation, but mapping to the lowest observed
Solexa score might be reasonable.
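
A minimal Perl sketch of points (7) and (8), using the usual relations
Q_phred = 10*log10(10^(Q_solexa/10) + 1) and its inverse. The function
names are hypothetical, and the floor of -5 for the "lowest Solexa score"
is an assumption chosen for the example; throwing an error instead, as
Biopython currently does, is the other option mentioned above.

```perl
use strict;
use warnings;
use POSIX qw(floor);

# Round to the nearest whole number. floor($x + 0.5) is used instead of
# int($x + 0.5) because Solexa scores can be negative and int() truncates
# towards zero.
sub round_nearest {
    my ($x) = @_;
    return floor($x + 0.5);
}

sub solexa_to_phred {
    my ($qs) = @_;
    return round_nearest(10 * log(10 ** ($qs / 10) + 1) / log(10));
}

# $min_solexa is an assumed lower bound (default -5), used when the exact
# value would be minus infinity (PHRED 0) or below the bound.
sub phred_to_solexa {
    my ($qp, $min_solexa) = @_;
    $min_solexa = -5 unless defined $min_solexa;
    return $min_solexa if $qp <= 0;
    my $qs = round_nearest(10 * log(10 ** ($qp / 10) - 1) / log(10));
    return $qs < $min_solexa ? $min_solexa : $qs;
}

for my $qp (0, 1, 3, 10, 40) {
    printf "PHRED %2d -> Solexa %3d\n", $qp, phred_to_solexa($qp);
}
```

For PHRED 10 the unrounded Solexa value is about 9.54, so truncating with
int would give 9 where rounding gives 10.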

Peter


From cjfields at illinois.edu  Mon Jun 22 09:54:22 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 22 Jun 2009 08:54:22 -0500
Subject: [Bioperl-l] Anyone at YAPC?
In-Reply-To: <19007.33948.411442.197063@already.dhcp.gene.com>
References: <19007.33948.411442.197063@already.dhcp.gene.com>
Message-ID: <FE1E935F-C10A-471F-8E9A-27658F5216A2@illinois.edu>

I think some of the regular #bioperl folk are there (Jay Hannah, R.  
Buels, etc).  May be worth going on IRC to find everyone.

I'm giving serious thought to going next year if I can get enough work  
done towards a perl6 or Moose-based bioperl.

chris

On Jun 22, 2009, at 8:18 AM, George Hartzell wrote:

>
> I'm in Pittsburgh at YAPC 10 and would be happy to say hello to folks.
>
> g.


From vofford at rvc.ac.uk  Mon Jun 22 12:10:43 2009
From: vofford at rvc.ac.uk (Offord, Victoria)
Date: Mon, 22 Jun 2009 17:10:43 +0100
Subject: [Bioperl-l] Clustalw
Message-ID: <C79C4DD076C6834EB10EB0CA1F50635513109AF5@hhw2kex01.rvc.ac.uk>

Hi,

 
Can anyone help and tell me where I am going wrong please :)

I am getting this error from the following script:

 
------------- EXCEPTION: Bio::Root::Exception -------------

MSG: ClustalW call (clustalw align  -infile=/tmp/8PVli9JWEa/L_pxrEtzD1
-output=gcg   -matrix=BLOSUM -ktuple=2
-outfile=/tmp/8PVli9JWEa/XtAremlqau 2>&1) failed to start: 0 | No such
file or directory

STACK: Error::throw

STACK: Bio::Root::Root::throw
/usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357

STACK: Bio::Tools::Run::Alignment::Clustalw::_run
/usr/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm:756

STACK: Bio::Tools::Run::Alignment::Clustalw::align
/usr/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm:515

STACK: tester.pl:25

-----------------------------------------------------------

 
#--------------------------------------------SCRIPT---------------------
--------------------------#

#!/usr/bin/perl -w

use Bio::Tools::Run::Alignment::Clustalw;

$ENV{CLUSTALDIR} = '/var/local/clustalw-2.0.9';

use Bio::Seq;

 
 my @params = ('ktuple' => 2, 'matrix' => 'BLOSUM');

 my $factory = Bio::Tools::Run::Alignment::Clustalw->new(@params);

 
my $a = "NPFECDCSMEWMQRVNNLTARQHPKILDLPNVECIMPHARGTPIRPIISLKPKDFLCK";

my $b =
"NPFECDCSMEWLQRINNLTTRQHPHVVDLGNIECLMPHSRSAPLRPLASLSASDFVCKYESHCPP";

my $seq1 = Bio::Seq->new ( -seq  => $a,

                           -id   => 'real',

                           -desc => 'this is a real Seq');

 my $seq2 = Bio::Seq->new ( -seq  => $b,

                           -id   => 'test',

                           -desc => 'this is a test Seq');


my @seq_array = ($seq1,$seq2);

 
my $seq_array_ref = \@seq_array;

my $aln = $factory->align($seq_array_ref);

 
From Kevin.M.Brown at asu.edu  Mon Jun 22 12:48:27 2009
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Mon, 22 Jun 2009 09:48:27 -0700
Subject: [Bioperl-l] Clustalw
In-Reply-To: <C79C4DD076C6834EB10EB0CA1F50635513109AF5@hhw2kex01.rvc.ac.uk>
References: <C79C4DD076C6834EB10EB0CA1F50635513109AF5@hhw2kex01.rvc.ac.uk>
Message-ID: <1A4207F8295607498283FE9E93B775B4060B9BAF@EX02.asurite.ad.asu.edu>

Do you have ClustalW installed and in your path? 
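
One way to check is a small script like the sketch below. find_program is a
hypothetical helper written for this example (it is not part of BioPerl),
and the two program names tried are assumptions: ClustalW 2.x releases often
install the binary as clustalw2 rather than clustalw. Note also that the
module documentation shows CLUSTALDIR being set inside a BEGIN block before
the "use Bio::Tools::Run::Alignment::Clustalw" line, which may matter here.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: look for an executable first in an optional
# directory, then in every directory listed in PATH.
sub find_program {
    my ($name, $dir) = @_;
    my @dirs = (defined $dir ? ($dir) : (), File::Spec->path);
    for my $d (@dirs) {
        my $candidate = File::Spec->catfile($d, $name);
        return $candidate if -f $candidate && -x _;
    }
    return;
}

for my $name (qw(clustalw clustalw2)) {
    my $found = find_program($name, $ENV{CLUSTALDIR});
    print "$name: ", (defined $found ? $found : 'not found'), "\n";
}
```

If neither name is found, the "No such file or directory" in the exception
is most likely the shell failing to locate the executable.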

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Offord, Victoria
> Sent: Monday, June 22, 2009 9:11 AM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Clustalw
> 
> Hi,
> 
>  
> 
> Can anyone help and tell me where I am going wrong please :)
> 
> I am getting this error from the following script:
> 
>  
> 
>  
> 
> ------------- EXCEPTION: Bio::Root::Exception -------------
> 
> MSG: ClustalW call (clustalw align  -infile=/tmp/8PVli9JWEa/L_pxrEtzD1
> -output=gcg   -matrix=BLOSUM -ktuple=2
> -outfile=/tmp/8PVli9JWEa/XtAremlqau 2>&1) failed to start: 0 | No such
> file or directory
> 
> STACK: Error::throw
> 
> STACK: Bio::Root::Root::throw
> /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
> 
> STACK: Bio::Tools::Run::Alignment::Clustalw::_run
> /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm:756
> 
> STACK: Bio::Tools::Run::Alignment::Clustalw::align
> /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm:515
> 
> STACK: tester.pl:25
> 
> -----------------------------------------------------------
> 
>  
> 
>  
> 
>  
> 
>  
> 
> #--------------------------------------------SCRIPT-----------
> ----------
> --------------------------#
> 
> #!/usr/bin/perl -w
> 
> use Bio::Tools::Run::Alignment::Clustalw;
> 
> $ENV{CLUSTALDIR} = '/var/local/clustalw-2.0.9';
> 
> use Bio::Seq;
> 
>  
> 
>  my @params = ('ktuple' => 2, 'matrix' => 'BLOSUM');
> 
>  my $factory = Bio::Tools::Run::Alignment::Clustalw->new(@params);
> 
>  
> 
> my $a = "NPFECDCSMEWMQRVNNLTARQHPKILDLPNVECIMPHARGTPIRPIISLKPKDFLCK";
> 
> my $b =
> "NPFECDCSMEWLQRINNLTTRQHPHVVDLGNIECLMPHSRSAPLRPLASLSASDFVCKYESHCPP";
> 
> my $seq1 = Bio::Seq->new ( -seq  => $a,
> 
>                            -id   => 'real',
> 
>                            -desc => 'this is a real Seq');
> 
>  my $seq2 = Bio::Seq->new ( -seq  => $b,
> 
>                            -id   => 'test',
> 
>                            -desc => 'this is a test Seq');
> 
> 
>                            
> 
> my @seq_array = ($seq1,$seq2);
> 
>  
> 
> my $seq_array_ref = \@seq_array;
> 
> my $aln = $factory->align($seq_array_ref);


From cjfields at illinois.edu  Mon Jun 22 15:20:14 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 22 Jun 2009 14:20:14 -0500
Subject: [Bioperl-l] bioperl-dev or branch? : redux
In-Reply-To: <6DF025D32D664F61BC64B49184A2E6DD@NewLife>
References: <991fb8210905210826v2a7990c0u90fcb3256f54b7d7@mail.gmail.com>
	<D719EC86-CB16-4875-833B-2818C26030C8@duke.edu>
	<6DF025D32D664F61BC64B49184A2E6DD@NewLife>
Message-ID: <4766E259-B184-4552-817E-FBBB3A71A17F@illinois.edu>

On Jun 17, 2009, at 11:47 AM, Mark A. Jensen wrote:

> Hi All,
> I thought I'd revisit this thread, since in the last couple weeks,
> have used both techniques (bioperl-dev and branch from trunk) to
> produce completed projects. My thoughts:
>
> Using bioperl-dev was very nice for creating Bio::Search::Tiling, a
> new addition to the core api. There was no pressure to conform to the
> existing api there. In particular, there was no implicit insistence to
> make things work through Bio::Search::Utils, and I was free to factor
> it out. The Tiling api was definitely unstable until the end, when it
> was ported to the core. As I made regular reports to bioperl-l,
> everything was transparent and up front, and I received excellent
> suggestions there (as usual).
> For Bio::Restriction, using the branch was just as natural. Here, the
> existing structure was well established, and all the work needed to
> happen beneath the api. All old t/Restriction tests needed to pass,
> and additional ones created for the new functionality. So here, using
> bioperl-dev wasn't natural, even though some "experiments" needed to
> be tried (some succeeded and some failed, as you can see in the
> commentary at Bug #2855). Even though the new code turned out to
> require substantial effort, the effort was required to fix a true bug
> in the working core, and any fixes needed to work transparently with
> respect to the users for whom this bug had not been an issue. Using
> the branch made it relatively easy to merge quickly back into the core
> when done, and there is a certain psychological pressure too provided
> by an open branch which is helpful.
>
> Hilmar raised the very good point in the previous discussion that
> (essentially) bioperl-dev shouldn't become a sandbox with lots of
> unfinished code scraps and derelict stuff that doesn't work. My view
> is bioperl-dev will become a sandbox only if we treat it like
> one. I've filled out the Bioperl-dev page on the wiki
> (http://www.bioperl.org/wiki/Bioperl-dev) with this in mind. Providing
> some recognition to devs there whose modules become part of the
> core may be a better way to ensure that projects that are started on
> bioperl-dev actually get finished, than to prescribe beforehand what
> kinds of projects may get started. I believe this follows the adage of
> liberality on what is accepted, and strictness on what is emitted.
>
> cheers, MAJ

The main reason I wanted a bioperl-dev is for some code or  
implementations that don't seem to fit on a branch or directly into  
core, but would definitely be of use.  The tendency in the past has  
been to accept anything that works into core (the 'bazaar' approach).   
Initially that worked well, but the long-term end result has become  
potentially unmaintainable code bloat.  Committing new code to a  
branch isn't a great idea either, primarily b/c the code may be lost  
to the branch if it isn't followed up and remerged into trunk.  And  
forcing the code to fit into bioperl (or vice versa, which happened  
re: Feature Annotation) isn't the best way either.

Like Hilmar, though, I don't want dev to become a (sandbox|code  
dumping ground) either, so I think some additional discussion is  
warranted if anyone else wants to chime in.

chris

From mauricio at open-bio.org  Mon Jun 22 15:56:33 2009
From: mauricio at open-bio.org (Mauricio Herrera Cuadra)
Date: Mon, 22 Jun 2009 14:56:33 -0500
Subject: [Bioperl-l] 1.6 on doc.bioperl.org?
In-Reply-To: <A53006055C854297AAA58F6650F4F867@NewLife>
References: <17AD00895AFD43E1A1436D1065092BAC@NewLife>
	<4A3134EB.4080702@open-bio.org>
	<A53006055C854297AAA58F6650F4F867@NewLife>
Message-ID: <4A3FE1F1.40607@open-bio.org>

Sorry for the delay. Pdoc site has been updated with docs for 1.6.0 
release and latest code from bioperl-live. Also added bioperl-dev and 
bioperl-pise to the list.

Cheers,
Mauricio.


Mark A. Jensen wrote:
> cheers Mauricio! MAJ
> ----- Original Message ----- From: "Mauricio Herrera Cuadra" 
> <mauricio at open-bio.org>
> To: "Mark A. Jensen" <maj at fortinbras.us>
> Cc: "Chris Fields" <cjfields at illinois.edu>; "BioPerl List" 
> <bioperl-l at bioperl.org>
> Sent: Thursday, June 11, 2009 12:46 PM
> Subject: Re: [Bioperl-l] 1.6 on doc.bioperl.org?
> 
> 
>> Hi Mark,
>>
>> I'll take a look into this sometime between today and tomorrow. Will 
>> keep you posted. Thanks for the heads up :)
>>
>> Mauricio.
>>
>>
>> Mark A. Jensen wrote:
>>> Hi Chris and list-
>>> Will documentation for release 1.6 be available in pdoc on 
>>> doc.bioperl.org?
>>> I notice also that autogenerated documentation for bioperl-live 
>>> doesn't contain
>>> new modules (or HIVQuery & Tiling, anyway ;) )--
>>> cheers, Mark
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>
>>
> 
> 

From cjfields at illinois.edu  Mon Jun 22 16:29:46 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 22 Jun 2009 15:29:46 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<BF526BE6-AFA8-4F66-9665-D6328F9B4FFA@illinois.edu>
	<320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com>
	<1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>
	<320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com>
Message-ID: <CECB8B74-C423-4792-8857-0F3E40EAECE3@illinois.edu>

On Jun 22, 2009, at 9:24 AM, Peter wrote:

> On Wed, Jun 17, 2009 at 6:06 PM, Chris Fields wrote:
>> Peter wrote:
>>> Other issues to keep in mind:
>>>
>>> (3) There should be no warning parsing files where the optional  
>>> repeated
>>> title is missing on the "+" lines (as discussed earlier on the  
>>> BioPerl
>>> list).
>>
>> Agreed, though we'll have to check the current fastq parser to see  
>> if that's
>> currently the case.  I thought that was fixed but maybe not?
>>
>>> (4) When writing FASTQ files should BioPerl omit the optional  
>>> repeated
>>> title on the "+" line? Biopython omits this as I understand this  
>>> to be
>>> common practice, and can make a big difference to file sizes -  
>>> especially
>>> on short read data from Solexa/Illumina.
>>
>> Agreed, particularly if it's commonly encountered.
>>
>>> (5) Also test reading and writing files with an optional  
>>> description (as
>>> well as an identifier) on the "@" (and "+") lines. See the NCBI SRA
>>> for examples, e.g.
>>>
>>> @SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
>>> GGGTGATGGCCGCTGCCGATGGCGTCAAATCCCACC
>>> +SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
>>> IIIIIIIIIIIIIIIIIIIIIIIIIIIIII9IG9IC
>>
>> Should be easy enough to implement with a simple regex.
>>
>>> (6) Test reading and writing files where the encoded quality  
>>> string starts
>>> with a "@" or a "+" character, e.g.
>>> http://lists.open-bio.org/pipermail/bioperl-l/2009-May/029911.html
>>>
>>> Peter
>>
>> Mark, getting all that? ;>
>>
>> chris
>
> Another couple of points that I should have remembered earlier,
> related to converting between PHRED scores and Solexa scores.
> On the bright side, with Illumina abandoning the Solexa scores
> in pipeline 1.3+, these issues will go away with time:
>
> (7) If BioPerl will be converting Solexa scores to/from PHRED
> scores as integers automatically (as discussed earlier), make
> sure you round to the nearest whole number (don't just truncate
> with a call to int!). MAQ does this by adding 0.5 before calling
> int (while in Biopython I just use Python's round function).

That can probably be done with sprintf if needed.  It avoids a call to  
POSIX functions.
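
As a rough illustration of point (7), here is a minimal Python sketch of a Solexa-to-PHRED conversion that rounds to the nearest integer instead of truncating. The function name is made up for this example, and it uses the "add 0.5, then int" approach attributed to MAQ above rather than sprintf; it is not BioPerl or Biopython code.

```python
import math

def solexa_to_phred(solexa):
    """Convert a Solexa quality to the nearest integer PHRED quality."""
    # Solexa = -10*log10(p/(1-p)) and PHRED = -10*log10(p), which gives
    # PHRED = 10*log10(10**(Solexa/10) + 1).
    phred = 10 * math.log10(10 ** (solexa / 10.0) + 1)
    # The converted value is always positive, so adding 0.5 and truncating
    # with int() rounds to the nearest whole number.
    return int(phred + 0.5)

# Solexa 1 converts to about 3.54: plain truncation would give 3,
# rounding to the nearest integer gives 4.
print(solexa_to_phred(1))
```

Note that Python 3's built-in round() rounds exact halves to the nearest even integer, so the two approaches can differ on exact .5 values.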

> (8) When asked to write out an old Solexa style FASTQ file,
> what will you do if given a standard Sanger FASTQ file (or a
> new Illumina 1.3+ FASTQ file) containing a base with PHRED
> quality zero? This maps to a Solexa quality of minus infinity...
> Right now the development version of Biopython will throw an
> error in this situation, but mapping to the lowest observed
> Solexa score might be reasonable.
>
> Peter

Maybe address with a warning followed by assigning to the lowest  
solexa score?
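
A hedged Python sketch of that idea: warn on PHRED zero and fall back to the lowest Solexa score. The function name and the floor of -5 are assumptions made for this example (the floor is the conventional Solexa minimum), not existing BioPerl behaviour.

```python
import math
import warnings

SOLEXA_MIN = -5  # assumed lowest Solexa score used as the fallback

def phred_to_solexa(phred):
    """Convert an integer PHRED quality to the nearest integer Solexa quality."""
    if phred < 0:
        raise ValueError("PHRED qualities cannot be negative: %r" % phred)
    if phred == 0:
        # 10**0 - 1 == 0, so the exact Solexa value would be minus infinity.
        warnings.warn("PHRED quality 0 has no finite Solexa equivalent; "
                      "using %d instead" % SOLEXA_MIN)
        return SOLEXA_MIN
    solexa = 10 * math.log10(10 ** (phred / 10.0) - 1)
    # floor(x + 0.5) rounds to nearest even when the value is negative.
    return max(SOLEXA_MIN, int(math.floor(solexa + 0.5)))
```

Throwing an error instead, as the development version of Biopython is described as doing, would only need the warning replaced by a raise.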

chris


From cjfields at illinois.edu  Mon Jun 22 16:27:32 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 22 Jun 2009 15:27:32 -0500
Subject: [Bioperl-l] 1.6 on doc.bioperl.org?
In-Reply-To: <4A3FE1F1.40607@open-bio.org>
References: <17AD00895AFD43E1A1436D1065092BAC@NewLife>
	<4A3134EB.4080702@open-bio.org>
	<A53006055C854297AAA58F6650F4F867@NewLife>
	<4A3FE1F1.40607@open-bio.org>
Message-ID: <D9414186-E1DD-47B5-A0CF-9B96CD8151F8@illinois.edu>

np.  Thanks Mauricio!

chris

On Jun 22, 2009, at 2:56 PM, Mauricio Herrera Cuadra wrote:

> Sorry for the delay. Pdoc site has been updated with docs for 1.6.0  
> release and latest code from bioperl-live. Also added bioperl-dev  
> and bioperl-pise to the list.
>
> Cheers,
> Mauricio.
>
>
> Mark A. Jensen wrote:
>> cheers Mauricio! MAJ
>> ----- Original Message ----- From: "Mauricio Herrera Cuadra" <mauricio at open-bio.org>
>> To: "Mark A. Jensen" <maj at fortinbras.us>
>> Cc: "Chris Fields" <cjfields at illinois.edu>; "BioPerl List" <bioperl-l at bioperl.org>
>> Sent: Thursday, June 11, 2009 12:46 PM
>> Subject: Re: [Bioperl-l] 1.6 on doc.bioperl.org?
>>> Hi Mark,
>>>
>>> I'll take a look into this sometime between today and tomorrow.  
>>> Will keep you posted. Thanks for the heads up :)
>>>
>>> Mauricio.
>>>
>>>
>>> Mark A. Jensen wrote:
>>>> Hi Chris and list-
>>>> Will documentation for release 1.6 be available in pdoc on  
>>>> doc.bioperl.org?
>>>> I notice also that autogenerated documentation for bioperl-live  
>>>> doesn't contain
>>>> new modules (or HIVQuery & Tiling, anyway ;) )--
>>>> cheers, Mark


From maj at fortinbras.us  Mon Jun 22 22:46:58 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 22 Jun 2009 22:46:58 -0400
Subject: [Bioperl-l] announcing bioperl-max, a public AMI
In-Reply-To: <3F9CAC33-EE29-4843-8E21-532DC7569A7D@illinois.edu>
References: <57C9F3FC7A4A42189608BD882C03DB9C@NewLife>
	<3F9CAC33-EE29-4843-8E21-532DC7569A7D@illinois.edu>
Message-ID: <78130116A84C4D989F3BCC217E8C5ACE@NewLife>

Done-- fortinbras-public/bioperl-max-0.1.1 is at ami-b55dbbdc; rakudo cloned at 
00:44 UTC,
parrot @ r39729, bioperl-live @ 15800, nexml @ r1136.
cheers!
MAJ
----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "Mark A. Jensen" <maj at fortinbras.us>
Cc: "BioPerl List" <bioperl-l at bioperl.org>
Sent: Wednesday, June 10, 2009 12:36 AM
Subject: Re: [Bioperl-l] announcing bioperl-max, a public AMI


> I'll be trying that out, particularly re: bioperl-run. For bioperl-db  do you 
> have mysql or pg?
>
> Heh, I see Moose is installed.  Just need svn'd parrot and git updated  rakudo 
> and we could do some damage...
>
> chris
>
> On Jun 9, 2009, at 11:10 PM, Mark A. Jensen wrote:
>
>> Hi All,
>>
>> I've built a public Amazon machine image, loaded with many many
>> goodies, including the most recent (r15747) trunks of
>> - bioperl-live
>> - bioperl-run
>> - bioperl-db/biosql
>> The base machine is a public clean install of Ubuntu 8.04 Hardy/32-bit
>> by Alestic. Many fave tools, including blast, hmmer, hyphy, phyml,
>> emboss, and more are all there (and most even pass bioperl-run  tests), and
>> perl is loaded with Moose, XML::LibXML, XML::Compile, Bio::Phylo
>> (r1071) and others. This is *not* a lean mean fighting machine.
>>
>> Please give it a try if you're so inclined. Fuller details (including
>> image id and ssh-rsa signature) are at http://fortinbras.us/bioperl-max .
>>
>> Ping me if it doesn't work.
>>
>> Cheers,
>> Mark
>>
>>
>>


From maj at fortinbras.us  Mon Jun 22 23:22:48 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 22 Jun 2009 23:22:48 -0400
Subject: [Bioperl-l] 1.6 on doc.bioperl.org?
In-Reply-To: <4A3FE1F1.40607@open-bio.org>
References: <17AD00895AFD43E1A1436D1065092BAC@NewLife>
	<4A3134EB.4080702@open-bio.org>
	<A53006055C854297AAA58F6650F4F867@NewLife>
	<4A3FE1F1.40607@open-bio.org>
Message-ID: <8B93DCE168434F608620AF17CAF12A9F@NewLife>

awesome, MHC- cheers and thanks-MAJ
----- Original Message ----- 
From: "Mauricio Herrera Cuadra" <mauricio at open-bio.org>
To: "Mark A. Jensen" <maj at fortinbras.us>
Cc: "Chris Fields" <cjfields at illinois.edu>; "BioPerl List" 
<bioperl-l at bioperl.org>
Sent: Monday, June 22, 2009 3:56 PM
Subject: Re: [Bioperl-l] 1.6 on doc.bioperl.org?


> Sorry for the delay. Pdoc site has been updated with docs for 1.6.0 release 
> and latest code from bioperl-live. Also added bioperl-dev and bioperl-pise to 
> the list.
>
> Cheers,
> Mauricio.
>
>
> Mark A. Jensen wrote:
>> cheers Mauricio! MAJ
>> ----- Original Message ----- From: "Mauricio Herrera Cuadra" 
>> <mauricio at open-bio.org>
>> To: "Mark A. Jensen" <maj at fortinbras.us>
>> Cc: "Chris Fields" <cjfields at illinois.edu>; "BioPerl List" 
>> <bioperl-l at bioperl.org>
>> Sent: Thursday, June 11, 2009 12:46 PM
>> Subject: Re: [Bioperl-l] 1.6 on doc.bioperl.org?
>>
>>
>>> Hi Mark,
>>>
>>> I'll take a look into this sometime between today and tomorrow. Will keep 
>>> you posted. Thanks for the heads up :)
>>>
>>> Mauricio.
>>>
>>>
>>> Mark A. Jensen wrote:
>>>> Hi Chris and list-
>>>> Will documentation for release 1.6 be available in pdoc on doc.bioperl.org?
>>>> I notice also that autogenerated documentation for bioperl-live doesn't 
>>>> contain
>>>> new modules (or HIVQuery & Tiling, anyway ;) )--
>>>> cheers, Mark


From pmr at ebi.ac.uk  Tue Jun 23 07:00:38 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Tue, 23 Jun 2009 12:00:38 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <CECB8B74-C423-4792-8857-0F3E40EAECE3@illinois.edu>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>	<BF526BE6-AFA8-4F66-9665-D6328F9B4FFA@illinois.edu>	<320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com>	<1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>	<320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com>
	<CECB8B74-C423-4792-8857-0F3E40EAECE3@illinois.edu>
Message-ID: <4A40B5D6.40504@ebi.ac.uk>

We just added FASTQ parsing to EMBOSS and faced the same issues.

Parsing was easy - find the '@' line, read sequence until the '+' line
is reached, then read (seqlen) quality characters ... and check the next
line starts with '@'
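
The approach just described could be sketched roughly as follows in Python. This is an illustration only, not the EMBOSS, BioPerl or Biopython implementation; the function name is invented for the example. Reading the quality by length, rather than stopping at the next '@', is what lets a quality line begin with '@' or '+' without being mistaken for a new record.

```python
def parse_fastq(lines):
    """Yield (title, sequence, quality) tuples from an iterable of FASTQ lines."""
    it = iter(lines)
    line = next(it, None)
    while line is not None and not line.strip():
        line = next(it, None)  # skip leading blank lines
    while line is not None:
        if not line.startswith("@"):
            raise ValueError("expected '@' title line, got %r" % line)
        title = line[1:].rstrip()
        # Read (possibly wrapped) sequence lines until the '+' line.
        seq_parts = []
        line = next(it, None)
        while line is not None and not line.startswith("+"):
            seq_parts.append(line.strip())
            line = next(it, None)
        if line is None:
            raise ValueError("missing '+' line for record %r" % title)
        # The repeated title on the '+' line is optional.
        plus_title = line[1:].rstrip()
        if plus_title and plus_title != title:
            raise ValueError("'+' title does not match '@' title for %r" % title)
        seq = "".join(seq_parts)
        # Read exactly len(seq) quality characters, whatever they start with.
        qual_parts = []
        qual_len = 0
        while qual_len < len(seq):
            line = next(it, None)
            if line is None:
                raise ValueError("quality string truncated for %r" % title)
            chunk = line.strip()
            qual_parts.append(chunk)
            qual_len += len(chunk)
        qual = "".join(qual_parts)
        if len(qual) != len(seq):
            raise ValueError("quality longer than sequence for %r" % title)
        yield title, seq, qual
        # Move to the next record; the loop head checks it starts with '@'.
        line = next(it, None)
        while line is not None and not line.strip():
            line = next(it, None)
```

Usage would be along the lines of `for title, seq, qual in parse_fastq(open("reads.fastq")): ...`, where the file name is a placeholder.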

Quality scores are kept as phred values. Phred of 0 means unknown, which
in Solexa is -5 (0.75 error rate = could be anything). We assume lower
quality scores are from alignments rather than single reads.

We gave up on trying to guess the quality score standard and require
users to say whether they are sanger, solexa (1.0) or Illumina (1.3)
format files. If we only want the sequence then we don't care so we allow
"fastq" as a sequence format and ignore the quality scores in that case.

We also allow the integer quality score format ... is anyone still using
that (it looks horrible to me :-)

Code is in the EMBOSS CVS, and will appear in release 6.1.0 on July 15th.

Any further tips would be very useful.

regards,

Peter Rice


From biopython at maubp.freeserve.co.uk  Tue Jun 23 07:29:56 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 23 Jun 2009 12:29:56 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <4A40B5D6.40504@ebi.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<BF526BE6-AFA8-4F66-9665-D6328F9B4FFA@illinois.edu>
	<320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com>
	<1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>
	<320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com>
	<CECB8B74-C423-4792-8857-0F3E40EAECE3@illinois.edu>
	<4A40B5D6.40504@ebi.ac.uk>
Message-ID: <320fb6e00906230429l77a2049eg35e47f94026111c6@mail.gmail.com>

On Tue, Jun 23, 2009 at 12:00 PM, Peter Rice<pmr at ebi.ac.uk> wrote:
>
> We just added FASTQ parsing to EMBOSS and faced the same issues.
>

I was going to chat to you about this at BOSC, and suggest this be
added to EMBOSS - but you are well ahead of me ;)

> Parsing was easy - find the '@' line, read sequence until the '+' line
> is reached, then read (seqlen) quality characters ... and check the next
> line starts with '@'

That is basically what I did for Biopython.

> Quality scores are kept as phred values. Phred of 0 means unknown,
> which in Solexa is -5 (0.75 error rate = could be anything).

A Phred quality of 0 means probability of error is 1, so yes, unknown. I don't
quite follow your leap that this corresponds to a Solexa quality of -5. Could
you clarify?

> We assume lower quality scores are from alignments rather than single reads.

Did you mean to say "higher quality scores" (i.e. lower probability of error),
e.g. a PHRED score of 80, which you can get from MAQ doing read mapping
or something consensus based?

> We gave up on trying to guess the quality score standard and require
> users to say whether they are sanger, solexa (1.0) or Illumina (1.3)
> format files. If we only want the sequence then we don't care so we allow
> "fastq" as a sequence format and ignore the quality scores in that case.

What format names have you used? Ideally we'd have the same names
in EMBOSS, BioPerl and Biopython (i.e. "fastq", "fastq-solexa", and
"fastq-illumina").

> We also allow the integer quality score format ... is anyone still using
> that (it looks horrible to me :-)

Do you mean the QUAL file format holding PHRED scores? Roche provide
tools to turn their SFF files into FASTA and QUAL files, so they are still used.

> Code is in the EMBOSS CVS, and will appear in release 6.1.0 on July 15th.
>
> Any further tips would be very useful.

Great. See you at BOSC 2009!

Peter
(Biopython)

From pmr at ebi.ac.uk  Tue Jun 23 08:22:33 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Tue, 23 Jun 2009 13:22:33 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <320fb6e00906230429l77a2049eg35e47f94026111c6@mail.gmail.com>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>	
	<BF526BE6-AFA8-4F66-9665-D6328F9B4FFA@illinois.edu>	
	<320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com>	
	<1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>	
	<320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com>	
	<CECB8B74-C423-4792-8857-0F3E40EAECE3@illinois.edu>	
	<4A40B5D6.40504@ebi.ac.uk>
	<320fb6e00906230429l77a2049eg35e47f94026111c6@mail.gmail.com>
Message-ID: <4A40C909.40803@ebi.ac.uk>

Peter wrote:
> On Tue, Jun 23, 2009 at 12:00 PM, Peter Rice<pmr at ebi.ac.uk> wrote:
>> We just added FASTQ parsing to EMBOSS and faced the same issues.
>>
> 
> I was going to chat to you about this at BOSC, and suggest this be
> added to EMBOSS - but you are well ahead of me ;)

Not that well ahead really ... someone asked for it in our BoF at
BOSC/ISMB last year so we thought we'd better get it done before this
one. It was implemented a couple of days ago :-)

>> Parsing was easy - find the '@' line, read sequence until the '+' line
>> is reached, then read (seqlen) quality characters ... and check the next
>> line starts with '@'
> 
> That is basically what I did for Biopython.
> 
>> Quality scores are kept as phred values. Phred of 0 means unknown,
>> which in Solexa is -5 (0.75 error rate = could be anything).
> 
> A Phred quality of 0 means probability of error is 1, so yes, unknown. I don't
> quite follow your leap that this corresponds to a Solexa quality of -5. Could
> you clarify?

Phred score is -10 log(p) where p is the probability of error. A phred
of 0 implies 1.0 (certainty) of error, but 0.75 is a better estimate
(3/4 chance that any base you pick is wrong).

Solexa scores are -10 log(p/(1-p)) so p=0.75 comes out at -5. This is
why Solexa scores can go down to -5 in their fastq format.
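
To make the arithmetic concrete, a small Python check of the two formulas, taking "log" as log base 10 (the function names are just for this example):

```python
import math

def phred_from_error(p):
    """PHRED quality for error probability p: -10*log10(p)."""
    return -10 * math.log10(p)

def solexa_from_error(p):
    """Solexa quality for error probability p: -10*log10(p/(1-p))."""
    return -10 * math.log10(p / (1 - p))

p = 0.75  # error probability of a random base call
print(round(phred_from_error(p), 2))   # 1.25
print(round(solexa_from_error(p), 2))  # -4.77, i.e. -5 as an integer
```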

>> We assume lower quality scores are from alignments rather than single reads.
> 
> Did you mean to say "higher quality scores" (i.e. lower probability of error),
> e.g a PHRED score of 80 which you can get from MAQ doing read mapping
> or something consensus based.

Actually I mean both. Error probabilities above 0.75 for a single base
are silly, and error probabilities below 0.0001 make sense only when two
or more high quality bases are aligned.

>> We gave up on trying to guess the quality score standard and require
>> users to say whether they are sanger, solexa (1.0) or Illumina (1.3)
>> format files. If we only want the sequence then we don't care so we allow
>> "fastq" as a sequence format and ignore the quality scores in that case.
> 
> What format names have you used? Ideally we'd have the same names
> in EMBOSS, BioPerl and Biopython (i.e. "fastq", "fastq-solexa", and
> "fastq-illumina").

We don't normally use '-' in our format names so we have fastqsanger,
fastqsolexa, fastqillumina and fastqint. None of these have been tried
on users as yet.

The '-' names look nice though. We can consider introducing them. Do you
have a full list of format names (sequence, feature, alignment, etc.) we
can try to conform to?

>> We also allow the integer quality score format ... is anyone still using
>> that (it looks horrible to me :-)
> 
> Do you mean the QUAL file format holding PHRED scores? Roche provide
> tools to turn their SFF files into FASTA and QUAL files, so they are still used.

Probably ... unless there is a Solexa version too.

regards,

Peter

From rmb32 at cornell.edu  Tue Jun 23 10:28:08 2009
From: rmb32 at cornell.edu (Robert Buels)
Date: Tue, 23 Jun 2009 07:28:08 -0700
Subject: [Bioperl-l] Anyone at YAPC?
In-Reply-To: <FE1E935F-C10A-471F-8E9A-27658F5216A2@illinois.edu>
References: <19007.33948.411442.197063@already.dhcp.gene.com>
	<FE1E935F-C10A-471F-8E9A-27658F5216A2@illinois.edu>
Message-ID: <4A40E678.8010709@cornell.edu>

Yep, YAPC is great!  This is my first one.  I saw a guy walking around 
here with a nametag that I thought said "Mark Jensen".  MAJ, are you here?

Rob

Chris Fields wrote:
> I think some of the regular #bioperl folk are there (Jay Hannah, R. 
> Buels, etc).  May be worth going on IRC to find everyone.
> 
> I'm giving serious thought to going next year if I can get enough work 
> done towards a perl6 or Moose-based bioperl.
> 
> chris
> 
> On Jun 22, 2009, at 8:18 AM, George Hartzell wrote:
> 
>>
>> I'm in Pittsburgh at YAPC 10 and would be happy to say hello to folks.
>>
>> g.
>>


-- 
Robert Buels
Bioinformatics Analyst, Sol Genomics Network
Boyce Thompson Institute for Plant Research
Tower Rd
Ithaca, NY  14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu

From maj at fortinbras.us  Tue Jun 23 11:54:24 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 23 Jun 2009 11:54:24 -0400
Subject: [Bioperl-l] Anyone at YAPC?
In-Reply-To: <4A40E678.8010709@cornell.edu>
References: <19007.33948.411442.197063@already.dhcp.gene.com>
	<FE1E935F-C10A-471F-8E9A-27658F5216A2@illinois.edu>
	<4A40E678.8010709@cornell.edu>
Message-ID: <DD5C6FE6AC5842CEAA4487EEC65AC726@NewLife>

I think there are about 75000 of us; that one ain't me, I'm afraid. Maybe next 
year! cheers  MAJ
----- Original Message ----- 
From: "Robert Buels" <rmb32 at cornell.edu>
To: "bioperl-l List" <bioperl-l at bioperl.org>
Sent: Tuesday, June 23, 2009 10:28 AM
Subject: Re: [Bioperl-l] Anyone at YAPC?


> Yep, YAPC is great!  This is my first one.  I saw a guy walking around here 
> with a nametag that I thought said "Mark Jensen".  MAJ, are you here?
>
> Rob
>
> Chris Fields wrote:
>> I think some of the regular #bioperl folk are there (Jay Hannah, R. Buels, 
>> etc).  May be worth going on IRC to find everyone.
>>
>> I'm giving serious thought to going next year if I can get enough work done 
>> towards a perl6 or Moose-based bioperl.
>>
>> chris
>>
>> On Jun 22, 2009, at 8:18 AM, George Hartzell wrote:
>>
>>>
>>> I'm in Pittsburgh at YAPC 10 and would be happy to say hello to folks.
>>>
>>> g.
>>>
>
>
> -- 
> Robert Buels
> Bioinformatics Analyst, Sol Genomics Network
> Boyce Thompson Institute for Plant Research
> Tower Rd
> Ithaca, NY  14853
> Tel: 503-889-8539
> rmb32 at cornell.edu
> http://www.sgn.cornell.edu


From cjfields at illinois.edu  Tue Jun 23 16:34:48 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 23 Jun 2009 15:34:48 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <4A40C909.40803@ebi.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>	
	<BF526BE6-AFA8-4F66-9665-D6328F9B4FFA@illinois.edu>	
	<320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com>	
	<1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>	
	<320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com>	
	<CECB8B74-C423-4792-8857-0F3E40EAECE3@illinois.edu>	
	<4A40B5D6.40504@ebi.ac.uk>
	<320fb6e00906230429l77a2049eg35e47f94026111c6@mail.gmail.com>
	<4A40C909.40803@ebi.ac.uk>
Message-ID: <21116F70-93A3-4539-9BE2-61C838BA730E@illinois.edu>


On Jun 23, 2009, at 7:22 AM, Peter Rice wrote:

> Peter wrote:
> ...
>>> Parsing was easy - find the '@' line, read sequence until the '+'  
>>> line
>>> is reached, then read (seqlen) quality characters ... and check  
>>> the next
>>> line starts with '@'
>>
>> That is basically what I did for Biopython.

This is now what bioperl will do (at least when I commit changes today  
or tomorrow).

> ...
>>> We gave up on trying to guess the quality score standard and require
>>> users to say whether they are sanger, solexa (1.0) or Illumina (1.3)
>>> format files. If we only want the sequence then we don't care so  
>>> we allow
>>> "fastq" as a sequence format and ignore the quality scores in that  
>>> case.
>>
>> What format names have you used? Ideally we'd have the same names
>> in EMBOSS, BioPerl and Biopython (i.e. "fastq", "fastq-solexa", and
>> "fastq-illumina").
>
> We don't normally use '-' in our format names so we have fastqsanger,
> fastqsolexa, fastqillumina and fastqint. None of these have been tried
> on users as yet.
>
> The '-' names look nice though. We can consider introducing them. Do  
> you
> have a full list of format names (sequence, feature, alignment,  
> etc.) we
> can try to conform to?

We (bioperl) are using biopython's convention of format-variant, or at  
least that's how I'm coding it up.  With SeqIO it's fairly easy to  
check for the format variant prior to loading the class and pass it in  
as a second named parameter.

I have actually thought of adding in fastqint as an option (it would  
be fairly easy to do).
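
For illustration, the format-variant convention could be handled along these lines. This is a Python sketch with a made-up helper name, not the actual SeqIO code; treating a bare "fastq" as the Sanger variant is an assumption of the sketch.

```python
def split_format(name, default_fastq_variant="sanger"):
    """Split a 'format-variant' name such as 'fastq-solexa' into its parts."""
    fmt, sep, variant = name.lower().partition("-")
    if fmt == "fastq" and not variant:
        # Assumption: a bare "fastq" means the Sanger variant.
        variant = default_fastq_variant
    # Formats without a variant get None.
    return fmt, (variant or None)

print(split_format("fastq-solexa"))  # ('fastq', 'solexa')
print(split_format("fastq"))         # ('fastq', 'sanger')
print(split_format("genbank"))       # ('genbank', None)
```

The caller can then load the handler for the base format and pass the variant along as a separate argument.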

chris

From cjfields at illinois.edu  Tue Jun 23 17:04:25 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 23 Jun 2009 16:04:25 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <320fb6e00906230429l77a2049eg35e47f94026111c6@mail.gmail.com>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<BF526BE6-AFA8-4F66-9665-D6328F9B4FFA@illinois.edu>
	<320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com>
	<1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>
	<320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com>
	<CECB8B74-C423-4792-8857-0F3E40EAECE3@illinois.edu>
	<4A40B5D6.40504@ebi.ac.uk>
	<320fb6e00906230429l77a2049eg35e47f94026111c6@mail.gmail.com>
Message-ID: <49A4AD93-69FB-406E-8FFB-99C74A457402@illinois.edu>

Just so we're on the same page data-wise, would there be a common set  
of fastq data files to use for tests?  I am using some from SRA (which  
is all converted to Sanger).  Just need a few small ones for older  
solexa and newer illumina.

chris

On Jun 23, 2009, at 6:29 AM, Peter wrote:

> On Tue, Jun 23, 2009 at 12:00 PM, Peter Rice<pmr at ebi.ac.uk> wrote:
>
>> Code is in the EMBOSS CVS, and will appear in release 6.1.0 on July  
>> 15th.
>>
>> Any further tips would be very useful.
>
> Great. See you at BOSC 2009!
>
> Peter
> (Biopython)

From biopython at maubp.freeserve.co.uk  Tue Jun 23 17:39:48 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 23 Jun 2009 22:39:48 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <4A40C909.40803@ebi.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<BF526BE6-AFA8-4F66-9665-D6328F9B4FFA@illinois.edu>
	<320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com>
	<1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>
	<320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com>
	<CECB8B74-C423-4792-8857-0F3E40EAECE3@illinois.edu>
	<4A40B5D6.40504@ebi.ac.uk>
	<320fb6e00906230429l77a2049eg35e47f94026111c6@mail.gmail.com>
	<4A40C909.40803@ebi.ac.uk>
Message-ID: <320fb6e00906231439j3beecda0o6e75c4223e960871@mail.gmail.com>

On Tue, Jun 23, 2009 at 1:22 PM, Peter Rice<pmr at ebi.ac.uk> wrote:
> Peter wrote:
>> On Tue, Jun 23, 2009 at 12:00 PM, Peter Rice<pmr at ebi.ac.uk> wrote:
>>> We just added FASTQ parsing to EMBOSS and faced the same issues.
>>>
>>
>> I was going to chat to you about this at BOSC, and suggest this be
>> added to EMBOSS - but you are well ahead of me ;)
>
> Not that well ahead really ... someone asked for it in our BoF at
> BOSC/ISMB last year so we thought we'd better get it done before this
> one. It was implemented a couple of days ago :-)
>

Well, ahead of my asking!

>>> Quality scores are kept as phred values. Phred of 0 means unknown,
>>> which in Solexa is -5 (0.75 error rate = could be anything).
>>
>> A Phred quality of 0 means probability of error is 1, so yes, unknown. I don't
>> quite follow your leap that this corresponds to a Solexa quality of -5. Could
>> you clarify?
>
> Phred score is -10 log(p) where p is the probability of error. A phred
> of 0 implies 1.0 (certainty) of error, but 0.75 is a better estimate
> (3/4 chance that any base you pick is wrong).
>
> Solexa scores are -10 log(p/(1-p)) so p=0.75 comes out at -5. This is
> why Solexa scores can go down to -5 in their fastq format.
>
>>> We assume lower quality scores are from alignments rather than
>>> single reads.
>>
>> Did you mean to say "higher quality scores" (i.e. lower probability of error),
>> e.g a PHRED score of 80 which you can get from MAQ doing read mapping
>> or something consensus based.
>
> Actually I mean both. Error probabilities above 0.75 for a single base
> are silly, and error probabilities below 0.0001 make sense only when two
> or more high quality bases are aligned.

I see what you mean - a probability of error of 0.75 matches that
for a random base call, obvious when you put it like that. Of course,
there is this nasty little thought at the back of my mind that sooner
or later someone will use FASTQ files for proteins (e.g. from some
mass-spec protein sequencing).

A probability of error greater than that (e.g. 1) is actually worse than
random and could be considered to mean "we're pretty sure this isn't the
stated letter". But that would be silly, as you say.

>>> We gave up on trying to guess the quality score standard and require
>>> users to say whether they are sanger, solexa (1.0) or Illumina (1.3)
>>> format files. If we only want the sequence then we don't care so we allow
>>> "fastq" as a sequence format and ignore the quality scores in that case.
>>
>> What format names have you used? Ideally we'd have the same names
>> in EMBOSS, BioPerl and Biopython (i.e. "fastq", "fastq-solexa", and
>> "fastq-illumina").
>
> We don't normally use '-' in our format names so we have fastqsanger,
> fastqsolexa, fastqillumina and fastqint. None of these have been tried
> on users as yet.
>
> The '-' names look nice though. We can consider introducing them. Do you
> have a full list of format names (sequence, feature, alignment, etc.) we
> can try to conform to?

See http://biopython.org/wiki/SeqIO and http://biopython.org/wiki/AlignIO

Getting EMBOSS to conform should be trivial - in general when
picking a format name for Biopython's SeqIO or AlignIO (and we
have avoided multiple aliases with one exception) we have tried to
use anything shared by BioPerl and EMBOSS. The FASTQ variants
are unusual in that Biopython got to invent some names.
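
To make the naming question concrete, here is a rough sketch of a lookup
table pairing the Biopython-style names with the EMBOSS ones mentioned
above, plus a small decoder. The pairing of "fastq" with "fastqsanger"
and the offsets and scales (33 for Sanger, 64 for Solexa and Illumina
1.3) are assumptions based on the commonly documented conventions, and
decode_quality is a hypothetical helper, not an existing API.

```perl
use strict;
use warnings;

# Biopython-style name => [ EMBOSS-style name, ASCII offset, score scale ].
# The names come from this thread; the offsets and scales are assumed from
# the usual conventions rather than stated anywhere above.
my %FASTQ_VARIANT = (
    'fastq'          => [ 'fastqsanger',   33, 'phred'  ],
    'fastq-solexa'   => [ 'fastqsolexa',   64, 'solexa' ],
    'fastq-illumina' => [ 'fastqillumina', 64, 'phred'  ],
);

# Decode an ASCII quality string into a list of integer scores.
sub decode_quality {
    my ($format, $qual) = @_;
    my $variant = $FASTQ_VARIANT{$format}
        or die "unknown FASTQ variant '$format'\n";
    my $offset = $variant->[1];
    return map { ord($_) - $offset } split //, $qual;
}

my @scores = decode_quality('fastq', 'II5+');
print "@scores\n";    # prints "40 40 20 10"
```

Keeping the variant explicit in the format name means the caller never
has to guess the offset from the data.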

In future where would be a good place to discuss these kinds of
cross-platform issues? (i.e. BioPerl, Biopython, EMBOSS, etc).

>>> We also allow the integer quality score format ... is anyone still
>>> using that (it looks horrible to me :-)
>>
>> Do you mean the QUAL file format holding PHRED scores?
>> Roche provide tools to turn their SFF files into FASTA and
>> QUAL files, so they are still used.
>
> Probably ... unless there is a Solexa version too.

We may be talking at cross purposes here, this is QUAL format:
http://www.bioperl.org/wiki/Qual_sequence_format

Peter

From pmr at ebi.ac.uk  Wed Jun 24 07:48:23 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Wed, 24 Jun 2009 12:48:23 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <320fb6e00906231439j3beecda0o6e75c4223e960871@mail.gmail.com>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>	
	<BF526BE6-AFA8-4F66-9665-D6328F9B4FFA@illinois.edu>	
	<320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com>	
	<1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>	
	<320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com>	
	<CECB8B74-C423-4792-8857-0F3E40EAECE3@illinois.edu>	
	<4A40B5D6.40504@ebi.ac.uk>	
	<320fb6e00906230429l77a2049eg35e47f94026111c6@mail.gmail.com>	
	<4A40C909.40803@ebi.ac.uk>
	<320fb6e00906231439j3beecda0o6e75c4223e960871@mail.gmail.com>
Message-ID: <4A421287.4000203@ebi.ac.uk>

Peter wrote:
> On Tue, Jun 23, 2009 at 1:22 PM, Peter Rice<pmr at ebi.ac.uk> wrote:
>> The '-' names look nice though. We can consider introducing them. Do you
>> have a full list of format names (sequence, feature, alignment, etc.) we
>> can try to conform to?
> 
> See http://biopython.org/wiki/SeqIO and http://biopython.org/wiki/AlignIO

Thanks. I'll take a look at those.

> Getting EMBOSS to conforming should be trivial - in general when
> picking a format name for Biopython's SeqIO or AlignIO (and we
> have avoided multiple aliases with one exception) we have tried to
> use anything shared by BioPerl and EMBOSS. The FASTQ variants
> are unusual in that Biopython got to invent some names.
> 
> In future where would be a good place to discuss these kinds of
> cross-platform issues? (i.e. BioPerl, Biopython, EMBOSS, etc).

I was planning to suggest a get-together at BOSC in Stockholm so we can
identify common cross-platform issues. I'm sure there are many ways we
can conform with naming and interfaces and perhaps even share code.

>>>> We also allow the integer quality score format ... is anyone still
>>>> using that (it looks horrible to me :-)
>>> Do you mean the QUAL file format holding PHRED scores?
>>> Roche provide tools to turn their SFF files into FASTA and
>>> QUAL files, so they are still used.
>> Probably ... unless there is a Solexa version too.
> 
> We may be talking at cross purposes here, this is QUAL format:
> http://www.bioperl.org/wiki/Qual_sequence_format

Yes that is different. We'll worry about separate QUAL files later (we
already find separate GFF files a pain for features) and stick with the
"fastqint" format name.

regards,

Peter

From biopython at maubp.freeserve.co.uk  Wed Jun 24 10:56:13 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 24 Jun 2009 15:56:13 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <4A421287.4000203@ebi.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com>
	<1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>
	<320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com>
	<CECB8B74-C423-4792-8857-0F3E40EAECE3@illinois.edu>
	<4A40B5D6.40504@ebi.ac.uk>
	<320fb6e00906230429l77a2049eg35e47f94026111c6@mail.gmail.com>
	<4A40C909.40803@ebi.ac.uk>
	<320fb6e00906231439j3beecda0o6e75c4223e960871@mail.gmail.com>
	<4A421287.4000203@ebi.ac.uk>
Message-ID: <320fb6e00906240756m757ee12nb53ecc527b8e2480@mail.gmail.com>

On Wed, Jun 24, 2009 at 12:48 PM, Peter Rice<pmr at ebi.ac.uk> wrote:
>
> I was planning to suggest a get-together at BOSC in Stockholm so we can
> identify common cross-platform issues. I'm sure there are many ways we
> can conform with naming and interfaces and perhaps even share code.
>

That would be a good idea - but while there are quite a few Biopython
people at BOSC this year, I don't know if there will be many from BioPerl
(there isn't a BioPerl update talk scheduled).

>>>>> We also allow the integer quality score format ... is anyone still
>>>>> using that (it looks horrible to me :-)
>>>> Do you mean the QUAL file format holding PHRED scores?
>>>> Roche provide tools to turn their SFF files into FASTA and
>>>> QUAL files, so they are still used.
>>> Probably ... unless there is a Solexa version too.
>>
>> We may be talking at cross purposes here, this is QUAL format:
>> http://www.bioperl.org/wiki/Qual_sequence_format
>
> Yes that is different. We'll worry about separate QUAL files later (we
> already find separate GFF files a pain for features) and still with the
> "fastqint" format name.

So when you say "fastqint" are you talking about something else?
Could you show us an example record in this format?

Peter
[I need to remember to proof read my evening emails more carefully]

From vecchi.b at gmail.com  Wed Jun 24 12:13:02 2009
From: vecchi.b at gmail.com (Bruno Vecchi)
Date: Wed, 24 Jun 2009 13:13:02 -0300
Subject: [Bioperl-l] Fwd: Hackathon tomorrow (I think)
In-Reply-To: <54655E2F-0D4B-4F4D-8608-13BB052FE201@jays.net>
References: <8F66D28E-CD28-4A8D-83B2-0FFCA611F58F@jays.net>
	<54655E2F-0D4B-4F4D-8608-13BB052FE201@jays.net>
Message-ID: <1a0c1b750906240913g5e113dadxe3d98d3fce3a74d@mail.gmail.com>

Jay asked me to forward this to the list, since he sometimes has problems
getting his mails delivered.
Feel free to suggest topics for the bioperl hackathon to take place tomorrow
and on friday!

Bruno.


From: Jay Hannah <jay at jays.net>
Date: June 24, 2009 11:55:42 AM EDT
To: Bioperl <bioperl-l at bioperl.org>
Subject: Hackathon tomorrow (I think)

Hola,

So a few of us here at YAPC might try to be productive tomorrow (and
Friday?).

I don't know if we have any commit bits attending.

Feel free to suggest things:

  http://yapc10.org/yn2009/wiki?node=BioPerl

Or point me to list(s) of things. Perhaps we'll try to help out in Bugzilla.

Come yell at me (us?) in IRC:

  http://www.bioperl.org/wiki/Irc

Thanks,

Jay Hannah
http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah

From cjfields at illinois.edu  Wed Jun 24 12:22:57 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 24 Jun 2009 11:22:57 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <320fb6e00906240756m757ee12nb53ecc527b8e2480@mail.gmail.com>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com>
	<1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>
	<320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com>
	<CECB8B74-C423-4792-8857-0F3E40EAECE3@illinois.edu>
	<4A40B5D6.40504@ebi.ac.uk>
	<320fb6e00906230429l77a2049eg35e47f94026111c6@mail.gmail.com>
	<4A40C909.40803@ebi.ac.uk>
	<320fb6e00906231439j3beecda0o6e75c4223e960871@mail.gmail.com>
	<4A421287.4000203@ebi.ac.uk>
	<320fb6e00906240756m757ee12nb53ecc527b8e2480@mail.gmail.com>
Message-ID: <6DCEA6E1-8C53-4C04-9F8A-E336BE5753B6@illinois.edu>


On Jun 24, 2009, at 9:56 AM, Peter wrote:

> On Wed, Jun 24, 2009 at 12:48 PM, Peter Rice<pmr at ebi.ac.uk> wrote:
>>
>> I was planning to suggest a get-together at BOSC in Stockholm so we  
>> can
>> identify common cross-platform issues. I'm sure there are many ways  
>> we
>> can conform with naming and interfaces and perhaps even share code.
>>
>
> That would be a good idea - but while there are quite a few Biopython
> people at BOSC this year, I don't know if there will be many from  
> BioPerl
> (there isn't a BioPerl update talk scheduled).

Most of us are caught up with other work, though I will likely be able  
to dedicate more time to it in the next few months.

Also doesn't help that my travel stipend doesn't start until Aug. 1.

>>>>>> We also allow the integer quality score format ... is anyone  
>>>>>> still
>>>>>> using that (it looks horrible to me :-)
>>>>> Do you mean the QUAL file format holding PHRED scores?
>>>>> Roche provide tools to turn their SFF files into FASTA and
>>>>> QUAL files, so they are still used.
>>>> Probably ... unless there is a Solexa version too.
>>>
>>> We may be talking at cross purposes here, this is QUAL format:
>>> http://www.bioperl.org/wiki/Qual_sequence_format
>>
>> Yes that is different. We'll worry about separate QUAL files later  
>> (we
>> already find separate GFF files a pain for features) and still with  
>> the
>> "fastqint" format name.
>
> So when you say "fastqint" are you talking about something else?
> Could you show us an example record in this format?
>
> Peter
> [I need to remember to proof read my evening emails more carefully]

The same as fastq, except the ASCII quality is converted to actual  
score:

@4_1_912_360
AAGGGGCTAGAGAAACACGTAATGAAGGGAGGACTC
+4_1_912_360
40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 21 40 40 40 40 40 40 40 40 40 26 40 40 14 39 40 40
@4_1_54_483
TAATAAATGTGCTTCCTTGATGCATGTGCTATGATT
+4_1_54_483
40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 16 40 40 40 28 40 40 40 40 40 40 16 40 40 5 40 40

chris
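
For anyone wanting to read records like the two above, a minimal parsing
sketch follows. It assumes each record is exactly four lines with the
scores on a single space-separated line; read_fastqint is a hypothetical
name and this is not how any existing BioPerl module does it.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Read "integer FASTQ" records from a filehandle. Assumption: each record is
# exactly four lines (@id, sequence, +id, space-separated integer scores).
sub read_fastqint {
    my ($fh) = @_;
    my @records;
    while (defined(my $header = <$fh>)) {
        chomp $header;
        next unless length $header;    # tolerate blank lines between records
        $header =~ /^\@(\S+)/ or die "expected an '\@id' line, got '$header'\n";
        my $id = $1;

        my $seq  = <$fh>;
        my $plus = <$fh>;
        my $qual = <$fh>;
        defined $qual or die "$id: truncated record\n";
        chomp($seq, $plus, $qual);
        $plus =~ /^\+/ or die "$id: expected a '+' line\n";

        my @scores = split ' ', $qual;
        @scores == length($seq)
            or die "$id: score count does not match sequence length\n";

        push @records, { id => $id, seq => $seq, qual => \@scores };
    }
    return @records;
}

# The two records from the message above, with each score line unwrapped.
my $example = <<'END';
@4_1_912_360
AAGGGGCTAGAGAAACACGTAATGAAGGGAGGACTC
+4_1_912_360
40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 21 40 40 40 40 40 40 40 40 40 26 40 40 14 39 40 40
@4_1_54_483
TAATAAATGTGCTTCCTTGATGCATGTGCTATGATT
+4_1_54_483
40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 16 40 40 40 28 40 40 40 40 40 40 16 40 40 5 40 40
END

open my $fh, '<', \$example or die "cannot open in-memory file: $!\n";
for my $rec (read_fastqint($fh)) {
    printf "%s: %d bases, lowest score %d\n",
        $rec->{id}, length($rec->{seq}), min(@{ $rec->{qual} });
}
```

Checking the score count against the sequence length catches score
lines that have been wrapped by a mail client.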


From cjfields at illinois.edu  Wed Jun 24 12:26:22 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 24 Jun 2009 11:26:22 -0500
Subject: [Bioperl-l] Fwd: Hackathon tomorrow (I think)
In-Reply-To: <1a0c1b750906240913g5e113dadxe3d98d3fce3a74d@mail.gmail.com>
References: <8F66D28E-CD28-4A8D-83B2-0FFCA611F58F@jays.net>
	<54655E2F-0D4B-4F4D-8608-13BB052FE201@jays.net>
	<1a0c1b750906240913g5e113dadxe3d98d3fce3a74d@mail.gmail.com>
Message-ID: <F01E3C47-6651-488C-91F6-73F42A5A65F4@illinois.edu>

1) Any help towards bugzilla fixes would be most welcome.
2) Better GFF3 integration
3) Typed but lightweight seqfeatures
4) Bio::Moose?

I can dedicate more time to the latter two in about a month, but I'll  
be tied up until then.  Let me know if anyone needs collab on biomoose  
on github; Mark Jensen's already added.

chris

On Jun 24, 2009, at 11:13 AM, Bruno Vecchi wrote:

> Jay asked me to forward this to the list, since he sometimes has  
> problems
> getting his mails delivered.
> Feel free to suggest topics for the bioperl hackathon to take place  
> tomorrow
> and on friday!
>
> Bruno.
>
>
> From: Jay Hannah <jay at jays.net>
> Date: June 24, 2009 11:55:42 AM EDT
> To: Bioperl <bioperl-l at bioperl.org>
> Subject: Hackathon tomorrow (I think)
>
> Hola,
>
> So a few of us here at YAPC might try to be productive tomorrow (and
> Friday?).
>
> I don't know if we have any commit bits attending.
>
> Feel free to suggest things:
>
>  http://yapc10.org/yn2009/wiki?node=BioPerl
>
> Or point me to list(s) of things. Perhaps we'll try to help out in  
> Bugzilla.
>
> Come yell at me (us?) in IRC:
>
>  http://www.bioperl.org/wiki/Irc
>
> Thanks,
>
> Jay Hannah
> http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From biopython at maubp.freeserve.co.uk  Wed Jun 24 12:27:39 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 24 Jun 2009 17:27:39 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <6DCEA6E1-8C53-4C04-9F8A-E336BE5753B6@illinois.edu>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com>
	<CECB8B74-C423-4792-8857-0F3E40EAECE3@illinois.edu>
	<4A40B5D6.40504@ebi.ac.uk>
	<320fb6e00906230429l77a2049eg35e47f94026111c6@mail.gmail.com>
	<4A40C909.40803@ebi.ac.uk>
	<320fb6e00906231439j3beecda0o6e75c4223e960871@mail.gmail.com>
	<4A421287.4000203@ebi.ac.uk>
	<320fb6e00906240756m757ee12nb53ecc527b8e2480@mail.gmail.com>
	<6DCEA6E1-8C53-4C04-9F8A-E336BE5753B6@illinois.edu>
Message-ID: <320fb6e00906240927u62588e22je8186bb172e904a1@mail.gmail.com>

On Wed, Jun 24, 2009 at 5:22 PM, Chris Fields<cjfields at illinois.edu> wrote:
>> So when you say "fastqint" are you talking about something else?
>> Could you show us an example record in this format?
>>
>> Peter
>
> The same as fastq, except the ASCII quality is converted to actual score:
>
> @4_1_912_360
> AAGGGGCTAGAGAAACACGTAATGAAGGGAGGACTC
> +4_1_912_360
> 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 21 40 40 40 40 40
> 40 40 40 40 26 40 40 14 39 40 40
> @4_1_54_483
> TAATAAATGTGCTTCCTTGATGCATGTGCTATGATT
> +4_1_54_483
> 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 16 40 40 40 28 40
> 40 40 40 40 40 16 40 40 5 40 40

OK - and who uses these "Integer FASTQ" files?

Peter

From vecchi.b at gmail.com  Wed Jun 24 12:40:50 2009
From: vecchi.b at gmail.com (Bruno Vecchi)
Date: Wed, 24 Jun 2009 13:40:50 -0300
Subject: [Bioperl-l] Fwd: Hackathon tomorrow (I think)
In-Reply-To: <1a0c1b750906240938n23bbe8b1h42d99b6e345fee49@mail.gmail.com>
References: <8F66D28E-CD28-4A8D-83B2-0FFCA611F58F@jays.net>
	<54655E2F-0D4B-4F4D-8608-13BB052FE201@jays.net>
	<1a0c1b750906240913g5e113dadxe3d98d3fce3a74d@mail.gmail.com>
	<F01E3C47-6651-488C-91F6-73F42A5A65F4@illinois.edu>
	<1a0c1b750906240938n23bbe8b1h42d99b6e345fee49@mail.gmail.com>
Message-ID: <1a0c1b750906240940t7c0003f9hf10eb30c0d85a5ce@mail.gmail.com>

>
> Is there a todo list for biomoose? I'd be glad to hack in, but I'm afraid
> to step into someone else's work or to do things without general agreement.
> It would be nice to have directions for small sized chunks of work to do.
> In any case, count me in!
>
> 2009/6/24 Chris Fields <cjfields at illinois.edu>
>
> 1) Any help towards bugzilla fixes would be most welcome.
>> 2) Better GFF3 integration
>> 3) Typed but lightweight seqfeatures
>> 4) Bio::Moose?
>>
>> I can dedicate more time to the latter two in about a month, but I'll be
>> tied up until then.  Let me know if anyone needs collab on biomoose on
>> github; Mark Jensen's already added.
>>
>> chris
>>
>>
>> On Jun 24, 2009, at 11:13 AM, Bruno Vecchi wrote:
>>
>>  Jay asked me to forward this to the list, since he sometimes has problems
>>> getting his mails delivered.
>>> Feel free to suggest topics for the bioperl hackathon to take place
>>> tomorrow
>>> and on friday!
>>>
>>> Bruno.
>>>
>>>
>>> From: Jay Hannah <jay at jays.net>
>>> Date: June 24, 2009 11:55:42 AM EDT
>>> To: Bioperl <bioperl-l at bioperl.org>
>>> Subject: Hackathon tomorrow (I think)
>>>
>>> Hola,
>>>
>>> So a few of us here at YAPC might try to be productive tomorrow (and
>>> Friday?).
>>>
>>> I don't know if we have any commit bits attending.
>>>
>>> Feel free to suggest things:
>>>
>>>  http://yapc10.org/yn2009/wiki?node=BioPerl
>>>
>>> Or point me to list(s) of things. Perhaps we'll try to help out in
>>> Bugzilla.
>>>
>>> Come yell at me (us?) in IRC:
>>>
>>>  http://www.bioperl.org/wiki/Irc
>>>
>>> Thanks,
>>>
>>> Jay Hannah
>>> http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah
>>>
>>
>>
>

From jay at jays.net  Wed Jun 24 12:44:51 2009
From: jay at jays.net (Jay Hannah)
Date: Wed, 24 Jun 2009 12:44:51 -0400
Subject: [Bioperl-l] Fwd: Hackathon tomorrow (I think)
In-Reply-To: <F01E3C47-6651-488C-91F6-73F42A5A65F4@illinois.edu>
References: <8F66D28E-CD28-4A8D-83B2-0FFCA611F58F@jays.net>
	<54655E2F-0D4B-4F4D-8608-13BB052FE201@jays.net>
	<1a0c1b750906240913g5e113dadxe3d98d3fce3a74d@mail.gmail.com>
	<F01E3C47-6651-488C-91F6-73F42A5A65F4@illinois.edu>
Message-ID: <FEF61DB9-34A6-4884-BCAF-1CAC83E809F1@jays.net>

On Jun 24, 2009, at 12:26 PM, Chris Fields wrote:
> Let me know if anyone needs collab on biomoose on github; Mark  
> Jensen's already added.

Anything on github should be trivial, even with no perms -- we can  
just fork and then send you (whoever) pull requests. github++  :)

> 1) Any help towards bugzilla fixes would be most welcome.

I don't know how to make any progress in bugzilla if no one has a  
commit bit...?

> 2) Better GFF3 integration
> 3) Typed but lightweight seqfeatures

Are there bugzilla tickets (or somewhere) describing those?

I wonder if anyone can help me get out of sporadic MailMan purgatory...

Thanks,

j

From cjfields at illinois.edu  Wed Jun 24 12:54:06 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 24 Jun 2009 11:54:06 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <320fb6e00906240927u62588e22je8186bb172e904a1@mail.gmail.com>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com>
	<CECB8B74-C423-4792-8857-0F3E40EAECE3@illinois.edu>
	<4A40B5D6.40504@ebi.ac.uk>
	<320fb6e00906230429l77a2049eg35e47f94026111c6@mail.gmail.com>
	<4A40C909.40803@ebi.ac.uk>
	<320fb6e00906231439j3beecda0o6e75c4223e960871@mail.gmail.com>
	<4A421287.4000203@ebi.ac.uk>
	<320fb6e00906240756m757ee12nb53ecc527b8e2480@mail.gmail.com>
	<6DCEA6E1-8C53-4C04-9F8A-E336BE5753B6@illinois.edu>
	<320fb6e00906240927u62588e22je8186bb172e904a1@mail.gmail.com>
Message-ID: <5C050D50-5723-4C93-89D0-7280D202E737@illinois.edu>


On Jun 24, 2009, at 11:27 AM, Peter wrote:

> On Wed, Jun 24, 2009 at 5:22 PM, Chris Fields<cjfields at illinois.edu>  
> wrote:
>>> So when you say "fastqint" are you talking about something else?
>>> Could you show us an example record in this format?
>>>
>>> Peter
>>
>> The same as fastq, except the ASCII quality is converted to actual  
>> score:
>>
>> @4_1_912_360
>> AAGGGGCTAGAGAAACACGTAATGAAGGGAGGACTC
>> +4_1_912_360
>> 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 21 40 40  
>> 40 40 40
>> 40 40 40 40 26 40 40 14 39 40 40
>> @4_1_54_483
>> TAATAAATGTGCTTCCTTGATGCATGTGCTATGATT
>> +4_1_54_483
>> 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 16 40 40  
>> 40 28 40
>> 40 40 40 40 40 16 40 40 5 40 40
>
> OK - and who uses this "Integer FASTQ" files?
>
> Peter

Not sure, but it is covered by MAQ via the conversion script (as
FASTQ-int):

http://maq.sourceforge.net/fq_all2std.pl
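
As a rough illustration of going from the integer form back to an
ordinary quality string (a sketch only, not a transcription of
fq_all2std.pl): the helper name is hypothetical, and it assumes the
integers are PHRED scores to be written with the Sanger offset of 33
unless another offset is passed in.

```perl
use strict;
use warnings;

# Turn space-separated integer scores into an ASCII quality string.
# Assumption: the scores are already PHRED values and the target is
# Sanger-style FASTQ (offset 33); no scale conversion is attempted.
sub int_to_ascii_quality {
    my ($int_line, $offset) = @_;
    $offset = 33 unless defined $offset;
    return join '', map {
        my $code = $_ + $offset;
        die "score $_ does not fit printable ASCII at offset $offset\n"
            if $code < 33 || $code > 126;
        chr $code;
    } split ' ', $int_line;
}

print int_to_ascii_quality('40 40 21 26 14 39 5'), "\n";    # prints "II6;/H&"
```

If the integers turn out to be on the Solexa scale, they would need
rescaling before this step.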

chris

From jay at jays.net  Wed Jun 24 11:55:42 2009
From: jay at jays.net (Jay Hannah)
Date: Wed, 24 Jun 2009 11:55:42 -0400
Subject: [Bioperl-l] Hackathon tomorrow (I think)
Message-ID: <8F66D28E-CD28-4A8D-83B2-0FFCA611F58F@jays.net>

Hola,

So a few of us here at YAPC might try to be productive tomorrow (and  
Friday?).

I don't know if we have any commit bits attending.

Feel free to suggest things:

    http://yapc10.org/yn2009/wiki?node=BioPerl

Or point me to list(s) of things. Perhaps we'll try to help out in  
Bugzilla.

Come yell at me (us?) in IRC:

    http://www.bioperl.org/wiki/Irc

Thanks,

Jay Hannah
http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah

From bernd.web at gmail.com  Wed Jun 24 13:11:51 2009
From: bernd.web at gmail.com (Bernd Web)
Date: Wed, 24 Jun 2009 19:11:51 +0200
Subject: [Bioperl-l] Bioperl_scripts
Message-ID: <716af09c0906241011s6cfe0698u6d3ba5f69f730d4c@mail.gmail.com>

Hi,

The bioperl scripts section at
http://www.bioperl.org/wiki/Bioperl_scripts is really handy for short
examples.
However, quite a number of scripts cannot be found anymore and return errors:

For example for the first link (scripts/install_bioperl_scripts.pl)
Filesystem has no item: File not found: revision 15800, path
'/bioperl-live/trunk/scripts/install_bioperl_scripts.pl' at
/usr/lib/perl5/site_perl/5.8.8/SVN/Web/action.pm line 245

Also all scripts in the Bio::Graphics section cannot be found.
Is the http://www.bioperl.org/wiki/Bioperl_scripts page still supported?


Regards,
Bernd

From cjfields at illinois.edu  Wed Jun 24 16:57:51 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 24 Jun 2009 15:57:51 -0500
Subject: [Bioperl-l] Bioperl_scripts
In-Reply-To: <716af09c0906241011s6cfe0698u6d3ba5f69f730d4c@mail.gmail.com>
References: <716af09c0906241011s6cfe0698u6d3ba5f69f730d4c@mail.gmail.com>
Message-ID: <5AF99205-F977-45A1-B4AF-C3858A5727FD@illinois.edu>


On Jun 24, 2009, at 12:11 PM, Bernd Web wrote:

> Hi,
>
> The bioperl scripts section at
> http://www.bioperl.org/wiki/Bioperl_scripts is really handy for short
> examples.
> However, it quite a number of scripts cannot be found anymore and  
> return errors:
>
> For example for the first link (scripts/install_bioperl_scripts.pl)
> Filesystem has no item: File not found: revision 15800, path
> '/bioperl-live/trunk/scripts/install_bioperl_scripts.pl' at
> /usr/lib/perl5/site_perl/5.8.8/SVN/Web/action.pm line 245
>
> Also all scripts in the Bio::Graphics section cannot be found.
> Is the http://www.bioperl.org/wiki/Bioperl_scripts page still  
> supported?
>
> Regards,
> Bernd

Re: Bio::Graphics, all modules and related scripts have been moved to  
a separate repo and CPAN release (latest):

http://search.cpan.org/~lds/Bio-Graphics-1.96/

Beyond that I would consider all scripts and the wiki page supported.   
It's best to file this to bugzilla as a documentation issue so we fix  
it and don't forget about it amongst the flurry of email.

chris

From cjfields at illinois.edu  Wed Jun 24 17:10:34 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 24 Jun 2009 16:10:34 -0500
Subject: [Bioperl-l] Fwd: Hackathon tomorrow (I think)
In-Reply-To: <FEF61DB9-34A6-4884-BCAF-1CAC83E809F1@jays.net>
References: <8F66D28E-CD28-4A8D-83B2-0FFCA611F58F@jays.net>
	<54655E2F-0D4B-4F4D-8608-13BB052FE201@jays.net>
	<1a0c1b750906240913g5e113dadxe3d98d3fce3a74d@mail.gmail.com>
	<F01E3C47-6651-488C-91F6-73F42A5A65F4@illinois.edu>
	<FEF61DB9-34A6-4884-BCAF-1CAC83E809F1@jays.net>
Message-ID: <20E946BE-D9E1-4815-991A-7817E0655879@illinois.edu>


On Jun 24, 2009, at 11:44 AM, Jay Hannah wrote:

> On Jun 24, 2009, at 12:26 PM, Chris Fields wrote:
>> Let me know if anyone needs collab on biomoose on github; Mark  
>> Jensen's already added.
>
> Anything on github should be trivial, even with no perms -- we can  
> just fork and then send you (whoever) pull requests. github++  :)
>
>> 1) Any help towards bugzilla fixes would be most welcome.
>
> I don't know how to make any progress in bugzilla if no one has a  
> commit bit...?

For some reason I thought you had a commit bit; we can add you in if  
needed.  Anyway, patches are most definitely welcome ;>

>> 2) Better GFF3 integration
>> 3) Typed but lightweight seqfeatures
>
> Are there bugzilla tickets (or somewhere) describing those?

No, as the issues are more complex than one single bug, but we do have
something to help track for the time being:

http://www.bioperl.org/wiki/GFF_Refactor
http://www.bioperl.org/wiki/Align_Refactor

I'll probably file TODOs during the process for those refactors.  The  
easiest to tackle would probably be the Align/LocatableSeq refactors.

> I wonder if anyone can help me get out of sporadic MailMan  
> purgatory...
>
> Thanks,
>
> j

-c

PS - Don't feel constrained by the above.  There are many many areas  
to contribute to.


From pmr at ebi.ac.uk  Wed Jun 24 18:44:33 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Wed, 24 Jun 2009 23:44:33 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <5C050D50-5723-4C93-89D0-7280D202E737@illinois.edu>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com>
	<CECB8B74-C423-4792-8857-0F3E40EAECE3@illinois.edu>
	<4A40B5D6.40504@ebi.ac.uk>
	<320fb6e00906230429l77a2049eg35e47f94026111c6@mail.gmail.com>
	<4A40C909.40803@ebi.ac.uk>
	<320fb6e00906231439j3beecda0o6e75c4223e960871@mail.gmail.com>
	<4A421287.4000203@ebi.ac.uk>
	<320fb6e00906240756m757ee12nb53ecc527b8e2480@mail.gmail.com>
	<6DCEA6E1-8C53-4C04-9F8A-E336BE5753B6@illinois.edu>
	<320fb6e00906240927u62588e22je8186bb172e904a1@mail.gmail.com>
	<5C050D50-5723-4C93-89D0-7280D202E737@illinois.edu>
Message-ID: <4A42AC51.3090809@ebi.ac.uk>

Chris Fields wrote:
> Not sure, but it is covered by MAQ via the conversion script (as 
> FASTQ-int):

Are the scores phred or Solexa?

Peter Rice

From adlai at refenestration.com  Wed Jun 24 22:08:31 2009
From: adlai at refenestration.com (Adlai Burman)
Date: Thu, 25 Jun 2009 04:08:31 +0200
Subject: [Bioperl-l] Extreme newbie question.
Message-ID: <99C689C0-B999-45CB-8B99-0CABCC69AB77@refenestration.com>

I have been trying to install BioPerl for a while now and after
pummeling my hard drive (Mac OS 10.5 Intel) with several attempts at
Fink installation, a CPAN installation and removing my .cpan folder I
am still at square 0. I do not want to do any more damage to my
computer, yet I really need a working install (especially to interface
with remote DBs like GenBank). Can anyone give me some advice here?
After each attempt, I have tried to run perldoc bptutorial.pl and
tried test scripts with "use Bio::Perl" in the headers and I just
receive error messages like the following:

Can't locate Bio/Perl.pm in @INC (@INC contains:
/home/users/dag/lib/perl5/
/Library/Perl/Updates/5.8.8/darwin-thread-multi-2level
/Library/Perl/Updates/5.8.8
/System/Library/Perl/5.8.8/darwin-thread-multi-2level
/System/Library/Perl/5.8.8
/Library/Perl/5.8.8/darwin-thread-multi-2level
/Library/Perl/5.8.8
/Library/Perl
/Network/Library/Perl/5.8.8/darwin-thread-multi-2level
/Network/Library/Perl/5.8.8
/Network/Library/Perl
/System/Library/Perl/Extras/5.8.8/darwin-thread-multi-2level
/System/Library/Perl/Extras/5.8.8
/Library/Perl/5.8.6
/Library/Perl/5.8.1
.) at trsh.pl line 1.

I have been working from the O'Reilly book Mastering Perl for
Bioinformatics and the INSTALL file and have scoured around the
BioPerl website and am still stuck.
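
A quick way to see whether Perl can find the module at all is to look
for Bio/Perl.pm in each @INC directory. This is only a diagnostic
sketch; find_module is a hypothetical helper, not something shipped
with BioPerl.

```perl
use strict;
use warnings;
use File::Spec;

# Return every directory in the given search path that contains the module.
sub find_module {
    my ($module, @dirs) = @_;
    (my $rel = "$module.pm") =~ s{::}{/}g;    # Bio::Perl -> Bio/Perl.pm
    return grep {
        my $path = File::Spec->catfile($_, $rel);
        -e $path;
    } @dirs;
}

my @hits = find_module('Bio::Perl', @INC);
if (@hits) {
    print "Bio::Perl found under: $_\n" for @hits;
}
else {
    print "Bio::Perl is not in \@INC. If it was unpacked or installed\n",
          "elsewhere, add that directory with 'use lib' or PERL5LIB.\n";
}
```

If nothing is found, pointing Perl at the directory that holds the Bio/
folder (with a "use lib" line in the script or the PERL5LIB environment
variable) is the usual way to make "use Bio::Perl" resolve.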

Thanks in advance,

Adlai

From kpclancy at hotmail.com  Wed Jun 24 22:31:17 2009
From: kpclancy at hotmail.com (Kevin Clancy)
Date: Wed, 24 Jun 2009 20:31:17 -0600
Subject: [Bioperl-l] Fwd: Hackathon tomorrow (I think)
In-Reply-To: <20E946BE-D9E1-4815-991A-7817E0655879@illinois.edu>
References: <8F66D28E-CD28-4A8D-83B2-0FFCA611F58F@jays.net>
	<54655E2F-0D4B-4F4D-8608-13BB052FE201@jays.net>
	<1a0c1b750906240913g5e113dadxe3d98d3fce3a74d@mail.gmail.com>
	<F01E3C47-6651-488C-91F6-73F42A5A65F4@illinois.edu>
	<FEF61DB9-34A6-4884-BCAF-1CAC83E809F1@jays.net> 
	<20E946BE-D9E1-4815-991A-7817E0655879@illinois.edu>
Message-ID: <COL107-W17352BCFE4B6CE766EF2C3CE340@phx.gbl>


Is there an intention to have a hackathon at ISMB this weekend? I know there is a 2 day BOSC.
kevin

> From: cjfields at illinois.edu
> To: jay at jays.net
> Date: Wed, 24 Jun 2009 16:10:34 -0500
> CC: vecchi.b at gmail.com; bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] Fwd: Hackathon tomorrow (I think)
> 
> 
> On Jun 24, 2009, at 11:44 AM, Jay Hannah wrote:
> 
> > On Jun 24, 2009, at 12:26 PM, Chris Fields wrote:
> >> Let me know if anyone needs collab on biomoose on github; Mark  
> >> Jensen's already added.
> >
> > Anything on github should be trivial, even with no perms -- we can  
> > just fork and then send you (whoever) pull requests. github++  :)
> >
> >> 1) Any help towards bugzilla fixes would be most welcome.
> >
> > I don't know how to make any progress in bugzilla if no one has a  
> > commit bit...?
> 
> For some reason I thought you had a commit bit; we can add you in if  
> needed.  Anyway, patches are most definitely welcome ;>
> 
> >> 2) Better GFF3 integration
> >> 3) Typed but lightweight seqfeatures
> >
> > Are there bugzilla tickets (or somewhere) describing those?
> 
> No as the issues are more complex than one single bug, but we do have  
> something to help track for the time being:
> 
> http://www.bioperl.org/wiki/GFF_Refactor
> http://www.bioperl.org/wiki/Align_Refactor
> 
> I'll probably file TODOs during the process for those refactors.  The  
> easiest to tackle would be probably be Align/LocatableSeq refactors.
> 
> > I wonder if anyone can help me get out of sporadic MailMan  
> > purgatory...
> >
> > Thanks,
> >
> > j
> 
> -c
> 
> PS - Don't feel constrained by the above.  There are many many areas  
> to contribute to.
> 
> 


From cjfields at illinois.edu  Wed Jun 24 23:54:28 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 24 Jun 2009 22:54:28 -0500
Subject: [Bioperl-l] Fwd: Hackathon tomorrow (I think)
In-Reply-To: <COL107-W17352BCFE4B6CE766EF2C3CE340@phx.gbl>
References: <8F66D28E-CD28-4A8D-83B2-0FFCA611F58F@jays.net>
	<54655E2F-0D4B-4F4D-8608-13BB052FE201@jays.net>
	<1a0c1b750906240913g5e113dadxe3d98d3fce3a74d@mail.gmail.com>
	<F01E3C47-6651-488C-91F6-73F42A5A65F4@illinois.edu>
	<FEF61DB9-34A6-4884-BCAF-1CAC83E809F1@jays.net>
	<20E946BE-D9E1-4815-991A-7817E0655879@illinois.edu>
	<COL107-W17352BCFE4B6CE766EF2C3CE340@phx.gbl>
Message-ID: <02C22CF3-4AAD-4B4D-8F54-2999998A34E0@illinois.edu>

I have no idea; I don't think there are many bioperl devs attending  
this year unfortunately.  Any meetings in the next year where we could  
set up a bioperl hackathon?  I will likely be available to attend if  
it's stateside...

chris

On Jun 24, 2009, at 9:31 PM, Kevin Clancy wrote:

>
> is there an intention to have a hackathon at ISMB this weekend - I  
> know there is a 2 day BOSC
> kevin
>
>> From: cjfields at illinois.edu
>> To: jay at jays.net
>> Date: Wed, 24 Jun 2009 16:10:34 -0500
>> CC: vecchi.b at gmail.com; bioperl-l at bioperl.org
>> Subject: Re: [Bioperl-l] Fwd: Hackathon tomorrow (I think)
>>
>>
>> On Jun 24, 2009, at 11:44 AM, Jay Hannah wrote:
>>
>>> On Jun 24, 2009, at 12:26 PM, Chris Fields wrote:
>>>> Let me know if anyone needs collab on biomoose on github; Mark
>>>> Jensen's already added.
>>>
>>> Anything on github should be trivial, even with no perms -- we can
>>> just fork and then send you (whoever) pull requests. github++  :)
>>>
>>>> 1) Any help towards bugzilla fixes would be most welcome.
>>>
>>> I don't know how to make any progress in bugzilla if no one has a
>>> commit bit...?
>>
>> For some reason I thought you had a commit bit; we can add you in if
>> needed.  Anyway, patches are most definitely welcome ;>
>>
>>>> 2) Better GFF3 integration
>>>> 3) Typed but lightweight seqfeatures
>>>
>>> Are there bugzilla tickets (or somewhere) describing those?
>>
>> No as the issues are more complex than one single bug, but we do have
>> something to help track for the time being:
>>
>> http://www.bioperl.org/wiki/GFF_Refactor
>> http://www.bioperl.org/wiki/Align_Refactor
>>
>> I'll probably file TODOs during the process for those refactors.  The
>> easiest to tackle would be probably be Align/LocatableSeq refactors.
>>
>>> I wonder if anyone can help me get out of sporadic MailMan
>>> purgatory...
>>>
>>> Thanks,
>>>
>>> j
>>
>> -c
>>
>> PS - Don't feel constrained by the above.  There are many many areas
>> to contribute to.
>>
>>


From cjfields at illinois.edu  Thu Jun 25 10:00:47 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 25 Jun 2009 09:00:47 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <4A42AC51.3090809@ebi.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com>
	<CECB8B74-C423-4792-8857-0F3E40EAECE3@illinois.edu>
	<4A40B5D6.40504@ebi.ac.uk>
	<320fb6e00906230429l77a2049eg35e47f94026111c6@mail.gmail.com>
	<4A40C909.40803@ebi.ac.uk>
	<320fb6e00906231439j3beecda0o6e75c4223e960871@mail.gmail.com>
	<4A421287.4000203@ebi.ac.uk>
	<320fb6e00906240756m757ee12nb53ecc527b8e2480@mail.gmail.com>
	<6DCEA6E1-8C53-4C04-9F8A-E336BE5753B6@illinois.edu>
	<320fb6e00906240927u62588e22je8186bb172e904a1@mail.gmail.com>
	<5C050D50-5723-4C93-89D0-7280D202E737@illinois.edu>
	<4A42AC51.3090809@ebi.ac.uk>
Message-ID: <CB4314ED-4076-42AD-96CC-64CB429929D5@illinois.edu>


On Jun 24, 2009, at 5:44 PM, Peter Rice wrote:

> Chris Fields wrote:
>> Not sure, but it is covered by MAQ via the conversion script (as  
>> FASTQ-int):
>
> Are the scores phred or Solexa?
>
> Peter Rice

Not sure actually.  The perl script I linked to looks like it converts  
using the same scale as solexa (illumina 1.0).
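
For context on the question above: Phred scores encode the error
probability p as -10*log10(p), while Solexa (Illumina 1.0) scores use the
odds, -10*log10(p/(1-p)). The two agree closely at high quality and diverge
at low quality. A minimal sketch of the conversion follows; the helper names
are hypothetical and are not part of BioPerl or the MAQ script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helpers for illustration; these are not BioPerl functions.
# Phred:  Q = -10 * log10(p)
# Solexa: Q = -10 * log10(p / (1 - p))
sub log10 { return log($_[0]) / log(10) }

# Solexa -> Phred: Q_phred = 10 * log10(10^(Q_solexa/10) + 1)
sub solexa_to_phred {
    my ($q_solexa) = @_;
    return 10 * log10(10 ** ($q_solexa / 10) + 1);
}

# Phred -> Solexa: Q_solexa = 10 * log10(10^(Q_phred/10) - 1)
# Undefined at Q_phred = 0 (p = 1); callers should clamp low scores first.
sub phred_to_solexa {
    my ($q_phred) = @_;
    return 10 * log10(10 ** ($q_phred / 10) - 1);
}

printf "Solexa %3d -> Phred %.2f\n", $_, solexa_to_phred($_) for (-5, 0, 10, 40);
```

Checking a few low scores this way is a quick test of which scale a file
uses, since Solexa scores can be negative and Phred scores cannot.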

chris

From chmille4 at gmail.com  Thu Jun 25 10:46:26 2009
From: chmille4 at gmail.com (Chase Miller)
Date: Thu, 25 Jun 2009 10:46:26 -0400
Subject: [Bioperl-l] Bio::LocatableSeq and Annotation vs Feature
Message-ID: <991fb8210906250746v19faa99dy690db524904bbf4a@mail.gmail.com>

Hi all,

Quick question I came across while writing the Bio::Nexml module.

I'm trying to link taxon data to a Bio::LocatableSeq object inside a
Bio::SimpleAlign object.  Bio::SimpleAlign has the ability to add
SeqFeatures, but according to this HowTo (
http://www.bioperl.org/wiki/HOWTO:Feature-Annotation) a feature is
considered to refer to a portion of a sequence, whereas something like taxon
data would refer to the entire sequence and should be handled as an
annotation. However, as far as I can tell Bio::LocatableSeq does not support
annotation objects.
What would be the best way to relate taxon data to a single sequence inside
an alignment?


Thanks,
Chase

From Kevin.M.Brown at asu.edu  Thu Jun 25 11:21:02 2009
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Thu, 25 Jun 2009 08:21:02 -0700
Subject: [Bioperl-l] Extreme newbie question.
In-Reply-To: <99C689C0-B999-45CB-8B99-0CABCC69AB77@refenestration.com>
References: <99C689C0-B999-45CB-8B99-0CABCC69AB77@refenestration.com>
Message-ID: <1A4207F8295607498283FE9E93B775B4060BA013@EX02.asurite.ad.asu.edu>

http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix

Or http://www.bioperl.org/wiki/Getting_BioPerl#Mac_OS_X_using_fink

That error suggests that the install failed, and you need to figure out
why from the install error messages. I suspect you aren't doing the
install as root, but as a normal user who lacks the needed permissions
to change files in certain directories.

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Adlai Burman
> Sent: Wednesday, June 24, 2009 7:09 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Extreme newbie question.
> 
> I have been trying to install BioPerl for a while now, and after
> pummeling my hard drive (Mac OS 10.5 Intel) with several attempts at a
> Fink installation, a CPAN installation, and removing my .cpan folder, I
> am still at square 0. I do not want to do any more damage to my
> computer, yet I really need a working install (especially to interface
> with remote DBs like GenBank). Can anyone give me some advice here?
> After each attempt, I have tried to run perldoc bptutorial.pl and
> tried test scripts with "use Bio::Perl" in the headers, and I just
> receive error messages like the following:
> 
> Can't locate Bio/Perl.pm in @INC (@INC contains:
> /home/users/dag/lib/perl5/
> /Library/Perl/Updates/5.8.8/darwin-thread-multi-2level
> /Library/Perl/Updates/5.8.8
> /System/Library/Perl/5.8.8/darwin-thread-multi-2level
> /System/Library/Perl/5.8.8
> /Library/Perl/5.8.8/darwin-thread-multi-2level
> /Library/Perl/5.8.8
> /Library/Perl
> /Network/Library/Perl/5.8.8/darwin-thread-multi-2level
> /Network/Library/Perl/5.8.8
> /Network/Library/Perl
> /System/Library/Perl/Extras/5.8.8/darwin-thread-multi-2level
> /System/Library/Perl/Extras/5.8.8
> /Library/Perl/5.8.6
> /Library/Perl/5.8.1 .) at trsh.pl line 1.
> 
> I have been working from the O'Reilly book Mastering Perl for
> Bioinformatics and the INSTALL file and have scoured the
> BioPerl website and am still stuck.
> 
> Thanks in advance,
> 
> Adlai


From David.Messina at sbc.su.se  Thu Jun 25 12:39:22 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 25 Jun 2009 18:39:22 +0200
Subject: [Bioperl-l] Extreme newbie question.
In-Reply-To: <99C689C0-B999-45CB-8B99-0CABCC69AB77@refenestration.com>
References: <99C689C0-B999-45CB-8B99-0CABCC69AB77@refenestration.com>
Message-ID: <628aabb70906250939l7d1116d0sec9efa2c16235c75@mail.gmail.com>

Hi Adlai,
Did the Bioperl tests run successfully? Did you get the impression that the
installation was successful?

If not, what are the errors you see during the install process?

I ask because the error you included in your message is not necessarily
indicative of a failed installation (it could just be a path issue).

By the way, as I think is indicated somewhere in the installation
instructions, you don't actually need to install Bioperl to use most of its
functionality. Simply having the Bio/ directory in your PERL5LIB path is
enough.
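
To tell a path issue from a failed install, it can help to check whether
any directory Perl searches actually contains Bio/Perl.pm. Below is a
minimal sketch using only core Perl; the sub name find_bioperl and the path
in the hint message are hypothetical. Note that PERL5LIB should name the
directory that contains Bio/, not the Bio/ directory itself.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return every directory in the given list that contains Bio/Perl.pm.
# Code references (which @INC may hold) are skipped.
sub find_bioperl {
    my @dirs = @_;
    return grep { !ref($_) && -e "$_/Bio/Perl.pm" } @dirs;
}

my @hits = find_bioperl(@INC);
if (@hits) {
    print "Bio::Perl found in: $_\n" for @hits;
}
else {
    print "Bio::Perl not found; searched:\n";
    print "  $_\n" for grep { !ref($_) } @INC;
    # Hypothetical path: substitute wherever the BioPerl tarball was unpacked.
    print "Try: export PERL5LIB=/path/to/bioperl-live\n";
}
```

If the script reports no hits after PERL5LIB has been set, the variable is
probably not exported in the shell that runs the script.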


Dave

From cjfields at illinois.edu  Thu Jun 25 13:02:48 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 25 Jun 2009 12:02:48 -0500
Subject: [Bioperl-l] Bio::LocatableSeq and Annotation vs Feature
In-Reply-To: <991fb8210906250746v19faa99dy690db524904bbf4a@mail.gmail.com>
References: <991fb8210906250746v19faa99dy690db524904bbf4a@mail.gmail.com>
Message-ID: <3149D8E9-F145-4438-973E-3728575F436E@illinois.edu>

On Jun 25, 2009, at 9:46 AM, Chase Miller wrote:

> Hi all,
>
> Quick question I came across while writing the Bio::Nexml module.
>
> I'm trying to link taxon data to a Bio::LocatableSeq object inside a
> Bio::SimpleAlign object.  Bio::SimpleAlign has the ability to add
> SeqFeatures, but according to this HowTo (
> http://www.bioperl.org/wiki/HOWTO:Feature-Annotation) a feature is
> considered to refer to a portion of a sequence, whereas something  
> like taxon
> data would refer to the entire sequence and should be handled as an
> annotation. However, as far as I can tell Bio::LocatableSeq does not  
> support
> annotation objects.
> What would be the best way to relate taxon data to a single sequence  
> inside
> an alignment?
>
> Thanks,
> Chase

 From working with feature/annotation-rich alignment formats such as  
stockholm I found this is one of the areas for Align that needs some  
rethinking. One way to work around this w/o major refactoring is to  
have a full-length SeqFeature (pointing to the proper LocatableSeq)  
that stores the Bio::Annotation.  I don't necessarily like that  
approach as a long-term solution, though, as it's a little hacky and  
indirect, but it might get you started (just mark it as TODO so we can  
catch it at some point).
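
The workaround described above might be sketched as follows. This is only
an illustration under stated assumptions: the sequence, id, tag name, and
taxon value are made up, and using the row's own start/end as the feature
coordinates is an assumption rather than an established convention.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use Bio::LocatableSeq;
use Bio::SimpleAlign;
use Bio::SeqFeature::Generic;
use Bio::Annotation::SimpleValue;

# Toy alignment row; the id, sequence, and taxon label are made up.
my $locseq = Bio::LocatableSeq->new(
    -id    => 'seq1',
    -seq   => 'ACGT-ACGT',
    -start => 1,
    -end   => 8,
);

my $aln = Bio::SimpleAlign->new();
$aln->add_seq($locseq);

# A feature spanning the whole row stands in for per-sequence annotation.
my $feat = Bio::SeqFeature::Generic->new(
    -start       => $locseq->start,
    -end         => $locseq->end,
    -primary_tag => 'taxon',    # hypothetical tag name
);
$feat->attach_seq($locseq);

# TODO: replace once LocatableSeq/PrimarySeq can carry annotations directly.
$feat->annotation->add_Annotation(
    'taxon',
    Bio::Annotation::SimpleValue->new(-value => 'Homo sapiens'),
);

$aln->add_SeqFeature($feat);

# Walk back from the alignment to each annotated row.
for my $f ($aln->get_SeqFeatures) {
    my ($taxon) = $f->annotation->get_Annotations('taxon');
    printf "%s => %s\n", $f->entire_seq->id, $taxon->value;
}
```

The indirection is visible in the last loop: the taxon is reached through
the feature, not through the LocatableSeq itself.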

For a long-term solution I don't think the answer is as simple as  
making LocatableSeq Bio::AnnotatableI; that would not be congruent  
with the PrimarySeq implementation (which is not AnnotatableI).   
LocatableSeq is supposed to represent a simple PrimarySeq that can be  
mapped to other sequences via start/end/strand, and thus inherits from  
both Bio::PrimarySeq (note lack of 'I') and RangeI.

Three options:
1) Bio::Seq could be refactored to handle both Bio::PrimarySeq and  
Bio::LocatableSeq, and SimpleAlign reworked to allow any simple RangeI.
2) Bio::PrimarySeq can be AnnotatableI (Bio::Seq would delegate to the  
PrimarySeq AnnotationCollection).
3) All AnnotationI need to be linked back to the PrimarySeqI somehow,
e.g. via features.

I personally think option #2 is easiest, as this means anything that  
is-a PrimarySeq is also AnnotatableI, and it might not break past  
scripts.  Not sure how this would affect overall performance though.

chris

From me at miguel.weapps.com  Thu Jun 25 10:09:29 2009
From: me at miguel.weapps.com (Luis Miguel Rodríguez)
Date: Thu, 25 Jun 2009 16:09:29 +0200
Subject: [Bioperl-l] Fwd: Hackathon tomorrow (I think)
In-Reply-To: <02C22CF3-4AAD-4B4D-8F54-2999998A34E0@illinois.edu>
References: <8F66D28E-CD28-4A8D-83B2-0FFCA611F58F@jays.net>
	<54655E2F-0D4B-4F4D-8608-13BB052FE201@jays.net>
	<1a0c1b750906240913g5e113dadxe3d98d3fce3a74d@mail.gmail.com>
	<F01E3C47-6651-488C-91F6-73F42A5A65F4@illinois.edu>
	<FEF61DB9-34A6-4884-BCAF-1CAC83E809F1@jays.net>
	<20E946BE-D9E1-4815-991A-7817E0655879@illinois.edu>
	<COL107-W17352BCFE4B6CE766EF2C3CE340@phx.gbl>
	<02C22CF3-4AAD-4B4D-8F54-2999998A34E0@illinois.edu>
Message-ID: <94da4c880906250709j7b2cb78dk77710bd43e20fd42@mail.gmail.com>

Dear all,
Is there a way to run muscle silently via
Bio::Tools::Run::Alignment::Muscle?

Cheers,

-- 
Luis M. Rodriguez-R
[http://bioinf.uniandes.edu.co/~miguel/]
---------------------------------
Unidad de Bioinformática del Laboratorio de Micología y Fitopatología
Universidad de Los Andes, Colombia
[http://bioinf.uniandes.edu.co]

+ 57 1 3394949 ext 2619
lmrodriguezr at gmail.com
me at miguel.weapps.com


From chmille4 at gmail.com  Thu Jun 25 13:57:25 2009
From: chmille4 at gmail.com (Chase Miller)
Date: Thu, 25 Jun 2009 13:57:25 -0400
Subject: [Bioperl-l] Bio::LocatableSeq and Annotation vs Feature
In-Reply-To: <3149D8E9-F145-4438-973E-3728575F436E@illinois.edu>
References: <991fb8210906250746v19faa99dy690db524904bbf4a@mail.gmail.com> 
	<3149D8E9-F145-4438-973E-3728575F436E@illinois.edu>
Message-ID: <991fb8210906251057i25bbe511r84f5d1319f191421@mail.gmail.com>

Ok, I'll use the full-length SeqFeature for now and mark it with a TODO.
Thanks for the help.
Chase


From Kevin.M.Brown at asu.edu  Thu Jun 25 14:54:19 2009
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Thu, 25 Jun 2009 11:54:19 -0700
Subject: [Bioperl-l] Extreme newbie question.
In-Reply-To: <93BFA04E-D4DB-41B3-AAF6-56CA709980B6@refenestration.com>
References: <99C689C0-B999-45CB-8B99-0CABCC69AB77@refenestration.com>
	<1A4207F8295607498283FE9E93B775B4060BA013@EX02.asurite.ad.asu.edu>
	<93BFA04E-D4DB-41B3-AAF6-56CA709980B6@refenestration.com>
Message-ID: <1A4207F8295607498283FE9E93B775B4060BA08F@EX02.asurite.ad.asu.edu>

Please keep your replies on the list. 

> -----Original Message-----
> From: Adlai Burman [mailto:adlai at refenestration.com] 
> Sent: Thursday, June 25, 2009 11:39 AM
> To: Kevin Brown
> Subject: Re: [Bioperl-l] Extreme newbie question.
> 
> Thanks, Kevin.
> I did install everything using sudo. I will try again and pay  
> attention to the error log. I hope I did not introduce any conflicts  
> or weird path problems.
> 
> Adlai


From adlai at refenestration.com  Thu Jun 25 14:59:10 2009
From: adlai at refenestration.com (Adlai Burman)
Date: Thu, 25 Jun 2009 20:59:10 +0200
Subject: [Bioperl-l] Extreme newbie question.
In-Reply-To: <1A4207F8295607498283FE9E93B775B4060BA013@EX02.asurite.ad.asu.edu>
References: <99C689C0-B999-45CB-8B99-0CABCC69AB77@refenestration.com>
	<1A4207F8295607498283FE9E93B775B4060BA013@EX02.asurite.ad.asu.edu>
Message-ID: <E840DA6D-6AE4-42A2-9380-3647658E3C6E@refenestration.com>

Hey again, I'm back to trying the install, and I now get a new
error:

Client not fully configured, please proceed with configuring.
  o conf init urllist

Any ideas?

Adlai



From cjfields at illinois.edu  Thu Jun 25 16:07:44 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 25 Jun 2009 15:07:44 -0500
Subject: [Bioperl-l] Extreme newbie question.
In-Reply-To: <E840DA6D-6AE4-42A2-9380-3647658E3C6E@refenestration.com>
References: <99C689C0-B999-45CB-8B99-0CABCC69AB77@refenestration.com>
	<1A4207F8295607498283FE9E93B775B4060BA013@EX02.asurite.ad.asu.edu>
	<E840DA6D-6AE4-42A2-9380-3647658E3C6E@refenestration.com>
Message-ID: <F3802595-7617-4CD5-AC8A-2B67069BE001@illinois.edu>

That means: within the cpan shell, type 'o conf init urllist'
(again, this requires sudo).

chris

On Jun 25, 2009, at 1:59 PM, Adlai Burman wrote:

> Hey again, I'm right into trying to install again and I now get a  
> new error:
>
> Client not fully configured, please proceed with configuring.
> o conf init urllist
>
> any ideas?
>
> Adlai


From bix at sendu.me.uk  Thu Jun 25 16:19:07 2009
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 25 Jun 2009 21:19:07 +0100
Subject: [Bioperl-l] Extreme newbie question.
In-Reply-To: <E840DA6D-6AE4-42A2-9380-3647658E3C6E@refenestration.com>
References: <99C689C0-B999-45CB-8B99-0CABCC69AB77@refenestration.com>	<1A4207F8295607498283FE9E93B775B4060BA013@EX02.asurite.ad.asu.edu>
	<E840DA6D-6AE4-42A2-9380-3647658E3C6E@refenestration.com>
Message-ID: <4A43DBBB.2050109@sendu.me.uk>

Adlai Burman wrote:
> Hey again, I'm right into trying to install again and I now get a new 
> error:
> 
> Client not fully configured, please proceed with configuring.
>  o conf init urllist

Run cpan and do as it says.

From cjm at berkeleybop.org  Thu Jun 25 20:32:05 2009
From: cjm at berkeleybop.org (Chris Mungall)
Date: Thu, 25 Jun 2009 17:32:05 -0700
Subject: [Bioperl-l] experimental IO module for exporting OWL from GFF
Message-ID: <CE668EC6-1ED5-4BD8-868C-729031724A09@berkeleybop.org>


I've written a module, Bio::FeatureIO::seqont_owl, which generates
Sequence Ontology compliant RDF/OWL. This will allow, for example,
loading GFF into triplestores and running inference with OWL reasoners.

- It's experimental, fairly incomplete, and subject to change
- Relies on an experimental extension of SO
- Probably of interest to a minority of bp users
- It's not yet fully documented (but there will be a paper)
- It doesn't introduce any additional dependencies (all done via  
XML::Writer, which is already a dependency)
- Doesn't otherwise impinge on existing code

I'd like to get this under source control. Is the appropriate place  
for this:

- HEAD
- a branch
- bioperl-dev
- a separate repository

?

Cheers
Chris


From maj at fortinbras.us  Thu Jun 25 21:08:43 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 25 Jun 2009 21:08:43 -0400
Subject: [Bioperl-l] experimental IO module for exporting OWL from GFF
In-Reply-To: <CE668EC6-1ED5-4BD8-868C-729031724A09@berkeleybop.org>
References: <CE668EC6-1ED5-4BD8-868C-729031724A09@berkeleybop.org>
Message-ID: <7DFBF7E8A6DB45F3AB836EBAFBE597AC@NewLife>

This sounds very Dev to me. Also cool.
MAJ


From cjfields at illinois.edu  Thu Jun 25 21:35:06 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 25 Jun 2009 20:35:06 -0500
Subject: [Bioperl-l] experimental IO module for exporting OWL from GFF
In-Reply-To: <7DFBF7E8A6DB45F3AB836EBAFBE597AC@NewLife>
References: <CE668EC6-1ED5-4BD8-868C-729031724A09@berkeleybop.org>
	<7DFBF7E8A6DB45F3AB836EBAFBE597AC@NewLife>
Message-ID: <12F203C3-689B-423E-9691-86EB1D500A7D@illinois.edu>

I agree.  Just to note, FeatureIO (even though it's in core) will be
reworked and simplified at some future point (and will likely move
away from Bio::SeqFeature::Annotated).

chris

On Jun 25, 2009, at 8:08 PM, Mark A. Jensen wrote:

> This sounds very Dev to me. Also cool.
> MAJ


From rmb32 at cornell.edu  Fri Jun 26 00:27:55 2009
From: rmb32 at cornell.edu (Robert Buels)
Date: Thu, 25 Jun 2009 21:27:55 -0700
Subject: [Bioperl-l] BioPerl hackathon, hooray!
Message-ID: <4A444E4B.2000808@cornell.edu>

I'm pleased to announce a thoroughly climactic conclusion to the 
YAPC::NA 2009 BioPerl hackathon.

Between Jay Hannah (jhannah) and myself (rbuels), plus #bioperl virtual 
participant Bruno Vecchi (brunov), we SMASHED the HECK out of 6 bugs in 
the BioPerl Bugzilla.

Many thanks to the participants, let's do it again next year!

Rob

From jay at jays.net  Fri Jun 26 00:54:31 2009
From: jay at jays.net (Jay Hannah)
Date: Fri, 26 Jun 2009 00:54:31 -0400
Subject: [Bioperl-l] [yapc] BioPerl hackathon, hooray!
In-Reply-To: <4A444E4B.2000808@cornell.edu>
References: <4A444E4B.2000808@cornell.edu>
Message-ID: <E316C92D-870D-4AFC-88EC-3897B351D901@jays.net>

On Jun 26, 2009, at 12:27 AM, Robert Buels wrote:
> I'm pleased to announce a thoroughly climactic conclusion to the  
> YAPC::NA 2009 BioPerl hackathon.

Feel free to check our work:

    http://github.com/rbuels/bioperl-live

:)

j
http://www.bioperl.org/wiki/User:Jhannah


From rahall2 at ualr.edu  Fri Jun 26 02:28:05 2009
From: rahall2 at ualr.edu (Roger Hall)
Date: Fri, 26 Jun 2009 01:28:05 -0500
Subject: [Bioperl-l] Random nucleotide string generator?
Message-ID: <fc2dd7b3461f.4a442425@ualr.edu>

All,
 
Is there a random generator in BioPerl for creating nucleotide sequences (of length l, with composition frequencies a, c, g, and t)?
 
I noticed a thread about it from 2000 and nothing since (searching for "random sequence").
 
If not - what should the namespace be for such a module, if it is not yet done and is desirable?
 
TIA!
 
Roger 
 
 
From David.Messina at sbc.su.se  Fri Jun 26 06:15:04 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 26 Jun 2009 12:15:04 +0200
Subject: [Bioperl-l] Random nucleotide string generator?
In-Reply-To: <fc2dd7b3461f.4a442425@ualr.edu>
References: <fc2dd7b3461f.4a442425@ualr.edu>
Message-ID: <628aabb70906260315o2a3fbaem19ef24b8594649e3@mail.gmail.com>

The Bioperl solution piggybacks on EMBOSS. See Chris Fields' comment on this
post from Neil Saunders' blog:
http://nsaunders.wordpress.com/2007/07/25/howto-generate-random-sequences-using-emboss-and-perl/


You can also do this outside of BioPerl using shuffle from Sean Eddy's SQUID
package, available here:
ftp://selab.janelia.org/pub/software/squid/

> If not - what should the namespace be for such a module should it be undone
> and desirable?


Perhaps add it to Bio::SeqUtils?


Dave

From David.Messina at sbc.su.se  Fri Jun 26 07:37:44 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 26 Jun 2009 13:37:44 +0200
Subject: [Bioperl-l] [yapc] BioPerl hackathon, hooray!
In-Reply-To: <E316C92D-870D-4AFC-88EC-3897B351D901@jays.net>
References: <4A444E4B.2000808@cornell.edu>
	<E316C92D-870D-4AFC-88EC-3897B351D901@jays.net>
Message-ID: <628aabb70906260437r18fc7543oc05761241fe810ff@mail.gmail.com>

Awesome, great work guys!
Thanks so much.


Dave

From David.Messina at sbc.su.se  Fri Jun 26 08:58:20 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 26 Jun 2009 14:58:20 +0200
Subject: [Bioperl-l]  Random nucleotide string generator?
In-Reply-To: <1a0c1b750906260544u6a945ce1s652de19b8b27c615@mail.gmail.com>
References: <fc2dd7b3461f.4a442425@ualr.edu>
	<628aabb70906260315o2a3fbaem19ef24b8594649e3@mail.gmail.com> 
	<1a0c1b750906260544u6a945ce1s652de19b8b27c615@mail.gmail.com>
Message-ID: <628aabb70906260558k585f6700ycef271e7f26dd1a3@mail.gmail.com>

[Forwarding Bruno's reply.... -Dave]
---------- Forwarded message ----------
From: Bruno Vecchi <vecchi.b at gmail.com>
Date: Fri, Jun 26, 2009 at 14:44
Subject: Re: [Bioperl-l] Random nucleotide string generator?
To: Dave Messina <David.Messina at sbc.su.se>


Here's a little script that I used for a somewhat related task. It produces
a randomized version of an input sequence (thus keeping the original's
composition). Maybe you could adjust it to your needs: if you provide an
input sequence with the desired length and composition, you should get what
you want.

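#!perl
use strict;
use warnings;
use List::Util qw(shuffle);
use Bio::Seq;
use Bio::SeqIO;

# Usage: shuffle.pl <sequence file> <number of shuffled copies>
my ($seqfile, $number) = @ARGV;

my $in = Bio::SeqIO->new(-file => $seqfile);
my $fh = Bio::SeqIO->newFh(-format => 'fasta');   # writes FASTA to STDOUT

# Only the first sequence in the input file is used.
my $seq = $in->next_seq;
my @chars = split '', $seq->seq;

for my $i (1 .. $number) {
    # Reshuffling keeps the composition of the original sequence.
    @chars = shuffle @chars;
    my $new_seq = Bio::Seq->new(-id => $i, -seq => join('', @chars));
    print $fh $new_seq;
}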

You can use it like this from the command line (assuming you want 20 output
sequences):

shuffle.pl input_sequence.fasta 20 > random_sequences.fasta
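
If no template sequence is at hand, the composition asked about in the
original question can also be sampled directly. This is only a minimal
sketch, assuming each position is drawn independently; random_nt_string is
a made-up helper name (not a BioPerl function) and the frequencies in the
example call are arbitrary:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl: build a nucleotide string of
# $length residues, drawing each base independently. %$freq maps a base
# to its weight; the weights do not have to sum to 1.
sub random_nt_string {
    my ($length, $freq) = @_;

    # Keep only bases that can actually be drawn
    my @bases = grep { $freq->{$_} > 0 } sort keys %$freq;
    my $total = 0;
    $total += $freq->{$_} for @bases;
    die "need at least one base with a positive frequency\n" unless $total > 0;

    my $seq = '';
    for (1 .. $length) {
        my $r    = rand($total);   # uniform in [0, $total)
        my $pick = $bases[-1];     # fallback in case of rounding error
        for my $base (@bases) {
            if ($r < $freq->{$base}) { $pick = $base; last; }
            $r -= $freq->{$base};
        }
        $seq .= $pick;
    }
    return $seq;
}

# Example: 60 nt with an AT-rich composition
print random_nt_string(60, { A => 0.3, C => 0.2, G => 0.2, T => 0.3 }), "\n";
```

The returned string can be wrapped with Bio::Seq->new(-id => ..., -seq => ...)
and printed through Bio::SeqIO in the same way as in the script above.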

Bruno.

2009/6/26 Dave Messina <David.Messina at sbc.su.se>

> The Bioperl solution piggybacks on EMBOSS. See Chris Fields' comment on
> this
> post from Neil Saunders' blog:
>
> http://nsaunders.wordpress.com/2007/07/25/howto-generate-random-sequences-using-emboss-and-perl/
>
>
> You can also do this outside of BioPerl using shuffle from Sean Eddy's
> SQUID
> package, available here:
> [ SQUID ftp site ] <ftp://selab.janelia.org/pub/software/squid/>
>
> > If not - what should the namespace be for such a module should it be undone
> > and desirable?
>
>
> Perhaps add it to Bio::SeqUtils?
>
>
> Dave
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From budd at embl-heidelberg.de  Fri Jun 26 04:30:12 2009
From: budd at embl-heidelberg.de (Aidan Budd)
Date: Fri, 26 Jun 2009 10:30:12 +0200 (CEST)
Subject: [Bioperl-l] Random nucleotide string generator?
In-Reply-To: <fc2dd7b3461f.4a442425@ualr.edu>
Message-ID: <Pine.LNX.4.44.0906261028110.14978-100000@bibo.EMBL-Heidelberg.DE>

A non-BioPerl option would be to use something external like seq-gen or 
similar - tools designed for outputting "random" sequences simulated over a 
tree - one could simply sample a single simulated sequence at random from 
the output alignment.

On Fri, 26 Jun 2009, Roger Hall wrote:

> All,
>  Is there a random generator for creating nucleotides (of length l with
> composition frequencies a, c, g, and t) in there somewhere?
>  
> I noticed a thread about it from 2000 and nothing since (searching for "random sequence").
>  
> If not - what should the namespace be for such a module should it be undone and desirable? 
>  
> TIA!
>  
> Roger 
>  
>  
>  

-- 
----------------------------------------------------------------------
Aidan Budd                                    tel:+49 (0)6221 387 8530
EMBL - European Molecular Biology Laboratory  fax:+49 (0)6221 387 8517
Meyerhofstr. 1, 69117 Heidelberg, Germany

http://www.embl-heidelberg.de/~budd/
http://www-db.embl.de/jss/EmblGroupsHD/per_1807.html


From me at miguel.weapps.com  Fri Jun 26 04:52:46 2009
From: me at miguel.weapps.com (Luis Miguel Rodríguez)
Date: Fri, 26 Jun 2009 10:52:46 +0200
Subject: [Bioperl-l] Random nucleotide string generator?
In-Reply-To: <fc2dd7b3461f.4a442425@ualr.edu>
References: <fc2dd7b3461f.4a442425@ualr.edu>
Message-ID: <94da4c880906260152k3a764951u6ea8a6fdfa3b7f2c@mail.gmail.com>

Dear all, dear Roger,
I'm not sure whether such a generator exists (I think so).  Anyway, if you flag
it as "undone and desirable", please take into account the possibility of
extending the generator to dinucleotides, which is particularly useful when
working with the secondary structure of RNA molecules.

Cheers,

On Fri, Jun 26, 2009 at 8:28 AM, Roger Hall <rahall2 at ualr.edu> wrote:

> All,
>
> Is there a random generator for creating nucleotides (of length l with
> composition frequencies a, c, g, and t) in there somewhere?
>
> I noticed a thread about it from 2000 and nothing since (searching for
> "random sequence").
>
> If not - what should the namespace be for such a module should it be undone
> and desirable?
>
> TIA!
>
> Roger
>
>
>


-- 
Luis M. Rodriguez-R
[http://bioinf.uniandes.edu.co/~miguel/]
---------------------------------
Unidad de Bioinformática del Laboratorio de Micología y Fitopatología
Universidad de Los Andes, Colombia
[http://bioinf.uniandes.edu.co]


+ 57 1 3394949 ext 2619
lmrodriguezr at gmail.com
me at miguel.weapps.com


From pri2darshini at gmail.com  Fri Jun 26 06:18:55 2009
From: pri2darshini at gmail.com (priya darshini)
Date: Fri, 26 Jun 2009 15:48:55 +0530
Subject: [Bioperl-l] bioperl installation
Message-ID: <7c569a160906260318t5611fdd8nd536ae5139f5b1d4@mail.gmail.com>

Respected Sir,

I am K. Lakshmi Priya Darshini, and my specialization is M.Sc.
Bioinformatics. I am interested in learning BioPerl. My operating system is
Windows Vista. I followed the installation steps given by your team in the
BioPerl tutorial, but I am getting the error message "Begin failed".
Please help me continue with the installation. I am using Perl version 5.10.
Waiting for your reply.

Thanking you.

Regards,
Lakshmi Priya Darshini

From Jonathan.Moore at warwick.ac.uk  Fri Jun 26 05:55:54 2009
From: Jonathan.Moore at warwick.ac.uk (Moore, Jonathan)
Date: Fri, 26 Jun 2009 10:55:54 +0100
Subject: [Bioperl-l] Getting errors parsing TIGR XML in SeqIO
Message-ID: <7BEB494D4E69964C8292CE4EDA2B9811BBB946@LAUREL.ads.warwick.ac.uk>

I'm trying to parse the TAIR9 Arabidopsis release from the TIGR XML files at the TAIR FTP site.

I've tried SeqIO with both tigr and tigrxml formats but both are giving errors in 1.6.0.  Has anyone advice on whether it's likely to be doable, or should I wait til the .gb files are available?

Jay Moore


From fungazid at yahoo.com  Fri Jun 26 07:59:06 2009
From: fungazid at yahoo.com (Fungazid)
Date: Fri, 26 Jun 2009 04:59:06 -0700 (PDT)
Subject: [Bioperl-l] Bio::Assembly::IO
Message-ID: <57633.49243.qm@web65505.mail.ac4.yahoo.com>


Hello,

I received an ACE file containing a Newbler assembly of 454 cDNA reads, and a corresponding phd.ball file. I was able to view and manipulate the contigs in this assembly using Consed on Linux. Consed required ~1.5GB RAM, and the assembly was loaded within ~2 min. 
I would like to parse the assembly within my code (preferably in Perl, but not necessarily), to fetch all read sequences for each contig, nucleotide quality, alignment to consensus, etc. 
I am trying to use Bio::Assembly::IO, but it uses more than my entire RAM (3GB), and is extremely slow (~1 hour before it crashes).
Maybe you have an idea?
In addition, are you aware of other non-visual parsers of the ACE assembly format for Perl or other languages?
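
One low-memory approach (a sketch only, not a tested replacement for
Bio::Assembly::IO) is to stream the file line by line and keep just one
contig in memory at a time. The scan_ace helper and its callback interface
are hypothetical names made up for this illustration; the code relies only
on the CO (contig header) and AF (read placement) record lines of the ACE
format:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read an ACE file from $fh and call $on_contig once
# per contig with a small summary hash. Nothing is kept after the callback
# returns, so memory use is bounded by the largest single contig.
sub scan_ace {
    my ($fh, $on_contig) = @_;
    my $contig;

    while (my $line = <$fh>) {
        # CO <name> <padded bases> <reads> <base segments> <U|C>
        if ($line =~ /^CO\s+(\S+)\s+(\d+)\s+(\d+)/) {
            $on_contig->($contig) if $contig;   # flush the previous contig
            $contig = {
                name           => $1,
                padded_length  => $2,
                declared_reads => $3,
                reads          => [],
            };
        }
        # AF <read name> <U|C> <padded start position in the consensus>
        elsif ($contig && $line =~ /^AF\s+(\S+)\s+([UC])\s+(-?\d+)/) {
            push @{ $contig->{reads} },
                { name => $1, strand => $2, start => $3 };
        }
    }
    $on_contig->($contig) if $contig;           # flush the last contig
    return;
}

# Example: print one summary line per contig for the file given in @ARGV
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!\n";
    scan_ace($fh, sub {
        my ($c) = @_;
        printf "%s\t%d bp\t%d reads\n",
            $c->{name}, $c->{padded_length}, scalar @{ $c->{reads} };
    });
    close $fh;
}
```

RD and QA records could be handled with further branches in the same loop
to collect read sequences and quality clipping coordinates.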

Many thanks,
funazid   


From cjfields at illinois.edu  Fri Jun 26 13:00:41 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 26 Jun 2009 12:00:41 -0500
Subject: [Bioperl-l] Getting errors parsing TIGR XML in SeqIO
In-Reply-To: <7BEB494D4E69964C8292CE4EDA2B9811BBB946@LAUREL.ads.warwick.ac.uk>
References: <7BEB494D4E69964C8292CE4EDA2B9811BBB946@LAUREL.ads.warwick.ac.uk>
Message-ID: <FEC1932A-49FE-4E63-9727-F08520FF0252@illinois.edu>

If there are errors this should be submitted as a bug.  You should  
attach example data to the report after filing it (i.e. don't copy&paste  
into the text box).

http://www.bioperl.org/wiki/Bugs

chris

On Jun 26, 2009, at 4:55 AM, Moore, Jonathan wrote:

> I'm trying to parse the TAIR9 Arabidopsis release from the TIGR XML  
> files at the TAIR FTP site.
>
> I've tried SeqIO with both tigr and tigrxml formats but both are  
> giving errors in 1.6.0.  Has anyone advice on whether it's likely to  
> be doable, or should I wait til the .gb files are available?
>
> Jay Moore
>


From plantboy at gmail.com  Fri Jun 26 14:46:35 2009
From: plantboy at gmail.com (cody h)
Date: Fri, 26 Jun 2009 11:46:35 -0700
Subject: [Bioperl-l] test suite failing on mac os x 10.5
Message-ID: <320708320906261146v2e799c82mc1b921218fc233c5@mail.gmail.com>

Hi,

I'm trying to install bioperl-db 1.5.2 on an intel mac running os 10.5.7.
The Build.PL file executes fine, but the test suite fails dramatically,
returning the error "No database selected" for many of the tests. All the
error calls seem to be originating from line 852 in
BasePersistenceAdaptor.pm. I took a look at the code but I could not figure
out why it wasn't working.

I have bioperl 1.5.2 installed and the biosql schema loaded into my mysql
server. The dependencies all seem to be working, but I haven't used them
enough to completely verify this, so that could be part of the problem. I
don't know which ones to check though. Does anyone have any idea why I might
be getting these "No database selected" errors? Here is a sample of the
error messages given by the ./Build test command (note: this same error is
generated by 15 of the 16 test files):

t/12ontology.t .... 1/738
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: error while executing statement in
Bio::DB::BioSQL::OntologyAdaptor::find_by_unique_key: No database selected
STACK: Error::throw
STACK: Bio::Root::Root::throw /Library/Perl/5.8.8/Bio/Root/Root.pm:359
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key
/Users/cody/Desktop/bioperl-db-1.5.2_100/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:948
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key
/Users/cody/Desktop/bioperl-db-1.5.2_100/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:852
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create
/Users/cody/Desktop/bioperl-db-1.5.2_100/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:182
STACK: Bio::DB::Persistent::PersistentObject::create
/Users/cody/Desktop/bioperl-db-1.5.2_100/blib/lib/Bio/DB/Persistent/PersistentObject.pm:244
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create
/Users/cody/Desktop/bioperl-db-1.5.2_100/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:169
STACK: Bio::DB::Persistent::PersistentObject::create
/Users/cody/Desktop/bioperl-db-1.5.2_100/blib/lib/Bio/DB/Persistent/PersistentObject.pm:244
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create
/Users/cody/Desktop/bioperl-db-1.5.2_100/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:169
STACK: Bio::DB::Persistent::PersistentObject::create
/Users/cody/Desktop/bioperl-db-1.5.2_100/blib/lib/Bio/DB/Persistent/PersistentObject.pm:244
STACK: t/12ontology.t:44
-----------------------------------------------------------
t/12ontology.t .... Dubious, test returned 255 (wstat 65280, 0xff00)

From maj at fortinbras.us  Fri Jun 26 14:50:02 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 26 Jun 2009 14:50:02 -0400
Subject: [Bioperl-l] Fw: Inquiry about a prog written by [MAJ]
Message-ID: <0581B2DAE8514F418127D54407384905@NewLife>

Thought this should be archived to the list. 
MAJ

----- Original Message ----- 
From: Mark A. Jensen 
To: Ross KK Leung 
Sent: Thursday, June 25, 2009 8:46 AM
Subject: Re: Inquiry about a prog written by you


Hi Ross-
Yes, you can specify the recombinants, as "A/C/G[subtype]" in the query string. Unfortunately, the 10000 record limit is imposed by the Los Alamos site that my program accesses. You might be able to work around this if you're willing to write your own script using the BioPerl modules that are the basis for hivq.PLS, by using the modules to perform multiple queries and collecting the entire set of sequences over that series of queries. 
You might look at the documentation for the modules for ideas; try looking at http://www.bioperl.org/wiki/Module:Bio::DB::HIV and http://www.bioperl.org/wiki/Module:Bio::DB::Query::HIVQuery . 
best regards- 
Mark
  ----- Original Message ----- 
  From: Ross KK Leung 
  To: maj at fortinbras.us 
  Sent: Thursday, June 25, 2009 6:09 AM
  Subject: Inquiry about a prog written by you


  Dear Mark A. Jensen,

   
  A google search returns your program (http://code.open-bio.org/svnweb/index.cgi/bioperl/view/bioperl-live/trunk/scripts/DB-HIV/hivq.PLS)

   
  I wonder whether the program is able to search recombinants (e.g. B incl. recombinants) and retrieve more than 50000 records. This limit is a bottleneck of the web-based search.

   
  Thanks for your advice, Ross


From rmb32 at cornell.edu  Fri Jun 26 17:06:06 2009
From: rmb32 at cornell.edu (Robert Buels)
Date: Fri, 26 Jun 2009 14:06:06 -0700
Subject: [Bioperl-l] BioPerl at YAPC::2010
In-Reply-To: <33722D87-C60D-4040-AA58-D8D2DCBD5448@jays.net>
References: <OF04ACE416.A66212B4-ON852575E1.0068C346-852575E1.006A8246@lnotes-gw.ent.nwie.net>
	<33722D87-C60D-4040-AA58-D8D2DCBD5448@jays.net>
Message-ID: <4A45383E.40207@cornell.edu>

Reposting to bioperl list.

This is a really giant opportunity to expose some of the best 
technologists in the world to what we do in bioinformatics, and possibly 
to entice some of them to help us the heck out!  ;-)

Rob

On Jun 26, 2009, at 3:22 PM, BAIRH at nationwide.com wrote:
> I am the Columbus.PM YAPC::2010 conference coordinator and I would 
> like to have a "BioPerl" thread at YAPC::NA::2010 at Ohio State 
> University.  Can you offer any lecturer recommendations and could I 
> fill an entire multi day thread with BioPerl lectures?  I would also 
> like to "entice" MJD to come to YAPC with the use of BioPerl.
>
> Thanks for your thoughts.
>
> Heath Bair
> (Candybar)

-- 
Robert Buels
Bioinformatics Analyst, Sol Genomics Network
Boyce Thompson Institute for Plant Research
Tower Rd
Ithaca, NY  14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu

From cain.cshl at gmail.com  Fri Jun 26 17:12:37 2009
From: cain.cshl at gmail.com (Scott Cain)
Date: Fri, 26 Jun 2009 17:12:37 -0400
Subject: [Bioperl-l] BioPerl at YAPC::2010
In-Reply-To: <4A45383E.40207@cornell.edu>
References: <OF04ACE416.A66212B4-ON852575E1.0068C346-852575E1.006A8246@lnotes-gw.ent.nwie.net>
	<33722D87-C60D-4040-AA58-D8D2DCBD5448@jays.net>
	<4A45383E.40207@cornell.edu>
Message-ID: <D2A53AB2-E35A-499B-B81A-13B9D61752CA@gmail.com>

Cool--Columbus is just down the road.  I could give a talk (or even  
multiple talks) on a variety of GMOD topics (which I consider BioPerl  
related, since so much of what we do depends on BioPerl).

Scott

On Jun 26, 2009, at 5:06 PM, Robert Buels wrote:

> Reposting to bioperl list.
>
> This is a really giant opportunity to expose some of the best  
> technologists in the world to what we do in bioinformatics, and  
> possibly to entice some of them to help us the heck out!  ;-)
>
> Rob
>
> On Jun 26, 2009, at 3:22 PM, BAIRH at nationwide.com wrote:
>> I am the Columbus.PM YAPC::2010 conference coordinator and I would  
>> like to have a "BioPerl" thread at YAPC::NA::2010 at Ohio State  
>> University.  Can you offer any lecturer recommendations and could I  
>> fill an entire multi day thread with BioPerl lectures?  I would  
>> also like to "entice" MJD to come to YAPC with the use of BioPerl.
>>
>> Thanks for your thoughts.
>>
>> Heath Bair
>> (Candybar)
>
> -- 
> Robert Buels
> Bioinformatics Analyst, Sol Genomics Network
> Boyce Thompson Institute for Plant Research
> Tower Rd
> Ithaca, NY  14853
> Tel: 503-889-8539
> rmb32 at cornell.edu
> http://www.sgn.cornell.edu

-----------------------------------------------------------------------
Scott Cain, Ph. D. scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research


From cjfields at illinois.edu  Fri Jun 26 17:49:39 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 26 Jun 2009 16:49:39 -0500
Subject: [Bioperl-l] BioPerl at YAPC::2010
In-Reply-To: <4A45383E.40207@cornell.edu>
References: <OF04ACE416.A66212B4-ON852575E1.0068C346-852575E1.006A8246@lnotes-gw.ent.nwie.net>
	<33722D87-C60D-4040-AA58-D8D2DCBD5448@jays.net>
	<4A45383E.40207@cornell.edu>
Message-ID: <642C6C93-8FCD-4463-8A39-E15832F8714C@illinois.edu>

Well, if it's in Columbus I'll be there (I can make a drive out of it).

In short, we should probably get something going, yes. Lots of things  
we can talk about, inc. bioperl6, Bio::Moose, etc.

chris

On Jun 26, 2009, at 4:06 PM, Robert Buels wrote:

> Reposting to bioperl list.
>
> This is a really giant opportunity to expose some of the best  
> technologists in the world to what we do in bioinformatics, and  
> possibly to entice some of them to help us the heck out!  ;-)
>
> Rob
>
> On Jun 26, 2009, at 3:22 PM, BAIRH at nationwide.com wrote:
>> I am the Columbus.PM YAPC::2010 conference coordinator and I would  
>> like to have a "BioPerl" thread at YAPC::NA::2010 at Ohio State  
>> University.  Can you offer any lecturer recommendations and could I  
>> fill an entire multi day thread with BioPerl lectures?  I would  
>> also like to "entice" MJD to come to YAPC with the use of BioPerl.
>>
>> Thanks for your thoughts.
>>
>> Heath Bair
>> (Candybar)
>
> -- 
> Robert Buels
> Bioinformatics Analyst, Sol Genomics Network
> Boyce Thompson Institute for Plant Research
> Tower Rd
> Ithaca, NY  14853
> Tel: 503-889-8539
> rmb32 at cornell.edu
> http://www.sgn.cornell.edu


From hartzell at alerce.com  Fri Jun 26 23:59:10 2009
From: hartzell at alerce.com (George Hartzell)
Date: Fri, 26 Jun 2009 20:59:10 -0700
Subject: [Bioperl-l] BioPerl at YAPC::2010
In-Reply-To: <4A45383E.40207@cornell.edu>
References: <OF04ACE416.A66212B4-ON852575E1.0068C346-852575E1.006A8246@lnotes-gw.ent.nwie.net>
	<33722D87-C60D-4040-AA58-D8D2DCBD5448@jays.net>
	<4A45383E.40207@cornell.edu>
Message-ID: <19013.39182.97468.604560@already.dhcp.gene.com>


This does seem like a great opportunity.  I think you/the-community
could put together at least a day, and maybe more, of Bio and Perl
stuff.  I think that it's important to range beyond the stuff that's
in the BioPerl namespace and pull in something from the Gene Ontology
project, the Ensembl project[s], maybe libbio, etc....

g.

Robert Buels writes:
 > Reposting to bioperl list.
 > 
 > This is a really giant opportunity to expose some of the best 
 > technologists in the world to what we do in bioinformatics, and possibly 
 > to entice some of them to help us the heck out!  ;-)
 > 
 > Rob
 > 
 > On Jun 26, 2009, at 3:22 PM, BAIRH at nationwide.com wrote:
 > > I am the Columbus.PM YAPC::2010 conference coordinator and I would 
 > > like to have a "BioPerl" thread at YAPC::NA::2010 at Ohio State 
 > > University.  Can you offer any lecturer recommendations and could I 
 > > fill an entire multi day thread with BioPerl lectures?  I would also 
 > > like to "entice" MJD to come to YAPC with the use of BioPerl.
 > >
 > > Thanks for your thoughts.
 > >
 > > Heath Bair
 > > (Candybar)
 > 
 > -- 
 > Robert Buels
 > Bioinformatics Analyst, Sol Genomics Network
 > Boyce Thompson Institute for Plant Research
 > Tower Rd
 > Ithaca, NY  14853
 > Tel: 503-889-8539
 > rmb32 at cornell.edu
 > http://www.sgn.cornell.edu

From cjfields at illinois.edu  Sat Jun 27 00:28:14 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 26 Jun 2009 23:28:14 -0500
Subject: [Bioperl-l] BioPerl at YAPC::2010
In-Reply-To: <19013.39182.97468.604560@already.dhcp.gene.com>
References: <OF04ACE416.A66212B4-ON852575E1.0068C346-852575E1.006A8246@lnotes-gw.ent.nwie.net>
	<33722D87-C60D-4040-AA58-D8D2DCBD5448@jays.net>
	<4A45383E.40207@cornell.edu>
	<19013.39182.97468.604560@already.dhcp.gene.com>
Message-ID: <EB3EB763-05F4-4F75-88F5-8A642E567ABA@illinois.edu>

Agree (and should add GMOD/Gbrowse to that as well).

chris

On Jun 26, 2009, at 10:59 PM, George Hartzell wrote:

>
> This does seems like a great opportunity.  I think you/the-community
> could put together at least a day, and maybe more, of Bio and Perl
> stuff.  I think that it's important to range beyond the stuff that's
> in the BioPerl namespace and pull in something from the Gene Ontology
> project, the Ensembl project[s], maybe libbio, etc....
>
> g.
>
> Robert Buels writes:
>> Reposting to bioperl list.
>>
>> This is a really giant opportunity to expose some of the best
>> technologists in the world to what we do in bioinformatics, and  
>> possibly
>> to entice some of them to help us the heck out!  ;-)
>>
>> Rob
>>
>> On Jun 26, 2009, at 3:22 PM, BAIRH at nationwide.com wrote:
>>> I am the Columbus.PM YAPC::2010 conference coordinator and I would
>>> like to have a "BioPerl" thread at YAPC::NA::2010 at Ohio State
>>> University.  Can you offer any lecturer recommendations and could I
>>> fill an entire multi day thread with BioPerl lectures?  I would also
>>> like to "entice" MJD to come to YAPC with the use of BioPerl.
>>>
>>> Thanks for your thoughts.
>>>
>>> Heath Bair
>>> (Candybar)
>>
>> -- 
>> Robert Buels
>> Bioinformatics Analyst, Sol Genomics Network
>> Boyce Thompson Institute for Plant Research
>> Tower Rd
>> Ithaca, NY  14853
>> Tel: 503-889-8539
>> rmb32 at cornell.edu
>> http://www.sgn.cornell.edu


From maj at fortinbras.us  Sat Jun 27 00:56:41 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Sat, 27 Jun 2009 00:56:41 -0400
Subject: [Bioperl-l] BioPerl at YAPC::2010
In-Reply-To: <4A45383E.40207@cornell.edu>
References: <OF04ACE416.A66212B4-ON852575E1.0068C346-852575E1.006A8246@lnotes-gw.ent.nwie.net><33722D87-C60D-4040-AA58-D8D2DCBD5448@jays.net>
	<4A45383E.40207@cornell.edu>
Message-ID: <E6D907E51B8D477FBB635ED4B500C257@NewLife>

I think BioPerl has enough to talk about to have its own conference, 
which would coincide with its 15th anniversary in 2010. That may 
put the kibosh on the original  intent of the inviter, which ultimately is 
to get The Dominus to bite (and more power to her, I say. My 
programming style is forever changed, and I haven't even finished
The Book). 

If someone organizes it, I'll bring the chips and dip.
MAJ
----- Original Message ----- 
From: "Robert Buels" <rmb32 at cornell.edu>
To: "BioPerl List" <bioperl-l at lists.open-bio.org>
Cc: <BAIRH at nationwide.com>
Sent: Friday, June 26, 2009 5:06 PM
Subject: Re: [Bioperl-l] BioPerl at YAPC::2010


> Reposting to bioperl list.
> 
> This is a really giant opportunity to expose some of the best 
> technologists in the world to what we do in bioinformatics, and possibly 
> to entice some of them to help us the heck out!  ;-)
> 
> Rob
> 
> On Jun 26, 2009, at 3:22 PM, BAIRH at nationwide.com wrote:
>> I am the Columbus.PM YAPC::2010 conference coordinator and I would 
>> like to have a "BioPerl" thread at YAPC::NA::2010 at Ohio State 
>> University.  Can you offer any lecturer recommendations and could I 
>> fill an entire multi day thread with BioPerl lectures?  I would also 
>> like to "entice" MJD to come to YAPC with the use of BioPerl.
>>
>> Thanks for your thoughts.
>>
>> Heath Bair
>> (Candybar)
> 
> -- 
> Robert Buels
> Bioinformatics Analyst, Sol Genomics Network
> Boyce Thompson Institute for Plant Research
> Tower Rd
> Ithaca, NY  14853
> Tel: 503-889-8539
> rmb32 at cornell.edu
> http://www.sgn.cornell.edu

From maj at fortinbras.us  Sat Jun 27 01:30:34 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Sat, 27 Jun 2009 01:30:34 -0400
Subject: [Bioperl-l] BioPerl at YAPC::2010
In-Reply-To: <E6D907E51B8D477FBB635ED4B500C257@NewLife>
References: <OF04ACE416.A66212B4-ON852575E1.0068C346-852575E1.006A8246@lnotes-gw.ent.nwie.net><33722D87-C60D-4040-AA58-D8D2DCBD5448@jays.net><4A45383E.40207@cornell.edu>
	<E6D907E51B8D477FBB635ED4B500C257@NewLife>
Message-ID: <B44649FB157145A3BE7153D163802926@NewLife>

[...to *him*, that is...pardon]

----- Original Message ----- 
From: "Mark A. Jensen" <maj at fortinbras.us>
To: "Robert Buels" <rmb32 at cornell.edu>; "BioPerl List" 
<bioperl-l at lists.open-bio.org>
Sent: Saturday, June 27, 2009 12:56 AM
Subject: Re: [Bioperl-l] BioPerl at YAPC::2010


>I think BioPerl has enough to talk about to have its own conference, which 
>would coincide with its 15th anniversary in 2010. That may put the kibosh on 
>the original  intent of the inviter, which ultimately is to get The Dominus to 
>bite (and more power to her, I say. My programming style is forever changed, 
>and I haven't even finished
> The Book).
> If someone organizes it, I'll bring the chips and dip.
> MAJ
> ----- Original Message ----- 
> From: "Robert Buels" <rmb32 at cornell.edu>
> To: "BioPerl List" <bioperl-l at lists.open-bio.org>
> Cc: <BAIRH at nationwide.com>
> Sent: Friday, June 26, 2009 5:06 PM
> Subject: Re: [Bioperl-l] BioPerl at YAPC::2010
>
>
>> Reposting to bioperl list.
>>
>> This is a really giant opportunity to expose some of the best technologists 
>> in the world to what we do in bioinformatics, and possibly to entice some of 
>> them to help us the heck out!  ;-)
>>
>> Rob
>>
>> On Jun 26, 2009, at 3:22 PM, BAIRH at nationwide.com wrote:
>>> I am the Columbus.PM YAPC::2010 conference coordinator and I would like to 
>>> have a "BioPerl" thread at YAPC::NA::2010 at Ohio State University.  Can you 
>>> offer any lecturer recommendations and could I fill an entire multi day 
>>> thread with BioPerl lectures?  I would also like to "entice" MJD to come to 
>>> YAPC with the use of BioPerl.
>>>
>>> Thanks for your thoughts.
>>>
>>> Heath Bair
>>> (Candybar)
>>
>> -- 
>> Robert Buels
>> Bioinformatics Analyst, Sol Genomics Network
>> Boyce Thompson Institute for Plant Research
>> Tower Rd
>> Ithaca, NY  14853
>> Tel: 503-889-8539
>> rmb32 at cornell.edu
>> http://www.sgn.cornell.edu


From kpclancy at hotmail.com  Sat Jun 27 06:04:20 2009
From: kpclancy at hotmail.com (Kevin Clancy)
Date: Sat, 27 Jun 2009 04:04:20 -0600
Subject: [Bioperl-l] Fwd: Hackathon tomorrow (I think)
In-Reply-To: <02C22CF3-4AAD-4B4D-8F54-2999998A34E0@illinois.edu>
References: <8F66D28E-CD28-4A8D-83B2-0FFCA611F58F@jays.net>
	<54655E2F-0D4B-4F4D-8608-13BB052FE201@jays.net>
	<1a0c1b750906240913g5e113dadxe3d98d3fce3a74d@mail.gmail.com>
	<F01E3C47-6651-488C-91F6-73F42A5A65F4@illinois.edu>
	<FEF61DB9-34A6-4884-BCAF-1CAC83E809F1@jays.net>
	<20E946BE-D9E1-4815-991A-7817E0655879@illinois.edu>
	<COL107-W17352BCFE4B6CE766EF2C3CE340@phx.gbl>
	<02C22CF3-4AAD-4B4D-8F54-2999998A34E0@illinois.edu>
Message-ID: <COL107-W978FB7B4A3E98561F84E5CE320@phx.gbl>


I think ISMB will be in Boston in 2010 (feels odd just typing that...)

maybe that is enough of a running start to set something up.

kevin
 
> CC: jay at jays.net; vecchi.b at gmail.com; bioperl-l at bioperl.org
> From: cjfields at illinois.edu
> To: kpclancy at hotmail.com
> Subject: Re: [Bioperl-l] Fwd: Hackathon tomorrow (I think)
> Date: Wed, 24 Jun 2009 22:54:28 -0500
> 
> I have no idea; I don't think there are many bioperl devs attending 
> this year unfortunately. Any meetings in the next year where we could 
> set up a bioperl hackathon? I will likely be available to attend if 
> it's stateside...
> 
> chris
> 
> On Jun 24, 2009, at 9:31 PM, Kevin Clancy wrote:
> 
> >
> > is there an intention to have a hackathon at ISMB this weekend - I 
> > know there is a 2 day BOSC
> > kevin
> >
> >> From: cjfields at illinois.edu
> >> To: jay at jays.net
> >> Date: Wed, 24 Jun 2009 16:10:34 -0500
> >> CC: vecchi.b at gmail.com; bioperl-l at bioperl.org
> >> Subject: Re: [Bioperl-l] Fwd: Hackathon tomorrow (I think)
> >>
> >>
> >> On Jun 24, 2009, at 11:44 AM, Jay Hannah wrote:
> >>
> >>> On Jun 24, 2009, at 12:26 PM, Chris Fields wrote:
> >>>> Let me know if anyone needs collab on biomoose on github; Mark
> >>>> Jensen's already added.
> >>>
> >>> Anything on github should be trivial, even with no perms -- we can
> >>> just fork and then send you (whoever) pull requests. github++ :)
> >>>
> >>>> 1) Any help towards bugzilla fixes would be most welcome.
> >>>
> >>> I don't know how to make any progress in bugzilla if no one has a
> >>> commit bit...?
> >>
> >> For some reason I thought you had a commit bit; we can add you in if
> >> needed. Anyway, patches are most definitely welcome ;>
> >>
> >>>> 2) Better GFF3 integration
> >>>> 3) Typed but lightweight seqfeatures
> >>>
> >>> Are there bugzilla tickets (or somewhere) describing those?
> >>
> >> No as the issues are more complex than one single bug, but we do have
> >> something to help track for the time being:
> >>
> >> http://www.bioperl.org/wiki/GFF_Refactor
> >> http://www.bioperl.org/wiki/Align_Refactor
> >>
> >> I'll probably file TODOs during the process for those refactors. The
> >> easiest to tackle would be probably be Align/LocatableSeq refactors.
> >>
> >>> I wonder if anyone can help me get out of sporadic MailMan
> >>> purgatory...
> >>>
> >>> Thanks,
> >>>
> >>> j
> >>
> >> -c
> >>
> >> PS - Don't feel constrained by the above. There are many many areas
> >> to contribute to.
> >>
> >>


From hartzell at alerce.com  Sat Jun 27 13:08:10 2009
From: hartzell at alerce.com (George Hartzell)
Date: Sat, 27 Jun 2009 10:08:10 -0700
Subject: [Bioperl-l] BioPerl at YAPC::2010
In-Reply-To: <E6D907E51B8D477FBB635ED4B500C257@NewLife>
References: <OF04ACE416.A66212B4-ON852575E1.0068C346-852575E1.006A8246@lnotes-gw.ent.nwie.net>
	<33722D87-C60D-4040-AA58-D8D2DCBD5448@jays.net>
	<4A45383E.40207@cornell.edu>
	<E6D907E51B8D477FBB635ED4B500C257@NewLife>
Message-ID: <19014.20986.867646.940277@already.dhcp.gene.com>


I had an eye-opening time at YAPC, and I think that it would be very
powerful to have many members of the Bio & Perl community rubbing
elbows with the folks leading (and following, for that matter) the
"Modern Perl" movement (in the broader sense, not _just_ chromatic):
Moose, DBIx::Class, Dist::Zilla, KiokuDB, etc....  I think that it
would help pull BioPerl and the others towards powerful mainstream
technologies and expose many of us to new people, tricks, and tools.
Having us off on our own, or mingling with ISMB'ers, doesn't really
stir the pot.

g.


Mark A. Jensen writes:
 > I think BioPerl has enough to talk about to have its own conference, 
 > which would coincide with its 15th anniversary in 2010. That may 
 > put the kibosh on the original  intent of the inviter, which ultimately is 
 > to get The Dominus to bite (and more power to her, I say. My 
 > programming style is forever changed, and I haven't even finished
 > The Book). 
 > 
 > If someone organizes it, I'll bring the chips and dip.
 > MAJ
 > ----- Original Message ----- 
 > From: "Robert Buels" <rmb32 at cornell.edu>
 > To: "BioPerl List" <bioperl-l at lists.open-bio.org>
 > Cc: <BAIRH at nationwide.com>
 > Sent: Friday, June 26, 2009 5:06 PM
 > Subject: Re: [Bioperl-l] BioPerl at YAPC::2010
 > 
 > 
 > > Reposting to bioperl list.
 > > 
 > > This is a really giant opportunity to expose some of the best 
 > > technologists in the world to what we do in bioinformatics, and possibly 
 > > to entice some of them to help us the heck out!  ;-)
 > > 
 > > Rob
 > > 
 > > On Jun 26, 2009, at 3:22 PM, BAIRH at nationwide.com wrote:
 > >> I am the Columbus.PM YAPC::2010 conference coordinator and I would 
 > >> like to have a "BioPerl" thread at YAPC::NA::2010 at Ohio State 
 > >> University.  Can you offer any lecturer recommendations and could I 
 > >> fill an entire multi day thread with BioPerl lectures?  I would also 
 > >> like to "entice" MJD to come to YAPC with the use of BioPerl.
 > >>
 > >> Thanks for your thoughts.
 > >>
 > >> Heath Bair
 > >> (Candybar)
 > > 
 > > -- 
 > > Robert Buels
 > > Bioinformatics Analyst, Sol Genomics Network
 > > Boyce Thompson Institute for Plant Research
 > > Tower Rd
 > > Ithaca, NY  14853
 > > Tel: 503-889-8539
 > > rmb32 at cornell.edu
 > > http://www.sgn.cornell.edu

From richard.harrison at edinburgh.ac.uk  Mon Jun 29 18:43:54 2009
From: richard.harrison at edinburgh.ac.uk (Richard Harrison)
Date: Mon, 29 Jun 2009 23:43:54 +0100
Subject: [Bioperl-l] PopGen
Message-ID: <5FBB6056-386D-42E3-8236-1FEB8F5BE520@edinburgh.ac.uk>

Dear all,

I am having trouble with the PopGen modules and I was wondering if  
anyone had any ideas.

I am working with polymorphism data. I am trying to identify the  
derived vs ancestral allele between two species. I have been modifying  
the modules a bit to include different site models etc.  Here is where  
I fall over:

Within aln_to_population I can create a modified Genotype object to  
include details of the ancestral allele (see at end of this post).

However,  the problem that I have hit upon is that aln_to_population  
returns a population object, filled with IndividualI objects.  In  
other words, it takes my array of GenotypeI objects and converts them  
into IndividualI objects, wrapped in a single Population object.  This  
means that the information in the GenotypeI object about the ancestral/ 
derived states is lost. How can I overcome this?


Thanks,
Richard


###excerpt from aln_to_population


  $inds[$i]->add_Genotype(Bio::PopGen::Genotype->new
					   (-marker_name  => $nm,
					    -individual_id=> $inds[$i]->unique_id,
					    -alleles      => [$genotypes[$i]],
					    -outgroup      => $outgroup[0]));


###excerpt from Genotypes.pm

sub new {
   my($class,@args) = @_;

   my $self = $class->SUPER::new(@args);
   my ($name,$desc,$type,$uid,$af,$og) = $self->_rearrange([qw(NAME
							  DESCRIPTION
							  TYPE
							  UNIQUE_ID
							  ALLELE_FREQ
							  OUTGROUP)],@args);
   $self->{'_allele_freqs'} = {};
   $self->{'_outgroup_name'} = {};

   if( ! defined $uid ) {
       $uid = $UniqueCounter++;
   }
   if( defined $name) {
       $self->name($name);
   } else {
       $self->throw("Must provide a name when initializing a Marker");
   }
   defined $desc && $self->description($desc);
   defined $type && $self->type($type);


       $self->outgroup_name($og);


   $self->unique_id($uid);

   return $self;
}

=head2 outgroup_name

  Title   : outgroup_name
  Usage   : my $og = $marker->outgroup_name();
  Function: Get/set the name of the outgroup
  Returns : string representing the name of the outgroup
  Args    : [optional] outgroup name

=cut

sub outgroup_name{
     my $self = shift;

     return $self->{'_outgroup_name'} = shift if @_;
     return $self->{'_outgroup_name'};
}


-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.


From jason at bioperl.org  Tue Jun 30 01:03:08 2009
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 29 Jun 2009 22:03:08 -0700
Subject: [Bioperl-l] Getting errors parsing TIGR XML in SeqIO
In-Reply-To: <7BEB494D4E69964C8292CE4EDA2B9811BBB946@LAUREL.ads.warwick.ac.uk>
References: <7BEB494D4E69964C8292CE4EDA2B9811BBB946@LAUREL.ads.warwick.ac.uk>
Message-ID: <E6D82027-AF55-4E64-BC8F-71F3F60D0E7E@bioperl.org>

There are several flavors of TIGR XML for rice, Arabidopsis, and
other projects. Unfortunately I don't know which one the current
tigrxml parser tracks, but you can compare the test files in t/data
with the files you downloaded to see what is currently supported.
Usually the GenBank (gbk) files will be more consistently parseable,
but we can try to work it out if it is a sensible transformation.

On Jun 26, 2009, at 2:55 AM, Moore, Jonathan wrote:

> I'm trying to parse the TAIR9 Arabidopsis release from the TIGR XML  
> files at the TAIR FTP site.
>
> I've tried SeqIO with both tigr and tigrxml formats but both are  
> giving errors in 1.6.0.  Has anyone advice on whether it's likely to  
> be doable, or should I wait til the .gb files are available?
>
> Jay Moore

--
Jason Stajich
jason at bioperl.org


From paola.bisignano at gmail.com  Tue Jun 30 05:12:49 2009
From: paola.bisignano at gmail.com (Paola Bisignano)
Date: Tue, 30 Jun 2009 11:12:49 +0200
Subject: [Bioperl-l] Bioperl-l Digest, Vol 74, Issue 25
In-Reply-To: <mailman.29.1246204805.13888.bioperl-l@lists.open-bio.org>
References: <mailman.29.1246204805.13888.bioperl-l@lists.open-bio.org>
Message-ID: <e9cf89740906300212jeac3fe6o2fa4414bb427f824@mail.gmail.com>

Hi,
I need a little help parsing a file. I searched the BioPerl modules,
but there are so many that I don't know where to start. I found
modules for many databases and web sites, but not for my favorite,
PDBsum, so I have parsed a lot of things on my own, even though I am
new to Perl. Now I need help, because I need to parse a FASTA file
produced by a sequence alignment: I need to extract the aligned
sequences, but only for the PDB entries in my list.


My FASTA file looks like this:

Query: /ebi/research/thornton/tmp/sas307986/seq.fasta
  1>>>Sequence 3e7e:A - 333 aa
Library: /ebi/research/thornton/www/databases/html/pdbsum/data/pdblib
17840403 residues in 79353 sequences

       opt      E()
< 20   286     0:===
  22     1     0:=          one = represents 135 library sequences
  24     1     0:=
  26     0     2:*
  28    21    18:*
  30    36   109:*
  32   237   421:== *
  34   956  1140:========*
  36  1924  2342:===============  *
  38  3591  3871:=========================== *
  40  4904  5400:=====================================  *
  42  6750  6600:================================================*=
  44  7145  7281:=====================================================*
  46  8047  7416:======================================================*=====
.........

>>2np8:A                                                  (159 aa)
 initn: 125 init1:  72 opt: 136  Z-score: 168.6  bits: 38.5 E(): 0.011
Smith-Waterman score: 136; 26.0% identity (57.1% similar) in 154 aa
overlap (59-204:13-153)

               10        20        30        40        50        60
Sequen NFIVGNPWDDKLIFKLLSGLSKPVSSYPNTFEWQCKLPAIKPKTEFQLGSKLVYVHHLLG
                                                                 ::
2np8:A                                               QWALEDFEIGRPLG
                                                             10

               70          80        90         100        110
Sequen EGAFAQVYEATQNKQKFVL--KVQKPANPWEFYIGTQLMER--LKPSMQH-MFMKFYSAH
       .: :..:: : ....::.:  ::   :.  .  .  :: ..  ..  ..:  ....:.
2np8:A KGKFGNVYLAREKQSKFILALKVLFKAQLEKAGVEHQLRREVEIQSHLRHPNILRLYG--
           20        30        40        50        60        70

         120         130       140       150       160       170
Sequen LFQNGS--VLVGELYSYGTLLNAINLYKNTPEKVMPQGLVISFAMRMLYMIEQVHDCEII
        :....   :. :    ::.   ..  ..  :.      . ..  ..   .   :. ..:
2np8:A YFHDATRVYLILEYAPLGTVYRELQKLSKFDEQR-----TATYITELANALSYCHSKRVI
             80        90       100            110       120

           180       190        200       210       220       230
Sequen HGDIKPDNFILGNGFLEQSAG-LALIDLGQSIDMKLFPKGTIFTAKCETSGFQCVEMLSN
       : ::::.:..::      ::: : . :.: :.
2np8:A HRDIKPENLLLG------SAGELKIADFGWSVHAPSSR
       130             140       150

            240       250       260       270       280       290
Sequen KPWNYQIDYFGVAATVYCMLFGTYMKVKNEGGECKPEGLFRRLPHLDMWNEFFHVMLNIP

            300       310       320       330
Sequen DCHHLPSLDLLRQKLKKVFQQHYTNKIRALRNRLIVLLLEC

>>2ojg:A                                                  (337 aa)
 initn:  85 init1:  53 opt: 140  Z-score: 168.1  bits: 39.5 E(): 0.012
Smith-Waterman score: 140; 20.3% identity (56.2% similar) in 217 aa
overlap (46-252:1-204)

               10        20        30        40        50        60
Sequen NFIVGNPWDDKLIFKLLSGLSKPVSSYPNTFEWQCKLPAIKPKTEFQLGSKLVYVHHLLG
                                                    :..: . . . .. :
2ojg:A                                              FDVGPRYTNLSYI-G
                                                            10

               70        80        90        100       110
Sequen EGAFAQVYEATQNKQKFVLKVQKPANPWEFYIGTQ-LMERLKPSMQHMFMKFYSAHLFQN
       :::...:  : .: .:  . ..:  .:.:     :  ....:     ....:   ...
2ojg:A EGAYGMVCSAYDNVNKVRVAIKK-ISPFEHQTYCQRTLREIK-----ILLRFRHENIIGI
           20        30         40        50             60

     120              130       140       150       160       170
Sequen GSVL-------VGELYSYGTLLNAINLYKNTPEKVMPQGLVISFAMRMLYMIEQVHDCEI
       ....       . ..:    :... .:::    . . .  .  : ...:  .. .:. ..
2ojg:A NDIIRAPTIEQMKDVYIVQDLMET-DLYKLLKTQHLSNDHICYFLYQILRGLKYIHSANV
       70        80        90        100       110       120

            180       190       200        210       220        230
Sequen IHGDIKPDNFILGNGFLEQSAGLALIDLGQS-IDMKLFPKGTIFTAKCETSGFQCVE-ML
       .: :.::.:..:..     .  : . :.: . .      .  ..:    :  ..  : ::
2ojg:A LHRDLKPSNLLLNT-----TCDLKICDFGLARVADPDHDHTGFLTEYVATRWYRAPEIML
       130       140            150       160       170       180

              240       250       260       270       280       290
Sequen SNKPWNYQIDYFGVAATVYCMLFGTYMKVKNEGGECKPEGLFRRLPHLDMWNEFFHVMLN
       ..: .. .:: ..:.  .  ::
2ojg:A NSKGYTKSIDIWSVGCILAEMLSNRPIFPGKHYLDQLNHILGILGSPSQEDLNCIINLKA
            190       200       210       220       230       240

              300       310       320       330
Sequen IPDCHHLPSLDLLRQKLKKVFQQHYTNKIRALRNRLIVLLLEC

2ojg:A RNYLLSLPHKNKVPWNRLFPNADSKALDLLDKMLTFNPHKRIEVEQALAHPYLEQYYDPS
            250       260       270       280       290       300

2ojg:A DEPIAEAPFKFELDDLPKEKLKELIFEETARFQPG
            310       320       330

>>2oji:A                                                  (344 aa)
 initn:  85 init1:  53 opt: 140  Z-score: 168.0  bits: 39.5 E(): 0.012
Smith-Waterman score: 140; 20.3% identity (56.2% similar) in 217 aa
overlap (46-252:5-208)

               10        20        30        40        50        60
Sequen NFIVGNPWDDKLIFKLLSGLSKPVSSYPNTFEWQCKLPAIKPKTEFQLGSKLVYVHHLLG
                                                    :..: . . . .. :
2oji:A                                          RGQVFDVGPRYTNLSYI-G
                                                        10

               70        80        90        100       110
Sequen EGAFAQVYEATQNKQKFVLKVQKPANPWEFYIGTQ-LMERLKPSMQHMFMKFYSAHLFQN
       :::...:  : .: .:  . ..:  .:.:     :  ....:     ....:   ...
2oji:A EGAYGMVCSAYDNVNKVRVAIKK-ISPFEHQTYCQRTLREIK-----ILLRFRHENIIGI
       20        30        40         50             60        70

     120              130       140       150       160       170
Sequen GSVL-------VGELYSYGTLLNAINLYKNTPEKVMPQGLVISFAMRMLYMIEQVHDCEI
       ....       . ..:    :... .:::    . . .  .  : ...:  .. .:. ..
2oji:A NDIIRAPTIEQMKDVYIVQDLMET-DLYKLLKTQHLSNDHICYFLYQILRGLKYIHSANV
             80        90        100       110       120       130

            180       190       200        210       220        230
Sequen IHGDIKPDNFILGNGFLEQSAGLALIDLGQS-IDMKLFPKGTIFTAKCETSGFQCVE-ML
       .: :.::.:..:..     .  : . :.: . .      .  ..:    :  ..  : ::
2oji:A LHRDLKPSNLLLNT-----TCDLKICDFGLARVADPDHDHTGFLTEYVATRWYRAPEIML
             140            150       160       170       180

              240       250       260       270       280       290
Sequen SNKPWNYQIDYFGVAATVYCMLFGTYMKVKNEGGECKPEGLFRRLPHLDMWNEFFHVMLN
       ..: .. .:: ..:.  .  ::
2oji:A NSKGYTKSIDIWSVGCILAEMLSNRPIFPGKHYLDQLNHILGILGSPSQEDLNCIINLKA
        190       200       210       220       230       240

              300       310       320       330
Sequen IPDCHHLPSLDLLRQKLKKVFQQHYTNKIRALRNRLIVLLLEC

2oji:A RNYLLSLPHKNKVPWNRLFPNADSKALDLLDKMLTFNPHKRIEVEQALAHPYLEQYYDPS
        250       260       270       280       290       300

2oji:A DEPIAEAPFKFDMELDDLPKEKLKELIFEETARFQPGY
        310       320       330       340

.......
I have shown only part of the file. What if I want, for example, only
those two alignments? Are there modules to parse this? I have tried
to parse it with regexes, but without results :-(
If anyone has suggestions for modules or anything else, I'll be very
happy to learn.
Thanks,
Paola

From giles.weaver at googlemail.com  Tue Jun 30 07:28:25 2009
From: giles.weaver at googlemail.com (Giles Weaver)
Date: Tue, 30 Jun 2009 12:28:25 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <200906170927.13273.tristan.lefebure@gmail.com>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<92C15E3391F64BAF801754E924122540@NewLife>
	<200906170927.13273.tristan.lefebure@gmail.com>
Message-ID: <1d06cd5d0906300428x59c004f1h200bfe3c23ed769@mail.gmail.com>

I'm developing a transcriptomics database for use with next-gen data, and
have found processing the raw data to be a big hurdle.

I'm a bit late in responding to this thread, so most issues have already
been discussed. One thing that hasn't been mentioned is removal of adapters
from raw Illumina sequence. This is a PITA, and I'm not aware of any well
developed and documented open source software for removal of adapters (and
poor quality sequence) from Illumina reads.

My current Illumina sequence processing pipeline is an unholy mix of
biopython, bioperl, pure perl, emboss and bowtie. Biopython for converting
the Illumina fastq to Sanger fastq, bioperl to read the quality values, pure
perl to trim the poor quality sequence from each read, and bioperl with
emboss to remove the adapter sequence. I'm aware that the pipeline contains
bugs and would like to simplify it, but at least it does work...

Ideally I'd like to replace as much of the pipeline as possible with
bioperl/bioperl-run, but this isn't currently possible due to both a lack of
features and poor performance. I'm sure the features will come with time,
but the performance is more of a concern to me. I wonder if Bio::Moose might
be used to alleviate some of the performance issues? Might next-gen modules
be an ideal guinea pig for Bio::Moose?

For my purposes, the features that I would love to see supported in
bioperl/bioperl-run are:

   - next-gen sequence quality parsing (to output phred scores)
   - sequence quality based trimming
   - sequencing adapter removal
   - filtering based on sequence complexity (repeats, entropy etc)
   - bioperl-run modules for bowtie etc.

Obviously all of these need to be fast!
I'd love to muck in, but I doubt I'll contribute much before
Bio::Moose/bioperl6, as the (bio)perl object system gives me nightmares!

Regarding trimming bad quality bases (see comments from Tristan Lefebure)
from Solexa/Illumina reads, I did find a mixed pure/bioperl solution to be
much faster than a primarily bioperl based implementation. I found
Bio::Seq->subseq(a,b) and Bio::Seq->subqual(a,b) to be far too slow. My
current code trims ~1300 sequences/second, including unzipping the raw data
and converting it to sanger fastq with biopython. Processing an entire
sequencing run with the whole pipeline takes in the region of 6-12h.
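
A minimal pure-Perl sketch of this kind of 3' quality trimming, for
illustration only (it is not the pipeline code described above): it
assumes the reads are already in Sanger (Phred+33) encoding, and the
subroutine name trim_3prime and the threshold of 20 are hypothetical
choices.

```perl
use strict;
use warnings;

# Trim the 3' end of a read back to the last base whose Phred score
# meets the threshold. Assumes Sanger (Phred+33) quality encoding.
# trim_3prime and the default threshold are illustrative choices.
sub trim_3prime {
    my ($seq, $qual, $min_q) = @_;
    $min_q = 20 unless defined $min_q;

    my $keep = length $qual;
    while ($keep > 0) {
        # decode the quality character at the current 3' end
        my $score = ord(substr($qual, $keep - 1, 1)) - 33;
        last if $score >= $min_q;
        $keep--;
    }
    return (substr($seq, 0, $keep), substr($qual, 0, $keep));
}

# 'I' encodes Q40 and '#' encodes Q2, so the last two bases are dropped.
my ($trimmed_seq, $trimmed_qual) = trim_3prime('ACGTACGT', 'IIIIII##', 20);
print "$trimmed_seq\n$trimmed_qual\n";
```

Working directly on the strings avoids creating an object per read,
which is presumably where most of the time goes in the object-based
version.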

Hope this looooong post was of interest to someone!

Giles

2009/6/17 Tristan Lefebure <tristan.lefebure at gmail.com>

> Hello,
> Regarding next-gen sequences and bioperl, following my
> experience, another issue is bioperl speed. For example, if
> you want to trim bad quality bases at ends of 1E6 Solexa
> reads using Bio::SeqIO::fastq and some methods in
> Bio::Seq::Quality, well, you've got to be patient (but may
> be I missed some shortcuts...).
>
> A pure perl solution will be between 100 to 1000x faster...
> Would it be possible to have an ultra-light quality object
> with few simple methods for next-gen reads?
>
> I can contribute some tests if that sounds like an important
> point.
>
> -Tristan
>

From manchunjohn-ma at uiowa.edu  Tue Jun 30 12:17:08 2009
From: manchunjohn-ma at uiowa.edu (John M.C. Ma)
Date: Tue, 30 Jun 2009 11:17:08 -0500
Subject: [Bioperl-l] Bio::Tools::Run::RepeatMasker crashes perl
Message-ID: <5486b2980906300917m20e8cd06sbaee207aed3a27c9@mail.gmail.com>

Hi everyone,

(OS: OpenSuSE 11.1, Versions: Perl:v5.10.0-i586-linux-thread-multi,
Bioperl: 1.6.0-cpan, Bioperl-run: 1.6.1-cpan, Ensembl: Ver 54-cvs)

This is the first time I have used Bio::Tools::Run::RepeatMasker, and
it crashed in a way that I can't explain. I suspect the problem is on
my side.

My code pulls a sequence from Ensembl-variation, puts it into a
PrimarySeq object, and runs RepeatMasker on it:

use strict;
use warnings;
use Bio::SeqIO;
use Bio::PrimarySeq;
use Bio::Tools::Run::RepeatMasker;
use Bio::EnsEMBL::Registry;
use Bio::EnsEMBL::Variation::Variation;
[snips most Ensembl code as the sequence itself looks OK]
	my $ref_allele=$snp_obj->five_prime_flanking_seq.${$snp_obj->get_all_Alleles}[0]->allele.$snp_obj->three_prime_flanking_seq;
	my $mask_seq=Bio::PrimarySeq->new (-seq=>$ref_allele);
	my $rmasker_handle=Bio::Tools::Run::RepeatMasker->new(-species=>'rat',-noisy=>"1");
	my @masked_features=$rmasker_handle->run($mask_seq);
	my $masked_seq=$rmasker_handle->run;

And when I let the wrapper run, perl crashed with these warnings:

--------------------- WARNING ---------------------
MSG: RepeatMasker didn't find any repetitive sequences

---------------------------------------------------

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Could not open /tmp/EWLAmIVymd/wByClB8iqr.masked: No such file or directory
STACK: Error::throw
STACK: Bio::Root::Root::throw
/usr/lib/perl5/site_perl/5.10.0/Bio/Root/Root.pm:357
STACK: Bio::Root::IO::_initialize_io
/usr/lib/perl5/site_perl/5.10.0/Bio/Root/IO.pm:310
STACK: Bio::SeqIO::_initialize /usr/lib/perl5/site_perl/5.10.0/Bio/SeqIO.pm:450
STACK: Bio::SeqIO::fasta::_initialize
/usr/lib/perl5/site_perl/5.10.0/Bio/SeqIO/fasta.pm:81
STACK: Bio::SeqIO::new /usr/lib/perl5/site_perl/5.10.0/Bio/SeqIO.pm:347
STACK: Bio::SeqIO::new /usr/lib/perl5/site_perl/5.10.0/Bio/SeqIO.pm:373
STACK: Bio::Tools::Run::RepeatMasker::_run
/usr/lib/perl5/site_perl/5.10.0/Bio/Tools/Run/RepeatMasker.pm:320
STACK: Bio::Tools::Run::RepeatMasker::run
/usr/lib/perl5/site_perl/5.10.0/Bio/Tools/Run/RepeatMasker.pm:260
STACK: main::SeqList
/home/johnma/workspace/TaqMan_SNP_Order/TaqMan_SNP_Order.pl:40
STACK: /home/johnma/workspace/TaqMan_SNP_Order/TaqMan_SNP_Order.pl:63
-----------------------------------------------------------

What could be going wrong?

Cheers,

John Ma,
University of Iowa

From cjfields at illinois.edu  Tue Jun 30 13:46:27 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 30 Jun 2009 12:46:27 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <1d06cd5d0906300428x59c004f1h200bfe3c23ed769@mail.gmail.com>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<92C15E3391F64BAF801754E924122540@NewLife>
	<200906170927.13273.tristan.lefebure@gmail.com>
	<1d06cd5d0906300428x59c004f1h200bfe3c23ed769@mail.gmail.com>
Message-ID: <6723B5A0-9A21-4851-BD88-0BA3CC107439@illinois.edu>


On Jun 30, 2009, at 6:28 AM, Giles Weaver wrote:

> I'm developing a transcriptomics database for use with next-gen  
> data, and
> have found processing the raw data to be a big hurdle.
>
> I'm a bit late in responding to this thread, so most issues have  
> already
> been discussed. One thing that hasn't been mentioned is removal of  
> adapters
> from raw Illumina sequence. This is a PITA, and I'm not aware of any  
> well
> developed and documented open source software for removal of  
> adapters (and
> poor quality sequence) from Illumina reads.
>
> My current Illumina sequence processing pipeline is an unholy mix of
> biopython, bioperl, pure perl, emboss and bowtie. Biopython for  
> converting
> the Illumina fastq to Sanger fastq, bioperl to read the quality  
> values, pure
> perl to trim the poor quality sequence from each read, and bioperl  
> with
> emboss to remove the adapter sequence. I'm aware that the pipeline  
> contains
> bugs and would like to simplify it, but at least it does work...

My local bioperl is working with FASTQ parsing of Sanger and Illumina  
(but not solexa yet).  I'll commit what I have today, and we should be  
able to add in solexa soon.  We'll also need to add in write_seq  
support.

> Ideally I'd like to replace as much of the pipeline as possible with
> bioperl/bioperl-run, but this isn't currently possible due to both a  
> lack of
> features and poor performance. I'm sure the features will come with  
> time,
> but the performance is more of a concern to me. I wonder if  
> Bio::Moose might
> be used to alleviate some of the performance issues? Might next-gen  
> modules
> be an ideal guinea pig for Bio::Moose?

We should get FASTQ working in core first then optimize on speed (as  
Elia previously pointed out).  We can do that within the actual SeqIO  
parser using a few simple tricks. For instance my local  
Bio::SeqIO::fastq has a reconfigured next_seq to call an iterator that  
returns raw processed data as a simple hash ref; users have access to  
that method, so if one wanted they could retrieve the raw data  
directly, or pass it through a filter that only creates seq instances  
one wants on the fly (that would be where your quality checks, adaptor  
modification, etc. fit in).

In the end it might be best to wrap a C/C++-based solution for speed.  As  
mentioned previously a C-based parser exists from Sanger Centre that  
we could incorporate in some fashion, but I would like it to be able  
to report back file position for fast indexing.  The code is fairly  
simple so it shouldn't be too hard to incorporate that in somehow.

Just so there is no confusion, Bio::Moose is an attempt to both lay  
out plans for perl6 and deal with inheritance issues within bioperl  
now. It's still in very early development and may not see a release  
until Dec. at the very earliest; even then it will be an alpha  
release and likely won't have every major class represented.  It's  
also not intended to be backwards-compatible with bioperl core.  It  
may help, but that's not an absolute certainty.  As for bioperl6, it  
will be pre-alpha until the perl6 spec reaches a stable draft and we  
have an active implementation.

> For my purposes the tools that would love to see supported in
> bioperl/bioperl-run are:
>
>   - next-gen sequence quality parsing (to output phred scores)
>   - sequence quality based trimming
>   - sequencing adapter removal
>   - filtering based on sequence complexity (repeats, entropy etc)
>   - bioperl-run modules for bowtie etc.
>
> Obviously all of these need to be fast!
> I'd love to muck in, but I doubt I'll contribute much before
> Bio::Moose/bioperl6, as the (bio)perl object system gives me  
> nightmares!

One can only read a file so fast (even with a highly optimized C/C++  
based parser), but I don't think that will be the limiting factor as  
much as object instantiation.

> Regarding trimming bad quality bases (see comments from Tristan  
> Lefebure)
> from Solexa/Illumina reads, I did find a mixed pure/bioperl solution  
> to be
> much faster than a primarily bioperl based implementation. I found
> Bio::Seq->subseq(a,b) and Bio::Seq->subqual(a,b) to be far too slow.  
> My
> current code trims ~1300 sequences/second, including unzipping the  
> raw data
> and converting it to sanger fastq with biopython. Processing an entire
> sequencing run with the whole pipeline takes in the region of 6-12h.

Right, hence coming up with a 'pre-filter' for raw data (hash refs)  
prior to object instantiation to speed things up.  This will be a bit  
easier with Bio::Moose as we can introspect attributes via the meta  
class, but this will be a while yet.

> Hope this looooong post was of interest to someone!
>
> Giles

It's always good to hear about such issues and what one expects.

chris

From cjfields at illinois.edu  Tue Jun 30 17:58:57 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 30 Jun 2009 16:58:57 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <4A42AC51.3090809@ebi.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com>
	<CECB8B74-C423-4792-8857-0F3E40EAECE3@illinois.edu>
	<4A40B5D6.40504@ebi.ac.uk>
	<320fb6e00906230429l77a2049eg35e47f94026111c6@mail.gmail.com>
	<4A40C909.40803@ebi.ac.uk>
	<320fb6e00906231439j3beecda0o6e75c4223e960871@mail.gmail.com>
	<4A421287.4000203@ebi.ac.uk>
	<320fb6e00906240756m757ee12nb53ecc527b8e2480@mail.gmail.com>
	<6DCEA6E1-8C53-4C04-9F8A-E336BE5753B6@illinois.edu>
	<320fb6e00906240927u62588e22je8186bb172e904a1@mail.gmail.com>
	<5C050D50-5723-4C93-89D0-7280D202E737@illinois.edu>
	<4A42AC51.3090809@ebi.ac.uk>
Message-ID: <A9776DF4-CE78-4973-9ADC-7594A3DAA118@illinois.edu>

All,

I have committed the first run at adding Illumina/Solexa parsing for  
FASTQ along with tests.  It's very possible the quality scores are  
off, particularly for Solexa (Illumina 1.0), so test away and let me  
know if anything pops up (should be a quick fix).  Along with that is  
a small commit to Bio::SeqIO so that we can add format variants (see  
below for an example).  write_seq/write_qual/write_fastq will likely  
not work as expected as I haven't touched them; they are to be tackled  
next.

For faster parsing I have also added a next_dataset method that  
returns a hash reference to the parsed data instead of an object; this  
hash includes quality scores.  This method is called by next_seq and  
the relevant data is passed in to the sequence factory directly; one  
could do something like the following to filter sequences as needed:

use Modern::Perl;
use Bio::SeqIO;
use Bio::Seq::SeqFactory;

my $file = shift;

# same as (-format   => 'fastq', -variant => 'illumina')
my $in = Bio::SeqIO->new(-file     => $file,
                          -format   => 'fastq-illumina');

my $factory = Bio::Seq::SeqFactory->new(-type => 'Bio::Seq::Quality');

while (my $data = $in->next_dataset) {
     next if seq_is_crap($data);
     my $seq = $factory->create(%$data);
}

sub seq_is_crap { # filter here
}
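
One way the seq_is_crap placeholder might be filled in is a simple
mean-quality and length cutoff.  This is only a sketch: it assumes the
hash reference carries the Phred scores as an array reference under
the key -qual and the sequence string under -seq (check the keys that
next_dataset actually returns in your checkout), and both cutoffs are
arbitrary.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical filter for the loop above: returns true when a read
# should be skipped. Assumes $data->{-qual} is an array ref of Phred
# scores and $data->{-seq} is the sequence string (assumed key names).
sub seq_is_crap {
    my ($data, $min_mean, $min_len) = @_;
    $min_mean = 20 unless defined $min_mean;   # arbitrary cutoff
    $min_len  = 20 unless defined $min_len;    # arbitrary cutoff

    my $qual = $data->{-qual} || [];
    return 1 if !@$qual;                                  # no scores at all
    return 1 if length($data->{-seq} // '') < $min_len;   # too short
    return (sum(@$qual) / @$qual) < $min_mean ? 1 : 0;    # low mean quality
}

my %good = (-seq => 'A' x 25, -qual => [ (30) x 25 ]);
my %bad  = (-seq => 'A' x 25, -qual => [ (5)  x 25 ]);
print seq_is_crap(\%good) ? "skip\n" : "keep\n";
print seq_is_crap(\%bad)  ? "skip\n" : "keep\n";
```

Because the check runs on the plain hash, rejected reads never reach
the factory, so no object is built for them.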


chris

From maj at fortinbras.us  Tue Jun 30 21:41:16 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 30 Jun 2009 21:41:16 -0400
Subject: [Bioperl-l] Parsing a FASTA file (Was:  Bioperl-l Digest, Vol 74,
	Issue 25)
In-Reply-To: <e9cf89740906300212jeac3fe6o2fa4414bb427f824@mail.gmail.com>
References: <mailman.29.1246204805.13888.bioperl-l@lists.open-bio.org>
	<e9cf89740906300212jeac3fe6o2fa4414bb427f824@mail.gmail.com>
Message-ID: <9D386274308C4DF98E38918477801541@NewLife>

Hi Paola, 

You want to try Bio::SearchIO, I think. It's not quite clear what you 
want to do, but here's an example of what you can do: 

Get all high-scoring pairs ( the mini-alignments ) involving
the database sequence called "2ojg:A"--

 use Bio::SearchIO;
 
 my $io = Bio::SearchIO->new(-format=>'fasta', -file=>'yourfile.fasta');
 my $result = $io->next_result;
 my @desired_hsps;

 while ( my $hit = $result->next_hit ) {
   push @desired_hsps, grep { $_->subject->seq_id =~ /2ojg:A/ } $hit->hsps;
 }
 
 # now all your desired hsps are in the array @desired_hsps;
 # you can get Bio::SimpleAlign objects from them all, for example:
 my @aligns = map { $_->get_aln } @desired_hsps;
 #...and lots of other things...

Look at http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_SearchIO
and http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods 
for a nice introduction to the Bio::SearchIO system by its authors. They 
use a blast output as an example, but everything applies to fasta output 
as well.
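
If only the raw text of particular alignments is needed, a plain-Perl
alternative would be to cut the report into blocks at each line that
begins with ">>" and keep the blocks whose identifier is on a wanted
list. The sketch below assumes that layout (as in the excerpt you
posted); extract_blocks is a hypothetical helper name, not part of
BioPerl, and the text is a shortened stand-in for the real file.

```perl
use strict;
use warnings;

# Split FASTA-program output into per-hit blocks (each starting with a
# line like ">>2ojg:A ...") and return those whose ID is wanted.
# extract_blocks is an illustrative helper, not part of BioPerl.
sub extract_blocks {
    my ($text, @wanted) = @_;
    my %want = map { $_ => 1 } @wanted;
    my (%blocks, $id);
    for my $line (split /\n/, $text) {
        if ($line =~ /^>>(\S+)/) {
            # a new hit starts; track it only if it is on the list
            $id = $want{$1} ? $1 : undef;
        }
        $blocks{$id} .= "$line\n" if defined $id;
    }
    return %blocks;
}

my $report = join "\n",
    '>>2np8:A   (159 aa)', 'Sequen NFIVG', '2np8:A QWALE',
    '>>2ojg:A   (337 aa)', 'Sequen EGAFA', '2ojg:A EGAYG', '';
my %hits = extract_blocks($report, '2ojg:A');
print $hits{'2ojg:A'};
```

This gives you the text of each block only; Bio::SearchIO is still the
better route when you need scores, coordinates, or alignment objects.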

You didn't waste your time writing regexps, by the way. For a Perl
student, that kind of work is like money in the bank.

cheers, 
Mark
      

----- Original Message ----- 
From: "Paola Bisignano" <paola.bisignano at gmail.com>
To: <bioperl-l at lists.open-bio.org>
Sent: Tuesday, June 30, 2009 5:12 AM
Subject: Re: [Bioperl-l] Bioperl-l Digest, Vol 74, Issue 25


> Hi,
> I need a little help, to parse a file, but I tried to search some
> modules of bioperl, but there are a lot, and I don't know how to
> start, I find moduls for all db, for different web site, but not for
> my favorite PDBsum....so I parsed a lot of thing on my own, even if I
> was new in learning perl....but now I'm waiting for help...because I
> need to parse a FASTA file, resulted from aligned sequences...I need
> to extract the aligned sequences, only for the pdb in my lista....
> 
> 
> my fasta file is like:
> 
> Query: /ebi/research/thornton/tmp/sas307986/seq.fasta
>  1>>>Sequence 3e7e:A - 333 aa
> Library: /ebi/research/thornton/www/databases/html/pdbsum/data/pdblib
> 17840403 residues in 79353 sequences
> 
>       opt      E()
> < 20   286     0:===
>  22     1     0:=          one = represents 135 library sequences
>  24     1     0:=
>  26     0     2:*
>  28    21    18:*
>  30    36   109:*
>  32   237   421:== *
>  34   956  1140:========*
>  36  1924  2342:===============  *
>  38  3591  3871:=========================== *
>  40  4904  5400:=====================================  *
>  42  6750  6600:================================================*=
>  44  7145  7281:=====================================================*
>  46  8047  7416:======================================================*=====
> .........
> 
>>>2np8:A                                                  (159 aa)
> initn: 125 init1:  72 opt: 136  Z-score: 168.6  bits: 38.5 E(): 0.011
> Smith-Waterman score: 136; 26.0% identity (57.1% similar) in 154 aa
> overlap (59-204:13-153)
> 
>               10        20        30        40        50        60
> Sequen NFIVGNPWDDKLIFKLLSGLSKPVSSYPNTFEWQCKLPAIKPKTEFQLGSKLVYVHHLLG
>                                                                 ::
> 2np8:A                                               QWALEDFEIGRPLG
>                                                             10
> 
>               70          80        90         100        110
> Sequen EGAFAQVYEATQNKQKFVL--KVQKPANPWEFYIGTQLMER--LKPSMQH-MFMKFYSAH
>       .: :..:: : ....::.:  ::   :.  .  .  :: ..  ..  ..:  ....:.
> 2np8:A KGKFGNVYLAREKQSKFILALKVLFKAQLEKAGVEHQLRREVEIQSHLRHPNILRLYG--
>           20        30        40        50        60        70
> 
>         120         130       140       150       160       170
> Sequen LFQNGS--VLVGELYSYGTLLNAINLYKNTPEKVMPQGLVISFAMRMLYMIEQVHDCEII
>        :....   :. :    ::.   ..  ..  :.      . ..  ..   .   :. ..:
> 2np8:A YFHDATRVYLILEYAPLGTVYRELQKLSKFDEQR-----TATYITELANALSYCHSKRVI
>             80        90       100            110       120
> 
>           180       190        200       210       220       230
> Sequen HGDIKPDNFILGNGFLEQSAG-LALIDLGQSIDMKLFPKGTIFTAKCETSGFQCVEMLSN
>       : ::::.:..::      ::: : . :.: :.
> 2np8:A HRDIKPENLLLG------SAGELKIADFGWSVHAPSSR
>       130             140       150
> 
>            240       250       260       270       280       290
> Sequen KPWNYQIDYFGVAATVYCMLFGTYMKVKNEGGECKPEGLFRRLPHLDMWNEFFHVMLNIP
> 
>            300       310       320       330
> Sequen DCHHLPSLDLLRQKLKKVFQQHYTNKIRALRNRLIVLLLEC
> 
>>>2ojg:A                                                  (337 aa)
> initn:  85 init1:  53 opt: 140  Z-score: 168.1  bits: 39.5 E(): 0.012
> Smith-Waterman score: 140; 20.3% identity (56.2% similar) in 217 aa
> overlap (46-252:1-204)
> 
>               10        20        30        40        50        60
> Sequen NFIVGNPWDDKLIFKLLSGLSKPVSSYPNTFEWQCKLPAIKPKTEFQLGSKLVYVHHLLG
>                                                    :..: . . . .. :
> 2ojg:A                                              FDVGPRYTNLSYI-G
>                                                            10
> 
>               70        80        90        100       110
> Sequen EGAFAQVYEATQNKQKFVLKVQKPANPWEFYIGTQ-LMERLKPSMQHMFMKFYSAHLFQN
>       :::...:  : .: .:  . ..:  .:.:     :  ....:     ....:   ...
> 2ojg:A EGAYGMVCSAYDNVNKVRVAIKK-ISPFEHQTYCQRTLREIK-----ILLRFRHENIIGI
>           20        30         40        50             60
> 
>     120              130       140       150       160       170
> Sequen GSVL-------VGELYSYGTLLNAINLYKNTPEKVMPQGLVISFAMRMLYMIEQVHDCEI
>       ....       . ..:    :... .:::    . . .  .  : ...:  .. .:. ..
> 2ojg:A NDIIRAPTIEQMKDVYIVQDLMET-DLYKLLKTQHLSNDHICYFLYQILRGLKYIHSANV
>       70        80        90        100       110       120
> 
>            180       190       200        210       220        230
> Sequen IHGDIKPDNFILGNGFLEQSAGLALIDLGQS-IDMKLFPKGTIFTAKCETSGFQCVE-ML
>       .: :.::.:..:..     .  : . :.: . .      .  ..:    :  ..  : ::
> 2ojg:A LHRDLKPSNLLLNT-----TCDLKICDFGLARVADPDHDHTGFLTEYVATRWYRAPEIML
>       130       140            150       160       170       180
> 
>              240       250       260       270       280       290
> Sequen SNKPWNYQIDYFGVAATVYCMLFGTYMKVKNEGGECKPEGLFRRLPHLDMWNEFFHVMLN
>       ..: .. .:: ..:.  .  ::
> 2ojg:A NSKGYTKSIDIWSVGCILAEMLSNRPIFPGKHYLDQLNHILGILGSPSQEDLNCIINLKA
>            190       200       210       220       230       240
> 
>              300       310       320       330
> Sequen IPDCHHLPSLDLLRQKLKKVFQQHYTNKIRALRNRLIVLLLEC
> 
> 2ojg:A RNYLLSLPHKNKVPWNRLFPNADSKALDLLDKMLTFNPHKRIEVEQALAHPYLEQYYDPS
>            250       260       270       280       290       300
> 
> 2ojg:A DEPIAEAPFKFELDDLPKEKLKELIFEETARFQPG
>            310       320       330
> 
>>>2oji:A                                                  (344 aa)
> initn:  85 init1:  53 opt: 140  Z-score: 168.0  bits: 39.5 E(): 0.012
> Smith-Waterman score: 140; 20.3% identity (56.2% similar) in 217 aa
> overlap (46-252:5-208)
> 
>               10        20        30        40        50        60
> Sequen NFIVGNPWDDKLIFKLLSGLSKPVSSYPNTFEWQCKLPAIKPKTEFQLGSKLVYVHHLLG
>                                                    :..: . . . .. :
> 2oji:A                                          RGQVFDVGPRYTNLSYI-G
>                                                        10
> 
>               70        80        90        100       110
> Sequen EGAFAQVYEATQNKQKFVLKVQKPANPWEFYIGTQ-LMERLKPSMQHMFMKFYSAHLFQN
>       :::...:  : .: .:  . ..:  .:.:     :  ....:     ....:   ...
> 2oji:A EGAYGMVCSAYDNVNKVRVAIKK-ISPFEHQTYCQRTLREIK-----ILLRFRHENIIGI
>       20        30        40         50             60        70
> 
>     120              130       140       150       160       170
> Sequen GSVL-------VGELYSYGTLLNAINLYKNTPEKVMPQGLVISFAMRMLYMIEQVHDCEI
>       ....       . ..:    :... .:::    . . .  .  : ...:  .. .:. ..
> 2oji:A NDIIRAPTIEQMKDVYIVQDLMET-DLYKLLKTQHLSNDHICYFLYQILRGLKYIHSANV
>             80        90        100       110       120       130
> 
>            180       190       200        210       220        230
> Sequen IHGDIKPDNFILGNGFLEQSAGLALIDLGQS-IDMKLFPKGTIFTAKCETSGFQCVE-ML
>       .: :.::.:..:..     .  : . :.: . .      .  ..:    :  ..  : ::
> 2oji:A LHRDLKPSNLLLNT-----TCDLKICDFGLARVADPDHDHTGFLTEYVATRWYRAPEIML
>             140            150       160       170       180
> 
>              240       250       260       270       280       290
> Sequen SNKPWNYQIDYFGVAATVYCMLFGTYMKVKNEGGECKPEGLFRRLPHLDMWNEFFHVMLN
>       ..: .. .:: ..:.  .  ::
> 2oji:A NSKGYTKSIDIWSVGCILAEMLSNRPIFPGKHYLDQLNHILGILGSPSQEDLNCIINLKA
>        190       200       210       220       230       240
> 
>              300       310       320       330
> Sequen IPDCHHLPSLDLLRQKLKKVFQQHYTNKIRALRNRLIVLLLEC
> 
> 2oji:A RNYLLSLPHKNKVPWNRLFPNADSKALDLLDKMLTFNPHKRIEVEQALAHPYLEQYYDPS
>        250       260       270       280       290       300
> 
> 2oji:A DEPIAEAPFKFDMELDDLPKEKLKELIFEETARFQPGY
>        310       320       330       340
> 
> .......
> I am showing only part of the file. What if I want, for example, only
> those two alignments? Are there modules to parse this output? I have
> tried to parse it with regexes, but without results :-(
> If anyone has suggestions for modules or anything else, I'll be very
> happy to learn.
> Thanks,
> Paola
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
>
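
On the question quoted above about modules for parsing FASTA search output:
Bio::SearchIO has a 'fasta' parser, so a minimal sketch for pulling out only
selected hits might look like the following. The sub name
print_selected_hits, the report file name and the hit names passed on the
command line are assumptions for illustration, and the exact hit names
depend on how the parser reads the '>>' lines of a given report.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SearchIO;

# Print a summary line and the alignment strings for every HSP that
# belongs to one of the hits named in @$wanted. Returns the number of
# HSPs printed. (print_selected_hits is a hypothetical helper name.)
sub print_selected_hits {
    my ($report_file, $wanted) = @_;
    my %keep = map { $_ => 1 } @$wanted;

    my $in = Bio::SearchIO->new(-format => 'fasta', -file => $report_file);
    my $printed = 0;

    while (my $result = $in->next_result) {
        while (my $hit = $result->next_hit) {
            next unless $keep{ $hit->name };
            while (my $hsp = $hit->next_hsp) {
                printf "%s  E=%s  identity=%.1f%%  query %d-%d  hit %d-%d\n",
                    $hit->name, $hsp->evalue, $hsp->percent_identity,
                    $hsp->start('query'), $hsp->end('query'),
                    $hsp->start('hit'),   $hsp->end('hit');
                print $hsp->query_string,    "\n",
                      $hsp->homology_string, "\n",
                      $hsp->hit_string,      "\n\n";
                $printed++;
            }
        }
    }
    return $printed;
}

# Assumed usage: perl select_hits.pl report.fasta 2np8:A 2ojg:A
if (@ARGV) {
    my ($file, @names) = @ARGV;
    print_selected_hits($file, \@names);
}
```

Working through the parsed objects avoids depending on the column layout of
the text report, which is what tends to break regex-based parsing.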

From cjfields at illinois.edu  Tue Jun 30 23:48:11 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 30 Jun 2009 22:48:11 -0500
Subject: [Bioperl-l] FASTQ output
Message-ID: <A6217D90-4861-4EEB-B2D8-F3565B81EB4B@illinois.edu>

I am working on FASTQ output and noticed a real oddity.  Apparently,  
there are three write_* methods for this module, with the odd choice  
of write_seq for Bio::SeqIO::fastq writing FASTA, not FASTQ.   
write_qual() writes Qual format:

http://www.bioperl.org/wiki/Qual_sequence_format

and write_fastq() writes FASTQ.  Now, maybe it's just me, but I think  
an implementation of write_seq() for a specific format should probably  
output that format and not something else entirely unexpected.  Also,  
is there a reason for duplicating output code for qual and FASTA  
output within Bio::SeqIO::fastq, i.e. should we call Bio::SeqIO::fasta/ 
qual instead?

I would consider the write_seq() issue a bug, the others are really  
just maintenance issues.  Anyone have problems with me changing that  
up a bit?

chris
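
To illustrate the behaviour described above, here is a minimal sketch that
asks for FASTQ explicitly by calling write_fastq() rather than relying on
what write_seq() emits. The read ID, bases, quality scores and output file
name are made up for illustration, and the sketch assumes a BioPerl release
that provides Bio::Seq::Quality and write_fastq() in Bio::SeqIO::fastq.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::Seq::Quality;
use Bio::SeqIO;

# Hypothetical read: ID, sequence and scores are invented for this example.
my $read = Bio::Seq::Quality->new(
    -id   => 'read1',
    -seq  => 'ACGTACGT',
    -qual => '40 40 38 35 30 30 20 10',    # one score per base
);

my $outfile = 'example.fastq';             # hypothetical output path
my $out = Bio::SeqIO->new(-format => 'fastq', -file => ">$outfile");

# Request FASTQ by name, so the result does not depend on which format
# write_seq() happens to produce in the installed version.
$out->write_fastq($read);
$out->close;
```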

From Jonas_Schaer at gmx.de  Sun Jun 28 06:15:18 2009
From: Jonas_Schaer at gmx.de (Jonas Schaer)
Date: Sun, 28 Jun 2009 12:15:18 +0200
Subject: [Bioperl-l] different results with remote-blast skript
Message-ID: <D6BA00577BC94BDFAB04DF5EF43E9598@jonas>

Hi again :)
Please, I only have this one small question:
why do I get different results with my RemoteBlast Perl script than on the
NCBI BLAST homepage?
I am using blastp; the query is an amino acid sequence (I get different
results with any sequence, and the differences are not only in the number
of hits but even in E-values, scores, etc.), and the database is 'nr'.
PLEASE help me,
thank you in advance,
Jonas

PS: my script:
################################################################################
use strict;
use Bio::Seq::SeqFactory;
use Bio::Tools::Run::RemoteBlast;

my @blast_report;
my $prog  = 'blastp';
my $db    = 'nr';
my $e_val = '1e-10';
#my $e_val = '10';
my @params = ( '-prog'       => $prog,
               '-data'       => $db,
               '-expect'     => $e_val,
               '-readmethod' => 'SearchIO' );
my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
$Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'}    = '11 1';
$Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100';
$Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'}      = '10';
$Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'} = '1';

my $blast_seq='MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTAMATGPDPDDEYE';
#$v is just to turn on and off the messages
my $v = 1;
my $seqbuilder = Bio::Seq::SeqFactory->new('-type' => 'Bio::PrimarySeq');
my $seq = $seqbuilder->create(-seq => $blast_seq, -display_id => "$blast_seq");
my $filename = 'temp2.out';
my $r = $factory->submit_blast($seq);
print STDERR "waiting..." if( $v > 0 );
while ( my @rids = $factory->each_rid )
{
    foreach my $rid ( @rids )
    {
        my $rc = $factory->retrieve_blast($rid);
        if( !ref($rc) )
        {
            if( $rc < 0 )
            {
                $factory->remove_rid($rid);
            }
            print STDERR "." if ( $v > 0 );
        }
        else
        {
            my $result = $rc->next_result();
            $factory->save_output($filename);
            $factory->remove_rid($rid);
            print "\nQuery Name: ", $result->query_name(), "\n";
            while ( my $hit = $result->next_hit )
            {
                next unless ( $v > 0);
                print "\thit name is ", $hit->name, "\n";
                while( my $hsp = $hit->next_hsp )
                {
                    print "\t\tscore is ", $hsp->score, "\n";
                }
            }
        }
    }
}
@blast_report = get_file_data ($filename);
return @blast_report;
##################################################################################

From stevey_mac2k2 at hotmail.com  Sun Jun 28 06:53:04 2009
From: stevey_mac2k2 at hotmail.com (stephenmcgowan1)
Date: Sun, 28 Jun 2009 03:53:04 -0700 (PDT)
Subject: [Bioperl-l]  Installing Bioperl on Mac OS X 10.5.7
Message-ID: <24240541.post@talk.nabble.com>


Hi,

I'm new to the Mac way of working and programming, as well as to the UNIX
(Terminal) environment. I will describe in as much detail as I can what I
have done so far to install BioPerl, and try to describe what my problem is.

First of all, I downloaded and extracted BioPerl-1.6.0 and BioPerl-db-1.6.0
from the site. I have these two folders saved in a folder on my OS X desktop
called "ExerciseTwo".

After doing this, I open up Terminal and change to the BioPerl-1.6.0 folder.

I then run:

perl Build.PL (I have also tried sudo perl Build.pl)

I then run ./Build test (again, I also tried this as sudo ./Build test)

After running the tests, I receive this feedback:

Failed Test                        Stat Wstat Total Fail  Failed  List of Failed
--------------------------------------------------------------------------------
t/AlignIO/AlignIO.t                 255 65280    28   42 150.00%  8-28
t/AlignIO/arp.t                     255 65280    48   92 191.67%  3-48
t/Annotation/Annotation.t           255 65280   159   83  52.20%  9 117 119-159
t/ClusterIO/SequenceFamily.t        255 65280    19   34 178.95%  3-19
t/LocalDB/Flat.t                    255 65280    24   20  83.33%  15-24
t/LocalDB/Index.t                   255 65280    64   66 103.12%  32-64
t/RemoteDB/BioFetch.t               255 65280    36    2   5.56%  36
t/RemoteDB/DB.t                       3   768   113   59  52.21%  83-113
t/RemoteDB/EUtilities.t               1   256   309    1   0.32%  307
t/SeqIO/Handler.t                   255 65280   550 1098 199.64%  2-550
t/SeqIO/chaos.t                       1   256     8    1  12.50%  1
t/SeqIO/swiss.t                     255 65280   240  479 199.58%  1-240
t/SeqTools/GuessSeqFormat.t           1   256    49    2   4.08%  25 50
t/Tools/Analysis/Protein/ELM.t      255 65280    15   22 146.67%  5-15
t/Tools/Analysis/Protein/Scansite   255 65280    14   20 142.86%  5-14
t/Tools/Run/WrapperBase.t             1   256    27    1   3.70%  20
44 tests and 250 subtests skipped.
Failed 16/318 test scripts, 94.97% okay. 1015/15518 subtests failed, 93.46% okay

Going off this, I then decided to run the install: ./Build install

This is a segment of the output I get back in Terminal after the install:

Manifying blib/script/bp_pairwise_kaks.pl ->
blib/bindoc/bp_pairwise_kaks.pl.1
Manifying blib/script/bp_seqret.pl -> blib/bindoc/bp_seqret.pl.1
Manifying blib/script/bp_seq_length.pl -> blib/bindoc/bp_seq_length.pl.1
Manifying blib/script/bp_query_entrez_taxa.pl ->
blib/bindoc/bp_query_entrez_taxa.pl.1
Manifying blib/script/bp_load_gff.pl -> blib/bindoc/bp_load_gff.pl.1
Manifying blib/script/bp_fastam9_to_table.pl ->
blib/bindoc/bp_fastam9_to_table.pl.1
Manifying blib/script/bp_process_wormbase.pl ->
blib/bindoc/bp_process_wormbase.pl.1
Manifying blib/script/bp_nrdb.pl -> blib/bindoc/bp_nrdb.pl.1
Manifying blib/script/bp_composite_LD.pl -> blib/bindoc/bp_composite_LD.pl.1
Manifying blib/script/bp_classify_hits_kingdom.pl ->
blib/bindoc/bp_classify_hits_kingdom.pl.1
Manifying blib/script/bp_blast2tree.pl -> blib/bindoc/bp_blast2tree.pl.1
Manifying blib/script/bp_heterogeneity_test.pl ->
blib/bindoc/bp_heterogeneity_test.pl.1
Manifying blib/script/bp_generate_histogram.pl ->
blib/bindoc/bp_generate_histogram.pl.1
Manifying blib/script/bp_process_gadfly.pl ->
blib/bindoc/bp_process_gadfly.pl.1
mkdir /usr/local/share: Permission denied at
/System/Library/Perl/5.8.8/ExtUtils/Install.pm line 112

Now, these bp_ scripts such as bp_nrdb.pl should be installed somewhere on
my system, shouldn't they? But I'm not sure whether the install has worked
and whether these files were saved to the newly made directory, given this
error:

mkdir /usr/local/share: Permission denied at
/System/Library/Perl/5.8.8/ExtUtils/Install.pm line 112

Is there something wrong with my install? I think /usr/local/share should be
created and then all of these bp_ files should go into this folder. Is there
anything that I'm doing wrong here?

Thanks

Stephen.


-- 
View this message in context: http://www.nabble.com/Installing-Bioperl-on-Mac-OS-X-10.5.7-tp24240541p24240541.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From fontanez at fas.harvard.edu  Mon Jun  1 08:41:06 2009
From: fontanez at fas.harvard.edu (Kristina Fontanez)
Date: Mon, 1 Jun 2009 08:41:06 -0400
Subject: [Bioperl-l] problem with bioperl install
In-Reply-To: <C00A2D77-4B41-4FF0-ACE5-1A4F6D46F27A@illinois.edu>
References: <BCF27AD6-75B3-4CE8-83AE-C7BAF99AE800@fas.harvard.edu>
	<2023E087846042178215CF9EBDE12C75@NewLife>
	<4A205502.2030701@sendu.me.uk>
	<024B0302-7885-4005-851D-5D582122ED06@fas.harvard.edu>
	<4A205D46.4090105@sendu.me.uk>
	<C00A2D77-4B41-4FF0-ACE5-1A4F6D46F27A@illinois.edu>
Message-ID: <855163D8-6B40-4DF4-84B6-C14611D1CA42@fas.harvard.edu>

Hey everyone-

Thanks for all the advice. I reinstalled Xcode tools, installed Fink  
and downloaded bioperl successfully. It's now working smoothly.

Thanks again,
Kristina
---------------------------------------------------------------
Kristina Fontanez
PhD candidate
Department of Organismic and Evolutionary Biology
Cavanaugh lab
Harvard University
16 Divinity Ave.
Cambridge, MA 02138

tel: 617-495-1138
fax: 617-496-6933
email: fontanez at fas.harvard.edu


On May 29, 2009, at 10:40 PM, Chris Fields wrote:

Kristina,

You aren't running as superuser:

> term dump:
>
> dhcp-0019353043-25-35:BioPerl-1.6.0 kristinafontanez$ cpan

You'll need to run cpan using 'sudo cpan' if installing modules  
anywhere requiring superuser permissions.

chris

On May 29, 2009, at 5:10 PM, Sendu Bala wrote:

> Kristina Fontanez wrote:
>> Hello everyone-
>> Sendu - I took your advice but doing Install Bundle::CPAN did not  
>> take care of the dependencies. It still failed. See attached txt  
>> file with my terminal output. Does anyone have any idea how this  
>> might be?
>
> From reading the output it seems like perhaps you don't have 'make'  
> or there is something wrong when using it. If you're on a mac you  
> may need to install the dev tools. Someone else want to jump in here  
> with advice?
>
> Also, check your CPAN configuration to ensure it is trying to use  
> the correct make commands. ('o conf' etc.)
>
>
>> If I wanted to wipe all perl from my computer and simply start  
>> over, how might this be accomplished?
>
> Don't do that. At least not until you know you have a working make  
> setup.


From maj at fortinbras.us  Mon Jun  1 10:55:50 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 1 Jun 2009 10:55:50 -0400
Subject: [Bioperl-l] a HOWTO for Tiling
Message-ID: <13190185F84E43BDA99993CEB44394C4@NewLife>

Hi All,
Please peruse http://www.bioperl.org/wiki/HOWTO:Tiling for an exhibition of
Bio::Search::Tiling (B::S::Tiling): use cases, code snippets, design,
implementation and algorithm discussions. We're just about ready to port it
over to core from bioperl-dev; please shout out if this is not a good idea.
cheers and thanks for all input--
Mark


From cjfields at illinois.edu  Mon Jun  1 11:21:30 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 1 Jun 2009 10:21:30 -0500
Subject: [Bioperl-l] problem with bioperl install
In-Reply-To: <FD5880DBC2054B5FA9E89FB5CA8B139B@NewLife>
References: <BCF27AD6-75B3-4CE8-83AE-C7BAF99AE800@fas.harvard.edu>
	<2023E087846042178215CF9EBDE12C75@NewLife>
	<BB7756FB-7A18-4E23-9DBB-34B6A2FEE27F@fas.harvard.edu>
	<FD5880DBC2054B5FA9E89FB5CA8B139B@NewLife>
Message-ID: <8739E8BC-9586-4EA2-9377-23D13CC6ED20@illinois.edu>

An autogenerated passthrough Makefile.PL is included with the
distribution:

http://cpansearch.perl.org/src/CJFIELDS/BioPerl-1.6.0/Makefile.PL

We may remove that in future releases, but it should work regardless
(i.e. it calls Module::Build and Build.PL).  I'm pretty convinced that the
issue was permissions-based at heart.  Note that Kristina ran 'cpan'
instead of 'sudo cpan' to invoke the shell, so the shell is using the
current user's config instead of su for installation.  You need to use
'sudo' to install anything into /Library/Perl on a Mac (unless you are
already 'root', but on recent OS X versions logging in as 'root' is
turned off).

I just noticed nothing is mentioned along these lines in the  
installation docs, so we'll need to update those.
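
As a rough illustration of the permissions point above, the following
sketch checks whether the current user can write to the directories the
running perl would install site modules, scripts and man pages into. It
uses only core modules; the helper names nearest_existing and
can_install_into are hypothetical, and the directories a particular build
tool actually writes to may differ from these.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Config;
use File::Basename qw(dirname);

# Walk up from $dir until we reach a path that already exists.
# (A missing target such as /usr/local/share has to be created inside
# its nearest existing ancestor, so that is the directory to test.)
sub nearest_existing {
    my ($dir) = @_;
    while (!-e $dir) {
        my $parent = dirname($dir);
        last if $parent eq $dir;    # reached the top, stop
        $dir = $parent;
    }
    return $dir;
}

# True if the current user could create or write files under $dir.
sub can_install_into {
    my ($dir) = @_;
    my $existing = nearest_existing($dir);
    return (-d $existing && -w $existing) ? 1 : 0;
}

for my $key (qw(installsitelib installsitebin
                installsiteman1dir installsiteman3dir)) {
    my $dir = $Config{$key};
    next unless defined $dir && length $dir;
    printf "%-20s %-45s %s\n", $key, $dir,
        can_install_into($dir) ? 'writable' : 'not writable';
}
```

If any line reports 'not writable', either run the install step with sudo
or choose a private location, for example via Module::Build's
--install_base option.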

chris

On May 29, 2009, at 4:08 PM, Mark A. Jensen wrote:

> Hi Kristina,
>
> [Don't forget to reply-all, so the list stays in the loop. Many many  
> more helpers
> there.]
>
> Apparently cpan can't make the Makefile, but can download and expand  
> the
> library directories, in your .cpan directory (see edited highlights  
> below).
>
> Let's appeal to the BioPerl brethren/sestren---answers?
>
> MAJ
>
>
> term dump:
>
> dhcp-0019353043-25-35:BioPerl-1.6.0 kristinafontanez$ cpan
> Terminal does not support AddHistory.
>
> cpan shell -- CPAN exploration and modules installation (v1.7602)
> ReadLine support available (try 'install Bundle::CPAN')
>
> cpan> install Test::Harness
> CPAN: Storable loaded ok
> Going to read /Users/kristinafontanez/.cpan/Metadata
> Database was generated on Fri, 29 May 2009 11:27:00 GMT
> Running install for module Test::Harness
> Running make for A/AN/ANDYA/Test-Harness-3.17.tar.gz
> CPAN: Digest::MD5 loaded ok
> CPAN: Compress::Zlib loaded ok
> Checksum for /Users/kristinafontanez/.cpan/sources/authors/id/A/AN/ 
> ANDYA/Test-Harness-3.17.tar.gz ok
> Scanning cache /Users/kristinafontanez/.cpan/build for sizes
> Test-Harness-3.17/
> Test-Harness-3.17/Build.PL
> ...
> Test-Harness-3.17/xt/perls/sample-tests/
> Test-Harness-3.17/xt/perls/sample-tests/perl_version
> Removing previously used /Users/kristinafontanez/.cpan/build/Test- 
> Harness-3.17
>
> CPAN.pm: Going to build A/AN/ANDYA/Test-Harness-3.17.tar.gz
>
> Checking if your kit is complete...
> Looks good
> Writing Makefile for Test::Harness
>   -- NOT OK
> Running make test
> Can't test without successful make
> Running make install
> make had returned bad status, install seems impossible
>
> cpan> install File::HomeDir
> ...[more of same]...
>
>
> ----- Original Message ----- From: "Kristina Fontanez" <fontanez at fas.harvard.edu 
> >
> To: "Mark A. Jensen" <maj at fortinbras.us>
> Sent: Friday, May 29, 2009 3:56 PM
> Subject: Re: [Bioperl-l] problem with bioperl install
>
>
>> Mr. Jensen-
>>
>> Thank you for your help but unfortunately the installation of
>> Test::Harness etc didn't work. I copied my terminal output and
>> attached the file. Any advice on what's still going wrong?
>>
>> Thanks,
>> Kristina
>>
>
>
> --------------------------------------------------------------------------------
>
>
>>
>>
>>
>>
>> ---------------------------------------------------------------
>> Kristina Fontanez
>> PhD candidate
>> Department of Organismic and Evolutionary Biology
>> Cavanaugh lab
>> Harvard University
>> 16 Divinity Ave.
>> Cambridge, MA 02138
>>
>> tel: 617-495-1138
>> fax: 617-496-6933
>> email: fontanez at fas.harvard.edu
>>
>>
>>
>> On May 29, 2009, at 3:35 PM, Mark A. Jensen wrote:
>>
>> The message says you are first updating your CPAN.pm.
>> That module needs modules you don't have, so
>>
>> use cpan to install the dependencies you don't have, viz.
>>>   Test::Harness
>>>   File::HomeDir
>>
>> $ cpan
>>> install Test::Harness
>> etc.
>> Then install CPAN.pm again (or run the Bioperl install again).
>>
>> Lather, rinse, repeat the install of Bioperl until it completes
>> without errors.
>>
>> ----- Original Message ----- From: "Kristina Fontanez" <fontanez at fas.harvard.edu
>> >
>> To: <bioperl-l at bioperl.org>
>> Sent: Friday, May 29, 2009 3:07 PM
>> Subject: [Bioperl-l] problem with bioperl install
>>
>>
>>> Hello-
>>>
>>> I am trying to install bioperl and I ran into some problems. See
>>> list  below.
>>>
>>>
>>> CPAN.pm: Going to build A/AN/ANDK/CPAN-1.94.tar.gz
>>>
>>> Checking if your kit is complete...
>>> Looks good
>>> Warning: prerequisite File::HomeDir 0.69 not found.
>>> Warning: prerequisite Test::Harness 2.62 not found. We have 2.56.
>>> Writing Makefile for CPAN
>>> ---- Unsatisfied dependencies detected during [A/AN/ANDK/
>>> CPAN-1.94.tar.gz] -----
>>>   Test::Harness
>>>   File::HomeDir
>>>
>>>
>>> How can I fix this?
>>>
>>>
>>> Thanks,
>>> Kristina
>>> ---------------------------------------------------------------
>>> Kristina Fontanez
>>> PhD candidate
>>> Department of Organismic and Evolutionary Biology
>>> Cavanaugh lab
>>> Harvard University
>>> 16 Divinity Ave.
>>> Cambridge, MA 02138
>>>
>>> tel: 617-495-1138
>>> fax: 617-496-6933
>>> email: fontanez at fas.harvard.edu


From cjfields at illinois.edu  Mon Jun  1 12:14:07 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 1 Jun 2009 11:14:07 -0500
Subject: [Bioperl-l] a HOWTO for Tiling
In-Reply-To: <13190185F84E43BDA99993CEB44394C4@NewLife>
References: <13190185F84E43BDA99993CEB44394C4@NewLife>
Message-ID: <6DCD3564-6756-4416-899A-F32DC7310AD2@illinois.edu>

As long as it doesn't significantly impact SearchIO performance-wise
(from reading the HOWTO I can't see how it would), I say commit away.
In fact, I consider this a bug fix that should be in the next 1.6 point
release. We should add deprecation warnings where needed for 1.7...

chris

On Jun 1, 2009, at 9:55 AM, Mark A. Jensen wrote:

> Hi All
> Please peruse http://www.bioperl.org/wiki/HOWTO:Tiling for an  
> exhibition of B::S::Tiling, use cases, code snippets, design,  
> implementation and algorithm discussions. We're just about ready to  
> port over to core from bioperl-dev; please shout out if this is not  
> a good idea.
> cheers and thanks for all input--
> Mark


From dan.bolser at gmail.com  Mon Jun  1 12:27:30 2009
From: dan.bolser at gmail.com (Dan Bolser)
Date: Mon, 1 Jun 2009 17:27:30 +0100
Subject: [Bioperl-l] problem with bioperl install
In-Reply-To: <8739E8BC-9586-4EA2-9377-23D13CC6ED20@illinois.edu>
References: <BCF27AD6-75B3-4CE8-83AE-C7BAF99AE800@fas.harvard.edu>
	<2023E087846042178215CF9EBDE12C75@NewLife>
	<BB7756FB-7A18-4E23-9DBB-34B6A2FEE27F@fas.harvard.edu>
	<FD5880DBC2054B5FA9E89FB5CA8B139B@NewLife>
	<8739E8BC-9586-4EA2-9377-23D13CC6ED20@illinois.edu>
Message-ID: <2c8757af0906010927x2ff3bce7r62522f108b43c414@mail.gmail.com>

2009/6/1 Chris Fields <cjfields at illinois.edu>:

...
> for installation.  You need to use 'sudo' to install anything /Library/Perl
> on Mac (unless you are already 'root', but on recent OS X version logging in
...

local::lib is supposed to take care of this. Is this broken on Mac?
Building stuff as root is generally considered to be bad.


> I just noticed nothing is mentioned along these lines in the installation
> docs, so we'll need to update those.

I tried to write down a clear 'recipe' for getting things installed
(this was actually on the GMod wiki). I really think the install docs
could be improved. Sometimes less verbose is better.

Dan

> chris
>
> On May 29, 2009, at 4:08 PM, Mark A. Jensen wrote:
>
>> Hi Kristina,
>>
>> [Don't forget to reply-all, so the list stays in the loop. Many many more
>> helpers
>> there.]
>>
>> Apparently cpan can't make the Makefile, but can download and expand the
>> library directories, in your .cpan directory (see edited highlights
>> below).
>>
>> Let's appeal to the BioPerl brethren/sestren---answers?
>>
>> MAJ
>>
>>
>> term dump:
>>
>> dhcp-0019353043-25-35:BioPerl-1.6.0 kristinafontanez$ cpan
>> Terminal does not support AddHistory.
>>
>> cpan shell -- CPAN exploration and modules installation (v1.7602)
>> ReadLine support available (try 'install Bundle::CPAN')
>>
>> cpan> install Test::Harness
>> CPAN: Storable loaded ok
>> Going to read /Users/kristinafontanez/.cpan/Metadata
>> Database was generated on Fri, 29 May 2009 11:27:00 GMT
>> Running install for module Test::Harness
>> Running make for A/AN/ANDYA/Test-Harness-3.17.tar.gz
>> CPAN: Digest::MD5 loaded ok
>> CPAN: Compress::Zlib loaded ok
>> Checksum for
>> /Users/kristinafontanez/.cpan/sources/authors/id/A/AN/ANDYA/Test-Harness-3.17.tar.gz
>> ok
>> Scanning cache /Users/kristinafontanez/.cpan/build for sizes
>> Test-Harness-3.17/
>> Test-Harness-3.17/Build.PL
>> ...
>> Test-Harness-3.17/xt/perls/sample-tests/
>> Test-Harness-3.17/xt/perls/sample-tests/perl_version
>> Removing previously used
>> /Users/kristinafontanez/.cpan/build/Test-Harness-3.17
>>
>> CPAN.pm: Going to build A/AN/ANDYA/Test-Harness-3.17.tar.gz
>>
>> Checking if your kit is complete...
>> Looks good
>> Writing Makefile for Test::Harness
>> ?-- NOT OK
>> Running make test
>> Can't test without successful make
>> Running make install
>> make had returned bad status, install seems impossible
>>
>> cpan> install File::HomeDir
>> ...[more of same]...
>>
>>
>> ----- Original Message ----- From: "Kristina Fontanez"
>> <fontanez at fas.harvard.edu>
>> To: "Mark A. Jensen" <maj at fortinbras.us>
>> Sent: Friday, May 29, 2009 3:56 PM
>> Subject: Re: [Bioperl-l] problem with bioperl install
>>
>>
>>> Mr. Jensen-
>>>
>>> Thank you for your help but unfortunately the installation of
>>> Test::Harness etc didn't work. I copied my terminal output and
>>> attached the file. Any advice on what's still going wrong?
>>>
>>> Thanks,
>>> Kristina
>>>
>>
>>
>>
>> --------------------------------------------------------------------------------
>>
>>
>>>
>>>
>>>
>>>
>>> ---------------------------------------------------------------
>>> Kristina Fontanez
>>> PhD candidate
>>> Department of Organismic and Evolutionary Biology
>>> Cavanaugh lab
>>> Harvard University
>>> 16 Divinity Ave.
>>> Cambridge, MA 02138
>>>
>>> tel: 617-495-1138
>>> fax: 617-496-6933
>>> email: fontanez at fas.harvard.edu
>>>
>>>
>>>
>>> On May 29, 2009, at 3:35 PM, Mark A. Jensen wrote:
>>>
>>> The message says you are first updating your CPAN.pm.
>>> That module needs modules you don't have, so
>>>
>>> use cpan to install the dependencies you don't have, viz.
>>>>
>>>>  Test::Harness
>>>>  File::HomeDir
>>>
>>> $ cpan
>>>>
>>>> install Test::Harness
>>>
>>> etc.
>>> Then install CPAN.pm again (or run the Bioperl install again).
>>>
>>> Lather, rinse, repeat the install of Bioperl until it completes
>>> without errors.
>>>
>>> ----- Original Message ----- From: "Kristina Fontanez"
>>> <fontanez at fas.harvard.edu>
>>> To: <bioperl-l at bioperl.org>
>>> Sent: Friday, May 29, 2009 3:07 PM
>>> Subject: [Bioperl-l] problem with bioperl install
>>>
>>>
>>>> Hello-
>>>>
>>>> I am trying to install bioperl and I ran into some problems. See
>>>> list below.
>>>>
>>>>
>>>> CPAN.pm: Going to build A/AN/ANDK/CPAN-1.94.tar.gz
>>>>
>>>> Checking if your kit is complete...
>>>> Looks good
>>>> Warning: prerequisite File::HomeDir 0.69 not found.
>>>> Warning: prerequisite Test::Harness 2.62 not found. We have 2.56.
>>>> Writing Makefile for CPAN
>>>> ---- Unsatisfied dependencies detected during [A/AN/ANDK/
>>>> CPAN-1.94.tar.gz] -----
>>>>  Test::Harness
>>>>  File::HomeDir
>>>>
>>>>
>>>> How can I fix this?
>>>>
>>>>
>>>> Thanks,
>>>> Kristina
>>>> ---------------------------------------------------------------
>>>> Kristina Fontanez
>>>> PhD candidate
>>>> Department of Organismic and Evolutionary Biology
>>>> Cavanaugh lab
>>>> Harvard University
>>>> 16 Divinity Ave.
>>>> Cambridge, MA 02138
>>>>
>>>> tel: 617-495-1138
>>>> fax: 617-496-6933
>>>> email: fontanez at fas.harvard.edu
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>
>>>
>>
>


From cjfields at illinois.edu  Mon Jun  1 13:15:42 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 1 Jun 2009 12:15:42 -0500
Subject: [Bioperl-l] problem with bioperl install
In-Reply-To: <2c8757af0906010927x2ff3bce7r62522f108b43c414@mail.gmail.com>
References: <BCF27AD6-75B3-4CE8-83AE-C7BAF99AE800@fas.harvard.edu>
	<2023E087846042178215CF9EBDE12C75@NewLife>
	<BB7756FB-7A18-4E23-9DBB-34B6A2FEE27F@fas.harvard.edu>
	<FD5880DBC2054B5FA9E89FB5CA8B139B@NewLife>
	<8739E8BC-9586-4EA2-9377-23D13CC6ED20@illinois.edu>
	<2c8757af0906010927x2ff3bce7r62522f108b43c414@mail.gmail.com>
Message-ID: <87C523DD-6F47-4CFF-8907-566A18A0A08E@illinois.edu>


On Jun 1, 2009, at 11:27 AM, Dan Bolser wrote:

> 2009/6/1 Chris Fields <cjfields at illinois.edu>:
>
> ...
>> for installation.  You need to use 'sudo' to install anything
>> /Library/Perl on Mac (unless you are already 'root', but on recent
>> OS X version logging in
> ...
>
> local::lib is supposed to take care of this. Is this broken on Mac?
> Building stuff as root is generally considered to be bad.

You can install to a local lib, yes, but cpan needs to be manually
configured to do this; I don't think it is automatically configured to
do so on OS X, i.e. it defaults to /Library/Perl.

Frankly, I sidestep the whole issue with my own custom perl  
installation, but that's me.

>> I just noticed nothing is mentioned along these lines in the  
>> installation
>> docs, so we'll need to update those.
>
> I tried to write down a clear 'recipe' for getting things installed
> (this was actually on the GMod wiki). I really think the install docs
> could be improved. Sometimes less verbose is better.
>
> Dan

True, but I would much rather have reasonable instructions that  
outline most installation issues than ones that aren't detailed enough.

My thought is to strip the INSTALL doc that comes with BioPerl
down to the essentials and point to the wiki for the more detailed
instructions (including problems encountered).  It's too hard to maintain
both and backport the wiki into plain text.

chris


From maj at fortinbras.us  Mon Jun  1 15:03:05 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 1 Jun 2009 15:03:05 -0400
Subject: [Bioperl-l] a HOWTO for Tiling
In-Reply-To: <6DCD3564-6756-4416-899A-F32DC7310AD2@illinois.edu>
References: <13190185F84E43BDA99993CEB44394C4@NewLife>
	<6DCD3564-6756-4416-899A-F32DC7310AD2@illinois.edu>
Message-ID: <AABEFA992F2345548C861ADDFDC50132@NewLife>

Thanks, Chris--

Bio::Search::Tiling is now ported to core; the snapshot of the ported version is 
in bioperl-dev/tags/tiling-port-to-core-060109.
Bunch o' tests performed by t/SearchIO/Tiling.t; bunch more if one sets 
BIOPERL_TILING_EXHAUSTIVE_TESTS .

Cry 'Havoc!' and let slip the dogs of war...

MAJ

----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "Mark A. Jensen" <maj at fortinbras.us>
Cc: "Sendu Bala" <bix at sendu.me.uk>; "Dave Messina" <dave at davemessina.com>; 
"BioPerl List" <bioperl-l at lists.open-bio.org>
Sent: Monday, June 01, 2009 12:14 PM
Subject: Re: [Bioperl-l] a HOWTO for Tiling


> I think, as long as it doesn't significantly impact SearchIO performance-wise
> (from reading the HOWTO I can't see how it will), I say commit away. In fact,
> I consider this a bug fix that should be in the next 1.6 point release. We
> should add deprecation warnings where needed for 1.7...
>
> chris
>
> On Jun 1, 2009, at 9:55 AM, Mark A. Jensen wrote:
>
>> Hi All
>> Please peruse http://www.bioperl.org/wiki/HOWTO:Tiling for an exhibition of
>> B::S::Tiling, use cases, code snippets, design, implementation and algorithm
>> discussions. We're just about ready to port over to core from bioperl-dev;
>> please shout out if this is not a good idea.
>> cheers and thanks for all input--
>> Mark
>
>
> 


From koenvanderdrift at gmail.com  Mon Jun  1 18:22:23 2009
From: koenvanderdrift at gmail.com (Koen van der Drift)
Date: Mon, 1 Jun 2009 18:22:23 -0400
Subject: [Bioperl-l] problem with bioperl install
In-Reply-To: <87C523DD-6F47-4CFF-8907-566A18A0A08E@illinois.edu>
References: <BCF27AD6-75B3-4CE8-83AE-C7BAF99AE800@fas.harvard.edu>
	<2023E087846042178215CF9EBDE12C75@NewLife>
	<BB7756FB-7A18-4E23-9DBB-34B6A2FEE27F@fas.harvard.edu>
	<FD5880DBC2054B5FA9E89FB5CA8B139B@NewLife>
	<8739E8BC-9586-4EA2-9377-23D13CC6ED20@illinois.edu>
	<2c8757af0906010927x2ff3bce7r62522f108b43c414@mail.gmail.com>
	<87C523DD-6F47-4CFF-8907-566A18A0A08E@illinois.edu>
Message-ID: <2E5C7781-D115-415F-BA28-120613B221C3@gmail.com>


On Jun 1, 2009, at 1:15 PM, Chris Fields wrote:

> My thought is to strip down the INSTALL doc that comes with BioPerl  
> down to the essentials and point to the wiki for the more detailed  
> ones (including problems encountered).  It's too hard to maintain  
> both and backport the wiki into plain text.


Good idea, please then also update the file PLATFORMS. It has a link  
to a very outdated website for the installation of bioperl on OS X.  
And maybe a line + link to the bioperl wiki can be added that  
recommends the use of fink as an alternative to cpan?

cheers,

- Koen.


From cjfields at illinois.edu  Mon Jun  1 19:27:32 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 1 Jun 2009 18:27:32 -0500
Subject: [Bioperl-l] problem with bioperl install
In-Reply-To: <2E5C7781-D115-415F-BA28-120613B221C3@gmail.com>
References: <BCF27AD6-75B3-4CE8-83AE-C7BAF99AE800@fas.harvard.edu>
	<2023E087846042178215CF9EBDE12C75@NewLife>
	<BB7756FB-7A18-4E23-9DBB-34B6A2FEE27F@fas.harvard.edu>
	<FD5880DBC2054B5FA9E89FB5CA8B139B@NewLife>
	<8739E8BC-9586-4EA2-9377-23D13CC6ED20@illinois.edu>
	<2c8757af0906010927x2ff3bce7r62522f108b43c414@mail.gmail.com>
	<87C523DD-6F47-4CFF-8907-566A18A0A08E@illinois.edu>
	<2E5C7781-D115-415F-BA28-120613B221C3@gmail.com>
Message-ID: <98605D05-706B-4ACB-B444-4F0A9CEC879D@illinois.edu>


On Jun 1, 2009, at 5:22 PM, Koen van der Drift wrote:

>
> On Jun 1, 2009, at 1:15 PM, Chris Fields wrote:
>
>> My thought is to strip down the INSTALL doc that comes with BioPerl  
>> down to the essentials and point to the wiki for the more detailed  
>> ones (including problems encountered).  It's too hard to maintain  
>> both and backport the wiki into plain text.
>
>
> Good idea, please then also update the file PLATFORMS. It has a link  
> to a very outdated website for the installation of bioperl on OS X.  
> And maybe a line + link to the bioperl wiki can be added that  
> recommends the use of fink as an alternative to cpan?
>
> cheers,
>
> - Koen.

Done. I've added a ticket on bugzilla for tracking this so it doesn't  
get lost:

http://bugzilla.open-bio.org/show_bug.cgi?id=2846

chris


From shalabh.sharma7 at gmail.com  Tue Jun  2 10:44:25 2009
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Tue, 2 Jun 2009 10:44:25 -0400
Subject: [Bioperl-l] Refseq Hits
Message-ID: <9fcc48c70906020744x1301136fm852443d1fe96941b@mail.gmail.com>

Hi All,
This is not really a BioPerl query, but I am really confused and
need some help.
I blasted some sequences against the RefSeq database (locally). After parsing
the blast results I noticed that some description fields contain two hit
names, like:
hit_name ->    gi|71082715|ref|YP_265434.1|
Description -> ubiquitin binding protein [Candidatus Pelagibacter ubique
HTCC1062] gi|91762865|ref|ZP_01264830.1| possible ubiquitin binding protein
[Candidatus Pelagibacter ubique HTCC1002]

So besides giving me the description for hit_name (HTCC1062), it's also
giving me the one for HTCC1002.
I would really appreciate it if someone could help me out.

Thanks
Shalabh
_________________________________________________
Shalabh Sharma
Scientific Computing Professional Associate
Department of Marine Sciences
University of Georgia
Athens, GA 30602-3636

phone: 706-542-0341
email: ssharmai at uga.edu


From jonathancrabtree at gmail.com  Tue Jun  2 11:04:33 2009
From: jonathancrabtree at gmail.com (Jonathan Crabtree)
Date: Tue, 2 Jun 2009 11:04:33 -0400
Subject: [Bioperl-l] Refseq Hits
In-Reply-To: <9fcc48c70906020744x1301136fm852443d1fe96941b@mail.gmail.com>
References: <9fcc48c70906020744x1301136fm852443d1fe96941b@mail.gmail.com>
Message-ID: <8e5b8bf80906020804s5e8f1737je6539365c38a9226@mail.gmail.com>

Hi Shalabh-

I believe RefSeq is a non-redundant database, in which sequence entries with
identical sequences are merged and their descriptions are concatenated in
the FASTA defline.  If you look up the two accession numbers/gi numbers from
your search results I think you'll see that both are valid matches because
their polypeptide sequences are identical:

http://www.ncbi.nlm.nih.gov/protein/71082715
http://www.ncbi.nlm.nih.gov/protein/91762865

You're just getting a single match with two descriptions instead of two
matches with one description each, but the sequence is the same and so,
therefore, are the blast alignments.

Jonathan



From shalabh.sharma7 at gmail.com  Tue Jun  2 11:15:45 2009
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Tue, 2 Jun 2009 11:15:45 -0400
Subject: [Bioperl-l] Refseq Hits
In-Reply-To: <8e5b8bf80906020804s5e8f1737je6539365c38a9226@mail.gmail.com>
References: <9fcc48c70906020744x1301136fm852443d1fe96941b@mail.gmail.com>
	<8e5b8bf80906020804s5e8f1737je6539365c38a9226@mail.gmail.com>
Message-ID: <9fcc48c70906020815x2295ee9ay8023d521a50238ca@mail.gmail.com>

Hi Jonathan,
Your information is really helpful. Thanks a lot.

-Shalabh




From tristan.lefebure at gmail.com  Tue Jun  2 12:24:21 2009
From: tristan.lefebure at gmail.com (Tristan Lefebure)
Date: Tue, 2 Jun 2009 12:24:21 -0400
Subject: [Bioperl-l] Creating a fastq format file?
In-Reply-To: <ddde1f420904270238w2bad577fq49def99607597793@mail.gmail.com>
References: <2c8757af0904220349v4e4a9b89xe374a46f9e82cfdc@mail.gmail.com>
	<ddde1f420904262242s533bd5abqeb9db75463d5a8f2@mail.gmail.com>
	<ddde1f420904270238w2bad577fq49def99607597793@mail.gmail.com>
Message-ID: <200906021224.21439.tristan.lefebure@gmail.com>

On Monday 27 April 2009 05:38:40 Heikki Lehvaslaiho wrote:
> I convinced at least myself to the degree that I wrote
> the range_convert() method - with plenty of tests. I
> mention this now so that no-one else need to start
> thinking through all the edge values.
>
> :)
>
> I'll contribute it to the code base once there is a
> consensus of best way forward.
>

Heikki,

This thread has been quiet for a while, but I don't see 
anything new in Bio::Seq::Quality. Did we reach a consensus 
or are you waiting for some more discussion on the subject?

(I'm pretty impatient to see bioperl handling both sanger 
and illumina ranges on the fly!)

--Tristan

>     -Heikki
>
> 2009/4/27 Heikki Lehvaslaiho <heikki.lehvaslaiho at gmail.com>:
> >> I have tried to summarise this in a central place:
> >> http://en.wikipedia.org/wiki/FASTQ_format
> >
> > Torsten,
> >
> > Thanks for putting this together. Very helpful.
> >
> > Do you have a plan of action?  Let me propose one for
> > BioPerl. It is based on the following assumptions:
> >
> > 1. There is a multitude of different ways of coding
> >    quality values out there.
> > 2. Bio::Seq::Quality is agnostic of any quality value
> >    range rules.
> > 3. The emerging open standard is the Sanger fastq
> >    specification.
> > 4. Open source programs use the Sanger fastq specs.
> >
> >
> > From these it follows that:
> >
> >
> > 1. BioPerl should support Sanger fastq standard
> >
> > 1.1. it already does and there are other SeqIO modules
> > for dealing with other non-fastq formats.
> >
> > 2. BioPerl should offer simple ways of converting
> > between quality range rules
> >
> > 2.1. Have a generic method accessible from
> > Bio::Seq::Quality with preset versions of the method
> > for converting between known variants (Sanger fastq and
> > the two Illumina versions)
> >
> > For example:
> >
> > range_convert($from_lower, $from_upper, $to_lower,
> >               $to_upper, $value)
> >   throw if $value < $from_lower or $value > $from_upper
> >   return $newvalue
> >
> > range_convert_illumina2fastq(),
> > range_convert_fastq2illumina(),
> > range_convert_fastq2phred(),
> >  range_convert_phred2fastq()....
> >
> > (assuming that illumina 1.3 eq phred)
> >
> > 2.2. Bio::SeqIO::Fastq::next_seq methods should convert
> > Illumina qualities into Sanger fastq on the fly
> >
> > 2.2.1 Bio::SeqIO::Fastq::next_seq should detect the
> > incoming stream of quality value range either
> > automatically or be given a keyword parameter
> > indicating the range.
> >
> > 2.2.2. Bio::SeqIO::Fastq::next_seq should throw an
> > error if it detects a quality value out of range.
> >
> > 2.2.3. Bio::SeqIO::Fastq::write_seq should throw an
> > error if it detects a quality value out of range.
> >
> > 2.2.4. It would be useful but not absolutely necessary
> > for Bio::SeqIO::Fastq::write_seq to be able to write
> > out in Illumina ranges
> >
> >
> > What do you think?
> >
> >    -Heikki
> >
> > 2009/4/26 Torsten Seemann <torsten.seemann at infotech.monash.edu.au>:
> >>> > This might be a good place to ask the question:
> >>> > having looked at the fastq.pm page, is the fastq
> >>> > format defined (only) by a "@" header followed by a
> >>> > sequence line and a "+" header followed by a
> >>> > quality line and the two headers have to agree? Now
> >>> > that Illumina is using phred scaling, are 'Sanger'
> >>> > and 'Illumina' versions the same?
> >>>
> >>> No they aren't the same, Illumina still encodes the
> >>> ascii as value + 64 and Sanger as value + 33.
> >>
> >> Illumina have now CHANGED how they calculate the
> >> quality value however in the last month or so... Their
> >> Q range used to be -5..40 mapped to ASCII 64+, but now
> >> they produce Q >= 0 and it is unclear if they start at
> >> 69 or 64 now...
> >>
> >> I have tried to summarise this in a central place:
> >>
> >> http://en.wikipedia.org/wiki/FASTQ_format
> >>
> >> Corrections welcome!
> >>
> >>
> >> --Torsten Seemann
> >> --Victorian Bioinformatics Consortium, Dept.
> >> Microbiology, Monash University, AUSTRALIA
> >
> > --
> >    -Heikki
> > Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
> > cell: +27 (0)714328090
> > Sent from Claremont, WC, South Africa


From Russell.Smithies at agresearch.co.nz  Tue Jun  2 16:56:26 2009
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Wed, 3 Jun 2009 08:56:26 +1200
Subject: [Bioperl-l] Refseq Hits
In-Reply-To: <9fcc48c70906020815x2295ee9ay8023d521a50238ca@mail.gmail.com>
References: <9fcc48c70906020744x1301136fm852443d1fe96941b@mail.gmail.com>
	<8e5b8bf80906020804s5e8f1737je6539365c38a9226@mail.gmail.com>
	<9fcc48c70906020815x2295ee9ay8023d521a50238ca@mail.gmail.com>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32493EB1D18@exchsth.agresearch.co.nz>

The identifiers are separated by a Ctrl-A char ("\001") in the original non-redundant fasta header so you should be able to split them up again - assuming BioPerl didn't munge them.
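
A rough sketch of that split in core Perl, for anyone who wants to try it.
The function name is made up for the example, and it assumes you have the
full defline (first identifier included). If the Ctrl-A characters have
already been replaced by spaces, it falls back to splitting in front of each
"gi|<number>|" token, which is a guess about the layout rather than a
guaranteed rule.

```perl
use strict;
use warnings;

# Split a merged non-redundant defline into one record per original entry.
# Assumption: entries are joined by Ctrl-A ("\001"); if none is present,
# split on whitespace that is followed by a "gi|<digits>|" identifier.
sub split_defline {
    my ($defline) = @_;
    my @entries = $defline =~ /\001/
        ? split(/\001/, $defline)
        : split(/\s+(?=gi\|\d+\|)/, $defline);
    return map {
        # first whitespace-delimited token is the identifier, rest is text
        my ($id, $desc) = split /\s+/, $_, 2;
        +{ id => $id, desc => defined $desc ? $desc : '' };
    } @entries;
}

my $defline =
    "gi|71082715|ref|YP_265434.1| ubiquitin binding protein "
  . "[Candidatus Pelagibacter ubique HTCC1062]"
  . "\001"
  . "gi|91762865|ref|ZP_01264830.1| possible ubiquitin binding protein "
  . "[Candidatus Pelagibacter ubique HTCC1002]";

printf "%s => %s\n", $_->{id}, $_->{desc} for split_defline($defline);
```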

--Russell



From maj at fortinbras.us  Tue Jun  2 17:05:03 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 2 Jun 2009 17:05:03 -0400
Subject: [Bioperl-l] Bio::Search::Tiling
Message-ID: <B006036D760941179148C9F8E2AD7E05@NewLife>

All-
Bio::Search::Tiling is now in bioperl-live, passes all tests.
Thanks, 
Mark


From shalabh.sharma7 at gmail.com  Wed Jun  3 13:27:59 2009
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Wed, 3 Jun 2009 13:27:59 -0400
Subject: [Bioperl-l] gbf to gff
Message-ID: <9fcc48c70906031027h3381a93fya349fba8e5ba464c@mail.gmail.com>

Hi all,
I am working on Roseobacters. Many times I've converted a gbk file
from GenBank to gff format, but now one genome,
"Silicibacter lacuscaerulensis", does not have a gbk file; instead it has two
gbf files:

https://research.venterinstitute.org/moore/SingleOrganism.do?speciesTag=SL1157&pageAttr=pageMain

So how can I convert this genome to one gff file so that I can use it in
gbrowse?
I would really appreciate it if anyone could help me out.

Thanks


From scott at scottcain.net  Wed Jun  3 14:11:54 2009
From: scott at scottcain.net (Scott Cain)
Date: Wed, 3 Jun 2009 14:11:54 -0400
Subject: [Bioperl-l] gbf to gff
In-Reply-To: <9fcc48c70906031027h3381a93fya349fba8e5ba464c@mail.gmail.com>
References: <9fcc48c70906031027h3381a93fya349fba8e5ba464c@mail.gmail.com>
Message-ID: <536f21b00906031111l4b02a846o6f281c536b77460d@mail.gmail.com>

Hi Shalabh,

Do you want them combined onto a single reference sequence?  I'm
guessing this is a circular microbial genome in two segments.  Do you
know how the coordinates in one GenBank file relate to the other
(or are you willing to make something up)?  I imagine the way I would
do it would be to convert both files to gff and then write a quick
script to convert the coordinates and reference sequence name (column
1) of one file to be consistent with the other.
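
A minimal sketch of such a script in core Perl, under the assumption that
the second segment simply follows the first at a fixed offset. The
reference name 'SL1157' and the offset of 2,000,000 are placeholders made up
for the example; the real values depend on how the two segments relate.

```perl
use strict;
use warnings;

# Move one GFF feature line onto another reference sequence.
# $new_ref replaces column 1; $offset is added to start (column 4)
# and end (column 5). Comment lines and blank lines pass through.
sub shift_gff_line {
    my ($line, $new_ref, $offset) = @_;
    chomp $line;
    return $line if $line =~ /^#/ || $line !~ /\t/;
    my @col = split /\t/, $line, -1;   # -1 keeps trailing empty fields
    $col[0]  = $new_ref;
    $col[3] += $offset;
    $col[4] += $offset;
    return join "\t", @col;
}

# Hypothetical feature from the second file, shifted onto the first.
my $in = join "\t",
    'segment2', 'GenBank', 'gene', 100, 900, '.', '+', '.', 'ID=gene1';
print shift_gff_line($in, 'SL1157', 2_000_000), "\n";
```

In practice this would run over every line of the second gff file, with the
output appended to the first. Directives that carry coordinates, such as
##sequence-region, are passed through untouched here and would need fixing
by hand.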

Scott




-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From alperyilmaz at gmail.com  Fri Jun  5 14:50:46 2009
From: alperyilmaz at gmail.com (Alper Yilmaz)
Date: Fri, 5 Jun 2009 14:50:46 -0400
Subject: [Bioperl-l] GBroswe2 - feature details
Message-ID: <dac81b0d0906051150o116c8bcvd747b71189d7a722@mail.gmail.com>

Dear all,

I have a question about utilizing the tag/value pairs used in the
9th column of GFF. If my 9th column is like this:

ID=BS_1;BS_Seq=cacatg;BS_Color=Purple;Name=AtMYC2 BS in RD22

How can I use the BS_Seq and BS_Color tags, say, in a balloon? If I want to
print the name and sequence of a BindingSite, what do I need to put in place
of the question marks below?

balloon hover = <font size=small color=red>Motif name: $name,
Sequence: ???????</font>


The manual mentions that it's possible to use user-defined
tag/value pairs, but I couldn't figure out how. The manual says that
 [feature_type:details]
 tag1 = formatting rule
 tag2 = formatting rule
 tag3 = formatting rule

can be used to adjust the formatting of a tag, but I don't see how this can be
used to assign a value to a tag. I tried:
[cis-elements:details]
bs_seq = <b>$value</b>     (I didn't use BS_Seq, since it was
mentioned that tags are case-insensitive)
 OR
$bs_seq = <b>$value</b>

but I cannot use $bs_seq in the hover link option after doing this. What
am I doing wrong?
thanks,

Alper Yilmaz
Post-doctoral Researcher
Plant Biotechnology Center
The Ohio State University
1060 Carmack Rd
Columbus, OH 43210
(614)688-4954
www.grassius.org


From cjfields at illinois.edu  Fri Jun  5 16:43:04 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 5 Jun 2009 15:43:04 -0500
Subject: [Bioperl-l] [Bioperl-guts-l] Bug in genbank.pm?
In-Reply-To: <52A1BBE501D0BD40AF1EC72C14EA255D0BE06A@MAILBOX-31.home.ku.edu>
References: <002b01c9e567$e09b0de0$a1d129a0$@edu>
	<A145C0B1-D2B3-47CB-BA46-DCCDD693D05F@illinois.edu>
	<52A1BBE501D0BD40AF1EC72C14EA255D0BE06A@MAILBOX-31.home.ku.edu>
Message-ID: <C29B8160-5682-48AF-BD9E-A5FF26EC679F@illinois.edu>

(Just so this is going to the correct list)

Marcos,

I'll look into it.  This may have been fixed in between the releases,  
though.

There isn't a PPM available for 1.6 yet (several prereqs were missing  
at the time of the 1.6 release, such as Graphviz and so on).  A bug  
report is in the queue for this, though, as a reminder.  I think those  
are now available, though, so we should *theoretically* be capable of  
getting a PPM ready.  I say 'theoretically' b/c I don't have easy  
access to a PC running Windows (I have moved to OS X).  I'll see what  
I can do about that in the next few weeks.

In the meantime, if you need it you can download 1.6 or the 'nightly  
build' version (nightly snapshots of svn code) and add it to PERL5LIB  
or "use lib 'PATH_TO_BIOPERL';" in your scripts; it should work.
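
A minimal sketch of the "use lib" route, assuming the download was unpacked
to a hypothetical directory such as /home/me/src/bioperl-live (substitute
your own path; it must be the directory that contains the Bio/ folder). The
snippet only shows where Perl will look first and does not load BioPerl
itself.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical location of the unpacked 1.6 release or nightly snapshot.
use lib '/home/me/src/bioperl-live';

# 'use lib' puts the directory at the front of @INC at compile time,
# so modules found there win over an older copy installed via PPM.
my $first = $INC[0];
print "Perl will search $first for Bio::* modules first\n";
```

Setting PERL5LIB to the same directory in the environment has a similar
effect without editing each script.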

Nightly builds:

http://bioperl.org/DIST/nightly_builds/

chris

On Jun 4, 2009, at 10:17 PM, Barbeitos, Marcos wrote:

> OK, I attached the first record for both files.  These are GenBank
> flat files that were emailed to us and transferred from Macs to PCs,
> so I am not sure if the encoding/line terminations got messed up at
> some point.  I converted the line terminations to Unix and the
> encoding to Western European Windows; still, it didn't work. It may be
> worth mentioning that BioEdit did understand the format after I
> fixed the encoding.
>
> The data was erased because my boss is kind of finicky about sharing  
> information.  However, I tested the files attached to this email and  
> got the same results.
>
> I am still using Bio-Perl 1.5.2_100 in a PC, PPM has not flagged the  
> availability of an upgrade from CPAN, are you releasing the PPD as  
> well?
>
> Thanks!
>
> -----Original Message-----
> From: Chris Fields [mailto:cjfields at illinois.edu]
> Sent: Thu 6/4/2009 8:05 PM
> To: Barbeitos, Marcos
> Cc: bioperl-guts-l at lists.open-bio.org
> Subject: Re: [Bioperl-guts-l] Bug in genbank.pm?
>
> Marcos,
>
> We need the GenBank file (or the accession) you are attempting to
> parse.  Also, what version are you using?  We have released v. 1.6 on
> CPAN, and I intend on releasing 1.6.1 soon.
>
> chris
>
> On Jun 4, 2009, at 5:57 PM, Marcos S. Barbeitos wrote:
>
>> Hello.  I am trying to parse the info from GenBank flat files using
>> Bio::SeqIO.  I have two files which are virtually identical, and one of
>> them gets parsed just fine.  However, in the case of the other, the
>> program croaks when trying to parse the features and gives me:
>>
>>
>>
>> -------------------- WARNING ---------------------
>>
>> MSG: Unexpected error in feature table for  Skipping feature, attempting to recover
>>
>> ---------------------------------------------------
>>
>>
>>
>> I noticed that it does that after it reads the entry '/organism' in
>> Features.  The only difference I can see between the two files is the
>> presence of the feature '/organelle' and of the line BASE COUNT in one of
>> them, but the error persists even after I remove these lines.  Apart from
>> that, there is the number of white spaces that precede the beginning of
>> each line.   Any ideas?
>>
>>
>>
>> Thanks!
>>
>>
>>
>> Marcos S. Barbeitos
>>
>> Post-Doc Fellow
>>
>> The University of Kansas
>> Department of Ecology and Evolutionary Biology
>> 2041 Haworth Hall
>> 1200 Sunnyside Avenue
>> Lawrence, Kansas 66045
>> p: 785.864.5887
>> f: 785.864.5860
>>
>>
>>
>> _______________________________________________
>> Bioperl-guts-l mailing list
>> Bioperl-guts-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-guts-l
>
>
>
> <BioPerlTest.gb>


From Russell.Smithies at agresearch.co.nz  Sun Jun  7 16:32:27 2009
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Mon, 8 Jun 2009 08:32:27 +1200
Subject: [Bioperl-l] GBroswe2 - feature details
In-Reply-To: <dac81b0d0906051150o116c8bcvd747b71189d7a722@mail.gmail.com>
References: <dac81b0d0906051150o116c8bcvd747b71189d7a722@mail.gmail.com>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32493F1CA41@exchsth.agresearch.co.nz>

For the first part of your question, you can use a sub to access values in your annotations:

balloon hover = sub{my $f = shift;
			my %a = $f->attributes;
			my $name = $f->name;
			my $seq = $a{'BS_Seq'};
			return "<font size=small color=red>Motif name: $name, Sequence: $seq</font>" if defined $seq;
			return "<font size=small color=red>Motif name: $name, No sequence defined</font>";
			}


For the second bit, here are the formatting rules I'm using to create hyperlinks:

[Dbxref:DETAILS]
URL = sub {
      my ($tag,$value)=@_;
      if ($value =~ /NCBI_gi:(.+)/){
       return "http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?val=$1";
       }
      if ($value =~ /NCBI_Gene:(.+)/){
       return "http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=Retrieve&db=gene&list_uids=$1";
       }
       return;
     }

And this is what the gff looks like:
BTA10	refseq	mRNA	10011147	10176454	0	-	.	ID=NM_001076052;Name=NM_001076052;Index=1;Alias=HOMER1;Note=homer homolog 1 (Drosophila);Dbxref=NCBI_gi:115496957;Dbxref=NCBI_Gene:535311;
BTA10	refseq	mRNA	10241506	10301142	0	+	.	ID=NM_001046361;Name=NM_001046361;Index=1;Alias=PAPD4,MGC138008;Note=PAP associated domain containing 4;Dbxref=NCBI_gi:114052221;Dbxref=NCBI_Gene:533862;
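
To check the URL rule outside GBrowse, something like the following could be used. It is only a sketch: the sub name dbxref_url and the shortened column 9 string are made up for illustration, and the regexes are copied from the rule above.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the URL rule above: map one Dbxref value
# to an NCBI link, or return nothing when the prefix is not recognised.
sub dbxref_url {
    my ($tag, $value) = @_;
    return unless defined $value;
    if ($value =~ /NCBI_gi:(.+)/) {
        return "http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?val=$1";
    }
    if ($value =~ /NCBI_Gene:(.+)/) {
        return "http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=Retrieve&db=gene&list_uids=$1";
    }
    return;
}

# Pull the Dbxref values out of a GFF3 column 9 string and print their links.
# The string is a shortened copy of the first GFF line above.
my $col9 = 'ID=NM_001076052;Name=NM_001076052;Dbxref=NCBI_gi:115496957;Dbxref=NCBI_Gene:535311;';
for my $pair (split /;/, $col9) {
    my ($key, $val) = split /=/, $pair, 2;
    next unless defined $key && $key eq 'Dbxref';
    my $url = dbxref_url($key, $val);
    print "$val -> ", (defined $url ? $url : '(no link)'), "\n";
}
```

Running this prints one line per Dbxref value, which makes it easy to see whether a new prefix needs its own branch before editing the GBrowse config.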

Hopefully, this will get you going :-)


Russell Smithies 

Bioinformatics Applications Developer 
T +64 3 489 9085 
E russell.smithies at agresearch.co.nz

Invermay Research Centre
Puddle Alley,
Mosgiel,
New Zealand
T +64 3 489 3809
F +64 3 489 9174
www.agresearch.co.nz 


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Alper Yilmaz
> Sent: Saturday, 6 June 2009 6:51 a.m.
> To: BioPerl List
> Subject: [Bioperl-l] GBroswe2 - feature details
> 
> Dear all,
> 
> I have a question about utilizing the tag/value pairs that were used
> in 9th of GFF. If my 9th column is like this:
> 
> ID=BS_1;BS_Seq=cacatg;BS_Color=Purple;Name=AtMYC2 BS in RD22
> 
> How can I use BS_Seq, BS_Color tags, say, in a balloon? If want to
> print name and sequence of a BindingSite, what do I need to replace
> question marks below?
> 
> balloon hover = <font size=small color=red>Motif name: $name,
> Sequence: ???????</font>
> 
> 
> The manual is mentioning that it's possible to use user defined
> tag/value pairs, but I couldn't figure out how. The manual is
> mentioning:
>  [feature_type:details]
>  tag1 = formatting rule
>  tag2 = formatting rule
>  tag3 = formatting rule
> 
> can be used to adjust formatting of a tag, but I don't how this can be
> used to assign value to a tag? I tried ;
> [cis-elements:details]
> bs_seq = <b>$value</b>     (I didn't use BS_Seq, since it was
> mentioned, tags are case-insensitive)
>  OR
> $bs_seq = <b>$value</b>
> 
> but, I cannot use $bs_seq in hover link option after doing this. What
> am I doing wrong?
> 
> thanks,
> 
> Alper Yilmaz
> Post-doctoral Researcher
> Plant Biotechnology Center
> The Ohio State University
> 1060 Carmack Rd
> Columbus, OH 43210
> (614)688-4954
> www.grassius.org
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From bernd.jagla at pasteur.fr  Mon Jun  8 12:24:12 2009
From: bernd.jagla at pasteur.fr (Bernd Jagla)
Date: Mon, 8 Jun 2009 18:24:12 +0200
Subject: [Bioperl-l] Bio:Das 1.11 installation problem
Message-ID: <7A76102A29824B1C9DDB7F3B1DC45D41@zillumina>

Hi,

I am working on a Mac (OS X 10.5.7) and trying to install Bio::Das 1.11 using

perl -MCPAN -e 'install Bio::Das'

This is perl, v5.8.9 built for darwin-2level
(please let me know if you need anything else).

I get the following error:

not ok 3
not ok 4
Can't call method "description" on an undefined value at t/01das.t line 62.

When I go into the source of 01das.t and print out $db, I get:

$VAR1 = \bless( {
                   'autotypes' => undef,
                   'default_dsn' => undef,
                   'autocategories' => undef,
                   'sockets' => {},
                   'aggregators' => [
                                      bless( {
                                               'sub_parts' => [
                                                                'coding_exon'
                                                              ],
                                               'require_whole_object' => undef,
                                               'main_method' => 'CDS',
                                               'method' => 'alignment'
                                             }, 'Bio::DB::GFF::Aggregator' ),
                                      bless( {
                                               'sub_parts' => [
                                                                'EST_match'
                                                              ],
                                               'require_whole_object' => undef,
                                               'main_method' => 'alignment',
                                               'method' => 'alignment'
                                             }, 'Bio::DB::GFF::Aggregator' )
                                    ],
                   'timeout' => undef,
                   'oldstyle_api' => 1,
                   'default_server' => 'http://www.wormbase.org/db/seq/das'
                 }, 'Bio::Das' );

@sources is empty, and test(3,@sources) fails.

Please advise.

Thanks,

Bernd

 
From lincoln.stein at gmail.com  Mon Jun  8 13:00:48 2009
From: lincoln.stein at gmail.com (Lincoln Stein)
Date: Mon, 8 Jun 2009 13:00:48 -0400
Subject: [Bioperl-l] Bio:Das 1.11 installation problem
In-Reply-To: <7A76102A29824B1C9DDB7F3B1DC45D41@zillumina>
References: <7A76102A29824B1C9DDB7F3B1DC45D41@zillumina>
Message-ID: <6dce9a0b0906081000j7c489fd3t3c5974794ad3e3e7@mail.gmail.com>

Hi,

The regression tests require an active Internet connection, as well as the
DAS test server being up and running. It may be there was a temporary
failure of one of those two. I just tested on my end and the regression
tests ran ok, so could you try it again?

Lincoln



-- 
Lincoln D. Stein
Director, Informatics and Biocomputing Platform
Ontario Institute for Cancer Research
101 College St., Suite 800
Toronto, ON, Canada M5G0A3
416 673-8514
Assistant: Renata Musa <Renata.Musa at oicr.on.ca>


From lsbrath at gmail.com  Mon Jun  8 16:28:46 2009
From: lsbrath at gmail.com (lsbrath at gmail.com)
Date: Mon, 08 Jun 2009 20:28:46 +0000
Subject: [Bioperl-l] fasta conversion
Message-ID: <000e0cd6aa4cd53993046bdc1675@google.com>

Hello!

I am running into trouble while trying to convert a text file to FASTA. It
should be simple enough, but I am getting a weird error message.

This is my script:

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
use File::Copy;
use Bio::SeqIO;


my $maid_dir = "C:/Documents and Settings/mgavi.brathwaite/Desktop/msa";
my $maid = '13063';

opendir my $dh, "$maid_dir"; # directory to search
my @files = readdir $dh;
#find the _fasta file
for my $f (@files){
    my $fa = $maid_dir."/".$maid."_hu_1kb.fa";
    my $r = $maid_dir."/".$maid."_hu_1kb.txt";
    open (my $in,$r);
    if($f=~ m/^(\d+)_hu_1kb/){ # convert to fasta

        print Dumper($f);
        my $hu_1kb = $maid.'_hu_1kb'; #file to convert
        my $in = Bio::SeqIO->new(-file => $r,
                                 -format => 'raw');
        my $out = Bio::SeqIO->new(-file => ">$fa",
                                  -format => 'Fasta');
        while ( my $seq = $in->next_seq()) {
            $out->write_seq($seq);
        }
    }
}

I keep getting the following error message:

-------------------- WARNING ---------------------
MSG: seq doesn't validate, mismatch is 13063
---------------------------------------------------

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Attempting to set the sequence to [13063HU] which does not look healthy
STACK: Error::throw
STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359
STACK: Bio::PrimarySeq::seq C:/Perl/site/lib/Bio/PrimarySeq.pm:258
STACK: Bio::PrimarySeq::new C:/Perl/site/lib/Bio/PrimarySeq.pm:210
STACK: Bio::Seq::new C:/Perl/site/lib/Bio/Seq.pm:484
STACK: Bio::Seq::SeqFactory::create C:/Perl/site/lib/Bio/Seq/SeqFactory.pm:116
STACK: C:/Perl/site/lib/Bio\SeqIO\raw.pm:119
-----------------------------------------------------------

Anyone out there that can help me solve this?


From kjaja27 at yahoo.com  Fri Jun  5 19:42:13 2009
From: kjaja27 at yahoo.com (kayj)
Date: Fri, 5 Jun 2009 16:42:13 -0700 (PDT)
Subject: [Bioperl-l]  finding SNPs in a given region
Message-ID: <23897107.post@talk.nabble.com>


Hi All,

Is there a way to find the SNPs in a given region? I have the start and end
base-pair positions, and I would like to download the SNPs in several
different regions. Is that possible?
This is my first time using BioPerl, and any help will be greatly
appreciated.

Thanks



From kjaja27 at yahoo.com  Mon Jun  8 09:49:24 2009
From: kjaja27 at yahoo.com (kayj)
Date: Mon, 8 Jun 2009 06:49:24 -0700 (PDT)
Subject: [Bioperl-l]  How to extract SNPs
Message-ID: <23924432.post@talk.nabble.com>


Hi All,
I have several regions on the genome, each defined by its start and end
base-pair positions. I am looking into using HapMap
(http://hapmart.hapmap.org/BioMart/martview) to extract the SNPs in these
regions for a given population. I am new to BioPerl, and any help will be
greatly appreciated.




From bernd at pasteur.fr  Mon Jun  8 16:31:57 2009
From: bernd at pasteur.fr (bernd at pasteur.fr)
Date: Mon, 8 Jun 2009 22:31:57 +0200 (CEST)
Subject: [Bioperl-l] Bio:Das 1.11 installation problem
In-Reply-To: <6dce9a0b0906081000j7c489fd3t3c5974794ad3e3e7@mail.gmail.com>
References: <7A76102A29824B1C9DDB7F3B1DC45D41@zillumina>
	<6dce9a0b0906081000j7c489fd3t3c5974794ad3e3e7@mail.gmail.com>
Message-ID: <47482.157.99.64.103.1244493117.squirrel@php.pasteur.fr>

I tested the connection with wget and everything works fine.
I suspect that our proxy might be the problem, but all the variables are set
correctly (ftp_proxy, http_proxy and many more). I am not sure which
environment variables are being used...
I am not too familiar with all this and don't know where to look for the
right configuration.

Thanks,

Bernd

> Hi,
>
> The regression tests require an active Internet connection, as well as the
> DAS test server being up and running. It may be there was a temporary
> failure of one of those two. I just tested on my end and the regression
> tests ran ok, so could you try it again?
>
> Lincoln


From maj at fortinbras.us  Mon Jun  8 17:12:03 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 8 Jun 2009 17:12:03 -0400
Subject: [Bioperl-l] fasta conversion
In-Reply-To: <000e0cd6aa4cd53993046bdc1675@google.com>
References: <000e0cd6aa4cd53993046bdc1675@google.com>
Message-ID: <4737A1AB29FA47AF8FF4913448F5FAA3@NewLife>

You're getting the sequence descriptor rather than the sequence in the return
from $in->next_seq. Read up on what the 'raw' format actually entails in the
Bio::SeqIO POD.
cheers MAJ
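
A rough sketch of one way around this in plain Perl, assuming (the thread never shows the file, so this is a guess) that the .txt file holds one label line such as "13063 HU" followed by lines of sequence. The sub name text_to_fasta and the rule for telling labels from sequence are invented for illustration and are not part of BioPerl:

```perl
use strict;
use warnings;

# Hypothetical helper: turn "label line + sequence lines" text into one
# FASTA record. Assumption: a line containing anything other than letters,
# '*' or '-' is a label, not sequence; the first such line becomes the ID.
sub text_to_fasta {
    my ($text, $fallback_id) = @_;
    my ($id, $seq) = (undef, '');
    for my $line (split /\r?\n/, $text) {
        $line =~ s/\s+//g;                 # drop spaces and stray whitespace
        next unless length $line;
        if ($line =~ /^[A-Za-z*-]+$/) {    # looks like sequence
            $seq .= $line;
        }
        elsif (!defined $id) {             # first non-sequence line: use as ID
            $id = $line;
        }
    }
    $id = $fallback_id unless defined $id;
    # Wrap the sequence at 60 characters per line.
    my $wrapped = join "\n", unpack '(A60)*', $seq;
    return ">$id\n$wrapped\n";
}

# Small in-memory demonstration; real input would be read from the .txt file.
print text_to_fasta("13063_hu_1kb\nACGTACGT\nTTGGCCAA\n", 'unknown');
```

If you would rather stay within BioPerl, the same ID and sequence string could instead be passed to Bio::Seq->new(-id => ..., -seq => ...) and written with a Bio::SeqIO object in 'fasta' format.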


From stefan.kirov at bms.com  Mon Jun  8 17:26:17 2009
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Mon, 08 Jun 2009 17:26:17 -0400
Subject: [Bioperl-l] Bio:Das 1.11 installation problem
In-Reply-To: <47482.157.99.64.103.1244493117.squirrel@php.pasteur.fr>
References: <7A76102A29824B1C9DDB7F3B1DC45D41@zillumina>
	<6dce9a0b0906081000j7c489fd3t3c5974794ad3e3e7@mail.gmail.com>
	<47482.157.99.64.103.1244493117.squirrel@php.pasteur.fr>
Message-ID: <4A2D81F9.8060509@bms.com>

bernd at pasteur.fr wrote:
Try to add this line
-proxy => 'http:<YOUR PROXY HERE>',
in t/01das.t where the Bio::Das object is created (I think line 41).
Hope this works for you, it did for me.
Stefan
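
To avoid hard-coding the proxy in the test file, the value could be picked up from the environment first. This is only a sketch: proxy_from_env is a made-up helper, and the only Bio::Das detail assumed is the -proxy argument mentioned above.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first non-empty proxy setting found in the
# environment, checking the lower-case name before the upper-case one.
# An optional hash reference can be passed in place of %ENV.
sub proxy_from_env {
    my ($env) = @_;
    $env ||= \%ENV;
    for my $name (qw(http_proxy HTTP_PROXY)) {
        my $value = $env->{$name};
        return $value if defined $value && length $value;
    }
    return;
}

my $proxy = proxy_from_env();
print defined $proxy ? "would pass -proxy => '$proxy'\n"
                     : "no proxy variable visible to this process\n";

# In t/01das.t the value would then go into the constructor, e.g.
#   Bio::Das->new(..., -proxy => $proxy);
# (other arguments elided; check the Bio::Das POD for your version).
```

If this prints "no proxy variable visible to this process" when run from the shell where the variables were set, they were probably set without being exported to child processes.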
> I tested the connection with wget and everything works fine.
> I suspect that our proxy might be the problem but all variables are set
> correctly (ftp_proxy, http_proxy and many more) I am not sure which
> environment variable are being used...
> I am not too familiar with all this and don't know where to look for the
> right configurations.
>
> Thanks,
>
> Bernd


From bernd.jagla at pasteur.fr  Tue Jun  9 03:05:47 2009
From: bernd.jagla at pasteur.fr (Bernd Jagla)
Date: Tue, 9 Jun 2009 09:05:47 +0200
Subject: [Bioperl-l] Bio:Das 1.11 installation problem
In-Reply-To: <4A2D81F9.8060509@bms.com>
References: <7A76102A29824B1C9DDB7F3B1DC45D41@zillumina><6dce9a0b0906081000j7c489fd3t3c5974794ad3e3e7@mail.gmail.com><47482.157.99.64.103.1244493117.squirrel@php.pasteur.fr>
	<4A2D81F9.8060509@bms.com>
Message-ID: <19FC487A25B6478FA4DE91B81A1FC52C@zillumina>

Great, that works!!!
But since I am using Bio::Das within GBrowse, I can't (and don't want to)
change those sources. I tried setting some environment variables, but that
doesn't seem to work either...
So far I have set the following:
FTP_PROXY=http://...
HTTP_PROXY=http://...
PROXYFTP=http://...
PROXYHTTP=http://...
ftp_proxy=http://...
http_proxy=http://...
PROXY=http://...

Any suggestions are welcome.

Thanks,

Bernd




From awitney at sgul.ac.uk  Tue Jun  9 07:20:35 2009
From: awitney at sgul.ac.uk (Adam Witney)
Date: Tue, 9 Jun 2009 12:20:35 +0100
Subject: [Bioperl-l] fetch gene sequence with EUtilities.pm
Message-ID: <78771A58-8FCC-46ED-BE4F-8B0915BD324D@sgul.ac.uk>

Hi,

I have been experimenting with the Bio::DB::EUtilities module, with
help from the Cookbook, but I can't figure out how to get the
DNA sequence of a gene; all the examples seem to fetch protein
sequences.

How would I go about fetching a sequence using an Entrez GeneID?

thanks for any help

adam


From Kevin.M.Brown at asu.edu  Tue Jun  9 11:25:45 2009
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Tue, 9 Jun 2009 08:25:45 -0700
Subject: [Bioperl-l] Bio:Das 1.11 installation problem
In-Reply-To: <19FC487A25B6478FA4DE91B81A1FC52C@zillumina>
References: <7A76102A29824B1C9DDB7F3B1DC45D41@zillumina><6dce9a0b0906081000j7c489fd3t3c5974794ad3e3e7@mail.gmail.com><47482.157.99.64.103.1244493117.squirrel@php.pasteur.fr><4A2D81F9.8060509@bms.com>
	<19FC487A25B6478FA4DE91B81A1FC52C@zillumina>
Message-ID: <1A4207F8295607498283FE9E93B775B40604FBFB@EX02.asurite.ad.asu.edu>

Dumb question, but are you exporting the variables after you set them?

FTP_PROXY=http://...
HTTP_PROXY=http://...
export FTP_PROXY HTTP_PROXY 

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Bernd Jagla
> Sent: Tuesday, June 09, 2009 12:06 AM
> To: 'Stefan Kirov'; bernd at pasteur.fr
> Cc: 'Lincoln Stein'; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bio:Das 1.11 installation problem
> 
> Great, that works!!!
> But since I am using Bio::Das within GBrowse I can't/don't 
> want to  change
> those sources. I tried setting some environment variable but 
> that doesn't
> seem to work either...
> So far I have set the following:
> FTP_PROXY=http://...
> HTTP_PROXY=http://...
> PROXYFTP=http://...
> PROXYHTTP=http://...
> ftp_proxy=http://...
> http_proxy=http://...
> PROXY=http://...
> 
> Any suggestions are welcome.
> 
> Thanks,
> 
> Bernd
> 
> 
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Stefan Kirov
> Sent: Monday, June 08, 2009 11:26 PM
> To: bernd at pasteur.fr
> Cc: Lincoln Stein; Bernd Jagla; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bio:Das 1.11 installation problem
> 
> bernd at pasteur.fr wrote:
> Try to add this line
> -proxy => 'http:<YOUR PROXY HERE>',
> in t/01das.t where the Bio::Das object is created (I think line 41).
> Hope this works for you, it did for me.
> Stefan
> > I tested the connection with wget and everything works fine.
> > I suspect that our proxy might be the problem but all 
> variables are set
> > correctly (ftp_proxy, http_proxy and many more) I am not sure which
> > environment variable are being used...
> > I am not too familiar with all this and don't know where to 
> look for the
> > right configurations.
> >
> > Thanks,
> >
> > Bernd
> >
> >   
> >> Hi,
> >>
> >> The regression tests require an active Internet 
> connection, as well as
> the
> >> DAS test server being up and running. It may be there was 
> a temporary
> >> failure of one of those two. I just tested on my end and 
> the regression
> >> tests ran ok, so could you try it again?
> >>
> >> Lincoln
> >>
> >> On Mon, Jun 8, 2009 at 12:24 PM, Bernd Jagla 
> <bernd.jagla at pasteur.fr>
> >> wrote:
> >>
> >>     
> >>> Hi,
> >>>
> >>>
> >>>
> >>> I am working on a MAC 10.5.7; try to install Bio::Das 
> using perl -MCPAN
> >>> -e
> >>> 'install Bio::Das'
> >>> This is perl, v5.8.9 built for darwin-2level
> >>> (please let me know if you need anything else)
> >>>
> >>>
> >>>
> >>> I am trying to install Bio::Das 1.11
> >>>
> >>>
> >>>
> >>> I get the following error:
> >>>
> >>>
> >>>
> >>> not ok 3
> >>>
> >>> not ok 4
> >>>
> >>> Can't call method "description" on an undefined value at 
> t/01das.t line
> >>> 62.
> >>>
> >>>
> >>>
> >>> When going into the sources for 01das.t and printing out 
> $db I get:
> >>>
> >>>
> >>>
> >>> $VAR1 = \bless( {
> >>>     'autotypes' => undef,
> >>>     'default_dsn' => undef,
> >>>     'autocategories' => undef,
> >>>     'sockets' => {},
> >>>     'aggregators' => [
> >>>         bless( {
> >>>             'sub_parts' => [ 'coding_exon' ],
> >>>             'require_whole_object' => undef,
> >>>             'main_method' => 'CDS',
> >>>             'method' => 'alignment'
> >>>         }, 'Bio::DB::GFF::Aggregator' ),
> >>>         bless( {
> >>>             'sub_parts' => [ 'EST_match' ],
> >>>             'require_whole_object' => undef,
> >>>             'main_method' => 'alignment',
> >>>             'method' => 'alignment'
> >>>         }, 'Bio::DB::GFF::Aggregator' )
> >>>     ],
> >>>     'timeout' => undef,
> >>>     'oldstyle_api' => 1,
> >>>     'default_server' => 'http://www.wormbase.org/db/seq/das'
> >>> }, 'Bio::Das' );
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> @sources is empty
> >>>
> >>> And test(3, @sources) fails.
> >>>
> >>>
> >>>
> >>> Please advise.
> >>>
> >>>
> >>>
> >>> Thanks,
> >>>
> >>>
> >>>
> >>> Bernd
> >>>
> >>>
> >>>
> >>>
> >>>
> >>
> >> --
> >> Lincoln D. Stein
> >> Director, Informatics and Biocomputing Platform
> >> Ontario Institute for Cancer Research
> >> 101 College St., Suite 800
> >> Toronto, ON, Canada M5G0A3
> >> 416 673-8514
> >> Assistant: Renata Musa <Renata.Musa at oicr.on.ca>


From cjfields at illinois.edu  Tue Jun  9 12:08:46 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 9 Jun 2009 11:08:46 -0500
Subject: [Bioperl-l] use of no_* to mean 'number_of', negative booleans
Message-ID: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu>

All,

I've noticed a few methods in bioperl with names like 'no_Foo' that  
mean 'number of Foo' (such as SimpleAlign's no_sequences).  The  
problem I foresee is ambiguity, particularly with negative  
boolean checks (eg 'no_Foo' could also mean 'this instance contains no  
Foo'), something that BioPerl also has with various settings.

I suggest we alias these as num_* to disambiguate that.  There's no  
easy way to change flag settings that are already in place w/o going  
through a deprecation cycle, but we can promote using positive booleans  
where possible (eg 'is_foo' or 'has_foo' instead of 'no_foo').  We can  
leave the older 'no_*' methods as is for the time being and maybe  
deprecate them later.

If no one has objections I'll add these in as needed.

chris
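
The aliasing idea above could look roughly like the following. This is a minimal sketch under stated assumptions: My::Align is a made-up stand-in class, not the real Bio::SimpleAlign, and the num_sequences/no_sequences pair is used only because no_sequences is the example named in the message.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in class for illustration only.
package My::Align;

sub new {
    my ($class, @seqs) = @_;
    return bless { seqs => [@seqs] }, $class;
}

# The new, unambiguous name carries the implementation.
sub num_sequences {
    my ($self) = @_;
    return scalar @{ $self->{seqs} };
}

# The old name stays as a thin alias so existing callers keep working.
sub no_sequences {
    my $self = shift;
    return $self->num_sequences(@_);
}

package main;

my $aln = My::Align->new(qw(ACGT ACGA ACGC));
print "num_sequences: ", $aln->num_sequences, "\n";
print "no_sequences:  ", $aln->no_sequences, "\n";
```

Delegating through a method call, rather than aliasing the symbol table entry, means a subclass that overrides num_sequences is still honoured when old code calls no_sequences.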


From SMarkel at accelrys.com  Tue Jun  9 12:26:08 2009
From: SMarkel at accelrys.com (Scott Markel)
Date: Tue, 9 Jun 2009 12:26:08 -0400
Subject: [Bioperl-l] use of no_* to mean 'number_of', negative booleans
In-Reply-To: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu>
References: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu>
Message-ID: <1F1240778FB0AF46B4E5A72C44D2C7472A636328@exch1-hi.accelrys.net>

Chris,

I just checked our code for the Sequence Analysis Collection in
Pipeline Pilot.  We've got a few places we'd need to make code
changes, but we like your suggestion.  So, no objections from us.

Scott

Scott Markel, Ph.D.
Principal Bioinformatics Architect  email:  smarkel at accelrys.com
Accelrys (SciTegic R&D)             mobile: +1 858 205 3653
10188 Telesis Court, Suite 100      voice:  +1 858 799 5603
San Diego, CA 92121                 fax:    +1 858 799 5222
USA                                 web:    http://www.accelrys.com

http://www.linkedin.com/in/smarkel
Vice President, Board of Directors:
    International Society for Computational Biology
Co-chair: ISCB Publications Committee
Associate Editor: PLoS Computational Biology
Editorial Board: Briefings in Bioinformatics


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Chris Fields
> Sent: Tuesday, 09 June 2009 9:09 AM
> To: BioPerl List
> Subject: [Bioperl-l] use of no_* to mean 'number_of', negative booleans
> 
> All,
> 
> I've noticed a few methods in bioperl with names like 'no_Foo' that
> mean 'number of Foo' (such as SimpleAlign's no_sequences).  The
> problem I foresee are possible ambiguities, particularly with negative
> boolean checks (eg 'no_Foo' could also mean 'this instance contains no
> Foo'), something that BioPerl also has with various settings.
> 
> I suggest we alias these as num_* to disambiguate that.  There's no
> easy way to change already in-place flag setting w/o going through a
> deprecation cycle, but we can promote using positive booleans where
> possible (eg 'is_foo' or 'has_foo' instead of 'no_foo').  We can leave
> the older 'no_*' methods as is for the time being and maybe deprecate
> them later.
> 
> If no one has objections I'll add these in as needed.
> 
> chris


From cjfields at illinois.edu  Tue Jun  9 13:03:16 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 9 Jun 2009 12:03:16 -0500
Subject: [Bioperl-l] use of no_* to mean 'number_of', negative booleans
In-Reply-To: <1F1240778FB0AF46B4E5A72C44D2C7472A636328@exch1-hi.accelrys.net>
References: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu>
	<1F1240778FB0AF46B4E5A72C44D2C7472A636328@exch1-hi.accelrys.net>
Message-ID: <A5461F02-AA81-4A02-88DA-181B33EE41FE@illinois.edu>

I don't think it would require code changes right away; for the time  
being no_* will just alias num_*.  We can probably have deprecation  
warnings activate when we reach a particular version.

chris

On Jun 9, 2009, at 11:26 AM, Scott Markel wrote:

> Chris,
>
> I just checked our code for the Sequence Analysis Collection in
> Pipeline Pilot.  We've got a few places we'd need to make code
> changes, but we like your suggestion.  So, no objections from us.
>
> Scott
>
> Scott Markel, Ph.D.
> Principal Bioinformatics Architect  email:  smarkel at accelrys.com
> Accelrys (SciTegic R&D)             mobile: +1 858 205 3653
> 10188 Telesis Court, Suite 100      voice:  +1 858 799 5603
> San Diego, CA 92121                 fax:    +1 858 799 5222
> USA                                 web:    http://www.accelrys.com
>
> http://www.linkedin.com/in/smarkel
> Vice President, Board of Directors:
>    International Society for Computational Biology
> Co-chair: ISCB Publications Committee
> Associate Editor: PLoS Computational Biology
> Editorial Board: Briefings in Bioinformatics
>
>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>> bounces at lists.open-bio.org] On Behalf Of Chris Fields
>> Sent: Tuesday, 09 June 2009 9:09 AM
>> To: BioPerl List
>> Subject: [Bioperl-l] use of no_* to mean 'number_of', negative  
>> booleans
>>
>> All,
>>
>> I've noticed a few methods in bioperl with names like 'no_Foo' that
>> mean 'number of Foo' (such as SimpleAlign's no_sequences).  The
>> problem I foresee are possible ambiguities, particularly with  
>> negative
>> boolean checks (eg 'no_Foo' could also mean 'this instance contains  
>> no
>> Foo'), something that BioPerl also has with various settings.
>>
>> I suggest we alias these as num_* to disambiguate that.  There's no
>> easy way to change already in-place flag setting w/o going through a
>> deprecation cycle, but we can promote using positive booleans where
>> possible (eg 'is_foo' or 'has_foo' instead of 'no_foo').  We can  
>> leave
>> the older 'no_*' methods as is for the time being and maybe deprecate
>> them later.
>>
>> If no one has objections I'll add these in as needed.
>>
>> chris


From maj at fortinbras.us  Tue Jun  9 12:32:51 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 9 Jun 2009 12:32:51 -0400
Subject: [Bioperl-l] use of no_* to mean 'number_of', negative booleans
In-Reply-To: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu>
References: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu>
Message-ID: <4BA7FB5466B34B59B7C455E1173C1FA7@NewLife>

+1, absolutely- MAJ
----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "BioPerl List" <bioperl-l at lists.open-bio.org>
Sent: Tuesday, June 09, 2009 12:08 PM
Subject: [Bioperl-l] use of no_* to mean 'number_of', negative booleans


> All,
> 
> I've noticed a few methods in bioperl with names like 'no_Foo' that  
> mean 'number of Foo' (such as SimpleAlign's no_sequences).  The  
> problem I foresee are possible ambiguities, particularly with negative  
> boolean checks (eg 'no_Foo' could also mean 'this instance contains no  
> Foo'), something that BioPerl also has with various settings.
> 
> I suggest we alias these as num_* to disambiguate that.  There's no  
> easy way to change already in-place flag setting w/o going through a  
> deprecation cycle, but we can promote using positive booleans where  
> possible (eg 'is_foo' or 'has_foo' instead of 'no_foo').  We can leave  
> the older 'no_*' methods as is for the time being and maybe deprecate  
> them later.
> 
> If no one has objections I'll add these in as needed.
> 
> chris


From hlapp at gmx.net  Tue Jun  9 13:18:05 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 9 Jun 2009 13:18:05 -0400
Subject: [Bioperl-l] use of no_* to mean 'number_of', negative booleans
In-Reply-To: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu>
References: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu>
Message-ID: <B23E5145-0DBB-4C3D-8536-063B3E9711DF@gmx.net>

Great suggestions, I'm all for it.

	-hilmar

On Jun 9, 2009, at 12:08 PM, Chris Fields wrote:

> All,
>
> I've noticed a few methods in bioperl with names like 'no_Foo' that  
> mean 'number of Foo' (such as SimpleAlign's no_sequences).  The  
> problem I foresee are possible ambiguities, particularly with  
> negative boolean checks (eg 'no_Foo' could also mean 'this instance  
> contains no Foo'), something that BioPerl also has with various  
> settings.
>
> I suggest we alias these as num_* to disambiguate that.  There's no  
> easy way to change already in-place flag setting w/o going through a  
> deprecation cycle, but we can promote using positive booleans where  
> possible (eg 'is_foo' or 'has_foo' instead of 'no_foo').  We can  
> leave the older 'no_*' methods as is for the time being and maybe  
> deprecate them later.
>
> If no one has objections I'll add these in as needed.
>
> chris

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From florent.angly at gmail.com  Tue Jun  9 14:41:51 2009
From: florent.angly at gmail.com (Florent Angly)
Date: Tue, 09 Jun 2009 11:41:51 -0700
Subject: [Bioperl-l] use of no_* to mean 'number_of', negative booleans
In-Reply-To: <B23E5145-0DBB-4C3D-8536-063B3E9711DF@gmx.net>
References: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu>
	<B23E5145-0DBB-4C3D-8536-063B3E9711DF@gmx.net>
Message-ID: <4A2EACEF.3090809@gmail.com>

Agree! no_* is prone to misunderstandings.
Also, some BioPerl code uses nof_*, which I quite like.
Florent

Hilmar Lapp wrote:
> Great suggestions, I'm all for it.
>
>     -hilmar
>
> On Jun 9, 2009, at 12:08 PM, Chris Fields wrote:
>
>> All,
>>
>> I've noticed a few methods in bioperl with names like 'no_Foo' that 
>> mean 'number of Foo' (such as SimpleAlign's no_sequences).  The 
>> problem I foresee are possible ambiguities, particularly with 
>> negative boolean checks (eg 'no_Foo' could also mean 'this instance 
>> contains no Foo'), something that BioPerl also has with various 
>> settings.
>>
>> I suggest we alias these as num_* to disambiguate that.  There's no 
>> easy way to change already in-place flag setting w/o going through a 
>> deprecation cycle, but we can promote using positive booleans where 
>> possible (eg 'is_foo' or 'has_foo' instead of 'no_foo').  We can 
>> leave the older 'no_*' methods as is for the time being and maybe 
>> deprecate them later.
>>
>> If no one has objections I'll add these in as needed.
>>
>> chris


From cjfields at illinois.edu  Tue Jun  9 14:55:48 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 9 Jun 2009 13:55:48 -0500
Subject: [Bioperl-l] use of no_* to mean 'number_of', negative booleans
In-Reply-To: <4A2EACEF.3090809@gmail.com>
References: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu>
	<B23E5145-0DBB-4C3D-8536-063B3E9711DF@gmx.net>
	<4A2EACEF.3090809@gmail.com>
Message-ID: <FF3C4540-ED73-4878-92B9-5D6A610D3C4E@illinois.edu>

We could probably alias nof_* with num_* just for consistency, but  
leave nof_* as is and not deprecate it (I don't think anyone would  
confuse nof* with no*).

chris

On Jun 9, 2009, at 1:41 PM, Florent Angly wrote:

> Agree! no_* is prone to misunderstandings.
> Also, some BioPerl code uses nof_*, which I quite like.
> Florent
>
> Hilmar Lapp wrote:
>> Great suggestions, I'm all for it.
>>
>>    -hilmar
>>
>> On Jun 9, 2009, at 12:08 PM, Chris Fields wrote:
>>
>>> All,
>>>
>>> I've noticed a few methods in bioperl with names like 'no_Foo'  
>>> that mean 'number of Foo' (such as SimpleAlign's no_sequences).   
>>> The problem I foresee are possible ambiguities, particularly with  
>>> negative boolean checks (eg 'no_Foo' could also mean 'this  
>>> instance contains no Foo'), something that BioPerl also has with  
>>> various settings.
>>>
>>> I suggest we alias these as num_* to disambiguate that.  There's  
>>> no easy way to change already in-place flag setting w/o going  
>>> through a deprecation cycle, but we can promote using positive  
>>> booleans where possible (eg 'is_foo' or 'has_foo' instead of  
>>> 'no_foo').  We can leave the older 'no_*' methods as is for the  
>>> time being and maybe deprecate them later.
>>>
>>> If no one has objections I'll add these in as needed.
>>>
>>> chris


From mauricio at open-bio.org  Tue Jun  9 15:33:18 2009
From: mauricio at open-bio.org (Mauricio Herrera Cuadra)
Date: Tue, 09 Jun 2009 14:33:18 -0500
Subject: [Bioperl-l] Project Help
In-Reply-To: <146497.36250.qm@web8407.mail.in.yahoo.com>
References: <146497.36250.qm@web8407.mail.in.yahoo.com>
Message-ID: <4A2EB8FE.4080402@open-bio.org>

Hi Chirag,

The OBF applied for the GSoC 2009 but unfortunately we were not 
accepted. However, other organizations/projects made their way into it 
and have been kind enough to adopt some of the ideas originally proposed 
under the OBF's initiative. I'm Cc'ing this to the BioPerl mailing list 
so the people involved with those projects can give you more details.

Regards,
Mauricio.


chirag matkar wrote:
> Hello,
> This is Chirag Matkar, wanting to know whether there are any GSoC 2009 projects underway in the Open Bioinformatics Foundation.
> Also, as I am a Perl developer myself, can I get a stipend or internship for building Perl modules?
> 
> Thanking You,
> Regards Chirag.


From rmb32 at cornell.edu  Tue Jun  9 15:12:54 2009
From: rmb32 at cornell.edu (Robert Buels)
Date: Tue, 09 Jun 2009 12:12:54 -0700
Subject: [Bioperl-l] use of no_* to mean 'number_of', negative booleans
In-Reply-To: <FF3C4540-ED73-4878-92B9-5D6A610D3C4E@illinois.edu>
References: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu>	<B23E5145-0DBB-4C3D-8536-063B3E9711DF@gmx.net>	<4A2EACEF.3090809@gmail.com>
	<FF3C4540-ED73-4878-92B9-5D6A610D3C4E@illinois.edu>
Message-ID: <4A2EB436.8020506@cornell.edu>

Why not just add deprecation warnings now?  Or you could add deprecation 
warnings now that only print if $Bio::Root::Version::VERSION >= 
something.  Best to do it while one is thinking about it, I always say. 
  Cause I always forget to do it later.  ;-)

Rob
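
A version-gated warning of the kind suggested above might be sketched as follows. The package My::Thing, the variables $LIB_VERSION and $WARN_FROM, and the version numbers are all hypothetical; $LIB_VERSION merely stands in for $Bio::Root::Version::VERSION so that the example runs without BioPerl installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package My::Thing;

# Hypothetical stand-in for $Bio::Root::Version::VERSION.
our $LIB_VERSION = 1.006;

# Hypothetical cutoff: the warning only prints from this version on.
our $WARN_FROM = 1.007;

sub new { return bless {}, shift }

sub num_items { return 3 }

# Old name: delegates to the new one, and warns only once the
# library version has reached the cutoff.
sub no_items {
    my $self = shift;
    warn "no_items() is deprecated, use num_items() instead\n"
        if $LIB_VERSION >= $WARN_FROM;
    return $self->num_items(@_);
}

package main;

my $obj = My::Thing->new;
print $obj->no_items, "\n";    # no warning while $LIB_VERSION < $WARN_FROM
```

The check can be written today and stays silent until the version number is bumped, which is the "do it while thinking about it" part.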

Chris Fields wrote:
> We could probably alias nof_* with num_* just for consistency, but leave 
> nof_* as is and not deprecate it (I don't think anyone would confuse 
> nof* with no*).
> 
> chris
> 
> On Jun 9, 2009, at 1:41 PM, Florent Angly wrote:
> 
>> Agree! no_* is prone to misunderstandings.
>> Also, some BioPerl code uses nof_*, which I quite like.
>> Florent
>>
>> Hilmar Lapp wrote:
>>> Great suggestions, I'm all for it.
>>>
>>>    -hilmar
>>>
>>> On Jun 9, 2009, at 12:08 PM, Chris Fields wrote:
>>>
>>>> All,
>>>>
>>>> I've noticed a few methods in bioperl with names like 'no_Foo' that 
>>>> mean 'number of Foo' (such as SimpleAlign's no_sequences).  The 
>>>> problem I foresee are possible ambiguities, particularly with 
>>>> negative boolean checks (eg 'no_Foo' could also mean 'this instance 
>>>> contains no Foo'), something that BioPerl also has with various 
>>>> settings.
>>>>
>>>> I suggest we alias these as num_* to disambiguate that.  There's no 
>>>> easy way to change already in-place flag setting w/o going through a 
>>>> deprecation cycle, but we can promote using positive booleans where 
>>>> possible (eg 'is_foo' or 'has_foo' instead of 'no_foo').  We can 
>>>> leave the older 'no_*' methods as is for the time being and maybe 
>>>> deprecate them later.
>>>>
>>>> If no one has objections I'll add these in as needed.
>>>>
>>>> chris


-- 
Robert Buels
Bioinformatics Analyst, Sol Genomics Network
Boyce Thompson Institute for Plant Research
Tower Rd
Ithaca, NY  14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu


From cjfields at illinois.edu  Tue Jun  9 16:19:03 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 9 Jun 2009 15:19:03 -0500
Subject: [Bioperl-l] use of no_* to mean 'number_of', negative booleans
In-Reply-To: <4A2EB436.8020506@cornell.edu>
References: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu>	<B23E5145-0DBB-4C3D-8536-063B3E9711DF@gmx.net>	<4A2EACEF.3090809@gmail.com>
	<FF3C4540-ED73-4878-92B9-5D6A610D3C4E@illinois.edu>
	<4A2EB436.8020506@cornell.edu>
Message-ID: <EFB0C00A-49CA-49BD-A22D-4B23D0C09E7B@illinois.edu>

On Jun 9, 2009, at 2:12 PM, Robert Buels wrote:

> Why not just add deprecation warnings now?  Or you could add  
> deprecation warnings now that only print if  
> $Bio::Root::Version::VERSION >= something.  Best to do it while one  
> is thinking about it, I always say.  Cause I always forget to do it  
> later.  ;-)
>
> Rob

Actually, that's one thing I want to implement within Root, namely the  
ability to do this:

$self->deprecated(-message     => 'method Foo is deprecated',
                   -start_ver   => $version1,
                   -throw_ver   => $version2
);

So it's essentially a no-op and invisible up to start_ver (at which  
point it starts to warn), then throws after throw_ver.  I could  
probably finagle that in w/o destroying things...

chris
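
For illustration, the behaviour described above could be sketched like this. It is not the patch discussed in this thread: My::Root, My::Align and $LIB_VERSION are made-up names, the version numbers are arbitrary, and treating both boundaries as inclusive (>=) is an assumption. Only the -message, -start_ver and -throw_ver argument names come from the message.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package My::Root;

# Hypothetical stand-in for the library version.
our $LIB_VERSION = 1.006;

sub new { return bless {}, shift }

# Sketch only: silent below -start_ver, warns from -start_ver,
# dies from -throw_ver (inclusive boundaries are an assumption).
sub deprecated {
    my ($self, %args) = @_;
    my $msg   = $args{-message} || 'deprecated method called';
    my $start = $args{-start_ver};
    my $throw = $args{-throw_ver};

    die "$msg\n"
        if defined $throw && $My::Root::LIB_VERSION >= $throw;
    warn "$msg\n"
        if defined $start && $My::Root::LIB_VERSION >= $start;
    return;
}

package My::Align;
our @ISA = ('My::Root');

sub num_sequences { return 2 }

sub no_sequences {
    my $self = shift;
    $self->deprecated(
        -message   => 'no_sequences() is deprecated, use num_sequences()',
        -start_ver => 1.007,
        -throw_ver => 1.009,
    );
    return $self->num_sequences(@_);
}

package main;

my $aln = My::Align->new;
print $aln->no_sequences, "\n";    # silent: 1.006 is below -start_ver
```

Because the check compares against the running library version, the same call can sit on trunk and on a release branch and behave differently on each.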

> Chris Fields wrote:
>> We could probably alias nof_* with num_* just for consistency, but  
>> leave nof_* as is and not deprecate it (I don't think anyone would  
>> confuse nof* with no*).
>> chris
>> On Jun 9, 2009, at 1:41 PM, Florent Angly wrote:
>>> Agree! no_* is prone to misunderstandings.
>>> Also, some BioPerl code uses nof_*, which I quite like.
>>> Florent
>>>
>>> Hilmar Lapp wrote:
>>>> Great suggestions, I'm all for it.
>>>>
>>>>   -hilmar
>>>>
>>>> On Jun 9, 2009, at 12:08 PM, Chris Fields wrote:
>>>>
>>>>> All,
>>>>>
>>>>> I've noticed a few methods in bioperl with names like 'no_Foo'  
>>>>> that mean 'number of Foo' (such as SimpleAlign's no_sequences).   
>>>>> The problem I foresee are possible ambiguities, particularly  
>>>>> with negative boolean checks (eg 'no_Foo' could also mean 'this  
>>>>> instance contains no Foo'), something that BioPerl also has with  
>>>>> various settings.
>>>>>
>>>>> I suggest we alias these as num_* to disambiguate that.  There's  
>>>>> no easy way to change already in-place flag setting w/o going  
>>>>> through a deprecation cycle, but we can promote using positive  
>>>>> booleans where possible (eg 'is_foo' or 'has_foo' instead of  
>>>>> 'no_foo').  We can leave the older 'no_*' methods as is for the  
>>>>> time being and maybe deprecate them later.
>>>>>
>>>>> If no one has objections I'll add these in as needed.
>>>>>
>>>>> chris
>
>
> -- 
> Robert Buels
> Bioinformatics Analyst, Sol Genomics Network
> Boyce Thompson Institute for Plant Research
> Tower Rd
> Ithaca, NY  14853
> Tel: 503-889-8539
> rmb32 at cornell.edu
> http://www.sgn.cornell.edu


From cjfields at illinois.edu  Tue Jun  9 16:45:37 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 9 Jun 2009 15:45:37 -0500
Subject: [Bioperl-l] deprecated(), was Re:  use of no_* to mean 'number_of',
	negative booleans
In-Reply-To: <EFB0C00A-49CA-49BD-A22D-4B23D0C09E7B@illinois.edu>
References: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu>	<B23E5145-0DBB-4C3D-8536-063B3E9711DF@gmx.net>	<4A2EACEF.3090809@gmail.com>
	<FF3C4540-ED73-4878-92B9-5D6A610D3C4E@illinois.edu>
	<4A2EB436.8020506@cornell.edu>
	<EFB0C00A-49CA-49BD-A22D-4B23D0C09E7B@illinois.edu>
Message-ID: <E2D3CA55-A48A-424D-9AC3-24F7C44A9548@illinois.edu>

On Jun 9, 2009, at 3:19 PM, Chris Fields wrote:

> On Jun 9, 2009, at 2:12 PM, Robert Buels wrote:
>
>> Why not just add deprecation warnings now?  Or you could add  
>> deprecation warnings now that only print if  
>> $Bio::Root::Version::VERSION >= something.  Best to do it while one  
>> is thinking about it, I always say.  Cause I always forget to do it  
>> later.  ;-)
>>
>> Rob
>
> Actually, that's one thing I want to implement within Root, namely  
> the ability to do this:
>
> $self->deprecated(-message     => 'method Foo is deprecated',
>                  -start_ver   => $version1,
>                  -throw_ver   => $version2
> );
>
> So it's essentially a noop and invisible up to start_ver (upon where  
> it warns), then throws after, well, throw_ver.  I could probably  
> finagle that in w/o destroying things...
>
> chris

Just to note, this is mainly to allow us devs the opportunity to add  
these to main trunk w/o having to worry about merges over to the 1.6  
branch (where the version is different).  We don't want the dep  
warnings showing up there right away, but maybe in a point release or  
minor version.

chris


From hlapp at gmx.net  Tue Jun  9 19:09:26 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 9 Jun 2009 19:09:26 -0400
Subject: [Bioperl-l] Project Help
In-Reply-To: <4A2EB8FE.4080402@open-bio.org>
References: <146497.36250.qm@web8407.mail.in.yahoo.com>
	<4A2EB8FE.4080402@open-bio.org>
Message-ID: <74C0D011-A5A4-4DF1-93D8-13401A18E29A@gmx.net>

Hi Chirag,

check out the Bio{Perl,Python,Ruby}-related projects (go to 'Accepted  
Projects') at

http://hackathon.nescent.org/Phyloinformatics_Summer_of_Code_2009

	-hilmar

On Jun 9, 2009, at 3:33 PM, Mauricio Herrera Cuadra wrote:

> Hi Chirag,
>
> The OBF applied for the GSoC 2009 but unfortunately we were not  
> accepted. However, other organizations/projects made their way into  
> it and have been kind enough to adopt some of the ideas originally  
> proposed under the OBF's initiative. I'm Cc'ing this to the BioPerl  
> mailing list so the people involved with those projects can give you  
> more details.
>
> Regards,
> Mauricio.
>
>
> chirag matkar wrote:
>> Hello,
>> This is Chirag Matkar, wanting to know whether there are any GSoC  
>> 2009 projects underway in the Open Bioinformatics Foundation.
>> Also, as I am a Perl developer myself, can I get a stipend or  
>> internship for building Perl modules?
>> Thanking You,
>> Regards Chirag.

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From rmb32 at cornell.edu  Tue Jun  9 21:13:36 2009
From: rmb32 at cornell.edu (Robert Buels)
Date: Tue, 09 Jun 2009 18:13:36 -0700
Subject: [Bioperl-l] deprecated(),
 was Re:  use of no_* to mean 'number_of', negative booleans
In-Reply-To: <E2D3CA55-A48A-424D-9AC3-24F7C44A9548@illinois.edu>
References: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu>	<B23E5145-0DBB-4C3D-8536-063B3E9711DF@gmx.net>	<4A2EACEF.3090809@gmail.com>	<FF3C4540-ED73-4878-92B9-5D6A610D3C4E@illinois.edu>	<4A2EB436.8020506@cornell.edu>	<EFB0C00A-49CA-49BD-A22D-4B23D0C09E7B@illinois.edu>
	<E2D3CA55-A48A-424D-9AC3-24F7C44A9548@illinois.edu>
Message-ID: <4A2F08C0.3010609@cornell.edu>

Chris Fields wrote:
>> Actually, that's one thing I want to implement within Root, namely the 
>> ability to do this:
>>
>> $self->deprecated(-message     => 'method Foo is deprecated',
>>                  -start_ver   => $version1,
>>                  -throw_ver   => $version2
>> );

Here's a patch with tests against the svn trunk head.  Is this what you 
had in mind?

-- 
Rob
-------------- next part --------------
A non-text attachment was scrubbed...
Name: deprecated.patch
Type: text/x-diff
Size: 5601 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20090609/431738da/attachment-0002.bin>

From cjfields at illinois.edu  Tue Jun  9 22:54:47 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 9 Jun 2009 21:54:47 -0500
Subject: [Bioperl-l] deprecated(),
	was Re:  use of no_* to mean 'number_of', negative booleans
In-Reply-To: <4A2F08C0.3010609@cornell.edu>
References: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu>	<B23E5145-0DBB-4C3D-8536-063B3E9711DF@gmx.net>	<4A2EACEF.3090809@gmail.com>	<FF3C4540-ED73-4878-92B9-5D6A610D3C4E@illinois.edu>	<4A2EB436.8020506@cornell.edu>	<EFB0C00A-49CA-49BD-A22D-4B23D0C09E7B@illinois.edu>
	<E2D3CA55-A48A-424D-9AC3-24F7C44A9548@illinois.edu>
	<4A2F08C0.3010609@cornell.edu>
Message-ID: <20652B6B-1BF3-477C-9619-4149748E5B9B@illinois.edu>

On Jun 9, 2009, at 8:13 PM, Robert Buels wrote:

> Chris Fields wrote:
>>> Actually, that's one thing I want to implement within Root, namely  
>>> the ability to do this:
>>>
>>> $self->deprecated(-message     => 'method Foo is deprecated',
>>>                 -start_ver   => $version1,
>>>                 -throw_ver   => $version2
>>> );
>
> Here's a patch with tests against the svn trunk head.  Is this what  
> you had in mind?
>
> -- 
> Rob

Funny, I had written up almost exactly the same code, just a little  
rearranged.  I've modified mine to follow your use of -warn_version (I  
also had -throw_version as a synonym of -version, JIC).  Also, for the  
tests I created a temp class in the tests and ran tests off that.   
Thanks for the patch!

chris
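
The behaviour discussed in this thread (warn from one version onward, throw
from a later one) can be sketched in plain Perl as follows. This is only an
illustration under stated assumptions: it is not the code from the attached
patch, the stand-alone deprecated() function and the $API_VERSION variable
are hypothetical, and only the -message, -warn_version and -throw_version
argument names are taken from the messages above.

```perl
use strict;
use warnings;
use version;

# Hypothetical stand-in for the version of the library being run.
our $API_VERSION = '1.006';

# Sketch of a deprecation helper: die once the running version has reached
# -throw_version, otherwise warn once it has reached -warn_version (or
# always warn if no -warn_version is given). Returns 1 if it warned, 0 if not.
sub deprecated {
    my (%args) = @_;
    my $msg     = $args{-message} // 'use of a deprecated method';
    my $current = version->parse($API_VERSION);

    if (defined $args{-throw_version}
        && $current >= version->parse($args{-throw_version})) {
        die "$msg\n";
    }
    if (!defined $args{-warn_version}
        || $current >= version->parse($args{-warn_version})) {
        warn "$msg\n";
        return 1;
    }
    return 0;
}
```

A method would then start with something like
deprecated(-message => 'method Foo is deprecated', -warn_version => '1.006',
-throw_version => '1.007'); so that callers see a warning now and an
exception only after the later release.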


From maj at fortinbras.us  Wed Jun 10 00:10:12 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 10 Jun 2009 00:10:12 -0400
Subject: [Bioperl-l] announcing bioperl-max, a public AMI
Message-ID: <57C9F3FC7A4A42189608BD882C03DB9C@NewLife>

Hi All, 

I've built a public Amazon machine image, loaded with many many 
goodies, including the most recent (r15747) trunks of 
- bioperl-live
- bioperl-run
- bioperl-db/biosql
The base machine is a public clean install of Ubuntu 8.04 Hardy/32-bit
by Alestic. Many fave tools, including blast, hmmer, hyphy, phyml, 
emboss, and more are all there (and most even pass bioperl-run tests), and 
perl is loaded with Moose, XML::LibXML, XML::Compile, Bio::Phylo
(r1071) and others. This is *not* a lean mean fighting machine. 

Please give it a try if you're so inclined. Fuller details (including 
image id and ssh-rsa signature) are at http://fortinbras.us/bioperl-max.

Ping me if it doesn't work.

Cheers, 
Mark


From cjfields at illinois.edu  Wed Jun 10 00:36:40 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 9 Jun 2009 23:36:40 -0500
Subject: [Bioperl-l] announcing bioperl-max, a public AMI
In-Reply-To: <57C9F3FC7A4A42189608BD882C03DB9C@NewLife>
References: <57C9F3FC7A4A42189608BD882C03DB9C@NewLife>
Message-ID: <3F9CAC33-EE29-4843-8E21-532DC7569A7D@illinois.edu>

I'll be trying that out, particularly re: bioperl-run. For bioperl-db  
do you have mysql or pg?

Heh, I see Moose is installed.  Just need svn'd parrot and git updated  
rakudo and we could do some damage...

chris

On Jun 9, 2009, at 11:10 PM, Mark A. Jensen wrote:

> Hi All,
>
> I've built a public Amazon machine image, loaded with many many
> goodies, including the most recent (r15747) trunks of
> - bioperl-live
> - bioperl-run
> - bioperl-db/biosql
> The base machine is a public clean install of Ubuntu 8.04 Hardy/32-bit
> by Alestic. Many fave tools, including blast, hmmer, hyphy, phyml,
> emboss, and more are all there (and most even pass bioperl-run  
> tests), and
> perl is loaded with Moose, XML::LibXML, XML::Compile, Bio::Phylo
> (r1071) and others. This is *not* a lean mean fighting machine.
>
> Please give it a try if you're so inclined. Fuller details (including
> image id and ssh-rsa signature) are at http://fortinbras.us/bioperl-max 
> .
>
> Ping me if it doesn't work.
>
> Cheers,
> Mark
>
>
>


From maj at fortinbras.us  Wed Jun 10 00:39:36 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 10 Jun 2009 00:39:36 -0400
Subject: [Bioperl-l] announcing bioperl-max, a public AMI
In-Reply-To: <3F9CAC33-EE29-4843-8E21-532DC7569A7D@illinois.edu>
References: <57C9F3FC7A4A42189608BD882C03DB9C@NewLife>
	<3F9CAC33-EE29-4843-8E21-532DC7569A7D@illinois.edu>
Message-ID: <6A7D85B8037848F090C35A639C84D870@NewLife>

----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "Mark A. Jensen" <maj at fortinbras.us>
Cc: "BioPerl List" <bioperl-l at bioperl.org>
Sent: Wednesday, June 10, 2009 12:36 AM
Subject: Re: [Bioperl-l] announcing bioperl-max, a public AMI


> I'll be trying that out, particularly re: bioperl-run. For bioperl-db  
> do you have mysql or pg?

-both (I'm all about options...)


> 
> Heh, I see Moose is installed.  Just need svn'd parrot and git updated  
> rakudo and we could do some damage...
> 

bioperl-max-0.1.1, here we come...


> chris
> 

cheers MAJ

> On Jun 9, 2009, at 11:10 PM, Mark A. Jensen wrote:
> 
>> Hi All,
>>
>> I've built a public Amazon machine image, loaded with many many
>> goodies, including the most recent (r15747) trunks of
>> - bioperl-live
>> - bioperl-run
>> - bioperl-db/biosql
>> The base machine is a public clean install of Ubuntu 8.04 Hardy/32-bit
>> by Alestic. Many fave tools, including blast, hmmer, hyphy, phyml,
>> emboss, and more are all there (and most even pass bioperl-run  
>> tests), and
>> perl is loaded with Moose, XML::LibXML, XML::Compile, Bio::Phylo
>> (r1071) and others. This is *not* a lean mean fighting machine.
>>
>> Please give it a try if you're so inclined. Fuller details (including
>> image id and ssh-rsa signature) are at http://fortinbras.us/bioperl-max 
>> .
>>
>> Ping me if it doesn't work.
>>
>> Cheers,
>> Mark
>>
>>
>>
> 
> 
>


From bernd.jagla at pasteur.fr  Wed Jun 10 03:43:47 2009
From: bernd.jagla at pasteur.fr (Bernd Jagla)
Date: Wed, 10 Jun 2009 09:43:47 +0200
Subject: [Bioperl-l] Bio:Das 1.11 installation problem
In-Reply-To: <1A4207F8295607498283FE9E93B775B40604FBFB@EX02.asurite.ad.asu.edu>
References: <7A76102A29824B1C9DDB7F3B1DC45D41@zillumina><6dce9a0b0906081000j7c489fd3t3c5974794ad3e3e7@mail.gmail.com><47482.157.99.64.103.1244493117.squirrel@php.pasteur.fr><4A2D81F9.8060509@bms.com><19FC487A25B6478FA4DE91B81A1FC52C@zillumina>
	<1A4207F8295607498283FE9E93B775B40604FBFB@EX02.asurite.ad.asu.edu>
Message-ID: <7F2215CBC16B48BE8C548BB69E131890@zillumina>

I wrote a small test program to test the environment variables and I have
them:

          'SSH_CLIENT' => '157.
          'FTP_PROXY' => 'http://
          'HTTP_PROXY' => 'http://cache.past
          'SSH_TTY' => '/dev/ttys002',
          'ftp_proxy' => 'http://
          'http_proxy' => 'http://

Using the "-proxy" option works; without it, the connection fails.

(and yes, I export the variables..)

Thanks for any suggestions.

Bernd
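
A small check of the kind described above could look like the sketch below.
It is not the test program from this message; the proxy_settings() name is
hypothetical, and the function simply lists every environment variable whose
name contains "proxy" in any letter case, so that both HTTP_PROXY and
http_proxy show up.

```perl
use strict;
use warnings;

# Return the proxy-related entries of a hash reference (by default %ENV),
# matching names such as HTTP_PROXY, ftp_proxy or PROXYHTTP regardless of
# letter case.
sub proxy_settings {
    my ($env) = @_;
    $env ||= \%ENV;
    return map { ($_ => $env->{$_}) } grep { /proxy/i } sort keys %$env;
}

# Print whatever proxy variables the current process actually sees.
my %found = proxy_settings();
print "$_=$found{$_}\n" for sort keys %found;
```

Running it from the same shell, or the same web server environment, that
runs the failing code shows whether the variables really reach that process.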

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Kevin Brown
Sent: Tuesday, June 09, 2009 5:26 PM
Cc: bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] Bio:Das 1.11 installation problem

Dumb question, but are you exporting the variables after you set them?

FTP_PROXY=http://...
HTTP_PROXY=http://...
export FTP_PROXY HTTP_PROXY 

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Bernd Jagla
> Sent: Tuesday, June 09, 2009 12:06 AM
> To: 'Stefan Kirov'; bernd at pasteur.fr
> Cc: 'Lincoln Stein'; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bio:Das 1.11 installation problem
> 
> Great, that works!!!
> But since I am using Bio::Das within GBrowse I can't/don't 
> want to  change
> those sources. I tried setting some environment variable but 
> that doesn't
> seem to work either...
> So far I have the set the following:
> FTP_PROXY=http://...
> HTTP_PROXY=http://...
> PROXYFTP=http://...
> PROXYHTTP=http://...
> ftp_proxy=http://...
> http_proxy=http://...
> PROXY=http://...
> 
> Any suggestions are welcome.
> 
> Thanks,
> 
> Bernd
> 
> 
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Stefan Kirov
> Sent: Monday, June 08, 2009 11:26 PM
> To: bernd at pasteur.fr
> Cc: Lincoln Stein; Bernd Jagla; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bio:Das 1.11 installation problem
> 
> bernd at pasteur.fr wrote:
> Try to add this line
> -proxy => 'http:<YOUR PROXY HERE>',
> in t/01das.t where the Bio::Das object is created (I think line 41).
> Hope this works for you, it did for me.
> Stefan
> > I tested the connection with wget and everything works fine.
> > I suspect that our proxy might be the problem but all 
> variables are set
> > correctly (ftp_proxy, http_proxy and many more) I am not sure which
> > environment variable are being used...
> > I am not too familiar with all this and don't know where to 
> look for the
> > right configurations.
> >
> > Thanks,
> >
> > Bernd
> >
> >   
> >> Hi,
> >>
> >> The regression tests require an active Internet 
> connection, as well as
> the
> >> DAS test server being up and running. It may be there was 
> a temporary
> >> failure of one of those two. I just tested on my end and 
> the regression
> >> tests ran ok, so could you try it again?
> >>
> >> Lincoln
> >>
> >> On Mon, Jun 8, 2009 at 12:24 PM, Bernd Jagla 
> <bernd.jagla at pasteur.fr>
> >> wrote:
> >>
> >>     
> >>> Hi,
> >>>
> >>>
> >>>
> >>> I am working on a MAC 10.5.7; try to install Bio::Das 
> using perl -MCPAN
> >>> -e
> >>> 'install Bio::Das'
> >>> This is perl, v5.8.9 built for darwin-2level
> >>> (please let me know if you need anything else)
> >>>
> >>>
> >>>
> >>> I am trying to install Bio::Das 1.11
> >>>
> >>>
> >>>
> >>> I get the following error:
> >>>
> >>>
> >>>
> >>> not ok 3
> >>>
> >>> not ok 4
> >>>
> >>> Can't call method "description" on an undefined value at 
> t/01das.t line
> >>> 62.
> >>>
> >>>
> >>>
> >>> When going into the sources for 01das.t and printing out 
> $db I get:
> >>>
> >>>
> >>>
> >>> $VAR1 = \bless( {
> >>>
> >>>                   'autotypes' => undef,
> >>>
> >>>                   'default_dsn' => undef,
> >>>
> >>>                   'autocategories' => undef,
> >>>
> >>>                   'sockets' => {},
> >>>
> >>>                   'aggregators' => [
> >>>
> >>>                                      bless( {
> >>>
> >>>                                               'sub_parts' => [
> >>>
> >>>
> >>> 'coding_exon'
> >>>
> >>>                                                              ],
> >>>
> >>>                                               
> 'require_whole_object' =>
> >>> undef,
> >>>
> >>>                                               
> 'main_method' => 'CDS',
> >>>
> >>>                                               'method' => 
> 'alignment'
> >>>
> >>>                                             },
> >>> 'Bio::DB::GFF::Aggregator'
> >>> ),
> >>>
> >>>                                      bless( {
> >>>
> >>>                                               'sub_parts' => [
> >>>
> >>>
> 'EST_match'
> >>>
> >>>                                                              ],
> >>>
> >>>                                               
> 'require_whole_object' =>
> >>> undef,
> >>>
> >>>                                               'main_method' =>
> >>> 'alignment',
> >>>
> >>>                                               'method' => 
> 'alignment'
> >>>
> >>>                                             },
> >>> 'Bio::DB::GFF::Aggregator' )
> >>>
> >>>                                    ],
> >>>
> >>>                   'timeout' => undef,
> >>>
> >>>                   'oldstyle_api' => 1,
> >>>
> >>>                   'default_server' =>
> >>> 'http://www.wormbase.org/db/seq/das'
> >>>
> >>>                 }, 'Bio::Das' );
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> @sources is empty
> >>>
> >>> And test(3, at sources) fails.
> >>>
> >>>
> >>>
> >>> Please advise.
> >>>
> >>>
> >>>
> >>> Thanks,
> >>>
> >>>
> >>>
> >>> Bernd
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> _______________________________________________
> >>> Bioperl-l mailing list
> >>> Bioperl-l at lists.open-bio.org
> >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>
> >>>       
> >>
> >> --
> >> Lincoln D. Stein
> >> Director, Informatics and Biocomputing Platform
> >> Ontario Institute for Cancer Research
> >> 101 College St., Suite 800
> >> Toronto, ON, Canada M5G0A3
> >> 416 673-8514
> >> Assistant: Renata Musa <Renata.Musa at oicr.on.ca>
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>
> >>     
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> >   
> 


From bernd.jagla at pasteur.fr  Wed Jun 10 04:16:08 2009
From: bernd.jagla at pasteur.fr (Bernd Jagla)
Date: Wed, 10 Jun 2009 10:16:08 +0200
Subject: [Bioperl-l] Bio:Das 1.11 installation problem
In-Reply-To: <1A4207F8295607498283FE9E93B775B40604FBFB@EX02.asurite.ad.asu.edu>
References: <7A76102A29824B1C9DDB7F3B1DC45D41@zillumina><6dce9a0b0906081000j7c489fd3t3c5974794ad3e3e7@mail.gmail.com><47482.157.99.64.103.1244493117.squirrel@php.pasteur.fr><4A2D81F9.8060509@bms.com><19FC487A25B6478FA4DE91B81A1FC52C@zillumina>
	<1A4207F8295607498283FE9E93B775B40604FBFB@EX02.asurite.ad.asu.edu>
Message-ID: <F5844533CFCB425DA400C888A9995F70@zillumina>

To whom it may concern:

I added
  $self->proxy($ENV{'HTTP_PROXY'}) if $ENV{'HTTP_PROXY'};

around line 72 of Das.pm, just before:
  $self->proxy($proxy) if $proxy;

This did the trick.

For completeness I also edited Fetch.pm, adding (around line 134):
  $proxy = $ENV{'HTTP_PROXY'} if $ENV{'HTTP_PROXY'};
just before:
  my $dest = $proxy || $request->url;

Best,

Bernd
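
The idea behind these edits, falling back to the environment when no proxy
is passed explicitly, can be sketched on its own as below. This is an
illustration only, not Bio::Das code: the resolve_proxy() name is
hypothetical, and the precedence chosen here (explicit value first, then
HTTP_PROXY, then http_proxy) is an assumption that matches the Das.pm edit
rather than the Fetch.pm one.

```perl
use strict;
use warnings;

# Pick a proxy URL: an explicitly supplied value wins, then HTTP_PROXY,
# then http_proxy. Returns undef when none of them is set to a non-empty
# string.
sub resolve_proxy {
    my ($explicit) = @_;
    for my $candidate ($explicit, $ENV{HTTP_PROXY}, $ENV{http_proxy}) {
        return $candidate if defined $candidate && length $candidate;
    }
    return;
}
```

With such a helper, a constructor could call resolve_proxy($proxy) once and
hand the result to whatever sets the proxy, so scripts that cannot pass
-proxy themselves still pick up the environment setting.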


From ron at ron.dk  Wed Jun 10 03:35:09 2009
From: ron at ron.dk (Rasmus Ory Nielsen)
Date: Wed, 10 Jun 2009 09:35:09 +0200
Subject: [Bioperl-l] Bio::Restriction::Analysis. Exception when using rebase
	file.
Message-ID: <4A2F622D.5060500@ron.dk>

Hi,

This is my first time using bioperl for restriction analysis, so please bear 
with me, if this is a FAQ.

I downloaded withrefm.906 from ftp://ftp.neb.com/pub/rebase/ and created the 
script shown at the bottom of the mail.
My bioperl version is bioperl-live nightly from 09-Jun-2009.

The script throws an exception (see below). But if I comment out the 
'-enzymes' argument, so that it uses the built-in collection of enzymes, it works.

My problem is that I need to use some of the enzymes that are only available 
in REBASE. So how do I get this working?

Thanks for your attention.

Best regards,
Rasmus Ory Nielsen


############################################################
Output from the script:
############################################################

[roni at ksdhcp ~]$ ./restriction_test.pl

--------------------- WARNING ---------------------
MSG: The enzyme name CviKI-1 was changed to CviKI-I
---------------------------------------------------

------------- EXCEPTION -------------
MSG: Bad end parameter (11). End must be less than the total length of 
sequence (total=7)
STACK Bio::PrimarySeq::subseq 
/usr/local/lib/perl5/site_perl/5.10.0/Bio/PrimarySeq.pm:401
STACK Bio::Restriction::Analysis::_enzyme_sites 
/usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:900
STACK Bio::Restriction::Analysis::_cuts 
/usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:801
STACK Bio::Restriction::Analysis::cut 
/usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:379
STACK Bio::Restriction::Analysis::fragment_maps 
/usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:515
STACK toplevel ./restriction_test.pl:30
-------------------------------------

[roni at ksdhcp ~]$


############################################################
Output from the script with the '-enzymes' argument commented out
############################################################

[roni at ksdhcp ~]$ ./restriction_test.pl

--------------------- WARNING ---------------------
MSG: The enzyme name CviKI-1 was changed to CviKI-I
---------------------------------------------------
$VAR1 = [
           {
             'seq' => 'CTCGACCGTTAGCAA',
             'end' => 15,
             'start' => '1'
           },
           {
             'seq' => 'AGCTTTCTACCGTTATCGT',
             'end' => 34,
             'start' => '16'
           }
         ];
[roni at ksdhcp ~]$

############################################################

#!/usr/bin/perl
use strict;
use warnings;
use Bio::PrimarySeq;
use Bio::Restriction::IO;
use Bio::Restriction::Analysis;
use Data::Dumper;

# create seq obj
my $seqobj = Bio::PrimarySeq->new(
     -seq        => 'CTCGACCGTTAGCAAAGCTTTCTACCGTTATCGT',
     -primary_id => 'test',
     -molecule   => 'dna'
);

# read rebase file
my $rebase_io = Bio::Restriction::IO->new(
     -file   => 'withrefm.906',
     -format => 'withrefm',
);
my $rebase_collection = $rebase_io->read;

# start restriction analysis
my $restriction_analysis = Bio::Restriction::Analysis->new(
     -seq     => $seqobj,
     -enzymes => $rebase_collection,    # it works with this line commented out
);

# retrieve fragment maps
my @fragment_maps = $restriction_analysis->fragment_maps('HindIII');
print Dumper \@fragment_maps;
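
As a cross-check of the two-fragment output shown above, the sketch below
reproduces it in plain Perl without BioPerl. It is not what
Bio::Restriction::Analysis does internally and it does not address the
exception; it only assumes a linear sequence and the HindIII site A^AGCTT
(cut after the first A), and the hindiii_fragments() name is hypothetical.

```perl
use strict;
use warnings;

# Split a linear sequence at every HindIII site (A^AGCTT) and return a list
# of hashes with 1-based 'start' and 'end' coordinates plus the fragment
# sequence, similar in shape to the fragment maps printed above.
sub hindiii_fragments {
    my ($seq) = @_;
    my @maps;
    my $start = 1;
    while ($seq =~ /AAGCTT/g) {
        # $-[0] is the 0-based offset of the match, so the last base before
        # the cut (the first A of the site) is at 1-based position $-[0] + 1.
        my $cut = $-[0] + 1;
        push @maps, {
            start => $start,
            end   => $cut,
            seq   => substr($seq, $start - 1, $cut - $start + 1),
        };
        $start = $cut + 1;
    }
    # Whatever follows the last cut is the final fragment.
    push @maps, {
        start => $start,
        end   => length($seq),
        seq   => substr($seq, $start - 1),
    };
    return @maps;
}
```

For the test sequence in the script, the single AAGCTT site starts at
position 15, which gives the fragments 1-15 and 16-34 reported by the
built-in enzyme collection.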


From awitney at sgul.ac.uk  Wed Jun 10 07:19:55 2009
From: awitney at sgul.ac.uk (Adam Witney)
Date: Wed, 10 Jun 2009 12:19:55 +0100
Subject: [Bioperl-l] EUtilities Cookbook example fails
Message-ID: <40B53E84-2EBC-471A-9261-CED1973C7A0C@sgul.ac.uk>

Hi,

I am going through the EUtilities Cookbook, but the last example (in  
section 2.3.1) fails with:

Can't use an undefined value as an ARRAY reference at
/usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities/Link/LinkSet.pm line 470.

This is with BioPerl 1.6.0, perl v5.8.8

thanks for any help

adam


From hlapp at gmx.net  Wed Jun 10 08:08:54 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 10 Jun 2009 08:08:54 -0400
Subject: [Bioperl-l] announcing bioperl-max, a public AMI
In-Reply-To: <57C9F3FC7A4A42189608BD882C03DB9C@NewLife>
References: <57C9F3FC7A4A42189608BD882C03DB9C@NewLife>
Message-ID: <4B3BCEA2-DA96-46B5-9BA2-F4EDDACC3A96@gmx.net>

Very cool! -hilmar

On Jun 10, 2009, at 12:10 AM, Mark A. Jensen wrote:

> Hi All,
>
> I've built a public Amazon machine image, loaded with many many
> goodies, including the most recent (r15747) trunks of
> - bioperl-live
> - bioperl-run
> - bioperl-db/biosql
> The base machine is a public clean install of Ubuntu 8.04 Hardy/32-bit
> by Alestic. Many fave tools, including blast, hmmer, hyphy, phyml,
> emboss, and more are all there (and most even pass bioperl-run  
> tests), and
> perl is loaded with Moose, XML::LibXML, XML::Compile, Bio::Phylo
> (r1071) and others. This is *not* a lean mean fighting machine.
>
> Please give it a try if you're so inclined. Fuller details (including
> image id and ssh-rsa signature) are at http://fortinbras.us/bioperl-max 
> .
>
> Ping me if it doesn't work.
>
> Cheers,
> Mark
>
>
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at illinois.edu  Wed Jun 10 08:28:44 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 10 Jun 2009 07:28:44 -0500
Subject: [Bioperl-l] EUtilities Cookbook example fails
In-Reply-To: <40B53E84-2EBC-471A-9261-CED1973C7A0C@sgul.ac.uk>
References: <40B53E84-2EBC-471A-9261-CED1973C7A0C@sgul.ac.uk>
Message-ID: <1AB3CE57-4114-4440-870F-C15B39F42D77@illinois.edu>

I can reproduce that; I'll look into it.

chris

On Jun 10, 2009, at 6:19 AM, Adam Witney wrote:

> Hi,
>
> I am going through the EUtilities Cookbook, but the last example (in  
> section 2.3.1) fails with:
>
> Can't use an undefined value as an ARRAY reference at /usr/lib/perl5/ 
> site_perl/5.8.8/Bio/Tools/EUtilities/Link/LinkSet.pm line 470.
>
> This is with BioPerl 1.6.0, perl v5.8.8
>
> thanks for any help
>
> adam


From cjfields at illinois.edu  Wed Jun 10 09:20:43 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 10 Jun 2009 08:20:43 -0500
Subject: [Bioperl-l] fetch gene sequence with EUtilities.pm
In-Reply-To: <78771A58-8FCC-46ED-BE4F-8B0915BD324D@sgul.ac.uk>
References: <78771A58-8FCC-46ED-BE4F-8B0915BD324D@sgul.ac.uk>
Message-ID: <9B52E71A-3183-412A-81E7-803C28B8082E@illinois.edu>

EntrezGene doesn't contain the sequence information; I believe it just  
links to the sequence in a specified nuc record with given  
coordinates.  You can get to it, but it takes a little trickery; in  
essence you need to use the UID to get the gene summary information,  
extract that, then grab the sequence record using seqstart, seqend,  
and seqstrand.

A dump of esummary info for UID 18131, for instance (using
$eutil->print_all), gives this info (abbreviated somewhat):

UID                 :18131
Name                :Notch3
Description         :Notch gene homolog 3 (Drosophila)
Orgname             :Mus musculus
...
GenomicInfo
     GenomicInfoType
         ChrLoc      :17
         ChrAccVer   :NC_000083.5
         ChrStart    :32303796
         ChrStop     :32257837
GeneWeight          :23049

The genomic info section gives the accession.version, start, end, and  
(implicitly) the strand (ChrStop is less than ChrStart, so the gene lies on  
the minus strand). I have added an example to the cookbook:

http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook#How_do_I_retrieve_the_DNA_sequence_using_EntrezGene_IDs.3F
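
The coordinate step described above can be sketched in plain Perl as below.
This is not the cookbook code; the efetch_coords() name is hypothetical.
Two things are assumptions to verify against the NCBI documentation: that
the esummary ChrStart/ChrStop values are zero-based (hence the default
offset of 1), and that the sequence request takes the smaller coordinate as
start with strand 2 meaning the minus strand.

```perl
use strict;
use warnings;

# Turn esummary GenomicInfo values into parameters for a sequence request.
# $offset converts the reported coordinates to 1-based positions; the
# default of 1 assumes the summary values are zero-based.
sub efetch_coords {
    my ($acc, $chr_start, $chr_stop, $offset) = @_;
    $offset //= 1;

    # ChrStart greater than ChrStop implies the minus strand.
    my $strand = $chr_start > $chr_stop ? 2 : 1;
    my ($low, $high) = sort { $a <=> $b } ($chr_start, $chr_stop);

    return (
        -id        => $acc,
        -seq_start => $low + $offset,
        -seq_stop  => $high + $offset,
        -strand    => $strand,
    );
}
```

For the Notch3 record above, efetch_coords('NC_000083.5', 32303796,
32257837) yields strand 2 with the start and stop swapped into ascending
order.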

chris

On Jun 9, 2009, at 6:20 AM, Adam Witney wrote:

> Hi,
>
> I have been experimenting with the Bio::DB::EUtilities module, with  
> help from the Cookbook. But I can't seem to figure out how to get  
> the DNA sequence of a gene; all the examples seem to be fetching  
> protein sequence.
>
> How would i go about fetching a sequence using an Entrez GeneID?
>
> thanks for any help
>
> adam


From cjfields at illinois.edu  Wed Jun 10 09:33:51 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 10 Jun 2009 08:33:51 -0500
Subject: [Bioperl-l] EUtilities Cookbook example fails
In-Reply-To: <98E59238-260F-49A1-BA78-51DF94FF55A8@sgul.ac.uk>
References: <40B53E84-2EBC-471A-9261-CED1973C7A0C@sgul.ac.uk>
	<1AB3CE57-4114-4440-870F-C15B39F42D77@illinois.edu>
	<98E59238-260F-49A1-BA78-51DF94FF55A8@sgul.ac.uk>
Message-ID: <10B8484F-AE84-4E0A-964F-0DC964F5156C@illinois.edu>

Adam,

Okay, fixed that and the previous issue with 'use an undefined value  
as an ARRAY reference'.  The previous issue appears to be due to a  
change in the XML output from NCBI (it used to give the IDs at one  
point).  Also made the wiki changes for this; didn't take long to find  
everything.

Thanks for pointing that out!  If you find any more issues feel free  
to make the necessary changes on the wiki or point them out if they're  
in code.

chris

On Jun 10, 2009, at 8:12 AM, Adam Witney wrote:

> Hi Chris,
>
> not sure if I should start a new thread for this, but it is related  
> to the EUtilities Cookbook and LinkSet.pm.
>
> There are several references in the Cookbook to the method  
> "get_linkname", however this seems to have changed in the recent  
> version of LinkSet.pm to "get_link_name". But one reference to the  
> old method name still exists in LinkSet.pm, as shown by this patch:
>
> --- /usr/local/lib/perl5/site_perl/5.8.9/Bio/Tools/EUtilities/Link/ 
> LinkSet.pm	2009-02-20 12:36:37.000000000 +0000
> +++ /Users/adam/Desktop/LinkSet.pm	2009-06-10 13:58:49.000000000 +0100
> @@ -220,7 +220,7 @@
> =cut
>
> sub get_link_name {
> -    return ($_[0]->get_linknames)[0];
> +    return ($_[0]->get_link_names)[0];
> }
>
> =head2 get_submitted_ids
>
> If i haven't got this all wrong entirely, I could go through and fix  
> the Cookbook entries if that was useful?
>
> adam
>
>
> On 10 Jun 2009, at 13:28, Chris Fields wrote:
>
>> I can reproduce that; I'll look into it.
>>
>> chris
>>
>> On Jun 10, 2009, at 6:19 AM, Adam Witney wrote:
>>
>>> Hi,
>>>
>>> I am going through the EUtilities Cookbook, but the last example  
>>> (in section 2.3.1) fails with:
>>>
>>> Can't use an undefined value as an ARRAY reference at /usr/lib/ 
>>> perl5/site_perl/5.8.8/Bio/Tools/EUtilities/Link/LinkSet.pm line 470.
>>>
>>> This is with BioPerl 1.6.0, perl v5.8.8
>>>
>>> thanks for any help
>>>
>>> adam
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From awitney at sgul.ac.uk  Wed Jun 10 09:12:05 2009
From: awitney at sgul.ac.uk (Adam Witney)
Date: Wed, 10 Jun 2009 14:12:05 +0100
Subject: [Bioperl-l] EUtilities Cookbook example fails
In-Reply-To: <1AB3CE57-4114-4440-870F-C15B39F42D77@illinois.edu>
References: <40B53E84-2EBC-471A-9261-CED1973C7A0C@sgul.ac.uk>
	<1AB3CE57-4114-4440-870F-C15B39F42D77@illinois.edu>
Message-ID: <98E59238-260F-49A1-BA78-51DF94FF55A8@sgul.ac.uk>


Hi Chris,

not sure if I should start a new thread for this, but it is related to  
the EUtilities Cookbook and LinkSet.pm.

There are several references in the Cookbook to the method  
"get_linkname", however this seems to have changed in the recent  
version of LinkSet.pm to "get_link_name". But one reference to the old  
method name still exists in LinkSet.pm, as shown by this patch:

--- /usr/local/lib/perl5/site_perl/5.8.9/Bio/Tools/EUtilities/Link/LinkSet.pm	2009-02-20 12:36:37.000000000 +0000
+++ /Users/adam/Desktop/LinkSet.pm	2009-06-10 13:58:49.000000000 +0100
@@ -220,7 +220,7 @@
  =cut

  sub get_link_name {
-    return ($_[0]->get_linknames)[0];
+    return ($_[0]->get_link_names)[0];
  }

  =head2 get_submitted_ids

If I haven't got this all wrong entirely, I could go through and fix  
the Cookbook entries if that would be useful?
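
To illustrate why the stale call in the patch above only shows up at run
time, here is a minimal sketch. Demo::LinkSet is a hypothetical stand-in
written for this example, not the real LinkSet.pm, and the link names
passed to new() are placeholder values.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for LinkSet.pm; not the real BioPerl class.
package Demo::LinkSet;

sub new {
    my ($class, @names) = @_;
    return bless { names => [@names] }, $class;
}

# Renamed accessor (plural form, with the underscore).
sub get_link_names { return @{ $_[0]->{names} } }

# Unpatched body: still calls the old name, which is no longer defined.
sub get_link_name_unpatched { return ($_[0]->get_linknames)[0] }

# Patched body: calls the renamed accessor.
sub get_link_name { return ($_[0]->get_link_names)[0] }

package main;

my $set = Demo::LinkSet->new('gene_nuccore', 'gene_protein');

# can() reports which spelling is actually defined on the object.
for my $method (qw(get_linknames get_link_names)) {
    printf "%-15s %s\n", $method, $set->can($method) ? 'defined' : 'missing';
}

# The stale call compiles fine and only dies when it is invoked.
my $first = eval { $set->get_link_name_unpatched };
print "unpatched: ", (defined $first ? "$first\n" : "died: $@");
print "patched:   ", $set->get_link_name, "\n";
```

Running a can() check like this against an installed module is a quick
way to see which method name a given version provides before updating
the Cookbook text.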

adam


On 10 Jun 2009, at 13:28, Chris Fields wrote:

> I can reproduce that; I'll look into it.
>
> chris
>
> On Jun 10, 2009, at 6:19 AM, Adam Witney wrote:
>
>> Hi,
>>
>> I am going through the EUtilities Cookbook, but the last example  
>> (in section 2.3.1) fails with:
>>
>> Can't use an undefined value as an ARRAY reference at /usr/lib/ 
>> perl5/site_perl/5.8.8/Bio/Tools/EUtilities/Link/LinkSet.pm line 470.
>>
>> This is with BioPerl 1.6.0, perl v5.8.8
>>
>> thanks for any help
>>
>> adam


From awitney at sgul.ac.uk  Wed Jun 10 10:10:21 2009
From: awitney at sgul.ac.uk (Adam Witney)
Date: Wed, 10 Jun 2009 15:10:21 +0100
Subject: [Bioperl-l] fetch gene sequence with EUtilities.pm
In-Reply-To: <9B52E71A-3183-412A-81E7-803C28B8082E@illinois.edu>
References: <78771A58-8FCC-46ED-BE4F-8B0915BD324D@sgul.ac.uk>
	<9B52E71A-3183-412A-81E7-803C28B8082E@illinois.edu>
Message-ID: <B41AD5E8-138D-4B8D-AD4C-7DEA230FD46F@sgul.ac.uk>


Thanks for the pointers Chris.

The new example on the Cookbook doesn't quite work for me as ChrStart  
seems to appear in the DocSum twice, thus  
get_contents_by_name('ChrStart') returns a list of two values (which  
writes the second ChrStart into $end). Also the $start and $end seem  
to be out by 1, so I needed to change it to this:

my ($acc) = ($docsum->get_contents_by_name('ChrAccVer'));
my ($start) = ($docsum->get_contents_by_name('ChrStart'));
my ($end) = ($docsum->get_contents_by_name('ChrStop'));

  $start += 1;
  $end += 1;

Ah, looking at this further there appears to be something going on in  
the response from Entrez. Compare these two gene records:

http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=gene&id=18131
		(your example below)
http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=gene&id=2861733
		(my gene)

In both cases you can see that ChrStart appears twice, once as part of  
the GenomicInfo list and once on its own at the bottom. In my example  
above the two ChrStart values match, but in the Notch3 example you  
posted the 2nd ChrStart seems to be the same as the ChrStop in the  
GenomicInfo list. Do you know if the second ChrStart has a separate  
meaning?

I guess in the Cookbook example we would need to make sure that  
get_contents_by_name('ChrStart') picks up the value from the  
GenomicInfo list. Is this possible?
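
The duplication can be reproduced without any network access. The sketch
below uses a plain nested Perl structure as a hypothetical stand-in for a
DocSum (it is not the Bio::Tools::EUtilities API); the values are the
Notch3 ones from this thread, with the second ChrStart set equal to
ChrStop as described above. It compares a search over the whole tree with
a search restricted to the GenomicInfoType item, and then applies the
same +1 shift as my workaround.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for one esummary DocSum: a tree of named items.
my $docsum = [
    { name => 'Name', content => 'Notch3' },
    { name => 'GenomicInfo', items => [
        { name => 'GenomicInfoType', items => [
            { name => 'ChrAccVer', content => 'NC_000083.5' },
            { name => 'ChrStart',  content => 32303796 },
            { name => 'ChrStop',   content => 32257837 },
        ] },
    ] },
    { name => 'ChrStart', content => 32257837 },    # second, top-level copy
];

# Depth-first search over the whole tree: returns every match by name.
sub contents_by_name {
    my ($items, $name) = @_;
    my @found;
    for my $item (@$items) {
        push @found, $item->{content}
            if $item->{name} eq $name && defined $item->{content};
        push @found, contents_by_name($item->{items}, $name)
            if $item->{items};
    }
    return @found;
}

# Return the first item with the given name anywhere in the tree.
sub find_item {
    my ($items, $name) = @_;
    for my $item (@$items) {
        return $item if $item->{name} eq $name;
        if ($item->{items}) {
            my $hit = find_item($item->{items}, $name);
            return $hit if $hit;
        }
    }
    return;
}

# Searching the whole DocSum finds both ChrStart items.
my @all = contents_by_name($docsum, 'ChrStart');
print "whole DocSum: @all\n";

# Restricting the lookup to GenomicInfoType gives one value per name.
my $info = find_item($docsum, 'GenomicInfoType');
my %loc  = map { $_->{name} => $_->{content} } @{ $info->{items} };

# Shift the values by one, as in the workaround above.
printf "%s %d..%d\n", $loc{ChrAccVer}, $loc{ChrStart} + 1, $loc{ChrStop} + 1;
```

The point of the scoped lookup is that a list assignment such as
my ($start, $end) = (...) can no longer pick up the stray top-level
ChrStart by accident.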

thanks again

adam


On 10 Jun 2009, at 14:20, Chris Fields wrote:

> EntrezGene doesn't contain the sequence information; I believe it  
> just links to the sequence in a specified nuc record with given  
> coordinates.  You can get to it, but it takes a little trickery; in  
> essence you need to use the UID to get the gene summary information,  
> extract that, then grab the sequence record using seqstart, seqend,  
> and seqstrand.
>
> A dump of esummary info for UID 18131, for instance (using  
> $eutil->print_all), gives this info (abbreviated somewhat):
>
> UID                 :18131
> Name                :Notch3
> Description         :Notch gene homolog 3 (Drosophila)
> Orgname             :Mus musculus
> ...
> GenomicInfo
>    GenomicInfoType
>        ChrLoc      :17
>        ChrAccVer   :NC_000083.5
>        ChrStart    :32303796
>        ChrStop     :32257837
> GeneWeight          :23049
>
> The genomic info section gives the accession.version, start, end,  
> and (implicitly) the strand (ChrStop is less than ChrStart). I have  
> added an example to the cookbook:
>
> http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook#How_do_I_retrieve_the_DNA_sequence_using_EntrezGene_IDs.3F
>
> chris
>
> On Jun 9, 2009, at 6:20 AM, Adam Witney wrote:
>
>> Hi,
>>
>> I have been experimenting with the Bio::DB::EUtilities module, with  
>> help from the Cookbook. But I can't seem to figure out how to get  
>> the DNA sequence of a gene; all the examples seem to be fetching  
>> protein sequence.
>>
>> How would I go about fetching a sequence using an Entrez GeneID?
>>
>> thanks for any help
>>
>> adam


From cjfields at illinois.edu  Wed Jun 10 13:56:46 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 10 Jun 2009 12:56:46 -0500
Subject: [Bioperl-l] fetch gene sequence with EUtilities.pm
In-Reply-To: <B41AD5E8-138D-4B8D-AD4C-7DEA230FD46F@sgul.ac.uk>
References: <78771A58-8FCC-46ED-BE4F-8B0915BD324D@sgul.ac.uk>
	<9B52E71A-3183-412A-81E7-803C28B8082E@illinois.edu>
	<B41AD5E8-138D-4B8D-AD4C-7DEA230FD46F@sgul.ac.uk>
Message-ID: <CD8513A6-0872-4174-9333-94D76D5711F8@illinois.edu>

Adam,

That's really odd that they do that (both the duplication of ChrStart  
and the coordinates being off-by-one, which means they appear to be 0- 
based).  It's possible that the second ChrStart is meant to represent  
the actual first base for the gene irrespective of start/end.  My  
example is on the opposite strand, so the second ChrStart == end.

The fact that they use the same element name is slightly annoying (and  
seemingly redundant), but there is a workaround.  We grab only the  
layered information specifically; in this case we want everything  
below 'GenomicInfoType':

GenomicInfo
     GenomicInfoType
         ChrLoc      :17
         ChrAccVer   :NC_000083.5
         ChrStart    :32303796
         ChrStop     :32257837

So, we can do this in the DocSum loop (that appears to work for your  
example):

############################

while (my $docsum = $eutil->next_DocSum) {
     # to ensure we grab the right ChrStart information, we grab the Item
     # above it in the Item hierarchy (visible via print_all from the
     # eutil instance)
     my ($item) = $docsum->get_Items_by_name('GenomicInfoType');

     my %item_data = map {$_ => 0} qw(ChrAccVer ChrStart ChrStop);

     while (my $sub_item = $item->next_subItem) {
         if (exists $item_data{$sub_item->get_name}) {
             $item_data{$sub_item->get_name} = $sub_item->get_content;
         }
     }
     # check to make sure everything is set
     for my $check (qw(ChrAccVer ChrStart ChrStop)) {
         die "$check not set" unless $item_data{$check};
     }

     # ChrStart > ChrStop means the gene is on the opposite strand
     my $strand = $item_data{ChrStart} > $item_data{ChrStop} ? 2 : 1;
     # the esummary coordinates appear to be 0-based, hence the + 1
     $fetcher->set_parameters(-id => $item_data{ChrAccVer},
                              -seq_start => $item_data{ChrStart} + 1,
                              -seq_stop  => $item_data{ChrStop} + 1,
                              -strand    => $strand);
     print $fetcher->get_Response->content;
}

############################

That's to retain compatibility with 1.6; I'll update the wiki.  I can  
add some common Item container methods to grab information for any  
Items contained in the current instance (be it a DocSum or another  
Item).  I'll add that in bioperl-live.

chris

On Jun 10, 2009, at 9:10 AM, Adam Witney wrote:

> Thanks for the pointers Chris.
>
> The new example on the Cookbook doesn't quite work for me as  
> ChrStart seems to appear in the DocSum twice, thus  
> get_contents_by_name('ChrStart') returns a list of two values (which  
> writes the second ChrStart into $end). Also the $start and $end seem  
> to be out by 1, so I needed to change it to this:
>
> my ($acc) = ($docsum->get_contents_by_name('ChrAccVer'));
> my ($start) = ($docsum->get_contents_by_name('ChrStart'));
> my ($end) = ($docsum->get_contents_by_name('ChrStop'));
>
> $start += 1;
> $end += 1;
>
> Ah, looking at this further there appears to be something going on  
> in the response from Entrez. Compare these two gene records:
>
> http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=gene&id=18131 
> 		(your example below)
> http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=gene&id=2861733 
> 		(my gene)
>
> In both cases you can see that ChrStart appears twice, once as part  
> of the GenomicInfo list and once on its own at the bottom. In my  
> example above the two ChrStart values match, but in the Notch3  
> example you posted the 2nd ChrStart seems to be the same as the  
> ChrStop in the GenomicInfo list. Do you know if the second ChrStart  
> has a separate meaning?
>
> I guess in the Cookbook example we would need to make sure that the  
> get_contents_by_name('ChrStart') picks up the value from the  
> GenomicInfo list, is this possible?
>
> thanks again
>
> adam
>
>
> On 10 Jun 2009, at 14:20, Chris Fields wrote:
>
>> EntrezGene doesn't contain the sequence information; I believe it  
>> just links to the sequence in a specified nuc record with given  
>> coordinates.  You can get to it, but it takes a little trickery; in  
>> essence you need to use the UID to get the gene summary  
>> information, extract that, then grab the sequence record using  
>> seqstart, seqend, and seqstrand.
>>
>> A dump of esummary info for UID 18131, for instance (using  
>> $eutil->print_all), gives this info (abbreviated somewhat):
>>
>> UID                 :18131
>> Name                :Notch3
>> Description         :Notch gene homolog 3 (Drosophila)
>> Orgname             :Mus musculus
>> ...
>> GenomicInfo
>>   GenomicInfoType
>>       ChrLoc      :17
>>       ChrAccVer   :NC_000083.5
>>       ChrStart    :32303796
>>       ChrStop     :32257837
>> GeneWeight          :23049
>>
>> The genomic info section gives the accession.version, start, end,  
>> and (implicitly) the strand (ChrStop is less than ChrStart). I have  
>> added an example to the cookbook:
>>
>> http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook#How_do_I_retrieve_the_DNA_sequence_using_EntrezGene_IDs.3F
>>
>> chris
>>
>> On Jun 9, 2009, at 6:20 AM, Adam Witney wrote:
>>
>>> Hi,
>>>
>>> I have been experimenting with the Bio::DB::EUtilities module,  
>>> with help from the Cookbook. But I can't seem to figure out how to  
>>> get the DNA sequence of a gene; all the examples seem to be  
>>> fetching protein sequence.
>>>
>>> How would I go about fetching a sequence using an Entrez GeneID?
>>>
>>> thanks for any help
>>>
>>> adam


From maj at fortinbras.us  Thu Jun 11 07:36:40 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 11 Jun 2009 07:36:40 -0400
Subject: [Bioperl-l] 1.6 on doc.bioperl.org?
Message-ID: <17AD00895AFD43E1A1436D1065092BAC@NewLife>

Hi Chris and list-
Will documentation for release 1.6 be available in pdoc on doc.bioperl.org?
I notice also that autogenerated documentation for bioperl-live doesn't contain
new modules (or HIVQuery & Tiling, anyway ;) )--
cheers, Mark


From maj at fortinbras.us  Thu Jun 11 09:17:25 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 11 Jun 2009 09:17:25 -0400
Subject: [Bioperl-l] Bio::Restriction::Analysis. Exception when using
	rebasefile.
In-Reply-To: <4A2F622D.5060500@ron.dk>
References: <4A2F622D.5060500@ron.dk>
Message-ID: <2F52B1CED1374763822BF3AD1D283B3B@NewLife>

Rasmus et al-

This looks like a bug. A quick debug shows it's barfing on 'AarI' (as it
cycles through all enzymes, apparently creating a global cut map). AarI has
a recognition sequence of

CACCTGC (in $enz->seq->seq)

but a cut site of

CACCTGCNNNN^ (in $enz->seq->site)

The bad parm '11' refers to the end of the cut site sequence, but the routine
B:R:Analysis::_cuts is attempting to split the 7-symbol recognition sequence,
and so throws.
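
To make the mismatch concrete, here is a minimal pure-Perl sketch (it does
not use Bio::Restriction, and parse_site and cut_within_recognition are
hypothetical helper names made up for this illustration). It reads the cut
position off a caret-marked site string and compares it with the length of
the recognition sequence; HindIII (A^AGCTT) is included as an enzyme that
cuts inside its recognition sequence.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split a caret-marked site string such as 'CACCTGCNNNN^' into the bare
# site sequence and the number of symbols before the caret.
sub parse_site {
    my ($site) = @_;
    my $cut = index($site, '^');
    die "no cut marker in '$site'\n" if $cut < 0;
    (my $bare = $site) =~ tr/^//d;
    return ($bare, $cut);
}

# True when the cut position falls inside the recognition sequence, so
# taking symbols 1..$cut of the recognition sequence is valid.
sub cut_within_recognition {
    my ($recognition, $cut) = @_;
    return $cut <= length($recognition);
}

for my $enzyme (
    [ HindIII => 'AAGCTT',  'A^AGCTT' ],
    [ AarI    => 'CACCTGC', 'CACCTGCNNNN^' ],
) {
    my ($name, $recognition, $site) = @$enzyme;
    my ($bare, $cut) = parse_site($site);
    printf "%-8s recognition=%d site=%d cut=%d %s\n",
        $name, length($recognition), length($bare), $cut,
        cut_within_recognition($recognition, $cut)
            ? 'ok'
            : 'cut lies outside the recognition sequence';
}
```

For AarI the caret sits after 11 symbols while the recognition sequence
has only 7, which matches the numbers in the exception message.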

This surprises me. Core, let me know if you want me to take this on, or
if the module author can fix it quicker.

cheers,
Mark

----- Original Message ----- 
From: "Rasmus Ory Nielsen" <ron at ron.dk>
To: <bioperl-l at lists.open-bio.org>
Sent: Wednesday, June 10, 2009 3:35 AM
Subject: [Bioperl-l] Bio::Restriction::Analysis. Exception when using 
rebasefile.


> Hi,
>
> This is my first time using bioperl for restriction analysis, so please bear 
> with me, if this is a FAQ.
>
> I downloaded withrefm.906 from ftp://ftp.neb.com/pub/rebase/ and created the 
> script shown at the bottom of the mail.
> My bioperl version is bioperl-live nightly from 09-Jun-2009.
>
> The script throws an exception - see below. But, if I comment out the 
> '-enzymes' argument, so it uses the built-in collection of enzymes, it works.
>
> My problem is, that I need to use some of the enzymes that are only available 
> in rebase. So how do I get this working?
>
> Thanks for your attention.
>
> Best regards,
> Rasmus Ory Nielsen
>
>
> ############################################################
> Output from the script:
> ############################################################
>
> [roni at ksdhcp ~]$ ./restriction_test.pl
>
> --------------------- WARNING ---------------------
> MSG: The enzyme name CviKI-1 was changed to CviKI-I
> ---------------------------------------------------
>
> ------------- EXCEPTION -------------
> MSG: Bad end parameter (11). End must be less than the total length of 
> sequence (total=7)
> STACK Bio::PrimarySeq::subseq 
> /usr/local/lib/perl5/site_perl/5.10.0/Bio/PrimarySeq.pm:401
> STACK Bio::Restriction::Analysis::_enzyme_sites 
> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:900
> STACK Bio::Restriction::Analysis::_cuts 
> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:801
> STACK Bio::Restriction::Analysis::cut 
> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:379
> STACK Bio::Restriction::Analysis::fragment_maps 
> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:515
> STACK toplevel ./restriction_test.pl:30
> -------------------------------------
>
> [roni at ksdhcp ~]$
>
>
> ############################################################
> Output from the script with the '-enzymes' argument commented out
> ############################################################
>
> [roni at ksdhcp ~]$ ./restriction_test.pl
>
> --------------------- WARNING ---------------------
> MSG: The enzyme name CviKI-1 was changed to CviKI-I
> ---------------------------------------------------
> $VAR1 = [
>           {
>             'seq' => 'CTCGACCGTTAGCAA',
>             'end' => 15,
>             'start' => '1'
>           },
>           {
>             'seq' => 'AGCTTTCTACCGTTATCGT',
>             'end' => 34,
>             'start' => '16'
>           }
>         ];
> [roni at ksdhcp ~]$
>
> ############################################################
>
> #!/usr/bin/perl
> use strict;
> use warnings;
> use Bio::PrimarySeq;
> use Bio::Restriction::IO;
> use Bio::Restriction::Analysis;
> use Data::Dumper;
>
> # create seq obj
> my $seqobj = new Bio::PrimarySeq(
>     -seq        => 'CTCGACCGTTAGCAAAGCTTTCTACCGTTATCGT',
>     -primary_id => 'test',
>     -molecule   => 'dna'
> );
>
> # read rebase file
> my $rebase_io = Bio::Restriction::IO->new(
>     -file   => 'withrefm.906',
>     -format => 'withrefm',
> );
> my $rebase_collection = $rebase_io->read;
>
> # start restriction analysis
> my $restriction_analysis = Bio::Restriction::Analysis->new(
>     -seq     => $seqobj,
>     -enzymes => $rebase_collection,    # it works with this line commented out
> );
>
> # retrieve fragment maps
> my @fragment_maps = $restriction_analysis->fragment_maps('HindIII');
> print Dumper \@fragment_maps;


From cjfields at illinois.edu  Thu Jun 11 10:19:51 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 11 Jun 2009 09:19:51 -0500
Subject: [Bioperl-l] Bio::Restriction::Analysis. Exception when using
	rebasefile.
In-Reply-To: <2F52B1CED1374763822BF3AD1D283B3B@NewLife>
References: <4A2F622D.5060500@ron.dk>
	<2F52B1CED1374763822BF3AD1D283B3B@NewLife>
Message-ID: <0F0E9DCC-BE3A-4918-8226-B9988B022FEB@illinois.edu>

Mark,

Feel free to take it up.  It's probably a good idea to start a bug  
report for tracking if it proves to be thornier to fix than expected.

chris

On Jun 11, 2009, at 8:17 AM, Mark A. Jensen wrote:

> Rasmus et al-
>
> This looks like a bug. A quick debug shows it's barfing on  
> 'AarI' (as it cycles through
> all enzymes apparently creating a global cut map). AarI has a  
> recognition sequence of
>
> CACCTGC (in $enz->seq->seq)
>
> but a cut site of
>
> CACCTGCNNNN^ (in $enz->seq->site)
>
> The bad parm '11' refers to the end of the cut site sequence, but  
> the routine
> B:R:Analysis::_cuts is attempting to split the 7-symbol recognition  
> sequence,
> and so throws.
>
> This surprises me. Core, let me know if you want me to take this on,  
> or
> if the module author can fix it quicker.
>
> cheers,
> Mark
>
> ----- Original Message ----- From: "Rasmus Ory Nielsen" <ron at ron.dk>
> To: <bioperl-l at lists.open-bio.org>
> Sent: Wednesday, June 10, 2009 3:35 AM
> Subject: [Bioperl-l] Bio::Restriction::Analysis. Exception when  
> using rebasefile.
>
>
>> Hi,
>>
>> This is my first time using bioperl for restriction analysis, so  
>> please bear with me, if this is a FAQ.
>>
>> I downloaded withrefm.906 from ftp://ftp.neb.com/pub/rebase/ and  
>> created the script shown at the bottom of the mail.
>> My bioperl version is bioperl-live nightly from 09-Jun-2009.
>>
>> The script throws an exception - see below. But, if I comment out  
>> the '-enzymes' argument, so it uses the built-in collection of  
>> enzymes, it works.
>>
>> My problem is, that I need to use some of the enzymes that are only  
>> available in rebase. So how do I get this working?
>>
>> Thanks for your attention.
>>
>> Best regards,
>> Rasmus Ory Nielsen
>>
>>
>> ############################################################
>> Output from the script:
>> ############################################################
>>
>> [roni at ksdhcp ~]$ ./restriction_test.pl
>>
>> --------------------- WARNING ---------------------
>> MSG: The enzyme name CviKI-1 was changed to CviKI-I
>> ---------------------------------------------------
>>
>> ------------- EXCEPTION -------------
>> MSG: Bad end parameter (11). End must be less than the total length  
>> of sequence (total=7)
>> STACK Bio::PrimarySeq::subseq /usr/local/lib/perl5/site_perl/5.10.0/ 
>> Bio/PrimarySeq.pm:401
>> STACK Bio::Restriction::Analysis::_enzyme_sites /usr/local/lib/ 
>> perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:900
>> STACK Bio::Restriction::Analysis::_cuts /usr/local/lib/perl5/ 
>> site_perl/5.10.0/Bio/Restriction/Analysis.pm:801
>> STACK Bio::Restriction::Analysis::cut /usr/local/lib/perl5/ 
>> site_perl/5.10.0/Bio/Restriction/Analysis.pm:379
>> STACK Bio::Restriction::Analysis::fragment_maps /usr/local/lib/ 
>> perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:515
>> STACK toplevel ./restriction_test.pl:30
>> -------------------------------------
>>
>> [roni at ksdhcp ~]$
>>
>>
>> ############################################################
>> Output from the script with the '-enzymes' argument commented out
>> ############################################################
>>
>> [roni at ksdhcp ~]$ ./restriction_test.pl
>>
>> --------------------- WARNING ---------------------
>> MSG: The enzyme name CviKI-1 was changed to CviKI-I
>> ---------------------------------------------------
>> $VAR1 = [
>>          {
>>            'seq' => 'CTCGACCGTTAGCAA',
>>            'end' => 15,
>>            'start' => '1'
>>          },
>>          {
>>            'seq' => 'AGCTTTCTACCGTTATCGT',
>>            'end' => 34,
>>            'start' => '16'
>>          }
>>        ];
>> [roni at ksdhcp ~]$
>>
>> ############################################################
>>
>> #!/usr/bin/perl
>> use strict;
>> use warnings;
>> use Bio::PrimarySeq;
>> use Bio::Restriction::IO;
>> use Bio::Restriction::Analysis;
>> use Data::Dumper;
>>
>> # create seq obj
>> my $seqobj = new Bio::PrimarySeq(
>>    -seq        => 'CTCGACCGTTAGCAAAGCTTTCTACCGTTATCGT',
>>    -primary_id => 'test',
>>    -molecule   => 'dna'
>> );
>>
>> # read rebase file
>> my $rebase_io = Bio::Restriction::IO->new(
>>    -file   => 'withrefm.906',
>>    -format => 'withrefm',
>> );
>> my $rebase_collection = $rebase_io->read;
>>
>> # start restriction analysis
>> my $restriction_analysis = Bio::Restriction::Analysis->new(
>>    -seq     => $seqobj,
>>    -enzymes => $rebase_collection,    # it works with this line  
>> commented out
>> );
>>
>> # retrieve fragment maps
>> my @fragment_maps = $restriction_analysis->fragment_maps('HindIII');
>> print Dumper \@fragment_maps;


From maj at fortinbras.us  Thu Jun 11 10:26:19 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 11 Jun 2009 10:26:19 -0400
Subject: [Bioperl-l] Bio::Restriction::Analysis. Exception when using
	rebasefile.
In-Reply-To: <0F0E9DCC-BE3A-4918-8226-B9988B022FEB@illinois.edu>
References: <4A2F622D.5060500@ron.dk>
	<2F52B1CED1374763822BF3AD1D283B3B@NewLife>
	<0F0E9DCC-BE3A-4918-8226-B9988B022FEB@illinois.edu>
Message-ID: <CD6C392C39CD4287B3619FCDBC1D19CF@NewLife>

All-righty-- thanks MAJ
----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "Mark A. Jensen" <maj at fortinbras.us>
Cc: "Rasmus Ory Nielsen" <ron at ron.dk>; <bioperl-l at lists.open-bio.org>
Sent: Thursday, June 11, 2009 10:19 AM
Subject: Re: [Bioperl-l] Bio::Restriction::Analysis. Exception when using 
rebasefile.


> Mark,
>
> Feel free to take it up.  It's probably a good idea to start a bug  report for 
> tracking if it proves to be thornier to fix than expected.
>
> chris
>
> On Jun 11, 2009, at 8:17 AM, Mark A. Jensen wrote:
>
>> Rasmus et al-
>>
>> This looks like a bug. A quick debug shows it's barfing on  'AarI' (as it 
>> cycles through
>> all enzymes apparently creating a global cut map). AarI has a  recognition 
>> sequence of
>>
>> CACCTGC (in $enz->seq->seq)
>>
>> but a cut site of
>>
>> CACCTGCNNNN^ (in $enz->seq->site)
>>
>> The bad parm '11' refers to the end of the cut site sequence, but  the 
>> routine
>> B:R:Analysis::_cuts is attempting to split the 7-symbol recognition 
>> sequence,
>> and so throws.
>>
>> This surprises me. Core, let me know if you want me to take this on,  or
>> if the module author can fix it quicker.
>>
>> cheers,
>> Mark
>>
>> ----- Original Message ----- From: "Rasmus Ory Nielsen" <ron at ron.dk>
>> To: <bioperl-l at lists.open-bio.org>
>> Sent: Wednesday, June 10, 2009 3:35 AM
>> Subject: [Bioperl-l] Bio::Restriction::Analysis. Exception when  using 
>> rebasefile.
>>
>>
>>> Hi,
>>>
>>> This is my first time using bioperl for restriction analysis, so  please 
>>> bear with me, if this is a FAQ.
>>>
>>> I downloaded withrefm.906 from ftp://ftp.neb.com/pub/rebase/ and  created 
>>> the script shown at the bottom of the mail.
>>> My bioperl version is bioperl-live nightly from 09-Jun-2009.
>>>
>>> The script throws an exception - see below. But, if I comment out  the 
>>> '-enzymes' argument, so it uses the built-in collection of  enzymes, it 
>>> works.
>>>
>>> My problem is, that I need to use some of the enzymes that are only 
>>> available in rebase. So how do I get this working?
>>>
>>> Thanks for your attention.
>>>
>>> Best regards,
>>> Rasmus Ory Nielsen
>>>
>>>
>>> ############################################################
>>> Output from the script:
>>> ############################################################
>>>
>>> [roni at ksdhcp ~]$ ./restriction_test.pl
>>>
>>> --------------------- WARNING ---------------------
>>> MSG: The enzyme name CviKI-1 was changed to CviKI-I
>>> ---------------------------------------------------
>>>
>>> ------------- EXCEPTION -------------
>>> MSG: Bad end parameter (11). End must be less than the total length  of 
>>> sequence (total=7)
>>> STACK Bio::PrimarySeq::subseq /usr/local/lib/perl5/site_perl/5.10.0/ 
>>> Bio/PrimarySeq.pm:401
>>> STACK Bio::Restriction::Analysis::_enzyme_sites /usr/local/lib/ 
>>> perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:900
>>> STACK Bio::Restriction::Analysis::_cuts /usr/local/lib/perl5/ 
>>> site_perl/5.10.0/Bio/Restriction/Analysis.pm:801
>>> STACK Bio::Restriction::Analysis::cut /usr/local/lib/perl5/ 
>>> site_perl/5.10.0/Bio/Restriction/Analysis.pm:379
>>> STACK Bio::Restriction::Analysis::fragment_maps /usr/local/lib/ 
>>> perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:515
>>> STACK toplevel ./restriction_test.pl:30
>>> -------------------------------------
>>>
>>> [roni at ksdhcp ~]$
>>>
>>>
>>> ############################################################
>>> Output from the script with the '-enzymes' argument commented out
>>> ############################################################
>>>
>>> [roni at ksdhcp ~]$ ./restriction_test.pl
>>>
>>> --------------------- WARNING ---------------------
>>> MSG: The enzyme name CviKI-1 was changed to CviKI-I
>>> ---------------------------------------------------
>>> $VAR1 = [
>>>          {
>>>            'seq' => 'CTCGACCGTTAGCAA',
>>>            'end' => 15,
>>>            'start' => '1'
>>>          },
>>>          {
>>>            'seq' => 'AGCTTTCTACCGTTATCGT',
>>>            'end' => 34,
>>>            'start' => '16'
>>>          }
>>>        ];
>>> [roni at ksdhcp ~]$
>>>
>>> ############################################################
>>>
>>> #!/usr/bin/perl
>>> use strict;
>>> use warnings;
>>> use Bio::PrimarySeq;
>>> use Bio::Restriction::IO;
>>> use Bio::Restriction::Analysis;
>>> use Data::Dumper;
>>>
>>> # create seq obj
>>> my $seqobj = new Bio::PrimarySeq(
>>>    -seq        => 'CTCGACCGTTAGCAAAGCTTTCTACCGTTATCGT',
>>>    -primary_id => 'test',
>>>    -molecule   => 'dna'
>>> );
>>>
>>> # read rebase file
>>> my $rebase_io = Bio::Restriction::IO->new(
>>>    -file   => 'withrefm.906',
>>>    -format => 'withrefm',
>>> );
>>> my $rebase_collection = $rebase_io->read;
>>>
>>> # start restriction analysis
>>> my $restriction_analysis = Bio::Restriction::Analysis->new(
>>>    -seq     => $seqobj,
>>>    -enzymes => $rebase_collection,    # it works with this line  commented 
>>> out
>>> );
>>>
>>> # retrieve fragment maps
>>> my @fragment_maps = $restriction_analysis->fragment_maps('HindIII');
>>> print Dumper \@fragment_maps;


From mauricio at open-bio.org  Thu Jun 11 12:46:35 2009
From: mauricio at open-bio.org (Mauricio Herrera Cuadra)
Date: Thu, 11 Jun 2009 11:46:35 -0500
Subject: [Bioperl-l] 1.6 on doc.bioperl.org?
In-Reply-To: <17AD00895AFD43E1A1436D1065092BAC@NewLife>
References: <17AD00895AFD43E1A1436D1065092BAC@NewLife>
Message-ID: <4A3134EB.4080702@open-bio.org>

Hi Mark,

I'll take a look into this sometime between today and tomorrow. Will 
keep you posted. Thanks for the heads up :)

Mauricio.


Mark A. Jensen wrote:
> Hi Chris and list-
> Will documentation for release 1.6 be available in pdoc on doc.bioperl.org?
> I notice also that autogenerated documentation for bioperl-live doesn't contain
> new modules (or HIVQuery & Tiling, anyway ;) )--
> cheers, Mark


From maj at fortinbras.us  Thu Jun 11 14:41:26 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 11 Jun 2009 14:41:26 -0400
Subject: [Bioperl-l] 1.6 on doc.bioperl.org?
In-Reply-To: <4A3134EB.4080702@open-bio.org>
References: <17AD00895AFD43E1A1436D1065092BAC@NewLife>
	<4A3134EB.4080702@open-bio.org>
Message-ID: <A53006055C854297AAA58F6650F4F867@NewLife>

cheers Mauricio! MAJ
----- Original Message ----- 
From: "Mauricio Herrera Cuadra" <mauricio at open-bio.org>
To: "Mark A. Jensen" <maj at fortinbras.us>
Cc: "Chris Fields" <cjfields at illinois.edu>; "BioPerl List" 
<bioperl-l at bioperl.org>
Sent: Thursday, June 11, 2009 12:46 PM
Subject: Re: [Bioperl-l] 1.6 on doc.bioperl.org?


> Hi Mark,
>
> I'll take a look into this sometime between today and tomorrow. Will keep you 
> posted. Thanks for the heads up :)
>
> Mauricio.
>
>
> Mark A. Jensen wrote:
>> Hi Chris and list-
>> Will documentation for release 1.6 be available in pdoc on doc.bioperl.org?
>> I notice also that autogenerated documentation for bioperl-live doesn't 
>> contain
>> new modules (or HIVQuery & Tiling, anyway ;) )--
>> cheers, Mark


From Xianjun.Dong at bccs.uib.no  Fri Jun 12 16:38:50 2009
From: Xianjun.Dong at bccs.uib.no (Xianjun Dong)
Date: Fri, 12 Jun 2009 22:38:50 +0200
Subject: [Bioperl-l] background layer is not supported in Bioperl 1.6 for
	Bio::Graphics::Glyph
Message-ID: <4A32BCDA.4080605@ii.uib.no>

Hi,

I am not sure whether this is the right place to ask for help.

I have been struggling with a problem for several days: I want to 
highlight parts of regions in my track using a different background 
color. To do that, I defined a glyph named "background", based on the 
'Bio::Graphics::Glyph::generic' module. I overrode the draw_component() 
method, adding code like the following:

$gd->filledRectangle($left,0,$right,$gd->height, 
$self->factory->translate_color($color));

# the script is pasted at the end

This draws a rectangle with top=0 and bottom=$gd->height. I turned the 
highlight regions into a list of features and added them with add_track 
using -glyph=>'background' (see the script test.pl below). With Bioperl 
1.2.3 this works as I expect: it adds a colored block behind all the 
tracks in the panel (including the ruler arrow). You can see the output 
image in the attached file "test.bioperl1.2.3.png".

Now the problem: when I switch to Bioperl 1.5 (or 1.6), it no longer 
works as before. The highlight is still drawn, but it shrinks to the 
height of its own track instead of covering all tracks in the panel. 
I have attached that output as well; see the file "test.bioperl1.6.png".

I have tried to work out the reason. The 'background' module is based 
on the generic module, so what can cause the difference? Is it because 
$gd->height is different, or because the tracks that follow the 
'background' track cannot draw from the first position?

I could stick with Bioperl 1.2.3 to avoid the problem ("Smart person 
solve problem, wise person avoid problem"...), but then another problem 
appears: Bio::Graphics in Bioperl 1.2.3 does not support the 
$panel->create_web_map() function. So I need a later version if I want 
to create a web map for my graphics, but then I have to give up the 
highlighted background.

OK, that is long enough for my first submission here. I hope someone 
can give me a clue.

Thanks in advance!!

Xianjun


==================== test.pl =======================
#!/usr/bin/perl
 
use strict;
use lib "$ENV{HOME}/lib";
 
use Bio::Graphics;
use Bio::Graphics::Feature;
my $ftr= 'Bio::Graphics::Feature';
 
# processed_transcript
my $trans1 = 
$ftr->new(-start=>50,-end=>10,-name=>'ZK154.1',-type=>"3'-UTR");
my $trans2 = $ftr->new(-start=>100,-end=>50,-name=>'ZK154.2',-type=>'CDS');
my $trans3 = 
$ftr->new(-start=>350,-end=>225,-name=>'ZK154.3',-type=>'CDS', 
-source=>'a');
my $trans4 = 
$ftr->new(-start=>650,-end=>500,-name=>'ZK154.3',-type=>'CDS', 
-source=>'a');
my $trans5 = 
$ftr->new(-start=>700,-end=>650,-name=>'ZK154.3',-type=>"5'-UTR");
my $trans  = 
$ftr->new(-segments=>[$trans1,$trans2,$trans3,$trans4,$trans5]);

# highlight
my $trans31 = 
$ftr->new(-start=>240,-end=>450,-name=>'hightlight1',-type=>'background', 
-source=>'a');
my $trans41 = 
$ftr->new(-start=>650,-end=>600,-name=>'hightlight2',-type=>'multihourglass', 
-source=>'b');
 
my $panel= Bio::Graphics::Panel->new(-width=>1200,
                                             -length=>1050,
                                             -start =>0,
                                             -pad_left=>12,
                                             -pad_right=>12);

# the following track works as I expected in bioperl 1.2.3, but not in
# 1.5 and 1.6
$panel->add_track([$trans41,$trans31],
          -glyph   => 'background',
                  -block_bgcolor => sub{return (shift->source eq 
'a')?'#cccccc':'#fffc22'},
                  );

$panel->add_track($ftr->new(-start=>100,-end=>1000),
                  -glyph=>'arrow',
                  -double=>1,
                  -tick=>2);

$panel->add_track($trans,
          -glyph   => 'transcript2', # 'transcript2', #process_5utr',
                  -fgcolor => 'darkred',
                  -bgcolor => 'darkred',
                  -title => '$source',
                  -link => 
'http://www.ensembl.org/Homo_sapiens/transview?transcript=$name',  #EnsEMBL
                  );
   
print $panel->png;

# the following part works in bioperl 1.5 and 1.6, but does not work in
# Bioperl 1.2.3
my $map = $panel->create_web_map("image");
$panel->finished();

1;

==================== background.pm =======================
package Bio::Graphics::Glyph::background;
 
use strict;
use base 'Bio::Graphics::Glyph::generic';
sub pad_top{
  return 0;
}

sub draw_component {
  my $self = shift;
  #$self->SUPER::draw_component(@_);
  my ($gd,$dx,$dy) = @_;
  my ($left,$top,$right,$bottom) = $self->bounds($dx,$dy);
 
  # fill a rectangle behind the feature, from y=0 to the full image height
  my $color = $self->option('block_bgcolor') || '#cccccc';
  $gd->filledRectangle($left,0,$right,$gd->height, 
$self->factory->translate_color($color));
}
 
1;

-- 
==========================================
Xianjun Dong
PhD student, Lenhard group
Computational Biology Unit
Bergen Center for Computational Science
University of Bergen
Hoyteknologisenteret, Thormohlensgate 55
N-5008 Bergen, Norway
E-mail: xianjun.dong at bccs.uib.no
Tel.: +47 555 84022
Fax : +47 555 84295
==========================================

-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.bioperl1.2.3.png
Type: image/png
Size: 2789 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20090612/9cdc621a/attachment-0004.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.bioperl1.6.png
Type: image/png
Size: 2365 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20090612/9cdc621a/attachment-0005.png>

From scott at scottcain.net  Fri Jun 12 21:29:09 2009
From: scott at scottcain.net (Scott Cain)
Date: Fri, 12 Jun 2009 21:29:09 -0400
Subject: [Bioperl-l] background layer is not supported in Bioperl 1.6
	for Bio::Graphics::Glyph
In-Reply-To: <4A32BCDA.4080605@ii.uib.no>
References: <4A32BCDA.4080605@ii.uib.no>
Message-ID: <536f21b00906121829o7a5076f6hc57eb9a4dd815e48@mail.gmail.com>

Hello Xianjun,

I don't think that approach will work.  What you almost certainly need
to do is use a postgrid callback that draws the highlighted region.
For example code showing how to do this, take a look at the
make_postgrid_callback subroutine in GBrowse 1.69.  -postgrid is an
option passed to Bio::Graphics::Panel->new.
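
A minimal sketch of what such a callback might look like, assuming the
Panel methods that GBrowse's hilite_regions_closure uses (pad_left,
bottom, location2pixel, translate_color) are available on the panel
object passed to the callback. make_highlighter is a hypothetical helper
name chosen for illustration; it is not part of Bio::Graphics.

```perl
use strict;
use warnings;

# Build a callback suitable for the -postgrid option of
# Bio::Graphics::Panel->new. Each region is [start, end, color] in
# sequence coordinates. make_highlighter is a hypothetical helper name.
sub make_highlighter {
    my @regions = @_;
    return sub {
        my ($gd, $panel) = @_;
        # pad_left is added to the x coordinates, as GBrowse's
        # hilite_regions_closure does.
        my $left   = $panel->pad_left;
        my $bottom = $panel->bottom;
        for my $r (@regions) {
            my ($start, $end, $color) = @$r;
            my ($x1, $x2) = $panel->location2pixel($start, $end);
            # y starts at 0 so the block also covers the top padding
            $gd->filledRectangle($left + $x1, 0, $left + $x2, $bottom,
                                 $panel->translate_color($color || 'lightgrey'));
        }
        return;
    };
}

# Usage sketch (requires Bio::Graphics; not run here):
#   my $panel = Bio::Graphics::Panel->new(
#       -length   => 1050,
#       -width    => 1200,
#       -postgrid => make_highlighter([240, 450, '#cccccc'],
#                                     [600, 650, '#fffc22']),
#   );
```

Because the callback receives the GD image and the panel rather than a
single track's glyph, it is not limited to the bounds of one track.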

Scott




-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From scott at scottcain.net  Sat Jun 13 09:27:39 2009
From: scott at scottcain.net (Scott Cain)
Date: Sat, 13 Jun 2009 09:27:39 -0400
Subject: [Bioperl-l] background layer is not supported in Bioperl 1.6
	for Bio::Graphics::Glyph
In-Reply-To: <4A339621.2060702@ii.uib.no>
References: <4A32BCDA.4080605@ii.uib.no>
	<536f21b00906121829o7a5076f6hc57eb9a4dd815e48@mail.gmail.com>
	<4A339621.2060702@ii.uib.no>
Message-ID: <4536f7700906130627r41c90de5r311e58f1d718f9ee@mail.gmail.com>

Hi Xianjun,

I understand what you want to do, as the current version of gbrowse,
which uses bioperl 1.6, does this.  Without digging through the code,
I can't tell you exactly how this works, and you didn't send your code
that uses this callback, so I can't try it either.

One thing that is different between your code and gbrowse is that each
of the tracks is actually a separate panel (to allow track dragging),
so it is possible that this sort of callback doesn't work for
Bio::Graphics any more.

Scott

On Saturday, June 13, 2009, Xianjun Dong <Xianjun.Dong at bccs.uib.no> wrote:
> Hi, Scott
>
> Thanks for your reply.
>
> I still have a question. I dug the code out of GBrowse (pasted below). make_postgrid_callback collects all the highlight regions and then uses hilite_regions_closure to draw them with the following GD call:
>
> $gd->filledRectangle($left+$start,0,$left+$end,$bottom,
>                      $panel->translate_color($h_color));
>
> where $bottom=$panel->bottom. This is the only difference from my code, where I use $gd->height. I guess they are almost the same (apart from pad_bottom), as we can see in the code at http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Graphics/Panel.html#CODE22
>
> Anyway, I changed my highlight regions to use $panel->bottom instead of $gd->height. The output is the same when using the Bioperl 1.6 (or 1.5) library. You can see the attached image ("test.bioperl1.6.png").
>
> I may not have explained my question clearly. My question is: with bioperl 1.2.3 (actually the Bio::Graphics in bioperl 1.2.3), I get the image I want (see the attached file "test.bioperl1.2.3.png"), where the highlighted range runs from the top of the panel to the bottom. In bioperl 1.5 (or 1.6), I can only see the highlighted region in its own track, not across the whole panel. Is that clearer now? You can see the difference between the two images.
>
> [I am not sure whether the mailing list allows image attachments, so I have also put them at the following links:
> test.bioperl1.6.png:    http://translog.genereg.net/test.bioperl1.6.png
> test.bioperl1.2.3.png:    http://translog.genereg.net/test.bioperl1.2.3.png ]
>
> Could you test it and see the difference, if you have both 1.2.3 and 1.6 on your computer?
>
> I would really like to know how this works in bioperl 1.2.3 (even if it was a bug in that version).
>
> Thanks
>
> Xianjun
> =============================================
>
> # this generates the callback for highlighting a region
> sub make_postgrid_callback {
>   my $settings = shift;
>   return unless ref $settings->{h_region};
>
>   my @h_regions = map {
>     my ($h_ref,$h_start,$h_end,$h_color) = /^(.+):(\d+)\.\.(\d+)(?:@(\S+))?/;
>     defined($h_ref) && $h_ref eq $settings->{ref}
>                  ? [$h_start,$h_end,$h_color||'lightgrey']
>                  : ()
>   }
>     @{$settings->{h_region}};
>
>   return unless @h_regions;
>   return hilite_regions_closure(@h_regions);
> }
>
> # this subroutine generates a Bio::Graphics::Panel callback closure
> # suitable for hilighting a region of a panel.
> # The args are a list of [start,end,color]
> sub hilite_regions_closure {
>   my @h_regions = @_;
>
>   return sub {
>     my $gd     = shift;
>     my $panel  = shift;
>     my $left   = $panel->pad_left;
>     my $top    = $panel->top;
>     my $bottom = $panel->bottom;
>     for my $r (@h_regions) {
>       my ($h_start,$h_end,$h_color) = @$r;
>       my ($start,$end) = $panel->location2pixel($h_start,$h_end);
>       if ($end-$start <= 1) { $end++; $start-- } # so that we always see something
>       # assuming top is 0 so as to ignore top padding
>       $gd->filledRectangle($left+$start,0,$left+$end,$bottom,
>                            $panel->translate_color($h_color));
>     }
>   };
> }
>

-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From Xianjun.Dong at bccs.uib.no  Sat Jun 13 12:48:16 2009
From: Xianjun.Dong at bccs.uib.no (Xianjun Dong)
Date: Sat, 13 Jun 2009 18:48:16 +0200
Subject: [Bioperl-l] background layer is not supported in Bioperl 1.6
 for Bio::Graphics::Glyph
In-Reply-To: <4536f7700906130627r41c90de5r311e58f1d718f9ee@mail.gmail.com>
References: <4A32BCDA.4080605@ii.uib.no>	<536f21b00906121829o7a5076f6hc57eb9a4dd815e48@mail.gmail.com>	<4A339621.2060702@ii.uib.no>
	<4536f7700906130627r41c90de5r311e58f1d718f9ee@mail.gmail.com>
Message-ID: <4A33D850.1020203@ii.uib.no>

Hi, Scott

Before I give up my own solution entirely and switch to GBrowse, I 
would like to bother you once more:

As you suggested, I added the -postgrid option when creating the panel, 
which calls a function to draw the background. The code below is almost 
copied from the online POD of Bio::Graphics::Panel (see 
http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Graphics/Panel.html 
)

But it still does not work. Could you have a look? I paste it 
below. (BTW, on the POD page above the option is -postgrid=>\&draw_gap, 
while the gap-drawing function is named gap_it, not draw_gap. I guess 
it's a typo, or not?)

  my $panel = Bio::Graphics::Panel->new(-segment=>$segment,
                                        -grid=>1,
                                        -width=>600,
                                        -postgrid=> \&draw_gap);
  sub gap_it {
     my $gd    = shift;
     my $panel = shift;
     my ($gap_start,$gap_end) = $panel->location2pixel(500,600);
     my $top                  = $panel->top;
     my $bottom               = $panel->bottom;
     my $gray                 = $panel->translate_color('gray');
     $gd->filledRectangle($gap_start,$top,$gap_end,$bottom,$gray);
}

Thanks

Xianjun

-----------------------------------------------

#!/usr/bin/perl
 
use strict;
use lib "$ENV{HOME}/lib";
 
use Bio::Graphics;
use Bio::Graphics::Feature;
my $ftr= 'Bio::Graphics::Feature';
 
# processed_transcript
my $trans1 = 
$ftr->new(-start=>50,-end=>10,-name=>'ZK154.1',-type=>"3'-UTR");
my $trans2 = $ftr->new(-start=>100,-end=>50,-name=>'ZK154.2',-type=>'CDS');
my $trans3 = 
$ftr->new(-start=>350,-end=>225,-name=>'ZK154.3',-type=>'CDS', 
-source=>'a');
my $trans4 = 
$ftr->new(-start=>650,-end=>500,-name=>'ZK154.3',-type=>'CDS', 
-source=>'a');
my $trans5 = 
$ftr->new(-start=>700,-end=>650,-name=>'ZK154.3',-type=>"5'-UTR");
my $trans  = 
$ftr->new(-segments=>[$trans1,$trans2,$trans3,$trans4,$trans5]);

# highlight
my $trans31 = 
$ftr->new(-start=>240,-end=>450,-name=>'hightlight1',-type=>'background', 
-source=>'a');
my $trans41 = 
$ftr->new(-start=>650,-end=>600,-name=>'hightlight2',-type=>'multihourglass', 
-source=>'b');
 
my $panel= Bio::Graphics::Panel->new(-width=>1200,
                                             -length=>1050,
                                             -start =>0,
                                             -pad_left=>12,
                                             -pad_right=>12
                                             -postgrid=>\&gap_it);

sub gap_it {
     my $gd    = shift;
     my $panel = shift;
     my ($gap_start,$gap_end) = $panel->location2pixel(500,600);
     my $top                  = $panel->top;
     my $bottom               = $gd->height, #panel->bottom;
     my $gray                 = $panel->translate_color('red');
     $gd->filledRectangle($gap_start,$top,$gap_end,$bottom,$gray);
}
# the following track works as I expected in bioperl 1.2.3, but not in
# 1.5 and 1.6
#$panel->add_track([$trans41,$trans31],
#          -glyph   => 'background',
#                  -block_bgcolor => sub{return (shift->source eq 'a')?'#cccccc':'#fffc22'},
#                  );

$panel->add_track($ftr->new(-start=>100,-end=>1000),
                  -glyph=>'arrow',
                  -double=>1,
                  -tick=>2);

$panel->add_track($trans,
          -glyph   => 'transcript2', # 'transcript2', #process_5utr',
                  -fgcolor => 'darkred',
                  -bgcolor => 'darkred',
                  -title => '$source',
                  -link => 
'http://www.ensembl.org/Homo_sapiens/transview?transcript=$name',  #EnsEMBL
                  );
   
print $panel->png;

# the following part works in bioperl 1.5 and 1.6, but does not work in
# Bioperl 1.2.3
my $map = $panel->create_web_map("image");
$panel->finished();
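
One detail in the script above that may be worth checking: there is no
comma between -pad_right=>12 and -postgrid=>\&gap_it in the call to
Bio::Graphics::Panel->new. Perl parses "12 -postgrid" as a subtraction,
so the -postgrid key probably never reaches the constructor. The
stand-alone sketch below illustrates this without needing BioPerl;
show_args is a hypothetical helper, and the empty gap_it is only a
stand-in for the real callback.

```perl
use strict;
no warnings;   # the script above does not enable warnings either

sub gap_it { }                  # stand-in for the real drawing callback
sub show_args { return [@_] }   # hypothetical helper: captures its arguments

# Missing comma, as in the script above: "12 - 'postgrid'" is evaluated
# numerically (the string counts as 0), so the option name disappears
# from the argument list and only the code reference is left behind.
my $broken = show_args(-pad_right => 12
                       -postgrid  => \&gap_it);

# With the comma, -postgrid arrives as its own key/value pair.
my $fixed  = show_args(-pad_right => 12,
                       -postgrid  => \&gap_it);

print scalar(@$broken), " arguments without the comma\n";   # prints 3
print scalar(@$fixed),  " arguments with the comma\n";      # prints 4
```

Running the real script with "use warnings" enabled should report the
non-numeric subtraction, which is a quick way to confirm whether this is
what happens.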


Scott Cain wrote:
> Hi Xianjun,
>
> I understand what you want to do, as the current version of gbrowse
> does this, which uses bioperl 1.6.  Without digging through the code,
> I can't tell you exactly how this works and you didn't send your code
> that uses this callback, so I can't try it either.
>
> One thing that is different between your code and gbrowse is that each
> of the tracks is actually a seperate panel (to allow track dragging),
> so it possible that this sort of callback doesn't work for
> Bio::Graphics any more.
>
> Scott
>
> On Saturday, June 13, 2009, Xianjun Dong <Xianjun.Dong at bccs.uib.no> wrote:
>   
>> Hi, Scott
>>
>> Thanks for your reply first.
>>
>> I still have question: I dig out the code from GBrowse (which I paste below). Method make_postgrid_callback gets all highlight region and then use hilite_regions_closure function to draw them out, using the following GD function:
>>
>> $gd->filledRectangle($left+$start,0,$left+$end,$bottom,
>>                           $panel->translate_color($h_color));
>>
>> where the $bottom=$panel->bottom. This is the only difference from my code, where I use $gd->height. I guess they are almost same (except the pad_bottom), we can see this in the code of http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Graphics/Panel.html#CODE22
>>
>> OK. Anyway, I change to use $panel->bottom, instead of $gd->height, for my highlight regions. The output is same, when using the library of Bioperl 1.6 (or 1.5). You can see the attached image ("test.bioperl1.6.png")
>>
>> OK. I might have not explained my question explicitly. My question is: if using bioperl 1.2.3 (actually the Bio::Graphics in bioperl 1.2.3), I can get the right image I want (see the attached file "test.bioperl1.2.3.png"), where the highlight range will go from the roof to the floor. While in bioperl 1.5 (or 1.6), I only can see the highlight region in its own track, not the whole panel. OK, did I explain clearly now? you can see the difference of the two images.
>>
>> [I am not sure the mailist allow to attach image, otherwise, I put them in the following links:
>> test.bioperl1.6.png:    http://translog.genereg.net/test.bioperl1.6.png
>> test.bioperl1.2.3.png:    http://translog.genereg.net/test.bioperl1.2.3.png ]
>>
>> You can test it and see the difference if you have both 1.2.3 and 1.6 on your computer?
>>
>> Really want to know how this works in bioperl 1.2.3 (Even though this might be a bug at that version, or whatever)
>>
>> Thanks
>>
>> Xianjun
>> =============================================
>>
>> # this generates the callback for highlighting a region
>> sub make_postgrid_callback {
>>  my $settings = shift;
>>  return unless ref $settings->{h_region};
>>
>>  my @h_regions = map {
>>    my ($h_ref,$h_start,$h_end,$h_color) = /^(.+):(\d+)\.\.(\d+)(?:@(\S+))?/;
>>    defined($h_ref) && $h_ref eq $settings->{ref}
>>                 ? [$h_start,$h_end,$h_color||'lightgrey']
>>                 : ()
>>  }
>>    @{$settings->{h_region}};
>>
>>  return unless @h_regions;
>>  return hilite_regions_closure(@h_regions);
>> }
>>
>> # this subroutine generates a Bio::Graphics::Panel callback closure
>> # suitable for hilighting a region of a panel.
>> # The args are a list of [start,end,color]
>> sub hilite_regions_closure {
>>  my @h_regions = @_;
>>
>>  return sub {
>>    my $gd     = shift;
>>    my $panel  = shift;
>>    my $left   = $panel->pad_left;
>>    my $top    = $panel->top;
>>    my $bottom = $panel->bottom;
>>    for my $r (@h_regions) {
>>      my ($h_start,$h_end,$h_color) = @$r;
>>      my ($start,$end) = $panel->location2pixel($h_start,$h_end);
>>      if ($end-$start <= 1) { $end++; $start-- } # so that we always see something
>>      # assuming top is 0 so as to ignore top padding
>>      $gd->filledRectangle($left+$start,0,$left+$end,$bottom,
>>                           $panel->translate_color($h_color));
>>    }
>>  };
>> }
>>
>>
>> Scott Cain wrote:
>>
>> Hello Xianjun,
>>
>> I don't think that approach will work.  What you almost certainly need
>> to do is a postgrid callback that does the drawing of the highlighted
>> region.  For example code of how to do this, take a look at the
>> make_postgrid_callback subroutine in GBrowse 1.69.  The option
>> -postgrid is a method of Bio::Graphics::Panel.
>>
>> Scott
>>
>>
>>
>>
>> On Fri, Jun 12, 2009 at 4:38 PM, Xianjun Dong<Xianjun.Dong at bccs.uib.no> wrote:
>>
>>
>> HI,
>>
>> I am not sure this is the right place I can get help.
>>
>> I've suffered by a problem for several days: I want to highlight parts of
>> regions in my track, using a different background color. To do that, I
>> defined a glyph named "background", based on the
>> 'Bio::Graphics::Glyph::generic' module. I override the draw_component()
>> method, by adding code like below:
>>
>> $gd->filledRectangle($left,0,$right,$gd->height,
>> $self->factory->translate_color($color));
>>
>> # the script is pasted at the end
>>
>> This will draw a rectangle with top=0, bottom=$gd->height. I made the
>> highlight regions into a list of features, and add_track with
>> -glyph=>'background'. (see the following script, test.pl) This really works
>> as I expect, which will add a colored block at background of all tracks in a
>> panel (including the ruler arrow). You can see the output image in attached
>> file "test.bioperl1.2.3.png"
>>
>> Now, the problem comes: when I switch to Bioperl 1.5 (or 1.6), it does not
>> work. Well, it works, but the highlight part only shrink to a low height,
>> instead of covering all tracks in the panel. I also attached the output
>> here, see the file "test.bioperl1.6.png".
>>
>> I tried to think about the reason, the 'background' module is based on the
>> generic module. What can cause the difference? Is it because $gd->height is
>> different, or the tracks followed with 'background' track can not draw from
>> the first position?
>>
>> Well. I can stick to use Bioperl 1.2.3 to avoid the problem. ("Smart person
>> solve problem, wise person avoid problem"...) But another problem is coming:
>> Bio::Graphics in Bioperl 1.2.3 does not support $panel->create_web_map()
>> function, which means I have to use some higher version if I want to create
>> web map for my graphics, but then I have to give up using highlight
>> background.
>>
>> OK. It's long enough for my first-time submission here. Hope someone can
>> throw me some clue.
>>
>> Thanks ahead!!
>>
>> Xianjun
>>
>>
>> ==================== test.pl =======================
>> #!/usr/bin/perl
>>
>> use strict;
>> use lib "$ENV{HOME}/lib";
>>
>> use Bio::Graphics;
>> use Bio::Graphics::Feature;
>> my $ftr= 'Bio::Graphics::Feature';
>>
>> # processed_transcript
>> my $trans1 =
>> $ftr->new(-start=>50,-end=>10,-name=>'ZK154.1',-type=>"3'-UTR");
>> my $trans2 = $ftr->new(-start=>100,-end=>50,-name=>'ZK154.2',-type=>'CDS');
>> my $trans3 = $ftr->new(-start=>350,-end=>225,-name=>'ZK154.3',-type=>'CDS',
>> -source=>'a');
>> my $trans4 = $ftr->new(-start=>650,-end=>500,-name=>'ZK154.3',-type=>'CDS',
>> -source=>'a');
>> my $trans5 =
>> $ftr->new(-start=>700,-end=>650,-name=>'ZK154.3',-type=>"5'-UTR");
>> my $trans  =
>> $ftr->new(-segments=>[$trans1,$trans2,$trans3,$trans4,$trans5]);
>>
>> # hightlight
>> my $trans31 =
>> $ftr->new(-start=>240,-end=>450,-name=>'hightlight1',-type=>'background',
>> -source=>'a');
>> my $trans41 =
>> $ftr->new(-start=>650,-end=>600,-name=>'hightlight2',-type=>'multihourglass',
>> -source=>'b');
>>
>> my $panel= Bio::Graphics::Panel->new(-width=>1200,
>>                                            -length=>1050,
>>                                            -start =>0,
>>                                            -pad_left=>12,
>>                                            -pad_right=>12);
>>
>> # the following track works as I expected in bioperl 1.2.3, but not in 1.5 and 1.6
>> $panel->add_track([$trans41,$trans31],
>>         -glyph   => 'background',
>>                 -block_bgcolor => sub{return (shift->source eq
>> 'a')?'#cccccc':'#fffc22'},
>>                 );
>>
>> $panel->add_track($ftr->new(-start=>100,-end=>1000),
>>                 -glyph=>'arrow',
>>                 -double=>1,
>>                 -tick=>2);
>>
>> $panel->add_track($trans,
>>         -glyph   => 'transcript2', # 'transcript2', #process_5utr',
>>                 -fgcolor => 'darkred',
>>                 -bgcolor => 'darkred',
>>                 -title => '$source',
>>                 -link =>
>> 'http://www.ensembl.org/Homo_sapiens/transview?transcript=$name',  #EnsEMBL
>>                 );
>>  print $panel->png;
>>
>> # the following part works in bioperl 1.5 and 1.6, but does not work in Bioperl 1.2.3
>> my $map = $panel->create_web_map("image");
>> $panel->finished();
>>
>> 1;
>>
>> ==================== background.pm =======================
>> package Bio::Graphics::Glyph::background;
>>
>> use strict;
>> use base 'Bio::Graphics::Glyph::generic';
>> sub pad_top{
>>  return 0;
>> }
>>
>> sub draw_component {
>>  my $self = shift;
>>  #$self->SUPER::draw_component(@_);
>>  my ($gd,$dx,$dy) = @_;
>>  my ($left,$top,$right,$bottom) = $self->bounds($dx,$dy);
>>
>>  # draw an arrow to indicate the direction of transcript
>>  my $color = $self->option('block_bgcolor') || '#cccccc';
>>  $gd->filledRectangle($left,0,$right,$gd->height,
>> $self->factory->translate_color($color));
>> }
>>
>> 1;
>>
>> --
>> ==========================================
>> Xianjun Dong
>> PhD student, Lenhard group
>> Computational Biology Unit
>> Bergen Center for Computational Science
>> University of Bergen
>> Hoyteknologisenteret, Thormohlensgate 55
>> N-5008 Bergen, Norway
>> E-mail: xianjun.dong at bccs.uib.no
>> Tel.: +47 555 84022
>> Fax : +47 555 84295
>> ==========================================
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
==========================================
Xianjun Dong
PhD student, Lenhard group
Computational Biology Unit
Bergen Center for Computational Science
University of Bergen
Hoyteknologisenteret, Thormohlensgate 55
N-5008 Bergen, Norway
E-mail: xianjun.dong at bccs.uib.no
Tel.: +47 555 84022
Fax : +47 555 84295
==========================================


From maj at fortinbras.us  Sun Jun 14 00:35:18 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Sun, 14 Jun 2009 00:35:18 -0400
Subject: [Bioperl-l] Bio::Restriction::Analysis. Exception when using rebasefile.
In-Reply-To: <0F0E9DCC-BE3A-4918-8226-B9988B022FEB@illinois.edu>
References: <4A2F622D.5060500@ron.dk><2F52B1CED1374763822BF3AD1D283B3B@NewLife>
	<0F0E9DCC-BE3A-4918-8226-B9988B022FEB@illinois.edu>
Message-ID: <A9819F7FF3894C768CF89C36CB689942@NewLife>

All-

I'm finding this is requiring a pretty substantial refactor and
rationalization. I have opened a branch at
REPOS/bioperl-live/branches/restriction-refactor
and am making commits at will there (won't Rob be pleased!).
When it appears to be passing tests, I'll let Chris know (on list),
and he can decide on its mergability, and brave users could try
it out by downloading Bio/Restriction (deeply) via subversion.

My running commentary is at Bug #2855.
MAJ

----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "Mark A. Jensen" <maj at fortinbras.us>
Cc: <bioperl-l at lists.open-bio.org>; "Rasmus Ory Nielsen" <ron at ron.dk>
Sent: Thursday, June 11, 2009 10:19 AM
Subject: Re: [Bioperl-l] Bio::Restriction::Analysis. Exception when using rebasefile.


> Mark,
>
> Feel free to take it up.  It's probably a good idea to start a bug  report for 
> tracking if it proves to be thornier to fix than expected.
>
> chris
>
> On Jun 11, 2009, at 8:17 AM, Mark A. Jensen wrote:
>
>> Rasmus et al-
>>
>> This looks like a bug. A quick debug shows it's barfing on  'AarI' (as it 
>> cycles through
>> all enzymes apparently creating a global cut map). AarI has a  recognition 
>> sequence of
>>
>> CACCTGC (in $enz->seq->seq)
>>
>> but a cut site of
>>
>> CACCTGCNNNN^ (in $enz->seq->site)
>>
>> The bad parm '11' refers to the end of the cut site sequence, but  the 
>> routine
>> B:R:Analysis::_cuts is attempting to split the 7-symbol recognition 
>> sequence,
>> and so throws.
>>
>> This surprises me. Core, let me know if you want me to take this on,  or
>> if the module author can fix it quicker.
>>
>> cheers,
>> Mark
>>
>> ----- Original Message ----- From: "Rasmus Ory Nielsen" <ron at ron.dk>
>> To: <bioperl-l at lists.open-bio.org>
>> Sent: Wednesday, June 10, 2009 3:35 AM
>> Subject: [Bioperl-l] Bio::Restriction::Analysis. Exception when  using 
>> rebasefile.
>>
>>
>>> Hi,
>>>
>>> This is my first time using bioperl for restriction analysis, so  please 
>>> bear with me, if this is a FAQ.
>>>
>>> I downloaded withrefm.906 from ftp://ftp.neb.com/pub/rebase/ and  created 
>>> the script shown at the bottom of the mail.
>>> My bioperl version is bioperl-live nightly from 09-Jun-2009.
>>>
>>> The scripts throws an exception - see below. But, if I comment out  the 
>>> '-enzymes' argument, so it uses the built-in collection of  enzymes, it 
>>> works.
>>>
>>> My problem is, that I need to use some of the enzymes that are only 
>>> available in rebase. So how do I get this working?
>>>
>>> Thanks for your attention.
>>>
>>> Best regards,
>>> Rasmus Ory Nielsen
>>>
>>>
>>> ############################################################
>>> Output from the script:
>>> ############################################################
>>>
>>> [roni at ksdhcp ~]$ ./restriction_test.pl
>>>
>>> --------------------- WARNING ---------------------
>>> MSG: The enzyme name CviKI-1 was changed to CviKI-I
>>> ---------------------------------------------------
>>>
>>> ------------- EXCEPTION -------------
>>> MSG: Bad end parameter (11). End must be less than the total length  of 
>>> sequence (total=7)
>>> STACK Bio::PrimarySeq::subseq /usr/local/lib/perl5/site_perl/5.10.0/ 
>>> Bio/PrimarySeq.pm:401
>>> STACK Bio::Restriction::Analysis::_enzyme_sites /usr/local/lib/ 
>>> perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:900
>>> STACK Bio::Restriction::Analysis::_cuts /usr/local/lib/perl5/ 
>>> site_perl/5.10.0/Bio/Restriction/Analysis.pm:801
>>> STACK Bio::Restriction::Analysis::cut /usr/local/lib/perl5/ 
>>> site_perl/5.10.0/Bio/Restriction/Analysis.pm:379
>>> STACK Bio::Restriction::Analysis::fragment_maps /usr/local/lib/ 
>>> perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:515
>>> STACK toplevel ./restriction_test.pl:30
>>> -------------------------------------
>>>
>>> [roni at ksdhcp ~]$
>>>
>>>
>>> ############################################################
>>> Output from the script with the '-enzymes' argument commented out
>>> ############################################################
>>>
>>> [roni at ksdhcp ~]$ ./restriction_test.pl
>>>
>>> --------------------- WARNING ---------------------
>>> MSG: The enzyme name CviKI-1 was changed to CviKI-I
>>> ---------------------------------------------------
>>> $VAR1 = [
>>>          {
>>>            'seq' => 'CTCGACCGTTAGCAA',
>>>            'end' => 15,
>>>            'start' => '1'
>>>          },
>>>          {
>>>            'seq' => 'AGCTTTCTACCGTTATCGT',
>>>            'end' => 34,
>>>            'start' => '16'
>>>          }
>>>        ];
>>> [roni at ksdhcp ~]$
>>>
>>> ############################################################
>>>
>>> #!/usr/bin/perl
>>> use strict;
>>> use warnings;
>>> use Bio::PrimarySeq;
>>> use Bio::Restriction::IO;
>>> use Bio::Restriction::Analysis;
>>> use Data::Dumper;
>>>
>>> # create seq obj
>>> my $seqobj = new Bio::PrimarySeq(
>>>    -seq        => 'CTCGACCGTTAGCAAAGCTTTCTACCGTTATCGT',
>>>    -primary_id => 'test',
>>>    -molecule   => 'dna'
>>> );
>>>
>>> # read rebase file
>>> my $rebase_io = Bio::Restriction::IO->new(
>>>    -file   => 'withrefm.906',
>>>    -format => 'withrefm',
>>> );
>>> my $rebase_collection = $rebase_io->read;
>>>
>>> # start restriction analysis
>>> my $restriction_analysis = Bio::Restriction::Analysis->new(
>>>    -seq     => $seqobj,
>>>    -enzymes => $rebase_collection,    # it works with this line  commented 
>>> out
>>> );
>>>
>>> # retrieve fragment maps
>>> my @fragment_maps = $restriction_analysis->fragment_maps('HindIII');
>>> print Dumper \@fragment_maps;


From rmb32 at cornell.edu  Sun Jun 14 21:57:45 2009
From: rmb32 at cornell.edu (Robert Buels)
Date: Sun, 14 Jun 2009 18:57:45 -0700
Subject: [Bioperl-l] Bio::Restriction::Analysis. Exception when using rebasefile.
In-Reply-To: <A9819F7FF3894C768CF89C36CB689942@NewLife>
References: <4A2F622D.5060500@ron.dk><2F52B1CED1374763822BF3AD1D283B3B@NewLife>	<0F0E9DCC-BE3A-4918-8226-B9988B022FEB@illinois.edu>
	<A9819F7FF3894C768CF89C36CB689942@NewLife>
Message-ID: <4A35AA99.2080305@cornell.edu>

Mark A. Jensen wrote:
> I'm finding this is requiring a pretty substantial refactor and
> rationalization. I have opened a branch at
> REPOS/bioperl-live/branches/restriction-refactor
> and am making commits at will there (won't Rob be pleased!).
Oh Mark, you are so agile!

> When it appears to be passing tests, I'll let Chris know (on list),
> and he can decide on its mergability, and brave users could try
> it out by downloading Bio/Restriction (deeply) via subversion.
If it's passing tests but still has bugs, make sure you add tests for 
the additional bugs you find!

Rob


From maj at fortinbras.us  Sun Jun 14 22:02:37 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Sun, 14 Jun 2009 22:02:37 -0400
Subject: [Bioperl-l] Bio::Restriction::Analysis. Exception when using rebasefile.
In-Reply-To: <4A35AA99.2080305@cornell.edu>
References: <4A2F622D.5060500@ron.dk><2F52B1CED1374763822BF3AD1D283B3B@NewLife>	<0F0E9DCC-BE3A-4918-8226-B9988B022FEB@illinois.edu><A9819F7FF3894C768CF89C36CB689942@NewLife>
	<4A35AA99.2080305@cornell.edu>
Message-ID: <FFDC29BB104149BE95840F1AD1B61827@NewLife>


----- Original Message ----- 
From: "Robert Buels" <rmb32 at cornell.edu>
To: "BioPerl List" <bioperl-l at lists.open-bio.org>
Sent: Sunday, June 14, 2009 9:57 PM
Subject: Re: [Bioperl-l] Bio::Restriction::Analysis. Exception when using rebasefile.


> Mark A. Jensen wrote:
>> I'm finding this is requiring a pretty substantial refactor and
>> rationalization. I have opened a branch at
>> REPOS/bioperl-live/branches/restriction-refactor
>> and am making commits at will there (won't Rob be pleased!).
> Oh Mark, you are so agile!
ha!
>
>> When it appears to be passing tests, I'll let Chris know (on list),
>> and he can decide on its mergability, and brave users could try
>> it out by downloading Bio/Restriction (deeply) via subversion.
> If it's passing tests but still has bugs, make sure you add tests for the 
> additional bugs you find!

mais, bien sur; plenty new tests coming-- thanks Rob-
MAJ

>
> Rob


From shalabh.sharma7 at gmail.com  Mon Jun 15 16:06:31 2009
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Mon, 15 Jun 2009 16:06:31 -0400
Subject: [Bioperl-l] sub sampling
Message-ID: <9fcc48c70906151306k5df96b69k1f4d0f1466204c5a@mail.gmail.com>

Hi All,
I was wondering whether there is any module in BioPerl that does
subsampling. I have a file like this:

369859  0477    93
163417  1348    92
228122  0176    88
232792  0050    93
239636  1850    95
300069  0048    96
244108  0046    91
199087  0055    93
206209  0048    96
-              -         -
-              -         -

which contains around 100,000 lines, and I want to take a 25% sample
from it. Is there any way I can do this in BioPerl?

Thanks
Shalabh


From maj at fortinbras.us  Mon Jun 15 19:49:58 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 15 Jun 2009 19:49:58 -0400
Subject: [Bioperl-l] Bio::Restriction refactor [Was:
	Bio::Restriction::Analysis. Exception when using rebasefile.]
In-Reply-To: <4A2F622D.5060500@ron.dk>
References: <4A2F622D.5060500@ron.dk>
Message-ID: <E80E6C1BC08D4E338739148BFE9BFAC0@NewLife>

Dear All,

The revamped Bio::Restriction::* in branch

REPOS/bioperl-live/branches/restriction-refactor

passes all existing tests, including those in t/Restriction.
New tests will be added within the next day or so.
The original bug occurred because only a subset of
the possible rebase withrefm-formatted enzymes were
handled; it choked on freshly-downloaded rebase
files because of this.

The refactored version now handles *all* rebase types,
including those of rebase forms

XXX^X             [ intrasite cutters, the main types
                    built in to base.pm ]
XXXX(m/n)         [ right-end extrasite cutters ]
(s/t)XXXX         [ left-end ditto ]
(s/t)XXXX(m/n)    [ double-end ditto ],

palindromic and non-palindromic, as well as multisite
enzymes that string together combinations of these
forms. Much rationalization (well, seems rational to me
anyway) and cruft removal in the affected code has also
occurred. itype2.pm has been updated as well, to
conform to the refactoring.
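
For illustration only, the four site forms listed above can be told
apart with a few regular expressions. The sketch below is not the code
in the branch: the sub name classify_rebase_site and the labels it
returns are made up for this example, and it assumes that m, n, s and t
are (possibly negative) integers and that the recognition sequence is
written in IUPAC nucleotide codes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::Restriction): classify a single
# rebase-style site string into one of the four forms listed above.
sub classify_rebase_site {
    my ($site) = @_;
    my $seq = qr/[ACGTRYMKSWBDHVN]+/i;    # IUPAC nucleotide codes
    my $cut = qr/\(-?\d+\/-?\d+\)/;       # e.g. (4/8) or (-5/-1)

    return 'double-end extrasite' if $site =~ /^$cut$seq$cut\z/;
    return 'left-end extrasite'   if $site =~ /^$cut$seq\z/;
    return 'right-end extrasite'  if $site =~ /^$seq$cut\z/;

    # Intrasite form: exactly one caret in or next to the sequence.
    (my $bare = $site) =~ s/\^//;
    return 'intrasite' if $site =~ /\^/ && $bare =~ /^$seq\z/;

    return 'unknown';
}

# Example site strings, one per line with their classification.
for my $site ('A^AGCTT', 'CACCTGC(4/8)', '(10/15)ACNNNNGTAYC(12/7)') {
    printf "%-26s %s\n", $site, classify_rebase_site($site);
}
```

Multisite enzymes that string several of these forms together would
need the string split into its individual sites before classification;
that step is left out here.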

If you're dying to try this now, get a working copy
of the branch like so

$ svn co svn://code.open-bio.org/bioperl/bioperl-live/branches/restriction-refactor \
    bioperl-rr
$ cd bioperl-rr
$ perl Build.PL
$ ./Build test
$ ./Build install

This will only hammer your current installation in the
$SITE_LIB/Bio/Restriction path; I worked only on
a sparse checkout of the necessary files. To revert to your
old install, do

$ cd $MY_OLD_BIOPERL_WORKINGDIR
$ ./Build install

[In the possible event that these instructions are in error,
there will be a response on this list in a matter of
milliseconds, so stand by.]

Happy coding-
Mark


----- Original Message ----- 
From: "Rasmus Ory Nielsen" <ron at ron.dk>
To: <bioperl-l at lists.open-bio.org>
Sent: Wednesday, June 10, 2009 3:35 AM
Subject: [Bioperl-l] Bio::Restriction::Analysis. Exception when using 
rebasefile.


> Hi,
>
> This is my first time using bioperl for restriction analysis, so please bear 
> with me, if this is a FAQ.
>
> I downloaded withrefm.906 from ftp://ftp.neb.com/pub/rebase/ and created the 
> script shown at the bottom of the mail.
> My bioperl version is bioperl-live nightly from 09-Jun-2009.
>
> The scripts throws an exception - see below. But, if I comment out the 
> '-enzymes' argument, so it uses the built-in collection of enzymes, it works.
>
> My problem is, that I need to use some of the enzymes that are only available 
> in rebase. So how do I get this working?
>
> Thanks for your attention.
>
> Best regards,
> Rasmus Ory Nielsen
>
>
> ############################################################
> Output from the script:
> ############################################################
>
> [roni at ksdhcp ~]$ ./restriction_test.pl
>
> --------------------- WARNING ---------------------
> MSG: The enzyme name CviKI-1 was changed to CviKI-I
> ---------------------------------------------------
>
> ------------- EXCEPTION -------------
> MSG: Bad end parameter (11). End must be less than the total length of 
> sequence (total=7)
> STACK Bio::PrimarySeq::subseq 
> /usr/local/lib/perl5/site_perl/5.10.0/Bio/PrimarySeq.pm:401
> STACK Bio::Restriction::Analysis::_enzyme_sites 
> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:900
> STACK Bio::Restriction::Analysis::_cuts 
> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:801
> STACK Bio::Restriction::Analysis::cut 
> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:379
> STACK Bio::Restriction::Analysis::fragment_maps 
> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:515
> STACK toplevel ./restriction_test.pl:30
> -------------------------------------
>
> [roni at ksdhcp ~]$
>
>
> ############################################################
> Output from the script with the '-enzymes' argument commented out
> ############################################################
>
> [roni at ksdhcp ~]$ ./restriction_test.pl
>
> --------------------- WARNING ---------------------
> MSG: The enzyme name CviKI-1 was changed to CviKI-I
> ---------------------------------------------------
> $VAR1 = [
>           {
>             'seq' => 'CTCGACCGTTAGCAA',
>             'end' => 15,
>             'start' => '1'
>           },
>           {
>             'seq' => 'AGCTTTCTACCGTTATCGT',
>             'end' => 34,
>             'start' => '16'
>           }
>         ];
> [roni at ksdhcp ~]$
>
> ############################################################
>
> #!/usr/bin/perl
> use strict;
> use warnings;
> use Bio::PrimarySeq;
> use Bio::Restriction::IO;
> use Bio::Restriction::Analysis;
> use Data::Dumper;
>
> # create seq obj
> my $seqobj = new Bio::PrimarySeq(
>     -seq        => 'CTCGACCGTTAGCAAAGCTTTCTACCGTTATCGT',
>     -primary_id => 'test',
>     -molecule   => 'dna'
> );
>
> # read rebase file
> my $rebase_io = Bio::Restriction::IO->new(
>     -file   => 'withrefm.906',
>     -format => 'withrefm',
> );
> my $rebase_collection = $rebase_io->read;
>
> # start restriction analysis
> my $restriction_analysis = Bio::Restriction::Analysis->new(
>     -seq     => $seqobj,
>     -enzymes => $rebase_collection,    # it works with this line commented out
> );
>
> # retrieve fragment maps
> my @fragment_maps = $restriction_analysis->fragment_maps('HindIII');
> print Dumper \@fragment_maps;


From maj at fortinbras.us  Mon Jun 15 20:07:21 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 15 Jun 2009 20:07:21 -0400
Subject: [Bioperl-l] sub sampling
In-Reply-To: <9fcc48c70906151306k5df96b69k1f4d0f1466204c5a@mail.gmail.com>
References: <9fcc48c70906151306k5df96b69k1f4d0f1466204c5a@mail.gmail.com>
Message-ID: <A030148C139446DAB1DEE791A4EC2D3B@NewLife>

Shalabh
If you want to do sampling with replacement
this is not bad (if you trust rand() ):

 # open your file into $my_infile, then
 my @lines = <$my_infile>;

 my $num_samps = 10;
 my $sample_size_pc = 0.25;
 my $sample_size = int($sample_size_pc * @lines);
 my @samples;

 # each sample is a list of random line indices, drawn with replacement
 for (1..$num_samps) {
    push @samples, [ map { int( rand(@lines) ) } 1..$sample_size ];
 }

# now, do something, fr'instance: mean of the third column per sample
 my @sample_pc;
 foreach (@samples) {
    my $pct = 0;
    foreach my $line (@lines[ @$_ ]) {
        my @a = split ' ', $line;
        $pct += $a[2];
    }
    $pct /= @$_;
    push @sample_pc, $pct;
 }
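
If the 25% should instead be drawn without replacement (each line
picked at most once), one option is to shuffle the line indices and
keep the first quarter. This is a minimal sketch using List::Util from
core Perl rather than anything in BioPerl; the sub name subsample_lines
and the inline example data are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Hypothetical helper: return int($fraction * number_of_lines) lines
# from @$lines, each line chosen at most once, in original file order.
sub subsample_lines {
    my ($lines, $fraction) = @_;
    my $n = int($fraction * @$lines);
    return () unless $n > 0;

    # Shuffle all indices and keep the first $n of them.
    my @idx = (shuffle 0 .. $#$lines)[0 .. $n - 1];

    # Sort the picked indices so the sample keeps the file order.
    return @{$lines}[ sort { $a <=> $b } @idx ];
}

# Example with the first eight rows from the question; 25% is 2 rows.
my @example_lines = map { "$_\n" } (
    "369859  0477    93", "163417  1348    92",
    "228122  0176    88", "232792  0050    93",
    "239636  1850    95", "300069  0048    96",
    "244108  0046    91", "199087  0055    93",
);
print subsample_lines(\@example_lines, 0.25);
```

For a real file, read the lines from the filehandle as in the code
above and pass a reference to that array instead of the inline data.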

R's just better for some things, ain't it?
MAJ


----- Original Message ----- 
From: "shalabh sharma" <shalabh.sharma7 at gmail.com>
To: "bioperl-l" <Bioperl-l at lists.open-bio.org>
Sent: Monday, June 15, 2009 4:06 PM
Subject: [Bioperl-l] sub sampling


> Hi All,           I was just wondering that is there any module is bioperl
> that do subsampling?
> I have a file like this:
>
> 369859  0477    93
> 163417  1348    92
> 228122  0176    88
> 232792  0050    93
> 239636  1850    95
> 300069  0048    96
> 244108  0046    91
> 199087  0055    93
> 206209  0048    96
> -              -         -
> -              -         -
>
> which contain around 100,000 lines and i want to take out a sample of 25%
> from this file. Is there any way i can do this in Bioperl?
>
> Thanks
> Shalabh


From Xianjun.Dong at bccs.uib.no  Sat Jun 13 08:05:53 2009
From: Xianjun.Dong at bccs.uib.no (Xianjun Dong)
Date: Sat, 13 Jun 2009 14:05:53 +0200
Subject: [Bioperl-l] background layer is not supported in Bioperl 1.6
 for Bio::Graphics::Glyph
In-Reply-To: <536f21b00906121829o7a5076f6hc57eb9a4dd815e48@mail.gmail.com>
References: <4A32BCDA.4080605@ii.uib.no>
	<536f21b00906121829o7a5076f6hc57eb9a4dd815e48@mail.gmail.com>
Message-ID: <4A339621.2060702@ii.uib.no>

Hi, Scott

Thanks for your reply.

I still have a question. I dug the code out of GBrowse (pasted below).
The make_postgrid_callback subroutine collects all the highlight regions
and then uses hilite_regions_closure to draw them, with the following
GD call:

$gd->filledRectangle($left+$start,0,$left+$end,$bottom,
                           $panel->translate_color($h_color));

where $bottom = $panel->bottom. This is the only difference from my
code, where I use $gd->height. I guess the two are almost the same
(except for pad_bottom); see the code at
http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Graphics/Panel.html#CODE22

Anyway, I changed my highlight regions to use $panel->bottom instead of
$gd->height. With the Bioperl 1.6 (or 1.5) library the output is the
same as before; see the attached image ("test.bioperl1.6.png").

I may not have explained my question clearly, so here it is again:
with bioperl 1.2.3 (actually the Bio::Graphics in bioperl 1.2.3) I get
the image I want (see the attached file "test.bioperl1.2.3.png"), where
the highlighted range runs from the top of the panel to the bottom. In
bioperl 1.5 (or 1.6), the highlight region shows up only in its own
track, not across the whole panel. Is that clearer now? You can see the
difference between the two images.

[I am not sure whether the mailing list allows image attachments, so I
have also put them at the following links:
test.bioperl1.6.png:      http://translog.genereg.net/test.bioperl1.6.png
test.bioperl1.2.3.png:    http://translog.genereg.net/test.bioperl1.2.3.png ]

Could you test it and see the difference, if you have both 1.2.3 and
1.6 on your computer?

I would really like to know how this works in bioperl 1.2.3 (even if it
was a bug in that version, or whatever).

Thanks

Xianjun
=============================================

# this generates the callback for highlighting a region
sub make_postgrid_callback {
  my $settings = shift;
  return unless ref $settings->{h_region};

  my @h_regions = map {
    my ($h_ref,$h_start,$h_end,$h_color) = /^(.+):(\d+)\.\.(\d+)(?:@(\S+))?/;
    defined($h_ref) && $h_ref eq $settings->{ref}
                 ? [$h_start,$h_end,$h_color||'lightgrey']
                 : ()
  }
    @{$settings->{h_region}};

  return unless @h_regions;
  return hilite_regions_closure(@h_regions);
}

# this subroutine generates a Bio::Graphics::Panel callback closure
# suitable for hilighting a region of a panel.
# The args are a list of [start,end,color]
sub hilite_regions_closure {
  my @h_regions = @_;

  return sub {
    my $gd     = shift;
    my $panel  = shift;
    my $left   = $panel->pad_left;
    my $top    = $panel->top;
    my $bottom = $panel->bottom;
    for my $r (@h_regions) {
      my ($h_start,$h_end,$h_color) = @$r;
      my ($start,$end) = $panel->location2pixel($h_start,$h_end);
      if ($end-$start <= 1) { $end++; $start-- } # so that we always see something
      # assuming top is 0 so as to ignore top padding
      $gd->filledRectangle($left+$start,0,$left+$end,$bottom,
                           $panel->translate_color($h_color));
    }
  };
}
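
To make the region format concrete: the regular expression in
make_postgrid_callback expects each highlight setting to look like
ref:start..end, optionally followed by @color. Below is a small
stand-alone sketch of just that parsing step, with no GBrowse or
Bio::Graphics needed. The sub name parse_h_regions and the example
strings are made up here; the default colour is taken from the code
above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-alone version of the parsing step: keep only the
# regions on $ref and return them as [start, end, color] triples.
sub parse_h_regions {
    my ($ref, @specs) = @_;
    return map {
        my ($h_ref, $h_start, $h_end, $h_color)
            = /^(.+):(\d+)\.\.(\d+)(?:\@(\S+))?/;
        defined($h_ref) && $h_ref eq $ref
            ? [ $h_start, $h_end, $h_color || 'lightgrey' ]
            : ();
    } @specs;
}

# Two regions on chrI (one without a colour) and one on another reference.
my @regions = parse_h_regions('chrI', 'chrI:240..450@yellow',
                              'chrI:600..650', 'chrII:10..20@red');
print join(',', @$_), "\n" for @regions;
```

Each triple has the same shape as the [start,end,color] entries that
hilite_regions_closure takes as arguments.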


Scott Cain wrote:
> Hello Xianjun,
>
> I don't think that approach will work.  What you almost certainly need
> to do is a postgrid callback that does the drawing of the highlighted
> region.  For example code of how to do this, take a look at the
> make_postgrid_callback subroutine in GBrowse 1.69.  The option
> -postgrid is a method of Bio::Graphics::Panel.
>
> Scott
>
>
>
>
> On Fri, Jun 12, 2009 at 4:38 PM, Xianjun Dong<Xianjun.Dong at bccs.uib.no> wrote:
>   
>> HI,
>>
>> I am not sure this is the right place I can get help.
>>
>> I've suffered by a problem for several days: I want to highlight parts of
>> regions in my track, using a different background color. To do that, I
>> defined a glyph named "background", based on the
>> 'Bio::Graphics::Glyph::generic' module. I override the draw_component()
>> method, by adding code like below:
>>
>> $gd->filledRectangle($left,0,$right,$gd->height,
>> $self->factory->translate_color($color));
>>
>> # the script is pasted at the end
>>
>> This will draw a rectangle with top=0, bottom=$gd->height. I made the
>> highlight regions into a list of features, and add_track with
>> -glyph=>'background'. (see the following script, test.pl) This really works
>> as I expect, which will add a colored block at background of all tracks in a
>> panel (including the ruler arrow). You can see the output image in attached
>> file "test.bioperl1.2.3.png"
>>
>> Now, the problem comes: when I switch to Bioperl 1.5 (or 1.6), it does not
>> work. Well, it works, but the highlight part only shrink to a low height,
>> instead of covering all tracks in the panel. I also attached the output
>> here, see the file "test.bioperl1.6.png".
>>
>> I tried to think about the reason, the 'background' module is based on the
>> generic module. What can cause the difference? Is it because $gd->height is
>> different, or the tracks followed with 'background' track can not draw from
>> the first position?
>>
>> Well. I can stick to use Bioperl 1.2.3 to avoid the problem. ("Smart person
>> solve problem, wise person avoid problem"...) But another problem is coming:
>> Bio::Graphics in Bioperl 1.2.3 does not support $panel->create_web_map()
>> function, which means I have to use some higher version if I want to create
>> web map for my graphics, but then I have to give up using highlight
>> background.
>>
>> OK. It's long enough for my first-time submission here. Hope someone can
>> throw me some clue.
>>
>> Thanks ahead!!
>>
>> Xianjun
>>
>>
>> ==================== test.pl =======================
>> #!/usr/bin/perl
>>
>> use strict;
>> use lib "$ENV{HOME}/lib";
>>
>> use Bio::Graphics;
>> use Bio::Graphics::Feature;
>> my $ftr= 'Bio::Graphics::Feature';
>>
>> # processed_transcript
>> my $trans1 =
>> $ftr->new(-start=>50,-end=>10,-name=>'ZK154.1',-type=>"3'-UTR");
>> my $trans2 = $ftr->new(-start=>100,-end=>50,-name=>'ZK154.2',-type=>'CDS');
>> my $trans3 = $ftr->new(-start=>350,-end=>225,-name=>'ZK154.3',-type=>'CDS',
>> -source=>'a');
>> my $trans4 = $ftr->new(-start=>650,-end=>500,-name=>'ZK154.3',-type=>'CDS',
>> -source=>'a');
>> my $trans5 =
>> $ftr->new(-start=>700,-end=>650,-name=>'ZK154.3',-type=>"5'-UTR");
>> my $trans  =
>> $ftr->new(-segments=>[$trans1,$trans2,$trans3,$trans4,$trans5]);
>>
>> # hightlight
>> my $trans31 =
>> $ftr->new(-start=>240,-end=>450,-name=>'hightlight1',-type=>'background',
>> -source=>'a');
>> my $trans41 =
>> $ftr->new(-start=>650,-end=>600,-name=>'hightlight2',-type=>'multihourglass',
>> -source=>'b');
>>
>> my $panel= Bio::Graphics::Panel->new(-width=>1200,
>>                                            -length=>1050,
>>                                            -start =>0,
>>                                            -pad_left=>12,
>>                                            -pad_right=>12);
>>
>> # the following track works as I expected in bioperl 1.2.3, but not in 1.5 and 1.6
>> $panel->add_track([$trans41,$trans31],
>>         -glyph   => 'background',
>>                 -block_bgcolor => sub{return (shift->source eq
>> 'a')?'#cccccc':'#fffc22'},
>>                 );
>>
>> $panel->add_track($ftr->new(-start=>100,-end=>1000),
>>                 -glyph=>'arrow',
>>                 -double=>1,
>>                 -tick=>2);
>>
>> $panel->add_track($trans,
>>         -glyph   => 'transcript2', # 'transcript2', #process_5utr',
>>                 -fgcolor => 'darkred',
>>                 -bgcolor => 'darkred',
>>                 -title => '$source',
>>                 -link =>
>> 'http://www.ensembl.org/Homo_sapiens/transview?transcript=$name',  #EnsEMBL
>>                 );
>>  print $panel->png;
>>
>> # the following part works in bioperl 1.5 and 1.6, but does not work in
>> # bioperl 1.2.3
>> my $map = $panel->create_web_map("image");
>> $panel->finished();
>>
>> 1;
>>
>> ==================== background.pm =======================
>> package Bio::Graphics::Glyph::background;
>>
>> use strict;
>> use base 'Bio::Graphics::Glyph::generic';
>> sub pad_top{
>>  return 0;
>> }
>>
>> sub draw_component {
>>  my $self = shift;
>>  #$self->SUPER::draw_component(@_);
>>  my ($gd,$dx,$dy) = @_;
>>  my ($left,$top,$right,$bottom) = $self->bounds($dx,$dy);
>>
>>  # fill a rectangle spanning the full image height behind the feature
>>  my $color = $self->option('block_bgcolor') || '#cccccc';
>>  $gd->filledRectangle($left,0,$right,$gd->height,
>> $self->factory->translate_color($color));
>> }
>>
>> 1;
>>
>> --
>> ==========================================
>> Xianjun Dong
>> PhD student, Lenhard group
>> Computational Biology Unit
>> Bergen Center for Computational Science
>> University of Bergen
>> Hoyteknologisenteret, Thormohlensgate 55
>> N-5008 Bergen, Norway
>> E-mail: xianjun.dong at bccs.uib.no
>> Tel.: +47 555 84022
>> Fax : +47 555 84295
>> ==========================================
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>

-- 
==========================================
Xianjun Dong
PhD student, Lenhard group
Computational Biology Unit
Bergen Center for Computational Science
University of Bergen
Hoyteknologisenteret, Thormohlensgate 55
N-5008 Bergen, Norway
E-mail: xianjun.dong at bccs.uib.no
Tel.: +47 555 84022
Fax : +47 555 84295
==========================================

-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.bioperl1.2.3.png
Type: image/png
Size: 2789 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20090613/3cf5d9c2/attachment-0004.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.bioperl1.6.png
Type: image/png
Size: 2365 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20090613/3cf5d9c2/attachment-0005.png>

From malcolm.cook at gmail.com  Tue Jun 16 04:06:36 2009
From: malcolm.cook at gmail.com (Malcolm Cook)
Date: Tue, 16 Jun 2009 03:06:36 -0500
Subject: [Bioperl-l]  Alignment->slice() issue?
Message-ID: <cfebe0020906160106s5f76501au9d67d8ee3062d166@mail.gmail.com>

Kevin,

I'm getting struck by this old issue you once coded around.

      http://bioperl.org/pipermail/bioperl-l/2007-January/024665.html

Any chance you could share your implementation with a fellow traveller?

Thanks,

Malcolm Cook
Stowers Institute for Medical Research


From remi.planel at free.fr  Tue Jun 16 10:57:27 2009
From: remi.planel at free.fr (Remi Planel)
Date: Tue, 16 Jun 2009 16:57:27 +0200
Subject: [Bioperl-l] Hits Object
Message-ID: <4A37B2D7.70807@free.fr>

Hi all,

Starting from a Bio::Search::Result::ResultI object (obtained after 
parsing a BLAST report), I couldn't find a way to filter some of the 
associated HSPs. By filter I mean eliminating, for each hit, the HSPs 
I'm not interested in.

Can I modify the Result object directly?

Thanks,


From lsbrath at gmail.com  Tue Jun 16 11:42:37 2009
From: lsbrath at gmail.com (Mgavi Brathwaite)
Date: Tue, 16 Jun 2009 11:42:37 -0400
Subject: [Bioperl-l] error message: can't call method "next_hit" on and
	undefined value
Message-ID: <69367b8f0906160842y39c89b86m8733e5b18e5334e2@mail.gmail.com>

Hello,
My method produces an error message stating that it can't call a "next_hit"
method on an undefined value.

sub hu_bl2seq_parser{
	my ($maid, $maid_dir) = @_;
	# Get the report
	my $in = new Bio::SearchIO(-format => 'blast',
                           -file   => ">".$maid_dir."\\".$maid."aln_hu.aln",
						   -report_type => 'blastn');
	#open(my $out, ">$maid_dir/".$maid."aln_hu_parsed.out");					
	#my $out = Bio::AlignIO->newFh(-format => 'clustalw' );
	my $result=$in->next_result;
	my($hu_aln,$hu_mismatches);
	# Get info about the first hit
	my $hit = $result->next_hit;
	my $name = $hit->name;
	# get info about the first hsp of the first hit
	my $hsp = $hit->next_hsp;
	# get the alignment object
	my $aln = $hsp->get_aln;
	#my $percent_id = $hsp->percent_identity;
	#my $aln_length = $hsp->length('total');
	my @mismatches = $hsp->seq_inds('query','nomatch');
	my $aln_str="";
	# access the alignment string
	my $strIO=IO::String->new($aln_str);
	#  write the string alignio in clustalw format
	my $alnio = Bio::AlignIO->new(-format => 'clustalw', -fh=>$strIO);
	# now the actual alignment string is accessible for printing or, in
	# this case, moving to a db table
	$alnio->write_aln($aln);
	$hu_aln=$aln_str;
	$hu_mismatches = scalar @mismatches;
	return($hu_aln, $hu_mismatches);
}

The problem is at "my $hit = $result->next_hit;"
Any help will be appreciated.
LomSpace


From cjfields at illinois.edu  Tue Jun 16 14:14:18 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 16 Jun 2009 13:14:18 -0500
Subject: [Bioperl-l] Bio::Restriction refactor
	[Was:Bio::Restriction::Analysis. Exception when using rebasefile.]
In-Reply-To: <D4FBF1054F5C48C0ACBF81873A81E7AB@NewLife>
References: <4A2F622D.5060500@ron.dk>
	<E80E6C1BC08D4E338739148BFE9BFAC0@NewLife>
	<D4FBF1054F5C48C0ACBF81873A81E7AB@NewLife>
Message-ID: <9A7FE5B3-29A2-4FAE-AE5A-945064DD8DB6@illinois.edu>

I'll check out the branch sometime today and run tests on it.  Thanks  
for the hard work Mark!

chris



From maj at fortinbras.us  Tue Jun 16 13:58:56 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 16 Jun 2009 13:58:56 -0400
Subject: [Bioperl-l] Bio::Restriction refactor
	[Was:Bio::Restriction::Analysis. Exception when using rebasefile.]
In-Reply-To: <E80E6C1BC08D4E338739148BFE9BFAC0@NewLife>
References: <4A2F622D.5060500@ron.dk>
	<E80E6C1BC08D4E338739148BFE9BFAC0@NewLife>
Message-ID: <D4FBF1054F5C48C0ACBF81873A81E7AB@NewLife>

Dear All,

There are tests for the new functionality of Bio::Restriction
now in t/Restriction on the branch, along with the withrefm.906
in t/data that revealed the bug in RON's post. All tests pass without
warnings on my machine (which is bioperl live, perl 5.10.10,
under Vista/cygwin - yes, I still don't have a real computer).
We're ready for a merge on my end.

Thanks all for your silent assent to these machinations.
cheers
Mark

----- Original Message ----- 
From: "Mark A. Jensen" <maj at fortinbras.us>
To: "Rasmus Ory Nielsen" <ron at ron.dk>; <bioperl-l at lists.open-bio.org>
Sent: Monday, June 15, 2009 7:49 PM
Subject: [Bioperl-l] Bio::Restriction refactor [Was:Bio::Restriction::Analysis. 
Exception when using rebasefile.]


> Dear All,
>
> The revamped Bio::Restriction::* in branch
>
> REPOS/bioperl-live/branches/restriction-refactor
>
> passes all existing tests, including those in t/Restriction.
> New tests will be added within the next day or so.
> The original bug occurred because only a subset of
> the possible rebase withrefm-formatted enzymes were
> handled; it choked on freshly-downloaded rebase
> files because of this.
>
> The refactored version now handles *all* rebase types,
> including those of rebase forms
>
> XXX^X                [ intrasite cutters, the main types
>                               built in to base.pm]
> XXXX(m/n)          [ right-end extrasite cutters ]
> (s/t)XXXX            [ left-end ditto ]
> (s/t)XXXX(m/n)    [ double-end ditto],
>
> palindromic and non-palindromic, as well as multisite
> enzymes that string together combinations of these
> forms. Much rationalization (well, seems rational to me
> anyway) and cruft removal in the affected code has also
> occurred. itype2.pm has been updated as well, to
> conform to the refactoring.
>
> If you're dying to try this now, get a working copy
> of the branch like so
>
> $ svn co 
> svn://code.open-bio.org/bioperl/bioperl-live/branches/restriction-refactor 
> bioperl-rr
> $ cd bioperl-rr
> $ perl Build.PL
> $ ./Build test
> $ ./Build install
>
> This will only hammer your current installation in the
> $SITE_LIB/Bio/Restriction path; I worked only on
> a sparse checkout of the necessary files. To revert to your
> old install, do
>
> $ cd $MY_OLD_BIOPERL_WORKINGDIR
> $ ./Build install
>
> [In the possible event that these instructions are in error,
> there will be a response on this list in a matter of
> milliseconds, so stand by.]
>
> Happy coding-
> Mark
>
>
>
>
> ----- Original Message ----- 
> From: "Rasmus Ory Nielsen" <ron at ron.dk>
> To: <bioperl-l at lists.open-bio.org>
> Sent: Wednesday, June 10, 2009 3:35 AM
> Subject: [Bioperl-l] Bio::Restriction::Analysis. Exception when using 
> rebasefile.
>
>
>> Hi,
>>
>> This is my first time using bioperl for restriction analysis, so please bear 
>> with me, if this is a FAQ.
>>
>> I downloaded withrefm.906 from ftp://ftp.neb.com/pub/rebase/ and created the 
>> script shown at the bottom of the mail.
>> My bioperl version is bioperl-live nightly from 09-Jun-2009.
>>
>> The script throws an exception - see below. But, if I comment out the 
>> '-enzymes' argument, so it uses the built-in collection of enzymes, it works.
>>
>> My problem is, that I need to use some of the enzymes that are only available 
>> in rebase. So how do I get this working?
>>
>> Thanks for your attention.
>>
>> Best regards,
>> Rasmus Ory Nielsen
>>
>>
>> ############################################################
>> Output from the script:
>> ############################################################
>>
>> [roni at ksdhcp ~]$ ./restriction_test.pl
>>
>> --------------------- WARNING ---------------------
>> MSG: The enzyme name CviKI-1 was changed to CviKI-I
>> ---------------------------------------------------
>>
>> ------------- EXCEPTION -------------
>> MSG: Bad end parameter (11). End must be less than the total length of 
>> sequence (total=7)
>> STACK Bio::PrimarySeq::subseq 
>> /usr/local/lib/perl5/site_perl/5.10.0/Bio/PrimarySeq.pm:401
>> STACK Bio::Restriction::Analysis::_enzyme_sites 
>> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:900
>> STACK Bio::Restriction::Analysis::_cuts 
>> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:801
>> STACK Bio::Restriction::Analysis::cut 
>> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:379
>> STACK Bio::Restriction::Analysis::fragment_maps 
>> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:515
>> STACK toplevel ./restriction_test.pl:30
>> -------------------------------------
>>
>> [roni at ksdhcp ~]$
>>
>>
>> ############################################################
>> Output from the script with the '-enzymes' argument commented out
>> ############################################################
>>
>> [roni at ksdhcp ~]$ ./restriction_test.pl
>>
>> --------------------- WARNING ---------------------
>> MSG: The enzyme name CviKI-1 was changed to CviKI-I
>> ---------------------------------------------------
>> $VAR1 = [
>>           {
>>             'seq' => 'CTCGACCGTTAGCAA',
>>             'end' => 15,
>>             'start' => '1'
>>           },
>>           {
>>             'seq' => 'AGCTTTCTACCGTTATCGT',
>>             'end' => 34,
>>             'start' => '16'
>>           }
>>         ];
>> [roni at ksdhcp ~]$
>>
>> ############################################################
>>
>> #!/usr/bin/perl
>> use strict;
>> use warnings;
>> use Bio::PrimarySeq;
>> use Bio::Restriction::IO;
>> use Bio::Restriction::Analysis;
>> use Data::Dumper;
>>
>> # create seq obj
>> my $seqobj = new Bio::PrimarySeq(
>>     -seq        => 'CTCGACCGTTAGCAAAGCTTTCTACCGTTATCGT',
>>     -primary_id => 'test',
>>     -molecule   => 'dna'
>> );
>>
>> # read rebase file
>> my $rebase_io = Bio::Restriction::IO->new(
>>     -file   => 'withrefm.906',
>>     -format => 'withrefm',
>> );
>> my $rebase_collection = $rebase_io->read;
>>
>> # start restriction analysis
>> my $restriction_analysis = Bio::Restriction::Analysis->new(
>>     -seq     => $seqobj,
>>     -enzymes => $rebase_collection,    # it works with this line commented 
>> out
>> );
>>
>> # retrieve fragment maps
>> my @fragment_maps = $restriction_analysis->fragment_maps('HindIII');
>> print Dumper \@fragment_maps;


From maj at fortinbras.us  Tue Jun 16 13:51:14 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 16 Jun 2009 13:51:14 -0400
Subject: [Bioperl-l] Hits Object
In-Reply-To: <4A37B2D7.70807@free.fr>
Message-ID: <3766B1A38606458EB5FA24D24371433D@NewLife>

Remi- have a look at http://www.bioperl.org/wiki/HOWTO:SearchIO and maybe
http://www.bioperl.org/wiki/Parsing_BLAST_HSPs; perhaps your questions will 
be answered there-
cheers, Mark


From cjfields at illinois.edu  Tue Jun 16 14:31:10 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 16 Jun 2009 13:31:10 -0500
Subject: [Bioperl-l] Bio::Restriction refactor
	[Was:Bio::Restriction::Analysis. Exception when using rebasefile.]
In-Reply-To: <D4FBF1054F5C48C0ACBF81873A81E7AB@NewLife>
References: <4A2F622D.5060500@ron.dk>
	<E80E6C1BC08D4E338739148BFE9BFAC0@NewLife>
	<D4FBF1054F5C48C0ACBF81873A81E7AB@NewLife>
Message-ID: <A800F5EC-C7E2-4BE4-9B45-3E71FB60AC2E@illinois.edu>

Everything passes on my end (Mac OS X 10.5, perl 5.10.0).  +1 on the  
merge.

Also (as mentioned some time back w/ Hilmar among others), we can  
probably delete this branch seeing as the code will be merged to trunk  
(it being a feature branch and all).  Worth doing the same for a few  
other feature branches as well.

chris



From cjfields at illinois.edu  Tue Jun 16 15:07:44 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 16 Jun 2009 14:07:44 -0500
Subject: [Bioperl-l] Alignment->slice() issue?
In-Reply-To: <cfebe0020906160106s5f76501au9d67d8ee3062d166@mail.gmail.com>
References: <cfebe0020906160106s5f76501au9d67d8ee3062d166@mail.gmail.com>
Message-ID: <FB532969-4EE0-4D37-B14B-5BC806A95FFE@illinois.edu>

Sounds to me like a BioPerl bug.  Do you have some example data  
demonstrating the problem?

chris

On Jun 16, 2009, at 3:06 AM, Malcolm Cook wrote:

> Kevin,
>
> I'm getting struck by this old issue you once coded around.
>
>      http://bioperl.org/pipermail/bioperl-l/2007-January/024665.html
>
> Any chance you could share your implementation with  fellow  
> traveller...
>
> ??
>
> Thanks,
>
> Malcolm Cook
> Stowers Institute for Medical Research


From maj at fortinbras.us  Tue Jun 16 15:32:02 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 16 Jun 2009 15:32:02 -0400
Subject: [Bioperl-l] error message: can't call method "next_hit" on
	andundefined value
In-Reply-To: <69367b8f0906160842y39c89b86m8733e5b18e5334e2@mail.gmail.com>
References: <69367b8f0906160842y39c89b86m8733e5b18e5334e2@mail.gmail.com>
Message-ID: <91AC45F45A0F43D292323A711F0D5BDA@NewLife>

lomspace-
this

my $in = new Bio::SearchIO(-format => 'blast',
                           -file   => ">".$maid_dir."\\".$maid."aln_hu.aln",
   -report_type => 'blastn');

should be

my $in = new Bio::SearchIO(-format => 'blast',
                           -file   => $maid_dir."\\".$maid."aln_hu.aln",
   -report_type => 'blastn');

if you're reading the file. Then $result will have something in it when
you do $in->next_result

cheers, MAJ
----- Original Message ----- 
From: "Mgavi Brathwaite" <lsbrath at gmail.com>
To: <Bioperl-l at lists.open-bio.org>
Sent: Tuesday, June 16, 2009 11:42 AM
Subject: [Bioperl-l] error message: can't call method "next_hit" on andundefined 
value


> Hello,
> My method produces an error message stating that it can't call a "next_hit"
> method on an undefined value.
>
> sub hu_bl2seq_parser{
> my ($maid, $maid_dir) = @_;
> # Get the report
> my $in = new Bio::SearchIO(-format => 'blast',
>                           -file   => ">".$maid_dir."\\".$maid."aln_hu.aln",
>    -report_type => 'blastn');
> #open(my $out, ">$maid_dir/".$maid."aln_hu_parsed.out");
> #my $out = Bio::AlignIO->newFh(-format => 'clustalw' );
> my $result=$in->next_result;
> my($hu_aln,$hu_mismatches);
> # Get info about the first hit
> my $hit = $result->next_hit;
> my $name = $hit->name;
> # get info about the first hsp of the first hit
> my $hsp = $hit->next_hsp;
> # get the alignment object
> my $aln = $hsp->get_aln;
> #my $percent_id = $hsp->percent_identity;
> #my $aln_length = $hsp->length('total');
> my @mismatches = $hsp->seq_inds('query','nomatch');
> my $aln_str="";
> # access the alignment string
> my $strIO=IO::String->new($aln_str);
> #  write the string alignio in clustalw format
> my $alnio = Bio::AlignIO->new(-format => 'clustalw', -fh=>$strIO);
> # now the actual alignment string is accessible for printing or, in
> # this case, moving to a db table
> $alnio->write_aln($aln);
> $hu_aln=$aln_str;
> $hu_mismatches = scalar @mismatches;
> return($hu_aln, $hu_mismatches);
> }
>
> The problem is at "my $hit = $result->next_hit;"
> Any help will be appreciated.
> LomSpace


From rmb32 at cornell.edu  Tue Jun 16 15:46:40 2009
From: rmb32 at cornell.edu (Robert Buels)
Date: Tue, 16 Jun 2009 12:46:40 -0700
Subject: [Bioperl-l] error message: can't call method "next_hit" on and
 undefined value
In-Reply-To: <69367b8f0906160842y39c89b86m8733e5b18e5334e2@mail.gmail.com>
References: <69367b8f0906160842y39c89b86m8733e5b18e5334e2@mail.gmail.com>
Message-ID: <4A37F6A0.1080907@cornell.edu>

Mgavi Brathwaite wrote:
> Hello,
> My method produces an error message stating that it can't call a "next_hit"
> method on an undefined value.

Your proximate problem seems to be that you are prepending a '>' to the 
filename in your invocation of Bio::SearchIO::new, which I think causes 
it to open the file for writing instead of reading.  With nothing to 
parse, next_result() returns undef, and calling next_hit on that undef 
is what produces your "can't call next_hit on undefined value" error.

You also probably want to use next_result and next_hit in while loops, 
since they return undef when there are no more results or hits to parse.

By while loops, I mean something like:

while( my $result = $in->next_result ) {
      while( my $hit = $result->next_hit ) {
      # insert the rest of your operations here
      }
}
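
A slightly fuller sketch of the loops above, written so that it never
calls a method on undef. The helper name first_hit_and_hsp is made up
for illustration and is not part of BioPerl; it only assumes that the
object passed in has a next_result method, as a Bio::SearchIO reader
does. The commented usage assumes the report file already exists, and
uses a forward slash in the path, which Perl also accepts on Windows.

```perl
use strict;
use warnings;

# Return the first hit of a report and that hit's first HSP, or an empty
# list when the report holds no results, hits or HSPs. $in is anything
# with a next_result method, such as a Bio::SearchIO reader.
# (first_hit_and_hsp is a hypothetical helper, not a BioPerl function.)
sub first_hit_and_hsp {
    my ($in) = @_;
    while ( my $result = $in->next_result ) {
        while ( my $hit = $result->next_hit ) {
            # next_hsp also returns undef when a hit has no HSPs left
            my $hsp = $hit->next_hsp or next;
            return ( $hit, $hsp );
        }
    }
    return;    # nothing to parse: let the caller decide what to do
}

# Usage with BioPerl. Note there is no leading '>' on the file name,
# so the file is opened for reading:
#
#   use Bio::SearchIO;
#   my $in = Bio::SearchIO->new( -format => 'blast',
#                                -file   => "$maid_dir/${maid}aln_hu.aln" );
#   my ( $hit, $hsp ) = first_hit_and_hsp($in)
#       or die "no hits found in report\n";
```

Returning an empty list rather than dying keeps the decision with the
caller, who may prefer to skip an empty report instead of aborting.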

Hope this helps.

Rob



From maj at fortinbras.us  Tue Jun 16 16:10:34 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 16 Jun 2009 16:10:34 -0400
Subject: [Bioperl-l] Bio::Restriction
	refactor[Was:Bio::Restriction::Analysis. Exception when using
	rebasefile.]
In-Reply-To: <A800F5EC-C7E2-4BE4-9B45-3E71FB60AC2E@illinois.edu>
References: <4A2F622D.5060500@ron.dk><E80E6C1BC08D4E338739148BFE9BFAC0@NewLife><D4FBF1054F5C48C0ACBF81873A81E7AB@NewLife>
	<A800F5EC-C7E2-4BE4-9B45-3E71FB60AC2E@illinois.edu>
Message-ID: <61179C22E04F479686C7F5CFEC496FB0@NewLife>

Right; will remove branch. Will go ahead with merge at 21:20 UTC.
cheers MAJ
----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "Mark A. Jensen" <maj at fortinbras.us>
Cc: <bioperl-l at lists.open-bio.org>; "Rasmus Ory Nielsen" <ron at ron.dk>
Sent: Tuesday, June 16, 2009 2:31 PM
Subject: Re: [Bioperl-l] Bio::Restriction 
refactor[Was:Bio::Restriction::Analysis. Exception when using rebasefile.]


> Everything passes on my end (Mac OS X 10.5, perl 5.10.0).  +1 on the merge.
>
> Also (as mentioned some time back w/ Hilmar among others), we can probably
> delete this branch seeing as the code will be merged to trunk (it being a
> feature branch and all).  Worth doing the same for a few other feature
> branches as well.
>
> chris
>
> On Jun 16, 2009, at 12:58 PM, Mark A. Jensen wrote:
>
>> Dear All,
>>
>> There are tests for the new functionality of Bio::Restriction
>> now in t/Restriction on the branch, along with the withrefm.906
>> in t/data that revealed the bug in RON's post. All tests pass without
>> warnings on my machine (which is bioperl live, perl 5.10.0,
>> under Vista/cygwin - yes, I still don't have a real computer).
>> We're ready for a merge on my end.
>>
>> Thanks all for your silent assent to these machinations.
>> cheers
>> Mark
>>
>> ----- Original Message -----
>> From: "Mark A. Jensen" <maj at fortinbras.us>
>> To: "Rasmus Ory Nielsen" <ron at ron.dk>; <bioperl-l at lists.open-bio.org>
>> Sent: Monday, June 15, 2009 7:49 PM
>> Subject: [Bioperl-l] Bio::Restriction refactor 
>> [Was:Bio::Restriction::Analysis. Exception when using rebasefile.]
>>
>>
>>> Dear All,
>>>
>>> The revamped Bio::Restriction::* in branch
>>>
>>> REPOS/bioperl-live/branches/restriction-refactor
>>>
>>> passes all existing tests, including those in t/Restriction.
>>> New tests will be added within the next day or so.
>>> The original bug occurred because only a subset of
>>> the possible rebase withrefm-formatted enzymes were
>>> handled; it choked on freshly-downloaded rebase
>>> files because of this.
>>>
>>> The refactored version now handles *all* rebase types,
>>> including those of rebase forms
>>>
>>> XXX^X            [ intrasite cutters, the main types
>>>                    built in to base.pm ]
>>> XXXX(m/n)        [ right-end extrasite cutters ]
>>> (s/t)XXXX        [ left-end ditto ]
>>> (s/t)XXXX(m/n)   [ double-end ditto ],
>>>
>>> palindromic and non-palindromic, as well as multisite
>>> enzymes that string together combinations of these
>>> forms. Much rationalization (well, seems rational to me
>>> anyway) and cruft removal in the affected code has also
>>> occurred. itype2.pm has been updated as well, to
>>> conform to the refactoring.
>>>
>>> If you're dying to try this now, get a working copy
>>> of the branch like so
>>>
>>> $ svn co svn://code.open-bio.org/bioperl/bioperl-live/branches/restriction-refactor bioperl-rr
>>> $ cd bioperl-rr
>>> $ perl Build.PL
>>> $ ./Build test
>>> $ ./Build install
>>>
>>> This will only hammer your current installation in the
>>> $SITE_LIB/Bio/Restriction path; I worked only on
>>> a sparse checkout of the necessary files. To revert to your
>>> old install, do
>>>
>>> $ cd $MY_OLD_BIOPERL_WORKINGDIR
>>> $ ./Build install
>>>
>>> [In the possible event that these instructions are in error,
>>> there will be a response on this list in a matter of
>>> milliseconds, so stand by.]
>>>
>>> Happy coding-
>>> Mark
>>>
>>>
>>>
>>>
>>> ----- Original Message -----
>>> From: "Rasmus Ory Nielsen" <ron at ron.dk>
>>> To: <bioperl-l at lists.open-bio.org>
>>> Sent: Wednesday, June 10, 2009 3:35 AM
>>> Subject: [Bioperl-l] Bio::Restriction::Analysis. Exception when  using 
>>> rebasefile.
>>>
>>>
>>>> Hi,
>>>>
>>>> This is my first time using bioperl for restriction analysis, so please
>>>> bear with me if this is a FAQ.
>>>>
>>>> I downloaded withrefm.906 from ftp://ftp.neb.com/pub/rebase/ and created
>>>> the script shown at the bottom of the mail.
>>>> My bioperl version is bioperl-live nightly from 09-Jun-2009.
>>>>
>>>> The script throws an exception - see below. But if I comment out the
>>>> '-enzymes' argument, so it uses the built-in collection of enzymes, it
>>>> works.
>>>>
>>>> My problem is that I need to use some of the enzymes that are only
>>>> available in rebase. So how do I get this working?
>>>>
>>>> Thanks for your attention.
>>>>
>>>> Best regards,
>>>> Rasmus Ory Nielsen
>>>>
>>>>
>>>> ############################################################
>>>> Output from the script:
>>>> ############################################################
>>>>
>>>> [roni at ksdhcp ~]$ ./restriction_test.pl
>>>>
>>>> --------------------- WARNING ---------------------
>>>> MSG: The enzyme name CviKI-1 was changed to CviKI-I
>>>> ---------------------------------------------------
>>>>
>>>> ------------- EXCEPTION -------------
>>>> MSG: Bad end parameter (11). End must be less than the total length of
>>>> sequence (total=7)
>>>> STACK Bio::PrimarySeq::subseq
>>>> /usr/local/lib/perl5/site_perl/5.10.0/Bio/PrimarySeq.pm:401
>>>> STACK Bio::Restriction::Analysis::_enzyme_sites
>>>> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:900
>>>> STACK Bio::Restriction::Analysis::_cuts
>>>> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:801
>>>> STACK Bio::Restriction::Analysis::cut
>>>> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:379
>>>> STACK Bio::Restriction::Analysis::fragment_maps
>>>> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:515
>>>> STACK toplevel ./restriction_test.pl:30
>>>> -------------------------------------
>>>>
>>>> [roni at ksdhcp ~]$
>>>>
>>>>
>>>> ############################################################
>>>> Output from the script with the '-enzymes' argument commented out
>>>> ############################################################
>>>>
>>>> [roni at ksdhcp ~]$ ./restriction_test.pl
>>>>
>>>> --------------------- WARNING ---------------------
>>>> MSG: The enzyme name CviKI-1 was changed to CviKI-I
>>>> ---------------------------------------------------
>>>> $VAR1 = [
>>>>          {
>>>>            'seq' => 'CTCGACCGTTAGCAA',
>>>>            'end' => 15,
>>>>            'start' => '1'
>>>>          },
>>>>          {
>>>>            'seq' => 'AGCTTTCTACCGTTATCGT',
>>>>            'end' => 34,
>>>>            'start' => '16'
>>>>          }
>>>>        ];
>>>> [roni at ksdhcp ~]$
>>>>
>>>> ############################################################
>>>>
>>>> #!/usr/bin/perl
>>>> use strict;
>>>> use warnings;
>>>> use Bio::PrimarySeq;
>>>> use Bio::Restriction::IO;
>>>> use Bio::Restriction::Analysis;
>>>> use Data::Dumper;
>>>>
>>>> # create seq obj
>>>> my $seqobj = new Bio::PrimarySeq(
>>>>    -seq        => 'CTCGACCGTTAGCAAAGCTTTCTACCGTTATCGT',
>>>>    -primary_id => 'test',
>>>>    -molecule   => 'dna'
>>>> );
>>>>
>>>> # read rebase file
>>>> my $rebase_io = Bio::Restriction::IO->new(
>>>>    -file   => 'withrefm.906',
>>>>    -format => 'withrefm',
>>>> );
>>>> my $rebase_collection = $rebase_io->read;
>>>>
>>>> # start restriction analysis
>>>> my $restriction_analysis = Bio::Restriction::Analysis->new(
>>>>    -seq     => $seqobj,
>>>>    -enzymes => $rebase_collection,    # it works with this line commented out
>>>> );
>>>>
>>>> # retrieve fragment maps
>>>> my @fragment_maps = $restriction_analysis->fragment_maps('HindIII');
>>>> print Dumper \@fragment_maps;
> 
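
For readers new to the rebase notation listed in the quoted message
above, here is a minimal sketch that sorts a site string into one of the
four forms. It is NOT the parser used inside Bio::Restriction; the
function name parse_rebase_site and the returned hash keys are invented
for illustration, and the sample strings are only meant as examples of
each form. The caret marks a cut inside the site; the bracketed pairs
are taken here simply as two numbers, with no claim about strand
conventions.

```perl
use strict;
use warnings;

# Split a rebase-style site string such as 'A^AGCTT' or 'GGATG(9/13)'
# into its parts. Hypothetical helper for illustration only.
sub parse_rebase_site {
    my ($site) = @_;
    my ( $left, $core, $right ) =
        $site =~ m{^ (?: \( (-?\d+ / -?\d+) \) )?   # optional (s/t) prefix
                     ([A-Za-z^]+)                   # site, maybe with a caret
                     (?: \( (-?\d+ / -?\d+) \) )?   # optional (m/n) suffix
                   $}x
        or return;

    my $caret = index $core, '^';      # -1 when there is no caret
    ( my $seq = $core ) =~ tr/^//d;    # recognition sequence without caret

    my $form =
          defined $left && defined $right ? 'double-end extrasite'
        : defined $right                  ? 'right-end extrasite'
        : defined $left                   ? 'left-end extrasite'
        : $caret >= 0                     ? 'intrasite'
        :                                   'cut position not given';

    return {
        form  => $form,
        site  => $seq,
        cut   => ( $caret >= 0    ? $caret                 : undef ),
        left  => ( defined $left  ? [ split m{/}, $left  ] : undef ),
        right => ( defined $right ? [ split m{/}, $right ] : undef ),
    };
}

for my $s ( 'A^AGCTT', 'GGATG(9/13)', '(10/12)CGANNNNNNTGC(12/10)' ) {
    my $p = parse_rebase_site($s) or next;
    printf "%-28s %s\n", $s, $p->{form};
}
```

Multisite enzymes, which string several of these forms together, would
need the string split into individual sites before each is parsed.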


From MEC at stowers.org  Tue Jun 16 16:13:33 2009
From: MEC at stowers.org (Cook, Malcolm)
Date: Tue, 16 Jun 2009 15:13:33 -0500
Subject: [Bioperl-l] Alignment->slice() issue?
In-Reply-To: <FB532969-4EE0-4D37-B14B-5BC806A95FFE@illinois.edu>
References: <cfebe0020906160106s5f76501au9d67d8ee3062d166@mail.gmail.com>
	<FB532969-4EE0-4D37-B14B-5BC806A95FFE@illinois.edu>
Message-ID: <BD62CBAC4395B94096109020651BE2EC12B471A389@exchmb-02.stowers-institute.org>

Chris!

erm, yeah, I do....

... and I will schedule some time to code up a test and add it to AlignI's suite....

Malcolm
  

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Chris Fields
> Sent: Tuesday, June 16, 2009 2:08 PM
> To: Malcolm Cook
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Alignment->slice() issue?
> 
> Sounds to me like a BioPerl bug.  Do you have some example 
> data demonstrating the problem?
> 
> chris
> 
> On Jun 16, 2009, at 3:06 AM, Malcolm Cook wrote:
> 
> > Kevin,
> >
> > I'm getting struck by this old issue you once coded around.
> >
> >      http://bioperl.org/pipermail/bioperl-l/2007-January/024665.html
> >
> > Any chance you could share your implementation with a fellow
> > traveller...
> >
> > ??
> >
> > Thanks,
> >
> > Malcolm Cook
> > Stowers Institute for Medical Research


From maj at fortinbras.us  Tue Jun 16 22:47:39 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 16 Jun 2009 22:47:39 -0400
Subject: [Bioperl-l] Bio::Restriction refactor
	[Was:Bio::Restriction::Analysis. Exception when using rebasefile.]
In-Reply-To: <E80E6C1BC08D4E338739148BFE9BFAC0@NewLife>
References: <4A2F622D.5060500@ron.dk>
	<E80E6C1BC08D4E338739148BFE9BFAC0@NewLife>
Message-ID: <9B199A62F5A741CCBC0B927D10DF1A0D@NewLife>

Dear All,

The refactored Bio::Restriction::* has been merged to trunk, with all
tests passing. [Anyone got a cigarette?]

cheers,
Mark



From Russell.Smithies at agresearch.co.nz  Tue Jun 16 23:21:22 2009
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Wed, 17 Jun 2009 15:21:22 +1200
Subject: [Bioperl-l] Bio::Restriction
	refactor	[Was:Bio::Restriction::Analysis. Exception when
	using rebasefile.]
In-Reply-To: <9B199A62F5A741CCBC0B927D10DF1A0D@NewLife>
References: <4A2F622D.5060500@ron.dk>
	<E80E6C1BC08D4E338739148BFE9BFAC0@NewLife>
	<9B199A62F5A741CCBC0B927D10DF1A0D@NewLife>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF3297FF3E2E4@exchsth.agresearch.co.nz>

Cigarettes are post-coitus and pre-firing squad.
What you'd be needing is a cigar (proud father)

;-)

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Mark A. Jensen
> Sent: Wednesday, 17 June 2009 2:48 p.m.
> To: bioperl-l at lists.open-bio.org
> Cc: Rasmus Ory Nielsen
> Subject: Re: [Bioperl-l] Bio::Restriction refactor
> [Was:Bio::Restriction::Analysis. Exception when using rebasefile.]
> 
> Dear All,
> 
> The refactored Bio::Restriction::* has been merged to trunk, with all
> tests passing. [Anyone got a cigarette?]
> 
> cheers,
> Mark
> 


From e.stupka at ucl.ac.uk  Wed Jun 17 07:29:08 2009
From: e.stupka at ucl.ac.uk (Elia Stupka)
Date: Wed, 17 Jun 2009 12:29:08 +0100
Subject: [Bioperl-l] Next-gen modules
Message-ID: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>

Dear all,

after several years of absence I am slowly coming back to Bioperl, and
hope to contribute again to its development.

One area that I was thinking of starting from, since we are actively
involved with it, is to improve Bioperl's support for next-gen
sequencing data, tools, etc. Since I am sure I have missed out on a
lot of recent developments, do let me know if/what is useful.

One example that comes to mind is that the conversion of various
formats to/from FASTQ does not seem to be supported. Some code can be
found within Li Heng's script, http://maq.sourceforge.net/fq_all2std.pl,
but it would be good if it could make its way into SeqIO. The same
potentially applies to other next-gen sequence formats.
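
To make the kind of conversion concrete, here is a minimal core-Perl
sketch of re-encoding quality strings into the Sanger convention (Phred
scores, ASCII offset 33). It is not taken from fq_all2std.pl or from any
SeqIO module, and the function names are made up for illustration. It
assumes two input conventions: Phred scores at ASCII offset 64, and the
older Solexa log-odds scores at offset 64, mapped to Phred with
Q_phred = 10 * log10(10^(Q_solexa/10) + 1).

```perl
use strict;
use warnings;
use POSIX qw(floor);

# Solexa log-odds score -> Phred score, rounded to the nearest integer.
sub solexa_to_phred {
    my ($qs) = @_;
    return floor( 10 * log( 10**( $qs / 10 ) + 1 ) / log(10) + 0.5 );
}

# Quality string holding Phred scores at ASCII offset 64 -> offset 33.
sub phred64_to_sanger {
    my ($qual) = @_;
    return join '', map { chr( ord($_) - 64 + 33 ) } split //, $qual;
}

# Quality string holding Solexa scores at ASCII offset 64 -> Sanger.
sub solexa64_to_sanger {
    my ($qual) = @_;
    return join '',
        map { chr( solexa_to_phred( ord($_) - 64 ) + 33 ) } split //, $qual;
}

print phred64_to_sanger('hB@'),  "\n";    # Q40, Q2, Q0 -> prints I#!
print solexa64_to_sanger('h@;'), "\n";    # Solexa 40, 0, -5 -> prints I$"
```

The input convention cannot be told apart reliably from the quality
string alone, which is why the caller has to say which one applies.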

Similarly, there seems to be little in bioperl-run to support tools
that have been developed in this area, such as Maq, Bowtie, TopHat, etc.

Do let me know if there is a past thread on this, or other people
actively developing, etc., so that I can find out what the priorities are.

thanks and best regards to all (old friends and new),

Elia

---
Senior Lecturer, Bioinformatics
UCL Cancer Institute
Paul O' Gorman Building
University College London
Gower Street
WC1E 6BT
London
UK

Office (UCL): +44 207 679 6493
Office (ICMS): +44 0207 8822374

Mobile: +44 7597 566 194
Mobile (Italy): +39 338 8448801


From maj at fortinbras.us  Wed Jun 17 08:19:04 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 17 Jun 2009 08:19:04 -0400
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
Message-ID: <4C3D793879C64A5E84C67FE313C86FA4@NewLife>

[ and here's a new wikipage: http://www.bioperl.org/wiki/Nextgen_in_Bioperl ]


From biopython at maubp.freeserve.co.uk  Wed Jun 17 08:21:17 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 17 Jun 2009 13:21:17 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
Message-ID: <320fb6e00906170521m7d997334j321d92fda2da4114@mail.gmail.com>

On Wed, Jun 17, 2009 at 12:29 PM, Elia Stupka<e.stupka at ucl.ac.uk> wrote:
>
> Dear all,
>
> after several years of absence I am slowly coming back to Bioperl, and hope
> to contribute again to its development.
>
> One area that I was thinking of starting from, since we are actively
> involved with it, is to improve BioPerl's support for next-gen sequencing
> data, tools, etc. Since I am sure I have missed out on a lot of recent
> developments, do let me know if/what is useful.
>
> One example that comes to mind is that the conversion of various formats
> to/from FASTQ does not seem to be supported. Some code can be found within
> Li Heng's script: http://maq.sourceforge.net/fq_all2std.pl but it would be
> good if it could make its way into SeqIO? And similarly, potentially, for
> other next-gen sequence formats?

If you do add FASTQ support to BioPerl's SeqIO (and I think that is a
good idea), please could you follow the format names used by Biopython
- as this time we got there first ;)

I'm asking this as Biopython's SeqIO tries to use the same format
names as BioPerl's SeqIO and EMBOSS, see
http://biopython.org/wiki/SeqIO

Specifically,
* "fastq" in Biopython means the original Sanger standard FASTQ files
encoding PHRED qualities using an ASCII offset of 33.
* "fastq-solexa" in Biopython means the early Solexa/Illumina style
FASTQ files which encode Solexa qualities using an ASCII offset of 64.
* "fastq-illumina" in Biopython will mean recent Solexa/Illumina style
FASTQ files (from pipeline version 1.3+) which encode PHRED qualities
using an ASCII offset of 64. This is in the Biopython repository, but
hasn't been released yet - so the name "fastq-illumina" isn't set in
stone yet.

For good quality reads, PHRED and Solexa scores are approximately
equal, so the "fastq-solexa" and "fastq-illumina" variants are almost
equivalent.
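
To make the three variants concrete, here is a minimal pure-Perl sketch
(not BioPerl or Biopython code) that decodes a quality string using the
ASCII offsets listed above. The %FASTQ_VARIANT table and the
decode_quality() helper are hypothetical names introduced only for this
illustration.

```perl
use strict;
use warnings;

# Hypothetical lookup table: Biopython-style format name => ASCII offset
# and the kind of score the characters encode (as described above).
my %FASTQ_VARIANT = (
    'fastq'          => { offset => 33, score => 'PHRED'  },
    'fastq-solexa'   => { offset => 64, score => 'Solexa' },
    'fastq-illumina' => { offset => 64, score => 'PHRED'  },
);

# Turn a quality string into a list of integer scores for a given variant.
sub decode_quality {
    my ($variant, $qual_string) = @_;
    my $info = $FASTQ_VARIANT{$variant}
        or die "Unknown FASTQ variant '$variant'\n";
    return map { ord($_) - $info->{offset} } split //, $qual_string;
}

# The same characters decode to different numbers depending on the variant,
# which is why the variant has to be known before the scores can be used.
for my $variant (sort keys %FASTQ_VARIANT) {
    my @scores = decode_quality($variant, 'hhI;');
    printf "%-15s %-6s %s\n", $variant,
        $FASTQ_VARIANT{$variant}{score}, join(' ', @scores);
}
```

Note that the sketch only subtracts the offset; it does not convert
between Solexa and PHRED scores.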

> Similarly, there seems to be little in bioperl-run to support tools that
> have been developed in this area, such as Maq, BowTie, TopHat, etc?
>
> Do let me know if there is a past thread on this, or other people actively
> developing, etc. so that I can find out what priorities are.

Have you seen these recent threads?:
http://lists.open-bio.org/pipermail/bioperl-l/2009-May/029970.html
http://lists.open-bio.org/pipermail/bioperl-l/2009-May/029911.html
http://lists.open-bio.org/pipermail/bioperl-l/2009-June/030187.html

Regards,

Peter (at Biopython)


From maj at fortinbras.us  Wed Jun 17 08:02:11 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 17 Jun 2009 08:02:11 -0400
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
Message-ID: <92C15E3391F64BAF801754E924122540@NewLife>

Elia--
I say a definite +1; in fact, this sounds like it should be a Hot Topic 
(see http://www.bioperl.org/wiki/Category:Hot_Topics for some others
you might have missed in your hiatus...). I will create a page that 
can be a central point for wish lists, discussion, etc.

There has been much discussion of late about FASTQ 
http://lists.open-bio.org/pipermail/bioperl-l/2009-June/030187.html
http://lists.open-bio.org/pipermail/bioperl-l/2009-May/029970.html
http://lists.open-bio.org/pipermail/bioperl-l/2009-May/029911.html
http://lists.open-bio.org/pipermail/bioperl-l/2009-April/029765.html

cheers from a newbie, 
Mark



From cjfields at illinois.edu  Wed Jun 17 08:57:52 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 17 Jun 2009 07:57:52 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
Message-ID: <BF526BE6-AFA8-4F66-9665-D6328F9B4FFA@illinois.edu>

Elia,

As Mark indicated, we recently discussed the lack of support for
next-gen on list, at least re: fastq.  I may be hit with the same thing
in a few months' time myself, and I recall Jason and a few others also
mentioning the same.  Heikki wrote some code for Illumina FASTQ for
SeqIO and related modules but I don't believe it has been committed to
trunk yet, so maybe he can answer.

 From prior discussions IIRC the issues were:

1) distinguishing the various FASTQ versions (Sanger, Illumina 1.0,  
Illumina 1.3) from one another (so maybe some optional validation), and
2) having a way for the Seq object to either 'know' what format is  
contained, or we use phred score and convert back and forth from that  
(I think the latter makes more sense).

Peter's suggestions also are reasonable, though does biopython have a  
separate module for each of these variations?  Our version (I believe)  
mainly varied the conversion within Bio::SeqIO::fastq itself based on  
the fastq variant passed in as a separate named argument.

As for the wrappers, we would most certainly welcome them!

chris



From e.stupka at ucl.ac.uk  Wed Jun 17 08:54:22 2009
From: e.stupka at ucl.ac.uk (Elia Stupka)
Date: Wed, 17 Jun 2009 13:54:22 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <92C15E3391F64BAF801754E924122540@NewLife>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<92C15E3391F64BAF801754E924122540@NewLife>
Message-ID: <E59DEB0A-9D02-4922-A300-0F81B809D6D4@ucl.ac.uk>

Dear Mark,

thanks a lot for the pointers.

With regards to FASTQ parsing:

-my understanding from reading past threads is that we should work on a
single format, i.e. FASTQ, and interpret the quality "flavours" as just
quality conversions, right?

-However, I assume we would still want to support a simple way for the  
user to say format => 'fastq-solexa' using the nomenclature adopted in  
BioPython suggested by Peter, right?

-I also saw Heikki's "long essay", but have not yet compared it to Heng
Li's code at http://maq.sourceforge.net/fq_all2std.pl. I guess we would
hope they produce identical outputs, which will be a good check.
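
One way the first two points could fit together, sketched in plain Perl:
keep a single FASTQ parser and treat the suffix of a Biopython-style
name as the quality variant. The helper split_format_name() is
hypothetical and is not part of the BioPerl API; the choice of 'sanger'
as the default variant is likewise an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: split a Biopython-style format name such as
# 'fastq-solexa' into a base format and a quality variant.
# Assumption: plain 'fastq' means the original Sanger flavour.
sub split_format_name {
    my ($name) = @_;
    my ($format, $variant) = split /-/, lc($name), 2;
    $variant = 'sanger' if $format eq 'fastq' && !defined $variant;
    return ($format, $variant);
}

for my $name (qw(fastq fastq-solexa fastq-illumina genbank)) {
    my ($format, $variant) = split_format_name($name);
    printf "%-15s => format=%s variant=%s\n",
        $name, $format, defined $variant ? $variant : '(none)';
}
```

With something like this, the user-facing name stays a single string
while the parser only needs to switch its quality conversion.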

Finally, I saw Tristan's reply to Heikki's thread, so what is the  
status quo? Is it moving forward?

cheers

Elia



---
Senior Lecturer, Bioinformatics
UCL Cancer Institute
Paul O' Gorman Building
University College London
Gower Street
WC1E 6BT
London
UK

Office (UCL): +44 207 679 6493
Office (ICMS): +44 0207 8822374

Mobile: +44 7597 566 194
Mobile (Italy): +39 338 8448801


From biopython at maubp.freeserve.co.uk  Wed Jun 17 09:25:59 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 17 Jun 2009 14:25:59 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <BF526BE6-AFA8-4F66-9665-D6328F9B4FFA@illinois.edu>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<BF526BE6-AFA8-4F66-9665-D6328F9B4FFA@illinois.edu>
Message-ID: <320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com>

On Wed, Jun 17, 2009 at 1:57 PM, Chris Fields<cjfields at illinois.edu> wrote:
>
> Elia,
>
> As Mark indicated, we recently discussed the lack of support for next-gen on
> list, at least re: fastq.  I may be hit with the same thing in a few months
> time myself, and I recall Jason and a few others also mentioning the same.
>  Heikki wrote some code for Illumina FASTQ for SeqIO and related modules but
> I don't believe it has been committed to trunk yet, so maybe he can answer.
>
> From prior discussions IIRC the issues were:
>
> 1) distinguishing the various FASTQ versions (Sanger, Illumina 1.0, Illumina
> 1.3) from one another (so maybe some optional validation), and

Following the python rule of thumb for being explicit, Biopython makes
the user specify which FASTQ variant is being used. I don't think you
can do anything else. Any attempted validation would have to be
heuristic based on the ASCII characters found, and would risk false
positive warnings.
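
A small pure-Perl sketch of why such a check can only be heuristic. It
lists every variant whose lowest valid character is consistent with the
quality strings seen so far. The helper compatible_variants() is
hypothetical; the lower bounds it uses are '!' (33) for Sanger PHRED 0,
';' (59) for Solexa -5 and '@' (64) for Illumina 1.3+ PHRED 0.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Hypothetical helper: return the variant names that could have produced
# the given quality strings, judged only by the lowest character seen.
sub compatible_variants {
    my @quals = @_;
    my @ords  = map { ord($_) } map { split //, $_ } @quals;
    return () unless @ords;
    my $lowest = min(@ords);
    my @ok;
    push @ok, 'fastq'          if $lowest >= 33;
    push @ok, 'fastq-solexa'   if $lowest >= 59;
    push @ok, 'fastq-illumina' if $lowest >= 64;
    return @ok;
}

# A low character rules variants out, but good-quality reads alone
# remain compatible with all three.
for my $qual ('IIII5', 'hhhh;', 'hhhhB') {
    printf "%-6s could be: %s\n", $qual,
        join(', ', compatible_variants($qual));
}
```

The check can exclude a variant but can never confirm one, so the user
still has to name the variant explicitly.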

> 2) having a way for the Seq object to either 'know' what format is
> contained, or we use phred score and convert back and forth from that (I
> think the latter makes more sense).

I think it could make sense for BioPerl to convert Solexa scores to/from
PHRED scores on the fly (especially now that Illumina is abandoning
the Solexa score system). Python style tries to avoid implicit conversions,
so Biopython doesn't automatically do a conversion from Solexa to
PHRED scores on parsing (but will on writing if the requested output
format requires this).
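
For reference, a minimal sketch of the conversion itself, using the
standard relationship Q_phred = 10 * log10(10^(Q_solexa/10) + 1) and its
inverse. The function names are made up for this example and do not come
from BioPerl or Biopython.

```perl
use strict;
use warnings;

# Base-10 logarithm built from Perl's natural log.
sub log10 { return log($_[0]) / log(10) }

# Solexa score -> PHRED score.
sub solexa_to_phred {
    my ($qs) = @_;
    return 10 * log10(10 ** ($qs / 10) + 1);
}

# PHRED score -> Solexa score (undefined for PHRED 0, where the
# argument of the logarithm would be zero).
sub phred_to_solexa {
    my ($qp) = @_;
    die "PHRED 0 has no finite Solexa equivalent\n" if $qp <= 0;
    return 10 * log10(10 ** ($qp / 10) - 1);
}

# The two scales differ for poor bases and converge for good ones.
for my $qs (-5, 0, 10, 20, 40) {
    printf "Solexa %3d -> PHRED %.2f\n", $qs, solexa_to_phred($qs);
}
```

A real parser would also have to decide how to round the result and
what to do with PHRED 0 when writing Solexa-style files.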

> Peter's suggestions also are reasonable, though does biopython have a
> separate module for each of these variations?  Our version (I believe)
> mainly varied the conversion within Bio::SeqIO::fastq itself based on the
> fastq variant passed in as a separate named argument.

Biopython's SeqIO gives the three FASTQ variants their own unique
names. This format name is a required argument for parsing/writing
(we don't try and guess the file format from the data contents). Internally
we have three separate FASTQ parsers/writers although they do share
code.

Other issues to keep in mind:

(3) There should be no warning parsing files where the optional repeated
title is missing on the "+" lines (as discussed earlier on the BioPerl list).

(4) When writing FASTQ files should BioPerl omit the optional repeated
title on the "+" line? Biopython omits this as I understand this to be
common practice, and it can make a big difference to file sizes,
especially on short read data from Solexa/Illumina.

(5) Also test reading and writing files with an optional description (as well
as an identifier) on the "@" (and "+") lines. See the NCBI SRA for examples,
e.g.

@SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
GGGTGATGGCCGCTGCCGATGGCGTCAAATCCCACC
+SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
IIIIIIIIIIIIIIIIIIIIIIIIIIIIII9IG9IC


(6) Test reading and writing files where the encoded quality string starts
with a "@" or a "+" character, e.g.
http://lists.open-bio.org/pipermail/bioperl-l/2009-May/029911.html
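
Points (3), (5) and (6) can be sketched with a tiny pure-Perl reader.
It is not the Bio::SeqIO::fastq implementation; next_fastq_record() is a
hypothetical helper, the two records are made up for the test, and the
sketch assumes every record is exactly four lines (no wrapped sequence
or quality lines). Because lines are taken by position, a quality string
starting with "@" or "+" is not mistaken for a title line.

```perl
use strict;
use warnings;

# Read one record from an open filehandle; returns a hash ref, or
# nothing at end of file.
sub next_fastq_record {
    my ($fh) = @_;
    my $title = <$fh>;
    return unless defined $title;
    my $seq  = <$fh>;
    my $plus = <$fh>;
    my $qual = <$fh>;
    die "Truncated FASTQ record\n" unless defined $qual;
    chomp($title, $seq, $plus, $qual);

    $title =~ s/^\@// or die "Title line does not start with '\@'\n";
    $plus  =~ s/^\+// or die "Third line does not start with '+'\n";
    # The repeated title after '+' is optional; if present it must match.
    die "Mismatched '+' line for $title\n"
        if length($plus) && $plus ne $title;
    die "Sequence/quality length mismatch for $title\n"
        unless length($seq) == length($qual);

    # Identifier is the first word; the rest is an optional description.
    my ($id, $desc) = split ' ', $title, 2;
    return {
        id   => $id,
        desc => defined $desc ? $desc : '',
        seq  => $seq,
        qual => $qual,
    };
}

# Made-up test data: one record with a description and a repeated title,
# one with a bare '+' line and a quality string starting with '@'.
my $data = <<'END';
@read1 made-up description length=8
ACGTACGT
+read1 made-up description length=8
IIII9IG9
@read2
ACGT
+
@+I5
END

open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
while (my $rec = next_fastq_record($fh)) {
    print "$rec->{id}\t", length($rec->{seq}), " bases\n";
}
close $fh;
```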

Peter


From tristan.lefebure at gmail.com  Wed Jun 17 09:27:12 2009
From: tristan.lefebure at gmail.com (Tristan Lefebure)
Date: Wed, 17 Jun 2009 09:27:12 -0400
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <92C15E3391F64BAF801754E924122540@NewLife>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<92C15E3391F64BAF801754E924122540@NewLife>
Message-ID: <200906170927.13273.tristan.lefebure@gmail.com>

Hello,
Regarding next-gen sequences and bioperl, in my experience
another issue is bioperl speed. For example, if you want to
trim bad quality bases at the ends of 1E6 Solexa reads using
Bio::SeqIO::fastq and some methods in Bio::Seq::Quality,
well, you've got to be patient (but maybe I missed some
shortcuts...).

A pure perl solution will be between 100 and 1000x faster...
Would it be possible to have an ultra-light quality object
with a few simple methods for next-gen reads?

I can contribute some tests if that sounds like an important 
point.
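
As an illustration of what "pure perl" could mean here, a minimal sketch
that trims low-quality bases from both ends of a read by working on the
raw sequence and quality strings, without creating any objects. The
helper trim_read_ends() is hypothetical, and the offset and threshold in
the example call are arbitrary choices, not recommended settings.

```perl
use strict;
use warnings;

# Hypothetical helper: trim bases scoring below $min_score from both ends
# of a read. $offset is the ASCII offset of the quality encoding.
sub trim_read_ends {
    my ($seq, $qual, $offset, $min_score) = @_;
    my $threshold = $offset + $min_score;
    my ($first, $last) = (0, length($qual) - 1);
    # Walk inwards from each end until a good enough base is found.
    $first++ while $first <= $last
        && ord(substr($qual, $first, 1)) < $threshold;
    $last--  while $last >= $first
        && ord(substr($qual, $last, 1)) < $threshold;
    my $len = $last - $first + 1;
    return (substr($seq, $first, $len), substr($qual, $first, $len));
}

# Offset 64 and a minimum score of 20 are example values only.
my ($seq, $qual) = trim_read_ends('ACGTACGTAC', 'BBhhhhhhBB', 64, 20);
print "$seq\n$qual\n";
```

A read with no base above the threshold comes back as two empty strings,
so the caller can simply skip it.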

-Tristan




From biopython at maubp.freeserve.co.uk  Wed Jun 17 09:54:45 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 17 Jun 2009 14:54:45 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <E59DEB0A-9D02-4922-A300-0F81B809D6D4@ucl.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<92C15E3391F64BAF801754E924122540@NewLife>
	<E59DEB0A-9D02-4922-A300-0F81B809D6D4@ucl.ac.uk>
Message-ID: <320fb6e00906170654m735dc054iaf94fa2f86647002@mail.gmail.com>

On Wed, Jun 17, 2009 at 1:54 PM, Elia Stupka<e.stupka at ucl.ac.uk> wrote:
>
> Dear Mark,
>
> thanks a lot for the pointers.
>
> With regards to FASTQ parsing:
>
> -my understanding by reading past threads is to work on a single format,
> i.e. FASTQ and to interpet the quality "flavours" as just quality
> conversions, right?
> -However, I assume we would still want to support a simple way for the user
> to say format => 'fastq-solexa' using the nomenclature adopted in BioPython
> suggested by Peter, right?

I think you will need a way for the user to say they have a Solexa, or
an Illumina 1.3+, or an original Sanger standard FASTQ file.

Having read the http://bioperl.org/wiki/HOWTO:SeqIO wiki page, I
assumed BioPerl's SeqIO just had formats (e.g. the "chadoxml" format
and the variant "flybase_chadoxml" format). Does BioPerl's SeqIO format
system have any concept of flavour that I am not aware of?

> -I also saw Heikki's "long essay", but did not yet compare to Heng Li's code
> at http://maq.sourceforge.net/fq_all2std.pl, I guess we would hope they
> would produce identical outputs, will be a good check.

Heng Li's code at http://maq.sourceforge.net/fq_all2std.pl is a useful
guide (although it doesn't yet cope with the new Illumina 1.3+ variant),
but I don't trust it 100%. See e.g.
http://lists.open-bio.org/pipermail/biopython/2009-June/005208.html
http://lists.open-bio.org/pipermail/biopython/2009-June/005209.html

Peter


From john.marshall at sanger.ac.uk  Wed Jun 17 09:28:12 2009
From: john.marshall at sanger.ac.uk (John Marshall)
Date: Wed, 17 Jun 2009 14:28:12 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
Message-ID: <76D5EDD5-6217-438E-87A5-1B7571D14FFE@sanger.ac.uk>

On 17 Jun 2009, at 12:29, Elia Stupka wrote:
> Similarly, there seems to be little in bioperl-run to support tools  
> that have been developed in this area, such as Maq, BowTie, TopHat,  
> etc?

FYI I have a Bio::Tools::Run::Velvet wrapper [1] that I plan to submit  
in the not too distant future.  (First it needs some "blah blah"  
replaced with actual documentation and a test suite.)

Cheers,

     John

[1] http://www.ebi.ac.uk/~zerbino/velvet/


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 


From Kevin.M.Brown at asu.edu  Wed Jun 17 11:41:18 2009
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Wed, 17 Jun 2009 08:41:18 -0700
Subject: [Bioperl-l] Alignment->slice() issue?
In-Reply-To: <cfebe0020906160106s5f76501au9d67d8ee3062d166@mail.gmail.com>
References: <cfebe0020906160106s5f76501au9d67d8ee3062d166@mail.gmail.com>
Message-ID: <1A4207F8295607498283FE9E93B775B4060B9718@EX02.asurite.ad.asu.edu>

Warning: This is very ugly code and makes a few assumptions, such as
that the alignment objects are stored in order of their start position.
I made this assumption as that is how I put them into the object to
begin with.

use POSIX qw(floor);
use List::Util qw(min max);
use Bio::SimpleAlign;
use Bio::LocatableSeq;

=head1 C<slice>

Function to slice up an alignment based on start and end parameters.
The new alignment object is stored in the scalar referenced by the
fourth argument; the function itself returns 1.

slice(\$alignment, $start, $end, \$new_align)

=cut

sub slice
{
	# $alignment and $new_align are references to scalars that hold
	# the alignment objects
	my ($alignment, $start, $end, $new_align) = @_;

	$$new_align = new Bio::SimpleAlign;
	print $$alignment->no_sequences() . "\n";

	# the first sequence is always carried over, cut down to the region
	my $first = $$alignment->get_seq_by_pos(1);
	$$new_align->add_seq(
		new Bio::LocatableSeq(
			-seq      => substr($first->seq(), $start - 1, $end - $start + 1),
			-id       => $first->display_id(),
			-start    => max($first->start - $start + 1, 1),
			-end      => min($first->end - $start + 1, $end - $start + 1),
			-alphabet => 'dna',
			-strand   => $first->strand()
		)
	);

	# implement a binary search to determine a decent offset into the
	# alignment
	my $probe;

	if ($$alignment->no_sequences() <= 2) {
		$probe = $$alignment->no_sequences();
	}
	else {
		my ($L, $R) = (1, $$alignment->no_sequences());
		while (($R - $L) > 1)
		{
			$probe = floor(($R + $L) / 2);

			# gotta watch this.  Had the check backwards and so was never
			# going in the right direction for the search.  If I reverse
			# these two variables, then I have to either reverse the
			# conditions or change the > to a <.
			if ($$alignment->get_seq_by_pos($probe)->start() > $start)
			{
				$R = $probe;
			}
			else
			{
				$L = $probe;
			}
		}
	}

	# now go through the results that are after that point
	for (my $i = $probe ; $i <= $$alignment->no_sequences() ; $i++)
	{
		my $seq = $$alignment->get_seq_by_pos($i);
		last if ($seq->start() > $end);

		# Only concern ourselves with primers that land inside the
		# desired region; other primers will show up in the image maps
		# for each gene.
		if ($seq->start() >= $start && $seq->end() <= $end)
		{
			# values for the substr pullout of a given sequence
			# (computed here but not applied: the whole sequence is
			# copied below)
			my $offset = max($start - $seq->start(), 0);
			my $length =
			  min($end, $seq->end()) - max($start, $seq->start()) + 1;
			$$new_align->add_seq(
				new Bio::LocatableSeq(
					-seq      => $seq->seq(),
					-id       => $seq->display_id(),
					-start    => max($seq->start - $start + 1, 1),
					-end      => min($seq->end - $start + 1, $end - $start + 1),
					-alphabet => 'dna',
					-strand   => $seq->strand()
				)
			);
		}
	}
	return 1;
}
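
The binary search in the middle of the function can be looked at in
isolation. The following is a simplified, 0-based sketch over a plain
array of start positions rather than a Bio::SimpleAlign object. It
returns the lower bound of the search instead of the last probe, so it
is not a drop-in replacement for the loop above. The helper name
last_start_at_or_before() and the example positions are made up.

```perl
use strict;
use warnings;
use POSIX qw(floor);

# Given start positions sorted in ascending order, return the 0-based
# index of the last entry whose start is <= $target (0 if there is none).
sub last_start_at_or_before {
    my ($starts, $target) = @_;
    my ($L, $R) = (0, scalar @$starts);    # candidate range is [$L, $R)
    while ($R - $L > 1) {
        my $probe = floor(($L + $R) / 2);
        if ($starts->[$probe] > $target) { $R = $probe }
        else                             { $L = $probe }
    }
    return $L;
}

# Made-up, sorted start positions.
my @starts = (1, 50, 120, 300, 310, 700);
my $index  = last_start_at_or_before(\@starts, 305);
print "scan forward from index $index (start $starts[$index])\n";
```

As in the original function, this only works if the entries really are
stored in order of their start position.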

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Malcolm Cook
> Sent: Tuesday, June 16, 2009 1:07 AM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Alignment->slice() issue?
> 
> Kevin,
> 
> I'm getting struck by this old issue you once coded around.
> 
>       http://bioperl.org/pipermail/bioperl-l/2007-January/024665.html
> 
> Any chance you could share your implementation with  fellow 
> traveller...
> 
> ??
> 
> Thanks,
> 
> Malcolm Cook
> Stowers Institute for Medical Research
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 


From maj at fortinbras.us  Wed Jun 17 12:47:38 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 17 Jun 2009 12:47:38 -0400
Subject: [Bioperl-l] bioperl-dev or branch? : redux
In-Reply-To: <D719EC86-CB16-4875-833B-2818C26030C8@duke.edu>
References: <991fb8210905210826v2a7990c0u90fcb3256f54b7d7@mail.gmail.com>
	<D719EC86-CB16-4875-833B-2818C26030C8@duke.edu>
Message-ID: <6DF025D32D664F61BC64B49184A2E6DD@NewLife>


Hi All, 

I thought I'd revisit this thread, since in the last couple of weeks I
have used both techniques (bioperl-dev and branch from trunk) to
produce completed projects. My thoughts:

Using bioperl-dev was very nice for creating Bio::Search::Tiling, a
new addition to the core api. There was no pressure to conform to the
existing api there. In particular, there was no implicit insistence to
make things work through Bio::Search::Utils, and I was free to factor
it out. The Tiling api was definitely unstable until the end, when it
was ported to the core. As I made regular reports to bioperl-l,
everything was transparent and up front, and I received excellent
suggestions there (as usual). 

For Bio::Restriction, using the branch was just as natural. Here, the
existing structure was well established, and all the work needed to
happen beneath the api. All old t/Restriction tests needed to pass,
and additional ones had to be created for the new functionality. So here, using
bioperl-dev wasn't natural, even though some "experiments" needed to
be tried (some succeeded and some failed, as you can see in the
commentary at Bug #2855). Even though the new code turned out to
require substantial effort, the effort was required to fix a true bug
in the working core, and any fixes needed to work transparently with
respect to the users for whom this bug had not been an issue. Using
the branch made it relatively easy to merge quickly back into the core
when done, and an open branch also provides a certain psychological
pressure, which is helpful.

Hilmar raised the very good point in the previous discussion that
(essentially) bioperl-dev shouldn't become a sandbox with lots of
unfinished code scraps and derelict stuff that doesn't work. My view
is bioperl-dev will become a sandbox only if we treat it like
one. I've filled out the Bioperl-dev page on the wiki
(http://www.bioperl.org/wiki/Bioperl-dev) with this in mind. Providing
some recognition to devs there whose modules become part of the
core may be a better way to ensure that projects started on
bioperl-dev actually get finished than prescribing beforehand what
kinds of projects may get started. I believe this follows the adage of
liberality in what is accepted, and strictness in what is emitted.

cheers, 
MAJ


----- Original Message ----- 
From: "Hilmar Lapp" <hlapp at duke.edu>
To: "Chase Miller" <chmille4 at gmail.com>
Cc: "BioPerl List" <bioperl-l at lists.open-bio.org>
Sent: Thursday, May 21, 2009 4:00 PM
Subject: Re: [Bioperl-l] bioperl-dev or branch?


> Moving this question to the BioPerl list, which is where we need to  
> discuss this I think. Can someone refresh my memory on what the  
> Bioperl-dev repository is or was meant for? It doesn't seem documented  
> on the wiki.
> 
> My (admittedly vague) recollection is that bioperl-dev is basically  
> for highly experimental changes or functionality.
> 
> I'm not clear why everything else shouldn't go either into the main  
> trunk or into a branch. If there is a realistic expectation for  
> something to be folded into the main trunk sooner or later, what would  
> be the reasons for not putting it into a branch of the main  
> repository? If we are putting it into a separate repository, we're  
> waiving a lot of svn's support for merging and resolving concurrent  
> edits.
> 
> I would actually go a step further and suggest that even if  
> this GSoC project starts out on a branch (which I can see good reasons  
> for, such as eliminating fear of disrupting something), there should be a  
> plan to move to main trunk before the end of the project. We've had a  
> good tradition in BioPerl of developing directly on the main trunk. It  
> sometimes leads to occasional disruptions when lots of tests seem to be  
> failing, but it also encourages development discipline and makes new  
> code melt into the BioPerl code base without requiring any extra  
> steps by someone.
> 
> Any and all thoughts or comments welcome and appreciated!
> 
> -hilmar
> 
> On May 21, 2009, at 11:26 AM, Chase Miller wrote:
> 
>> This brings me to a question about where I should have my code  
>> repository.  Originally, I was going to use Bioperl-dev, but it was  
>> brought to my attention that that repository does not normally  
>> receive daily updates and it might not be the right place for my day  
>> to day development.  An alternative would be to use something like  
>> google code on a daily basis and commit to Bioperl-dev on a weekly  
>> basis.
> 
> -- 
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:- hlapp at duke dot edu :
> ===========================================================
> 
> 
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
>


From cjfields at illinois.edu  Wed Jun 17 13:06:44 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 17 Jun 2009 12:06:44 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<BF526BE6-AFA8-4F66-9665-D6328F9B4FFA@illinois.edu>
	<320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com>
Message-ID: <1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>


On Jun 17, 2009, at 8:25 AM, Peter wrote:

> On Wed, Jun 17, 2009 at 1:57 PM, Chris Fields<cjfields at illinois.edu>  
> wrote:
>>
>> Elia,
>>
>> As Mark indicated, we recently discussed the lack of support for  
>> next-gen on
>> list, at least re: fastq.  I may be hit with the same thing in a  
>> few months
>> time myself, and I recall Jason and a few others also mentioning  
>> the same.
>>  Heikki wrote some code for Illumina FASTQ for SeqIO and related  
>> modules but
>> I don't believe it has been committed to trunk yet, so maybe he can  
>> answer.
>>
>> From prior discussions IIRC the issues were:
>>
>> 1) distinguishing the various FASTQ versions (Sanger, Illumina 1.0,  
>> Illumina
>> 1.3) from one another (so maybe some optional validation), and
>
> Following the python rule of thumb for being explicit, Biopython makes
> the user specify which FASTQ variant is being used. I don't think you
> can do anything else. Any attempted validation would have to be
> heuristic based on the ASCII characters found, and would risk false
> positive warnings.

Right; I'm thinking along the same lines.  If anything the most we  
would allow is some level of validation, so if there were a degree of  
uncertainty about the format one could set a validation flag to check  
bounds during the parse and warn if they are exceeded.
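
A minimal sketch of what such an optional bounds check might look like. The variant names, the %QUAL_BOUNDS table and out_of_bounds() are made up for illustration and are not part of Bio::SeqIO::fastq. The ASCII ranges assume the usual definitions (Sanger: PHRED 0 to 93 at offset 33; Solexa: -5 to 62 at offset 64; Illumina 1.3: PHRED 0 to 62 at offset 64).

```perl
use strict;
use warnings;

# Lowest and highest ASCII code expected in a quality string, per variant.
# Assumed ranges; adjust if the variant definitions differ.
my %QUAL_BOUNDS = (
    sanger   => [ 33, 126 ],    # PHRED  0..93, offset 33
    solexa   => [ 59, 126 ],    # Solexa -5..62, offset 64
    illumina => [ 64, 126 ],    # PHRED  0..62, offset 64
);

# Return the quality characters that fall outside the bounds of $variant.
sub out_of_bounds {
    my ($qual, $variant) = @_;
    my $bounds = $QUAL_BOUNDS{$variant}
        or die "unknown FASTQ variant '$variant'\n";
    my ($min, $max) = @$bounds;
    return grep { ord($_) < $min || ord($_) > $max } split //, $qual;
}

# '9' (ASCII 57) is legal for Sanger but below both offset-64 ranges.
for my $variant (qw(sanger solexa illumina)) {
    my @bad = out_of_bounds('IIII9IG9IC', $variant);
    print "$variant: ", scalar(@bad), " character(s) out of bounds\n";
}
```

A parser could call something like this only when the validation flag is set, and warn rather than die, since a clean result never proves the variant is right (a high-quality Sanger file can look like a valid Illumina one).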

>> 2) having a way for the Seq object to either 'know' what format is
>> contained, or we use phred score and convert back and forth from  
>> that (I
>> think the latter makes more sense).
>
> I think it could make sense for BioPerl to convert Solexa scores to/ 
> from
> PHRED scores on the fly (especially now that Illumina is abandoning
> the Solexa score system). Python style tries to avoid implicit  
> conversions,
> so Biopython doesn't automatically do a conversion from Solexa to
> PHRED scores on parsing (but will on writing if the requested output
> format requires this).
>
>> Peter's suggestions also are reasonable, though does biopython have a
>> separate module for each of these variations?  Our version (I  
>> believe)
>> mainly varied the conversion within Bio::SeqIO::fastq itself based  
>> on the
>> fastq variant passed in as a separate named argument.
>
> Biopython's SeqIO gives the three FASTQ variants their own unique
> names. This format name is a required argument for parsing/writing
> (we don't try and guess the file format from the data contents).  
> Internally
> we have three separate FASTQ parsers/writers although they do share
> code.

We could easily do the same if others agree.  Actually, if we  
specified that shorthand for a variant on a format would be designated  
as -format => 'format-variant', I think we could easily hack SeqIO to  
deal with that by splitting on '-' and passing everything to the  
constructor as (-format => 'format', -variant => 'variant').  Very  
little repeated code in this case, just an additional named parameter  
indicating the format variant (and the SeqIO class can do the type  
checking on that within the constructor).
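
Roughly what I have in mind for the split, as a sketch only. split_format() is a hypothetical helper, not an existing SeqIO method, and -variant is the parameter proposed above, not something the constructor accepts today.

```perl
use strict;
use warnings;

# Turn a combined format string such as 'fastq-illumina' into the named
# arguments proposed above. A plain 'fastq' yields no -variant at all.
sub split_format {
    my ($format) = @_;
    my ($base, $variant) = split /-/, lc($format), 2;
    my @args = (-format => $base);
    push @args, (-variant => $variant) if defined $variant && length $variant;
    return @args;
}

my %args = split_format('fastq-illumina');
print join(', ', map { "$_ => $args{$_}" } sort keys %args), "\n";
```

The limit of 2 on the split keeps anything after the first '-' together as the variant, so the format class itself can decide whether it recognizes it.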

> Other issues to keep in mind:
>
> (3) There should be no warning parsing files where the optional  
> repeated
> title is missing on the "+" lines (as discussed earlier on the  
> BioPerl list).

Agreed, though we'll have to check the current fastq parser to see if  
that's currently the case.  I thought that was fixed but maybe not?

> (4) When writing FASTQ files should BioPerl omit the optional repeated
> title on the "+" line? Biopython omits this as I understand this to be
> common practice, and can make a big difference to file sizes -  
> especially
> on short read data from Solexa/Illumina.

Agreed, particularly if it's commonly encountered.

> (5) Also test reading and writing files with an optional description  
> (as well
> as an identifier) on the "@" (and "+") lines. See the NCBI SRA for  
> examples,
> e.g.
>
> @SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
> GGGTGATGGCCGCTGCCGATGGCGTCAAATCCCACC
> +SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
> IIIIIIIIIIIIIIIIIIIIIIIIIIIIII9IG9IC

Should be easy enough to implement with a simple regex.
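
Something along these lines, as a sketch. parse_header() is a hypothetical name and the regex is just one way to do it, not what the current parser uses.

```perl
use strict;
use warnings;

# Split a FASTQ "@" or "+" line into identifier and optional description.
# Returns an empty list if the line does not start with "@" or "+".
sub parse_header {
    my ($line) = @_;
    return unless $line =~ /^([\@+])(\S*)\s*(.*?)\s*$/;
    return (marker => $1, id => $2, desc => $3);
}

my %h = parse_header('@SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36');
print "id=$h{id} desc=$h{desc}\n";
```

Because a quality string can itself begin with "@" or "+", this should only be applied to lines the parser already knows are header lines by their position in the record, not used to find record boundaries.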

> (6) Test reading and writing files where the encoded quality string  
> starts
> with a "@" or a "+" character, e.g.
> http://lists.open-bio.org/pipermail/bioperl-l/2009-May/029911.html
>
> Peter

Mark, getting all that? ;>

chris


From cjfields at illinois.edu  Wed Jun 17 13:09:54 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 17 Jun 2009 12:09:54 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <200906170927.13273.tristan.lefebure@gmail.com>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<92C15E3391F64BAF801754E924122540@NewLife>
	<200906170927.13273.tristan.lefebure@gmail.com>
Message-ID: <F8B12CF8-7A35-430F-A233-6DBA2992D19F@illinois.edu>


On Jun 17, 2009, at 8:27 AM, Tristan Lefebure wrote:

> Hello,
> Regarding next-gen sequences and bioperl, following my
> experience, another issue is bioperl speed. For example, if
> you want to trim bad quality bases at ends of 1E6 Solexa
> reads using Bio::SeqIO::fastq and some methods in
> Bio::Seq::Quality, well, you've got to be patient (but maybe
> I missed some shortcuts...).

The key issues affecting speed in bioperl are contained object  
instantiation and inheritance (and between those two, the latter much  
more so as it plays a role with contained objects as well as the  
container).

http://www.bioperl.org/wiki/Why_BioPerl_is_slow

Moose/Perl6 roles/traits are one way around that issue, but we are a  
ways off from getting that running.  I think to get that working  
decently would be a from-ground-up endeavor (see my past posts on  
biomoose/bioperl6).

> A pure perl solution will be between 100 and 1000x faster...
> Would it be possible to have an ultra-light quality object
> with few simple methods for next-gen reads?
>
> I can contribute some tests if that sounds like an important
> point.
>
> -Tristan

The quality objects themselves I don't think are that heavy; I think  
the main impediment is inheritance.  One could get around that a bit  
by using a direct_new method to create a blessed hash directly, then  
reimplement methods to lazily create any objects contained on the fly.
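
To illustrate the idea, a small sketch. My::LightRead is a hypothetical class, not anything in BioPerl, and the decoded score array stands in for whatever contained object would be built lazily. The default offset of 33 is an assumption.

```perl
use strict;
use warnings;

{
    package My::LightRead;    # hypothetical class, not part of BioPerl

    # Bless a plain hash directly, skipping the usual constructor chain.
    sub direct_new {
        my ($class, %args) = @_;
        my $self = {
            id       => $args{id},
            seq      => $args{seq},
            raw_qual => $args{raw_qual},
            offset   => defined $args{offset} ? $args{offset} : 33,
        };
        return bless $self, $class;
    }

    sub id  { $_[0]{id} }
    sub seq { $_[0]{seq} }

    # Decode the quality string only when somebody asks for the scores,
    # then cache the result.
    sub qual {
        my ($self) = @_;
        $self->{qual} ||= [ map { ord($_) - $self->{offset} }
                            split //, $self->{raw_qual} ];
        return $self->{qual};
    }
}

my $read = My::LightRead->direct_new(
    id => 'r1', seq => 'ACGT', raw_qual => 'II9I',
);
print join(' ', @{ $read->qual }), "\n";    # decoded on first call
```

Reads that are only filtered on id or length never pay for the decoding at all.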

chris


From bill at genenformics.com  Wed Jun 17 13:03:16 2009
From: bill at genenformics.com (bill at genenformics.com)
Date: Wed, 17 Jun 2009 10:03:16 -0700
Subject: [Bioperl-l] Alignment->slice() issue?
In-Reply-To: <1A4207F8295607498283FE9E93B775B4060B9718@EX02.asurite.ad.asu.edu>
References: <cfebe0020906160106s5f76501au9d67d8ee3062d166@mail.gmail.com>
	<1A4207F8295607498283FE9E93B775B4060B9718@EX02.asurite.ad.asu.edu>
Message-ID: <92dadb76ce7d7b8eeb4644b47ef1a81f.squirrel@mail.dreamhost.com>

Hopefully this is helpful.

http://www.ncbi.nlm.nih.gov/IEB/ToolBox/CPP_DOC/lxr/source/src/objects/seqalign/Dense_seg.cpp#L648

Bill at genenformics

> Warning: This is very ugly code and makes a few assumptions, such as the
> alignment objects are stored in order of their start position. I made
> this assumption as that is how I put them into the object to begin with.
>
> =head1 C<slice>
>
> Function to slice up an alignment based on start and end parameters.
> The new alignment object is stored in the reference passed as the last
> argument.
>
> slice(\$alignment, $start, $end, \$new_align)
>
> =cut
>
> use POSIX qw(floor);
> use List::Util qw(min max);
>
> sub slice
> {
>     my ($alignment, $start, $end, $new_align) = @_;
>
>     $$new_align = new Bio::SimpleAlign;
>     print $$alignment->no_sequences() . "\n";
>
>     $$new_align->add_seq(
>         new Bio::LocatableSeq(
>             -seq => substr(
>                 $$alignment->get_seq_by_pos(1)->seq(),
>                 $start - 1, $end - $start + 1
>             ),
>             -id    => $$alignment->get_seq_by_pos(1)->display_id(),
>             -start => max($$alignment->get_seq_by_pos(1)->start - $start + 1, 1),
>             -end   => min(
>                 $$alignment->get_seq_by_pos(1)->end - $start + 1,
>                 $end - $start + 1
>             ),
>             -alphabet => 'dna',
>             -strand   => $$alignment->get_seq_by_pos(1)->strand()
>         )
>     );
>
>     # implement a binary search to determine a decent offset into the
>     # alignment
>     my $probe;
>
>     if ($$alignment->no_sequences() <= 2) {
>         $probe = $$alignment->no_sequences();
>     }
>     else {
>         my ($L, $R) = (1, $$alignment->no_sequences());
>         while (($R - $L) > 1)
>         {
>             $probe = floor(($R + $L) / 2);
>
>             # gotta watch this.  Had the check backwards and so was never
>             # going in the right direction for the search.  If I reverse
>             # these two variables, then I have to either reverse the
>             # conditions or change the > to a <.
>             if ($$alignment->get_seq_by_pos($probe)->start() > $start)
>             {
>                 $R = $probe;
>             }
>             else
>             {
>                 $L = $probe;
>             }
>         }
>     }
>
>     # now go through the results that are after that point
>     for (my $i = $probe ; $i <= $$alignment->no_sequences() ; $i++)
>     {
>         my $seq = $$alignment->get_seq_by_pos($i);
>         last if ($seq->start() > $end);
>
>         # Only concern ourselves with primers that land inside the
>         # desired region; other primers will show up in the image maps
>         # for each gene.
>         if ($seq->start() >= $start && $seq->end() <= $end)
>         {
>
>             # values for the substr pullout of a given sequence
>             my $offset = max($start - $seq->start(), 0);
>             my $length =
>               min($end, $seq->end()) - max($start, $seq->start()) + 1;
>             $$new_align->add_seq(
>                 new Bio::LocatableSeq(
>                     -seq   => $seq->seq(),
>                     -id    => $seq->display_id(),
>                     -start => max($seq->start - $start + 1, 1),
>                     -end   => min($seq->end - $start + 1, $end - $start + 1),
>                     -alphabet => 'dna',
>                     -strand   => $seq->strand()
>                 )
>             );
>         }
>     }
>     return 1;
> }
>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org
>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
>> Malcolm Cook
>> Sent: Tuesday, June 16, 2009 1:07 AM
>> To: bioperl-l at lists.open-bio.org
>> Subject: [Bioperl-l] Alignment->slice() issue?
>>
>> Kevin,
>>
>> I'm getting struck by this old issue you once coded around.
>>
>>       http://bioperl.org/pipermail/bioperl-l/2007-January/024665.html
>>
>> Any chance you could share your implementation with a fellow
>> traveller...
>>
>> ??
>>
>> Thanks,
>>
>> Malcolm Cook
>> Stowers Institute for Medical Research


From maj at fortinbras.us  Wed Jun 17 13:13:23 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 17 Jun 2009 13:13:23 -0400
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk><BF526BE6-AFA8-4F66-9665-D6328F9B4FFA@illinois.edu><320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com>
	<1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>
Message-ID: <129A87FC74254873A6CEB1CEB2ADAF6F@NewLife>

I'm on the case! (but maybe not in realtime, today!)



From e.stupka at ucl.ac.uk  Wed Jun 17 13:49:38 2009
From: e.stupka at ucl.ac.uk (Elia Stupka)
Date: Wed, 17 Jun 2009 18:49:38 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <F8B12CF8-7A35-430F-A233-6DBA2992D19F@illinois.edu>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<92C15E3391F64BAF801754E924122540@NewLife>
	<200906170927.13273.tristan.lefebure@gmail.com>
	<F8B12CF8-7A35-430F-A233-6DBA2992D19F@illinois.edu>
Message-ID: <F995F500-2E4B-40B8-9C2D-2355E91DF65E@ucl.ac.uk>

I would suggest developing the "standard" version first, then moving  
onto potential optimizations.

When we went through a similar argument in Ensembl about 8 years ago  
we ended up dropping Bio::Root completely...

If one is truly after performance for these large next-gen projects,  
it'd be down to pure piping, shell, and worrying about location and  
copying of files, sticking to systems-level as much as possible, and  
quite far from Bioperl altogether, so I think it's a whole different  
level of optimization issues, probably outside the scope of Bioperl.

Elia


---
Senior Lecturer, Bioinformatics
UCL Cancer Institute
Paul O' Gorman Building
University College London
Gower Street
WC1E 6BT
London
UK

Office (UCL): +44 207 679 6493
Office (ICMS): +44 0207 8822374

Mobile: +44 7597 566 194
Mobile (Italy): +39 338 8448801


From cjfields at illinois.edu  Wed Jun 17 13:52:49 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 17 Jun 2009 12:52:49 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <129A87FC74254873A6CEB1CEB2ADAF6F@NewLife>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk><BF526BE6-AFA8-4F66-9665-D6328F9B4FFA@illinois.edu><320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com>
	<1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>
	<129A87FC74254873A6CEB1CEB2ADAF6F@NewLife>
Message-ID: <16E48B50-88FC-4B6D-9FDD-CF7FDE6BAEAA@illinois.edu>

I think this is a top priority for a fall BioPerl release, maybe 1.6.2  
(I am planning on a summer 1.6.1 release still).  Made it into a bug  
report for tracking:

http://bugzilla.open-bio.org/show_bug.cgi?id=2857

If no one works on this I may take it up after the 1.6.1 release.

chris



From e.stupka at ucl.ac.uk  Wed Jun 17 14:01:28 2009
From: e.stupka at ucl.ac.uk (Elia Stupka)
Date: Wed, 17 Jun 2009 19:01:28 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <16E48B50-88FC-4B6D-9FDD-CF7FDE6BAEAA@illinois.edu>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk><BF526BE6-AFA8-4F66-9665-D6328F9B4FFA@illinois.edu><320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com>
	<1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>
	<129A87FC74254873A6CEB1CEB2ADAF6F@NewLife>
	<16E48B50-88FC-4B6D-9FDD-CF7FDE6BAEAA@illinois.edu>
Message-ID: <E0FAC5DB-470E-48E1-A30F-B64E2E63EB86@ucl.ac.uk>

If we reach a consensus on how/who/what, I will be happy to contribute  
some coding time in the coming days.

Would it be a good starting point to start adding the different  
formats as named in BioPython, and test support for reading/writing  
them? I could start playing with that.
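
As a possible starting point, a small sketch of a lookup table keyed by the Biopython format names ('fastq-sanger', 'fastq-solexa', 'fastq-illumina'), with the Solexa to PHRED conversion applied on reading. %FASTQ_VARIANT, solexa_to_phred() and decode_quality() are hypothetical names, not existing BioPerl code, and the offsets are an assumption about the three variants.

```perl
use strict;
use warnings;

# ASCII offset per variant, and whether the stored scores are Solexa
# scores that need converting to PHRED.
my %FASTQ_VARIANT = (
    'fastq-sanger'   => { offset => 33, solexa => 0 },
    'fastq-solexa'   => { offset => 64, solexa => 1 },
    'fastq-illumina' => { offset => 64, solexa => 0 },
);

# Q_phred = 10 * log10(10 ** (Q_solexa / 10) + 1)
sub solexa_to_phred {
    my ($q) = @_;
    return 10 * log(10 ** ($q / 10) + 1) / log(10);
}

# Decode a quality string into PHRED scores (rounded for Solexa input).
sub decode_quality {
    my ($format, $qual) = @_;
    my $v = $FASTQ_VARIANT{$format}
        or die "unknown format '$format'\n";
    return map {
        my $q = ord($_) - $v->{offset};
        $v->{solexa} ? int(solexa_to_phred($q) + 0.5) : $q;
    } split //, $qual;
}

print join(' ', decode_quality('fastq-solexa', ';h')), "\n";
```

Tests for reading and writing could then loop over the keys of the table and check that each variant round-trips.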

regards,

Elia

On 17 Jun 2009, at 18:52, Chris Fields wrote:

> I think this is a top priority for a fall BioPerl release, maybe  
> 1.6.2 (I am planning on a summer 1.6.1 release still).  Made it into  
> a bug report for tracking:
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2857
>
> If no one works on this I may take it up after the 1.6.1 release.
>
> chris
>
> On Jun 17, 2009, at 12:13 PM, Mark A. Jensen wrote:
>
>> I'm on the case! (but maybe not in realtime, today!)
>>
>> ----- Original Message ----- From: "Chris Fields" <cjfields at illinois.edu 
>> >
>> To: "Peter" <biopython at maubp.freeserve.co.uk>
>> Cc: "BioPerl List" <bioperl-l at lists.open-bio.org>; "Elia Stupka" <e.stupka at ucl.ac.uk 
>> >; "Heikki Lehvaslaiho" <heikki at sanbi.ac.za>
>> Sent: Wednesday, June 17, 2009 1:06 PM
>> Subject: Re: [Bioperl-l] Next-gen modules
>>
>>
>>>
>>> On Jun 17, 2009, at 8:25 AM, Peter wrote:
>>>
>>>> On Wed, Jun 17, 2009 at 1:57 PM, Chris  
>>>> Fields<cjfields at illinois.edu>  wrote:
>>>>>
>>>>> Elia,
>>>>>
>>>>> As Mark indicated, we recently discussed the lack of support  
>>>>> for  next-gen on
>>>>> list, at least re: fastq.  I may be hit with the same thing in  
>>>>> a  few months
>>>>> time myself, and I recall Jason and a few others also  
>>>>> mentioning  the same.
>>>>> Heikki wrote some code for Illumina FASTQ for SeqIO and related   
>>>>> modules but
>>>>> I don't believe it has been committed to trunk yet, so maybe he  
>>>>> can  answer.
>>>>>
>>>>> From prior discussions IIRC the issues were:
>>>>>
>>>>> 1) distinguishing the various FASTQ versions (Sanger, Illumina  
>>>>> 1.0, Illumina
>>>>> 1.3) from one another (so maybe some optional validation), and
>>>>
>>>> Following the python rule of thumb for being explicit, Biopython  
>>>> makes
>>>> the user specify which FASTQ variant is being used. I don't think  
>>>> you
>>>> can do anything else. Any attempted validation would have to be
>>>> heuristic based on the ASCII characters found, and would risk false
>>>> positive warnings.
>>>
>>> Right; I'm thinking along the same lines.  If anything the most  
>>> we  would allow is some level of validation, so if there were a  
>>> degree of  uncertainty about the format one could set a validation  
>>> flag to check  bounds during the parse and warn if they are  
>>> exceeded.
>>>
>>>>> 2) having a way for the Seq object to either 'know' what format is
>>>>> contained, or we use phred score and convert back and forth  
>>>>> from  that (I
>>>>> think the latter makes more sense).
>>>>
>>>> I think it could make sense for BioPerl to convert Solexa scores  
>>>> to/ from
>>>> PHRED scores on the fly (especially now that Illumina is abandoning
>>>> the Solexa score system). Python style tries to avoid implicit   
>>>> conversions,
>>>> so Biopython doesn't automatically do a conversion from Solexa to
>>>> PHRED scores on parsing (but will on writing if the requested  
>>>> output
>>>> format requires this).
>>>>
>>>>> Peter's suggestions also are reasonable, though does biopython  
>>>>> have a
>>>>> separate module for each of these variations?  Our version (I   
>>>>> believe)
>>>>> mainly varied the conversion within Bio::SeqIO::fastq itself  
>>>>> based  on the
>>>>> fastq variant passed in as a separate named argument.
>>>>
>>>> Biopython's SeqIO gives the three FASTQ variants their own unique
>>>> names. This format name is a required argument for parsing/writing
>>>> (we don't try and guess the file format from the data contents).   
>>>> Internally
>>>> we have three separate FASTQ parsers/writers although they do share
>>>> code.
>>>
>>> We could easily do the same if others agree.  Actually, if we   
>>> specified that shorthand for a variant on a format would be  
>>> designated  as -format => 'format-variant', I think we could  
>>> easily hack SeqIO to  deal with that by splitting on '-' and  
>>> passing everything to the  constructor as (-format => 'format', - 
>>> variant => 'variant').  Very  little repeated code in this case,  
>>> just an additional named parameter  indicating the format variant  
>>> (and the SeqIO class can do the type  checking on that within the  
>>> constructor).
>>>
>>>> Other issues to keep in mind:
>>>>
>>>> (3) There should be no warning parsing files where the optional   
>>>> repeated
>>>> title is missing on the "+" lines (as discussed earlier on the   
>>>> BioPerl list).
>>>
>>> Agreed, though we'll have to check the current fastq parser to see  
>>> if  that's currently the case.  I thought that was fixed but maybe  
>>> not?
>>>
>>>> (4) When writing FASTQ files should BioPerl omit the optional
>>>> repeated title on the "+" line? Biopython omits this as I
>>>> understand this to be common practice, and it can make a big
>>>> difference to file sizes, especially on short read data from
>>>> Solexa/Illumina.
>>>
>>> Agreed, particularly if it's commonly encountered.
>>>
>>>> (5) Also test reading and writing files with an optional  
>>>> description  (as well
>>>> as an identifier) on the "@" (and "+") lines. See the NCBI SRA  
>>>> for  examples,
>>>> e.g.
>>>>
>>>> @SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
>>>> GGGTGATGGCCGCTGCCGATGGCGTCAAATCCCACC
>>>> +SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
>>>> IIIIIIIIIIIIIIIIIIIIIIIIIIIIII9IG9IC
>>>
>>> Should be easy enough to implement with a simple regex.
>>>
>>>> (6) Test reading and writing files where the encoded quality  
>>>> string  starts
>>>> with a "@" or a "+" character, e.g.
>>>> http://lists.open-bio.org/pipermail/bioperl-l/2009-May/029911.html
>>>>
>>>> Peter
>>>
>>> Mark, getting all that? ;>
>>>
>>> chris
>>>
>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>

---
Senior Lecturer, Bioinformatics
UCL Cancer Institute
Paul O' Gorman Building
University College London
Gower Street
WC1E 6BT
London
UK

Office (UCL): +44 207 679 6493
Office (ICMS): +44 0207 8822374

Mobile: +44 7597 566 194
Mobile (Italy): +39 338 8448801


From tristan.lefebure at gmail.com  Wed Jun 17 14:09:42 2009
From: tristan.lefebure at gmail.com (Tristan Lefebure)
Date: Wed, 17 Jun 2009 14:09:42 -0400
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <F995F500-2E4B-40B8-9C2D-2355E91DF65E@ucl.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<F8B12CF8-7A35-430F-A233-6DBA2992D19F@illinois.edu>
	<F995F500-2E4B-40B8-9C2D-2355E91DF65E@ucl.ac.uk>
Message-ID: <200906171409.42558.tristan.lefebure@gmail.com>

Thanks both for the light.

That probably means that the place bioperl will take in the 
handling of next-gen sequencing raw data (i.e. reads) is 
very limited, no? (at least until bioperl6). A single GA2 
Solexa lane generates about 9 million reads, and I would 
really not call that a big project...

BTW, is there a simple way to see object instantiation and 
inheritance, as well as the time consumed by each, when one 
calls next_seq() (or any other method)?

-Tristan

On Wednesday 17 June 2009 13:49:38 Elia Stupka wrote:
> I would suggest developing the "standard" version first,
> then moving onto potential optimizations.
>
> When we went through a similar argument in Ensembl about
> 8 years ago we ended up dropping Bio::Root completely...
>
> If one is truly after performance for these large
> next-gen projects, it'd be down to pure piping, shell,
> and worrying about location and copying of files,
> sticking to systems-level as much as possible, and quite
> far from Bioperl altogether, so I think it's a whole
> different level of optimization issues, probably outside
> the scope of Bioperl.
>
> Elia
>
> On 17 Jun 2009, at 18:09, Chris Fields wrote:
> > On Jun 17, 2009, at 8:27 AM, Tristan Lefebure wrote:
> >> Hello,
> >> Regarding next-gen sequences and bioperl, following my
> >> experience, another issue is bioperl speed. For
> >> example, if you want to trim bad quality bases at ends
> >> of 1E6 Solexa reads using Bio::SeqIO::fastq and some
> >> methods in Bio::Seq::Quality, well, you've got to be
> >> patient (but may be I missed some shortcuts...).
> >
> > The key issues affecting speed in bioperl are contained
> > object instantiation and inheritance (and between those
> > two, the latter much more so as it plays a role with
> > contained objects as well as the container).
> >
> > http://www.bioperl.org/wiki/Why_BioPerl_is_slow
> >
> > Moose/Perl6 roles/traits are one way around that issue,
> > but we are a ways off from getting that running.  I
> > think to get that working decently would be a
> > from-ground-up endeavor (see my past posts on
> > biomoose/bioperl6).
> >
> >> A pure perl solution will be between 100 to 1000x
> >> faster... Would it be possible to have an ultra-light
> >> quality object with few simple methods for next-gen
> >> reads?
> >>
> >> I can contribute some tests if that sounds like an
> >> important point.
> >>
> >> -Tristan
> >
> > The quality objects themselves I don't think are that
> > heavy; I think the main impediment is inheritance.  One
> > could get around that a bit by using a direct_new
> > method to create a blessed hash directly, then
> > reimplement methods to lazily create any objects
> > contained on the fly.
> >
> > chris


From bix at sendu.me.uk  Wed Jun 17 14:20:00 2009
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 17 Jun 2009 19:20:00 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <200906170927.13273.tristan.lefebure@gmail.com>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>	<92C15E3391F64BAF801754E924122540@NewLife>
	<200906170927.13273.tristan.lefebure@gmail.com>
Message-ID: <4A3933D0.4040808@sendu.me.uk>

Tristan Lefebure wrote:
> Hello,
> Regarding next-gen sequences and bioperl, following my 
> experience, another issue is bioperl speed. For example, if 
> you want to trim bad quality bases at ends of 1E6 Solexa 
> reads using Bio::SeqIO::fastq and some methods in 
> Bio::Seq::Quality, well, you've got to be patient (but may 
> be I missed some shortcuts...).

This is my concern as well. Or, rather, is there actually a significant 
set of users out there who are dealing with next-gen sequencing and 
would consider using BioPerl for their work?

I'm working with all the 1000-genomes data at the Sanger, and we at 
least are probably never going to use BioPerl for the work.


> A pure perl solution will be between 100 to 1000x faster... 
> Would it be possible to have an ultra-light quality object 
> with few simple methods for next-gen reads?

The fastq parser itself already seems pretty fast. The way to get the 
speedup is to not create any Bio::Seq* objects but just return the data 
directly. At that point it's not taking much advantage of BioPerl. But 
certainly it could be done...
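
To make "just return the data directly" concrete, here is a minimal
sketch of a reader that hands back plain hash references instead of
Bio::Seq* objects. It is not the Bio::SeqIO::fastq parser; the sub name
next_fastq_record is made up, and it assumes unwrapped four-line
records.

```perl
use strict;
use warnings;

# Read one four-line FASTQ record from a filehandle and return a plain
# hash reference (no objects). Returns nothing at end of input.
# Assumes sequence and quality are each on a single line.
sub next_fastq_record {
    my ($fh) = @_;
    defined(my $head = <$fh>) or return;
    my $seq  = <$fh>;
    my $plus = <$fh>;
    my $qual = <$fh>;
    defined $qual or die "Truncated FASTQ record\n";
    chomp($head, $seq, $plus, $qual);

    # '@' line: identifier, then an optional description
    $head =~ /^\@(\S+)\s*(.*)$/ or die "Bad header line: $head\n";
    my ($id, $desc) = ($1, $2);

    # '+' line: the repeated title is optional, so only check the marker
    $plus =~ /^\+/ or die "Missing '+' line for $id\n";
    length($seq) == length($qual) or die "Length mismatch for $id\n";

    return { id => $id, desc => $desc, seq => $seq, qual => $qual };
}
```

Because records are consumed four lines at a time, a quality string
that happens to start with "@" or "+" is not mistaken for a header.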


From e.stupka at ucl.ac.uk  Wed Jun 17 14:39:08 2009
From: e.stupka at ucl.ac.uk (Elia Stupka)
Date: Wed, 17 Jun 2009 19:39:08 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <200906171409.42558.tristan.lefebure@gmail.com>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<F8B12CF8-7A35-430F-A233-6DBA2992D19F@illinois.edu>
	<F995F500-2E4B-40B8-9C2D-2355E91DF65E@ucl.ac.uk>
	<200906171409.42558.tristan.lefebure@gmail.com>
Message-ID: <8C661293-DF7D-4262-970A-92AF0015BB04@ucl.ac.uk>

We are using bioperl for simple pre- and post-processing of data for  
full Solexa runs, and although it might not be ideal, the scripting  
with Bioperl is not a major killer. When I was referring to large,  
heavy pipelines I was thinking of pipelines that deal with many Solexa  
runs as one project (e.g. 1000 genomes) and really cannot afford any  
bottleneck, because that directly affects their storage.

cheers

Elia


On 17 Jun 2009, at 19:09, Tristan Lefebure wrote:

> Thanks both for the light.
>
> That probably means that the place bioperl will take in the
> handling of next-gen sequencing raw data (i.e. reads) is
> very limited, no? (at least until bioperl6). A single GA2
> Solexa lane generates about 9 million reads, and I would
> really not call that a big project...
>
> BTW, is there a simple way to see object instantiation and
> inheritance, as well as the time consumed by each, when one
> calls next_seq() (or any other method)?
>
> -Tristan
>
> On Wednesday 17 June 2009 13:49:38 Elia Stupka wrote:
>> I would suggest developing the "standard" version first,
>> then moving onto potential optimizations.
>>
>> When we went through a similar argument in Ensembl about
>> 8 years ago we ended up dropping Bio::Root completely...
>>
>> If one is truly after performance for these large
>> next-gen projects, it'd be down to pure piping, shell,
>> and worrying about location and copying of files,
>> sticking to systems-level as much as possible, and quite
>> far from Bioperl altogether, so I think it's a whole
>> different level of optimization issues, probably outside
>> the scope of Bioperl.
>>
>> Elia
>>
>> On 17 Jun 2009, at 18:09, Chris Fields wrote:
>>> On Jun 17, 2009, at 8:27 AM, Tristan Lefebure wrote:
>>>> Hello,
>>>> Regarding next-gen sequences and bioperl, following my
>>>> experience, another issue is bioperl speed. For
>>>> example, if you want to trim bad quality bases at ends
>>>> of 1E6 Solexa reads using Bio::SeqIO::fastq and some
>>>> methods in Bio::Seq::Quality, well, you've got to be
>>>> patient (but may be I missed some shortcuts...).
>>>
>>> The key issues affecting speed in bioperl are contained
>>> object instantiation and inheritance (and between those
>>> two, the latter much more so as it plays a role with
>>> contained objects as well as the container).
>>>
>>> http://www.bioperl.org/wiki/Why_BioPerl_is_slow
>>>
>>> Moose/Perl6 roles/traits are one way around that issue,
>>> but we are a ways off from getting that running.  I
>>> think to get that working decently would be a
>>> from-ground-up endeavor (see my past posts on
>>> biomoose/bioperl6).
>>>
>>>> A pure perl solution will be between 100 to 1000x
>>>> faster... Would it be possible to have an ultra-light
>>>> quality object with few simple methods for next-gen
>>>> reads?
>>>>
>>>> I can contribute some tests if that sounds like an
>>>> important point.
>>>>
>>>> -Tristan
>>>
>>> The quality objects themselves I don't think are that
>>> heavy; I think the main impediment is inheritance.  One
>>> could get around that a bit by using a direct_new
>>> method to create a blessed hash directly, then
>>> reimplement methods to lazily create any objects
>>> contained on the fly.
>>>
>>> chris


From cjfields at illinois.edu  Wed Jun 17 14:40:05 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 17 Jun 2009 13:40:05 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <200906171409.42558.tristan.lefebure@gmail.com>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<F8B12CF8-7A35-430F-A233-6DBA2992D19F@illinois.edu>
	<F995F500-2E4B-40B8-9C2D-2355E91DF65E@ucl.ac.uk>
	<200906171409.42558.tristan.lefebure@gmail.com>
Message-ID: <63B608B2-8DE0-4FD1-9E15-339FD226D7AB@illinois.edu>

On Jun 17, 2009, at 1:09 PM, Tristan Lefebure wrote:

> Thanks both for the light.
>
> That probably means that the place bioperl will take in the
> handling of next-gen sequencing raw data (i.e. reads) is
> very limited, no? (at least until bioperl6). A single GA2
> Solexa lane generates about 9 million reads, and I would
> really not call that a big project...

I don't think it's impossible.  If you parse any very long list of  
sequences in order it will be very slow, yes, but if they were indexed  
or loaded into a DB, lookups would of course be orders of magnitude faster.

We already have perl-based indexing for fastq (Bio::Index::Fastq), so  
maybe something could be built on top of that. I haven't looked but we  
can also wrap other C/C++-based parsers as well. BioLib, for instance,  
has bindings to io_lib, so maybe that could be (ab)used in some way.
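
As a toy illustration of why an index helps, the sketch below records
the byte offset of each record once and then seeks straight to a read
by id. It says nothing about how Bio::Index::Fastq is implemented; the
sub names index_fastq and fetch_fastq are invented for the example, the
index lives in memory only, and four-line records are assumed.

```perl
use strict;
use warnings;

# Build an in-memory index: read id => byte offset of its '@' line.
sub index_fastq {
    my ($fh) = @_;
    my %offset;
    while (1) {
        my $pos = tell $fh;
        defined(my $head = <$fh>) or last;
        my ($id) = $head =~ /^\@(\S+)/
            or die "Bad header at byte $pos\n";
        $offset{$id} = $pos;
        # Skip the sequence, '+' and quality lines of this record
        for (1 .. 3) {
            defined(my $line = <$fh>) or die "Truncated record for $id\n";
        }
    }
    return \%offset;
}

# Jump to one record by id and return its four lines (chomped),
# or nothing if the id is not in the index.
sub fetch_fastq {
    my ($fh, $index, $id) = @_;
    my $pos = $index->{$id};
    return unless defined $pos;
    seek($fh, $pos, 0) or die "seek failed: $!\n";
    my @lines = map { scalar <$fh> } 1 .. 4;
    chomp @lines;
    return \@lines;
}
```

The one sequential pass is paid once; after that each lookup is a hash
access plus a seek, regardless of where the read sits in the file.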

> BTW, is there a simple way to see object instantiation and
> inheritance, as well as the time consumed by each, when one
> calls next_seq() (or any other method)?
>
> -Tristan

As a simple benchmark, at one point all feature tag information was  
converted into Bio::Annotations.  I reverted that behavior to be  
simple tag/value again and had a pretty decent bump:

http://www.bioperl.org/wiki/Feature_Annotation_rollback#Simple_Benchmark

Also, I tried reimplementing some parsers as generic 'event'-based  
driver/handler and they were slightly faster, the key roadblock being  
instantiation again.  If I didn't create Features/Annotations I saw a  
significant speedup.  That's not entirely unexpected, as SeqFeatures  
also contain Locations (in turn that can contain subLocations) and  
(until recently) tag-based Bio::Annotation by default.  Annotations  
are collected in an Annotation::Collection and can contain other  
objects I believe (Ontology terms, etc).

The overall lesson is, if you don't have very heavy objects being  
created the overhead is actually quite small; it's only when you  
greedily instantiate everything that you run into problems.
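
A small sketch of the lazy alternative, under the assumption that the
raw strings are cheap to keep around and the decoded form is only
needed for some reads. My::LightRead is a hypothetical class, not
anything in BioPerl, and the default quality offset of 33 is likewise
an assumption for the example.

```perl
use strict;
use warnings;

package My::LightRead;    # hypothetical class, not part of BioPerl

our $quals_built = 0;     # counts how often qualities were decoded

# Bless the raw data directly: no argument parsing, no contained objects.
sub direct_new {
    my ($class, %raw) = @_;
    my $self = { offset => 33, %raw };
    return bless $self, $class;
}

sub id  { $_[0]{id} }
sub seq { $_[0]{seq} }

# Decode the quality string into numbers only on first access,
# then reuse the cached array reference.
sub qual {
    my ($self) = @_;
    $self->{_qual} ||= do {
        $quals_built++;
        [ map { ord($_) - $self->{offset} } split //, $self->{qual_str} ];
    };
    return $self->{_qual};
}

package main;

my $read = My::LightRead->direct_new(
    id => 'r1', seq => 'ACGT', qual_str => 'II5!',
);
print join(' ', @{ $read->qual }), "\n";
```

A read that is only filtered on its id or sequence never pays for the
quality decoding at all.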

chris


From cjfields at illinois.edu  Wed Jun 17 15:05:03 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 17 Jun 2009 14:05:03 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <F995F500-2E4B-40B8-9C2D-2355E91DF65E@ucl.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<92C15E3391F64BAF801754E924122540@NewLife>
	<200906170927.13273.tristan.lefebure@gmail.com>
	<F8B12CF8-7A35-430F-A233-6DBA2992D19F@illinois.edu>
	<F995F500-2E4B-40B8-9C2D-2355E91DF65E@ucl.ac.uk>
Message-ID: <E92652A7-7622-4183-8DC3-596E6593C587@illinois.edu>

On Jun 17, 2009, at 12:49 PM, Elia Stupka wrote:

> I would suggest developing the "standard" version first, then moving  
> onto potential optimizations.

Yes, agreed.

> When we went through a similar argument in Ensembl about 8 years ago  
> we ended up dropping Bio::Root completely...

They (strangely enough) still use it in a few modules and require  
bioperl 1.2.3, but (in my experience) the latest bioperl works just  
fine.  I asked about that and never got a response.

> If one is truly after performance for these large next-gen projects,  
> it'd be down to pure piping, shell, and worrying about location and  
> copying of files, sticking to systems-level as much as possible, and  
> quite far from Bioperl altogether, so I think it's a whole different  
> level of optimization issues, probably outside the scope of Bioperl.
>
> Elia

In the end I don't think we can run it using perl alone, no, and I  
believe using BioPerl by itself will not be the optimal solution, but  
it can probably interface with something that is.

chris


From e.stupka at ucl.ac.uk  Wed Jun 17 15:14:04 2009
From: e.stupka at ucl.ac.uk (Elia Stupka)
Date: Wed, 17 Jun 2009 20:14:04 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <76D5EDD5-6217-438E-87A5-1B7571D14FFE@sanger.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<76D5EDD5-6217-438E-87A5-1B7571D14FFE@sanger.ac.uk>
Message-ID: <9AC2CFC1-D7E7-4B93-9671-65C30E5AA285@ucl.ac.uk>

Excellent, I was thinking of working on Maq and BowTie as priorities.

Elia

On 17 Jun 2009, at 14:28, John Marshall wrote:

> On 17 Jun 2009, at 12:29, Elia Stupka wrote:
>> Similarly, there seems to be little in bioperl-run to support tools  
>> that have been developed in this area, such as Maq, BowTie, TopHat,  
>> etc?
>
> FYI I have a Bio::Tools::Run::Velvet wrapper [1] that I plan to  
> submit in the not too distant future.  (First it needs some "blah  
> blah" replaced with actual documentation and a test suite.)
>
> Cheers,
>
>    John
>
> [1] http://www.ebi.ac.uk/~zerbino/velvet/


From michael.watson at bbsrc.ac.uk  Wed Jun 17 15:15:20 2009
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Wed, 17 Jun 2009 20:15:20 +0100
Subject: [Bioperl-l] Next-gen modules
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>	<92C15E3391F64BAF801754E924122540@NewLife>
	<200906170927.13273.tristan.lefebure@gmail.com>
	<4A3933D0.4040808@sendu.me.uk>
Message-ID: <8975119BCD0AC5419D61A9CF1A923E9508B291F1@iahce2ksrv1.iah.bbsrc.ac.uk>

In answer to your question, yes!  We have 6 illumina datasets which we have searched against sequence databases using fasta, and I used SearchIO to parse the results.  This is where BioPerl comes into its own - wrapped around fast, optimised solutions written in C or Java.  Sure, I could have written something in sed/awk/pure perl/C etc to parse out the information I needed faster, but the SearchIO solution only took a few minutes to parse a huge fasta results file, and for me (and many others, I suspect) a few minutes is not a problem.

 
________________________________

From: bioperl-l-bounces at lists.open-bio.org on behalf of Sendu Bala
Sent: Wed 17/06/2009 7:20 PM
To: tristan.lefebure at gmail.com
Cc: bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] Next-gen modules


Tristan Lefebure wrote:
> Hello,
> Regarding next-gen sequences and bioperl, following my
> experience, another issue is bioperl speed. For example, if
> you want to trim bad quality bases at ends of 1E6 Solexa
> reads using Bio::SeqIO::fastq and some methods in
> Bio::Seq::Quality, well, you've got to be patient (but may
> be I missed some shortcuts...).

This is my concern as well. Or, rather, is there actually a significant
set of users out there who are dealing with next-gen sequencing and
would consider using BioPerl for their work?

I'm working with all the 1000-genomes data at the Sanger, and we at
least are probably never going to use BioPerl for the work.


> A pure perl solution will be between 100 to 1000x faster...
> Would it be possible to have an ultra-light quality object
> with few simple methods for next-gen reads?

The fastq parser itself already seems pretty fast. The way to get the
speedup is to not create any Bio::Seq* objects but just return the data
directly. At that point it's not taking much advantage of BioPerl. But
certainly it could be done...


From cjfields at illinois.edu  Wed Jun 17 15:30:15 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 17 Jun 2009 14:30:15 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <4A3933D0.4040808@sendu.me.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>	<92C15E3391F64BAF801754E924122540@NewLife>
	<200906170927.13273.tristan.lefebure@gmail.com>
	<4A3933D0.4040808@sendu.me.uk>
Message-ID: <8E010693-30B2-4E54-81CA-97755FC6CAE5@illinois.edu>

On Jun 17, 2009, at 1:20 PM, Sendu Bala wrote:

> Tristan Lefebure wrote:
>> Hello,
>> Regarding next-gen sequences and bioperl, following my experience,  
>> another issue is bioperl speed. For example, if you want to trim  
>> bad quality bases at ends of 1E6 Solexa reads using  
>> Bio::SeqIO::fastq and some methods in Bio::Seq::Quality, well,  
>> you've got to be patient (but may be I missed some shortcuts...).
>
> This is my concern as well. Or, rather, is there actually a  
> significant set of users out there who are dealing with next-gen  
> sequencing and would consider using BioPerl for their work?
>
> I'm working with all the 1000-genomes data at the Sanger, and we at  
> least are probably never going to use BioPerl for the work.

Are you using pure perl or (gasp) something else?  ;>

Judging by the feedback there is definitely a set of users who would  
like to integrate nextgen into bioperl somehow, probably to take  
advantage of other aspects of bioperl.

>> A pure perl solution will be between 100 to 1000x faster... Would  
>> it be possible to have an ultra-light quality object with few  
>> simple methods for next-gen reads?
>
> The fastq parser itself already seems pretty fast. The way to get  
> the speedup is to not create any Bio::Seq* objects but just return  
> the data directly. At that point it's not taking much advantage of  
> BioPerl. But certainly it could be done...


I suppose the best way to assess what needs to be done is come up with  
a set of 'use cases' specifying what users want so we can design  
around them, otherwise we're shooting in the dark.

I'm personally wondering if this could be done as a sequence database,  
something similar in theme to Lincoln's SeqFeature::Store, but  
sequence only, and returning quality objects in a similar manner (a la  
Storable)?  Not sure whether that's feasible, but it appears at  
least scalable.

chris


From e.stupka at ucl.ac.uk  Wed Jun 17 15:37:26 2009
From: e.stupka at ucl.ac.uk (Elia Stupka)
Date: Wed, 17 Jun 2009 20:37:26 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <4C3D793879C64A5E84C67FE313C86FA4@NewLife>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<4C3D793879C64A5E84C67FE313C86FA4@NewLife>
Message-ID: <540FFE96-A177-4A56-A574-50052569F39E@ucl.ac.uk>

Dear all,

I tried to summarize today's discussion with what seems to be the  
"shaping consensus" on the Wiki page:

http://www.bioperl.org/wiki/Nextgen_in_Bioperl

good night,

Elia


On 17 Jun 2009, at 13:19, Mark A. Jensen wrote:

> [ and here's a new wikipage: http://www.bioperl.org/wiki/Nextgen_in_Bioperl 
>  ]
> ----- Original Message ----- From: "Elia Stupka" <e.stupka at ucl.ac.uk>
> To: <bioperl-l at lists.open-bio.org>
> Sent: Wednesday, June 17, 2009 7:29 AM
> Subject: [Bioperl-l] Next-gen modules
>
>
>> Dear all,
>> after several years of absence I am slowly coming back to Bioperl,
>> and hope to contribute again to its development.
>> One area that I was thinking of starting from, since we are
>> actively involved with it, is to improve Bioperl's support for
>> next-gen sequencing data, tools, etc. Since I am sure I have missed
>> out on a lot of recent developments, do let me know if/what is useful.
>> One example that comes to mind is that the conversion of various
>> formats to/from FASTQ does not seem to be supported. Some code can
>> be found within Li Heng's script:
>> http://maq.sourceforge.net/fq_all2std.pl
>> but it would be good if it could make its way into SeqIO? And
>> similarly, potentially, for other next-gen sequence formats?
>> Similarly, there seems to be little in bioperl-run to support
>> tools that have been developed in this area, such as Maq, BowTie,
>> TopHat, etc?
>> Do let me know if there is a past thread on this, or other people
>> actively developing, etc. so that I can find out what priorities are.
>> thanks and best regards to all (old friends and new),
>> Elia


From e.stupka at ucl.ac.uk  Wed Jun 17 16:06:35 2009
From: e.stupka at ucl.ac.uk (Elia Stupka)
Date: Wed, 17 Jun 2009 21:06:35 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <8E010693-30B2-4E54-81CA-97755FC6CAE5@illinois.edu>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>	<92C15E3391F64BAF801754E924122540@NewLife>
	<200906170927.13273.tristan.lefebure@gmail.com>
	<4A3933D0.4040808@sendu.me.uk>
	<8E010693-30B2-4E54-81CA-97755FC6CAE5@illinois.edu>
Message-ID: <0500F8EB-F3BD-44E8-801A-7A45783DA40F@ucl.ac.uk>

Interesting that you mention the database issue. We found that for  
specific memory/CPU intensive things we also switch to using dbs. For  
example, after many years of loyal use of disconnected_ranges we  
switched to a simple SQL implementation of it, because of the large  
performance gains it would give us.  Similarly in Ensembl as well as  
in the old days of bioperl-db we opted for doing subseq within SQL  
where possible.

Some lean way of SQL'izing specific components could be less  
"disruptive" than avoiding object creation and provide significant  
gains in performance. Could be set as an optional flag, and could use  
temporary ad hoc SQL databases?

Still, the priority now is to make SeqIO compliant with all those formats;  
then we can worry about performance :)

Elia

On 17 Jun 2009, at 20:30, Chris Fields wrote:

> On Jun 17, 2009, at 1:20 PM, Sendu Bala wrote:
>
>> Tristan Lefebure wrote:
>>> Hello,
>>> Regarding next-gen sequences and bioperl, following my experience,  
>>> another issue is bioperl speed. For example, if you want to trim  
>>> bad quality bases at ends of 1E6 Solexa reads using  
>>> Bio::SeqIO::fastq and some methods in Bio::Seq::Quality, well,  
>>> you've got to be patient (but may be I missed some shortcuts...).
>>
>> This is my concern as well. Or, rather, is there actually a  
>> significant set of users out there who are dealing with next-gen  
>> sequencing and would consider using BioPerl for their work?
>>
>> I'm working with all the 1000-genomes data at the Sanger, and we at  
>> least are probably never going to use BioPerl for the work.
>
> Are you using pure perl or (gasp) something else?  ;>
>
> Judging by the feedback there is definitely a set of users who  
> would like to integrate nextgen into bioperl somehow, probably to  
> take advantage of other aspects of bioperl.
>
>>> A pure perl solution will be between 100 to 1000x faster... Would  
>>> it be possible to have an ultra-light quality object with few  
>>> simple methods for next-gen reads?
>>
>> The fastq parser itself already seems pretty fast. The way to get  
>> the speedup is to not create any Bio::Seq* objects but just return  
>> the data directly. At that point it's not taking much advantage of  
>> BioPerl. But certainly it could be done...
>
>
> I suppose the best way to assess what needs to be done is come up  
> with a set of 'use cases' specifying what users want so we can  
> design around them, otherwise we're shooting in the dark.
>
> I'm personally wondering if this could be done as a sequence  
> database, something similar in theme to Lincoln's SeqFeature::Store,  
> but sequence only, and returning quality objects in a similar manner  
> (a la Storable)?  Not sure whether that's feasible, but it appears  
> at least scalable.
>
> chris


From maj at fortinbras.us  Wed Jun 17 16:29:31 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 17 Jun 2009 16:29:31 -0400
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <540FFE96-A177-4A56-A574-50052569F39E@ucl.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk><4C3D793879C64A5E84C67FE313C86FA4@NewLife>
	<540FFE96-A177-4A56-A574-50052569F39E@ucl.ac.uk>
Message-ID: <1C89D353AD0B4D219515BF1EAAA1FFB5@NewLife>

Thanks Elia for those wiki notes--
[I would say you received an enthusiastic 'welcome back'!]
cheers, 
Mark
----- Original Message ----- 
From: "Elia Stupka" <e.stupka at ucl.ac.uk>
To: "Mark A. Jensen" <maj at fortinbras.us>
Cc: <bioperl-l at lists.open-bio.org>
Sent: Wednesday, June 17, 2009 3:37 PM
Subject: Re: [Bioperl-l] Next-gen modules


> Dear all,
> 
> I tried to summarize today's discussion with what seems to be the  
> "shaping consensus" on the Wiki page:
> 
> http://www.bioperl.org/wiki/Nextgen_in_Bioperl
> 
> good night,
> 
> Elia
> 
> 
> On 17 Jun 2009, at 13:19, Mark A. Jensen wrote:
> 
>> [ and here's a new wikipage: http://www.bioperl.org/wiki/Nextgen_in_Bioperl 
>>  ]
>> ----- Original Message ----- From: "Elia Stupka" <e.stupka at ucl.ac.uk>
>> To: <bioperl-l at lists.open-bio.org>
>> Sent: Wednesday, June 17, 2009 7:29 AM
>> Subject: [Bioperl-l] Next-gen modules
>>
>>
>>> Dear all,
>>> after several years of absence I am slowly coming back to Bioperl,
>>> and hope to contribute again to its development.
>>> One area that I was thinking of starting from, since we are
>>> actively involved with it, is to improve Bioperl's support for
>>> next-gen sequencing data, tools, etc. Since I am sure I have missed
>>> out on a lot of recent developments, do let me know if/what is useful.
>>> One example that comes to mind is that the conversion of various
>>> formats to/from FASTQ does not seem to be supported. Some code can
>>> be found within Li Heng's script:
>>> http://maq.sourceforge.net/fq_all2std.pl
>>> but it would be good if it could make its way into SeqIO? And
>>> similarly, potentially, for other next-gen sequence formats?
>>> Similarly, there seems to be little in bioperl-run to support
>>> tools that have been developed in this area, such as Maq, BowTie,
>>> TopHat, etc?
>>> Do let me know if there is a past thread on this, or other people
>>> actively developing, etc. so that I can find out what priorities are.
>>> thanks and best regards to all (old friends and new),
>>> Elia


From cjfields at illinois.edu  Wed Jun 17 16:35:38 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 17 Jun 2009 15:35:38 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <0500F8EB-F3BD-44E8-801A-7A45783DA40F@ucl.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>	<92C15E3391F64BAF801754E924122540@NewLife>
	<200906170927.13273.tristan.lefebure@gmail.com>
	<4A3933D0.4040808@sendu.me.uk>
	<8E010693-30B2-4E54-81CA-97755FC6CAE5@illinois.edu>
	<0500F8EB-F3BD-44E8-801A-7A45783DA40F@ucl.ac.uk>
Message-ID: <3C32922E-690B-4246-BD44-7600153E57C7@illinois.edu>

So, #1 priority is to get fastq up-to-speed, then maybe assess other  
options.

Illuminating discussion, thanks Elia!

urgh, excuse unintended bad pun above...

chris

On Jun 17, 2009, at 3:06 PM, Elia Stupka wrote:

> Interesting that you mention the database issue. We found that for  
> specific memory/CPU intensive things we also switch to using dbs.  
> For example, after many years of loyal use of disconnected_ranges we  
> switched to a simple SQL implementation of it, because of the large  
> performance gains it would give us.  Similarly in Ensembl as well as  
> in the old days of bioperl-db we opted for doing subseq within SQL  
> where possible.
>
> Some lean way of SQL'izing specific components could be less  
> "disruptive" than avoiding object creation and provide significant  
> gains in performance. Could be set as an optional flag, and could  
> use temporary ad hoc SQL databases?
>
> Still, priority now is to make SeqIO compliant with all those  
> formats, then we can worry about performance :)
>
> Elia
>
> On 17 Jun 2009, at 20:30, Chris Fields wrote:
>
>> On Jun 17, 2009, at 1:20 PM, Sendu Bala wrote:
>>
>>> Tristan Lefebure wrote:
>>>> Hello,
>>>> Regarding next-gen sequences and bioperl, following my  
>>>> experience, another issue is bioperl speed. For example, if you  
>>>> want to trim bad quality bases at ends of 1E6 Solexa reads using  
>>>> Bio::SeqIO::fastq and some methods in Bio::Seq::Quality, well,  
>>>> you've got to be patient (but may be I missed some shortcuts...).
>>>
>>> This is my concern as well. Or, rather, is there actually a  
>>> significant set of users out there who are dealing with next-gen  
>>> sequencing and would consider using BioPerl for their work?
>>>
>>> I'm working with all the 1000-genomes data at the Sanger, and we  
>>> at least are probably never going to use BioPerl for the work.
>>
>> Are you using pure perl or (gasp) something else?  ;>
>>
>> Judging by the feedback there are definitely a set of users who  
>> would like to integrate nextgen into bioperl somehow, probably to  
>> take advantage of other aspects of bioperl.
>>
>>>> A pure perl solution will be between 100 to 1000x faster... Would  
>>>> it be possible to have an ultra-light quality object with few  
>>>> simple methods for next-gen reads?
>>>
>>> The fastq parser itself already seems pretty fast. The way to get  
>>> the speedup is to not create any Bio::Seq* objects but just return  
>>> the data directly. At that point it's not taking much advantage of  
>>> BioPerl. But certainly it could be done...
>>
>>
>> I suppose the best way to assess what needs to be done is come up  
>> with a set of 'use cases' specifying what users want so we can  
>> design around them, otherwise we're shooting in the dark.
>>
>> I'm personally wondering if this could be done as a sequence  
>> database, something similar in theme to Lincoln's  
>> SeqFeature::Store, but sequence only, and returns quality objects  
>> in a similar manner (ala Storable)?  Not sure whether that's  
>> feasible, but it appears at least scalable.
>>
>> chris


From e.stupka at ucl.ac.uk  Wed Jun 17 16:36:31 2009
From: e.stupka at ucl.ac.uk (Elia Stupka)
Date: Wed, 17 Jun 2009 21:36:31 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <3C32922E-690B-4246-BD44-7600153E57C7@illinois.edu>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>	<92C15E3391F64BAF801754E924122540@NewLife>
	<200906170927.13273.tristan.lefebure@gmail.com>
	<4A3933D0.4040808@sendu.me.uk>
	<8E010693-30B2-4E54-81CA-97755FC6CAE5@illinois.edu>
	<0500F8EB-F3BD-44E8-801A-7A45783DA40F@ucl.ac.uk>
	<3C32922E-690B-4246-BD44-7600153E57C7@illinois.edu>
Message-ID: <69C726B6-41DC-41EC-9BC9-DFEC0267CD3B@ucl.ac.uk>

Better than colorspaced discussions for sure ;)

Elia

On 17 Jun 2009, at 21:35, Chris Fields wrote:

> So, #1 priority is to get fastq up-to-speed, then maybe assess other  
> options.
>
> Illuminating discussion, thanks Elia!
>
> urgh, excuse unintended bad pun above...
>
> chris
>
> On Jun 17, 2009, at 3:06 PM, Elia Stupka wrote:
>
>> Interesting that you mention the database issue. We found that for  
>> specific memory/CPU intensive things we also switch to using dbs.  
>> For example, after many years of loyal use of disconnected_ranges  
>> we switched to a simple SQL implementation of it, because of the  
>> large performance gains it would give us.  Similarly in Ensembl as  
>> well as in the old days of bioperl-db we opted for doing subseq  
>> within SQL where possible.
>>
>> Some lean way of SQL'izing specific components could be less  
>> "disruptive" than avoiding object creation and provide significant  
>> gains in performance. Could be set as an optional flag, and could  
>> use temporary ad hoc SQL databases?
>>
>> Still, priority now is to make SeqIO compliant with all those  
>> formats, then we can worry about performance :)
>>
>> Elia
>>
>> On 17 Jun 2009, at 20:30, Chris Fields wrote:
>>
>>> On Jun 17, 2009, at 1:20 PM, Sendu Bala wrote:
>>>
>>>> Tristan Lefebure wrote:
>>>>> Hello,
>>>>> Regarding next-gen sequences and bioperl, following my  
>>>>> experience, another issue is bioperl speed. For example, if you  
>>>>> want to trim bad quality bases at ends of 1E6 Solexa reads using  
>>>>> Bio::SeqIO::fastq and some methods in Bio::Seq::Quality, well,  
>>>>> you've got to be patient (but may be I missed some shortcuts...).
>>>>
>>>> This is my concern as well. Or, rather, is there actually a  
>>>> significant set of users out there who are dealing with next-gen  
>>>> sequencing and would consider using BioPerl for their work?
>>>>
>>>> I'm working with all the 1000-genomes data at the Sanger, and we  
>>>> at least are probably never going to use BioPerl for the work.
>>>
>>> Are you using pure perl or (gasp) something else?  ;>
>>>
>>> Judging by the feedback there are definitely a set of users who  
>>> would like to integrate nextgen into bioperl somehow, probably to  
>>> take advantage of other aspects of bioperl.
>>>
>>>>> A pure perl solution will be between 100 to 1000x faster...  
>>>>> Would it be possible to have an ultra-light quality object with  
>>>>> few simple methods for next-gen reads?
>>>>
>>>> The fastq parser itself already seems pretty fast. The way to get  
>>>> the speedup is to not create any Bio::Seq* objects but just  
>>>> return the data directly. At that point it's not taking much  
>>>> advantage of BioPerl. But certainly it could be done...
>>>
>>>
>>> I suppose the best way to assess what needs to be done is come up  
>>> with a set of 'use cases' specifying what users want so we can  
>>> design around them, otherwise we're shooting in the dark.
>>>
>>> I'm personally wondering if this could be done as a sequence  
>>> database, something similar in theme to Lincoln's  
>>> SeqFeature::Store, but sequence only, and returns quality objects  
>>> in a similar manner (ala Storable)?  Not sure whether that's  
>>> feasible, but it appears at least scalable.
>>>
>>> chris

---
Senior Lecturer, Bioinformatics
UCL Cancer Institute
Paul O' Gorman Building
University College London
Gower Street
WC1E 6BT
London
UK

Office (UCL): +44 207 679 6493
Office (ICMS): +44 0207 8822374

Mobile: +44 7597 566 194
Mobile (Italy): +39 338 8448801


From maj at fortinbras.us  Wed Jun 17 16:54:00 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 17 Jun 2009 16:54:00 -0400
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <3C32922E-690B-4246-BD44-7600153E57C7@illinois.edu>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>	<92C15E3391F64BAF801754E924122540@NewLife><200906170927.13273.tristan.lefebure@gmail.com><4A3933D0.4040808@sendu.me.uk><8E010693-30B2-4E54-81CA-97755FC6CAE5@illinois.edu><0500F8EB-F3BD-44E8-801A-7A45783DA40F@ucl.ac.uk>
	<3C32922E-690B-4246-BD44-7600153E57C7@illinois.edu>
Message-ID: <2B2A7A587B0F488DAA18E80A1BFD671B@NewLife>

unintended! Does that mean your delete key's broke...?
----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "Elia Stupka" <e.stupka at ucl.ac.uk>
Cc: <bioperl-l at lists.open-bio.org>; <tristan.lefebure at gmail.com>
Sent: Wednesday, June 17, 2009 4:35 PM
Subject: Re: [Bioperl-l] Next-gen modules


> So, #1 priority is to get fastq up-to-speed, then maybe assess other  
> options.
> 
> Illuminating discussion, thanks Elia!
> 
> urgh, excuse unintended bad pun above...
> 
> chris
> 
> On Jun 17, 2009, at 3:06 PM, Elia Stupka wrote:
> 
>> Interesting that you mention the database issue. We found that for  
>> specific memory/CPU intensive things we also switch to using dbs.  
>> For example, after many years of loyal use of disconnected_ranges we  
>> switched to a simple SQL implementation of it, because of the large  
>> performance gains it would give us.  Similarly in Ensembl as well as  
>> in the old days of bioperl-db we opted for doing subseq within SQL  
>> where possible.
>>
>> Some lean way of SQL'izing specific components could be less  
>> "disruptive" than avoiding object creation and provide significant  
>> gains in performance. Could be set as an optional flag, and could  
>> use temporary ad hoc SQL databases?
>>
>> Still, priority now is to make SeqIO compliant with all those  
>> formats, then we can worry about performance :)
>>
>> Elia
>>
>> On 17 Jun 2009, at 20:30, Chris Fields wrote:
>>
>>> On Jun 17, 2009, at 1:20 PM, Sendu Bala wrote:
>>>
>>>> Tristan Lefebure wrote:
>>>>> Hello,
>>>>> Regarding next-gen sequences and bioperl, following my  
>>>>> experience, another issue is bioperl speed. For example, if you  
>>>>> want to trim bad quality bases at ends of 1E6 Solexa reads using  
>>>>> Bio::SeqIO::fastq and some methods in Bio::Seq::Quality, well,  
>>>>> you've got to be patient (but may be I missed some shortcuts...).
>>>>
>>>> This is my concern as well. Or, rather, is there actually a  
>>>> significant set of users out there who are dealing with next-gen  
>>>> sequencing and would consider using BioPerl for their work?
>>>>
>>>> I'm working with all the 1000-genomes data at the Sanger, and we  
>>>> at least are probably never going to use BioPerl for the work.
>>>
>>> Are you using pure perl or (gasp) something else?  ;>
>>>
>>> Judging by the feedback there are definitely a set of users who  
>>> would like to integrate nextgen into bioperl somehow, probably to  
>>> take advantage of other aspects of bioperl.
>>>
>>>>> A pure perl solution will be between 100 to 1000x faster... Would  
>>>>> it be possible to have an ultra-light quality object with few  
>>>>> simple methods for next-gen reads?
>>>>
>>>> The fastq parser itself already seems pretty fast. The way to get  
>>>> the speedup is to not create any Bio::Seq* objects but just return  
>>>> the data directly. At that point it's not taking much advantage of  
>>>> BioPerl. But certainly it could be done...
>>>
>>>
>>> I suppose the best way to assess what needs to be done is come up  
>>> with a set of 'use cases' specifying what users want so we can  
>>> design around them, otherwise we're shooting in the dark.
>>>
>>> I'm personally wondering if this could be done as a sequence  
>>> database, something similar in theme to Lincoln's  
>>> SeqFeature::Store, but sequence only, and returns quality objects  
>>> in a similar manner (ala Storable)?  Not sure whether that's  
>>> feasible, but it appears at least scalable.
>>>
>>> chris


From hartzell at alerce.com  Wed Jun 17 16:40:03 2009
From: hartzell at alerce.com (George Hartzell)
Date: Wed, 17 Jun 2009 13:40:03 -0700
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <4A3933D0.4040808@sendu.me.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<92C15E3391F64BAF801754E924122540@NewLife>
	<200906170927.13273.tristan.lefebure@gmail.com>
	<4A3933D0.4040808@sendu.me.uk>
Message-ID: <19001.21667.127519.462899@already.dhcp.gene.com>

Sendu Bala writes:
 > Tristan Lefebure wrote:
 > > Hello,
 > > Regarding next-gen sequences and bioperl, following my 
 > > experience, another issue is bioperl speed. For example, if 
 > > you want to trim bad quality bases at ends of 1E6 Solexa 
 > > reads using Bio::SeqIO::fastq and some methods in 
 > > Bio::Seq::Quality, well, you've got to be patient (but may 
 > > be I missed some shortcuts...).
 > 
 > This is my concern as well. Or, rather, is there actually a significant 
 > set of users out there who are dealing with next-gen sequencing and 
 > would consider using BioPerl for their work?
 > 
 > I'm working with all the 1000-genomes data at the Sanger, and we at 
 > least are probably never going to use BioPerl for the work.
 > [...]

Is it purely a speed issue, or are there other issues (e.g. stability,
correctness, compatibility) that are contributing to your decision?

What *are* you using?

g.


From bix at sendu.me.uk  Wed Jun 17 18:10:57 2009
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 17 Jun 2009 23:10:57 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <8E010693-30B2-4E54-81CA-97755FC6CAE5@illinois.edu>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>	<92C15E3391F64BAF801754E924122540@NewLife>
	<200906170927.13273.tristan.lefebure@gmail.com>
	<4A3933D0.4040808@sendu.me.uk>
	<8E010693-30B2-4E54-81CA-97755FC6CAE5@illinois.edu>
Message-ID: <4A3969F1.8080002@sendu.me.uk>

Chris Fields wrote:
> On Jun 17, 2009, at 1:20 PM, Sendu Bala wrote:
> 
>> Tristan Lefebure wrote:
>>> Hello,
>>> Regarding next-gen sequences and bioperl, following my experience, 
>>> another issue is bioperl speed. For example, if you want to trim bad 
>>> quality bases at ends of 1E6 Solexa reads using Bio::SeqIO::fastq and 
>>> some methods in Bio::Seq::Quality, well, you've got to be patient 
>>> (but may be I missed some shortcuts...).
>>
>> This is my concern as well. Or, rather, is there actually a 
>> significant set of users out there who are dealing with next-gen 
>> sequencing and would consider using BioPerl for their work?
>>
>> I'm working with all the 1000-genomes data at the Sanger, and we at 
>> least are probably never going to use BioPerl for the work.
> 
> Are you using pure perl or (gasp) something else?  ;>

We use some perl stuff, some C stuff. My own stuff is OO perl, but much 
lighter weight than BioPerl. Absolute minimal object creation.


>>> A pure perl solution will be between 100 to 1000x faster... Would it 
>>> be possible to have an ultra-light quality object with few simple 
>>> methods for next-gen reads?
>>
>> The fastq parser itself already seems pretty fast. The way to get the 
>> speedup is to not create any Bio::Seq* objects but just return the 
>> data directly. At that point it's not taking much advantage of 
>> BioPerl. But certainly it could be done...
> 
> I suppose the best way to assess what needs to be done is come up with a 
> set of 'use cases' specifying what users want so we can design around 
> them, otherwise we're shooting in the dark.

Indeed. Though at least I think we can all agree it would be nice to 
have the functionality there even if it's slow. There will always be at 
least some use-cases where the run speed doesn't matter.


> I'm personally wondering if this could be done as a sequence database, 
> something similar in theme to Lincoln's SeqFeature::Store, but sequence 
> only, and returns quality objects in a similar manner (ala Storable)?  
> Not sure whether that's feasible, but it appears at least scalable.

I think not. Well, at least SeqFeature::Store doesn't scale. Try storing 
millions of features in a database and watch it crawl to complete 
unusability. I can't imagine a db scaling to holding hundreds of TB of 
data either. I'm also not sure what the benefit is. There are already 
high-speed ways of indexing your fastq or bam files.
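
For anyone wondering what "just return the data directly" might look 
like, here is a minimal pure-perl sketch. It is not BioPerl code and not 
what we actually run; the function name next_fastq_record and the 
example reads are made up for illustration, and it assumes unwrapped 
four-line FASTQ records.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read one four-line FASTQ record from a filehandle and return it as a
# plain list (id, sequence, quality string). Returns an empty list at EOF.
# Assumption: records are not line-wrapped (one sequence line, one
# quality line), which is typical for short-read files.
sub next_fastq_record {
    my ($fh) = @_;
    my $header = <$fh>;
    return unless defined $header;
    my $seq  = <$fh>;
    my $plus = <$fh>;
    my $qual = <$fh>;
    die "Truncated FASTQ record\n" unless defined $qual;
    chomp($header, $seq, $plus, $qual);
    die "Bad header line: $header\n" unless $header =~ /^@(\S+)/;
    my $id = $1;
    die "Missing '+' separator for $id\n" unless $plus =~ /^\+/;
    die "Sequence/quality length mismatch for $id\n"
        unless length($seq) == length($qual);
    return ($id, $seq, $qual);
}

# Usage on a small in-memory example (hypothetical reads).
my $data = "\@read1\nACGTACGT\n+\nIIIIHHGB\n\@read2\nTTGCA\n+\nIIIB#\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my @records;
while (my ($id, $seq, $qual) = next_fastq_record($fh)) {
    push @records, [$id, $seq, $qual];
    print join("\t", $id, length $seq), "\n";
}
close $fh;
```

No objects are created per read, only three scalars, which is where 
most of the saving would come from. The cost is that nothing downstream 
in BioPerl can consume the result without wrapping it again.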


From bix at sendu.me.uk  Wed Jun 17 18:24:50 2009
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 17 Jun 2009 23:24:50 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <19001.21667.127519.462899@already.dhcp.gene.com>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>	<92C15E3391F64BAF801754E924122540@NewLife>	<200906170927.13273.tristan.lefebure@gmail.com>	<4A3933D0.4040808@sendu.me.uk>
	<19001.21667.127519.462899@already.dhcp.gene.com>
Message-ID: <4A396D32.5070909@sendu.me.uk>

George Hartzell wrote:
> Sendu Bala writes:
>  > Tristan Lefebure wrote:
>  > > Hello,
>  > > Regarding next-gen sequences and bioperl, following my 
>  > > experience, another issue is bioperl speed. For example, if 
>  > > you want to trim bad quality bases at ends of 1E6 Solexa 
>  > > reads using Bio::SeqIO::fastq and some methods in 
>  > > Bio::Seq::Quality, well, you've got to be patient (but may 
>  > > be I missed some shortcuts...).
>  > 
>  > This is my concern as well. Or, rather, is there actually a significant 
>  > set of users out there who are dealing with next-gen sequencing and 
>  > would consider using BioPerl for their work?
>  > 
>  > I'm working with all the 1000-genomes data at the Sanger, and we at 
>  > least are probably never going to use BioPerl for the work.
>  > [...]
> 
> Is it purely a speed issue, or are there other issues (e.g. stability,
> correctness, compatibility) that are contributing to your decision?

Too heavy-weight, too slow, too memory intensive, missing too much 
functionality in any case. If I have to write new parsers and wrappers, 
I may as well make them fast (which means they don't "fit" into BioPerl).


> What *are* you using?

There are already great tools written in C that do all the heavy lifting 
and the rest is done in perl written for speed and low memory.
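
To give a flavour of the "perl written for speed and low memory" part, 
here is a rough sketch of the end-trimming case Tristan mentioned, 
working on plain strings. The function name trim_read_ends, the default 
threshold of 20 and the default ASCII offset of 33 are assumptions for 
the example, not anything taken from BioPerl or from our pipeline; older 
Illumina files would need an offset of 64.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Trim bases whose Phred score is below $min_q from both ends of a read.
# $offset is the ASCII offset of the quality encoding (33 for Sanger-style
# FASTQ, 64 for older Illumina files). Both defaults are assumptions.
sub trim_read_ends {
    my ($seq, $qual, $min_q, $offset) = @_;
    $min_q  = 20 unless defined $min_q;
    $offset = 33 unless defined $offset;

    # Decode the quality string into one numeric score per base.
    my @q = map { ord($_) - $offset } split //, $qual;

    # Walk inwards from each end until a good base is found.
    my ($start, $end) = (0, $#q);
    $start++ while $start <= $end && $q[$start] < $min_q;
    $end--   while $end >= $start && $q[$end]   < $min_q;

    my $len = $end - $start + 1;    # 0 when every base is below threshold
    return (substr($seq, $start, $len), substr($qual, $start, $len));
}

# Usage with a made-up read: two bad bases at the 5' end, two at the 3' end.
my ($seq, $qual) = trim_read_ends('ACGTACGTAC', '##IIIIII#!', 20);
print "$seq\t$qual\n";
```

Only the low-quality ends are removed; a bad base in the middle of the 
read is left alone. A read with no base above the threshold comes back 
as two empty strings, so callers should check the length before writing 
it out.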


From cjfields at illinois.edu  Wed Jun 17 18:38:26 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 17 Jun 2009 17:38:26 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <4A3969F1.8080002@sendu.me.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>	<92C15E3391F64BAF801754E924122540@NewLife>
	<200906170927.13273.tristan.lefebure@gmail.com>
	<4A3933D0.4040808@sendu.me.uk>
	<8E010693-30B2-4E54-81CA-97755FC6CAE5@illinois.edu>
	<4A3969F1.8080002@sendu.me.uk>
Message-ID: <550FACEA-FE90-4160-AA44-F2706C1F4CB9@illinois.edu>

On Jun 17, 2009, at 5:10 PM, Sendu Bala wrote:

> Chris Fields wrote:
>> On Jun 17, 2009, at 1:20 PM, Sendu Bala wrote:
>>> Tristan Lefebure wrote:
>>>> Hello,
>>>> Regarding next-gen sequences and bioperl, following my  
>>>> experience, another issue is bioperl speed. For example, if you  
>>>> want to trim bad quality bases at ends of 1E6 Solexa reads using  
>>>> Bio::SeqIO::fastq and some methods in Bio::Seq::Quality, well,  
>>>> you've got to be patient (but may be I missed some shortcuts...).
>>>
>>> This is my concern as well. Or, rather, is there actually a  
>>> significant set of users out there who are dealing with next-gen  
>>> sequencing and would consider using BioPerl for their work?
>>>
>>> I'm working with all the 1000-genomes data at the Sanger, and we  
>>> at least are probably never going to use BioPerl for the work.
>> Are you using pure perl or (gasp) something else?  ;>
>
> We use some perl stuff, some C stuff. My own stuff is OO perl, but  
> much lighter weight than BioPerl. Absolute minimal object creation.

Makes sense.

>>>> A pure perl solution will be between 100 to 1000x faster... Would  
>>>> it be possible to have an ultra-light quality object with few  
>>>> simple methods for next-gen reads?
>>>
>>> The fastq parser itself already seems pretty fast. The way to get  
>>> the speedup is to not create any Bio::Seq* objects but just return  
>>> the data directly. At that point it's not taking much advantage of  
>>> BioPerl. But certainly it could be done...
>> I suppose the best way to assess what needs to be done is come up  
>> with a set of 'use cases' specifying what users want so we can  
>> design around them, otherwise we're shooting in the dark.
>
> Indeed. Though at least I think we can all agree it would be nice to  
> have the functionality there even if it's slow. There will always be  
> at least some use-cases where the run speed doesn't matter.

Agreed.

>> I'm personally wondering if this could be done as a sequence  
>> database, something similar in theme to Lincoln's  
>> SeqFeature::Store, but sequence only, and returns quality objects  
>> in a similar manner (ala Storable)?  Not sure whether that's  
>> feasible, but it appears at least scalable.
>
> I think not. Well, at least SeqFeature::Store doesn't scale. Try  
> storing millions of features in a database and watch it crawl to  
> complete unusability. I can't imagine a db scaling to holding  
> hundreds of TB of data either. I'm also not sure what the benefit  
> is. There are already high-speed ways of indexing your fastq or bam  
> files.

Interesting that you ran into issues with SF::Store; wonder if object  
storage is the limiting factor there, or if it is something else.  
Anyone else having this issue?

chris


From cjfields at illinois.edu  Wed Jun 17 21:08:55 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 17 Jun 2009 20:08:55 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <4A396D32.5070909@sendu.me.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>	<92C15E3391F64BAF801754E924122540@NewLife>	<200906170927.13273.tristan.lefebure@gmail.com>	<4A3933D0.4040808@sendu.me.uk>
	<19001.21667.127519.462899@already.dhcp.gene.com>
	<4A396D32.5070909@sendu.me.uk>
Message-ID: <03A96F40-27CD-4D38-9A4A-04AB4CECC8DE@illinois.edu>

On Jun 17, 2009, at 5:24 PM, Sendu Bala wrote:

> George Hartzell wrote:
>> Sendu Bala writes:
>> > Tristan Lefebure wrote:
>> > > Hello,
>> > > Regarding next-gen sequences and bioperl, following my
>> > > experience, another issue is bioperl speed. For example, if
>> > > you want to trim bad quality bases at ends of 1E6 Solexa
>> > > reads using Bio::SeqIO::fastq and some methods in
>> > > Bio::Seq::Quality, well, you've got to be patient (but may
>> > > be I missed some shortcuts...).
>> >
>> > This is my concern as well. Or, rather, is there actually a significant
>> > set of users out there who are dealing with next-gen sequencing and
>> > would consider using BioPerl for their work?
>> >
>> > I'm working with all the 1000-genomes data at the Sanger, and we at
>> > least are probably never going to use BioPerl for the work.
>> > [...]
>> Is it purely a speed issue, or are there other issues (e.g. stability,
>> correctness, compatibility) that are contributing to your decision?
>
> Too heavy-weight, too slow, too memory intensive, missing too much  
> functionality in any case. If I have to write new parsers and  
> wrappers, I may as well make them fast (which means they don't "fit"  
> into BioPerl).

That's (unfortunately) true.  It may be easy to whip up something that  
works, but it probably won't be fast.

>> What *are* you using?
>
> There are already great tools written in C that do all the heavy  
> lifting and the rest is done in perl written for speed and low memory.

Like this one?

http://www.sanger.ac.uk/Users/lh3/parsefastq.shtml

I suppose if one were inclined, this could be wrapped with SWIG in  
BioLib, but would it be worth it (maybe beyond grabbing the file  
indices)?

chris


From jbarrick at msu.edu  Wed Jun 17 23:10:43 2009
From: jbarrick at msu.edu (Jeffrey Barrick)
Date: Wed, 17 Jun 2009 23:10:43 -0400
Subject: [Bioperl-l] svn error
Message-ID: <7C1A481F-275E-4E08-AA1B-036BC708D5E1@msu.edu>

Hi all,

I've been trying to download the latest version of "bioperl-live"  
through svn as per the instructions at  
[http://www.bioperl.org/wiki/Using_Subversion] and I keep getting an  
"svn: Found malformed header in revision file" error when it gets to  
"bioperl-live/t/RemoteDB/EMBL.t", causing it to stop prematurely.

I also get the error when trying to browse that directory, for example:
http://code.open-bio.org/svnweb/index.cgi/bioperl/browse/bioperl-live/trunk/t/RemoteDB

Any ideas?

Thanks,
   --Jeff


From hlapp at gmx.net  Wed Jun 17 21:51:16 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 17 Jun 2009 20:51:16 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <0500F8EB-F3BD-44E8-801A-7A45783DA40F@ucl.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>	<92C15E3391F64BAF801754E924122540@NewLife>
	<200906170927.13273.tristan.lefebure@gmail.com>
	<4A3933D0.4040808@sendu.me.uk>
	<8E010693-30B2-4E54-81CA-97755FC6CAE5@illinois.edu>
	<0500F8EB-F3BD-44E8-801A-7A45783DA40F@ucl.ac.uk>
Message-ID: <C8873056-793B-4FEE-94EE-3341087478D1@gmx.net>


On Jun 17, 2009, at 3:06 PM, Elia Stupka wrote:

> Similarly in Ensembl as well as in the old days of bioperl-db we  
> opted for doing subseq within SQL where possible.


BTW Bioperl-db still lazy-loads sequences, and does subseq in SQL,  
unless you manipulate the sequence, or make it a non-persistent object.

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From bix at sendu.me.uk  Thu Jun 18 02:45:17 2009
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 18 Jun 2009 07:45:17 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <550FACEA-FE90-4160-AA44-F2706C1F4CB9@illinois.edu>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>	<92C15E3391F64BAF801754E924122540@NewLife>
	<200906170927.13273.tristan.lefebure@gmail.com>
	<4A3933D0.4040808@sendu.me.uk>
	<8E010693-30B2-4E54-81CA-97755FC6CAE5@illinois.edu>
	<4A3969F1.8080002@sendu.me.uk>
	<550FACEA-FE90-4160-AA44-F2706C1F4CB9@illinois.edu>
Message-ID: <4A39E27D.9040807@sendu.me.uk>

Chris Fields wrote:
> On Jun 17, 2009, at 5:10 PM, Sendu Bala wrote:
 >
>>> I'm personally wondering if this could be done as a sequence 
>>> database, something similar in theme to Lincoln's SeqFeature::Store, 
>>> but sequence only, and returns quality objects in a similar manner 
>>> (ala Storable)?  Not sure whether that's feasible, but it appears 
>>> at least scalable.
>>
>> I think not. Well, at least SeqFeature::Store doesn't scale. Try 
>> storing millions of features in a database and watch it crawl to 
>> complete unusability. I can't imagine a db scaling to holding hundreds 
>> of TB of data either. I'm also not sure what the benefit is. There are 
>> already high-speed ways of indexing your fastq or bam files.
> 
> Interesting that you ran into issues with SF::Store; wonder if object 
> storage is the limiting factor there, or if it is something else.

Object storage certainly was an issue, which is why I patched it to 
(optionally) not store objects. That helped a great deal, but ultimately 
only increased the number of features you could store before it slowed 
down; it didn't solve the problem completely.


From Xianjun.Dong at bccs.uib.no  Thu Jun 18 06:15:47 2009
From: Xianjun.Dong at bccs.uib.no (Xianjun Dong)
Date: Thu, 18 Jun 2009 12:15:47 +0200
Subject: [Bioperl-l] background layer is not supported in Bioperl 1.6
 for Bio::Graphics::Glyph
In-Reply-To: <4A33D850.1020203@ii.uib.no>
References: <4A32BCDA.4080605@ii.uib.no>	<536f21b00906121829o7a5076f6hc57eb9a4dd815e48@mail.gmail.com>	<4A339621.2060702@ii.uib.no>
	<4536f7700906130627r41c90de5r311e58f1d718f9ee@mail.gmail.com>
	<4A33D850.1020203@ii.uib.no>
Message-ID: <4A3A13D3.7050208@ii.uib.no>

Hi, Scott,

Would you mind having a look at the code (below my signature) to check 
whether I am using the -postgrid callback correctly?
I still cannot get the background drawn for the whole panel.

Thanks

Xianjun


Xianjun Dong wrote:
> Hi, Scott
>
> Before I give up on my own solution and switch to GBrowse, I would like 
> to bother you once more:
>
> As you suggested, I added the -postgrid option when creating the panel; 
> it calls a function to draw the background. The code below is almost 
> copied from the online POD of Bio::Graphics::Panel (see 
> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Graphics/Panel.html 
> )
>
> But it still does not work. Could you have a look? I have pasted it 
> below. (BTW, on the POD page above, the example uses 
> -postgrid=>\&draw_gap, while the gap-drawing function is named gap_it, 
> not draw_gap. I guess that's a typo?)
>
> Thanks
>
> Xianjun
>
> ----------------------------------------------- mytestcode.pl 
> --------------------------
>
> #!/usr/bin/perl
>
> use strict;
> use lib "$ENV{HOME}/lib";
>
> use Bio::Graphics;
> use Bio::Graphics::Feature;
> my $ftr= 'Bio::Graphics::Feature';
>
> # processed_transcript
> my $trans1 = 
> $ftr->new(-start=>50,-end=>10,-name=>'ZK154.1',-type=>"3'-UTR");
> my $trans2 = 
> $ftr->new(-start=>100,-end=>50,-name=>'ZK154.2',-type=>'CDS');
> my $trans3 = 
> $ftr->new(-start=>350,-end=>225,-name=>'ZK154.3',-type=>'CDS', 
> -source=>'a');
> my $trans4 = 
> $ftr->new(-start=>650,-end=>500,-name=>'ZK154.3',-type=>'CDS', 
> -source=>'a');
> my $trans5 = 
> $ftr->new(-start=>700,-end=>650,-name=>'ZK154.3',-type=>"5'-UTR");
> my $trans  = 
> $ftr->new(-segments=>[$trans1,$trans2,$trans3,$trans4,$trans5]);
>
> # hightlight
> my $trans31 = 
> $ftr->new(-start=>240,-end=>450,-name=>'hightlight1',-type=>'background', 
> -source=>'a');
> my $trans41 = 
> $ftr->new(-start=>650,-end=>600,-name=>'hightlight2',-type=>'multihourglass', 
> -source=>'b');
>
> my $panel= Bio::Graphics::Panel->new(-width=>1200,
>                                             -length=>1050,
>                                             -start =>0,
>                                             -pad_left=>12,
>                                             -pad_right=>12
>                                             -postgrid=>\&gap_it);
>
> sub gap_it {
>     my $gd    = shift;
>     my $panel = shift;
>     my ($gap_start,$gap_end) = $panel->location2pixel(500,600);
>     my $top                  = $panel->top;
>     my $bottom               = $gd->height, #panel->bottom;
>     my $gray                 = $panel->translate_color('red');
>     $gd->filledRectangle($gap_start,$top,$gap_end,$bottom,$gray);
> }
> # the following track works as I expected in bioperl 1.2.3, but not in 
> # 1.5 and 1.6
> #$panel->add_track([$trans41,$trans31],
> #          -glyph   => 'background',
> #                  -block_bgcolor => sub{return (shift->source eq 
> #                      'a')?'#cccccc':'#fffc22'},
> #                  );
>
> $panel->add_track($ftr->new(-start=>100,-end=>1000),
>                  -glyph=>'arrow',
>                  -double=>1,
>                  -tick=>2);
>
> $panel->add_track($trans,
>          -glyph   => 'transcript2', # 'transcript2', #process_5utr',
>                  -fgcolor => 'darkred',
>                  -bgcolor => 'darkred',
>                  -title => '$source',
>                  -link => 
> 'http://www.ensembl.org/Homo_sapiens/transview?transcript=$name',  
> #EnsEMBL
>                  );
>   print $panel->png;
>
> # the following part works in bioperl 1.5 and 1.6, but does not work in 
> # Bioperl 1.2.3
> my $map = $panel->create_web_map("image");
> $panel->finished();
>
> Scott Cain wrote:
>> Hi Xianjun,
>>
>> I understand what you want to do, as the current version of gbrowse
>> does this, which uses bioperl 1.6.  Without digging through the code,
>> I can't tell you exactly how this works and you didn't send your code
>> that uses this callback, so I can't try it either.
>>
>> One thing that is different between your code and gbrowse is that each
>> of the tracks is actually a separate panel (to allow track dragging),
>> so it is possible that this sort of callback doesn't work for
>> Bio::Graphics any more.
>>
>> Scott
>>
>> On Saturday, June 13, 2009, Xianjun Dong <Xianjun.Dong at bccs.uib.no> 
>> wrote:
>>  
>>> Hi, Scott
>>>
>>> Thanks for your reply.
>>>
>>> I still have a question. I dug out the code from GBrowse (pasted 
>>> below). The method make_postgrid_callback collects all highlight 
>>> regions and then uses the hilite_regions_closure function to draw 
>>> them, using the following GD call:
>>>
>>> $gd->filledRectangle($left+$start,0,$left+$end,$bottom,
>>>                           $panel->translate_color($h_color));
>>>
>>> where the $bottom=$panel->bottom. This is the only difference from 
>>> my code, where I use $gd->height. I guess they are almost same 
>>> (except the pad_bottom), we can see this in the code of 
>>> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Graphics/Panel.html#CODE22 
>>>
>>>
>>> OK. Anyway, I change to use $panel->bottom, instead of $gd->height, 
>>> for my highlight regions. The output is same, when using the library 
>>> of Bioperl 1.6 (or 1.5). You can see the attached image 
>>> ("test.bioperl1.6.png")
>>>
>>> OK. I might have not explained my question explicitly. My question 
>>> is: if using bioperl 1.2.3 (actually the Bio::Graphics in bioperl 
>>> 1.2.3), I can get the right image I want (see the attached file 
>>> "test.bioperl1.2.3.png"), where the highlight range will go from the 
>>> roof to the floor. While in bioperl 1.5 (or 1.6), I only can see the 
>>> highlight region in its own track, not the whole panel. OK, did I 
>>> explain clearly now? you can see the difference of the two images.
>>>
>>> [I am not sure the mailist allow to attach image, otherwise, I put 
>>> them in the following links:
>>> test.bioperl1.6.png:    http://translog.genereg.net/test.bioperl1.6.png
>>> test.bioperl1.2.3.png:    
>>> http://translog.genereg.net/test.bioperl1.2.3.png ]
>>>
>>> You can test it and see the difference if you have both 1.2.3 and 
>>> 1.6 on your computer?
>>>
>>> Really want to know how this works in bioperl 1.2.3 (Even though 
>>> this might be a bug at that version, or whatever)
>>>
>>> Thanks
>>>
>>> Xianjun
>>> =============================================
>>>
>>> # this generates the callback for highlighting a region
>>> sub make_postgrid_callback {
>>>  my $settings = shift;
>>>  return unless ref $settings->{h_region};
>>>
>>>  my @h_regions = map {
>>>    my ($h_ref,$h_start,$h_end,$h_color) = 
>>> /^(.+):(\d+)\.\.(\d+)(?:@(\S+))?/;
>>>    defined($h_ref) && $h_ref eq $settings->{ref}
>>>                 ? [$h_start,$h_end,$h_color||'lightgrey']
>>>                 : ()
>>>  }
>>>    @{$settings->{h_region}};
>>>
>>>  return unless @h_regions;
>>>  return hilite_regions_closure(@h_regions);
>>> }
>>>
>>> # this subroutine generates a Bio::Graphics::Panel callback closure
>>> # suitable for hilighting a region of a panel.
>>> # The args are a list of [start,end,color]
>>> sub hilite_regions_closure {
>>>  my @h_regions = @_;
>>>
>>>  return sub {
>>>    my $gd     = shift;
>>>    my $panel  = shift;
>>>    my $left   = $panel->pad_left;
>>>    my $top    = $panel->top;
>>>    my $bottom = $panel->bottom;
>>>    for my $r (@h_regions) {
>>>      my ($h_start,$h_end,$h_color) = @$r;
>>>      my ($start,$end) = $panel->location2pixel($h_start,$h_end);
>>>      # so that we always see something
>>>      if ($end-$start <= 1) { $end++; $start-- }
>>>      # assuming top is 0 so as to ignore top padding
>>>      $gd->filledRectangle($left+$start,0,$left+$end,$bottom,
>>>                           $panel->translate_color($h_color));
>>>    }
>>>  };
>>> }
>>>
>>>
>>> Scott Cain wrote:
>>>
>>> Hello Xianjun,
>>>
>>> I don't think that approach will work.  What you almost certainly need
>>> to do is a postgrid callback that does the drawing of the highlighted
>>> region.  For example code of how to do this, take a look at the
>>> make_postgrid_callback subroutine in GBrowse 1.69.  -postgrid is an
>>> option passed to the Bio::Graphics::Panel constructor.
>>>
>>> Scott
>>>
>>>
>>>
>>>
>>> On Fri, Jun 12, 2009 at 4:38 PM, Xianjun 
>>> Dong<Xianjun.Dong at bccs.uib.no> wrote:
>>>
>>>
>>> Hi,
>>>
>>> I am not sure whether this is the right place to get help.
>>>
>>> I have been struggling with a problem for several days: I want to 
>>> highlight some regions in my track using a different background 
>>> color. To do that, I defined a glyph named "background", based on the
>>> 'Bio::Graphics::Glyph::generic' module. I override the draw_component()
>>> method by adding code like that below:
>>>
>>> $gd->filledRectangle($left,0,$right,$gd->height,
>>> $self->factory->translate_color($color));
>>>
>>> # the script is pasted at the end
>>>
>>> This will draw a rectangle with top=0, bottom=$gd->height. I made the
>>> highlight regions into a list of features, and add_track with
>>> -glyph=>'background'. (see the following script, test.pl) This 
>>> really works
>>> as I expect, which will add a colored block at background of all 
>>> tracks in a
>>> panel (including the ruler arrow). You can see the output image in 
>>> attached
>>> file "test.bioperl1.2.3.png"
>>>
>>> Now the problem: when I switch to Bioperl 1.5 (or 1.6), it does not 
>>> work as before. The highlight is still drawn, but it shrinks to a 
>>> low height instead of covering all tracks in the panel. I also 
>>> attached the output here, see the file "test.bioperl1.6.png".
>>>
>>> I tried to think about the reason, the 'background' module is based 
>>> on the
>>> generic module. What can cause the difference? Is it because 
>>> $gd->height is
>>> different, or the tracks followed with 'background' track can not 
>>> draw from
>>> the first position?
>>>
>>> Well. I can stick to use Bioperl 1.2.3 to avoid the problem. ("Smart 
>>> person
>>> solve problem, wise person avoid problem"...) But another problem is 
>>> coming:
>>> Bio::Graphics in Bioperl 1.2.3 does not support 
>>> $panel->create_web_map()
>>> function, which means I have to use some higher version if I want to 
>>> create
>>> web map for my graphics, but then I have to give up using highlight
>>> background.
>>>
>>> OK. It's long enough for my first-time submission here. Hope someone 
>>> can
>>> throw me some clue.
>>>
>>> Thanks ahead!!
>>>
>>> Xianjun
>>>
>>>
>>> ==================== test.pl =======================
>>> #!/usr/bin/perl
>>>
>>> use strict;
>>> use lib "$ENV{HOME}/lib";
>>>
>>> use Bio::Graphics;
>>> use Bio::Graphics::Feature;
>>> my $ftr= 'Bio::Graphics::Feature';
>>>
>>> # processed_transcript
>>> my $trans1 =
>>> $ftr->new(-start=>50,-end=>10,-name=>'ZK154.1',-type=>"3'-UTR");
>>> my $trans2 = 
>>> $ftr->new(-start=>100,-end=>50,-name=>'ZK154.2',-type=>'CDS');
>>> my $trans3 = 
>>> $ftr->new(-start=>350,-end=>225,-name=>'ZK154.3',-type=>'CDS',
>>> -source=>'a');
>>> my $trans4 = 
>>> $ftr->new(-start=>650,-end=>500,-name=>'ZK154.3',-type=>'CDS',
>>> -source=>'a');
>>> my $trans5 =
>>> $ftr->new(-start=>700,-end=>650,-name=>'ZK154.3',-type=>"5'-UTR");
>>> my $trans  =
>>> $ftr->new(-segments=>[$trans1,$trans2,$trans3,$trans4,$trans5]);
>>>
>>> # hightlight
>>> my $trans31 =
>>> $ftr->new(-start=>240,-end=>450,-name=>'hightlight1',-type=>'background', 
>>>
>>> -source=>'a');
>>> my $trans41 =
>>> $ftr->new(-start=>650,-end=>600,-name=>'hightlight2',-type=>'multihourglass', 
>>>
>>> -source=>'b');
>>>
>>> my $panel= Bio::Graphics::Panel->new(-width=>1200,
>>>                                            -length=>1050,
>>>                                            -start =>0,
>>>                                            -pad_left=>12,
>>>                                            -pad_right=>12);
>>>
>>> # the following track works as I expected in bioperl 1.2.3, but not 
>>> # in 1.5 and 1.6
>>> $panel->add_track([$trans41,$trans31],
>>>         -glyph   => 'background',
>>>                 -block_bgcolor => sub{return (shift->source eq
>>> 'a')?'#cccccc':'#fffc22'},
>>>                 );
>>>
>>> $panel->add_track($ftr->new(-start=>100,-end=>1000),
>>>                 -glyph=>'arrow',
>>>                 -double=>1,
>>>                 -tick=>2);
>>>
>>> $panel->add_track($trans,
>>>         -glyph   => 'transcript2', # 'transcript2', #process_5utr',
>>>                 -fgcolor => 'darkred',
>>>                 -bgcolor => 'darkred',
>>>                 -title => '$source',
>>>                 -link =>
>>> 'http://www.ensembl.org/Homo_sapiens/transview?transcript=$name',  
>>> #EnsEMBL
>>>                 );
>>>  print $panel->png;
>>>
>>> # the following part works in bioperl 1.5 and 1.6, but does not work in 
>>> # Bioperl 1.2.3
>>> my $map = $panel->create_web_map("image");
>>> $panel->finished();
>>>
>>> 1;
>>>
>>> ==================== background.pm =======================
>>> package Bio::Graphics::Glyph::background;
>>>
>>> use strict;
>>> use base 'Bio::Graphics::Glyph::generic';
>>> sub pad_top{
>>>  return 0;
>>> }
>>>
>>> sub draw_component {
>>>  my $self = shift;
>>>  #$self->SUPER::draw_component(@_);
>>>  my ($gd,$dx,$dy) = @_;
>>>  my ($left,$top,$right,$bottom) = $self->bounds($dx,$dy);
>>>
>>>  # draw an arrow to indicate the direction of transcript
>>>  my $color = $self->option('block_bgcolor') || '#cccccc';
>>>  $gd->filledRectangle($left,0,$right,$gd->height,
>>> $self->factory->translate_color($color));
>>> }
>>>
>>> 1;
>>>
>>> -- 
>>> ==========================================
>>> Xianjun Dong
>>> PhD student, Lenhard group
>>> Computational Biology Unit
>>> Bergen Center for Computational Science
>>> University of Bergen
>>> Hoyteknologisenteret, Thormohlensgate 55
>>> N-5008 Bergen, Norway
>>> E-mail: xianjun.dong at bccs.uib.no
>>> Tel.: +47 555 84022
>>> Fax : +47 555 84295
>>> ==========================================

-- 
==========================================
Xianjun Dong
PhD student, Lenhard group
Computational Biology Unit
Bergen Center for Computational Science
University of Bergen
Hoyteknologisenteret, Thormohlensgate 55
N-5008 Bergen, Norway
E-mail: xianjun.dong at bccs.uib.no
Tel.: +47 555 84022
Fax : +47 555 84295
==========================================


From charles.tilford at bms.com  Thu Jun 18 09:38:34 2009
From: charles.tilford at bms.com (Charles Tilford)
Date: Thu, 18 Jun 2009 09:38:34 -0400
Subject: [Bioperl-l] Bio::SeqIO::scf traces scrambled?
Message-ID: <4A3A435A.8000505@bms.com>

Nutshell: Bio::SeqIO::scf seems to be mixing up A/C/G/T trace channels. 
Can anyone confirm?

Hi all,

I'm using the SCF Bio::SeqIO module to parse trace data out of 
chromatograms. The SCF files are being produced by phred using the "-cd" 
parameter. The traces come out great, and the corresponding base calls 
from the .phd files align with the peaks wonderfully when I visualize 
them on a rendered trace. However, only the A bases align to the 
appropriate trace channel, the rest are mixed up. I find that if I do 
the following re-mapping, the phred base calls match the

SeqIO : Remapped
A : A
C : G
G : T
T : C
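
As a stopgap, the remapping can be applied after parsing. The sketch 
below assumes the four traces have already been pulled out of the SeqIO 
object into a plain hash keyed by the base label SeqIO assigns, and that 
the table reads as "the channel SeqIO labels C is really G", and so on. 
The hash layout and the remap_channels name are mine for illustration; 
they are not part of Bio::SeqIO::scf.

```perl
use strict;
use warnings;

# Empirical mapping from the table above: SeqIO label => corrected label.
my %REMAP = (A => 'A', C => 'G', G => 'T', T => 'C');

# Hypothetical helper: takes { label => [intensities] } as labelled by
# SeqIO and returns a new hash keyed by the corrected base labels.
sub remap_channels {
    my ($traces) = @_;
    my %fixed;
    for my $label (keys %$traces) {
        my $target = $REMAP{$label}
            or die "unexpected channel label '$label'";
        $fixed{$target} = $traces->{$label};
    }
    return \%fixed;
}

# Toy data, one made-up intensity per channel.
my $fixed = remap_channels({ A => [1], C => [2], G => [3], T => [4] });
print join(' ', map { "$_=$fixed->{$_}[0]" } sort keys %$fixed), "\n";
```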

The relevant part of Bio::SeqIO::scf is here:

http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/SeqIO/scf.html#CODE9

... which indicates that it expects the pack()ed trace data to be in 
order ATGC. The base call parsing code is here:

http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/SeqIO/scf.html#CODE8

... which is unpacking in order ACGT. As far as I can tell, the relevant 
official SCF documentation is here:

http://staden.sourceforge.net/manual/formats_unix_4.html

... which indicates that both trace and base order should be ACGT 
(matching the SeqIO unpack() for bases, but not traces). My empirical 
channel unscrambling mapping implies order ACTG, which is different from 
either of the two orders above. The sequence from the SCF file (should 
be that from original AB1 file, I think) is not perfectly identical to 
that called by phred, but is very similar (to be expected); that is, I 
don't need to remap C, G and T to get it to align with the phred data.

So it looks like the SeqIO module is not mapping the sections of the 
packed trace data to the appropriate bases. The unpack order is 
different than the staden documentation ... but so is the order I impose 
to correct the problem. I am still unclear as to the differences between 
V2 and V3 of the format. The major difference appears to be coding the 
trace absolutely (V2) or relatively to prior values (V3); I'd expect if 
I was using one format and SeqIO was trying to parse the other that I 
would get garbage out. Running in verbose reports "scf.pm is working 
with a version 2 scf."
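
To make the V2/V3 distinction concrete, here is a toy round trip. It 
assumes the "delta of delta" scheme as I read it in the Staden format 
notes (each stored value is a second-order difference, wrapped at the 
sample width), and it assumes 16-bit samples; both points should be 
checked against the documentation linked above. The subroutine names 
are mine and none of this is taken from scf.pm.

```perl
use strict;
use warnings;

# One pass of cumulative summation, wrapping at 16 bits (assumed width).
sub cumulative_sum {
    my @in = @_;
    my ($prev, @out) = (0);
    for my $v (@in) {
        $prev = ($prev + $v) & 0xFFFF;
        push @out, $prev;
    }
    return @out;
}

# Undo a delta-of-delta encoding by summing twice.
sub undelta_delta {
    return cumulative_sum(cumulative_sum(@_));
}

# One pass of differencing, used here only to build toy encoded data.
sub difference {
    my @in = @_;
    my ($prev, @out) = (0);
    for my $v (@in) {
        push @out, ($v - $prev) & 0xFFFF;
        $prev = $v;
    }
    return @out;
}

my @trace   = (0, 5, 20, 60, 40, 10);          # made-up absolute samples
my @encoded = difference(difference(@trace));  # what a V3-style file would hold
my @decoded = undelta_delta(@encoded);
print "@decoded\n";
```

Reading @encoded as if it were absolute values gives numbers that bear 
no resemblance to @trace, which is why I would expect obvious garbage 
rather than merely swapped channels if the versions were confused.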

Thoughts on this would be appreciated - can anyone confirm a problem 
with trace extraction from SCF?

I'm hoping that once I convince our admin to (properly) install 
staden::read that I can work directly with the ab1 files, but I need to 
stopgap on SCF for the time being....

-CAT


From cjfields at illinois.edu  Thu Jun 18 11:31:08 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 18 Jun 2009 10:31:08 -0500
Subject: [Bioperl-l] Bio::SeqIO::scf traces scrambled?
In-Reply-To: <4A3A435A.8000505@bms.com>
References: <4A3A435A.8000505@bms.com>
Message-ID: <49F38F2D-4C57-4309-BB2C-3ED53E1ED9B5@illinois.edu>

Charles,

The best way to make sure this is addressed is to file a ticket (bug  
report) on it so we can properly track it.  I have a local  
installation of io_lib and I believe we also have Geneious installed  
locally (both of which read SCF), so I can work on confirming that.   
If it stays on the list it may not get answered and a possible bug  
report will be lost (to possibly bite someone else later).

AFAIK this module doesn't use staden::read but is pure perl.  You are  
more than welcome to try out Bio::SeqIO::staden::read, but I have to  
warn you that most of us are looking at replacing its functionality  
at some point with BioLib bindings to io_lib (more stable) and so we  
don't intend on following up with bug fixes.

Note: there is also Bio::SCF (non-bp):

http://search.cpan.org/~lds/Bio-SCF-1.01/

chris



From MEC at stowers.org  Thu Jun 18 11:42:48 2009
From: MEC at stowers.org (Cook, Malcolm)
Date: Thu, 18 Jun 2009 10:42:48 -0500
Subject: [Bioperl-l] Bio::SeqIO::scf traces scrambled?
In-Reply-To: <4A3A435A.8000505@bms.com>
References: <4A3A435A.8000505@bms.com>
Message-ID: <BD62CBAC4395B94096109020651BE2EC12B471A44D@exchmb-02.stowers-institute.org>

Charles,

Another possible stopgap that might work for you, if you're working with AB1 chromatograms and have ABI's KB basecaller turned on, is to use Bio::Trace::ABIF

	http://search.cpan.org/dist/Bio-Trace-ABIF/lib/Bio/Trace/ABIF.pm

It works great and includes an implementation of ABI's algorithm, allowing you to (re)compute trace clear ranges using the KB basecaller's quality scores and any windowing/quality parameters.

It's not in the bioperl project, but it is an easy install from CPAN.

I am familiar with staden::read installation woes.  

Below is a quick script I wrote that employs it... it could be better parameterized, but it might be useful to you "out of the box"....

Malcolm Cook
Stowers Institute for Medical Research - Kansas City, Missouri
  

#!/usr/bin/env perl

# PURPOSE: extract from AB1 files into fasta format the sequence in
# the 'clear range' defined by 3 parameters.  If there is no clear
# range, emit warning and skip the sequence.  The fasta 'defline'
# identifier is taken as the sample name.  Other useful attributes are
# also embedded into the defline using attribute=value syntax.

# USAGE: ABIFqtrim $window_width $bad_bases_threshold $quality_threshold f1.ab1 ... fn.ab1

# NOTE: 20 4 20 is ABI default settings

# EXAMPLE:
# ABIFqtrim 20 4 20 /n/facility/Bioinformatics/Software/ABI/ab1_test_files/*.ab1 > ab1_test_files_trimmed.fasta

# AUTHOR: malcolm_cook at stowers-institute.org

use strict;
use warnings;
use Bio::Trace::ABIF;
use Text::Wrap qw(wrap);
$Text::Wrap::columns = 72;	# wrap the sequence

use File::Basename;
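# The first three arguments are the trimming parameters; the rest are AB1 files.
# (@ARGV itself cannot be redeclared with my, so the files go into @ab1_files.)
my ($window_width,
    $bad_bases_threshold,
    $quality_threshold,
    @ab1_files) = @ARGV;

my $abif = Bio::Trace::ABIF->new();

sub main {
  foreach (@ab1_files) {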
    $abif->open_abif($_) or die "error opening $_ as ABIF";
    my ($clear_range_start,$clear_range_stop) = $abif->clear_range($window_width,
								   $bad_bases_threshold,
								   $quality_threshold
								  );
    my $sample_score = $abif->sample_score(
					   $window_width,
					   $bad_bases_threshold,
					   $quality_threshold
					  );
    #    my $contiguous_read_length = $abif->contiguous_read_length($window_width,
    #							       $quality_threshold,
    #							       0, # ==> trim_ends
    #							      );
    #    my $length_of_read = $abif->length_of_read(
    #				    $window_width,
    #				    $quality_threshold,
    #				    # $method
    #				   );
    my $defline = 
      join "\t", 
	$abif->sample_name,
	  #basename($_,qw(.ab1 .abi)), # use this to use the filename's basename in the defline
	  #$abif->container_identifier . ':' . $abif->well_id,  # or this, for container:well_id formatted defline identifiers
	  (map {my $method = $_;
		"$method=". ($abif->$method() || '')}
	   qw(sample_name comment run_name well_id container_identifier sequence_length )), #comment
	     # sample_tracking_id - don't use this - it is internal to ABI software
	     "clear_range_start=$clear_range_start",
	       "clear_range_stop=$clear_range_stop",
		 "sample_score=$sample_score",
		   #"contiguous_read_length=$contiguous_read_length",
		   #"length_of_read=$length_of_read",
		   ;
    if ($clear_range_start == -1) {
      warn "NO CLEAR RANGE! SKIPPING $_:\n\t$defline";
      next;
    }
    my $seq = wrap('','',substr($abif->sequence, $clear_range_start + 1, ($clear_range_stop + 1) - ($clear_range_start + 1) + 1));
    print ">$defline\n$seq\n";
    $abif->close_abif();

  }
}

main ();




From carze at som.umaryland.edu  Thu Jun 18 13:51:43 2009
From: carze at som.umaryland.edu (Cesar Arze)
Date: Thu, 18 Jun 2009 10:51:43 -0700 (PDT)
Subject: [Bioperl-l]  Problems parsing scientific name from a Genbank file
Message-ID: <24095355.post@talk.nabble.com>


Hi all,
   I've searched through the mailing list and bug-tracker looking for any
indication of this (what I presume to be) bug I have been encountering when
parsing certain Genbank files using SeqIO::GenBank but have yet to find
anything. I apologize in advance if this is something that has already been
addressed.

When parsing these files and extracting the scientific name it seems that
line breaks are causing the lineage info found in the ORGANISM section to be
captured as part of the scientific name. An example of this is accession
NC_005945:

  ORGANISM  Bacillus anthracis str. Sterne
            Bacteria; Firmicutes; Bacillales; Bacillaceae; Bacillus; Bacillus
            cereus group.

"Bacillus cereus group" is split across a line break, which causes the
scientific name to capture "Bacteria; Firmicutes; Bacillales; Bacillaceae;
Bacillus; Bacillus", ending up with "Bacillus anthracis str. Sterne Bacteria;
Firmicutes; Bacillales; Bacillaceae; Bacillus; Bacillus" as the final
scientific name.
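
For what it is worth, the behaviour I expected can be sketched in plain 
Perl. This is not the Bio::SeqIO::genbank code; split_organism_block is 
a hypothetical helper, and it rests on the assumption that the lineage 
starts at the first line containing a semicolon (so a lineage made of a 
single taxon would be misread as part of the name).

```perl
use strict;
use warnings;

# Hypothetical helper: split the text of an ORGANISM block into
# (scientific name, lineage).
sub split_organism_block {
    my ($block) = @_;

    # Trim each line and drop empty ones.
    my @lines;
    for my $line (split /\n/, $block) {
        $line =~ s/^\s+|\s+$//g;
        push @lines, $line if length $line;
    }
    $lines[0] =~ s/^ORGANISM\s+//;

    my (@name, @lineage);
    for my $line (@lines) {
        # Assumption: the first line with ';' opens the lineage, and every
        # later line (such as "cereus group.") continues it.
        if (@lineage or $line =~ /;/) { push @lineage, $line }
        else                          { push @name,    $line }
    }

    my $lineage = join ' ', @lineage;
    $lineage =~ s/\.$//;    # drop the terminating period
    return (join(' ', @name), $lineage);
}

my $block = <<'END';
  ORGANISM  Bacillus anthracis str. Sterne
            Bacteria; Firmicutes; Bacillales; Bacillaceae; Bacillus; Bacillus
            cereus group.
END

my ($name, $lineage) = split_organism_block($block);
print "name:    $name\n";
print "lineage: $lineage\n";
```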

I am not sure whether anyone has ever run into this problem, but I would
very much appreciate any help or direction.
-- 
View this message in context: http://www.nabble.com/Problems-parsing-scientific-name-from-a-Genbank-file-tp24095355p24095355.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From charles.tilford at bms.com  Thu Jun 18 15:59:01 2009
From: charles.tilford at bms.com (Charles Tilford)
Date: Thu, 18 Jun 2009 15:59:01 -0400
Subject: [Bioperl-l] Bio::SeqIO::scf traces scrambled?
In-Reply-To: <49F38F2D-4C57-4309-BB2C-3ED53E1ED9B5@illinois.edu>
References: <4A3A435A.8000505@bms.com>
	<49F38F2D-4C57-4309-BB2C-3ED53E1ED9B5@illinois.edu>
Message-ID: <4A3A9C85.4000603@bms.com>

Chris Fields wrote:
> Charles,
>
> The best way to make sure this is addressed is to file a ticket (bug  
> report) on it so we can properly track it.
Ok, I'll put that in.
>
> AFAIK this module doesn't use staden::read but is pure perl. 
Yes, that's my understanding too. I'm using the SeqIO module because of 
ongoing hiccups with the staden installation.
> Note: there is also Bio::SCF (non-bp):
>
> http://search.cpan.org/~lds/Bio-SCF-1.01/
>   
I have that installed, but have not tried it out yet.

Thanks!
-CAT
> chris


From charles.tilford at bms.com  Thu Jun 18 16:02:53 2009
From: charles.tilford at bms.com (Charles Tilford)
Date: Thu, 18 Jun 2009 16:02:53 -0400
Subject: [Bioperl-l] Bio::SeqIO::scf traces scrambled?
In-Reply-To: <BD62CBAC4395B94096109020651BE2EC12B471A44D@exchmb-02.stowers-institute.org>
References: <4A3A435A.8000505@bms.com>
	<BD62CBAC4395B94096109020651BE2EC12B471A44D@exchmb-02.stowers-institute.org>
Message-ID: <4A3A9D6D.2010106@bms.com>

Cook, Malcolm wrote:
> Charles,
>
> Another possible stopgap that might work for you, if you're working with AB1 chromatograms and have ABI's kb-basecaller turned on, is to use Bio::Trace::ABIF
>
> 	http://search.cpan.org/dist/Bio-Trace-ABIF/lib/Bio/Trace/ABIF.pm
>
> It works great and includes an implementation of ABI's algorithm, allowing you to (re)compute trace clear ranges using the kb-basecaller's quality scores and any windowing/quality parameters.
>
> It's not in the BioPerl project, but it is an easy install from CPAN.
>   
Thanks - we installed that a few weeks ago, and it was on my list of 
things to try, but I had not gotten to it yet since I was getting data 
out of the SCF SeqIO module. Even though the SeqIO::scf data looks ok, 
the fact that I need to unscramble it makes me nervous... Thanks, too, 
for the example code. I'll try out the Bio::Trace::ABIF module and see 
if it works with our files.
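
For anyone needing the same stopgap, here is a minimal sketch of the
relabelling in plain Perl. It assumes the traces have already been pulled
out of the SCF object into a hash keyed by the base label SeqIO gives them;
that hash layout, %REMAP and remap_channels() are hypothetical names for
illustration, not part of Bio::SeqIO::scf. The table is read as "the
channel SeqIO labels C is really G", and so on; if it is meant the other
way round, invert the hash.

```perl
use strict;
use warnings;

# Empirical relabelling from the report: SeqIO label => base the channel
# actually lines up with in the phred calls (assumed direction).
my %REMAP = (A => 'A', C => 'G', G => 'T', T => 'C');

# $traces is a hypothetical hash ref: SeqIO base label => array ref of samples.
# Returns a new hash ref keyed by the corrected base label.
sub remap_channels {
    my ($traces) = @_;
    my %fixed;
    for my $label (keys %REMAP) {
        $fixed{ $REMAP{$label} } = $traces->{$label};
    }
    return \%fixed;
}

# Toy data standing in for real trace samples.
my $seqio = { A => [1, 2], C => [3, 4], G => [5, 6], T => [7, 8] };
my $fixed = remap_channels($seqio);

for my $base (qw(A C G T)) {
    print "$base: @{ $fixed->{$base} }\n";
}
```

Only the array references are moved between keys, so no sample data is
copied, and the remap is easy to drop again once the module is fixed.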

Thanks,
CAT
> I am familiar with staden::read installation woes.  
>
> Below is a quick script I wrote that employs it... it could be better parameterized, but it might be useful to you "out of the box"....
>
> Malcolm Cook
> Stowers Institute for Medical Research - Kansas City, Missouri
>   
>
> #!/usr/bin/env perl
>
> # PURPOSE: extract from AB1 files into fasta format the sequence in
> # the 'clear range' defined by 3 parameters.  If there is no clear
> # range, emit warning and skip the sequence.  The fasta 'defline'
> # identifier is taken as the sample name.  Other useful attributes are
> # also embedded into the defline using attribute=value syntax.
>
> # USAGE: ABIFqtrim $window_width $bad_bases_threshold $quality_threshold f1.ab1 ... fn.ab1
>
> # NOTE: 20 4 20 is ABI default settings
>
> # EXAMPLE:
> # ABIFqtrim 20 4 20 /n/facility/Bioinformatics/Software/ABI/ab1_test_files/*.ab1 > ab1_test_files_trimmed.fasta
>
> # AUTHOR: malcolm_cook at stowers-institute.org
>
> use strict;
> use warnings;
> use Bio::Trace::ABIF;
> use Text::Wrap qw(wrap);
> $Text::Wrap::columns = 72;	# wrap the sequence
>
> use File::Basename;
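> # Take the three numeric parameters off the front of @ARGV; what is left
> # are the AB1 file names.  (@ARGV is a global and cannot be declared
> # with "my".)
> my ($window_width,
>     $bad_bases_threshold,
>     $quality_threshold) = splice(@ARGV, 0, 3);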
>
> my $abif = Bio::Trace::ABIF->new();
>
> sub main {
>   foreach (@ARGV) {
>     $abif->open_abif($_) or die "error opening $_ as ABIF";
>     my ($clear_range_start,$clear_range_stop) = $abif->clear_range($window_width,
> 								   $bad_bases_threshold,
> 								   $quality_threshold
> 								  );
>     my $sample_score = $abif->sample_score(
> 					   $window_width,
> 					   $bad_bases_threshold,
> 					   $quality_threshold
> 					  );
>     #    my $contiguous_read_length = $abif->contiguous_read_length($window_width,
>     #							       $quality_threshold,
>     #							       0, # ==> trim_ends
>     #							      );
>     #    my $length_of_read = $abif->length_of_read(
>     #				    $window_width,
>     #				    $quality_threshold,
>     #				    # $method
>     #				   );
>     my $defline = 
>       join "\t", 
> 	$abif->sample_name,
> 	  #basename($_,qw(.ab1 .abi)), # use this to use the filename's basename in the defline
> 	  #$abif->container_identifier . ':' . $abif->well_id,  # or this, for container:well_id formatted defline identifiers
> 	  (map {my $method = $_;
> 		"$method=". ($abif->$method() || '')}
> 	   qw(sample_name comment run_name well_id container_identifier sequence_length )), #comment
> 	     # sample_tracking_id - don't use this - it is internal to ABI software
> 	     "clear_range_start=$clear_range_start",
> 	       "clear_range_stop=$clear_range_stop",
> 		 "sample_score=$sample_score",
> 		   #"contiguous_read_length=$contiguous_read_length",
> 		   #"length_of_read=$length_of_read",
> 		   ;
>     if ($clear_range_start == -1) {
>       warn "NO CLEAR RANGE! SKIPPING $_:\n\t$defline";
>       next;
>     }
>     my $seq = wrap('','',substr($abif->sequence, $clear_range_start + 1, ($clear_range_stop + 1) - ($clear_range_start + 1) + 1));
>     print ">$defline\n$seq\n";
>     $abif->close_abif();
>
>   }
> }
>
> main ();
>
>
>
>
>
>
>   
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org 
>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
>> Charles Tilford
>> Sent: Thursday, June 18, 2009 8:39 AM
>> To: BioPerl List
>> Subject: [Bioperl-l] Bio::SeqIO::scf traces scrambled?
>>
>> Nutshell: Bio::SeqIO::scf seems to be mixing up A/C/G/T trace 
>> channels. 
>> Can anyone confirm?
>>
>> Hi all,
>>
>> I'm using the SCF Bio::SeqIO module to parse trace data out 
>> of chromatograms. The SCF files are being produced by phred 
>> using the "-cd" 
>> parameter. The traces come out great, and the corresponding 
>> base calls from the .phd files align with the peaks 
>> wonderfully when I visualize them on a rendered trace. 
>> However, only the A bases align to the appropriate trace 
>> channel, the rest are mixed up. I find that if I do the 
>> following re-mapping, the phred base calls match the
>>
>> SeqIO : Remapped
>> A : A
>> C : G
>> G : T
>> T : C
>>
>> The relevant part of Bio::SeqIO::scf is here:
>>
>> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/B
>> io/SeqIO/scf.html#CODE9
>>
>> ... which indicates that it expects the pack()ed trace data 
>> to be in order ATGC. The base call parsing code is here:
>>
>> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/B
>> io/SeqIO/scf.html#CODE8
>>
>> ... which is unpacking in order ACGT. As far as I can tell, 
>> the relevant official SCF documentation is here:
>>
>> http://staden.sourceforge.net/manual/formats_unix_4.html
>>
>> ... which indicates that both trace and base order should be 
>> ACGT (matching the SeqIO unpack() for bases, but not traces). 
>> My empirical channel unscrambling mapping implies order ACTG, 
>> which is different from either of the two orders above. The 
>> sequence from the SCF file (should be that from original AB1 
>> file, I think) is not perfectly identical to that called by 
>> phred, but is very similar (to be expected); that is, I don't 
>> need to remap C, G and T to get it to align with the phred data.
>>
>> So it looks like the SeqIO module is not mapping the sections 
>> of the packed trace data to the appropriate bases. The unpack 
>> order is different than the staden documentation ... but so 
>> is the order I impose to correct the problem. I am still 
>> unclear as to the differences between
>> V2 and V3 of the format. The major difference appears to be 
>> coding the trace absolutely (V2) or relatively to prior 
>> values (V3); I'd expect if I was using one format and SeqIO 
>> was trying to parse the other that I would get garbage out. 
>> Running in verbose reports "scf.pm is working with a version 2 scf."
>>
>> Thoughts on this would be appreciated - can anyone confirm a 
>> problem with trace extraction from SCF?
>>
>> I'm hoping that once I convince our admin to (properly) 
>> install staden::read that I can work directly with the ab1 
>> files, but I need to stopgap on SCF for the time being....
>>
>> -CAT
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>     


From cjfields at illinois.edu  Thu Jun 18 16:27:02 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 18 Jun 2009 15:27:02 -0500
Subject: [Bioperl-l] Bio::SeqIO::scf traces scrambled?
In-Reply-To: <4A3A9D6D.2010106@bms.com>
References: <4A3A435A.8000505@bms.com>
	<BD62CBAC4395B94096109020651BE2EC12B471A44D@exchmb-02.stowers-institute.org>
	<4A3A9D6D.2010106@bms.com>
Message-ID: <2A9A3AB7-7773-48F1-993C-A679495D0B95@illinois.edu>


On Jun 18, 2009, at 3:02 PM, Charles Tilford wrote:

> Cook, Malcolm wrote:
>> Charles,
>>
>> Another possible stopgap that might work for you, if you're working  
>> with AB1 chromatograms and have ABI's kb-basecaller turned on, is to  
>> use Bio::Trace::ABIF
>>
>> 	http://search.cpan.org/dist/Bio-Trace-ABIF/lib/Bio/Trace/ABIF.pm
>>
>> It works great and includes an implementation of ABI's algorithm,  
>> allowing you to (re)compute trace clear ranges using the  
>> kb-basecaller's quality scores and any windowing/quality parameters.
>>
>> It's not in the BioPerl project, but it is an easy install from CPAN.
>>
> Thanks - we installed that a few weeks ago, and it was on my list of  
> things to try, but I had not gotten to it yet since I was getting  
> data out of the SCF SeqIO module. Even though the SeqIO::scf data  
> looks ok, the fact that I need to unscramble it makes me nervous...  
> Thanks, too, for the example code. I'll try out the Bio::Trace::ABIF  
> module and see if it works with our files.
>
> Thanks,
> CAT

You definitely shouldn't need to unscramble it; my guess is this is a  
legit bug that just has gone unnoticed.  I see that you have filed a  
ticket on it so we can at least track it.  Thanks!

chris


From scott at scottcain.net  Thu Jun 18 23:25:35 2009
From: scott at scottcain.net (Scott Cain)
Date: Thu, 18 Jun 2009 23:25:35 -0400
Subject: [Bioperl-l] background layer is not supported in Bioperl 1.6
	for Bio::Graphics::Glyph
In-Reply-To: <4A3A13D3.7050208@ii.uib.no>
References: <4A32BCDA.4080605@ii.uib.no>
	<536f21b00906121829o7a5076f6hc57eb9a4dd815e48@mail.gmail.com>
	<4A339621.2060702@ii.uib.no>
	<4536f7700906130627r41c90de5r311e58f1d718f9ee@mail.gmail.com>
	<4A33D850.1020203@ii.uib.no> <4A3A13D3.7050208@ii.uib.no>
Message-ID: <4536f7700906182025m1d67afa2y2a62a30d6cc9b19d@mail.gmail.com>

Hi Xianjun,

The attached script (which is not too different from yours--I only did
a little clean up and made the padding consistent) makes the attached
image, which is what I think you want.  I'm using bioperl-live.
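
One possible cause, offered as a guess since postgrid.pl is not reproduced
in the archive: in the quoted mytestcode.pl there is no comma between
-pad_right=>12 and -postgrid=>\&gap_it. Perl then reads the two lines as
12 minus the string 'postgrid', so no -postgrid key ever reaches the
constructor. The sketch below shows the parse with a plain list; no
Bio::Graphics call is involved, and gap_it here is only a stand-in.

```perl
use strict;
use warnings;

sub gap_it { return 'drawn' }    # stand-in for the real callback

my @broken;
{
    # Silence the "isn't numeric" warning that "use warnings" would raise
    # for the accidental subtraction.
    no warnings 'numeric';
    @broken = (-pad_left  => 12,
               -pad_right => 12
               -postgrid  => \&gap_it);    # missing comma on the line above
}

my @fixed = (-pad_left  => 12,
             -pad_right => 12,
             -postgrid  => \&gap_it);

for my $case (['broken', \@broken], ['fixed', \@fixed]) {
    my ($name, $list) = @$case;
    my $has_key = grep { !ref($_) && $_ eq '-postgrid' } @$list;
    printf "%-6s %d elements, -postgrid key present: %s\n",
        $name, scalar(@$list), $has_key ? 'yes' : 'no';
}
```

Running the original script with "use warnings" (or perl -w) should make
the stray subtraction visible as a numeric warning.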

Scott


On Thu, Jun 18, 2009 at 6:15 AM, Xianjun Dong<Xianjun.Dong at bccs.uib.no> wrote:
> Hi, Scott,
>
> Would you mind having a look at the code (below my signature) to check
> whether I use the -postgrid callback correctly?
> I still cannot get the background for the whole panel.
>
> Thanks
>
> Xianjun
>
>
> Xianjun Dong wrote:
>>
>> Hi, Scott
>>
>> Before I give up on my own solution and switch to GBrowse, I would like
>> to bother you once more:
>>
>> As you suggested, I pass the -postgrid option when creating the panel; it
>> calls a function that draws the background. The code below is copied almost
>> verbatim from the online POD of Bio::Graphics::Panel (see
>> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Graphics/Panel.html
>> )
>>
>> But it still does not work. Could you have a look? I have pasted it
>> below. (BTW, the POD page above uses -postgrid=>\&draw_gap, while the
>> gap-drawing function is named gap_it, not draw_gap. I guess that is a typo?)
>>
>> Thanks
>>
>> Xianjun
>>
>> ----------------------------------------------- mytestcode.pl
>> --------------------------
>>
>> #!/usr/bin/perl
>>
>> use strict;
>> use lib "$ENV{HOME}/lib";
>>
>> use Bio::Graphics;
>> use Bio::Graphics::Feature;
>> my $ftr= 'Bio::Graphics::Feature';
>>
>> # processed_transcript
>> my $trans1 =
>> $ftr->new(-start=>50,-end=>10,-name=>'ZK154.1',-type=>"3'-UTR");
>> my $trans2 =
>> $ftr->new(-start=>100,-end=>50,-name=>'ZK154.2',-type=>'CDS');
>> my $trans3 =
>> $ftr->new(-start=>350,-end=>225,-name=>'ZK154.3',-type=>'CDS',
>> -source=>'a');
>> my $trans4 =
>> $ftr->new(-start=>650,-end=>500,-name=>'ZK154.3',-type=>'CDS',
>> -source=>'a');
>> my $trans5 =
>> $ftr->new(-start=>700,-end=>650,-name=>'ZK154.3',-type=>"5'-UTR");
>> my $trans  =
>> $ftr->new(-segments=>[$trans1,$trans2,$trans3,$trans4,$trans5]);
>>
>> # hightlight
>> my $trans31 =
>> $ftr->new(-start=>240,-end=>450,-name=>'hightlight1',-type=>'background',
>> -source=>'a');
>> my $trans41 =
>> $ftr->new(-start=>650,-end=>600,-name=>'hightlight2',-type=>'multihourglass',
>> -source=>'b');
>>
>> my $panel= Bio::Graphics::Panel->new(-width=>1200,
>>                                      -length=>1050,
>>                                      -start =>0,
>>                                      -pad_left=>12,
>>                                      -pad_right=>12
>>                                      -postgrid=>\&gap_it);
>>
>> sub gap_it {
>>    my $gd    = shift;
>>    my $panel = shift;
>>    my ($gap_start,$gap_end) = $panel->location2pixel(500,600);
>>    my $top                  = $panel->top;
>>    my $bottom               = $gd->height, #panel->bottom;
>>    my $gray                 = $panel->translate_color('red');
>>    $gd->filledRectangle($gap_start,$top,$gap_end,$bottom,$gray);
>> }
>> # the following track works as I expected in bioperl 1.2.3, but not in 1.5
>> # and 1.6
>> #$panel->add_track([$trans41,$trans31],
>> #                  -glyph   => 'background',
>> #                  -block_bgcolor => sub{return (shift->source eq 'a')?'#cccccc':'#fffc22'},
>> #                  );
>>
>> $panel->add_track($ftr->new(-start=>100,-end=>1000),
>>                   -glyph=>'arrow',
>>                   -double=>1,
>>                   -tick=>2);
>>
>> $panel->add_track($trans,
>>                   -glyph   => 'transcript2', # 'transcript2', #process_5utr',
>>                   -fgcolor => 'darkred',
>>                   -bgcolor => 'darkred',
>>                   -title => '$source',
>>                   -link => 'http://www.ensembl.org/Homo_sapiens/transview?transcript=$name',  #EnsEMBL
>>                   );
>> print $panel->png;
>>
>> # the following part works in bioperl 1.5 and 1.6, but not work in Bioperl
>> # 1.2.3
>> my $map = $panel->create_web_map("image");
>> $panel->finished();
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> Scott Cain wrote:
>>>
>>> Hi Xianjun,
>>>
>>> I understand what you want to do, as the current version of gbrowse
>>> does this, and it uses bioperl 1.6.  Without digging through the code,
>>> I can't tell you exactly how this works, and you didn't send your code
>>> that uses this callback, so I can't try it either.
>>>
>>> One thing that is different between your code and gbrowse is that each
>>> of the tracks is actually a separate panel (to allow track dragging),
>>> so it is possible that this sort of callback doesn't work for
>>> Bio::Graphics any more.
>>>
>>> Scott
>>>
>>> On Saturday, June 13, 2009, Xianjun Dong <Xianjun.Dong at bccs.uib.no>
>>> wrote:
>>>
>>>>
>>>> Hi, Scott
>>>>
>>>> Thanks for your reply first.
>>>>
>>>> I still have a question. I dug out the code from GBrowse (pasted
>>>> below). The method make_postgrid_callback collects all highlight regions
>>>> and then uses hilite_regions_closure to draw them, with the following GD
>>>> call:
>>>>
>>>> $gd->filledRectangle($left+$start,0,$left+$end,$bottom,
>>>>                      $panel->translate_color($h_color));
>>>>
>>>> where $bottom=$panel->bottom. This is the only difference from my
>>>> code, where I use $gd->height. I guess they are almost the same (apart
>>>> from pad_bottom), as can be seen in the code at
>>>> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Graphics/Panel.html#CODE22
>>>>
>>>> Anyway, I changed my highlight regions to use $panel->bottom instead of
>>>> $gd->height. The output is the same with the Bioperl 1.6 (or 1.5)
>>>> library. You can see the attached image ("test.bioperl1.6.png").
>>>>
>>>> I might not have explained my question clearly. My question is:
>>>> with bioperl 1.2.3 (actually the Bio::Graphics in bioperl 1.2.3), I
>>>> get the image I want (see the attached file "test.bioperl1.2.3.png"),
>>>> where the highlight range goes from the roof to the floor. In
>>>> bioperl 1.5 (or 1.6), I can only see the highlight region in its own
>>>> track, not across the whole panel. Is that clearer now? You can see the
>>>> difference between the two images.
>>>>
>>>> [I am not sure whether the mailing list allows image attachments, so I
>>>> have also put them at the following links:
>>>> test.bioperl1.6.png:    http://translog.genereg.net/test.bioperl1.6.png
>>>> test.bioperl1.2.3.png:  http://translog.genereg.net/test.bioperl1.2.3.png ]
>>>>
>>>> Could you test it and see the difference, if you have both 1.2.3 and
>>>> 1.6 on your computer?
>>>>
>>>> I really want to know how this works in bioperl 1.2.3 (even if it
>>>> was a bug in that version, or whatever).
>>>>
>>>> Thanks
>>>>
>>>> Xianjun
>>>> =============================================
>>>>
>>>> # this generates the callback for highlighting a region
>>>> sub make_postgrid_callback {
>>>>  my $settings = shift;
>>>>  return unless ref $settings->{h_region};
>>>>
>>>>  my @h_regions = map {
>>>>    my ($h_ref,$h_start,$h_end,$h_color) = /^(.+):(\d+)\.\.(\d+)(?:@(\S+))?/;
>>>>    defined($h_ref) && $h_ref eq $settings->{ref}
>>>>                ? [$h_start,$h_end,$h_color||'lightgrey']
>>>>                : ()
>>>>  }
>>>>    @{$settings->{h_region}};
>>>>
>>>>  return unless @h_regions;
>>>>  return hilite_regions_closure(@h_regions);
>>>> }
>>>>
>>>> # this subroutine generates a Bio::Graphics::Panel callback closure
>>>> # suitable for hilighting a region of a panel.
>>>> # The args are a list of [start,end,color]
>>>> sub hilite_regions_closure {
>>>>  my @h_regions = @_;
>>>>
>>>>  return sub {
>>>>    my $gd     = shift;
>>>>    my $panel  = shift;
>>>>    my $left   = $panel->pad_left;
>>>>    my $top    = $panel->top;
>>>>    my $bottom = $panel->bottom;
>>>>    for my $r (@h_regions) {
>>>>      my ($h_start,$h_end,$h_color) = @$r;
>>>>      my ($start,$end) = $panel->location2pixel($h_start,$h_end);
>>>>      if ($end-$start <= 1) { $end++; $start-- } # so that we always see something
>>>>      # assuming top is 0 so as to ignore top padding
>>>>      $gd->filledRectangle($left+$start,0,$left+$end,$bottom,
>>>>                           $panel->translate_color($h_color));
>>>>    }
>>>>  };
>>>> }
>>>> }
>>>>
>>>>
>>>> Scott Cain wrote:
>>>>
>>>> Hello Xianjun,
>>>>
>>>> I don't think that approach will work.  What you almost certainly need
>>>> to do is a postgrid callback that does the drawing of the highlighted
>>>> region.  For example code of how to do this, take a look at the
>>>> make_postgrid_callback subroutine in GBrowse 1.69.  The option
>>>> -postgrid is a method of Bio::Graphics::Panel.
>>>>
>>>> Scott
>>>>
>>>>
>>>>
>>>>
>>>> On Fri, Jun 12, 2009 at 4:38 PM, Xianjun Dong<Xianjun.Dong at bccs.uib.no>
>>>> wrote:
>>>>
>>>>
>>>> HI,
>>>>
>>>> I am not sure this is the right place I can get help.
>>>>
>>>> I have been struggling with a problem for several days: I want to
>>>> highlight parts of regions in my track using a different background
>>>> color. To do that, I defined a glyph named "background", based on the
>>>> 'Bio::Graphics::Glyph::generic' module. I override the draw_component()
>>>> method, adding code like this:
>>>>
>>>> $gd->filledRectangle($left,0,$right,$gd->height,
>>>> $self->factory->translate_color($color));
>>>>
>>>> # the script is pasted at the end
>>>>
>>>> This draws a rectangle with top=0 and bottom=$gd->height. I made the
>>>> highlight regions into a list of features and called add_track with
>>>> -glyph=>'background' (see the following script, test.pl). This works
>>>> as I expect: it adds a colored block behind all tracks in a
>>>> panel (including the ruler arrow). You can see the output image in the
>>>> attached file "test.bioperl1.2.3.png".
>>>>
>>>> Now the problem: when I switch to Bioperl 1.5 (or 1.6), it does not
>>>> work. Well, it works, but the highlighted part shrinks to a low height
>>>> instead of covering all tracks in the panel. I have also attached that
>>>> output; see the file "test.bioperl1.6.png".
>>>>
>>>> I tried to think about the reason. The 'background' module is based on
>>>> the generic module, so what can cause the difference? Is it because
>>>> $gd->height is different, or because the tracks that follow the
>>>> 'background' track cannot be drawn from the first position?
>>>>
>>>> I could stick with Bioperl 1.2.3 to avoid the problem. ("Smart people
>>>> solve problems, wise people avoid them"...) But then another problem
>>>> appears: Bio::Graphics in Bioperl 1.2.3 does not support the
>>>> $panel->create_web_map() function, which means I have to use a higher
>>>> version if I want to create a web map for my graphics, but then I have
>>>> to give up the highlighted background.
>>>>
>>>> OK. This is long enough for my first submission here. I hope someone
>>>> can give me a clue.
>>>>
>>>> Thanks ahead!!
>>>>
>>>> Xianjun
>>>>
>>>>
>>>> ==================== test.pl =======================
>>>> #!/usr/bin/perl
>>>>
>>>> use strict;
>>>> use lib "$ENV{HOME}/lib";
>>>>
>>>> use Bio::Graphics;
>>>> use Bio::Graphics::Feature;
>>>> my $ftr= 'Bio::Graphics::Feature';
>>>>
>>>> # processed_transcript
>>>> my $trans1 =
>>>> $ftr->new(-start=>50,-end=>10,-name=>'ZK154.1',-type=>"3'-UTR");
>>>> my $trans2 =
>>>> $ftr->new(-start=>100,-end=>50,-name=>'ZK154.2',-type=>'CDS');
>>>> my $trans3 =
>>>> $ftr->new(-start=>350,-end=>225,-name=>'ZK154.3',-type=>'CDS',
>>>> -source=>'a');
>>>> my $trans4 =
>>>> $ftr->new(-start=>650,-end=>500,-name=>'ZK154.3',-type=>'CDS',
>>>> -source=>'a');
>>>> my $trans5 =
>>>> $ftr->new(-start=>700,-end=>650,-name=>'ZK154.3',-type=>"5'-UTR");
>>>> my $trans  =
>>>> $ftr->new(-segments=>[$trans1,$trans2,$trans3,$trans4,$trans5]);
>>>>
>>>> # hightlight
>>>> my $trans31 =
>>>>
>>>> $ftr->new(-start=>240,-end=>450,-name=>'hightlight1',-type=>'background',
>>>> -source=>'a');
>>>> my $trans41 =
>>>>
>>>> $ftr->new(-start=>650,-end=>600,-name=>'hightlight2',-type=>'multihourglass',
>>>> -source=>'b');
>>>>
>>>> my $panel= Bio::Graphics::Panel->new(-width=>1200,
>>>>                                      -length=>1050,
>>>>                                      -start =>0,
>>>>                                      -pad_left=>12,
>>>>                                      -pad_right=>12);
>>>>
>>>> # the following track works as I expected in bioperl 1.2.3, but not in
>>>> # 1.5 and 1.6
>>>> $panel->add_track([$trans41,$trans31],
>>>>                   -glyph   => 'background',
>>>>                   -block_bgcolor => sub{return (shift->source eq 'a')?'#cccccc':'#fffc22'},
>>>>                   );
>>>>
>>>> $panel->add_track($ftr->new(-start=>100,-end=>1000),
>>>>                   -glyph=>'arrow',
>>>>                   -double=>1,
>>>>                   -tick=>2);
>>>>
>>>> $panel->add_track($trans,
>>>>                   -glyph   => 'transcript2', # 'transcript2', #process_5utr',
>>>>                   -fgcolor => 'darkred',
>>>>                   -bgcolor => 'darkred',
>>>>                   -title => '$source',
>>>>                   -link => 'http://www.ensembl.org/Homo_sapiens/transview?transcript=$name',  #EnsEMBL
>>>>                   );
>>>> print $panel->png;
>>>>
>>>> # the following part works in bioperl 1.5 and 1.6, but not work in
>>>> # Bioperl 1.2.3
>>>> my $map = $panel->create_web_map("image");
>>>> $panel->finished();
>>>>
>>>> 1;
>>>>
>>>> ==================== background.pm =======================
>>>> package Bio::Graphics::Glyph::background;
>>>>
>>>> use strict;
>>>> use base 'Bio::Graphics::Glyph::generic';
>>>> sub pad_top{
>>>>  return 0;
>>>> }
>>>>
>>>> sub draw_component {
>>>>  my $self = shift;
>>>>  #$self->SUPER::draw_component(@_);
>>>>  my ($gd,$dx,$dy) = @_;
>>>>  my ($left,$top,$right,$bottom) = $self->bounds($dx,$dy);
>>>>
>>>>  # fill a rectangle spanning the full image height behind the feature
>>>>  my $color = $self->option('block_bgcolor') || '#cccccc';
>>>>  $gd->filledRectangle($left,0,$right,$gd->height,
>>>>                       $self->factory->translate_color($color));
>>>> }
>>>>
>>>> 1;
>>>>
>>>> --
>>>> ==========================================
>>>> Xianjun Dong
>>>> PhD student, Lenhard group
>>>> Computational Biology Unit
>>>> Bergen Center for Computational Science
>>>> University of Bergen
>>>> Hoyteknologisenteret, Thormohlensgate 55
>>>> N-5008 Bergen, Norway
>>>> E-mail: xianjun.dong at bccs.uib.no
>>>> Tel.: +47 555 84022
>>>> Fax : +47 555 84295
>>>> ==========================================
>>>
>>>
>>
>
> --
> ==========================================
> Xianjun Dong
> PhD student, Lenhard group
> Computational Biology Unit
> Bergen Center for Computational Science
> University of Bergen
> Hoyteknologisenteret, Thormohlensgate 55
> N-5008 Bergen, Norway
> E-mail: xianjun.dong at bccs.uib.no
> Tel.: +47 555 84022
> Fax : +47 555 84295
> ==========================================
>
>


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research
-------------- next part --------------
A non-text attachment was scrubbed...
Name: postgrid.pl
Type: application/x-perl
Size: 2140 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20090618/0bee0f33/attachment-0002.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: postgrid_highlight.png
Type: image/png
Size: 7195 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20090618/0bee0f33/attachment-0002.png>

From scott at scottcain.net  Thu Jun 18 23:30:37 2009
From: scott at scottcain.net (Scott Cain)
Date: Thu, 18 Jun 2009 23:30:37 -0400
Subject: [Bioperl-l] background layer is not supported in Bioperl 1.6
	for Bio::Graphics::Glyph
In-Reply-To: <4536f7700906182025m1d67afa2y2a62a30d6cc9b19d@mail.gmail.com>
References: <4A32BCDA.4080605@ii.uib.no>
	<536f21b00906121829o7a5076f6hc57eb9a4dd815e48@mail.gmail.com>
	<4A339621.2060702@ii.uib.no>
	<4536f7700906130627r41c90de5r311e58f1d718f9ee@mail.gmail.com>
	<4A33D850.1020203@ii.uib.no> <4A3A13D3.7050208@ii.uib.no>
	<4536f7700906182025m1d67afa2y2a62a30d6cc9b19d@mail.gmail.com>
Message-ID: <4536f7700906182030n74f4293k60ad04ea62b97476@mail.gmail.com>

Actually, to be clear, that's bioperl-live and Bio::Graphics version
1.96 from CPAN.
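
To compare against a local setup, a small sketch like the one below reports
which versions are installed. It relies only on core Perl (require plus the
universal VERSION method); module_version() is a hypothetical helper, and
looking at Bio::Root::Version for the core BioPerl version is my assumption
about where that number lives.

```perl
use strict;
use warnings;

# Return the installed version of a module, 'unknown' if it loads but
# declares no version, or 'not installed' if it cannot be loaded at all.
sub module_version {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;
    return 'not installed' unless eval { require $file; 1 };
    my $version = $module->VERSION;
    return defined $version ? $version : 'unknown';
}

for my $module (qw(Bio::Root::Version Bio::Graphics GD)) {
    printf "%-20s %s\n", $module, module_version($module);
}
```

Run it once under each "use lib" path you are testing to see which
Bio::Graphics is actually being picked up.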

On Thu, Jun 18, 2009 at 11:25 PM, Scott Cain<scott at scottcain.net> wrote:
> Hi Xianjun,
>
> The attached script (which is not too different from yours--I only did
> a little clean up and made the padding consistent) makes the attached
> image, which is what I think you want.  I'm using bioperl-live.
>
> Scott
>
>
> On Thu, Jun 18, 2009 at 6:15 AM, Xianjun Dong<Xianjun.Dong at bccs.uib.no> wrote:
>> Hi, Scott,
>>
>> Would you mind having a look at the code (below my signature) to check
>> whether I use the -postgrid callback correctly?
>> I still cannot get the background for the whole panel.
>>
>> Thanks
>>
>> Xianjun
>>
>>
>> Xianjun Dong wrote:
>>>
>>> Hi, Scott
>>>
>>> Before I give up on my own solution and switch to GBrowse, I would like
>>> to bother you once more:
>>>
>>> As you suggested, I pass the -postgrid option when creating the panel; it
>>> calls a function that draws the background. The code below is copied almost
>>> verbatim from the online POD of Bio::Graphics::Panel (see
>>> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Graphics/Panel.html
>>> )
>>>
>>> But it still does not work. Could you have a look? I have pasted it
>>> below. (BTW, the POD page above uses -postgrid=>\&draw_gap, while the
>>> gap-drawing function is named gap_it, not draw_gap. I guess that is a typo?)
>>>
>>> Thanks
>>>
>>> Xianjun
>>>
>>> ----------------------------------------------- mytestcode.pl
>>> --------------------------
>>>
>>> #!/usr/bin/perl
>>>
>>> use strict;
>>> use lib "$ENV{HOME}/lib";
>>>
>>> use Bio::Graphics;
>>> use Bio::Graphics::Feature;
>>> my $ftr= 'Bio::Graphics::Feature';
>>>
>>> # processed_transcript
>>> my $trans1 =
>>> $ftr->new(-start=>50,-end=>10,-name=>'ZK154.1',-type=>"3'-UTR");
>>> my $trans2 =
>>> $ftr->new(-start=>100,-end=>50,-name=>'ZK154.2',-type=>'CDS');
>>> my $trans3 =
>>> $ftr->new(-start=>350,-end=>225,-name=>'ZK154.3',-type=>'CDS',
>>> -source=>'a');
>>> my $trans4 =
>>> $ftr->new(-start=>650,-end=>500,-name=>'ZK154.3',-type=>'CDS',
>>> -source=>'a');
>>> my $trans5 =
>>> $ftr->new(-start=>700,-end=>650,-name=>'ZK154.3',-type=>"5'-UTR");
>>> my $trans  =
>>> $ftr->new(-segments=>[$trans1,$trans2,$trans3,$trans4,$trans5]);
>>>
>>> # hightlight
>>> my $trans31 =
>>> $ftr->new(-start=>240,-end=>450,-name=>'hightlight1',-type=>'background',
>>> -source=>'a');
>>> my $trans41 =
>>> $ftr->new(-start=>650,-end=>600,-name=>'hightlight2',-type=>'multihourglass',
>>> -source=>'b');
>>>
>>> my $panel= Bio::Graphics::Panel->new(-width=>1200,
>>>                                      -length=>1050,
>>>                                      -start =>0,
>>>                                      -pad_left=>12,
>>>                                      -pad_right=>12
>>>                                      -postgrid=>\&gap_it);
>>>
>>> sub gap_it {
>>>    my $gd    = shift;
>>>    my $panel = shift;
>>>    my ($gap_start,$gap_end) = $panel->location2pixel(500,600);
>>>    my $top                  = $panel->top;
>>>    my $bottom               = $gd->height, #panel->bottom;
>>>    my $gray                 = $panel->translate_color('red');
>>>    $gd->filledRectangle($gap_start,$top,$gap_end,$bottom,$gray);
>>> }
>>> # the following track works as I expected in bioperl 1.2.3, but not in 1.5
>>> # and 1.6
>>> #$panel->add_track([$trans41,$trans31],
>>> #                  -glyph   => 'background',
>>> #                  -block_bgcolor => sub{return (shift->source eq 'a')?'#cccccc':'#fffc22'},
>>> #                  );
>>>
>>> $panel->add_track($ftr->new(-start=>100,-end=>1000),
>>>                   -glyph=>'arrow',
>>>                   -double=>1,
>>>                   -tick=>2);
>>>
>>> $panel->add_track($trans,
>>>                   -glyph   => 'transcript2', # 'transcript2', #process_5utr',
>>>                   -fgcolor => 'darkred',
>>>                   -bgcolor => 'darkred',
>>>                   -title => '$source',
>>>                   -link => 'http://www.ensembl.org/Homo_sapiens/transview?transcript=$name',  #EnsEMBL
>>>                   );
>>> print $panel->png;
>>>
>>> # the following part works in bioperl 1.5 and 1.6, but not work in Bioperl
>>> # 1.2.3
>>> my $map = $panel->create_web_map("image");
>>> $panel->finished();
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> Scott Cain wrote:
>>>>
>>>> Hi Xianjun,
>>>>
>>>> I understand what you want to do, as the current version of gbrowse
>>>> does this, and it uses bioperl 1.6.  Without digging through the code,
>>>> I can't tell you exactly how this works, and you didn't send your code
>>>> that uses this callback, so I can't try it either.
>>>>
>>>> One thing that is different between your code and gbrowse is that each
>>>> of the tracks is actually a separate panel (to allow track dragging),
>>>> so it is possible that this sort of callback doesn't work for
>>>> Bio::Graphics any more.
>>>>
>>>> Scott
>>>>
>>>> On Saturday, June 13, 2009, Xianjun Dong <Xianjun.Dong at bccs.uib.no>
>>>> wrote:
>>>>
>>>>>
>>>>> Hi, Scott
>>>>>
>>>>> Thanks for your reply first.
>>>>>
>>>>> I still have a question: I dug out the code from GBrowse (pasted
>>>>> below). The method make_postgrid_callback collects all highlight regions
>>>>> and then uses hilite_regions_closure to draw them, with the following GD
>>>>> call:
>>>>>
>>>>> $gd->filledRectangle($left+$start,0,$left+$end,$bottom,
>>>>>                          $panel->translate_color($h_color));
>>>>>
>>>>> where the $bottom=$panel->bottom. This is the only difference from my
>>>>> code, where I use $gd->height. I guess they are almost same (except the
>>>>> pad_bottom), we can see this in the code of
>>>>> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Graphics/Panel.html#CODE22
>>>>>
>>>>> OK. Anyway, I change to use $panel->bottom, instead of $gd->height, for
>>>>> my highlight regions. The output is same, when using the library of Bioperl
>>>>> 1.6 (or 1.5). You can see the attached image ("test.bioperl1.6.png")
>>>>>
>>>>> OK. I might have not explained my question explicitly. My question is:
>>>>> if using bioperl 1.2.3 (actually the Bio::Graphics in bioperl 1.2.3), I can
>>>>> get the right image I want (see the attached file "test.bioperl1.2.3.png"),
>>>>> where the highlight range will go from the roof to the floor. While in
>>>>> bioperl 1.5 (or 1.6), I only can see the highlight region in its own track,
>>>>> not the whole panel. OK, did I explain clearly now? you can see the
>>>>> difference of the two images.
>>>>>
>>>>> [I am not sure the mailist allow to attach image, otherwise, I put them
>>>>> in the following links:
>>>>> test.bioperl1.6.png:    http://translog.genereg.net/test.bioperl1.6.png
>>>>> test.bioperl1.2.3.png:  http://translog.genereg.net/test.bioperl1.2.3.png ]
>>>>>
>>>>> You can test it and see the difference if you have both 1.2.3 and 1.6 on
>>>>> your computer?
>>>>>
>>>>> Really want to know how this works in bioperl 1.2.3 (Even though this
>>>>> might be a bug at that version, or whatever)
>>>>>
>>>>> Thanks
>>>>>
>>>>> Xianjun
>>>>> =============================================
>>>>>
>>>>> # this generates the callback for highlighting a region
>>>>> sub make_postgrid_callback {
>>>>>  my $settings = shift;
>>>>>  return unless ref $settings->{h_region};
>>>>>
>>>>>  my @h_regions = map {
>>>>>   my ($h_ref,$h_start,$h_end,$h_color) = /^(.+):(\d+)\.\.(\d+)(?:@(\S+))?/;
>>>>>   defined($h_ref) && $h_ref eq $settings->{ref}
>>>>>                ? [$h_start,$h_end,$h_color||'lightgrey']
>>>>>                : ()
>>>>>  }
>>>>>   @{$settings->{h_region}};
>>>>>
>>>>>  return unless @h_regions;
>>>>>  return hilite_regions_closure(@h_regions);
>>>>> }
>>>>>
>>>>> # this subroutine generates a Bio::Graphics::Panel callback closure
>>>>> # suitable for hilighting a region of a panel.
>>>>> # The args are a list of [start,end,color]
>>>>> sub hilite_regions_closure {
>>>>>  my @h_regions = @_;
>>>>>
>>>>>  return sub {
>>>>>   my $gd     = shift;
>>>>>   my $panel  = shift;
>>>>>   my $left   = $panel->pad_left;
>>>>>   my $top    = $panel->top;
>>>>>   my $bottom = $panel->bottom;
>>>>>   for my $r (@h_regions) {
>>>>>     my ($h_start,$h_end,$h_color) = @$r;
>>>>>     my ($start,$end) = $panel->location2pixel($h_start,$h_end);
>>>>>     if ($end-$start <= 1) { $end++; $start-- } # so that we always see something
>>>>>     # assuming top is 0 so as to ignore top padding
>>>>>     $gd->filledRectangle($left+$start,0,$left+$end,$bottom,
>>>>>                          $panel->translate_color($h_color));
>>>>>   }
>>>>>  };
>>>>> }
>>>>>
>>>>>
>>>>> Scott Cain wrote:
>>>>>
>>>>> Hello Xianjun,
>>>>>
>>>>> I don't think that approach will work.  What you almost certainly need
>>>>> to do is a postgrid callback that does the drawing of the highlighted
>>>>> region.  For example code of how to do this, take a look at the
>>>>> make_postgrid_callback subroutine in GBrowse 1.69.  -postgrid is an
>>>>> option of Bio::Graphics::Panel.
>>>>>
>>>>> Scott
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Jun 12, 2009 at 4:38 PM, Xianjun Dong<Xianjun.Dong at bccs.uib.no>
>>>>> wrote:
>>>>>
>>>>>
>>>>> HI,
>>>>>
>>>>> I am not sure this is the right place I can get help.
>>>>>
>>>>> I've suffered by a problem for several days: I want to highlight parts
>>>>> of
>>>>> regions in my track, using a different background color. To do that, I
>>>>> defined a glyph named "background", based on the
>>>>> 'Bio::Graphics::Glyph::generic' module. I override the draw_component()
>>>>> method, by adding code like below:
>>>>>
>>>>> $gd->filledRectangle($left,0,$right,$gd->height,
>>>>> $self->factory->translate_color($color));
>>>>>
>>>>> # the script is pasted at the end
>>>>>
>>>>> This will draw a rectangle with top=0, bottom=$gd->height. I made the
>>>>> highlight regions into a list of features, and add_track with
>>>>> -glyph=>'background'. (see the following script, test.pl) This really
>>>>> works
>>>>> as I expect, which will add a colored block at background of all tracks
>>>>> in a
>>>>> panel (including the ruler arrow). You can see the output image in
>>>>> attached
>>>>> file "test.bioperl1.2.3.png"
>>>>>
>>>>> Now, the problem comes: when I switch to Bioperl 1.5 (or 1.6), it does
>>>>> not
>>>>> work. Well, it works, but the highlight part only shrink to a low
>>>>> height,
>>>>> instead of covering all tracks in the panel. I also attached the output
>>>>> here, see the file "test.bioperl1.6.png".
>>>>>
>>>>> I tried to think about the reason, the 'background' module is based on
>>>>> the
>>>>> generic module. What can cause the difference? Is it because $gd->height
>>>>> is
>>>>> different, or the tracks followed with 'background' track can not draw
>>>>> from
>>>>> the first position?
>>>>>
>>>>> Well. I can stick to use Bioperl 1.2.3 to avoid the problem. ("Smart
>>>>> person
>>>>> solve problem, wise person avoid problem"...) But another problem is
>>>>> coming:
>>>>> Bio::Graphics in Bioperl 1.2.3 does not support $panel->create_web_map()
>>>>> function, which means I have to use some higher version if I want to
>>>>> create
>>>>> web map for my graphics, but then I have to give up using highlight
>>>>> background.
>>>>>
>>>>> OK. It's long enough for my first-time submission here. Hope someone can
>>>>> throw me some clue.
>>>>>
>>>>> Thanks ahead!!
>>>>>
>>>>> Xianjun
>>>>>
>>>>>
>>>>> ==================== test.pl =======================
>>>>> #!/usr/bin/perl
>>>>>
>>>>> use strict;
>>>>> use lib "$ENV{HOME}/lib";
>>>>>
>>>>> use Bio::Graphics;
>>>>> use Bio::Graphics::Feature;
>>>>> my $ftr= 'Bio::Graphics::Feature';
>>>>>
>>>>> # processed_transcript
>>>>> my $trans1 =
>>>>> $ftr->new(-start=>50,-end=>10,-name=>'ZK154.1',-type=>"3'-UTR");
>>>>> my $trans2 =
>>>>> $ftr->new(-start=>100,-end=>50,-name=>'ZK154.2',-type=>'CDS');
>>>>> my $trans3 =
>>>>> $ftr->new(-start=>350,-end=>225,-name=>'ZK154.3',-type=>'CDS',
>>>>> -source=>'a');
>>>>> my $trans4 =
>>>>> $ftr->new(-start=>650,-end=>500,-name=>'ZK154.3',-type=>'CDS',
>>>>> -source=>'a');
>>>>> my $trans5 =
>>>>> $ftr->new(-start=>700,-end=>650,-name=>'ZK154.3',-type=>"5'-UTR");
>>>>> my $trans  =
>>>>> $ftr->new(-segments=>[$trans1,$trans2,$trans3,$trans4,$trans5]);
>>>>>
>>>>> # hightlight
>>>>> my $trans31 =
>>>>>
>>>>> $ftr->new(-start=>240,-end=>450,-name=>'hightlight1',-type=>'background',
>>>>> -source=>'a');
>>>>> my $trans41 =
>>>>>
>>>>> $ftr->new(-start=>650,-end=>600,-name=>'hightlight2',-type=>'multihourglass',
>>>>> -source=>'b');
>>>>>
>>>>> my $panel= Bio::Graphics::Panel->new(-width=>1200,
>>>>>                                      -length=>1050,
>>>>>                                      -start =>0,
>>>>>                                      -pad_left=>12,
>>>>>                                      -pad_right=>12);
>>>>>
>>>>> # the following track works as I expected in bioperl 1.2.3, but not in
>>>>> # 1.5 and 1.6
>>>>> $panel->add_track([$trans41,$trans31],
>>>>>                -glyph   => 'background',
>>>>>                -block_bgcolor => sub{return (shift->source eq 'a')?'#cccccc':'#fffc22'},
>>>>>                );
>>>>>
>>>>> $panel->add_track($ftr->new(-start=>100,-end=>1000),
>>>>>                -glyph=>'arrow',
>>>>>                -double=>1,
>>>>>                -tick=>2);
>>>>>
>>>>> $panel->add_track($trans,
>>>>>                -glyph   => 'transcript2', # 'transcript2', #process_5utr',
>>>>>                -fgcolor => 'darkred',
>>>>>                -bgcolor => 'darkred',
>>>>>                -title => '$source',
>>>>>                -link => 'http://www.ensembl.org/Homo_sapiens/transview?transcript=$name',  #EnsEMBL
>>>>>                );
>>>>> print $panel->png;
>>>>>
>>>>> # the following part works in bioperl 1.5 and 1.6, but does not work in
>>>>> # Bioperl 1.2.3
>>>>> my $map = $panel->create_web_map("image");
>>>>> $panel->finished();
>>>>>
>>>>> 1;
>>>>>
>>>>> ==================== background.pm =======================
>>>>> package Bio::Graphics::Glyph::background;
>>>>>
>>>>> use strict;
>>>>> use base 'Bio::Graphics::Glyph::generic';
>>>>> sub pad_top{
>>>>> ?return 0;
>>>>> }
>>>>>
>>>>> sub draw_component {
>>>>> ?my $self = shift;
>>>>> ?#$self->SUPER::draw_component(@_);
>>>>> ?my ($gd,$dx,$dy) = @_;
>>>>> ?my ($left,$top,$right,$bottom) = $self->bounds($dx,$dy);
>>>>>
>>>>> ?# draw an arrow to indicate the direction of transcript
>>>>> ?my $color = $self->option('block_bgcolor') || '#cccccc';
>>>>> ?$gd->filledRectangle($left,0,$right,$gd->height,
>>>>> $self->factory->translate_color($color));
>>>>> }
>>>>>
>>>>> 1;
>>>>>
>>>>> --
>>>>> ==========================================
>>>>> Xianjun Dong
>>>>> PhD student, Lenhard group
>>>>> Computational Biology Unit
>>>>> Bergen Center for Computational Science
>>>>> University of Bergen
>>>>> Hoyteknologisenteret, Thormohlensgate 55
>>>>> N-5008 Bergen, Norway
>>>>> E-mail: xianjun.dong at bccs.uib.no
>>>>> Tel.: +47 555 84022
>>>>> Fax : +47 555 84295
>>>>> ==========================================
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>
>>>>
>>>>
>>>
>>
>> --
>> ==========================================
>> Xianjun Dong
>> PhD student, Lenhard group
>> Computational Biology Unit
>> Bergen Center for Computational Science
>> University of Bergen
>> Hoyteknologisenteret, Thormohlensgate 55
>> N-5008 Bergen, Norway
>> E-mail: xianjun.dong at bccs.uib.no
>> Tel.: +47 555 84022
>> Fax : +47 555 84295
>> ==========================================
>>
>>
>
>
>
> --
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                                   scott at scottcain dot net
> GMOD Coordinator (http://gmod.org/)                     216-392-3087
> Ontario Institute for Cancer Research
>


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research
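
For anyone following the GBrowse code quoted above: the highlight regions
arrive as strings of the form ref:start..end, optionally followed by @color.
Below is a minimal standalone sketch of just that parsing step, with no
Bio::Graphics or GD involved. The helper name parse_h_regions and the sample
strings are made up for illustration; the pattern is the one from the quoted
make_postgrid_callback (with the @ escaped), and 'lightgrey' is the same
fallback colour used there.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of GBrowse): keep only the regions on the
# wanted reference sequence and return them as [start, end, color] triples.
sub parse_h_regions {
    my ($ref, @specs) = @_;
    my @regions;
    for my $spec (@specs) {
        # ref:start..end with an optional @color suffix
        my ($h_ref, $h_start, $h_end, $h_color) =
            $spec =~ /^(.+):(\d+)\.\.(\d+)(?:\@(\S+))?/;
        next unless defined $h_ref && $h_ref eq $ref;
        push @regions, [$h_start, $h_end, $h_color || 'lightgrey'];
    }
    return @regions;
}

# Sample input, invented for this sketch
my @regions = parse_h_regions('chr1',
    'chr1:500..600@red', 'chr2:10..20', 'chr1:700..800');

printf "%d-%d %s\n", @$_ for @regions;
```

Each triple can then be handed to a -postgrid callback such as the quoted
hilite_regions_closure, which turns the coordinates into pixels with
location2pixel before drawing.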


From roy.chaudhuri at gmail.com  Fri Jun 19 06:34:24 2009
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Fri, 19 Jun 2009 11:34:24 +0100
Subject: [Bioperl-l] Problems parsing scientific name from a Genbank file
In-Reply-To: <24095355.post@talk.nabble.com>
References: <24095355.post@talk.nabble.com>
Message-ID: <4A3B69B0.8080305@gmail.com>

Hi Cesar,

I can replicate this using an old Bioperl (version 1.5.2), but it 
appears to be fixed in version 1.6 and bioperl-live - the 
scientific_name method returns "Bacillus anthracis str. Sterne".

Hope this helps.
Roy.

Cesar Arze wrote:
> Hi all,
>    I've searched through the mailing list and bug-tracker looking for any
> indication of this (what I presume to be) bug I have been encountering when
> parsing certain Genbank files using SeqIO::GenBank but have yet to find
> anything. I apologize in advance if this is something that has already been
> addressed.
> 
> When parsing these files and extracting the scientific name it seems that
> line breaks are causing the lineage info found in the ORGANISM section to be
> captured as part of the scientific name. An example of this is accession
> NC_005945:
> 
>   ORGANISM  Bacillus anthracis str. Sterne
>             Bacteria; Firmicutes; Bacillales; Bacillaceae; Bacillus; Bacillus
>             cereus group.
> 
> The lineage entry "Bacillus cereus group" is broken across two lines, which
> causes the scientific name to capture "Bacteria; Firmicutes; Bacillales;
> Bacillaceae; Bacillus; Bacillus", ending up with "Bacillus anthracis str.
> Sterne Bacteria; Firmicutes; Bacillales; Bacillaceae; Bacillus; Bacillus"
> as the final scientific name.
> 
> Not sure if anyone has ever run into this problem but I would very much
> appreciate any help or direction.
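
To make the reported behaviour concrete, here is a rough standalone sketch of
splitting an ORGANISM block into name and lineage. It is not the
Bio::SeqIO::genbank parser and not the fix that went into 1.6; it simply
assumes that the first line holds the organism name and that every
continuation line belongs to the lineage. Real records can wrap a long
organism name onto a second line, which this sketch does not handle. The
helper name split_organism is made up.

```perl
use strict;
use warnings;

# Hypothetical helper: split an ORGANISM block into (name, [lineage taxa]).
# Assumption: the name fits on the first line; the rest is lineage.
sub split_organism {
    my ($block) = @_;
    my @lines = split /\n/, $block;

    my $name = shift @lines;
    $name =~ s/^\s*ORGANISM\s+//;
    $name =~ s/\s+$//;

    # Join wrapped lineage lines with a space so that a taxon broken across
    # two lines ("Bacillus" / "cereus group.") is put back together.
    my @parts;
    for my $line (@lines) {
        $line =~ s/^\s+//;
        $line =~ s/\s+$//;
        push @parts, $line if length $line;
    }
    my $lineage = join ' ', @parts;
    $lineage =~ s/\.$//;

    my @taxa = split /;\s*/, $lineage;
    return ($name, \@taxa);
}

my $block = <<'END';
  ORGANISM  Bacillus anthracis str. Sterne
            Bacteria; Firmicutes; Bacillales; Bacillaceae; Bacillus; Bacillus
            cereus group.
END

my ($name, $taxa) = split_organism($block);
print "name: $name\n";
print "lineage: ", join(' > ', @$taxa), "\n";
```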


From cjfields at illinois.edu  Fri Jun 19 16:57:36 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 19 Jun 2009 15:57:36 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <69C726B6-41DC-41EC-9BC9-DFEC0267CD3B@ucl.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>	<92C15E3391F64BAF801754E924122540@NewLife>
	<200906170927.13273.tristan.lefebure@gmail.com>
	<4A3933D0.4040808@sendu.me.uk>
	<8E010693-30B2-4E54-81CA-97755FC6CAE5@illinois.edu>
	<0500F8EB-F3BD-44E8-801A-7A45783DA40F@ucl.ac.uk>
	<3C32922E-690B-4246-BD44-7600153E57C7@illinois.edu>
	<69C726B6-41DC-41EC-9BC9-DFEC0267CD3B@ucl.ac.uk>
Message-ID: <E53B7AAC-9128-4637-AB0F-AC6C0C195D01@illinois.edu>

So, to follow up (and make sure we don't have any overlapping tuits)  
we should probably determine who wants to work on what (i.e. fastq  
updating, etc). I think it's possible to quickly add in Solexa/ 
Illumina/Sanger fastq similar to BioPython, just don't want to step on  
anyone's toes if they are halfway through doing this.

chris
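
As a reference point for the three variants mentioned above, here is a small
pure-Perl sketch of how their quality strings decode. It is not the
Bio::SeqIO::fastq implementation; the sub name decode_quality and the variant
keys are assumptions made for this example. What it relies on is the usual
convention: Sanger uses Phred scores at ASCII offset 33, Illumina 1.3+ uses
Phred scores at offset 64, and Solexa uses Solexa-scaled scores at offset 64,
which convert to Phred as 10*log10(10**(Q/10) + 1).

```perl
use strict;
use warnings;

# ASCII offset and score scale for each FASTQ variant
my %variant = (
    sanger   => { offset => 33, scale => 'phred'  },
    solexa   => { offset => 64, scale => 'solexa' },
    illumina => { offset => 64, scale => 'phred'  },
);

# Decode a quality string into a list of Phred scores.
sub decode_quality {
    my ($qual, $name) = @_;
    my $v = $variant{$name} or die "unknown FASTQ variant '$name'";
    return map {
        my $q = ord($_) - $v->{offset};
        # Solexa-scaled scores are converted to the Phred scale
        $q = 10 * log(10 ** ($q / 10) + 1) / log(10)
            if $v->{scale} eq 'solexa';
        $q;
    } split //, $qual;
}

my @sanger   = decode_quality('I5+', 'sanger');
my @illumina = decode_quality('hT@', 'illumina');
my @solexa   = decode_quality('h@;', 'solexa');

print "sanger:   @sanger\n";
print "illumina: @illumina\n";
printf "solexa:   %.2f %.2f %.2f\n", @solexa;
```

Working on plain strings like this avoids creating an object per read, which
is the trade-off raised in the quoted discussion about speed.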

On Jun 17, 2009, at 3:36 PM, Elia Stupka wrote:

> Better than colorspaced discussions for sure ;)
>
> Elia
>
> On 17 Jun 2009, at 21:35, Chris Fields wrote:
>
>> So, #1 priority is to get fastq up-to-speed, then maybe assess  
>> other options.
>>
>> Illuminating discussion, thanks Elia!
>>
>> urgh, excuse unintended bad pun above...
>>
>> chris
>>
>> On Jun 17, 2009, at 3:06 PM, Elia Stupka wrote:
>>
>>> Interesting that you mention the database issue. We found that for  
>>> specific memory/CPU intensive things we also switch to using dbs.  
>>> For example, after many years of loyal use of disconnected_ranges  
>>> we switched to a simple SQL implementation of it, because of the  
>>> large performance gains it would give us.  Similarly in Ensembl as  
>>> well as in the old days of bioperl-db we opted for doing subseq  
>>> within SQL where possible.
>>>
>>> Some lean way of SQL'izing specific components could be less  
>>> "disruptive" than avoiding object creation and provide significant  
>>> gains in performance. Could be set as an optional flag, and could  
>>> use temporary ad hoc SQL databases?
>>>
>>> Still, priority now is to make SeqIO compliant with all those  
>>> formats, then we can worry about performance :)
>>>
>>> Elia
>>>
>>> On 17 Jun 2009, at 20:30, Chris Fields wrote:
>>>
>>>> On Jun 17, 2009, at 1:20 PM, Sendu Bala wrote:
>>>>
>>>>> Tristan Lefebure wrote:
>>>>>> Hello,
>>>>>> Regarding next-gen sequences and bioperl, following my  
>>>>>> experience, another issue is bioperl speed. For example, if you  
>>>>>> want to trim bad quality bases at ends of 1E6 Solexa reads  
>>>>>> using Bio::SeqIO::fastq and some methods in Bio::Seq::Quality,  
>>>>>> well, you've got to be patient (but may be I missed some  
>>>>>> shortcuts...).
>>>>>
>>>>> This is my concern as well. Or, rather, is there actually a  
>>>>> significant set of users out there who are dealing with next-gen  
>>>>> sequencing and would consider using BioPerl for their work?
>>>>>
>>>>> I'm working with all the 1000-genomes data at the Sanger, and we  
>>>>> at least are probably never going to use BioPerl for the work.
>>>>
>>>> Are you using pure perl or (gasp) something else?  ;>
>>>>
>>>> Judging by the feedback there are definitely a set of users who  
>>>> would like to integrate nextgen into bioperl somehow, probably to  
>>>> take advantage of other aspects of bioperl.
>>>>
>>>>>> A pure perl solution will be between 100 to 1000x faster...  
>>>>>> Would it be possible to have an ultra-light quality object with  
>>>>>> few simple methods for next-gen reads?
>>>>>
>>>>> The fastq parser itself already seems pretty fast. The way to  
>>>>> get the speedup is to not create any Bio::Seq* objects but just  
>>>>> return the data directly. At that point it's not taking much  
>>>>> advantage of BioPerl. But certainly it could be done...
>>>>
>>>>
>>>> I suppose the best way to assess what needs to be done is come up  
>>>> with a set of 'use cases' specifying what users want so we can  
>>>> design around them, otherwise we're shooting in the dark.
>>>>
>>>> I'm personally wondering if this could be done as a sequence  
>>>> database, something similar in theme to Lincoln's  
>>>> SeqFeature::Store, but sequence only, and returns quality objects  
>>>> in a similar manner (ala Storable)?  Not sure whether that's  
>>>> feasible, but it appears at least scalable.
>>>>
>>>> chris
>>>>
>>>
>>> ---
>>> Senior Lecturer, Bioinformatics
>>> UCL Cancer Institute
>>> Paul O' Gorman Building
>>> University College London
>>> Gower Street
>>> WC1E 6BT
>>> London
>>> UK
>>>
>>> Office (UCL): +44 207 679 6493
>>> Office (ICMS): +44 0207 8822374
>>>
>>> Mobile: +44 7597 566 194
>>> Mobile (Italy): +39 338 8448801
>>
>
> ---
> Senior Lecturer, Bioinformatics
> UCL Cancer Institute
> Paul O' Gorman Building
> University College London
> Gower Street
> WC1E 6BT
> London
> UK
>
> Office (UCL): +44 207 679 6493
> Office (ICMS): +44 0207 8822374
>
> Mobile: +44 7597 566 194
> Mobile (Italy): +39 338 8448801


From biopython at maubp.freeserve.co.uk  Sat Jun 20 04:46:31 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sat, 20 Jun 2009 09:46:31 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<BF526BE6-AFA8-4F66-9665-D6328F9B4FFA@illinois.edu>
	<320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com>
	<1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>
Message-ID: <320fb6e00906200146t547a0492r23d5f123e01098e8@mail.gmail.com>

On Wed, Jun 17, 2009 at 6:06 PM, Chris Fields<cjfields at illinois.edu> wrote:
>
>
> On Jun 17, 2009, at 8:25 AM, Peter wrote:
>
>>> Peter's suggestions also are reasonable, though does biopython have a
>>> separate module for each of these variations?  Our version (I believe)
>>> mainly varied the conversion within Bio::SeqIO::fastq itself based on the
>>> fastq variant passed in as a separate named argument.
>>
>> Biopython's SeqIO gives the three FASTQ variants their own unique
>> names. This format name is a required argument for parsing/writing
>> (we don't try and guess the file format from the data contents).
>> Internally we have three separate FASTQ parsers/writers although
>> they do share code.
>
> We could easily do the same if others agree.  Actually, if we specified that
> shorthand for a variant on a format would be designated as -format =>
> 'format-variant', I think we could easily hack SeqIO to deal with that by
> splitting on '-' and passing everything to the constructor as (-format =>
> 'format', -variant => 'variant').  Very little repeated code in this case,
> just an additional named parameter indicating the format variant (and the
> SeqIO class can do the type checking on that within the constructor).

Yes, when I started using names like "fastq-solexa" I did have in mind
"main-variant" naming convention, and potentially Biopython may one
day actually use this structure when allocating a Bio.SeqIO job to the
appropriate parser or writer.

For now, the Biopython list of formats is fairly short (and there are
relatively few of these sub-formats) so to keep things simple we just
have a flat mapping from the format name (e.g. "fasta", "fastq",
"fastq-solexa") to the parser/writer code.

Peter
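
The 'format-variant' shorthand discussed above could be sketched roughly as
follows. This is only an illustration of the splitting step, not code from
Bio::SeqIO; the helper name split_format is made up, and -variant is the
named parameter proposed in the quoted message rather than an existing
option.

```perl
use strict;
use warnings;

# Hypothetical helper: turn 'fastq-solexa' into
# (-format => 'fastq', -variant => 'solexa').
# A plain 'fasta' yields only (-format => 'fasta').
sub split_format {
    my ($spec) = @_;
    # Split on the first '-' only, so the variant may itself contain '-'
    my ($format, $variant) = split /-/, lc $spec, 2;
    my @args = (-format => $format);
    push @args, (-variant => $variant)
        if defined $variant && length $variant;
    return @args;
}

for my $spec (qw(fasta fastq fastq-solexa fastq-illumina)) {
    my %args = split_format($spec);
    printf "%-15s format=%s variant=%s\n",
        $spec, $args{-format},
        defined $args{-variant} ? $args{-variant} : '(default)';
}
```

The resulting list could be passed straight on to a constructor, leaving the
format class itself to check whether it knows the variant.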


From e.stupka at ucl.ac.uk  Sat Jun 20 16:12:18 2009
From: e.stupka at ucl.ac.uk (Elia Stupka)
Date: Sat, 20 Jun 2009 21:12:18 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <E53B7AAC-9128-4637-AB0F-AC6C0C195D01@illinois.edu>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>	<92C15E3391F64BAF801754E924122540@NewLife>
	<200906170927.13273.tristan.lefebure@gmail.com>
	<4A3933D0.4040808@sendu.me.uk>
	<8E010693-30B2-4E54-81CA-97755FC6CAE5@illinois.edu>
	<0500F8EB-F3BD-44E8-801A-7A45783DA40F@ucl.ac.uk>
	<3C32922E-690B-4246-BD44-7600153E57C7@illinois.edu>
	<69C726B6-41DC-41EC-9BC9-DFEC0267CD3B@ucl.ac.uk>
	<E53B7AAC-9128-4637-AB0F-AC6C0C195D01@illinois.edu>
Message-ID: <F99E2F7F-05F7-462B-A3ED-96E09746994B@ucl.ac.uk>

Hi Chris,

I agree. I have not written a single line of code so far, while Heikki  
has some (but has been silent for a while) and you have perhaps some  
code ready to roll. I am happy to help where needed, just let me know  
what you'd like me to focus on. If you want to go ahead and implement  
the fastq stuff discussed I can focus on bioperl-run.

cheers

Elia


On 19 Jun 2009, at 21:57, Chris Fields wrote:

> So, to follow up (and make sure we don't have any overlapping tuits)  
> we should probably determine who wants to work on what (i.e. fastq  
> updating, etc). I think it's possible to quickly add in Solexa/ 
> Illumina/Sanger fastq similar to BioPython, just don't want to step  
> on anyone's toes if they are halfway through doing this.
>
> chris
>
> On Jun 17, 2009, at 3:36 PM, Elia Stupka wrote:
>
>> Better than colorspaced discussions for sure ;)
>>
>> Elia
>>
>> On 17 Jun 2009, at 21:35, Chris Fields wrote:
>>
>>> So, #1 priority is to get fastq up-to-speed, then maybe assess  
>>> other options.
>>>
>>> Illuminating discussion, thanks Elia!
>>>
>>> urgh, excuse unintended bad pun above...
>>>
>>> chris
>>>
>>> On Jun 17, 2009, at 3:06 PM, Elia Stupka wrote:
>>>
>>>> Interesting that you mention the database issue. We found that  
>>>> for specific memory/CPU intensive things we also switch to using  
>>>> dbs. For example, after many years of loyal use of  
>>>> disconnected_ranges we switched to a simple SQL implementation of  
>>>> it, because of the large performance gains it would give us.   
>>>> Similarly in Ensembl as well as in the old days of bioperl-db we  
>>>> opted for doing subseq within SQL where possible.
>>>>
>>>> Some lean way of SQL'izing specific components could be less  
>>>> "disruptive" than avoiding object creation and provide  
>>>> significant gains in performance. Could be set as an optional  
>>>> flag, and could use temporary ad hoc SQL databases?
>>>>
>>>> Still, priority now is to make SeqIO compliant with all those  
>>>> formats, then we can worry about performance :)
>>>>
>>>> Elia
>>>>
>>>> On 17 Jun 2009, at 20:30, Chris Fields wrote:
>>>>
>>>>> On Jun 17, 2009, at 1:20 PM, Sendu Bala wrote:
>>>>>
>>>>>> Tristan Lefebure wrote:
>>>>>>> Hello,
>>>>>>> Regarding next-gen sequences and bioperl, following my  
>>>>>>> experience, another issue is bioperl speed. For example, if  
>>>>>>> you want to trim bad quality bases at ends of 1E6 Solexa reads  
>>>>>>> using Bio::SeqIO::fastq and some methods in Bio::Seq::Quality,  
>>>>>>> well, you've got to be patient (but maybe I missed some  
>>>>>>> shortcuts...).
>>>>>>
>>>>>> This is my concern as well. Or, rather, is there actually a  
>>>>>> significant set of users out there who are dealing with next- 
>>>>>> gen sequencing and would consider using BioPerl for their work?
>>>>>>
>>>>>> I'm working with all the 1000-genomes data at the Sanger, and  
>>>>>> we at least are probably never going to use BioPerl for the work.
>>>>>
>>>>> Are you using pure perl or (gasp) something else?  ;>
>>>>>
>>>>> Judging by the feedback there are definitely a set of users who  
>>>>> would like to integrate nextgen into bioperl somehow, probably  
>>>>> to take advantage of other aspects of bioperl.
>>>>>
>>>>>>> A pure perl solution will be between 100 to 1000x faster...  
>>>>>>> Would it be possible to have an ultra-light quality object  
>>>>>>> with few simple methods for next-gen reads?
>>>>>>
>>>>>> The fastq parser itself already seems pretty fast. The way to  
>>>>>> get the speedup is to not create any Bio::Seq* objects but just  
>>>>>> return the data directly. At that point it's not taking much  
>>>>>> advantage of BioPerl. But certainly it could be done...
>>>>>
>>>>>
>>>>> I suppose the best way to assess what needs to be done is come  
>>>>> up with a set of 'use cases' specifying what users want so we  
>>>>> can design around them, otherwise we're shooting in the dark.
>>>>>
>>>>> I'm personally wondering if this could be done as a sequence  
>>>>> database, something similar in theme to Lincoln's  
>>>>> SeqFeature::Store, but sequence only, and returns quality  
>>>>> objects in a similar manner (ala Storable)?  Not sure whether  
>>>>> that's feasible, but it appears at least scalable.
>>>>>
>>>>> chris
>>>>>
>>>>
>>>> ---
>>>> Senior Lecturer, Bioinformatics
>>>> UCL Cancer Institute
>>>> Paul O' Gorman Building
>>>> University College London
>>>> Gower Street
>>>> WC1E 6BT
>>>> London
>>>> UK
>>>>
>>>> Office (UCL): +44 207 679 6493
>>>> Office (ICMS): +44 0207 8822374
>>>>
>>>> Mobile: +44 7597 566 194
>>>> Mobile (Italy): +39 338 8448801
>>>
>>
>> ---
>> Senior Lecturer, Bioinformatics
>> UCL Cancer Institute
>> Paul O' Gorman Building
>> University College London
>> Gower Street
>> WC1E 6BT
>> London
>> UK
>>
>> Office (UCL): +44 207 679 6493
>> Office (ICMS): +44 0207 8822374
>>
>> Mobile: +44 7597 566 194
>> Mobile (Italy): +39 338 8448801
>

---
Senior Lecturer, Bioinformatics
UCL Cancer Institute
Paul O' Gorman Building
University College London
Gower Street
WC1E 6BT
London
UK

Office (UCL): +44 207 679 6493
Office (ICMS): +44 0207 8822374

Mobile: +44 7597 566 194
Mobile (Italy): +39 338 8448801


From lincoln.stein at gmail.com  Sat Jun 20 17:01:43 2009
From: lincoln.stein at gmail.com (Lincoln Stein)
Date: Sat, 20 Jun 2009 17:01:43 -0400
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
Message-ID: <6dce9a0b0906201401j40175dbdscd71360396fe9f7a@mail.gmail.com>

Hi All,

Apropos of this, I am about to release to CPAN a BioPerl interface to SAM
and BAM files. The documentation is still in progress, but you can get CVS
access here:

% cvs -d :pserver:anonymous@gmod.cvs.sourceforge.net:/cvsroot/gmod co gbrowse-adaptors/Bio-SamTools

Lincoln

On Wed, Jun 17, 2009 at 7:29 AM, Elia Stupka <e.stupka at ucl.ac.uk> wrote:

> Dear all,
>
> after several years of absence I am slowly coming back to Bioperl, and hope
> to contribute again to its development.
>
> One area that I was thinking of starting from, since we are actively
> involved with it, is to improve BioPerl's support for next-gen sequencing
> data, tools, etc. Since I am sure I have missed out on a lot of recent
> developments, do let me know if/what is useful.
>
> One example that comes to mind is that the conversion of various formats
> to/from FASTQ does not seem to be supported. Some code can be found within
> Li Heng's script: http://maq.sourceforge.net/fq_all2std.pl but it would be
> good if it could make its way into SeqIO? And similarly, potentially, for
> other next-gen sequence formats?
>
> Similarly, there seems to be little in bioperl-run to support tools that
> have been developed in this area, such as Maq, BowTie, TopHat, etc?
>
> Do let me know if there is a past thread on this, or other people actively
> developing, etc. so that I can find out what priorities are.
>
> thanks and best regards to all (old friends and new),
>
> Elia
>
> ---
> Senior Lecturer, Bioinformatics
> UCL Cancer Institute
> Paul O' Gorman Building
> University College London
> Gower Street
> WC1E 6BT
> London
> UK
>
> Office (UCL): +44 207 679 6493
> Office (ICMS): +44 0207 8822374
>
> Mobile: +44 7597 566 194
> Mobile (Italy): +39 338 8448801
>


-- 
Lincoln D. Stein
Director, Informatics and Biocomputing Platform
Ontario Institute for Cancer Research
101 College St., Suite 800
Toronto, ON, Canada M5G0A3
416 673-8514
Assistant: Renata Musa <Renata.Musa at oicr.on.ca>


From hartzell at alerce.com  Mon Jun 22 09:18:20 2009
From: hartzell at alerce.com (George Hartzell)
Date: Mon, 22 Jun 2009 06:18:20 -0700
Subject: [Bioperl-l] Anyone at YAPC?
Message-ID: <19007.33948.411442.197063@already.dhcp.gene.com>


I'm in Pittsburgh at YAPC 10 and would be happy to say hello to folks.

g.


From cjfields1 at gmail.com  Mon Jun 22 10:05:56 2009
From: cjfields1 at gmail.com (Chris Fields)
Date: Mon, 22 Jun 2009 09:05:56 -0500
Subject: [Bioperl-l] changing parameters in Bio::Tools::Run::RemoteBlast
In-Reply-To: <F52FFB80A7304749B467C46E10A2869D@jonas>
References: <F52FFB80A7304749B467C46E10A2869D@jonas>
Message-ID: <67ABC7E3-216E-4F5A-B18E-A775A6B4D8F7@gmail.com>

Jonas,

The best place to send questions is to the mail list (which I've  
cc'd).  If you reply make sure to keep the mail list in the reply-to.

There are two ways to set the parameters you want.  I'll show you what  
I consider the best, but I have no way to test it ATM.

$factory->submit_parameter($foo => 'bar')

is the syntax for setting PUT parameters.  Sad to see they didn't  
provide you with the exact PUT parameter names (as follows):

Max target sequences = 100 # MAX_NUM_SEQ
Expect threshold = 10  # EXPECT
Gap Costs = Existence 11 Extension 1   # GAPCOSTS
Compositional adjustments = Conditional compositional score matrix  
adjustment # COMPOSITION_BASED_STATISTICS

'Compositional adjustments' is as follows (from command-line blastall):

   -C  Use composition-based score adjustments for blastp or tblastn:
       As first character:
       D or d: default (equivalent to T)
       0 or F or f: no composition-based statistics
       1: Composition-based statistics as in NAR 29:2994-3005, 2001
       2 or T or t: Composition-based score adjustments as in
           Bioinformatics 21:902-911, 2005, conditioned on sequence
           properties
       3: Composition-based score adjustment as in Bioinformatics
           21:902-911, 2005, unconditionally
       For programs other than tblastn, must either be absent or be D,
       F or 0.
       As second character, if first character is equivalent to
       1, 2, or 3:

After the factory line and prior to the BLAST call you can add in the  
following (completely untested, excuse any possible mistakes) code:

my %put = (
    MAX_NUM_SEQ => 100,
    EXPECT      => 10,
    GAPCOSTS    => '11 1',
    COMPOSITION_BASED_STATISTICS => 2 # could be 1 as well
);

for my $putName (keys %put) {
    $factory->submit_parameter($putName, $put{$putName});
}


chris

On Jun 22, 2009, at 8:14 AM, Jonas Schaer wrote:

> Hi there,
> I hope it's OK to ask you a question about the bio perl module   
> Bio::Tools::Run::RemoteBlast.
> My problem is, that I get different results using this perl-skript:
>
> #######################################################################################################################################################################################
>  use Bio::Seq::SeqFactory;
>  use Bio::Tools::Run::RemoteBlast;
>  use strict;
>  my @blast_report;
>  my $prog = 'blastp';
>  my $db   = 'nr';
>  my $e_val= '1e-10';
>  my @params = ( '-prog' => $prog,
>         '-data' => $db,
>         '-expect' => $e_val,
>         '-readmethod' => 'SearchIO' );
>  my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
>  #my $input = @_;
>  my $blast_seq =
> 'MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTAMATGPDPDDEYE';
>  #$v is just to turn on and off the messages
>  my $v = 1;
>  my $seqbuilder = Bio::Seq::SeqFactory->new('-type' =>  
> 'Bio::PrimarySeq');
>  my $seq = $seqbuilder->create(-seq =>$blast_seq, -display_id =>  
> "$blast_seq");
>  my $filename='temp2.out';
>  my $r = $factory->submit_blast($seq);
>  print STDERR "waiting..." if( $v > 0 );
>    while ( my @rids = $factory->each_rid )
>    {
>        foreach my $rid ( @rids )
>        {
>            my $rc = $factory->retrieve_blast($rid);
>            if( !ref($rc) )
>            {
>                if( $rc < 0 )
>                {
>                    $factory->remove_rid($rid);
>                }
>                print STDERR "." if ( $v > 0 );
>            }
>                else
>                {
>                    my $result = $rc->next_result();
>                    $factory->save_output($filename);
>                    $factory->remove_rid($rid);
>                    print "\nQuery Name: ", $result->query_name(),  
> "\n";
>                    while ( my $hit = $result->next_hit )
>                    {
>                        next unless ( $v > 0);
>                        print "\thit name is ", $hit->name, "\n";
>                        while( my $hsp = $hit->next_hsp )
>                        {
>                            print "\t\tscore is ", $hsp->score, "\n";
>                        }
>                    }
>                }
>        }
>
>
>    }
> @blast_report = get_file_data ($filename);
> return @blast_report;
>
>
> sub get_file_data
> {
>    use strict;
>    my($filename) = @_;
>    use strict;
>    use warnings;
>    # Initialize variables
>    my @filedata = ( );
>    unless( open(GET_FILE_DATA, $filename) )
>    {
>        print STDERR "Cannot open file \"$filename\"\n\n";
>        exit;
>    }
>    @filedata = <GET_FILE_DATA>;
>    close GET_FILE_DATA;
>    print @filedata;
>    return @filedata;
> }
>
> #######################################################################################################################################################################################
>
> ... and the blastp on the ncbi-homepage. The people from NCBI wrote  
> me that I have to change some parameters:
> ""
> You need to have the following:
>
>
> Max target sequences = 100
> Expect threshold = 10
> Gap Costs = Existence 11 Extension 1
> Compositional adjustments = Conditional compositional score matrix  
> adjustment""
>
> Could you please tell me exactly how to change this parameters  
> within my perl-skript? I think I have to use the "put" command, but  
> I just cannot find out, how...
>
> Regards and thank you so much in advance :),
>
> Jonas Schaer


From biopython at maubp.freeserve.co.uk  Mon Jun 22 10:24:55 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 22 Jun 2009 15:24:55 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<BF526BE6-AFA8-4F66-9665-D6328F9B4FFA@illinois.edu>
	<320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com>
	<1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>
Message-ID: <320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com>

On Wed, Jun 17, 2009 at 6:06 PM, Chris Fields wrote:
> Peter wrote:
>> Other issues to keep in mind:
>>
>> (3) There should be no warning parsing files where the optional repeated
>> title is missing on the "+" lines (as discussed earlier on the BioPerl
>> list).
>
> Agreed, though we'll have to check the current fastq parser to see if that's
> currently the case.  I thought that was fixed but maybe not?
>
>> (4) When writing FASTQ files should BioPerl omit the optional repeated
>> title on the "+" line? Biopython omits this as I understand this to be
>> common practice, and can make a big difference to file sizes - especially
>> on short read data from Solexa/Illumina.
>
> Agreed, particularly if it's commonly encountered.
>
>> (5) Also test reading and writing files with an optional description (as
>> well as an identifier) on the "@" (and "+") lines. See the NCBI SRA
>> for examples, e.g.
>>
>> @SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
>> GGGTGATGGCCGCTGCCGATGGCGTCAAATCCCACC
>> +SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
>> IIIIIIIIIIIIIIIIIIIIIIIIIIIIII9IG9IC
>
> Should be easy enough to implement with a simple regex.
>
>> (6) Test reading and writing files where the encoded quality string starts
>> with a "@" or a "+" character, e.g.
>> http://lists.open-bio.org/pipermail/bioperl-l/2009-May/029911.html
>>
>> Peter
>
> Mark, getting all that? ;>
>
> chris

Another couple of points that I should have remembered earlier,
related to converting between PHRED scores and Solexa scores.
On the bright side, with Illumina abandoning the Solexa scores
in pipeline 1.3+, these issues will go away with time:

(7) If BioPerl will be converting Solexa scores to/from PHRED
scores as integers automatically (as discussed earlier), make
sure you round to the nearest whole number (don't just truncate
with a call to int!). MAQ does this by adding 0.5 before calling
int (while in Biopython I just use Python's round function).

(8) When asked to write out an old Solexa style FASTQ file,
what will you do if given a standard Sanger FASTQ file (or a
new Illumina 1.3+ FASTQ file) containing a base with PHRED
quality zero? This maps to a Solexa quality of minus infinity...
Right now the development version of Biopython will throw an
error in this situation, but mapping to the lowest observed
Solexa score might be reasonable.
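
A minimal Perl sketch of points (7) and (8), assuming the usual
definitions Q_phred = -10*log10(p) and Q_solexa = -10*log10(p/(1-p)).
The helper names are made up for illustration; they are not BioPerl or
Biopython API:

```perl
use strict;
use warnings;

# Hypothetical helpers, not part of BioPerl.

# Round to the nearest integer (halves away from zero). Plain int()
# would truncate, and int($x + 0.5) alone is wrong for negative scores.
sub round_nearest {
    my ($x) = @_;
    return int($x + ($x < 0 ? -0.5 : 0.5));
}

# Solexa -> PHRED: Q_phred = 10 * log10(10**(Q_solexa/10) + 1)
sub solexa_to_phred {
    my ($qs) = @_;
    return round_nearest(10 * log(10 ** ($qs / 10) + 1) / log(10));
}

# PHRED -> Solexa: Q_solexa = 10 * log10(10**(Q_phred/10) - 1)
# PHRED 0 would need log(0), so return undef and let the caller decide.
sub phred_to_solexa {
    my ($qp) = @_;
    return undef if $qp <= 0;
    return round_nearest(10 * log(10 ** ($qp / 10) - 1) / log(10));
}

for my $qs (-5, 0, 10, 40) {
    printf "Solexa %3d -> PHRED %d\n", $qs, solexa_to_phred($qs);
}
for my $qp (0, 2, 10, 40) {
    my $qs = phred_to_solexa($qp);
    printf "PHRED %3d -> Solexa %s\n", $qp, defined $qs ? $qs : 'undefined';
}
```

Returning undef for PHRED 0 leaves the policy (throw an error, or map
to the lowest Solexa score) to the calling code.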

Peter


From cjfields at illinois.edu  Mon Jun 22 09:54:22 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 22 Jun 2009 08:54:22 -0500
Subject: [Bioperl-l] Anyone at YAPC?
In-Reply-To: <19007.33948.411442.197063@already.dhcp.gene.com>
References: <19007.33948.411442.197063@already.dhcp.gene.com>
Message-ID: <FE1E935F-C10A-471F-8E9A-27658F5216A2@illinois.edu>

I think some of the regular #bioperl folk are there (Jay Hannah, R.  
Buels, etc).  May be worth going on IRC to find everyone.

I'm giving serious thought to going next year if I can get enough work  
done towards a perl6 or Moose-based bioperl.

chris

On Jun 22, 2009, at 8:18 AM, George Hartzell wrote:

>
> I'm in Pittsburgh at YAPC 10 and would be happy to say hello to folks.
>
> g.
>


From vofford at rvc.ac.uk  Mon Jun 22 12:10:43 2009
From: vofford at rvc.ac.uk (Offord, Victoria)
Date: Mon, 22 Jun 2009 17:10:43 +0100
Subject: [Bioperl-l] Clustalw
Message-ID: <C79C4DD076C6834EB10EB0CA1F50635513109AF5@hhw2kex01.rvc.ac.uk>

Hi,

 
Can anyone help and tell me where I am going wrong please :)

I am getting this error from the following script:

 
------------- EXCEPTION: Bio::Root::Exception -------------

MSG: ClustalW call (clustalw align  -infile=/tmp/8PVli9JWEa/L_pxrEtzD1
-output=gcg   -matrix=BLOSUM -ktuple=2
-outfile=/tmp/8PVli9JWEa/XtAremlqau 2>&1) failed to start: 0 | No such
file or directory

STACK: Error::throw

STACK: Bio::Root::Root::throw
/usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357

STACK: Bio::Tools::Run::Alignment::Clustalw::_run
/usr/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm:756

STACK: Bio::Tools::Run::Alignment::Clustalw::align
/usr/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm:515

STACK: tester.pl:25

-----------------------------------------------------------

 
#------------------------------SCRIPT------------------------------#

#!/usr/bin/perl -w

use Bio::Tools::Run::Alignment::Clustalw;

$ENV{CLUSTALDIR} = '/var/local/clustalw-2.0.9';

use Bio::Seq;

 
 my @params = ('ktuple' => 2, 'matrix' => 'BLOSUM');

 my $factory = Bio::Tools::Run::Alignment::Clustalw->new(@params);

 
my $a = "NPFECDCSMEWMQRVNNLTARQHPKILDLPNVECIMPHARGTPIRPIISLKPKDFLCK";

my $b =
"NPFECDCSMEWLQRINNLTTRQHPHVVDLGNIECLMPHSRSAPLRPLASLSASDFVCKYESHCPP";

my $seq1 = Bio::Seq->new ( -seq  => $a,

                           -id   => 'real',

                           -desc => 'this is a real Seq');

 my $seq2 = Bio::Seq->new ( -seq  => $b,

                           -id   => 'test',

                           -desc => 'this is a test Seq');


my @seq_array = ($seq1,$seq2);

 
my $seq_array_ref = \@seq_array;

my $aln = $factory->align($seq_array_ref);

 
From Kevin.M.Brown at asu.edu  Mon Jun 22 12:48:27 2009
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Mon, 22 Jun 2009 09:48:27 -0700
Subject: [Bioperl-l] Clustalw
In-Reply-To: <C79C4DD076C6834EB10EB0CA1F50635513109AF5@hhw2kex01.rvc.ac.uk>
References: <C79C4DD076C6834EB10EB0CA1F50635513109AF5@hhw2kex01.rvc.ac.uk>
Message-ID: <1A4207F8295607498283FE9E93B775B4060B9BAF@EX02.asurite.ad.asu.edu>

Do you have ClustalW installed and in your path? 
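
A quick way to check from Perl, as a rough sketch: the helper below is
hypothetical (not part of BioPerl), uses only core modules, and assumes
a Unix-like system where the binary is simply called 'clustalw'.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return the full path of the first executable
# called $name found in the directories of $ENV{PATH}, or undef.
sub find_in_path {
    my ($name) = @_;
    for my $dir (File::Spec->path) {
        my $candidate = File::Spec->catfile($dir, $name);
        return $candidate if -f $candidate && -x _;
    }
    return undef;
}

my $clustalw = find_in_path('clustalw');
print defined $clustalw
    ? "clustalw found at $clustalw\n"
    : "clustalw is not on PATH\n";
```

If the binary lives outside your PATH, the Clustalw.pm documentation
suggests exporting CLUSTALDIR in the shell, or setting it inside a
BEGIN block placed before the 'use' line, so that it is already set
when the module is loaded.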

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Offord, Victoria
> Sent: Monday, June 22, 2009 9:11 AM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Clustalw
> 
> Hi,
> 
>  
> 
> Can anyone help and tell me where I am going wrong please :)
> 
> I am getting this error from the following script:
> 
>  
> 
>  
> 
> ------------- EXCEPTION: Bio::Root::Exception -------------
> 
> MSG: ClustalW call (clustalw align  -infile=/tmp/8PVli9JWEa/L_pxrEtzD1
> -output=gcg   -matrix=BLOSUM -ktuple=2
> -outfile=/tmp/8PVli9JWEa/XtAremlqau 2>&1) failed to start: 0 | No such
> file or directory
> 
> STACK: Error::throw
> 
> STACK: Bio::Root::Root::throw
> /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
> 
> STACK: Bio::Tools::Run::Alignment::Clustalw::_run
> /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm:756
> 
> STACK: Bio::Tools::Run::Alignment::Clustalw::align
> /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm:515
> 
> STACK: tester.pl:25
> 
> -----------------------------------------------------------
> 
>  
> 
>  
> 
>  
> 
>  
> 
> #--------------------------------------------SCRIPT-----------
> ----------
> --------------------------#
> 
> #!/usr/bin/perl -w
> 
> use Bio::Tools::Run::Alignment::Clustalw;
> 
> $ENV{CLUSTALDIR} = '/var/local/clustalw-2.0.9';
> 
> use Bio::Seq;
> 
>  
> 
>  my @params = ('ktuple' => 2, 'matrix' => 'BLOSUM');
> 
>  my $factory = Bio::Tools::Run::Alignment::Clustalw->new(@params);
> 
>  
> 
> my $a = "NPFECDCSMEWMQRVNNLTARQHPKILDLPNVECIMPHARGTPIRPIISLKPKDFLCK";
> 
> my $b =
> "NPFECDCSMEWLQRINNLTTRQHPHVVDLGNIECLMPHSRSAPLRPLASLSASDFVCKYESHCPP";
> 
> my $seq1 = Bio::Seq->new ( -seq  => $a,
> 
>                            -id   => 'real',
> 
>                            -desc => 'this is a real Seq');
> 
>  my $seq2 = Bio::Seq->new ( -seq  => $b,
> 
>                            -id   => 'test',
> 
>                            -desc => 'this is a test Seq');
> 
> 
>                            
> 
> my @seq_array = ($seq1,$seq2);
> 
>  
> 
> my $seq_array_ref = \@seq_array;
> 
> my $aln = $factory->align($seq_array_ref);
> 
>  
> 
>  
> 
>  
> 
> 


From cjfields at illinois.edu  Mon Jun 22 15:20:14 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 22 Jun 2009 14:20:14 -0500
Subject: [Bioperl-l] bioperl-dev or branch? : redux
In-Reply-To: <6DF025D32D664F61BC64B49184A2E6DD@NewLife>
References: <991fb8210905210826v2a7990c0u90fcb3256f54b7d7@mail.gmail.com>
	<D719EC86-CB16-4875-833B-2818C26030C8@duke.edu>
	<6DF025D32D664F61BC64B49184A2E6DD@NewLife>
Message-ID: <4766E259-B184-4552-817E-FBBB3A71A17F@illinois.edu>

On Jun 17, 2009, at 11:47 AM, Mark A. Jensen wrote:

> Hi All,
> I thought I'd revisit this thread, since in the last couple weeks,
> have used both techniques (bioperl-dev and branch from trunk) to
> produce completed projects. My thoughts:
>
> Using bioperl-dev was very nice for creating Bio::Search::Tiling, a
> new addition to the core api. There was no pressure to conform to the
> existing api there. In particular, there was no implicit insistence to
> make things work through Bio::Search::Utils, and I was free to factor
> it out. The Tiling api was definitely unstable until the end, when it
> was ported to the core. As I made regular reports to bioperl-l,
> everything was transparent and up front, and I received excellent
> suggestions there (as usual).
> For Bio::Restriction, using the branch was just as natural. Here, the
> existing structure was well established, and all the work needed to
> happen beneath the api. All old t/Restriction tests needed to pass,
> and additional ones created for the new functionality. So here, using
> bioperl-dev wasn't natural, even though some "experiments" needed to
> be tried (some succeeded and some failed, as you can see in the
> commentary at Bug #2855). Even though the new code turned out to
> require substantial effort, the effort was required to fix a true bug
> in the working core, and any fixes needed to work transparently with
> respect to the users for whom this bug had not been an issue. Using
> the branch made it relatively easy to merge quickly back into the core
> when done, and there is a certain psychological pressure too provided
> by an open branch which is helpful.
>
> Hilmar raised the very good point in the previous discussion that
> (essentially) bioperl-dev shouldn't become a sandbox with lots of
> unfinished code scraps and derelict stuff that doesn't work. My view
> is bioperl-dev will become a sandbox only if we treat it like
> one. I've filled out the Bioperl-dev page on the wiki
> (http://www.bioperl.org/wiki/Bioperl-dev) with this in mind. Providing
> some recognition to devs there whose modules become part of the
> core may be a better way to insure that projects that are started on
> bioperl-dev actually get finished, than to prescribe beforehand what
> kinds of projects may get started. I believe this follows the adage of
> liberality on what is accepted, and strictness on what is emitted.
>
> cheers, MAJ

The main reason I wanted a bioperl-dev is for some code or  
implementations that don't seem to fit on a branch or directly into  
core, but would definitely be of use.  The tendency in the past has  
been to accept anything that works into core (the 'bazaar' approach).   
Initially that worked well, but the long-term end result has become  
potentially unmaintainable code bloat.  Committing new code to a  
branch isn't a great idea either, primarily b/c the code may be lost  
to the branch if it isn't followed up and remerged into trunk.  And  
forcing the code to fit into bioperl (or vice versa, which happened  
re: Feature Annotation) isn't the best way either.

Like Hilmar, though, I don't want dev to become a (sandbox|code  
dumping ground) either, so I think some additional discussion is  
warranted if anyone else wants to chime in.

chris


From mauricio at open-bio.org  Mon Jun 22 15:56:33 2009
From: mauricio at open-bio.org (Mauricio Herrera Cuadra)
Date: Mon, 22 Jun 2009 14:56:33 -0500
Subject: [Bioperl-l] 1.6 on doc.bioperl.org?
In-Reply-To: <A53006055C854297AAA58F6650F4F867@NewLife>
References: <17AD00895AFD43E1A1436D1065092BAC@NewLife>
	<4A3134EB.4080702@open-bio.org>
	<A53006055C854297AAA58F6650F4F867@NewLife>
Message-ID: <4A3FE1F1.40607@open-bio.org>

Sorry for the delay. Pdoc site has been updated with docs for 1.6.0 
release and latest code from bioperl-live. Also added bioperl-dev and 
bioperl-pise to the list.

Cheers,
Mauricio.


Mark A. Jensen wrote:
> cheers Mauricio! MAJ
> ----- Original Message ----- From: "Mauricio Herrera Cuadra" 
> <mauricio at open-bio.org>
> To: "Mark A. Jensen" <maj at fortinbras.us>
> Cc: "Chris Fields" <cjfields at illinois.edu>; "BioPerl List" 
> <bioperl-l at bioperl.org>
> Sent: Thursday, June 11, 2009 12:46 PM
> Subject: Re: [Bioperl-l] 1.6 on doc.bioperl.org?
> 
> 
>> Hi Mark,
>>
>> I'll take a look into this sometime between today and tomorrow. Will 
>> keep you posted. Thanks for the heads up :)
>>
>> Mauricio.
>>
>>
>> Mark A. Jensen wrote:
>>> Hi Chris and list-
>>> Will documentation for release 1.6 be available in pdoc on 
>>> doc.bioperl.org?
>>> I notice also that autogenerated documentation for bioperl-live 
>>> doesn't contain
>>> new modules (or HIVQuery & Tiling, anyway ;) )--
>>> cheers, Mark
>>
>>
> 
> 


From cjfields at illinois.edu  Mon Jun 22 16:29:46 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 22 Jun 2009 15:29:46 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<BF526BE6-AFA8-4F66-9665-D6328F9B4FFA@illinois.edu>
	<320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com>
	<1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>
	<320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com>
Message-ID: <CECB8B74-C423-4792-8857-0F3E40EAECE3@illinois.edu>

On Jun 22, 2009, at 9:24 AM, Peter wrote:

> On Wed, Jun 17, 2009 at 6:06 PM, Chris Fields wrote:
>> Peter wrote:
>>> Other issues to keep in mind:
>>>
>>> (3) There should be no warning parsing files where the optional  
>>> repeated
>>> title is missing on the "+" lines (as discussed earlier on the  
>>> BioPerl
>>> list).
>>
>> Agreed, though we'll have to check the current fastq parser to see  
>> if that's
>> currently the case.  I thought that was fixed but maybe not?
>>
>>> (4) When writing FASTQ files should BioPerl omit the optional  
>>> repeated
>>> title on the "+" line? Biopython omits this as I understand this  
>>> to be
>>> common practice, and can make a big difference to file sizes -  
>>> especially
>>> on short read data from Solexa/Illumina.
>>
>> Agreed, particularly if it's commonly encountered.
>>
>>> (5) Also test reading and writing files with an optional  
>>> description (as
>>> well as an identifier) on the "@" (and "+") lines. See the NCBI SRA
>>> for examples, e.g.
>>>
>>> @SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
>>> GGGTGATGGCCGCTGCCGATGGCGTCAAATCCCACC
>>> +SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
>>> IIIIIIIIIIIIIIIIIIIIIIIIIIIIII9IG9IC
>>
>> Should be easy enough to implement with a simple regex.
>>
>>> (6) Test reading and writing files where the encoded quality  
>>> string starts
>>> with a "@" or a "+" character, e.g.
>>> http://lists.open-bio.org/pipermail/bioperl-l/2009-May/029911.html
>>>
>>> Peter
>>
>> Mark, getting all that? ;>
>>
>> chris
>
> Another couple of points that I should have remembered earlier,
> related to converting between PHRED scores and Solexa scores.
> On the bright side, with Illumina abandoning the Solexa scores
> in pipeline 1.3+, these issues will go away with time:
>
> (7) If BioPerl will be converting Solexa scores to/from PHRED
> scores as integers automatically (as discussed earlier), make
> sure you round to the nearest whole number (don't just truncate
> with a call to int!). MAQ does this by adding 0.5 before calling
> int (while in Biopython I just use Python's round function).

That can probably be done with sprintf if needed.  It avoids a call to  
POSIX functions.
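
Something along these lines, as a small untested-in-BioPerl sketch
(round_score is a made-up name). It also shows why int($q + 0.5) on its
own is not enough once negative Solexa scores are involved:

```perl
use strict;
use warnings;

# Hypothetical helper: nearest integer via sprintf, no POSIX needed.
# The "+ 0" turns the formatted string back into a number.
sub round_score {
    my ($q) = @_;
    return sprintf('%.0f', $q) + 0;
}

for my $q (2.4, 2.6, -4.4, -4.6) {
    my $truncated = int($q);          # rounds toward zero
    my $plus_half = int($q + 0.5);    # only right for non-negative input
    printf "%5.1f  int=%2d  int(+0.5)=%2d  sprintf=%2d\n",
        $q, $truncated, $plus_half, round_score($q);
}
```

One caveat: exact halves (2.5, 3.5, ...) are typically rounded to the
nearest even integer by sprintf, which differs from the add-0.5
approach. Converted quality scores rarely land exactly on a half, but
it is worth a test case.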

> (8) When asked to write out an old Solexa style FASTQ file,
> what will you do if given a standard Sanger FASTQ file (or a
> new Illumina 1.3+ FASTQ file) containing a base with PHRED
> quality zero? This maps to a Solexa quality of minus infinity...
> Right now the development version of Biopython will throw an
> error in this situation, but mapping to the lowest observed
> Solexa score might be reasonable.
>
> Peter

Maybe address with a warning followed by assigning to the lowest  
solexa score?
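
Roughly like this, as a sketch only. The function name is hypothetical,
and -5 as the lowest Solexa score is an assumption taken from this
thread rather than from any spec:

```perl
use strict;
use warnings;

# Assumption: -5 is the lowest score written to Solexa FASTQ files.
use constant MIN_SOLEXA => -5;

# Hypothetical helper, not part of BioPerl.
sub phred_to_solexa_clamped {
    my ($phred) = @_;
    if ($phred <= 0) {
        # PHRED 0 has no finite Solexa equivalent: warn, then clamp.
        warn "PHRED $phred has no finite Solexa equivalent; using "
            . MIN_SOLEXA . "\n";
        return MIN_SOLEXA;
    }
    my $solexa = 10 * log(10 ** ($phred / 10) - 1) / log(10);
    # PHRED 1 works out to about -5.9, so clamp the low end as well.
    $solexa = MIN_SOLEXA if $solexa < MIN_SOLEXA;
    return sprintf('%.0f', $solexa) + 0;
}

printf "PHRED %2d -> Solexa %d\n", $_, phred_to_solexa_clamped($_)
    for (0, 1, 2, 20);
```

Whether the warning should be emitted once per file or once per base is
a separate question; per base could get very noisy on short-read data.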

chris


From cjfields at illinois.edu  Mon Jun 22 16:27:32 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 22 Jun 2009 15:27:32 -0500
Subject: [Bioperl-l] 1.6 on doc.bioperl.org?
In-Reply-To: <4A3FE1F1.40607@open-bio.org>
References: <17AD00895AFD43E1A1436D1065092BAC@NewLife>
	<4A3134EB.4080702@open-bio.org>
	<A53006055C854297AAA58F6650F4F867@NewLife>
	<4A3FE1F1.40607@open-bio.org>
Message-ID: <D9414186-E1DD-47B5-A0CF-9B96CD8151F8@illinois.edu>

np.  Thanks Mauricio!

chris

On Jun 22, 2009, at 2:56 PM, Mauricio Herrera Cuadra wrote:

> Sorry for the delay. Pdoc site has been updated with docs for 1.6.0  
> release and latest code from bioperl-live. Also added bioperl-dev  
> and bioperl-pise to the list.
>
> Cheers,
> Mauricio.
>
>
> Mark A. Jensen wrote:
>> cheers Mauricio! MAJ
>> ----- Original Message ----- From: "Mauricio Herrera Cuadra" <mauricio at open-bio.org 
>> >
>> To: "Mark A. Jensen" <maj at fortinbras.us>
>> Cc: "Chris Fields" <cjfields at illinois.edu>; "BioPerl List" <bioperl-l at bioperl.org 
>> >
>> Sent: Thursday, June 11, 2009 12:46 PM
>> Subject: Re: [Bioperl-l] 1.6 on doc.bioperl.org?
>>> Hi Mark,
>>>
>>> I'll take a look into this sometime between today and tomorrow.  
>>> Will keep you posted. Thanks for the heads up :)
>>>
>>> Mauricio.
>>>
>>>
>>> Mark A. Jensen wrote:
>>>> Hi Chris and list-
>>>> Will documentation for release 1.6 be available in pdoc on  
>>>> doc.bioperl.org?
>>>> I notice also that autogenerated documentation for bioperl-live  
>>>> doesn't contain
>>>> new modules (or HIVQuery & Tiling, anyway ;) )--
>>>> cheers, Mark
>>>
>>>


From maj at fortinbras.us  Mon Jun 22 22:46:58 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 22 Jun 2009 22:46:58 -0400
Subject: [Bioperl-l] announcing bioperl-max, a public AMI
In-Reply-To: <3F9CAC33-EE29-4843-8E21-532DC7569A7D@illinois.edu>
References: <57C9F3FC7A4A42189608BD882C03DB9C@NewLife>
	<3F9CAC33-EE29-4843-8E21-532DC7569A7D@illinois.edu>
Message-ID: <78130116A84C4D989F3BCC217E8C5ACE@NewLife>

Done-- fortinbras-public/bioperl-max-0.1.1 is at ami-b55dbbdc; rakudo cloned at 
00:44 UTC,
parrot @ r39729, bioperl-live @ 15800, nexml @ r1136.
cheers!
MAJ
----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "Mark A. Jensen" <maj at fortinbras.us>
Cc: "BioPerl List" <bioperl-l at bioperl.org>
Sent: Wednesday, June 10, 2009 12:36 AM
Subject: Re: [Bioperl-l] announcing bioperl-max, a public AMI


> I'll be trying that out, particularly re: bioperl-run. For bioperl-db  do you 
> have mysql or pg?
>
> Heh, I see Moose is installed.  Just need svn'd parrot and git updated  rakudo 
> and we could do some damage...
>
> chris
>
> On Jun 9, 2009, at 11:10 PM, Mark A. Jensen wrote:
>
>> Hi All,
>>
>> I've built a public Amazon machine image, loaded with many many
>> goodies, including the most recent (r15747) trunks of
>> - bioperl-live
>> - bioperl-run
>> - bioperl-db/biosql
>> The base machine is a public clean install of Ubuntu 8.04 Hardy/32-bit
>> by Alestic. Many fave tools, including blast, hmmer, hyphy, phyml,
>> emboss, and more are all there (and most even pass bioperl-run  tests), and
>> perl is loaded with Moose, XML::LibXML, XML::Compile, Bio::Phylo
>> (r1071) and others. This is *not* a lean mean fighting machine.
>>
>> Please give it a try if you're so inclined. Fuller details (including
>> image id and ssh-rsa signature) are at http://fortinbras.us/bioperl-max .
>>
>> Ping me if it doesn't work.
>>
>> Cheers,
>> Mark
>>
>>
>>


From maj at fortinbras.us  Mon Jun 22 23:22:48 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 22 Jun 2009 23:22:48 -0400
Subject: [Bioperl-l] 1.6 on doc.bioperl.org?
In-Reply-To: <4A3FE1F1.40607@open-bio.org>
References: <17AD00895AFD43E1A1436D1065092BAC@NewLife><4A3134EB.4080702@open-bio.org><A53006055C854297AAA58F6650F4F867@NewLife>
	<4A3FE1F1.40607@open-bio.org>
Message-ID: <8B93DCE168434F608620AF17CAF12A9F@NewLife>

awesome, MHC- cheers and thanks-MAJ
----- Original Message ----- 
From: "Mauricio Herrera Cuadra" <mauricio at open-bio.org>
To: "Mark A. Jensen" <maj at fortinbras.us>
Cc: "Chris Fields" <cjfields at illinois.edu>; "BioPerl List" 
<bioperl-l at bioperl.org>
Sent: Monday, June 22, 2009 3:56 PM
Subject: Re: [Bioperl-l] 1.6 on doc.bioperl.org?


> Sorry for the delay. Pdoc site has been updated with docs for 1.6.0 release 
> and latest code from bioperl-live. Also added bioperl-dev and bioperl-pise to 
> the list.
>
> Cheers,
> Mauricio.
>
>
> Mark A. Jensen wrote:
>> cheers Mauricio! MAJ
>> ----- Original Message ----- From: "Mauricio Herrera Cuadra" 
>> <mauricio at open-bio.org>
>> To: "Mark A. Jensen" <maj at fortinbras.us>
>> Cc: "Chris Fields" <cjfields at illinois.edu>; "BioPerl List" 
>> <bioperl-l at bioperl.org>
>> Sent: Thursday, June 11, 2009 12:46 PM
>> Subject: Re: [Bioperl-l] 1.6 on doc.bioperl.org?
>>
>>
>>> Hi Mark,
>>>
>>> I'll take a look into this sometime between today and tomorrow. Will keep 
>>> you posted. Thanks for the heads up :)
>>>
>>> Mauricio.
>>>
>>>
>>> Mark A. Jensen wrote:
>>>> Hi Chris and list-
>>>> Will documentation for release 1.6 be available in pdoc on doc.bioperl.org?
>>>> I notice also that autogenerated documentation for bioperl-live doesn't 
>>>> contain
>>>> new modules (or HIVQuery & Tiling, anyway ;) )--
>>>> cheers, Mark
>>>
>>>
>>
>>


From pmr at ebi.ac.uk  Tue Jun 23 07:00:38 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Tue, 23 Jun 2009 12:00:38 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <CECB8B74-C423-4792-8857-0F3E40EAECE3@illinois.edu>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>	<BF526BE6-AFA8-4F66-9665-D6328F9B4FFA@illinois.edu>	<320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com>	<1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>	<320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com>
	<CECB8B74-C423-4792-8857-0F3E40EAECE3@illinois.edu>
Message-ID: <4A40B5D6.40504@ebi.ac.uk>

We just added FASTQ parsing to EMBOSS and faced the same issues.

Parsing was easy - find the '@' line, read sequence until the '+' line
is reached, then read (seqlen) quality characters ... and check the next
line starts with '@'
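
In Perl the same steps might look like the sketch below. This is not
the EMBOSS code (which is C) nor the BioPerl parser; next_fastq_record
is a hypothetical name and the two records are made-up test data:

```perl
use strict;
use warnings;

# Hypothetical reader: returns a hash ref (id, desc, seq, qual),
# or undef at end of input.
sub next_fastq_record {
    my ($fh) = @_;
    my $header = <$fh>;
    return undef unless defined $header;
    chomp $header;
    # This is also the "next line starts with '@'" check for the
    # record that was read before this one.
    $header =~ /^\@(\S+)\s*(.*)$/
        or die "Expected '\@' title line, got: $header\n";
    my ($id, $desc) = ($1, $2);

    # Sequence lines run until the '+' line; its repeated title is optional.
    my $seq = '';
    while (defined(my $line = <$fh>)) {
        chomp $line;
        last if $line =~ /^\+/;
        $seq .= $line;
    }

    # Read quality characters by count, so a quality line that starts
    # with '@' or '+' is never mistaken for a title or separator.
    my $qual = '';
    while (length($qual) < length($seq)) {
        my $line = <$fh>;
        die "Truncated quality string for $id\n" unless defined $line;
        chomp $line;
        $qual .= $line;
    }
    die "Quality and sequence lengths differ for $id\n"
        if length($qual) != length($seq);

    return { id => $id, desc => $desc, seq => $seq, qual => $qual };
}

# Made-up data: the first quality string starts with '@'.
my $data = <<'FASTQ';
@read1 first read
ACGT
+
@+II
@read2
GGCC
+read2
IIII
FASTQ

open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
while (my $rec = next_fastq_record($fh)) {
    printf "%s\t%s\t%s\n", $rec->{id}, $rec->{seq}, $rec->{qual};
}
close $fh;
```

Counting quality characters against the sequence length is what makes
multi-line records and '@'-leading quality strings safe to parse.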

Quality scores are kept as phred values. Phred of 0 means unknown, which
in Solexa is -5 (0.75 error rate = could be anything). We assume lower
quality scores are from alignments rather than single reads.

We gave up on trying to guess the quality score standard and require
users to say whether they are sanger, solexa (1.0) or Illumina (1.3)
format files. If we only want the sequence then we don't care so we allow
"fastq" as a sequence format and ignore the quality scores in that case.

We also allow the integer quality score format ... is anyone still using
that (it looks horrible to me :-)

Code is in the EMBOSS CVS, and will appear in release 6.1.0 on July 15th.

Any further tips would be very useful.

regards,

Peter Rice


From biopython at maubp.freeserve.co.uk  Tue Jun 23 07:29:56 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 23 Jun 2009 12:29:56 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <4A40B5D6.40504@ebi.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<BF526BE6-AFA8-4F66-9665-D6328F9B4FFA@illinois.edu>
	<320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com>
	<1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>
	<320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com>
	<CECB8B74-C423-4792-8857-0F3E40EAECE3@illinois.edu>
	<4A40B5D6.40504@ebi.ac.uk>
Message-ID: <320fb6e00906230429l77a2049eg35e47f94026111c6@mail.gmail.com>

On Tue, Jun 23, 2009 at 12:00 PM, Peter Rice<pmr at ebi.ac.uk> wrote:
>
> We just added FASTQ parsing to EMBOSS and faced the same issues.
>

I was going to chat to you about this at BOSC, and suggest this be
added to EMBOSS - but you are well ahead of me ;)

> Parsing was easy - find the '@' line, read sequence until the '+' line
> is reached, then read (seqlen) quality characters ... and check the next
> line starts with '@'

That is basically what I did for Biopython.

> Quality scores are kept as phred values. Phred of 0 means unknown,
> which in Solexa is -5 (0.75 error rate = could be anything).

A Phred quality of 0 means probability of error is 1, so yes, unknown. I don't
quite follow your leap that this corresponds to a Solexa quality of -5. Could
you clarify?
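
For reference, the arithmetic behind the two scales, as a small Perl
sketch assuming the usual definitions Q_phred = -10*log10(p) and
Q_solexa = -10*log10(p/(1-p)). The function names are made up:

```perl
use strict;
use warnings;

# Error probability implied by a PHRED score: p = 10**(-Q/10)
sub phred_error_prob {
    my ($q) = @_;
    return 10 ** (-$q / 10);
}

# Error probability implied by a Solexa score: p = 1 / (1 + 10**(Q/10))
sub solexa_error_prob {
    my ($q) = @_;
    return 1 / (1 + 10 ** ($q / 10));
}

printf "PHRED   0 -> P(error) = %.3f\n", phred_error_prob(0);
printf "PHRED   1 -> P(error) = %.3f\n", phred_error_prob(1);
printf "Solexa -5 -> P(error) = %.3f\n", solexa_error_prob(-5);
```

On these definitions Solexa -5 corresponds to an error probability of
about 0.76, which is closer to PHRED 1 than to PHRED 0.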

> We assume lower quality scores are from alignments rather than single reads.

Did you mean to say "higher quality scores" (i.e. lower probability of error),
e.g. a PHRED score of 80, which you can get from MAQ doing read mapping
or something consensus based?

> We gave up on trying to guess the quality score standard and require
> users to say whether they are sanger, solexa (1.0) or Illumina (1.3)
> format files. If we only want the sequence then we don't care so we allow
> "fastq" as a sequence format and ignore the quality scores in that case.

What format names have you used? Ideally we'd have the same names
in EMBOSS, BioPerl and Biopython (i.e. "fastq", "fastq-solexa", and
"fastq-illumina").

> We also allow the integer quality score format ... is anyone still using
> that (it looks horrible to me :-)

Do you mean the QUAL file format holding PHRED scores? Roche provide
tools to turn their SFF files into FASTA and QUAL files, so they are still used.

> Code is in the EMBOSS CVS, and will appear in release 6.1.0 on July 15th.
>
> Any further tips would be very useful.

Great. See you at BOSC 2009!

Peter
(Biopython)


From pmr at ebi.ac.uk  Tue Jun 23 08:22:33 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Tue, 23 Jun 2009 13:22:33 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <320fb6e00906230429l77a2049eg35e47f94026111c6@mail.gmail.com>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>	
	<BF526BE6-AFA8-4F66-9665-D6328F9B4FFA@illinois.edu>	
	<320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com>	
	<1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>	
	<320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com>	
	<CECB8B74-C423-4792-8857-0F3E40EAECE3@illinois.edu>	
	<4A40B5D6.40504@ebi.ac.uk>
	<320fb6e00906230429l77a2049eg35e47f94026111c6@mail.gmail.com>
Message-ID: <4A40C909.40803@ebi.ac.uk>

Peter wrote:
> On Tue, Jun 23, 2009 at 12:00 PM, Peter Rice<pmr at ebi.ac.uk> wrote:
>> We just added FASTQ parsing to EMBOSS and faced the same issues.
>>
> 
> I was going to chat to you about this at BOSC, and suggest this be
> added to EMBOSS - but you are well ahead of me ;)

Not that well ahead really ... someone asked for it in our BoF at
BOSC/ISMB last year so we thought we'd better get it done before this
one. It was implemented a couple of days ago :-)

>> Parsing was easy - find the '@' line, read sequence until the '+' line
>> is reached, then read (seqlen) quality characters ... and check the next
>> line starts with '@'
> 
> That is basically what I did for Biopython.
> 
>> Quality scores are kept as phred values. Phred of 0 means unknown,
>> which in Solexa is -5 (0.75 error rate = could be anything).
> 
> A Phred quality of 0 means probability of error is 1, so yes, unknown. I don't
> quite follow your leap that this corresponds to a Solexa quality of -5. Could
> you clarify?

Phred score is -10 log(p) where p is the probability of error. A phred
of 0 implies 1.0 (certainty) of error, but 0.75 is a better estimate
(3/4 chance that any base you pick is wrong).

Solexa scores are -10 log(p/(1-p)) so p=0.75 comes out at -5. This is
why Solexa scores can go down to -5 in their fastq format.
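
As a quick check of the arithmetic, here is a small Python sketch of the
two formulas (base-10 logs). The function names are invented for the
example; the Solexa-to-Phred conversion is just the two formulas solved
for p and recombined, not code from EMBOSS.

```python
import math


def phred_from_p(p):
    """Phred quality: -10 log10(p), p being the probability of error."""
    return -10 * math.log10(p)


def solexa_from_p(p):
    """Solexa quality: -10 log10(p / (1 - p)), i.e. based on the odds."""
    return -10 * math.log10(p / (1 - p))


def solexa_to_phred(q_solexa):
    """Convert a Solexa score to the Phred score for the same p.

    From Q_solexa = -10 log10(p / (1 - p)) it follows that
    p = 1 / (10 ** (Q_solexa / 10) + 1), and hence
    Q_phred = 10 log10(10 ** (Q_solexa / 10) + 1).
    """
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)


# A random base call is wrong 3 times out of 4, so p = 0.75:
print(round(solexa_from_p(0.75), 2))  # -4.77, stored as -5 after rounding
print(round(phred_from_p(0.75), 2))   # 1.25
```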

>> We assume lower quality scores are from alignments rather than single reads.
> 
> Did you mean to say "higher quality scores" (i.e. lower probability of error),
> e.g a PHRED score of 80 which you can get from MAQ doing read mapping
> or something consensus based.

Actually I mean both. Error probabilities above 0.75 for a single base
are silly, and error probabilities below 0.0001 make sense only when two
or more high quality bases are aligned.

>> We gave up on trying to guess the quality score standard and require
>> users to say whether they are sanger, solexa (1.0) or Illumina (1.3)
>> format files. If we only want the sequence then we don't care so we allow
>> "fastq" as a sequence format and ignore the quality scores in that case.
> 
> What format names have you used? Ideally we'd have the same names
> in EMBOSS, BioPerl and Biopython (i.e. "fastq", "fastq-solexa", and
> "fastq-illumina").

We don't normally use '-' in our format names so we have fastqsanger,
fastqsolexa, fastqillumina and fastqint. None of these have been tried
on users as yet.

The '-' names look nice though. We can consider introducing them. Do you
have a full list of format names (sequence, feature, alignment, etc.) we
can try to conform to?

>> We also allow the integer quality score format ... is anyone still using
>> that (it looks horrible to me :-)
> 
> Do you mean the QUAL file format holding PHRED scores? Roche provide
> tools to turn their SFF files into FASTA and QUAL files, so they are still used.

Probably ... unless there is a Solexa version too.

regards,

Peter


From rmb32 at cornell.edu  Tue Jun 23 10:28:08 2009
From: rmb32 at cornell.edu (Robert Buels)
Date: Tue, 23 Jun 2009 07:28:08 -0700
Subject: [Bioperl-l] Anyone at YAPC?
In-Reply-To: <FE1E935F-C10A-471F-8E9A-27658F5216A2@illinois.edu>
References: <19007.33948.411442.197063@already.dhcp.gene.com>
	<FE1E935F-C10A-471F-8E9A-27658F5216A2@illinois.edu>
Message-ID: <4A40E678.8010709@cornell.edu>

Yep, YAPC is great!  This is my first one.  I saw a guy walking around 
here with a nametag that I thought said "Mark Jensen".  MAJ, are you here?

Rob

Chris Fields wrote:
> I think some of the regular #bioperl folk are there (Jay Hannah, R. 
> Buels, etc).  May be worth going on IRC to find everyone.
> 
> I'm giving serious thought to going next year if I can get enough work 
> done towards a perl6 or Moose-based bioperl.
> 
> chris
> 
> On Jun 22, 2009, at 8:18 AM, George Hartzell wrote:
> 
>>
>> I'm in Pittsburgh at YAPC 10 and would be happy to say hello to folks.
>>
>> g.
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 


-- 
Robert Buels
Bioinformatics Analyst, Sol Genomics Network
Boyce Thompson Institute for Plant Research
Tower Rd
Ithaca, NY  14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu


From maj at fortinbras.us  Tue Jun 23 11:54:24 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 23 Jun 2009 11:54:24 -0400
Subject: [Bioperl-l] Anyone at YAPC?
In-Reply-To: <4A40E678.8010709@cornell.edu>
References: <19007.33948.411442.197063@already.dhcp.gene.com><FE1E935F-C10A-471F-8E9A-27658F5216A2@illinois.edu>
	<4A40E678.8010709@cornell.edu>
Message-ID: <DD5C6FE6AC5842CEAA4487EEC65AC726@NewLife>

I think there are about 75000 of us; that one ain't me, I'm afraid. Maybe next 
year! cheers  MAJ
----- Original Message ----- 
From: "Robert Buels" <rmb32 at cornell.edu>
To: "bioperl-l List" <bioperl-l at bioperl.org>
Sent: Tuesday, June 23, 2009 10:28 AM
Subject: Re: [Bioperl-l] Anyone at YAPC?


> Yep, YAPC is great!  This is my first one.  I saw a guy walking around here 
> with a nametag that I thought said "Mark Jensen".  MAJ, are you here?
>
> Rob
>
> Chris Fields wrote:
>> I think some of the regular #bioperl folk are there (Jay Hannah, R. Buels, 
>> etc).  May be worth going on IRC to find everyone.
>>
>> I'm giving serious thought to going next year if I can get enough work done 
>> towards a perl6 or Moose-based bioperl.
>>
>> chris
>>
>> On Jun 22, 2009, at 8:18 AM, George Hartzell wrote:
>>
>>>
>>> I'm in Pittsburgh at YAPC 10 and would be happy to say hello to folks.
>>>
>>> g.
>>>
>
>
> -- 
> Robert Buels
> Bioinformatics Analyst, Sol Genomics Network
> Boyce Thompson Institute for Plant Research
> Tower Rd
> Ithaca, NY  14853
> Tel: 503-889-8539
> rmb32 at cornell.edu
> http://www.sgn.cornell.edu


From cjfields at illinois.edu  Tue Jun 23 16:34:48 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 23 Jun 2009 15:34:48 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <4A40C909.40803@ebi.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>	
	<BF526BE6-AFA8-4F66-9665-D6328F9B4FFA@illinois.edu>	
	<320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com>	
	<1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>	
	<320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com>	
	<CECB8B74-C423-4792-8857-0F3E40EAECE3@illinois.edu>	
	<4A40B5D6.40504@ebi.ac.uk>
	<320fb6e00906230429l77a2049eg35e47f94026111c6@mail.gmail.com>
	<4A40C909.40803@ebi.ac.uk>
Message-ID: <21116F70-93A3-4539-9BE2-61C838BA730E@illinois.edu>


On Jun 23, 2009, at 7:22 AM, Peter Rice wrote:

> Peter wrote:
> ...
>>> Parsing was easy - find the '@' line, read sequence until the '+' line
>>> is reached, then read (seqlen) quality characters ... and check the next
>>> line starts with '@'
>>
>> That is basically what I did for Biopython.

This is now what bioperl will do (at least when I commit changes today  
or tomorrow).

> ...
>>> We gave up on trying to guess the quality score standard and require
>>> users to say whether they are sanger, solexa (1.0) or Illumina (1.3)
>>> format files. If we only want the sequence then we don't care so we allow
>>> "fastq" as a sequence format and ignore the quality scores in that case.
>>
>> What format names have you used? Ideally we'd have the same names
>> in EMBOSS, BioPerl and Biopython (i.e. "fastq", "fastq-solexa", and
>> "fastq-illumina").
>
> We don't normally use '-' in our format names so we have fastqsanger,
> fastqsolexa, fastqillumina and fastqint. None of these have been tried
> on users as yet.
>
> The '-' names look nice though. We can consider introducing them. Do you
> have a full list of format names (sequence, feature, alignment, etc.) we
> can try to conform to?

We (bioperl) are using biopython's convention of format-variant, or at  
least that's how I'm coding it up.  With SeqIO it's fairly easy to  
check for the format variant prior to loading the class and pass it in  
as a second named parameter.
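
To illustrate the format-variant idea in a language-neutral way, here is
a tiny Python sketch. It is not the BioPerl SeqIO code; split_format and
its default_variant argument are hypothetical names used only for this
example.

```python
def split_format(name, default_variant=None):
    """Split a 'format-variant' name such as 'fastq-solexa'.

    Returns (format, variant). If there is no '-' in the name the
    variant falls back to default_variant. Hypothetical helper.
    """
    fmt, sep, variant = name.lower().partition("-")
    return fmt, (variant if sep else default_variant)


print(split_format("fastq-illumina"))  # ('fastq', 'illumina')
print(split_format("fastq"))           # ('fastq', None)
```

The base format would pick the parser class, and the variant would then
be handed to it as a separate parameter.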

I have actually thought of adding in fastqint as an option (it would  
be fairly easy to do).

chris


From cjfields at illinois.edu  Tue Jun 23 17:04:25 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 23 Jun 2009 16:04:25 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <320fb6e00906230429l77a2049eg35e47f94026111c6@mail.gmail.com>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<BF526BE6-AFA8-4F66-9665-D6328F9B4FFA@illinois.edu>
	<320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com>
	<1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>
	<320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com>
	<CECB8B74-C423-4792-8857-0F3E40EAECE3@illinois.edu>
	<4A40B5D6.40504@ebi.ac.uk>
	<320fb6e00906230429l77a2049eg35e47f94026111c6@mail.gmail.com>
Message-ID: <49A4AD93-69FB-406E-8FFB-99C74A457402@illinois.edu>

Just so we're on the same page data-wise, would there be a common set  
of fastq data files to use for tests?  I am using some from SRA (which  
is all converted to Sanger).  Just need a few small ones for older  
solexa and newer illumina.

chris

On Jun 23, 2009, at 6:29 AM, Peter wrote:

> On Tue, Jun 23, 2009 at 12:00 PM, Peter Rice<pmr at ebi.ac.uk> wrote:
>
>> Code is in the EMBOSS CVS, and will appear in release 6.1.0 on July 15th.
>>
>> Any further tips would be very useful.
>
> Great. See you at BOSC 2009!
>
> Peter
> (Biopython)


From biopython at maubp.freeserve.co.uk  Tue Jun 23 17:39:48 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 23 Jun 2009 22:39:48 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <4A40C909.40803@ebi.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<BF526BE6-AFA8-4F66-9665-D6328F9B4FFA@illinois.edu>
	<320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com>
	<1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>
	<320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com>
	<CECB8B74-C423-4792-8857-0F3E40EAECE3@illinois.edu>
	<4A40B5D6.40504@ebi.ac.uk>
	<320fb6e00906230429l77a2049eg35e47f94026111c6@mail.gmail.com>
	<4A40C909.40803@ebi.ac.uk>
Message-ID: <320fb6e00906231439j3beecda0o6e75c4223e960871@mail.gmail.com>

On Tue, Jun 23, 2009 at 1:22 PM, Peter Rice<pmr at ebi.ac.uk> wrote:
> Peter wrote:
>> On Tue, Jun 23, 2009 at 12:00 PM, Peter Rice<pmr at ebi.ac.uk> wrote:
>>> We just added FASTQ parsing to EMBOSS and faced the same issues.
>>>
>>
>> I was going to chat to you about this at BOSC, and suggest this be
>> added to EMBOSS - but you are well ahead of me ;)
>
> Not that well ahead really ... someone asked for it in our BoF at
> BOSC/ISMB last year so we thought we'd better get it done before this
> one. it was implemented a couple of days ago :-)
>

Well, ahead of my asking!

>>> Quality scores are kept as phred values. Phred of 0 means unknown,
>>> which in Solexa is -5 (0.75 error rate = could be anything).
>>
>> A Phred quality of 0 means probability of error is 1, so yes, unknown. I don't
>> quite follow your leap that this corresponds to a Solexa quality of -5. Could
>> you clarify?
>
> Phred score is -10 log(p) where p is the probability of error. A phred
> of 0 implies 1.0 (certainty) of error, but 0.75 is a better estimate
> (3/4 chance that any base you pick is wrong).
>
> Solexa scores are -10 log(p/(1-p)) so p=0.75 comes out at -5. This is
> why Solexa scores can go down to -5 in their fastq format.
>
>>> We assume lower quality scores are from alignments rather than
>>> single reads.
>>
>> Did you mean to say "higher quality scores" (i.e. lower probability of error),
>> e.g a PHRED score of 80 which you can get from MAQ doing read mapping
>> or something consensus based.
>
> Actually I mean both. Error probabilities below 0.75 for a single base
> are silly, and error probabilities below 0.0001 make sense only when two
> or more high quality bases are aligned.

I see what you mean - a probability of error of 0.75 matches that
for a random base call, obvious when you put it like that. Of course,
there is this nasty little thought at the back of my mind that sooner
or later someone will use FASTQ files for proteins (e.g. from some
mass-spec protein sequencing).

A probability less than that (e.g. 0) is actually worse than random and
could be considered to mean "we're pretty sure this isn't the stated
letter". But that would be silly, as you say.

>>> We gave up on trying to guess the quality score standard and require
>>> users to say whether they are sanger, solexa (1.0) or Illumina (1.3)
>>> format files. If we only want the sequence then we don't care so we allow
>>> "fastq" as a sequence format and ignore the quality scores in that case.
>>
>> What format names have you used? Ideally we'd have the same names
>> in EMBOSS, BioPerl and Biopython (i.e. "fastq", "fastq-solexa", and
>> "fastq-illumina").
>
> We don't normally use '-' in our format names so we have fastqsanger,
> fastqsolexa, fastqillumina and fastqint. None of these have been tried
> on users as yet.
>
> The '-' names look nice though. We can consider introducing them. Do you
> have a full list of format names (sequence, feature, alignment, etc.) we
> can try to conform to?

See http://biopython.org/wiki/SeqIO and http://biopython.org/wiki/AlignIO

Getting EMBOSS to conform should be trivial - in general when
picking a format name for Biopython's SeqIO or AlignIO (and we
have avoided multiple aliases with one exception) we have tried to
use anything shared by BioPerl and EMBOSS. The FASTQ variants
are unusual in that Biopython got to invent some names.

In future where would be a good place to discuss these kinds of
cross-platform issues? (i.e. BioPerl, Biopython, EMBOSS, etc).

>>> We also allow the integer quality score format ... is anyone still
>>> using that (it looks horrible to me :-)
>>
>> Do you mean the QUAL file format holding PHRED scores?
>> Roche provide tools to turn their SFF files into FASTA and
>> QUAL files, so they are still used.
>
> Probably ... unless there is a Solexa version too.

We may be talking at cross purposes here, this is QUAL format:
http://www.bioperl.org/wiki/Qual_sequence_format

Peter


From pmr at ebi.ac.uk  Wed Jun 24 07:48:23 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Wed, 24 Jun 2009 12:48:23 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <320fb6e00906231439j3beecda0o6e75c4223e960871@mail.gmail.com>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>	
	<BF526BE6-AFA8-4F66-9665-D6328F9B4FFA@illinois.edu>	
	<320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com>	
	<1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>	
	<320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com>	
	<CECB8B74-C423-4792-8857-0F3E40EAECE3@illinois.edu>	
	<4A40B5D6.40504@ebi.ac.uk>	
	<320fb6e00906230429l77a2049eg35e47f94026111c6@mail.gmail.com>	
	<4A40C909.40803@ebi.ac.uk>
	<320fb6e00906231439j3beecda0o6e75c4223e960871@mail.gmail.com>
Message-ID: <4A421287.4000203@ebi.ac.uk>

Peter wrote:
> On Tue, Jun 23, 2009 at 1:22 PM, Peter Rice<pmr at ebi.ac.uk> wrote:
>> The '-' names look nice though. We can consider introducing them. Do you
>> have a full list of format names (sequence, feature, alignment, etc.) we
>> can try to conform to?
> 
> See http://biopython.org/wiki/SeqIO and http://biopython.org/wiki/AlignIO

Thanks. I'll take a look at those.

> Getting EMBOSS to conforming should be trivial - in general when
> picking a format name for Biopython's SeqIO or AlignIO (and we
> have avoided multiple aliases with one exception) we have tried to
> use anything shared by BioPerl and EMBOSS. The FASTQ variants
> are unusual in that Biopython got to invent some names.
> 
> In future where would be a good place to discuss these kinds of
> cross-platform issues? (i.e. BioPerl, Biopython, EMBOSS, etc).

I was planning to suggest a get-together at BOSC in Stockholm so we can
identify common cross-platform issues. I'm sure there are many ways we
can conform with naming and interfaces and perhaps even share code.

>>>> We also allow the integer quality score format ... is anyone still
>>>> using that (it looks horrible to me :-)
>>> Do you mean the QUAL file format holding PHRED scores?
>>> Roche provide tools to turn their SFF files into FASTA and
>>> QUAL files, so they are still used.
>> Probably ... unless there is a Solexa version too.
> 
> We may be talking at cross purposes here, this is QUAL format:
> http://www.bioperl.org/wiki/Qual_sequence_format

Yes that is different. We'll worry about separate QUAL files later (we
already find separate GFF files a pain for features) and stick with the
"fastqint" format name.

regards,

Peter


From biopython at maubp.freeserve.co.uk  Wed Jun 24 10:56:13 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 24 Jun 2009 15:56:13 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <4A421287.4000203@ebi.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com>
	<1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>
	<320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com>
	<CECB8B74-C423-4792-8857-0F3E40EAECE3@illinois.edu>
	<4A40B5D6.40504@ebi.ac.uk>
	<320fb6e00906230429l77a2049eg35e47f94026111c6@mail.gmail.com>
	<4A40C909.40803@ebi.ac.uk>
	<320fb6e00906231439j3beecda0o6e75c4223e960871@mail.gmail.com>
	<4A421287.4000203@ebi.ac.uk>
Message-ID: <320fb6e00906240756m757ee12nb53ecc527b8e2480@mail.gmail.com>

On Wed, Jun 24, 2009 at 12:48 PM, Peter Rice<pmr at ebi.ac.uk> wrote:
>
> I was planning to suggest a get-together at BOSC in Stockholm so we can
> identify common cross-platform issues. I'm sure there are many ways we
> can conform with naming and interfaces and perhaps even share code.
>

That would be a good idea - but while there are quite a few Biopython
people at BOSC this year, I don't know if there will be many from BioPerl
(there isn't a BioPerl update talk scheduled).

>>>>> We also allow the integer quality score format ... is anyone still
>>>>> using that (it looks horrible to me :-)
>>>> Do you mean the QUAL file format holding PHRED scores?
>>>> Roche provide tools to turn their SFF files into FASTA and
>>>> QUAL files, so they are still used.
>>> Probably ... unless there is a Solexa version too.
>>
>> We may be talking at cross purposes here, this is QUAL format:
>> http://www.bioperl.org/wiki/Qual_sequence_format
>
> Yes that is different. We'll worry about separate QUAL files later (we
> already find separate GFF files a pain for features) and still with the
> "fastqint" format name.

So when you say "fastqint" are you talking about something else?
Could you show us an example record in this format?

Peter
[I need to remember to proof read my evening emails more carefully]


From vecchi.b at gmail.com  Wed Jun 24 12:13:02 2009
From: vecchi.b at gmail.com (Bruno Vecchi)
Date: Wed, 24 Jun 2009 13:13:02 -0300
Subject: [Bioperl-l] Fwd: Hackathon tomorrow (I think)
In-Reply-To: <54655E2F-0D4B-4F4D-8608-13BB052FE201@jays.net>
References: <8F66D28E-CD28-4A8D-83B2-0FFCA611F58F@jays.net>
	<54655E2F-0D4B-4F4D-8608-13BB052FE201@jays.net>
Message-ID: <1a0c1b750906240913g5e113dadxe3d98d3fce3a74d@mail.gmail.com>

Jay asked me to forward this to the list, since he sometimes has problems
getting his mails delivered.
Feel free to suggest topics for the bioperl hackathon to take place tomorrow
and on friday!

Bruno.


From: Jay Hannah <jay at jays.net>
Date: June 24, 2009 11:55:42 AM EDT
To: Bioperl <bioperl-l at bioperl.org>
Subject: Hackathon tomorrow (I think)

Hola,

So a few of us here at YAPC might try to be productive tomorrow (and
Friday?).

I don't know if we have any commit bits attending.

Feel free to suggest things:

  http://yapc10.org/yn2009/wiki?node=BioPerl

Or point me to list(s) of things. Perhaps we'll try to help out in Bugzilla.

Come yell at me (us?) in IRC:

  http://www.bioperl.org/wiki/Irc

Thanks,

Jay Hannah
http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah


From cjfields at illinois.edu  Wed Jun 24 12:22:57 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 24 Jun 2009 11:22:57 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <320fb6e00906240756m757ee12nb53ecc527b8e2480@mail.gmail.com>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com>
	<1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>
	<320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com>
	<CECB8B74-C423-4792-8857-0F3E40EAECE3@illinois.edu>
	<4A40B5D6.40504@ebi.ac.uk>
	<320fb6e00906230429l77a2049eg35e47f94026111c6@mail.gmail.com>
	<4A40C909.40803@ebi.ac.uk>
	<320fb6e00906231439j3beecda0o6e75c4223e960871@mail.gmail.com>
	<4A421287.4000203@ebi.ac.uk>
	<320fb6e00906240756m757ee12nb53ecc527b8e2480@mail.gmail.com>
Message-ID: <6DCEA6E1-8C53-4C04-9F8A-E336BE5753B6@illinois.edu>


On Jun 24, 2009, at 9:56 AM, Peter wrote:

> On Wed, Jun 24, 2009 at 12:48 PM, Peter Rice<pmr at ebi.ac.uk> wrote:
>>
>> I was planning to suggest a get-together at BOSC in Stockholm so we can
>> identify common cross-platform issues. I'm sure there are many ways we
>> can conform with naming and interfaces and perhaps even share code.
>>
>
> That would be a good idea - but while there are quite a few Biopython
> people at BOSC this year, I don't know if there will be many from BioPerl
> (there isn't a BioPerl update talk scheduled).

Most of us are caught up with other work, though I will likely be able  
to dedicate more time to it in the next few months.

Also doesn't help that my travel stipend doesn't start until Aug. 1.

>>>>>> We also allow the integer quality score format ... is anyone still
>>>>>> using that (it looks horrible to me :-)
>>>>> Do you mean the QUAL file format holding PHRED scores?
>>>>> Roche provide tools to turn their SFF files into FASTA and
>>>>> QUAL files, so they are still used.
>>>> Probably ... unless there is a Solexa version too.
>>>
>>> We may be talking at cross purposes here, this is QUAL format:
>>> http://www.bioperl.org/wiki/Qual_sequence_format
>>
>> Yes that is different. We'll worry about separate QUAL files later (we
>> already find separate GFF files a pain for features) and still with the
>> "fastqint" format name.
>
> So when you say "fastqint" are you talking about something else?
> Could you show us an example record in this format?
>
> Peter
> [I need to remember to proof read my evening emails more carefully]

The same as fastq, except the ASCII quality is converted to actual  
score:

@4_1_912_360
AAGGGGCTAGAGAAACACGTAATGAAGGGAGGACTC
+4_1_912_360
40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 21 40 40 40 40 40 40 40 40 40 26 40 40 14 39 40 40
@4_1_54_483
TAATAAATGTGCTTCCTTGATGCATGTGCTATGATT
+4_1_54_483
40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 16 40 40 40 28 40 40 40 40 40 40 16 40 40 5 40 40

chris


From cjfields at illinois.edu  Wed Jun 24 12:26:22 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 24 Jun 2009 11:26:22 -0500
Subject: [Bioperl-l] Fwd: Hackathon tomorrow (I think)
In-Reply-To: <1a0c1b750906240913g5e113dadxe3d98d3fce3a74d@mail.gmail.com>
References: <8F66D28E-CD28-4A8D-83B2-0FFCA611F58F@jays.net>
	<54655E2F-0D4B-4F4D-8608-13BB052FE201@jays.net>
	<1a0c1b750906240913g5e113dadxe3d98d3fce3a74d@mail.gmail.com>
Message-ID: <F01E3C47-6651-488C-91F6-73F42A5A65F4@illinois.edu>

1) Any help towards bugzilla fixes would be most welcome.
2) Better GFF3 integration
3) Typed but lightweight seqfeatures
4) Bio::Moose?

I can dedicate more time to the latter two in about a month, but I'll  
be tied up until then.  Let me know if anyone needs collab on biomoose  
on github; Mark Jensen's already added.

chris

On Jun 24, 2009, at 11:13 AM, Bruno Vecchi wrote:

> Jay asked me to forward this to the list, since he sometimes has problems
> getting his mails delivered.
> Feel free to suggest topics for the bioperl hackathon to take place tomorrow
> and on friday!
> and on friday!
>
> Bruno.
>
>
> From: Jay Hannah <jay at jays.net>
> Date: June 24, 2009 11:55:42 AM EDT
> To: Bioperl <bioperl-l at bioperl.org>
> Subject: Hackathon tomorrow (I think)
>
> Hola,
>
> So a few of us here at YAPC might try to be productive tomorrow (and
> Friday?).
>
> I don't know if we have any commit bits attending.
>
> Feel free to suggest things:
>
>  http://yapc10.org/yn2009/wiki?node=BioPerl
>
> Or point me to list(s) of things. Perhaps we'll try to help out in Bugzilla.
>
> Come yell at me (us?) in IRC:
>
>  http://www.bioperl.org/wiki/Irc
>
> Thanks,
>
> Jay Hannah
> http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah


From biopython at maubp.freeserve.co.uk  Wed Jun 24 12:27:39 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 24 Jun 2009 17:27:39 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <6DCEA6E1-8C53-4C04-9F8A-E336BE5753B6@illinois.edu>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com>
	<CECB8B74-C423-4792-8857-0F3E40EAECE3@illinois.edu>
	<4A40B5D6.40504@ebi.ac.uk>
	<320fb6e00906230429l77a2049eg35e47f94026111c6@mail.gmail.com>
	<4A40C909.40803@ebi.ac.uk>
	<320fb6e00906231439j3beecda0o6e75c4223e960871@mail.gmail.com>
	<4A421287.4000203@ebi.ac.uk>
	<320fb6e00906240756m757ee12nb53ecc527b8e2480@mail.gmail.com>
	<6DCEA6E1-8C53-4C04-9F8A-E336BE5753B6@illinois.edu>
Message-ID: <320fb6e00906240927u62588e22je8186bb172e904a1@mail.gmail.com>

On Wed, Jun 24, 2009 at 5:22 PM, Chris Fields<cjfields at illinois.edu> wrote:
>> So when you say "fastqint" are you talking about something else?
>> Could you show us an example record in this format?
>>
>> Peter
>
> The same as fastq, except the ASCII quality is converted to actual score:
>
> @4_1_912_360
> AAGGGGCTAGAGAAACACGTAATGAAGGGAGGACTC
> +4_1_912_360
> 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 21 40 40 40 40 40
> 40 40 40 40 26 40 40 14 39 40 40
> @4_1_54_483
> TAATAAATGTGCTTCCTTGATGCATGTGCTATGATT
> +4_1_54_483
> 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 16 40 40 40 28 40
> 40 40 40 40 40 16 40 40 5 40 40

OK - and who uses these "Integer FASTQ" files?

Peter


From vecchi.b at gmail.com  Wed Jun 24 12:40:50 2009
From: vecchi.b at gmail.com (Bruno Vecchi)
Date: Wed, 24 Jun 2009 13:40:50 -0300
Subject: [Bioperl-l] Fwd: Hackathon tomorrow (I think)
In-Reply-To: <1a0c1b750906240938n23bbe8b1h42d99b6e345fee49@mail.gmail.com>
References: <8F66D28E-CD28-4A8D-83B2-0FFCA611F58F@jays.net>
	<54655E2F-0D4B-4F4D-8608-13BB052FE201@jays.net>
	<1a0c1b750906240913g5e113dadxe3d98d3fce3a74d@mail.gmail.com>
	<F01E3C47-6651-488C-91F6-73F42A5A65F4@illinois.edu>
	<1a0c1b750906240938n23bbe8b1h42d99b6e345fee49@mail.gmail.com>
Message-ID: <1a0c1b750906240940t7c0003f9hf10eb30c0d85a5ce@mail.gmail.com>

>
> Is there a todo list for biomoose? I'd be glad to hack in, but I'm afraid
> to step into someone else's work or to do things without general agreement.
> It would be nice to have directions for small sized chunks of work to do.
> In any case, count me in!
>
> 2009/6/24 Chris Fields <cjfields at illinois.edu>
>
> 1) Any help towards bugzilla fixes would be most welcome.
>> 2) Better GFF3 integration
>> 3) Typed but lightweight seqfeatures
>> 4) Bio::Moose?
>>
>> I can dedicate more time to the latter two in about a month, but I'll be
>> tied up until then.  Let me know if anyone needs collab on biomoose on
>> github; Mark Jensen's already added.
>>
>> chris
>>
>>
>> On Jun 24, 2009, at 11:13 AM, Bruno Vecchi wrote:
>>
>>  Jay asked me to forward this to the list, since he sometimes has problems
>>> getting his mails delivered.
>>> Feel free to suggest topics for the bioperl hackathon to take place
>>> tomorrow
>>> and on friday!
>>>
>>> Bruno.
>>>
>>>
>>> From: Jay Hannah <jay at jays.net>
>>> Date: June 24, 2009 11:55:42 AM EDT
>>> To: Bioperl <bioperl-l at bioperl.org>
>>> Subject: Hackathon tomorrow (I think)
>>>
>>> Hola,
>>>
>>> So a few of us here at YAPC might try to be productive tomorrow (and
>>> Friday?).
>>>
>>> I don't know if we have any commit bits attending.
>>>
>>> Feel free to suggest things:
>>>
>>>  http://yapc10.org/yn2009/wiki?node=BioPerl
>>>
>>> Or point me to list(s) of things. Perhaps we'll try to help out in
>>> Bugzilla.
>>>
>>> Come yell at me (us?) in IRC:
>>>
>>>  http://www.bioperl.org/wiki/Irc
>>>
>>> Thanks,
>>>
>>> Jay Hannah
>>> http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah
>>>
>>
>>
>


From jay at jays.net  Wed Jun 24 12:44:51 2009
From: jay at jays.net (Jay Hannah)
Date: Wed, 24 Jun 2009 12:44:51 -0400
Subject: [Bioperl-l] Fwd: Hackathon tomorrow (I think)
In-Reply-To: <F01E3C47-6651-488C-91F6-73F42A5A65F4@illinois.edu>
References: <8F66D28E-CD28-4A8D-83B2-0FFCA611F58F@jays.net>
	<54655E2F-0D4B-4F4D-8608-13BB052FE201@jays.net>
	<1a0c1b750906240913g5e113dadxe3d98d3fce3a74d@mail.gmail.com>
	<F01E3C47-6651-488C-91F6-73F42A5A65F4@illinois.edu>
Message-ID: <FEF61DB9-34A6-4884-BCAF-1CAC83E809F1@jays.net>

On Jun 24, 2009, at 12:26 PM, Chris Fields wrote:
> Let me know if anyone needs collab on biomoose on github; Mark Jensen's
> already added.

Anything on github should be trivial, even with no perms -- we can  
just fork and then send you (whoever) pull requests. github++  :)

> 1) Any help towards bugzilla fixes would be most welcome.

I don't know how to make any progress in bugzilla if no one has a  
commit bit...?

> 2) Better GFF3 integration
> 3) Typed but lightweight seqfeatures

Are there bugzilla tickets (or somewhere) describing those?

I wonder if anyone can help me get out of sporadic MailMan purgatory...

Thanks,

j


From cjfields at illinois.edu  Wed Jun 24 12:54:06 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 24 Jun 2009 11:54:06 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <320fb6e00906240927u62588e22je8186bb172e904a1@mail.gmail.com>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com>
	<CECB8B74-C423-4792-8857-0F3E40EAECE3@illinois.edu>
	<4A40B5D6.40504@ebi.ac.uk>
	<320fb6e00906230429l77a2049eg35e47f94026111c6@mail.gmail.com>
	<4A40C909.40803@ebi.ac.uk>
	<320fb6e00906231439j3beecda0o6e75c4223e960871@mail.gmail.com>
	<4A421287.4000203@ebi.ac.uk>
	<320fb6e00906240756m757ee12nb53ecc527b8e2480@mail.gmail.com>
	<6DCEA6E1-8C53-4C04-9F8A-E336BE5753B6@illinois.edu>
	<320fb6e00906240927u62588e22je8186bb172e904a1@mail.gmail.com>
Message-ID: <5C050D50-5723-4C93-89D0-7280D202E737@illinois.edu>


On Jun 24, 2009, at 11:27 AM, Peter wrote:

> On Wed, Jun 24, 2009 at 5:22 PM, Chris Fields<cjfields at illinois.edu>  
> wrote:
>>> So when you say "fastqint" are you talking about something else?
>>> Could you show us an example record in this format?
>>>
>>> Peter
>>
>> The same as fastq, except the ASCII quality is converted to actual  
>> score:
>>
>> @4_1_912_360
>> AAGGGGCTAGAGAAACACGTAATGAAGGGAGGACTC
>> +4_1_912_360
>> 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 21 40 40  
>> 40 40 40
>> 40 40 40 40 26 40 40 14 39 40 40
>> @4_1_54_483
>> TAATAAATGTGCTTCCTTGATGCATGTGCTATGATT
>> +4_1_54_483
>> 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 16 40 40  
>> 40 28 40
>> 40 40 40 40 40 16 40 40 5 40 40
>
> OK - and who uses these "Integer FASTQ" files?
>
> Peter

Not sure, but it is covered by MAQ via the conversion script (as FASTQ- 
int):

http://maq.sourceforge.net/fq_all2std.pl
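
To make the relationship between the two forms concrete, here is a
minimal Perl sketch that turns a line of integer scores into an ASCII
quality string and back. The sub names (ints_to_ascii, ascii_to_ints)
are made up for illustration, and the default offset of 33 (Sanger
style) is an assumption; a Solexa/Illumina 1.x file would use 64.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Convert space-separated integer qualities into an ASCII quality string.
# ASSUMPTION: offset 33 (Sanger convention) unless the caller says otherwise.
sub ints_to_ascii {
    my ($int_line, $offset) = @_;
    $offset = 33 unless defined $offset;
    return join '', map { chr($_ + $offset) } split ' ', $int_line;
}

# Inverse: ASCII quality string back to space-separated integers.
sub ascii_to_ints {
    my ($qual, $offset) = @_;
    $offset = 33 unless defined $offset;
    return join ' ', map { ord($_) - $offset } split //, $qual;
}

# A few scores taken from the example records above.
my $ints  = '40 40 21 40 26 14 39 5';
my $ascii = ints_to_ascii($ints);
print "$ascii\n";
print ascii_to_ints($ascii), "\n";
```

Whether the integers are phred or Solexa scaled does not matter for
this step; the sketch only changes the encoding, not the scale.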

chris


From jay at jays.net  Wed Jun 24 11:55:42 2009
From: jay at jays.net (Jay Hannah)
Date: Wed, 24 Jun 2009 11:55:42 -0400
Subject: [Bioperl-l] Hackathon tomorrow (I think)
Message-ID: <8F66D28E-CD28-4A8D-83B2-0FFCA611F58F@jays.net>

Hola,

So a few of us here at YAPC might try to be productive tomorrow (and  
Friday?).

I don't know if we have any commit bits attending.

Feel free to suggest things:

    http://yapc10.org/yn2009/wiki?node=BioPerl

Or point me to list(s) of things. Perhaps we'll try to help out in  
Bugzilla.

Come yell at me (us?) in IRC:

    http://www.bioperl.org/wiki/Irc

Thanks,

Jay Hannah
http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah


From bernd.web at gmail.com  Wed Jun 24 13:11:51 2009
From: bernd.web at gmail.com (Bernd Web)
Date: Wed, 24 Jun 2009 19:11:51 +0200
Subject: [Bioperl-l] Bioperl_scripts
Message-ID: <716af09c0906241011s6cfe0698u6d3ba5f69f730d4c@mail.gmail.com>

Hi,

The bioperl scripts section at
http://www.bioperl.org/wiki/Bioperl_scripts is really handy for short
examples.
However, quite a number of scripts cannot be found anymore and return errors:

For example for the first link (scripts/install_bioperl_scripts.pl)
Filesystem has no item: File not found: revision 15800, path
'/bioperl-live/trunk/scripts/install_bioperl_scripts.pl' at
/usr/lib/perl5/site_perl/5.8.8/SVN/Web/action.pm line 245

Also all scripts in the Bio::Graphics section cannot be found.
Is the http://www.bioperl.org/wiki/Bioperl_scripts page still supported?


Regards,
Bernd


From cjfields at illinois.edu  Wed Jun 24 16:57:51 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 24 Jun 2009 15:57:51 -0500
Subject: [Bioperl-l] Bioperl_scripts
In-Reply-To: <716af09c0906241011s6cfe0698u6d3ba5f69f730d4c@mail.gmail.com>
References: <716af09c0906241011s6cfe0698u6d3ba5f69f730d4c@mail.gmail.com>
Message-ID: <5AF99205-F977-45A1-B4AF-C3858A5727FD@illinois.edu>


On Jun 24, 2009, at 12:11 PM, Bernd Web wrote:

> Hi,
>
> The bioperl scripts section at
> http://www.bioperl.org/wiki/Bioperl_scripts is really handy for short
> examples.
> However, quite a number of scripts cannot be found anymore and
> return errors:
>
> For example for the first link (scripts/install_bioperl_scripts.pl)
> Filesystem has no item: File not found: revision 15800, path
> '/bioperl-live/trunk/scripts/install_bioperl_scripts.pl' at
> /usr/lib/perl5/site_perl/5.8.8/SVN/Web/action.pm line 245
>
> Also all scripts in the Bio::Graphics section cannot be found.
> Is the http://www.bioperl.org/wiki/Bioperl_scripts page still  
> supported?
>
> Regards,
> Bernd

Re: Bio::Graphics, all modules and related scripts have been moved to  
a separate repo and CPAN release (latest):

http://search.cpan.org/~lds/Bio-Graphics-1.96/

Beyond that I would consider all scripts and the wiki page supported.   
It's best to file this in bugzilla as a documentation issue so we fix
it and don't forget about it amongst the flurry of email.

chris


From cjfields at illinois.edu  Wed Jun 24 17:10:34 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 24 Jun 2009 16:10:34 -0500
Subject: [Bioperl-l] Fwd: Hackathon tomorrow (I think)
In-Reply-To: <FEF61DB9-34A6-4884-BCAF-1CAC83E809F1@jays.net>
References: <8F66D28E-CD28-4A8D-83B2-0FFCA611F58F@jays.net>
	<54655E2F-0D4B-4F4D-8608-13BB052FE201@jays.net>
	<1a0c1b750906240913g5e113dadxe3d98d3fce3a74d@mail.gmail.com>
	<F01E3C47-6651-488C-91F6-73F42A5A65F4@illinois.edu>
	<FEF61DB9-34A6-4884-BCAF-1CAC83E809F1@jays.net>
Message-ID: <20E946BE-D9E1-4815-991A-7817E0655879@illinois.edu>


On Jun 24, 2009, at 11:44 AM, Jay Hannah wrote:

> On Jun 24, 2009, at 12:26 PM, Chris Fields wrote:
>> Let me know if anyone needs collab on biomoose on github; Mark  
>> Jensen's already added.
>
> Anything on github should be trivial, even with no perms -- we can  
> just fork and then send you (whoever) pull requests. github++  :)
>
>> 1) Any help towards bugzilla fixes would be most welcome.
>
> I don't know how to make any progress in bugzilla if no one has a  
> commit bit...?

For some reason I thought you had a commit bit; we can add you in if  
needed.  Anyway, patches are most definitely welcome ;>

>> 2) Better GFF3 integration
>> 3) Typed but lightweight seqfeatures
>
> Are there bugzilla tickets (or somewhere) describing those?

No, as the issues are more complex than one single bug, but we do have
something to help track them for the time being:

http://www.bioperl.org/wiki/GFF_Refactor
http://www.bioperl.org/wiki/Align_Refactor

I'll probably file TODOs during the process for those refactors.  The  
easiest to tackle would probably be Align/LocatableSeq refactors.

> I wonder if anyone can help me get out of sporadic MailMan  
> purgatory...
>
> Thanks,
>
> j

-c

PS - Don't feel constrained by the above.  There are many many areas  
to contribute to.


From pmr at ebi.ac.uk  Wed Jun 24 18:44:33 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Wed, 24 Jun 2009 23:44:33 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <5C050D50-5723-4C93-89D0-7280D202E737@illinois.edu>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com>
	<CECB8B74-C423-4792-8857-0F3E40EAECE3@illinois.edu>
	<4A40B5D6.40504@ebi.ac.uk>
	<320fb6e00906230429l77a2049eg35e47f94026111c6@mail.gmail.com>
	<4A40C909.40803@ebi.ac.uk>
	<320fb6e00906231439j3beecda0o6e75c4223e960871@mail.gmail.com>
	<4A421287.4000203@ebi.ac.uk>
	<320fb6e00906240756m757ee12nb53ecc527b8e2480@mail.gmail.com>
	<6DCEA6E1-8C53-4C04-9F8A-E336BE5753B6@illinois.edu>
	<320fb6e00906240927u62588e22je8186bb172e904a1@mail.gmail.com>
	<5C050D50-5723-4C93-89D0-7280D202E737@illinois.edu>
Message-ID: <4A42AC51.3090809@ebi.ac.uk>

Chris Fields wrote:
> Not sure, but it is covered by MAQ via the conversion script (as 
> FASTQ-int):

Are the scores phred or Solexa?

Peter Rice


From adlai at refenestration.com  Wed Jun 24 22:08:31 2009
From: adlai at refenestration.com (Adlai Burman)
Date: Thu, 25 Jun 2009 04:08:31 +0200
Subject: [Bioperl-l] Extreme newbie question.
Message-ID: <99C689C0-B999-45CB-8B99-0CABCC69AB77@refenestration.com>

I have been trying to install BioPerl for a while now, and after
pummeling my hard drive (Mac OS 10.5 Intel) with several attempts at
Fink installation, a >cpan installation, and removing my .cpan folder, I
am still at square 0. I do not want to do any more damage to my
computer, yet I really need a working install (especially to interface
with remote DBs like GenBank). Can anyone give me some advice here?
After each attempt, I have tried to run perldoc bptutorial.pl and
tried test scripts with "use Bio::Perl" in the headers, and I just
receive error messages like the following:

Can't locate Bio/Perl.pm in @INC (@INC contains: /home/users/dag/lib/perl5/
/Library/Perl/Updates/5.8.8/darwin-thread-multi-2level
/Library/Perl/Updates/5.8.8
/System/Library/Perl/5.8.8/darwin-thread-multi-2level
/System/Library/Perl/5.8.8 /Library/Perl/5.8.8/darwin-thread-multi-2level
/Library/Perl/5.8.8 /Library/Perl
/Network/Library/Perl/5.8.8/darwin-thread-multi-2level
/Network/Library/Perl/5.8.8 /Network/Library/Perl
/System/Library/Perl/Extras/5.8.8/darwin-thread-multi-2level
/System/Library/Perl/Extras/5.8.8 /Library/Perl/5.8.6 /Library/Perl/5.8.1 .)
at trsh.pl line 1.

I have been working from the O'Reilly book Mastering Perl for
Bioinformatics and the INSTALL file, and have scoured the
BioPerl website, and am still stuck.

Thanks in advance,

Adlai


From kpclancy at hotmail.com  Wed Jun 24 22:31:17 2009
From: kpclancy at hotmail.com (Kevin Clancy)
Date: Wed, 24 Jun 2009 20:31:17 -0600
Subject: [Bioperl-l] Fwd: Hackathon tomorrow (I think)
In-Reply-To: <20E946BE-D9E1-4815-991A-7817E0655879@illinois.edu>
References: <8F66D28E-CD28-4A8D-83B2-0FFCA611F58F@jays.net>
	<54655E2F-0D4B-4F4D-8608-13BB052FE201@jays.net>
	<1a0c1b750906240913g5e113dadxe3d98d3fce3a74d@mail.gmail.com>
	<F01E3C47-6651-488C-91F6-73F42A5A65F4@illinois.edu>
	<FEF61DB9-34A6-4884-BCAF-1CAC83E809F1@jays.net> 
	<20E946BE-D9E1-4815-991A-7817E0655879@illinois.edu>
Message-ID: <COL107-W17352BCFE4B6CE766EF2C3CE340@phx.gbl>


Is there an intention to have a hackathon at ISMB this weekend? I know there is a 2-day BOSC.
kevin

> From: cjfields at illinois.edu
> To: jay at jays.net
> Date: Wed, 24 Jun 2009 16:10:34 -0500
> CC: vecchi.b at gmail.com; bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] Fwd: Hackathon tomorrow (I think)
> 
> 
> On Jun 24, 2009, at 11:44 AM, Jay Hannah wrote:
> 
> > On Jun 24, 2009, at 12:26 PM, Chris Fields wrote:
> >> Let me know if anyone needs collab on biomoose on github; Mark  
> >> Jensen's already added.
> >
> > Anything on github should be trivial, even with no perms -- we can  
> > just fork and then send you (whoever) pull requests. github++  :)
> >
> >> 1) Any help towards bugzilla fixes would be most welcome.
> >
> > I don't know how to make any progress in bugzilla if no one has a  
> > commit bit...?
> 
> For some reason I thought you had a commit bit; we can add you in if  
> needed.  Anyway, patches are most definitely welcome ;>
> 
> >> 2) Better GFF3 integration
> >> 3) Typed but lightweight seqfeatures
> >
> > Are there bugzilla tickets (or somewhere) describing those?
> 
> No as the issues are more complex than one single bug, but we do have  
> something to help track for the time being:
> 
> http://www.bioperl.org/wiki/GFF_Refactor
> http://www.bioperl.org/wiki/Align_Refactor
> 
> I'll probably file TODOs during the process for those refactors.  The  
> easiest to tackle would probably be Align/LocatableSeq refactors.
> 
> > I wonder if anyone can help me get out of sporadic MailMan  
> > purgatory...
> >
> > Thanks,
> >
> > j
> 
> -c
> 
> PS - Don't feel constrained by the above.  There are many many areas  
> to contribute to.
> 
> 


From cjfields at illinois.edu  Wed Jun 24 23:54:28 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 24 Jun 2009 22:54:28 -0500
Subject: [Bioperl-l] Fwd: Hackathon tomorrow (I think)
In-Reply-To: <COL107-W17352BCFE4B6CE766EF2C3CE340@phx.gbl>
References: <8F66D28E-CD28-4A8D-83B2-0FFCA611F58F@jays.net>
	<54655E2F-0D4B-4F4D-8608-13BB052FE201@jays.net>
	<1a0c1b750906240913g5e113dadxe3d98d3fce3a74d@mail.gmail.com>
	<F01E3C47-6651-488C-91F6-73F42A5A65F4@illinois.edu>
	<FEF61DB9-34A6-4884-BCAF-1CAC83E809F1@jays.net>
	<20E946BE-D9E1-4815-991A-7817E0655879@illinois.edu>
	<COL107-W17352BCFE4B6CE766EF2C3CE340@phx.gbl>
Message-ID: <02C22CF3-4AAD-4B4D-8F54-2999998A34E0@illinois.edu>

I have no idea; I don't think there are many bioperl devs attending  
this year unfortunately.  Any meetings in the next year where we could  
set up a bioperl hackathon?  I will likely be available to attend if  
it's stateside...

chris

On Jun 24, 2009, at 9:31 PM, Kevin Clancy wrote:

>
> is there an intention to have a hackathon at ISMB this weekend - I  
> know there is a 2 day BOSC
> kevin
>
>> From: cjfields at illinois.edu
>> To: jay at jays.net
>> Date: Wed, 24 Jun 2009 16:10:34 -0500
>> CC: vecchi.b at gmail.com; bioperl-l at bioperl.org
>> Subject: Re: [Bioperl-l] Fwd: Hackathon tomorrow (I think)
>>
>>
>> On Jun 24, 2009, at 11:44 AM, Jay Hannah wrote:
>>
>>> On Jun 24, 2009, at 12:26 PM, Chris Fields wrote:
>>>> Let me know if anyone needs collab on biomoose on github; Mark
>>>> Jensen's already added.
>>>
>>> Anything on github should be trivial, even with no perms -- we can
>>> just fork and then send you (whoever) pull requests. github++  :)
>>>
>>>> 1) Any help towards bugzilla fixes would be most welcome.
>>>
>>> I don't know how to make any progress in bugzilla if no one has a
>>> commit bit...?
>>
>> For some reason I thought you had a commit bit; we can add you in if
>> needed.  Anyway, patches are most definitely welcome ;>
>>
>>>> 2) Better GFF3 integration
>>>> 3) Typed but lightweight seqfeatures
>>>
>>> Are there bugzilla tickets (or somewhere) describing those?
>>
>> No as the issues are more complex than one single bug, but we do have
>> something to help track for the time being:
>>
>> http://www.bioperl.org/wiki/GFF_Refactor
>> http://www.bioperl.org/wiki/Align_Refactor
>>
>> I'll probably file TODOs during the process for those refactors.  The
>> easiest to tackle would probably be Align/LocatableSeq refactors.
>>
>>> I wonder if anyone can help me get out of sporadic MailMan
>>> purgatory...
>>>
>>> Thanks,
>>>
>>> j
>>
>> -c
>>
>> PS - Don't feel constrained by the above.  There are many many areas
>> to contribute to.
>>
>>


From cjfields at illinois.edu  Thu Jun 25 10:00:47 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 25 Jun 2009 09:00:47 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <4A42AC51.3090809@ebi.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com>
	<CECB8B74-C423-4792-8857-0F3E40EAECE3@illinois.edu>
	<4A40B5D6.40504@ebi.ac.uk>
	<320fb6e00906230429l77a2049eg35e47f94026111c6@mail.gmail.com>
	<4A40C909.40803@ebi.ac.uk>
	<320fb6e00906231439j3beecda0o6e75c4223e960871@mail.gmail.com>
	<4A421287.4000203@ebi.ac.uk>
	<320fb6e00906240756m757ee12nb53ecc527b8e2480@mail.gmail.com>
	<6DCEA6E1-8C53-4C04-9F8A-E336BE5753B6@illinois.edu>
	<320fb6e00906240927u62588e22je8186bb172e904a1@mail.gmail.com>
	<5C050D50-5723-4C93-89D0-7280D202E737@illinois.edu>
	<4A42AC51.3090809@ebi.ac.uk>
Message-ID: <CB4314ED-4076-42AD-96CC-64CB429929D5@illinois.edu>


On Jun 24, 2009, at 5:44 PM, Peter Rice wrote:

> Chris Fields wrote:
>> Not sure, but it is covered by MAQ via the conversion script (as  
>> FASTQ-int):
>
> Are the scores phred or Solexa?
>
> Peter Rice

Not sure actually.  The perl script I linked to looks like it converts  
using the same scale as solexa (illumina 1.0).
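
If the integers do turn out to be on the Solexa scale, they can be
mapped to phred with the usual relation
Q_phred = 10 * log10(10^(Q_solexa/10) + 1). A minimal Perl sketch
follows; the sub name solexa_to_phred is made up for illustration and
is not taken from the MAQ script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Map a Solexa-scaled quality to the phred scale.
# Perl's log() is the natural log, so divide by log(10) to get log10.
sub solexa_to_phred {
    my ($qs) = @_;
    return 10 * log(10 ** ($qs / 10) + 1) / log(10);
}

# The two scales differ most at the low end and converge for high scores.
for my $qs (-5, 0, 5, 10, 20, 40) {
    printf "solexa %3d -> phred %6.2f\n", $qs, solexa_to_phred($qs);
}
```

For scores around 40, like most of those in the example records, the
two scales are practically identical; the difference only matters for
the low-quality bases.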

chris


From chmille4 at gmail.com  Thu Jun 25 10:46:26 2009
From: chmille4 at gmail.com (Chase Miller)
Date: Thu, 25 Jun 2009 10:46:26 -0400
Subject: [Bioperl-l] Bio::LocatableSeq and Annotation vs Feature
Message-ID: <991fb8210906250746v19faa99dy690db524904bbf4a@mail.gmail.com>

Hi all,

Quick question I came across while writing the Bio::Nexml module.

I'm trying to link taxon data to a Bio::LocatableSeq object inside a
Bio::SimpleAlign object.  Bio::SimpleAlign has the ability to add
SeqFeatures, but according to this HowTo (
http://www.bioperl.org/wiki/HOWTO:Feature-Annotation) a feature is
considered to refer to a portion of a sequence, whereas something like taxon
data would refer to the entire sequence and should be handled as an
annotation. However, as far as I can tell Bio::LocatableSeq does not support
annotation objects.
What would be the best way to relate taxon data to a single sequence inside
an alignment?


Thanks,
Chase


From Kevin.M.Brown at asu.edu  Thu Jun 25 11:21:02 2009
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Thu, 25 Jun 2009 08:21:02 -0700
Subject: [Bioperl-l] Extreme newbie question.
In-Reply-To: <99C689C0-B999-45CB-8B99-0CABCC69AB77@refenestration.com>
References: <99C689C0-B999-45CB-8B99-0CABCC69AB77@refenestration.com>
Message-ID: <1A4207F8295607498283FE9E93B775B4060BA013@EX02.asurite.ad.asu.edu>

http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix

Or http://www.bioperl.org/wiki/Getting_BioPerl#Mac_OS_X_using_fink

That error suggests that the install failed, and you need to figure out
why from the install error messages. I suspect you aren't doing the
install as root, but as a normal user who lacks the needed permissions
to change files in certain directories. 

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Adlai Burman
> Sent: Wednesday, June 24, 2009 7:09 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Extreme newbie question.
> 
> I have been trying to install BioPerl for a while now and after  
> pummeling my hard drive (Mac OS 10.5 intel) with several attempts at  
> Fink installation, a >cpan installation and removing my .cpan 
> folder I  
> am still at square 0. I do not want to do anymore damage to my  
> computer, yet I really need a working install (especially to 
> interface  
> with remote DBs like GenBank. Can anyone give me some advice here?  
> After each attempt, I have tried to run perldoc bptutorial.pl and  
> tried test scripts with "use Bio::Perl" in the headers and I just  
> receive  error mesages like the following:
> 
> Can't locate Bio/Perl.pm in @INC (@INC contains: /home/users/dag/lib/ 
> perl5/ /Library/Perl/Updates/5.8.8/darwin-thread-multi-2level 
> /Library/ 
> Perl/Updates/5.8.8 /System/Library/Perl/5.8.8/darwin-thread- 
> multi-2level /System/Library/Perl/5.8.8 /Library/Perl/5.8.8/darwin- 
> thread-multi-2level /Library/Perl/5.8.8 /Library/Perl 
> /Network/Library/ 
> Perl/5.8.8/darwin-thread-multi-2level /Network/Library/Perl/5.8.8 / 
> Network/Library/Perl /System/Library/Perl/Extras/5.8.8/darwin-thread- 
> multi-2level /System/Library/Perl/Extras/5.8.8 /Library/Perl/5.8.6 / 
> Library/Perl/5.8.1 .) at trsh.pl line 1.
> 
> I have been working from the OReilly book astering Perl for  
> Bioinformatics and the INSTALL file and have scoured around the  
> BioPerl website and am still stuck.
> 
> Thanks in advance,
> 
> Adlai
> 


From David.Messina at sbc.su.se  Thu Jun 25 12:39:22 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 25 Jun 2009 18:39:22 +0200
Subject: [Bioperl-l] Extreme newbie question.
In-Reply-To: <99C689C0-B999-45CB-8B99-0CABCC69AB77@refenestration.com>
References: <99C689C0-B999-45CB-8B99-0CABCC69AB77@refenestration.com>
Message-ID: <628aabb70906250939l7d1116d0sec9efa2c16235c75@mail.gmail.com>

Hi Adlai,
Did the Bioperl tests run successfully? Did you get the impression that the
installation was successful?

If not, what are the errors you see during the install process?

I ask because the error you included in your message is not necessarily
indicative of a failed installation (it could just be a path issue).

By the way, as I think is indicated somewhere in the installation
instructions, you don't actually need to install Bioperl to use most of its
functionality. Simply having the Bio/ directory in your PERL5LIB path is
enough.
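
One quick way to check whether it is just a path issue is to ask Perl
where (or whether) it can see Bio/Perl.pm. This is only a sketch: the
sub name find_module is made up, and the path in the comment is a
placeholder for wherever your unpacked BioPerl directory lives.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# To try an unpacked BioPerl without installing it, point Perl at the
# directory that CONTAINS Bio/ (placeholder path, adjust to your system):
#   use lib '/path/to/bioperl';
# or set PERL5LIB to that directory in your shell before running.

# Return the full path of the first copy of a module found in the given
# directories (default: @INC), or nothing if it is not there.
sub find_module {
    my ($module, @dirs) = @_;
    @dirs = @INC unless @dirs;
    (my $rel = $module) =~ s{::}{/}g;    # Bio::Perl -> Bio/Perl
    $rel .= '.pm';
    for my $dir (@dirs) {
        next if ref $dir;                # skip code hooks in @INC
        my $path = File::Spec->catfile($dir, $rel);
        return $path if -f $path;
    }
    return;
}

my $hit = find_module('Bio::Perl');
print defined $hit
    ? "Bio::Perl found at $hit\n"
    : "Bio::Perl is not on \@INC; add the directory containing Bio/ to PERL5LIB\n";
```

If this reports that the module is missing even after a seemingly
successful install, the files most likely went somewhere that is not
in the @INC list shown in your error message.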


Dave


From cjfields at illinois.edu  Thu Jun 25 13:02:48 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 25 Jun 2009 12:02:48 -0500
Subject: [Bioperl-l] Bio::LocatableSeq and Annotation vs Feature
In-Reply-To: <991fb8210906250746v19faa99dy690db524904bbf4a@mail.gmail.com>
References: <991fb8210906250746v19faa99dy690db524904bbf4a@mail.gmail.com>
Message-ID: <3149D8E9-F145-4438-973E-3728575F436E@illinois.edu>

On Jun 25, 2009, at 9:46 AM, Chase Miller wrote:

> Hi all,
>
> Quick question I came across while writing the Bio::Nexml module.
>
> I'm trying to link taxon data to a Bio::LocatableSeq object inside a
> Bio::SimpleAlign object.  Bio::SimpleAlign has the ability to add
> SeqFeatures, but according to this HowTo (
> http://www.bioperl.org/wiki/HOWTO:Feature-Annotation) a feature is
> considered to refer to a portion of a sequence, whereas something  
> like taxon
> data would refer to the entire sequence and should be handled as an
> annotation. However, as far as I can tell Bio::LocatableSeq does not  
> support
> annotation objects.
> What would be the best way to relate taxon data to a single sequence  
> inside
> an alignment?
>
> Thanks,
> Chase

 From working with feature/annotation-rich alignment formats such as  
stockholm I found this is one of the areas for Align that needs some  
rethinking. One way to work around this w/o major refactoring is to  
have a full-length SeqFeature (pointing to the proper LocatableSeq)  
that stores the Bio::Annotation.  I don't necessarily like that  
approach as a long-term solution, though, as it's a little hacky and  
indirect, but it might get you started (just mark it as TODO so we can  
catch it at some point).
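
Roughly, the workaround could look like the sketch below. It assumes a
BioPerl install where Bio::SeqFeature::Generic carries an annotation
collection and Bio::SimpleAlign provides add_SeqFeature/get_SeqFeatures;
the sequence, the 'taxon' key and the taxon name are made up for
illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::LocatableSeq;
use Bio::SimpleAlign;
use Bio::SeqFeature::Generic;
use Bio::Annotation::SimpleValue;

# One aligned sequence (8 residues plus a gap) in an alignment.
my $seq = Bio::LocatableSeq->new(
    -id    => 'seq1',
    -seq   => 'ACGT-ACGT',
    -start => 1,
    -end   => 8,
);
my $aln = Bio::SimpleAlign->new();
$aln->add_seq($seq);

# TODO: workaround only. A feature spanning the whole sequence stands in
# for a per-sequence annotation until LocatableSeq can hold one itself.
my $feat = Bio::SeqFeature::Generic->new(
    -start       => $seq->start,
    -end         => $seq->end,
    -primary_tag => 'taxon',
);
$feat->attach_seq($seq);    # point the feature at the proper LocatableSeq
$feat->annotation->add_Annotation(
    'taxon',
    Bio::Annotation::SimpleValue->new(-value => 'Escherichia coli'),
);
$aln->add_SeqFeature($feat);

# Later: recover the taxon for each annotated sequence in the alignment.
for my $f ($aln->get_SeqFeatures) {
    my ($taxon) = $f->annotation->get_Annotations('taxon');
    printf "%s\t%s\n", $f->entire_seq->id, $taxon->value;
}
```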

For a long-term solution I don't think the answer is as simple as  
making LocatableSeq Bio::AnnotatableI; that would not be congruent  
with the PrimarySeq implementation (which is not AnnotatableI).   
LocatableSeq is supposed to represent a simple PrimarySeq that can be  
mapped to other sequences via start/end/strand, and thus inherits from  
both Bio::PrimarySeq (note lack of 'I') and RangeI.

Three options:
1) Bio::Seq could be refactored to handle both Bio::PrimarySeq and  
Bio::LocatableSeq, and SimpleAlign reworked to allow any simple RangeI.
2) Bio::PrimarySeq can be AnnotatableI (Bio::Seq would delegate to the  
PrimarySeq AnnotationCollection).
3) All AnnotationI need to be linked back to the PrimarySeqI somehow,
e.g. via features.

I personally think option #2 is easiest, as this means anything that  
is-a PrimarySeq is also AnnotatableI, and it might not break past  
scripts.  Not sure how this would affect overall performance though.
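
To illustrate the shape of option #2 (not actual BioPerl code), here is
a toy sketch in plain Perl. The Sketch::* packages and their methods
are hypothetical stand-ins: the PrimarySeq-like class owns the
annotations, the LocatableSeq-like class inherits them, and the
Seq-like wrapper just delegates.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for a PrimarySeq that owns its annotations.
package Sketch::PrimarySeq;
sub new {
    my ($class, %args) = @_;
    return bless { id => $args{id}, annotation => {} }, $class;
}
sub id { $_[0]{id} }
sub add_annotation {
    my ($self, $key, $value) = @_;
    push @{ $self->{annotation}{$key} }, $value;
    return $self;
}
sub get_annotations {
    my ($self, $key) = @_;
    return @{ $self->{annotation}{$key} || [] };
}

# Anything that is-a PrimarySeq gets the annotation methods for free.
package Sketch::LocatableSeq;
our @ISA = ('Sketch::PrimarySeq');

# A Seq-like wrapper that delegates annotation calls to its PrimarySeq.
package Sketch::Seq;
sub new {
    my ($class, %args) = @_;
    return bless { primary_seq => $args{primary_seq} }, $class;
}
sub primary_seq { $_[0]{primary_seq} }
sub add_annotation {
    my $self = shift;
    $self->primary_seq->add_annotation(@_);
    return $self;
}
sub get_annotations {
    my $self = shift;
    return $self->primary_seq->get_annotations(@_);
}

package main;

my $loc = Sketch::LocatableSeq->new(id => 'seq1');
my $seq = Sketch::Seq->new(primary_seq => $loc);

# Annotating through the wrapper is visible on the underlying sequence.
$seq->add_annotation(taxon => 'example taxon');
print join(', ', $loc->get_annotations('taxon')), "\n";
```

Existing scripts that only talk to the Seq-like wrapper would keep
working under this arrangement, which is why it might not break past
code.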

chris


From me at miguel.weapps.com  Thu Jun 25 10:09:29 2009
From: me at miguel.weapps.com (Luis Miguel Rodríguez)
Date: Thu, 25 Jun 2009 16:09:29 +0200
Subject: [Bioperl-l] Fwd: Hackathon tomorrow (I think)
In-Reply-To: <02C22CF3-4AAD-4B4D-8F54-2999998A34E0@illinois.edu>
References: <8F66D28E-CD28-4A8D-83B2-0FFCA611F58F@jays.net>
	<54655E2F-0D4B-4F4D-8608-13BB052FE201@jays.net>
	<1a0c1b750906240913g5e113dadxe3d98d3fce3a74d@mail.gmail.com>
	<F01E3C47-6651-488C-91F6-73F42A5A65F4@illinois.edu>
	<FEF61DB9-34A6-4884-BCAF-1CAC83E809F1@jays.net>
	<20E946BE-D9E1-4815-991A-7817E0655879@illinois.edu>
	<COL107-W17352BCFE4B6CE766EF2C3CE340@phx.gbl>
	<02C22CF3-4AAD-4B4D-8F54-2999998A34E0@illinois.edu>
Message-ID: <94da4c880906250709j7b2cb78dk77710bd43e20fd42@mail.gmail.com>

Dear all,
Is there a way to run muscle silently via
Bio::Tools::Run::Alignment::Muscle?

Cheers,

-- 
Luis M. Rodriguez-R
[http://bioinf.uniandes.edu.co/~miguel/]
---------------------------------
Unidad de Bioinformática del Laboratorio de Micología y Fitopatología
Universidad de Los Andes, Colombia
[http://bioinf.uniandes.edu.co]

+ 57 1 3394949 ext 2619
lmrodriguezr at gmail.com
me at miguel.weapps.com


From chmille4 at gmail.com  Thu Jun 25 13:57:25 2009
From: chmille4 at gmail.com (Chase Miller)
Date: Thu, 25 Jun 2009 13:57:25 -0400
Subject: [Bioperl-l] Bio::LocatableSeq and Annotation vs Feature
In-Reply-To: <3149D8E9-F145-4438-973E-3728575F436E@illinois.edu>
References: <991fb8210906250746v19faa99dy690db524904bbf4a@mail.gmail.com> 
	<3149D8E9-F145-4438-973E-3728575F436E@illinois.edu>
Message-ID: <991fb8210906251057i25bbe511r84f5d1319f191421@mail.gmail.com>

Ok, I'll use the full length SeqFeature for now and mark it with a TODO.
 Thanks for the help.
Chase

On Thu, Jun 25, 2009 at 1:02 PM, Chris Fields <cjfields at illinois.edu> wrote:

> On Jun 25, 2009, at 9:46 AM, Chase Miller wrote:
>
>  Hi all,
>>
>> Quick question I came across while writing the Bio::Nexml module.
>>
>> I'm trying to link taxon data to a Bio::LocatableSeq object inside a
>> Bio::SimpleAlign object.  Bio::SimpleAlign has the ability to add
>> SeqFeatures, but according to this HowTo (
>> http://www.bioperl.org/wiki/HOWTO:Feature-Annotation) a feature is
>> considered to refer to a portion of a sequence, whereas something like
>> taxon
>> data would refer to the entire sequence and should be handled as an
>> annotation. However, as far as I can tell Bio::LocatableSeq does not
>> support
>> annotation objects.
>> What would be the best way to relate taxon data to a single sequence
>> inside
>> an alignment?
>>
>> Thanks,
>> Chase
>>
>
> From working with feature/annotation-rich alignment formats such as
> stockholm I found this is one of the areas for Align that needs some
> rethinking. One way to work around this w/o major refactoring is to have a
> full-length SeqFeature (pointing to the proper LocatableSeq) that stores the
> Bio::Annotation.  I don't necessarily like that approach as a long-term
> solution, though, as it's a little hacky and indirect, but it might get you
> started (just mark it as TODO so we can catch it at some point).
>
> For a long-term solution I don't think the answer is as simple as making
> LocatableSeq Bio::AnnotatableI; that would not be congruent with the
> PrimarySeq implementation (which is not AnnotatableI).  LocatableSeq is
> supposed to represent a simple PrimarySeq that can be mapped to other
> sequences via start/end/strand, and thus inherits from both Bio::PrimarySeq
> (note lack of 'I') and RangeI.
>
> Three options:
> 1) Bio::Seq could be refactored to handle both Bio::PrimarySeq and
> Bio::LocatableSeq, and SimpleAlign reworked to allow any simple RangeI.
> 2) Bio::PrimarySeq can be AnnotatableI (Bio::Seq would delegate to the
> PrimarySeq AnnotationCollection).
> 3) All AnnotationI need to be linked back to the PrimarySeqI somehow e.g.
> features.
>
> I personally think option #2 is easiest, as this means anything that is-a
> PrimarySeq is also AnnotatableI, and it might not break past scripts.  Not
> sure how this would affect overall performance though.
>
> chris
>


From Kevin.M.Brown at asu.edu  Thu Jun 25 14:54:19 2009
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Thu, 25 Jun 2009 11:54:19 -0700
Subject: [Bioperl-l] Extreme newbie question.
In-Reply-To: <93BFA04E-D4DB-41B3-AAF6-56CA709980B6@refenestration.com>
References: <99C689C0-B999-45CB-8B99-0CABCC69AB77@refenestration.com>
	<1A4207F8295607498283FE9E93B775B4060BA013@EX02.asurite.ad.asu.edu>
	<93BFA04E-D4DB-41B3-AAF6-56CA709980B6@refenestration.com>
Message-ID: <1A4207F8295607498283FE9E93B775B4060BA08F@EX02.asurite.ad.asu.edu>

Please keep your replies on the list. 

> -----Original Message-----
> From: Adlai Burman [mailto:adlai at refenestration.com] 
> Sent: Thursday, June 25, 2009 11:39 AM
> To: Kevin Brown
> Subject: Re: [Bioperl-l] Extreme newbie question.
> 
> Thanks, Kevin.
> I did install everything using sudo. I will try again and pay  
> attention to the error log. I hope I did not introduce any conflicts  
> or weird path problems.
> 
> Adlai
> On Jun 25, 2009, at 5:21 PM, Kevin Brown wrote:
> 
> > http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix
> >
> > Or http://www.bioperl.org/wiki/Getting_BioPerl#Mac_OS_X_using_fink
> >
> > That error suggests that the install fails and you need to 
> figure out
> > why from the install error messages. I suspect you aren't doing the
> > install as root, but as a normal user who lacks the needed 
> permissions
> > to change files in certain directories.
> >
> >> -----Original Message-----
> >> From: bioperl-l-bounces at lists.open-bio.org
> >> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
> >> Adlai Burman
> >> Sent: Wednesday, June 24, 2009 7:09 PM
> >> To: bioperl-l at lists.open-bio.org
> >> Subject: [Bioperl-l] Extreme newbie question.
> >>
> >> I have been trying to install BioPerl for a while now and after
> >> pummeling my hard drive (Mac OS 10.5 intel) with several 
> attempts at
> >> Fink installation, a >cpan installation and removing my .cpan
> >> folder I
> >> am still at square 0. I do not want to do anymore damage to my
> >> computer, yet I really need a working install (especially to
> >> interface
> >> with remote DBs like GenBank. Can anyone give me some advice here?
> >> After each attempt, I have tried to run perldoc bptutorial.pl and
> >> tried test scripts with "use Bio::Perl" in the headers and I just
> >> receive  error mesages like the following:
> >>
> >> Can't locate Bio/Perl.pm in @INC (@INC contains: 
> /home/users/dag/lib/
> >> perl5/ /Library/Perl/Updates/5.8.8/darwin-thread-multi-2level
> >> /Library/
> >> Perl/Updates/5.8.8 /System/Library/Perl/5.8.8/darwin-thread-
> >> multi-2level /System/Library/Perl/5.8.8 /Library/Perl/5.8.8/darwin-
> >> thread-multi-2level /Library/Perl/5.8.8 /Library/Perl
> >> /Network/Library/
> >> Perl/5.8.8/darwin-thread-multi-2level /Network/Library/Perl/5.8.8 /
> >> Network/Library/Perl 
> /System/Library/Perl/Extras/5.8.8/darwin-thread-
> >> multi-2level /System/Library/Perl/Extras/5.8.8 
> /Library/Perl/5.8.6 /
> >> Library/Perl/5.8.1 .) at trsh.pl line 1.
> >>
> >> I have been working from the OReilly book astering Perl for
> >> Bioinformatics and the INSTALL file and have scoured around the
> >> BioPerl website and am still stuck.
> >>
> >> Thanks in advance,
> >>
> >> Adlai
> >>
> >
> 
> 


From adlai at refenestration.com  Thu Jun 25 14:59:10 2009
From: adlai at refenestration.com (Adlai Burman)
Date: Thu, 25 Jun 2009 20:59:10 +0200
Subject: [Bioperl-l] Extreme newbie question.
In-Reply-To: <1A4207F8295607498283FE9E93B775B4060BA013@EX02.asurite.ad.asu.edu>
References: <99C689C0-B999-45CB-8B99-0CABCC69AB77@refenestration.com>
	<1A4207F8295607498283FE9E93B775B4060BA013@EX02.asurite.ad.asu.edu>
Message-ID: <E840DA6D-6AE4-42A2-9380-3647658E3C6E@refenestration.com>

Hey again, I'm right into trying to install again and I now get a new  
error:

Client not fully configured, please proceed with configuring.
  o conf init urllist

any ideas?

Adlai

On Jun 25, 2009, at 5:21 PM, Kevin Brown wrote:

> http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix
>
> Or http://www.bioperl.org/wiki/Getting_BioPerl#Mac_OS_X_using_fink
>
> That error suggests that the install fails and you need to figure out
> why from the install error messages. I suspect you aren't doing the
> install as root, but as a normal user who lacks the needed permissions
> to change files in certain directories.
>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org
>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
>> Adlai Burman
>> Sent: Wednesday, June 24, 2009 7:09 PM
>> To: bioperl-l at lists.open-bio.org
>> Subject: [Bioperl-l] Extreme newbie question.
>>
>> I have been trying to install BioPerl for a while now and after
>> pummeling my hard drive (Mac OS 10.5 intel) with several attempts at
>> Fink installation, a >cpan installation and removing my .cpan
>> folder I
>> am still at square 0. I do not want to do any more damage to my
>> computer, yet I really need a working install (especially to
>> interface
>> with remote DBs like GenBank). Can anyone give me some advice here?
>> After each attempt, I have tried to run perldoc bptutorial.pl and
>> tried test scripts with "use Bio::Perl" in the headers and I just
>> receive error messages like the following:
>>
>> Can't locate Bio/Perl.pm in @INC (@INC contains: /home/users/dag/lib/
>> perl5/ /Library/Perl/Updates/5.8.8/darwin-thread-multi-2level
>> /Library/
>> Perl/Updates/5.8.8 /System/Library/Perl/5.8.8/darwin-thread-
>> multi-2level /System/Library/Perl/5.8.8 /Library/Perl/5.8.8/darwin-
>> thread-multi-2level /Library/Perl/5.8.8 /Library/Perl
>> /Network/Library/
>> Perl/5.8.8/darwin-thread-multi-2level /Network/Library/Perl/5.8.8 /
>> Network/Library/Perl /System/Library/Perl/Extras/5.8.8/darwin-thread-
>> multi-2level /System/Library/Perl/Extras/5.8.8 /Library/Perl/5.8.6 /
>> Library/Perl/5.8.1 .) at trsh.pl line 1.
>>
>> I have been working from the O'Reilly book Mastering Perl for
>> Bioinformatics and the INSTALL file and have scoured around the
>> BioPerl website and am still stuck.
>>
>> Thanks in advance,
>>
>> Adlai
>>
>


From cjfields at illinois.edu  Thu Jun 25 16:07:44 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 25 Jun 2009 15:07:44 -0500
Subject: [Bioperl-l] Extreme newbie question.
In-Reply-To: <E840DA6D-6AE4-42A2-9380-3647658E3C6E@refenestration.com>
References: <99C689C0-B999-45CB-8B99-0CABCC69AB77@refenestration.com>
	<1A4207F8295607498283FE9E93B775B4060BA013@EX02.asurite.ad.asu.edu>
	<E840DA6D-6AE4-42A2-9380-3647658E3C6E@refenestration.com>
Message-ID: <F3802595-7617-4CD5-AC8A-2B67069BE001@illinois.edu>

That would mean, within the cpan shell, type 'o conf init  
urllist' (again, requires sudo).
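
Once CPAN is configured and the install has run, one way to check whether
Perl can actually find BioPerl is a small script along these lines. This is
only a sketch: the helper name module_available is made up for illustration
and is not part of BioPerl or CPAN.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the named module can be loaded
# from the directories currently in @INC, 0 otherwise.
sub module_available {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # Bio::Perl -> Bio/Perl.pm
    return eval { require $file; 1 } ? 1 : 0;
}

if (module_available('Bio::Perl')) {
    print "Bio::Perl found at $INC{'Bio/Perl.pm'}\n";
} else {
    # Same information as the "Can't locate ... in \@INC" error, one path per line.
    print "Bio::Perl not found; Perl searched:\n";
    print "  $_\n" for @INC;
}
```

If the module is reported as missing, the printed directories show where Perl
looked, which can be compared with where the install actually put the files.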

chris

On Jun 25, 2009, at 1:59 PM, Adlai Burman wrote:

> Hey again, I'm right into trying to install again and I now get a  
> new error:
>
> Client not fully configured, please proceed with configuring.
> o conf init urllist
>
> any ideas?
>
> Adlai
>
> On Jun 25, 2009, at 5:21 PM, Kevin Brown wrote:
>
>> http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix
>>
>> Or http://www.bioperl.org/wiki/Getting_BioPerl#Mac_OS_X_using_fink
>>
>> That error suggests that the install fails and you need to figure out
>> why from the install error messages. I suspect you aren't doing the
>> install as root, but as a normal user who lacks the needed  
>> permissions
>> to change files in certain directories.
>>
>>> -----Original Message-----
>>> From: bioperl-l-bounces at lists.open-bio.org
>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
>>> Adlai Burman
>>> Sent: Wednesday, June 24, 2009 7:09 PM
>>> To: bioperl-l at lists.open-bio.org
>>> Subject: [Bioperl-l] Extreme newbie question.
>>>
>>> I have been trying to install BioPerl for a while now and after
>>> pummeling my hard drive (Mac OS 10.5 intel) with several attempts at
>>> Fink installation, a >cpan installation and removing my .cpan
>>> folder I
>>> am still at square 0. I do not want to do any more damage to my
>>> computer, yet I really need a working install (especially to
>>> interface
>>> with remote DBs like GenBank). Can anyone give me some advice here?
>>> After each attempt, I have tried to run perldoc bptutorial.pl and
>>> tried test scripts with "use Bio::Perl" in the headers and I just
>>> receive error messages like the following:
>>>
>>> Can't locate Bio/Perl.pm in @INC (@INC contains: /home/users/dag/ 
>>> lib/
>>> perl5/ /Library/Perl/Updates/5.8.8/darwin-thread-multi-2level
>>> /Library/
>>> Perl/Updates/5.8.8 /System/Library/Perl/5.8.8/darwin-thread-
>>> multi-2level /System/Library/Perl/5.8.8 /Library/Perl/5.8.8/darwin-
>>> thread-multi-2level /Library/Perl/5.8.8 /Library/Perl
>>> /Network/Library/
>>> Perl/5.8.8/darwin-thread-multi-2level /Network/Library/Perl/5.8.8 /
>>> Network/Library/Perl /System/Library/Perl/Extras/5.8.8/darwin- 
>>> thread-
>>> multi-2level /System/Library/Perl/Extras/5.8.8 /Library/Perl/5.8.6 /
>>> Library/Perl/5.8.1 .) at trsh.pl line 1.
>>>
>>> I have been working from the O'Reilly book Mastering Perl for
>>> Bioinformatics and the INSTALL file and have scoured around the
>>> BioPerl website and am still stuck.
>>>
>>> Thanks in advance,
>>>
>>> Adlai
>>>
>>
>


From bix at sendu.me.uk  Thu Jun 25 16:19:07 2009
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 25 Jun 2009 21:19:07 +0100
Subject: [Bioperl-l] Extreme newbie question.
In-Reply-To: <E840DA6D-6AE4-42A2-9380-3647658E3C6E@refenestration.com>
References: <99C689C0-B999-45CB-8B99-0CABCC69AB77@refenestration.com>	<1A4207F8295607498283FE9E93B775B4060BA013@EX02.asurite.ad.asu.edu>
	<E840DA6D-6AE4-42A2-9380-3647658E3C6E@refenestration.com>
Message-ID: <4A43DBBB.2050109@sendu.me.uk>

Adlai Burman wrote:
> Hey again, I'm right into trying to install again and I now get a new 
> error:
> 
> Client not fully configured, please proceed with configuring.
>  o conf init urllist

Run cpan and do as it says.


From cjm at berkeleybop.org  Thu Jun 25 20:32:05 2009
From: cjm at berkeleybop.org (Chris Mungall)
Date: Thu, 25 Jun 2009 17:32:05 -0700
Subject: [Bioperl-l] experimental IO module for exporting OWL from GFF
Message-ID: <CE668EC6-1ED5-4BD8-868C-729031724A09@berkeleybop.org>


I've written a module Bio::FeatureIO::seqont_owl, which generates  
Sequence Ontology compliant RDF/OWL. This will allow for example  
loading of GFF into triplestores and inference using OWL reasoners.

- It's experimental, fairly incomplete, and subject to change
- Relies on an experimental extension of SO
- Probably of interest to a minority of bp users
- It's not yet fully documented (but there will be a paper)
- It doesn't introduce any additional dependencies (all done via  
XML::Writer, which is already a dependency)
- Doesn't otherwise impinge on existing code

I'd like to get this under source control. Is the appropriate place  
for this:

- HEAD
- a branch
- bioperl-dev
- a separate repository

?

Cheers
Chris


From maj at fortinbras.us  Thu Jun 25 21:08:43 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 25 Jun 2009 21:08:43 -0400
Subject: [Bioperl-l] experimental IO module for exporting OWL from GFF
In-Reply-To: <CE668EC6-1ED5-4BD8-868C-729031724A09@berkeleybop.org>
References: <CE668EC6-1ED5-4BD8-868C-729031724A09@berkeleybop.org>
Message-ID: <7DFBF7E8A6DB45F3AB836EBAFBE597AC@NewLife>

This sounds very Dev to me. Also cool.
MAJ
----- Original Message ----- 
From: "Chris Mungall" <cjm at berkeleybop.org>
To: "BioPerl List" <bioperl-l at lists.open-bio.org>
Sent: Thursday, June 25, 2009 8:32 PM
Subject: [Bioperl-l] experimental IO module for exporting OWL from GFF


>
> I've written a module Bio::FeatureIO::seqont_owl, which generates  Sequence 
> Ontology compliant RDF/OWL. This will allow for example  loading of GFF into 
> triplestores and inference using OWL reasoners.
>
> - It's experimental, fairly incomplete, and subject to change
> - Relies on an experimental extension of SO
> - Probably of interest to a minority of bp users
> - It's not yet fully documented (but there will be a paper)
> - It doesn't introduce any additional dependencies (all done via  XML::Writer, 
> which is already a dependency)
> - Doesn't otherwise impinge on existing code
>
> I'd like to get this under source control. Is the appropriate place  for this:
>
> - HEAD
> - a branch
> - bioperl-dev
> - a separate repository
>
> ?
>
> Cheers
> Chris
>
>
>
> 


From cjfields at illinois.edu  Thu Jun 25 21:35:06 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 25 Jun 2009 20:35:06 -0500
Subject: [Bioperl-l] experimental IO module for exporting OWL from GFF
In-Reply-To: <7DFBF7E8A6DB45F3AB836EBAFBE597AC@NewLife>
References: <CE668EC6-1ED5-4BD8-868C-729031724A09@berkeleybop.org>
	<7DFBF7E8A6DB45F3AB836EBAFBE597AC@NewLife>
Message-ID: <12F203C3-689B-423E-9691-86EB1D500A7D@illinois.edu>

I agree.  Just to note, FeatureIO (even though it's in core) will be  
operated on at some future point to be simplified (and likely will  
move away from Bio::SF::Annotated).

chris

On Jun 25, 2009, at 8:08 PM, Mark A. Jensen wrote:

> This sounds very Dev to me. Also cool.
> MAJ
> ----- Original Message ----- From: "Chris Mungall" <cjm at berkeleybop.org 
> >
> To: "BioPerl List" <bioperl-l at lists.open-bio.org>
> Sent: Thursday, June 25, 2009 8:32 PM
> Subject: [Bioperl-l] experimental IO module for exporting OWL from GFF
>
>
>>
>> I've written a module Bio::FeatureIO::seqont_owl, which generates   
>> Sequence Ontology compliant RDF/OWL. This will allow for example   
>> loading of GFF into triplestores and inference using OWL reasoners.
>>
>> - It's experimental, fairly incomplete, and subject to change
>> - Relies on an experimental extension of SO
>> - Probably of interest to a minority of bp users
>> - It's not yet fully documented (but there will be a paper)
>> - It doesn't introduce any additional dependencies (all done via   
>> XML::Writer, which is already a dependency)
>> - Doesn't otherwise impinge on existing code
>>
>> I'd like to get this under source control. Is the appropriate  
>> place  for this:
>>
>> - HEAD
>> - a branch
>> - bioperl-dev
>> - a separate repository
>>
>> ?
>>
>> Cheers
>> Chris
>>
>>
>>
>


From rmb32 at cornell.edu  Fri Jun 26 00:27:55 2009
From: rmb32 at cornell.edu (Robert Buels)
Date: Thu, 25 Jun 2009 21:27:55 -0700
Subject: [Bioperl-l] BioPerl hackathon, hooray!
Message-ID: <4A444E4B.2000808@cornell.edu>

I'm pleased to announce a thoroughly climactic conclusion to the 
YAPC::NA 2009 BioPerl hackathon.

Between Jay Hannah (jhannah) and myself (rbuels), plus #bioperl virtual 
participant Bruno Vecchi (brunov), we SMASHED the HECK out of 6 bugs in 
the BioPerl Bugzilla.

Many thanks to the participants, let's do it again next year!

Rob


From jay at jays.net  Fri Jun 26 00:54:31 2009
From: jay at jays.net (Jay Hannah)
Date: Fri, 26 Jun 2009 00:54:31 -0400
Subject: [Bioperl-l] [yapc] BioPerl hackathon, hooray!
In-Reply-To: <4A444E4B.2000808@cornell.edu>
References: <4A444E4B.2000808@cornell.edu>
Message-ID: <E316C92D-870D-4AFC-88EC-3897B351D901@jays.net>

On Jun 26, 2009, at 12:27 AM, Robert Buels wrote:
> I'm pleased to announce a thoroughly climactic conclusion to the  
> YAPC::NA 2009 BioPerl hackathon.

Feel free to check our work:

    http://github.com/rbuels/bioperl-live

:)

j
http://www.bioperl.org/wiki/User:Jhannah


From rahall2 at ualr.edu  Fri Jun 26 02:28:05 2009
From: rahall2 at ualr.edu (Roger Hall)
Date: Fri, 26 Jun 2009 01:28:05 -0500
Subject: [Bioperl-l] Random nucleotide string generator?
Message-ID: <fc2dd7b3461f.4a442425@ualr.edu>

All,
 
Is there a random generator for creating nucleotides (of length l with composition frequencies a, c, g, and t) in there somewhere? 
 
I noticed a thread about it from 2000 and nothing since (searching for "random sequence").
 
If not - what should the namespace be for such a module should it be undone and desirable? 
 
TIA!
 
Roger 
 
 
From David.Messina at sbc.su.se  Fri Jun 26 06:15:04 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 26 Jun 2009 12:15:04 +0200
Subject: [Bioperl-l] Random nucleotide string generator?
In-Reply-To: <fc2dd7b3461f.4a442425@ualr.edu>
References: <fc2dd7b3461f.4a442425@ualr.edu>
Message-ID: <628aabb70906260315o2a3fbaem19ef24b8594649e3@mail.gmail.com>

The Bioperl solution piggybacks on EMBOSS. See Chris Fields' comment on this
post from Neil Saunders' blog:
http://nsaunders.wordpress.com/2007/07/25/howto-generate-random-sequences-using-emboss-and-perl/


You can also do this outside of BioPerl using shuffle from Sean Eddy's SQUID
package, available here:
ftp://selab.janelia.org/pub/software/squid/

> If not - what should the namespace be for such a module should it be undone
> and desirable?


Perhaps add it to Bio::SeqUtils?


Dave


From David.Messina at sbc.su.se  Fri Jun 26 07:37:44 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 26 Jun 2009 13:37:44 +0200
Subject: [Bioperl-l] [yapc] BioPerl hackathon, hooray!
In-Reply-To: <E316C92D-870D-4AFC-88EC-3897B351D901@jays.net>
References: <4A444E4B.2000808@cornell.edu>
	<E316C92D-870D-4AFC-88EC-3897B351D901@jays.net>
Message-ID: <628aabb70906260437r18fc7543oc05761241fe810ff@mail.gmail.com>

Awesome, great work guys!
Thanks so much.


Dave


From David.Messina at sbc.su.se  Fri Jun 26 08:58:20 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 26 Jun 2009 14:58:20 +0200
Subject: [Bioperl-l]  Random nucleotide string generator?
In-Reply-To: <1a0c1b750906260544u6a945ce1s652de19b8b27c615@mail.gmail.com>
References: <fc2dd7b3461f.4a442425@ualr.edu>
	<628aabb70906260315o2a3fbaem19ef24b8594649e3@mail.gmail.com> 
	<1a0c1b750906260544u6a945ce1s652de19b8b27c615@mail.gmail.com>
Message-ID: <628aabb70906260558k585f6700ycef271e7f26dd1a3@mail.gmail.com>

[Forwarding Bruno's reply.... -Dave]
---------- Forwarded message ----------
From: Bruno Vecchi <vecchi.b at gmail.com>
Date: Fri, Jun 26, 2009 at 14:44
Subject: Re: [Bioperl-l] Random nucleotide string generator?
To: Dave Messina <David.Messina at sbc.su.se>


Here's a little script that I used for a somewhat related task. It produces
a randomized version of an input sequence (thus keeping the original's
composition). Maybe you could adjust it to your needs; providing an input
sequence with the desired length and composition you should get what you
want.

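#!perl
use strict;
use warnings;
use List::Util qw(shuffle);
use Bio::Seq;
use Bio::SeqIO;

# Usage: shuffle.pl <sequence file> <number of shuffled copies>
my ($seqfile, $number) = @ARGV;
die "Usage: $0 seqfile number\n" unless defined $seqfile && defined $number;

my $in = Bio::SeqIO->new(-file => $seqfile);       # input format guessed by SeqIO
my $fh = Bio::SeqIO->newFh(-format => 'fasta');    # FASTA output on STDOUT

# Only the first sequence in the input file is used.
my $seq = $in->next_seq or die "No sequence found in $seqfile\n";
my @chars = split //, $seq->seq;

for my $i (1 .. $number) {
    # Shuffling keeps the length and composition of the original.
    @chars = shuffle @chars;
    my $new_seq = Bio::Seq->new(-id => $i, -seq => join('', @chars));
    print $fh $new_seq;
}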

You can use it like this from the command line (assuming you want 20 output
sequences):

shuffle.pl input_sequence.fasta 20 > random_sequences.fasta
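
For the original question (a given length and given base frequencies, rather
than a shuffled template), a minimal pure-Perl sketch might look like the
following. The helper name random_nucleotides and its arguments are made up
for illustration and are not part of BioPerl; each base is drawn
independently, so the observed composition only approximates the requested
frequencies.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): build a random nucleotide
# string of the given length, drawing each base independently according to
# the supplied relative frequencies.
sub random_nucleotides {
    my ($length, %freq) = @_;
    # Ignore bases with zero frequency so they can never be picked.
    my @bases = grep { $freq{$_} > 0 } sort keys %freq;
    die "need at least one base with a positive frequency\n" unless @bases;

    my $total = 0;
    $total += $freq{$_} for @bases;

    my $seq = '';
    for (1 .. $length) {
        my $r      = rand($total);   # uniform in [0, $total)
        my $chosen = $bases[-1];     # fallback guards against rounding error
        for my $base (@bases) {
            if ($r < $freq{$base}) { $chosen = $base; last; }
            $r -= $freq{$base};
        }
        $seq .= $chosen;
    }
    return $seq;
}

print random_nucleotides(60, a => 0.3, c => 0.2, g => 0.2, t => 0.3), "\n";
```

The resulting string could then be wrapped in a Bio::Seq object and written
out through Bio::SeqIO in the same way as in the script above.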

Bruno.

2009/6/26 Dave Messina <David.Messina at sbc.su.se>

> The Bioperl solution piggybacks on EMBOSS. See Chris Fields' comment on
> this
> post from Neil Saunders' blog:
>
> http://nsaunders.wordpress.com/2007/07/25/howto-generate-random-sequences-using-emboss-and-perl/
>
>
> You can also do this outside of BioPerl using shuffle from Sean Eddy's
> SQUID
> package, available here:
> ftp://selab.janelia.org/pub/software/squid/
>
> > If not - what should the namespace be for such a module should it be undone
> > and desirable?
>
>
> Perhaps add it to Bio::SeqUtils?
>
>
> Dave
>


From budd at embl-heidelberg.de  Fri Jun 26 04:30:12 2009
From: budd at embl-heidelberg.de (Aidan Budd)
Date: Fri, 26 Jun 2009 10:30:12 +0200 (CEST)
Subject: [Bioperl-l] Random nucleotide string generator?
In-Reply-To: <fc2dd7b3461f.4a442425@ualr.edu>
Message-ID: <Pine.LNX.4.44.0906261028110.14978-100000@bibo.EMBL-Heidelberg.DE>

A non-BioPerl option would be to use something external like seq-gen or 
similar: tools designed for outputting "random" sequences simulated over a 
tree. One could simply sample a single simulated sequence at random from 
the output alignment.

On Fri, 26 Jun 2009, Roger Hall wrote:

> All,
>  Is there a random generator for creating nucleotides (of length l with
> composition frequencies a, c, g, and t) in there somewhere?
>  
> I noticed a thread about it from 2000 and nothing since (searching for "random sequence").
>  
> If not - what should the namespace be for such a module should it be undone and desirable? 
>  
> TIA!
>  
> Roger 
>  
>  
>  
> 

-- 
----------------------------------------------------------------------
Aidan Budd                                    tel:+49 (0)6221 387 8530
EMBL - European Molecular Biology Laboratory  fax:+49 (0)6221 387 8517
Meyerhofstr. 1, 69117 Heidelberg, Germany

http://www.embl-heidelberg.de/~budd/
http://www-db.embl.de/jss/EmblGroupsHD/per_1807.html


From me at miguel.weapps.com  Fri Jun 26 04:52:46 2009
From: me at miguel.weapps.com (=?ISO-8859-1?Q?Luis_Miguel_Rodr=EDguez?=)
Date: Fri, 26 Jun 2009 10:52:46 +0200
Subject: [Bioperl-l] Random nucleotide string generator?
In-Reply-To: <fc2dd7b3461f.4a442425@ualr.edu>
References: <fc2dd7b3461f.4a442425@ualr.edu>
Message-ID: <94da4c880906260152k3a764951u6ea8a6fdfa3b7f2c@mail.gmail.com>

Dear all, dear Roger,
I'm not sure whether such a generator exists (I think so).  Anyway, if you
flag it as "undone and desirable", please consider extending the generator
to dinucleotides, which is particularly useful when working with the
secondary structure of RNA molecules.

Cheers,

On Fri, Jun 26, 2009 at 8:28 AM, Roger Hall <rahall2 at ualr.edu> wrote:

> All,
>
> Is there a random generator for creating nucleotides (of length l with
> composition frequencies a, c, g, and t) in there somewhere?
>
> I noticed a thread about it from 2000 and nothing since (searching for
> "random sequence").
>
> If not - what should the namespace be for such a module should it be undone
> and desirable?
>
> TIA!
>
> Roger
>
>
>
>


-- 
Luis M. Rodriguez-R
[http://bioinf.uniandes.edu.co/~miguel/]
---------------------------------
Unidad de Bioinformática del Laboratorio de Micología y Fitopatología
Universidad de Los Andes, Colombia
[http://bioinf.uniandes.edu.co]


+ 57 1 3394949 ext 2619
lmrodriguezr at gmail.com
me at miguel.weapps.com


From pri2darshini at gmail.com  Fri Jun 26 06:18:55 2009
From: pri2darshini at gmail.com (priya darshini)
Date: Fri, 26 Jun 2009 15:48:55 +0530
Subject: [Bioperl-l] bioperl installation
Message-ID: <7c569a160906260318t5611fdd8nd536ae5139f5b1d4@mail.gmail.com>

Respected Sir,
I am K. Lakshmi Priya Darshini, and my specialization is M.Sc.
Bioinformatics. I am interested in learning BioPerl. My operating system is
Windows Vista. I have followed the steps to install BioPerl as given by your
team in the BioPerl tutorial, but I am getting the error message "Begin
failed". Please help me to continue with my installation. I am
using Perl version 5.10. Waiting for your reply.

Thanking you.

Regards,
Lakshmi Priya Darshini


From Jonathan.Moore at warwick.ac.uk  Fri Jun 26 05:55:54 2009
From: Jonathan.Moore at warwick.ac.uk (Moore, Jonathan)
Date: Fri, 26 Jun 2009 10:55:54 +0100
Subject: [Bioperl-l] Getting errors parsing TIGR XML in SeqIO
Message-ID: <7BEB494D4E69964C8292CE4EDA2B9811BBB946@LAUREL.ads.warwick.ac.uk>

I'm trying to parse the TAIR9 Arabidopsis release from the TIGR XML files at the TAIR FTP site.

I've tried SeqIO with both tigr and tigrxml formats but both are giving errors in 1.6.0.  Has anyone advice on whether it's likely to be doable, or should I wait til the .gb files are available?

Jay Moore


From fungazid at yahoo.com  Fri Jun 26 07:59:06 2009
From: fungazid at yahoo.com (Fungazid)
Date: Fri, 26 Jun 2009 04:59:06 -0700 (PDT)
Subject: [Bioperl-l] Bio::Assembly::IO
Message-ID: <57633.49243.qm@web65505.mail.ac4.yahoo.com>


Hello,

I received an ACE file containing a Newbler assembly of 454 cDNA reads, and a corresponding phd.ball file. I was able to view and manipulate the contigs in this assembly using Consed on Linux. Consed required ~1.5GB RAM, and the assembly was loaded within ~2 min.
I would like to parse the assembly within my code (preferably in Perl, but not necessarily), to fetch all read sequences for each contig, nucleotide quality, alignment to consensus, etc.
I am trying to use Bio::Assembly::IO, but it eats more than my entire RAM (3GB), and is extremely slow (~1 hour before it crashes).
Maybe you have an idea?
In addition, are you aware of other non-visual parsers of the ACE assembly format for Perl or other languages?

Many thanks,
Fungazid


From cjfields at illinois.edu  Fri Jun 26 13:00:41 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 26 Jun 2009 12:00:41 -0500
Subject: [Bioperl-l] Getting errors parsing TIGR XML in SeqIO
In-Reply-To: <7BEB494D4E69964C8292CE4EDA2B9811BBB946@LAUREL.ads.warwick.ac.uk>
References: <7BEB494D4E69964C8292CE4EDA2B9811BBB946@LAUREL.ads.warwick.ac.uk>
Message-ID: <FEC1932A-49FE-4E63-9727-F08520FF0252@illinois.edu>

If there are errors, this should be submitted as a bug.  Attach example  
data to the report after filing it (i.e. don't copy and paste it into  
the text box).

http://www.bioperl.org/wiki/Bugs

chris

On Jun 26, 2009, at 4:55 AM, Moore, Jonathan wrote:

> I'm trying to parse the TAIR9 Arabidopsis release from the TIGR XML  
> files at the TAIR FTP site.
>
> I've tried SeqIO with both tigr and tigrxml formats but both are  
> giving errors in 1.6.0.  Has anyone advice on whether it's likely to  
> be doable, or should I wait til the .gb files are available?
>
> Jay Moore
>


From plantboy at gmail.com  Fri Jun 26 14:46:35 2009
From: plantboy at gmail.com (cody h)
Date: Fri, 26 Jun 2009 11:46:35 -0700
Subject: [Bioperl-l] test suite failing on mac os x 10.5
Message-ID: <320708320906261146v2e799c82mc1b921218fc233c5@mail.gmail.com>

Hi,

I'm trying to install bioperl-db 1.5.2 on an intel mac running os 10.5.7.
The Build.PL file executes fine, but the test suite fails dramatically,
returning the error "No database selected" for many of the tests. All the
error calls seem to be originating from line 852 in
BasePersistenceAdaptor.pm. I took a look at the code but I could not figure
out why it wasn't working.

I have bioperl 1.5.2 installed and the biosql schema loaded into my mysql
server. The dependencies all seem to be working, but I haven't used them
enough to completely verify this, so that could be part of the problem. I
don't know which ones to check though. Does anyone have any idea why I might
be getting these "No database selected" errors? Here is a sample of the
error messages given by the ./Build test command (note: this same error is
generated by 15 of the 16 test files):

t/12ontology.t .... 1/738
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: error while executing statement in
Bio::DB::BioSQL::OntologyAdaptor::find_by_unique_key: No database selected
STACK: Error::throw
STACK: Bio::Root::Root::throw /Library/Perl/5.8.8/Bio/Root/Root.pm:359
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key
/Users/cody/Desktop/bioperl-db-1.5.2_100/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:948
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key
/Users/cody/Desktop/bioperl-db-1.5.2_100/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:852
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create
/Users/cody/Desktop/bioperl-db-1.5.2_100/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:182
STACK: Bio::DB::Persistent::PersistentObject::create
/Users/cody/Desktop/bioperl-db-1.5.2_100/blib/lib/Bio/DB/Persistent/PersistentObject.pm:244
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create
/Users/cody/Desktop/bioperl-db-1.5.2_100/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:169
STACK: Bio::DB::Persistent::PersistentObject::create
/Users/cody/Desktop/bioperl-db-1.5.2_100/blib/lib/Bio/DB/Persistent/PersistentObject.pm:244
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create
/Users/cody/Desktop/bioperl-db-1.5.2_100/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:169
STACK: Bio::DB::Persistent::PersistentObject::create
/Users/cody/Desktop/bioperl-db-1.5.2_100/blib/lib/Bio/DB/Persistent/PersistentObject.pm:244
STACK: t/12ontology.t:44
-----------------------------------------------------------
t/12ontology.t .... Dubious, test returned 255 (wstat 65280, 0xff00)


From maj at fortinbras.us  Fri Jun 26 14:50:02 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 26 Jun 2009 14:50:02 -0400
Subject: [Bioperl-l] Fw: Inquiry about a prog written by [MAJ]
Message-ID: <0581B2DAE8514F418127D54407384905@NewLife>

Thought this should be archived to the list. 
MAJ

----- Original Message ----- 
From: Mark A. Jensen 
To: Ross KK Leung 
Sent: Thursday, June 25, 2009 8:46 AM
Subject: Re: Inquiry about a prog written by you


Hi Ross-
Yes, you can specify the recombinants, as "A/C/G[subtype]" in the query string. Unfortunately, the 10000 record limit is imposed by the Los Alamos site that my program accesses. You might be able to work around this if you're willing to write your own script using the BioPerl modules that are the basis for hivq.PLS -- by using the modules to perform multiple queries, and collecting the entire set of sequences over that series of queries. 
You might look at the documentation for the modules for ideas; try looking at http://www.bioperl.org/wiki/Module:Bio::DB::HIV and http://www.bioperl.org/wiki/Module:Bio::DB::Query::HIVQuery . 
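
The "multiple queries, then collect" idea could be sketched roughly as
follows. This is only an outline under stated assumptions: the helper name
collect_over_queries, the query strings, and the toy data are all made up for
illustration, and the $fetch code ref is a stand-in for whatever calls to the
HIV modules documented at the pages above actually return the records.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: run several smaller queries and merge the results,
# keeping one record per accession. $fetch is a stand-in code ref that takes
# a query string and returns a list of [accession, sequence] pairs; in a real
# script it would wrap the BioPerl HIV modules.
sub collect_over_queries {
    my ($fetch, @queries) = @_;
    my %seen;
    my @records;
    for my $query (@queries) {
        for my $rec ($fetch->($query)) {
            my ($acc) = @$rec;
            next if $seen{$acc}++;   # the same record may match several queries
            push @records, $rec;
        }
    }
    return @records;
}

# Toy stand-in data so the sketch runs without network access.
my %fake_db = (
    'query one' => [ ['ACC1', 'ACGT'], ['ACC2', 'GGCC'] ],
    'query two' => [ ['ACC2', 'GGCC'], ['ACC3', 'TTAA'] ],
);
my @all = collect_over_queries(sub { @{ $fake_db{$_[0]} || [] } },
                               'query one', 'query two');
printf "%d unique records\n", scalar @all;
```

Each individual query would need to be narrow enough to stay under the
site's record limit for the merged set to be complete.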
best regards- 
Mark
  ----- Original Message ----- 
  From: Ross KK Leung 
  To: maj at fortinbras.us 
  Sent: Thursday, June 25, 2009 6:09 AM
  Subject: Inquiry about a prog written by you


  Dear Mark A. Jensen,

   
  A google search returns your program (http://code.open-bio.org/svnweb/index.cgi/bioperl/view/bioperl-live/trunk/scripts/DB-HIV/hivq.PLS)

   
  I wonder whether the program is able to search recombinants (e.g. B incl. recombinants) and retrieve more than 50000 records. This limit is a bottleneck of the web-based search.

   
  Thanks for your advice, Ross


From rmb32 at cornell.edu  Fri Jun 26 17:06:06 2009
From: rmb32 at cornell.edu (Robert Buels)
Date: Fri, 26 Jun 2009 14:06:06 -0700
Subject: [Bioperl-l] BioPerl at YAPC::2010
In-Reply-To: <33722D87-C60D-4040-AA58-D8D2DCBD5448@jays.net>
References: <OF04ACE416.A66212B4-ON852575E1.0068C346-852575E1.006A8246@lnotes-gw.ent.nwie.net>
	<33722D87-C60D-4040-AA58-D8D2DCBD5448@jays.net>
Message-ID: <4A45383E.40207@cornell.edu>

Reposting to bioperl list.

This is a really giant opportunity to expose some of the best 
technologists in the world to what we do in bioinformatics, and possibly 
to entice some of them to help us the heck out!  ;-)

Rob

On Jun 26, 2009, at 3:22 PM, BAIRH at nationwide.com wrote:
> I am the Columbus.PM YAPC::2010 conference coordinator and I would 
> like to have a "BioPerl" thread at YAPC::NA::2010 at Ohio State 
> University.  Can you offer any lecturer recommendations and could I 
> fill an entire multi day thread with BioPerl lectures?  I would also 
> like to "entice" MJD to come to YAPC with the use of BioPerl.
>
> Thanks for your thoughts.
>
> Heath Bair
> (Candybar)

-- 
Robert Buels
Bioinformatics Analyst, Sol Genomics Network
Boyce Thompson Institute for Plant Research
Tower Rd
Ithaca, NY  14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu


From cain.cshl at gmail.com  Fri Jun 26 17:12:37 2009
From: cain.cshl at gmail.com (Scott Cain)
Date: Fri, 26 Jun 2009 17:12:37 -0400
Subject: [Bioperl-l] BioPerl at YAPC::2010
In-Reply-To: <4A45383E.40207@cornell.edu>
References: <OF04ACE416.A66212B4-ON852575E1.0068C346-852575E1.006A8246@lnotes-gw.ent.nwie.net>
	<33722D87-C60D-4040-AA58-D8D2DCBD5448@jays.net>
	<4A45383E.40207@cornell.edu>
Message-ID: <D2A53AB2-E35A-499B-B81A-13B9D61752CA@gmail.com>

Cool--Columbus is just down the road.  I could give a talk (or even  
multiple talks) on a variety of GMOD topics (which I consider BioPerl  
related, since so much of what we do depends on BioPerl).

Scott

On Jun 26, 2009, at 5:06 PM, Robert Buels wrote:

> Reposting to bioperl list.
>
> This is a really giant opportunity to expose some of the best  
> technologists in the world to what we do in bioinformatics, and  
> possibly to entice some of them to help us the heck out!  ;-)
>
> Rob
>
> On Jun 26, 2009, at 3:22 PM, BAIRH at nationwide.com wrote:
>> I am the Columbus.PM YAPC::2010 conference coordinator and I would  
>> like to have a "BioPerl" thread at YAPC::NA::2010 at Ohio State  
>> University.  Can you offer any lecturer recommendations and could I  
>> fill an entire multi day thread with BioPerl lectures?  I would  
>> also like to "entice" MJD to come to YAPC with the use of BioPerl.
>>
>> Thanks for your thoughts.
>>
>> Heath Bair
>> (Candybar)
>
> -- 
> Robert Buels
> Bioinformatics Analyst, Sol Genomics Network
> Boyce Thompson Institute for Plant Research
> Tower Rd
> Ithaca, NY  14853
> Tel: 503-889-8539
> rmb32 at cornell.edu
> http://www.sgn.cornell.edu

-----------------------------------------------------------------------
Scott Cain, Ph. D. scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research


From cjfields at illinois.edu  Fri Jun 26 17:49:39 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 26 Jun 2009 16:49:39 -0500
Subject: [Bioperl-l] BioPerl at YAPC::2010
In-Reply-To: <4A45383E.40207@cornell.edu>
References: <OF04ACE416.A66212B4-ON852575E1.0068C346-852575E1.006A8246@lnotes-gw.ent.nwie.net>
	<33722D87-C60D-4040-AA58-D8D2DCBD5448@jays.net>
	<4A45383E.40207@cornell.edu>
Message-ID: <642C6C93-8FCD-4463-8A39-E15832F8714C@illinois.edu>

Well, if it's in Columbus I'll be there (I can make a drive out of it).

In short, we should probably get something going, yes. Lots of things  
we can talk about, inc. bioperl6, Bio::Moose, etc.

chris

On Jun 26, 2009, at 4:06 PM, Robert Buels wrote:

> Reposting to bioperl list.
>
> This is a really giant opportunity to expose some of the best  
> technologists in the world to what we do in bioinformatics, and  
> possibly to entice some of them to help us the heck out!  ;-)
>
> Rob
>
> On Jun 26, 2009, at 3:22 PM, BAIRH at nationwide.com wrote:
>> I am the Columbus.PM YAPC::2010 conference coordinator and I would  
>> like to have a "BioPerl" thread at YAPC::NA::2010 at Ohio State  
>> University.  Can you offer any lecturer recommendations and could I  
>> fill an entire multi day thread with BioPerl lectures?  I would  
>> also like to "entice" MJD to come to YAPC with the use of BioPerl.
>>
>> Thanks for your thoughts.
>>
>> Heath Bair
>> (Candybar)


From hartzell at alerce.com  Fri Jun 26 23:59:10 2009
From: hartzell at alerce.com (George Hartzell)
Date: Fri, 26 Jun 2009 20:59:10 -0700
Subject: [Bioperl-l] BioPerl at YAPC::2010
In-Reply-To: <4A45383E.40207@cornell.edu>
References: <OF04ACE416.A66212B4-ON852575E1.0068C346-852575E1.006A8246@lnotes-gw.ent.nwie.net>
	<33722D87-C60D-4040-AA58-D8D2DCBD5448@jays.net>
	<4A45383E.40207@cornell.edu>
Message-ID: <19013.39182.97468.604560@already.dhcp.gene.com>


This does seem like a great opportunity.  I think you/the-community
could put together at least a day, and maybe more, of Bio and Perl
stuff.  I think that it's important to range beyond the stuff that's
in the BioPerl namespace and pull in something from the Gene Ontology
project, the Ensembl project[s], maybe libbio, etc....

g.

Robert Buels writes:
 > Reposting to bioperl list.
 > 
 > This is a really giant opportunity to expose some of the best 
 > technologists in the world to what we do in bioinformatics, and possibly 
 > to entice some of them to help us the heck out!  ;-)
 > 
 > Rob
 > 
 > On Jun 26, 2009, at 3:22 PM, BAIRH at nationwide.com wrote:
 > > I am the Columbus.PM YAPC::2010 conference coordinator and I would 
 > > like to have a "BioPerl" thread at YAPC::NA::2010 at Ohio State 
 > > University.  Can you offer any lecturer recommendations and could I 
 > > fill an entire multi day thread with BioPerl lectures?  I would also 
 > > like to "entice" MJD to come to YAPC with the use of BioPerl.
 > >
 > > Thanks for your thoughts.
 > >
 > > Heath Bair
 > > (Candybar)


From cjfields at illinois.edu  Sat Jun 27 00:28:14 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 26 Jun 2009 23:28:14 -0500
Subject: [Bioperl-l] BioPerl at YAPC::2010
In-Reply-To: <19013.39182.97468.604560@already.dhcp.gene.com>
References: <OF04ACE416.A66212B4-ON852575E1.0068C346-852575E1.006A8246@lnotes-gw.ent.nwie.net>
	<33722D87-C60D-4040-AA58-D8D2DCBD5448@jays.net>
	<4A45383E.40207@cornell.edu>
	<19013.39182.97468.604560@already.dhcp.gene.com>
Message-ID: <EB3EB763-05F4-4F75-88F5-8A642E567ABA@illinois.edu>

Agree (and should add GMOD/Gbrowse to that as well).

chris

On Jun 26, 2009, at 10:59 PM, George Hartzell wrote:

>
> This does seem like a great opportunity.  I think you/the-community
> could put together at least a day, and maybe more, of Bio and Perl
> stuff.  I think that it's important to range beyond the stuff that's
> in the BioPerl namespace and pull in something from the Gene Ontology
> project, the Ensembl project[s], maybe libbio, etc....
>
> g.
>
> Robert Buels writes:
>> Reposting to bioperl list.
>>
>> This is a really giant opportunity to expose some of the best
>> technologists in the world to what we do in bioinformatics, and  
>> possibly
>> to entice some of them to help us the heck out!  ;-)
>>
>> Rob
>>
>> On Jun 26, 2009, at 3:22 PM, BAIRH at nationwide.com wrote:
>>> I am the Columbus.PM YAPC::2010 conference coordinator and I would
>>> like to have a "BioPerl" thread at YAPC::NA::2010 at Ohio State
>>> University.  Can you offer any lecturer recommendations and could I
>>> fill an entire multi day thread with BioPerl lectures?  I would also
>>> like to "entice" MJD to come to YAPC with the use of BioPerl.
>>>
>>> Thanks for your thoughts.
>>>
>>> Heath Bair
>>> (Candybar)


From maj at fortinbras.us  Sat Jun 27 00:56:41 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Sat, 27 Jun 2009 00:56:41 -0400
Subject: [Bioperl-l] BioPerl at YAPC::2010
In-Reply-To: <4A45383E.40207@cornell.edu>
References: <OF04ACE416.A66212B4-ON852575E1.0068C346-852575E1.006A8246@lnotes-gw.ent.nwie.net><33722D87-C60D-4040-AA58-D8D2DCBD5448@jays.net>
	<4A45383E.40207@cornell.edu>
Message-ID: <E6D907E51B8D477FBB635ED4B500C257@NewLife>

I think BioPerl has enough to talk about to have its own conference, 
which would coincide with its 15th anniversary in 2010. That may 
put the kibosh on the original  intent of the inviter, which ultimately is 
to get The Dominus to bite (and more power to her, I say. My 
programming style is forever changed, and I haven't even finished
The Book). 

If someone organizes it, I'll bring the chips and dip.
MAJ
----- Original Message ----- 
From: "Robert Buels" <rmb32 at cornell.edu>
To: "BioPerl List" <bioperl-l at lists.open-bio.org>
Cc: <BAIRH at nationwide.com>
Sent: Friday, June 26, 2009 5:06 PM
Subject: Re: [Bioperl-l] BioPerl at YAPC::2010


> Reposting to bioperl list.
> 
> This is a really giant opportunity to expose some of the best 
> technologists in the world to what we do in bioinformatics, and possibly 
> to entice some of them to help us the heck out!  ;-)
> 
> Rob
> 
> On Jun 26, 2009, at 3:22 PM, BAIRH at nationwide.com wrote:
>> I am the Columbus.PM YAPC::2010 conference coordinator and I would 
>> like to have a "BioPerl" thread at YAPC::NA::2010 at Ohio State 
>> University.  Can you offer any lecturer recommendations and could I 
>> fill an entire multi day thread with BioPerl lectures?  I would also 
>> like to "entice" MJD to come to YAPC with the use of BioPerl.
>>
>> Thanks for your thoughts.
>>
>> Heath Bair
>> (Candybar)


From maj at fortinbras.us  Sat Jun 27 01:30:34 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Sat, 27 Jun 2009 01:30:34 -0400
Subject: [Bioperl-l] BioPerl at YAPC::2010
In-Reply-To: <E6D907E51B8D477FBB635ED4B500C257@NewLife>
References: <OF04ACE416.A66212B4-ON852575E1.0068C346-852575E1.006A8246@lnotes-gw.ent.nwie.net><33722D87-C60D-4040-AA58-D8D2DCBD5448@jays.net><4A45383E.40207@cornell.edu>
	<E6D907E51B8D477FBB635ED4B500C257@NewLife>
Message-ID: <B44649FB157145A3BE7153D163802926@NewLife>

[...to *him*, that is...pardon]

----- Original Message ----- 
From: "Mark A. Jensen" <maj at fortinbras.us>
To: "Robert Buels" <rmb32 at cornell.edu>; "BioPerl List" 
<bioperl-l at lists.open-bio.org>
Sent: Saturday, June 27, 2009 12:56 AM
Subject: Re: [Bioperl-l] BioPerl at YAPC::2010


>I think BioPerl has enough to talk about to have its own conference, which 
>would coincide with its 15th anniversary in 2010. That may put the kibosh on 
>the original  intent of the inviter, which ultimately is to get The Dominus to 
>bite (and more power to her, I say. My programming style is forever changed, 
>and I haven't even finished
> The Book).
> If someone organizes it, I'll bring the chips and dip.
> MAJ
> ----- Original Message ----- 
> From: "Robert Buels" <rmb32 at cornell.edu>
> To: "BioPerl List" <bioperl-l at lists.open-bio.org>
> Cc: <BAIRH at nationwide.com>
> Sent: Friday, June 26, 2009 5:06 PM
> Subject: Re: [Bioperl-l] BioPerl at YAPC::2010
>
>
>> Reposting to bioperl list.
>>
>> This is a really giant opportunity to expose some of the best technologists 
>> in the world to what we do in bioinformatics, and possibly to entice some of 
>> them to help us the heck out!  ;-)
>>
>> Rob
>>
>> On Jun 26, 2009, at 3:22 PM, BAIRH at nationwide.com wrote:
>>> I am the Columbus.PM YAPC::2010 conference coordinator and I would like to 
>>> have a "BioPerl" thread at YAPC::NA::2010 at Ohio State University.  Can you 
>>> offer any lecturer recommendations and could I fill an entire multi day 
>>> thread with BioPerl lectures?  I would also like to "entice" MJD to come to 
>>> YAPC with the use of BioPerl.
>>>
>>> Thanks for your thoughts.
>>>
>>> Heath Bair
>>> (Candybar)


From kpclancy at hotmail.com  Sat Jun 27 06:04:20 2009
From: kpclancy at hotmail.com (Kevin Clancy)
Date: Sat, 27 Jun 2009 04:04:20 -0600
Subject: [Bioperl-l] Fwd: Hackathon tomorrow (I think)
In-Reply-To: <02C22CF3-4AAD-4B4D-8F54-2999998A34E0@illinois.edu>
References: <8F66D28E-CD28-4A8D-83B2-0FFCA611F58F@jays.net>
	<54655E2F-0D4B-4F4D-8608-13BB052FE201@jays.net>
	<1a0c1b750906240913g5e113dadxe3d98d3fce3a74d@mail.gmail.com>
	<F01E3C47-6651-488C-91F6-73F42A5A65F4@illinois.edu>
	<FEF61DB9-34A6-4884-BCAF-1CAC83E809F1@jays.net>
	<20E946BE-D9E1-4815-991A-7817E0655879@illinois.edu>
	<COL107-W17352BCFE4B6CE766EF2C3CE340@phx.gbl>
	<02C22CF3-4AAD-4B4D-8F54-2999998A34E0@illinois.edu>
Message-ID: <COL107-W978FB7B4A3E98561F84E5CE320@phx.gbl>


I think ISMB will be in Boston in 2010 (feels odd just typing that...)

Maybe that is enough of a running start to set something up.

kevin
 
> CC: jay at jays.net; vecchi.b at gmail.com; bioperl-l at bioperl.org
> From: cjfields at illinois.edu
> To: kpclancy at hotmail.com
> Subject: Re: [Bioperl-l] Fwd: Hackathon tomorrow (I think)
> Date: Wed, 24 Jun 2009 22:54:28 -0500
> 
> I have no idea; I don't think there are many bioperl devs attending 
> this year unfortunately. Any meetings in the next year where we could 
> set up a bioperl hackathon? I will likely be available to attend if 
> it's stateside...
> 
> chris
> 
> On Jun 24, 2009, at 9:31 PM, Kevin Clancy wrote:
> 
> >
> > is there an intention to have a hackathon at ISMB this weekend - I 
> > know there is a 2 day BOSC
> > kevin
> >
> >> From: cjfields at illinois.edu
> >> To: jay at jays.net
> >> Date: Wed, 24 Jun 2009 16:10:34 -0500
> >> CC: vecchi.b at gmail.com; bioperl-l at bioperl.org
> >> Subject: Re: [Bioperl-l] Fwd: Hackathon tomorrow (I think)
> >>
> >>
> >> On Jun 24, 2009, at 11:44 AM, Jay Hannah wrote:
> >>
> >>> On Jun 24, 2009, at 12:26 PM, Chris Fields wrote:
> >>>> Let me know if anyone needs collab on biomoose on github; Mark
> >>>> Jensen's already added.
> >>>
> >>> Anything on github should be trivial, even with no perms -- we can
> >>> just fork and then send you (whoever) pull requests. github++ :)
> >>>
> >>>> 1) Any help towards bugzilla fixes would be most welcome.
> >>>
> >>> I don't know how to make any progress in bugzilla if no one has a
> >>> commit bit...?
> >>
> >> For some reason I thought you had a commit bit; we can add you in if
> >> needed. Anyway, patches are most definitely welcome ;>
> >>
> >>>> 2) Better GFF3 integration
> >>>> 3) Typed but lightweight seqfeatures
> >>>
> >>> Are there bugzilla tickets (or somewhere) describing those?
> >>
> >> No as the issues are more complex than one single bug, but we do have
> >> something to help track for the time being:
> >>
> >> http://www.bioperl.org/wiki/GFF_Refactor
> >> http://www.bioperl.org/wiki/Align_Refactor
> >>
> >> I'll probably file TODOs during the process for those refactors. The
> >> easiest to tackle would be probably be Align/LocatableSeq refactors.
> >>
> >>> I wonder if anyone can help me get out of sporadic MailMan
> >>> purgatory...
> >>>
> >>> Thanks,
> >>>
> >>> j
> >>
> >> -c
> >>
> >> PS - Don't feel constrained by the above. There are many many areas
> >> to contribute to.


From hartzell at alerce.com  Sat Jun 27 13:08:10 2009
From: hartzell at alerce.com (George Hartzell)
Date: Sat, 27 Jun 2009 10:08:10 -0700
Subject: [Bioperl-l] BioPerl at YAPC::2010
In-Reply-To: <E6D907E51B8D477FBB635ED4B500C257@NewLife>
References: <OF04ACE416.A66212B4-ON852575E1.0068C346-852575E1.006A8246@lnotes-gw.ent.nwie.net>
	<33722D87-C60D-4040-AA58-D8D2DCBD5448@jays.net>
	<4A45383E.40207@cornell.edu>
	<E6D907E51B8D477FBB635ED4B500C257@NewLife>
Message-ID: <19014.20986.867646.940277@already.dhcp.gene.com>


I had an eye-opening time at YAPC, and I think that it would be very
powerful to have many members of the Bio & Perl community rubbing
elbows with the folks leading (and following, for that matter) the
"Modern Perl" movement (in the broader sense, not _just_ chromatic):
Moose, DBIx::Class, Dist::Zilla, KiokuDB, etc....  I think that it
would help pull BioPerl and the others towards powerful mainstream
technologies and expose many of us to new people, tricks, and tools.
Having us off on our own, or mingling with ISMB'ers, doesn't really
stir the pot.

g.


Mark A. Jensen writes:
 > I think BioPerl has enough to talk about to have its own conference, 
 > which would coincide with its 15th anniversary in 2010. That may 
 > put the kibosh on the original  intent of the inviter, which ultimately is 
 > to get The Dominus to bite (and more power to her, I say. My 
 > programming style is forever changed, and I haven't even finished
 > The Book). 
 > 
 > If someone organizes it, I'll bring the chips and dip.
 > MAJ
 > ----- Original Message ----- 
 > From: "Robert Buels" <rmb32 at cornell.edu>
 > To: "BioPerl List" <bioperl-l at lists.open-bio.org>
 > Cc: <BAIRH at nationwide.com>
 > Sent: Friday, June 26, 2009 5:06 PM
 > Subject: Re: [Bioperl-l] BioPerl at YAPC::2010
 > 
 > 
 > > Reposting to bioperl list.
 > > 
 > > This is a really giant opportunity to expose some of the best 
 > > technologists in the world to what we do in bioinformatics, and possibly 
 > > to entice some of them to help us the heck out!  ;-)
 > > 
 > > Rob
 > > 
 > > On Jun 26, 2009, at 3:22 PM, BAIRH at nationwide.com wrote:
 > >> I am the Columbus.PM YAPC::2010 conference coordinator and I would 
 > >> like to have a "BioPerl" thread at YAPC::NA::2010 at Ohio State 
 > >> University.  Can you offer any lecturer recommendations and could I 
 > >> fill an entire multi day thread with BioPerl lectures?  I would also 
 > >> like to "entice" MJD to come to YAPC with the use of BioPerl.
 > >>
 > >> Thanks for your thoughts.
 > >>
 > >> Heath Bair
 > >> (Candybar)


From richard.harrison at edinburgh.ac.uk  Mon Jun 29 18:43:54 2009
From: richard.harrison at edinburgh.ac.uk (Richard Harrison)
Date: Mon, 29 Jun 2009 23:43:54 +0100
Subject: [Bioperl-l] PopGen
Message-ID: <5FBB6056-386D-42E3-8236-1FEB8F5BE520@edinburgh.ac.uk>

Dear all,

I am having trouble with the PopGen modules and I was wondering if  
anyone had any ideas.

I am working with polymorphism data. I am trying to identify the  
derived vs ancestral allele between two species. I have been modifying  
the modules a bit to include different site models etc.  Here is where  
I fall over:

Within aln_to_population I can create a modified Genotype object to  
include details of the ancestral allele (see at end of this post).

However,  the problem that I have hit upon is that aln_to_population  
returns a population object, filled with IndividualI objects.  In  
other words, it takes my array of GenotypeI objects and converts them  
into IndividualI objects, wrapped in a single Population object.  This  
means that the information in the GenotypeI object about the ancestral/ 
derived states is lost. How can I overcome this?


Thanks,
Richard


###excerpt from aln_to_population


  $inds[$i]->add_Genotype(Bio::PopGen::Genotype->new
					   (-marker_name  => $nm,
					    -individual_id=> $inds[$i]->unique_id,
					    -alleles      => [$genotypes[$i]],
					    -outgroup      => $outgroup[0]));


###excerpt from Genotypes.pm

sub new {
   my($class, @args) = @_;

   my $self = $class->SUPER::new(@args);
   my ($name,$desc,$type,$uid,$af,$og) = $self->_rearrange([qw(NAME
							  DESCRIPTION
							  TYPE
							  UNIQUE_ID
							  ALLELE_FREQ
							  OUTGROUP)], @args);
   $self->{'_allele_freqs'} = {};
   $self->{'_outgroup_name'} = {};

   if( ! defined $uid ) {
       $uid = $UniqueCounter++;
   }
   if( defined $name) {
       $self->name($name);
   } else {
       $self->throw("Must provide a name when initializing a Marker");
   }
   defined $desc && $self->description($desc);
   defined $type && $self->type($type);


       $self->outgroup_name($og);


   $self->unique_id($uid);

   return $self;
}

=head2 outgroup_name

  Title   : outgroup_name
  Usage   : my $og = $marker->outgroup_name();
  Function: Get/Set the name of the outgroup
  Returns : string representing the name of the outgroup
  Args    : [optional] outgroup name


=cut

sub outgroup_name{
     my $self = shift;

     return $self->{'_outgroup_name'} = shift if @_;
     return $self->{'_outgroup_name'};
}
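
For illustration only, here is a minimal plain-Perl sketch of the idea of
keeping the outgroup state on the genotype and reading it back through the
individual that holds it. The Toy::Genotype and Toy::Individual packages and
the allele_state helper are hypothetical stand-ins, not the real Bio::PopGen
classes; the method names only mirror add_Genotype/get_Genotypes, and whether
the real modules keep a custom -outgroup argument is exactly what would need
checking.

```perl
use strict;
use warnings;

# Hypothetical toy class: a genotype that remembers its outgroup allele.
package Toy::Genotype;
sub new {
    my ($class, %args) = @_;
    return bless {
        marker_name => $args{-marker_name},
        alleles     => $args{-alleles} || [],
        outgroup    => $args{-outgroup},
    }, $class;
}
sub marker_name { return $_[0]{marker_name} }
sub get_Alleles { return @{ $_[0]{alleles} } }
sub outgroup_name {
    my $self = shift;
    $self->{outgroup} = shift if @_;    # setter when an argument is given
    return $self->{outgroup};
}

# Hypothetical toy class: an individual that stores genotype objects as-is.
package Toy::Individual;
sub new {
    my ($class, %args) = @_;
    return bless { id => $args{-unique_id}, genotypes => {} }, $class;
}
sub add_Genotype {
    my ($self, $genotype) = @_;
    $self->{genotypes}{ $genotype->marker_name } = $genotype;
    return;
}
sub get_Genotypes {
    my ($self) = @_;
    return map { $self->{genotypes}{$_} } sort keys %{ $self->{genotypes} };
}

package main;

# Classify the first allele of a genotype against its stored outgroup allele.
sub allele_state {
    my ($genotype) = @_;
    my ($allele) = $genotype->get_Alleles;
    return $allele eq $genotype->outgroup_name ? 'ancestral' : 'derived';
}

my $ind = Toy::Individual->new(-unique_id => 'seq1');
$ind->add_Genotype(Toy::Genotype->new(-marker_name => 'site12',
                                      -alleles     => ['A'],
                                      -outgroup    => 'G'));

for my $genotype ($ind->get_Genotypes) {
    printf "%s %s\n", $genotype->marker_name, allele_state($genotype);
}
```

The point of the sketch is only that the individual stores the genotype
object itself, so any extra attribute travels with it; if the real classes
behave the same way, walking the population's individuals and their genotypes
should give the outgroup back.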


-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.


From jason at bioperl.org  Tue Jun 30 01:03:08 2009
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 29 Jun 2009 22:03:08 -0700
Subject: [Bioperl-l] Getting errors parsing TIGR XML in SeqIO
In-Reply-To: <7BEB494D4E69964C8292CE4EDA2B9811BBB946@LAUREL.ads.warwick.ac.uk>
References: <7BEB494D4E69964C8292CE4EDA2B9811BBB946@LAUREL.ads.warwick.ac.uk>
Message-ID: <E6D82027-AF55-4E64-BC8F-71F3F60D0E7E@bioperl.org>

There are several flavors of TIGR XML (for rice, Arabidopsis, and  
other projects). Unfortunately I don't know which one the current  
tigrxml parser tracks, but you can compare the test files in t/data  
with the files you downloaded to see what is currently supported.  
The GenBank (gbk) files will usually parse more consistently, but we  
can try to work it out if it is a sensible transformation.

On Jun 26, 2009, at 2:55 AM, Moore, Jonathan wrote:

> I'm trying to parse the TAIR9 Arabidopsis release from the TIGR XML  
> files at the TAIR FTP site.
>
> I've tried SeqIO with both tigr and tigrxml formats but both are  
> giving errors in 1.6.0.  Has anyone advice on whether it's likely to  
> be doable, or should I wait til the .gb files are available?
>
> Jay Moore

--
Jason Stajich
jason at bioperl.org


From paola.bisignano at gmail.com  Tue Jun 30 05:12:49 2009
From: paola.bisignano at gmail.com (Paola Bisignano)
Date: Tue, 30 Jun 2009 11:12:49 +0200
Subject: [Bioperl-l] Bioperl-l Digest, Vol 74, Issue 25
In-Reply-To: <mailman.29.1246204805.13888.bioperl-l@lists.open-bio.org>
References: <mailman.29.1246204805.13888.bioperl-l@lists.open-bio.org>
Message-ID: <e9cf89740906300212jeac3fe6o2fa4414bb427f824@mail.gmail.com>

Hi,
I need a little help parsing a file. I searched the BioPerl modules,
but there are so many that I don't know where to start. I found
modules for all the databases and for various web sites, but not for
my favorite, PDBsum, so I have parsed a lot of things on my own, even
though I am new to Perl. Now I need help: I need to parse a FASTA
file produced by aligning sequences, and extract the aligned
sequences only for the PDB entries in my list.


my fasta file is like:

Query: /ebi/research/thornton/tmp/sas307986/seq.fasta
  1>>>Sequence 3e7e:A - 333 aa
Library: /ebi/research/thornton/www/databases/html/pdbsum/data/pdblib
17840403 residues in 79353 sequences

       opt      E()
< 20   286     0:===
  22     1     0:=          one = represents 135 library sequences
  24     1     0:=
  26     0     2:*
  28    21    18:*
  30    36   109:*
  32   237   421:== *
  34   956  1140:========*
  36  1924  2342:===============  *
  38  3591  3871:=========================== *
  40  4904  5400:=====================================  *
  42  6750  6600:================================================*=
  44  7145  7281:=====================================================*
  46  8047  7416:======================================================*=====
.........

>>2np8:A                                                  (159 aa)
 initn: 125 init1:  72 opt: 136  Z-score: 168.6  bits: 38.5 E(): 0.011
Smith-Waterman score: 136; 26.0% identity (57.1% similar) in 154 aa
overlap (59-204:13-153)

               10        20        30        40        50        60
Sequen NFIVGNPWDDKLIFKLLSGLSKPVSSYPNTFEWQCKLPAIKPKTEFQLGSKLVYVHHLLG
                                                                 ::
2np8:A                                               QWALEDFEIGRPLG
                                                             10

               70          80        90         100        110
Sequen EGAFAQVYEATQNKQKFVL--KVQKPANPWEFYIGTQLMER--LKPSMQH-MFMKFYSAH
       .: :..:: : ....::.:  ::   :.  .  .  :: ..  ..  ..:  ....:.
2np8:A KGKFGNVYLAREKQSKFILALKVLFKAQLEKAGVEHQLRREVEIQSHLRHPNILRLYG--
           20        30        40        50        60        70

         120         130       140       150       160       170
Sequen LFQNGS--VLVGELYSYGTLLNAINLYKNTPEKVMPQGLVISFAMRMLYMIEQVHDCEII
        :....   :. :    ::.   ..  ..  :.      . ..  ..   .   :. ..:
2np8:A YFHDATRVYLILEYAPLGTVYRELQKLSKFDEQR-----TATYITELANALSYCHSKRVI
             80        90       100            110       120

           180       190        200       210       220       230
Sequen HGDIKPDNFILGNGFLEQSAG-LALIDLGQSIDMKLFPKGTIFTAKCETSGFQCVEMLSN
       : ::::.:..::      ::: : . :.: :.
2np8:A HRDIKPENLLLG------SAGELKIADFGWSVHAPSSR
       130             140       150

            240       250       260       270       280       290
Sequen KPWNYQIDYFGVAATVYCMLFGTYMKVKNEGGECKPEGLFRRLPHLDMWNEFFHVMLNIP

            300       310       320       330
Sequen DCHHLPSLDLLRQKLKKVFQQHYTNKIRALRNRLIVLLLEC

>>2ojg:A                                                  (337 aa)
 initn:  85 init1:  53 opt: 140  Z-score: 168.1  bits: 39.5 E(): 0.012
Smith-Waterman score: 140; 20.3% identity (56.2% similar) in 217 aa
overlap (46-252:1-204)

               10        20        30        40        50        60
Sequen NFIVGNPWDDKLIFKLLSGLSKPVSSYPNTFEWQCKLPAIKPKTEFQLGSKLVYVHHLLG
                                                    :..: . . . .. :
2ojg:A                                              FDVGPRYTNLSYI-G
                                                            10

               70        80        90        100       110
Sequen EGAFAQVYEATQNKQKFVLKVQKPANPWEFYIGTQ-LMERLKPSMQHMFMKFYSAHLFQN
       :::...:  : .: .:  . ..:  .:.:     :  ....:     ....:   ...
2ojg:A EGAYGMVCSAYDNVNKVRVAIKK-ISPFEHQTYCQRTLREIK-----ILLRFRHENIIGI
           20        30         40        50             60

     120              130       140       150       160       170
Sequen GSVL-------VGELYSYGTLLNAINLYKNTPEKVMPQGLVISFAMRMLYMIEQVHDCEI
       ....       . ..:    :... .:::    . . .  .  : ...:  .. .:. ..
2ojg:A NDIIRAPTIEQMKDVYIVQDLMET-DLYKLLKTQHLSNDHICYFLYQILRGLKYIHSANV
       70        80        90        100       110       120

            180       190       200        210       220        230
Sequen IHGDIKPDNFILGNGFLEQSAGLALIDLGQS-IDMKLFPKGTIFTAKCETSGFQCVE-ML
       .: :.::.:..:..     .  : . :.: . .      .  ..:    :  ..  : ::
2ojg:A LHRDLKPSNLLLNT-----TCDLKICDFGLARVADPDHDHTGFLTEYVATRWYRAPEIML
       130       140            150       160       170       180

              240       250       260       270       280       290
Sequen SNKPWNYQIDYFGVAATVYCMLFGTYMKVKNEGGECKPEGLFRRLPHLDMWNEFFHVMLN
       ..: .. .:: ..:.  .  ::
2ojg:A NSKGYTKSIDIWSVGCILAEMLSNRPIFPGKHYLDQLNHILGILGSPSQEDLNCIINLKA
            190       200       210       220       230       240

              300       310       320       330
Sequen IPDCHHLPSLDLLRQKLKKVFQQHYTNKIRALRNRLIVLLLEC

2ojg:A RNYLLSLPHKNKVPWNRLFPNADSKALDLLDKMLTFNPHKRIEVEQALAHPYLEQYYDPS
            250       260       270       280       290       300

2ojg:A DEPIAEAPFKFELDDLPKEKLKELIFEETARFQPG
            310       320       330

>>2oji:A                                                  (344 aa)
 initn:  85 init1:  53 opt: 140  Z-score: 168.0  bits: 39.5 E(): 0.012
Smith-Waterman score: 140; 20.3% identity (56.2% similar) in 217 aa
overlap (46-252:5-208)

               10        20        30        40        50        60
Sequen NFIVGNPWDDKLIFKLLSGLSKPVSSYPNTFEWQCKLPAIKPKTEFQLGSKLVYVHHLLG
                                                    :..: . . . .. :
2oji:A                                          RGQVFDVGPRYTNLSYI-G
                                                        10

               70        80        90        100       110
Sequen EGAFAQVYEATQNKQKFVLKVQKPANPWEFYIGTQ-LMERLKPSMQHMFMKFYSAHLFQN
       :::...:  : .: .:  . ..:  .:.:     :  ....:     ....:   ...
2oji:A EGAYGMVCSAYDNVNKVRVAIKK-ISPFEHQTYCQRTLREIK-----ILLRFRHENIIGI
       20        30        40         50             60        70

     120              130       140       150       160       170
Sequen GSVL-------VGELYSYGTLLNAINLYKNTPEKVMPQGLVISFAMRMLYMIEQVHDCEI
       ....       . ..:    :... .:::    . . .  .  : ...:  .. .:. ..
2oji:A NDIIRAPTIEQMKDVYIVQDLMET-DLYKLLKTQHLSNDHICYFLYQILRGLKYIHSANV
             80        90        100       110       120       130

            180       190       200        210       220        230
Sequen IHGDIKPDNFILGNGFLEQSAGLALIDLGQS-IDMKLFPKGTIFTAKCETSGFQCVE-ML
       .: :.::.:..:..     .  : . :.: . .      .  ..:    :  ..  : ::
2oji:A LHRDLKPSNLLLNT-----TCDLKICDFGLARVADPDHDHTGFLTEYVATRWYRAPEIML
             140            150       160       170       180

              240       250       260       270       280       290
Sequen SNKPWNYQIDYFGVAATVYCMLFGTYMKVKNEGGECKPEGLFRRLPHLDMWNEFFHVMLN
       ..: .. .:: ..:.  .  ::
2oji:A NSKGYTKSIDIWSVGCILAEMLSNRPIFPGKHYLDQLNHILGILGSPSQEDLNCIINLKA
        190       200       210       220       230       240

              300       310       320       330
Sequen IPDCHHLPSLDLLRQKLKKVFQQHYTNKIRALRNRLIVLLLEC

2oji:A RNYLLSLPHKNKVPWNRLFPNADSKALDLLDKMLTFNPHKRIEVEQALAHPYLEQYYDPS
        250       260       270       280       290       300

2oji:A DEPIAEAPFKFDMELDDLPKEKLKELIFEETARFQPGY
        310       320       330       340

.......
I show only part of the file. What if I want, for example, only two
of those alignments? Are there modules to parse this? I've tried
parsing with regexes, but without results :-(
If anyone has suggestions for modules or anything else, I'll be very
happy to learn.
Thanks,
Paola
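
A possible starting point, as a hedged sketch: the file above looks like a
report from the FASTA search program, and BioPerl's Bio::SearchIO has a
'fasta' format parser for such reports that may be worth trying first. The
plain-Perl version below does not use BioPerl at all; it just keeps the
blocks whose ">>" header matches an id in a list. The function names
extract_hits and aligned_hit_row are made up for this example, and the code
assumes that hit rows are labelled with the first 6 characters of the id, as
in the excerpt.

```perl
use strict;
use warnings;

# Collect the report block of every wanted hit, keyed by hit id.
sub extract_hits {
    my ($report, @wanted) = @_;
    my %want = map { $_ => 1 } @wanted;
    my (%block, $current);
    for my $line (split /\n/, $report) {
        if ($line =~ /^>>>/) {          # query header or end marker, not a hit
            $current = undef;
            next;
        }
        if ($line =~ /^>>(\S+)/) {      # start of a new hit block
            $current = $want{$1} ? $1 : undef;
        }
        $block{$current} .= "$line\n" if defined $current;
    }
    return %block;
}

# Join the hit's alignment rows from one block into a single gapped string.
sub aligned_hit_row {
    my ($block, $id) = @_;
    my $label = substr($id, 0, 6);      # assumption: rows use a 6-character label
    my $row = '';
    for my $line (split /\n/, $block) {
        $row .= $1 if $line =~ /^\Q$label\E\s+(\S+)\s*$/;
    }
    return $row;
}

# Shortened sample in the same layout as the report above.
my $report = <<'END';
>>2np8:A                                                  (159 aa)
 initn: 125 init1:  72 opt: 136  Z-score: 168.6  bits: 38.5 E(): 0.011

Sequen EGAFAQVYEA
       .: :..:: :
2np8:A KGKFGNVYLA

Sequen LFQNGS--VL
2np8:A YFHDATRVYL

>>2ojg:A                                                  (337 aa)
Sequen EGAFAQVYEA
2ojg:A EGAYGMVCSA
END

my %hits = extract_hits($report, '2np8:A');
print "$_\t", aligned_hit_row($hits{$_}, $_), "\n" for sort keys %hits;
```

The list of wanted ids would come from your own file of PDB entries; the
query rows ("Sequen") can be collected the same way by passing a different
label.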


From giles.weaver at googlemail.com  Tue Jun 30 07:28:25 2009
From: giles.weaver at googlemail.com (Giles Weaver)
Date: Tue, 30 Jun 2009 12:28:25 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <200906170927.13273.tristan.lefebure@gmail.com>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<92C15E3391F64BAF801754E924122540@NewLife>
	<200906170927.13273.tristan.lefebure@gmail.com>
Message-ID: <1d06cd5d0906300428x59c004f1h200bfe3c23ed769@mail.gmail.com>

I'm developing a transcriptomics database for use with next-gen data, and
have found processing the raw data to be a big hurdle.

I'm a bit late in responding to this thread, so most issues have already
been discussed. One thing that hasn't been mentioned is removal of adapters
from raw Illumina sequence. This is a PITA, and I'm not aware of any well
developed and documented open source software for removal of adapters (and
poor quality sequence) from Illumina reads.
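
To make the adapter problem concrete, here is a minimal pure-Perl sketch of
3' adapter clipping by exact matching. It is not what my pipeline does (that
goes through emboss), and the function name clip_adapter, the example adapter
string, and the minimum overlap are all assumptions made up for the example.
It tolerates no mismatches, so real reads with sequencing errors would need
something smarter.

```perl
use strict;
use warnings;

# Remove a 3' adapter from a read using exact matching only.
# A match is either the whole adapter inside the read, or a prefix of the
# adapter (at least $min_overlap bases long) at the very end of the read.
sub clip_adapter {
    my ($read, $adapter, $min_overlap) = @_;
    for my $pos (0 .. length($read) - $min_overlap) {
        my $tail = substr($read, $pos);
        my $n = length($tail) < length($adapter) ? length($tail)
                                                 : length($adapter);
        # Compare the start of the tail with the start of the adapter.
        return substr($read, 0, $pos)
            if substr($tail, 0, $n) eq substr($adapter, 0, $n);
    }
    return $read;    # no adapter found, read is returned unchanged
}

my $adapter = 'AGATCGGAAG';    # placeholder adapter, for illustration only
print clip_adapter('ACGTAGATCGGAAGTT', $adapter, 3), "\n";
print clip_adapter('ACGTTTGACCAGAT',   $adapter, 3), "\n";
```

The scan runs from the left so that the leftmost adapter occurrence wins; a
small minimum overlap removes more partial adapters but also clips more
reads by chance.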

My current Illumina sequence processing pipeline is an unholy mix of
biopython, bioperl, pure perl, emboss and bowtie. Biopython for converting
the Illumina fastq to Sanger fastq, bioperl to read the quality values, pure
perl to trim the poor quality sequence from each read, and bioperl with
emboss to remove the adapter sequence. I'm aware that the pipeline contains
bugs and would like to simplify it, but at least it does work...
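
As an aside, the fastq conversion step can be sketched in a few lines of pure
Perl, assuming the reads use the Illumina 1.3+ encoding (Phred scores with an
ASCII offset of 64). Older Solexa-scaled files need a different, logarithmic
conversion that this sketch does not attempt. The function name
illumina_to_sanger is made up for the example.

```perl
use strict;
use warnings;

# Convert a quality string from Phred+64 (Illumina 1.3+, assumed here)
# to Phred+33 (Sanger) by shifting every character down by 31.
sub illumina_to_sanger {
    my ($qual) = @_;
    return join '', map { chr(ord($_) - 31) } split //, $qual;
}

# 'h' is Q40, 'B' is Q2 and '@' is Q0 in the Phred+64 encoding.
print illumina_to_sanger('hB@'), "\n";
```

Only the quality line of each four-line fastq record would be passed through
this; the header and sequence lines stay as they are.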

Ideally I'd like to replace as much of the pipeline as possible with
bioperl/bioperl-run, but this isn't currently possible due to both a lack of
features and poor performance. I'm sure the features will come with time,
but the performance is more of a concern to me. I wonder if Bio::Moose might
be used to alleviate some of the performance issues? Might next-gen modules
be an ideal guinea pig for Bio::Moose?

For my purposes, the features that I would love to see supported in
bioperl/bioperl-run are:

   - next-gen sequence quality parsing (to output phred scores)
   - sequence quality based trimming
   - sequencing adapter removal
   - filtering based on sequence complexity (repeats, entropy etc)
   - bioperl-run modules for bowtie etc.

Obviously all of these need to be fast!
I'd love to muck in, but I doubt I'll contribute much before
Bio::Moose/bioperl6, as the (bio)perl object system gives me nightmares!

Regarding trimming bad quality bases (see comments from Tristan Lefebure)
from Solexa/Illumina reads, I did find a mixed pure/bioperl solution to be
much faster than a primarily bioperl based implementation. I found
Bio::Seq->subseq(a,b) and Bio::Seq->subqual(a,b) to be far too slow. My
current code trims ~1300 sequences/second, including unzipping the raw data
and converting it to sanger fastq with biopython. Processing an entire
sequencing run with the whole pipeline takes in the region of 6-12h.
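
For anyone wondering what the pure perl trimming amounts to, this is a
minimal sketch of the general approach rather than my actual code: plain
string operations on the sequence and quality lines, with no objects. It
assumes Sanger fastq (Phred+33) and simply cuts bases off the 3' end until
one reaches the threshold; the function name trim_3prime and the threshold of
20 are assumptions for the example.

```perl
use strict;
use warnings;

# Trim low-quality bases from the 3' end of a read.
# $qual is a Sanger (Phred+33) quality string of the same length as $seq.
sub trim_3prime {
    my ($seq, $qual, $min_q) = @_;
    my $keep = length $qual;
    # Walk back from the end while the last kept base is below the threshold.
    $keep-- while $keep > 0
        && ord(substr($qual, $keep - 1, 1)) - 33 < $min_q;
    return (substr($seq, 0, $keep), substr($qual, 0, $keep));
}

# 'I' is Q40 and '#' is Q2 in the Phred+33 encoding.
my ($seq, $qual) = trim_3prime('ACGTACGT', 'IIIII###', 20);
print "$seq\n$qual\n";
```

Working on the raw strings like this avoids building a sequence object per
read, which is where I suspect most of the time goes in the bioperl version.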

Hope this looooong post was of interest to someone!

Giles

2009/6/17 Tristan Lefebure <tristan.lefebure at gmail.com>

> Hello,
> Regarding next-gen sequences and bioperl, following my
> experience, another issue is bioperl speed. For example, if
> you want to trim bad quality bases at ends of 1E6 Solexa
> reads using Bio::SeqIO::fastq and some methods in
> Bio::Seq::Quality, well, you've got to be patient (but maybe
> I missed some shortcuts...).
>
> A pure perl solution will be between 100 and 1000x faster...
> Would it be possible to have an ultra-light quality object
> with few simple methods for next-gen reads?
>
> I can contribute some tests if that sounds like an important
> point.
>
> -Tristan
>


From manchunjohn-ma at uiowa.edu  Tue Jun 30 12:17:08 2009
From: manchunjohn-ma at uiowa.edu (John M.C. Ma)
Date: Tue, 30 Jun 2009 11:17:08 -0500
Subject: [Bioperl-l] Bio::Tools::Run::RepeatMasker crashes perl
Message-ID: <5486b2980906300917m20e8cd06sbaee207aed3a27c9@mail.gmail.com>

Hi everyone,

(OS: OpenSuSE 11.1, Versions: Perl:v5.10.0-i586-linux-thread-multi,
Bioperl: 1.6.0-cpan, Bioperl-run: 1.6.1-cpan, Ensembl: Ver 54-cvs)

This is the first time I have used Bio::Tools::Run::RepeatMasker, and it
crashed in a way I can't explain. I suspect the problem is on my
side.

My code pulls a sequence from Ensembl-variation, puts it into a
PrimarySeq object and runs RepeatMasker on it:

use strict;
use warnings;
use Bio::SeqIO;
use Bio::PrimarySeq;
use Bio::Tools::Run::RepeatMasker;
use Bio::EnsEMBL::Registry;
use Bio::EnsEMBL::Variation::Variation;
[snips most Ensembl code as the sequence itself looks OK]
	my $ref_allele=$snp_obj->five_prime_flanking_seq.${$snp_obj->get_all_Alleles}[0]->allele.$snp_obj->three_prime_flanking_seq;
	my $mask_seq=Bio::PrimarySeq->new (-seq=>$ref_allele);
	my $rmasker_handle=Bio::Tools::Run::RepeatMasker->new(-species=>'rat',-noisy=>"1");
	my @masked_features=$rmasker_handle->run($mask_seq);
	my $masked_seq=$rmasker_handle->run;

And when I let the wrapper run, perl crashed with these warnings:

--------------------- WARNING ---------------------
MSG: RepeatMasker didn't find any repetitive sequences

---------------------------------------------------

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Could not open /tmp/EWLAmIVymd/wByClB8iqr.masked: No such file or directory
STACK: Error::throw
STACK: Bio::Root::Root::throw
/usr/lib/perl5/site_perl/5.10.0/Bio/Root/Root.pm:357
STACK: Bio::Root::IO::_initialize_io
/usr/lib/perl5/site_perl/5.10.0/Bio/Root/IO.pm:310
STACK: Bio::SeqIO::_initialize /usr/lib/perl5/site_perl/5.10.0/Bio/SeqIO.pm:450
STACK: Bio::SeqIO::fasta::_initialize
/usr/lib/perl5/site_perl/5.10.0/Bio/SeqIO/fasta.pm:81
STACK: Bio::SeqIO::new /usr/lib/perl5/site_perl/5.10.0/Bio/SeqIO.pm:347
STACK: Bio::SeqIO::new /usr/lib/perl5/site_perl/5.10.0/Bio/SeqIO.pm:373
STACK: Bio::Tools::Run::RepeatMasker::_run
/usr/lib/perl5/site_perl/5.10.0/Bio/Tools/Run/RepeatMasker.pm:320
STACK: Bio::Tools::Run::RepeatMasker::run
/usr/lib/perl5/site_perl/5.10.0/Bio/Tools/Run/RepeatMasker.pm:260
STACK: main::SeqList
/home/johnma/workspace/TaqMan_SNP_Order/TaqMan_SNP_Order.pl:40
STACK: /home/johnma/workspace/TaqMan_SNP_Order/TaqMan_SNP_Order.pl:63
-----------------------------------------------------------

What could be going wrong?
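
Judging only from the stack trace, the exception is thrown when the wrapper tries to open a .masked file that does not exist, right after RepeatMasker reported finding no repeats. Until the cause is confirmed, one possible workaround is to trap the exception with eval. A minimal sketch: try_run is a made-up helper, and the commented usage assumes the $rmasker_handle and $mask_seq from the code above.

```perl
use strict;
use warnings;

# Run a code ref and trap any exception it throws.
# Returns (1, @results) on success and (0, $error_message) on failure.
sub try_run {
    my ($code) = @_;
    my @results = eval { $code->() };
    return ( 0, "$@" ) if $@;
    return ( 1, @results );
}

# Intended use with the wrapper above (not run here):
#   my ( $ok, @feats ) = try_run( sub { $rmasker_handle->run($mask_seq) } );
#   warn "RepeatMasker wrapper failed: $feats[0]" unless $ok;

my ( $ok, $err ) = try_run( sub { die "Could not open masked file\n" } );
print "trapped: $err" unless $ok;
```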

Cheers,

John Ma,
University of Iowa


From cjfields at illinois.edu  Tue Jun 30 13:46:27 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 30 Jun 2009 12:46:27 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <1d06cd5d0906300428x59c004f1h200bfe3c23ed769@mail.gmail.com>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<92C15E3391F64BAF801754E924122540@NewLife>
	<200906170927.13273.tristan.lefebure@gmail.com>
	<1d06cd5d0906300428x59c004f1h200bfe3c23ed769@mail.gmail.com>
Message-ID: <6723B5A0-9A21-4851-BD88-0BA3CC107439@illinois.edu>


On Jun 30, 2009, at 6:28 AM, Giles Weaver wrote:

> I'm developing a transcriptomics database for use with next-gen  
> data, and
> have found processing the raw data to be a big hurdle.
>
> I'm a bit late in responding to this thread, so most issues have  
> already
> been discussed. One thing that hasn't been mentioned is removal of  
> adapters
> from raw Illumina sequence. This is a PITA, and I'm not aware of any  
> well
> developed and documented open source software for removal of  
> adapters (and
> poor quality sequence) from Illumina reads.
>
> My current Illumina sequence processing pipeline is an unholy mix of
> biopython, bioperl, pure perl, emboss and bowtie. Biopython for  
> converting
> the Illumina fastq to Sanger fastq, bioperl to read the quality  
> values, pure
> perl to trim the poor quality sequence from each read, and bioperl  
> with
> emboss to remove the adapter sequence. I'm aware that the pipeline  
> contains
> bugs and would like to simplify it, but at least it does work...

My local bioperl is working with FASTQ parsing of Sanger and Illumina  
(but not solexa yet).  I'll commit what I have today, and we should be  
able to add in solexa soon.  We'll also need to add in write_seq  
support.

> Ideally I'd like to replace as much of the pipeline as possible with
> bioperl/bioperl-run, but this isn't currently possible due to both a  
> lack of
> features and poor performance. I'm sure the features will come with  
> time,
> but the performance is more of a concern to me. I wonder if  
> Bio::Moose might
> be used to alleviate some of the performance issues? Might next-gen  
> modules
> be an ideal guinea pig for Bio::Moose?

We should get FASTQ working in core first then optimize on speed (as  
Elia previously pointed out).  We can do that within the actual SeqIO  
parser using a few simple tricks. For instance my local  
Bio::SeqIO::fastq has a reconfigured next_seq to call an iterator that  
returns raw processed data as a simple hash ref; users have access to  
that method, so if one wanted they could retrieve the raw data  
directly, or pass it through a filter that only creates seq instances  
one wants on the fly (that would be where your quality checks, adaptor  
modification, etc. fit in).
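
To illustrate the idea in pure Perl (this is not the Bio::SeqIO::fastq code), here is a sketch of an iterator that hands back each record as a plain hash ref, so a filter can run before any object is built. It assumes simple four-line FASTQ records, and the key names id, seq and qual are made up for the example.

```perl
use strict;
use warnings;

# Return a closure that yields one four-line FASTQ record per call as a
# plain hash ref, or undef at end of input. No objects are created.
sub fastq_iterator {
    my ($fh) = @_;
    return sub {
        defined( my $head = <$fh> ) or return;
        my $seq  = <$fh>;
        my $plus = <$fh>;
        my $qual = <$fh>;
        return unless defined $qual;    # truncated record
        chomp( $head, $seq, $qual );
        $head =~ s/^\@//;
        return { id => $head, seq => $seq, qual => $qual };
    };
}

# Demo on an in-memory file handle.
my $fastq = "\@read1\nACGT\n+\nIIII\n\@read2\nNNNN\n+\n!!!!\n";
open my $fh, '<', \$fastq or die "cannot open in-memory handle: $!";
my $next = fastq_iterator($fh);
while ( my $rec = $next->() ) {
    next if $rec->{seq} =~ /N/;    # cheap filter before any object is built
    print "$rec->{id}\t$rec->{seq}\n";
}
```

Only the records that survive the filter would then be passed on to a sequence factory.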

In the end it might be best to wrap a C/C++-based solution for speed.  As
mentioned previously a C-based parser exists from Sanger Centre that
we could incorporate in some fashion, but I would like it to be able
to report back file position for fast indexing.  The code is fairly
simple so it shouldn't be too hard to incorporate that in somehow.

Just so there is no confusion, Bio::Moose is an attempt to both lay
out plans for perl6 and deal with inheritance issues within bioperl
now. It's still in very early development and may not see a release
until Dec. at the very earliest; that will be an alpha release and
likely won't have every major class represented.  It's also not
intended to be backwards-compatible with bioperl core.  It may help
with performance, but that's not an absolute certainty.  As for
bioperl6, it will be pre-alpha until the perl6 spec reaches a stable
draft and we have an active implementation.

> For my purposes, the tools that I would love to see supported in
> bioperl/bioperl-run are:
>
>   - next-gen sequence quality parsing (to output phred scores)
>   - sequence quality based trimming
>   - sequencing adapter removal
>   - filtering based on sequence complexity (repeats, entropy etc)
>   - bioperl-run modules for bowtie etc.
>
> Obviously all of these need to be fast!
> I'd love to muck in, but I doubt I'll contribute much before
> Bio::Moose/bioperl6, as the (bio)perl object system gives me  
> nightmares!

One can only read a file so fast (even with a highly optimized C/C++  
based parser), but I don't think that will be the limiting factor as  
much as object instantiation.

> Regarding trimming bad quality bases (see comments from Tristan  
> Lefebure)
> from Solexa/Illumina reads, I did find a mixed pure/bioperl solution  
> to be
> much faster than a primarily bioperl based implementation. I found
> Bio::Seq->subseq(a,b) and Bio::Seq->subqual(a,b) to be far too slow.  
> My
> current code trims ~1300 sequences/second, including unzipping the  
> raw data
> and converting it to sanger fastq with biopython. Processing an entire
> sequencing run with the whole pipeline takes in the region of 6-12h.

Right, hence coming up with a 'pre-filter' for raw data (hash refs)  
prior to object instantiation to speed things up.  This will be a bit  
easier with Bio::Moose as we can introspect attributes via the meta  
class, but this will be a while yet.

> Hope this looooong post was of interest to someone!
>
> Giles

It's always good to hear about such issues and what one expects.

chris


From cjfields at illinois.edu  Tue Jun 30 17:58:57 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 30 Jun 2009 16:58:57 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <4A42AC51.3090809@ebi.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com>
	<CECB8B74-C423-4792-8857-0F3E40EAECE3@illinois.edu>
	<4A40B5D6.40504@ebi.ac.uk>
	<320fb6e00906230429l77a2049eg35e47f94026111c6@mail.gmail.com>
	<4A40C909.40803@ebi.ac.uk>
	<320fb6e00906231439j3beecda0o6e75c4223e960871@mail.gmail.com>
	<4A421287.4000203@ebi.ac.uk>
	<320fb6e00906240756m757ee12nb53ecc527b8e2480@mail.gmail.com>
	<6DCEA6E1-8C53-4C04-9F8A-E336BE5753B6@illinois.edu>
	<320fb6e00906240927u62588e22je8186bb172e904a1@mail.gmail.com>
	<5C050D50-5723-4C93-89D0-7280D202E737@illinois.edu>
	<4A42AC51.3090809@ebi.ac.uk>
Message-ID: <A9776DF4-CE78-4973-9ADC-7594A3DAA118@illinois.edu>

All,

I have committed the first run at adding Illumina/Solexa parsing for  
FASTQ along with tests.  It's very possible the quality scores are  
off, particularly for Solexa (Illumina 1.0), so test away and let me  
know if anything pops up (should be a quick fix).  Along with that is  
a small commit to Bio::SeqIO so that we can add format variants (see  
below for an example).  write_seq/write_qual/write_fastq will likely  
not work as expected as I haven't touched them; they are to be tackled  
next.

For faster parsing I have also added a next_dataset method that  
returns a hash reference to the parsed data instead of an object; this  
hash includes quality scores.  This method is called by next_seq and  
the relevant data is passed in to the sequence factory directly; one  
could do something like the following to filter sequences as needed:

use Modern::Perl;
use Bio::SeqIO;
use Bio::Seq::SeqFactory;

my $file = shift;

# same as (-format   => 'fastq', -variant => 'illumina')
my $in = Bio::SeqIO->new(-file     => $file,
                          -format   => 'fastq-illumina');

my $factory = Bio::Seq::SeqFactory->new(-type => 'Bio::Seq::Quality');

while (my $data = $in->next_dataset) {
     next if seq_is_crap($data);
     my $seq = $factory->create(%$data);
}

sub seq_is_crap { # filter here
}
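
As an illustration of what such a filter might look like, here is a sketch of seq_is_crap. The key names -seq and -qual are an assumption about the hash that next_dataset returns (named arguments suitable for the sequence factory); dump one record with Data::Dumper to check the keys in your version before relying on them. Both cutoffs are arbitrary.

```perl
use strict;
use warnings;

# Reject a record if it is too short or its mean quality is too low.
# ASSUMPTION: $data->{-seq} is the sequence string and $data->{-qual}
# is an array ref of Phred scores.
sub seq_is_crap {
    my ($data) = @_;
    my $quals = $data->{-qual} || [];
    return 1 if length( $data->{-seq} || '' ) < 20;    # arbitrary length cutoff
    return 1 unless @$quals;
    my $sum = 0;
    $sum += $_ for @$quals;
    return $sum / @$quals < 20 ? 1 : 0;                # arbitrary quality cutoff
}

my %good = ( -seq => 'ACGT' x 6, -qual => [ (30) x 24 ] );
print seq_is_crap( \%good ) ? "rejected\n" : "kept\n";
```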


chris


From maj at fortinbras.us  Tue Jun 30 21:41:16 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 30 Jun 2009 21:41:16 -0400
Subject: [Bioperl-l] Parsing a FASTA file (Was:  Bioperl-l Digest, Vol 74,
	Issue 25)
In-Reply-To: <e9cf89740906300212jeac3fe6o2fa4414bb427f824@mail.gmail.com>
References: <mailman.29.1246204805.13888.bioperl-l@lists.open-bio.org>
	<e9cf89740906300212jeac3fe6o2fa4414bb427f824@mail.gmail.com>
Message-ID: <9D386274308C4DF98E38918477801541@NewLife>

Hi Paola, 

You want to try Bio::SearchIO, I think. It's not quite clear what you 
want to do, but here's an example of what you can do: 

Get all high-scoring pairs ( the mini-alignments ) involving
the database sequence called "2ojg:A"--

 use Bio::SearchIO;
 
 my $io = Bio::SearchIO->new(-format=>'fasta', -file=>'yourfile.fasta');
 my $result = $io->next_result;
 my @desired_hsps;

 while ( my $hit = $result->next_hit ) {
   push @desired_hsps, grep { $_->subject->seq_id =~ /2ojg:A/ } $hit->hsps;
 }
 
 # now all your desired hsps are in the array @desired_hsps;
 # you can get Bio::SimpleAlign objects from them all, for example:
 my @aligns = map { $_->get_aln } @desired_hsps;
 #...and lots of other things...

Look at http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_SearchIO
and http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods 
for a nice introduction to the Bio::SearchIO system by its authors. They 
use a blast output as an example, but everything applies to fasta output 
as well.

You didn't waste your time writing regexps, by the way. For a Perl
student, that kind of work is like money in the bank.
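
For comparison, the regexp route can be sketched in a few lines. This assumes the hit headers in the raw file look like ">>2ojg:A ... (337 aa)" (in the quoted message below they pick up an extra ">" from the mail quoting) and that each is followed by a "Smith-Waterman score:" line. It collects only the summary statistics, not the alignment rows, and the sub name is made up.

```perl
use strict;
use warnings;

# Collect id, length, score, identity and overlap for each hit in
# FASTA search output.
sub parse_fasta_hits {
    my ($text) = @_;
    my ( @hits, $current );
    for my $line ( split /\n/, $text ) {
        if ( $line =~ /^>>([^>\s]\S*)\s+\((\d+) aa\)/ ) {
            $current = { id => $1, length => $2 };
            push @hits, $current;
        }
        elsif ( $current
            && $line =~ /^Smith-Waterman score:\s*(\d+);\s*([\d.]+)% identity.*in (\d+) aa/ )
        {
            @{$current}{qw(score identity overlap)} = ( $1, $2, $3 );
        }
    }
    return @hits;
}

my $report = <<'END';
>>2ojg:A                                                  (337 aa)
initn:  85 init1:  53 opt: 140  Z-score: 168.1  bits: 39.5 E(): 0.012
Smith-Waterman score: 140; 20.3% identity (56.2% similar) in 217 aa
overlap (46-252:1-204)
END

for my $hit ( grep { $_->{id} eq '2ojg:A' } parse_fasta_hits($report) ) {
    print "$hit->{id}: $hit->{identity}% identity over $hit->{overlap} aa\n";
}
```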

cheers, 
Mark
      

----- Original Message ----- 
From: "Paola Bisignano" <paola.bisignano at gmail.com>
To: <bioperl-l at lists.open-bio.org>
Sent: Tuesday, June 30, 2009 5:12 AM
Subject: Re: [Bioperl-l] Bioperl-l Digest, Vol 74, Issue 25


> Hi,
> I need a little help parsing a file. I searched the BioPerl modules,
> but there are a lot of them and I don't know where to start. I found
> modules for all sorts of databases and web sites, but not for my
> favorite, PDBsum, so I have parsed a lot of things on my own, even
> though I am new to Perl. Now I need help: I have to parse a FASTA
> file produced by aligning sequences, and extract the aligned
> sequences only for the PDB entries in my list.
> 
> 
> my fasta file is like:
> 
> Query: /ebi/research/thornton/tmp/sas307986/seq.fasta
>  1>>>Sequence 3e7e:A - 333 aa
> Library: /ebi/research/thornton/www/databases/html/pdbsum/data/pdblib
> 17840403 residues in 79353 sequences
> 
>       opt      E()
> < 20   286     0:===
>  22     1     0:=          one = represents 135 library sequences
>  24     1     0:=
>  26     0     2:*
>  28    21    18:*
>  30    36   109:*
>  32   237   421:== *
>  34   956  1140:========*
>  36  1924  2342:===============  *
>  38  3591  3871:=========================== *
>  40  4904  5400:=====================================  *
>  42  6750  6600:================================================*=
>  44  7145  7281:=====================================================*
>  46  8047  7416:======================================================*=====
> .........
> 
>>>2np8:A                                                  (159 aa)
> initn: 125 init1:  72 opt: 136  Z-score: 168.6  bits: 38.5 E(): 0.011
> Smith-Waterman score: 136; 26.0% identity (57.1% similar) in 154 aa
> overlap (59-204:13-153)
> 
>               10        20        30        40        50        60
> Sequen NFIVGNPWDDKLIFKLLSGLSKPVSSYPNTFEWQCKLPAIKPKTEFQLGSKLVYVHHLLG
>                                                                 ::
> 2np8:A                                               QWALEDFEIGRPLG
>                                                             10
> 
>               70          80        90         100        110
> Sequen EGAFAQVYEATQNKQKFVL--KVQKPANPWEFYIGTQLMER--LKPSMQH-MFMKFYSAH
>       .: :..:: : ....::.:  ::   :.  .  .  :: ..  ..  ..:  ....:.
> 2np8:A KGKFGNVYLAREKQSKFILALKVLFKAQLEKAGVEHQLRREVEIQSHLRHPNILRLYG--
>           20        30        40        50        60        70
> 
>         120         130       140       150       160       170
> Sequen LFQNGS--VLVGELYSYGTLLNAINLYKNTPEKVMPQGLVISFAMRMLYMIEQVHDCEII
>        :....   :. :    ::.   ..  ..  :.      . ..  ..   .   :. ..:
> 2np8:A YFHDATRVYLILEYAPLGTVYRELQKLSKFDEQR-----TATYITELANALSYCHSKRVI
>             80        90       100            110       120
> 
>           180       190        200       210       220       230
> Sequen HGDIKPDNFILGNGFLEQSAG-LALIDLGQSIDMKLFPKGTIFTAKCETSGFQCVEMLSN
>       : ::::.:..::      ::: : . :.: :.
> 2np8:A HRDIKPENLLLG------SAGELKIADFGWSVHAPSSR
>       130             140       150
> 
>            240       250       260       270       280       290
> Sequen KPWNYQIDYFGVAATVYCMLFGTYMKVKNEGGECKPEGLFRRLPHLDMWNEFFHVMLNIP
> 
>            300       310       320       330
> Sequen DCHHLPSLDLLRQKLKKVFQQHYTNKIRALRNRLIVLLLEC
> 
>>>2ojg:A                                                  (337 aa)
> initn:  85 init1:  53 opt: 140  Z-score: 168.1  bits: 39.5 E(): 0.012
> Smith-Waterman score: 140; 20.3% identity (56.2% similar) in 217 aa
> overlap (46-252:1-204)
> 
>               10        20        30        40        50        60
> Sequen NFIVGNPWDDKLIFKLLSGLSKPVSSYPNTFEWQCKLPAIKPKTEFQLGSKLVYVHHLLG
>                                                    :..: . . . .. :
> 2ojg:A                                              FDVGPRYTNLSYI-G
>                                                            10
> 
>               70        80        90        100       110
> Sequen EGAFAQVYEATQNKQKFVLKVQKPANPWEFYIGTQ-LMERLKPSMQHMFMKFYSAHLFQN
>       :::...:  : .: .:  . ..:  .:.:     :  ....:     ....:   ...
> 2ojg:A EGAYGMVCSAYDNVNKVRVAIKK-ISPFEHQTYCQRTLREIK-----ILLRFRHENIIGI
>           20        30         40        50             60
> 
>     120              130       140       150       160       170
> Sequen GSVL-------VGELYSYGTLLNAINLYKNTPEKVMPQGLVISFAMRMLYMIEQVHDCEI
>       ....       . ..:    :... .:::    . . .  .  : ...:  .. .:. ..
> 2ojg:A NDIIRAPTIEQMKDVYIVQDLMET-DLYKLLKTQHLSNDHICYFLYQILRGLKYIHSANV
>       70        80        90        100       110       120
> 
>            180       190       200        210       220        230
> Sequen IHGDIKPDNFILGNGFLEQSAGLALIDLGQS-IDMKLFPKGTIFTAKCETSGFQCVE-ML
>       .: :.::.:..:..     .  : . :.: . .      .  ..:    :  ..  : ::
> 2ojg:A LHRDLKPSNLLLNT-----TCDLKICDFGLARVADPDHDHTGFLTEYVATRWYRAPEIML
>       130       140            150       160       170       180
> 
>              240       250       260       270       280       290
> Sequen SNKPWNYQIDYFGVAATVYCMLFGTYMKVKNEGGECKPEGLFRRLPHLDMWNEFFHVMLN
>       ..: .. .:: ..:.  .  ::
> 2ojg:A NSKGYTKSIDIWSVGCILAEMLSNRPIFPGKHYLDQLNHILGILGSPSQEDLNCIINLKA
>            190       200       210       220       230       240
> 
>              300       310       320       330
> Sequen IPDCHHLPSLDLLRQKLKKVFQQHYTNKIRALRNRLIVLLLEC
> 
> 2ojg:A RNYLLSLPHKNKVPWNRLFPNADSKALDLLDKMLTFNPHKRIEVEQALAHPYLEQYYDPS
>            250       260       270       280       290       300
> 
> 2ojg:A DEPIAEAPFKFELDDLPKEKLKELIFEETARFQPG
>            310       320       330
> 
>>>2oji:A                                                  (344 aa)
> initn:  85 init1:  53 opt: 140  Z-score: 168.0  bits: 39.5 E(): 0.012
> Smith-Waterman score: 140; 20.3% identity (56.2% similar) in 217 aa
> overlap (46-252:5-208)
> 
>               10        20        30        40        50        60
> Sequen NFIVGNPWDDKLIFKLLSGLSKPVSSYPNTFEWQCKLPAIKPKTEFQLGSKLVYVHHLLG
>                                                    :..: . . . .. :
> 2oji:A                                          RGQVFDVGPRYTNLSYI-G
>                                                        10
> 
>               70        80        90        100       110
> Sequen EGAFAQVYEATQNKQKFVLKVQKPANPWEFYIGTQ-LMERLKPSMQHMFMKFYSAHLFQN
>       :::...:  : .: .:  . ..:  .:.:     :  ....:     ....:   ...
> 2oji:A EGAYGMVCSAYDNVNKVRVAIKK-ISPFEHQTYCQRTLREIK-----ILLRFRHENIIGI
>       20        30        40         50             60        70
> 
>     120              130       140       150       160       170
> Sequen GSVL-------VGELYSYGTLLNAINLYKNTPEKVMPQGLVISFAMRMLYMIEQVHDCEI
>       ....       . ..:    :... .:::    . . .  .  : ...:  .. .:. ..
> 2oji:A NDIIRAPTIEQMKDVYIVQDLMET-DLYKLLKTQHLSNDHICYFLYQILRGLKYIHSANV
>             80        90        100       110       120       130
> 
>            180       190       200        210       220        230
> Sequen IHGDIKPDNFILGNGFLEQSAGLALIDLGQS-IDMKLFPKGTIFTAKCETSGFQCVE-ML
>       .: :.::.:..:..     .  : . :.: . .      .  ..:    :  ..  : ::
> 2oji:A LHRDLKPSNLLLNT-----TCDLKICDFGLARVADPDHDHTGFLTEYVATRWYRAPEIML
>             140            150       160       170       180
> 
>              240       250       260       270       280       290
> Sequen SNKPWNYQIDYFGVAATVYCMLFGTYMKVKNEGGECKPEGLFRRLPHLDMWNEFFHVMLN
>       ..: .. .:: ..:.  .  ::
> 2oji:A NSKGYTKSIDIWSVGCILAEMLSNRPIFPGKHYLDQLNHILGILGSPSQEDLNCIINLKA
>        190       200       210       220       230       240
> 
>              300       310       320       330
> Sequen IPDCHHLPSLDLLRQKLKKVFQQHYTNKIRALRNRLIVLLLEC
> 
> 2oji:A RNYLLSLPHKNKVPWNRLFPNADSKALDLLDKMLTFNPHKRIEVEQALAHPYLEQYYDPS
>        250       260       270       280       290       300
> 
> 2oji:A DEPIAEAPFKFDMELDDLPKEKLKELIFEETARFQPGY
>        310       320       330       340
> 
> .......
> I show only part of the file. What if I want, for example, only those
> two alignments? Are there modules to parse this? I have tried to parse
> it with regexes, but without results :-(
> If anyone has suggestions for modules or anything else, I'll be very
> happy to learn.
> thanks
> Paola
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
>


From cjfields at illinois.edu  Tue Jun 30 23:48:11 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 30 Jun 2009 22:48:11 -0500
Subject: [Bioperl-l] FASTQ output
Message-ID: <A6217D90-4861-4EEB-B2D8-F3565B81EB4B@illinois.edu>

I am working on FASTQ output and noticed a real oddity.  Apparently,  
there are three write_* methods for this module, with the odd choice  
of write_seq for Bio::SeqIO::fastq writing FASTA, not FASTQ.   
write_qual() writes Qual format:

http://www.bioperl.org/wiki/Qual_sequence_format

and write_fastq() writes FASTQ.  Now, maybe it's just me, but I think  
an implementation of write_seq() for a specific format should probably  
output that format and not something else entirely unexpected.  Also,  
is there a reason for duplicating output code for qual and FASTA  
output within Bio::SeqIO::fastq, i.e. should we call Bio::SeqIO::fasta/ 
qual instead?

I would consider the write_seq() issue a bug, the others are really  
just maintenance issues.  Anyone have problems with me changing that  
up a bit?
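
To make the expectation concrete: a write_seq for the fastq format would presumably emit four-line records like the one below. A pure-Perl sketch, assuming Sanger (Phred+33) encoding and integer scores in the range 0 to 93; format_fastq is a hypothetical helper and not the module's code.

```perl
use strict;
use warnings;

# Format one record as FASTQ: @id, sequence, '+', encoded qualities.
sub format_fastq {
    my ( $id, $seq, $phred ) = @_;    # $phred: array ref of integer scores
    die "sequence and quality lengths differ\n"
        unless length($seq) == @$phred;
    my $qual = join '', map { chr( $_ + 33 ) } @$phred;
    return "\@$id\n$seq\n+\n$qual\n";
}

print format_fastq( 'read1', 'ACGT', [ 40, 40, 30, 2 ] );
# prints:
# @read1
# ACGT
# +
# II?#
```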

chris


From Jonas_Schaer at gmx.de  Sun Jun 28 06:15:18 2009
From: Jonas_Schaer at gmx.de (Jonas Schaer)
Date: Sun, 28 Jun 2009 12:15:18 +0200
Subject: [Bioperl-l] different results with remote-blast skript
Message-ID: <D6BA00577BC94BDFAB04DF5EF43E9598@jonas>

Hi again :)
I have just one small question:
why do I get different results from my RemoteBlast Perl script than from the NCBI BLAST homepage?
I am using blastp, the query is an amino acid sequence (I get different results with any sequence, differing not only in the number of hits but even in e-values, scores etc.), and the database is 'nr'.
PLEASE help me,
thank you in advance,
Jonas

PS: my script:
################################################################################
use Bio::Seq::SeqFactory;
use Bio::Tools::Run::RemoteBlast;
use strict;
my @blast_report;
my $prog  = 'blastp';
my $db    = 'nr';
my $e_val = '1e-10';
#my $e_val= '10';
my @params = ( '-prog'       => $prog,
               '-data'       => $db,
               '-expect'     => $e_val,
               '-readmethod' => 'SearchIO' );
my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
$Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1';
$Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100';
$Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10';
$Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'} = '1';

my $blast_seq='MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTAMATGPDPDDEYE';
#$v is just to turn on and off the messages
my $v = 1;
my $seqbuilder = Bio::Seq::SeqFactory->new('-type' => 'Bio::PrimarySeq');
my $seq = $seqbuilder->create(-seq => $blast_seq, -display_id => "$blast_seq");
my $filename = 'temp2.out';
my $r = $factory->submit_blast($seq);
print STDERR "waiting..." if ( $v > 0 );
while ( my @rids = $factory->each_rid ) {
    foreach my $rid (@rids) {
        my $rc = $factory->retrieve_blast($rid);
        if ( !ref($rc) ) {
            if ( $rc < 0 ) {
                $factory->remove_rid($rid);
            }
            print STDERR "." if ( $v > 0 );
        }
        else {
            my $result = $rc->next_result();
            $factory->save_output($filename);
            $factory->remove_rid($rid);
            print "\nQuery Name: ", $result->query_name(), "\n";
            while ( my $hit = $result->next_hit ) {
                next unless ( $v > 0 );
                print "\thit name is ", $hit->name, "\n";
                while ( my $hsp = $hit->next_hsp ) {
                    print "\t\tscore is ", $hsp->score, "\n";
                }
            }
        }
    }
}
@blast_report = get_file_data($filename);
return @blast_report;
##################################################################################


From stevey_mac2k2 at hotmail.com  Sun Jun 28 06:53:04 2009
From: stevey_mac2k2 at hotmail.com (stephenmcgowan1)
Date: Sun, 28 Jun 2009 03:53:04 -0700 (PDT)
Subject: [Bioperl-l]  Installing Bioperl on Mac OS X 10.5.7
Message-ID: <24240541.post@talk.nabble.com>


Hi,

I'm new to the Mac way of working and programming, as well as to the UNIX
(Terminal) environment. I will describe in as much detail as I can what I
have done so far to install BioPerl, and try to describe what my problem
is.

First of all, I downloaded and extracted the files BioPerl-1.6.0
and BioPerl-db-1.6.0 from the site. I have these two folders saved in a
folder on my OS X desktop called "ExerciseTwo".

After doing this, I open up Terminal and locate BioPerl-1.6.0.

I then run:

perl Build.PL (I have also tried sudo perl Build.pl)

I then run ./Build test (again, I tried this with sudo ./Build test)

After running the build test, I receive the feedback:

Failed Test                        Stat Wstat Total Fail  Failed  List of Failed
-------------------------------------------------------------------------------
t/AlignIO/AlignIO.t                 255 65280    28   42 150.00%  8-28
t/AlignIO/arp.t                     255 65280    48   92 191.67%  3-48
t/Annotation/Annotation.t           255 65280   159   83  52.20%  9 117 119-159
t/ClusterIO/SequenceFamily.t        255 65280    19   34 178.95%  3-19
t/LocalDB/Flat.t                    255 65280    24   20  83.33%  15-24
t/LocalDB/Index.t                   255 65280    64   66 103.12%  32-64
t/RemoteDB/BioFetch.t               255 65280    36    2   5.56%  36
t/RemoteDB/DB.t                       3   768   113   59  52.21%  83-113
t/RemoteDB/EUtilities.t               1   256   309    1   0.32%  307
t/SeqIO/Handler.t                   255 65280   550 1098 199.64%  2-550
t/SeqIO/chaos.t                       1   256     8    1  12.50%  1
t/SeqIO/swiss.t                     255 65280   240  479 199.58%  1-240
t/SeqTools/GuessSeqFormat.t           1   256    49    2   4.08%  25 50
t/Tools/Analysis/Protein/ELM.t      255 65280    15   22 146.67%  5-15
t/Tools/Analysis/Protein/Scansite   255 65280    14   20 142.86%  5-14
t/Tools/Run/WrapperBase.t             1   256    27    1   3.70%  20
44 tests and 250 subtests skipped.
Failed 16/318 test scripts, 94.97% okay. 1015/15518 subtests failed, 93.46% okay

Going off this, I then decided to run the install: ./Build install

This is a segment of the output I get back in Terminal after the install:

Manifying blib/script/bp_pairwise_kaks.pl ->
blib/bindoc/bp_pairwise_kaks.pl.1
Manifying blib/script/bp_seqret.pl -> blib/bindoc/bp_seqret.pl.1
Manifying blib/script/bp_seq_length.pl -> blib/bindoc/bp_seq_length.pl.1
Manifying blib/script/bp_query_entrez_taxa.pl ->
blib/bindoc/bp_query_entrez_taxa.pl.1
Manifying blib/script/bp_load_gff.pl -> blib/bindoc/bp_load_gff.pl.1
Manifying blib/script/bp_fastam9_to_table.pl ->
blib/bindoc/bp_fastam9_to_table.pl.1
Manifying blib/script/bp_process_wormbase.pl ->
blib/bindoc/bp_process_wormbase.pl.1
Manifying blib/script/bp_nrdb.pl -> blib/bindoc/bp_nrdb.pl.1
Manifying blib/script/bp_composite_LD.pl -> blib/bindoc/bp_composite_LD.pl.1
Manifying blib/script/bp_classify_hits_kingdom.pl ->
blib/bindoc/bp_classify_hits_kingdom.pl.1
Manifying blib/script/bp_blast2tree.pl -> blib/bindoc/bp_blast2tree.pl.1
Manifying blib/script/bp_heterogeneity_test.pl ->
blib/bindoc/bp_heterogeneity_test.pl.1
Manifying blib/script/bp_generate_histogram.pl ->
blib/bindoc/bp_generate_histogram.pl.1
Manifying blib/script/bp_process_gadfly.pl ->
blib/bindoc/bp_process_gadfly.pl.1
mkdir /usr/local/share: Permission denied at
/System/Library/Perl/5.8.8/ExtUtils/Install.pm line 112

Now these bp_ files such as bp_nrdb.pl should be installed somewhere on my
system, but I'm not sure whether the install has worked and whether these
files were saved to the new directory, given this error:

mkdir /usr/local/share: Permission denied at
/System/Library/Perl/5.8.8/ExtUtils/Install.pm line 112

Is there something wrong with my install? I think /usr/local/share should be
created and then all of these bp_ files should go into this folder. Is there
anything that I'm doing wrong here?

Thanks

Stephen.


-- 
View this message in context: http://www.nabble.com/Installing-Bioperl-on-Mac-OS-X-10.5.7-tp24240541p24240541.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From w.bryant at ucl.ac.uk  Mon Jun  1 04:06:58 2009
From: w.bryant at ucl.ac.uk (Will Bryant)
Date: Mon, 01 Jun 2009 09:06:58 +0100
Subject: [Bioperl-l] Extract genomic data from GenBank
Message-ID: <4A238C22.9090604@ucl.ac.uk>

I'm trying to retrieve the complete GenBank format sequence file for a 
specified bacterium using get_Seq_by_gi, but I keep getting 'gi does not 
exist' errors, even when trying the example gi '405830'.  The script was 
running fine September last year, but when I came back to it this week 
it wasn't working.  Am I missing something obvious?

In case it's important, I'm using ActivePerl 5.10.0, bioperl 1.5.2_100

Code:

#!/usr/bin/perl -w

use strict;
use Bio::Perl;
use Bio::DB::GenBank;

my $gb = new Bio::DB::GenBank(-db => 'genome', -format => 'genbank');

my $straincomp = $gb->get_Seq_by_gi('405830');

# Open the output file directly; no string eval is needed.
my $seqout = Bio::SeqIO->new(
    -format => 'genbank',
    -file   => '>c:\\phd\\modelling\\working\\gi' . $ARGV[0] . '_data.gb',
);

$seqout->write_seq($straincomp);


Error:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: gi does not exist
STACK: Error::throw
STACK: Bio::Root::Root::throw c:/perl/site/lib/Bio/Root/Root.pm:359
STACK: Bio::DB::WebDBSeqI::get_Seq_by_gi 
c:/perl/site/lib/Bio/DB/WebDBSeqI.pm:209
STACK: c:\phd\modelling\perl_scripts\retrieve_genome_data.pl:12
-----------------------------------------------------------

Many thanks,

Will Bryant.


From David.Messina at sbc.su.se  Mon Jun  1 05:04:40 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Mon, 1 Jun 2009 11:04:40 +0200
Subject: [Bioperl-l] Extract genomic data from GenBank
In-Reply-To: <4A238C22.9090604@ucl.ac.uk>
References: <4A238C22.9090604@ucl.ac.uk>
Message-ID: <628aabb70906010204y46139e1dy702fd53380adecf7@mail.gmail.com>

Hey Will,
I think there have been API changes in GenBank's remote query interface that
have occurred after 1.5.2_100 of BioPerl was written. Try upgrading to
BioPerl 1.6 and see if that works for you.

(Note that I've only glanced at your code -- I'm assuming that's not the
problem since it worked fine for you before.)


Dave
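
Before and after upgrading, it can help to print the BioPerl version in use and
to catch the exception from the fetch, so that a retrieval failure is reported
cleanly. The following is a rough sketch rather than a tested fix: the helper
names fetch_and_save and output_file_for are invented here, the -db argument
from the original script is left out, and whether a given gi still resolves at
NCBI is not something this code can guarantee.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build an output file name for a gi (hypothetical naming scheme, modelled
# on the path used in the original script).
sub output_file_for {
    my ($dir, $gi) = @_;
    return "$dir/gi${gi}_data.gb";
}

# Fetch one record by gi and write it in GenBank format.
# Returns 1 on success, 0 after printing a warning on failure.
sub fetch_and_save {
    my ($gi, $file) = @_;

    # Loaded at run time so output_file_for() stays usable without BioPerl.
    require Bio::Root::Version;
    require Bio::DB::GenBank;
    require Bio::SeqIO;
    print "BioPerl version: ", Bio::Root::Version->VERSION, "\n";

    my $ok = eval {
        my $gb  = Bio::DB::GenBank->new(-format => 'genbank');
        my $seq = $gb->get_Seq_by_gi($gi);
        my $out = Bio::SeqIO->new(-format => 'genbank', -file => ">$file");
        $out->write_seq($seq);
        1;
    };
    warn "Could not retrieve gi $gi: $@" unless $ok;
    return $ok ? 1 : 0;
}

# Runs only when a gi is given on the command line, e.g.
#   perl fetch_gi.pl 405830
if (@ARGV) {
    my $gi = shift @ARGV;
    exit( fetch_and_save($gi, output_file_for('.', $gi)) ? 0 : 1 );
}
```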


From fontanez at fas.harvard.edu  Mon Jun  1 08:41:06 2009
From: fontanez at fas.harvard.edu (Kristina Fontanez)
Date: Mon, 1 Jun 2009 08:41:06 -0400
Subject: [Bioperl-l] problem with bioperl install
In-Reply-To: <C00A2D77-4B41-4FF0-ACE5-1A4F6D46F27A@illinois.edu>
References: <BCF27AD6-75B3-4CE8-83AE-C7BAF99AE800@fas.harvard.edu>
	<2023E087846042178215CF9EBDE12C75@NewLife>
	<4A205502.2030701@sendu.me.uk>
	<024B0302-7885-4005-851D-5D582122ED06@fas.harvard.edu>
	<4A205D46.4090105@sendu.me.uk>
	<C00A2D77-4B41-4FF0-ACE5-1A4F6D46F27A@illinois.edu>
Message-ID: <855163D8-6B40-4DF4-84B6-C14611D1CA42@fas.harvard.edu>

Hey everyone-

Thanks for all the advice. I reinstalled Xcode tools, installed Fink  
and downloaded bioperl successfully. It's now working smoothly.

Thanks again,
Kristina
---------------------------------------------------------------
Kristina Fontanez
PhD candidate
Department of Organismic and Evolutionary Biology
Cavanaugh lab
Harvard University
16 Divinity Ave.
Cambridge, MA 02138

tel: 617-495-1138
fax: 617-496-6933
email: fontanez at fas.harvard.edu


On May 29, 2009, at 10:40 PM, Chris Fields wrote:

Kristina,

You aren't running as superuser:

> term dump:
>
> dhcp-0019353043-25-35:BioPerl-1.6.0 kristinafontanez$ cpan

You'll need to run cpan using 'sudo cpan' if installing modules  
anywhere requiring superuser permissions.

chris

On May 29, 2009, at 5:10 PM, Sendu Bala wrote:

> Kristina Fontanez wrote:
>> Hello everyone-
>> Sendu - I took your advice but doing Install Bundle::CPAN did not  
>> take care of the dependencies. It still failed. See attached txt  
>> file with my terminal output. Does anyone have any idea how this  
>> might be?
>
> From reading the output it seems like perhaps you don't have 'make'  
> or there is something wrong when using it. If you're on a mac you  
> may need to install the dev tools. Someone else want to jump in here  
> with advice?
>
> Also, check your CPAN configuration to ensure it is trying to use  
> the correct make commands. ('o conf' etc.)
>
>
>> If I wanted to wipe all perl from my computer and simply start  
>> over, how might this be accomplished?
>
> Don't do that. At least not until you know you have a working make  
> setup.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From maj at fortinbras.us  Mon Jun  1 10:55:50 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 1 Jun 2009 10:55:50 -0400
Subject: [Bioperl-l] a HOWTO for Tiling
Message-ID: <13190185F84E43BDA99993CEB44394C4@NewLife>

Hi All 
Please peruse http://www.bioperl.org/wiki/HOWTO:Tiling for an exhibition of B::S::Tiling, use cases, code snippets, design, implementation and algorithm discussions. We're just about ready to port over to core from bioperl-dev; please shout out if this is not a good idea. 
cheers and thanks for all input--
Mark


From cjfields at illinois.edu  Mon Jun  1 11:21:30 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 1 Jun 2009 10:21:30 -0500
Subject: [Bioperl-l] problem with bioperl install
In-Reply-To: <FD5880DBC2054B5FA9E89FB5CA8B139B@NewLife>
References: <BCF27AD6-75B3-4CE8-83AE-C7BAF99AE800@fas.harvard.edu>
	<2023E087846042178215CF9EBDE12C75@NewLife>
	<BB7756FB-7A18-4E23-9DBB-34B6A2FEE27F@fas.harvard.edu>
	<FD5880DBC2054B5FA9E89FB5CA8B139B@NewLife>
Message-ID: <8739E8BC-9586-4EA2-9377-23D13CC6ED20@illinois.edu>

An autogenerated pass-through Makefile.PL is included with the
distribution:

http://cpansearch.perl.org/src/CJFIELDS/BioPerl-1.6.0/Makefile.PL

We may remove that in future releases, but it should work regardless
(i.e. it calls Module::Build and Build.PL).  I'm pretty convinced that the
issue was permissions-based at heart.  Note that Kristina ran 'cpan'
instead of 'sudo cpan' to invoke the shell, so the shell is using the
current user's config instead of su for installation.  You need to use
'sudo' to install anything into /Library/Perl on Mac (unless you are
already 'root', but on recent OS X versions logging in as 'root' is
turned off).

I just noticed nothing is mentioned along these lines in the  
installation docs, so we'll need to update those.

chris
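
A quick way to see whether the permissions point above applies on a given
machine is to check whether the current user can write to the directories perl
is configured to install into. This is a minimal sketch: nearest_existing is a
hypothetical helper, and the three Config keys are only the usual site install
targets. A Module::Build or CPAN setup configured with a different prefix may
install elsewhere.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Config;
use File::Basename qw(dirname);

# Walk up from $dir to the nearest directory that already exists, since the
# installer has to create everything below that point.
# (nearest_existing is a hypothetical helper name.)
sub nearest_existing {
    my ($dir) = @_;
    while (!-d $dir) {
        my $parent = dirname($dir);
        last if $parent eq $dir;    # reached the top, stop
        $dir = $parent;
    }
    return $dir;
}

for my $key (qw(installsitelib installsitebin installsiteman1dir)) {
    my $target = $Config{$key} or next;    # skip keys unset on this perl
    my $base   = nearest_existing($target);
    printf "%-20s %s (%s)\n", $key, $target,
        (-w $base) ? 'writable' : "not writable at $base";
}
```

Any line reporting "not writable" means the install step needs either sudo or
an install location under the user's home directory.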

On May 29, 2009, at 4:08 PM, Mark A. Jensen wrote:

> Hi Kristina,
>
> [Don't forget to reply-all, so the list stays in the loop. Many many  
> more helpers
> there.]
>
> Apparently cpan can't make the Makefile, but can download and expand  
> the
> library directories, in your .cpan directory (see edited highlights  
> below).
>
> Let's appeal to the BioPerl brethren/sestren---answers?
>
> MAJ
>
>
> term dump:
>
> dhcp-0019353043-25-35:BioPerl-1.6.0 kristinafontanez$ cpan
> Terminal does not support AddHistory.
>
> cpan shell -- CPAN exploration and modules installation (v1.7602)
> ReadLine support available (try 'install Bundle::CPAN')
>
> cpan> install Test::Harness
> CPAN: Storable loaded ok
> Going to read /Users/kristinafontanez/.cpan/Metadata
> Database was generated on Fri, 29 May 2009 11:27:00 GMT
> Running install for module Test::Harness
> Running make for A/AN/ANDYA/Test-Harness-3.17.tar.gz
> CPAN: Digest::MD5 loaded ok
> CPAN: Compress::Zlib loaded ok
> Checksum for /Users/kristinafontanez/.cpan/sources/authors/id/A/AN/ 
> ANDYA/Test-Harness-3.17.tar.gz ok
> Scanning cache /Users/kristinafontanez/.cpan/build for sizes
> Test-Harness-3.17/
> Test-Harness-3.17/Build.PL
> ...
> Test-Harness-3.17/xt/perls/sample-tests/
> Test-Harness-3.17/xt/perls/sample-tests/perl_version
> Removing previously used /Users/kristinafontanez/.cpan/build/Test- 
> Harness-3.17
>
> CPAN.pm: Going to build A/AN/ANDYA/Test-Harness-3.17.tar.gz
>
> Checking if your kit is complete...
> Looks good
> Writing Makefile for Test::Harness
>   -- NOT OK
> Running make test
> Can't test without successful make
> Running make install
> make had returned bad status, install seems impossible
>
> cpan> install File::HomeDir
> ...[more of same]...
>
>
> ----- Original Message ----- From: "Kristina Fontanez" <fontanez at fas.harvard.edu 
> >
> To: "Mark A. Jensen" <maj at fortinbras.us>
> Sent: Friday, May 29, 2009 3:56 PM
> Subject: Re: [Bioperl-l] problem with bioperl install
>
>
>> Mr. Jensen-
>>
>> Thank you for your help but unfortunately the installation of
>> Test::Harness etc didn't work. I copied my terminal output and
>> attached the file. Any advice on what's still going wrong?
>>
>> Thanks,
>> Kristina
>>
>
>
> --------------------------------------------------------------------------------
>
>
>>
>>
>>
>>
>> ---------------------------------------------------------------
>> Kristina Fontanez
>> PhD candidate
>> Department of Organismic and Evolutionary Biology
>> Cavanaugh lab
>> Harvard University
>> 16 Divinity Ave.
>> Cambridge, MA 02138
>>
>> tel: 617-495-1138
>> fax: 617-496-6933
>> email: fontanez at fas.harvard.edu
>>
>>
>>
>> On May 29, 2009, at 3:35 PM, Mark A. Jensen wrote:
>>
>> The message says you are first updating your CPAN.pm.
>> That module needs modules you don't have, so
>>
>> use cpan to install the dependencies you don't have, viz.
>>>   Test::Harness
>>>   File::HomeDir
>>
>> $ cpan
>>> install Test::Harness
>> etc.
>> Then install CPAN.pm again (or run the Bioperl install again).
>>
>> Lather, rinse, repeat the install of Bioperl until it completes
>> without errors.
>>
>> ----- Original Message ----- From: "Kristina Fontanez" <fontanez at fas.harvard.edu
>> >
>> To: <bioperl-l at bioperl.org>
>> Sent: Friday, May 29, 2009 3:07 PM
>> Subject: [Bioperl-l] problem with bioperl install
>>
>>
>>> Hello-
>>>
>>> I am trying to install bioperl and I ran into some problems. See
>>> list  below.
>>>
>>>
>>> CPAN.pm: Going to build A/AN/ANDK/CPAN-1.94.tar.gz
>>>
>>> Checking if your kit is complete...
>>> Looks good
>>> Warning: prerequisite File::HomeDir 0.69 not found.
>>> Warning: prerequisite Test::Harness 2.62 not found. We have 2.56.
>>> Writing Makefile for CPAN
>>> ---- Unsatisfied dependencies detected during [A/AN/ANDK/
>>> CPAN-1.94.tar.gz] -----
>>>   Test::Harness
>>>   File::HomeDir
>>>
>>>
>>> How can I fix this?
>>>
>>>
>>> Thanks,
>>> Kristina
>>> ---------------------------------------------------------------
>>> Kristina Fontanez
>>> PhD candidate
>>> Department of Organismic and Evolutionary Biology
>>> Cavanaugh lab
>>> Harvard University
>>> 16 Divinity Ave.
>>> Cambridge, MA 02138
>>>
>>> tel: 617-495-1138
>>> fax: 617-496-6933
>>> email: fontanez at fas.harvard.edu
>>>
>>>
>>>
>>>
>>
>>
>


From cjfields at illinois.edu  Mon Jun  1 12:14:07 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 1 Jun 2009 11:14:07 -0500
Subject: [Bioperl-l] a HOWTO for Tiling
In-Reply-To: <13190185F84E43BDA99993CEB44394C4@NewLife>
References: <13190185F84E43BDA99993CEB44394C4@NewLife>
Message-ID: <6DCD3564-6756-4416-899A-F32DC7310AD2@illinois.edu>

As long as it doesn't significantly impact SearchIO performance (from
reading the HOWTO I can't see how it would), I say commit away. In fact,
I consider this a bug fix that should be in the next 1.6 point release.
We should add deprecation warnings where needed for 1.7...

chris

On Jun 1, 2009, at 9:55 AM, Mark A. Jensen wrote:

> Hi All
> Please peruse http://www.bioperl.org/wiki/HOWTO:Tiling for an  
> exhibition of B::S::Tiling, use cases, code snippets, design,  
> implementation and algorithm discussions. We're just about ready to  
> port over to core from bioperl-dev; please shout out if this is not  
> a good idea.
> cheers and thanks for all input--
> Mark


From dan.bolser at gmail.com  Mon Jun  1 12:27:30 2009
From: dan.bolser at gmail.com (Dan Bolser)
Date: Mon, 1 Jun 2009 17:27:30 +0100
Subject: [Bioperl-l] problem with bioperl install
In-Reply-To: <8739E8BC-9586-4EA2-9377-23D13CC6ED20@illinois.edu>
References: <BCF27AD6-75B3-4CE8-83AE-C7BAF99AE800@fas.harvard.edu>
	<2023E087846042178215CF9EBDE12C75@NewLife>
	<BB7756FB-7A18-4E23-9DBB-34B6A2FEE27F@fas.harvard.edu>
	<FD5880DBC2054B5FA9E89FB5CA8B139B@NewLife>
	<8739E8BC-9586-4EA2-9377-23D13CC6ED20@illinois.edu>
Message-ID: <2c8757af0906010927x2ff3bce7r62522f108b43c414@mail.gmail.com>

2009/6/1 Chris Fields <cjfields at illinois.edu>:

...
> for installation.  You need to use 'sudo' to install anything /Library/Perl
> on Mac (unless you are already 'root', but on recent OS X version logging in
...

local::lib is supposed to take care of this. Is this broken on Mac?
Building stuff as root is generally considered to be bad.


> I just noticed nothing is mentioned along these lines in the installation
> docs, so we'll need to update those.

I tried to write down a clear 'recipe' for getting things installed
(this was actually on the GMod wiki). I really think the install docs
could be improved. Sometimes less verbose is better.

Dan



From cjfields at illinois.edu  Mon Jun  1 13:15:42 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 1 Jun 2009 12:15:42 -0500
Subject: [Bioperl-l] problem with bioperl install
In-Reply-To: <2c8757af0906010927x2ff3bce7r62522f108b43c414@mail.gmail.com>
References: <BCF27AD6-75B3-4CE8-83AE-C7BAF99AE800@fas.harvard.edu>
	<2023E087846042178215CF9EBDE12C75@NewLife>
	<BB7756FB-7A18-4E23-9DBB-34B6A2FEE27F@fas.harvard.edu>
	<FD5880DBC2054B5FA9E89FB5CA8B139B@NewLife>
	<8739E8BC-9586-4EA2-9377-23D13CC6ED20@illinois.edu>
	<2c8757af0906010927x2ff3bce7r62522f108b43c414@mail.gmail.com>
Message-ID: <87C523DD-6F47-4CFF-8907-566A18A0A08E@illinois.edu>


On Jun 1, 2009, at 11:27 AM, Dan Bolser wrote:

> 2009/6/1 Chris Fields <cjfields at illinois.edu>:
>
> ...
>> for installation.  You need to use 'sudo' to install anything / 
>> Library/Perl
>> on Mac (unless you are already 'root', but on recent OS X version  
>> logging in
> ...
>
> local::lib is supposed to take care of this. Is this broken on Mac?
> Building stuff as root is generally considered to be bad.

You can install to a local lib, yes, but cpan needs to be manually
configured to do this; I don't think it is automatically configured to
do so in OS X, i.e. it defaults to /Library/Perl.

Frankly, I sidestep the whole issue with my own custom perl  
installation, but that's me.

>> I just noticed nothing is mentioned along these lines in the  
>> installation
>> docs, so we'll need to update those.
>
> I tried to write down a clear 'recipe' for getting things installed
> (this was actually on the GMod wiki). I really think the install docs
> could be improved. Sometimes less verbose is better.
>
> Dan

True, but I would much rather have reasonable instructions that  
outline most installation issues than ones that aren't detailed enough.

My thought is to strip the INSTALL doc that comes with BioPerl down to
the essentials and point to the wiki for the more detailed instructions
(including problems encountered).  It's too hard to maintain both and
backport the wiki into plain text.

chris


From maj at fortinbras.us  Mon Jun  1 15:03:05 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 1 Jun 2009 15:03:05 -0400
Subject: [Bioperl-l] a HOWTO for Tiling
In-Reply-To: <6DCD3564-6756-4416-899A-F32DC7310AD2@illinois.edu>
References: <13190185F84E43BDA99993CEB44394C4@NewLife>
	<6DCD3564-6756-4416-899A-F32DC7310AD2@illinois.edu>
Message-ID: <AABEFA992F2345548C861ADDFDC50132@NewLife>

Thanks, Chris--

Bio::Search::Tiling is now ported to core; the snapshot of the ported version is 
in bioperl-dev/tags/tiling-port-to-core-060109.
Bunch o' tests performed by t/SearchIO/Tiling.t; bunch more if one sets 
BIOPERL_TILING_EXHAUSTIVE_TESTS .

Cry 'Havoc!' and let slip the dogs of war...

MAJ

----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "Mark A. Jensen" <maj at fortinbras.us>
Cc: "Sendu Bala" <bix at sendu.me.uk>; "Dave Messina" <dave at davemessina.com>; 
"BioPerl List" <bioperl-l at lists.open-bio.org>
Sent: Monday, June 01, 2009 12:14 PM
Subject: Re: [Bioperl-l] a HOWTO for Tiling


>I think, as long is it doesn't significantly impact SearchIO  performance wise 
>(from reading the HOWTO I can't see how it will), I  say commit away. In fact, 
>I consider this a bug fix that should be in  the next 1.6 point release. We 
>should add deprecation warnings where  needed for 1.7...
>
> chris
>
> On Jun 1, 2009, at 9:55 AM, Mark A. Jensen wrote:
>
>> Hi All
>> Please peruse http://www.bioperl.org/wiki/HOWTO:Tiling for an  exhibition of 
>> B::S::Tiling, use cases, code snippets, design,  implementation and algorithm 
>> discussions. We're just about ready to  port over to core from bioperl-dev; 
>> please shout out if this is not  a good idea.
>> cheers and thanks for all input--
>> Mark
>
>
> 


From koenvanderdrift at gmail.com  Mon Jun  1 18:22:23 2009
From: koenvanderdrift at gmail.com (Koen van der Drift)
Date: Mon, 1 Jun 2009 18:22:23 -0400
Subject: [Bioperl-l] problem with bioperl install
In-Reply-To: <87C523DD-6F47-4CFF-8907-566A18A0A08E@illinois.edu>
References: <BCF27AD6-75B3-4CE8-83AE-C7BAF99AE800@fas.harvard.edu>
	<2023E087846042178215CF9EBDE12C75@NewLife>
	<BB7756FB-7A18-4E23-9DBB-34B6A2FEE27F@fas.harvard.edu>
	<FD5880DBC2054B5FA9E89FB5CA8B139B@NewLife>
	<8739E8BC-9586-4EA2-9377-23D13CC6ED20@illinois.edu>
	<2c8757af0906010927x2ff3bce7r62522f108b43c414@mail.gmail.com>
	<87C523DD-6F47-4CFF-8907-566A18A0A08E@illinois.edu>
Message-ID: <2E5C7781-D115-415F-BA28-120613B221C3@gmail.com>


On Jun 1, 2009, at 1:15 PM, Chris Fields wrote:

> My thought is to strip down the INSTALL doc that comes with BioPerl  
> down to the essentials and point to the wiki for the more detailed  
> ones (including problems encountered).  It's too hard to maintain  
> both and backport the wiki into plain text.


Good idea, please then also update the file PLATFORMS. It has a link  
to a very outdated website for the installation of bioperl on OS X.  
And maybe a line + link to the bioperl wiki can be added that  
recommends the use of fink as an alternative to cpan?

cheers,

- Koen.


From cjfields at illinois.edu  Mon Jun  1 19:27:32 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 1 Jun 2009 18:27:32 -0500
Subject: [Bioperl-l] problem with bioperl install
In-Reply-To: <2E5C7781-D115-415F-BA28-120613B221C3@gmail.com>
References: <BCF27AD6-75B3-4CE8-83AE-C7BAF99AE800@fas.harvard.edu>
	<2023E087846042178215CF9EBDE12C75@NewLife>
	<BB7756FB-7A18-4E23-9DBB-34B6A2FEE27F@fas.harvard.edu>
	<FD5880DBC2054B5FA9E89FB5CA8B139B@NewLife>
	<8739E8BC-9586-4EA2-9377-23D13CC6ED20@illinois.edu>
	<2c8757af0906010927x2ff3bce7r62522f108b43c414@mail.gmail.com>
	<87C523DD-6F47-4CFF-8907-566A18A0A08E@illinois.edu>
	<2E5C7781-D115-415F-BA28-120613B221C3@gmail.com>
Message-ID: <98605D05-706B-4ACB-B444-4F0A9CEC879D@illinois.edu>


On Jun 1, 2009, at 5:22 PM, Koen van der Drift wrote:

>
> On Jun 1, 2009, at 1:15 PM, Chris Fields wrote:
>
>> My thought is to strip down the INSTALL doc that comes with BioPerl  
>> down to the essentials and point to the wiki for the more detailed  
>> ones (including problems encountered).  It's too hard to maintain  
>> both and backport the wiki into plain text.
>
>
> Good idea, please then also update the file PLATFORMS. It has a link  
> to a very outdated website for the installation of bioperl on OS X.  
> And maybe a line + link to the bioperl wiki can be added that  
> recommends the use of fink as an alternative to cpan?
>
> cheers,
>
> - Koen.

Done. I've added a ticket on bugzilla for tracking this so it doesn't  
get lost:

http://bugzilla.open-bio.org/show_bug.cgi?id=2846

chris


From shalabh.sharma7 at gmail.com  Tue Jun  2 10:44:25 2009
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Tue, 2 Jun 2009 10:44:25 -0400
Subject: [Bioperl-l] Refseq Hits
Message-ID: <9fcc48c70906020744x1301136fm852443d1fe96941b@mail.gmail.com>

Hi All,
This is not really a BioPerl query, but I am really confused and need some
help.
I blasted some sequences against the RefSeq database (locally). After parsing
the BLAST results, I noticed that some description fields contain two hit
names, like:
hit_name ->    gi|71082715|ref|YP_265434.1|
Description -> ubiquitin binding protein [Candidatus Pelagibacter ubique
HTCC1062] gi|91762865|ref|ZP_01264830.1| possible ubiquitin binding protein
[Candidatus Pelagibacter ubique HTCC1002]

So besides giving me the description for hit_name (HTCC1062), it's also
giving me the one for HTCC1002.
I would really appreciate it if someone could help me out.

Thanks
Shalabh
_________________________________________________
Shalabh Sharma
Scientific Computing Professional Associate
Department of Marine Sciences
University of Georgia
Athens, GA 30602-3636

phone: 706-542-0341
email: ssharmai at uga.edu


From jonathancrabtree at gmail.com  Tue Jun  2 11:04:33 2009
From: jonathancrabtree at gmail.com (Jonathan Crabtree)
Date: Tue, 2 Jun 2009 11:04:33 -0400
Subject: [Bioperl-l] Refseq Hits
In-Reply-To: <9fcc48c70906020744x1301136fm852443d1fe96941b@mail.gmail.com>
References: <9fcc48c70906020744x1301136fm852443d1fe96941b@mail.gmail.com>
Message-ID: <8e5b8bf80906020804s5e8f1737je6539365c38a9226@mail.gmail.com>

Hi Shalabh-

I believe RefSeq is a non-redundant database, in which sequence entries with
identical sequences are merged and their descriptions are concatenated in
the FASTA defline.  If you look up the two accession numbers/gi numbers from
your search results I think you'll see that both are valid matches because
their polypeptide sequences are identical:

http://www.ncbi.nlm.nih.gov/protein/71082715
http://www.ncbi.nlm.nih.gov/protein/91762865

You're just getting a single match with two descriptions instead of two
matches with one description each, but the sequence is the same and so,
therefore, are the BLAST alignments.

Jonathan
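
If the individual entries of such a merged defline are needed separately, the
hit name and description can be split again on the gi|...| identifiers. This
is only a sketch under the assumption that every merged entry starts with an
identifier of the form gi|number|db|accession|, as in the example from the
original post. The function name split_merged_defline is made up here and is
not a BioPerl method.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split "id description id description ..." into one record per identifier.
# (split_merged_defline is a hypothetical helper, not part of BioPerl.)
sub split_merged_defline {
    my ($hit_name, $description) = @_;
    my $defline = "$hit_name $description";
    $defline =~ s/\s+/ /g;          # undo line wrapping
    $defline =~ s/^\s+|\s+$//g;     # trim both ends

    my @entries;
    # An entry is an identifier followed by text up to the next identifier
    # (optionally preceded by '>') or the end of the string.
    while ($defline =~ /(gi\|\d+\|\w+\|[^|\s]+\|)\s*(.*?)(?=\s*>?gi\|\d+\||\z)/g) {
        push @entries, { id => $1, description => $2 };
    }
    return @entries;
}

my @entries = split_merged_defline(
    'gi|71082715|ref|YP_265434.1|',
    'ubiquitin binding protein [Candidatus Pelagibacter ubique HTCC1062] '
  . 'gi|91762865|ref|ZP_01264830.1| possible ubiquitin binding protein '
  . '[Candidatus Pelagibacter ubique HTCC1002]',
);
print "$_->{id}\t$_->{description}\n" for @entries;
```

For the example above this prints two lines, one for YP_265434.1 and one for
ZP_01264830.1, each with its own description.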

On Tue, Jun 2, 2009 at 10:44 AM, shalabh sharma
<shalabh.sharma7 at gmail.com>wrote:

> Hi All,
>          This is not really a bioperl query, but i am really confused and
> need some help.
> I blasted some sequences against refseq database (locally). After parsing
> the blast result what i noticed that some description fields contain two
> hit
> names like:
> hit_name ->    gi|71082715|ref|YP_265434.1|
> Description -> ubiquitin binding protein [Candidatus Pelagibacter ubique
> HTCC1062] gi|91762865|ref|ZP_01264830.1| possible ubiquitin binding protein
> [Candidatus Pelagibacter ubique HTCC1002]
>
> So besides giving me description for hit_name (HTCC 1062) its also giving
> me
> HTCC 1002.
> I will really appreciate if someone can help me out.
>
> Thanks
> Shalabh
> _________________________________________________
> Shalabh Sharma
> Scientific Computing Professional Associate
> Department of Marine Sciences
> University of Georgia
> Athens, GA 30602-3636
>
> phone: 706-542-0341
> email: ssharmai at uga.edu
>


From shalabh.sharma7 at gmail.com  Tue Jun  2 11:15:45 2009
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Tue, 2 Jun 2009 11:15:45 -0400
Subject: [Bioperl-l] Refseq Hits
In-Reply-To: <8e5b8bf80906020804s5e8f1737je6539365c38a9226@mail.gmail.com>
References: <9fcc48c70906020744x1301136fm852443d1fe96941b@mail.gmail.com>
	<8e5b8bf80906020804s5e8f1737je6539365c38a9226@mail.gmail.com>
Message-ID: <9fcc48c70906020815x2295ee9ay8023d521a50238ca@mail.gmail.com>

Hi Jonathan,
Your information is really helpful. Thanks a lot.

-Shalabh




From tristan.lefebure at gmail.com  Tue Jun  2 12:24:21 2009
From: tristan.lefebure at gmail.com (Tristan Lefebure)
Date: Tue, 2 Jun 2009 12:24:21 -0400
Subject: [Bioperl-l] Creating a fastq format file?
In-Reply-To: <ddde1f420904270238w2bad577fq49def99607597793@mail.gmail.com>
References: <2c8757af0904220349v4e4a9b89xe374a46f9e82cfdc@mail.gmail.com>
	<ddde1f420904262242s533bd5abqeb9db75463d5a8f2@mail.gmail.com>
	<ddde1f420904270238w2bad577fq49def99607597793@mail.gmail.com>
Message-ID: <200906021224.21439.tristan.lefebure@gmail.com>

On Monday 27 April 2009 05:38:40 Heikki Lehvaslaiho wrote:
> I convinced at least myself to the degree that I wrote
> the range_convert() method - with plenty of tests. I
> mention this now so that no-one else need to start
> thinking through all the edge values.
>
> :)
>
> I'll contribute it to the code base once there is a
> consensus of best way forward.
>

Heikki,

This thread has been quiet for a while, but I don't see 
anything new in Bio::Seq::Quality. Did we reach a consensus 
or are you waiting for some more discussion on the subject?

(I'm pretty impatient to see bioperl handling both sanger 
and illumina ranges on the fly!)

--Tristan
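
For readers following the quoted proposal, here is a rough sketch of what
converting between the Sanger and Illumina 1.3+ quality encodings involves.
It is not Heikki's range_convert() and not the Bio::Seq::Quality API; the
function names and the encoding labels are invented for illustration. It
relies only on the published offsets (Sanger: Phred + 33, Illumina 1.3+:
Phred + 64) and does not handle the older Solexa score scale, which needs a
logarithmic conversion rather than an offset change.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# ASCII offset and valid Phred range for each encoding (the labels are
# illustrative names chosen for this sketch).
my %ENCODING = (
    sanger     => { offset => 33, min => 0, max => 93 },
    illumina13 => { offset => 64, min => 0, max => 62 },
);

# Decode a quality string into Phred scores; die on out-of-range values.
sub decode_quality {
    my ($string, $name) = @_;
    my $enc = $ENCODING{$name} or die "unknown encoding '$name'\n";
    return map {
        my $q = ord($_) - $enc->{offset};
        die "quality $q out of range for $name\n"
            if $q < $enc->{min} || $q > $enc->{max};
        $q;
    } split //, $string;
}

# Encode a list of Phred scores as a quality string; die on out-of-range values.
sub encode_quality {
    my ($scores, $name) = @_;
    my $enc = $ENCODING{$name} or die "unknown encoding '$name'\n";
    return join '', map {
        die "quality $_ out of range for $name\n"
            if $_ < $enc->{min} || $_ > $enc->{max};
        chr($_ + $enc->{offset});
    } @$scores;
}

# Re-encode a quality string from one encoding to another.
sub convert_quality {
    my ($string, $from, $to) = @_;
    return encode_quality([ decode_quality($string, $from) ], $to);
}

print convert_quality('hhhh@', 'illumina13', 'sanger'), "\n";   # prints IIII!
```

Dying on out-of-range values mirrors the "throw if $value < $from_lower or
$value > $from_upper" rule in the proposal quoted below.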

>     -Heikki
>
> 2009/4/27 Heikki Lehvaslaiho <heikki.lehvaslaiho at gmail.com>:
> >> I have tried to summarise this in a central place:
> >> http://en.wikipedia.org/wiki/FASTQ_format
> >
> > Torsten,
> >
> > Thanks for putting this together. Very helpful.
> >
> > Do you have a plan of action?  Let me propose one for
> > BioPerl. It is based on the following assumptions:
> >
> > 1. There is a multitude of different ways of coding
> >    quality values out there.
> > 2. Bio::Seq::Quality is agnostic of any quality value
> >    range rules.
> > 3. The emerging open standard is the Sanger fastq
> >    specification.
> > 4. Open source programs use the Sanger fastq specs.
> >
> >
> > From these it follows that:
> >
> >
> > 1. BioPerl should support Sanger fastq standard
> >
> > 1.1. it already does and there are other SeqIO modules
> > for dealing with other non-fastq formats.
> >
> > 2. BioPerl should offer simple ways of converting
> > between quality range rules
> >
> > 2.1. Have a generic method accessible from
> > Bio::Seq::Quality with preset versions of the method
> > for converting between known variants (Sanger fastq and
> > the two Illumina versions)
> >
> > For example:
> >
> > range_convert ($from_lower, $from_upper, $to_lower,
> >                $to_upper, $value)
> >   throw if $value < $from_lower or $value > $from_upper
> >   return $newvalue
> >
> > range_convert_illumina2fastq(),
> > range_convert_fastq2illumina(),
> > range_convert_fastq2phred(),
> >  range_convert_phred2fastq()....
> >
> > (assuming that illumina 1.3 eq phred)
> >
> > 2.2. Bio::SeqIO::Fastq::next_seq methods should convert
> > Illumina qualities into Sanger fastq on the fly
> >
> > 2.2.1 Bio::SeqIO::Fastq::next_seq should detect the
> > incoming stream of quality value range either
> > automatically or be given a keyword parameter
> > indicating the range.
> >
> > 2.2.2. Bio::SeqIO::Fastq::next_seq should throw an
> > error if it detects a quality value out of range.
> >
> > 2.2.3. Bio::SeqIO::Fastq::write_seq should throw an
> > error if it detects a quality value out of range.
> >
> > 2.2.4. It would be useful but not absolutely necessary
> > for Bio::SeqIO::Fastq::write_seq to be able to write
> > out in Illumina ranges
> >
> >
> > What do you think?
> >
> >    -Heikki
> >
> > 2009/4/26 Torsten Seemann <torsten.seemann at infotech.monash.edu.au>:
> >>> > This might be a good place to ask the question:
> >>> > having looked at the fastq.pm page, is the fastq
> >>> > format defined (only) by a "@'" followed by a
> >>> > sequence line and a "+" header followed by a
> >>> > quality line and the two headers have to agree? Now
> >>> > that Illumina is using phred scaling, are 'Sanger'
> >>> > and 'Illumina' versions the same?
> >>>
> >>> No they aren't the same, Illumina still encodes the
> >>> ascii as value + 64 and Sanger as value + 33.
> >>
> >> Illumina have now CHANGED how they calculate the
> >> quality value however in the last month or so... Their
> >> Q range used to be -5..40 mapped to ASCII 64+, but now
> >> they produce Q >= 0 and it is unclear if they start at
> >> 69 or 64 now...
> >>
> >> I have tried to summarise this in a central place:
> >>
> >> http://en.wikipedia.org/wiki/FASTQ_format
> >>
> >> Corrections welcome!
> >>
> >>
> >> --Torsten Seemann
> >> --Victorian Bioinformatics Consortium, Dept.
> >> Microbiology, Monash University, AUSTRALIA
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> > --
> >    -Heikki
> > Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
> > cell: +27 (0)714328090
> > Sent from Claremont, WC, South Africa


From Russell.Smithies at agresearch.co.nz  Tue Jun  2 16:56:26 2009
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Wed, 3 Jun 2009 08:56:26 +1200
Subject: [Bioperl-l] Refseq Hits
In-Reply-To: <9fcc48c70906020815x2295ee9ay8023d521a50238ca@mail.gmail.com>
References: <9fcc48c70906020744x1301136fm852443d1fe96941b@mail.gmail.com>
	<8e5b8bf80906020804s5e8f1737je6539365c38a9226@mail.gmail.com>
	<9fcc48c70906020815x2295ee9ay8023d521a50238ca@mail.gmail.com>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32493EB1D18@exchsth.agresearch.co.nz>

The identifiers are separated by a Ctrl-A char ("\001") in the original non-redundant fasta header so you should be able to split them up again - assuming BioPerl didn't munge them.

--Russell
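
A minimal sketch of the split described above, in plain Perl. It assumes the raw defline still carries the Ctrl-A ("\001") separators; if the parser has already replaced them with spaces, this will return a single entry. The sub name split_nr_defline is invented for illustration, and the example header is assembled from the hit shown in the quoted message.

```perl
use strict;
use warnings;

# Split a merged non-redundant FASTA defline into its individual entries.
# ASSUMPTION: entries are still separated by Ctrl-A ("\001").
sub split_nr_defline {
    my ($defline) = @_;
    $defline =~ s/^>//;                       # drop a leading '>' if present
    my @entries;
    for my $part (split /\001/, $defline) {
        # first whitespace-delimited token is the identifier,
        # everything after it is the description
        my ($id, $desc) = split /\s+/, $part, 2;
        push @entries, { id => $id, desc => defined $desc ? $desc : '' };
    }
    return @entries;
}

my $header = ">gi|71082715|ref|YP_265434.1| ubiquitin binding protein "
           . "[Candidatus Pelagibacter ubique HTCC1062]"
           . "\001gi|91762865|ref|ZP_01264830.1| possible ubiquitin binding protein "
           . "[Candidatus Pelagibacter ubique HTCC1002]";

# One line per merged entry: identifier, tab, description
for my $entry (split_nr_defline($header)) {
    print "$entry->{id}\t$entry->{desc}\n";
}
```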

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of shalabh sharma
> Sent: Wednesday, 3 June 2009 3:16 a.m.
> To: Jonathan Crabtree
> Cc: bioperl-l
> Subject: Re: [Bioperl-l] Refseq Hits
> 
> Hi Jonathan,
> Your information is really helpful. Thanks a lot.
> 
> -Shalabh
> 
> 
> On Tue, Jun 2, 2009 at 11:04 AM, Jonathan Crabtree <
> jonathancrabtree at gmail.com> wrote:
> 
> >
> > Hi Shalabh-
> >
> > I believe RefSeq is a non-redundant database, in which sequence entries
> > with identical sequences are merged and their descriptions are concatenated
> > in the FASTA defline.  If you look up the two accession numbers/gi numbers
> > from your search results I think you'll see that both are valid matches
> > because their polypeptide sequences are identical:
> >
> > http://www.ncbi.nlm.nih.gov/protein/71082715
> > http://www.ncbi.nlm.nih.gov/protein/91762865
> >
> > You're just getting a single match with two descriptions instead of two
> > matches with one description, but the sequence is the same and so, therefore
> > are the blast alignments.
> >
> > Jonathan
> >
> > On Tue, Jun 2, 2009 at 10:44 AM, shalabh sharma <shalabh.sharma7 at gmail.com
> > > wrote:
> >
> >> Hi All,
> >>          This is not really a bioperl query, but i am really confused and
> >> need some help.
> >> I blasted some sequences against refseq database (locally). After parsing
> >> the blast result what i noticed that some description fields contain two
> >> hit
> >> names like:
> >> hit_name ->    gi|71082715|ref|YP_265434.1|
> >> Description -> ubiquitin binding protein [Candidatus Pelagibacter ubique
> >> HTCC1062] gi|91762865|ref|ZP_01264830.1| possible ubiquitin binding
> >> protein
> >> [Candidatus Pelagibacter ubique HTCC1002]
> >>
> >> So besides giving me description for hit_name (HTCC 1062) its also giving
> >> me
> >> HTCC 1002.
> >> I will really appreciate if someone can help me out.
> >>
> >> Thanks
> >> Shalabh
> >> _________________________________________________
> >> Shalabh Sharma
> >> Scientific Computing Professional Associate
> >> Department of Marine Sciences
> >> University of Georgia
> >> Athens, GA 30602-3636
> >>
> >> phone: 706-542-0341
> >> email: ssharmai at uga.edu
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>
> >
> >
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================


From maj at fortinbras.us  Tue Jun  2 17:05:03 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 2 Jun 2009 17:05:03 -0400
Subject: [Bioperl-l] Bio::Search::Tiling
Message-ID: <B006036D760941179148C9F8E2AD7E05@NewLife>

All-
Bio::Search::Tiling is now in bioperl-live, passes all tests.
Thanks, 
Mark


From shalabh.sharma7 at gmail.com  Wed Jun  3 13:27:59 2009
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Wed, 3 Jun 2009 13:27:59 -0400
Subject: [Bioperl-l] gbf to gff
Message-ID: <9fcc48c70906031027h3381a93fya349fba8e5ba464c@mail.gmail.com>

Hi all,

I am working on Roseobacters. I have often converted gbk files from
GenBank to GFF format, but one genome, "Silicibacter lacuscaerulensis",
does not have a gbk file; instead it has two gbf files:

https://research.venterinstitute.org/moore/SingleOrganism.do?speciesTag=SL1157&pageAttr=pageMain

So how can I convert this genome to a single GFF file so I can use it in
GBrowse?
I would really appreciate it if anyone could help me out.

Thanks


From scott at scottcain.net  Wed Jun  3 14:11:54 2009
From: scott at scottcain.net (Scott Cain)
Date: Wed, 3 Jun 2009 14:11:54 -0400
Subject: [Bioperl-l] gbf to gff
In-Reply-To: <9fcc48c70906031027h3381a93fya349fba8e5ba464c@mail.gmail.com>
References: <9fcc48c70906031027h3381a93fya349fba8e5ba464c@mail.gmail.com>
Message-ID: <536f21b00906031111l4b02a846o6f281c536b77460d@mail.gmail.com>

Hi Shalabh,

Do you want them combined onto a single reference sequence?  I'm
guessing this is a circular microbial genome in two segments.  Do you
know how the coordinates in one GenBank file relate to the other
(or are you willing to make something up)?  I imagine the way I would
do it would be to convert both files to GFF and then write a quick
script to convert the coordinates and reference sequence name (column
1) of one file to be consistent with the other.

Scott
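
The "quick script" step above could look roughly like the sketch below. It assumes tab-delimited GFF with the reference name in column 1 and start/end in columns 4 and 5, and that a single constant offset is the right way to place the second segment, which only the person who knows the genome can decide. The sub name and the file names in the usage comment are hypothetical.

```perl
use strict;
use warnings;

# Shift one GFF feature line by $offset and rename its reference sequence.
# Comment/pragma lines and blank lines are passed through untouched.
sub shift_gff_line {
    my ($line, $new_ref, $offset) = @_;
    return $line if $line =~ /^#/ || $line !~ /\S/;
    chomp(my $copy = $line);
    my @col = split /\t/, $copy, -1;
    return $line unless @col >= 9;        # not a feature line; leave as-is
    $col[0]  = $new_ref;                  # column 1: reference sequence name
    $col[3] += $offset;                   # column 4: start
    $col[4] += $offset;                   # column 5: end
    return join("\t", @col) . "\n";
}

# Hypothetical usage:
#   perl shift_gff.pl NEW_REF OFFSET < second.gff >> combined.gff
if (@ARGV == 2) {
    my ($new_ref, $offset) = @ARGV;
    while (my $line = <STDIN>) {
        print shift_gff_line($line, $new_ref, $offset);
    }
}
```

Note that this leaves any "##sequence-region" pragma and the ID/Parent attributes alone, so identifiers that clash between the two files would still need attention.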


On Wed, Jun 3, 2009 at 1:27 PM, shalabh sharma
<shalabh.sharma7 at gmail.com> wrote:
> Hi all, I am working on Roseobacters. Many times I've
> converted gbk file from GenBank to gff format but now one genome
> "Silicibacter lacuscaerulensis" does not have a gbk file instead it has two
> gbf files:
>
> https://research.venterinstitute.org/moore/SingleOrganism.do?speciesTag=SL1157&pageAttr=pageMain
>
> So now how i can convert this genome to one gff file so i can use it in
> gbrowse?
> I would really appreciate if anyone can help me out.
>
> Thanks
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From alperyilmaz at gmail.com  Fri Jun  5 14:50:46 2009
From: alperyilmaz at gmail.com (Alper Yilmaz)
Date: Fri, 5 Jun 2009 14:50:46 -0400
Subject: [Bioperl-l] GBroswe2 - feature details
Message-ID: <dac81b0d0906051150o116c8bcvd747b71189d7a722@mail.gmail.com>

Dear all,

I have a question about utilizing the tag/value pairs that are used
in the 9th column of GFF. If my 9th column is like this:

ID=BS_1;BS_Seq=cacatg;BS_Color=Purple;Name=AtMYC2 BS in RD22

How can I use the BS_Seq and BS_Color tags, say, in a balloon? If I
want to print the name and sequence of a BindingSite, what do I need to
put in place of the question marks below?

balloon hover = <font size=small color=red>Motif name: $name,
Sequence: ???????</font>


The manual mentions that it's possible to use user-defined
tag/value pairs, but I couldn't figure out how. The manual says that:
 [feature_type:details]
 tag1 = formatting rule
 tag2 = formatting rule
 tag3 = formatting rule

can be used to adjust the formatting of a tag, but I don't know how this
can be used to assign a value to a tag. I tried:
[cis-elements:details]
bs_seq = <b>$value</b>     (I didn't use BS_Seq, since it was
mentioned, tags are case-insensitive)
 OR
$bs_seq = <b>$value</b>

but, I cannot use $bs_seq in hover link option after doing this. What
am I doing wrong?

thanks,

Alper Yilmaz
Post-doctoral Researcher
Plant Biotechnology Center
The Ohio State University
1060 Carmack Rd
Columbus, OH 43210
(614)688-4954
www.grassius.org


From cjfields at illinois.edu  Fri Jun  5 16:43:04 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 5 Jun 2009 15:43:04 -0500
Subject: [Bioperl-l] [Bioperl-guts-l] Bug in genbank.pm?
In-Reply-To: <52A1BBE501D0BD40AF1EC72C14EA255D0BE06A@MAILBOX-31.home.ku.edu>
References: <002b01c9e567$e09b0de0$a1d129a0$@edu>
	<A145C0B1-D2B3-47CB-BA46-DCCDD693D05F@illinois.edu>
	<52A1BBE501D0BD40AF1EC72C14EA255D0BE06A@MAILBOX-31.home.ku.edu>
Message-ID: <C29B8160-5682-48AF-BD9E-A5FF26EC679F@illinois.edu>

(Just so this is going to the correct list)

Marcos,

I'll look into it.  This may have been fixed in between the releases,  
though.

There isn't a PPM available for 1.6 yet (several prereqs were missing  
at the time of the 1.6 release, such as Graphviz and so on).  A bug  
report is in the queue for this, though, as a reminder.  I think those  
are now available, though, so we should *theoretically* be capable of  
getting a PPM ready.  I say 'theoretically' b/c I don't have easy  
access to a PC running Windows (I have moved to OS X).  I'll see what  
I can do about that in the next few weeks.

In the meantime, if you need it you can download 1.6 or the 'nightly  
build' version (nightly snapshots of svn code) and add it to PERL5LIB  
or "use lib 'PATH_TO_BIOPERL';" in your scripts; it should work.

Nightly builds:

http://bioperl.org/DIST/nightly_builds/

chris
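
A minimal sketch of the "use lib" route mentioned above. The directory shown is a placeholder, not a real default: it stands for wherever the 1.6 or nightly-build archive was unpacked, i.e. the directory that contains the Bio/ folder. The require is wrapped in eval so the script reports a wrong path instead of dying.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# HYPOTHETICAL location of an unpacked BioPerl tree; replace it with the
# directory on your machine that contains the Bio/ folder.
use lib 'C:/bioperl-live';

# Try to load a BioPerl module and report where it was found.
if (eval { require Bio::SeqIO; 1 }) {
    print "Bio::SeqIO loaded from $INC{'Bio/SeqIO.pm'}\n";
} else {
    print "Bio::SeqIO not found; check the path given to 'use lib'\n";
}
```

Because "use lib" puts its directory at the front of @INC, a copy found there should take precedence over an older PPM-installed BioPerl in the site library.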

On Jun 4, 2009, at 10:17 PM, Barbeitos, Marcos wrote:

> OK, I attached the first record for both files.  These are GenBank  
> flat files that were emailed to us and transferred from Macs to PCs,  
> so I am not sure if the encoding/line terminations got messed up at  
> some point.  I converted the line terminations to Unix and the  
> encoding to Western European Windows; still, it didn't work. It may be
> worth mentioning that BioEdit did understand the format after I
> fixed the encoding.
>
> The data was erased because my boss is kind of finicky about sharing  
> information.  However, I tested the files attached to this email and  
> got the same results.
>
> I am still using Bio-Perl 1.5.2_100 in a PC, PPM has not flagged the  
> availability of an upgrade from CPAN, are you releasing the PPD as  
> well?
>
> Thanks!
>
> -----Original Message-----
> From: Chris Fields [mailto:cjfields at illinois.edu]
> Sent: Thu 6/4/2009 8:05 PM
> To: Barbeitos, Marcos
> Cc: bioperl-guts-l at lists.open-bio.org
> Subject: Re: [Bioperl-guts-l] Bug in genbank.pm?
>
> Marcos,
>
> We need the GenBank file (or the accession) you are attempting to
> parse.  Also, what version are you using?  We have released v. 1.6 on
> CPAN, and I intend on releasing 1.6.1 soon.
>
> chris
>
> On Jun 4, 2009, at 5:57 PM, Marcos S. Barbeitos wrote:
>
>> Hello.  I am trying to parse the info from GenBank flat files using
>> Bio::SeqIO.  I have two files which are virtually identical, and one of
>> them gets parsed just fine.  However, in the case of the other, the
>> program croaks when trying to parse the features and gives me:
>>
>>
>>
>> -------------------- WARNING ---------------------
>>
>> MSG: Unexpected error in feature table for  Skipping feature,
>> attempting to
>> recover
>>
>> ---------------------------------------------------
>>
>>
>>
>> I noticed that it does that after it reads the entry '/organism' in
>> Features.  The only difference I can see between the two files is the
>> presence of the feature ' /organelle' and of the line BASE COUNT in
>> one of
>> them, but the error persists even after I remove these lines.  Apart
>> from
>> that, there are the number of white spaces that precede the
>> beginning of
>> each line.   Any ideas?
>>
>>
>>
>> Thanks!
>>
>>
>>
>> Marcos S. Barbeitos
>>
>> Post-Doc Fellow
>>
>> The University of Kansas
>> Department of Ecology and Evolutionary Biology
>> 2041 Haworth Hall
>> 1200 Sunnyside Avenue
>> Lawrence, Kansas 66045
>> p: 785.864.5887
>> f: 785.864.5860
>>
>>
>>
>> _______________________________________________
>> Bioperl-guts-l mailing list
>> Bioperl-guts-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-guts-l
>
>
>
> <BioPerlTest.gb>


From Russell.Smithies at agresearch.co.nz  Sun Jun  7 16:32:27 2009
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Mon, 8 Jun 2009 08:32:27 +1200
Subject: [Bioperl-l] GBroswe2 - feature details
In-Reply-To: <dac81b0d0906051150o116c8bcvd747b71189d7a722@mail.gmail.com>
References: <dac81b0d0906051150o116c8bcvd747b71189d7a722@mail.gmail.com>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32493F1CA41@exchsth.agresearch.co.nz>

For the first part of your question, you can use a sub to access values in your annotations:

balloon hover = sub{my $f = shift;
			my %a = $f->attributes;
			my $name = $f->name;
			my $seq = $a{'BS_Seq'};
			return "<font size=small color=red>Motif name: $name, Sequence: $seq</font>" if defined $seq;
			return "<font size=small color=red>Motif name: $name, No sequence defined</font>";
			}


For the second bit, here's the formatting rules I'm using to create hyperlinks:

[Dbxref:DETAILS]
URL = sub {
      my ($tag,$value)=@_;
      if ($value =~ /NCBI_gi:(.+)/){
       return "http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?val=$1";
       }
      if ($value =~ /NCBI_Gene:(.+)/){
       return "http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=Retrieve&db=gene&list_uids=$1";
       }
       return;
     }

And this is what the gff looks like:
BTA10	refseq	mRNA	10011147	10176454	0	-	.	ID=NM_001076052;Name=NM_001076052;Index=1;Alias=HOMER1;Note=homer homolog 1 (Drosophila);Dbxref=NCBI_gi:115496957;Dbxref=NCBI_Gene:535311;
BTA10	refseq	mRNA	10241506	10301142	0	+	.	ID=NM_001046361;Name=NM_001046361;Index=1;Alias=PAPD4,MGC138008;Note=PAP associated domain containing 4;Dbxref=NCBI_gi:114052221;Dbxref=NCBI_Gene:533862;

Hopefully, this will get you going :-)
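
To make the structure of the 9th column concrete, here is a small standalone sketch (plain Perl, outside GBrowse) that parses an attribute string like the ones above into tag => list-of-values. The sub name is invented for illustration, and URL-escaped characters are not decoded.

```perl
use strict;
use warnings;

# Parse a GFF3 column-9 string into a hash of tag => [values].
# Repeated tags (e.g. two Dbxref entries) and comma-separated
# values (e.g. Alias=A,B) both end up as multiple list elements.
sub parse_gff3_attributes {
    my ($col9) = @_;
    my %attr;
    for my $pair (split /;/, $col9) {
        next unless $pair =~ /\S/;                 # skip empty trailing field
        my ($tag, $value) = split /=/, $pair, 2;
        next unless defined $value;                # skip malformed pairs
        push @{ $attr{$tag} }, split /,/, $value;
    }
    return %attr;
}

my %tags = parse_gff3_attributes(
    'ID=BS_1;BS_Seq=cacatg;BS_Color=Purple;Name=AtMYC2 BS in RD22');
print "Motif name: $tags{Name}[0], Sequence: $tags{BS_Seq}[0]\n";
# prints: Motif name: AtMYC2 BS in RD22, Sequence: cacatg
```

Tags are stored here exactly as written in the file, so BS_Seq and bs_seq would be different keys in this sketch.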


Russell Smithies 

Bioinformatics Applications Developer 
T +64 3 489 9085 
E russell.smithies at agresearch.co.nz 

Invermay Research Centre 
Puddle Alley, 
Mosgiel, 
New Zealand 
T +64 3 489 3809 
F +64 3 489 9174 
www.agresearch.co.nz 


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Alper Yilmaz
> Sent: Saturday, 6 June 2009 6:51 a.m.
> To: BioPerl List
> Subject: [Bioperl-l] GBroswe2 - feature details
> 
> Dear all,
> 
> I have a question about utilizing the tag/value pairs that were used
> in 9th of GFF. If my 9th column is like this:
> 
> ID=BS_1;BS_Seq=cacatg;BS_Color=Purple;Name=AtMYC2 BS in RD22
> 
> How can I use BS_Seq, BS_Color tags, say, in a balloon? If want to
> print name and sequence of a BindingSite, what do I need to replace
> question marks below?
> 
> balloon hover = <font size=small color=red>Motif name: $name,
> Sequence: ???????</font>
> 
> 
> The manual is mentioning that it's possible to use user defined
> tag/value pairs, but I couldn't figure out how. The manual is
> mentioning:
>  [feature_type:details]
>  tag1 = formatting rule
>  tag2 = formatting rule
>  tag3 = formatting rule
> 
> can be used to adjust formatting of a tag, but I don't how this can be
> used to assign value to a tag? I tried ;
> [cis-elements:details]
> bs_seq = <b>$value</b>     (I didn't use BS_Seq, since it was
> mentioned, tags are case-insensitive)
>  OR
> $bs_seq = <b>$value</b>
> 
> but, I cannot use $bs_seq in hover link option after doing this. What
> am I doing wrong?
> 
> thanks,
> 
> Alper Yilmaz
> Post-doctoral Researcher
> Plant Biotechnology Center
> The Ohio State University
> 1060 Carmack Rd
> Columbus, OH 43210
> (614)688-4954
> www.grassius.org
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From bernd.jagla at pasteur.fr  Mon Jun  8 12:24:12 2009
From: bernd.jagla at pasteur.fr (Bernd Jagla)
Date: Mon, 8 Jun 2009 18:24:12 +0200
Subject: [Bioperl-l] Bio:Das 1.11 installation problem
Message-ID: <7A76102A29824B1C9DDB7F3B1DC45D41@zillumina>

Hi,

I am working on a MAC 10.5.7; try to install Bio::Das using
perl -MCPAN -e 'install Bio::Das'
This is perl, v5.8.9 built for darwin-2level
(please let me know if you need anything else)

I am trying to install Bio::Das 1.11

I get the following error:

not ok 3
not ok 4
Can't call method "description" on an undefined value at t/01das.t line 62.

When going into the sources for 01das.t and printing out $db I get:

$VAR1 = \bless( {
                   'autotypes' => undef,
                   'default_dsn' => undef,
                   'autocategories' => undef,
                   'sockets' => {},
                   'aggregators' => [
                                      bless( {
                                               'sub_parts' => [
                                                                'coding_exon'
                                                              ],
                                               'require_whole_object' => undef,
                                               'main_method' => 'CDS',
                                               'method' => 'alignment'
                                             }, 'Bio::DB::GFF::Aggregator' ),
                                      bless( {
                                               'sub_parts' => [
                                                                'EST_match'
                                                              ],
                                               'require_whole_object' => undef,
                                               'main_method' => 'alignment',
                                               'method' => 'alignment'
                                             }, 'Bio::DB::GFF::Aggregator' )
                                    ],
                   'timeout' => undef,
                   'oldstyle_api' => 1,
                   'default_server' => 'http://www.wormbase.org/db/seq/das'
                 }, 'Bio::Das' );

@sources is empty

And test(3, @sources) fails.

Please advise.

Thanks,

Bernd

 
From lincoln.stein at gmail.com  Mon Jun  8 13:00:48 2009
From: lincoln.stein at gmail.com (Lincoln Stein)
Date: Mon, 8 Jun 2009 13:00:48 -0400
Subject: [Bioperl-l] Bio:Das 1.11 installation problem
In-Reply-To: <7A76102A29824B1C9DDB7F3B1DC45D41@zillumina>
References: <7A76102A29824B1C9DDB7F3B1DC45D41@zillumina>
Message-ID: <6dce9a0b0906081000j7c489fd3t3c5974794ad3e3e7@mail.gmail.com>

Hi,

The regression tests require an active Internet connection, as well as the
DAS test server being up and running. It may be there was a temporary
failure of one of those two. I just tested on my end and the regression
tests ran ok, so could you try it again?

Lincoln

On Mon, Jun 8, 2009 at 12:24 PM, Bernd Jagla <bernd.jagla at pasteur.fr> wrote:

> Hi,
>
>
>
> I am working on a MAC 10.5.7; try to install Bio::Das using perl -MCPAN -e
> 'install Bio::Das'
> This is perl, v5.8.9 built for darwin-2level
> (please let me know if you need anything else)
>
>
>
> I am trying to install Bio::Das 1.11
>
>
>
> I get the following error:
>
>
>
> not ok 3
>
> not ok 4
>
> Can't call method "description" on an undefined value at t/01das.t line 62.
>
>
>
> When going into the sources for 01das.t and printing out $db I get:
>
>
>
> $VAR1 = \bless( {
>
>                   'autotypes' => undef,
>
>                   'default_dsn' => undef,
>
>                   'autocategories' => undef,
>
>                   'sockets' => {},
>
>                   'aggregators' => [
>
>                                      bless( {
>
>                                               'sub_parts' => [
>
>
> 'coding_exon'
>
>                                                              ],
>
>                                               'require_whole_object' =>
> undef,
>
>                                               'main_method' => 'CDS',
>
>                                               'method' => 'alignment'
>
>                                             }, 'Bio::DB::GFF::Aggregator'
> ),
>
>                                      bless( {
>
>                                               'sub_parts' => [
>
>                                                                'EST_match'
>
>                                                              ],
>
>                                               'require_whole_object' =>
> undef,
>
>                                               'main_method' => 'alignment',
>
>                                               'method' => 'alignment'
>
>                                             }, 'Bio::DB::GFF::Aggregator' )
>
>                                    ],
>
>                   'timeout' => undef,
>
>                   'oldstyle_api' => 1,
>
>                   'default_server' => 'http://www.wormbase.org/db/seq/das'
>
>                 }, 'Bio::Das' );
>
>
>
>
>
> @sources is empty
>
> And test(3, at sources) fails.
>
>
>
> Please advise.
>
>
>
> Thanks,
>
>
>
> Bernd
>
>
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


-- 
Lincoln D. Stein
Director, Informatics and Biocomputing Platform
Ontario Institute for Cancer Research
101 College St., Suite 800
Toronto, ON, Canada M5G0A3
416 673-8514
Assistant: Renata Musa <Renata.Musa at oicr.on.ca>


From lsbrath at gmail.com  Mon Jun  8 16:28:46 2009
From: lsbrath at gmail.com (lsbrath at gmail.com)
Date: Mon, 08 Jun 2009 20:28:46 +0000
Subject: [Bioperl-l] fasta conversion
Message-ID: <000e0cd6aa4cd53993046bdc1675@google.com>

Hello!

I am running into trouble while trying to convert a text file to fasta. It
should be simple enough but I am getting a weird error message.

This is my script:

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
use File::Copy;
use Bio::SeqIO;


my $maid_dir = "C:/Documents and Settings/mgavi.brathwaite/Desktop/msa";
my $maid = '13063';

opendir my $dh, "$maid_dir"; # directory to search
my @files = readdir $dh;
#find the _fasta file
for my $f (@files){
    my $fa = $maid_dir."/".$maid."_hu_1kb.fa";
    my $r = $maid_dir."/".$maid."_hu_1kb.txt";
    open (my $in,$r);
    if($f=~ m/^(\d+)_hu_1kb/){ # convert to fasta

        print Dumper($f);
        my $hu_1kb = $maid.'_hu_1kb'; #file to convert
        my $in = Bio::SeqIO->new(-file => $r,
                                 -format => 'raw');
        my $out = Bio::SeqIO->new(-file => ">$fa",
                                  -format => 'Fasta');
        while ( my $seq = $in->next_seq()) {
            $out->write_seq($seq);
        }
    }
}

I keep getting the following error message:

-------------------- WARNING ---------------------
MSG: seq doesn't validate, mismatch is 13063
---------------------------------------------------

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Attempting to set the sequence to [13063HU] which does not look healthy
STACK: Error::throw
STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359
STACK: Bio::PrimarySeq::seq C:/Perl/site/lib/Bio/PrimarySeq.pm:258
STACK: Bio::PrimarySeq::new C:/Perl/site/lib/Bio/PrimarySeq.pm:210
STACK: Bio::Seq::new C:/Perl/site/lib/Bio/Seq.pm:484
STACK: Bio::Seq::SeqFactory::create C:/Perl/site/lib/Bio/Seq/SeqFactory.pm:116
STACK: C:/Perl/site/lib/Bio\SeqIO\raw.pm:119
-----------------------------------------------------------

Anyone out there that can help me solve this?


From kjaja27 at yahoo.com  Fri Jun  5 19:42:13 2009
From: kjaja27 at yahoo.com (kayj)
Date: Fri, 5 Jun 2009 16:42:13 -0700 (PDT)
Subject: [Bioperl-l]  finding SNPs in a given region
Message-ID: <23897107.post@talk.nabble.com>


Hi All,

Is there a way to find the SNPs in a given region, I have the start and the
end base pair position, I am looking to download the SNPs in different
regions, Is that possible ?
 This is my first time using bioperl and any help will be greatly
appreciated

Thanks

-- 
View this message in context: http://www.nabble.com/finding-SNPs-in-a-given-region-tp23897107p23897107.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From kjaja27 at yahoo.com  Mon Jun  8 09:49:24 2009
From: kjaja27 at yahoo.com (kayj)
Date: Mon, 8 Jun 2009 06:49:24 -0700 (PDT)
Subject: [Bioperl-l]  How to extract SNPs
Message-ID: <23924432.post@talk.nabble.com>


Hi All,
I have several regions on the genome each is defined with the start and the
end base pair position. I am looking into using HapMap
http://hapmart.hapmap.org/BioMart/martview

to extract the SNPs in these regions for a given population. I am new to
bioperl and any help will be greatly appreciated.


-- 
View this message in context: http://www.nabble.com/How-to-extract-SNPs-tp23924432p23924432.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From bernd at pasteur.fr  Mon Jun  8 16:31:57 2009
From: bernd at pasteur.fr (bernd at pasteur.fr)
Date: Mon, 8 Jun 2009 22:31:57 +0200 (CEST)
Subject: [Bioperl-l] Bio:Das 1.11 installation problem
In-Reply-To: <6dce9a0b0906081000j7c489fd3t3c5974794ad3e3e7@mail.gmail.com>
References: <7A76102A29824B1C9DDB7F3B1DC45D41@zillumina>
	<6dce9a0b0906081000j7c489fd3t3c5974794ad3e3e7@mail.gmail.com>
Message-ID: <47482.157.99.64.103.1244493117.squirrel@php.pasteur.fr>

I tested the connection with wget and everything works fine.
I suspect that our proxy might be the problem, but all variables are set
correctly (ftp_proxy, http_proxy and many more). I am not sure which
environment variables are being used...
I am not too familiar with all this and don't know where to look for the
right configuration.

Thanks,

Bernd

> Hi,
>
> The regression tests require an active Internet connection, as well as the
> DAS test server being up and running. It may be there was a temporary
> failure of one of those two. I just tested on my end and the regression
> tests ran ok, so could you try it again?
>
> Lincoln
>
> On Mon, Jun 8, 2009 at 12:24 PM, Bernd Jagla <bernd.jagla at pasteur.fr>
> wrote:
>
>> Hi,
>>
>>
>>
>> I am working on a MAC 10.5.7; try to install Bio::Das using perl -MCPAN
>> -e
>> 'install Bio::Das'
>> This is perl, v5.8.9 built for darwin-2level
>> (please let me know if you need anything else)
>>
>>
>>
>> I am trying to install Bio::Das 1.11
>>
>>
>>
>> I get the following error:
>>
>>
>>
>> not ok 3
>>
>> not ok 4
>>
>> Can't call method "description" on an undefined value at t/01das.t line
>> 62.
>>
>>
>>
>> When going into the sources for 01das.t and printing out $db I get:
>>
>>
>>
>> $VAR1 = \bless( {
>>
>>                   'autotypes' => undef,
>>
>>                   'default_dsn' => undef,
>>
>>                   'autocategories' => undef,
>>
>>                   'sockets' => {},
>>
>>                   'aggregators' => [
>>
>>                                      bless( {
>>
>>                                               'sub_parts' => [
>>
>>
>> 'coding_exon'
>>
>>                                                              ],
>>
>>                                               'require_whole_object' =>
>> undef,
>>
>>                                               'main_method' => 'CDS',
>>
>>                                               'method' => 'alignment'
>>
>>                                             },
>> 'Bio::DB::GFF::Aggregator'
>> ),
>>
>>                                      bless( {
>>
>>                                               'sub_parts' => [
>>
>>                                                                'EST_match'
>>
>>                                                              ],
>>
>>                                               'require_whole_object' =>
>> undef,
>>
>>                                               'main_method' =>
>> 'alignment',
>>
>>                                               'method' => 'alignment'
>>
>>                                             },
>> 'Bio::DB::GFF::Aggregator' )
>>
>>                                    ],
>>
>>                   'timeout' => undef,
>>
>>                   'oldstyle_api' => 1,
>>
>>                   'default_server' =>
>> 'http://www.wormbase.org/db/seq/das'
>>
>>                 }, 'Bio::Das' );
>>
>>
>>
>>
>>
>> @sources is empty
>>
>> And test(3,@sources) fails.
>>
>>
>>
>> Please advise.
>>
>>
>>
>> Thanks,
>>
>>
>>
>> Bernd
>>
>>
>>
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
>
>
> --
> Lincoln D. Stein
> Director, Informatics and Biocomputing Platform
> Ontario Institute for Cancer Research
> 101 College St., Suite 800
> Toronto, ON, Canada M5G0A3
> 416 673-8514
> Assistant: Renata Musa <Renata.Musa at oicr.on.ca>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From maj at fortinbras.us  Mon Jun  8 17:12:03 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 8 Jun 2009 17:12:03 -0400
Subject: [Bioperl-l] fasta conversion
In-Reply-To: <000e0cd6aa4cd53993046bdc1675@google.com>
References: <000e0cd6aa4cd53993046bdc1675@google.com>
Message-ID: <4737A1AB29FA47AF8FF4913448F5FAA3@NewLife>

You're getting the sequence descriptor rather than the sequence in the return
from $in->next_seq. Read up on what the 'raw' format actually entails in the
Bio::SeqIO pod.
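
A minimal sketch of what that means in practice, written in plain Perl
rather than with Bio::SeqIO. It assumes (a guess about the input, not
something stated in this thread) that the first line of the .txt file is
the descriptor and every later line is sequence; as far as I know the 'raw'
format instead treats every line as sequence, which is why the descriptor
ends up where a sequence is expected. The name text_to_fasta and the
60-column default are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: turn "descriptor line + sequence lines" into one
# FASTA record. Assumes line 1 is the descriptor and the rest is sequence.
sub text_to_fasta {
    my ($text, $width) = @_;
    $width ||= 60;

    # Split into lines and ignore blank ones.
    my @lines = grep { /\S/ } split /\r?\n/, $text;
    my $id = shift @lines;
    return '' unless defined $id;

    $id =~ s/^\s*>?\s*//;    # drop a leading '>' if the file already has one
    $id =~ s/\s+$//;

    my $seq = join '', @lines;
    $seq =~ s/\s+//g;        # remove any embedded whitespace

    # Emit the header, then the sequence wrapped at $width columns.
    my $fasta = ">$id\n";
    $fasta .= "$1\n" while $seq =~ /(.{1,$width})/g;
    return $fasta;
}

print text_to_fasta("13063HU\nACGTACGT\nTTGA\n", 10);
```

With Bio::SeqIO itself, the equivalent would presumably be to read the
descriptor separately and hand only the sequence lines to the sequence
object before writing it out as Fasta.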
cheers MAJ
----- Original Message ----- 
From: <lsbrath at gmail.com>
To: <bioperl-l at lists.open-bio.org>
Sent: Monday, June 08, 2009 4:28 PM
Subject: [Bioperl-l] fasta conversion


> Hello!
>
> I am running into trouble while trying to convert a text file to fasta. It 
> should be simple enough but I am getting a weird error message.
>
> This is my script:
>
> #!/usr/bin/perl
> use strict;
> use warnings;
> use Data::Dumper;
> use File::Copy;
> use Bio::SeqIO;
>
>
> my $maid_dir = "C:/Documents and Settings/mgavi.brathwaite/Desktop/msa";
> my $maid = '13063';
>
> opendir my $dh, "$maid_dir"; # directory to search
> my @files = readdir $dh;
> #find the _fasta file
> for my $f (@files){
> my $fa = $maid_dir."/".$maid."_hu_1kb.fa";
> my $r = $maid_dir."/".$maid."_hu_1kb.txt";
> open (my $in,$r);
> if($f=~ m/^(\d+)_hu_1kb/){ # convert to fasta
>
> print Dumper($f);
> my $hu_1kb = $maid.'_hu_1kb'; #file to convert
> my $in = Bio::SeqIO->new(-file => $r,
> -format => 'raw');
> my $out = Bio::SeqIO->new(-file => ">$fa",
> -format => 'Fasta');
> while ( my $seq = $in->next_seq()) {
> $out->write_seq($seq);
> }
> }
> }
>
> I keep getting the following error message:
>
> -------------------- WARNING ---------------------
> MSG: seq doesn't validate, mismatch is 13063
> ---------------------------------------------------
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Attempting to set the sequence to [13063HU] which does not look healthy
> STACK: Error::throw
> STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359
> STACK: Bio::PrimarySeq::seq C:/Perl/site/lib/Bio/PrimarySeq.pm:258
> STACK: Bio::PrimarySeq::new C:/Perl/site/lib/Bio/PrimarySeq.pm:210
> STACK: Bio::Seq::new C:/Perl/site/lib/Bio/Seq.pm:484
> STACK: Bio::Seq::SeqFactory::create 
> C:/Perl/site/lib/Bio/Seq/SeqFactory.pm:116
> STACK: C:/Perl/site/lib/Bio\SeqIO\raw.pm:119
> -----------------------------------------------------------
>
> Anyone out there that can help me solve this?
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> 


From stefan.kirov at bms.com  Mon Jun  8 17:26:17 2009
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Mon, 08 Jun 2009 17:26:17 -0400
Subject: [Bioperl-l] Bio:Das 1.11 installation problem
In-Reply-To: <47482.157.99.64.103.1244493117.squirrel@php.pasteur.fr>
References: <7A76102A29824B1C9DDB7F3B1DC45D41@zillumina>
	<6dce9a0b0906081000j7c489fd3t3c5974794ad3e3e7@mail.gmail.com>
	<47482.157.99.64.103.1244493117.squirrel@php.pasteur.fr>
Message-ID: <4A2D81F9.8060509@bms.com>

bernd at pasteur.fr wrote:
Try to add this line
-proxy => 'http:<YOUR PROXY HERE>',
in t/01das.t where the Bio::Das object is created (I think line 41).
Hope this works for you, it did for me.
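
A rough sketch of the same idea without hard-coding the proxy URL. The
-proxy argument is the one mentioned above; everything else is an
assumption for illustration: the helper name proxy_from_env, the choice of
which environment variables to consult, and the idea of merging %das_args
into whatever arguments the test script already passes.

```perl
use strict;
use warnings;

# Return the first non-empty proxy URL found in the given environment hash.
# Which variable names to look at is an assumption made for this sketch.
sub proxy_from_env {
    my ($env) = @_;
    for my $name (qw(http_proxy HTTP_PROXY)) {
        my $value = $env->{$name};
        return $value if defined $value && length $value;
    }
    return;
}

my $proxy    = proxy_from_env(\%ENV);
my %das_args = defined $proxy ? ( -proxy => $proxy ) : ();

# The constructor is only attempted when a proxy was found and Bio::Das
# is actually installed; otherwise the sketch just reports what it found.
if ( defined $proxy && eval { require Bio::Das; 1 } ) {
    my $das = Bio::Das->new(%das_args);
    print "Bio::Das object created with proxy $proxy\n";
}
else {
    print "Arguments that would be passed: @{[ %das_args ]}\n";
}
```

This only covers code you control; it does not change what the module
itself does with the environment.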
Stefan
> I tested the connection with wget and everything works fine.
> I suspect that our proxy might be the problem but all variables are set
> correctly (ftp_proxy, http_proxy and many more) I am not sure which
> environment variable are being used...
> I am not too familiar with all this and don't know where to look for the
> right configurations.
>
> Thanks,
>
> Bernd
>
>   
>> Hi,
>>
>> The regression tests require an active Internet connection, as well as the
>> DAS test server being up and running. It may be there was a temporary
>> failure of one of those two. I just tested on my end and the regression
>> tests ran ok, so could you try it again?
>>
>> Lincoln
>>
>> On Mon, Jun 8, 2009 at 12:24 PM, Bernd Jagla <bernd.jagla at pasteur.fr>
>> wrote:
>>
>>     
>>> Hi,
>>>
>>>
>>>
>>> I am working on a MAC 10.5.7; try to install Bio::Das using perl -MCPAN
>>> -e
>>> 'install Bio::Das'
>>> This is perl, v5.8.9 built for darwin-2level
>>> (please let me know if you need anything else)
>>>
>>>
>>>
>>> I am trying to install Bio::Das 1.11
>>>
>>>
>>>
>>> I get the following error:
>>>
>>>
>>>
>>> not ok 3
>>>
>>> not ok 4
>>>
>>> Can't call method "description" on an undefined value at t/01das.t line
>>> 62.
>>>
>>>
>>>
>>> When going into the sources for 01das.t and printing out $db I get:
>>>
>>>
>>>
>>> $VAR1 = \bless( {
>>>
>>>                   'autotypes' => undef,
>>>
>>>                   'default_dsn' => undef,
>>>
>>>                   'autocategories' => undef,
>>>
>>>                   'sockets' => {},
>>>
>>>                   'aggregators' => [
>>>
>>>                                      bless( {
>>>
>>>                                               'sub_parts' => [
>>>
>>>
>>> 'coding_exon'
>>>
>>>                                                              ],
>>>
>>>                                               'require_whole_object' =>
>>> undef,
>>>
>>>                                               'main_method' => 'CDS',
>>>
>>>                                               'method' => 'alignment'
>>>
>>>                                             },
>>> 'Bio::DB::GFF::Aggregator'
>>> ),
>>>
>>>                                      bless( {
>>>
>>>                                               'sub_parts' => [
>>>
>>>                                                                'EST_match'
>>>
>>>                                                              ],
>>>
>>>                                               'require_whole_object' =>
>>> undef,
>>>
>>>                                               'main_method' =>
>>> 'alignment',
>>>
>>>                                               'method' => 'alignment'
>>>
>>>                                             },
>>> 'Bio::DB::GFF::Aggregator' )
>>>
>>>                                    ],
>>>
>>>                   'timeout' => undef,
>>>
>>>                   'oldstyle_api' => 1,
>>>
>>>                   'default_server' =>
>>> 'http://www.wormbase.org/db/seq/das'
>>>
>>>                 }, 'Bio::Das' );
>>>
>>>
>>>
>>>
>>>
>>> @sources is empty
>>>
>>> And test(3,@sources) fails.
>>>
>>>
>>>
>>> Please advise.
>>>
>>>
>>>
>>> Thanks,
>>>
>>>
>>>
>>> Bernd
>>>
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>>       
>>
>> --
>> Lincoln D. Stein
>> Director, Informatics and Biocomputing Platform
>> Ontario Institute for Cancer Research
>> 101 College St., Suite 800
>> Toronto, ON, Canada M5G0A3
>> 416 673-8514
>> Assistant: Renata Musa <Renata.Musa at oicr.on.ca>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>     
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>   


From bernd.jagla at pasteur.fr  Tue Jun  9 03:05:47 2009
From: bernd.jagla at pasteur.fr (Bernd Jagla)
Date: Tue, 9 Jun 2009 09:05:47 +0200
Subject: [Bioperl-l] Bio:Das 1.11 installation problem
In-Reply-To: <4A2D81F9.8060509@bms.com>
References: <7A76102A29824B1C9DDB7F3B1DC45D41@zillumina><6dce9a0b0906081000j7c489fd3t3c5974794ad3e3e7@mail.gmail.com><47482.157.99.64.103.1244493117.squirrel@php.pasteur.fr>
	<4A2D81F9.8060509@bms.com>
Message-ID: <19FC487A25B6478FA4DE91B81A1FC52C@zillumina>

Great, that works!!!
But since I am using Bio::Das within GBrowse I can't/don't want to change
those sources. I tried setting some environment variables but that doesn't
seem to work either...
So far I have set the following:
FTP_PROXY=http://...
HTTP_PROXY=http://...
PROXYFTP=http://...
PROXYHTTP=http://...
ftp_proxy=http://...
http_proxy=http://...
PROXY=http://...

Any suggestions are welcome.

Thanks,

Bernd


-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Stefan Kirov
Sent: Monday, June 08, 2009 11:26 PM
To: bernd at pasteur.fr
Cc: Lincoln Stein; Bernd Jagla; bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] Bio:Das 1.11 installation problem

bernd at pasteur.fr wrote:
Try to add this line
-proxy => 'http:<YOUR PROXY HERE>',
in t/01das.t where the Bio::Das object is created (I think line 41).
Hope this works for you, it did for me.
Stefan
> I tested the connection with wget and everything works fine.
> I suspect that our proxy might be the problem but all variables are set
> correctly (ftp_proxy, http_proxy and many more) I am not sure which
> environment variable are being used...
> I am not too familiar with all this and don't know where to look for the
> right configurations.
>
> Thanks,
>
> Bernd
>
>   
>> Hi,
>>
>> The regression tests require an active Internet connection, as well as the
>> DAS test server being up and running. It may be there was a temporary
>> failure of one of those two. I just tested on my end and the regression
>> tests ran ok, so could you try it again?
>>
>> Lincoln
>>
>> On Mon, Jun 8, 2009 at 12:24 PM, Bernd Jagla <bernd.jagla at pasteur.fr>
>> wrote:
>>
>>     
>>> Hi,
>>>
>>>
>>>
>>> I am working on a MAC 10.5.7; try to install Bio::Das using perl -MCPAN
>>> -e
>>> 'install Bio::Das'
>>> This is perl, v5.8.9 built for darwin-2level
>>> (please let me know if you need anything else)
>>>
>>>
>>>
>>> I am trying to install Bio::Das 1.11
>>>
>>>
>>>
>>> I get the following error:
>>>
>>>
>>>
>>> not ok 3
>>>
>>> not ok 4
>>>
>>> Can't call method "description" on an undefined value at t/01das.t line
>>> 62.
>>>
>>>
>>>
>>> When going into the sources for 01das.t and printing out $db I get:
>>>
>>>
>>>
>>> $VAR1 = \bless( {
>>>
>>>                   'autotypes' => undef,
>>>
>>>                   'default_dsn' => undef,
>>>
>>>                   'autocategories' => undef,
>>>
>>>                   'sockets' => {},
>>>
>>>                   'aggregators' => [
>>>
>>>                                      bless( {
>>>
>>>                                               'sub_parts' => [
>>>
>>>
>>> 'coding_exon'
>>>
>>>                                                              ],
>>>
>>>                                               'require_whole_object' =>
>>> undef,
>>>
>>>                                               'main_method' => 'CDS',
>>>
>>>                                               'method' => 'alignment'
>>>
>>>                                             },
>>> 'Bio::DB::GFF::Aggregator'
>>> ),
>>>
>>>                                      bless( {
>>>
>>>                                               'sub_parts' => [
>>>
>>>
>>> 'EST_match'
>>>
>>>                                                              ],
>>>
>>>                                               'require_whole_object' =>
>>> undef,
>>>
>>>                                               'main_method' =>
>>> 'alignment',
>>>
>>>                                               'method' => 'alignment'
>>>
>>>                                             },
>>> 'Bio::DB::GFF::Aggregator' )
>>>
>>>                                    ],
>>>
>>>                   'timeout' => undef,
>>>
>>>                   'oldstyle_api' => 1,
>>>
>>>                   'default_server' =>
>>> 'http://www.wormbase.org/db/seq/das'
>>>
>>>                 }, 'Bio::Das' );
>>>
>>>
>>>
>>>
>>>
>>> @sources is empty
>>>
>>> And test(3,@sources) fails.
>>>
>>>
>>>
>>> Please advise.
>>>
>>>
>>>
>>> Thanks,
>>>
>>>
>>>
>>> Bernd
>>>
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>>       
>>
>> --
>> Lincoln D. Stein
>> Director, Informatics and Biocomputing Platform
>> Ontario Institute for Cancer Research
>> 101 College St., Suite 800
>> Toronto, ON, Canada M5G0A3
>> 416 673-8514
>> Assistant: Renata Musa <Renata.Musa at oicr.on.ca>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>     
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>   

_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l


From awitney at sgul.ac.uk  Tue Jun  9 07:20:35 2009
From: awitney at sgul.ac.uk (Adam Witney)
Date: Tue, 9 Jun 2009 12:20:35 +0100
Subject: [Bioperl-l] fetch gene sequence with EUtilities.pm
Message-ID: <78771A58-8FCC-46ED-BE4F-8B0915BD324D@sgul.ac.uk>

Hi,

I have been experimenting with the Bio::DB::EUtilities module, with  
help from the Cookbook. But I can't seem to figure out how to get the  
DNA sequence of a gene; all the examples seem to be fetching protein  
sequence.

How would i go about fetching a sequence using an Entrez GeneID?

thanks for any help

adam


From Kevin.M.Brown at asu.edu  Tue Jun  9 11:25:45 2009
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Tue, 9 Jun 2009 08:25:45 -0700
Subject: [Bioperl-l] Bio:Das 1.11 installation problem
In-Reply-To: <19FC487A25B6478FA4DE91B81A1FC52C@zillumina>
References: <7A76102A29824B1C9DDB7F3B1DC45D41@zillumina><6dce9a0b0906081000j7c489fd3t3c5974794ad3e3e7@mail.gmail.com><47482.157.99.64.103.1244493117.squirrel@php.pasteur.fr><4A2D81F9.8060509@bms.com>
	<19FC487A25B6478FA4DE91B81A1FC52C@zillumina>
Message-ID: <1A4207F8295607498283FE9E93B775B40604FBFB@EX02.asurite.ad.asu.edu>

Dumb question, but are you exporting the variables after you set them?

FTP_PROXY=http://...
HTTP_PROXY=http://...
export FTP_PROXY HTTP_PROXY 
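
One way to check this from the Perl side: a variable that was set in the
shell but never exported will not show up in %ENV of a process started from
that shell. The sketch below just lists whatever proxy-related variables
the running Perl can see; the sub name visible_proxy_vars is made up for
illustration.

```perl
use strict;
use warnings;

# List the proxy-related variables that this Perl process can actually see,
# as NAME=value strings in sorted order.
sub visible_proxy_vars {
    my ($env) = @_;
    my @names = grep { /proxy/i && defined $env->{$_} } keys %$env;
    return map { "$_=$env->{$_}" } sort @names;
}

my @seen = visible_proxy_vars(\%ENV);
if (@seen) {
    print "$_\n" for @seen;
}
else {
    print "no proxy variables visible to this process\n";
}
```

If GBrowse runs as a CGI under a web server, it is probably the server's
environment that matters rather than your login shell's, so the check
would need to run in that context to be meaningful.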

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Bernd Jagla
> Sent: Tuesday, June 09, 2009 12:06 AM
> To: 'Stefan Kirov'; bernd at pasteur.fr
> Cc: 'Lincoln Stein'; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bio:Das 1.11 installation problem
> 
> Great, that works!!!
> But since I am using Bio::Das within GBrowse I can't/don't want to change
> those sources. I tried setting some environment variables but that doesn't
> seem to work either...
> So far I have set the following:
> FTP_PROXY=http://...
> HTTP_PROXY=http://...
> PROXYFTP=http://...
> PROXYHTTP=http://...
> ftp_proxy=http://...
> http_proxy=http://...
> PROXY=http://...
> 
> Any suggestions are welcome.
> 
> Thanks,
> 
> Bernd
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 


From cjfields at illinois.edu  Tue Jun  9 12:08:46 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 9 Jun 2009 11:08:46 -0500
Subject: [Bioperl-l] use of no_* to mean 'number_of', negative booleans
Message-ID: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu>

All,

I've noticed a few methods in bioperl with names like 'no_Foo' that
mean 'number of Foo' (such as SimpleAlign's no_sequences).  The
problem I foresee is possible ambiguity, particularly with negative
boolean checks (e.g. 'no_Foo' could also mean 'this instance contains no
Foo'), something that BioPerl also has with various settings.

I suggest we alias these as num_* to disambiguate that.  There's no
easy way to change flag settings that are already in place without going
through a deprecation cycle, but we can promote using positive booleans
where possible (e.g. 'is_foo' or 'has_foo' instead of 'no_foo').  We can
leave the older 'no_*' methods as is for the time being and maybe
deprecate them later.

If no one has objections I'll add these in as needed.
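
To make the proposal concrete, here is a minimal sketch of the aliasing on
a made-up class. My::Alignment is hypothetical and is not SimpleAlign;
only the method names no_sequences / num_sequences echo the example above,
and has_sequences is invented to show the positive-boolean style.

```perl
use strict;
use warnings;

package My::Alignment;    # hypothetical class, not a BioPerl module

sub new {
    my ($class, @seqs) = @_;
    return bless { seqs => [@seqs] }, $class;
}

# Preferred, unambiguous name.
sub num_sequences { return scalar @{ $_[0]{seqs} } }

# Old name kept as an alias: both names refer to the same code.
{
    no warnings 'once';
    *no_sequences = \&num_sequences;
}

# Positive boolean instead of a 'no_foo'-style flag.
sub has_sequences { return $_[0]->num_sequences() > 0 ? 1 : 0 }

package main;

my $aln = My::Alignment->new(qw(ACGT TTGA));
printf "%d %d %d\n",
    $aln->num_sequences(), $aln->no_sequences(), $aln->has_sequences();
```

Because the old name is only an alias, existing callers keep working
unchanged while new code can use the num_* spelling.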

chris


From SMarkel at accelrys.com  Tue Jun  9 12:26:08 2009
From: SMarkel at accelrys.com (Scott Markel)
Date: Tue, 9 Jun 2009 12:26:08 -0400
Subject: [Bioperl-l] use of no_* to mean 'number_of', negative booleans
In-Reply-To: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu>
References: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu>
Message-ID: <1F1240778FB0AF46B4E5A72C44D2C7472A636328@exch1-hi.accelrys.net>

Chris,

I just checked our code for the Sequence Analysis Collection in
Pipeline Pilot.  We've got a few places we'd need to make code
changes, but we like your suggestion.  So, no objections from us.

Scott

Scott Markel, Ph.D.
Principal Bioinformatics Architect  email:  smarkel at accelrys.com
Accelrys (SciTegic R&D)             mobile: +1 858 205 3653
10188 Telesis Court, Suite 100      voice:  +1 858 799 5603
San Diego, CA 92121                 fax:    +1 858 799 5222
USA                                 web:    http://www.accelrys.com

http://www.linkedin.com/in/smarkel
Vice President, Board of Directors:
    International Society for Computational Biology
Co-chair: ISCB Publications Committee
Associate Editor: PLoS Computational Biology
Editorial Board: Briefings in Bioinformatics


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Chris Fields
> Sent: Tuesday, 09 June 2009 9:09 AM
> To: BioPerl List
> Subject: [Bioperl-l] use of no_* to mean 'number_of', negative booleans
> 
> All,
> 
> I've noticed a few methods in bioperl with names like 'no_Foo' that
> mean 'number of Foo' (such as SimpleAlign's no_sequences).  The
> problem I foresee are possible ambiguities, particularly with negative
> boolean checks (eg 'no_Foo' could also mean 'this instance contains no
> Foo'), something that BioPerl also has with various settings.
> 
> I suggest we alias these as num_* to disambiguate that.  There's no
> easy way to change already in-place flag setting w/o going through a
> deprecation cycle, but we can promote using positive booleans where
> possible (eg 'is_foo' or 'has_foo' instead of 'no_foo').  We can leave
> the older 'no_*' methods as is for the time being and maybe deprecate
> them later.
> 
> If no one has objections I'll add these in as needed.
> 
> chris
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at illinois.edu  Tue Jun  9 13:03:16 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 9 Jun 2009 12:03:16 -0500
Subject: [Bioperl-l] use of no_* to mean 'number_of', negative booleans
In-Reply-To: <1F1240778FB0AF46B4E5A72C44D2C7472A636328@exch1-hi.accelrys.net>
References: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu>
	<1F1240778FB0AF46B4E5A72C44D2C7472A636328@exch1-hi.accelrys.net>
Message-ID: <A5461F02-AA81-4A02-88DA-181B33EE41FE@illinois.edu>

I don't think it would require code changes right away; for the time  
being no_* will just alias num_*.  We can probably have deprecation  
warnings activate when we reach a particular version.
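
Roughly what I have in mind, as a sketch on a made-up class (this is not
actual BioPerl code). The package My::FeatureSet, both version numbers and
the warning text are assumptions for illustration only.

```perl
use strict;
use warnings;

package My::FeatureSet;    # hypothetical class for illustration only

our $VERSION         = 1.006;    # made-up current release
our $WARN_AT_VERSION = 1.007;    # made-up release where warnings begin

sub new {
    my ($class, @features) = @_;
    return bless { features => [@features] }, $class;
}

sub num_features { return scalar @{ $_[0]{features} } }

# Old name forwards to the new one. It stays silent until the library
# version reaches the chosen threshold, then starts warning.
sub no_features {
    my $self = shift;
    warn "no_features() is deprecated; use num_features()\n"
        if $VERSION >= $WARN_AT_VERSION;
    return $self->num_features(@_);
}

package main;

my $set = My::FeatureSet->new(qw(exon intron));
print( $set->no_features(), "\n" );    # old name still works, no warning yet
```

Existing code keeps running unchanged, and the warning only appears once
the version threshold is crossed.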

chris

On Jun 9, 2009, at 11:26 AM, Scott Markel wrote:

> Chris,
>
> I just checked our code for the Sequence Analysis Collection in
> Pipeline Pilot.  We've got a few places we'd need to make code
> changes, but we like your suggestion.  So, no objections from us.
>
> Scott
>
> Scott Markel, Ph.D.
> Principal Bioinformatics Architect  email:  smarkel at accelrys.com
> Accelrys (SciTegic R&D)             mobile: +1 858 205 3653
> 10188 Telesis Court, Suite 100      voice:  +1 858 799 5603
> San Diego, CA 92121                 fax:    +1 858 799 5222
> USA                                 web:    http://www.accelrys.com
>
> http://www.linkedin.com/in/smarkel
> Vice President, Board of Directors:
>    International Society for Computational Biology
> Co-chair: ISCB Publications Committee
> Associate Editor: PLoS Computational Biology
> Editorial Board: Briefings in Bioinformatics
>
>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>> bounces at lists.open-bio.org] On Behalf Of Chris Fields
>> Sent: Tuesday, 09 June 2009 9:09 AM
>> To: BioPerl List
>> Subject: [Bioperl-l] use of no_* to mean 'number_of', negative  
>> booleans
>>
>> All,
>>
>> I've noticed a few methods in bioperl with names like 'no_Foo' that
>> mean 'number of Foo' (such as SimpleAlign's no_sequences).  The
>> problem I foresee are possible ambiguities, particularly with  
>> negative
>> boolean checks (eg 'no_Foo' could also mean 'this instance contains  
>> no
>> Foo'), something that BioPerl also has with various settings.
>>
>> I suggest we alias these as num_* to disambiguate that.  There's no
>> easy way to change already in-place flag setting w/o going through a
>> deprecation cycle, but we can promote using positive booleans where
>> possible (eg 'is_foo' or 'has_foo' instead of 'no_foo').  We can  
>> leave
>> the older 'no_*' methods as is for the time being and maybe deprecate
>> them later.
>>
>> If no one has objections I'll add these in as needed.
>>
>> chris
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From maj at fortinbras.us  Tue Jun  9 12:32:51 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 9 Jun 2009 12:32:51 -0400
Subject: [Bioperl-l] use of no_* to mean 'number_of', negative booleans
In-Reply-To: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu>
References: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu>
Message-ID: <4BA7FB5466B34B59B7C455E1173C1FA7@NewLife>

+1, absolutely- MAJ
----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "BioPerl List" <bioperl-l at lists.open-bio.org>
Sent: Tuesday, June 09, 2009 12:08 PM
Subject: [Bioperl-l] use of no_* to mean 'number_of', negative booleans


> All,
> 
> I've noticed a few methods in bioperl with names like 'no_Foo' that  
> mean 'number of Foo' (such as SimpleAlign's no_sequences).  The  
> problem I foresee are possible ambiguities, particularly with negative  
> boolean checks (eg 'no_Foo' could also mean 'this instance contains no  
> Foo'), something that BioPerl also has with various settings.
> 
> I suggest we alias these as num_* to disambiguate that.  There's no  
> easy way to change already in-place flag setting w/o going through a  
> deprecation cycle, but we can promote using positive booleans where  
> possible (eg 'is_foo' or 'has_foo' instead of 'no_foo').  We can leave  
> the older 'no_*' methods as is for the time being and maybe deprecate  
> them later.
> 
> If no one has objections I'll add these in as needed.
> 
> chris
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
>


From hlapp at gmx.net  Tue Jun  9 13:18:05 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 9 Jun 2009 13:18:05 -0400
Subject: [Bioperl-l] use of no_* to mean 'number_of', negative booleans
In-Reply-To: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu>
References: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu>
Message-ID: <B23E5145-0DBB-4C3D-8536-063B3E9711DF@gmx.net>

Great suggestions, I'm all for it.

	-hilmar

On Jun 9, 2009, at 12:08 PM, Chris Fields wrote:

> All,
>
> I've noticed a few methods in bioperl with names like 'no_Foo' that  
> mean 'number of Foo' (such as SimpleAlign's no_sequences).  The  
> problem I foresee are possible ambiguities, particularly with  
> negative boolean checks (eg 'no_Foo' could also mean 'this instance  
> contains no Foo'), something that BioPerl also has with various  
> settings.
>
> I suggest we alias these as num_* to disambiguate that.  There's no  
> easy way to change already in-place flag setting w/o going through a  
> deprecation cycle, but we can promote using positive booleans where  
> possible (eg 'is_foo' or 'has_foo' instead of 'no_foo').  We can  
> leave the older 'no_*' methods as is for the time being and maybe  
> deprecate them later.
>
> If no one has objections I'll add these in as needed.
>
> chris
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From florent.angly at gmail.com  Tue Jun  9 14:41:51 2009
From: florent.angly at gmail.com (Florent Angly)
Date: Tue, 09 Jun 2009 11:41:51 -0700
Subject: [Bioperl-l] use of no_* to mean 'number_of', negative booleans
In-Reply-To: <B23E5145-0DBB-4C3D-8536-063B3E9711DF@gmx.net>
References: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu>
	<B23E5145-0DBB-4C3D-8536-063B3E9711DF@gmx.net>
Message-ID: <4A2EACEF.3090809@gmail.com>

Agree! no_* is prone to misunderstandings.
Also, some BioPerl code uses nof_*, which I quite like.
Florent

Hilmar Lapp wrote:
> Great suggestions, I'm all for it.
>
>     -hilmar
>
> On Jun 9, 2009, at 12:08 PM, Chris Fields wrote:
>
>> All,
>>
>> I've noticed a few methods in bioperl with names like 'no_Foo' that 
>> mean 'number of Foo' (such as SimpleAlign's no_sequences).  The 
>> problem I foresee are possible ambiguities, particularly with 
>> negative boolean checks (eg 'no_Foo' could also mean 'this instance 
>> contains no Foo'), something that BioPerl also has with various 
>> settings.
>>
>> I suggest we alias these as num_* to disambiguate that.  There's no 
>> easy way to change already in-place flag setting w/o going through a 
>> deprecation cycle, but we can promote using positive booleans where 
>> possible (eg 'is_foo' or 'has_foo' instead of 'no_foo').  We can 
>> leave the older 'no_*' methods as is for the time being and maybe 
>> deprecate them later.
>>
>> If no one has objections I'll add these in as needed.
>>
>> chris
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From cjfields at illinois.edu  Tue Jun  9 14:55:48 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 9 Jun 2009 13:55:48 -0500
Subject: [Bioperl-l] use of no_* to mean 'number_of', negative booleans
In-Reply-To: <4A2EACEF.3090809@gmail.com>
References: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu>
	<B23E5145-0DBB-4C3D-8536-063B3E9711DF@gmx.net>
	<4A2EACEF.3090809@gmail.com>
Message-ID: <FF3C4540-ED73-4878-92B9-5D6A610D3C4E@illinois.edu>

We could probably alias nof_* with num_* just for consistency, but  
leave nof_* as is and not deprecate it (I don't think anyone would  
confuse nof* with no*).
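
As a rough illustration of the aliasing idea (a sketch only: My::ToyAlign is a hypothetical stand-in class, not Bio::SimpleAlign, and has_sequences is an invented example of a positive boolean), a glob assignment makes both names point at the same subroutine:

```perl
use strict;
use warnings;

# Hypothetical stand-in class, NOT the real Bio::SimpleAlign.
package My::ToyAlign;

sub new {
    my ($class, @seqs) = @_;
    return bless { seqs => [@seqs] }, $class;
}

# Older, ambiguous name: "no" here means "number of".
sub no_sequences {
    my ($self) = @_;
    return scalar @{ $self->{seqs} };
}

# Alias the clearer name to the same code ref via a glob assignment.
{
    no warnings 'once';    # the glob is mentioned only once at compile time
    *num_sequences = \&no_sequences;
}

# Positive boolean, in the spirit of 'has_foo' instead of 'no_foo'.
sub has_sequences {
    my ($self) = @_;
    return $self->num_sequences() > 0 ? 1 : 0;
}

package main;

my $aln = My::ToyAlign->new(qw(ACGT ACGA ACGG));
print "num_sequences: ", $aln->num_sequences(), "\n";
print "no_sequences:  ", $aln->no_sequences(),  "\n";
print "has_sequences: ", $aln->has_sequences(), "\n";
```

Old callers of no_sequences keep working unchanged, and a deprecation warning can be attached to the old name later.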

chris

On Jun 9, 2009, at 1:41 PM, Florent Angly wrote:

> Agree! no_* is prone to misunderstandings.
> Also, some BioPerl code uses nof_*, which I quite like.
> Florent
>
> Hilmar Lapp wrote:
>> Great suggestions, I'm all for it.
>>
>>    -hilmar
>>
>> On Jun 9, 2009, at 12:08 PM, Chris Fields wrote:
>>
>>> All,
>>>
>>> I've noticed a few methods in bioperl with names like 'no_Foo'  
>>> that mean 'number of Foo' (such as SimpleAlign's no_sequences).   
>>> The problem I foresee are possible ambiguities, particularly with  
>>> negative boolean checks (eg 'no_Foo' could also mean 'this  
>>> instance contains no Foo'), something that BioPerl also has with  
>>> various settings.
>>>
>>> I suggest we alias these as num_* to disambiguate that.  There's  
>>> no easy way to change already in-place flag setting w/o going  
>>> through a deprecation cycle, but we can promote using positive  
>>> booleans where possible (eg 'is_foo' or 'has_foo' instead of  
>>> 'no_foo').  We can leave the older 'no_*' methods as is for the  
>>> time being and maybe deprecate them later.
>>>
>>> If no one has objections I'll add these in as needed.
>>>
>>> chris
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From mauricio at open-bio.org  Tue Jun  9 15:33:18 2009
From: mauricio at open-bio.org (Mauricio Herrera Cuadra)
Date: Tue, 09 Jun 2009 14:33:18 -0500
Subject: [Bioperl-l] Project Help
In-Reply-To: <146497.36250.qm@web8407.mail.in.yahoo.com>
References: <146497.36250.qm@web8407.mail.in.yahoo.com>
Message-ID: <4A2EB8FE.4080402@open-bio.org>

Hi Chirag,

The OBF applied for the GSoC 2009 but unfortunately we were not 
accepted. However, other organizations/projects made their way into it 
and have been kind enough to adopt some of the ideas originally proposed 
under the OBF's initiative. I'm Cc'ing this to the BioPerl mailing list 
so the people involved with those projects can give you more details.

Regards,
Mauricio.


chirag matkar wrote:
> Hello,
> This is Chirag Matkar, wanting to know whether there are any GSoC 2009 projects underway in the Open Bioinformatics Foundation.
> Also, as I am myself a Perl developer, can I get some stipend or internship for building Perl modules?
> 
> Thanking You,
> Regards Chirag.
> 
> 
> 


From rmb32 at cornell.edu  Tue Jun  9 15:12:54 2009
From: rmb32 at cornell.edu (Robert Buels)
Date: Tue, 09 Jun 2009 12:12:54 -0700
Subject: [Bioperl-l] use of no_* to mean 'number_of', negative booleans
In-Reply-To: <FF3C4540-ED73-4878-92B9-5D6A610D3C4E@illinois.edu>
References: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu>	<B23E5145-0DBB-4C3D-8536-063B3E9711DF@gmx.net>	<4A2EACEF.3090809@gmail.com>
	<FF3C4540-ED73-4878-92B9-5D6A610D3C4E@illinois.edu>
Message-ID: <4A2EB436.8020506@cornell.edu>

Why not just add deprecation warnings now?  Or you could add deprecation 
warnings now that only print if $Bio::Root::Version::VERSION >= 
something.  Best to do it while one is thinking about it, I always say.
Cause I always forget to do it later.  ;-)

Rob

Chris Fields wrote:
> We could probably alias nof_* with num_* just for consistency, but leave 
> nof_* as is and not deprecate it (I don't think anyone would confuse 
> nof* with no*).
> 
> chris
> 
> On Jun 9, 2009, at 1:41 PM, Florent Angly wrote:
> 
>> Agree! no_* is prone to misunderstandings.
>> Also, some BioPerl code uses nof_*, which I quite like.
>> Florent
>>
>> Hilmar Lapp wrote:
>>> Great suggestions, I'm all for it.
>>>
>>>    -hilmar
>>>
>>> On Jun 9, 2009, at 12:08 PM, Chris Fields wrote:
>>>
>>>> All,
>>>>
>>>> I've noticed a few methods in bioperl with names like 'no_Foo' that 
>>>> mean 'number of Foo' (such as SimpleAlign's no_sequences).  The 
>>>> problem I foresee are possible ambiguities, particularly with 
>>>> negative boolean checks (eg 'no_Foo' could also mean 'this instance 
>>>> contains no Foo'), something that BioPerl also has with various 
>>>> settings.
>>>>
>>>> I suggest we alias these as num_* to disambiguate that.  There's no 
>>>> easy way to change already in-place flag setting w/o going through a 
>>>> deprecation cycle, but we can promote using positive booleans where 
>>>> possible (eg 'is_foo' or 'has_foo' instead of 'no_foo').  We can 
>>>> leave the older 'no_*' methods as is for the time being and maybe 
>>>> deprecate them later.
>>>>
>>>> If no one has objections I'll add these in as needed.
>>>>
>>>> chris
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 


-- 
Robert Buels
Bioinformatics Analyst, Sol Genomics Network
Boyce Thompson Institute for Plant Research
Tower Rd
Ithaca, NY  14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu


From cjfields at illinois.edu  Tue Jun  9 16:19:03 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 9 Jun 2009 15:19:03 -0500
Subject: [Bioperl-l] use of no_* to mean 'number_of', negative booleans
In-Reply-To: <4A2EB436.8020506@cornell.edu>
References: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu>	<B23E5145-0DBB-4C3D-8536-063B3E9711DF@gmx.net>	<4A2EACEF.3090809@gmail.com>
	<FF3C4540-ED73-4878-92B9-5D6A610D3C4E@illinois.edu>
	<4A2EB436.8020506@cornell.edu>
Message-ID: <EFB0C00A-49CA-49BD-A22D-4B23D0C09E7B@illinois.edu>

On Jun 9, 2009, at 2:12 PM, Robert Buels wrote:

> Why not just add deprecation warnings now?  Or you could add  
> deprecation warnings now that only print if  
> $Bio::Root::Version::VERSION >= something.  Best to do it while one  
> is thinking about it, I always say.  Cause I always forget to do it  
> later.  ;-)
>
> Rob

Actually, that's one thing I want to implement within Root, namely the  
ability to do this:

$self->deprecated(-message     => 'method Foo is deprecated',
                   -start_ver   => $version1,
                   -throw_ver   => $version2
);

So it's essentially a no-op and invisible until start_ver (at which  
point it starts warning), then throws from throw_ver onward.  I could  
probably finagle that in w/o destroying things...
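
A rough sketch of those semantics, to make the three phases concrete. This is not the code that went into Bio::Root: the My::Root class, the version numbers, and the use of plain warn/die are all assumptions made for illustration, and only the argument names come from the call above.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a root class; NOT Bio::Root::Root.
package My::Root;

our $VERSION = 1.006;    # assumed "current" library version

sub new {
    my ($class) = @_;
    return bless {}, $class;
}

# Silent below -start_ver, warns from -start_ver on, dies from -throw_ver on.
# Assumption: with no -start_ver given, the warning is always shown.
sub deprecated {
    my ($self, %args) = @_;
    my $msg   = $args{-message} || 'deprecated method called';
    my $start = $args{-start_ver};
    my $throw = $args{-throw_ver};

    die "$msg\n" if defined $throw && $VERSION >= $throw;
    warn "$msg\n" if !defined $start || $VERSION >= $start;
    return;
}

package main;

# Report what deprecated() does when the library is at a given version.
sub outcome {
    my ($current) = @_;
    local $My::Root::VERSION = $current;
    my $warned = 0;
    local $SIG{__WARN__} = sub { $warned = 1 };
    my $ok = eval {
        My::Root->new->deprecated(
            -message   => 'method Foo is deprecated',
            -start_ver => 1.007,
            -throw_ver => 1.009,
        );
        1;
    };
    return 'throw' unless $ok;
    return $warned ? 'warn' : 'silent';
}

print "$_: ", outcome($_), "\n" for (1.006, 1.007, 1.008, 1.009);
```

Because the check reads the version at call time, the same call can sit in trunk and in a release branch and behave differently in each.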

chris

> Chris Fields wrote:
>> We could probably alias nof_* with num_* just for consistency, but  
>> leave nof_* as is and not deprecate it (I don't think anyone would  
>> confuse nof* with no*).
>> chris
>> On Jun 9, 2009, at 1:41 PM, Florent Angly wrote:
>>> Agree! no_* is prone to misunderstandings.
>>> Also, some BioPerl code uses nof_*, which I quite like.
>>> Florent
>>>
>>> Hilmar Lapp wrote:
>>>> Great suggestions, I'm all for it.
>>>>
>>>>   -hilmar
>>>>
>>>> On Jun 9, 2009, at 12:08 PM, Chris Fields wrote:
>>>>
>>>>> All,
>>>>>
>>>>> I've noticed a few methods in bioperl with names like 'no_Foo'  
>>>>> that mean 'number of Foo' (such as SimpleAlign's no_sequences).   
>>>>> The problem I foresee are possible ambiguities, particularly  
>>>>> with negative boolean checks (eg 'no_Foo' could also mean 'this  
>>>>> instance contains no Foo'), something that BioPerl also has with  
>>>>> various settings.
>>>>>
>>>>> I suggest we alias these as num_* to disambiguate that.  There's  
>>>>> no easy way to change already in-place flag setting w/o going  
>>>>> through a deprecation cycle, but we can promote using positive  
>>>>> booleans where possible (eg 'is_foo' or 'has_foo' instead of  
>>>>> 'no_foo').  We can leave the older 'no_*' methods as is for the  
>>>>> time being and maybe deprecate them later.
>>>>>
>>>>> If no one has objections I'll add these in as needed.
>>>>>
>>>>> chris
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
> -- 
> Robert Buels
> Bioinformatics Analyst, Sol Genomics Network
> Boyce Thompson Institute for Plant Research
> Tower Rd
> Ithaca, NY  14853
> Tel: 503-889-8539
> rmb32 at cornell.edu
> http://www.sgn.cornell.edu


From cjfields at illinois.edu  Tue Jun  9 16:45:37 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 9 Jun 2009 15:45:37 -0500
Subject: [Bioperl-l] deprecated(), was Re:  use of no_* to mean 'number_of',
	negative booleans
In-Reply-To: <EFB0C00A-49CA-49BD-A22D-4B23D0C09E7B@illinois.edu>
References: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu>	<B23E5145-0DBB-4C3D-8536-063B3E9711DF@gmx.net>	<4A2EACEF.3090809@gmail.com>
	<FF3C4540-ED73-4878-92B9-5D6A610D3C4E@illinois.edu>
	<4A2EB436.8020506@cornell.edu>
	<EFB0C00A-49CA-49BD-A22D-4B23D0C09E7B@illinois.edu>
Message-ID: <E2D3CA55-A48A-424D-9AC3-24F7C44A9548@illinois.edu>

On Jun 9, 2009, at 3:19 PM, Chris Fields wrote:

> On Jun 9, 2009, at 2:12 PM, Robert Buels wrote:
>
>> Why not just add deprecation warnings now?  Or you could add  
>> deprecation warnings now that only print if  
>> $Bio::Root::Version::VERSION >= something.  Best to do it while one  
>> is thinking about it, I always say.  Cause I always forget to do it  
>> later.  ;-)
>>
>> Rob
>
> Actually, that's one thing I want to implement within Root, namely  
> the ability to do this:
>
> $self->deprecated(-message     => 'method Foo is deprecated',
>                  -start_ver   => $version1,
>                  -throw_ver   => $version2
> );
>
> So it's essentially a noop and invisible up to start_ver (upon where  
> it warns), then throws after, well, throw_ver.  I could probably  
> finagle that in w/o destroying things...
>
> chris

Just to note, this is mainly to allow us devs the opportunity to add  
these to main trunk w/o having to worry about merges over to the 1.6  
branch (where the version is different).  We don't want the dep  
warnings showing up there right away, but maybe in a point release or  
minor version.

chris


From hlapp at gmx.net  Tue Jun  9 19:09:26 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 9 Jun 2009 19:09:26 -0400
Subject: [Bioperl-l] Project Help
In-Reply-To: <4A2EB8FE.4080402@open-bio.org>
References: <146497.36250.qm@web8407.mail.in.yahoo.com>
	<4A2EB8FE.4080402@open-bio.org>
Message-ID: <74C0D011-A5A4-4DF1-93D8-13401A18E29A@gmx.net>

Hi Chirag,

check out the Bio{Perl,Python,Ruby}-related projects (go to 'Accepted  
Projects') at

http://hackathon.nescent.org/Phyloinformatics_Summer_of_Code_2009

	-hilmar

On Jun 9, 2009, at 3:33 PM, Mauricio Herrera Cuadra wrote:

> Hi Chirag,
>
> The OBF applied for the GSoC 2009 but unfortunately we were not  
> accepted. However, other organizations/projects made their way into  
> it and have been kind enough to adopt some of the ideas originally  
> proposed under the OBF's initiative. I'm Cc'ing this to the BioPerl  
> mailing list so the people involved with those projects can give you  
> more details.
>
> Regards,
> Mauricio.
>
>
> chirag matkar wrote:
>> Hello,
>> This is Chirag Matkar, wanting to know whether there are any GSoC
>> 2009 projects underway in the Open Bioinformatics Foundation.
>> Also, as I am myself a Perl developer, can I get some stipend or
>> internship for building Perl modules?
>> Thanking You,
>> Regards Chirag.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From rmb32 at cornell.edu  Tue Jun  9 21:13:36 2009
From: rmb32 at cornell.edu (Robert Buels)
Date: Tue, 09 Jun 2009 18:13:36 -0700
Subject: [Bioperl-l] deprecated(),
 was Re:  use of no_* to mean 'number_of', negative booleans
In-Reply-To: <E2D3CA55-A48A-424D-9AC3-24F7C44A9548@illinois.edu>
References: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu>	<B23E5145-0DBB-4C3D-8536-063B3E9711DF@gmx.net>	<4A2EACEF.3090809@gmail.com>	<FF3C4540-ED73-4878-92B9-5D6A610D3C4E@illinois.edu>	<4A2EB436.8020506@cornell.edu>	<EFB0C00A-49CA-49BD-A22D-4B23D0C09E7B@illinois.edu>
	<E2D3CA55-A48A-424D-9AC3-24F7C44A9548@illinois.edu>
Message-ID: <4A2F08C0.3010609@cornell.edu>

Chris Fields wrote:
>> Actually, that's one thing I want to implement within Root, namely the 
>> ability to do this:
>>
>> $self->deprecated(-message     => 'method Foo is deprecated',
>>                  -start_ver   => $version1,
>>                  -throw_ver   => $version2
>> );

Here's a patch with tests against the svn trunk head.  Is this what you 
had in mind?

-- 
Rob
-------------- next part --------------
A non-text attachment was scrubbed...
Name: deprecated.patch
Type: text/x-diff
Size: 5601 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20090609/431738da/attachment-0003.bin>

From cjfields at illinois.edu  Tue Jun  9 22:54:47 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 9 Jun 2009 21:54:47 -0500
Subject: [Bioperl-l] deprecated(),
	was Re:  use of no_* to mean 'number_of', negative booleans
In-Reply-To: <4A2F08C0.3010609@cornell.edu>
References: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu>	<B23E5145-0DBB-4C3D-8536-063B3E9711DF@gmx.net>	<4A2EACEF.3090809@gmail.com>	<FF3C4540-ED73-4878-92B9-5D6A610D3C4E@illinois.edu>	<4A2EB436.8020506@cornell.edu>	<EFB0C00A-49CA-49BD-A22D-4B23D0C09E7B@illinois.edu>
	<E2D3CA55-A48A-424D-9AC3-24F7C44A9548@illinois.edu>
	<4A2F08C0.3010609@cornell.edu>
Message-ID: <20652B6B-1BF3-477C-9619-4149748E5B9B@illinois.edu>

On Jun 9, 2009, at 8:13 PM, Robert Buels wrote:

> Chris Fields wrote:
>>> Actually, that's one thing I want to implement within Root, namely  
>>> the ability to do this:
>>>
>>> $self->deprecated(-message     => 'method Foo is deprecated',
>>>                 -start_ver   => $version1,
>>>                 -throw_ver   => $version2
>>> );
>
> Here's a patch with tests against the svn trunk head.  Is this what  
> you had in mind?
>
> -- 
> Rob

Funny, I had written up almost exactly the same code, just a little  
rearranged.  I've modified mine to follow your use of -warn_version (I  
also had -throw_version as a synonym of -version, JIC).  Also, for the  
tests I created a temp class in the tests and ran tests off that.   
Thanks for the patch!

chris


From maj at fortinbras.us  Wed Jun 10 00:10:12 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 10 Jun 2009 00:10:12 -0400
Subject: [Bioperl-l] announcing bioperl-max, a public AMI
Message-ID: <57C9F3FC7A4A42189608BD882C03DB9C@NewLife>

Hi All, 

I've built a public Amazon machine image, loaded with many many 
goodies, including the most recent (r15747) trunks of 
- bioperl-live
- bioperl-run
- bioperl-db/biosql
The base machine is a public clean install of Ubuntu 8.04 Hardy/32-bit
by Alestic. Many fave tools, including blast, hmmer, hyphy, phyml, 
emboss, and more are all there (and most even pass bioperl-run tests), and 
perl is loaded with Moose, XML::LibXML, XML::Compile, Bio::Phylo
(r1071) and others. This is *not* a lean mean fighting machine. 

Please give it a try if you're so inclined. Fuller details (including 
image id and ssh-rsa signature) are at http://fortinbras.us/bioperl-max.

Ping me if it doesn't work.

Cheers, 
Mark


From cjfields at illinois.edu  Wed Jun 10 00:36:40 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 9 Jun 2009 23:36:40 -0500
Subject: [Bioperl-l] announcing bioperl-max, a public AMI
In-Reply-To: <57C9F3FC7A4A42189608BD882C03DB9C@NewLife>
References: <57C9F3FC7A4A42189608BD882C03DB9C@NewLife>
Message-ID: <3F9CAC33-EE29-4843-8E21-532DC7569A7D@illinois.edu>

I'll be trying that out, particularly re: bioperl-run. For bioperl-db  
do you have mysql or pg?

Heh, I see Moose is installed.  Just need svn'd parrot and git updated  
rakudo and we could do some damage...

chris

On Jun 9, 2009, at 11:10 PM, Mark A. Jensen wrote:

> Hi All,
>
> I've built a public Amazon machine image, loaded with many many
> goodies, including the most recent (r15747) trunks of
> - bioperl-live
> - bioperl-run
> - bioperl-db/biosql
> The base machine is a public clean install of Ubuntu 8.04 Hardy/32-bit
> by Alestic. Many fave tools, including blast, hmmer, hyphy, phyml,
> emboss, and more are all there (and most even pass bioperl-run tests), and
> perl is loaded with Moose, XML::LibXML, XML::Compile, Bio::Phylo
> (r1071) and others. This is *not* a lean mean fighting machine.
>
> Please give it a try if you're so inclined. Fuller details (including
> image id and ssh-rsa signature) are at http://fortinbras.us/bioperl-max.
>
> Ping me if it doesn't work.
>
> Cheers,
> Mark
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From maj at fortinbras.us  Wed Jun 10 00:39:36 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 10 Jun 2009 00:39:36 -0400
Subject: [Bioperl-l] announcing bioperl-max, a public AMI
In-Reply-To: <3F9CAC33-EE29-4843-8E21-532DC7569A7D@illinois.edu>
References: <57C9F3FC7A4A42189608BD882C03DB9C@NewLife>
	<3F9CAC33-EE29-4843-8E21-532DC7569A7D@illinois.edu>
Message-ID: <6A7D85B8037848F090C35A639C84D870@NewLife>

----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "Mark A. Jensen" <maj at fortinbras.us>
Cc: "BioPerl List" <bioperl-l at bioperl.org>
Sent: Wednesday, June 10, 2009 12:36 AM
Subject: Re: [Bioperl-l] announcing bioperl-max, a public AMI


> I'll be trying that out, particularly re: bioperl-run. For bioperl-db  
> do you have mysql or pg?

-both (I'm all about options...)


> 
> Heh, I see Moose is installed.  Just need svn'd parrot and git updated  
> rakudo and we could do some damage...
> 

bioperl-max-0.1.1, here we come...


> chris
> 

cheers MAJ

> On Jun 9, 2009, at 11:10 PM, Mark A. Jensen wrote:
> 
>> Hi All,
>>
>> I've built a public Amazon machine image, loaded with many many
>> goodies, including the most recent (r15747) trunks of
>> - bioperl-live
>> - bioperl-run
>> - bioperl-db/biosql
>> The base machine is a public clean install of Ubuntu 8.04 Hardy/32-bit
>> by Alestic. Many fave tools, including blast, hmmer, hyphy, phyml,
>> emboss, and more are all there (and most even pass bioperl-run tests), and
>> perl is loaded with Moose, XML::LibXML, XML::Compile, Bio::Phylo
>> (r1071) and others. This is *not* a lean mean fighting machine.
>>
>> Please give it a try if you're so inclined. Fuller details (including
>> image id and ssh-rsa signature) are at http://fortinbras.us/bioperl-max.
>>
>> Ping me if it doesn't work.
>>
>> Cheers,
>> Mark
>>
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> 
>


From bernd.jagla at pasteur.fr  Wed Jun 10 03:43:47 2009
From: bernd.jagla at pasteur.fr (Bernd Jagla)
Date: Wed, 10 Jun 2009 09:43:47 +0200
Subject: [Bioperl-l] Bio:Das 1.11 installation problem
In-Reply-To: <1A4207F8295607498283FE9E93B775B40604FBFB@EX02.asurite.ad.asu.edu>
References: <7A76102A29824B1C9DDB7F3B1DC45D41@zillumina><6dce9a0b0906081000j7c489fd3t3c5974794ad3e3e7@mail.gmail.com><47482.157.99.64.103.1244493117.squirrel@php.pasteur.fr><4A2D81F9.8060509@bms.com><19FC487A25B6478FA4DE91B81A1FC52C@zillumina>
	<1A4207F8295607498283FE9E93B775B40604FBFB@EX02.asurite.ad.asu.edu>
Message-ID: <7F2215CBC16B48BE8C548BB69E131890@zillumina>

I wrote a small test program to test the environment variables and I have
them:

          'SSH_CLIENT' => '157.
          'FTP_PROXY' => 'http://
          'HTTP_PROXY' => 'http://cache.past
          'SSH_TTY' => '/dev/ttys002',
          'ftp_proxy' => 'http://
          'http_proxy' => 'http://

Using the "-proxy" option works; without it, it doesn't.

(and yes, I export the variables..)
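
A check of this kind can be sketched in a few lines. This is not the test program mentioned above; proxy_env is a hypothetical helper name, and matching on /proxy/i is an assumption about which variables matter.

```perl
use strict;
use warnings;

# Hypothetical helper: return the proxy-related entries of an
# environment hash (defaults to the real %ENV).
sub proxy_env {
    my ($env) = @_;
    $env ||= \%ENV;
    return map { ($_ => $env->{$_}) } grep { /proxy/i } sort keys %$env;
}

# Print whatever proxy variables the current process actually sees.
my %found = proxy_env();
printf "%-12s => %s\n", $_, $found{$_} for sort keys %found;
```

Running it from the same shell (or the same web server environment) as the failing code shows whether the variables really reach the Perl process.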

Thanks for any suggestions.

Bernd

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Kevin Brown
Sent: Tuesday, June 09, 2009 5:26 PM
Cc: bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] Bio:Das 1.11 installation problem

Dumb question, but are you exporting the variables after you set them?

FTP_PROXY=http://...
HTTP_PROXY=http://...
export FTP_PROXY HTTP_PROXY 

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Bernd Jagla
> Sent: Tuesday, June 09, 2009 12:06 AM
> To: 'Stefan Kirov'; bernd at pasteur.fr
> Cc: 'Lincoln Stein'; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bio:Das 1.11 installation problem
> 
> Great, that works!!!
> But since I am using Bio::Das within GBrowse I can't/don't want to change
> those sources. I tried setting some environment variables but that doesn't
> seem to work either...
> So far I have set the following:
> FTP_PROXY=http://...
> HTTP_PROXY=http://...
> PROXYFTP=http://...
> PROXYHTTP=http://...
> ftp_proxy=http://...
> http_proxy=http://...
> PROXY=http://...
> 
> Any suggestions are welcome.
> 
> Thanks,
> 
> Bernd
> 
> 
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Stefan Kirov
> Sent: Monday, June 08, 2009 11:26 PM
> To: bernd at pasteur.fr
> Cc: Lincoln Stein; Bernd Jagla; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bio:Das 1.11 installation problem
> 
> bernd at pasteur.fr wrote:
> Try to add this line
> -proxy => 'http:<YOUR PROXY HERE>',
> in t/01das.t where the Bio::Das object is created (I think line 41).
> Hope this works for you, it did for me.
> Stefan
> > I tested the connection with wget and everything works fine.
> > I suspect that our proxy might be the problem but all variables are set
> > correctly (ftp_proxy, http_proxy and many more) I am not sure which
> > environment variable are being used...
> > I am not too familiar with all this and don't know where to look for the
> > right configurations.
> >
> > Thanks,
> >
> > Bernd
> >
> >   
> >> Hi,
> >>
> >> The regression tests require an active Internet connection, as well as the
> >> DAS test server being up and running. It may be there was a temporary
> >> failure of one of those two. I just tested on my end and the regression
> >> tests ran ok, so could you try it again?
> >>
> >> Lincoln
> >>
> >> On Mon, Jun 8, 2009 at 12:24 PM, Bernd Jagla <bernd.jagla at pasteur.fr> wrote:
> >>
> >>     
> >>> Hi,
> >>>
> >>>
> >>>
> >>> I am working on Mac OS X 10.5.7 and trying to install Bio::Das using
> >>> perl -MCPAN -e 'install Bio::Das'
> >>> This is perl, v5.8.9 built for darwin-2level
> >>> (please let me know if you need anything else)
> >>>
> >>>
> >>>
> >>> I am trying to install Bio::Das 1.11
> >>>
> >>>
> >>>
> >>> I get the following error:
> >>>
> >>>
> >>>
> >>> not ok 3
> >>>
> >>> not ok 4
> >>>
> >>> Can't call method "description" on an undefined value at t/01das.t line 62.
> >>>
> >>>
> >>>
> >>> When going into the sources for 01das.t and printing out $db I get:
> >>>
> >>>
> >>>
> >>> $VAR1 = \bless( {
> >>>                   'autotypes' => undef,
> >>>                   'default_dsn' => undef,
> >>>                   'autocategories' => undef,
> >>>                   'sockets' => {},
> >>>                   'aggregators' => [
> >>>                                      bless( {
> >>>                                               'sub_parts' => [
> >>>                                                                'coding_exon'
> >>>                                                              ],
> >>>                                               'require_whole_object' => undef,
> >>>                                               'main_method' => 'CDS',
> >>>                                               'method' => 'alignment'
> >>>                                             }, 'Bio::DB::GFF::Aggregator' ),
> >>>                                      bless( {
> >>>                                               'sub_parts' => [
> >>>                                                                'EST_match'
> >>>                                                              ],
> >>>                                               'require_whole_object' => undef,
> >>>                                               'main_method' => 'alignment',
> >>>                                               'method' => 'alignment'
> >>>                                             }, 'Bio::DB::GFF::Aggregator' )
> >>>                                    ],
> >>>                   'timeout' => undef,
> >>>                   'oldstyle_api' => 1,
> >>>                   'default_server' => 'http://www.wormbase.org/db/seq/das'
> >>>                 }, 'Bio::Das' );
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> @sources is empty
> >>>
> >>> And test(3, at sources) fails.
> >>>
> >>>
> >>>
> >>> Please advise.
> >>>
> >>>
> >>>
> >>> Thanks,
> >>>
> >>>
> >>>
> >>> Bernd
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> _______________________________________________
> >>> Bioperl-l mailing list
> >>> Bioperl-l at lists.open-bio.org
> >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>
> >>>       
> >>
> >> --
> >> Lincoln D. Stein
> >> Director, Informatics and Biocomputing Platform
> >> Ontario Institute for Cancer Research
> >> 101 College St., Suite 800
> >> Toronto, ON, Canada M5G0A3
> >> 416 673-8514
> >> Assistant: Renata Musa <Renata.Musa at oicr.on.ca>
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>
> >>     
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> >   
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l


From bernd.jagla at pasteur.fr  Wed Jun 10 04:16:08 2009
From: bernd.jagla at pasteur.fr (Bernd Jagla)
Date: Wed, 10 Jun 2009 10:16:08 +0200
Subject: [Bioperl-l] Bio:Das 1.11 installation problem
In-Reply-To: <1A4207F8295607498283FE9E93B775B40604FBFB@EX02.asurite.ad.asu.edu>
References: <7A76102A29824B1C9DDB7F3B1DC45D41@zillumina><6dce9a0b0906081000j7c489fd3t3c5974794ad3e3e7@mail.gmail.com><47482.157.99.64.103.1244493117.squirrel@php.pasteur.fr><4A2D81F9.8060509@bms.com><19FC487A25B6478FA4DE91B81A1FC52C@zillumina>
	<1A4207F8295607498283FE9E93B775B40604FBFB@EX02.asurite.ad.asu.edu>
Message-ID: <F5844533CFCB425DA400C888A9995F70@zillumina>

To whom it may concern:

In Das.pm, around line 72, I added
  $self->proxy($ENV{'HTTP_PROXY'}) if $ENV{'HTTP_PROXY'};

just before:
  $self->proxy($proxy) if $proxy;

This did the trick.

For completeness I also edited Fetch.pm, around line 134, adding
  $proxy = $ENV{'HTTP_PROXY'} if $ENV{'HTTP_PROXY'};

just before:
  my $dest = $proxy || $request->url;
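
The fallback idea can be sketched on its own, outside Bio::Das. pick_proxy is a hypothetical helper, the example proxy URL is made up, and also checking the lowercase http_proxy is an assumption that goes beyond the edits above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::Das): an explicit proxy wins,
# otherwise fall back to HTTP_PROXY, then http_proxy, from the environment.
sub pick_proxy {
    my ($explicit, $env) = @_;
    $env ||= \%ENV;
    for my $candidate ($explicit, $env->{HTTP_PROXY}, $env->{http_proxy}) {
        return $candidate if defined $candidate && length $candidate;
    }
    return;    # no proxy configured
}

# Example with a made-up proxy URL passed in as the "environment".
my $proxy = pick_proxy(undef, { HTTP_PROXY => 'http://proxy.example.org:3128' });
print "using proxy: ", (defined $proxy ? $proxy : '(none)'), "\n";
```

In this sketch an explicit value takes precedence over the environment, which matches the ordering of the Das.pm change; whether the Fetch.pm edit should follow the same precedence is a judgment call.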

Best,

Bernd

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Kevin Brown
Sent: Tuesday, June 09, 2009 5:26 PM
Cc: bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] Bio:Das 1.11 installation problem

Dumb question, but are you exporting the variables after you set them?

FTP_PROXY=http://...
HTTP_PROXY=http://...
export FTP_PROXY HTTP_PROXY 

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Bernd Jagla
> Sent: Tuesday, June 09, 2009 12:06 AM
> To: 'Stefan Kirov'; bernd at pasteur.fr
> Cc: 'Lincoln Stein'; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bio:Das 1.11 installation problem
> 
> Great, that works!!!
> But since I am using Bio::Das within GBrowse I can't/don't want to change
> those sources. I tried setting some environment variables but that doesn't
> seem to work either...
> So far I have set the following:
> FTP_PROXY=http://...
> HTTP_PROXY=http://...
> PROXYFTP=http://...
> PROXYHTTP=http://...
> ftp_proxy=http://...
> http_proxy=http://...
> PROXY=http://...
> 
> Any suggestions are welcome.
> 
> Thanks,
> 
> Bernd
> 
> 
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Stefan Kirov
> Sent: Monday, June 08, 2009 11:26 PM
> To: bernd at pasteur.fr
> Cc: Lincoln Stein; Bernd Jagla; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bio:Das 1.11 installation problem
> 
> bernd at pasteur.fr wrote:
> Try to add this line
> -proxy => 'http:<YOUR PROXY HERE>',
> in t/01das.t where the Bio::Das object is created (I think line 41).
> Hope this works for you, it did for me.
> Stefan
> > I tested the connection with wget and everything works fine.
> > I suspect that our proxy might be the problem, but all variables are set
> > correctly (ftp_proxy, http_proxy and many more). I am not sure which
> > environment variables are being used...
> > I am not too familiar with all this and don't know where to look for the
> > right configurations.
> >
> > Thanks,
> >
> > Bernd
> >
> >   
> >> Hi,
> >>
> >> The regression tests require an active Internet connection, as well as the
> >> DAS test server being up and running. It may be there was a temporary
> >> failure of one of those two. I just tested on my end and the regression
> >> tests ran ok, so could you try it again?
> >>
> >> Lincoln
> >>
> >> On Mon, Jun 8, 2009 at 12:24 PM, Bernd Jagla <bernd.jagla at pasteur.fr>
> >> wrote:
> >>
> >>     
> >>> Hi,
> >>>
> >>> I am working on a Mac (OS X 10.5.7) and trying to install Bio::Das using
> >>> perl -MCPAN -e 'install Bio::Das'
> >>> This is perl, v5.8.9 built for darwin-2level
> >>> (please let me know if you need anything else)
> >>>
> >>> I am trying to install Bio::Das 1.11
> >>>
> >>> I get the following error:
> >>>
> >>> not ok 3
> >>> not ok 4
> >>> Can't call method "description" on an undefined value at t/01das.t line 62.
> >>>
> >>> When going into the sources for 01das.t and printing out $db I get:
> >>>
> >>> $VAR1 = \bless( {
> >>>     'autotypes' => undef,
> >>>     'default_dsn' => undef,
> >>>     'autocategories' => undef,
> >>>     'sockets' => {},
> >>>     'aggregators' => [
> >>>         bless( {
> >>>             'sub_parts' => [
> >>>                 'coding_exon'
> >>>             ],
> >>>             'require_whole_object' => undef,
> >>>             'main_method' => 'CDS',
> >>>             'method' => 'alignment'
> >>>         }, 'Bio::DB::GFF::Aggregator' ),
> >>>         bless( {
> >>>             'sub_parts' => [
> >>>                 'EST_match'
> >>>             ],
> >>>             'require_whole_object' => undef,
> >>>             'main_method' => 'alignment',
> >>>             'method' => 'alignment'
> >>>         }, 'Bio::DB::GFF::Aggregator' )
> >>>     ],
> >>>     'timeout' => undef,
> >>>     'oldstyle_api' => 1,
> >>>     'default_server' => 'http://www.wormbase.org/db/seq/das'
> >>> }, 'Bio::Das' );
> >>>
> >>> @sources is empty
> >>> And test(3,@sources) fails.
> >>>
> >>> Please advise.
> >>>
> >>> Thanks,
> >>>
> >>> Bernd
> >>>
> >>> _______________________________________________
> >>> Bioperl-l mailing list
> >>> Bioperl-l at lists.open-bio.org
> >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>
> >>>       
> >>
> >> --
> >> Lincoln D. Stein
> >> Director, Informatics and Biocomputing Platform
> >> Ontario Institute for Cancer Research
> >> 101 College St., Suite 800
> >> Toronto, ON, Canada M5G0A3
> >> 416 673-8514
> >> Assistant: Renata Musa <Renata.Musa at oicr.on.ca>


From ron at ron.dk  Wed Jun 10 03:35:09 2009
From: ron at ron.dk (Rasmus Ory Nielsen)
Date: Wed, 10 Jun 2009 09:35:09 +0200
Subject: [Bioperl-l] Bio::Restriction::Analysis. Exception when using rebase
	file.
Message-ID: <4A2F622D.5060500@ron.dk>

Hi,

This is my first time using bioperl for restriction analysis, so please bear 
with me if this is a FAQ.

I downloaded withrefm.906 from ftp://ftp.neb.com/pub/rebase/ and created the 
script shown at the bottom of the mail.
My bioperl version is the bioperl-live nightly from 09-Jun-2009.

The script throws an exception (see below). But if I comment out the 
'-enzymes' argument, so it uses the built-in collection of enzymes, it works.

My problem is that I need to use some of the enzymes that are only available 
in rebase. So how do I get this working?

Thanks for your attention.

Best regards,
Rasmus Ory Nielsen


############################################################
Output from the script:
############################################################

[roni at ksdhcp ~]$ ./restriction_test.pl

--------------------- WARNING ---------------------
MSG: The enzyme name CviKI-1 was changed to CviKI-I
---------------------------------------------------

------------- EXCEPTION -------------
MSG: Bad end parameter (11). End must be less than the total length of 
sequence (total=7)
STACK Bio::PrimarySeq::subseq 
/usr/local/lib/perl5/site_perl/5.10.0/Bio/PrimarySeq.pm:401
STACK Bio::Restriction::Analysis::_enzyme_sites 
/usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:900
STACK Bio::Restriction::Analysis::_cuts 
/usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:801
STACK Bio::Restriction::Analysis::cut 
/usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:379
STACK Bio::Restriction::Analysis::fragment_maps 
/usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:515
STACK toplevel ./restriction_test.pl:30
-------------------------------------

[roni at ksdhcp ~]$


############################################################
Output from the script with the '-enzymes' argument commented out
############################################################

[roni at ksdhcp ~]$ ./restriction_test.pl

--------------------- WARNING ---------------------
MSG: The enzyme name CviKI-1 was changed to CviKI-I
---------------------------------------------------
$VAR1 = [
           {
             'seq' => 'CTCGACCGTTAGCAA',
             'end' => 15,
             'start' => '1'
           },
           {
             'seq' => 'AGCTTTCTACCGTTATCGT',
             'end' => 34,
             'start' => '16'
           }
         ];
[roni at ksdhcp ~]$

############################################################

#!/usr/bin/perl
use strict;
use warnings;
use Bio::PrimarySeq;
use Bio::Restriction::IO;
use Bio::Restriction::Analysis;
use Data::Dumper;

# create seq obj
my $seqobj = Bio::PrimarySeq->new(
     -seq        => 'CTCGACCGTTAGCAAAGCTTTCTACCGTTATCGT',
     -primary_id => 'test',
     -molecule   => 'dna'
);

# read rebase file
my $rebase_io = Bio::Restriction::IO->new(
     -file   => 'withrefm.906',
     -format => 'withrefm',
);
my $rebase_collection = $rebase_io->read;

# start restriction analysis
my $restriction_analysis = Bio::Restriction::Analysis->new(
     -seq     => $seqobj,
     -enzymes => $rebase_collection,    # it works with this line commented out
);

# retrieve fragment maps
my @fragment_maps = $restriction_analysis->fragment_maps('HindIII');
print Dumper \@fragment_maps;


From awitney at sgul.ac.uk  Wed Jun 10 07:19:55 2009
From: awitney at sgul.ac.uk (Adam Witney)
Date: Wed, 10 Jun 2009 12:19:55 +0100
Subject: [Bioperl-l] EUtilities Cookbook example fails
Message-ID: <40B53E84-2EBC-471A-9261-CED1973C7A0C@sgul.ac.uk>

Hi,

I am going through the EUtilities Cookbook, but the last example (in  
section 2.3.1) fails with:

Can't use an undefined value as an ARRAY reference at
/usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities/Link/LinkSet.pm line 470.

This is with BioPerl 1.6.0, perl v5.8.8

thanks for any help

adam


From hlapp at gmx.net  Wed Jun 10 08:08:54 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 10 Jun 2009 08:08:54 -0400
Subject: [Bioperl-l] announcing bioperl-max, a public AMI
In-Reply-To: <57C9F3FC7A4A42189608BD882C03DB9C@NewLife>
References: <57C9F3FC7A4A42189608BD882C03DB9C@NewLife>
Message-ID: <4B3BCEA2-DA96-46B5-9BA2-F4EDDACC3A96@gmx.net>

Very cool! -hilmar

On Jun 10, 2009, at 12:10 AM, Mark A. Jensen wrote:

> Hi All,
>
> I've built a public Amazon machine image, loaded with many many
> goodies, including the most recent (r15747) trunks of
> - bioperl-live
> - bioperl-run
> - bioperl-db/biosql
> The base machine is a public clean install of Ubuntu 8.04 Hardy/32-bit
> by Alestic. Many fave tools, including blast, hmmer, hyphy, phyml,
> emboss, and more are all there (and most even pass bioperl-run tests), and
> perl is loaded with Moose, XML::LibXML, XML::Compile, Bio::Phylo
> (r1071) and others. This is *not* a lean mean fighting machine.
>
> Please give it a try if you're so inclined. Fuller details (including
> image id and ssh-rsa signature) are at
> http://fortinbras.us/bioperl-max
>
> Ping me if it doesn't work.
>
> Cheers,
> Mark

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at illinois.edu  Wed Jun 10 08:28:44 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 10 Jun 2009 07:28:44 -0500
Subject: [Bioperl-l] EUtilities Cookbook example fails
In-Reply-To: <40B53E84-2EBC-471A-9261-CED1973C7A0C@sgul.ac.uk>
References: <40B53E84-2EBC-471A-9261-CED1973C7A0C@sgul.ac.uk>
Message-ID: <1AB3CE57-4114-4440-870F-C15B39F42D77@illinois.edu>

I can reproduce that; I'll look into it.

chris

On Jun 10, 2009, at 6:19 AM, Adam Witney wrote:

> Hi,
>
> I am going through the EUtilities Cookbook, but the last example (in  
> section 2.3.1) fails with:
>
> Can't use an undefined value as an ARRAY reference at /usr/lib/perl5/ 
> site_perl/5.8.8/Bio/Tools/EUtilities/Link/LinkSet.pm line 470.
>
> This is with BioPerl 1.6.0, perl v5.8.8
>
> thanks for any help
>
> adam


From cjfields at illinois.edu  Wed Jun 10 09:20:43 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 10 Jun 2009 08:20:43 -0500
Subject: [Bioperl-l] fetch gene sequence with EUtilities.pm
In-Reply-To: <78771A58-8FCC-46ED-BE4F-8B0915BD324D@sgul.ac.uk>
References: <78771A58-8FCC-46ED-BE4F-8B0915BD324D@sgul.ac.uk>
Message-ID: <9B52E71A-3183-412A-81E7-803C28B8082E@illinois.edu>

EntrezGene doesn't contain the sequence information; I believe it just  
links to the sequence in a specified nuc record with given  
coordinates.  You can get to it, but it takes a little trickery; in  
essence you need to use the UID to get the gene summary information,  
extract the coordinates from that, then grab the sequence record using  
the seq_start, seq_stop, and strand parameters.

A dump of esummary info for UID 18131, for instance (using  
$eutil->print_all), gives this info (abbreviated somewhat):

UID                 :18131
Name                :Notch3
Description         :Notch gene homolog 3 (Drosophila)
Orgname             :Mus musculus
...
GenomicInfo
     GenomicInfoType
         ChrLoc      :17
         ChrAccVer   :NC_000083.5
         ChrStart    :32303796
         ChrStop     :32257837
GeneWeight          :23049

The genomic info section gives the accession.version, start, end, and  
(implicitly) the strand (ChrStop is less than ChrStart). I have added  
an example to the cookbook:

http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook#How_do_I_retrieve_the_DNA_sequence_using_EntrezGene_IDs.3F

chris

On Jun 9, 2009, at 6:20 AM, Adam Witney wrote:

> Hi,
>
> I have been experimenting with the Bio::DB::EUtilities module, with  
> help from the Cookbook. But I can't seem to figure out how to get  
> the DNA sequence of a gene; all the examples seem to be fetching  
> protein sequence.
>
> How would i go about fetching a sequence using an Entrez GeneID?
>
> thanks for any help
>
> adam


From cjfields at illinois.edu  Wed Jun 10 09:33:51 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 10 Jun 2009 08:33:51 -0500
Subject: [Bioperl-l] EUtilities Cookbook example fails
In-Reply-To: <98E59238-260F-49A1-BA78-51DF94FF55A8@sgul.ac.uk>
References: <40B53E84-2EBC-471A-9261-CED1973C7A0C@sgul.ac.uk>
	<1AB3CE57-4114-4440-870F-C15B39F42D77@illinois.edu>
	<98E59238-260F-49A1-BA78-51DF94FF55A8@sgul.ac.uk>
Message-ID: <10B8484F-AE84-4E0A-964F-0DC964F5156C@illinois.edu>

Adam,

Okay, fixed that and the previous issue with 'use an undefined value  
as an ARRAY reference'.  The previous issue appears to be due to a  
change in the XML output from NCBI (it used to give the IDs at one  
point).  Also made the wiki changes for this; didn't take long to find  
everything.

Thanks for pointing that out!  If you find any more issues feel free  
to make the necessary changes on the wiki or point them out if they're  
in code.

chris

On Jun 10, 2009, at 8:12 AM, Adam Witney wrote:

> Hi Chris,
>
> not sure if I should start a new thread for this, but it is related  
> to the EUtilities Cookbook and LinkSet.pm.
>
> There are several references in the Cookbook to the method  
> "get_linkname", however this seems to have changed in the recent  
> version of LinkSet.pm to "get_link_name". But one reference to the  
> old method name still exists in LinkSet.pm, as shown by this patch:
>
> --- /usr/local/lib/perl5/site_perl/5.8.9/Bio/Tools/EUtilities/Link/ 
> LinkSet.pm	2009-02-20 12:36:37.000000000 +0000
> +++ /Users/adam/Desktop/LinkSet.pm	2009-06-10 13:58:49.000000000 +0100
> @@ -220,7 +220,7 @@
> =cut
>
> sub get_link_name {
> -    return ($_[0]->get_linknames)[0];
> +    return ($_[0]->get_link_names)[0];
> }
>
> =head2 get_submitted_ids
>
> If i haven't got this all wrong entirely, I could go through and fix  
> the Cookbook entries if that was useful?
>
> adam
>
>
> On 10 Jun 2009, at 13:28, Chris Fields wrote:
>
>> I can reproduce that; I'll look into it.
>>
>> chris
>>
>> On Jun 10, 2009, at 6:19 AM, Adam Witney wrote:
>>
>>> Hi,
>>>
>>> I am going through the EUtilities Cookbook, but the last example  
>>> (in section 2.3.1) fails with:
>>>
>>> Can't use an undefined value as an ARRAY reference at /usr/lib/ 
>>> perl5/site_perl/5.8.8/Bio/Tools/EUtilities/Link/LinkSet.pm line 470.
>>>
>>> This is with BioPerl 1.6.0, perl v5.8.8
>>>
>>> thanks for any help
>>>
>>> adam


From awitney at sgul.ac.uk  Wed Jun 10 09:12:05 2009
From: awitney at sgul.ac.uk (Adam Witney)
Date: Wed, 10 Jun 2009 14:12:05 +0100
Subject: [Bioperl-l] EUtilities Cookbook example fails
In-Reply-To: <1AB3CE57-4114-4440-870F-C15B39F42D77@illinois.edu>
References: <40B53E84-2EBC-471A-9261-CED1973C7A0C@sgul.ac.uk>
	<1AB3CE57-4114-4440-870F-C15B39F42D77@illinois.edu>
Message-ID: <98E59238-260F-49A1-BA78-51DF94FF55A8@sgul.ac.uk>


Hi Chris,

not sure if I should start a new thread for this, but it is related to  
the EUtilities Cookbook and LinkSet.pm.

There are several references in the Cookbook to the method  
"get_linkname", however this seems to have changed in the recent  
version of LinkSet.pm to "get_link_name". But one reference to the old  
method name still exists in LinkSet.pm, as shown by this patch:

--- /usr/local/lib/perl5/site_perl/5.8.9/Bio/Tools/EUtilities/Link/LinkSet.pm	2009-02-20 12:36:37.000000000 +0000
+++ /Users/adam/Desktop/LinkSet.pm	2009-06-10 13:58:49.000000000 +0100
@@ -220,7 +220,7 @@
 =cut
 
 sub get_link_name {
-    return ($_[0]->get_linknames)[0];
+    return ($_[0]->get_link_names)[0];
 }
 
 =head2 get_submitted_ids

If I haven't got this all wrong entirely, I could go through and fix  
the Cookbook entries, if that would be useful?
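
As an aside, a rename like this can be softened by keeping the old spelling as a thin alias. The sketch below uses a toy class (My::ToyLinkSet is made up, as are the link names) and is not a claim about how LinkSet.pm is or should be written.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Toy stand-in for the real LinkSet class, only to illustrate the idea.
package My::ToyLinkSet;

sub new {
    my ($class, @names) = @_;
    return bless { link_names => [@names] }, $class;
}

# Current method names.
sub get_link_names { return @{ $_[0]->{link_names} } }
sub get_link_name  { return ($_[0]->get_link_names)[0] }

# Old spellings kept as thin aliases so existing callers keep working.
sub get_linknames { return $_[0]->get_link_names }
sub get_linkname  { return $_[0]->get_link_name }

package main;

my $set = My::ToyLinkSet->new('gene_nuccore', 'gene_protein');
print $set->get_linkname, "\n";    # old name, same result as get_link_name
```

With aliases like these, older Cookbook snippets would keep running while the wiki is being updated.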

adam


On 10 Jun 2009, at 13:28, Chris Fields wrote:

> I can reproduce that; I'll look into it.
>
> chris
>
> On Jun 10, 2009, at 6:19 AM, Adam Witney wrote:
>
>> Hi,
>>
>> I am going through the EUtilities Cookbook, but the last example  
>> (in section 2.3.1) fails with:
>>
>> Can't use an undefined value as an ARRAY reference at /usr/lib/ 
>> perl5/site_perl/5.8.8/Bio/Tools/EUtilities/Link/LinkSet.pm line 470.
>>
>> This is with BioPerl 1.6.0, perl v5.8.8
>>
>> thanks for any help
>>
>> adam


From awitney at sgul.ac.uk  Wed Jun 10 10:10:21 2009
From: awitney at sgul.ac.uk (Adam Witney)
Date: Wed, 10 Jun 2009 15:10:21 +0100
Subject: [Bioperl-l] fetch gene sequence with EUtilities.pm
In-Reply-To: <9B52E71A-3183-412A-81E7-803C28B8082E@illinois.edu>
References: <78771A58-8FCC-46ED-BE4F-8B0915BD324D@sgul.ac.uk>
	<9B52E71A-3183-412A-81E7-803C28B8082E@illinois.edu>
Message-ID: <B41AD5E8-138D-4B8D-AD4C-7DEA230FD46F@sgul.ac.uk>


Thanks for the pointers Chris.

The new example on the Cookbook doesn't quite work for me as ChrStart  
seems to appear in the DocSum twice, thus  
get_contents_by_name('ChrStart') returns a list of two values (which  
writes the second ChrStart into $end). Also the $start and $end seem  
to be out by 1, so I needed to change it to this:

my ($acc) = ($docsum->get_contents_by_name('ChrAccVer'));
my ($start) = ($docsum->get_contents_by_name('ChrStart'));
my ($end) = ($docsum->get_contents_by_name('ChrStop'));

  $start += 1;
  $end += 1;

Ah, looking at this further there appears to be something going on in  
the response from Entrez. Compare these two gene records:

http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=gene&id=18131
		(your example below)
http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=gene&id=2861733
		(my gene)

In both cases you can see that ChrStart appears twice, once as part of  
the GenomicInfo list and once on its own at the bottom. In my example  
above the two ChrStart values match, but in the Notch3 example you  
posted the 2nd ChrStart seems to be the same as the ChrStop in the  
GenomicInfo list. Do you know if the second ChrStart has a separate  
meaning?

I guess in the Cookbook example we would need to make sure that the  
get_contents_by_name('ChrStart') picks up the value from the  
GenomicInfo list, is this possible?

thanks again

adam


On 10 Jun 2009, at 14:20, Chris Fields wrote:

> EntrezGene doesn't contain the sequence information; I believe it  
> just links to the sequence in a specified nuc record with given  
> coordinates.  You can get to it, but it takes a little trickery; in  
> essence you need to use the UID to get the gene summary information,  
> extract that, then grab the sequence record using seqstart, seqend,  
> and seqstrand.
>
> A dump of esummary info for UID 18131, for instance, (using  
> $eutil->print_all) gives this info (abbreviated somewhat):
>
> UID                 :18131
> Name                :Notch3
> Description         :Notch gene homolog 3 (Drosophila)
> Orgname             :Mus musculus
> ...
> GenomicInfo
>    GenomicInfoType
>        ChrLoc      :17
>        ChrAccVer   :NC_000083.5
>        ChrStart    :32303796
>        ChrStop     :32257837
> GeneWeight          :23049
>
> The genomic info section gives the accession.version, start, end,  
> and (implicitly) the strand (ChrStop is less than ChrStart). I have  
> added an example to the cookbook:
>
> http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook#How_do_I_retrieve_the_DNA_sequence_using_EntrezGene_IDs.3F
>
> chris
>
> On Jun 9, 2009, at 6:20 AM, Adam Witney wrote:
>
>> Hi,
>>
>> I have been experimenting with the Bio::DB::EUtilities module, with  
>> help from the Cookbook. But I can't seem to figure out how to get  
>> the DNA sequence of a gene; all the examples seem to be fetching  
>> protein sequence.
>>
>> How would i go about fetching a sequence using an Entrez GeneID?
>>
>> thanks for any help
>>
>> adam


From cjfields at illinois.edu  Wed Jun 10 13:56:46 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 10 Jun 2009 12:56:46 -0500
Subject: [Bioperl-l] fetch gene sequence with EUtilities.pm
In-Reply-To: <B41AD5E8-138D-4B8D-AD4C-7DEA230FD46F@sgul.ac.uk>
References: <78771A58-8FCC-46ED-BE4F-8B0915BD324D@sgul.ac.uk>
	<9B52E71A-3183-412A-81E7-803C28B8082E@illinois.edu>
	<B41AD5E8-138D-4B8D-AD4C-7DEA230FD46F@sgul.ac.uk>
Message-ID: <CD8513A6-0872-4174-9333-94D76D5711F8@illinois.edu>

Adam,

That's really odd that they do that (both the duplication of ChrStart  
and the coordinates being off by one, which means they appear to be  
0-based).  It's possible that the second ChrStart is meant to represent  
the actual first base of the gene irrespective of start/end.  My  
example is on the opposite strand, so the second ChrStart == end.

The fact that they use the same element name is slightly annoying (and  
seemingly redundant), but there is a workaround.  We specifically grab  
only the nested information; in this case we want everything  
below 'GenomicInfoType':

GenomicInfo
     GenomicInfoType
         ChrLoc      :17
         ChrAccVer   :NC_000083.5
         ChrStart    :32303796
         ChrStop     :32257837

So, we can do this in the DocSum loop (that appears to work for your  
example):

############################

while (my $docsum = $eutil->next_DocSum) {
     # To ensure we grab the right ChrStart information, we grab the Item
     # above it in the Item hierarchy (visible via print_all from the eutil
     # instance).
     my ($item) = $docsum->get_Items_by_name('GenomicInfoType');

     # undef marks a field that has not been seen yet
     my %item_data = map {$_ => undef} qw(ChrAccVer ChrStart ChrStop);

     while (my $sub_item = $item->next_subItem) {
         if (exists $item_data{$sub_item->get_name}) {
             $item_data{$sub_item->get_name} = $sub_item->get_content;
         }
     }
     # check to make sure everything is set (a coordinate of 0 is valid)
     for my $check (qw(ChrAccVer ChrStart ChrStop)) {
         die "$check not set" unless defined $item_data{$check};
     }

     # ChrStart > ChrStop means the gene is on the minus strand
     my $strand = $item_data{ChrStart} > $item_data{ChrStop} ? 2 : 1;
     # the esummary coordinates appear to be 0-based, so shift both by one
     $fetcher->set_parameters(-id => $item_data{ChrAccVer},
                              -seq_start => $item_data{ChrStart} + 1,
                              -seq_stop  => $item_data{ChrStop} + 1,
                              -strand    => $strand);
     print $fetcher->get_Response->content;
}

############################
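
The coordinate handling in the loop above can be pulled out into a small pure function, which makes the off-by-one and strand logic easy to check without hitting NCBI. A sketch only; docsum_to_range is a made-up name, and it assumes the esummary coordinates really are 0-based as observed in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn the (apparently 0-based) ChrStart/ChrStop pair
# from an esummary record into 1-based coordinates plus a strand flag
# (1 = plus, 2 = minus), mirroring the loop above.
sub docsum_to_range {
    my ($chr_start, $chr_stop) = @_;
    my $strand = $chr_start > $chr_stop ? 2 : 1;
    return (
        seq_start => $chr_start + 1,
        seq_stop  => $chr_stop + 1,
        strand    => $strand,
    );
}

# Values from the Notch3 (UID 18131) dump quoted in this thread.
my %range = docsum_to_range(32303796, 32257837);
print join(', ', map { "$_=$range{$_}" } sort keys %range), "\n";
```

The returned hash uses the same keys as the efetch parameters, minus the leading dash.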

That's to retain compatibility with 1.6; I'll update the wiki.  I can  
add some common Item container methods to grab information for any  
Items contained in the current instance (be it a DocSum or another  
Item).  I'll add that in bioperl-live.

chris

On Jun 10, 2009, at 9:10 AM, Adam Witney wrote:

> Thanks for the pointers Chris.
>
> The new example on the Cookbook doesn't quite work for me as  
> ChrStart seems to appear in the DocSum twice, thus  
> get_contents_by_name('ChrStart') returns a list of two values (which  
> writes the second ChrStart into $end). Also the $start and $end seem  
> to be out by 1, so I needed to change it to this:
>
> my ($acc) = ($docsum->get_contents_by_name('ChrAccVer'));
> my ($start) = ($docsum->get_contents_by_name('ChrStart'));
> my ($end) = ($docsum->get_contents_by_name('ChrStop'));
>
> $start += 1;
> $end += 1;
>
> Ah, looking at this further there appears to be something going on  
> in the response from Entrez. Compare these two gene records:
>
> http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=gene&id=18131 
> 		(your example below)
> http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=gene&id=2861733 
> 		(my gene)
>
> In both cases you can see that ChrStart appears twice, once as part  
> of the GenomicInfo list and once on its own at the bottom. In my  
> example above the two ChrStart values match, but in the Notch3  
> example you posted the 2nd ChrStart seems to be the same as the  
> ChrStop in the GenomicInfo list. Do you know if the second ChrStart  
> has a separate meaning?
>
> I guess in the Cookbook example we would need to make sure that the  
> get_contents_by_name('ChrStart') picks up the value from the  
> GenomicInfo list, is this possible?
>
> thanks again
>
> adam
>
>
> On 10 Jun 2009, at 14:20, Chris Fields wrote:
>
>> EntrezGene doesn't contain the sequence information; I believe it  
>> just links to the sequence in a specified nuc record with given  
>> coordinates.  You can get to it, but it takes a little trickery; in  
>> essence you need to use the UID to get the gene summary  
>> information, extract that, then grab the sequence record using  
>> seqstart, seqend, and seqstrand.
>>
>> A dump of esummary info for UID 18131, for instance, (using  
>> $eutil->print_all) gives this info (abbreviated somewhat):
>>
>> UID                 :18131
>> Name                :Notch3
>> Description         :Notch gene homolog 3 (Drosophila)
>> Orgname             :Mus musculus
>> ...
>> GenomicInfo
>>   GenomicInfoType
>>       ChrLoc      :17
>>       ChrAccVer   :NC_000083.5
>>       ChrStart    :32303796
>>       ChrStop     :32257837
>> GeneWeight          :23049
>>
>> The genomic info section gives the accession.version, start, end,  
>> and (implicitly) the strand (ChrStop is less than ChrStart). I have  
>> added an example to the cookbook:
>>
>> http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook#How_do_I_retrieve_the_DNA_sequence_using_EntrezGene_IDs.3F
>>
>> chris
>>
>> On Jun 9, 2009, at 6:20 AM, Adam Witney wrote:
>>
>>> Hi,
>>>
>>> I have been experimenting with the Bio::DB::EUtilities module,  
>>> with help from the Cookbook. But I can't seem to figure out how to  
>>> get the DNA sequence of a gene; all the examples seem to be  
>>> fetching protein sequence.
>>>
>>> How would i go about fetching a sequence using an Entrez GeneID?
>>>
>>> thanks for any help
>>>
>>> adam


From maj at fortinbras.us  Thu Jun 11 07:36:40 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 11 Jun 2009 07:36:40 -0400
Subject: [Bioperl-l] 1.6 on doc.bioperl.org?
Message-ID: <17AD00895AFD43E1A1436D1065092BAC@NewLife>

Hi Chris and list-
Will documentation for release 1.6 be available in pdoc on doc.bioperl.org?
I notice also that autogenerated documentation for bioperl-live doesn't contain
new modules (or HIVQuery & Tiling, anyway ;) )--
cheers, Mark


From maj at fortinbras.us  Thu Jun 11 09:17:25 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 11 Jun 2009 09:17:25 -0400
Subject: [Bioperl-l] Bio::Restriction::Analysis. Exception when using
	rebasefile.
In-Reply-To: <4A2F622D.5060500@ron.dk>
References: <4A2F622D.5060500@ron.dk>
Message-ID: <2F52B1CED1374763822BF3AD1D283B3B@NewLife>

Rasmus et al-

This looks like a bug. A quick debug shows it's barfing on 'AarI' (as it
cycles through all enzymes, apparently creating a global cut map). AarI has
a recognition sequence of

CACCTGC (in $enz->seq->seq)

but a cut site of

CACCTGCNNNN^ (in $enz->seq->site)

The bad end parameter '11' refers to the end of the cut site sequence, but
the routine B:R:Analysis::_cuts is attempting to split the 7-symbol
recognition sequence, and so throws.
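
To make the mismatch concrete, here is a standalone sketch that does not use BioPerl at all. parse_cut_site is a made-up helper, and the '^' convention is simply read off the site string above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: given a site string with '^' marking the cut,
# return the cut offset and the site with the marker removed.
sub parse_cut_site {
    my ($site) = @_;
    my $cut = index($site, '^');
    die "no cut marker in '$site'\n" if $cut < 0;
    (my $bases = $site) =~ s/\^//;
    return ($cut, $bases);
}

my $recognition = 'CACCTGC';    # AarI recognition sequence (7 symbols)
my ($cut, $bases) = parse_cut_site('CACCTGCNNNN^');

printf "cut after base %d, recognition length %d\n",
    $cut, length($recognition);
if ($cut > length $recognition) {
    print "cut lies outside the recognition sequence; ",
          "index the padded site (length ", length($bases), ") instead\n";
}
```

For AarI the cut offset comes out as 11 against a recognition length of 7, which matches the numbers in the exception message.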

This surprises me. Core, let me know if you want me to take this on, or
if the module author can fix it quicker.

cheers,
Mark

----- Original Message ----- 
From: "Rasmus Ory Nielsen" <ron at ron.dk>
To: <bioperl-l at lists.open-bio.org>
Sent: Wednesday, June 10, 2009 3:35 AM
Subject: [Bioperl-l] Bio::Restriction::Analysis. Exception when using 
rebasefile.


> Hi,
>
> This is my first time using bioperl for restriction analysis, so please bear 
> with me if this is a FAQ.
>
> I downloaded withrefm.906 from ftp://ftp.neb.com/pub/rebase/ and created the 
> script shown at the bottom of the mail.
> My bioperl version is the bioperl-live nightly from 09-Jun-2009.
>
> The script throws an exception (see below). But if I comment out the 
> '-enzymes' argument, so it uses the built-in collection of enzymes, it works.
>
> My problem is that I need to use some of the enzymes that are only available 
> in rebase. So how do I get this working?
>
> Thanks for your attention.
>
> Best regards,
> Rasmus Ory Nielsen
>
>
> ############################################################
> Output from the script:
> ############################################################
>
> [roni at ksdhcp ~]$ ./restriction_test.pl
>
> --------------------- WARNING ---------------------
> MSG: The enzyme name CviKI-1 was changed to CviKI-I
> ---------------------------------------------------
>
> ------------- EXCEPTION -------------
> MSG: Bad end parameter (11). End must be less than the total length of 
> sequence (total=7)
> STACK Bio::PrimarySeq::subseq 
> /usr/local/lib/perl5/site_perl/5.10.0/Bio/PrimarySeq.pm:401
> STACK Bio::Restriction::Analysis::_enzyme_sites 
> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:900
> STACK Bio::Restriction::Analysis::_cuts 
> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:801
> STACK Bio::Restriction::Analysis::cut 
> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:379
> STACK Bio::Restriction::Analysis::fragment_maps 
> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:515
> STACK toplevel ./restriction_test.pl:30
> -------------------------------------
>
> [roni at ksdhcp ~]$
>
>
> ############################################################
> Output from the script with the '-enzymes' argument commented out
> ############################################################
>
> [roni at ksdhcp ~]$ ./restriction_test.pl
>
> --------------------- WARNING ---------------------
> MSG: The enzyme name CviKI-1 was changed to CviKI-I
> ---------------------------------------------------
> $VAR1 = [
>           {
>             'seq' => 'CTCGACCGTTAGCAA',
>             'end' => 15,
>             'start' => '1'
>           },
>           {
>             'seq' => 'AGCTTTCTACCGTTATCGT',
>             'end' => 34,
>             'start' => '16'
>           }
>         ];
> [roni at ksdhcp ~]$
>
> ############################################################
>
> #!/usr/bin/perl
> use strict;
> use warnings;
> use Bio::PrimarySeq;
> use Bio::Restriction::IO;
> use Bio::Restriction::Analysis;
> use Data::Dumper;
>
> # create seq obj
> my $seqobj = new Bio::PrimarySeq(
>     -seq        => 'CTCGACCGTTAGCAAAGCTTTCTACCGTTATCGT',
>     -primary_id => 'test',
>     -molecule   => 'dna'
> );
>
> # read rebase file
> my $rebase_io = Bio::Restriction::IO->new(
>     -file   => 'withrefm.906',
>     -format => 'withrefm',
> );
> my $rebase_collection = $rebase_io->read;
>
> # start restriction analysis
> my $restriction_analysis = Bio::Restriction::Analysis->new(
>     -seq     => $seqobj,
>     -enzymes => $rebase_collection,    # it works with this line commented out
> );
>
> # retrieve fragment maps
> my @fragment_maps = $restriction_analysis->fragment_maps('HindIII');
> print Dumper \@fragment_maps;
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> 


From cjfields at illinois.edu  Thu Jun 11 10:19:51 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 11 Jun 2009 09:19:51 -0500
Subject: [Bioperl-l] Bio::Restriction::Analysis. Exception when using
	rebasefile.
In-Reply-To: <2F52B1CED1374763822BF3AD1D283B3B@NewLife>
References: <4A2F622D.5060500@ron.dk>
	<2F52B1CED1374763822BF3AD1D283B3B@NewLife>
Message-ID: <0F0E9DCC-BE3A-4918-8226-B9988B022FEB@illinois.edu>

Mark,

Feel free to take it up.  It's probably a good idea to start a bug  
report for tracking if it proves to be thornier to fix than expected.

chris

On Jun 11, 2009, at 8:17 AM, Mark A. Jensen wrote:

> Rasmus et al-
>
> This looks like a bug. A quick debug shows it's barfing on  
> 'AarI' (as it cycles through
> all enzymes apparently creating a global cut map). AarI has a  
> recognition sequence of
>
> CACCTGC (in $enz->seq->seq)
>
> but a cut site of
>
> CACCTGCNNNN^ (in $enz->seq->site)
>
> The bad parm '11' refers to the end of the cut site sequence, but  
> the routine
> B:R:Analysis::_cuts is attempting to split the 7-symbol recognition  
> sequence,
> and so throws.
>
> This surprises me. Core, let me know if you want me to take this on,  
> or
> if the module author can fix it quicker.
>
> cheers,
> Mark
>


From maj at fortinbras.us  Thu Jun 11 10:26:19 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 11 Jun 2009 10:26:19 -0400
Subject: [Bioperl-l] Bio::Restriction::Analysis. Exception when using
	rebasefile.
In-Reply-To: <0F0E9DCC-BE3A-4918-8226-B9988B022FEB@illinois.edu>
References: <4A2F622D.5060500@ron.dk>
	<2F52B1CED1374763822BF3AD1D283B3B@NewLife>
	<0F0E9DCC-BE3A-4918-8226-B9988B022FEB@illinois.edu>
Message-ID: <CD6C392C39CD4287B3619FCDBC1D19CF@NewLife>

All-righty-- thanks MAJ


From mauricio at open-bio.org  Thu Jun 11 12:46:35 2009
From: mauricio at open-bio.org (Mauricio Herrera Cuadra)
Date: Thu, 11 Jun 2009 11:46:35 -0500
Subject: [Bioperl-l] 1.6 on doc.bioperl.org?
In-Reply-To: <17AD00895AFD43E1A1436D1065092BAC@NewLife>
References: <17AD00895AFD43E1A1436D1065092BAC@NewLife>
Message-ID: <4A3134EB.4080702@open-bio.org>

Hi Mark,

I'll take a look into this sometime between today and tomorrow. Will 
keep you posted. Thanks for the heads up :)

Mauricio.


Mark A. Jensen wrote:
> Hi Chris and list-
> Will documentation for release 1.6 be available in pdoc on doc.bioperl.org?
> I notice also that autogenerated documentation for bioperl-live doesn't contain
> new modules (or HIVQuery & Tiling, anyway ;) )--
> cheers, Mark
> 


From maj at fortinbras.us  Thu Jun 11 14:41:26 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 11 Jun 2009 14:41:26 -0400
Subject: [Bioperl-l] 1.6 on doc.bioperl.org?
In-Reply-To: <4A3134EB.4080702@open-bio.org>
References: <17AD00895AFD43E1A1436D1065092BAC@NewLife>
	<4A3134EB.4080702@open-bio.org>
Message-ID: <A53006055C854297AAA58F6650F4F867@NewLife>

cheers Mauricio! MAJ


From Xianjun.Dong at bccs.uib.no  Fri Jun 12 16:38:50 2009
From: Xianjun.Dong at bccs.uib.no (Xianjun Dong)
Date: Fri, 12 Jun 2009 22:38:50 +0200
Subject: [Bioperl-l] background layer is not supported in Bioperl 1.6 for
	Bio::Graphics::Glyph
Message-ID: <4A32BCDA.4080605@ii.uib.no>

Hi,

I am not sure whether this is the right place to ask for help.

I have been stuck on a problem for several days: I want to highlight some 
regions in my track using a different background color. To do that, 
I defined a glyph named "background", based on the 
'Bio::Graphics::Glyph::generic' module, and overrode the draw_component() 
method with code like this:

$gd->filledRectangle($left,0,$right,$gd->height, 
$self->factory->translate_color($color));

# the full script is pasted at the end

This draws a rectangle with top=0 and bottom=$gd->height. I turned the 
highlight regions into a list of features and added them with add_track 
and -glyph=>'background' (see the script test.pl below). This works 
as I expect: it adds a colored block behind all 
tracks in the panel (including the ruler arrow). You can see the output 
image in the attached file "test.bioperl1.2.3.png".

Now the problem: when I switch to Bioperl 1.5 (or 1.6), it no longer 
works. The highlight is still drawn, but it shrinks to a low 
height instead of covering all tracks in the panel. That output is 
attached as "test.bioperl1.6.png".

I have tried to work out the reason. The 'background' module is based on 
the generic module, so what can cause the difference? Is it because 
$gd->height is different, or because the tracks that follow the 
'background' track cannot draw from the first position?

I could stick with Bioperl 1.2.3 to avoid the problem ("A smart 
person solves a problem, a wise person avoids it"...). But that raises 
another problem: Bio::Graphics in Bioperl 1.2.3 does not support 
$panel->create_web_map(), so I need a newer 
version if I want to create a web map for my graphics, and then I have to 
give up the highlighted background.

That is long enough for my first post here. I hope someone can 
give me a clue.

Thanks in advance!

Xianjun


==================== test.pl =======================
#!/usr/bin/perl
 
use strict;
use lib "$ENV{HOME}/lib";
 
use Bio::Graphics;
use Bio::Graphics::Feature;
my $ftr= 'Bio::Graphics::Feature';
 
# processed_transcript
my $trans1 = $ftr->new(-start=>50,-end=>10,-name=>'ZK154.1',-type=>"3'-UTR");
my $trans2 = $ftr->new(-start=>100,-end=>50,-name=>'ZK154.2',-type=>'CDS');
my $trans3 = $ftr->new(-start=>350,-end=>225,-name=>'ZK154.3',-type=>'CDS',
                       -source=>'a');
my $trans4 = $ftr->new(-start=>650,-end=>500,-name=>'ZK154.3',-type=>'CDS',
                       -source=>'a');
my $trans5 = $ftr->new(-start=>700,-end=>650,-name=>'ZK154.3',-type=>"5'-UTR");
my $trans  = $ftr->new(-segments=>[$trans1,$trans2,$trans3,$trans4,$trans5]);

# highlight
my $trans31 = $ftr->new(-start=>240,-end=>450,-name=>'hightlight1',
                        -type=>'background',-source=>'a');
my $trans41 = $ftr->new(-start=>650,-end=>600,-name=>'hightlight2',
                        -type=>'multihourglass',-source=>'b');
 
my $panel= Bio::Graphics::Panel->new(-width=>1200,
                                     -length=>1050,
                                     -start =>0,
                                     -pad_left=>12,
                                     -pad_right=>12);

# the following track works as I expected in bioperl 1.2.3, but not in
# 1.5 and 1.6
$panel->add_track([$trans41,$trans31],
                  -glyph   => 'background',
                  -block_bgcolor => sub{return (shift->source eq 'a')?'#cccccc':'#fffc22'},
                  );

$panel->add_track($ftr->new(-start=>100,-end=>1000),
                  -glyph=>'arrow',
                  -double=>1,
                  -tick=>2);

$panel->add_track($trans,
                  -glyph   => 'transcript2', # 'transcript2', #process_5utr',
                  -fgcolor => 'darkred',
                  -bgcolor => 'darkred',
                  -title => '$source',
                  -link => 'http://www.ensembl.org/Homo_sapiens/transview?transcript=$name',  #EnsEMBL
                  );
   
print $panel->png;

# the following part works in bioperl 1.5 and 1.6, but does not work in
# Bioperl 1.2.3
my $map = $panel->create_web_map("image");
$panel->finished();

1;

==================== background.pm =======================
package Bio::Graphics::Glyph::background;
 
use strict;
use base 'Bio::Graphics::Glyph::generic';
sub pad_top{
  return 0;
}

sub draw_component {
  my $self = shift;
  #$self->SUPER::draw_component(@_);
  my ($gd,$dx,$dy) = @_;
  my ($left,$top,$right,$bottom) = $self->bounds($dx,$dy);
 
  # fill the highlighted region from the top of the image (y=0) down to $gd->height
  my $color = $self->option('block_bgcolor') || '#cccccc';
  $gd->filledRectangle($left,0,$right,$gd->height,
                       $self->factory->translate_color($color));
}
 
1;

-- 
==========================================
Xianjun Dong
PhD student, Lenhard group
Computational Biology Unit
Bergen Center for Computational Science
University of Bergen
Hoyteknologisenteret, Thormohlensgate 55
N-5008 Bergen, Norway
E-mail: xianjun.dong at bccs.uib.no
Tel.: +47 555 84022
Fax : +47 555 84295
==========================================

-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.bioperl1.2.3.png
Type: image/png
Size: 2789 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20090612/9cdc621a/attachment-0006.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.bioperl1.6.png
Type: image/png
Size: 2365 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20090612/9cdc621a/attachment-0007.png>

From scott at scottcain.net  Fri Jun 12 21:29:09 2009
From: scott at scottcain.net (Scott Cain)
Date: Fri, 12 Jun 2009 21:29:09 -0400
Subject: [Bioperl-l] background layer is not supported in Bioperl 1.6
	for Bio::Graphics::Glyph
In-Reply-To: <4A32BCDA.4080605@ii.uib.no>
References: <4A32BCDA.4080605@ii.uib.no>
Message-ID: <536f21b00906121829o7a5076f6hc57eb9a4dd815e48@mail.gmail.com>

Hello Xianjun,

I don't think that approach will work.  What you almost certainly need
to do is use a postgrid callback that draws the highlighted
region.  For example code showing how to do this, take a look at the
make_postgrid_callback subroutine in GBrowse 1.69.  -postgrid is an
option of the Bio::Graphics::Panel constructor.

Scott
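
A minimal sketch of such a postgrid callback, modelled on the GBrowse
approach. The name make_highlight_callback and the sample regions are made up
for illustration; the panel methods used (pad_left, bottom, location2pixel,
translate_color) are the ones the GBrowse code calls. Treat it as a starting
point under those assumptions rather than a tested fix for this case.

```perl
use strict;
use warnings;

# Build a callback for the -postgrid option of Bio::Graphics::Panel.
# Each region is [start, end, color] in sequence coordinates.
sub make_highlight_callback {
    my @regions = @_;
    return sub {
        my ($gd, $panel) = @_;
        my $left   = $panel->pad_left;
        my $bottom = $panel->bottom;
        for my $r (@regions) {
            my ($start, $end, $color) = @$r;
            my ($x1, $x2) = $panel->location2pixel($start, $end);
            # Draw from the top edge (y = 0) to the bottom of the panel.
            $gd->filledRectangle($left + $x1, 0, $left + $x2, $bottom,
                                 $panel->translate_color($color));
        }
    };
}

my $highlight = make_highlight_callback([240, 450, '#cccccc'],
                                        [600, 650, '#fffc22']);

# Intended use (requires Bio::Graphics, so left as a comment here):
#   my $panel = Bio::Graphics::Panel->new(-width    => 1200,
#                                         -length   => 1050,
#                                         -postgrid => $highlight);
```

Because the callback only touches the $gd and $panel objects it is handed,
the highlight regions no longer need a track or glyph of their own.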




-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From scott at scottcain.net  Sat Jun 13 09:27:39 2009
From: scott at scottcain.net (Scott Cain)
Date: Sat, 13 Jun 2009 09:27:39 -0400
Subject: [Bioperl-l] background layer is not supported in Bioperl 1.6
	for Bio::Graphics::Glyph
In-Reply-To: <4A339621.2060702@ii.uib.no>
References: <4A32BCDA.4080605@ii.uib.no>
	<536f21b00906121829o7a5076f6hc57eb9a4dd815e48@mail.gmail.com>
	<4A339621.2060702@ii.uib.no>
Message-ID: <4536f7700906130627r41c90de5r311e58f1d718f9ee@mail.gmail.com>

Hi Xianjun,

I understand what you want to do, as the current version of gbrowse
does this, which uses bioperl 1.6.  Without digging through the code,
I can't tell you exactly how this works and you didn't send your code
that uses this callback, so I can't try it either.

One thing that is different between your code and gbrowse is that each
of the tracks is actually a separate panel (to allow track dragging),
so it is possible that this sort of callback doesn't work for
Bio::Graphics any more.

Scott
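
For reference, the h_region entries that the GBrowse code quoted below parses
have the form ref:start..end with an optional @color suffix. Here is a small
standalone sketch of that parsing step, reusing the regular expression from
make_postgrid_callback; parse_highlight_regions and the sample region strings
are made up for illustration.

```perl
use strict;
use warnings;

# Parse highlight specs of the form "ref:start..end" or
# "ref:start..end@color", keeping only those on the wanted reference.
sub parse_highlight_regions {
    my ($wanted_ref, @specs) = @_;
    return map {
        my ($ref, $start, $end, $color) = /^(.+):(\d+)\.\.(\d+)(?:@(\S+))?/;
        defined $ref && $ref eq $wanted_ref
            ? [$start, $end, $color || 'lightgrey']   # default color
            : ();                                     # drop other references
    } @specs;
}

my @regions = parse_highlight_regions('chrI',
    'chrI:240..450@#cccccc', 'chrI:600..650', 'chrII:1..10@red');

printf "%d..%d in %s\n", @$_ for @regions;
```

Each element of @regions is a [start, end, color] triple, which is the shape
hilite_regions_closure expects.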

On Saturday, June 13, 2009, Xianjun Dong <Xianjun.Dong at bccs.uib.no> wrote:
> Hi, Scott
>
> Thanks for your reply.
>
> I still have a question. I dug the code out of GBrowse (pasted below). make_postgrid_callback collects all the highlight regions and then uses hilite_regions_closure to draw them with the following GD call:
>
> $gd->filledRectangle($left+$start,0,$left+$end,$bottom,
>                          $panel->translate_color($h_color));
>
> where $bottom=$panel->bottom. This is the only difference from my code, where I use $gd->height. I guess they are almost the same (except for pad_bottom); see the code at http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Graphics/Panel.html#CODE22
>
> Anyway, I changed my highlight regions to use $panel->bottom instead of $gd->height. The output is the same when using the Bioperl 1.6 (or 1.5) library. You can see the attached image ("test.bioperl1.6.png").
>
> I might not have explained my question clearly. With bioperl 1.2.3 (actually the Bio::Graphics in bioperl 1.2.3), I get the image I want (see the attached file "test.bioperl1.2.3.png"), where the highlight runs from the top of the panel to the bottom. With bioperl 1.5 (or 1.6), I only see the highlight region in its own track, not across the whole panel. You can see the difference between the two images.
>
> [I am not sure whether the mailing list allows image attachments, so I have also put them at the following links:
> test.bioperl1.6.png:    http://translog.genereg.net/test.bioperl1.6.png
> test.bioperl1.2.3.png:    http://translog.genereg.net/test.bioperl1.2.3.png ]
>
> Could you test it and see the difference, if you have both 1.2.3 and 1.6 on your computer?
>
> I really want to know how this works in bioperl 1.2.3 (even if it turns out to be a bug in that version).
>
> Thanks
>
> Xianjun
> =============================================
>
> # this generates the callback for highlighting a region
> sub make_postgrid_callback {
>   my $settings = shift;
>   return unless ref $settings->{h_region};
>
>   my @h_regions = map {
>     my ($h_ref,$h_start,$h_end,$h_color) = /^(.+):(\d+)\.\.(\d+)(?:@(\S+))?/;
>     defined($h_ref) && $h_ref eq $settings->{ref}
>                  ? [$h_start,$h_end,$h_color||'lightgrey']
>                  : ()
>   }
>     @{$settings->{h_region}};
>
>   return unless @h_regions;
>   return hilite_regions_closure(@h_regions);
> }
>
> # this subroutine generates a Bio::Graphics::Panel callback closure
> # suitable for hilighting a region of a panel.
> # The args are a list of [start,end,color]
> sub hilite_regions_closure {
>   my @h_regions = @_;
>
>   return sub {
>     my $gd     = shift;
>     my $panel  = shift;
>     my $left   = $panel->pad_left;
>     my $top    = $panel->top;
>     my $bottom = $panel->bottom;
>     for my $r (@h_regions) {
>       my ($h_start,$h_end,$h_color) = @$r;
>       my ($start,$end) = $panel->location2pixel($h_start,$h_end);
>       if ($end-$start <= 1) { $end++; $start-- } # so that we always see something
>       # assuming top is 0 so as to ignore top padding
>       $gd->filledRectangle($left+$start,0,$left+$end,$bottom,
>                            $panel->translate_color($h_color));
>     }
>   };
> }
>
>
> Scott Cain wrote:
>
> Hello Xianjun,
>
> I don't think that approach will work.  What you almost certainly need
> to do is a postgrid callback that does the drawing of the highlighted
> region.  For example code of how to do this, take a look at the
> make_postgrid_callback subroutine in GBrowse 1.69.  The option
> -postgrid is a method of Bio::Graphics::Panel.
>
> Scott
>
>
>
>
> On Fri, Jun 12, 2009 at 4:38 PM, Xianjun Dong<Xianjun.Dong at bccs.uib.no> wrote:
>
>
> HI,
>
> I am not sure this is the right place I can get help.
>
> I've suffered by a problem for several days: I want to highlight parts of
> regions in my track, using a different background color. To do that, I
> defined a glyph named "background", based on the
> 'Bio::Graphics::Glyph::generic' module. I override the draw_component()
> method, by adding code like below:
>
> $gd->filledRectangle($left,0,$right,$gd->height,
> $self->factory->translate_color($color));
>
> # the script is pasted at the end
>
> This will draw a rectangle with top=0, bottom=$gd->height. I made the
> highlight regions into a list of features, and add_track with
> -glyph=>'background'. (see the following script, test.pl) This really works
> as I expect, which will add a colored block at background of all tracks in a
> panel (including the ruler arrow). You can see the output image in attached
> file "test.bioperl1.2.3.png"
>
> Now, the problem comes: when I switch to Bioperl 1.5 (or 1.6), it does not
> work. Well, it works, but the highlight part only shrink to a low height,
> instead of covering all tracks in the panel. I also attached the output
> here, see the file "test.bioperl1.6.png".
>
> I tried to think about the reason, the 'background' module is based on the
> generic module. What can cause the difference? Is it because $gd->height is
> different, or the tracks followed with 'background' track can not draw from
> the first position?
>
> Well. I can stick to use Bioperl 1.2.3 to avoid the problem. ("Smart person
> solve problem, wise person avoid problem"...) But another problem is coming:
> Bio::Graphics in Bioperl 1.2.3 does not support $panel->create_web_map()
> function, which means I have to use some higher version if I want to create
> web map for my graphics, but then I have to give up using highlight
> background.
>
> OK. It's long enough for my first-time submission here. Hope someone can
> throw me some clue.
>
> Thanks ahead!!
>
> Xianjun
>
>
> ==================== test.pl =======================
> #!/usr/bin/perl
>
> use strict;
> use lib "$ENV{HOME}/lib";
>
> use Bio::Graphics;
> use Bio::Graphics::Feature;
> my $ftr= 'Bio::Graphics::Feature';
>
> # processed_transcript
> my $trans1 =
> $ftr->new(-start=>50,-end=>10,-name=>'ZK154.1',-type=>"3'-UTR");
> my $trans2 = $ftr->new(-start=>100,-end=>50,-name=>'ZK154.2',-type=>'CDS');
> my $trans3 = $ftr->new(-start=>350,-end=>225,-name=>'ZK154.3',-type=>'CDS',
> -source=>'a');
> my $trans4 = $ftr->new(-start=>650,-end=>500,-name=>'ZK154.3',-type=>'CDS',
> -source=>'a');
> my $trans5 =
> $ftr->new(-start=>700,-end=>650,-name=>'ZK154.3',-type=>"5'-UTR");
> my $trans  =
> $ftr->new(-segments=>[$trans1,$trans2,$trans3,$trans4,$trans5]);
>
> # hightlight
> my $trans31 =
> $ftr->new(-start=>240,-end=>450,-name=>'hightlight1',-type=>'background',
> -source=>'a');
> my $trans41 =
> $ftr->new(-start=>650,-end=>600,-name=>'hightlight2',-type=>'multihourglass',
> -source=>'b');
>
> my $panel= Bio::Graphics::Panel->new(-width=>1200,
>                                            -length=>1050,
>                                            -start =>0,
>                                            -pad_left=>12,
>                                            -pad_right=>12);
>
> # the following track works as I expected in bioperl 1.2.3, but not in 1.5
> and 1.6
> $panel->add_track([$trans41,$trans31],
>         -glyph   => 'background',
>                 -block_bgcolor => sub{return (shift->source eq
> 'a')?'#cccccc':'#fffc22'},
>                 );
>
> $panel->add_track($ftr->new(-start=>100,-end=>1000),
>                 -glyph=>'arrow',
>                 -double=>1,
>                 -tick=>2);
>
> $panel->add_track($trans,
>         -glyph   => 'transcript2', # 'transcript2', #process_5utr',
>                 -fgcolor => 'darkred',
>                 -bgcolor => 'darkred',
>                 -title => '$source',
>                 -link =>
> 'http://www.ensembl.org/Homo_sapiens/transview?transcript=$name',  #EnsEMBL
>                 );
>  print $panel->png;
>
> # the following part works in bioperl 1.5 and 1.6, but not work in Bioperl
> 1.2.3
> my $map = $panel->create_web_map("image");
> $panel->finished();
>
> 1;
>
> ==================== background.pm =======================
> package Bio::Graphics::Glyph::background;
>
> use strict;
> use base 'Bio::Graphics::Glyph::generic';
> sub pad_top{
>  return 0;
> }
>
> sub draw_component {
>  my $self = shift;
>  #$self->SUPER::draw_component(@_);
>  my ($gd,$dx,$dy) = @_;
>  my ($left,$top,$right,$bottom) = $self->bounds($dx,$dy);
>
>  # draw an arrow to indicate the direction of transcript
>  my $color = $self->option('block_bgcolor') || '#cccccc';
>  $gd->filledRectangle($left,0,$right,$gd->height,
> $self->factory->translate_color($color));
> }
>
> 1;
>
> --
> ==========================================
> Xianjun Dong
> PhD student, Lenhard group
> Computational Biology Unit
> Bergen Center for Computational Science
> University of Bergen
> Hoyteknologisenteret, Thormohlensgate 55
> N-5008 Bergen, Norway
> E-mail: xianjun.dong at bccs.uib.no
> Tel.: +47 555 84022
> Fax : +47 555 84295
> ==========================================
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
>
>
>
>
>
>
>
> --
> ==========================================
> Xianjun Dong
> PhD student, Lenhard group
> Computational Biology Unit
> Bergen Center for Computational Science
> University of Bergen
> Hoyteknologisenteret, Thormohlensgate 55
> N-5008 Bergen, Norway
> E-mail: xianjun.dong at bccs.uib.no
> Tel.: +47 555 84022
> Fax : +47 555 84295
> ==========================================
>
>

-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From Xianjun.Dong at bccs.uib.no  Sat Jun 13 12:48:16 2009
From: Xianjun.Dong at bccs.uib.no (Xianjun Dong)
Date: Sat, 13 Jun 2009 18:48:16 +0200
Subject: [Bioperl-l] background layer is not supported in Bioperl 1.6
 for Bio::Graphics::Glyph
In-Reply-To: <4536f7700906130627r41c90de5r311e58f1d718f9ee@mail.gmail.com>
References: <4A32BCDA.4080605@ii.uib.no>	<536f21b00906121829o7a5076f6hc57eb9a4dd815e48@mail.gmail.com>	<4A339621.2060702@ii.uib.no>
	<4536f7700906130627r41c90de5r311e58f1d718f9ee@mail.gmail.com>
Message-ID: <4A33D850.1020203@ii.uib.no>

Hi, Scott

Before I give up on my own solution and switch to GBrowse, I would like 
to bother you once more.

As you suggested, I added the -postgrid option when creating the panel; 
it calls a function that draws the background. The code below is copied 
almost verbatim from the online POD of Bio::Graphics::Panel (see 
http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Graphics/Panel.html 
)

But it still does not work. Could you have a look? My full script is 
pasted at the end. (BTW, on that POD page the option reads 
-postgrid=>\&draw_gap, while the drawing function is named gap_it, not 
draw_gap. I guess that is a typo, or not?)

  my $panel = Bio::Graphics::Panel->new(-segment=>$segment,
                                        -grid=>1,
                                        -width=>600,
                                        -postgrid=> \&draw_gap);
  sub gap_it {
     my $gd    = shift;
     my $panel = shift;
     my ($gap_start,$gap_end) = $panel->location2pixel(500,600);
     my $top                  = $panel->top;
     my $bottom               = $panel->bottom;
     my $gray                 = $panel->translate_color('gray');
     $gd->filledRectangle($gap_start,$top,$gap_end,$bottom,$gray);
}

Thanks

Xianjun

-----------------------------------------------

#!/usr/bin/perl
 
use strict;
use lib "$ENV{HOME}/lib";
 
use Bio::Graphics;
use Bio::Graphics::Feature;
my $ftr= 'Bio::Graphics::Feature';
 
# processed_transcript
my $trans1 = 
$ftr->new(-start=>50,-end=>10,-name=>'ZK154.1',-type=>"3'-UTR");
my $trans2 = $ftr->new(-start=>100,-end=>50,-name=>'ZK154.2',-type=>'CDS');
my $trans3 = 
$ftr->new(-start=>350,-end=>225,-name=>'ZK154.3',-type=>'CDS', 
-source=>'a');
my $trans4 = 
$ftr->new(-start=>650,-end=>500,-name=>'ZK154.3',-type=>'CDS', 
-source=>'a');
my $trans5 = 
$ftr->new(-start=>700,-end=>650,-name=>'ZK154.3',-type=>"5'-UTR");
my $trans  = 
$ftr->new(-segments=>[$trans1,$trans2,$trans3,$trans4,$trans5]);

# hightlight
my $trans31 = 
$ftr->new(-start=>240,-end=>450,-name=>'hightlight1',-type=>'background', 
-source=>'a');
my $trans41 = 
$ftr->new(-start=>650,-end=>600,-name=>'hightlight2',-type=>'multihourglass', 
-source=>'b');
 
my $panel= Bio::Graphics::Panel->new(-width=>1200,
                                             -length=>1050,
                                             -start =>0,
                                             -pad_left=>12,
                                             -pad_right=>12
                                             -postgrid=>\&gap_it);

sub gap_it {
     my $gd    = shift;
     my $panel = shift;
     my ($gap_start,$gap_end) = $panel->location2pixel(500,600);
     my $top                  = $panel->top;
     my $bottom               = $gd->height, #panel->bottom;
     my $gray                 = $panel->translate_color('red');
     $gd->filledRectangle($gap_start,$top,$gap_end,$bottom,$gray);
}
# the following track works as I expected in bioperl 1.2.3, but not in 1.5 and 1.6
#$panel->add_track([$trans41,$trans31],
#          -glyph   => 'background',
#                  -block_bgcolor => sub{return (shift->source eq 'a')?'#cccccc':'#fffc22'},
#                  );

$panel->add_track($ftr->new(-start=>100,-end=>1000),
                  -glyph=>'arrow',
                  -double=>1,
                  -tick=>2);

$panel->add_track($trans,
          -glyph   => 'transcript2', # 'transcript2', #process_5utr',
                  -fgcolor => 'darkred',
                  -bgcolor => 'darkred',
                  -title => '$source',
                  -link => 
'http://www.ensembl.org/Homo_sapiens/transview?transcript=$name',  #EnsEMBL
                  );
   
print $panel->png;

# the following part works in bioperl 1.5 and 1.6, but does not work in Bioperl 1.2.3
my $map = $panel->create_web_map("image");
$panel->finished();
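
A note on the script above, offered as a guess rather than a confirmed diagnosis: there is no comma after -pad_right=>12 in the Panel->new call, so Perl reads the next line as a subtraction (12 minus the string 'postgrid'), and the panel may never receive a -postgrid option at all. The plain-Perl sketch below does not use Bio::Graphics; it only compares the two argument lists. The helper name has_postgrid is made up for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

sub gap_it { }    # stand-in for the real drawing callback

# Report whether an argument list contains a -postgrid key.
# (has_postgrid is a hypothetical helper, not part of Bio::Graphics.)
sub has_postgrid {
    my @args = @_;
    return scalar grep { defined($_) && !ref($_) && $_ eq '-postgrid' } @args;
}

my (@without_comma, @with_comma);
{
    no warnings 'numeric';    # silence "isn't numeric in subtraction"
    # As posted: "12 -postgrid" parses as 12 minus the string 'postgrid',
    # so the code reference is left without a key of its own.
    @without_comma = (-pad_left => 12, -pad_right => 12
                      -postgrid => \&gap_it);
}

# With the comma restored, -postgrid is a key of its own.
@with_comma = (-pad_left => 12, -pad_right => 12,
               -postgrid => \&gap_it);

printf "as posted: %d args, -postgrid %s\n",
    scalar(@without_comma),
    has_postgrid(@without_comma) ? 'present' : 'missing';
printf "corrected: %d args, -postgrid %s\n",
    scalar(@with_comma),
    has_postgrid(@with_comma) ? 'present' : 'missing';
```

If this is indeed the cause, adding the comma after -pad_right=>12 should be enough for the callback to be called; whether the rectangle then lands where expected (for example with respect to pad_left) is a separate question.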


Scott Cain wrote:
> Hi Xianjun,
>
> I understand what you want to do, as the current version of gbrowse
> does this, which uses bioperl 1.6.  Without digging through the code,
> I can't tell you exactly how this works and you didn't send your code
> that uses this callback, so I can't try it either.
>
> One thing that is different between your code and gbrowse is that each
> of the tracks is actually a separate panel (to allow track dragging),
> so it is possible that this sort of callback doesn't work for
> Bio::Graphics any more.
>
> Scott
>
> On Saturday, June 13, 2009, Xianjun Dong <Xianjun.Dong at bccs.uib.no> wrote:
>   
>> Hi, Scott
>>
>> Thanks for your reply first.
>>
>> I still have question: I dig out the code from GBrowse (which I paste below). Method make_postgrid_callback gets all highlight region and then use hilite_regions_closure function to draw them out, using the following GD function:
>>
>> $gd->filledRectangle($left+$start,0,$left+$end,$bottom,
>>                           $panel->translate_color($h_color));
>>
>> where the $bottom=$panel->bottom. This is the only difference from my code, where I use $gd->height. I guess they are almost same (except the pad_bottom), we can see this in the code of http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Graphics/Panel.html#CODE22
>>
>> OK. Anyway, I change to use $panel->bottom, instead of $gd->height, for my highlight regions. The output is same, when using the library of Bioperl 1.6 (or 1.5). You can see the attached image ("test.bioperl1.6.png")
>>
>> OK. I might have not explained my question explicitly. My question is: if using bioperl 1.2.3 (actually the Bio::Graphics in bioperl 1.2.3), I can get the right image I want (see the attached file "test.bioperl1.2.3.png"), where the highlight range will go from the roof to the floor. While in bioperl 1.5 (or 1.6), I only can see the highlight region in its own track, not the whole panel. OK, did I explain clearly now? you can see the difference of the two images.
>>
>> [I am not sure the mailist allow to attach image, otherwise, I put them in the following links:
>> test.bioperl1.6.png:    http://translog.genereg.net/test.bioperl1.6.png
>> test.bioperl1.2.3.png:    http://translog.genereg.net/test.bioperl1.2.3.png ]
>>
>> You can test it and see the difference if you have both 1.2.3 and 1.6 on your computer?
>>
>> Really want to know how this works in bioperl 1.2.3 (Even though this might be a bug at that version, or whatever)
>>
>> Thanks
>>
>> Xianjun
>> =============================================
>>
>> # this generates the callback for highlighting a region
>> sub make_postgrid_callback {
>>  my $settings = shift;
>>  return unless ref $settings->{h_region};
>>
>>  my @h_regions = map {
>>    my ($h_ref,$h_start,$h_end,$h_color) = /^(.+):(\d+)\.\.(\d+)(?:@(\S+))?/;
>>    defined($h_ref) && $h_ref eq $settings->{ref}
>>                 ? [$h_start,$h_end,$h_color||'lightgrey']
>>                 : ()
>>  }
>>    @{$settings->{h_region}};
>>
>>  return unless @h_regions;
>>  return hilite_regions_closure(@h_regions);
>> }
>>
>> # this subroutine generates a Bio::Graphics::Panel callback closure
>> # suitable for hilighting a region of a panel.
>> # The args are a list of [start,end,color]
>> sub hilite_regions_closure {
>>  my @h_regions = @_;
>>
>>  return sub {
>>    my $gd     = shift;
>>    my $panel  = shift;
>>    my $left   = $panel->pad_left;
>>    my $top    = $panel->top;
>>    my $bottom = $panel->bottom;
>>    for my $r (@h_regions) {
>>      my ($h_start,$h_end,$h_color) = @$r;
>>      my ($start,$end) = $panel->location2pixel($h_start,$h_end);
>>      if ($end-$start <= 1) { $end++; $start-- } # so that we always see something
>>      # assuming top is 0 so as to ignore top padding
>>      $gd->filledRectangle($left+$start,0,$left+$end,$bottom,
>>                           $panel->translate_color($h_color));
>>    }
>>  };
>> }
>>
>>
>> Scott Cain wrote:
>>
>> Hello Xianjun,
>>
>> I don't think that approach will work.  What you almost certainly need
>> to do is a postgrid callback that does the drawing of the highlighted
>> region.  For example code of how to do this, take a look at the
>> make_postgrid_callback subroutine in GBrowse 1.69.  The option
>> -postgrid is a method of Bio::Graphics::Panel.
>>
>> Scott
>>
>>
>>
>>
>> On Fri, Jun 12, 2009 at 4:38 PM, Xianjun Dong<Xianjun.Dong at bccs.uib.no> wrote:
>>
>>
>> HI,
>>
>> I am not sure this is the right place I can get help.
>>
>> I've suffered by a problem for several days: I want to highlight parts of
>> regions in my track, using a different background color. To do that, I
>> defined a glyph named "background", based on the
>> 'Bio::Graphics::Glyph::generic' module. I override the draw_component()
>> method, by adding code like below:
>>
>> $gd->filledRectangle($left,0,$right,$gd->height,
>> $self->factory->translate_color($color));
>>
>> # the script is pasted at the end
>>
>> This will draw a rectangle with top=0, bottom=$gd->height. I made the
>> highlight regions into a list of features, and add_track with
>> -glyph=>'background'. (see the following script, test.pl) This really works
>> as I expect, which will add a colored block at background of all tracks in a
>> panel (including the ruler arrow). You can see the output image in attached
>> file "test.bioperl1.2.3.png"
>>
>> Now, the problem comes: when I switch to Bioperl 1.5 (or 1.6), it does not
>> work. Well, it works, but the highlight part only shrink to a low height,
>> instead of covering all tracks in the panel. I also attached the output
>> here, see the file "test.bioperl1.6.png".
>>
>> I tried to think about the reason, the 'background' module is based on the
>> generic module. What can cause the difference? Is it because $gd->height is
>> different, or the tracks followed with 'background' track can not draw from
>> the first position?
>>
>> Well. I can stick to use Bioperl 1.2.3 to avoid the problem. ("Smart person
>> solve problem, wise person avoid problem"...) But another problem is coming:
>> Bio::Graphics in Bioperl 1.2.3 does not support $panel->create_web_map()
>> function, which means I have to use some higher version if I want to create
>> web map for my graphics, but then I have to give up using highlight
>> background.
>>
>> OK. It's long enough for my first-time submission here. Hope someone can
>> throw me some clue.
>>
>> Thanks ahead!!
>>
>> Xianjun
>>
>>
>> ==================== test.pl =======================
>> #!/usr/bin/perl
>>
>> use strict;
>> use lib "$ENV{HOME}/lib";
>>
>> use Bio::Graphics;
>> use Bio::Graphics::Feature;
>> my $ftr= 'Bio::Graphics::Feature';
>>
>> # processed_transcript
>> my $trans1 =
>> $ftr->new(-start=>50,-end=>10,-name=>'ZK154.1',-type=>"3'-UTR");
>> my $trans2 = $ftr->new(-start=>100,-end=>50,-name=>'ZK154.2',-type=>'CDS');
>> my $trans3 = $ftr->new(-start=>350,-end=>225,-name=>'ZK154.3',-type=>'CDS',
>> -source=>'a');
>> my $trans4 = $ftr->new(-start=>650,-end=>500,-name=>'ZK154.3',-type=>'CDS',
>> -source=>'a');
>> my $trans5 =
>> $ftr->new(-start=>700,-end=>650,-name=>'ZK154.3',-type=>"5'-UTR");
>> my $trans  =
>> $ftr->new(-segments=>[$trans1,$trans2,$trans3,$trans4,$trans5]);
>>
>> # hightlight
>> my $trans31 =
>> $ftr->new(-start=>240,-end=>450,-name=>'hightlight1',-type=>'background',
>> -source=>'a');
>> my $trans41 =
>> $ftr->new(-start=>650,-end=>600,-name=>'hightlight2',-type=>'multihourglass',
>> -source=>'b');
>>
>> my $panel= Bio::Graphics::Panel->new(-width=>1200,
>>                                            -length=>1050,
>>                                            -start =>0,
>>                                            -pad_left=>12,
>>                                            -pad_right=>12);
>>
>> # the following track works as I expected in bioperl 1.2.3, but not in 1.5
>> and 1.6
>> $panel->add_track([$trans41,$trans31],
>>         -glyph   => 'background',
>>                 -block_bgcolor => sub{return (shift->source eq
>> 'a')?'#cccccc':'#fffc22'},
>>                 );
>>
>> $panel->add_track($ftr->new(-start=>100,-end=>1000),
>>                 -glyph=>'arrow',
>>                 -double=>1,
>>                 -tick=>2);
>>
>> $panel->add_track($trans,
>>         -glyph   => 'transcript2', # 'transcript2', #process_5utr',
>>                 -fgcolor => 'darkred',
>>                 -bgcolor => 'darkred',
>>                 -title => '$source',
>>                 -link =>
>> 'http://www.ensembl.org/Homo_sapiens/transview?transcript=$name',  #EnsEMBL
>>                 );
>>  print $panel->png;
>>
>> # the following part works in bioperl 1.5 and 1.6, but not work in Bioperl
>> 1.2.3
>> my $map = $panel->create_web_map("image");
>> $panel->finished();
>>
>> 1;
>>
>> ==================== background.pm =======================
>> package Bio::Graphics::Glyph::background;
>>
>> use strict;
>> use base 'Bio::Graphics::Glyph::generic';
>> sub pad_top{
>>  return 0;
>> }
>>
>> sub draw_component {
>>  my $self = shift;
>>  #$self->SUPER::draw_component(@_);
>>  my ($gd,$dx,$dy) = @_;
>>  my ($left,$top,$right,$bottom) = $self->bounds($dx,$dy);
>>
>>  # draw an arrow to indicate the direction of transcript
>>  my $color = $self->option('block_bgcolor') || '#cccccc';
>>  $gd->filledRectangle($left,0,$right,$gd->height,
>> $self->factory->translate_color($color));
>> }
>>
>> 1;
>>
>> --
>> ==========================================
>> Xianjun Dong
>> PhD student, Lenhard group
>> Computational Biology Unit
>> Bergen Center for Computational Science
>> University of Bergen
>> Hoyteknologisenteret, Thormohlensgate 55
>> N-5008 Bergen, Norway
>> E-mail: xianjun.dong at bccs.uib.no
>> Tel.: +47 555 84022
>> Fax : +47 555 84295
>> ==========================================
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> --
>> ==========================================
>> Xianjun Dong
>> PhD student, Lenhard group
>> Computational Biology Unit
>> Bergen Center for Computational Science
>> University of Bergen
>> Hoyteknologisenteret, Thormohlensgate 55
>> N-5008 Bergen, Norway
>> E-mail: xianjun.dong at bccs.uib.no
>> Tel.: +47 555 84022
>> Fax : +47 555 84295
>> ==========================================
>>
>>
>>     
>
>   

-- 
==========================================
Xianjun Dong
PhD student, Lenhard group
Computational Biology Unit
Bergen Center for Computational Science
University of Bergen
Hoyteknologisenteret, Thormohlensgate 55
N-5008 Bergen, Norway
E-mail: xianjun.dong at bccs.uib.no
Tel.: +47 555 84022
Fax : +47 555 84295
==========================================


From maj at fortinbras.us  Sun Jun 14 00:35:18 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Sun, 14 Jun 2009 00:35:18 -0400
Subject: [Bioperl-l] Bio::Restriction::Analysis. Exception when
	usingrebasefile.
In-Reply-To: <0F0E9DCC-BE3A-4918-8226-B9988B022FEB@illinois.edu>
References: <4A2F622D.5060500@ron.dk><2F52B1CED1374763822BF3AD1D283B3B@NewLife>
	<0F0E9DCC-BE3A-4918-8226-B9988B022FEB@illinois.edu>
Message-ID: <A9819F7FF3894C768CF89C36CB689942@NewLife>

All-

I'm finding this is requiring a pretty substantial refactor and
rationalization. I have opened a branch at
REPOS/bioperl-live/branches/restriction-refactor
and am making commits at will there (won't Rob be pleased!).
When it appears to be passing tests, I'll let Chris know (on list),
and he can decide on its mergability, and brave users could try
it out by downloading Bio/Restriction (deeply) via subversion.

My running commentary is at Bug #2855.
MAJ

----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "Mark A. Jensen" <maj at fortinbras.us>
Cc: <bioperl-l at lists.open-bio.org>; "Rasmus Ory Nielsen" <ron at ron.dk>
Sent: Thursday, June 11, 2009 10:19 AM
Subject: Re: [Bioperl-l] Bio::Restriction::Analysis. Exception when 
usingrebasefile.


> Mark,
>
> Feel free to take it up.  It's probably a good idea to start a bug  report for 
> tracking if it proves to be thornier to fix than expected.
>
> chris
>
> On Jun 11, 2009, at 8:17 AM, Mark A. Jensen wrote:
>
>> Rasmus et al-
>>
>> This looks like a bug. A quick debug shows it's barfing on  'AarI' (as it 
>> cycles through
>> all enzymes apparently creating a global cut map). AarI has a  recognition 
>> sequence of
>>
>> CACCTGC (in $enz->seq->seq)
>>
>> but a cut site of
>>
>> CACCTGCNNNN^ (in $enz->seq->site)
>>
>> The bad parm '11' refers to the end of the cut site sequence, but  the 
>> routine
>> B:R:Analysis::_cuts is attempting to split the 7-symbol recognition 
>> sequence,
>> and so throws.
>>
>> This surprises me. Core, let me know if you want me to take this on,  or
>> if the module author can fix it quicker.
>>
>> cheers,
>> Mark
>>
>> ----- Original Message ----- From: "Rasmus Ory Nielsen" <ron at ron.dk>
>> To: <bioperl-l at lists.open-bio.org>
>> Sent: Wednesday, June 10, 2009 3:35 AM
>> Subject: [Bioperl-l] Bio::Restriction::Analysis. Exception when  using 
>> rebasefile.
>>
>>
>>> Hi,
>>>
>>> This is my first time using bioperl for restriction analysis, so  please 
>>> bear with me, if this is a FAQ.
>>>
>>> I downloaded withrefm.906 from ftp://ftp.neb.com/pub/rebase/ and  created 
>>> the script shown at the bottom of the mail.
>>> My bioperl version is bioperl-live nightly from 09-Jun-2009.
>>>
>>> The scripts throws an exception - see below. But, if I comment out  the 
>>> '-enzymes' argument, so it uses the built-in collection of  enzymes, it 
>>> works.
>>>
>>> My problem is, that I need to use some of the enzymes that are only 
>>> available in rebase. So how do I get this working?
>>>
>>> Thanks for your attention.
>>>
>>> Best regards,
>>> Rasmus Ory Nielsen
>>>
>>>
>>> ############################################################
>>> Output from the script:
>>> ############################################################
>>>
>>> [roni at ksdhcp ~]$ ./restriction_test.pl
>>>
>>> --------------------- WARNING ---------------------
>>> MSG: The enzyme name CviKI-1 was changed to CviKI-I
>>> ---------------------------------------------------
>>>
>>> ------------- EXCEPTION -------------
>>> MSG: Bad end parameter (11). End must be less than the total length  of 
>>> sequence (total=7)
>>> STACK Bio::PrimarySeq::subseq /usr/local/lib/perl5/site_perl/5.10.0/ 
>>> Bio/PrimarySeq.pm:401
>>> STACK Bio::Restriction::Analysis::_enzyme_sites /usr/local/lib/ 
>>> perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:900
>>> STACK Bio::Restriction::Analysis::_cuts /usr/local/lib/perl5/ 
>>> site_perl/5.10.0/Bio/Restriction/Analysis.pm:801
>>> STACK Bio::Restriction::Analysis::cut /usr/local/lib/perl5/ 
>>> site_perl/5.10.0/Bio/Restriction/Analysis.pm:379
>>> STACK Bio::Restriction::Analysis::fragment_maps /usr/local/lib/ 
>>> perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:515
>>> STACK toplevel ./restriction_test.pl:30
>>> -------------------------------------
>>>
>>> [roni at ksdhcp ~]$
>>>
>>>
>>> ############################################################
>>> Output from the script with the '-enzymes' argument commented out
>>> ############################################################
>>>
>>> [roni at ksdhcp ~]$ ./restriction_test.pl
>>>
>>> --------------------- WARNING ---------------------
>>> MSG: The enzyme name CviKI-1 was changed to CviKI-I
>>> ---------------------------------------------------
>>> $VAR1 = [
>>>          {
>>>            'seq' => 'CTCGACCGTTAGCAA',
>>>            'end' => 15,
>>>            'start' => '1'
>>>          },
>>>          {
>>>            'seq' => 'AGCTTTCTACCGTTATCGT',
>>>            'end' => 34,
>>>            'start' => '16'
>>>          }
>>>        ];
>>> [roni at ksdhcp ~]$
>>>
>>> ############################################################
>>>
>>> #!/usr/bin/perl
>>> use strict;
>>> use warnings;
>>> use Bio::PrimarySeq;
>>> use Bio::Restriction::IO;
>>> use Bio::Restriction::Analysis;
>>> use Data::Dumper;
>>>
>>> # create seq obj
>>> my $seqobj = new Bio::PrimarySeq(
>>>    -seq        => 'CTCGACCGTTAGCAAAGCTTTCTACCGTTATCGT',
>>>    -primary_id => 'test',
>>>    -molecule   => 'dna'
>>> );
>>>
>>> # read rebase file
>>> my $rebase_io = Bio::Restriction::IO->new(
>>>    -file   => 'withrefm.906',
>>>    -format => 'withrefm',
>>> );
>>> my $rebase_collection = $rebase_io->read;
>>>
>>> # start restriction analysis
>>> my $restriction_analysis = Bio::Restriction::Analysis->new(
>>>    -seq     => $seqobj,
>>>    -enzymes => $rebase_collection,    # it works with this line  commented 
>>> out
>>> );
>>>
>>> # retrieve fragment maps
>>> my @fragment_maps = $restriction_analysis->fragment_maps('HindIII');
>>> print Dumper \@fragment_maps;
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> 


From rmb32 at cornell.edu  Sun Jun 14 21:57:45 2009
From: rmb32 at cornell.edu (Robert Buels)
Date: Sun, 14 Jun 2009 18:57:45 -0700
Subject: [Bioperl-l] Bio::Restriction::Analysis. Exception
	when	usingrebasefile.
In-Reply-To: <A9819F7FF3894C768CF89C36CB689942@NewLife>
References: <4A2F622D.5060500@ron.dk><2F52B1CED1374763822BF3AD1D283B3B@NewLife>	<0F0E9DCC-BE3A-4918-8226-B9988B022FEB@illinois.edu>
	<A9819F7FF3894C768CF89C36CB689942@NewLife>
Message-ID: <4A35AA99.2080305@cornell.edu>

Mark A. Jensen wrote:
> I'm finding this is requiring a pretty substantial refactor and
> rationalization. I have opened a branch at
> REPOS/bioperl-live/branches/restriction-refactor
> and am making commits at will there (won't Rob be pleased!).
Oh Mark, you are so agile!

> When it appears to be passing tests, I'll let Chris know (on list),
> and he can decide on its mergability, and brave users could try
> it out by downloading Bio/Restriction (deeply) via subversion.
If it's passing tests but still has bugs, make sure you add tests for 
the additional bugs you find!

Rob


From maj at fortinbras.us  Sun Jun 14 22:02:37 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Sun, 14 Jun 2009 22:02:37 -0400
Subject: [Bioperl-l] Bio::Restriction::Analysis.
	Exceptionwhen	usingrebasefile.
In-Reply-To: <4A35AA99.2080305@cornell.edu>
References: <4A2F622D.5060500@ron.dk><2F52B1CED1374763822BF3AD1D283B3B@NewLife>	<0F0E9DCC-BE3A-4918-8226-B9988B022FEB@illinois.edu><A9819F7FF3894C768CF89C36CB689942@NewLife>
	<4A35AA99.2080305@cornell.edu>
Message-ID: <FFDC29BB104149BE95840F1AD1B61827@NewLife>


----- Original Message ----- 
From: "Robert Buels" <rmb32 at cornell.edu>
To: "BioPerl List" <bioperl-l at lists.open-bio.org>
Sent: Sunday, June 14, 2009 9:57 PM
Subject: Re: [Bioperl-l] Bio::Restriction::Analysis. Exceptionwhen 
usingrebasefile.


> Mark A. Jensen wrote:
>> I'm finding this is requiring a pretty substantial refactor and
>> rationalization. I have opened a branch at
>> REPOS/bioperl-live/branches/restriction-refactor
>> and am making commits at will there (won't Rob be pleased!).
> Oh Mark, you are so agile!
ha!
>
>> When it appears to be passing tests, I'll let Chris know (on list),
>> and he can decide on its mergability, and brave users could try
>> it out by downloading Bio/Restriction (deeply) via subversion.
> If it's passing tests but still has bugs, make sure you add tests for the 
> additional bugs you find!

mais, bien sur; plenty new tests coming-- thanks Rob-
MAJ

>
> Rob
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> 


From shalabh.sharma7 at gmail.com  Mon Jun 15 16:06:31 2009
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Mon, 15 Jun 2009 16:06:31 -0400
Subject: [Bioperl-l] sub sampling
Message-ID: <9fcc48c70906151306k5df96b69k1f4d0f1466204c5a@mail.gmail.com>

Hi All,
I was just wondering whether there is any module in BioPerl that does
subsampling. I have a file like this:

369859  0477    93
163417  1348    92
228122  0176    88
232792  0050    93
239636  1850    95
300069  0048    96
244108  0046    91
199087  0055    93
206209  0048    96
-              -         -
-              -         -

The file contains around 100,000 lines, and I want to draw a 25% sample
from it. Is there any way I can do this in BioPerl?

Thanks
Shalabh
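
For what it is worth, this does not seem to need anything BioPerl-specific. A minimal plain-Perl sketch follows, assuming the file fits in memory (100,000 short lines should) and that "a sample of 25%" means a random quarter of the lines, drawn without replacement and printed in their original order. The subroutine name subsample and taking the file name from the command line are choices made for this example only.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Return a random subsample holding $fraction of the given lines,
# drawn without replacement and kept in their original order.
sub subsample {
    my ($fraction, @lines) = @_;
    my $n = int(@lines * $fraction + 0.5);    # round to nearest line count
    $n = @lines if $n > @lines;               # never ask for more than we have
    return () if $n < 1;
    my @picked = (shuffle 0 .. $#lines)[0 .. $n - 1];
    return @lines[ sort { $a <=> $b } @picked ];
}

# Usage: perl subsample.pl data.txt > sample.txt
my $file = shift @ARGV;
if (defined $file) {
    open my $in, '<', $file or die "Cannot open $file: $!";
    my @lines = <$in>;
    close $in;
    print subsample(0.25, @lines);
}
```

Shuffling the line indices rather than the lines themselves is what lets the sample be written back out in file order; drop the sort if the order does not matter.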


From maj at fortinbras.us  Mon Jun 15 19:49:58 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 15 Jun 2009 19:49:58 -0400
Subject: [Bioperl-l] Bio::Restriction refactor [Was:
	Bio::Restriction::Analysis. Exception when using rebasefile.]
In-Reply-To: <4A2F622D.5060500@ron.dk>
References: <4A2F622D.5060500@ron.dk>
Message-ID: <E80E6C1BC08D4E338739148BFE9BFAC0@NewLife>

Dear All,

The revamped Bio::Restriction::* in branch

REPOS/bioperl-live/branches/restriction-refactor

passes all existing tests, including those in t/Restriction.
New tests will be added within the next day or so.
The original bug occurred because only a subset of
the possible rebase withrefm-formatted enzymes were
handled; it choked on freshly-downloaded rebase
files because of this.

The refactored version now handles *all* rebase types,
including those of rebase forms

XXX^X             [ intrasite cutters, the main types
                    built in to base.pm ]
XXXX(m/n)         [ right-end extrasite cutters ]
(s/t)XXXX         [ left-end ditto ]
(s/t)XXXX(m/n)    [ double-end ditto ],

palindromic and non-palindromic, as well as multisite
enzymes that string together combinations of these
forms. Much rationalization (well, seems rational to me
anyway) and cruft removal in the affected code has also
occurred. itype2.pm has been updated as well, to
conform to the refactoring.
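
As a rough illustration of the four site notations listed above, the
sketch below sorts a REBASE-style site string into one of those forms
using plain regexes. The sub name classify_site and the labels it
returns are invented for this example; it is not the code in the
restriction-refactor branch, and it assumes cut offsets are written as
integers in the (m/n) form.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::Restriction): classify a
# REBASE-style recognition site string by where its cut notation sits.
sub classify_site {
    my ($site) = @_;
    my $cut = qr/\(-?\d+\/-?\d+\)/;    # "(m/n)" cut offsets, e.g. (9/13)
    my $rec = qr/[A-Za-z]+/;           # recognition sequence letters

    return 'double-end extrasite' if $site =~ /^$cut$rec$cut$/;
    return 'left-end extrasite'   if $site =~ /^$cut$rec$/;
    return 'right-end extrasite'  if $site =~ /^$rec$cut$/;
    return 'intrasite'            if $site =~ /^[A-Za-z]*\^[A-Za-z]*$/;
    return 'unknown';
}

# Example site strings written in the notations above.
for my $site ('A^AGCTT', 'GGATG(9/13)', '(10/15)ACNNNNGTAYC(12/7)') {
    printf "%-26s %s\n", $site, classify_site($site);
}
```

Multisite enzymes, which string several of these forms together, would
need the string split into its component sites first; that step is left
out here.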

If you're dying to try this now, get a working copy
of the branch like so

$ svn co svn://code.open-bio.org/bioperl/bioperl-live/branches/restriction-refactor bioperl-rr
$ cd bioperl-rr
$ perl Build.PL
$ ./Build test
$ ./Build install

This will only hammer your current installation in the
$SITE_LIB/Bio/Restriction path; I worked only on
a sparse checkout of the necessary files. To revert to your
old install, do

$ cd $MY_OLD_BIOPERL_WORKINGDIR
$ ./Build install

[In the possible event that these instructions are in error,
there will be a response on this list in a matter of
milliseconds, so stand by.]

Happy coding-
Mark


----- Original Message ----- 
From: "Rasmus Ory Nielsen" <ron at ron.dk>
To: <bioperl-l at lists.open-bio.org>
Sent: Wednesday, June 10, 2009 3:35 AM
Subject: [Bioperl-l] Bio::Restriction::Analysis. Exception when using 
rebasefile.


> Hi,
>
> This is my first time using bioperl for restriction analysis, so please bear 
> with me, if this is a FAQ.
>
> I downloaded withrefm.906 from ftp://ftp.neb.com/pub/rebase/ and created the 
> script shown at the bottom of the mail.
> My bioperl version is bioperl-live nightly from 09-Jun-2009.
>
> The scripts throws an exception - see below. But, if I comment out the 
> '-enzymes' argument, so it uses the built-in collection of enzymes, it works.
>
> My problem is, that I need to use some of the enzymes that are only available 
> in rebase. So how do I get this working?
>
> Thanks for your attention.
>
> Best regards,
> Rasmus Ory Nielsen
>
>
> ############################################################
> Output from the script:
> ############################################################
>
> [roni at ksdhcp ~]$ ./restriction_test.pl
>
> --------------------- WARNING ---------------------
> MSG: The enzyme name CviKI-1 was changed to CviKI-I
> ---------------------------------------------------
>
> ------------- EXCEPTION -------------
> MSG: Bad end parameter (11). End must be less than the total length of 
> sequence (total=7)
> STACK Bio::PrimarySeq::subseq 
> /usr/local/lib/perl5/site_perl/5.10.0/Bio/PrimarySeq.pm:401
> STACK Bio::Restriction::Analysis::_enzyme_sites 
> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:900
> STACK Bio::Restriction::Analysis::_cuts 
> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:801
> STACK Bio::Restriction::Analysis::cut 
> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:379
> STACK Bio::Restriction::Analysis::fragment_maps 
> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:515
> STACK toplevel ./restriction_test.pl:30
> -------------------------------------
>
> [roni at ksdhcp ~]$
>
>
> ############################################################
> Output from the script with the '-enzymes' argument commented out
> ############################################################
>
> [roni at ksdhcp ~]$ ./restriction_test.pl
>
> --------------------- WARNING ---------------------
> MSG: The enzyme name CviKI-1 was changed to CviKI-I
> ---------------------------------------------------
> $VAR1 = [
>           {
>             'seq' => 'CTCGACCGTTAGCAA',
>             'end' => 15,
>             'start' => '1'
>           },
>           {
>             'seq' => 'AGCTTTCTACCGTTATCGT',
>             'end' => 34,
>             'start' => '16'
>           }
>         ];
> [roni at ksdhcp ~]$
>
> ############################################################
>
> #!/usr/bin/perl
> use strict;
> use warnings;
> use Bio::PrimarySeq;
> use Bio::Restriction::IO;
> use Bio::Restriction::Analysis;
> use Data::Dumper;
>
> # create seq obj
> my $seqobj = new Bio::PrimarySeq(
>     -seq        => 'CTCGACCGTTAGCAAAGCTTTCTACCGTTATCGT',
>     -primary_id => 'test',
>     -molecule   => 'dna'
> );
>
> # read rebase file
> my $rebase_io = Bio::Restriction::IO->new(
>     -file   => 'withrefm.906',
>     -format => 'withrefm',
> );
> my $rebase_collection = $rebase_io->read;
>
> # start restriction analysis
> my $restriction_analysis = Bio::Restriction::Analysis->new(
>     -seq     => $seqobj,
>     -enzymes => $rebase_collection,    # it works with this line commented out
> );
>
> # retrieve fragment maps
> my @fragment_maps = $restriction_analysis->fragment_maps('HindIII');
> print Dumper \@fragment_maps;
>
> 


From maj at fortinbras.us  Mon Jun 15 20:07:21 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 15 Jun 2009 20:07:21 -0400
Subject: [Bioperl-l] sub sampling
In-Reply-To: <9fcc48c70906151306k5df96b69k1f4d0f1466204c5a@mail.gmail.com>
References: <9fcc48c70906151306k5df96b69k1f4d0f1466204c5a@mail.gmail.com>
Message-ID: <A030148C139446DAB1DEE791A4EC2D3B@NewLife>

Shalabh
If you want to do sampling with replacement
this is not bad (if you trust rand() ):

 # open your file into $my_infile, then
 my @lines = <$my_infile>;

 my $num_samps = 10;
 my $sample_size_pc = 0.25;
 my @samples;

 for (1..$num_samps) {
    # each sample is a list of random line indices, drawn with replacement
    push @samples, [ map { int( @lines * rand ) } ( 1..int($sample_size_pc * @lines) ) ];
 }

# now, do something, fr'instance
 my @sample_pc;
 foreach (@samples) {
    my $pct=0;
    foreach my $line (@lines[ @$_ ]) {
        my @a = split(/\s+/,$line);
        $pct += $a[2];
    }
    $pct /= @$_;
    push @sample_pc, $pct;
 }
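
If what is wanted is a single 25% sample in which no line is picked
twice (sampling without replacement), shuffling the line indices is one
simple way to get it. The sketch below uses List::Util's shuffle; the
sub name subsample and the toy input lines are made up for the example
and stand in for the real 100,000-line file.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Return a random sample holding the given fraction of @$lines, with
# each line chosen at most once (sampling without replacement).
sub subsample {
    my ($lines, $fraction) = @_;
    my $n = int( $fraction * @$lines );
    return [] if $n < 1;
    my @shuffled = shuffle( 0 .. $#$lines );    # line indices in random order
    my @picked   = sort { $a <=> $b } @shuffled[ 0 .. $n - 1 ];    # restore file order
    return [ @{$lines}[@picked] ];
}

# Toy data standing in for the real file.
my @lines  = map { "$_\t0000\t90\n" } 1 .. 20;
my $sample = subsample( \@lines, 0.25 );
print @$sample;
```

This reads the whole file into memory, which should be fine at
100,000 lines; like the code above, it is only as random as rand().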

R's just better for some things, ain't it?
MAJ


----- Original Message ----- 
From: "shalabh sharma" <shalabh.sharma7 at gmail.com>
To: "bioperl-l" <Bioperl-l at lists.open-bio.org>
Sent: Monday, June 15, 2009 4:06 PM
Subject: [Bioperl-l] sub sampling


> Hi All,           I was just wondering that is there any module is bioperl
> that do subsampling?
> I have a file like this:
>
> 369859  0477    93
> 163417  1348    92
> 228122  0176    88
> 232792  0050    93
> 239636  1850    95
> 300069  0048    96
> 244108  0046    91
> 199087  0055    93
> 206209  0048    96
> -              -         -
> -              -         -
>
> which contain around 100,000 lines and i want to take out a sample of 25%
> from this file. Is there any way i can do this in Bioperl?
>
> Thanks
> Shalabh
>
> 


From Xianjun.Dong at bccs.uib.no  Sat Jun 13 08:05:53 2009
From: Xianjun.Dong at bccs.uib.no (Xianjun Dong)
Date: Sat, 13 Jun 2009 14:05:53 +0200
Subject: [Bioperl-l] background layer is not supported in Bioperl 1.6
 for Bio::Graphics::Glyph
In-Reply-To: <536f21b00906121829o7a5076f6hc57eb9a4dd815e48@mail.gmail.com>
References: <4A32BCDA.4080605@ii.uib.no>
	<536f21b00906121829o7a5076f6hc57eb9a4dd815e48@mail.gmail.com>
Message-ID: <4A339621.2060702@ii.uib.no>

Hi, Scott

First, thanks for your reply.

I still have a question. I dug the code out of GBrowse (pasted below). 
The make_postgrid_callback method collects all the highlight regions and 
then uses the hilite_regions_closure function to draw them, with the 
following GD call:

$gd->filledRectangle($left+$start,0,$left+$end,$bottom,
                           $panel->translate_color($h_color));

where $bottom = $panel->bottom. This is the only difference from my 
code, where I use $gd->height. I guess the two are almost the same (apart 
from pad_bottom), as can be seen in the code at 
http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Graphics/Panel.html#CODE22

Anyway, I changed my highlight regions to use $panel->bottom instead of 
$gd->height. The output is the same when using the Bioperl 1.6 (or 1.5) 
library; see the attached image ("test.bioperl1.6.png").

I may not have explained my question clearly, so here it is again: 
with bioperl 1.2.3 (actually the Bio::Graphics in bioperl 1.2.3), I 
get the image I want (see the attached file "test.bioperl1.2.3.png"), 
where the highlighted range runs from the top of the panel to the 
bottom. With bioperl 1.5 (or 1.6), the highlighted region appears only 
in its own track, not across the whole panel. You can see the 
difference between the two images.

[I am not sure whether the mailing list allows image attachments, so I 
have also put them at the following links:
test.bioperl1.6.png:    http://translog.genereg.net/test.bioperl1.6.png
test.bioperl1.2.3.png:  http://translog.genereg.net/test.bioperl1.2.3.png ]

If you have both 1.2.3 and 1.6 on your computer, you can test it and see 
the difference.

I would really like to know how this works in bioperl 1.2.3 (even if it 
was a bug in that version).

Thanks

Xianjun
=============================================

# this generates the callback for highlighting a region
sub make_postgrid_callback {
  my $settings = shift;
  return unless ref $settings->{h_region};

  my @h_regions = map {
    my ($h_ref,$h_start,$h_end,$h_color) = 
/^(.+):(\d+)\.\.(\d+)(?:@(\S+))?/;
    defined($h_ref) && $h_ref eq $settings->{ref}
                 ? [$h_start,$h_end,$h_color||'lightgrey']
                 : ()
  }
    @{$settings->{h_region}};

  return unless @h_regions;
  return hilite_regions_closure(@h_regions);
}

# this subroutine generates a Bio::Graphics::Panel callback closure
# suitable for hilighting a region of a panel.
# The args are a list of [start,end,color]
sub hilite_regions_closure {
  my @h_regions = @_;

  return sub {
    my $gd     = shift;
    my $panel  = shift;
    my $left   = $panel->pad_left;
    my $top    = $panel->top;
    my $bottom = $panel->bottom;
    for my $r (@h_regions) {
      my ($h_start,$h_end,$h_color) = @$r;
      my ($start,$end) = $panel->location2pixel($h_start,$h_end);
      if ($end-$start <= 1) { $end++; $start-- } # so that we always see something
      # assuming top is 0 so as to ignore top padding
      $gd->filledRectangle($left+$start,0,$left+$end,$bottom,
                           $panel->translate_color($h_color));
    }
  };
}
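
To make the h_region format concrete: judging from the regex in
make_postgrid_callback, each entry looks like "ref:start..end" with an
optional "@color" suffix. The stand-alone sketch below applies the same
pattern outside GBrowse. The sub name parse_h_region and the sample
strings are invented for illustration, and the "@" is escaped here only
to keep it from being read as an array sigil.

```perl
use strict;
use warnings;

# Parse "ref:start..end" or "ref:start..end@color" into its parts.
# Returns an empty list when the string does not match or when it
# names a different reference sequence than $ref.
sub parse_h_region {
    my ($spec, $ref) = @_;
    my ($h_ref, $h_start, $h_end, $h_color)
        = $spec =~ /^(.+):(\d+)\.\.(\d+)(?:\@(\S+))?/;
    return unless defined $h_ref && $h_ref eq $ref;
    return ($h_start, $h_end, $h_color || 'lightgrey');
}

my @specs = ('chr1:240..450@yellow', 'chr1:600..650', 'chr2:10..20');
for my $spec (@specs) {
    my @region = parse_h_region($spec, 'chr1');
    print "$spec => ", (@region ? join(',', @region) : 'skipped'), "\n";
}
```

Each returned triple matches the [start,end,color] shape that
hilite_regions_closure expects in its argument list.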


Scott Cain wrote:
> Hello Xianjun,
>
> I don't think that approach will work.  What you almost certainly need
> to do is a postgrid callback that does the drawing of the highlighted
> region.  For example code of how to do this, take a look at the
> make_postgrid_callback subroutine in GBrowse 1.69.  The option
> -postgrid is a method of Bio::Graphics::Panel.
>
> Scott
>
>
>
>
> On Fri, Jun 12, 2009 at 4:38 PM, Xianjun Dong<Xianjun.Dong at bccs.uib.no> wrote:
>   
>> HI,
>>
>> I am not sure this is the right place I can get help.
>>
>> I've suffered by a problem for several days: I want to highlight parts of
>> regions in my track, using a different background color. To do that, I
>> defined a glyph named "background", based on the
>> 'Bio::Graphics::Glyph::generic' module. I override the draw_component()
>> method, by adding code like below:
>>
>> $gd->filledRectangle($left,0,$right,$gd->height,
>> $self->factory->translate_color($color));
>>
>> # the script is pasted at the end
>>
>> This will draw a rectangle with top=0, bottom=$gd->height. I made the
>> highlight regions into a list of features, and add_track with
>> -glyph=>'background'. (see the following script, test.pl) This really works
>> as I expect, which will add a colored block at background of all tracks in a
>> panel (including the ruler arrow). You can see the output image in attached
>> file "test.bioperl1.2.3.png"
>>
>> Now, the problem comes: when I switch to Bioperl 1.5 (or 1.6), it does not
>> work. Well, it works, but the highlight part only shrink to a low height,
>> instead of covering all tracks in the panel. I also attached the output
>> here, see the file "test.bioperl1.6.png".
>>
>> I tried to think about the reason, the 'background' module is based on the
>> generic module. What can cause the difference? Is it because $gd->height is
>> different, or the tracks followed with 'background' track can not draw from
>> the first position?
>>
>> Well. I can stick to use Bioperl 1.2.3 to avoid the problem. ("Smart person
>> solve problem, wise person avoid problem"...) But another problem is coming:
>> Bio::Graphics in Bioperl 1.2.3 does not support $panel->create_web_map()
>> function, which means I have to use some higher version if I want to create
>> web map for my graphics, but then I have to give up using highlight
>> background.
>>
>> OK. It's long enough for my first-time submission here. Hope someone can
>> throw me some clue.
>>
>> Thanks ahead!!
>>
>> Xianjun
>>
>>
>> ==================== test.pl =======================
>> #!/usr/bin/perl
>>
>> use strict;
>> use lib "$ENV{HOME}/lib";
>>
>> use Bio::Graphics;
>> use Bio::Graphics::Feature;
>> my $ftr= 'Bio::Graphics::Feature';
>>
>> # processed_transcript
>> my $trans1 =
>> $ftr->new(-start=>50,-end=>10,-name=>'ZK154.1',-type=>"3'-UTR");
>> my $trans2 = $ftr->new(-start=>100,-end=>50,-name=>'ZK154.2',-type=>'CDS');
>> my $trans3 = $ftr->new(-start=>350,-end=>225,-name=>'ZK154.3',-type=>'CDS',
>> -source=>'a');
>> my $trans4 = $ftr->new(-start=>650,-end=>500,-name=>'ZK154.3',-type=>'CDS',
>> -source=>'a');
>> my $trans5 =
>> $ftr->new(-start=>700,-end=>650,-name=>'ZK154.3',-type=>"5'-UTR");
>> my $trans  =
>> $ftr->new(-segments=>[$trans1,$trans2,$trans3,$trans4,$trans5]);
>>
>> # hightlight
>> my $trans31 =
>> $ftr->new(-start=>240,-end=>450,-name=>'hightlight1',-type=>'background',
>> -source=>'a');
>> my $trans41 =
>> $ftr->new(-start=>650,-end=>600,-name=>'hightlight2',-type=>'multihourglass',
>> -source=>'b');
>>
>> my $panel= Bio::Graphics::Panel->new(-width=>1200,
>>                                            -length=>1050,
>>                                            -start =>0,
>>                                            -pad_left=>12,
>>                                            -pad_right=>12);
>>
>> # the following track works as I expected in bioperl 1.2.3, but not in 1.5
>> # and 1.6
>> $panel->add_track([$trans41,$trans31],
>>         -glyph   => 'background',
>>                 -block_bgcolor => sub{return (shift->source eq
>> 'a')?'#cccccc':'#fffc22'},
>>                 );
>>
>> $panel->add_track($ftr->new(-start=>100,-end=>1000),
>>                 -glyph=>'arrow',
>>                 -double=>1,
>>                 -tick=>2);
>>
>> $panel->add_track($trans,
>>         -glyph   => 'transcript2', # 'transcript2', #process_5utr',
>>                 -fgcolor => 'darkred',
>>                 -bgcolor => 'darkred',
>>                 -title => '$source',
>>                 -link =>
>> 'http://www.ensembl.org/Homo_sapiens/transview?transcript=$name',  #EnsEMBL
>>                 );
>>  print $panel->png;
>>
>> # the following part works in bioperl 1.5 and 1.6, but not work in Bioperl
>> # 1.2.3
>> my $map = $panel->create_web_map("image");
>> $panel->finished();
>>
>> 1;
>>
>> ==================== background.pm =======================
>> package Bio::Graphics::Glyph::background;
>>
>> use strict;
>> use base 'Bio::Graphics::Glyph::generic';
>> sub pad_top{
>>  return 0;
>> }
>>
>> sub draw_component {
>>  my $self = shift;
>>  #$self->SUPER::draw_component(@_);
>>  my ($gd,$dx,$dy) = @_;
>>  my ($left,$top,$right,$bottom) = $self->bounds($dx,$dy);
>>
>>  # fill a full-height rectangle in the highlight color
>>  my $color = $self->option('block_bgcolor') || '#cccccc';
>>  $gd->filledRectangle($left,0,$right,$gd->height,
>> $self->factory->translate_color($color));
>> }
>>
>> 1;
>>
>> --
>> ==========================================
>> Xianjun Dong
>> PhD student, Lenhard group
>> Computational Biology Unit
>> Bergen Center for Computational Science
>> University of Bergen
>> Hoyteknologisenteret, Thormohlensgate 55
>> N-5008 Bergen, Norway
>> E-mail: xianjun.dong at bccs.uib.no
>> Tel.: +47 555 84022
>> Fax : +47 555 84295
>> ==========================================
>>
>>
>>
>>     
>
>
>
>   

-- 
==========================================
Xianjun Dong
PhD student, Lenhard group
Computational Biology Unit
Bergen Center for Computational Science
University of Bergen
Hoyteknologisenteret, Thormohlensgate 55
N-5008 Bergen, Norway
E-mail: xianjun.dong at bccs.uib.no
Tel.: +47 555 84022
Fax : +47 555 84295
==========================================

-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.bioperl1.2.3.png
Type: image/png
Size: 2789 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20090613/3cf5d9c2/attachment-0006.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.bioperl1.6.png
Type: image/png
Size: 2365 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20090613/3cf5d9c2/attachment-0007.png>

From malcolm.cook at gmail.com  Tue Jun 16 04:06:36 2009
From: malcolm.cook at gmail.com (Malcolm Cook)
Date: Tue, 16 Jun 2009 03:06:36 -0500
Subject: [Bioperl-l]  Alignment->slice() issue?
Message-ID: <cfebe0020906160106s5f76501au9d67d8ee3062d166@mail.gmail.com>

Kevin,

I'm getting struck by this old issue you once coded around.

      http://bioperl.org/pipermail/bioperl-l/2007-January/024665.html

Any chance you could share your implementation with a fellow traveller?

Thanks,

Malcolm Cook
Stowers Institute for Medical Research


From remi.planel at free.fr  Tue Jun 16 10:57:27 2009
From: remi.planel at free.fr (Remi Planel)
Date: Tue, 16 Jun 2009 16:57:27 +0200
Subject: [Bioperl-l] Hits Object
Message-ID: <4A37B2D7.70807@free.fr>

Hi all,

Given a Bio::Search::Result::ResultI object (obtained by parsing a 
blast report), I couldn't find a way to filter out some of the 
associated hsps. By filter I mean eliminating, for each hit, the hsps 
I'm not interested in.

Can I modify the Result object directly?

Thanks,


From lsbrath at gmail.com  Tue Jun 16 11:42:37 2009
From: lsbrath at gmail.com (Mgavi Brathwaite)
Date: Tue, 16 Jun 2009 11:42:37 -0400
Subject: [Bioperl-l] error message: can't call method "next_hit" on and
	undefined value
Message-ID: <69367b8f0906160842y39c89b86m8733e5b18e5334e2@mail.gmail.com>

Hello,
My method produces an error message stating that it can't call a "next_hit"
method on an undefined value.

sub hu_bl2seq_parser{
	my ($maid, $maid_dir) = @_;
	# Get the report
	my $in = new Bio::SearchIO(-format => 'blast',
                           -file   => ">".$maid_dir."\\".$maid."aln_hu.aln",
						   -report_type => 'blastn');
	#open(my $out, ">$maid_dir/".$maid."aln_hu_parsed.out");					
	#my $out = Bio::AlignIO->newFh(-format => 'clustalw' );
	my $result=$in->next_result;
	my($hu_aln,$hu_mismatches);
	# Get info about the first hit
	my $hit = $result->next_hit;
	my $name = $hit->name;
	# get info about the first hsp of the first hit
	my $hsp = $hit->next_hsp;
	# get the alignment object
	my $aln = $hsp->get_aln;
	#my $percent_id = $hsp->percent_identity;
	#my $aln_length = $hsp->length('total');
	my @mismatches = $hsp->seq_inds('query','nomatch');
	my $aln_str="";
	# access the alignment string
	my $strIO=IO::String->new($aln_str);
	#  write the string alignio in clustalw format
	my $alnio = Bio::AlignIO->new(-format => 'clustalw', -fh=>$strIO);
	# now the actual alignment string is accessible for printing or, in
	# this case, moving to a db table
	$alnio->write_aln($aln);
	$hu_aln=$aln_str;
	$hu_mismatches = scalar @mismatches;
	return($hu_aln, $hu_mismatches);
}

The problem is at "my $hit = $result->next_hit;"
Any help will be appreciated.
LomSpace
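
One possible cause, offered as a guess rather than a diagnosis: the
-file value above starts with ">". As far as I know, BioPerl's I/O
classes treat a leading ">" the way Perl's two-argument open does, as a
request to open the file for writing. In that case there is nothing to
parse, next_result returns undef, and the report file itself may have
been emptied. The plain-Perl sketch below shows only the open()
behaviour, using a temporary stand-in file; it does not call
Bio::SearchIO.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a one-line stand-in for a report file.
my ($tmp_fh, $path) = tempfile( UNLINK => 1 );
print {$tmp_fh} "BLASTN stand-in report\n";
close $tmp_fh;

# Opened for reading, the line comes back.
open( my $in, '<', $path ) or die "Cannot read $path: $!";
my $first = <$in>;
close $in;

# A leading ">" in a two-argument open means "write": the file is
# truncated as soon as it is opened.
open( my $out, ">$path" ) or die "Cannot write $path: $!";
close $out;

print "read before: ", ( defined $first ? 'yes' : 'no' ), "\n";
print "empty after '>': ", ( -z $path ? 'yes' : 'no' ), "\n";
```

If this is what is happening, dropping the ">" from the -file value and
checking that $result is defined before calling next_hit would be the
first things to try, after regenerating the report file.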


From cjfields at illinois.edu  Tue Jun 16 14:14:18 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 16 Jun 2009 13:14:18 -0500
Subject: [Bioperl-l] Bio::Restriction refactor
	[Was:Bio::Restriction::Analysis. Exception when using rebasefile.]
In-Reply-To: <D4FBF1054F5C48C0ACBF81873A81E7AB@NewLife>
References: <4A2F622D.5060500@ron.dk>
	<E80E6C1BC08D4E338739148BFE9BFAC0@NewLife>
	<D4FBF1054F5C48C0ACBF81873A81E7AB@NewLife>
Message-ID: <9A7FE5B3-29A2-4FAE-AE5A-945064DD8DB6@illinois.edu>

I'll check out the branch sometime today and run tests on it.  Thanks  
for the hard work Mark!

chris

On Jun 16, 2009, at 12:58 PM, Mark A. Jensen wrote:

> Dear All,
>
> There are tests for the new functionality of Bio::Restriction
> now in t/Restriction on the branch, along with the withrefm.906
> in t/data that revealed the bug in RON's post. All tests pass without
> warnings on my machine (which is bioperl live, perl 5.10.10,
> under Vista/cygwin - yes, I still don't have a real computer).
> We're ready for a merge on my end.
>
> Thanks all for your silent assent to these machinations.
> cheers
> Mark
>
> ----- Original Message ----- From: "Mark A. Jensen"  
> <maj at fortinbras.us>
> To: "Rasmus Ory Nielsen" <ron at ron.dk>; <bioperl-l at lists.open-bio.org>
> Sent: Monday, June 15, 2009 7:49 PM
> Subject: [Bioperl-l] Bio::Restriction refactor  
> [Was:Bio::Restriction::Analysis. Exception when using rebasefile.]
>
>
>> Dear All,
>>
>> The revamped Bio::Restriction::* in branch
>>
>> REPOS/bioperl-live/branches/restriction-refactor
>>
>> passes all existing tests, including those in t/Restriction.
>> New tests will be added within the next day or so.
>> The original bug occurred because only a subset of
>> the possible rebase withrefm-formatted enzymes were
>> handled; it choked on freshly-downloaded rebase
>> files because of this.
>>
>> The refactored version now handles *all* rebase types,
>> including those of rebase forms
>>
>> XXX^X                [ intrasite cutters, the main types
>>                              built in to base.pm]
>> XXXX(m/n)          [ right-end extrasite cutters ]
>> (s/t)XXXX            [ left-end ditto ]
>> (s/t)XXXX(m/n)    [ double-end ditto],
>>
>> palindromic and non-palindromic, as well as multisite
>> enzymes that string together combinations of these
>> forms. Much rationalization (well, seems rational to me
>> anyway) and cruft removal in the affected code has also
>> occurred. itype2.pm has been updated as well, to
>> conform to the refactoring.
>>
>> If you're dying to try this now, get a working copy
>> of the branch like so
>>
>> $ svn co svn://code.open-bio.org/bioperl/bioperl-live/branches/ 
>> restriction-refactor bioperl-rr
>> $ cd bioperl-rr
>> $ perl Build.PL
>> $ ./Build test
>> $ ./Build install
>>
>> This will only hammer your current installation in the
>> $SITE_LIB/Bio/Restriction path; I worked only on
>> a sparse checkout of the necessary files. To revert to your
>> old install, do
>>
>> $ cd $MY_OLD_BIOPERL_WORKINGDIR
>> $ ./Build install
>>
>> [In the possible event that these instructions are in error,
>> there will be a response on this list in a matter of
>> milliseconds, so stand by.]
>>
>> Happy coding-
>> Mark
>>
>>
>>
>>
>> ----- Original Message ----- From: "Rasmus Ory Nielsen" <ron at ron.dk>
>> To: <bioperl-l at lists.open-bio.org>
>> Sent: Wednesday, June 10, 2009 3:35 AM
>> Subject: [Bioperl-l] Bio::Restriction::Analysis. Exception when  
>> using rebasefile.
>>
>>
>>> Hi,
>>>
>>> This is my first time using bioperl for restriction analysis, so  
>>> please bear with me, if this is a FAQ.
>>>
>>> I downloaded withrefm.906 from ftp://ftp.neb.com/pub/rebase/ and  
>>> created the script shown at the bottom of the mail.
>>> My bioperl version is bioperl-live nightly from 09-Jun-2009.
>>>
>>> The scripts throws an exception - see below. But, if I comment out  
>>> the '-enzymes' argument, so it uses the built-in collection of  
>>> enzymes, it works.
>>>
>>> My problem is, that I need to use some of the enzymes that are  
>>> only available in rebase. So how do I get this working?
>>>
>>> Thanks for your attention.
>>>
>>> Best regards,
>>> Rasmus Ory Nielsen
>>>
>>>
>>> ############################################################
>>> Output from the script:
>>> ############################################################
>>>
>>> [roni at ksdhcp ~]$ ./restriction_test.pl
>>>
>>> --------------------- WARNING ---------------------
>>> MSG: The enzyme name CviKI-1 was changed to CviKI-I
>>> ---------------------------------------------------
>>>
>>> ------------- EXCEPTION -------------
>>> MSG: Bad end parameter (11). End must be less than the total  
>>> length of sequence (total=7)
>>> STACK Bio::PrimarySeq::subseq /usr/local/lib/perl5/site_perl/ 
>>> 5.10.0/Bio/PrimarySeq.pm:401
>>> STACK Bio::Restriction::Analysis::_enzyme_sites /usr/local/lib/ 
>>> perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:900
>>> STACK Bio::Restriction::Analysis::_cuts /usr/local/lib/perl5/ 
>>> site_perl/5.10.0/Bio/Restriction/Analysis.pm:801
>>> STACK Bio::Restriction::Analysis::cut /usr/local/lib/perl5/ 
>>> site_perl/5.10.0/Bio/Restriction/Analysis.pm:379
>>> STACK Bio::Restriction::Analysis::fragment_maps /usr/local/lib/ 
>>> perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:515
>>> STACK toplevel ./restriction_test.pl:30
>>> -------------------------------------
>>>
>>> [roni at ksdhcp ~]$
>>>
>>>
>>> ############################################################
>>> Output from the script with the '-enzymes' argument commented out
>>> ############################################################
>>>
>>> [roni at ksdhcp ~]$ ./restriction_test.pl
>>>
>>> --------------------- WARNING ---------------------
>>> MSG: The enzyme name CviKI-1 was changed to CviKI-I
>>> ---------------------------------------------------
>>> $VAR1 = [
>>>          {
>>>            'seq' => 'CTCGACCGTTAGCAA',
>>>            'end' => 15,
>>>            'start' => '1'
>>>          },
>>>          {
>>>            'seq' => 'AGCTTTCTACCGTTATCGT',
>>>            'end' => 34,
>>>            'start' => '16'
>>>          }
>>>        ];
>>> [roni at ksdhcp ~]$
>>>
>>> ############################################################
>>>
>>> #!/usr/bin/perl
>>> use strict;
>>> use warnings;
>>> use Bio::PrimarySeq;
>>> use Bio::Restriction::IO;
>>> use Bio::Restriction::Analysis;
>>> use Data::Dumper;
>>>
>>> # create seq obj
>>> my $seqobj = new Bio::PrimarySeq(
>>>    -seq        => 'CTCGACCGTTAGCAAAGCTTTCTACCGTTATCGT',
>>>    -primary_id => 'test',
>>>    -molecule   => 'dna'
>>> );
>>>
>>> # read rebase file
>>> my $rebase_io = Bio::Restriction::IO->new(
>>>    -file   => 'withrefm.906',
>>>    -format => 'withrefm',
>>> );
>>> my $rebase_collection = $rebase_io->read;
>>>
>>> # start restriction analysis
>>> my $restriction_analysis = Bio::Restriction::Analysis->new(
>>>    -seq     => $seqobj,
>>>    -enzymes => $rebase_collection,    # it works with this line  
>>> commented out
>>> );
>>>
>>> # retrieve fragment maps
>>> my @fragment_maps = $restriction_analysis->fragment_maps('HindIII');
>>> print Dumper \@fragment_maps;
>>>
>>>
>>
>>
>


From maj at fortinbras.us  Tue Jun 16 13:58:56 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 16 Jun 2009 13:58:56 -0400
Subject: [Bioperl-l] Bio::Restriction refactor
	[Was:Bio::Restriction::Analysis. Exception when using rebasefile.]
In-Reply-To: <E80E6C1BC08D4E338739148BFE9BFAC0@NewLife>
References: <4A2F622D.5060500@ron.dk>
	<E80E6C1BC08D4E338739148BFE9BFAC0@NewLife>
Message-ID: <D4FBF1054F5C48C0ACBF81873A81E7AB@NewLife>

Dear All,

There are tests for the new functionality of Bio::Restriction
now in t/Restriction on the branch, along with the withrefm.906
in t/data that revealed the bug in RON's post. All tests pass without
warnings on my machine (which is bioperl live, perl 5.10.10,
under Vista/cygwin - yes, I still don't have a real computer).
We're ready for a merge on my end.

Thanks all for your silent assent to these machinations.
cheers
Mark

----- Original Message ----- 
From: "Mark A. Jensen" <maj at fortinbras.us>
To: "Rasmus Ory Nielsen" <ron at ron.dk>; <bioperl-l at lists.open-bio.org>
Sent: Monday, June 15, 2009 7:49 PM
Subject: [Bioperl-l] Bio::Restriction refactor [Was:Bio::Restriction::Analysis. 
Exception when using rebasefile.]


> Dear All,
>
> The revamped Bio::Restriction::* in branch
>
> REPOS/bioperl-live/branches/restriction-refactor
>
> passes all existing tests, including those in t/Restriction.
> New tests will be added within the next day or so.
> The original bug occurred because only a subset of
> the possible rebase withrefm-formatted enzymes were
> handled; it choked on freshly-downloaded rebase
> files because of this.
>
> The refactored version now handles *all* rebase types,
> including those of rebase forms
>
> XXX^X                [ intrasite cutters, the main types
>                               built in to base.pm]
> XXXX(m/n)          [ right-end extrasite cutters ]
> (s/t)XXXX            [ left-end ditto ]
> (s/t)XXXX(m/n)    [ double-end ditto],
>
> palindromic and non-palindromic, as well as multisite
> enzymes that string together combinations of these
> forms. Much rationalization (well, seems rational to me
> anyway) and cruft removal in the affected code has also
> occurred. itype2.pm has been updated as well, to
> conform to the refactoring.
>
> If you're dying to try this now, get a working copy
> of the branch like so
>
> $ svn co 
> svn://code.open-bio.org/bioperl/bioperl-live/branches/restriction-refactor 
> bioperl-rr
> $ cd bioperl-rr
> $ perl Build.PL
> $ ./Build test
> $ ./Build install
>
> This will only hammer your current installation in the
> $SITE_LIB/Bio/Restriction path; I worked only on
> a sparse checkout of the necessary files. To revert to your
> old install, do
>
> $ cd $MY_OLD_BIOPERL_WORKINGDIR
> $ ./Build install
>
> [In the possible event that these instructions are in error,
> there will be a response on this list in a matter of
> milliseconds, so stand by.]
>
> Happy coding-
> Mark
>
>
>
>
> ----- Original Message ----- 
> From: "Rasmus Ory Nielsen" <ron at ron.dk>
> To: <bioperl-l at lists.open-bio.org>
> Sent: Wednesday, June 10, 2009 3:35 AM
> Subject: [Bioperl-l] Bio::Restriction::Analysis. Exception when using 
> rebasefile.
>
>
>> Hi,
>>
>> This is my first time using bioperl for restriction analysis, so please bear 
>> with me if this is a FAQ.
>>
>> I downloaded withrefm.906 from ftp://ftp.neb.com/pub/rebase/ and created the 
>> script shown at the bottom of the mail.
>> My bioperl version is bioperl-live nightly from 09-Jun-2009.
>>
>> The script throws an exception - see below. But if I comment out the 
>> '-enzymes' argument, so that it uses the built-in collection of enzymes, it works.
>>
>> My problem is that I need to use some of the enzymes that are only available 
>> in rebase. So how do I get this working?
>>
>> Thanks for your attention.
>>
>> Best regards,
>> Rasmus Ory Nielsen
>>
>>
>> ############################################################
>> Output from the script:
>> ############################################################
>>
>> [roni at ksdhcp ~]$ ./restriction_test.pl
>>
>> --------------------- WARNING ---------------------
>> MSG: The enzyme name CviKI-1 was changed to CviKI-I
>> ---------------------------------------------------
>>
>> ------------- EXCEPTION -------------
>> MSG: Bad end parameter (11). End must be less than the total length of 
>> sequence (total=7)
>> STACK Bio::PrimarySeq::subseq 
>> /usr/local/lib/perl5/site_perl/5.10.0/Bio/PrimarySeq.pm:401
>> STACK Bio::Restriction::Analysis::_enzyme_sites 
>> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:900
>> STACK Bio::Restriction::Analysis::_cuts 
>> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:801
>> STACK Bio::Restriction::Analysis::cut 
>> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:379
>> STACK Bio::Restriction::Analysis::fragment_maps 
>> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:515
>> STACK toplevel ./restriction_test.pl:30
>> -------------------------------------
>>
>> [roni at ksdhcp ~]$
>>
>>
>> ############################################################
>> Output from the script with the '-enzymes' argument commented out
>> ############################################################
>>
>> [roni at ksdhcp ~]$ ./restriction_test.pl
>>
>> --------------------- WARNING ---------------------
>> MSG: The enzyme name CviKI-1 was changed to CviKI-I
>> ---------------------------------------------------
>> $VAR1 = [
>>           {
>>             'seq' => 'CTCGACCGTTAGCAA',
>>             'end' => 15,
>>             'start' => '1'
>>           },
>>           {
>>             'seq' => 'AGCTTTCTACCGTTATCGT',
>>             'end' => 34,
>>             'start' => '16'
>>           }
>>         ];
>> [roni at ksdhcp ~]$
>>
>> ############################################################
>>
>> #!/usr/bin/perl
>> use strict;
>> use warnings;
>> use Bio::PrimarySeq;
>> use Bio::Restriction::IO;
>> use Bio::Restriction::Analysis;
>> use Data::Dumper;
>>
>> # create seq obj
>> my $seqobj = new Bio::PrimarySeq(
>>     -seq        => 'CTCGACCGTTAGCAAAGCTTTCTACCGTTATCGT',
>>     -primary_id => 'test',
>>     -molecule   => 'dna'
>> );
>>
>> # read rebase file
>> my $rebase_io = Bio::Restriction::IO->new(
>>     -file   => 'withrefm.906',
>>     -format => 'withrefm',
>> );
>> my $rebase_collection = $rebase_io->read;
>>
>> # start restriction analysis
>> my $restriction_analysis = Bio::Restriction::Analysis->new(
>>     -seq     => $seqobj,
>>     -enzymes => $rebase_collection,    # it works with this line commented out
>> );
>>
>> # retrieve fragment maps
>> my @fragment_maps = $restriction_analysis->fragment_maps('HindIII');
>> print Dumper \@fragment_maps;
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From maj at fortinbras.us  Tue Jun 16 13:51:14 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 16 Jun 2009 13:51:14 -0400
Subject: [Bioperl-l] Hits Object
In-Reply-To: <4A37B2D7.70807@free.fr>
Message-ID: <3766B1A38606458EB5FA24D24371433D@NewLife>

Remi- have a look at http://www.bioperl.org/wiki/HOWTO:SearchIO and maybe
http://www.bioperl.org/wiki/Parsing_BLAST_HSPs; perhaps your questions will 
be answered there-
cheers, Mark


From cjfields at illinois.edu  Tue Jun 16 14:31:10 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 16 Jun 2009 13:31:10 -0500
Subject: [Bioperl-l] Bio::Restriction refactor
	[Was:Bio::Restriction::Analysis. Exception when using rebasefile.]
In-Reply-To: <D4FBF1054F5C48C0ACBF81873A81E7AB@NewLife>
References: <4A2F622D.5060500@ron.dk>
	<E80E6C1BC08D4E338739148BFE9BFAC0@NewLife>
	<D4FBF1054F5C48C0ACBF81873A81E7AB@NewLife>
Message-ID: <A800F5EC-C7E2-4BE4-9B45-3E71FB60AC2E@illinois.edu>

Everything passes on my end (Mac OS X 10.5, perl 5.10.0).  +1 on the  
merge.

Also (as mentioned some time back w/ Hilmar among others), we can  
probably delete this branch seeing as the code will be merged to trunk  
(it being a feature branch and all).  Worth doing the same for a few  
other feature branches as well.

chris

On Jun 16, 2009, at 12:58 PM, Mark A. Jensen wrote:

> Dear All,
>
> There are tests for the new functionality of Bio::Restriction
> now in t/Restriction on the branch, along with the withrefm.906
> in t/data that revealed the bug in RON's post. All tests pass without
> warnings on my machine (which is bioperl live, perl 5.10.10,
> under Vista/cygwin - yes, I still don't have a real computer).
> We're ready for a merge on my end.
>
> Thanks all for your silent assent to these machinations.
> cheers
> Mark
>
> ----- Original Message ----- From: "Mark A. Jensen"  
> <maj at fortinbras.us>
> To: "Rasmus Ory Nielsen" <ron at ron.dk>; <bioperl-l at lists.open-bio.org>
> Sent: Monday, June 15, 2009 7:49 PM
> Subject: [Bioperl-l] Bio::Restriction refactor  
> [Was:Bio::Restriction::Analysis. Exception when using rebasefile.]
>
>
>> Dear All,
>>
>> The revamped Bio::Restriction::* in branch
>>
>> REPOS/bioperl-live/branches/restriction-refactor
>>
>> passes all existing tests, including those in t/Restriction.
>> New tests will be added within the next day or so.
>> The original bug occurred because only a subset of
>> the possible rebase withrefm-formatted enzymes were
>> handled; it choked on freshly-downloaded rebase
>> files because of this.
>>
>> The refactored version now handles *all* rebase types,
>> including those of rebase forms
>>
>> XXX^X                [ intrasite cutters, the main types
>>                              built in to base.pm]
>> XXXX(m/n)          [ right-end extrasite cutters ]
>> (s/t)XXXX            [ left-end ditto ]
>> (s/t)XXXX(m/n)    [ double-end ditto],
>>
>> palindromic and non-palindromic, as well as multisite
>> enzymes that string together combinations of these
>> forms. Much rationalization (well, seems rational to me
>> anyway) and cruft removal in the affected code has also
>> occurred. itype2.pm has been updated as well, to
>> conform to the refactoring.
>>
>> If you're dying to try this now, get a working copy
>> of the branch like so
>>
>> $ svn co svn://code.open-bio.org/bioperl/bioperl-live/branches/restriction-refactor bioperl-rr
>> $ cd bioperl-rr
>> $ perl Build.PL
>> $ ./Build test
>> $ ./Build install
>>
>> This will only hammer your current installation in the
>> $SITE_LIB/Bio/Restriction path; I worked only on
>> a sparse checkout of the necessary files. To revert to your
>> old install, do
>>
>> $ cd $MY_OLD_BIOPERL_WORKINGDIR
>> $ ./Build install
>>
>> [In the possible event that these instructions are in error,
>> there will be a response on this list in a matter of
>> milliseconds, so stand by.]
>>
>> Happy coding-
>> Mark
>>
>>
>>
>>
>> ----- Original Message ----- From: "Rasmus Ory Nielsen" <ron at ron.dk>
>> To: <bioperl-l at lists.open-bio.org>
>> Sent: Wednesday, June 10, 2009 3:35 AM
>> Subject: [Bioperl-l] Bio::Restriction::Analysis. Exception when  
>> using rebasefile.
>>
>>
>>> Hi,
>>>
>>> This is my first time using bioperl for restriction analysis, so
>>> please bear with me if this is a FAQ.
>>>
>>> I downloaded withrefm.906 from ftp://ftp.neb.com/pub/rebase/ and
>>> created the script shown at the bottom of the mail.
>>> My bioperl version is bioperl-live nightly from 09-Jun-2009.
>>>
>>> The script throws an exception - see below. But if I comment out
>>> the '-enzymes' argument, so that it uses the built-in collection of
>>> enzymes, it works.
>>>
>>> My problem is that I need to use some of the enzymes that are
>>> only available in rebase. So how do I get this working?
>>>
>>> Thanks for your attention.
>>>
>>> Best regards,
>>> Rasmus Ory Nielsen
>>>
>>>
>>> ############################################################
>>> Output from the script:
>>> ############################################################
>>>
>>> [roni at ksdhcp ~]$ ./restriction_test.pl
>>>
>>> --------------------- WARNING ---------------------
>>> MSG: The enzyme name CviKI-1 was changed to CviKI-I
>>> ---------------------------------------------------
>>>
>>> ------------- EXCEPTION -------------
>>> MSG: Bad end parameter (11). End must be less than the total
>>> length of sequence (total=7)
>>> STACK Bio::PrimarySeq::subseq
>>> /usr/local/lib/perl5/site_perl/5.10.0/Bio/PrimarySeq.pm:401
>>> STACK Bio::Restriction::Analysis::_enzyme_sites
>>> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:900
>>> STACK Bio::Restriction::Analysis::_cuts
>>> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:801
>>> STACK Bio::Restriction::Analysis::cut
>>> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:379
>>> STACK Bio::Restriction::Analysis::fragment_maps
>>> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:515
>>> STACK toplevel ./restriction_test.pl:30
>>> -------------------------------------
>>>
>>> [roni at ksdhcp ~]$
>>>
>>>
>>> ############################################################
>>> Output from the script with the '-enzymes' argument commented out
>>> ############################################################
>>>
>>> [roni at ksdhcp ~]$ ./restriction_test.pl
>>>
>>> --------------------- WARNING ---------------------
>>> MSG: The enzyme name CviKI-1 was changed to CviKI-I
>>> ---------------------------------------------------
>>> $VAR1 = [
>>>          {
>>>            'seq' => 'CTCGACCGTTAGCAA',
>>>            'end' => 15,
>>>            'start' => '1'
>>>          },
>>>          {
>>>            'seq' => 'AGCTTTCTACCGTTATCGT',
>>>            'end' => 34,
>>>            'start' => '16'
>>>          }
>>>        ];
>>> [roni at ksdhcp ~]$
>>>
>>> ############################################################
>>>
>>> #!/usr/bin/perl
>>> use strict;
>>> use warnings;
>>> use Bio::PrimarySeq;
>>> use Bio::Restriction::IO;
>>> use Bio::Restriction::Analysis;
>>> use Data::Dumper;
>>>
>>> # create seq obj
>>> my $seqobj = new Bio::PrimarySeq(
>>>    -seq        => 'CTCGACCGTTAGCAAAGCTTTCTACCGTTATCGT',
>>>    -primary_id => 'test',
>>>    -molecule   => 'dna'
>>> );
>>>
>>> # read rebase file
>>> my $rebase_io = Bio::Restriction::IO->new(
>>>    -file   => 'withrefm.906',
>>>    -format => 'withrefm',
>>> );
>>> my $rebase_collection = $rebase_io->read;
>>>
>>> # start restriction analysis
>>> my $restriction_analysis = Bio::Restriction::Analysis->new(
>>>    -seq     => $seqobj,
>>>    -enzymes => $rebase_collection,    # it works with this line commented out
>>> );
>>>
>>> # retrieve fragment maps
>>> my @fragment_maps = $restriction_analysis->fragment_maps('HindIII');
>>> print Dumper \@fragment_maps;


From cjfields at illinois.edu  Tue Jun 16 15:07:44 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 16 Jun 2009 14:07:44 -0500
Subject: [Bioperl-l] Alignment->slice() issue?
In-Reply-To: <cfebe0020906160106s5f76501au9d67d8ee3062d166@mail.gmail.com>
References: <cfebe0020906160106s5f76501au9d67d8ee3062d166@mail.gmail.com>
Message-ID: <FB532969-4EE0-4D37-B14B-5BC806A95FFE@illinois.edu>

Sounds to me like a BioPerl bug.  Do you have some example data  
demonstrating the problem?

chris

On Jun 16, 2009, at 3:06 AM, Malcolm Cook wrote:

> Kevin,
>
> I'm getting struck by this old issue you once coded around.
>
>      http://bioperl.org/pipermail/bioperl-l/2007-January/024665.html
>
> Any chance you could share your implementation with a fellow
> traveller...
>
> ??
>
> Thanks,
>
> Malcolm Cook
> Stowers Institute for Medical Research
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From maj at fortinbras.us  Tue Jun 16 15:32:02 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 16 Jun 2009 15:32:02 -0400
Subject: [Bioperl-l] error message: can't call method "next_hit" on an
	undefined value
In-Reply-To: <69367b8f0906160842y39c89b86m8733e5b18e5334e2@mail.gmail.com>
References: <69367b8f0906160842y39c89b86m8733e5b18e5334e2@mail.gmail.com>
Message-ID: <91AC45F45A0F43D292323A711F0D5BDA@NewLife>

lomspace-
this

my $in = new Bio::SearchIO(-format      => 'blast',
                           -file        => ">".$maid_dir."\\".$maid."aln_hu.aln",
                           -report_type => 'blastn');

should be

my $in = new Bio::SearchIO(-format      => 'blast',
                           -file        => $maid_dir."\\".$maid."aln_hu.aln",
                           -report_type => 'blastn');

if you're reading the file; the leading ">" opens it for writing instead.
Then $result will have something in it when you do $in->next_result
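
To see why the leading ">" matters, here is a minimal sketch in plain Perl
(no BioPerl needed). It assumes only the ordinary Perl file-mode convention,
which the -file argument follows: a ">" prefix means "open for writing",
and that empties an existing file. The scratch file and its content are made
up for the illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a scratch file standing in for a BLAST report (made-up content).
my ( $tmp_fh, $path ) = tempfile( UNLINK => 1 );
print {$tmp_fh} "BLASTN report text\n";
close $tmp_fh or die "close failed: $!";

my $size_before = ( stat $path )[7];

# A leading '>' in a two-argument open means "open for WRITING":
# the existing file is truncated to zero bytes before anything is read.
open( my $out, ">$path" ) or die "cannot open >$path: $!";
close $out or die "close failed: $!";

my $size_after = ( stat $path )[7];

print "before: $size_before bytes, after: $size_after bytes\n";
```

So a report opened with ">" is emptied before the parser ever reads it, which
would explain why next_result has nothing to return.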

cheers, MAJ
----- Original Message ----- 
From: "Mgavi Brathwaite" <lsbrath at gmail.com>
To: <Bioperl-l at lists.open-bio.org>
Sent: Tuesday, June 16, 2009 11:42 AM
Subject: [Bioperl-l] error message: can't call method "next_hit" on an
undefined value


> Hello,
> My method produces an error message stating that it can't call a "next_hit"
> method on an undefined value.
>
> sub hu_bl2seq_parser{
> my ($maid, $maid_dir) = @_;
> # Get the report
> my $in = new Bio::SearchIO(-format => 'blast',
>                           -file   => ">".$maid_dir."\\".$maid."aln_hu.aln",
>    -report_type => 'blastn');
> #open(my $out, ">$maid_dir/".$maid."aln_hu_parsed.out");
> #my $out = Bio::AlignIO->newFh(-format => 'clustalw' );
> my $result=$in->next_result;
> my($hu_aln,$hu_mismatches);
> # Get info about the first hit
> my $hit = $result->next_hit;
> my $name = $hit->name;
> # get info about the first hsp of the first hit
> my $hsp = $hit->next_hsp;
> # get the alignment object
> my $aln = $hsp->get_aln;
> #my $percent_id = $hsp->percent_identity;
> #my $aln_length = $hsp->length('total');
> my @mismatches = $hsp->seq_inds('query','nomatch');
> my $aln_str="";
> # access the alignment string
> my $strIO=IO::String->new($aln_str);
> #  write the string alignio in clustalw format
> my $alnio = Bio::AlignIO->new(-format => 'clustalw', -fh=>$strIO);
> # now the actual alignment string is accessible for printing or, in
> # this case, moving to a db table
> $alnio->write_aln($aln);
> $hu_aln=$aln_str;
> $hu_mismatches = scalar @mismatches;
> return($hu_aln, $hu_mismatches);
> }
>
> The problem is at "my $hit = $result->next_hit;"
> Any help will be appreciated.
> LomSpace
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> 


From rmb32 at cornell.edu  Tue Jun 16 15:46:40 2009
From: rmb32 at cornell.edu (Robert Buels)
Date: Tue, 16 Jun 2009 12:46:40 -0700
Subject: [Bioperl-l] error message: can't call method "next_hit" on an
 undefined value
In-Reply-To: <69367b8f0906160842y39c89b86m8733e5b18e5334e2@mail.gmail.com>
References: <69367b8f0906160842y39c89b86m8733e5b18e5334e2@mail.gmail.com>
Message-ID: <4A37F6A0.1080907@cornell.edu>

Mgavi Brathwaite wrote:
> Hello,
> My method produces an error message stating that it can't call a "next_hit"
> method on an undefined value.

Your proximate problem seems to be that you are prepending a '>' to the 
filename in your invocation of Bio::SearchIO->new, which I think causes 
it to open the file for writing instead of reading from it.  That is what 
produces your "can't call method next_hit on an undefined value" error: 
next_result() returns undef when there are no results to parse.  You 
probably also want to call next_result, next_hit and next_hsp in while 
loops, since each returns undef when there is nothing more to parse.

By while loops, I mean something like:

while( my $result = $in->next_result ) {
      while( my $hit = $result->next_hit ) {
      # insert the rest of your operations here
      }
}
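
A slightly fuller sketch of the same pattern, in case it is useful. It
assumes BioPerl is installed and that a report path is given on the command
line. first_hsp_mismatches() is a name made up for this example, not a
BioPerl method. The calls it uses (next_result, next_hit, next_hsp, name,
seq_inds) are the ones already in your script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Walk result -> hit -> HSP and return the first hit's name together with the
# number of mismatched query positions in its first HSP. Returns an empty
# list when the report has no results, hits or HSPs, instead of dying.
# first_hsp_mismatches() is a hypothetical helper name, not a BioPerl method.
sub first_hsp_mismatches {
    my ($in) = @_;    # a Bio::SearchIO parser (or anything with the same methods)
    while ( my $result = $in->next_result ) {
        while ( my $hit = $result->next_hit ) {
            while ( my $hsp = $hit->next_hsp ) {
                my @mismatches = $hsp->seq_inds( 'query', 'nomatch' );
                return ( $hit->name, scalar @mismatches );
            }
        }
    }
    return;
}

# Only touch BioPerl when a report file is given on the command line.
if (@ARGV) {
    require Bio::SearchIO;
    my $in = Bio::SearchIO->new(
        -format => 'blast',
        -file   => $ARGV[0],    # no leading '>': open for reading
    );
    my ( $name, $count ) = first_hsp_mismatches($in);
    if ( defined $name ) {
        print "$name: $count mismatched query positions\n";
    }
    else {
        print "no hits found in $ARGV[0]\n";
    }
}
```

The point of the nesting is that an empty report falls through every loop
and reaches the bare return, so the caller can test for an empty list rather
than trip over a method call on undef.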

Hope this helps.

Rob



From maj at fortinbras.us  Tue Jun 16 16:10:34 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 16 Jun 2009 16:10:34 -0400
Subject: [Bioperl-l] Bio::Restriction
	refactor[Was:Bio::Restriction::Analysis. Exception when using
	rebasefile.]
In-Reply-To: <A800F5EC-C7E2-4BE4-9B45-3E71FB60AC2E@illinois.edu>
References: <4A2F622D.5060500@ron.dk><E80E6C1BC08D4E338739148BFE9BFAC0@NewLife><D4FBF1054F5C48C0ACBF81873A81E7AB@NewLife>
	<A800F5EC-C7E2-4BE4-9B45-3E71FB60AC2E@illinois.edu>
Message-ID: <61179C22E04F479686C7F5CFEC496FB0@NewLife>

Right; will remove branch. Will go ahead with merge at 21:20 UTC.
cheers MAJ

From MEC at stowers.org  Tue Jun 16 16:13:33 2009
From: MEC at stowers.org (Cook, Malcolm)
Date: Tue, 16 Jun 2009 15:13:33 -0500
Subject: [Bioperl-l] Alignment->slice() issue?
In-Reply-To: <FB532969-4EE0-4D37-B14B-5BC806A95FFE@illinois.edu>
References: <cfebe0020906160106s5f76501au9d67d8ee3062d166@mail.gmail.com>
	<FB532969-4EE0-4D37-B14B-5BC806A95FFE@illinois.edu>
Message-ID: <BD62CBAC4395B94096109020651BE2EC12B471A389@exchmb-02.stowers-institute.org>

Chris!

erm, yeah, I do....

... and I will schedule some time to code up a test and add it to AlignI's suite....

Malcolm
  



From maj at fortinbras.us  Tue Jun 16 22:47:39 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 16 Jun 2009 22:47:39 -0400
Subject: [Bioperl-l] Bio::Restriction refactor
	[Was:Bio::Restriction::Analysis. Exception when using rebasefile.]
In-Reply-To: <E80E6C1BC08D4E338739148BFE9BFAC0@NewLife>
References: <4A2F622D.5060500@ron.dk>
	<E80E6C1BC08D4E338739148BFE9BFAC0@NewLife>
Message-ID: <9B199A62F5A741CCBC0B927D10DF1A0D@NewLife>

Dear All,

The refactored Bio::Restriction::* has been merged to trunk, with all
tests passing. [Anyone got a cigarette?]
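
For anyone wondering what the site forms in the quoted announcement below
look like in practice, here is a rough sketch that sorts a recognition
string into those four forms. It is only an illustration and does not use
the Bio::Restriction code. classify_rebase_site() is a made-up name, and
apart from the HindIII site the input strings are illustrative. Multisite
enzymes are not handled.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, NOT part of Bio::Restriction: classify a rebase-style
# recognition string into the four forms listed in the announcement.
sub classify_rebase_site {
    my ($site) = @_;
    my $cut = qr{\(-?\d+/-?\d+\)};    # a cut spec such as (1/5) or (-5/-1)
    my $seq = qr{[A-Za-z]+};          # letters of the recognition sequence

    return 'double-end extrasite' if $site =~ /^${cut}${seq}${cut}\z/;
    return 'left-end extrasite'   if $site =~ /^${cut}${seq}\z/;
    return 'right-end extrasite'  if $site =~ /^${seq}${cut}\z/;
    # Intrasite: letters with a single '^' marking the cut, e.g. A^AGCTT.
    return 'intrasite'
        if $site =~ /^(?=.*[A-Za-z])[A-Za-z]*\^[A-Za-z]*\z/;
    return 'unrecognized';
}

# A^AGCTT is the HindIII site; the other strings are illustrative inputs.
for my $site ( 'A^AGCTT', 'GGTCTC(1/5)', '(10/15)ACNNNNGTAYC(12/7)' ) {
    printf "%-26s %s\n", $site, classify_rebase_site($site);
}
```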

cheers,
Mark

----- Original Message ----- 
From: "Mark A. Jensen" <maj at fortinbras.us>
To: "Rasmus Ory Nielsen" <ron at ron.dk>; <bioperl-l at lists.open-bio.org>
Sent: Monday, June 15, 2009 7:49 PM
Subject: [Bioperl-l] Bio::Restriction refactor [Was:Bio::Restriction::Analysis. 
Exception when using rebasefile.]


> Dear All,
>
> The revamped Bio::Restriction::* in branch
>
> REPOS/bioperl-live/branches/restriction-refactor
>
> passes all existing tests, including those in t/Restriction.
> New tests will be added within the next day or so.
> The original bug occurred because only a subset of
> the possible rebase withrefm-formatted enzymes were
> handled; it choked on freshly-downloaded rebase
> files because of this.
>
> The refactored version now handles *all* rebase types,
> including those of rebase forms
>
> XXX^X                [ intrasite cutters, the main types
>                               built in to base.pm]
> XXXX(m/n)          [ right-end extrasite cutters ]
> (s/t)XXXX            [ left-end ditto ]
> (s/t)XXXX(m/n)    [ double-end ditto],
>
> palindromic and non-palindromic, as well as multisite
> enzymes that string together combinations of these
> forms. Much rationalization (well, seems rational to me
> anyway) and cruft removal in the affected code has also
> occurred. itype2.pm has been updated as well, to
> conform to the refactoring.
>
> If you're dying to try this now, get a working copy
> of the branch like so
>
> $ svn co svn://code.open-bio.org/bioperl/bioperl-live/branches/restriction-refactor bioperl-rr
> $ cd bioperl-rr
> $ perl Build.PL
> $ ./Build test
> $ ./Build install
>
> This will only hammer your current installation in the
> $SITE_LIB/Bio/Restriction path; I worked only on
> a sparse checkout of the necessary files. To revert to your
> old install, do
>
> $ cd $MY_OLD_BIOPERL_WORKINGDIR
> $ ./Build install
>
> [In the possible event that these instructions are in error,
> there will be a response on this list in a matter of
> milliseconds, so stand by.]
>
> Happy coding-
> Mark
>
>
>
>
> ----- Original Message ----- 
> From: "Rasmus Ory Nielsen" <ron at ron.dk>
> To: <bioperl-l at lists.open-bio.org>
> Sent: Wednesday, June 10, 2009 3:35 AM
> Subject: [Bioperl-l] Bio::Restriction::Analysis. Exception when using 
> rebasefile.
>
>
>> Hi,
>>
>> This is my first time using bioperl for restriction analysis, so please bear 
>> with me, if this is a FAQ.
>>
>> I downloaded withrefm.906 from ftp://ftp.neb.com/pub/rebase/ and created the 
>> script shown at the bottom of the mail.
>> My bioperl version is bioperl-live nightly from 09-Jun-2009.
>>
>> The script throws an exception - see below. But, if I comment out the 
>> '-enzymes' argument, so it uses the built-in collection of enzymes, it works.
>>
>> My problem is, that I need to use some of the enzymes that are only available 
>> in rebase. So how do I get this working?
>>
>> Thanks for your attention.
>>
>> Best regards,
>> Rasmus Ory Nielsen
>>
>>
>> ############################################################
>> Output from the script:
>> ############################################################
>>
>> [roni at ksdhcp ~]$ ./restriction_test.pl
>>
>> --------------------- WARNING ---------------------
>> MSG: The enzyme name CviKI-1 was changed to CviKI-I
>> ---------------------------------------------------
>>
>> ------------- EXCEPTION -------------
>> MSG: Bad end parameter (11). End must be less than the total length of 
>> sequence (total=7)
>> STACK Bio::PrimarySeq::subseq 
>> /usr/local/lib/perl5/site_perl/5.10.0/Bio/PrimarySeq.pm:401
>> STACK Bio::Restriction::Analysis::_enzyme_sites 
>> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:900
>> STACK Bio::Restriction::Analysis::_cuts 
>> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:801
>> STACK Bio::Restriction::Analysis::cut 
>> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:379
>> STACK Bio::Restriction::Analysis::fragment_maps 
>> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:515
>> STACK toplevel ./restriction_test.pl:30
>> -------------------------------------
>>
>> [roni at ksdhcp ~]$
>>
>>
>> ############################################################
>> Output from the script with the '-enzymes' argument commented out
>> ############################################################
>>
>> [roni at ksdhcp ~]$ ./restriction_test.pl
>>
>> --------------------- WARNING ---------------------
>> MSG: The enzyme name CviKI-1 was changed to CviKI-I
>> ---------------------------------------------------
>> $VAR1 = [
>>           {
>>             'seq' => 'CTCGACCGTTAGCAA',
>>             'end' => 15,
>>             'start' => '1'
>>           },
>>           {
>>             'seq' => 'AGCTTTCTACCGTTATCGT',
>>             'end' => 34,
>>             'start' => '16'
>>           }
>>         ];
>> [roni at ksdhcp ~]$
>>
>> ############################################################
>>
>> #!/usr/bin/perl
>> use strict;
>> use warnings;
>> use Bio::PrimarySeq;
>> use Bio::Restriction::IO;
>> use Bio::Restriction::Analysis;
>> use Data::Dumper;
>>
>> # create seq obj
>> my $seqobj = Bio::PrimarySeq->new(
>>     -seq        => 'CTCGACCGTTAGCAAAGCTTTCTACCGTTATCGT',
>>     -primary_id => 'test',
>>     -molecule   => 'dna'
>> );
>>
>> # read rebase file
>> my $rebase_io = Bio::Restriction::IO->new(
>>     -file   => 'withrefm.906',
>>     -format => 'withrefm',
>> );
>> my $rebase_collection = $rebase_io->read;
>>
>> # start restriction analysis
>> my $restriction_analysis = Bio::Restriction::Analysis->new(
>>     -seq     => $seqobj,
>>     -enzymes => $rebase_collection,    # it works with this line commented out
>> );
>>
>> # retrieve fragment maps
>> my @fragment_maps = $restriction_analysis->fragment_maps('HindIII');
>> print Dumper \@fragment_maps;
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>
> 


From Russell.Smithies at agresearch.co.nz  Tue Jun 16 23:21:22 2009
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Wed, 17 Jun 2009 15:21:22 +1200
Subject: [Bioperl-l] Bio::Restriction
	refactor	[Was:Bio::Restriction::Analysis. Exception when
	using rebasefile.]
In-Reply-To: <9B199A62F5A741CCBC0B927D10DF1A0D@NewLife>
References: <4A2F622D.5060500@ron.dk>
	<E80E6C1BC08D4E338739148BFE9BFAC0@NewLife>
	<9B199A62F5A741CCBC0B927D10DF1A0D@NewLife>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF3297FF3E2E4@exchsth.agresearch.co.nz>

Cigarettes are post-coitus and pre-firing squad.
What you'd be needing is a cigar (proud father)

;-)

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Mark A. Jensen
> Sent: Wednesday, 17 June 2009 2:48 p.m.
> To: bioperl-l at lists.open-bio.org
> Cc: Rasmus Ory Nielsen
> Subject: Re: [Bioperl-l] Bio::Restriction refactor
> [Was:Bio::Restriction::Analysis. Exception when using rebasefile.]
> 
> Dear All,
> 
> The refactored Bio::Restriction::* has been merged to trunk, with all
> tests passing. [Anyone got a cigarette?]
> 
> cheers,
> Mark


From e.stupka at ucl.ac.uk  Wed Jun 17 07:29:08 2009
From: e.stupka at ucl.ac.uk (Elia Stupka)
Date: Wed, 17 Jun 2009 12:29:08 +0100
Subject: [Bioperl-l] Next-gen modules
Message-ID: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>

Dear all,

after several years of absence I am slowly coming back to Bioperl, and  
hope to contribute again to its development.

One area that I was thinking of starting from, since we are actively
involved with it, is to improve BioPerl's support for next-gen
sequencing data, tools, etc. Since I am sure I have missed out on a
lot of recent developments, do let me know if/what is useful.

One example that comes to mind is that the conversion of various
formats to/from FASTQ does not seem to be supported. Some code can be
found within Li Heng's script: http://maq.sourceforge.net/fq_all2std.pl
but it would be good if it could make its way into SeqIO, and
similarly, potentially, for other next-gen sequence formats.

Similarly, there seems to be little in bioperl-run to support tools  
that have been developed in this area, such as Maq, BowTie, TopHat, etc?

Do let me know if there is a past thread on this, or other people  
actively developing, etc. so that I can find out what priorities are.

thanks and best regards to all (old friends and new),

Elia

---
Senior Lecturer, Bioinformatics
UCL Cancer Institute
Paul O' Gorman Building
University College London
Gower Street
WC1E 6BT
London
UK

Office (UCL): +44 207 679 6493
Office (ICMS): +44 0207 8822374

Mobile: +44 7597 566 194
Mobile (Italy): +39 338 8448801


From maj at fortinbras.us  Wed Jun 17 08:19:04 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 17 Jun 2009 08:19:04 -0400
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
Message-ID: <4C3D793879C64A5E84C67FE313C86FA4@NewLife>

[ and here's a new wikipage: http://www.bioperl.org/wiki/Nextgen_in_Bioperl ]


From biopython at maubp.freeserve.co.uk  Wed Jun 17 08:21:17 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 17 Jun 2009 13:21:17 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
Message-ID: <320fb6e00906170521m7d997334j321d92fda2da4114@mail.gmail.com>

On Wed, Jun 17, 2009 at 12:29 PM, Elia Stupka<e.stupka at ucl.ac.uk> wrote:
>
> Dear all,
>
> after several years of absence I am slowly coming back to Bioperl, and hope
> to contribute again to its development.
>
> One area that I was thinking of starting from, since we are actively
> involved with it, is to improve BioPerl's support for next-gen sequencing
> data, tools, etc. Since I am sure I have missed out on a lot of recent
> developments, do let me know if/what is useful.
>
> One example that comes to mind is that the conversion of various formats
> to/from FASTQ does not seem to be supported. Some code can be found within
> Li Heng's script: http://maq.sourceforge.net/fq_all2std.pl but it would be
> good if it could make its way into SeqIO? And similarly, potentially, for
> other next-gen sequence formats?

If you do add FASTQ support to BioPerl's SeqIO (and I think that is a
good idea), please could you follow the format names used by Biopython
- as this time we got there first ;)

I'm asking this as Biopython's SeqIO tries to use the same format
names as BioPerl's SeqIO and EMBOSS, see
http://biopython.org/wiki/SeqIO

Specifically,
* "fastq" in Biopython means the original Sanger standard FASTQ files
encoding PHRED qualities using an ASCII offset of 33.
* "fastq-solexa" in Biopython means the early Solexa/Illumina style
FASTQ files which encode Solexa qualities using an ASCII offset of 64.
* "fastq-illumina" in Biopython will mean recent Solexa/Illumina style
FASTQ files (from pipeline version 1.3+) which encode PHRED qualities
using an ASCII offset of 64. This is in the Biopython repository, but
hasn't been released yet - so the name "fastq-illumina" isn't set in
stone yet.

For good quality reads, PHRED and Solexa scores are approximately
equal, so the "fastq-solexa" and "fastq-illumina" variants are almost
equivalent.
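
To make the "approximately equal" point concrete, here is a minimal pure-Perl sketch of the standard conversion between the two score types. PHRED is defined as Q = -10*log10(p) and Solexa as Q = -10*log10(p/(1-p)), where p is the error probability. The function names are made up for illustration and are not part of BioPerl or Biopython.

```perl
use strict;
use warnings;

# Hypothetical helper names, for illustration only.
# Derived from: PHRED  Q = -10 * log10(p)
#               Solexa Q = -10 * log10(p / (1 - p))
sub solexa_to_phred {
    my ($q_solexa) = @_;
    return 10 * log(10 ** ($q_solexa / 10) + 1) / log(10);
}

sub phred_to_solexa {
    my ($q_phred) = @_;
    # PHRED 0 means p = 1, which has no finite Solexa equivalent.
    die "PHRED score must be greater than 0\n" if $q_phred <= 0;
    return 10 * log(10 ** ($q_phred / 10) - 1) / log(10);
}

# The two scales diverge for poor reads and converge for good ones.
for my $q (-5, 0, 10, 20, 40) {
    printf "Solexa %3d -> PHRED %7.4f\n", $q, solexa_to_phred($q);
}
```

The gap is about 3 units at Solexa 0 and well under 0.1 units from Solexa 20 upwards, which is why the two offset-64 variants are nearly interchangeable for high-quality reads.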

> Similarly, there seems to be little in bioperl-run to support tools that
> have been developed in this area, such as Maq, BowTie, TopHat, etc?
>
> Do let me know if there is a past thread on this, or other people actively
> developing, etc. so that I can find out what priorities are.

Have you seen these recent threads?:
http://lists.open-bio.org/pipermail/bioperl-l/2009-May/029970.html
http://lists.open-bio.org/pipermail/bioperl-l/2009-May/029911.html
http://lists.open-bio.org/pipermail/bioperl-l/2009-June/030187.html

Regards,

Peter (at Biopython)


From maj at fortinbras.us  Wed Jun 17 08:02:11 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 17 Jun 2009 08:02:11 -0400
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
Message-ID: <92C15E3391F64BAF801754E924122540@NewLife>

Elia--
I say a definite +1; in fact, this sounds like it should be a Hot Topic 
(see http://www.bioperl.org/wiki/Category:Hot_Topics for some others
you might have missed in your hiatus...). I will create a page that 
can be a central point for wish lists, discussion, etc.

There has been much discussion of late about FASTQ:
http://lists.open-bio.org/pipermail/bioperl-l/2009-June/030187.html
http://lists.open-bio.org/pipermail/bioperl-l/2009-May/029970.html
http://lists.open-bio.org/pipermail/bioperl-l/2009-May/029911.html
http://lists.open-bio.org/pipermail/bioperl-l/2009-April/029765.html

cheers from a newbie, 
Mark



From cjfields at illinois.edu  Wed Jun 17 08:57:52 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 17 Jun 2009 07:57:52 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
Message-ID: <BF526BE6-AFA8-4F66-9665-D6328F9B4FFA@illinois.edu>

Elia,

As Mark indicated, we recently discussed the lack of support for
next-gen on list, at least re: fastq.  I may be hit with the same thing in
a few months' time myself, and I recall Jason and a few others also
mentioning the same.  Heikki wrote some code for Illumina FASTQ for
SeqIO and related modules but I don't believe it has been committed to
trunk yet, so maybe he can answer.

 From prior discussions IIRC the issues were:

1) distinguishing the various FASTQ versions (Sanger, Illumina 1.0,  
Illumina 1.3) from one another (so maybe some optional validation), and
2) having a way for the Seq object to either 'know' what format is  
contained, or we use phred score and convert back and forth from that  
(I think the latter makes more sense).

Peter's suggestions are also reasonable, though does Biopython have a
separate module for each of these variations?  Our version (I believe)
mainly varied the conversion within Bio::SeqIO::fastq itself based on
the fastq variant passed in as a separate named argument.
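
As a rough illustration of the "one parser, variant as an argument" idea, with PHRED as the common internal scale, something like the following pure-Perl sketch could work. The variant keys ('sanger', 'solexa', 'illumina') and the function names are assumptions made for this example, not the actual Bio::SeqIO::fastq interface.

```perl
use strict;
use warnings;

# Hypothetical variant table: ASCII offset plus the score type that is encoded.
my %FASTQ_VARIANT = (
    sanger   => { offset => 33, type => 'phred'  },   # original Sanger FASTQ
    solexa   => { offset => 64, type => 'solexa' },   # early Solexa/Illumina
    illumina => { offset => 64, type => 'phred'  },   # Illumina pipeline 1.3+
);

# Decode a quality string into a list of PHRED scores.
sub decode_to_phred {
    my ($qual, $variant) = @_;
    my $spec = $FASTQ_VARIANT{$variant}
        or die "Unknown FASTQ variant '$variant'\n";
    my @scores = map { $_ - $spec->{offset} } unpack 'C*', $qual;
    if ($spec->{type} eq 'solexa') {
        # Solexa -> PHRED: Q_phred = 10 * log10(10^(Q_solexa/10) + 1)
        @scores = map { 10 * log(10 ** ($_ / 10) + 1) / log(10) } @scores;
    }
    return @scores;
}

# Encode PHRED scores as a Sanger-style (offset 33) quality string.
sub encode_sanger {
    my (@phred) = @_;
    return join '', map {
        my $q = int($_ + 0.5);     # round to the nearest integer
        $q = 0  if $q < 0;
        $q = 93 if $q > 93;        # keep within printable ASCII 33..126
        chr($q + 33);
    } @phred;
}

# Re-encode an Illumina 1.3+ quality string as Sanger FASTQ.
print encode_sanger(decode_to_phred('hhhh', 'illumina')), "\n";
```

Keeping PHRED as the only in-memory representation means the Seq object never needs to know which file variant it came from; only the reader and writer do.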

As for the wrappers, we would most certainly welcome them!

chris



From e.stupka at ucl.ac.uk  Wed Jun 17 08:54:22 2009
From: e.stupka at ucl.ac.uk (Elia Stupka)
Date: Wed, 17 Jun 2009 13:54:22 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <92C15E3391F64BAF801754E924122540@NewLife>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<92C15E3391F64BAF801754E924122540@NewLife>
Message-ID: <E59DEB0A-9D02-4922-A300-0F81B809D6D4@ucl.ac.uk>

Dear Mark,

thanks a lot for the pointers.

With regards to FASTQ parsing:

- My understanding from reading past threads is that the plan is to work
on a single format, i.e. FASTQ, and to interpret the quality "flavours"
as just quality conversions, right?

- However, I assume we would still want to support a simple way for the
user to say format => 'fastq-solexa', using the nomenclature adopted in
Biopython as suggested by Peter, right?

- I also saw Heikki's "long essay", but have not yet compared it to Heng
Li's code at http://maq.sourceforge.net/fq_all2std.pl. I guess we
would hope they produce identical outputs, which will be a good check.

Finally, I saw Tristan's reply to Heikki's thread, so what is the  
status quo? Is it moving forward?

cheers

Elia




From biopython at maubp.freeserve.co.uk  Wed Jun 17 09:25:59 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 17 Jun 2009 14:25:59 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <BF526BE6-AFA8-4F66-9665-D6328F9B4FFA@illinois.edu>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<BF526BE6-AFA8-4F66-9665-D6328F9B4FFA@illinois.edu>
Message-ID: <320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com>

On Wed, Jun 17, 2009 at 1:57 PM, Chris Fields<cjfields at illinois.edu> wrote:
>
> Elia,
>
> As Mark indicated, we recently discussed the lack of support for next-gen on
> list, at least re: fastq. I may be hit with the same thing in a few months
> time myself, and I recall Jason and a few others also mentioning the same.
> Heikki wrote some code for Illumina FASTQ for SeqIO and related modules but
> I don't believe it has been committed to trunk yet, so maybe he can answer.
>
> From prior discussions IIRC the issues were:
>
> 1) distinguishing the various FASTQ versions (Sanger, Illumina 1.0, Illumina
> 1.3) from one another (so maybe some optional validation), and

Following the Python rule of thumb for being explicit, Biopython makes
the user specify which FASTQ variant is being used. I don't think you
can do anything else. Any attempted validation would have to be
heuristic based on the ASCII characters found, and would risk false
positive warnings.
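
To show why any such check can only be heuristic, here is a small pure-Perl sketch that reports which variants remain possible given the lowest quality character seen. It relies on the usual encoded ranges (Sanger from ASCII 33, Solexa from 59 because scores go down to -5, Illumina 1.3+ from 64); the function name is invented for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: list the Biopython-style variant names that are still
# consistent with the quality strings seen so far.
sub plausible_variants {
    my (@quality_strings) = @_;
    my @all = ('fastq', 'fastq-solexa', 'fastq-illumina');

    my $lowest;
    for my $qual (@quality_strings) {
        for my $code (unpack 'C*', $qual) {
            $lowest = $code if !defined $lowest || $code < $lowest;
        }
    }

    return @all unless defined $lowest;          # no data, nothing ruled out
    die "Character below ASCII 33 is not valid in FASTQ qualities\n"
        if $lowest < 33;
    return ('fastq')                 if $lowest < 59;   # only offset 33 fits
    return ('fastq', 'fastq-solexa') if $lowest < 64;   # rules out Illumina 1.3+
    return @all;                                 # ambiguous: all three fit
}

print join(', ', plausible_variants('IIII9IG9IC')), "\n";
print join(', ', plausible_variants('hhhhhhhh')),   "\n";
```

A file of uniformly good reads never rules anything out, so the answer stays ambiguous exactly when a wrong guess would be hardest to notice.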

> 2) having a way for the Seq object to either 'know' what format is
> contained, or we use phred score and convert back and forth from that (I
> think the latter makes more sense).

I think it could make sense for BioPerl to convert Solexa scores to/from
PHRED scores on the fly (especially now that Illumina is abandoning
the Solexa score system). Python style tries to avoid implicit conversions,
so Biopython doesn't automatically do a conversion from Solexa to
PHRED scores on parsing (but will on writing if the requested output
format requires this).

> Peter's suggestions also are reasonable, though does biopython have a
> separate module for each of these variations? Our version (I believe)
> mainly varied the conversion within Bio::SeqIO::fastq itself based on the
> fastq variant passed in as a separate named argument.

Biopython's SeqIO gives the three FASTQ variants their own unique
names. This format name is a required argument for parsing/writing
(we don't try and guess the file format from the data contents). Internally
we have three separate FASTQ parsers/writers although they do share
code.

Other issues to keep in mind:

(3) There should be no warning parsing files where the optional repeated
title is missing on the "+" lines (as discussed earlier on the BioPerl list).

(4) When writing FASTQ files should BioPerl omit the optional repeated
title on the "+" line? Biopython omits this as I understand this to be
common practice, and it can make a big difference to file sizes, especially
on short read data from Solexa/Illumina.

(5) Also test reading and writing files with an optional description (as well
as an identifier) on the "@" (and "+") lines. See the NCBI SRA for examples,
e.g.

@SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
GGGTGATGGCCGCTGCCGATGGCGTCAAATCCCACC
+SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
IIIIIIIIIIIIIIIIIIIIIIIIIIIIII9IG9IC


(6) Test reading and writing files where the encoded quality string starts
with a "@" or a "+" character, e.g.
http://lists.open-bio.org/pipermail/bioperl-l/2009-May/029911.html
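
Points (5) and (6) can be exercised with a small reader like the one
sketched here. This is only an illustration, and it assumes unwrapped
FASTQ (exactly four lines per record), which is not guaranteed by the
original format; the function name is hypothetical and this is not the
Bio::SeqIO::fastq parser. Because records are read by position, a
quality line starting with "@" or "+" is never mistaken for a title.

```perl
use strict;
use warnings;

# Hypothetical helper: read one 4-line FASTQ record from a filehandle.
# Returns a hash reference, or nothing at end of file.
sub next_fastq_record {
    my ($fh) = @_;
    defined(my $title = <$fh>) or return;
    my $seq  = <$fh>;
    my $plus = <$fh>;
    my $qual = <$fh>;
    die "Truncated FASTQ record\n" unless defined $qual;
    chomp($title, $seq, $plus, $qual);

    # Title line: identifier, then an optional free-text description
    die "Expected '\@' title line, got: $title\n"
        unless $title =~ /^\@(\S*)\s*(.*)$/;
    my ($id, $desc) = ($1, $2);

    # The repeated title after '+' is optional, so only check the '+'
    die "Expected '+' line, got: $plus\n" unless $plus =~ /^\+/;
    die "Sequence and quality lengths differ for $id\n"
        unless length($seq) == length($qual);

    return { id => $id, desc => $desc, seq => $seq, qual => $qual };
}

# Two made-up records whose quality strings start with '@' and '+'
my $data = <<'END';
@read1 some description
ACGT
+
@+II
@read2
GGCC
+read2
+@@I
END

open my $fh, '<', \$data or die $!;
while (my $rec = next_fastq_record($fh)) {
    print "$rec->{id}\t$rec->{seq}\t$rec->{qual}\n";
}
```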

Peter


From tristan.lefebure at gmail.com  Wed Jun 17 09:27:12 2009
From: tristan.lefebure at gmail.com (Tristan Lefebure)
Date: Wed, 17 Jun 2009 09:27:12 -0400
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <92C15E3391F64BAF801754E924122540@NewLife>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<92C15E3391F64BAF801754E924122540@NewLife>
Message-ID: <200906170927.13273.tristan.lefebure@gmail.com>

Hello,
Regarding next-gen sequences and bioperl, following my 
experience, another issue is bioperl speed. For example, if 
you want to trim bad quality bases at ends of 1E6 Solexa 
reads using Bio::SeqIO::fastq and some methods in 
Bio::Seq::Quality, well, you've got to be patient (but 
maybe I missed some shortcuts...).

A pure perl solution will be between 100 and 1000x faster... 
Would it be possible to have an ultra-light quality object 
with few simple methods for next-gen reads?
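
To make the comparison concrete, a pure perl end-trimmer might look
something like the sketch below. The function name, the threshold and
the ASCII offset of 33 (Sanger style; Solexa/Illumina files would need
64) are assumptions for illustration only, and no timing is implied.

```perl
use strict;
use warnings;

# Hypothetical helper: drop bases scoring below $min_q from both ends.
# $offset is the ASCII offset of the quality encoding.
sub trim_ends {
    my ($seq, $qual, $min_q, $offset) = @_;
    my @q = map { $_ - $offset } unpack('C*', $qual);

    my ($first, $last) = (0, $#q);
    $first++ while $first <= $last  && $q[$first] < $min_q;
    $last--  while $last  >= $first && $q[$last]  < $min_q;

    # $len is 0 when every base is below the threshold
    my $len = $last - $first + 1;
    return (substr($seq, $first, $len), substr($qual, $first, $len));
}

my ($seq, $qual) = trim_ends('ACGTACGT', '##IIII#!', 20, 33);
print "$seq\n$qual\n";
```

Working on plain strings and returning plain strings avoids creating
any objects per read, which is the point of the comparison.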

I can contribute some tests if that sounds like an important 
point.

-Tristan


On Wednesday 17 June 2009 08:02:11 Mark A. Jensen wrote:
> Elia--
> I say a definite +1; in fact, this sounds like it should
> be a Hot Topic (see
> http://www.bioperl.org/wiki/Category:Hot_Topics for some
> others you might have missed in your hiatus...). I will
> create a page that can be a central point for wish lists,
> discussion, etc.
>
> There has been much discussion of late about FASTQ
> http://lists.open-bio.org/pipermail/bioperl-l/2009-June/030187.html
> http://lists.open-bio.org/pipermail/bioperl-l/2009-May/029970.html
> http://lists.open-bio.org/pipermail/bioperl-l/2009-May/029911.html
> http://lists.open-bio.org/pipermail/bioperl-l/2009-April/029765.html
>
> cheers from a newbie,
> Mark
>
> ----- Original Message -----
> From: "Elia Stupka" <e.stupka at ucl.ac.uk>
> To: <bioperl-l at lists.open-bio.org>
> Sent: Wednesday, June 17, 2009 7:29 AM
> Subject: [Bioperl-l] Next-gen modules
>
> > Dear all,
> >
> > after several years of absence I am slowly coming back
> > to Bioperl, and hope to contribute again to its
> > development.
> >
> > One area that I was thinking of starting from, since we
> > are actively involved with it, is to improve Bioperl's
> > support for next-gen sequencing data, tools, etc. Since
> > I am sure I have missed out on a lot of recent
> > developments, do let me know if/what is useful.
> >
> > One example that comes to mind is that the conversion
> > of various formats to/from FASTQ does not seem to be
> > supported. Some code can be found within Li Heng's
> > script: http://maq.sourceforge.net/fq_all2std.pl but
> > it would be good if it could make its way into SeqIO?
> > And similarly, potentially, for other next-gen sequence
> > formats?
> >
> > Similarly, there seems to be little in bioperl-run to
> > support tools that have been developed in this area,
> > such as Maq, BowTie, TopHat, etc?
> >
> > Do let me know if there is a past thread on this, or
> > other people actively developing, etc. so that I can
> > find out what priorities are.
> >
> > thanks and best regards to all (old friends and new),
> >
> > Elia
> >
> > ---
> > Senior Lecturer, Bioinformatics
> > UCL Cancer Institute
> > Paul O' Gorman Building
> > University College London
> > Gower Street
> > WC1E 6BT
> > London
> > UK
> >
> > Office (UCL): +44 207 679 6493
> > Office (ICMS): +44 0207 8822374
> >
> > Mobile: +44 7597 566 194
> > Mobile (Italy): +39 338 8448801
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From biopython at maubp.freeserve.co.uk  Wed Jun 17 09:54:45 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 17 Jun 2009 14:54:45 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <E59DEB0A-9D02-4922-A300-0F81B809D6D4@ucl.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<92C15E3391F64BAF801754E924122540@NewLife>
	<E59DEB0A-9D02-4922-A300-0F81B809D6D4@ucl.ac.uk>
Message-ID: <320fb6e00906170654m735dc054iaf94fa2f86647002@mail.gmail.com>

On Wed, Jun 17, 2009 at 1:54 PM, Elia Stupka<e.stupka at ucl.ac.uk> wrote:
>
> Dear Mark,
>
> thanks a lot for the pointers.
>
> With regards to FASTQ parsing:
>
> -my understanding by reading past threads is to work on a single format,
> i.e. FASTQ and to interpret the quality "flavours" as just quality
> conversions, right?
> -However, I assume we would still want to support a simple way for the user
> to say format => 'fastq-solexa' using the nomenclature adopted in BioPython
> suggested by Peter, right?

I think you will need a way for the user to say they have a Solexa, or
an Illumina 1.3+, or an original Sanger standard FASTQ file.

>From reading the http://bioperl.org/wiki/HOWTO:SeqIO wiki page, I
assumed BioPerl's SeqIO just had formats (e.g. the "chadoxml" format
and the variant "flybase_chadoxml" format). Does BioPerl's SeqIO format
system have any concept of flavour that I am not aware of?

> -I also saw Heikki's "long essay", but did not yet compare to Heng Li's code
> at http://maq.sourceforge.net/fq_all2std.pl, I guess we would hope they
> would produce identical outputs, will be a good check.

Heng Li's code at http://maq.sourceforge.net/fq_all2std.pl is a useful
guide (although it doesn't yet cope with the new Illumina 1.3+ variant),
but I don't trust it 100%. See e.g.
http://lists.open-bio.org/pipermail/biopython/2009-June/005208.html
http://lists.open-bio.org/pipermail/biopython/2009-June/005209.html

Peter


From john.marshall at sanger.ac.uk  Wed Jun 17 09:28:12 2009
From: john.marshall at sanger.ac.uk (John Marshall)
Date: Wed, 17 Jun 2009 14:28:12 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
Message-ID: <76D5EDD5-6217-438E-87A5-1B7571D14FFE@sanger.ac.uk>

On 17 Jun 2009, at 12:29, Elia Stupka wrote:
> Similarly, there seems to be little in bioperl-run to support tools  
> that have been developed in this area, such as Maq, BowTie, TopHat,  
> etc?

FYI I have a Bio::Tools::Run::Velvet wrapper [1] that I plan to submit  
in the not too distant future.  (First it needs some "blah blah"  
replaced with actual documentation and a test suite.)

Cheers,

     John

[1] http://www.ebi.ac.uk/~zerbino/velvet/


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 


From Kevin.M.Brown at asu.edu  Wed Jun 17 11:41:18 2009
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Wed, 17 Jun 2009 08:41:18 -0700
Subject: [Bioperl-l] Alignment->slice() issue?
In-Reply-To: <cfebe0020906160106s5f76501au9d67d8ee3062d166@mail.gmail.com>
References: <cfebe0020906160106s5f76501au9d67d8ee3062d166@mail.gmail.com>
Message-ID: <1A4207F8295607498283FE9E93B775B4060B9718@EX02.asurite.ad.asu.edu>

Warning: This is very ugly code and makes a few assumptions, such as the
alignment objects are stored in order of their start position. I made
this assumption as that is how I put them into the object to begin with.

=head1 C<slice>

Function to slice up an alignment based on start and end parameters.
The new alignment object is stored in the scalar referenced by the
fourth argument; the function itself returns 1.

slice(\$alignment, $start, $end, \$new_align)

=cut

use List::Util qw(min max);
use POSIX qw(floor);
use Bio::SimpleAlign;
use Bio::LocatableSeq;

sub slice
{
	# $alignment and $new_align are references to scalars holding
	# Bio::SimpleAlign objects; the result is stored in $$new_align.
	my ($alignment, $start, $end, $new_align) = @_;

	$$new_align = Bio::SimpleAlign->new();
	print $$alignment->no_sequences() . "\n";

	# the first sequence is always kept, cut down to the requested window
	my $first = $$alignment->get_seq_by_pos(1);
	$$new_align->add_seq(
		Bio::LocatableSeq->new(
			-seq      => substr($first->seq(), $start - 1, $end - $start + 1),
			-id       => $first->display_id(),
			-start    => max($first->start - $start + 1, 1),
			-end      => min($first->end - $start + 1, $end - $start + 1),
			-alphabet => 'dna',
			-strand   => $first->strand()
		)
	);

	# implement a binary search to determine a decent offset into
	# the alignment
	my $probe;

	if ($$alignment->no_sequences() <= 2) {
		$probe = $$alignment->no_sequences();
	}
	else {
		my ($L, $R) = (1, $$alignment->no_sequences());
		while (($R - $L) > 1)
		{
			$probe = floor(($R + $L) / 2);

			# gotta watch this.  Had the check backwards and so was
			# never going in the right direction for the search.
			# If I reverse these two variables, then I have to
			# either reverse the conditions or change the > to a <.
			if ($$alignment->get_seq_by_pos($probe)->start() > $start)
			{
				$R = $probe;
			}
			else
			{
				$L = $probe;
			}
		}
	}

	# now go through the results that are after that point
	# ($probe is the last midpoint tested, so it is only an
	# approximate starting offset)
	for (my $i = $probe ; $i <= $$alignment->no_sequences() ; $i++)
	{
		my $seq = $$alignment->get_seq_by_pos($i);
		last if ($seq->start() > $end);

		# Only concern ourselves with primers that land inside the
		# desired region; other primers will show up in the image
		# maps for each gene.
		if ($seq->start() >= $start && $seq->end() <= $end)
		{

			# values for the substr pullout of a given sequence
			# (computed here but not used below)
			my $offset = max($start - $seq->start(), 0);
			my $length =
			  min($end, $seq->end()) - max($start, $seq->start()) + 1;
			$$new_align->add_seq(
				Bio::LocatableSeq->new(
					-seq      => $seq->seq(),
					-id       => $seq->display_id(),
					-start    => max($seq->start - $start + 1, 1),
					-end      => min($seq->end - $start + 1, $end - $start + 1),
					-alphabet => 'dna',
					-strand   => $seq->strand()
				)
			);
		}
	}
	return 1;
}

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Malcolm Cook
> Sent: Tuesday, June 16, 2009 1:07 AM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Alignment->slice() issue?
> 
> Kevin,
> 
> I'm getting struck by this old issue you once coded around.
> 
>       http://bioperl.org/pipermail/bioperl-l/2007-January/024665.html
> 
> Any chance you could share your implementation with  fellow 
> traveller...
> 
> ??
> 
> Thanks,
> 
> Malcolm Cook
> Stowers Institute for Medical Research


From maj at fortinbras.us  Wed Jun 17 12:47:38 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 17 Jun 2009 12:47:38 -0400
Subject: [Bioperl-l] bioperl-dev or branch? : redux
In-Reply-To: <D719EC86-CB16-4875-833B-2818C26030C8@duke.edu>
References: <991fb8210905210826v2a7990c0u90fcb3256f54b7d7@mail.gmail.com>
	<D719EC86-CB16-4875-833B-2818C26030C8@duke.edu>
Message-ID: <6DF025D32D664F61BC64B49184A2E6DD@NewLife>


Hi All, 

I thought I'd revisit this thread, since in the last couple of weeks I
have used both techniques (bioperl-dev and branch from trunk) to
produce completed projects. My thoughts:

Using bioperl-dev was very nice for creating Bio::Search::Tiling, a
new addition to the core api. There was no pressure to conform to the
existing api there. In particular, there was no implicit insistence to
make things work through Bio::Search::Utils, and I was free to factor
it out. The Tiling api was definitely unstable until the end, when it
was ported to the core. As I made regular reports to bioperl-l,
everything was transparent and up front, and I received excellent
suggestions there (as usual). 

For Bio::Restriction, using the branch was just as natural. Here, the
existing structure was well established, and all the work needed to
happen beneath the api. All old t/Restriction tests needed to pass,
and additional ones had to be created for the new functionality. So here, using
bioperl-dev wasn't natural, even though some "experiments" needed to
be tried (some succeeded and some failed, as you can see in the
commentary at Bug #2855). Even though the new code turned out to
require substantial effort, the effort was required to fix a true bug
in the working core, and any fixes needed to work transparently with
respect to the users for whom this bug had not been an issue. Using
the branch made it relatively easy to merge quickly back into the core
when done, and an open branch also exerts a certain psychological
pressure, which is helpful.

Hilmar raised the very good point in the previous discussion that
(essentially) bioperl-dev shouldn't become a sandbox with lots of
unfinished code scraps and derelict stuff that doesn't work. My view
is bioperl-dev will become a sandbox only if we treat it like
one. I've filled out the Bioperl-dev page on the wiki
(http://www.bioperl.org/wiki/Bioperl-dev) with this in mind. Providing
some recognition to devs there whose modules become part of the
core may be a better way to ensure that projects that are started on
bioperl-dev actually get finished, than to prescribe beforehand what
kinds of projects may get started. I believe this follows the adage of
liberality on what is accepted, and strictness on what is emitted.

cheers, 
MAJ


----- Original Message ----- 
From: "Hilmar Lapp" <hlapp at duke.edu>
To: "Chase Miller" <chmille4 at gmail.com>
Cc: "BioPerl List" <bioperl-l at lists.open-bio.org>
Sent: Thursday, May 21, 2009 4:00 PM
Subject: Re: [Bioperl-l] bioperl-dev or branch?


> Moving this question to the BioPerl list, which is where we need to  
> discuss this I think. Can someone refresh my memory on what the  
> Bioperl-dev repository is or was meant for? It doesn't seem documented  
> on the wiki.
> 
> My (admittedly vague) recollection is that bioperl-dev is basically  
> for highly experimental changes or functionality.
> 
> I'm not clear why everything else shouldn't go either into the main  
> trunk or into a branch. If there is a realistic expectation for  
> something to be folded into the main trunk sooner or later, what would  
> be the reasons for not putting it into a branch of the main  
> repository? If we are putting it into a separate repository, we're  
> waiving a lot of svn's support for merging and resolving concurrent  
> edits.
> 
> I would actually go a step further and suggest that even if  
> this GSoC project starts out on a branch (which I can see good reasons  
> for, such as eliminating fear to disrupt something), there should be a  
> plan to move to main trunk before the end of the project. We've had a  
> good tradition in BioPerl of developing directly on the main trunk. It  
> sometimes leads to occasional disruptions when lots of tests seem to be  
> failing, but it also encourages development discipline and makes new  
> code melt into the BioPerl code base without requiring any extra  
> steps by someone.
> 
> Any and all thoughts or comments welcome and appreciated!
> 
> -hilmar
> 
> On May 21, 2009, at 11:26 AM, Chase Miller wrote:
> 
>> This brings me to a question about where I should have my code  
>> repository.  Originally, I was going to use Bioperl-dev, but it was  
>> brought to my attention that that repository does not normally  
>> receive daily updates and it might not be the right place for my day  
>> to day development.  An alternative would be to use something like  
>> google code on a daily basis and commit to Bioperl-dev on a weekly  
>> basis.
> 
> -- 
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:- hlapp at duke dot edu :
> ===========================================================
> 
> 
> 
> 


From cjfields at illinois.edu  Wed Jun 17 13:06:44 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 17 Jun 2009 12:06:44 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<BF526BE6-AFA8-4F66-9665-D6328F9B4FFA@illinois.edu>
	<320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com>
Message-ID: <1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>


On Jun 17, 2009, at 8:25 AM, Peter wrote:

> On Wed, Jun 17, 2009 at 1:57 PM, Chris Fields<cjfields at illinois.edu>  
> wrote:
>>
>> Elia,
>>
>> As Mark indicated, we recently discussed the lack of support for  
>> next-gen on
>> list, at least re: fastq.  I may be hit with the same thing in a  
>> few months
>> time myself, and I recall Jason and a few others also mentioning  
>> the same.
>>  Heikki wrote some code for Illumina FASTQ for SeqIO and related  
>> modules but
>> I don't believe it has been committed to trunk yet, so maybe he can  
>> answer.
>>
>> From prior discussions IIRC the issues were:
>>
>> 1) distinguishing the various FASTQ versions (Sanger, Illumina 1.0,  
>> Illumina
>> 1.3) from one another (so maybe some optional validation), and
>
> Following the python rule of thumb for being explicit, Biopython makes
> the user specify which FASTQ variant is being used. I don't think you
> can do anything else. Any attempted validation would have to be
> heuristic based on the ASCII characters found, and would risk false
> positive warnings.

Right; I'm thinking along the same lines.  If anything the most we  
would allow is some level of validation, so if there were a degree of  
uncertainty about the format one could set a validation flag to check  
bounds during the parse and warn if they are exceeded.
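
Something along these lines is what I have in mind for the bounds
check. Treat it as a sketch only: the variant keys and the function
name are made up, and the ASCII ranges are my assumption of the usual
limits (Sanger: offset 33, PHRED 0 to 93; Solexa: offset 64, scores -5
to 62; Illumina 1.3+: offset 64, PHRED 0 to 62).

```perl
use strict;
use warnings;

# Assumed ASCII limits per variant (hypothetical keys, not BioPerl API)
my %ASCII_RANGE = (
    sanger   => [33, 126],
    solexa   => [59, 126],
    illumina => [64, 126],
);

# Returns the ASCII codes that fall outside the range for the variant
# (in scalar context, the number of such characters).
sub quality_out_of_bounds {
    my ($variant, $qual) = @_;
    my $range = $ASCII_RANGE{$variant}
        or die "Unknown FASTQ variant '$variant'\n";
    my ($lo, $hi) = @$range;
    return grep { $_ < $lo || $_ > $hi } unpack('C*', $qual);
}

my $qual = 'IIII9IG9IC';
for my $variant (sort keys %ASCII_RANGE) {
    my @bad = quality_out_of_bounds($variant, $qual);
    print "Quality string is outside the $variant range\n" if @bad;
}
```

A check like this can only show that a file is inconsistent with the
declared variant; passing it does not prove the declaration is right,
since the ranges overlap.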

>> 2) having a way for the Seq object to either 'know' what format is
>> contained, or we use phred score and convert back and forth from  
>> that (I
>> think the latter makes more sense).
>
> I think it could make sense for BioPerl to convert Solexa scores to/ 
> from
> PHRED scores on the fly (especially now that Illumina is abandoning
> the Solexa score system). Python style tries to avoid implicit  
> conversions,
> so Biopython doesn't automatically do a conversion from Solexa to
> PHRED scores on parsing (but will on writing if the requested output
> format requires this).
>
>> Peter's suggestions also are reasonable, though does biopython have a
>> separate module for each of these variations?  Our version (I  
>> believe)
>> mainly varied the conversion within Bio::SeqIO::fastq itself based  
>> on the
>> fastq variant passed in as a separate named argument.
>
> Biopython's SeqIO gives the three FASTQ variants their own unique
> names. This format name is a required argument for parsing/writing
> (we don't try and guess the file format from the data contents).  
> Internally
> we have three separate FASTQ parsers/writers although they do share
> code.

We could easily do the same if others agree.  Actually, if we  
specified that shorthand for a variant on a format would be designated  
as -format => 'format-variant', I think we could easily hack SeqIO to  
deal with that by splitting on '-' and passing everything to the  
constructor as (-format => 'format', -variant => 'variant').  Very  
little repeated code in this case, just an additional named parameter  
indicating the format variant (and the SeqIO class can do the type  
checking on that within the constructor).
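
Roughly, the splitting could be sketched as below. The helper name is
hypothetical and this is not what Bio::SeqIO currently does; it also
assumes no existing format name contains a '-', which would need
checking first.

```perl
use strict;
use warnings;

# Hypothetical helper: turn -format => 'format-variant' into
# (-format => 'format', -variant => 'variant'), leaving other
# arguments untouched. An explicit -variant is never overwritten.
sub split_format_variant {
    my (%args) = @_;
    if (defined $args{-format} && !defined $args{-variant}) {
        my ($format, $variant) = split /-/, $args{-format}, 2;
        $args{-format}  = $format;
        $args{-variant} = $variant if defined $variant;
    }
    return %args;
}

my %args = split_format_variant(-format => 'fastq-solexa',
                                -file   => 'reads.fq');
print "$_ => $args{$_}\n" for sort keys %args;
```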

> Other issues to keep in mind:
>
> (3) There should be no warning parsing files where the optional  
> repeated
> title is missing on the "+" lines (as discussed earlier on the  
> BioPerl list).

Agreed, though we'll have to check the current fastq parser to see if  
that's currently the case.  I thought that was fixed but maybe not?

> (4) When writing FASTQ files should BioPerl omit the optional repeated
> title on the "+" line? Biopython omits this as I understand this to be
> common practice, and can make a big difference to file sizes -  
> especially
> on short read data from Solexa/Illumina.

Agreed, particularly if it's commonly encountered.

> (5) Also test reading and writing files with an optional description  
> (as well
> as an identifier) on the "@" (and "+") lines. See the NCBI SRA for  
> examples,
> e.g.
>
> @SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
> GGGTGATGGCCGCTGCCGATGGCGTCAAATCCCACC
> +SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
> IIIIIIIIIIIIIIIIIIIIIIIIIIIIII9IG9IC

Should be easy enough to implement with a simple regex.

> (6) Test reading and writing files where the encoded quality string  
> starts
> with a "@" or a "+" character, e.g.
> http://lists.open-bio.org/pipermail/bioperl-l/2009-May/029911.html
>
> Peter

Mark, getting all that? ;>

chris


From cjfields at illinois.edu  Wed Jun 17 13:09:54 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 17 Jun 2009 12:09:54 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <200906170927.13273.tristan.lefebure@gmail.com>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<92C15E3391F64BAF801754E924122540@NewLife>
	<200906170927.13273.tristan.lefebure@gmail.com>
Message-ID: <F8B12CF8-7A35-430F-A233-6DBA2992D19F@illinois.edu>


On Jun 17, 2009, at 8:27 AM, Tristan Lefebure wrote:

> Hello,
> Regarding next-gen sequences and bioperl, following my
> experience, another issue is bioperl speed. For example, if
> you want to trim bad quality bases at ends of 1E6 Solexa
> reads using Bio::SeqIO::fastq and some methods in
> Bio::Seq::Quality, well, you've got to be patient (but
> maybe I missed some shortcuts...).

The key issues affecting speed in bioperl are contained object  
instantiation and inheritance (and between those two, the latter much  
more so as it plays a role with contained objects as well as the  
container).

http://www.bioperl.org/wiki/Why_BioPerl_is_slow

Moose/Perl6 roles/traits are one way around that issue, but we are a  
ways off from getting that running.  I think to get that working  
decently would be a from-ground-up endeavor (see my past posts on  
biomoose/bioperl6).

> A pure perl solution will be between 100 and 1000x faster...
> Would it be possible to have an ultra-light quality object
> with few simple methods for next-gen reads?
>
> I can contribute some tests if that sounds like an important
> point.
>
> -Tristan

The quality objects themselves I don't think are that heavy; I think  
the main impediment is inheritance.  One could get around that a bit  
by using a direct_new method to create a blessed hash directly, then  
reimplement methods to lazily create any objects contained on the fly.
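
A bare-bones sketch of that idea, under the assumption of a standalone
class with no Bio::Root inheritance. The package name, field names and
accessors are invented for illustration; only the direct_new name
comes from the suggestion above.

```perl
use strict;
use warnings;

package Light::Read;    # hypothetical class, not part of BioPerl

# Bless the supplied fields directly: no argument parsing, no
# inherited constructors, no contained objects created up front.
sub direct_new {
    my ($class, %fields) = @_;
    my $self = {%fields};
    return bless $self, $class;
}

sub id  { $_[0]{id} }
sub seq { $_[0]{seq} }

# The numeric scores are built lazily on first access and then cached.
sub qual {
    my ($self) = @_;
    $self->{qual} ||= [
        map { ord($_) - $self->{offset} } split //, $self->{qual_string}
    ];
    return $self->{qual};
}

package main;

my $read = Light::Read->direct_new(
    id          => 'r1',
    seq         => 'ACGT',
    qual_string => 'IIC9',
    offset      => 33,
);
print join(' ', @{ $read->qual }), "\n";
```

Reads that are only passed through never pay for decoding their
quality string, which is where the lazy part helps.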

chris


From bill at genenformics.com  Wed Jun 17 13:03:16 2009
From: bill at genenformics.com (bill at genenformics.com)
Date: Wed, 17 Jun 2009 10:03:16 -0700
Subject: [Bioperl-l] Alignment->slice() issue?
In-Reply-To: <1A4207F8295607498283FE9E93B775B4060B9718@EX02.asurite.ad.asu.edu>
References: <cfebe0020906160106s5f76501au9d67d8ee3062d166@mail.gmail.com>
	<1A4207F8295607498283FE9E93B775B4060B9718@EX02.asurite.ad.asu.edu>
Message-ID: <92dadb76ce7d7b8eeb4644b47ef1a81f.squirrel@mail.dreamhost.com>

Hopefully this is helpful.

http://www.ncbi.nlm.nih.gov/IEB/ToolBox/CPP_DOC/lxr/source/src/objects/seqalign/Dense_seg.cpp#L648

Bill at genenformics

> Warning: This is very ugly code and makes a few assumptions, such as the
> alignment objects are stored in order of their start position. I made
> this assumption as that is how I put them into the object to begin with.
>
> =head1 C<slice>
>
> Function to slice up an alignment based on start and end parameters.
> The new alignment object is stored in the scalar referenced by the
> fourth argument; the function itself returns 1.
>
> slice(\$alignment, $start, $end, \$new_align)
>
> =cut
>
> use List::Util qw(min max);
> use POSIX qw(floor);
> use Bio::SimpleAlign;
> use Bio::LocatableSeq;
>
> sub slice
> {
> 	# $alignment and $new_align are references to scalars holding
> 	# Bio::SimpleAlign objects; the result is stored in $$new_align.
> 	my ($alignment, $start, $end, $new_align) = @_;
>
> 	$$new_align = Bio::SimpleAlign->new();
> 	print $$alignment->no_sequences() . "\n";
>
> 	# the first sequence is always kept, cut down to the requested window
> 	my $first = $$alignment->get_seq_by_pos(1);
> 	$$new_align->add_seq(
> 		Bio::LocatableSeq->new(
> 			-seq      => substr($first->seq(), $start - 1, $end - $start + 1),
> 			-id       => $first->display_id(),
> 			-start    => max($first->start - $start + 1, 1),
> 			-end      => min($first->end - $start + 1, $end - $start + 1),
> 			-alphabet => 'dna',
> 			-strand   => $first->strand()
> 		)
> 	);
>
> 	# implement a binary search to determine a decent offset into
> 	# the alignment
> 	my $probe;
>
> 	if ($$alignment->no_sequences() <= 2) {
> 		$probe = $$alignment->no_sequences();
> 	}
> 	else {
> 		my ($L, $R) = (1, $$alignment->no_sequences());
> 		while (($R - $L) > 1)
> 		{
> 			$probe = floor(($R + $L) / 2);
>
> 			# gotta watch this.  Had the check backwards and so was
> 			# never going in the right direction for the search.
> 			# If I reverse these two variables, then I have to
> 			# either reverse the conditions or change the > to a <.
> 			if ($$alignment->get_seq_by_pos($probe)->start() > $start)
> 			{
> 				$R = $probe;
> 			}
> 			else
> 			{
> 				$L = $probe;
> 			}
> 		}
> 	}
>
> 	# now go through the results that are after that point
> 	# ($probe is the last midpoint tested, so it is only an
> 	# approximate starting offset)
> 	for (my $i = $probe ; $i <= $$alignment->no_sequences() ; $i++)
> 	{
> 		my $seq = $$alignment->get_seq_by_pos($i);
> 		last if ($seq->start() > $end);
>
> 		# Only concern ourselves with primers that land inside the
> 		# desired region; other primers will show up in the image
> 		# maps for each gene.
> 		if ($seq->start() >= $start && $seq->end() <= $end)
> 		{
>
> 			# values for the substr pullout of a given sequence
> 			# (computed here but not used below)
> 			my $offset = max($start - $seq->start(), 0);
> 			my $length =
> 			  min($end, $seq->end()) - max($start, $seq->start()) + 1;
> 			$$new_align->add_seq(
> 				Bio::LocatableSeq->new(
> 					-seq      => $seq->seq(),
> 					-id       => $seq->display_id(),
> 					-start    => max($seq->start - $start + 1, 1),
> 					-end      => min($seq->end - $start + 1, $end - $start + 1),
> 					-alphabet => 'dna',
> 					-strand   => $seq->strand()
> 				)
> 			);
> 		}
> 	}
> 	return 1;
> }
>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org
>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
>> Malcolm Cook
>> Sent: Tuesday, June 16, 2009 1:07 AM
>> To: bioperl-l at lists.open-bio.org
>> Subject: [Bioperl-l] Alignment->slice() issue?
>>
>> Kevin,
>>
>> I'm getting struck by this old issue you once coded around.
>>
>>       http://bioperl.org/pipermail/bioperl-l/2007-January/024665.html
>>
>> Any chance you could share your implementation with  fellow
>> traveller...
>>
>> ??
>>
>> Thanks,
>>
>> Malcolm Cook
>> Stowers Institute for Medical Research


From maj at fortinbras.us  Wed Jun 17 13:13:23 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 17 Jun 2009 13:13:23 -0400
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk><BF526BE6-AFA8-4F66-9665-D6328F9B4FFA@illinois.edu><320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com>
	<1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>
Message-ID: <129A87FC74254873A6CEB1CEB2ADAF6F@NewLife>

I'm on the case! (but maybe not in realtime, today!)

----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "Peter" <biopython at maubp.freeserve.co.uk>
Cc: "BioPerl List" <bioperl-l at lists.open-bio.org>; "Elia Stupka" 
<e.stupka at ucl.ac.uk>; "Heikki Lehvaslaiho" <heikki at sanbi.ac.za>
Sent: Wednesday, June 17, 2009 1:06 PM
Subject: Re: [Bioperl-l] Next-gen modules


>
> On Jun 17, 2009, at 8:25 AM, Peter wrote:
>
>> On Wed, Jun 17, 2009 at 1:57 PM, Chris Fields<cjfields at illinois.edu>  wrote:
>>>
>>> Elia,
>>>
>>> As Mark indicated, we recently discussed the lack of support for  next-gen 
>>> on
>>> list, at least re: fastq.  I may be hit with the same thing in a  few months
>>> time myself, and I recall Jason and a few others also mentioning  the same.
>>>  Heikki wrote some code for Illumina FASTQ for SeqIO and related  modules 
>>> but
>>> I don't believe it has been committed to trunk yet, so maybe he can  answer.
>>>
>>> From prior discussions IIRC the issues were:
>>>
>>> 1) distinguishing the various FASTQ versions (Sanger, Illumina 1.0, 
>>> Illumina
>>> 1.3) from one another (so maybe some optional validation), and
>>
>> Following the python rule of thumb for being explicit, Biopython makes
>> the user specify which FASTQ variant is being used. I don't think you
>> can do anything else. Any attempted validation would have to be
>> heuristic based on the ASCII characters found, and would risk false
>> positive warnings.
>
> Right; I'm thinking along the same lines.  If anything the most we  would 
> allow is some level of validation, so if there were a degree of  uncertainty 
> about the format one could set a validation flag to check  bounds during the 
> parse and warn if they are exceeded.
>
>>> 2) having a way for the Seq object to either 'know' what format is
>>> contained, or we use phred score and convert back and forth from  that (I
>>> think the latter makes more sense).
>>
>> I think it could make sense for BioPerl to convert Solexa scores to/ from
>> PHRED scores on the fly (especially now that Illumina is abandoning
>> the Solexa score system). Python style tries to avoid implicit  conversions,
>> so Biopython doesn't automatically do a conversion from Solexa to
>> PHRED scores on parsing (but will on writing if the requested output
>> format requires this).
>>
>>> Peter's suggestions also are reasonable, though does biopython have a
>>> separate module for each of these variations?  Our version (I  believe)
>>> mainly varied the conversion within Bio::SeqIO::fastq itself based  on the
>>> fastq variant passed in as a separate named argument.
>>
>> Biopython's SeqIO gives the three FASTQ variants their own unique
>> names. This format name is a required argument for parsing/writing
>> (we don't try and guess the file format from the data contents).  Internally
>> we have three separate FASTQ parsers/writers although they do share
>> code.
>
> We could easily do the same if others agree.  Actually, if we  specified that 
> shorthand for a variant on a format would be designated  as -format => 
> 'format-variant', I think we could easily hack SeqIO to  deal with that by 
> splitting on '-' and passing everything to the  constructor as (-format => 
> 'format', -variant => 'variant').  Very  little repeated code in this case, 
> just an additional named parameter  indicating the format variant (and the 
> SeqIO class can do the type  checking on that within the constructor).
>
>> Other issues to keep in mind:
>>
>> (3) There should be no warning parsing files where the optional  repeated
>> title is missing on the "+" lines (as discussed earlier on the  BioPerl 
>> list).
>
> Agreed, though we'll have to check the current fastq parser to see if  that's 
> currently the case.  I thought that was fixed but maybe not?
>
>> (4) When writing FASTQ files should BioPerl omit the optional repeated
>> title on the "+" line? Biopython omits this as I understand this to be
>> common practice, and can make a big difference to file sizes -  especially
>> on short read data from Solexa/Illumina.
>
> Agreed, particularly if it's commonly encountered.
>
>> (5) Also test reading and writing files with an optional description  (as 
>> well
>> as an identifier) on the "@" (and "+") lines. See the NCBI SRA for  examples,
>> e.g.
>>
>> @SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
>> GGGTGATGGCCGCTGCCGATGGCGTCAAATCCCACC
>> +SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
>> IIIIIIIIIIIIIIIIIIIIIIIIIIIIII9IG9IC
>
> Should be easy enough to implement with a simple regex.
>
>> (6) Test reading and writing files where the encoded quality string  starts
>> with a "@" or a "+" character, e.g.
>> http://lists.open-bio.org/pipermail/bioperl-l/2009-May/029911.html
>>
>> Peter
>
> Mark, getting all that? ;>
>
> chris
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> 


From e.stupka at ucl.ac.uk  Wed Jun 17 13:49:38 2009
From: e.stupka at ucl.ac.uk (Elia Stupka)
Date: Wed, 17 Jun 2009 18:49:38 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <F8B12CF8-7A35-430F-A233-6DBA2992D19F@illinois.edu>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<92C15E3391F64BAF801754E924122540@NewLife>
	<200906170927.13273.tristan.lefebure@gmail.com>
	<F8B12CF8-7A35-430F-A233-6DBA2992D19F@illinois.edu>
Message-ID: <F995F500-2E4B-40B8-9C2D-2355E91DF65E@ucl.ac.uk>

I would suggest developing the "standard" version first, then moving  
onto potential optimizations.

When we went through a similar argument in Ensembl about 8 years ago  
we ended up dropping Bio::Root completely...

If one is truly after performance for these large next-gen projects,  
it'd be down to pure piping, shell, and worrying about location and  
copying of files, sticking to systems-level as much as possible, and  
quite far from Bioperl altogether, so I think it's a whole different  
level of optimization issues, probably outside the scope of Bioperl.

Elia

On 17 Jun 2009, at 18:09, Chris Fields wrote:

>
> On Jun 17, 2009, at 8:27 AM, Tristan Lefebure wrote:
>
>> Hello,
>> Regarding next-gen sequences and bioperl, following my
>> experience, another issue is bioperl speed. For example, if
>> you want to trim bad quality bases at ends of 1E6 Solexa
>> reads using Bio::SeqIO::fastq and some methods in
>> Bio::Seq::Quality, well, you've got to be patient (but may
>> be I missed some shortcuts...).
>
> The key issues affecting speed in bioperl are contained object  
> instantiation and inheritance (and between those two, the latter  
> much more so as it plays a role with contained objects as well as  
> the container).
>
> http://www.bioperl.org/wiki/Why_BioPerl_is_slow
>
> Moose/Perl6 roles/traits are one way around that issue, but we are a  
> ways off from getting that running.  I think to get that working  
> decently would be a from-ground-up endeavor (see my past posts on  
> biomoose/bioperl6).
>
>> A pure perl solution will be between 100 to 1000x faster...
>> Would it be possible to have an ultra-light quality object
>> with few simple methods for next-gen reads?
>>
>> I can contribute some tests if that sounds like an important
>> point.
>>
>> -Tristan
>
> The quality objects themselves I don't think are that heavy; I think  
> the main impediment is inheritance.  One could get around that a bit  
> by using a direct_new method to create a blessed hash directly, then  
> reimplement methods to lazily create any objects contained on the fly.
>
> chris

---
Senior Lecturer, Bioinformatics
UCL Cancer Institute
Paul O' Gorman Building
University College London
Gower Street
WC1E 6BT
London
UK

Office (UCL): +44 207 679 6493
Office (ICMS): +44 0207 8822374

Mobile: +44 7597 566 194
Mobile (Italy): +39 338 8448801


From cjfields at illinois.edu  Wed Jun 17 13:52:49 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 17 Jun 2009 12:52:49 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <129A87FC74254873A6CEB1CEB2ADAF6F@NewLife>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk><BF526BE6-AFA8-4F66-9665-D6328F9B4FFA@illinois.edu><320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com>
	<1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>
	<129A87FC74254873A6CEB1CEB2ADAF6F@NewLife>
Message-ID: <16E48B50-88FC-4B6D-9FDD-CF7FDE6BAEAA@illinois.edu>

I think this is a top priority for a fall BioPerl release, maybe 1.6.2  
(I am planning on a summer 1.6.1 release still).  Made it into a bug  
report for tracking:

http://bugzilla.open-bio.org/show_bug.cgi?id=2857

If no one works on this I may take it up after the 1.6.1 release.

chris

On Jun 17, 2009, at 12:13 PM, Mark A. Jensen wrote:

> I'm on the case! (but maybe not in realtime, today!)
>
> ----- Original Message ----- From: "Chris Fields" <cjfields at illinois.edu 
> >
> To: "Peter" <biopython at maubp.freeserve.co.uk>
> Cc: "BioPerl List" <bioperl-l at lists.open-bio.org>; "Elia Stupka" <e.stupka at ucl.ac.uk 
> >; "Heikki Lehvaslaiho" <heikki at sanbi.ac.za>
> Sent: Wednesday, June 17, 2009 1:06 PM
> Subject: Re: [Bioperl-l] Next-gen modules
>
>
>>
>> On Jun 17, 2009, at 8:25 AM, Peter wrote:
>>
>>> On Wed, Jun 17, 2009 at 1:57 PM, Chris  
>>> Fields<cjfields at illinois.edu>  wrote:
>>>>
>>>> Elia,
>>>>
>>>> As Mark indicated, we recently discussed the lack of support for   
>>>> next-gen on
>>>> list, at least re: fastq.  I may be hit with the same thing in a   
>>>> few months
>>>> time myself, and I recall Jason and a few others also mentioning   
>>>> the same.
>>>> Heikki wrote some code for Illumina FASTQ for SeqIO and related   
>>>> modules but
>>>> I don't believe it has been committed to trunk yet, so maybe he  
>>>> can  answer.
>>>>
>>>> From prior discussions IIRC the issues were:
>>>>
>>>> 1) distinguishing the various FASTQ versions (Sanger, Illumina  
>>>> 1.0, Illumina
>>>> 1.3) from one another (so maybe some optional validation), and
>>>
>>> Following the python rule of thumb for being explicit, Biopython  
>>> makes
>>> the user specify which FASTQ variant is being used. I don't think  
>>> you
>>> can do anything else. Any attempted validation would have to be
>>> heuristic based on the ASCII characters found, and would risk false
>>> positive warnings.
>>
>> Right; I'm thinking along the same lines.  If anything the most we   
>> would allow is some level of validation, so if there were a degree  
>> of  uncertainty about the format one could set a validation flag to  
>> check  bounds during the parse and warn if they are exceeded.
>>
>>>> 2) having a way for the Seq object to either 'know' what format is
>>>> contained, or we use phred score and convert back and forth from   
>>>> that (I
>>>> think the latter makes more sense).
>>>
>>> I think it could make sense for BioPerl to convert Solexa scores  
>>> to/ from
>>> PHRED scores on the fly (especially now that Illumina is abandoning
>>> the Solexa score system). Python style tries to avoid implicit   
>>> conversions,
>>> so Biopython doesn't automatically do a conversion from Solexa to
>>> PHRED scores on parsing (but will on writing if the requested output
>>> format requires this).
>>>
>>>> Peter's suggestions also are reasonable, though does biopython  
>>>> have a
>>>> separate module for each of these variations?  Our version (I   
>>>> believe)
>>>> mainly varied the conversion within Bio::SeqIO::fastq itself  
>>>> based  on the
>>>> fastq variant passed in as a separate named argument.
>>>
>>> Biopython's SeqIO gives the three FASTQ variants their own unique
>>> names. This format name is a required argument for parsing/writing
>>> (we don't try and guess the file format from the data contents).   
>>> Internally
>>> we have three separate FASTQ parsers/writers although they do share
>>> code.
>>
>> We could easily do the same if others agree.  Actually, if we   
>> specified that shorthand for a variant on a format would be  
>> designated  as -format => 'format-variant', I think we could easily  
>> hack SeqIO to  deal with that by splitting on '-' and passing  
>> everything to the  constructor as (-format => 'format', -variant =>  
>> 'variant').  Very  little repeated code in this case, just an  
>> additional named parameter  indicating the format variant (and the  
>> SeqIO class can do the type  checking on that within the  
>> constructor).
>>
>>> Other issues to keep in mind:
>>>
>>> (3) There should be no warning parsing files where the optional   
>>> repeated
>>> title is missing on the "+" lines (as discussed earlier on the   
>>> BioPerl list).
>>
>> Agreed, though we'll have to check the current fastq parser to see  
>> if  that's currently the case.  I thought that was fixed but maybe  
>> not?
>>
>>> (4) When writing FASTQ files should BioPerl omit the optional  
>>> repeated
>>> title on the "+" line? Biopython omits this as I understand this  
>>> to be
>>> common practice, and can make a big difference to file sizes -   
>>> especially
>>> on short read data from Solexa/Illumina.
>>
>> Agreed, particularly if it's commonly encountered.
>>
>>> (5) Also test reading and writing files with an optional  
>>> description  (as well
>>> as an identifier) on the "@" (and "+") lines. See the NCBI SRA  
>>> for  examples,
>>> e.g.
>>>
>>> @SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
>>> GGGTGATGGCCGCTGCCGATGGCGTCAAATCCCACC
>>> +SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
>>> IIIIIIIIIIIIIIIIIIIIIIIIIIIIII9IG9IC
>>
>> Should be easy enough to implement with a simple regex.
>>
>>> (6) Test reading and writing files where the encoded quality  
>>> string  starts
>>> with a "@" or a "+" character, e.g.
>>> http://lists.open-bio.org/pipermail/bioperl-l/2009-May/029911.html
>>>
>>> Peter
>>
>> Mark, getting all that? ;>
>>
>> chris
>>


From e.stupka at ucl.ac.uk  Wed Jun 17 14:01:28 2009
From: e.stupka at ucl.ac.uk (Elia Stupka)
Date: Wed, 17 Jun 2009 19:01:28 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <16E48B50-88FC-4B6D-9FDD-CF7FDE6BAEAA@illinois.edu>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk><BF526BE6-AFA8-4F66-9665-D6328F9B4FFA@illinois.edu><320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com>
	<1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>
	<129A87FC74254873A6CEB1CEB2ADAF6F@NewLife>
	<16E48B50-88FC-4B6D-9FDD-CF7FDE6BAEAA@illinois.edu>
Message-ID: <E0FAC5DB-470E-48E1-A30F-B64E2E63EB86@ucl.ac.uk>

If we reach a consensus on how/who/what, I will be happy to contribute  
some coding time in the coming days.

Would it be a good starting point to start adding the different  
formats as named in BioPython, and test support for reading/writing  
them? I could start playing with that.
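
As a rough illustration of what per-variant support involves, here is a
minimal sketch in plain Python (standard library only; it is not BioPerl
or Biopython code). It assumes the variant names sanger, solexa and
illumina, the usual ASCII offsets (33 for Sanger, 64 for Solexa and
Illumina 1.3+), and the standard Solexa-to-PHRED formula. The helper
names split_format, decode_quality and encode_sanger are hypothetical,
and treating a bare 'fastq' as the Sanger variant is an assumption.

```python
import math

# Variant name -> (ASCII offset, score type). Assumed values; check them
# against the format descriptions before relying on this table.
FASTQ_VARIANTS = {
    "sanger": (33, "phred"),
    "solexa": (64, "solexa"),
    "illumina": (64, "phred"),  # Illumina 1.3+
}


def split_format(name):
    """Split 'fastq-illumina' into ('fastq', 'illumina').

    A bare 'fastq' is treated as the Sanger variant (an assumption).
    """
    fmt, _, variant = name.partition("-")
    return fmt, variant or "sanger"


def solexa_to_phred(q):
    """Convert one Solexa score to the equivalent PHRED score."""
    return 10 * math.log10(10 ** (q / 10) + 1)


def decode_quality(qual, variant):
    """Return integer PHRED scores for an encoded quality string."""
    offset, kind = FASTQ_VARIANTS[variant]
    scores = [ord(c) - offset for c in qual]
    if kind == "solexa":
        scores = [round(solexa_to_phred(q)) for q in scores]
    return scores


def encode_sanger(scores):
    """Encode PHRED scores as a Sanger-style quality string."""
    return "".join(chr(q + 33) for q in scores)


fmt, variant = split_format("fastq-solexa")
print(fmt, variant, decode_quality("h@;", variant))  # fastq solexa [40, 3, 1]
```

Keeping the variant as data (one table, one decoder) rather than as
three separate parsers mirrors the "-format => 'format-variant'" idea
from earlier in the thread; whether that suits SeqIO is still open.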

regards,

Elia




From tristan.lefebure at gmail.com  Wed Jun 17 14:09:42 2009
From: tristan.lefebure at gmail.com (Tristan Lefebure)
Date: Wed, 17 Jun 2009 14:09:42 -0400
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <F995F500-2E4B-40B8-9C2D-2355E91DF65E@ucl.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<F8B12CF8-7A35-430F-A233-6DBA2992D19F@illinois.edu>
	<F995F500-2E4B-40B8-9C2D-2355E91DF65E@ucl.ac.uk>
Message-ID: <200906171409.42558.tristan.lefebure@gmail.com>

Thanks both for the light.

That probably means that the place bioperl will take in the
handling of next-gen sequencing raw data (i.e. reads) is
very limited, no? (At least until bioperl6.) A single GA2
Solexa lane generates about 9 million reads, and I would
really not call that a big project...

BTW, is there a simple way to see object instantiation and
inheritance, as well as the time spent in each, when one
calls next_seq() (or any other method)?

-Tristan



From bix at sendu.me.uk  Wed Jun 17 14:20:00 2009
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 17 Jun 2009 19:20:00 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <200906170927.13273.tristan.lefebure@gmail.com>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>	<92C15E3391F64BAF801754E924122540@NewLife>
	<200906170927.13273.tristan.lefebure@gmail.com>
Message-ID: <4A3933D0.4040808@sendu.me.uk>

Tristan Lefebure wrote:
> Hello,
> Regarding next-gen sequences and bioperl, following my 
> experience, another issue is bioperl speed. For example, if 
> you want to trim bad quality bases at ends of 1E6 Solexa 
> reads using Bio::SeqIO::fastq and some methods in 
> Bio::Seq::Quality, well, you've got to be patient (but may 
> be I missed some shortcuts...).

This is my concern as well. Or, rather, is there actually a significant 
set of users out there who are dealing with next-gen sequencing and 
would consider using BioPerl for their work?

I'm working with all the 1000-genomes data at the Sanger, and we at 
least are probably never going to use BioPerl for the work.


> A pure perl solution will be between 100 to 1000x faster... 
> Would it be possible to have an ultra-light quality object 
> with few simple methods for next-gen reads?

The fastq parser itself already seems pretty fast. The way to get the 
speedup is to not create any Bio::Seq* objects but just return the data 
directly. At that point it's not taking much advantage of BioPerl. But 
certainly it could be done...
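
To make the "return the data directly" idea concrete, here is a minimal
sketch in plain Python (standard library only, not the
Bio::SeqIO::fastq implementation). It assumes the common four-line
record layout with no line-wrapped sequence or quality, which is what
lets it cope with quality strings that start with "@" or "+". The
function names read_fastq, write_fastq and trim_ends are hypothetical,
and the default offset of 33 assumes Sanger-style encoding.

```python
def read_fastq(handle):
    """Yield (identifier, description, sequence, quality) tuples.

    Records are read four lines at a time, so a quality line that
    begins with '@' or '+' is never mistaken for a header.
    """
    while True:
        header = handle.readline()
        if not header:
            return
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record near %r" % header)
        if len(seq) != len(qual):
            raise ValueError("sequence/quality length mismatch in %r" % header)
        # The repeated title on the '+' line is optional; no warning if absent.
        if len(plus) > 1 and plus[1:] != header[1:]:
            raise ValueError("'+' line does not match header %r" % header)
        ident, _, desc = header[1:].partition(" ")
        yield ident, desc, seq, qual


def write_fastq(records, handle):
    """Write records, omitting the optional repeated title on '+' lines."""
    for ident, desc, seq, qual in records:
        title = "%s %s" % (ident, desc) if desc else ident
        handle.write("@%s\n%s\n+\n%s\n" % (title, seq, qual))


def trim_ends(seq, qual, min_score, offset=33):
    """Drop bases below min_score from both ends of a read."""
    ok = [ord(c) - offset >= min_score for c in qual]
    if True not in ok:
        return "", ""
    start = ok.index(True)
    end = len(ok) - ok[::-1].index(True)
    return seq[start:end], qual[start:end]
```

Nothing here builds a sequence object per read; each record is a plain
tuple, which is the trade-off described above: faster, but without the
conveniences that BioPerl objects provide.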


From e.stupka at ucl.ac.uk  Wed Jun 17 14:39:08 2009
From: e.stupka at ucl.ac.uk (Elia Stupka)
Date: Wed, 17 Jun 2009 19:39:08 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <200906171409.42558.tristan.lefebure@gmail.com>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<F8B12CF8-7A35-430F-A233-6DBA2992D19F@illinois.edu>
	<F995F500-2E4B-40B8-9C2D-2355E91DF65E@ucl.ac.uk>
	<200906171409.42558.tristan.lefebure@gmail.com>
Message-ID: <8C661293-DF7D-4262-970A-92AF0015BB04@ucl.ac.uk>

We are using bioperl for simple pre- and post-processing of data for  
full Solexa runs, and although it might not be ideal, the scripting  
with Bioperl is not a major killer. When I was referring to large,  
heavy pipelines I was thinking of pipelines that deal with many Solexa  
runs as one project (e.g. 1000 genomes) and really cannot afford any  
bottleneck, because that directly affects their  
storage.

cheers

Elia





From cjfields at illinois.edu  Wed Jun 17 14:40:05 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 17 Jun 2009 13:40:05 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <200906171409.42558.tristan.lefebure@gmail.com>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<F8B12CF8-7A35-430F-A233-6DBA2992D19F@illinois.edu>
	<F995F500-2E4B-40B8-9C2D-2355E91DF65E@ucl.ac.uk>
	<200906171409.42558.tristan.lefebure@gmail.com>
Message-ID: <63B608B2-8DE0-4FD1-9E15-339FD226D7AB@illinois.edu>

On Jun 17, 2009, at 1:09 PM, Tristan Lefebure wrote:

> Thanks both for the light.
>
> That probably means that the place bioperl will take in the
> handling of the next-gen sequencing raw data (i.e. reads) is
> very limited, nope? (at least until bioperl6). A single GA2
> solexa lane generates about 9 million reads, and I would
> really not called that a big project...

I don't think it's impossible.  If you parse any very long list of  
sequences in order it will be very slow, yes, but if they were indexed  
or loaded into a DB, lookups would of course be orders of magnitude faster.

We already have perl-based indexing for fastq (Bio::Index::Fastq), so  
maybe something could be built on top of that. I haven't looked but we  
can also wrap other C/C++-based parsers as well. BioLib, for instance,  
has bindings to io_lib, so maybe that could be (ab)used in some way.
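
The indexing idea can be sketched in a few lines of plain Python
(standard library only). This does not reproduce the API or on-disk
format of Bio::Index::Fastq; it is a hypothetical in-memory version
that records the byte offset of each record so that a single read can
be fetched with one seek instead of a scan. It assumes four-line
records, and the names build_index and fetch are made up for the
example.

```python
def build_index(path):
    """Map each read identifier to the byte offset of its '@' line."""
    index = {}
    with open(path, "rb") as fh:
        while True:
            offset = fh.tell()
            header = fh.readline()
            if not header:
                break
            if not header.strip():
                continue  # skip blank lines between records
            # Identifier is the first whitespace-delimited token after '@'.
            ident = header[1:].split(None, 1)[0].decode("ascii")
            index[ident] = offset
            for _ in range(3):  # skip sequence, '+' and quality lines
                fh.readline()
    return index


def fetch(path, index, ident):
    """Return (title, sequence, quality) for one read via a direct seek."""
    with open(path, "rb") as fh:
        fh.seek(index[ident])
        lines = [fh.readline().decode("ascii").rstrip("\r\n") for _ in range(4)]
    return lines[0][1:], lines[1], lines[3]
```

Building the index still costs one full pass over the file, so this
only pays off when the same file is queried repeatedly; for millions of
reads the index would also need to live on disk rather than in a dict.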

> BTW, is there a simple way to see object instantiation and
> inheritance, as well as time consumption for each, when once
> calls next_seq() (or any other method)?
>
> -Tristan

As a simple benchmark, at one point all feature tag information was  
converted into Bio::Annotations.  I reverted that behavior to be  
simple tag/value again and had a pretty decent bump:

http://www.bioperl.org/wiki/Feature_Annotation_rollback#Simple_Benchmark

Also, I tried reimplementing some parsers as generic 'event'-based  
driver/handler and they were slightly faster, the key roadblock being  
instantiation again.  If I didn't create Features/Annotations I saw a  
significant speedup.  That's not entirely unexpected, as SeqFeatures  
also contain Locations (in turn that can contain subLocations) and  
(until recently) tag-based Bio::Annotation by default.  Annotations  
are collected in an Annotation::Collection and can contain other  
objects I believe (Ontology terms, etc).

The overall lesson is, if you don't have very heavy objects being  
created the overhead is actually quite small; it's only when you  
greedily instantiate everything that you run into problems.
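
A small sketch of the lazy alternative, in plain Python rather than
Perl and not taken from any BioPerl class: the object stores the raw
strings it was given and only builds the decoded score list the first
time it is asked for. The class name Read, its attributes and the
default offset of 33 (Sanger-style encoding) are assumptions made for
the example.

```python
class Read:
    """Lightweight read that decodes its quality scores on first use."""

    __slots__ = ("ident", "seq", "_qual", "_offset", "_scores")

    def __init__(self, ident, seq, qual, offset=33):
        self.ident = ident
        self.seq = seq
        self._qual = qual
        self._offset = offset
        self._scores = None  # nothing decoded yet

    @property
    def scores(self):
        """Integer scores, built lazily and cached after the first call."""
        if self._scores is None:
            self._scores = [ord(c) - self._offset for c in self._qual]
        return self._scores


read = Read("r1", "ACGT", "II9I")
print(read.scores)  # [40, 40, 24, 40]
```

A pipeline that only filters on identifiers never pays for the
decoding, which is the same trade-off as creating contained objects on
the fly instead of in the constructor.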

chris


From cjfields at illinois.edu  Wed Jun 17 15:05:03 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 17 Jun 2009 14:05:03 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <F995F500-2E4B-40B8-9C2D-2355E91DF65E@ucl.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<92C15E3391F64BAF801754E924122540@NewLife>
	<200906170927.13273.tristan.lefebure@gmail.com>
	<F8B12CF8-7A35-430F-A233-6DBA2992D19F@illinois.edu>
	<F995F500-2E4B-40B8-9C2D-2355E91DF65E@ucl.ac.uk>
Message-ID: <E92652A7-7622-4183-8DC3-596E6593C587@illinois.edu>

On Jun 17, 2009, at 12:49 PM, Elia Stupka wrote:

> I would suggest developing the "standard" version first, then moving  
> onto potential optimizations.

Yes, agreed.

> When we went through a similar argument in Ensembl about 8 years ago  
> we ended up dropping Bio::Root completely...

They (strangely enough) still use it in a few modules and require  
bioperl 1.2.3, but (in my experience) the latest bioperl works just  
fine.  I asked about that and never got a response.

> If one is truly after performance for these large next-gen projects,  
> it'd be down to pure piping, shell, and worrying about location and  
> copying of files, sticking to systems-level as much as possible, and  
> quite far from Bioperl altogether, so I think it's a whole different  
> level of optimization issues, probably outside the scope of Bioperl.
>
> Elia

In the end I don't think we can run it using perl alone, no, and I  
believe using BioPerl by itself will not be the optimal solution, but  
it can probably interface with something that is.

chris


From e.stupka at ucl.ac.uk  Wed Jun 17 15:14:04 2009
From: e.stupka at ucl.ac.uk (Elia Stupka)
Date: Wed, 17 Jun 2009 20:14:04 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <76D5EDD5-6217-438E-87A5-1B7571D14FFE@sanger.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<76D5EDD5-6217-438E-87A5-1B7571D14FFE@sanger.ac.uk>
Message-ID: <9AC2CFC1-D7E7-4B93-9671-65C30E5AA285@ucl.ac.uk>

Excellent, I was thinking of working on Maq and BowTie as priorities.

Elia

On 17 Jun 2009, at 14:28, John Marshall wrote:

> On 17 Jun 2009, at 12:29, Elia Stupka wrote:
>> Similarly, there seems to be little in bioperl-run to support tools  
>> that have been developed in this area, such as Maq, BowTie, TopHat,  
>> etc?
>
> FYI I have a Bio::Tools::Run::Velvet wrapper [1] that I plan to  
> submit in the not too distant future.  (First it needs some "blah  
> blah" replaced with actual documentation and a test suite.)
>
> Cheers,
>
>    John
>
> [1] http://www.ebi.ac.uk/~zerbino/velvet/
>
>

---
Senior Lecturer, Bioinformatics
UCL Cancer Institute
Paul O' Gorman Building
University College London
Gower Street
WC1E 6BT
London
UK

Office (UCL): +44 207 679 6493
Office (ICMS): +44 0207 8822374

Mobile: +44 7597 566 194
Mobile (Italy): +39 338 8448801


From michael.watson at bbsrc.ac.uk  Wed Jun 17 15:15:20 2009
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Wed, 17 Jun 2009 20:15:20 +0100
Subject: [Bioperl-l] Next-gen modules
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>	<92C15E3391F64BAF801754E924122540@NewLife>
	<200906170927.13273.tristan.lefebure@gmail.com>
	<4A3933D0.4040808@sendu.me.uk>
Message-ID: <8975119BCD0AC5419D61A9CF1A923E9508B291F1@iahce2ksrv1.iah.bbsrc.ac.uk>

In answer to your question, yes!  We have 6 Illumina datasets which we
have searched against sequence databases using FASTA, and I used SearchIO
to parse the results.  This is where BioPerl comes into its own: wrapped
around fast, optimised solutions written in C or Java.  Sure, I could
have written something in sed/awk/pure perl/C etc. to parse out the
information I needed faster, but the SearchIO solution only took a few
minutes to parse a huge FASTA results file, and for me (and many others,
I suspect) a few minutes is not a problem.

 
________________________________

From: bioperl-l-bounces at lists.open-bio.org on behalf of Sendu Bala
Sent: Wed 17/06/2009 7:20 PM
To: tristan.lefebure at gmail.com
Cc: bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] Next-gen modules


Tristan Lefebure wrote:
> Hello,
> Regarding next-gen sequences and bioperl, following my
> experience, another issue is bioperl speed. For example, if
> you want to trim bad quality bases at ends of 1E6 Solexa
> reads using Bio::SeqIO::fastq and some methods in
> Bio::Seq::Quality, well, you've got to be patient (but may
> be I missed some shortcuts...).

This is my concern as well. Or, rather, is there actually a significant
set of users out there who are dealing with next-gen sequencing and
would consider using BioPerl for their work?

I'm working with all the 1000-genomes data at the Sanger, and we at
least are probably never going to use BioPerl for the work.


> A pure perl solution will be between 100 to 1000x faster...
> Would it be possible to have an ultra-light quality object
> with few simple methods for next-gen reads?

The fastq parser itself already seems pretty fast. The way to get the
speedup is to not create any Bio::Seq* objects but just return the data
directly. At that point it's not taking much advantage of BioPerl. But
certainly it could be done...


From cjfields at illinois.edu  Wed Jun 17 15:30:15 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 17 Jun 2009 14:30:15 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <4A3933D0.4040808@sendu.me.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>	<92C15E3391F64BAF801754E924122540@NewLife>
	<200906170927.13273.tristan.lefebure@gmail.com>
	<4A3933D0.4040808@sendu.me.uk>
Message-ID: <8E010693-30B2-4E54-81CA-97755FC6CAE5@illinois.edu>

On Jun 17, 2009, at 1:20 PM, Sendu Bala wrote:

> Tristan Lefebure wrote:
>> Hello,
>> Regarding next-gen sequences and bioperl, following my experience,  
>> another issue is bioperl speed. For example, if you want to trim  
>> bad quality bases at ends of 1E6 Solexa reads using  
>> Bio::SeqIO::fastq and some methods in Bio::Seq::Quality, well,  
>> you've got to be patient (but may be I missed some shortcuts...).
>
> This is my concern as well. Or, rather, is there actually a  
> significant set of users out there who are dealing with next-gen  
> sequencing and would consider using BioPerl for their work?
>
> I'm working with all the 1000-genomes data at the Sanger, and we at  
> least are probably never going to use BioPerl for the work.

Are you using pure perl or (gasp) something else?  ;>

Judging by the feedback there are definitely a set of users who would  
like to integrate nextgen into bioperl somehow, probably to take  
advantage of other aspects of bioperl.

>> A pure perl solution will be between 100 to 1000x faster... Would  
>> it be possible to have an ultra-light quality object with few  
>> simple methods for next-gen reads?
>
> The fastq parser itself already seems pretty fast. The way to get  
> the speedup is to not create any Bio::Seq* objects but just return  
> the data directly. At that point it's not taking much advantage of  
> BioPerl. But certainly it could be done...


I suppose the best way to assess what needs to be done is come up with  
a set of 'use cases' specifying what users want so we can design  
around them, otherwise we're shooting in the dark.

I'm personally wondering if this could be done as a sequence database,  
something similar in theme to Lincoln's SeqFeature::Store, but  
sequence only, and returns quality objects in a similar manner (ala  
Storable)?  Not sure whether that's feasible, but it appears at  
least scalable.
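
The idea of a sequence-only store that hands back lightweight quality
records "ala Storable" could be sketched roughly as follows. This is only
an illustration under stated assumptions: the function names put_read and
get_read are hypothetical, a plain hash stands in for the real database
(a tied DBM hash or an SQL table would take its place), and qualities are
assumed to be Sanger-encoded (ASCII offset 33). It is not how
SeqFeature::Store or any existing BioPerl module is implemented.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

# Serialise one read as a plain hash (no Bio::Seq* objects) and file it
# under its id. $store is any hash reference; here it stands in for a
# keyed database. The offset of 33 is an assumption (Sanger encoding).
sub put_read {
    my ($store, $id, $seq, $qual) = @_;
    my $record = {
        id   => $id,
        seq  => $seq,
        qual => [ map { ord($_) - 33 } split //, $qual ],
    };
    $store->{$id} = freeze($record);
    return;
}

# Fetch one read by id and rebuild the record; returns undef if absent.
sub get_read {
    my ($store, $id) = @_;
    return unless exists $store->{$id};
    return thaw($store->{$id});
}
```

A caller would do something like put_read(\%db, 'read1', 'ACGT', 'IIII')
and later get_read(\%db, 'read1'), getting back a hash reference with the
sequence and a numeric quality array. Only the requested record is
deserialised, which is where any scalability would come from.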

chris


From e.stupka at ucl.ac.uk  Wed Jun 17 15:37:26 2009
From: e.stupka at ucl.ac.uk (Elia Stupka)
Date: Wed, 17 Jun 2009 20:37:26 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <4C3D793879C64A5E84C67FE313C86FA4@NewLife>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<4C3D793879C64A5E84C67FE313C86FA4@NewLife>
Message-ID: <540FFE96-A177-4A56-A574-50052569F39E@ucl.ac.uk>

Dear all,

I tried to summarize today's discussion with what seems to be the  
"shaping consensus" on the Wiki page:

http://www.bioperl.org/wiki/Nextgen_in_Bioperl

good night,

Elia


On 17 Jun 2009, at 13:19, Mark A. Jensen wrote:

> [ and here's a new wikipage: http://www.bioperl.org/wiki/Nextgen_in_Bioperl ]
> ----- Original Message ----- From: "Elia Stupka" <e.stupka at ucl.ac.uk>
> To: <bioperl-l at lists.open-bio.org>
> Sent: Wednesday, June 17, 2009 7:29 AM
> Subject: [Bioperl-l] Next-gen modules
>
>
>> Dear all,
>> after several years of absence I am slowly coming back to Bioperl,
>> and hope to contribute again to its development.
>> One area that I was thinking of starting from, since we are
>> actively involved with it, is to improve Bioperl's support for
>> next-gen sequencing data, tools, etc. Since I am sure I have missed
>> out on a lot of recent developments, do let me know if/what is useful.
>> One example that comes to mind is that the conversion of various
>> formats to/from FASTQ does not seem to be supported. Some code can
>> be found within Li Heng's script:
>> http://maq.sourceforge.net/fq_all2std.pl
>> but it would be good if it could make its way into SeqIO? And
>> similarly, potentially, for other next-gen sequence formats?
>> Similarly, there seems to be little in bioperl-run to support
>> tools that have been developed in this area, such as Maq, BowTie,
>> TopHat, etc?
>> Do let me know if there is a past thread on this, or other people
>> actively developing, etc. so that I can find out what priorities are.
>> thanks and best regards to all (old friends and new),
>> Elia


From e.stupka at ucl.ac.uk  Wed Jun 17 16:06:35 2009
From: e.stupka at ucl.ac.uk (Elia Stupka)
Date: Wed, 17 Jun 2009 21:06:35 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <8E010693-30B2-4E54-81CA-97755FC6CAE5@illinois.edu>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>	<92C15E3391F64BAF801754E924122540@NewLife>
	<200906170927.13273.tristan.lefebure@gmail.com>
	<4A3933D0.4040808@sendu.me.uk>
	<8E010693-30B2-4E54-81CA-97755FC6CAE5@illinois.edu>
Message-ID: <0500F8EB-F3BD-44E8-801A-7A45783DA40F@ucl.ac.uk>

Interesting that you mention the database issue. We found that for  
specific memory/CPU-intensive things we also switched to using dbs. For  
example, after many years of loyal use of disconnected_ranges we  
switched to a simple SQL implementation of it, because of the large  
performance gains it would give us.  Similarly in Ensembl as well as  
in the old days of bioperl-db we opted for doing subseq within SQL  
where possible.

Some lean way of SQL'izing specific components could be less  
"disruptive" than avoiding object creation and provide significant  
gains in performance. Could be set as an optional flag, and could use  
temporary ad hoc SQL databases?

Still, priority now is to make SeqIO compliant with all those formats,  
then we can worry about performance :)
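
As an illustration of what "compliant with all those formats" involves
for FASTQ, here is a minimal pure-perl sketch of one such conversion:
rescaling early Solexa quality strings to Sanger-style FASTQ qualities.
The function names are hypothetical, and the sketch assumes the input
uses Solexa-scaled scores at ASCII offset 64 (later Illumina pipelines
use Phred-scaled scores at the same offset, which would only need the
offset changed). It is not the SeqIO implementation.

```perl
use strict;
use warnings;

# Convert one Solexa-scaled score to the Phred scale:
#   Q_phred = 10 * log10(10^(Q_solexa/10) + 1)
sub solexa_to_phred {
    my ($q_solexa) = @_;
    return 10 * log(10 ** ($q_solexa / 10) + 1) / log(10);
}

# Convert a whole quality string from Solexa encoding (assumed ASCII
# offset 64) to Sanger encoding (ASCII offset 33), rounding each score
# to the nearest integer.
sub solexa_qual_to_sanger {
    my ($qual) = @_;
    return join '', map {
        my $phred = int(solexa_to_phred(ord($_) - 64) + 0.5);
        chr($phred + 33);
    } split //, $qual;
}
```

The sequence and header lines of each record are left untouched by this
kind of conversion; only the quality line changes.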

Elia



From maj at fortinbras.us  Wed Jun 17 16:29:31 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 17 Jun 2009 16:29:31 -0400
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <540FFE96-A177-4A56-A574-50052569F39E@ucl.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk><4C3D793879C64A5E84C67FE313C86FA4@NewLife>
	<540FFE96-A177-4A56-A574-50052569F39E@ucl.ac.uk>
Message-ID: <1C89D353AD0B4D219515BF1EAAA1FFB5@NewLife>

Thanks Elia for those wiki notes--
[I would say you received an enthusiastic 'welcome back'!]
cheers, 
Mark


From cjfields at illinois.edu  Wed Jun 17 16:35:38 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 17 Jun 2009 15:35:38 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <0500F8EB-F3BD-44E8-801A-7A45783DA40F@ucl.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>	<92C15E3391F64BAF801754E924122540@NewLife>
	<200906170927.13273.tristan.lefebure@gmail.com>
	<4A3933D0.4040808@sendu.me.uk>
	<8E010693-30B2-4E54-81CA-97755FC6CAE5@illinois.edu>
	<0500F8EB-F3BD-44E8-801A-7A45783DA40F@ucl.ac.uk>
Message-ID: <3C32922E-690B-4246-BD44-7600153E57C7@illinois.edu>

So, #1 priority is to get fastq up-to-speed, then maybe assess other  
options.

Illuminating discussion, thanks Elia!

urgh, excuse unintended bad pun above...

chris



From e.stupka at ucl.ac.uk  Wed Jun 17 16:36:31 2009
From: e.stupka at ucl.ac.uk (Elia Stupka)
Date: Wed, 17 Jun 2009 21:36:31 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <3C32922E-690B-4246-BD44-7600153E57C7@illinois.edu>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>	<92C15E3391F64BAF801754E924122540@NewLife>
	<200906170927.13273.tristan.lefebure@gmail.com>
	<4A3933D0.4040808@sendu.me.uk>
	<8E010693-30B2-4E54-81CA-97755FC6CAE5@illinois.edu>
	<0500F8EB-F3BD-44E8-801A-7A45783DA40F@ucl.ac.uk>
	<3C32922E-690B-4246-BD44-7600153E57C7@illinois.edu>
Message-ID: <69C726B6-41DC-41EC-9BC9-DFEC0267CD3B@ucl.ac.uk>

Better than colorspaced discussions for sure ;)

Elia



From maj at fortinbras.us  Wed Jun 17 16:54:00 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 17 Jun 2009 16:54:00 -0400
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <3C32922E-690B-4246-BD44-7600153E57C7@illinois.edu>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>	<92C15E3391F64BAF801754E924122540@NewLife><200906170927.13273.tristan.lefebure@gmail.com><4A3933D0.4040808@sendu.me.uk><8E010693-30B2-4E54-81CA-97755FC6CAE5@illinois.edu><0500F8EB-F3BD-44E8-801A-7A45783DA40F@ucl.ac.uk>
	<3C32922E-690B-4246-BD44-7600153E57C7@illinois.edu>
Message-ID: <2B2A7A587B0F488DAA18E80A1BFD671B@NewLife>

unintended! Does that mean your delete key's broke...?


From hartzell at alerce.com  Wed Jun 17 16:40:03 2009
From: hartzell at alerce.com (George Hartzell)
Date: Wed, 17 Jun 2009 13:40:03 -0700
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <4A3933D0.4040808@sendu.me.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<92C15E3391F64BAF801754E924122540@NewLife>
	<200906170927.13273.tristan.lefebure@gmail.com>
	<4A3933D0.4040808@sendu.me.uk>
Message-ID: <19001.21667.127519.462899@already.dhcp.gene.com>

Sendu Bala writes:
 > Tristan Lefebure wrote:
 > > Hello,
 > > Regarding next-gen sequences and bioperl, following my 
 > > experience, another issue is bioperl speed. For example, if 
 > > you want to trim bad quality bases at ends of 1E6 Solexa 
 > > reads using Bio::SeqIO::fastq and some methods in 
 > > Bio::Seq::Quality, well, you've got to be patient (but may 
 > > be I missed some shortcuts...).
 > 
 > This is my concern as well. Or, rather, is there actually a significant 
 > set of users out there who are dealing with next-gen sequencing and 
 > would consider using BioPerl for their work?
 > 
 > I'm working with all the 1000-genomes data at the Sanger, and we at 
 > least are probably never going to use BioPerl for the work.
 > [...]

Is it purely a speed issue, or are there other issues (e.g. stability,
correctness, compatibility) that are contributing to your decision?

What *are* you using?

g.


From bix at sendu.me.uk  Wed Jun 17 18:10:57 2009
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 17 Jun 2009 23:10:57 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <8E010693-30B2-4E54-81CA-97755FC6CAE5@illinois.edu>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>	<92C15E3391F64BAF801754E924122540@NewLife>
	<200906170927.13273.tristan.lefebure@gmail.com>
	<4A3933D0.4040808@sendu.me.uk>
	<8E010693-30B2-4E54-81CA-97755FC6CAE5@illinois.edu>
Message-ID: <4A3969F1.8080002@sendu.me.uk>

Chris Fields wrote:
> On Jun 17, 2009, at 1:20 PM, Sendu Bala wrote:
> 
>> Tristan Lefebure wrote:
>>> Hello,
>>> Regarding next-gen sequences and bioperl, following my experience, 
>>> another issue is bioperl speed. For example, if you want to trim bad 
>>> quality bases at ends of 1E6 Solexa reads using Bio::SeqIO::fastq and 
>>> some methods in Bio::Seq::Quality, well, you've got to be patient 
>>> (but may be I missed some shortcuts...).
>>
>> This is my concern as well. Or, rather, is there actually a 
>> significant set of users out there who are dealing with next-gen 
>> sequencing and would consider using BioPerl for their work?
>>
>> I'm working with all the 1000-genomes data at the Sanger, and we at 
>> least are probably never going to use BioPerl for the work.
> 
> Are you using pure perl or (gasp) something else?  ;>

We use some perl stuff, some C stuff. My own stuff is OO perl, but much 
lighter weight than BioPerl. Absolute minimal object creation.


>>> A pure perl solution will be between 100 to 1000x faster... Would it 
>>> be possible to have an ultra-light quality object with few simple 
>>> methods for next-gen reads?
>>
>> The fastq parser itself already seems pretty fast. The way to get the 
>> speedup is to not create any Bio::Seq* objects but just return the 
>> data directly. At that point it's not taking much advantage of 
>> BioPerl. But certainly it could be done...
> 
> I suppose the best way to assess what needs to be done is come up with a 
> set of 'use cases' specifying what users want so we can design around 
> them, otherwise we're shooting in the dark.

Indeed. Though at least I think we can all agree it would be nice to 
have the functionality there even if it's slow. There will always be at 
least some use-cases where the run speed doesn't matter.


> I'm personally wondering if this could be done as a sequence database, 
> something similar in theme to Lincoln's SeqFeature::Store, but sequence 
> only, and returns quality objects in a similar manner (ala Storable)?  
> Not sure whether that's feasible, but it's appears at least scalable.

I think not. Well, at least SeqFeature::Store doesn't scale. Try storing 
millions of features in a database and watch it crawl to complete 
unusability. I can't imagine a db scaling to holding hundreds of TB of 
data either. I'm also not sure what the benefit is. There are already 
high-speed ways of indexing your fastq or bam files.


From bix at sendu.me.uk  Wed Jun 17 18:24:50 2009
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 17 Jun 2009 23:24:50 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <19001.21667.127519.462899@already.dhcp.gene.com>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>	<92C15E3391F64BAF801754E924122540@NewLife>	<200906170927.13273.tristan.lefebure@gmail.com>	<4A3933D0.4040808@sendu.me.uk>
	<19001.21667.127519.462899@already.dhcp.gene.com>
Message-ID: <4A396D32.5070909@sendu.me.uk>

George Hartzell wrote:
> Sendu Bala writes:
>  > Tristan Lefebure wrote:
>  > > Hello,
>  > > Regarding next-gen sequences and bioperl, following my 
>  > > experience, another issue is bioperl speed. For example, if 
>  > > you want to trim bad quality bases at ends of 1E6 Solexa 
>  > > reads using Bio::SeqIO::fastq and some methods in 
>  > > Bio::Seq::Quality, well, you've got to be patient (but may 
>  > > be I missed some shortcuts...).
>  > 
>  > This is my concern as well. Or, rather, is there actually a significant 
>  > set of users out there who are dealing with next-gen sequencing and 
>  > would consider using BioPerl for their work?
>  > 
>  > I'm working with all the 1000-genomes data at the Sanger, and we at 
>  > least are probably never going to use BioPerl for the work.
>  > [...]
> 
> Is it purely a speed issue, or are there other issues (e.g. stability,
> correctness, compatibility) that are contributing to your decision?

Too heavy-weight, too slow, too memory intensive, missing too much 
functionality in any case. If I have to write new parsers and wrappers, 
I may as well make them fast (which means they don't "fit" into BioPerl).


> What *are* you using?

There are already great tools written in C that do all the heavy lifting 
and the rest is done in perl written for speed and low memory.
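
To illustrate the kind of lightweight, object-free Perl described in this
thread, here is a minimal sketch of reading FASTQ records as plain strings
and trimming low-quality ends. It is not code from anyone's pipeline. It
assumes strict four-line FASTQ records and a Phred+33 quality offset by
default (older Solexa/Illumina files use a different offset and scale, so
pass your own offset and threshold for those). The names next_fastq_record
and trim_ends are hypothetical.

```perl
use strict;
use warnings;

# Read one four-line FASTQ record and return plain scalars
# (id, sequence, quality string). No objects are created.
sub next_fastq_record {
    my ($fh) = @_;
    my $header = <$fh>;
    return unless defined $header;          # end of file: empty list
    my $seq  = <$fh>;
    my $plus = <$fh>;                       # separator line, ignored
    my $qual = <$fh>;
    die "truncated FASTQ record\n" unless defined $qual;
    chomp($header, $seq, $qual);
    die "bad FASTQ header: $header\n" unless $header =~ s/^@//;
    return ($header, $seq, $qual);
}

# Trim bases whose quality is below $min_q from both ends of a read.
# $offset is the ASCII offset of the quality encoding (default 33).
sub trim_ends {
    my ($seq, $qual, $min_q, $offset) = @_;
    $offset = 33 unless defined $offset;
    my @q = map { $_ - $offset } unpack 'C*', $qual;
    my ($start, $end) = (0, $#q);
    $start++ while $start <= $end && $q[$start] < $min_q;
    $end--   while $end >= $start && $q[$end]   < $min_q;
    my $len = $end - $start + 1;            # 0 if every base is below $min_q
    return (substr($seq, $start, $len), substr($qual, $start, $len));
}

# Usage on an in-memory record; a real script would open a file instead.
my $data = "\@read1\nACGTACGT\n+\n!!IIII!!\n";
open my $fh, '<', \$data or die "cannot open in-memory file: $!";
while (my ($id, $seq, $qual) = next_fastq_record($fh)) {
    my ($tseq, $tqual) = trim_ends($seq, $qual, 20);
    print "$id\t$tseq\t$tqual\n";
}
close $fh;
```

The design choice is simply to pass strings around and only unpack the
quality values when they are needed, which avoids the per-read object
construction that the thread identifies as the main cost.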


From cjfields at illinois.edu  Wed Jun 17 18:38:26 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 17 Jun 2009 17:38:26 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <4A3969F1.8080002@sendu.me.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>	<92C15E3391F64BAF801754E924122540@NewLife>
	<200906170927.13273.tristan.lefebure@gmail.com>
	<4A3933D0.4040808@sendu.me.uk>
	<8E010693-30B2-4E54-81CA-97755FC6CAE5@illinois.edu>
	<4A3969F1.8080002@sendu.me.uk>
Message-ID: <550FACEA-FE90-4160-AA44-F2706C1F4CB9@illinois.edu>

On Jun 17, 2009, at 5:10 PM, Sendu Bala wrote:

> Chris Fields wrote:
>> On Jun 17, 2009, at 1:20 PM, Sendu Bala wrote:
>>> Tristan Lefebure wrote:
>>>> Hello,
>>>> Regarding next-gen sequences and bioperl, following my  
>>>> experience, another issue is bioperl speed. For example, if you  
>>>> want to trim bad quality bases at ends of 1E6 Solexa reads using  
>>>> Bio::SeqIO::fastq and some methods in Bio::Seq::Quality, well,  
>>>> you've got to be patient (but may be I missed some shortcuts...).
>>>
>>> This is my concern as well. Or, rather, is there actually a  
>>> significant set of users out there who are dealing with next-gen  
>>> sequencing and would consider using BioPerl for their work?
>>>
>>> I'm working with all the 1000-genomes data at the Sanger, and we  
>>> at least are probably never going to use BioPerl for the work.
>> Are you using pure perl or (gasp) something else?  ;>
>
> We use some perl stuff, some C stuff. My own stuff is OO perl, but  
> much lighter weight than BioPerl. Absolute minimal object creation.

Makes sense.

>>>> A pure perl solution will be between 100 to 1000x faster... Would  
>>>> it be possible to have an ultra-light quality object with few  
>>>> simple methods for next-gen reads?
>>>
>>> The fastq parser itself already seems pretty fast. The way to get  
>>> the speedup is to not create any Bio::Seq* objects but just return  
>>> the data directly. At that point it's not taking much advantage of  
>>> BioPerl. But certainly it could be done...
>> I suppose the best way to assess what needs to be done is come up  
>> with a set of 'use cases' specifying what users want so we can  
>> design around them, otherwise we're shooting in the dark.
>
> Indeed. Though at least I think we can all agree it would be nice to  
> have the functionality there even if it's slow. There will always be  
> at least some use-cases where the run speed doesn't matter.

Agreed.

>> I'm personally wondering if this could be done as a sequence  
>> database, something similar in theme to Lincoln's  
>> SeqFeature::Store, but sequence only, and returns quality objects  
>> in a similar manner (ala Storable)?  Not sure whether that's  
>> feasible, but it's appears at least scalable.
>
> I think not. Well, at least SeqFeature::Store doesn't scale. Try  
> storing millions of features in a database and watch it crawl to  
> complete unusability. I can't imagine a db scaling to holding  
> hundreds of TB of data either. I'm also not sure what the benefit  
> is. There are already high-speed ways of indexing your fastq or bam  
> files.

Interesting that you ran into issues with SF::Store; wonder if object  
storage is the limiting factor there, or if it is something else.  
Anyone else having this issue?

chris


From cjfields at illinois.edu  Wed Jun 17 21:08:55 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 17 Jun 2009 20:08:55 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <4A396D32.5070909@sendu.me.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>	<92C15E3391F64BAF801754E924122540@NewLife>	<200906170927.13273.tristan.lefebure@gmail.com>	<4A3933D0.4040808@sendu.me.uk>
	<19001.21667.127519.462899@already.dhcp.gene.com>
	<4A396D32.5070909@sendu.me.uk>
Message-ID: <03A96F40-27CD-4D38-9A4A-04AB4CECC8DE@illinois.edu>

On Jun 17, 2009, at 5:24 PM, Sendu Bala wrote:

> George Hartzell wrote:
>> Sendu Bala writes:
>> > Tristan Lefebure wrote:
>> > > Hello,
>> > > Regarding next-gen sequences and bioperl, following my
>> > > experience, another issue is bioperl speed. For example, if
>> > > you want to trim bad quality bases at ends of 1E6 Solexa
>> > > reads using Bio::SeqIO::fastq and some methods in
>> > > Bio::Seq::Quality, well, you've got to be patient (but may
>> > > be I missed some shortcuts...).
>> >
>> > This is my concern as well. Or, rather, is there actually a significant
>> > set of users out there who are dealing with next-gen sequencing and
>> > would consider using BioPerl for their work?
>> >
>> > I'm working with all the 1000-genomes data at the Sanger, and we at
>> > least are probably never going to use BioPerl for the work.
>> > [...]
>> Is it purely a speed issue, or are there other issues (e.g. stability,
>> correctness, compatibility) that are contributing to your decision?
>
> Too heavy-weight, too slow, too memory intensive, missing too much  
> functionality in any case. If I have to write new parsers and  
> wrappers, I may as well make them fast (which means they don't "fit"  
> into BioPerl).

That's (unfortunately) true.  It may be easy to whip up something that  
works, but it probably won't be fast.

>> What *are* you using?
>
> There are already great tools written in C that do all the heavy  
> lifting and the rest is done in perl written for speed and low memory.

Like this one?

http://www.sanger.ac.uk/Users/lh3/parsefastq.shtml

I suppose if one were inclined, this could be wrapped with SWIG in  
BioLib, but would it be worth it (maybe beyond grabbing the file  
indices)?

chris


From jbarrick at msu.edu  Wed Jun 17 23:10:43 2009
From: jbarrick at msu.edu (Jeffrey Barrick)
Date: Wed, 17 Jun 2009 23:10:43 -0400
Subject: [Bioperl-l] svn error
Message-ID: <7C1A481F-275E-4E08-AA1B-036BC708D5E1@msu.edu>

Hi all,

I've been trying to download the latest version of "bioperl-live"
through svn as per the instructions at
[http://www.bioperl.org/wiki/Using_Subversion] and I keep getting an
"svn: Found malformed header in revision file" error when it gets to
"bioperl-live/t/RemoteDB/EMBL.t", causing it to stop prematurely.

I also get the error when trying to browse that directory, for example:
http://code.open-bio.org/svnweb/index.cgi/bioperl/browse/bioperl-live/trunk/t/RemoteDB

Any ideas?

Thanks,
   --Jeff


From hlapp at gmx.net  Wed Jun 17 21:51:16 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 17 Jun 2009 20:51:16 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <0500F8EB-F3BD-44E8-801A-7A45783DA40F@ucl.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>	<92C15E3391F64BAF801754E924122540@NewLife>
	<200906170927.13273.tristan.lefebure@gmail.com>
	<4A3933D0.4040808@sendu.me.uk>
	<8E010693-30B2-4E54-81CA-97755FC6CAE5@illinois.edu>
	<0500F8EB-F3BD-44E8-801A-7A45783DA40F@ucl.ac.uk>
Message-ID: <C8873056-793B-4FEE-94EE-3341087478D1@gmx.net>


On Jun 17, 2009, at 3:06 PM, Elia Stupka wrote:

> Similarly in Ensembl as well as in the old days of bioperl-db we  
> opted for doing subseq within SQL where possible.


BTW Bioperl-db still lazy-loads sequences, and does subseq in SQL,  
unless you manipulate the sequence, or make it a non-persistent object.

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From bix at sendu.me.uk  Thu Jun 18 02:45:17 2009
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 18 Jun 2009 07:45:17 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <550FACEA-FE90-4160-AA44-F2706C1F4CB9@illinois.edu>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>	<92C15E3391F64BAF801754E924122540@NewLife>
	<200906170927.13273.tristan.lefebure@gmail.com>
	<4A3933D0.4040808@sendu.me.uk>
	<8E010693-30B2-4E54-81CA-97755FC6CAE5@illinois.edu>
	<4A3969F1.8080002@sendu.me.uk>
	<550FACEA-FE90-4160-AA44-F2706C1F4CB9@illinois.edu>
Message-ID: <4A39E27D.9040807@sendu.me.uk>

Chris Fields wrote:
> On Jun 17, 2009, at 5:10 PM, Sendu Bala wrote:
 >
>>> I'm personally wondering if this could be done as a sequence 
>>> database, something similar in theme to Lincoln's SeqFeature::Store, 
>>> but sequence only, and returns quality objects in a similar manner 
>>> (ala Storable)?  Not sure whether that's feasible, but it's appears 
>>> at least scalable.
>>
>> I think not. Well, at least SeqFeature::Store doesn't scale. Try 
>> storing millions of features in a database and watch it crawl to 
>> complete unusability. I can't imagine a db scaling to holding hundreds 
>> of TB of data either. I'm also not sure what the benefit is. There are 
>> already high-speed ways of indexing your fastq or bam files.
> 
> Interesting that you ran into issues with SF::Store; wonder if object 
> storage is the limiting factor there, or if it is something else.

Object storage certainly was an issue, which is why I patched it to 
(optionally) not store objects. That helped a great deal, but ultimately 
only increased the number of features you could store before it slowed 
down; it didn't solve the problem completely.


From Xianjun.Dong at bccs.uib.no  Thu Jun 18 06:15:47 2009
From: Xianjun.Dong at bccs.uib.no (Xianjun Dong)
Date: Thu, 18 Jun 2009 12:15:47 +0200
Subject: [Bioperl-l] background layer is not supported in Bioperl 1.6
 for Bio::Graphics::Glyph
In-Reply-To: <4A33D850.1020203@ii.uib.no>
References: <4A32BCDA.4080605@ii.uib.no>	<536f21b00906121829o7a5076f6hc57eb9a4dd815e48@mail.gmail.com>	<4A339621.2060702@ii.uib.no>
	<4536f7700906130627r41c90de5r311e58f1d718f9ee@mail.gmail.com>
	<4A33D850.1020203@ii.uib.no>
Message-ID: <4A3A13D3.7050208@ii.uib.no>

Hi, Scott,

Would you mind having a look at the code (below my signature) to check
whether I am using the -postgrid callback correctly?
I still cannot get the background for the whole panel.

Thanks

Xianjun


Xianjun Dong wrote:
> Hi, Scott
>
> Before I gave up my own whole solution to use GBrowse, I still want to 
> bother you once:
>
> As you suggested, I put -postgrid option when the panel, which will 
> call a function to draw the background. The code below is almost 
> copied from the online POD of Bio::Graphics::Panel (see 
> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Graphics/Panel.html 
> )
>
> But it still does not work. Could you help to have a look? I paste it 
> below. (BTW, the above page of POD, the -postgrid=>\&draw_gap, while 
> the gap drawing function is gap_it, not draw_gap. I guess it's a typo. 
> or not?)
>
> THanks
>
> Xianjun
>
> ----------------------------------------------- mytestcode.pl 
> --------------------------
>
> #!/usr/bin/perl
>
> use strict;
> use lib "$ENV{HOME}/lib";
>
> use Bio::Graphics;
> use Bio::Graphics::Feature;
> my $ftr= 'Bio::Graphics::Feature';
>
> # processed_transcript
> my $trans1 = 
> $ftr->new(-start=>50,-end=>10,-name=>'ZK154.1',-type=>"3'-UTR");
> my $trans2 = 
> $ftr->new(-start=>100,-end=>50,-name=>'ZK154.2',-type=>'CDS');
> my $trans3 = 
> $ftr->new(-start=>350,-end=>225,-name=>'ZK154.3',-type=>'CDS', 
> -source=>'a');
> my $trans4 = 
> $ftr->new(-start=>650,-end=>500,-name=>'ZK154.3',-type=>'CDS', 
> -source=>'a');
> my $trans5 = 
> $ftr->new(-start=>700,-end=>650,-name=>'ZK154.3',-type=>"5'-UTR");
> my $trans  = 
> $ftr->new(-segments=>[$trans1,$trans2,$trans3,$trans4,$trans5]);
>
> # hightlight
> my $trans31 = 
> $ftr->new(-start=>240,-end=>450,-name=>'hightlight1',-type=>'background', 
> -source=>'a');
> my $trans41 = 
> $ftr->new(-start=>650,-end=>600,-name=>'hightlight2',-type=>'multihourglass', 
> -source=>'b');
>
> my $panel= Bio::Graphics::Panel->new(-width=>1200,
>                                             -length=>1050,
>                                             -start =>0,
>                                             -pad_left=>12,
>                                             -pad_right=>12
>                                             -postgrid=>\&gap_it);
>
> sub gap_it {
>     my $gd    = shift;
>     my $panel = shift;
>     my ($gap_start,$gap_end) = $panel->location2pixel(500,600);
>     my $top                  = $panel->top;
>     my $bottom               = $gd->height, #panel->bottom;
>     my $gray                 = $panel->translate_color('red');
>     $gd->filledRectangle($gap_start,$top,$gap_end,$bottom,$gray);
> }
> # the following track works as I expected in bioperl 1.2.3, but not in 
> 1.5 and 1.6
> #$panel->add_track([$trans41,$trans31],
> #          -glyph   => 'background',
> #                  -block_bgcolor => sub{return (shift->source eq 
> 'a')?'#cccccc':'#fffc22'},
> #                  );
>
> $panel->add_track($ftr->new(-start=>100,-end=>1000),
>                  -glyph=>'arrow',
>                  -double=>1,
>                  -tick=>2);
>
> $panel->add_track($trans,
>          -glyph   => 'transcript2', # 'transcript2', #process_5utr',
>                  -fgcolor => 'darkred',
>                  -bgcolor => 'darkred',
>                  -title => '$source',
>                  -link => 
> 'http://www.ensembl.org/Homo_sapiens/transview?transcript=$name',  
> #EnsEMBL
>                  );
>   print $panel->png;
>
> # the following part works in bioperl 1.5 and 1.6, but not work in 
> Bioperl 1.2.3
> my $map = $panel->create_web_map("image");
> $panel->finished();
>
>
> Scott Cain wrote:
>> Hi Xianjun,
>>
>> I understand what you want to do, as the current version of gbrowse
>> does this, which uses bioperl 1.6.  Without digging through the code,
>> I can't tell you exactly how this works and you didn't send your code
>> that uses this callback, so I can't try it either.
>>
>> One thing that is different between your code and gbrowse is that each
>> of the tracks is actually a seperate panel (to allow track dragging),
>> so it possible that this sort of callback doesn't work for
>> Bio::Graphics any more.
>>
>> Scott
>>
>> On Saturday, June 13, 2009, Xianjun Dong <Xianjun.Dong at bccs.uib.no> 
>> wrote:
>>  
>>> Hi, Scott
>>>
>>> Thanks for your reply first.
>>>
>>> I still have question: I dig out the code from GBrowse (which I 
>>> paste below). Method make_postgrid_callback gets all highlight 
>>> region and then use hilite_regions_closure function to draw them 
>>> out, using the following GD function:
>>>
>>> $gd->filledRectangle($left+$start,0,$left+$end,$bottom,
>>>                           $panel->translate_color($h_color));
>>>
>>> where the $bottom=$panel->bottom. This is the only difference from 
>>> my code, where I use $gd->height. I guess they are almost same 
>>> (except the pad_bottom), we can see this in the code of 
>>> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Graphics/Panel.html#CODE22 
>>>
>>>
>>> OK. Anyway, I change to use $panel->bottom, instead of $gd->height, 
>>> for my highlight regions. The output is same, when using the library 
>>> of Bioperl 1.6 (or 1.5). You can see the attached image 
>>> ("test.bioperl1.6.png")
>>>
>>> OK. I might have not explained my question explicitly. My question 
>>> is: if using bioperl 1.2.3 (actually the Bio::Graphics in bioperl 
>>> 1.2.3), I can get the right image I want (see the attached file 
>>> "test.bioperl1.2.3.png"), where the highlight range will go from the 
>>> roof to the floor. While in bioperl 1.5 (or 1.6), I only can see the 
>>> highlight region in its own track, not the whole panel. OK, did I 
>>> explain clearly now? you can see the difference of the two images.
>>>
>>> [I am not sure the mailist allow to attach image, otherwise, I put 
>>> them in the following links:
>>> test.bioperl1.6.png:    http://translog.genereg.net/test.bioperl1.6.png
>>> test.bioperl1.2.3.png:    
>>> http://translog.genereg.net/test.bioperl1.2.3.png ]
>>>
>>> You can test it and see the difference if you have both 1.2.3 and 
>>> 1.6 on your computer?
>>>
>>> Really want to know how this works in bioperl 1.2.3 (Even though 
>>> this might be a bug at that version, or whatever)
>>>
>>> Thanks
>>>
>>> Xianjun
>>> =============================================
>>>
>>> # this generates the callback for highlighting a region
>>> sub make_postgrid_callback {
>>>  my $settings = shift;
>>>  return unless ref $settings->{h_region};
>>>
>>>  my @h_regions = map {
>>>    my ($h_ref,$h_start,$h_end,$h_color) = 
>>> /^(.+):(\d+)\.\.(\d+)(?:@(\S+))?/;
>>>    defined($h_ref) && $h_ref eq $settings->{ref}
>>>                 ? [$h_start,$h_end,$h_color||'lightgrey']
>>>                 : ()
>>>  }
>>>    @{$settings->{h_region}};
>>>
>>>  return unless @h_regions;
>>>  return hilite_regions_closure(@h_regions);
>>> }
>>>
>>> # this subroutine generates a Bio::Graphics::Panel callback closure
>>> # suitable for hilighting a region of a panel.
>>> # The args are a list of [start,end,color]
>>> sub hilite_regions_closure {
>>>  my @h_regions = @_;
>>>
>>>  return sub {
>>>    my $gd     = shift;
>>>    my $panel  = shift;
>>>    my $left   = $panel->pad_left;
>>>    my $top    = $panel->top;
>>>    my $bottom = $panel->bottom;
>>>    for my $r (@h_regions) {
>>>      my ($h_start,$h_end,$h_color) = @$r;
>>>      my ($start,$end) = $panel->location2pixel($h_start,$h_end);
>>>      if ($end-$start <= 1) { $end++; $start-- } # so that we always 
>>> see something
>>>      # assuming top is 0 so as to ignore top padding
>>>      $gd->filledRectangle($left+$start,0,$left+$end,$bottom,
>>>                           $panel->translate_color($h_color));
>>>    }
>>>  };
>>> }
>>>
>>>
>>> Scott Cain wrote:
>>>
>>> Hello Xianjun,
>>>
>>> I don't think that approach will work.  What you almost certainly need
>>> to do is a postgrid callback that does the drawing of the highlighted
>>> region.  For example code of how to do this, take a look at the
>>> make_postgrid_callback subroutine in GBrowse 1.69.  The option
>>> -postgrid is a method of Bio::Graphics::Panel.
>>>
>>> Scott
>>>
>>>
>>>
>>>
>>> On Fri, Jun 12, 2009 at 4:38 PM, Xianjun 
>>> Dong<Xianjun.Dong at bccs.uib.no> wrote:
>>>
>>>
>>> HI,
>>>
>>> I am not sure this is the right place I can get help.
>>>
>>> I've suffered by a problem for several days: I want to highlight 
>>> parts of
>>> regions in my track, using a different background color. To do that, I
>>> defined a glyph named "background", based on the
>>> 'Bio::Graphics::Glyph::generic' module. I override the draw_component()
>>> method, by adding code like below:
>>>
>>> $gd->filledRectangle($left,0,$right,$gd->height,
>>> $self->factory->translate_color($color));
>>>
>>> # the script is pasted at the end
>>>
>>> This will draw a rectangle with top=0, bottom=$gd->height. I made the
>>> highlight regions into a list of features, and add_track with
>>> -glyph=>'background'. (see the following script, test.pl) This 
>>> really works
>>> as I expect, which will add a colored block at background of all 
>>> tracks in a
>>> panel (including the ruler arrow). You can see the output image in 
>>> attached
>>> file "test.bioperl1.2.3.png"
>>>
>>> Now, the problem comes: when I switch to Bioperl 1.5 (or 1.6), it 
>>> does not
>>> work. Well, it works, but the highlight part only shrink to a low 
>>> height,
>>> instead of covering all tracks in the panel. I also attached the output
>>> here, see the file "test.bioperl1.6.png".
>>>
>>> I tried to think about the reason, the 'background' module is based 
>>> on the
>>> generic module. What can cause the difference? Is it because 
>>> $gd->height is
>>> different, or the tracks followed with 'background' track can not 
>>> draw from
>>> the first position?
>>>
>>> Well. I can stick to use Bioperl 1.2.3 to avoid the problem. ("Smart 
>>> person
>>> solve problem, wise person avoid problem"...) But another problem is 
>>> coming:
>>> Bio::Graphics in Bioperl 1.2.3 does not support 
>>> $panel->create_web_map()
>>> function, which means I have to use some higher version if I want to 
>>> create
>>> web map for my graphics, but then I have to give up using highlight
>>> background.
>>>
>>> OK. It's long enough for my first-time submission here. Hope someone 
>>> can
>>> throw me some clue.
>>>
>>> Thanks ahead!!
>>>
>>> Xianjun
>>>
>>>
>>> ==================== test.pl =======================
>>> #!/usr/bin/perl
>>>
>>> use strict;
>>> use lib "$ENV{HOME}/lib";
>>>
>>> use Bio::Graphics;
>>> use Bio::Graphics::Feature;
>>> my $ftr= 'Bio::Graphics::Feature';
>>>
>>> # processed_transcript
>>> my $trans1 =
>>> $ftr->new(-start=>50,-end=>10,-name=>'ZK154.1',-type=>"3'-UTR");
>>> my $trans2 = 
>>> $ftr->new(-start=>100,-end=>50,-name=>'ZK154.2',-type=>'CDS');
>>> my $trans3 = 
>>> $ftr->new(-start=>350,-end=>225,-name=>'ZK154.3',-type=>'CDS',
>>> -source=>'a');
>>> my $trans4 = 
>>> $ftr->new(-start=>650,-end=>500,-name=>'ZK154.3',-type=>'CDS',
>>> -source=>'a');
>>> my $trans5 =
>>> $ftr->new(-start=>700,-end=>650,-name=>'ZK154.3',-type=>"5'-UTR");
>>> my $trans  =
>>> $ftr->new(-segments=>[$trans1,$trans2,$trans3,$trans4,$trans5]);
>>>
>>> # hightlight
>>> my $trans31 =
>>> $ftr->new(-start=>240,-end=>450,-name=>'hightlight1',-type=>'background', 
>>>
>>> -source=>'a');
>>> my $trans41 =
>>> $ftr->new(-start=>650,-end=>600,-name=>'hightlight2',-type=>'multihourglass', 
>>>
>>> -source=>'b');
>>>
>>> my $panel= Bio::Graphics::Panel->new(-width=>1200,
>>>                                            -length=>1050,
>>>                                            -start =>0,
>>>                                            -pad_left=>12,
>>>                                            -pad_right=>12);
>>>
>>> # the following track works as I expected in bioperl 1.2.3, but not 
>>> in 1.5
>>> and 1.6
>>> $panel->add_track([$trans41,$trans31],
>>>         -glyph   => 'background',
>>>                 -block_bgcolor => sub{return (shift->source eq
>>> 'a')?'#cccccc':'#fffc22'},
>>>                 );
>>>
>>> $panel->add_track($ftr->new(-start=>100,-end=>1000),
>>>                 -glyph=>'arrow',
>>>                 -double=>1,
>>>                 -tick=>2);
>>>
>>> $panel->add_track($trans,
>>>         -glyph   => 'transcript2', # 'transcript2', #process_5utr',
>>>                 -fgcolor => 'darkred',
>>>                 -bgcolor => 'darkred',
>>>                 -title => '$source',
>>>                 -link =>
>>> 'http://www.ensembl.org/Homo_sapiens/transview?transcript=$name',  
>>> #EnsEMBL
>>>                 );
>>>  print $panel->png;
>>>
>>> # the following part works in bioperl 1.5 and 1.6, but not work in 
>>> Bioperl
>>> 1.2.3
>>> my $map = $panel->create_web_map("image");
>>> $panel->finished();
>>>
>>> 1;
>>>
>>> ==================== background.pm =======================
>>> package Bio::Graphics::Glyph::background;
>>>
>>> use strict;
>>> use base 'Bio::Graphics::Glyph::generic';
>>> sub pad_top{
>>>  return 0;
>>> }
>>>
>>> sub draw_component {
>>>  my $self = shift;
>>>  #$self->SUPER::draw_component(@_);
>>>  my ($gd,$dx,$dy) = @_;
>>>  my ($left,$top,$right,$bottom) = $self->bounds($dx,$dy);
>>>
>>>  # draw an arrow to indicate the direction of transcript
>>>  my $color = $self->option('block_bgcolor') || '#cccccc';
>>>  $gd->filledRectangle($left,0,$right,$gd->height,
>>> $self->factory->translate_color($color));
>>> }
>>>
>>> 1;
>>>
>>> -- 
>>> ==========================================
>>> Xianjun Dong
>>> PhD student, Lenhard group
>>> Computational Biology Unit
>>> Bergen Center for Computational Science
>>> University of Bergen
>>> Hoyteknologisenteret, Thormohlensgate 55
>>> N-5008 Bergen, Norway
>>> E-mail: xianjun.dong at bccs.uib.no
>>> Tel.: +47 555 84022
>>> Fax : +47 555 84295
>>> ==========================================

-- 
==========================================
Xianjun Dong
PhD student, Lenhard group
Computational Biology Unit
Bergen Center for Computational Science
University of Bergen
Hoyteknologisenteret, Thormohlensgate 55
N-5008 Bergen, Norway
E-mail: xianjun.dong at bccs.uib.no
Tel.: +47 555 84022
Fax : +47 555 84295
==========================================
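
One thing that may be worth checking in mytestcode.pl as quoted above: as
posted, there is no comma after -pad_right=>12 in the call to
Bio::Graphics::Panel->new. If that is also true of the script that was
actually run, Perl most likely parses the next line as a subtraction
(12 - 'postgrid'), so the -postgrid key would never reach the constructor
and gap_it would never be called. The sketch below does not use
Bio::Graphics at all; it only builds the two argument lists in plain Perl
so the difference can be inspected. The array names are hypothetical.

```perl
use strict;
use warnings;

sub gap_it { }    # stand-in for the real callback

# As posted: no comma after 12, so "-postgrid" is taken as "minus 'postgrid'".
# With warnings enabled Perl reports that "postgrid" isn't numeric.
my @without_comma = (-pad_left  => 12,
                     -pad_right => 12
                     -postgrid  => \&gap_it);

# With the comma restored, -postgrid is a separate key again.
my @with_comma = (-pad_left  => 12,
                  -pad_right => 12,
                  -postgrid  => \&gap_it);

for my $case ([ 'without comma', \@without_comma ],
              [ 'with comma',    \@with_comma ]) {
    my ($label, $args) = @$case;
    my @shown = map { ref $_ ? 'CODE' : $_ } @$args;
    printf "%-14s %d elements: %s\n", $label, scalar @shown, "@shown";
}
```

The original script does not enable warnings, which would explain why no
message was shown. Running it with "perl -w" is a quick way to confirm
whether this is what is happening.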


From charles.tilford at bms.com  Thu Jun 18 09:38:34 2009
From: charles.tilford at bms.com (Charles Tilford)
Date: Thu, 18 Jun 2009 09:38:34 -0400
Subject: [Bioperl-l] Bio::SeqIO::scf traces scrambled?
Message-ID: <4A3A435A.8000505@bms.com>

Nutshell: Bio::SeqIO::scf seems to be mixing up A/C/G/T trace channels. 
Can anyone confirm?

Hi all,

I'm using the SCF Bio::SeqIO module to parse trace data out of 
chromatograms. The SCF files are being produced by phred using the "-cd" 
parameter. The traces come out great, and the corresponding base calls 
from the .phd files align with the peaks wonderfully when I visualize 
them on a rendered trace. However, only the A bases align to the 
appropriate trace channel, the rest are mixed up. I find that if I do 
the following re-mapping, the phred base calls match the trace channels:

SeqIO : Remapped
A : A
C : G
G : T
T : C

The relevant part of Bio::SeqIO::scf is here:

http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/SeqIO/scf.html#CODE9

... which indicates that it expects the pack()ed trace data to be in 
order ATGC. The base call parsing code is here:

http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/SeqIO/scf.html#CODE8

... which is unpacking in order ACGT. As far as I can tell, the relevant 
official SCF documentation is here:

http://staden.sourceforge.net/manual/formats_unix_4.html

... which indicates that both trace and base order should be ACGT 
(matching the SeqIO unpack() for bases, but not traces). My empirical 
channel unscrambling mapping implies order ACTG, which is different from 
either of the two orders above. The sequence from the SCF file (should 
be that from original AB1 file, I think) is not perfectly identical to 
that called by phred, but is very similar (to be expected); that is, I 
don't need to remap C, G and T to get it to align with the phred data.

So it looks like the SeqIO module is not mapping the sections of the 
packed trace data to the appropriate bases. The unpack order is 
different than the staden documentation ... but so is the order I impose 
to correct the problem. I am still unclear as to the differences between 
V2 and V3 of the format. The major difference appears to be coding the 
trace absolutely (V2) or relatively to prior values (V3); I'd expect if 
I was using one format and SeqIO was trying to parse the other that I 
would get garbage out. Running in verbose reports "scf.pm is working 
with a version 2 scf."

Thoughts on this would be appreciated - can anyone confirm a problem 
with trace extraction from SCF?
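
For anyone who wants to try the same stopgap, the empirical re-mapping in
the table above could be applied along these lines. This is only a sketch
under stated assumptions: %traces is a hypothetical hash of channel label
to array of sample values, standing in for whatever the parser hands back.
It is not the Bio::SeqIO::scf API, and remap_channels is a made-up name.

```perl
use strict;
use warnings;

# Label reported by the parser => base the channel appears to belong to,
# taken from the "SeqIO : Remapped" table in this message.
my %remap = (A => 'A', C => 'G', G => 'T', T => 'C');

# Return a new hash with the same sample arrays filed under corrected labels.
sub remap_channels {
    my ($traces) = @_;
    my %fixed;
    for my $label (keys %remap) {
        $fixed{ $remap{$label} } = $traces->{$label};
    }
    return \%fixed;
}

# Toy data: one sample per channel, just to show where each array ends up.
my %traces = (A => [10], C => [20], G => [30], T => [40]);
my $fixed  = remap_channels(\%traces);
print "$_ => @{ $fixed->{$_} }\n" for sort keys %$fixed;
```

Because the mapping is purely empirical, it should be re-checked against
known base calls whenever the parser or the SCF source changes.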

I'm hoping that once I convince our admin to (properly) install 
staden::read that I can work directly with the ab1 files, but I need to 
stopgap on SCF for the time being....

-CAT


From cjfields at illinois.edu  Thu Jun 18 11:31:08 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 18 Jun 2009 10:31:08 -0500
Subject: [Bioperl-l] Bio::SeqIO::scf traces scrambled?
In-Reply-To: <4A3A435A.8000505@bms.com>
References: <4A3A435A.8000505@bms.com>
Message-ID: <49F38F2D-4C57-4309-BB2C-3ED53E1ED9B5@illinois.edu>

Charles,

The best way to make sure this is addressed is to file a ticket (bug  
report) on it so we can properly track it.  I have a local  
installation of io_lib and I believe we also have Geneious installed  
locally (both of which read SCF), so I can work on confirming that.   
If it stays on the list it may not get answered and a possible bug  
report will be lost (to possibly bite someone else later).

AFAIK this module doesn't use staden::read but is pure perl.  You are  
more than welcome to try out Bio::SeqIO::staden::read, but I have to  
warn you that most of us are looking at replacing its functionality  
at some point with BioLib bindings to io_lib (more stable) and so we  
don't intend on following up with bug fixes.

Note: there is also Bio::SCF (non-bp):

http://search.cpan.org/~lds/Bio-SCF-1.01/

chris

On Jun 18, 2009, at 8:38 AM, Charles Tilford wrote:

> Nutshell: Bio::SeqIO::scf seems to be mixing up A/C/G/T trace  
> channels. Can anyone confirm?
>
> Hi all,
>
> I'm using the SCF Bio::SeqIO module to parse trace data out of  
> chromatograms. The SCF files are being produced by phred using the "- 
> cd" parameter. The traces come out great, and the corresponding base  
> calls from the .phd files align with the peaks wonderfully when I  
> visualize them on a rendered trace. However, only the A bases align  
> to the appropriate trace channel, the rest are mixed up. I find that  
> if I do the following re-mapping, the phred base calls match the
>
> SeqIO : Remapped
> A : A
> C : G
> G : T
> T : C
>
> The relevant part of Bio::SeqIO::scf is here:
>
> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/SeqIO/scf.html#CODE9
>
> ... which indicates that it expects the pack()ed trace data to be in  
> order ATGC. The base call parsing code is here:
>
> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/SeqIO/scf.html#CODE8
>
> ... which is unpacking in order ACGT. As far as I can tell, the  
> relevant official SCF documentation is here:
>
> http://staden.sourceforge.net/manual/formats_unix_4.html
>
> ... which indicates that both trace and base order should be ACGT  
> (matching the SeqIO unpack() for bases, but not traces). My  
> empirical channel unscrambling mapping implies order ACTG, which is  
> different from either of the two orders above. The sequence from the  
> SCF file (should be that from original AB1 file, I think) is not  
> perfectly identical to that called by phred, but is very similar (to  
> be expected); that is, I don't need to remap C, G and T to get it to  
> align with the phred data.
>
> So it looks like the SeqIO module is not mapping the sections of the  
> packed trace data to the appropriate bases. The unpack order is  
> different than the staden documentation ... but so is the order I  
> impose to correct the problem. I am still unclear as to the  
> differences between V2 and V3 of the format. The major difference  
> appears to be coding the trace absolutely (V2) or relatively to  
> prior values (V3); I'd expect if I was using one format and SeqIO  
> was trying to parse the other that I would get garbage out. Running  
> in verbose reports "scf.pm is working with a version 2 scf."
>
> Thoughts on this would be appreciated - can anyone confirm a problem  
> with trace extraction from SCF?
>
> I'm hoping that once I convince our admin to (properly) install  
> staden::read that I can work directly with the ab1 files, but I need  
> to stopgap on SCF for the time being....
>
> -CAT


From MEC at stowers.org  Thu Jun 18 11:42:48 2009
From: MEC at stowers.org (Cook, Malcolm)
Date: Thu, 18 Jun 2009 10:42:48 -0500
Subject: [Bioperl-l] Bio::SeqIO::scf traces scrambled?
In-Reply-To: <4A3A435A.8000505@bms.com>
References: <4A3A435A.8000505@bms.com>
Message-ID: <BD62CBAC4395B94096109020651BE2EC12B471A44D@exchmb-02.stowers-institute.org>

Charles,

Another possible stopgap that might work for you, if you're working with AB1 chromatograms and have ABI's KB basecaller turned on, is to use Bio::Trace::ABIF

	http://search.cpan.org/dist/Bio-Trace-ABIF/lib/Bio/Trace/ABIF.pm

It works great and includes an implementation of ABI's algorithm, allowing you to (re)compute trace clear ranges using the KB basecaller's quality scores and any windowing/quality parameters.

It's not in the BioPerl project, but it is an easy install from CPAN.

I am familiar with staden::read installation woes.  

Below is a quick script I wrote that employs it... it could be better parameterized, but it might be useful to you "out of the box"....

Malcolm Cook
Stowers Institute for Medical Research - Kansas City, Missouri
  

#!/usr/bin/env perl

# PURPOSE: extract from AB1 files into fasta format the sequence in
# the 'clear range' defined by 3 parameters.  If there is no clear
# range, emit warning and skip the sequence.  The fasta 'defline'
# identifier is taken as the sample name.  Other useful attributes are
# also embedded into the defline using attribute=value syntax.

# USAGE: ABIFqtrim $window_width $bad_bases_threshold $quality_threshold f1.ab1 ... fn.ab1

# NOTE: 20 4 20 are the ABI default settings

# EXAMPLE:
# ABIFqtrim 20 4 20 /n/facility/Bioinformatics/Software/ABI/ab1_test_files/*.ab1 > ab1_test_files_trimmed.fasta

# AUTHOR: malcolm_cook at stowers-institute.org

use strict;
use warnings;
use Bio::Trace::ABIF;
use Text::Wrap qw(wrap);
$Text::Wrap::columns = 72;	# wrap the sequence

use File::Basename;
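# Take the three trimming parameters off the front of @ARGV;
# whatever remains in @ARGV is the list of AB1 files.
my ($window_width,
    $bad_bases_threshold,
    $quality_threshold) = splice(@ARGV, 0, 3);

my $abif = Bio::Trace::ABIF->new();

sub main {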
  foreach (@ARGV) {
    $abif->open_abif($_) or die "error opening $_ as ABIF";
    my ($clear_range_start,$clear_range_stop) = $abif->clear_range($window_width,
								   $bad_bases_threshold,
								   $quality_threshold
								  );
    my $sample_score = $abif->sample_score(
					   $window_width,
					   $bad_bases_threshold,
					   $quality_threshold
					  );
    #    my $contiguous_read_length = $abif->contiguous_read_length($window_width,
    #							       $quality_threshold,
    #							       0, # ==> trim_ends
    #							      );
    #    my $length_of_read = $abif->length_of_read(
    #				    $window_width,
    #				    $quality_threshold,
    #				    # $method
    #				   );
    my $defline = 
      join "\t", 
	$abif->sample_name,
	  #basename($_,qw(.ab1 .abi)), # use this to use the filename's basename in the defline
	  #$abif->container_identifier . ':' . $abif->well_id,  # or this, for container:well_id formatted defline identifiers
	  (map {my $method = $_;
		"$method=". ($abif->$method() || '')}
	   qw(sample_name comment run_name well_id container_identifier sequence_length )), #comment
	     # sample_tracking_id - don't use this - it is internal to ABI software
	     "clear_range_start=$clear_range_start",
	       "clear_range_stop=$clear_range_stop",
		 "sample_score=$sample_score",
		   #"contiguous_read_length=$contiguous_read_length",
		   #"length_of_read=$length_of_read",
		   ;
    if ($clear_range_start == -1) {
      warn "NO CLEAR RANGE! SKIPPING $_:\n\t$defline";
      next;
    }
    my $seq = wrap('','',substr($abif->sequence, $clear_range_start + 1, ($clear_range_stop + 1) - ($clear_range_start + 1) + 1));
    print ">$defline\n$seq\n";
    $abif->close_abif();

  }
}

main ();




From carze at som.umaryland.edu  Thu Jun 18 13:51:43 2009
From: carze at som.umaryland.edu (Cesar Arze)
Date: Thu, 18 Jun 2009 10:51:43 -0700 (PDT)
Subject: [Bioperl-l]  Problems parsing scientific name from a Genbank file
Message-ID: <24095355.post@talk.nabble.com>


Hi all,
   I've searched through the mailing list and bug tracker for any
indication of this (what I presume to be a) bug, which I have been
encountering when parsing certain GenBank files with Bio::SeqIO::genbank,
but have yet to find anything. I apologize in advance if this is something
that has already been addressed.

When parsing these files and extracting the scientific name it seems that
line breaks are causing the lineage info found in the ORGANISM section to be
captured as part of the scientific name. An example of this is accession
NC_005945:

  ORGANISM  Bacillus anthracis str. Sterne
            Bacteria; Firmicutes; Bacillales; Bacillaceae; Bacillus; Bacillus
            cereus group.

The line break inside "Bacillus cereus group" causes the scientific name to
capture "Bacteria; Firmicutes; Bacillales; Bacillaceae; Bacillus; Bacillus",
ending up with "Bacillus anthracis str. Sterne Bacteria; Firmicutes;
Bacillales; Bacillaceae; Bacillus; Bacillus" as the final scientific name.

Not sure if anyone has ever run into this problem, but I would very much
appreciate any help or direction.
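
To make the expected result concrete, here is a minimal sketch of the split
I would expect for the ORGANISM block above. It is not how
Bio::SeqIO::genbank works internally; the helper name split_organism_block
is made up, and it assumes the scientific name fits on the first line (which
is not true of every GenBank record):

```perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl: split an ORGANISM block into the
# scientific name and a list of lineage terms.
# ASSUMPTION: the name occupies only the first line of the block.
sub split_organism_block {
    my ($block) = @_;

    # Trim each line and drop empty ones.
    my @lines = split /\n/, $block;
    s/^\s+|\s+$//g for @lines;
    @lines = grep { length } @lines;
    return unless @lines;

    # First line: "ORGANISM  <scientific name>".
    my $name = shift @lines;
    $name =~ s/^ORGANISM\s+//;

    # Remaining lines: lineage, possibly wrapped in the middle of a term,
    # so rejoin with a space before splitting on semicolons.
    my $lineage = join ' ', @lines;
    $lineage =~ s/\.$//;
    my @taxa = split /;\s*/, $lineage;

    return ($name, \@taxa);
}
```

Called on the three lines shown above, the last lineage term comes back as
the single string "Bacillus cereus group" rather than being split at the
line break.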


From charles.tilford at bms.com  Thu Jun 18 15:59:01 2009
From: charles.tilford at bms.com (Charles Tilford)
Date: Thu, 18 Jun 2009 15:59:01 -0400
Subject: [Bioperl-l] Bio::SeqIO::scf traces scrambled?
In-Reply-To: <49F38F2D-4C57-4309-BB2C-3ED53E1ED9B5@illinois.edu>
References: <4A3A435A.8000505@bms.com>
	<49F38F2D-4C57-4309-BB2C-3ED53E1ED9B5@illinois.edu>
Message-ID: <4A3A9C85.4000603@bms.com>

Chris Fields wrote:
> Charles,
>
> The best way to make sure this is addressed is to file a ticket (bug  
> report) on it so we can properly track it.
Ok, I'll put that in.
>
> AFAIK this module doesn't use staden::read but is pure perl. 
Yes, that's my understanding too. I'm using the SeqIO module because of 
ongoing hiccups with the staden installation.
> Note: there is also Bio::SCF (non-bp):
>
> http://search.cpan.org/~lds/Bio-SCF-1.01/
>   
I have that installed, but have not tried it out yet.

Thanks!
-CAT


From charles.tilford at bms.com  Thu Jun 18 16:02:53 2009
From: charles.tilford at bms.com (Charles Tilford)
Date: Thu, 18 Jun 2009 16:02:53 -0400
Subject: [Bioperl-l] Bio::SeqIO::scf traces scrambled?
In-Reply-To: <BD62CBAC4395B94096109020651BE2EC12B471A44D@exchmb-02.stowers-institute.org>
References: <4A3A435A.8000505@bms.com>
	<BD62CBAC4395B94096109020651BE2EC12B471A44D@exchmb-02.stowers-institute.org>
Message-ID: <4A3A9D6D.2010106@bms.com>

Cook, Malcolm wrote:
> Charles,
>
> Another possible stopgap that might work for you, if you're working with AB1 chromatograms and have ABIs kb-basecaller turned on, is to use Bio::Trace::ABIF
>
> 	http://search.cpan.org/dist/Bio-Trace-ABIF/lib/Bio/Trace/ABIF.pm
>
> It works great and includes implementation of ABIs algorithm allowing to (re)compute trace clear ranges using kc-basecallers quality scores and any windowing/quality parameters.
>
> Its not in the bioperl project but it is an easy install from CPAN.
>   
Thanks - we installed that a few weeks ago, and it was on my list of 
things to try, but I had not gotten to it yet since I was getting data 
out of the SCF SeqIO module. Even though the SeqIO::scf data looks ok, 
the fact that I need to unscramble it makes me nervous... Thanks, too, 
for the example code. I'll try out the Bio::Trace::ABIF module and see 
if it works with our files.

Thanks,
CAT


From cjfields at illinois.edu  Thu Jun 18 16:27:02 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 18 Jun 2009 15:27:02 -0500
Subject: [Bioperl-l] Bio::SeqIO::scf traces scrambled?
In-Reply-To: <4A3A9D6D.2010106@bms.com>
References: <4A3A435A.8000505@bms.com>
	<BD62CBAC4395B94096109020651BE2EC12B471A44D@exchmb-02.stowers-institute.org>
	<4A3A9D6D.2010106@bms.com>
Message-ID: <2A9A3AB7-7773-48F1-993C-A679495D0B95@illinois.edu>


On Jun 18, 2009, at 3:02 PM, Charles Tilford wrote:

> Cook, Malcolm wrote:
>> Charles,
>>
>> Another possible stopgap that might work for you, if you're working  
>> with AB1 chromatograms and have ABIs kb-basecaller turned on, is to  
>> use Bio::Trace::ABIF
>>
>> 	http://search.cpan.org/dist/Bio-Trace-ABIF/lib/Bio/Trace/ABIF.pm
>>
>> It works great and includes implementation of ABIs algorithm  
>> allowing to (re)compute trace clear ranges using kc-basecallers  
>> quality scores and any windowing/quality parameters.
>>
>> Its not in the bioperl project but it is an easy install from CPAN.
>>
> Thanks - we installed that a few weeks ago, and it was on my list of  
> things to try, but I had not gotten to it yet since I was getting  
> data out of the SCF SeqIO module. Even though the SeqIO::scf data  
> looks ok, the fact that I need to unscramble it makes me nervous...  
> Thanks, too, for the example code. I'll try out the Bio::Trace::ABIF  
> module and see if it works with our files.
>
> Thanks,
> CAT

You definitely shouldn't need to unscramble it; my guess is this is a  
legit bug that just has gone unnoticed.  I see that you have filed a  
ticket on it so we can at least track it.  Thanks!

chris


From scott at scottcain.net  Thu Jun 18 23:25:35 2009
From: scott at scottcain.net (Scott Cain)
Date: Thu, 18 Jun 2009 23:25:35 -0400
Subject: [Bioperl-l] background layer is not supported in Bioperl 1.6
	for Bio::Graphics::Glyph
In-Reply-To: <4A3A13D3.7050208@ii.uib.no>
References: <4A32BCDA.4080605@ii.uib.no>
	<536f21b00906121829o7a5076f6hc57eb9a4dd815e48@mail.gmail.com>
	<4A339621.2060702@ii.uib.no>
	<4536f7700906130627r41c90de5r311e58f1d718f9ee@mail.gmail.com>
	<4A33D850.1020203@ii.uib.no> <4A3A13D3.7050208@ii.uib.no>
Message-ID: <4536f7700906182025m1d67afa2y2a62a30d6cc9b19d@mail.gmail.com>

Hi Xianjun,

The attached script (which is not too different from yours--I only did
a little clean up and made the padding consistent) makes the attached
image, which is what I think you want.  I'm using bioperl-live.
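
For the archive, a minimal sketch of a full-height highlight callback,
modelled on the GBrowse hilite_regions_closure code quoted further down. The
helper name make_highlight_callback is made up for illustration, and the
sketch assumes, as the GBrowse code does, that location2pixel() results need
pad_left added and that drawing from y=0 to $panel->bottom covers the panel:

```perl
use strict;
use warnings;

# Hypothetical helper: build a closure suitable for the -postgrid option of
# Bio::Graphics::Panel. Each region is [start, end, color]; the color
# defaults to 'lightgrey' when omitted.
sub make_highlight_callback {
    my @regions = @_;
    return sub {
        my ($gd, $panel) = @_;
        my $left   = $panel->pad_left;
        my $bottom = $panel->bottom;
        for my $region (@regions) {
            my ($start, $end, $color) = @$region;
            my ($x1, $x2) = $panel->location2pixel($start, $end);
            # Start at y=0 so the rectangle spans every track.
            $gd->filledRectangle($left + $x1, 0, $left + $x2, $bottom,
                                 $panel->translate_color($color || 'lightgrey'));
        }
    };
}

# Usage (needs Bio::Graphics; note the comma after every option):
#   my $panel = Bio::Graphics::Panel->new(
#       -length    => 1050,
#       -width     => 1200,
#       -pad_left  => 12,
#       -pad_right => 12,
#       -postgrid  => make_highlight_callback([500, 600, 'red']),
#   );
```

One thing to watch for when passing -postgrid: if the comma before it is
missing, Perl parses "12 -postgrid" as a subtraction and the callback is
never registered.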

Scott


On Thu, Jun 18, 2009 at 6:15 AM, Xianjun Dong<Xianjun.Dong at bccs.uib.no> wrote:
> Hi, Scott,
>
> Would you mind having a look at the code (below my signature) to check
> whether I use the -postgrid callback correctly?
> I still cannot get the background for the whole panel.
>
> Thanks
>
> Xianjun
>
>
> Xianjun Dong wrote:
>>
>> Hi, Scott
>>
>> Before I give up on my own solution and switch to GBrowse, I would like to
>> bother you once more:
>>
>> As you suggested, I passed the -postgrid option when creating the panel; it
>> calls a function to draw the background. The code below is almost copied from
>> the online POD of Bio::Graphics::Panel (see
>> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Graphics/Panel.html
>> )
>>
>> But it still does not work. Could you have a look? I paste it below. (BTW,
>> on that POD page the option is given as -postgrid=>\&draw_gap, while the
>> gap-drawing function is named gap_it, not draw_gap. I guess that's a typo?)
>>
>> Thanks
>>
>> Xianjun
>>
>> ----------------------------------------------- mytestcode.pl
>> --------------------------
>>
>> #!/usr/bin/perl
>>
>> use strict;
>> use lib "$ENV{HOME}/lib";
>>
>> use Bio::Graphics;
>> use Bio::Graphics::Feature;
>> my $ftr= 'Bio::Graphics::Feature';
>>
>> # processed_transcript
>> my $trans1 =
>> $ftr->new(-start=>50,-end=>10,-name=>'ZK154.1',-type=>"3'-UTR");
>> my $trans2 =
>> $ftr->new(-start=>100,-end=>50,-name=>'ZK154.2',-type=>'CDS');
>> my $trans3 =
>> $ftr->new(-start=>350,-end=>225,-name=>'ZK154.3',-type=>'CDS',
>> -source=>'a');
>> my $trans4 =
>> $ftr->new(-start=>650,-end=>500,-name=>'ZK154.3',-type=>'CDS',
>> -source=>'a');
>> my $trans5 =
>> $ftr->new(-start=>700,-end=>650,-name=>'ZK154.3',-type=>"5'-UTR");
>> my $trans  =
>> $ftr->new(-segments=>[$trans1,$trans2,$trans3,$trans4,$trans5]);
>>
>> # highlight
>> my $trans31 =
>> $ftr->new(-start=>240,-end=>450,-name=>'hightlight1',-type=>'background',
>> -source=>'a');
>> my $trans41 =
>> $ftr->new(-start=>650,-end=>600,-name=>'hightlight2',-type=>'multihourglass',
>> -source=>'b');
>>
>> my $panel= Bio::Graphics::Panel->new(-width=>1200,
>>                                      -length=>1050,
>>                                      -start =>0,
>>                                      -pad_left=>12,
>>                                      -pad_right=>12,
>>                                      -postgrid=>\&gap_it);
>>
>> sub gap_it {
>>    my $gd    = shift;
>>    my $panel = shift;
>>    my ($gap_start,$gap_end) = $panel->location2pixel(500,600);
>>    my $top                  = $panel->top;
>>    my $bottom               = $gd->height; # or $panel->bottom
>>    my $gray                 = $panel->translate_color('red');
>>    $gd->filledRectangle($gap_start,$top,$gap_end,$bottom,$gray);
>> }
>> # the following track works as I expected in bioperl 1.2.3, but not in 1.5
>> # and 1.6
>> #$panel->add_track([$trans41,$trans31],
>> #                  -glyph   => 'background',
>> #                  -block_bgcolor => sub{return (shift->source eq 'a')?'#cccccc':'#fffc22'},
>> #                  );
>>
>> $panel->add_track($ftr->new(-start=>100,-end=>1000),
>>                 -glyph=>'arrow',
>>                 -double=>1,
>>                 -tick=>2);
>>
>> $panel->add_track($trans,
>>                 -glyph   => 'transcript2', # 'transcript2', #process_5utr',
>>                 -fgcolor => 'darkred',
>>                 -bgcolor => 'darkred',
>>                 -title => '$source',
>>                 -link => 'http://www.ensembl.org/Homo_sapiens/transview?transcript=$name',  #EnsEMBL
>>                 );
>> print $panel->png;
>>
>> # the following part works in bioperl 1.5 and 1.6, but does not work in
>> # Bioperl 1.2.3
>> my $map = $panel->create_web_map("image");
>> $panel->finished();
>>
>> Scott Cain wrote:
>>>
>>> Hi Xianjun,
>>>
>>> I understand what you want to do, as the current version of gbrowse
>>> does this, and it uses bioperl 1.6.  Without digging through the code,
>>> I can't tell you exactly how this works, and you didn't send your code
>>> that uses this callback, so I can't try it either.
>>>
>>> One thing that is different between your code and gbrowse is that each
>>> of the tracks is actually a separate panel (to allow track dragging),
>>> so it is possible that this sort of callback doesn't work for
>>> Bio::Graphics any more.
>>>
>>> Scott
>>>
>>> On Saturday, June 13, 2009, Xianjun Dong <Xianjun.Dong at bccs.uib.no>
>>> wrote:
>>>
>>>>
>>>> Hi, Scott
>>>>
>>>> Thanks for your reply first.
>>>>
>>>> I still have a question: I dug out the code from GBrowse (pasted
>>>> below). The method make_postgrid_callback collects all highlight regions
>>>> and then uses the hilite_regions_closure function to draw them, using the
>>>> following GD call:
>>>>
>>>> $gd->filledRectangle($left+$start,0,$left+$end,$bottom,
>>>>                      $panel->translate_color($h_color));
>>>>
>>>> where $bottom = $panel->bottom. This is the only difference from my
>>>> code, where I use $gd->height. I guess they are almost the same (except
>>>> for pad_bottom), as we can see in the code at
>>>> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Graphics/Panel.html#CODE22
>>>>
>>>> Anyway, I changed my highlight regions to use $panel->bottom instead of
>>>> $gd->height. The output is the same when using the Bioperl 1.6 (or 1.5)
>>>> library. You can see the attached image ("test.bioperl1.6.png").
>>>>
>>>> I might not have explained my question clearly. My question is: if I use
>>>> bioperl 1.2.3 (actually the Bio::Graphics in bioperl 1.2.3), I get the
>>>> image I want (see the attached file "test.bioperl1.2.3.png"), where the
>>>> highlight region runs from the top of the panel to the bottom. In
>>>> bioperl 1.5 (or 1.6), I can only see the highlight region in its own
>>>> track, not across the whole panel. You can see the difference between
>>>> the two images.
>>>>
>>>> [I am not sure whether the mailing list allows image attachments, so I
>>>> have also put them at the following links:
>>>> test.bioperl1.6.png:    http://translog.genereg.net/test.bioperl1.6.png
>>>> test.bioperl1.2.3.png:  http://translog.genereg.net/test.bioperl1.2.3.png ]
>>>>
>>>> Could you test it and see the difference, if you have both 1.2.3 and 1.6
>>>> on your computer?
>>>>
>>>> I really want to know how this works in bioperl 1.2.3 (even though this
>>>> might be a bug in that version, or whatever).
>>>>
>>>> Thanks
>>>>
>>>> Xianjun
>>>> =============================================
>>>>
>>>> # this generates the callback for highlighting a region
>>>> sub make_postgrid_callback {
>>>>   my $settings = shift;
>>>>   return unless ref $settings->{h_region};
>>>>
>>>>   my @h_regions = map {
>>>>     my ($h_ref,$h_start,$h_end,$h_color) = /^(.+):(\d+)\.\.(\d+)(?:@(\S+))?/;
>>>>     defined($h_ref) && $h_ref eq $settings->{ref}
>>>>       ? [$h_start,$h_end,$h_color||'lightgrey']
>>>>       : ()
>>>>   } @{$settings->{h_region}};
>>>>
>>>>   return unless @h_regions;
>>>>   return hilite_regions_closure(@h_regions);
>>>> }
>>>>
>>>> # this subroutine generates a Bio::Graphics::Panel callback closure
>>>> # suitable for hilighting a region of a panel.
>>>> # The args are a list of [start,end,color]
>>>> sub hilite_regions_closure {
>>>>   my @h_regions = @_;
>>>>
>>>>   return sub {
>>>>     my $gd     = shift;
>>>>     my $panel  = shift;
>>>>     my $left   = $panel->pad_left;
>>>>     my $top    = $panel->top;
>>>>     my $bottom = $panel->bottom;
>>>>     for my $r (@h_regions) {
>>>>       my ($h_start,$h_end,$h_color) = @$r;
>>>>       my ($start,$end) = $panel->location2pixel($h_start,$h_end);
>>>>       if ($end-$start <= 1) { $end++; $start-- } # so that we always see something
>>>>       # assuming top is 0 so as to ignore top padding
>>>>       $gd->filledRectangle($left+$start,0,$left+$end,$bottom,
>>>>                            $panel->translate_color($h_color));
>>>>     }
>>>>   };
>>>> }
>>>>
>>>>
>>>> Scott Cain wrote:
>>>>
>>>> Hello Xianjun,
>>>>
>>>> I don't think that approach will work. What you almost certainly need
>>>> to do is a postgrid callback that does the drawing of the highlighted
>>>> region. For example code of how to do this, take a look at the
>>>> make_postgrid_callback subroutine in GBrowse 1.69. -postgrid is an
>>>> option of Bio::Graphics::Panel.
>>>>
>>>> Scott
>>>>
>>>>
>>>>
>>>>
>>>> On Fri, Jun 12, 2009 at 4:38 PM, Xianjun Dong<Xianjun.Dong at bccs.uib.no>
>>>> wrote:
>>>>
>>>>
>>>> Hi,
>>>>
>>>> I am not sure whether this is the right place to ask for help.
>>>>
>>>> I have been struggling with a problem for several days: I want to
>>>> highlight parts of the regions in my track using a different background
>>>> color. To do that, I defined a glyph named "background", based on the
>>>> 'Bio::Graphics::Glyph::generic' module. I override the draw_component()
>>>> method, adding code like this:
>>>>
>>>> $gd->filledRectangle($left,0,$right,$gd->height,
>>>>                      $self->factory->translate_color($color));
>>>>
>>>> # the script is pasted at the end
>>>>
>>>> This draws a rectangle with top=0 and bottom=$gd->height. I turned the
>>>> highlight regions into a list of features and called add_track with
>>>> -glyph=>'background' (see the script test.pl below). This works as I
>>>> expect: it adds a colored block behind all tracks in the panel
>>>> (including the ruler arrow). You can see the output image in the
>>>> attached file "test.bioperl1.2.3.png".
>>>>
>>>> Now the problem: when I switch to Bioperl 1.5 (or 1.6), it does not
>>>> work. Well, it runs, but the highlight shrinks to a low height
>>>> instead of covering all tracks in the panel. I have also attached that
>>>> output; see the file "test.bioperl1.6.png".
>>>>
>>>> I tried to think about the reason. The 'background' module is based on
>>>> the generic module, so what can cause the difference? Is it because
>>>> $gd->height is different, or because the tracks that follow the
>>>> 'background' track cannot draw from the first position?
>>>>
>>>> I could stick with Bioperl 1.2.3 to avoid the problem ("A smart person
>>>> solves a problem, a wise person avoids it"...), but that raises another
>>>> problem: Bio::Graphics in Bioperl 1.2.3 does not support the
>>>> $panel->create_web_map() function, which means I have to use a later
>>>> version if I want to create a web map for my graphics, but then I have
>>>> to give up the highlighted background.
>>>>
>>>> OK, this is long enough for my first submission here. I hope someone can
>>>> give me a clue.
>>>>
>>>> Thanks in advance!
>>>>
>>>> Xianjun
>>>>
>>>>
>>>> ==================== test.pl =======================
>>>> #!/usr/bin/perl
>>>>
>>>> use strict;
>>>> use lib "$ENV{HOME}/lib";
>>>>
>>>> use Bio::Graphics;
>>>> use Bio::Graphics::Feature;
>>>> my $ftr = 'Bio::Graphics::Feature';
>>>>
>>>> # processed_transcript
>>>> my $trans1 = $ftr->new(-start=>50,-end=>10,-name=>'ZK154.1',-type=>"3'-UTR");
>>>> my $trans2 = $ftr->new(-start=>100,-end=>50,-name=>'ZK154.2',-type=>'CDS');
>>>> my $trans3 = $ftr->new(-start=>350,-end=>225,-name=>'ZK154.3',-type=>'CDS',
>>>>                        -source=>'a');
>>>> my $trans4 = $ftr->new(-start=>650,-end=>500,-name=>'ZK154.3',-type=>'CDS',
>>>>                        -source=>'a');
>>>> my $trans5 = $ftr->new(-start=>700,-end=>650,-name=>'ZK154.3',-type=>"5'-UTR");
>>>> my $trans  = $ftr->new(-segments=>[$trans1,$trans2,$trans3,$trans4,$trans5]);
>>>>
>>>> # highlight
>>>> my $trans31 = $ftr->new(-start=>240,-end=>450,-name=>'hightlight1',
>>>>                         -type=>'background',-source=>'a');
>>>> my $trans41 = $ftr->new(-start=>650,-end=>600,-name=>'hightlight2',
>>>>                         -type=>'multihourglass',-source=>'b');
>>>>
>>>> my $panel = Bio::Graphics::Panel->new(-width=>1200,
>>>>                                       -length=>1050,
>>>>                                       -start =>0,
>>>>                                       -pad_left=>12,
>>>>                                       -pad_right=>12);
>>>>
>>>> # the following track works as I expected in bioperl 1.2.3, but not in 1.5 and 1.6
>>>> $panel->add_track([$trans41,$trans31],
>>>>                   -glyph         => 'background',
>>>>                   -block_bgcolor => sub { return (shift->source eq 'a') ? '#cccccc' : '#fffc22' },
>>>>                  );
>>>>
>>>> $panel->add_track($ftr->new(-start=>100,-end=>1000),
>>>>                   -glyph=>'arrow',
>>>>                   -double=>1,
>>>>                   -tick=>2);
>>>>
>>>> $panel->add_track($trans,
>>>>                   -glyph   => 'transcript2', # 'transcript2', #process_5utr',
>>>>                   -fgcolor => 'darkred',
>>>>                   -bgcolor => 'darkred',
>>>>                   -title   => '$source',
>>>>                   -link    => 'http://www.ensembl.org/Homo_sapiens/transview?transcript=$name', #EnsEMBL
>>>>                  );
>>>> print $panel->png;
>>>>
>>>> # the following part works in bioperl 1.5 and 1.6, but does not work in Bioperl 1.2.3
>>>> my $map = $panel->create_web_map("image");
>>>> $panel->finished();
>>>>
>>>> 1;
>>>>
>>>> ==================== background.pm =======================
>>>> package Bio::Graphics::Glyph::background;
>>>>
>>>> use strict;
>>>> use base 'Bio::Graphics::Glyph::generic';
>>>> sub pad_top {
>>>>   return 0;
>>>> }
>>>>
>>>> sub draw_component {
>>>>   my $self = shift;
>>>>   #$self->SUPER::draw_component(@_);
>>>>   my ($gd,$dx,$dy) = @_;
>>>>   my ($left,$top,$right,$bottom) = $self->bounds($dx,$dy);
>>>>
>>>>   # fill a block from the top of the image (y=0) down to $gd->height
>>>>   my $color = $self->option('block_bgcolor') || '#cccccc';
>>>>   $gd->filledRectangle($left,0,$right,$gd->height,
>>>>                        $self->factory->translate_color($color));
>>>> }
>>>>
>>>> 1;
>>>>
>>>> --
>>>> ==========================================
>>>> Xianjun Dong
>>>> PhD student, Lenhard group
>>>> Computational Biology Unit
>>>> Bergen Center for Computational Science
>>>> University of Bergen
>>>> Hoyteknologisenteret, Thormohlensgate 55
>>>> N-5008 Bergen, Norway
>>>> E-mail: xianjun.dong at bccs.uib.no
>>>> Tel.: +47 555 84022
>>>> Fax : +47 555 84295
>>>> ==========================================
>>>>
>>>
>>>
>>
>
>
>


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research
-------------- next part --------------
A non-text attachment was scrubbed...
Name: postgrid.pl
Type: application/x-perl
Size: 2140 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20090618/0bee0f33/attachment-0003.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: postgrid_highlight.png
Type: image/png
Size: 7195 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20090618/0bee0f33/attachment-0003.png>

From scott at scottcain.net  Thu Jun 18 23:30:37 2009
From: scott at scottcain.net (Scott Cain)
Date: Thu, 18 Jun 2009 23:30:37 -0400
Subject: [Bioperl-l] background layer is not supported in Bioperl 1.6
	for Bio::Graphics::Glyph
In-Reply-To: <4536f7700906182025m1d67afa2y2a62a30d6cc9b19d@mail.gmail.com>
References: <4A32BCDA.4080605@ii.uib.no>
	<536f21b00906121829o7a5076f6hc57eb9a4dd815e48@mail.gmail.com>
	<4A339621.2060702@ii.uib.no>
	<4536f7700906130627r41c90de5r311e58f1d718f9ee@mail.gmail.com>
	<4A33D850.1020203@ii.uib.no> <4A3A13D3.7050208@ii.uib.no>
	<4536f7700906182025m1d67afa2y2a62a30d6cc9b19d@mail.gmail.com>
Message-ID: <4536f7700906182030n74f4293k60ad04ea62b97476@mail.gmail.com>

Actually, to be clear, that's bioperl-live and Bio::Graphics version
1.96 from CPAN.
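
For anyone reading the archive without the scrubbed attachment, here is a
minimal sketch of a -postgrid highlight callback along the lines discussed in
this thread. It is not the attached postgrid.pl. make_highlight_callback and
the region list are hypothetical names, and the only panel and GD methods it
calls are the ones in the GBrowse code quoted earlier in the thread
(pad_left, bottom, location2pixel, translate_color, filledRectangle). The
'lightgrey' fallback is likewise borrowed from that code.

```perl
use strict;
use warnings;

# Build a closure suitable for passing as the -postgrid option.
# Each region is [start, end, color] in sequence coordinates.
# make_highlight_callback is a hypothetical helper name.
sub make_highlight_callback {
    my @regions = @_;
    return sub {
        my ($gd, $panel) = @_;
        my $left   = $panel->pad_left;   # pixel offset of the drawing area
        my $bottom = $panel->bottom;     # lower edge of the tracks
        for my $r (@regions) {
            my ($h_start, $h_end, $color) = @$r;
            my ($x1, $x2) = $panel->location2pixel($h_start, $h_end);
            if ($x2 - $x1 <= 1) { $x2++; $x1-- }   # keep narrow regions visible
            # y starts at 0 so the block also covers the top padding
            $gd->filledRectangle($left + $x1, 0, $left + $x2, $bottom,
                                 $panel->translate_color($color || 'lightgrey'));
        }
    };
}
```

Assuming a Bio::Graphics version that supports it, the closure would be
handed to the panel constructor as
-postgrid => make_highlight_callback([240,450,'#cccccc'], [600,650,'#fffc22']),
with a comma after every preceding option.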

On Thu, Jun 18, 2009 at 11:25 PM, Scott Cain<scott at scottcain.net> wrote:
> Hi Xianjun,
>
> The attached script (which is not too different from yours--I only did
> a little clean up and made the padding consistent) makes the attached
> image, which is what I think you want. I'm using bioperl-live.
>
> Scott
>
>
> On Thu, Jun 18, 2009 at 6:15 AM, Xianjun Dong<Xianjun.Dong at bccs.uib.no> wrote:
>> Hi, Scott,
>>
>> Would you mind having a look at the code (below my signature) to check
>> whether I use the -postgrid callback correctly?
>> I still cannot get the background to cover the whole panel.
>>
>> Thanks
>>
>> Xianjun
>>
>>
>> Xianjun Dong wrote:
>>>
>>> Hi, Scott
>>>
>>> Before I give up on my own solution and switch to GBrowse, I would like
>>> to bother you once more.
>>>
>>> As you suggested, I added the -postgrid option when creating the panel; it
>>> calls a function to draw the background. The code below is copied almost
>>> verbatim from the online POD of Bio::Graphics::Panel (see
>>> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Graphics/Panel.html
>>> )
>>>
>>> But it still does not work. Could you have a look? I paste it
>>> below. (BTW, on that POD page the option is -postgrid=>\&draw_gap, while
>>> the gap drawing function is named gap_it, not draw_gap. I guess that is a
>>> typo?)
>>>
>>> Thanks
>>>
>>> Xianjun
>>>
>>> ----------------------- mytestcode.pl -----------------------
>>>
>>> #!/usr/bin/perl
>>>
>>> use strict;
>>> use lib "$ENV{HOME}/lib";
>>>
>>> use Bio::Graphics;
>>> use Bio::Graphics::Feature;
>>> my $ftr = 'Bio::Graphics::Feature';
>>>
>>> # processed_transcript
>>> my $trans1 = $ftr->new(-start=>50,-end=>10,-name=>'ZK154.1',-type=>"3'-UTR");
>>> my $trans2 = $ftr->new(-start=>100,-end=>50,-name=>'ZK154.2',-type=>'CDS');
>>> my $trans3 = $ftr->new(-start=>350,-end=>225,-name=>'ZK154.3',-type=>'CDS',
>>>                        -source=>'a');
>>> my $trans4 = $ftr->new(-start=>650,-end=>500,-name=>'ZK154.3',-type=>'CDS',
>>>                        -source=>'a');
>>> my $trans5 = $ftr->new(-start=>700,-end=>650,-name=>'ZK154.3',-type=>"5'-UTR");
>>> my $trans  = $ftr->new(-segments=>[$trans1,$trans2,$trans3,$trans4,$trans5]);
>>>
>>> # highlight
>>> my $trans31 = $ftr->new(-start=>240,-end=>450,-name=>'hightlight1',
>>>                         -type=>'background',-source=>'a');
>>> my $trans41 = $ftr->new(-start=>650,-end=>600,-name=>'hightlight2',
>>>                         -type=>'multihourglass',-source=>'b');
>>>
>>> # every option needs a trailing comma; without one after -pad_right=>12
>>> # the -postgrid key is not passed to the constructor as an option
>>> my $panel = Bio::Graphics::Panel->new(-width=>1200,
>>>                                       -length=>1050,
>>>                                       -start =>0,
>>>                                       -pad_left=>12,
>>>                                       -pad_right=>12,
>>>                                       -postgrid=>\&gap_it);
>>>
>>> sub gap_it {
>>>    my $gd    = shift;
>>>    my $panel = shift;
>>>    my ($gap_start,$gap_end) = $panel->location2pixel(500,600);
>>>    my $top                  = $panel->top;
>>>    my $bottom               = $gd->height; # or $panel->bottom
>>>    my $gray                 = $panel->translate_color('red');
>>>    $gd->filledRectangle($gap_start,$top,$gap_end,$bottom,$gray);
>>> }
>>> # the following track works as I expected in bioperl 1.2.3, but not in 1.5 and 1.6
>>> #$panel->add_track([$trans41,$trans31],
>>> #                  -glyph         => 'background',
>>> #                  -block_bgcolor => sub { return (shift->source eq 'a') ? '#cccccc' : '#fffc22' },
>>> #                 );
>>>
>>> $panel->add_track($ftr->new(-start=>100,-end=>1000),
>>>                   -glyph=>'arrow',
>>>                   -double=>1,
>>>                   -tick=>2);
>>>
>>> $panel->add_track($trans,
>>>                   -glyph   => 'transcript2', # 'transcript2', #process_5utr',
>>>                   -fgcolor => 'darkred',
>>>                   -bgcolor => 'darkred',
>>>                   -title   => '$source',
>>>                   -link    => 'http://www.ensembl.org/Homo_sapiens/transview?transcript=$name', #EnsEMBL
>>>                  );
>>> print $panel->png;
>>>
>>> # the following part works in bioperl 1.5 and 1.6, but does not work in Bioperl 1.2.3
>>> my $map = $panel->create_web_map("image");
>>> $panel->finished();
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> Scott Cain wrote:
>>>>
>>>> Hi Xianjun,
>>>>
>>>> I understand what you want to do, as the current version of gbrowse
>>>> does this, which uses bioperl 1.6. Without digging through the code,
>>>> I can't tell you exactly how this works and you didn't send your code
>>>> that uses this callback, so I can't try it either.
>>>>
>>>> One thing that is different between your code and gbrowse is that each
>>>> of the tracks is actually a separate panel (to allow track dragging),
>>>> so it is possible that this sort of callback doesn't work for
>>>> Bio::Graphics any more.
>>>>
>>>> Scott
>>>>
>>>>
>>>>
>>>
>>
>>
>>
>
>
>
> --
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                                   scott at scottcain dot net
> GMOD Coordinator (http://gmod.org/)                     216-392-3087
> Ontario Institute for Cancer Research
>


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From roy.chaudhuri at gmail.com  Fri Jun 19 06:34:24 2009
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Fri, 19 Jun 2009 11:34:24 +0100
Subject: [Bioperl-l] Problems parsing scientific name from a Genbank file
In-Reply-To: <24095355.post@talk.nabble.com>
References: <24095355.post@talk.nabble.com>
Message-ID: <4A3B69B0.8080305@gmail.com>

Hi Cesar,

I can replicate this using an old Bioperl (version 1.5.2), but it 
appears to be fixed in version 1.6 and bioperl-live - the 
scientific_name method returns "Bacillus anthracis str. Sterne".

Hope this helps.
Roy.

Cesar Arze wrote:
> Hi all,
>    I've searched through the mailing list and bug-tracker looking for any
> indication of this (what I presume to be) bug I have been encountering when
> parsing certain Genbank files using SeqIO::GenBank but have yet to find
> anything. I apologize in advance if this is something that has already been
> addressed.
> 
> When parsing these files and extracting the scientific name it seems that
> line breaks are causing the lineage info found in the ORGANISM section to be
> captured as part of the scientific name. An example of this is accession
> NC_005945:
> 
>   ORGANISM  Bacillus anthracis str. Sterne
>             Bacteria; Firmicutes; Bacillales; Bacillaceae; Bacillus; Bacillus
>             cereus group.
> 
> "Bacillus cereus group" is split by a line break, which causes the scientific
> name to capture "Bacteria; Firmicutes; Bacillales; Bacillaceae; Bacillus;
> Bacillus", ending up with "Bacillus anthracis str. Sterne Bacteria;
> Firmicutes; Bacillales; Bacillaceae; Bacillus; Bacillus" as the final
> scientific name.
> 
> Not sure if anyone has ever run into this problem but I would very much
> appreciate any help or direction.
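
The separation Cesar expects can be illustrated without BioPerl. This is only
a rough sketch, not the Bio::SeqIO::genbank implementation. It assumes the
lineage starts on the first line that contains a semicolon and that
everything before it is the organism name. parse_organism is a hypothetical
helper name.

```perl
use strict;
use warnings;

# Split a GenBank ORGANISM block into (scientific name, lineage array ref).
# Assumption: the lineage begins at the first line containing ';'.
sub parse_organism {
    my ($block) = @_;
    my (@name, @lineage);
    for my $line (split /\n/, $block) {
        (my $text = $line) =~ s/^\s*(?:ORGANISM)?\s+//;   # drop keyword and indent
        $text =~ s/\s+$//;
        next unless length $text;
        if (@lineage || $text =~ /;/) {
            push @lineage, $text;    # lineage lines, including wrapped ones
        }
        else {
            push @name, $text;       # organism name line(s)
        }
    }
    my $lineage = join ' ', @lineage;   # rejoin taxa split by a line break
    $lineage =~ s/\.$//;                # the lineage ends with a period
    return (join(' ', @name), [ split /;\s*/, $lineage ]);
}

my $block = <<'END';
  ORGANISM  Bacillus anthracis str. Sterne
            Bacteria; Firmicutes; Bacillales; Bacillaceae; Bacillus; Bacillus
            cereus group.
END

my ($name, $taxa) = parse_organism($block);
print "$name\n";                 # Bacillus anthracis str. Sterne
print join(' | ', @$taxa), "\n"; # last taxon is "Bacillus cereus group"
```

A lineage consisting of a single taxon with no semicolon would be mistaken
for part of the name here, so a real parser needs a stronger rule.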


From cjfields at illinois.edu  Fri Jun 19 16:57:36 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 19 Jun 2009 15:57:36 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <69C726B6-41DC-41EC-9BC9-DFEC0267CD3B@ucl.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>	<92C15E3391F64BAF801754E924122540@NewLife>
	<200906170927.13273.tristan.lefebure@gmail.com>
	<4A3933D0.4040808@sendu.me.uk>
	<8E010693-30B2-4E54-81CA-97755FC6CAE5@illinois.edu>
	<0500F8EB-F3BD-44E8-801A-7A45783DA40F@ucl.ac.uk>
	<3C32922E-690B-4246-BD44-7600153E57C7@illinois.edu>
	<69C726B6-41DC-41EC-9BC9-DFEC0267CD3B@ucl.ac.uk>
Message-ID: <E53B7AAC-9128-4637-AB0F-AC6C0C195D01@illinois.edu>

So, to follow up (and make sure we don't have any overlapping tuits)  
we should probably determine who wants to work on what (i.e. fastq  
updating, etc). I think it's possible to quickly add in Solexa/ 
Illumina/Sanger fastq similar to BioPython, just don't want to step on  
anyone's toes if they are halfway through doing this.

chris
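
As a rough illustration of what telling the three fastq variants apart
involves, here is a sketch in plain Perl that creates no Bio::Seq objects. It
is not BioPerl code: decode_quality and trim_3prime are hypothetical names.
The offsets are the commonly documented ones (Sanger Phred+33, Solexa
score+64, Illumina 1.3+ Phred+64), and Solexa scores are converted to Phred
with Q = 10*log10(10^(Qsolexa/10) + 1).

```perl
use strict;
use warnings;

# ASCII offset for each fastq variant
my %OFFSET = (sanger => 33, solexa => 64, illumina => 64);

# Decode a quality string into a list of Phred scores.
sub decode_quality {
    my ($qual, $variant) = @_;
    my $offset = $OFFSET{$variant}
        or die "unknown fastq variant '$variant'";
    my @scores = map { $_ - $offset } unpack 'C*', $qual;
    if ($variant eq 'solexa') {
        # Solexa scores are log-odds values; convert them to Phred
        @scores = map { 10 * log(10 ** ($_ / 10) + 1) / log(10) } @scores;
    }
    return @scores;
}

# Remove trailing bases whose Phred score is below $min.
# Returns the trimmed sequence and the matching quality string.
sub trim_3prime {
    my ($seq, $qual, $variant, $min) = @_;
    my @q    = decode_quality($qual, $variant);
    my $keep = @q;
    $keep-- while $keep > 0 && $q[$keep - 1] < $min;
    return (substr($seq, 0, $keep), substr($qual, 0, $keep));
}

my ($seq, $qual) = trim_3prime('ACGTAC', 'IIII##', 'sanger', 20);
print "$seq $qual\n";   # ACGT IIII
```

Working on plain strings like this is the kind of lightweight path discussed
further down the thread; wrapping the result in quality objects would be a
separate step.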

On Jun 17, 2009, at 3:36 PM, Elia Stupka wrote:

> Better than colorspaced discussions for sure ;)
>
> Elia
>
> On 17 Jun 2009, at 21:35, Chris Fields wrote:
>
>> So, #1 priority is to get fastq up-to-speed, then maybe assess  
>> other options.
>>
>> Illuminating discussion, thanks Elia!
>>
>> urgh, excuse unintended bad pun above...
>>
>> chris
>>
>> On Jun 17, 2009, at 3:06 PM, Elia Stupka wrote:
>>
>>> Interesting that you mention the database issue. We found that for
>>> specific memory/CPU-intensive things we also switch to using dbs.
>>> For example, after many years of loyal use of disconnected_ranges
>>> we switched to a simple SQL implementation of it, because of the
>>> large performance gains it would give us. Similarly, in Ensembl as
>>> well as in the old days of bioperl-db we opted for doing subseq
>>> within SQL where possible.
>>>
>>> Some lean way of SQL'izing specific components could be less
>>> "disruptive" than avoiding object creation and provide significant
>>> gains in performance. It could be set as an optional flag, and could
>>> use temporary ad hoc SQL databases?
>>>
>>> Still, the priority now is to make SeqIO compliant with all those
>>> formats, then we can worry about performance :)
>>>
>>> Elia
>>>
>>> On 17 Jun 2009, at 20:30, Chris Fields wrote:
>>>
>>>> On Jun 17, 2009, at 1:20 PM, Sendu Bala wrote:
>>>>
>>>>> Tristan Lefebure wrote:
>>>>>> Hello,
>>>>>> Regarding next-gen sequences and bioperl, in my
>>>>>> experience, another issue is bioperl speed. For example, if you
>>>>>> want to trim bad quality bases at ends of 1E6 Solexa reads
>>>>>> using Bio::SeqIO::fastq and some methods in Bio::Seq::Quality,
>>>>>> well, you've got to be patient (but maybe I missed some
>>>>>> shortcuts...).
>>>>>
>>>>> This is my concern as well. Or, rather, is there actually a  
>>>>> significant set of users out there who are dealing with next-gen  
>>>>> sequencing and would consider using BioPerl for their work?
>>>>>
>>>>> I'm working with all the 1000-genomes data at the Sanger, and we  
>>>>> at least are probably never going to use BioPerl for the work.
>>>>
>>>> Are you using pure perl or (gasp) something else?  ;>
>>>>
>>>> Judging by the feedback there are definitely a set of users who  
>>>> would like to integrate nextgen into bioperl somehow, probably to  
>>>> take advantage of other aspects of bioperl.
>>>>
>>>>>> A pure perl solution will be between 100 to 1000x faster...  
>>>>>> Would it be possible to have an ultra-light quality object with  
>>>>>> few simple methods for next-gen reads?
>>>>>
>>>>> The fastq parser itself already seems pretty fast. The way to  
>>>>> get the speedup is to not create any Bio::Seq* objects but just  
>>>>> return the data directly. At that point it's not taking much  
>>>>> advantage of BioPerl. But certainly it could be done...
>>>>
>>>>
>>>> I suppose the best way to assess what needs to be done is come up  
>>>> with a set of 'use cases' specifying what users want so we can  
>>>> design around them, otherwise we're shooting in the dark.
>>>>
>>>> I'm personally wondering if this could be done as a sequence  
>>>> database, something similar in theme to Lincoln's  
>>>> SeqFeature::Store, but sequence only, and returns quality objects  
>>>> in a similar manner (ala Storable)?  Not sure whether that's  
>>>> feasible, but it appears at least scalable.
>>>>
>>>> chris
>>>>
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>> ---
>>> Senior Lecturer, Bioinformatics
>>> UCL Cancer Institute
>>> Paul O' Gorman Building
>>> University College London
>>> Gower Street
>>> WC1E 6BT
>>> London
>>> UK
>>>
>>> Office (UCL): +44 207 679 6493
>>> Office (ICMS): +44 0207 8822374
>>>
>>> Mobile: +44 7597 566 194
>>> Mobile (Italy): +39 338 8448801


From biopython at maubp.freeserve.co.uk  Sat Jun 20 04:46:31 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sat, 20 Jun 2009 09:46:31 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<BF526BE6-AFA8-4F66-9665-D6328F9B4FFA@illinois.edu>
	<320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com>
	<1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>
Message-ID: <320fb6e00906200146t547a0492r23d5f123e01098e8@mail.gmail.com>

On Wed, Jun 17, 2009 at 6:06 PM, Chris Fields<cjfields at illinois.edu> wrote:
>
>
> On Jun 17, 2009, at 8:25 AM, Peter wrote:
>
>>> Peter's suggestions also are reasonable, though does biopython have a
>>> separate module for each of these variations? Our version (I believe)
>>> mainly varied the conversion within Bio::SeqIO::fastq itself based on the
>>> fastq variant passed in as a separate named argument.
>>
>> Biopython's SeqIO gives the three FASTQ variants their own unique
>> names. This format name is a required argument for parsing/writing
>> (we don't try and guess the file format from the data contents).
>> Internally we have three separate FASTQ parsers/writers although
>> they do share code.
>
> We could easily do the same if others agree. Actually, if we specified that
> shorthand for a variant on a format would be designated as -format =>
> 'format-variant', I think we could easily hack SeqIO to deal with that by
> splitting on '-' and passing everything to the constructor as (-format =>
> 'format', -variant => 'variant'). Very little repeated code in this case,
> just an additional named parameter indicating the format variant (and the
> SeqIO class can do the type checking on that within the constructor).

Yes, when I started using names like "fastq-solexa" I did have in mind
"main-variant" naming convention, and potentially Biopython may one
day actually use this structure when allocating a Bio.SeqIO job to the
appropriate parser or writer.

For now, the Biopython list of formats is fairly short (and there are
relatively few of these sub-formats) so to keep things simple we just
have a flat mapping from the format name (e.g. "fasta", "fastq",
"fastq-solexa") to the parser/writer code.
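
The 'format-variant' splitting idea quoted above could look roughly like this in Perl. This is only a sketch: the helper name split_format is hypothetical, and whether Bio::SeqIO accepts a -variant argument is an assumption taken from the proposal, not from the current code.

```perl
use strict;
use warnings;

# Hypothetical helper: split 'format-variant' into constructor arguments.
# A plain format name (no '-') yields only the '-format' pair.
sub split_format {
    my ($spec) = @_;
    my ($format, $variant) = split /-/, lc($spec), 2;
    my @args = ('-format' => $format);
    push @args, ('-variant' => $variant) if defined $variant;
    return @args;
}

my %args = split_format('fastq-solexa');
print "$args{'-format'} / $args{'-variant'}\n";   # fastq / solexa
```

Limiting the split to two fields keeps any further '-' characters inside the variant name, so the format part stays unambiguous.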

Peter


From e.stupka at ucl.ac.uk  Sat Jun 20 16:12:18 2009
From: e.stupka at ucl.ac.uk (Elia Stupka)
Date: Sat, 20 Jun 2009 21:12:18 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <E53B7AAC-9128-4637-AB0F-AC6C0C195D01@illinois.edu>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>	<92C15E3391F64BAF801754E924122540@NewLife>
	<200906170927.13273.tristan.lefebure@gmail.com>
	<4A3933D0.4040808@sendu.me.uk>
	<8E010693-30B2-4E54-81CA-97755FC6CAE5@illinois.edu>
	<0500F8EB-F3BD-44E8-801A-7A45783DA40F@ucl.ac.uk>
	<3C32922E-690B-4246-BD44-7600153E57C7@illinois.edu>
	<69C726B6-41DC-41EC-9BC9-DFEC0267CD3B@ucl.ac.uk>
	<E53B7AAC-9128-4637-AB0F-AC6C0C195D01@illinois.edu>
Message-ID: <F99E2F7F-05F7-462B-A3ED-96E09746994B@ucl.ac.uk>

Hi Chris,

I agree. I have not written a single line of code so far, while Heikki  
has some (but has been silent for a while) and you have perhaps some  
code ready to roll. I am happy to help where needed, just let me know  
what you'd like me to focus on. If you want to go ahead and implement  
the fastq stuff discussed I can focus on bioperl-run.

cheers

Elia


On 19 Jun 2009, at 21:57, Chris Fields wrote:

> So, to follow up (and make sure we don't have any overlapping tuits)  
> we should probably determine who wants to work on what (i.e. fastq  
> updating, etc). I think it's possible to quickly add in Solexa/ 
> Illumina/Sanger fastq similar to BioPython, just don't want to step  
> on anyone's toes if they are halfway through doing this.
>
> chris
>
> On Jun 17, 2009, at 3:36 PM, Elia Stupka wrote:
>
>> Better than colorspaced discussions for sure ;)
>>
>> Elia
>>
>> On 17 Jun 2009, at 21:35, Chris Fields wrote:
>>
>>> So, #1 priority is to get fastq up-to-speed, then maybe assess  
>>> other options.
>>>
>>> Illuminating discussion, thanks Elia!
>>>
>>> urgh, excuse unintended bad pun above...
>>>
>>> chris
>>>
>>> On Jun 17, 2009, at 3:06 PM, Elia Stupka wrote:
>>>
>>>> Interesting that you mention the database issue. We found that  
>>>> for specific memory/CPU intensive things we also switch to using  
>>>> dbs. For example, after many years of loyal use of  
>>>> disconnected_ranges we switched to a simple SQL implementation of  
>>>> it, because of the large performance gains it would give us.   
>>>> Similarly in Ensembl as well as in the old days of bioperl-db we  
>>>> opted for doing subseq within SQL where possible.
>>>>
>>>> Some lean way of SQL'izing specific components could be less  
>>>> "disruptive" than avoiding object creation and provide  
>>>> significant gains in performance. Could be set as an optional  
>>>> flag, and could use temporary ad hoc SQL databases?
>>>>
>>>> Still, priority now is to make SeqIO compliant with all those  
>>>> formats, then we can worry about performance :)
>>>>
>>>> Elia
>>>>
>>>> On 17 Jun 2009, at 20:30, Chris Fields wrote:
>>>>
>>>>> On Jun 17, 2009, at 1:20 PM, Sendu Bala wrote:
>>>>>
>>>>>> Tristan Lefebure wrote:
>>>>>>> Hello,
>>>>>>> Regarding next-gen sequences and bioperl, following my  
>>>>>>> experience, another issue is bioperl speed. For example, if  
>>>>>>> you want to trim bad quality bases at ends of 1E6 Solexa reads  
>>>>>>> using Bio::SeqIO::fastq and some methods in Bio::Seq::Quality,  
>>>>>>> well, you've got to be patient (but may be I missed some  
>>>>>>> shortcuts...).
>>>>>>
>>>>>> This is my concern as well. Or, rather, is there actually a  
>>>>>> significant set of users out there who are dealing with next- 
>>>>>> gen sequencing and would consider using BioPerl for their work?
>>>>>>
>>>>>> I'm working with all the 1000-genomes data at the Sanger, and  
>>>>>> we at least are probably never going to use BioPerl for the work.
>>>>>
>>>>> Are you using pure perl or (gasp) something else?  ;>
>>>>>
>>>>> Judging by the feedback there are definitely a set of users who  
>>>>> would like to integrate nextgen into bioperl somehow, probably  
>>>>> to take advantage of other aspects of bioperl.
>>>>>
>>>>>>> A pure perl solution will be between 100 to 1000x faster...  
>>>>>>> Would it be possible to have an ultra-light quality object  
>>>>>>> with few simple methods for next-gen reads?
>>>>>>
>>>>>> The fastq parser itself already seems pretty fast. The way to  
>>>>>> get the speedup is to not create any Bio::Seq* objects but just  
>>>>>> return the data directly. At that point it's not taking much  
>>>>>> advantage of BioPerl. But certainly it could be done...
>>>>>
>>>>>
>>>>> I suppose the best way to assess what needs to be done is come  
>>>>> up with a set of 'use cases' specifying what users want so we  
>>>>> can design around them, otherwise we're shooting in the dark.
>>>>>
>>>>> I'm personally wondering if this could be done as a sequence  
>>>>> database, something similar in theme to Lincoln's  
>>>>> SeqFeature::Store, but sequence only, and returns quality  
>>>>> objects in a similar manner (ala Storable)?  Not sure whether  
>>>>> that's feasible, but it appears at least scalable.
>>>>>
>>>>> chris

---
Senior Lecturer, Bioinformatics
UCL Cancer Institute
Paul O' Gorman Building
University College London
Gower Street
WC1E 6BT
London
UK

Office (UCL): +44 207 679 6493
Office (ICMS): +44 0207 8822374

Mobile: +44 7597 566 194
Mobile (Italy): +39 338 8448801


From lincoln.stein at gmail.com  Sat Jun 20 17:01:43 2009
From: lincoln.stein at gmail.com (Lincoln Stein)
Date: Sat, 20 Jun 2009 17:01:43 -0400
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
Message-ID: <6dce9a0b0906201401j40175dbdscd71360396fe9f7a@mail.gmail.com>

Hi All,

Apropos of this, I am about to release to CPAN a BioPerl interface to SAM
and BAM files. The documentation is still in progress, but you can get CVS
access here:

% cvs -d :pserver:anonymous@gmod.cvs.sourceforge.net:/cvsroot/gmod co gbrowse-adaptors/Bio-SamTools

Lincoln

On Wed, Jun 17, 2009 at 7:29 AM, Elia Stupka <e.stupka at ucl.ac.uk> wrote:

> Dear all,
>
> after several years of absence I am slowly coming back to Bioperl, and hope
> to contribute again to its development.
>
> One area that I was thinking of starting from, since we are actively
> involved with it, is to improve BioPerl's support for next-gen sequencing
> data, tools, etc. Since I am sure I have missed out on a lot of recent
> developments, do let me know if/what is useful.
>
> One example that comes to mind is that the conversion of various formats
> to/from FASTQ does not seem to be supported. Some code can be found within
> Li Heng's script: http://maq.sourceforge.net/fq_all2std.pl but it would be
> good if it could make its way into SeqIO? And similarly, potentially, for
> other next-gen sequence formats?
>
> Similarly, there seems to be little in bioperl-run to support tools that
> have been developed in this area, such as Maq, BowTie, TopHat, etc?
>
> Do let me know if there is a past thread on this, or other people actively
> developing, etc. so that I can find out what priorities are.
>
> thanks and best regards to all (old friends and new),
>
> Elia


-- 
Lincoln D. Stein
Director, Informatics and Biocomputing Platform
Ontario Institute for Cancer Research
101 College St., Suite 800
Toronto, ON, Canada M5G0A3
416 673-8514
Assistant: Renata Musa <Renata.Musa at oicr.on.ca>


From hartzell at alerce.com  Mon Jun 22 09:18:20 2009
From: hartzell at alerce.com (George Hartzell)
Date: Mon, 22 Jun 2009 06:18:20 -0700
Subject: [Bioperl-l] Anyone at YAPC?
Message-ID: <19007.33948.411442.197063@already.dhcp.gene.com>


I'm in Pittsburgh at YAPC 10 and would be happy to say hello to folks.

g.


From cjfields1 at gmail.com  Mon Jun 22 10:05:56 2009
From: cjfields1 at gmail.com (Chris Fields)
Date: Mon, 22 Jun 2009 09:05:56 -0500
Subject: [Bioperl-l] changing parameters in Bio::Tools::Run::RemoteBlast
In-Reply-To: <F52FFB80A7304749B467C46E10A2869D@jonas>
References: <F52FFB80A7304749B467C46E10A2869D@jonas>
Message-ID: <67ABC7E3-216E-4F5A-B18E-A775A6B4D8F7@gmail.com>

Jonas,

The best place to send questions is to the mail list (which I've  
cc'd).  If you reply make sure to keep the mail list in the reply-to.

There are two ways to set the parameters you want.  I'll show you what  
I consider the best, but I have no way to test it ATM.

$factory->submit_parameter($foo => 'bar')

is the syntax for setting PUT parameters.  Sad to see they didn't  
provide you with the exact PUT parameter names (as follows):

Max target sequences = 100 # MAX_NUM_SEQ
Expect threshold = 10  # EXPECT
Gap Costs = Existence 11 Extension 1   # GAPCOSTS
Compositional adjustments = Conditional compositional score matrix  
adjustment # COMPOSITION_BASED_STATISTICS

'Compositional adjustments' is as follows (from command-line blastall):

   -C  Use composition-based score adjustments for blastp or tblastn:
       As first character:
       D or d: default (equivalent to T)
       0 or F or f: no composition-based statistics
       1: Composition-based statistics as in NAR 29:2994-3005, 2001
       2 or T or t: Composition-based score adjustments as in
           Bioinformatics 21:902-911, 2005, conditioned on sequence
           properties
       3: Composition-based score adjustment as in Bioinformatics
           21:902-911, 2005, unconditionally
       For programs other than tblastn, must either be absent or be D,
       F or 0.
            As second character, if first character is equivalent to
            1, 2, or 3:

After the factory line and prior to the BLAST call you can add in the  
following (completely untested, excuse any possible mistakes) code:

my %put = (
    MAX_NUM_SEQ => 100,
    EXPECT      => 10,
    GAPCOSTS    => '11 1',
    COMPOSITION_BASED_STATISTICS => 2 # could be 1 as well
);

for my $putName (keys %put) {
    $factory->submit_parameter($putName, $put{$putName});
}


chris

On Jun 22, 2009, at 8:14 AM, Jonas Schaer wrote:

> Hi there,
> I hope it's OK to ask you a question about the bio perl module   
> Bio::Tools::Run::RemoteBlast.
> My problem is that I get different results using this perl script:
>
> #######################################################################################################################################################################################
>  use Bio::Seq::SeqFactory;
>  use Bio::Tools::Run::RemoteBlast;
>  use strict;
>  my @blast_report;
>  my $prog = 'blastp';
>  my $db   = 'nr';
>  my $e_val= '1e-10';
>  my @params = ( '-prog' => $prog,
>         '-data' => $db,
>         '-expect' => $e_val,
>         '-readmethod' => 'SearchIO' );
>  my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
>  #my $input = @_;
>  my $blast_seq = 'MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTAMATGPDPDDEYE';
>  #$v is just to turn on and off the messages
>  my $v = 1;
>  my $seqbuilder = Bio::Seq::SeqFactory->new('-type' =>  
> 'Bio::PrimarySeq');
>  my $seq = $seqbuilder->create(-seq =>$blast_seq, -display_id =>  
> "$blast_seq");
>  my $filename='temp2.out';
>  my $r = $factory->submit_blast($seq);
>  print STDERR "waiting..." if( $v > 0 );
>    while ( my @rids = $factory->each_rid )
>    {
>        foreach my $rid ( @rids )
>        {
>            my $rc = $factory->retrieve_blast($rid);
>            if( !ref($rc) )
>            {
>                if( $rc < 0 )
>                {
>                    $factory->remove_rid($rid);
>                }
>                print STDERR "." if ( $v > 0 );
>            }
>                else
>                {
>                    my $result = $rc->next_result();
>                    $factory->save_output($filename);
>                    $factory->remove_rid($rid);
>                    print "\nQuery Name: ", $result->query_name(),  
> "\n";
>                    while ( my $hit = $result->next_hit )
>                    {
>                        next unless ( $v > 0);
>                        print "\thit name is ", $hit->name, "\n";
>                        while( my $hsp = $hit->next_hsp )
>                        {
>                            print "\t\tscore is ", $hsp->score, "\n";
>                        }
>                    }
>                }
>        }
>
>
>    }
> @blast_report = get_file_data ($filename);
> return @blast_report;
>
>
> sub get_file_data
> {
>    use strict;
>    my($filename) = @_;
>    use strict;
>    use warnings;
>    # Initialize variables
>    my @filedata = ( );
>    unless( open(GET_FILE_DATA, $filename) )
>    {
>        print STDERR "Cannot open file \"$filename\"\n\n";
>        exit;
>    }
>    @filedata = <GET_FILE_DATA>;
>    close GET_FILE_DATA;
>    print @filedata;
>    return @filedata;
> }
>
> #######################################################################################################################################################################################
>
> ... and the blastp on the ncbi-homepage. The people from NCBI wrote  
> me that I have to change some parameters:
> ""
> You need to have the following:
>
>
> Max target sequences = 100
> Expect threshold = 10
> Gap Costs = Existence 11 Extension 1
> Compositional adjustments = Conditional compositional score matrix  
> adjustment""
>
> Could you please tell me exactly how to change these parameters  
> within my perl script? I think I have to use the "put" command, but  
> I just cannot find out how...
>
> Regards and thank you so much in advance :),
>
> Jonas Schaer


From biopython at maubp.freeserve.co.uk  Mon Jun 22 10:24:55 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 22 Jun 2009 15:24:55 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<BF526BE6-AFA8-4F66-9665-D6328F9B4FFA@illinois.edu>
	<320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com>
	<1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>
Message-ID: <320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com>

On Wed, Jun 17, 2009 at 6:06 PM, Chris Fields wrote:
> Peter wrote:
>> Other issues to keep in mind:
>>
>> (3) There should be no warning parsing files where the optional repeated
>> title is missing on the "+" lines (as discussed earlier on the BioPerl
>> list).
>
> Agreed, though we'll have to check the current fastq parser to see if that's
> currently the case. I thought that was fixed but maybe not?
>
>> (4) When writing FASTQ files should BioPerl omit the optional repeated
>> title on the "+" line? Biopython omits this as I understand this to be
>> common practice, and can make a big difference to file sizes - especially
>> on short read data from Solexa/Illumina.
>
> Agreed, particularly if it's commonly encountered.
>
>> (5) Also test reading and writing files with an optional description (as
>> well as an identifier) on the "@" (and "+") lines. See the NCBI SRA
>> for examples, e.g.
>>
>> @SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
>> GGGTGATGGCCGCTGCCGATGGCGTCAAATCCCACC
>> +SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
>> IIIIIIIIIIIIIIIIIIIIIIIIIIIIII9IG9IC
>
> Should be easy enough to implement with a simple regex.
>
>> (6) Test reading and writing files where the encoded quality string starts
>> with a "@" or a "+" character, e.g.
>> http://lists.open-bio.org/pipermail/bioperl-l/2009-May/029911.html
>>
>> Peter
>
> Mark, getting all that? ;>
>
> chris
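
Points (3), (5) and (6) above can be illustrated with a small pure-Perl reader that works on fixed four-line records instead of scanning for '@' or '+'. This is a sketch under stated assumptions: records are not line-wrapped, and the name next_fastq_record is hypothetical (it is not the Bio::SeqIO::fastq implementation). It returns plain data rather than Bio::Seq* objects.

```perl
use strict;
use warnings;

# Hypothetical helper: read one four-line FASTQ record from a filehandle.
# Assumes unwrapped records (one sequence line, one quality line).
sub next_fastq_record {
    my ($fh) = @_;
    my $title = <$fh>;
    return unless defined $title;          # clean end of file
    my $seq  = <$fh>;
    my $plus = <$fh>;
    my $qual = <$fh>;
    die "Truncated FASTQ record\n" unless defined $qual;
    chomp($title, $seq, $plus, $qual);

    # Identifier is the first word; the rest is an optional description.
    my ($id, $desc) = $title =~ /^\@(\S+)(?:\s+(.*))?$/
        or die "Bad title line: $title\n";
    # The repeated title after '+' is optional, so only check the '+'.
    die "Bad '+' line: $plus\n" unless $plus =~ /^\+/;
    die "Length mismatch for $id\n" unless length($seq) == length($qual);

    return {
        id   => $id,
        desc => defined $desc ? $desc : '',
        seq  => $seq,
        qual => $qual,
    };
}

# Made-up record whose quality string starts with '@'.
my $example = "\@read1 made-up description\nACGT\n+\n\@+II\n";
open(my $fh, '<', \$example) or die "Cannot open in-memory file: $!\n";
while (my $rec = next_fastq_record($fh)) {
    print "$rec->{id}\t$rec->{desc}\t$rec->{seq}\t$rec->{qual}\n";
}
close $fh;
```

Because the quality line is read by position, a leading '@' or '+' in it is never mistaken for the start of a new record.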

Another couple of points that I should have remembered earlier,
related to converting between PHRED scores and Solexa scores.
On the bright side, with Illumina abandoning the Solexa scores
in pipeline 1.3+, these issues will go away with time:

(7) If BioPerl will be converting Solexa scores to/from PHRED
scores as integers automatically (as discussed earlier), make
sure you round to the nearest whole number (don't just truncate
with a call to int!). MAQ does this by adding 0.5 before calling
int (while in Biopython I just use Python's round function).

(8) When asked to write out an old Solexa style FASTQ file,
what will you do if given a standard Sanger FASTQ file (or a
new Illumina 1.3+ FASTQ file) containing a base with PHRED
quality zero? This maps to a Solexa quality of minus infinity...
Right now the development version of Biopython will throw an
error in this situation, but mapping to the lowest observed
Solexa score might be reasonable.
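
To make points (7) and (8) concrete, here is a sketch in Perl using the usual relationship between the two scales, Q_phred = 10*log10(10^(Q_solexa/10) + 1) and its inverse. The function names are hypothetical. Note that int($x + 0.5) truncates toward zero and so misrounds negative Solexa scores; this sketch assumes floor($x + 0.5) from POSIX is acceptable instead.

```perl
use strict;
use warnings;
use POSIX qw(floor);

# Hypothetical helper: Solexa score -> nearest integer PHRED score.
sub solexa_to_phred {
    my ($qs) = @_;
    my $q = 10 * log(10 ** ($qs / 10) + 1) / log(10);
    return floor($q + 0.5);
}

# Hypothetical helper: PHRED score -> nearest integer Solexa score.
# PHRED 0 maps to minus infinity, so this sketch throws; clamping to
# the lowest observed Solexa score would be another option.
sub phred_to_solexa {
    my ($qp) = @_;
    die "PHRED $qp has no finite Solexa equivalent\n" if $qp <= 0;
    my $q = 10 * log(10 ** ($qp / 10) - 1) / log(10);
    return floor($q + 0.5);
}

print solexa_to_phred(-5), "\n";   # 1
print solexa_to_phred(0),  "\n";   # 3
print phred_to_solexa(10), "\n";   # 10
```

The two scales agree closely for high scores and diverge only at the low end, which is where the rounding choice matters.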

Peter


From cjfields at illinois.edu  Mon Jun 22 09:54:22 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 22 Jun 2009 08:54:22 -0500
Subject: [Bioperl-l] Anyone at YAPC?
In-Reply-To: <19007.33948.411442.197063@already.dhcp.gene.com>
References: <19007.33948.411442.197063@already.dhcp.gene.com>
Message-ID: <FE1E935F-C10A-471F-8E9A-27658F5216A2@illinois.edu>

I think some of the regular #bioperl folk are there (Jay Hannah, R.  
Buels, etc).  May be worth going on IRC to find everyone.

I'm giving serious thought to going next year if I can get enough work  
done towards a perl6 or Moose-based bioperl.

chris

On Jun 22, 2009, at 8:18 AM, George Hartzell wrote:

>
> I'm in Pittsburgh at YAPC 10 and would be happy to say hello to folks.
>
> g.


From vofford at rvc.ac.uk  Mon Jun 22 12:10:43 2009
From: vofford at rvc.ac.uk (Offord, Victoria)
Date: Mon, 22 Jun 2009 17:10:43 +0100
Subject: [Bioperl-l] Clustalw
Message-ID: <C79C4DD076C6834EB10EB0CA1F50635513109AF5@hhw2kex01.rvc.ac.uk>

Hi,

 
Can anyone help and tell me where I am going wrong please :)

I am getting this error from the following script:

 
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: ClustalW call (clustalw align  -infile=/tmp/8PVli9JWEa/L_pxrEtzD1
-output=gcg   -matrix=BLOSUM -ktuple=2
-outfile=/tmp/8PVli9JWEa/XtAremlqau 2>&1) failed to start: 0 | No such
file or directory
STACK: Error::throw
STACK: Bio::Root::Root::throw
/usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
STACK: Bio::Tools::Run::Alignment::Clustalw::_run
/usr/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm:756
STACK: Bio::Tools::Run::Alignment::Clustalw::align
/usr/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm:515
STACK: tester.pl:25
-----------------------------------------------------------

 
#--------------------------------------------SCRIPT---------------------
--------------------------#

#!/usr/bin/perl -w
use Bio::Tools::Run::Alignment::Clustalw;
$ENV{CLUSTALDIR} = '/var/local/clustalw-2.0.9';
use Bio::Seq;

my @params = ('ktuple' => 2, 'matrix' => 'BLOSUM');
my $factory = Bio::Tools::Run::Alignment::Clustalw->new(@params);

my $a = "NPFECDCSMEWMQRVNNLTARQHPKILDLPNVECIMPHARGTPIRPIISLKPKDFLCK";
my $b =
"NPFECDCSMEWLQRINNLTTRQHPHVVDLGNIECLMPHSRSAPLRPLASLSASDFVCKYESHCPP";

my $seq1 = Bio::Seq->new ( -seq  => $a,
                           -id   => 'real',
                           -desc => 'this is a real Seq');
my $seq2 = Bio::Seq->new ( -seq  => $b,
                           -id   => 'test',
                           -desc => 'this is a test Seq');

my @seq_array = ($seq1,$seq2);
my $seq_array_ref = \@seq_array;
my $aln = $factory->align($seq_array_ref);

 
From Kevin.M.Brown at asu.edu  Mon Jun 22 12:48:27 2009
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Mon, 22 Jun 2009 09:48:27 -0700
Subject: [Bioperl-l] Clustalw
In-Reply-To: <C79C4DD076C6834EB10EB0CA1F50635513109AF5@hhw2kex01.rvc.ac.uk>
References: <C79C4DD076C6834EB10EB0CA1F50635513109AF5@hhw2kex01.rvc.ac.uk>
Message-ID: <1A4207F8295607498283FE9E93B775B4060B9BAF@EX02.asurite.ad.asu.edu>

Do you have ClustalW installed and in your path? 
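
One way to check this from Perl is sketched below. It is only an illustration: the helper name find_executable is made up, and looking for both 'clustalw' and 'clustalw2' is an assumption (ClustalW 2.x builds often install the binary under the latter name).

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return the full path of the first executable
# named in @$names found under any directory in @$dirs, or nothing.
sub find_executable {
    my ($names, $dirs) = @_;
    for my $dir (@$dirs) {
        for my $name (@$names) {
            my $candidate = File::Spec->catfile($dir, $name);
            return $candidate if -f $candidate && -x _;
        }
    }
    return;
}

# Search CLUSTALDIR first (if set), then every directory in PATH.
my @dirs = grep { defined($_) && length($_) }
           ($ENV{CLUSTALDIR}, File::Spec->path);
my $found = find_executable([qw(clustalw clustalw2)], \@dirs);
print defined $found
    ? "Found: $found\n"
    : "No clustalw executable found in CLUSTALDIR or PATH\n";
```

If nothing is found, either the directory is wrong or the binary has a different name than the wrapper expects.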

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Offord, Victoria
> Sent: Monday, June 22, 2009 9:11 AM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Clustalw
> 
> Hi,
> 
>  
> 
> Can anyone help and tell me where I am going wrong please :)
> 
> I am getting this error from the following script:
> 
>  
> 
>  
> 
> ------------- EXCEPTION: Bio::Root::Exception -------------
> 
> MSG: ClustalW call (clustalw align  -infile=/tmp/8PVli9JWEa/L_pxrEtzD1
> -output=gcg   -matrix=BLOSUM -ktuple=2
> -outfile=/tmp/8PVli9JWEa/XtAremlqau 2>&1) failed to start: 0 | No such
> file or directory
> 
> STACK: Error::throw
> 
> STACK: Bio::Root::Root::throw
> /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
> 
> STACK: Bio::Tools::Run::Alignment::Clustalw::_run
> /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm:756
> 
> STACK: Bio::Tools::Run::Alignment::Clustalw::align
> /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm:515
> 
> STACK: tester.pl:25
> 
> -----------------------------------------------------------
> 
>  
> 
>  
> 
>  
> 
>  
> 
> #--------------------------------------------SCRIPT-----------
> ----------
> --------------------------#
> 
> #!/usr/bin/perl -w
> 
> use Bio::Tools::Run::Alignment::Clustalw;
> 
> $ENV{CLUSTALDIR} = '/var/local/clustalw-2.0.9';
> 
> use Bio::Seq;
> 
>  
> 
>  my @params = ('ktuple' => 2, 'matrix' => 'BLOSUM');
> 
>  my $factory = Bio::Tools::Run::Alignment::Clustalw->new(@params);
> 
>  
> 
> my $a = "NPFECDCSMEWMQRVNNLTARQHPKILDLPNVECIMPHARGTPIRPIISLKPKDFLCK";
> 
> my $b =
> "NPFECDCSMEWLQRINNLTTRQHPHVVDLGNIECLMPHSRSAPLRPLASLSASDFVCKYESHCPP";
> 
> my $seq1 = Bio::Seq->new ( -seq  => $a,
> 
>                            -id   => 'real',
> 
>                            -desc => 'this is a real Seq');
> 
>  my $seq2 = Bio::Seq->new ( -seq  => $b,
> 
>                            -id   => 'test',
> 
>                            -desc => 'this is a test Seq');
> 
> 
>                            
> 
> my @seq_array = ($seq1,$seq2);
> 
>  
> 
> my $seq_array_ref = \@seq_array;
> 
> my $aln = $factory->align($seq_array_ref);


From cjfields at illinois.edu  Mon Jun 22 15:20:14 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 22 Jun 2009 14:20:14 -0500
Subject: [Bioperl-l] bioperl-dev or branch? : redux
In-Reply-To: <6DF025D32D664F61BC64B49184A2E6DD@NewLife>
References: <991fb8210905210826v2a7990c0u90fcb3256f54b7d7@mail.gmail.com>
	<D719EC86-CB16-4875-833B-2818C26030C8@duke.edu>
	<6DF025D32D664F61BC64B49184A2E6DD@NewLife>
Message-ID: <4766E259-B184-4552-817E-FBBB3A71A17F@illinois.edu>

On Jun 17, 2009, at 11:47 AM, Mark A. Jensen wrote:

> Hi All,
> I thought I'd revisit this thread, since in the last couple weeks,
> I have used both techniques (bioperl-dev and branch from trunk) to
> produce completed projects. My thoughts:
>
> Using bioperl-dev was very nice for creating Bio::Search::Tiling, a
> new addition to the core api. There was no pressure to conform to the
> existing api there. In particular, there was no implicit insistence to
> make things work through Bio::Search::Utils, and I was free to factor
> it out. The Tiling api was definitely unstable until the end, when it
> was ported to the core. As I made regular reports to bioperl-l,
> everything was transparent and up front, and I received excellent
> suggestions there (as usual).
> For Bio::Restriction, using the branch was just as natural. Here, the
> existing structure was well established, and all the work needed to
> happen beneath the api. All old t/Restriction tests needed to pass,
> and additional ones created for the new functionality. So here, using
> bioperl-dev wasn't natural, even though some "experiments" needed to
> be tried (some succeeded and some failed, as you can see in the
> commentary at Bug #2855). Even though the new code turned out to
> require substantial effort, the effort was required to fix a true bug
> in the working core, and any fixes needed to work transparently with
> respect to the users for whom this bug had not been an issue. Using
> the branch made it relatively easy to merge quickly back into the core
> when done, and there is a certain psychological pressure too provided
> by an open branch which is helpful.
>
> Hilmar raised the very good point in the previous discussion that
> (essentially) bioperl-dev shouldn't become a sandbox with lots of
> unfinished code scraps and derelict stuff that doesn't work. My view
> is bioperl-dev will become a sandbox only if we treat it like
> one. I've filled out the Bioperl-dev page on the wiki
> (http://www.bioperl.org/wiki/Bioperl-dev) with this in mind. Providing
> some recognition to devs there whose modules become part of the
> core may be a better way to ensure that projects that are started on
> bioperl-dev actually get finished, than to prescribe beforehand what
> kinds of projects may get started. I believe this follows the adage of
> liberality on what is accepted, and strictness on what is emitted.
>
> cheers, MAJ

The main reason I wanted a bioperl-dev is for some code or  
implementations that don't seem to fit on a branch or directly into  
core, but would definitely be of use.  The tendency in the past has  
been to accept anything that works into core (the 'bazaar' approach).   
Initially that worked well, but the long-term end result has become  
potentially unmaintainable code bloat.  Committing new code to a  
branch isn't a great idea either, primarily b/c the code may be lost  
to the branch if it isn't followed up and remerged into trunk.  And  
forcing the code to fit into bioperl (or vice versa, which happened  
re: Feature Annotation) isn't the best way either.

Like Hilmar, though, I don't want dev to become a (sandbox|code  
dumping ground) either, so I think some additional discussion is  
warranted if anyone else wants to chime in.

chris


From mauricio at open-bio.org  Mon Jun 22 15:56:33 2009
From: mauricio at open-bio.org (Mauricio Herrera Cuadra)
Date: Mon, 22 Jun 2009 14:56:33 -0500
Subject: [Bioperl-l] 1.6 on doc.bioperl.org?
In-Reply-To: <A53006055C854297AAA58F6650F4F867@NewLife>
References: <17AD00895AFD43E1A1436D1065092BAC@NewLife>
	<4A3134EB.4080702@open-bio.org>
	<A53006055C854297AAA58F6650F4F867@NewLife>
Message-ID: <4A3FE1F1.40607@open-bio.org>

Sorry for the delay. Pdoc site has been updated with docs for 1.6.0 
release and latest code from bioperl-live. Also added bioperl-dev and 
bioperl-pise to the list.

Cheers,
Mauricio.


Mark A. Jensen wrote:
> cheers Mauricio! MAJ
> ----- Original Message ----- From: "Mauricio Herrera Cuadra" 
> <mauricio at open-bio.org>
> To: "Mark A. Jensen" <maj at fortinbras.us>
> Cc: "Chris Fields" <cjfields at illinois.edu>; "BioPerl List" 
> <bioperl-l at bioperl.org>
> Sent: Thursday, June 11, 2009 12:46 PM
> Subject: Re: [Bioperl-l] 1.6 on doc.bioperl.org?
> 
> 
>> Hi Mark,
>>
>> I'll take a look into this sometime between today and tomorrow. Will 
>> keep you posted. Thanks for the heads up :)
>>
>> Mauricio.
>>
>>
>> Mark A. Jensen wrote:
>>> Hi Chris and list-
>>> Will documentation for release 1.6 be available in pdoc on 
>>> doc.bioperl.org?
>>> I notice also that autogenerated documentation for bioperl-live 
>>> doesn't contain
>>> new modules (or HIVQuery & Tiling, anyway ;) )--
>>> cheers, Mark
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>
>>
> 
> 


From cjfields at illinois.edu  Mon Jun 22 16:29:46 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 22 Jun 2009 15:29:46 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<BF526BE6-AFA8-4F66-9665-D6328F9B4FFA@illinois.edu>
	<320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com>
	<1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>
	<320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com>
Message-ID: <CECB8B74-C423-4792-8857-0F3E40EAECE3@illinois.edu>

On Jun 22, 2009, at 9:24 AM, Peter wrote:

> On Wed, Jun 17, 2009 at 6:06 PM, Chris Fields wrote:
>> Peter wrote:
>>> Other issues to keep in mind:
>>>
>>> (3) There should be no warning parsing files where the optional  
>>> repeated
>>> title is missing on the "+" lines (as discussed earlier on the  
>>> BioPerl
>>> list).
>>
>> Agreed, though we'll have to check the current fastq parser to see  
>> if that's
>> currently the case.  I thought that was fixed but maybe not?
>>
>>> (4) When writing FASTQ files should BioPerl omit the optional  
>>> repeated
>>> title on the "+" line? Biopython omits this as I understand this  
>>> to be
>>> common practice, and can make a big difference to file sizes -  
>>> especially
>>> on short read data from Solexa/Illumina.
>>
>> Agreed, particularly if it's commonly encountered.
>>
>>> (5) Also test reading and writing files with an optional  
>>> description (as
>>> well as an identifier) on the "@" (and "+") lines. See the NCBI SRA
>>> for examples, e.g.
>>>
>>> @SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
>>> GGGTGATGGCCGCTGCCGATGGCGTCAAATCCCACC
>>> +SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
>>> IIIIIIIIIIIIIIIIIIIIIIIIIIIIII9IG9IC
>>
>> Should be easy enough to implement with a simple regex.
>>
>>> (6) Test reading and writing files where the encoded quality  
>>> string starts
>>> with a "@" or a "+" character, e.g.
>>> http://lists.open-bio.org/pipermail/bioperl-l/2009-May/029911.html
>>>
>>> Peter
>>
>> Mark, getting all that? ;>
>>
>> chris
>
> Another couple of points that I should have remembered earlier,
> related to converting between PHRED scores and Solexa scores.
> On the bright side, with Illumina abandoning the Solexa scores
> in pipeline 1.3+, these issues will go away with time:
>
> (7) If BioPerl will be converting Solexa scores to/from PHRED
> scores as integers automatically (as discussed earlier), make
> sure you round to the nearest whole number (don't just truncate
> with a call to int!). MAQ does this by adding 0.5 before calling
> int (while in Biopython I just use Python's round function).

That can probably be done with sprintf if needed.  It avoids a call to  
POSIX functions.

> (8) When asked to write out an old Solexa style FASTQ file,
> what will you do if given a standard Sanger FASTQ file (or a
> new Illumina 1.3+ FASTQ file) containing a base with PHRED
> quality zero? This maps to a Solexa quality of minus infinity...
> Right now the development version of Biopython will throw an
> error in this situation, but mapping to the lowest observed
> Solexa score might be reasonable.
>
> Peter

Maybe address with a warning followed by assigning to the lowest  
solexa score?

chris
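
A minimal sketch of points (7) and (8), in Python rather than BioPerl code. The function names and the -5 floor for Solexa scores are assumptions made for illustration, not anything BioPerl or Biopython is confirmed to use. It rounds to the nearest whole number instead of truncating, and warns before clamping a PHRED score of zero.

```python
import math
import warnings

# Assumed floor for old-style Solexa scores; treat it as a placeholder.
SOLEXA_MIN = -5


def round_half_up(x):
    """Round to the nearest whole number (add 0.5, then floor).

    floor() is used rather than int() so that negative values also round
    correctly; int() truncates towards zero.
    """
    return int(math.floor(x + 0.5))


def solexa_to_phred(q_solexa):
    """Convert a Solexa score to the nearest integer PHRED score."""
    return round_half_up(10 * math.log10(10 ** (q_solexa / 10.0) + 1))


def phred_to_solexa(q_phred):
    """Convert a PHRED score to the nearest integer Solexa score."""
    if q_phred <= 0:
        # PHRED 0 has no finite Solexa equivalent: warn, then clamp.
        warnings.warn("PHRED 0 has no Solexa equivalent; using %d" % SOLEXA_MIN)
        return SOLEXA_MIN
    q_solexa = 10 * math.log10(10 ** (q_phred / 10.0) - 1)
    return max(SOLEXA_MIN, round_half_up(q_solexa))
```

With plain truncation, solexa_to_phred(-1) would come out as 2 rather than 3, which is the kind of off-by-one that point (7) warns about.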


From cjfields at illinois.edu  Mon Jun 22 16:27:32 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 22 Jun 2009 15:27:32 -0500
Subject: [Bioperl-l] 1.6 on doc.bioperl.org?
In-Reply-To: <4A3FE1F1.40607@open-bio.org>
References: <17AD00895AFD43E1A1436D1065092BAC@NewLife>
	<4A3134EB.4080702@open-bio.org>
	<A53006055C854297AAA58F6650F4F867@NewLife>
	<4A3FE1F1.40607@open-bio.org>
Message-ID: <D9414186-E1DD-47B5-A0CF-9B96CD8151F8@illinois.edu>

np.  Thanks Mauricio!

chris

On Jun 22, 2009, at 2:56 PM, Mauricio Herrera Cuadra wrote:

> Sorry for the delay. Pdoc site has been updated with docs for 1.6.0  
> release and latest code from bioperl-live. Also added bioperl-dev  
> and bioperl-pise to the list.
>
> Cheers,
> Mauricio.
>
>
> Mark A. Jensen wrote:
>> cheers Mauricio! MAJ
>> ----- Original Message ----- From: "Mauricio Herrera Cuadra" <mauricio at open-bio.org>
>> To: "Mark A. Jensen" <maj at fortinbras.us>
>> Cc: "Chris Fields" <cjfields at illinois.edu>; "BioPerl List" <bioperl-l at bioperl.org>
>> Sent: Thursday, June 11, 2009 12:46 PM
>> Subject: Re: [Bioperl-l] 1.6 on doc.bioperl.org?
>>> Hi Mark,
>>>
>>> I'll take a look into this sometime between today and tomorrow.  
>>> Will keep you posted. Thanks for the heads up :)
>>>
>>> Mauricio.
>>>
>>>
>>> Mark A. Jensen wrote:
>>>> Hi Chris and list-
>>>> Will documentation for release 1.6 be available in pdoc on  
>>>> doc.bioperl.org?
>>>> I notice also that autogenerated documentation for bioperl-live  
>>>> doesn't contain
>>>> new modules (or HIVQuery & Tiling, anyway ;) )--
>>>> cheers, Mark


From maj at fortinbras.us  Mon Jun 22 22:46:58 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 22 Jun 2009 22:46:58 -0400
Subject: [Bioperl-l] announcing bioperl-max, a public AMI
In-Reply-To: <3F9CAC33-EE29-4843-8E21-532DC7569A7D@illinois.edu>
References: <57C9F3FC7A4A42189608BD882C03DB9C@NewLife>
	<3F9CAC33-EE29-4843-8E21-532DC7569A7D@illinois.edu>
Message-ID: <78130116A84C4D989F3BCC217E8C5ACE@NewLife>

Done-- fortinbras-public/bioperl-max-0.1.1 is at ami-b55dbbdc; rakudo cloned at
00:44 UTC, parrot @ r39729, bioperl-live @ 15800, nexml @ r1136.
cheers!
MAJ
----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "Mark A. Jensen" <maj at fortinbras.us>
Cc: "BioPerl List" <bioperl-l at bioperl.org>
Sent: Wednesday, June 10, 2009 12:36 AM
Subject: Re: [Bioperl-l] announcing bioperl-max, a public AMI


> I'll be trying that out, particularly re: bioperl-run. For bioperl-db  do you 
> have mysql or pg?
>
> Heh, I see Moose is installed.  Just need svn'd parrot and git updated  rakudo 
> and we could do some damage...
>
> chris
>
> On Jun 9, 2009, at 11:10 PM, Mark A. Jensen wrote:
>
>> Hi All,
>>
>> I've built a public Amazon machine image, loaded with many many
>> goodies, including the most recent (r15747) trunks of
>> - bioperl-live
>> - bioperl-run
>> - bioperl-db/biosql
>> The base machine is a public clean install of Ubuntu 8.04 Hardy/32-bit
>> by Alestic. Many fave tools, including blast, hmmer, hyphy, phyml,
>> emboss, and more are all there (and most even pass bioperl-run  tests), and
>> perl is loaded with Moose, XML::LibXML, XML::Compile, Bio::Phylo
>> (r1071) and others. This is *not* a lean mean fighting machine.
>>
>> Please give it a try if you're so inclined. Fuller details (including
>> image id and ssh-rsa signature) are at http://fortinbras.us/bioperl-max .
>>
>> Ping me if it doesn't work.
>>
>> Cheers,
>> Mark
>>


From maj at fortinbras.us  Mon Jun 22 23:22:48 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 22 Jun 2009 23:22:48 -0400
Subject: [Bioperl-l] 1.6 on doc.bioperl.org?
In-Reply-To: <4A3FE1F1.40607@open-bio.org>
References: <17AD00895AFD43E1A1436D1065092BAC@NewLife><4A3134EB.4080702@open-bio.org><A53006055C854297AAA58F6650F4F867@NewLife>
	<4A3FE1F1.40607@open-bio.org>
Message-ID: <8B93DCE168434F608620AF17CAF12A9F@NewLife>

awesome, MHC- cheers and thanks-MAJ
----- Original Message ----- 
From: "Mauricio Herrera Cuadra" <mauricio at open-bio.org>
To: "Mark A. Jensen" <maj at fortinbras.us>
Cc: "Chris Fields" <cjfields at illinois.edu>; "BioPerl List" 
<bioperl-l at bioperl.org>
Sent: Monday, June 22, 2009 3:56 PM
Subject: Re: [Bioperl-l] 1.6 on doc.bioperl.org?


> Sorry for the delay. Pdoc site has been updated with docs for 1.6.0 release 
> and latest code from bioperl-live. Also added bioperl-dev and bioperl-pise to 
> the list.
>
> Cheers,
> Mauricio.
>
>
> Mark A. Jensen wrote:
>> cheers Mauricio! MAJ
>> ----- Original Message ----- From: "Mauricio Herrera Cuadra" 
>> <mauricio at open-bio.org>
>> To: "Mark A. Jensen" <maj at fortinbras.us>
>> Cc: "Chris Fields" <cjfields at illinois.edu>; "BioPerl List" 
>> <bioperl-l at bioperl.org>
>> Sent: Thursday, June 11, 2009 12:46 PM
>> Subject: Re: [Bioperl-l] 1.6 on doc.bioperl.org?
>>
>>
>>> Hi Mark,
>>>
>>> I'll take a look into this sometime between today and tomorrow. Will keep 
>>> you posted. Thanks for the heads up :)
>>>
>>> Mauricio.
>>>
>>>
>>> Mark A. Jensen wrote:
>>>> Hi Chris and list-
>>>> Will documentation for release 1.6 be available in pdoc on doc.bioperl.org?
>>>> I notice also that autogenerated documentation for bioperl-live doesn't 
>>>> contain
>>>> new modules (or HIVQuery & Tiling, anyway ;) )--
>>>> cheers, Mark


From pmr at ebi.ac.uk  Tue Jun 23 07:00:38 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Tue, 23 Jun 2009 12:00:38 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <CECB8B74-C423-4792-8857-0F3E40EAECE3@illinois.edu>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>	<BF526BE6-AFA8-4F66-9665-D6328F9B4FFA@illinois.edu>	<320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com>	<1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>	<320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com>
	<CECB8B74-C423-4792-8857-0F3E40EAECE3@illinois.edu>
Message-ID: <4A40B5D6.40504@ebi.ac.uk>

We just added FASTQ parsing to EMBOSS and faced the same issues.

Parsing was easy - find the '@' line, read sequence until the '+' line
is reached, then read (seqlen) quality characters ... and check the next
line starts with '@'

Quality scores are kept as phred values. Phred of 0 means unknown, which
in Solexa is -5 (0.75 error rate = could be anything). We assume lower
quality scores are from alignments rather than single reads.

We gave up on trying to guess the quality score standard and require
users to say whether they are sanger, solexa (1.0) or Illumina (1.3)
format files. If we only want the sequence then we don't care so we allow
"fastq" as a sequence format and ignore the quality scores in that case.

We also allow the integer quality score format ... is anyone still using
that (it looks horrible to me :-)

Code is in the EMBOSS CVS, and will appear in release 6.1.0 on July 15th.

Any further tips would be very useful.

regards,

Peter Rice
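
A rough sketch of the parsing approach described above, in Python and not the EMBOSS code. The function name parse_fastq and its error messages are made up for illustration. The point it shows is that the quality string is read by length (seqlen), so a quality line that happens to start with '@' or '+' is not mistaken for a record marker.

```python
def parse_fastq(lines):
    """Yield (title, sequence, quality) tuples from an iterable of lines."""
    it = iter(lines)
    line = next(it, None)
    while line is not None:
        line = line.rstrip("\r\n")
        if not line:  # tolerate blank lines between records
            line = next(it, None)
            continue
        if not line.startswith("@"):
            raise ValueError("expected '@' title line, got %r" % line)
        title = line[1:]

        # The sequence may span several lines; it ends at the '+' line.
        seq_parts = []
        plus_line = None
        for line in it:
            line = line.rstrip("\r\n")
            if line.startswith("+"):
                plus_line = line
                break
            seq_parts.append(line)
        if plus_line is None:
            raise ValueError("no '+' line for record %r" % title)
        # The repeated title on the '+' line is optional.
        if plus_line[1:] and plus_line[1:] != title:
            raise ValueError("'+' title does not match '@' title")
        seq = "".join(seq_parts)

        # Read quality characters until there are as many as bases.
        qual = ""
        while len(qual) < len(seq):
            line = next(it, None)
            if line is None:
                raise ValueError("truncated quality for record %r" % title)
            qual += line.rstrip("\r\n")
        if len(qual) != len(seq):
            raise ValueError("quality length mismatch in record %r" % title)

        yield title, seq, qual
        line = next(it, None)  # next record must start with '@'
```

It can be driven from any iterable of lines, for example list(parse_fastq(open(path))) where path is whatever FASTQ file is at hand.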


From biopython at maubp.freeserve.co.uk  Tue Jun 23 07:29:56 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 23 Jun 2009 12:29:56 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <4A40B5D6.40504@ebi.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<BF526BE6-AFA8-4F66-9665-D6328F9B4FFA@illinois.edu>
	<320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com>
	<1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>
	<320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com>
	<CECB8B74-C423-4792-8857-0F3E40EAECE3@illinois.edu>
	<4A40B5D6.40504@ebi.ac.uk>
Message-ID: <320fb6e00906230429l77a2049eg35e47f94026111c6@mail.gmail.com>

On Tue, Jun 23, 2009 at 12:00 PM, Peter Rice<pmr at ebi.ac.uk> wrote:
>
> We just added FASTQ parsing to EMBOSS and faced the same issues.
>

I was going to chat to you about this at BOSC, and suggest this be
added to EMBOSS - but you are well ahead of me ;)

> Parsing was easy - find the '@' line, read sequence until the '+' line
> is reached, then read (seqlen) quality characters ... and check the next
> line starts with '@'

That is basically what I did for Biopython.

> Quality scores are kept as phred values. Phred of 0 means unknown,
> which in Solexa is -5 (0.75 error rate = could be anything).

A Phred quality of 0 means probability of error is 1, so yes, unknown. I don't
quite follow your leap that this corresponds to a Solexa quality of -5. Could
you clarify?

> We assume lower quality scores are from alignments rather than single reads.

Did you mean to say "higher quality scores" (i.e. lower probability of error),
e.g. a PHRED score of 80, which you can get from MAQ doing read mapping
or something consensus based?

> We gave up on trying to guess the quality score standard and require
> users to say whether they are sanger, solexa (1.0) or Illumina (1.3)
> format files. If we only want the sequence then we don't care so we allow
> "fastq" as a sequence format and ignore the quality scores in that case.

What format names have you used? Ideally we'd have the same names
in EMBOSS, BioPerl and Biopython (i.e. "fastq", "fastq-solexa", and
"fastq-illumina").

> We also allow the integer quality score format ... is anyone still using
> that (it looks horrible to me :-)

Do you mean the QUAL file format holding PHRED scores? Roche provide
tools to turn their SFF files into FASTA and QUAL files, so they are still used.

> Code is in the EMBOSS CVS, and will appear in release 6.1.0 on July 15th.
>
> Any further tips would be very useful.

Great. See you at BOSC 2009!

Peter
(Biopython)


From pmr at ebi.ac.uk  Tue Jun 23 08:22:33 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Tue, 23 Jun 2009 13:22:33 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <320fb6e00906230429l77a2049eg35e47f94026111c6@mail.gmail.com>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>	
	<BF526BE6-AFA8-4F66-9665-D6328F9B4FFA@illinois.edu>	
	<320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com>	
	<1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>	
	<320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com>	
	<CECB8B74-C423-4792-8857-0F3E40EAECE3@illinois.edu>	
	<4A40B5D6.40504@ebi.ac.uk>
	<320fb6e00906230429l77a2049eg35e47f94026111c6@mail.gmail.com>
Message-ID: <4A40C909.40803@ebi.ac.uk>

Peter wrote:
> On Tue, Jun 23, 2009 at 12:00 PM, Peter Rice<pmr at ebi.ac.uk> wrote:
>> We just added FASTQ parsing to EMBOSS and faced the same issues.
>>
> 
> I was going to chat to you about this at BOSC, and suggest this be
> added to EMBOSS - but you are well ahead of me ;)

Not that well ahead really ... someone asked for it in our BoF at
BOSC/ISMB last year so we thought we'd better get it done before this
one. It was implemented a couple of days ago :-)

>> Parsing was easy - find the '@' line, read sequence until the '+' line
>> is reached, then read (seqlen) quality characters ... and check the next
>> line starts with '@'
> 
> That is basically what I did for Biopython.
> 
>> Quality scores are kept as phred values. Phred of 0 means unknown,
>> which in Solexa is -5 (0.75 error rate = could be anything).
> 
> A Phred quality of 0 means probability of error is 1, so yes, unknown. I don't
> quite follow your leap that this corresponds to a Solexa quality of -5. Could
> you clarify?

Phred score is -10 log10(p) where p is the probability of error. A phred
of 0 implies 1.0 (certainty) of error, but 0.75 is a better estimate
(3/4 chance that any base you pick is wrong).

Solexa scores are -10 log10(p/(1-p)) so p=0.75 comes out at about -4.8,
which rounds to -5. This is why Solexa scores can go down to -5 in their
fastq format.

>> We assume lower quality scores are from alignments rather than single reads.
> 
> Did you mean to say "higher quality scores" (i.e. lower probability of error),
> e.g. a PHRED score of 80 which you can get from MAQ doing read mapping
> or something consensus based.

Actually I mean both. Error probabilities above 0.75 for a single base
are silly, and error probabilities below 0.0001 make sense only when two
or more high quality bases are aligned.

>> We gave up on trying to guess the quality score standard and require
>> users to say whether they are sanger, solexa (1.0) or Illumina (1.3)
>> format files. If we only want the sequence then we don't care so we allow
>> "fastq" as a sequence format and ignore the quality scores in that case.
> 
> What format names have you used? Ideally we'd have the same names
> in EMBOSS, BioPerl and Biopython (i.e. "fastq", "fastq-solexa", and
> "fastq-illumina").

We don't normally use '-' in our format names so we have fastqsanger,
fastqsolexa, fastqillumina and fastqint. None of these have been tried
on users as yet.

The '-' names look nice though. We can consider introducing them. Do you
have a full list of format names (sequence, feature, alignment, etc.) we
can try to conform to?

>> We also allow the integer quality score format ... is anyone still using
>> that (it looks horrible to me :-)
> 
> Do you mean the QUAL file format holding PHRED scores? Roche provide
> tools to turn their SFF files into FASTA and QUAL files, so they are still used.

Probably ... unless there is a Solexa version too.

regards,

Peter
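
For anyone who wants to check the arithmetic, here is a small Python sketch of the two definitions quoted above, taking the logs as base 10. The function names are just for illustration.

```python
import math


def phred_quality(p):
    """PHRED score for error probability p: -10 * log10(p)."""
    return -10 * math.log10(p)


def solexa_quality(p):
    """Solexa score for error probability p: -10 * log10(p / (1 - p))."""
    return -10 * math.log10(p / (1 - p))


# A random base call is wrong 3 times out of 4.
p_random = 0.75
print(round(phred_quality(p_random), 2))   # 1.25
print(round(solexa_quality(p_random), 2))  # -4.77
```

So p=0.75 gives a Solexa score of about -4.77, which rounds to -5, while the PHRED score for the same probability is about 1.25 and never goes below 0. At p=1 the Solexa formula divides by zero, which is the "minus infinity" case.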


From rmb32 at cornell.edu  Tue Jun 23 10:28:08 2009
From: rmb32 at cornell.edu (Robert Buels)
Date: Tue, 23 Jun 2009 07:28:08 -0700
Subject: [Bioperl-l] Anyone at YAPC?
In-Reply-To: <FE1E935F-C10A-471F-8E9A-27658F5216A2@illinois.edu>
References: <19007.33948.411442.197063@already.dhcp.gene.com>
	<FE1E935F-C10A-471F-8E9A-27658F5216A2@illinois.edu>
Message-ID: <4A40E678.8010709@cornell.edu>

Yep, YAPC is great!  This is my first one.  I saw a guy walking around 
here with a nametag that I thought said "Mark Jensen".  MAJ, are you here?

Rob

Chris Fields wrote:
> I think some of the regular #bioperl folk are there (Jay Hannah, R. 
> Buels, etc).  May be worth going on IRC to find everyone.
> 
> I'm giving serious thought to going next year if I can get enough work 
> done towards a perl6 or Moose-based bioperl.
> 
> chris
> 
> On Jun 22, 2009, at 8:18 AM, George Hartzell wrote:
> 
>>
>> I'm in Pittsburgh at YAPC 10 and would be happy to say hello to folks.
>>
>> g.


-- 
Robert Buels
Bioinformatics Analyst, Sol Genomics Network
Boyce Thompson Institute for Plant Research
Tower Rd
Ithaca, NY  14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu


From maj at fortinbras.us  Tue Jun 23 11:54:24 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 23 Jun 2009 11:54:24 -0400
Subject: [Bioperl-l] Anyone at YAPC?
In-Reply-To: <4A40E678.8010709@cornell.edu>
References: <19007.33948.411442.197063@already.dhcp.gene.com><FE1E935F-C10A-471F-8E9A-27658F5216A2@illinois.edu>
	<4A40E678.8010709@cornell.edu>
Message-ID: <DD5C6FE6AC5842CEAA4487EEC65AC726@NewLife>

I think there are about 75000 of us; that one ain't me, I'm afraid. Maybe next 
year! cheers  MAJ
----- Original Message ----- 
From: "Robert Buels" <rmb32 at cornell.edu>
To: "bioperl-l List" <bioperl-l at bioperl.org>
Sent: Tuesday, June 23, 2009 10:28 AM
Subject: Re: [Bioperl-l] Anyone at YAPC?


> Yep, YAPC is great!  This is my first one.  I saw a guy walking around here 
> with a nametag that I thought said "Mark Jensen".  MAJ, are you here?
>
> Rob
>
> Chris Fields wrote:
>> I think some of the regular #bioperl folk are there (Jay Hannah, R. Buels, 
>> etc).  May be worth going on IRC to find everyone.
>>
>> I'm giving serious thought to going next year if I can get enough work done 
>> towards a perl6 or Moose-based bioperl.
>>
>> chris
>>
>> On Jun 22, 2009, at 8:18 AM, George Hartzell wrote:
>>
>>>
>>> I'm in Pittsburgh at YAPC 10 and would be happy to say hello to folks.
>>>
>>> g.
>
>
> -- 
> Robert Buels
> Bioinformatics Analyst, Sol Genomics Network
> Boyce Thompson Institute for Plant Research
> Tower Rd
> Ithaca, NY  14853
> Tel: 503-889-8539
> rmb32 at cornell.edu
> http://www.sgn.cornell.edu


From cjfields at illinois.edu  Tue Jun 23 16:34:48 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 23 Jun 2009 15:34:48 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <4A40C909.40803@ebi.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>	
	<BF526BE6-AFA8-4F66-9665-D6328F9B4FFA@illinois.edu>	
	<320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com>	
	<1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>	
	<320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com>	
	<CECB8B74-C423-4792-8857-0F3E40EAECE3@illinois.edu>	
	<4A40B5D6.40504@ebi.ac.uk>
	<320fb6e00906230429l77a2049eg35e47f94026111c6@mail.gmail.com>
	<4A40C909.40803@ebi.ac.uk>
Message-ID: <21116F70-93A3-4539-9BE2-61C838BA730E@illinois.edu>


On Jun 23, 2009, at 7:22 AM, Peter Rice wrote:

> Peter wrote:
> ...
>>> Parsing was easy - find the '@' line, read sequence until the '+'  
>>> line
>>> is reached, then read (seqlen) quality characters ... and check  
>>> the next
>>> line starts with '@'
>>
>> That is basically what I did for Biopython.

This is now what bioperl will do (at least when I commit changes today  
or tomorrow).

> ...
>>> We gave up on trying to guess the quality score standard and require
>>> users to say whether they are sanger, solexa (1.0) or Illumina (1.3)
>>> format files. If we only want the sequence then we don't care so  
>>> we allow
>>> "fastq" as a sequence format and ignore the quality scores in that  
>>> case.
>>
>> What format names have you used? Ideally we'd have the same names
>> in EMBOSS, BioPerl and Biopython (i.e. "fastq", "fastq-solexa", and
>> "fastq-illumina").
>
> We don't normally use '-' in our format names so we have fastqsanger,
> fastqsolexa, fastqillumina and fastqint. None of these have been tried
> on users as yet.
>
> The '-' names look nice though. We can consider introducing them. Do  
> you
> have a full list of format names (sequence, feature, alignment,  
> etc.) we
> can try to conform to?

We (bioperl) are using biopython's convention of format-variant, or at  
least that's how I'm coding it up.  With SeqIO it's fairly easy to  
check for the format variant prior to loading the class and pass it in  
as a second named parameter.

I have actually thought of adding in fastqint as an option (it would  
be fairly easy to do).

chris
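
To make the format-variant idea concrete, a hypothetical sketch in Python (this is not the SeqIO code, and the names split_format and KNOWN_VARIANTS are invented here). It splits a name such as "fastq-solexa" into the format and its variant before any format class would be loaded. Treating plain "fastq" as the Sanger variant is an assumption.

```python
# Variants recognised per format; illustrative only.
KNOWN_VARIANTS = {"fastq": {"sanger", "solexa", "illumina"}}


def split_format(name, default_variant="sanger"):
    """Split 'format-variant' into (format, variant).

    The default variant for a bare format name is an assumption;
    formats without known variants return None as the variant.
    """
    fmt, _, variant = name.lower().partition("-")
    if fmt not in KNOWN_VARIANTS:
        return fmt, (variant or None)
    if not variant:
        return fmt, default_variant
    if variant not in KNOWN_VARIANTS[fmt]:
        raise ValueError("unknown %s variant: %r" % (fmt, variant))
    return fmt, variant
```

The caller would then load the class for the format and hand the variant over as a separate parameter, along the lines described above.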


From cjfields at illinois.edu  Tue Jun 23 17:04:25 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 23 Jun 2009 16:04:25 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <320fb6e00906230429l77a2049eg35e47f94026111c6@mail.gmail.com>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<BF526BE6-AFA8-4F66-9665-D6328F9B4FFA@illinois.edu>
	<320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com>
	<1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>
	<320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com>
	<CECB8B74-C423-4792-8857-0F3E40EAECE3@illinois.edu>
	<4A40B5D6.40504@ebi.ac.uk>
	<320fb6e00906230429l77a2049eg35e47f94026111c6@mail.gmail.com>
Message-ID: <49A4AD93-69FB-406E-8FFB-99C74A457402@illinois.edu>

Just so we're on the same page data-wise, would there be a common set  
of fastq data files to use for tests?  I am using some from SRA (which  
is all converted to Sanger).  Just need a few small ones for older  
solexa and newer illumina.

chris

On Jun 23, 2009, at 6:29 AM, Peter wrote:

> On Tue, Jun 23, 2009 at 12:00 PM, Peter Rice<pmr at ebi.ac.uk> wrote:
>
>> Code is in the EMBOSS CVS, and will appear in release 6.1.0 on July  
>> 15th.
>>
>> Any further tips would be very useful.
>
> Great. See you at BOSC 2009!
>
> Peter
> (Biopython)


From biopython at maubp.freeserve.co.uk  Tue Jun 23 17:39:48 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 23 Jun 2009 22:39:48 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <4A40C909.40803@ebi.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<BF526BE6-AFA8-4F66-9665-D6328F9B4FFA@illinois.edu>
	<320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com>
	<1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>
	<320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com>
	<CECB8B74-C423-4792-8857-0F3E40EAECE3@illinois.edu>
	<4A40B5D6.40504@ebi.ac.uk>
	<320fb6e00906230429l77a2049eg35e47f94026111c6@mail.gmail.com>
	<4A40C909.40803@ebi.ac.uk>
Message-ID: <320fb6e00906231439j3beecda0o6e75c4223e960871@mail.gmail.com>

On Tue, Jun 23, 2009 at 1:22 PM, Peter Rice<pmr at ebi.ac.uk> wrote:
> Peter wrote:
>> On Tue, Jun 23, 2009 at 12:00 PM, Peter Rice<pmr at ebi.ac.uk> wrote:
>>> We just added FASTQ parsing to EMBOSS and faced the same issues.
>>>
>>
>> I was going to chat to you about this at BOSC, and suggest this be
>> added to EMBOSS - but you are well ahead of me ;)
>
> Not that well ahead really ... someone asked for it in our BoF at
> BOSC/ISMB last year so we thought we'd better get it done before this
> one. It was implemented a couple of days ago :-)
>

Well, ahead of my asking!

>>> Quality scores are kept as phred values. Phred of 0 means unknown,
>>> which in Solexa is -5 (0.75 error rate = could be anything).
>>
>> A Phred quality of 0 means probability of error is 1, so yes, unknown. I don't
>> quite follow your leap that this corresponds to a Solexa quality of -5. Could
>> you clarify?
>
> Phred score is -10 log(p) where p is the probability of error. A phred
> of 0 implies 1.0 (certainty) of error, but 0.75 is a better estimate
> (3/4 chance that any base you pick is wrong).
>
> Solexa scores are -10 log(p/(1-p)) so p=0.75 comes out at -5. This is
> why Solexa scores can go down to -5 in their fastq format.
>
>>> We assume lower quality scores are from alignments rather than
>>> single reads.
>>
>> Did you mean to say "higher quality scores" (i.e. lower probability of error),
>> e.g. a PHRED score of 80 which you can get from MAQ doing read mapping
>> or something consensus based.
>
> Actually I mean both. Error probabilities above 0.75 for a single base
> are silly, and error probabilities below 0.0001 make sense only when two
> or more high quality bases are aligned.

I see what you mean - a probability of error of 0.75 matches that
for a random base call, obvious when you put it like that. Of course,
there is this nasty little thought at the back of my mind that sooner
or later someone will use FASTQ files for proteins (e.g. from some
mass-spec protein sequencing).

A quality lower than that (e.g. PHRED 0, i.e. certain error) is actually
worse than random and could be taken to mean "we're pretty sure this isn't
the stated letter". But that would be silly, as you say.

>>> We gave up on trying to guess the quality score standard and require
>>> users to say whether they are sanger, solexa (1.0) or Illumina (1.3)
>>> format files. If we only want the sequence then we don't care so we allow
>>> "fastq" as a sequence format and ignore the quality scores in that case.
>>
>> What format names have you used? Ideally we'd have the same names
>> in EMBOSS, BioPerl and Biopython (i.e. "fastq", "fastq-solexa", and
>> "fastq-illumina").
>
> We don't normally use '-' in our format names so we have fastqsanger,
> fastqsolexa, fastqillumina and fastqint. None of these have been tried
> on users as yet.
>
> The '-' names look nice though. We can consider introducing them. Do you
> have a full list of format names (sequence, feature, alignment, etc.) we
> can try to conform to?

See http://biopython.org/wiki/SeqIO and http://biopython.org/wiki/AlignIO

Getting EMBOSS to conform should be trivial - in general when
picking a format name for Biopython's SeqIO or AlignIO (and we
have avoided multiple aliases with one exception) we have tried to
use anything shared by BioPerl and EMBOSS. The FASTQ variants
are unusual in that Biopython got to invent some names.

In future where would be a good place to discuss these kinds of
cross-platform issues? (i.e. BioPerl, Biopython, EMBOSS, etc).

>>> We also allow the integer quality score format ... is anyone still
>>> using that (it looks horrible to me :-)
>>
>> Do you mean the QUAL file format holding PHRED scores?
>> Roche provide tools to turn their SFF files into FASTA and
>> QUAL files, so they are still used.
>
> Probably ... unless there is a Solexa version too.

We may be talking at cross purposes here, this is QUAL format:
http://www.bioperl.org/wiki/Qual_sequence_format

Peter


From pmr at ebi.ac.uk  Wed Jun 24 07:48:23 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Wed, 24 Jun 2009 12:48:23 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <320fb6e00906231439j3beecda0o6e75c4223e960871@mail.gmail.com>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>	
	<BF526BE6-AFA8-4F66-9665-D6328F9B4FFA@illinois.edu>	
	<320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com>	
	<1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>	
	<320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com>	
	<CECB8B74-C423-4792-8857-0F3E40EAECE3@illinois.edu>	
	<4A40B5D6.40504@ebi.ac.uk>	
	<320fb6e00906230429l77a2049eg35e47f94026111c6@mail.gmail.com>	
	<4A40C909.40803@ebi.ac.uk>
	<320fb6e00906231439j3beecda0o6e75c4223e960871@mail.gmail.com>
Message-ID: <4A421287.4000203@ebi.ac.uk>

Peter wrote:
> On Tue, Jun 23, 2009 at 1:22 PM, Peter Rice<pmr at ebi.ac.uk> wrote:
>> The '-' names look nice though. We can consider introducing them. Do you
>> have a full list of format names (sequence, feature, alignment, etc.) we
>> can try to conform to?
> 
> See http://biopython.org/wiki/SeqIO and http://biopython.org/wiki/AlignIO

Thanks. I'll take a look at those.

> Getting EMBOSS to conforming should be trivial - in general when
> picking a format name for Biopython's SeqIO or AlignIO (and we
> have avoided multiple aliases with one exception) we have tried to
> use anything shared by BioPerl and EMBOSS. The FASTQ variants
> are unusual in that Biopython got to invent some names.
> 
> In future where would be a good place to discuss these kinds of
> cross-platform issues? (i.e. BioPerl, Biopython, EMBOSS, etc).

I was planning to suggest a get-together at BOSC in Stockholm so we can
identify common cross-platform issues. I'm sure there are many ways we
can conform with naming and interfaces and perhaps even share code.

>>>> We also allow the integer quality score format ... is anyone still
>>>> using that (it looks horrible to me :-)
>>> Do you mean the QUAL file format holding PHRED scores?
>>> Roche provide tools to turn their SFF files into FASTA and
>>> QUAL files, so they are still used.
>> Probably ... unless there is a Solexa version too.
> 
> We may be talking at cross purposes here, this is QUAL format:
> http://www.bioperl.org/wiki/Qual_sequence_format

Yes that is different. We'll worry about separate QUAL files later (we
already find separate GFF files a pain for features) and stick with the
"fastqint" format name.

regards,

Peter


From biopython at maubp.freeserve.co.uk  Wed Jun 24 10:56:13 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 24 Jun 2009 15:56:13 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <4A421287.4000203@ebi.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com>
	<1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>
	<320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com>
	<CECB8B74-C423-4792-8857-0F3E40EAECE3@illinois.edu>
	<4A40B5D6.40504@ebi.ac.uk>
	<320fb6e00906230429l77a2049eg35e47f94026111c6@mail.gmail.com>
	<4A40C909.40803@ebi.ac.uk>
	<320fb6e00906231439j3beecda0o6e75c4223e960871@mail.gmail.com>
	<4A421287.4000203@ebi.ac.uk>
Message-ID: <320fb6e00906240756m757ee12nb53ecc527b8e2480@mail.gmail.com>

On Wed, Jun 24, 2009 at 12:48 PM, Peter Rice<pmr at ebi.ac.uk> wrote:
>
> I was planning to suggest a get-together at BOSC in Stockholm so we can
> identify common cross-platform issues. I'm sure there are many ways we
> can conform with naming and interfaces and perhaps even share code.
>

That would be a good idea - but while there are quite a few Biopython
people at BOSC this year, I don't know if there will be many from BioPerl
(there isn't a BioPerl update talk scheduled).

>>>>> We also allow the integer quality score format ... is anyone still
>>>>> using that (it looks horrible to me :-)
>>>> Do you mean the QUAL file format holding PHRED scores?
>>>> Roche provide tools to turn their SFF files into FASTA and
>>>> QUAL files, so they are still used.
>>> Probably ... unless there is a Solexa version too.
>>
>> We may be talking at cross purposes here, this is QUAL format:
>> http://www.bioperl.org/wiki/Qual_sequence_format
>
> Yes that is different. We'll worry about separate QUAL files later (we
> already find separate GFF files a pain for features) and still with the
> "fastqint" format name.

So when you say "fastqint" are you talking about something else?
Could you show us an example record in this format?

Peter
[I need to remember to proof read my evening emails more carefully]


From vecchi.b at gmail.com  Wed Jun 24 12:13:02 2009
From: vecchi.b at gmail.com (Bruno Vecchi)
Date: Wed, 24 Jun 2009 13:13:02 -0300
Subject: [Bioperl-l] Fwd: Hackathon tomorrow (I think)
In-Reply-To: <54655E2F-0D4B-4F4D-8608-13BB052FE201@jays.net>
References: <8F66D28E-CD28-4A8D-83B2-0FFCA611F58F@jays.net>
	<54655E2F-0D4B-4F4D-8608-13BB052FE201@jays.net>
Message-ID: <1a0c1b750906240913g5e113dadxe3d98d3fce3a74d@mail.gmail.com>

Jay asked me to forward this to the list, since he sometimes has problems
getting his mails delivered.
Feel free to suggest topics for the bioperl hackathon to take place tomorrow
and on friday!

Bruno.


From: Jay Hannah <jay at jays.net>
Date: June 24, 2009 11:55:42 AM EDT
To: Bioperl <bioperl-l at bioperl.org>
Subject: Hackathon tomorrow (I think)

Hola,

So a few of us here at YAPC might try to be productive tomorrow (and
Friday?).

I don't know if we have any commit bits attending.

Feel free to suggest things:

  http://yapc10.org/yn2009/wiki?node=BioPerl

Or point me to list(s) of things. Perhaps we'll try to help out in Bugzilla.

Come yell at me (us?) in IRC:

  http://www.bioperl.org/wiki/Irc

Thanks,

Jay Hannah
http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah


From cjfields at illinois.edu  Wed Jun 24 12:22:57 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 24 Jun 2009 11:22:57 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <320fb6e00906240756m757ee12nb53ecc527b8e2480@mail.gmail.com>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com>
	<1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>
	<320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com>
	<CECB8B74-C423-4792-8857-0F3E40EAECE3@illinois.edu>
	<4A40B5D6.40504@ebi.ac.uk>
	<320fb6e00906230429l77a2049eg35e47f94026111c6@mail.gmail.com>
	<4A40C909.40803@ebi.ac.uk>
	<320fb6e00906231439j3beecda0o6e75c4223e960871@mail.gmail.com>
	<4A421287.4000203@ebi.ac.uk>
	<320fb6e00906240756m757ee12nb53ecc527b8e2480@mail.gmail.com>
Message-ID: <6DCEA6E1-8C53-4C04-9F8A-E336BE5753B6@illinois.edu>


On Jun 24, 2009, at 9:56 AM, Peter wrote:

> On Wed, Jun 24, 2009 at 12:48 PM, Peter Rice<pmr at ebi.ac.uk> wrote:
>>
>> I was planning to suggest a get-together at BOSC in Stockholm so we  
>> can
>> identify common cross-platform issues. I'm sure there are many ways  
>> we
>> can conform with naming and interfaces and perhaps even share code.
>>
>
> That would be a good idea - but while there are quite a few Biopython
> people at BOSC this year, I don't know if there will be many from  
> BioPerl
> (there isn't a BioPerl update talk scheduled).

Most of us are caught up with other work, though I will likely be able  
to dedicate more time to it in the next few months.

Also doesn't help that my travel stipend doesn't start until Aug. 1.

>>>>>> We also allow the integer quality score format ... is anyone  
>>>>>> still
>>>>>> using that (it looks horrible to me :-)
>>>>> Do you mean the QUAL file format holding PHRED scores?
>>>>> Roche provide tools to turn their SFF files into FASTA and
>>>>> QUAL files, so they are still used.
>>>> Probably ... unless there is a Solexa version too.
>>>
>>> We may be talking at cross purposes here, this is QUAL format:
>>> http://www.bioperl.org/wiki/Qual_sequence_format
>>
>> Yes that is different. We'll worry about separate QUAL files later  
>> (we
>> already find separate GFF files a pain for features) and still with  
>> the
>> "fastqint" format name.
>
> So when you say "fastqint" are you talking about something else?
> Could you show us an example record in this format?
>
> Peter
> [I need to remember to proof read my evening emails more carefully]

The same as fastq, except the ASCII quality is converted to actual  
score:

@4_1_912_360
AAGGGGCTAGAGAAACACGTAATGAAGGGAGGACTC
+4_1_912_360
40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 21 40 40 40  
40 40 40 40 40 40 26 40 40 14 39 40 40
@4_1_54_483
TAATAAATGTGCTTCCTTGATGCATGTGCTATGATT
+4_1_54_483
40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 16 40 40 40  
28 40 40 40 40 40 40 16 40 40 5 40 40
chris
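
A rough sketch of reading records in this integer-quality layout with plain
Perl, in case it helps to make the format concrete. The sub names and the
two short records are made up for illustration (they are not the reads
shown above), and the offset of 33 used for re-encoding assumes Sanger-style
output. Since mail clients wrap the quality line, the parser keeps reading
score lines until it has one score per base.

```perl
use strict;
use warnings;

# Parse integer-quality FASTQ text into a list of records. The quality
# line(s) hold space-separated integers and may be wrapped, so keep reading
# until there is one score per base.
sub parse_fastqint {
    my ($text) = @_;
    my @lines = grep { /\S/ } split /\n/, $text;
    my @records;
    while (@lines) {
        my $header = shift @lines;
        $header =~ /^\@(\S+)/ or die "Expected '\@id' header, got: $header\n";
        my $id  = $1;
        my $seq = shift @lines;
        die "Missing sequence for $id\n" unless defined $seq;
        $seq =~ s/\s+//g;
        my $plus = shift @lines;
        die "Expected '+' line for $id\n" unless defined $plus && $plus =~ /^\+/;
        my @quals;
        while (@lines && @quals < length($seq)) {
            push @quals, split ' ', shift @lines;
        }
        die "Score count does not match sequence length for $id\n"
            unless @quals == length($seq);
        push @records, { id => $id, seq => $seq, qual => \@quals };
    }
    return @records;
}

# Encode integer scores as the usual one-character-per-base quality string.
sub encode_quality {
    my ($quals, $offset) = @_;
    return join '', map { chr($_ + $offset) } @$quals;
}

# Hypothetical records; the first has its scores wrapped over two lines.
my $example = <<'END';
@read_1
ACGTACGT
+read_1
40 40 40 21 40
26 14 39
@read_2
TTGA
+read_2
16 28 5 40
END

for my $rec (parse_fastqint($example)) {
    print "\@$rec->{id}\n$rec->{seq}\n+\n", encode_quality($rec->{qual}, 33), "\n";
}
```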


From cjfields at illinois.edu  Wed Jun 24 12:26:22 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 24 Jun 2009 11:26:22 -0500
Subject: [Bioperl-l] Fwd: Hackathon tomorrow (I think)
In-Reply-To: <1a0c1b750906240913g5e113dadxe3d98d3fce3a74d@mail.gmail.com>
References: <8F66D28E-CD28-4A8D-83B2-0FFCA611F58F@jays.net>
	<54655E2F-0D4B-4F4D-8608-13BB052FE201@jays.net>
	<1a0c1b750906240913g5e113dadxe3d98d3fce3a74d@mail.gmail.com>
Message-ID: <F01E3C47-6651-488C-91F6-73F42A5A65F4@illinois.edu>

1) Any help towards bugzilla fixes would be most welcome.
2) Better GFF3 integration
3) Typed but lightweight seqfeatures
4) Bio::Moose?

I can dedicate more time to the latter two in about a month, but I'll  
be tied up until then.  Let me know if anyone needs collab on biomoose  
on github; Mark Jensen's already added.

chris

On Jun 24, 2009, at 11:13 AM, Bruno Vecchi wrote:

> Jay asked me to forward this to the list, since he sometimes has  
> problems
> getting his mails delivered.
> Feel free to suggest topics for the bioperl hackathon to take place  
> tomorrow
> and on friday!
>
> Bruno.
>
>
> From: Jay Hannah <jay at jays.net>
> Date: June 24, 2009 11:55:42 AM EDT
> To: Bioperl <bioperl-l at bioperl.org>
> Subject: Hackathon tomorrow (I think)
>
> Hola,
>
> So a few of us here at YAPC might try to be productive tomorrow (and
> Friday?).
>
> I don't know if we have any commit bits attending.
>
> Feel free to suggest things:
>
>  http://yapc10.org/yn2009/wiki?node=BioPerl
>
> Or point me to list(s) of things. Perhaps we'll try to help out in  
> Bugzilla.
>
> Come yell at me (us?) in IRC:
>
>  http://www.bioperl.org/wiki/Irc
>
> Thanks,
>
> Jay Hannah
> http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From biopython at maubp.freeserve.co.uk  Wed Jun 24 12:27:39 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 24 Jun 2009 17:27:39 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <6DCEA6E1-8C53-4C04-9F8A-E336BE5753B6@illinois.edu>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com>
	<CECB8B74-C423-4792-8857-0F3E40EAECE3@illinois.edu>
	<4A40B5D6.40504@ebi.ac.uk>
	<320fb6e00906230429l77a2049eg35e47f94026111c6@mail.gmail.com>
	<4A40C909.40803@ebi.ac.uk>
	<320fb6e00906231439j3beecda0o6e75c4223e960871@mail.gmail.com>
	<4A421287.4000203@ebi.ac.uk>
	<320fb6e00906240756m757ee12nb53ecc527b8e2480@mail.gmail.com>
	<6DCEA6E1-8C53-4C04-9F8A-E336BE5753B6@illinois.edu>
Message-ID: <320fb6e00906240927u62588e22je8186bb172e904a1@mail.gmail.com>

On Wed, Jun 24, 2009 at 5:22 PM, Chris Fields<cjfields at illinois.edu> wrote:
>> So when you say "fastqint" are you talking about something else?
>> Could you show us an example record in this format?
>>
>> Peter
>
> The same as fastq, except the ASCII quality is converted to actual score:
>
> @4_1_912_360
> AAGGGGCTAGAGAAACACGTAATGAAGGGAGGACTC
> +4_1_912_360
> 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 21 40 40 40 40 40
> 40 40 40 40 26 40 40 14 39 40 40
> @4_1_54_483
> TAATAAATGTGCTTCCTTGATGCATGTGCTATGATT
> +4_1_54_483
> 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 16 40 40 40 28 40
> 40 40 40 40 40 16 40 40 5 40 40

OK - and who uses these "Integer FASTQ" files?

Peter


From vecchi.b at gmail.com  Wed Jun 24 12:40:50 2009
From: vecchi.b at gmail.com (Bruno Vecchi)
Date: Wed, 24 Jun 2009 13:40:50 -0300
Subject: [Bioperl-l] Fwd: Hackathon tomorrow (I think)
In-Reply-To: <1a0c1b750906240938n23bbe8b1h42d99b6e345fee49@mail.gmail.com>
References: <8F66D28E-CD28-4A8D-83B2-0FFCA611F58F@jays.net>
	<54655E2F-0D4B-4F4D-8608-13BB052FE201@jays.net>
	<1a0c1b750906240913g5e113dadxe3d98d3fce3a74d@mail.gmail.com>
	<F01E3C47-6651-488C-91F6-73F42A5A65F4@illinois.edu>
	<1a0c1b750906240938n23bbe8b1h42d99b6e345fee49@mail.gmail.com>
Message-ID: <1a0c1b750906240940t7c0003f9hf10eb30c0d85a5ce@mail.gmail.com>

>
> Is there a todo list for biomoose? I'd be glad to hack in, but I'm afraid
> to step into someone else's work or to do things without general agreement.
> It would be nice to have directions for small sized chunks of work to do.
> In any case, count me in!
>
> 2009/6/24 Chris Fields <cjfields at illinois.edu>
>
> 1) Any help towards bugzilla fixes would be most welcome.
>> 2) Better GFF3 integration
>> 3) Typed but lightweight seqfeatures
>> 4) Bio::Moose?
>>
>> I can dedicate more time to the latter two in about a month, but I'll be
>> tied up until then.  Let me know if anyone needs collab on biomoose on
>> github; Mark Jensen's already added.
>>
>> chris
>>
>>
>> On Jun 24, 2009, at 11:13 AM, Bruno Vecchi wrote:
>>
>>  Jay asked me to forward this to the list, since he sometimes has problems
>>> getting his mails delivered.
>>> Feel free to suggest topics for the bioperl hackathon to take place
>>> tomorrow
>>> and on friday!
>>>
>>> Bruno.
>>>
>>>
>>> From: Jay Hannah <jay at jays.net>
>>> Date: June 24, 2009 11:55:42 AM EDT
>>> To: Bioperl <bioperl-l at bioperl.org>
>>> Subject: Hackathon tomorrow (I think)
>>>
>>> Hola,
>>>
>>> So a few of us here at YAPC might try to be productive tomorrow (and
>>> Friday?).
>>>
>>> I don't know if we have any commit bits attending.
>>>
>>> Feel free to suggest things:
>>>
>>>  http://yapc10.org/yn2009/wiki?node=BioPerl
>>>
>>> Or point me to list(s) of things. Perhaps we'll try to help out in
>>> Bugzilla.
>>>
>>> Come yell at me (us?) in IRC:
>>>
>>>  http://www.bioperl.org/wiki/Irc
>>>
>>> Thanks,
>>>
>>> Jay Hannah
>>> http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah
>>>
>>
>>
>


From jay at jays.net  Wed Jun 24 12:44:51 2009
From: jay at jays.net (Jay Hannah)
Date: Wed, 24 Jun 2009 12:44:51 -0400
Subject: [Bioperl-l] Fwd: Hackathon tomorrow (I think)
In-Reply-To: <F01E3C47-6651-488C-91F6-73F42A5A65F4@illinois.edu>
References: <8F66D28E-CD28-4A8D-83B2-0FFCA611F58F@jays.net>
	<54655E2F-0D4B-4F4D-8608-13BB052FE201@jays.net>
	<1a0c1b750906240913g5e113dadxe3d98d3fce3a74d@mail.gmail.com>
	<F01E3C47-6651-488C-91F6-73F42A5A65F4@illinois.edu>
Message-ID: <FEF61DB9-34A6-4884-BCAF-1CAC83E809F1@jays.net>

On Jun 24, 2009, at 12:26 PM, Chris Fields wrote:
> Let me know if anyone needs collab on biomoose on github; Mark  
> Jensen's already added.

Anything on github should be trivial, even with no perms -- we can  
just fork and then send you (whoever) pull requests. github++  :)

> 1) Any help towards bugzilla fixes would be most welcome.

I don't know how to make any progress in bugzilla if no one has a  
commit bit...?

> 2) Better GFF3 integration
> 3) Typed but lightweight seqfeatures

Are there bugzilla tickets (or somewhere) describing those?

I wonder if anyone can help me get out of sporadic MailMan purgatory...

Thanks,

j


From cjfields at illinois.edu  Wed Jun 24 12:54:06 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 24 Jun 2009 11:54:06 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <320fb6e00906240927u62588e22je8186bb172e904a1@mail.gmail.com>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com>
	<CECB8B74-C423-4792-8857-0F3E40EAECE3@illinois.edu>
	<4A40B5D6.40504@ebi.ac.uk>
	<320fb6e00906230429l77a2049eg35e47f94026111c6@mail.gmail.com>
	<4A40C909.40803@ebi.ac.uk>
	<320fb6e00906231439j3beecda0o6e75c4223e960871@mail.gmail.com>
	<4A421287.4000203@ebi.ac.uk>
	<320fb6e00906240756m757ee12nb53ecc527b8e2480@mail.gmail.com>
	<6DCEA6E1-8C53-4C04-9F8A-E336BE5753B6@illinois.edu>
	<320fb6e00906240927u62588e22je8186bb172e904a1@mail.gmail.com>
Message-ID: <5C050D50-5723-4C93-89D0-7280D202E737@illinois.edu>


On Jun 24, 2009, at 11:27 AM, Peter wrote:

> On Wed, Jun 24, 2009 at 5:22 PM, Chris Fields<cjfields at illinois.edu>  
> wrote:
>>> So when you say "fastqint" are you talking about something else?
>>> Could you show us an example record in this format?
>>>
>>> Peter
>>
>> The same as fastq, except the ASCII quality is converted to actual  
>> score:
>>
>> @4_1_912_360
>> AAGGGGCTAGAGAAACACGTAATGAAGGGAGGACTC
>> +4_1_912_360
>> 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 21 40 40  
>> 40 40 40
>> 40 40 40 40 26 40 40 14 39 40 40
>> @4_1_54_483
>> TAATAAATGTGCTTCCTTGATGCATGTGCTATGATT
>> +4_1_54_483
>> 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 16 40 40  
>> 40 28 40
>> 40 40 40 40 40 16 40 40 5 40 40
>
> OK - and who uses this "Integer FASTQ" files?
>
> Peter

Not sure, but it is covered by MAQ via the conversion script (as FASTQ- 
int):

http://maq.sourceforge.net/fq_all2std.pl

chris


From jay at jays.net  Wed Jun 24 11:55:42 2009
From: jay at jays.net (Jay Hannah)
Date: Wed, 24 Jun 2009 11:55:42 -0400
Subject: [Bioperl-l] Hackathon tomorrow (I think)
Message-ID: <8F66D28E-CD28-4A8D-83B2-0FFCA611F58F@jays.net>

Hola,

So a few of us here at YAPC might try to be productive tomorrow (and  
Friday?).

I don't know if we have any commit bits attending.

Feel free to suggest things:

    http://yapc10.org/yn2009/wiki?node=BioPerl

Or point me to list(s) of things. Perhaps we'll try to help out in  
Bugzilla.

Come yell at me (us?) in IRC:

    http://www.bioperl.org/wiki/Irc

Thanks,

Jay Hannah
http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah


From bernd.web at gmail.com  Wed Jun 24 13:11:51 2009
From: bernd.web at gmail.com (Bernd Web)
Date: Wed, 24 Jun 2009 19:11:51 +0200
Subject: [Bioperl-l] Bioperl_scripts
Message-ID: <716af09c0906241011s6cfe0698u6d3ba5f69f730d4c@mail.gmail.com>

Hi,

The bioperl scripts section at
http://www.bioperl.org/wiki/Bioperl_scripts is really handy for short
examples.
However, quite a number of scripts cannot be found anymore and return errors:

For example for the first link (scripts/install_bioperl_scripts.pl)
Filesystem has no item: File not found: revision 15800, path
'/bioperl-live/trunk/scripts/install_bioperl_scripts.pl' at
/usr/lib/perl5/site_perl/5.8.8/SVN/Web/action.pm line 245

Also all scripts in the Bio::Graphics section cannot be found.
Is the http://www.bioperl.org/wiki/Bioperl_scripts page still supported?


Regards,
Bernd


From cjfields at illinois.edu  Wed Jun 24 16:57:51 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 24 Jun 2009 15:57:51 -0500
Subject: [Bioperl-l] Bioperl_scripts
In-Reply-To: <716af09c0906241011s6cfe0698u6d3ba5f69f730d4c@mail.gmail.com>
References: <716af09c0906241011s6cfe0698u6d3ba5f69f730d4c@mail.gmail.com>
Message-ID: <5AF99205-F977-45A1-B4AF-C3858A5727FD@illinois.edu>


On Jun 24, 2009, at 12:11 PM, Bernd Web wrote:

> Hi,
>
> The bioperl scripts section at
> http://www.bioperl.org/wiki/Bioperl_scripts is really handy for short
> examples.
> However, it quite a number of scripts cannot be found anymore and  
> return errors:
>
> For example for the first link (scripts/install_bioperl_scripts.pl)
> Filesystem has no item: File not found: revision 15800, path
> '/bioperl-live/trunk/scripts/install_bioperl_scripts.pl' at
> /usr/lib/perl5/site_perl/5.8.8/SVN/Web/action.pm line 245
>
> Also all scripts in the Bio::Graphics section cannot be found.
> Is the http://www.bioperl.org/wiki/Bioperl_scripts page still  
> supported?
>
> Regards,
> Bernd

Re: Bio::Graphics, all modules and related scripts have been moved to  
a separate repo and CPAN release (latest):

http://search.cpan.org/~lds/Bio-Graphics-1.96/

Beyond that I would consider all scripts and the wiki page supported.   
It's best to file this to bugzilla as a documentation issue so we fix  
it and don't forget about it amongst the flurry of email.

chris


From cjfields at illinois.edu  Wed Jun 24 17:10:34 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 24 Jun 2009 16:10:34 -0500
Subject: [Bioperl-l] Fwd: Hackathon tomorrow (I think)
In-Reply-To: <FEF61DB9-34A6-4884-BCAF-1CAC83E809F1@jays.net>
References: <8F66D28E-CD28-4A8D-83B2-0FFCA611F58F@jays.net>
	<54655E2F-0D4B-4F4D-8608-13BB052FE201@jays.net>
	<1a0c1b750906240913g5e113dadxe3d98d3fce3a74d@mail.gmail.com>
	<F01E3C47-6651-488C-91F6-73F42A5A65F4@illinois.edu>
	<FEF61DB9-34A6-4884-BCAF-1CAC83E809F1@jays.net>
Message-ID: <20E946BE-D9E1-4815-991A-7817E0655879@illinois.edu>


On Jun 24, 2009, at 11:44 AM, Jay Hannah wrote:

> On Jun 24, 2009, at 12:26 PM, Chris Fields wrote:
>> Let me know if anyone needs collab on biomoose on github; Mark  
>> Jensen's already added.
>
> Anything on github should be trivial, even with no perms -- we can  
> just fork and then send you (whoever) pull requests. github++  :)
>
>> 1) Any help towards bugzilla fixes would be most welcome.
>
> I don't know how to make any progress in bugzilla if no one has a  
> commit bit...?

For some reason I thought you had a commit bit; we can add you in if  
needed.  Anyway, patches are most definitely welcome ;>

>> 2) Better GFF3 integration
>> 3) Typed but lightweight seqfeatures
>
> Are there bugzilla tickets (or somewhere) describing those?

No as the issues are more complex than one single bug, but we do have  
something to help track for the time being:

http://www.bioperl.org/wiki/GFF_Refactor
http://www.bioperl.org/wiki/Align_Refactor

I'll probably file TODOs during the process for those refactors.  The  
easiest to tackle would probably be Align/LocatableSeq refactors.

> I wonder if anyone can help me get out of sporadic MailMan  
> purgatory...
>
> Thanks,
>
> j

-c

PS - Don't feel constrained by the above.  There are many many areas  
to contribute to.


From pmr at ebi.ac.uk  Wed Jun 24 18:44:33 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Wed, 24 Jun 2009 23:44:33 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <5C050D50-5723-4C93-89D0-7280D202E737@illinois.edu>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com>
	<CECB8B74-C423-4792-8857-0F3E40EAECE3@illinois.edu>
	<4A40B5D6.40504@ebi.ac.uk>
	<320fb6e00906230429l77a2049eg35e47f94026111c6@mail.gmail.com>
	<4A40C909.40803@ebi.ac.uk>
	<320fb6e00906231439j3beecda0o6e75c4223e960871@mail.gmail.com>
	<4A421287.4000203@ebi.ac.uk>
	<320fb6e00906240756m757ee12nb53ecc527b8e2480@mail.gmail.com>
	<6DCEA6E1-8C53-4C04-9F8A-E336BE5753B6@illinois.edu>
	<320fb6e00906240927u62588e22je8186bb172e904a1@mail.gmail.com>
	<5C050D50-5723-4C93-89D0-7280D202E737@illinois.edu>
Message-ID: <4A42AC51.3090809@ebi.ac.uk>

Chris Fields wrote:
> Not sure, but it is covered by MAQ via the conversion script (as 
> FASTQ-int):

Are the scores phred or Solexa?

Peter Rice


From adlai at refenestration.com  Wed Jun 24 22:08:31 2009
From: adlai at refenestration.com (Adlai Burman)
Date: Thu, 25 Jun 2009 04:08:31 +0200
Subject: [Bioperl-l] Extreme newbie question.
Message-ID: <99C689C0-B999-45CB-8B99-0CABCC69AB77@refenestration.com>

I have been trying to install BioPerl for a while now and after
pummeling my hard drive (Mac OS 10.5 Intel) with several attempts at
Fink installation, a >cpan installation and removing my .cpan folder I
am still at square 0. I do not want to do any more damage to my
computer, yet I really need a working install (especially to interface
with remote DBs like GenBank). Can anyone give me some advice here?
After each attempt, I have tried to run perldoc bptutorial.pl and
tried test scripts with "use Bio::Perl" in the headers and I just
receive error messages like the following:

Can't locate Bio/Perl.pm in @INC (@INC contains:
/home/users/dag/lib/perl5/
/Library/Perl/Updates/5.8.8/darwin-thread-multi-2level
/Library/Perl/Updates/5.8.8
/System/Library/Perl/5.8.8/darwin-thread-multi-2level
/System/Library/Perl/5.8.8
/Library/Perl/5.8.8/darwin-thread-multi-2level
/Library/Perl/5.8.8
/Library/Perl
/Network/Library/Perl/5.8.8/darwin-thread-multi-2level
/Network/Library/Perl/5.8.8
/Network/Library/Perl
/System/Library/Perl/Extras/5.8.8/darwin-thread-multi-2level
/System/Library/Perl/Extras/5.8.8
/Library/Perl/5.8.6
/Library/Perl/5.8.1
.) at trsh.pl line 1.

I have been working from the O'Reilly book Mastering Perl for
Bioinformatics and the INSTALL file and have scoured around the
BioPerl website and am still stuck.

Thanks in advance,

Adlai


From kpclancy at hotmail.com  Wed Jun 24 22:31:17 2009
From: kpclancy at hotmail.com (Kevin Clancy)
Date: Wed, 24 Jun 2009 20:31:17 -0600
Subject: [Bioperl-l] Fwd: Hackathon tomorrow (I think)
In-Reply-To: <20E946BE-D9E1-4815-991A-7817E0655879@illinois.edu>
References: <8F66D28E-CD28-4A8D-83B2-0FFCA611F58F@jays.net>
	<54655E2F-0D4B-4F4D-8608-13BB052FE201@jays.net>
	<1a0c1b750906240913g5e113dadxe3d98d3fce3a74d@mail.gmail.com>
	<F01E3C47-6651-488C-91F6-73F42A5A65F4@illinois.edu>
	<FEF61DB9-34A6-4884-BCAF-1CAC83E809F1@jays.net> 
	<20E946BE-D9E1-4815-991A-7817E0655879@illinois.edu>
Message-ID: <COL107-W17352BCFE4B6CE766EF2C3CE340@phx.gbl>


is there an intention to have a hackathon at ISMB this weekend - I know there is a 2 day BOSC 
kevin

> From: cjfields at illinois.edu
> To: jay at jays.net
> Date: Wed, 24 Jun 2009 16:10:34 -0500
> CC: vecchi.b at gmail.com; bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] Fwd: Hackathon tomorrow (I think)
> 
> 
> On Jun 24, 2009, at 11:44 AM, Jay Hannah wrote:
> 
> > On Jun 24, 2009, at 12:26 PM, Chris Fields wrote:
> >> Let me know if anyone needs collab on biomoose on github; Mark  
> >> Jensen's already added.
> >
> > Anything on github should be trivial, even with no perms -- we can  
> > just fork and then send you (whoever) pull requests. github++  :)
> >
> >> 1) Any help towards bugzilla fixes would be most welcome.
> >
> > I don't know how to make any progress in bugzilla if no one has a  
> > commit bit...?
> 
> For some reason I thought you had a commit bit; we can add you in if  
> needed.  Anyway, patches are most definitely welcome ;>
> 
> >> 2) Better GFF3 integration
> >> 3) Typed but lightweight seqfeatures
> >
> > Are there bugzilla tickets (or somewhere) describing those?
> 
> No as the issues are more complex than one single bug, but we do have  
> something to help track for the time being:
> 
> http://www.bioperl.org/wiki/GFF_Refactor
> http://www.bioperl.org/wiki/Align_Refactor
> 
> I'll probably file TODOs during the process for those refactors.  The  
> easiest to tackle would be probably be Align/LocatableSeq refactors.
> 
> > I wonder if anyone can help me get out of sporadic MailMan  
> > purgatory...
> >
> > Thanks,
> >
> > j
> 
> -c
> 
> PS - Don't feel constrained by the above.  There are many many areas  
> to contribute to.
> 
> 


From cjfields at illinois.edu  Wed Jun 24 23:54:28 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 24 Jun 2009 22:54:28 -0500
Subject: [Bioperl-l] Fwd: Hackathon tomorrow (I think)
In-Reply-To: <COL107-W17352BCFE4B6CE766EF2C3CE340@phx.gbl>
References: <8F66D28E-CD28-4A8D-83B2-0FFCA611F58F@jays.net>
	<54655E2F-0D4B-4F4D-8608-13BB052FE201@jays.net>
	<1a0c1b750906240913g5e113dadxe3d98d3fce3a74d@mail.gmail.com>
	<F01E3C47-6651-488C-91F6-73F42A5A65F4@illinois.edu>
	<FEF61DB9-34A6-4884-BCAF-1CAC83E809F1@jays.net>
	<20E946BE-D9E1-4815-991A-7817E0655879@illinois.edu>
	<COL107-W17352BCFE4B6CE766EF2C3CE340@phx.gbl>
Message-ID: <02C22CF3-4AAD-4B4D-8F54-2999998A34E0@illinois.edu>

I have no idea; I don't think there are many bioperl devs attending  
this year unfortunately.  Any meetings in the next year where we could  
set up a bioperl hackathon?  I will likely be available to attend if  
it's stateside...

chris

On Jun 24, 2009, at 9:31 PM, Kevin Clancy wrote:

>
> is there an intention to have a hackathon at ISMB this weekend - I  
> know there is a 2 day BOSC
> kevin
>
>> From: cjfields at illinois.edu
>> To: jay at jays.net
>> Date: Wed, 24 Jun 2009 16:10:34 -0500
>> CC: vecchi.b at gmail.com; bioperl-l at bioperl.org
>> Subject: Re: [Bioperl-l] Fwd: Hackathon tomorrow (I think)
>>
>>
>> On Jun 24, 2009, at 11:44 AM, Jay Hannah wrote:
>>
>>> On Jun 24, 2009, at 12:26 PM, Chris Fields wrote:
>>>> Let me know if anyone needs collab on biomoose on github; Mark
>>>> Jensen's already added.
>>>
>>> Anything on github should be trivial, even with no perms -- we can
>>> just fork and then send you (whoever) pull requests. github++  :)
>>>
>>>> 1) Any help towards bugzilla fixes would be most welcome.
>>>
>>> I don't know how to make any progress in bugzilla if no one has a
>>> commit bit...?
>>
>> For some reason I thought you had a commit bit; we can add you in if
>> needed.  Anyway, patches are most definitely welcome ;>
>>
>>>> 2) Better GFF3 integration
>>>> 3) Typed but lightweight seqfeatures
>>>
>>> Are there bugzilla tickets (or somewhere) describing those?
>>
>> No as the issues are more complex than one single bug, but we do have
>> something to help track for the time being:
>>
>> http://www.bioperl.org/wiki/GFF_Refactor
>> http://www.bioperl.org/wiki/Align_Refactor
>>
>> I'll probably file TODOs during the process for those refactors.  The
>> easiest to tackle would be probably be Align/LocatableSeq refactors.
>>
>>> I wonder if anyone can help me get out of sporadic MailMan
>>> purgatory...
>>>
>>> Thanks,
>>>
>>> j
>>
>> -c
>>
>> PS - Don't feel constrained by the above.  There are many many areas
>> to contribute to.
>>
>>


From cjfields at illinois.edu  Thu Jun 25 10:00:47 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 25 Jun 2009 09:00:47 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <4A42AC51.3090809@ebi.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com>
	<CECB8B74-C423-4792-8857-0F3E40EAECE3@illinois.edu>
	<4A40B5D6.40504@ebi.ac.uk>
	<320fb6e00906230429l77a2049eg35e47f94026111c6@mail.gmail.com>
	<4A40C909.40803@ebi.ac.uk>
	<320fb6e00906231439j3beecda0o6e75c4223e960871@mail.gmail.com>
	<4A421287.4000203@ebi.ac.uk>
	<320fb6e00906240756m757ee12nb53ecc527b8e2480@mail.gmail.com>
	<6DCEA6E1-8C53-4C04-9F8A-E336BE5753B6@illinois.edu>
	<320fb6e00906240927u62588e22je8186bb172e904a1@mail.gmail.com>
	<5C050D50-5723-4C93-89D0-7280D202E737@illinois.edu>
	<4A42AC51.3090809@ebi.ac.uk>
Message-ID: <CB4314ED-4076-42AD-96CC-64CB429929D5@illinois.edu>


On Jun 24, 2009, at 5:44 PM, Peter Rice wrote:

> Chris Fields wrote:
>> Not sure, but it is covered by MAQ via the conversion script (as  
>> FASTQ-int):
>
> Are the scores phred or Solexa?
>
> Peter Rice

Not sure, actually.  The Perl script I linked to looks like it converts  
using the same scale as Solexa (Illumina 1.0).
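
For reference, the two scales differ in how the error probability p is
encoded: a phred score is -10*log10(p), while a Solexa (Illumina 1.0)
score is -10*log10(p/(1-p)). A minimal sketch of the conversion, assuming
core Perl only; the function names are hypothetical and are not taken
from the MAQ conversion script:

```perl
use strict;
use warnings;

# Base-10 logarithm (core Perl only provides the natural log).
sub log10 { return log($_[0]) / log(10) }

# Solexa score -> phred score.
# Solexa: Q = -10*log10(p/(1-p)); phred: Q = -10*log10(p).
sub solexa_to_phred {
    my ($q_solexa) = @_;
    return 10 * log10(10 ** ($q_solexa / 10) + 1);
}

# phred score -> Solexa score (not defined for phred 0, where p = 1).
sub phred_to_solexa {
    my ($q_phred) = @_;
    return 10 * log10(10 ** ($q_phred / 10) - 1);
}

# Print a few conversions to show where the scales diverge.
printf "Solexa %3d -> phred %.2f\n", $_, solexa_to_phred($_) for (-5, 0, 10, 40);
```

The two scales are nearly identical for high-quality bases and only
diverge noticeably below a score of about 10.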

chris


From chmille4 at gmail.com  Thu Jun 25 10:46:26 2009
From: chmille4 at gmail.com (Chase Miller)
Date: Thu, 25 Jun 2009 10:46:26 -0400
Subject: [Bioperl-l] Bio::LocatableSeq and Annotation vs Feature
Message-ID: <991fb8210906250746v19faa99dy690db524904bbf4a@mail.gmail.com>

Hi all,

Quick question I came across while writing the Bio::Nexml module.

I'm trying to link taxon data to a Bio::LocatableSeq object inside a
Bio::SimpleAlign object.  Bio::SimpleAlign has the ability to add
SeqFeatures, but according to this HowTo (
http://www.bioperl.org/wiki/HOWTO:Feature-Annotation) a feature is
considered to refer to a portion of a sequence, whereas something like taxon
data would refer to the entire sequence and should be handled as an
annotation. However, as far as I can tell Bio::LocatableSeq does not support
annotation objects.
What would be the best way to relate taxon data to a single sequence inside
an alignment?


Thanks,
Chase


From Kevin.M.Brown at asu.edu  Thu Jun 25 11:21:02 2009
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Thu, 25 Jun 2009 08:21:02 -0700
Subject: [Bioperl-l] Extreme newbie question.
In-Reply-To: <99C689C0-B999-45CB-8B99-0CABCC69AB77@refenestration.com>
References: <99C689C0-B999-45CB-8B99-0CABCC69AB77@refenestration.com>
Message-ID: <1A4207F8295607498283FE9E93B775B4060BA013@EX02.asurite.ad.asu.edu>

http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix

Or http://www.bioperl.org/wiki/Getting_BioPerl#Mac_OS_X_using_fink

That error suggests that the install failed, and you need to figure out
why from the install error messages. I suspect you aren't doing the
install as root, but as a normal user who lacks the needed permissions
to change files in certain directories.

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Adlai Burman
> Sent: Wednesday, June 24, 2009 7:09 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Extreme newbie question.
> 
> I have been trying to install BioPerl for a while now and after
> pummeling my hard drive (Mac OS 10.5 Intel) with several attempts at
> Fink installation, a >cpan installation and removing my .cpan folder I
> am still at square 0. I do not want to do any more damage to my
> computer, yet I really need a working install (especially to interface
> with remote DBs like GenBank). Can anyone give me some advice here?
> After each attempt, I have tried to run perldoc bptutorial.pl and
> tried test scripts with "use Bio::Perl" in the headers and I just
> receive error messages like the following:
> 
> Can't locate Bio/Perl.pm in @INC (@INC contains:
> /home/users/dag/lib/perl5/
> /Library/Perl/Updates/5.8.8/darwin-thread-multi-2level
> /Library/Perl/Updates/5.8.8
> /System/Library/Perl/5.8.8/darwin-thread-multi-2level
> /System/Library/Perl/5.8.8
> /Library/Perl/5.8.8/darwin-thread-multi-2level
> /Library/Perl/5.8.8 /Library/Perl
> /Network/Library/Perl/5.8.8/darwin-thread-multi-2level
> /Network/Library/Perl/5.8.8 /Network/Library/Perl
> /System/Library/Perl/Extras/5.8.8/darwin-thread-multi-2level
> /System/Library/Perl/Extras/5.8.8 /Library/Perl/5.8.6
> /Library/Perl/5.8.1 .) at trsh.pl line 1.
> 
> I have been working from the O'Reilly book Mastering Perl for
> Bioinformatics and the INSTALL file and have scoured around the
> BioPerl website and am still stuck.
> 
> Thanks in advance,
> 
> Adlai


From David.Messina at sbc.su.se  Thu Jun 25 12:39:22 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 25 Jun 2009 18:39:22 +0200
Subject: [Bioperl-l] Extreme newbie question.
In-Reply-To: <99C689C0-B999-45CB-8B99-0CABCC69AB77@refenestration.com>
References: <99C689C0-B999-45CB-8B99-0CABCC69AB77@refenestration.com>
Message-ID: <628aabb70906250939l7d1116d0sec9efa2c16235c75@mail.gmail.com>

Hi Adlai,
Did the Bioperl tests run successfully? Did you get the impression that the
installation was successful?

If not, what are the errors you see during the install process?

I ask because the error you included in your message is not necessarily
indicative of a failed installation (it could just be a path issue).

By the way, as I think is indicated somewhere in the installation
instructions, you don't actually need to install Bioperl to use most of its
functionality. Simply having the Bio/ directory in your PERL5LIB path is
enough.
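
A quick way to tell a path issue from a failed install is to check
whether Perl can see the module at all. The sketch below uses core Perl
only; the helper name find_module is hypothetical, and the directory in
the comment is a placeholder for wherever the BioPerl distribution was
unpacked:

```perl
use strict;
use warnings;

# To add an unpacked BioPerl tree to the search path, either set PERL5LIB
# in the shell before running Perl, or add a line like this (placeholder
# path, adjust to your system):
#   use lib '/path/to/bioperl';

# Return the first file in @INC that provides the given module, or undef.
sub find_module {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # Bio::Perl -> Bio/Perl.pm
    for my $dir (@INC) {
        next if ref $dir;                      # skip code-ref hooks in @INC
        return "$dir/$file" if -e "$dir/$file";
    }
    return;
}

my $path = find_module('Bio::Perl');
my $msg  = defined $path
    ? "Bio::Perl found at $path\n"
    : "Bio::Perl not found; the directory containing Bio/ is not in \@INC\n";
print $msg;
```

If this reports "not found" even though the files exist on disk, the
problem is most likely the search path rather than the installation.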


Dave


From cjfields at illinois.edu  Thu Jun 25 13:02:48 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 25 Jun 2009 12:02:48 -0500
Subject: [Bioperl-l] Bio::LocatableSeq and Annotation vs Feature
In-Reply-To: <991fb8210906250746v19faa99dy690db524904bbf4a@mail.gmail.com>
References: <991fb8210906250746v19faa99dy690db524904bbf4a@mail.gmail.com>
Message-ID: <3149D8E9-F145-4438-973E-3728575F436E@illinois.edu>

On Jun 25, 2009, at 9:46 AM, Chase Miller wrote:

> Hi all,
>
> Quick question I came across while writing the Bio::Nexml module.
>
> I'm trying to link taxon data to a Bio::LocatableSeq object inside a
> Bio::SimpleAlign object.  Bio::SimpleAlign has the ability to add
> SeqFeatures, but according to this HowTo (
> http://www.bioperl.org/wiki/HOWTO:Feature-Annotation) a feature is
> considered to refer to a portion of a sequence, whereas something  
> like taxon
> data would refer to the entire sequence and should be handled as an
> annotation. However, as far as I can tell Bio::LocatableSeq does not  
> support
> annotation objects.
> What would be the best way to relate taxon data to a single sequence  
> inside
> an alignment?
>
> Thanks,
> Chase

 From working with feature/annotation-rich alignment formats such as  
stockholm I found this is one of the areas for Align that needs some  
rethinking. One way to work around this w/o major refactoring is to  
have a full-length SeqFeature (pointing to the proper LocatableSeq)  
that stores the Bio::Annotation.  I don't necessarily like that  
approach as a long-term solution, though, as it's a little hacky and  
indirect, but it might get you started (just mark it as TODO so we can  
catch it at some point).

For a long-term solution I don't think the answer is as simple as  
making LocatableSeq Bio::AnnotatableI; that would not be congruent  
with the PrimarySeq implementation (which is not AnnotatableI).   
LocatableSeq is supposed to represent a simple PrimarySeq that can be  
mapped to other sequences via start/end/strand, and thus inherits from  
both Bio::PrimarySeq (note lack of 'I') and RangeI.

Three options:
1) Bio::Seq could be refactored to handle both Bio::PrimarySeq and  
Bio::LocatableSeq, and SimpleAlign reworked to allow any simple RangeI.
2) Bio::PrimarySeq can be AnnotatableI (Bio::Seq would delegate to the  
PrimarySeq AnnotationCollection).
3) All AnnotationI need to be linked back to the PrimarySeqI somehow  
e.g. features.

I personally think option #2 is easiest, as this means anything that  
is-a PrimarySeq is also AnnotatableI, and it might not break past  
scripts.  Not sure how this would affect overall performance though.

chris


From me at miguel.weapps.com  Thu Jun 25 10:09:29 2009
From: me at miguel.weapps.com (=?ISO-8859-1?Q?Luis_Miguel_Rodr=EDguez?=)
Date: Thu, 25 Jun 2009 16:09:29 +0200
Subject: [Bioperl-l] Fwd: Hackathon tomorrow (I think)
In-Reply-To: <02C22CF3-4AAD-4B4D-8F54-2999998A34E0@illinois.edu>
References: <8F66D28E-CD28-4A8D-83B2-0FFCA611F58F@jays.net>
	<54655E2F-0D4B-4F4D-8608-13BB052FE201@jays.net>
	<1a0c1b750906240913g5e113dadxe3d98d3fce3a74d@mail.gmail.com>
	<F01E3C47-6651-488C-91F6-73F42A5A65F4@illinois.edu>
	<FEF61DB9-34A6-4884-BCAF-1CAC83E809F1@jays.net>
	<20E946BE-D9E1-4815-991A-7817E0655879@illinois.edu>
	<COL107-W17352BCFE4B6CE766EF2C3CE340@phx.gbl>
	<02C22CF3-4AAD-4B4D-8F54-2999998A34E0@illinois.edu>
Message-ID: <94da4c880906250709j7b2cb78dk77710bd43e20fd42@mail.gmail.com>

Dear all,
Is there a way to run muscle silently via
Bio::Tools::Run::Alignment::Muscle?

Cheers,

-- 
Luis M. Rodriguez-R
[http://bioinf.uniandes.edu.co/~miguel/]
---------------------------------
Unidad de Bioinformática del Laboratorio de Micología y Fitopatología
Universidad de Los Andes, Colombia
[http://bioinf.uniandes.edu.co]

+ 57 1 3394949 ext 2619
lmrodriguezr at gmail.com
me at miguel.weapps.com


From chmille4 at gmail.com  Thu Jun 25 13:57:25 2009
From: chmille4 at gmail.com (Chase Miller)
Date: Thu, 25 Jun 2009 13:57:25 -0400
Subject: [Bioperl-l] Bio::LocatableSeq and Annotation vs Feature
In-Reply-To: <3149D8E9-F145-4438-973E-3728575F436E@illinois.edu>
References: <991fb8210906250746v19faa99dy690db524904bbf4a@mail.gmail.com> 
	<3149D8E9-F145-4438-973E-3728575F436E@illinois.edu>
Message-ID: <991fb8210906251057i25bbe511r84f5d1319f191421@mail.gmail.com>

Ok, I'll use the full length SeqFeature for now and mark it with a TODO.
 Thanks for the help.
Chase



From Kevin.M.Brown at asu.edu  Thu Jun 25 14:54:19 2009
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Thu, 25 Jun 2009 11:54:19 -0700
Subject: [Bioperl-l] Extreme newbie question.
In-Reply-To: <93BFA04E-D4DB-41B3-AAF6-56CA709980B6@refenestration.com>
References: <99C689C0-B999-45CB-8B99-0CABCC69AB77@refenestration.com>
	<1A4207F8295607498283FE9E93B775B4060BA013@EX02.asurite.ad.asu.edu>
	<93BFA04E-D4DB-41B3-AAF6-56CA709980B6@refenestration.com>
Message-ID: <1A4207F8295607498283FE9E93B775B4060BA08F@EX02.asurite.ad.asu.edu>

Please keep your replies on the list. 

> -----Original Message-----
> From: Adlai Burman [mailto:adlai at refenestration.com] 
> Sent: Thursday, June 25, 2009 11:39 AM
> To: Kevin Brown
> Subject: Re: [Bioperl-l] Extreme newbie question.
> 
> Thanks, Kevin.
> I did install everything using sudo. I will try again and pay  
> attention to the error log. I hope I did not introduce any conflicts  
> or weird path problems.
> 
> Adlai


From adlai at refenestration.com  Thu Jun 25 14:59:10 2009
From: adlai at refenestration.com (Adlai Burman)
Date: Thu, 25 Jun 2009 20:59:10 +0200
Subject: [Bioperl-l] Extreme newbie question.
In-Reply-To: <1A4207F8295607498283FE9E93B775B4060BA013@EX02.asurite.ad.asu.edu>
References: <99C689C0-B999-45CB-8B99-0CABCC69AB77@refenestration.com>
	<1A4207F8295607498283FE9E93B775B4060BA013@EX02.asurite.ad.asu.edu>
Message-ID: <E840DA6D-6AE4-42A2-9380-3647658E3C6E@refenestration.com>

Hey again, I'm right into trying to install again and I now get a new  
error:

Client not fully configured, please proceed with configuring.
  o conf init urllist

any ideas?

Adlai



From cjfields at illinois.edu  Thu Jun 25 16:07:44 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 25 Jun 2009 15:07:44 -0500
Subject: [Bioperl-l] Extreme newbie question.
In-Reply-To: <E840DA6D-6AE4-42A2-9380-3647658E3C6E@refenestration.com>
References: <99C689C0-B999-45CB-8B99-0CABCC69AB77@refenestration.com>
	<1A4207F8295607498283FE9E93B775B4060BA013@EX02.asurite.ad.asu.edu>
	<E840DA6D-6AE4-42A2-9380-3647658E3C6E@refenestration.com>
Message-ID: <F3802595-7617-4CD5-AC8A-2B67069BE001@illinois.edu>

That means you should type 'o conf init urllist' within the cpan  
shell (again, this requires sudo).

chris

On Jun 25, 2009, at 1:59 PM, Adlai Burman wrote:

> Hey again, I'm right into trying to install again and I now get a  
> new error:
>
> Client not fully configured, please proceed with configuring.
> o conf init urllist
>
> any ideas?
>
> Adlai


From bix at sendu.me.uk  Thu Jun 25 16:19:07 2009
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 25 Jun 2009 21:19:07 +0100
Subject: [Bioperl-l] Extreme newbie question.
In-Reply-To: <E840DA6D-6AE4-42A2-9380-3647658E3C6E@refenestration.com>
References: <99C689C0-B999-45CB-8B99-0CABCC69AB77@refenestration.com>	<1A4207F8295607498283FE9E93B775B4060BA013@EX02.asurite.ad.asu.edu>
	<E840DA6D-6AE4-42A2-9380-3647658E3C6E@refenestration.com>
Message-ID: <4A43DBBB.2050109@sendu.me.uk>

Adlai Burman wrote:
> Hey again, I'm right into trying to install again and I now get a new 
> error:
> 
> Client not fully configured, please proceed with configuring.
>  o conf init urllist

Run cpan and do as it says.


From cjm at berkeleybop.org  Thu Jun 25 20:32:05 2009
From: cjm at berkeleybop.org (Chris Mungall)
Date: Thu, 25 Jun 2009 17:32:05 -0700
Subject: [Bioperl-l] experimental IO module for exporting OWL from GFF
Message-ID: <CE668EC6-1ED5-4BD8-868C-729031724A09@berkeleybop.org>


I've written a module Bio::FeatureIO::seqont_owl, which generates  
Sequence Ontology compliant RDF/OWL. This will allow for example  
loading of GFF into triplestores and inference using OWL reasoners.

- It's experimental, fairly incomplete, and subject to change
- Relies on an experimental extension of SO
- Probably of interest to a minority of bp users
- It's not yet fully documented (but there will be a paper)
- It doesn't introduce any additional dependencies (all done via  
XML::Writer, which is already a dependency)
- Doesn't otherwise impinge on existing code

I'd like to get this under source control. Is the appropriate place  
for this:

- HEAD
- a branch
- bioperl-dev
- a separate repository

?

Cheers
Chris


From maj at fortinbras.us  Thu Jun 25 21:08:43 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 25 Jun 2009 21:08:43 -0400
Subject: [Bioperl-l] experimental IO module for exporting OWL from GFF
In-Reply-To: <CE668EC6-1ED5-4BD8-868C-729031724A09@berkeleybop.org>
References: <CE668EC6-1ED5-4BD8-868C-729031724A09@berkeleybop.org>
Message-ID: <7DFBF7E8A6DB45F3AB836EBAFBE597AC@NewLife>

This sounds very Dev to me. Also cool.
MAJ


From cjfields at illinois.edu  Thu Jun 25 21:35:06 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 25 Jun 2009 20:35:06 -0500
Subject: [Bioperl-l] experimental IO module for exporting OWL from GFF
In-Reply-To: <7DFBF7E8A6DB45F3AB836EBAFBE597AC@NewLife>
References: <CE668EC6-1ED5-4BD8-868C-729031724A09@berkeleybop.org>
	<7DFBF7E8A6DB45F3AB836EBAFBE597AC@NewLife>
Message-ID: <12F203C3-689B-423E-9691-86EB1D500A7D@illinois.edu>

I agree.  Just to note, FeatureIO (even though it's in core) will be  
reworked at some future point to simplify it (and will likely move  
away from Bio::SeqFeature::Annotated).

chris



From rmb32 at cornell.edu  Fri Jun 26 00:27:55 2009
From: rmb32 at cornell.edu (Robert Buels)
Date: Thu, 25 Jun 2009 21:27:55 -0700
Subject: [Bioperl-l] BioPerl hackathon, hooray!
Message-ID: <4A444E4B.2000808@cornell.edu>

I'm pleased to announce a thoroughly climactic conclusion to the 
YAPC::NA 2009 BioPerl hackathon.

Between Jay Hannah (jhannah) and myself (rbuels), plus #bioperl virtual 
participant Bruno Vecchi (brunov), we SMASHED the HECK out of 6 bugs in 
the BioPerl Bugzilla.

Many thanks to the participants, let's do it again next year!

Rob


From jay at jays.net  Fri Jun 26 00:54:31 2009
From: jay at jays.net (Jay Hannah)
Date: Fri, 26 Jun 2009 00:54:31 -0400
Subject: [Bioperl-l] [yapc] BioPerl hackathon, hooray!
In-Reply-To: <4A444E4B.2000808@cornell.edu>
References: <4A444E4B.2000808@cornell.edu>
Message-ID: <E316C92D-870D-4AFC-88EC-3897B351D901@jays.net>

On Jun 26, 2009, at 12:27 AM, Robert Buels wrote:
> I'm pleased to announce a thoroughly climactic conclusion to the  
> YAPC::NA 2009 BioPerl hackathon.

Feel free to check our work:

    http://github.com/rbuels/bioperl-live

:)

j
http://www.bioperl.org/wiki/User:Jhannah


From rahall2 at ualr.edu  Fri Jun 26 02:28:05 2009
From: rahall2 at ualr.edu (Roger Hall)
Date: Fri, 26 Jun 2009 01:28:05 -0500
Subject: [Bioperl-l] Random nucleotide string generator?
Message-ID: <fc2dd7b3461f.4a442425@ualr.edu>

All,
 
Is there a random generator for creating nucleotides (of length l with composition frequencies a, c, g, and t) in there somewhere? 
 
I noticed a thread about it from 2000 and nothing since (searching for "random sequence").
 
If not - what should the namespace be for such a module should it be undone and desirable? 
 
TIA!
 
Roger 
 
 
From David.Messina at sbc.su.se  Fri Jun 26 06:15:04 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 26 Jun 2009 12:15:04 +0200
Subject: [Bioperl-l] Random nucleotide string generator?
In-Reply-To: <fc2dd7b3461f.4a442425@ualr.edu>
References: <fc2dd7b3461f.4a442425@ualr.edu>
Message-ID: <628aabb70906260315o2a3fbaem19ef24b8594649e3@mail.gmail.com>

The Bioperl solution piggybacks on EMBOSS. See Chris Fields' comment on this
post from Neil Saunders' blog:
http://nsaunders.wordpress.com/2007/07/25/howto-generate-random-sequences-using-emboss-and-perl/


You can also do this outside of BioPerl using shuffle from Sean Eddy's SQUID
package, available here:
ftp://selab.janelia.org/pub/software/squid/

> If not - what should the namespace be for such a module should it be undone
> and desirable?


Perhaps add it to Bio::SeqUtils?


Dave


From David.Messina at sbc.su.se  Fri Jun 26 07:37:44 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 26 Jun 2009 13:37:44 +0200
Subject: [Bioperl-l] [yapc] BioPerl hackathon, hooray!
In-Reply-To: <E316C92D-870D-4AFC-88EC-3897B351D901@jays.net>
References: <4A444E4B.2000808@cornell.edu>
	<E316C92D-870D-4AFC-88EC-3897B351D901@jays.net>
Message-ID: <628aabb70906260437r18fc7543oc05761241fe810ff@mail.gmail.com>

Awesome, great work guys!
Thanks so much.


Dave


From David.Messina at sbc.su.se  Fri Jun 26 08:58:20 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 26 Jun 2009 14:58:20 +0200
Subject: [Bioperl-l]  Random nucleotide string generator?
In-Reply-To: <1a0c1b750906260544u6a945ce1s652de19b8b27c615@mail.gmail.com>
References: <fc2dd7b3461f.4a442425@ualr.edu>
	<628aabb70906260315o2a3fbaem19ef24b8594649e3@mail.gmail.com> 
	<1a0c1b750906260544u6a945ce1s652de19b8b27c615@mail.gmail.com>
Message-ID: <628aabb70906260558k585f6700ycef271e7f26dd1a3@mail.gmail.com>

[Forwarding Bruno's reply.... -Dave]
---------- Forwarded message ----------
From: Bruno Vecchi <vecchi.b at gmail.com>
Date: Fri, Jun 26, 2009 at 14:44
Subject: Re: [Bioperl-l] Random nucleotide string generator?
To: Dave Messina <David.Messina at sbc.su.se>


Here's a little script that I used for a somewhat related task. It produces
a randomized version of an input sequence (thus keeping the original's
composition). Maybe you could adjust it to your needs; providing an input
sequence with the desired length and composition you should get what you
want.

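#!perl
use strict;
use warnings;
use List::Util qw(shuffle);
use Bio::Seq;
use Bio::SeqIO;

# Usage: shuffle.pl <sequence file> <number of shuffled copies>
my ($seqfile, $number) = @ARGV;
die "Usage: $0 seqfile number\n" unless defined $seqfile and defined $number;

# Input format is guessed from the file extension; output is FASTA on STDOUT
my $in = Bio::SeqIO->new(-file => $seqfile);
my $fh = Bio::SeqIO->newFh(-format => 'fasta');

# Only the first sequence in the input file is used
my $seq = $in->next_seq;
my @chars = split '', $seq->seq;

for my $i (1 .. $number) {
    # Shuffling keeps the composition but randomizes the order
    @chars = shuffle @chars;
    my $new_seq = Bio::Seq->new(-id => $i, -seq => join '', @chars);
    print $fh $new_seq;
}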

You can use it like this from the command line (assuming you want 20 output
sequences):

shuffle.pl input_sequence.fasta 20 > random_sequences.fasta

Bruno.
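
For the original question (a sequence of length l with given a, c, g and
t frequencies, rather than a shuffle of an existing sequence), a minimal
sketch in core Perl might look like this. The function name
random_nucleotides and its argument layout are hypothetical, not an
existing BioPerl API:

```perl
use strict;
use warnings;

# Return a random nucleotide string of the given length. %$freq maps each
# base to its relative frequency; the values need not sum to 1.
sub random_nucleotides {
    my ($length, $freq) = @_;

    # Keep only bases with a positive frequency.
    my @bases = grep { $freq->{$_} > 0 } sort keys %$freq;
    die "need at least one base with a positive frequency\n" unless @bases;

    my $total = 0;
    $total += $freq->{$_} for @bases;

    my $seq = '';
    for (1 .. $length) {
        my $r    = rand($total);           # uniform in [0, $total)
        my $pick = $bases[-1];             # fallback guards against rounding
        for my $base (@bases) {
            if ($r < $freq->{$base}) { $pick = $base; last }
            $r -= $freq->{$base};
        }
        $seq .= $pick;
    }
    return $seq;
}

print random_nucleotides(60, { A => 0.3, C => 0.2, G => 0.2, T => 0.3 }), "\n";
```

The returned string could then be wrapped with Bio::Seq->new(-seq => ...)
in the same way as in the script above. Note that each base is drawn
independently, so the composition of any one sequence only approximates
the requested frequencies.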



From budd at embl-heidelberg.de  Fri Jun 26 04:30:12 2009
From: budd at embl-heidelberg.de (Aidan Budd)
Date: Fri, 26 Jun 2009 10:30:12 +0200 (CEST)
Subject: [Bioperl-l] Random nucleotide string generator?
In-Reply-To: <fc2dd7b3461f.4a442425@ualr.edu>
Message-ID: <Pine.LNX.4.44.0906261028110.14978-100000@bibo.EMBL-Heidelberg.DE>

a non-bioperl option would be to use something external like seq-gen or 
similar - tools designed for outputting "random" sequences simulated over a 
tree - one could simply sample a single simulated sequence at random from 
the output alignment

On Fri, 26 Jun 2009, Roger Hall wrote:

> All,
>  Is there a random generator for creating nucleotides (of length l with
> composition frequencies a, c, g, and t) in there somewhere?
>  
> I noticed a thread about it from 2000 and nothing since (searching for "random sequence").
>  
> If not - what should the namespace be for such a module should it be undone and desirable? 
>  
> TIA!
>  
> Roger 
>  
>  
>  

-- 
----------------------------------------------------------------------
Aidan Budd                                    tel:+49 (0)6221 387 8530
EMBL - European Molecular Biology Laboratory  fax:+49 (0)6221 387 8517
Meyerhofstr. 1, 69117 Heidelberg, Germany

http://www.embl-heidelberg.de/~budd/
http://www-db.embl.de/jss/EmblGroupsHD/per_1807.html


From me at miguel.weapps.com  Fri Jun 26 04:52:46 2009
From: me at miguel.weapps.com (Luis Miguel Rodríguez)
Date: Fri, 26 Jun 2009 10:52:46 +0200
Subject: [Bioperl-l] Random nucleotide string generator?
In-Reply-To: <fc2dd7b3461f.4a442425@ualr.edu>
References: <fc2dd7b3461f.4a442425@ualr.edu>
Message-ID: <94da4c880906260152k3a764951u6ea8a6fdfa3b7f2c@mail.gmail.com>

Dear all, dear Roger,
I'm not sure whether there is such a generator (I think so).  Anyway, if you
flag it as "undone and desirable", please take into account the possibility
of extending the generator to dinucleotides, which is particularly useful
when working with the secondary structure of RNA molecules.
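
One way to read the dinucleotide suggestion is as a first-order Markov
chain: learn from a template sequence how often each base follows each
other base, then generate new sequences from those counts. The sketch below
is only that reading, in plain Perl; the sub names are invented for
illustration, and it reproduces dinucleotide frequencies on average rather
than exactly (it is not a dinucleotide-preserving shuffle).

```perl
use strict;
use warnings;

# Count, for each base, how often each base follows it in $template.
sub transition_counts {
    my ($template) = @_;
    my %count;
    my @chars = split //, lc $template;
    for my $i (0 .. $#chars - 1) {
        $count{ $chars[$i] }{ $chars[$i + 1] }++;
    }
    return \%count;
}

# Pick a key of %$weights with probability proportional to its value.
sub weighted_pick {
    my ($weights) = @_;
    my @keys  = sort keys %$weights;
    my $total = 0;
    $total += $weights->{$_} for @keys;
    my $r = rand($total);
    for my $k (@keys) {
        $r -= $weights->{$k};
        return $k if $r < 0;
    }
    return $keys[-1];                 # only reached through rounding
}

# Hypothetical generator: $length bases following the template's
# base-to-base transition counts.
sub markov_nt {
    my ($template, $length) = @_;
    return '' if $length < 1;
    my $count  = transition_counts($template);
    my @starts = sort keys %$count;
    die "template needs at least two bases\n" unless @starts;

    my $prev = $starts[ int rand @starts ];
    my $seq  = $prev;
    while (length($seq) < $length) {
        # A base seen only at the very end of the template has no
        # successors; in that case restart from a random seen base.
        my $next = $count->{$prev}
            ? weighted_pick($count->{$prev})
            : $starts[ int rand @starts ];
        $seq .= $next;
        $prev = $next;
    }
    return $seq;
}

print markov_nt('acgtacgtacgtacgt', 40), "\n";
```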

Cheers,

On Fri, Jun 26, 2009 at 8:28 AM, Roger Hall <rahall2 at ualr.edu> wrote:

> All,
>
> Is there a random generator for creating nucleotides (of length l with
> composition frequencies a, c, g, and t) in there somewhere?
>
> I noticed a thread about it from 2000 and nothing since (searching for
> "random sequence").
>
> If not - what should the namespace be for such a module should it be undone
> and desirable?
>
> TIA!
>
> Roger
>
>
>


-- 
Luis M. Rodriguez-R
[http://bioinf.uniandes.edu.co/~miguel/]
---------------------------------
Unidad de Bioinformática del Laboratorio de Micología y Fitopatología
Universidad de Los Andes, Colombia
[http://bioinf.uniandes.edu.co]


+ 57 1 3394949 ext 2619
lmrodriguezr at gmail.com
me at miguel.weapps.com


From pri2darshini at gmail.com  Fri Jun 26 06:18:55 2009
From: pri2darshini at gmail.com (priya darshini)
Date: Fri, 26 Jun 2009 15:48:55 +0530
Subject: [Bioperl-l] bioperl installation
Message-ID: <7c569a160906260318t5611fdd8nd536ae5139f5b1d4@mail.gmail.com>

Respected Sir,
I am K. Lakshmi Priya Darshini. My specialization is M.Sc.
bioinformatics. I am interested in learning BioPerl. My operating system is
Windows Vista. I have followed the steps to install BioPerl as given by your
team in the BioPerl tutorial, but I am getting the error message "Begin
failed". Sir, please help me to continue with my installation. I am
using version 5.10 of Perl. Waiting for your reply.

Thanking you.

Regards,
Lakshmi Priya Darshini.


From Jonathan.Moore at warwick.ac.uk  Fri Jun 26 05:55:54 2009
From: Jonathan.Moore at warwick.ac.uk (Moore, Jonathan)
Date: Fri, 26 Jun 2009 10:55:54 +0100
Subject: [Bioperl-l] Getting errors parsing TIGR XML in SeqIO
Message-ID: <7BEB494D4E69964C8292CE4EDA2B9811BBB946@LAUREL.ads.warwick.ac.uk>

I'm trying to parse the TAIR9 Arabidopsis release from the TIGR XML files at the TAIR FTP site.

I've tried SeqIO with both tigr and tigrxml formats but both are giving errors in 1.6.0.  Has anyone advice on whether it's likely to be doable, or should I wait til the .gb files are available?

Jay Moore


From fungazid at yahoo.com  Fri Jun 26 07:59:06 2009
From: fungazid at yahoo.com (Fungazid)
Date: Fri, 26 Jun 2009 04:59:06 -0700 (PDT)
Subject: [Bioperl-l] Bio::Assembly::IO
Message-ID: <57633.49243.qm@web65505.mail.ac4.yahoo.com>


Hello,

I received an ACE file containing a Newbler assembly of 454 cDNA reads, and a corresponding phd.ball file. I was able to view and manipulate the contigs in this assembly using Consed on Linux. Consed required ~1.5GB RAM, and the assembly was loaded within ~2 min. 
I would like to parse the assembly within my code (preferably in Perl, but not necessarily), to fetch all read sequences for each contig, nucleotide quality, alignment to consensus, etc. 
I am trying to use Bio::Assembly::IO, but it eats more than my entire RAM (3GB), and is extremely slow (~1 hour before it crashes).
Maybe you have an idea?
In addition, are you aware of other non-visual parsers of the ACE assembly format for Perl or other languages?
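
One possible way to keep memory down is to stream the ACE file line by line
and hold only one contig's summary at a time. The sketch below is plain
Perl, not Bio::Assembly::IO; the sub name scan_ace and the summary hash
layout are invented for illustration, and it assumes the usual ACE layout
in which a "CO" line gives the contig name, padded length and read count,
and each "AF" line gives a read name, strand and padded start. It ignores
sequences and qualities entirely.

```perl
use strict;
use warnings;

# Read an ACE stream one line at a time and call $on_contig with a
# small summary hash each time a contig is complete.
sub scan_ace {
    my ($fh, $on_contig) = @_;
    my $contig;                       # summary of the contig being read
    while (my $line = <$fh>) {
        if ($line =~ /^CO\s+(\S+)\s+(\d+)\s+(\d+)/) {
            $on_contig->($contig) if $contig;
            $contig = { name => $1, padded_length => $2,
                        expected_reads => $3, reads => [] };
        }
        elsif ($contig && $line =~ /^AF\s+(\S+)\s+([UC])\s+(-?\d+)/) {
            # read name, strand (U = uncomplemented, C = complemented),
            # padded start position on the consensus
            push @{ $contig->{reads} },
                 { name => $1, strand => $2, start => $3 };
        }
    }
    $on_contig->($contig) if $contig;  # flush the last contig
    return;
}

# Tiny made-up ACE fragment so the sketch runs without a real file.
my $demo = <<'ACE';
AS 1 2

CO Contig1 8 2 1 U
ACGT*CGT

BQ
 20 20 20 20 20 20 20

AF read1 U 1
AF read2 C -2
BS 1 8 read1

RD read1 8 0 0
ACGT*CGT

QA 1 8 1 8
ACE

open my $fh, '<', \$demo or die "cannot open in-memory demo: $!";
scan_ace($fh, sub {
    my ($c) = @_;
    printf "%s: %d reads\n", $c->{name}, scalar @{ $c->{reads} };
});
close $fh;
```

Read sequences ("RD" blocks) could be handled the same way, by writing each
one out as soon as it has been read instead of keeping it in memory.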

Many thanks,
funazid   


From cjfields at illinois.edu  Fri Jun 26 13:00:41 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 26 Jun 2009 12:00:41 -0500
Subject: [Bioperl-l] Getting errors parsing TIGR XML in SeqIO
In-Reply-To: <7BEB494D4E69964C8292CE4EDA2B9811BBB946@LAUREL.ads.warwick.ac.uk>
References: <7BEB494D4E69964C8292CE4EDA2B9811BBB946@LAUREL.ads.warwick.ac.uk>
Message-ID: <FEC1932A-49FE-4E63-9727-F08520FF0252@illinois.edu>

If there are errors, this should be submitted as a bug.  You should  
attach example data to the report after filing it (i.e. don't copy and  
paste it into the text box).

http://www.bioperl.org/wiki/Bugs

chris

On Jun 26, 2009, at 4:55 AM, Moore, Jonathan wrote:

> I'm trying to parse the TAIR9 Arabidopsis release from the TIGR XML  
> files at the TAIR FTP site.
>
> I've tried SeqIO with both tigr and tigrxml formats but both are  
> giving errors in 1.6.0.  Has anyone advice on whether it's likely to  
> be doable, or should I wait til the .gb files are available?
>
> Jay Moore
>


From plantboy at gmail.com  Fri Jun 26 14:46:35 2009
From: plantboy at gmail.com (cody h)
Date: Fri, 26 Jun 2009 11:46:35 -0700
Subject: [Bioperl-l] test suite failing on mac os x 10.5
Message-ID: <320708320906261146v2e799c82mc1b921218fc233c5@mail.gmail.com>

Hi,

I'm trying to install bioperl-db 1.5.2 on an intel mac running os 10.5.7.
The Build.PL file executes fine, but the test suite fails dramatically,
returning the error "No database selected" for many of the tests. All the
error calls seem to be originating from line 852 in
BasePersistenceAdaptor.pm. I took a look at the code but I could not figure
out why it wasn't working.

I have bioperl 1.5.2 installed and the biosql schema loaded into my mysql
server. The dependencies all seem to be working, but I haven't used them
enough to completely verify this, so that could be part of the problem. I
don't know which ones to check though. Does anyone have any idea why I might
be getting these "No database selected" errors? Here is a sample of the
error messages given by the ./Build test command (note, this same error is
generated by 15 of the 16 test files):

t/12ontology.t .... 1/738
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: error while executing statement in
Bio::DB::BioSQL::OntologyAdaptor::find_by_unique_key: No database selected
STACK: Error::throw
STACK: Bio::Root::Root::throw /Library/Perl/5.8.8/Bio/Root/Root.pm:359
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key
/Users/cody/Desktop/bioperl-db-1.5.2_100/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:948
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key
/Users/cody/Desktop/bioperl-db-1.5.2_100/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:852
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create
/Users/cody/Desktop/bioperl-db-1.5.2_100/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:182
STACK: Bio::DB::Persistent::PersistentObject::create
/Users/cody/Desktop/bioperl-db-1.5.2_100/blib/lib/Bio/DB/Persistent/PersistentObject.pm:244
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create
/Users/cody/Desktop/bioperl-db-1.5.2_100/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:169
STACK: Bio::DB::Persistent::PersistentObject::create
/Users/cody/Desktop/bioperl-db-1.5.2_100/blib/lib/Bio/DB/Persistent/PersistentObject.pm:244
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create
/Users/cody/Desktop/bioperl-db-1.5.2_100/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:169
STACK: Bio::DB::Persistent::PersistentObject::create
/Users/cody/Desktop/bioperl-db-1.5.2_100/blib/lib/Bio/DB/Persistent/PersistentObject.pm:244
STACK: t/12ontology.t:44
-----------------------------------------------------------
t/12ontology.t .... Dubious, test returned 255 (wstat 65280, 0xff00)


From maj at fortinbras.us  Fri Jun 26 14:50:02 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 26 Jun 2009 14:50:02 -0400
Subject: [Bioperl-l] Fw: Inquiry about a prog written by [MAJ]
Message-ID: <0581B2DAE8514F418127D54407384905@NewLife>

Thought this should be archived to the list. 
MAJ

----- Original Message ----- 
From: Mark A. Jensen 
To: Ross KK Leung 
Sent: Thursday, June 25, 2009 8:46 AM
Subject: Re: Inquiry about a prog written by you


Hi Ross-
Yes, you can specify the recombinants, as "A/C/G[subtype]" in the query string. Unfortunately, the 10000 record limit is imposed by the Los Alamos site that my program accesses. You might be able to work around this if you're willing to write your own script using the BioPerl modules that are the basis for hivq.PLS, by using the modules to perform multiple queries and collecting the entire set of sequences over that series of queries. 
You might look at the documentation for the modules for ideas; try looking at http://www.bioperl.org/wiki/Module:Bio::DB::HIV and http://www.bioperl.org/wiki/Module:Bio::DB::Query::HIVQuery . 
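
The multiple-query workaround could be sketched roughly as follows. This is
only an outline under assumptions: run_query is a stand-in code ref for
whatever call actually fetches the records for one query string, the record
layout (hash refs with an 'id' key) is invented, and the fake data exists
only so the sketch runs without network access.

```perl
use strict;
use warnings;

# Run each query through $run_query (a code ref supplied by the caller,
# returning a list of hash refs with an 'id' key) and merge the results,
# keeping the first copy of any record that turns up more than once.
sub collect_queries {
    my ($run_query, @queries) = @_;
    my (%seen, @all);
    for my $q (@queries) {
        for my $rec ($run_query->($q)) {
            next if $seen{ $rec->{id} }++;
            push @all, $rec;
        }
    }
    return @all;
}

# Made-up results standing in for the real database.
my %fake_db = (
    'B[subtype]'     => [ { id => 'X1' }, { id => 'X2' } ],
    'A/C/G[subtype]' => [ { id => 'X2' }, { id => 'X3' } ],
);

my @records = collect_queries(
    sub { @{ $fake_db{ $_[0] } || [] } },
    'B[subtype]', 'A/C/G[subtype]',
);
print scalar(@records), " unique records\n";
```

Each individual query still has to stay under the site's record limit, so
the queries need to be chosen narrowly enough for that.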
best regards- 
Mark
  ----- Original Message ----- 
  From: Ross KK Leung 
  To: maj at fortinbras.us 
  Sent: Thursday, June 25, 2009 6:09 AM
  Subject: Inquiry about a prog written by you


  Dear Mark A. Jensen,

   
  A google search returns your program (http://code.open-bio.org/svnweb/index.cgi/bioperl/view/bioperl-live/trunk/scripts/DB-HIV/hivq.PLS)

   
  I wonder whether the program is able to search recombinants (e.g. B incl. recombinants) and retrieve results more than 50000 records. This limitation is a bottleneck by the web-based search.

   
  Thanks for your advice, Ross


From rmb32 at cornell.edu  Fri Jun 26 17:06:06 2009
From: rmb32 at cornell.edu (Robert Buels)
Date: Fri, 26 Jun 2009 14:06:06 -0700
Subject: [Bioperl-l] BioPerl at YAPC::2010
In-Reply-To: <33722D87-C60D-4040-AA58-D8D2DCBD5448@jays.net>
References: <OF04ACE416.A66212B4-ON852575E1.0068C346-852575E1.006A8246@lnotes-gw.ent.nwie.net>
	<33722D87-C60D-4040-AA58-D8D2DCBD5448@jays.net>
Message-ID: <4A45383E.40207@cornell.edu>

Reposting to bioperl list.

This is a really giant opportunity to expose some of the best 
technologists in the world to what we do in bioinformatics, and possibly 
to entice some of them to help us the heck out!  ;-)

Rob

On Jun 26, 2009, at 3:22 PM, BAIRH at nationwide.com wrote:
> I am the Columbus.PM YAPC::2010 conference coordinator and I would 
> like to have a "BioPerl" thread at YAPC::NA::2010 at Ohio State 
> University.  Can you offer any lecturer recommendations and could I 
> fill an entire multi day thread with BioPerl lectures?  I would also 
> like to "entice" MJD to come to YAPC with the use of BioPerl.
>
> Thanks for your thoughts.
>
> Heath Bair
> (Candybar)

-- 
Robert Buels
Bioinformatics Analyst, Sol Genomics Network
Boyce Thompson Institute for Plant Research
Tower Rd
Ithaca, NY  14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu


From cain.cshl at gmail.com  Fri Jun 26 17:12:37 2009
From: cain.cshl at gmail.com (Scott Cain)
Date: Fri, 26 Jun 2009 17:12:37 -0400
Subject: [Bioperl-l] BioPerl at YAPC::2010
In-Reply-To: <4A45383E.40207@cornell.edu>
References: <OF04ACE416.A66212B4-ON852575E1.0068C346-852575E1.006A8246@lnotes-gw.ent.nwie.net>
	<33722D87-C60D-4040-AA58-D8D2DCBD5448@jays.net>
	<4A45383E.40207@cornell.edu>
Message-ID: <D2A53AB2-E35A-499B-B81A-13B9D61752CA@gmail.com>

Cool--Columbus is just down the road.  I could give a talk (or even  
multiple talks) on a variety of GMOD topics (which I consider BioPerl  
related, since so much of what we do depends on BioPerl).

Scott

On Jun 26, 2009, at 5:06 PM, Robert Buels wrote:

> Reposting to bioperl list.
>
> This is a really giant opportunity to expose some of the best  
> technologists in the world to what we do in bioinformatics, and  
> possibly to entice some of them to help us the heck out!  ;-)
>
> Rob
>
> On Jun 26, 2009, at 3:22 PM, BAIRH at nationwide.com wrote:
>> I am the Columbus.PM YAPC::2010 conference coordinator and I would  
>> like to have a "BioPerl" thread at YAPC::NA::2010 at Ohio State  
>> University.  Can you offer any lecturer recommendations and could I  
>> fill an entire multi day thread with BioPerl lectures?  I would  
>> also like to "entice" MJD to come to YAPC with the use of BioPerl.
>>
>> Thanks for your thoughts.
>>
>> Heath Bair
>> (Candybar)
>
> -- 
> Robert Buels
> Bioinformatics Analyst, Sol Genomics Network
> Boyce Thompson Institute for Plant Research
> Tower Rd
> Ithaca, NY  14853
> Tel: 503-889-8539
> rmb32 at cornell.edu
> http://www.sgn.cornell.edu

-----------------------------------------------------------------------
Scott Cain, Ph. D. scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research


From cjfields at illinois.edu  Fri Jun 26 17:49:39 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 26 Jun 2009 16:49:39 -0500
Subject: [Bioperl-l] BioPerl at YAPC::2010
In-Reply-To: <4A45383E.40207@cornell.edu>
References: <OF04ACE416.A66212B4-ON852575E1.0068C346-852575E1.006A8246@lnotes-gw.ent.nwie.net>
	<33722D87-C60D-4040-AA58-D8D2DCBD5448@jays.net>
	<4A45383E.40207@cornell.edu>
Message-ID: <642C6C93-8FCD-4463-8A39-E15832F8714C@illinois.edu>

Well, if it's in Columbus I'll be there (I can make a drive out of it).

In short, we should probably get something going, yes. Lots of things  
we can talk about, inc. bioperl6, Bio::Moose, etc.

chris

On Jun 26, 2009, at 4:06 PM, Robert Buels wrote:

> Reposting to bioperl list.
>
> This is a really giant opportunity to expose some of the best  
> technologists in the world to what we do in bioinformatics, and  
> possibly to entice some of them to help us the heck out!  ;-)
>
> Rob
>
> On Jun 26, 2009, at 3:22 PM, BAIRH at nationwide.com wrote:
>> I am the Columbus.PM YAPC::2010 conference coordinator and I would  
>> like to have a "BioPerl" thread at YAPC::NA::2010 at Ohio State  
>> University.  Can you offer any lecturer recommendations and could I  
>> fill an entire multi day thread with BioPerl lectures?  I would  
>> also like to "entice" MJD to come to YAPC with the use of BioPerl.
>>
>> Thanks for your thoughts.
>>
>> Heath Bair
>> (Candybar)
>
> -- 
> Robert Buels
> Bioinformatics Analyst, Sol Genomics Network
> Boyce Thompson Institute for Plant Research
> Tower Rd
> Ithaca, NY  14853
> Tel: 503-889-8539
> rmb32 at cornell.edu
> http://www.sgn.cornell.edu


From hartzell at alerce.com  Fri Jun 26 23:59:10 2009
From: hartzell at alerce.com (George Hartzell)
Date: Fri, 26 Jun 2009 20:59:10 -0700
Subject: [Bioperl-l] BioPerl at YAPC::2010
In-Reply-To: <4A45383E.40207@cornell.edu>
References: <OF04ACE416.A66212B4-ON852575E1.0068C346-852575E1.006A8246@lnotes-gw.ent.nwie.net>
	<33722D87-C60D-4040-AA58-D8D2DCBD5448@jays.net>
	<4A45383E.40207@cornell.edu>
Message-ID: <19013.39182.97468.604560@already.dhcp.gene.com>


This does seem like a great opportunity.  I think you/the-community
could put together at least a day, and maybe more, of Bio and Perl
stuff.  I think that it's important to range beyond the stuff that's
in the BioPerl namespace and pull in something from the Gene Ontology
project, the Ensembl project[s], maybe libbio, etc....

g.

Robert Buels writes:
 > Reposting to bioperl list.
 > 
 > This is a really giant opportunity to expose some of the best 
 > technologists in the world to what we do in bioinformatics, and possibly 
 > to entice some of them to help us the heck out!  ;-)
 > 
 > Rob
 > 
 > On Jun 26, 2009, at 3:22 PM, BAIRH at nationwide.com wrote:
 > > I am the Columbus.PM YAPC::2010 conference coordinator and I would 
 > > like to have a "BioPerl" thread at YAPC::NA::2010 at Ohio State 
 > > University.  Can you offer any lecturer recommendations and could I 
 > > fill an entire multi day thread with BioPerl lectures?  I would also 
 > > like to "entice" MJD to come to YAPC with the use of BioPerl.
 > >
 > > Thanks for your thoughts.
 > >
 > > Heath Bair
 > > (Candybar)
 > 
 > -- 
 > Robert Buels
 > Bioinformatics Analyst, Sol Genomics Network
 > Boyce Thompson Institute for Plant Research
 > Tower Rd
 > Ithaca, NY  14853
 > Tel: 503-889-8539
 > rmb32 at cornell.edu
 > http://www.sgn.cornell.edu


From cjfields at illinois.edu  Sat Jun 27 00:28:14 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 26 Jun 2009 23:28:14 -0500
Subject: [Bioperl-l] BioPerl at YAPC::2010
In-Reply-To: <19013.39182.97468.604560@already.dhcp.gene.com>
References: <OF04ACE416.A66212B4-ON852575E1.0068C346-852575E1.006A8246@lnotes-gw.ent.nwie.net>
	<33722D87-C60D-4040-AA58-D8D2DCBD5448@jays.net>
	<4A45383E.40207@cornell.edu>
	<19013.39182.97468.604560@already.dhcp.gene.com>
Message-ID: <EB3EB763-05F4-4F75-88F5-8A642E567ABA@illinois.edu>

Agree (and should add GMOD/Gbrowse to that as well).

chris

On Jun 26, 2009, at 10:59 PM, George Hartzell wrote:

>
> This does seem like a great opportunity.  I think you/the-community
> could put together at least a day, and maybe more, of Bio and Perl
> stuff.  I think that it's important to range beyond the stuff that's
> in the BioPerl namespace and pull in something from the Gene Ontology
> project, the Ensembl project[s], maybe libbio, etc....
>
> g.
>
> Robert Buels writes:
>> Reposting to bioperl list.
>>
>> This is a really giant opportunity to expose some of the best
>> technologists in the world to what we do in bioinformatics, and  
>> possibly
>> to entice some of them to help us the heck out!  ;-)
>>
>> Rob
>>
>> On Jun 26, 2009, at 3:22 PM, BAIRH at nationwide.com wrote:
>>> I am the Columbus.PM YAPC::2010 conference coordinator and I would
>>> like to have a "BioPerl" thread at YAPC::NA::2010 at Ohio State
>>> University.  Can you offer any lecturer recommendations and could I
>>> fill an entire multi day thread with BioPerl lectures?  I would also
>>> like to "entice" MJD to come to YAPC with the use of BioPerl.
>>>
>>> Thanks for your thoughts.
>>>
>>> Heath Bair
>>> (Candybar)
>>
>> -- 
>> Robert Buels
>> Bioinformatics Analyst, Sol Genomics Network
>> Boyce Thompson Institute for Plant Research
>> Tower Rd
>> Ithaca, NY  14853
>> Tel: 503-889-8539
>> rmb32 at cornell.edu
>> http://www.sgn.cornell.edu


From maj at fortinbras.us  Sat Jun 27 00:56:41 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Sat, 27 Jun 2009 00:56:41 -0400
Subject: [Bioperl-l] BioPerl at YAPC::2010
In-Reply-To: <4A45383E.40207@cornell.edu>
References: <OF04ACE416.A66212B4-ON852575E1.0068C346-852575E1.006A8246@lnotes-gw.ent.nwie.net><33722D87-C60D-4040-AA58-D8D2DCBD5448@jays.net>
	<4A45383E.40207@cornell.edu>
Message-ID: <E6D907E51B8D477FBB635ED4B500C257@NewLife>

I think BioPerl has enough to talk about to have its own conference, 
which would coincide with its 15th anniversary in 2010. That may 
put the kibosh on the original  intent of the inviter, which ultimately is 
to get The Dominus to bite (and more power to her, I say. My 
programming style is forever changed, and I haven't even finished
The Book). 

If someone organizes it, I'll bring the chips and dip.
MAJ
----- Original Message ----- 
From: "Robert Buels" <rmb32 at cornell.edu>
To: "BioPerl List" <bioperl-l at lists.open-bio.org>
Cc: <BAIRH at nationwide.com>
Sent: Friday, June 26, 2009 5:06 PM
Subject: Re: [Bioperl-l] BioPerl at YAPC::2010


> Reposting to bioperl list.
> 
> This is a really giant opportunity to expose some of the best 
> technologists in the world to what we do in bioinformatics, and possibly 
> to entice some of them to help us the heck out!  ;-)
> 
> Rob
> 
> On Jun 26, 2009, at 3:22 PM, BAIRH at nationwide.com wrote:
>> I am the Columbus.PM YAPC::2010 conference coordinator and I would 
>> like to have a "BioPerl" thread at YAPC::NA::2010 at Ohio State 
>> University.  Can you offer any lecturer recommendations and could I 
>> fill an entire multi day thread with BioPerl lectures?  I would also 
>> like to "entice" MJD to come to YAPC with the use of BioPerl.
>>
>> Thanks for your thoughts.
>>
>> Heath Bair
>> (Candybar)
> 
> -- 
> Robert Buels
> Bioinformatics Analyst, Sol Genomics Network
> Boyce Thompson Institute for Plant Research
> Tower Rd
> Ithaca, NY  14853
> Tel: 503-889-8539
> rmb32 at cornell.edu
> http://www.sgn.cornell.edu


From maj at fortinbras.us  Sat Jun 27 01:30:34 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Sat, 27 Jun 2009 01:30:34 -0400
Subject: [Bioperl-l] BioPerl at YAPC::2010
In-Reply-To: <E6D907E51B8D477FBB635ED4B500C257@NewLife>
References: <OF04ACE416.A66212B4-ON852575E1.0068C346-852575E1.006A8246@lnotes-gw.ent.nwie.net><33722D87-C60D-4040-AA58-D8D2DCBD5448@jays.net><4A45383E.40207@cornell.edu>
	<E6D907E51B8D477FBB635ED4B500C257@NewLife>
Message-ID: <B44649FB157145A3BE7153D163802926@NewLife>

[...to *him*, that is...pardon]

----- Original Message ----- 
From: "Mark A. Jensen" <maj at fortinbras.us>
To: "Robert Buels" <rmb32 at cornell.edu>; "BioPerl List" 
<bioperl-l at lists.open-bio.org>
Sent: Saturday, June 27, 2009 12:56 AM
Subject: Re: [Bioperl-l] BioPerl at YAPC::2010


>I think BioPerl has enough to talk about to have its own conference, which 
>would coincide with its 15th anniversary in 2010. That may put the kibosh on 
>the original  intent of the inviter, which ultimately is to get The Dominus to 
>bite (and more power to her, I say. My programming style is forever changed, 
>and I haven't even finished
> The Book).
> If someone organizes it, I'll bring the chips and dip.
> MAJ
> ----- Original Message ----- 
> From: "Robert Buels" <rmb32 at cornell.edu>
> To: "BioPerl List" <bioperl-l at lists.open-bio.org>
> Cc: <BAIRH at nationwide.com>
> Sent: Friday, June 26, 2009 5:06 PM
> Subject: Re: [Bioperl-l] BioPerl at YAPC::2010
>
>
>> Reposting to bioperl list.
>>
>> This is a really giant opportunity to expose some of the best technologists 
>> in the world to what we do in bioinformatics, and possibly to entice some of 
>> them to help us the heck out!  ;-)
>>
>> Rob
>>
>> On Jun 26, 2009, at 3:22 PM, BAIRH at nationwide.com wrote:
>>> I am the Columbus.PM YAPC::2010 conference coordinator and I would like to 
>>> have a "BioPerl" thread at YAPC::NA::2010 at Ohio State University.  Can you 
>>> offer any lecturer recommendations and could I fill an entire multi day 
>>> thread with BioPerl lectures?  I would also like to "entice" MJD to come to 
>>> YAPC with the use of BioPerl.
>>>
>>> Thanks for your thoughts.
>>>
>>> Heath Bair
>>> (Candybar)
>>
>> -- 
>> Robert Buels
>> Bioinformatics Analyst, Sol Genomics Network
>> Boyce Thompson Institute for Plant Research
>> Tower Rd
>> Ithaca, NY  14853
>> Tel: 503-889-8539
>> rmb32 at cornell.edu
>> http://www.sgn.cornell.edu


From kpclancy at hotmail.com  Sat Jun 27 06:04:20 2009
From: kpclancy at hotmail.com (Kevin Clancy)
Date: Sat, 27 Jun 2009 04:04:20 -0600
Subject: [Bioperl-l] Fwd: Hackathon tomorrow (I think)
In-Reply-To: <02C22CF3-4AAD-4B4D-8F54-2999998A34E0@illinois.edu>
References: <8F66D28E-CD28-4A8D-83B2-0FFCA611F58F@jays.net>
	<54655E2F-0D4B-4F4D-8608-13BB052FE201@jays.net>
	<1a0c1b750906240913g5e113dadxe3d98d3fce3a74d@mail.gmail.com>
	<F01E3C47-6651-488C-91F6-73F42A5A65F4@illinois.edu>
	<FEF61DB9-34A6-4884-BCAF-1CAC83E809F1@jays.net>
	<20E946BE-D9E1-4815-991A-7817E0655879@illinois.edu>
	<COL107-W17352BCFE4B6CE766EF2C3CE340@phx.gbl>
	<02C22CF3-4AAD-4B4D-8F54-2999998A34E0@illinois.edu>
Message-ID: <COL107-W978FB7B4A3E98561F84E5CE320@phx.gbl>


I think ismb will be in Boston in 2010 (feels odd just typing that...)

maybe that is enough of a running start to set something up.

kevin
 
> CC: jay at jays.net; vecchi.b at gmail.com; bioperl-l at bioperl.org
> From: cjfields at illinois.edu
> To: kpclancy at hotmail.com
> Subject: Re: [Bioperl-l] Fwd: Hackathon tomorrow (I think)
> Date: Wed, 24 Jun 2009 22:54:28 -0500
> 
> I have no idea; I don't think there are many bioperl devs attending 
> this year unfortunately. Any meetings in the next year where we could 
> set up a bioperl hackathon? I will likely be available to attend if 
> it's stateside...
> 
> chris
> 
> On Jun 24, 2009, at 9:31 PM, Kevin Clancy wrote:
> 
> >
> > is there an intention to have a hackathon at ISMB this weekend - I 
> > know there is a 2 day BOSC
> > kevin
> >
> >> From: cjfields at illinois.edu
> >> To: jay at jays.net
> >> Date: Wed, 24 Jun 2009 16:10:34 -0500
> >> CC: vecchi.b at gmail.com; bioperl-l at bioperl.org
> >> Subject: Re: [Bioperl-l] Fwd: Hackathon tomorrow (I think)
> >>
> >>
> >> On Jun 24, 2009, at 11:44 AM, Jay Hannah wrote:
> >>
> >>> On Jun 24, 2009, at 12:26 PM, Chris Fields wrote:
> >>>> Let me know if anyone needs collab on biomoose on github; Mark
> >>>> Jensen's already added.
> >>>
> >>> Anything on github should be trivial, even with no perms -- we can
> >>> just fork and then send you (whoever) pull requests. github++ :)
> >>>
> >>>> 1) Any help towards bugzilla fixes would be most welcome.
> >>>
> >>> I don't know how to make any progress in bugzilla if no one has a
> >>> commit bit...?
> >>
> >> For some reason I thought you had a commit bit; we can add you in if
> >> needed. Anyway, patches are most definitely welcome ;>
> >>
> >>>> 2) Better GFF3 integration
> >>>> 3) Typed but lightweight seqfeatures
> >>>
> >>> Are there bugzilla tickets (or somewhere) describing those?
> >>
> >> No as the issues are more complex than one single bug, but we do have
> >> something to help track for the time being:
> >>
> >> http://www.bioperl.org/wiki/GFF_Refactor
> >> http://www.bioperl.org/wiki/Align_Refactor
> >>
> >> I'll probably file TODOs during the process for those refactors. The
> >> easiest to tackle would probably be the Align/LocatableSeq refactors.
> >>
> >>> I wonder if anyone can help me get out of sporadic MailMan
> >>> purgatory...
> >>>
> >>> Thanks,
> >>>
> >>> j
> >>
> >> -c
> >>
> >> PS - Don't feel constrained by the above. There are many many areas
> >> to contribute to.
> >>
> >>


From hartzell at alerce.com  Sat Jun 27 13:08:10 2009
From: hartzell at alerce.com (George Hartzell)
Date: Sat, 27 Jun 2009 10:08:10 -0700
Subject: [Bioperl-l] BioPerl at YAPC::2010
In-Reply-To: <E6D907E51B8D477FBB635ED4B500C257@NewLife>
References: <OF04ACE416.A66212B4-ON852575E1.0068C346-852575E1.006A8246@lnotes-gw.ent.nwie.net>
	<33722D87-C60D-4040-AA58-D8D2DCBD5448@jays.net>
	<4A45383E.40207@cornell.edu>
	<E6D907E51B8D477FBB635ED4B500C257@NewLife>
Message-ID: <19014.20986.867646.940277@already.dhcp.gene.com>


I had an eye-opening time at YAPC, and I think that it would be very
powerful to have many members of the Bio & Perl community rubbing
elbows with the folks leading (and following, for that matter) the
"Modern Perl" movement (in the broader sense, not _just_ chromatic):
Moose, DBIx::Class, Dist::Zilla, KiokuDB, etc....  I think that it
would help pull BioPerl and the others towards powerful mainstream
technologies and expose many of us to new people, tricks, and tools.
Having us off on our own, or mingling with ISMB'ers, doesn't really
stir the pot.

g.


Mark A. Jensen writes:
 > I think BioPerl has enough to talk about to have its own conference, 
 > which would coincide with its 15th anniversary in 2010. That may 
 > put the kibosh on the original  intent of the inviter, which ultimately is 
 > to get The Dominus to bite (and more power to her, I say. My 
 > programming style is forever changed, and I haven't even finished
 > The Book). 
 > 
 > If someone organizes it, I'll bring the chips and dip.
 > MAJ
 > ----- Original Message ----- 
 > From: "Robert Buels" <rmb32 at cornell.edu>
 > To: "BioPerl List" <bioperl-l at lists.open-bio.org>
 > Cc: <BAIRH at nationwide.com>
 > Sent: Friday, June 26, 2009 5:06 PM
 > Subject: Re: [Bioperl-l] BioPerl at YAPC::2010
 > 
 > 
 > > Reposting to bioperl list.
 > > 
 > > This is a really giant opportunity to expose some of the best 
 > > technologists in the world to what we do in bioinformatics, and possibly 
 > > to entice some of them to help us the heck out!  ;-)
 > > 
 > > Rob
 > > 
 > > On Jun 26, 2009, at 3:22 PM, BAIRH at nationwide.com wrote:
 > >> I am the Columbus.PM YAPC::2010 conference coordinator and I would 
 > >> like to have a "BioPerl" thread at YAPC::NA::2010 at Ohio State 
 > >> University.  Can you offer any lecturer recommendations and could I 
 > >> fill an entire multi day thread with BioPerl lectures?  I would also 
 > >> like to "entice" MJD to come to YAPC with the use of BioPerl.
 > >>
 > >> Thanks for your thoughts.
 > >>
 > >> Heath Bair
 > >> (Candybar)
 > > 
 > > -- 
 > > Robert Buels
 > > Bioinformatics Analyst, Sol Genomics Network
 > > Boyce Thompson Institute for Plant Research
 > > Tower Rd
 > > Ithaca, NY  14853
 > > Tel: 503-889-8539
 > > rmb32 at cornell.edu
 > > http://www.sgn.cornell.edu
 > > _______________________________________________
 > > Bioperl-l mailing list
 > > Bioperl-l at lists.open-bio.org
 > > http://lists.open-bio.org/mailman/listinfo/bioperl-l


From richard.harrison at edinburgh.ac.uk  Mon Jun 29 18:43:54 2009
From: richard.harrison at edinburgh.ac.uk (Richard Harrison)
Date: Mon, 29 Jun 2009 23:43:54 +0100
Subject: [Bioperl-l] PopGen
Message-ID: <5FBB6056-386D-42E3-8236-1FEB8F5BE520@edinburgh.ac.uk>

Dear all,

I am having trouble with the PopGen modules and I was wondering if  
anyone had any ideas.

I am working with polymorphism data. I am trying to identify the  
derived vs ancestral allele between two species. I have been modifying  
the modules a bit to include different site models etc.  Here is where  
I fall over:

Within aln_to_population I can create a modified Genotype object to  
include details of the ancestral allele (see at end of this post).

However,  the problem that I have hit upon is that aln_to_population  
returns a population object, filled with IndividualI objects.  In  
other words, it takes my array of GenotypeI objects and converts them  
into IndividualI objects, wrapped in a single Population object.  This  
means that the information in the GenotypeI object about the ancestral/ 
derived states is lost. How can I overcome this?


Thanks,
Richard


###excerpt from aln_to_population


  $inds[$i]->add_Genotype(Bio::PopGen::Genotype->new
					   (-marker_name  => $nm,
					    -individual_id=> $inds[$i]->unique_id,
					    -alleles      => [$genotypes[$i]],
					    -outgroup      => $outgroup[0]));


###excerpt from Genotypes.pm

sub new {
   my($class,@args) = @_;

   my $self = $class->SUPER::new(@args);
   my ($name,$desc,$type,$uid,$af,$og) = $self->_rearrange([qw(NAME
							  DESCRIPTION
							  TYPE
							  UNIQUE_ID
							  ALLELE_FREQ
							  OUTGROUP)],@args);
   $self->{'_allele_freqs'} = {};
   $self->{'_outgroup_name'} = {};

   if( ! defined $uid ) {
       $uid = $UniqueCounter++;
   }
   if( defined $name) {
       $self->name($name);
   } else {
       $self->throw("Must provide a name when initializing a Marker");
   }
   defined $desc && $self->description($desc);
   defined $type && $self->type($type);


       $self->outgroup_name($og);


   $self->unique_id($uid);

   return $self;
}

=head2 outgroup_name

  Title   : outgroup_name
  Usage   : my $og = $marker->outgroup_name();
  Function: Get/Set the name of the outgroup
  Returns : string representing the name of the outgroup
  Args    : [optional] name


=cut

sub outgroup_name{
     my $self = shift;

     return $self->{'_outgroup_name'} = shift if @_;
     return $self->{'_outgroup_name'};
}


-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.


From jason at bioperl.org  Tue Jun 30 01:03:08 2009
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 29 Jun 2009 22:03:08 -0700
Subject: [Bioperl-l] Getting errors parsing TIGR XML in SeqIO
In-Reply-To: <7BEB494D4E69964C8292CE4EDA2B9811BBB946@LAUREL.ads.warwick.ac.uk>
References: <7BEB494D4E69964C8292CE4EDA2B9811BBB946@LAUREL.ads.warwick.ac.uk>
Message-ID: <E6D82027-AF55-4E64-BC8F-71F3F60D0E7E@bioperl.org>

There are several flavors of TIGR XML for rice, Arabidopsis and other
projects. Unfortunately I don't know which one the current tigrxml
parser tracks, but you can compare the test files in t/data with the
versions you downloaded to see what is currently supported. Usually
the gbk will be more consistently parseable, but we can try to work it
out if it is a sensible transformation.

On Jun 26, 2009, at 2:55 AM, Moore, Jonathan wrote:

> I'm trying to parse the TAIR9 Arabidopsis release from the TIGR XML  
> files at the TAIR FTP site.
>
> I've tried SeqIO with both tigr and tigrxml formats but both are  
> giving errors in 1.6.0.  Has anyone advice on whether it's likely to  
> be doable, or should I wait til the .gb files are available?
>
> Jay Moore

--
Jason Stajich
jason at bioperl.org


From paola.bisignano at gmail.com  Tue Jun 30 05:12:49 2009
From: paola.bisignano at gmail.com (Paola Bisignano)
Date: Tue, 30 Jun 2009 11:12:49 +0200
Subject: [Bioperl-l] Bioperl-l Digest, Vol 74, Issue 25
In-Reply-To: <mailman.29.1246204805.13888.bioperl-l@lists.open-bio.org>
References: <mailman.29.1246204805.13888.bioperl-l@lists.open-bio.org>
Message-ID: <e9cf89740906300212jeac3fe6o2fa4414bb427f824@mail.gmail.com>

Hi,
I need a little help parsing a file. I searched the BioPerl modules,
but there are so many that I don't know where to start. I found
modules for all sorts of databases and web sites, but not for my
favorite, PDBsum, so I have parsed a lot of things on my own, even
though I am new to Perl. Now I need help: I have to parse a FASTA
output file produced from aligned sequences, and I need to extract the
aligned sequences only for the PDB entries in my list.


My FASTA file looks like this:

Query: /ebi/research/thornton/tmp/sas307986/seq.fasta
  1>>>Sequence 3e7e:A - 333 aa
Library: /ebi/research/thornton/www/databases/html/pdbsum/data/pdblib
17840403 residues in 79353 sequences

       opt      E()
< 20   286     0:===
  22     1     0:=          one = represents 135 library sequences
  24     1     0:=
  26     0     2:*
  28    21    18:*
  30    36   109:*
  32   237   421:== *
  34   956  1140:========*
  36  1924  2342:===============  *
  38  3591  3871:=========================== *
  40  4904  5400:=====================================  *
  42  6750  6600:================================================*=
  44  7145  7281:=====================================================*
  46  8047  7416:======================================================*=====
.........

>>2np8:A                                                  (159 aa)
 initn: 125 init1:  72 opt: 136  Z-score: 168.6  bits: 38.5 E(): 0.011
Smith-Waterman score: 136; 26.0% identity (57.1% similar) in 154 aa
overlap (59-204:13-153)

               10        20        30        40        50        60
Sequen NFIVGNPWDDKLIFKLLSGLSKPVSSYPNTFEWQCKLPAIKPKTEFQLGSKLVYVHHLLG
                                                                 ::
2np8:A                                               QWALEDFEIGRPLG
                                                             10

               70          80        90         100        110
Sequen EGAFAQVYEATQNKQKFVL--KVQKPANPWEFYIGTQLMER--LKPSMQH-MFMKFYSAH
       .: :..:: : ....::.:  ::   :.  .  .  :: ..  ..  ..:  ....:.
2np8:A KGKFGNVYLAREKQSKFILALKVLFKAQLEKAGVEHQLRREVEIQSHLRHPNILRLYG--
           20        30        40        50        60        70

         120         130       140       150       160       170
Sequen LFQNGS--VLVGELYSYGTLLNAINLYKNTPEKVMPQGLVISFAMRMLYMIEQVHDCEII
        :....   :. :    ::.   ..  ..  :.      . ..  ..   .   :. ..:
2np8:A YFHDATRVYLILEYAPLGTVYRELQKLSKFDEQR-----TATYITELANALSYCHSKRVI
             80        90       100            110       120

           180       190        200       210       220       230
Sequen HGDIKPDNFILGNGFLEQSAG-LALIDLGQSIDMKLFPKGTIFTAKCETSGFQCVEMLSN
       : ::::.:..::      ::: : . :.: :.
2np8:A HRDIKPENLLLG------SAGELKIADFGWSVHAPSSR
       130             140       150

            240       250       260       270       280       290
Sequen KPWNYQIDYFGVAATVYCMLFGTYMKVKNEGGECKPEGLFRRLPHLDMWNEFFHVMLNIP

            300       310       320       330
Sequen DCHHLPSLDLLRQKLKKVFQQHYTNKIRALRNRLIVLLLEC

>>2ojg:A                                                  (337 aa)
 initn:  85 init1:  53 opt: 140  Z-score: 168.1  bits: 39.5 E(): 0.012
Smith-Waterman score: 140; 20.3% identity (56.2% similar) in 217 aa
overlap (46-252:1-204)

               10        20        30        40        50        60
Sequen NFIVGNPWDDKLIFKLLSGLSKPVSSYPNTFEWQCKLPAIKPKTEFQLGSKLVYVHHLLG
                                                    :..: . . . .. :
2ojg:A                                              FDVGPRYTNLSYI-G
                                                            10

               70        80        90        100       110
Sequen EGAFAQVYEATQNKQKFVLKVQKPANPWEFYIGTQ-LMERLKPSMQHMFMKFYSAHLFQN
       :::...:  : .: .:  . ..:  .:.:     :  ....:     ....:   ...
2ojg:A EGAYGMVCSAYDNVNKVRVAIKK-ISPFEHQTYCQRTLREIK-----ILLRFRHENIIGI
           20        30         40        50             60

     120              130       140       150       160       170
Sequen GSVL-------VGELYSYGTLLNAINLYKNTPEKVMPQGLVISFAMRMLYMIEQVHDCEI
       ....       . ..:    :... .:::    . . .  .  : ...:  .. .:. ..
2ojg:A NDIIRAPTIEQMKDVYIVQDLMET-DLYKLLKTQHLSNDHICYFLYQILRGLKYIHSANV
       70        80        90        100       110       120

            180       190       200        210       220        230
Sequen IHGDIKPDNFILGNGFLEQSAGLALIDLGQS-IDMKLFPKGTIFTAKCETSGFQCVE-ML
       .: :.::.:..:..     .  : . :.: . .      .  ..:    :  ..  : ::
2ojg:A LHRDLKPSNLLLNT-----TCDLKICDFGLARVADPDHDHTGFLTEYVATRWYRAPEIML
       130       140            150       160       170       180

              240       250       260       270       280       290
Sequen SNKPWNYQIDYFGVAATVYCMLFGTYMKVKNEGGECKPEGLFRRLPHLDMWNEFFHVMLN
       ..: .. .:: ..:.  .  ::
2ojg:A NSKGYTKSIDIWSVGCILAEMLSNRPIFPGKHYLDQLNHILGILGSPSQEDLNCIINLKA
            190       200       210       220       230       240

              300       310       320       330
Sequen IPDCHHLPSLDLLRQKLKKVFQQHYTNKIRALRNRLIVLLLEC

2ojg:A RNYLLSLPHKNKVPWNRLFPNADSKALDLLDKMLTFNPHKRIEVEQALAHPYLEQYYDPS
            250       260       270       280       290       300

2ojg:A DEPIAEAPFKFELDDLPKEKLKELIFEETARFQPG
            310       320       330

>>2oji:A                                                  (344 aa)
 initn:  85 init1:  53 opt: 140  Z-score: 168.0  bits: 39.5 E(): 0.012
Smith-Waterman score: 140; 20.3% identity (56.2% similar) in 217 aa
overlap (46-252:5-208)

               10        20        30        40        50        60
Sequen NFIVGNPWDDKLIFKLLSGLSKPVSSYPNTFEWQCKLPAIKPKTEFQLGSKLVYVHHLLG
                                                    :..: . . . .. :
2oji:A                                          RGQVFDVGPRYTNLSYI-G
                                                        10

               70        80        90        100       110
Sequen EGAFAQVYEATQNKQKFVLKVQKPANPWEFYIGTQ-LMERLKPSMQHMFMKFYSAHLFQN
       :::...:  : .: .:  . ..:  .:.:     :  ....:     ....:   ...
2oji:A EGAYGMVCSAYDNVNKVRVAIKK-ISPFEHQTYCQRTLREIK-----ILLRFRHENIIGI
       20        30        40         50             60        70

     120              130       140       150       160       170
Sequen GSVL-------VGELYSYGTLLNAINLYKNTPEKVMPQGLVISFAMRMLYMIEQVHDCEI
       ....       . ..:    :... .:::    . . .  .  : ...:  .. .:. ..
2oji:A NDIIRAPTIEQMKDVYIVQDLMET-DLYKLLKTQHLSNDHICYFLYQILRGLKYIHSANV
             80        90        100       110       120       130

            180       190       200        210       220        230
Sequen IHGDIKPDNFILGNGFLEQSAGLALIDLGQS-IDMKLFPKGTIFTAKCETSGFQCVE-ML
       .: :.::.:..:..     .  : . :.: . .      .  ..:    :  ..  : ::
2oji:A LHRDLKPSNLLLNT-----TCDLKICDFGLARVADPDHDHTGFLTEYVATRWYRAPEIML
             140            150       160       170       180

              240       250       260       270       280       290
Sequen SNKPWNYQIDYFGVAATVYCMLFGTYMKVKNEGGECKPEGLFRRLPHLDMWNEFFHVMLN
       ..: .. .:: ..:.  .  ::
2oji:A NSKGYTKSIDIWSVGCILAEMLSNRPIFPGKHYLDQLNHILGILGSPSQEDLNCIINLKA
        190       200       210       220       230       240

              300       310       320       330
Sequen IPDCHHLPSLDLLRQKLKKVFQQHYTNKIRALRNRLIVLLLEC

2oji:A RNYLLSLPHKNKVPWNRLFPNADSKALDLLDKMLTFNPHKRIEVEQALAHPYLEQYYDPS
        250       260       270       280       290       300

2oji:A DEPIAEAPFKFDMELDDLPKEKLKELIFEETARFQPGY
        310       320       330       340

.......
I have shown only part of the file. Suppose, for example, that I want
only those two alignments: are there modules to parse this? I have
tried to parse it with regexes, but without results :-(
If anyone can suggest modules or anything else, I'll be very happy to
learn.
Thanks,
Paola
Paola


From giles.weaver at googlemail.com  Tue Jun 30 07:28:25 2009
From: giles.weaver at googlemail.com (Giles Weaver)
Date: Tue, 30 Jun 2009 12:28:25 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <200906170927.13273.tristan.lefebure@gmail.com>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<92C15E3391F64BAF801754E924122540@NewLife>
	<200906170927.13273.tristan.lefebure@gmail.com>
Message-ID: <1d06cd5d0906300428x59c004f1h200bfe3c23ed769@mail.gmail.com>

I'm developing a transcriptomics database for use with next-gen data, and
have found processing the raw data to be a big hurdle.

I'm a bit late in responding to this thread, so most issues have already
been discussed. One thing that hasn't been mentioned is removal of adapters
from raw Illumina sequence. This is a PITA, and I'm not aware of any well
developed and documented open source software for removal of adapters (and
poor quality sequence) from Illumina reads.

My current Illumina sequence processing pipeline is an unholy mix of
biopython, bioperl, pure perl, emboss and bowtie. Biopython for converting
the Illumina fastq to Sanger fastq, bioperl to read the quality values, pure
perl to trim the poor quality sequence from each read, and bioperl with
emboss to remove the adapter sequence. I'm aware that the pipeline contains
bugs and would like to simplify it, but at least it does work...
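
The Illumina-to-Sanger FASTQ step can be sketched in pure Perl as below. This is only a rough illustration, not the pipeline's actual code: it assumes the input uses the Illumina 1.3+ encoding (Phred scores at ASCII offset 64), and the sub name illumina_to_sanger is made up for the example. Older Solexa scores need a non-linear conversion and are not handled here.

```perl
use strict;
use warnings;

# Convert an Illumina 1.3+ quality string (Phred+64) to Sanger (Phred+33).
# Assumption: the input really is Phred+64; older Solexa scores are on a
# different scale and must not be converted this way.
sub illumina_to_sanger {
    my ($qual) = @_;
    # every character moves down by 64 - 33 = 31 code points
    return join '', map { chr( ord($_) - 31 ) } split //, $qual;
}

print illumina_to_sanger('hhhB'), "\n";    # prints "III#"
```

Only the quality line of each four-line FASTQ record changes; the header and sequence lines pass through untouched.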

Ideally I'd like to replace as much of the pipeline as possible with
bioperl/bioperl-run, but this isn't currently possible due to both a lack of
features and poor performance. I'm sure the features will come with time,
but the performance is more of a concern to me. I wonder if Bio::Moose might
be used to alleviate some of the performance issues? Might next-gen modules
be an ideal guinea pig for Bio::Moose?

For my purposes the tools that I would love to see supported in
bioperl/bioperl-run are:

   - next-gen sequence quality parsing (to output phred scores)
   - sequence quality based trimming
   - sequencing adapter removal
   - filtering based on sequence complexity (repeats, entropy etc)
   - bioperl-run modules for bowtie etc.

Obviously all of these need to be fast!
I'd love to muck in, but I doubt I'll contribute much before
Bio::Moose/bioperl6, as the (bio)perl object system gives me nightmares!

Regarding trimming bad quality bases (see comments from Tristan Lefebure)
from Solexa/Illumina reads, I did find a mixed pure/bioperl solution to be
much faster than a primarily bioperl based implementation. I found
Bio::Seq->subseq(a,b) and Bio::Seq->subqual(a,b) to be far too slow. My
current code trims ~1300 sequences/second, including unzipping the raw data
and converting it to sanger fastq with biopython. Processing an entire
sequencing run with the whole pipeline takes in the region of 6-12h.
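
For illustration, a pure-Perl 3' quality trim of the kind described above might look like the following. It is a minimal sketch under stated assumptions, not the code from the pipeline: qualities are assumed to be Sanger-encoded (Phred+33), the threshold of 20 is arbitrary, and trim_3prime is a hypothetical name.

```perl
use strict;
use warnings;

# Trim low-quality bases from the 3' end of a read.
# $seq  : sequence string
# $qual : Sanger-encoded (Phred+33) quality string of the same length
# $min  : minimum acceptable Phred score (illustrative default of 20)
sub trim_3prime {
    my ( $seq, $qual, $min ) = @_;
    $min = 20 unless defined $min;
    my $keep = length $qual;
    # walk back from the 3' end while the last kept base is below threshold
    $keep-- while $keep > 0 && ( ord( substr( $qual, $keep - 1, 1 ) ) - 33 ) < $min;
    return ( substr( $seq, 0, $keep ), substr( $qual, 0, $keep ) );
}

my ( $s, $q ) = trim_3prime( 'ACGTACGT', 'IIIII###', 20 );
print "$s $q\n";    # prints "ACGTA IIIII"
```

Working on plain strings with substr avoids creating a sequence object per read, which is where most of the time seems to go.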

Hope this looooong post was of interest to someone!

Giles

2009/6/17 Tristan Lefebure <tristan.lefebure at gmail.com>

> Hello,
> Regarding next-gen sequences and bioperl, following my
> experience, another issue is bioperl speed. For example, if
> you want to trim bad quality bases at ends of 1E6 Solexa
> reads using Bio::SeqIO::fastq and some methods in
> Bio::Seq::Quality, well, you've got to be patient (but may
> be I missed some shortcuts...).
>
> A pure perl solution will be between 100 to 1000x faster...
> Would it be possible to have an ultra-light quality object
> with few simple methods for next-gen reads?
>
> I can contribute some tests if that sounds like an important
> point.
>
> -Tristan
>


From manchunjohn-ma at uiowa.edu  Tue Jun 30 12:17:08 2009
From: manchunjohn-ma at uiowa.edu (John M.C. Ma)
Date: Tue, 30 Jun 2009 11:17:08 -0500
Subject: [Bioperl-l] Bio::Tools::Run::RepeatMasker crashes perl
Message-ID: <5486b2980906300917m20e8cd06sbaee207aed3a27c9@mail.gmail.com>

Hi everyone,

(OS: OpenSuSE 11.1, Versions: Perl:v5.10.0-i586-linux-thread-multi,
Bioperl: 1.6.0-cpan, Bioperl-run: 1.6.1-cpan, Ensembl: Ver 54-cvs)

This is the first time I have used Bio::Tools::Run::RepeatMasker, and
I am getting a strange crash that I can't explain. I suspect the
problem is on my side.

My code pulls a sequence from Ensembl-variation, puts it into a
PrimarySeq object and runs RepeatMasker on it:

use strict;
use warnings;
use Bio::SeqIO;
use Bio::PrimarySeq;
use Bio::Tools::Run::RepeatMasker;
use Bio::EnsEMBL::Registry;
use Bio::EnsEMBL::Variation::Variation;
[snips most Ensembl code as the sequence itself looks OK]
	my $ref_allele=$snp_obj->five_prime_flanking_seq.${$snp_obj->get_all_Alleles}[0]->allele.$snp_obj->three_prime_flanking_seq;
	my $mask_seq=Bio::PrimarySeq->new (-seq=>$ref_allele);
	my $rmasker_handle=Bio::Tools::Run::RepeatMasker->new(-species=>'rat',-noisy=>"1");
	my @masked_features=$rmasker_handle->run($mask_seq);
	my $masked_seq=$rmasker_handle->run;

When I let the wrapper run, perl crashed with this warning and exception:

--------------------- WARNING ---------------------
MSG: RepeatMasker didn't find any repetitive sequences

---------------------------------------------------

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Could not open /tmp/EWLAmIVymd/wByClB8iqr.masked: No such file or directory
STACK: Error::throw
STACK: Bio::Root::Root::throw
/usr/lib/perl5/site_perl/5.10.0/Bio/Root/Root.pm:357
STACK: Bio::Root::IO::_initialize_io
/usr/lib/perl5/site_perl/5.10.0/Bio/Root/IO.pm:310
STACK: Bio::SeqIO::_initialize /usr/lib/perl5/site_perl/5.10.0/Bio/SeqIO.pm:450
STACK: Bio::SeqIO::fasta::_initialize
/usr/lib/perl5/site_perl/5.10.0/Bio/SeqIO/fasta.pm:81
STACK: Bio::SeqIO::new /usr/lib/perl5/site_perl/5.10.0/Bio/SeqIO.pm:347
STACK: Bio::SeqIO::new /usr/lib/perl5/site_perl/5.10.0/Bio/SeqIO.pm:373
STACK: Bio::Tools::Run::RepeatMasker::_run
/usr/lib/perl5/site_perl/5.10.0/Bio/Tools/Run/RepeatMasker.pm:320
STACK: Bio::Tools::Run::RepeatMasker::run
/usr/lib/perl5/site_perl/5.10.0/Bio/Tools/Run/RepeatMasker.pm:260
STACK: main::SeqList
/home/johnma/workspace/TaqMan_SNP_Order/TaqMan_SNP_Order.pl:40
STACK: /home/johnma/workspace/TaqMan_SNP_Order/TaqMan_SNP_Order.pl:63
-----------------------------------------------------------

What could be going wrong?

Cheers,

John Ma,
University of Iowa


From cjfields at illinois.edu  Tue Jun 30 13:46:27 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 30 Jun 2009 12:46:27 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <1d06cd5d0906300428x59c004f1h200bfe3c23ed769@mail.gmail.com>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<92C15E3391F64BAF801754E924122540@NewLife>
	<200906170927.13273.tristan.lefebure@gmail.com>
	<1d06cd5d0906300428x59c004f1h200bfe3c23ed769@mail.gmail.com>
Message-ID: <6723B5A0-9A21-4851-BD88-0BA3CC107439@illinois.edu>


On Jun 30, 2009, at 6:28 AM, Giles Weaver wrote:

> I'm developing a transcriptomics database for use with next-gen  
> data, and
> have found processing the raw data to be a big hurdle.
>
> I'm a bit late in responding to this thread, so most issues have  
> already
> been discussed. One thing that hasn't been mentioned is removal of  
> adapters
> from raw Illumina sequence. This is a PITA, and I'm not aware of any  
> well
> developed and documented open source software for removal of  
> adapters (and
> poor quality sequence) from Illumina reads.
>
> My current Illumina sequence processing pipeline is an unholy mix of
> biopython, bioperl, pure perl, emboss and bowtie. Biopython for  
> converting
> the Illumina fastq to Sanger fastq, bioperl to read the quality  
> values, pure
> perl to trim the poor quality sequence from each read, and bioperl  
> with
> emboss to remove the adapter sequence. I'm aware that the pipeline  
> contains
> bugs and would like to simplify it, but at least it does work...

My local bioperl is working with FASTQ parsing of Sanger and Illumina  
(but not solexa yet).  I'll commit what I have today, and we should be  
able to add in solexa soon.  We'll also need to add in write_seq  
support.
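
For reference, Solexa scores differ from Phred scores by more than the ASCII offset: the standard conversion is Q_phred = 10 * log10(10^(Q_solexa/10) + 1). The sketch below shows that formula in plain Perl. It is not the code in Bio::SeqIO::fastq; the sub names are hypothetical, and it assumes Solexa qualities stored at ASCII offset 64.

```perl
use strict;
use warnings;

# Solexa score -> Phred score:
#   Q_phred = 10 * log10( 10^(Q_solexa/10) + 1 )
sub solexa_to_phred {
    my ($q_solexa) = @_;
    return 10 * log( 10**( $q_solexa / 10 ) + 1 ) / log(10);
}

# Decode a Solexa-encoded quality string (assumed ASCII offset 64)
# into a list of rounded Phred scores.
sub solexa_string_to_phred {
    my ($qual) = @_;
    return map { sprintf '%.0f', solexa_to_phred( ord($_) - 64 ) } split //, $qual;
}

print join( ' ', solexa_string_to_phred('h@;') ), "\n";    # prints "40 3 1"
```

The two scales agree closely at high quality and diverge at the low end, where a Solexa score can be negative while the Phred score stays positive.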

> Ideally I'd like to replace as much of the pipeline as possible with
> bioperl/bioperl-run, but this isn't currently possible due to both a  
> lack of
> features and poor performance. I'm sure the features will come with  
> time,
> but the performance is more of a concern to me. I wonder if  
> Bio::Moose might
> be used to alleviate some of the performance issues? Might next-gen  
> modules
> be an ideal guinea pig for Bio::Moose?

We should get FASTQ working in core first then optimize on speed (as  
Elia previously pointed out).  We can do that within the actual SeqIO  
parser using a few simple tricks. For instance my local  
Bio::SeqIO::fastq has a reconfigured next_seq to call an iterator that  
returns raw processed data as a simple hash ref; users have access to  
that method, so if one wanted they could retrieve the raw data  
directly, or pass it through a filter that only creates seq instances  
one wants on the fly (that would be where your quality checks, adaptor  
modification, etc. fit in).

In the end it might be best to wrap a C/C++-based solution for speed.
As mentioned previously, a C-based parser exists from the Sanger Centre
that we could incorporate in some fashion, but I would like it to be
able to report back file position for fast indexing.  The code is
fairly simple, so it shouldn't be too hard to incorporate that somehow.

Just so there is no confusion, Bio::Moose is an attempt both to lay
out plans for perl6 and to deal with inheritance issues within bioperl
now. It's still in very early development and may not see a release
until Dec. at the very earliest; that would be an alpha release, and it
likely won't have every major class represented at that point.  It's
also not intended to be backwards-compatible with bioperl core.  It
may help, but that's not an absolute certainty.  As for bioperl6, it
will be pre-alpha until the perl6 spec reaches a stable draft and we
have an active implementation.

> For my purposes the tools that I would love to see supported in
> bioperl/bioperl-run are:
>
>   - next-gen sequence quality parsing (to output phred scores)
>   - sequence quality based trimming
>   - sequencing adapter removal
>   - filtering based on sequence complexity (repeats, entropy etc)
>   - bioperl-run modules for bowtie etc.
>
> Obviously all of these need to be fast!
> I'd love to muck in, but I doubt I'll contribute much before
> Bio::Moose/bioperl6, as the (bio)perl object system gives me  
> nightmares!

One can only read a file so fast (even with a highly optimized C/C++  
based parser), but I don't think that will be the limiting factor as  
much as object instantiation.

> Regarding trimming bad quality bases (see comments from Tristan  
> Lefebure)
> from Solexa/Illumina reads, I did find a mixed pure/bioperl solution  
> to be
> much faster than a primarily bioperl based implementation. I found
> Bio::Seq->subseq(a,b) and Bio::Seq->subqual(a,b) to be far too slow.  
> My
> current code trims ~1300 sequences/second, including unzipping the  
> raw data
> and converting it to sanger fastq with biopython. Processing an entire
> sequencing run with the whole pipeline takes in the region of 6-12h.

Right, hence coming up with a 'pre-filter' for raw data (hash refs)  
prior to object instantiation to speed things up.  This will be a bit  
easier with Bio::Moose as we can introspect attributes via the meta  
class, but this will be a while yet.

> Hope this looooong post was of interest to someone!
>
> Giles

It's always good to hear about such issues and what one expects.

chris


From cjfields at illinois.edu  Tue Jun 30 17:58:57 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 30 Jun 2009 16:58:57 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <4A42AC51.3090809@ebi.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com>
	<CECB8B74-C423-4792-8857-0F3E40EAECE3@illinois.edu>
	<4A40B5D6.40504@ebi.ac.uk>
	<320fb6e00906230429l77a2049eg35e47f94026111c6@mail.gmail.com>
	<4A40C909.40803@ebi.ac.uk>
	<320fb6e00906231439j3beecda0o6e75c4223e960871@mail.gmail.com>
	<4A421287.4000203@ebi.ac.uk>
	<320fb6e00906240756m757ee12nb53ecc527b8e2480@mail.gmail.com>
	<6DCEA6E1-8C53-4C04-9F8A-E336BE5753B6@illinois.edu>
	<320fb6e00906240927u62588e22je8186bb172e904a1@mail.gmail.com>
	<5C050D50-5723-4C93-89D0-7280D202E737@illinois.edu>
	<4A42AC51.3090809@ebi.ac.uk>
Message-ID: <A9776DF4-CE78-4973-9ADC-7594A3DAA118@illinois.edu>

All,

I have committed the first run at adding Illumina/Solexa parsing for  
FASTQ along with tests.  It's very possible the quality scores are  
off, particularly for Solexa (Illumina 1.0), so test away and let me  
know if anything pops up (should be a quick fix).  Along with that is  
a small commit to Bio::SeqIO so that we can add format variants (see  
below for an example).  write_seq/write_qual/write_fastq will likely  
not work as expected as I haven't touched them; they are to be tackled  
next.

For faster parsing I have also added a next_dataset method that  
returns a hash reference to the parsed data instead of an object; this  
hash includes quality scores.  This method is called by next_seq and  
the relevant data is passed in to the sequence factory directly; one  
could do something like the following to filter sequences as needed:

use Modern::Perl;
use Bio::SeqIO;
use Bio::Seq::SeqFactory;

my $file = shift;

# same as (-format   => 'fastq', -variant => 'illumina')
my $in = Bio::SeqIO->new(-file     => $file,
                          -format   => 'fastq-illumina');

my $factory = Bio::Seq::SeqFactory->new(-type => 'Bio::Seq::Quality');

while (my $data = $in->next_dataset) {
     next if seq_is_crap($data);
     my $seq = $factory->create(%$data);
}

sub seq_is_crap { # filter here
}
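
One possible body for such a filter is sketched below, purely as an illustration. It assumes the hash ref holds the sequence under '-seq' and an array ref of Phred scores under '-qual'; those key names and both thresholds are assumptions, so check what next_dataset really returns in your checkout before relying on them.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical filter for a raw record hash ref.
# Assumed keys: '-seq' (sequence string), '-qual' (array ref of Phred scores).
sub seq_is_crap {
    my ( $data, $min_mean, $min_len ) = @_;
    $min_mean = 20 unless defined $min_mean;    # illustrative thresholds
    $min_len  = 25 unless defined $min_len;
    my $seq   = defined $data->{-seq} ? $data->{-seq} : '';
    my $quals = $data->{-qual} || [];
    return 1 if length($seq) < $min_len;        # too short
    return 1 unless @$quals;                    # no quality data at all
    return sum(@$quals) / scalar(@$quals) < $min_mean ? 1 : 0;
}

my $good = { -seq => 'ACGT' x 10, -qual => [ (35) x 40 ] };
my $bad  = { -seq => 'ACGT' x 10, -qual => [ (5) x 40 ] };
print seq_is_crap($good) ? "drop\n" : "keep\n";    # prints "keep"
print seq_is_crap($bad)  ? "drop\n" : "keep\n";    # prints "drop"
```

Because the check runs on a plain hash, rejected reads never pay the cost of object creation.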


chris


From maj at fortinbras.us  Tue Jun 30 21:41:16 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 30 Jun 2009 21:41:16 -0400
Subject: [Bioperl-l] Parsing a FASTA file (Was:  Bioperl-l Digest, Vol 74,
	Issue 25)
In-Reply-To: <e9cf89740906300212jeac3fe6o2fa4414bb427f824@mail.gmail.com>
References: <mailman.29.1246204805.13888.bioperl-l@lists.open-bio.org>
	<e9cf89740906300212jeac3fe6o2fa4414bb427f824@mail.gmail.com>
Message-ID: <9D386274308C4DF98E38918477801541@NewLife>

Hi Paola, 

You want to try Bio::SearchIO, I think. It's not quite clear what you 
want to do, but here's an example of what you can do: 

Get all high-scoring pairs ( the mini-alignments ) involving
the database sequence called "2ojg:A"--

 use Bio::SearchIO;
 
 my $io = Bio::SearchIO->new(-format=>'fasta', -file=>'yourfile.fasta');
 my $result = $io->next_result;
 my @desired_hsps;

 while ( my $hit = $result->next_hit ) {
   push @desired_hsps, grep { $_->subject->seq_id =~ /2ojg:A/ } $hit->hsps;
 }
 
 # now all your desired hsps are in the array @desired_hsps;
 # you can get Bio::SimpleAlign objects from them all, for example:
 my @aligns = map { $_->get_aln } @desired_hsps;
 #...and lots of other things...

Look at http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_SearchIO
and http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods 
for a nice introduction to the Bio::SearchIO system by its authors. They 
use a blast output as an example, but everything applies to fasta output 
as well.
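
If all that is needed is the raw text of the alignment blocks for a list of PDB ids, a plain-Perl fallback along these lines may also do. It is a rough sketch, not a replacement for Bio::SearchIO: it assumes each hit section starts with a line such as ">>2ojg:A   (337 aa)", as in the sample output, and extract_hits is a hypothetical helper name.

```perl
use strict;
use warnings;

# Split FASTA-program output into per-hit text blocks, keeping wanted ids.
# Assumption: a hit section starts with ">>id"; lines starting with ">>>"
# (query header or end marker) close the current hit.
sub extract_hits {
    my ( $text, @wanted ) = @_;
    my %want = map { $_ => 1 } @wanted;
    my ( %block, $current );
    for my $line ( split /\n/, $text ) {
        if ( $line =~ /^>>>/ ) {
            $current = undef;
        }
        elsif ( $line =~ /^>>(\S+)/ ) {
            $current = $want{$1} ? $1 : undef;
        }
        $block{$current} .= "$line\n" if defined $current;
    }
    return \%block;
}

my $sample = join "\n",
    '>>2np8:A (159 aa)', 'Sequen AAA', '2np8:A AAT',
    '>>2ojg:A (337 aa)', 'Sequen CCC', '2ojg:A CCG', '';
my $hits = extract_hits( $sample, '2ojg:A' );
print $hits->{'2ojg:A'};    # prints the three lines of the 2ojg:A block
```

The returned hash maps each wanted id to the text of its block, which can then be written to a file or parsed further.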

You didn't waste your time writing regexps, by the way. For a Perl
student, that kind of work is like money in the bank.

cheers, 
Mark
      

----- Original Message ----- 
From: "Paola Bisignano" <paola.bisignano at gmail.com>
To: <bioperl-l at lists.open-bio.org>
Sent: Tuesday, June 30, 2009 5:12 AM
Subject: Re: [Bioperl-l] Bioperl-l Digest, Vol 74, Issue 25


> Hi,
> I need a little help parsing a file. I searched the BioPerl modules,
> but there are so many that I don't know where to start. I found
> modules for all sorts of databases and web sites, but not for my
> favorite, PDBsum, so I have parsed a lot of things on my own, even
> though I am new to Perl. Now I need help: I have to parse a FASTA
> output file produced from aligned sequences, and I need to extract the
> aligned sequences only for the PDB entries in my list.
> 
> 
> My FASTA file looks like this:
> 
> Query: /ebi/research/thornton/tmp/sas307986/seq.fasta
>  1>>>Sequence 3e7e:A - 333 aa
> Library: /ebi/research/thornton/www/databases/html/pdbsum/data/pdblib
> 17840403 residues in 79353 sequences
> 
>       opt      E()
> < 20   286     0:===
>  22     1     0:=          one = represents 135 library sequences
>  24     1     0:=
>  26     0     2:*
>  28    21    18:*
>  30    36   109:*
>  32   237   421:== *
>  34   956  1140:========*
>  36  1924  2342:===============  *
>  38  3591  3871:=========================== *
>  40  4904  5400:=====================================  *
>  42  6750  6600:================================================*=
>  44  7145  7281:=====================================================*
>  46  8047  7416:======================================================*=====
> .........
> 
>>>2np8:A                                                  (159 aa)
> initn: 125 init1:  72 opt: 136  Z-score: 168.6  bits: 38.5 E(): 0.011
> Smith-Waterman score: 136; 26.0% identity (57.1% similar) in 154 aa
> overlap (59-204:13-153)
> 
>               10        20        30        40        50        60
> Sequen NFIVGNPWDDKLIFKLLSGLSKPVSSYPNTFEWQCKLPAIKPKTEFQLGSKLVYVHHLLG
>                                                                 ::
> 2np8:A                                               QWALEDFEIGRPLG
>                                                             10
> 
>               70          80        90         100        110
> Sequen EGAFAQVYEATQNKQKFVL--KVQKPANPWEFYIGTQLMER--LKPSMQH-MFMKFYSAH
>       .: :..:: : ....::.:  ::   :.  .  .  :: ..  ..  ..:  ....:.
> 2np8:A KGKFGNVYLAREKQSKFILALKVLFKAQLEKAGVEHQLRREVEIQSHLRHPNILRLYG--
>           20        30        40        50        60        70
> 
>         120         130       140       150       160       170
> Sequen LFQNGS--VLVGELYSYGTLLNAINLYKNTPEKVMPQGLVISFAMRMLYMIEQVHDCEII
>        :....   :. :    ::.   ..  ..  :.      . ..  ..   .   :. ..:
> 2np8:A YFHDATRVYLILEYAPLGTVYRELQKLSKFDEQR-----TATYITELANALSYCHSKRVI
>             80        90       100            110       120
> 
>           180       190        200       210       220       230
> Sequen HGDIKPDNFILGNGFLEQSAG-LALIDLGQSIDMKLFPKGTIFTAKCETSGFQCVEMLSN
>       : ::::.:..::      ::: : . :.: :.
> 2np8:A HRDIKPENLLLG------SAGELKIADFGWSVHAPSSR
>       130             140       150
> 
>            240       250       260       270       280       290
> Sequen KPWNYQIDYFGVAATVYCMLFGTYMKVKNEGGECKPEGLFRRLPHLDMWNEFFHVMLNIP
> 
>            300       310       320       330
> Sequen DCHHLPSLDLLRQKLKKVFQQHYTNKIRALRNRLIVLLLEC
> 
>>>2ojg:A                                                  (337 aa)
> initn:  85 init1:  53 opt: 140  Z-score: 168.1  bits: 39.5 E(): 0.012
> Smith-Waterman score: 140; 20.3% identity (56.2% similar) in 217 aa
> overlap (46-252:1-204)
> 
>               10        20        30        40        50        60
> Sequen NFIVGNPWDDKLIFKLLSGLSKPVSSYPNTFEWQCKLPAIKPKTEFQLGSKLVYVHHLLG
>                                                    :..: . . . .. :
> 2ojg:A                                              FDVGPRYTNLSYI-G
>                                                            10
> 
>               70        80        90        100       110
> Sequen EGAFAQVYEATQNKQKFVLKVQKPANPWEFYIGTQ-LMERLKPSMQHMFMKFYSAHLFQN
>       :::...:  : .: .:  . ..:  .:.:     :  ....:     ....:   ...
> 2ojg:A EGAYGMVCSAYDNVNKVRVAIKK-ISPFEHQTYCQRTLREIK-----ILLRFRHENIIGI
>           20        30         40        50             60
> 
>     120              130       140       150       160       170
> Sequen GSVL-------VGELYSYGTLLNAINLYKNTPEKVMPQGLVISFAMRMLYMIEQVHDCEI
>       ....       . ..:    :... .:::    . . .  .  : ...:  .. .:. ..
> 2ojg:A NDIIRAPTIEQMKDVYIVQDLMET-DLYKLLKTQHLSNDHICYFLYQILRGLKYIHSANV
>       70        80        90        100       110       120
> 
>            180       190       200        210       220        230
> Sequen IHGDIKPDNFILGNGFLEQSAGLALIDLGQS-IDMKLFPKGTIFTAKCETSGFQCVE-ML
>       .: :.::.:..:..     .  : . :.: . .      .  ..:    :  ..  : ::
> 2ojg:A LHRDLKPSNLLLNT-----TCDLKICDFGLARVADPDHDHTGFLTEYVATRWYRAPEIML
>       130       140            150       160       170       180
> 
>              240       250       260       270       280       290
> Sequen SNKPWNYQIDYFGVAATVYCMLFGTYMKVKNEGGECKPEGLFRRLPHLDMWNEFFHVMLN
>       ..: .. .:: ..:.  .  ::
> 2ojg:A NSKGYTKSIDIWSVGCILAEMLSNRPIFPGKHYLDQLNHILGILGSPSQEDLNCIINLKA
>            190       200       210       220       230       240
> 
>              300       310       320       330
> Sequen IPDCHHLPSLDLLRQKLKKVFQQHYTNKIRALRNRLIVLLLEC
> 
> 2ojg:A RNYLLSLPHKNKVPWNRLFPNADSKALDLLDKMLTFNPHKRIEVEQALAHPYLEQYYDPS
>            250       260       270       280       290       300
> 
> 2ojg:A DEPIAEAPFKFELDDLPKEKLKELIFEETARFQPG
>            310       320       330
> 
>>>2oji:A                                                  (344 aa)
> initn:  85 init1:  53 opt: 140  Z-score: 168.0  bits: 39.5 E(): 0.012
> Smith-Waterman score: 140; 20.3% identity (56.2% similar) in 217 aa
> overlap (46-252:5-208)
> 
>               10        20        30        40        50        60
> Sequen NFIVGNPWDDKLIFKLLSGLSKPVSSYPNTFEWQCKLPAIKPKTEFQLGSKLVYVHHLLG
>                                                    :..: . . . .. :
> 2oji:A                                          RGQVFDVGPRYTNLSYI-G
>                                                        10
> 
>               70        80        90        100       110
> Sequen EGAFAQVYEATQNKQKFVLKVQKPANPWEFYIGTQ-LMERLKPSMQHMFMKFYSAHLFQN
>       :::...:  : .: .:  . ..:  .:.:     :  ....:     ....:   ...
> 2oji:A EGAYGMVCSAYDNVNKVRVAIKK-ISPFEHQTYCQRTLREIK-----ILLRFRHENIIGI
>       20        30        40         50             60        70
> 
>     120              130       140       150       160       170
> Sequen GSVL-------VGELYSYGTLLNAINLYKNTPEKVMPQGLVISFAMRMLYMIEQVHDCEI
>       ....       . ..:    :... .:::    . . .  .  : ...:  .. .:. ..
> 2oji:A NDIIRAPTIEQMKDVYIVQDLMET-DLYKLLKTQHLSNDHICYFLYQILRGLKYIHSANV
>             80        90        100       110       120       130
> 
>            180       190       200        210       220        230
> Sequen IHGDIKPDNFILGNGFLEQSAGLALIDLGQS-IDMKLFPKGTIFTAKCETSGFQCVE-ML
>       .: :.::.:..:..     .  : . :.: . .      .  ..:    :  ..  : ::
> 2oji:A LHRDLKPSNLLLNT-----TCDLKICDFGLARVADPDHDHTGFLTEYVATRWYRAPEIML
>             140            150       160       170       180
> 
>              240       250       260       270       280       290
> Sequen SNKPWNYQIDYFGVAATVYCMLFGTYMKVKNEGGECKPEGLFRRLPHLDMWNEFFHVMLN
>       ..: .. .:: ..:.  .  ::
> 2oji:A NSKGYTKSIDIWSVGCILAEMLSNRPIFPGKHYLDQLNHILGILGSPSQEDLNCIINLKA
>        190       200       210       220       230       240
> 
>              300       310       320       330
> Sequen IPDCHHLPSLDLLRQKLKKVFQQHYTNKIRALRNRLIVLLLEC
> 
> 2oji:A RNYLLSLPHKNKVPWNRLFPNADSKALDLLDKMLTFNPHKRIEVEQALAHPYLEQYYDPS
>        250       260       270       280       290       300
> 
> 2oji:A DEPIAEAPFKFDMELDDLPKEKLKELIFEETARFQPGY
>        310       320       330       340
> 
> .......
> I have shown only part of the file. If I want, for example, only those
> two alignments, are there modules to parse it? I have tried to parse it
> with regexes, but without results :-(
> If anyone has suggestions for modules or anything else, I'll be very
> happy to learn.
> thanks
> Paola
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
>


From cjfields at illinois.edu  Tue Jun 30 23:48:11 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 30 Jun 2009 22:48:11 -0500
Subject: [Bioperl-l] FASTQ output
Message-ID: <A6217D90-4861-4EEB-B2D8-F3565B81EB4B@illinois.edu>

I am working on FASTQ output and noticed a real oddity.  Apparently,  
there are three write_* methods for this module, with the odd choice  
of write_seq for Bio::SeqIO::fastq writing FASTA, not FASTQ.   
write_qual() writes Qual format:

http://www.bioperl.org/wiki/Qual_sequence_format

and write_fastq() writes FASTQ.  Now, maybe it's just me, but I think  
an implementation of write_seq() for a specific format should probably  
output that format and not something else entirely unexpected.  Also,  
is there a reason for duplicating output code for qual and FASTA  
output within Bio::SeqIO::fastq, i.e. should we call Bio::SeqIO::fasta/ 
qual instead?

I would consider the write_seq() issue a bug, the others are really  
just maintenance issues.  Anyone have problems with me changing that  
up a bit?

chris
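
To make the distinction between the two outputs concrete, here is a minimal sketch in plain Perl (it does not use or reproduce the Bio::SeqIO::fastq code). The function names are hypothetical, and the quality encoding is assumed to be Sanger-style (Phred score plus ASCII offset 33); other FASTQ variants use different offsets.

```perl
use strict;
use warnings;

# Hypothetical helper: format one record as four-line FASTQ text.
# $quals is an array ref of Phred scores; ASCII offset 33 is an assumption.
sub format_fastq_record {
    my ($id, $seq, $quals) = @_;
    die "sequence and quality lengths differ\n"
        unless length($seq) == scalar @$quals;
    my $qual_string = join '', map { chr($_ + 33) } @$quals;
    return "\@$id\n$seq\n+\n$qual_string\n";
}

# Hypothetical helper: format the same record as FASTA; qualities are dropped.
sub format_fasta_record {
    my ($id, $seq) = @_;
    return ">$id\n$seq\n";
}

print format_fastq_record('read1', 'ACGT', [40, 30, 20, 10]);
print format_fasta_record('read1', 'ACGT');
```

The FASTA form loses the quality line entirely, which is why a write_seq() that emits FASTA from a FASTQ writer is surprising.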


From Jonas_Schaer at gmx.de  Sun Jun 28 06:15:18 2009
From: Jonas_Schaer at gmx.de (Jonas Schaer)
Date: Sun, 28 Jun 2009 12:15:18 +0200
Subject: [Bioperl-l] different results with remote-blast skript
Message-ID: <D6BA00577BC94BDFAB04DF5EF43E9598@jonas>

Hi again :)
Please, I have just one small question:
why do I get different results with my RemoteBlast Perl script than on the NCBI BLAST homepage?
I am using blastp, the query is an amino acid sequence (the results differ for any sequence, not only in the number of hits but even in E-values, scores, etc.), and the database is 'nr'.
PLEASE help me,
thank you in advance,
Jonas

PS: my script:
################################################################################
  use Bio::Seq::SeqFactory;
  use Bio::Tools::Run::RemoteBlast;
  use strict;
  my @blast_report;
  my $prog  = 'blastp';
  my $db    = 'nr';
  my $e_val = '1e-10';
  #my $e_val= '10';
  my @params = ( '-prog'       => $prog,
                 '-data'       => $db,
                 '-expect'     => $e_val,
                 '-readmethod' => 'SearchIO' );
  my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
  $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1';
  $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100';
  $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10';
  $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'} = '1';
  
  my $blast_seq='MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTAMATGPDPDDEYE';
  #$v is just to turn on and off the messages
  my $v = 1;
  my $seqbuilder = Bio::Seq::SeqFactory->new('-type' => 'Bio::PrimarySeq');   
  my $seq = $seqbuilder->create(-seq =>$blast_seq, -display_id => "$blast_seq"); 
  my $filename='temp2.out';
  my $r = $factory->submit_blast($seq);
  print STDERR "waiting..." if( $v > 0 );
  while ( my @rids = $factory->each_rid )
  {
      foreach my $rid ( @rids )
      {
          my $rc = $factory->retrieve_blast($rid);
          if( !ref($rc) )
          {
              if( $rc < 0 )
              {
                  $factory->remove_rid($rid);
              }
              print STDERR "." if ( $v > 0 );
          }
          else
          {
              my $result = $rc->next_result();
              $factory->save_output($filename);
              $factory->remove_rid($rid);
              print "\nQuery Name: ", $result->query_name(), "\n";
              while ( my $hit = $result->next_hit )
              {
                  next unless ( $v > 0);
                  print "\thit name is ", $hit->name, "\n";
                  while( my $hsp = $hit->next_hsp )
                  {
                      print "\t\tscore is ", $hsp->score, "\n";
                  }
              }
          }
      }
  }
  @blast_report = get_file_data ($filename);
  return @blast_report;
##################################################################################


From stevey_mac2k2 at hotmail.com  Sun Jun 28 06:53:04 2009
From: stevey_mac2k2 at hotmail.com (stephenmcgowan1)
Date: Sun, 28 Jun 2009 03:53:04 -0700 (PDT)
Subject: [Bioperl-l]  Installing Bioperl on Mac OS X 10.5.7
Message-ID: <24240541.post@talk.nabble.com>


Hi,

I'm new to the Mac way of working and programming, as well as the UNIX
(Terminal) environment. I will describe in as much detail as I can
what I have done so far to install BioPerl, and what my problem is.

First of all, I downloaded and extracted BioPerl-1.6.0
and BioPerl-db-1.6.0 from the site. I have these two folders saved in a
folder on my OS X desktop called "ExerciseTwo".

After doing this, I open Terminal and change to the BioPerl-1.6.0 directory.

I then run:

perl Build.PL (I have also tried sudo perl Build.pl)

I then run ./Build test (again, I tried this with sudo ./Build test)

After running the tests, I receive the feedback:

Failed Test                        Stat Wstat Total  Fail  Failed  List of Failed
-------------------------------------------------------------------------------
t/AlignIO/AlignIO.t                 255 65280    28    42 150.00%  8-28
t/AlignIO/arp.t                     255 65280    48    92 191.67%  3-48
t/Annotation/Annotation.t           255 65280   159    83  52.20%  9 117 119-159
t/ClusterIO/SequenceFamily.t        255 65280    19    34 178.95%  3-19
t/LocalDB/Flat.t                    255 65280    24    20  83.33%  15-24
t/LocalDB/Index.t                   255 65280    64    66 103.12%  32-64
t/RemoteDB/BioFetch.t               255 65280    36     2   5.56%  36
t/RemoteDB/DB.t                       3   768   113    59  52.21%  83-113
t/RemoteDB/EUtilities.t               1   256   309     1   0.32%  307
t/SeqIO/Handler.t                   255 65280   550  1098 199.64%  2-550
t/SeqIO/chaos.t                       1   256     8     1  12.50%  1
t/SeqIO/swiss.t                     255 65280   240   479 199.58%  1-240
t/SeqTools/GuessSeqFormat.t           1   256    49     2   4.08%  25 50
t/Tools/Analysis/Protein/ELM.t      255 65280    15    22 146.67%  5-15
t/Tools/Analysis/Protein/Scansite   255 65280    14    20 142.86%  5-14
t/Tools/Run/WrapperBase.t             1   256    27     1   3.70%  20
44 tests and 250 subtests skipped.
Failed 16/318 test scripts, 94.97% okay. 1015/15518 subtests failed, 93.46% okay

Going by this, I then decided to run the install: ./Build install

This is a segment of the output I receive in Terminal after the install:

Manifying blib/script/bp_pairwise_kaks.pl ->
blib/bindoc/bp_pairwise_kaks.pl.1
Manifying blib/script/bp_seqret.pl -> blib/bindoc/bp_seqret.pl.1
Manifying blib/script/bp_seq_length.pl -> blib/bindoc/bp_seq_length.pl.1
Manifying blib/script/bp_query_entrez_taxa.pl ->
blib/bindoc/bp_query_entrez_taxa.pl.1
Manifying blib/script/bp_load_gff.pl -> blib/bindoc/bp_load_gff.pl.1
Manifying blib/script/bp_fastam9_to_table.pl ->
blib/bindoc/bp_fastam9_to_table.pl.1
Manifying blib/script/bp_process_wormbase.pl ->
blib/bindoc/bp_process_wormbase.pl.1
Manifying blib/script/bp_nrdb.pl -> blib/bindoc/bp_nrdb.pl.1
Manifying blib/script/bp_composite_LD.pl -> blib/bindoc/bp_composite_LD.pl.1
Manifying blib/script/bp_classify_hits_kingdom.pl ->
blib/bindoc/bp_classify_hits_kingdom.pl.1
Manifying blib/script/bp_blast2tree.pl -> blib/bindoc/bp_blast2tree.pl.1
Manifying blib/script/bp_heterogeneity_test.pl ->
blib/bindoc/bp_heterogeneity_test.pl.1
Manifying blib/script/bp_generate_histogram.pl ->
blib/bindoc/bp_generate_histogram.pl.1
Manifying blib/script/bp_process_gadfly.pl ->
blib/bindoc/bp_process_gadfly.pl.1
mkdir /usr/local/share: Permission denied at
/System/Library/Perl/5.8.8/ExtUtils/Install.pm line 112

Now these bp_ files, such as bp_nrdb.pl, should be installed somewhere on my
system, but I'm not sure whether the install has worked and whether these
files were saved to the newly made directory, given this error:

mkdir /usr/local/share: Permission denied at
/System/Library/Perl/5.8.8/ExtUtils/Install.pm line 112

Is there something wrong with my install? I think /usr/local/share should be
created and then all of these bp_ files should go into this folder. Is there
anything that I'm doing wrong here?

Thanks

Stephen.


-- 
View this message in context: http://www.nabble.com/Installing-Bioperl-on-Mac-OS-X-10.5.7-tp24240541p24240541.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From w.bryant at ucl.ac.uk  Mon Jun  1 08:06:58 2009
From: w.bryant at ucl.ac.uk (Will Bryant)
Date: Mon, 01 Jun 2009 09:06:58 +0100
Subject: [Bioperl-l] Extract genomic data from GenBank
Message-ID: <4A238C22.9090604@ucl.ac.uk>

I'm trying to retrieve the complete GenBank format sequence file for a 
specified bacterium using get_Seq_by_gi, but I keep getting 'gi does not 
exist' errors, even when trying the example gi '405830'.  The script was 
running fine September last year, but when I came back to it this week 
it wasn't working.  Am I missing something obvious?

In case it's important, I'm using ActivePerl 5.10.0, bioperl 1.5.2_100

Code:

#!/usr/bin/perl -w

use strict;
use Bio::Perl;
use Bio::DB::GenBank;

my $gb = new Bio::DB::GenBank(-db => 'genome', -format => 'genbank');

my $straincomp = $gb->get_Seq_by_gi('405830');

my $seqout = 0;

#my $set_output_file = '$seqout = Bio::SeqIO->new( -format => 
\'genbank\', -file => 
\'>c:\\phd\\modelling\\working\\gi'.$ARGV[0].'_data.gb\');';

#print $set_output_file;
eval ($set_output_file);

$seqout -> write_seq($straincomp);


Error:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: gi does not exist
STACK: Error::throw
STACK: Bio::Root::Root::throw c:/perl/site/lib/Bio/Root/Root.pm:359
STACK: Bio::DB::WebDBSeqI::get_Seq_by_gi 
c:/perl/site/lib/Bio/DB/WebDBSeqI.pm:209
STACK: c:\phd\modelling\perl_scripts\retrieve_genome_data.pl:12
-----------------------------------------------------------

Many thanks,

Will Bryant.


From David.Messina at sbc.su.se  Mon Jun  1 09:04:40 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Mon, 1 Jun 2009 11:04:40 +0200
Subject: [Bioperl-l] Extract genomic data from GenBank
In-Reply-To: <4A238C22.9090604@ucl.ac.uk>
References: <4A238C22.9090604@ucl.ac.uk>
Message-ID: <628aabb70906010204y46139e1dy702fd53380adecf7@mail.gmail.com>

Hey Will,
I think there have been API changes in GenBank's remote query interface that
have occurred after 1.5.2_100 of BioPerl was written. Try upgrading to
BioPerl 1.6 and see if that works for you.

(Note that I've only glanced at your code -- I'm assuming that's not the
problem since it worked fine for you before.)


Dave
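
Before and after upgrading, it can help to confirm which BioPerl the script is actually loading. The following is a minimal sketch, assuming that Bio::Root::Version carries the distribution's version number in its $VERSION; the helper name installed_version is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the version of an installed module,
# or undef if the module cannot be loaded at all.
sub installed_version {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;      # Bio::Root::Version -> Bio/Root/Version.pm
    return unless eval { require $file; 1 };     # not installed or failed to load
    return $module->VERSION;                     # UNIVERSAL::VERSION reads $VERSION
}

# Assumption: this module reflects the version of the whole BioPerl install.
my $version = installed_version('Bio::Root::Version');
print defined $version
    ? "BioPerl version: $version\n"
    : "BioPerl does not appear to be installed\n";
```

If the reported version is still 1.5.2_100 after an upgrade, an older copy is probably earlier in @INC than the new one.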


From fontanez at fas.harvard.edu  Mon Jun  1 12:41:06 2009
From: fontanez at fas.harvard.edu (Kristina Fontanez)
Date: Mon, 1 Jun 2009 08:41:06 -0400
Subject: [Bioperl-l] problem with bioperl install
In-Reply-To: <C00A2D77-4B41-4FF0-ACE5-1A4F6D46F27A@illinois.edu>
References: <BCF27AD6-75B3-4CE8-83AE-C7BAF99AE800@fas.harvard.edu>
	<2023E087846042178215CF9EBDE12C75@NewLife>
	<4A205502.2030701@sendu.me.uk>
	<024B0302-7885-4005-851D-5D582122ED06@fas.harvard.edu>
	<4A205D46.4090105@sendu.me.uk>
	<C00A2D77-4B41-4FF0-ACE5-1A4F6D46F27A@illinois.edu>
Message-ID: <855163D8-6B40-4DF4-84B6-C14611D1CA42@fas.harvard.edu>

Hey everyone-

Thanks for all the advice. I reinstalled Xcode tools, installed Fink  
and downloaded bioperl successfully. It's now working smoothly.

Thanks again,
Kristina
---------------------------------------------------------------
Kristina Fontanez
PhD candidate
Department of Organismic and Evolutionary Biology
Cavanaugh lab
Harvard University
16 Divinity Ave.
Cambridge, MA 02138

tel: 617-495-1138
fax: 617-496-6933
email: fontanez at fas.harvard.edu


On May 29, 2009, at 10:40 PM, Chris Fields wrote:

Kristina,

You aren't running as superuser:

> term dump:
>
> dhcp-0019353043-25-35:BioPerl-1.6.0 kristinafontanez$ cpan

You'll need to run cpan using 'sudo cpan' if installing modules  
anywhere requiring superuser permissions.

chris

On May 29, 2009, at 5:10 PM, Sendu Bala wrote:

> Kristina Fontanez wrote:
>> Hello everyone-
>> Sendu - I took your advice but doing Install Bundle::CPAN did not  
>> take care of the dependencies. It still failed. See attached txt  
>> file with my terminal output. Does anyone have any idea how this  
>> might be?
>
> From reading the output it seems like perhaps you don't have 'make'  
> or there is something wrong when using it. If you're on a mac you  
> may need to install the dev tools. Someone else want to jump in here  
> with advice?
>
> Also, check your CPAN configuration to ensure it is trying to use  
> the correct make commands. ('o conf' etc.)
>
>
>> If I wanted to wipe all perl from my computer and simply start  
>> over, how might this be accomplished?
>
> Don't do that. At least not until you know you have a working make  
> setup.


From maj at fortinbras.us  Mon Jun  1 14:55:50 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 1 Jun 2009 10:55:50 -0400
Subject: [Bioperl-l] a HOWTO for Tiling
Message-ID: <13190185F84E43BDA99993CEB44394C4@NewLife>

Hi All 
Please peruse http://www.bioperl.org/wiki/HOWTO:Tiling for an exhibition of B::S::Tiling, use cases, code snippets, design, implementation and algorithm discussions. We're just about ready to port over to core from bioperl-dev; please shout out if this is not a good idea. 
cheers and thanks for all input--
Mark


From cjfields at illinois.edu  Mon Jun  1 15:21:30 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 1 Jun 2009 10:21:30 -0500
Subject: [Bioperl-l] problem with bioperl install
In-Reply-To: <FD5880DBC2054B5FA9E89FB5CA8B139B@NewLife>
References: <BCF27AD6-75B3-4CE8-83AE-C7BAF99AE800@fas.harvard.edu>
	<2023E087846042178215CF9EBDE12C75@NewLife>
	<BB7756FB-7A18-4E23-9DBB-34B6A2FEE27F@fas.harvard.edu>
	<FD5880DBC2054B5FA9E89FB5CA8B139B@NewLife>
Message-ID: <8739E8BC-9586-4EA2-9377-23D13CC6ED20@illinois.edu>

An autogenerated pass-through Makefile.PL is included with the
distribution:

http://cpansearch.perl.org/src/CJFIELDS/BioPerl-1.6.0/Makefile.PL

We may remove that in future releases, but it should work regardless
(i.e. call Module::Build and Build.PL).  I'm pretty convinced that the
issue was permissions-based at heart.  Note Kristina ran 'cpan'
instead of 'sudo cpan' to invoke the shell, so the shell is using the
current user's config instead of su for installation.  You need to use
'sudo' to install anything into /Library/Perl on Mac (unless you are
already 'root', but on recent OS X versions logging in as 'root' is
turned off).

I just noticed nothing is mentioned along these lines in the  
installation docs, so we'll need to update those.

chris
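
One way to see in advance whether an install will hit a permissions problem is to ask perl where it installs site modules and whether the current user can write there. This is only a sketch: the helper name nearest_existing_dir is hypothetical, the three Config keys are an assumed, non-exhaustive selection, and a build tool may be configured to install elsewhere.

```perl
use strict;
use warnings;
use Config;

# Hypothetical helper: walk up from $dir to the nearest ancestor that exists.
# That is the directory an installer would have to write into or create under.
sub nearest_existing_dir {
    my ($dir) = @_;
    while (length $dir && !-d $dir) {
        $dir =~ s{/[^/]*\z}{} or last;   # drop the last path component
        $dir = '/' if $dir eq '';
    }
    return $dir;
}

# Assumed selection of install locations recorded in this perl's Config.
for my $key (qw(installsitelib installsitebin installsiteman1dir)) {
    my $dir = $Config{$key};
    next unless defined $dir && length $dir;
    my $base = nearest_existing_dir($dir);
    printf "%-20s %s: %s\n", $key, $dir,
        (-w $base) ? 'writable by current user'
                   : "not writable ($base needs elevated permissions)";
}
```

If any line reports "not writable", the install step needs 'sudo' or a user-level install location.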

On May 29, 2009, at 4:08 PM, Mark A. Jensen wrote:

> Hi Kristina,
>
> [Don't forget to reply-all, so the list stays in the loop. Many many  
> more helpers
> there.]
>
> Apparently cpan can't make the Makefile, but can download and expand  
> the
> library directories, in your .cpan directory (see edited highlights  
> below).
>
> Let's appeal to the BioPerl brethren/sestren---answers?
>
> MAJ
>
>
> term dump:
>
> dhcp-0019353043-25-35:BioPerl-1.6.0 kristinafontanez$ cpan
> Terminal does not support AddHistory.
>
> cpan shell -- CPAN exploration and modules installation (v1.7602)
> ReadLine support available (try 'install Bundle::CPAN')
>
> cpan> install Test::Harness
> CPAN: Storable loaded ok
> Going to read /Users/kristinafontanez/.cpan/Metadata
> Database was generated on Fri, 29 May 2009 11:27:00 GMT
> Running install for module Test::Harness
> Running make for A/AN/ANDYA/Test-Harness-3.17.tar.gz
> CPAN: Digest::MD5 loaded ok
> CPAN: Compress::Zlib loaded ok
> Checksum for /Users/kristinafontanez/.cpan/sources/authors/id/A/AN/ANDYA/Test-Harness-3.17.tar.gz ok
> Scanning cache /Users/kristinafontanez/.cpan/build for sizes
> Test-Harness-3.17/
> Test-Harness-3.17/Build.PL
> ...
> Test-Harness-3.17/xt/perls/sample-tests/
> Test-Harness-3.17/xt/perls/sample-tests/perl_version
> Removing previously used /Users/kristinafontanez/.cpan/build/Test-Harness-3.17
>
> CPAN.pm: Going to build A/AN/ANDYA/Test-Harness-3.17.tar.gz
>
> Checking if your kit is complete...
> Looks good
> Writing Makefile for Test::Harness
>   -- NOT OK
> Running make test
> Can't test without successful make
> Running make install
> make had returned bad status, install seems impossible
>
> cpan> install File::HomeDir
> ...[more of same]...
>
>
> ----- Original Message ----- From: "Kristina Fontanez" <fontanez at fas.harvard.edu>
> To: "Mark A. Jensen" <maj at fortinbras.us>
> Sent: Friday, May 29, 2009 3:56 PM
> Subject: Re: [Bioperl-l] problem with bioperl install
>
>
>> Mr. Jensen-
>>
>> Thank you for your help but unfortunately the installation of
>> Test::Harness etc didn't work. I copied my terminal output and
>> attached the file. Any advice on what's still going wrong?
>>
>> Thanks,
>> Kristina
>>
>
>
> --------------------------------------------------------------------------------
>
>
>>
>>
>>
>>
>> ---------------------------------------------------------------
>> Kristina Fontanez
>> PhD candidate
>> Department of Organismic and Evolutionary Biology
>> Cavanaugh lab
>> Harvard University
>> 16 Divinity Ave.
>> Cambridge, MA 02138
>>
>> tel: 617-495-1138
>> fax: 617-496-6933
>> email: fontanez at fas.harvard.edu
>>
>>
>>
>> On May 29, 2009, at 3:35 PM, Mark A. Jensen wrote:
>>
>> The message says you are first updating your CPAN.pm.
>> That module needs modules you don't have, so
>>
>> use cpan to install the dependencies you don't have, viz.
>>>   Test::Harness
>>>   File::HomeDir
>>
>> $ cpan
>>> install Test::Harness
>> etc.
>> Then install CPAN.pm again (or run the Bioperl install again).
>>
>> Lather, rinse, repeat the install of Bioperl until it completes
>> without errors.
>>
>> ----- Original Message ----- From: "Kristina Fontanez" <fontanez at fas.harvard.edu>
>> To: <bioperl-l at bioperl.org>
>> Sent: Friday, May 29, 2009 3:07 PM
>> Subject: [Bioperl-l] problem with bioperl install
>>
>>
>>> Hello-
>>>
>>> I am trying to install bioperl and I ran into some problems. See
>>> list  below.
>>>
>>>
>>> CPAN.pm: Going to build A/AN/ANDK/CPAN-1.94.tar.gz
>>>
>>> Checking if your kit is complete...
>>> Looks good
>>> Warning: prerequisite File::HomeDir 0.69 not found.
>>> Warning: prerequisite Test::Harness 2.62 not found. We have 2.56.
>>> Writing Makefile for CPAN
>>> ---- Unsatisfied dependencies detected during [A/AN/ANDK/CPAN-1.94.tar.gz] -----
>>>   Test::Harness
>>>   File::HomeDir
>>>
>>>
>>> How can I fix this?
>>>
>>>
>>> Thanks,
>>> Kristina
>>> ---------------------------------------------------------------
>>> Kristina Fontanez
>>> PhD candidate
>>> Department of Organismic and Evolutionary Biology
>>> Cavanaugh lab
>>> Harvard University
>>> 16 Divinity Ave.
>>> Cambridge, MA 02138
>>>
>>> tel: 617-495-1138
>>> fax: 617-496-6933
>>> email: fontanez at fas.harvard.edu
>>>
>>>
>>>
>>>
>>
>>


From cjfields at illinois.edu  Mon Jun  1 16:14:07 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 1 Jun 2009 11:14:07 -0500
Subject: [Bioperl-l] a HOWTO for Tiling
In-Reply-To: <13190185F84E43BDA99993CEB44394C4@NewLife>
References: <13190185F84E43BDA99993CEB44394C4@NewLife>
Message-ID: <6DCD3564-6756-4416-899A-F32DC7310AD2@illinois.edu>

As long as it doesn't significantly impact SearchIO performance-wise
(from reading the HOWTO I can't see how it will), I say commit away.
In fact, I consider this a bug fix that should be in the next 1.6
point release. We should add deprecation warnings where needed for
1.7...

chris

On Jun 1, 2009, at 9:55 AM, Mark A. Jensen wrote:

> Hi All
> Please peruse http://www.bioperl.org/wiki/HOWTO:Tiling for an  
> exhibition of B::S::Tiling, use cases, code snippets, design,  
> implementation and algorithm discussions. We're just about ready to  
> port over to core from bioperl-dev; please shout out if this is not  
> a good idea.
> cheers and thanks for all input--
> Mark


From dan.bolser at gmail.com  Mon Jun  1 16:27:30 2009
From: dan.bolser at gmail.com (Dan Bolser)
Date: Mon, 1 Jun 2009 17:27:30 +0100
Subject: [Bioperl-l] problem with bioperl install
In-Reply-To: <8739E8BC-9586-4EA2-9377-23D13CC6ED20@illinois.edu>
References: <BCF27AD6-75B3-4CE8-83AE-C7BAF99AE800@fas.harvard.edu>
	<2023E087846042178215CF9EBDE12C75@NewLife>
	<BB7756FB-7A18-4E23-9DBB-34B6A2FEE27F@fas.harvard.edu>
	<FD5880DBC2054B5FA9E89FB5CA8B139B@NewLife>
	<8739E8BC-9586-4EA2-9377-23D13CC6ED20@illinois.edu>
Message-ID: <2c8757af0906010927x2ff3bce7r62522f108b43c414@mail.gmail.com>

2009/6/1 Chris Fields <cjfields at illinois.edu>:

...
> for installation.  You need to use 'sudo' to install anything /Library/Perl
> on Mac (unless you are already 'root', but on recent OS X version logging in
...

local::lib is supposed to take care of this. Is this broken on Mac?
Building stuff as root is generally considered to be bad.


> I just noticed nothing is mentioned along these lines in the installation
> docs, so we'll need to update those.

I tried to write down a clear 'recipe' for getting things installed
(this was actually on the GMod wiki). I really think the install docs
could be improved. Sometimes less verbose is better.

Dan

> chris
>
> On May 29, 2009, at 4:08 PM, Mark A. Jensen wrote:
>
>> Hi Kristina,
>>
>> [Don't forget to reply-all, so the list stays in the loop. Many many more
>> helpers
>> there.]
>>
>> Apparently cpan can't make the Makefile, but can download and expand the
>> library directories, in your .cpan directory (see edited highlights
>> below).
>>
>> Let's appeal to the BioPerl brethren/sestren---answers?
>>
>> MAJ
>>
>>
>> term dump:
>>
>> dhcp-0019353043-25-35:BioPerl-1.6.0 kristinafontanez$ cpan
>> Terminal does not support AddHistory.
>>
>> cpan shell -- CPAN exploration and modules installation (v1.7602)
>> ReadLine support available (try 'install Bundle::CPAN')
>>
>> cpan> install Test::Harness
>> CPAN: Storable loaded ok
>> Going to read /Users/kristinafontanez/.cpan/Metadata
>> Database was generated on Fri, 29 May 2009 11:27:00 GMT
>> Running install for module Test::Harness
>> Running make for A/AN/ANDYA/Test-Harness-3.17.tar.gz
>> CPAN: Digest::MD5 loaded ok
>> CPAN: Compress::Zlib loaded ok
>> Checksum for
>> /Users/kristinafontanez/.cpan/sources/authors/id/A/AN/ANDYA/Test-Harness-3.17.tar.gz
>> ok
>> Scanning cache /Users/kristinafontanez/.cpan/build for sizes
>> Test-Harness-3.17/
>> Test-Harness-3.17/Build.PL
>> ...
>> Test-Harness-3.17/xt/perls/sample-tests/
>> Test-Harness-3.17/xt/perls/sample-tests/perl_version
>> Removing previously used
>> /Users/kristinafontanez/.cpan/build/Test-Harness-3.17
>>
>> CPAN.pm: Going to build A/AN/ANDYA/Test-Harness-3.17.tar.gz
>>
>> Checking if your kit is complete...
>> Looks good
>> Writing Makefile for Test::Harness
>>   -- NOT OK
>> Running make test
>> Can't test without successful make
>> Running make install
>> make had returned bad status, install seems impossible
>>
>> cpan> install File::HomeDir
>> ...[more of same]...
>>
>>
>> ----- Original Message ----- From: "Kristina Fontanez"
>> <fontanez at fas.harvard.edu>
>> To: "Mark A. Jensen" <maj at fortinbras.us>
>> Sent: Friday, May 29, 2009 3:56 PM
>> Subject: Re: [Bioperl-l] problem with bioperl install
>>
>>
>>> Mr. Jensen-
>>>
>>> Thank you for your help but unfortunately the installation of
>>> Test::Harness etc didn't work. I copied my terminal output and
>>> attached the file. Any advice on what's still going wrong?
>>>
>>> Thanks,
>>> Kristina
>>>
>>
>>
>>
>> --------------------------------------------------------------------------------
>>
>>
>>>
>>>
>>>
>>>
>>> ---------------------------------------------------------------
>>> Kristina Fontanez
>>> PhD candidate
>>> Department of Organismic and Evolutionary Biology
>>> Cavanaugh lab
>>> Harvard University
>>> 16 Divinity Ave.
>>> Cambridge, MA 02138
>>>
>>> tel: 617-495-1138
>>> fax: 617-496-6933
>>> email: fontanez at fas.harvard.edu
>>>
>>>
>>>
>>> On May 29, 2009, at 3:35 PM, Mark A. Jensen wrote:
>>>
>>> The message says you are first updating your CPAN.pm.
>>> That module needs modules you don't have, so
>>>
>>> use cpan to install the dependencies you don't have, viz.
>>>>
>>>>   Test::Harness
>>>>   File::HomeDir
>>>
>>> $ cpan
>>>>
>>>> install Test::Harness
>>>
>>> etc.
>>> Then install CPAN.pm again (or run the Bioperl install again).
>>>
>>> Lather, rinse, repeat the install of Bioperl until it completes
>>> without errors.
>>>
>>> ----- Original Message ----- From: "Kristina Fontanez" <fontanez at fas.harvard.edu>
>>> To: <bioperl-l at bioperl.org>
>>> Sent: Friday, May 29, 2009 3:07 PM
>>> Subject: [Bioperl-l] problem with bioperl install
>>>
>>>
>>>> Hello-
>>>>
>>>> I am trying to install bioperl and I ran into some problems. See
>>>> list below.
>>>>
>>>>
>>>> CPAN.pm: Going to build A/AN/ANDK/CPAN-1.94.tar.gz
>>>>
>>>> Checking if your kit is complete...
>>>> Looks good
>>>> Warning: prerequisite File::HomeDir 0.69 not found.
>>>> Warning: prerequisite Test::Harness 2.62 not found. We have 2.56.
>>>> Writing Makefile for CPAN
>>>> ---- Unsatisfied dependencies detected during [A/AN/ANDK/
>>>> CPAN-1.94.tar.gz] -----
>>>>  Test::Harness
>>>>  File::HomeDir
>>>>
>>>>
>>>> How can I fix this?
>>>>
>>>>
>>>> Thanks,
>>>> Kristina
>>>> ---------------------------------------------------------------
>>>> Kristina Fontanez
>>>> PhD candidate
>>>> Department of Organismic and Evolutionary Biology
>>>> Cavanaugh lab
>>>> Harvard University
>>>> 16 Divinity Ave.
>>>> Cambridge, MA 02138
>>>>
>>>> tel: 617-495-1138
>>>> fax: 617-496-6933
>>>> email: fontanez at fas.harvard.edu
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>
>>>
>>
>


From cjfields at illinois.edu  Mon Jun  1 17:15:42 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 1 Jun 2009 12:15:42 -0500
Subject: [Bioperl-l] problem with bioperl install
In-Reply-To: <2c8757af0906010927x2ff3bce7r62522f108b43c414@mail.gmail.com>
References: <BCF27AD6-75B3-4CE8-83AE-C7BAF99AE800@fas.harvard.edu>
	<2023E087846042178215CF9EBDE12C75@NewLife>
	<BB7756FB-7A18-4E23-9DBB-34B6A2FEE27F@fas.harvard.edu>
	<FD5880DBC2054B5FA9E89FB5CA8B139B@NewLife>
	<8739E8BC-9586-4EA2-9377-23D13CC6ED20@illinois.edu>
	<2c8757af0906010927x2ff3bce7r62522f108b43c414@mail.gmail.com>
Message-ID: <87C523DD-6F47-4CFF-8907-566A18A0A08E@illinois.edu>


On Jun 1, 2009, at 11:27 AM, Dan Bolser wrote:

> 2009/6/1 Chris Fields <cjfields at illinois.edu>:
>
> ...
>> for installation.  You need to use 'sudo' to install anything / 
>> Library/Perl
>> on Mac (unless you are already 'root', but on recent OS X version  
>> logging in
> ...
>
> local::lib is supposed to take care of this. Is this broken on Mac?
> Building stuff as root is generally considered to be bad.

You can install to a local lib, yes, but cpan needs to be manually
configured to do this; I don't think it is automatically configured to
do so on OS X, i.e. it defaults to /Library/Perl.

Frankly, I sidestep the whole issue with my own custom perl  
installation, but that's me.

>> I just noticed nothing is mentioned along these lines in the  
>> installation
>> docs, so we'll need to update those.
>
> I tried to write down a clear 'recipe' for getting things installed
> (this was actually on the GMod wiki). I really think the install docs
> could be improved. Sometimes less verbose is better.
>
> Dan

True, but I would much rather have reasonable instructions that  
outline most installation issues than ones that aren't detailed enough.

My thought is to strip the INSTALL doc that comes with BioPerl down to
the essentials and point to the wiki for the more detailed instructions
(including problems encountered).  It's too hard to maintain both and
backport the wiki into plain text.

chris


From maj at fortinbras.us  Mon Jun  1 19:03:05 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 1 Jun 2009 15:03:05 -0400
Subject: [Bioperl-l] a HOWTO for Tiling
In-Reply-To: <6DCD3564-6756-4416-899A-F32DC7310AD2@illinois.edu>
References: <13190185F84E43BDA99993CEB44394C4@NewLife>
	<6DCD3564-6756-4416-899A-F32DC7310AD2@illinois.edu>
Message-ID: <AABEFA992F2345548C861ADDFDC50132@NewLife>

Thanks, Chris--

Bio::Search::Tiling is now ported to core; the snapshot of the ported version is 
in bioperl-dev/tags/tiling-port-to-core-060109.
Bunch o' tests performed by t/SearchIO/Tiling.t; bunch more if one sets
BIOPERL_TILING_EXHAUSTIVE_TESTS.

Cry 'Havoc!' and let slip the dogs of war...

MAJ

----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "Mark A. Jensen" <maj at fortinbras.us>
Cc: "Sendu Bala" <bix at sendu.me.uk>; "Dave Messina" <dave at davemessina.com>; 
"BioPerl List" <bioperl-l at lists.open-bio.org>
Sent: Monday, June 01, 2009 12:14 PM
Subject: Re: [Bioperl-l] a HOWTO for Tiling


>I think, as long as it doesn't significantly impact SearchIO performance-wise
>(from reading the HOWTO I can't see how it will), I say commit away. In fact,
>I consider this a bug fix that should be in the next 1.6 point release. We
>should add deprecation warnings where needed for 1.7...
>
> chris
>
> On Jun 1, 2009, at 9:55 AM, Mark A. Jensen wrote:
>
>> Hi All
>> Please peruse http://www.bioperl.org/wiki/HOWTO:Tiling for an exhibition of
>> B::S::Tiling, use cases, code snippets, design, implementation and algorithm
>> discussions. We're just about ready to port over to core from bioperl-dev;
>> please shout out if this is not a good idea.
>> cheers and thanks for all input--
>> Mark
>
>
> 


From koenvanderdrift at gmail.com  Mon Jun  1 22:22:23 2009
From: koenvanderdrift at gmail.com (Koen van der Drift)
Date: Mon, 1 Jun 2009 18:22:23 -0400
Subject: [Bioperl-l] problem with bioperl install
In-Reply-To: <87C523DD-6F47-4CFF-8907-566A18A0A08E@illinois.edu>
References: <BCF27AD6-75B3-4CE8-83AE-C7BAF99AE800@fas.harvard.edu>
	<2023E087846042178215CF9EBDE12C75@NewLife>
	<BB7756FB-7A18-4E23-9DBB-34B6A2FEE27F@fas.harvard.edu>
	<FD5880DBC2054B5FA9E89FB5CA8B139B@NewLife>
	<8739E8BC-9586-4EA2-9377-23D13CC6ED20@illinois.edu>
	<2c8757af0906010927x2ff3bce7r62522f108b43c414@mail.gmail.com>
	<87C523DD-6F47-4CFF-8907-566A18A0A08E@illinois.edu>
Message-ID: <2E5C7781-D115-415F-BA28-120613B221C3@gmail.com>


On Jun 1, 2009, at 1:15 PM, Chris Fields wrote:

> My thought is to strip down the INSTALL doc that comes with BioPerl  
> down to the essentials and point to the wiki for the more detailed  
> ones (including problems encountered).  It's too hard to maintain  
> both and backport the wiki into plain text.


Good idea, please then also update the file PLATFORMS. It has a link  
to a very outdated website for the installation of bioperl on OS X.  
And maybe a line + link to the bioperl wiki can be added that  
recommends the use of fink as an alternative to cpan?

cheers,

- Koen.


From cjfields at illinois.edu  Mon Jun  1 23:27:32 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 1 Jun 2009 18:27:32 -0500
Subject: [Bioperl-l] problem with bioperl install
In-Reply-To: <2E5C7781-D115-415F-BA28-120613B221C3@gmail.com>
References: <BCF27AD6-75B3-4CE8-83AE-C7BAF99AE800@fas.harvard.edu>
	<2023E087846042178215CF9EBDE12C75@NewLife>
	<BB7756FB-7A18-4E23-9DBB-34B6A2FEE27F@fas.harvard.edu>
	<FD5880DBC2054B5FA9E89FB5CA8B139B@NewLife>
	<8739E8BC-9586-4EA2-9377-23D13CC6ED20@illinois.edu>
	<2c8757af0906010927x2ff3bce7r62522f108b43c414@mail.gmail.com>
	<87C523DD-6F47-4CFF-8907-566A18A0A08E@illinois.edu>
	<2E5C7781-D115-415F-BA28-120613B221C3@gmail.com>
Message-ID: <98605D05-706B-4ACB-B444-4F0A9CEC879D@illinois.edu>


On Jun 1, 2009, at 5:22 PM, Koen van der Drift wrote:

>
> On Jun 1, 2009, at 1:15 PM, Chris Fields wrote:
>
>> My thought is to strip down the INSTALL doc that comes with BioPerl  
>> down to the essentials and point to the wiki for the more detailed  
>> ones (including problems encountered).  It's too hard to maintain  
>> both and backport the wiki into plain text.
>
>
> Good idea, please then also update the file PLATFORMS. It has a link  
> to a very outdated website for the installation of bioperl on OS X.  
> And maybe a line + link to the bioperl wiki can be added that  
> recommends the use of fink as an alternative to cpan?
>
> cheers,
>
> - Koen.

Done. I've added a ticket on bugzilla for tracking this so it doesn't  
get lost:

http://bugzilla.open-bio.org/show_bug.cgi?id=2846

chris


From shalabh.sharma7 at gmail.com  Tue Jun  2 14:44:25 2009
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Tue, 2 Jun 2009 10:44:25 -0400
Subject: [Bioperl-l] Refseq Hits
Message-ID: <9fcc48c70906020744x1301136fm852443d1fe96941b@mail.gmail.com>

Hi All,
This is not really a BioPerl query, but I am really confused and
need some help.
I blasted some sequences against the RefSeq database (locally). After parsing
the BLAST result, I noticed that some description fields contain two hit
names, like:
hit_name ->    gi|71082715|ref|YP_265434.1|
Description -> ubiquitin binding protein [Candidatus Pelagibacter ubique
HTCC1062] gi|91762865|ref|ZP_01264830.1| possible ubiquitin binding protein
[Candidatus Pelagibacter ubique HTCC1002]

So besides giving me the description for hit_name (HTCC1062), it's also
giving me the one for HTCC1002.
I would really appreciate it if someone could help me out.

Thanks
Shalabh
_________________________________________________
Shalabh Sharma
Scientific Computing Professional Associate
Department of Marine Sciences
University of Georgia
Athens, GA 30602-3636

phone: 706-542-0341
email: ssharmai at uga.edu


From jonathancrabtree at gmail.com  Tue Jun  2 15:04:33 2009
From: jonathancrabtree at gmail.com (Jonathan Crabtree)
Date: Tue, 2 Jun 2009 11:04:33 -0400
Subject: [Bioperl-l] Refseq Hits
In-Reply-To: <9fcc48c70906020744x1301136fm852443d1fe96941b@mail.gmail.com>
References: <9fcc48c70906020744x1301136fm852443d1fe96941b@mail.gmail.com>
Message-ID: <8e5b8bf80906020804s5e8f1737je6539365c38a9226@mail.gmail.com>

Hi Shalabh-

I believe RefSeq is a non-redundant database, in which sequence entries with
identical sequences are merged and their descriptions are concatenated in
the FASTA defline.  If you look up the two accession numbers/gi numbers from
your search results I think you'll see that both are valid matches because
their polypeptide sequences are identical:

http://www.ncbi.nlm.nih.gov/protein/71082715
http://www.ncbi.nlm.nih.gov/protein/91762865

You're just getting a single match with two descriptions instead of two
matches with one description each, but the sequence is the same and so,
therefore, are the BLAST alignments.

Jonathan

On Tue, Jun 2, 2009 at 10:44 AM, shalabh sharma
<shalabh.sharma7 at gmail.com>wrote:

> Hi All,
>          This is not really a bioperl query, but i am really confused and
> need some help.
> I blasted some sequences against refseq database (locally). After parsing
> the blast result what i noticed that some description fields contain two
> hit
> names like:
> hit_name ->    gi|71082715|ref|YP_265434.1|
> Description -> ubiquitin binding protein [Candidatus Pelagibacter ubique
> HTCC1062] gi|91762865|ref|ZP_01264830.1| possible ubiquitin binding protein
> [Candidatus Pelagibacter ubique HTCC1002]
>
> So besides giving me description for hit_name (HTCC 1062) its also giving
> me
> HTCC 1002.
> I will really appreciate if someone can help me out.
>
> Thanks
> Shalabh
> _________________________________________________
> Shalabh Sharma
> Scientific Computing Professional Associate
> Department of Marine Sciences
> University of Georgia
> Athens, GA 30602-3636
>
> phone: 706-542-0341
> email: ssharmai at uga.edu
>


From shalabh.sharma7 at gmail.com  Tue Jun  2 15:15:45 2009
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Tue, 2 Jun 2009 11:15:45 -0400
Subject: [Bioperl-l] Refseq Hits
In-Reply-To: <8e5b8bf80906020804s5e8f1737je6539365c38a9226@mail.gmail.com>
References: <9fcc48c70906020744x1301136fm852443d1fe96941b@mail.gmail.com>
	<8e5b8bf80906020804s5e8f1737je6539365c38a9226@mail.gmail.com>
Message-ID: <9fcc48c70906020815x2295ee9ay8023d521a50238ca@mail.gmail.com>

Hi Jonathan,
Your information is really helpful. Thanks a lot.

-Shalabh


On Tue, Jun 2, 2009 at 11:04 AM, Jonathan Crabtree <
jonathancrabtree at gmail.com> wrote:

>
> Hi Shalabh-
>
> I believe RefSeq is a non-redundant database, in which sequence entries
> with identical sequences are merged and their descriptions are concatenated
> in the FASTA defline.  If you look up the two accession numbers/gi numbers
> from your search results I think you'll see that both are valid matches
> because their polypeptide sequences are identical:
>
> http://www.ncbi.nlm.nih.gov/protein/71082715
> http://www.ncbi.nlm.nih.gov/protein/91762865
>
> You're just getting a single match with two descriptions instead of two
> matches with one description, but the sequence is the same and so, therefore
> are the blast alignments.
>
> Jonathan
>
> On Tue, Jun 2, 2009 at 10:44 AM, shalabh sharma <shalabh.sharma7 at gmail.com
> > wrote:
>
>> Hi All,
>>          This is not really a bioperl query, but i am really confused and
>> need some help.
>> I blasted some sequences against refseq database (locally). After parsing
>> the blast result what i noticed that some description fields contain two
>> hit
>> names like:
>> hit_name ->    gi|71082715|ref|YP_265434.1|
>> Description -> ubiquitin binding protein [Candidatus Pelagibacter ubique
>> HTCC1062] gi|91762865|ref|ZP_01264830.1| possible ubiquitin binding
>> protein
>> [Candidatus Pelagibacter ubique HTCC1002]
>>
>> So besides giving me description for hit_name (HTCC 1062) its also giving
>> me
>> HTCC 1002.
>> I will really appreciate if someone can help me out.
>>
>> Thanks
>> Shalabh
>> _________________________________________________
>> Shalabh Sharma
>> Scientific Computing Professional Associate
>> Department of Marine Sciences
>> University of Georgia
>> Athens, GA 30602-3636
>>
>> phone: 706-542-0341
>> email: ssharmai at uga.edu
>>
>
>


From tristan.lefebure at gmail.com  Tue Jun  2 16:24:21 2009
From: tristan.lefebure at gmail.com (Tristan Lefebure)
Date: Tue, 2 Jun 2009 12:24:21 -0400
Subject: [Bioperl-l] Creating a fastq format file?
In-Reply-To: <ddde1f420904270238w2bad577fq49def99607597793@mail.gmail.com>
References: <2c8757af0904220349v4e4a9b89xe374a46f9e82cfdc@mail.gmail.com>
	<ddde1f420904262242s533bd5abqeb9db75463d5a8f2@mail.gmail.com>
	<ddde1f420904270238w2bad577fq49def99607597793@mail.gmail.com>
Message-ID: <200906021224.21439.tristan.lefebure@gmail.com>

On Monday 27 April 2009 05:38:40 Heikki Lehvaslaiho wrote:
> I convinced at least myself to the degree that I wrote
> the range_convert() method - with plenty of tests. I
> mention this now so that no-one else need to start
> thinking through all the edge values.
>
> :)
>
> I'll contribute it to the code base once there is a
> consensus of best way forward.
>

Heikki,

This thread has been quiet for a while, but I don't see 
anything new in Bio::Seq::Quality. Did we reach a consensus 
or are you waiting for some more discussion on the subject?

(I'm pretty impatient to see bioperl handling both sanger 
and illumina ranges on the fly!)

--Tristan

>     -Heikki
>
> 2009/4/27 Heikki Lehvaslaiho <heikki.lehvaslaiho at gmail.com>:
> >> I have tried to summarise this in a central place:
> >> http://en.wikipedia.org/wiki/FASTQ_format
> >
> > Torsten,
> >
> > Thanks for putting this together. Very helpful.
> >
> > Do you have a plan of action?  Let me propose one for
> > BioPerl. It is based on the following assumptions:
> >
> > 1. There is a multitude of different ways of coding
> >    quality values out there.
> > 2. Bio::Seq::Quality is agnostic of any quality value
> >    range rules.
> > 3. The emerging open standard is the Sanger fastq
> >    specification.
> > 4. Open source programs use the Sanger fastq specs.
> >
> >
> > From these it follows that:
> >
> >
> > 1. BioPerl should support Sanger fastq standard
> >
> > 1.1. it already does and there are other SeqIO modules
> > for dealing with other non-fastq formats.
> >
> > 2. BioPerl should offer simple ways of converting
> > between quality range rules
> >
> > 2.1. Have a generic method accessible from
> > Bio::Seq::Quality with preset versions of the method
> > for converting between known variants (Sanger fastq and
> > the two Illumina versions)
> >
> > For example:
> >
> > range_convert($from_lower, $from_upper, $to_lower, $to_upper, $value)
> >   throw if $value < $from_lower or $value > $from_upper
> >   return $newvalue
> >
> > range_convert_illumina2fastq(),
> > range_convert_fastq2illumina(),
> > range_convert_fastq2phred(),
> >  range_convert_phred2fastq()....
> >
> > (assuming that illumina 1.3 eq phred)
> >
> > 2.2. Bio::SeqIO::Fastq::next_seq methods should convert
> > Illumina qualities into Sanger fastq on the fly
> >
> > 2.2.1 Bio::SeqIO::Fastq::next_seq should detect the
> > incoming stream of quality value range either
> > automatically or be given a keyword parameter
> > indicating the range.
> >
> > 2.2.2. Bio::SeqIO::Fastq::next_seq should throw an
> > error if it detects a quality value out of range.
> >
> > 2.2.3. Bio::SeqIO::Fastq::write_seq should throw an
> > error if it detects a quality value out of range.
> >
> > 2.2.4. It would be useful but not absolutely necessary
> > for Bio::SeqIO::Fastq::write_seq to be able to write
> > out in Illumina ranges
> >
> >
> > What do you think?
> >
> >    -Heikki
> >
> > 2009/4/26 Torsten Seemann <torsten.seemann at infotech.monash.edu.au>:
> >>> > This might be a good place to ask the question:
> >>> > having looked at the fastq.pm page, is the fastq
> >>> > format defined (only) by a "@'" followed by a
> >>> > sequence line and a "+" header followed by a
> >>> > quality line and the two headers have to agree? Now
> >>> > that Illumina is using phred scaling, are 'Sanger'
> >>> > and 'Illumina' versions the same?
> >>>
> >>> No they aren't the same, Illumina still encodes the
> >>> ascii as value + 64 and Sanger as value + 33.
> >>
> >> Illumina have now CHANGED how they calculate the
> >> quality value however in the last month or so... Their
> >> Q range used to be -5..40 mapped to ASCII 64+, but now
> >> they produce Q >= 0 and it is unclear if they start at
> >> 69 or 64 now...
> >>
> >> I have tried to summarise this in a central place:
> >>
> >> http://en.wikipedia.org/wiki/FASTQ_format
> >>
> >> Corrections welcome!
> >>
> >>
> >> --Torsten Seemann
> >> --Victorian Bioinformatics Consortium, Dept.
> >> Microbiology, Monash University, AUSTRALIA
> >
> > --
> >    -Heikki
> > Heikki Lehvaslaiho - skype:heikki_lehvaslaiho
> > cell: +27 (0)714328090
> > Sent from Claremont, WC, South Africa
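
The offset conversion discussed in the quoted thread above can be sketched
in plain Perl. This is only an illustration, not the range_convert() method
Heikki mentions: the sub names and argument order are made up here. It
assumes Sanger fastq stores Phred + 33 and the newer Illumina files store
Phred + 64, as stated in the thread, and uses the Solexa-to-Phred formula
from the FASTQ summary page linked there.

```perl
use strict;
use warnings;

# Re-encode a quality string from one ASCII offset to another, e.g.
# Illumina 1.3+ (Phred+64) to Sanger (Phred+33). Dies on a value outside
# $min_q..$max_q, in the spirit of point 2.2.2 of the quoted proposal.
# Hypothetical helper, not part of BioPerl.
sub reoffset_quality {
    my ($qual, $from_offset, $to_offset, $min_q, $max_q) = @_;
    my $out = '';
    for my $char (split //, $qual) {
        my $q = ord($char) - $from_offset;
        die "quality $q outside $min_q..$max_q\n"
            if $q < $min_q || $q > $max_q;
        $out .= chr($q + $to_offset);
    }
    return $out;
}

# Convert an old Solexa-scaled score to a Phred score:
#   Q_phred = 10 * log10(10 ** (Q_solexa / 10) + 1)
sub solexa_to_phred {
    my ($q_solexa) = @_;
    return 10 * log(10 ** ($q_solexa / 10) + 1) / log(10);
}

# 'h' is ASCII 104, i.e. Phred 40 at offset 64; at offset 33 that is 'I'.
print reoffset_quality('hhhB', 64, 33, 0, 62), "\n";   # prints III#
printf "%.2f\n", solexa_to_phred(-5);
```

The range limits are passed in rather than hard-coded because the thread
leaves open which ranges the different Illumina pipeline versions emit.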


From Russell.Smithies at agresearch.co.nz  Tue Jun  2 20:56:26 2009
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Wed, 3 Jun 2009 08:56:26 +1200
Subject: [Bioperl-l] Refseq Hits
In-Reply-To: <9fcc48c70906020815x2295ee9ay8023d521a50238ca@mail.gmail.com>
References: <9fcc48c70906020744x1301136fm852443d1fe96941b@mail.gmail.com>
	<8e5b8bf80906020804s5e8f1737je6539365c38a9226@mail.gmail.com>
	<9fcc48c70906020815x2295ee9ay8023d521a50238ca@mail.gmail.com>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32493EB1D18@exchsth.agresearch.co.nz>

The identifiers are separated by a Ctrl-A char ("\001") in the original non-redundant fasta header so you should be able to split them up again - assuming BioPerl didn't munge them.

--Russell
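
A minimal sketch of the split described above, in plain Perl with no BioPerl
dependency. It assumes the Ctrl-A separators survived parsing; if they were
turned into spaces (which the output quoted below suggests may have
happened), it falls back to splitting before each embedded "gi|" identifier.
The sub name split_merged_defline is made up for illustration.

```perl
use strict;
use warnings;

# Split a merged non-redundant defline into one entry per original record.
# Hypothetical helper, not part of BioPerl.
sub split_merged_defline {
    my ($defline) = @_;
    # Preferred: the Ctrl-A (\x01) separator of nr-style FASTA headers.
    my @parts = split /\x01/, $defline;
    # Fallback (assumption): the separator became whitespace, so split
    # before every "gi|NNN|" that starts a new identifier.
    @parts = split /\s+(?=gi\|\d+\|)/, $defline if @parts == 1;
    return @parts;
}

my $merged = 'gi|71082715|ref|YP_265434.1| ubiquitin binding protein '
           . '[Candidatus Pelagibacter ubique HTCC1062]'
           . "\x01"
           . 'gi|91762865|ref|ZP_01264830.1| possible ubiquitin binding protein '
           . '[Candidatus Pelagibacter ubique HTCC1002]';

print "$_\n" for split_merged_defline($merged);
```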

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of shalabh sharma
> Sent: Wednesday, 3 June 2009 3:16 a.m.
> To: Jonathan Crabtree
> Cc: bioperl-l
> Subject: Re: [Bioperl-l] Refseq Hits
> 
> Hi Jonathan,                  Your information is really helpful. Thanks a
> lot.
> 
> -Shalabh
> 
> 
> On Tue, Jun 2, 2009 at 11:04 AM, Jonathan Crabtree <
> jonathancrabtree at gmail.com> wrote:
> 
> >
> > Hi Shalabh-
> >
> > I believe RefSeq is a non-redundant database, in which sequence entries
> > with identical sequences are merged and their descriptions are concatenated
> > in the FASTA defline.  If you look up the two accession numbers/gi numbers
> > from your search results I think you'll see that both are valid matches
> > because their polypeptide sequences are identical:
> >
> > http://www.ncbi.nlm.nih.gov/protein/71082715
> > http://www.ncbi.nlm.nih.gov/protein/91762865
> >
> > You're just getting a single match with two descriptions instead of two
> > matches with one description, but the sequence is the same and so, therefore
> > are the blast alignments.
> >
> > Jonathan
> >
> > On Tue, Jun 2, 2009 at 10:44 AM, shalabh sharma <shalabh.sharma7 at gmail.com
> > > wrote:
> >
> >> Hi All,
> >>          This is not really a bioperl query, but i am really confused and
> >> need some help.
> >> I blasted some sequences against refseq database (locally). After parsing
> >> the blast result what i noticed that some description fields contain two
> >> hit
> >> names like:
> >> hit_name ->    gi|71082715|ref|YP_265434.1|
> >> Description -> ubiquitin binding protein [Candidatus Pelagibacter ubique
> >> HTCC1062] gi|91762865|ref|ZP_01264830.1| possible ubiquitin binding
> >> protein
> >> [Candidatus Pelagibacter ubique HTCC1002]
> >>
> >> So besides giving me description for hit_name (HTCC 1062) its also giving
> >> me
> >> HTCC 1002.
> >> I will really appreciate if someone can help me out.
> >>
> >> Thanks
> >> Shalabh
> >> _________________________________________________
> >> Shalabh Sharma
> >> Scientific Computing Professional Associate
> >> Department of Marine Sciences
> >> University of Georgia
> >> Athens, GA 30602-3636
> >>
> >> phone: 706-542-0341
> >> email: ssharmai at uga.edu
> >>
> >
> >


From maj at fortinbras.us  Tue Jun  2 21:05:03 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 2 Jun 2009 17:05:03 -0400
Subject: [Bioperl-l] Bio::Search::Tiling
Message-ID: <B006036D760941179148C9F8E2AD7E05@NewLife>

All-
Bio::Search::Tiling is now in bioperl-live, passes all tests.
Thanks, 
Mark


From shalabh.sharma7 at gmail.com  Wed Jun  3 17:27:59 2009
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Wed, 3 Jun 2009 13:27:59 -0400
Subject: [Bioperl-l] gbf to gff
Message-ID: <9fcc48c70906031027h3381a93fya349fba8e5ba464c@mail.gmail.com>

Hi all,
I am working on Roseobacters. Many times I've converted a gbk file from
GenBank to gff format, but now one genome, "Silicibacter lacuscaerulensis",
does not have a gbk file; instead it has two gbf files:

https://research.venterinstitute.org/moore/SingleOrganism.do?speciesTag=SL1157&pageAttr=pageMain

So how can I convert this genome to one gff file so I can use it in
gbrowse?
I would really appreciate it if anyone can help me out.

Thanks


From scott at scottcain.net  Wed Jun  3 18:11:54 2009
From: scott at scottcain.net (Scott Cain)
Date: Wed, 3 Jun 2009 14:11:54 -0400
Subject: [Bioperl-l] gbf to gff
In-Reply-To: <9fcc48c70906031027h3381a93fya349fba8e5ba464c@mail.gmail.com>
References: <9fcc48c70906031027h3381a93fya349fba8e5ba464c@mail.gmail.com>
Message-ID: <536f21b00906031111l4b02a846o6f281c536b77460d@mail.gmail.com>

Hi Shalabh,

Do you want them combined onto a single reference sequence?  I'm
guessing this is a circular microbial genome in two segments.  Do you
know how the coordinates in one GenBank file relate to the other
(or are you willing to make something up)?  I imagine the way I would
do it would be to convert both files to GFF and then write a quick
script to convert the coordinates and reference sequence name (column
1) of one file to be consistent with the other.

Scott
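
The "quick script" step could look roughly like the sketch below. It is an
illustration only: the sub name, the new reference name and the offset are
all assumptions, and the offset in particular has to come from knowing (or
deciding) where the second segment sits relative to the first.

```perl
use strict;
use warnings;

# Move one GFF feature line onto another reference sequence.
# $offset is an assumption: the amount to add so that the second
# segment's coordinates continue after the first segment.
sub shift_gff_line {
    my ($line, $new_ref, $offset) = @_;
    # Pass comment/directive lines and non-tabbed lines through unchanged.
    return $line if $line =~ /^#/ || $line !~ /\t/;
    my @col = split /\t/, $line, -1;
    $col[0]  = $new_ref;    # column 1: reference sequence name
    $col[3] += $offset;     # column 4: start
    $col[4] += $offset;     # column 5: end
    return join "\t", @col;
}

my $line = join "\t",
    'contig2', 'genbank', 'gene', 100, 250, '.', '+', '.', 'ID=gene1';
print shift_gff_line($line, 'contig1', 5000), "\n";
```

Run over a whole file, this would be something like
`while (<>) { print shift_gff_line($_, 'contig1', 5000) }`. Directive lines
such as "##sequence-region" are passed through untouched and would still
need adjusting by hand.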


On Wed, Jun 3, 2009 at 1:27 PM, shalabh sharma
<shalabh.sharma7 at gmail.com> wrote:
> Hi all, I am working on Roseobacters. Many times I've
> converted gbk file from GenBank to gff format but now one genome
> "Silicibacter lacuscaerulensis" does not have a gbk file instead it has two
> gbf files:
>
> https://research.venterinstitute.org/moore/SingleOrganism.do?speciesTag=SL1157&pageAttr=pageMain
>
> So now how i can convert this genome to one gff file so i can use it in
> gbrowse?
> I would really appreciate if anyone can help me out.
>
> Thanks
>


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From alperyilmaz at gmail.com  Fri Jun  5 18:50:46 2009
From: alperyilmaz at gmail.com (Alper Yilmaz)
Date: Fri, 5 Jun 2009 14:50:46 -0400
Subject: [Bioperl-l] GBroswe2 - feature details
Message-ID: <dac81b0d0906051150o116c8bcvd747b71189d7a722@mail.gmail.com>

Dear all,

I have a question about utilizing the tag/value pairs that are used
in the 9th column of GFF. If my 9th column is like this:

ID=BS_1;BS_Seq=cacatg;BS_Color=Purple;Name=AtMYC2 BS in RD22

How can I use the BS_Seq and BS_Color tags, say, in a balloon? If I want to
print the name and sequence of a BindingSite, what do I need to put in place
of the question marks below?

balloon hover = <font size=small color=red>Motif name: $name,
Sequence: ???????</font>


The manual mentions that it's possible to use user-defined
tag/value pairs, but I couldn't figure out how. The manual says
that:
 [feature_type:details]
 tag1 = formatting rule
 tag2 = formatting rule
 tag3 = formatting rule

can be used to adjust the formatting of a tag, but I don't know how this
can be used to assign a value to a tag. I tried:
[cis-elements:details]
bs_seq = <b>$value</b>     (I didn't use BS_Seq, since it was
mentioned, tags are case-insensitive)
 OR
$bs_seq = <b>$value</b>

but I cannot use $bs_seq in the hover link option after doing this. What
am I doing wrong?

thanks,

Alper Yilmaz
Post-doctoral Researcher
Plant Biotechnology Center
The Ohio State University
1060 Carmack Rd
Columbus, OH 43210
(614)688-4954
www.grassius.org


From cjfields at illinois.edu  Fri Jun  5 20:43:04 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 5 Jun 2009 15:43:04 -0500
Subject: [Bioperl-l] [Bioperl-guts-l] Bug in genbank.pm?
In-Reply-To: <52A1BBE501D0BD40AF1EC72C14EA255D0BE06A@MAILBOX-31.home.ku.edu>
References: <002b01c9e567$e09b0de0$a1d129a0$@edu>
	<A145C0B1-D2B3-47CB-BA46-DCCDD693D05F@illinois.edu>
	<52A1BBE501D0BD40AF1EC72C14EA255D0BE06A@MAILBOX-31.home.ku.edu>
Message-ID: <C29B8160-5682-48AF-BD9E-A5FF26EC679F@illinois.edu>

(Just so this is going to the correct list)

Marcos,

I'll look into it.  This may have been fixed in between the releases,  
though.

There isn't a PPM available for 1.6 yet (several prereqs, such as
Graphviz, were missing at the time of the 1.6 release).  A bug report
is in the queue for this as a reminder.  I think those prereqs are now
available, so we should *theoretically* be capable of getting a PPM
ready.  I say 'theoretically' b/c I don't have easy access to a PC
running Windows (I have moved to OS X).  I'll see what I can do about
that in the next few weeks.

In the meantime, if you need it you can download 1.6 or the 'nightly  
build' version (nightly snapshots of svn code) and add it to PERL5LIB  
or "use lib 'PATH_TO_BIOPERL';" in your scripts; it should work.

Nightly builds:

http://bioperl.org/DIST/nightly_builds/
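
As a rough illustration of the "use lib" route, the sketch below puts a
downloaded tree ahead of any installed copy and reports which Bio::SeqIO is
actually picked up. The directory name is a placeholder, not a real default;
substitute wherever the 1.6 or nightly archive was unpacked.

```perl
use strict;
use warnings;

# Hypothetical location of an unpacked BioPerl 1.6 or nightly-build tree.
use lib 'C:/bioperl/bioperl-live';

# Report which copy of Bio::SeqIO (if any) Perl finds first.
if (eval { require Bio::SeqIO; 1 }) {
    print "Bio::SeqIO loaded from $INC{'Bio/SeqIO.pm'}\n";
}
else {
    print "Bio::SeqIO not found; check the path given to 'use lib'\n";
}
```

If the reported path still points at the old PPM install, the directory
given to "use lib" is probably not the one that contains the Bio/ folder.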

chris

On Jun 4, 2009, at 10:17 PM, Barbeitos, Marcos wrote:

> OK, I attached the first record for both files.  These are GenBank  
> flat files that were emailed to us and transferred from Macs to PCs,  
> so I am not sure if the encoding/line terminations got messed up at  
> some point.  I converted the line terminations to Unix and the  
> encoding to Western European Windows, still, it didn't work. May be  
> worth it mention that BioEdit did understand the format after I  
> fixed the encoding.
>
> The data was erased because my boss is kind of finicky about sharing  
> information.  However, I tested the files attached to this email and  
> got the same results.
>
> I am still using Bio-Perl 1.5.2_100 in a PC, PPM has not flagged the  
> availability of an upgrade from CPAN, are you releasing the PPD as  
> well?
>
> Thanks!
>
> -----Original Message-----
> From: Chris Fields [mailto:cjfields at illinois.edu]
> Sent: Thu 6/4/2009 8:05 PM
> To: Barbeitos, Marcos
> Cc: bioperl-guts-l at lists.open-bio.org
> Subject: Re: [Bioperl-guts-l] Bug in genbank.pm?
>
> Marcos,
>
> We need the GenBank file (or the accession) you are attempting to
> parse.  Also, what version are you using?  We have released v. 1.6 on
> CPAN, and I intend on releasing 1.6.1 soon.
>
> chris
>
> On Jun 4, 2009, at 5:57 PM, Marcos S. Barbeitos wrote:
>
>> Hello.  I am trying to parse the Info from GeneBank flat files using
>> Bio::SeqIO.  I got two file which are virtually identical and one of
>> them
>> gets parsed just fine.  However, in the case of the other, the  
>> program
>> croaks when trying to parse the features and gives me:
>>
>>
>>
>> -------------------- WARNING ---------------------
>>
>> MSG: Unexpected error in feature table for  Skipping feature,
>> attempting to
>> recover
>>
>> ---------------------------------------------------
>>
>>
>>
>> I noticed that it does that after it reads the entry '/organism' in
>> Features.  The only difference I can see between the two files is the
>> presence of the feature ' /organelle' and of the line BASE COUNT in
>> one of
>> them, but the error persists even after I remove these lines.  Apart
>> from
>> that, there are the number of white spaces that precede the
>> beginning of
>> each line.   Any ideas?
>>
>>
>>
>> Thanks!
>>
>>
>>
>> Marcos S. Barbeitos
>>
>> Post-Doc Fellow
>>
>> The University of Kansas
>> Department of Ecology and Evolutionary Biology
>> 2041 Haworth Hall
>> 1200 Sunnyside Avenue
>> Lawrence, Kansas 66045
>> p: 785.864.5887
>> f: 785.864.5860
>>
>>
>>
>> _______________________________________________
>> Bioperl-guts-l mailing list
>> Bioperl-guts-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-guts-l
>
>
>
> <BioPerlTest.gb>


From Russell.Smithies at agresearch.co.nz  Sun Jun  7 20:32:27 2009
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Mon, 8 Jun 2009 08:32:27 +1200
Subject: [Bioperl-l] GBroswe2 - feature details
In-Reply-To: <dac81b0d0906051150o116c8bcvd747b71189d7a722@mail.gmail.com>
References: <dac81b0d0906051150o116c8bcvd747b71189d7a722@mail.gmail.com>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32493F1CA41@exchsth.agresearch.co.nz>

For the first part of your question, you can use a sub to access values in your annotations:

balloon hover = sub{my $f = shift;
			my %a = $f->attributes;
			my $name = $f->name;
			my $seq = $a{'BS_Seq'};
			return "<font size=small color=red>Motif name: $name, Sequence: $seq</font>" if defined $seq;
			return "<font size=small color=red>Motif name: $name, No sequence defined</font>";
			}


For the second bit, here are the formatting rules I'm using to create hyperlinks:

[Dbxref:DETAILS]
URL = sub {
      my ($tag,$value)=@_;
      if ($value =~ /NCBI_gi:(.+)/){
       return "http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?val=$1";
       }
      if ($value =~ /NCBI_Gene:(.+)/){
       return "http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=Retrieve&db=gene&list_uids=$1";
       }
       return;
     }

And this is what the gff looks like:
BTA10	refseq	mRNA	10011147	10176454	0	-	.	ID=NM_001076052;Name=NM_001076052;Index=1;Alias=HOMER1;Note=homer homolog 1 (Drosophila);Dbxref=NCBI_gi:115496957;Dbxref=NCBI_Gene:535311;
BTA10	refseq	mRNA	10241506	10301142	0	+	.	ID=NM_001046361;Name=NM_001046361;Index=1;Alias=PAPD4,MGC138008;Note=PAP associated domain containing 4;Dbxref=NCBI_gi:114052221;Dbxref=NCBI_Gene:533862;
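
To see what the URL rule does with the Dbxref values in lines like these, here is the same pattern matching as a stand-alone sketch that runs outside GBrowse. The name dbxref_url is just something picked for the example (it is not a GBrowse function), the column-9 string is a shortened copy of the first GFF line, and the URL patterns are copied from the rule above:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same matching as the [Dbxref:DETAILS] URL rule, written as a plain sub.
# dbxref_url() is an illustrative name, not part of GBrowse.
sub dbxref_url {
    my ($value) = @_;
    if ($value =~ /NCBI_gi:(.+)/) {
        return "http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?val=$1";
    }
    if ($value =~ /NCBI_Gene:(.+)/) {
        return "http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=Retrieve&db=gene&list_uids=$1";
    }
    return;    # no link for anything else
}

# Pull the Dbxref values out of the 9th GFF column and map each one.
my $col9 = 'ID=NM_001076052;Name=NM_001076052;Dbxref=NCBI_gi:115496957;Dbxref=NCBI_Gene:535311;';
for my $pair (split /;/, $col9) {
    my ($tag, $value) = split /=/, $pair, 2;
    next unless $tag eq 'Dbxref';
    my $url = dbxref_url($value);
    print "$value -> ", (defined $url ? $url : '(no link)'), "\n";
}
```

Each Dbxref attribute is handed to the rule separately, which is why one feature can carry both an NCBI_gi and an NCBI_Gene link.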

Hopefully, this will get you going :-)


Russell Smithies

Bioinformatics Applications Developer
T +64 3 489 9085
E russell.smithies at agresearch.co.nz

Invermay Research Centre
Puddle Alley,
Mosgiel,
New Zealand
T +64 3 489 3809
F +64 3 489 9174
www.agresearch.co.nz


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Alper Yilmaz
> Sent: Saturday, 6 June 2009 6:51 a.m.
> To: BioPerl List
> Subject: [Bioperl-l] GBroswe2 - feature details
> 
> Dear all,
> 
> I have a question about using the tag/value pairs from the 9th column
> of a GFF file. If my 9th column looks like this:
> 
> ID=BS_1;BS_Seq=cacatg;BS_Color=Purple;Name=AtMYC2 BS in RD22
> 
> how can I use the BS_Seq and BS_Color tags in, say, a balloon? If I want
> to print the name and sequence of a BindingSite, what do I need to put
> in place of the question marks below?
> 
> balloon hover = <font size=small color=red>Motif name: $name,
> Sequence: ???????</font>
> 
> 
> The manual mentions that it's possible to use user-defined tag/value
> pairs, but I couldn't figure out how. According to the manual,
>  [feature_type:details]
>  tag1 = formatting rule
>  tag2 = formatting rule
>  tag3 = formatting rule
> 
> can be used to adjust the formatting of a tag, but I don't know how this
> can be used to assign a value to a tag. I tried:
> [cis-elements:details]
> bs_seq = <b>$value</b>     (I didn't use BS_Seq, since the manual says
> tags are case-insensitive)
>  OR
> $bs_seq = <b>$value</b>
> 
> but I cannot use $bs_seq in the hover link option after doing this. What
> am I doing wrong?
> 
> thanks,
> 
> Alper Yilmaz
> Post-doctoral Researcher
> Plant Biotechnology Center
> The Ohio State University
> 1060 Carmack Rd
> Columbus, OH 43210
> (614)688-4954
> www.grassius.org


From bernd.jagla at pasteur.fr  Mon Jun  8 16:24:12 2009
From: bernd.jagla at pasteur.fr (Bernd Jagla)
Date: Mon, 8 Jun 2009 18:24:12 +0200
Subject: [Bioperl-l] Bio:Das 1.11 installation problem
Message-ID: <7A76102A29824B1C9DDB7F3B1DC45D41@zillumina>

Hi,

I am working on Mac OS X 10.5.7 and trying to install Bio::Das 1.11 using

perl -MCPAN -e 'install Bio::Das'

This is perl, v5.8.9 built for darwin-2level
(please let me know if you need anything else)

I get the following error:

not ok 3
not ok 4
Can't call method "description" on an undefined value at t/01das.t line 62.

When going into the source of 01das.t and printing out $db I get:

$VAR1 = \bless( {
                   'autotypes' => undef,
                   'default_dsn' => undef,
                   'autocategories' => undef,
                   'sockets' => {},
                   'aggregators' => [
                                      bless( {
                                               'sub_parts' => [
                                                                'coding_exon'
                                                              ],
                                               'require_whole_object' => undef,
                                               'main_method' => 'CDS',
                                               'method' => 'alignment'
                                             }, 'Bio::DB::GFF::Aggregator' ),
                                      bless( {
                                               'sub_parts' => [
                                                                'EST_match'
                                                              ],
                                               'require_whole_object' => undef,
                                               'main_method' => 'alignment',
                                               'method' => 'alignment'
                                             }, 'Bio::DB::GFF::Aggregator' )
                                    ],
                   'timeout' => undef,
                   'oldstyle_api' => 1,
                   'default_server' => 'http://www.wormbase.org/db/seq/das'
                 }, 'Bio::Das' );

@sources is empty, and test(3,@sources) fails.

Please advise.

Thanks,

Bernd

 
From lincoln.stein at gmail.com  Mon Jun  8 17:00:48 2009
From: lincoln.stein at gmail.com (Lincoln Stein)
Date: Mon, 8 Jun 2009 13:00:48 -0400
Subject: [Bioperl-l] Bio:Das 1.11 installation problem
In-Reply-To: <7A76102A29824B1C9DDB7F3B1DC45D41@zillumina>
References: <7A76102A29824B1C9DDB7F3B1DC45D41@zillumina>
Message-ID: <6dce9a0b0906081000j7c489fd3t3c5974794ad3e3e7@mail.gmail.com>

Hi,

The regression tests require an active Internet connection, as well as the
DAS test server being up and running. It may be there was a temporary
failure of one of those two. I just tested on my end and the regression
tests ran ok, so could you try it again?

Lincoln



-- 
Lincoln D. Stein
Director, Informatics and Biocomputing Platform
Ontario Institute for Cancer Research
101 College St., Suite 800
Toronto, ON, Canada M5G0A3
416 673-8514
Assistant: Renata Musa <Renata.Musa at oicr.on.ca>


From lsbrath at gmail.com  Mon Jun  8 20:28:46 2009
From: lsbrath at gmail.com (lsbrath at gmail.com)
Date: Mon, 08 Jun 2009 20:28:46 +0000
Subject: [Bioperl-l] fasta conversion
Message-ID: <000e0cd6aa4cd53993046bdc1675@google.com>

Hello!

I am running into trouble while trying to convert a text file to FASTA. It
should be simple enough, but I am getting a weird error message.

This is my script:

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
use File::Copy;
use Bio::SeqIO;


my $maid_dir = "C:/Documents and Settings/mgavi.brathwaite/Desktop/msa";
my $maid = '13063';

opendir my $dh, "$maid_dir"; # directory to search
my @files = readdir $dh;
#find the _fasta file
for my $f (@files){
    my $fa = $maid_dir."/".$maid."_hu_1kb.fa";
    my $r = $maid_dir."/".$maid."_hu_1kb.txt";
    open (my $in,$r);
    if($f=~ m/^(\d+)_hu_1kb/){ # convert to fasta

        print Dumper($f);
        my $hu_1kb = $maid.'_hu_1kb'; #file to convert
        my $in = Bio::SeqIO->new(-file => $r,
                                 -format => 'raw');
        my $out = Bio::SeqIO->new(-file => ">$fa",
                                  -format => 'Fasta');
        while ( my $seq = $in->next_seq()) {
            $out->write_seq($seq);
        }
    }
}

I keep getting the following error message:

-------------------- WARNING ---------------------
MSG: seq doesn't validate, mismatch is 13063
---------------------------------------------------

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Attempting to set the sequence to [13063HU] which does not look healthy
STACK: Error::throw
STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359
STACK: Bio::PrimarySeq::seq C:/Perl/site/lib/Bio/PrimarySeq.pm:258
STACK: Bio::PrimarySeq::new C:/Perl/site/lib/Bio/PrimarySeq.pm:210
STACK: Bio::Seq::new C:/Perl/site/lib/Bio/Seq.pm:484
STACK: Bio::Seq::SeqFactory::create C:/Perl/site/lib/Bio/Seq/SeqFactory.pm:116
STACK: C:/Perl/site/lib/Bio\SeqIO\raw.pm:119
-----------------------------------------------------------

Anyone out there that can help me solve this?


From kjaja27 at yahoo.com  Fri Jun  5 23:42:13 2009
From: kjaja27 at yahoo.com (kayj)
Date: Fri, 5 Jun 2009 16:42:13 -0700 (PDT)
Subject: [Bioperl-l]  finding SNPs in a given region
Message-ID: <23897107.post@talk.nabble.com>


Hi All,

Is there a way to find the SNPs in a given region? I have the start and end
base pair positions for several regions and would like to download the SNPs
that fall within each of them. Is that possible?

This is my first time using BioPerl, and any help will be greatly
appreciated.

Thanks



From kjaja27 at yahoo.com  Mon Jun  8 13:49:24 2009
From: kjaja27 at yahoo.com (kayj)
Date: Mon, 8 Jun 2009 06:49:24 -0700 (PDT)
Subject: [Bioperl-l]  How to extract SNPs
Message-ID: <23924432.post@talk.nabble.com>


Hi All,
I have several regions on the genome, each defined by its start and end
base pair positions. I am looking into using HapMap
(http://hapmart.hapmap.org/BioMart/martview) to extract the SNPs in these
regions for a given population. I am new to BioPerl, and any help will be
greatly appreciated.




From bernd at pasteur.fr  Mon Jun  8 20:31:57 2009
From: bernd at pasteur.fr (bernd at pasteur.fr)
Date: Mon, 8 Jun 2009 22:31:57 +0200 (CEST)
Subject: [Bioperl-l] Bio:Das 1.11 installation problem
In-Reply-To: <6dce9a0b0906081000j7c489fd3t3c5974794ad3e3e7@mail.gmail.com>
References: <7A76102A29824B1C9DDB7F3B1DC45D41@zillumina>
	<6dce9a0b0906081000j7c489fd3t3c5974794ad3e3e7@mail.gmail.com>
Message-ID: <47482.157.99.64.103.1244493117.squirrel@php.pasteur.fr>

I tested the connection with wget and everything works fine.
I suspect that our proxy might be the problem, but all variables are set
correctly (ftp_proxy, http_proxy and many more). I am not sure which
environment variables are being used...
I am not too familiar with all this and don't know where to look for the
right configuration.

Thanks,

Bernd



From maj at fortinbras.us  Mon Jun  8 21:12:03 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 8 Jun 2009 17:12:03 -0400
Subject: [Bioperl-l] fasta conversion
In-Reply-To: <000e0cd6aa4cd53993046bdc1675@google.com>
References: <000e0cd6aa4cd53993046bdc1675@google.com>
Message-ID: <4737A1AB29FA47AF8FF4913448F5FAA3@NewLife>

You're getting the sequence descriptor rather than the sequence in the return
from $in->next_seq. Read up on what the 'raw' format actually entails in the
Bio::SeqIO pod.
cheers MAJ
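
For what it's worth, the 'raw' format expects one bare sequence per line with no identifier line, which would fit the [13063HU] in the error if the first line of the .txt file is a label. Below is a minimal sketch in plain Perl of one way around that. It assumes (this is a guess about the file, nothing in the thread shows its contents) that the first non-blank line is a label and everything after it is sequence. text_to_fasta is a made-up helper name, not a BioPerl call, and the sample text is invented for the example:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Turn "label line + sequence lines" text into one FASTA record.
# ASSUMPTION: the first non-blank line is a label and the rest is sequence.
# text_to_fasta() is a hypothetical helper, not part of BioPerl.
sub text_to_fasta {
    my ($text, $width) = @_;
    $width ||= 60;
    my @lines = grep { /\S/ } split /\r?\n/, $text;
    return '' unless @lines;
    my $id = shift @lines;
    $id =~ s/^\s+|\s+$//g;                 # trim the label
    my $seq = join '', @lines;
    $seq =~ s/\s+//g;                      # drop embedded whitespace
    my $fasta = ">$id\n";
    $fasta .= "$1\n" while $seq =~ /(.{1,$width})/g;   # wrap the sequence
    return $fasta;
}

# Tiny in-memory example instead of a real file.
print text_to_fasta("13063HU\nACGTACGTAC\nGGTTAACC\n", 10);
```

If the file really is laid out like that, the same idea also works within Bio::SeqIO: read the label yourself, build the sequence object from the remaining lines, and hand it to the Fasta writer.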


From stefan.kirov at bms.com  Mon Jun  8 21:26:17 2009
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Mon, 08 Jun 2009 17:26:17 -0400
Subject: [Bioperl-l] Bio:Das 1.11 installation problem
In-Reply-To: <47482.157.99.64.103.1244493117.squirrel@php.pasteur.fr>
References: <7A76102A29824B1C9DDB7F3B1DC45D41@zillumina>
	<6dce9a0b0906081000j7c489fd3t3c5974794ad3e3e7@mail.gmail.com>
	<47482.157.99.64.103.1244493117.squirrel@php.pasteur.fr>
Message-ID: <4A2D81F9.8060509@bms.com>

Try to add this line

-proxy => 'http:<YOUR PROXY HERE>',

in t/01das.t where the Bio::Das object is created (I think line 41).
Hope this works for you, it did for me.

Stefan


From bernd.jagla at pasteur.fr  Tue Jun  9 07:05:47 2009
From: bernd.jagla at pasteur.fr (Bernd Jagla)
Date: Tue, 9 Jun 2009 09:05:47 +0200
Subject: [Bioperl-l] Bio:Das 1.11 installation problem
In-Reply-To: <4A2D81F9.8060509@bms.com>
References: <7A76102A29824B1C9DDB7F3B1DC45D41@zillumina><6dce9a0b0906081000j7c489fd3t3c5974794ad3e3e7@mail.gmail.com><47482.157.99.64.103.1244493117.squirrel@php.pasteur.fr>
	<4A2D81F9.8060509@bms.com>
Message-ID: <19FC487A25B6478FA4DE91B81A1FC52C@zillumina>

Great, that works!!!
But since I am using Bio::Das within GBrowse, I can't (and don't want to)
change those sources. I tried setting some environment variables, but that
doesn't seem to work either...
So far I have set the following:
FTP_PROXY=http://...
HTTP_PROXY=http://...
PROXYFTP=http://...
PROXYHTTP=http://...
ftp_proxy=http://...
http_proxy=http://...
PROXY=http://...

Any suggestions are welcome.

Thanks,

Bernd




From awitney at sgul.ac.uk  Tue Jun  9 11:20:35 2009
From: awitney at sgul.ac.uk (Adam Witney)
Date: Tue, 9 Jun 2009 12:20:35 +0100
Subject: [Bioperl-l] fetch gene sequence with EUtilities.pm
Message-ID: <78771A58-8FCC-46ED-BE4F-8B0915BD324D@sgul.ac.uk>

Hi,

I have been experimenting with the Bio::DB::EUtilities module, with
help from the Cookbook, but I can't seem to figure out how to get the
DNA sequence of a gene; all the examples seem to fetch protein
sequences.

How would I go about fetching a sequence using an Entrez GeneID?

thanks for any help

adam


From Kevin.M.Brown at asu.edu  Tue Jun  9 15:25:45 2009
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Tue, 9 Jun 2009 08:25:45 -0700
Subject: [Bioperl-l] Bio:Das 1.11 installation problem
In-Reply-To: <19FC487A25B6478FA4DE91B81A1FC52C@zillumina>
References: <7A76102A29824B1C9DDB7F3B1DC45D41@zillumina><6dce9a0b0906081000j7c489fd3t3c5974794ad3e3e7@mail.gmail.com><47482.157.99.64.103.1244493117.squirrel@php.pasteur.fr><4A2D81F9.8060509@bms.com>
	<19FC487A25B6478FA4DE91B81A1FC52C@zillumina>
Message-ID: <1A4207F8295607498283FE9E93B775B40604FBFB@EX02.asurite.ad.asu.edu>

Dumb question, but are you exporting the variables after you set them?

FTP_PROXY=http://...
HTTP_PROXY=http://...
export FTP_PROXY HTTP_PROXY 
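
A quick way to check this from the Perl side is to print what the process actually sees in %ENV; a variable that was set but not exported will simply be missing. This is only a rough sketch: visible_proxy_vars is a name made up for the example, the list of variable names is taken from this thread, and it says nothing about which of them (if any) Bio::Das itself consults.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the named variables that are set (and non-empty) in the given
# environment hash; defaults to this process's own %ENV.
# visible_proxy_vars() is an illustrative helper, not a library function.
sub visible_proxy_vars {
    my ($names, $env) = @_;
    $env ||= \%ENV;
    return map  { ($_ => $env->{$_}) }
           grep { defined $env->{$_} && length $env->{$_} } @$names;
}

my @names = qw(http_proxy HTTP_PROXY ftp_proxy FTP_PROXY);
my %seen  = visible_proxy_vars(\@names);

if (%seen) {
    print "$_=$seen{$_}\n" for sort keys %seen;
} else {
    print "no proxy variables visible to this process\n";
}
```

Run it from the same shell (or the same web server environment) that starts GBrowse, since that is the environment that matters.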

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Bernd Jagla
> Sent: Tuesday, June 09, 2009 12:06 AM
> To: 'Stefan Kirov'; bernd at pasteur.fr
> Cc: 'Lincoln Stein'; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bio:Das 1.11 installation problem
> 
> Great, that works!!!
> But since I am using Bio::Das within GBrowse I can't/don't 
> want to  change
> those sources. I tried setting some environment variable but 
> that doesn't
> seem to work either...
> So far I have the set the following:
> FTP_PROXY=http://...
> HTTP_PROXY=http://...
> PROXYFTP=http://...
> PROXYHTTP=http://...
> ftp_proxy=http://...
> http_proxy=http://...
> PROXY=http://...
> 
> Any suggestions are welcome.
> 
> Thanks,
> 
> Bernd
> 
> 
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Stefan Kirov
> Sent: Monday, June 08, 2009 11:26 PM
> To: bernd at pasteur.fr
> Cc: Lincoln Stein; Bernd Jagla; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bio:Das 1.11 installation problem
> 
> bernd at pasteur.fr wrote:
> Try to add this line
> -proxy => 'http:<YOUR PROXY HERE>',
> in t/01das.t where the Bio::Das object is created (I think line 41).
> Hope this works for you, it did for me.
> Stefan
> > I tested the connection with wget and everything works fine.
> > I suspect that our proxy might be the problem but all 
> variables are set
> > correctly (ftp_proxy, http_proxy and many more) I am not sure which
> > environment variable are being used...
> > I am not too familiar with all this and don't know where to 
> look for the
> > right configurations.
> >
> > Thanks,
> >
> > Bernd
> >
> >   
> >> Hi,
> >>
> >> The regression tests require an active Internet 
> connection, as well as
> the
> >> DAS test server being up and running. It may be there was 
> a temporary
> >> failure of one of those two. I just tested on my end and 
> the regression
> >> tests ran ok, so could you try it again?
> >>
> >> Lincoln
> >>
> >> On Mon, Jun 8, 2009 at 12:24 PM, Bernd Jagla 
> <bernd.jagla at pasteur.fr>
> >> wrote:
> >>
> >>     
> >>> Hi,
> >>>
> >>>
> >>>
> >>> I am working on a MAC 10.5.7; try to install Bio::Das 
> using perl -MCPAN
> >>> -e
> >>> 'install Bio::Das'
> >>> This is perl, v5.8.9 built for darwin-2level
> >>> (please let me know if you need anything else)
> >>>
> >>>
> >>>
> >>> I am trying to install Bio::Das 1.11
> >>>
> >>>
> >>>
> >>> I get the following error:
> >>>
> >>>
> >>>
> >>> not ok 3
> >>>
> >>> not ok 4
> >>>
> >>> Can't call method "description" on an undefined value at 
> t/01das.t line
> >>> 62.
> >>>
> >>>
> >>>
> >>> When going into the sources for 01das.t and printing out 
> $db I get:
> >>>
> >>>
> >>>
> >>> $VAR1 = \bless( {
> >>>
> >>>                   'autotypes' => undef,
> >>>
> >>>                   'default_dsn' => undef,
> >>>
> >>>                   'autocategories' => undef,
> >>>
> >>>                   'sockets' => {},
> >>>
> >>>                   'aggregators' => [
> >>>
> >>>                                      bless( {
> >>>
> >>>                                               'sub_parts' => [
> >>>
> >>>
> >>> 'coding_exon'
> >>>
> >>>                                                              ],
> >>>
> >>>                                               
> 'require_whole_object' =>
> >>> undef,
> >>>
> >>>                                               
> 'main_method' => 'CDS',
> >>>
> >>>                                               'method' => 
> 'alignment'
> >>>
> >>>                                             },
> >>> 'Bio::DB::GFF::Aggregator'
> >>> ),
> >>>
> >>>                                      bless( {
> >>>
> >>>                                               'sub_parts' => [
> >>>
> >>>
> 'EST_match'
> >>>
> >>>                                                              ],
> >>>
> >>>                                               
> 'require_whole_object' =>
> >>> undef,
> >>>
> >>>                                               'main_method' =>
> >>> 'alignment',
> >>>
> >>>                                               'method' => 
> 'alignment'
> >>>
> >>>                                             },
> >>> 'Bio::DB::GFF::Aggregator' )
> >>>
> >>>                                    ],
> >>>
> >>>                   'timeout' => undef,
> >>>
> >>>                   'oldstyle_api' => 1,
> >>>
> >>>                   'default_server' =>
> >>> 'http://www.wormbase.org/db/seq/das'
> >>>
> >>>                 }, 'Bio::Das' );
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> @sources is empty
> >>>
> >>> And test(3, @sources) fails.
> >>>
> >>>
> >>>
> >>> Please advise.
> >>>
> >>>
> >>>
> >>> Thanks,
> >>>
> >>>
> >>>
> >>> Bernd
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> _______________________________________________
> >>> Bioperl-l mailing list
> >>> Bioperl-l at lists.open-bio.org
> >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>
> >>>       
> >>
> >> --
> >> Lincoln D. Stein
> >> Director, Informatics and Biocomputing Platform
> >> Ontario Institute for Cancer Research
> >> 101 College St., Suite 800
> >> Toronto, ON, Canada M5G0A3
> >> 416 673-8514
> >> Assistant: Renata Musa <Renata.Musa at oicr.on.ca>
> >>
> >>     


From cjfields at illinois.edu  Tue Jun  9 16:08:46 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 9 Jun 2009 11:08:46 -0500
Subject: [Bioperl-l] use of no_* to mean 'number_of', negative booleans
Message-ID: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu>

All,

I've noticed a few methods in bioperl with names like 'no_Foo' that
mean 'number of Foo' (such as SimpleAlign's no_sequences).  The
problem I foresee is ambiguity, particularly with negative boolean
checks (e.g. 'no_Foo' could also mean 'this instance contains no
Foo'), a naming pattern that BioPerl also uses for various settings.

I suggest we alias these as num_* to disambiguate that.  There's no  
easy way to change already in-place flag setting w/o going through a  
deprecation cycle, but we can promote using positive booleans where  
possible (eg 'is_foo' or 'has_foo' instead of 'no_foo').  We can leave  
the older 'no_*' methods as is for the time being and maybe deprecate  
them later.
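
For anyone unfamiliar with how the aliasing would work, here is a minimal sketch. My::Align is a made-up stand-in class, not SimpleAlign itself, and the method bodies are placeholders; the only point is that the old and new names share one code ref.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package My::Align;    # hypothetical class, not Bio::SimpleAlign

sub new {
    my ($class, @seqs) = @_;
    return bless { seqs => [@seqs] }, $class;
}

# Preferred, unambiguous name.
sub num_sequences {
    my $self = shift;
    return scalar @{ $self->{seqs} };
}

# Old name kept as an alias: both names now refer to the same sub.
{
    no warnings 'once';    # the alias name appears only in this assignment
    *no_sequences = \&num_sequences;
}

package main;

my $aln = My::Align->new(qw(ACGT ACGA));
print $aln->num_sequences, "\n";    # new name
print $aln->no_sequences,  "\n";    # old name, same sub underneath
```

Since both names resolve to the same sub, existing callers keep working unchanged until a deprecation cycle starts.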

If no one has objections I'll add these in as needed.

chris


From SMarkel at accelrys.com  Tue Jun  9 16:26:08 2009
From: SMarkel at accelrys.com (Scott Markel)
Date: Tue, 9 Jun 2009 12:26:08 -0400
Subject: [Bioperl-l] use of no_* to mean 'number_of', negative booleans
In-Reply-To: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu>
References: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu>
Message-ID: <1F1240778FB0AF46B4E5A72C44D2C7472A636328@exch1-hi.accelrys.net>

Chris,

I just checked our code for the Sequence Analysis Collection in
Pipeline Pilot.  We've got a few places we'd need to make code
changes, but we like your suggestion.  So, no objections from us.

Scott

Scott Markel, Ph.D.
Principal Bioinformatics Architect  email:  smarkel at accelrys.com
Accelrys (SciTegic R&D)             mobile: +1 858 205 3653
10188 Telesis Court, Suite 100      voice:  +1 858 799 5603
San Diego, CA 92121                 fax:    +1 858 799 5222
USA                                 web:    http://www.accelrys.com

http://www.linkedin.com/in/smarkel
Vice President, Board of Directors:
    International Society for Computational Biology
Co-chair: ISCB Publications Committee
Associate Editor: PLoS Computational Biology
Editorial Board: Briefings in Bioinformatics


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Chris Fields
> Sent: Tuesday, 09 June 2009 9:09 AM
> To: BioPerl List
> Subject: [Bioperl-l] use of no_* to mean 'number_of', negative booleans
> 
> All,
> 
> I've noticed a few methods in bioperl with names like 'no_Foo' that
> mean 'number of Foo' (such as SimpleAlign's no_sequences).  The
> problem I foresee are possible ambiguities, particularly with negative
> boolean checks (eg 'no_Foo' could also mean 'this instance contains no
> Foo'), something that BioPerl also has with various settings.
> 
> I suggest we alias these as num_* to disambiguate that.  There's no
> easy way to change already in-place flag setting w/o going through a
> deprecation cycle, but we can promote using positive booleans where
> possible (eg 'is_foo' or 'has_foo' instead of 'no_foo').  We can leave
> the older 'no_*' methods as is for the time being and maybe deprecate
> them later.
> 
> If no one has objections I'll add these in as needed.
> 
> chris


From cjfields at illinois.edu  Tue Jun  9 17:03:16 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 9 Jun 2009 12:03:16 -0500
Subject: [Bioperl-l] use of no_* to mean 'number_of', negative booleans
In-Reply-To: <1F1240778FB0AF46B4E5A72C44D2C7472A636328@exch1-hi.accelrys.net>
References: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu>
	<1F1240778FB0AF46B4E5A72C44D2C7472A636328@exch1-hi.accelrys.net>
Message-ID: <A5461F02-AA81-4A02-88DA-181B33EE41FE@illinois.edu>

I don't think it would require code changes right away; for the time  
being no_* will just alias num_*.  We can probably have deprecation  
warnings activate when we reach a particular version.

chris

On Jun 9, 2009, at 11:26 AM, Scott Markel wrote:

> Chris,
>
> I just checked our code for the Sequence Analysis Collection in
> Pipeline Pilot.  We've got a few places we'd need to make code
> changes, but we like your suggestion.  So, no objections from us.
>
> Scott
>
> Scott Markel, Ph.D.
> Principal Bioinformatics Architect  email:  smarkel at accelrys.com
> Accelrys (SciTegic R&D)             mobile: +1 858 205 3653
> 10188 Telesis Court, Suite 100      voice:  +1 858 799 5603
> San Diego, CA 92121                 fax:    +1 858 799 5222
> USA                                 web:    http://www.accelrys.com
>
> http://www.linkedin.com/in/smarkel
> Vice President, Board of Directors:
>    International Society for Computational Biology
> Co-chair: ISCB Publications Committee
> Associate Editor: PLoS Computational Biology
> Editorial Board: Briefings in Bioinformatics
>
>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>> bounces at lists.open-bio.org] On Behalf Of Chris Fields
>> Sent: Tuesday, 09 June 2009 9:09 AM
>> To: BioPerl List
>> Subject: [Bioperl-l] use of no_* to mean 'number_of', negative  
>> booleans
>>
>> All,
>>
>> I've noticed a few methods in bioperl with names like 'no_Foo' that
>> mean 'number of Foo' (such as SimpleAlign's no_sequences).  The
>> problem I foresee are possible ambiguities, particularly with  
>> negative
>> boolean checks (eg 'no_Foo' could also mean 'this instance contains  
>> no
>> Foo'), something that BioPerl also has with various settings.
>>
>> I suggest we alias these as num_* to disambiguate that.  There's no
>> easy way to change already in-place flag setting w/o going through a
>> deprecation cycle, but we can promote using positive booleans where
>> possible (eg 'is_foo' or 'has_foo' instead of 'no_foo').  We can  
>> leave
>> the older 'no_*' methods as is for the time being and maybe deprecate
>> them later.
>>
>> If no one has objections I'll add these in as needed.
>>
>> chris


From maj at fortinbras.us  Tue Jun  9 16:32:51 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 9 Jun 2009 12:32:51 -0400
Subject: [Bioperl-l] use of no_* to mean 'number_of', negative booleans
In-Reply-To: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu>
References: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu>
Message-ID: <4BA7FB5466B34B59B7C455E1173C1FA7@NewLife>

+1, absolutely- MAJ
----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "BioPerl List" <bioperl-l at lists.open-bio.org>
Sent: Tuesday, June 09, 2009 12:08 PM
Subject: [Bioperl-l] use of no_* to mean 'number_of', negative booleans


> All,
> 
> I've noticed a few methods in bioperl with names like 'no_Foo' that  
> mean 'number of Foo' (such as SimpleAlign's no_sequences).  The  
> problem I foresee are possible ambiguities, particularly with negative  
> boolean checks (eg 'no_Foo' could also mean 'this instance contains no  
> Foo'), something that BioPerl also has with various settings.
> 
> I suggest we alias these as num_* to disambiguate that.  There's no  
> easy way to change already in-place flag setting w/o going through a  
> deprecation cycle, but we can promote using positive booleans where  
> possible (eg 'is_foo' or 'has_foo' instead of 'no_foo').  We can leave  
> the older 'no_*' methods as is for the time being and maybe deprecate  
> them later.
> 
> If no one has objections I'll add these in as needed.
> 
> chris


From hlapp at gmx.net  Tue Jun  9 17:18:05 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 9 Jun 2009 13:18:05 -0400
Subject: [Bioperl-l] use of no_* to mean 'number_of', negative booleans
In-Reply-To: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu>
References: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu>
Message-ID: <B23E5145-0DBB-4C3D-8536-063B3E9711DF@gmx.net>

Great suggestions, I'm all for it.

	-hilmar

On Jun 9, 2009, at 12:08 PM, Chris Fields wrote:

> All,
>
> I've noticed a few methods in bioperl with names like 'no_Foo' that  
> mean 'number of Foo' (such as SimpleAlign's no_sequences).  The  
> problem I foresee are possible ambiguities, particularly with  
> negative boolean checks (eg 'no_Foo' could also mean 'this instance  
> contains no Foo'), something that BioPerl also has with various  
> settings.
>
> I suggest we alias these as num_* to disambiguate that.  There's no  
> easy way to change already in-place flag setting w/o going through a  
> deprecation cycle, but we can promote using positive booleans where  
> possible (eg 'is_foo' or 'has_foo' instead of 'no_foo').  We can  
> leave the older 'no_*' methods as is for the time being and maybe  
> deprecate them later.
>
> If no one has objections I'll add these in as needed.
>
> chris

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From florent.angly at gmail.com  Tue Jun  9 18:41:51 2009
From: florent.angly at gmail.com (Florent Angly)
Date: Tue, 09 Jun 2009 11:41:51 -0700
Subject: [Bioperl-l] use of no_* to mean 'number_of', negative booleans
In-Reply-To: <B23E5145-0DBB-4C3D-8536-063B3E9711DF@gmx.net>
References: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu>
	<B23E5145-0DBB-4C3D-8536-063B3E9711DF@gmx.net>
Message-ID: <4A2EACEF.3090809@gmail.com>

Agree! no_* is prone to misunderstandings.
Also, some BioPerl code uses nof_*, which I quite like.
Florent

Hilmar Lapp wrote:
> Great suggestions, I'm all for it.
>
>     -hilmar
>
> On Jun 9, 2009, at 12:08 PM, Chris Fields wrote:
>
>> All,
>>
>> I've noticed a few methods in bioperl with names like 'no_Foo' that 
>> mean 'number of Foo' (such as SimpleAlign's no_sequences).  The 
>> problem I foresee are possible ambiguities, particularly with 
>> negative boolean checks (eg 'no_Foo' could also mean 'this instance 
>> contains no Foo'), something that BioPerl also has with various 
>> settings.
>>
>> I suggest we alias these as num_* to disambiguate that.  There's no 
>> easy way to change already in-place flag setting w/o going through a 
>> deprecation cycle, but we can promote using positive booleans where 
>> possible (eg 'is_foo' or 'has_foo' instead of 'no_foo').  We can 
>> leave the older 'no_*' methods as is for the time being and maybe 
>> deprecate them later.
>>
>> If no one has objections I'll add these in as needed.
>>
>> chris


From cjfields at illinois.edu  Tue Jun  9 18:55:48 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 9 Jun 2009 13:55:48 -0500
Subject: [Bioperl-l] use of no_* to mean 'number_of', negative booleans
In-Reply-To: <4A2EACEF.3090809@gmail.com>
References: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu>
	<B23E5145-0DBB-4C3D-8536-063B3E9711DF@gmx.net>
	<4A2EACEF.3090809@gmail.com>
Message-ID: <FF3C4540-ED73-4878-92B9-5D6A610D3C4E@illinois.edu>

We could probably alias nof_* with num_* just for consistency, but  
leave nof_* as is and not deprecate it (I don't think anyone would  
confuse nof* with no*).

chris

On Jun 9, 2009, at 1:41 PM, Florent Angly wrote:

> Agree! no_* is prone to misunderstandings.
> Also, some BioPerl code uses nof_*, which I quite like.
> Florent
>
> Hilmar Lapp wrote:
>> Great suggestions, I'm all for it.
>>
>>    -hilmar
>>
>> On Jun 9, 2009, at 12:08 PM, Chris Fields wrote:
>>
>>> All,
>>>
>>> I've noticed a few methods in bioperl with names like 'no_Foo'  
>>> that mean 'number of Foo' (such as SimpleAlign's no_sequences).   
>>> The problem I foresee are possible ambiguities, particularly with  
>>> negative boolean checks (eg 'no_Foo' could also mean 'this  
>>> instance contains no Foo'), something that BioPerl also has with  
>>> various settings.
>>>
>>> I suggest we alias these as num_* to disambiguate that.  There's  
>>> no easy way to change already in-place flag setting w/o going  
>>> through a deprecation cycle, but we can promote using positive  
>>> booleans where possible (eg 'is_foo' or 'has_foo' instead of  
>>> 'no_foo').  We can leave the older 'no_*' methods as is for the  
>>> time being and maybe deprecate them later.
>>>
>>> If no one has objections I'll add these in as needed.
>>>
>>> chris


From mauricio at open-bio.org  Tue Jun  9 19:33:18 2009
From: mauricio at open-bio.org (Mauricio Herrera Cuadra)
Date: Tue, 09 Jun 2009 14:33:18 -0500
Subject: [Bioperl-l] Project Help
In-Reply-To: <146497.36250.qm@web8407.mail.in.yahoo.com>
References: <146497.36250.qm@web8407.mail.in.yahoo.com>
Message-ID: <4A2EB8FE.4080402@open-bio.org>

Hi Chirag,

The OBF applied for the GSoC 2009 but unfortunately we were not 
accepted. However, other organizations/projects made their way into it 
and have been kind enough to adopt some of the ideas originally proposed 
under the OBF's initiative. I'm Cc'ing this to the BioPerl mailing list 
so the people involved with those projects can give you more details.

Regards,
Mauricio.


chirag matkar wrote:
> Hello,
> This is Chirag Matkar, wanting to know whether there were any GSoC 2009 projects underway in the Open Bioinformatics Foundation.
> Also, as I am a Perl developer myself, can I get a stipend or internship for building Perl modules?
> 
> Thanking You,
> Regards Chirag.
> 


From rmb32 at cornell.edu  Tue Jun  9 19:12:54 2009
From: rmb32 at cornell.edu (Robert Buels)
Date: Tue, 09 Jun 2009 12:12:54 -0700
Subject: [Bioperl-l] use of no_* to mean 'number_of', negative booleans
In-Reply-To: <FF3C4540-ED73-4878-92B9-5D6A610D3C4E@illinois.edu>
References: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu>	<B23E5145-0DBB-4C3D-8536-063B3E9711DF@gmx.net>	<4A2EACEF.3090809@gmail.com>
	<FF3C4540-ED73-4878-92B9-5D6A610D3C4E@illinois.edu>
Message-ID: <4A2EB436.8020506@cornell.edu>

Why not just add deprecation warnings now?  Or you could add deprecation 
warnings now that only print if $Bio::Root::Version::VERSION >= 
something.  Best to do it while one is thinking about it, I always say. 
  Cause I always forget to do it later.  ;-)

Rob

Chris Fields wrote:
> We could probably alias nof_* with num_* just for consistency, but leave 
> nof_* as is and not deprecate it (I don't think anyone would confuse 
> nof* with no*).
> 
> chris
> 
> On Jun 9, 2009, at 1:41 PM, Florent Angly wrote:
> 
>> Agree! no_* is prone to misunderstandings.
>> Also, some BioPerl code uses nof_*, which I quite like.
>> Florent
>>
>> Hilmar Lapp wrote:
>>> Great suggestions, I'm all for it.
>>>
>>>    -hilmar
>>>
>>> On Jun 9, 2009, at 12:08 PM, Chris Fields wrote:
>>>
>>>> All,
>>>>
>>>> I've noticed a few methods in bioperl with names like 'no_Foo' that 
>>>> mean 'number of Foo' (such as SimpleAlign's no_sequences).  The 
>>>> problem I foresee are possible ambiguities, particularly with 
>>>> negative boolean checks (eg 'no_Foo' could also mean 'this instance 
>>>> contains no Foo'), something that BioPerl also has with various 
>>>> settings.
>>>>
>>>> I suggest we alias these as num_* to disambiguate that.  There's no 
>>>> easy way to change already in-place flag setting w/o going through a 
>>>> deprecation cycle, but we can promote using positive booleans where 
>>>> possible (eg 'is_foo' or 'has_foo' instead of 'no_foo').  We can 
>>>> leave the older 'no_*' methods as is for the time being and maybe 
>>>> deprecate them later.
>>>>
>>>> If no one has objections I'll add these in as needed.
>>>>
>>>> chris


-- 
Robert Buels
Bioinformatics Analyst, Sol Genomics Network
Boyce Thompson Institute for Plant Research
Tower Rd
Ithaca, NY  14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu


From cjfields at illinois.edu  Tue Jun  9 20:19:03 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 9 Jun 2009 15:19:03 -0500
Subject: [Bioperl-l] use of no_* to mean 'number_of', negative booleans
In-Reply-To: <4A2EB436.8020506@cornell.edu>
References: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu>	<B23E5145-0DBB-4C3D-8536-063B3E9711DF@gmx.net>	<4A2EACEF.3090809@gmail.com>
	<FF3C4540-ED73-4878-92B9-5D6A610D3C4E@illinois.edu>
	<4A2EB436.8020506@cornell.edu>
Message-ID: <EFB0C00A-49CA-49BD-A22D-4B23D0C09E7B@illinois.edu>

On Jun 9, 2009, at 2:12 PM, Robert Buels wrote:

> Why not just add deprecation warnings now?  Or you could add  
> deprecation warnings now that only print if  
> $Bio::Root::Version::VERSION >= something.  Best to do it while one  
> is thinking about it, I always say.  Cause I always forget to do it  
> later.  ;-)
>
> Rob

Actually, that's one thing I want to implement within Root, namely the  
ability to do this:

$self->deprecated(-message     => 'method Foo is deprecated',
                   -start_ver   => $version1,
                   -throw_ver   => $version2
);

So it's essentially a no-op and invisible below start_ver, warns from
start_ver onward, and throws once the version reaches throw_ver.  I
could probably finagle that in w/o destroying things...
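
A rough sketch of the behaviour I have in mind, not the eventual Root implementation. My::Root, My::Align, no_foo/num_foo and the version numbers are all invented for illustration, and versions are assumed to be plain numbers that compare with >=.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package My::Root;    # hypothetical base class standing in for Root

our $VERSION = 1.006;    # assumed numeric library version

sub new { return bless {}, shift }

# deprecated(-message => ..., -start_ver => ..., -throw_ver => ...)
# Silent below -start_ver, warns from -start_ver, dies from -throw_ver.
sub deprecated {
    my ($self, %args) = @_;
    my $msg   = $args{-message} || 'deprecated method called';
    my $start = $args{-start_ver};
    my $throw = $args{-throw_ver};

    die  "$msg\n" if defined $throw && $VERSION >= $throw;
    warn "$msg\n" if defined $start && $VERSION >= $start;
    return;
}

package My::Align;    # hypothetical subclass with an old-style method name
our @ISA = ('My::Root');

sub num_foo { return 3 }

sub no_foo {
    my $self = shift;
    $self->deprecated(-message   => 'no_foo() is deprecated, use num_foo()',
                      -start_ver => 1.007,
                      -throw_ver => 1.009);
    return $self->num_foo;
}

package main;

my $obj = My::Align->new;
print $obj->no_foo, "\n";    # silent while $My::Root::VERSION is below 1.007
```

The call can sit in trunk today and only becomes visible when the version number catches up with it.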

chris

> Chris Fields wrote:
>> We could probably alias nof_* with num_* just for consistency, but  
>> leave nof_* as is and not deprecate it (I don't think anyone would  
>> confuse nof* with no*).
>> chris
>> On Jun 9, 2009, at 1:41 PM, Florent Angly wrote:
>>> Agree! no_* is prone to misunderstandings.
>>> Also, some BioPerl code uses nof_*, which I quite like.
>>> Florent
>>>
>>> Hilmar Lapp wrote:
>>>> Great suggestions, I'm all for it.
>>>>
>>>>   -hilmar
>>>>
>>>> On Jun 9, 2009, at 12:08 PM, Chris Fields wrote:
>>>>
>>>>> All,
>>>>>
>>>>> I've noticed a few methods in bioperl with names like 'no_Foo'  
>>>>> that mean 'number of Foo' (such as SimpleAlign's no_sequences).   
>>>>> The problem I foresee are possible ambiguities, particularly  
>>>>> with negative boolean checks (eg 'no_Foo' could also mean 'this  
>>>>> instance contains no Foo'), something that BioPerl also has with  
>>>>> various settings.
>>>>>
>>>>> I suggest we alias these as num_* to disambiguate that.  There's  
>>>>> no easy way to change already in-place flag setting w/o going  
>>>>> through a deprecation cycle, but we can promote using positive  
>>>>> booleans where possible (eg 'is_foo' or 'has_foo' instead of  
>>>>> 'no_foo').  We can leave the older 'no_*' methods as is for the  
>>>>> time being and maybe deprecate them later.
>>>>>
>>>>> If no one has objections I'll add these in as needed.
>>>>>
>>>>> chris
>
> -- 
> Robert Buels
> Bioinformatics Analyst, Sol Genomics Network
> Boyce Thompson Institute for Plant Research
> Tower Rd
> Ithaca, NY  14853
> Tel: 503-889-8539
> rmb32 at cornell.edu
> http://www.sgn.cornell.edu


From cjfields at illinois.edu  Tue Jun  9 20:45:37 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 9 Jun 2009 15:45:37 -0500
Subject: [Bioperl-l] deprecated(), was Re:  use of no_* to mean 'number_of',
	negative booleans
In-Reply-To: <EFB0C00A-49CA-49BD-A22D-4B23D0C09E7B@illinois.edu>
References: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu>	<B23E5145-0DBB-4C3D-8536-063B3E9711DF@gmx.net>	<4A2EACEF.3090809@gmail.com>
	<FF3C4540-ED73-4878-92B9-5D6A610D3C4E@illinois.edu>
	<4A2EB436.8020506@cornell.edu>
	<EFB0C00A-49CA-49BD-A22D-4B23D0C09E7B@illinois.edu>
Message-ID: <E2D3CA55-A48A-424D-9AC3-24F7C44A9548@illinois.edu>

On Jun 9, 2009, at 3:19 PM, Chris Fields wrote:

> On Jun 9, 2009, at 2:12 PM, Robert Buels wrote:
>
>> Why not just add deprecation warnings now?  Or you could add  
>> deprecation warnings now that only print if  
>> $Bio::Root::Version::VERSION >= something.  Best to do it while one  
>> is thinking about it, I always say.  Cause I always forget to do it  
>> later.  ;-)
>>
>> Rob
>
> Actually, that's one thing I want to implement within Root, namely  
> the ability to do this:
>
> $self->deprecated(-message     => 'method Foo is deprecated',
>                  -start_ver   => $version1,
>                  -throw_ver   => $version2
> );
>
> So it's essentially a noop and invisible up to start_ver (upon where  
> it warns), then throws after, well, throw_ver.  I could probably  
> finagle that in w/o destroying things...
>
> chris

Just to note, this is mainly to allow us devs the opportunity to add  
these to main trunk w/o having to worry about merges over to the 1.6  
branch (where the version is different).  We don't want the dep  
warnings showing up there right away, but maybe in a point release or  
minor version.

chris


From hlapp at gmx.net  Tue Jun  9 23:09:26 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 9 Jun 2009 19:09:26 -0400
Subject: [Bioperl-l] Project Help
In-Reply-To: <4A2EB8FE.4080402@open-bio.org>
References: <146497.36250.qm@web8407.mail.in.yahoo.com>
	<4A2EB8FE.4080402@open-bio.org>
Message-ID: <74C0D011-A5A4-4DF1-93D8-13401A18E29A@gmx.net>

Hi Chirag,

check out the Bio{Perl,Python,Ruby}-related projects (go to 'Accepted  
Projects') at

http://hackathon.nescent.org/Phyloinformatics_Summer_of_Code_2009

	-hilmar

On Jun 9, 2009, at 3:33 PM, Mauricio Herrera Cuadra wrote:

> Hi Chirag,
>
> The OBF applied for the GSoC 2009 but unfortunately we were not  
> accepted. However, other organizations/projects made their way into  
> it and have been kind enough to adopt some of the ideas originally  
> proposed under the OBF's initiative. I'm Cc'ing this to the BioPerl  
> mailing list so the people involved with those projects can give you  
> more details.
>
> Regards,
> Mauricio.
>
>
> chirag matkar wrote:
>> Hello,
>> This is Chirag Matkar, wanting to know whether there were any GSoC
>> 2009 projects underway in the Open Bioinformatics Foundation.
>> Also, as I am a Perl developer myself, can I get a stipend or
>> internship for building Perl modules?
>> Thanking You,
>> Regards Chirag.

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From rmb32 at cornell.edu  Wed Jun 10 01:13:36 2009
From: rmb32 at cornell.edu (Robert Buels)
Date: Tue, 09 Jun 2009 18:13:36 -0700
Subject: [Bioperl-l] deprecated(),
 was Re:  use of no_* to mean 'number_of', negative booleans
In-Reply-To: <E2D3CA55-A48A-424D-9AC3-24F7C44A9548@illinois.edu>
References: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu>	<B23E5145-0DBB-4C3D-8536-063B3E9711DF@gmx.net>	<4A2EACEF.3090809@gmail.com>	<FF3C4540-ED73-4878-92B9-5D6A610D3C4E@illinois.edu>	<4A2EB436.8020506@cornell.edu>	<EFB0C00A-49CA-49BD-A22D-4B23D0C09E7B@illinois.edu>
	<E2D3CA55-A48A-424D-9AC3-24F7C44A9548@illinois.edu>
Message-ID: <4A2F08C0.3010609@cornell.edu>

Chris Fields wrote:
>> Actually, that's one thing I want to implement within Root, namely the 
>> ability to do this:
>>
>> $self->deprecated(-message     => 'method Foo is deprecated',
>>                  -start_ver   => $version1,
>>                  -throw_ver   => $version2
>> );

Here's a patch with tests against the svn trunk head.  Is this what you 
had in mind?

-- 
Rob
-------------- next part --------------
A non-text attachment was scrubbed...
Name: deprecated.patch
Type: text/x-diff
Size: 5601 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20090609/431738da/attachment-0004.bin>

From cjfields at illinois.edu  Wed Jun 10 02:54:47 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 9 Jun 2009 21:54:47 -0500
Subject: [Bioperl-l] deprecated(),
	was Re:  use of no_* to mean 'number_of', negative booleans
In-Reply-To: <4A2F08C0.3010609@cornell.edu>
References: <02E6233D-52CF-4853-B4F0-E2AFC2FA3557@illinois.edu>	<B23E5145-0DBB-4C3D-8536-063B3E9711DF@gmx.net>	<4A2EACEF.3090809@gmail.com>	<FF3C4540-ED73-4878-92B9-5D6A610D3C4E@illinois.edu>	<4A2EB436.8020506@cornell.edu>	<EFB0C00A-49CA-49BD-A22D-4B23D0C09E7B@illinois.edu>
	<E2D3CA55-A48A-424D-9AC3-24F7C44A9548@illinois.edu>
	<4A2F08C0.3010609@cornell.edu>
Message-ID: <20652B6B-1BF3-477C-9619-4149748E5B9B@illinois.edu>

On Jun 9, 2009, at 8:13 PM, Robert Buels wrote:

> Chris Fields wrote:
>>> Actually, that's one thing I want to implement within Root, namely  
>>> the ability to do this:
>>>
>>> $self->deprecated(-message     => 'method Foo is deprecated',
>>>                 -start_ver   => $version1,
>>>                 -throw_ver   => $version2
>>> );
>
> Here's a patch with tests against the svn trunk head.  Is this what  
> you had in mind?
>
> -- 
> Rob

Funny, I had written up almost exactly the same code, just a little  
rearranged.  I've modified mine to follow your use of -warn_version (I  
also had -throw_version as a synonym of -version, JIC).  Also, for the  
tests I created a temp class in the tests and ran tests off that.   
Thanks for the patch!
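
For anyone following the thread without the attachment, here is a minimal sketch of the behaviour under discussion: warn once the running version reaches -warn_version, and throw once it reaches -throw_version. The package My::Root, the $VERSION value, and the method body are assumptions made up for illustration; this is not the code from either patch, and the real Root method may differ.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package My::Root;    # hypothetical stand-in, not Bio::Root::Root

our $VERSION = 1.006;    # assumed "current" version for this demo

sub new { return bless {}, shift }

# Warn or die depending on how $VERSION compares with the thresholds.
sub deprecated {
    my ($self, %args) = @_;
    my $msg   = $args{-message} || 'this method is deprecated';
    my $warn  = $args{-warn_version};
    my $throw = $args{-throw_version};

    if (defined $throw && $VERSION >= $throw) {
        die "$msg (removed as of version $throw)\n";
    }
    if (!defined $warn || $VERSION >= $warn) {
        warn "$msg\n";
        return 1;    # a warning was issued
    }
    return 0;        # not yet deprecated at this version
}

package main;

my $obj = My::Root->new;

# Past the warn threshold but before the throw threshold: warns, returns 1.
my $warned = $obj->deprecated(
    -message       => 'method Foo is deprecated',
    -warn_version  => 1.005,
    -throw_version => 1.007,
);

# At the throw threshold: dies, so trap it with eval.
my $died = !eval {
    $obj->deprecated(
        -message       => 'method Bar is deprecated',
        -throw_version => 1.006,
    );
    1;
};
print "warned=$warned died=", ($died ? 1 : 0), "\n";
```

Checking the throw threshold first means a method that is past both versions fails loudly rather than only warning.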

chris


From maj at fortinbras.us  Wed Jun 10 04:10:12 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 10 Jun 2009 00:10:12 -0400
Subject: [Bioperl-l] announcing bioperl-max, a public AMI
Message-ID: <57C9F3FC7A4A42189608BD882C03DB9C@NewLife>

Hi All, 

I've built a public Amazon machine image, loaded with many many 
goodies, including the most recent (r15747) trunks of 
- bioperl-live
- bioperl-run
- bioperl-db/biosql
The base machine is a public clean install of Ubuntu 8.04 Hardy/32-bit
by Alestic. Many fave tools, including blast, hmmer, hyphy, phyml, 
emboss, and more are all there (and most even pass bioperl-run tests), and 
perl is loaded with Moose, XML::LibXML, XML::Compile, Bio::Phylo
(r1071) and others. This is *not* a lean mean fighting machine. 

Please give it a try if you're so inclined. Fuller details (including 
image id and ssh-rsa signature) are at http://fortinbras.us/bioperl-max.

Ping me if it doesn't work.

Cheers, 
Mark


From cjfields at illinois.edu  Wed Jun 10 04:36:40 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 9 Jun 2009 23:36:40 -0500
Subject: [Bioperl-l] announcing bioperl-max, a public AMI
In-Reply-To: <57C9F3FC7A4A42189608BD882C03DB9C@NewLife>
References: <57C9F3FC7A4A42189608BD882C03DB9C@NewLife>
Message-ID: <3F9CAC33-EE29-4843-8E21-532DC7569A7D@illinois.edu>

I'll be trying that out, particularly re: bioperl-run. For bioperl-db  
do you have mysql or pg?

Heh, I see Moose is installed.  Just need svn'd parrot and git updated  
rakudo and we could do some damage...

chris

On Jun 9, 2009, at 11:10 PM, Mark A. Jensen wrote:

> Hi All,
>
> I've built a public Amazon machine image, loaded with many many
> goodies, including the most recent (r15747) trunks of
> - bioperl-live
> - bioperl-run
> - bioperl-db/biosql
> The base machine is a public clean install of Ubuntu 8.04 Hardy/32-bit
> by Alestic. Many fave tools, including blast, hmmer, hyphy, phyml,
> emboss, and more are all there (and most even pass bioperl-run tests), and
> perl is loaded with Moose, XML::LibXML, XML::Compile, Bio::Phylo
> (r1071) and others. This is *not* a lean mean fighting machine.
>
> Please give it a try if you're so inclined. Fuller details (including
> image id and ssh-rsa signature) are at http://fortinbras.us/bioperl-max.
>
> Ping me if it doesn't work.
>
> Cheers,
> Mark
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From maj at fortinbras.us  Wed Jun 10 04:39:36 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 10 Jun 2009 00:39:36 -0400
Subject: [Bioperl-l] announcing bioperl-max, a public AMI
In-Reply-To: <3F9CAC33-EE29-4843-8E21-532DC7569A7D@illinois.edu>
References: <57C9F3FC7A4A42189608BD882C03DB9C@NewLife>
	<3F9CAC33-EE29-4843-8E21-532DC7569A7D@illinois.edu>
Message-ID: <6A7D85B8037848F090C35A639C84D870@NewLife>

----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "Mark A. Jensen" <maj at fortinbras.us>
Cc: "BioPerl List" <bioperl-l at bioperl.org>
Sent: Wednesday, June 10, 2009 12:36 AM
Subject: Re: [Bioperl-l] announcing bioperl-max, a public AMI


> I'll be trying that out, particularly re: bioperl-run. For bioperl-db  
> do you have mysql or pg?

-both (I'm all about options...)


> 
> Heh, I see Moose is installed.  Just need svn'd parrot and git updated  
> rakudo and we could do some damage...
> 

bioperl-max-0.1.1, here we come...


> chris
> 

cheers MAJ



From bernd.jagla at pasteur.fr  Wed Jun 10 07:43:47 2009
From: bernd.jagla at pasteur.fr (Bernd Jagla)
Date: Wed, 10 Jun 2009 09:43:47 +0200
Subject: [Bioperl-l] Bio:Das 1.11 installation problem
In-Reply-To: <1A4207F8295607498283FE9E93B775B40604FBFB@EX02.asurite.ad.asu.edu>
References: <7A76102A29824B1C9DDB7F3B1DC45D41@zillumina><6dce9a0b0906081000j7c489fd3t3c5974794ad3e3e7@mail.gmail.com><47482.157.99.64.103.1244493117.squirrel@php.pasteur.fr><4A2D81F9.8060509@bms.com><19FC487A25B6478FA4DE91B81A1FC52C@zillumina>
	<1A4207F8295607498283FE9E93B775B40604FBFB@EX02.asurite.ad.asu.edu>
Message-ID: <7F2215CBC16B48BE8C548BB69E131890@zillumina>

I wrote a small test program to test the environment variables and I have
them:

          'SSH_CLIENT' => '157.
          'FTP_PROXY' => 'http://
          'HTTP_PROXY' => 'http://cache.past
          'SSH_TTY' => '/dev/ttys002',
          'ftp_proxy' => 'http://
          'http_proxy' => 'http://

Using the "-proxy" works, without it doesn't. 

(and yes, I export the variables..)
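
A small test program of the kind described above might look like the following sketch. The helper name proxy_vars is made up for illustration; it simply lists every environment variable whose name contains "proxy", so you can confirm what the perl process actually inherits.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the proxy-related entries of an environment hash (hypothetical helper).
sub proxy_vars {
    my ($env) = @_;
    my %found = map  { $_ => $env->{$_} }
                grep { /proxy/i } keys %$env;
    return \%found;
}

# Report what this process can see; unexported shell variables will be missing.
my $found = proxy_vars(\%ENV);
for my $name (sort keys %$found) {
    print "$name => $found->{$name}\n";
}
print "no proxy variables visible to perl\n" unless %$found;
```

If a variable shows up here but the library still ignores it, the library is probably not consulting the environment at all.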

Thanks for any suggestions.

Bernd

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Kevin Brown
Sent: Tuesday, June 09, 2009 5:26 PM
Cc: bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] Bio:Das 1.11 installation problem

Dumb question, but are you exporting the variables after you set them?

FTP_PROXY=http://...
HTTP_PROXY=http://...
export FTP_PROXY HTTP_PROXY 

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Bernd Jagla
> Sent: Tuesday, June 09, 2009 12:06 AM
> To: 'Stefan Kirov'; bernd at pasteur.fr
> Cc: 'Lincoln Stein'; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bio:Das 1.11 installation problem
> 
> Great, that works!!!
> But since I am using Bio::Das within GBrowse I can't/don't want to change
> those sources. I tried setting some environment variables but that doesn't
> seem to work either...
> So far I have set the following:
> FTP_PROXY=http://...
> HTTP_PROXY=http://...
> PROXYFTP=http://...
> PROXYHTTP=http://...
> ftp_proxy=http://...
> http_proxy=http://...
> PROXY=http://...
> 
> Any suggestions are welcome.
> 
> Thanks,
> 
> Bernd
> 
> 
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Stefan Kirov
> Sent: Monday, June 08, 2009 11:26 PM
> To: bernd at pasteur.fr
> Cc: Lincoln Stein; Bernd Jagla; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bio:Das 1.11 installation problem
> 
> bernd at pasteur.fr wrote:
> Try to add this line
> -proxy => 'http:<YOUR PROXY HERE>',
> in t/01das.t where the Bio::Das object is created (I think line 41).
> Hope this works for you, it did for me.
> Stefan
> > I tested the connection with wget and everything works fine.
> > I suspect that our proxy might be the problem but all variables are set
> > correctly (ftp_proxy, http_proxy and many more). I am not sure which
> > environment variables are being used...
> > I am not too familiar with all this and don't know where to look for the
> > right configurations.
> >
> > Thanks,
> >
> > Bernd
> >
> >   
> >> Hi,
> >>
> >> The regression tests require an active Internet connection, as well as the
> >> DAS test server being up and running. It may be there was a temporary
> >> failure of one of those two. I just tested on my end and the regression
> >> tests ran ok, so could you try it again?
> >>
> >> Lincoln
> >>
> >> On Mon, Jun 8, 2009 at 12:24 PM, Bernd Jagla <bernd.jagla at pasteur.fr>
> >> wrote:
> >>
> >>> Hi,
> >>>
> >>> I am working on a MAC 10.5.7; try to install Bio::Das using
> >>> perl -MCPAN -e 'install Bio::Das'
> >>> This is perl, v5.8.9 built for darwin-2level
> >>> (please let me know if you need anything else)
> >>>
> >>> I am trying to install Bio::Das 1.11
> >>>
> >>> I get the following error:
> >>>
> >>> not ok 3
> >>> not ok 4
> >>> Can't call method "description" on an undefined value at t/01das.t line 62.
> >>>
> >>> When going into the sources for 01das.t and printing out $db I get:
> >>>
> >>> $VAR1 = \bless( {
> >>>     'autotypes' => undef,
> >>>     'default_dsn' => undef,
> >>>     'autocategories' => undef,
> >>>     'sockets' => {},
> >>>     'aggregators' => [
> >>>         bless( {
> >>>             'sub_parts' => [
> >>>                 'coding_exon'
> >>>             ],
> >>>             'require_whole_object' => undef,
> >>>             'main_method' => 'CDS',
> >>>             'method' => 'alignment'
> >>>         }, 'Bio::DB::GFF::Aggregator' ),
> >>>         bless( {
> >>>             'sub_parts' => [
> >>>                 'EST_match'
> >>>             ],
> >>>             'require_whole_object' => undef,
> >>>             'main_method' => 'alignment',
> >>>             'method' => 'alignment'
> >>>         }, 'Bio::DB::GFF::Aggregator' )
> >>>     ],
> >>>     'timeout' => undef,
> >>>     'oldstyle_api' => 1,
> >>>     'default_server' => 'http://www.wormbase.org/db/seq/das'
> >>> }, 'Bio::Das' );
> >>>
> >>> @sources is empty
> >>> And test(3,@sources) fails.
> >>>
> >>> Please advise.
> >>>
> >>> Thanks,
> >>>
> >>> Bernd
> >>>
> >>
> >> --
> >> Lincoln D. Stein
> >> Director, Informatics and Biocomputing Platform
> >> Ontario Institute for Cancer Research
> >> 101 College St., Suite 800
> >> Toronto, ON, Canada M5G0A3
> >> 416 673-8514
> >> Assistant: Renata Musa <Renata.Musa at oicr.on.ca>


From bernd.jagla at pasteur.fr  Wed Jun 10 08:16:08 2009
From: bernd.jagla at pasteur.fr (Bernd Jagla)
Date: Wed, 10 Jun 2009 10:16:08 +0200
Subject: [Bioperl-l] Bio:Das 1.11 installation problem
In-Reply-To: <1A4207F8295607498283FE9E93B775B40604FBFB@EX02.asurite.ad.asu.edu>
References: <7A76102A29824B1C9DDB7F3B1DC45D41@zillumina><6dce9a0b0906081000j7c489fd3t3c5974794ad3e3e7@mail.gmail.com><47482.157.99.64.103.1244493117.squirrel@php.pasteur.fr><4A2D81F9.8060509@bms.com><19FC487A25B6478FA4DE91B81A1FC52C@zillumina>
	<1A4207F8295607498283FE9E93B775B40604FBFB@EX02.asurite.ad.asu.edu>
Message-ID: <F5844533CFCB425DA400C888A9995F70@zillumina>

To whom it may concern:

I added
  $self->proxy($ENV{'HTTP_PROXY'}) if $ENV{'HTTP_PROXY'};

around line 72 of Das.pm, just before:
  $self->proxy($proxy) if $proxy;

This did the trick.

For completeness I also edited Fetch.pm. Around line 134 I added:
  $proxy = $ENV{'HTTP_PROXY'} if $ENV{'HTTP_PROXY'};
before:
  my $dest = $proxy || $request->url;
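
The idea behind the Das.pm change can be sketched on its own, outside the module. The helper name choose_proxy and the precedence order (explicit setting, then HTTP_PROXY, then http_proxy) are assumptions for illustration, not code from Bio::Das.

```perl
use strict;
use warnings;

# Pick a proxy URL: an explicit setting wins, then HTTP_PROXY, then http_proxy.
# Returns undef when nothing is configured. (Hypothetical helper.)
sub choose_proxy {
    my ($explicit, $env) = @_;
    $env ||= \%ENV;
    for my $candidate ($explicit, $env->{HTTP_PROXY}, $env->{http_proxy}) {
        return $candidate if defined $candidate && length $candidate;
    }
    return;
}

# With no explicit value, fall back to whatever the environment provides.
my $proxy = choose_proxy(undef);
print defined $proxy ? "using proxy $proxy\n" : "no proxy configured\n";
```

Letting the explicit value win keeps a -proxy argument working as before, while the environment only fills in when nothing was passed.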

Best,

Bernd



From ron at ron.dk  Wed Jun 10 07:35:09 2009
From: ron at ron.dk (Rasmus Ory Nielsen)
Date: Wed, 10 Jun 2009 09:35:09 +0200
Subject: [Bioperl-l] Bio::Restriction::Analysis. Exception when using rebase
	file.
Message-ID: <4A2F622D.5060500@ron.dk>

Hi,

This is my first time using bioperl for restriction analysis, so please bear 
with me, if this is a FAQ.

I downloaded withrefm.906 from ftp://ftp.neb.com/pub/rebase/ and created the 
script shown at the bottom of the mail.
My bioperl version is bioperl-live nightly from 09-Jun-2009.

The script throws an exception - see below. But, if I comment out the 
'-enzymes' argument, so it uses the built-in collection of enzymes, it works.

My problem is, that I need to use some of the enzymes that are only available 
in rebase. So how do I get this working?

Thanks for your attention.

Best regards,
Rasmus Ory Nielsen


############################################################
Output from the script:
############################################################

[roni at ksdhcp ~]$ ./restriction_test.pl

--------------------- WARNING ---------------------
MSG: The enzyme name CviKI-1 was changed to CviKI-I
---------------------------------------------------

------------- EXCEPTION -------------
MSG: Bad end parameter (11). End must be less than the total length of 
sequence (total=7)
STACK Bio::PrimarySeq::subseq 
/usr/local/lib/perl5/site_perl/5.10.0/Bio/PrimarySeq.pm:401
STACK Bio::Restriction::Analysis::_enzyme_sites 
/usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:900
STACK Bio::Restriction::Analysis::_cuts 
/usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:801
STACK Bio::Restriction::Analysis::cut 
/usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:379
STACK Bio::Restriction::Analysis::fragment_maps 
/usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:515
STACK toplevel ./restriction_test.pl:30
-------------------------------------

[roni at ksdhcp ~]$


############################################################
Output from the script with the '-enzymes' argument commented out
############################################################

[roni at ksdhcp ~]$ ./restriction_test.pl

--------------------- WARNING ---------------------
MSG: The enzyme name CviKI-1 was changed to CviKI-I
---------------------------------------------------
$VAR1 = [
           {
             'seq' => 'CTCGACCGTTAGCAA',
             'end' => 15,
             'start' => '1'
           },
           {
             'seq' => 'AGCTTTCTACCGTTATCGT',
             'end' => 34,
             'start' => '16'
           }
         ];
[roni at ksdhcp ~]$

############################################################

#!/usr/bin/perl
use strict;
use warnings;
use Bio::PrimarySeq;
use Bio::Restriction::IO;
use Bio::Restriction::Analysis;
use Data::Dumper;

# create seq obj
my $seqobj = Bio::PrimarySeq->new(
     -seq        => 'CTCGACCGTTAGCAAAGCTTTCTACCGTTATCGT',
     -primary_id => 'test',
     -molecule   => 'dna'
);

# read rebase file
my $rebase_io = Bio::Restriction::IO->new(
     -file   => 'withrefm.906',
     -format => 'withrefm',
);
my $rebase_collection = $rebase_io->read;

# start restriction analysis
my $restriction_analysis = Bio::Restriction::Analysis->new(
     -seq     => $seqobj,
     -enzymes => $rebase_collection,    # it works with this line commented out
);

# retrieve fragment maps
my @fragment_maps = $restriction_analysis->fragment_maps('HindIII');
print Dumper \@fragment_maps;


From awitney at sgul.ac.uk  Wed Jun 10 11:19:55 2009
From: awitney at sgul.ac.uk (Adam Witney)
Date: Wed, 10 Jun 2009 12:19:55 +0100
Subject: [Bioperl-l] EUtilities Cookbook example fails
Message-ID: <40B53E84-2EBC-471A-9261-CED1973C7A0C@sgul.ac.uk>

Hi,

I am going through the EUtilities Cookbook, but the last example (in  
section 2.3.1) fails with:

Can't use an undefined value as an ARRAY reference at
/usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities/Link/LinkSet.pm line 470.

This is with BioPerl 1.6.0, perl v5.8.8

thanks for any help

adam


From hlapp at gmx.net  Wed Jun 10 12:08:54 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 10 Jun 2009 08:08:54 -0400
Subject: [Bioperl-l] announcing bioperl-max, a public AMI
In-Reply-To: <57C9F3FC7A4A42189608BD882C03DB9C@NewLife>
References: <57C9F3FC7A4A42189608BD882C03DB9C@NewLife>
Message-ID: <4B3BCEA2-DA96-46B5-9BA2-F4EDDACC3A96@gmx.net>

Very cool! -hilmar

On Jun 10, 2009, at 12:10 AM, Mark A. Jensen wrote:

> Hi All,
>
> I've built a public Amazon machine image, loaded with many many
> goodies, including the most recent (r15747) trunks of
> - bioperl-live
> - bioperl-run
> - bioperl-db/biosql
> The base machine is a public clean install of Ubuntu 8.04 Hardy/32-bit
> by Alestic. Many fave tools, including blast, hmmer, hyphy, phyml,
> emboss, and more are all there (and most even pass bioperl-run tests), and
> perl is loaded with Moose, XML::LibXML, XML::Compile, Bio::Phylo
> (r1071) and others. This is *not* a lean mean fighting machine.
>
> Please give it a try if you're so inclined. Fuller details (including
> image id and ssh-rsa signature) are at http://fortinbras.us/bioperl-max.
>
> Ping me if it doesn't work.
>
> Cheers,
> Mark

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at illinois.edu  Wed Jun 10 12:28:44 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 10 Jun 2009 07:28:44 -0500
Subject: [Bioperl-l] EUtilities Cookbook example fails
In-Reply-To: <40B53E84-2EBC-471A-9261-CED1973C7A0C@sgul.ac.uk>
References: <40B53E84-2EBC-471A-9261-CED1973C7A0C@sgul.ac.uk>
Message-ID: <1AB3CE57-4114-4440-870F-C15B39F42D77@illinois.edu>

I can reproduce that; I'll look into it.

chris

On Jun 10, 2009, at 6:19 AM, Adam Witney wrote:

> Hi,
>
> I am going through the EUtilities Cookbook, but the last example (in  
> section 2.3.1) fails with:
>
> Can't use an undefined value as an ARRAY reference at
> /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities/Link/LinkSet.pm line 470.
>
> This is with BioPerl 1.6.0, perl v5.8.8
>
> thanks for any help
>
> adam


From cjfields at illinois.edu  Wed Jun 10 13:20:43 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 10 Jun 2009 08:20:43 -0500
Subject: [Bioperl-l] fetch gene sequence with EUtilities.pm
In-Reply-To: <78771A58-8FCC-46ED-BE4F-8B0915BD324D@sgul.ac.uk>
References: <78771A58-8FCC-46ED-BE4F-8B0915BD324D@sgul.ac.uk>
Message-ID: <9B52E71A-3183-412A-81E7-803C28B8082E@illinois.edu>

EntrezGene doesn't contain the sequence information; I believe it just  
links to the sequence in a specified nuc record with given  
coordinates.  You can get to it, but it takes a little trickery; in  
essence you need to use the UID to get the gene summary information,  
extract that, then grab the sequence record using seqstart, seqend,  
and seqstrand.

A dump of esummary info for UID 18131, for instance (using
$eutil->print_all), gives this info (abbreviated somewhat):

UID                 :18131
Name                :Notch3
Description         :Notch gene homolog 3 (Drosophila)
Orgname             :Mus musculus
...
GenomicInfo
     GenomicInfoType
         ChrLoc      :17
         ChrAccVer   :NC_000083.5
         ChrStart    :32303796
         ChrStop     :32257837
GeneWeight          :23049

The genomic info section gives the accession.version, start, end, and  
(implicitly) the strand (ChrStop is less than ChrStart). I have added  
an example to the cookbook:

http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook#How_do_I_retrieve_the_DNA_sequence_using_EntrezGene_IDs.3F
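
As a rough illustration of the middle step, here is a sketch that turns the GenomicInfo fields from the dump above into a range request. The helper name gene_range is hypothetical. The key names seq_start, seq_stop and strand follow the NCBI efetch parameters, where strand 1 is plus and 2 is minus. Whether the summary coordinates need an offset before being passed on is not stated in this thread, so check that against the cookbook entry.

```perl
use strict;
use warnings;

# Turn esummary-style GenomicInfo fields into a range request.
# Strand uses the efetch convention: 1 = plus, 2 = minus.
sub gene_range {
    my (%info) = @_;
    my ($start, $stop) = @info{qw(ChrStart ChrStop)};

    # A stop smaller than the start implies the minus strand.
    my $strand = $start > $stop ? 2 : 1;
    ($start, $stop) = ($stop, $start) if $start > $stop;

    return (
        id        => $info{ChrAccVer},
        seq_start => $start,
        seq_stop  => $stop,
        strand    => $strand,
    );
}

# Values taken from the Notch3 summary shown above.
my %range = gene_range(
    ChrAccVer => 'NC_000083.5',
    ChrStart  => 32303796,
    ChrStop   => 32257837,
);
printf "%s:%d-%d strand %d\n", @range{qw(id seq_start seq_stop strand)};
```

The resulting hash can then be handed to whatever fetches the nucleotide record.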

chris

On Jun 9, 2009, at 6:20 AM, Adam Witney wrote:

> Hi,
>
> I have been experimenting with the Bio::DB::EUtilities module, with  
> help from the Cookbook. But I can't seem to figure out how to get  
> the DNA sequence of a gene; all the examples seem to be fetching  
> protein sequence.
>
> How would i go about fetching a sequence using an Entrez GeneID?
>
> thanks for any help
>
> adam


From cjfields at illinois.edu  Wed Jun 10 13:33:51 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 10 Jun 2009 08:33:51 -0500
Subject: [Bioperl-l] EUtilities Cookbook example fails
In-Reply-To: <98E59238-260F-49A1-BA78-51DF94FF55A8@sgul.ac.uk>
References: <40B53E84-2EBC-471A-9261-CED1973C7A0C@sgul.ac.uk>
	<1AB3CE57-4114-4440-870F-C15B39F42D77@illinois.edu>
	<98E59238-260F-49A1-BA78-51DF94FF55A8@sgul.ac.uk>
Message-ID: <10B8484F-AE84-4E0A-964F-0DC964F5156C@illinois.edu>

Adam,

Okay, fixed that and the previous issue with 'use an undefined value  
as an ARRAY reference'.  The previous issue appears to be due to a  
change in the XML output from NCBI (it used to give the IDs at one  
point).  Also made the wiki changes for this; didn't take long to find  
everything.

Thanks for pointing that out!  If you find any more issues feel free  
to make the necessary changes on the wiki or point them out if they're  
in code.

chris

On Jun 10, 2009, at 8:12 AM, Adam Witney wrote:

> Hi Chris,
>
> not sure if I should start a new thread for this, but it is related  
> to the EUtilities Cookbook and LinkSet.pm.
>
> There are several references in the Cookbook to the method  
> "get_linkname", however this seems to have changed in the recent  
> version of LinkSet.pm to "get_link_name". But one reference to the  
> old method name still exists in LinkSet.pm, as shown by this patch:
>
> --- /usr/local/lib/perl5/site_perl/5.8.9/Bio/Tools/EUtilities/Link/LinkSet.pm	2009-02-20 12:36:37.000000000 +0000
> +++ /Users/adam/Desktop/LinkSet.pm	2009-06-10 13:58:49.000000000 +0100
> @@ -220,7 +220,7 @@
>  =cut
>
>  sub get_link_name {
> -    return ($_[0]->get_linknames)[0];
> +    return ($_[0]->get_link_names)[0];
>  }
>
>  =head2 get_submitted_ids
>
> If I haven't got this entirely wrong, I could go through and fix  
> the Cookbook entries if that would be useful.
>
> adam
>
>
> On 10 Jun 2009, at 13:28, Chris Fields wrote:
>
>> I can reproduce that; I'll look into it.
>>
>> chris
>>
>> On Jun 10, 2009, at 6:19 AM, Adam Witney wrote:
>>
>>> Hi,
>>>
>>> I am going through the EUtilities Cookbook, but the last example  
>>> (in section 2.3.1) fails with:
>>>
>>> Can't use an undefined value as an ARRAY reference at
>>> /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities/Link/LinkSet.pm line 470.
>>>
>>> This is with BioPerl 1.6.0, perl v5.8.8
>>>
>>> thanks for any help
>>>
>>> adam


From awitney at sgul.ac.uk  Wed Jun 10 13:12:05 2009
From: awitney at sgul.ac.uk (Adam Witney)
Date: Wed, 10 Jun 2009 14:12:05 +0100
Subject: [Bioperl-l] EUtilities Cookbook example fails
In-Reply-To: <1AB3CE57-4114-4440-870F-C15B39F42D77@illinois.edu>
References: <40B53E84-2EBC-471A-9261-CED1973C7A0C@sgul.ac.uk>
	<1AB3CE57-4114-4440-870F-C15B39F42D77@illinois.edu>
Message-ID: <98E59238-260F-49A1-BA78-51DF94FF55A8@sgul.ac.uk>


Hi Chris,

not sure if I should start a new thread for this, but it is related to  
the EUtilities Cookbook and LinkSet.pm.

There are several references in the Cookbook to the method  
"get_linkname", however this seems to have changed in the recent  
version of LinkSet.pm to "get_link_name". But one reference to the old  
method name still exists in LinkSet.pm, as shown by this patch:

--- /usr/local/lib/perl5/site_perl/5.8.9/Bio/Tools/EUtilities/Link/LinkSet.pm	2009-02-20 12:36:37.000000000 +0000
+++ /Users/adam/Desktop/LinkSet.pm	2009-06-10 13:58:49.000000000 +0100
@@ -220,7 +220,7 @@
  =cut

  sub get_link_name {
-    return ($_[0]->get_linknames)[0];
+    return ($_[0]->get_link_names)[0];
  }

  =head2 get_submitted_ids

If I haven't got this all wrong entirely, I could go through and fix
the Cookbook entries if that would be useful?

adam


On 10 Jun 2009, at 13:28, Chris Fields wrote:

> I can reproduce that; I'll look into it.
>
> chris
>
> On Jun 10, 2009, at 6:19 AM, Adam Witney wrote:
>
>> Hi,
>>
>> I am going through the EUtilities Cookbook, but the last example  
>> (in section 2.3.1) fails with:
>>
>> Can't use an undefined value as an ARRAY reference at
>> /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities/Link/LinkSet.pm line 470.
>>
>> This is with BioPerl 1.6.0, perl v5.8.8
>>
>> thanks for any help
>>
>> adam


From awitney at sgul.ac.uk  Wed Jun 10 14:10:21 2009
From: awitney at sgul.ac.uk (Adam Witney)
Date: Wed, 10 Jun 2009 15:10:21 +0100
Subject: [Bioperl-l] fetch gene sequence with EUtilities.pm
In-Reply-To: <9B52E71A-3183-412A-81E7-803C28B8082E@illinois.edu>
References: <78771A58-8FCC-46ED-BE4F-8B0915BD324D@sgul.ac.uk>
	<9B52E71A-3183-412A-81E7-803C28B8082E@illinois.edu>
Message-ID: <B41AD5E8-138D-4B8D-AD4C-7DEA230FD46F@sgul.ac.uk>


Thanks for the pointers Chris.

The new example on the Cookbook doesn't quite work for me as ChrStart  
seems to appear in the DocSum twice, thus  
get_contents_by_name('ChrStart') returns a list of two values (which  
writes the second ChrStart into $end). Also the $start and $end seem  
to be out by 1, so I needed to change it to this:

my ($acc) = ($docsum->get_contents_by_name('ChrAccVer'));
my ($start) = ($docsum->get_contents_by_name('ChrStart'));
my ($end) = ($docsum->get_contents_by_name('ChrStop'));

  $start += 1;
  $end += 1;
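
A minimal sketch of why the second ChrStart can end up in $end, using plain
Perl only. It assumes the Cookbook example filled all three variables from
one list assignment (the wiki code is not reproduced in this message).
contents_by_name() is a hypothetical stand-in for
$docsum->get_contents_by_name(), and the accession and coordinates are
invented for illustration, not real Entrez output:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Invented DocSum contents; ChrStart appears twice, as described above.
my @items = (
    [ ChrAccVer => 'NC_EXAMPLE.1' ],
    [ ChrStart  => 999 ],
    [ ChrStop   => 2499 ],
    [ ChrStart  => 999 ],
);

# Hypothetical stand-in for get_contents_by_name(): returns every value
# stored under the given name, in document order.
sub contents_by_name {
    my ($name) = @_;
    return map { $_->[1] } grep { $_->[0] eq $name } @items;
}

# One flat list assignment: both ChrStart values are consumed, so the
# second one lands in $end and ChrStop is silently dropped.
my ($acc, $start, $end) = (
    contents_by_name('ChrAccVer'),
    contents_by_name('ChrStart'),
    contents_by_name('ChrStop'),
);
print "flattened: $acc $start..$end\n";

# One list assignment per field keeps only the first value for each name.
my ($acc2)   = contents_by_name('ChrAccVer');
my ($start2) = contents_by_name('ChrStart');
my ($end2)   = contents_by_name('ChrStop');
print "per-field: $acc2 $start2..$end2\n";
```

Taking the first value per field only works if the GenomicInfo entry comes
first in the DocSum, which is what both records compared below appear to do.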

Ah, looking at this further there appears to be something going on in  
the response from Entrez. Compare these two gene records:

http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=gene&id=18131
		(your example below)
http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=gene&id=2861733
		(my gene)

In both cases you can see that ChrStart appears twice, once as part of  
the GenomicInfo list and once on its own at the bottom. In my example  
above the two ChrStart values match, but in the Notch3 example you  
posted the 2nd ChrStart seems to be the same as the ChrStop in the  
GenomicInfo list. Do you know if the second ChrStart has a separate  
meaning?

I guess in the Cookbook example we would need to make sure that the  
get_contents_by_name('ChrStart') picks up the value from the  
GenomicInfo list, is this possible?

thanks again

adam


On 10 Jun 2009, at 14:20, Chris Fields wrote:

> EntrezGene doesn't contain the sequence information; I believe it  
> just links to the sequence in a specified nuc record with given  
> coordinates.  You can get to it, but it takes a little trickery; in  
> essence you need to use the UID to get the gene summary information,  
> extract that, then grab the sequence record using seqstart, seqend,  
> and seqstrand.
>
> A dump of esummary info for UID 18131, for instance, (using
> $eutil->print_all) gives this info (abbreviated somewhat):
>
> UID                 :18131
> Name                :Notch3
> Description         :Notch gene homolog 3 (Drosophila)
> Orgname             :Mus musculus
> ...
> GenomicInfo
>    GenomicInfoType
>        ChrLoc      :17
>        ChrAccVer   :NC_000083.5
>        ChrStart    :32303796
>        ChrStop     :32257837
> GeneWeight          :23049
>
> The genomic info section gives the accession.version, start, end,  
> and (implicitly) the strand (ChrStop is less than ChrStart). I have
> added an example to the cookbook:
>
> http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook#How_do_I_retrieve_the_DNA_sequence_using_EntrezGene_IDs.3F
>
> chris
>
> On Jun 9, 2009, at 6:20 AM, Adam Witney wrote:
>
>> Hi,
>>
>> I have been experimenting with the Bio::DB::EUtilities module, with  
>> help from the Cookbook. But I can't seem to figure out how to get  
>> the DNA sequence of a gene; all the examples seem to be fetching  
>> protein sequence.
>>
>> How would I go about fetching a sequence using an Entrez GeneID?
>>
>> thanks for any help
>>
>> adam


From cjfields at illinois.edu  Wed Jun 10 17:56:46 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 10 Jun 2009 12:56:46 -0500
Subject: [Bioperl-l] fetch gene sequence with EUtilities.pm
In-Reply-To: <B41AD5E8-138D-4B8D-AD4C-7DEA230FD46F@sgul.ac.uk>
References: <78771A58-8FCC-46ED-BE4F-8B0915BD324D@sgul.ac.uk>
	<9B52E71A-3183-412A-81E7-803C28B8082E@illinois.edu>
	<B41AD5E8-138D-4B8D-AD4C-7DEA230FD46F@sgul.ac.uk>
Message-ID: <CD8513A6-0872-4174-9333-94D76D5711F8@illinois.edu>

Adam,

That's really odd that they do that (both the duplication of ChrStart
and the coordinates being off-by-one, which means they appear to be
0-based).  It's possible that the second ChrStart is meant to represent
the actual first base for the gene irrespective of start/end.  My
example is on the opposite strand, so the second ChrStart == end.

The fact that they use the same element name is slightly annoying (and  
seemingly redundant), but there is a workaround.  We grab only the  
layered information specifically; in this case we want everything  
below 'GenomicInfoType':

GenomicInfo
     GenomicInfoType
         ChrLoc      :17
         ChrAccVer   :NC_000083.5
         ChrStart    :32303796
         ChrStop     :32257837

So, we can do this in the DocSum loop (that appears to work for your  
example):

############################

for my $docsum ($eutil->next_DocSum) {
     # to ensure we grab the right ChrStart information, we grab the Item above
     # it in the Item hierarchy (visible via print_all from the eutil instance)
     my ($item) = $docsum->get_Items_by_name('GenomicInfoType');

     my %item_data = map {$_ => 0} qw(ChrAccVer ChrStart ChrStop);

     while (my $sub_item = $item->next_subItem) {
         if (exists $item_data{$sub_item->get_name}) {
             $item_data{$sub_item->get_name} = $sub_item->get_content;
         }
     }
     # check to make sure everything is set
     for my $check (qw(ChrAccVer ChrStart ChrStop)) {
         die "$check not set" unless $item_data{$check};
     }

     my $strand = $item_data{ChrStart} > $item_data{ChrStop} ? 2 : 1;
     $fetcher->set_parameters(-id => $item_data{ChrAccVer},
                              -seq_start => $item_data{ChrStart} + 1,
                              -seq_stop  => $item_data{ChrStop} + 1,
                              -strand    => $strand);
     print $fetcher->get_Response->content;
}

############################

That's to retain compatibility with 1.6; I'll update the wiki.  I can  
add some common Item container methods to grab information for any  
Items contained in the current instance (be it a DocSum or another  
Item).  I'll add that in bioperl-live.
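
The coordinate handling can also be kept separate from the fetch. A minimal
sketch in plain Perl, assuming (as observed in this thread, not confirmed
against NCBI documentation) that ChrStart/ChrStop are 0-based and that
ChrStart > ChrStop indicates the minus strand. to_one_based() is a
hypothetical helper, not part of Bio::DB::EUtilities:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: convert a 0-based ChrStart/ChrStop pair into
# 1-based coordinates ordered low..high, plus a strand code
# (1 = plus, 2 = minus, the same convention as the loop above).
sub to_one_based {
    my ($chr_start, $chr_stop) = @_;
    my $strand = $chr_start > $chr_stop ? 2 : 1;
    my ($low, $high) = sort { $a <=> $b } ($chr_start, $chr_stop);
    return ($low + 1, $high + 1, $strand);
}

# Values from the Notch3 (UID 18131) GenomicInfo dump quoted in this thread.
my ($seq_start, $seq_stop, $strand) = to_one_based(32303796, 32257837);
print "seq_start=$seq_start seq_stop=$seq_stop strand=$strand\n";
```

Ordering the pair low..high is a convenience in this sketch only; the thread
does not settle whether efetch needs seq_start to be the smaller value.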

chris

On Jun 10, 2009, at 9:10 AM, Adam Witney wrote:

> Thanks for the pointers Chris.
>
> The new example on the Cookbook doesn't quite work for me as  
> ChrStart seems to appear in the DocSum twice, thus  
> get_contents_by_name('ChrStart') returns a list of two values (which  
> writes the second ChrStart into $end). Also the $start and $end seem  
> to be out by 1, so I needed to change it to this:
>
> my ($acc) = ($docsum->get_contents_by_name('ChrAccVer'));
> my ($start) = ($docsum->get_contents_by_name('ChrStart'));
> my ($end) = ($docsum->get_contents_by_name('ChrStop'));
>
> $start += 1;
> $end += 1;
>
> Ah, looking at this further there appears to be something going on  
> in the response from Entrez. Compare these two gene records:
>
> http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=gene&id=18131 
> 		(your example below)
> http://www.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=gene&id=2861733 
> 		(my gene)
>
> In both cases you can see that ChrStart appears twice, once as part  
> of the GenomicInfo list and once on its own at the bottom. In my  
> example above the two ChrStart values match, but in the Notch3  
> example you posted the 2nd ChrStart seems to be the same as the  
> ChrStop in the GenomicInfo list. Do you know if the second ChrStart  
> has a separate meaning?
>
> I guess in the Cookbook example we would need to make sure that the  
> get_contents_by_name('ChrStart') picks up the value from the  
> GenomicInfo list, is this possible?
>
> thanks again
>
> adam
>
>
> On 10 Jun 2009, at 14:20, Chris Fields wrote:
>
>> EntrezGene doesn't contain the sequence information; I believe it  
>> just links to the sequence in a specified nuc record with given  
>> coordinates.  You can get to it, but it takes a little trickery; in  
>> essence you need to use the UID to get the gene summary  
>> information, extract that, then grab the sequence record using  
>> seqstart, seqend, and seqstrand.
>>
>> A dump of esummary info for UID 18131, for instance, (using
>> $eutil->print_all) gives this info (abbreviated somewhat):
>>
>> UID                 :18131
>> Name                :Notch3
>> Description         :Notch gene homolog 3 (Drosophila)
>> Orgname             :Mus musculus
>> ...
>> GenomicInfo
>>   GenomicInfoType
>>       ChrLoc      :17
>>       ChrAccVer   :NC_000083.5
>>       ChrStart    :32303796
>>       ChrStop     :32257837
>> GeneWeight          :23049
>>
>> The genomic info section gives the accession.version, start, end,  
>> and (implicitly) the strand (ChrStop is less than ChrStart). I have
>> added an example to the cookbook:
>>
>> http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook#How_do_I_retrieve_the_DNA_sequence_using_EntrezGene_IDs.3F
>>
>> chris
>>
>> On Jun 9, 2009, at 6:20 AM, Adam Witney wrote:
>>
>>> Hi,
>>>
>>> I have been experimenting with the Bio::DB::EUtilities module,  
>>> with help from the Cookbook. But I can't seem to figure out how to  
>>> get the DNA sequence of a gene; all the examples seem to be  
>>> fetching protein sequence.
>>>
>>> How would I go about fetching a sequence using an Entrez GeneID?
>>>
>>> thanks for any help
>>>
>>> adam


From maj at fortinbras.us  Thu Jun 11 11:36:40 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 11 Jun 2009 07:36:40 -0400
Subject: [Bioperl-l] 1.6 on doc.bioperl.org?
Message-ID: <17AD00895AFD43E1A1436D1065092BAC@NewLife>

Hi Chris and list-
Will documentation for release 1.6 be available in pdoc on doc.bioperl.org?
I notice also that autogenerated documentation for bioperl-live doesn't contain
new modules (or HIVQuery & Tiling, anyway ;) )--
cheers, Mark


From maj at fortinbras.us  Thu Jun 11 13:17:25 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 11 Jun 2009 09:17:25 -0400
Subject: [Bioperl-l] Bio::Restriction::Analysis. Exception when using
	rebasefile.
In-Reply-To: <4A2F622D.5060500@ron.dk>
References: <4A2F622D.5060500@ron.dk>
Message-ID: <2F52B1CED1374763822BF3AD1D283B3B@NewLife>

Rasmus et al-

This looks like a bug. A quick debug shows it's barfing on 'AarI' (as it cycles 
through
all enzymes apparently creating a global cut map). AarI has a recognition 
sequence of

CACCTGC (in $enz->seq->seq)

but a cut site of

CACCTGCNNNN^ (in $enz->seq->site)

The bad parm '11' refers to the end of the cut site sequence, but the routine
B:R:Analysis::_cuts is attempting to split the 7-symbol recognition sequence,
and so throws.
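
The mismatch can be reproduced without BioPerl. The sketch below is plain
Perl and only mimics the arithmetic; cut_offset() is a hypothetical helper,
not the actual code in Bio::Restriction::Analysis, and the N-padding at the
end is just one conceivable workaround, not a proposed fix:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $recog = 'CACCTGC';         # AarI recognition sequence
my $site  = 'CACCTGCNNNN^';    # AarI site string, '^' marks the cut

# Hypothetical helper: number of symbols that precede the cut marker.
sub cut_offset {
    my ($site_string) = @_;
    my $pos = index($site_string, '^');
    die "no cut marker in '$site_string'" if $pos < 0;
    return $pos;
}

my $cut = cut_offset($site);
my $len = length $recog;
print "cut after symbol $cut, recognition sequence has $len symbols\n";

# Asking for a subsequence ending at the cut position is out of range
# for the bare recognition sequence, which is what the exception reports.
if ($cut > $len) {
    print "cut lies ", $cut - $len, " symbols past the recognition sequence\n";
}

# Illustration only: pad the recognition sequence with N so the cut
# position falls inside it.
my $padded = $recog . ('N' x ($cut > $len ? $cut - $len : 0));
print "padded: $padded\n";
```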

This surprises me. Core, let me know if you want me to take this on, or
if the module author can fix it quicker.

cheers,
Mark

----- Original Message ----- 
From: "Rasmus Ory Nielsen" <ron at ron.dk>
To: <bioperl-l at lists.open-bio.org>
Sent: Wednesday, June 10, 2009 3:35 AM
Subject: [Bioperl-l] Bio::Restriction::Analysis. Exception when using 
rebasefile.


> Hi,
>
> This is my first time using bioperl for restriction analysis, so please bear 
> with me, if this is a FAQ.
>
> I downloaded withrefm.906 from ftp://ftp.neb.com/pub/rebase/ and created the 
> script shown at the bottom of the mail.
> My bioperl version is bioperl-live nightly from 09-Jun-2009.
>
> The script throws an exception - see below. But, if I comment out the 
> '-enzymes' argument, so it uses the built-in collection of enzymes, it works.
>
> My problem is, that I need to use some of the enzymes that are only available 
> in rebase. So how do I get this working?
>
> Thanks for your attention.
>
> Best regards,
> Rasmus Ory Nielsen
>
>
> ############################################################
> Output from the script:
> ############################################################
>
> [roni at ksdhcp ~]$ ./restriction_test.pl
>
> --------------------- WARNING ---------------------
> MSG: The enzyme name CviKI-1 was changed to CviKI-I
> ---------------------------------------------------
>
> ------------- EXCEPTION -------------
> MSG: Bad end parameter (11). End must be less than the total length of 
> sequence (total=7)
> STACK Bio::PrimarySeq::subseq 
> /usr/local/lib/perl5/site_perl/5.10.0/Bio/PrimarySeq.pm:401
> STACK Bio::Restriction::Analysis::_enzyme_sites 
> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:900
> STACK Bio::Restriction::Analysis::_cuts 
> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:801
> STACK Bio::Restriction::Analysis::cut 
> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:379
> STACK Bio::Restriction::Analysis::fragment_maps 
> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:515
> STACK toplevel ./restriction_test.pl:30
> -------------------------------------
>
> [roni at ksdhcp ~]$
>
>
> ############################################################
> Output from the script with the '-enzymes' argument commented out
> ############################################################
>
> [roni at ksdhcp ~]$ ./restriction_test.pl
>
> --------------------- WARNING ---------------------
> MSG: The enzyme name CviKI-1 was changed to CviKI-I
> ---------------------------------------------------
> $VAR1 = [
>           {
>             'seq' => 'CTCGACCGTTAGCAA',
>             'end' => 15,
>             'start' => '1'
>           },
>           {
>             'seq' => 'AGCTTTCTACCGTTATCGT',
>             'end' => 34,
>             'start' => '16'
>           }
>         ];
> [roni at ksdhcp ~]$
>
> ############################################################
>
> #!/usr/bin/perl
> use strict;
> use warnings;
> use Bio::PrimarySeq;
> use Bio::Restriction::IO;
> use Bio::Restriction::Analysis;
> use Data::Dumper;
>
> # create seq obj
> my $seqobj = new Bio::PrimarySeq(
>     -seq        => 'CTCGACCGTTAGCAAAGCTTTCTACCGTTATCGT',
>     -primary_id => 'test',
>     -molecule   => 'dna'
> );
>
> # read rebase file
> my $rebase_io = Bio::Restriction::IO->new(
>     -file   => 'withrefm.906',
>     -format => 'withrefm',
> );
> my $rebase_collection = $rebase_io->read;
>
> # start restriction analysis
> my $restriction_analysis = Bio::Restriction::Analysis->new(
>     -seq     => $seqobj,
>     -enzymes => $rebase_collection,    # it works with this line commented out
> );
>
> # retrieve fragment maps
> my @fragment_maps = $restriction_analysis->fragment_maps('HindIII');
> print Dumper \@fragment_maps;
>
> 


From cjfields at illinois.edu  Thu Jun 11 14:19:51 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 11 Jun 2009 09:19:51 -0500
Subject: [Bioperl-l] Bio::Restriction::Analysis. Exception when using
	rebasefile.
In-Reply-To: <2F52B1CED1374763822BF3AD1D283B3B@NewLife>
References: <4A2F622D.5060500@ron.dk>
	<2F52B1CED1374763822BF3AD1D283B3B@NewLife>
Message-ID: <0F0E9DCC-BE3A-4918-8226-B9988B022FEB@illinois.edu>

Mark,

Feel free to take it up.  It's probably a good idea to start a bug  
report for tracking if it proves to be thornier to fix than expected.

chris

On Jun 11, 2009, at 8:17 AM, Mark A. Jensen wrote:

> Rasmus et al-
>
> This looks like a bug. A quick debug shows it's barfing on  
> 'AarI' (as it cycles through
> all enzymes apparently creating a global cut map). AarI has a  
> recognition sequence of
>
> CACCTGC (in $enz->seq->seq)
>
> but a cut site of
>
> CACCTGCNNNN^ (in $enz->seq->site)
>
> The bad parm '11' refers to the end of the cut site sequence, but  
> the routine
> B:R:Analysis::_cuts is attempting to split the 7-symbol recognition  
> sequence,
> and so throws.
>
> This surprises me. Core, let me know if you want me to take this on,  
> or
> if the module author can fix it quicker.
>
> cheers,
> Mark
>
> ----- Original Message ----- From: "Rasmus Ory Nielsen" <ron at ron.dk>
> To: <bioperl-l at lists.open-bio.org>
> Sent: Wednesday, June 10, 2009 3:35 AM
> Subject: [Bioperl-l] Bio::Restriction::Analysis. Exception when  
> using rebasefile.
>
>
>> Hi,
>>
>> This is my first time using bioperl for restriction analysis, so  
>> please bear with me, if this is a FAQ.
>>
>> I downloaded withrefm.906 from ftp://ftp.neb.com/pub/rebase/ and  
>> created the script shown at the bottom of the mail.
>> My bioperl version is bioperl-live nightly from 09-Jun-2009.
>>
>> The script throws an exception - see below. But, if I comment out
>> the '-enzymes' argument, so it uses the built-in collection of  
>> enzymes, it works.
>>
>> My problem is, that I need to use some of the enzymes that are only  
>> available in rebase. So how do I get this working?
>>
>> Thanks for your attention.
>>
>> Best regards,
>> Rasmus Ory Nielsen
>>
>>
>> ############################################################
>> Output from the script:
>> ############################################################
>>
>> [roni at ksdhcp ~]$ ./restriction_test.pl
>>
>> --------------------- WARNING ---------------------
>> MSG: The enzyme name CviKI-1 was changed to CviKI-I
>> ---------------------------------------------------
>>
>> ------------- EXCEPTION -------------
>> MSG: Bad end parameter (11). End must be less than the total length of
>> sequence (total=7)
>> STACK Bio::PrimarySeq::subseq
>> /usr/local/lib/perl5/site_perl/5.10.0/Bio/PrimarySeq.pm:401
>> STACK Bio::Restriction::Analysis::_enzyme_sites
>> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:900
>> STACK Bio::Restriction::Analysis::_cuts
>> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:801
>> STACK Bio::Restriction::Analysis::cut
>> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:379
>> STACK Bio::Restriction::Analysis::fragment_maps
>> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:515
>> STACK toplevel ./restriction_test.pl:30
>> -------------------------------------
>>
>> [roni at ksdhcp ~]$
>>
>>
>> ############################################################
>> Output from the script with the '-enzymes' argument commented out
>> ############################################################
>>
>> [roni at ksdhcp ~]$ ./restriction_test.pl
>>
>> --------------------- WARNING ---------------------
>> MSG: The enzyme name CviKI-1 was changed to CviKI-I
>> ---------------------------------------------------
>> $VAR1 = [
>>          {
>>            'seq' => 'CTCGACCGTTAGCAA',
>>            'end' => 15,
>>            'start' => '1'
>>          },
>>          {
>>            'seq' => 'AGCTTTCTACCGTTATCGT',
>>            'end' => 34,
>>            'start' => '16'
>>          }
>>        ];
>> [roni at ksdhcp ~]$
>>
>> ############################################################
>>
>> #!/usr/bin/perl
>> use strict;
>> use warnings;
>> use Bio::PrimarySeq;
>> use Bio::Restriction::IO;
>> use Bio::Restriction::Analysis;
>> use Data::Dumper;
>>
>> # create seq obj
>> my $seqobj = new Bio::PrimarySeq(
>>    -seq        => 'CTCGACCGTTAGCAAAGCTTTCTACCGTTATCGT',
>>    -primary_id => 'test',
>>    -molecule   => 'dna'
>> );
>>
>> # read rebase file
>> my $rebase_io = Bio::Restriction::IO->new(
>>    -file   => 'withrefm.906',
>>    -format => 'withrefm',
>> );
>> my $rebase_collection = $rebase_io->read;
>>
>> # start restriction analysis
>> my $restriction_analysis = Bio::Restriction::Analysis->new(
>>    -seq     => $seqobj,
>>    -enzymes => $rebase_collection,    # it works with this line commented out
>> );
>>
>> # retrieve fragment maps
>> my @fragment_maps = $restriction_analysis->fragment_maps('HindIII');
>> print Dumper \@fragment_maps;


From maj at fortinbras.us  Thu Jun 11 14:26:19 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 11 Jun 2009 10:26:19 -0400
Subject: [Bioperl-l] Bio::Restriction::Analysis. Exception when using
	rebasefile.
In-Reply-To: <0F0E9DCC-BE3A-4918-8226-B9988B022FEB@illinois.edu>
References: <4A2F622D.5060500@ron.dk>
	<2F52B1CED1374763822BF3AD1D283B3B@NewLife>
	<0F0E9DCC-BE3A-4918-8226-B9988B022FEB@illinois.edu>
Message-ID: <CD6C392C39CD4287B3619FCDBC1D19CF@NewLife>

All-righty-- thanks MAJ
----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "Mark A. Jensen" <maj at fortinbras.us>
Cc: "Rasmus Ory Nielsen" <ron at ron.dk>; <bioperl-l at lists.open-bio.org>
Sent: Thursday, June 11, 2009 10:19 AM
Subject: Re: [Bioperl-l] Bio::Restriction::Analysis. Exception when using 
rebasefile.


> Mark,
>
> Feel free to take it up.  It's probably a good idea to start a bug  report for 
> tracking if it proves to be thornier to fix than expected.
>
> chris
>
> On Jun 11, 2009, at 8:17 AM, Mark A. Jensen wrote:
>
>> Rasmus et al-
>>
>> This looks like a bug. A quick debug shows it's barfing on  'AarI' (as it 
>> cycles through
>> all enzymes apparently creating a global cut map). AarI has a  recognition 
>> sequence of
>>
>> CACCTGC (in $enz->seq->seq)
>>
>> but a cut site of
>>
>> CACCTGCNNNN^ (in $enz->seq->site)
>>
>> The bad parm '11' refers to the end of the cut site sequence, but  the 
>> routine
>> B:R:Analysis::_cuts is attempting to split the 7-symbol recognition 
>> sequence,
>> and so throws.
>>
>> This surprises me. Core, let me know if you want me to take this on,  or
>> if the module author can fix it quicker.
>>
>> cheers,
>> Mark
>>
>> ----- Original Message ----- From: "Rasmus Ory Nielsen" <ron at ron.dk>
>> To: <bioperl-l at lists.open-bio.org>
>> Sent: Wednesday, June 10, 2009 3:35 AM
>> Subject: [Bioperl-l] Bio::Restriction::Analysis. Exception when  using 
>> rebasefile.
>>
>>
>>> Hi,
>>>
>>> This is my first time using bioperl for restriction analysis, so  please 
>>> bear with me, if this is a FAQ.
>>>
>>> I downloaded withrefm.906 from ftp://ftp.neb.com/pub/rebase/ and  created 
>>> the script shown at the bottom of the mail.
>>> My bioperl version is bioperl-live nightly from 09-Jun-2009.
>>>
>>> The script throws an exception - see below. But, if I comment out the
>>> '-enzymes' argument, so it uses the built-in collection of  enzymes, it 
>>> works.
>>>
>>> My problem is, that I need to use some of the enzymes that are only 
>>> available in rebase. So how do I get this working?
>>>
>>> Thanks for your attention.
>>>
>>> Best regards,
>>> Rasmus Ory Nielsen
>>>
>>>
>>> ############################################################
>>> Output from the script:
>>> ############################################################
>>>
>>> [roni at ksdhcp ~]$ ./restriction_test.pl
>>>
>>> --------------------- WARNING ---------------------
>>> MSG: The enzyme name CviKI-1 was changed to CviKI-I
>>> ---------------------------------------------------
>>>
>>> ------------- EXCEPTION -------------
>>> MSG: Bad end parameter (11). End must be less than the total length of
>>> sequence (total=7)
>>> STACK Bio::PrimarySeq::subseq
>>> /usr/local/lib/perl5/site_perl/5.10.0/Bio/PrimarySeq.pm:401
>>> STACK Bio::Restriction::Analysis::_enzyme_sites
>>> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:900
>>> STACK Bio::Restriction::Analysis::_cuts
>>> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:801
>>> STACK Bio::Restriction::Analysis::cut
>>> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:379
>>> STACK Bio::Restriction::Analysis::fragment_maps
>>> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:515
>>> STACK toplevel ./restriction_test.pl:30
>>> -------------------------------------
>>>
>>> [roni at ksdhcp ~]$
>>>
>>>
>>> ############################################################
>>> Output from the script with the '-enzymes' argument commented out
>>> ############################################################
>>>
>>> [roni at ksdhcp ~]$ ./restriction_test.pl
>>>
>>> --------------------- WARNING ---------------------
>>> MSG: The enzyme name CviKI-1 was changed to CviKI-I
>>> ---------------------------------------------------
>>> $VAR1 = [
>>>          {
>>>            'seq' => 'CTCGACCGTTAGCAA',
>>>            'end' => 15,
>>>            'start' => '1'
>>>          },
>>>          {
>>>            'seq' => 'AGCTTTCTACCGTTATCGT',
>>>            'end' => 34,
>>>            'start' => '16'
>>>          }
>>>        ];
>>> [roni at ksdhcp ~]$
>>>
>>> ############################################################
>>>
>>> #!/usr/bin/perl
>>> use strict;
>>> use warnings;
>>> use Bio::PrimarySeq;
>>> use Bio::Restriction::IO;
>>> use Bio::Restriction::Analysis;
>>> use Data::Dumper;
>>>
>>> # create seq obj
>>> my $seqobj = new Bio::PrimarySeq(
>>>    -seq        => 'CTCGACCGTTAGCAAAGCTTTCTACCGTTATCGT',
>>>    -primary_id => 'test',
>>>    -molecule   => 'dna'
>>> );
>>>
>>> # read rebase file
>>> my $rebase_io = Bio::Restriction::IO->new(
>>>    -file   => 'withrefm.906',
>>>    -format => 'withrefm',
>>> );
>>> my $rebase_collection = $rebase_io->read;
>>>
>>> # start restriction analysis
>>> my $restriction_analysis = Bio::Restriction::Analysis->new(
>>>    -seq     => $seqobj,
>>>    -enzymes => $rebase_collection,    # it works with this line commented out
>>> );
>>>
>>> # retrieve fragment maps
>>> my @fragment_maps = $restriction_analysis->fragment_maps('HindIII');
>>> print Dumper \@fragment_maps;
>
>
> 


From mauricio at open-bio.org  Thu Jun 11 16:46:35 2009
From: mauricio at open-bio.org (Mauricio Herrera Cuadra)
Date: Thu, 11 Jun 2009 11:46:35 -0500
Subject: [Bioperl-l] 1.6 on doc.bioperl.org?
In-Reply-To: <17AD00895AFD43E1A1436D1065092BAC@NewLife>
References: <17AD00895AFD43E1A1436D1065092BAC@NewLife>
Message-ID: <4A3134EB.4080702@open-bio.org>

Hi Mark,

I'll take a look into this sometime between today and tomorrow. Will 
keep you posted. Thanks for the heads up :)

Mauricio.


Mark A. Jensen wrote:
> Hi Chris and list-
> Will documentation for release 1.6 be available in pdoc on doc.bioperl.org?
> I notice also that autogenerated documentation for bioperl-live doesn't contain
> new modules (or HIVQuery & Tiling, anyway ;) )--
> cheers, Mark
> 


From maj at fortinbras.us  Thu Jun 11 18:41:26 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 11 Jun 2009 14:41:26 -0400
Subject: [Bioperl-l] 1.6 on doc.bioperl.org?
In-Reply-To: <4A3134EB.4080702@open-bio.org>
References: <17AD00895AFD43E1A1436D1065092BAC@NewLife>
	<4A3134EB.4080702@open-bio.org>
Message-ID: <A53006055C854297AAA58F6650F4F867@NewLife>

cheers Mauricio! MAJ
----- Original Message ----- 
From: "Mauricio Herrera Cuadra" <mauricio at open-bio.org>
To: "Mark A. Jensen" <maj at fortinbras.us>
Cc: "Chris Fields" <cjfields at illinois.edu>; "BioPerl List" 
<bioperl-l at bioperl.org>
Sent: Thursday, June 11, 2009 12:46 PM
Subject: Re: [Bioperl-l] 1.6 on doc.bioperl.org?


> Hi Mark,
>
> I'll take a look into this sometime between today and tomorrow. Will keep you 
> posted. Thanks for the heads up :)
>
> Mauricio.
>
>
> Mark A. Jensen wrote:
>> Hi Chris and list-
>> Will documentation for release 1.6 be available in pdoc on doc.bioperl.org?
>> I notice also that autogenerated documentation for bioperl-live doesn't 
>> contain
>> new modules (or HIVQuery & Tiling, anyway ;) )--
>> cheers, Mark
>>
>
> 


From Xianjun.Dong at bccs.uib.no  Fri Jun 12 20:38:50 2009
From: Xianjun.Dong at bccs.uib.no (Xianjun Dong)
Date: Fri, 12 Jun 2009 22:38:50 +0200
Subject: [Bioperl-l] background layer is not supported in Bioperl 1.6 for
	Bio::Graphics::Glyph
Message-ID: <4A32BCDA.4080605@ii.uib.no>

Hi,

I am not sure whether this is the right place to ask for help.

I have been stuck on a problem for several days: I want to highlight 
some regions in my track using a different background color. To do that, 
I defined a glyph named "background", based on the 
'Bio::Graphics::Glyph::generic' module. I overrode the draw_component() 
method, adding code like this:

$gd->filledRectangle($left,0,$right,$gd->height, 
$self->factory->translate_color($color));

# the script is pasted at the end

This draws a rectangle with top=0 and bottom=$gd->height. I turned the 
highlight regions into a list of features and called add_track with 
-glyph=>'background' (see the script test.pl below). This works as I 
expect: it adds a colored block behind all tracks in the panel 
(including the ruler arrow). You can see the output image in the 
attached file "test.bioperl1.2.3.png".

Now the problem: when I switch to Bioperl 1.5 (or 1.6), it no longer 
works as expected. The highlight is still drawn, but it shrinks to a low 
height instead of covering all tracks in the panel. I also attached that 
output; see the file "test.bioperl1.6.png".

I tried to work out the reason. The 'background' module is based on the 
generic module, so what can cause the difference? Is it because 
$gd->height is different, or because the tracks that follow the 
'background' track cannot be drawn from the first position?

Well, I could stick with Bioperl 1.2.3 to avoid the problem ("Smart 
person solve problem, wise person avoid problem"...). But that raises 
another problem: Bio::Graphics in Bioperl 1.2.3 does not support the 
$panel->create_web_map() function, so I have to use a later version if 
I want to create a web map for my graphics, and then I have to give up 
the highlighted background.

OK, that's long enough for my first submission here. I hope someone can 
give me a clue.

Thanks in advance!!

Xianjun


==================== test.pl =======================
#!/usr/bin/perl
 
use strict;
use lib "$ENV{HOME}/lib";
 
use Bio::Graphics;
use Bio::Graphics::Feature;
my $ftr= 'Bio::Graphics::Feature';
 
# processed_transcript
my $trans1 = 
$ftr->new(-start=>50,-end=>10,-name=>'ZK154.1',-type=>"3'-UTR");
my $trans2 = $ftr->new(-start=>100,-end=>50,-name=>'ZK154.2',-type=>'CDS');
my $trans3 = 
$ftr->new(-start=>350,-end=>225,-name=>'ZK154.3',-type=>'CDS', 
-source=>'a');
my $trans4 = 
$ftr->new(-start=>650,-end=>500,-name=>'ZK154.3',-type=>'CDS', 
-source=>'a');
my $trans5 = 
$ftr->new(-start=>700,-end=>650,-name=>'ZK154.3',-type=>"5'-UTR");
my $trans  = 
$ftr->new(-segments=>[$trans1,$trans2,$trans3,$trans4,$trans5]);

# highlight
my $trans31 = 
$ftr->new(-start=>240,-end=>450,-name=>'hightlight1',-type=>'background', 
-source=>'a');
my $trans41 = 
$ftr->new(-start=>650,-end=>600,-name=>'hightlight2',-type=>'multihourglass', 
-source=>'b');
 
my $panel= Bio::Graphics::Panel->new(-width=>1200,
                                             -length=>1050,
                                             -start =>0,
                                             -pad_left=>12,
                                             -pad_right=>12);

# the following track works as I expected in bioperl 1.2.3, but not in
# 1.5 and 1.6
$panel->add_track([$trans41,$trans31],
          -glyph   => 'background',
                  -block_bgcolor => sub{return (shift->source eq 
'a')?'#cccccc':'#fffc22'},
                  );

$panel->add_track($ftr->new(-start=>100,-end=>1000),
                  -glyph=>'arrow',
                  -double=>1,
                  -tick=>2);

$panel->add_track($trans,
          -glyph   => 'transcript2', # 'transcript2', #process_5utr',
                  -fgcolor => 'darkred',
                  -bgcolor => 'darkred',
                  -title => '$source',
                  -link => 
'http://www.ensembl.org/Homo_sapiens/transview?transcript=$name',  #EnsEMBL
                  );
   
print $panel->png;

# the following part works in bioperl 1.5 and 1.6, but does not work in
# Bioperl 1.2.3
my $map = $panel->create_web_map("image");
$panel->finished();

1;

==================== background.pm =======================
package Bio::Graphics::Glyph::background;
 
use strict;
use base 'Bio::Graphics::Glyph::generic';
sub pad_top{
  return 0;
}

sub draw_component {
  my $self = shift;
  #$self->SUPER::draw_component(@_);
  my ($gd,$dx,$dy) = @_;
  my ($left,$top,$right,$bottom) = $self->bounds($dx,$dy);
 
  # fill a rectangle spanning the feature horizontally and running from
  # the top of the image (y=0) down to $gd->height
  my $color = $self->option('block_bgcolor') || '#cccccc';
  $gd->filledRectangle($left,0,$right,$gd->height,
                       $self->factory->translate_color($color));
}
 
1;

-- 
==========================================
Xianjun Dong
PhD student, Lenhard group
Computational Biology Unit
Bergen Center for Computational Science
University of Bergen
Hoyteknologisenteret, Thormohlensgate 55
N-5008 Bergen, Norway
E-mail: xianjun.dong at bccs.uib.no
Tel.: +47 555 84022
Fax : +47 555 84295
==========================================

-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.bioperl1.2.3.png
Type: image/png
Size: 2789 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20090612/9cdc621a/attachment-0008.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.bioperl1.6.png
Type: image/png
Size: 2365 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20090612/9cdc621a/attachment-0009.png>

From scott at scottcain.net  Sat Jun 13 01:29:09 2009
From: scott at scottcain.net (Scott Cain)
Date: Fri, 12 Jun 2009 21:29:09 -0400
Subject: [Bioperl-l] background layer is not supported in Bioperl 1.6
	for Bio::Graphics::Glyph
In-Reply-To: <4A32BCDA.4080605@ii.uib.no>
References: <4A32BCDA.4080605@ii.uib.no>
Message-ID: <536f21b00906121829o7a5076f6hc57eb9a4dd815e48@mail.gmail.com>

Hello Xianjun,

I don't think that approach will work.  What you almost certainly need
is a postgrid callback that draws the highlighted region.  For example
code, take a look at the make_postgrid_callback subroutine in GBrowse
1.69.  -postgrid is an option passed to Bio::Graphics::Panel->new.

Scott




-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From scott at scottcain.net  Sat Jun 13 13:27:39 2009
From: scott at scottcain.net (Scott Cain)
Date: Sat, 13 Jun 2009 09:27:39 -0400
Subject: [Bioperl-l] background layer is not supported in Bioperl 1.6
	for Bio::Graphics::Glyph
In-Reply-To: <4A339621.2060702@ii.uib.no>
References: <4A32BCDA.4080605@ii.uib.no>
	<536f21b00906121829o7a5076f6hc57eb9a4dd815e48@mail.gmail.com>
	<4A339621.2060702@ii.uib.no>
Message-ID: <4536f7700906130627r41c90de5r311e58f1d718f9ee@mail.gmail.com>

Hi Xianjun,

I understand what you want to do, as the current version of gbrowse
does this, which uses bioperl 1.6.  Without digging through the code,
I can't tell you exactly how this works and you didn't send your code
that uses this callback, so I can't try it either.

One thing that is different between your code and gbrowse is that in
gbrowse each of the tracks is actually a separate panel (to allow track
dragging), so it is possible that this sort of callback doesn't work for
Bio::Graphics any more.

Scott

On Saturday, June 13, 2009, Xianjun Dong <Xianjun.Dong at bccs.uib.no> wrote:
> Hi, Scott
>
> Thanks for your reply first.
>
> I still have a question. I dug the code out of GBrowse (pasted below). make_postgrid_callback collects all the highlight regions and then uses hilite_regions_closure to draw them, with the following GD call:
>
> $gd->filledRectangle($left+$start,0,$left+$end,$bottom,
>                      $panel->translate_color($h_color));
>
> where $bottom = $panel->bottom. This is the only difference from my code, where I use $gd->height. I guess they are almost the same (apart from pad_bottom); see the code at http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Graphics/Panel.html#CODE22
>
> Anyway, I changed my highlight regions to use $panel->bottom instead of $gd->height. The output is the same when using the Bioperl 1.6 (or 1.5) library. You can see the attached image ("test.bioperl1.6.png").
>
> I may not have explained my question clearly. With bioperl 1.2.3 (strictly, the Bio::Graphics in bioperl 1.2.3), I get the image I want (see the attached file "test.bioperl1.2.3.png"), where the highlighted range runs from the top of the panel to the bottom. With bioperl 1.5 (or 1.6), I only see the highlighted region in its own track, not across the whole panel. Is that clearer now? You can see the difference between the two images.
>
> [I am not sure whether the mailing list allows image attachments, so I have also put them at the following links:
> test.bioperl1.6.png:    http://translog.genereg.net/test.bioperl1.6.png
> test.bioperl1.2.3.png:    http://translog.genereg.net/test.bioperl1.2.3.png ]
>
> Could you test it and see the difference, if you have both 1.2.3 and 1.6 on your computer?
>
> I would really like to know how this works in bioperl 1.2.3 (even if it was a bug in that version, or whatever).
>
> Thanks
>
> Xianjun
> =============================================
>
> # this generates the callback for highlighting a region
> sub make_postgrid_callback {
>  my $settings = shift;
>  return unless ref $settings->{h_region};
>
>  my @h_regions = map {
>    my ($h_ref,$h_start,$h_end,$h_color) = /^(.+):(\d+)\.\.(\d+)(?:@(\S+))?/;
>    defined($h_ref) && $h_ref eq $settings->{ref}
>                 ? [$h_start,$h_end,$h_color||'lightgrey']
>                 : ()
>  }
>    @{$settings->{h_region}};
>
>  return unless @h_regions;
>  return hilite_regions_closure(@h_regions);
> }
>
> # this subroutine generates a Bio::Graphics::Panel callback closure
> # suitable for hilighting a region of a panel.
> # The args are a list of [start,end,color]
> sub hilite_regions_closure {
>  my @h_regions = @_;
>
>  return sub {
>    my $gd     = shift;
>    my $panel  = shift;
>    my $left   = $panel->pad_left;
>    my $top    = $panel->top;
>    my $bottom = $panel->bottom;
>    for my $r (@h_regions) {
>      my ($h_start,$h_end,$h_color) = @$r;
>      my ($start,$end) = $panel->location2pixel($h_start,$h_end);
>      if ($end-$start <= 1) { $end++; $start-- } # so that we always see something
>      # assuming top is 0 so as to ignore top padding
>      $gd->filledRectangle($left+$start,0,$left+$end,$bottom,
>                           $panel->translate_color($h_color));
>    }
>  };
> }
>
>

-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From Xianjun.Dong at bccs.uib.no  Sat Jun 13 16:48:16 2009
From: Xianjun.Dong at bccs.uib.no (Xianjun Dong)
Date: Sat, 13 Jun 2009 18:48:16 +0200
Subject: [Bioperl-l] background layer is not supported in Bioperl 1.6
 for Bio::Graphics::Glyph
In-Reply-To: <4536f7700906130627r41c90de5r311e58f1d718f9ee@mail.gmail.com>
References: <4A32BCDA.4080605@ii.uib.no>	<536f21b00906121829o7a5076f6hc57eb9a4dd815e48@mail.gmail.com>	<4A339621.2060702@ii.uib.no>
	<4536f7700906130627r41c90de5r311e58f1d718f9ee@mail.gmail.com>
Message-ID: <4A33D850.1020203@ii.uib.no>

Hi, Scott

Before I give up my own solution entirely and switch to GBrowse, I would
like to bother you once more.

As you suggested, I passed the -postgrid option when creating the panel,
which calls a function to draw the background. The code below is copied
almost verbatim from the online POD of Bio::Graphics::Panel (see
http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Graphics/Panel.html
)

But it still does not work. Could you have a look? I have pasted my
script below. (BTW, in the POD on that page the option is
-postgrid=>\&draw_gap, while the gap-drawing function is named gap_it,
not draw_gap. I guess that's a typo, or not?)

  my $panel = Bio::Graphics::Panel->new(-segment=>$segment,
                                        -grid=>1,
                                        -width=>600,
                                        -postgrid=> \&draw_gap);
  sub gap_it {
     my $gd    = shift;
     my $panel = shift;
     my ($gap_start,$gap_end) = $panel->location2pixel(500,600);
     my $top                  = $panel->top;
     my $bottom               = $panel->bottom;
     my $gray                 = $panel->translate_color('gray');
     $gd->filledRectangle($gap_start,$top,$gap_end,$bottom,$gray);
}

Thanks

Xianjun

-----------------------------------------------

#!/usr/bin/perl
 
use strict;
use lib "$ENV{HOME}/lib";
 
use Bio::Graphics;
use Bio::Graphics::Feature;
my $ftr= 'Bio::Graphics::Feature';
 
# processed_transcript
my $trans1 = 
$ftr->new(-start=>50,-end=>10,-name=>'ZK154.1',-type=>"3'-UTR");
my $trans2 = $ftr->new(-start=>100,-end=>50,-name=>'ZK154.2',-type=>'CDS');
my $trans3 = 
$ftr->new(-start=>350,-end=>225,-name=>'ZK154.3',-type=>'CDS', 
-source=>'a');
my $trans4 = 
$ftr->new(-start=>650,-end=>500,-name=>'ZK154.3',-type=>'CDS', 
-source=>'a');
my $trans5 = 
$ftr->new(-start=>700,-end=>650,-name=>'ZK154.3',-type=>"5'-UTR");
my $trans  = 
$ftr->new(-segments=>[$trans1,$trans2,$trans3,$trans4,$trans5]);

# highlight
my $trans31 = 
$ftr->new(-start=>240,-end=>450,-name=>'hightlight1',-type=>'background', 
-source=>'a');
my $trans41 = 
$ftr->new(-start=>650,-end=>600,-name=>'hightlight2',-type=>'multihourglass', 
-source=>'b');
 
my $panel= Bio::Graphics::Panel->new(-width=>1200,
                                             -length=>1050,
                                             -start =>0,
                                             -pad_left=>12,
                                             -pad_right=>12
                                             -postgrid=>\&gap_it);

sub gap_it {
     my $gd    = shift;
     my $panel = shift;
     my ($gap_start,$gap_end) = $panel->location2pixel(500,600);
     my $top                  = $panel->top;
     my $bottom               = $gd->height, #panel->bottom;
     my $gray                 = $panel->translate_color('red');
     $gd->filledRectangle($gap_start,$top,$gap_end,$bottom,$gray);
}
# the following track works as I expected in bioperl 1.2.3, but not in
# 1.5 and 1.6
#$panel->add_track([$trans41,$trans31],
#          -glyph   => 'background',
#                  -block_bgcolor => sub{return (shift->source eq 'a')?'#cccccc':'#fffc22'},
#                  );

$panel->add_track($ftr->new(-start=>100,-end=>1000),
                  -glyph=>'arrow',
                  -double=>1,
                  -tick=>2);

$panel->add_track($trans,
          -glyph   => 'transcript2', # 'transcript2', #process_5utr',
                  -fgcolor => 'darkred',
                  -bgcolor => 'darkred',
                  -title => '$source',
                  -link => 
'http://www.ensembl.org/Homo_sapiens/transview?transcript=$name',  #EnsEMBL
                  );
   
print $panel->png;

# the following part works in bioperl 1.5 and 1.6, but does not work in
# Bioperl 1.2.3
my $map = $panel->create_web_map("image");
$panel->finished();
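
One thing that may be worth ruling out in the script above (a guess from reading the code, not something confirmed in this thread): there is no comma between -pad_right=>12 and -postgrid=>\&gap_it. Perl would then read "12 -postgrid" as a subtraction, so the '-postgrid' key would never reach the constructor's argument list. The minimal sketch below uses only core Perl, with no Bio::Graphics involved; gap_it here is a stand-in stub and the option lists are trimmed for illustration. It shows the effect on the list itself:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in for the real drawing callback; only its reference matters here.
sub gap_it { return }

# With the comma: three key/value pairs, six elements.
my @with_comma = (-pad_left => 12, -pad_right => 12, -postgrid => \&gap_it);

# Without the comma: "12 -postgrid" is the subtraction 12 - "postgrid",
# so the '-postgrid' key vanishes and the list has an odd length.
my @without_comma;
{
    no warnings 'numeric';   # silence: Argument "postgrid" isn't numeric
    @without_comma = (-pad_left => 12, -pad_right => 12 -postgrid => \&gap_it);
}

my $key_survives = grep { !ref($_) && $_ eq '-postgrid' } @without_comma;

printf "with comma:    %d elements\n", scalar @with_comma;
printf "without comma: %d elements\n", scalar @without_comma;
print  "-postgrid key present without comma? ",
       ($key_survives ? "yes" : "no"), "\n";
```

If this is what is happening, the panel constructor would receive an odd-length argument list with no -postgrid key, and the callback would never be called. Running the original script with perl -w should then show a "isn't numeric" warning pointing at that line.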


Scott Cain wrote:
> Hi Xianjun,
>
> I understand what you want to do, as the current version of gbrowse
> does this, which uses bioperl 1.6.  Without digging through the code,
> I can't tell you exactly how this works and you didn't send your code
> that uses this callback, so I can't try it either.
>
> One thing that is different between your code and gbrowse is that in
> gbrowse each of the tracks is actually a separate panel (to allow track
> dragging), so it is possible that this sort of callback doesn't work for
> Bio::Graphics any more.
>
> Scott
>
> On Saturday, June 13, 2009, Xianjun Dong <Xianjun.Dong at bccs.uib.no> wrote:
>   
>> Hi, Scott
>>
>> Thanks for your reply first.
>>
>> I still have a question. I dug the code out of GBrowse (pasted below). make_postgrid_callback collects all the highlight regions and then uses hilite_regions_closure to draw them, with the following GD call:
>>
>> $gd->filledRectangle($left+$start,0,$left+$end,$bottom,
>>                           $panel->translate_color($h_color));
>>
>> where $bottom = $panel->bottom. This is the only difference from my code, where I use $gd->height. I guess they are almost the same (apart from pad_bottom); see the code at http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Graphics/Panel.html#CODE22
>>
>> Anyway, I changed my highlight regions to use $panel->bottom instead of $gd->height. The output is the same when using the Bioperl 1.6 (or 1.5) library. You can see the attached image ("test.bioperl1.6.png").
>>
>> I may not have explained my question clearly. With bioperl 1.2.3 (strictly, the Bio::Graphics in bioperl 1.2.3), I get the image I want (see the attached file "test.bioperl1.2.3.png"), where the highlighted range runs from the top of the panel to the bottom. With bioperl 1.5 (or 1.6), I only see the highlighted region in its own track, not across the whole panel. Is that clearer now? You can see the difference between the two images.
>>
>> [I am not sure whether the mailing list allows image attachments, so I have also put them at the following links:
>> test.bioperl1.6.png:    http://translog.genereg.net/test.bioperl1.6.png
>> test.bioperl1.2.3.png:    http://translog.genereg.net/test.bioperl1.2.3.png ]
>>
>> Could you test it and see the difference, if you have both 1.2.3 and 1.6 on your computer?
>>
>> I would really like to know how this works in bioperl 1.2.3 (even if it was a bug in that version, or whatever).
>>
>> Thanks
>>
>> Xianjun
>> =============================================
>>
>> # this generates the callback for highlighting a region
>> sub make_postgrid_callback {
>>  my $settings = shift;
>>  return unless ref $settings->{h_region};
>>
>>  my @h_regions = map {
>>    my ($h_ref,$h_start,$h_end,$h_color) = /^(.+):(\d+)\.\.(\d+)(?:@(\S+))?/;
>>    defined($h_ref) && $h_ref eq $settings->{ref}
>>                 ? [$h_start,$h_end,$h_color||'lightgrey']
>>                 : ()
>>  }
>>    @{$settings->{h_region}};
>>
>>  return unless @h_regions;
>>  return hilite_regions_closure(@h_regions);
>> }
>>
>> # this subroutine generates a Bio::Graphics::Panel callback closure
>> # suitable for hilighting a region of a panel.
>> # The args are a list of [start,end,color]
>> sub hilite_regions_closure {
>>  my @h_regions = @_;
>>
>>  return sub {
>>    my $gd     = shift;
>>    my $panel  = shift;
>>    my $left   = $panel->pad_left;
>>    my $top    = $panel->top;
>>    my $bottom = $panel->bottom;
>>    for my $r (@h_regions) {
>>      my ($h_start,$h_end,$h_color) = @$r;
>>      my ($start,$end) = $panel->location2pixel($h_start,$h_end);
>>      if ($end-$start <= 1) { $end++; $start-- } # so that we always see something
>>      # assuming top is 0 so as to ignore top padding
>>      $gd->filledRectangle($left+$start,0,$left+$end,$bottom,
>>                           $panel->translate_color($h_color));
>>    }
>>  };
>> }
>>
>>
>> Scott Cain wrote:
>>
>> Hello Xianjun,
>>
>> I don't think that approach will work.  What you almost certainly need
>> is a postgrid callback that draws the highlighted region.  For example
>> code, take a look at the make_postgrid_callback subroutine in GBrowse
>> 1.69.  -postgrid is an option passed to Bio::Graphics::Panel->new.
>>
>> Scott
>>
>>
>>
>>
>> On Fri, Jun 12, 2009 at 4:38 PM, Xianjun Dong<Xianjun.Dong at bccs.uib.no> wrote:
>>
>>
>> HI,
>>
>> I am not sure whether this is the right place to ask for help.
>>
>> I have been struggling with a problem for several days: I want to highlight
>> some regions in my track using a different background color. To do that, I
>> defined a glyph named "background", based on the
>> 'Bio::Graphics::Glyph::generic' module, and overrode its draw_component()
>> method with code like the following:
>>
>> $gd->filledRectangle($left,0,$right,$gd->height,
>> $self->factory->translate_color($color));
>>
>> # the script is pasted at the end
>>
>> This draws a rectangle with top=0 and bottom=$gd->height. I turned the
>> highlight regions into a list of features and added them with add_track
>> and -glyph=>'background' (see the script test.pl below). This works as I
>> expect: it adds a colored block behind all tracks in the panel (including
>> the ruler arrow). You can see the output image in the attached file
>> "test.bioperl1.2.3.png".
>>
>> Now the problem: when I switch to Bioperl 1.5 (or 1.6), it no longer
>> works. Well, it runs, but the highlighted part shrinks to a low height
>> instead of covering all tracks in the panel. I have also attached that
>> output; see the file "test.bioperl1.6.png".
>>
>> I tried to work out the reason. The 'background' module is based on the
>> generic module, so what could cause the difference? Is it because
>> $gd->height is different, or because the tracks that follow the
>> 'background' track can no longer draw from the first position?
>>
>> I could stick with Bioperl 1.2.3 to avoid the problem ("A smart person
>> solves a problem, a wise person avoids it"...). But then another problem
>> arises: Bio::Graphics in Bioperl 1.2.3 does not support the
>> $panel->create_web_map() method, which means I have to use a later
>> version if I want to create a web map for my graphics, and then I have to
>> give up the highlighted background.
>>
>> OK, this is long enough for my first post here. I hope someone can give
>> me a clue.
>>
>> Thanks in advance!
>>
>> Xianjun
>>
>>
>> ==================== test.pl =======================
>> #!/usr/bin/perl
>>
>> use strict;
>> use lib "$ENV{HOME}/lib";
>>
>> use Bio::Graphics;
>> use Bio::Graphics::Feature;
>> my $ftr= 'Bio::Graphics::Feature';
>>
>> # processed_transcript
>> my $trans1 =
>> $ftr->new(-start=>50,-end=>10,-name=>'ZK154.1',-type=>"3'-UTR");
>> my $trans2 = $ftr->new(-start=>100,-end=>50,-name=>'ZK154.2',-type=>'CDS');
>> my $trans3 = $ftr->new(-start=>350,-end=>225,-name=>'ZK154.3',-type=>'CDS',
>> -source=>'a');
>> my $trans4 = $ftr->new(-start=>650,-end=>500,-name=>'ZK154.3',-type=>'CDS',
>> -source=>'a');
>> my $trans5 =
>> $ftr->new(-start=>700,-end=>650,-name=>'ZK154.3',-type=>"5'-UTR");
>> my $trans  =
>> $ftr->new(-segments=>[$trans1,$trans2,$trans3,$trans4,$trans5]);
>>
>> # hightlight
>> my $trans31 =
>> $ftr->new(-start=>240,-end=>450,-name=>'hightlight1',-type=>'background',
>> -source=>'a');
>> my $trans41 =
>> $ftr->new(-start=>650,-end=>600,-name=>'hightlight2',-type=>'multihourglass',
>> -source=>'b');
>>
>> my $panel= Bio::Graphics::Panel->new(-width=>1200,
>>                                            -length=>1050,
>>                                            -start =>0,
>>                                            -pad_left=>12,
>>                                            -pad_right=>12);
>>
>> # the following track works as I expected in bioperl 1.2.3, but not in 1.5
>> # and 1.6
>> $panel->add_track([$trans41,$trans31],
>>         -glyph   => 'background',
>>                 -block_bgcolor => sub{return (shift->source eq
>> 'a')?'#cccccc':'#fffc22'},
>>                 );
>>
>> $panel->add_track($ftr->new(-start=>100,-end=>1000),
>>                 -glyph=>'arrow',
>>                 -double=>1,
>>                 -tick=>2);
>>
>> $panel->add_track($trans,
>>         -glyph   => 'transcript2', # 'transcript2', #process_5utr',
>>                 -fgcolor => 'darkred',
>>                 -bgcolor => 'darkred',
>>                 -title => '$source',
>>                 -link =>
>> 'http://www.ensembl.org/Homo_sapiens/transview?transcript=$name',  #EnsEMBL
>>                 );
>>  print $panel->png;
>>
>> # the following part works in bioperl 1.5 and 1.6, but does not work in
>> # Bioperl 1.2.3
>> my $map = $panel->create_web_map("image");
>> $panel->finished();
>>
>> 1;
>>
>> ==================== background.pm =======================
>> package Bio::Graphics::Glyph::background;
>>
>> use strict;
>> use base 'Bio::Graphics::Glyph::generic';
>> sub pad_top{
>>  return 0;
>> }
>>
>> sub draw_component {
>>  my $self = shift;
>>  #$self->SUPER::draw_component(@_);
>>  my ($gd,$dx,$dy) = @_;
>>  my ($left,$top,$right,$bottom) = $self->bounds($dx,$dy);
>>
>>  # fill the feature's horizontal span from the top of the image to its full height
>>  my $color = $self->option('block_bgcolor') || '#cccccc';
>>  $gd->filledRectangle($left,0,$right,$gd->height,
>> $self->factory->translate_color($color));
>> }
>>
>> 1;
>>
>> --
>> ==========================================
>> Xianjun Dong
>> PhD student, Lenhard group
>> Computational Biology Unit
>> Bergen Center for Computational Science
>> University of Bergen
>> Hoyteknologisenteret, Thormohlensgate 55
>> N-5008 Bergen, Norway
>> E-mail: xianjun.dong at bccs.uib.no
>> Tel.: +47 555 84022
>> Fax : +47 555 84295
>> ==========================================
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
==========================================
Xianjun Dong
PhD student, Lenhard group
Computational Biology Unit
Bergen Center for Computational Science
University of Bergen
Hoyteknologisenteret, Thormohlensgate 55
N-5008 Bergen, Norway
E-mail: xianjun.dong at bccs.uib.no
Tel.: +47 555 84022
Fax : +47 555 84295
==========================================


From maj at fortinbras.us  Sun Jun 14 04:35:18 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Sun, 14 Jun 2009 00:35:18 -0400
Subject: [Bioperl-l] Bio::Restriction::Analysis. Exception when
	using rebasefile.
In-Reply-To: <0F0E9DCC-BE3A-4918-8226-B9988B022FEB@illinois.edu>
References: <4A2F622D.5060500@ron.dk><2F52B1CED1374763822BF3AD1D283B3B@NewLife>
	<0F0E9DCC-BE3A-4918-8226-B9988B022FEB@illinois.edu>
Message-ID: <A9819F7FF3894C768CF89C36CB689942@NewLife>

All-

I'm finding this is requiring a pretty substantial refactor and
rationalization. I have opened a branch at
REPOS/bioperl-live/branches/restriction-refactor
and am making commits at will there (won't Rob be pleased!).
When it appears to be passing tests, I'll let Chris know (on list),
and he can decide on its mergability, and brave users could try
it out by downloading Bio/Restriction (deeply) via subversion.

My running commentary is at Bug #2855.
MAJ

----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "Mark A. Jensen" <maj at fortinbras.us>
Cc: <bioperl-l at lists.open-bio.org>; "Rasmus Ory Nielsen" <ron at ron.dk>
Sent: Thursday, June 11, 2009 10:19 AM
Subject: Re: [Bioperl-l] Bio::Restriction::Analysis. Exception when 
using rebasefile.


> Mark,
>
> Feel free to take it up.  It's probably a good idea to start a bug  report for 
> tracking if it proves to be thornier to fix than expected.
>
> chris
>
> On Jun 11, 2009, at 8:17 AM, Mark A. Jensen wrote:
>
>> Rasmus et al-
>>
>> This looks like a bug. A quick debug shows it's barfing on  'AarI' (as it 
>> cycles through
>> all enzymes apparently creating a global cut map). AarI has a  recognition 
>> sequence of
>>
>> CACCTGC (in $enz->seq->seq)
>>
>> but a cut site of
>>
>> CACCTGCNNNN^ (in $enz->seq->site)
>>
>> The bad parm '11' refers to the end of the cut site sequence, but  the 
>> routine
>> B:R:Analysis::_cuts is attempting to split the 7-symbol recognition 
>> sequence,
>> and so throws.
>>
>> This surprises me. Core, let me know if you want me to take this on,  or
>> if the module author can fix it quicker.
>>
>> cheers,
>> Mark
>>
>> ----- Original Message ----- From: "Rasmus Ory Nielsen" <ron at ron.dk>
>> To: <bioperl-l at lists.open-bio.org>
>> Sent: Wednesday, June 10, 2009 3:35 AM
>> Subject: [Bioperl-l] Bio::Restriction::Analysis. Exception when  using 
>> rebasefile.
>>
>>
>>> Hi,
>>>
>>> This is my first time using bioperl for restriction analysis, so  please 
>>> bear with me, if this is a FAQ.
>>>
>>> I downloaded withrefm.906 from ftp://ftp.neb.com/pub/rebase/ and  created 
>>> the script shown at the bottom of the mail.
>>> My bioperl version is bioperl-live nightly from 09-Jun-2009.
>>>
>>> The scripts throws an exception - see below. But, if I comment out  the 
>>> '-enzymes' argument, so it uses the built-in collection of  enzymes, it 
>>> works.
>>>
>>> My problem is, that I need to use some of the enzymes that are only 
>>> available in rebase. So how do I get this working?
>>>
>>> Thanks for your attention.
>>>
>>> Best regards,
>>> Rasmus Ory Nielsen
>>>
>>>
>>> ############################################################
>>> Output from the script:
>>> ############################################################
>>>
>>> [roni at ksdhcp ~]$ ./restriction_test.pl
>>>
>>> --------------------- WARNING ---------------------
>>> MSG: The enzyme name CviKI-1 was changed to CviKI-I
>>> ---------------------------------------------------
>>>
>>> ------------- EXCEPTION -------------
>>> MSG: Bad end parameter (11). End must be less than the total length  of 
>>> sequence (total=7)
>>> STACK Bio::PrimarySeq::subseq /usr/local/lib/perl5/site_perl/5.10.0/ 
>>> Bio/PrimarySeq.pm:401
>>> STACK Bio::Restriction::Analysis::_enzyme_sites /usr/local/lib/ 
>>> perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:900
>>> STACK Bio::Restriction::Analysis::_cuts /usr/local/lib/perl5/ 
>>> site_perl/5.10.0/Bio/Restriction/Analysis.pm:801
>>> STACK Bio::Restriction::Analysis::cut /usr/local/lib/perl5/ 
>>> site_perl/5.10.0/Bio/Restriction/Analysis.pm:379
>>> STACK Bio::Restriction::Analysis::fragment_maps /usr/local/lib/ 
>>> perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:515
>>> STACK toplevel ./restriction_test.pl:30
>>> -------------------------------------
>>>
>>> [roni at ksdhcp ~]$
>>>
>>>
>>> ############################################################
>>> Output from the script with the '-enzymes' argument commented out
>>> ############################################################
>>>
>>> [roni at ksdhcp ~]$ ./restriction_test.pl
>>>
>>> --------------------- WARNING ---------------------
>>> MSG: The enzyme name CviKI-1 was changed to CviKI-I
>>> ---------------------------------------------------
>>> $VAR1 = [
>>>          {
>>>            'seq' => 'CTCGACCGTTAGCAA',
>>>            'end' => 15,
>>>            'start' => '1'
>>>          },
>>>          {
>>>            'seq' => 'AGCTTTCTACCGTTATCGT',
>>>            'end' => 34,
>>>            'start' => '16'
>>>          }
>>>        ];
>>> [roni at ksdhcp ~]$
>>>
>>> ############################################################
>>>
>>> #!/usr/bin/perl
>>> use strict;
>>> use warnings;
>>> use Bio::PrimarySeq;
>>> use Bio::Restriction::IO;
>>> use Bio::Restriction::Analysis;
>>> use Data::Dumper;
>>>
>>> # create seq obj
>>> my $seqobj = new Bio::PrimarySeq(
>>>    -seq        => 'CTCGACCGTTAGCAAAGCTTTCTACCGTTATCGT',
>>>    -primary_id => 'test',
>>>    -molecule   => 'dna'
>>> );
>>>
>>> # read rebase file
>>> my $rebase_io = Bio::Restriction::IO->new(
>>>    -file   => 'withrefm.906',
>>>    -format => 'withrefm',
>>> );
>>> my $rebase_collection = $rebase_io->read;
>>>
>>> # start restriction analysis
>>> my $restriction_analysis = Bio::Restriction::Analysis->new(
>>>    -seq     => $seqobj,
>>>    -enzymes => $rebase_collection,    # it works with this line  commented 
>>> out
>>> );
>>>
>>> # retrieve fragment maps
>>> my @fragment_maps = $restriction_analysis->fragment_maps('HindIII');
>>> print Dumper \@fragment_maps;


From rmb32 at cornell.edu  Mon Jun 15 01:57:45 2009
From: rmb32 at cornell.edu (Robert Buels)
Date: Sun, 14 Jun 2009 18:57:45 -0700
Subject: [Bioperl-l] Bio::Restriction::Analysis. Exception
	when using rebasefile.
In-Reply-To: <A9819F7FF3894C768CF89C36CB689942@NewLife>
References: <4A2F622D.5060500@ron.dk><2F52B1CED1374763822BF3AD1D283B3B@NewLife>	<0F0E9DCC-BE3A-4918-8226-B9988B022FEB@illinois.edu>
	<A9819F7FF3894C768CF89C36CB689942@NewLife>
Message-ID: <4A35AA99.2080305@cornell.edu>

Mark A. Jensen wrote:
> I'm finding this is requiring a pretty substantial refactor and
> rationalization. I have opened a branch at
> REPOS/bioperl-live/branches/restriction-refactor
> and am making commits at will there (won't Rob be pleased!).
Oh Mark, you are so agile!

> When it appears to be passing tests, I'll let Chris know (on list),
> and he can decide on its mergability, and brave users could try
> it out by downloading Bio/Restriction (deeply) via subversion.
If it's passing tests but still has bugs, make sure you add tests for 
the additional bugs you find!

Rob


From maj at fortinbras.us  Mon Jun 15 02:02:37 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Sun, 14 Jun 2009 22:02:37 -0400
Subject: [Bioperl-l] Bio::Restriction::Analysis.
	Exception when using rebasefile.
In-Reply-To: <4A35AA99.2080305@cornell.edu>
References: <4A2F622D.5060500@ron.dk><2F52B1CED1374763822BF3AD1D283B3B@NewLife>	<0F0E9DCC-BE3A-4918-8226-B9988B022FEB@illinois.edu><A9819F7FF3894C768CF89C36CB689942@NewLife>
	<4A35AA99.2080305@cornell.edu>
Message-ID: <FFDC29BB104149BE95840F1AD1B61827@NewLife>


----- Original Message ----- 
From: "Robert Buels" <rmb32 at cornell.edu>
To: "BioPerl List" <bioperl-l at lists.open-bio.org>
Sent: Sunday, June 14, 2009 9:57 PM
Subject: Re: [Bioperl-l] Bio::Restriction::Analysis. Exception when 
using rebasefile.


> Mark A. Jensen wrote:
>> I'm finding this is requiring a pretty substantial refactor and
>> rationalization. I have opened a branch at
>> REPOS/bioperl-live/branches/restriction-refactor
>> and am making commits at will there (won't Rob be pleased!).
> Oh Mark, you are so agile!
ha!
>
>> When it appears to be passing tests, I'll let Chris know (on list),
>> and he can decide on its mergability, and brave users could try
>> it out by downloading Bio/Restriction (deeply) via subversion.
> If it's passing tests but still has bugs, make sure you add tests for the 
> additional bugs you find!

But of course; plenty of new tests coming-- thanks Rob-
MAJ

>
> Rob


From shalabh.sharma7 at gmail.com  Mon Jun 15 20:06:31 2009
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Mon, 15 Jun 2009 16:06:31 -0400
Subject: [Bioperl-l] sub sampling
Message-ID: <9fcc48c70906151306k5df96b69k1f4d0f1466204c5a@mail.gmail.com>

Hi All,

I was just wondering whether there is any module in Bioperl
that does subsampling.
I have a file like this:

369859  0477    93
163417  1348    92
228122  0176    88
232792  0050    93
239636  1850    95
300069  0048    96
244108  0046    91
199087  0055    93
206209  0048    96
-              -         -
-              -         -

which contains around 100,000 lines, and I want to take a 25% sample
from this file. Is there any way I can do this in Bioperl?

Thanks
Shalabh


From maj at fortinbras.us  Mon Jun 15 23:49:58 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 15 Jun 2009 19:49:58 -0400
Subject: [Bioperl-l] Bio::Restriction refactor [Was:
	Bio::Restriction::Analysis. Exception when using rebasefile.]
In-Reply-To: <4A2F622D.5060500@ron.dk>
References: <4A2F622D.5060500@ron.dk>
Message-ID: <E80E6C1BC08D4E338739148BFE9BFAC0@NewLife>

Dear All,

The revamped Bio::Restriction::* in branch

REPOS/bioperl-live/branches/restriction-refactor

passes all existing tests, including those in t/Restriction.
New tests will be added within the next day or so.
The original bug occurred because only a subset of
the possible rebase withrefm-formatted enzymes were
handled; it choked on freshly-downloaded rebase
files because of this.

The refactored version now handles *all* rebase types,
including those of rebase forms

XXX^X             [ intrasite cutters, the main types
                    built in to base.pm ]
XXXX(m/n)         [ right-end extrasite cutters ]
(s/t)XXXX         [ left-end ditto ]
(s/t)XXXX(m/n)    [ double-end ditto ],

palindromic and non-palindromic, as well as multisite
enzymes that string together combinations of these
forms. Much rationalization (well, seems rational to me
anyway) and cruft removal in the affected code has also
occurred. itype2.pm has been updated as well, to
conform to the refactoring.
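
To make the notation concrete, here is a rough sketch (not the code in
the branch) of how the single-site forms listed above could be picked
apart with a regular expression. parse_rebase_site is a hypothetical
helper, not part of Bio::Restriction, and reading (m/n) as top-strand /
bottom-strand cut offsets is my assumption about the rebase convention.
Multisite enzymes and strings with more than one caret are not handled.

```perl
use strict;
use warnings;

# Hypothetical helper (not a Bio::Restriction method): split a rebase-style
# site string into its recognition sequence and cut descriptors.
# Returns a hash ref, or nothing if the string fits none of the forms.
sub parse_rebase_site {
    my ($site) = @_;

    # optional (s/t) prefix, the site itself (may hold a caret), optional (m/n) suffix
    my ($s, $t, $core, $m, $n) =
        $site =~ m{^ (?: \( (-?\d+) / (-?\d+) \) )?
                     ([A-Za-z^]+)
                     (?: \( (-?\d+) / (-?\d+) \) )? $}x
        or return;

    my $caret = index($core, '^');      # -1 when there is no intrasite cut
    (my $recog = $core) =~ tr/^//d;     # recognition sequence without the caret

    return {
        recog => uc $recog,
        caret => $caret >= 0 ? $caret : undef,   # bases to the left of the cut
        left  => defined $s ? [ $s, $t ] : undef,
        right => defined $m ? [ $m, $n ] : undef,
    };
}

my $aari = parse_rebase_site('CACCTGC(4/8)');
printf "%s: recognition length %d, right-end cut offsets %d/%d\n",
    $aari->{recog}, length $aari->{recog}, @{ $aari->{right} };
```

If the AarI entry is CACCTGC(4/8), as I believe it is, the top-strand
cut falls at 7 + 4 = 11, past the end of the 7-symbol recognition
sequence; that matches the "Bad end parameter (11) ... (total=7)"
exception in the original report.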

If you're dying to try this now, get a working copy
of the branch like so

$ svn co \
    svn://code.open-bio.org/bioperl/bioperl-live/branches/restriction-refactor \
    bioperl-rr
$ cd bioperl-rr
$ perl Build.PL
$ ./Build test
$ ./Build install

This will only hammer your current installation in the
$SITE_LIB/Bio/Restriction path; I worked only on
a sparse checkout of the necessary files. To revert to your
old install, do

$ cd $MY_OLD_BIOPERL_WORKINGDIR
$ ./Build install

[In the possible event that these instructions are in error,
there will be a response on this list in a matter of
milliseconds, so stand by.]

Happy coding-
Mark


----- Original Message ----- 
From: "Rasmus Ory Nielsen" <ron at ron.dk>
To: <bioperl-l at lists.open-bio.org>
Sent: Wednesday, June 10, 2009 3:35 AM
Subject: [Bioperl-l] Bio::Restriction::Analysis. Exception when using 
rebasefile.


> Hi,
>
> This is my first time using bioperl for restriction analysis, so please bear 
> with me, if this is a FAQ.
>
> I downloaded withrefm.906 from ftp://ftp.neb.com/pub/rebase/ and created the 
> script shown at the bottom of the mail.
> My bioperl version is bioperl-live nightly from 09-Jun-2009.
>
> The scripts throws an exception - see below. But, if I comment out the 
> '-enzymes' argument, so it uses the built-in collection of enzymes, it works.
>
> My problem is, that I need to use some of the enzymes that are only available 
> in rebase. So how do I get this working?
>
> Thanks for your attention.
>
> Best regards,
> Rasmus Ory Nielsen
>
>
> ############################################################
> Output from the script:
> ############################################################
>
> [roni at ksdhcp ~]$ ./restriction_test.pl
>
> --------------------- WARNING ---------------------
> MSG: The enzyme name CviKI-1 was changed to CviKI-I
> ---------------------------------------------------
>
> ------------- EXCEPTION -------------
> MSG: Bad end parameter (11). End must be less than the total length of 
> sequence (total=7)
> STACK Bio::PrimarySeq::subseq 
> /usr/local/lib/perl5/site_perl/5.10.0/Bio/PrimarySeq.pm:401
> STACK Bio::Restriction::Analysis::_enzyme_sites 
> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:900
> STACK Bio::Restriction::Analysis::_cuts 
> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:801
> STACK Bio::Restriction::Analysis::cut 
> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:379
> STACK Bio::Restriction::Analysis::fragment_maps 
> /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:515
> STACK toplevel ./restriction_test.pl:30
> -------------------------------------
>
> [roni at ksdhcp ~]$
>
>
> ############################################################
> Output from the script with the '-enzymes' argument commented out
> ############################################################
>
> [roni at ksdhcp ~]$ ./restriction_test.pl
>
> --------------------- WARNING ---------------------
> MSG: The enzyme name CviKI-1 was changed to CviKI-I
> ---------------------------------------------------
> $VAR1 = [
>           {
>             'seq' => 'CTCGACCGTTAGCAA',
>             'end' => 15,
>             'start' => '1'
>           },
>           {
>             'seq' => 'AGCTTTCTACCGTTATCGT',
>             'end' => 34,
>             'start' => '16'
>           }
>         ];
> [roni at ksdhcp ~]$
>
> ############################################################
>
> #!/usr/bin/perl
> use strict;
> use warnings;
> use Bio::PrimarySeq;
> use Bio::Restriction::IO;
> use Bio::Restriction::Analysis;
> use Data::Dumper;
>
> # create seq obj
> my $seqobj = new Bio::PrimarySeq(
>     -seq        => 'CTCGACCGTTAGCAAAGCTTTCTACCGTTATCGT',
>     -primary_id => 'test',
>     -molecule   => 'dna'
> );
>
> # read rebase file
> my $rebase_io = Bio::Restriction::IO->new(
>     -file   => 'withrefm.906',
>     -format => 'withrefm',
> );
> my $rebase_collection = $rebase_io->read;
>
> # start restriction analysis
> my $restriction_analysis = Bio::Restriction::Analysis->new(
>     -seq     => $seqobj,
>     -enzymes => $rebase_collection,    # it works with this line commented out
> );
>
> # retrieve fragment maps
> my @fragment_maps = $restriction_analysis->fragment_maps('HindIII');
> print Dumper \@fragment_maps;


From maj at fortinbras.us  Tue Jun 16 00:07:21 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 15 Jun 2009 20:07:21 -0400
Subject: [Bioperl-l] sub sampling
In-Reply-To: <9fcc48c70906151306k5df96b69k1f4d0f1466204c5a@mail.gmail.com>
References: <9fcc48c70906151306k5df96b69k1f4d0f1466204c5a@mail.gmail.com>
Message-ID: <A030148C139446DAB1DEE791A4EC2D3B@NewLife>

Shalabh
If you want to do sampling with replacement
this is not bad (if you trust rand() ):

 # open your file into $my_infile, then
 my @lines = <$my_infile>;

 my $num_samps = 10;
 my $sample_size_pc = 0.25;
 my @samples;

 for (1..$num_samps) {
    # each sample is a ref to a list of randomly drawn line indices
    push @samples, [ map { int( @lines * rand ) }
                     ( 1..int($sample_size_pc * @lines) ) ];
 }

# now, do something, fr'instance
 my @sample_pc;
 foreach (@samples) {
    my $pct=0;
    foreach my $line (@lines[ @$_ ]) {
        my @a = split(/\s+/,$line);
        $pct += $a[2];
    }
    $pct /= @$_;
    push @sample_pc, $pct;
 }
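
If what you want is a single 25% sample without replacement (each line
drawn at most once), a minimal sketch using shuffle from the core
List::Util module might look like this. The subsample name is made up
for illustration, and the whole file is assumed to fit in memory.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Hypothetical helper: return a random fraction of the given lines,
# each line chosen at most once (sampling without replacement).
sub subsample {
    my ($fraction, @lines) = @_;
    my $k = int($fraction * @lines);     # number of lines to keep
    return () unless $k > 0;
    return (shuffle @lines)[0 .. $k - 1];
}

# usage: perl subsample.pl < infile > sample   (only reads when STDIN is redirected)
print subsample(0.25, <STDIN>) unless -t STDIN;
```

The sample comes back in shuffled order; sort it afterwards if the
original line order matters.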

R's just better for some things, ain't it?
MAJ


----- Original Message ----- 
From: "shalabh sharma" <shalabh.sharma7 at gmail.com>
To: "bioperl-l" <Bioperl-l at lists.open-bio.org>
Sent: Monday, June 15, 2009 4:06 PM
Subject: [Bioperl-l] sub sampling


> Hi All,
>
> I was just wondering whether there is any module in Bioperl
> that does subsampling.
> I have a file like this:
>
> 369859  0477    93
> 163417  1348    92
> 228122  0176    88
> 232792  0050    93
> 239636  1850    95
> 300069  0048    96
> 244108  0046    91
> 199087  0055    93
> 206209  0048    96
> -              -         -
> -              -         -
>
> which contains around 100,000 lines, and I want to take a 25% sample
> from this file. Is there any way I can do this in Bioperl?
>
> Thanks
> Shalabh


From Xianjun.Dong at bccs.uib.no  Sat Jun 13 12:05:53 2009
From: Xianjun.Dong at bccs.uib.no (Xianjun Dong)
Date: Sat, 13 Jun 2009 14:05:53 +0200
Subject: [Bioperl-l] background layer is not supported in Bioperl 1.6
 for Bio::Graphics::Glyph
In-Reply-To: <536f21b00906121829o7a5076f6hc57eb9a4dd815e48@mail.gmail.com>
References: <4A32BCDA.4080605@ii.uib.no>
	<536f21b00906121829o7a5076f6hc57eb9a4dd815e48@mail.gmail.com>
Message-ID: <4A339621.2060702@ii.uib.no>

Hi, Scott

Thanks for your reply.

I still have a question. I dug the code out of GBrowse (pasted 
below). The make_postgrid_callback method collects all the highlight 
regions and then uses hilite_regions_closure to draw them, with the 
following GD call:

$gd->filledRectangle($left+$start,0,$left+$end,$bottom,
                           $panel->translate_color($h_color));

where $bottom = $panel->bottom. This is the only difference from my 
code, where I use $gd->height. I guess the two are almost the same 
(apart from pad_bottom); see the code at 
http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Graphics/Panel.html#CODE22

Anyway, I changed my highlight regions to use $panel->bottom instead of 
$gd->height. With the Bioperl 1.6 (or 1.5) library the output is the 
same as before; see the attached image ("test.bioperl1.6.png").

I may not have explained my question clearly. With bioperl 1.2.3 
(actually the Bio::Graphics in bioperl 1.2.3), I get the image I want 
(see the attached file "test.bioperl1.2.3.png"), where the highlight 
runs from the top of the panel to the bottom. With bioperl 1.5 (or 
1.6), the highlight region appears only within its own track, not 
across the whole panel. Comparing the two images shows the difference.

[I am not sure whether the mailing list allows image attachments, so I 
have also put them at the following links:
test.bioperl1.6.png:    http://translog.genereg.net/test.bioperl1.6.png
test.bioperl1.2.3.png:  http://translog.genereg.net/test.bioperl1.2.3.png ]

If you have both 1.2.3 and 1.6 on your computer, you can test it and 
see the difference.

I would really like to know how this works in bioperl 1.2.3 (even if it 
turns out to be a bug in that version).

Thanks

Xianjun
=============================================

# this generates the callback for highlighting a region
sub make_postgrid_callback {
  my $settings = shift;
  return unless ref $settings->{h_region};

  my @h_regions = map {
    my ($h_ref,$h_start,$h_end,$h_color) = 
/^(.+):(\d+)\.\.(\d+)(?:@(\S+))?/;
    defined($h_ref) && $h_ref eq $settings->{ref}
                 ? [$h_start,$h_end,$h_color||'lightgrey']
                 : ()
  }
    @{$settings->{h_region}};

  return unless @h_regions;
  return hilite_regions_closure(@h_regions);
}

# this subroutine generates a Bio::Graphics::Panel callback closure
# suitable for hilighting a region of a panel.
# The args are a list of [start,end,color]
sub hilite_regions_closure {
  my @h_regions = @_;

  return sub {
    my $gd     = shift;
    my $panel  = shift;
    my $left   = $panel->pad_left;
    my $top    = $panel->top;
    my $bottom = $panel->bottom;
    for my $r (@h_regions) {
      my ($h_start,$h_end,$h_color) = @$r;
      my ($start,$end) = $panel->location2pixel($h_start,$h_end);
      if ($end-$start <= 1) { $end++; $start-- } # so that we always see something
      # assuming top is 0 so as to ignore top padding
      $gd->filledRectangle($left+$start,0,$left+$end,$bottom,
                           $panel->translate_color($h_color));
    }
  };
}
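
For reference, a stripped-down sketch of how such a closure might be
attached to a panel of my own, outside GBrowse. The helper name
highlight_callback is invented here, and the sketch assumes only the
-postgrid option Scott mentions plus the panel methods used in the
GBrowse code above (pad_left, bottom, location2pixel, translate_color).
The coordinate swap is my own addition, since some of my test features
have start > end.

```perl
use strict;
use warnings;

# Hypothetical helper: build a postgrid callback from a list of
# [start, end, color] triples given in sequence coordinates.
sub highlight_callback {
    my @regions = @_;
    return sub {
        my ($gd, $panel) = @_;
        my $left   = $panel->pad_left;
        my $bottom = $panel->bottom;
        for my $r (@regions) {
            my ($h_start, $h_end, $h_color) = @$r;
            my ($x1, $x2) = $panel->location2pixel($h_start, $h_end);
            ($x1, $x2) = ($x2, $x1) if $x1 > $x2;   # tolerate start > end
            if ($x2 - $x1 <= 1) { $x2++; $x1-- }    # always draw something
            # top is taken as 0 so that the top padding is covered too
            $gd->filledRectangle($left + $x1, 0, $left + $x2, $bottom,
                                 $panel->translate_color($h_color));
        }
    };
}

# With Bio::Graphics installed, the closure would be handed to the panel
# at construction time, e.g.:
#   my $panel = Bio::Graphics::Panel->new(
#       -length   => 1050,
#       -width    => 1200,
#       -postgrid => highlight_callback([240, 450, '#cccccc'],
#                                       [600, 650, '#fffc22']),
#   );
```

Because the callback runs when the panel is rendered rather than as a
glyph inside a track, the rectangle is not confined to one track.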


Scott Cain wrote:
> Hello Xianjun,
>
> I don't think that approach will work.  What you almost certainly need
> to do is a postgrid callback that does the drawing of the highlighted
> region.  For example code of how to do this, take a look at the
> make_postgrid_callback subroutine in GBrowse 1.69.  The option
> -postgrid is a method of Bio::Graphics::Panel.
>
> Scott
>
>
>
>
> On Fri, Jun 12, 2009 at 4:38 PM, Xianjun Dong<Xianjun.Dong at bccs.uib.no> wrote:
>   
>> HI,
>>
>> I am not sure this is the right place I can get help.
>>
>> I've suffered by a problem for several days: I want to highlight parts of
>> regions in my track, using a different background color. To do that, I
>> defined a glyph named "background", based on the
>> 'Bio::Graphics::Glyph::generic' module. I override the draw_component()
>> method, by adding code like below:
>>
>> $gd->filledRectangle($left,0,$right,$gd->height,
>> $self->factory->translate_color($color));
>>
>> # the script is pasted at the end
>>
>> This will draw a rectangle with top=0, bottom=$gd->height. I made the
>> highlight regions into a list of features, and add_track with
>> -glyph=>'background'. (see the following script, test.pl) This really works
>> as I expect, which will add a colored block at background of all tracks in a
>> panel (including the ruler arrow). You can see the output image in attached
>> file "test.bioperl1.2.3.png"
>>
>> Now, the problem comes: when I switch to Bioperl 1.5 (or 1.6), it does not
>> work. Well, it works, but the highlight part only shrink to a low height,
>> instead of covering all tracks in the panel. I also attached the output
>> here, see the file "test.bioperl1.6.png".
>>
>> I tried to think about the reason, the 'background' module is based on the
>> generic module. What can cause the difference? Is it because $gd->height is
>> different, or the tracks followed with 'background' track can not draw from
>> the first position?
>>
>> Well. I can stick to use Bioperl 1.2.3 to avoid the problem. ("Smart person
>> solve problem, wise person avoid problem"...) But another problem is coming:
>> Bio::Graphics in Bioperl 1.2.3 does not support $panel->create_web_map()
>> function, which means I have to use some higher version if I want to create
>> web map for my graphics, but then I have to give up using highlight
>> background.
>>
>> OK. It's long enough for my first-time submission here. Hope someone can
>> throw me some clue.
>>
>> Thanks ahead!!
>>
>> Xianjun
>>
>>
>> ==================== test.pl =======================
>> #!/usr/bin/perl
>>
>> use strict;
>> use lib "$ENV{HOME}/lib";
>>
>> use Bio::Graphics;
>> use Bio::Graphics::Feature;
>> my $ftr= 'Bio::Graphics::Feature';
>>
>> # processed_transcript
>> my $trans1 =
>> $ftr->new(-start=>50,-end=>10,-name=>'ZK154.1',-type=>"3'-UTR");
>> my $trans2 = $ftr->new(-start=>100,-end=>50,-name=>'ZK154.2',-type=>'CDS');
>> my $trans3 = $ftr->new(-start=>350,-end=>225,-name=>'ZK154.3',-type=>'CDS',
>> -source=>'a');
>> my $trans4 = $ftr->new(-start=>650,-end=>500,-name=>'ZK154.3',-type=>'CDS',
>> -source=>'a');
>> my $trans5 =
>> $ftr->new(-start=>700,-end=>650,-name=>'ZK154.3',-type=>"5'-UTR");
>> my $trans  =
>> $ftr->new(-segments=>[$trans1,$trans2,$trans3,$trans4,$trans5]);
>>
>> # hightlight
>> my $trans31 =
>> $ftr->new(-start=>240,-end=>450,-name=>'hightlight1',-type=>'background',
>> -source=>'a');
>> my $trans41 =
>> $ftr->new(-start=>650,-end=>600,-name=>'hightlight2',-type=>'multihourglass',
>> -source=>'b');
>>
>> my $panel= Bio::Graphics::Panel->new(-width=>1200,
>>                                            -length=>1050,
>>                                            -start =>0,
>>                                            -pad_left=>12,
>>                                            -pad_right=>12);
>>
>> # the following track works as I expected in bioperl 1.2.3, but not in 1.5
>> # and 1.6
>> $panel->add_track([$trans41,$trans31],
>>         -glyph   => 'background',
>>                 -block_bgcolor => sub{return (shift->source eq
>> 'a')?'#cccccc':'#fffc22'},
>>                 );
>>
>> $panel->add_track($ftr->new(-start=>100,-end=>1000),
>>                 -glyph=>'arrow',
>>                 -double=>1,
>>                 -tick=>2);
>>
>> $panel->add_track($trans,
>>         -glyph   => 'transcript2', # 'transcript2', #process_5utr',
>>                 -fgcolor => 'darkred',
>>                 -bgcolor => 'darkred',
>>                 -title => '$source',
>>                 -link =>
>> 'http://www.ensembl.org/Homo_sapiens/transview?transcript=$name',  #EnsEMBL
>>                 );
>>  print $panel->png;
>>
>> # the following part works in bioperl 1.5 and 1.6, but does not work in
>> # Bioperl 1.2.3
>> my $map = $panel->create_web_map("image");
>> $panel->finished();
>>
>> 1;
>>
>> ==================== background.pm =======================
>> package Bio::Graphics::Glyph::background;
>>
>> use strict;
>> use base 'Bio::Graphics::Glyph::generic';
>> sub pad_top{
>>  return 0;
>> }
>>
>> sub draw_component {
>>  my $self = shift;
>>  #$self->SUPER::draw_component(@_);
>>  my ($gd,$dx,$dy) = @_;
>>  my ($left,$top,$right,$bottom) = $self->bounds($dx,$dy);
>>
>>  # fill the feature's horizontal span from the top of the image to its full height
>>  my $color = $self->option('block_bgcolor') || '#cccccc';
>>  $gd->filledRectangle($left,0,$right,$gd->height,
>> $self->factory->translate_color($color));
>> }
>>
>> 1;
>>
>> --
>> ==========================================
>> Xianjun Dong
>> PhD student, Lenhard group
>> Computational Biology Unit
>> Bergen Center for Computational Science
>> University of Bergen
>> Hoyteknologisenteret, Thormohlensgate 55
>> N-5008 Bergen, Norway
>> E-mail: xianjun.dong at bccs.uib.no
>> Tel.: +47 555 84022
>> Fax : +47 555 84295
>> ==========================================

-- 
==========================================
Xianjun Dong
PhD student, Lenhard group
Computational Biology Unit
Bergen Center for Computational Science
University of Bergen
Hoyteknologisenteret, Thormohlensgate 55
N-5008 Bergen, Norway
E-mail: xianjun.dong at bccs.uib.no
Tel.: +47 555 84022
Fax : +47 555 84295
==========================================

-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.bioperl1.2.3.png
Type: image/png
Size: 2789 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20090613/3cf5d9c2/attachment-0008.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.bioperl1.6.png
Type: image/png
Size: 2365 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20090613/3cf5d9c2/attachment-0009.png>

From malcolm.cook at gmail.com  Tue Jun 16 08:06:36 2009
From: malcolm.cook at gmail.com (Malcolm Cook)
Date: Tue, 16 Jun 2009 03:06:36 -0500
Subject: [Bioperl-l]  Alignment->slice() issue?
Message-ID: <cfebe0020906160106s5f76501au9d67d8ee3062d166@mail.gmail.com>

Kevin,

I'm getting struck by this old issue you once coded around.

      http://bioperl.org/pipermail/bioperl-l/2007-January/024665.html

Any chance you could share your implementation with a fellow traveller?

Thanks,

Malcolm Cook
Stowers Institute for Medical Research


From remi.planel at free.fr  Tue Jun 16 14:57:27 2009
From: remi.planel at free.fr (Remi Planel)
Date: Tue, 16 Jun 2009 16:57:27 +0200
Subject: [Bioperl-l] Hits Object
Message-ID: <4A37B2D7.70807@free.fr>

Hi all,

Given a Bio::Search::Result::ResultI object (obtained after parsing a 
BLAST report), I couldn't find a way to filter the HSPs associated with 
each hit. By filter, I mean eliminating, for each hit, the HSPs I'm not 
interested in.

Can I modify the Result object directly?

Thanks,


From lsbrath at gmail.com  Tue Jun 16 15:42:37 2009
From: lsbrath at gmail.com (Mgavi Brathwaite)
Date: Tue, 16 Jun 2009 11:42:37 -0400
Subject: [Bioperl-l] error message: can't call method "next_hit" on and
	undefined value
Message-ID: <69367b8f0906160842y39c89b86m8733e5b18e5334e2@mail.gmail.com>

Hello,
My method produces an error message stating that it can't call a "next_hit"
method on an undefined value.

sub hu_bl2seq_parser{
	my ($maid, $maid_dir) = @_;
	# Get the report
	my $in = new Bio::SearchIO(-format => 'blast',
                           -file   => ">".$maid_dir."\\".$maid."aln_hu.aln",
						   -report_type => 'blastn');
	#open(my $out, ">$maid_dir/".$maid."aln_hu_parsed.out");					
	#my $out = Bio::AlignIO->newFh(-format => 'clustalw' );
	my $result=$in->next_result;
	my($hu_aln,$hu_mismatches);
	# Get info about the first hit
	my $hit = $result->next_hit;
	my $name = $hit->name;
	# get info about the first hsp of the first hit
	my $hsp = $hit->next_hsp;
	# get the alignment object
	my $aln = $hsp->get_aln;
	#my $percent_id = $hsp->percent_identity;
	#my $aln_length = $hsp->length('total');
	my @mismatches = $hsp->seq_inds('query','nomatch');
	my $aln_str="";
	# access the alignment string
	my $strIO=IO::String->new($aln_str);
	#  write the string alignio in clustalw format
	my $alnio = Bio::AlignIO->new(-format => 'clustalw', -fh=>$strIO);
	# now the actual alignment string is accessible for printing or, in
	# this case, moving to a db table
	$alnio->write_aln($aln);
	$hu_aln=$aln_str;
	$hu_mismatches = scalar @mismatches;
	return($hu_aln, $hu_mismatches);
}

The problem is at "my $hit = $result->next_hit;"
Any help will be appreciated.
LomSpace


From cjfields at illinois.edu  Tue Jun 16 18:14:18 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 16 Jun 2009 13:14:18 -0500
Subject: [Bioperl-l] Bio::Restriction refactor
	[Was:Bio::Restriction::Analysis. Exception when using rebasefile.]
In-Reply-To: <D4FBF1054F5C48C0ACBF81873A81E7AB@NewLife>
References: <4A2F622D.5060500@ron.dk>
	<E80E6C1BC08D4E338739148BFE9BFAC0@NewLife>
	<D4FBF1054F5C48C0ACBF81873A81E7AB@NewLife>
Message-ID: <9A7FE5B3-29A2-4FAE-AE5A-945064DD8DB6@illinois.edu>

I'll check out the branch sometime today and run tests on it.  Thanks  
for the hard work Mark!

chris

On Jun 16, 2009, at 12:58 PM, Mark A. Jensen wrote:

> Dear All,
>
> There are tests for the new functionality of Bio::Restriction
> now in t/Restriction on the branch, along with the withrefm.906
> in t/data that revealed the bug in RON's post. All tests pass without
> warnings on my machine (which is bioperl live, perl 5.10.10,
> under Vista/cygwin - yes, I still don't have a real computer).
> We're ready for a merge on my end.
>
> Thanks all for your silent assent to these machinations.
> cheers
> Mark
>
> ----- Original Message ----- From: "Mark A. Jensen"  
> <maj at fortinbras.us>
> To: "Rasmus Ory Nielsen" <ron at ron.dk>; <bioperl-l at lists.open-bio.org>
> Sent: Monday, June 15, 2009 7:49 PM
> Subject: [Bioperl-l] Bio::Restriction refactor  
> [Was:Bio::Restriction::Analysis. Exception when using rebasefile.]
>
>
>> Dear All,
>>
>> The revamped Bio::Restriction::* in branch
>>
>> REPOS/bioperl-live/branches/restriction-refactor
>>
>> passes all existing tests, including those in t/Restriction.
>> New tests will be added within the next day or so.
>> The original bug occurred because only a subset of
>> the possible rebase withrefm-formatted enzymes were
>> handled; it choked on freshly-downloaded rebase
>> files because of this.
>>
>> The refactored version now handles *all* rebase types,
>> including those of rebase forms
>>
>> XXX^X                [ intrasite cutters, the main types
>>                              built in to base.pm]
>> XXXX(m/n)          [ right-end extrasite cutters ]
>> (s/t)XXXX            [ left-end ditto ]
>> (s/t)XXXX(m/n)    [ double-end ditto],
>>
>> palindromic and non-palindromic, as well as multisite
>> enzymes that string together combinations of these
>> forms. Much rationalization (well, seems rational to me
>> anyway) and cruft removal in the affected code has also
>> occurred. itype2.pm has been updated as well, to
>> conform to the refactoring.
>>
>> If you're dying to try this now, get a working copy
>> of the branch like so
>>
>> $ svn co svn://code.open-bio.org/bioperl/bioperl-live/branches/restriction-refactor bioperl-rr
>> $ cd bioperl-rr
>> $ perl Build.PL
>> $ ./Build test
>> $ ./Build install
>>
>> This will only hammer your current installation in the
>> $SITE_LIB/Bio/Restriction path; I worked only on
>> a sparse checkout of the necessary files. To revert to your
>> old install, do
>>
>> $ cd $MY_OLD_BIOPERL_WORKINGDIR
>> $ ./Build install
>>
>> [In the possible event that these instructions are in error,
>> there will be a response on this list in a matter of
>> milliseconds, so stand by.]
>>
>> Happy coding-
>> Mark
>>
>>
>>
>>
>> ----- Original Message ----- From: "Rasmus Ory Nielsen" <ron at ron.dk>
>> To: <bioperl-l at lists.open-bio.org>
>> Sent: Wednesday, June 10, 2009 3:35 AM
>> Subject: [Bioperl-l] Bio::Restriction::Analysis. Exception when  
>> using rebasefile.
>>
>>
>>> Hi,
>>>
>>> This is my first time using bioperl for restriction analysis, so  
>>> please bear with me, if this is a FAQ.
>>>
>>> I downloaded withrefm.906 from ftp://ftp.neb.com/pub/rebase/ and  
>>> created the script shown at the bottom of the mail.
>>> My bioperl version is bioperl-live nightly from 09-Jun-2009.
>>>
>>> The scripts throws an exception - see below. But, if I comment out  
>>> the '-enzymes' argument, so it uses the built-in collection of  
>>> enzymes, it works.
>>>
>>> My problem is, that I need to use some of the enzymes that are  
>>> only available in rebase. So how do I get this working?
>>>
>>> Thanks for your attention.
>>>
>>> Best regards,
>>> Rasmus Ory Nielsen
>>>
>>>
>>> ############################################################
>>> Output from the script:
>>> ############################################################
>>>
>>> [roni at ksdhcp ~]$ ./restriction_test.pl
>>>
>>> --------------------- WARNING ---------------------
>>> MSG: The enzyme name CviKI-1 was changed to CviKI-I
>>> ---------------------------------------------------
>>>
>>> ------------- EXCEPTION -------------
>>> MSG: Bad end parameter (11). End must be less than the total length of sequence (total=7)
>>> STACK Bio::PrimarySeq::subseq /usr/local/lib/perl5/site_perl/5.10.0/Bio/PrimarySeq.pm:401
>>> STACK Bio::Restriction::Analysis::_enzyme_sites /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:900
>>> STACK Bio::Restriction::Analysis::_cuts /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:801
>>> STACK Bio::Restriction::Analysis::cut /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:379
>>> STACK Bio::Restriction::Analysis::fragment_maps /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:515
>>> STACK toplevel ./restriction_test.pl:30
>>> -------------------------------------
>>>
>>> [roni at ksdhcp ~]$
>>>
>>>
>>> ############################################################
>>> Output from the script with the '-enzymes' argument commented out
>>> ############################################################
>>>
>>> [roni at ksdhcp ~]$ ./restriction_test.pl
>>>
>>> --------------------- WARNING ---------------------
>>> MSG: The enzyme name CviKI-1 was changed to CviKI-I
>>> ---------------------------------------------------
>>> $VAR1 = [
>>>          {
>>>            'seq' => 'CTCGACCGTTAGCAA',
>>>            'end' => 15,
>>>            'start' => '1'
>>>          },
>>>          {
>>>            'seq' => 'AGCTTTCTACCGTTATCGT',
>>>            'end' => 34,
>>>            'start' => '16'
>>>          }
>>>        ];
>>> [roni at ksdhcp ~]$
>>>
>>> ############################################################
>>>
>>> #!/usr/bin/perl
>>> use strict;
>>> use warnings;
>>> use Bio::PrimarySeq;
>>> use Bio::Restriction::IO;
>>> use Bio::Restriction::Analysis;
>>> use Data::Dumper;
>>>
>>> # create seq obj
>>> my $seqobj = new Bio::PrimarySeq(
>>>    -seq        => 'CTCGACCGTTAGCAAAGCTTTCTACCGTTATCGT',
>>>    -primary_id => 'test',
>>>    -molecule   => 'dna'
>>> );
>>>
>>> # read rebase file
>>> my $rebase_io = Bio::Restriction::IO->new(
>>>    -file   => 'withrefm.906',
>>>    -format => 'withrefm',
>>> );
>>> my $rebase_collection = $rebase_io->read;
>>>
>>> # start restriction analysis
>>> my $restriction_analysis = Bio::Restriction::Analysis->new(
>>>    -seq     => $seqobj,
>>>    -enzymes => $rebase_collection,    # it works with this line commented out
>>> );
>>>
>>> # retrieve fragment maps
>>> my @fragment_maps = $restriction_analysis->fragment_maps('HindIII');
>>> print Dumper \@fragment_maps;


From maj at fortinbras.us  Tue Jun 16 17:58:56 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 16 Jun 2009 13:58:56 -0400
Subject: [Bioperl-l] Bio::Restriction refactor
	[Was:Bio::Restriction::Analysis. Exception when using rebasefile.]
In-Reply-To: <E80E6C1BC08D4E338739148BFE9BFAC0@NewLife>
References: <4A2F622D.5060500@ron.dk>
	<E80E6C1BC08D4E338739148BFE9BFAC0@NewLife>
Message-ID: <D4FBF1054F5C48C0ACBF81873A81E7AB@NewLife>

Dear All,

There are tests for the new functionality of Bio::Restriction
now in t/Restriction on the branch, along with the withrefm.906
in t/data that revealed the bug in RON's post. All tests pass without
warnings on my machine (which is bioperl live, perl 5.10.10,
under Vista/cygwin - yes, I still don't have a real computer).
We're ready for a merge on my end.

Thanks all for your silent assent to these machinations.
cheers
Mark

----- Original Message ----- 
From: "Mark A. Jensen" <maj at fortinbras.us>
To: "Rasmus Ory Nielsen" <ron at ron.dk>; <bioperl-l at lists.open-bio.org>
Sent: Monday, June 15, 2009 7:49 PM
Subject: [Bioperl-l] Bio::Restriction refactor [Was:Bio::Restriction::Analysis. 
Exception when using rebasefile.]


> Dear All,
>
> The revamped Bio::Restriction::* in branch
>
> REPOS/bioperl-live/branches/restriction-refactor
>
> passes all existing tests, including those in t/Restriction.
> New tests will be added within the next day or so.
> The original bug occurred because only a subset of
> the possible rebase withrefm-formatted enzymes were
> handled; it choked on freshly-downloaded rebase
> files because of this.
>
> The refactored version now handles *all* rebase types,
> including those of rebase forms
>
> XXX^X                [ intrasite cutters, the main types
>                               built in to base.pm]
> XXXX(m/n)          [ right-end extrasite cutters ]
> (s/t)XXXX            [ left-end ditto ]
> (s/t)XXXX(m/n)    [ double-end ditto],
>
> palindromic and non-palindromic, as well as multisite
> enzymes that string together combinations of these
> forms. Much rationalization (well, seems rational to me
> anyway) and cruft removal in the affected code has also
> occurred. itype2.pm has been updated as well, to
> conform to the refactoring.
>
> If you're dying to try this now, get a working copy
> of the branch like so
>
> $ svn co svn://code.open-bio.org/bioperl/bioperl-live/branches/restriction-refactor bioperl-rr
> $ cd bioperl-rr
> $ perl Build.PL
> $ ./Build test
> $ ./Build install
>
> This will only hammer your current installation in the
> $SITE_LIB/Bio/Restriction path; I worked only on
> a sparse checkout of the necessary files. To revert to your
> old install, do
>
> $ cd $MY_OLD_BIOPERL_WORKINGDIR
> $ ./Build install
>
> [In the possible event that these instructions are in error,
> there will be a response on this list in a matter of
> milliseconds, so stand by.]
>
> Happy coding-
> Mark
>
>
>
>
> ----- Original Message ----- 
> From: "Rasmus Ory Nielsen" <ron at ron.dk>
> To: <bioperl-l at lists.open-bio.org>
> Sent: Wednesday, June 10, 2009 3:35 AM
> Subject: [Bioperl-l] Bio::Restriction::Analysis. Exception when using 
> rebasefile.
>
>
>> Hi,
>>
>> This is my first time using bioperl for restriction analysis, so please bear 
>> with me, if this is a FAQ.
>>
>> I downloaded withrefm.906 from ftp://ftp.neb.com/pub/rebase/ and created the 
>> script shown at the bottom of the mail.
>> My bioperl version is bioperl-live nightly from 09-Jun-2009.
>>
>> The scripts throws an exception - see below. But, if I comment out the 
>> '-enzymes' argument, so it uses the built-in collection of enzymes, it works.
>>
>> My problem is, that I need to use some of the enzymes that are only available 
>> in rebase. So how do I get this working?
>>
>> Thanks for your attention.
>>
>> Best regards,
>> Rasmus Ory Nielsen
>>
>>
>> ############################################################
>> Output from the script:
>> ############################################################
>>
>> [roni at ksdhcp ~]$ ./restriction_test.pl
>>
>> --------------------- WARNING ---------------------
>> MSG: The enzyme name CviKI-1 was changed to CviKI-I
>> ---------------------------------------------------
>>
>> ------------- EXCEPTION -------------
>> MSG: Bad end parameter (11). End must be less than the total length of sequence (total=7)
>> STACK Bio::PrimarySeq::subseq /usr/local/lib/perl5/site_perl/5.10.0/Bio/PrimarySeq.pm:401
>> STACK Bio::Restriction::Analysis::_enzyme_sites /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:900
>> STACK Bio::Restriction::Analysis::_cuts /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:801
>> STACK Bio::Restriction::Analysis::cut /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:379
>> STACK Bio::Restriction::Analysis::fragment_maps /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:515
>> STACK toplevel ./restriction_test.pl:30
>> -------------------------------------
>>
>> [roni at ksdhcp ~]$
>>
>>
>> ############################################################
>> Output from the script with the '-enzymes' argument commented out
>> ############################################################
>>
>> [roni at ksdhcp ~]$ ./restriction_test.pl
>>
>> --------------------- WARNING ---------------------
>> MSG: The enzyme name CviKI-1 was changed to CviKI-I
>> ---------------------------------------------------
>> $VAR1 = [
>>           {
>>             'seq' => 'CTCGACCGTTAGCAA',
>>             'end' => 15,
>>             'start' => '1'
>>           },
>>           {
>>             'seq' => 'AGCTTTCTACCGTTATCGT',
>>             'end' => 34,
>>             'start' => '16'
>>           }
>>         ];
>> [roni at ksdhcp ~]$
>>
>> ############################################################
>>
>> #!/usr/bin/perl
>> use strict;
>> use warnings;
>> use Bio::PrimarySeq;
>> use Bio::Restriction::IO;
>> use Bio::Restriction::Analysis;
>> use Data::Dumper;
>>
>> # create seq obj
>> my $seqobj = new Bio::PrimarySeq(
>>     -seq        => 'CTCGACCGTTAGCAAAGCTTTCTACCGTTATCGT',
>>     -primary_id => 'test',
>>     -molecule   => 'dna'
>> );
>>
>> # read rebase file
>> my $rebase_io = Bio::Restriction::IO->new(
>>     -file   => 'withrefm.906',
>>     -format => 'withrefm',
>> );
>> my $rebase_collection = $rebase_io->read;
>>
>> # start restriction analysis
>> my $restriction_analysis = Bio::Restriction::Analysis->new(
>>     -seq     => $seqobj,
>>     -enzymes => $rebase_collection,    # it works with this line commented out
>> );
>>
>> # retrieve fragment maps
>> my @fragment_maps = $restriction_analysis->fragment_maps('HindIII');
>> print Dumper \@fragment_maps;


From maj at fortinbras.us  Tue Jun 16 17:51:14 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 16 Jun 2009 13:51:14 -0400
Subject: [Bioperl-l] Hits Object
In-Reply-To: <4A37B2D7.70807@free.fr>
Message-ID: <3766B1A38606458EB5FA24D24371433D@NewLife>

Remi- have a look at http://www.bioperl.org/wiki/HOWTO:SearchIO and maybe
http://www.bioperl.org/wiki/Parsing_BLAST_HSPs; perhaps your questions will 
be answered there-
cheers, Mark
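
A minimal sketch of one way to do the filtering, assuming it is acceptable
to collect the wanted HSPs into your own list rather than editing the
Result object in place. filter_hsps is a hypothetical helper, not part of
BioPerl; it relies only on next_hit, name and next_hsp, which the SearchIO
result and hit objects provide.

```perl
use strict;
use warnings;

# Collect, for each hit, only the HSPs accepted by $keep (a code ref).
# $result is anything that behaves like a Bio::Search::Result::ResultI.
# Note: this walks the hit and HSP iterators to their end.
sub filter_hsps {
    my ($result, $keep) = @_;
    my @filtered;
    while ( my $hit = $result->next_hit ) {
        my @kept;
        while ( my $hsp = $hit->next_hsp ) {
            push @kept, $hsp if $keep->($hsp);
        }
        # Hits with no surviving HSPs are dropped entirely.
        push @filtered, { name => $hit->name, hsps => \@kept } if @kept;
    }
    return @filtered;
}

# Hypothetical usage with a parsed BLAST report ('report.bls' is a
# made-up file name):
#   my $in     = Bio::SearchIO->new(-format => 'blast', -file => 'report.bls');
#   my $result = $in->next_result or die "no result in report";
#   my @good   = filter_hsps($result, sub { $_[0]->evalue <= 1e-10 });
```

Any test on the HSP can go in the code ref, for example percent_identity
or length('total') in place of evalue.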


From cjfields at illinois.edu  Tue Jun 16 18:31:10 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 16 Jun 2009 13:31:10 -0500
Subject: [Bioperl-l] Bio::Restriction refactor
	[Was:Bio::Restriction::Analysis. Exception when using rebasefile.]
In-Reply-To: <D4FBF1054F5C48C0ACBF81873A81E7AB@NewLife>
References: <4A2F622D.5060500@ron.dk>
	<E80E6C1BC08D4E338739148BFE9BFAC0@NewLife>
	<D4FBF1054F5C48C0ACBF81873A81E7AB@NewLife>
Message-ID: <A800F5EC-C7E2-4BE4-9B45-3E71FB60AC2E@illinois.edu>

Everything passes on my end (Mac OS X 10.5, perl 5.10.0).  +1 on the  
merge.

Also (as mentioned some time back w/ Hilmar among others), we can  
probably delete this branch seeing as the code will be merged to trunk  
(it being a feature branch and all).  Worth doing the same for a few  
other feature branches as well.

chris



From cjfields at illinois.edu  Tue Jun 16 19:07:44 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 16 Jun 2009 14:07:44 -0500
Subject: [Bioperl-l] Alignment->slice() issue?
In-Reply-To: <cfebe0020906160106s5f76501au9d67d8ee3062d166@mail.gmail.com>
References: <cfebe0020906160106s5f76501au9d67d8ee3062d166@mail.gmail.com>
Message-ID: <FB532969-4EE0-4D37-B14B-5BC806A95FFE@illinois.edu>

Sounds to me like a BioPerl bug.  Do you have some example data  
demonstrating the problem?

chris

On Jun 16, 2009, at 3:06 AM, Malcolm Cook wrote:

> Kevin,
>
> I'm getting struck by this old issue you once coded around.
>
>      http://bioperl.org/pipermail/bioperl-l/2007-January/024665.html
>
> Any chance you could share your implementation with  fellow  
> traveller...
>
> ??
>
> Thanks,
>
> Malcolm Cook
> Stowers insitute for Medical research


From maj at fortinbras.us  Tue Jun 16 19:32:02 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 16 Jun 2009 15:32:02 -0400
Subject: [Bioperl-l] error message: can't call method "next_hit" on
	andundefined value
In-Reply-To: <69367b8f0906160842y39c89b86m8733e5b18e5334e2@mail.gmail.com>
References: <69367b8f0906160842y39c89b86m8733e5b18e5334e2@mail.gmail.com>
Message-ID: <91AC45F45A0F43D292323A711F0D5BDA@NewLife>

lomspace-
The leading ">" in your -file argument opens the report for writing
rather than reading, so this

my $in = new Bio::SearchIO(-format      => 'blast',
                           -file        => ">".$maid_dir."\\".$maid."aln_hu.aln",
                           -report_type => 'blastn');

should be

my $in = new Bio::SearchIO(-format      => 'blast',
                           -file        => $maid_dir."\\".$maid."aln_hu.aln",
                           -report_type => 'blastn');

if you're reading the file. Then $result will have something in it when
you do $in->next_result.
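
The effect is easy to reproduce in plain Perl. This is only a sketch, and
it assumes that a leading ">" in -file is treated the way Perl's own open
treats a ">" mode. The temporary file and its one line of content are
stand-ins, not a real BLAST report.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a stand-in "report" file with some content in it.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
print {$tmp_fh} "Query= example\n";
close $tmp_fh or die "cannot close $path: $!";
my $size_before = (-s $path) || 0;

# Opening with '>' means "write": the existing content is truncated.
open(my $out, '>', $path) or die "cannot open $path for writing: $!";
close $out or die "cannot close $path: $!";
my $size_after = (-s $path) || 0;

# Reading from the now-empty file returns undef on the first read,
# which is the plain-Perl analogue of next_result returning undef.
open(my $in, '<', $path) or die "cannot open $path for reading: $!";
my $first_line = <$in>;
close $in;

printf "size before: %d, size after: %d, first line: %s\n",
    $size_before, $size_after, defined $first_line ? 'defined' : 'undef';
```

If this is what happened, the original report file may have been emptied
and will need to be regenerated before parsing.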

cheers, MAJ
----- Original Message ----- 
From: "Mgavi Brathwaite" <lsbrath at gmail.com>
To: <Bioperl-l at lists.open-bio.org>
Sent: Tuesday, June 16, 2009 11:42 AM
Subject: [Bioperl-l] error message: can't call method "next_hit" on andundefined 
value


> Hello,
> My method produces an error message stating that it can't call a "next_hit"
> method on an undefined value.
>
> sub hu_bl2seq_parser{
> my ($maid, $maid_dir) = @_;
> # Get the report
> my $in = new Bio::SearchIO(-format => 'blast',
>                           -file   => ">".$maid_dir."\\".$maid."aln_hu.aln",
>    -report_type => 'blastn');
> #open(my $out, ">$maid_dir/".$maid."aln_hu_parsed.out");
> #my $out = Bio::AlignIO->newFh(-format => 'clustalw' );
> my $result=$in->next_result;
> my($hu_aln,$hu_mismatches);
> # Get info about the first hit
> my $hit = $result->next_hit;
> my $name = $hit->name;
> # get info about the first hsp of the first hit
> my $hsp = $hit->next_hsp;
> # get the alignment object
> my $aln = $hsp->get_aln;
> #my $percent_id = $hsp->percent_identity;
> #my $aln_length = $hsp->length('total');
> my @mismatches = $hsp->seq_inds('query','nomatch');
> my $aln_str="";
> # access the alignment string
> my $strIO=IO::String->new($aln_str);
> #  write the string alignio in clustalw format
> my $alnio = Bio::AlignIO->new(-format => 'clustalw', -fh=>$strIO);
> # now the actual alignment string is accessible for printing or, in
> # this case, moving to a db table
> $alnio->write_aln($aln);
> $hu_aln=$aln_str;
> $hu_mismatches = scalar @mismatches;
> return($hu_aln, $hu_mismatches);
> }
>
> The problem is at "my $hit = $result->next_hit;"
> Any help will be appreciated.
> LomSpace


From rmb32 at cornell.edu  Tue Jun 16 19:46:40 2009
From: rmb32 at cornell.edu (Robert Buels)
Date: Tue, 16 Jun 2009 12:46:40 -0700
Subject: [Bioperl-l] error message: can't call method "next_hit" on and
 undefined value
In-Reply-To: <69367b8f0906160842y39c89b86m8733e5b18e5334e2@mail.gmail.com>
References: <69367b8f0906160842y39c89b86m8733e5b18e5334e2@mail.gmail.com>
Message-ID: <4A37F6A0.1080907@cornell.edu>

Mgavi Brathwaite wrote:
> Hello,
> My method produces an error message stating that it can't call a "next_hit"
> method on an undefined value.

Your proximate problem seems to be that you are prepending a '>' to the 
filename in your invocation of Bio::SearchIO::new, which I think causes 
it to open the file for writing instead of reading from it.  That is 
what is causing your "can't call next_hit on undefined value" error: 
next_result() returns undef when there are no results to parse.  More 
generally, you probably want to call next_result, next_hit and next_hsp 
in while loops, since each returns undef when there is nothing more to 
parse.

By while loops, I mean something like:

while( my $result = $in->next_result ) {
      while( my $hit = $result->next_hit ) {
      # insert the rest of your operations here
      }
}
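
A slightly fuller sketch of the same pattern follows, untested against a
real BLAST report. The helper name first_hsp_summary is made up for this
example and is not part of BioPerl; it only assumes that the parser it is
given behaves like Bio::SearchIO, i.e. that next_result, next_hit and
next_hsp return undef when nothing is left. The seq_inds call is the one
from the original method.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl).  $in is any object that behaves
# like a Bio::SearchIO parser: next_result, next_hit and next_hsp each return
# undef once there is nothing left to parse.
sub first_hsp_summary {
    my ($in) = @_;
    while ( my $result = $in->next_result ) {
        while ( my $hit = $result->next_hit ) {
            while ( my $hsp = $hit->next_hsp ) {
                # same call as in the original method: query mismatch positions
                my @mismatches = $hsp->seq_inds( 'query', 'nomatch' );
                return ( $hit->name, scalar @mismatches );
            }
        }
    }
    return;    # empty list: no result, hit or HSP was found
}

# Intended use (needs BioPerl and a real report; note: no leading '>'):
#   use Bio::SearchIO;
#   my $in = Bio::SearchIO->new( -format => 'blast', -file => $report_file );
#   my ( $name, $n_mismatch ) = first_hsp_summary($in)
#       or warn "nothing to parse in $report_file\n";
```

Returning an empty list for the "nothing found" case lets the caller tell
an empty report apart from a parse that succeeded, instead of dying on an
undefined value.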

Hope this helps.

Rob

> Hello,
> My method produces an error message stating that it can't call a "next_hit"
> method on an undefined value.
> 
> sub hu_bl2seq_parser{
> 	my ($maid, $maid_dir) = @_;
> 	# Get the report
> 	my $in = new Bio::SearchIO(-format => 'blast',
>                            -file   => ">".$maid_dir."\\".$maid."aln_hu.aln",
> 						   -report_type => 'blastn');
> 	#open(my $out, ">$maid_dir/".$maid."aln_hu_parsed.out");					
> 	#my $out = Bio::AlignIO->newFh(-format => 'clustalw' );
> 	my $result=$in->next_result;
> 	my($hu_aln,$hu_mismatches);
> 	# Get info about the first hit
> 	my $hit = $result->next_hit;
> 	my $name = $hit->name;
> 	# get info about the first hsp of the first hit
> 	my $hsp = $hit->next_hsp;
> 	# get the alignment object
> 	my $aln = $hsp->get_aln;
> 	#my $percent_id = $hsp->percent_identity;
> 	#my $aln_length = $hsp->length('total');
> 	my @mismatches = $hsp->seq_inds('query','nomatch');
> 	my $aln_str="";
> 	# access the alignment string
> 	my $strIO=IO::String->new($aln_str);
> 	#  write the string alignio in clustalw format
> 	my $alnio = Bio::AlignIO->new(-format => 'clustalw', -fh=>$strIO);
> 	# now the actual alignment string is accessible for printing or, in
> 	# this case, moving to a db table
> 	$alnio->write_aln($aln);
> 	$hu_aln=$aln_str;
> 	$hu_mismatches = scalar @mismatches;
> 	return($hu_aln, $hu_mismatches);
> }
> 
> The problem is at "my $hit = $result->next_hit;"
> Any help will be appreciated.
> LomSpace
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 


From maj at fortinbras.us  Tue Jun 16 20:10:34 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 16 Jun 2009 16:10:34 -0400
Subject: [Bioperl-l] Bio::Restriction
	refactor[Was:Bio::Restriction::Analysis. Exception when using
	rebasefile.]
In-Reply-To: <A800F5EC-C7E2-4BE4-9B45-3E71FB60AC2E@illinois.edu>
References: <4A2F622D.5060500@ron.dk><E80E6C1BC08D4E338739148BFE9BFAC0@NewLife><D4FBF1054F5C48C0ACBF81873A81E7AB@NewLife>
	<A800F5EC-C7E2-4BE4-9B45-3E71FB60AC2E@illinois.edu>
Message-ID: <61179C22E04F479686C7F5CFEC496FB0@NewLife>

Right; will remove branch. Will go ahead with merge at 21:20 UTC.
cheers MAJ
----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "Mark A. Jensen" <maj at fortinbras.us>
Cc: <bioperl-l at lists.open-bio.org>; "Rasmus Ory Nielsen" <ron at ron.dk>
Sent: Tuesday, June 16, 2009 2:31 PM
Subject: Re: [Bioperl-l] Bio::Restriction 
refactor[Was:Bio::Restriction::Analysis. Exception when using rebasefile.]


> Everything passes on my end (Mac OS X 10.5, perl 5.10.0).  +1 on the  merge.
>
> Also (as mentioned some time back w/ Hilmar among others), we can  probably 
> delete this branch seeing as the code will be merged to trunk  (it being a 
> feature branch and all).  Worth doing the same for a few  other feature 
> branches as well.
>
> chris
>
> On Jun 16, 2009, at 12:58 PM, Mark A. Jensen wrote:
>
>> Dear All,
>>
>> There are tests for the new functionality of Bio::Restriction
>> now in t/Restriction on the branch, along with the withrefm.906
>> in t/data that revealed the bug in RON's post. All tests pass without
>> warnings on my machine (which is bioperl live, perl 5.10.10,
>> under Vista/cygwin - yes, I still don't have a real computer).
>> We're ready for a merge on my end.
>>
>> Thanks all for your silent assent to these machinations.
>> cheers
>> Mark
>>
>> ----- Original Message ----- From: "Mark A. Jensen"  <maj at fortinbras.us>
>> To: "Rasmus Ory Nielsen" <ron at ron.dk>; <bioperl-l at lists.open-bio.org>
>> Sent: Monday, June 15, 2009 7:49 PM
>> Subject: [Bioperl-l] Bio::Restriction refactor 
>> [Was:Bio::Restriction::Analysis. Exception when using rebasefile.]
>>
>>
>>> Dear All,
>>>
>>> The revamped Bio::Restriction::* in branch
>>>
>>> REPOS/bioperl-live/branches/restriction-refactor
>>>
>>> passes all existing tests, including those in t/Restriction.
>>> New tests will be added within the next day or so.
>>> The original bug occurred because only a subset of
>>> the possible rebase withrefm-formatted enzymes were
>>> handled; it choked on freshly-downloaded rebase
>>> files because of this.
>>>
>>> The refactored version now handles *all* rebase types,
>>> including those of rebase forms
>>>
>>> XXX^X                [ intrasite cutters, the main types
>>>                              built in to base.pm]
>>> XXXX(m/n)          [ right-end extrasite cutters ]
>>> (s/t)XXXX            [ left-end ditto ]
>>> (s/t)XXXX(m/n)    [ double-end ditto],
>>>
>>> palindromic and non-palindromic, as well as multisite
>>> enzymes that string together combinations of these
>>> forms. Much rationalization (well, seems rational to me
>>> anyway) and cruft removal in the affected code has also
>>> occurred. itype2.pm has been updated as well, to
>>> conform to the refactoring.
>>>
>>> If you're dying to try this now, get a working copy
>>> of the branch like so
>>>
>>> $ svn co svn://code.open-bio.org/bioperl/bioperl-live/branches/restriction-refactor bioperl-rr
>>> $ cd bioperl-rr
>>> $ perl Build.PL
>>> $ ./Build test
>>> $ ./Build install
>>>
>>> This will only hammer your current installation in the
>>> $SITE_LIB/Bio/Restriction path; I worked only on
>>> a sparse checkout of the necessary files. To revert to your
>>> old install, do
>>>
>>> $ cd $MY_OLD_BIOPERL_WORKINGDIR
>>> $ ./Build install
>>>
>>> [In the possible event that these instructions are in error,
>>> there will be a response on this list in a matter of
>>> milliseconds, so stand by.]
>>>
>>> Happy coding-
>>> Mark
>>>
>>>
>>>
>>>
>>> ----- Original Message ----- From: "Rasmus Ory Nielsen" <ron at ron.dk>
>>> To: <bioperl-l at lists.open-bio.org>
>>> Sent: Wednesday, June 10, 2009 3:35 AM
>>> Subject: [Bioperl-l] Bio::Restriction::Analysis. Exception when  using 
>>> rebasefile.
>>>
>>>
>>>> Hi,
>>>>
>>>> This is my first time using bioperl for restriction analysis, so  please 
>>>> bear with me, if this is a FAQ.
>>>>
>>>> I downloaded withrefm.906 from ftp://ftp.neb.com/pub/rebase/ and  created 
>>>> the script shown at the bottom of the mail.
>>>> My bioperl version is bioperl-live nightly from 09-Jun-2009.
>>>>
>>>> The scripts throws an exception - see below. But, if I comment out  the 
>>>> '-enzymes' argument, so it uses the built-in collection of  enzymes, it 
>>>> works.
>>>>
>>>> My problem is, that I need to use some of the enzymes that are  only 
>>>> available in rebase. So how do I get this working?
>>>>
>>>> Thanks for your attention.
>>>>
>>>> Best regards,
>>>> Rasmus Ory Nielsen
>>>>
>>>>
>>>> ############################################################
>>>> Output from the script:
>>>> ############################################################
>>>>
>>>> [roni at ksdhcp ~]$ ./restriction_test.pl
>>>>
>>>> --------------------- WARNING ---------------------
>>>> MSG: The enzyme name CviKI-1 was changed to CviKI-I
>>>> ---------------------------------------------------
>>>>
>>>> ------------- EXCEPTION -------------
>>>> MSG: Bad end parameter (11). End must be less than the total length of sequence (total=7)
>>>> STACK Bio::PrimarySeq::subseq /usr/local/lib/perl5/site_perl/5.10.0/Bio/PrimarySeq.pm:401
>>>> STACK Bio::Restriction::Analysis::_enzyme_sites /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:900
>>>> STACK Bio::Restriction::Analysis::_cuts /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:801
>>>> STACK Bio::Restriction::Analysis::cut /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:379
>>>> STACK Bio::Restriction::Analysis::fragment_maps /usr/local/lib/perl5/site_perl/5.10.0/Bio/Restriction/Analysis.pm:515
>>>> STACK toplevel ./restriction_test.pl:30
>>>> -------------------------------------
>>>>
>>>> [roni at ksdhcp ~]$
>>>>
>>>>
>>>> ############################################################
>>>> Output from the script with the '-enzymes' argument commented out
>>>> ############################################################
>>>>
>>>> [roni at ksdhcp ~]$ ./restriction_test.pl
>>>>
>>>> --------------------- WARNING ---------------------
>>>> MSG: The enzyme name CviKI-1 was changed to CviKI-I
>>>> ---------------------------------------------------
>>>> $VAR1 = [
>>>>          {
>>>>            'seq' => 'CTCGACCGTTAGCAA',
>>>>            'end' => 15,
>>>>            'start' => '1'
>>>>          },
>>>>          {
>>>>            'seq' => 'AGCTTTCTACCGTTATCGT',
>>>>            'end' => 34,
>>>>            'start' => '16'
>>>>          }
>>>>        ];
>>>> [roni at ksdhcp ~]$
>>>>
>>>> ############################################################
>>>>
>>>> #!/usr/bin/perl
>>>> use strict;
>>>> use warnings;
>>>> use Bio::PrimarySeq;
>>>> use Bio::Restriction::IO;
>>>> use Bio::Restriction::Analysis;
>>>> use Data::Dumper;
>>>>
>>>> # create seq obj
>>>> my $seqobj = new Bio::PrimarySeq(
>>>>    -seq        => 'CTCGACCGTTAGCAAAGCTTTCTACCGTTATCGT',
>>>>    -primary_id => 'test',
>>>>    -molecule   => 'dna'
>>>> );
>>>>
>>>> # read rebase file
>>>> my $rebase_io = Bio::Restriction::IO->new(
>>>>    -file   => 'withrefm.906',
>>>>    -format => 'withrefm',
>>>> );
>>>> my $rebase_collection = $rebase_io->read;
>>>>
>>>> # start restriction analysis
>>>> my $restriction_analysis = Bio::Restriction::Analysis->new(
>>>>    -seq     => $seqobj,
>>>>    -enzymes => $rebase_collection,    # it works with this line commented out
>>>> );
>>>>
>>>> # retrieve fragment maps
>>>> my @fragment_maps = $restriction_analysis->fragment_maps('HindIII');
>>>> print Dumper \@fragment_maps;
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> 


From MEC at stowers.org  Tue Jun 16 20:13:33 2009
From: MEC at stowers.org (Cook, Malcolm)
Date: Tue, 16 Jun 2009 15:13:33 -0500
Subject: [Bioperl-l] Alignment->slice() issue?
In-Reply-To: <FB532969-4EE0-4D37-B14B-5BC806A95FFE@illinois.edu>
References: <cfebe0020906160106s5f76501au9d67d8ee3062d166@mail.gmail.com>
	<FB532969-4EE0-4D37-B14B-5BC806A95FFE@illinois.edu>
Message-ID: <BD62CBAC4395B94096109020651BE2EC12B471A389@exchmb-02.stowers-institute.org>

Chris!

erm, yeah, I do....

... and I will schedule some time to code up a test and add it to AlignI's suite....

Malcolm
  

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Chris Fields
> Sent: Tuesday, June 16, 2009 2:08 PM
> To: Malcolm Cook
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Alignment->slice() issue?
> 
> Sounds to me like a BioPerl bug.  Do you have some example 
> data demonstrating the problem?
> 
> chris
> 
> On Jun 16, 2009, at 3:06 AM, Malcolm Cook wrote:
> 
> > Kevin,
> >
> > I'm getting struck by this old issue you once coded around.
> >
> >      http://bioperl.org/pipermail/bioperl-l/2007-January/024665.html
> >
> > Any chance you could share your implementation with  fellow 
> > traveller...
> >
> > ??
> >
> > Thanks,
> >
> > Malcolm Cook
> > Stowers Institute for Medical Research
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 


From maj at fortinbras.us  Wed Jun 17 02:47:39 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 16 Jun 2009 22:47:39 -0400
Subject: [Bioperl-l] Bio::Restriction refactor
	[Was:Bio::Restriction::Analysis. Exception when using rebasefile.]
In-Reply-To: <E80E6C1BC08D4E338739148BFE9BFAC0@NewLife>
References: <4A2F622D.5060500@ron.dk>
	<E80E6C1BC08D4E338739148BFE9BFAC0@NewLife>
Message-ID: <9B199A62F5A741CCBC0B927D10DF1A0D@NewLife>

Dear All,

The refactored Bio::Restriction::* has been merged to trunk, with all
tests passing. [Anyone got a cigarette?]

cheers,
Mark



From Russell.Smithies at agresearch.co.nz  Wed Jun 17 03:21:22 2009
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Wed, 17 Jun 2009 15:21:22 +1200
Subject: [Bioperl-l] Bio::Restriction
	refactor	[Was:Bio::Restriction::Analysis. Exception when
	using rebasefile.]
In-Reply-To: <9B199A62F5A741CCBC0B927D10DF1A0D@NewLife>
References: <4A2F622D.5060500@ron.dk>
	<E80E6C1BC08D4E338739148BFE9BFAC0@NewLife>
	<9B199A62F5A741CCBC0B927D10DF1A0D@NewLife>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF3297FF3E2E4@exchsth.agresearch.co.nz>

Cigarettes are post-coitus and pre-firing squad.
What you'd be needing is a cigar (proud father)

;-)

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Mark A. Jensen
> Sent: Wednesday, 17 June 2009 2:48 p.m.
> To: bioperl-l at lists.open-bio.org
> Cc: Rasmus Ory Nielsen
> Subject: Re: [Bioperl-l] Bio::Restriction refactor
> [Was:Bio::Restriction::Analysis. Exception when using rebasefile.]
> 
> Dear All,
> 
> The refactored Bio::Restriction::* has been merged to trunk, with all
> tests passing. [Anyone got a cigarette?]
> 
> cheers,
> Mark
> 


From e.stupka at ucl.ac.uk  Wed Jun 17 11:29:08 2009
From: e.stupka at ucl.ac.uk (Elia Stupka)
Date: Wed, 17 Jun 2009 12:29:08 +0100
Subject: [Bioperl-l] Next-gen modules
Message-ID: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>

Dear all,

after several years of absence I am slowly coming back to Bioperl, and  
hope to contribute again to its development.

One area that I was thinking of starting from, since we are actively
involved with it, is to improve BioPerl's support for next-gen
sequencing data, tools, etc. Since I am sure I have missed out on a
lot of recent developments, do let me know if/what is useful.

One example that comes to mind is that the conversion of various
formats to/from FASTQ does not seem to be supported. Some code can be
found within Li Heng's script:
http://maq.sourceforge.net/fq_all2std.pl
but it would be good if it could make its way into SeqIO? And
similarly, potentially, for other next-gen sequence formats?

Similarly, there seems to be little in bioperl-run to support tools  
that have been developed in this area, such as Maq, BowTie, TopHat, etc?

Do let me know if there is a past thread on this, or other people  
actively developing, etc. so that I can find out what priorities are.

thanks and best regards to all (old friends and new),

Elia

---
Senior Lecturer, Bioinformatics
UCL Cancer Institute
Paul O' Gorman Building
University College London
Gower Street
WC1E 6BT
London
UK

Office (UCL): +44 207 679 6493
Office (ICMS): +44 0207 8822374

Mobile: +44 7597 566 194
Mobile (Italy): +39 338 8448801


From maj at fortinbras.us  Wed Jun 17 12:19:04 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 17 Jun 2009 08:19:04 -0400
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
Message-ID: <4C3D793879C64A5E84C67FE313C86FA4@NewLife>

[ and here's a new wikipage: http://www.bioperl.org/wiki/Nextgen_in_Bioperl ]


From biopython at maubp.freeserve.co.uk  Wed Jun 17 12:21:17 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 17 Jun 2009 13:21:17 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
Message-ID: <320fb6e00906170521m7d997334j321d92fda2da4114@mail.gmail.com>

On Wed, Jun 17, 2009 at 12:29 PM, Elia Stupka<e.stupka at ucl.ac.uk> wrote:
>
> Dear all,
>
> after several years of absence I am slowly coming back to Bioperl, and hope
> to contribute again to its development.
>
> One area that I was thinking of starting from, since we are actively
> involved with it, is to improve Bioperl's support for next-gen sequencing
> data, tools, etc. Since I am sure I have missed out on a lot of recent
> developments, do let me know if/what is useful.
>
> One example that comes to mind is that the conversion of various formats
> to/from FASTQ does not seem to be supported. Some code can be found within
> Li Heng's script: http://maq.sourceforge.net/fq_all2std.pl but it would be
> good if it could make its way into SeqIO? And similarly, potentially, for
> other next-gen sequence formats?

If you do add FASTQ support to BioPerl's SeqIO (and I think that is a
good idea), please could you follow the format names used by Biopython
- as this time we got there first ;)

I'm asking this as Biopython's SeqIO tries to use the same format
names as BioPerl's SeqIO and EMBOSS, see
http://biopython.org/wiki/SeqIO

Specifically,
* "fastq" in Biopython means the original Sanger standard FASTQ files
encoding PHRED qualities using an ASCII offset of 33.
* "fastq-solexa" in Biopython means the early Solexa/Illumina style
FASTQ files which encode Solexa qualities using an ASCII offset of 64.
* "fastq-illumina" in Biopython will mean recent Solexa/Illumina style
FASTQ files (from pipeline version 1.3+) which encode PHRED qualities
using an ASCII offset of 64. This is in the Biopython repository, but
hasn't been released yet - so the name "fastq-illumina" isn't set in
stone yet.

For good quality reads, PHRED and Solexa scores are approximately
equal, so the "fastq-solexa" and "fastq-illumina" variants are almost
equivalent.
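
To make the offsets and the two score scales concrete, here is a
minimal Python sketch. It is not Biopython's actual implementation:
the function names and the OFFSETS table are made up for illustration.
It only assumes the usual relationship between the two scales,
Q_phred = 10 * log10(10^(Q_solexa / 10) + 1).

```python
import math

# ASCII offset per variant, using the format names listed above.
OFFSETS = {"fastq": 33, "fastq-solexa": 64, "fastq-illumina": 64}


def decode_quality(quality_string, variant):
    """Turn an encoded quality string into a list of integer scores.

    The scores are PHRED for "fastq" and "fastq-illumina", and Solexa
    for "fastq-solexa"; no conversion between scales happens here.
    """
    offset = OFFSETS[variant]
    return [ord(char) - offset for char in quality_string]


def solexa_to_phred(q_solexa):
    """Convert one Solexa score to the equivalent (float) PHRED score."""
    return 10 * math.log10(10 ** (q_solexa / 10.0) + 1)


def phred_to_solexa(q_phred):
    """Convert one PHRED score to Solexa; undefined for a PHRED of 0."""
    return 10 * math.log10(10 ** (q_phred / 10.0) - 1)


if __name__ == "__main__":
    # The gap between the scales shrinks quickly as quality goes up.
    for q in (-5, 0, 10, 20, 40):
        print(q, round(solexa_to_phred(q), 3))
```

Running it shows the difference is over a unit for the worst scores
and negligible for good ones, which is why the two offset 64 variants
are so easy to confuse.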

> Similarly, there seems to be little in bioperl-run to support tools that
> have been developed in this area, such as Maq, BowTie, TopHat, etc?
>
> Do let me know if there is a past thread on this, or other people actively
> developing, etc. so that I can find out what priorities are.

Have you seen these recent threads?:
http://lists.open-bio.org/pipermail/bioperl-l/2009-May/029970.html
http://lists.open-bio.org/pipermail/bioperl-l/2009-May/029911.html
http://lists.open-bio.org/pipermail/bioperl-l/2009-June/030187.html

Regards,

Peter (at Biopython)


From maj at fortinbras.us  Wed Jun 17 12:02:11 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 17 Jun 2009 08:02:11 -0400
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
Message-ID: <92C15E3391F64BAF801754E924122540@NewLife>

Elia--
I say a definite +1; in fact, this sounds like it should be a Hot Topic 
(see http://www.bioperl.org/wiki/Category:Hot_Topics for some others
you might have missed in your hiatus...). I will create a page that 
can be a central point for wish lists, discussion, etc.

There has been much discussion of late about FASTQ 
http://lists.open-bio.org/pipermail/bioperl-l/2009-June/030187.html
http://lists.open-bio.org/pipermail/bioperl-l/2009-May/029970.html
http://lists.open-bio.org/pipermail/bioperl-l/2009-May/029911.html
http://lists.open-bio.org/pipermail/bioperl-l/2009-April/029765.html

cheers from a newbie, 
Mark



From cjfields at illinois.edu  Wed Jun 17 12:57:52 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 17 Jun 2009 07:57:52 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
Message-ID: <BF526BE6-AFA8-4F66-9665-D6328F9B4FFA@illinois.edu>

Elia,

As Mark indicated, we recently discussed the lack of support for
next-gen on list, at least re: fastq.  I may be hit with the same thing
in a few months time myself, and I recall Jason and a few others also
mentioning the same.  Heikki wrote some code for Illumina FASTQ for
SeqIO and related modules but I don't believe it has been committed to
trunk yet, so maybe he can answer.

 From prior discussions IIRC the issues were:

1) distinguishing the various FASTQ versions (Sanger, Illumina 1.0,  
Illumina 1.3) from one another (so maybe some optional validation), and
2) having a way for the Seq object to either 'know' what format is  
contained, or we use phred score and convert back and forth from that  
(I think the latter makes more sense).

Peter's suggestions also are reasonable, though does biopython have a  
separate module for each of these variations?  Our version (I believe)  
mainly varied the conversion within Bio::SeqIO::fastq itself based on  
the fastq variant passed in as a separate named argument.

As for the wrappers, we would most certainly welcome them!

chris



From e.stupka at ucl.ac.uk  Wed Jun 17 12:54:22 2009
From: e.stupka at ucl.ac.uk (Elia Stupka)
Date: Wed, 17 Jun 2009 13:54:22 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <92C15E3391F64BAF801754E924122540@NewLife>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<92C15E3391F64BAF801754E924122540@NewLife>
Message-ID: <E59DEB0A-9D02-4922-A300-0F81B809D6D4@ucl.ac.uk>

Dear Mark,

thanks a lot for the pointers.

With regards to FASTQ parsing:

-my understanding from reading past threads is that we work on a single
format, i.e. FASTQ, and interpret the quality "flavours" as just
quality conversions, right?

-However, I assume we would still want to support a simple way for the  
user to say format => 'fastq-solexa' using the nomenclature adopted in  
BioPython suggested by Peter, right?

-I also saw Heikki's "long essay", but have not yet compared it to Heng
Li's code at http://maq.sourceforge.net/fq_all2std.pl. I guess we would
hope they produce identical output, which will be a good check.

Finally, I saw Tristan's reply to Heikki's thread, so what is the  
status quo? Is it moving forward?

cheers

Elia



---
Senior Lecturer, Bioinformatics
UCL Cancer Institute
Paul O' Gorman Building
University College London
Gower Street
WC1E 6BT
London
UK

Office (UCL): +44 207 679 6493
Office (ICMS): +44 0207 8822374

Mobile: +44 7597 566 194
Mobile (Italy): +39 338 8448801


From biopython at maubp.freeserve.co.uk  Wed Jun 17 13:25:59 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 17 Jun 2009 14:25:59 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <BF526BE6-AFA8-4F66-9665-D6328F9B4FFA@illinois.edu>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<BF526BE6-AFA8-4F66-9665-D6328F9B4FFA@illinois.edu>
Message-ID: <320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com>

On Wed, Jun 17, 2009 at 1:57 PM, Chris Fields<cjfields at illinois.edu> wrote:
>
> Elia,
>
> As Mark indicated, we recently discussed the lack of support for next-gen on
> list, at least re: fastq.  I may be hit with the same thing in a few months
> time myself, and I recall Jason and a few others also mentioning the same.
>  Heikki wrote some code for Illumina FASTQ for SeqIO and related modules but
> I don't believe it has been committed to trunk yet, so maybe he can answer.
>
> From prior discussions IIRC the issues were:
>
> 1) distinguishing the various FASTQ versions (Sanger, Illumina 1.0, Illumina
> 1.3) from one another (so maybe some optional validation), and

Following the python rule of thumb for being explicit, Biopython makes
the user specify which FASTQ variant is being used. I don't think you
can do anything else. Any attempted validation would have to be
heuristic based on the ASCII characters found, and would risk false
positive warnings.
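
To illustrate why this can only ever be a heuristic, here is a small
Python sketch (hypothetical helper, not part of Biopython or BioPerl).
It only uses the lowest legal character of each variant: '!' (33) for
Sanger PHRED 0, ';' (59) for Solexa -5, and '@' (64) for Illumina 1.3+
PHRED 0. It can rule variants out, but never prove one.

```python
# Lowest ASCII code that may legally appear in each variant.
MIN_ASCII = {"fastq": 33, "fastq-solexa": 59, "fastq-illumina": 64}


def compatible_variants(quality_strings):
    """Return the sorted variant names that the observed characters allow.

    Only the lower bound is checked, so a file of good quality reads
    will usually be compatible with all three variants.
    """
    lowest = min(
        (ord(char) for quality in quality_strings for char in quality),
        default=None,
    )
    if lowest is None:
        # No quality characters seen at all: nothing can be ruled out.
        return sorted(MIN_ASCII)
    return sorted(name for name, floor in MIN_ASCII.items() if lowest >= floor)
```

A character below ';' settles it as Sanger, but a file containing only
high scores stays ambiguous, so the user still has to say which
variant they have.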

> 2) having a way for the Seq object to either 'know' what format is
> contained, or we use phred score and convert back and forth from that (I
> think the latter makes more sense).

I think it could make sense for BioPerl to convert Solexa scores to/from
PHRED scores on the fly (especially now that Illumina is abandoning
the Solexa score system). Python style tries to avoid implicit conversions,
so Biopython doesn't automatically do a conversion from Solexa to
PHRED scores on parsing (but will on writing if the requested output
format requires this).

> Peter's suggestions also are reasonable, though does biopython have a
> separate module for each of these variations?  Our version (I believe)
> mainly varied the conversion within Bio::SeqIO::fastq itself based on the
> fastq variant passed in as a separate named argument.

Biopython's SeqIO gives the three FASTQ variants their own unique
names. This format name is a required argument for parsing/writing
(we don't try and guess the file format from the data contents). Internally
we have three separate FASTQ parsers/writers although they do share
code.

Other issues to keep in mind:

(3) There should be no warning parsing files where the optional repeated
title is missing on the "+" lines (as discussed earlier on the BioPerl list).

(4) When writing FASTQ files should BioPerl omit the optional repeated
title on the "+" line? Biopython omits this as I understand this to be
common practice, and it can make a big difference to file sizes - especially
on short read data from Solexa/Illumina.

(5) Also test reading and writing files with an optional description (as well
as an identifier) on the "@" (and "+") lines. See the NCBI SRA for examples,
e.g.

@SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
GGGTGATGGCCGCTGCCGATGGCGTCAAATCCCACC
+SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
IIIIIIIIIIIIIIIIIIIIIIIIIIIIII9IG9IC


(6) Test reading and writing files where the encoded quality string starts
with a "@" or a "+" character, e.g.
http://lists.open-bio.org/pipermail/bioperl-l/2009-May/029911.html
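
Points (3) to (6) can be sketched in a few lines of Python. This is an
illustration only, not Biopython's parser: parse_fastq and format_fastq
are made-up names, and the sketch assumes unwrapped four-line records
(sequence and quality each on a single line). Because it reads by
position within the record, a quality line starting with "@" or "+" is
never mistaken for a title line.

```python
def parse_fastq(lines):
    """Yield (identifier, description, sequence, quality) tuples.

    Assumes four lines per record; the repeated title on the '+' line
    is optional, but must match the '@' line when it is present.
    """
    it = iter(lines)
    for title in it:
        title = title.rstrip("\r\n")
        if not title:
            continue  # tolerate blank lines between records
        seq = next(it, "").rstrip("\r\n")
        plus = next(it, "").rstrip("\r\n")
        qual = next(it, "").rstrip("\r\n")
        if not title.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed record near %r" % title)
        if len(plus) > 1 and plus[1:] != title[1:]:
            raise ValueError("'+' line does not match '@' line: %r" % title)
        if len(qual) != len(seq):
            raise ValueError("sequence/quality length mismatch: %r" % title)
        # Everything after the first space is the optional description.
        identifier, _, description = title[1:].partition(" ")
        yield identifier, description, seq, qual


def format_fastq(identifier, description, seq, qual):
    """Format one record, omitting the repeated title on the '+' line."""
    title = identifier + " " + description if description else identifier
    return "@%s\n%s\n+\n%s\n" % (title, seq, qual)
```

Checking that the quality length equals the sequence length is what
makes the "@" and "+" cases in (6) safe to test for.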

Peter


From tristan.lefebure at gmail.com  Wed Jun 17 13:27:12 2009
From: tristan.lefebure at gmail.com (Tristan Lefebure)
Date: Wed, 17 Jun 2009 09:27:12 -0400
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <92C15E3391F64BAF801754E924122540@NewLife>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<92C15E3391F64BAF801754E924122540@NewLife>
Message-ID: <200906170927.13273.tristan.lefebure@gmail.com>

Hello,
Regarding next-gen sequences and bioperl, in my
experience another issue is bioperl speed. For example, if
you want to trim bad quality bases at the ends of 1E6 Solexa
reads using Bio::SeqIO::fastq and some methods in
Bio::Seq::Quality, well, you've got to be patient (but
maybe I missed some shortcuts...).

A pure perl solution will be between 100 and 1000x faster...
Would it be possible to have an ultra-light quality object
with few simple methods for next-gen reads?
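
For reference, the kind of lightweight trimming meant here needs
nothing more than plain strings. A minimal sketch follows, written in
Python only for brevity; trim_ends, the threshold of 20 and the
default offset of 64 are illustrative assumptions, not an existing
BioPerl or Biopython method.

```python
def trim_ends(seq, qual, min_q=20, offset=64):
    """Trim bases scoring below min_q from both ends of a read.

    seq and qual are plain strings of equal length; offset is the
    ASCII offset of the quality encoding (64 for Solexa/Illumina).
    Returns the trimmed (seq, qual) pair, possibly two empty strings.
    """
    scores = [ord(char) - offset for char in qual]
    start, end = 0, len(scores)
    while start < end and scores[start] < min_q:
        start += 1
    while end > start and scores[end - 1] < min_q:
        end -= 1
    return seq[start:end], qual[start:end]
```

No sequence or quality objects are created per read, which is where
most of the time goes with a million reads.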

I can contribute some tests if that sounds like an important 
point.

-Tristan




From biopython at maubp.freeserve.co.uk  Wed Jun 17 13:54:45 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 17 Jun 2009 14:54:45 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <E59DEB0A-9D02-4922-A300-0F81B809D6D4@ucl.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<92C15E3391F64BAF801754E924122540@NewLife>
	<E59DEB0A-9D02-4922-A300-0F81B809D6D4@ucl.ac.uk>
Message-ID: <320fb6e00906170654m735dc054iaf94fa2f86647002@mail.gmail.com>

On Wed, Jun 17, 2009 at 1:54 PM, Elia Stupka<e.stupka at ucl.ac.uk> wrote:
>
> Dear Mark,
>
> thanks a lot for the pointers.
>
> With regards to FASTQ parsing:
>
> -my understanding by reading past threads is to work on a single format,
> i.e. FASTQ and to interpret the quality "flavours" as just quality
> conversions, right?
> -However, I assume we would still want to support a simple way for the user
> to say format => 'fastq-solexa' using the nomenclature adopted in BioPython
> suggested by Peter, right?

I think you will need a way for the user to say they have a Solexa, or
an Illumina 1.3+, or an original Sanger standard FASTQ file.

From reading the http://bioperl.org/wiki/HOWTO:SeqIO wiki page, I
assumed BioPerl's SeqIO just had formats (e.g. the "chadoxml" format
and the variant "flybase_chadoxml" format). Does BioPerl's SeqIO format
system have any concept of flavour that I am not aware of?

> -I also saw Heikki's "long essay", but did not yet compare to Heng Li's code
> at http://maq.sourceforge.net/fq_all2std.pl, I guess we would hope they
> would produce identical outputs, will be a good check.

Heng Li's code at http://maq.sourceforge.net/fq_all2std.pl is a useful
guide (although it doesn't yet cope with the new Illumina 1.3+ variant),
but I don't trust it 100%. See e.g.
http://lists.open-bio.org/pipermail/biopython/2009-June/005208.html
http://lists.open-bio.org/pipermail/biopython/2009-June/005209.html

Peter


From john.marshall at sanger.ac.uk  Wed Jun 17 13:28:12 2009
From: john.marshall at sanger.ac.uk (John Marshall)
Date: Wed, 17 Jun 2009 14:28:12 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
Message-ID: <76D5EDD5-6217-438E-87A5-1B7571D14FFE@sanger.ac.uk>

On 17 Jun 2009, at 12:29, Elia Stupka wrote:
> Similarly, there seems to be little in bioperl-run to support tools  
> that have been developed in this area, such as Maq, BowTie, TopHat,  
> etc?

FYI I have a Bio::Tools::Run::Velvet wrapper [1] that I plan to submit  
in the not too distant future.  (First it needs some "blah blah"  
replaced with actual documentation and a test suite.)

Cheers,

     John

[1] http://www.ebi.ac.uk/~zerbino/velvet/


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 


From Kevin.M.Brown at asu.edu  Wed Jun 17 15:41:18 2009
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Wed, 17 Jun 2009 08:41:18 -0700
Subject: [Bioperl-l] Alignment->slice() issue?
In-Reply-To: <cfebe0020906160106s5f76501au9d67d8ee3062d166@mail.gmail.com>
References: <cfebe0020906160106s5f76501au9d67d8ee3062d166@mail.gmail.com>
Message-ID: <1A4207F8295607498283FE9E93B775B4060B9718@EX02.asurite.ad.asu.edu>

Warning: This is very ugly code and makes a few assumptions, such as the
alignment objects are stored in order of their start position. I made
this assumption as that is how I put them into the object to begin with.

use POSIX qw(floor);
use List::Util qw(min max);
use Bio::SimpleAlign;
use Bio::LocatableSeq;

=head1 C<slice>

Function to slice up an alignment based on start and end parameters.
The new alignment object is stored in the scalar that $new_align refers
to; both $alignment and $new_align are passed as references.

slice(\$alignment, $start, $end, \$new_align)

=cut

sub slice
{
	my ($alignment, $start, $end, $new_align) = @_;

	$$new_align = Bio::SimpleAlign->new;
	print $$alignment->no_sequences() . "\n";

	# the first sequence is always kept, cut down to the requested window
	my $first = $$alignment->get_seq_by_pos(1);
	$$new_align->add_seq(
		Bio::LocatableSeq->new(
			-seq      => substr($first->seq(), $start - 1, $end - $start + 1),
			-id       => $first->display_id(),
			-start    => max($first->start - $start + 1, 1),
			-end      => min($first->end - $start + 1, $end - $start + 1),
			-alphabet => 'dna',
			-strand   => $first->strand()
		)
	);

	# implement a binary search to determine a decent offset into the alignment
	my $probe;

	if ($$alignment->no_sequences() <= 2) {
		$probe = $$alignment->no_sequences();
	}
	else {
		my ($L, $R) = (1, $$alignment->no_sequences());
		while (($R - $L) > 1)
		{
			$probe = floor(($R + $L) / 2);

			# gotta watch this.  Had the check backwards and so was never going
			# in the right direction for the search.  If I reverse these two
			# variables, then I have to either reverse the conditions or change
			# the > to a <.
			if ($$alignment->get_seq_by_pos($probe)->start() > $start)
			{
				$R = $probe;
			}
			else
			{
				$L = $probe;
			}
		}
	}

	# now go through the results that are after that point
	for (my $i = $probe ; $i <= $$alignment->no_sequences() ; $i++)
	{
		my $seq = $$alignment->get_seq_by_pos($i);
		last if ($seq->start() > $end);

		# Only concern ourselves with primers that land inside the desired region
		# other primers will show up in the image maps for each gene.
		if ($seq->start() >= $start && $seq->end() <= $end)
		{

			# values for the substr pullout of a given sequence
			# (currently unused: the full sequence is added below)
			my $offset = max($start - $seq->start(), 0);
			my $length =
			  min($end, $seq->end()) - max($start, $seq->start()) + 1;
			$$new_align->add_seq(
				Bio::LocatableSeq->new(
					-seq      => $seq->seq(),
					-id       => $seq->display_id(),
					-start    => max($seq->start - $start + 1, 1),
					-end      => min($seq->end - $start + 1, $end - $start + 1),
					-alphabet => 'dna',
					-strand   => $seq->strand()
				)
			);
		}
	}
	return 1;
}

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Malcolm Cook
> Sent: Tuesday, June 16, 2009 1:07 AM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Alignment->slice() issue?
> 
> Kevin,
> 
> I'm getting struck by this old issue you once coded around.
> 
>       http://bioperl.org/pipermail/bioperl-l/2007-January/024665.html
> 
> Any chance you could share your implementation with  fellow 
> traveller...
> 
> ??
> 
> Thanks,
> 
> Malcolm Cook
> Stowers insitute for Medical research
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 


From maj at fortinbras.us  Wed Jun 17 16:47:38 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 17 Jun 2009 12:47:38 -0400
Subject: [Bioperl-l] bioperl-dev or branch? : redux
In-Reply-To: <D719EC86-CB16-4875-833B-2818C26030C8@duke.edu>
References: <991fb8210905210826v2a7990c0u90fcb3256f54b7d7@mail.gmail.com>
	<D719EC86-CB16-4875-833B-2818C26030C8@duke.edu>
Message-ID: <6DF025D32D664F61BC64B49184A2E6DD@NewLife>


Hi All, 

I thought I'd revisit this thread, since in the last couple of weeks I
have used both techniques (bioperl-dev and branch from trunk) to
produce completed projects. My thoughts:

Using bioperl-dev was very nice for creating Bio::Search::Tiling, a
new addition to the core api. There was no pressure to conform to the
existing api there. In particular, there was no implicit insistence to
make things work through Bio::Search::Utils, and I was free to factor
it out. The Tiling api was definitely unstable until the end, when it
was ported to the core. As I made regular reports to bioperl-l,
everything was transparent and up front, and I received excellent
suggestions there (as usual). 

For Bio::Restriction, using the branch was just as natural. Here, the
existing structure was well established, and all the work needed to
happen beneath the api. All old t/Restriction tests needed to pass,
and additional ones had to be created for the new functionality. So here, using
bioperl-dev wasn't natural, even though some "experiments" needed to
be tried (some succeeded and some failed, as you can see in the
commentary at Bug #2855). Even though the new code turned out to
require substantial effort, the effort was required to fix a true bug
in the working core, and any fixes needed to work transparently with
respect to the users for whom this bug had not been an issue. Using
the branch made it relatively easy to merge quickly back into the core
when done, and there is a certain psychological pressure too provided
by an open branch which is helpful.

Hilmar raised the very good point in the previous discussion that
(essentially) bioperl-dev shouldn't become a sandbox with lots of
unfinished code scraps and derelict stuff that doesn't work. My view
is bioperl-dev will become a sandbox only if we treat it like
one. I've filled out the Bioperl-dev page on the wiki
(http://www.bioperl.org/wiki/Bioperl-dev) with this in mind. Providing
some recognition to devs there whose modules become part of the
core may be a better way to ensure that projects that are started on
bioperl-dev actually get finished, than to prescribe beforehand what
kinds of projects may get started. I believe this follows the adage of
liberality on what is accepted, and strictness on what is emitted.

cheers, 
MAJ


----- Original Message ----- 
From: "Hilmar Lapp" <hlapp at duke.edu>
To: "Chase Miller" <chmille4 at gmail.com>
Cc: "BioPerl List" <bioperl-l at lists.open-bio.org>
Sent: Thursday, May 21, 2009 4:00 PM
Subject: Re: [Bioperl-l] bioperl-dev or branch?


> Moving this question to the BioPerl list, which is where we need to  
> discuss this I think. Can someone refresh my memory on what the  
> Bioperl-dev repository is or was meant for? It doesn't seem documented  
> on the wiki.
> 
> My (admittedly vague) recollection is that bioperl-dev is basically  
> for highly experimental changes or functionality.
> 
> I'm not clear why everything else shouldn't go either into the main  
> trunk or into a branch. If there is a realistic expectation for  
> something to be folded into the main trunk sooner or later, what would  
> be the reasons for not putting it into a branch of the main  
> repository? If we are putting it into a separate repository, we're  
> waiving a lot of svn's support for merging and resolving concurrent  
> edits.
> 
> I would also actually go a step further and suggest that even if  
> this GSoC project starts out on a branch (which I can see good reasons  
> for, such as eliminating fear to disrupt something), there should be a  
> plan to move to main trunk before the end of the project. We've had a  
> good tradition in BioPerl of developing directly on the main trunk. It  
> sometimes leads to occasional disruptions when lots of tests seem  
> failing, but it also encourages development discipline and make new  
> code to melt into the BioPerl code base without requiring any extra  
> steps by someone.
> 
> Any and all thoughts or comments welcome and appreciated!
> 
> -hilmar
> 
> On May 21, 2009, at 11:26 AM, Chase Miller wrote:
> 
>> This brings me to a question about where I should have my code  
>> repository.  Originally, I was going to use Bioperl-dev, but it was  
>> brought to my attention that that repository does not normally  
>> receive daily updates and it might not be the right place for my day  
>> to day development.  An alternative would be to use something like  
>> google code on a daily basis and commit to Bioperl-dev on a weekly  
>> basis.
> 
> -- 
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:- hlapp at duke dot edu :
> ===========================================================
> 
> 
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
>


From cjfields at illinois.edu  Wed Jun 17 17:06:44 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 17 Jun 2009 12:06:44 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<BF526BE6-AFA8-4F66-9665-D6328F9B4FFA@illinois.edu>
	<320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com>
Message-ID: <1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>


On Jun 17, 2009, at 8:25 AM, Peter wrote:

> On Wed, Jun 17, 2009 at 1:57 PM, Chris Fields<cjfields at illinois.edu>  
> wrote:
>>
>> Elia,
>>
>> As Mark indicated, we recently discussed the lack of support for  
>> next-gen on
>> list, at least re: fastq.  I may be hit with the same thing in a  
>> few months
>> time myself, and I recall Jason and a few others also mentioning  
>> the same.
>>  Heikki wrote some code for Illumina FASTQ for SeqIO and related  
>> modules but
>> I don't believe it has been committed to trunk yet, so maybe he can  
>> answer.
>>
>> From prior discussions IIRC the issues were:
>>
>> 1) distinguishing the various FASTQ versions (Sanger, Illumina 1.0,  
>> Illumina
>> 1.3) from one another (so maybe some optional validation), and
>
> Following the python rule of thumb for being explicit, Biopython makes
> the user specify which FASTQ variant is being used. I don't think you
> can do anything else. Any attempted validation would have to be
> heuristic based on the ASCII characters found, and would risk false
> positive warnings.

Right; I'm thinking along the same lines.  If anything the most we  
would allow is some level of validation, so if there were a degree of  
uncertainty about the format one could set a validation flag to check  
bounds during the parse and warn if they are exceeded.
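
To make the optional bounds check concrete, here is a minimal sketch. The variant labels and the sub names are made up for illustration and are not an existing BioPerl API; the ASCII ranges are the usual ones for the three FASTQ encodings (Sanger: offset 33, Solexa: offset 64 with scores down to -5, Illumina 1.3: offset 64).

```perl
use strict;
use warnings;

# ASCII bounds of the quality characters for each FASTQ variant.
# The variant names are hypothetical labels, not BioPerl identifiers.
my %QUAL_BOUNDS = (
    sanger   => [33, 126],    # PHRED 0..93, offset 33
    solexa   => [59, 126],    # Solexa -5..62, offset 64
    illumina => [64, 126],    # PHRED 0..62, offset 64
);

# Return the characters in $qual that fall outside the variant's range.
sub out_of_bounds_chars {
    my ($qual, $variant) = @_;
    my $bounds = $QUAL_BOUNDS{$variant}
        or die "unknown FASTQ variant '$variant'\n";
    my ($lo, $hi) = @$bounds;
    return grep { ord($_) < $lo || ord($_) > $hi } split //, $qual;
}

# Warn (rather than die) when validation is on and the bounds are exceeded.
# Returns 1 if the string passed (or was not checked), 0 otherwise.
sub check_quality {
    my ($qual, $variant, $validate) = @_;
    return 1 unless $validate;
    my @bad = out_of_bounds_chars($qual, $variant);
    warn "quality string has ", scalar(@bad),
         " character(s) outside the '$variant' range\n" if @bad;
    return @bad ? 0 : 1;
}
```

A check like this can only flag strings that are impossible for the declared variant; a string that passes may still belong to a different variant, which is Peter's point about heuristics.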

>> 2) having a way for the Seq object to either 'know' what format is
>> contained, or we use phred score and convert back and forth from  
>> that (I
>> think the latter makes more sense).
>
> I think it could make sense for BioPerl to convert Solexa scores to/ 
> from
> PHRED scores on the fly (especially now that Illumina is abandoning
> the Solexa score system). Python style tries to avoid implicit  
> conversions,
> so Biopython doesn't automatically do a conversion from Solexa to
> PHRED scores on parsing (but will on writing if the requested output
> format requires this).
>
>> Peter's suggestions also are reasonable, though does biopython have a
>> separate module for each of these variations?  Our version (I  
>> believe)
>> mainly varied the conversion within Bio::SeqIO::fastq itself based  
>> on the
>> fastq variant passed in as a separate named argument.
>
> Biopython's SeqIO gives the three FASTQ variants their own unique
> names. This format name is a required argument for parsing/writing
> (we don't try and guess the file format from the data contents).  
> Internally
> we have three separate FASTQ parsers/writers although they do share
> code.

We could easily do the same if others agree.  Actually, if we  
specified that shorthand for a variant on a format would be designated  
as -format => 'format-variant', I think we could easily hack SeqIO to  
deal with that by splitting on '-' and passing everything to the  
constructor as (-format => 'format', -variant => 'variant').  Very  
little repeated code in this case, just an additional named parameter  
indicating the format variant (and the SeqIO class can do the type  
checking on that within the constructor).
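
A rough sketch of the splitting idea, assuming a hypothetical helper called split_format_variant (this is not the actual Bio::SeqIO code, and the choice to let an explicit -variant win over the shorthand is my assumption):

```perl
use strict;
use warnings;

# Split a -format value such as 'fastq-illumina' into a base format and
# a variant. Hypothetical helper for illustration only.
sub split_format_variant {
    my (%args) = @_;
    my $format = lc( defined $args{'-format'} ? $args{'-format'} : '' );
    if ( $format =~ /^([^-]+)-(.+)$/ ) {
        my ($base, $variant) = ($1, $2);
        $args{'-format'} = $base;
        # An explicitly passed -variant takes precedence over the shorthand.
        $args{'-variant'} = $variant unless defined $args{'-variant'};
    }
    return %args;
}

my %args = split_format_variant(
    -format => 'fastq-illumina',
    -file   => 'reads.fq',
);
print "$args{'-format'} / $args{'-variant'}\n";    # prints "fastq / illumina"
```

Formats without a '-' pass through untouched, so existing calls such as -format => 'genbank' would be unaffected.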

> Other issues to keep in mind:
>
> (3) There should be no warning parsing files where the optional  
> repeated
> title is missing on the "+" lines (as discussed earlier on the  
> BioPerl list).

Agreed, though we'll have to check the current fastq parser to see if  
that's currently the case.  I thought that was fixed but maybe not?

> (4) When writing FASTQ files should BioPerl omit the optional repeated
> title on the "+" line? Biopython omits this as I understand this to be
> common practice, and can make a big difference to file sizes -  
> especially
> on short read data from Solexa/Illumina.

Agreed, particularly if it's commonly encountered.

> (5) Also test reading and writing files with an optional description  
> (as well
> as an identifier) on the "@" (and "+") lines. See the NCBI SRA for  
> examples,
> e.g.
>
> @SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
> GGGTGATGGCCGCTGCCGATGGCGTCAAATCCCACC
> +SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
> IIIIIIIIIIIIIIIIIIIIIIIIIIIIII9IG9IC

Should be easy enough to implement with a simple regex.
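
Something along these lines, as a sketch only (parse_title_line is a made-up name, not what Bio::SeqIO::fastq currently does):

```perl
use strict;
use warnings;

# Split a FASTQ '@' or '+' line into an identifier and an optional
# description. Returns an empty list if the line is not a title line.
sub parse_title_line {
    my ($line) = @_;
    chomp $line;
    # leading '@' or '+', a run of non-whitespace as the ID,
    # then optionally whitespace and a free-text description
    my ($id, $desc) = $line =~ /^[\@+](\S*)(?:\s+(.*))?$/
        or return;
    return ($id, defined $desc ? $desc : '');
}

my ($id, $desc) = parse_title_line(
    '@SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36');
print "$id | $desc\n";
```

A bare "+" line gives an empty ID and an empty description, so the same sub covers the case where the repeated title is omitted.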

> (6) Test reading and writing files where the encoded quality string  
> starts
> with a "@" or a "+" character, e.g.
> http://lists.open-bio.org/pipermail/bioperl-l/2009-May/029911.html
>
> Peter
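
For (6), one way to stay safe is to read records by line position rather than by looking at the first character of each line. A sketch, assuming the common unwrapped four-line layout (wrapped multi-line records would need more work); next_fastq_record is a hypothetical name:

```perl
use strict;
use warnings;

# Read one FASTQ record by line position: title, sequence, '+' line,
# quality. Because the quality line is always the fourth line, a leading
# '@' or '+' in it is never mistaken for a title or separator.
sub next_fastq_record {
    my ($fh) = @_;
    my @lines;
    for ( 1 .. 4 ) {
        my $line = <$fh>;
        return unless defined $line;    # end of file
        chomp $line;
        push @lines, $line;
    }
    my ($title, $seq, $plus, $qual) = @lines;
    die "expected '\@' title line, got '$title'\n" unless $title =~ /^\@/;
    die "expected '+' line, got '$plus'\n"         unless $plus  =~ /^\+/;
    die "sequence and quality lengths differ\n"
        unless length($seq) == length($qual);
    return { title => substr($title, 1), seq => $seq, qual => $qual };
}

# Two records whose quality strings start with '@' and '+'.
my $data = "\@r1\nACGT\n+\n\@+II\n\@r2\nGGCC\n+r2\n+III\n";
open my $fh, '<', \$data or die "cannot open in-memory file: $!";
while ( my $rec = next_fastq_record($fh) ) {
    print "$rec->{title}\t$rec->{qual}\n";
}
close $fh;
```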

Mark, getting all that? ;>

chris


From cjfields at illinois.edu  Wed Jun 17 17:09:54 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 17 Jun 2009 12:09:54 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <200906170927.13273.tristan.lefebure@gmail.com>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<92C15E3391F64BAF801754E924122540@NewLife>
	<200906170927.13273.tristan.lefebure@gmail.com>
Message-ID: <F8B12CF8-7A35-430F-A233-6DBA2992D19F@illinois.edu>


On Jun 17, 2009, at 8:27 AM, Tristan Lefebure wrote:

> Hello,
> Regarding next-gen sequences and bioperl, following my
> experience, another issue is bioperl speed. For example, if
> you want to trim bad quality bases at ends of 1E6 Solexa
> reads using Bio::SeqIO::fastq and some methods in
> Bio::Seq::Quality, well, you've got to be patient (but maybe
> I missed some shortcuts...).

The key issues affecting speed in bioperl are contained object  
instantiation and inheritance (and between those two, the latter much  
more so as it plays a role with contained objects as well as the  
container).

http://www.bioperl.org/wiki/Why_BioPerl_is_slow

Moose/Perl6 roles/traits are one way around that issue, but we are a  
ways off from getting that running.  I think to get that working  
decently would be a from-ground-up endeavor (see my past posts on  
biomoose/bioperl6).

> A pure perl solution will be between 100 and 1000x faster...
> Would it be possible to have an ultra-light quality object
> with few simple methods for next-gen reads?
>
> I can contribute some tests if that sounds like an important
> point.
>
> -Tristan

The quality objects themselves I don't think are that heavy; I think  
the main impediment is inheritance.  One could get around that a bit  
by using a direct_new method to create a blessed hash directly, then  
reimplement methods to lazily create any objects contained on the fly.
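
To illustrate what I mean, a minimal sketch. LightRead is a hypothetical class, not part of BioPerl, and the contained "object" here is just an array of decoded scores (Sanger offset 33 assumed) standing in for whatever would normally be built eagerly:

```perl
package LightRead;    # hypothetical class name, not part of BioPerl
use strict;
use warnings;

# Bless a plain hash directly: no named-argument parsing and no chain of
# parent constructors to walk through.
sub direct_new {
    my ($class, $id, $seq, $qual_str) = @_;
    return bless { id => $id, seq => $seq, qual_str => $qual_str }, $class;
}

sub id  { $_[0]{id} }
sub seq { $_[0]{seq} }

# Decode the quality string only when it is first asked for, then cache
# the result. Offset 33 (Sanger encoding) is an assumption of this sketch.
sub qual {
    my ($self) = @_;
    $self->{qual} ||= [ map { ord($_) - 33 } split //, $self->{qual_str} ];
    return $self->{qual};
}

package main;

my $read = LightRead->direct_new('r1', 'ACGT', 'II9I');
print join(' ', @{ $read->qual }), "\n";    # prints "40 40 24 40"
```

Reads that are only filtered on ID or length never pay for the decoding step at all.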

chris


From bill at genenformics.com  Wed Jun 17 17:03:16 2009
From: bill at genenformics.com (bill at genenformics.com)
Date: Wed, 17 Jun 2009 10:03:16 -0700
Subject: [Bioperl-l] Alignment->slice() issue?
In-Reply-To: <1A4207F8295607498283FE9E93B775B4060B9718@EX02.asurite.ad.asu.edu>
References: <cfebe0020906160106s5f76501au9d67d8ee3062d166@mail.gmail.com>
	<1A4207F8295607498283FE9E93B775B4060B9718@EX02.asurite.ad.asu.edu>
Message-ID: <92dadb76ce7d7b8eeb4644b47ef1a81f.squirrel@mail.dreamhost.com>

Hopefully this is helpful.

http://www.ncbi.nlm.nih.gov/IEB/ToolBox/CPP_DOC/lxr/source/src/objects/seqalign/Dense_seg.cpp#L648

Bill at genenformics

> Warning: This is very ugly code and makes a few assumptions, such as the
> alignment objects are stored in order of their start position. I made
> this assumption as that is how I put them into the object to begin with.
>
> =head1 C<slice>
>
> Function to slice up an alignment sequence based on start and end
> parameters
> and returns a new alignment object.
>
> slice($alignment, $start, $end)
>
> =cut
>
> # Needs: use POSIX qw(floor); use List::Util qw(min max);
> #        use Bio::SimpleAlign; use Bio::LocatableSeq;
> sub slice
> {
>     my ($alignment, $start, $end, $new_align) = @_;
>
>     $$new_align = Bio::SimpleAlign->new;
>     print $$alignment->no_sequences() . "\n";
>
>     $$new_align->add_seq(
>         Bio::LocatableSeq->new(
>             -seq => substr(
>                 $$alignment->get_seq_by_pos(1)->seq(),
>                 $start - 1, $end - $start + 1
>             ),
>             -id    => $$alignment->get_seq_by_pos(1)->display_id(),
>             -start => max($$alignment->get_seq_by_pos(1)->start - $start + 1, 1),
>             -end   => min(
>                 $$alignment->get_seq_by_pos(1)->end - $start + 1,
>                 $end - $start + 1
>             ),
>             -alphabet => 'dna',
>             -strand   => $$alignment->get_seq_by_pos(1)->strand()
>         )
>     );
>
>     # implement a binary search to determine a decent offset into the alignment
>     my $probe;
>
>     if ($$alignment->no_sequences() <= 2) {
>         $probe = $$alignment->no_sequences();
>     }
>     else {
>         my ($L, $R) = (1, $$alignment->no_sequences());
>         while (($R - $L) > 1)
>         {
>             $probe = floor(($R + $L) / 2);
>
>             # gotta watch this.  Had the check backwards and so was never going
>             # in the right direction for the search.  If I reverse these two
>             # variables, then I have to either reverse the conditions or change
>             # the > to a <.
>             if ($$alignment->get_seq_by_pos($probe)->start() > $start)
>             {
>                 $R = $probe;
>             }
>             else
>             {
>                 $L = $probe;
>             }
>         }
>     }
>     # now go through the results that are after that point
>     for (my $i = $probe ; $i <= $$alignment->no_sequences() ; $i++)
>     {
>         my $seq = $$alignment->get_seq_by_pos($i);
>         last if ($seq->start() > $end);
>
>         # Only concern ourselves with primers that land inside the desired region
>         # other primers will show up in the image maps for each gene.
>         if ($seq->start() >= $start && $seq->end() <= $end)
>         {
>
>             # values for the substr pullout of a given sequence
>             my $offset = max($start - $seq->start(), 0);
>             my $length =
>               min($end, $seq->end()) - max($start, $seq->start()) + 1;
>             $$new_align->add_seq(
>                 Bio::LocatableSeq->new(
>                     -seq      => $seq->seq(),
>                     -id       => $seq->display_id(),
>                     -start    => max($seq->start - $start + 1, 1),
>                     -end      => min($seq->end - $start + 1, $end - $start + 1),
>                     -alphabet => 'dna',
>                     -strand   => $seq->strand()
>                 )
>             );
>         }
>     }
>     return 1;
> }
>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org
>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
>> Malcolm Cook
>> Sent: Tuesday, June 16, 2009 1:07 AM
>> To: bioperl-l at lists.open-bio.org
>> Subject: [Bioperl-l] Alignment->slice() issue?
>>
>> Kevin,
>>
>> I'm getting struck by this old issue you once coded around.
>>
>>       http://bioperl.org/pipermail/bioperl-l/2007-January/024665.html
>>
>> Any chance you could share your implementation with  fellow
>> traveller...
>>
>> ??
>>
>> Thanks,
>>
>> Malcolm Cook
>> Stowers Institute for Medical Research


From maj at fortinbras.us  Wed Jun 17 17:13:23 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 17 Jun 2009 13:13:23 -0400
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk><BF526BE6-AFA8-4F66-9665-D6328F9B4FFA@illinois.edu><320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com>
	<1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>
Message-ID: <129A87FC74254873A6CEB1CEB2ADAF6F@NewLife>

I'm on the case! (but maybe not in realtime, today!)



From e.stupka at ucl.ac.uk  Wed Jun 17 17:49:38 2009
From: e.stupka at ucl.ac.uk (Elia Stupka)
Date: Wed, 17 Jun 2009 18:49:38 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <F8B12CF8-7A35-430F-A233-6DBA2992D19F@illinois.edu>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<92C15E3391F64BAF801754E924122540@NewLife>
	<200906170927.13273.tristan.lefebure@gmail.com>
	<F8B12CF8-7A35-430F-A233-6DBA2992D19F@illinois.edu>
Message-ID: <F995F500-2E4B-40B8-9C2D-2355E91DF65E@ucl.ac.uk>

I would suggest developing the "standard" version first, then moving  
onto potential optimizations.

When we went through a similar argument in Ensembl about 8 years ago  
we ended up dropping Bio::Root completely...

If one is truly after performance for these large next-gen projects,  
it'd be down to pure piping, shell, and worrying about location and  
copying of files, sticking to systems-level as much as possible, and  
quite far from Bioperl altogether, so I think it's a whole different  
level of optimization issues, probably outside the scope of Bioperl.

Elia


---
Senior Lecturer, Bioinformatics
UCL Cancer Institute
Paul O' Gorman Building
University College London
Gower Street
WC1E 6BT
London
UK

Office (UCL): +44 207 679 6493
Office (ICMS): +44 0207 8822374

Mobile: +44 7597 566 194
Mobile (Italy): +39 338 8448801


From cjfields at illinois.edu  Wed Jun 17 17:52:49 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 17 Jun 2009 12:52:49 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <129A87FC74254873A6CEB1CEB2ADAF6F@NewLife>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk><BF526BE6-AFA8-4F66-9665-D6328F9B4FFA@illinois.edu><320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com>
	<1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>
	<129A87FC74254873A6CEB1CEB2ADAF6F@NewLife>
Message-ID: <16E48B50-88FC-4B6D-9FDD-CF7FDE6BAEAA@illinois.edu>

I think this is a top priority for a fall BioPerl release, maybe 1.6.2  
(I am planning on a summer 1.6.1 release still).  Made it into a bug  
report for tracking:

http://bugzilla.open-bio.org/show_bug.cgi?id=2857

If no one works on this I may take it up after the 1.6.1 release.

chris

On Jun 17, 2009, at 12:13 PM, Mark A. Jensen wrote:

> I'm on the case! (but maybe not in realtime, today!)
>


From e.stupka at ucl.ac.uk  Wed Jun 17 18:01:28 2009
From: e.stupka at ucl.ac.uk (Elia Stupka)
Date: Wed, 17 Jun 2009 19:01:28 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <16E48B50-88FC-4B6D-9FDD-CF7FDE6BAEAA@illinois.edu>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk><BF526BE6-AFA8-4F66-9665-D6328F9B4FFA@illinois.edu><320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com>
	<1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>
	<129A87FC74254873A6CEB1CEB2ADAF6F@NewLife>
	<16E48B50-88FC-4B6D-9FDD-CF7FDE6BAEAA@illinois.edu>
Message-ID: <E0FAC5DB-470E-48E1-A30F-B64E2E63EB86@ucl.ac.uk>

If we reach a consensus on how/who/what, I will be happy to contribute  
some coding time in the coming days.

Would it be a good starting point to add the different  
formats as named in Biopython, and test support for reading/writing  
them? I could start playing with that.

regards,

Elia

On 17 Jun 2009, at 18:52, Chris Fields wrote:

> I think this is a top priority for a fall BioPerl release, maybe  
> 1.6.2 (I am planning on a summer 1.6.1 release still).  Made it into  
> a bug report for tracking:
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2857
>
> If no one works on this I may take it up after the 1.6.1 release.
>
> chris
>
> On Jun 17, 2009, at 12:13 PM, Mark A. Jensen wrote:
>
>> I'm on the case! (but maybe not in realtime, today!)
>>
>> ----- Original Message ----- From: "Chris Fields" <cjfields at illinois.edu 
>> >
>> To: "Peter" <biopython at maubp.freeserve.co.uk>
>> Cc: "BioPerl List" <bioperl-l at lists.open-bio.org>; "Elia Stupka" <e.stupka at ucl.ac.uk 
>> >; "Heikki Lehvaslaiho" <heikki at sanbi.ac.za>
>> Sent: Wednesday, June 17, 2009 1:06 PM
>> Subject: Re: [Bioperl-l] Next-gen modules
>>
>>
>>>
>>> On Jun 17, 2009, at 8:25 AM, Peter wrote:
>>>
>>>> On Wed, Jun 17, 2009 at 1:57 PM, Chris  
>>>> Fields<cjfields at illinois.edu>  wrote:
>>>>>
>>>>> Elia,
>>>>>
>>>>> As Mark indicated, we recently discussed the lack of support  
>>>>> for  next-gen on
>>>>> list, at least re: fastq.  I may be hit with the same thing in  
>>>>> a  few months
>>>>> time myself, and I recall Jason and a few others also  
>>>>> mentioning  the same.
>>>>> Heikki wrote some code for Illumina FASTQ for SeqIO and related   
>>>>> modules but
>>>>> I don't believe it has been committed to trunk yet, so maybe he  
>>>>> can  answer.
>>>>>
>>>>> From prior discussions IIRC the issues were:
>>>>>
>>>>> 1) distinguishing the various FASTQ versions (Sanger, Illumina  
>>>>> 1.0, Illumina
>>>>> 1.3) from one another (so maybe some optional validation), and
>>>>
>>>> Following the python rule of thumb for being explicit, Biopython  
>>>> makes
>>>> the user specify which FASTQ variant is being used. I don't think  
>>>> you
>>>> can do anything else. Any attempted validation would have to be
>>>> heuristic based on the ASCII characters found, and would risk false
>>>> positive warnings.
>>>
>>> Right; I'm thinking along the same lines.  If anything the most  
>>> we  would allow is some level of validation, so if there were a  
>>> degree of  uncertainty about the format one could set a validation  
>>> flag to check  bounds during the parse and warn if they are  
>>> exceeded.
>>>
>>>>> 2) having a way for the Seq object to either 'know' what format is
>>>>> contained, or we use phred score and convert back and forth  
>>>>> from  that (I
>>>>> think the latter makes more sense).
>>>>
>>>> I think it could make sense for BioPerl to convert Solexa scores  
>>>> to/ from
>>>> PHRED scores on the fly (especially now that Illumina is abandoning
>>>> the Solexa score system). Python style tries to avoid implicit   
>>>> conversions,
>>>> so Biopython doesn't automatically do a conversion from Solexa to
>>>> PHRED scores on parsing (but will on writing if the requested  
>>>> output
>>>> format requires this).
>>>>
>>>>> Peter's suggestions also are reasonable, though does biopython  
>>>>> have a
>>>>> separate module for each of these variations?  Our version (I   
>>>>> believe)
>>>>> mainly varied the conversion within Bio::SeqIO::fastq itself  
>>>>> based  on the
>>>>> fastq variant passed in as a separate named argument.
>>>>
>>>> Biopython's SeqIO gives the three FASTQ variants their own unique
>>>> names. This format name is a required argument for parsing/writing
>>>> (we don't try and guess the file format from the data contents).   
>>>> Internally
>>>> we have three separate FASTQ parsers/writers although they do share
>>>> code.
>>>
>>> We could easily do the same if others agree.  Actually, if we   
>>> specified that shorthand for a variant on a format would be  
>>> designated  as -format => 'format-variant', I think we could  
>>> easily hack SeqIO to  deal with that by splitting on '-' and  
>>> passing everything to the  constructor as (-format => 'format', - 
>>> variant => 'variant').  Very  little repeated code in this case,  
>>> just an additional named parameter  indicating the format variant  
>>> (and the SeqIO class can do the type  checking on that within the  
>>> constructor).
>>>
>>>> Other issues to keep in mind:
>>>>
>>>> (3) There should be no warning parsing files where the optional   
>>>> repeated
>>>> title is missing on the "+" lines (as discussed earlier on the   
>>>> BioPerl list).
>>>
>>> Agreed, though we'll have to check the current fastq parser to see  
>>> if  that's currently the case.  I thought that was fixed but maybe  
>>> not?
>>>
>>>> (4) When writing FASTQ files should BioPerl omit the optional  
>>>> repeated
>>>> title on the "+" line? Biopython omits this as I understand this  
>>>> to be
>>>> common practice, and can make a big different to file sizes -   
>>>> especially
>>>> on short read data from Solexa/Illumina.
>>>
>>> Agreed, particularly if it's commonly encountered.
>>>
>>>> (5) Also test reading and writing files with an optional  
>>>> description  (as well
>>>> as an identifier) on the "@" (and "+") lines. See the NCBI SRA  
>>>> for  examples,
>>>> e.g.
>>>>
>>>> @SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
>>>> GGGTGATGGCCGCTGCCGATGGCGTCAAATCCCACC
>>>> +SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
>>>> IIIIIIIIIIIIIIIIIIIIIIIIIIIIII9IG9IC
>>>
>>> Should be easy enough to implement with a simple regex.
>>>
>>>> (6) Test reading and writing files where the encoded quality
>>>> string starts with a "@" or a "+" character, e.g.
>>>> http://lists.open-bio.org/pipermail/bioperl-l/2009-May/029911.html
>>>>
>>>> Peter
>>>
>>> Mark, getting all that? ;>
>>>
>>> chris
>>>
>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>
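
The 'format-variant' shorthand proposed in the quoted text above could
be sketched roughly as follows. This is only an illustration of the
splitting step, not existing SeqIO code: the helper name split_format
is made up, '-variant' is the hypothetical named parameter from the
proposal, and 'fastq-illumina' is just an example shorthand.

```perl
use strict;
use warnings;

# Split a shorthand such as 'fastq-illumina' into named parameters.
# '-variant' is the hypothetical parameter proposed in the thread; it is
# only added when the shorthand actually contains a variant part.
sub split_format {
    my ($spec) = @_;
    my ($format, $variant) = split /-/, lc($spec), 2;
    my @args = ('-format' => $format);
    push @args, ('-variant' => $variant)
        if defined $variant && length $variant;
    return @args;
}

my %args = split_format('fastq-illumina');
print join(' / ', $args{'-format'}, $args{'-variant'}), "\n";  # fastq / illumina

my %plain = split_format('genbank');
print exists $plain{'-variant'} ? "variant\n" : "no variant\n";  # no variant
```

Limiting the split to two fields keeps any further '-' characters
inside the variant name, so the constructor receives them unchanged and
can do its own checking.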

---
Senior Lecturer, Bioinformatics
UCL Cancer Institute
Paul O' Gorman Building
University College London
Gower Street
WC1E 6BT
London
UK

Office (UCL): +44 207 679 6493
Office (ICMS): +44 0207 8822374

Mobile: +44 7597 566 194
Mobile (Italy): +39 338 8448801


From tristan.lefebure at gmail.com  Wed Jun 17 18:09:42 2009
From: tristan.lefebure at gmail.com (Tristan Lefebure)
Date: Wed, 17 Jun 2009 14:09:42 -0400
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <F995F500-2E4B-40B8-9C2D-2355E91DF65E@ucl.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<F8B12CF8-7A35-430F-A233-6DBA2992D19F@illinois.edu>
	<F995F500-2E4B-40B8-9C2D-2355E91DF65E@ucl.ac.uk>
Message-ID: <200906171409.42558.tristan.lefebure@gmail.com>

Thanks both for the light.

That probably means that the place bioperl will take in the
handling of next-gen sequencing raw data (i.e. reads) is
very limited, no? (at least until bioperl6). A single GA2
Solexa lane generates about 9 million reads, and I would
really not call that a big project...

BTW, is there a simple way to see object instantiation and
inheritance, as well as time consumption for each, when one
calls next_seq() (or any other method)?

-Tristan

On Wednesday 17 June 2009 13:49:38 Elia Stupka wrote:
> I would suggest developing the "standard" version first,
> then moving onto potential optimizations.
>
> When we went through a similar argument in Ensembl about
> 8 years ago we ended up dropping Bio::Root completely...
>
> If one is truly after performance for these large
> next-gen projects, it'd be down to pure piping, shell,
> and worrying about location and copying of files,
> sticking to systems-level as much as possible, and quite
> far from Bioperl altogether, so I think it's a whole
> different level of optimization issues, probably outside
> the scope of Bioperl.
>
> Elia
>
> On 17 Jun 2009, at 18:09, Chris Fields wrote:
> > On Jun 17, 2009, at 8:27 AM, Tristan Lefebure wrote:
> >> Hello,
> >> Regarding next-gen sequences and bioperl, following my
> >> experience, another issue is bioperl speed. For
> >> example, if you want to trim bad quality bases at ends
> >> of 1E6 Solexa reads using Bio::SeqIO::fastq and some
> >> methods in Bio::Seq::Quality, well, you've got to be
> >> patient (but may be I missed some shortcuts...).
> >
> > The key issues affecting speed in bioperl are contained
> > object instantiation and inheritance (and between those
> > two, the latter much more so as it plays a role with
> > contained objects as well as the container).
> >
> > http://www.bioperl.org/wiki/Why_BioPerl_is_slow
> >
> > Moose/Perl6 roles/traits are one way around that issue,
> > but we are a ways off from getting that running.  I
> > think to get that working decently would be a
> > from-ground-up endeavor (see my past posts on
> > biomoose/bioperl6).
> >
> >> A pure perl solution will be between 100 to 1000x
> >> faster... Would it be possible to have an ultra-light
> >> quality object with few simple methods for next-gen
> >> reads?
> >>
> >> I can contribute some tests if that sounds like an
> >> important point.
> >>
> >> -Tristan
> >
> > The quality objects themselves I don't think are that
> > heavy; I think the main impediment is inheritance.  One
> > could get around that a bit by using a direct_new
> > method to create a blessed hash directly, then
> > reimplement methods to lazily create any objects
> > contained on the fly.
> >
> > chris
> >


From bix at sendu.me.uk  Wed Jun 17 18:20:00 2009
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 17 Jun 2009 19:20:00 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <200906170927.13273.tristan.lefebure@gmail.com>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>	<92C15E3391F64BAF801754E924122540@NewLife>
	<200906170927.13273.tristan.lefebure@gmail.com>
Message-ID: <4A3933D0.4040808@sendu.me.uk>

Tristan Lefebure wrote:
> Hello,
> Regarding next-gen sequences and bioperl, following my 
> experience, another issue is bioperl speed. For example, if 
> you want to trim bad quality bases at ends of 1E6 Solexa 
> reads using Bio::SeqIO::fastq and some methods in 
> Bio::Seq::Quality, well, you've got to be patient (but may 
> be I missed some shortcuts...).

This is my concern as well. Or, rather, is there actually a significant 
set of users out there who are dealing with next-gen sequencing and 
would consider using BioPerl for their work?

I'm working with all the 1000-genomes data at the Sanger, and we at 
least are probably never going to use BioPerl for the work.


> A pure perl solution will be between 100 to 1000x faster... 
> Would it be possible to have an ultra-light quality object 
> with few simple methods for next-gen reads?

The fastq parser itself already seems pretty fast. The way to get the 
speedup is to not create any Bio::Seq* objects but just return the data 
directly. At that point it's not taking much advantage of BioPerl. But 
certainly it could be done...
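
As a rough illustration of the "return the data directly" idea, here is
a minimal pure-Perl sketch that reads FASTQ records into plain hashes
without creating any Bio::Seq* objects. It is not BioPerl code: the
function name next_raw_fastq and the hash keys are made up for the
example, the sample reads are invented, and it assumes every record
occupies exactly four lines (no wrapped sequence or quality).

```perl
use strict;
use warnings;

# Read one four-line FASTQ record and return it as a plain hash reference.
# Returns an empty list / undef at end of file.
sub next_raw_fastq {
    my ($fh) = @_;

    # Skip blank lines between records or at the end of the file.
    my $title;
    do { $title = <$fh> } while defined $title && $title !~ /\S/;
    return unless defined $title;

    my $seq  = <$fh>;
    my $plus = <$fh>;
    my $qual = <$fh>;
    die "Truncated FASTQ record\n" unless defined $qual;
    chomp($title, $seq, $plus, $qual);

    $title =~ s/^\@// or die "Expected '\@' title line, got: $title\n";
    $plus =~ /^\+/    or die "Expected '+' line, got: $plus\n";
    length($seq) == length($qual)
        or die "Sequence and quality lengths differ for $title\n";

    # The title is "identifier [optional description]".
    my ($id, $desc) = split /\s+/, $title, 2;
    return {
        id   => $id,
        desc => defined $desc ? $desc : '',
        seq  => $seq,
        qual => $qual,
    };
}

# Invented sample data; the second quality string starts with "@".
my $data = join("\n",
    '@read1 made-up example', 'ACGTACGT', '+', 'IIIIIIII',
    '@read2',                 'GGCCTTAA', '+', '@+IIIIII',
    '');
open my $fh, '<', \$data or die "Cannot open in-memory file: $!\n";
while (my $rec = next_raw_fastq($fh)) {
    print join("\t", $rec->{id}, length $rec->{seq}), "\n";
}
```

Because the lines are taken by position rather than by their first
character, a quality string that begins with "@" or "+" is not mistaken
for the start of a new record.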


From e.stupka at ucl.ac.uk  Wed Jun 17 18:39:08 2009
From: e.stupka at ucl.ac.uk (Elia Stupka)
Date: Wed, 17 Jun 2009 19:39:08 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <200906171409.42558.tristan.lefebure@gmail.com>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<F8B12CF8-7A35-430F-A233-6DBA2992D19F@illinois.edu>
	<F995F500-2E4B-40B8-9C2D-2355E91DF65E@ucl.ac.uk>
	<200906171409.42558.tristan.lefebure@gmail.com>
Message-ID: <8C661293-DF7D-4262-970A-92AF0015BB04@ucl.ac.uk>

We are using bioperl for simple pre- and post-processing of data for
full Solexa runs, and although it might not be ideal, the scripting
with Bioperl is not a major killer. When I was referring to large,
heavy pipelines I was thinking of pipelines that deal with many Solexa
runs as one project (e.g. 1000 genomes), which really cannot afford any
bottleneck, because that directly affects their storage.

cheers

Elia


On 17 Jun 2009, at 19:09, Tristan Lefebure wrote:

> Thanks both for the light.
>
> That probably means that the place bioperl will take in the
> handling of next-gen sequencing raw data (i.e. reads) is
> very limited, no? (at least until bioperl6). A single GA2
> Solexa lane generates about 9 million reads, and I would
> really not call that a big project...
>
> BTW, is there a simple way to see object instantiation and
> inheritance, as well as time consumption for each, when one
> calls next_seq() (or any other method)?
>
> -Tristan
>
> On Wednesday 17 June 2009 13:49:38 Elia Stupka wrote:
>> I would suggest developing the "standard" version first,
>> then moving onto potential optimizations.
>>
>> When we went through a similar argument in Ensembl about
>> 8 years ago we ended up dropping Bio::Root completely...
>>
>> If one is truly after performance for these large
>> next-gen projects, it'd be down to pure piping, shell,
>> and worrying about location and copying of files,
>> sticking to systems-level as much as possible, and quite
>> far from Bioperl altogether, so I think it's a whole
>> different level of optimization issues, probably outside
>> the scope of Bioperl.
>>
>> Elia
>>
>> On 17 Jun 2009, at 18:09, Chris Fields wrote:
>>> On Jun 17, 2009, at 8:27 AM, Tristan Lefebure wrote:
>>>> Hello,
>>>> Regarding next-gen sequences and bioperl, following my
>>>> experience, another issue is bioperl speed. For
>>>> example, if you want to trim bad quality bases at ends
>>>> of 1E6 Solexa reads using Bio::SeqIO::fastq and some
>>>> methods in Bio::Seq::Quality, well, you've got to be
>>>> patient (but may be I missed some shortcuts...).
>>>
>>> The key issues affecting speed in bioperl are contained
>>> object instantiation and inheritance (and between those
>>> two, the latter much more so as it plays a role with
>>> contained objects as well as the container).
>>>
>>> http://www.bioperl.org/wiki/Why_BioPerl_is_slow
>>>
>>> Moose/Perl6 roles/traits are one way around that issue,
>>> but we are a ways off from getting that running.  I
>>> think to get that working decently would be a
>>> from-ground-up endeavor (see my past posts on
>>> biomoose/bioperl6).
>>>
>>>> A pure perl solution will be between 100 to 1000x
>>>> faster... Would it be possible to have an ultra-light
>>>> quality object with few simple methods for next-gen
>>>> reads?
>>>>
>>>> I can contribute some tests if that sounds like an
>>>> important point.
>>>>
>>>> -Tristan
>>>
>>> The quality objects themselves I don't think are that
>>> heavy; I think the main impediment is inheritance.  One
>>> could get around that a bit by using a direct_new
>>> method to create a blessed hash directly, then
>>> reimplement methods to lazily create any objects
>>> contained on the fly.
>>>
>>> chris
>>>



From cjfields at illinois.edu  Wed Jun 17 18:40:05 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 17 Jun 2009 13:40:05 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <200906171409.42558.tristan.lefebure@gmail.com>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<F8B12CF8-7A35-430F-A233-6DBA2992D19F@illinois.edu>
	<F995F500-2E4B-40B8-9C2D-2355E91DF65E@ucl.ac.uk>
	<200906171409.42558.tristan.lefebure@gmail.com>
Message-ID: <63B608B2-8DE0-4FD1-9E15-339FD226D7AB@illinois.edu>

On Jun 17, 2009, at 1:09 PM, Tristan Lefebure wrote:

> Thanks both for the light.
>
> That probably means that the place bioperl will take in the
> handling of next-gen sequencing raw data (i.e. reads) is
> very limited, no? (at least until bioperl6). A single GA2
> Solexa lane generates about 9 million reads, and I would
> really not call that a big project...

I don't think it's impossible.  If you parse any very long list of
sequences in order it will be very slow, yes, but if they were indexed
or loaded into a DB, lookups would of course be orders of magnitude
faster.

We already have perl-based indexing for fastq (Bio::Index::Fastq), so  
maybe something could be built on top of that. I haven't looked but we  
can also wrap other C/C++-based parsers as well. BioLib, for instance,  
has bindings to io_lib, so maybe that could be (ab)used in some way.
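
To make the indexing point concrete, here is a small pure-Perl sketch
of the general idea: record the byte offset of every record once, then
seek straight to a read instead of scanning the file. It does not use
or imitate the Bio::Index::Fastq interface; index_fastq and fetch_fastq
are made-up names, the reads are invented, and four-line records are
assumed.

```perl
use strict;
use warnings;

# Build an in-memory index: read ID => byte offset of its "@" title line.
sub index_fastq {
    my ($fh) = @_;
    my %offset;
    while (1) {
        my $pos   = tell($fh);
        my $title = <$fh>;
        last unless defined $title;
        my ($id) = $title =~ /^\@(\S+)/
            or die "Expected '\@' title line at byte $pos\n";
        $offset{$id} = $pos;
        # Skip the sequence, "+" and quality lines by position.
        for my $n (1 .. 3) {
            my $line = <$fh>;
            die "Truncated record for $id\n" unless defined $line;
        }
    }
    return \%offset;
}

# Jump to one record and return its raw fields.
sub fetch_fastq {
    my ($fh, $index, $id) = @_;
    my $pos = $index->{$id};
    return unless defined $pos;
    seek($fh, $pos, 0) or die "seek failed: $!\n";
    my @lines = map { scalar <$fh> } 1 .. 4;
    chomp @lines;
    return { title => substr($lines[0], 1), seq => $lines[1], qual => $lines[3] };
}

my $data = join("\n",
    '@r1', 'ACGT', '+', 'IIII',
    '@r2', 'GGCC', '+', '@+II',
    '');
open my $fh, '<', \$data or die "Cannot open in-memory file: $!\n";
my $index = index_fastq($fh);
my $rec   = fetch_fastq($fh, $index, 'r2');
print "$rec->{title} $rec->{seq} $rec->{qual}\n";   # r2 GGCC @+II
```

For millions of reads the offsets would more realistically live in an
on-disk index than in a Perl hash, but the lookup step stays the same.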

> BTW, is there a simple way to see object instantiation and
> inheritance, as well as time consumption for each, when one
> calls next_seq() (or any other method)?
>
> -Tristan

As a simple benchmark, at one point all feature tag information was  
converted into Bio::Annotations.  I reverted that behavior to be  
simple tag/value again and had a pretty decent bump:

http://www.bioperl.org/wiki/Feature_Annotation_rollback#Simple_Benchmark

Also, I tried reimplementing some parsers as generic 'event'-based
driver/handler and they were slightly faster, the key roadblock being
instantiation again.  If I didn't create Features/Annotations I saw a
significant speedup.  That's not entirely unexpected, as SeqFeatures  
also contain Locations (in turn that can contain subLocations) and  
(until recently) tag-based Bio::Annotation by default.  Annotations  
are collected in an Annotation::Collection and can contain other  
objects I believe (Ontology terms, etc).

The overall lesson is, if you don't have very heavy objects being  
created the overhead is actually quite small; it's only when you  
greedily instantiate everything that you run into problems.
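
A toy example of what "not greedily instantiating" can look like in
plain Perl is below. It is only a sketch under my own assumptions: the
LightRead package and its methods are invented for illustration and are
not BioPerl classes, and the quality decoding assumes the Sanger-style
ASCII offset of 33 unless told otherwise.

```perl
package LightRead;
use strict;
use warnings;

my $decoded = 0;   # counts how often a quality string was really decoded

# Bless a plain hash directly: no parent classes, no contained objects.
sub new {
    my ($class, %args) = @_;
    my $self = {
        seq    => $args{seq},
        qual   => $args{qual},
        offset => $args{offset} || 33,
    };
    return bless $self, $class;
}

sub seq { $_[0]{seq} }

# Decode the quality string into numbers only on first request,
# then keep the result for later calls.
sub scores {
    my ($self) = @_;
    $self->{scores} ||= do {
        $decoded++;
        [ map { ord($_) - $self->{offset} } split //, $self->{qual} ];
    };
    return $self->{scores};
}

sub decode_count { $decoded }

package main;

my $read = LightRead->new(seq => 'ACGT', qual => 'II5+');
print($read->seq, "\n");                       # ACGT, nothing decoded yet
print(join(' ', @{ $read->scores }), "\n");    # 40 40 20 10
$read->scores;                                 # served from the cache
print(LightRead::decode_count(), "\n");        # 1
```

Reads that are only ever written back out never pay for the decoding
step, which is the point of deferring it.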

chris


From cjfields at illinois.edu  Wed Jun 17 19:05:03 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 17 Jun 2009 14:05:03 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <F995F500-2E4B-40B8-9C2D-2355E91DF65E@ucl.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<92C15E3391F64BAF801754E924122540@NewLife>
	<200906170927.13273.tristan.lefebure@gmail.com>
	<F8B12CF8-7A35-430F-A233-6DBA2992D19F@illinois.edu>
	<F995F500-2E4B-40B8-9C2D-2355E91DF65E@ucl.ac.uk>
Message-ID: <E92652A7-7622-4183-8DC3-596E6593C587@illinois.edu>

On Jun 17, 2009, at 12:49 PM, Elia Stupka wrote:

> I would suggest developing the "standard" version first, then moving  
> onto potential optimizations.

Yes, agreed.

> When we went through a similar argument in Ensembl about 8 years ago  
> we ended up dropping Bio::Root completely...

They (strangely enough) still use it in a few modules and require  
bioperl 1.2.3, but (in my experience) the latest bioperl works just  
fine.  I asked about that and never got a response.

> If one is truly after performance for these large next-gen projects,  
> it'd be down to pure piping, shell, and worrying about location and  
> copying of files, sticking to systems-level as much as possible, and  
> quite far from Bioperl altogether, so I think it's a whole different  
> level of optimization issues, probably outside the scope of Bioperl.
>
> Elia

In the end I don't think we can run it using perl alone, no, and I  
believe using BioPerl by itself will not be the optimal solution, but  
it can probably interface with something that is.

chris


From e.stupka at ucl.ac.uk  Wed Jun 17 19:14:04 2009
From: e.stupka at ucl.ac.uk (Elia Stupka)
Date: Wed, 17 Jun 2009 20:14:04 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <76D5EDD5-6217-438E-87A5-1B7571D14FFE@sanger.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<76D5EDD5-6217-438E-87A5-1B7571D14FFE@sanger.ac.uk>
Message-ID: <9AC2CFC1-D7E7-4B93-9671-65C30E5AA285@ucl.ac.uk>

Excellent, I was thinking of working on Maq and BowTie as priorities.

Elia

On 17 Jun 2009, at 14:28, John Marshall wrote:

> On 17 Jun 2009, at 12:29, Elia Stupka wrote:
>> Similarly, there seems to be little in bioperl-run to support tools  
>> that have been developed in this area, such as Maq, BowTie, TopHat,  
>> etc?
>
> FYI I have a Bio::Tools::Run::Velvet wrapper [1] that I plan to  
> submit in the not too distant future.  (First it needs some "blah  
> blah" replaced with actual documentation and a test suite.)
>
> Cheers,
>
>    John
>
> [1] http://www.ebi.ac.uk/~zerbino/velvet/
>



From michael.watson at bbsrc.ac.uk  Wed Jun 17 19:15:20 2009
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Wed, 17 Jun 2009 20:15:20 +0100
Subject: [Bioperl-l] Next-gen modules
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>	<92C15E3391F64BAF801754E924122540@NewLife>
	<200906170927.13273.tristan.lefebure@gmail.com>
	<4A3933D0.4040808@sendu.me.uk>
Message-ID: <8975119BCD0AC5419D61A9CF1A923E9508B291F1@iahce2ksrv1.iah.bbsrc.ac.uk>

In answer to your question, yes!  We have 6 illumina datasets which we have searched against sequence databases using fasta, and I used SearchIO to parse the results.  This is where BioPerl comes into its own - wrapped around fast, optimised solutions written in C or Java.  Sure, I could have written something in sed/awk/pure perl/C etc to parse out the information I needed faster, but the SearchIO solution only took a few minutes to parse a huge fasta results file, and for me (and many others, I suspect) a few minutes is not a problem.

 
________________________________

From: bioperl-l-bounces at lists.open-bio.org on behalf of Sendu Bala
Sent: Wed 17/06/2009 7:20 PM
To: tristan.lefebure at gmail.com
Cc: bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] Next-gen modules


Tristan Lefebure wrote:
> Hello,
> Regarding next-gen sequences and bioperl, following my
> experience, another issue is bioperl speed. For example, if
> you want to trim bad quality bases at ends of 1E6 Solexa
> reads using Bio::SeqIO::fastq and some methods in
> Bio::Seq::Quality, well, you've got to be patient (but may
> be I missed some shortcuts...).

This is my concern as well. Or, rather, is there actually a significant
set of users out there who are dealing with next-gen sequencing and
would consider using BioPerl for their work?

I'm working with all the 1000-genomes data at the Sanger, and we at
least are probably never going to use BioPerl for the work.


> A pure perl solution will be between 100 to 1000x faster...
> Would it be possible to have an ultra-light quality object
> with few simple methods for next-gen reads?

The fastq parser itself already seems pretty fast. The way to get the
speedup is to not create any Bio::Seq* objects but just return the data
directly. At that point it's not taking much advantage of BioPerl. But
certainly it could be done...


From cjfields at illinois.edu  Wed Jun 17 19:30:15 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 17 Jun 2009 14:30:15 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <4A3933D0.4040808@sendu.me.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>	<92C15E3391F64BAF801754E924122540@NewLife>
	<200906170927.13273.tristan.lefebure@gmail.com>
	<4A3933D0.4040808@sendu.me.uk>
Message-ID: <8E010693-30B2-4E54-81CA-97755FC6CAE5@illinois.edu>

On Jun 17, 2009, at 1:20 PM, Sendu Bala wrote:

> Tristan Lefebure wrote:
>> Hello,
>> Regarding next-gen sequences and bioperl, following my experience,  
>> another issue is bioperl speed. For example, if you want to trim  
>> bad quality bases at ends of 1E6 Solexa reads using  
>> Bio::SeqIO::fastq and some methods in Bio::Seq::Quality, well,  
>> you've got to be patient (but may be I missed some shortcuts...).
>
> This is my concern as well. Or, rather, is there actually a  
> significant set of users out there who are dealing with next-gen  
> sequencing and would consider using BioPerl for their work?
>
> I'm working with all the 1000-genomes data at the Sanger, and we at  
> least are probably never going to use BioPerl for the work.

Are you using pure perl or (gasp) something else?  ;>

Judging by the feedback there is definitely a set of users who would
like to integrate nextgen into bioperl somehow, probably to take  
advantage of other aspects of bioperl.

>> A pure perl solution will be between 100 to 1000x faster... Would  
>> it be possible to have an ultra-light quality object with few  
>> simple methods for next-gen reads?
>
> The fastq parser itself already seems pretty fast. The way to get  
> the speedup is to not create any Bio::Seq* objects but just return  
> the data directly. At that point it's not taking much advantage of  
> BioPerl. But certainly it could be done...


I suppose the best way to assess what needs to be done is come up with  
a set of 'use cases' specifying what users want so we can design  
around them, otherwise we're shooting in the dark.

I'm personally wondering if this could be done as a sequence database,  
something similar in theme to Lincoln's SeqFeature::Store, but  
sequence only, and returns quality objects in a similar manner (ala  
Storable)?  Not sure whether that's feasible, but it appears at
least scalable.

chris


From e.stupka at ucl.ac.uk  Wed Jun 17 19:37:26 2009
From: e.stupka at ucl.ac.uk (Elia Stupka)
Date: Wed, 17 Jun 2009 20:37:26 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <4C3D793879C64A5E84C67FE313C86FA4@NewLife>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<4C3D793879C64A5E84C67FE313C86FA4@NewLife>
Message-ID: <540FFE96-A177-4A56-A574-50052569F39E@ucl.ac.uk>

Dear all,

I tried to summarize today's discussion with what seems to be the  
"shaping consensus" on the Wiki page:

http://www.bioperl.org/wiki/Nextgen_in_Bioperl

good night,

Elia


On 17 Jun 2009, at 13:19, Mark A. Jensen wrote:

> [ and here's a new wikipage: http://www.bioperl.org/wiki/Nextgen_in_Bioperl 
>  ]
> ----- Original Message ----- From: "Elia Stupka" <e.stupka at ucl.ac.uk>
> To: <bioperl-l at lists.open-bio.org>
> Sent: Wednesday, June 17, 2009 7:29 AM
> Subject: [Bioperl-l] Next-gen modules
>
>
>> Dear all,
>> after several years of absence I am slowly coming back to Bioperl,
>> and hope to contribute again to its development.
>> One area that I was thinking of starting from, since we are actively
>> involved with it, is to improve Bioperl's support for next-gen
>> sequencing data, tools, etc. Since I am sure I have missed out on a
>> lot of recent developments, do let me know if/what is useful.
>> One example that comes to mind is that the conversion of various
>> formats to/from FASTQ does not seem to be supported. Some code can
>> be found within Li Heng's script:
>> http://maq.sourceforge.net/fq_all2std.pl
>> but it would be good if it could make its way into SeqIO? And
>> similarly, potentially, for other next-gen sequence formats?
>> Similarly, there seems to be little in bioperl-run to support tools
>> that have been developed in this area, such as Maq, BowTie, TopHat,
>> etc?
>> Do let me know if there is a past thread on this, or other people
>> actively developing, etc. so that I can find out what priorities are.
>> thanks and best regards to all (old friends and new),
>> Elia



From e.stupka at ucl.ac.uk  Wed Jun 17 20:06:35 2009
From: e.stupka at ucl.ac.uk (Elia Stupka)
Date: Wed, 17 Jun 2009 21:06:35 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <8E010693-30B2-4E54-81CA-97755FC6CAE5@illinois.edu>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>	<92C15E3391F64BAF801754E924122540@NewLife>
	<200906170927.13273.tristan.lefebure@gmail.com>
	<4A3933D0.4040808@sendu.me.uk>
	<8E010693-30B2-4E54-81CA-97755FC6CAE5@illinois.edu>
Message-ID: <0500F8EB-F3BD-44E8-801A-7A45783DA40F@ucl.ac.uk>

Interesting that you mention the database issue. We found that for  
specific memory/CPU-intensive things we also switched to using dbs. For
example, after many years of loyal use of disconnected_ranges we  
switched to a simple SQL implementation of it, because of the large  
performance gains it would give us.  Similarly in Ensembl as well as  
in the old days of bioperl-db we opted for doing subseq within SQL  
where possible.

Some lean way of SQL'izing specific components could be less  
"disruptive" than avoiding object creation and provide significant  
gains in performance. Could be set as an optional flag, and could use  
temporary ad hoc SQL databases?
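
For what it's worth, the "subseq within SQL" idea with a temporary
database might look something like the sketch below. It assumes the DBI
and DBD::SQLite modules are installed and uses an in-memory SQLite
database; the table name read_seq, the helper sql_subseq and the sample
sequence are all made up for the example and are not taken from Ensembl
or bioperl-db.

```perl
use strict;
use warnings;
use DBI;

# Temporary, ad hoc database that lives only in memory.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1, AutoCommit => 1 });

$dbh->do('CREATE TABLE read_seq (id TEXT PRIMARY KEY, seq TEXT NOT NULL)');
$dbh->do('INSERT INTO read_seq (id, seq) VALUES (?, ?)',
         undef, 'read1', 'ACGTACGTAC');

# Return the 1-based, inclusive subsequence start..end, letting the
# database do the slicing so the full sequence never reaches Perl.
sub sql_subseq {
    my ($dbh, $id, $start, $end) = @_;
    my ($sub) = $dbh->selectrow_array(
        'SELECT substr(seq, ?, ?) FROM read_seq WHERE id = ?',
        undef, $start, $end - $start + 1, $id);
    return $sub;
}

print sql_subseq($dbh, 'read1', 3, 6), "\n";   # GTAC
```

SQLite's substr() counts from 1, which happens to match the usual
1-based sequence coordinates, so no offset adjustment is needed here.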

Still, priority now is to make SeqIO compliant with all those formats,
then we can worry about performance :)

Elia

On 17 Jun 2009, at 20:30, Chris Fields wrote:

> On Jun 17, 2009, at 1:20 PM, Sendu Bala wrote:
>
>> Tristan Lefebure wrote:
>>> Hello,
>>> Regarding next-gen sequences and bioperl, following my experience,  
>>> another issue is bioperl speed. For example, if you want to trim  
>>> bad quality bases at ends of 1E6 Solexa reads using  
>>> Bio::SeqIO::fastq and some methods in Bio::Seq::Quality, well,  
>>> you've got to be patient (but may be I missed some shortcuts...).
>>
>> This is my concern as well. Or, rather, is there actually a  
>> significant set of users out there who are dealing with next-gen  
>> sequencing and would consider using BioPerl for their work?
>>
>> I'm working with all the 1000-genomes data at the Sanger, and we at  
>> least are probably never going to use BioPerl for the work.
>
> Are you using pure perl or (gasp) something else?  ;>
>
> Judging by the feedback there is definitely a set of users who
> would like to integrate nextgen into bioperl somehow, probably to  
> take advantage of other aspects of bioperl.
>
>>> A pure perl solution will be between 100 to 1000x faster... Would  
>>> it be possible to have an ultra-light quality object with few  
>>> simple methods for next-gen reads?
>>
>> The fastq parser itself already seems pretty fast. The way to get  
>> the speedup is to not create any Bio::Seq* objects but just return  
>> the data directly. At that point it's not taking much advantage of  
>> BioPerl. But certainly it could be done...
>
>
> I suppose the best way to assess what needs to be done is come up  
> with a set of 'use cases' specifying what users want so we can  
> design around them, otherwise we're shooting in the dark.
>
> I'm personally wondering if this could be done as a sequence  
> database, something similar in theme to Lincoln's SeqFeature::Store,  
> but sequence only, and returns quality objects in a similar manner  
> (ala Storable)?  Not sure whether that's feasible, but it appears
> at least scalable.
>
> chris



From maj at fortinbras.us  Wed Jun 17 20:29:31 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 17 Jun 2009 16:29:31 -0400
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <540FFE96-A177-4A56-A574-50052569F39E@ucl.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk><4C3D793879C64A5E84C67FE313C86FA4@NewLife>
	<540FFE96-A177-4A56-A574-50052569F39E@ucl.ac.uk>
Message-ID: <1C89D353AD0B4D219515BF1EAAA1FFB5@NewLife>

Thanks Elia for those wiki notes--
[I would say you received an enthusiastic 'welcome back'!]
cheers, 
Mark
----- Original Message ----- 
From: "Elia Stupka" <e.stupka at ucl.ac.uk>
To: "Mark A. Jensen" <maj at fortinbras.us>
Cc: <bioperl-l at lists.open-bio.org>
Sent: Wednesday, June 17, 2009 3:37 PM
Subject: Re: [Bioperl-l] Next-gen modules


> Dear all,
> 
> I tried to summarize today's discussion with what seems to be the  
> "shaping consensus" on the Wiki page:
> 
> http://www.bioperl.org/wiki/Nextgen_in_Bioperl
> 
> good night,
> 
> Elia
> 
> 
> On 17 Jun 2009, at 13:19, Mark A. Jensen wrote:
> 
>> [ and here's a new wikipage: http://www.bioperl.org/wiki/Nextgen_in_Bioperl 
>>  ]
>> ----- Original Message ----- From: "Elia Stupka" <e.stupka at ucl.ac.uk>
>> To: <bioperl-l at lists.open-bio.org>
>> Sent: Wednesday, June 17, 2009 7:29 AM
>> Subject: [Bioperl-l] Next-gen modules
>>
>>
>>> Dear all,
>>> after several years of absence I am slowly coming back to Bioperl,
>>> and hope to contribute again to its development.
>>> One area that I was thinking of starting from, since we are actively
>>> involved with it, is to improve Bioperl's support for next-gen
>>> sequencing data, tools, etc. Since I am sure I have missed out on a
>>> lot of recent developments, do let me know if/what is useful.
>>> One example that comes to mind is that the conversion of various
>>> formats to/from FASTQ does not seem to be supported. Some code can
>>> be found within Li Heng's script:
>>> http://maq.sourceforge.net/fq_all2std.pl
>>> but it would be good if it could make its way into SeqIO? And
>>> similarly, potentially, for other next-gen sequence formats?
>>> Similarly, there seems to be little in bioperl-run to support tools
>>> that have been developed in this area, such as Maq, BowTie, TopHat,
>>> etc?
>>> Do let me know if there is a past thread on this, or other people
>>> actively developing, etc. so that I can find out what priorities are.
>>> thanks and best regards to all (old friends and new),
>>> Elia


From cjfields at illinois.edu  Wed Jun 17 20:35:38 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 17 Jun 2009 15:35:38 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <0500F8EB-F3BD-44E8-801A-7A45783DA40F@ucl.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>	<92C15E3391F64BAF801754E924122540@NewLife>
	<200906170927.13273.tristan.lefebure@gmail.com>
	<4A3933D0.4040808@sendu.me.uk>
	<8E010693-30B2-4E54-81CA-97755FC6CAE5@illinois.edu>
	<0500F8EB-F3BD-44E8-801A-7A45783DA40F@ucl.ac.uk>
Message-ID: <3C32922E-690B-4246-BD44-7600153E57C7@illinois.edu>

So, #1 priority is to get fastq up-to-speed, then maybe assess other  
options.

Illuminating discussion, thanks Elia!

urgh, excuse unintended bad pun above...

chris

On Jun 17, 2009, at 3:06 PM, Elia Stupka wrote:

> Interesting that you mention the database issue. We found that for  
> specific memory/CPU intensive things we also switch to using dbs.  
> For example, after many years of loyal use of disconnected_ranges we  
> switched to a simple SQL implementation of it, because of the large  
> performance gains it would give us.  Similarly in Ensembl as well as  
> in the old days of bioperl-db we opted for doing subseq within SQL  
> where possible.
>
> Some lean way of SQL'izing specific components could be less  
> "disruptive" than avoiding object creation and provide significant  
> gains in performance. Could be set as an optional flag, and could  
> use temporary ad hoc SQL databases?
>
> Still, priority now is to make SeqIO compliant with all those  
> formats, then we can worry about performance :)
>
> Elia
>
> On 17 Jun 2009, at 20:30, Chris Fields wrote:
>
>> On Jun 17, 2009, at 1:20 PM, Sendu Bala wrote:
>>
>>> Tristan Lefebure wrote:
>>>> Hello,
>>>> Regarding next-gen sequences and bioperl, following my  
>>>> experience, another issue is bioperl speed. For example, if you  
>>>> want to trim bad quality bases at ends of 1E6 Solexa reads using  
>>>> Bio::SeqIO::fastq and some methods in Bio::Seq::Quality, well,  
>>>> you've got to be patient (but maybe I missed some shortcuts...).
>>>
>>> This is my concern as well. Or, rather, is there actually a  
>>> significant set of users out there who are dealing with next-gen  
>>> sequencing and would consider using BioPerl for their work?
>>>
>>> I'm working with all the 1000-genomes data at the Sanger, and we  
>>> at least are probably never going to use BioPerl for the work.
>>
>> Are you using pure perl or (gasp) something else?  ;>
>>
>> Judging by the feedback there are definitely a set of users who  
>> would like to integrate nextgen into bioperl somehow, probably to  
>> take advantage of other aspects of bioperl.
>>
>>>> A pure perl solution will be between 100 to 1000x faster... Would  
>>>> it be possible to have an ultra-light quality object with few  
>>>> simple methods for next-gen reads?
>>>
>>> The fastq parser itself already seems pretty fast. The way to get  
>>> the speedup is to not create any Bio::Seq* objects but just return  
>>> the data directly. At that point it's not taking much advantage of  
>>> BioPerl. But certainly it could be done...
>>
>>
>> I suppose the best way to assess what needs to be done is come up  
>> with a set of 'use cases' specifying what users want so we can  
>> design around them, otherwise we're shooting in the dark.
>>
>> I'm personally wondering if this could be done as a sequence  
>> database, something similar in theme to Lincoln's  
>> SeqFeature::Store, but sequence only, and returns quality objects  
>> in a similar manner (ala Storable)?  Not sure whether that's  
>> feasible, but it appears at least scalable.
>>
>> chris
>>
>>


From e.stupka at ucl.ac.uk  Wed Jun 17 20:36:31 2009
From: e.stupka at ucl.ac.uk (Elia Stupka)
Date: Wed, 17 Jun 2009 21:36:31 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <3C32922E-690B-4246-BD44-7600153E57C7@illinois.edu>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>	<92C15E3391F64BAF801754E924122540@NewLife>
	<200906170927.13273.tristan.lefebure@gmail.com>
	<4A3933D0.4040808@sendu.me.uk>
	<8E010693-30B2-4E54-81CA-97755FC6CAE5@illinois.edu>
	<0500F8EB-F3BD-44E8-801A-7A45783DA40F@ucl.ac.uk>
	<3C32922E-690B-4246-BD44-7600153E57C7@illinois.edu>
Message-ID: <69C726B6-41DC-41EC-9BC9-DFEC0267CD3B@ucl.ac.uk>

Better than colorspaced discussions for sure ;)

Elia

On 17 Jun 2009, at 21:35, Chris Fields wrote:

> So, #1 priority is to get fastq up-to-speed, then maybe assess other  
> options.
>
> Illuminating discussion, thanks Elia!
>
> urgh, excuse unintended bad pun above...
>
> chris
>
> On Jun 17, 2009, at 3:06 PM, Elia Stupka wrote:
>
>> Interesting that you mention the database issue. We found that for  
>> specific memory/CPU intensive things we also switch to using dbs.  
>> For example, after many years of loyal use of disconnected_ranges  
>> we switched to a simple SQL implementation of it, because of the  
>> large performance gains it would give us.  Similarly in Ensembl as  
>> well as in the old days of bioperl-db we opted for doing subseq  
>> within SQL where possible.
>>
>> Some lean way of SQL'izing specific components could be less  
>> "disruptive" than avoiding object creation and provide significant  
>> gains in performance. Could be set as an optional flag, and could  
>> use temporary ad hoc SQL databases?
>>
>> Still, priority now is to make SeqIO compliant with all those  
>> formats, then we can worry about performance :)
>>
>> Elia
>>
>> On 17 Jun 2009, at 20:30, Chris Fields wrote:
>>
>>> On Jun 17, 2009, at 1:20 PM, Sendu Bala wrote:
>>>
>>>> Tristan Lefebure wrote:
>>>>> Hello,
>>>>> Regarding next-gen sequences and bioperl, following my  
>>>>> experience, another issue is bioperl speed. For example, if you  
>>>>> want to trim bad quality bases at ends of 1E6 Solexa reads using  
>>>>> Bio::SeqIO::fastq and some methods in Bio::Seq::Quality, well,  
>>>>> you've got to be patient (but maybe I missed some shortcuts...).
>>>>
>>>> This is my concern as well. Or, rather, is there actually a  
>>>> significant set of users out there who are dealing with next-gen  
>>>> sequencing and would consider using BioPerl for their work?
>>>>
>>>> I'm working with all the 1000-genomes data at the Sanger, and we  
>>>> at least are probably never going to use BioPerl for the work.
>>>
>>> Are you using pure perl or (gasp) something else?  ;>
>>>
>>> Judging by the feedback there are definitely a set of users who  
>>> would like to integrate nextgen into bioperl somehow, probably to  
>>> take advantage of other aspects of bioperl.
>>>
>>>>> A pure perl solution will be between 100 to 1000x faster...  
>>>>> Would it be possible to have an ultra-light quality object with  
>>>>> few simple methods for next-gen reads?
>>>>
>>>> The fastq parser itself already seems pretty fast. The way to get  
>>>> the speedup is to not create any Bio::Seq* objects but just  
>>>> return the data directly. At that point it's not taking much  
>>>> advantage of BioPerl. But certainly it could be done...
>>>
>>>
>>> I suppose the best way to assess what needs to be done is come up  
>>> with a set of 'use cases' specifying what users want so we can  
>>> design around them, otherwise we're shooting in the dark.
>>>
>>> I'm personally wondering if this could be done as a sequence  
>>> database, something similar in theme to Lincoln's  
>>> SeqFeature::Store, but sequence only, and returns quality objects  
>>> in a similar manner (ala Storable)?  Not sure whether that's  
>>> feasible, but it appears at least scalable.
>>>
>>> chris
>>>
>>>

---
Senior Lecturer, Bioinformatics
UCL Cancer Institute
Paul O' Gorman Building
University College London
Gower Street
WC1E 6BT
London
UK

Office (UCL): +44 207 679 6493
Office (ICMS): +44 0207 8822374

Mobile: +44 7597 566 194
Mobile (Italy): +39 338 8448801


From maj at fortinbras.us  Wed Jun 17 20:54:00 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 17 Jun 2009 16:54:00 -0400
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <3C32922E-690B-4246-BD44-7600153E57C7@illinois.edu>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>	<92C15E3391F64BAF801754E924122540@NewLife><200906170927.13273.tristan.lefebure@gmail.com><4A3933D0.4040808@sendu.me.uk><8E010693-30B2-4E54-81CA-97755FC6CAE5@illinois.edu><0500F8EB-F3BD-44E8-801A-7A45783DA40F@ucl.ac.uk>
	<3C32922E-690B-4246-BD44-7600153E57C7@illinois.edu>
Message-ID: <2B2A7A587B0F488DAA18E80A1BFD671B@NewLife>

unintended! Does that mean your delete key's broke...?
----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "Elia Stupka" <e.stupka at ucl.ac.uk>
Cc: <bioperl-l at lists.open-bio.org>; <tristan.lefebure at gmail.com>
Sent: Wednesday, June 17, 2009 4:35 PM
Subject: Re: [Bioperl-l] Next-gen modules


> So, #1 priority is to get fastq up-to-speed, then maybe assess other  
> options.
> 
> Illuminating discussion, thanks Elia!
> 
> urgh, excuse unintended bad pun above...
> 
> chris
> 
> On Jun 17, 2009, at 3:06 PM, Elia Stupka wrote:
> 
>> Interesting that you mention the database issue. We found that for  
>> specific memory/CPU intensive things we also switch to using dbs.  
>> For example, after many years of loyal use of disconnected_ranges we  
>> switched to a simple SQL implementation of it, because of the large  
>> performance gains it would give us.  Similarly in Ensembl as well as  
>> in the old days of bioperl-db we opted for doing subseq within SQL  
>> where possible.
>>
>> Some lean way of SQL'izing specific components could be less  
>> "disruptive" than avoiding object creation and provide significant  
>> gains in performance. Could be set as an optional flag, and could  
>> use temporary ad hoc SQL databases?
>>
>> Still, priority now is to make SeqIO compliant with all those  
>> formats, then we can worry about performance :)
>>
>> Elia
>>
>> On 17 Jun 2009, at 20:30, Chris Fields wrote:
>>
>>> On Jun 17, 2009, at 1:20 PM, Sendu Bala wrote:
>>>
>>>> Tristan Lefebure wrote:
>>>>> Hello,
>>>>> Regarding next-gen sequences and bioperl, following my  
>>>>> experience, another issue is bioperl speed. For example, if you  
>>>>> want to trim bad quality bases at ends of 1E6 Solexa reads using  
>>>>> Bio::SeqIO::fastq and some methods in Bio::Seq::Quality, well,  
>>>>> you've got to be patient (but maybe I missed some shortcuts...).
>>>>
>>>> This is my concern as well. Or, rather, is there actually a  
>>>> significant set of users out there who are dealing with next-gen  
>>>> sequencing and would consider using BioPerl for their work?
>>>>
>>>> I'm working with all the 1000-genomes data at the Sanger, and we  
>>>> at least are probably never going to use BioPerl for the work.
>>>
>>> Are you using pure perl or (gasp) something else?  ;>
>>>
>>> Judging by the feedback there are definitely a set of users who  
>>> would like to integrate nextgen into bioperl somehow, probably to  
>>> take advantage of other aspects of bioperl.
>>>
>>>>> A pure perl solution will be between 100 to 1000x faster... Would  
>>>>> it be possible to have an ultra-light quality object with few  
>>>>> simple methods for next-gen reads?
>>>>
>>>> The fastq parser itself already seems pretty fast. The way to get  
>>>> the speedup is to not create any Bio::Seq* objects but just return  
>>>> the data directly. At that point it's not taking much advantage of  
>>>> BioPerl. But certainly it could be done...
>>>
>>>
>>> I suppose the best way to assess what needs to be done is come up  
>>> with a set of 'use cases' specifying what users want so we can  
>>> design around them, otherwise we're shooting in the dark.
>>>
>>> I'm personally wondering if this could be done as a sequence  
>>> database, something similar in theme to Lincoln's  
>>> SeqFeature::Store, but sequence only, and returns quality objects  
>>> in a similar manner (ala Storable)?  Not sure whether that's  
>>> feasible, but it appears at least scalable.
>>>
>>> chris
>>>
>>>


From hartzell at alerce.com  Wed Jun 17 20:40:03 2009
From: hartzell at alerce.com (George Hartzell)
Date: Wed, 17 Jun 2009 13:40:03 -0700
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <4A3933D0.4040808@sendu.me.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<92C15E3391F64BAF801754E924122540@NewLife>
	<200906170927.13273.tristan.lefebure@gmail.com>
	<4A3933D0.4040808@sendu.me.uk>
Message-ID: <19001.21667.127519.462899@already.dhcp.gene.com>

Sendu Bala writes:
 > Tristan Lefebure wrote:
 > > Hello,
 > > Regarding next-gen sequences and bioperl, following my 
 > > experience, another issue is bioperl speed. For example, if 
 > > you want to trim bad quality bases at ends of 1E6 Solexa 
 > > reads using Bio::SeqIO::fastq and some methods in 
 > > Bio::Seq::Quality, well, you've got to be patient (but maybe 
 > > I missed some shortcuts...).
 > 
 > This is my concern as well. Or, rather, is there actually a significant 
 > set of users out there who are dealing with next-gen sequencing and 
 > would consider using BioPerl for their work?
 > 
 > I'm working with all the 1000-genomes data at the Sanger, and we at 
 > least are probably never going to use BioPerl for the work.
 > [...]

Is it purely a speed issue, or are there other issues (e.g. stability,
correctness, compatibility) that are contributing to your decision?

What *are* you using?

g.


From bix at sendu.me.uk  Wed Jun 17 22:10:57 2009
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 17 Jun 2009 23:10:57 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <8E010693-30B2-4E54-81CA-97755FC6CAE5@illinois.edu>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>	<92C15E3391F64BAF801754E924122540@NewLife>
	<200906170927.13273.tristan.lefebure@gmail.com>
	<4A3933D0.4040808@sendu.me.uk>
	<8E010693-30B2-4E54-81CA-97755FC6CAE5@illinois.edu>
Message-ID: <4A3969F1.8080002@sendu.me.uk>

Chris Fields wrote:
> On Jun 17, 2009, at 1:20 PM, Sendu Bala wrote:
> 
>> Tristan Lefebure wrote:
>>> Hello,
>>> Regarding next-gen sequences and bioperl, following my experience, 
>>> another issue is bioperl speed. For example, if you want to trim bad 
>>> quality bases at ends of 1E6 Solexa reads using Bio::SeqIO::fastq and 
>>> some methods in Bio::Seq::Quality, well, you've got to be patient 
>>> (but maybe I missed some shortcuts...).
>>
>> This is my concern as well. Or, rather, is there actually a 
>> significant set of users out there who are dealing with next-gen 
>> sequencing and would consider using BioPerl for their work?
>>
>> I'm working with all the 1000-genomes data at the Sanger, and we at 
>> least are probably never going to use BioPerl for the work.
> 
> Are you using pure perl or (gasp) something else?  ;>

We use some perl stuff, some C stuff. My own stuff is OO perl, but much 
lighter weight than BioPerl. Absolute minimal object creation.


>>> A pure perl solution will be between 100 to 1000x faster... Would it 
>>> be possible to have an ultra-light quality object with few simple 
>>> methods for next-gen reads?
>>
>> The fastq parser itself already seems pretty fast. The way to get the 
>> speedup is to not create any Bio::Seq* objects but just return the 
>> data directly. At that point it's not taking much advantage of 
>> BioPerl. But certainly it could be done...
> 
> I suppose the best way to assess what needs to be done is come up with a 
> set of 'use cases' specifying what users want so we can design around 
> them, otherwise we're shooting in the dark.

Indeed. Though at least I think we can all agree it would be nice to 
have the functionality there even if it's slow. There will always be at 
least some use-cases where the run speed doesn't matter.


> I'm personally wondering if this could be done as a sequence database, 
> something similar in theme to Lincoln's SeqFeature::Store, but sequence 
> only, and returns quality objects in a similar manner (ala Storable)?  
> Not sure whether that's feasible, but it appears at least scalable.

I think not. Well, at least SeqFeature::Store doesn't scale. Try storing 
millions of features in a database and watch it crawl to complete 
unusability. I can't imagine a db scaling to holding hundreds of TB of 
data either. I'm also not sure what the benefit is. There are already 
high-speed ways of indexing your fastq or bam files.
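
To make the "don't create any Bio::Seq* objects but just return the data
directly" idea above concrete, here is a minimal sketch in plain Perl. It
assumes strict four-line FASTQ records and a Sanger-style quality offset
of 33 by default; Solexa/Illumina files of this period often use offset 64
and a different scale, so the offset is left as a parameter. The names
next_fastq_record and trim_ends are hypothetical and are not part of
BioPerl.

```perl
use strict;
use warnings;

# Read one four-line FASTQ record from a filehandle and return it as a
# plain hash reference (no objects). Returns nothing at end of file.
# Hypothetical helper, not a BioPerl function.
sub next_fastq_record {
    my ($fh) = @_;
    defined(my $header = <$fh>) or return;
    my $seq  = <$fh>;
    my $plus = <$fh>;
    my $qual = <$fh>;
    die "truncated FASTQ record\n" unless defined $qual;
    chomp($header, $seq, $plus, $qual);
    die "bad FASTQ header: $header\n" unless $header =~ s/^@//;
    die "sequence and quality lengths differ for $header\n"
        unless length($seq) == length($qual);
    return { id => $header, seq => $seq, qual => $qual };
}

# Trim bases from both ends while their quality score is below $min_q.
# $offset is the ASCII offset of the quality encoding (assumed 33 here).
sub trim_ends {
    my ($rec, $min_q, $offset) = @_;
    $offset = 33 unless defined $offset;
    my @q = map { ord($_) - $offset } split //, $rec->{qual};
    my ($start, $end) = (0, $#q);
    $start++ while $start <= $end && $q[$start] < $min_q;
    $end--   while $end >= $start && $q[$end]   < $min_q;
    my $len = $end - $start + 1;
    return {
        id   => $rec->{id},
        seq  => substr($rec->{seq},  $start, $len),
        qual => substr($rec->{qual}, $start, $len),
    };
}

# Usage on a small in-memory record.
my $fastq = "\@read1\nACGTACGT\n+\n##IIII#I\n";
open my $fh, '<', \$fastq or die "cannot open in-memory file: $!";
while (my $rec = next_fastq_record($fh)) {
    my $trimmed = trim_ends($rec, 20);
    print join("\t", $trimmed->{id}, $trimmed->{seq}, $trimmed->{qual}), "\n";
}
close $fh;
```

The trade-off is the one described in the thread: records come back as
bare hashes, so nothing here can be handed to code expecting a
Bio::Seq::Quality object without a conversion step.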


From bix at sendu.me.uk  Wed Jun 17 22:24:50 2009
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 17 Jun 2009 23:24:50 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <19001.21667.127519.462899@already.dhcp.gene.com>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>	<92C15E3391F64BAF801754E924122540@NewLife>	<200906170927.13273.tristan.lefebure@gmail.com>	<4A3933D0.4040808@sendu.me.uk>
	<19001.21667.127519.462899@already.dhcp.gene.com>
Message-ID: <4A396D32.5070909@sendu.me.uk>

George Hartzell wrote:
> Sendu Bala writes:
>  > Tristan Lefebure wrote:
>  > > Hello,
>  > > Regarding next-gen sequences and bioperl, following my 
>  > > experience, another issue is bioperl speed. For example, if 
>  > > you want to trim bad quality bases at ends of 1E6 Solexa 
>  > > reads using Bio::SeqIO::fastq and some methods in 
>  > > Bio::Seq::Quality, well, you've got to be patient (but maybe 
>  > > I missed some shortcuts...).
>  > 
>  > This is my concern as well. Or, rather, is there actually a significant 
>  > set of users out there who are dealing with next-gen sequencing and 
>  > would consider using BioPerl for their work?
>  > 
>  > I'm working with all the 1000-genomes data at the Sanger, and we at 
>  > least are probably never going to use BioPerl for the work.
>  > [...]
> 
> Is it purely a speed issue, or are there other issues (e.g. stability,
> correctness, compatibility) that are contributing to your decision?

Too heavy-weight, too slow, too memory intensive, missing too much 
functionality in any case. If I have to write new parsers and wrappers, 
I may as well make them fast (which means they don't "fit" into BioPerl).


> What *are* you using?

There are already great tools written in C that do all the heavy lifting 
and the rest is done in perl written for speed and low memory.


From cjfields at illinois.edu  Wed Jun 17 22:38:26 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 17 Jun 2009 17:38:26 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <4A3969F1.8080002@sendu.me.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>	<92C15E3391F64BAF801754E924122540@NewLife>
	<200906170927.13273.tristan.lefebure@gmail.com>
	<4A3933D0.4040808@sendu.me.uk>
	<8E010693-30B2-4E54-81CA-97755FC6CAE5@illinois.edu>
	<4A3969F1.8080002@sendu.me.uk>
Message-ID: <550FACEA-FE90-4160-AA44-F2706C1F4CB9@illinois.edu>

On Jun 17, 2009, at 5:10 PM, Sendu Bala wrote:

> Chris Fields wrote:
>> On Jun 17, 2009, at 1:20 PM, Sendu Bala wrote:
>>> Tristan Lefebure wrote:
>>>> Hello,
>>>> Regarding next-gen sequences and bioperl, following my  
>>>> experience, another issue is bioperl speed. For example, if you  
>>>> want to trim bad quality bases at ends of 1E6 Solexa reads using  
>>>> Bio::SeqIO::fastq and some methods in Bio::Seq::Quality, well,  
>>>> you've got to be patient (but maybe I missed some shortcuts...).
>>>
>>> This is my concern as well. Or, rather, is there actually a  
>>> significant set of users out there who are dealing with next-gen  
>>> sequencing and would consider using BioPerl for their work?
>>>
>>> I'm working with all the 1000-genomes data at the Sanger, and we  
>>> at least are probably never going to use BioPerl for the work.
>> Are you using pure perl or (gasp) something else?  ;>
>
> We use some perl stuff, some C stuff. My own stuff is OO perl, but  
> much lighter weight than BioPerl. Absolute minimal object creation.

Makes sense.

>>>> A pure perl solution will be between 100 to 1000x faster... Would  
>>>> it be possible to have an ultra-light quality object with few  
>>>> simple methods for next-gen reads?
>>>
>>> The fastq parser itself already seems pretty fast. The way to get  
>>> the speedup is to not create any Bio::Seq* objects but just return  
>>> the data directly. At that point it's not taking much advantage of  
>>> BioPerl. But certainly it could be done...
>> I suppose the best way to assess what needs to be done is come up  
>> with a set of 'use cases' specifying what users want so we can  
>> design around them, otherwise we're shooting in the dark.
>
> Indeed. Though at least I think we can all agree it would be nice to  
> have the functionality there even if it's slow. There will always be  
> at least some use-cases where the run speed doesn't matter.

Agreed.

>> I'm personally wondering if this could be done as a sequence  
>> database, something similar in theme to Lincoln's  
>> SeqFeature::Store, but sequence only, and returns quality objects  
>> in a similar manner (ala Storable)?  Not sure whether that's  
>> feasible, but it appears at least scalable.
>
> I think not. Well, at least SeqFeature::Store doesn't scale. Try  
> storing millions of features in a database and watch it crawl to  
> complete unusability. I can't imagine a db scaling to holding  
> hundreds of TB of data either. I'm also not sure what the benefit  
> is. There are already high-speed ways of indexing your fastq or bam  
> files.

Interesting that you ran into issues with SF::Store; wonder if object  
storage is the limiting factor there, or if it is something else.  
Anyone else having this issue?

chris


From cjfields at illinois.edu  Thu Jun 18 01:08:55 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 17 Jun 2009 20:08:55 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <4A396D32.5070909@sendu.me.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>	<92C15E3391F64BAF801754E924122540@NewLife>	<200906170927.13273.tristan.lefebure@gmail.com>	<4A3933D0.4040808@sendu.me.uk>
	<19001.21667.127519.462899@already.dhcp.gene.com>
	<4A396D32.5070909@sendu.me.uk>
Message-ID: <03A96F40-27CD-4D38-9A4A-04AB4CECC8DE@illinois.edu>

On Jun 17, 2009, at 5:24 PM, Sendu Bala wrote:

> George Hartzell wrote:
>> Sendu Bala writes:
>> > Tristan Lefebure wrote:
>> > > Hello,
>> > > Regarding next-gen sequences and bioperl, following my
>> > > experience, another issue is bioperl speed. For example, if
>> > > you want to trim bad quality bases at ends of 1E6 Solexa
>> > > reads using Bio::SeqIO::fastq and some methods in
>> > > Bio::Seq::Quality, well, you've got to be patient (but maybe
>> > > I missed some shortcuts...).
>> >
>> > This is my concern as well. Or, rather, is there actually a significant
>> > set of users out there who are dealing with next-gen sequencing and
>> > would consider using BioPerl for their work?
>> >
>> > I'm working with all the 1000-genomes data at the Sanger, and we at
>> > least are probably never going to use BioPerl for the work.
>> > [...]
>> Is it purely a speed issue, or are there other issues (e.g. stability,
>> correctness, compatibility) that are contributing to your decision?
>
> Too heavy-weight, too slow, too memory intensive, missing too much  
> functionality in any case. If I have to write new parsers and  
> wrappers, I may as well make them fast (which means they don't "fit"  
> into BioPerl).

That's (unfortunately) true.  It may be easy to whip up something that  
works, but it probably won't be fast.

>> What *are* you using?
>
> There are already great tools written in C that do all the heavy  
> lifting and the rest is done in perl written for speed and low memory.

Like this one?

http://www.sanger.ac.uk/Users/lh3/parsefastq.shtml

I suppose if one were inclined, this could be wrapped with SWIG in  
BioLib, but would it be worth it (maybe beyond grabbing the file  
indices)?

chris


From jbarrick at msu.edu  Thu Jun 18 03:10:43 2009
From: jbarrick at msu.edu (Jeffrey Barrick)
Date: Wed, 17 Jun 2009 23:10:43 -0400
Subject: [Bioperl-l] svn error
Message-ID: <7C1A481F-275E-4E08-AA1B-036BC708D5E1@msu.edu>

Hi all,

I've been trying to download the latest version of "bioperl-live"
through svn as per the instructions at
[http://www.bioperl.org/wiki/Using_Subversion] and I keep getting an
"svn: Found malformed header in revision file" error when it gets to
"bioperl-live/t/RemoteDB/EMBL.t", causing it to stop prematurely.

I also get the error when trying to browse that directory, for example:
http://code.open-bio.org/svnweb/index.cgi/bioperl/browse/bioperl-live/trunk/t/RemoteDB

Any ideas?

Thanks,
   --Jeff


From hlapp at gmx.net  Thu Jun 18 01:51:16 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 17 Jun 2009 20:51:16 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <0500F8EB-F3BD-44E8-801A-7A45783DA40F@ucl.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>	<92C15E3391F64BAF801754E924122540@NewLife>
	<200906170927.13273.tristan.lefebure@gmail.com>
	<4A3933D0.4040808@sendu.me.uk>
	<8E010693-30B2-4E54-81CA-97755FC6CAE5@illinois.edu>
	<0500F8EB-F3BD-44E8-801A-7A45783DA40F@ucl.ac.uk>
Message-ID: <C8873056-793B-4FEE-94EE-3341087478D1@gmx.net>


On Jun 17, 2009, at 3:06 PM, Elia Stupka wrote:

> Similarly in Ensembl as well as in the old days of bioperl-db we  
> opted for doing subseq within SQL where possible.


BTW Bioperl-db still lazy-loads sequences, and does subseq in SQL,  
unless you manipulate the sequence, or make it a non-persistent object.

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From bix at sendu.me.uk  Thu Jun 18 06:45:17 2009
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 18 Jun 2009 07:45:17 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <550FACEA-FE90-4160-AA44-F2706C1F4CB9@illinois.edu>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>	<92C15E3391F64BAF801754E924122540@NewLife>
	<200906170927.13273.tristan.lefebure@gmail.com>
	<4A3933D0.4040808@sendu.me.uk>
	<8E010693-30B2-4E54-81CA-97755FC6CAE5@illinois.edu>
	<4A3969F1.8080002@sendu.me.uk>
	<550FACEA-FE90-4160-AA44-F2706C1F4CB9@illinois.edu>
Message-ID: <4A39E27D.9040807@sendu.me.uk>

Chris Fields wrote:
> On Jun 17, 2009, at 5:10 PM, Sendu Bala wrote:
 >
>>> I'm personally wondering if this could be done as a sequence 
>>> database, something similar in theme to Lincoln's SeqFeature::Store, 
>>> but sequence only, and returns quality objects in a similar manner 
>>> (ala Storable)?  Not sure whether that's feasible, but it appears 
>>> at least scalable.
>>
>> I think not. Well, at least SeqFeature::Store doesn't scale. Try 
>> storing millions of features in a database and watch it crawl to 
>> complete unusability. I can't imagine a db scaling to holding hundreds 
>> of TB of data either. I'm also not sure what the benefit is. There are 
>> already high-speed ways of indexing your fastq or bam files.
> 
> Interesting that you ran into issues with SF::Store; wonder if object 
> storage is the limiting factor there, or if it is something else.

Object storage certainly was an issue, which is why I patched it to 
(optionally) not store objects. That helped a great deal, but ultimately 
only increased the number of features you could store before it slowed 
down; it didn't solve the problem completely.


From Xianjun.Dong at bccs.uib.no  Thu Jun 18 10:15:47 2009
From: Xianjun.Dong at bccs.uib.no (Xianjun Dong)
Date: Thu, 18 Jun 2009 12:15:47 +0200
Subject: [Bioperl-l] background layer is not supported in Bioperl 1.6
 for Bio::Graphics::Glyph
In-Reply-To: <4A33D850.1020203@ii.uib.no>
References: <4A32BCDA.4080605@ii.uib.no>	<536f21b00906121829o7a5076f6hc57eb9a4dd815e48@mail.gmail.com>	<4A339621.2060702@ii.uib.no>
	<4536f7700906130627r41c90de5r311e58f1d718f9ee@mail.gmail.com>
	<4A33D850.1020203@ii.uib.no>
Message-ID: <4A3A13D3.7050208@ii.uib.no>

Hi, Scott,

Would you mind having a look at the code (below my signature) to check 
whether I am using the -postgrid callback correctly?
I still cannot get the background drawn for the whole panel.

Thanks

Xianjun


Xianjun Dong wrote:
> Hi, Scott
>
> Before I give up on my own solution entirely and switch to GBrowse, I 
> would like to bother you once more:
>
> As you suggested, I added the -postgrid option when creating the panel, 
> which calls a function to draw the background. The code below is almost 
> copied from the online POD of Bio::Graphics::Panel (see 
> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Graphics/Panel.html 
> )
>
> But it still does not work. Could you have a look? I have pasted it 
> below. (BTW, on the above POD page, the option is given as 
> -postgrid=>\&draw_gap, while the gap-drawing function is named gap_it, 
> not draw_gap. I guess it's a typo, or not?)
>
> Thanks
>
> Xianjun
>
> ----------------------------------------------- mytestcode.pl 
> --------------------------
>
> #!/usr/bin/perl
>
> use strict;
> use lib "$ENV{HOME}/lib";
>
> use Bio::Graphics;
> use Bio::Graphics::Feature;
> my $ftr= 'Bio::Graphics::Feature';
>
> # processed_transcript
> my $trans1 = $ftr->new(-start=>50,-end=>10,-name=>'ZK154.1',-type=>"3'-UTR");
> my $trans2 = $ftr->new(-start=>100,-end=>50,-name=>'ZK154.2',-type=>'CDS');
> my $trans3 = $ftr->new(-start=>350,-end=>225,-name=>'ZK154.3',-type=>'CDS', -source=>'a');
> my $trans4 = $ftr->new(-start=>650,-end=>500,-name=>'ZK154.3',-type=>'CDS', -source=>'a');
> my $trans5 = $ftr->new(-start=>700,-end=>650,-name=>'ZK154.3',-type=>"5'-UTR");
> my $trans  = $ftr->new(-segments=>[$trans1,$trans2,$trans3,$trans4,$trans5]);
>
> # highlight
> my $trans31 = $ftr->new(-start=>240,-end=>450,-name=>'hightlight1',-type=>'background', -source=>'a');
> my $trans41 = $ftr->new(-start=>650,-end=>600,-name=>'hightlight2',-type=>'multihourglass', -source=>'b');
>
> my $panel= Bio::Graphics::Panel->new(-width=>1200,
>                                             -length=>1050,
>                                             -start =>0,
>                                             -pad_left=>12,
>                                             -pad_right=>12
>                                             -postgrid=>\&gap_it);
>
> sub gap_it {
>     my $gd    = shift;
>     my $panel = shift;
>     my ($gap_start,$gap_end) = $panel->location2pixel(500,600);
>     my $top                  = $panel->top;
>     my $bottom               = $gd->height, #panel->bottom;
>     my $gray                 = $panel->translate_color('red');
>     $gd->filledRectangle($gap_start,$top,$gap_end,$bottom,$gray);
> }
> # the following track works as I expected in bioperl 1.2.3, but not in
> # 1.5 and 1.6
> #$panel->add_track([$trans41,$trans31],
> #          -glyph   => 'background',
> #                  -block_bgcolor => sub{return (shift->source eq 'a')?'#cccccc':'#fffc22'},
> #                  );
>
> $panel->add_track($ftr->new(-start=>100,-end=>1000),
>                  -glyph=>'arrow',
>                  -double=>1,
>                  -tick=>2);
>
> $panel->add_track($trans,
>          -glyph   => 'transcript2', # 'transcript2', #process_5utr',
>                  -fgcolor => 'darkred',
>                  -bgcolor => 'darkred',
>                  -title => '$source',
>                  -link => 
> 'http://www.ensembl.org/Homo_sapiens/transview?transcript=$name',  
> #EnsEMBL
>                  );
>   print $panel->png;
>
> # the following part works in bioperl 1.5 and 1.6, but does not work in
> # Bioperl 1.2.3
> my $map = $panel->create_web_map("image");
> $panel->finished();
>
>
> Scott Cain wrote:
>> Hi Xianjun,
>>
>> I understand what you want to do, as the current version of gbrowse
>> does this, which uses bioperl 1.6.  Without digging through the code,
>> I can't tell you exactly how this works and you didn't send your code
>> that uses this callback, so I can't try it either.
>>
>> One thing that is different between your code and gbrowse is that each
>> of the tracks is actually a separate panel (to allow track dragging),
>> so it is possible that this sort of callback doesn't work for
>> Bio::Graphics any more.
>>
>> Scott
>>
>> On Saturday, June 13, 2009, Xianjun Dong <Xianjun.Dong at bccs.uib.no> 
>> wrote:
>>  
>>> Hi, Scott
>>>
>>> Thanks for your reply first.
>>>
>>> I still have a question: I dug the code out of GBrowse (pasted
>>> below). The make_postgrid_callback method collects all highlight
>>> regions and then uses the hilite_regions_closure function to draw
>>> them, using the following GD call:
>>>
>>> $gd->filledRectangle($left+$start,0,$left+$end,$bottom,
>>>                           $panel->translate_color($h_color));
>>>
>>> where $bottom=$panel->bottom. This is the only difference from
>>> my code, where I use $gd->height. I guess they are almost the same
>>> (except for pad_bottom), as we can see in the code at
>>> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Graphics/Panel.html#CODE22 
>>>
>>>
>>> OK. Anyway, I changed my highlight regions to use $panel->bottom
>>> instead of $gd->height. The output is the same when using the
>>> Bioperl 1.6 (or 1.5) library. You can see the attached image
>>> ("test.bioperl1.6.png")
>>>
>>> OK. I might not have explained my question clearly. My question
>>> is: using bioperl 1.2.3 (actually the Bio::Graphics in bioperl
>>> 1.2.3), I get the image I want (see the attached file
>>> "test.bioperl1.2.3.png"), where the highlighted range goes from the
>>> roof to the floor. In bioperl 1.5 (or 1.6), I can only see the
>>> highlighted region in its own track, not across the whole panel. Did I
>>> explain it clearly now? You can see the difference between the two images.
>>>
>>> [I am not sure the mailing list allows image attachments, so I have
>>> also put them at the following links:
>>> test.bioperl1.6.png:    http://translog.genereg.net/test.bioperl1.6.png
>>> test.bioperl1.2.3.png:    
>>> http://translog.genereg.net/test.bioperl1.2.3.png ]
>>>
>>> Could you test it and see the difference, if you have both 1.2.3 and
>>> 1.6 on your computer?
>>>
>>> I really want to know how this works in bioperl 1.2.3 (even though
>>> this might be a bug in that version, or whatever)
>>>
>>> Thanks
>>>
>>> Xianjun
>>> =============================================
>>>
>>> # this generates the callback for highlighting a region
>>> sub make_postgrid_callback {
>>>  my $settings = shift;
>>>  return unless ref $settings->{h_region};
>>>
>>>  my @h_regions = map {
>>>    my ($h_ref,$h_start,$h_end,$h_color) = 
>>> /^(.+):(\d+)\.\.(\d+)(?:@(\S+))?/;
>>>    defined($h_ref) && $h_ref eq $settings->{ref}
>>>                 ? [$h_start,$h_end,$h_color||'lightgrey']
>>>                 : ()
>>>  }
>>>    @{$settings->{h_region}};
>>>
>>>  return unless @h_regions;
>>>  return hilite_regions_closure(@h_regions);
>>> }
>>>
>>> # this subroutine generates a Bio::Graphics::Panel callback closure
>>> # suitable for hilighting a region of a panel.
>>> # The args are a list of [start,end,color]
>>> sub hilite_regions_closure {
>>>  my @h_regions = @_;
>>>
>>>  return sub {
>>>    my $gd     = shift;
>>>    my $panel  = shift;
>>>    my $left   = $panel->pad_left;
>>>    my $top    = $panel->top;
>>>    my $bottom = $panel->bottom;
>>>    for my $r (@h_regions) {
>>>      my ($h_start,$h_end,$h_color) = @$r;
>>>      my ($start,$end) = $panel->location2pixel($h_start,$h_end);
>>>      if ($end-$start <= 1) { $end++; $start-- } # so that we always see something
>>>      # assuming top is 0 so as to ignore top padding
>>>      $gd->filledRectangle($left+$start,0,$left+$end,$bottom,
>>>                           $panel->translate_color($h_color));
>>>    }
>>>  };
>>> }
>>>
>>>
>>> Scott Cain wrote:
>>>
>>> Hello Xianjun,
>>>
>>> I don't think that approach will work.  What you almost certainly need
>>> to do is a postgrid callback that does the drawing of the highlighted
>>> region.  For example code of how to do this, take a look at the
>>> make_postgrid_callback subroutine in GBrowse 1.69.  The -postgrid
>>> option is an argument to Bio::Graphics::Panel->new.
>>>
>>> Scott
>>>
>>>
>>>
>>>
>>> On Fri, Jun 12, 2009 at 4:38 PM, Xianjun 
>>> Dong<Xianjun.Dong at bccs.uib.no> wrote:
>>>
>>>
>>> HI,
>>>
>>> I am not sure this is the right place to get help.
>>>
>>> I've been struggling with a problem for several days: I want to
>>> highlight some regions in my track using a different background
>>> color. To do that, I defined a glyph named "background", based on the
>>> 'Bio::Graphics::Glyph::generic' module. I override the draw_component()
>>> method, adding code like the following:
>>>
>>> $gd->filledRectangle($left,0,$right,$gd->height,
>>> $self->factory->translate_color($color));
>>>
>>> # the script is pasted at the end
>>>
>>> This draws a rectangle with top=0 and bottom=$gd->height. I turned the
>>> highlight regions into a list of features and called add_track with
>>> -glyph=>'background' (see the following script, test.pl). This really
>>> works as I expect: it adds a colored block behind all tracks in the
>>> panel (including the ruler arrow). You can see the output image in
>>> the attached file "test.bioperl1.2.3.png"
>>>
>>> Now the problem: when I switch to Bioperl 1.5 (or 1.6), it does not
>>> work. Well, it works, but the highlight shrinks to a low height
>>> instead of covering all tracks in the panel. I also attached the output
>>> here, see the file "test.bioperl1.6.png".
>>>
>>> I tried to think about the reason. The 'background' module is based
>>> on the generic module. What can cause the difference? Is it because
>>> $gd->height is different, or because the tracks that follow the
>>> 'background' track cannot draw from the first position?
>>>
>>> Well, I could stick with Bioperl 1.2.3 to avoid the problem. ("Smart
>>> people solve problems, wise people avoid them"...) But then another
>>> problem arises: Bio::Graphics in Bioperl 1.2.3 does not support the
>>> $panel->create_web_map() function, which means I have to use a later
>>> version if I want to create a web map for my graphics, but then I
>>> have to give up the highlighted background.
>>>
>>> OK. That's long enough for my first submission here. I hope someone
>>> can give me a clue.
>>>
>>> Thanks ahead!!
>>>
>>> Xianjun
>>>
>>>
>>> ==================== test.pl =======================
>>> #!/usr/bin/perl
>>>
>>> use strict;
>>> use lib "$ENV{HOME}/lib";
>>>
>>> use Bio::Graphics;
>>> use Bio::Graphics::Feature;
>>> my $ftr= 'Bio::Graphics::Feature';
>>>
>>> # processed_transcript
>>> my $trans1 =
>>> $ftr->new(-start=>50,-end=>10,-name=>'ZK154.1',-type=>"3'-UTR");
>>> my $trans2 = 
>>> $ftr->new(-start=>100,-end=>50,-name=>'ZK154.2',-type=>'CDS');
>>> my $trans3 = 
>>> $ftr->new(-start=>350,-end=>225,-name=>'ZK154.3',-type=>'CDS',
>>> -source=>'a');
>>> my $trans4 = 
>>> $ftr->new(-start=>650,-end=>500,-name=>'ZK154.3',-type=>'CDS',
>>> -source=>'a');
>>> my $trans5 =
>>> $ftr->new(-start=>700,-end=>650,-name=>'ZK154.3',-type=>"5'-UTR");
>>> my $trans  =
>>> $ftr->new(-segments=>[$trans1,$trans2,$trans3,$trans4,$trans5]);
>>>
>>> # highlight
>>> my $trans31 =
>>> $ftr->new(-start=>240,-end=>450,-name=>'hightlight1',-type=>'background', 
>>>
>>> -source=>'a');
>>> my $trans41 =
>>> $ftr->new(-start=>650,-end=>600,-name=>'hightlight2',-type=>'multihourglass', 
>>>
>>> -source=>'b');
>>>
>>> my $panel= Bio::Graphics::Panel->new(-width=>1200,
>>>                                            -length=>1050,
>>>                                            -start =>0,
>>>                                            -pad_left=>12,
>>>                                            -pad_right=>12);
>>>
>>> # the following track works as I expected in bioperl 1.2.3, but not
>>> # in 1.5 and 1.6
>>> $panel->add_track([$trans41,$trans31],
>>>         -glyph   => 'background',
>>>                 -block_bgcolor => sub{return (shift->source eq
>>> 'a')?'#cccccc':'#fffc22'},
>>>                 );
>>>
>>> $panel->add_track($ftr->new(-start=>100,-end=>1000),
>>>                 -glyph=>'arrow',
>>>                 -double=>1,
>>>                 -tick=>2);
>>>
>>> $panel->add_track($trans,
>>>         -glyph   => 'transcript2', # 'transcript2', #process_5utr',
>>>                 -fgcolor => 'darkred',
>>>                 -bgcolor => 'darkred',
>>>                 -title => '$source',
>>>                 -link =>
>>> 'http://www.ensembl.org/Homo_sapiens/transview?transcript=$name',  
>>> #EnsEMBL
>>>                 );
>>>  print $panel->png;
>>>
>>> # the following part works in bioperl 1.5 and 1.6, but does not work
>>> # in Bioperl 1.2.3
>>> my $map = $panel->create_web_map("image");
>>> $panel->finished();
>>>
>>> 1;
>>>
>>> ==================== background.pm =======================
>>> package Bio::Graphics::Glyph::background;
>>>
>>> use strict;
>>> use base 'Bio::Graphics::Glyph::generic';
>>> sub pad_top{
>>>  return 0;
>>> }
>>>
>>> sub draw_component {
>>>  my $self = shift;
>>>  #$self->SUPER::draw_component(@_);
>>>  my ($gd,$dx,$dy) = @_;
>>>  my ($left,$top,$right,$bottom) = $self->bounds($dx,$dy);
>>>
>>>  # draw an arrow to indicate the direction of transcript
>>>  my $color = $self->option('block_bgcolor') || '#cccccc';
>>>  $gd->filledRectangle($left,0,$right,$gd->height,
>>> $self->factory->translate_color($color));
>>> }
>>>
>>> 1;
>>>
>>> -- 
>>> ==========================================
>>> Xianjun Dong
>>> PhD student, Lenhard group
>>> Computational Biology Unit
>>> Bergen Center for Computational Science
>>> University of Bergen
>>> Hoyteknologisenteret, Thormohlensgate 55
>>> N-5008 Bergen, Norway
>>> E-mail: xianjun.dong at bccs.uib.no
>>> Tel.: +47 555 84022
>>> Fax : +47 555 84295
>>> ==========================================
>>>
>>>     
>>
>>   
>

-- 
==========================================
Xianjun Dong
PhD student, Lenhard group
Computational Biology Unit
Bergen Center for Computational Science
University of Bergen
Hoyteknologisenteret, Thormohlensgate 55
N-5008 Bergen, Norway
E-mail: xianjun.dong at bccs.uib.no
Tel.: +47 555 84022
Fax : +47 555 84295
==========================================


From charles.tilford at bms.com  Thu Jun 18 13:38:34 2009
From: charles.tilford at bms.com (Charles Tilford)
Date: Thu, 18 Jun 2009 09:38:34 -0400
Subject: [Bioperl-l] Bio::SeqIO::scf traces scrambled?
Message-ID: <4A3A435A.8000505@bms.com>

Nutshell: Bio::SeqIO::scf seems to be mixing up A/C/G/T trace channels. 
Can anyone confirm?

Hi all,

I'm using the SCF Bio::SeqIO module to parse trace data out of 
chromatograms. The SCF files are being produced by phred using the "-cd" 
parameter. The traces come out great, and the corresponding base calls 
from the .phd files align with the peaks wonderfully when I visualize 
them on a rendered trace. However, only the A bases align to the 
appropriate trace channel, the rest are mixed up. I find that if I do 
the following re-mapping, the phred base calls match the trace channels:

SeqIO : Remapped
A : A
C : G
G : T
T : C

The relevant part of Bio::SeqIO::scf is here:

http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/SeqIO/scf.html#CODE9

... which indicates that it expects the pack()ed trace data to be in 
order ATGC. The base call parsing code is here:

http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/SeqIO/scf.html#CODE8

... which is unpacking in order ACGT. As far as I can tell, the relevant 
official SCF documentation is here:

http://staden.sourceforge.net/manual/formats_unix_4.html

... which indicates that both trace and base order should be ACGT 
(matching the SeqIO unpack() for bases, but not traces). My empirical 
channel unscrambling mapping implies order ACTG, which is different from 
either of the two orders above. The sequence from the SCF file (should 
be that from original AB1 file, I think) is not perfectly identical to 
that called by phred, but is very similar (to be expected); that is, I 
don't need to remap C, G and T to get it to align with the phred data.
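
A minimal sketch of that empirical re-mapping, assuming the four trace 
channels have already been pulled out of the parser into a hash keyed by 
base. The hash layout and the sub name remap_channels are hypothetical 
and are not part of Bio::SeqIO::scf; this is a workaround for inspection, 
not a fix for the module:

```perl
use strict;
use warnings;

# Empirical mapping from the table above: the channel that SeqIO labels
# with the key on the left actually holds the trace for the base on the
# right.
my %REMAP = (A => 'A', C => 'G', G => 'T', T => 'C');

# Hypothetical helper: takes { base => [trace values] } as labelled by
# the parser and returns a new hash keyed by the corrected base.
sub remap_channels {
    my ($labelled) = @_;
    my %fixed;
    for my $base (keys %REMAP) {
        $fixed{ $REMAP{$base} } = $labelled->{$base};
    }
    return \%fixed;
}

# Toy data only; real traces would come from the SCF parser.
my $fixed = remap_channels({ A => [1, 2], C => [3, 4], G => [5, 6], T => [7, 8] });
print join(',', @{ $fixed->{G} }), "\n";    # prints 3,4
```

Keeping the mapping in one hash means it can be dropped in a single place 
once the channel order in the parser is confirmed or corrected.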

So it looks like the SeqIO module is not mapping the sections of the 
packed trace data to the appropriate bases. The unpack order is 
different than the staden documentation ... but so is the order I impose 
to correct the problem. I am still unclear as to the differences between 
V2 and V3 of the format. The major difference appears to be coding the 
trace absolutely (V2) or relatively to prior values (V3); I'd expect if 
I was using one format and SeqIO was trying to parse the other that I 
would get garbage out. Running in verbose reports "scf.pm is working 
with a version 2 scf."

Thoughts on this would be appreciated - can anyone confirm a problem 
with trace extraction from SCF?

I'm hoping that once I convince our admin to (properly) install 
staden::read that I can work directly with the ab1 files, but I need to 
stopgap on SCF for the time being....

-CAT


From cjfields at illinois.edu  Thu Jun 18 15:31:08 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 18 Jun 2009 10:31:08 -0500
Subject: [Bioperl-l] Bio::SeqIO::scf traces scrambled?
In-Reply-To: <4A3A435A.8000505@bms.com>
References: <4A3A435A.8000505@bms.com>
Message-ID: <49F38F2D-4C57-4309-BB2C-3ED53E1ED9B5@illinois.edu>

Charles,

The best way to make sure this is addressed is to file a ticket (bug  
report) on it so we can properly track it.  I have a local  
installation of io_lib and I believe we also have Geneious installed  
locally (both of which read SCF), so I can work on confirming that.   
If it stays on the list it may not get answered and a possible bug  
report will be lost (to possibly bite someone else later).

AFAIK this module doesn't use staden::read but is pure perl.  You are  
more than welcome to try out Bio::SeqIO::staden::read, but I have to  
warn you that most of us are looking at replacing its functionality  
at some point with BioLib bindings to io_lib (more stable) and so we  
don't intend on following up with bug fixes.

Note: there is also Bio::SCF (non-bp):

http://search.cpan.org/~lds/Bio-SCF-1.01/

chris


From MEC at stowers.org  Thu Jun 18 15:42:48 2009
From: MEC at stowers.org (Cook, Malcolm)
Date: Thu, 18 Jun 2009 10:42:48 -0500
Subject: [Bioperl-l] Bio::SeqIO::scf traces scrambled?
In-Reply-To: <4A3A435A.8000505@bms.com>
References: <4A3A435A.8000505@bms.com>
Message-ID: <BD62CBAC4395B94096109020651BE2EC12B471A44D@exchmb-02.stowers-institute.org>

Charles,

Another possible stopgap that might work for you, if you're working with AB1 chromatograms and have ABI's kb-basecaller turned on, is to use Bio::Trace::ABIF

	http://search.cpan.org/dist/Bio-Trace-ABIF/lib/Bio/Trace/ABIF.pm

It works great and includes an implementation of ABI's algorithm, allowing you to (re)compute trace clear ranges using the kb-basecaller's quality scores and any windowing/quality parameters.

It's not in the bioperl project but it is an easy install from CPAN.

I am familiar with staden::read installation woes.  

Below is a quick script I wrote that employs it... it could be better parameterized, but it might be useful to you "out of the box"....

Malcolm Cook
Stowers Institute for Medical Research - Kansas City, Missouri
  

#!/usr/bin/env perl

# PURPOSE: extract from AB1 files into fasta format the sequence in
# the 'clear range' defined by 3 parameters.  If there is no clear
# range, emit warning and skip the sequence.  The fasta 'defline'
# identifier is taken as the sample name.  Other useful attributes are
# also embedded into the defline using attribute=value syntax.

# USAGE: ABIFqtrim $window_width $bad_bases_threshold $quality_threshold f1.ab1 ... fn.ab1

# NOTE: 20 4 20 is ABI default settings

# EXAMPLE:
# ABIFqtrim 20 4 20 /n/facility/Bioinformatics/Software/ABI/ab1_test_files/*.ab1 > ab1_test_files_trimmed.fasta

# AUTHOR: malcolm_cook at stowers-institute.org

use strict;
use warnings;
use Bio::Trace::ABIF;
use Text::Wrap qw(wrap);
$Text::Wrap::columns = 72;	# wrap the sequence

use File::Basename;
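# Take the three trimming parameters off the front of @ARGV; the
# remaining arguments are the AB1 files to process.
my ($window_width,
    $bad_bases_threshold,
    $quality_threshold) = splice(@ARGV, 0, 3);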

my $abif = Bio::Trace::ABIF->new();

sub main {
  foreach (@ARGV) {
    $abif->open_abif($_) or die "error opening $_ as ABIF";
    my ($clear_range_start,$clear_range_stop) = $abif->clear_range($window_width,
								   $bad_bases_threshold,
								   $quality_threshold
								  );
    my $sample_score = $abif->sample_score(
					   $window_width,
					   $bad_bases_threshold,
					   $quality_threshold
					  );
    #    my $contiguous_read_length = $abif->contiguous_read_length($window_width,
    #							       $quality_threshold,
    #							       0, # ==> trim_ends
    #							      );
    #    my $length_of_read = $abif->length_of_read(
    #				    $window_width,
    #				    $quality_threshold,
    #				    # $method
    #				   );
    my $defline = 
      join "\t", 
	$abif->sample_name,
	  #basename($_,qw(.ab1 .abi)), # use this to use the filename's basename in the defline
	  #$abif->container_identifier . ':' . $abif->well_id,  # or this, for container:well_id formatted defline identifiers
	  (map {my $method = $_;
		"$method=". ($abif->$method() || '')}
	   qw(sample_name comment run_name well_id container_identifier sequence_length )), #comment
	     # sample_tracking_id - don't use this - it is internal to ABI software
	     "clear_range_start=$clear_range_start",
	       "clear_range_stop=$clear_range_stop",
		 "sample_score=$sample_score",
		   #"contiguous_read_length=$contiguous_read_length",
		   #"length_of_read=$length_of_read",
		   ;
    if ($clear_range_start == -1) {
      warn "NO CLEAR RANGE! SKIPPING $_:\n\t$defline";
      next;
    }
    my $seq = wrap('','',substr($abif->sequence, $clear_range_start + 1, ($clear_range_stop + 1) - ($clear_range_start + 1) + 1));
    print ">$defline\n$seq\n";
    $abif->close_abif();

  }
}

main ();


From carze at som.umaryland.edu  Thu Jun 18 17:51:43 2009
From: carze at som.umaryland.edu (Cesar Arze)
Date: Thu, 18 Jun 2009 10:51:43 -0700 (PDT)
Subject: [Bioperl-l]  Problems parsing scientific name from a Genbank file
Message-ID: <24095355.post@talk.nabble.com>


Hi all,
   I've searched through the mailing list and bug-tracker looking for any
indication of this (what I presume to be) bug I have been encountering when
parsing certain Genbank files using SeqIO::GenBank but have yet to find
anything. I apologize in advance if this is something that has already been
addressed.

When parsing these files and extracting the scientific name it seems that
line breaks are causing the lineage info found in the ORGANISM section to be
captured as part of the scientific name. An example of this is accession
NC_005945:

  ORGANISM  Bacillus anthracis str. Sterne
            Bacteria; Firmicutes; Bacillales; Bacillaceae; Bacillus; Bacillus
            cereus group.

The lineage entry "Bacillus cereus group" is split by a line break, which
causes the scientific name to also capture "Bacteria; Firmicutes;
Bacillales; Bacillaceae; Bacillus; Bacillus", ending up with "Bacillus
anthracis str. Sterne Bacteria; Firmicutes; Bacillales; Bacillaceae;
Bacillus; Bacillus" as the final scientific name.
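
A minimal sketch of the split one would expect here, assuming the first
ORGANISM line holds the whole scientific name and every continuation line
belongs to the lineage (this does not hold for organism names long enough
to wrap). The sub name split_organism_block is hypothetical, and this is
not how Bio::SeqIO::genbank is implemented:

```perl
use strict;
use warnings;

# Hypothetical helper: split the text of an ORGANISM block into the
# scientific name (first line) and the lineage (continuation lines).
sub split_organism_block {
    my ($block) = @_;
    my ($first, @rest) = split /\n/, $block;

    (my $name = $first) =~ s/^\s*ORGANISM\s+//;
    $name =~ s/\s+$//;

    # Trim each continuation line, then join with single spaces so a
    # lineage entry wrapped across two lines is reunited.
    my @lines;
    for my $line (@rest) {
        $line =~ s/^\s+//;
        $line =~ s/\s+$//;
        push @lines, $line if length $line;
    }
    my $lineage = join ' ', @lines;
    $lineage =~ s/\.$//;    # drop the terminating period

    my @taxa = split /;\s*/, $lineage;
    return ($name, \@taxa);
}

my $block = join "\n",
    '  ORGANISM  Bacillus anthracis str. Sterne',
    '            Bacteria; Firmicutes; Bacillales; Bacillaceae; Bacillus; Bacillus',
    '            cereus group.';

my ($name, $taxa) = split_organism_block($block);
print "$name\n";          # Bacillus anthracis str. Sterne
print "$taxa->[-1]\n";    # Bacillus cereus group
```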

Not sure if anyone has ever run into this problem but I would very much
appreciate any help or direction.
-- 
View this message in context: http://www.nabble.com/Problems-parsing-scientific-name-from-a-Genbank-file-tp24095355p24095355.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From charles.tilford at bms.com  Thu Jun 18 19:59:01 2009
From: charles.tilford at bms.com (Charles Tilford)
Date: Thu, 18 Jun 2009 15:59:01 -0400
Subject: [Bioperl-l] Bio::SeqIO::scf traces scrambled?
In-Reply-To: <49F38F2D-4C57-4309-BB2C-3ED53E1ED9B5@illinois.edu>
References: <4A3A435A.8000505@bms.com>
	<49F38F2D-4C57-4309-BB2C-3ED53E1ED9B5@illinois.edu>
Message-ID: <4A3A9C85.4000603@bms.com>

Chris Fields wrote:
> Charles,
>
> The best way to make sure this is addressed is to file a ticket (bug  
> report) on it so we can properly track it.
Ok, I'll put that in.
>
> AFAIK this module doesn't use staden::read but is pure perl. 
Yes, that's my understanding too. I'm using the SeqIO module because of 
ongoing hiccups with the staden installation.
> Note: there is also Bio::SCF (non-bp):
>
> http://search.cpan.org/~lds/Bio-SCF-1.01/
>   
I have that installed, but have not tried it out yet.

Thanks!
-CAT


From charles.tilford at bms.com  Thu Jun 18 20:02:53 2009
From: charles.tilford at bms.com (Charles Tilford)
Date: Thu, 18 Jun 2009 16:02:53 -0400
Subject: [Bioperl-l] Bio::SeqIO::scf traces scrambled?
In-Reply-To: <BD62CBAC4395B94096109020651BE2EC12B471A44D@exchmb-02.stowers-institute.org>
References: <4A3A435A.8000505@bms.com>
	<BD62CBAC4395B94096109020651BE2EC12B471A44D@exchmb-02.stowers-institute.org>
Message-ID: <4A3A9D6D.2010106@bms.com>

Cook, Malcolm wrote:
> Charles,
>
> Another possible stopgap that might work for you, if you're working with AB1 chromatograms and have ABI's kb-basecaller turned on, is to use Bio::Trace::ABIF
>
> 	http://search.cpan.org/dist/Bio-Trace-ABIF/lib/Bio/Trace/ABIF.pm
>
> It works great and includes implementation of ABIs algorithm allowing to (re)compute trace clear ranges using kc-basecallers quality scores and any windowing/quality parameters.
>
> Its not in the bioperl project but it is an easy install from CPAN.
>   
Thanks - we installed that a few weeks ago, and it was on my list of 
things to try, but I had not gotten to it yet since I was getting data 
out of the SCF SeqIO module. Even though the SeqIO::scf data looks ok, 
the fact that I need to unscramble it makes me nervous... Thanks, too, 
for the example code. I'll try out the Bio::Trace::ABIF module and see 
if it works with our files.

Thanks,
CAT
> I am familiar with staden::read installation woes.  
>
> Below is a quick script I wrote that employs it... it could be better parameterized, but it might be useful to you "out of the box"....
>
> Malcolm Cook
> Stowers Institute for Medical Research - Kansas City, Missouri
>   
>
> #!/usr/bin/env perl
>
> # PURPOSE: extract from AB1 files into fasta format the sequence in
> # the 'clear range' defined by 3 parameters.  If there is no clear
> # range, emit warning and skip the sequence.  The fasta 'defline'
> # identifier is taken as the sample name.  Other useful attributes are
> # also embedded into the defline using attribute=value syntax.
>
> # USAGE: ABIFqtrim $window_width $bad_bases_threshold $quality_threshold f1.ab1 ... fn.ab1
>
> # NOTE: 20 4 20 are the ABI default settings
>
> # EXAMPLE:
> # ABIFqtrim 20 4 20 /n/facility/Bioinformatics/Software/ABI/ab1_test_files/*.ab1 > ab1_test_files_trimmed.fasta
>
> # AUTHOR: malcolm_cook at stowers-institute.org
>
> use strict;
> use warnings;
> use Bio::Trace::ABIF;
> use Text::Wrap qw(wrap);
> $Text::Wrap::columns = 72;	# wrap the sequence
>
> use File::Basename;
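> # take the three numeric parameters off the front of @ARGV;
> # whatever remains in @ARGV is the list of .ab1 files
> my ($window_width,
>     $bad_bases_threshold,
>     $quality_threshold) = splice(@ARGV, 0, 3);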
>
> my $abif = Bio::Trace::ABIF->new();
>
> sub main {
>   foreach (@ARGV) {
>     $abif->open_abif($_) or die "error opening $_ as ABIF";
>     my ($clear_range_start,$clear_range_stop) = $abif->clear_range($window_width,
> 								   $bad_bases_threshold,
> 								   $quality_threshold
> 								  );
>     my $sample_score = $abif->sample_score(
> 					   $window_width,
> 					   $bad_bases_threshold,
> 					   $quality_threshold
> 					  );
>     #    my $contiguous_read_length = $abif->contiguous_read_length($window_width,
>     #							       $quality_threshold,
>     #							       0, # ==> trim_ends
>     #							      );
>     #    my $length_of_read = $abif->length_of_read(
>     #				    $window_width,
>     #				    $quality_threshold,
>     #				    # $method
>     #				   );
>     my $defline = 
>       join "\t", 
> 	$abif->sample_name,
> 	  #basename($_,qw(.ab1 .abi)), # use this to use the filename's basename in the defline
> 	  #$abif->container_identifier . ':' . $abif->well_id,  # or this, for container:well_id formatted defline identifiers
> 	  (map {my $method = $_;
> 		"$method=". ($abif->$method() || '')}
> 	   qw(sample_name comment run_name well_id container_identifier sequence_length )), #comment
> 	     # sample_tracking_id - don't use this - it is internal to ABI software
> 	     "clear_range_start=$clear_range_start",
> 	       "clear_range_stop=$clear_range_stop",
> 		 "sample_score=$sample_score",
> 		   #"contiguous_read_length=$contiguous_read_length",
> 		   #"length_of_read=$length_of_read",
> 		   ;
>     if ($clear_range_start == -1) {
>       warn "NO CLEAR RANGE! SKIPPING $_:\n\t$defline";
>       next;
>     }
>     my $seq = wrap('','',substr($abif->sequence, $clear_range_start + 1, ($clear_range_stop + 1) - ($clear_range_start + 1) + 1));
>     print ">$defline\n$seq\n";
>     $abif->close_abif();
>
>   }
> }
>
> main ();
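The defline and wrapping steps of the script above can be tried without any AB1 file. Below is a minimal sketch, assuming a hypothetical helper fasta_record() and made-up attribute values (neither comes from Bio::Trace::ABIF); Text::Wrap is the only module it needs.

```perl
use strict;
use warnings;
use Text::Wrap qw(wrap);

$Text::Wrap::columns = 72;    # same wrap width as the script above

# Hypothetical helper: build one FASTA record whose defline carries
# tab-separated attribute=value pairs after the identifier.
sub fasta_record {
    my ($id, $attrs, $seq) = @_;
    my @pairs = map { "$_=" . (defined $attrs->{$_} ? $attrs->{$_} : '') }
                sort keys %$attrs;
    my $defline = join "\t", $id, @pairs;
    return ">$defline\n" . wrap('', '', $seq) . "\n";
}

# Made-up values, for illustration only.
print fasta_record('sample01',
                   { well_id => 'A01', clear_range_start => 12 },
                   'ACGT' x 40);
```

Sorting the attribute names keeps the defline stable from run to run, which makes the output easier to diff; the original script fixes the order by listing the method names explicitly instead.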
>
>
>
>
>
>
>   
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org 
>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
>> Charles Tilford
>> Sent: Thursday, June 18, 2009 8:39 AM
>> To: BioPerl List
>> Subject: [Bioperl-l] Bio::SeqIO::scf traces scrambled?
>>
>> Nutshell: Bio::SeqIO::scf seems to be mixing up A/C/G/T trace 
>> channels. 
>> Can anyone confirm?
>>
>> Hi all,
>>
>> I'm using the SCF Bio::SeqIO module to parse trace data out 
>> of chromatograms. The SCF files are being produced by phred 
>> using the "-cd" 
>> parameter. The traces come out great, and the corresponding 
>> base calls from the .phd files align with the peaks 
>> wonderfully when I visualize them on a rendered trace. 
>> However, only the A bases align to the appropriate trace 
>> channel, the rest are mixed up. I find that if I do the 
>> following re-mapping, the phred base calls match the
>>
>> SeqIO : Remapped
>> A : A
>> C : G
>> G : T
>> T : C
>>
>> The relevant part of Bio::SeqIO::scf is here:
>>
>> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/B
>> io/SeqIO/scf.html#CODE9
>>
>> ... which indicates that it expects the pack()ed trace data 
>> to be in order ATGC. The base call parsing code is here:
>>
>> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/B
>> io/SeqIO/scf.html#CODE8
>>
>> ... which is unpacking in order ACGT. As far as I can tell, 
>> the relevant official SCF documentation is here:
>>
>> http://staden.sourceforge.net/manual/formats_unix_4.html
>>
>> ... which indicates that both trace and base order should be 
>> ACGT (matching the SeqIO unpack() for bases, but not traces). 
>> My empirical channel unscrambling mapping implies order ACTG, 
>> which is different from either of the two orders above. The 
>> sequence from the SCF file (should be that from original AB1 
>> file, I think) is not perfectly identical to that called by 
>> phred, but is very similar (to be expected); that is, I don't 
>> need to remap C, G and T to get it to align with the phred data.
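For reference, the version 2 sample layout as described in the Staden documentation can be sketched in a few lines. This assumes 1-byte samples and a hypothetical helper split_v2_samples(); it is not the code in Bio::SeqIO::scf, only an illustration of the documented A, C, G, T interleaving (one 4-value record per sample point).

```perl
use strict;
use warnings;

# Hypothetical helper: split an SCF v2 sample block (1-byte samples,
# records of A,C,G,T per sample point) into four per-base channels.
sub split_v2_samples {
    my ($packed) = @_;
    my @values = unpack 'C*', $packed;       # unsigned bytes
    my @order  = qw(A C G T);                # order given in the SCF docs
    my %trace  = map { $_ => [] } @order;
    for my $i (0 .. $#values) {
        push @{ $trace{ $order[ $i % 4 ] } }, $values[$i];
    }
    return \%trace;
}

# Two sample points: (A=1,C=2,G=3,T=4) and (A=5,C=6,G=7,T=8).
my $trace = split_v2_samples(pack 'C*', 1 .. 8);
print "$_: @{ $trace->{$_} }\n" for qw(A C G T);
```

If a parser assigned these four slots to bases in some other order, the traces would still look like clean chromatograms, just under the wrong labels, which would be consistent with the symptom described here.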
>>
>> So it looks like the SeqIO module is not mapping the sections 
>> of the packed trace data to the appropriate bases. The unpack 
>> order is different than the staden documentation ... but so 
>> is the order I impose to correct the problem. I am still 
>> unclear as to the differences between
>> V2 and V3 of the format. The major difference appears to be 
>> coding the trace absolutely (V2) or relatively to prior 
>> values (V3); I'd expect if I was using one format and SeqIO 
>> was trying to parse the other that I would get garbage out. 
>> Running in verbose reports "scf.pm is working with a version 2 scf."
>>
>> Thoughts on this would be appreciated - can anyone confirm a 
>> problem with trace extraction from SCF?
>>
>> I'm hoping that once I convince our admin to (properly) 
>> install staden::read that I can work directly with the ab1 
>> files, but I need to stopgap on SCF for the time being....
>>
>> -CAT
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>     


From cjfields at illinois.edu  Thu Jun 18 20:27:02 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 18 Jun 2009 15:27:02 -0500
Subject: [Bioperl-l] Bio::SeqIO::scf traces scrambled?
In-Reply-To: <4A3A9D6D.2010106@bms.com>
References: <4A3A435A.8000505@bms.com>
	<BD62CBAC4395B94096109020651BE2EC12B471A44D@exchmb-02.stowers-institute.org>
	<4A3A9D6D.2010106@bms.com>
Message-ID: <2A9A3AB7-7773-48F1-993C-A679495D0B95@illinois.edu>


On Jun 18, 2009, at 3:02 PM, Charles Tilford wrote:

> Cook, Malcolm wrote:
>> Charles,
>>
>> Another possible stopgap that might work for you, if you're working
>> with AB1 chromatograms and have ABI's KB basecaller turned on, is to
>> use Bio::Trace::ABIF
>>
>> 	http://search.cpan.org/dist/Bio-Trace-ABIF/lib/Bio/Trace/ABIF.pm
>>
>> It works great and includes an implementation of ABI's algorithm,
>> allowing you to (re)compute trace clear ranges using the KB
>> basecaller's quality scores and any windowing/quality parameters.
>>
>> It's not in the bioperl project but it is an easy install from CPAN.
>>
> Thanks - we installed that a few weeks ago, and it was on my list of  
> things to try, but I had not gotten to it yet since I was getting  
> data out of the SCF SeqIO module. Even though the SeqIO::scf data  
> looks ok, the fact that I need to unscramble it makes me nervous...  
> Thanks, too, for the example code. I'll try out the Bio::Trace::ABIF  
> module and see if it works with our files.
>
> Thanks,
> CAT

You definitely shouldn't need to unscramble it; my guess is this is a  
legit bug that just has gone unnoticed.  I see that you have filed a  
ticket on it so we can at least track it.  Thanks!
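If a stopgap is needed until the ticket is resolved, the remapping table from the original report (A:A, C:G, G:T, T:C) amounts to relabelling the channels. A minimal sketch, assuming the traces are held in a hash keyed by the base label SeqIO reported; that data layout and the name unscramble_traces() are illustrative and not part of Bio::SeqIO::scf.

```perl
use strict;
use warnings;

# SeqIO label => base whose calls actually line up with that channel,
# taken from the remapping table in the original report.
my %remap = (A => 'A', C => 'G', G => 'T', T => 'C');

# Hypothetical helper: return a new hash with the channels relabelled.
sub unscramble_traces {
    my ($traces) = @_;
    my %fixed;
    $fixed{ $remap{$_} } = $traces->{$_} for keys %remap;
    return \%fixed;
}

# Made-up two-point traces, for illustration only.
my $fixed = unscramble_traces(
    { A => [10, 0], C => [0, 20], G => [5, 5], T => [1, 2] }
);
print "$_: @{ $fixed->{$_} }\n" for qw(A C G T);
```

Keeping the mapping in one hash means it can be deleted in one place once the parser itself is fixed.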

chris


From scott at scottcain.net  Fri Jun 19 03:25:35 2009
From: scott at scottcain.net (Scott Cain)
Date: Thu, 18 Jun 2009 23:25:35 -0400
Subject: [Bioperl-l] background layer is not supported in Bioperl 1.6
	for Bio::Graphics::Glyph
In-Reply-To: <4A3A13D3.7050208@ii.uib.no>
References: <4A32BCDA.4080605@ii.uib.no>
	<536f21b00906121829o7a5076f6hc57eb9a4dd815e48@mail.gmail.com>
	<4A339621.2060702@ii.uib.no>
	<4536f7700906130627r41c90de5r311e58f1d718f9ee@mail.gmail.com>
	<4A33D850.1020203@ii.uib.no> <4A3A13D3.7050208@ii.uib.no>
Message-ID: <4536f7700906182025m1d67afa2y2a62a30d6cc9b19d@mail.gmail.com>

Hi Xianjun,

The attached script (which is not too different from yours--I only did
a little clean up and made the padding consistent) makes the attached
image, which is what I think you want.  I'm using bioperl-live.
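One general pitfall with option lists like the one passed to Bio::Graphics::Panel->new: if the comma before -postgrid is missing, Perl does not stop under strict. It parses `12 -postgrid` as a subtraction, and the option name never reaches the constructor. A minimal plain-Perl sketch of that effect (gap_it here is an empty placeholder, and no Bio::Graphics code is involved):

```perl
use strict;
use warnings;

sub gap_it { }    # empty placeholder standing in for a real callback

# Comma missing after 12: "12 -postgrid" is parsed as 12 - 'postgrid'.
my @without_comma = do {
    no warnings 'numeric';    # hide the "isn't numeric" warning
    ( -pad_right => 12
      -postgrid  => \&gap_it );
};

# Comma present: two option/value pairs.
my @with_comma = ( -pad_right => 12,
                   -postgrid  => \&gap_it );

for my $case ([ 'without comma', \@without_comma ],
              [ 'with comma',    \@with_comma ]) {
    my ($name, $list) = @$case;
    my $seen = grep { !ref($_) && $_ eq '-postgrid' } @$list;
    printf "%-14s %d elements, -postgrid present: %s\n",
        "$name:", scalar(@$list), $seen ? 'yes' : 'no';
}
```

Running with warnings enabled (and without the `no warnings` line) is the quickest way to spot this, since Perl then reports the non-numeric argument in the subtraction.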

Scott


On Thu, Jun 18, 2009 at 6:15 AM, Xianjun Dong<Xianjun.Dong at bccs.uib.no> wrote:
> Hi, Scott,
>
> Would you mind having a look at the code (below my signature) to check whether
> I use the -postgrid callback correctly?
> I still cannot get the background for the whole panel.
>
> Thanks
>
> Xianjun
>
>
> Xianjun Dong wrote:
>>
>> Hi, Scott
>>
>> Before I give up on my own solution and switch to GBrowse, I want to
>> bother you once more:
>>
>> As you suggested, I pass the -postgrid option when creating the panel; it
>> should call a function that draws the background. The code below is copied
>> almost verbatim from the online POD of Bio::Graphics::Panel (see
>> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Graphics/Panel.html
>> )
>>
>> But it still does not work. Could you have a look? I paste it
>> below. (BTW, on the POD page above, the option is -postgrid=>\&draw_gap, while
>> the gap-drawing function is named gap_it, not draw_gap. I guess that's a typo?)
>>
>> Thanks
>>
>> Xianjun
>>
>> ----------------------------------------------- mytestcode.pl
>> --------------------------
>>
>> #!/usr/bin/perl
>>
>> use strict;
>> use lib "$ENV{HOME}/lib";
>>
>> use Bio::Graphics;
>> use Bio::Graphics::Feature;
>> my $ftr= 'Bio::Graphics::Feature';
>>
>> # processed_transcript
>> my $trans1 =
>> $ftr->new(-start=>50,-end=>10,-name=>'ZK154.1',-type=>"3'-UTR");
>> my $trans2 =
>> $ftr->new(-start=>100,-end=>50,-name=>'ZK154.2',-type=>'CDS');
>> my $trans3 =
>> $ftr->new(-start=>350,-end=>225,-name=>'ZK154.3',-type=>'CDS',
>> -source=>'a');
>> my $trans4 =
>> $ftr->new(-start=>650,-end=>500,-name=>'ZK154.3',-type=>'CDS',
>> -source=>'a');
>> my $trans5 =
>> $ftr->new(-start=>700,-end=>650,-name=>'ZK154.3',-type=>"5'-UTR");
>> my $trans  =
>> $ftr->new(-segments=>[$trans1,$trans2,$trans3,$trans4,$trans5]);
>>
>> # highlight
>> my $trans31 =
>> $ftr->new(-start=>240,-end=>450,-name=>'hightlight1',-type=>'background',
>> -source=>'a');
>> my $trans41 =
>> $ftr->new(-start=>650,-end=>600,-name=>'hightlight2',-type=>'multihourglass',
>> -source=>'b');
>>
>> my $panel= Bio::Graphics::Panel->new(-width=>1200,
>>                                      -length=>1050,
>>                                      -start =>0,
>>                                      -pad_left=>12,
>>                                      -pad_right=>12,  # the comma is required; without it -postgrid is never passed
>>                                      -postgrid=>\&gap_it);
>>
>> sub gap_it {
>>    my $gd    = shift;
>>    my $panel = shift;
>>    my ($gap_start,$gap_end) = $panel->location2pixel(500,600);
>>    my $top                  = $panel->top;
>>    my $bottom               = $gd->height; #panel->bottom;
>>    my $gray                 = $panel->translate_color('red');
>>    $gd->filledRectangle($gap_start,$top,$gap_end,$bottom,$gray);
>> }
>> # the following track works as I expected in bioperl 1.2.3, but not in 1.5 and 1.6
>> #$panel->add_track([$trans41,$trans31],
>> #                  -glyph         => 'background',
>> #                  -block_bgcolor => sub{return (shift->source eq 'a')?'#cccccc':'#fffc22'},
>> #                  );
>>
>> $panel->add_track($ftr->new(-start=>100,-end=>1000),
>>                   -glyph=>'arrow',
>>                   -double=>1,
>>                   -tick=>2);
>>
>> $panel->add_track($trans,
>>                   -glyph   => 'transcript2', # 'transcript2', #process_5utr',
>>                   -fgcolor => 'darkred',
>>                   -bgcolor => 'darkred',
>>                   -title   => '$source',
>>                   -link    => 'http://www.ensembl.org/Homo_sapiens/transview?transcript=$name',  #EnsEMBL
>>                   );
>> print $panel->png;
>>
>> # the following part works in bioperl 1.5 and 1.6, but does not work in Bioperl 1.2.3
>> my $map = $panel->create_web_map("image");
>> $panel->finished();
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> Scott Cain wrote:
>>>
>>> Hi Xianjun,
>>>
>>> I understand what you want to do, as the current version of gbrowse
>>> does this, which uses bioperl 1.6.  Without digging through the code,
>>> I can't tell you exactly how this works and you didn't send your code
>>> that uses this callback, so I can't try it either.
>>>
>>> One thing that is different between your code and gbrowse is that in
>>> gbrowse each of the tracks is actually a separate panel (to allow track
>>> dragging), so it is possible that this sort of callback doesn't work for
>>> Bio::Graphics any more.
>>>
>>> Scott
>>>
>>> On Saturday, June 13, 2009, Xianjun Dong <Xianjun.Dong at bccs.uib.no>
>>> wrote:
>>>
>>>>
>>>> Hi, Scott
>>>>
>>>> Thanks for your reply first.
>>>>
>>>> I still have a question. I dug the code out of GBrowse (pasted
>>>> below). The method make_postgrid_callback collects all highlight regions
>>>> and then uses hilite_regions_closure to draw them with the following GD
>>>> call:
>>>>
>>>> $gd->filledRectangle($left+$start,0,$left+$end,$bottom,
>>>>                      $panel->translate_color($h_color));
>>>>
>>>> where $bottom=$panel->bottom. This is the only difference from my
>>>> code, where I use $gd->height. I guess they are almost the same (except
>>>> for pad_bottom); we can see this in the code at
>>>> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Graphics/Panel.html#CODE22
>>>>
>>>> Anyway, I changed my highlight regions to use $panel->bottom instead of
>>>> $gd->height. The output is the same when using the Bioperl 1.6 (or 1.5)
>>>> library. You can see the attached image ("test.bioperl1.6.png").
>>>>
>>>> I might not have explained my question clearly. My question is:
>>>> with bioperl 1.2.3 (actually the Bio::Graphics in bioperl 1.2.3), I
>>>> get the image I want (see the attached file "test.bioperl1.2.3.png"),
>>>> where the highlight region runs from the top of the panel to the bottom.
>>>> In bioperl 1.5 (or 1.6), I only see the highlight region in its own track,
>>>> not across the whole panel. You can see the difference between the two
>>>> images.
>>>>
>>>> [I am not sure the mailing list allows image attachments, so I have also
>>>> put them at the following links:
>>>> test.bioperl1.6.png:   http://translog.genereg.net/test.bioperl1.6.png
>>>> test.bioperl1.2.3.png: http://translog.genereg.net/test.bioperl1.2.3.png ]
>>>>
>>>> Could you test it and see the difference, if you have both 1.2.3 and 1.6
>>>> on your computer?
>>>>
>>>> I would really like to know how this works in bioperl 1.2.3 (even if it
>>>> turns out to be a bug in that version).
>>>>
>>>> Thanks
>>>>
>>>> Xianjun
>>>> =============================================
>>>>
>>>> # this generates the callback for highlighting a region
>>>> sub make_postgrid_callback {
>>>>   my $settings = shift;
>>>>   return unless ref $settings->{h_region};
>>>>
>>>>   my @h_regions = map {
>>>>     my ($h_ref,$h_start,$h_end,$h_color) = /^(.+):(\d+)\.\.(\d+)(?:@(\S+))?/;
>>>>     defined($h_ref) && $h_ref eq $settings->{ref}
>>>>                 ? [$h_start,$h_end,$h_color||'lightgrey']
>>>>                 : ()
>>>>   }
>>>>     @{$settings->{h_region}};
>>>>
>>>>   return unless @h_regions;
>>>>   return hilite_regions_closure(@h_regions);
>>>> }
>>>>
>>>> # this subroutine generates a Bio::Graphics::Panel callback closure
>>>> # suitable for hilighting a region of a panel.
>>>> # The args are a list of [start,end,color]
>>>> sub hilite_regions_closure {
>>>>   my @h_regions = @_;
>>>>
>>>>   return sub {
>>>>     my $gd     = shift;
>>>>     my $panel  = shift;
>>>>     my $left   = $panel->pad_left;
>>>>     my $top    = $panel->top;
>>>>     my $bottom = $panel->bottom;
>>>>     for my $r (@h_regions) {
>>>>       my ($h_start,$h_end,$h_color) = @$r;
>>>>       my ($start,$end) = $panel->location2pixel($h_start,$h_end);
>>>>       if ($end-$start <= 1) { $end++; $start-- } # so that we always see something
>>>>       # assuming top is 0 so as to ignore top padding
>>>>       $gd->filledRectangle($left+$start,0,$left+$end,$bottom,
>>>>                            $panel->translate_color($h_color));
>>>>     }
>>>>   };
>>>> }
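Judging by the regular expression in make_postgrid_callback above, the region strings it parses look like ref:start..end with an optional @color suffix. A minimal stand-alone sketch of just that parsing step follows; parse_h_region() and the example strings are made up for illustration and are not part of GBrowse.

```perl
use strict;
use warnings;

# Hypothetical helper: parse one "ref:start..end[@color]" string and
# return [start, end, color] if it refers to the wanted reference
# sequence, or an empty list otherwise.
sub parse_h_region {
    my ($spec, $wanted_ref) = @_;
    my ($h_ref, $h_start, $h_end, $h_color) =
        $spec =~ /^(.+):(\d+)\.\.(\d+)(?:\@(\S+))?/;
    return unless defined $h_ref && $h_ref eq $wanted_ref;
    return [ $h_start, $h_end, $h_color || 'lightgrey' ];
}

# Made-up region strings, for illustration only.
for my $spec ('chrI:500..600@red', 'chrI:10..20', 'chrII:1..5@blue') {
    my ($region) = parse_h_region($spec, 'chrI');
    print "$spec => ", ($region ? "@$region" : 'skipped'), "\n";
}
```

The default color is applied only when the @color part is absent, and regions on a different reference sequence are dropped, which mirrors what the map block in the quoted code does.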
>>>>
>>>>
>>>> Scott Cain wrote:
>>>>
>>>> Hello Xianjun,
>>>>
>>>> I don't think that approach will work.  What you almost certainly need
>>>> to do is a postgrid callback that does the drawing of the highlighted
>>>> region.  For example code of how to do this, take a look at the
>>>> make_postgrid_callback subroutine in GBrowse 1.69.  -postgrid is an
>>>> option of Bio::Graphics::Panel->new.
>>>>
>>>> Scott
>>>>
>>>>
>>>>
>>>>
>>>> On Fri, Jun 12, 2009 at 4:38 PM, Xianjun Dong<Xianjun.Dong at bccs.uib.no>
>>>> wrote:
>>>>
>>>>
>>>> Hi,
>>>>
>>>> I am not sure this is the right place to get help.
>>>>
>>>> I've been struggling with a problem for several days: I want to highlight
>>>> some regions in my track using a different background color. To do that, I
>>>> defined a glyph named "background", based on the
>>>> 'Bio::Graphics::Glyph::generic' module. I override the draw_component()
>>>> method, adding code like this:
>>>>
>>>> $gd->filledRectangle($left,0,$right,$gd->height,
>>>> $self->factory->translate_color($color));
>>>>
>>>> # the script is pasted at the end
>>>>
>>>> This draws a rectangle with top=0, bottom=$gd->height. I turn the
>>>> highlight regions into a list of features and call add_track with
>>>> -glyph=>'background' (see the script test.pl below). This works
>>>> as I expect: it adds a colored block behind all tracks in the
>>>> panel (including the ruler arrow). You can see the output image in the
>>>> attached file "test.bioperl1.2.3.png".
>>>>
>>>> Now the problem: when I switch to Bioperl 1.5 (or 1.6), it no longer
>>>> works. Or rather, it runs, but the highlight shrinks to a low height
>>>> instead of covering all tracks in the panel. I have attached that output
>>>> too; see the file "test.bioperl1.6.png".
>>>>
>>>> I tried to work out the reason. The 'background' module is based on the
>>>> generic module, so what can cause the difference? Is it because
>>>> $gd->height is different, or because the tracks that follow the
>>>> 'background' track cannot be drawn from the first position?
>>>>
>>>> I could stick with Bioperl 1.2.3 to avoid the problem ("Smart people
>>>> solve problems, wise people avoid them"...), but that raises another
>>>> problem: Bio::Graphics in Bioperl 1.2.3 does not support the
>>>> $panel->create_web_map() function, so I have to use a later version if I
>>>> want to create a web map for my graphics, and then I have to give up the
>>>> highlighted background.
>>>>
>>>> That is long enough for my first submission here. I hope someone can
>>>> give me a clue.
>>>>
>>>> Thanks in advance!
>>>>
>>>> Xianjun
>>>>
>>>>
>>>> ==================== test.pl =======================
>>>> #!/usr/bin/perl
>>>>
>>>> use strict;
>>>> use lib "$ENV{HOME}/lib";
>>>>
>>>> use Bio::Graphics;
>>>> use Bio::Graphics::Feature;
>>>> my $ftr= 'Bio::Graphics::Feature';
>>>>
>>>> # processed_transcript
>>>> my $trans1 =
>>>> $ftr->new(-start=>50,-end=>10,-name=>'ZK154.1',-type=>"3'-UTR");
>>>> my $trans2 =
>>>> $ftr->new(-start=>100,-end=>50,-name=>'ZK154.2',-type=>'CDS');
>>>> my $trans3 =
>>>> $ftr->new(-start=>350,-end=>225,-name=>'ZK154.3',-type=>'CDS',
>>>> -source=>'a');
>>>> my $trans4 =
>>>> $ftr->new(-start=>650,-end=>500,-name=>'ZK154.3',-type=>'CDS',
>>>> -source=>'a');
>>>> my $trans5 =
>>>> $ftr->new(-start=>700,-end=>650,-name=>'ZK154.3',-type=>"5'-UTR");
>>>> my $trans  =
>>>> $ftr->new(-segments=>[$trans1,$trans2,$trans3,$trans4,$trans5]);
>>>>
>>>> # highlight
>>>> my $trans31 =
>>>> $ftr->new(-start=>240,-end=>450,-name=>'hightlight1',-type=>'background',
>>>> -source=>'a');
>>>> my $trans41 =
>>>> $ftr->new(-start=>650,-end=>600,-name=>'hightlight2',-type=>'multihourglass',
>>>> -source=>'b');
>>>>
>>>> my $panel= Bio::Graphics::Panel->new(-width=>1200,
>>>>                                      -length=>1050,
>>>>                                      -start =>0,
>>>>                                      -pad_left=>12,
>>>>                                      -pad_right=>12);
>>>>
>>>> # the following track works as I expected in bioperl 1.2.3, but not in 1.5 and 1.6
>>>> $panel->add_track([$trans41,$trans31],
>>>>                   -glyph         => 'background',
>>>>                   -block_bgcolor => sub{return (shift->source eq 'a')?'#cccccc':'#fffc22'},
>>>>                   );
>>>>
>>>> $panel->add_track($ftr->new(-start=>100,-end=>1000),
>>>>                   -glyph=>'arrow',
>>>>                   -double=>1,
>>>>                   -tick=>2);
>>>>
>>>> $panel->add_track($trans,
>>>>                   -glyph   => 'transcript2', # 'transcript2', #process_5utr',
>>>>                   -fgcolor => 'darkred',
>>>>                   -bgcolor => 'darkred',
>>>>                   -title   => '$source',
>>>>                   -link    => 'http://www.ensembl.org/Homo_sapiens/transview?transcript=$name',  #EnsEMBL
>>>>                   );
>>>> print $panel->png;
>>>>
>>>> # the following part works in bioperl 1.5 and 1.6, but does not work in Bioperl 1.2.3
>>>> my $map = $panel->create_web_map("image");
>>>> $panel->finished();
>>>>
>>>> 1;
>>>>
>>>> ==================== background.pm =======================
>>>> package Bio::Graphics::Glyph::background;
>>>>
>>>> use strict;
>>>> use base 'Bio::Graphics::Glyph::generic';
>>>> sub pad_top{
>>>>   return 0;
>>>> }
>>>>
>>>> sub draw_component {
>>>>   my $self = shift;
>>>>   #$self->SUPER::draw_component(@_);
>>>>   my ($gd,$dx,$dy) = @_;
>>>>   my ($left,$top,$right,$bottom) = $self->bounds($dx,$dy);
>>>>
>>>>   # fill the feature's horizontal extent over the full image height
>>>>   my $color = $self->option('block_bgcolor') || '#cccccc';
>>>>   $gd->filledRectangle($left,0,$right,$gd->height,
>>>>                        $self->factory->translate_color($color));
>>>> }
>>>>
>>>> 1;
>>>>
>>>> --
>>>> ==========================================
>>>> Xianjun Dong
>>>> PhD student, Lenhard group
>>>> Computational Biology Unit
>>>> Bergen Center for Computational Science
>>>> University of Bergen
>>>> Hoyteknologisenteret, Thormohlensgate 55
>>>> N-5008 Bergen, Norway
>>>> E-mail: xianjun.dong at bccs.uib.no
>>>> Tel.: +47 555 84022
>>>> Fax : +47 555 84295
>>>> ==========================================
>>>>
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> ==========================================
>>>> Xianjun Dong
>>>> PhD student, Lenhard group
>>>> Computational Biology Unit
>>>> Bergen Center for Computational Science
>>>> University of Bergen
>>>> Hoyteknologisenteret, Thormohlensgate 55
>>>> N-5008 Bergen, Norway
>>>> E-mail: xianjun.dong at bccs.uib.no
>>>> Tel.: +47 555 84022
>>>> Fax : +47 555 84295
>>>> ==========================================
>>>>
>>>>
>>>>
>>>
>>>
>>
>
> --
> ==========================================
> Xianjun Dong
> PhD student, Lenhard group
> Computational Biology Unit
> Bergen Center for Computational Science
> University of Bergen
> Hoyteknologisenteret, Thormohlensgate 55
> N-5008 Bergen, Norway
> E-mail: xianjun.dong at bccs.uib.no
> Tel.: +47 555 84022
> Fax : +47 555 84295
> ==========================================
>
>


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research
-------------- next part --------------
A non-text attachment was scrubbed...
Name: postgrid.pl
Type: application/x-perl
Size: 2140 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20090618/0bee0f33/attachment.pl>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: postgrid_highlight.png
Type: image/png
Size: 7195 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20090618/0bee0f33/attachment-0004.png>

From scott at scottcain.net  Fri Jun 19 03:30:37 2009
From: scott at scottcain.net (Scott Cain)
Date: Thu, 18 Jun 2009 23:30:37 -0400
Subject: [Bioperl-l] background layer is not supported in Bioperl 1.6
	for Bio::Graphics::Glyph
In-Reply-To: <4536f7700906182025m1d67afa2y2a62a30d6cc9b19d@mail.gmail.com>
References: <4A32BCDA.4080605@ii.uib.no>
	<536f21b00906121829o7a5076f6hc57eb9a4dd815e48@mail.gmail.com>
	<4A339621.2060702@ii.uib.no>
	<4536f7700906130627r41c90de5r311e58f1d718f9ee@mail.gmail.com>
	<4A33D850.1020203@ii.uib.no> <4A3A13D3.7050208@ii.uib.no>
	<4536f7700906182025m1d67afa2y2a62a30d6cc9b19d@mail.gmail.com>
Message-ID: <4536f7700906182030n74f4293k60ad04ea62b97476@mail.gmail.com>

Actually, to be clear, that's bioperl-live and Bio::Graphics version
1.96 from CPAN.

On Thu, Jun 18, 2009 at 11:25 PM, Scott Cain<scott at scottcain.net> wrote:
> Hi Xianjun,
>
> The attached script (which is not too different from yours--I only did
> a little clean up and made the padding consistent) makes the attached
> image, which is what I think you want.  I'm using bioperl-live.
>
> Scott
>
>
> On Thu, Jun 18, 2009 at 6:15 AM, Xianjun Dong<Xianjun.Dong at bccs.uib.no> wrote:
>> Hi, Scott,
>>
>> Would you mind having a look at the code (below my signature) to check whether
>> I use the -postgrid callback correctly?
>> I still cannot get the background for the whole panel.
>>
>> Thanks
>>
>> Xianjun
>>
>>
>> Xianjun Dong wrote:
>>>
>>> Hi, Scott
>>>
>>> Before I give up on my own solution and switch to GBrowse, I want to
>>> bother you once more:
>>>
>>> As you suggested, I pass the -postgrid option when creating the panel; it
>>> should call a function that draws the background. The code below is copied
>>> almost verbatim from the online POD of Bio::Graphics::Panel (see
>>> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Graphics/Panel.html
>>> )
>>>
>>> But it still does not work. Could you have a look? I paste it
>>> below. (BTW, on the POD page above, the option is -postgrid=>\&draw_gap, while
>>> the gap-drawing function is named gap_it, not draw_gap. I guess that's a typo?)
>>>
>>> Thanks
>>>
>>> Xianjun
>>>
>>> ----------------------------------------------- mytestcode.pl
>>> --------------------------
>>>
>>> #!/usr/bin/perl
>>>
>>> use strict;
>>> use lib "$ENV{HOME}/lib";
>>>
>>> use Bio::Graphics;
>>> use Bio::Graphics::Feature;
>>> my $ftr= 'Bio::Graphics::Feature';
>>>
>>> # processed_transcript
>>> my $trans1 =
>>> $ftr->new(-start=>50,-end=>10,-name=>'ZK154.1',-type=>"3'-UTR");
>>> my $trans2 =
>>> $ftr->new(-start=>100,-end=>50,-name=>'ZK154.2',-type=>'CDS');
>>> my $trans3 =
>>> $ftr->new(-start=>350,-end=>225,-name=>'ZK154.3',-type=>'CDS',
>>> -source=>'a');
>>> my $trans4 =
>>> $ftr->new(-start=>650,-end=>500,-name=>'ZK154.3',-type=>'CDS',
>>> -source=>'a');
>>> my $trans5 =
>>> $ftr->new(-start=>700,-end=>650,-name=>'ZK154.3',-type=>"5'-UTR");
>>> my $trans  =
>>> $ftr->new(-segments=>[$trans1,$trans2,$trans3,$trans4,$trans5]);
>>>
>>> # highlight
>>> my $trans31 =
>>> $ftr->new(-start=>240,-end=>450,-name=>'hightlight1',-type=>'background',
>>> -source=>'a');
>>> my $trans41 =
>>> $ftr->new(-start=>650,-end=>600,-name=>'hightlight2',-type=>'multihourglass',
>>> -source=>'b');
>>>
>>> my $panel= Bio::Graphics::Panel->new(-width=>1200,
>>>                                      -length=>1050,
>>>                                      -start =>0,
>>>                                      -pad_left=>12,
>>>                                      -pad_right=>12,  # the comma is required; without it -postgrid is never passed
>>>                                      -postgrid=>\&gap_it);
>>>
>>> sub gap_it {
>>>    my $gd    = shift;
>>>    my $panel = shift;
>>>    my ($gap_start,$gap_end) = $panel->location2pixel(500,600);
>>>    my $top                  = $panel->top;
>>>    my $bottom               = $gd->height; #panel->bottom;
>>>    my $gray                 = $panel->translate_color('red');
>>>    $gd->filledRectangle($gap_start,$top,$gap_end,$bottom,$gray);
>>> }
>>> # the following track works as I expected in bioperl 1.2.3, but not in 1.5 and 1.6
>>> #$panel->add_track([$trans41,$trans31],
>>> #                  -glyph         => 'background',
>>> #                  -block_bgcolor => sub{return (shift->source eq 'a')?'#cccccc':'#fffc22'},
>>> #                  );
>>>
>>> $panel->add_track($ftr->new(-start=>100,-end=>1000),
>>>                   -glyph=>'arrow',
>>>                   -double=>1,
>>>                   -tick=>2);
>>>
>>> $panel->add_track($trans,
>>>                   -glyph   => 'transcript2', # 'transcript2', #process_5utr',
>>>                   -fgcolor => 'darkred',
>>>                   -bgcolor => 'darkred',
>>>                   -title   => '$source',
>>>                   -link    => 'http://www.ensembl.org/Homo_sapiens/transview?transcript=$name',  #EnsEMBL
>>>                   );
>>> print $panel->png;
>>>
>>> # the following part works in bioperl 1.5 and 1.6, but does not work in Bioperl 1.2.3
>>> my $map = $panel->create_web_map("image");
>>> $panel->finished();
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> Scott Cain wrote:
>>>>
>>>> Hi Xianjun,
>>>>
>>>> I understand what you want to do, as the current version of gbrowse
>>>> does this, which uses bioperl 1.6.  Without digging through the code,
>>>> I can't tell you exactly how this works and you didn't send your code
>>>> that uses this callback, so I can't try it either.
>>>>
>>>> One thing that is different between your code and gbrowse is that in
>>>> gbrowse each of the tracks is actually a separate panel (to allow track
>>>> dragging), so it is possible that this sort of callback doesn't work for
>>>> Bio::Graphics any more.
>>>>
>>>> Scott
>>>>
>>>> On Saturday, June 13, 2009, Xianjun Dong <Xianjun.Dong at bccs.uib.no>
>>>> wrote:
>>>>
>>>>>
>>>>> Hi, Scott
>>>>>
>>>>> Thanks for your reply first.
>>>>>
>>>>> I still have a question. I dug the code out of GBrowse (pasted below).
>>>>> The make_postgrid_callback method collects all the highlight regions and
>>>>> then uses hilite_regions_closure to draw them with the following GD call:
>>>>>
>>>>> $gd->filledRectangle($left+$start,0,$left+$end,$bottom,
>>>>>                      $panel->translate_color($h_color));
>>>>>
>>>>> where $bottom = $panel->bottom. This is the only difference from my
>>>>> code, where I use $gd->height. I guess the two are almost the same (apart
>>>>> from pad_bottom), as can be seen in the code at
>>>>> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Graphics/Panel.html#CODE22
>>>>>
>>>>> Anyway, I changed my highlight regions to use $panel->bottom instead of
>>>>> $gd->height. The output is the same when using the Bioperl 1.6 (or 1.5)
>>>>> library. You can see the attached image ("test.bioperl1.6.png").
>>>>>
>>>>> I might not have explained my question clearly. My question is: with
>>>>> bioperl 1.2.3 (actually the Bio::Graphics in bioperl 1.2.3), I get the
>>>>> image I want (see the attached file "test.bioperl1.2.3.png"), where the
>>>>> highlight runs from the top of the panel to the bottom. With bioperl 1.5
>>>>> (or 1.6), I only see the highlight region in its own track, not across
>>>>> the whole panel. Is that clearer now? You can see the difference between
>>>>> the two images.
>>>>>
>>>>> [I am not sure whether the mailing list allows image attachments, so I
>>>>> have also put them at the following links:
>>>>> test.bioperl1.6.png:   http://translog.genereg.net/test.bioperl1.6.png
>>>>> test.bioperl1.2.3.png: http://translog.genereg.net/test.bioperl1.2.3.png ]
>>>>>
>>>>> You can test it and see the difference if you have both 1.2.3 and 1.6 on
>>>>> your computer.
>>>>>
>>>>> I would really like to know how this works in bioperl 1.2.3 (even if it
>>>>> turns out to be a bug in that version).
>>>>>
>>>>> Thanks
>>>>>
>>>>> Xianjun
>>>>> =============================================
>>>>>
>>>>> # this generates the callback for highlighting a region
>>>>> sub make_postgrid_callback {
>>>>>   my $settings = shift;
>>>>>   return unless ref $settings->{h_region};
>>>>>
>>>>>   my @h_regions = map {
>>>>>     my ($h_ref,$h_start,$h_end,$h_color) =
>>>>>       /^(.+):(\d+)\.\.(\d+)(?:@(\S+))?/;
>>>>>     defined($h_ref) && $h_ref eq $settings->{ref}
>>>>>       ? [$h_start,$h_end,$h_color||'lightgrey']
>>>>>       : ()
>>>>>   }
>>>>>   @{$settings->{h_region}};
>>>>>
>>>>>   return unless @h_regions;
>>>>>   return hilite_regions_closure(@h_regions);
>>>>> }
>>>>>
>>>>> # this subroutine generates a Bio::Graphics::Panel callback closure
>>>>> # suitable for hilighting a region of a panel.
>>>>> # The args are a list of [start,end,color]
>>>>> sub hilite_regions_closure {
>>>>>   my @h_regions = @_;
>>>>>
>>>>>   return sub {
>>>>>     my $gd     = shift;
>>>>>     my $panel  = shift;
>>>>>     my $left   = $panel->pad_left;
>>>>>     my $top    = $panel->top;
>>>>>     my $bottom = $panel->bottom;
>>>>>     for my $r (@h_regions) {
>>>>>       my ($h_start,$h_end,$h_color) = @$r;
>>>>>       my ($start,$end) = $panel->location2pixel($h_start,$h_end);
>>>>>       # so that we always see something
>>>>>       if ($end-$start <= 1) { $end++; $start-- }
>>>>>       # assuming top is 0 so as to ignore top padding
>>>>>       $gd->filledRectangle($left+$start,0,$left+$end,$bottom,
>>>>>                            $panel->translate_color($h_color));
>>>>>     }
>>>>>   };
>>>>> }
>>>>>
>>>>>
>>>>> Scott Cain wrote:
>>>>>
>>>>> Hello Xianjun,
>>>>>
>>>>> I don't think that approach will work.  What you almost certainly need
>>>>> to do is a postgrid callback that does the drawing of the highlighted
>>>>> region.  For example code of how to do this, take a look at the
>>>>> make_postgrid_callback subroutine in GBrowse 1.69.  The -postgrid
>>>>> option is passed to the Bio::Graphics::Panel constructor.
>>>>>
>>>>> Scott
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Jun 12, 2009 at 4:38 PM, Xianjun Dong<Xianjun.Dong at bccs.uib.no>
>>>>> wrote:
>>>>>
>>>>>
>>>>> Hi,
>>>>>
>>>>> I am not sure whether this is the right place to ask for help.
>>>>>
>>>>> I have been struggling with a problem for several days: I want to
>>>>> highlight some regions in my track using a different background color.
>>>>> To do that, I defined a glyph named "background", based on the
>>>>> 'Bio::Graphics::Glyph::generic' module. I override the draw_component()
>>>>> method by adding code like this:
>>>>>
>>>>> $gd->filledRectangle($left,0,$right,$gd->height,
>>>>> $self->factory->translate_color($color));
>>>>>
>>>>> # the script is pasted at the end
>>>>>
>>>>> This draws a rectangle with top=0 and bottom=$gd->height. I turned the
>>>>> highlight regions into a list of features and called add_track with
>>>>> -glyph=>'background' (see the script test.pl below). This works as I
>>>>> expect: it adds a colored block behind all tracks in the panel
>>>>> (including the ruler arrow). You can see the output image in the
>>>>> attached file "test.bioperl1.2.3.png".
>>>>>
>>>>> Now the problem: when I switch to Bioperl 1.5 (or 1.6), it does not
>>>>> work. Well, it works, but the highlight shrinks to a low height
>>>>> instead of covering all tracks in the panel. I also attached that
>>>>> output; see the file "test.bioperl1.6.png".
>>>>>
>>>>> I have tried to work out the reason. The 'background' module is based
>>>>> on the generic module, so what can cause the difference? Is it because
>>>>> $gd->height is different, or because the tracks that follow the
>>>>> 'background' track cannot draw from the first position?
>>>>>
>>>>> I could stick with Bioperl 1.2.3 to avoid the problem ("Smart people
>>>>> solve problems, wise people avoid them"...). But then another problem
>>>>> arises: Bio::Graphics in Bioperl 1.2.3 does not support the
>>>>> $panel->create_web_map() function, which means I have to use a later
>>>>> version if I want to create a web map for my graphics, but then I
>>>>> have to give up the highlighted background.
>>>>>
>>>>> OK, this is long enough for my first post here. I hope someone can
>>>>> give me a clue.
>>>>>
>>>>> Thanks in advance!
>>>>>
>>>>> Xianjun
>>>>>
>>>>>
>>>>> ==================== test.pl =======================
>>>>> #!/usr/bin/perl
>>>>>
>>>>> use strict;
>>>>> use lib "$ENV{HOME}/lib";
>>>>>
>>>>> use Bio::Graphics;
>>>>> use Bio::Graphics::Feature;
>>>>> my $ftr= 'Bio::Graphics::Feature';
>>>>>
>>>>> # processed_transcript
>>>>> my $trans1 =
>>>>> $ftr->new(-start=>50,-end=>10,-name=>'ZK154.1',-type=>"3'-UTR");
>>>>> my $trans2 =
>>>>> $ftr->new(-start=>100,-end=>50,-name=>'ZK154.2',-type=>'CDS');
>>>>> my $trans3 =
>>>>> $ftr->new(-start=>350,-end=>225,-name=>'ZK154.3',-type=>'CDS',
>>>>> -source=>'a');
>>>>> my $trans4 =
>>>>> $ftr->new(-start=>650,-end=>500,-name=>'ZK154.3',-type=>'CDS',
>>>>> -source=>'a');
>>>>> my $trans5 =
>>>>> $ftr->new(-start=>700,-end=>650,-name=>'ZK154.3',-type=>"5'-UTR");
>>>>> my $trans  =
>>>>> $ftr->new(-segments=>[$trans1,$trans2,$trans3,$trans4,$trans5]);
>>>>>
>>>>> # hightlight
>>>>> my $trans31 =
>>>>>
>>>>> $ftr->new(-start=>240,-end=>450,-name=>'hightlight1',-type=>'background',
>>>>> -source=>'a');
>>>>> my $trans41 =
>>>>>
>>>>> $ftr->new(-start=>650,-end=>600,-name=>'hightlight2',-type=>'multihourglass',
>>>>> -source=>'b');
>>>>>
>>>>> my $panel= Bio::Graphics::Panel->new(-width=>1200,
>>>>>                                      -length=>1050,
>>>>>                                      -start =>0,
>>>>>                                      -pad_left=>12,
>>>>>                                      -pad_right=>12);
>>>>>
>>>>> # the following track works as I expected in bioperl 1.2.3, but not in
>>>>> # 1.5 and 1.6
>>>>> $panel->add_track([$trans41,$trans31],
>>>>>                -glyph   => 'background',
>>>>>                -block_bgcolor =>
>>>>>                  sub{return (shift->source eq 'a')?'#cccccc':'#fffc22'},
>>>>>                );
>>>>>
>>>>> $panel->add_track($ftr->new(-start=>100,-end=>1000),
>>>>>                -glyph=>'arrow',
>>>>>                -double=>1,
>>>>>                -tick=>2);
>>>>>
>>>>> $panel->add_track($trans,
>>>>>                -glyph   => 'transcript2', # 'transcript2', #process_5utr',
>>>>>                -fgcolor => 'darkred',
>>>>>                -bgcolor => 'darkred',
>>>>>                -title => '$source',
>>>>>                -link =>
>>>>> 'http://www.ensembl.org/Homo_sapiens/transview?transcript=$name', #EnsEMBL
>>>>>                );
>>>>> print $panel->png;
>>>>>
>>>>> # the following part works in bioperl 1.5 and 1.6, but not in
>>>>> # Bioperl 1.2.3
>>>>> my $map = $panel->create_web_map("image");
>>>>> $panel->finished();
>>>>>
>>>>> 1;
>>>>>
>>>>> ==================== background.pm =======================
>>>>> package Bio::Graphics::Glyph::background;
>>>>>
>>>>> use strict;
>>>>> use base 'Bio::Graphics::Glyph::generic';
>>>>> sub pad_top{
>>>>>   return 0;
>>>>> }
>>>>>
>>>>> sub draw_component {
>>>>>   my $self = shift;
>>>>>   #$self->SUPER::draw_component(@_);
>>>>>   my ($gd,$dx,$dy) = @_;
>>>>>   my ($left,$top,$right,$bottom) = $self->bounds($dx,$dy);
>>>>>
>>>>>   # fill the full height of the image behind this feature
>>>>>   my $color = $self->option('block_bgcolor') || '#cccccc';
>>>>>   $gd->filledRectangle($left,0,$right,$gd->height,
>>>>>                        $self->factory->translate_color($color));
>>>>> }
>>>>>
>>>>> 1;
>>>>>
>>>>> --
>>>>> ==========================================
>>>>> Xianjun Dong
>>>>> PhD student, Lenhard group
>>>>> Computational Biology Unit
>>>>> Bergen Center for Computational Science
>>>>> University of Bergen
>>>>> Hoyteknologisenteret, Thormohlensgate 55
>>>>> N-5008 Bergen, Norway
>>>>> E-mail: xianjun.dong at bccs.uib.no
>>>>> Tel.: +47 555 84022
>>>>> Fax : +47 555 84295
>>>>> ==========================================
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>
>>>>
>>>>
>>>
>>
>>
>
>
>
> --
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                                   scott at scottcain dot net
> GMOD Coordinator (http://gmod.org/)                     216-392-3087
> Ontario Institute for Cancer Research
>


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From roy.chaudhuri at gmail.com  Fri Jun 19 10:34:24 2009
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Fri, 19 Jun 2009 11:34:24 +0100
Subject: [Bioperl-l] Problems parsing scientific name from a Genbank file
In-Reply-To: <24095355.post@talk.nabble.com>
References: <24095355.post@talk.nabble.com>
Message-ID: <4A3B69B0.8080305@gmail.com>

Hi Cesar,

I can replicate this using an old Bioperl (version 1.5.2), but it 
appears to be fixed in version 1.6 and bioperl-live - the 
scientific_name method returns "Bacillus anthracis str. Sterne".
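
To make the expected behaviour concrete, here is a rough plain-Perl sketch of how the ORGANISM block is meant to split. This is not the Bio::SeqIO::genbank parser; split_organism is a made-up helper, and it assumes the organism name fits on the first line, which is true for NC_005945 but not for every record. The point is only that wrapped lineage lines have to be rejoined before splitting on ';'.

```perl
use strict;
use warnings;

# Split the lines of a GenBank ORGANISM field into (name, lineage arrayref).
# Assumption (illustration only): the organism name is on the first line.
sub split_organism {
    my @lines = @_;
    s/^\s+//, s/\s+$// for @lines;      # trim each line
    my $name = shift @lines;
    $name =~ s/^ORGANISM\s+//;          # drop the field keyword
    my $lineage = join ' ', @lines;     # rejoin wrapped lineage lines first
    $lineage =~ s/\.$//;                # drop the terminating period
    return ($name, [ split /;\s*/, $lineage ]);
}

my ($name, $lineage) = split_organism(
    '  ORGANISM  Bacillus anthracis str. Sterne',
    '            Bacteria; Firmicutes; Bacillales; Bacillaceae; Bacillus; Bacillus',
    '            cereus group.',
);
print "$name\n";            # Bacillus anthracis str. Sterne
print "$lineage->[-1]\n";   # Bacillus cereus group
```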

Hope this helps.
Roy.

Cesar Arze wrote:
> Hi all,
>    I've searched through the mailing list and bug-tracker looking for any
> indication of this (what I presume to be) bug I have been encountering when
> parsing certain Genbank files using SeqIO::GenBank but have yet to find
> anything. I apologize in advance if this is something that has already been
> addressed.
> 
> When parsing these files and extracting the scientific name it seems that
> line breaks are causing the lineage info found in the ORGANISM section to be
> captured as part of the scientific name. An example of this is accession
> NC_005945:
> 
>   ORGANISM  Bacillus anthracis str. Sterne
>             Bacteria; Firmicutes; Bacillales; Bacillaceae; Bacillus; Bacillus
>             cereus group.
> 
> Because "Bacillus cereus group" is broken across two lines, scientific_name
> also captures "Bacteria; Firmicutes; Bacillales; Bacillaceae; Bacillus;
> Bacillus", so the final scientific name ends up as "Bacillus anthracis str.
> Sterne Bacteria; Firmicutes; Bacillales; Bacillaceae; Bacillus; Bacillus".
> 
> Not sure if anyone has ever run into this problem but I would very much
> appreciate any help or direction.


From cjfields at illinois.edu  Fri Jun 19 20:57:36 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 19 Jun 2009 15:57:36 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <69C726B6-41DC-41EC-9BC9-DFEC0267CD3B@ucl.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>	<92C15E3391F64BAF801754E924122540@NewLife>
	<200906170927.13273.tristan.lefebure@gmail.com>
	<4A3933D0.4040808@sendu.me.uk>
	<8E010693-30B2-4E54-81CA-97755FC6CAE5@illinois.edu>
	<0500F8EB-F3BD-44E8-801A-7A45783DA40F@ucl.ac.uk>
	<3C32922E-690B-4246-BD44-7600153E57C7@illinois.edu>
	<69C726B6-41DC-41EC-9BC9-DFEC0267CD3B@ucl.ac.uk>
Message-ID: <E53B7AAC-9128-4637-AB0F-AC6C0C195D01@illinois.edu>

So, to follow up (and make sure we don't have any overlapping tuits),
we should probably determine who wants to work on what (e.g. fastq
updating). I think it's possible to quickly add Solexa/Illumina/Sanger
fastq support similar to BioPython's; I just don't want to step on
anyone's toes if they are halfway through doing this.
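
For reference, a minimal sketch of the score conversion such support would need, in plain Perl with no BioPerl dependency. It assumes the usual encodings: Sanger stores Phred scores at ASCII offset 33, Solexa stores Solexa-scaled scores at offset 64, and Illumina 1.3+ stores Phred scores at offset 64. The helper names are made up for illustration and are not part of Bio::SeqIO::fastq.

```perl
use strict;
use warnings;

# ASCII offsets per fastq variant (see the assumptions above).
my %OFFSET = (sanger => 33, solexa => 64, illumina => 64);

# Solexa score -> Phred score: Q_phred = 10 * log10(10**(Q_solexa/10) + 1)
sub solexa_to_phred {
    my ($q) = @_;
    return 10 * log(10 ** ($q / 10) + 1) / log(10);
}

# Decode a quality string into a list of (rounded) Phred scores.
sub qual_to_phred {
    my ($qual, $variant) = @_;
    my $offset = $OFFSET{$variant};
    die "unknown fastq variant '$variant'\n" unless defined $offset;
    return map {
        my $q = ord($_) - $offset;
        $variant eq 'solexa' ? sprintf('%.0f', solexa_to_phred($q)) : $q;
    } split //, $qual;
}

# Re-encode Phred scores as a Sanger (Phred+33) quality string.
sub phred_to_sanger {
    return join '', map { chr($_ + 33) } @_;
}

# Illumina 1.3+ 'hhhB' is Phred 40,40,40,2, which is 'III#' in Sanger encoding.
print phred_to_sanger(qual_to_phred('hhhB', 'illumina')), "\n";
```

Whether rounding on conversion is acceptable, or whether the Solexa scores should be kept as-is, is one of the things a real implementation would have to decide.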

chris

On Jun 17, 2009, at 3:36 PM, Elia Stupka wrote:

> Better than colorspaced discussions for sure ;)
>
> Elia
>
> On 17 Jun 2009, at 21:35, Chris Fields wrote:
>
>> So, #1 priority is to get fastq up-to-speed, then maybe assess  
>> other options.
>>
>> Illuminating discussion, thanks Elia!
>>
>> urgh, excuse unintended bad pun above...
>>
>> chris
>>
>> On Jun 17, 2009, at 3:06 PM, Elia Stupka wrote:
>>
>>> Interesting that you mention the database issue. We found that for  
>>> specific memory/CPU intensive things we also switch to using dbs.  
>>> For example, after many years of loyal use of disconnected_ranges  
>>> we switched to a simple SQL implementation of it, because of the  
>>> large performance gains it would give us.  Similarly in Ensembl as  
>>> well as in the old days of bioperl-db we opted for doing subseq  
>>> within SQL where possible.
>>>
>>> Some lean way of SQL'izing specific components could be less  
>>> "disruptive" than avoiding object creation and provide significant  
>>> gains in performance. Could be set as an optional flag, and could  
>>> use temporary ad hoc SQL databases?
>>>
>>> Still, priority now is to make SeqIO compliant with all those  
>>> formats, then we can worry about performance :)
>>>
>>> Elia
>>>
>>> On 17 Jun 2009, at 20:30, Chris Fields wrote:
>>>
>>>> On Jun 17, 2009, at 1:20 PM, Sendu Bala wrote:
>>>>
>>>>> Tristan Lefebure wrote:
>>>>>> Hello,
>>>>>> Regarding next-gen sequences and bioperl, following my  
>>>>>> experience, another issue is bioperl speed. For example, if you  
>>>>>> want to trim bad quality bases at ends of 1E6 Solexa reads  
>>>>>> using Bio::SeqIO::fastq and some methods in Bio::Seq::Quality,  
>>>>>> well, you've got to be patient (but may be I missed some  
>>>>>> shortcuts...).
>>>>>
>>>>> This is my concern as well. Or, rather, is there actually a  
>>>>> significant set of users out there who are dealing with next-gen  
>>>>> sequencing and would consider using BioPerl for their work?
>>>>>
>>>>> I'm working with all the 1000-genomes data at the Sanger, and we  
>>>>> at least are probably never going to use BioPerl for the work.
>>>>
>>>> Are you using pure perl or (gasp) something else?  ;>
>>>>
>>>> Judging by the feedback there are definitely a set of users who  
>>>> would like to integrate nextgen into bioperl somehow, probably to  
>>>> take advantage of other aspects of bioperl.
>>>>
>>>>>> A pure perl solution will be between 100 to 1000x faster...  
>>>>>> Would it be possible to have an ultra-light quality object with  
>>>>>> few simple methods for next-gen reads?
>>>>>
>>>>> The fastq parser itself already seems pretty fast. The way to  
>>>>> get the speedup is to not create any Bio::Seq* objects but just  
>>>>> return the data directly. At that point it's not taking much  
>>>>> advantage of BioPerl. But certainly it could be done...
>>>>
>>>>
>>>> I suppose the best way to assess what needs to be done is come up  
>>>> with a set of 'use cases' specifying what users want so we can  
>>>> design around them, otherwise we're shooting in the dark.
>>>>
>>>> I'm personally wondering if this could be done as a sequence  
>>>> database, something similar in theme to Lincoln's  
>>>> SeqFeature::Store, but sequence only, and returns quality objects  
>>>> in a similar manner (ala Storable)?  Not sure whether that's  
>>>> feasible, but it appears at least scalable.
>>>>
>>>> chris
>>>>
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>> ---
>>> Senior Lecturer, Bioinformatics
>>> UCL Cancer Institute
>>> Paul O' Gorman Building
>>> University College London
>>> Gower Street
>>> WC1E 6BT
>>> London
>>> UK
>>>
>>> Office (UCL): +44 207 679 6493
>>> Office (ICMS): +44 0207 8822374
>>>
>>> Mobile: +44 7597 566 194
>>> Mobile (Italy): +39 338 8448801
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
> ---
> Senior Lecturer, Bioinformatics
> UCL Cancer Institute
> Paul O' Gorman Building
> University College London
> Gower Street
> WC1E 6BT
> London
> UK
>
> Office (UCL): +44 207 679 6493
> Office (ICMS): +44 0207 8822374
>
> Mobile: +44 7597 566 194
> Mobile (Italy): +39 338 8448801
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From biopython at maubp.freeserve.co.uk  Sat Jun 20 08:46:31 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sat, 20 Jun 2009 09:46:31 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<BF526BE6-AFA8-4F66-9665-D6328F9B4FFA@illinois.edu>
	<320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com>
	<1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>
Message-ID: <320fb6e00906200146t547a0492r23d5f123e01098e8@mail.gmail.com>

On Wed, Jun 17, 2009 at 6:06 PM, Chris Fields<cjfields at illinois.edu> wrote:
>
>
> On Jun 17, 2009, at 8:25 AM, Peter wrote:
>
>>> Peter's suggestions also are reasonable, though does biopython have a
>>> separate module for each of these variations?  Our version (I believe)
>>> mainly varied the conversion within Bio::SeqIO::fastq itself based on the
>>> fastq variant passed in as a separate named argument.
>>
>> Biopython's SeqIO gives the three FASTQ variants their own unique
>> names. This format name is a required argument for parsing/writing
>> (we don't try and guess the file format from the data contents).
>> Internally we have three separate FASTQ parsers/writers although
>> they do share code.
>
> We could easily do the same if others agree.  Actually, if we specified that
> shorthand for a variant on a format would be designated as -format =>
> 'format-variant', I think we could easily hack SeqIO to deal with that by
> splitting on '-' and passing everything to the constructor as (-format =>
> 'format', -variant => 'variant').  Very little repeated code in this case,
> just an additional named parameter indicating the format variant (and the
> SeqIO class can do the type checking on that within the constructor).

Yes, when I started using names like "fastq-solexa" I did have in mind
"main-variant" naming convention, and potentially Biopython may one
day actually use this structure when allocating a Bio.SeqIO job to the
appropriate parser or writer.
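
On the Perl side, the "split on '-'" idea quoted above could look roughly like the sketch below. This is only an illustration of the naming convention, not existing Bio::SeqIO code; split_format is a hypothetical helper and the -variant argument is the one proposed in this thread.

```perl
use strict;
use warnings;

# Split a '-format' value such as 'fastq-solexa' into (format, variant).
# The variant is undef when the name contains no '-'.
sub split_format {
    my ($spec) = @_;
    my ($format, $variant) = split /-/, lc $spec, 2;
    return ($format, $variant);
}

for my $spec (qw(fasta fastq fastq-solexa fastq-illumina)) {
    my ($format, $variant) = split_format($spec);
    printf "%-15s => -format => '%s'%s\n", $spec, $format,
        defined $variant ? ", -variant => '$variant'" : '';
}
```

Limiting the split to two fields keeps any further '-' inside the variant name, so the format part stays unambiguous.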

For now, the Biopython list of formats is fairly short (and there are
relatively few of these sub-formats) so to keep things simple we just
have a flat mapping from the format name (e.g. "fasta", "fastq",
"fastq-solexa") to the parser/writer code.

Peter


From e.stupka at ucl.ac.uk  Sat Jun 20 20:12:18 2009
From: e.stupka at ucl.ac.uk (Elia Stupka)
Date: Sat, 20 Jun 2009 21:12:18 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <E53B7AAC-9128-4637-AB0F-AC6C0C195D01@illinois.edu>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>	<92C15E3391F64BAF801754E924122540@NewLife>
	<200906170927.13273.tristan.lefebure@gmail.com>
	<4A3933D0.4040808@sendu.me.uk>
	<8E010693-30B2-4E54-81CA-97755FC6CAE5@illinois.edu>
	<0500F8EB-F3BD-44E8-801A-7A45783DA40F@ucl.ac.uk>
	<3C32922E-690B-4246-BD44-7600153E57C7@illinois.edu>
	<69C726B6-41DC-41EC-9BC9-DFEC0267CD3B@ucl.ac.uk>
	<E53B7AAC-9128-4637-AB0F-AC6C0C195D01@illinois.edu>
Message-ID: <F99E2F7F-05F7-462B-A3ED-96E09746994B@ucl.ac.uk>

Hi Chris,

I agree. I have not written a single line of code so far, while Heikki
has some (but has been silent for a while) and you perhaps have some
code ready to roll. I am happy to help where needed, just let me know
what you'd like me to focus on. If you want to go ahead and implement
the fastq stuff discussed, I can focus on bioperl-run.

cheers

Elia


On 19 Jun 2009, at 21:57, Chris Fields wrote:

> So, to follow up (and make sure we don't have any overlapping tuits)
> we should probably determine who wants to work on what (i.e. fastq
> updating, etc). I think it's possible to quickly add in
> Solexa/Illumina/Sanger fastq similar to BioPython, just don't want to step
> on anyone's toes if they are halfway through doing this.
>
> chris
>
> On Jun 17, 2009, at 3:36 PM, Elia Stupka wrote:
>
>> Better than colorspaced discussions for sure ;)
>>
>> Elia
>>
>> On 17 Jun 2009, at 21:35, Chris Fields wrote:
>>
>>> So, #1 priority is to get fastq up-to-speed, then maybe assess  
>>> other options.
>>>
>>> Illuminating discussion, thanks Elia!
>>>
>>> urgh, excuse unintended bad pun above...
>>>
>>> chris
>>>
>>> On Jun 17, 2009, at 3:06 PM, Elia Stupka wrote:
>>>
>>>> Interesting that you mention the database issue. We found that  
>>>> for specific memory/CPU intensive things we also switch to using  
>>>> dbs. For example, after many years of loyal use of  
>>>> disconnected_ranges we switched to a simple SQL implementation of  
>>>> it, because of the large performance gains it would give us.   
>>>> Similarly in Ensembl as well as in the old days of bioperl-db we  
>>>> opted for doing subseq within SQL where possible.
>>>>
>>>> Some lean way of SQL'izing specific components could be less  
>>>> "disruptive" than avoiding object creation and provide  
>>>> significant gains in performance. Could be set as an optional  
>>>> flag, and could use temporary ad hoc SQL databases?
>>>>
>>>> Still, priority now is to make SeqIO compliant with all those  
>>>> formats, then we can worry about performance :)
>>>>
>>>> Elia
>>>>
>>>> On 17 Jun 2009, at 20:30, Chris Fields wrote:
>>>>
>>>>> On Jun 17, 2009, at 1:20 PM, Sendu Bala wrote:
>>>>>
>>>>>> Tristan Lefebure wrote:
>>>>>>> Hello,
>>>>>>> Regarding next-gen sequences and bioperl, following my  
>>>>>>> experience, another issue is bioperl speed. For example, if  
>>>>>>> you want to trim bad quality bases at ends of 1E6 Solexa reads  
>>>>>>> using Bio::SeqIO::fastq and some methods in Bio::Seq::Quality,  
>>>>>>> well, you've got to be patient (but may be I missed some  
>>>>>>> shortcuts...).
>>>>>>
>>>>>> This is my concern as well. Or, rather, is there actually a  
>>>>>> significant set of users out there who are dealing with
>>>>>> next-gen sequencing and would consider using BioPerl for their work?
>>>>>>
>>>>>> I'm working with all the 1000-genomes data at the Sanger, and  
>>>>>> we at least are probably never going to use BioPerl for the work.
>>>>>
>>>>> Are you using pure perl or (gasp) something else?  ;>
>>>>>
>>>>> Judging by the feedback there are definitely a set of users who  
>>>>> would like to integrate nextgen into bioperl somehow, probably  
>>>>> to take advantage of other aspects of bioperl.
>>>>>
>>>>>>> A pure perl solution will be between 100 to 1000x faster...  
>>>>>>> Would it be possible to have an ultra-light quality object  
>>>>>>> with few simple methods for next-gen reads?
>>>>>>
>>>>>> The fastq parser itself already seems pretty fast. The way to  
>>>>>> get the speedup is to not create any Bio::Seq* objects but just  
>>>>>> return the data directly. At that point it's not taking much  
>>>>>> advantage of BioPerl. But certainly it could be done...
>>>>>
>>>>>
>>>>> I suppose the best way to assess what needs to be done is come  
>>>>> up with a set of 'use cases' specifying what users want so we  
>>>>> can design around them, otherwise we're shooting in the dark.
>>>>>
>>>>> I'm personally wondering if this could be done as a sequence  
>>>>> database, something similar in theme to Lincoln's  
>>>>> SeqFeature::Store, but sequence only, and returns quality  
>>>>> objects in a similar manner (ala Storable)?  Not sure whether  
>>>>> that's feasible, but it appears at least scalable.
>>>>>
>>>>> chris
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>> ---
>>>> Senior Lecturer, Bioinformatics
>>>> UCL Cancer Institute
>>>> Paul O' Gorman Building
>>>> University College London
>>>> Gower Street
>>>> WC1E 6BT
>>>> London
>>>> UK
>>>>
>>>> Office (UCL): +44 207 679 6493
>>>> Office (ICMS): +44 0207 8822374
>>>>
>>>> Mobile: +44 7597 566 194
>>>> Mobile (Italy): +39 338 8448801
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>
>> ---
>> Senior Lecturer, Bioinformatics
>> UCL Cancer Institute
>> Paul O' Gorman Building
>> University College London
>> Gower Street
>> WC1E 6BT
>> London
>> UK
>>
>> Office (UCL): +44 207 679 6493
>> Office (ICMS): +44 0207 8822374
>>
>> Mobile: +44 7597 566 194
>> Mobile (Italy): +39 338 8448801
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

---
Senior Lecturer, Bioinformatics
UCL Cancer Institute
Paul O' Gorman Building
University College London
Gower Street
WC1E 6BT
London
UK

Office (UCL): +44 207 679 6493
Office (ICMS): +44 0207 8822374

Mobile: +44 7597 566 194
Mobile (Italy): +39 338 8448801


From lincoln.stein at gmail.com  Sat Jun 20 21:01:43 2009
From: lincoln.stein at gmail.com (Lincoln Stein)
Date: Sat, 20 Jun 2009 17:01:43 -0400
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
Message-ID: <6dce9a0b0906201401j40175dbdscd71360396fe9f7a@mail.gmail.com>

Hi All,

Apropos of this, I am about to release to CPAN a BioPerl interface to SAM
and BAM files. The documentation is still in progress, but you can get CVS
access here:

% cvs -d :pserver:anonymous@gmod.cvs.sourceforge.net:/cvsroot/gmod co gbrowse-adaptors/Bio-SamTools

Lincoln

On Wed, Jun 17, 2009 at 7:29 AM, Elia Stupka <e.stupka at ucl.ac.uk> wrote:

> Dear all,
>
> after several years of absence I am slowly coming back to Bioperl, and hope
> to contribute again to its development.
>
> One area that I was thinking of starting from, since we are actively
> involved with it, is to improve BioPerl's support for next-gen sequencing
> data, tools, etc. Since I am sure I have missed out on a lot of recent
> developments, do let me know if/what is useful.
>
> One example that comes to mind is that the conversion of various formats
> to/from FASTQ does not seem to be supported. Some code can be found within
> Li Heng's script: http://maq.sourceforge.net/fq_all2std.pl but it would be
> good if it could make its way into SeqIO? And similarly, potentially, for
> other next-gen sequence formats?
>
> Similarly, there seems to be little in bioperl-run to support tools that
> have been developed in this area, such as Maq, BowTie, TopHat, etc?
>
> Do let me know if there is a past thread on this, or other people actively
> developing, etc. so that I can find out what priorities are.
>
> thanks and best regards to all (old friends and new),
>
> Elia
>
> ---
> Senior Lecturer, Bioinformatics
> UCL Cancer Institute
> Paul O' Gorman Building
> University College London
> Gower Street
> WC1E 6BT
> London
> UK
>
> Office (UCL): +44 207 679 6493
> Office (ICMS): +44 0207 8822374
>
> Mobile: +44 7597 566 194
> Mobile (Italy): +39 338 8448801
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


-- 
Lincoln D. Stein
Director, Informatics and Biocomputing Platform
Ontario Institute for Cancer Research
101 College St., Suite 800
Toronto, ON, Canada M5G0A3
416 673-8514
Assistant: Renata Musa <Renata.Musa at oicr.on.ca>


From hartzell at alerce.com  Mon Jun 22 13:18:20 2009
From: hartzell at alerce.com (George Hartzell)
Date: Mon, 22 Jun 2009 06:18:20 -0700
Subject: [Bioperl-l] Anyone at YAPC?
Message-ID: <19007.33948.411442.197063@already.dhcp.gene.com>


I'm in Pittsburgh at YAPC 10 and would be happy to say hello to folks.

g.


From cjfields1 at gmail.com  Mon Jun 22 14:05:56 2009
From: cjfields1 at gmail.com (Chris Fields)
Date: Mon, 22 Jun 2009 09:05:56 -0500
Subject: [Bioperl-l] changing parameters in Bio::Tools::Run::RemoteBlast
In-Reply-To: <F52FFB80A7304749B467C46E10A2869D@jonas>
References: <F52FFB80A7304749B467C46E10A2869D@jonas>
Message-ID: <67ABC7E3-216E-4F5A-B18E-A775A6B4D8F7@gmail.com>

Jonas,

The best place to send questions is to the mail list (which I've  
cc'd).  If you reply make sure to keep the mail list in the reply-to.

There are two ways to set the parameters you want.  I'll show you what  
I consider the best, but I have no way to test it ATM.

$factory->submit_parameter($foo => 'bar')

is the syntax for setting PUT parameters.  Sad to see they didn't  
provide you with the exact PUT parameter names (as follows):

Max target sequences = 100                # MAX_NUM_SEQ
Expect threshold = 10                     # EXPECT
Gap Costs = Existence 11 Extension 1      # GAPCOSTS
Compositional adjustments = Conditional compositional score matrix adjustment
                                          # COMPOSITION_BASED_STATISTICS

'Compositional adjustments' is as follows (from command-line blastall):

   -C  Use composition-based score adjustments for blastp or tblastn:
       As first character:
       D or d: default (equivalent to T)
       0 or F or f: no composition-based statistics
       1: Composition-based statistics as in NAR 29:2994-3005, 2001
       2 or T or t: Composition-based score adjustments as in
           Bioinformatics 21:902-911, 2005, conditioned on sequence properties
       3: Composition-based score adjustment as in Bioinformatics 21:902-911,
           2005, unconditionally
       For programs other than tblastn, must either be absent or be D, F or 0.
       As second character, if first character is equivalent to 1, 2, or 3:

After the factory line and prior to the BLAST call you can add in the  
following (completely untested, excuse any possible mistakes) code:

my %put = (
    MAX_NUM_SEQ => 100,
    EXPECT      => 10,
    GAPCOSTS    => '11 1',
    COMPOSITION_BASED_STATISTICS => 2 # could be 1 as well
);

for my $putName (keys %put) {
    $factory->submit_parameter($putName, $put{$putName});
}


chris

On Jun 22, 2009, at 8:14 AM, Jonas Schaer wrote:

> Hi there,
> I hope it's OK to ask you a question about the bio perl module   
> Bio::Tools::Run::RemoteBlast.
> My problem is, that I get different results using this perl-skript:
>
> #######################################################################################################################################################################################
>  use Bio::Seq::SeqFactory;
>  use Bio::Tools::Run::RemoteBlast;
>  use strict;
>  my @blast_report;
>  my $prog = 'blastp';
>  my $db   = 'nr';
>  my $e_val= '1e-10';
>  my @params = ( '-prog' => $prog,
>         '-data' => $db,
>         '-expect' => $e_val,
>         '-readmethod' => 'SearchIO' );
>  my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
>  #my $input = @_;
>  my $blast_seq = 'MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTAMATGPDPDDEYE';
>  #$v is just to turn on and off the messages
>  my $v = 1;
>  my $seqbuilder = Bio::Seq::SeqFactory->new('-type' => 'Bio::PrimarySeq');
>  my $seq = $seqbuilder->create(-seq => $blast_seq, -display_id => "$blast_seq");
>  my $filename='temp2.out';
>  my $r = $factory->submit_blast($seq);
>  print STDERR "waiting..." if( $v > 0 );
>    while ( my @rids = $factory->each_rid )
>    {
>        foreach my $rid ( @rids )
>        {
>            my $rc = $factory->retrieve_blast($rid);
>            if( !ref($rc) )
>            {
>                if( $rc < 0 )
>                {
>                    $factory->remove_rid($rid);
>                }
>                print STDERR "." if ( $v > 0 );
>            }
>                else
>                {
>                    my $result = $rc->next_result();
>                    $factory->save_output($filename);
>                    $factory->remove_rid($rid);
>                    print "\nQuery Name: ", $result->query_name(),  
> "\n";
>                    while ( my $hit = $result->next_hit )
>                    {
>                        next unless ( $v > 0);
>                        print "\thit name is ", $hit->name, "\n";
>                        while( my $hsp = $hit->next_hsp )
>                        {
>                            print "\t\tscore is ", $hsp->score, "\n";
>                        }
>                    }
>                }
>        }
>
>
>    }
> @blast_report = get_file_data ($filename);
> return @blast_report;
>
>
> sub get_file_data
> {
>    use strict;
>    my($filename) = @_;
>    use strict;
>    use warnings;
>    # Initialize variables
>    my @filedata = ( );
>    unless( open(GET_FILE_DATA, $filename) )
>    {
>        print STDERR "Cannot open file \"$filename\"\n\n";
>        exit;
>    }
>    @filedata = <GET_FILE_DATA>;
>    close GET_FILE_DATA;
>    print @filedata;
>    return @filedata;
> }
>
> #######################################################################################################################################################################################
>
> ... and the blastp on the ncbi-homepage. The people from NCBI wrote  
> me that I have to change some parameters:
> ""
> You need to have the following:
>
>
> Max target sequences = 100
> Expect threshold = 10
> Gap Costs = Existence 11 Extension 1
> Compositional adjustments = Conditional compositional score matrix  
> adjustment""
>
> Could you please tell me exactly how to change this parameters  
> within my perl-skript? I think I have to use the "put" command, but  
> I just cannot find out, how...
>
> Regards and thank you so much in advance :),
>
> Jonas Schaer


From biopython at maubp.freeserve.co.uk  Mon Jun 22 14:24:55 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 22 Jun 2009 15:24:55 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<BF526BE6-AFA8-4F66-9665-D6328F9B4FFA@illinois.edu>
	<320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com>
	<1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>
Message-ID: <320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com>

On Wed, Jun 17, 2009 at 6:06 PM, Chris Fields wrote:
> Peter wrote:
>> Other issues to keep in mind:
>>
>> (3) There should be no warning parsing files where the optional repeated
>> title is missing on the "+" lines (as discussed earlier on the BioPerl
>> list).
>
> Agreed, though we'll have to check the current fastq parser to see if that's
> currently the case.  I thought that was fixed but maybe not?
>
>> (4) When writing FASTQ files should BioPerl omit the optional repeated
>> title on the "+" line? Biopython omits this as I understand this to be
>> common practice, and can make a big difference to file sizes - especially
>> on short read data from Solexa/Illumina.
>
> Agreed, particularly if it's commonly encountered.
>
>> (5) Also test reading and writing files with an optional description (as
>> well as an identifier) on the "@" (and "+") lines. See the NCBI SRA
>> for examples, e.g.
>>
>> @SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
>> GGGTGATGGCCGCTGCCGATGGCGTCAAATCCCACC
>> +SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
>> IIIIIIIIIIIIIIIIIIIIIIIIIIIIII9IG9IC
>
> Should be easy enough to implement with a simple regex.
>
>> (6) Test reading and writing files where the encoded quality string starts
>> with a "@" or a "+" character, e.g.
>> http://lists.open-bio.org/pipermail/bioperl-l/2009-May/029911.html
>>
>> Peter
>
> Mark, getting all that? ;>
>
> chris

Another couple of points that I should have remembered earlier,
related to converting between PHRED scores and Solexa scores.
On the bright side, with Illumina abandoning the Solexa scores
in pipeline 1.3+, these issues will go away with time:

(7) If BioPerl will be converting Solexa scores to/from PHRED
scores as integers automatically (as discussed earlier), make
sure you round to the nearest whole number (don't just truncate
with a call to int!). MAQ does this by adding 0.5 before calling
int (while in Biopython I just use Python's round function).

(8) When asked to write out an old Solexa style FASTQ file,
what will you do if given a standard Sanger FASTQ file (or a
new Illumina 1.3+ FASTQ file) containing a base with PHRED
quality zero? This maps to a Solexa quality of minus infinity...
Right now the development version of Biopython will throw an
error in this situation, but mapping to the lowest observed
Solexa score might be reasonable.

Peter


From cjfields at illinois.edu  Mon Jun 22 13:54:22 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 22 Jun 2009 08:54:22 -0500
Subject: [Bioperl-l] Anyone at YAPC?
In-Reply-To: <19007.33948.411442.197063@already.dhcp.gene.com>
References: <19007.33948.411442.197063@already.dhcp.gene.com>
Message-ID: <FE1E935F-C10A-471F-8E9A-27658F5216A2@illinois.edu>

I think some of the regular #bioperl folk are there (Jay Hannah, R.  
Buels, etc).  May be worth going on IRC to find everyone.

I'm giving serious thought to going next year if I can get enough work  
done towards a perl6 or Moose-based bioperl.

chris

On Jun 22, 2009, at 8:18 AM, George Hartzell wrote:

>
> I'm in Pittsburgh at YAPC 10 and would be happy to say hello to folks.
>
> g.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From vofford at rvc.ac.uk  Mon Jun 22 16:10:43 2009
From: vofford at rvc.ac.uk (Offord, Victoria)
Date: Mon, 22 Jun 2009 17:10:43 +0100
Subject: [Bioperl-l] Clustalw
Message-ID: <C79C4DD076C6834EB10EB0CA1F50635513109AF5@hhw2kex01.rvc.ac.uk>

Hi,

 
Can anyone help and tell me where I am going wrong please :)

I am getting this error from the following script:

 
------------- EXCEPTION: Bio::Root::Exception -------------

MSG: ClustalW call (clustalw align  -infile=/tmp/8PVli9JWEa/L_pxrEtzD1
-output=gcg   -matrix=BLOSUM -ktuple=2
-outfile=/tmp/8PVli9JWEa/XtAremlqau 2>&1) failed to start: 0 | No such
file or directory

STACK: Error::throw

STACK: Bio::Root::Root::throw
/usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357

STACK: Bio::Tools::Run::Alignment::Clustalw::_run
/usr/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm:756

STACK: Bio::Tools::Run::Alignment::Clustalw::align
/usr/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm:515

STACK: tester.pl:25

-----------------------------------------------------------

 
#--------------------------------------------SCRIPT-----------------------------------------------#

#!/usr/bin/perl -w

use Bio::Tools::Run::Alignment::Clustalw;

$ENV{CLUSTALDIR} = '/var/local/clustalw-2.0.9';

use Bio::Seq;

 
 my @params = ('ktuple' => 2, 'matrix' => 'BLOSUM');

 my $factory = Bio::Tools::Run::Alignment::Clustalw->new(@params);

 
my $a = "NPFECDCSMEWMQRVNNLTARQHPKILDLPNVECIMPHARGTPIRPIISLKPKDFLCK";

my $b =
"NPFECDCSMEWLQRINNLTTRQHPHVVDLGNIECLMPHSRSAPLRPLASLSASDFVCKYESHCPP";

my $seq1 = Bio::Seq->new ( -seq  => $a,

                           -id   => 'real',

                           -desc => 'this is a real Seq');

 my $seq2 = Bio::Seq->new ( -seq  => $b,

                           -id   => 'test',

                           -desc => 'this is a test Seq');


my @seq_array = ($seq1,$seq2);

 
my $seq_array_ref = \@seq_array;

my $aln = $factory->align($seq_array_ref);

 
From Kevin.M.Brown at asu.edu  Mon Jun 22 16:48:27 2009
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Mon, 22 Jun 2009 09:48:27 -0700
Subject: [Bioperl-l] Clustalw
In-Reply-To: <C79C4DD076C6834EB10EB0CA1F50635513109AF5@hhw2kex01.rvc.ac.uk>
References: <C79C4DD076C6834EB10EB0CA1F50635513109AF5@hhw2kex01.rvc.ac.uk>
Message-ID: <1A4207F8295607498283FE9E93B775B4060B9BAF@EX02.asurite.ad.asu.edu>

Do you have ClustalW installed and in your path? 
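
One way to check this from Perl is sketched below. It is only an illustration: the helper name find_program is made up here, and the list of candidate binary names is an assumption (some ClustalW 2.x builds install the executable as 'clustalw2' rather than 'clustalw'). It looks in CLUSTALDIR first, if that is set, and then in every directory on PATH.

```perl
use strict;
use warnings;
use File::Spec;

# Return the full path of the first executable named in @$names that is
# found in one of the directories in @$dirs, or undef if none is found.
sub find_program {
    my ($names, $dirs) = @_;
    for my $dir (@$dirs) {
        for my $name (@$names) {
            my $candidate = File::Spec->catfile($dir, $name);
            return $candidate if -f $candidate && -x _;
        }
    }
    return;
}

# Candidate names are an assumption: some ClustalW 2.x builds install
# the binary as 'clustalw2' rather than 'clustalw'.
my @names = ('clustalw', 'clustalw2');

# Search CLUSTALDIR first (if set), then every directory on PATH.
my @dirs = File::Spec->path;
unshift @dirs, $ENV{CLUSTALDIR} if defined $ENV{CLUSTALDIR};

my $found = find_program(\@names, \@dirs);
print defined $found
    ? "Found ClustalW executable: $found\n"
    : "No ClustalW executable found in CLUSTALDIR or PATH\n";
```

If nothing is found, or only a differently named binary turns up, that would be consistent with the "No such file or directory" message in the exception.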



From cjfields at illinois.edu  Mon Jun 22 19:20:14 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 22 Jun 2009 14:20:14 -0500
Subject: [Bioperl-l] bioperl-dev or branch? : redux
In-Reply-To: <6DF025D32D664F61BC64B49184A2E6DD@NewLife>
References: <991fb8210905210826v2a7990c0u90fcb3256f54b7d7@mail.gmail.com>
	<D719EC86-CB16-4875-833B-2818C26030C8@duke.edu>
	<6DF025D32D664F61BC64B49184A2E6DD@NewLife>
Message-ID: <4766E259-B184-4552-817E-FBBB3A71A17F@illinois.edu>

On Jun 17, 2009, at 11:47 AM, Mark A. Jensen wrote:

> Hi All,
> I thought I'd revisit this thread, since in the last couple of weeks I
> have used both techniques (bioperl-dev and branch from trunk) to
> produce completed projects. My thoughts:
>
> Using bioperl-dev was very nice for creating Bio::Search::Tiling, a
> new addition to the core api. There was no pressure to conform to the
> existing api there. In particular, there was no implicit insistence to
> make things work through Bio::Search::Utils, and I was free to factor
> it out. The Tiling api was definitely unstable until the end, when it
> was ported to the core. As I made regular reports to bioperl-l,
> everything was transparent and up front, and I received excellent
> suggestions there (as usual).
> For Bio::Restriction, using the branch was just as natural. Here, the
> existing structure was well established, and all the work needed to
> happen beneath the api. All old t/Restriction tests needed to pass,
> and additional ones created for the new functionality. So here, using
> bioperl-dev wasn't natural, even though some "experiments" needed to
> be tried (some succeeded and some failed, as you can see in the
> commentary at Bug #2855). Even though the new code turned out to
> require substantial effort, the effort was required to fix a true bug
> in the working core, and any fixes needed to work transparently with
> respect to the users for whom this bug had not been an issue. Using
> the branch made it relatively easy to merge quickly back into the core
> when done, and there is a certain psychological pressure too provided
> by an open branch which is helpful.
>
> Hilmar raised the very good point in the previous discussion that
> (essentially) bioperl-dev shouldn't become a sandbox with lots of
> unfinished code scraps and derelict stuff that doesn't work. My view
> is bioperl-dev will become a sandbox only if we treat it like
> one. I've filled out the Bioperl-dev page on the wiki
> (http://www.bioperl.org/wiki/Bioperl-dev) with this in mind. Providing
> some recognition to devs there whose modules become part of the
> core may be a better way to ensure that projects that are started on
> bioperl-dev actually get finished, than to prescribe beforehand what
> kinds of projects may get started. I believe this follows the adage of
> liberality on what is accepted, and strictness on what is emitted.
>
> cheers, MAJ

The main reason I wanted a bioperl-dev is for some code or  
implementations that don't seem to fit on a branch or directly into  
core, but would definitely be of use.  The tendency in the past has  
been to accept anything that works into core (the 'bazaar' approach).   
Initially that worked well, but the long-term end result has become  
potentially unmaintainable code bloat.  Committing new code to a  
branch isn't a great idea either, primarily b/c the code may be lost  
to the branch if it isn't followed up and remerged into trunk.  And  
forcing the code to fit into bioperl (or vice versa, which happened  
re: Feature Annotation) isn't the best way either.

Like Hilmar, though, I don't want dev to become a (sandbox|code  
dumping ground) either, so I think some additional discussion is  
warranted if anyone else wants to chime in.

chris


From mauricio at open-bio.org  Mon Jun 22 19:56:33 2009
From: mauricio at open-bio.org (Mauricio Herrera Cuadra)
Date: Mon, 22 Jun 2009 14:56:33 -0500
Subject: [Bioperl-l] 1.6 on doc.bioperl.org?
In-Reply-To: <A53006055C854297AAA58F6650F4F867@NewLife>
References: <17AD00895AFD43E1A1436D1065092BAC@NewLife>
	<4A3134EB.4080702@open-bio.org>
	<A53006055C854297AAA58F6650F4F867@NewLife>
Message-ID: <4A3FE1F1.40607@open-bio.org>

Sorry for the delay. Pdoc site has been updated with docs for 1.6.0 
release and latest code from bioperl-live. Also added bioperl-dev and 
bioperl-pise to the list.

Cheers,
Mauricio.


Mark A. Jensen wrote:
> cheers Mauricio! MAJ
> ----- Original Message ----- From: "Mauricio Herrera Cuadra" 
> <mauricio at open-bio.org>
> To: "Mark A. Jensen" <maj at fortinbras.us>
> Cc: "Chris Fields" <cjfields at illinois.edu>; "BioPerl List" 
> <bioperl-l at bioperl.org>
> Sent: Thursday, June 11, 2009 12:46 PM
> Subject: Re: [Bioperl-l] 1.6 on doc.bioperl.org?
> 
> 
>> Hi Mark,
>>
>> I'll take a look into this sometime between today and tomorrow. Will 
>> keep you posted. Thanks for the heads up :)
>>
>> Mauricio.
>>
>>
>> Mark A. Jensen wrote:
>>> Hi Chris and list-
>>> Will documentation for release 1.6 be available in pdoc on 
>>> doc.bioperl.org?
>>> I notice also that autogenerated documentation for bioperl-live 
>>> doesn't contain
>>> new modules (or HIVQuery & Tiling, anyway ;) )--
>>> cheers, Mark


From cjfields at illinois.edu  Mon Jun 22 20:29:46 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 22 Jun 2009 15:29:46 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<BF526BE6-AFA8-4F66-9665-D6328F9B4FFA@illinois.edu>
	<320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com>
	<1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>
	<320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com>
Message-ID: <CECB8B74-C423-4792-8857-0F3E40EAECE3@illinois.edu>

On Jun 22, 2009, at 9:24 AM, Peter wrote:

> On Wed, Jun 17, 2009 at 6:06 PM, Chris Fields wrote:
>> Peter wrote:
>>> Other issues to keep in mind:
>>>
>>> (3) There should be no warning parsing files where the optional  
>>> repeated
>>> title is missing on the "+" lines (as discussed earlier on the  
>>> BioPerl
>>> list).
>>
>> Agreed, though we'll have to check the current fastq parser to see  
>> if that's
>> currently the case.  I thought that was fixed but maybe not?
>>
>>> (4) When writing FASTQ files should BioPerl omit the optional  
>>> repeated
>>> title on the "+" line? Biopython omits this as I understand this  
>>> to be
>>> common practice, and can make a big difference to file sizes -  
>>> especially
>>> on short read data from Solexa/Illumina.
>>
>> Agreed, particularly if it's commonly encountered.
>>
>>> (5) Also test reading and writing files with an optional  
>>> description (as
>>> well as an identifier) on the "@" (and "+") lines. See the NCBI SRA
>>> for examples, e.g.
>>>
>>> @SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
>>> GGGTGATGGCCGCTGCCGATGGCGTCAAATCCCACC
>>> +SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
>>> IIIIIIIIIIIIIIIIIIIIIIIIIIIIII9IG9IC
>>
>> Should be easy enough to implement with a simple regex.
>>
>>> (6) Test reading and writing files where the encoded quality  
>>> string starts
>>> with a "@" or a "+" character, e.g.
>>> http://lists.open-bio.org/pipermail/bioperl-l/2009-May/029911.html
>>>
>>> Peter
>>
>> Mark, getting all that? ;>
>>
>> chris
>
> Another couple of points that I should have remembered earlier,
> related to converting between PHRED scores and Solexa scores.
> On the bright side, with Illumina abandoning the Solexa scores
> in pipeline 1.3+, these issues will go away with time:
>
> (7) If BioPerl will be converting Solexa scores to/from PHRED
> scores as integers automatically (as discussed earlier), make
> sure you round to the nearest whole number (don't just truncate
> with a call to int!). MAQ does this by adding 0.5 before calling
> int (while in Biopython I just use Python's round function).

That can probably be done with sprintf if needed.  It avoids a call to  
POSIX functions.
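
A minimal sketch comparing the options, in plain Perl with no BioPerl calls (the sub names are made up for illustration). The point to watch is negative input: int() truncates toward zero, so the "add 0.5" idiom only rounds correctly for non-negative values, which matters if Solexa scores (which can be negative) are being rounded rather than Phred scores. sprintf copes with negatives, though exact halves are typically rounded to the nearest even integer by the underlying C library.

```perl
use strict;
use warnings;

# Truncation: int() drops the fractional part (rounds toward zero).
sub truncate_score { return int($_[0]) }

# The "add 0.5 then int" idiom: rounds to nearest for non-negative input,
# but not for negative values, because int() truncates toward zero.
sub round_add_half { return int($_[0] + 0.5) }

# sprintf-based rounding: also rounds negative values to the nearest integer.
sub round_sprintf { return sprintf('%.0f', $_[0]) }

for my $q (9.7, 0.3, -4.6) {
    printf "%5.1f  int=%3d  int(x+0.5)=%3d  sprintf=%3d\n",
        $q, truncate_score($q), round_add_half($q), round_sprintf($q);
}
```

For -4.6 the second column stays at -4 while the sprintf column gives -5.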

> (8) When asked to write out an old Solexa style FASTQ file,
> what will you do if given a standard Sanger FASTQ file (or a
> new Illumina 1.3+ FASTQ file) containing a base with PHRED
> quality zero? This maps to a Solexa quality of minus infinity...
> Right now the development version of Biopython will throw an
> error in this situation, but mapping to the lowest observed
> Solexa score might be reasonable.
>
> Peter

Maybe address with a warning followed by assigning to the lowest  
solexa score?
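
Roughly what I have in mind, as an untested-in-BioPerl sketch. The sub name phred_to_solexa and the floor of -5 are assumptions for illustration only, not existing BioPerl API; the formula is the usual Q_solexa = 10 * log10(10**(Q_phred/10) - 1).

```perl
use strict;
use warnings;

# Lowest Solexa score to emit; -5 is assumed here as the floor.
my $MIN_SOLEXA = -5;

# Convert an integer Phred quality to an integer Solexa quality using
# Q_solexa = 10 * log10(10**(Q_phred/10) - 1).
sub phred_to_solexa {
    my ($phred) = @_;
    if ($phred <= 0) {
        # Phred 0 would map to minus infinity, so warn and clamp instead.
        warn "Phred quality $phred has no finite Solexa equivalent; "
           . "using $MIN_SOLEXA\n";
        return $MIN_SOLEXA;
    }
    my $solexa = 10 * log(10 ** ($phred / 10) - 1) / log(10);
    # Round to the nearest integer; sprintf handles negative values, and
    # the outer int() normalises a "-0" result to 0.
    my $rounded = int(sprintf('%.0f', $solexa));
    return $rounded < $MIN_SOLEXA ? $MIN_SOLEXA : $rounded;
}

print join(' ', map { phred_to_solexa($_) } 0, 1, 10, 40), "\n";
```

Whether the clamp should be silent for very low but non-zero Phred values (Phred 1 also falls below -5 before clamping) is a design choice; here only Phred 0 triggers the warning.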

chris


From cjfields at illinois.edu  Mon Jun 22 20:27:32 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 22 Jun 2009 15:27:32 -0500
Subject: [Bioperl-l] 1.6 on doc.bioperl.org?
In-Reply-To: <4A3FE1F1.40607@open-bio.org>
References: <17AD00895AFD43E1A1436D1065092BAC@NewLife>
	<4A3134EB.4080702@open-bio.org>
	<A53006055C854297AAA58F6650F4F867@NewLife>
	<4A3FE1F1.40607@open-bio.org>
Message-ID: <D9414186-E1DD-47B5-A0CF-9B96CD8151F8@illinois.edu>

np.  Thanks Mauricio!

chris

On Jun 22, 2009, at 2:56 PM, Mauricio Herrera Cuadra wrote:

> Sorry for the delay. Pdoc site has been updated with docs for 1.6.0  
> release and latest code from bioperl-live. Also added bioperl-dev  
> and bioperl-pise to the list.
>
> Cheers,
> Mauricio.
>
>
> Mark A. Jensen wrote:
>> cheers Mauricio! MAJ
>> ----- Original Message ----- From: "Mauricio Herrera Cuadra" <mauricio at open-bio.org 
>> >
>> To: "Mark A. Jensen" <maj at fortinbras.us>
>> Cc: "Chris Fields" <cjfields at illinois.edu>; "BioPerl List" <bioperl-l at bioperl.org 
>> >
>> Sent: Thursday, June 11, 2009 12:46 PM
>> Subject: Re: [Bioperl-l] 1.6 on doc.bioperl.org?
>>> Hi Mark,
>>>
>>> I'll take a look into this sometime between today and tomorrow.  
>>> Will keep you posted. Thanks for the heads up :)
>>>
>>> Mauricio.
>>>
>>>
>>> Mark A. Jensen wrote:
>>>> Hi Chris and list-
>>>> Will documentation for release 1.6 be available in pdoc on  
>>>> doc.bioperl.org?
>>>> I notice also that autogenerated documentation for bioperl-live  
>>>> doesn't contain
>>>> new modules (or HIVQuery & Tiling, anyway ;) )--
>>>> cheers, Mark


From maj at fortinbras.us  Tue Jun 23 02:46:58 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 22 Jun 2009 22:46:58 -0400
Subject: [Bioperl-l] announcing bioperl-max, a public AMI
In-Reply-To: <3F9CAC33-EE29-4843-8E21-532DC7569A7D@illinois.edu>
References: <57C9F3FC7A4A42189608BD882C03DB9C@NewLife>
	<3F9CAC33-EE29-4843-8E21-532DC7569A7D@illinois.edu>
Message-ID: <78130116A84C4D989F3BCC217E8C5ACE@NewLife>

Done-- fortinbras-public/bioperl-max-0.1.1 is at ami-b55dbbdc; rakudo cloned at 
00:44 UTC,
parrot @ r39729, bioperl-live @ 15800, nexml @ r1136.
cheers!
MAJ
----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "Mark A. Jensen" <maj at fortinbras.us>
Cc: "BioPerl List" <bioperl-l at bioperl.org>
Sent: Wednesday, June 10, 2009 12:36 AM
Subject: Re: [Bioperl-l] announcing bioperl-max, a public AMI


> I'll be trying that out, particularly re: bioperl-run. For bioperl-db  do you 
> have mysql or pg?
>
> Heh, I see Moose is installed.  Just need svn'd parrot and git updated  rakudo 
> and we could do some damage...
>
> chris
>
> On Jun 9, 2009, at 11:10 PM, Mark A. Jensen wrote:
>
>> Hi All,
>>
>> I've built a public Amazon machine image, loaded with many many
>> goodies, including the most recent (r15747) trunks of
>> - bioperl-live
>> - bioperl-run
>> - bioperl-db/biosql
>> The base machine is a public clean install of Ubuntu 8.04 Hardy/32-bit
>> by Alestic. Many fave tools, including blast, hmmer, hyphy, phyml,
>> emboss, and more are all there (and most even pass bioperl-run  tests), and
>> perl is loaded with Moose, XML::LibXML, XML::Compile, Bio::Phylo
>> (r1071) and others. This is *not* a lean mean fighting machine.
>>
>> Please give it a try if you're so inclined. Fuller details (including
>> image id and ssh-rsa signature) are at http://fortinbras.us/bioperl-max .
>>
>> Ping me if it doesn't work.
>>
>> Cheers,
>> Mark


From maj at fortinbras.us  Tue Jun 23 03:22:48 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 22 Jun 2009 23:22:48 -0400
Subject: [Bioperl-l] 1.6 on doc.bioperl.org?
In-Reply-To: <4A3FE1F1.40607@open-bio.org>
References: <17AD00895AFD43E1A1436D1065092BAC@NewLife><4A3134EB.4080702@open-bio.org><A53006055C854297AAA58F6650F4F867@NewLife>
	<4A3FE1F1.40607@open-bio.org>
Message-ID: <8B93DCE168434F608620AF17CAF12A9F@NewLife>

awesome, MHC- cheers and thanks-MAJ
----- Original Message ----- 
From: "Mauricio Herrera Cuadra" <mauricio at open-bio.org>
To: "Mark A. Jensen" <maj at fortinbras.us>
Cc: "Chris Fields" <cjfields at illinois.edu>; "BioPerl List" 
<bioperl-l at bioperl.org>
Sent: Monday, June 22, 2009 3:56 PM
Subject: Re: [Bioperl-l] 1.6 on doc.bioperl.org?


> Sorry for the delay. Pdoc site has been updated with docs for 1.6.0 release 
> and latest code from bioperl-live. Also added bioperl-dev and bioperl-pise to 
> the list.
>
> Cheers,
> Mauricio.
>
>
> Mark A. Jensen wrote:
>> cheers Mauricio! MAJ
>> ----- Original Message ----- From: "Mauricio Herrera Cuadra" 
>> <mauricio at open-bio.org>
>> To: "Mark A. Jensen" <maj at fortinbras.us>
>> Cc: "Chris Fields" <cjfields at illinois.edu>; "BioPerl List" 
>> <bioperl-l at bioperl.org>
>> Sent: Thursday, June 11, 2009 12:46 PM
>> Subject: Re: [Bioperl-l] 1.6 on doc.bioperl.org?
>>
>>
>>> Hi Mark,
>>>
>>> I'll take a look into this sometime between today and tomorrow. Will keep 
>>> you posted. Thanks for the heads up :)
>>>
>>> Mauricio.
>>>
>>>
>>> Mark A. Jensen wrote:
>>>> Hi Chris and list-
>>>> Will documentation for release 1.6 be available in pdoc on doc.bioperl.org?
>>>> I notice also that autogenerated documentation for bioperl-live doesn't 
>>>> contain
>>>> new modules (or HIVQuery & Tiling, anyway ;) )--
>>>> cheers, Mark


From pmr at ebi.ac.uk  Tue Jun 23 11:00:38 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Tue, 23 Jun 2009 12:00:38 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <CECB8B74-C423-4792-8857-0F3E40EAECE3@illinois.edu>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>	<BF526BE6-AFA8-4F66-9665-D6328F9B4FFA@illinois.edu>	<320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com>	<1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>	<320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com>
	<CECB8B74-C423-4792-8857-0F3E40EAECE3@illinois.edu>
Message-ID: <4A40B5D6.40504@ebi.ac.uk>

We just added FASTQ parsing to EMBOSS and faced the same issues.

Parsing was easy - find the '@' line, read sequence until the '+' line
is reached, then read (seqlen) quality characters ... and check the next
line starts with '@'
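
For anyone wanting to try the same approach in Perl, the steps above can be sketched roughly as follows. This is not the EMBOSS code and not an existing BioPerl parser; next_fastq_record is a made-up name, and the "next line starts with '@'" check happens at the start of the following call.

```perl
use strict;
use warnings;

# Read one FASTQ record from filehandle $fh. Returns a hash reference with
# id, seq and qual keys, or nothing at end of input.
sub next_fastq_record {
    my ($fh) = @_;

    # 1. Find the '@' title line (skipping blank lines).
    my $title;
    while (defined($title = <$fh>)) {
        chomp $title;
        last if $title =~ /\S/;
    }
    return unless defined $title;
    $title =~ /^\@(\S*)/ or die "Expected '\@' title line, got: $title\n";
    my $id = $1;

    # 2. Read sequence lines until the '+' line is reached.
    my $seq = '';
    my $saw_plus = 0;
    while (defined(my $line = <$fh>)) {
        chomp $line;
        if ($line =~ /^\+/) { $saw_plus = 1; last; }
        $seq .= $line;
    }
    die "No '+' line found for record $id\n" unless $saw_plus;

    # 3. Read quality lines until as many characters as the sequence has
    #    have been collected. Counting characters (rather than looking for
    #    the next '@') copes with quality strings starting with '@' or '+'.
    my $qual = '';
    while (length($qual) < length($seq)) {
        my $line = <$fh>;
        die "Truncated quality string for record $id\n" unless defined $line;
        chomp $line;
        $qual .= $line;
    }
    die "Quality string longer than sequence for record $id\n"
        if length($qual) != length($seq);

    return { id => $id, seq => $seq, qual => $qual };
}

# Usage: two records, one with a quality string starting with '@' and one
# with wrapped sequence and quality lines.
my $data = "\@r1 desc\nACGT\n+\n\@II+\n\@r2\nGG\nTT\n+r2\n+I\nII\n";
open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";
while (my $rec = next_fastq_record($fh)) {
    print "$rec->{id}\t$rec->{seq}\t$rec->{qual}\n";
}
```

Counting quality characters against the sequence length is what makes the awkward cases (quality lines beginning with '@' or '+') unambiguous.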

Quality scores are kept as phred values. Phred of 0 means unknown, which
in Solexa is -5 (0.75 error rate = could be anything). We assume lower
quality scores are from alignments rather than single reads.

We gave up on trying to guess the quality score standard and require
users to say whether they are sanger, solexa (1.0) or Illumina (1.3)
format files. If we only want the sequence then we don't care so we allow
"fastq" as a sequence format and ignore the quality scores in that case.

We also allow the integer quality score format ... is anyone still using
that? (It looks horrible to me :-)

Code is in the EMBOSS CVS, and will appear in release 6.1.0 on July 15th.

Any further tips would be very useful.

regards,

Peter Rice


From biopython at maubp.freeserve.co.uk  Tue Jun 23 11:29:56 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 23 Jun 2009 12:29:56 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <4A40B5D6.40504@ebi.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<BF526BE6-AFA8-4F66-9665-D6328F9B4FFA@illinois.edu>
	<320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com>
	<1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>
	<320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com>
	<CECB8B74-C423-4792-8857-0F3E40EAECE3@illinois.edu>
	<4A40B5D6.40504@ebi.ac.uk>
Message-ID: <320fb6e00906230429l77a2049eg35e47f94026111c6@mail.gmail.com>

On Tue, Jun 23, 2009 at 12:00 PM, Peter Rice<pmr at ebi.ac.uk> wrote:
>
> We just added FASTQ parsing to EMBOSS and faced the same issues.
>

I was going to chat to you about this at BOSC, and suggest this be
added to EMBOSS - but you are well ahead of me ;)

> Parsing was easy - find the '@' line, read sequence until the '+' line
> is reached, then read (seqlen) quality characters ... and check the next
> line starts with '@'

That is basically what I did for Biopython.

> Quality scores are kept as phred values. Phred of 0 means unknown,
> which in Solexa is -5 (0.75 error rate = could be anything).

A Phred quality of 0 means probability of error is 1, so yes, unknown. I don't
quite follow your leap that this corresponds to a Solexa quality of -5. Could
you clarify?

> We assume lower quality scores are from alignments rather than single reads.

Did you mean to say "higher quality scores" (i.e. lower probability of error),
e.g. a PHRED score of 80, which you can get from MAQ doing read mapping
or something consensus based?

> We gave up on trying to guess the quality score standard and require
> users to say whether they are sanger, solexa (1.0) or Illumina (1.3)
> format files. If we only want the sequence then we don't care so we allow
> "fastq" as a sequence format and ignore the quality scores in that case.

What format names have you used? Ideally we'd have the same names
in EMBOSS, BioPerl and Biopython (i.e. "fastq", "fastq-solexa", and
"fastq-illumina").

> We also allow the integer quality score format ... is anyone still using
> that (it looks horrible to me :-)

Do you mean the QUAL file format holding PHRED scores? Roche provide
tools to turn their SFF files into FASTA and QUAL files, so they are still used.

> Code is in the EMBOSS CVS, and will appear in release 6.1.0 on July 15th.
>
> Any further tips would be very useful.

Great. See you at BOSC 2009!

Peter
(Biopython)


From pmr at ebi.ac.uk  Tue Jun 23 12:22:33 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Tue, 23 Jun 2009 13:22:33 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <320fb6e00906230429l77a2049eg35e47f94026111c6@mail.gmail.com>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>	
	<BF526BE6-AFA8-4F66-9665-D6328F9B4FFA@illinois.edu>	
	<320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com>	
	<1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>	
	<320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com>	
	<CECB8B74-C423-4792-8857-0F3E40EAECE3@illinois.edu>	
	<4A40B5D6.40504@ebi.ac.uk>
	<320fb6e00906230429l77a2049eg35e47f94026111c6@mail.gmail.com>
Message-ID: <4A40C909.40803@ebi.ac.uk>

Peter wrote:
> On Tue, Jun 23, 2009 at 12:00 PM, Peter Rice<pmr at ebi.ac.uk> wrote:
>> We just added FASTQ parsing to EMBOSS and faced the same issues.
>>
> 
> I was going to chat to you about this at BOSC, and suggest this be
> added to EMBOSS - but you are well ahead of me ;)

Not that well ahead really ... someone asked for it in our BoF at
BOSC/ISMB last year so we thought we'd better get it done before this
one. It was implemented a couple of days ago :-)

>> Parsing was easy - find the '@' line, read sequence until the '+' line
>> is reached, then read (seqlen) quality characters ... and check the next
>> line starts with '@'
> 
> That is basically what I did for Biopython.
> 
>> Quality scores are kept as phred values. Phred of 0 means unknown,
>> which in Solexa is -5 (0.75 error rate = could be anything).
> 
> A Phred quality of 0 means probability of error is 1, so yes, unknown. I don't
> quite follow your leap that this corresponds to a Solexa quality of -5. Could
> you clarify?

Phred score is -10 log10(p) where p is the probability of error. A phred
of 0 implies 1.0 (certainty) of error, but 0.75 is a better estimate
(3/4 chance that any base you pick is wrong).

Solexa scores are -10 log10(p/(1-p)) so p=0.75 comes out at about -4.8,
which rounds to -5. This is why Solexa scores can go down to -5 in their
fastq format.
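
For reference, the two scales can be written out in Python. This is only a
sketch of the formulas quoted above, with base-10 logarithms assumed; the
function names are invented for the example and are not taken from EMBOSS,
BioPerl or Biopython.

```python
import math


def phred_from_error(p):
    """PHRED quality: Q = -10 * log10(p), p = probability of error."""
    return -10.0 * math.log10(p)


def solexa_from_error(p):
    """Solexa quality: Q = -10 * log10(p / (1 - p)), valid for 0 < p < 1."""
    return -10.0 * math.log10(p / (1.0 - p))


def error_from_phred(q):
    """Invert the PHRED formula."""
    return 10.0 ** (-q / 10.0)


def error_from_solexa(q):
    """Invert the Solexa formula: the score encodes the odds p / (1 - p)."""
    odds = 10.0 ** (-q / 10.0)
    return odds / (1.0 + odds)


def solexa_to_phred(q_solexa):
    """Convert a Solexa score to the PHRED scale via the error probability."""
    return 10.0 * math.log10(10.0 ** (q_solexa / 10.0) + 1.0)


if __name__ == "__main__":
    for p in (0.75, 0.5, 0.1, 0.0001):
        print("p=%g  phred=%.2f  solexa=%.2f"
              % (p, phred_from_error(p), solexa_from_error(p)))
```

A random base call (p = 0.75) gives a Solexa score of roughly -4.77, which
is where the -5 floor comes from; on the PHRED scale the same p is only
about 1.25, and the two scales agree closely once scores are above 10 or so.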

>> We assume lower quality scores are from alignments rather than single reads.
> 
> Did you mean to say "higher quality scores" (i.e. lower probability of error),
> e.g. a PHRED score of 80, which you can get from MAQ doing read mapping
> or something consensus based?

Actually I mean both. Error probabilities above 0.75 for a single base
are silly, and error probabilities below 0.0001 make sense only when two
or more high quality bases are aligned.

>> We gave up on trying to guess the quality score standard and require
>> users to say whether they are sanger, solexa (1.0) or Illumina (1.3)
>> format files. If we only want the sequence then we don't care so we allow
>> "fastq" as a sequence format and ignore the quality scores in that case.
> 
> What format names have you used? Ideally we'd have the same names
> in EMBOSS, BioPerl and Biopython (i.e. "fastq", "fastq-solexa", and
> "fastq-illumina").

We don't normally use '-' in our format names so we have fastqsanger,
fastqsolexa, fastqillumina and fastqint. None of these have been tried
on users as yet.

The '-' names look nice though. We can consider introducing them. Do you
have a full list of format names (sequence, feature, alignment, etc.) we
can try to conform to?

>> We also allow the integer quality score format ... is anyone still using
>> that (it looks horrible to me :-)
> 
> Do you mean the QUAL file format holding PHRED scores? Roche provide
> tools to turn their SFF files into FASTA and QUAL files, so they are still used.

Probably ... unless there is a Solexa version too.

regards,

Peter


From rmb32 at cornell.edu  Tue Jun 23 14:28:08 2009
From: rmb32 at cornell.edu (Robert Buels)
Date: Tue, 23 Jun 2009 07:28:08 -0700
Subject: [Bioperl-l] Anyone at YAPC?
In-Reply-To: <FE1E935F-C10A-471F-8E9A-27658F5216A2@illinois.edu>
References: <19007.33948.411442.197063@already.dhcp.gene.com>
	<FE1E935F-C10A-471F-8E9A-27658F5216A2@illinois.edu>
Message-ID: <4A40E678.8010709@cornell.edu>

Yep, YAPC is great!  This is my first one.  I saw a guy walking around 
here with a nametag that I thought said "Mark Jensen".  MAJ, are you here?

Rob

Chris Fields wrote:
> I think some of the regular #bioperl folk are there (Jay Hannah, R. 
> Buels, etc).  May be worth going on IRC to find everyone.
> 
> I'm giving serious thought to going next year if I can get enough work 
> done towards a perl6 or Moose-based bioperl.
> 
> chris
> 
> On Jun 22, 2009, at 8:18 AM, George Hartzell wrote:
> 
>>
>> I'm in Pittsburgh at YAPC 10 and would be happy to say hello to folks.
>>
>> g.
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> 


-- 
Robert Buels
Bioinformatics Analyst, Sol Genomics Network
Boyce Thompson Institute for Plant Research
Tower Rd
Ithaca, NY  14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu


From maj at fortinbras.us  Tue Jun 23 15:54:24 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 23 Jun 2009 11:54:24 -0400
Subject: [Bioperl-l] Anyone at YAPC?
In-Reply-To: <4A40E678.8010709@cornell.edu>
References: <19007.33948.411442.197063@already.dhcp.gene.com>
	<FE1E935F-C10A-471F-8E9A-27658F5216A2@illinois.edu>
	<4A40E678.8010709@cornell.edu>
Message-ID: <DD5C6FE6AC5842CEAA4487EEC65AC726@NewLife>

I think there are about 75000 of us; that one ain't me, I'm afraid. Maybe next 
year! cheers  MAJ
----- Original Message ----- 
From: "Robert Buels" <rmb32 at cornell.edu>
To: "bioperl-l List" <bioperl-l at bioperl.org>
Sent: Tuesday, June 23, 2009 10:28 AM
Subject: Re: [Bioperl-l] Anyone at YAPC?


> Yep, YAPC is great!  This is my first one.  I saw a guy walking around here 
> with a nametag that I thought said "Mark Jensen".  MAJ, are you here?
>
> Rob
>
> Chris Fields wrote:
>> I think some of the regular #bioperl folk are there (Jay Hannah, R. Buels, 
>> etc).  May be worth going on IRC to find everyone.
>>
>> I'm giving serious thought to going next year if I can get enough work done 
>> towards a perl6 or Moose-based bioperl.
>>
>> chris
>>
>> On Jun 22, 2009, at 8:18 AM, George Hartzell wrote:
>>
>>>
>>> I'm in Pittsburgh at YAPC 10 and would be happy to say hello to folks.
>>>
>>> g.
>>>
>>
>
>
> -- 
> Robert Buels
> Bioinformatics Analyst, Sol Genomics Network
> Boyce Thompson Institute for Plant Research
> Tower Rd
> Ithaca, NY  14853
> Tel: 503-889-8539
> rmb32 at cornell.edu
> http://www.sgn.cornell.edu
>
> 


From cjfields at illinois.edu  Tue Jun 23 20:34:48 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 23 Jun 2009 15:34:48 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <4A40C909.40803@ebi.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>	
	<BF526BE6-AFA8-4F66-9665-D6328F9B4FFA@illinois.edu>	
	<320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com>	
	<1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>	
	<320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com>	
	<CECB8B74-C423-4792-8857-0F3E40EAECE3@illinois.edu>	
	<4A40B5D6.40504@ebi.ac.uk>
	<320fb6e00906230429l77a2049eg35e47f94026111c6@mail.gmail.com>
	<4A40C909.40803@ebi.ac.uk>
Message-ID: <21116F70-93A3-4539-9BE2-61C838BA730E@illinois.edu>


On Jun 23, 2009, at 7:22 AM, Peter Rice wrote:

> Peter wrote:
> ...
>>> Parsing was easy - find the '@' line, read sequence until the '+' line
>>> is reached, then read (seqlen) quality characters ... and check the next
>>> line starts with '@'
>>
>> That is basically what I did for Biopython.

This is now what bioperl will do (at least when I commit changes today  
or tomorrow).

> ...
>>> We gave up on trying to guess the quality score standard and require
>>> users to say whether they are sanger, solexa (1.0) or Illumina (1.3)
>>> format files. If we only want the sequence then we don't care so we allow
>>> "fastq" as a sequence format and ignore the quality scores in that case.
>>
>> What format names have you used? Ideally we'd have the same names
>> in EMBOSS, BioPerl and Biopython (i.e. "fastq", "fastq-solexa", and
>> "fastq-illumina").
>
> We don't normally use '-' in our format names so we have fastqsanger,
> fastqsolexa, fastqillumina and fastqint. None of these have been tried
> on users as yet.
>
> The '-' names look nice though. We can consider introducing them. Do you
> have a full list of format names (sequence, feature, alignment, etc.) we
> can try to conform to?

We (bioperl) are using biopython's convention of format-variant, or at  
least that's how I'm coding it up.  With SeqIO it's fairly easy to  
check for the format variant prior to loading the class and pass it in  
as a second named parameter.

I have actually thought of adding in fastqint as an option (it would  
be fairly easy to do).
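
To make the format-variant idea concrete, here is a minimal Python sketch
(it is not BioPerl's or Biopython's actual code). The names split_format,
decode_quality and VARIANTS are hypothetical. It assumes that plain "fastq"
means the Sanger variant, and it uses the usual ASCII offsets: 33 for
Sanger, 64 for Solexa and 64 for Illumina 1.3+.

```python
# variant name -> (ASCII offset, score scale)
VARIANTS = {
    "sanger": (33, "phred"),
    "solexa": (64, "solexa"),
    "illumina": (64, "phred"),
}


def split_format(name):
    """Split 'fastq-illumina' into ('fastq', 'illumina').

    A name without a '-' has no variant, so the second item is None.
    """
    fmt, _, variant = name.lower().partition("-")
    return fmt, (variant or None)


def decode_quality(qual_string, format_name):
    """Return (list of integer scores, scale name) for a quality string."""
    fmt, variant = split_format(format_name)
    if fmt != "fastq":
        raise ValueError("Not a FASTQ format name: %r" % format_name)
    # Assumption for this sketch: bare "fastq" is treated as Sanger.
    offset, scale = VARIANTS[variant or "sanger"]
    return [ord(char) - offset for char in qual_string], scale


if __name__ == "__main__":
    print(split_format("fastq-illumina"))
    print(decode_quality("II", "fastq"))
    print(decode_quality("hh", "fastq-illumina"))
    print(decode_quality(";h", "fastq-solexa"))
```

Keeping the variant separate from the base format name means one parser
can serve all three encodings, with only the offset and scale changing.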

chris


From cjfields at illinois.edu  Tue Jun 23 21:04:25 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 23 Jun 2009 16:04:25 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <320fb6e00906230429l77a2049eg35e47f94026111c6@mail.gmail.com>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<BF526BE6-AFA8-4F66-9665-D6328F9B4FFA@illinois.edu>
	<320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com>
	<1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>
	<320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com>
	<CECB8B74-C423-4792-8857-0F3E40EAECE3@illinois.edu>
	<4A40B5D6.40504@ebi.ac.uk>
	<320fb6e00906230429l77a2049eg35e47f94026111c6@mail.gmail.com>
Message-ID: <49A4AD93-69FB-406E-8FFB-99C74A457402@illinois.edu>

Just so we're on the same page data-wise, would there be a common set  
of fastq data files to use for tests?  I am using some from SRA (which  
is all converted to Sanger).  Just need a few small ones for older  
solexa and newer illumina.

chris

On Jun 23, 2009, at 6:29 AM, Peter wrote:

> On Tue, Jun 23, 2009 at 12:00 PM, Peter Rice<pmr at ebi.ac.uk> wrote:
>
>> Code is in the EMBOSS CVS, and will appear in release 6.1.0 on July 15th.
>>
>> Any further tips would be very useful.
>
> Great. See you at BOSC 2009!
>
> Peter
> (Biopython)


From biopython at maubp.freeserve.co.uk  Tue Jun 23 21:39:48 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 23 Jun 2009 22:39:48 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <4A40C909.40803@ebi.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<BF526BE6-AFA8-4F66-9665-D6328F9B4FFA@illinois.edu>
	<320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com>
	<1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>
	<320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com>
	<CECB8B74-C423-4792-8857-0F3E40EAECE3@illinois.edu>
	<4A40B5D6.40504@ebi.ac.uk>
	<320fb6e00906230429l77a2049eg35e47f94026111c6@mail.gmail.com>
	<4A40C909.40803@ebi.ac.uk>
Message-ID: <320fb6e00906231439j3beecda0o6e75c4223e960871@mail.gmail.com>

On Tue, Jun 23, 2009 at 1:22 PM, Peter Rice<pmr at ebi.ac.uk> wrote:
> Peter wrote:
>> On Tue, Jun 23, 2009 at 12:00 PM, Peter Rice<pmr at ebi.ac.uk> wrote:
>>> We just added FASTQ parsing to EMBOSS and faced the same issues.
>>>
>>
>> I was going to chat to you about this at BOSC, and suggest this be
>> added to EMBOSS - but you are well ahead of me ;)
>
> Not that well ahead really ... someone asked for it in our BoF at
> BOSC/ISMB last year so we thought we'd better get it done before this
> one. It was implemented a couple of days ago :-)
>

Well, ahead of my asking!

>>> Quality scores are kept as phred values. Phred of 0 means unknown,
>>> which in Solexa is -5 (0.75 error rate = could be anything).
>>
>> A Phred quality of 0 means probability of error is 1, so yes, unknown. I don't
>> quite follow your leap that this corresponds to a Solexa quality of -5. Could
>> you clarify?
>
> Phred score is -10 log10(p) where p is the probability of error. A phred
> of 0 implies 1.0 (certainty) of error, but 0.75 is a better estimate
> (3/4 chance that any base you pick is wrong).
>
> Solexa scores are -10 log10(p/(1-p)) so p=0.75 comes out at about -4.8,
> which rounds to -5. This is why Solexa scores can go down to -5 in their
> fastq format.
>
>>> We assume lower quality scores are from alignments rather than
>>> single reads.
>>
>> Did you mean to say "higher quality scores" (i.e. lower probability of error),
>> e.g. a PHRED score of 80, which you can get from MAQ doing read mapping
>> or something consensus based?
>
> Actually I mean both. Error probabilities above 0.75 for a single base
> are silly, and error probabilities below 0.0001 make sense only when two
> or more high quality bases are aligned.

I see what you mean - a probability of error of 0.75 matches that
for a random base call, obvious when you put it like that. Of course,
there is this nasty little thought at the back of my mind that sooner
or later someone will use FASTQ files for proteins (e.g. from some
mass-spec protein sequencing).

A quality lower than that (e.g. a PHRED score of 0, i.e. an error
probability of 1) is actually worse than random and could be taken to
mean "we're pretty sure this isn't the stated letter". But that would be
silly, as you say.

>>> We gave up on trying to guess the quality score standard and require
>>> users to say whether they are sanger, solexa (1.0) or Illumina (1.3)
>>> format files. If we only want the sequence then we don't care so we allow
>>> "fastq" as a sequence format and ignore the quality scores in that case.
>>
>> What format names have you used? Ideally we'd have the same names
>> in EMBOSS, BioPerl and Biopython (i.e. "fastq", "fastq-solexa", and
>> "fastq-illumina").
>
> We don't normally use '-' in our format names so we have fastqsanger,
> fastqsolexa, fastqillumina and fastqint. None of these have been tried
> on users as yet.
>
> The '-' names look nice though. We can consider introducing them. Do you
> have a full list of format names (sequence, feature, alignment, etc.) we
> can try to conform to?

See http://biopython.org/wiki/SeqIO and http://biopython.org/wiki/AlignIO

Getting EMBOSS to conform should be trivial - in general when
picking a format name for Biopython's SeqIO or AlignIO (and we
have avoided multiple aliases with one exception) we have tried to
use anything shared by BioPerl and EMBOSS. The FASTQ variants
are unusual in that Biopython got to invent some names.

In future where would be a good place to discuss these kinds of
cross-platform issues? (i.e. BioPerl, Biopython, EMBOSS, etc).

>>> We also allow the integer quality score format ... is anyone still
>>> using that (it looks horrible to me :-)
>>
>> Do you mean the QUAL file format holding PHRED scores?
>> Roche provide tools to turn their SFF files into FASTA and
>> QUAL files, so they are still used.
>
> Probably ... unless there is a Solexa version too.

We may be talking at cross purposes here, this is QUAL format:
http://www.bioperl.org/wiki/Qual_sequence_format

Peter


From pmr at ebi.ac.uk  Wed Jun 24 11:48:23 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Wed, 24 Jun 2009 12:48:23 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <320fb6e00906231439j3beecda0o6e75c4223e960871@mail.gmail.com>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>	
	<BF526BE6-AFA8-4F66-9665-D6328F9B4FFA@illinois.edu>	
	<320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com>	
	<1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>	
	<320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com>	
	<CECB8B74-C423-4792-8857-0F3E40EAECE3@illinois.edu>	
	<4A40B5D6.40504@ebi.ac.uk>	
	<320fb6e00906230429l77a2049eg35e47f94026111c6@mail.gmail.com>	
	<4A40C909.40803@ebi.ac.uk>
	<320fb6e00906231439j3beecda0o6e75c4223e960871@mail.gmail.com>
Message-ID: <4A421287.4000203@ebi.ac.uk>

Peter wrote:
> On Tue, Jun 23, 2009 at 1:22 PM, Peter Rice<pmr at ebi.ac.uk> wrote:
>> The '-' names look nice though. We can consider introducing them. Do you
>> have a full list of format names (sequence, feature, alignment, etc.) we
>> can try to conform to?
> 
> See http://biopython.org/wiki/SeqIO and http://biopython.org/wiki/AlignIO

Thanks. I'll take a look at those.

> Getting EMBOSS to conform should be trivial - in general when
> picking a format name for Biopython's SeqIO or AlignIO (and we
> have avoided multiple aliases with one exception) we have tried to
> use anything shared by BioPerl and EMBOSS. The FASTQ variants
> are unusual in that Biopython got to invent some names.
> 
> In future where would be a good place to discuss these kinds of
> cross-platform issues? (i.e. BioPerl, Biopython, EMBOSS, etc).

I was planning to suggest a get-together at BOSC in Stockholm so we can
identify common cross-platform issues. I'm sure there are many ways we
can conform with naming and interfaces and perhaps even share code.

>>>> We also allow the integer quality score format ... is anyone still
>>>> using that (it looks horrible to me :-)
>>> Do you mean the QUAL file format holding PHRED scores?
>>> Roche provide tools to turn their SFF files into FASTA and
>>> QUAL files, so they are still used.
>> Probably ... unless there is a Solexa version too.
> 
> We may be talking at cross purposes here, this is QUAL format:
> http://www.bioperl.org/wiki/Qual_sequence_format

Yes that is different. We'll worry about separate QUAL files later (we
already find separate GFF files a pain for features) and stick with the
"fastqint" format name.

regards,

Peter


From biopython at maubp.freeserve.co.uk  Wed Jun 24 14:56:13 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 24 Jun 2009 15:56:13 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <4A421287.4000203@ebi.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com>
	<1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>
	<320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com>
	<CECB8B74-C423-4792-8857-0F3E40EAECE3@illinois.edu>
	<4A40B5D6.40504@ebi.ac.uk>
	<320fb6e00906230429l77a2049eg35e47f94026111c6@mail.gmail.com>
	<4A40C909.40803@ebi.ac.uk>
	<320fb6e00906231439j3beecda0o6e75c4223e960871@mail.gmail.com>
	<4A421287.4000203@ebi.ac.uk>
Message-ID: <320fb6e00906240756m757ee12nb53ecc527b8e2480@mail.gmail.com>

On Wed, Jun 24, 2009 at 12:48 PM, Peter Rice<pmr at ebi.ac.uk> wrote:
>
> I was planning to suggest a get-together at BOSC in Stockholm so we can
> identify common cross-platform issues. I'm sure there are many ways we
> can conform with naming and interfaces and perhaps even share code.
>

That would be a good idea - but while there are quite a few Biopython
people at BOSC this year, I don't know if there will be many from BioPerl
(there isn't a BioPerl update talk scheduled).

>>>>> We also allow the integer quality score format ... is anyone still
>>>>> using that (it looks horrible to me :-)
>>>> Do you mean the QUAL file format holding PHRED scores?
>>>> Roche provide tools to turn their SFF files into FASTA and
>>>> QUAL files, so they are still used.
>>> Probably ... unless there is a Solexa version too.
>>
>> We may be talking at cross purposes here, this is QUAL format:
>> http://www.bioperl.org/wiki/Qual_sequence_format
>
> Yes that is different. We'll worry about separate QUAL files later (we
> already find separate GFF files a pain for features) and stick with the
> "fastqint" format name.

So when you say "fastqint" are you talking about something else?
Could you show us an example record in this format?

Peter
[I need to remember to proof read my evening emails more carefully]


From vecchi.b at gmail.com  Wed Jun 24 16:13:02 2009
From: vecchi.b at gmail.com (Bruno Vecchi)
Date: Wed, 24 Jun 2009 13:13:02 -0300
Subject: [Bioperl-l] Fwd: Hackathon tomorrow (I think)
In-Reply-To: <54655E2F-0D4B-4F4D-8608-13BB052FE201@jays.net>
References: <8F66D28E-CD28-4A8D-83B2-0FFCA611F58F@jays.net>
	<54655E2F-0D4B-4F4D-8608-13BB052FE201@jays.net>
Message-ID: <1a0c1b750906240913g5e113dadxe3d98d3fce3a74d@mail.gmail.com>

Jay asked me to forward this to the list, since he sometimes has problems
getting his mails delivered.
Feel free to suggest topics for the bioperl hackathon to take place tomorrow
and on Friday!

Bruno.


From: Jay Hannah <jay at jays.net>
Date: June 24, 2009 11:55:42 AM EDT
To: Bioperl <bioperl-l at bioperl.org>
Subject: Hackathon tomorrow (I think)

Hola,

So a few of us here at YAPC might try to be productive tomorrow (and
Friday?).

I don't know if we have any commit bits attending.

Feel free to suggest things:

  http://yapc10.org/yn2009/wiki?node=BioPerl

Or point me to list(s) of things. Perhaps we'll try to help out in Bugzilla.

Come yell at me (us?) in IRC:

  http://www.bioperl.org/wiki/Irc

Thanks,

Jay Hannah
http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah


From cjfields at illinois.edu  Wed Jun 24 16:22:57 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 24 Jun 2009 11:22:57 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <320fb6e00906240756m757ee12nb53ecc527b8e2480@mail.gmail.com>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<320fb6e00906170625s2ce16596j1683342840d1eec4@mail.gmail.com>
	<1EF8DD47-3641-45DB-B425-98462598E922@illinois.edu>
	<320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com>
	<CECB8B74-C423-4792-8857-0F3E40EAECE3@illinois.edu>
	<4A40B5D6.40504@ebi.ac.uk>
	<320fb6e00906230429l77a2049eg35e47f94026111c6@mail.gmail.com>
	<4A40C909.40803@ebi.ac.uk>
	<320fb6e00906231439j3beecda0o6e75c4223e960871@mail.gmail.com>
	<4A421287.4000203@ebi.ac.uk>
	<320fb6e00906240756m757ee12nb53ecc527b8e2480@mail.gmail.com>
Message-ID: <6DCEA6E1-8C53-4C04-9F8A-E336BE5753B6@illinois.edu>


On Jun 24, 2009, at 9:56 AM, Peter wrote:

> On Wed, Jun 24, 2009 at 12:48 PM, Peter Rice<pmr at ebi.ac.uk> wrote:
>>
>> I was planning to suggest a get-together at BOSC in Stockholm so we can
>> identify common cross-platform issues. I'm sure there are many ways we
>> can conform with naming and interfaces and perhaps even share code.
>>
>
> That would be a good idea - but while there are quite a few Biopython
> people at BOSC this year, I don't know if there will be many from BioPerl
> (there isn't a BioPerl update talk scheduled).

Most of us are caught up with other work, though I will likely be able  
to dedicate more time to it in the next few months.

Also doesn't help that my travel stipend doesn't start until Aug. 1.

>>>>>> We also allow the integer quality score format ... is anyone still
>>>>>> using that (it looks horrible to me :-)
>>>>> Do you mean the QUAL file format holding PHRED scores?
>>>>> Roche provide tools to turn their SFF files into FASTA and
>>>>> QUAL files, so they are still used.
>>>> Probably ... unless there is a Solexa version too.
>>>
>>> We may be talking at cross purposes here, this is QUAL format:
>>> http://www.bioperl.org/wiki/Qual_sequence_format
>>
>> Yes that is different. We'll worry about separate QUAL files later (we
>> already find separate GFF files a pain for features) and stick with the
>> "fastqint" format name.
>
> So when you say "fastqint" are you talking about something else?
> Could you show us an example record in this format?
>
> Peter
> [I need to remember to proof read my evening emails more carefully]

The same as fastq, except the ASCII quality is converted to actual  
score:

@4_1_912_360
AAGGGGCTAGAGAAACACGTAATGAAGGGAGGACTC
+4_1_912_360
40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 21 40 40 40 40 40 40 40 40 40 26 40 40 14 39 40 40
@4_1_54_483
TAATAAATGTGCTTCCTTGATGCATGTGCTATGATT
+4_1_54_483
40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 16 40 40 40 28 40 40 40 40 40 40 16 40 40 5 40 40

chris
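
As a small illustration of how such integer records relate to ordinary
FASTQ, the space-separated scores can be turned back into a Sanger-style
quality string in Python. This is a sketch only: the function name is made
up, and it assumes the integers are PHRED scores to be encoded with the
Sanger offset of 33 (the thread does not say which scale the example uses).

```python
SANGER_OFFSET = 33   # ASCII offset of the Sanger FASTQ variant
MAX_SANGER = 93      # highest score that fits in printable ASCII (126 - 33)


def int_quals_to_sanger(qual_text):
    """Encode whitespace-separated integer scores as a Sanger quality string.

    qual_text may contain line breaks, as in the wrapped example above.
    """
    scores = [int(token) for token in qual_text.split()]
    for score in scores:
        if not 0 <= score <= MAX_SANGER:
            raise ValueError("Score %d outside 0..%d" % (score, MAX_SANGER))
    return "".join(chr(score + SANGER_OFFSET) for score in scores)


if __name__ == "__main__":
    # Quality block of the first example record, split as in the email.
    quals = ("40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 21 40 40 40 "
             "40 40 40 40 40 40 26 40 40 14 39 40 40")
    encoded = int_quals_to_sanger(quals)
    print(encoded)
    print(len(encoded))
```

Because the scores are split on whitespace, a quality block that has been
wrapped over several lines is handled the same way as a single long line.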


From cjfields at illinois.edu  Wed Jun 24 16:26:22 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 24 Jun 2009 11:26:22 -0500
Subject: [Bioperl-l] Fwd: Hackathon tomorrow (I think)
In-Reply-To: <1a0c1b750906240913g5e113dadxe3d98d3fce3a74d@mail.gmail.com>
References: <8F66D28E-CD28-4A8D-83B2-0FFCA611F58F@jays.net>
	<54655E2F-0D4B-4F4D-8608-13BB052FE201@jays.net>
	<1a0c1b750906240913g5e113dadxe3d98d3fce3a74d@mail.gmail.com>
Message-ID: <F01E3C47-6651-488C-91F6-73F42A5A65F4@illinois.edu>

1) Any help towards bugzilla fixes would be most welcome.
2) Better GFF3 integration
3) Typed but lightweight seqfeatures
4) Bio::Moose?

I can dedicate more time to the latter two in about a month, but I'll  
be tied up until then.  Let me know if anyone needs collab on biomoose  
on github; Mark Jensen's already added.

chris

On Jun 24, 2009, at 11:13 AM, Bruno Vecchi wrote:

> Jay asked me to forward this to the list, since he sometimes has problems
> getting his mails delivered.
> Feel free to suggest topics for the bioperl hackathon to take place tomorrow
> and on friday!
>
> Bruno.
>
>
> From: Jay Hannah <jay at jays.net>
> Date: June 24, 2009 11:55:42 AM EDT
> To: Bioperl <bioperl-l at bioperl.org>
> Subject: Hackathon tomorrow (I think)
>
> Hola,
>
> So a few of us here at YAPC might try to be productive tomorrow (and
> Friday?).
>
> I don't know if we have any commit bits attending.
>
> Feel free to suggest things:
>
>  http://yapc10.org/yn2009/wiki?node=BioPerl
>
> Or point me to list(s) of things. Perhaps we'll try to help out in Bugzilla.
>
> Come yell at me (us?) in IRC:
>
>  http://www.bioperl.org/wiki/Irc
>
> Thanks,
>
> Jay Hannah
> http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah


From biopython at maubp.freeserve.co.uk  Wed Jun 24 16:27:39 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 24 Jun 2009 17:27:39 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <6DCEA6E1-8C53-4C04-9F8A-E336BE5753B6@illinois.edu>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com>
	<CECB8B74-C423-4792-8857-0F3E40EAECE3@illinois.edu>
	<4A40B5D6.40504@ebi.ac.uk>
	<320fb6e00906230429l77a2049eg35e47f94026111c6@mail.gmail.com>
	<4A40C909.40803@ebi.ac.uk>
	<320fb6e00906231439j3beecda0o6e75c4223e960871@mail.gmail.com>
	<4A421287.4000203@ebi.ac.uk>
	<320fb6e00906240756m757ee12nb53ecc527b8e2480@mail.gmail.com>
	<6DCEA6E1-8C53-4C04-9F8A-E336BE5753B6@illinois.edu>
Message-ID: <320fb6e00906240927u62588e22je8186bb172e904a1@mail.gmail.com>

On Wed, Jun 24, 2009 at 5:22 PM, Chris Fields<cjfields at illinois.edu> wrote:
>> So when you say "fastqint" are you talking about something else?
>> Could you show us an example record in this format?
>>
>> Peter
>
> The same as fastq, except the ASCII quality is converted to actual score:
>
> @4_1_912_360
> AAGGGGCTAGAGAAACACGTAATGAAGGGAGGACTC
> +4_1_912_360
> 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 21 40 40 40 40 40 40 40 40 40 26 40 40 14 39 40 40
> @4_1_54_483
> TAATAAATGTGCTTCCTTGATGCATGTGCTATGATT
> +4_1_54_483
> 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 16 40 40 40 28 40 40 40 40 40 40 16 40 40 5 40 40

OK - and who uses these "Integer FASTQ" files?

Peter


From vecchi.b at gmail.com  Wed Jun 24 16:40:50 2009
From: vecchi.b at gmail.com (Bruno Vecchi)
Date: Wed, 24 Jun 2009 13:40:50 -0300
Subject: [Bioperl-l] Fwd: Hackathon tomorrow (I think)
In-Reply-To: <1a0c1b750906240938n23bbe8b1h42d99b6e345fee49@mail.gmail.com>
References: <8F66D28E-CD28-4A8D-83B2-0FFCA611F58F@jays.net>
	<54655E2F-0D4B-4F4D-8608-13BB052FE201@jays.net>
	<1a0c1b750906240913g5e113dadxe3d98d3fce3a74d@mail.gmail.com>
	<F01E3C47-6651-488C-91F6-73F42A5A65F4@illinois.edu>
	<1a0c1b750906240938n23bbe8b1h42d99b6e345fee49@mail.gmail.com>
Message-ID: <1a0c1b750906240940t7c0003f9hf10eb30c0d85a5ce@mail.gmail.com>

>
> Is there a todo list for biomoose? I'd be glad to hack in, but I'm afraid
> to step into someone else's work or to do things without general agreement.
> It would be nice to have directions for small sized chunks of work to do.
> In any case, count me in!
>
> 2009/6/24 Chris Fields <cjfields at illinois.edu>
>
> 1) Any help towards bugzilla fixes would be most welcome.
>> 2) Better GFF3 integration
>> 3) Typed but lightweight seqfeatures
>> 4) Bio::Moose?
>>
>> I can dedicate more time to the latter two in about a month, but I'll be
>> tied up until then.  Let me know if anyone needs collab on biomoose on
>> github; Mark Jensen's already added.
>>
>> chris
>>
>>
>> On Jun 24, 2009, at 11:13 AM, Bruno Vecchi wrote:
>>
>>  Jay asked me to forward this to the list, since he sometimes has problems
>>> getting his mails delivered.
>>> Feel free to suggest topics for the bioperl hackathon to take place
>>> tomorrow
>>> and on friday!
>>>
>>> Bruno.
>>>
>>>
>>> From: Jay Hannah <jay at jays.net>
>>> Date: June 24, 2009 11:55:42 AM EDT
>>> To: Bioperl <bioperl-l at bioperl.org>
>>> Subject: Hackathon tomorrow (I think)
>>>
>>> Hola,
>>>
>>> So a few of us here at YAPC might try to be productive tomorrow (and
>>> Friday?).
>>>
>>> I don't know if we have any commit bits attending.
>>>
>>> Feel free to suggest things:
>>>
>>>  http://yapc10.org/yn2009/wiki?node=BioPerl
>>>
>>> Or point me to list(s) of things. Perhaps we'll try to help out in
>>> Bugzilla.
>>>
>>> Come yell at me (us?) in IRC:
>>>
>>>  http://www.bioperl.org/wiki/Irc
>>>
>>> Thanks,
>>>
>>> Jay Hannah
>>> http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah
>>>
>>
>>
>


From jay at jays.net  Wed Jun 24 16:44:51 2009
From: jay at jays.net (Jay Hannah)
Date: Wed, 24 Jun 2009 12:44:51 -0400
Subject: [Bioperl-l] Fwd: Hackathon tomorrow (I think)
In-Reply-To: <F01E3C47-6651-488C-91F6-73F42A5A65F4@illinois.edu>
References: <8F66D28E-CD28-4A8D-83B2-0FFCA611F58F@jays.net>
	<54655E2F-0D4B-4F4D-8608-13BB052FE201@jays.net>
	<1a0c1b750906240913g5e113dadxe3d98d3fce3a74d@mail.gmail.com>
	<F01E3C47-6651-488C-91F6-73F42A5A65F4@illinois.edu>
Message-ID: <FEF61DB9-34A6-4884-BCAF-1CAC83E809F1@jays.net>

On Jun 24, 2009, at 12:26 PM, Chris Fields wrote:
> Let me know if anyone needs collab on biomoose on github; Mark Jensen's
> already added.

Anything on github should be trivial, even with no perms -- we can  
just fork and then send you (whoever) pull requests. github++  :)

> 1) Any help towards bugzilla fixes would be most welcome.

I don't know how to make any progress in bugzilla if no one has a  
commit bit...?

> 2) Better GFF3 integration
> 3) Typed but lightweight seqfeatures

Are there bugzilla tickets (or somewhere) describing those?

I wonder if anyone can help me get out of sporadic MailMan purgatory...

Thanks,

j


From cjfields at illinois.edu  Wed Jun 24 16:54:06 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 24 Jun 2009 11:54:06 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <320fb6e00906240927u62588e22je8186bb172e904a1@mail.gmail.com>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com>
	<CECB8B74-C423-4792-8857-0F3E40EAECE3@illinois.edu>
	<4A40B5D6.40504@ebi.ac.uk>
	<320fb6e00906230429l77a2049eg35e47f94026111c6@mail.gmail.com>
	<4A40C909.40803@ebi.ac.uk>
	<320fb6e00906231439j3beecda0o6e75c4223e960871@mail.gmail.com>
	<4A421287.4000203@ebi.ac.uk>
	<320fb6e00906240756m757ee12nb53ecc527b8e2480@mail.gmail.com>
	<6DCEA6E1-8C53-4C04-9F8A-E336BE5753B6@illinois.edu>
	<320fb6e00906240927u62588e22je8186bb172e904a1@mail.gmail.com>
Message-ID: <5C050D50-5723-4C93-89D0-7280D202E737@illinois.edu>


On Jun 24, 2009, at 11:27 AM, Peter wrote:

> On Wed, Jun 24, 2009 at 5:22 PM, Chris Fields<cjfields at illinois.edu>  
> wrote:
>>> So when you say "fastqint" are you talking about something else?
>>> Could you show us an example record in this format?
>>>
>>> Peter
>>
>> The same as fastq, except the ASCII quality is converted to actual  
>> score:
>>
>> @4_1_912_360
>> AAGGGGCTAGAGAAACACGTAATGAAGGGAGGACTC
>> +4_1_912_360
>> 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 21 40 40  
>> 40 40 40
>> 40 40 40 40 26 40 40 14 39 40 40
>> @4_1_54_483
>> TAATAAATGTGCTTCCTTGATGCATGTGCTATGATT
>> +4_1_54_483
>> 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 16 40 40  
>> 40 28 40
>> 40 40 40 40 40 16 40 40 5 40 40
>
> OK - and who uses this "Integer FASTQ" files?
>
> Peter

Not sure, but it is covered by MAQ via the conversion script (as FASTQ- 
int):

http://maq.sourceforge.net/fq_all2std.pl
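
For anyone unfamiliar with the format, the relationship between the two
encodings can be sketched in plain Perl. This is only an illustration, not
the MAQ script itself: the sub names are made up, and the default offset of
33 is an assumption (Sanger-style FASTQ; Solexa/Illumina 1.x files would
need 64 instead).

```perl
use strict;
use warnings;

# Turn an ASCII-encoded FASTQ quality string into space-separated integers.
# $offset is an assumption about the input: 33 (Sanger) unless given.
sub qual_to_int {
    my ($qual, $offset) = @_;
    $offset = 33 unless defined $offset;
    return join ' ', map { ord($_) - $offset } split //, $qual;
}

# The reverse: space-separated integer scores back to an ASCII string.
sub int_to_qual {
    my ($ints, $offset) = @_;
    $offset = 33 unless defined $offset;
    return join '', map { chr($_ + $offset) } split ' ', $ints;
}

# Round trip on a short, made-up quality string.
my $ints = qual_to_int('II5I');
print "$ints\n";
print int_to_qual($ints), "\n";
```

The only real difference between fastq and "fastqint" records is therefore
the quality line; the header and sequence lines are untouched.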

chris


From jay at jays.net  Wed Jun 24 15:55:42 2009
From: jay at jays.net (Jay Hannah)
Date: Wed, 24 Jun 2009 11:55:42 -0400
Subject: [Bioperl-l] Hackathon tomorrow (I think)
Message-ID: <8F66D28E-CD28-4A8D-83B2-0FFCA611F58F@jays.net>

Hola,

So a few of us here at YAPC might try to be productive tomorrow (and  
Friday?).

I don't know if we have any commit bits attending.

Feel free to suggest things:

    http://yapc10.org/yn2009/wiki?node=BioPerl

Or point me to list(s) of things. Perhaps we'll try to help out in  
Bugzilla.

Come yell at me (us?) in IRC:

    http://www.bioperl.org/wiki/Irc

Thanks,

Jay Hannah
http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah


From bernd.web at gmail.com  Wed Jun 24 17:11:51 2009
From: bernd.web at gmail.com (Bernd Web)
Date: Wed, 24 Jun 2009 19:11:51 +0200
Subject: [Bioperl-l] Bioperl_scripts
Message-ID: <716af09c0906241011s6cfe0698u6d3ba5f69f730d4c@mail.gmail.com>

Hi,

The bioperl scripts section at
http://www.bioperl.org/wiki/Bioperl_scripts is really handy for short
examples.
However, quite a number of scripts can no longer be found and return errors:

For example for the first link (scripts/install_bioperl_scripts.pl)
Filesystem has no item: File not found: revision 15800, path
'/bioperl-live/trunk/scripts/install_bioperl_scripts.pl' at
/usr/lib/perl5/site_perl/5.8.8/SVN/Web/action.pm line 245

Also all scripts in the Bio::Graphics section cannot be found.
Is the http://www.bioperl.org/wiki/Bioperl_scripts page still supported?


Regards,
Bernd


From cjfields at illinois.edu  Wed Jun 24 20:57:51 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 24 Jun 2009 15:57:51 -0500
Subject: [Bioperl-l] Bioperl_scripts
In-Reply-To: <716af09c0906241011s6cfe0698u6d3ba5f69f730d4c@mail.gmail.com>
References: <716af09c0906241011s6cfe0698u6d3ba5f69f730d4c@mail.gmail.com>
Message-ID: <5AF99205-F977-45A1-B4AF-C3858A5727FD@illinois.edu>


On Jun 24, 2009, at 12:11 PM, Bernd Web wrote:

> Hi,
>
> The bioperl scripts section at
> http://www.bioperl.org/wiki/Bioperl_scripts is really handy for short
> examples.
> However, it quite a number of scripts cannot be found anymore and  
> return errors:
>
> For example for the first link (scripts/install_bioperl_scripts.pl)
> Filesystem has no item: File not found: revision 15800, path
> '/bioperl-live/trunk/scripts/install_bioperl_scripts.pl' at
> /usr/lib/perl5/site_perl/5.8.8/SVN/Web/action.pm line 245
>
> Also all scripts in the Bio::Graphics section cannot be found.
> Is the http://www.bioperl.org/wiki/Bioperl_scripts page still  
> supported?
>
> Regards,
> Bernd

Re: Bio::Graphics, all modules and related scripts have been moved to  
a separate repo and CPAN release (latest):

http://search.cpan.org/~lds/Bio-Graphics-1.96/

Beyond that I would consider all scripts and the wiki page supported.
It's best to file this in bugzilla as a documentation issue so we fix
it and don't forget about it amongst the flurry of email.

chris


From cjfields at illinois.edu  Wed Jun 24 21:10:34 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 24 Jun 2009 16:10:34 -0500
Subject: [Bioperl-l] Fwd: Hackathon tomorrow (I think)
In-Reply-To: <FEF61DB9-34A6-4884-BCAF-1CAC83E809F1@jays.net>
References: <8F66D28E-CD28-4A8D-83B2-0FFCA611F58F@jays.net>
	<54655E2F-0D4B-4F4D-8608-13BB052FE201@jays.net>
	<1a0c1b750906240913g5e113dadxe3d98d3fce3a74d@mail.gmail.com>
	<F01E3C47-6651-488C-91F6-73F42A5A65F4@illinois.edu>
	<FEF61DB9-34A6-4884-BCAF-1CAC83E809F1@jays.net>
Message-ID: <20E946BE-D9E1-4815-991A-7817E0655879@illinois.edu>


On Jun 24, 2009, at 11:44 AM, Jay Hannah wrote:

> On Jun 24, 2009, at 12:26 PM, Chris Fields wrote:
>> Let me know if anyone needs collab on biomoose on github; Mark  
>> Jensen's already added.
>
> Anything on github should be trivial, even with no perms -- we can  
> just fork and then send you (whoever) pull requests. github++  :)
>
>> 1) Any help towards bugzilla fixes would be most welcome.
>
> I don't know how to make any progress in bugzilla if no one has a  
> commit bit...?

For some reason I thought you had a commit bit; we can add you in if  
needed.  Anyway, patches are most definitely welcome ;>

>> 2) Better GFF3 integration
>> 3) Typed but lightweight seqfeatures
>
> Are there bugzilla tickets (or somewhere) describing those?

No, as the issues are more complex than a single bug, but we do have
something to help track them for the time being:

http://www.bioperl.org/wiki/GFF_Refactor
http://www.bioperl.org/wiki/Align_Refactor

I'll probably file TODOs during the process for those refactors.  The
easiest to tackle would probably be the Align/LocatableSeq refactors.

> I wonder if anyone can help me get out of sporadic MailMan  
> purgatory...
>
> Thanks,
>
> j

-c

PS - Don't feel constrained by the above.  There are many many areas  
to contribute to.


From pmr at ebi.ac.uk  Wed Jun 24 22:44:33 2009
From: pmr at ebi.ac.uk (Peter Rice)
Date: Wed, 24 Jun 2009 23:44:33 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <5C050D50-5723-4C93-89D0-7280D202E737@illinois.edu>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com>
	<CECB8B74-C423-4792-8857-0F3E40EAECE3@illinois.edu>
	<4A40B5D6.40504@ebi.ac.uk>
	<320fb6e00906230429l77a2049eg35e47f94026111c6@mail.gmail.com>
	<4A40C909.40803@ebi.ac.uk>
	<320fb6e00906231439j3beecda0o6e75c4223e960871@mail.gmail.com>
	<4A421287.4000203@ebi.ac.uk>
	<320fb6e00906240756m757ee12nb53ecc527b8e2480@mail.gmail.com>
	<6DCEA6E1-8C53-4C04-9F8A-E336BE5753B6@illinois.edu>
	<320fb6e00906240927u62588e22je8186bb172e904a1@mail.gmail.com>
	<5C050D50-5723-4C93-89D0-7280D202E737@illinois.edu>
Message-ID: <4A42AC51.3090809@ebi.ac.uk>

Chris Fields wrote:
> Not sure, but it is covered by MAQ via the conversion script (as 
> FASTQ-int):

Are the scores phred or Solexa?

Peter Rice


From adlai at refenestration.com  Thu Jun 25 02:08:31 2009
From: adlai at refenestration.com (Adlai Burman)
Date: Thu, 25 Jun 2009 04:08:31 +0200
Subject: [Bioperl-l] Extreme newbie question.
Message-ID: <99C689C0-B999-45CB-8B99-0CABCC69AB77@refenestration.com>

I have been trying to install BioPerl for a while now, and after
pummeling my hard drive (Mac OS 10.5 Intel) with several attempts at a
Fink installation, a cpan installation, and removing my .cpan folder, I
am still at square 0. I do not want to do any more damage to my
computer, yet I really need a working install (especially to interface
with remote DBs like GenBank). Can anyone give me some advice here?
After each attempt, I have tried to run perldoc bptutorial.pl and
tried test scripts with "use Bio::Perl" in the headers, and I just
receive error messages like the following:

Can't locate Bio/Perl.pm in @INC (@INC contains:
/home/users/dag/lib/perl5/
/Library/Perl/Updates/5.8.8/darwin-thread-multi-2level
/Library/Perl/Updates/5.8.8
/System/Library/Perl/5.8.8/darwin-thread-multi-2level
/System/Library/Perl/5.8.8
/Library/Perl/5.8.8/darwin-thread-multi-2level
/Library/Perl/5.8.8 /Library/Perl
/Network/Library/Perl/5.8.8/darwin-thread-multi-2level
/Network/Library/Perl/5.8.8 /Network/Library/Perl
/System/Library/Perl/Extras/5.8.8/darwin-thread-multi-2level
/System/Library/Perl/Extras/5.8.8 /Library/Perl/5.8.6
/Library/Perl/5.8.1 .) at trsh.pl line 1.

I have been working from the O'Reilly book Mastering Perl for
Bioinformatics and the INSTALL file, and have scoured the
BioPerl website, and am still stuck.

Thanks in advance,

Adlai


From kpclancy at hotmail.com  Thu Jun 25 02:31:17 2009
From: kpclancy at hotmail.com (Kevin Clancy)
Date: Wed, 24 Jun 2009 20:31:17 -0600
Subject: [Bioperl-l] Fwd: Hackathon tomorrow (I think)
In-Reply-To: <20E946BE-D9E1-4815-991A-7817E0655879@illinois.edu>
References: <8F66D28E-CD28-4A8D-83B2-0FFCA611F58F@jays.net>
	<54655E2F-0D4B-4F4D-8608-13BB052FE201@jays.net>
	<1a0c1b750906240913g5e113dadxe3d98d3fce3a74d@mail.gmail.com>
	<F01E3C47-6651-488C-91F6-73F42A5A65F4@illinois.edu>
	<FEF61DB9-34A6-4884-BCAF-1CAC83E809F1@jays.net> 
	<20E946BE-D9E1-4815-991A-7817E0655879@illinois.edu>
Message-ID: <COL107-W17352BCFE4B6CE766EF2C3CE340@phx.gbl>


Is there an intention to have a hackathon at ISMB this weekend? I know there is a 2-day BOSC.
kevin



From cjfields at illinois.edu  Thu Jun 25 03:54:28 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 24 Jun 2009 22:54:28 -0500
Subject: [Bioperl-l] Fwd: Hackathon tomorrow (I think)
In-Reply-To: <COL107-W17352BCFE4B6CE766EF2C3CE340@phx.gbl>
References: <8F66D28E-CD28-4A8D-83B2-0FFCA611F58F@jays.net>
	<54655E2F-0D4B-4F4D-8608-13BB052FE201@jays.net>
	<1a0c1b750906240913g5e113dadxe3d98d3fce3a74d@mail.gmail.com>
	<F01E3C47-6651-488C-91F6-73F42A5A65F4@illinois.edu>
	<FEF61DB9-34A6-4884-BCAF-1CAC83E809F1@jays.net>
	<20E946BE-D9E1-4815-991A-7817E0655879@illinois.edu>
	<COL107-W17352BCFE4B6CE766EF2C3CE340@phx.gbl>
Message-ID: <02C22CF3-4AAD-4B4D-8F54-2999998A34E0@illinois.edu>

I have no idea; I don't think there are many bioperl devs attending  
this year unfortunately.  Any meetings in the next year where we could  
set up a bioperl hackathon?  I will likely be available to attend if  
it's stateside...

chris

On Jun 24, 2009, at 9:31 PM, Kevin Clancy wrote:

>
> is there an intention to have a hackathon at ISMB this weekend - I  
> know there is a 2 day BOSC
> kevin


From cjfields at illinois.edu  Thu Jun 25 14:00:47 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 25 Jun 2009 09:00:47 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <4A42AC51.3090809@ebi.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com>
	<CECB8B74-C423-4792-8857-0F3E40EAECE3@illinois.edu>
	<4A40B5D6.40504@ebi.ac.uk>
	<320fb6e00906230429l77a2049eg35e47f94026111c6@mail.gmail.com>
	<4A40C909.40803@ebi.ac.uk>
	<320fb6e00906231439j3beecda0o6e75c4223e960871@mail.gmail.com>
	<4A421287.4000203@ebi.ac.uk>
	<320fb6e00906240756m757ee12nb53ecc527b8e2480@mail.gmail.com>
	<6DCEA6E1-8C53-4C04-9F8A-E336BE5753B6@illinois.edu>
	<320fb6e00906240927u62588e22je8186bb172e904a1@mail.gmail.com>
	<5C050D50-5723-4C93-89D0-7280D202E737@illinois.edu>
	<4A42AC51.3090809@ebi.ac.uk>
Message-ID: <CB4314ED-4076-42AD-96CC-64CB429929D5@illinois.edu>


On Jun 24, 2009, at 5:44 PM, Peter Rice wrote:

> Chris Fields wrote:
>> Not sure, but it is covered by MAQ via the conversion script (as  
>> FASTQ-int):
>
> Are the scores phred or Solexa?
>
> Peter Rice

Not sure actually.  The perl script I linked to looks like it converts  
using the same scale as solexa (illumina 1.0).
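
For reference, the two scales differ only in how the error probability p
is turned into a score (phred: -10 log10 p; Solexa: -10 log10(p/(1-p))),
so one can be converted to the other. A rough sketch in plain Perl follows;
the sub names are made up for illustration and this is not taken from the
MAQ script.

```perl
use strict;
use warnings;

# Solexa score -> phred score:
#   Q_phred = 10 * log10(10**(Q_solexa/10) + 1)
sub solexa_to_phred {
    my ($qs) = @_;
    return 10 * log(10 ** ($qs / 10) + 1) / log(10);
}

# phred score -> Solexa score:
#   Q_solexa = 10 * log10(10**(Q_phred/10) - 1)
# Only defined for phred scores greater than 0.
sub phred_to_solexa {
    my ($qp) = @_;
    die "phred score must be > 0\n" unless $qp > 0;
    return 10 * log(10 ** ($qp / 10) - 1) / log(10);
}

# Print a few Solexa scores next to their phred equivalents.
printf "%3d -> %.2f\n", $_, solexa_to_phred($_) for (-5, 0, 10, 40);
```

The two scales are nearly identical for high scores and only diverge
noticeably at the low end, which is why it matters mostly for poor-quality
bases.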

chris


From chmille4 at gmail.com  Thu Jun 25 14:46:26 2009
From: chmille4 at gmail.com (Chase Miller)
Date: Thu, 25 Jun 2009 10:46:26 -0400
Subject: [Bioperl-l] Bio::LocatableSeq and Annotation vs Feature
Message-ID: <991fb8210906250746v19faa99dy690db524904bbf4a@mail.gmail.com>

Hi all,

Quick question I came across while writing the Bio::Nexml module.

I'm trying to link taxon data to a Bio::LocatableSeq object inside a
Bio::SimpleAlign object.  Bio::SimpleAlign has the ability to add
SeqFeatures, but according to this HowTo (
http://www.bioperl.org/wiki/HOWTO:Feature-Annotation) a feature is
considered to refer to a portion of a sequence, whereas something like taxon
data would refer to the entire sequence and should be handled as an
annotation. However, as far as I can tell Bio::LocatableSeq does not support
annotation objects.
What would be the best way to relate taxon data to a single sequence inside
an alignment?


Thanks,
Chase


From Kevin.M.Brown at asu.edu  Thu Jun 25 15:21:02 2009
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Thu, 25 Jun 2009 08:21:02 -0700
Subject: [Bioperl-l] Extreme newbie question.
In-Reply-To: <99C689C0-B999-45CB-8B99-0CABCC69AB77@refenestration.com>
References: <99C689C0-B999-45CB-8B99-0CABCC69AB77@refenestration.com>
Message-ID: <1A4207F8295607498283FE9E93B775B4060BA013@EX02.asurite.ad.asu.edu>

http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix

Or http://www.bioperl.org/wiki/Getting_BioPerl#Mac_OS_X_using_fink

That error suggests that the install failed, and you need to figure out
why from the install error messages. I suspect you aren't doing the
install as root, but as a normal user who lacks the needed permissions
to change files in certain directories.



From David.Messina at sbc.su.se  Thu Jun 25 16:39:22 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 25 Jun 2009 18:39:22 +0200
Subject: [Bioperl-l] Extreme newbie question.
In-Reply-To: <99C689C0-B999-45CB-8B99-0CABCC69AB77@refenestration.com>
References: <99C689C0-B999-45CB-8B99-0CABCC69AB77@refenestration.com>
Message-ID: <628aabb70906250939l7d1116d0sec9efa2c16235c75@mail.gmail.com>

Hi Adlai,
Did the Bioperl tests run successfully? Did you get the impression that the
installation was successful?

If not, what are the errors you see during the install process?

I ask because the error you included in your message is not necessarily
indicative of a failed installation (it could just be a path issue).

By the way, as I think is indicated somewhere in the installation
instructions, you don't actually need to install Bioperl to use most of its
functionality. Simply having the Bio/ directory in your PERL5LIB path is
enough.
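
A quick way to tell a path problem from a failed install is to check
whether Perl can see Bio/Perl.pm at all. Here is a small sketch in plain
Perl; the optional directory argument and the sub name are placeholders of
mine, not anything from the BioPerl distribution.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Optional argument: a directory that contains the Bio/ folder, e.g. an
# unpacked BioPerl distribution (hypothetical path, adjust as needed).
my $extra = shift @ARGV;
unshift @INC, $extra if defined $extra;

# Return every plain directory in @INC that holds Bio/Perl.pm.
sub find_bioperl {
    return grep { !ref($_) && -e "$_/Bio/Perl.pm" } @INC;
}

my @hits = find_bioperl();
if (@hits) {
    print "Bio/Perl.pm found in: $_\n" for @hits;
}
else {
    print "Bio/Perl.pm not found in any \@INC directory\n";
}
```

If the module only turns up when you pass the directory by hand, adding
that directory to PERL5LIB (or a "use lib" line in your script) should be
all that is needed.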


Dave


From cjfields at illinois.edu  Thu Jun 25 17:02:48 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 25 Jun 2009 12:02:48 -0500
Subject: [Bioperl-l] Bio::LocatableSeq and Annotation vs Feature
In-Reply-To: <991fb8210906250746v19faa99dy690db524904bbf4a@mail.gmail.com>
References: <991fb8210906250746v19faa99dy690db524904bbf4a@mail.gmail.com>
Message-ID: <3149D8E9-F145-4438-973E-3728575F436E@illinois.edu>

On Jun 25, 2009, at 9:46 AM, Chase Miller wrote:

> Hi all,
>
> Quick question I came across while writing the Bio::Nexml module.
>
> I'm trying to link taxon data to a Bio::LocatableSeq object inside a
> Bio::SimpleAlign object.  Bio::SimpleAlign has the ability to add
> SeqFeatures, but according to this HowTo (
> http://www.bioperl.org/wiki/HOWTO:Feature-Annotation) a feature is
> considered to refer to a portion of a sequence, whereas something  
> like taxon
> data would refer to the entire sequence and should be handled as an
> annotation. However, as far as I can tell Bio::LocatableSeq does not  
> support
> annotation objects.
> What would be the best way to relate taxon data to a single sequence  
> inside
> an alignment?
>
> Thanks,
> Chase

 From working with feature/annotation-rich alignment formats such as  
stockholm I found this is one of the areas for Align that needs some  
rethinking. One way to work around this w/o major refactoring is to  
have a full-length SeqFeature (pointing to the proper LocatableSeq)  
that stores the Bio::Annotation.  I don't necessarily like that  
approach as a long-term solution, though, as it's a little hacky and  
indirect, but it might get you started (just mark it as TODO so we can  
catch it at some point).
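
To make the workaround concrete, something along these lines is what I
have in mind. Treat it as an untested sketch: the 'taxon' tag, the
sequence, and the species name are made-up example values, and it assumes
a BioPerl recent enough for SimpleAlign to hold SeqFeatures.

```perl
use strict;
use warnings;
use Bio::LocatableSeq;
use Bio::SimpleAlign;
use Bio::SeqFeature::Generic;
use Bio::Annotation::SimpleValue;

# One aligned sequence (made-up data) inside an alignment.
my $seq = Bio::LocatableSeq->new(
    -id => 'seq1', -seq => 'ACGT--ACGT', -start => 1, -end => 8,
);
my $aln = Bio::SimpleAlign->new();
$aln->add_seq($seq);

# Feature spanning the whole sequence, used only as a carrier for annotation.
# TODO: replace once LocatableSeq/PrimarySeq can hold annotation directly.
my $feat = Bio::SeqFeature::Generic->new(
    -start => $seq->start, -end => $seq->end, -primary_tag => 'taxon',
);
$feat->attach_seq($seq);    # point the feature at the right LocatableSeq
$feat->annotation->add_Annotation(
    'taxon', Bio::Annotation::SimpleValue->new(-value => 'Homo sapiens'),
);
$aln->add_SeqFeature($feat);

# Later: recover the taxon for each annotated sequence.
for my $f ($aln->get_SeqFeatures) {
    my ($taxon) = $f->annotation->get_Annotations('taxon');
    print $f->entire_seq->display_id, "\t", $taxon->value, "\n";
}
```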

For a long-term solution I don't think the answer is as simple as  
making LocatableSeq Bio::AnnotatableI; that would not be congruent  
with the PrimarySeq implementation (which is not AnnotatableI).   
LocatableSeq is supposed to represent a simple PrimarySeq that can be  
mapped to other sequences via start/end/strand, and thus inherits from  
both Bio::PrimarySeq (note lack of 'I') and RangeI.

Three options:
1) Bio::Seq could be refactored to handle both Bio::PrimarySeq and  
Bio::LocatableSeq, and SimpleAlign reworked to allow any simple RangeI.
2) Bio::PrimarySeq can be AnnotatableI (Bio::Seq would delegate to the  
PrimarySeq AnnotationCollection).
3) All AnnotationI need to be linked back to the PrimarySeqI somehow  
e.g. features.

I personally think option #2 is easiest, as this means anything that  
is-a PrimarySeq is also AnnotatableI, and it might not break past  
scripts.  Not sure how this would affect overall performance though.

chris


From me at miguel.weapps.com  Thu Jun 25 14:09:29 2009
From: me at miguel.weapps.com (=?ISO-8859-1?Q?Luis_Miguel_Rodr=EDguez?=)
Date: Thu, 25 Jun 2009 16:09:29 +0200
Subject: [Bioperl-l] Fwd: Hackathon tomorrow (I think)
In-Reply-To: <02C22CF3-4AAD-4B4D-8F54-2999998A34E0@illinois.edu>
References: <8F66D28E-CD28-4A8D-83B2-0FFCA611F58F@jays.net>
	<54655E2F-0D4B-4F4D-8608-13BB052FE201@jays.net>
	<1a0c1b750906240913g5e113dadxe3d98d3fce3a74d@mail.gmail.com>
	<F01E3C47-6651-488C-91F6-73F42A5A65F4@illinois.edu>
	<FEF61DB9-34A6-4884-BCAF-1CAC83E809F1@jays.net>
	<20E946BE-D9E1-4815-991A-7817E0655879@illinois.edu>
	<COL107-W17352BCFE4B6CE766EF2C3CE340@phx.gbl>
	<02C22CF3-4AAD-4B4D-8F54-2999998A34E0@illinois.edu>
Message-ID: <94da4c880906250709j7b2cb78dk77710bd43e20fd42@mail.gmail.com>

Dear all,
Is there a way to run muscle silently via
Bio::Tools::Run::Alignment::Muscle?

Cheers,

-- 
Luis M. Rodriguez-R
[http://bioinf.uniandes.edu.co/~miguel/]
---------------------------------
Unidad de Bioinformática del Laboratorio de Micología y Fitopatología
Universidad de Los Andes, Colombia
[http://bioinf.uniandes.edu.co]

+ 57 1 3394949 ext 2619
lmrodriguezr at gmail.com
me at miguel.weapps.com


From chmille4 at gmail.com  Thu Jun 25 17:57:25 2009
From: chmille4 at gmail.com (Chase Miller)
Date: Thu, 25 Jun 2009 13:57:25 -0400
Subject: [Bioperl-l] Bio::LocatableSeq and Annotation vs Feature
In-Reply-To: <3149D8E9-F145-4438-973E-3728575F436E@illinois.edu>
References: <991fb8210906250746v19faa99dy690db524904bbf4a@mail.gmail.com> 
	<3149D8E9-F145-4438-973E-3728575F436E@illinois.edu>
Message-ID: <991fb8210906251057i25bbe511r84f5d1319f191421@mail.gmail.com>

Ok, I'll use the full length SeqFeature for now and mark it with a TODO.
 Thanks for the help.
Chase



From Kevin.M.Brown at asu.edu  Thu Jun 25 18:54:19 2009
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Thu, 25 Jun 2009 11:54:19 -0700
Subject: [Bioperl-l] Extreme newbie question.
In-Reply-To: <93BFA04E-D4DB-41B3-AAF6-56CA709980B6@refenestration.com>
References: <99C689C0-B999-45CB-8B99-0CABCC69AB77@refenestration.com>
	<1A4207F8295607498283FE9E93B775B4060BA013@EX02.asurite.ad.asu.edu>
	<93BFA04E-D4DB-41B3-AAF6-56CA709980B6@refenestration.com>
Message-ID: <1A4207F8295607498283FE9E93B775B4060BA08F@EX02.asurite.ad.asu.edu>

Please keep your replies on the list. 

> -----Original Message-----
> From: Adlai Burman [mailto:adlai at refenestration.com] 
> Sent: Thursday, June 25, 2009 11:39 AM
> To: Kevin Brown
> Subject: Re: [Bioperl-l] Extreme newbie question.
> 
> Thanks, Kevin.
> I did install everything using sudo. I will try again and pay  
> attention to the error log. I hope I did not introduce any conflicts  
> or weird path problems.
> 
> Adlai


From adlai at refenestration.com  Thu Jun 25 18:59:10 2009
From: adlai at refenestration.com (Adlai Burman)
Date: Thu, 25 Jun 2009 20:59:10 +0200
Subject: [Bioperl-l] Extreme newbie question.
In-Reply-To: <1A4207F8295607498283FE9E93B775B4060BA013@EX02.asurite.ad.asu.edu>
References: <99C689C0-B999-45CB-8B99-0CABCC69AB77@refenestration.com>
	<1A4207F8295607498283FE9E93B775B4060BA013@EX02.asurite.ad.asu.edu>
Message-ID: <E840DA6D-6AE4-42A2-9380-3647658E3C6E@refenestration.com>

Hey again, I'm in the middle of trying to install again and I now get a
new error:

Client not fully configured, please proceed with configuring.
  o conf init urllist

any ideas?

Adlai

On Jun 25, 2009, at 5:21 PM, Kevin Brown wrote:

> http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix
>
> Or http://www.bioperl.org/wiki/Getting_BioPerl#Mac_OS_X_using_fink
>
> That error suggests that the install fails and you need to figure out
> why from the install error messages. I suspect you aren't doing the
> install as root, but as a normal user who lacks the needed permissions
> to change files in certain directories.
>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org
>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
>> Adlai Burman
>> Sent: Wednesday, June 24, 2009 7:09 PM
>> To: bioperl-l at lists.open-bio.org
>> Subject: [Bioperl-l] Extreme newbie question.
>>
>> I have been trying to install BioPerl for a while now and after
>> pummeling my hard drive (Mac OS 10.5 intel) with several attempts at
>> Fink installation, a >cpan installation and removing my .cpan folder I
>> am still at square 0. I do not want to do any more damage to my
>> computer, yet I really need a working install (especially to interface
>> with remote DBs like GenBank). Can anyone give me some advice here?
>> After each attempt, I have tried to run perldoc bptutorial.pl and
>> tried test scripts with "use Bio::Perl" in the headers and I just
>> receive error messages like the following:
>>
>> Can't locate Bio/Perl.pm in @INC (@INC contains: /home/users/dag/lib/
>> perl5/ /Library/Perl/Updates/5.8.8/darwin-thread-multi-2level
>> /Library/
>> Perl/Updates/5.8.8 /System/Library/Perl/5.8.8/darwin-thread-
>> multi-2level /System/Library/Perl/5.8.8 /Library/Perl/5.8.8/darwin-
>> thread-multi-2level /Library/Perl/5.8.8 /Library/Perl
>> /Network/Library/
>> Perl/5.8.8/darwin-thread-multi-2level /Network/Library/Perl/5.8.8 /
>> Network/Library/Perl /System/Library/Perl/Extras/5.8.8/darwin-thread-
>> multi-2level /System/Library/Perl/Extras/5.8.8 /Library/Perl/5.8.6 /
>> Library/Perl/5.8.1 .) at trsh.pl line 1.
>>
>> I have been working from the O'Reilly book Mastering Perl for
>> Bioinformatics and the INSTALL file and have scoured around the
>> BioPerl website and am still stuck.
>>
>> Thanks in advance,
>>
>> Adlai
>>
>


From cjfields at illinois.edu  Thu Jun 25 20:07:44 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 25 Jun 2009 15:07:44 -0500
Subject: [Bioperl-l] Extreme newbie question.
In-Reply-To: <E840DA6D-6AE4-42A2-9380-3647658E3C6E@refenestration.com>
References: <99C689C0-B999-45CB-8B99-0CABCC69AB77@refenestration.com>
	<1A4207F8295607498283FE9E93B775B4060BA013@EX02.asurite.ad.asu.edu>
	<E840DA6D-6AE4-42A2-9380-3647658E3C6E@refenestration.com>
Message-ID: <F3802595-7617-4CD5-AC8A-2B67069BE001@illinois.edu>

That would mean, within the cpan shell, type 'o conf init  
urllist' (again, requires sudo).
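
Once the CPAN client is configured and the install has run, one way to confirm that perl can actually see Bio::Perl (the original "Can't locate Bio/Perl.pm in @INC" problem) is to ask perl where it loads the module from. The following is only a minimal sketch: the helper name module_location is made up for illustration, and it relies on nothing beyond core Perl's require and %INC.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the file a module was loaded from, or undef if perl cannot find it.
# (module_location is a hypothetical helper name, not part of BioPerl.)
sub module_location {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # Bio::Perl -> Bio/Perl.pm
    return eval { require $file; 1 } ? $INC{$file} : undef;
}

# Check the module named on the command line, defaulting to Bio::Perl.
my $module = shift(@ARGV) || 'Bio::Perl';
my $where  = module_location($module);
if (defined $where) {
    print "$module loaded from $where\n";
}
else {
    print "$module not found; directories searched:\n";
    print "  $_\n" for @INC;
}
```

If the module is reported as not found, the printed directory list shows where this particular perl looks, which helps tell apart a failed install from an install into a directory that perl does not search.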

chris

On Jun 25, 2009, at 1:59 PM, Adlai Burman wrote:

> Hey again, I'm right into trying to install again and I now get a  
> new error:
>
> Client not fully configured, please proceed with configuring.
> o conf init urllist
>
> any ideas?
>
> Adlai
>
> On Jun 25, 2009, at 5:21 PM, Kevin Brown wrote:
>
>> http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix
>>
>> Or http://www.bioperl.org/wiki/Getting_BioPerl#Mac_OS_X_using_fink
>>
>> That error suggests that the install fails and you need to figure out
>> why from the install error messages. I suspect you aren't doing the
>> install as root, but as a normal user who lacks the needed  
>> permissions
>> to change files in certain directories.
>>
>>> -----Original Message-----
>>> From: bioperl-l-bounces at lists.open-bio.org
>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
>>> Adlai Burman
>>> Sent: Wednesday, June 24, 2009 7:09 PM
>>> To: bioperl-l at lists.open-bio.org
>>> Subject: [Bioperl-l] Extreme newbie question.
>>>
>>> I have been trying to install BioPerl for a while now and after
>>> pummeling my hard drive (Mac OS 10.5 intel) with several attempts at
>>> Fink installation, a >cpan installation and removing my .cpan folder I
>>> am still at square 0. I do not want to do any more damage to my
>>> computer, yet I really need a working install (especially to interface
>>> with remote DBs like GenBank). Can anyone give me some advice here?
>>> After each attempt, I have tried to run perldoc bptutorial.pl and
>>> tried test scripts with "use Bio::Perl" in the headers and I just
>>> receive error messages like the following:
>>>
>>> Can't locate Bio/Perl.pm in @INC (@INC contains: /home/users/dag/ 
>>> lib/
>>> perl5/ /Library/Perl/Updates/5.8.8/darwin-thread-multi-2level
>>> /Library/
>>> Perl/Updates/5.8.8 /System/Library/Perl/5.8.8/darwin-thread-
>>> multi-2level /System/Library/Perl/5.8.8 /Library/Perl/5.8.8/darwin-
>>> thread-multi-2level /Library/Perl/5.8.8 /Library/Perl
>>> /Network/Library/
>>> Perl/5.8.8/darwin-thread-multi-2level /Network/Library/Perl/5.8.8 /
>>> Network/Library/Perl /System/Library/Perl/Extras/5.8.8/darwin- 
>>> thread-
>>> multi-2level /System/Library/Perl/Extras/5.8.8 /Library/Perl/5.8.6 /
>>> Library/Perl/5.8.1 .) at trsh.pl line 1.
>>>
>>> I have been working from the O'Reilly book Mastering Perl for
>>> Bioinformatics and the INSTALL file and have scoured around the
>>> BioPerl website and am still stuck.
>>>
>>> Thanks in advance,
>>>
>>> Adlai
>>>
>>
>


From bix at sendu.me.uk  Thu Jun 25 20:19:07 2009
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 25 Jun 2009 21:19:07 +0100
Subject: [Bioperl-l] Extreme newbie question.
In-Reply-To: <E840DA6D-6AE4-42A2-9380-3647658E3C6E@refenestration.com>
References: <99C689C0-B999-45CB-8B99-0CABCC69AB77@refenestration.com>	<1A4207F8295607498283FE9E93B775B4060BA013@EX02.asurite.ad.asu.edu>
	<E840DA6D-6AE4-42A2-9380-3647658E3C6E@refenestration.com>
Message-ID: <4A43DBBB.2050109@sendu.me.uk>

Adlai Burman wrote:
> Hey again, I'm right into trying to install again and I now get a new 
> error:
> 
> Client not fully configured, please proceed with configuring.
>  o conf init urllist

Run cpan and do as it says.


From cjm at berkeleybop.org  Fri Jun 26 00:32:05 2009
From: cjm at berkeleybop.org (Chris Mungall)
Date: Thu, 25 Jun 2009 17:32:05 -0700
Subject: [Bioperl-l] experimental IO module for exporting OWL from GFF
Message-ID: <CE668EC6-1ED5-4BD8-868C-729031724A09@berkeleybop.org>


I've written a module Bio::FeatureIO::seqont_owl, which generates  
Sequence Ontology compliant RDF/OWL. This will allow for example  
loading of GFF into triplestores and inference using OWL reasoners.

- It's experimental, fairly incomplete, and subject to change
- Relies on an experimental extension of SO
- Probably of interest to a minority of bp users
- It's not yet fully documented (but there will be a paper)
- It doesn't introduce any additional dependencies (all done via  
XML::Writer, which is already a dependency)
- Doesn't otherwise impinge on existing code

I'd like to get this under source control. Is the appropriate place  
for this:

- HEAD
- a branch
- bioperl-dev
- a separate repository

?

Cheers
Chris


From maj at fortinbras.us  Fri Jun 26 01:08:43 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 25 Jun 2009 21:08:43 -0400
Subject: [Bioperl-l] experimental IO module for exporting OWL from GFF
In-Reply-To: <CE668EC6-1ED5-4BD8-868C-729031724A09@berkeleybop.org>
References: <CE668EC6-1ED5-4BD8-868C-729031724A09@berkeleybop.org>
Message-ID: <7DFBF7E8A6DB45F3AB836EBAFBE597AC@NewLife>

This sounds very Dev to me. Also cool.
MAJ
----- Original Message ----- 
From: "Chris Mungall" <cjm at berkeleybop.org>
To: "BioPerl List" <bioperl-l at lists.open-bio.org>
Sent: Thursday, June 25, 2009 8:32 PM
Subject: [Bioperl-l] experimental IO module for exporting OWL from GFF


>
> I've written a module Bio::FeatureIO::seqont_owl, which generates  Sequence 
> Ontology compliant RDF/OWL. This will allow for example  loading of GFF into 
> triplestores and inference using OWL reasoners.
>
> - It's experimental, fairly incomplete, and subject to change
> - Relies on an experimental extension of SO
> - Probably of interest to a minority of bp users
> - It's not yet fully documented (but there will be a paper)
> - It doesn't introduce any additional dependencies (all done via  XML::Writer, 
> which is already a dependency)
> - Doesn't otherwise impinge on existing code
>
> I'd like to get this under source control. Is the appropriate place  for this:
>
> - HEAD
> - a branch
> - bioperl-dev
> - a separate repository
>
> ?
>
> Cheers
> Chris
>
>
>
> 


From cjfields at illinois.edu  Fri Jun 26 01:35:06 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 25 Jun 2009 20:35:06 -0500
Subject: [Bioperl-l] experimental IO module for exporting OWL from GFF
In-Reply-To: <7DFBF7E8A6DB45F3AB836EBAFBE597AC@NewLife>
References: <CE668EC6-1ED5-4BD8-868C-729031724A09@berkeleybop.org>
	<7DFBF7E8A6DB45F3AB836EBAFBE597AC@NewLife>
Message-ID: <12F203C3-689B-423E-9691-86EB1D500A7D@illinois.edu>

I agree.  Just to note, FeatureIO (even though it's in core) will be  
operated on at some future point to be simplified (and likely will  
move away from Bio::SF::Annotated).

chris

On Jun 25, 2009, at 8:08 PM, Mark A. Jensen wrote:

> This sounds very Dev to me. Also cool.
> MAJ
> ----- Original Message ----- From: "Chris Mungall" <cjm at berkeleybop.org 
> >
> To: "BioPerl List" <bioperl-l at lists.open-bio.org>
> Sent: Thursday, June 25, 2009 8:32 PM
> Subject: [Bioperl-l] experimental IO module for exporting OWL from GFF
>
>
>>
>> I've written a module Bio::FeatureIO::seqont_owl, which generates   
>> Sequence Ontology compliant RDF/OWL. This will allow for example   
>> loading of GFF into triplestores and inference using OWL reasoners.
>>
>> - It's experimental, fairly incomplete, and subject to change
>> - Relies on an experimental extension of SO
>> - Probably of interest to a minority of bp users
>> - It's not yet fully documented (but there will be a paper)
>> - It doesn't introduce any additional dependencies (all done via   
>> XML::Writer, which is already a dependency)
>> - Doesn't otherwise impinge on existing code
>>
>> I'd like to get this under source control. Is the appropriate  
>> place  for this:
>>
>> - HEAD
>> - a branch
>> - bioperl-dev
>> - a separate repository
>>
>> ?
>>
>> Cheers
>> Chris
>>
>>
>>
>


From rmb32 at cornell.edu  Fri Jun 26 04:27:55 2009
From: rmb32 at cornell.edu (Robert Buels)
Date: Thu, 25 Jun 2009 21:27:55 -0700
Subject: [Bioperl-l] BioPerl hackathon, hooray!
Message-ID: <4A444E4B.2000808@cornell.edu>

I'm pleased to announce a thoroughly climactic conclusion to the 
YAPC::NA 2009 BioPerl hackathon.

Between Jay Hannah (jhannah) and myself (rbuels), plus #bioperl virtual 
participant Bruno Vecchi (brunov), we SMASHED the HECK out of 6 bugs in 
the BioPerl Bugzilla.

Many thanks to the participants, let's do it again next year!

Rob


From jay at jays.net  Fri Jun 26 04:54:31 2009
From: jay at jays.net (Jay Hannah)
Date: Fri, 26 Jun 2009 00:54:31 -0400
Subject: [Bioperl-l] [yapc] BioPerl hackathon, hooray!
In-Reply-To: <4A444E4B.2000808@cornell.edu>
References: <4A444E4B.2000808@cornell.edu>
Message-ID: <E316C92D-870D-4AFC-88EC-3897B351D901@jays.net>

On Jun 26, 2009, at 12:27 AM, Robert Buels wrote:
> I'm pleased to announce a thoroughly climactic conclusion to the  
> YAPC::NA 2009 BioPerl hackathon.

Feel free to check our work:

    http://github.com/rbuels/bioperl-live

:)

j
http://www.bioperl.org/wiki/User:Jhannah


From rahall2 at ualr.edu  Fri Jun 26 06:28:05 2009
From: rahall2 at ualr.edu (Roger Hall)
Date: Fri, 26 Jun 2009 01:28:05 -0500
Subject: [Bioperl-l] Random nucleotide string generator?
Message-ID: <fc2dd7b3461f.4a442425@ualr.edu>

All,
 
Is there a random generator for creating nucleotides (of length l with composition frequencies a, c, g, and t) in there somewhere? 
 
I noticed a thread about it from 2000 and nothing since (searching for "random sequence").
 
If not - what should the namespace be for such a module should it be undone and desirable? 
 
TIA!
 
Roger 
 
 
From David.Messina at sbc.su.se  Fri Jun 26 10:15:04 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 26 Jun 2009 12:15:04 +0200
Subject: [Bioperl-l] Random nucleotide string generator?
In-Reply-To: <fc2dd7b3461f.4a442425@ualr.edu>
References: <fc2dd7b3461f.4a442425@ualr.edu>
Message-ID: <628aabb70906260315o2a3fbaem19ef24b8594649e3@mail.gmail.com>

The Bioperl solution piggybacks on EMBOSS. See Chris Fields' comment on this
post from Neil Saunders' blog:
http://nsaunders.wordpress.com/2007/07/25/howto-generate-random-sequences-using-emboss-and-perl/


You can also do this outside of BioPerl using shuffle from Sean Eddy's SQUID
package, available here:
ftp://selab.janelia.org/pub/software/squid/

> If not - what should the namespace be for such a module should it be undone
> and desirable?


Perhaps add it to Bio::SeqUtils?
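
For the original request (a string of length l with given a, c, g and t frequencies), a plain-Perl sketch may be enough while the namespace question is open. This is not an existing BioPerl function: the name random_nt and its argument layout are assumptions made for illustration. Each base is drawn independently with probability proportional to its weight.

```perl
use strict;
use warnings;

# Build a random nucleotide string of $length, drawing each base
# independently according to the weights in %$freq (they need not sum to 1).
# random_nt is a hypothetical name, not part of BioPerl.
sub random_nt {
    my ($length, $freq) = @_;
    my @bases = sort keys %$freq;
    my $total = 0;
    $total += $freq->{$_} for @bases;
    die "frequencies must sum to a positive number\n" unless $total > 0;

    my $seq = '';
    for (1 .. $length) {
        my $r    = rand($total);
        my $pick = $bases[-1];    # fallback, only reached through rounding
        for my $base (@bases) {
            if ($r < $freq->{$base}) { $pick = $base; last }
            $r -= $freq->{$base};
        }
        $seq .= $pick;
    }
    return $seq;
}

# Example: 60 bases of AT-rich sequence.
print random_nt(60, { A => 0.3, C => 0.2, G => 0.2, T => 0.3 }), "\n";
```

The returned string can be wrapped in a sequence object with Bio::Seq->new(-id => ..., -seq => ...) if it needs to go through Bio::SeqIO. Note that the composition is only matched on average; to reproduce a composition exactly, shuffling a template sequence is the better fit.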


Dave


From David.Messina at sbc.su.se  Fri Jun 26 11:37:44 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 26 Jun 2009 13:37:44 +0200
Subject: [Bioperl-l] [yapc] BioPerl hackathon, hooray!
In-Reply-To: <E316C92D-870D-4AFC-88EC-3897B351D901@jays.net>
References: <4A444E4B.2000808@cornell.edu>
	<E316C92D-870D-4AFC-88EC-3897B351D901@jays.net>
Message-ID: <628aabb70906260437r18fc7543oc05761241fe810ff@mail.gmail.com>

Awesome, great work guys!
Thanks so much.


Dave


From David.Messina at sbc.su.se  Fri Jun 26 12:58:20 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 26 Jun 2009 14:58:20 +0200
Subject: [Bioperl-l]  Random nucleotide string generator?
In-Reply-To: <1a0c1b750906260544u6a945ce1s652de19b8b27c615@mail.gmail.com>
References: <fc2dd7b3461f.4a442425@ualr.edu>
	<628aabb70906260315o2a3fbaem19ef24b8594649e3@mail.gmail.com> 
	<1a0c1b750906260544u6a945ce1s652de19b8b27c615@mail.gmail.com>
Message-ID: <628aabb70906260558k585f6700ycef271e7f26dd1a3@mail.gmail.com>

[Forwarding Bruno's reply.... -Dave]
---------- Forwarded message ----------
From: Bruno Vecchi <vecchi.b at gmail.com>
Date: Fri, Jun 26, 2009 at 14:44
Subject: Re: [Bioperl-l] Random nucleotide string generator?
To: Dave Messina <David.Messina at sbc.su.se>


Here's a little script that I used for a somewhat related task. It produces
a randomized version of an input sequence (thus keeping the original's
composition). Maybe you could adjust it to your needs; providing an input
sequence with the desired length and composition you should get what you
want.

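#!perl
use strict;
use warnings;
use List::Util qw(shuffle);
use Bio::Seq;
use Bio::SeqIO;

# Usage: shuffle.pl <sequence file> <number of shuffled copies>
my ($seqfile, $number) = @ARGV;
die "Usage: $0 <seqfile> <number>\n" unless defined $seqfile && defined $number;

# The input format is guessed by Bio::SeqIO; output is written as FASTA.
my $in = Bio::SeqIO->new(-file => $seqfile);
my $fh = Bio::SeqIO->newFh(-format => 'fasta');

# Only the first sequence in the input file is used.
my $seq = $in->next_seq;
my @chars = split '', $seq->seq;

for my $i (1 .. $number) {
    # Permuting the residues keeps the original length and composition.
    @chars = shuffle @chars;
    my $new_seq = Bio::Seq->new(-id => $i, -seq => join('', @chars));
    print $fh $new_seq;
}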

You can use it like this from the command line (assuming you want 20 output
sequences):

shuffle.pl input_sequence.fasta 20 > random_sequences.fasta

Bruno.

2009/6/26 Dave Messina <David.Messina at sbc.su.se>

> The Bioperl solution piggybacks on EMBOSS. See Chris Fields' comment on
> this
> post from Neil Saunders' blog:
>
> http://nsaunders.wordpress.com/2007/07/25/howto-generate-random-sequences-using-emboss-and-perl/
>
>
> You can also do this outside of BioPerl using shuffle from Sean Eddy's
> SQUID
> package, available here:
> ftp://selab.janelia.org/pub/software/squid/
>
> > If not - what should the namespace be for such a module should it be undone
> > and desirable?
>
>
> Perhaps add it to Bio::SeqUtils?
>
>
> Dave
>


From budd at embl-heidelberg.de  Fri Jun 26 08:30:12 2009
From: budd at embl-heidelberg.de (Aidan Budd)
Date: Fri, 26 Jun 2009 10:30:12 +0200 (CEST)
Subject: [Bioperl-l] Random nucleotide string generator?
In-Reply-To: <fc2dd7b3461f.4a442425@ualr.edu>
Message-ID: <Pine.LNX.4.44.0906261028110.14978-100000@bibo.EMBL-Heidelberg.DE>

A non-BioPerl option would be to use something external like seq-gen or
similar: tools designed for outputting "random" sequences simulated over a
tree. One could simply sample a single simulated sequence at random from
the output alignment.

On Fri, 26 Jun 2009, Roger Hall wrote:

> All,
>  Is there a random generator for creating nucleotides (of length l with
> composition frequencies a, c, g, and t) in there somewhere?
>  
> I noticed a thread about it from 2000 and nothing since (searching for "random sequence").
>  
> If not - what should the namespace be for such a module should it be undone and desirable? 
>  
> TIA!
>  
> Roger 
>  
>  
>  
> 

-- 
----------------------------------------------------------------------
Aidan Budd                                    tel:+49 (0)6221 387 8530
EMBL - European Molecular Biology Laboratory  fax:+49 (0)6221 387 8517
Meyerhofstr. 1, 69117 Heidelberg, Germany

http://www.embl-heidelberg.de/~budd/
http://www-db.embl.de/jss/EmblGroupsHD/per_1807.html


From me at miguel.weapps.com  Fri Jun 26 08:52:46 2009
From: me at miguel.weapps.com (Luis Miguel Rodríguez)
Date: Fri, 26 Jun 2009 10:52:46 +0200
Subject: [Bioperl-l] Random nucleotide string generator?
In-Reply-To: <fc2dd7b3461f.4a442425@ualr.edu>
References: <fc2dd7b3461f.4a442425@ualr.edu>
Message-ID: <94da4c880906260152k3a764951u6ea8a6fdfa3b7f2c@mail.gmail.com>

Dear all, dear Roger,
I'm not sure if there is such a generator (I think so).  Anyway, if you flag
it as "undone and desirable", please take into account the possibility of
extending the generator to dinucleotides, which is particularly useful when
working with the secondary structure of RNA molecules.
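
One possible reading of the dinucleotide extension is a first-order Markov generator, where each base is drawn conditional on the previous one. The sketch below is plain Perl and not an existing BioPerl module; the names weighted_pick and random_markov_nt, and the layout of the transition table, are assumptions made for illustration. It models dinucleotide frequencies on average and does not preserve exact dinucleotide counts the way a dinucleotide shuffle would.

```perl
use strict;
use warnings;

# Draw one key of %$weights with probability proportional to its value.
sub weighted_pick {
    my ($weights) = @_;
    my @keys  = sort keys %$weights;
    my $total = 0;
    $total += $weights->{$_} for @keys;
    my $r = rand($total);
    for my $key (@keys) {
        return $key if $r < $weights->{$key};
        $r -= $weights->{$key};
    }
    return $keys[-1];    # only reached through floating-point rounding
}

# Generate $length bases. The first base is drawn from %$start; every later
# base is drawn from $trans->{previous base}, a hash of next-base weights.
sub random_markov_nt {
    my ($length, $start, $trans) = @_;
    return '' if $length < 1;
    my $seq = weighted_pick($start);
    while (length($seq) < $length) {
        my $prev = substr($seq, -1);
        $seq .= weighted_pick($trans->{$prev});
    }
    return $seq;
}

# Example: RNA in which G is more often followed by C (made-up weights).
my %uniform = (A => 1, C => 1, G => 1, U => 1);
my %trans   = (
    A => {%uniform},
    C => {%uniform},
    G => { A => 1, C => 3, G => 1, U => 1 },
    U => {%uniform},
);
print random_markov_nt(60, \%uniform, \%trans), "\n";
```

The transition weights would normally be counted from a reference sequence; the values above are placeholders.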

Cheers,

On Fri, Jun 26, 2009 at 8:28 AM, Roger Hall <rahall2 at ualr.edu> wrote:

> All,
>
> Is there a random generator for creating nucleotides (of length l with
> composition frequencies a, c, g, and t) in there somewhere?
>
> I noticed a thread about it from 2000 and nothing since (searching for
> "random sequence").
>
> If not - what should the namespace be for such a module should it be undone
> and desirable?
>
> TIA!
>
> Roger
>
>
>
>


-- 
Luis M. Rodriguez-R
[http://bioinf.uniandes.edu.co/~miguel/]
---------------------------------
Unidad de Bioinformática del Laboratorio de Micología y Fitopatología
Universidad de Los Andes, Colombia
[http://bioinf.uniandes.edu.co]


+ 57 1 3394949 ext 2619
lmrodriguezr at gmail.com
me at miguel.weapps.com


From pri2darshini at gmail.com  Fri Jun 26 10:18:55 2009
From: pri2darshini at gmail.com (priya darshini)
Date: Fri, 26 Jun 2009 15:48:55 +0530
Subject: [Bioperl-l] bioperl installation
Message-ID: <7c569a160906260318t5611fdd8nd536ae5139f5b1d4@mail.gmail.com>

Respected Sir,

I am K. Lakshmi Priya Darshini. My specialization is M.Sc. bioinformatics,
and I am interested in learning BioPerl. My operating system is Windows
Vista and I am using version 5.10 of Perl. I have followed the steps to
install BioPerl as given by your team in the BioPerl tutorial, but I am
getting the error message "Begin failed". Please help me continue with my
installation. Waiting for your reply.

Thanking you.

Regards,
Lakshmi Priya Darshini


From Jonathan.Moore at warwick.ac.uk  Fri Jun 26 09:55:54 2009
From: Jonathan.Moore at warwick.ac.uk (Moore, Jonathan)
Date: Fri, 26 Jun 2009 10:55:54 +0100
Subject: [Bioperl-l] Getting errors parsing TIGR XML in SeqIO
Message-ID: <7BEB494D4E69964C8292CE4EDA2B9811BBB946@LAUREL.ads.warwick.ac.uk>

I'm trying to parse the TAIR9 Arabidopsis release from the TIGR XML files at the TAIR FTP site.

I've tried SeqIO with both tigr and tigrxml formats but both are giving errors in 1.6.0.  Does anyone have advice on whether it's likely to be doable, or should I wait until the .gb files are available?

Jay Moore


From fungazid at yahoo.com  Fri Jun 26 11:59:06 2009
From: fungazid at yahoo.com (Fungazid)
Date: Fri, 26 Jun 2009 04:59:06 -0700 (PDT)
Subject: [Bioperl-l] Bio::Assembly::IO
Message-ID: <57633.49243.qm@web65505.mail.ac4.yahoo.com>


Hello,

I received an ACE file containing a newbler assembly of 454 cDNA reads, and a corresponding phd.ball file. I was able to view and manipulate the contigs in this assembly using Consed on Linux. Consed required ~1.5GB RAM, and the assembly was loaded within ~2 min.
I would like to parse the assembly within my code (preferably in Perl, but not necessarily), to fetch all read sequences for each contig, nucleotide quality, alignment to consensus, etc.
I am trying to use Bio::Assembly::IO, but it eats more than my entire RAM (3GB), and is extremely slow (~1 hour before it crashes).
Maybe you have an idea?
In addition, are you aware of other non-visual parsers of the ACE assembly format for Perl or other languages?
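
One low-memory approach is to stream the ACE file and handle one contig at a time instead of building the whole assembly in memory. The sketch below is plain Perl, not Bio::Assembly::IO, and the name stream_ace is made up for illustration. It assumes the usual ACE layout: a CO line starts a contig and is followed by the padded consensus, AF lines give each read's strand and padded start, RD lines are followed by the padded read sequence, and every sequence block ends at a blank line. Base qualities (BQ) and the phd.ball file are not handled.

```perl
use strict;
use warnings;

# Read an ACE file line by line and call $on_contig->($contig) each time a
# contig is complete, so that only one contig is held in memory at once.
# stream_ace is a hypothetical helper, not a BioPerl function.
sub stream_ace {
    my ($fh, $on_contig) = @_;
    my ($contig, $target);
    while (my $line = <$fh>) {
        $line =~ s/\s+\z//;                  # strip newline and any \r
        if ($target) {                       # inside a sequence block
            if ($line =~ /\S/) { $$target .= $line }
            else               { undef $target }    # blank line ends it
            next;
        }
        if ($line =~ /^CO\s+(\S+)\s+(\d+)\s+(\d+)/) {
            $on_contig->($contig) if $contig;
            $contig = { name => $1, padded_length => $2, n_reads => $3,
                        consensus => '', reads => {} };
            $target = \$contig->{consensus};
        }
        elsif ($contig && $line =~ /^AF\s+(\S+)\s+([UC])\s+(-?\d+)/) {
            # U = uncomplemented, C = complemented; start is in padded
            # consensus coordinates.
            $contig->{reads}{$1} = { strand => $2, start => $3 };
        }
        elsif ($contig && $line =~ /^RD\s+(\S+)/) {
            $contig->{reads}{$1}{seq} = '';
            $target = \$contig->{reads}{$1}{seq};
        }
    }
    $on_contig->($contig) if $contig;
    return;
}

# Example: print a one-line summary per contig for the file given in @ARGV.
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!\n";
    stream_ace($fh, sub {
        my ($c) = @_;
        printf "%s\t%d reads\n", $c->{name}, scalar keys %{ $c->{reads} };
    });
    close $fh;
}
```

Because each contig is discarded after the callback returns, memory use should scale with the largest contig rather than with the whole assembly.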

Many thanks,
funazid   


From cjfields at illinois.edu  Fri Jun 26 17:00:41 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 26 Jun 2009 12:00:41 -0500
Subject: [Bioperl-l] Getting errors parsing TIGR XML in SeqIO
In-Reply-To: <7BEB494D4E69964C8292CE4EDA2B9811BBB946@LAUREL.ads.warwick.ac.uk>
References: <7BEB494D4E69964C8292CE4EDA2B9811BBB946@LAUREL.ads.warwick.ac.uk>
Message-ID: <FEC1932A-49FE-4E63-9727-F08520FF0252@illinois.edu>

If there are errors, this should be submitted as a bug.  You should
attach example data to the report after filing it (i.e. don't copy and
paste it into the text box).

http://www.bioperl.org/wiki/Bugs

chris

On Jun 26, 2009, at 4:55 AM, Moore, Jonathan wrote:

> I'm trying to parse the TAIR9 Arabidopsis release from the TIGR XML  
> files at the TAIR FTP site.
>
> I've tried SeqIO with both tigr and tigrxml formats but both are  
> giving errors in 1.6.0.  Has anyone advice on whether it's likely to  
> be doable, or should I wait til the .gb files are available?
>
> Jay Moore
>


From plantboy at gmail.com  Fri Jun 26 18:46:35 2009
From: plantboy at gmail.com (cody h)
Date: Fri, 26 Jun 2009 11:46:35 -0700
Subject: [Bioperl-l] test suite failing on mac os x 10.5
Message-ID: <320708320906261146v2e799c82mc1b921218fc233c5@mail.gmail.com>

Hi,

I'm trying to install bioperl-db 1.5.2 on an intel mac running os 10.5.7.
The Build.PL file executes fine, but the test suite fails dramatically,
returning the error "No database selected" for many of the tests. All the
error calls seem to be originating from line 852 in
BasePersistenceAdaptor.pm. I took a look at the code but I could not figure
out why it wasn't working.

I have bioperl 1.5.2 installed and the biosql schema loaded into my mysql
server. The dependencies all seem to be working, but I haven't used them
enough to completely verify this, so that could be part of the problem. I
don't know which ones to check though. Does anyone have any idea why I might
be getting these "No database selected" errors? Here is a sample of the
error messages given by the ./Build test command (note, this same error is
generated by 15 of the 16 test files):

t/12ontology.t .... 1/738
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: error while executing statement in
Bio::DB::BioSQL::OntologyAdaptor::find_by_unique_key: No database selected
STACK: Error::throw
STACK: Bio::Root::Root::throw /Library/Perl/5.8.8/Bio/Root/Root.pm:359
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key
/Users/cody/Desktop/bioperl-db-1.5.2_100/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:948
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key
/Users/cody/Desktop/bioperl-db-1.5.2_100/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:852
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create
/Users/cody/Desktop/bioperl-db-1.5.2_100/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:182
STACK: Bio::DB::Persistent::PersistentObject::create
/Users/cody/Desktop/bioperl-db-1.5.2_100/blib/lib/Bio/DB/Persistent/PersistentObject.pm:244
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create
/Users/cody/Desktop/bioperl-db-1.5.2_100/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:169
STACK: Bio::DB::Persistent::PersistentObject::create
/Users/cody/Desktop/bioperl-db-1.5.2_100/blib/lib/Bio/DB/Persistent/PersistentObject.pm:244
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create
/Users/cody/Desktop/bioperl-db-1.5.2_100/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:169
STACK: Bio::DB::Persistent::PersistentObject::create
/Users/cody/Desktop/bioperl-db-1.5.2_100/blib/lib/Bio/DB/Persistent/PersistentObject.pm:244
STACK: t/12ontology.t:44
-----------------------------------------------------------
t/12ontology.t .... Dubious, test returned 255 (wstat 65280, 0xff00)


From maj at fortinbras.us  Fri Jun 26 18:50:02 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 26 Jun 2009 14:50:02 -0400
Subject: [Bioperl-l] Fw: Inquiry about a prog written by [MAJ]
Message-ID: <0581B2DAE8514F418127D54407384905@NewLife>

Thought this should be archived to the list. 
MAJ

----- Original Message ----- 
From: Mark A. Jensen 
To: Ross KK Leung 
Sent: Thursday, June 25, 2009 8:46 AM
Subject: Re: Inquiry about a prog written by you


Hi Ross-
Yes, you can specify the recombinants, as "A/C/G[subtype]" in the query string. Unfortunately, the 10000 record limit is imposed by the Los Alamos site that my program accesses. You might be able to work around this if you're willing to write your own script using the BioPerl modules that are the basis for hivq.PLS: use the modules to perform multiple queries, and collect the entire set of sequences over that series of queries.
You might look at the documentation for the modules for ideas; try looking at http://www.bioperl.org/wiki/Module:Bio::DB::HIV and http://www.bioperl.org/wiki/Module:Bio::DB::Query::HIVQuery . 
best regards- 
Mark
  ----- Original Message ----- 
  From: Ross KK Leung 
  To: maj at fortinbras.us 
  Sent: Thursday, June 25, 2009 6:09 AM
  Subject: Inquiry about a prog written by you


  Dear Mark A. Jensen,

   
  A google search returns your program (http://code.open-bio.org/svnweb/index.cgi/bioperl/view/bioperl-live/trunk/scripts/DB-HIV/hivq.PLS)

   
  I wonder whether the program is able to search recombinants (e.g. B incl. recombinants) and retrieve more than 50000 records. This limit is a bottleneck of the web-based search.

   
  Thanks for your advice, Ross


From rmb32 at cornell.edu  Fri Jun 26 21:06:06 2009
From: rmb32 at cornell.edu (Robert Buels)
Date: Fri, 26 Jun 2009 14:06:06 -0700
Subject: [Bioperl-l] BioPerl at YAPC::2010
In-Reply-To: <33722D87-C60D-4040-AA58-D8D2DCBD5448@jays.net>
References: <OF04ACE416.A66212B4-ON852575E1.0068C346-852575E1.006A8246@lnotes-gw.ent.nwie.net>
	<33722D87-C60D-4040-AA58-D8D2DCBD5448@jays.net>
Message-ID: <4A45383E.40207@cornell.edu>

Reposting to bioperl list.

This is a really giant opportunity to expose some of the best 
technologists in the world to what we do in bioinformatics, and possibly 
to entice some of them to help us the heck out!  ;-)

Rob

On Jun 26, 2009, at 3:22 PM, BAIRH at nationwide.com wrote:
> I am the Columbus.PM YAPC::2010 conference coordinator and I would 
> like to have a "BioPerl" thread at YAPC::NA::2010 at Ohio State 
> University.  Can you offer any lecturer recommendations and could I 
> fill an entire multi day thread with BioPerl lectures?  I would also 
> like to "entice" MJD to come to YAPC with the use of BioPerl.
>
> Thanks for your thoughts.
>
> Heath Bair
> (Candybar)

-- 
Robert Buels
Bioinformatics Analyst, Sol Genomics Network
Boyce Thompson Institute for Plant Research
Tower Rd
Ithaca, NY  14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu


From cain.cshl at gmail.com  Fri Jun 26 21:12:37 2009
From: cain.cshl at gmail.com (Scott Cain)
Date: Fri, 26 Jun 2009 17:12:37 -0400
Subject: [Bioperl-l] BioPerl at YAPC::2010
In-Reply-To: <4A45383E.40207@cornell.edu>
References: <OF04ACE416.A66212B4-ON852575E1.0068C346-852575E1.006A8246@lnotes-gw.ent.nwie.net>
	<33722D87-C60D-4040-AA58-D8D2DCBD5448@jays.net>
	<4A45383E.40207@cornell.edu>
Message-ID: <D2A53AB2-E35A-499B-B81A-13B9D61752CA@gmail.com>

Cool--Columbus is just down the road.  I could give a talk (or even  
multiple talks) on a variety of GMOD topics (which I consider BioPerl  
related, since so much of what we do depends on BioPerl).

Scott

On Jun 26, 2009, at 5:06 PM, Robert Buels wrote:

> Reposting to bioperl list.
>
> This is a really giant opportunity to expose some of the best  
> technologists in the world to what we do in bioinformatics, and  
> possibly to entice some of them to help us the heck out!  ;-)
>
> Rob
>
> On Jun 26, 2009, at 3:22 PM, BAIRH at nationwide.com wrote:
>> I am the Columbus.PM YAPC::2010 conference coordinator and I would  
>> like to have a "BioPerl" thread at YAPC::NA::2010 at Ohio State  
>> University.  Can you offer any lecturer recommendations and could I  
>> fill an entire multi day thread with BioPerl lectures?  I would  
>> also like to "entice" MJD to come to YAPC with the use of BioPerl.
>>
>> Thanks for your thoughts.
>>
>> Heath Bair
>> (Candybar)
>
> -- 
> Robert Buels
> Bioinformatics Analyst, Sol Genomics Network
> Boyce Thompson Institute for Plant Research
> Tower Rd
> Ithaca, NY  14853
> Tel: 503-889-8539
> rmb32 at cornell.edu
> http://www.sgn.cornell.edu

-----------------------------------------------------------------------
Scott Cain, Ph. D. scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research


From cjfields at illinois.edu  Fri Jun 26 21:49:39 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 26 Jun 2009 16:49:39 -0500
Subject: [Bioperl-l] BioPerl at YAPC::2010
In-Reply-To: <4A45383E.40207@cornell.edu>
References: <OF04ACE416.A66212B4-ON852575E1.0068C346-852575E1.006A8246@lnotes-gw.ent.nwie.net>
	<33722D87-C60D-4040-AA58-D8D2DCBD5448@jays.net>
	<4A45383E.40207@cornell.edu>
Message-ID: <642C6C93-8FCD-4463-8A39-E15832F8714C@illinois.edu>

Well, if it's in Columbus I'll be there (I can make a drive out of it).

In short, we should probably get something going, yes. Lots of things  
we can talk about, inc. bioperl6, Bio::Moose, etc.

chris

On Jun 26, 2009, at 4:06 PM, Robert Buels wrote:

> Reposting to bioperl list.
>
> This is a really giant opportunity to expose some of the best  
> technologists in the world to what we do in bioinformatics, and  
> possibly to entice some of them to help us the heck out!  ;-)
>
> Rob
>
> On Jun 26, 2009, at 3:22 PM, BAIRH at nationwide.com wrote:
>> I am the Columbus.PM YAPC::2010 conference coordinator and I would  
>> like to have a "BioPerl" thread at YAPC::NA::2010 at Ohio State  
>> University.  Can you offer any lecturer recommendations and could I  
>> fill an entire multi day thread with BioPerl lectures?  I would  
>> also like to "entice" MJD to come to YAPC with the use of BioPerl.
>>
>> Thanks for your thoughts.
>>
>> Heath Bair
>> (Candybar)
>
> -- 
> Robert Buels
> Bioinformatics Analyst, Sol Genomics Network
> Boyce Thompson Institute for Plant Research
> Tower Rd
> Ithaca, NY  14853
> Tel: 503-889-8539
> rmb32 at cornell.edu
> http://www.sgn.cornell.edu


From hartzell at alerce.com  Sat Jun 27 03:59:10 2009
From: hartzell at alerce.com (George Hartzell)
Date: Fri, 26 Jun 2009 20:59:10 -0700
Subject: [Bioperl-l] BioPerl at YAPC::2010
In-Reply-To: <4A45383E.40207@cornell.edu>
References: <OF04ACE416.A66212B4-ON852575E1.0068C346-852575E1.006A8246@lnotes-gw.ent.nwie.net>
	<33722D87-C60D-4040-AA58-D8D2DCBD5448@jays.net>
	<4A45383E.40207@cornell.edu>
Message-ID: <19013.39182.97468.604560@already.dhcp.gene.com>


This does seem like a great opportunity.  I think you/the-community
could put together at least a day, and maybe more, of Bio and Perl
stuff.  I think that it's important to range beyond the stuff that's
in the BioPerl namespace and pull in something from the Gene Ontology
project, the Ensembl project[s], maybe libbio, etc....

g.



From cjfields at illinois.edu  Sat Jun 27 04:28:14 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 26 Jun 2009 23:28:14 -0500
Subject: [Bioperl-l] BioPerl at YAPC::2010
In-Reply-To: <19013.39182.97468.604560@already.dhcp.gene.com>
References: <OF04ACE416.A66212B4-ON852575E1.0068C346-852575E1.006A8246@lnotes-gw.ent.nwie.net>
	<33722D87-C60D-4040-AA58-D8D2DCBD5448@jays.net>
	<4A45383E.40207@cornell.edu>
	<19013.39182.97468.604560@already.dhcp.gene.com>
Message-ID: <EB3EB763-05F4-4F75-88F5-8A642E567ABA@illinois.edu>

Agree (and should add GMOD/Gbrowse to that as well).

chris



From maj at fortinbras.us  Sat Jun 27 04:56:41 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Sat, 27 Jun 2009 00:56:41 -0400
Subject: [Bioperl-l] BioPerl at YAPC::2010
In-Reply-To: <4A45383E.40207@cornell.edu>
References: <OF04ACE416.A66212B4-ON852575E1.0068C346-852575E1.006A8246@lnotes-gw.ent.nwie.net><33722D87-C60D-4040-AA58-D8D2DCBD5448@jays.net>
	<4A45383E.40207@cornell.edu>
Message-ID: <E6D907E51B8D477FBB635ED4B500C257@NewLife>

I think BioPerl has enough to talk about to have its own conference, 
which would coincide with its 15th anniversary in 2010. That may 
put the kibosh on the original  intent of the inviter, which ultimately is 
to get The Dominus to bite (and more power to her, I say. My 
programming style is forever changed, and I haven't even finished
The Book). 

If someone organizes it, I'll bring the chips and dip.
MAJ


From maj at fortinbras.us  Sat Jun 27 05:30:34 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Sat, 27 Jun 2009 01:30:34 -0400
Subject: [Bioperl-l] BioPerl at YAPC::2010
In-Reply-To: <E6D907E51B8D477FBB635ED4B500C257@NewLife>
References: <OF04ACE416.A66212B4-ON852575E1.0068C346-852575E1.006A8246@lnotes-gw.ent.nwie.net><33722D87-C60D-4040-AA58-D8D2DCBD5448@jays.net><4A45383E.40207@cornell.edu>
	<E6D907E51B8D477FBB635ED4B500C257@NewLife>
Message-ID: <B44649FB157145A3BE7153D163802926@NewLife>

[...to *him*, that is...pardon]



From kpclancy at hotmail.com  Sat Jun 27 10:04:20 2009
From: kpclancy at hotmail.com (Kevin Clancy)
Date: Sat, 27 Jun 2009 04:04:20 -0600
Subject: [Bioperl-l] Fwd: Hackathon tomorrow (I think)
In-Reply-To: <02C22CF3-4AAD-4B4D-8F54-2999998A34E0@illinois.edu>
References: <8F66D28E-CD28-4A8D-83B2-0FFCA611F58F@jays.net>
	<54655E2F-0D4B-4F4D-8608-13BB052FE201@jays.net>
	<1a0c1b750906240913g5e113dadxe3d98d3fce3a74d@mail.gmail.com>
	<F01E3C47-6651-488C-91F6-73F42A5A65F4@illinois.edu>
	<FEF61DB9-34A6-4884-BCAF-1CAC83E809F1@jays.net>
	<20E946BE-D9E1-4815-991A-7817E0655879@illinois.edu>
	<COL107-W17352BCFE4B6CE766EF2C3CE340@phx.gbl>
	<02C22CF3-4AAD-4B4D-8F54-2999998A34E0@illinois.edu>
Message-ID: <COL107-W978FB7B4A3E98561F84E5CE320@phx.gbl>


I think ISMB will be in Boston in 2010 (feels odd just typing that...)

Maybe that is enough of a running start to set something up.

kevin
 
> CC: jay at jays.net; vecchi.b at gmail.com; bioperl-l at bioperl.org
> From: cjfields at illinois.edu
> To: kpclancy at hotmail.com
> Subject: Re: [Bioperl-l] Fwd: Hackathon tomorrow (I think)
> Date: Wed, 24 Jun 2009 22:54:28 -0500
> 
> I have no idea; I don't think there are many bioperl devs attending 
> this year unfortunately. Any meetings in the next year where we could 
> set up a bioperl hackathon? I will likely be available to attend if 
> it's stateside...
> 
> chris
> 
> On Jun 24, 2009, at 9:31 PM, Kevin Clancy wrote:
> 
> >
> > is there an intention to have a hackathon at ISMB this weekend - I 
> > know there is a 2 day BOSC
> > kevin
> >
> >> From: cjfields at illinois.edu
> >> To: jay at jays.net
> >> Date: Wed, 24 Jun 2009 16:10:34 -0500
> >> CC: vecchi.b at gmail.com; bioperl-l at bioperl.org
> >> Subject: Re: [Bioperl-l] Fwd: Hackathon tomorrow (I think)
> >>
> >>
> >> On Jun 24, 2009, at 11:44 AM, Jay Hannah wrote:
> >>
> >>> On Jun 24, 2009, at 12:26 PM, Chris Fields wrote:
> >>>> Let me know if anyone needs collab on biomoose on github; Mark
> >>>> Jensen's already added.
> >>>
> >>> Anything on github should be trivial, even with no perms -- we can
> >>> just fork and then send you (whoever) pull requests. github++ :)
> >>>
> >>>> 1) Any help towards bugzilla fixes would be most welcome.
> >>>
> >>> I don't know how to make any progress in bugzilla if no one has a
> >>> commit bit...?
> >>
> >> For some reason I thought you had a commit bit; we can add you in if
> >> needed. Anyway, patches are most definitely welcome ;>
> >>
> >>>> 2) Better GFF3 integration
> >>>> 3) Typed but lightweight seqfeatures
> >>>
> >>> Are there bugzilla tickets (or somewhere) describing those?
> >>
> >> No as the issues are more complex than one single bug, but we do have
> >> something to help track for the time being:
> >>
> >> http://www.bioperl.org/wiki/GFF_Refactor
> >> http://www.bioperl.org/wiki/Align_Refactor
> >>
> >> I'll probably file TODOs during the process for those refactors. The
> >> easiest to tackle would probably be the Align/LocatableSeq refactors.
> >>
> >>> I wonder if anyone can help me get out of sporadic MailMan
> >>> purgatory...
> >>>
> >>> Thanks,
> >>>
> >>> j
> >>
> >> -c
> >>
> >> PS - Don't feel constrained by the above. There are many many areas
> >> to contribute to.
> >>


From hartzell at alerce.com  Sat Jun 27 17:08:10 2009
From: hartzell at alerce.com (George Hartzell)
Date: Sat, 27 Jun 2009 10:08:10 -0700
Subject: [Bioperl-l] BioPerl at YAPC::2010
In-Reply-To: <E6D907E51B8D477FBB635ED4B500C257@NewLife>
References: <OF04ACE416.A66212B4-ON852575E1.0068C346-852575E1.006A8246@lnotes-gw.ent.nwie.net>
	<33722D87-C60D-4040-AA58-D8D2DCBD5448@jays.net>
	<4A45383E.40207@cornell.edu>
	<E6D907E51B8D477FBB635ED4B500C257@NewLife>
Message-ID: <19014.20986.867646.940277@already.dhcp.gene.com>


I had an eye-opening time at YAPC, and I think that it would be very
powerful to have many members of the Bio & Perl community rubbing
elbows with the folks leading (and following, for that matter) the
"Modern Perl" movement (in the broader sense, not _just_ chromatic):
Moose, DBIx::Class, Dist::Zilla, KiokuDB, etc....  I think that it
would help pull BioPerl and the others towards powerful mainstream
technologies and expose many of us to new people, tricks, and tools.
Having us off on our own, or mingling with ISMB'ers, doesn't really
stir the pot.

g.




From richard.harrison at edinburgh.ac.uk  Mon Jun 29 22:43:54 2009
From: richard.harrison at edinburgh.ac.uk (Richard Harrison)
Date: Mon, 29 Jun 2009 23:43:54 +0100
Subject: [Bioperl-l] PopGen
Message-ID: <5FBB6056-386D-42E3-8236-1FEB8F5BE520@edinburgh.ac.uk>

Dear all,

I am having trouble with the PopGen modules and I was wondering if  
anyone had any ideas.

I am working with polymorphism data. I am trying to identify the  
derived vs ancestral allele between two species. I have been modifying  
the modules a bit to include different site models etc.  Here is where  
I fall over:

Within aln_to_population I can create a modified Genotype object to  
include details of the ancestral allele (see at end of this post).

However,  the problem that I have hit upon is that aln_to_population  
returns a population object, filled with IndividualI objects.  In  
other words, it takes my array of GenotypeI objects and converts them  
into IndividualI objects, wrapped in a single Population object.  This  
means that the information in the GenotypeI object about the
ancestral/derived states is lost. How can I overcome this?


Thanks,
Richard


###excerpt from aln_to_population


  $inds[$i]->add_Genotype(Bio::PopGen::Genotype->new
					   (-marker_name  => $nm,
					    -individual_id=> $inds[$i]->unique_id,
					    -alleles      => [$genotypes[$i]],
					    -outgroup      => $outgroup[0]));


###excerpt from Genotypes.pm

sub new {
   my($class,@args) = @_;

   my $self = $class->SUPER::new(@args);
   my ($name,$desc,$type,$uid,$af,$og) = $self->_rearrange([qw(NAME
							  DESCRIPTION
							  TYPE
							  UNIQUE_ID
							  ALLELE_FREQ
							  OUTGROUP)],@args);
   $self->{'_allele_freqs'} = {};
   $self->{'_outgroup_name'} = {};

   if( ! defined $uid ) {
       $uid = $UniqueCounter++;
   }
   if( defined $name) {
       $self->name($name);
   } else {
       $self->throw("Must provide a name when initializing a Marker");
   }
   defined $desc && $self->description($desc);
   defined $type && $self->type($type);


       $self->outgroup_name($og);


   $self->unique_id($uid);

   return $self;
}

=head2 outgroup_name
  Title   : outgroup_name
  Usage   : my $og = $marker->outgroup_name();
  Function: Get/Set the name of the outgroup
  Returns : string representing the name of the outgroup
  Args    : [optional] name


=cut

sub outgroup_name{
     my $self = shift;

     return $self->{'_outgroup_name'} = shift if @_;
     return $self->{'_outgroup_name'};
}
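
As a stopgap that avoids touching the object model, the ancestral state
can be kept beside the Population rather than inside it. The sketch
below is plain Perl and not part of Bio::PopGen: polarize_site and the
%polarized table are hypothetical names, and it assumes one outgroup
allele per site, with every ingroup allele that differs from it counted
as derived.

```perl
use strict;
use warnings;

# Hypothetical helper (not a Bio::PopGen method): classify the ingroup
# alleles at one site against the outgroup state.
sub polarize_site {
    my ($outgroup_allele, @ingroup_alleles) = @_;
    my %count;
    $count{$_}++ for @ingroup_alleles;

    # Assumption: the outgroup allele is the ancestral state.
    my %result = (ancestral => $outgroup_allele, derived => {});
    for my $allele (sort keys %count) {
        next if $allele eq $outgroup_allele;
        $result{derived}{$allele} = $count{$allele};   # derived allele => count
    }
    return \%result;
}

# Side table keyed by marker name, kept next to the Population object.
my %polarized;
$polarized{'Site-12'} = polarize_site('A', qw(A A G A G));

for my $marker (sort keys %polarized) {
    my $site    = $polarized{$marker};
    my @derived = map { "$_ (n=$site->{derived}{$_})" }
                  sort keys %{ $site->{derived} };
    print "$marker ancestral=$site->{ancestral} derived=@derived\n";
}
```

The marker name used as the hash key would be the same $nm passed to
-marker_name above, so the table can be looked up again while walking
the individuals of the returned population.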


-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.


From jason at bioperl.org  Tue Jun 30 05:03:08 2009
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 29 Jun 2009 22:03:08 -0700
Subject: [Bioperl-l] Getting errors parsing TIGR XML in SeqIO
In-Reply-To: <7BEB494D4E69964C8292CE4EDA2B9811BBB946@LAUREL.ads.warwick.ac.uk>
References: <7BEB494D4E69964C8292CE4EDA2B9811BBB946@LAUREL.ads.warwick.ac.uk>
Message-ID: <E6D82027-AF55-4E64-BC8F-71F3F60D0E7E@bioperl.org>

There are several flavors of TIGR XML for rice, Arabidopsis, and
other projects. Unfortunately I don't know which one the current
tigrxml parser tracks, but you can compare the test files in t/data
with the versions you downloaded to see what is currently supported.
Usually the gbk files will be more consistently parseable, but we can
try to work it out if it is a sensible transformation.

On Jun 26, 2009, at 2:55 AM, Moore, Jonathan wrote:

> I'm trying to parse the TAIR9 Arabidopsis release from the TIGR XML  
> files at the TAIR FTP site.
>
> I've tried SeqIO with both tigr and tigrxml formats but both are  
> giving errors in 1.6.0.  Has anyone advice on whether it's likely to  
> be doable, or should I wait til the .gb files are available?
>
> Jay Moore

--
Jason Stajich
jason at bioperl.org


From paola.bisignano at gmail.com  Tue Jun 30 09:12:49 2009
From: paola.bisignano at gmail.com (Paola Bisignano)
Date: Tue, 30 Jun 2009 11:12:49 +0200
Subject: [Bioperl-l] Bioperl-l Digest, Vol 74, Issue 25
In-Reply-To: <mailman.29.1246204805.13888.bioperl-l@lists.open-bio.org>
References: <mailman.29.1246204805.13888.bioperl-l@lists.open-bio.org>
Message-ID: <e9cf89740906300212jeac3fe6o2fa4414bb427f824@mail.gmail.com>

Hi,
I need a little help parsing a file. I searched through the BioPerl
modules, but there are a lot of them and I don't know where to start.
I found modules for many databases and web sites, but not for my
favorite, PDBsum, so I have parsed a lot of things on my own, even
though I am new to Perl. Now I need help: I have to parse a FASTA
file resulting from aligned sequences, and extract the aligned
sequences only for the PDB entries in my list.


My FASTA file looks like this:

Query: /ebi/research/thornton/tmp/sas307986/seq.fasta
  1>>>Sequence 3e7e:A - 333 aa
Library: /ebi/research/thornton/www/databases/html/pdbsum/data/pdblib
17840403 residues in 79353 sequences

       opt      E()
< 20   286     0:===
  22     1     0:=          one = represents 135 library sequences
  24     1     0:=
  26     0     2:*
  28    21    18:*
  30    36   109:*
  32   237   421:== *
  34   956  1140:========*
  36  1924  2342:===============  *
  38  3591  3871:=========================== *
  40  4904  5400:=====================================  *
  42  6750  6600:================================================*=
  44  7145  7281:=====================================================*
  46  8047  7416:======================================================*=====
.........

>>2np8:A                                                  (159 aa)
 initn: 125 init1:  72 opt: 136  Z-score: 168.6  bits: 38.5 E(): 0.011
Smith-Waterman score: 136; 26.0% identity (57.1% similar) in 154 aa
overlap (59-204:13-153)

               10        20        30        40        50        60
Sequen NFIVGNPWDDKLIFKLLSGLSKPVSSYPNTFEWQCKLPAIKPKTEFQLGSKLVYVHHLLG
                                                                 ::
2np8:A                                               QWALEDFEIGRPLG
                                                             10

               70          80        90         100        110
Sequen EGAFAQVYEATQNKQKFVL--KVQKPANPWEFYIGTQLMER--LKPSMQH-MFMKFYSAH
       .: :..:: : ....::.:  ::   :.  .  .  :: ..  ..  ..:  ....:.
2np8:A KGKFGNVYLAREKQSKFILALKVLFKAQLEKAGVEHQLRREVEIQSHLRHPNILRLYG--
           20        30        40        50        60        70

         120         130       140       150       160       170
Sequen LFQNGS--VLVGELYSYGTLLNAINLYKNTPEKVMPQGLVISFAMRMLYMIEQVHDCEII
        :....   :. :    ::.   ..  ..  :.      . ..  ..   .   :. ..:
2np8:A YFHDATRVYLILEYAPLGTVYRELQKLSKFDEQR-----TATYITELANALSYCHSKRVI
             80        90       100            110       120

           180       190        200       210       220       230
Sequen HGDIKPDNFILGNGFLEQSAG-LALIDLGQSIDMKLFPKGTIFTAKCETSGFQCVEMLSN
       : ::::.:..::      ::: : . :.: :.
2np8:A HRDIKPENLLLG------SAGELKIADFGWSVHAPSSR
       130             140       150

            240       250       260       270       280       290
Sequen KPWNYQIDYFGVAATVYCMLFGTYMKVKNEGGECKPEGLFRRLPHLDMWNEFFHVMLNIP

            300       310       320       330
Sequen DCHHLPSLDLLRQKLKKVFQQHYTNKIRALRNRLIVLLLEC

>>2ojg:A                                                  (337 aa)
 initn:  85 init1:  53 opt: 140  Z-score: 168.1  bits: 39.5 E(): 0.012
Smith-Waterman score: 140; 20.3% identity (56.2% similar) in 217 aa
overlap (46-252:1-204)

               10        20        30        40        50        60
Sequen NFIVGNPWDDKLIFKLLSGLSKPVSSYPNTFEWQCKLPAIKPKTEFQLGSKLVYVHHLLG
                                                    :..: . . . .. :
2ojg:A                                              FDVGPRYTNLSYI-G
                                                            10

               70        80        90        100       110
Sequen EGAFAQVYEATQNKQKFVLKVQKPANPWEFYIGTQ-LMERLKPSMQHMFMKFYSAHLFQN
       :::...:  : .: .:  . ..:  .:.:     :  ....:     ....:   ...
2ojg:A EGAYGMVCSAYDNVNKVRVAIKK-ISPFEHQTYCQRTLREIK-----ILLRFRHENIIGI
           20        30         40        50             60

     120              130       140       150       160       170
Sequen GSVL-------VGELYSYGTLLNAINLYKNTPEKVMPQGLVISFAMRMLYMIEQVHDCEI
       ....       . ..:    :... .:::    . . .  .  : ...:  .. .:. ..
2ojg:A NDIIRAPTIEQMKDVYIVQDLMET-DLYKLLKTQHLSNDHICYFLYQILRGLKYIHSANV
       70        80        90        100       110       120

            180       190       200        210       220        230
Sequen IHGDIKPDNFILGNGFLEQSAGLALIDLGQS-IDMKLFPKGTIFTAKCETSGFQCVE-ML
       .: :.::.:..:..     .  : . :.: . .      .  ..:    :  ..  : ::
2ojg:A LHRDLKPSNLLLNT-----TCDLKICDFGLARVADPDHDHTGFLTEYVATRWYRAPEIML
       130       140            150       160       170       180

              240       250       260       270       280       290
Sequen SNKPWNYQIDYFGVAATVYCMLFGTYMKVKNEGGECKPEGLFRRLPHLDMWNEFFHVMLN
       ..: .. .:: ..:.  .  ::
2ojg:A NSKGYTKSIDIWSVGCILAEMLSNRPIFPGKHYLDQLNHILGILGSPSQEDLNCIINLKA
            190       200       210       220       230       240

              300       310       320       330
Sequen IPDCHHLPSLDLLRQKLKKVFQQHYTNKIRALRNRLIVLLLEC

2ojg:A RNYLLSLPHKNKVPWNRLFPNADSKALDLLDKMLTFNPHKRIEVEQALAHPYLEQYYDPS
            250       260       270       280       290       300

2ojg:A DEPIAEAPFKFELDDLPKEKLKELIFEETARFQPG
            310       320       330

>>2oji:A                                                  (344 aa)
 initn:  85 init1:  53 opt: 140  Z-score: 168.0  bits: 39.5 E(): 0.012
Smith-Waterman score: 140; 20.3% identity (56.2% similar) in 217 aa
overlap (46-252:5-208)

               10        20        30        40        50        60
Sequen NFIVGNPWDDKLIFKLLSGLSKPVSSYPNTFEWQCKLPAIKPKTEFQLGSKLVYVHHLLG
                                                    :..: . . . .. :
2oji:A                                          RGQVFDVGPRYTNLSYI-G
                                                        10

               70        80        90        100       110
Sequen EGAFAQVYEATQNKQKFVLKVQKPANPWEFYIGTQ-LMERLKPSMQHMFMKFYSAHLFQN
       :::...:  : .: .:  . ..:  .:.:     :  ....:     ....:   ...
2oji:A EGAYGMVCSAYDNVNKVRVAIKK-ISPFEHQTYCQRTLREIK-----ILLRFRHENIIGI
       20        30        40         50             60        70

     120              130       140       150       160       170
Sequen GSVL-------VGELYSYGTLLNAINLYKNTPEKVMPQGLVISFAMRMLYMIEQVHDCEI
       ....       . ..:    :... .:::    . . .  .  : ...:  .. .:. ..
2oji:A NDIIRAPTIEQMKDVYIVQDLMET-DLYKLLKTQHLSNDHICYFLYQILRGLKYIHSANV
             80        90        100       110       120       130

            180       190       200        210       220        230
Sequen IHGDIKPDNFILGNGFLEQSAGLALIDLGQS-IDMKLFPKGTIFTAKCETSGFQCVE-ML
       .: :.::.:..:..     .  : . :.: . .      .  ..:    :  ..  : ::
2oji:A LHRDLKPSNLLLNT-----TCDLKICDFGLARVADPDHDHTGFLTEYVATRWYRAPEIML
             140            150       160       170       180

              240       250       260       270       280       290
Sequen SNKPWNYQIDYFGVAATVYCMLFGTYMKVKNEGGECKPEGLFRRLPHLDMWNEFFHVMLN
       ..: .. .:: ..:.  .  ::
2oji:A NSKGYTKSIDIWSVGCILAEMLSNRPIFPGKHYLDQLNHILGILGSPSQEDLNCIINLKA
        190       200       210       220       230       240

              300       310       320       330
Sequen IPDCHHLPSLDLLRQKLKKVFQQHYTNKIRALRNRLIVLLLEC

2oji:A RNYLLSLPHKNKVPWNRLFPNADSKALDLLDKMLTFNPHKRIEVEQALAHPYLEQYYDPS
        250       260       270       280       290       300

2oji:A DEPIAEAPFKFDMELDDLPKEKLKELIFEETARFQPGY
        310       320       330       340

.......
I am showing only part of the file. What if I want, for example, only
those two alignments? Are there modules to parse this? I have tried
parsing it with regexes, but without results :-(
If anyone has suggestions for modules or anything else, I'll be very
happy to learn.
Thanks,
Paola
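
For reports of this shape, BioPerl's Bio::SearchIO with -format =>
'fasta' is the parser intended for FASTA search output; whether it
copes with the PDBsum variant shown above is untested here. As a
fallback, a minimal plain-Perl sketch is given below. It assumes every
hit starts on a line beginning with ">>" followed by the PDB id, as in
the excerpt; extract_hits is a hypothetical helper name, not a BioPerl
function.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl API): split FASTA-program output
# into per-hit text blocks and keep only the hits whose id is wanted.
sub extract_hits {
    my ($text, @wanted) = @_;
    my %want = map { $_ => 1 } @wanted;
    my (%hits, $current);
    for my $line (split /\n/, $text) {
        if ($line =~ /^>>(\S+)/) {           # start of a hit, e.g. ">>2np8:A"
            $current = $want{$1} ? $1 : undef;
            $hits{$current} = '' if defined $current;
        }
        # Collect every line that belongs to a wanted hit.
        $hits{$current} .= "$line\n" if defined $current;
    }
    return \%hits;
}

# Usage: perl extract_hits.pl report.txt
if (@ARGV) {
    local $/;                                # slurp the whole file
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    my $hits = extract_hits(scalar <$fh>, '2np8:A', '2ojg:A');
    close $fh;
    print $hits->{$_} for sort keys %$hits;
}
```

Each returned block still contains the raw alignment lines, so the
"Sequen" and hit rows can be picked out afterwards with a much simpler
regex than one applied to the whole file.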


From giles.weaver at googlemail.com  Tue Jun 30 11:28:25 2009
From: giles.weaver at googlemail.com (Giles Weaver)
Date: Tue, 30 Jun 2009 12:28:25 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <200906170927.13273.tristan.lefebure@gmail.com>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<92C15E3391F64BAF801754E924122540@NewLife>
	<200906170927.13273.tristan.lefebure@gmail.com>
Message-ID: <1d06cd5d0906300428x59c004f1h200bfe3c23ed769@mail.gmail.com>

I'm developing a transcriptomics database for use with next-gen data, and
have found processing the raw data to be a big hurdle.

I'm a bit late in responding to this thread, so most issues have already
been discussed. One thing that hasn't been mentioned is removal of adapters
from raw Illumina sequence. This is a PITA, and I'm not aware of any well
developed and documented open source software for removal of adapters (and
poor quality sequence) from Illumina reads.

My current Illumina sequence processing pipeline is an unholy mix of
biopython, bioperl, pure perl, emboss and bowtie. Biopython for converting
the Illumina fastq to Sanger fastq, bioperl to read the quality values, pure
perl to trim the poor quality sequence from each read, and bioperl with
emboss to remove the adapter sequence. I'm aware that the pipeline contains
bugs and would like to simplify it, but at least it does work...

Ideally I'd like to replace as much of the pipeline as possible with
bioperl/bioperl-run, but this isn't currently possible due to both a lack of
features and poor performance. I'm sure the features will come with time,
but the performance is more of a concern to me. I wonder if Bio::Moose might
be used to alleviate some of the performance issues? Might next-gen modules
be an ideal guinea pig for Bio::Moose?

For my purposes the tools that I would love to see supported in
bioperl/bioperl-run are:

   - next-gen sequence quality parsing (to output phred scores)
   - sequence quality based trimming
   - sequencing adapter removal
   - filtering based on sequence complexity (repeats, entropy etc)
   - bioperl-run modules for bowtie etc.

Obviously all of these need to be fast!
I'd love to muck in, but I doubt I'll contribute much before
Bio::Moose/bioperl6, as the (bio)perl object system gives me nightmares!

Regarding trimming bad quality bases (see comments from Tristan Lefebure)
from Solexa/Illumina reads, I did find a mixed pure/bioperl solution to be
much faster than a primarily bioperl based implementation. I found
Bio::Seq->subseq(a,b) and Bio::Seq->subqual(a,b) to be far too slow. My
current code trims ~1300 sequences/second, including unzipping the raw data
and converting it to Sanger fastq with biopython. Processing an entire
sequencing run with the whole pipeline takes in the region of 6-12h.
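
For illustration only (this is not the pipeline code described above),
a pure-Perl end trimmer can be sketched in a few lines. It assumes
Sanger-style FASTQ qualities, where the Phred score is the character
code minus 33; trim_read and the threshold of 20 are hypothetical
choices.

```perl
use strict;
use warnings;

# Hypothetical helper: trim low-quality bases from both ends of one read.
# Assumes Sanger FASTQ quality strings (Phred score = ord(char) - 33).
sub trim_read {
    my ($seq, $qual, $min_q) = @_;
    my @q = map { ord($_) - 33 } split //, $qual;
    my ($start, $end) = (0, $#q);

    # Walk inwards from each end until a base meets the threshold.
    $start++ while $start <= $end && $q[$start] < $min_q;
    $end--   while $end >= $start && $q[$end]   < $min_q;

    my $len = $end - $start + 1;             # 0 if every base fails
    return (substr($seq, $start, $len), substr($qual, $start, $len));
}

my ($seq, $qual) = trim_read('ACGTACGT', '##IIII#!', 20);
print "$seq\n$qual\n";
```

Working on plain strings and skipping object construction per read is
the design choice here; it gives up the Bio::Seq interface in exchange
for less per-read overhead.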

Hope this looooong post was of interest to someone!

Giles

2009/6/17 Tristan Lefebure <tristan.lefebure at gmail.com>

> Hello,
> Regarding next-gen sequences and bioperl, following my
> experience, another issue is bioperl speed. For example, if
> you want to trim bad quality bases at ends of 1E6 Solexa
> reads using Bio::SeqIO::fastq and some methods in
> Bio::Seq::Quality, well, you've got to be patient (but may
> be I missed some shortcuts...).
>
> A pure perl solution will be between 100 to 1000x faster...
> Would it be possible to have an ultra-light quality object
> with few simple methods for next-gen reads?
>
> I can contribute some tests if that sounds like an important
> point.
>
> -Tristan
>


From manchunjohn-ma at uiowa.edu  Tue Jun 30 16:17:08 2009
From: manchunjohn-ma at uiowa.edu (John M.C. Ma)
Date: Tue, 30 Jun 2009 11:17:08 -0500
Subject: [Bioperl-l] Bio::Tools::Run::RepeatMasker crashes perl
Message-ID: <5486b2980906300917m20e8cd06sbaee207aed3a27c9@mail.gmail.com>

Hi everyone,

(OS: OpenSuSE 11.1, Versions: Perl:v5.10.0-i586-linux-thread-multi,
Bioperl: 1.6.0-cpan, Bioperl-run: 1.6.1-cpan, Ensembl: Ver 54-cvs)

This is the first time I have used Bio::Tools::Run::RepeatMasker, and it
crashed in a way that I can't explain. I suspect the problem is on my
side.

My code pulls a sequence from Ensembl-variation, puts it into a
PrimarySeq object and runs RepeatMasker on it:

use strict;
use warnings;
use Bio::SeqIO;
use Bio::PrimarySeq;
use Bio::Tools::Run::RepeatMasker;
use Bio::EnsEMBL::Registry;
use Bio::EnsEMBL::Variation::Variation;
[snips most Ensembl code as the sequence itself looks OK]
	my $ref_allele=$snp_obj->five_prime_flanking_seq.${$snp_obj->get_all_Alleles}[0]->allele.$snp_obj->three_prime_flanking_seq;
	my $mask_seq=Bio::PrimarySeq->new (-seq=>$ref_allele);
	my $rmasker_handle=Bio::Tools::Run::RepeatMasker->new(-species=>'rat',-noisy=>"1");
	my @masked_features=$rmasker_handle->run($mask_seq);
	my $masked_seq=$rmasker_handle->run;

And when I let the wrapper run, perl crashed with these warnings:

--------------------- WARNING ---------------------
MSG: RepeatMasker didn't find any repetitive sequences

---------------------------------------------------

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Could not open /tmp/EWLAmIVymd/wByClB8iqr.masked: No such file or directory
STACK: Error::throw
STACK: Bio::Root::Root::throw
/usr/lib/perl5/site_perl/5.10.0/Bio/Root/Root.pm:357
STACK: Bio::Root::IO::_initialize_io
/usr/lib/perl5/site_perl/5.10.0/Bio/Root/IO.pm:310
STACK: Bio::SeqIO::_initialize /usr/lib/perl5/site_perl/5.10.0/Bio/SeqIO.pm:450
STACK: Bio::SeqIO::fasta::_initialize
/usr/lib/perl5/site_perl/5.10.0/Bio/SeqIO/fasta.pm:81
STACK: Bio::SeqIO::new /usr/lib/perl5/site_perl/5.10.0/Bio/SeqIO.pm:347
STACK: Bio::SeqIO::new /usr/lib/perl5/site_perl/5.10.0/Bio/SeqIO.pm:373
STACK: Bio::Tools::Run::RepeatMasker::_run
/usr/lib/perl5/site_perl/5.10.0/Bio/Tools/Run/RepeatMasker.pm:320
STACK: Bio::Tools::Run::RepeatMasker::run
/usr/lib/perl5/site_perl/5.10.0/Bio/Tools/Run/RepeatMasker.pm:260
STACK: main::SeqList
/home/johnma/workspace/TaqMan_SNP_Order/TaqMan_SNP_Order.pl:40
STACK: /home/johnma/workspace/TaqMan_SNP_Order/TaqMan_SNP_Order.pl:63
-----------------------------------------------------------

What could be going wrong?

Cheers,

John Ma,
University of Iowa


From cjfields at illinois.edu  Tue Jun 30 17:46:27 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 30 Jun 2009 12:46:27 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <1d06cd5d0906300428x59c004f1h200bfe3c23ed769@mail.gmail.com>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<92C15E3391F64BAF801754E924122540@NewLife>
	<200906170927.13273.tristan.lefebure@gmail.com>
	<1d06cd5d0906300428x59c004f1h200bfe3c23ed769@mail.gmail.com>
Message-ID: <6723B5A0-9A21-4851-BD88-0BA3CC107439@illinois.edu>


On Jun 30, 2009, at 6:28 AM, Giles Weaver wrote:

> I'm developing a transcriptomics database for use with next-gen data, and
> have found processing the raw data to be a big hurdle.
>
> I'm a bit late in responding to this thread, so most issues have already
> been discussed. One thing that hasn't been mentioned is removal of adapters
> from raw Illumina sequence. This is a PITA, and I'm not aware of any well
> developed and documented open source software for removal of adapters (and
> poor quality sequence) from Illumina reads.
>
> My current Illumina sequence processing pipeline is an unholy mix of
> biopython, bioperl, pure perl, emboss and bowtie. Biopython for converting
> the Illumina fastq to Sanger fastq, bioperl to read the quality values, pure
> perl to trim the poor quality sequence from each read, and bioperl with
> emboss to remove the adapter sequence. I'm aware that the pipeline contains
> bugs and would like to simplify it, but at least it does work...

My local bioperl is working with FASTQ parsing of Sanger and Illumina  
(but not solexa yet).  I'll commit what I have today, and we should be  
able to add in solexa soon.  We'll also need to add in write_seq  
support.

> Ideally I'd like to replace as much of the pipeline as possible with
> bioperl/bioperl-run, but this isn't currently possible due to both a lack of
> features and poor performance. I'm sure the features will come with time,
> but the performance is more of a concern to me. I wonder if Bio::Moose might
> be used to alleviate some of the performance issues? Might next-gen modules
> be an ideal guinea pig for Bio::Moose?

We should get FASTQ working in core first then optimize on speed (as  
Elia previously pointed out).  We can do that within the actual SeqIO  
parser using a few simple tricks. For instance my local  
Bio::SeqIO::fastq has a reconfigured next_seq to call an iterator that  
returns raw processed data as a simple hash ref; users have access to  
that method, so if one wanted they could retrieve the raw data  
directly, or pass it through a filter that only creates seq instances  
one wants on the fly (that would be where your quality checks, adaptor  
modification, etc. fit in).
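
A rough pure-Perl sketch of that pattern: an iterator that hands back one record at a time as a plain hash ref, with a filter applied before any object would be created. This is not the Bio::SeqIO::fastq code; the sub name fastq_iterator, the hash keys, the demo records and the length cut-off are all made up for illustration, and it assumes four-line FASTQ records without line wrapping.

```perl
use strict;
use warnings;

# Return a closure that yields one FASTQ record per call as a plain hash
# ref, or nothing at end of input. Assumes four-line records (no wrapping).
sub fastq_iterator {
    my ($fh) = @_;
    return sub {
        defined(my $header = <$fh>) or return;
        my $seq  = <$fh>;
        my $plus = <$fh>;    # separator line, ignored
        my $qual = <$fh>;
        return unless defined $qual;
        chomp($header, $seq, $qual);
        $header =~ s/^@//;
        return { id => $header, seq => $seq, qual => $qual };
    };
}

# Demo on an in-memory file; the records are invented.
my $fastq = "\@read1\nACGTACGT\n+\nIIIIIIII\n\@read2\nACG\n+\nIII\n";
open my $fh, '<', \$fastq or die "Cannot open in-memory FASTQ: $!";

my $next = fastq_iterator($fh);
my @kept;
while (my $rec = $next->()) {
    next if length($rec->{seq}) < 5;    # cheap filter before any object exists
    push @kept, $rec;                   # only survivors would become objects
}
print "$_->{id}\n" for @kept;
```

The point of the design is that rejected reads never cost more than a few string operations; object construction is paid only for records that pass.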

In the end it might be best to wrap a C/C++-based solution for speed.  As  
mentioned previously a C-based parser exists from Sanger Centre that  
we could incorporate in some fashion, but I would like it to be able  
to report back file position for fast indexing.  The code is fairly  
simple so it shouldn't be too hard to incorporate that in somehow.

Just so there is no confusion, Bio::Moose is an attempt to both lay  
out plans for perl6 and deal with inheritance issues within bioperl  
now. It's still in very early development and may not see a release  
until Dec. at the very earliest, it will be an alpha release then, and  
likely won't have every major class represented at that point.  It's  
also not intended to be backwards-compatible with bioperl core.  It  
may help, but that's not an absolute certainty.  As for bioperl6, it  
will be pre-alpha until perl6 spec reaches a stable draft and we have  
an active implementation.

> For my purposes the tools that I would love to see supported in
> bioperl/bioperl-run are:
>
>   - next-gen sequence quality parsing (to output phred scores)
>   - sequence quality based trimming
>   - sequencing adapter removal
>   - filtering based on sequence complexity (repeats, entropy etc)
>   - bioperl-run modules for bowtie etc.
>
> Obviously all of these need to be fast!
> I'd love to muck in, but I doubt I'll contribute much before
> Bio::Moose/bioperl6, as the (bio)perl object system gives me nightmares!

One can only read a file so fast (even with a highly optimized C/C++  
based parser), but I don't think that will be the limiting factor as  
much as object instantiation.

> Regarding trimming bad quality bases (see comments from Tristan Lefebure)
> from Solexa/Illumina reads, I did find a mixed pure/bioperl solution to be
> much faster than a primarily bioperl based implementation. I found
> Bio::Seq->subseq(a,b) and Bio::Seq->subqual(a,b) to be far too slow. My
> current code trims ~1300 sequences/second, including unzipping the raw data
> and converting it to sanger fastq with biopython. Processing an entire
> sequencing run with the whole pipeline takes in the region of 6-12h.

Right, hence coming up with a 'pre-filter' for raw data (hash refs)  
prior to object instantiation to speed things up.  This will be a bit  
easier with Bio::Moose as we can introspect attributes via the meta  
class, but this will be a while yet.

> Hope this looooong post was of interest to someone!
>
> Giles

It's always good to hear about such issues and what one expects.

chris


From cjfields at illinois.edu  Tue Jun 30 21:58:57 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 30 Jun 2009 16:58:57 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <4A42AC51.3090809@ebi.ac.uk>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<320fb6e00906220724q5461d2c4veb88c9b24b9bb2f9@mail.gmail.com>
	<CECB8B74-C423-4792-8857-0F3E40EAECE3@illinois.edu>
	<4A40B5D6.40504@ebi.ac.uk>
	<320fb6e00906230429l77a2049eg35e47f94026111c6@mail.gmail.com>
	<4A40C909.40803@ebi.ac.uk>
	<320fb6e00906231439j3beecda0o6e75c4223e960871@mail.gmail.com>
	<4A421287.4000203@ebi.ac.uk>
	<320fb6e00906240756m757ee12nb53ecc527b8e2480@mail.gmail.com>
	<6DCEA6E1-8C53-4C04-9F8A-E336BE5753B6@illinois.edu>
	<320fb6e00906240927u62588e22je8186bb172e904a1@mail.gmail.com>
	<5C050D50-5723-4C93-89D0-7280D202E737@illinois.edu>
	<4A42AC51.3090809@ebi.ac.uk>
Message-ID: <A9776DF4-CE78-4973-9ADC-7594A3DAA118@illinois.edu>

All,

I have committed the first run at adding Illumina/Solexa parsing for  
FASTQ along with tests.  It's very possible the quality scores are  
off, particularly for Solexa (Illumina 1.0), so test away and let me  
know if anything pops up (should be a quick fix).  Along with that is  
a small commit to Bio::SeqIO so that we can add format variants (see  
below for an example).  write_seq/write_qual/write_fastq will likely  
not work as expected as I haven't touched them; they are to be tackled  
next.
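
For anyone checking the scores by hand, here is a small pure-Perl sketch of the conversions involved. It is independent of the committed parser. It assumes Illumina 1.3+ qualities are Phred scores at ASCII offset 64, Sanger qualities are Phred scores at offset 33, and Solexa scores map to Phred via Q_phred = 10 * log10(10^(Q_solexa/10) + 1). The sub names are hypothetical.

```perl
use strict;
use warnings;

# Re-encode an Illumina 1.3+ quality string (Phred+64) as Sanger (Phred+33).
sub illumina_to_sanger {
    my ($qual) = @_;
    return join '', map { chr(ord($_) - 64 + 33) } split //, $qual;
}

# Convert a single Solexa score to a Phred score:
#   Q_phred = 10 * log10(10 ** (Q_solexa / 10) + 1)
sub solexa_to_phred {
    my ($q_solexa) = @_;
    my $q = 10 * log(10 ** ($q_solexa / 10) + 1) / log(10);
    return sprintf '%.0f', $q;    # round to the nearest integer
}

print illumina_to_sanger('hhh^B'), "\n";
print solexa_to_phred($_), "\n" for (-5, 0, 10, 40);
```

The two scales only differ noticeably at the low end (a Solexa score of 0 is roughly Phred 3), which is where discrepancies are most likely to show up in tests.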

For faster parsing I have also added a next_dataset method that  
returns a hash reference to the parsed data instead of an object; this  
hash includes quality scores.  This method is called by next_seq and  
the relevant data is passed in to the sequence factory directly; one  
could do something like the following to filter sequences as needed:

use Modern::Perl;
use Bio::SeqIO;
use Bio::Seq::SeqFactory;

my $file = shift;

# same as (-format   => 'fastq', -variant => 'illumina')
my $in = Bio::SeqIO->new(-file     => $file,
                          -format   => 'fastq-illumina');

my $factory = Bio::Seq::SeqFactory->new(-type => 'Bio::Seq::Quality');

while (my $data = $in->next_dataset) {
     next if seq_is_crap($data);
     my $seq = $factory->create(%$data);
}

sub seq_is_crap { # filter here
}
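
One possible way to fill in the filter, shown standalone so it runs without BioPerl. It ASSUMES the hash carries the base string under -seq and an array ref of Phred scores under -qual; check the keys actually returned by next_dataset before relying on this. Both thresholds are arbitrary examples.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Reject a record when it is too short or its mean quality is too low.
# ASSUMPTION: $data->{-seq} is the base string and $data->{-qual} is an
# array ref of Phred scores; the defaults of 20 are illustrative only.
sub seq_is_crap {
    my ($data, $min_len, $min_mean) = @_;
    $min_len  = 20 unless defined $min_len;
    $min_mean = 20 unless defined $min_mean;
    my $qual = $data->{-qual} || [];
    return 1 if length($data->{-seq} // '') < $min_len;
    return 1 unless @$qual;
    my $mean = sum(@$qual) / @$qual;
    return $mean < $min_mean ? 1 : 0;
}

# Invented records standing in for what the parser would return.
my %good = (-seq => 'ACGT' x 6, -qual => [ (30) x 24 ]);
my %bad  = (-seq => 'ACGT' x 6, -qual => [ (5)  x 24 ]);
printf "good: %d, bad: %d\n", seq_is_crap(\%good), seq_is_crap(\%bad);
```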


chris


From Jonas_Schaer at gmx.de  Sun Jun 28 10:15:18 2009
From: Jonas_Schaer at gmx.de (Jonas Schaer)
Date: Sun, 28 Jun 2009 12:15:18 +0200
Subject: [Bioperl-l] different results with remote-blast skript
Message-ID: <D6BA00577BC94BDFAB04DF5EF43E9598@jonas>

Hi again :)
Please, I only have this little question:
Why do I get different results with my RemoteBlast perl script than on the NCBI BLAST homepage?
I am using blastp, the query is an amino acid sequence (the results differ for any sequence, not only in the number of hits but even in e-values, scores etc.), and the database is 'nr'.
PLEASE help me,
thank you in advance,
Jonas

ps: my script:
################################################################################
use Bio::Seq::SeqFactory;
use Bio::Tools::Run::RemoteBlast;
use strict;

my @blast_report;
my $prog  = 'blastp';
my $db    = 'nr';
my $e_val = '1e-10';
#my $e_val = '10';
my @params = ( '-prog'       => $prog,
               '-data'       => $db,
               '-expect'     => $e_val,
               '-readmethod' => 'SearchIO' );
my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
$Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1';
$Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100';
$Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10';
$Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'} = '1';

my $blast_seq='MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTAMATGPDPDDEYE';
# $v is just to turn on and off the messages
my $v = 1;
my $seqbuilder = Bio::Seq::SeqFactory->new('-type' => 'Bio::PrimarySeq');
my $seq = $seqbuilder->create(-seq => $blast_seq, -display_id => "$blast_seq");
my $filename = 'temp2.out';
my $r = $factory->submit_blast($seq);
print STDERR "waiting..." if ( $v > 0 );
while ( my @rids = $factory->each_rid ) {
    foreach my $rid ( @rids ) {
        my $rc = $factory->retrieve_blast($rid);
        if ( !ref($rc) ) {
            if ( $rc < 0 ) {
                $factory->remove_rid($rid);
            }
            print STDERR "." if ( $v > 0 );
        }
        else {
            my $result = $rc->next_result();
            $factory->save_output($filename);
            $factory->remove_rid($rid);
            print "\nQuery Name: ", $result->query_name(), "\n";
            while ( my $hit = $result->next_hit ) {
                next unless ( $v > 0 );
                print "\thit name is ", $hit->name, "\n";
                while ( my $hsp = $hit->next_hsp ) {
                    print "\t\tscore is ", $hsp->score, "\n";
                }
            }
        }
    }
}
@blast_report = get_file_data($filename);
return @blast_report;
##################################################################################


From stevey_mac2k2 at hotmail.com  Sun Jun 28 10:53:04 2009
From: stevey_mac2k2 at hotmail.com (stephenmcgowan1)
Date: Sun, 28 Jun 2009 03:53:04 -0700 (PDT)
Subject: [Bioperl-l]  Installing Bioperl on Mac OS X 10.5.7
Message-ID: <24240541.post@talk.nabble.com>


Hi,

I'm new to the Mac way of working and programming, as well as to the UNIX
(Terminal) environment. I will describe in as much detail as I can what I
have done so far to install BioPerl, and try to describe what my problem is.

First of all, I downloaded and extracted BioPerl-1.6.0 and
BioPerl-db-1.6.0 from the site. I have these two folders saved in a
folder on my OS X desktop called "ExerciseTwo".

After doing this, I open up Terminal and change to the BioPerl-1.6.0
directory.

I then run:

perl Build.PL (I have also tried sudo perl Build.pl)

I then run ./Build test (again, I also tried this as sudo ./Build test)

After running the tests, I receive this output:

Failed Test                        Stat Wstat Total Fail  Failed  List of Failed
-------------------------------------------------------------------------------
t/AlignIO/AlignIO.t                 255 65280    28   42 150.00%  8-28
t/AlignIO/arp.t                     255 65280    48   92 191.67%  3-48
t/Annotation/Annotation.t           255 65280   159   83  52.20%  9 117 119-159
t/ClusterIO/SequenceFamily.t        255 65280    19   34 178.95%  3-19
t/LocalDB/Flat.t                    255 65280    24   20  83.33%  15-24
t/LocalDB/Index.t                   255 65280    64   66 103.12%  32-64
t/RemoteDB/BioFetch.t               255 65280    36    2   5.56%  36
t/RemoteDB/DB.t                       3   768   113   59  52.21%  83-113
t/RemoteDB/EUtilities.t               1   256   309    1   0.32%  307
t/SeqIO/Handler.t                   255 65280   550 1098 199.64%  2-550
t/SeqIO/chaos.t                       1   256     8    1  12.50%  1
t/SeqIO/swiss.t                     255 65280   240  479 199.58%  1-240
t/SeqTools/GuessSeqFormat.t           1   256    49    2   4.08%  25 50
t/Tools/Analysis/Protein/ELM.t      255 65280    15   22 146.67%  5-15
t/Tools/Analysis/Protein/Scansite   255 65280    14   20 142.86%  5-14
t/Tools/Run/WrapperBase.t             1   256    27    1   3.70%  20
44 tests and 250 subtests skipped.
Failed 16/318 test scripts, 94.97% okay. 1015/15518 subtests failed, 93.46% okay

Going off this, I then decided to run the install: ./Build install

This is a segment of the output I get back in Terminal after the install:

Manifying blib/script/bp_pairwise_kaks.pl -> blib/bindoc/bp_pairwise_kaks.pl.1
Manifying blib/script/bp_seqret.pl -> blib/bindoc/bp_seqret.pl.1
Manifying blib/script/bp_seq_length.pl -> blib/bindoc/bp_seq_length.pl.1
Manifying blib/script/bp_query_entrez_taxa.pl -> blib/bindoc/bp_query_entrez_taxa.pl.1
Manifying blib/script/bp_load_gff.pl -> blib/bindoc/bp_load_gff.pl.1
Manifying blib/script/bp_fastam9_to_table.pl -> blib/bindoc/bp_fastam9_to_table.pl.1
Manifying blib/script/bp_process_wormbase.pl -> blib/bindoc/bp_process_wormbase.pl.1
Manifying blib/script/bp_nrdb.pl -> blib/bindoc/bp_nrdb.pl.1
Manifying blib/script/bp_composite_LD.pl -> blib/bindoc/bp_composite_LD.pl.1
Manifying blib/script/bp_classify_hits_kingdom.pl -> blib/bindoc/bp_classify_hits_kingdom.pl.1
Manifying blib/script/bp_blast2tree.pl -> blib/bindoc/bp_blast2tree.pl.1
Manifying blib/script/bp_heterogeneity_test.pl -> blib/bindoc/bp_heterogeneity_test.pl.1
Manifying blib/script/bp_generate_histogram.pl -> blib/bindoc/bp_generate_histogram.pl.1
Manifying blib/script/bp_process_gadfly.pl -> blib/bindoc/bp_process_gadfly.pl.1
mkdir /usr/local/share: Permission denied at /System/Library/Perl/5.8.8/ExtUtils/Install.pm line 112

Now these bp_ files such as bp_nrdb.pl should be installed somewhere on my
system, but I'm not sure whether the install worked and whether the files
were saved to their target directory, given this error:

mkdir /usr/local/share: Permission denied at
/System/Library/Perl/5.8.8/ExtUtils/Install.pm line 112

Is there something wrong with my install? I think /usr/local/share should be
created and then all of these bp_ files should go into this folder. Is there
anything that I'm doing wrong here?

Thanks

Stephen.


-- 
View this message in context: http://www.nabble.com/Installing-Bioperl-on-Mac-OS-X-10.5.7-tp24240541p24240541.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.