From n.haigh at sheffield.ac.uk  Fri Dec  1 02:47:03 2006
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Fri, 01 Dec 2006 07:47:03 +0000
Subject: [Bioperl-l] Upgrading my BioPerl RC via ppm?
In-Reply-To: <519167.29410.qm@web50804.mail.yahoo.com>
References: <519167.29410.qm@web50804.mail.yahoo.com>
Message-ID: <456FDDF7.1080403@sheffield.ac.uk>

Caitlin wrote:
> Hi all.
>
> I'm currently using BioPerl 1.5.2 RC2 but I've seen multiple references
> to 1.5.2 RC5. Can anyone tell me how to upgrade to the latest version?
> The ppm GUI (ActivePerl Build 819) doesn't include any BioPerl packages
> among those deemed upgradable.
>
> Thanks,
>
> ~Katie
>
>
>   

Hi Katie,

Currently there is no RC5 PPM package available - we are hoping to
have the official 1.5.2 release out pretty soon, and there will
definitely be a PPM package for that! Are you experiencing any problems
with your current version of BioPerl? If not, there is no need to worry:
once we've released an updated PPM package, your PPM GUI should be
able to see it as an upgrade - hopefully! :o)

Sendu, I know you were working on automatically generating PPM packages
- what is the current situation with regards to this?

Nath


---
avast! Antivirus: Inbound message clean.
Virus Database (VPS): 0652-4, 30/11/2006
Tested on: 01/12/2006 07:46:58
avast! - copyright (c) 1988-2006 ALWIL Software.
http://www.avast.com




From bix at sendu.me.uk  Fri Dec  1 04:00:18 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 01 Dec 2006 09:00:18 +0000
Subject: [Bioperl-l] BLASTing with a seqio/seq object...
In-Reply-To: <456F27E9.70205@york.ac.uk>
References: <01ba01c714a2$b9659c10$15327e82@pyrimidine>
	<456F27E9.70205@york.ac.uk>
Message-ID: <456FEF22.4090004@sendu.me.uk>

Samantha Thompson wrote:

You missed a step...


> use strict;
> use Bio::Perl;
> use Bio::Seq;
> use Bio::SeqIO;
> 
> use Bio::Tools::Run::RemoteBlast;
> use Bio::SearchIO;
> 
> #seq bit
> 
> #$seq_obj = Bio::Seq->new(-format => 'fasta');
> 
> my $seqio_obj = Bio::SeqIO->new(-file => 
> "/biol/people/mres/st537/MalEfasta.txt", -format => 'fasta');
> 
> my $seq_obj = $seqio_obj->next_seq;
> 
> 
> 
> #blast bit
> 
> my $remote_blast = Bio::Tools::Run::RemoteBlast->new (
>          -prog => 'blastp', -db => 'nr', -expect => '1e-15' );
> 
> my $blast_report = $remote_blast->submit_blast($seq_obj);

Go back to the Bptutorial:
http://www.bioperl.org/wiki/Bptutorial.pl#Running_BLAST_.28using_RemoteBlast.pm.29

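And you'll see that submit_blast doesn't return a SearchIO object; it only
submits the job. You then have to poll for the results of each request ID
with retrieve_blast, which hands back a SearchIO object once the search has
finished.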

For a complete working example see the synopsis for RemoteBlast:
http://doc.bioperl.org/bioperl-live/Bio/Tools/Run/RemoteBlast.html


> #new part for SearchIO...
> 
> while( my $result = $blast_report->next_result ) {
>   while( my $hit = $result->next_hit ) {
>    while( my $hsp = $hit->next_hsp ) {
>     if( $hsp->length('total') > 100 ) {
>      if ( $hsp->percent_identity >= 75 ) {
>       print "Hit= ",       $hit->name,
>             ",Length=",     $hsp->length('total'),
>             ",Percent_id=", $hsp->percent_identity, "\n";
>      }
>     }
>    } 
>   }
> }
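A minimal sketch of the missing step, adapted from the polling loop in the RemoteBlast synopsis and wrapped in a subroutine. The name collect_remote_blast and its $callback/$wait arguments are hypothetical; each_rid, retrieve_blast and remove_rid are the methods the synopsis uses, but check them against the documentation for your installed version.

```perl
use strict;
use warnings;

# Poll a RemoteBlast factory until every submitted request is gone,
# passing each finished report (a Bio::SearchIO object) to $callback.
# Assumes submit_blast() has already been called on $factory.
# collect_remote_blast is a hypothetical helper, not part of BioPerl.
sub collect_remote_blast {
    my ($factory, $callback, $wait) = @_;
    $wait = 5 unless defined $wait;    # seconds to sleep between polls

    while ( my @rids = $factory->each_rid ) {
        foreach my $rid (@rids) {
            my $rc = $factory->retrieve_blast($rid);
            if ( !ref $rc ) {
                # Not an object yet: a negative value means the request
                # failed (drop it); otherwise the search is still running.
                $factory->remove_rid($rid) if $rc < 0;
                sleep $wait;
            }
            else {
                # Finished: $rc is a Bio::SearchIO stream for this RID.
                $factory->remove_rid($rid);
                $callback->($rc);
            }
        }
    }
}
```

In the script above, you would keep the submit_blast call (ignoring its return value), then call collect_remote_blast($remote_blast, sub { my $report = shift; ... }) and move the existing next_result/next_hit/next_hsp loop into the callback so it runs on $report. The synopsis also appears to pass the database as -data => 'nr' rather than -db, which is worth checking against the documentation.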

From bix at sendu.me.uk  Fri Dec  1 04:03:13 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 01 Dec 2006 09:03:13 +0000
Subject: [Bioperl-l] Error with supplied lineages importing uniprot data
In-Reply-To: <1348.130.49.222.58.1164925169.squirrel@webmail.cs.pitt.edu>
References: <1348.130.49.222.58.1164925169.squirrel@webmail.cs.pitt.edu>
Message-ID: <456FEFD1.4070704@sendu.me.uk>

pelikan at cs.pitt.edu wrote:
> Hello all,
> 
>  I'm running bioperl 1.5.2, bioperl-db 1.5.2 - RC005, under windows,
> without Cygwin. The "make test"s have all completed without error. This
> is my first time dealing with bioperl, so bear with me.
> 
>    I've successfully loaded the most recent taxonomy information using the
> biosql-schema scripts. After this, I attempted to load the most recent
> release of the uniprot flat file dataset with the following command:
> 
> load_seqdatabase.pl -drive mysql -dbname bioseqdb -dbuser root -dbpass
> ********* -format swiss -safe c:\data\uniprot\uniprot_sprot.dat
> 
> I am subsequently greeted by many of the following errors:
> 
> Could not store Q7N3Q6:
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: The supplied lineage does not start near 'Photorhabdus luminescens
> subsp. laumondii'

In your uniprot_sprot.dat file there'll be some kind of entry with that 
Photorhabdus species. Can you post that entry (sans sequence if it has 
one) so I can take a look at it? Maybe post a few that cause problems, 
and a few that don't.

From bix at sendu.me.uk  Fri Dec  1 04:19:09 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 01 Dec 2006 09:19:09 +0000
Subject: [Bioperl-l] Bioperl 1.5.2 RC5 install on WinXP
	ActivePerl5.8.8.819
In-Reply-To: <000301c714b4$7846e790$15327e82@pyrimidine>
References: <000301c714b4$7846e790$15327e82@pyrimidine>
Message-ID: <456FF38D.3070508@sendu.me.uk>

Chris Fields wrote:
>> Nathan S. Haigh wrote:
>>> More updates:
>>>
>>> After the failed install I updated Module::Build and re-ran the 
>>> install, and now I get:
>>>
>>> -- snip --
>>> Creating new 'Build' script for 'bioperl' version '1.005002005'
>>> Warning: while trying to determine prerequisites for 
>>> S/SE/SENDU/bioperl-1.5.2_005-RCb.tar.gz with the help of 
>>> Module::Build the following error occurred: 'Failed to re-load 
>>> 'ModuleBuildBioperl': Can't locate ModuleBuildBioperl.pm in @INC 
>>> (@INC contains: _build\lib C:\Perl\site\lib C:\Perl\lib 
>>> C:\Documents and Settings\test) at (eval 105) line 1.
>>> '
>>>
>>> Falling back to META.yml for prerequisites
>>> 'YAML' not installed, cannot parse 
>>> 'C:\Perl\cpan\build\bioperl-1.5.2_005-RC\META.yml'
>>> -- snip --
>> I had that problem fleetingly and it drove me crazy because 
>> later I couldn't reproduce it. Is it reproducible on your end?
> 
> During Module::Build installation I see this:
> 
> ...
> t\metadata........ok
>         8/43 skipped: YAML_support feature is not enabled

You were pointing out the YAML issue? I think I'm less concerned with 
that (solution: install YAML) and much more concerned with why it can't 
reload ModuleBuildBioperl (claiming it isn't in @INC). The module in 
question is in the same dir as the Build script, so it should be found 
automatically.

The only thing I can think of is that CPAN doesn't manage to chdir to 
the directory. Hopefully I'll be able to reproduce this and then I can 
investigate further.
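One way to test the chdir theory is to run a small diagnostic from the unpacked distribution directory. The sketch below is a hypothetical helper, not part of the BioPerl build: it reports the current directory, whether ModuleBuildBioperl.pm is present there, whether YAML can be loaded, and what @INC contains.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);

# Return 1 if $module can be loaded with the current @INC, else 0.
# can_load is a hypothetical helper written for this check.
sub can_load {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # Foo::Bar -> Foo/Bar.pm
    return eval { require $file; 1 } ? 1 : 0;
}

# Run from the unpacked bioperl directory, where ModuleBuildBioperl.pm
# is expected to sit next to the Build script.
my $dir = getcwd();
printf "cwd: %s\n", $dir;
printf "ModuleBuildBioperl.pm present here: %s\n",
    ( -e "$dir/ModuleBuildBioperl.pm" ? 'yes' : 'no' );
printf "YAML loadable: %s\n", ( can_load('YAML') ? 'yes' : 'no' );
print "\@INC:\n", map { "  $_\n" } @INC;
```

If the printed directory is not the unpacked bioperl directory, or nothing in @INC points at it, that would be consistent with CPAN failing to chdir there.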

From n.haigh at sheffield.ac.uk  Fri Dec  1 04:26:22 2006
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Fri, 01 Dec 2006 09:26:22 +0000
Subject: [Bioperl-l] Bioperl 1.5.2 RC5 install
	onWinXPActivePerl	5.8.8.819
In-Reply-To: <456FF233.6040704@sendu.me.uk>
References: <002401c714c6$53f65080$15327e82@pyrimidine>
	<456F500A.7010707@sheffield.ac.uk>
	<202B1F50-E905-46DE-9EB5-5F206AC04523@uiuc.edu>
	<456FF233.6040704@sendu.me.uk>
Message-ID: <456FF53E.90907@sheffield.ac.uk>

Sendu Bala wrote:
> Chris Fields wrote:
>>
>> I know that setting up the PPM is a pain, but I have to say it is 
>> much faster, and all required PPMs are available.  Which makes me 
>> curious: why bother with trying out a CPAN installation process at 
>> this point, especially when you have to use PPM to install some of 
>> the prereqs properly anyway?
>
> Firstly, problems discovered and resulting fixes will help all 
> platforms, not just Windows. So thanks for trying it out and reporting 
> back. Secondly, the PPM method, like Bundle::BioPerl, is 
> all-or-nothing. The CPAN installation method allows an interactive 
> choice of which optional things to install.
>
> If what you say about DB_File is true, then that's a great shame!
>
>
> So I can do further trouble-shooting of my own, what is the sure-fire 
> way to completely clean-out an ActivePerl install, including any 
> modules you might have installed with PPMs or CPAN?
>
>

In addition, using CPAN allows you to run the test suite easily without 
the need to download it separately and run it after a PPM install.

I don't know of a way to clean out ActivePerl - I use VMWare Workstation 
and have a virtual machine with a fresh install of WinXP and ActivePerl 
5.8.8.819 - maybe someone else has ideas?

Nath




From bix at sendu.me.uk  Fri Dec  1 04:13:23 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 01 Dec 2006 09:13:23 +0000
Subject: [Bioperl-l] Bioperl 1.5.2 RC5 install
	onWinXPActivePerl	5.8.8.819
In-Reply-To: <202B1F50-E905-46DE-9EB5-5F206AC04523@uiuc.edu>
References: <002401c714c6$53f65080$15327e82@pyrimidine>
	<456F500A.7010707@sheffield.ac.uk>
	<202B1F50-E905-46DE-9EB5-5F206AC04523@uiuc.edu>
Message-ID: <456FF233.6040704@sendu.me.uk>

Chris Fields wrote:
> 
> I know that setting up the PPM is a pain, but I have to say it is much 
> faster, and all required PPMs are available.  Which makes me curious: 
> why bother with trying out a CPAN installation process at this point, 
> especially when you have to use PPM to install some of the prereqs 
> properly anyway?

Firstly, problems discovered and resulting fixes will help all 
platforms, not just Windows. So thanks for trying it out and reporting 
back. Secondly, the PPM method, like Bundle::BioPerl, is all-or-nothing. 
The CPAN installation method allows an interactive choice of which 
optional things to install.

If what you say about DB_File is true, then that's a great shame!


So I can do further trouble-shooting of my own, what is the sure-fire 
way to completely clean-out an ActivePerl install, including any modules 
you might have installed with PPMs or CPAN?


From cjfields at uiuc.edu  Fri Dec  1 09:08:55 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 1 Dec 2006 08:08:55 -0600
Subject: [Bioperl-l] Bioperl 1.5.2 RC5 install
	onWinXPActivePerl	5.8.8.819
In-Reply-To: <456FF233.6040704@sendu.me.uk>
References: <002401c714c6$53f65080$15327e82@pyrimidine>
	<456F500A.7010707@sheffield.ac.uk>
	<202B1F50-E905-46DE-9EB5-5F206AC04523@uiuc.edu>
	<456FF233.6040704@sendu.me.uk>
Message-ID: <10BC5C25-616F-44D5-8CA8-4BD4C3EF82D6@uiuc.edu>


On Dec 1, 2006, at 3:13 AM, Sendu Bala wrote:

> Chris Fields wrote:
>> I know that setting up the PPM is a pain, but I have to say it is  
>> much faster, and all required PPMs are available.  Which makes me  
>> curious: why bother with trying out a CPAN installation process at  
>> this point, especially when you have to use PPM to install some of  
>> the prereqs properly anyway?
>
> Firstly, problems discovered and resulting fixes will help all  
> platforms, not just Windows. So thanks for trying it out and  
> reporting back. Secondly, the PPM method, like Bundle::BioPerl, is  
> all-or-nothing. The CPAN installation method allows an interactive  
> choice of which optional things to install.

Yes, I understand that.  My point is, you are generally forced to use  
PPM anyway due to several modules not installing properly (all the  
'trouble' distributions, like DB_File, are available via PPM).  I can  
see using CPAN as an alternative way of installing a BioPerl release,  
or as the primary method for CVS checkouts or manual installs, but not  
as the primary method for releases.  At least not until the kinks are  
worked out for Windows users.

What are the significant issues for a bioperl PPM installation, based  
on the last PPM Nathan set up?  If there is a redirection problem,  
could we just modify the installation docs to address that ('due to  
problem X, you must install the following modules prior to installing  
BioPerl 1.5.2...').

> If what you say about DB_File is true, then that's a great shame!

We need to go through the various prereqs to see which ones need PPM  
vs CPAN.  In general, anything that requires C code compilation (and  
thus needs a recent VC++) will likely be an issue.

> So I can do further trouble-shooting of my own, what is the sure- 
> fire way to completely clean-out an ActivePerl install, including  
> any modules you might have installed with PPMs or CPAN?

Not sure, beyond uninstalling and cleaning out the Perl directory (I  
think you might be able to delete the site/ directory, but I haven't  
tried it).  ActivePerl comes preloaded with a number of non-core  
modules which makes it tricky to uninstall them one-by-one.

chris


From cjfields at uiuc.edu  Fri Dec  1 09:10:34 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 1 Dec 2006 08:10:34 -0600
Subject: [Bioperl-l] Bioperl 1.5.2 RC5 install on WinXP
	ActivePerl5.8.8.819
In-Reply-To: <456FF38D.3070508@sendu.me.uk>
References: <000301c714b4$7846e790$15327e82@pyrimidine>
	<456FF38D.3070508@sendu.me.uk>
Message-ID: <6E434A6A-0EA4-4FD6-9DA1-0D5CF196AE36@uiuc.edu>


On Dec 1, 2006, at 3:19 AM, Sendu Bala wrote:

> You were pointing out the YAML issue? I think I'm less concerned  
> with that (solution: install YAML) and much more concerned with why  
> it can't reload ModuleBuildBioperl (claiming it isn't in @INC). The  
> module in question is in the same dir as the Build script, so it  
> should be found automatically.
>
> The only thing I can think of is that CPAN doesn't manage to chdir  
> to the directory. Hopefully I'll be able to reproduce this and then  
> I can investigate further.

My thought was that the two were related in some way.  I'm not sure,  
to tell the truth.

-chris


From bix at sendu.me.uk  Fri Dec  1 09:17:41 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 01 Dec 2006 14:17:41 +0000
Subject: [Bioperl-l] Bioperl 1.5.2 RC5 install
	onWinXPActivePerl	5.8.8.819
In-Reply-To: <10BC5C25-616F-44D5-8CA8-4BD4C3EF82D6@uiuc.edu>
References: <002401c714c6$53f65080$15327e82@pyrimidine>
	<456F500A.7010707@sheffield.ac.uk>
	<202B1F50-E905-46DE-9EB5-5F206AC04523@uiuc.edu>
	<456FF233.6040704@sendu.me.uk>
	<10BC5C25-616F-44D5-8CA8-4BD4C3EF82D6@uiuc.edu>
Message-ID: <45703985.5050203@sendu.me.uk>

Chris Fields wrote:
> 
> On Dec 1, 2006, at 3:13 AM, Sendu Bala wrote:
> 
>> Chris Fields wrote:
>>> I know that setting up the PPM is a pain, but I have to say it is 
>>> much faster, and all required PPMs are available.  Which makes me 
>>> curious: why bother with trying out a CPAN installation process at 
>>> this point, especially when you have to use PPM to install some of 
>>> the prereqs properly anyway?
>>
>> Firstly, problems discovered and resulting fixes will help all 
>> platforms, not just Windows. So thanks for trying it out and reporting 
>> back. Secondly, the PPM method, like Bundle::BioPerl, is 
>> all-or-nothing. The CPAN installation method allows an interactive 
>> choice of which optional things to install.
> 
> Yes, I understand that.  My point is, you are generally forced to use 
> PPM anyway due to several modules not installing properly (all the 
> 'trouble' distributions, like DB_File, are available via PPM).  I can 
> see using CPAN as an alternative way of installing Bioperl for a 
> distribution, or as the primary method via CVS or manually, but not for 
> distributions.  At least not until the kinks are worked out for Windows 
> users.

CPAN isn't being suggested as the primary or preferred installation 
method for Windows. That will still be PPM. I'm mentioning CPAN / manual 
installation in the Windows INSTALL docs for the benefit of anyone who 
wants a simple install and test environment when checking out from CVS.


> What are the significant issues for a bioperl PPM installation

None that I'm aware of - I just need to find the time to start looking 
into generating an appropriate PPD. Hopefully Nathan's wiki page on the 
subject will be all I need.


From bix at sendu.me.uk  Fri Dec  1 09:18:43 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 01 Dec 2006 14:18:43 +0000
Subject: [Bioperl-l] Bioperl 1.5.2 RC5 install on WinXP
	ActivePerl5.8.8.819
In-Reply-To: <6E434A6A-0EA4-4FD6-9DA1-0D5CF196AE36@uiuc.edu>
References: <000301c714b4$7846e790$15327e82@pyrimidine>
	<456FF38D.3070508@sendu.me.uk>
	<6E434A6A-0EA4-4FD6-9DA1-0D5CF196AE36@uiuc.edu>
Message-ID: <457039C3.30907@sendu.me.uk>

Chris Fields wrote:
> 
> On Dec 1, 2006, at 3:19 AM, Sendu Bala wrote:
> 
>> You were pointing out the YAML issue? I think I'm less concerned with 
>> that (solution: install YAML) and much more concerned with why it 
>> can't reload ModuleBuildBioperl (claiming it isn't in @INC). The 
>> module in question is in the same dir as the Build script, so it 
>> should be found automatically.
>>
>> The only thing I can think of is that CPAN doesn't manage to chdir to 
>> the directory. Hopefully I'll be able to reproduce this and then I can 
>> investigate further.
> 
> My thought was that the two were related in some way.  I'm not sure, to 
> tell the truth.

They weren't; using YAML is the fallback position in case of an earlier 
failure.

I've fixed it now in any case.

From gwu at molbio.mgh.harvard.edu  Fri Dec  1 10:19:42 2006
From: gwu at molbio.mgh.harvard.edu (gang wu)
Date: Fri, 01 Dec 2006 10:19:42 -0500
Subject: [Bioperl-l] One more load_seqdatabase.pl question
In-Reply-To: <70B28FBB-0250-4EB8-8775-CD0537369A3D@gmx.net>
References: <4a9ad8800611270907x64a4a4c0jad92bff6641e300@mail.gmail.com>	<53C6D534-6E36-4061-B955-E74537839265@gmx.net>	<456CA667.6010609@molbio.mgh.harvard.edu>
	<ED3F5F49-78A7-4E63-ACB8-5E8F745F0C34@gmx.net>
	<456F5648.6070207@molbio.mgh.harvard.edu>
	<70B28FBB-0250-4EB8-8775-CD0537369A3D@gmx.net>
Message-ID: <4570480E.1020701@molbio.mgh.harvard.edu>

Thanks Hilmar. I did include the -lookup switch on the command line. The 
warning messages show that the code tried to "INSERT" rather than 
"UPDATE", which suggests that a match was not found. But I was just 
loading the same GenBank file for the second time. To test whether it 
actually updated the records, I made a minor modification to one of the 
COMMENT features. Unfortunately, it was not updated. By the way, the test 
GenBank file has four "COMMENT" features, but they are all different. Any 
idea what's happening there?

I wonder if it's a bad idea to "UPDATE" a sequence at all. Say I get a new 
sequence version with 5 features removed, 5 features modified and 5 new 
features. If only --lookup is included, then according to the POD the 5 
new features will be inserted, the 5 modified features will be updated, 
and the 5 removed features will stay in the database untouched. This 
leaves the sequence record a mixture of old and new versions, and I 
can't see a reason anyone would want a sequence like this. 
Either include -remove to replace the old version if only one version is 
needed, or put the new version under a different namespace if multiple 
versions are needed. Do I understand these issues correctly?

I deeply appreciate your help.

Gang


Hilmar Lapp wrote:
> Right. You need to tell it to lookup sequences first if you know that 
> you are loading sequences which may be in the database already (see 
> the POD of load_seqdatabase.pl, switch --lookup; there are several 
> other command line options that control what will happen if a sequence 
> entry is already present in the database.).
>
> The messages in your report are warnings, not errors. It looks like 
> some of the comments are duplicated for a sequence; it doesn't look 
> like a reason for concern. It is not so good if you get errors thrown.
>
>     -hilmar
>
> On Nov 30, 2006, at 5:08 PM, gang wu wrote:
>
>> Thanks Hilmar. Do you mean the NVL() clause will make 
>> load_seqdatabase.pl not work when update?
>>
>> I have a problem with updating. It seems load_seqdatabase.pl only tries to 
>> insert instead of update. I used one of the test GenBank files that come 
>> with bioperl-db. Please take a look at the attached output.
>>
>> Thanks.
>>
>> Gang
>>
>> =========================================
>> >perl load_seqdatabase.pl -lookup -host elegans -driver Oracle 
>> -dbname sparc -dbuser biosqldb-sgowner -dbpass PASS -format genbank 
>> -namespace test 
>> /root/.cpan/build/bioperl-db-1.5.2-RC3/scripts/biosql/data/AP000868.gb
>> Loading 
>> /root/.cpan/build/bioperl-db-1.5.2-RC3/scripts/biosql/data/AP000868.gb 
>> ...
>>
>> -------------------- WARNING ---------------------
>> MSG: insert in Bio::DB::BioSQL::CommentAdaptor (driver) failed, 
>> values were ("This sequence was reannotated via the Ensembl system. 
>> Please visit the Ensembl web site, http://www.ensembl.org/ for more 
>> information. ","1") FKs (389109)
>> ORA-00001: unique constraint (BIOSQLDB_SGOWNER.XAK1COMMENT) violated 
>> (DBD ERROR: OCIStmtExecute)
>> ---------------------------------------------------
>>
>> -------------------- WARNING ---------------------
>> MSG: insert in Bio::DB::BioSQL::CommentAdaptor (driver) failed, 
>> values were ("The /gene indicates a unique id for a gene, /cds a 
>> unique id for a translation and a /exon a unique id for an exon. 
>> These ids are maintained wherever possible between versions. For more 
>> information on how to interpret the feature table, please visit 
>> http://www.ensembl.org/Docs/embl.html. ","2") FKs (389109)
>> ORA-00001: unique constraint (BIOSQLDB_SGOWNER.XAK1COMMENT) violated 
>> (DBD ERROR: OCIStmtExecute)
>> ---------------------------------------------------
>> ...
>> ...
>> ==========================================================
>> Hilmar Lapp wrote:
>>> These are the protein translations stored in the feature table as 
>>> tags of features, right? You can change the type of the column 
>>> (although there may be some issues when you update the column 
>>> because the NVL() clause won't work if I recall that correctly), but 
>>> doing so will deprive you of any 'normal' searches against that 
>>> column. (You can still use functions from the DBMS_LOB package, but 
>>> they will be much slower and are completely non-standard.) It is up 
>>> to you whether that is too big of a price to pay for having some 
>>> redundant protein translations (translating the feature's DNA 
>>> sequence should give you the same) in the database. I always trimmed 
>>> those feature tags off (using a custom SeqProcessor). An alternative 
>>> is to convert these feature tags into actual bioentries (i.e., 
>>> Bio::Seq objects; again, a custom SeqProcessor will allow you to do 
>>> that).
>>>
>>> -hilmar
>>>
>>> On Nov 28, 2006, at 4:13 PM, gang wu wrote:
>>>> Hi everyone,
>>>>
>>>> I'm using load_seqdatabase.pl to upload some Genbank genome 
>>>> sequences to my Oracle BioSQL database. I saw some 
>>>> errors(See attached warning message) related to 
>>>> seqfeature_qualifier_value (SG_SEQFEATURE_QUALIFIER_ASSOC.VALUE 
>>>> column), which has Varchar2 data type of maximum 4000 bytes. Did 
>>>> anybody mention this issue before? Should I just modify the column 
>>>> to a type being able store more data such as LONG or CLOB?
>>>>
>>>> Thanks.
>>>>
>>>> Gang
>>>>
>>>> Log information:
>>>> ============================================
>>>> load_seqdatabase.pl -host elegans -driver Oracle -dbname sparc 
>>>> -dbuser biosqldb-sgowner -dbpass PASS -format genbank -namespace 
>>>> genbank /genomeseq/arabidopsis//NC_003070.gbk
>>>> Loading /genomeseq/arabidopsis//NC_003070.gbk ...
>>>>
>>>> -------------------- WARNING ---------------------
>>>> MSG: SimpleValueAdaptor::add_assoc: 
>>>> unexpected failure of statement execution: ORA-01461: can bind a 
>>>> LONG value only for insert into a LONG column (DBD ERROR: error 
>>>> possibly near <*> indicator at char 12 in 'INSERT INTO 
>>>> <*>seqfeature_qualifier_value (fea_oid, trm_oid, value, rank) 
>>>> VALUES (:p1, :p2, :p3, :p4)') name: INSERT ASSOC [2] 
>>>> Bio::SeqFeature::Generic;Bio::Annotation::SimpleValue values: 
>>>> FK[Bio::SeqFeature::Generic]:14898, 
>>>> FK[Bio::Annotation::SimpleValue]:800, 
>>>> value:"MVAVTGEVLHLLRRYLGEYVHGLSTEALRISVWKGDVVLKDLKLKAEALNSLKLPVAVKSGFV 
>>>> GTITLKVPWKSLGKEPVIVLIDRVFVLAYPAPDDRTLKFFTLVGTEFAYTNYIPGGRQGKASRNQASADR 
>>>> GTSYFWLMELHGYEAETATLEARAKSKLGSPPQGNSWLGSIIATIIGNLKVSISNVHIRYEDSTRDSSEI 
>>>> LASFFSYFNNICSSNPGHPFAAGITLAKLAAVTMDEEGNETFDTSGALDKLRKSLQLERLALYHDSNSFP 
>>>> WEIEKQWDNITPEEWIEMFEDGIKEQTEHKIKSKWALNRHYLLSPINGSLKYHRLGNQERNNPEIPFERA 
>>>> SVILNDVNVTITEEQYHDWIKLVEVVSRYKTYIEISHLRPMVPVSEAPRLWWRFAAQASLQQKRLWYTRY 
>>>> IQLYANFLQQSSDVNYPEMREIEKDLDSKVILLWRLLAHAKVESVKSKEAAEQRKLKKGGWFSFNWRTEA 
>>>> EDDPEVDSVAGGSKLMEERLTKDEWKAINKLLSHQPDEEMNLYSGKDMQNMTHFLVTVSIGQGAARIVDI 
>>>> NQTEVLCGRFEQLDVTTKFRHRSTQCDVSLRFYGLSAPEGSLAQSVSSERKTNALMASFVNAPIGENIDW 
>>>> RLSATISPCHATIWTESYDRVLEFVKRSNAVSPTVALETAAVLQMKLEEVTRRAQEQLQIVLEEQSRFAL 
>>>> DIDIDAPKVRIPLRASGSSKCSSHFLLDFGNFTLTTMDTRSEEQRQNLYSRFCISGRDIAAFFTDCGSDN 
>>>> QGCSLVMEDFTNQPILSPILEKADNVYSLIDRCGMAVIVDQIKVPHPSYPSTRISIQVPNIGVHFSPTRY 
>>>> MRIMQLFDILYGAMKTYSQAPVDHMPDGIQPWSPTDLASDARILVWKGIGNSVATWQSCRLVLSGLYLYT 
>>>> FESEKSLDYQRYLCMAGRQVFEVPPANIGGSPYCLAVGVRGTDLKKALESSSTWIIEFQGEEKAAWLRGL 
>>>> VQATYQASA
>>>> PLSGDVLGQTSDGDGDFHEPQTRNMKAADLVITGALVETKLYLYGKIKNECDEQVEEVLLLKVLASGGKV 
>>>> HLISSESGLTVRTKLHSLKIKDELQQQQSGSAQYLAYSVLKNEDIQESLGTCDSFDKEMPVGHADDEDAY 
>>>> TDALPEFLSPTEPGTPDMDMIQCSMMMDSDEHVGLEDTEGGFHEKDTSQGKSLCDEVFYEVQGGEFSDFV 
>>>> SVVFLTRSSSSHDYNGIDTQMSIRMSKLEFFCSRPTVVALIGFGFDLSTASYIENDKDANTLVPEKSDSE 
>>>> KETNDESGRIEGLLGYGKDRVVFYLNMNVDNVTVFLNKEDGSQLAMFVQERFVLDIKVHPSSLSVEGTLG 
>>>> NFKLCDKSLDSGNCWSWLCDIRDPGVESLIKFKFSSYSAGDDDYEGYDYSLSGKLSAVRIVFLYRFVQEV 
>>>> TAYFMGLATPHSEEVIKLVDKVGGFEWLIQKDEMDGATAVKLDLSLDTPIIVVPRDSLSKDYIQLDLGQL 
>>>> EVSNEISWHGCPEKDATAVRVDVLHAKILGLNMSVGINGSIGKPMIREGQGLDIFVRRSLRDVFKKVPTL 
>>>> SVEVKIDFLHAVMSDKEYDIIVSCTSMNLFEEPKLPPDFRGSSSGPKAKMRLLADKVNLNSQMIMSRTVT 
>>>> ILAVDINYALLELRNSVNEESSLAHVAVRASEPNSSISWMTSLSETDLYVSVPKVSVLDIRPNTKPEMRL 
>>>> MLGSSVDASKQASSESLPFSLNKGSFKRANSRAVLDFDAPCSTMLLMDYRWRASSQSCVLRVQQPRILAV 
>>>> PDFLLAVGEFFVPALRAITGRDETLDPTNDPITRSRGIVLSEPLYKQTEDVVHLSPRRQLVADSLGIDEY 
>>>> TYDGCGKVISLSEQGEKDLNVGRLEPIIIVGHGKKLRFVNVKIKNGSLLSKCIYLSNDSSCLFSPEDGVD 
>>>> ISMLENASSNPENVLSNAHKSSDVSDTCQYDSKSGQSFTFEAQVVSPEFTFFDGTKSSLDDSSAVEKLLR 
>>>> VKLDFNFM
>>>> YASKEKDIWVRALLKNLVVETGSGLIILDPVDISGGYTSVKEKTNMSLTSTDIYMHLSLSALSLLLNLQS 
>>>> QVTGALQSGNAIPLASCTNFDRIWVSPKENGPRNNLTIWRPQAPSNYVILGDCVTSRAIPPTQAVMAVSN 
>>>> TYGRVRKPIGFNRIGLFSVIQGLEGDNVQHSHNSNECSLWMPVAPVGYTAMGCVANIGSEQPPDHIVYCL 
>>>> SIWRADNVLGAFYAHTSTAAPSKKYSPGLSHCLLWNPLQSKTSSSSDPSSTSGSRSEQSSDQTGNSSGWD 
>>>> ILRSISKATSYHVSTPNFERIWWDKGGDLRRPVSIWRPVPRPGFAILGDSITEGLEPPALGILFKADDSE 
>>>> IAAKPVQFNKVAHIVGKGFDEVFCWFPVAPPGYVSLGCVLSKFDEAPHVDSFCCPRIDLVNQANIYEASV 
>>>> TRSSSSKSSQLWSIWKVDNQACTFLARSDLKRPPSRMAFAVGESVKPKTQENVNAEIKLRCFSLTLLDGL 
>>>> HGMMTPLFDTTVTNIKLATHGRPEAMNAVLISSIAASTFNPQLEAWEPLLEPFDGIFKLETYDTALNQSS 
>>>> KPGKRLRIAATNILNINVSAANLETLGDAVVSWRRQLELEERAAKMKEESAASRESGDLSAFSALDEDDF 
>>>> QTIVVENKLGRDIYLKKLEENSDVVVKLCHDENTSVWVPPPRFSNRLNVADSSREARNYMTVQILEAKGL 
>>>> HIIDDGNSHSFFCTLRLVVDSQGAEPQKLFPQSARTKCVKPSTTIVNDLMECTSKWNELFIFEIPRKGVA 
>>>> RLEVEVTNLAAKAGKGEVVGSLSFPVGHGESTLRKVASVRMLHQSSDAENISSYTLQRKNAEDKHDNGCL 
>>>> LISTSYFEKTTIPNTLRNMESKDFVDGDTGFWIGVRPDDSWHSIRSLLPLCIAPKSLQNDFIAMEVSMRN 
>>>> GRKHATFRCLATVVNDSDVNLEISISSDQNVSSGVSNHNAVIASRSSYVLPWGCLSKDNEQCLHIRPKVE 
>>>> NSHHSYAWGYCIAVSSGCGKDQPFVDQGLLTRQNTIKQSSRASTFFLRLNQLEKKDMLFCCQPSTGSKPL 
>>>> WLSVGADAS
>>>> VLHTDLNTPVYDWKISISSPLKLENRLPCPVKFTVWEKTKEGTYLERQHGVVSSRKSAHVYSADIQRPVY 
>>>> LTLAVHGGWALEKDPIPVLDISSNDSVSSFWFVHQQSKRRLRVSIERDVGETGAAPKTIRFFVPYWITND 
>>>> SYLPLSYRVVEIEPSENVEAGSPCLTRASKSFKKNPVFSMERRHQKKNVRVLESIEDTSPMPSMLSPQES 
>>>> AGRSGVVLFPSQKDSYVSPRIGIAVAARDSDSYSPGISLLELEKKERIDVKAFCKDASYYMLSAVLNMTS 
>>>> DRTKVIHLQPHTLFINRVGVSICLQQCDCQTEEWINPSDPPKLFGWQSSTRLELLKLRVKGYRWSTPFSV 
>>>> FSEGTMRVPVPKEDGTDQLQLRVQVRSGTKNSRYEVIFRPNSISGPYRIENRSMFLPIRYRQVEGVSESW 
>>>> QFLPPNAAASFYWENLGRRHLFELLVDGNDPSNSEKFDIDKIGDYPPRSESGPTRPIRVTILKEDKKNIV 
>>>> RISDWMPAIEPTSSISRRLPASSLSELSGNESQQSHLLASEDSEFHVIVELAELGISVIDHAPEEILYMS 
>>>> VQNLFVAYSTGLGSGLSRFKLRMQGIQVDNQLPLAPMPVLFRPQRTGDKADYILKFSVTLQSNAGLDLRV 
>>>> YPYIDFQGRENTAFLINIHEPIIWRIHEMIQQANLSRLSDPNSTAVSVDPFIQIGVLNFSEVRFRVSMAM 
>>>> SPSQRPRGVLGFWSSLMTALGNTENMPVRISERFHENISMRQSTMINNAIRNVKKDLLGQPLQLLSGVDI 
>>>> LGNASSALGHMSQGIAALSMDKKFIQSRQRQENKGVEDFGDIIREGGGALAKGLFRGVTGILTKPLEGAK 
>>>> SSGVEGFVSGFGKGIIGAAAQPVSGVLDLLSKTTEGANAMRMKIAAAITSDEQLLRRRLPRAVGADSLLR 
>>>> PYNDYRAQGQVILQLAESGSFLGQVDLFKVRGKFALTDAYESHFILPKGKVLMITHRRVILLQQPSNIMG 
>>>> QRKFIPAK
>>>> DACSIQWDILWNDLVTMELSDGKKDPPNSPPSRLILYLKAKPHDPKEQFRVVKCIPNSKQAFDVYSAIDQ 
>>>> AINLYGQNALKGMVKNKVTRPYSPISESSWAEGASQQMPASVTPSSTFGTSPTTSSS", 
>>>> rank:"1"
>>>> --------------------------------------------------
>>>> =============================================
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
> --===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
>
>
>
>
>
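Hilmar's suggestion (quoted above) of trimming the long protein translations before they reach the database could look roughly like the sketch below. strip_feature_tags is a hypothetical helper that uses only get_SeqFeatures, has_tag and remove_tag. The assumption is that you would call it from the process_seq method of your own Bio::Seq::BaseSeqProcessor subclass and hand that class to load_seqdatabase.pl through its --pipeline option; check the script's POD for the exact syntax.

```perl
use strict;
use warnings;

# Remove the named qualifier tags (e.g. 'translation') from every
# top-level feature of a sequence object; returns how many tags were
# removed. strip_feature_tags is a hypothetical helper, not BioPerl API.
sub strip_feature_tags {
    my ($seq, @tags) = @_;
    my $removed = 0;
    foreach my $feat ( $seq->get_SeqFeatures ) {
        foreach my $tag (@tags) {
            next unless $feat->has_tag($tag);
            $feat->remove_tag($tag);
            $removed++;
        }
    }
    return $removed;
}
```

Calling strip_feature_tags($seq, 'translation') removes every translation qualifier, including the ones too long for the 4000-byte VARCHAR2 column; the trade-off is that you re-translate the CDS features whenever you need the protein.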


From bosborne11 at verizon.net  Fri Dec  1 09:55:18 2006
From: bosborne11 at verizon.net (Brian Osborne)
Date: Fri, 01 Dec 2006 09:55:18 -0500
Subject: [Bioperl-l] An announcement
Message-ID: <C195AC86.BB6A%bosborne11@verizon.net>

bioperl-l,

I would like to call your attention to a job posting, and in doing so I
realize that I'm probably breaking a rule of this list. I apologize and
acknowledge that I've transgressed. I'm doing this because it is
an interesting job that is relevant to a lot of what we do on this mailing
list, and some of you might want to consider it. The posting is here:

http://www.nescent.org/main/employment.html#gmodhelpdesk

I encourage you to pass this on to anyone who you think might be interested.

Thanks again,

Brian O.


From cjfields at uiuc.edu  Fri Dec  1 11:49:32 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 1 Dec 2006 10:49:32 -0600
Subject: [Bioperl-l] Bioperl 1.5.2 RC5 install
	onWinXPActivePerl	5.8.8.819
In-Reply-To: <456FF53E.90907@sheffield.ac.uk>
References: <002401c714c6$53f65080$15327e82@pyrimidine>
	<456F500A.7010707@sheffield.ac.uk>
	<202B1F50-E905-46DE-9EB5-5F206AC04523@uiuc.edu>
	<456FF233.6040704@sendu.me.uk> <456FF53E.90907@sheffield.ac.uk>
Message-ID: <D464535F-E70F-44B4-AD48-3CC79181869C@uiuc.edu>


On Dec 1, 2006, at 3:26 AM, Nathan S. Haigh wrote:
...
> In addition, using CPAN allows you to run the test suite easily  
> without the need to download it separately and run it after a PPM  
> install.

A PPM, by design, is supposed to imply that the distribution passes  
tests for the specified platform, at that point in time, after all  
prereqs are installed and any additional postinstall operations  
(install C libraries, modify config files, etc) are complete.  The  
ActiveState automated PPM building process dictates that; if it fails  
any test, it will not be made into a PPM.  It's sort of a 'stamp of  
approval' that all tests pass, so you don't need to run them.

However, a test may fail (and a PPM may then not get generated) for fairly  
superficial reasons, such as the Makefile not specifying that a  
module is needed, server issues, etc., so the automated process isn't  
foolproof.  That's why Kobes and the other repositories are  
available, where the PPM/PPD is manually generated and made to work  
specifically for Windows (or whatever other platform).

That said, it is completely up to the person packaging the  
distribution to follow those rules when making a PPM  
manually.  You don't even have to run tests prior to using 'nmake  
ppd'.  We can currently state, though, that all tests pass when all  
prereqs are installed for this distribution.  At least at this point  
in time!

> I don't know of a way to clean out ActivePerl - I use VMWare  
> Workstation and have a virtual machine with a fresh install of  
> WinXP and ActivePerl 5.8.8.819 - maybe someone else has ideas?

I haven't tried it that way.  I have Parallels on Mac OS X (I run a  
SigmaPlot/Excel combo off it).  My tests were using a native WinXP  
installation (i.e. not virtualized) on my old Dell.  It shouldn't make  
a difference; VMWare, Parallels, and the like should all run  
ActivePerl for WinXP since it's a virtual machine.  Windows Vista, on  
the other hand...

I think with PPM4 you can install to a custom directory.  It may be  
possible to install all new modules to that directory, then you would  
at least have an idea of what was there (though I don't think you can  
delete it directly w/o screwing up the PPM database).

chris


From bix at sendu.me.uk  Fri Dec  1 12:12:49 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 01 Dec 2006 17:12:49 +0000
Subject: [Bioperl-l] Error with supplied lineages importing uniprot data
In-Reply-To: <1348.130.49.222.58.1164925169.squirrel@webmail.cs.pitt.edu>
References: <1348.130.49.222.58.1164925169.squirrel@webmail.cs.pitt.edu>
Message-ID: <45706291.80201@sendu.me.uk>

pelikan at cs.pitt.edu wrote:
> Hello all,
> 
>  I'm running bioperl 1.5.2, bioperl-db 1.5.2 - RC005, under windows,
> without Cygwin. The "make test"s have all completed without error. This
> is my first time dealing with bioperl, so bear with me.
> 
>    I've successfully loaded the most recent taxonomy information using the
> biosql-schema scripts. After this, I attempted to load the most recent
> release of the uniprot flat file dataset with the following command:
> 
> load_seqdatabase.pl -drive mysql -dbname bioseqdb -dbuser root -dbpass
> ********* -format swiss -safe c:\data\uniprot\uniprot_sprot.dat
> 
> I am subsequently greeted by many of the following errors:
> 
> Could not store Q7N3Q6:

I extracted just Q7N3Q6 from 
ftp://ftp.expasy.org/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.dat.gz
and was able to load it in using load_seqdatabase.pl under linux with no 
errors. If you make a file with just that sequence do you still get the 
error?

Is anyone else able to reproduce the problem?

From cjfields at uiuc.edu  Fri Dec  1 12:57:18 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 1 Dec 2006 11:57:18 -0600
Subject: [Bioperl-l] Bioperl 1.5.2 RC5 installonWinXPActivePerl	5.8.8.819
In-Reply-To: <45703985.5050203@sendu.me.uk>
Message-ID: <006301c71572$24be8830$15327e82@pyrimidine>


> Chris Fields wrote:
> PPM).  I can 
> > see using CPAN as an alternative way of installing Bioperl for a 
> > distribution, or as the primary method via CVS or manually, but not 
> > for distributions.  At least not until the kinks are worked out for 
> > Windows users.
> 
> CPAN isn't being suggested as the primary or preferred 
> installation method for Windows. That will still be PPM. I'm 
> mentioning CPAN / manual installation in the Windows INSTALL 
> docs for the benefit of anyone who wants a simple install and 
> test environment when checking out from CVS.

That's fine by me.  I think the focus is making sure the PPM works, but that
shouldn't hold up the final 1.5.2 release.  The PPM for previous releases
was never released concurrently with the distribution (if at all); it
generally followed a final release by a few weeks to a few months.

> > What are the significant issues for a bioperl PPM installation
> 
> None that I'm aware of - I just need to find the time to 
> start looking into generating an appropriate PPD. Hopefully 
> Nathan's wiki page on the subject will be all I need.

I'll try testing it out today and next week (the more people we have looking
into the issue the better).  I'm sure that Module::Build hasn't been updated to
use PPM4 XML formatting, but the tags are similar enough.  I can always
create a local PPM database using a similar directory structure to
bioperl.org/DIST and test an installation from it.

chris


From n.haigh at sheffield.ac.uk  Fri Dec  1 13:52:55 2006
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Fri, 01 Dec 2006 18:52:55 +0000
Subject: [Bioperl-l] Bioperl 1.5.2 RC5 installonWinXPActivePerl	5.8.8.819
In-Reply-To: <006301c71572$24be8830$15327e82@pyrimidine>
References: <006301c71572$24be8830$15327e82@pyrimidine>
Message-ID: <45707A07.7000106@sheffield.ac.uk>

Chris Fields wrote:
>> Chris Fields wrote:
>> PPM).  I can 
>>     
>>> see using CPAN as an alternative way of installing Bioperl for a 
>>> distribution, or as the primary method via CVS or manually, but not 
>>> for distributions.  At least not until the kinks are worked out for 
>>> Windows users.
>>>       
>> CPAN isn't being suggested as the primary or preferred 
>> installation method for Windows. That will still be PPM. I'm 
>> mentioning CPAN / manual installation in the Windows INSTALL 
>> docs for the benefit of anyone who wants a simple install and 
>> test environment when checking out from CVS.
>>     
>
> That's fine by me.  I think the focus is making sure the PPM works, but that
> shouldn't hold up the final 1.5.2 release.  The PPM for previous releases
> was never released concurrently with the distribution (if at all); it
> generally followed by a few weeks to a few months past a final release.
>
>   
>>> What are the significant issues for a bioperl PPM installation
>>>       
>> None that I'm aware of - I just need to find the time to 
>> start looking into generating an appropriate PPD. Hopefully 
>> Nathan's wiki page on the subject will be all I need.
>>     
>
> I'll try testing it out today and next week (the more people we have looking
> into the issue the better).  I'm sure that Module::Build hasn't updated to
> using PPM4 XML formatting, but the tags are similar enough.  I can always
> create a local PPM database using a similar directory structure to
> bioperl.org/DIST and test an installation from it.
>
> chris
>   

To clarify a few things about PPM4 XML and to highlight the main 
differences:

1) The use of PROVIDE and REQUIRE tags.
2) PPM4 XML "should" contain PROVIDE tags for ALL bioperl modules.
3) The VERSION in PROVIDE and REQUIRE tags should be a float, not a
comma-separated tuple as in PPM3 XML.
4) The VERSION in PROVIDE and REQUIRE is used internally for version
comparisons when upgrading, resolving prereqs, etc.
5) Module names should all contain '::', either natively according to
their namespace or, if they don't have one natively, appended to the
end, e.g. "GD::".
6) The VERSION in the SOFTPKG tag is for human readability only.
7) The NAME in SOFTPKG is used to identify which packages are actually
the same.
Nath




From bix at sendu.me.uk  Fri Dec  1 13:52:44 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 01 Dec 2006 18:52:44 +0000
Subject: [Bioperl-l] Error with supplied lineages importing uniprot data
In-Reply-To: <45706291.80201@sendu.me.uk>
References: <1348.130.49.222.58.1164925169.squirrel@webmail.cs.pitt.edu>
	<45706291.80201@sendu.me.uk>
Message-ID: <457079FC.7010209@sendu.me.uk>

Sendu Bala wrote:
> pelikan at cs.pitt.edu wrote:
[snip]
>> load_seqdatabase.pl -drive mysql -dbname bioseqdb -dbuser root -dbpass
>> ********* -format swiss -safe c:\data\uniprot\uniprot_sprot.dat
>>
>> I am subsequently greeted by many of the following errors:
>>
>> Could not store Q7N3Q6:
> 
> I extracted just Q7N3Q6 from 
> ftp://ftp.expasy.org/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.dat.gz
> and was able to load it in using load_seqdatabase.pl under linux with no 
> errors. If you make a file with just that sequence do you still get the 
> error?
> 
> Is anyone else able to reproduce the problem?

In fact, if I just try to load it again, I can reproduce the problem.
The situation is similar to http://bugzilla.bioperl.org/show_bug.cgi?id=2092

And I have a tentative fix that extends Brian's fix there. Committed to 
HEAD only atm. I don't know anything about bioperl-db and don't have the 
faintest clue why this is happening, nor the time to figure it out. Can 
someone please have a proper look at this and decide if my fix is sane?

All I can say is that the test suites for bioperl-live and bioperl-db 
continue to pass, but that isn't really saying much.


PS. Having used load_seqdatabase.pl to load a sequence, how do I remove 
it afterwards?

From cjfields at uiuc.edu  Fri Dec  1 14:00:13 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 1 Dec 2006 13:00:13 -0600
Subject: [Bioperl-l] Error with supplied lineages importing uniprot data
In-Reply-To: <45706291.80201@sendu.me.uk>
References: <1348.130.49.222.58.1164925169.squirrel@webmail.cs.pitt.edu>
	<45706291.80201@sendu.me.uk>
Message-ID: <EAE311A7-DB66-4CFC-9598-EA6FCAED9B7F@uiuc.edu>


On Dec 1, 2006, at 11:12 AM, Sendu Bala wrote:

> pelikan at cs.pitt.edu wrote:
>> Hello all,
>>
>>  I'm running bioperl 1.5.2, bioperl-db 1.5.2 - RC005, under windows,
>> without Cygwin. The "make test"s have all completed without error. This
>> is my first time dealing with bioperl, so bear with me.
>>
>>    I've successfully loaded the most recent taxonomy information using the
>> biosql-schema scripts. After this, I attempted to load the most recent
>> release of the uniprot flat file dataset with the following command:
>>
>> load_seqdatabase.pl -drive mysql -dbname bioseqdb -dbuser root -dbpass
>> ********* -format swiss -safe c:\data\uniprot\uniprot_sprot.dat
>>
>> I am subsequently greeted by many of the following errors:
>>
>> Could not store Q7N3Q6:
>
> I extracted just Q7N3Q6 from
> ftp://ftp.expasy.org/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.dat.gz
> and was able to load it in using load_seqdatabase.pl under linux with no
> errors. If you make a file with just that sequence do you still get the
> error?
>
> Is anyone else able to reproduce the problem?

I can reproduce on both WinXP and Mac OS X using the latest
bioperl-db/bioperl-live and a BioSQL database preloaded with taxonomy.
Notably, the bug doesn't show up with a database lacking taxonomy,
where no lookup is used (I guess).

Here's some overly verbose debugging (apologies):

Loading saved.flat ...
attempting to load adaptor class for Bio::Seq::RichSeq
	attempting to load module Bio::DB::BioSQL::RichSeqAdaptor
attempting to load adaptor class for Bio::Seq
	attempting to load module Bio::DB::BioSQL::SeqAdaptor
instantiating adaptor class Bio::DB::BioSQL::SeqAdaptor
attempting to load adaptor class for Bio::Species
	attempting to load module Bio::DB::BioSQL::SpeciesAdaptor
instantiating adaptor class Bio::DB::BioSQL::SpeciesAdaptor
attempting to load adaptor class for Bio::Tree::Tree
	attempting to load module Bio::DB::BioSQL::TreeAdaptor
attempting to load adaptor class for Bio::Root::Root
	attempting to load module Bio::DB::BioSQL::RootAdaptor
attempting to load adaptor class for Bio::Root::RootI
	attempting to load module Bio::DB::BioSQL::RootIAdaptor
	attempting to load module Bio::DB::BioSQL::RootAdaptor
attempting to load adaptor class for Bio::Tree::TreeI
	attempting to load module Bio::DB::BioSQL::TreeIAdaptor
	attempting to load module Bio::DB::BioSQL::TreeAdaptor
attempting to load adaptor class for Bio::Tree::NodeI
	attempting to load module Bio::DB::BioSQL::NodeIAdaptor
	attempting to load module Bio::DB::BioSQL::NodeAdaptor
attempting to load adaptor class for Bio::Tree::TreeFunctionsI
	attempting to load module Bio::DB::BioSQL::TreeFunctionsIAdaptor
	attempting to load module Bio::DB::BioSQL::TreeFunctionsAdaptor
no adaptor found for class Bio::Tree::Tree
attempting to load adaptor class for Bio::DB::Taxonomy::list
	attempting to load module Bio::DB::BioSQL::listAdaptor
attempting to load adaptor class for Bio::DB::Taxonomy
	attempting to load module Bio::DB::BioSQL::TaxonomyAdaptor
no adaptor found for class Bio::DB::Taxonomy::list
attempting to load adaptor class for Bio::Annotation::Collection
	attempting to load module Bio::DB::BioSQL::CollectionAdaptor
attempting to load adaptor class for Bio::AnnotationCollectionI
	attempting to load module Bio::DB::BioSQL::AnnotationCollectionIAdaptor
	attempting to load module Bio::DB::BioSQL::AnnotationCollectionAdaptor
instantiating adaptor class Bio::DB::BioSQL::AnnotationCollectionAdaptor
attempting to load adaptor class for Bio::Annotation::TypeManager
	attempting to load module Bio::DB::BioSQL::TypeManagerAdaptor
no adaptor found for class Bio::Annotation::TypeManager
attempting to load adaptor class for Bio::Annotation::SimpleValue
	attempting to load module Bio::DB::BioSQL::SimpleValueAdaptor
instantiating adaptor class Bio::DB::BioSQL::SimpleValueAdaptor
attempting to load adaptor class for Bio::Annotation::Reference
	attempting to load module Bio::DB::BioSQL::ReferenceAdaptor
instantiating adaptor class Bio::DB::BioSQL::ReferenceAdaptor
attempting to load adaptor class for Bio::Annotation::Comment
	attempting to load module Bio::DB::BioSQL::CommentAdaptor
instantiating adaptor class Bio::DB::BioSQL::CommentAdaptor
attempting to load adaptor class for Bio::Annotation::DBLink
	attempting to load module Bio::DB::BioSQL::DBLinkAdaptor
instantiating adaptor class Bio::DB::BioSQL::DBLinkAdaptor
attempting to load adaptor class for Bio::PrimarySeq
	attempting to load module Bio::DB::BioSQL::PrimarySeqAdaptor
instantiating adaptor class Bio::DB::BioSQL::PrimarySeqAdaptor
attempting to load adaptor class for Bio::SeqFeature::Generic
	attempting to load module Bio::DB::BioSQL::GenericAdaptor
attempting to load adaptor class for Bio::SeqFeatureI
	attempting to load module Bio::DB::BioSQL::SeqFeatureIAdaptor
	attempting to load module Bio::DB::BioSQL::SeqFeatureAdaptor
instantiating adaptor class Bio::DB::BioSQL::SeqFeatureAdaptor
attempting to load adaptor class for Bio::Location::Simple
	attempting to load module Bio::DB::BioSQL::SimpleAdaptor
attempting to load adaptor class for Bio::Location::Atomic
	attempting to load module Bio::DB::BioSQL::AtomicAdaptor
attempting to load adaptor class for Bio::LocationI
	attempting to load module Bio::DB::BioSQL::LocationIAdaptor
	attempting to load module Bio::DB::BioSQL::LocationAdaptor
instantiating adaptor class Bio::DB::BioSQL::LocationAdaptor
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Tree::Tree
no adaptor found for class Bio::DB::Taxonomy::list
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
attempting to load adaptor class for BioNamespace
	attempting to load module Bio::DB::BioSQL::BioNamespaceAdaptor
instantiating adaptor class Bio::DB::BioSQL::BioNamespaceAdaptor
no adaptor found for class Bio::Tree::Tree
no adaptor found for class Bio::DB::Taxonomy::list
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
attempting to load driver for adaptor class Bio::DB::BioSQL::BioNamespaceAdaptor
attempting to load driver for adaptor class Bio::DB::BioSQL::BasePersistenceAdaptor
Using Bio::DB::BioSQL::mysql::BasePersistenceAdaptorDriver as driver peer for Bio::DB::BioSQL::BioNamespaceAdaptor
preparing UK select statement: SELECT biodatabase.biodatabase_id, biodatabase.name, biodatabase.authority FROM biodatabase WHERE name = ?
BioNamespaceAdaptor: binding UK column 1 to "Swiss-Prot" (namespace)
preparing INSERT statement: INSERT INTO biodatabase (name, authority) VALUES (?, ?)
BioNamespaceAdaptor::insert: binding column 1 to "Swiss-Prot" (namespace)
BioNamespaceAdaptor::insert: binding column 2 to "" (authority)
no adaptor found for class Bio::Tree::Tree
no adaptor found for class Bio::DB::Taxonomy::list
attempting to load driver for adaptor class Bio::DB::BioSQL::SpeciesAdaptor
Using Bio::DB::BioSQL::mysql::SpeciesAdaptorDriver as driver peer for Bio::DB::BioSQL::SpeciesAdaptor
preparing UK select statement: SELECT taxon_name.taxon_id, NULL, NULL, taxon.ncbi_taxon_id, taxon_name.name, NULL FROM taxon, taxon_name WHERE taxon.taxon_id = taxon_name.taxon_id AND name_class = ? AND ncbi_taxon_id = ?
SpeciesAdaptor: binding UK column 1 to "scientific name" (name_class)
SpeciesAdaptor: binding UK column 2 to "141679" (ncbi_taxid)
prepare SELECT CLASSIFICATION: SELECT name.name, node.node_rank FROM taxon node, taxon taxon, taxon_name name WHERE name.taxon_id = node.taxon_id AND taxon.left_value BETWEEN node.left_value AND node.right_value AND taxon.taxon_id = ? AND name.name_class = 'scientific name' ORDER BY node.left_value
attempting to load driver for adaptor class Bio::DB::BioSQL::SeqAdaptor
attempting to load driver for adaptor class Bio::DB::BioSQL::PrimarySeqAdaptor
attempting to load driver for adaptor class Bio::DB::BioSQL::BasePersistenceAdaptor
Using Bio::DB::BioSQL::mysql::BasePersistenceAdaptorDriver as driver peer for Bio::DB::BioSQL::SeqAdaptor
Could not store Q7N3Q6:
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: The supplied lineage does not start near 'Photorhabdus luminescens subsp. laumondii'
STACK: Error::throw
STACK: Bio::Root::Root::throw /Users/cjfields/src/bioperl-live/Bio/Root/Root.pm:359
STACK: Bio::Species::classification /Users/cjfields/src/bioperl-live/Bio/Species.pm:166
STACK: Bio::DB::Persistent::PersistentObject::AUTOLOAD /Library/Perl/5.8.6/Bio/DB/Persistent/PersistentObject.pm:552
STACK: Bio::DB::BioSQL::SpeciesAdaptor::populate_from_row /Library/Perl/5.8.6/Bio/DB/BioSQL/SpeciesAdaptor.pm:281
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::_build_object /Library/Perl/5.8.6/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:1305
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key /Library/Perl/5.8.6/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:973
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key /Library/Perl/5.8.6/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:852
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /Library/Perl/5.8.6/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:182
STACK: Bio::DB::Persistent::PersistentObject::create /Library/Perl/5.8.6/Bio/DB/Persistent/PersistentObject.pm:244
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /Library/Perl/5.8.6/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:169
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store /Library/Perl/5.8.6/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251
STACK: Bio::DB::Persistent::PersistentObject::store /Library/Perl/5.8.6/Bio/DB/Persistent/PersistentObject.pm:271
STACK: load_seqdatabase.pl:620
-----------------------------------------------------------

at load_seqdatabase.pl line 633


chris

From cjfields at uiuc.edu  Fri Dec  1 14:01:59 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 1 Dec 2006 13:01:59 -0600
Subject: [Bioperl-l] Bioperl 1.5.2 RC5 installonWinXPActivePerl	5.8.8.819
In-Reply-To: <45707A07.7000106@sheffield.ac.uk>
References: <006301c71572$24be8830$15327e82@pyrimidine>
	<45707A07.7000106@sheffield.ac.uk>
Message-ID: <C233572F-BD36-4DBE-BE9B-2C097F4C939B@uiuc.edu>


On Dec 1, 2006, at 12:52 PM, Nathan S. Haigh wrote:

> Chris Fields wrote:
>>> Chris Fields wrote:
>>> PPM).  I can
>>>> see using CPAN as an alternative way of installing Bioperl for a  
>>>> distribution, or as the primary method via CVS or manually, but  
>>>> not for distributions.  At least not until the kinks are worked  
>>>> out for Windows users.
>>>>
>>> CPAN isn't being suggested as the primary or preferred  
>>> installation method for Windows. That will still be PPM. I'm  
>>> mentioning CPAN / manual installation in the Windows INSTALL docs  
>>> for the benefit of anyone who wants a simple install and test  
>>> environment when checking out from CVS.
>>>
>>
>> That's fine by me.  I think the focus is making sure the PPM works, but that
>> shouldn't hold up the final 1.5.2 release.  The PPM for previous releases
>> was never released concurrently with the distribution (if at all); it
>> generally followed by a few weeks to a few months past a final release.
>>
>>
>>>> What are the significant issues for a bioperl PPM installation
>>>>
>>> None that I'm aware of - I just need to find the time to start
>>> looking into generating an appropriate PPD. Hopefully Nathan's
>>> wiki page on the subject will be all I need.
>>>
>>
>> I'll try testing it out today and next week (the more people we have looking
>> into the issue the better).  I'm sure that Module::Build hasn't updated to
>> using PPM4 XML formatting, but the tags are similar enough.  I can always
>> create a local PPM database using a similar directory structure to
>> bioperl.org/DIST and test an installation from it.
>>
>> chris
>>
>
> To clarify a few things about PPM4 XML and to highlight the main  
> differences:
>
> 1) The use of PROVIDE and REQUIRE tags
> 2) PPM4 XML "should" contain PROVIDE tags for ALL bioperl modules.
> 3) VERSION in PROVIDE and REQUIRE tags should be floats, not comma  
> separated tuples like PPM3 XML
> 4) the VERSION in PROVIDE and REQUIRE are used internally to do  
> version comparisons for upgrades and solving prereqs etc
> 5) Module names should all contain '::' either natively according  
> their namespace, if it doesn't have one natively, then one is  
> appended to the end e.g. "GD::"
> 6) the VERSION in the SOFTPKG key is for human readability only
> 7) the NAME in SOFTPKG is used to identify which packages are  
> actually the same.
>
> Nath

Okay.  Maybe place this in the wiki (PPM page) for future reference?

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From n.haigh at sheffield.ac.uk  Fri Dec  1 14:05:38 2006
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Fri, 01 Dec 2006 19:05:38 +0000
Subject: [Bioperl-l] Bioperl 1.5.2 RC5 installonWinXPActivePerl	5.8.8.819
In-Reply-To: <006301c71572$24be8830$15327e82@pyrimidine>
References: <006301c71572$24be8830$15327e82@pyrimidine>
Message-ID: <45707D02.9070504@sheffield.ac.uk>

Chris Fields wrote:
>> Chris Fields wrote:
>> PPM).  I can 
>>     
>>> see using CPAN as an alternative way of installing Bioperl for a 
>>> distribution, or as the primary method via CVS or manually, but not 
>>> for distributions.  At least not until the kinks are worked out for 
>>> Windows users.
>>>       
>> CPAN isn't being suggested as the primary or preferred 
>> installation method for Windows. That will still be PPM. I'm 
>> mentioning CPAN / manual installation in the Windows INSTALL 
>> docs for the benefit of anyone who wants a simple install and 
>> test environment when checking out from CVS.
>>     
>
> That's fine by me.  I think the focus is making sure the PPM works, but that
> shouldn't hold up the final 1.5.2 release.  The PPM for previous releases
> was never released concurrently with the distribution (if at all); it
> generally followed by a few weeks to a few months past a final release.
>
>   

Forgot to say, one really annoying thing about PPM is that it seems to 
display all the versions of Bioperl defined in the XML file. In 
addition, I think a bug in PPM4 means that if a package is available in 
ActiveState's repo, PPM4 always wants to install it rather than a more 
recent version in a different repo (this includes upgrades). This 
results in the following annoying behaviour:
1) If both the ActiveState and bioperl repos are active, searching for 
bioperl lists several versions.
2) If you are using the PPM4 GUI and have installed a non-ActiveState 
version, it says you can upgrade to the version in ActiveState's 
repo (even if it's actually a downgrade).
3) Using ppm-shell, if you choose "install bioperl" or "upgrade bioperl", 
it will always install the version in the ActiveState repo.
4) I'm sure there are also some other annoyances.

In the end, this means the best way to install and upgrade bioperl is to 
search for bioperl packages and install the latest version by eye rather 
than relying on the "upgrade" feature (at least for the time being). You 
may also need to remove an old version of bioperl before installing a 
more recent version. NOTE: using "upgrade" runs the risk of installing 
bioperl 1.2.3 from ActiveState and not the latest version in any other repo!

I'll update the wiki when I have time.
Nath


>>> What are the significant issues for a bioperl PPM installation
>>>       
>> None that I'm aware of - I just need to find the time to 
>> start looking into generating an appropriate PPD. Hopefully 
>> Nathan's wiki page on the subject will be all I need.
>>     
>
> I'll try testing it out today and next week (the more people we have looking
> into the issue the better).  I'm sure that Module::Build hasn't updated to
> using PPM4 XML formatting, but the tags are similar enough.  I can always
> create a local PPM database using a similar directory structure to
> bioperl.org/DIST and test an installation from it.
>
> chris
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>




From cjfields at uiuc.edu  Fri Dec  1 14:06:53 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 1 Dec 2006 13:06:53 -0600
Subject: [Bioperl-l] Error with supplied lineages importing uniprot data
In-Reply-To: <45706291.80201@sendu.me.uk>
References: <1348.130.49.222.58.1164925169.squirrel@webmail.cs.pitt.edu>
	<45706291.80201@sendu.me.uk>
Message-ID: <0B67001A-9642-422E-A9FB-C9611004510E@uiuc.edu>


On Dec 1, 2006, at 11:12 AM, Sendu Bala wrote:

> pelikan at cs.pitt.edu wrote:
>> Hello all,
>>
>>  I'm running bioperl 1.5.2, bioperl-db 1.5.2 - RC005, under windows,
>> without Cygwin. The "make test"s have all completed without error. This
>> is my first time dealing with bioperl, so bear with me.
>>
>>    I've successfully loaded the most recent taxonomy information using the
>> biosql-schema scripts. After this, I attempted to load the most recent
>> release of the uniprot flat file dataset with the following command:
>>
>> load_seqdatabase.pl -drive mysql -dbname bioseqdb -dbuser root -dbpass
>> ********* -format swiss -safe c:\data\uniprot\uniprot_sprot.dat
>>
>> I am subsequently greeted by many of the following errors:
>>
>> Could not store Q7N3Q6:
>
> I extracted just Q7N3Q6 from
> ftp://ftp.expasy.org/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.dat.gz
> and was able to load it in using load_seqdatabase.pl under linux with no
> errors. If you make a file with just that sequence do you still get the
> error?
>
> Is anyone else able to reproduce the problem?

Okay, just updated to get your latest CVS fixes for bioperl-live and  
it passes now for both Mac OS X and WinXP.

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Fri Dec  1 14:09:15 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 1 Dec 2006 13:09:15 -0600
Subject: [Bioperl-l] Error with supplied lineages importing uniprot data
In-Reply-To: <457079FC.7010209@sendu.me.uk>
References: <1348.130.49.222.58.1164925169.squirrel@webmail.cs.pitt.edu>
	<45706291.80201@sendu.me.uk> <457079FC.7010209@sendu.me.uk>
Message-ID: <A85B86B9-3DCD-4855-AC06-675D19E3689E@uiuc.edu>


On Dec 1, 2006, at 12:52 PM, Sendu Bala wrote:

>
> PS. having used load_seqdatabase.pl to load a sequence, how do I remove
> it afterwards?

There's not much documentation on it, but it is demonstrated several  
times in the test suite.

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From bix at sendu.me.uk  Fri Dec  1 14:39:17 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 01 Dec 2006 19:39:17 +0000
Subject: [Bioperl-l] Error with supplied lineages importing uniprot data
In-Reply-To: <0B67001A-9642-422E-A9FB-C9611004510E@uiuc.edu>
References: <1348.130.49.222.58.1164925169.squirrel@webmail.cs.pitt.edu>
	<45706291.80201@sendu.me.uk>
	<0B67001A-9642-422E-A9FB-C9611004510E@uiuc.edu>
Message-ID: <457084E5.2050300@sendu.me.uk>

Chris Fields wrote:
> 
> On Dec 1, 2006, at 11:12 AM, Sendu Bala wrote:
> 
>> pelikan at cs.pitt.edu wrote:
>>> Hello all,
>>>
>>>  I'm running bioperl 1.5.2, bioperl-db 1.5.2 - RC005, under windows,
>>> without Cygwin. The "make test"s have all completed without error. This
>>> is my first time dealing with bioperl, so bear with me.
>>>
>>>    I've successfully loaded the most recent taxonomy information using the
>>> biosql-schema scripts. After this, I attempted to load the most recent
>>> release of the uniprot flat file dataset with the following command:
>>>
>>> load_seqdatabase.pl -drive mysql -dbname bioseqdb -dbuser root -dbpass
>>> ********* -format swiss -safe c:\data\uniprot\uniprot_sprot.dat
>>>
>>> I am subsequently greeted by many of the following errors:
>>>
>>> Could not store Q7N3Q6:
>>
>> I extracted just Q7N3Q6 from
>> ftp://ftp.expasy.org/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.dat.gz
>> and was able to load it in using load_seqdatabase.pl under linux with no
>> errors. If you make a file with just that sequence do you still get the
>> error?
>>
>> Is anyone else able to reproduce the problem?
> 
> Okay, just updated to get your latest CVS fixes for bioperl-live and it 
> passes now for both Mac OS X and WinXP.

Can you confirm if it is actually working correctly though? Like, having 
stored a previously problematic sequence, can you get it back out from the 
database and is its ->species() correct?

From cjfields at uiuc.edu  Fri Dec  1 14:52:13 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 1 Dec 2006 13:52:13 -0600
Subject: [Bioperl-l] Error with supplied lineages importing uniprot data
In-Reply-To: <457084E5.2050300@sendu.me.uk>
Message-ID: <000001c71582$329d4d50$15327e82@pyrimidine>

> > 
> > Okay, just updated to get your latest CVS fixes for bioperl-live and
> > it passes now for both Mac OS X and WinXP.
> 
> Can you confirm if it is actually working correctly though? 
> Like, having stored a previously-problem sequence, can you 
> get it back out from the database and is its ->species() correct?

I would assume so, if we can trust the species tests.  I will have to try it
again over the weekend.  I planned on loading a ton of protein sequences in
anyway, most of which are bacterial; if anything breaks it will probably be
with those.

I think Jason and Hilmar were going to get together about the BioSQL paper
at the hackathon.  That may be a good place to bring up some of the species
issues, if they persist.

chris


From hlapp at gmx.net  Fri Dec  1 20:42:05 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri, 1 Dec 2006 20:42:05 -0500
Subject: [Bioperl-l] Error with supplied lineages importing uniprot data
In-Reply-To: <457079FC.7010209@sendu.me.uk>
References: <1348.130.49.222.58.1164925169.squirrel@webmail.cs.pitt.edu>
	<45706291.80201@sendu.me.uk> <457079FC.7010209@sendu.me.uk>
Message-ID: <8414723F-BA02-4936-8F53-781276C3B526@gmx.net>

Either using SQL:

	-- theoretically you should convince yourself first that there
	-- is only one such record (the UK is over acc,version,namespace)
	SQL> DELETE FROM bioentry WHERE accession = 'Q7N3Q6';

or through bioperl-db (see the delete test for examples):

	my $db = Bio::DB::BioDB->new(....);
	my $seq = Bio::PrimarySeq->new(-accession_number=>'Q7N3Q6',
	                               -namespace=>'whatever you used when loading');
	my $adp = $db->get_persistence_adaptor($seq);
	my $pseq = $adp->find_by_unique_key($seq);
	$pseq->remove();
	$pseq->commit();

-hilmar

On Dec 1, 2006, at 1:52 PM, Sendu Bala wrote:

> PS. having used load_seqdatabase.pl to load a sequence, how do I remove
> it afterwards?

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From chhalling at verizon.net  Sun Dec  3 20:56:51 2006
From: chhalling at verizon.net (Conrad Halling)
Date: Sun, 03 Dec 2006 20:56:51 -0500
Subject: [Bioperl-l] BioPerl Wiki is down
Message-ID: <45738063.1070504@verizon.net>

When I attempted to navigate to http://www.bioperl.org/, I got the 
following message:

A database query syntax error has occurred. This may indicate a bug in 
the software. The last attempted database query was:

    (SQL query hidden)

from within function "MediaWikiBagOStuff::_doquery". MySQL returned 
error "1205: Lock wait timeout exceeded; try restarting transaction 
(localhost)".

-- 
Conrad Halling
chhalling at verizon.net


From rbirnie at totalise.co.uk  Sun Dec  3 16:38:02 2006
From: rbirnie at totalise.co.uk (richard)
Date: Sun, 3 Dec 2006 21:38:02 +0000
Subject: [Bioperl-l] confused by Bio::Graphics
Message-ID: <200612032138.02522.rbirnie@totalise.co.uk>

Hi all,

I'm having a little trouble getting Bio::Graphics to give me the correct 
output and I'm looking for some help. I am trying to extend from example 5 of 
the Graphics HOWTO on the bioperl wiki using version 1.4 of Bioperl. 
Eventually I intend the script to follow example 6 but I thought I'd try the 
simpler version first.

The basic aim of the script is that it takes as input a file containing a list 
of GenBank IDs plus some other info for alternative transcripts of a gene. 
This information is stored in a hash and the GenBank IDs are used to retrieve 
the appropriate entries from GenBank. I then want to use Bio::Graphics to 
generate a figure from the feature tables showing the CDSs from the 
alternative transcripts. 
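The first step described here, reading the input file into a hash, could look something like the sketch below. It assumes the tab-separated layout of the attached input file (a header row naming the columns, then one transcript per line); the helper name read_transcripts is hypothetical, and the GenBank features would be added to each entry afterwards.

```perl
use strict;
use warnings;

# Hypothetical helper: read a tab-separated transcript list whose first line
# names the columns (sequence_ID, Exon_Boundary, ...) into a hash of hashes
# keyed by sequence_ID, e.g. $main->{'NM_006017'}{'Assay_location'}.
sub read_transcripts {
    my ($fh) = @_;
    my %main;
    my $header = <$fh>;
    return \%main unless defined $header;
    $header =~ s/\r?\n\z//;                 # strip Unix or DOS line ending
    my @columns = split /\t/, $header;

    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;
        next unless $line =~ /\S/;          # ignore blank lines
        my %row;
        @row{@columns} = split /\t/, $line;
        $main{ $row{sequence_ID} } = \%row; # assumes a sequence_ID column
    }
    return \%main;
}
```

Keeping one hash per transcript leaves room to hang the retrieved features off the same entry later, as the drawing loop below does with $main{$alt_trans}{'features'}.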

So far I have managed to retrieve the GenBank entries, extract the feature
tables and store a reference to these in the hash mentioned above. I've also
got Bio::Graphics to draw a basic image but some of the details aren't right 
and I don't understand why. I have attached the code I have so far, the input 
file and the output image to this mail. I didn't want to display it all in 
the main message but I'm not actually sure which bit is causing the problem. 
The code is very rough and in need of polishing but I need to get it to work 
correctly first.

These are the problems:
1) As I understand it this:

my $wholeseq = Bio::SeqFeature::Generic->new (
		-start => 1,
		-end => $refseq->length,
		-display_name =>$refseq->display_name
		);

should display the name of the gene (CD133/Prominin1) near the top of the image.
It doesn't; am I misunderstanding or is there an error in the code?

2) In the quoted example the CDS is broken up into smaller regions which are
then linked together in example 6. This isn't happening in my code, and I
think it should be: I get one solid block for the CDS. I don't understand why,
because I'm not clear which parts of the feature table are used to
define where the CDS should be split. I think this is the relevant bit of
code:

foreach my $alt_trans (keys %main) {
	foreach my $tag (keys %{ $main{$alt_trans}{'features'} }) {

		my $feature = $main{$alt_trans}{'features'}{$tag};

		$panel->add_track($feature,
				-glyph => 'generic',
				-bgcolor => $colors[$idx++ % @colors],
				-fgcolor => 'black',
				-font2color => 'black',
				-key => $alt_trans,
				-bump => +1,
				-height => 8,
				-label => 1,
				-description => 1,
				) if ($tag eq 'CDS');

	}
}

Can anyone tell me what I am doing wrong?

RefSeq entry for the gene of interest is here:
http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=5174386
If I understand correctly the example file used in the HOWTO is this gene:
http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=116805320

Final question: does bioperl come with example scripts, and if so, where would
they normally be found on a Linux system?

If anyone is still reading this, thanks for your patience. Any clarification
will be appreciated.

regards,
Richard
-------------- next part --------------
A non-text attachment was scrubbed...
Name: CD133_graphic_code
Type: application/x-perl
Size: 2702 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061203/d5bd52ae/attachment-0001.bin 
-------------- next part --------------
sequence_ID	Exon_Boundary	Assay_location	Amplicon_length
NM_006017	9 - 10	1118	106
AF027208.1	9 - 10	1118	106
AK027420.1	9 - 10	1312	106
AK027422.1	9 - 10	1334	106
BC012089.1	9 - 10	1289	106
AY449689.1	8 - 9	1054	106
AY449690.1	8 - 9	1054	106
AY449691.1	8 - 9	1054	106
AY449692.1	9 - 10	1081	106
AY449693.1	9 - 10	1081	106
AF507034.1	8 - 9	1091	106
AK075411.1	9 - 10	1289	106
AF117225.1	9 - 10	1334	106
AK226033.1	-	1312	106
DQ895452.1	-	1054	106
-------------- next part --------------
A non-text attachment was scrubbed...
Name: CD133.png
Type: image/png
Size: 4322 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061203/d5bd52ae/attachment-0001.png 

From cjfields at uiuc.edu  Sun Dec  3 22:35:17 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 3 Dec 2006 21:35:17 -0600
Subject: [Bioperl-l] BioPerl Wiki is down
In-Reply-To: <45738063.1070504@verizon.net>
References: <45738063.1070504@verizon.net>
Message-ID: <41422FC7-B579-4B45-B8CC-341B8F462BCB@uiuc.edu>

On Dec 3, 2006, at 7:56 PM, Conrad Halling wrote:

> When I attempted to navigate to http://www.bioperl.org/, I got the
> following message:
>
> A database query syntax error has occurred. This may indicate a bug in
> the software. The last attempted database query was:
>
>     (SQL query hidden)
>
> from within function "MediaWikiBagOStuff::_doquery". MySQL returned
> error "1205: Lock wait timeout exceeded; try restarting transaction
> (localhost)".
>
> -- Conrad Halling
> chhalling at verizon.net

This has been an ongoing problem with the server; I have reported it  
previously to open-bio support.  There have been a few attempts to  
fix it which seem to work short-term but something else must be  
wrong.  Jason?  Chris D?

For my part, Googling found the following link, which indicates that  
this error may be due to heavy server load:

http://tibia.erig.net/TibiaWiki:Bug_reports

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From Derek.Fairley at bll.n-i.nhs.uk  Mon Dec  4 05:18:37 2006
From: Derek.Fairley at bll.n-i.nhs.uk (Fairley, Derek)
Date: Mon, 4 Dec 2006 10:18:37 -0000
Subject: [Bioperl-l] confused by Bio::Graphics
In-Reply-To: <200612032138.02522.rbirnie@totalise.co.uk>
Message-ID: <B4B8F9CCEDA9334F819017E5D711AD1C32C63D@bllmail.bll.n-i.nhs.uk>

Richard,

You can find instructions for installing the example scripts directory
here:

http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix#INSTALLING_BIOPERL_SCRIPTS

or you can get individual scripts from here:

http://www.bioperl.org/wiki/Bioperl_scripts11

Derek.

 
From rbirnie at totalise.co.uk  Mon Dec  4 04:30:36 2006
From: rbirnie at totalise.co.uk (rbirnie at totalise.co.uk)
Date: 04 Dec 2006 09:30:36 +0000
Subject: [Bioperl-l] confused by Bio::Graphics
In-Reply-To: <mailman.161.1165197640.2417.bioperl-l@lists.open-bio.org>
References: <mailman.161.1165197640.2417.bioperl-l@lists.open-bio.org>
Message-ID: <BV.WM.2.0.pv.1.0.16.0612040930360.48622@webm7.global.net.uk>

An HTML attachment was scrubbed...
URL: http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061204/551f1442/attachment.html 

From bix at sendu.me.uk  Mon Dec  4 09:37:16 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 04 Dec 2006 14:37:16 +0000
Subject: [Bioperl-l] BLASTing with a seqio/seq object...
In-Reply-To: <45706671.9000201@york.ac.uk>
References: <01ba01c714a2$b9659c10$15327e82@pyrimidine>	<456F27E9.70205@york.ac.uk>
	<456FEF22.4090004@sendu.me.uk> <45706671.9000201@york.ac.uk>
Message-ID: <4574329C.2030905@sendu.me.uk>

Samantha Thompson wrote:
> Hi,
> Thanks for all your help so far, I am still trying to understand a 
> couple of things...

You should make sure your replies are sent to the list, as you're likely 
to get a faster response.


[where $blast_report is the value returned by 
Bio::Tools::Run::RemoteBlast->submit_blast($seq_object)]
> when I run this line..
> 
> $searchio = Bio::SearchIO->new(-format => 'blast',
>                                -file   => $blast_report);
> 
> between submitting the blast search and trying to process the searchio object like I was attempting before, I get the following errors back:
> 
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Could not open 1: No such file or directory
[snip]
> Does this mean that my BLAST is failing when I submit it?

No, the -file option of SearchIO->new() takes, unsurprisingly, a 
filename. I'd tell you to pay careful attention to the docs, but sadly 
the RemoteBlast docs are currently wrong.

submit_blast() claims to return 'Blast report object' (which in any case 
certainly wouldn't be a filename) when in fact it returns, as you 
discovered, a (for our purposes) meaningless number.

As I suggested before, you need to look at the synopsis for 
Bio::Tools::Run::RemoteBlast instead.

(having called submit_blast you must do the each_rid loop)
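A minimal sketch of that loop, modelled on the polling loop in the RemoteBlast synopsis and wrapped in a hypothetical helper (collect_remote_results is not a BioPerl method). As in the synopsis, a non-reference value from retrieve_blast means the job is still pending (0) or has failed (negative); a reference is the finished Bio::SearchIO report.

```perl
use strict;
use warnings;

# Hypothetical helper: after submit_blast(), poll every request ID (RID)
# until it either returns a report or fails. Returns the SearchIO objects.
sub collect_remote_results {
    my ($factory, $wait) = @_;
    $wait = 5 unless defined $wait;          # seconds between polls
    my @reports;
    while (my @rids = $factory->each_rid) {
        for my $rid (@rids) {
            my $rc = $factory->retrieve_blast($rid);
            if (ref $rc) {
                push @reports, $rc;          # finished: keep the SearchIO
                $factory->remove_rid($rid);
            }
            elsif ($rc < 0) {
                $factory->remove_rid($rid);  # error: drop this RID
            }
            else {
                sleep $wait;                 # still running: poll again later
            }
        }
    }
    return @reports;
}
```

You would call it right after $factory->submit_blast($seq_object) and then read each returned object with next_result, rather than handing the return value of submit_blast to SearchIO's -file option.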


Does anyone care to go through the POD for RemoteBlast and update it to 
an accurate state?

From bix at sendu.me.uk  Mon Dec  4 09:40:27 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 04 Dec 2006 14:40:27 +0000
Subject: [Bioperl-l] confused by Bio::Graphics
In-Reply-To: <BV.WM.2.0.pv.1.0.16.0612040930360.48622@webm7.global.net.uk>
References: <mailman.161.1165197640.2417.bioperl-l@lists.open-bio.org>
	<BV.WM.2.0.pv.1.0.16.0612040930360.48622@webm7.global.net.uk>
Message-ID: <4574335B.805@sendu.me.uk>

rbirnie at totalise.co.uk wrote:
> Hi all,
> 
> I've just seen my previous mail come through on the digest and I noticed
> that the code I attached has been scrubbed, which means that the message
> won't make much sense. If I've contravened list rules by posting
> attachments then apologies; I did look for a posting guide but couldn't
> see one on the wiki. I deliberately didn't put the whole code in the
> main message because it's quite long. I'm not sure which part is wrong,
> so I don't know which part to post; I'm just not seeing the output I
> would expect from the example. What is the best thing for me to do?

I saw a few attachments on your post (including your code example), so I 
think what you did was fine.

From cjfields at uiuc.edu  Mon Dec  4 10:40:20 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 4 Dec 2006 09:40:20 -0600
Subject: [Bioperl-l] confused by Bio::Graphics
In-Reply-To: <4574335B.805@sendu.me.uk>
Message-ID: <002001c717ba$823c1500$15327e82@pyrimidine>


> rbirnie at totalise.co.uk wrote:
> > Hi all,
> > 
> > I've just seen my previous mail come through on the digest and I
> > noticed that the code I attached has been scrubbed, which means that
> > the message won't make much sense. If I've contravened list rules by
> > posting attachments then apologies; I did look for a posting guide but
> > couldn't see one on the wiki. I deliberately didn't put the whole code
> > in the main message because it's quite long. I'm not sure which part
> > is wrong, so I don't know which part to post; I'm just not seeing the
> > output I would expect from the example. What is the best thing for me to do?
> 
> I saw a few attachments on your post (including your code 
> example), so I think what you did was fine.

Same here.  I received a PNG file and two text files (a script and a data
file).

chris

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 

 
From rbirnie at totalise.co.uk  Mon Dec  4 11:06:51 2006
From: rbirnie at totalise.co.uk (rbirnie at totalise.co.uk)
Date: 04 Dec 2006 16:06:51 +0000
Subject: [Bioperl-l] confused by Bio::Graphics
In-Reply-To: <002001c717ba$823c1500$15327e82@pyrimidine>
References: <002001c717ba$823c1500$15327e82@pyrimidine>
Message-ID: <BV.WM.2.0.pv.1.0.16.0612041606510.37306@webm5.global.net.uk>

An HTML attachment was scrubbed...
URL: http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061204/22c3c5e0/attachment.html 

From dmessina at wustl.edu  Mon Dec  4 11:46:16 2006
From: dmessina at wustl.edu (David Messina)
Date: Mon, 4 Dec 2006 10:46:16 -0600
Subject: [Bioperl-l] confused by Bio::Graphics
In-Reply-To: <200612032138.02522.rbirnie@totalise.co.uk>
References: <200612032138.02522.rbirnie@totalise.co.uk>
Message-ID: <ACE259C3-DC1C-41CC-88F3-7ACF8B9D66AA@wustl.edu>

Hi Richard,


> [richard]
>
> These are the problems:
> 1) As I understand it this:
>
> my $wholeseq = Bio::SeqFeature::Generic->new (
> 		-start => 1,
> 		-end => $refseq->length,
> 		-display_name =>$refseq->display_name
> 		);
>
> should display the name of the gene (CD133/Prominin1) near the top of image.
> It doesn't, am I misunderstanding or is there an error in the code?

The contents of a sequence object's display_name vary depending on
the type of sequence record; for a sequence object created from a  
Genbank record, it's the value of the LOCUS field on the first line  
of the record.

If you want the gene name, you'll have to dig it out of the feature  
table. If you look at the  Genbank record for your first sequence,  
you'll see that under both the gene and CDS primary features, the  
HUGO gene abbreviation is stored under the "gene" secondary tag, and  
various synonyms are under the "note" and "product" secondary tags.

LOCUS       NM_006017               3794 bp    mRNA    linear   PRI 17-NOV-2006
DEFINITION  Homo sapiens prominin 1 (PROM1), mRNA.
ACCESSION   NM_006017
VERSION     NM_006017.1  GI:5174386
[...skipping irrelevant part of the Genbank record...]
FEATURES             Location/Qualifiers
      source          1..3794
                      /organism="Homo sapiens"
                      /mol_type="mRNA"
                      /db_xref="taxon:9606"
                      /chromosome="4"
                      /map="4p15.32"
      gene            1..3794
                      /gene="PROM1"
                      /note="prominin 1; synonyms: AC133, CD133, PROML1,
                      MSTP061"
                      /db_xref="GeneID:8842"
                      /db_xref="HGNC:9454"
                      /db_xref="HPRD:HPRD_05079"
                      /db_xref="MIM:604365"
      CDS             38..2635
                      /gene="PROM1"
                      /go_component="integral to plasma membrane [pmid 9389720];
                      membrane"
                      /go_process="response to stimulus; visual perception"
                      /note="hProminin; prominin (mouse)-like 1; hematopoietic
                      stem cell antigen"
                      /codon_start=1
                      /product="prominin 1"
                      /protein_id="NP_006008.1"
                      /db_xref="GI:5174387"
                      /db_xref="GeneID:8842"
                      /db_xref="HGNC:9454"
                      /db_xref="HPRD:HPRD_05079"
                      /db_xref="MIM:604365"
[....more...]

In your script, you grab the primary features between lines 34-60.  
You can grab the secondary feature you want with something like:

[cribbed from the Feature-Annotation HOWTO]
my @ids;
for my $feat_object ($seq_object->get_SeqFeatures) {
    push @ids, $feat_object->get_tag_values("gene")
        if $feat_object->has_tag("gene");
}
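Tying that back to your first question, a minimal sketch (the helper name gene_label is hypothetical) that returns the first gene tag value in the feature table and falls back to display_name, so the result can be passed as -display_name when building $wholeseq. It uses only the get_SeqFeatures, has_tag, get_tag_values and display_name methods already used in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first /gene qualifier found in a
# sequence's feature table, falling back to display_name (the LOCUS
# name, e.g. NM_006017, for a GenBank record).
sub gene_label {
    my ($seq) = @_;
    for my $feat ($seq->get_SeqFeatures) {
        next unless $feat->has_tag('gene');
        my ($name) = $feat->get_tag_values('gene');
        return $name if defined $name;
    }
    return $seq->display_name;
}
```

For example, -display_name => gene_label($refseq) in the Bio::SeqFeature::Generic->new call; for NM_006017 this should give PROM1, the /gene value shown in the record above.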


> 2) In the quoted example the CDS is broken up into smaller regions which are
> then linked together in example 6. This isn't happening in my code and I
> think it should be, I get one solid block for the CDS. I don't understand why
> this is because I'm not clear which parts of the feature table are used to
> define where the CDS should be split. I think this is the relevant bit of
> code:
>
> foreach my $alt_trans (keys %main) {
> 	foreach my $tag (keys %{ $main{$alt_trans}{'features'} }) {
>
> 		my $feature = $main{$alt_trans}{'features'}{$tag};
>
> 		$panel->add_track($feature,
> 				-glyph => 'generic',
> 				-bgcolor => $colors[$idx++ % @colors],
> 				-fgcolor => 'black',
> 				-font2color => 'black',
> 				-key => $alt_trans,
> 				-bump => +1,
> 				-height => 8,
> 				-label => 1,
> 				-description => 1,
> 				) if ($tag eq 'CDS');
>
> }
> }


The problem here is that RefSeq mRNA records don't contain intron-exon
boundary information. I think you'll have to get that from an
assembly record. From the Entrez gene page for PROM1, I obtained a
Genbank record for the PROM1 genomic locus:

http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?val=NC_000004.10&from=15578955&to=15686664&strand=2&dopt=gb

Saving that as 'PROM1.gb' (the suffix is important), and running the  
bp_embl2picture.pl script on it, I got an image similar to Figure 6  
(attached).
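To see where the exon boundaries come from: in a genomic GenBank record the CDS location is written as join(start..end,start..end,...), and each start..end range is one exon segment (BioPerl turns such locations into Bio::Location::Split objects). The mRNA record above has a single range (CDS 38..2635), so it can only draw as one block. The helper below is a hypothetical, minimal sketch for inspecting location strings by hand; it is not how BioPerl parses them.

```perl
use strict;
use warnings;

# Hypothetical helper: pull every start..end range out of a GenBank
# location string, e.g. 'join(38..200,400..600)' -> ([38,200],[400,600]).
# The partial markers '<' and '>' are skipped; strand (complement),
# single-base positions and remote ACC:start..end references are NOT
# interpreted, so treat the output as a quick look only.
sub location_segments {
    my ($location) = @_;
    my @segments;
    while ($location =~ /<?(\d+)\.\.>?(\d+)/g) {
        push @segments, [ $1, $2 ];
    }
    return @segments;
}
```

Run on the mRNA CDS location 38..2635 it returns a single segment, whereas a genomic CDS written with join(...) returns one segment per exon.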

Hope this helps,
Dave


-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061204/4add2cbc/attachment.html 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: PROM1.png
Type: image/png
Size: 8646 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061204/4add2cbc/attachment.png 

From bix at sendu.me.uk  Mon Dec  4 14:37:13 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 04 Dec 2006 19:37:13 +0000
Subject: [Bioperl-l] Timeline on the 1.5.2 release?
In-Reply-To: <000001c717db$3ca7b910$15327e82@pyrimidine>
References: <000001c717db$3ca7b910$15327e82@pyrimidine>
Message-ID: <457478E9.3060405@sendu.me.uk>

Chris Fields wrote:
> Sendu,
> 
> Are current plans to still try getting the final 1.5.2 release out
> before the hackathon next week?

Yes, I seriously hope so. I was kind of hoping to see test results from 
you and Nathan on the wiki though...


> There are a few commits I want to make, but I may wait until after
> 1.5.2 is out before I add them.

But don't let the release stop you. As long as you don't commit to the
1.5.2 branch it will be fine.

From cjfields at uiuc.edu  Mon Dec  4 14:34:34 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 4 Dec 2006 13:34:34 -0600
Subject: [Bioperl-l] Timeline on the 1.5.2 release?
Message-ID: <000001c717db$3ca7b910$15327e82@pyrimidine>

Sendu,

Are current plans to still try getting the final 1.5.2 release out before
the hackathon next week?  There are a few commits I want to make, but I may
wait until after 1.5.2 is out before I add them.

chris

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


From cjfields at uiuc.edu  Mon Dec  4 15:23:45 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 4 Dec 2006 14:23:45 -0600
Subject: [Bioperl-l] Timeline on the 1.5.2 release?
In-Reply-To: <457478E9.3060405@sendu.me.uk>
Message-ID: <000001c717e2$19d18e00$15327e82@pyrimidine>

> Chris Fields wrote:
> > Sendu,
> > 
> > Are current plans to still try getting the final 1.5.2 release out 
> > before the hackathon next week?
> 
> Yes, I seriously hope so. I was kind of hoping to see test 
> results from you and Nathan on the wiki though...

Ah, forgot to post those!  Working on that now...

> > There are a few commits I want to make, but I may wait until after
> > 1.5.2 is out before I add them.
> 
> But don't let the release stop you. As long as you don't commit to the
> 1.5.2 branch it will be fine.

There are a few things I plan on adding over the next few weeks, including
some things for Bio::Location::SplitLocation.  However, I'm sure some of the
latter will break tests, so I'll be adding them in a bit at a time.

It all depends when I can squeeze time in to work on them!

chris 


From pelikan at cs.pitt.edu  Mon Dec  4 17:34:59 2006
From: pelikan at cs.pitt.edu (pelikan at cs.pitt.edu)
Date: Mon, 4 Dec 2006 17:34:59 -0500 (EST)
Subject: [Bioperl-l] Bioperl-db doesn't seem to load all entries
Message-ID: <4812.130.49.222.58.1165271699.squirrel@webmail.cs.pitt.edu>

Hello,

    My system is running bioperl 1.5.2, bioperl-db 1.5.2-005 RC, and the
latest MySQL under Windows, ActivePerl, without Cygwin. I have 4 GB
memory. "make test" passes fine.

The problem is that the number of entries I end up with after loading a
dataset with load_seqdatabase.pl is nowhere near the number I exported.
For instance, if I want to load only proteins from Homo sapiens,
I go to UniProt,
use the database search function,
do a text search for Homo sapiens (returns 70914 hits),
export the hits to flat file format (--format swiss) using the data set
manager,
and load it using load_seqdatabase.pl.

The query "select count(*) from bioentry;" returns only 1003 entries.
Moreover, it seems like the entries don't go past the B's in the alphabet:
I can't find bioentry.descriptions like '%cytochrome%' or '%myoglobin%',
but I can find apolipoproteins, for example.

I know this is an annoying question, but if someone has more experience in
dealing with this issue, I would be grateful for any assistance. I don't
get any error messages, so it's difficult for me to tell what's going on.
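One way to narrow this down (a sketch, not something suggested in the thread) is to check whether the exported file really contains all 70914 records before looking at the loader, since an incomplete export could produce the same symptom without any error messages. Each Swiss-Prot/UniProt flat-file entry ends with a line holding only //, so counting those lines counts the records; the helper name is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: count the entries in a Swiss-Prot/UniProt flat file
# read from an open filehandle. Every entry ends with a line holding only
# "//", so counting those lines counts the records.
sub count_swiss_records {
    my ($fh) = @_;
    my $count = 0;
    while (my $line = <$fh>) {
        $count++ if $line =~ m{^//\s*$};
    }
    return $count;
}
```

Open the exported file with open(my $fh, '<', $path) and compare count_swiss_records($fh) with the bioentry count; if the file already holds only about 1003 entries, the problem is in the export rather than in load_seqdatabase.pl.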

-Richard


From n.haigh at sheffield.ac.uk  Tue Dec  5 01:53:34 2006
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Tue, 05 Dec 2006 06:53:34 +0000
Subject: [Bioperl-l] Timeline on the 1.5.2 release?
In-Reply-To: <457478E9.3060405@sendu.me.uk>
References: <000001c717db$3ca7b910$15327e82@pyrimidine>
	<457478E9.3060405@sendu.me.uk>
Message-ID: <4575176E.3020906@sheffield.ac.uk>

Sendu Bala wrote:
> Chris Fields wrote:
>   
>> Sendu,
>>
>> Are current plans to still try getting the final 1.5.2 release out
>> before the hackathon next week?
>>     
>
> Yes, I seriously hope so. I was kind of hoping to see test results from 
> you and Nathan on the wiki though...
>
>
>   

OK, I'll get onto this today.

>> There are a few commits I want to make, but I may wait until after
>> 1.5.2 is out before I add them.
>>     
>
> But don't let the release stop you. As long as you don't commit to the
> 1.5.2 branch it will be fine.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>   


-- 
> A: Yes.
>> Q: Are you sure?
>>     
>>> A: Because it reverses the logical flow of conversation.
>>>       
>>>> Q: Why is top posting frowned upon?
>>>>         
Get Thunderbird <http://www.mozilla.org/products/thunderbird/>

From n.haigh at sheffield.ac.uk  Tue Dec  5 06:43:16 2006
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Tue, 05 Dec 2006 11:43:16 +0000
Subject: [Bioperl-l] Timeline on the 1.5.2 release?
In-Reply-To: <457478E9.3060405@sendu.me.uk>
References: <000001c717db$3ca7b910$15327e82@pyrimidine>
	<457478E9.3060405@sendu.me.uk>
Message-ID: <45755B54.7080902@sheffield.ac.uk>

Sendu Bala wrote:
> Chris Fields wrote:
>   
>> Sendu,
>>
>> Are current plans to still try getting the final 1.5.2 release out
>> before the hackathon next week?
>>     
>
> Yes, I seriously hope so. I was kind of hoping to see test results from 
> you and Nathan on the wiki though...
>
>
>   

I've added my test results for Debian to the wiki.
Nath

>> There are a few commits I want to make, but I may wait until after
>> 1.5.2 is out before I add them.
>>     
>
> But don't let the release stop you. As long as you don't commit to the
> 1.5.2 branch it will be fine.



From bix at sendu.me.uk  Tue Dec  5 06:47:06 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 05 Dec 2006 11:47:06 +0000
Subject: [Bioperl-l] Timeline on the 1.5.2 release?
In-Reply-To: <45755B54.7080902@sheffield.ac.uk>
References: <000001c717db$3ca7b910$15327e82@pyrimidine>
	<457478E9.3060405@sendu.me.uk> <45755B54.7080902@sheffield.ac.uk>
Message-ID: <45755C3A.9050903@sendu.me.uk>

Nathan S. Haigh wrote:
> Sendu Bala wrote:
>> Chris Fields wrote:
>>   
>>> Sendu,
>>>
>>> Are current plans to still try getting the final 1.5.2 release out
>>> before the hackathon next week?
>>>     
>> Yes, I seriously hope so. I was kind of hoping to see test results from 
>> you and Nathan on the wiki though...
>
> I've added my test results for Debian to the wiki.

Thanks (and to Chris as well). I can't tell you how much I loathe and
despise TCoffee and Tmhmm now ;)

From cjfields at uiuc.edu  Tue Dec  5 11:04:38 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 5 Dec 2006 10:04:38 -0600
Subject: [Bioperl-l] Build.PL changes
Message-ID: <001b01c71887$10be3160$15327e82@pyrimidine>

Sendu,

I think the Build.PL commits which force installation of XML::SAX::Expat
should be rolled back.  XML::Simple works with any XML::SAX backend, not
just XML::SAX::Expat, which hasn't been actively maintained since 2003 and
is deprecated in favor of XML::SAX::ExpatXS.  In fact, forcing
XML::SAX::Expat to install as the default XML::SAX backend currently breaks
blastxml parsing.

Note that forcing this also forces one to install the Expat library (now at
version 2), which now has some compatibility problems with XML::SAX::Expat
(but not ExpatXS).

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


From qetzal at tutopia.com.br  Wed Dec  6 10:21:20 2006
From: qetzal at tutopia.com.br (giovani)
Date: Wed, 06 Dec 2006 10:21:20 -0500
Subject: [Bioperl-l] Biodiversity graphic
Message-ID: <auto-000222418003@frontend01.cg.ifxnetworks.com>

An HTML attachment was scrubbed...
URL: http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061206/9d9e4a09/attachment.html 

From benoit at ebi.ac.uk  Wed Dec  6 12:30:12 2006
From: benoit at ebi.ac.uk (Benoit Ballester)
Date: Wed, 06 Dec 2006 17:30:12 +0000
Subject: [Bioperl-l] Biodiversity graphic
In-Reply-To: <auto-000222418003@frontend01.cg.ifxnetworks.com>
References: <auto-000222418003@frontend01.cg.ifxnetworks.com>
Message-ID: <4576FE24.1030807@ebi.ac.uk>

giovani wrote:
> 
> Hello there. I'm trying to write a program to draw a graph with two
> axes and two data sets on each axis. Does anyone know of a tool similar to
> the GD module to draw this graph? I'm having trouble with GD.
> Here is an example of what I want to do:
> http://libshuff.mib.uga.edu/YvsX.png, and below is the code that I'm
> using with the GD module.


It looks to me like the graph you're pointing to was made with gnuplot.
Why don't you use gnuplot or R instead?

Ben

> 
> #!/usr/bin/perl
> use strict;
> use warnings;
> 
> use GD::Graph::mixed;
> 
> # first row: x-axis labels; remaining rows: the four data sets
> my @data = (
>    ["1st","2nd","3rd","4th","5th","6th","7th", "8th", "9th"],
>    [    3,   4,   14,   30,   12,    8,    7,    20,    15],
>    [    2,   8,    2,    5,    3,  1,    3,     4,     1],
>    [    5,   12,   24,   33,   19,    8,    6,    15,    21],
>    [    1,    2,    5,    6,    3,  1.5,    1,     3,     4],
> );
> 
> my $my_graph = GD::Graph::mixed->new();
> $my_graph->set(
>        x_label => 'X Label',
>        y1_label => 'Y1 label',
>        y2_label => 'Y2 label',
>        title => 'Using two axes',
>        y1_max_value => 40,
>        y2_max_value => 8,
>        y_tick_number => 8,
>        y_label_skip => 2,
>        long_ticks => 1,
>        two_axes => 1,
>        use_axis => [1,2,1,2],
>        legend_placement => 'BR',
>        x_labels_vertical => 1,
>        x_label_position => 1/2,
> );
> 
> $my_graph->set_legend( 'X', 'XY', 'diff-X/XY', '95%XY');
> my $gd = $my_graph->plot(\@data) or die $my_graph->error;
> open(my $img, '>', 'graphTest.gif') or die "Cannot open output file: $!\n";
> binmode $img;
> print {$img} $gd->gif;
> close $img;
> 
>  
> 
> 
> 
> ------------------------------------------------------------------------
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From gwu at molbio.mgh.harvard.edu  Wed Dec  6 16:12:57 2006
From: gwu at molbio.mgh.harvard.edu (gang wu)
Date: Wed, 06 Dec 2006 16:12:57 -0500
Subject: [Bioperl-l] Biodiversity graphic
In-Reply-To: <auto-000222418003@frontend01.cg.ifxnetworks.com>
References: <auto-000222418003@frontend01.cg.ifxnetworks.com>
Message-ID: <45773259.3010405@molbio.mgh.harvard.edu>

Do you mean the GD code cannot run, or that it does not generate the image
you wanted?

Gang



From bix at sendu.me.uk  Wed Dec  6 17:39:49 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 06 Dec 2006 22:39:49 +0000
Subject: [Bioperl-l] Bioperl 1.5.2 Release
Message-ID: <457746B5.2020006@sendu.me.uk>

I am proud to announce the final release of Bioperl 1.5.2.

http://www.bioperl.org/wiki/Release_1.5.2

bioperl (core):
cpan>install S/SE/SENDU/bioperl-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-1.5.2_100.tar.bz2
http://bioperl.org/DIST/bioperl-1.5.2_100.zip

bioperl-run:
cpan>install S/SE/SENDU/bioperl-run-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-run-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-run-1.5.2_100.tar.bz2
http://bioperl.org/DIST/bioperl-run-1.5.2_100.zip

bioperl-db:
cpan>install S/SE/SENDU/bioperl-db-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-db-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-db-1.5.2_100.tar.bz2
http://bioperl.org/DIST/bioperl-db-1.5.2_100.zip

bioperl-network:
cpan>install S/SE/SENDU/bioperl-network-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-network-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-network-1.5.2_100.tar.bz2
http://bioperl.org/DIST/bioperl-network-1.5.2_100.zip

http://bioperl.org/DIST/SIGNATURES.md5

(all are also available via CVS, and for Windows users, using the Perl 
Package Manager - see the wiki for details)
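Anyone who wants to check a downloaded archive against the SIGNATURES.md5 file listed above could use a small Perl script like the one below. This is only a sketch. It assumes SIGNATURES.md5 uses the common `md5sum` layout: a 32-character hex digest, whitespace, then the file name. The announcement does not state that layout. The script name and file names are illustrative. Digest::MD5 is a core Perl module.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5;

# Parse lines assumed to look like "<32 hex chars>  <filename>"
# (the usual md5sum layout) into a filename => digest hash.
sub read_signatures {
    my ($fh) = @_;
    my %expected;
    while (my $line = <$fh>) {
        next unless $line =~ /^([0-9a-fA-F]{32})\s+\*?(\S+)\s*$/;
        $expected{$2} = lc $1;
    }
    return \%expected;
}

# Compute the hex MD5 digest of a file, reading it in binary mode.
sub md5_of_file {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!\n";
    binmode $fh;
    my $digest = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;
    return $digest;
}

# Usage: perl check_md5.pl SIGNATURES.md5 bioperl-1.5.2_100.tar.gz ...
if (@ARGV >= 2) {
    my ($sigfile, @archives) = @ARGV;
    open(my $sig, '<', $sigfile) or die "Cannot open $sigfile: $!\n";
    my $expected = read_signatures($sig);
    close $sig;
    for my $archive (@archives) {
        (my $name = $archive) =~ s{.*[/\\]}{};   # compare on the bare file name
        my $want = $expected->{$name};
        if (!defined $want) { print "$name: not listed\n"; next; }
        print "$name: ", (md5_of_file($archive) eq $want ? "OK" : "MISMATCH"), "\n";
    }
}
```

If the real file turns out to use a different layout, only the regular expression in `read_signatures` needs to change.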

The other bioperl packages (bioperl-ext, bioperl-gui, bioperl-pedigree 
and bioperl-pipeline) did not see a unified release for 1.5.2.


This release represents a developer release which has been thoroughly
tested. We consider it the most stable (in terms of bugs) version of 
Bioperl and believe it to be suitable for most people. It is marked 
'developer' or even 'unstable' because its API may change on short 
notice. It will also not be maintained or supported beyond the next 
bioperl release.

1.5.2 introduces the following new (core) features:

  * Taxonomy (Bio::Species) overhaul
  * Bio::Map improvements
  * Bio::SearchIO speedup
  * Build.PL installation

For details, and a complete change log, see the wiki.

API documentation is available here: http://doc.bioperl.org/


Acknowledgements:
Innumerable thanks are due for the tireless efforts of Christopher Fields 
(bug fixing, testing, documentation, discussion), Nathan Haigh 
(Windows and prerequisite issues, testing) and Mauricio Herrera Cuadra 
(testing, documentation, support). Feedback and ideas provided by Hilmar 
Lapp, Jason Stajich, Torsten Seemann and others on the mailing list and 
elsewhere proved invaluable. None of this would have been possible 
without the behind-the-scenes work of the open-bio support team. I'd 
also like to acknowledge Andreas J. Koenig for his help with CPAN matters.

Finally, thank you to everyone who tried out the release candidates, and 
especially those that took the time to file bug reports or report problems.


Remember, Bioperl can only go from strength to strength with /your/ 
help. If you'd like to experience the fame and fortune that naturally 
follow becoming a Bioperl developer (?!), become one!
http://www.bioperl.org/wiki/Becoming_a_developer

On behalf of the Bioperl team,
Sendu Bala.

From cjfields at uiuc.edu  Wed Dec  6 21:30:44 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 6 Dec 2006 20:30:44 -0600
Subject: [Bioperl-l] Bioperl 1.5.2 Release
In-Reply-To: <457746B5.2020006@sendu.me.uk>
Message-ID: <000001c719a7$b48beb90$15327e82@pyrimidine>

Great job Sendu!  

A bit of icing on the cake: all the WinXP PPMs (core, db, network, run)
installed without a hitch, following the normal instructions, using PPM4
(both the GUI and the command-line shell) on clean ActiveState installations.
It looks like all the correct prereqs were installed with the shell (only
XML::SAX::ExpatXS was left out of the GUI installation, for the reasons
outlined before).

I'll run more tests tomorrow to see if tests pass with the installed bioperl
(this should catch any prereq issues with PPM installation we missed).

chris



From hlapp at gmx.net  Wed Dec  6 22:20:14 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 6 Dec 2006 22:20:14 -0500
Subject: [Bioperl-l] Bioperl-db doesn't seem to load all entries
In-Reply-To: <4812.130.49.222.58.1165271699.squirrel@webmail.cs.pitt.edu>
References: <4812.130.49.222.58.1165271699.squirrel@webmail.cs.pitt.edu>
Message-ID: <8E15592D-6475-4A4D-BA6D-BD669C4233C3@gmx.net>

I seriously doubt that load_seqdatabase.pl would have deliberately  
stopped loading the file. Either there was an error in loading an  
entry (which you should see, and you can also ask the script to just  
keep going by providing the --safe option), or the file only  
contained 1003 entries.

Note that you can get progress logging by using the --logchunk  
option, which will also give you a final count of the number of  
sequences loaded.

I'm not sure how you ran your search and your download on Uniprot. If  
I try what you describe I get 70491 hits, and if I try to export them  
using the data set manager I get the message:

This download mechanism only supports 1000 proteins. The first 1000  
proteins have been added from the selected.

Which perfectly explains what you see.

Did you convince yourself that the file contains 70491 entries? If  
you don't have grep and wc on your windows machine, you can use perl  
one-liners directly, e.g.,

perl -n -e '/^ID / && ++$n; END {print "$n entries\n";}' <your-file-here>
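On Windows, cmd.exe does not treat single quotes as quoting characters, so the one-liner above may need its quotes rearranged there. A short standalone script avoids shell quoting entirely. The sketch below counts lines starting with `ID `, as the one-liner does. As a cross-check, it also counts the `//` lines that close each Swiss-Prot/UniProt flat-file entry. The script name and the cross-check are additions for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count flat-file entries: lines beginning with "ID " open an entry,
# lines consisting of "//" close one (Swiss-Prot/UniProt layout).
sub count_entries {
    my ($fh) = @_;
    my ($ids, $ends) = (0, 0);
    while (my $line = <$fh>) {
        $ids++  if $line =~ /^ID /;
        $ends++ if $line =~ m{^//\s*$};
    }
    return ($ids, $ends);
}

# Usage: perl count_entries.pl your-file-here
if (@ARGV) {
    for my $file (@ARGV) {
        open(my $fh, '<', $file) or die "Cannot open $file: $!\n";
        my ($ids, $ends) = count_entries($fh);
        close $fh;
        print "$file: $ids ID lines, $ends '//' terminators\n";
    }
}
```

If the two counts disagree, the file is probably truncated or malformed. If both come out near 1000, the download limit is the likely explanation.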

	-hilmar

On Dec 4, 2006, at 5:34 PM, pelikan at cs.pitt.edu wrote:

> Hello,
>
>     My system is running bioperl 1.5.2, bioperl-db 1.5.2-005 RC, and the
> latest MySQL under Windows, ActivePerl, without Cygwin. I have 4 GB of
> memory. "make test" passes fine.
>
> The problem is that I'm not getting similar numbers of anything when I
> load datasets using load_seqdatabase.pl. For instance, if I want to load
> only proteins from Homo sapiens,
> I go to UniProt,
> use the database search function,
> do a text search for Homo sapiens (returns 70914 hits),
> export the hits to flat file format (--format swiss) using the data set
> manager,
> and load it using load_seqdatabase.pl.
>
> The result of "select count(*) from bioentry;" is only 1003 entries.
> Moreover, it seems like the entries don't go past the B's in the alphabet:
> I can't find bioentry.descriptions like '%cytochrome%' or '%myoglobin%',
> but I can find apolipoproteins, for example.
>
> I know this is an annoying question, but if someone has more experience
> in dealing with this issue, I would be grateful for any assistance. I
> don't get any error messages, so it's difficult for me to tell what's
> going on.
>
> -Richard

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From lzhtom at hotmail.com  Wed Dec  6 22:13:47 2006
From: lzhtom at hotmail.com (zhihua li)
Date: Thu, 07 Dec 2006 03:13:47 +0000
Subject: [Bioperl-l] different syntaxes for SeqI constructor and Factory
	constructor?
Message-ID: <BAY110-F30C26DE384E916A297FA86C7DC0@phx.gbl>

Hi netters,

Recently I found this:

To construct a new SeqIO object, I had to write:
$seq_obj=Bio::SeqIO->new(
      -file => '/home/myfile',
      -format => 'Fasta');              # Note the dash before the two arguments.

If I omitted the dash:
$seq_obj=Bio::SeqIO->new(
     file => '/home/myfile',
     format => 'Fasta');
I'd get this error:
MSG: Unknown format given or could not determine it []
STACK Bio::SeqIO::new /usr/lib/perl5/site_perl/5.8.7/Bio/SeqIO.pm:377

So it seems to me that the dashes before the arguments are essential.
However, when I tried to build a factory for StandAloneBlast, I found the
opposite.

If the script had the dash:
$blast_obj=Bio::Tools::Run::StandAloneBlast->new(
             -program => 'blastn',
             -database => '/home/mydatabase');

I'd get the error message: 
MSG: Unallowed parameter: - !
STACK Bio::Tools::Run::StandAloneBlast::AUTOLOAD 
/usr/lib/perl5/site_perl/5.8.7/Bio/Tools/Run/StandAloneBlast.pm:433
STACK Bio::Tools::Run::StandAloneBlast::new 
/usr/lib/perl5/site_perl/5.8.7/Bio/Tools/Run/StandAloneBlast.pm:400

If I left out the dash by saying:
$blast_obj=Bio::Tools::Run::StandAloneBlast->new(
             program => 'blastn',
             database => '/home/mydatabase');

Everything works fine.

Now I'm confused. Why do I sometimes have to add the dash, while at other
times I'm not allowed to?

Thanks in advance!
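Every constructor that parses named arguments has to decide whether it expects `-file` or `file`, which is why the two calls behave differently. BioPerl core modules such as Bio::SeqIO take dash-prefixed names. The StandAloneBlast error above shows that module, in this version, rejecting them. The sketch below is hypothetical: `normalize_args` is not a BioPerl function. It shows one way a constructor could accept both spellings by lowercasing each key and stripping a single leading dash.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn a flat list of named arguments into a hash,
# accepting both "-file => ..." and "file => ..." spellings.
sub normalize_args {
    my (@args) = @_;
    die "Odd number of arguments\n" if @args % 2;
    my %opts;
    while (my ($key, $value) = splice(@args, 0, 2)) {
        (my $name = lc $key) =~ s/^-//;   # drop one leading dash, ignore case
        $opts{$name} = $value;
    }
    return \%opts;
}
```

With such a helper, both spellings yield the same options hash. Whether a particular BioPerl module does this depends on its own argument parsing, so the module's POD is the place to check.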



From hlapp at gmx.net  Wed Dec  6 22:56:44 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 6 Dec 2006 22:56:44 -0500
Subject: [Bioperl-l] Bioperl 1.5.2 Release
In-Reply-To: <457746B5.2020006@sendu.me.uk>
References: <457746B5.2020006@sendu.me.uk>
Message-ID: <CE76F074-5897-431C-9E39-9E096DBD1973@gmx.net>

Congrats! Great work, Sendu! Don't forget to celebrate.

	-hilmar


-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From arareko at campus.iztacala.unam.mx  Wed Dec  6 22:53:21 2006
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Wed, 06 Dec 2006 21:53:21 -0600
Subject: [Bioperl-l] Bioperl 1.5.2 Release
In-Reply-To: <457746B5.2020006@sendu.me.uk>
References: <457746B5.2020006@sendu.me.uk>
Message-ID: <45779031.3050202@campus.iztacala.unam.mx>

This has been a great effort. Congrats and thanks to everyone involved!

Mauricio.


-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From jason at bioperl.org  Thu Dec  7 00:06:36 2006
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 6 Dec 2006 21:06:36 -0800
Subject: [Bioperl-l] Bioperl 1.5.2 Release
In-Reply-To: <457746B5.2020006@sendu.me.uk>
References: <457746B5.2020006@sendu.me.uk>
Message-ID: <41A863C9-1B69-4C7B-9271-C577EDD011BB@bioperl.org>

Hear, hear! Excellent work. Thanks for leading the effort on this
release and for all of the behind-the-scenes work, attention to detail,
and cat herding it took to make this possible.

-jason


--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html


From n.haigh at sheffield.ac.uk  Thu Dec  7 02:23:47 2006
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Thu, 07 Dec 2006 07:23:47 +0000
Subject: [Bioperl-l] Bioperl 1.5.2 Release
In-Reply-To: <457746B5.2020006@sendu.me.uk>
References: <457746B5.2020006@sendu.me.uk>
Message-ID: <4577C183.7010501@sheffield.ac.uk>

I know I'm very new to Bioperl development and don't know very much yet,
so I'm probably not the best person to express the views of the Bioperl
developers or users. However, I'm sure I'm safe in saying that on behalf
of everyone associated with Bioperl a *huge* thank you must go out to
Sendu for the gargantuan effort he has put into this release.

Just looking over some of the e-mails he's sent over the past few weeks
alone, it's clear that he has devoted a huge amount of time to the
effort and in some cases with little sleep. Since there is very little
(or should I say no) monetary recognition in such an important and
time-consuming role as "Release Pumpkin", I hope Sendu has a warm glow, safe
in the knowledge that his efforts have helped enormously and are clearly
recognised and fully appreciated by the Bioperl community.

Therefore, I'd just like to reiterate what others have already
said... Well done, excellent work!!!

Nath

From valiente at lsi.upc.edu  Thu Dec  7 03:25:27 2006
From: valiente at lsi.upc.edu (Gabriel Valiente)
Date: Thu, 7 Dec 2006 09:25:27 +0100
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110 species
In-Reply-To: <mailman.8205.1161981511.2493.bioperl-l@lists.open-bio.org>
References: <mailman.8205.1161981511.2493.bioperl-l@lists.open-bio.org>
Message-ID: <4DA1DAE9-92B8-46C1-A3CE-F8D1AE4BB334@lsi.upc.edu>

The following popped out when I input more than 110 species to the
taxonomy2tree script version 1.4:

         (in cleanup)
------------- EXCEPTION  -------------
MSG: Must supply a Bio::Taxon
STACK Bio::DB::Taxonomy::flatfile::ancestor Bio/DB/Taxonomy/flatfile.pm:260
STACK Bio::Taxon::ancestor Bio/Taxon.pm:476
STACK Bio::Taxon::remove_Descendent Bio/Taxon.pm:703
STACK Bio::Tree::Node::ancestor Bio/Tree/Node.pm:346
STACK Bio::Taxon::ancestor Bio/Taxon.pm:466
STACK Bio::Tree::Tree::cleanup_tree Bio/Tree/Tree.pm:325
STACK Bio::Root::Root::DESTROY Bio/Root/Root.pm:409
STACK (eval) taxonomy2tree.pl:0
STACK toplevel taxonomy2tree.pl:0

Any clues? Thanks,

Gabriel

From bix at sendu.me.uk  Thu Dec  7 04:24:39 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 07 Dec 2006 09:24:39 +0000
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110
	species
In-Reply-To: <4DA1DAE9-92B8-46C1-A3CE-F8D1AE4BB334@lsi.upc.edu>
References: <mailman.8205.1161981511.2493.bioperl-l@lists.open-bio.org>
	<4DA1DAE9-92B8-46C1-A3CE-F8D1AE4BB334@lsi.upc.edu>
Message-ID: <4577DDD7.7060208@sendu.me.uk>

Gabriel Valiente wrote:
> The following popped out when I input more than 110 species to the
> taxonomy2tree script version 1.4:
> 
>          (in cleanup)
> ------------- EXCEPTION  -------------
> MSG: Must supply a Bio::Taxon
> STACK Bio::DB::Taxonomy::flatfile::ancestor Bio/DB/Taxonomy/flatfile.pm:260
> STACK Bio::Taxon::ancestor Bio/Taxon.pm:476
> STACK Bio::Taxon::remove_Descendent Bio/Taxon.pm:703
> STACK Bio::Tree::Node::ancestor Bio/Tree/Node.pm:346
> STACK Bio::Taxon::ancestor Bio/Taxon.pm:466
> STACK Bio::Tree::Tree::cleanup_tree Bio/Tree/Tree.pm:325
> STACK Bio::Root::Root::DESTROY Bio/Root/Root.pm:409
> STACK (eval) taxonomy2tree.pl:0
> STACK toplevel taxonomy2tree.pl:0
> 
> Any clues? Thanks,

Are you able to narrow the problem down? What was your command line, 
what species were you using? Does it work with the first 110 species you 
tried? Is there anything special about the 111th?

Do I understand correctly that this was a problem during cleanup only, 
and didn't affect the correctness and completeness of the result?
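A common way to answer "does it work with the first N species?" without trying every N is to bisect on the number of species. Here is a minimal sketch. It assumes a hypothetical predicate `$fails->($n)` that reruns the script on the first $n species and reports whether that run fails. It also assumes failures are monotonic: once a prefix fails, every longer prefix fails too.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the smallest $n in [1, $max] for which $fails->($n) is true,
# or undef if even $max species succeed. Assumes failures are monotonic:
# if the first $n species fail, so does any longer prefix.
sub first_failing_count {
    my ($fails, $max) = @_;
    return undef unless $fails->($max);
    my ($lo, $hi) = (1, $max);        # the answer lies in [$lo, $hi]
    while ($lo < $hi) {
        my $mid = int(($lo + $hi) / 2);
        if ($fails->($mid)) { $hi = $mid } else { $lo = $mid + 1 }
    }
    return $lo;
}
```

Each probe costs one run of the script, so roughly log2(N) runs are enough to locate the threshold.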


From bix at sendu.me.uk  Thu Dec  7 04:33:18 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 07 Dec 2006 09:33:18 +0000
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110
	species
In-Reply-To: <4DA1DAE9-92B8-46C1-A3CE-F8D1AE4BB334@lsi.upc.edu>
References: <mailman.8205.1161981511.2493.bioperl-l@lists.open-bio.org>
	<4DA1DAE9-92B8-46C1-A3CE-F8D1AE4BB334@lsi.upc.edu>
Message-ID: <4577DFDE.6000500@sendu.me.uk>

Gabriel Valiente wrote:
> The following popped out when I input more than 110 species to the
> taxonomy2tree script version 1.4:
> 
>          (in cleanup)
> ------------- EXCEPTION  -------------
> MSG: Must supply a Bio::Taxon
> STACK Bio::DB::Taxonomy::flatfile::ancestor Bio/DB/Taxonomy/flatfile.pm:260
> STACK Bio::Taxon::ancestor Bio/Taxon.pm:476
> STACK Bio::Taxon::remove_Descendent Bio/Taxon.pm:703
> STACK Bio::Tree::Node::ancestor Bio/Tree/Node.pm:346
> STACK Bio::Taxon::ancestor Bio/Taxon.pm:466
> STACK Bio::Tree::Tree::cleanup_tree Bio/Tree/Tree.pm:325
> STACK Bio::Root::Root::DESTROY Bio/Root/Root.pm:409
> STACK (eval) taxonomy2tree.pl:0
> STACK toplevel taxonomy2tree.pl:0
> 
> Any clues? Thanks,

Oh, does it work with option -e? Or does it work if you delete your old 
indexes of the nodes and names files and let it re-create them?

From valiente at lsi.upc.edu  Thu Dec  7 04:38:03 2006
From: valiente at lsi.upc.edu (Gabriel Valiente)
Date: Thu, 7 Dec 2006 10:38:03 +0100
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110
	species
In-Reply-To: <4577DDD7.7060208@sendu.me.uk>
References: <mailman.8205.1161981511.2493.bioperl-l@lists.open-bio.org>
	<4DA1DAE9-92B8-46C1-A3CE-F8D1AE4BB334@lsi.upc.edu>
	<4577DDD7.7060208@sendu.me.uk>
Message-ID: <F5C5C9A8-D444-4994-9769-AC5DE68F4A39@lsi.upc.edu>

Hi,

If you run the attached shell script you should be able to reproduce  
the problem. It is not about any species in particular, but about the  
total number of species: it crashes with more than 120 species. The
resulting tree is not correct, I'm checking it further now. Thanks,

Gabriel

-------------- next part --------------
A non-text attachment was scrubbed...
Name: fetch-bork.sh
Type: application/octet-stream
Size: 7378 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061207/00f0aeda/attachment.obj 
-------------- next part --------------

On Dec 7, 2006, at 10:24 AM, Sendu Bala wrote:

> Gabriel Valiente wrote:
>> The following popped out when I input more than 110 species to the
>> taxonomy2tree script version 1.4:
>>          (in cleanup)
>> ------------- EXCEPTION  -------------
>> MSG: Must supply a Bio::Taxon
>> STACK Bio::DB::Taxonomy::flatfile::ancestor Bio/DB/Taxonomy/flatfile.pm:260
>> STACK Bio::Taxon::ancestor Bio/Taxon.pm:476
>> STACK Bio::Taxon::remove_Descendent Bio/Taxon.pm:703
>> STACK Bio::Tree::Node::ancestor Bio/Tree/Node.pm:346
>> STACK Bio::Taxon::ancestor Bio/Taxon.pm:466
>> STACK Bio::Tree::Tree::cleanup_tree Bio/Tree/Tree.pm:325
>> STACK Bio::Root::Root::DESTROY Bio/Root/Root.pm:409
>> STACK (eval) taxonomy2tree.pl:0
>> STACK toplevel taxonomy2tree.pl:0
>> Any clues? Thanks,
>
> Are you able to narrow the problem down? What was your command  
> line, what species were you using? Does it work with the first 110  
> species you tried? Is there anything special about the 111th?
>
> Do I understand correctly that this was a problem during cleanup  
> only, and didn't affect the correctness and completeness of the  
> result?


From cjfields at uiuc.edu  Thu Dec  7 10:22:47 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 7 Dec 2006 09:22:47 -0600
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on
	110 species
In-Reply-To: <F5C5C9A8-D444-4994-9769-AC5DE68F4A39@lsi.upc.edu>
Message-ID: <000001c71a13$8feec840$15327e82@pyrimidine>

> Hi,
> 
> If you run the attached shell script you should be able to 
> reproduce the problem. It is not about any species in 
> particular, but about the total number of species: it crashes 
> with more than 120 species. The resulting tree is not 
> correct, I'm checking it further now. Thanks,
> 
> Gabriel

Gabriel, 

My guess is this may have to do with using an old taxonomy dump file.  I got
this to work on winXP using the latest NCBI taxonomy.  I had to modify
taxonomy2tree and your shell script to get it to play nice with Windows, but
I didn't get the error and I did get a tree (abbreviated):

(((((("Agrobacterium tumefaciens str. C58","Sinorhizobium meliloti")Rhizobiaceae,...

chris


From cjfields at uiuc.edu  Thu Dec  7 13:44:32 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 7 Dec 2006 12:44:32 -0600
Subject: [Bioperl-l] different syntaxes for SeqI constructor and Factory
	constructor?
In-Reply-To: <BAY110-F30C26DE384E916A297FA86C7DC0@phx.gbl>
References: <BAY110-F30C26DE384E916A297FA86C7DC0@phx.gbl>
Message-ID: <7513E9D5-E055-4EBE-B8CF-538A8DEDB8E9@uiuc.edu>


On Dec 6, 2006, at 9:13 PM, zhihua li wrote:

> Hi netters,
>
> Recently I found this:
>
> For constructing a new SeqI object, I had to write:
> $seq_obj=Bio::SeqIO->new(
>      -file => '/home/myfile',
>      -format => 'Fasta');    # Note the dash before the two arguments.
>
> If I omitted the dash:
> $seq_obj=Bio::SeqIO->new(
>     file => '/home/myfile',
>     format => 'Fasta');
> I'd get error:
> MSG: Unknown format given or could not determine it []
> STACK Bio::SeqIO::new /usr/lib/perl5/site_perl/5.8.7/Bio/SeqIO.pm:377
>
> So it seems to me that the dashes before the arguments are  
> essential.  However, when I tried to build a factory for  
> StandaloneBlast, I found the other way around.
>
> If the script had the dash:
> $blast_obj=Bio::Tools::Run::StandAloneBlast->new(
>             -program => 'blastn',
>             -database => '/home/mydatabase');
>
> I'd get the error message: MSG: Unallowed parameter: - !
> STACK Bio::Tools::Run::StandAloneBlast::AUTOLOAD /usr/lib/perl5/site_perl/5.8.7/Bio/Tools/Run/StandAloneBlast.pm:433
> STACK Bio::Tools::Run::StandAloneBlast::new /usr/lib/perl5/site_perl/5.8.7/Bio/Tools/Run/StandAloneBlast.pm:400
>
> If I left out the dash by saying:
> $blast_obj=Bio::Tools::Run::StandAloneBlast->new(
>             program => 'blastn',
>             database => '/home/mydatabase');
>
> Everything is fine.
>
> Now I'm confused. Why do I sometimes have to add the dash, while
> other times I'm not allowed to?
>
> Thanks in advance!

I agree that this should be more consistent.  Does anyone know the  
reasoning for this?
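The behaviour in the question can be illustrated with a small, self-contained Perl sketch. Many BioPerl constructors parse named arguments through `_rearrange` in Bio::Root::RootI; as we understand it, that routine strips the leading dashes and upper-cases the keys, but only when the first argument starts with a dash, otherwise it assumes positional arguments and returns the list as given. Bio::SeqIO's constructor may instead look up the literal `-format` and `-file` keys, which would be consistent with undashed `format`/`file` producing the "Unknown format" error. The function `rearrange_sketch` below is a hypothetical re-implementation of the `_rearrange` idea, not the BioPerl code itself.

```perl
use strict;
use warnings;

# Hypothetical stand-alone sketch of dash-style named-argument parsing,
# modelled loosely on the idea behind Bio::Root::RootI::_rearrange.
# $order is an array ref listing the parameter names, in the order the
# caller wants the values returned.
sub rearrange_sketch {
    my ($order, @args) = @_;

    # No leading dash on the first argument: treat the list as positional
    # and hand it back unchanged.
    return @args unless defined $args[0] && $args[0] =~ /^-/;

    my %param;
    while (@args) {
        my $key   = shift @args;
        my $value = shift @args;
        $key =~ s/^-//;                # strip the leading dash
        $param{ uc($key) } = $value;   # compare names case-insensitively
    }
    return @param{ map { uc($_) } @$order };
}

# Dashed style: values come back in the requested order.
my ($file, $format) = rearrange_sketch([qw(file format)],
    -file => '/home/myfile', -format => 'Fasta');
print "$file $format\n";    # prints "/home/myfile Fasta"

# Undashed style: the raw list is returned, so the first "value" is the
# literal string 'file'.
my @raw = rearrange_sketch([qw(file format)],
    file => '/home/myfile', format => 'Fasta');
print "@raw\n";             # prints "file /home/myfile format Fasta"
```

With the undashed call, a caller expecting `($file, $format)` would silently receive `'file'` and `'/home/myfile'`, which is why mixing the two styles fails in confusing ways rather than with a clear error.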

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From bosborne11 at verizon.net  Thu Dec  7 14:32:21 2006
From: bosborne11 at verizon.net (Brian Osborne)
Date: Thu, 07 Dec 2006 14:32:21 -0500
Subject: [Bioperl-l] different syntaxes for SeqI constructor and Factory
 constructor?
In-Reply-To: <7513E9D5-E055-4EBE-B8CF-538A8DEDB8E9@uiuc.edu>
Message-ID: <C19DD675.BD72%bosborne11@verizon.net>

Chris,

The latest StandAloneBlast takes "dashed parameters", as in:

 my @params = (-database => 'swissprot', -outfile => 'blast1.out');
 my $factory = Bio::Tools::Run::StandAloneBlast->new(@params);

Or

 my $factory = Bio::Tools::Run::StandAloneBlast->new(-program =>"wublastp",
                                                     -database=>"swissprot",
                                                     -e => 1e-20);

So that's why I asked "what version?"

Someone made the change to allow dashes in @params a few months ago and I
believe that that someone was you!

Brian O.


On 12/7/06 1:44 PM, "Chris Fields" <cjfields at uiuc.edu> wrote:

> 
> On Dec 6, 2006, at 9:13 PM, zhihua li wrote:
> 
>> Hi netters,
>> 
>> Recently I found this:
>> 
>> For constructing a new SeqI object, I had to write:
>> $seq_obj=Bio::SeqIO->new(
>>      -file => '/home/myfile',
>>      -format => 'Fasta');    # Note the dash before the two arguments.
>> 
>> If I omitted the dash:
>> $seq_obj=Bio::SeqIO->new(
>>     file => '/home/myfile',
>>     format => 'Fasta');
>> I'd get error:
>> MSG: Unknown format given or could not determine it []
>> STACK Bio::SeqIO::new /usr/lib/perl5/site_perl/5.8.7/Bio/SeqIO.pm:377
>> 
>> So it seems to me that the dashes before the arguments are
>> essential.  However, when I tried to build a factory for
>> StandaloneBlast, I found the other way around.
>> 
>> If the script had the dash:
>> $blast_obj=Bio::Tools::Run::StandAloneBlast->new(
>>             -program => 'blastn',
>>             -database => '/home/mydatabase');
>> 
>> I'd get the error message: MSG: Unallowed parameter: - !
>> STACK Bio::Tools::Run::StandAloneBlast::AUTOLOAD /usr/lib/perl5/site_perl/5.8.7/Bio/Tools/Run/StandAloneBlast.pm:433
>> STACK Bio::Tools::Run::StandAloneBlast::new /usr/lib/perl5/site_perl/5.8.7/Bio/Tools/Run/StandAloneBlast.pm:400
>> 
>> If I left out the dash by saying:
>> $blast_obj=Bio::Tools::Run::StandAloneBlast->new(
>>             program => 'blastn',
>>             database => '/home/mydatabase');
>> 
>> Everything is fine.
>> 
>> Now I'm confused. Why do I sometimes have to add the dash, while
>> other times I'm not allowed to?
>> 
>> Thanks in advance!
> 
> I agree that this should be more consistent.  Does anyone know the
> reasoning for this?
> 
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
> 
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at uiuc.edu  Thu Dec  7 14:44:19 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 7 Dec 2006 13:44:19 -0600
Subject: [Bioperl-l] different syntaxes for SeqI constructor and Factory
	constructor?
In-Reply-To: <C19DD675.BD72%bosborne11@verizon.net>
References: <C19DD675.BD72%bosborne11@verizon.net>
Message-ID: <A12BC418-6400-46FC-8383-66E21D997E56@uiuc.edu>


On Dec 7, 2006, at 1:32 PM, Brian Osborne wrote:

> Chris,
>
> The latest StandAloneBlast takes "dashed parameters", as in:
>
>  @params = (-database => 'swissprot',-outfile => 'blast1.out');
>  $factory = Bio::Tools::Run::StandAloneBlast->new(@params);
>
> Or
>
>  my $factory = Bio::Tools::Run::StandAloneBlast->new(-program =>"wublastp",
>                                                      -database=>"swissprot",
>                                                      -e => 1e-20);
>
> So that's why I asked "what version?"
>
> Someone made the change to allow dashes in @params a few months ago and I
> believe that that someone was you!
>
> Brian O.

Nope, I plead innocent (at least to this!).  I haven't made any  
commits to StandAloneBlast.  These were added in by Torsten (see  
commits 1.59, 1.60), so you'll need to blame/thank him...

http://tinyurl.com/y7ym9g

So they're now a bit more consistent.  That's not to say  
StandAloneBlast doesn't need some major revisions....
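On the consistency point, one way a module could accept both spellings is to normalise its keys before using them. The sketch below uses a hypothetical `normalize_args` helper (not part of BioPerl or StandAloneBlast) that drops an optional leading dash and lower-cases each key, so `-program => 'blastn'` and `program => 'blastn'` end up identical.

```perl
use strict;
use warnings;

# Hypothetical helper a module author could call at the top of new(): it
# accepts either '-program' or 'program' style keys and returns a hash ref
# keyed by plain lower-case names.
sub normalize_args {
    my (@args) = @_;
    die "odd number of arguments\n" if @args % 2;
    my %clean;
    while (my ($key, $value) = splice(@args, 0, 2)) {
        (my $name = lc $key) =~ s/^-//;   # '-Program' -> 'program'
        $clean{$name} = $value;
    }
    return \%clean;
}

my $dashed   = normalize_args(-program => 'blastn', -database => '/home/mydatabase');
my $undashed = normalize_args(program  => 'blastn', database  => '/home/mydatabase');
print join(',', map { "$_=$dashed->{$_}" } sort keys %$dashed), "\n";
# prints "database=/home/mydatabase,program=blastn" (same for $undashed)
```

Rejecting an odd-length argument list up front gives the caller a clear error instead of a mis-paired key/value list.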

BTW, I didn't see a post from you asking about the version.

Chris

From akarger at CGR.Harvard.edu  Thu Dec  7 16:32:51 2006
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Thu, 7 Dec 2006 16:32:51 -0500
Subject: [Bioperl-l] Using frame info from GFF in getting a Seq->spliced_seq
Message-ID: <B9182BFF5B004245BABC12956EA6322E022BE901@huls5.nucleus.harvard.edu>

I need to know how to get the frame information in exon features
(created by Bio::Tools::GFF) into a whole-gene feature that will be
translated into a protein.

I'm reading in some fungal GFFs generated by Jason Stajich. I

- Use Bio::Tools::GFF to create a feature for each exon in a gene
- Create a Bio::Location::Split object containing each feature's
location
- Create a Bio::SeqFeature::Generic object whose location is the above
BL::Split
- Attach my contig Bio::Seq to the feature
- get the protein with feature->spliced_seq->translate->seq

(Code below)

Unfortunately, I get the wrong result when the GFF features have frame
!= 0. This happens for only a few percent of the exons, but when it
does, I end up translating in the wrong frame.

If I read the docs correctly, Location objects don't have a frame. So
how do I get the correct spliced_seq, which skips one or two bp at the
beginning of certain exons?

I suspect the answer to this is that I'm going about this in completely
the wrong way, in which case, please tell me how I ought to be doing it.

Thanks,
- Amir Karger
Research Computing
Life Sciences Division
Harvard University

P.S. In case you want to see actual code, here it is. After using
Bio::Tools::GFF to create a sorted list of features for each exon
(basically stolen from the module POD), I:
    # Create a new object representing the exons' gene
    my $coding_loc_obj = new Bio::Location::Split;
    foreach my $exon (@sorted_exons) {
        $coding_loc_obj->add_sub_Location($exon->location);
    }

    # Build a spliced feature representing the whole gene
    my $spliced_feat = new Bio::SeqFeature::Generic(
        -start  => $coding_loc_obj->start,
        -end    => $coding_loc_obj->end,
        -strand => $strand_num,
        -primary=> "splicedGene",
    );
    $spliced_feat->location($coding_loc_obj);

    # Attach a contig object containing the sequence
    $spliced_feat->attach_seq($contig_obj->bioperl_object);

    # Get the spliced seq and translate to protein:
    my $coding_seq = $spliced_feat->spliced_seq->seq;
    my $protein = $spliced_feat->spliced_seq->translate->seq;


From bix at sendu.me.uk  Thu Dec  7 17:45:32 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 7 Dec 2006 15:45:32 -0700
Subject: [Bioperl-l] [Bioperl-announce-l] Bioperl 1.5.2 Release
Message-ID: <000001c71a51$671a79d0$6400a8c0@CodonSolutions.local>

I am proud to announce the final release of Bioperl 1.5.2.

http://www.bioperl.org/wiki/Release_1.5.2

bioperl (core):
cpan>install S/SE/SENDU/bioperl-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-1.5.2_100.tar.bz2
http://bioperl.org/DIST/bioperl-1.5.2_100.zip

bioperl-run:
cpan>install S/SE/SENDU/bioperl-run-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-run-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-run-1.5.2_100.tar.bz2
http://bioperl.org/DIST/bioperl-run-1.5.2_100.zip

bioperl-db:
cpan>install S/SE/SENDU/bioperl-db-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-db-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-db-1.5.2_100.tar.bz2
http://bioperl.org/DIST/bioperl-db-1.5.2_100.zip

bioperl-network:
cpan>install S/SE/SENDU/bioperl-network-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-network-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-network-1.5.2_100.tar.bz2
http://bioperl.org/DIST/bioperl-network-1.5.2_100.zip

http://bioperl.org/DIST/SIGNATURES.md5

(all are also available via CVS, and for Windows users, using the Perl 
Package Manager - see the wiki for details)

The other bioperl packages (bioperl-ext, bioperl-gui, bioperl-pedigree 
and bioperl-pipeline) did not see a unified release for 1.5.2.


This release represents a developer release which has been thoroughly
tested. We consider it the most stable (in terms of bugs) version of 
Bioperl and believe it to be suitable for most people. It is marked 
'developer' or even 'unstable' because its API may change on short 
notice. It will also not be maintained or supported beyond the next 
bioperl release.

1.5.2 introduces the following new (core) features:

  * Taxonomy (Bio::Species) overhaul
  * Bio::Map improvements
  * Bio::SearchIO speedup
  * Build.PL installation

For details, and a complete change log, see the wiki.

API documentation is available here: http://doc.bioperl.org/


Acknowledgements:
Innumerable thanks are due for the tireless efforts of Christopher Fields 
(bug fixing, testing, documentation, discussion), Nathan Haigh 
(Windows & prerequisite issues, testing) and Mauricio Herrera Cuadra 
(testing, documentation, support). Feedback and ideas provided by Hilmar 
Lapp, Jason Stajich, Torsten Seemann and others on the mailing list and 
elsewhere proved invaluable. None of this would have been possible 
without the behind-the-scenes work of the open-bio support team. I'd 
also like to acknowledge Andreas J. Koenig for his help with CPAN matters.

Finally, thank you to everyone who tried out the release candidates, and 
especially those that took the time to file bug reports or report problems.


Remember, Bioperl can only go from strength to strength with /your/ 
help. If you'd like to experience the fame and fortune that naturally 
follow becoming a Bioperl developer (?!), become one!
http://www.bioperl.org/wiki/Becoming_a_developer

On behalf of the Bioperl team,
Sendu Bala.
_______________________________________________
Bioperl-announce-l mailing list
Bioperl-announce-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-announce-l


From cjfields at uiuc.edu  Thu Dec  7 18:00:43 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 7 Dec 2006 16:00:43 -0700
Subject: [Bioperl-l] [Bioperl-announce-l]  Bioperl 1.5.2 Release
In-Reply-To: <457746B5.2020006@sendu.me.uk>
Message-ID: <000001c71a53$85cb4f10$6400a8c0@CodonSolutions.local>

Great job Sendu!  

A bit of icing on the cake: all the WinXP PPMs (core, db, network, run)
installed w/o a hitch following normal instructions using PPM4 (GUI and
command line shell) using clean ActiveState installations.  Looks like all
the correct prereqs were installed with shell (only XML::SAX::ExpatXS was
left out in the GUI installation for reasons outlined before).  

I'll run more tests tomorrow to see if tests pass with the installed bioperl
(this should catch any prereq issues with PPM installation we missed).

chris

> I am proud to announce the final release of Bioperl 1.5.2.
> 
> http://www.bioperl.org/wiki/Release_1.5.2
> 
> bioperl (core):
> cpan>install S/SE/SENDU/bioperl-1.5.2_100.tar.gz
> http://bioperl.org/DIST/bioperl-1.5.2_100.tar.gz
> http://bioperl.org/DIST/bioperl-1.5.2_100.tar.bz2
> http://bioperl.org/DIST/bioperl-1.5.2_100.zip
> 
> bioperl-run:
> cpan>install S/SE/SENDU/bioperl-run-1.5.2_100.tar.gz
> http://bioperl.org/DIST/bioperl-run-1.5.2_100.tar.gz
> http://bioperl.org/DIST/bioperl-run-1.5.2_100.tar.bz2
> http://bioperl.org/DIST/bioperl-run-1.5.2_100.zip
> 
> bioperl-db:
> cpan>install S/SE/SENDU/bioperl-db-1.5.2_100.tar.gz
> http://bioperl.org/DIST/bioperl-db-1.5.2_100.tar.gz
> http://bioperl.org/DIST/bioperl-db-1.5.2_100.tar.bz2
> http://bioperl.org/DIST/bioperl-db-1.5.2_100.zip
> 
> bioperl-network:
> cpan>install S/SE/SENDU/bioperl-network-1.5.2_100.tar.gz
> http://bioperl.org/DIST/bioperl-network-1.5.2_100.tar.gz
> http://bioperl.org/DIST/bioperl-network-1.5.2_100.tar.bz2
> http://bioperl.org/DIST/bioperl-network-1.5.2_100.zip
> 
> http://bioperl.org/DIST/SIGNATURES.md5
> 
> (all are also available via CVS, and for Windows users, using 
> the Perl Package Manager - see the wiki for details)
> 
> The other bioperl packages (bioperl-ext, bioperl-gui, 
> bioperl-pedigree and bioperl-pipeline) did not see a unified 
> release for 1.5.2.
> 
> 
> 
> This release represents a developer release which has been thoroughly
> tested. We consider it the most stable (in terms of bugs) version of 
> Bioperl and believe it to be suitable for most people. It is marked 
> 'developer' or even 'unstable' because its API may change on short 
> notice. It will also not be maintained or supported beyond the next 
> bioperl release.
> 
> 1.5.2 introduces the following new (core) features:
> 
>   * Taxonomy (Bio::Species) overhaul
>   * Bio::Map improvements
>   * Bio::SearchIO speedup
>   * Build.PL installation
> 
> For details, and a complete change log, see the wiki.
> 
> API documentation is available here: http://doc.bioperl.org/
> 
> 
> Acknowledgements:
> Innumerable thanks are due for the tireless efforts of Christopher Fields 
> (bug fixing, testing, documentation, discussion), Nathan Haigh 
> (Windows & prerequisite issues, testing) and Mauricio Herrera Cuadra 
> (testing, documentation, support). Feedback and ideas provided by Hilmar 
> Lapp, Jason Stajich, Torsten Seemann and others on the mailing list and 
> elsewhere proved invaluable. None of this would have been possible 
> without the behind-the-scenes work of the open-bio support team. I'd 
> also like to acknowledge Andreas J. Koenig for his help with CPAN matters.
> 
> Finally, thank you to everyone who tried out the release candidates, and 
> especially those that took the time to file bug reports or report problems.
> 
> 
> Remember, Bioperl can only go from strength to strength with /your/ 
> help. If you'd like to experience the fame and fortune that naturally 
> follow becoming a Bioperl developer (?!), become one!
> http://www.bioperl.org/wiki/Becoming_a_developer
> 
> On behalf of the Bioperl team,
> Sendu Bala.



From kaboroev at sfu.ca  Thu Dec  7 17:26:35 2006
From: kaboroev at sfu.ca (Keith Anthony Boroevich)
Date: Thu, 07 Dec 2006 14:26:35 -0800
Subject: [Bioperl-l] Bio::Graphics xyplot
Message-ID: <4578951B.5050206@sfu.ca>

Hi everyone,

I'm attempting to add an xyplot of the phred quality scores to an
Bio::Graphics image, and cannot get it to work.
I have the panel with a track for both the scale and the DNA displaying
properly.  When I attempt to add the xyplot I just get a garbled track
of what look like tiny xyplots, one for each data point.  I am running
bioperl-live from CVS (updated today).  I think what I am missing is
the creation of a "Sequence Feature Group" to hold the individual points
of the plot.  However, I cannot seem to find such an object. This is
what I attempted:

-------BEGIN---CODE-----------
# start panel
my $panel = Bio::Graphics::Panel->new(-length    => $f_seqlen,
                      -width     => $f_seqlen*10,
                      -pad_left  => 10,
                      -pad_right => 10,
                      -grid      => 1
                      );
# add scale
$panel->add_track(arrow => Bio::SeqFeature::Generic->new(-start=>1,-end=>$f_seqlen),
              -double  => 1,
              -tick    => 2,
              -fgcolor => 'black');
# add DNA ($feature is of type Bio::SeqFeature::Annotated)
$panel->add_track(dna => $feature);
# get list of quality scores from database
my ($pqs_value) = $dbh->selectrow_array($sql);
my @pqs_value = split(/\s/,$pqs_value);
# create track
my $track =  $panel->add_track(-glyph        => 'xyplot',
                   -graph_type   => 'points',
                   -point_symbol => 'point',
                   -max_score    => 100,
                   -min_score    => 0,
                   -scale        => 'none');
# add "subfeatures" to the track
for (my $i=0;$i<$f_seqlen;$i++) {
    $track->add_feature(Bio::SeqFeature::Generic->new(-start=>$i,-end=>$i,-score=>$pqs_value[$i]));
}
print $panel->png();
$panel->finished;
------END---CODE----------

I also attempted to create an array of the point features and passed
that by reference to the panel "add_track" as it describes in the xyplot
documentation, but that resulted in the exact same image.
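Keith's guess about a feature group seems to point in the right direction: as we read the xyplot glyph documentation, it plots the scores of the subfeatures of a single feature, so adding one feature per base to the track would plausibly yield one tiny plot per base. Below is a minimal, BioPerl-independent sketch of the data-preparation step; `quality_points` is a hypothetical helper. BioPerl coordinates are 1-based, so the first quality value belongs at position 1.

```perl
use strict;
use warnings;

# Turn a whitespace-separated string of phred scores into one
# [start, end, score] triple per base, using 1-based coordinates.
sub quality_points {
    my ($qual_string) = @_;
    # split ' ' (awk-style) skips leading whitespace and treats runs of
    # whitespace as one separator; split /\s/ would yield empty fields.
    my @scores = split ' ', $qual_string;
    return map { [ $_ + 1, $_ + 1, $scores[$_] ] } 0 .. $#scores;
}

my @points = quality_points(" 20 30  40");
printf "%d-%d:%d\n", @$_ for @points;   # prints 1-1:20, 2-2:30, 3-3:40
```

Each triple could then become a Bio::SeqFeature::Generic with `-start`, `-end` and `-score`, attached to one parent Bio::SeqFeature::Generic via `add_SeqFeature` (or `add_sub_SeqFeature`, depending on the BioPerl version), and the parent passed in a single `add_track` call with `-glyph => 'xyplot'`. That wiring is an untested assumption.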

keith

-- 
 ><)))?> -cGRASP- <?(((><
 Keith Anthony Boroevich
 Davidson Lab
 Dept of Molecular Biology
 Simon Fraser University
 Tel: 604-268-7276


From arareko at campus.iztacala.unam.mx  Thu Dec  7 18:15:53 2006
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Thu, 7 Dec 2006 16:15:53 -0700
Subject: [Bioperl-l] [Bioperl-announce-l]  Bioperl 1.5.2 Release
In-Reply-To: <457746B5.2020006@sendu.me.uk>
References: <457746B5.2020006@sendu.me.uk>
Message-ID: <000001c71a55$a479da60$6400a8c0@CodonSolutions.local>

This has been a great effort. Congrats and thanks to everyone involved!

Mauricio.

Sendu Bala wrote:
> I am proud to announce the final release of Bioperl 1.5.2.
> 
> http://www.bioperl.org/wiki/Release_1.5.2
> 
> bioperl (core):
> cpan>install S/SE/SENDU/bioperl-1.5.2_100.tar.gz
> http://bioperl.org/DIST/bioperl-1.5.2_100.tar.gz
> http://bioperl.org/DIST/bioperl-1.5.2_100.tar.bz2
> http://bioperl.org/DIST/bioperl-1.5.2_100.zip
> 
> bioperl-run:
> cpan>install S/SE/SENDU/bioperl-run-1.5.2_100.tar.gz
> http://bioperl.org/DIST/bioperl-run-1.5.2_100.tar.gz
> http://bioperl.org/DIST/bioperl-run-1.5.2_100.tar.bz2
> http://bioperl.org/DIST/bioperl-run-1.5.2_100.zip
> 
> bioperl-db:
> cpan>install S/SE/SENDU/bioperl-db-1.5.2_100.tar.gz
> http://bioperl.org/DIST/bioperl-db-1.5.2_100.tar.gz
> http://bioperl.org/DIST/bioperl-db-1.5.2_100.tar.bz2
> http://bioperl.org/DIST/bioperl-db-1.5.2_100.zip
> 
> bioperl-network:
> cpan>install S/SE/SENDU/bioperl-network-1.5.2_100.tar.gz
> http://bioperl.org/DIST/bioperl-network-1.5.2_100.tar.gz
> http://bioperl.org/DIST/bioperl-network-1.5.2_100.tar.bz2
> http://bioperl.org/DIST/bioperl-network-1.5.2_100.zip
> 
> http://bioperl.org/DIST/SIGNATURES.md5
> 
> (all are also available via CVS, and for Windows users, using the Perl 
> Package Manager - see the wiki for details)
> 
> The other bioperl packages (bioperl-ext, bioperl-gui, bioperl-pedigree 
> and bioperl-pipeline) did not see a unified release for 1.5.2.
> 
> 
> 
> This release represents a developer release which has been thoroughly
> tested. We consider it the most stable (in terms of bugs) version of 
> Bioperl and believe it to be suitable for most people. It is marked 
> 'developer' or even 'unstable' because its API may change on short 
> notice. It will also not be maintained or supported beyond the next 
> bioperl release.
> 
> 1.5.2 introduces the following new (core) features:
> 
>   * Taxonomy (Bio::Species) overhaul
>   * Bio::Map improvements
>   * Bio::SearchIO speedup
>   * Build.PL installation
> 
> For details, and a complete change log, see the wiki.
> 
> API documentation is available here: http://doc.bioperl.org/
> 
> 
> Acknowledgements:
> Enumerable thanks are due for the tireless efforts of Christopher Fields 
> (bug fixing, testing, documentation, discussion), Nathan Haigh 
> (Windows&pre-requisite issues, testing) and Mauricio Herrera Cuadra 
> (testing, documentation, support). Feedback and ideas provided by Hilmar 
> Lapp, Jason Stajich, Torsten Seemann and others on the mailing list and 
> elsewhere proved invaluable. None of this would have been possible 
> without the behind-the-scenes work of the open-bio support team. I'd 
> also like to acknowledge Andreas J. Koenig for his help with CPAN matters.
> 
> Finally, thank you to everyone who tried out the release candidates, and 
> especially those that took the time to file bug reports or report problems.
> 
> 
> Remember, Bioperl can only go from strength to strength with /your/ 
> help. If you'd like to experience the fame and fortune that naturally 
> follow becoming a Bioperl developer (?!), become one!
> http://www.bioperl.org/wiki/Becoming_a_developer
> 
> On behalf of the Bioperl team,
> Sendu Bala.

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM



From cain at cshl.edu  Thu Dec  7 17:46:09 2006
From: cain at cshl.edu (Scott Cain)
Date: Thu, 07 Dec 2006 17:46:09 -0500
Subject: [Bioperl-l] Using frame info from GFF in getting
	a	Seq->spliced_seq
In-Reply-To: <B9182BFF5B004245BABC12956EA6322E022BE901@huls5.nucleus.harvard.edu>
References: <B9182BFF5B004245BABC12956EA6322E022BE901@huls5.nucleus.harvard.edu>
Message-ID: <1165531569.2569.49.camel@localhost.localdomain>

Amir,

I don't know for sure what the problem is, but here is one possibility:
the number in column 8 of a GFF file is not the frame, it is the phase.
See the GFF3 spec for a description of what the phase is:

  http://www.sequenceontology.org/gff3.shtml

(It doesn't matter if you are using GFF3 or GFF2, as the phase is the
same in both).
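To make the phase point concrete: in GFF3 the phase of a CDS line is the number of bases to remove from the start of that feature to reach the first base of the next codon. For internal exons those bases complete a codon begun in the previous exon, so they should not be skipped when exons are concatenated; only the phase of the first CDS segment (in transcription order) trims the spliced sequence. The sketch uses two hypothetical helpers: `expected_phases` derives the phase each segment should carry from the segment lengths (a mismatch with the file could point to a problem in the annotation), and `trim_for_translation` applies the first segment's phase to a spliced string.

```perl
use strict;
use warnings;

# Given the phase of the first CDS segment and the lengths of all CDS
# segments in transcription order, compute the phase GFF3 implies for
# each segment.
sub expected_phases {
    my ($first_phase, @lengths) = @_;
    my @phases = ($first_phase);
    for my $i (0 .. $#lengths - 1) {
        # bases left over at the end of segment $i after whole codons
        my $leftover = ($lengths[$i] - $phases[$i]) % 3;
        # the next segment must first supply the bases that finish that codon
        push @phases, (3 - $leftover) % 3;
    }
    return @phases;
}

# Trim a spliced CDS string so it starts on a codon boundary and ends on a
# whole codon; only the phase of the first segment is needed for this.
sub trim_for_translation {
    my ($spliced, $first_phase) = @_;
    my $cds = substr($spliced, $first_phase);
    return substr($cds, 0, length($cds) - length($cds) % 3);
}

print join(',', expected_phases(0, 10, 8, 9)), "\n";   # prints "0,2,0"
print trim_for_translation('CATGAAATAGT', 1), "\n";    # prints "ATGAAATAG"
```

With BioPerl objects the same trim could presumably be done on the spliced sequence with `trunc()` before calling `translate`; treat that as an untested suggestion.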

Scott


On Thu, 2006-12-07 at 16:32 -0500, Amir Karger wrote:
> I need to know how to get the frame information in exon features
> (created by Bio::Tools::GFF) into a whole-gene feature that will be
> translated into a protein.
> 
> I'm reading in some fungal GFFs generated by Jason Stajich. I
> 
> - Use Bio::Tools::GFF to create a feature for each exon in a gene
> - Create a Bio::Location::Split object containing each feature's
> location
> - Create a Bio::SeqFeature::Generic object whose location is the above
> BL::Split
> - Attach my contig Bio::Seq to the feature
> - get the protein with feature->spliced_seq->translate->seq
> 
> (Code below)
> 
> Unfortunately, I get the wrong result when the GFF features have frame
> != 0. This happens for only a few percent of the exons, but when it
> does, I end up translating in the wrong frame.
> 
> If I read the docs correctly, Location objects don't have a frame. So
> how do I get the correct spliced_seq, which skips one or two bp at the
> beginning of certain exons?
> 
> I suspect the answer to this is that I'm going about this in completely
> the wrong way, in which case, please tell me how I ought to be doing it.
> 
> Thanks,
> - Amir Karger
> Research Computing
> Life Sciences Division
> Harvard University
> 
> P.S. In case you want to see actual code, here it is. After using
> Bio::Tools::GFF to create a sorted list of features for each exon
> (basically stolen from the module POD), I:
>     # Create a new object representing the exons' gene
>     my $coding_loc_obj = new Bio::Location::Split;
>     foreach my $exon (@sorted_exons) {
>         $coding_loc_obj->add_sub_Location($exon->location);
>     }
> 
>     # Build a spliced feature representing the whole gene
>     my $spliced_feat = new Bio::SeqFeature::Generic(
>         -start  => $coding_loc_obj->start,
>         -end    => $coding_loc_obj->end,
>         -strand => $strand_num,
>         -primary=> "splicedGene",
>     );
>     $spliced_feat->location($coding_loc_obj);
> 
>     # Attach a contig object containing the sequence
>     $spliced_feat->attach_seq($contig_obj->bioperl_object);
> 
>     # Get the spliced seq and translate to protein:
>     my $coding_seq = $spliced_feat->spliced_seq->seq;
>     my $protein = $spliced_feat->spliced_seq->translate->seq;
> 
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From cjfields at uiuc.edu  Thu Dec  7 21:52:47 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 7 Dec 2006 20:52:47 -0600
Subject: [Bioperl-l] Using frame info from GFF in
	getting a Seq->spliced_seq
In-Reply-To: <1165531569.2569.49.camel@localhost.localdomain>
Message-ID: <002d01c71a73$f16ecc40$15327e82@pyrimidine>

Another issue is that splittype() is not defined, though I don't think that
would kill anything as currently implemented.  However, one thing we have
briefly discussed is having Bio::Location::Split objects exhibit
different (but expected) behaviors based on the splittype() (order, join,
or bond).  It's one of the things I want to work out for the next release.

If Scott's fix doesn't work and the problem persists, you should file a bug
report with some sample data for us to test out.

chris

> Amir,
> 
> I don't know for sure what the problem is, but here is one possibility:
> the number in column 8 of a GFF file is not the frame, it is the phase.
> See the GFF3 spec for a description of what the phase is:
> 
>   http://www.sequenceontology.org/gff3.shtml
> 
> (It doesn't matter if you are using GFF3 or GFF2, as the phase is the
> same in both).
> 
> Scott
> 
> 
> On Thu, 2006-12-07 at 16:32 -0500, Amir Karger wrote:
> > I need to know how to get the frame information in exon features 
> > (created by Bio::Tools::GFF) into a whole-gene feature that will be 
> > translated into a protein.
> > 
> > I'm reading in some fungal GFFs generated by Jason Stajich. I
> > 
> > - Use Bio::Tools::GFF to create a feature for each exon in a gene
> > - Create a Bio::Location::Split object containing each feature's 
> > location
> > - Create a Bio::SeqFeature::Generic object whose location 
> is the above 
> > BL::Split
> > - Attach my contig Bio::Seq to the feature
> > - get the protein with feature->spliced_seq->translate->seq
> > 
> > (Code below)
> > 
> > Unfortunately, I get the wrong result when the GFF features 
> have frame 
> > != 0. This happens for only a few percent of the exons, but when it 
> > does, I end up translating in the wrong frame.
> > 
> > If I read the docs correctly, Location objects don't have a frame. So
> > how do I get the correct spliced_seq, which skips one or two bp at the
> > beginning of certain exons?
> > 
> > I suspect the answer to this is that I'm going about this in
> > completely the wrong way, in which case, please tell me how I ought
> > to be doing it.
> > 
> > Thanks,
> > - Amir Karger
> > Research Computing
> > Life Sciences Division
> > Harvard University
> > 
> > P.S. In case you want to see actual code, here it is. After using 
> > Bio::Tools::GFF to create a sorted list of features for each exon 
> > (basically stolen from the module POD), I:
> >     # Create a new object representing the exons' gene
> >     my $coding_loc_obj = new Bio::Location::Split;
> >     foreach my $exon (@sorted_exons) {
> >         $coding_loc_obj->add_sub_Location($exon->location);
> >     }
> > 
> >     # Build a spliced feature representing the whole gene
> >     my $spliced_feat = new Bio::SeqFeature::Generic(
> >         -start  => $coding_loc_obj->start,
> >         -end    => $coding_loc_obj->end,
> >         -strand => $strand_num,
> >         -primary=> "splicedGene",
> >     );
> >     $spliced_feat->location($coding_loc_obj);
> > 
> >     # Attach a contig object containing the sequence
> >     $spliced_feat->attach_seq($contig_obj->bioperl_object);
> > 
> >     # Get the spliced seq and translate to protein:
> >     my $coding_seq = $spliced_feat->spliced_seq->seq;
> >     my $protein = $spliced_feat->spliced_seq->translate->seq;


From jason at bioperl.org  Thu Dec  7 21:01:33 2006
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 7 Dec 2006 18:01:33 -0800
Subject: [Bioperl-l] Using frame info from GFF in getting a
	Seq->spliced_seq
In-Reply-To: <B9182BFF5B004245BABC12956EA6322E022BE901@huls5.nucleus.harvard.edu>
References: <B9182BFF5B004245BABC12956EA6322E022BE901@huls5.nucleus.harvard.edu>
Message-ID: <866F6CEE-62BB-4880-9B13-6DDE29EAF94E@bioperl.org>

I suspect this was a problem in the gene prediction output; more
recent versions of the program should have fixed it.  I do not
currently have free time to deal with the errors in the small number
of ORFs where this has happened.

I think you just need to do
  start -= start- (frame*strand)
for 1st exons.

Another possibility is to provide the 1st exon's frame to the translate
function, but depending on your downstream analyses you should try to
get the CDS correct first.

-jason
On Dec 7, 2006, at 1:32 PM, Amir Karger wrote:

> I need to know how to get the frame information in exon features
> (created by Bio::Tools::GFF) into a whole-gene feature that will be
> translated into a protein.
>
> I'm reading in some fungal GFFs generated by Jason Stajich. I
>
> - Use Bio::Tools::GFF to create a feature for each exon in a gene
> - Create a Bio::Location::Split object containing each feature's
> location
> - Create a Bio::SeqFeature::Generic object whose location is the above
> BL::Split
> - Attach my contig Bio::Seq to the feature
> - get the protein with feature->spliced_seq->translate->seq
>
> (Code below)
>
> Unfortunately, I get the wrong result when the GFF features have frame
> != 0. This happens for only a few percent of the exons, but when it
> does, I end up translating in the wrong frame.
>
> If I read the docs correctly, Location objects don't have a frame. So
> how do I get the correct spliced_seq, which skips one or two bp at the
> beginning of certain exons?
>
> I suspect the answer to this is that I'm going about this in completely
> the wrong way, in which case, please tell me how I ought to be doing it.
>
> Thanks,
> - Amir Karger
> Research Computing
> Life Sciences Division
> Harvard University
>
> P.S. In case you want to see actual code, here it is. After using
> Bio::Tools::GFF to create a sorted list of features for each exon
> (basically stolen from the module POD), I:
>     # Create a new object representing the exons' gene
>     my $coding_loc_obj = new Bio::Location::Split;
>     foreach my $exon (@sorted_exons) {
>         $coding_loc_obj->add_sub_Location($exon->location);
>     }
>
>     # Build a spliced feature representing the whole gene
>     my $spliced_feat = new Bio::SeqFeature::Generic(
>         -start  => $coding_loc_obj->start,
>         -end    => $coding_loc_obj->end,
>         -strand => $strand_num,
>         -primary=> "splicedGene",
>     );
>     $spliced_feat->location($coding_loc_obj);
>
>     # Attach a contig object containing the sequence
>     $spliced_feat->attach_seq($contig_obj->bioperl_object);
>
>     # Get the spliced seq and translate to protein:
>     my $coding_seq = $spliced_feat->spliced_seq->seq;
>     my $protein = $spliced_feat->spliced_seq->translate->seq;
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From neetisomaiya at gmail.com  Fri Dec  8 05:21:50 2006
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Fri, 8 Dec 2006 15:51:50 +0530
Subject: [Bioperl-l] need help with phrap parser
Message-ID: <764978cf0612080221o709514a1rf5f97054c5eabb51@mail.gmail.com>

Can anyone point me to a Phrap parser which parses the ace file to extract
what reads make up each contig (eg. read_a and read_b make contig1; read_d
read_e and read_z make contig2, and other information of the reads (like
whether the read is complemented or not with respect to the contig, what
region of the contig does each read contribute etc), basically the AF and BS
lines of the ACE output.

-- 
-Neeti
Even my blood says, B positive

From pmiguel at purdue.edu  Fri Dec  8 09:17:02 2006
From: pmiguel at purdue.edu (Phillip San Miguel)
Date: Fri, 08 Dec 2006 09:17:02 -0500
Subject: [Bioperl-l] need help with phrap parser
In-Reply-To: <764978cf0612080221o709514a1rf5f97054c5eabb51@mail.gmail.com>
References: <764978cf0612080221o709514a1rf5f97054c5eabb51@mail.gmail.com>
Message-ID: <457973DE.6050900@purdue.edu>

neeti somaiya wrote:
> Can anyone point me to a Phrap parser which parses the ace file to extract
> what reads make up each contig (eg. read_a and read_b make contig1; read_d
> read_e and read_z make contig2, and other information of the reads (like
> whether the read is complemented or not with respect to the contig, what
> region of the contig does each read contribute etc), basically the AF and BS
> lines of the ACE output.
>
>   
neeti,

    To find the reads that went into each contig, you do *not* want the BS tagged records. My understanding is that BS is just what consed uses to populate its consensus line from the ace file. 
I say this because of an email David Gordon sent me in 2001, included
here without his permission:


> > Phrap writes BS lines which
> > indicate, for each consensus position, which read phrap uses at that
> > position to become the consensus.  These BS ("base segments") are 
> > manipulated by Consed when there are changes to the assembly, such as
> > joins, tears, removing reads, or changing the consensus.
>   
    The simplest way is:

egrep '^(CO|AF|RD) ' acefilename

if you are on a Unix system. Or with Perl:

while (<>) {
    # anchor all three tags to the start of the line
    print if /^(?:CO|AF|RD) /;
}

But then you would need to parse the fields of interest. You get the 
position/strand in the contig from AF, then you get the length of the 
read from RD.
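The AF/RD approach described above can be sketched in a few lines. This is a hypothetical Python helper, not part of BioPerl. It assumes the ACE field layout documented for Consed (CO <contig> ..., AF <read> <U|C> <padded start>, RD <read> <padded bases> ...). It reports padded consensus coordinates, so a read's start can fall before position 1.

```python
def parse_ace_reads(lines):
    """Collect read placements per contig from CO, AF and RD records.

    Returns {contig: {read: {"complemented": bool, "start": int,
    "length": int or None}}}, using padded consensus coordinates.
    """
    contigs = {}
    current = None
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        tag = fields[0]
        if tag == "CO":
            # CO <contig name> <bases> <reads> <base segments> <U|C>
            current = fields[1]
            contigs[current] = {}
        elif tag == "AF" and current is not None:
            # AF <read name> <U|C> <padded start position in the consensus>
            contigs[current][fields[1]] = {
                "complemented": fields[2] == "C",
                "start": int(fields[3]),
                "length": None,
            }
        elif tag == "RD" and current is not None:
            # RD <read name> <padded bases> <info items> <read tags>
            read = contigs[current].get(fields[1])
            if read is not None:
                read["length"] = int(fields[2])
    return contigs


def read_end(info):
    """Last padded consensus position covered by a read (start + length - 1)."""
    return info["start"] + info["length"] - 1
```

To use it on an ACE file named, say, assembly.ace, open the file and pass the handle to parse_ace_reads(); each contig then maps read names to their strand, start and length. BS lines are deliberately ignored, for the reason given above.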

There does seem to be a part of BioPerl meant to perform this task
(including Bio::Assembly::IO::ace), but it looks like it was started
and never completed.

From cjfields at uiuc.edu  Fri Dec  8 10:17:31 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 8 Dec 2006 09:17:31 -0600
Subject: [Bioperl-l] NAR Database Issue Papers
Message-ID: <000601c71adb$fdd60490$15327e82@pyrimidine>

For those interested, the Nucleic Acids Research Database issue papers have
been popping up in the Advance Access section of the NAR website:

http://nar.oxfordjournals.org/papbyrecent.dtl

Ensembl, UCSC Browser, Entrez Gene, and a number of others of possible interest are
represented.  Of particular note are a few mentions of formatting changes to
UniProt, EMBL, and other records, which should be taken care of in the
latest BioPerl release (fingers crossed!).

chris


From cjfields at uiuc.edu  Fri Dec  8 10:31:19 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 8 Dec 2006 09:31:19 -0600
Subject: [Bioperl-l] need help with phrap parser
In-Reply-To: <457973DE.6050900@purdue.edu>
Message-ID: <000001c71add$ec7147d0$15327e82@pyrimidine>

...
> But then you would need to parse the fields of interest. You get the 
> position/strand in the contig from AF, then you get the length of the 
> read from RD.
> 
> There does look like there is a part of bioperl that meant to perform 
> this task--including Bio::Assembly::IO::ace but it looks like it was 
> started, but never completed.

...and if anyone wants to chip in and work on it, let us know!   The various
Bio::Assembly modules are one of many areas that need some updating.

chris


From akarger at CGR.Harvard.edu  Fri Dec  8 13:25:47 2006
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Fri, 8 Dec 2006 13:25:47 -0500
Subject: [Bioperl-l] Using frame info from GFF in getting a
	Seq->spliced_seq
Message-ID: <B9182BFF5B004245BABC12956EA6322E022BEA6D@huls5.nucleus.harvard.edu>

> This was a problem in the gene prediction output I suspect, more  
> recent versions of the program should have fixed this.  I do not  
> currently have free time to deal with the errors in the small number  
> of ORFs where this has happened.
> 
> I think you just need to do
>   start -= start- (frame*strand)
> for 1st exons.

I used
    if (strand==1) {start += exon->frame}
    else {end -= exon->frame}

This took me from 90 translations that had * within the sequence to just
9, out of 5500 CDS in S bayanus.
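Amir's coordinate adjustment can be restated as a small, language-neutral sketch in Python. The function name trim_first_cds is hypothetical and is not a BioPerl method. The sketch assumes BioPerl-style coordinates (1-based, inclusive, start <= end on both strands), so on the minus strand the bases to skip sit at the end coordinate.

```python
def trim_first_cds(start, end, strand, phase):
    """Return (start, end) of a first CDS segment with `phase` bases dropped.

    Coordinates are 1-based and inclusive with start <= end on both strands,
    so the 5' end is `start` on the plus strand and `end` on the minus strand.
    """
    if phase not in (0, 1, 2):
        raise ValueError("phase must be 0, 1 or 2")
    if strand == 1:
        return start + phase, end
    if strand == -1:
        return start, end - phase
    raise ValueError("strand must be 1 or -1")
```

Only the first segment in transcription order is adjusted this way. Later segments need no trimming once the spliced sequence starts on a codon boundary.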

> You can also probably provide the 1st exon's frame to the translate  
> function as another possibility but you should try and get the CDS  
> correct first depending on your downstream analyses.

Yes, I think. Scott Cain pointed out that GFF column 8 is the "phase",
which I had never heard of before. My current, very limited,
understanding is that sometimes you'll have an exon with, say, 31 bp,
followed by an exon with 29 bp. When the intron gets spliced out, you
eventually get an mRNA of 60 bp, which translates to a protein of 20 aa.
But the second exon has a phase of 1, not 0, because you can't just
start translating at the first bp of the second exon and expect to get
nice amino acids.
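A sketch of how phases follow from exon lengths under the GFF3 definition (the number of bases to remove from the start of a CDS feature to reach the first base of the next codon). By that definition, the 29-bp exon in the example above would have phase 2, not 1. The 31-bp first exon ends with one leftover base, so two bases of the second exon are needed to complete that codon. The function name is hypothetical, and exon lengths are assumed to be listed in transcription order.

```python
def exon_phases(lengths, first_phase=0):
    """Compute the GFF3 phase of each CDS segment, in transcription order."""
    phases = []
    phase = first_phase
    for length in lengths:
        phases.append(phase)
        # (length - phase) % 3 bases are left over at the end of this segment;
        # the next segment must supply the rest of that codon, which is
        # (phase - length) % 3 bases.
        phase = (phase - length) % 3
    return phases
```

For instance, exon lengths of 31 and 29 give phases [0, 2], and lengths of 32 and 28 give [0, 1].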

By the way, whether or not phase is the same thing as frame, when I call
the frame() method on the features created by Bio::Tools::GFF, I get the
phase info. I assume that's a feature (no pun intended), not a bug?

I'm still confused as to why you would have a phase in the first exon,
though. Why not just say the CDS starts 1 or 2 bp later? (This is
probably a bio question, not a bioperl question, but a quick Google
didn't get me an answer. "Phase" isn't a very good search term.)

I guess the real question here, which Jason alludes to, is whether
SeqFeature->spliced_seq ought to take into account the phase information
of the first exon. Right now, it doesn't, so when you call
SeqFeature->spliced_seq->translate, you get gibberish. Are there cases
where you would want spliced_seq to include the first bp or two? Should
there be an option to spliced_seq for whether you want to take phase
information into account?
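Whatever is decided for spliced_seq, the phase-aware behavior described here can be sketched outside BioPerl. This is a hypothetical Python function, not how spliced_seq is implemented. It assumes a plain genome string, 1-based inclusive exon coordinates, and that only the first exon's phase needs to be applied.

```python
COMPLEMENT = str.maketrans("ACGTacgtNn", "TGCAtgcaNn")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def spliced_cds(genome, segments, strand, first_phase=0):
    """Splice 1-based inclusive (start, end) segments, then drop phase bases.

    Segments are joined in ascending coordinate order and the result is
    reverse-complemented for strand -1, so the first exon in transcription
    order always ends up at the 5' end before the phase is removed.
    """
    pieces = [genome[start - 1:end] for start, end in sorted(segments)]
    spliced = "".join(pieces)
    if strand == -1:
        spliced = revcomp(spliced)
    # The phase applies only to the first exon in transcription order.
    return spliced[first_phase:]
```

With the plus-strand genome "GATGGTACGCCTAAG", exons (1,4) and (9,14), and a first-exon phase of 1, the sketch yields "ATGGCCTAA". The minus-strand copy of the same locus gives the same CDS.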

I can't submit a bug report until we confirm it's a bug.

Thanks,
-Amir Karger



From akarger at CGR.Harvard.edu  Fri Dec  8 13:33:09 2006
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Fri, 8 Dec 2006 13:33:09 -0500
Subject: [Bioperl-l] Using frame info from GFF in getting a
	Seq->spliced_seq
Message-ID: <B9182BFF5B004245BABC12956EA6322E022BEA71@huls5.nucleus.harvard.edu>

> Another issue is the splittype() is not defined, though I don't think that
> would kill anything as currently implemented.  However, one thing we have
> passingly discussed is having Bio::Location::Split objects possibly exhibit
> different (but expected) behaviors based upon the splittype() (order, join,
> or bond).  It's one of the things I want to work out for the next release.

Should I be writing -splittype => "JOIN" or some such in my new()?

-Amir Karger



From cjfields at uiuc.edu  Fri Dec  8 14:04:55 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 8 Dec 2006 13:04:55 -0600
Subject: [Bioperl-l] Using frame info from GFF in getting a
	Seq->spliced_seq
In-Reply-To: <B9182BFF5B004245BABC12956EA6322E022BEA71@huls5.nucleus.harvard.edu>
Message-ID: <000901c71afb$bf504210$15327e82@pyrimidine>


> > Another issue is the splittype() is not defined, though I don't think
> > that would kill anything as currently implemented.  However, one thing
> > we have passingly discussed is having Bio::Location::Split objects
> > possibly exhibit different (but expected) behaviors based upon the
> > splittype() (order, join, or bond).  It's one of the things I want to
> > work out for the next release.
> 
> Should I be writing -splittype => "JOIN" or some such in my new()?
> 
> -Amir Karger

When looking at the constructor in Location::Split I missed the fact that
'JOIN' is the default splittype(), so you don't actually have to set it
explicitly; apologies for that.  

If we make any changes that affect how Location::Split behaves we'll likely
leave the default splittype() as 'JOIN' as it's by far the most common join
operator.  

chris


From cjfields at uiuc.edu  Fri Dec  8 15:03:16 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 8 Dec 2006 14:03:16 -0600
Subject: [Bioperl-l] Using frame info from GFF in getting a
	Seq->spliced_seq
In-Reply-To: <B9182BFF5B004245BABC12956EA6322E022BEA6D@huls5.nucleus.harvard.edu>
Message-ID: <000001c71b03$e6741e90$15327e82@pyrimidine>

> Yes, I think. Scott Cain pointed out that GFF column 8 is the 
> "phase", which I had never heard of before. My current, very 
> limited, understanding is that sometimes you'll have an exon 
> with, say, 31 bp, followed by an exon with 29 bp. When the 
> intron gets spliced out, you eventually get an mRNA of 60 bp, 
> which translates to a protein of 20 aa.
> But the second exon has a phase of 1, not 0, because you 
> can't just start translating at the first bp of the second 
> exon and expect to get nice amino acids.

I think the use of 'frame' here is meant relative to the DNA sequence (i.e.
ORF searching, 6 frames) and the 'phase' is relative to the mRNA (i.e.
translation, three frames).  At least I think that's what is meant!

> By the way, whether or not phase is the same thing as frame, 
> when I call the frame() method on the features created by 
> Bio::Tools::GFF, I get the phase info. I assume that's a 
> feature (no pun intended), not a bug?
> 
> I'm still confused as to why you would have a phase in the 
> first exon, though. Why not just say the CDS starts 1 or 2 bp 
> later? (This is probably a bio question, not a bioperl 
> question, but a quick Google didn't get me an answer. "Phase" 
> isn't a very good search term.)

It could be b/c the location coordinates delineate the exon coding boundary.
It's conceivable the first exon in a sequence record is not the first exon
of the mRNA (i.e. there may be one or more exons prior to or past the exon
of interest that are in 'remote' sequence records).  Like this admittedly
extreme example (GB acc AF130134):

join(AF130124.1:2563..2964,AF130125.1:21..157,AF130126.1:12..174,
AF130127.1:21..112,AF130128.1:21..162,AF130128.1:281..595,
AF130128.1:661..842,AF130128.1:916..1030,AF130129.1:21..115,
AF130130.1:21..165,AF130131.1:21..125,AF130132.1:21..428,
AF130132.1:492..746,AF130133.1:21..168,AF130133.1:232..401,
AF130133.1:475..906,AF130133.1:970..1107,AF130133.1:1176..1367,21..128)

Also, the ends of the location may be uncertain ('fuzzy'):

join(complement(1009..>1260),complement(AF081827.1:<1..177))
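To see which parts of a location like these are local and which point at remote records, a rough parser can split the join() into spans. This is a hypothetical regex-based Python sketch, not BioPerl's location parser. It only handles flat join() lists whose individual spans may be wrapped in complement(). Operators such as order() or an outer complement(join(...)) are not handled.

```python
import re

# One span: optional "complement(", optional remote accession, optional
# fuzzy markers (< or >), then start..end.
SPAN = re.compile(
    r"(complement\()?"               # per-span strand flag
    r"(?:([A-Za-z0-9_]+\.\d+):)?"    # remote record, e.g. AF130124.1
    r"([<>]?)(\d+)\.\.([<>]?)(\d+)"  # fuzzy-aware start..end
)


def parse_join(location):
    """Split a flat GenBank join(...) location into a list of span dicts."""
    spans = []
    for m in SPAN.finditer(location):
        comp, acc, fuzzy_start, start, fuzzy_end, end = m.groups()
        spans.append({
            "accession": acc,  # None means the span is in this record
            "start": int(start),
            "end": int(end),
            "strand": -1 if comp else 1,
            "fuzzy_start": fuzzy_start == "<",
            "fuzzy_end": fuzzy_end == ">",
        })
    return spans
```

Applied to the AF130134 location above, this yields 19 spans, and only the last one (21..128) lies in the record itself.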

> I guess the real question here, which Jason alludes to, is whether
> SeqFeature->spliced_seq ought to take into account the phase information
> of the first exon. Right now, it doesn't, so when you call
> SeqFeature->spliced_seq->translate, you get gibberish. Are there cases
> where you would want spliced_seq to include the first bp or two? Should
> there be an option to spliced_seq for whether you want to take phase
> information into account?
> 
> I can't submit a bug report until we confirm it's a bug.
> 
> Thanks,
> -Amir Karger

You can already pass the frame or an offset to PrimarySeqI::translate().
Here are the args:

 Args    : -terminator    - character for terminator        default is *
           -unknown       - character for unknown           default is X
           -frame         - frame                           default is 0
           -codontable_id - codon table id                  default is 1
           -complete      - complete CDS expected           default is 0
           -throw         - throw exception if not complete default is 0
           -orf           - find 1st ORF                    default is 0
           -start         - alternative initiation codon
           -codontable    - Bio::Tools::CodonTable object
           -offset        - offset for fuzzy locations      default is 0

The offset comes from some GenBank seqfeatures which have a '/codon_start'
tag indicating which nucleotide to start translation from (1,2,3).  This is
essentially just the phase+1.  We could add a '-phase' argument for
convenience which accepts 0,1,2.
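The codon_start-to-phase relationship and phase-aware translation can be shown in a minimal Python sketch. This is not PrimarySeqI::translate(). The function names are hypothetical, only the standard genetic code (NCBI table 1) is built in, and a trailing partial codon is simply dropped.

```python
from itertools import product

BASES = "TCAG"
# NCBI translation table 1 (standard code), codons ordered TTT, TTC, TTA, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_start_to_phase(codon_start):
    """GenBank /codon_start is 1-based (1, 2 or 3); GFF phase is 0-based."""
    if codon_start not in (1, 2, 3):
        raise ValueError("codon_start must be 1, 2 or 3")
    return codon_start - 1


def translate(cds, phase=0, unknown="X"):
    """Translate `cds` after skipping `phase` leading bases; '*' marks stops."""
    seq = cds[phase:].upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], unknown) for i in range(0, usable, 3)
    )
```

For example, translate("CATGGCCTAA", phase=codon_start_to_phase(2)) skips the leading C and gives "MA*".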

chris


From bobfreemanma at speakeasy.net  Fri Dec  8 15:47:15 2006
From: bobfreemanma at speakeasy.net (Bob Freeman)
Date: Fri, 8 Dec 2006 15:47:15 -0500
Subject: [Bioperl-l] writing blastxml
In-Reply-To: <4b5350650610250820w1498b27dnd155896fbf9a2012@mail.gmail.com>
References: <4b5350650610250728s1a421199if2493c9c4660474d@mail.gmail.com>
	<000301c6f846$d6227760$15327e82@pyrimidine>
	<4b5350650610250820w1498b27dnd155896fbf9a2012@mail.gmail.com>
Message-ID: <p0623090bc19f7f46bd1d@[10.0.107.251]>

Can't seem to find a good post on this to answer my question:

Does anyone know a good way to (re)write BLAST reports in XML format? 
I've got about 30,000 reports I need to rewrite for a (good!) piece 
of java software that will only import xml formatted BLAST reports. 
Right now, all mine are plain text.

I don't think bioperl can do this yet, correct? If not, any 
suggestions, besides reblasting all 30,000? I'd like to save a few 
trees and lumps of coal.

TIA,
Bob

-- 

-----------------------------------------------------
Bob Freeman, Ph.D.
Bioinformatics consultant
51 Downer Avenue, #2
Dorchester, MA  02125
617/699.7057, vox

If brains were taxed, he'd get a refund.
-- Anonymous

From camp_boot at hotmail.com  Sun Dec 10 05:00:55 2006
From: camp_boot at hotmail.com (synapse)
Date: Sun, 10 Dec 2006 10:00:55 +0000 (UTC)
Subject: [Bioperl-l] Driver program for PestFind.pm
Message-ID: <loom.20061210T105614-429@post.gmane.org>

   Dear All, 

   I apologize in advance for my almost total lack of knowledge of perl as a 
programming language. 

   I need to use the PestFind program, part of the bioperl-run package. My
understanding is that I will need a simple wrapper program that will read
arguments from the command line and pass them to that module. 

   - Is there such program available that I can just use?

   - Does anyone know if pestfind can work on multiple sequence files (in fasta 
format), or does it only process single sequence files?

   Thanks a lot for the feedback. 


From cjfields at uiuc.edu  Sun Dec 10 13:45:26 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 10 Dec 2006 12:45:26 -0600
Subject: [Bioperl-l] writing blastxml
In-Reply-To: <p0623090bc19f7f46bd1d@[10.0.107.251]>
References: <4b5350650610250728s1a421199if2493c9c4660474d@mail.gmail.com>
	<000301c6f846$d6227760$15327e82@pyrimidine>
	<4b5350650610250820w1498b27dnd155896fbf9a2012@mail.gmail.com>
	<p0623090bc19f7f46bd1d@[10.0.107.251]>
Message-ID: <7FB4EBB9-BEDC-4250-BE2F-3F695D36F350@uiuc.edu>


On Dec 8, 2006, at 2:47 PM, Bob Freeman wrote:

> Can't seem to find a good post on this to answer my question:
>
> Does anyone know a good way to (re)write BLAST reports in XML format?
> I've got about 30,000 reports I need to rewrite for a (good!) piece
> of java software that will only import xml formatted BLAST reports.
> Right now, all mine are plain text.
>
> I don't think bioperl can do this yet, correct? If not, any
> suggestions, besides reblasting all 30,000? I'd like to save a few
> trees and lumps of coal.
>
> TIA,
> Bob

The only BioPerl writers for BLAST reports are in BSML and HTML, not  
BLAST XML.  I don't think there have been any requests for it,  
and no one has really stepped forward to submit one.

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Sun Dec 10 13:55:16 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 10 Dec 2006 12:55:16 -0600
Subject: [Bioperl-l] Driver program for PestFind.pm
In-Reply-To: <loom.20061210T105614-429@post.gmane.org>
References: <loom.20061210T105614-429@post.gmane.org>
Message-ID: <32B0F15D-4144-43B6-AA81-5ED9BA848F45@uiuc.edu>


On Dec 10, 2006, at 4:00 AM, synapse wrote:

>    Dear All,
>
>    I apologize in advance for my almost total lack of knowledge of perl as a
> programming language.
>
>    I need to use PestFind program, part of the biop_run package of bioperl. My
> understanding is that I will need a simple wrapper program that will read
> arguments from the command line, and pass them to that module.

PestFind is part of the EMBOSS suite of programs:

http://emboss.sourceforge.net/

The PestFind module in bioperl-run is actually used via Pise.

>    - Is there such program available that I can just use?

See above

>    - Does anyone know if pestfind can work on multiple sequence files (in fasta
> format), or does it only process single sequence files?
>
>    Thanks a lot for the feedback.

No idea there, but the EMBOSS docs should tell you.

chris

From cjfields at uiuc.edu  Mon Dec 11 00:38:32 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 10 Dec 2006 23:38:32 -0600
Subject: [Bioperl-l] bioperl-run parameter question
Message-ID: <163AF1E6-7CEA-4CAC-9BA1-84DBA95C494E@uiuc.edu>

I am writing up a few bioperl-run modules and have a simple question,  
though I don't know if anyone knows the answer.  I was curious as to  
why parameters for most (all?) bioperl-run modules lack the '-'  
preceding them.  This came up re: StandAloneBlast last week  
(something Torsten fixed), but I noticed just about every bioperl-run  
module uses the dashless parameters.

chris


From n.haigh at sheffield.ac.uk  Mon Dec 11 01:44:25 2006
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Mon, 11 Dec 2006 06:44:25 +0000
Subject: [Bioperl-l] bioperl-run parameter question
In-Reply-To: <163AF1E6-7CEA-4CAC-9BA1-84DBA95C494E@uiuc.edu>
References: <163AF1E6-7CEA-4CAC-9BA1-84DBA95C494E@uiuc.edu>
Message-ID: <457CFE49.5010201@sheffield.ac.uk>

Chris Fields wrote:
> I am writing up a few bioperl-run modules and have a simple question,  
> though I don't know if anyone knows the answer.  I was curious as to  
> why parameters for most (all?) bioperl-run modules lack the '-'  
> preceding them.  This came up re: StandAloneBlast last week  
> (something Torsten fixed), but I noticed just about every bioperl-run  
> module uses the dashless parameters.
>
> chris
>

No idea!

Is there any reason for/against using dashed/dashless parameters? I
suppose dashed parameters let you easily see which tokens on the
command line are parameters and which are values. Should modules be able
to accept both? Should dashed be preferred?

Nath

From cjfields at uiuc.edu  Mon Dec 11 08:06:32 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 11 Dec 2006 07:06:32 -0600
Subject: [Bioperl-l] bioperl-run parameter question
In-Reply-To: <457CFE49.5010201@sheffield.ac.uk>
References: <163AF1E6-7CEA-4CAC-9BA1-84DBA95C494E@uiuc.edu>
	<457CFE49.5010201@sheffield.ac.uk>
Message-ID: <D223B6BF-7C0C-41BF-B267-8C07F82FDD7D@uiuc.edu>


On Dec 11, 2006, at 12:44 AM, Nathan S. Haigh wrote:

> Chris Fields wrote:
>> I am writing up a few bioperl-run modules and have a simple question,
>> though I don't know if anyone knows the answer.  I was curious as to
>> why parameters for most (all?) bioperl-run modules lack the '-'
>> preceding them.  This came up re: StandAloneBlast last week
>> (something Torsten fixed), but I noticed just about every bioperl-run
>> module uses the dashless parameters.
>>
>> chris
>>
>
> No idea!
>
> Is there any reason for/against using dashed/dashless parameters? I
> suppose dshed parameters allow you to easy see which tokens on the
> command line are parameters and which are values. Should modules be able
> to accept both? Should dashed be preferred?
>
> Nath

I'm thinking about it from the point of consistency.  When using a  
mix of core and run modules it can be a bit confusing, particularly  
when (as pointed out in the previous thread on StandAloneBlast) you  
can use only dashed parameters with core modules, while most (all?)  
run modules only accept dashless ones (in most cases some exception  
is thrown).  Torsten fixed this in StandAloneBlast so it accepts  
both, but shouldn't this rule also apply to all run modules?
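One way a wrapper could accept both styles is to normalize parameter names before using them. The following Python sketch illustrates the idea; it is not how StandAloneBlast or WrapperBase does it. Case-folding to lowercase and rejecting conflicting duplicates are assumptions made here, not rules stated in the thread.

```python
def normalize_params(params):
    """Map '-evalue', 'evalue' and '-EVALUE' style keys onto one canonical key.

    Raises ValueError if two spellings of the same parameter disagree.
    """
    normalized = {}
    for key, value in params.items():
        canonical = key.lstrip("-").lower()
        if canonical in normalized and normalized[canonical] != value:
            raise ValueError(f"conflicting values for parameter '{canonical}'")
        normalized[canonical] = value
    return normalized
```

With this approach, {"-program": "blastn", "e": 10} and {"program": "blastn", "-e": 10} normalize to the same settings, so core-style and run-style calls behave alike.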

Much of this is probably due to the donated nature of much  
of the bioperl-run code and Jason's 'cat-herding', and I understand  
that it would be a lot of work to change this for all run modules.   
However, we could at least try to start enforcing some loose rules  
with new bioperl-run wrappers (e.g. implement WrapperBase, use core- 
like parameters, etc).

chris


From akarger at CGR.Harvard.edu  Mon Dec 11 11:20:03 2006
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Mon, 11 Dec 2006 11:20:03 -0500
Subject: [Bioperl-l] Using frame info from GFF in getting
	aSeq->spliced_seq
Message-ID: <B9182BFF5B004245BABC12956EA6322E0254DD2E@huls5.nucleus.harvard.edu>

Chris Fields wrote:
> 
> > Yes, I think. Scott Cain pointed out that GFF column 8 is the 
> > "phase", which I had never heard of before. My current, very 
> > limited, understanding is that sometimes you'll have an exon 
> > with, say, 31 bp, followed by an exon with 29 bp. When the 
> > intron gets spliced out, you eventually get an mRNA of 60 bp, 
> > which translates to a protein of 20 aa.
> > But the second exon has a phase of 1, not 0, because you 
> > can't just start translating at the first bp of the second 
> > exon and expect to get nice amino acids.
> 
> I think the use of 'frame' here is meant relative to the DNA
> sequence (i.e. ORF searching, 6 frames) and the 'phase' is relative
> to the mRNA (i.e. translation, three frames).  At least I think
> that's what is meant!

I agree. By the way, I'd love a reference to a simple bio-explanation of
what's happening here. Google searches for "coding sequence phase" are
not all that relevant.

> > I'm still confused as to why you would have a phase in the 
> > first exon, though. Why not just say the CDS starts 1 or 2 bp 
> > later? (This is probably a bio question, not a bioperl 
> > question, but a quick Google didn't get me an answer. "Phase" 
> > isn't a very good search term.)
> 
> It could be b/c the location coordinates delineate the exon
> coding boundary. It's conceivable the first exon in a sequence
> record is not the first exon of the mRNA (i.e. there may be one or
> more exons prior to or past the exon of interest that are in
> 'remote' sequence records).

That's certainly not the case here, because the files have the entire
genomes in them.

> Also, the ends of the location may be uncertain ('fuzzy'):
> 
> join(complement(1009..>1260),complement(AF081827.1:<1..177))

Also not the case here. These locations aren't listed as fuzzy.

Any other thoughts?

> > I guess the real question here, which Jason alludes to, is whether
> > SeqFeature->spliced_seq ought to take into account the phase
> > information of the first exon. Right now, it doesn't, so when you
> > call SeqFeature->spliced_seq->translate, you get gibberish. Are
> > there cases where you would want spliced_seq to include the first
> > bp or two? Should there be an option to spliced_seq for whether you
> > want to take phase information into account?
> 
> You can already pass the frame or an offset to
> PrimarySeqI::translate(). We could add a '-phase' argument for
> convenience which accepts 0,1,2.

But as Jason pointed out, you should find the problem earlier. What if I
want to get the RNA sequence that will become the protein? Then having a
phase arg to translate() doesn't help. Should there be a phase arg to
spliced_seq?

Which raises another bio question: at what point are the first 1 or 2 bp
dropped when you have a phase of 1 or 2? Do they appear in the mRNA? 

-Amir Karger


From bix at sendu.me.uk  Mon Dec 11 13:21:42 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 11 Dec 2006 13:21:42 -0500
Subject: [Bioperl-l] bioperl-run parameter question
In-Reply-To: <163AF1E6-7CEA-4CAC-9BA1-84DBA95C494E@uiuc.edu>
References: <163AF1E6-7CEA-4CAC-9BA1-84DBA95C494E@uiuc.edu>
Message-ID: <457DA1B6.1060706@sendu.me.uk>

Chris Fields wrote:
> I am writing up a few bioperl-run modules and have a simple question,  
> though I don't know if anyone knows the answer.  I was curious as to  
> why parameters for most (all?) bioperl-run modules lack the '-'  
> preceding them.  This came up re: StandAloneBlast last week  
> (something Torsten fixed), but I noticed just about every bioperl-run  
> module uses the dashless parameters.

I didn't follow that particular thread, but from my experience there is 
a useful distinction between bioperl options using the - as normal for 
full consistency with core (eg. -verbose), whilst the options that 
belong to the program the run module is a wrapper for do not take 
dashes. Again, this seems consistent within the run package.

I'd suggest sticking to the current pattern.


Cheers,
Sendu.

From cjfields at uiuc.edu  Mon Dec 11 15:07:16 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 11 Dec 2006 14:07:16 -0600
Subject: [Bioperl-l] bioperl-run parameter question
In-Reply-To: <457DA1B6.1060706@sendu.me.uk>
References: <163AF1E6-7CEA-4CAC-9BA1-84DBA95C494E@uiuc.edu>
	<457DA1B6.1060706@sendu.me.uk>
Message-ID: <F8A9FAC2-A189-463B-B8CA-E66D66863553@uiuc.edu>


On Dec 11, 2006, at 12:21 PM, Sendu Bala wrote:

> Chris Fields wrote:
>> I am writing up a few bioperl-run modules and have a simple  
>> question,  though I don't know if anyone knows the answer.  I was  
>> curious as to  why parameters for most (all?) bioperl-run modules  
>> lack the '-'  preceding them.  This came up re: StandAloneBlast  
>> last week  (something Torsten fixed), but I noticed just about  
>> every bioperl-run  module uses the dashless parameters.
>
> I didn't follow that particular thread, but from my experience  
> there is a useful distinction between bioperl options using the -  
> as normal for full consistency with core (eg. -verbose), whilst the  
> options that belong to the program the run module is a wrapper for  
> do not take dashes. Again, this seems consistent within the run  
> package.

I respectfully disagree that this is a 'useful' distinction.  My main
point is consistency.  To me, it's counterintuitive to have two
Bioperl classes, both of which inherit from Bio::Root::Root, use two
different syntaxes for any parameters passed to the constructor, even
if some are 'program' parameters.  It's also not consistent with
StandAloneBlast or RemoteBlast, both of which are considered bioperl-run
modules even though they are in core, and both of which use dashed
parameters (StandAloneBlast actually allows both).  In fact, it isn't
consistent within bioperl-run itself:
Bio::Tools::Run::EMBOSSApplication uses dashes for parameters in a
hashref!

Okay, judging by the previous examples, 'consistency' isn't a word I
would use to describe bioperl-run as a whole (back to Jason's
'cat-herding' analogy).  It would be easier to let it slide for now,
especially since changing them would be a serious pain, not to
mention an API issue.  But shouldn't there be some consistency?

And what about new modules?  Do we follow the historical (possibly  
confusing) 'dashless' route, or use the core-like dashed approach  
(thus breaking from the other run modules)?

> I'd suggest sticking to the current pattern.
>
>
> Cheers,
> Sendu.

I'll allow for both, ala StandAloneBlast.  Doesn't hurt to be safe. ; >
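
A minimal sketch of the 'allow both' approach, in plain Perl. It is not
taken from StandAloneBlast; the package name My::Run::Wrapper and its
parameter list are hypothetical. Each key is lowercased and has one
leading dash stripped before lookup, so -evalue and evalue land in the
same slot:

```perl
use strict;
use warnings;

package My::Run::Wrapper;    # hypothetical wrapper class, for illustration only

# Parameter names this hypothetical wrapper understands (dashless, lowercase).
my %KNOWN = map { $_ => 1 } qw(program database evalue verbose);

sub new {
    my ($class, @args) = @_;
    die "Odd number of arguments to new()\n" if @args % 2;
    my %raw  = @args;
    my $self = bless {}, $class;
    for my $key (keys %raw) {
        # Normalise '-evalue', 'evalue' and '-EVALUE' to the same key.
        (my $name = lc $key) =~ s/^-//;
        die "Unknown parameter '$key'\n" unless $KNOWN{$name};
        $self->{$name} = $raw{$key};
    }
    return $self;
}

# Look up a parameter using either style of name.
sub param {
    my ($self, $name) = @_;
    (my $key = lc $name) =~ s/^-//;
    return $self->{$key};
}

package main;

my $dashed   = My::Run::Wrapper->new(-database => 'nr', -evalue => 1e-5);
my $dashless = My::Run::Wrapper->new(database => 'nr', evalue => 1e-5);
my $verdict  = $dashed->param('evalue') == $dashless->param('-evalue') ? 'same' : 'differ';
print "$verdict\n";
```

If a caller passes both -evalue and evalue, which value survives
depends on hash ordering; a real wrapper might prefer to throw in that
case.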

Have fun at the hackathon!

chris

From bix at sendu.me.uk  Mon Dec 11 16:19:55 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 11 Dec 2006 16:19:55 -0500
Subject: [Bioperl-l] bioperl-run parameter question
In-Reply-To: <F8A9FAC2-A189-463B-B8CA-E66D66863553@uiuc.edu>
References: <163AF1E6-7CEA-4CAC-9BA1-84DBA95C494E@uiuc.edu>
	<457DA1B6.1060706@sendu.me.uk>
	<F8A9FAC2-A189-463B-B8CA-E66D66863553@uiuc.edu>
Message-ID: <457DCB7B.8050500@sendu.me.uk>

Chris Fields wrote:
> 
> On Dec 11, 2006, at 12:21 PM, Sendu Bala wrote:
> 
>> Chris Fields wrote:
>>> I am writing up a few bioperl-run modules and have a simple 
>>> question,  though I don't know if anyone knows the answer.  I was 
>>> curious as to  why parameters for most (all?) bioperl-run modules 
>>> lack the '-'  preceding them.  This came up re: StandAloneBlast last 
>>> week  (something Torsten fixed), but I noticed just about every 
>>> bioperl-run  module uses the dashless parameters.
>>
>> I didn't follow that particular thread, but from my experience there 
>> is a useful distinction between bioperl options using the - as normal 
>> for full consistency with core (eg. -verbose), whilst the options that 
>> belong to the program the run module is a wrapper for do not take 
>> dashes. Again, this seems consistent within the run package.
> 
> I respectfully disagree that this is a 'useful' distinction.  My main 
> point is consistency.
[snip]

We're on the same page in terms of what we think would be a Good Thing, 
and allowing both ways (dashed and dashless) sounds reasonable. I was 
just suggesting why bioperl-run might be the way it was. Further to 
that, there is the practical aspect that it is a lot simpler to figure 
out which are the program options so they can be farmed out to the 
AUTOLOAD methods - again something that isn't done in core.

If you come up with some generic way of dealing with options and farming 
to AUTOLOAD, perhaps there's scope for applying it to all the run 
wrappers (ideally via one of their base classes), so they all instantly 
gain dashed-mode capability.


From cjfields at uiuc.edu  Mon Dec 11 17:05:56 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 11 Dec 2006 16:05:56 -0600
Subject: [Bioperl-l] bioperl-run parameter question
In-Reply-To: <457DCB7B.8050500@sendu.me.uk>
References: <163AF1E6-7CEA-4CAC-9BA1-84DBA95C494E@uiuc.edu>
	<457DA1B6.1060706@sendu.me.uk>
	<F8A9FAC2-A189-463B-B8CA-E66D66863553@uiuc.edu>
	<457DCB7B.8050500@sendu.me.uk>
Message-ID: <F046DB23-35C7-414A-8616-46D3C5760B49@uiuc.edu>


On Dec 11, 2006, at 3:19 PM, Sendu Bala wrote:
...

>>
>> I respectfully disagree that this is a 'useful' distinction.  My main
>> point is consistency.
> [snip]
>
> We're on the same page in terms of what we think would be a Good
> Thing, and allowing both ways (dashed and dashless) sounds
> reasonable. I was just suggesting why bioperl-run might be the way it
> was. Further to that, there is the practical aspect that it is a lot
> simpler to figure out which are the program options so they can be
> farmed out to the AUTOLOAD methods - again something that isn't done
> in core.

Maybe b/c AUTOLOAD is frowned upon for a number of reasons, mainly  
code maintenance.  I'm somewhat neutral on the idea of using AUTOLOAD  
as a short-term solution, though using heredoc and an eval{} block  
works well for me (and shows up when using $self->can('method') or  
when checking for methods via Class::Inspector).
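
A sketch of generating accessors from a heredoc as an alternative to
AUTOLOAD. The class and parameter names are hypothetical, not taken from
bioperl-run. Because the heredoc holds source text, it is compiled here
with a string eval; the resulting subs are ordinary symbol-table
entries, so can() finds them:

```perl
use strict;
use warnings;

package My::Run::Needle;    # hypothetical wrapper, for illustration only

# Program options this hypothetical wrapper exposes as accessor methods.
my @PROGRAM_PARAMS = qw(gapopen gapextend datafile);

for my $param (@PROGRAM_PARAMS) {
    # Build the source of a simple get/set accessor from a heredoc template,
    # then compile it into this package with a string eval.
    my $code = <<"END_SUB";
sub $param {
    my \$self = shift;
    \$self->{'_$param'} = shift if \@_;
    return \$self->{'_$param'};
}
1;
END_SUB
    eval $code or die "Could not generate accessor '$param': $@";
}

sub new { my $class = shift; return bless {}, $class }

package main;

my $needle = My::Run::Needle->new;
$needle->gapopen(10);
my $open = $needle->gapopen;
print "$open\n";                                      # prints 10
my $has = My::Run::Needle->can('gapextend') ? 'yes' : 'no';
print "$has\n";                                       # prints yes
```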

> If you come up with some generic way of dealing with options and
> farming to AUTOLOAD, perhaps there's scope for applying it to all the
> run wrappers (ideally via one of their base classes), so they all
> instantly gain dashed-mode capability.

I think that's the crux of the problem; they do not all have the same  
base class (except Bio::Root::Root).  Most use WrapperBase.  I  
thought at one point a Run-specific root module would be a good idea,  
but WrapperBase already works well.

I'll go ahead with my modules and think about it some more.  You  
could ask the powers-that-be (jason, hilmar, etc) what they think as  
well.

chris

From bosborne11 at verizon.net  Mon Dec 11 17:24:54 2006
From: bosborne11 at verizon.net (Brian Osborne)
Date: Mon, 11 Dec 2006 17:24:54 -0500
Subject: [Bioperl-l] Using frame info from GFF in getting
	aSeq->spliced_seq
In-Reply-To: <B9182BFF5B004245BABC12956EA6322E0254DD2E@huls5.nucleus.harvard.edu>
Message-ID: <C1A344E6.BE53%bosborne11@verizon.net>

Amir,

Google "intron phase"; you will see a number of useful links.

Brian O.


On 12/11/06 11:20 AM, "Amir Karger" <akarger at CGR.Harvard.edu> wrote:

> I agree. By the way, I'd love a reference to a simple bio-explanation of
> what's happening here. Google searches for "coding sequence phase" are
> not all that relevant.


From cjfields at uiuc.edu  Mon Dec 11 22:20:06 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 11 Dec 2006 21:20:06 -0600
Subject: [Bioperl-l] Using frame info from GFF in getting
	aSeq->spliced_seq
In-Reply-To: <B9182BFF5B004245BABC12956EA6322E0254DD2E@huls5.nucleus.harvard.edu>
References: <B9182BFF5B004245BABC12956EA6322E0254DD2E@huls5.nucleus.harvard.edu>
Message-ID: <E6F0CA09-EF9F-42AF-BF67-35E4FDBCAD8C@uiuc.edu>


On Dec 11, 2006, at 10:20 AM, Amir Karger wrote:

>> I think the use of 'frame' here is meant relative to the DNA
>> sequence (i.e. ORF searching, 6 frames) and the 'phase' is relative
>> to the mRNA (i.e. translation, three frames).  At least I think
>> that's what is meant!
>
> I agree. By the way, I'd love a reference to a simple
> bio-explanation of what's happening here. Google searches for
> "coding sequence phase" are not all that relevant.

Ah, Brian found some links I see...

>> It could be b/c the location coordinates delineate the exon
>> coding boundary. It's conceivable the first exon in a sequence
>> record is not the first exon of the mRNA (i.e. there may be one or
>> more exons prior to or past the exon of interest that are in
>> 'remote' sequence records).
>
> That's certainly not the case here, because the files have the entire
> genomes in them.
>
>> Also, the ends of the location may be uncertain ('fuzzy'):
>>
>> join(complement(1009..>1260),complement(AF081827.1:<1..177))
>
> Also not the case here. These locations aren't listed as fuzzy.
>
> Any other thoughts?

Which GFF files did you use?  More specifically, which genes in which  
GFF file?  I saw a reference to S. bayanus, but it's hard to work out  
what could be the problem unless we know a bit more.

>>> I guess the real question here, which Jason alludes to, is whether
>>> SeqFeature->spliced_seq ought to take into account the phase
>>> information of the first exon. Right now, it doesn't, so when you
>>> call SeqFeature->spliced_seq->translate, you get gibberish. Are
>>> there cases where you would want spliced_seq to include the first
>>> bp or two? Should there be an option to spliced_seq for whether you
>>> want to take phase information into account?
>>
>> You can already pass the frame or an offset to
>> PrimarySeqI::translate(). We could add a '-phase' argument for
>> convenience which accepts 0,1,2.
>
> But as Jason pointed out, you should find the problem earlier. What
> if I want to get the RNA sequence that will become the protein? Then
> having a phase arg to translate() doesn't help. Should there be a
> phase arg to spliced_seq?

You'll also note Jason mentioned there were possible errors in the
gene prediction programs which produced the output.

spliced_seq() is supposed to return the DNA sequence of a split  
location by splicing together the sublocation sequences in their  
'join' order.  So, if the first exon was out of phase, once spliced  
they should all be out of phase to the same degree, assuming all  
exons are joined together correctly.   Translating this using the  
phase should produce the correct amino acid sequence.
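
To make the 'out of phase to the same degree' point concrete, here is a
small sketch using the GFF3 definition of phase (the number of bases to
remove from the start of a CDS feature to reach the first base of the
next codon). Given the segment lengths in transcription order and the
first segment's phase, the phase of every later segment follows, and the
offset to hand to translate() for the whole spliced sequence is simply
the first segment's phase. The helper name expected_phases is
hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: given the phase of the first CDS segment and the
# lengths of all segments in transcription order, return the GFF3 phase
# expected for every segment.
sub expected_phases {
    my ($first_phase, @lengths) = @_;
    my @phases = ($first_phase);
    for my $i (0 .. $#lengths - 1) {
        my $coding = $lengths[$i] - $phases[$i];    # bases after the skipped ones
        push @phases, (3 - $coding % 3) % 3;         # bases the next segment owes
    }
    return @phases;
}

# A 31 bp exon followed by a 29 bp exon, first phase 0.
print join(',', expected_phases(0, 31, 29)), "\n";    # prints 0,2
```

For the 31 bp + 29 bp example earlier in the thread this gives phases
0 and 2: the first exon ends with one spare base, so the second exon
contributes two bases to finish that codon before its own first full
codon begins.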

Note that Jason suggested passing the frame/phase of the first exon  
to translate(), not spliced_seq().  I also suggested translate().

> Which raises another bio question: at what point are the first 1 or
> 2 bp dropped when you have a phase of 1 or 2? Do they appear in the
> mRNA?
>
> -Amir Karger

Any sequence present in the sublocations (exons) would be in the  
spliced sequence.  This would have to include those nucleotides in  
exons skipped b/c of the phase since they are part of the coding region.

chris

From neetisomaiya at gmail.com  Tue Dec 12 07:06:20 2006
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Tue, 12 Dec 2006 17:36:20 +0530
Subject: [Bioperl-l] need help in phredPhrap
Message-ID: <764978cf0612120406m796b116dncd3a9e6c82ffe682@mail.gmail.com>

Hi,

I am running phredPhrap, which runs phred, phrap and polyphred. Please refer
to the "Using a reference sequence" section of this link
http://droog.mbt.washington.edu/poly_doc50.html#REFER.
I am using the reference sequence as described in the link above.
With this I am getting the SNP positions on the contig sequence as well as
on the reference sequence.
Does anyone know if there is some output file which can also give me mapping
between contig sequence and reference sequence?
-- 
-Neeti
Even my blood says, B positive

From akarger at CGR.Harvard.edu  Tue Dec 12 11:05:43 2006
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Tue, 12 Dec 2006 11:05:43 -0500
Subject: [Bioperl-l] Using frame info from GFF in getting
	aSeq->spliced_seq
Message-ID: <B9182BFF5B004245BABC12956EA6322E0254DF2D@huls5.nucleus.harvard.edu>

(sorry if this thread is boring people)

Chris Fields wrote: 

> > I agree. By the way, I'd love a reference to a simple
> > bio-explanation of what's happening here. Google searches for
> > "coding sequence phase" are not all that relevant.
> 
> Ah, Brian found some links I see...

Thanks, Brian! Amazing how "coding sequence phase" finds nothing but
"intron phase" finds a ton. This is why you need to actually learn
biology, rather than Googling it.

> Which GFF files did you use?  More specifically, which genes in
> which GFF file?  I saw a reference to S. bayanus, but it's hard to
> work out what could be the problem unless we know a bit more.

http://fungal.genome.duke.edu/annotations/sbay/gff/saccharomyces_bayanus.20031001.AUGUSTUS.gff3.gz
(Thanks for a Really Useful site, Jason!)

c127 (for example) has two lines in that file:
sbay_c127       AUGUSTUS        mRNA    263     723     .       +       .       ID=sbay_c127-g1.1
sbay_c127       AUGUSTUS        CDS     263     723     .       +       1       Parent=sbay_c127-g1.1

Now go to gbrowse page:
http://fungal.genome.duke.edu/cgi-bin/gbrowse/sbay/
Type "sbay_c127:250-300" in the search box. 

As you can see from the translation track, if you start at bp 263, you
hit a stop codon after just a few aas. But if you use frame2/phase 1,
you get no stop codons all the way to the end of the contig.
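
The gbrowse check can also be reproduced offline with a sketch that
counts stop codons (TAA, TAG and TGA in the standard code) in each
forward reading offset. The sequence below is made up; for the real case
you would extract bases 263..723 of sbay_c127. Offset 1 corresponds to
what is called frame 2/phase 1 above. The helper name stops_in_frame is
hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: count in-frame stop codons (standard genetic code)
# after skipping $offset bases at the start of $dna.
sub stops_in_frame {
    my ($dna, $offset) = @_;
    my $count = 0;
    for (my $i = $offset; $i + 3 <= length $dna; $i += 3) {
        $count++ if uc(substr($dna, $i, 3)) =~ /^(?:TAA|TAG|TGA)$/;
    }
    return $count;
}

# Made-up sequence whose reading frame is clean only after skipping 1 base.
my $dna = 'C' . 'ATGAAACCCGGGTTTAAACCC';
for my $offset (0 .. 2) {
    printf "offset %d: %d stop(s)\n", $offset, stops_in_frame($dna, $offset);
}
```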

> >> You can already pass the frame or an offset to
> >> PrimarySeqI::translate(). We could add a '-phase' argument for
> >> convenience which accepts 0,1,2.
> >
> > What if I want to get the RNA sequence that will become the
> > protein? Then having a phase arg to translate() doesn't help.
> > Should there be a phase arg to spliced_seq?
> 
> You'll also note Jason mentioned there were possible errors in the
> gene prediction programs which produced the output.

That's certainly possible. No gene prediction program will be perfect.
In this case, though, it's clear that it found a large region without
stop codons in it, and correctly identified the place to start
translating. I guess I'm just surprised that, if it found just one exon
in a gene (in the whole contig) why it would say the exon starts at 263
with a phase 1, instead of just saying it starts at 264.

> spliced_seq() is supposed to return the DNA sequence of a split  
> location by splicing together the sublocation sequences in their  
> 'join' order.  So, if the first exon was out of phase, once spliced  
> they should all be out of phase to the same degree, assuming all  
> exons are joined together correctly.   Translating this using the  
> phase should produce the correct amino acid sequence.
> 
> Note that Jason suggested passing the frame/phase of the first exon  
> to translate(), not spliced_seq().  I also suggested translate().

You're right. This brings the number of translated polypeptide sequences
that have lots of *s in them to 9 instead of 90. 

I guess I have two requests here. The first is, if a person wants to see
exactly which bps are translated to aas -- a nucleotide sequence of
exactly 3N bp starting (usually) with ATG -- then they might want an
argument to spliced_seq that skips the first one or two bp when
necessary. After all, they might want to study the DNA, not the
peptides.

The second request is for "intelligent objects". If my SeqFeatures know
that they're in phase 1, then when I call spliced_seq I want the
resulting objects to know that they're phase one, such that when I call
translate, Bioperl automatically skips the first bp or two. Admittedly,
there might be big ramifications to this.

Both requests of course made in the knowledge that Bioperl is open
source & developers have a lot to do with their time.

-Amir Karger

> > Which raises another bio question: at what point are the first 1
> > or 2 bp dropped when you have a phase of 1 or 2? Do they appear in
> > the mRNA?
> >
> > -Amir Karger
> 
> Any sequence present in the sublocations (exons) would be in the
> spliced sequence.  This would have to include those nucleotides in
> exons skipped b/c of the phase since they are part of the coding
> region.
> 
> chris
> 


From neetisomaiya at gmail.com  Tue Dec 12 07:14:10 2006
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Tue, 12 Dec 2006 17:44:10 +0530
Subject: [Bioperl-l] needle parser in bioperl?
Message-ID: <764978cf0612120414o1eb77e28l1132eb4fa4cd9e1d@mail.gmail.com>

Hi,

Does anyone know of a bioperl parser for needle output? Basically I want
to know where the target sequence aligns on the template (i.e. the
coordinate on the template where the target aligns).

-- 
-Neeti
Even my blood says, B positive

From cjfields at uiuc.edu  Tue Dec 12 11:57:27 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 12 Dec 2006 10:57:27 -0600
Subject: [Bioperl-l] needle parser in bioperl?
In-Reply-To: <764978cf0612120414o1eb77e28l1132eb4fa4cd9e1d@mail.gmail.com>
References: <764978cf0612120414o1eb77e28l1132eb4fa4cd9e1d@mail.gmail.com>
Message-ID: <C60106D0-9A11-4B67-8B3D-87DF885F1D40@uiuc.edu>


On Dec 12, 2006, at 6:14 AM, neeti somaiya wrote:

> Hi,
>
> Does anyone know of a bioperl parser for needle output? Basically I
> want to know where the target sequence aligns on the template (i.e.
> the coordinate on the template where the target aligns).
>
> -- 
> -Neeti
> Even my blood says, B positive

I answered this a number of months back:

http://tinyurl.com/yzlbx5

Basically, newer versions of EMBOSS have changed the needle output
format that the AlignIO::emboss parser was written for.  I don't
believe the parser has been fixed to deal with that, but Jason has
pointed out you can use MSF output when running needle, then parse
using AlignIO with the format set to 'msf'.

chris

From bosborne11 at verizon.net  Tue Dec 12 11:51:05 2006
From: bosborne11 at verizon.net (Brian Osborne)
Date: Tue, 12 Dec 2006 11:51:05 -0500
Subject: [Bioperl-l] needle parser in bioperl?
In-Reply-To: <764978cf0612120414o1eb77e28l1132eb4fa4cd9e1d@mail.gmail.com>
Message-ID: <C1A44829.BE76%bosborne11@verizon.net>

Neeti,

EMBOSS' needle and water produce alignments in what Bioperl calls 'emboss'
format, so you can use AlignIO to get SimpleAlign objects. The best
description of how to use SimpleAlign is the documentation in the module.

Brian O.


On 12/12/06 7:14 AM, "neeti somaiya" <neetisomaiya at gmail.com> wrote:

> Hi,
> 
> Does anyone know of a bioperl parser for needle output? Basically I
> want to know where the target sequence aligns on the template (i.e.
> the coordinate on the template where the target aligns).


From kaboroev at sfu.ca  Tue Dec 12 12:14:39 2006
From: kaboroev at sfu.ca (Keith Anthony Boroevich)
Date: Tue, 12 Dec 2006 09:14:39 -0800
Subject: [Bioperl-l] BLAST reports
Message-ID: <457EE37F.4020000@sfu.ca>

Hi everyone,

I would like to manipulate my blast results with bioperl but would also
like to have the html output of the blast.  What would be the best way
of going about this?  I don't see any write functions in any of the
blast modules I have looked at.  Would it be better to create my own
html layout from the blast data than attempt to recover this from bioperl?

keith

p.s. - does anyone know what the most informative blast "alignment view"
output is? xml i suppose?

-- 
 ><)))?> -cGRASP- <?(((><
 Keith Anthony Boroevich
 Davidson Lab
 Dept of Molecular Biology
 Simon Fraser University
 Tel: 604-268-7276


From cjfields at uiuc.edu  Tue Dec 12 13:45:05 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 12 Dec 2006 12:45:05 -0600
Subject: [Bioperl-l] Using frame info from GFF in getting
	aSeq->spliced_seq
In-Reply-To: <B9182BFF5B004245BABC12956EA6322E0254DF2D@huls5.nucleus.harvard.edu>
References: <B9182BFF5B004245BABC12956EA6322E0254DF2D@huls5.nucleus.harvard.edu>
Message-ID: <E073C68D-F5FD-4C48-A3E4-925B696E956A@uiuc.edu>


On Dec 12, 2006, at 10:05 AM, Amir Karger wrote:
...

> http://fungal.genome.duke.edu/annotations/sbay/gff/saccharomyces_bayanus.20031001.AUGUSTUS.gff3.gz
> (Thanks for a Really Useful site, Jason!)
>
> c127 (for example) has two lines in that file:
> sbay_c127       AUGUSTUS        mRNA    263     723     .       +       .       ID=sbay_c127-g1.1
> sbay_c127       AUGUSTUS        CDS     263     723     .       +       1       Parent=sbay_c127-g1.1
>
> Now go to gbrowse page:
> http://fungal.genome.duke.edu/cgi-bin/gbrowse/sbay/
> Type "sbay_c127:250-300" in the search box.
>
> As you can see from the translation track, if you start at bp 263, you
> hit a stop codon after just a few aas. But if you use frame2/phase 1,
> you get no stop codons all the way to the end of the contig.

Yes, but there are two things.  First, there is no distinct start  
codon.  Second, this is what the top NCBI BLASTX hit for that  
particular exon is:

 >gi|6323195|ref|NP_013267.1| Essential 100kDa subunit of the exocyst
complex (Sec3p, Sec5p, Sec6p, Sec8p, Sec10p, Sec15p, Exo70p, and Exo84p),
which has the essential function of mediating polarized targeting of
secretory vesicles to active sites of exocytosis; Sec10p [Saccharomyces
cerevisiae]
  gi|2498891|sp|Q06245|SEC10_YEAST Exocyst complex component SEC10
  gi|1234854|gb|AAB67490.1| L9362.12 gene product
  gi|1781307|emb|CAA70041.1| 100 kD exocyst complex component [Saccharomyces cerevisiae]
Length=871

  Score =  285 bits (728),  Expect = 7e-77
  Identities = 141/152 (92%), Positives = 149/152 (98%), Gaps = 0/152 (0%)
  Frame = +2

Query  2     FNDFYSMGKSDIVEQLRLSKNWKFNLKSVILMKNLLILSSKLETNSIPKTINTKLIIEKY  181
             +NDFYSMGKSDIVEQLRLSKNWK NLKSV LMKNLLILSSKLET+SIPKTINTKL+IEKY
Sbjct  168   YNDFYSMGKSDIVEQLRLSKNWKLNLKSVKLMKNLLILSSKLETSSIPKTINTKLVIEKY  227

Query  182   SEMMENKLLENFNSAYRENNFTKLNEIAIILNNFNGGVNVIQSFINQHDYFIDTKQIDLE  361
             SEMMEN+LLENFNSAYRENNFTKLNEIAIILNNFNGGVNVIQSFINQHDYFIDTKQIDLE
Sbjct  228   SEMMENELLENFNSAYRENNFTKLNEIAIILNNFNGGVNVIQSFINQHDYFIDTKQIDLE  287

Query  362   NEFENVFIKNVKFKERLVDFESHSVIVEASMQ  457
             NEFENVFIKNVKFKE+L+DFE+HSVI+E SMQ
Sbjct  288   NEFENVFIKNVKFKEQLIDFENHSVIIETSMQ  319


Note the hit starts at residue 168 of Sec10p, well into its coding
sequence.  Both the lack of a start codon and the above BLASTX hit
suggest this is not actually the first exon in the coding region.
Therefore the sequence retrieved from spliced_seq() is only part of
the full coding region (it seems to lack at least one 3' exon as well).

>>>> You can already pass the frame or an offset to
>>>> PrimarySeqI::translate(). We could add a '-phase' argument for
>>>> convenience which accepts 0,1,2.
>>>
>>> What if I want to get the RNA sequence that will become the
>>> protein? Then having a phase arg to translate() doesn't help.
>>> Should there be a phase arg to spliced_seq?
>>
>> You'll also note Jason mentioned there were possible errors in the
>> gene prediction programs which produced the output.
>
> That's certainly possible. No gene prediction program will be perfect.
> In this case, though, it's clear that it found a large region without
> stop codons in it, and correctly identified the place to start
> translating. I guess I'm just surprised that, if it found just one
> exon in a gene (in the whole contig) why it would say the exon starts
> at 263 with a phase 1, instead of just saying it starts at 264.

Maybe the gene prediction didn't find the first exon, or didn't tie  
the predicted exons together.  Not unusual considering the number of  
predictions made.

>> spliced_seq() is supposed to return the DNA sequence of a split
>> location by splicing together the sublocation sequences in their
>> 'join' order.  So, if the first exon was out of phase, once spliced
>> they should all be out of phase to the same degree, assuming all
>> exons are joined together correctly.   Translating this using the
>> phase should produce the correct amino acid sequence.
>>
>> Note that Jason suggested passing the frame/phase of the first exon
>> to translate(), not spliced_seq().  I also suggested translate().
>
> You're right. This brings the number of translated polypeptide
> sequences that have lots of *s in them to 9 instead of 90.
>
> I guess I have two requests here. The first is, if a person wants
> to see exactly which bps are translated to aas -- a nucleotide
> sequence of exactly 3N bp starting (usually) with ATG -- then they
> might want an argument to spliced_seq that skips the first one or two
> bp when necessary. After all, they might want to study the DNA, not
> the peptides.
>
> The second request is for "intelligent objects". If my SeqFeatures
> know that they're in phase 1, then when I call spliced_seq I want the
> resulting objects to know that they're phase one, such that when I
> call translate, Bioperl automatically skips the first bp or two.
> Admittedly, there might be big ramifications to this.
>
> Both requests of course made in the knowledge that Bioperl is open
> source & developers have a lot to do with their time.
>
> -Amir Karger

You may want to post these as enhancement requests to Bugzilla just  
so we can keep track.  I think passing a phase parameter to  
spliced_seq() can be easily accomplished; it's just a matter of  
returning a subseq of the spliced sequence based on the phase if  
set.  In fact, I am testing it out now.
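
A sketch of the phase-aware trimming being discussed, written as a plain
string function rather than as a change to spliced_seq() itself; the
helper name phase_trimmed and its arguments are hypothetical. The caller
passes the phase explicitly, and dropping a trailing partial codon is a
separate opt-in flag. On a BioPerl sequence object the equivalent would
presumably be a trunc() call starting at position phase + 1:

```perl
use strict;
use warnings;

# Hypothetical helper: drop the first $phase bases of a spliced CDS and,
# if $whole_codons is true, any trailing partial codon, leaving 3N bases.
sub phase_trimmed {
    my ($spliced, $phase, $whole_codons) = @_;
    die "phase must be 0, 1 or 2\n" unless defined $phase && $phase =~ /^[012]$/;
    my $cds = substr($spliced, $phase);
    if ($whole_codons) {
        my $len = length($cds) - length($cds) % 3;
        $cds = substr($cds, 0, $len);
    }
    return $cds;
}

my $spliced = 'CATGAAACCCGG';                 # made-up spliced sequence
my $trimmed = phase_trimmed($spliced, 1, 1);
print "$trimmed\n";                           # prints ATGAAACCC
```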

The second may be more problematic, since there may be a time when  
one would want those extra nucleotides, so I don't think we would  
want removal of said nucleotides to be the default behavior.

Chris

From dmessina at wustl.edu  Tue Dec 12 13:44:29 2006
From: dmessina at wustl.edu (David Messina)
Date: Tue, 12 Dec 2006 12:44:29 -0600
Subject: [Bioperl-l] BLAST reports
In-Reply-To: <457EE37F.4020000@sfu.ca>
References: <457EE37F.4020000@sfu.ca>
Message-ID: <083B4D17-CC7A-406C-9037-4DA5DC31AA05@wustl.edu>

Hi Keith,

Take a look at:
http://www.bioperl.org/wiki/HOWTO:SearchIO

You can read in a whole bunch of different blast formats (see Table  
1), and it is possible to write out in HTML. See:

http://www.bioperl.org/wiki/HOWTO:SearchIO#Writing_and_formatting_output


I'm not sure what you mean by the most informative blast output. If  
you mean which one gives the most information, I'm pretty sure the  
standard Blast report has everything.


Dave


From neetisomaiya at gmail.com  Tue Dec 12 07:09:39 2006
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Tue, 12 Dec 2006 17:39:39 +0530
Subject: [Bioperl-l] problem in running needle
Message-ID: <764978cf0612120409tc857053s7059e62a7f8aafc8@mail.gmail.com>

I am trying to run needle on the two attached sequence files on a linux
machine. It says "Uncaught exception:  Assertion failed, raised at
ajmem.c:187".
Can anyone tell me what could be causing this?

-- 
-Neeti
Even my blood says, B positive
-------------- next part --------------
A non-text attachment was scrubbed...
Name: SEQ_1.REF
Type: application/octet-stream
Size: 44208 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061212/2f733c0d/attachment-0002.obj 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: seq_of_contig11
Type: application/octet-stream
Size: 44344 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061212/2f733c0d/attachment-0003.obj 

From cjfields at uiuc.edu  Tue Dec 12 15:55:07 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 12 Dec 2006 14:55:07 -0600
Subject: [Bioperl-l] problem in running needle
In-Reply-To: <764978cf0612120409tc857053s7059e62a7f8aafc8@mail.gmail.com>
References: <764978cf0612120409tc857053s7059e62a7f8aafc8@mail.gmail.com>
Message-ID: <E5BB270E-46D1-4A8C-A268-938FF8235B67@uiuc.edu>


On Dec 12, 2006, at 6:09 AM, neeti somaiya wrote:

> I am trying to run needle for the attached two sequence files, on a  
> linux
> machine. It says "Uncaught exception:  Assertion failed, raised at  
> ajmem.c
> :187".
> Can anyone tell me what this could be coz of?
>
> -- 
> -Neeti
> Even my blood says, B positive
> <SEQ_1.REF>
> <seq_of_contig11>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

This would be an EMBOSS error, not a BioPerl error.  Maybe the emboss  
list is the best place for this question?

http://emboss.open-bio.org/mailman/listinfo/emboss

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Tue Dec 12 16:30:30 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 12 Dec 2006 15:30:30 -0600
Subject: [Bioperl-l] Using frame info from GFF in getting
	aSeq->spliced_seq
In-Reply-To: <B9182BFF5B004245BABC12956EA6322E0254DF2D@huls5.nucleus.harvard.edu>
References: <B9182BFF5B004245BABC12956EA6322E0254DF2D@huls5.nucleus.harvard.edu>
Message-ID: <093AE0FF-3C88-4F97-B33F-836B295E3DE3@uiuc.edu>


On Dec 12, 2006, at 10:05 AM, Amir Karger wrote:

>> Note that Jason suggested passing the frame/phase of the first exon
>> to translate(), not spliced_seq().  I also suggested translate().
>
> You're right. This brings the number of translated polypeptide  
> sequences
> that have lots of *s in them to 9 instead of 90.
>
> I guess I have two requests here. The first is, if a person wants  
> to see
> exactly which bps are translated to aas -- a nucleotide sequence of
> exactly 3N bp starting (usually) with ATG -- then they might want an
> argument to spliced_seq that skips the first one or two bp when
> necessary. After all, they might want to study the DNA, not the
> peptides.
>
> The second request is for "intelligent objects". If my SeqFeatures  
> know
> that they're in phase 1, then when I call spliced_seq I want the
> resulting objects to know that they're phase one, such that when I  
> call
> translate, Bioperl automatically skips the first bp or two.  
> Admittedly,
> there might be big ramifications to this.
>
> Both requests of course made in the knowledge that Bioperl is open
> source & developers have a lot to do with their time.
>
> -Amir Karger
...

Amir,

I committed some code to CVS where I added a -phase parameter option  
to SeqFeatureI::spliced_seq().  I also added some tests to SeqFeature.t.

If you run the following after creating the SeqFeature object $sf  
(the seq object is $seq):

$sf->attach_seq($seq);

for my $phase (-1..3) {
     my $spliced = $sf->spliced_seq(-phase => $phase);
     print $spliced->seq,"\n";
     print $spliced->translate->seq,"\n";
}

You should get warnings for any value other than 0, 1, or 2.
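To make the phase idea concrete, here is a plain-Perl sketch, not BioPerl's implementation, assuming the GFF3 convention that phase is the number of leading bases to skip before the first complete codon. The helpers in_frame_nt and codons are hypothetical; the sketch trims the phase bases, drops any trailing partial codon so the result is exactly 3N bp, and splits it into codons. Unlike spliced_seq(), it dies on an invalid phase instead of warning.

```perl
use strict;
use warnings;

# Return the in-frame part of a nucleotide string: skip $phase leading bases
# (0, 1 or 2), then drop any trailing partial codon so the length is 3N.
sub in_frame_nt {
    my ($nt, $phase) = @_;
    die "phase must be 0, 1 or 2\n"
        unless defined $phase && $phase =~ /^[012]\z/;
    my $trimmed = substr($nt, $phase);
    my $usable  = length($trimmed) - length($trimmed) % 3;
    return substr($trimmed, 0, $usable);
}

# Split an in-frame nucleotide string into codons.
sub codons {
    my ($nt) = @_;
    return $nt =~ /(...)/g;
}

# Toy exon in phase 1: the leading 'G' belongs to the previous codon.
my $cds = in_frame_nt('GATGAAATAG', 1);
print join(' ', codons($cds)), "\n";    # prints "ATG AAA TAG"
```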

I'll also note that the sequence you are having trouble with  
(sbay_c127) is 712 bp, so it doesn't contain the complete coding  
region.  I used it in the test case in SeqFeature.t.

Chris

From boris.steipe at utoronto.ca  Tue Dec 12 16:26:14 2006
From: boris.steipe at utoronto.ca (Boris Steipe)
Date: Tue, 12 Dec 2006 16:26:14 -0500
Subject: [Bioperl-l] problem in running needle
In-Reply-To: <764978cf0612120409tc857053s7059e62a7f8aafc8@mail.gmail.com>
References: <764978cf0612120409tc857053s7059e62a7f8aafc8@mail.gmail.com>
Message-ID: <F0B737D0-8555-4723-8B8D-50DAFF522AC8@utoronto.ca>

Looks like a memory allocation problem. Your whole sequence is on one  
single line; inserting a line break every 80 characters or so will  
probably do the trick.
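A minimal sketch of Boris's suggestion in plain Perl, assuming the file holds a FASTA record whose sequence sits on one long line; wrap_fasta is a hypothetical helper and 80 is simply the width Boris mentions. Reading and rewriting the file with Bio::SeqIO in 'fasta' format should also give wrapped output.

```perl
use strict;
use warnings;

# Rebuild a FASTA record with the sequence wrapped at $width characters.
sub wrap_fasta {
    my ($header, $seq, $width) = @_;
    $width ||= 80;
    $seq =~ s/\s+//g;                        # remove existing whitespace/newlines
    my @lines = $seq =~ /(.{1,$width})/g;    # chunks of at most $width chars
    return join("\n", ">$header", @lines) . "\n";
}

# Toy example: a 200-residue sequence on one line becomes 80 + 80 + 40.
print wrap_fasta('seq_of_contig11', 'ACGT' x 50);
```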

HTH
Boris

On 12-Dec-06, at 7:09 AM, neeti somaiya wrote:

> I am trying to run needle for the attached two sequence files, on a  
> linux
> machine. It says "Uncaught exception:  Assertion failed, raised at  
> ajmem.c
> :187".
> Can anyone tell me what this could be coz of?
>
> -- 
> -Neeti
> Even my blood says, B positive
> <SEQ_1.REF>
> <seq_of_contig11>


From Derek.Fairley at bll.n-i.nhs.uk  Wed Dec 13 05:00:16 2006
From: Derek.Fairley at bll.n-i.nhs.uk (Fairley, Derek)
Date: Wed, 13 Dec 2006 10:00:16 -0000
Subject: [Bioperl-l] BLAST reports
In-Reply-To: <457EE37F.4020000@sfu.ca>
Message-ID: <B4B8F9CCEDA9334F819017E5D711AD1C32C657@bllmail.bll.n-i.nhs.uk>

Hi Keith,

>I would like to manipulate my blast results with bioperl but would also
>like to have the html output of the blast.  What would be the best way
>of going about this, as I don't see any write functions in any of the
>blast modules I have looked at.  Would it be better to create my own
>html layout from the blast data then attempt to recover this from bioperl?

Take a look at some of the example scripts here:
http://www.bioperl.org/wiki/Bioperl_scripts
Depending on your Bioperl installation, you may already have these in your /scripts directory or similar. The /examples/searchio/htmlwriter.pl script may be a good starting point.

>p.s. - does anyone know what the most informative blast "alignment view"
>output is? xml i suppose?

Assuming you want to get the HSPs, parsing blastxml reports seems to be the most reliable approach. Again, there's a useful script for this: take a look at /scripts/utilities/search2alnblocks.pls.

Derek.


-- 
 ><)))?> -cGRASP- <?(((><
 Keith Anthony Boroevich
 Davidson Lab
 Dept of Molecular Biology
 Simon Fraser University
 Tel: 604-268-7276



From cjfields at uiuc.edu  Wed Dec 13 13:02:14 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 13 Dec 2006 12:02:14 -0600
Subject: [Bioperl-l] Proposal for Meta data
Message-ID: <B3EF69DB-9C01-4F42-A4E4-898613D872F9@uiuc.edu>

I am working on a few things related to RNA structure and have a few  
questions, specifically about meta data.  This is sort of a proposal,  
and I would like to gauge what everyone thinks.  Jason, sorry to bug  
you, but I thought it might be something that would be of use  
phylohackathon-wise.

Heikki has several modules which add meta data to sequences  
(Bio::Seq::Meta).  In this case, the meta data is stored as a string  
(Bio::Seq::Meta) or an array (Bio::Seq::Meta::Array).  In both cases  
you can have multiple types of meta data for a sequence, each based on a  
particular tag.  However, this also assumes that the meta data is  
attached strictly to sequence data of some type.  It also doesn't  
allow mixed meta data types for a single sequence, such as attaching  
array data and string data to the same sequence.

Hence, I was thinking of having a simple, generic meta data type  
(Bio::Meta), one which could encompass simple strings  
(Bio::Meta::Simple), arrays (Bio::Meta::Array), or any other  
structured type of data.  This could be used to annotate any  
PrimarySeq, LocatableSeq, SimpleAlign, SeqFeature, or what-have-you,  
maybe in a collection (similar to AnnotationCollection).  I thought  
something like this may be of general use for any PrimarySeq  
(quality, structure), alignments like NEXUS and Stockholm,  
SeqFeatures where structure could be stored (tRNA or riboswitches), etc.
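To illustrate the proposal, here is a hypothetical container (not an existing BioPerl class; the package and method names are made up for this sketch) that stores meta data of mixed kinds, a string or an array reference, under tags on a single object and reports which kind each tag holds.

```perl
use strict;
use warnings;

# Hypothetical container, NOT an existing BioPerl class: it holds meta data
# of mixed kinds (plain strings or array references) keyed by tag.
package Hypothetical::MetaCollection;

sub new {
    my ($class) = @_;
    return bless { meta => {} }, $class;
}

# Store a string or an array reference under a tag; reject other types.
sub add_meta {
    my ($self, $tag, $value) = @_;
    die "unsupported meta value for '$tag'\n"
        if ref($value) && ref($value) ne 'ARRAY';
    $self->{meta}{$tag} = $value;
    return $self;    # allow chaining
}

sub get_meta {
    my ($self, $tag) = @_;
    return $self->{meta}{$tag};
}

# Report 'array' or 'string' so callers can treat each kind appropriately.
sub meta_kind {
    my ($self, $tag) = @_;
    return undef unless exists $self->{meta}{$tag};
    return ref($self->{meta}{$tag}) eq 'ARRAY' ? 'array' : 'string';
}

sub tags {
    my ($self) = @_;
    return sort keys %{ $self->{meta} };
}

package main;

# Toy values: a structure string and per-residue scores on the same object.
my $meta = Hypothetical::MetaCollection->new;
$meta->add_meta(ss_cons => '((((....))))')
     ->add_meta(quality => [30, 32, 28, 40]);
print join(', ', map { "$_=" . $meta->meta_kind($_) } $meta->tags), "\n";
# prints "quality=array, ss_cons=string"
```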

However, this also seems to fall into the category of sequence  
annotation.  So, would it be better to have a set of Bio::Annotation  
classes used for this purpose?

Flames and jibes welcome; I'm wearing my asbestos suit today....

chris


From stewarta at nmrc.navy.mil  Wed Dec 13 20:06:14 2006
From: stewarta at nmrc.navy.mil (Andrew Stewart)
Date: Wed, 13 Dec 2006 20:06:14 -0500
Subject: [Bioperl-l] StandAloneBlast->blastall array of Bio::Seq objects
Message-ID: <3A26D139-1963-4E47-8A70-910B3886AE18@nmrc.navy.mil>

I am trying to StandAloneBlast->blastall an array of Bio::Seq  
objects.  The documentation claims that blastall can be passed a file  
name, a Bio::Seq object, or an array of Bio::Seq objects, while the  
usage suggests that a reference to an array of Bio::Seq objects is  
what must be passed to blastall.

(from http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Tools/Run/StandAloneBlast.html#POD5)
Usage:
	$seq_array_ref = \@seq_array;  # where @seq_array is an array of Bio::Seq objects
	$blast_report = $factory->blastall(\@seq_array);

Should this be...
$report = $factory->blastall(@seq_array);
or
$report = $factory->blastall(\@seq_array);
???

And if you are blastall'ing an array of Seq objects, then does  
blastall just return one big blast report or should I be expecting an  
array of blast reports?

I've tried $report = $factory->blastall(@seq_array); which seems to  
work ok, except that when I process the results, there are only  
results for the first Seq object in the array.


-Andrew

--
Andrew Stewart
Research Assistant, Genomics Team
Navy Medical Research Center (NMRC)
Biological Defense Research Directorate (BDRD)
BDRD Annex
12300 Washington Avenue, 2nd Floor
Rockville, MD 20852

email: stewarta at nmrc.navy.mil
phone: 301-231-6700 Ext 270


From arareko at campus.iztacala.unam.mx  Wed Dec 13 20:37:27 2006
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Wed, 13 Dec 2006 19:37:27 -0600
Subject: [Bioperl-l] BioPerl page in Wikipedia
Message-ID: <4580AAD7.3000900@campus.iztacala.unam.mx>

Folks,

I've updated part of the BioPerl page on Wikipedia. It would be nice to
expand the article further, since it's tagged as a "stub". Here's the link:

http://en.wikipedia.org/wiki/BioPerl

Cheers,
Mauricio.

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From lubapardo at gmail.com  Thu Dec 14 05:54:07 2006
From: lubapardo at gmail.com (Luba Pardo)
Date: Thu, 14 Dec 2006 11:54:07 +0100
Subject: [Bioperl-l] (no subject)
Message-ID: <58ff33550612140254gc7c52afs279b65390d40cda1@mail.gmail.com>

Hello,
I am new to BioPerl and I have been trying to run the examples available in
bptutorial.pl and other basic literature. I have installed the latest
release of BioPerl, 1.5.2, in a usr/local/src directory. Any time I try to
retrieve entries from the SwissProt and EMBL databases it gives me an error.
With GenBank it seems to be fine. I wonder if the installation was not
successful, as I would expect access to these databases to be included in
the BioPerl core modules. In addition, I would like to ask whether, to run
ClustalW within BioPerl, I need to download and install it in the same
directory in which I have installed BioPerl, or whether it is included in
the Bio::Align module.
I am not sure whether this is the best place to ask these very basic
questions. If not, could anyone please refer me to the proper e-mail
address?
Thank you very much in advance.

Luba Pardo MD, PhD

From bix at sendu.me.uk  Thu Dec 14 09:10:43 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 14 Dec 2006 09:10:43 -0500
Subject: [Bioperl-l] StandAloneBlast->blastall array of Bio::Seq objects
In-Reply-To: <3A26D139-1963-4E47-8A70-910B3886AE18@nmrc.navy.mil>
References: <3A26D139-1963-4E47-8A70-910B3886AE18@nmrc.navy.mil>
Message-ID: <45815B63.1020003@sendu.me.uk>

Andrew Stewart wrote:
> I am trying to StandAloneBlast->blastall an array or Bio::Seq  
> objects.  The documentation claims that blastall can be passed a file  
> name,

You're referring to 'In addition, sequence input may be in the form of 
either a Bio::Seq object or an array of Bio::Seq objects'? I agree 
it's not clear, but supplying a reference to an array is still supplying 
an array. Anyway, I'll clarify it.


In any case, the usage for the method is what you should pay attention to:

> Usage:
> 	$seq_array_ref = \@seq_array;  # where @seq_array is an array of Bio::Seq objects
> 	$blast_report = $factory->blastall(\@seq_array);
> 
> Should this be...
> $report = $factory->blastall(@seq_array);
> or
> $report = $factory->blastall(\@seq_array);
> ???

It should be exactly what it says: a reference to the array.
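One plausible reason only the first sequence produced results when passing @seq_array directly: Perl flattens an array into the argument list, so a method that reads only its first argument sees just the first element. This sketch assumes blastall() behaves that way (not checked against the StandAloneBlast source); count_queries is a hypothetical stand-in, and the strings are placeholders for Bio::Seq objects.

```perl
use strict;
use warnings;

# Toy stand-in for a method that, like blastall() is assumed to, inspects only
# its first argument (a single object, an array reference, or a file name).
sub count_queries {
    my ($input) = @_;                 # any further arguments are ignored
    return ref($input) eq 'ARRAY' ? scalar(@$input) : 1;
}

my @seq_array = ('seqA', 'seqB', 'seqC');    # placeholders for Bio::Seq objects
print count_queries(@seq_array),  "\n";      # flattened list: prints 1
print count_queries(\@seq_array), "\n";      # array reference: prints 3
```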


> And if you are blastall'ing an array of Seq objects, then does  
> blastall just return one big blast report or should I be expecting an  
> array of blast reports?

Returns : Reference to a Blast object or BPlite object
            containing the blast report.

That means, just one big object, not an array.

From bix at sendu.me.uk  Thu Dec 14 09:42:18 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 14 Dec 2006 09:42:18 -0500
Subject: [Bioperl-l] (no subject)
In-Reply-To: <58ff33550612140254gc7c52afs279b65390d40cda1@mail.gmail.com>
References: <58ff33550612140254gc7c52afs279b65390d40cda1@mail.gmail.com>
Message-ID: <458162CA.5030803@sendu.me.uk>

Luba Pardo wrote:
> Hello, I am new bioperl and I have been trying to run the examples
> available in bptutorial.pl and other basic literature. I have
> installed the latest release of bioperl 1.5.2 in a usr/local/src
> directory. Any time I try to retrieve the SwissProt and EMBL
> databases it gives me an error.

What exactly are you trying? Paste some relevant code along with the
exact error message you get when running that code.


> I wonder if the installation was not successful, as  I would expect
> that these databases accesses were included in the modules of BioPerl
> Core.

They should work with just core installed.


> In addition, I would like to ask whether to run Clustaw within
> the setting of BioPerl I need to download and install it in the same 
> directory in which I have installed bioperl, or is it included in the
> module of Bio::Align.

The ClustalW module is in the bioperl-run package, so install that in
the same way you installed BioPerl (core). The ClustalW program itself
you need to download and install according to its own instructions. You
then tell BioPerl where you installed ClustalW by, e.g., setting an
environment variable (CLUSTALDIR).

See 
http://doc.bioperl.org/bioperl-run/Bio/Tools/Run/Alignment/Clustalw.html#DESCRIPTION
for details.
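As a quick diagnostic, the sketch below looks for a clustalw executable in the directory named by CLUSTALDIR and then on PATH. find_clustalw is a hypothetical helper, not bioperl-run code, and the module's own search order may differ; the install path in the BEGIN block is a placeholder. The Clustalw module documentation likewise shows setting CLUSTALDIR in a BEGIN block before loading Bio::Tools::Run::Alignment::Clustalw.

```perl
use strict;
use warnings;
use File::Spec;

# Placeholder install location (hypothetical); only used if CLUSTALDIR is unset.
BEGIN { $ENV{CLUSTALDIR} = '/usr/local/clustalw' unless $ENV{CLUSTALDIR} }

# Look for an executable named $name (default 'clustalw') in $ENV{CLUSTALDIR}
# first and then in each PATH directory; return its full path or undef.
sub find_clustalw {
    my ($name) = @_;
    $name = 'clustalw' unless defined $name;
    my @dirs = grep { defined($_) && length($_) } ($ENV{CLUSTALDIR}, File::Spec->path);
    for my $dir (@dirs) {
        my $candidate = File::Spec->catfile($dir, $name);
        return $candidate if -f $candidate && -x _;
    }
    return undef;
}

my $path = find_clustalw();
print defined $path ? "clustalw found at $path\n" : "clustalw not found\n";
```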


> I am not sure whether this is the best place to ask these very basic 
> questions. If not, could anyone please refer me to the proper e mail 
> account?

It's certainly the right place; I hope we can resolve your problems.


From neetisomaiya at gmail.com  Thu Dec 14 03:02:37 2006
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Thu, 14 Dec 2006 13:32:37 +0530
Subject: [Bioperl-l] needle parser in bioperl?
In-Reply-To: <C60106D0-9A11-4B67-8B3D-87DF885F1D40@uiuc.edu>
References: <764978cf0612120414o1eb77e28l1132eb4fa4cd9e1d@mail.gmail.com>
	<C60106D0-9A11-4B67-8B3D-87DF885F1D40@uiuc.edu>
Message-ID: <764978cf0612140002m2a8c4268ma4b55f12412c5e9d@mail.gmail.com>

How do I run needle on a Linux box, specifying that I want MSF format?
The help doesn't show me any format option. Is there anything available to
parse MSF format?
Please find an example alignment file attached. Here the seq_of_contig
aligns with the reference sequence (i.e. SEQ_1.REF) starting at position
(coordinate) 8918 of SEQ_1.REF. I basically want this coordinate from the
output alignment; how can I parse the result to get it?
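Once the alignment has been parsed into two gapped strings of equal length (for example from MSF output read with Bio::AlignIO), the coordinate can be recovered by walking the columns. This plain-Perl sketch assumes you also know the reference residue number at which the aligned region starts; ref_coord_of_query_start is a hypothetical helper that returns the reference position opposite the first query residue aligned to a reference residue, treating both '-' and '.' as gaps.

```perl
use strict;
use warnings;

# Walk a pairwise alignment given as two gapped strings of equal length and
# return the reference coordinate opposite the first query residue that is
# aligned to a reference residue. '-' and '.' are both treated as gaps.
sub ref_coord_of_query_start {
    my (%args) = @_;
    my ($ref, $qry, $ref_start) = @args{qw(ref query ref_start)};
    die "aligned strings differ in length\n" unless length($ref) == length($qry);
    my $pos = $ref_start - 1;              # reference residues consumed so far
    for my $i (0 .. length($ref) - 1) {
        my $r_gap = substr($ref, $i, 1) =~ /[-.]/;
        my $q_gap = substr($qry, $i, 1) =~ /[-.]/;
        $pos++ unless $r_gap;
        return $pos if !$r_gap && !$q_gap;
    }
    return undef;                          # query never aligns to a reference residue
}

# Toy alignment whose reference row starts at residue 1.
print ref_coord_of_query_start(
    ref       => 'ACGTACGTAC',
    query     => '----ACG-AC',
    ref_start => 1,
), "\n";                                   # prints 5
```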

On 12/12/06, Chris Fields <cjfields at uiuc.edu> wrote:
>
>
> On Dec 12, 2006, at 6:14 AM, neeti somaiya wrote:
>
> > Hi,
> >
> > Does anyone know of a bioperl parser for needle output, basically I
> > won't
> > where the target sequence aligns on the template (i.e. coordinate
> > on the
> > template where the taget aligns).
> >
> > --
> > -Neeti
> > Even my blood says, B positive
>
> I answered this a number of months back:
>
> http://tinyurl.com/yzlbx5
>
> Basically, newer versions of EMBOSS have changed the output for the
> AlignIO::emboss parser (which parses needle).  I don't believe the
> parser has been fixed to deal with that, but Jason has pointed out
> you can use MSF output when running needle, then parse using AlignIO
> with the format set to 'msf'.
>
> chris
>


-- 
-Neeti
Even my blood says, B positive
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 3.out
Type: application/octet-stream
Size: 204960 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061214/1416cef5/attachment-0001.obj 

From stewarta at nmrc.navy.mil  Thu Dec 14 11:34:43 2006
From: stewarta at nmrc.navy.mil (Andrew Stewart)
Date: Thu, 14 Dec 2006 11:34:43 -0500
Subject: [Bioperl-l] StandAloneBlast->blastall array of Bio::Seq objects
In-Reply-To: <45815B63.1020003@sendu.me.uk>
References: <3A26D139-1963-4E47-8A70-910B3886AE18@nmrc.navy.mil>
	<45815B63.1020003@sendu.me.uk>
Message-ID: <2DAAB59E-A4F9-4E2F-B1E5-F34376B5D1E0@nmrc.navy.mil>

Thanks for the reply, Sendu.

So I've tried passing a reference to an array of Seq objects with the  
following code...
	
	push @blast_run, $factory->blastall(\@query);  # where @query is an array of Bio::Seq objects

(In case you're wondering, I'm pushing the report into an array of  
reports because I'm running several instances of blastall with  
different parameters each time.)

....and it throws me the following exception...

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: blastall call crashed: 11 /common/bin/blastall -p  blastp  -d  "/common/data/BACILLUS.pep"  -i  /tmp/Z69hzaqEbR  -o  /tmp/02Zja7AF3E

STACK: Error::throw
STACK: Bio::Root::Root::throw /sw/lib/perl5/5.8.6/Bio/Root/Root.pm:328
STACK: Bio::Tools::Run::StandAloneBlast::_runblast /sw/lib/perl5/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:759
STACK: Bio::Tools::Run::StandAloneBlast::_generic_local_blast /sw/lib/perl5/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:706
STACK: Bio::Tools::Run::StandAloneBlast::blastall /sw/lib/perl5/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:557
STACK: main::run_blastall ./new_blast_script.pl:215
STACK: ./new_blast_script.pl:115
-----------------------------------------------------------

And % more -Nl 759 /path/to/Bio/Tools/Run/StandAloneBlast.pm returns...
757         my $status = system($commandstring);
758
759         $self->throw("$executable call crashed: $? $commandstring\n")
760           unless ($status==0) ;

So it looks like the system call isn't returning a happy $status.  At  
this point I'm pretty much stuck, though.  Blastall works just fine  
if I only send it a single Seq object.  Looking at _setinput, it  
appears a reference to an array of Seq objects should end up creating  
a multi-fasta file.  The only possibilities I can think of to explain  
this are...

- The -i file isn't being created for some reason when a (reference to an)
array of Seqs is passed
- There is something wrong with the -i file that is created and sent  
to blastall.
- Something else is wrong with the $commandstring being sent to the  
system call.

Does anyone see something here that I don't?


Thanks,
Andrew


On Dec 14, 2006, at 9:10 AM, Sendu Bala wrote:

> Andrew Stewart wrote:
>> I am trying to StandAloneBlast->blastall an array or Bio::Seq   
>> objects.  The documentation claims that blastall can be passed a  
>> file  name,
>
> You're referring to 'In addition, sequence input may be in the form  
> of either a Bio::Seq object or or an array of Bio::Seq objects'? I  
> agree its not clear, but supplying a reference to an array is still  
> supplying an array. Anyway, I'll clarify it.
>
>
> In any case, the usage for the method is what you should pay  
> attention to:
>
>> Usage:
>> 	$seq_array_ref = \@seq_array;  # where @seq_array is an array of Bio::Seq objects
>> 	$blast_report = $factory->blastall(\@seq_array);
>> Should this be...
>> $report = $factory->blastall(@seq_array);
>> or
>> $report = $factory->blastall(\@seq_array);
>> ???
>
> It should be exactly what it says. A reference to the array.
>
>
>> And if you are blastall'ing an array of Seq objects, then does   
>> blastall just return one big blast report or should I be expecting  
>> an  array of blast reports?
>
> Returns : Reference to a Blast object or BPlite object
>            containing the blast report.
>
> That means, just one big object, not an array.




From cjfields at uiuc.edu  Thu Dec 14 12:03:12 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 14 Dec 2006 11:03:12 -0600
Subject: [Bioperl-l] StandAloneBlast->blastall array of Bio::Seq objects
In-Reply-To: <2DAAB59E-A4F9-4E2F-B1E5-F34376B5D1E0@nmrc.navy.mil>
References: <3A26D139-1963-4E47-8A70-910B3886AE18@nmrc.navy.mil>
	<45815B63.1020003@sendu.me.uk>
	<2DAAB59E-A4F9-4E2F-B1E5-F34376B5D1E0@nmrc.navy.mil>
Message-ID: <88DDC5EA-C4BE-48FB-B259-B6584F5F86B1@uiuc.edu>


On Dec 14, 2006, at 10:34 AM, Andrew Stewart wrote:

> Thanks for the reply, Sendu.
>
> So I've tried passing a reference to an array of Seq objects with the
> following code...
> 	
> 	push @blast_run, $factory->blastall(\@query);  # where @query is an array of Bio::Seq objects
>
> (In case you're wondering, I'm pushing the report into an array of
> reports because I'm running several instances of blastall with
> different parameters each time.)
>
> ....and it throws me the following exception...
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: blastall call crashed: 11 /common/bin/blastall -p  blastp  -d  "/common/data/BACILLUS.pep"  -i  /tmp/Z69hzaqEbR  -o  /tmp/02Zja7AF3E
>
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /sw/lib/perl5/5.8.6/Bio/Root/Root.pm:328
> STACK: Bio::Tools::Run::StandAloneBlast::_runblast /sw/lib/perl5/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:759
> STACK: Bio::Tools::Run::StandAloneBlast::_generic_local_blast /sw/lib/perl5/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:706
> STACK: Bio::Tools::Run::StandAloneBlast::blastall /sw/lib/perl5/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:557
> STACK: main::run_blastall ./new_blast_script.pl:215
> STACK: ./new_blast_script.pl:115
> -----------------------------------------------------------
>
> And % more -Nl 759 /path/to/Bio/Tools/Run/StandAloneBlast.pm returns...
> 757         my $status = system($commandstring);
> 758
> 759         $self->throw("$executable call crashed: $? $commandstring\n")
> 760           unless ($status==0) ;
>
> So it looks like the system call isn't returning a happy $status.  At
> this point I'm pretty much stuck, though.  Blastall works just fine
> if I only send it a single Seq object.  Looking at _setinput, it
> appears a reference to an array of Seq objects should end up creating
> a multi-fasta file.  The only possibilities I can think of to explain
> this is...
>
> - The -i file isn't be created for some reason when an (ref to) array
> of Seqs is passed
> - There is something wrong with the -i file that is created and sent
> to blastall.
> - Something else is wrong with the $commandstring being sent to the
> system call.
>
> Does anyone see something here that I don't?

The error pops up when the executable returns a bad status, so maybe  
it's choking on too many input sequences (i.e. Bioperl is doing  
everything correctly, but you are attempting to BLAST too many  
sequences in one go).  How many sequences are you attempting to use  
as input?  What happens when you use fewer input sequences?
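For what it's worth, the '11' in the exception message is the raw $? left by system(). In Perl the low 7 bits of $? hold the number of the signal that killed the child and the high byte holds the exit code, so a value of 11 suggests blastall was killed by signal 11 (SIGSEGV on Linux and Mac OS X), i.e. it crashed, rather than exiting with an error code. describe_status is a hypothetical helper that decodes the value following the layout described in perlvar.

```perl
use strict;
use warnings;

# Decode a raw $? value as left by system() (layout described in perlvar).
sub describe_status {
    my ($status) = @_;
    return 'failed to start' if $status == -1;
    my $signal = $status & 127;                  # low 7 bits: terminating signal
    my $core   = ($status & 128) ? ' (core dumped)' : '';
    return "killed by signal $signal$core" if $signal;
    return 'exited with code ' . ($status >> 8); # high byte: exit code
}

print describe_status(11),  "\n";   # value from the exception: killed by signal 11
print describe_status(256), "\n";   # exited with code 1
```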

chris


From stewarta at nmrc.navy.mil  Thu Dec 14 12:49:45 2006
From: stewarta at nmrc.navy.mil (Andrew Stewart)
Date: Thu, 14 Dec 2006 12:49:45 -0500
Subject: [Bioperl-l] StandAloneBlast->blastall array of Bio::Seq objects
In-Reply-To: <88DDC5EA-C4BE-48FB-B259-B6584F5F86B1@uiuc.edu>
References: <3A26D139-1963-4E47-8A70-910B3886AE18@nmrc.navy.mil>
	<45815B63.1020003@sendu.me.uk>
	<2DAAB59E-A4F9-4E2F-B1E5-F34376B5D1E0@nmrc.navy.mil>
	<88DDC5EA-C4BE-48FB-B259-B6584F5F86B1@uiuc.edu>
Message-ID: <704E0191-A0E3-4DD2-A8F4-A0B9BE8E3AEE@nmrc.navy.mil>

> So can you look at the tempfile that is created and see if it is sane?
>
> Set -save_tempfiles => 1 when you initialize the factory object or do
> $factory->save_tempfiles(1)
> before calling the blastall.
>
> -jason
>

Jason,
I was actually wondering how to do that.  Thanks.  Odd though, it  
still doesn't seem to be saving the tempfiles.  Might not matter  
though, because...

> The error pops up when the executable returns a bad status, so  
> maybe it's choking on too many input sequences (i.e. Bioperl is  
> doing everything correctly, but you are attempting to BLAST too  
> many sequences in one go).  How many sequences are you attempting  
> to use as input?  What happens when you use fewer input sequences?
>
> chris
>

I was processing 738 sequences for input.  I cut that down to 20  
sequences and I'm getting some other exception thrown further  
downstream, so it appears you may be correct.  You don't happen to  
know what the max number of sequences that blastall allows for input,  
would ya? ;)  I suppose I'll have to break @query down into smaller  
doses or something.
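A minimal sketch of breaking @query into smaller batches, assuming each batch is then passed to blastall() as an array reference; batches is a hypothetical helper, and the batch size of 100 is an arbitrary choice, not a known blastall limit.

```perl
use strict;
use warnings;

# Split a list into consecutive batches of at most $size items (array refs).
sub batches {
    my ($size, @items) = @_;
    die "batch size must be positive\n" unless $size > 0;
    my @out;
    push @out, [ splice(@items, 0, $size) ] while @items;
    return @out;
}

my @query  = map { "seq$_" } 1 .. 738;   # placeholders for 738 Bio::Seq objects
my @chunks = batches(100, @query);
printf "%d batches, last has %d\n", scalar(@chunks), scalar(@{ $chunks[-1] });
# prints "8 batches, last has 38"

# Each batch could then go to blastall as an array reference, e.g.
# push @blast_run, $factory->blastall($_) for @chunks;
```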

Thanks,
Andrew


On Dec 14, 2006, at 12:03 PM, Chris Fields wrote:

>
> On Dec 14, 2006, at 10:34 AM, Andrew Stewart wrote:
>
>> Thanks for the reply, Sendu.
>>
>> So I've tried passing a reference to an array of Seq objects with the
>> following code...
>> 	
>> 	push @blast_run, $factory->blastall(\@query);  # where @query is an array of Bio::Seq objects
>>
>> (In case you're wondering, I'm pushing the report into an array of
>> reports because I'm running several instances of blastall with
>> different parameters each time.)
>>
>> ....and it throws me the following exception...
>>
>> ------------- EXCEPTION: Bio::Root::Exception -------------
>> MSG: blastall call crashed: 11 /common/bin/blastall -p  blastp  -d  "/common/data/BACILLUS.pep"  -i  /tmp/Z69hzaqEbR  -o  /tmp/02Zja7AF3E
>>
>> STACK: Error::throw
>> STACK: Bio::Root::Root::throw /sw/lib/perl5/5.8.6/Bio/Root/Root.pm:328
>> STACK: Bio::Tools::Run::StandAloneBlast::_runblast /sw/lib/perl5/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:759
>> STACK: Bio::Tools::Run::StandAloneBlast::_generic_local_blast /sw/lib/perl5/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:706
>> STACK: Bio::Tools::Run::StandAloneBlast::blastall /sw/lib/perl5/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:557
>> STACK: main::run_blastall ./new_blast_script.pl:215
>> STACK: ./new_blast_script.pl:115
>> -----------------------------------------------------------
>>
>> And % more -Nl 759 /path/to/Bio/Tools/Run/StandAloneBlast.pm returns...
>> 757         my $status = system($commandstring);
>> 758
>> 759         $self->throw("$executable call crashed: $? $commandstring\n")
>> 760           unless ($status==0) ;
>>
>> So it looks like the system call isn't returning a happy $status.  At
>> this point I'm pretty much stuck, though.  Blastall works just fine
>> if I only send it a single Seq object.  Looking at _setinput, it
>> appears a reference to an array of Seq objects should end up creating
>> a multi-fasta file.  The only possibilities I can think of to explain
>> this is...
>>
>> - The -i file isn't be created for some reason when an (ref to) array
>> of Seqs is passed
>> - There is something wrong with the -i file that is created and sent
>> to blastall.
>> - Something else is wrong with the $commandstring being sent to the
>> system call.
>>
>> Does anyone see something here that I don't?
>
> The error pops up when the executable returns a bad status, so  
> maybe it's choking on too many input sequences (i.e. Bioperl is  
> doing everything correctly, but you are attempting to BLAST too  
> many sequences in one go).  How many sequences are you attempting  
> to use as input?  What happens when you use fewer input sequences?
>
> chris
>




From Derek.Fairley at bll.n-i.nhs.uk  Thu Dec 14 12:58:10 2006
From: Derek.Fairley at bll.n-i.nhs.uk (Fairley, Derek)
Date: Thu, 14 Dec 2006 17:58:10 -0000
Subject: [Bioperl-l] needle parser in bioperl?
In-Reply-To: <764978cf0612140002m2a8c4268ma4b55f12412c5e9d@mail.gmail.com>
Message-ID: <B4B8F9CCEDA9334F819017E5D711AD1C32C669@bllmail.bll.n-i.nhs.uk>

Neeti,

 
From http://emboss.sourceforge.net/apps/cvs/needle.html:

 
"The results can be output in one of several styles by using the
command-line qualifier -aformat xxx, where 'xxx' is replaced by the name
of the required format. Some of the alignment formats can cope with an
unlimited number of sequences, while others are only for pairs of
sequences. 

 
The available multiple alignment format names are: unknown, multiple,
simple, fasta, msf, trace, srs 

 
The available pairwise alignment format names are: pair, markx0, markx1,
markx2, markx3, markx10, srspair, score 

 
See: http://emboss.sf.net/docs/themes/AlignFormats.html for further
information on alignment formats."

 
Based on this, I'm not sure whether you can get a pairwise alignment in
.msf format, but I can't think of a good reason why not. The BioPerl
Bio::AlignIO module will allow you to parse alignments in .msf format.
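Following the -aformat qualifier quoted above, one way to request MSF output from a Perl script is to build the needle argument list explicitly. needle_command is a hypothetical helper; the file names are the ones from Neeti's message, the gap penalties are placeholder defaults, and whether needle accepts msf for a pairwise run is untested here. The system() call is left commented out so the sketch runs without EMBOSS installed.

```perl
use strict;
use warnings;

# Build an EMBOSS needle argument list that requests MSF output.
# Gap penalties below are placeholder defaults; pass your own to override.
sub needle_command {
    my %opt = (gapopen => 10, gapextend => 0.5, @_);
    return (
        'needle',
        '-asequence', $opt{a},
        '-bsequence', $opt{b},
        '-gapopen',   $opt{gapopen},
        '-gapextend', $opt{gapextend},
        '-aformat',   'msf',
        '-outfile',   $opt{out},
    );
}

my @cmd = needle_command(a => 'SEQ_1.REF', b => 'seq_of_contig11', out => 'needle.msf');
print "@cmd\n";
# Uncomment where EMBOSS is installed:
# system(@cmd) == 0 or die "needle failed: $?\n";
```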

 
HTH,

 
Derek.

 


From cjfields at uiuc.edu  Thu Dec 14 13:36:09 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 14 Dec 2006 12:36:09 -0600
Subject: [Bioperl-l] StandAloneBlast->blastall array of Bio::Seq objects
In-Reply-To: <704E0191-A0E3-4DD2-A8F4-A0B9BE8E3AEE@nmrc.navy.mil>
References: <3A26D139-1963-4E47-8A70-910B3886AE18@nmrc.navy.mil>
	<45815B63.1020003@sendu.me.uk>
	<2DAAB59E-A4F9-4E2F-B1E5-F34376B5D1E0@nmrc.navy.mil>
	<88DDC5EA-C4BE-48FB-B259-B6584F5F86B1@uiuc.edu>
	<704E0191-A0E3-4DD2-A8F4-A0B9BE8E3AEE@nmrc.navy.mil>
Message-ID: <97FE8E3C-58F2-406F-909D-DD479E594530@uiuc.edu>


On Dec 14, 2006, at 11:49 AM, Andrew Stewart wrote:

>> So can you look at the tempfile that is created and see if it is  
>> sane?
>>
>> Set -save_tempfiles => 1 when you initialize the factory object  
>> or do
>> $factory->save_tempfiles(1)
>> before calling the blastall.
>>
>> -jason
>>
>
> Jason,
> I was actually wondering how to do that.  Thanks.  Odd though, it
> still doesn't seem to be saving the tempfiles.  Might not matter

That needs to be checked out.  Can anyone verify that?

>> The error pops up when the executable returns a bad status, so
>> maybe it's choking on too many input sequences (i.e. Bioperl is
>> doing everything correctly, but you are attempting to BLAST too
>> many sequences in one go).  How many sequences are you attempting
>> to use as input?  What happens when you use fewer input sequences?
>>
>> chris
>>
>
> I was processing 738 sequences for input.  I cut that down to 20
> sequences and I'm getting some other exception thrown further
> downstream, so it appears you may be correct.  You don't happen to
> know what the max number of sequences that blastall allows for input,
> would ya? ;)  I suppose I'll have to break @query down into smaller
> doses or something.
>
> Thanks,
> Andrew

It was a shot in the dark, really.  The fact that the return status  
was bad could be due to a number of problems (permissions issues, bad  
data, etc).  The fact that a single sequence worked indicated that  
permissions and output format likely weren't to blame.  The only  
other thing left was a problem with blastall itself.
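
The 11 in "blastall call crashed: 11" is the raw wait status $? that
Perl's system() leaves behind. Its low 7 bits hold the number of any
signal that killed the child, and the exit code sits in the high byte
($? >> 8). A status of 11 therefore means blastall was killed by signal
11, which is SIGSEGV on Linux and Mac OS X, rather than exiting with an
error code. A small core-Perl sketch that decodes such a status (the
helper name describe_status is hypothetical, not part of BioPerl):

```perl
use strict;
use warnings;
use Config;

# Decode the raw wait status that system() stores in $?.
sub describe_status {
    my ($status) = @_;
    return 'failed to start' if $status == -1;
    if (my $sig = $status & 127) {
        # $Config{sig_name} lists signal names indexed by signal number.
        my @names = split ' ', $Config{sig_name};
        my $name  = defined $names[$sig] ? "SIG$names[$sig]" : "signal $sig";
        return "killed by $name" . (($status & 128) ? ' (core dumped)' : '');
    }
    return 'exited with code ' . ($status >> 8);
}

print describe_status(11), "\n";    # signal 11 (SIGSEGV on Linux/Mac OS X)
print describe_status(256), "\n";   # prints "exited with code 1"
```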

BTW, the blast docs do not indicate whether there is a maximum number  
of sequences.  There may be a point where available memory becomes  
the limiting issue.
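
Andrew's idea of breaking @query into smaller doses can be sketched in
core Perl as below. The helper name batches is hypothetical, and the
blastall call is left as a comment because it needs a configured
StandAloneBlast factory; each batch is an array reference, which is the
form blastall documents for multiple Bio::Seq objects. Smaller batches
also help narrow down which input, if any, triggers a crash.

```perl
use strict;
use warnings;

# Split a list into consecutive batches of at most $size elements.
# Returns a list of array references (the last batch may be shorter).
sub batches {
    my ($size, @items) = @_;
    die "batch size must be positive\n" unless $size > 0;
    my @out;
    push @out, [ splice(@items, 0, $size) ] while @items;
    return @out;
}

my @query = (1 .. 7);    # stand-in for an array of Bio::Seq objects
for my $batch (batches(3, @query)) {
    # hypothetical: push @blast_run, $factory->blastall($batch);
    print scalar(@$batch), "\n";    # batch sizes: 3, 3, 1
}
```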

chris


From vaughn at cshl.edu  Thu Dec 14 14:09:34 2006
From: vaughn at cshl.edu (Matthew Vaughn)
Date: Thu, 14 Dec 2006 14:09:34 -0500
Subject: [Bioperl-l] Bio::SeqFeature::Annotated and mandatory type checking
Message-ID: <637A2459-4115-466F-BD8D-036D5E9114F8@cshl.edu>

Dear all,

I'm trying to bring some of my code into compliance with BioPerl  
1.5.2 and am running into some design decisions that I am unclear on.  
Can I ask why Bio::SeqFeature::Annotated enforces mandatory checking  
of the 'type' against SOFA? It seems to me that this should be  
optional behavior as is the case with the Bio::FeatureIO family. I'd  
be happy to write the patch if there is any agreement with me on this  
case.

Thanks,

Matt

--
Matthew W. Vaughn, Ph.D.
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724

phone: (516) 367-8469



From jason at bioperl.org  Thu Dec 14 11:59:20 2006
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 14 Dec 2006 11:59:20 -0500
Subject: [Bioperl-l] StandAloneBlast->blastall array of Bio::Seq objects
In-Reply-To: <2DAAB59E-A4F9-4E2F-B1E5-F34376B5D1E0@nmrc.navy.mil>
References: <3A26D139-1963-4E47-8A70-910B3886AE18@nmrc.navy.mil>
	<45815B63.1020003@sendu.me.uk>
	<2DAAB59E-A4F9-4E2F-B1E5-F34376B5D1E0@nmrc.navy.mil>
Message-ID: <640E2BB7-33F3-44C9-B903-9DDA54F02D12@bioperl.org>

So can you look at the tempfile that is created and see if it is sane?

Set -save_tempfiles => 1 when you initialize the factory object or do
$factory->save_tempfiles(1)
before calling the blastall.
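
Once the temporary query file is kept (its path is the one after -i in
the error message, e.g. /tmp/Z69hzaqEbR), a quick core-Perl sanity
check of the multi-FASTA input might look like this. It is only a
sketch, not part of BioPerl; the name check_fasta is hypothetical, and
the set of "expected" characters (letters, '*', '-') is an assumption.

```perl
use strict;
use warnings;

# Count FASTA records and report records with an empty sequence or
# characters other than letters, '*' or '-'.
sub check_fasta {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!\n";
    my (@problems, $id, $seq, $n);
    my $flush = sub {
        return unless defined $id;
        push @problems, "$id: empty sequence"         if $seq eq '';
        push @problems, "$id: unexpected characters"  if $seq =~ /[^A-Za-z*\-]/;
    };
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/\r$//;                 # tolerate DOS line endings
        if ($line =~ /^>(\S*)/) {
            $flush->();                   # finish the previous record
            ($id, $seq) = ($1, '');
            $n++;
        } elsif (defined $id) {
            $line =~ s/\s+//g;
            $seq .= $line;
        } elsif ($line =~ /\S/) {
            push @problems, 'data before first header';
        }
    }
    $flush->();
    close $fh;
    return ($n // 0, @problems);
}

if (@ARGV) {
    my ($count, @problems) = check_fasta($ARGV[0]);
    print "$count records\n";
    print "$_\n" for @problems;
}
```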

-jason
On Dec 14, 2006, at 11:34 AM, Andrew Stewart wrote:

> Thanks for the reply, Sendu.
>
> So I've tried passing a reference to an array of Seq objects with the
> following code...
> 	
> 	push @blast_run, $factory->blastall(\@query);  # where @query is an array of Bio::Seq objects
>
> (In case you're wondering, I'm pushing the report into an array of
> reports because I'm running several instances of blastall with
> different parameters each time.)
>
> ....and it throws me the following exception...
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: blastall call crashed: 11 /common/bin/blastall -p  blastp  -d  "/common/data/BACILLUS.pep"  -i  /tmp/Z69hzaqEbR  -o  /tmp/02Zja7AF3E
>
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /sw/lib/perl5/5.8.6/Bio/Root/Root.pm:328
> STACK: Bio::Tools::Run::StandAloneBlast::_runblast /sw/lib/perl5/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:759
> STACK: Bio::Tools::Run::StandAloneBlast::_generic_local_blast /sw/lib/perl5/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:706
> STACK: Bio::Tools::Run::StandAloneBlast::blastall /sw/lib/perl5/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:557
> STACK: main::run_blastall ./new_blast_script.pl:215
> STACK: ./new_blast_script.pl:115
> -----------------------------------------------------------
>
> And % more -Nl 759 /path/to/Bio/Tools/Run/StandAloneBlast.pm  
> returns...
> 757         my $status = system($commandstring);
> 758
> 759         $self->throw("$executable call crashed: $? $commandstring\n")
> 760           unless ($status==0) ;
>
> So it looks like the system call isn't returning a happy $status.  At
> this point I'm pretty much stuck, though.  Blastall works just fine
> if I only send it a single Seq object.  Looking at _setinput, it
> appears a reference to an array of Seq objects should end up creating
> a multi-fasta file.  The only possibilities I can think of to explain
> this are...
>
> - The -i file isn't being created for some reason when a (ref to an) array
> of Seqs is passed
> - There is something wrong with the -i file that is created and sent
> to blastall.
> - Something else is wrong with the $commandstring being sent to the
> system call.
>
> Does anyone see something here that I don't?
>
>
> Thanks,
> Andrew
>
>
>
> On Dec 14, 2006, at 9:10 AM, Sendu Bala wrote:
>
>> Andrew Stewart wrote:
>>> I am trying to StandAloneBlast->blastall an array of Bio::Seq
>>> objects.  The documentation claims that blastall can be passed a
>>> file  name,
>>
>> You're referring to 'In addition, sequence input may be in the form
>> of either a Bio::Seq object or an array of Bio::Seq objects'? I
>> agree it's not clear, but supplying a reference to an array is still
>> supplying an array. Anyway, I'll clarify it.
>>
>>
>> In any case, the usage for the method is what you should pay
>> attention to:
>>
>>> Usage:
>>> 	$seq_array_ref = \@seq_array;  # where @seq_array is an array of Bio::Seq objects
>>> 	$blast_report = $factory->blastall(\@seq_array);
>>> Should this be...
>>> $report = $factory->blastall(@seq_array);
>>> or
>>> $report = $factory->blastall(\@seq_array);
>>> ???
>>
>> It should be exactly what it says. A reference to the array.
>>
>>
>>> And if you are blastall'ing an array of Seq objects, then does
>>> blastall just return one big blast report or should I be expecting
>>> an  array of blast reports?
>>
>> Returns : Reference to a Blast object or BPlite object
>>            containing the blast report.
>>
>> That means, just one big object, not an array.
>
>
>
> --
> Andrew Stewart
> Research Assistant, Genomics Team
> Navy Medical Research Center (NMRC)
> Biological Defense Research Directorate (BDRD)
> BDRD Annex
> 12300 Washington Avenue, 2nd Floor
> Rockville, MD 20852
>
> email: stewarta at nmrc.navy.mil
> phone: 301-231-6700 Ext 270
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From stewarta at nmrc.navy.mil  Thu Dec 14 16:23:07 2006
From: stewarta at nmrc.navy.mil (Andrew Stewart)
Date: Thu, 14 Dec 2006 16:23:07 -0500
Subject: [Bioperl-l] StandAloneBlast->blastall array of Bio::Seq objects
In-Reply-To: <97FE8E3C-58F2-406F-909D-DD479E594530@uiuc.edu>
References: <3A26D139-1963-4E47-8A70-910B3886AE18@nmrc.navy.mil>
	<45815B63.1020003@sendu.me.uk>
	<2DAAB59E-A4F9-4E2F-B1E5-F34376B5D1E0@nmrc.navy.mil>
	<88DDC5EA-C4BE-48FB-B259-B6584F5F86B1@uiuc.edu>
	<704E0191-A0E3-4DD2-A8F4-A0B9BE8E3AEE@nmrc.navy.mil>
	<97FE8E3C-58F2-406F-909D-DD479E594530@uiuc.edu>
Message-ID: <E1CF879B-7A07-4CE7-A0D0-C7749ECFF8FC@nmrc.navy.mil>

> It was a shot in the dark, really.  The fact that the return status  
> was bad could be due to a number of problems (permissions issues,  
> bad data, etc).  The fact that a single sequence worked indicated  
> that permissions and output format likely weren't to blame.  The  
> only other thing left was a problem with blastall itself.
>
> BTW, the blast docs do not indicate whether there is a maximum  
> number of sequences.  There may be a point where available memory  
> becomes the limiting issue.
>
> chris

Interesting.  I ran the 738-sequence dataset through blastall  
manually and the report only returned 198 of the 738 expected  
results.  Not only that, it seems to have just cut off right in the  
middle of the 198th result and a Segmentation fault was reported.   I  
removed the 198th sequence, wondering if it might be some issue with  
the input, and the segmentation fault occurred again with the results  
ending on the 210th result.  I stuck the 198th sequence back in, but  
at the start of the file and sure enough the Segmentation error  
occurred earlier.  I think we can rule out the size of the input or  
number of sequences as the source of error here.  I'm more inclined  
to think it has something to do with the blast databases being  
queried against.

I found an old discussion on a problem that sounds fairly similar to  
this one, for anyone interested.
http://bioinformatics.org/pipermail/bioclusters/2004-June/001742.html

I think I'll try to work around the problem for now.

andrew


On Dec 14, 2006, at 1:36 PM, Chris Fields wrote:

>
> On Dec 14, 2006, at 11:49 AM, Andrew Stewart wrote:
>
>>> So can you look at the tempfile that is created and see if it is  
>>> sane?
>>>
>>> Set -save_tempfiles => 1 when you initialize the factory object  
>>> or do
>>> $factory->save_tempfiles(1)
>>> before calling the blastall.
>>>
>>> -jason
>>>
>>
>> Jason,
>> I was actually wondering how to do that.  Thanks.  Odd though, it
>> still doesn't seem to be saving the tempfiles.  Might not matter
>
> That needs to be checked out.  Can anyone verify that?
>
>>> The error pops up when the executable returns a bad status, so
>>> maybe it's choking on too many input sequences (i.e. Bioperl is
>>> doing everything correctly, but you are attempting to BLAST too
>>> many sequences in one go).  How many sequences are you attempting
>>> to use as input?  What happens when you use fewer input sequences?
>>>
>>> chris
>>>
>>
>> I was processing 738 sequences for input.  I cut that down to 20
>> sequences and I'm getting some other exception thrown further
>> downstream, so it appears you may be correct.  You don't happen to
>> know what the max number of sequences that blastall allows for input,
>> would ya? ;)  I suppose I'll have to break @query down into smaller
>> doses or something.
>>
>> Thanks,
>> Andrew
>
> It was a shot in the dark, really.  The fact that the return status  
> was bad could be due to a number of problems (permissions issues,  
> bad data, etc).  The fact that a single sequence worked indicated  
> that permissions and output format likely weren't to blame.  The  
> only other thing left was a problem with blastall itself.
>
> BTW, the blast docs do not indicate whether there is a maximum  
> number of sequences.  There may be a point where available memory  
> becomes the limiting issue.
>
> chris


--
Andrew Stewart
Research Assistant, Genomics Team
Navy Medical Research Center (NMRC)
Biological Defense Research Directorate (BDRD)
BDRD Annex
12300 Washington Avenue, 2nd Floor
Rockville, MD 20852

email: stewarta at nmrc.navy.mil
phone: 301-231-6700 Ext 270


From lincoln.stein at gmail.com  Thu Dec 14 15:24:56 2006
From: lincoln.stein at gmail.com (Lincoln Stein)
Date: Thu, 14 Dec 2006 15:24:56 -0500
Subject: [Bioperl-l] Bio::Graphics xyplot
In-Reply-To: <4578951B.5050206@sfu.ca>
References: <4578951B.5050206@sfu.ca>
Message-ID: <6dce9a0b0612141224r1ef7cce2s6e6123461c3827d8@mail.gmail.com>

Hi,

The way it works is that you create a single feature that spans the entire
range of the xyplot. It contains subfeatures, each of which has a score. The
graph points correspond to each of the subfeatures.
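
A minimal sketch of that structure, assuming BioPerl's
Bio::Graphics::Panel and Bio::SeqFeature::Generic as used in the code
quoted below (GD must be installed for png output). The scores and the
output file name are placeholders. Sequence coordinates are 1-based, so
the per-base subfeatures start at position 1, not 0.

```perl
use strict;
use warnings;
use Bio::Graphics::Panel;
use Bio::SeqFeature::Generic;

# Placeholder quality scores, one per base (in practice, read them from your database).
my @scores = (12, 25, 40, 38, 30, 18);
my $len    = scalar @scores;

# One parent feature spanning the whole plotted range...
my $parent = Bio::SeqFeature::Generic->new(-start => 1, -end => $len);

# ...containing one scored subfeature per base (1-based coordinates).
for my $i (0 .. $#scores) {
    my $pos = $i + 1;
    $parent->add_SeqFeature(
        Bio::SeqFeature::Generic->new(-start => $pos, -end => $pos, -score => $scores[$i])
    );
}

my $panel = Bio::Graphics::Panel->new(-length => $len, -width => 600,
                                      -pad_left => 10, -pad_right => 10);

# The parent feature goes on the track; the xyplot glyph plots its subfeatures' scores.
$panel->add_track($parent,
                  -glyph      => 'xyplot',
                  -graph_type => 'points',
                  -min_score  => 0,
                  -max_score  => 100,
                  -scale      => 'none');

open my $out, '>', 'quality_xyplot.png' or die "Cannot write quality_xyplot.png: $!\n";
binmode $out;
print {$out} $panel->png;
close $out;
```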

Lincoln

On 12/7/06, Keith Anthony Boroevich <kaboroev at sfu.ca> wrote:
>
> Hi everyone,
>
> I'm attempting to add an xyplot of the phred quality scores to a
> Bio::Graphics image, and cannot get it to work.
> I have the panel with a track for both the scale and the DNA displaying
> properly.  When I attempt to add the xyplot I just get a garbled track
> of what looks like tiny xyplots for each datapoint.  I have the CVS
> (updated today) of bioperl-live running.  I think what I am missing is
> the creation of a "Sequence Feature Group" to hold the individual points
> of the plot.  However, I cannot seem to find such an object. This is
> what I attempted:
>
> -------BEGIN---CODE-----------
> # start panel
> my $panel = Bio::Graphics::Panel->new(-length    => $f_seqlen,
>                       -width     => $f_seqlen*10,
>                       -pad_left  => 10,
>                       -pad_right => 10,
>                       -grid      => 1
>                       );
> # add scale
> $panel->add_track(arrow =>
> Bio::SeqFeature::Generic->new(-start=>1,-end=>$f_seqlen),
>               -double  => 1,
>               -tick    => 2,
>               -fgcolor => 'black');
> # add DNA ($feature is of type Bio::SeqFeature::Annotated)
> $panel->add_track(dna => $feature);
> # get list of quality scores from database
> my ($pqs_value) = $dbh->selectrow_array($sql);
> my @pqs_value = split(/\s/,$pqs_value);
> # create track
> my $track =  $panel->add_track(-glyph        => 'xyplot',
>                    -graph_type   => 'points',
>                    -point_symbol => 'point',
>                    -max_score    => 100,
>                    -min_score    => 0,
>                    -scale        => 'none');
> # add "subfeatures" to
> for (my $i=0;$i<$f_seqlen;$i++) {
>   $track->add_feature(Bio::SeqFeature::Generic->new(-start=>$i,-end=>$i,-score=>$pqs_value[$i]));
> }
> print $panel->png();
> $panel->finished;
> ------END---CODE----------
>
> I also attempted to create an array of the point features and passed
> that by reference to the panel "add_track" as it describes in the xyplot
> documentation, but that resulted in the exact same image.
>
> keith
>
> --
> ><)))?> -cGRASP- <?(((><
> Keith Anthony Boroevich
> Davidson Lab
> Dept of Molecular Biology
> Simon Fraser University
> Tel: 604-268-7276
>


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu


From bix at sendu.me.uk  Thu Dec 14 17:15:07 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 14 Dec 2006 17:15:07 -0500
Subject: [Bioperl-l] Bio::SeqFeature::Annotated and mandatory type
	checking
In-Reply-To: <637A2459-4115-466F-BD8D-036D5E9114F8@cshl.edu>
References: <637A2459-4115-466F-BD8D-036D5E9114F8@cshl.edu>
Message-ID: <4581CCEB.20206@sendu.me.uk>

Matthew Vaughn wrote:
> Dear all,
> 
> I'm trying to bring some of my code into compliance with BioPerl 
> 1.5.2 and am running into some design decisions that I am unclear on. 
> Can I ask why Bio::SeqFeature::Annotated enforces mandatory checking of 
> the 'type' against SOFA? It seems to me that this should be optional 
> behavior as is the case with the Bio::FeatureIO family. I'd be happy to 
> write the patch if there is any agreement with me on this case.

Lots of people seem to have worked on it over the years, but perhaps 
Scott Cain is the person to talk to?

revision 1.4
date: 2004/09/25 11:41:29;  author: scain;  state: Exp;  lines: +1 -1
two things:
   * adding SOFA as an available ontology to DocumentRegistry.pm
   * modifying FeatureIO::gff to use SOFA to validate, and to parse Ontology_term

From lincoln.stein at gmail.com  Thu Dec 14 16:56:41 2006
From: lincoln.stein at gmail.com (Lincoln Stein)
Date: Thu, 14 Dec 2006 16:56:41 -0500
Subject: [Bioperl-l] [Gmod-gbrowse] xyplot data alignment problem?
In-Reply-To: <EA0BFA4F-8182-4C40-92DA-218CE3F48D8B@genomics.princeton.edu>
References: <EA0BFA4F-8182-4C40-92DA-218CE3F48D8B@genomics.princeton.edu>
Message-ID: <6dce9a0b0612141356u63afe2dak7e1d8dad93408312@mail.gmail.com>

Hi All,

I'm afraid that the xyplot glyph that is in the recent bioperl release has
an error that causes the content to be printed to the right of the correct
position. Unfortunately this wasn't caught before the release because the
glyph was only tested on very large (whole genome) features.

You will need to do a CVS update to get a fixed version from bioperl-live. A
future bugfix release of gbrowse will patch this glyph for you
automatically.

Lincoln

On 12/12/06, Kara Dolinski <kara at genomics.princeton.edu> wrote:
>
> Hi,
> I'm having a problem getting features and an xyplot properly aligned in
> Gbrowse.  For example, see this page:
>
> http://tinyurl.com/ylbq3q
>
> The feature in the "CENPK SNPs" track should actually be around the peak
> of the graph in the "CENPK prediction signal" xyplot, i.e. the SNP feature
> is at position 79, and the xyplot axes and data should span from 61 - 95.
> However, as you can see, the data in the xyplot are oddly separated from
> the axes (which seem to be in the correct place), with the data shifted over
> to about position 120-155.
> This occurs elsewhere, not just at the ends of the chromosomes.
>
> When I zoom to ~80 bp, all is well, see:
>
> http://tinyurl.com/yzav8k
>
> The relevant snippets from the GFF and the config files are below.
>
> Thanks!
> Kara
>
> GFF:
>
> chrI SNPScanner CENPK_GRAPH 61 95 41.9883 . . ID=CENPK_all_peaks;Name=CENPK_peak0;PEAK=peak0;Note=score is 41.9883
> chrI SNPScanner CENPK_CALL 79 79 41.9883 . . ID=CENPK_all_peaks;Name=CENPK_peak0;PEAK=peak0;Note=score is 41.9883
> chrI SNPScanner CENPK_SCORE 61 61 2.24506 . . ID=CENPK_all_peaks;Name=chrI61;PEAK=peak0;Note=score is 2.24506
> chrI SNPScanner CENPK_SCORE 62 62 3.26837 . . ID=CENPK_all_peaks;Name=chrI62;PEAK=peak0;Note=score is 3.26837
> chrI SNPScanner CENPK_SCORE 63 63 1.39938 . . ID=CENPK_all_peaks;Name=chrI63;PEAK=peak0;Note=score is 1.39938
> chrI SNPScanner CENPK_SCORE 64 64 1.4039 . . ID=CENPK_all_peaks;Name=chrI64;PEAK=peak0;Note=score is 1.4039
> chrI SNPScanner CENPK_SCORE 65 65 9.16134 . . ID=CENPK_all_peaks;Name=chrI65;PEAK=peak0;Note=score is 9.16134
> chrI SNPScanner CENPK_SCORE 66 66 10.1413 . . ID=CENPK_all_peaks;Name=chrI66;PEAK=peak0;Note=score is 10.1413
> chrI SNPScanner CENPK_SCORE 67 67 12.9256 . . ID=CENPK_all_peaks;Name=chrI67;PEAK=peak0;Note=score is 12.9256
> chrI SNPScanner CENPK_SCORE 68 68 13.195 . . ID=CENPK_all_peaks;Name=chrI68;PEAK=peak0;Note=score is 13.195
> chrI SNPScanner CENPK_SCORE 69 69 22.7127 . . ID=CENPK_all_peaks;Name=chrI69;PEAK=peak0;Note=score is 22.7127
> chrI SNPScanner CENPK_SCORE 70 70 23.8289 . . ID=CENPK_all_peaks;Name=chrI70;PEAK=peak0;Note=score is 23.8289
> chrI SNPScanner CENPK_SCORE 71 71 21.9123 . . ID=CENPK_all_peaks;Name=chrI71;PEAK=peak0;Note=score is 21.9123
> chrI SNPScanner CENPK_SCORE 72 72 28.3344 . . ID=CENPK_all_peaks;Name=chrI72;PEAK=peak0;Note=score is 28.3344
> chrI SNPScanner CENPK_SCORE 73 73 35.0436 . . ID=CENPK_all_peaks;Name=chrI73;PEAK=peak0;Note=score is 35.0436
> chrI SNPScanner CENPK_SCORE 74 74 37.361 . . ID=CENPK_all_peaks;Name=chrI74;PEAK=peak0;Note=score is 37.361
> chrI SNPScanner CENPK_SCORE 75 75 39.5408 . . ID=CENPK_all_peaks;Name=chrI75;PEAK=peak0;Note=score is 39.5408
> chrI SNPScanner CENPK_SCORE 76 76 28.2008 . . ID=CENPK_all_peaks;Name=chrI76;PEAK=peak0;Note=score is 28.2008
> chrI SNPScanner CENPK_SCORE 77 77 32.6254 . . ID=CENPK_all_peaks;Name=chrI77;PEAK=peak0;Note=score is 32.6254
> chrI SNPScanner CENPK_SCORE 78 78 36.0832 . . ID=CENPK_all_peaks;Name=chrI78;PEAK=peak0;Note=score is 36.0832
> chrI SNPScanner CENPK_SCORE 79 79 41.9883 . . ID=CENPK_all_peaks;Name=chrI79;PEAK=peak0;Note=score is 41.9883
> chrI SNPScanner CENPK_SCORE 80 80 32.1205 . . ID=CENPK_all_peaks;Name=chrI80;PEAK=peak0;Note=score is 32.1205
> chrI SNPScanner CENPK_SCORE 81 81 41.3048 . . ID=CENPK_all_peaks;Name=chrI81;PEAK=peak0;Note=score is 41.3048
> chrI SNPScanner CENPK_SCORE 82 82 30.7975 . . ID=CENPK_all_peaks;Name=chrI82;PEAK=peak0;Note=score is 30.7975
> chrI SNPScanner CENPK_SCORE 83 83 29.4282 . . ID=CENPK_all_peaks;Name=chrI83;PEAK=peak0;Note=score is 29.4282
> chrI SNPScanner CENPK_SCORE 84 84 35.3586 . . ID=CENPK_all_peaks;Name=chrI84;PEAK=peak0;Note=score is 35.3586
> chrI SNPScanner CENPK_SCORE 85 85 34.1426 . . ID=CENPK_all_peaks;Name=chrI85;PEAK=peak0;Note=score is 34.1426
> chrI SNPScanner CENPK_SCORE 86 86 30.2966 . . ID=CENPK_all_peaks;Name=chrI86;PEAK=peak0;Note=score is 30.2966
> chrI SNPScanner CENPK_SCORE 87 87 17.8402 . . ID=CENPK_all_peaks;Name=chrI87;PEAK=peak0;Note=score is 17.8402
> chrI SNPScanner CENPK_SCORE 88 88 15.2637 . . ID=CENPK_all_peaks;Name=chrI88;PEAK=peak0;Note=score is 15.2637
> chrI SNPScanner CENPK_SCORE 89 89 12.657 . . ID=CENPK_all_peaks;Name=chrI89;PEAK=peak0;Note=score is 12.657
> chrI SNPScanner CENPK_SCORE 90 90 10.2033 . . ID=CENPK_all_peaks;Name=chrI90;PEAK=peak0;Note=score is 10.2033
> chrI SNPScanner CENPK_SCORE 91 91 9.40143 . . ID=CENPK_all_peaks;Name=chrI91;PEAK=peak0;Note=score is 9.40143
> chrI SNPScanner CENPK_SCORE 92 92 6.56273 . . ID=CENPK_all_peaks;Name=chrI92;PEAK=peak0;Note=score is 6.56273
> chrI SNPScanner CENPK_SCORE 93 93 3.66211 . . ID=CENPK_all_peaks;Name=chrI93;PEAK=peak0;Note=score is 3.66211
> chrI SNPScanner CENPK_SCORE 94 94 0.394194 . . ID=CENPK_all_peaks;Name=chrI94;PEAK=peak0;Note=score is 0.394194
>
> CONFIG:
>
>
> GRAPH_CENPK{CENPK_SCORE/CENPK_GRAPH}
>
> [CENPK_all_scores_graph]
> feature = GRAPH_CENPK:SNPScanner
> glyph = xyplot
> graph_type = boxes
> fgcolor = purple
> bgcolor = purple
> height = 100
> min_score = 0
> max_score = 110
> label = 0
> key = CENPK prediction signal
> link =
> category = SNPs: signal graphs
>
>
> _______________________________________________
> Gmod-gbrowse mailing list
> Gmod-gbrowse at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse
>
>
>



From dmessina at wustl.edu  Thu Dec 14 20:45:24 2006
From: dmessina at wustl.edu (David Messina)
Date: Thu, 14 Dec 2006 19:45:24 -0600
Subject: [Bioperl-l] Proposal for Meta data
In-Reply-To: <B3EF69DB-9C01-4F42-A4E4-898613D872F9@uiuc.edu>
References: <B3EF69DB-9C01-4F42-A4E4-898613D872F9@uiuc.edu>
Message-ID: <5DB6475C-109D-406D-B4BA-D2248AE3F987@wustl.edu>

Hey Chris,

My thoughts below.

> [Chris]
> This could be used to annotate any
> PrimarySeq, LocatableSeq, SimpleAlign, SeqFeature, or what-have-you,
> maybe in a collection (similar to AnnotationCollection).  I thought
> something like this may be of general use for any PrimarySeq
> (quality, structure), alignments like NEXUS and Stockholm,
> SeqFeatures where structure could be stored (tRNA or riboswitches),  
> etc.
>
> However, this also seems to fall into the category of sequence
> annotation.  So, would it be better to have a set of Bio::Annotation
> classes used for this purpose?


To me, all meta data is equal. That is, your classic Genbank feature  
annotation and a user's arbitrary meta-tag like "Bob thinks this is a  
kinase domain" aren't different in kind even if they are different in  
content.

As resequencing projects multiply, the ability to create arbitrary  
meta tags, attach them to different types of objects, and use those  
tags to link them together will become desirable, if not essential.

Keeping a common interface to all of these meta data types would be  
advantageous, plus new users won't have to determine whether they  
need to use Bio::Meta objects or Bio::Annotation objects.

So I would argue for all of the meta data types to live "under one  
roof". Which roof isn't as important. Bio::Annotation, since it  
already exists for today's meta data, seems like a reasonable choice.  
(assuming Annotation objects are flexible enough to be extended as  
you propose)

There, and no flames or jibes even. :)

Dave

From cjfields at uiuc.edu  Thu Dec 14 21:21:10 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 14 Dec 2006 20:21:10 -0600
Subject: [Bioperl-l] Proposal for Meta data
In-Reply-To: <5DB6475C-109D-406D-B4BA-D2248AE3F987@wustl.edu>
References: <B3EF69DB-9C01-4F42-A4E4-898613D872F9@uiuc.edu>
	<5DB6475C-109D-406D-B4BA-D2248AE3F987@wustl.edu>
Message-ID: <9F172B90-B065-4A42-A54F-140360132B3B@uiuc.edu>


On Dec 14, 2006, at 7:45 PM, David Messina wrote:

> Hey Chris,
>
> My thoughts below.
>
>> [Chris]
>> This could be used to annotate any
>> PrimarySeq, LocatableSeq, SimpleAlign, SeqFeature, or what-have-you,
>> maybe in a collection (similar to AnnotationCollection).  I thought
>> something like this may be of general use for any PrimarySeq
>> (quality, structure), alignments like NEXUS and Stockholm,
>> SeqFeatures where structure could be stored (tRNA or riboswitches),
>> etc.
>>
>> However, this also seems to fall into the category of sequence
>> annotation.  So, would it be better to have a set of Bio::Annotation
>> classes used for this purpose?
>
>
> To me, all meta data is equal. That is, your classic Genbank feature
> annotation and a user's arbitrary meta-tag like "Bob thinks this is a
> kinase domain" aren't different in kind even if they are different in
> content.
>
> As resequencing projects multiply, the ability to create arbitrary
> meta tags, attach them to different types of objects, and use those
> tags to link them together will become desirable, if not essential.
>
> Keeping a common interface to all of these meta data types would be
> advantageous, plus new users won't have to determine whether they
> need to use Bio::Meta objects or Bio::Annotation objects.
>
> So I would argue for all of the meta data types to live "under one
> roof". Which roof isn't as important. Bio::Annotation, since it
> already exists for today's meta data, seems like a reasonable choice.
> (assuming Annotation objects are flexible enough to be extended as
> you propose)
>
> There, and no flames or jibes even. :)

I guess what I want to know is whether there should be a  
distinction between 'normal' sequence annotation (comments,  
references, and so on) and annotation that could be best described as  
position-specific (like RNA or protein structural annotation).  The  
current meta implementation is for sequence data only; I felt it  
would be nice to have a generic implementation that would be  
applicable to any object data.

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From dmessina at wustl.edu  Thu Dec 14 21:46:27 2006
From: dmessina at wustl.edu (David Messina)
Date: Thu, 14 Dec 2006 20:46:27 -0600
Subject: [Bioperl-l] Proposal for Meta data
In-Reply-To: <9F172B90-B065-4A42-A54F-140360132B3B@uiuc.edu>
References: <B3EF69DB-9C01-4F42-A4E4-898613D872F9@uiuc.edu>
	<5DB6475C-109D-406D-B4BA-D2248AE3F987@wustl.edu>
	<9F172B90-B065-4A42-A54F-140360132B3B@uiuc.edu>
Message-ID: <9C72012A-EFD7-42DD-93F8-578251CFDE01@wustl.edu>

And it all seemed so clear to me when I wrote it. :)

> whether there should be a distinction

I would argue no because it would contravene a s


> a generic implementation that would be applicable to any object data.

I wholeheartedly agree that this is the way to go. A generic  
implementation would allow arbitrary object data while maintaining a  
standard interface.


From dmessina at wustl.edu  Thu Dec 14 21:46:27 2006
From: dmessina at wustl.edu (David Messina)
Date: Thu, 14 Dec 2006 20:46:27 -0600
Subject: [Bioperl-l] Proposal for Meta data
Message-ID: <E4629E7B-E42C-4B93-869F-FE26035052A0@wustl.edu>

[oops, accidentally hit send midsentence]


And it all seemed so clear to me when I wrote it. :)


> whether there should be a distinction

I would argue no because it would contravene a standard interface.


> a generic implementation that would be applicable to any object data.

I wholeheartedly agree that this is the way to go. A generic  
implementation would allow arbitrary object data while maintaining a  
standard interface.


Dave

From neetisomaiya at gmail.com  Fri Dec 15 00:21:42 2006
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Fri, 15 Dec 2006 10:51:42 +0530
Subject: [Bioperl-l] needle parser in bioperl?
In-Reply-To: <B4B8F9CCEDA9334F819017E5D711AD1C32C669@bllmail.bll.n-i.nhs.uk>
References: <764978cf0612140002m2a8c4268ma4b55f12412c5e9d@mail.gmail.com>
	<B4B8F9CCEDA9334F819017E5D711AD1C32C669@bllmail.bll.n-i.nhs.uk>
Message-ID: <764978cf0612142121s547a54dbu54b839f71d171f81@mail.gmail.com>

Hi,

Thanks a lot for your response.
I ran needle like this
 /usr/local/bin/./needle SEQ_1.REF seq_of_contig1 -aformat msf 1.out
It gave me the output in format msf.
But now my problem is: if I use the Bio::AlignIO module of BioPerl, how can I
get the alignment start and stop coordinates on the sequence? I mean
something like $hsp->query->start, which gives the alignment start position
on the query sequence in a BLAST report when using Bio::SearchIO.
Please help.
As I explained with an example in my previous mail, I want the coordinate
where the alignment starts on the sequence.
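
A minimal core-Perl sketch of that calculation, assuming the two gapped
rows of the pairwise alignment are available as equal-length strings
and that the reference row's first residue is at coordinate $ref_start.
With BioPerl, the rows could come from Bio::AlignIO (-format => 'msf')
via $aln->each_seq, using $seq->seq for the gapped string and
$seq->start for the start coordinate; which row is the reference is
assumed to follow the order in the file. The helper name
first_aligned_ref_coord is hypothetical.

```perl
use strict;
use warnings;

# Return the reference coordinate of the first alignment column in which
# both the reference row and the query row contain a residue (not a gap).
# Gap characters '-', '.' and '~' are all treated as gaps.
sub first_aligned_ref_coord {
    my ($ref_row, $query_row, $ref_start) = @_;
    die "rows differ in length\n" unless length($ref_row) == length($query_row);
    my $gap     = qr/[-.~]/;
    my $ref_pos = $ref_start - 1;    # reference residues consumed so far
    for my $col (0 .. length($ref_row) - 1) {
        my $r = substr($ref_row,   $col, 1);
        my $q = substr($query_row, $col, 1);
        $ref_pos++ unless $r =~ $gap;
        return $ref_pos if $r !~ $gap && $q !~ $gap;
    }
    return;    # no column where both rows have a residue
}

print first_aligned_ref_coord('ACGTACGTAC', '----ACG---', 1), "\n";    # prints 5
```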

~Neeti.

On 12/14/06, Fairley, Derek <Derek.Fairley at bll.n-i.nhs.uk> wrote:
>
> Neeti,
>
> From http://emboss.sourceforge.net/apps/cvs/needle.html:
>
> "The results can be output in one of several styles by using the
> command-line qualifier -aformat xxx, where 'xxx' is replaced by the name of
> the required format. Some of the alignment formats can cope with an
> unlimited number of sequences, while others are only for pairs of sequences.
>
> The available multiple alignment format names are: unknown, multiple,
> simple, fasta, msf, trace, srs
>
> The available pairwise alignment format names are: pair, markx0, markx1,
> markx2, markx3, markx10, srspair, score
>
> See: http://emboss.sf.net/docs/themes/AlignFormats.html for further
> information on alignment formats."
>
> Not sure based on this whether you can get pairwise alignment in .msf
> format; can't think of a good reason why not. The BioPerl Bio::AlignIO module
> will allow you to parse alignments in .msf format.
>
> HTH,
>
> Derek.
>
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of neeti somaiya
> Sent: 14 December 2006 08:03
> To: Chris Fields; bioperl-l
> Subject: Re: [Bioperl-l] needle parser in bioperl?
>
> How do I run needle specifying that I want the MSF format, on a Linux box?
> The help doesn't show me any format option. Is there anything available to
> parse MSF format?
>
> Please find an example alignment file attached. Here seq_of_contig
> aligns with the reference sequence (i.e. SEQ_1.REF) starting at position
> (coordinate) 8918 of SEQ_1.REF. I basically want this coordinate from the
> output alignment; how can I parse the result to get this?
>
> On 12/12/06, Chris Fields <cjfields at uiuc.edu> wrote:
> >
> > On Dec 12, 2006, at 6:14 AM, neeti somaiya wrote:
> >
> > > Hi,
> > >
> > > Does anyone know of a bioperl parser for needle output? Basically I
> > > want to know where the target sequence aligns on the template (i.e. the
> > > coordinate on the template where the target aligns).
> > >
> > > --
> > > -Neeti
> > > Even my blood says, B positive
> >
> > I answered this a number of months back:
> >
> > http://tinyurl.com/yzlbx5
> >
> > Basically, newer versions of EMBOSS have changed the output for the
> > AlignIO::emboss parser (which parses needle).  I don't believe the
> > parser has been fixed to deal with that, but Jason has pointed out
> > you can use MSF output when running needle, then parse using AlignIO
> > with the format set to 'msf'.
> >
> > chris
>
> --
> -Neeti
> Even my blood says, B positive
>


-- 
-Neeti
Even my blood says, B positive

From Derek.Fairley at bll.n-i.nhs.uk  Fri Dec 15 04:57:35 2006
From: Derek.Fairley at bll.n-i.nhs.uk (Fairley, Derek)
Date: Fri, 15 Dec 2006 09:57:35 -0000
Subject: [Bioperl-l] needle parser in bioperl?
In-Reply-To: <764978cf0612142121s547a54dbu54b839f71d171f81@mail.gmail.com>
Message-ID: <B4B8F9CCEDA9334F819017E5D711AD1C32C66A@bllmail.bll.n-i.nhs.uk>

Neeti,

In lieu of a response from a BioPerl guru... why not use Needle to generate your pairwise alignment in fasta format, rather than msf format? The sequence you want should correspond to a single HSP which you can get directly from the fasta alignment with Bio::SearchIO: http://www.bioperl.org/wiki/Module:Bio::SearchIO. You may not need to use Bio::AlignIO at all. 

Derek.


-----Original Message-----
From: neeti somaiya [mailto:neetisomaiya at gmail.com] 
Sent: 15 December 2006 05:22
To: Fairley, Derek; bioperl-l
Subject: Re: [Bioperl-l] needle parser in bioperl?

Hi,

Thanks a lot for your response.
I ran needle like this 
/usr/local/bin/./needle SEQ_1.REF seq_of_contig1 -aformat msf 1.out
It gave me the output in format msf.
But now my problem is, if I use Bio::AlignIO module of Bioperl, how can I get the alignment start and stop coordinates on the sequence. I mean something like hsp->query->start which gives us the alignment start position on query sequence in a blast output when using Bio::SearchIO.
Please help.
Like I explained with an example in my previous mail, I want the coordinate where the alignment starts on the sequence.

~Neeti.


-- 
-Neeti
Even my blood says, B positive 


From cain at cshl.edu  Fri Dec 15 00:01:36 2006
From: cain at cshl.edu (Scott Cain)
Date: Fri, 15 Dec 2006 00:01:36 -0500
Subject: [Bioperl-l] Bio::SeqFeature::Annotated and mandatory
	type	checking
In-Reply-To: <4581CCEB.20206@sendu.me.uk>
References: <637A2459-4115-466F-BD8D-036D5E9114F8@cshl.edu>
	<4581CCEB.20206@sendu.me.uk>
Message-ID: <1166158897.2569.335.camel@localhost.localdomain>

As much as I would like to take credit for this :-)  Allen Day wrote the
original code, and then Chris Fields tried to fix it so that it actually
worked :-)  I think it would be a good idea to have a validate_terms
option like Bio::FeatureIO::gff.

Scott

On Thu, 2006-12-14 at 17:15 -0500, Sendu Bala wrote:
> Matthew Vaughn wrote:
> > Dear all,
> > 
> > I'm trying to bring some of my code into compliance with the BioPerl 
> > 1.5.2 and am running into some design decisions that I am unclear on. 
> > Can I ask why Bio::SeqFeature::Annotated enforces mandatory checking of 
> > the 'type' against SOFA? It seems to me that this should be optional 
> > behavior as is the case with the Bio::FeatureIO family. I'd be happy to 
> > write the patch if there is any agreement with me on this case.
> 
> Lots of people seem to have worked on it over the years, but perhaps 
> Scott Cain is the person to talk to?
> 
> revision 1.4
> date: 2004/09/25 11:41:29;  author: scain;  state: Exp;  lines: +1 -1
> two things:
>    * adding SOFA as an available ontology to DocumentRegistry.pm
>    * modifying FeatureIO::gff to use SOFA to validate, and to parse 
> Ontology_term
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061215/021ec42f/attachment.bin 

From neetisomaiya at gmail.com  Fri Dec 15 07:46:08 2006
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Fri, 15 Dec 2006 18:16:08 +0530
Subject: [Bioperl-l] needle parser in bioperl?
In-Reply-To: <B4B8F9CCEDA9334F819017E5D711AD1C32C66A@bllmail.bll.n-i.nhs.uk>
References: <764978cf0612142121s547a54dbu54b839f71d171f81@mail.gmail.com>
	<B4B8F9CCEDA9334F819017E5D711AD1C32C66A@bllmail.bll.n-i.nhs.uk>
Message-ID: <764978cf0612150446r46e5f64tc6bf0b198cf618c5@mail.gmail.com>

I ran needle like this

/usr/local/bin/./needle SEQ_1.REF seq_of_contig1 -aformat fasta 1.out

Please find the output attached.

When I run the following :-

use Bio::SearchIO;

my $io = Bio::SearchIO->new(-file   => "1.out",
                            -format => "fasta" );

while ( my $result = $io->next_result() ) {
    while ( my $hit = $result->next_hit ) {
        print "yes\n";
    }
}


It says :-

-------------------- WARNING ---------------------
MSG: unrecognized FASTA Family report file!
---------------------------------------------------

What should I do?

~Neeti.

On 12/15/06, Fairley, Derek <Derek.Fairley at bll.n-i.nhs.uk> wrote:
>
> Neeti,
>
> In lieu of a response from a BioPerl guru... why not use Needle to
> generate your pairwise alignment in fasta format, rather than msf format?
> The sequence you want should correspond to a single HSP which you can get
> directly from the fasta alignment with Bio::SearchIO:
> http://www.bioperl.org/wiki/Module:Bio::SearchIO. You may not need to use
> Bio::AlignIO at all.
>
> Derek.
>


-- 
-Neeti
Even my blood says, B positive
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 1.out
Type: application/octet-stream
Size: 90277 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061215/34b05d03/attachment-0001.obj 

From jason at bioperl.org  Fri Dec 15 09:28:13 2006
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 15 Dec 2006 09:28:13 -0500
Subject: [Bioperl-l] Proposal for Meta data
In-Reply-To: <9F172B90-B065-4A42-A54F-140360132B3B@uiuc.edu>
References: <B3EF69DB-9C01-4F42-A4E4-898613D872F9@uiuc.edu>
	<5DB6475C-109D-406D-B4BA-D2248AE3F987@wustl.edu>
	<9F172B90-B065-4A42-A54F-140360132B3B@uiuc.edu>
Message-ID: <32BE3FCF-C788-438F-8A4A-8A586DD6C569@bioperl.org>


On Dec 14, 2006, at 9:21 PM, Chris Fields wrote:

>
> On Dec 14, 2006, at 7:45 PM, David Messina wrote:
>
>> Hey Chris,
>>
>> My thoughts below.
>>
>>> [Chris]
>>> This could be used to annotate any
>>> PrimarySeq, LocatableSeq, SimpleAlign, SeqFeature, or what-have-you,
>>> maybe in a collection (similar to AnnotationCollection).  I thought
>>> something like this may be of general use for any PrimarySeq
>>> (quality, structure), alignments like NEXUS and Stockholm,
>>> SeqFeatures where structure could be stored (tRNA or riboswitches),
>>> etc.
>>>
>>> However, this also seems to fall into the category of sequence
>>> annotation.  So, would it be better to have a set of Bio::Annotation
>>> classes used for this purpose?
>>
>>
>> To me, all meta data is equal. That is, your classic Genbank feature
>> annotation and a user's arbitrary meta-tag like "Bob thinks this is a
>> kinase domain" aren't different in kind even if they are different in
>> content.
>>
>> As resequencing projects multiply, the ability to create arbitrary
>> meta tags, attach them to different types of objects, and use those
>> tags to link them together will become desirable, if not essential.
>>
>> Keeping a common interface to all of these meta data types would be
>> advantageous, plus new users won't have to determine whether they
>> need to use Bio::Meta objects or Bio::Annotation objects.
>>
>> So I would argue for all of the meta data types to live "under one
>> roof". Which roof isn't as important. Bio::Annotation, since it
>> already exists for today's meta data, seems like a reasonable choice.
>> (assuming Annotation objects are flexible enough to be extended as
>> you propose)
>>
>> There, and no flames or jibes even. :)
>
> I guess what I want to know is whether there should be a
> distinction between 'normal' sequence annotation (comments,
> references, and so on) and annotation that could be best described as
> position-specific (like RNA or protein structural annotation).  The
> current meta implementation is for sequence data only; I felt it
> would be nice to have a generic implementation that would be
> applicable to any object data.

my stream-of-consciousness for right now:

I was thinking Bio::Annotation is where this should go - that system  
doesn't have anything about it that makes it explicitly sequence  
related. What we're trying to hammer out here on the Alignment side -
which fits with your RNA example - is having features, basically
SeqFeatures, associated with alignments so columns can be annotated
to cover things like character sets and partitions for phylogenetic
analyses.  As for data which annotates non-contiguous things like
RNA stems, we may have to be more creative about that or model it with
a splitLocation.

So currently we've added code so that an Alignment is-a  
Bio::AnnotatableI and is-a Bio::FeatureHolderI to move towards this  
end, with the goal of being able to capture more of the data that can  
be represented in a NEXUS file.

It feels more like a hack than an elegant Meta-data solution, but I
am not totally sure what kind of data you are thinking about at this
point; perhaps I need to spend more time thinking about it.
Or are you worried that the semantic mapping of the data into
features or annotations is confusing users?


From jason at bioperl.org  Fri Dec 15 09:48:32 2006
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 15 Dec 2006 09:48:32 -0500
Subject: [Bioperl-l] needle parser in bioperl?
In-Reply-To: <764978cf0612150446r46e5f64tc6bf0b198cf618c5@mail.gmail.com>
References: <764978cf0612142121s547a54dbu54b839f71d171f81@mail.gmail.com>
	<B4B8F9CCEDA9334F819017E5D711AD1C32C66A@bllmail.bll.n-i.nhs.uk>
	<764978cf0612150446r46e5f64tc6bf0b198cf618c5@mail.gmail.com>
Message-ID: <42CB9018-72CD-433E-A42F-152D63D2F584@bioperl.org>

I get the impression you are trying to use the wrong tool for the  
job.  Can you explain a little more generally what you want to do?

Semantically, FASTA in Bio::SearchIO is much different from FASTA in
Bio::AlignIO.  We explain this on the wiki; please have a look at the
FASTA page.

  - do not use Bio::SearchIO to parse multi-FASTA alignment output;
    Bio::SearchIO is for pairwise alignment reports
  - use Bio::AlignIO for multi-FASTA or MSF alignments; you just
    provide a different value to '-format'.

But none of that is going to help you get start/end for your
alignment, because that is not part of the output format. Do the
experiment of looking at the file and figuring out which fields you
actually want output; if they don't exist, then either you have a
format that won't work for your question, or you will have to
calculate the additional fields yourself.  If you are trying to align
transcripts to a genome, please consider tools that are built for it
(and referenced on the wiki) like Sim4, est2genome, exonerate, BLAT.
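
A minimal pure-Perl sketch of that extra calculation, assuming the two aligned rows (reference and query) have already been pulled out of the needle MSF or FASTA output as equal-length gapped strings with '-' or '.' as gap characters. `aligned_span_on_reference` is a hypothetical helper, not a BioPerl method: it counts reference residues column by column and reports the reference positions of the first and last columns where both rows carry a residue.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl method). Takes two equal-length gapped
# alignment rows and returns the 1-based reference positions of the first and
# last columns where both rows contain a residue ('-' and '.' count as gaps).
sub aligned_span_on_reference {
    my ($ref_row, $query_row) = @_;
    die "alignment rows differ in length\n"
        unless length($ref_row) == length($query_row);

    my $ref_pos = 0;    # reference residues seen so far (gaps not counted)
    my ($start, $end);
    for my $col (0 .. length($ref_row) - 1) {
        my $ref_is_residue   = substr($ref_row,   $col, 1) !~ /^[-.]$/;
        my $query_is_residue = substr($query_row, $col, 1) !~ /^[-.]$/;
        $ref_pos++ if $ref_is_residue;
        if ($ref_is_residue && $query_is_residue) {
            $start = $ref_pos unless defined $start;
            $end   = $ref_pos;
        }
    }
    return ($start, $end);    # both undef if the rows never overlap
}

my ($start, $end) = aligned_span_on_reference('ACGTACGTAC', '---TACG---');
print "query aligns to reference positions $start..$end\n";
# prints: query aligns to reference positions 4..7
```

Positions are counted from the first residue of the reference row as it appears in the alignment; if that row is itself a subsequence of a longer reference, its offset would need to be added.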

-jason
On Dec 15, 2006, at 7:46 AM, neeti somaiya wrote:

> I ran needle like this
>
> /usr/local/bin/./needle SEQ_1.REF seq_of_contig1 -aformat fasta 1.out
>
> Please find the output attached.
>
> When I run the following :-
>
> use Bio::SearchIO;
>
> my $io = Bio::SearchIO->new(-file   => "1.out",
>                           -format => "fasta" );
>
> while ( my $result = $io->next_result() )
> {
>       while( my $hit = $result->next_hit)
>      {
>
>               print "yes\n";
>       }
> }
>
>
> It says :-
>
> -------------------- WARNING ---------------------
> MSG: unrecognized FASTA Family report file!
> ---------------------------------------------------
>
> What should I do?
>
> ~Neeti.
>
>
>
>
> -- 
> -Neeti
> Even my blood says, B positive
> <1.out>

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html


From lubapardo at gmail.com  Fri Dec 15 11:39:11 2006
From: lubapardo at gmail.com (Luba Pardo)
Date: Fri, 15 Dec 2006 17:39:11 +0100
Subject: [Bioperl-l] NO BLAST
Message-ID: <58ff33550612150839i40409b06pe427bcd77d3f208@mail.gmail.com>

Hello,
I am having trouble using the module Bio::Tools::Run::StandAloneBlast.

I got the following error message: cannot find path to blastall.
The code I used is (modified from HOWTObeginners):

#! /local/bin/perl -w

use strict;

use Bio::Seq;
use Bio::SeqIO;
use Bio::DB::GenBank;
use Bio::Tools::Run::StandAloneBlast;

my $db_object = Bio::DB::GenBank->new;

#my $seq_ob = $db_object->get_Seq_by_id('NM_004043');

#my $seq = Bio::SeqIO->new(-file => "> out.fasta", -format => 'fasta');

#$seq->write_seq($seq_ob);

#print $seq;

my @params = (program  => 'blastn',
              database => 'db.fa');

my $blast_obj = Bio::Tools::Run::StandAloneBlast->new(@params);

my $seq_obj = Bio::Seq->new(-id  => "testquery",
                            -seq => "TTTAAATATATTTTGAAGTATAGATTATATGTT");

my $report_obj = $blast_obj->blastall($seq_obj);

my $result_obj = $report_obj->next_result;

print $result_obj->num_hits, "\n";

Whether I create a sequence de novo or retrieve one from the internet, I get
the same message.

From cjfields at uiuc.edu  Fri Dec 15 12:23:27 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 15 Dec 2006 11:23:27 -0600
Subject: [Bioperl-l] Proposal for Meta data
In-Reply-To: <32BE3FCF-C788-438F-8A4A-8A586DD6C569@bioperl.org>
References: <B3EF69DB-9C01-4F42-A4E4-898613D872F9@uiuc.edu>
	<5DB6475C-109D-406D-B4BA-D2248AE3F987@wustl.edu>
	<9F172B90-B065-4A42-A54F-140360132B3B@uiuc.edu>
	<32BE3FCF-C788-438F-8A4A-8A586DD6C569@bioperl.org>
Message-ID: <F302B7DD-C806-4A6F-ACDF-9F27A84E0BF0@uiuc.edu>


On Dec 15, 2006, at 8:28 AM, Jason Stajich wrote:

>
> On Dec 14, 2006, at 9:21 PM, Chris Fields wrote:
>
>>
>> On Dec 14, 2006, at 7:45 PM, David Messina wrote:
>>
>>> Hey Chris,
>>>
>>> My thoughts below.
>>>
>>>> [Chris]
>>>> This could be used to annotate any
>>>> PrimarySeq, LocatableSeq, SimpleAlign, SeqFeature, or what-have-you,
>>>> maybe in a collection (similar to AnnotationCollection).  I thought
>>>> something like this may be of general use for any PrimarySeq
>>>> (quality, structure), alignments like NEXUS and Stockholm,
>>>> SeqFeatures where structure could be stored (tRNA or riboswitches),
>>>> etc.
>>>>
>>>> However, this also seems to fall into the category of sequence
>>>> annotation.  So, would it be better to have a set of  
>>>> Bio::Annotation
>>>> classes used for this purpose?
>>>
>>>
>>> To me, all meta data is equal. That is, your classic Genbank feature
>>> annotation and a user's arbitrary meta-tag like "Bob thinks this  
>>> is a
>>> kinase domain" aren't different in kind even if they are  
>>> different in
>>> content.
>>>
>>> As resequencing projects multiply, the ability to create arbitrary
>>> meta tags, attach them to different types of objects, and use those
>>> tags to link them together will become desirable, if not essential.
>>>
>>> Keeping a common interface to all of these meta data types would be
>>> advantageous, plus new users won't have to determine whether they
>>> need to use Bio::Meta objects or Bio::Annotation objects.
>>>
>>> So I would argue for all of the meta data types to live "under one
>>> roof". Which roof isn't as important. Bio::Annotation, since it
>>> already exists for today's meta data, seems like a reasonable  
>>> choice.
>>> (assuming Annotation objects are flexible enough to be extended as
>>> you propose)
>>>
>>> There, and no flames or jibes even. :)
>>
>> I guess what I want to know is whether there should be a
>> distinction between 'normal' sequence annotation (comments,
>> references, and so on) and annotation that could be best described as
>> position-specific (like RNA or protein structural annotation).  The
>> current meta implementation is for sequence data only; I felt it
>> would be nice to have a generic implementation that would be
>> applicable to any object data.
>
> my stream-of-consciousness for right now:
>
> I was thinking Bio::Annotation is where this should go - that  
> system doesn't have anything about it that makes it explicitly  
> sequence related. What we're trying to hammer out here on the  
> Alignment side - which fits with your RNA example - is having
> features, basically SeqFeatures, associated with alignments so
> columns can be annotated to cover things like character sets and
> partitions for phylogenetic analyses.  As for data which annotates
> non-contiguous things like RNA stems, we may have to be more
> creative about that or model it with a splitLocation.
>
> So currently we've added code so that an Alignment is-a
> Bio::AnnotatableI and is-a Bio::FeatureHolderI to move towards this
> end, with the goal of being able to capture more of the data that
> can be represented in a NEXUS file.
>
> It feels more like a hack than an elegant Meta-data solution, but I
> am not totally sure what kind of data you are thinking about at
> this point; perhaps I need to spend more time thinking about it.
> Or are you worried that the semantic mapping of the data into
> features or annotations is confusing users?

Sorry in advance for the longish response here...

My original thought was to have a generic abstract class capable of  
positionally describing data in any other class, similar to  
Heikki's Bio::Seq::MetaI but not constrained to sequence data only.   
Implementing classes would be capable of having different data  
structures based on their use (simple string, array, AoA, AoH, AoO).   
One MetaCollection class to contain them all in a tag-like system, so  
you could have mixed data types describe the same object.  The latter  
Collection class is so similar to AnnotationCollection that I agree  
Bio::Annotation would be the best place for this.

The way I reconfigured Stockholm alignment parsing/writing is to use  
Bio::Seq::Meta objects (which are LocatableSeq).  Each Seq::Meta is  
capable of holding a sequence and several meta strings, stored as  
tags or 'names'.  However, there is no Meta object for alignments  
(for RNA/protein structure consensus and other Rfam/Pfam markup); I  
hacked around this by using a Bio::Seq::Meta w/o a seq, but I would  
rather have a generic Meta object independent of the sequence cruft.

So for this partial Pfam alignment,

Q92SV1_RHIME/122-299         LAMALNLARGI...VDADVDF..REG
#=GR Q92SV1_RHIME/122-299 pAS .........................
Q883D2_PSESM/110-290         LGLMLGLRRRL...FDGNGAV..KRS
Q8ZXP5_PYRAE/91-262          LALLLAPYKRI...IQYGEKM..KRG
#=GR Q8ZXP5_PYRAE/91-262 SS  HHHHHHHHTTH...HHHHHHX..HTT
#=GR Q8ZXP5_PYRAE/91-262 SA  00000000000...120030X..474
#=GC SS_cons                 HHHHHHHHTTH...HHHHHHH..HTT
#=GC SA_cons                 03002200312...1312414..676
#=GC seq_cons                luhhLuhsRpl...hthppth..+pG
//

'#=GC' lines would be in generic meta string objects in the  
alignment, while '#=GR' tags would be in similar meta objects in the  
relevant sequences.  As long as both aren't AnnotatableI this isn't  
an issue.
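
As a rough illustration of that split, the sketch below routes '#=GC' lines into an alignment-level hash and '#=GR' lines into per-sequence hashes keyed by sequence name and tag. Plain hashes stand in for the proposed meta objects; `split_stockholm_markup` is a hypothetical name, the code is not taken from Bio::AlignIO::stockholm, and other markup such as '#=GF' and '#=GS' is simply skipped.

```perl
use strict;
use warnings;

# Hypothetical helper: sorts Stockholm lines into sequences, per-sequence
# '#=GR' markup and alignment-wide '#=GC' markup. Plain hashes stand in for
# the proposed meta objects; '#=GF'/'#=GS' lines are skipped.
sub split_stockholm_markup {
    my @lines = @_;
    my (%seqs, %seq_meta, %aln_meta);
    for my $line (@lines) {
        next if $line =~ m{^//} || $line !~ /\S/;    # end marker, blank lines
        if ($line =~ /^#=GC\s+(\S+)\s+(\S+)/) {
            $aln_meta{$1} .= $2;                     # alignment column markup
        }
        elsif ($line =~ /^#=GR\s+(\S+)\s+(\S+)\s+(\S+)/) {
            $seq_meta{$1}{$2} .= $3;                 # residue markup for one sequence
        }
        elsif ($line =~ /^#/) {
            next;                                    # other markup not handled here
        }
        elsif ($line =~ /^(\S+)\s+(\S+)/) {
            $seqs{$1} .= $2;                         # sequence row
        }
    }
    return (\%seqs, \%seq_meta, \%aln_meta);
}

my ($seqs, $gr, $gc) = split_stockholm_markup(
    'Q8ZXP5_PYRAE/91-262          LALLLAPYKRI...IQYGEKM..KRG',
    '#=GR Q8ZXP5_PYRAE/91-262 SS  HHHHHHHHTTH...HHHHHHX..HTT',
    '#=GC SS_cons                 HHHHHHHHTTH...HHHHHHH..HTT',
    '//',
);
print "per-sequence SS: $gr->{'Q8ZXP5_PYRAE/91-262'}{SS}\n";
print "alignment SS_cons: $gc->{SS_cons}\n";
```

Appending with `.=` means an interleaved (multi-block) Stockholm file would concatenate each row's pieces in order.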

Similarly, NEXUS files which contained any position-based values  
could hold a meta string/array object in a similar tag.

The basic scheme is:

                    |--String
                    |
Annotation::Meta----|--Array
                    |
                    |--HorriblyComplexDataStruct

Then I started thinking about where this could be applied, and  
whether a true Meta object needs to be constrained only to describing  
position-based data.  This somewhat relates to this bug:

http://bugzilla.open-bio.org/show_bug.cgi?id=1825

which seems to need a simple but unconstrained hash-of-arrays-based  
meta object.

Then my head appropriately exploded...

Hope everything is going well at the hackathon!  Looks like some  
interesting stuff coming out of it.

chris

From cjfields at uiuc.edu  Fri Dec 15 12:49:45 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 15 Dec 2006 11:49:45 -0600
Subject: [Bioperl-l] Bio::SeqFeature::Annotated and mandatory
	type	checking
In-Reply-To: <1166158897.2569.335.camel@localhost.localdomain>
References: <637A2459-4115-466F-BD8D-036D5E9114F8@cshl.edu>
	<4581CCEB.20206@sendu.me.uk>
	<1166158897.2569.335.camel@localhost.localdomain>
Message-ID: <9B984087-C843-440A-B3E1-F7DEC65160E7@uiuc.edu>

On Dec 14, 2006, at 11:01 PM, Scott Cain wrote:

> As much as I would like to take credit for this :-)  Allen Day  
> wrote the
> original code, and then Chris Fields tried to fix it so that it  
> actually
> worked :-)  I think it would be a good idea to have a validate_terms
> option like Bio::FeatureIO::gff.
>
> Scott

I did ?!?  I committed a bug fix a while back:

Revision 1.34
Sun Jul 23 18:00:50 2006 UTC (4 months, 3 weeks ago) by cjfields
Branch: MAIN
CVS Tags: branch-experimental
Branch point for: branch-1-5-2
Changes since 1.33: +155 -33 lines

Bug 2026; Robert's enhancements

To tell the truth, I don't know if this is where the mandatory checks
were added; I'm not too familiar with SeqFeature::Annotated yet.

I agree with Scott (and Matthew) that SOFA checks should be  
optional.  Matthew, can you write up a patch and maybe some tests?
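A minimal sketch of what an opt-in check could look like, assuming a `-validate_terms` constructor flag in the spirit of Scott's suggestion. The class `My::Feature`, the flag spelling and the hard-coded term list are all hypothetical; this does not reproduce Bio::SeqFeature::Annotated or Bio::FeatureIO::gff. It also treats type() as a plain string, so it sidesteps the complication of type() returning an annotation object.

```perl
use strict;
use warnings;

# Hypothetical class for illustration only; it does not mirror the real
# Bio::SeqFeature::Annotated or Bio::FeatureIO::gff code.
package My::Feature;

# Tiny stand-in for a SOFA term list; real code would query the ontology.
my %KNOWN_TERMS = map { $_ => 1 } qw(gene mRNA exon CDS SNP);

sub new {
    my ($class, %args) = @_;
    # Validation is opt-in: it stays off unless -validate_terms is true.
    my $self = bless { validate_terms => $args{-validate_terms} ? 1 : 0 }, $class;
    $self->type($args{-type}) if defined $args{-type};
    return $self;
}

sub type {
    my ($self, $term) = @_;
    if (defined $term) {
        die "'$term' is not a known SOFA term\n"
            if $self->{validate_terms} && !$KNOWN_TERMS{$term};
        $self->{type} = $term;
    }
    return $self->{type};
}

package main;

# Accepted: no validation requested.
my $loose = My::Feature->new(-type => 'custom_type');
# Rejected: same type, but strict checking switched on.
my $strict_ok = eval { My::Feature->new(-type => 'custom_type', -validate_terms => 1); 1 };
print "loose type: ", $loose->type, "\n";
print "strict: ", ($strict_ok ? "accepted\n" : "rejected: $@");
```

Defaulting the flag to off keeps existing callers working with non-SOFA types, while callers who want the old strict behaviour can ask for it explicitly.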

chris


From stewarta at nmrc.navy.mil  Thu Dec 14 18:30:11 2006
From: stewarta at nmrc.navy.mil (Andrew Stewart)
Date: Thu, 14 Dec 2006 18:30:11 -0500
Subject: [Bioperl-l] Bio::SearchIO::blast::next_result exception thrown
Message-ID: <968A2A44-82C5-4505-8F50-ABC4D57171F3@nmrc.navy.mil>

I'm getting the following exception...

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: no data for midline     Posted date:  Dec 14, 2006  2:52 PM
STACK: Error::throw
STACK: Bio::Root::Root::throw /sw/lib/perl5/5.8.6/Bio/Root/Root.pm:328
STACK: Bio::SearchIO::blast::next_result /sw/lib/perl5/5.8.6/Bio/SearchIO/blast.pm:1172
STACK: main::process_reports ./new_blast_script.pl:254
STACK: ./new_blast_script.pl:132
-----------------------------------------------------------


next_result is a pretty dense chunk of code to decipher.  I was  
wondering if anyone more familiar with that code might know what the  
"no data for midline $_" exception is referring to?

For context:

    1161                if( /^((Query|Sbjct):?\s+(\-?\d+)\s*)(\S+)\s+(\-?\d+)/ ) {
    1162                    my ($full,$type,$start,$str,$end) = ($1,$2,$3,$4,$5);
    1163                    if( $str eq '-' ) {
    1164                        $i = 3 if $type eq 'Sbjct';
    1165                    } else {
    1166                        $data{$type} = $str;
    1167                    }
    1168                    $len = length($full);
    1169                    $self->{"\_$type"}->{'begin'} = $start unless $self->{"_$type"}->{'begin'};
    1170                    $self->{"\_$type"}->{'end'} = $end;
    1171                } else {
    1172                    $self->throw("no data for midline $_")
    1173                        unless (defined $_ && defined $len);
    1174                    $data{'Mid'} = substr($_,$len);
    1175                }


--
Andrew Stewart
Research Assistant, Genomics Team
Navy Medical Research Center (NMRC)
Biological Defense Research Directorate (BDRD)
BDRD Annex
12300 Washington Avenue, 2nd Floor
Rockville, MD 20852

email: stewarta at nmrc.navy.mil
phone: 301-231-6700 Ext 270


From jason at bioperl.org  Fri Dec 15 13:56:13 2006
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 15 Dec 2006 13:56:13 -0500
Subject: [Bioperl-l] Bio::SearchIO::blast::next_result exception thrown
In-Reply-To: <968A2A44-82C5-4505-8F50-ABC4D57171F3@nmrc.navy.mil>
References: <968A2A44-82C5-4505-8F50-ABC4D57171F3@nmrc.navy.mil>
Message-ID: <B07BB616-28A3-435A-9C43-38CEF0F01E53@bioperl.org>

It means it is expecting an alignment block of data and there is none
(or there is none in the context it expects) - so something
is wrong with the report and the parser gets tripped up.
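To see what the check means in isolation, the sketch below is a stripped-down re-creation of the blast.pm branch quoted in the original post, not the parser itself: it keeps the Query/Sbjct regex, the `$len` bookkeeping and the midline `substr`, and drops the `$i` and begin/end tracking. The sub name `parse_hsp_block` and the sample alignment lines are made up for illustration. The exception fires when a line that is not a Query/Sbjct line arrives before any Query/Sbjct line has set `$len`; a database-summary line such as the "Posted date" text in the reported MSG would do exactly that if the parser is still expecting alignment data.

```perl
use strict;
use warnings;

# Simplified re-creation of the quoted blast.pm branch, for illustration
# only: it keeps the Query/Sbjct regex, the $len bookkeeping and the
# midline substr, and drops the $i and begin/end tracking.
sub parse_hsp_block {
    my @lines = @_;
    my (%data, $len);
    for my $line (@lines) {
        if ($line =~ /^((Query|Sbjct):?\s+(\-?\d+)\s*)(\S+)\s+(\-?\d+)/) {
            my ($full, $type, $str) = ($1, $2, $4);
            $data{$type} = $str unless $str eq '-';
            # Width of the "Query: 1   " prefix; the midline is aligned to it.
            $len = length($full);
        }
        else {
            # Same condition as blast.pm: a non-Query/Sbjct line arrived
            # before any Query/Sbjct line told us where the columns start.
            die "no data for midline $line\n" unless defined $len;
            $data{Mid} = substr($line, $len);
        }
    }
    return \%data;
}

my $prefix = 'Query: 1   ';            # 11 characters
my $good = parse_hsp_block(
    $prefix . 'MKTAYIAK 8',
    (' ' x length $prefix) . 'MK+AYIAK',
    'Sbjct: 5   MKSAYIAK 12',
);
print "midline: $good->{Mid}\n";

# A database-summary line in the alignment state reproduces the error.
my $err = eval { parse_hsp_block('  Posted date:  Dec 14, 2006  2:52 PM'); 1 } ? '' : $@;
print "error: $err";
```

This only shows what the message means; finding out why a given report puts such a line where the parser expects alignment data still needs the failing report itself.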

I'm not sure reading the code is going to help you - what someone  
will have to do is figure out what is different about this report  
than reports that do work for the parser.
You'll do better if you just provide an example report that is  
failing as a bug report.

Providing the version of BLAST you are using and version of bioperl  
will help.  I seem to remember NCBI changing the BLAST text format so  
that will break the parser if it is a significant change.

As has been mentioned in the past, playing cat and mouse with
format changes means things will periodically break. If you need something
rock-solid that is always going to work, XML is probably the better route to go.

-jason


From cjfields at uiuc.edu  Fri Dec 15 14:21:32 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 15 Dec 2006 13:21:32 -0600
Subject: [Bioperl-l] Bio::SearchIO::blast::next_result exception thrown
In-Reply-To: <B07BB616-28A3-435A-9C43-38CEF0F01E53@bioperl.org>
References: <968A2A44-82C5-4505-8F50-ABC4D57171F3@nmrc.navy.mil>
	<B07BB616-28A3-435A-9C43-38CEF0F01E53@bioperl.org>
Message-ID: <6A0D17FA-CB98-4937-998E-11B87FB9CBBD@uiuc.edu>


On Dec 15, 2006, at 12:56 PM, Jason Stajich wrote:

> It means it is expecting alignment block of data and there is none
> (or there is none in the context it is expecting it) - so something
> is wrong with the report as it gets tripped up.
>
> I'm not sure reading the code is going to help you - what someone
> will have to do is figure out what is different about this report
> than reports that do work for the parser.
> You'll do better if you just provide an example report that is
> failing as a bug report.
>
> Providing the version of BLAST you are using and version of bioperl
> will help.  I seem to remember NCBI changing the BLAST text format so
> that will break the parser if it is a significant change.
>
> As has been mentioned in the past, this playing cat and mouse with
> format changes means things will periodically break. If you need rock-
> solid always going to work, I guess the XML is better route to go.
>
> -jason

I agree that XML is the only reliable way to go, though I have been
reading on the BioPython group about some issues with newer (2.2.13
or greater) BLAST XML output for reports containing multiple BLAST
queries.  Don't know if this affects Bioperl or not.

As for the 'midline' error, there was a similar error a while back  
(fixed for the 1.5.2 release) that had to do with extra lines in the  
alignment section in some BLAST reports.  Unless we have a demo BLAST  
report and sample code we can't do much about it (we need to  
reproduce the error in order to fix it), so the best thing to do is
file a bug report.

chris


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From vaughn at cshl.edu  Fri Dec 15 13:05:47 2006
From: vaughn at cshl.edu (Matthew Vaughn)
Date: Fri, 15 Dec 2006 13:05:47 -0500
Subject: [Bioperl-l] Bio::SeqFeature::Annotated and mandatory type
	checking
In-Reply-To: <9B984087-C843-440A-B3E1-F7DEC65160E7@uiuc.edu>
References: <637A2459-4115-466F-BD8D-036D5E9114F8@cshl.edu>
	<4581CCEB.20206@sendu.me.uk>
	<1166158897.2569.335.camel@localhost.localdomain>
	<9B984087-C843-440A-B3E1-F7DEC65160E7@uiuc.edu>
Message-ID: <ed625e0e0612151005o2641f019ndb5cf0ac6582e2d6@mail.gmail.com>

Yes, I will. I am working on it today. It's a little more complicated
to fix this than I expected because SeqFeature::Annotated->type()
returns a Bio::AnnotationI object rather than a simple scalar like it
used to.


From valiente at lsi.upc.edu  Fri Dec 15 19:45:27 2006
From: valiente at lsi.upc.edu (Gabriel Valiente)
Date: Sat, 16 Dec 2006 01:45:27 +0100
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110
	species
In-Reply-To: <4577EFD3.7090904@sendu.me.uk>
References: <F5C5C9A8-D444-4994-9769-AC5DE68F4A39@lsi.upc.edu>
	<68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu>
	<4577E4A2.5090303@sendu.me.uk>
	<B290BEF7-81D6-4C0A-9EDA-348B8A75EEC8@lsi.upc.edu>
	<4577EAAF.7030509@sendu.me.uk>
	<0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu>
	<4577EFD3.7090904@sendu.me.uk>
Message-ID: <250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu>

> I don't think that can be true. Your error message contains 'Must  
> supply
> a Bio::Taxon'. Bio::Taxon only exists in 1.5.2 (or cvs live).
>
> If you uninstall the fink installation and install 1.5.2 using cpan  
> (with root privileges by going sudo cpan) that should at least get  
> rid of the error messages...
>
>
>> The tree is not correct (I've parsed it from R to have a double
>> check) but don't know yet what the problem is with it.
>
> ... But if the tree is wrong anyway... Let me know what you find out.

I've uninstalled the fink installation and used the cvs instead, and  
the error message is gone. However, on a larger set of 190 species,  
which are all present in the NCBI taxonomy, the resulting tree has  
only 178 taxa. I suspect something must be wrong with the  
merge_lineage method in the major rewrite of the taxonomy2tree  
script. Can someone please check this? I'm attaching the 190 species  
call to the script. Thanks,

Gabriel

-------------- next part --------------
A non-text attachment was scrubbed...
Name: fetch-bork.sh
Type: application/octet-stream
Size: 7378 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061216/5e392593/attachment.obj 

From lincoln.stein at gmail.com  Fri Dec 15 11:02:27 2006
From: lincoln.stein at gmail.com (Lincoln Stein)
Date: Fri, 15 Dec 2006 11:02:27 -0500
Subject: [Bioperl-l] [Gmod-gbrowse] xyplot data alignment problem?
In-Reply-To: <6dce9a0b0612141356u63afe2dak7e1d8dad93408312@mail.gmail.com>
References: <EA0BFA4F-8182-4C40-92DA-218CE3F48D8B@genomics.princeton.edu>
	<6dce9a0b0612141356u63afe2dak7e1d8dad93408312@mail.gmail.com>
Message-ID: <6dce9a0b0612150802x354a02a8ib17fbd882379c63c@mail.gmail.com>

This is very embarrassing for me, particularly since I spent a lot of time
validating that Bio::Graphics was working properly before the 1.5.2 release
went out. How long before there is a 1.5.3 release? How about a 1.5.2.1 release?

Lincoln

On 12/14/06, Lincoln Stein <lincoln.stein at gmail.com> wrote:
>
> Hi All,
>
> I'm afraid that the xyplot glyph that is in the recent bioperl release has
> an error that causes the content to be printed to the right of the correct
> position. Unfortunately this wasn't caught before the release because the
> glyph was only tested on very large (whole genome) features.
>
> You will need to do a CVS update to get a fixed version from bioperl-live.
> A future bugfix release of gbrowse will patch this glyph for you
> automatically.
>
> Lincoln
>
> On 12/12/06, Kara Dolinski <kara at genomics.princeton.edu> wrote:
> >
> > Hi,
> > I'm having a problem getting features and an xyplot properly aligned in
> > Gbrowse.  For example, see this page:
> >
> > http://tinyurl.com/ylbq3q
> >
> > The feature in the "CENPK SNPs" track should actually be around the peak
> > of the graph in the "CENPK prediction signal" xyplot, i.e. the SNP
> > feature is at position 79, and the xyplot axes and data should span from
> > 61 - 95.  However, as you can see, the data in the xyplot are oddly
> > separated from the axes (which seem to be in the correct place), with the
> > data shifted over to about position 120-155.
> > This occurs elsewhere, not just at the ends of the chromosomes.
> >
> > When I zoom to ~80 bp, all is well, see:
> >
> > http://tinyurl.com/yzav8k
> >
> > The relevant snippets from the GFF and the config files are below.
> >
> > Thanks!
> > Kara
> >
> > GFF:
> >
> > chrI SNPScanner CENPK_GRAPH 61 95 41.9883 . . ID=CENPK_all_peaks;Name=CENPK_peak0;PEAK=peak0;Note=score is 41.9883
> > chrI SNPScanner CENPK_CALL 79 79 41.9883 . . ID=CENPK_all_peaks;Name=CENPK_peak0;PEAK=peak0;Note=score is 41.9883
> > chrI SNPScanner CENPK_SCORE 61 61 2.24506 . . ID=CENPK_all_peaks;Name=chrI61;PEAK=peak0;Note=score is 2.24506
> > chrI SNPScanner CENPK_SCORE 62 62 3.26837 . . ID=CENPK_all_peaks;Name=chrI62;PEAK=peak0;Note=score is 3.26837
> > chrI SNPScanner CENPK_SCORE 63 63 1.39938 . . ID=CENPK_all_peaks;Name=chrI63;PEAK=peak0;Note=score is 1.39938
> > chrI SNPScanner CENPK_SCORE 64 64 1.4039 . . ID=CENPK_all_peaks;Name=chrI64;PEAK=peak0;Note=score is 1.4039
> > chrI SNPScanner CENPK_SCORE 65 65 9.16134 . . ID=CENPK_all_peaks;Name=chrI65;PEAK=peak0;Note=score is 9.16134
> > chrI SNPScanner CENPK_SCORE 66 66 10.1413 . . ID=CENPK_all_peaks;Name=chrI66;PEAK=peak0;Note=score is 10.1413
> > chrI SNPScanner CENPK_SCORE 67 67 12.9256 . . ID=CENPK_all_peaks;Name=chrI67;PEAK=peak0;Note=score is 12.9256
> > chrI SNPScanner CENPK_SCORE 68 68 13.195 . . ID=CENPK_all_peaks;Name=chrI68;PEAK=peak0;Note=score is 13.195
> > chrI SNPScanner CENPK_SCORE 69 69 22.7127 . . ID=CENPK_all_peaks;Name=chrI69;PEAK=peak0;Note=score is 22.7127
> > chrI SNPScanner CENPK_SCORE 70 70 23.8289 . . ID=CENPK_all_peaks;Name=chrI70;PEAK=peak0;Note=score is 23.8289
> > chrI SNPScanner CENPK_SCORE 71 71 21.9123 . . ID=CENPK_all_peaks;Name=chrI71;PEAK=peak0;Note=score is 21.9123
> > chrI SNPScanner CENPK_SCORE 72 72 28.3344 . . ID=CENPK_all_peaks;Name=chrI72;PEAK=peak0;Note=score is 28.3344
> > chrI SNPScanner CENPK_SCORE 73 73 35.0436 . . ID=CENPK_all_peaks;Name=chrI73;PEAK=peak0;Note=score is 35.0436
> > chrI SNPScanner CENPK_SCORE 74 74 37.361 . . ID=CENPK_all_peaks;Name=chrI74;PEAK=peak0;Note=score is 37.361
> > chrI SNPScanner CENPK_SCORE 75 75 39.5408 . . ID=CENPK_all_peaks;Name=chrI75;PEAK=peak0;Note=score is 39.5408
> > chrI SNPScanner CENPK_SCORE 76 76 28.2008 . . ID=CENPK_all_peaks;Name=chrI76;PEAK=peak0;Note=score is 28.2008
> > chrI SNPScanner CENPK_SCORE 77 77 32.6254 . . ID=CENPK_all_peaks;Name=chrI77;PEAK=peak0;Note=score is 32.6254
> > chrI SNPScanner CENPK_SCORE 78 78 36.0832 . . ID=CENPK_all_peaks;Name=chrI78;PEAK=peak0;Note=score is 36.0832
> > chrI SNPScanner CENPK_SCORE 79 79 41.9883 . . ID=CENPK_all_peaks;Name=chrI79;PEAK=peak0;Note=score is 41.9883
> > chrI SNPScanner CENPK_SCORE 80 80 32.1205 . . ID=CENPK_all_peaks;Name=chrI80;PEAK=peak0;Note=score is 32.1205
> > chrI SNPScanner CENPK_SCORE 81 81 41.3048 . . ID=CENPK_all_peaks;Name=chrI81;PEAK=peak0;Note=score is 41.3048
> > chrI SNPScanner CENPK_SCORE 82 82 30.7975 . . ID=CENPK_all_peaks;Name=chrI82;PEAK=peak0;Note=score is 30.7975
> > chrI SNPScanner CENPK_SCORE 83 83 29.4282 . . ID=CENPK_all_peaks;Name=chrI83;PEAK=peak0;Note=score is 29.4282
> > chrI SNPScanner CENPK_SCORE 84 84 35.3586 . . ID=CENPK_all_peaks;Name=chrI84;PEAK=peak0;Note=score is 35.3586
> > chrI SNPScanner CENPK_SCORE 85 85 34.1426 . . ID=CENPK_all_peaks;Name=chrI85;PEAK=peak0;Note=score is 34.1426
> > chrI SNPScanner CENPK_SCORE 86 86 30.2966 . . ID=CENPK_all_peaks;Name=chrI86;PEAK=peak0;Note=score is 30.2966
> > chrI SNPScanner CENPK_SCORE 87 87 17.8402 . . ID=CENPK_all_peaks;Name=chrI87;PEAK=peak0;Note=score is 17.8402
> > chrI SNPScanner CENPK_SCORE 88 88 15.2637 . . ID=CENPK_all_peaks;Name=chrI88;PEAK=peak0;Note=score is 15.2637
> > chrI SNPScanner CENPK_SCORE 89 89 12.657 . . ID=CENPK_all_peaks;Name=chrI89;PEAK=peak0;Note=score is 12.657
> > chrI SNPScanner CENPK_SCORE 90 90 10.2033 . . ID=CENPK_all_peaks;Name=chrI90;PEAK=peak0;Note=score is 10.2033
> > chrI SNPScanner CENPK_SCORE 91 91 9.40143 . . ID=CENPK_all_peaks;Name=chrI91;PEAK=peak0;Note=score is 9.40143
> > chrI SNPScanner CENPK_SCORE 92 92 6.56273 . . ID=CENPK_all_peaks;Name=chrI92;PEAK=peak0;Note=score is 6.56273
> > chrI SNPScanner CENPK_SCORE 93 93 3.66211 . . ID=CENPK_all_peaks;Name=chrI93;PEAK=peak0;Note=score is 3.66211
> > chrI SNPScanner CENPK_SCORE 94 94 0.394194 . . ID=CENPK_all_peaks;Name=chrI94;PEAK=peak0;Note=score is 0.394194
> >
> > CONFIG:
> >
> >
> > GRAPH_CENPK{CENPK_SCORE/CENPK_GRAPH}
> >
> > [CENPK_all_scores_graph]
> > feature = GRAPH_CENPK:SNPScanner
> > glyph = xyplot
> > graph_type = boxes
> > fgcolor = purple
> > bgcolor = purple
> > height = 100
> > min_score = 0
> > max_score = 110
> > label = 0
> > key = CENPK prediction signal
> > link =
> > category = SNPs: signal graphs
> >
> >
> >
> >
> >
> >
>
>
> --
> Lincoln D. Stein
> Cold Spring Harbor Laboratory
> 1 Bungtown Road
> Cold Spring Harbor, NY 11724
> (516) 367-8380 (voice)
> (516) 367-8389 (fax)
> FOR URGENT MESSAGES & SCHEDULING,
> PLEASE CONTACT MY ASSISTANT,
> SANDRA MICHELSEN, AT michelse at cshl.edu
>


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu

From cjfields at uiuc.edu  Sat Dec 16 01:10:07 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 16 Dec 2006 00:10:07 -0600
Subject: [Bioperl-l] [Gmod-gbrowse] xyplot data alignment problem?
In-Reply-To: <6dce9a0b0612150802x354a02a8ib17fbd882379c63c@mail.gmail.com>
References: <EA0BFA4F-8182-4C40-92DA-218CE3F48D8B@genomics.princeton.edu>
	<6dce9a0b0612141356u63afe2dak7e1d8dad93408312@mail.gmail.com>
	<6dce9a0b0612150802x354a02a8ib17fbd882379c63c@mail.gmail.com>
Message-ID: <70A5E333-8CF5-49D3-84AC-7A6A02791B5C@uiuc.edu>

We could feasibly have regular point releases of the 1.5 dev. series  
for bug fixes; I guess it just depends on how often these should come  
out and what critical tests must pass for a release to go forward.   
Sendu's already done a ton of work towards getting BioPerl switched  
over to Module::Build and Test::More, and fixing bugs.  As Hilmar has  
pointed out in the past, this is a developer's series, so not every  
test needs to pass before a release goes out.

When would you like this to go out?

chris




From cjfields at uiuc.edu  Sat Dec 16 01:28:47 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 16 Dec 2006 00:28:47 -0600
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110
	species
In-Reply-To: <250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu>
References: <F5C5C9A8-D444-4994-9769-AC5DE68F4A39@lsi.upc.edu>
	<68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu>
	<4577E4A2.5090303@sendu.me.uk>
	<B290BEF7-81D6-4C0A-9EDA-348B8A75EEC8@lsi.upc.edu>
	<4577EAAF.7030509@sendu.me.uk>
	<0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu>
	<4577EFD3.7090904@sendu.me.uk>
	<250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu>
Message-ID: <C91DCC7B-E368-475D-B83A-AC301A49624B@uiuc.edu>


On Dec 15, 2006, at 6:45 PM, Gabriel Valiente wrote:

>> I don't think that can be true. Your error message contains 'Must  
>> supply
>> a Bio::Taxon'. Bio::Taxon only exists in 1.5.2 (or cvs live).
>>
>> If you uninstall the fink installation and install 1.5.2 using  
>> cpan (with root privileges by going sudo cpan) that should at  
>> least get rid of the error messages...
>>
>>
>>> The tree is not correct (I've parsed it from R to have a double
>>> check) but don't know yet what the problem is with it.
>>
>> ... But if the tree is wrong anyway... Let me know what you find out.
>
> I've uninstalled the fink installation and used the cvs instead,  
> and the error message is gone. However, on a larger set of 190  
> species, which are all present in the NCBI taxonomy, the resulting  
> tree has only 178 taxa. I suspect, something must be wrong with the  
> merge_lineage method in the major rewrite of the taxonomy2tree  
> script. Can someone please check this? I'm attaching the 190  
> species call to the script. Thanks,
>
> Gabriel

I can confirm that.  It is definitely dropping them in merge_lineage();
if you add a call to get_leaf_nodes to check how many are present
after each merge_lineage() call, you can see it dropping nodes along
the trace.

in taxonomy2tree.pl:

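# Assumes $db (a Bio::DB::Taxonomy object), @species and $tree are set
# up earlier in taxonomy2tree.pl; the printf line is the added trace.
my $ct = 0;    # index of the species currently being merged
for my $name (@species) {
   my $ncbi_id = $db->get_taxonid($name);
   if ($ncbi_id) {
     #print "Species: $name\n\tTaxID: $ncbi_id\n";
     #$ids{$ncbi_id}++;
     my $node = $db->get_taxon(-taxonid => $ncbi_id);

     if ($tree) {
       # graft this species' lineage onto the tree built so far
       $tree->merge_lineage($node);
     }
     else {
       # the first species found seeds the tree
       $tree = Bio::Tree::Tree->new(-node => $node);
     }
     # print how many leaves the tree has after each step
     my @leaves = $tree->get_leaf_nodes;
     printf("%-3d: Nodes: %-4d\n", $ct, scalar(@leaves));
   }
   else {
     warn "no NCBI Taxonomy node for species ", $name, "\n";
   }
   $ct++;
}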

chris


From bix at sendu.me.uk  Sat Dec 16 09:37:49 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Sat, 16 Dec 2006 14:37:49 +0000
Subject: [Bioperl-l] [Gmod-gbrowse] xyplot data alignment problem?
In-Reply-To: <6dce9a0b0612150802x354a02a8ib17fbd882379c63c@mail.gmail.com>
References: <EA0BFA4F-8182-4C40-92DA-218CE3F48D8B@genomics.princeton.edu>	<6dce9a0b0612141356u63afe2dak7e1d8dad93408312@mail.gmail.com>
	<6dce9a0b0612150802x354a02a8ib17fbd882379c63c@mail.gmail.com>
Message-ID: <458404BD.8030908@sendu.me.uk>

Lincoln Stein wrote:
> This is very embarrassing for me, particularly since I spent a lot of time
> validating that Bio::Graphics was working properly before the 1.5.2 release
> went out. How long before there is a 1.5.3 release? How about a 1.5.2.1 release?

I'm happy to try a point release for critical bug fixes. Why don't you 
commit the necessary fixes to branch-1-5-2 and let me know when you're 
happy, and I'll do 1.5.2.1.


Cheers,
Sendu.

From bix at sendu.me.uk  Sat Dec 16 09:47:57 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Sat, 16 Dec 2006 14:47:57 +0000
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on
	110	species
In-Reply-To: <250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu>
References: <F5C5C9A8-D444-4994-9769-AC5DE68F4A39@lsi.upc.edu>	<68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu>	<4577E4A2.5090303@sendu.me.uk>	<B290BEF7-81D6-4C0A-9EDA-348B8A75EEC8@lsi.upc.edu>	<4577EAAF.7030509@sendu.me.uk>	<0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu>	<4577EFD3.7090904@sendu.me.uk>
	<250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu>
Message-ID: <4584071D.3070005@sendu.me.uk>

Gabriel Valiente wrote:
>> I don't think that can be true. Your error message contains 'Must supply
>> a Bio::Taxon'. Bio::Taxon only exists in 1.5.2 (or cvs live).
>>
>> If you uninstall the fink installation and install 1.5.2 using cpan 
>> (with root privileges by going sudo cpan) that should at least get rid 
>> of the error messages...
>>
>>
>>> The tree is not correct (I've parsed it from R to have a double
>>> check) but don't know yet what the problem is with it.
>>
>> ... But if the tree is wrong anyway... Let me know what you find out.
> 
> I've uninstalled the fink installation and used the cvs instead, and the 
> error message is gone. However, on a larger set of 190 species, which 
> are all present in the NCBI taxonomy, the resulting tree has only 178 
> taxa. I suspect something must be wrong with the merge_lineage method
> in the major rewrite of the taxonomy2tree script. Can someone please 
> check this? I'm attaching the 190 species call to the script. Thanks,

Ok, I'll look into it. You're also welcome to see if you can take your 
own code from your original taxonomy2tree script and see if you can 
merge/replace the appropriate Bio::Tree::TreeFunctionsI methods with 
your algorithms to get it working correctly. Indeed, does your original 
version of the script work on this data set?


Cheers,
Sendu.

From cjfields at uiuc.edu  Sat Dec 16 10:18:50 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 16 Dec 2006 09:18:50 -0600
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on
	110	species
In-Reply-To: <4584071D.3070005@sendu.me.uk>
References: <F5C5C9A8-D444-4994-9769-AC5DE68F4A39@lsi.upc.edu>	<68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu>	<4577E4A2.5090303@sendu.me.uk>	<B290BEF7-81D6-4C0A-9EDA-348B8A75EEC8@lsi.upc.edu>	<4577EAAF.7030509@sendu.me.uk>	<0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu>	<4577EFD3.7090904@sendu.me.uk>
	<250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu>
	<4584071D.3070005@sendu.me.uk>
Message-ID: <6AE33842-B2E7-4E9B-B80D-68A058045818@uiuc.edu>


On Dec 16, 2006, at 8:47 AM, Sendu Bala wrote:

> Gabriel Valiente wrote:
>>> I don't think that can be true. Your error message contains 'Must supply
>>> a Bio::Taxon'. Bio::Taxon only exists in 1.5.2 (or cvs live).
>>>
>>> If you uninstall the fink installation and install 1.5.2 using cpan
>>> (with root privileges by going sudo cpan) that should at least get rid
>>> of the error messages...
>>>
>>>
>>>> The tree is not correct (I've parsed it from R to have a double
>>>> check) but don't know yet what the problem is with it.
>>>
>>> ... But if the tree is wrong anyway... Let me know what you find out.
>>
>> I've uninstalled the fink installation and used the cvs instead, and the
>> error message is gone. However, on a larger set of 190 species, which
>> are all present in the NCBI taxonomy, the resulting tree has only 178
>> taxa. I suspect something must be wrong with the merge_lineage method
>> in the major rewrite of the taxonomy2tree script. Can someone please
>> check this? I'm attaching the 190 species call to the script. Thanks,
>
> Ok, I'll look into it. You're also welcome to see if you can take your
> own code from your original taxonomy2tree script and see if you can
> merge/replace the appropriate Bio::Tree::TreeFunctionsI methods with
> your algorithms to get it working correctly. Indeed, does your original
> version of the script work on this data set?
>
>
> Cheers,
> Sendu.

Sendu,

Don't know if it helps, but when I tried Gabriel's shell script last  
night I ran a modification of taxonomy2tree to see what would pop  
up.  Everything is fine up to about 100 iterations, then merge_lineage()
starts dropping leaf nodes.

chris 
  

From bix at sendu.me.uk  Sat Dec 16 10:33:35 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Sat, 16 Dec 2006 15:33:35 +0000
Subject: [Bioperl-l] NO BLAST
In-Reply-To: <58ff33550612150839i40409b06pe427bcd77d3f208@mail.gmail.com>
References: <58ff33550612150839i40409b06pe427bcd77d3f208@mail.gmail.com>
Message-ID: <458411CF.8000707@sendu.me.uk>

Luba Pardo wrote:
> Hello,
> I am having trouble using the module Bio::Tools::Run::StandAloneBlast;
>
> I got the following error message: cannot find path to blastall.
> The code I used is (modified from HOWTObeginners):

Bioperl doesn't know where you installed blast. If you've actually 
installed it, you can set the environment variable BLASTDIR to point to 
the directory that contains the blastall executable.
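As a quick diagnostic, the plain-Perl sketch below checks whether the directory named in BLASTDIR really holds an executable called `blastall`, which is the file the error message says it cannot find. It is not part of BioPerl; the sub name `blastall_from_blastdir` is made up for illustration, and it does not handle the `blastall.exe` name used on Windows.

```perl
use strict;
use warnings;
use File::Spec;

# Return the full path of 'blastall' if the directory named by the BLASTDIR
# environment variable contains an executable of that name; otherwise return
# undef. (On Windows the executable would be blastall.exe; not handled here.)
sub blastall_from_blastdir {
    my $dir = $ENV{BLASTDIR};
    return unless defined $dir && length $dir;
    my $exe = File::Spec->catfile($dir, 'blastall');
    return -x $exe ? $exe : undef;
}

my $found = blastall_from_blastdir();
print defined $found
    ? "blastall found at $found\n"
    : "BLASTDIR is unset or does not contain an executable blastall\n";
```

The variable has to be visible to the Perl process, so it is usually exported in the shell before the script runs (for example `export BLASTDIR=/path/to/blast/bin` in bash, with your own path substituted).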

From cain.cshl at gmail.com  Fri Dec 15 13:09:48 2006
From: cain.cshl at gmail.com (Scott Cain)
Date: Fri, 15 Dec 2006 13:09:48 -0500
Subject: [Bioperl-l] Bio::SeqFeature::Annotated and
	mandatory	type	checking
In-Reply-To: <9B984087-C843-440A-B3E1-F7DEC65160E7@uiuc.edu>
References: <637A2459-4115-466F-BD8D-036D5E9114F8@cshl.edu>
	<4581CCEB.20206@sendu.me.uk>
	<1166158897.2569.335.camel@localhost.localdomain>
	<9B984087-C843-440A-B3E1-F7DEC65160E7@uiuc.edu>
Message-ID: <1166206188.2569.380.camel@localhost.localdomain>

On Fri, 2006-12-15 at 11:49 -0600, Chris Fields wrote:
> 
> To tell the truth I don't know if this is where the mandatory checks  
> were added in; I'm not too familiar with SeqFeature::Annotation yet.
> 
> I agree with Scott (and Matthew) that SOFA checks should be  
> optional.  Matthew, can you write up a patch and maybe some tests?
> 
> chris
> 
That's not where they were added; it's just that they hadn't been fully
implemented before then, so they didn't work (which probably meant they
weren't mandatory, though I don't remember; it could be that it just
croaked).

Scott

-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory


From hlapp at gmx.net  Sun Dec 17 01:02:04 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sun, 17 Dec 2006 01:02:04 -0500
Subject: [Bioperl-l] [Gmod-gbrowse] xyplot data alignment problem?
In-Reply-To: <458404BD.8030908@sendu.me.uk>
References: <EA0BFA4F-8182-4C40-92DA-218CE3F48D8B@genomics.princeton.edu>	<6dce9a0b0612141356u63afe2dak7e1d8dad93408312@mail.gmail.com>
	<6dce9a0b0612150802x354a02a8ib17fbd882379c63c@mail.gmail.com>
	<458404BD.8030908@sendu.me.uk>
Message-ID: <733825EE-0426-4D12-A02F-B8825CDEBBA9@gmx.net>


On Dec 16, 2006, at 9:37 AM, Sendu Bala wrote:

> Lincoln Stein wrote:
>> This is very embarrassing for me, particularly since I spent a lot of
>> time validating that Bio::Graphics was working properly before the
>> 1.5.2 release went out. How long before there is a 1.5.3 release? How
>> about a 1.5.2.1 release?
>
> I'm happy to try a point release for critical bug fixes. Why don't you
> commit the necessary fixes to branch-1-5-2 and let me know when you're
> happy, and I'll do 1.5.2.1.

Feel free to do that, but why not make a 1.5.3 off the main trunk?
1.5.2.1 may be adding more to the version confusion
(developer/stable/point-release/etc) than it is worth, and there is no
shame in releasing new developer versions every few weeks.

My $0.02 ...

	-hilmar


>
> Cheers,
> Sendu.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From fgarret at ub.edu  Mon Dec 18 07:07:02 2006
From: fgarret at ub.edu (Filipe Garrett)
Date: Mon, 18 Dec 2006 13:07:02 +0100
Subject: [Bioperl-l] codeml
Message-ID: <45868466.508@ub.edu>

Hi all,

I've been using bioperl's PAML module (specifically the codeml part) but 
with just one tree.

Since the program accepts several trees as input (and runs the analysis 
for each tree outputting the difference in likelihoods for each one) I 
was wondering if there's some way to do it through bioperl?

thanks in adv,
FG

From heikki at sanbi.ac.za  Mon Dec 18 08:51:50 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Mon, 18 Dec 2006 15:51:50 +0200
Subject: [Bioperl-l] Proposal for Meta data
In-Reply-To: <F302B7DD-C806-4A6F-ACDF-9F27A84E0BF0@uiuc.edu>
References: <B3EF69DB-9C01-4F42-A4E4-898613D872F9@uiuc.edu>
	<32BE3FCF-C788-438F-8A4A-8A586DD6C569@bioperl.org>
	<F302B7DD-C806-4A6F-ACDF-9F27A84E0BF0@uiuc.edu>
Message-ID: <200612181551.51277.heikki@sanbi.ac.za>


Reading the discussion, I think it is time to draw some guidelines.

1. Base the Meta implementation on real use cases.

   MSA is a good example.

2. Allow generalisations

   If you can see another implementation of the same idea that can be merged
   with the first, do it, but do not hurt yourself if you cannot.


The most difficult question is how to separate case-specific attributes that
are best implemented by subclassing with additional methods from truly widely
variable metadata that is best held in a separate, parallel-track
meta-information class.

The problem I see with undefined, totally open meta annotation is that if you
can put anything in there, it is also totally confusing to a user. If you can
put anything in, how do you know what to get out and know that it is
there?

That leads to the third guideline:

3. Use separate meta classes only when there are several different ways of
encoding data that is present in large numbers *and* when you expect to
assess the data computationally rather than just check whether an
attribute is there.


	-Heikki


On Friday 15 December 2006 19:23, Chris Fields wrote:
> On Dec 15, 2006, at 8:28 AM, Jason Stajich wrote:
> > On Dec 14, 2006, at 9:21 PM, Chris Fields wrote:
> >> On Dec 14, 2006, at 7:45 PM, David Messina wrote:
> >>> Hey Chris,
> >>>
> >>> My thoughts below.
> >>>
> >>>> [Chris]
> >>>> This could be used to annotate any PrimarySeq, LocatableSeq,
> >>>> SimpleAlign, SeqFeature, or what-have-you, maybe in a collection
> >>>> (similar to AnnotationCollection).  I thought something like this
> >>>> may be of general use for any PrimarySeq (quality, structure),
> >>>> alignments like NEXUS and Stockholm, SeqFeatures where structure
> >>>> could be stored (tRNA or riboswitches), etc.
> >>>>
> >>>> However, this also seems to fall into the category of sequence
> >>>> annotation.  So, would it be better to have a set of Bio::Annotation
> >>>> classes used for this purpose?
> >>>
> >>> To me, all meta data is equal. That is, your classic Genbank feature
> >>> annotation and a user's arbitrary meta-tag like "Bob thinks this is a
> >>> kinase domain" aren't different in kind even if they are different in
> >>> content.
> >>>
> >>> As resequencing projects multiply, the ability to create arbitrary
> >>> meta tags, attach them to different types of objects, and use those
> >>> tags to link them together will become desirable, if not essential.
> >>>
> >>> Keeping a common interface to all of these meta data types would be
> >>> advantageous, plus new users won't have to determine whether they
> >>> need to use Bio::Meta objects or Bio::Annotation objects.
> >>>
> >>> So I would argue for all of the meta data types to live "under one
> >>> roof". Which roof isn't as important. Bio::Annotation, since it
> >>> already exists for today's meta data, seems like a reasonable choice.
> >>> (assuming Annotation objects are flexible enough to be extended as
> >>> you propose)
> >>>
> >>> There, and no flames or jibes even. :)
> >>
> >> I guess what I want to know is whether there should be a
> >> distinction between 'normal' sequence annotation (comments,
> >> references, and so on) and annotation that could be best described as
> >> position-specific (like RNA or protein structural annotation).  The
> >> current meta implementation is for sequence data only; I felt it
> >> would be nice to have a generic implementation that would be
> >> applicable to any object data.
> >
> > my stream-of-consciousness for right now:
> >
> > I was thinking Bio::Annotation is where this should go - that
> > system doesn't have anything about it that makes it explicitly
> > sequence related. What we're trying to hammer out here on the
> > Alignment side - which fits with your RNA example - is have
> > features, basically SeqFeatures - associated with alignments so
> > columns can be annotated to cover things like character sets and
> > partitions for phylogenetic analyses.  As for data which annotates
> > non-contiguous things like RNA stems we may have to be more
> > creative about that or model it with a splitLocation.
> >
> > So currently we've added code so that an Alignment is-a
> > Bio::AnnotableI and is-a Bio::FeatureHolderI to move towards this
> > end, with the goal of being able to capture more of the data that
> > can be represented in a NEXUS file.
> >
> > It feels more like a hack than an elegant Meta-data solution, but I
> > am not totally sure what data you are thinking about handling at
> > this point; perhaps I need to spend more time thinking about it.
> > Or are you worried about the idea of whether the semantic mapping
> > of the data into features or annotations is confusing users?
>
> Sorry in advance for the longish response here...
>
> My original thought was to have a generic abstract class capable of
> positionally describing data in any other class, similar to
> Heikki's Bio::Seq::MetaI but not constrained to sequence data only.
> Implementing classes would be capable of having different data
> structures based on their use (simple string, array, AoA, AoH, AoO).
> One MetaCollection class to contain them all in a tag-like system, so
> you could have mixed data types describe the same object.  The latter
> Collection class is so similar to AnnotationCollection that I agree
> Bio::Annotation would be the best place for this.
>
> The way I reconfigured Stockholm alignment parsing/writing is to use
> Bio::Seq::Meta objects (which are LocatableSeq).  Each Seq::Meta is
> capable of holding a sequence and several meta strings, stored as
> tags or 'names'.  However, there is no Meta object for alignments
> (for RNA/protein structure consensus and other Rfam/Pfam markup); I
> hacked around this by using a Bio::Seq::Meta w/o a seq, but I would
> rather have a generic Meta object independent of the sequence cruft.
>
> So for this partial Pfam alignment,
>
> Q92SV1_RHIME/122-299         LAMALNLARGI...VDADVDF..REG
> #=GR Q92SV1_RHIME/122-299 pAS .........................
> Q883D2_PSESM/110-290         LGLMLGLRRRL...FDGNGAV..KRS
> Q8ZXP5_PYRAE/91-262          LALLLAPYKRI...IQYGEKM..KRG
> #=GR Q8ZXP5_PYRAE/91-262 SS  HHHHHHHHTTH...HHHHHHX..HTT
> #=GR Q8ZXP5_PYRAE/91-262 SA  00000000000...120030X..474
> #=GC SS_cons                 HHHHHHHHTTH...HHHHHHH..HTT
> #=GC SA_cons                 03002200312...1312414..676
> #=GC seq_cons                luhhLuhsRpl...hthppth..+pG
> //
>
> '#=GC' lines would be in generic meta string objects in the
> alignment, while '#=GR' tags would be in similar meta objects in the
> relevant sequences.  As long as both aren't AnnotatableI this isn't
> an issue.
>
> Similarly, NEXUS files which contained any position-based values
> could hold a meta string/array object in a similar tag.
>
> The basic scheme is:
>                      |--String
> Annotation::Meta----|--Array
>                      |--HorriblyComplexDataStruct
>
> Then I started thinking about where this could be applied, and
> whether a true Meta object needs to be constrained only to describing
> position-based data.  This somewhat relates to this bug:
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=1825
>
> which seems to need a simple but unconstrained hash-of-arrays-based
> meta object.
>
> Then my head appropriately exploded...
>
> Hope everything is going well at the hackathon!  Looks like some
> interesting stuff coming out of it.
>
> chris

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________
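To make the split Chris describes concrete (`#=GR` markup belongs to one sequence, `#=GC` markup to the alignment as a whole), here is a minimal, hypothetical Perl sketch that sorts Stockholm lines into those two groups plus the sequences themselves. It is not Bio::AlignIO's Stockholm parser; the sub name is invented for illustration, and `#=GF`/`#=GS` lines and other details of the format are simply skipped.

```perl
use strict;
use warnings;

# Sort Stockholm alignment lines into three groups:
#   %seqs - sequence name => aligned residues
#   %gr   - sequence name => { tag => per-residue markup }   (#=GR lines)
#   %gc   - tag => per-column markup for the whole alignment  (#=GC lines)
# Other '#' lines (#=GF, #=GS, the header) and the '//' terminator are skipped.
sub split_stockholm_markup {
    my @lines = @_;
    my (%seqs, %gr, %gc);
    for my $line (@lines) {
        next if $line =~ m{^//};
        if ($line =~ /^#=GR\s+(\S+)\s+(\S+)\s+(\S+)/) {
            $gr{$1}{$2} .= $3;    # append, so interleaved blocks are joined
        }
        elsif ($line =~ /^#=GC\s+(\S+)\s+(\S+)/) {
            $gc{$1} .= $2;
        }
        elsif ($line !~ /^#/ && $line =~ /^(\S+)\s+(\S+)/) {
            $seqs{$1} .= $2;
        }
    }
    return (\%seqs, \%gr, \%gc);
}

# Lines taken from the partial Pfam alignment quoted above.
my @aln = (
    'Q8ZXP5_PYRAE/91-262          LALLLAPYKRI...IQYGEKM..KRG',
    '#=GR Q8ZXP5_PYRAE/91-262 SS  HHHHHHHHTTH...HHHHHHX..HTT',
    '#=GC SS_cons                 HHHHHHHHTTH...HHHHHHH..HTT',
    '//',
);
my ($seqs, $gr, $gc) = split_stockholm_markup(@aln);
print "per-sequence SS: $gr->{'Q8ZXP5_PYRAE/91-262'}{SS}\n";
print "consensus SS:    $gc->{SS_cons}\n";
```

Keeping the per-column strings keyed by tag is one way to mirror the "generic meta string object in the alignment" idea; the per-sequence strings would then live with each sequence.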

From fgarret at ub.edu  Mon Dec 18 11:18:31 2006
From: fgarret at ub.edu (Filipe Garrett)
Date: Mon, 18 Dec 2006 17:18:31 +0100
Subject: [Bioperl-l] PAML files
Message-ID: <4586BF57.4090002@ub.edu>

Hi all,

does anyone know how to get the name of the .ctl file created by the
PAML module? Inside the tmp directory there are 2 files with random 
names (tree and ctl). Why do they have random names?? Wouldn't it be 
easier to assign them a fixed name?? For instance "codeml.ctl" and 
"tree.nwk"??

thanks in adv,
FG

From bix at sendu.me.uk  Mon Dec 18 11:15:21 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 18 Dec 2006 16:15:21 +0000
Subject: [Bioperl-l] [Gmod-gbrowse] xyplot data alignment problem?
In-Reply-To: <733825EE-0426-4D12-A02F-B8825CDEBBA9@gmx.net>
References: <EA0BFA4F-8182-4C40-92DA-218CE3F48D8B@genomics.princeton.edu>	<6dce9a0b0612141356u63afe2dak7e1d8dad93408312@mail.gmail.com>
	<6dce9a0b0612150802x354a02a8ib17fbd882379c63c@mail.gmail.com>
	<458404BD.8030908@sendu.me.uk>
	<733825EE-0426-4D12-A02F-B8825CDEBBA9@gmx.net>
Message-ID: <4586BE99.7020308@sendu.me.uk>

Hilmar Lapp wrote:
> 
> On Dec 16, 2006, at 9:37 AM, Sendu Bala wrote:
> 
>> Lincoln Stein wrote:
>>> This is very embarrassing for me, particularly since I spent a lot
>>> of time validating that Bio::Graphics was working properly before
>>> the 1.5.2 release went out. How long before there is a 1.5.3
>>> release? How about a 1.5.2.1 release?
>> 
>> I'm happy to try a point release for critical bug fixes. Why don't
>> you commit the necessary fixes to branch-1-5-2 and let me know when
>> you're happy, and I'll do 1.5.2.1.
> 
> Feel free to do that, but why not make a 1.5.3 off the main trunk? 
> 1.5.2.1 may be adding more to the version confusion 
> (developer/stable/point-release/etc) than it is worth,

My feeling is that 1.5.3 should be reserved for some significant changes
and new features, and not just a few bug fixes. I'd say this causes less
confusion amongst users - they can associate '1.5.2' with a certain API
and feature-set, and the specific name of the file they download and
install (bioperl-1.5.2_100.tar.gz vs bioperl-1.5.2_101.tar.gz) won't
matter at all to them.

I also won't have to make some major announcement about it; it will
simply be the most recent developer version of bioperl available so new
users trying to get 1.5.2 will end up getting 1.5.2.1, whilst existing
1.5.2 users will only feel compelled to get it if they suffer from the
bugs fixed.


> and there is no shame in releasing new developer versions every few
> weeks.

I think frequent releases are inadvisable; such a quick release
won't have had much testing so we shouldn't encourage people to install
it: encouragement is implicit when a major new version comes out like
1.5.3. People who want to live on the edge can and should be using a
CVS checkout.


From bix at sendu.me.uk  Mon Dec 18 14:15:16 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 18 Dec 2006 19:15:16 +0000
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on
	110	species
In-Reply-To: <C91DCC7B-E368-475D-B83A-AC301A49624B@uiuc.edu>
References: <F5C5C9A8-D444-4994-9769-AC5DE68F4A39@lsi.upc.edu>	<68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu>	<4577E4A2.5090303@sendu.me.uk>	<B290BEF7-81D6-4C0A-9EDA-348B8A75EEC8@lsi.upc.edu>	<4577EAAF.7030509@sendu.me.uk>	<0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu>	<4577EFD3.7090904@sendu.me.uk>	<250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu>
	<C91DCC7B-E368-475D-B83A-AC301A49624B@uiuc.edu>
Message-ID: <4586E8C4.6030306@sendu.me.uk>

Chris Fields wrote:
> On Dec 15, 2006, at 6:45 PM, Gabriel Valiente wrote:
> 
>> However, on a larger set of 190 species, which are all present in
>> the NCBI taxonomy, the resulting tree has only 178 taxa. I suspect
>> something must be wrong with the merge_lineage method in the major
>> rewrite of the taxonomy2tree script. Can someone please check this?
>> I'm attaching the 190 species call to the script. Thanks,
>> 
>> Gabriel
> 
> I can confirm that.  It is definitely dropping them in merge_lineage();
> if you add a call to get_leaf_nodes to check how many are present
> after each merge_lineage() call, you can see it dropping nodes along
> the trace.

I confirm the 'dropped' nodes, but also claim that this is no bug.

For example, the first 'drop' happens for the 101st species which is
'Leptospira interrogans serovar Copenhageni'. This is a variation
(descendant) of species 24: 'Leptospira interrogans'. So when the
variation is added it becomes a leaf and 'Leptospira interrogans' is no
longer a leaf, so the overall number of leaves does not increase.

The next drop is for species 103 'Prochlorococcus marinus subsp.
pastoris str. CCMP1986', a subspecies of 63 'Prochlorococcus marinus'.
Same deal. I didn't check any others, but suspect the same issue arises
in all cases.
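The leaf-count behaviour described above can be reproduced with a toy tree in plain Perl. This is a minimal sketch, not BioPerl's merge_lineage(): the subs are hypothetical, nodes are plain hashes, and the lineages are drastically shortened (real NCBI lineages have many more ranks). It only shows that adding a serovar under an existing species turns that species into an internal node, so the number of leaves stays the same.

```perl
use strict;
use warnings;

# Toy stand-in for a taxonomy tree: node name => { child name => 1 }.
# Lineages are listed from the root down to the taxon being added.
sub merge_lineage_into {
    my ($tree, @lineage) = @_;
    for my $i (0 .. $#lineage) {
        $tree->{ $lineage[$i] } ||= {};
        $tree->{ $lineage[$i - 1] }{ $lineage[$i] } = 1 if $i > 0;
    }
    return $tree;
}

# A leaf is any node with no children.
sub count_leaves {
    my ($tree) = @_;
    return scalar grep { !%{ $tree->{$_} } } keys %$tree;
}

my %tree;
merge_lineage_into(\%tree, 'root', 'Leptospira', 'Leptospira interrogans');
merge_lineage_into(\%tree, 'root', 'Prochlorococcus', 'Prochlorococcus marinus');
print count_leaves(\%tree), "\n";    # prints 2: two species, two leaves

# Adding a serovar below an existing species makes that species internal.
merge_lineage_into(\%tree, 'root', 'Leptospira', 'Leptospira interrogans',
                   'Leptospira interrogans serovar Copenhageni');
print count_leaves(\%tree), "\n";    # prints 2: still two leaves
```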

Gabriel, please confirm this isn't a bug, or suggest how you propose to
see your taxa when they are not all leaves of the tree.


PS. I changed the merge_lineage() algorithm to be 18x faster (from the 
absurd 3mins for making the 190 species tree to a more reasonable 10s), 
without changing the tree produced.

From fgarret at ub.edu  Mon Dec 18 15:01:38 2006
From: fgarret at ub.edu (Filipe Garrett)
Date: Mon, 18 Dec 2006 21:01:38 +0100
Subject: [Bioperl-l] PAML files
In-Reply-To: <34C4970D-6F93-4CE4-878C-5FA4C916AAEC@bioperl.org>
References: <4586BF57.4090002@ub.edu>
	<34C4970D-6F93-4CE4-878C-5FA4C916AAEC@bioperl.org>
Message-ID: <4586F3A2.4010607@ub.edu>


Hi Jason,

This question is related to the one I asked earlier today.
I need to run codeml with 3 tree topologies. I looked at the codeml module,
but it only accepts one tree as input, so I thought of using the codeml
module to prepare all the files and then just running codeml with the
new tree file in batch. But for that I need to know which one is the
ctl file.

any idea?
FG
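One possible workaround, sketched under assumptions: codeml can read several trees from one tree file, so the three topologies could be written into a single file and the control file's `treefile` entry pointed at it. The sub `write_codeml_treefile` below is hypothetical, and the header line (number of species, then number of trees) follows my reading of the PAML manual; check the manual for your PAML version before relying on it.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Write several Newick topologies into a single tree file for codeml.
# Assumption: the first line holds the number of species and the number of
# trees, as described in the PAML manual; check the manual for your version.
sub write_codeml_treefile {
    my ($path, $nspecies, @newick) = @_;
    open my $fh, '>', $path or die "Cannot write $path: $!";
    print {$fh} "$nspecies ", scalar(@newick), "\n";
    for my $tree (@newick) {
        # each tree must end with a semicolon; add one if it is missing
        my $line = $tree =~ /;\s*$/ ? $tree : "$tree;";
        print {$fh} "$line\n";
    }
    close $fh or die "Cannot close $path: $!";
    return $path;
}

# Three alternative topologies for four placeholder taxa A-D.
my @topologies = ('((A,B),(C,D));', '((A,C),(B,D));', '((A,D),(B,C))');
my $dir      = tempdir(CLEANUP => 1);
my $treefile = write_codeml_treefile(File::Spec->catfile($dir, 'trees.nwk'),
                                     4, @topologies);
print "wrote $treefile\n";
```

At the time of this thread the BioPerl Codeml wrapper took a single tree, so this is a manual step outside the module rather than something it does for you.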

Jason Stajich wrote:
> They are temporary names so they are deliberately random and there is no 
> intention of you needing them after a run since it is to be cleaned up on
> the fly. We use an internal method for generating tempfiles that takes 
> care of cleanup afterwards.  I suppose since we do all the work within a 
> temp directory that is cleaned up, one could have a fixed name for the 
> tree, alignment, and ctl files but honestly we never expect people to be 
> reading these filenames as they are intended to be transient.
> 
> What problem are you having that you need the filename?
> 
> -jason
> On Dec 18, 2006, at 11:18 AM, Filipe Garrett wrote:
> 
>> Hi all,
>>
>> does anyone know how to get the name of the .ctl file created by the
>> PAML module? Inside the tmp directory there are 2 files with random 
>> names (tree and ctl). Why do they have random names?? Wouldn't it be 
>> easier to assign them a fixed name?? For instance "codeml.ctl" and 
>> "tree.nwk"??
>>
>> thanks in adv,
>> FG
> 
> --
> Jason Stajich
> jason at bioperl.org
> http://jason.open-bio.org/
> 
> 

From fgarret at ub.edu  Mon Dec 18 15:07:46 2006
From: fgarret at ub.edu (Filipe Garrett)
Date: Mon, 18 Dec 2006 21:07:46 +0100
Subject: [Bioperl-l] codeml
In-Reply-To: <7150593C-C159-4418-8FB3-9D7906C37E15@bioperl.org>
References: <45868466.508@ub.edu>
	<7150593C-C159-4418-8FB3-9D7906C37E15@bioperl.org>
Message-ID: <4586F512.1030209@ub.edu>


Right now it's impossible for me to write it.
By February or March I should have more time but I'll let you know.

FG

Jason Stajich wrote:
> This is a shortcoming in the Run::Phylo::PAML::Codeml implementation - I
> guess we'll need to allow the -tree option to accept an arrayref of trees.
> Are you willing to try to write this patch?  It should be added as a
> bug/feature request to bugzilla so it can be corrected in short order.
> 
> -jason
> On Dec 18, 2006, at 7:07 AM, Filipe Garrett wrote:
> 
>> Hi all,
>>
>> I've been using bioperl's PAML module (specifically the codeml part) but 
>> with just one tree.
>>
>> Since the program accepts several trees as input (and runs the analysis 
>> for each tree outputting the difference in likelihoods for each one) I 
>> was wondering if there's some way to do it through bioperl?
>>
>> thanks in adv,
>> FG
> 
> --
> Jason Stajich 
> Miller Research Fellow
> University of California, Berkeley
> lab: 510.642.8441
> http://pmb.berkeley.edu/~taylor/people/js.html
> 
> 

From cjfields at uiuc.edu  Mon Dec 18 15:55:55 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 18 Dec 2006 14:55:55 -0600
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on
	110	species
In-Reply-To: <4586E8C4.6030306@sendu.me.uk>
References: <F5C5C9A8-D444-4994-9769-AC5DE68F4A39@lsi.upc.edu>	<68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu>	<4577E4A2.5090303@sendu.me.uk>	<B290BEF7-81D6-4C0A-9EDA-348B8A75EEC8@lsi.upc.edu>	<4577EAAF.7030509@sendu.me.uk>	<0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu>	<4577EFD3.7090904@sendu.me.uk>	<250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu>
	<C91DCC7B-E368-475D-B83A-AC301A49624B@uiuc.edu>
	<4586E8C4.6030306@sendu.me.uk>
Message-ID: <63C1DC7D-2830-436A-BE95-7ECE3748C84D@uiuc.edu>


On Dec 18, 2006, at 1:15 PM, Sendu Bala wrote:

> Chris Fields wrote:
>> On Dec 15, 2006, at 6:45 PM, Gabriel Valiente wrote:
>>
>>> However, on a larger set of 190 species, which are all present in
>>> the NCBI taxonomy, the resulting tree has only 178 taxa. I suspect
>>> something must be wrong with the merge_lineage method in the major
>>> rewrite of the taxonomy2tree script. Can someone please check this?
>>> I'm attaching the 190 species call to the script. Thanks,
>>>
>>> Gabriel
>>
>> I can confirm that.  It is definitely dropping them in merge_lineage();
>> if you add a call to get_leaf_nodes to check how many are present
>> after each merge_lineage() call, you can see it dropping nodes along
>> the trace.
>
> I confirm the 'dropped' nodes, but also claim that this is no bug.
>
> For example, the first 'drop' happens for the 101st species which is
> 'Leptospira interrogans serovar Copenhageni'. This is a variation
> (descendant) of species 24: 'Leptospira interrogans'. So when the
> variation is added it becomes a leaf and 'Leptospira interrogans' is no
> longer a leaf, so the overall number of leaves does not increase.
>
> The next drop is for species 103 'Prochlorococcus marinus subsp.
> pastoris str. CCMP1986', a subspecies of 63 'Prochlorococcus marinus'.
> Same deal. I didn't check any others, but suspect the same issue arises
> in all cases.

Makes sense now.  I personally would consider this a bug since the  
results are unexpected (so the docs need to be modified in order to  
clarify).  Some say tomato...

I suppose this is one of the issues one might run into when using  
NCBI taxonomy to build trees.

> Gabriel, please confirm this isn't a bug, or suggest how you propose to
> see your taxa when they are not all leaves of the tree.

Having the nodes appear internally seems semantically correct to me.   
Is there any other way?

> PS. I changed the merge_lineage() algorithm to be 18x faster (from the
> absurd 3mins for making the 190 species tree to a more reasonable 10s),
> without changing the tree produced.

Definitely an improvement!

chris

From jason at bioperl.org  Mon Dec 18 14:33:32 2006
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 18 Dec 2006 14:33:32 -0500
Subject: [Bioperl-l] PAML files
In-Reply-To: <4586BF57.4090002@ub.edu>
References: <4586BF57.4090002@ub.edu>
Message-ID: <34C4970D-6F93-4CE4-878C-5FA4C916AAEC@bioperl.org>

They are temporary names so they are deliberately random and there is  
no intention of you needing them after a run since it is to be cleaned
up on the fly. We use an internal method for generating tempfiles  
that takes care of cleanup afterwards.  I suppose since we do all the  
work within a temp directory that is cleaned up, one could have a  
fixed name for the tree, alignment, and ctl files but honestly we  
never expect people to be reading these filenames as they are  
intended to be transient.
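The point that a fixed file name would be safe inside a directory that is itself unique and cleaned up can be sketched with Perl's core File::Temp module. This is not BioPerl's internal tempfile helper; the sub `write_ctl_in_tempdir` and the two control-file keys shown are illustrative only.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Write codeml-style "key = value" settings into a fresh temporary directory.
# Because the directory name itself is unique, the file inside it can safely
# have a fixed, predictable name such as codeml.ctl.
sub write_ctl_in_tempdir {
    my ($settings) = @_;                      # hashref of control options
    my $dir = tempdir(CLEANUP => 1);          # removed when the program exits
    my $ctl = File::Spec->catfile($dir, 'codeml.ctl');
    open my $fh, '>', $ctl or die "Cannot write $ctl: $!";
    print {$fh} "$_ = $settings->{$_}\n" for sort keys %$settings;
    close $fh or die "Cannot close $ctl: $!";
    return ($dir, $ctl);
}

my ($dir, $ctl) = write_ctl_in_tempdir({ seqfile => 'aln.phy', treefile => 'tree.nwk' });
print "control file: $ctl\n";
```

With `CLEANUP => 1` the whole directory disappears at program exit, which matches the "transient" intent; a caller who wants to keep the files would need to copy them out before then.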

What problem are you having that you need the filename?

-jason
On Dec 18, 2006, at 11:18 AM, Filipe Garrett wrote:

> Hi all,
>
> does anyone know how to get the name of the .ctl file created by the
> PAML module? Inside the tmp directory there are 2 files with random
> names (tree and ctl). Why do they have random names?? Wouldn't it be
> easier to assign them a fixed name?? For instance "codeml.ctl" and
> "tree.nwk"??
>
> thanks in adv,
> FG
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From cjm at fruitfly.org  Mon Dec 18 16:50:00 2006
From: cjm at fruitfly.org (Chris Mungall)
Date: Mon, 18 Dec 2006 13:50:00 -0800
Subject: [Bioperl-l] Proposal for Meta data
In-Reply-To: <200612181551.51277.heikki@sanbi.ac.za>
References: <B3EF69DB-9C01-4F42-A4E4-898613D872F9@uiuc.edu>
	<32BE3FCF-C788-438F-8A4A-8A586DD6C569@bioperl.org>
	<F302B7DD-C806-4A6F-ACDF-9F27A84E0BF0@uiuc.edu>
	<200612181551.51277.heikki@sanbi.ac.za>
Message-ID: <6747C74C-8A49-4169-8A3B-8A26134C3B0D@fruitfly.org>


I agree with everything Heikki is saying, I just wanted to highlight  
one paragraph:

> The problem I see with undefined, totally open meta annotation, is that
> if you can put anything in there, it is also totally confusing to a
> user. If you can put anything in, how do you know what to get out and
> know that it is there?

One solution is to give your annotation/metadata-model formal  
computational semantics and use ontologies to give additional  
semantics to your metadata tags. This provides both user information  
in the form of documentation, and a means of specifying to the  
computer exactly what should be done with the tags.

This is probably overkill for bioperl; but if the use cases being  
proposed do lean in the direction of a new metadata system that is  
not necessarily backwards compatible with the existing one, then I'd  
recommend checking out what's already out there before re-inventing  
the wheel. Perl RDF libraries are getting a little better.

If anyone is interested in pursuing this sort of thing (probably on a
branch), let me know.

On Dec 18, 2006, at 5:51 AM, Heikki Lehvaslaiho wrote:

>
> Reading the discussion, I think it is time to draw some guidelines.
>
> 1. Base the Meta implementation on real use cases.
>
>    MSA is a good example.
>
> 2. Allow generalisations
>
>    If you can see another implementation of the same idea that can be
>    merged with the first, do it, but do not hurt yourself if you cannot.
>
>
> The most difficult question is how to separate case-specific attributes
> that are best implemented by subclassing with additional methods from
> truly widely variable meta data that is best done as a parallel track
> meta information holding class.
>
> The problem I see with undefined, totally open meta annotation, is that
> if you can put anything in there, it is also totally confusing to a
> user. If you can put anything in, how do you know what to get out and
> know that it is there?
>
> That leads to the third guideline:
>
> 3. Use separate meta classes only when there are several different ways
> of encoding data that is present in large numbers *and* when you are
> expecting to be assessing the data computationally rather than just
> checking if an attribute is there.
>
>
> 	-Heikki
>
>
>
> On Friday 15 December 2006 19:23, Chris Fields wrote:
>> On Dec 15, 2006, at 8:28 AM, Jason Stajich wrote:
>>> On Dec 14, 2006, at 9:21 PM, Chris Fields wrote:
>>>> On Dec 14, 2006, at 7:45 PM, David Messina wrote:
>>>>> Hey Chris,
>>>>>
>>>>> My thoughts below.
>>>>>
>>>>>> [Chris]
>>>>>> This could be used to annotate any PrimarySeq, LocatableSeq,
>>>>>> SimpleAlign, SeqFeature, or what-have-you, maybe in a collection
>>>>>> (similar to AnnotationCollection).  I thought something like this
>>>>>> may be of general use for any PrimarySeq (quality, structure),
>>>>>> alignments like NEXUS and Stockholm, SeqFeatures where structure
>>>>>> could be stored (tRNA or riboswitches), etc.
>>>>>>
>>>>>> However, this also seems to fall into the category of sequence
>>>>>> annotation.  So, would it be better to have a set of
>>>>>> Bio::Annotation classes used for this purpose?
>>>>>
>>>>> To me, all meta data is equal. That is, your classic Genbank feature
>>>>> annotation and a user's arbitrary meta-tag like "Bob thinks this is a
>>>>> kinase domain" aren't different in kind even if they are different
>>>>> in content.
>>>>>
>>>>> As resequencing projects multiply, the ability to create arbitrary
>>>>> meta tags, attach them to different types of objects, and use those
>>>>> tags to link them together will become desirable, if not essential.
>>>>>
>>>>> Keeping a common interface to all of these meta data types would be
>>>>> advantageous, plus new users won't have to determine whether they
>>>>> need to use Bio::Meta objects or Bio::Annotation objects.
>>>>>
>>>>> So I would argue for all of the meta data types to live "under one
>>>>> roof". Which roof isn't as important. Bio::Annotation, since it
>>>>> already exists for today's meta data, seems like a reasonable choice.
>>>>> (assuming Annotation objects are flexible enough to be extended as
>>>>> you propose)
>>>>>
>>>>> There, and no flames or jibes even. :)
>>>>
>>>> I guess what I want to know is whether there should be a distinction
>>>> between 'normal' sequence annotation (comments, references, and so
>>>> on) and annotation that could be best described as position-specific
>>>> (like RNA or protein structural annotation).  The current meta
>>>> implementation is for sequence data only; I felt it would be nice to
>>>> have a generic implementation that would be applicable to any object
>>>> data.
>>>
>>> my stream-of-consciousness for right now:
>>>
>>> I was thinking Bio::Annotation is where this should go - that
>>> system doesn't have anything about it that makes it explicitly
>>> sequence related. What we're trying to hammer out here on the
>>> Alignment side - which fits with your RNA example - is to have
>>> features, basically SeqFeatures, associated with alignments so
>>> columns can be annotated to cover things like character sets and
>>> partitions for phylogenetic analyses.  As for data which annotates
>>> non-contiguous things like RNA stems, we may have to be more
>>> creative about that or model it with a splitLocation.
>>>
>>> So currently we've added code so that an Alignment is-a
>>> Bio::AnnotatableI and is-a Bio::FeatureHolderI to move towards this
>>> end, with the goal of being able to capture more of the data that
>>> can be represented in a NEXUS file.
>>>
>>> It feels more like a hack than an elegant Meta-data solution, but I
>>> am not totally sure what data you are thinking about at this point;
>>> perhaps I need to spend more time thinking about it.
>>> Or are you worried about whether the semantic mapping of the data
>>> into features or annotations is confusing users?
>>
>> Sorry in advance for the longish response here...
>>
>> My original thought was to have a generic abstract class capable of
>> positionally describing data in any other class, similar to
>> Heikki's Bio::Seq::MetaI but not constrained to sequence data only.
>> Implementing classes would be capable of having different data
>> structures based on their use (simple string, array, AoA, AoH, AoO).
>> One MetaCollection class to contain them all in a tag-like system, so
>> you could have mixed data types describe the same object.  The latter
>> Collection class is so similar to AnnotationCollection that I agree
>> Bio::Annotation would be the best place for this.
>>
>> The way I reconfigured Stockholm alignment parsing/writing is to use
>> Bio::Seq::Meta objects (which are LocatableSeq).  Each Seq::Meta is
>> capable of holding a sequence and several meta strings, stored as
>> tags or 'names'.  However, there is no Meta object for alignments
>> (for RNA/protein structure consensus and other Rfam/Pfam markup); I
>> hacked around this by using a Bio::Seq::Meta w/o a seq, but I would
>> rather have a generic Meta object independent of the sequence cruft.
>>
>> So for this partial Pfam alignment,
>>
>> Q92SV1_RHIME/122-299         LAMALNLARGI...VDADVDF..REG
>> #=GR Q92SV1_RHIME/122-299 pAS .........................
>> Q883D2_PSESM/110-290         LGLMLGLRRRL...FDGNGAV..KRS
>> Q8ZXP5_PYRAE/91-262          LALLLAPYKRI...IQYGEKM..KRG
>> #=GR Q8ZXP5_PYRAE/91-262 SS  HHHHHHHHTTH...HHHHHHX..HTT
>> #=GR Q8ZXP5_PYRAE/91-262 SA  00000000000...120030X..474
>> #=GC SS_cons                 HHHHHHHHTTH...HHHHHHH..HTT
>> #=GC SA_cons                 03002200312...1312414..676
>> #=GC seq_cons                luhhLuhsRpl...hthppth..+pG
>> //
>>
>> '#=GC' lines would be in generic meta string objects in the
>> alignment, while '#=GR' tags would be in similar meta objects in the
>> relevant sequences.  As long as both aren't AnnotatableI this isn't
>> an issue.
>>
>> Similarly, NEXUS files which contained any position-based values
>> could hold a meta string/array object in a similar tag.
>>
>> The basic scheme is:
>>
>>                     |--String
>> Annotation::Meta----|--Array
>>                     |--HorriblyComplexDataStruct
>>
>> Then I started thinking about where this could be applied, and
>> whether a true Meta object needs to be constrained only to describing
>> position-based data.  This somewhat relates to this bug:
>>
>> http://bugzilla.open-bio.org/show_bug.cgi?id=1825
>>
>> which seems to need a simple but unconstrained hash-of-arrays-based
>> meta object.
>>
>> Then my head appropriately exploded...
>>
>> Hope everything is going well at the hackathon!  Looks like some
>> interesting stuff coming out of it.
>>
>> chris
>
> -- 
> ______ _/      _/_____________________________________________________
>       _/      _/
>      _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
>     _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
>    _/  _/  _/  SANBI, South African National Bioinformatics Institute
>   _/  _/  _/  University of Western Cape, South Africa
>      _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
> ___ _/_/_/_/_/________________________________________________________
>
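The '#=GC' versus '#=GR' split in the Pfam example quoted above can be illustrated with a minimal sketch. It separates Stockholm lines into alignment-level (#=GC) tracks, per-sequence (#=GR) tracks and ordinary sequence rows. This is not Bio::AlignIO::stockholm: the function name is hypothetical, and #=GF/#=GS lines and other parts of the format are simply skipped.

```perl
use strict;
use warnings;

# Hypothetical helper (not Bio::AlignIO::stockholm): split Stockholm lines
# into alignment-level #=GC tracks, per-sequence #=GR tracks and ordinary
# sequence rows. Rows that reappear in later blocks are appended.
# #=GF/#=GS lines, other comments and the '//' terminator are skipped.
sub parse_stockholm_markup {
    my @lines = @_;
    my (%gc, %gr, %seq);
    for my $line (@lines) {
        chomp $line;
        next if $line =~ /^\s*$/ || $line eq '//';
        if ($line =~ /^#=GC\s+(\S+)\s+(\S+)$/) {
            $gc{$1} .= $2;           # feature => markup for the whole alignment
        }
        elsif ($line =~ /^#=GR\s+(\S+)\s+(\S+)\s+(\S+)$/) {
            $gr{$1}{$2} .= $3;       # sequence => feature => markup
        }
        elsif ($line !~ /^#/ && $line =~ /^(\S+)\s+(\S+)$/) {
            $seq{$1} .= $2;          # sequence name => aligned residues
        }
    }
    return (\%gc, \%gr, \%seq);
}

# A subset of the Pfam lines quoted above.
my @block = (
    'Q92SV1_RHIME/122-299         LAMALNLARGI...VDADVDF..REG',
    '#=GR Q92SV1_RHIME/122-299 pAS .........................',
    'Q8ZXP5_PYRAE/91-262          LALLLAPYKRI...IQYGEKM..KRG',
    '#=GR Q8ZXP5_PYRAE/91-262 SS  HHHHHHHHTTH...HHHHHHX..HTT',
    '#=GC SS_cons                 HHHHHHHHTTH...HHHHHHH..HTT',
    '//',
);
my ($gc, $gr, $seq) = parse_stockholm_markup(@block);
print "SS_cons: $gc->{SS_cons}\n";
```

In the terms of the proposal, each entry of %gc would become a meta string on the alignment, and each entry of %gr a meta string on the named sequence.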


From jason at bioperl.org  Mon Dec 18 14:35:50 2006
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 18 Dec 2006 14:35:50 -0500
Subject: [Bioperl-l] codeml
In-Reply-To: <45868466.508@ub.edu>
References: <45868466.508@ub.edu>
Message-ID: <7150593C-C159-4418-8FB3-9D7906C37E15@bioperl.org>

This is a shortcoming in the Run::Phylo::PAML::Codeml implementation -
I guess we'll need to allow the -tree option to accept an arrayref
of trees.
Are you willing to try to write this patch?  It should be added as a
bug/feature request to bugzilla so it can be corrected in short order.
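Here is one way such a patch might start, sketched as a hypothetical standalone helper rather than the actual Run::Phylo::PAML::Codeml code. It accepts either a single Newick string or an array reference of them and writes all the trees to one tree file. The header line giving the number of taxa and trees is an assumption based on the PAML tree-file convention, so check it against the PAML manual for your version.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): accept a -tree style argument
# that is either one Newick string or an array reference of Newick strings,
# and write every tree to a single PAML-style tree file.
sub write_paml_treefile {
    my ($path, $ntaxa, $tree_arg) = @_;
    my @trees = ref($tree_arg) eq 'ARRAY' ? @$tree_arg : ($tree_arg);
    die "no trees supplied\n" unless @trees;

    open my $fh, '>', $path or die "Cannot write $path: $!";
    # Assumed header line: number of taxa, then number of trees.
    printf {$fh} "%d %d\n", $ntaxa, scalar @trees;
    for my $newick (@trees) {
        # Trim trailing whitespace and make sure each tree ends with ';'.
        (my $line = $newick) =~ s/\s+$//;
        $line .= ';' unless $line =~ /;$/;
        print {$fh} "$line\n";
    }
    close $fh or die "Cannot close $path: $!";
    return scalar @trees;    # number of trees written
}
```

Normalizing with `ref($tree_arg) eq 'ARRAY'` keeps the existing single-tree calling style working while adding the arrayref form.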

-jason
On Dec 18, 2006, at 7:07 AM, Filipe Garrett wrote:

> Hi all,
>
> I've been using bioperl's PAML module (specifically the codeml part) but
> with just one tree.
>
> Since the program accepts several trees as input (and runs the analysis
> for each tree outputting the difference in likelihoods for each one) I
> was wondering if there's some way to do it through bioperl?
>
> thanks in adv,
> FG

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html


From gowthaman.ramasamy at sbri.org  Mon Dec 18 17:19:09 2006
From: gowthaman.ramasamy at sbri.org (Gowthaman Ramasamy)
Date: Mon, 18 Dec 2006 14:19:09 -0800
Subject: [Bioperl-l] module to find out primer binding sites in a genome
	sequence
Message-ID: <A4D285B054CE4641A93F1B2046B2B3CD0761E4@mail01.sbri.org>


Hi List,
Is there any module in bioperl which can find out the primer binding sites in a genomic sequence.
I am interested in finding locations with few mismatches along the primer...not just the exact match (which is a very trivial task)

Many thanks in advance,
gowtham


From cjfields at uiuc.edu  Mon Dec 18 17:33:34 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 18 Dec 2006 16:33:34 -0600
Subject: [Bioperl-l] Proposal for Meta data
In-Reply-To: <200612181551.51277.heikki@sanbi.ac.za>
References: <B3EF69DB-9C01-4F42-A4E4-898613D872F9@uiuc.edu>
	<32BE3FCF-C788-438F-8A4A-8A586DD6C569@bioperl.org>
	<F302B7DD-C806-4A6F-ACDF-9F27A84E0BF0@uiuc.edu>
	<200612181551.51277.heikki@sanbi.ac.za>
Message-ID: <FBD2CED3-EBE7-4CB9-8969-70C7A5931A04@uiuc.edu>


On Dec 18, 2006, at 7:51 AM, Heikki Lehvaslaiho wrote:

>
> Reading the discussion, I think it is time to draw some guidelines.
>
> 1. Base the Meta implementation on real use cases.
>
>    MSA is a good example.

AlignIO::stockholm is where I'll initially test it out.

> 2. Allow generalisations
>
>    If you can see another implementation of the same idea that can be
>    merged with the first, do it, but do not hurt yourself if you cannot.

I agree.

> The most difficult question is how to separate case-specific attributes
> that are best implemented by subclassing with additional methods from
> truly widely variable meta data that is best done as a parallel track
> meta information holding class.

I would probably start with a general Bio::Annotation::MetaI abstract  
class, which supplements AnnotationI with general meta-specific  
methods (meta, meta_text, named_meta, etc)?  Implement this in  
whatever way one wanted (RNA structure as strings, quality data as  
arrays, etc) under the constraints of the interface description.
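Below is a rough sketch of the kind of interface described here. It uses hypothetical My:: package names so it is not mistaken for an existing BioPerl class. The abstract base has accessors that die unless overridden and defines meta() in terms of named_meta(); one string-backed implementation is included. The method names meta and named_meta come from the proposal. meta_names, the 'DEFAULT' tag and the storage layout are assumptions.

```perl
use strict;
use warnings;

# Hypothetical abstract interface (not BioPerl's): accessors must be
# provided by an implementing class.
package My::Annotation::MetaI;

sub _abstract {
    my ($self, $method) = @_;
    my $class = ref($self) || $self;
    die "$class must implement $method\n";
}
sub named_meta { $_[0]->_abstract('named_meta') }
sub meta_names { $_[0]->_abstract('meta_names') }

# Unnamed meta data is simply stored under a default tag.
sub meta {
    my ($self, @value) = @_;
    return $self->named_meta('DEFAULT', @value);
}

# One possible implementation: each tag maps to a plain string
# (e.g. a secondary-structure line for an alignment).
package My::Annotation::Meta::String;
our @ISA = ('My::Annotation::MetaI');

sub new { my ($class) = @_; return bless { meta => {} }, $class }

# Get/set the string stored under $name.
sub named_meta {
    my ($self, $name, $value) = @_;
    $self->{meta}{$name} = $value if defined $value;
    return $self->{meta}{$name};
}

sub meta_names { my ($self) = @_; return sort keys %{ $self->{meta} } }

package main;

my $meta = My::Annotation::Meta::String->new;
$meta->named_meta('SS_cons', 'HHHHHHHHTTH...HHHHHHH..HTT');
print join(', ', $meta->meta_names), "\n";
```

An array-backed implementation for quality values would subclass the same base and override only the storage methods.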

Multiple meta objects, potentially of mixed data types, could be  
added in an AnnotationCollection along with other Bio::Annotation  
data, or stored in a nested meta-specific AnnotationCollection object  
(I favor the former as it's simpler).  So you could have an  
alignment, sequence, seqfeature (anything that is AnnotatableI) with  
a regular AnnotationCollection also containing possibly multiple meta  
objects, each meta object also containing possibly more than one set  
of meta data.

The key issue I have is whether or not to constrain these to  
describing positional data, similar to Bio::Seq::Meta, by ensuring  
that the data is_flush(), etc.  My current inclination is 'no', and  
to have a separate abstract class which describes these methods,  
implementing those separately.
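The positional constraint could be kept out of the generic interface and checked separately, as suggested here. The following minimal sketch uses hypothetical function names and plain hashes instead of BioPerl objects. A set of meta tracks counts as "flush" when every track is exactly as long as the object it annotates.

```perl
use strict;
use warnings;

# Hypothetical helpers (not BioPerl's is_flush). $tracks is a hashref of
# track name => meta string; $length is the length of the annotated
# object, e.g. the alignment length.
sub is_flush {
    my ($tracks, $length) = @_;
    for my $name (keys %$tracks) {
        return 0 if length($tracks->{$name}) != $length;
    }
    return 1;
}

# Names of the tracks that break the constraint, sorted for stable output.
sub unflush_tracks {
    my ($tracks, $length) = @_;
    my @bad = grep { length($tracks->{$_}) != $length } keys %$tracks;
    return sort @bad;
}

my %tracks = (SS_cons => 'HHHHHHHHTTH', SA_cons => '03002200312');
print is_flush(\%tracks, 11) ? "flush\n" : "not flush\n";
```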

> The problem I see with undefined, totally open meta annotation, is that
> if you can put anything in there, it is also totally confusing to a
> user. If you can put anything in, how do you know what to get out and
> know that it is there?
>
> That leads to the third guideline:
>
> 3. Use separate meta classes only when there are several different ways
> of encoding data that is present in large numbers *and* when you are
> expecting to be assessing the data computationally rather than just
> checking if an attribute is there.
>
>
> 	-Heikki

The initial use case for this would be simple data strings for  
alignment data.  I already have a partial implementation in place for  
stockholm using Bio::Seq::Meta (which led me to this proposal!).  I  
like Chris M.'s idea of ensuring that meta implementations use some  
sort of formalized ontology, but I'll probably start out very simple  
and work up from there.

chris


From cjfields at uiuc.edu  Mon Dec 18 17:38:14 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 18 Dec 2006 16:38:14 -0600
Subject: [Bioperl-l] [Gmod-gbrowse] xyplot data alignment problem?
In-Reply-To: <4586BE99.7020308@sendu.me.uk>
References: <EA0BFA4F-8182-4C40-92DA-218CE3F48D8B@genomics.princeton.edu>	<6dce9a0b0612141356u63afe2dak7e1d8dad93408312@mail.gmail.com>
	<6dce9a0b0612150802x354a02a8ib17fbd882379c63c@mail.gmail.com>
	<458404BD.8030908@sendu.me.uk>
	<733825EE-0426-4D12-A02F-B8825CDEBBA9@gmx.net>
	<4586BE99.7020308@sendu.me.uk>
Message-ID: <6AD475AE-7F5E-4612-BC24-73B65AA47F30@uiuc.edu>


On Dec 18, 2006, at 10:15 AM, Sendu Bala wrote:

> Hilmar Lapp wrote:
>>
>> On Dec 16, 2006, at 9:37 AM, Sendu Bala wrote:
>>
>>> Lincoln Stein wrote:
>>>> This is very embarrassing for me, particularly since I spent a lot
>>>> of time validating that Bio::Graphics was working properly before
>>>> the 1.5.2 release went out. How long before there is a 1.5.3
>>>> release? How about a 1.5.2.1 release?
>>>
>>> I'm happy to try a point release for critical bug fixes. Why don't
>>> you commit the necessary fixes to branch-1-5-2 and let me know when
>>> you're happy, and I'll do 1.5.2.1.
>>
>> Feel free to do that, but why not make a 1.5.3 off the main trunk?
>> 1.5.2.1 may be adding more to the version confusion
>> (developer/stable/point-release/etc) than it is worth,
>
> My feeling is that 1.5.3 should be reserved for some significant changes
> and new features, and not just a few bug fixes. I'd say this causes less
> confusion amongst users - they can associate '1.5.2' with a certain API
> and feature-set, and the specific name of the file they download and
> install (bioperl-1.5.2_100.tar.gz vs bioperl-1.5.2_101.tar.gz) won't
> matter at all to them.
>
> I also won't have to make some major announcement about it; it will
> simply be the most recent developer version of bioperl available so new
> users trying to get 1.5.2 will end up getting 1.5.2.1, whilst existing
> 1.5.2 users will only feel compelled to get it if they suffer from the
> bugs fixed.
>
>
>> and there is no shame in releasing new developer versions every few
>> weeks.
>
> I think doing frequent releases is inadvisable; such a quick release
> won't have had much testing so we shouldn't encourage people to install
> it: encouragement is implicit when a major new version comes out like
> 1.5.3. People who want to live on the edge can and should be using a
> CVS checkout.

I thought that 1.5.2 was considered a point release for the 1.5 dev  
series, for bug fixes along with the potential for added/experimental  
features.  Similarly, 1.6.x releases would be point releases for bug  
fixes only with all tests passing (no added features since it is a  
stable release series).  I guess one could reason that 1.5.x releases  
have both bug fixes and new features, while 1.5.x.y releases are  
simply bug fixes for the 1.5.x branch (no new features).  We probably  
should add something to the FAQ and maybe make a few changes to the  
1.5.2 wiki page.

I think having a 1.5.2.1 release is feasible as a quick one-off to  
get Lincoln's fixes in, since you would make them off the 1.5.2  
branch anyway (so I guess it could be considered a bug release from  
that branch).  It's probably not something we should make a habit of,  
but then again I'm not the Pumpkin!

chris


From bix at sendu.me.uk  Mon Dec 18 17:50:11 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 18 Dec 2006 22:50:11 +0000
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on
	110	species
In-Reply-To: <63C1DC7D-2830-436A-BE95-7ECE3748C84D@uiuc.edu>
References: <F5C5C9A8-D444-4994-9769-AC5DE68F4A39@lsi.upc.edu>	<68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu>	<4577E4A2.5090303@sendu.me.uk>	<B290BEF7-81D6-4C0A-9EDA-348B8A75EEC8@lsi.upc.edu>	<4577EAAF.7030509@sendu.me.uk>	<0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu>	<4577EFD3.7090904@sendu.me.uk>	<250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu>
	<C91DCC7B-E368-475D-B83A-AC301A49624B@uiuc.edu>
	<4586E8C4.6030306@sendu.me.uk>
	<63C1DC7D-2830-436A-BE95-7ECE3748C84D@uiuc.edu>
Message-ID: <45871B23.8070103@sendu.me.uk>

Chris Fields wrote:
> 
> On Dec 18, 2006, at 1:15 PM, Sendu Bala wrote:
> 
>> For example, the first 'drop' happens for the 101st species which is
>> 'Leptospira interrogans serovar Copenhageni'. This is a variation
>> (descendant) of species 24: 'Leptospira interrogans'. So when the
>> variation is added it becomes a leaf and 'Leptospira interrogans' is no
>> longer a leaf, so the overall number of leaves does not increase.
>
> Makes sense now.  I personally would consider this a bug since the 
> results are unexpected (so the docs need to be modified in order to 
> clarify).  Some say tomato...
> 
> I suppose this is one of the issues one might run into when using NCBI 
> taxonomy to build trees.

No, the tree produced is perfectly fine. The taxonomy2tree.pl script 
deliberately then does:

# simple paths are contracted by removing degree one nodes
$tree->contract_linear_paths;

Because that is what Gabriel's script originally did.


>> Gabriel, please confirm this isn't a bug, or suggest how you propose to
>> see your taxa when they are not all leaves of the tree.
> 
> Having the nodes appear internally seems semantically correct to me.  Is 
> there any other way?

I suppose if we want to see all the input species output again we have 
to make contract_linear_paths() aware of nodes we want to keep, even 
when they are degree one nodes. Gabriel, is that what you want to see?
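Here is a minimal sketch of that idea. It uses a toy tree stored as a hash of node name to child names rather than Bio::Tree objects. The function name is hypothetical, and this is not Bioperl's contract_linear_paths. Nodes with a single child are spliced out unless they appear in a keep set, so an input species can survive as an internal node above its subspecies. In this sketch the root is never removed.

```perl
use strict;
use warnings;

# Hypothetical sketch, not Bio::Tree's contract_linear_paths.
# $children: hashref of node name => arrayref of child names.
# $keep:     hashref of node names that must survive contraction.
sub contract_keeping {
    my ($children, $root, $keep) = @_;
    my %new;
    _contract($children, $keep || {}, $root, \%new);
    return \%new;    # contracted tree in the same node => children form
}

sub _contract {
    my ($children, $keep, $node, $new) = @_;
    my @kids;
    for my $kid (@{ $children->{$node} || [] }) {
        my $c = $kid;
        # Walk down through single-child nodes that are not protected.
        while (@{ $children->{$c} || [] } == 1 && !$keep->{$c}) {
            $c = $children->{$c}[0];
        }
        push @kids, $c;
        _contract($children, $keep, $c, $new);
    }
    $new->{$node} = \@kids;
}

# Toy tree loosely based on the example in this thread (not a real lineage).
my %tree = (
    'root'                   => ['Spirochaetales', 'Prochlorococcus marinus'],
    'Spirochaetales'         => ['Leptospira'],
    'Leptospira'             => ['Leptospira interrogans', 'other Leptospira'],
    'Leptospira interrogans' => ['Leptospira interrogans serovar Copenhageni'],
);

my $plain = contract_keeping(\%tree, 'root');
my $kept  = contract_keeping(\%tree, 'root', { 'Leptospira interrogans' => 1 });
print join(', ', @{ $kept->{'Leptospira'} }), "\n";
```

Without a keep set, 'Leptospira interrogans' is spliced out just as described above. Listing it in the keep set retains it as the parent of its serovar.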


From cjfields at uiuc.edu  Mon Dec 18 18:14:23 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 18 Dec 2006 17:14:23 -0600
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on
	110	species
In-Reply-To: <45871B23.8070103@sendu.me.uk>
References: <F5C5C9A8-D444-4994-9769-AC5DE68F4A39@lsi.upc.edu>	<68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu>	<4577E4A2.5090303@sendu.me.uk>	<B290BEF7-81D6-4C0A-9EDA-348B8A75EEC8@lsi.upc.edu>	<4577EAAF.7030509@sendu.me.uk>	<0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu>	<4577EFD3.7090904@sendu.me.uk>	<250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu>
	<C91DCC7B-E368-475D-B83A-AC301A49624B@uiuc.edu>
	<4586E8C4.6030306@sendu.me.uk>
	<63C1DC7D-2830-436A-BE95-7ECE3748C84D@uiuc.edu>
	<45871B23.8070103@sendu.me.uk>
Message-ID: <CE808784-8068-44C5-82A8-BE852890E4DF@uiuc.edu>


On Dec 18, 2006, at 4:50 PM, Sendu Bala wrote:

> Chris Fields wrote:
>> On Dec 18, 2006, at 1:15 PM, Sendu Bala wrote:
>>> For example, the first 'drop' happens for the 101st species which is
>>> 'Leptospira interrogans serovar Copenhageni'. This is a variation
>>> (descendant) of species 24: 'Leptospira interrogans'. So when the
>>> variation is added it becomes a leaf and 'Leptospira interrogans' is no
>>> longer a leaf, so the overall number of leaves does not increase.
>>
>> Makes sense now.  I personally would consider this a bug since the  
>> results are unexpected (so the docs need to be modified in order  
>> to clarify).  Some say tomato...
>> I suppose this is one of the issues one might run into when using  
>> NCBI taxonomy to build trees.
>
> No, the tree produced is perfectly fine. The taxonomy2tree.pl  
> script deliberately then does:
>
> # simple paths are contracted by removing degree one nodes
> $tree->contract_linear_paths;
>
> Because that is what Gabriel's script originally did.

I think you misunderstood me.  The tree is fine; the data used to
make the tree (NCBI taxonomy) is the issue.  One of the clear caveats
that NCBI attaches to their taxonomy data is that it should not be the
'primary source for taxonomic or phylogenetic information':

http://tinyurl.com/y3k624

I think it works as a good guide as long as one takes the above into  
consideration.  That and the fact that not all taxids attached to  
sequence data will represent leaf nodes.

chris


From cjfields at uiuc.edu  Mon Dec 18 18:15:56 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 18 Dec 2006 17:15:56 -0600
Subject: [Bioperl-l] Proposal for Meta data
In-Reply-To: <6747C74C-8A49-4169-8A3B-8A26134C3B0D@fruitfly.org>
References: <B3EF69DB-9C01-4F42-A4E4-898613D872F9@uiuc.edu>
	<32BE3FCF-C788-438F-8A4A-8A586DD6C569@bioperl.org>
	<F302B7DD-C806-4A6F-ACDF-9F27A84E0BF0@uiuc.edu>
	<200612181551.51277.heikki@sanbi.ac.za>
	<6747C74C-8A49-4169-8A3B-8A26134C3B0D@fruitfly.org>
Message-ID: <16D6DB51-C2CB-4E89-A597-4672FAA6681B@uiuc.edu>


On Dec 18, 2006, at 3:50 PM, Chris Mungall wrote:

>
> I agree with everything Heikki is saying, I just wanted to highlight
> one paragraph:
>
>> The problem I see with undefined, totally open meta annotation, is
>> that if you can put anything in there, it is also totally confusing to
>> a user. If you can put anything in, how do you know what to get out
>> and know that it is there?
>
> One solution is to give your annotation/metadata-model formal
> computational semantics and use ontologies to give additional
> semantics to your metadata tags. This provides both user information
> in the form of documentation, and a means of specifying to the
> computer exactly what should be done with the tags.
>
> This is probably overkill for bioperl; but if the use cases being
> proposed do lean in the direction of a new metadata system that is
> not necessarily backwards compatible with the existing one, then I'd
> recommend checking out what's already out there before re-inventing
> the wheel. Perl RDF libraries are getting a little better.
>
> If anyone is interested in pursuing this sort of thing (probably on a
> branch), let me know
...

I like the idea of using ontologies (although that's one of my
many weak points!).  I'll likely start off with simple examples using  
meta data initially, then progress from there.  It is a developer  
series, after all!

Thanks everybody!  I think I have an idea on how to at least get  
started.

chris

From bix at sendu.me.uk  Mon Dec 18 18:27:15 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 18 Dec 2006 23:27:15 +0000
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on
	110	species
In-Reply-To: <CE808784-8068-44C5-82A8-BE852890E4DF@uiuc.edu>
References: <F5C5C9A8-D444-4994-9769-AC5DE68F4A39@lsi.upc.edu>	<68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu>	<4577E4A2.5090303@sendu.me.uk>	<B290BEF7-81D6-4C0A-9EDA-348B8A75EEC8@lsi.upc.edu>	<4577EAAF.7030509@sendu.me.uk>	<0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu>	<4577EFD3.7090904@sendu.me.uk>	<250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu>
	<C91DCC7B-E368-475D-B83A-AC301A49624B@uiuc.edu>
	<4586E8C4.6030306@sendu.me.uk>
	<63C1DC7D-2830-436A-BE95-7ECE3748C84D@uiuc.edu>
	<45871B23.8070103@sendu.me.uk>
	<CE808784-8068-44C5-82A8-BE852890E4DF@uiuc.edu>
Message-ID: <458723D3.4010908@sendu.me.uk>

Chris Fields wrote:
> 
> On Dec 18, 2006, at 4:50 PM, Sendu Bala wrote:
> 
>> Chris Fields wrote:
>>> On Dec 18, 2006, at 1:15 PM, Sendu Bala wrote:
>>>> For example, the first 'drop' happens for the 101st species which is
>>>> 'Leptospira interrogans serovar Copenhageni'. This is a variation
>>>> (descendant) of species 24: 'Leptospira interrogans'. So when the
>>>> variation is added it becomes a leaf and 'Leptospira interrogans' is no
>>>> longer a leaf, so the overall number of leaves does not increase.
>>>
>>> Makes sense now.  I personally would consider this a bug since the 
>>> results are unexpected (so the docs need to be modified in order to 
>>> clarify).  Some say tomato...
>>> I suppose this is one of the issues one might run into when using 
>>> NCBI taxonomy to build trees.
>>
>> No, the tree produced is perfectly fine. The taxonomy2tree.pl script 
>> deliberately then does:
>>
>> # simple paths are contracted by removing degree one nodes
>> $tree->contract_linear_paths;
>>
>> Because that is what Gabriel's script originally did.
> 
> I think you misunderstood me.  The tree is fine; the data used to make 
> the tree (NCBI taxonomy) is the issue.

In what way is it the issue? The database is also fine as far as I can 
see, in so far as it is not causing any problems in this instance.

Gabriel asked for a tree featuring a species and its subspecies. The 
NCBI taxonomy database provided Bioperl the correct data to build such a 
tree. Then Gabriel asked to remove the degree one nodes of his tree. His 
problem was that doing that happened to (correctly) remove the species 
node. If he wants to see both his species and his subspecies he must 
either not remove degree one nodes, or alter the method of doing so to 
keep desired nodes. There is no possible way for NCBI to improve matters 
here.


From bix at sendu.me.uk  Mon Dec 18 18:45:59 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 18 Dec 2006 23:45:59 +0000
Subject: [Bioperl-l] module to find out primer binding sites in a genome
 sequence
In-Reply-To: <A4D285B054CE4641A93F1B2046B2B3CD0761E4@mail01.sbri.org>
References: <A4D285B054CE4641A93F1B2046B2B3CD0761E4@mail01.sbri.org>
Message-ID: <45872837.6050403@sendu.me.uk>

Gowthaman Ramasamy wrote:
> Hi List, Is there any module in bioperl which can find out the primer
> binding sites in a genomic sequence. I am interested in finding
> locations with few mismatches along the primer...not just the exact
> match (which is a very trivial task)

There's no module dedicated to that task, but Bioperl may help you to
answer the question.

Probably the easiest, most reliable and clearest approach is to run a
BLAST search with settings appropriate for short sequences with few
mismatches. You can then write a script that only considers hits for
your forward primer that lie a 'primable' distance from a hit to your
reverse primer (and checks that their orientations are correct as well).

Or use some e-pcr tool.
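The pairing step described above might look like the following minimal sketch. It assumes the BLAST hits have already been reduced to plain hashrefs holding 1-based genome start/end coordinates and a strand (+1/-1), for example by walking the HSPs of a parsed BLAST report. The hash layout, the function name and the maximum product length are assumptions for illustration, not a Bioperl API.

```perl
use strict;
use warnings;

# Hypothetical sketch: $fwd_hits / $rev_hits are arrayrefs of hashrefs
# { start => ..., end => ..., strand => 1 or -1 } giving 1-based genome
# coordinates of forward- and reverse-primer hits. A pair is kept when
# the two primers face each other and the product is at most $max_len.
sub primer_pairs {
    my ($fwd_hits, $rev_hits, $max_len) = @_;
    my @pairs;
    for my $f (@$fwd_hits) {
        for my $r (@$rev_hits) {
            my ($left, $right);
            if    ($f->{strand} == 1  && $r->{strand} == -1) { ($left, $right) = ($f, $r) }
            elsif ($f->{strand} == -1 && $r->{strand} == 1)  { ($left, $right) = ($r, $f) }
            else  { next }    # same orientation: cannot prime a product
            # The plus-strand primer must lie upstream of the minus-strand one.
            next unless $left->{start} < $right->{start};
            my $length = $right->{end} - $left->{start} + 1;
            next if $length > $max_len;
            push @pairs, { start => $left->{start}, end => $right->{end},
                           length => $length };
        }
    }
    return sort { $a->{start} <=> $b->{start} } @pairs;
}

my @fwd = ({ start => 100,  end => 119,  strand => 1 },
           { start => 5000, end => 5019, strand => 1 });
my @rev = ({ start => 400,  end => 419,  strand => -1 },
           { start => 50,   end => 69,   strand => -1 });
for my $p (primer_pairs(\@fwd, \@rev, 1000)) {
    print "product $p->{start}-$p->{end} ($p->{length} bp)\n";
}
```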


From Kevin.M.Brown at asu.edu  Mon Dec 18 18:52:20 2006
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Mon, 18 Dec 2006 16:52:20 -0700
Subject: [Bioperl-l] module to find out primer binding sites in a genome
	sequence
Message-ID: <1A4207F8295607498283FE9E93B775B40270F3BB@EX02.asurite.ad.asu.edu>

A function I use to find the first landing site for a primer.  Should be
modifiable to handle multiple occurrences:

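=head1 C<match>

Match searches for a near alignment between two strings and returns the
1-based position at which the two strings align, or 0 if no alignment is
found.  More than 80% of the primer's bases must match exactly.

	match($this, $in_that)

=cut

# Greedy heuristic: extend a regex one primer base at a time; whenever the
# extended pattern no longer occurs in $gene, replace that base with the
# wildcard '.' and carry on with the next base.
sub match
{
	my ($primer, $gene) = @_;
	my $start   = 0;
	my $pattern = "";
	for (my $i = 0 ; $i < length($primer) ; $i++)
	{
		$pattern .= substr($primer, $i, 1);
		pos($gene) = 0;
		if ($gene =~ m/$pattern/gi)
		{
			# pos() points just past the match; convert to a 1-based start
			$start = pos($gene) - length($pattern) + 1;
		}
		else
		{
			# treat this base as a mismatch
			$start = 0;
			chop($pattern);
			$pattern .= '.';
		}
	}
	# If the last base was a mismatch, the final pattern has not been
	# located yet, so search for it once more from the beginning.
	if ($pattern =~ /\.$/)
	{
		pos($gene) = 0;
		if ($gene =~ m/$pattern/gi)
		{
			$start = pos($gene) - length($pattern) + 1;
		}
	}
	# What remains after removing the wildcards are the exact matches
	$pattern =~ s/\.//g;

	if ((length($pattern) / length($primer)) > .8)
	{
		return $start;
	}
	return 0;
}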
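The function above is a greedy heuristic that returns only the first site. A different and simpler technique that reports every site is a fixed-length scan that counts mismatches (Hamming distance) at each offset on both strands. Here is a minimal sketch with hypothetical function names. It ignores gaps and IUPAC ambiguity codes, and it costs O(genome length x primer length), so it suits modest sequences rather than whole large genomes.

```perl
use strict;
use warnings;

# Reverse complement of a DNA string (A/C/G/T only).
sub revcomp {
    my ($seq) = @_;
    (my $rc = reverse $seq) =~ tr/ACGTacgt/TGCAtgca/;
    return $rc;
}

# Hypothetical sketch: every window of $genome that matches $primer (or its
# reverse complement) with at most $max_mm mismatches. No gaps, no IUPAC
# codes. 'start' is the 1-based leftmost genome coordinate of the window.
sub primer_sites {
    my ($primer, $genome, $max_mm) = @_;
    my $plen = length $primer;
    my @hits;
    for my $strand (1, -1) {
        my $probe = uc($strand == 1 ? $primer : revcomp($primer));
        for my $i (0 .. length($genome) - $plen) {
            my $window = uc substr($genome, $i, $plen);
            # String XOR gives "\0" wherever the bases agree, so counting
            # the non-NUL bytes counts the mismatches.
            my $mm = ($window ^ $probe) =~ tr/\0//c;
            push @hits, { start => $i + 1, strand => $strand, mismatches => $mm }
                if $mm <= $max_mm;
        }
    }
    return @hits;
}

for my $hit (primer_sites('GATTACT', 'AAAAGATTACACCCCTGTAATCGGGG', 1)) {
    print "start $hit->{start} strand $hit->{strand} mismatches $hit->{mismatches}\n";
}
```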

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Sendu Bala
> Sent: Monday, December 18, 2006 4:46 PM
> To: Gowthaman Ramasamy
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] module to find out primer binding 
> sites in a genome sequence
> 
> Gowthaman Ramasamy wrote:
> > Hi List, Is there any module in bioperl which can find out 
> the primer
> > binding sites in a genomic sequence. I am interested in finding
> > locations with few mismatches along the primer...not just the exact
> > match (which is a very trivial task)
> 
> There's no module dedicated to that task, but Bioperl may help you to
> answer the question.
> 
> Probably the easiest/reliable/clear thing to do is to do a Blast with
> appropriate settings for short sequence with few mismatches. You can
> write a script to only consider hits for your forward primer 
> that are a
> 'primable' distance from a hit to your reverse primer (and check their
> orientations are correct as well).
> 
> Or use some e-pcr tool.
> 
> 
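Sendu's suggestion of keeping only primer hits that lie a 'primable' distance apart, in the right orientation, can be sketched as below. The hit structure (`{ start, end, strand }` in genome coordinates) is a hypothetical one; in practice it might be filled from BLAST HSP coordinates. A product is assumed to need one primer on the plus strand and the other further downstream on the minus strand, with either primer allowed in either role; `pair_primer_hits` is an illustrative name only.

```perl
use strict;
use warnings;

# Pair primer hits into candidate PCR products.
# $fwd_hits / $rev_hits: arrayrefs of { start => , end => , strand => 1|-1 }
# $max_product: largest product size (bp) considered amplifiable.
sub pair_primer_hits {
    my ($fwd_hits, $rev_hits, $max_product) = @_;
    my @products;

    # Try both primer roles: forward on + with reverse on -, and vice versa.
    for my $pair ([$fwd_hits, $rev_hits], [$rev_hits, $fwd_hits]) {
        my ($left_hits, $right_hits) = @$pair;
        for my $l (grep { $_->{strand} == 1 } @$left_hits) {
            for my $r (grep { $_->{strand} == -1 } @$right_hits) {
                # The minus-strand primer must end downstream ...
                next unless $r->{end} > $l->{start};
                # ... and close enough to give a product.
                my $size = $r->{end} - $l->{start} + 1;
                next if $size > $max_product;
                push @products,
                    { start => $l->{start}, end => $r->{end}, size => $size };
            }
        }
    }
    return sort { $a->{start} <=> $b->{start} } @products;
}
```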


From torsten.seemann at infotech.monash.edu.au  Mon Dec 18 18:52:58 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Tue, 19 Dec 2006 10:52:58 +1100
Subject: [Bioperl-l] module to find out primer binding sites in a genome
 sequence
In-Reply-To: <A4D285B054CE4641A93F1B2046B2B3CD0761E4@mail01.sbri.org>
References: <A4D285B054CE4641A93F1B2046B2B3CD0761E4@mail01.sbri.org>
Message-ID: <458729DA.9030909@infotech.monash.edu.au>

Gowthaman Ramasamy wrote:
> Hi List,
> Is there any module in bioperl which can find out the primer binding sites in a genomic sequence.
> I am interested in finding locations with few mismatches along the primer...not just the exact match (which is a very trivial task)

This FAQ question may help:
http://www.bioperl.org/wiki/FAQ#How_do_I_do_motif_searches_with_BioPerl.3F_Can_I_do_.22find_all_sequences_that_are_75.25_identical.22_to_a_given_motif.3F

This software may help:
http://frodo.wi.mit.edu/cgi-bin/primer3/primer3_www.cgi

-- 
Dr Torsten Seemann               http://www.vicbioinformatics.com
Victorian Bioinformatics Consortium, Monash University, Australia


From sdavis2 at mail.nih.gov  Mon Dec 18 21:16:19 2006
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Mon, 18 Dec 2006 21:16:19 -0500
Subject: [Bioperl-l] module to find out primer binding sites in a genome
 sequence
In-Reply-To: <A4D285B054CE4641A93F1B2046B2B3CD0761E4@mail01.sbri.org>
References: <A4D285B054CE4641A93F1B2046B2B3CD0761E4@mail01.sbri.org>
Message-ID: <45874B73.7010600@mail.nih.gov>

Gowthaman Ramasamy wrote:
> Hi List,
> Is there any module in bioperl which can find out the primer binding sites in a genomic sequence.
> I am interested in finding locations with few mismatches along the primer...not just the exact match (which is a very trivial task)
>   

See here:

http://genome.ucsc.edu/cgi-bin/hgPcr?command=start

It is designed for exactly this task, is very fast, is available as an 
executable or web-based (though watch the usage requirements), and the 
output can be parsed rather easily.

Sean

From cjfields at uiuc.edu  Mon Dec 18 21:30:04 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 18 Dec 2006 20:30:04 -0600
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on
	110	species
In-Reply-To: <458723D3.4010908@sendu.me.uk>
References: <F5C5C9A8-D444-4994-9769-AC5DE68F4A39@lsi.upc.edu>	<68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu>	<4577E4A2.5090303@sendu.me.uk>	<B290BEF7-81D6-4C0A-9EDA-348B8A75EEC8@lsi.upc.edu>	<4577EAAF.7030509@sendu.me.uk>	<0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu>	<4577EFD3.7090904@sendu.me.uk>	<250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu>
	<C91DCC7B-E368-475D-B83A-AC301A49624B@uiuc.edu>
	<4586E8C4.6030306@sendu.me.uk>
	<63C1DC7D-2830-436A-BE95-7ECE3748C84D@uiuc.edu>
	<45871B23.8070103@sendu.me.uk>
	<CE808784-8068-44C5-82A8-BE852890E4DF@uiuc.edu>
	<458723D3.4010908@sendu.me.uk>
Message-ID: <2638D8ED-A3B3-4EF8-978E-216C5F875D88@uiuc.edu>

>> I think you misunderstood me.  The tree is fine; the data used to
>> make the tree (NCBI taxonomy) is the issue.
>
> In what way is it the issue? The database is also fine as far as I can
> see, in so far as it is not causing any problems in this instance.

I should maybe have clarified a bit more: what I said has nothing to  
do with the structure of the database itself.  I was just pointing  
out that NCBI Taxonomy isn't the best source of data for building a  
phylogenetic tree, something NCBI also points out (the link in my  
last post).  Not a big deal, really.

> Gabriel asked for a tree featuring a species and its subspecies. The
> NCBI taxonomy database provided Bioperl the correct data to build such
> a tree. Then Gabriel asked to remove the degree one nodes of his tree.
> His problem was that doing that happened to (correctly) remove the
> species node. If he wants to see both his species and his subspecies
> he must either not remove degree one nodes, or alter the method of
> doing so to keep desired nodes. There is no possible way for NCBI to
> improve matters here.

Actually, there isn't any way they could without digging through the
literature in many cases.  The problem is incomplete taxonomic
information for nodes derived from older sequence data, where a genus
and species were designated but nothing else (strain, etc.) is known.

Again, I merely was pointing out what I had mentioned above.  I  
wasn't criticizing you, Gabriel, or the methodology here.  Honest!

chris

From avilella at gmail.com  Mon Dec 18 16:43:27 2006
From: avilella at gmail.com (Albert Vilella)
Date: Mon, 18 Dec 2006 21:43:27 +0000
Subject: [Bioperl-l] PAML files
In-Reply-To: <4586F3A2.4010607@ub.edu>
References: <4586BF57.4090002@ub.edu>
	<34C4970D-6F93-4CE4-878C-5FA4C916AAEC@bioperl.org>
	<4586F3A2.4010607@ub.edu>
Message-ID: <358f4d650612181343o5bd51169w7b46cceb34a5c92b@mail.gmail.com>

Filipe, if you need to create the ctl file but not run the job, you
can use the "prepare" method in Codeml run.

Also, there are tmpdir and save_tempfiles methods that will keep the
files where you want them. About the naming, you can add ".tree" and
".aln" extensions to the temporary names if you want, by altering the
$temptreefile and $tempseqfile variables in
bioperl-run/Bio/Tools/Run/Phylo/PAML/Codeml.pm (CVS HEAD version).

If you want, you can also add a couple of getters/setters there:

sub alnfilename {
    my $self = shift;

    # Setter when called with an argument, getter otherwise.
    return $self->{'alnfilename'} = shift if @_;
    return $self->{'alnfilename'};
}

and substitute $self->{'alnfilename'} I/O calls for those
$tempseqfile I/O calls.

$codeml->alnfilename("/path/name");
$codeml->prepare;
...
$codeml->run;

What I usually do is keep the aln and tree files in a different
place. Codeml will create its temporary files for running somewhere,
and then delete all the stuff when done.
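For Filipe's case (one alignment, three topologies), the batch can also be prepared outside Codeml.pm: take one working ctl file as a template and produce one copy per tree. A minimal sketch, assuming the template uses codeml's usual `seqfile`, `treefile` and `outfile` keys; `make_ctl_texts` and the `mlc_treeN.out` names are hypothetical.

```perl
use strict;
use warnings;

# Produce one codeml control-file text per tree topology from a template.
# Only the seqfile, treefile and outfile settings are rewritten; every
# other line (model, NSsites, ...) is copied unchanged.
sub make_ctl_texts {
    my ($template, $aln_file, @tree_files) = @_;
    my @texts;
    for my $n (1 .. @tree_files) {
        my $ctl = $template;
        $ctl =~ s/^([ \t]*seqfile[ \t]*=).*$/$1 $aln_file/m;
        $ctl =~ s/^([ \t]*treefile[ \t]*=).*$/$1 $tree_files[$n - 1]/m;
        $ctl =~ s/^([ \t]*outfile[ \t]*=).*$/$1 mlc_tree$n.out/m;
        push @texts, $ctl;
    }
    return @texts;
}
```

Each returned text could then be written to its own file and run as `codeml thatfile.ctl`, one run per topology.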

Cheers,

    Albert.

On 12/18/06, Filipe Garrett <fgarret at ub.edu> wrote:
>
> Hi Jason,
>
> This question is related with the one I made previously today.
> I need to run codeml with 3 tree topologies. I looked at the codeml module
> but it only accepts one tree as input, so I thought of using the codeml
> module to prepare all the files and then I would just have to run
> codeml with the new tree file in batch. But for that I need to know
> which one is the ctl file.
>
> any idea?
> FG
>
> Jason Stajich wrote:
> > They are temporary names so they are deliberately random and there is no
> > intention of you needing them after a run since it is to be cleaned up on
> > the fly. We use an internal method for generating tempfiles that takes
> > care of cleanup afterwards.  I suppose since we do all the work within a
> > temp directory that is cleaned up, one could have a fixed name for the
> > tree, alignment, and ctl files but honestly we never expect people to be
> > reading these filenames as they are intended to be transient.
> >
> > What problem are you having that you need the filename?
> >
> > -jason
> > On Dec 18, 2006, at 11:18 AM, Filipe Garrett wrote:
> >
> >> Hi all,
> >>
> >> does anyone know how to get the name of the .ctl file created by the
> >> PAML module? Inside the tmp directory there are 2 files with random
> >> names (tree and ctl). Why do they have random names?? Wouldn't it be
> >> easier to assign them a fixed name?? For instance "codeml.ctl" and
> >> "tree.nwk"??
> >>
> >> thanks in adv,
> >> FG
> >
> > --
> > Jason Stajich
> > jason at bioperl.org <mailto:jason at bioperl.org>
> > http://jason.open-bio.org/
> >
> >

From valiente at lsi.upc.edu  Mon Dec 18 23:18:20 2006
From: valiente at lsi.upc.edu (Gabriel Valiente)
Date: Tue, 19 Dec 2006 13:18:20 +0900
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on
	110	species
In-Reply-To: <2638D8ED-A3B3-4EF8-978E-216C5F875D88@uiuc.edu>
References: <F5C5C9A8-D444-4994-9769-AC5DE68F4A39@lsi.upc.edu>	<68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu>	<4577E4A2.5090303@sendu.me.uk>	<B290BEF7-81D6-4C0A-9EDA-348B8A75EEC8@lsi.upc.edu>	<4577EAAF.7030509@sendu.me.uk>	<0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu>	<4577EFD3.7090904@sendu.me.uk>	<250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu>
	<C91DCC7B-E368-475D-B83A-AC301A49624B@uiuc.edu>
	<4586E8C4.6030306@sendu.me.uk>
	<63C1DC7D-2830-436A-BE95-7ECE3748C84D@uiuc.edu>
	<45871B23.8070103@sendu.me.uk>
	<CE808784-8068-44C5-82A8-BE852890E4DF@uiuc.edu>
	<458723D3.4010908@sendu.me.uk>
	<2638D8ED-A3B3-4EF8-978E-216C5F875D88@uiuc.edu>
Message-ID: <287263A7-A84A-413E-AA9D-9258261A90C1@lsi.upc.edu>

Thanks a lot for the prompt answer and follow-up discussion. I think
this turned out not to be a bug in the merge_lineage() code but a
direct consequence of building a phylogenetic tree instead of a
taxonomic tree, i.e. one with internal node labels.

In order to reconstruct the NCBI taxonomy for the set of species  
present in a given phylogenetic tree, the only reasonable work-around  
seems to be a first step of merging lineages and contracting linear  
paths with the current implementation, followed by a second step of  
restricting the given phylogenetic tree to the set of species present  
in the obtained NCBI taxonomy. But this does not affect the code for  
merge_lineage().
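The "contracting linear paths" step, and why it removed the species node in this thread, can be illustrated with a minimal sketch. It assumes a hypothetical hash-of-arrays tree (node name mapped to child names), not Bio::Tree::Tree; `contract_linear_paths` is an illustrative name. Every node with a single child (the "degree one" nodes discussed above) is replaced by that child.

```perl
use strict;
use warnings;

# Contract linear paths in a rooted tree.
# $children: hashref mapping node name => arrayref of child names.
# Returns a nested { name => ..., children => [...] } structure in which
# no internal node has exactly one child.
sub contract_linear_paths {
    my ($children, $node) = @_;
    my @kids = @{ $children->{$node} || [] };

    # Follow a chain of single-child nodes down to the first leaf or
    # branching node; the nodes skipped over are dropped.
    while (@kids == 1) {
        $node = $kids[0];
        @kids = @{ $children->{$node} || [] };
    }

    return {
        name     => $node,
        children => [ map { contract_linear_paths($children, $_) } @kids ],
    };
}
```

With a genus holding two species, one of which has a single subspecies, the contracted tree keeps the genus and the subspecies but no longer shows that species node.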

Gabriel

>>> I think you misunderstood me.  The tree is fine; the data used to
>>> make the tree (NCBI taxonomy) is the issue.
>>
>> In what way is it the issue? The database is also fine as far as I
>> can see, in so far as it is not causing any problems in this instance.
>
> I should maybe have clarified a bit more: what I said has nothing
> to do with the structure of the database itself.  I was just
> pointing out that NCBI Taxonomy isn't the best source of data for
> building a phylogenetic tree, something NCBI also points out (the
> link in my last post).  Not a big deal, really.
>
>> Gabriel asked for a tree featuring a species and its subspecies. The
>> NCBI taxonomy database provided Bioperl the correct data to build
>> such a tree. Then Gabriel asked to remove the degree one nodes of his
>> tree. His problem was that doing that happened to (correctly) remove
>> the species node. If he wants to see both his species and his
>> subspecies he must either not remove degree one nodes, or alter the
>> method of doing so to keep desired nodes. There is no possible way
>> for NCBI to improve matters here.
>
> Actually, there isn't any way they could without digging through the
> literature in many cases.  The problem is incomplete taxonomic
> information for nodes derived from older sequence data, where a
> genus and species were designated but nothing else (strain, etc.) is
> known.
>
> Again, I merely was pointing out what I had mentioned above.  I
> wasn't criticizing you, Gabriel, or the methodology here.  Honest!
>
> chris


From cjfields at uiuc.edu  Mon Dec 18 23:41:16 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 18 Dec 2006 22:41:16 -0600
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on
	110	species
In-Reply-To: <287263A7-A84A-413E-AA9D-9258261A90C1@lsi.upc.edu>
References: <F5C5C9A8-D444-4994-9769-AC5DE68F4A39@lsi.upc.edu>	<68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu>	<4577E4A2.5090303@sendu.me.uk>	<B290BEF7-81D6-4C0A-9EDA-348B8A75EEC8@lsi.upc.edu>	<4577EAAF.7030509@sendu.me.uk>	<0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu>	<4577EFD3.7090904@sendu.me.uk>	<250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu>
	<C91DCC7B-E368-475D-B83A-AC301A49624B@uiuc.edu>
	<4586E8C4.6030306@sendu.me.uk>
	<63C1DC7D-2830-436A-BE95-7ECE3748C84D@uiuc.edu>
	<45871B23.8070103@sendu.me.uk>
	<CE808784-8068-44C5-82A8-BE852890E4DF@uiuc.edu>
	<458723D3.4010908@sendu.me.uk>
	<2638D8ED-A3B3-4EF8-978E-216C5F875D88@uiuc.edu>
	<287263A7-A84A-413E-AA9D-9258261A90C1@lsi.upc.edu>
Message-ID: <D72C19DB-B551-414E-96AF-113B32A34BCB@uiuc.edu>


On Dec 18, 2006, at 10:18 PM, Gabriel Valiente wrote:

> Thanks a lot for the prompt answer and follow-up discussion. I
> think this turned out not to be a bug in the merge_lineage() code
> but a direct consequence of building a phylogenetic tree instead of
> a taxonomic tree, i.e. one with internal node labels.
>
> In order to reconstruct the NCBI taxonomy for the set of species
> present in a given phylogenetic tree, the only reasonable
> work-around seems to be a first step of merging lineages and
> contracting linear paths with the current implementation, followed
> by a second step of restricting the given phylogenetic tree to the
> set of species present in the obtained NCBI taxonomy. But this does
> not affect the code for merge_lineage().
>
> Gabriel

I did notice one thing, though it's minor: if you use the option to  
retrieve the data from Entrez, a few species aren't found (even  
though they show up in a local taxonomy search).  I think both were  
E. coli strains.

chris

From DGroskreutz at twt.com  Tue Dec 19 02:00:40 2006
From: DGroskreutz at twt.com (DGroskreutz at twt.com)
Date: Tue, 19 Dec 2006 01:00:40 -0600
Subject: [Bioperl-l] CN=Deb Groskreutz/OU=MSN/O=TWT is out of the office.
Message-ID: <OFEB7AC000.56E72ED8-ON86257249.002683B4-86257249.002683B4@twt.com>


I will be out of the office starting  12/18/2006 and will not return until
01/02/2007.


NOTICE OF CONFIDENTIALITY:
The information contained in this communication, including attachments, is intended for the specific delivery to and use by the individual(s) to whom it is addressed. This email includes confidential information that may be attorney-client privileged. Any review, retransmission, dissemination, or unauthorized use of this communication is strictly prohibited and may be unlawful. If you have received this communication in error, please reply to the sender immediately and delete the original communication and any copy of it from your computer system, including all attachments.


From michael.watson at bbsrc.ac.uk  Tue Dec 19 07:20:56 2006
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Tue, 19 Dec 2006 12:20:56 -0000
Subject: [Bioperl-l] Problems with EMBL entries and fasta IDs?
Message-ID: <8975119BCD0AC5419D61A9CF1A923E9503E2E67F@iahce2ksrv1.iah.bbsrc.ac.uk>

Hi

I'm using bioperl-1.4.  I did do a Google search for this but couldn't
find anything.  If this is fixed in 1.5.2 then forgive me.

I'm getting a warning:

MSG: No whitespace allowed in FASTA ID [unknown id]

When trying to convert from EMBL format to fasta.  The offending
sequence is CK234114:

ID   CK234114; SV 1; linear; mRNA; EST; VRT; 244 BP.
XX
AC   CK234114;
XX
DT   03-MAR-2004 (Rel. 79, Created)
DT   03-MAR-2004 (Rel. 79, Last updated, Version 1)
XX
DE   SB010002000A01 JUWNL1 Normalized Zebra Finch Juvenile Telencephalon cDNA
DE   Library SB01 Taeniopygia guttata cDNA clone SB010002000A01 5', mRNA
DE   sequence.
Etc

Any advice?

Mick

The information contained in this message may be confidential or legally
privileged and is intended solely for the addressee. If you have
received this message in error please delete it & notify the originator
immediately.
Unauthorised use, disclosure, copying or alteration of this message is
forbidden & may be unlawful. 
The contents of this e-mail are the views of the sender and do not
necessarily represent the views of the Institute. 
This email and associated attachments has been checked locally for
viruses but we can accept no responsibility once it has left our
systems.
Communications on Institute computers are monitored to secure the
effective operation of the systems and for other lawful purposes. 


From michael.watson at bbsrc.ac.uk  Tue Dec 19 07:27:59 2006
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Tue, 19 Dec 2006 12:27:59 -0000
Subject: [Bioperl-l] Problems with EMBL entries and fasta IDs?
In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E9503E2E67F@iahce2ksrv1.iah.bbsrc.ac.uk>
Message-ID: <8975119BCD0AC5419D61A9CF1A923E9503E2E682@iahce2ksrv1.iah.bbsrc.ac.uk>

Sorry, problem solved.

Mick 



From roest216 at student.otago.ac.nz  Tue Dec 19 04:15:55 2006
From: roest216 at student.otago.ac.nz (Stephan Roessner)
Date: Tue, 19 Dec 2006 22:15:55 +1300
Subject: [Bioperl-l] problems installing bioperl
Message-ID: <1166519755.4587adcb141d3@www.studentmail.otago.ac.nz>

Dear support team,

I installed bioperl 1.5.2_100 on a Fedora machine to be able to use
gbrowse.
The installation seems to work (apart from the test failures), but the
gbrowse installation tells me that BioPerl 1.0050021 is installed,
whereas, of course, it requires 1.52.

Is there any way to find out what went wrong?

thanks a lot,
Stephan


From bix at sendu.me.uk  Tue Dec 19 10:12:39 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 19 Dec 2006 15:12:39 +0000
Subject: [Bioperl-l] problems installing bioperl
In-Reply-To: <1166519755.4587adcb141d3@www.studentmail.otago.ac.nz>
References: <1166519755.4587adcb141d3@www.studentmail.otago.ac.nz>
Message-ID: <45880167.9010605@sendu.me.uk>

Stephan Roessner wrote:
> Dear support team,
> 
> I installed bioperl 1.5.2_100 on a Fedora machine to be able to use
> gbrowse.
> The installation seems to work (apart from the test failures), but the
> gbrowse installation tells me that BioPerl 1.0050021 is installed,
> whereas, of course, it requires 1.52.
> 
> Is there any way to find out what went wrong?

Nothing went wrong with the Bioperl installation (well, except that there
shouldn't have been any test failures - can you post those please?).
gbrowse simply defined its Bioperl requirement incorrectly. If you tell
me exactly where you downloaded gbrowse from and how you went about
installing it, and provide the exact, complete error message you got
from it, I might be able to help the authors fix the problem.

Or I'm pretty sure they can figure it out for themselves :)

From cjfields at uiuc.edu  Tue Dec 19 11:05:01 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 19 Dec 2006 10:05:01 -0600
Subject: [Bioperl-l] [Gmod-gbrowse]  problems installing bioperl
In-Reply-To: <1166542310.6981.119.camel@localhost.localdomain>
References: <1166519755.4587adcb141d3@www.studentmail.otago.ac.nz>
	<45880167.9010605@sendu.me.uk>
	<1166542310.6981.119.camel@localhost.localdomain>
Message-ID: <8D5C45A3-A90A-49D7-A7E7-888C977759AC@uiuc.edu>


On Dec 19, 2006, at 9:31 AM, Scott Cain wrote:

> I really don't think the BioPerl version detection is wrong.  I actually
> don't check Bio::Root::Version::VERSION in Makefile.PL, I check
> Bio::Graphics::Panel->api_version.  When it doesn't find the correct
> api_version, it gives a warning that BioPerl 1.5.2 is not installed.  I
> have seen this happen when more than one BioPerl instance is installed
> and `perl Makefile.PL` finds the wrong one first.  My suggestion is to
> try reinstalling BioPerl and providing the --uninst 1 argument to remove
> older versions of BioPerl:
>
>   sudo ./Build install --uninst 1
>
> Scott

Could having two Bioperl instances explain the test failures?  I'm  
not sure (maybe Sendu can answer this), but I would assume  
Module::Build uses the current working directory for test runs.

chris


From bix at sendu.me.uk  Tue Dec 19 12:02:34 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 19 Dec 2006 17:02:34 +0000
Subject: [Bioperl-l] [Gmod-gbrowse]  problems installing bioperl
In-Reply-To: <8D5C45A3-A90A-49D7-A7E7-888C977759AC@uiuc.edu>
References: <1166519755.4587adcb141d3@www.studentmail.otago.ac.nz>
	<45880167.9010605@sendu.me.uk>
	<1166542310.6981.119.camel@localhost.localdomain>
	<8D5C45A3-A90A-49D7-A7E7-888C977759AC@uiuc.edu>
Message-ID: <45881B2A.8060907@sendu.me.uk>

Chris Fields wrote:
> 
> On Dec 19, 2006, at 9:31 AM, Scott Cain wrote:
> 
>> I really don't think the BioPerl version detection is wrong.  I actually
>> don't check Bio::Root::Version::VERSION in Makefile.PL, I check
>> Bio::Graphics::Panel->api_version.  When it doesn't find the correct
>> api_version, it gives a warning the BioPerl 1.5.2 is not installed.  I
>> have seen this happen when more than one BioPerl instance is installed
>> and `perl Makefile.PL` finds the wrong one first.  My suggestion is to
>> try reinstalling BioPerl and providing the --uninst 1 argument to remove
>> older versions of BioPerl:
>>
>>   sudo ./Build install --uninst 1
>>
>> Scott
> 
> Could having two Bioperl instances explain the test failures?  I'm not 
> sure (maybe Sendu can answer this), but I would assume Module::Build 
> uses the current working directory for test runs.

It does, so that shouldn't be an issue for the test failures.


From ferraria at gmail.com  Tue Dec 19 11:40:05 2006
From: ferraria at gmail.com (Anthony Ferrari)
Date: Tue, 19 Dec 2006 17:40:05 +0100
Subject: [Bioperl-l] Problem with : EUtilities - Proxy
Message-ID: <b2ec54b90612190840r24fe1aa5ncb9c9def040aed49@mail.gmail.com>

Hi all,

I've just installed BioPerl 1.5.2 (devel) on a Linux Mandrake machine using
the CPAN shell.
I want to use Bio::DB::EUtilities to retrieve data (IDs) from the NCBI
'gene' database (the first step of my pipeline).

But the installation of this package doesn't seem to be correct:
the simple example given in the documentation doesn't work (this one:
http://doc.bioperl.org/bioperl-live/Bio/DB/EUtilities.html#SYNOPSIS).

Here is the error message I got:
"Can't use an undefined value as an ARRAY reference at
/usr/lib/perl5/site_perl/5.8.7/LWP/UserAgent.pm line 779."

In the UserAgent package, line 779 is in the private "_need_proxy"
subroutine and corresponds to the code:    ...if (@{ $self->{'no_proxy'} })
...

If I comment out this line in the UserAgent package and the corresponding "}",
the example works. But obviously, I would prefer to solve the problem in a
proper way :)

Indeed, my computer accesses the internet via an HTTP proxy and I didn't tell
BioPerl this at any point.
As I read on the BioPerl Wiki site, I tried to configure an $http_proxy
environment variable, but it still doesn't work.

One last, possibly important, piece of information: I saw during the
installation that the tests 't/EUtilities' were skipped for an unstated reason.


So finally I have two questions:
1. Can anybody figure out what my problem is?
2. At the moment, is the Bio::DB::EUtilities package really effective, or
would using the NCBI eutilities directly with the LWP::Simple package be a
good alternative?

Many thanks in advance,
Best Regards,
Anthony Ferrari

From bix at sendu.me.uk  Tue Dec 19 12:06:03 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 19 Dec 2006 17:06:03 +0000
Subject: [Bioperl-l] problems installing bioperl
In-Reply-To: <1166542310.6981.119.camel@localhost.localdomain>
References: <1166519755.4587adcb141d3@www.studentmail.otago.ac.nz>	
	<45880167.9010605@sendu.me.uk>
	<1166542310.6981.119.camel@localhost.localdomain>
Message-ID: <45881BFB.7020008@sendu.me.uk>

Scott Cain wrote:
> I really don't think the BioPerl version detection is wrong.  I actually
> don't check Bio::Root::Version::VERSION in Makefile.PL, I check
> Bio::Graphics::Panel->api_version.  When it doesn't find the correct
> api_version, it gives a warning that BioPerl 1.5.2 is not installed.  I
> have seen this happen when more than one BioPerl instance is installed
> and `perl Makefile.PL` finds the wrong one first.

Yes, I saw that, which is why I thought I must be looking at something 
different to what the OP had installed.


> My suggestion is to try reinstalling BioPerl and providing the --uninst 1 argument to remove
> older versions of BioPerl:
> 
>   sudo ./Build install --uninst 1

My confusion is that he has definitely installed 1.5.2 and this version 
is being polled for its version number (by something!) and returning the 
correct '1.0050021', whilst the something expects '1.52'. Anyway, this 
can only be resolved if Stephan provides the real error message and its 
context.
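Sendu's point that '1.0050021' is the correct value for 1.5.2_100 can be checked by hand. The encoding appears to be major + minor/10^3 + patch/10^6 + developer release/10^9, i.e. 1 + 0.005 + 0.000002 + 0.0000001 = 1.0050021. Compared numerically with 1.52 it is smaller, which would explain a check that expects '1.52' failing. A minimal sketch of that arithmetic; `dotted_to_decimal` is a hypothetical helper, not part of Bio::Root::Version.

```perl
use strict;
use warnings;

# Convert a dotted version such as "1.5.2_100" to the decimal form
# 1.005002100, assuming three digits each for minor, patch and
# developer release, then drop trailing zeros.
sub dotted_to_decimal {
    my ($dotted) = @_;
    my ($major, $minor, $patch, $dev) =
        $dotted =~ /^(\d+)\.(\d+)\.(\d+)(?:_(\d+))?$/
        or die "Unrecognised version '$dotted'";
    my $decimal = sprintf '%d.%03d%03d%03d', $major, $minor, $patch, $dev // 0;
    $decimal =~ s/0+$//;     # trailing zeros carry no value
    $decimal =~ s/\.$//;     # e.g. "2." becomes "2"
    return $decimal;
}
```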

From cjfields at uiuc.edu  Tue Dec 19 12:27:24 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 19 Dec 2006 11:27:24 -0600
Subject: [Bioperl-l] Problem with : EUtilities - Proxy
In-Reply-To: <b2ec54b90612190840r24fe1aa5ncb9c9def040aed49@mail.gmail.com>
References: <b2ec54b90612190840r24fe1aa5ncb9c9def040aed49@mail.gmail.com>
Message-ID: <6365ACFD-7F5A-4EF1-97EA-BB53A58B9B4D@uiuc.edu>


On Dec 19, 2006, at 10:40 AM, Anthony Ferrari wrote:

> Hi all,
>
> I've just installed BioPerl 1.5.2 (devel) on a Linux Mandrake machine
> using the CPAN shell.
> I want to use Bio::DB::EUtilities to retrieve data (IDs) from the NCBI
> 'gene' database (the first step of my pipeline).
>
> But the installation of this package doesn't seem to be correct:
> the simple example given in the documentation doesn't work (this one:
> http://doc.bioperl.org/bioperl-live/Bio/DB/EUtilities.html#SYNOPSIS).
>
> Here is the error message I got:
> "Can't use an undefined value as an ARRAY reference at
> /usr/lib/perl5/site_perl/5.8.7/LWP/UserAgent.pm line 779."
>
> In the UserAgent package, line 779 is in the private "_need_proxy"
> subroutine and corresponds to the code:    ...if (@{ $self->{'no_proxy'} })
> ...
>
> If I comment out this line in the UserAgent package and the
> corresponding "}", the example works. But obviously, I would prefer to
> solve the problem in a proper way :)
>
> Indeed, my computer accesses the internet via an HTTP proxy and I
> didn't tell BioPerl this at any point.
> As I read on the BioPerl Wiki site, I tried to configure an
> $http_proxy environment variable, but it still doesn't work.
>
> One last, possibly important, piece of information: I saw during the
> installation that the tests 't/EUtilities' were skipped for an
> unstated reason.
>
>
> So finally I have two questions:
> 1. Can anybody figure out what my problem is?
> 2. At the moment, is the Bio::DB::EUtilities package really effective,
> or would using the NCBI eutilities directly with the LWP::Simple
> package be a good alternative?
>
> Many thanks in advance,
> Best Regards,
> Anthony Ferrari

First things first: at the moment the BioPerl EUtilities interface is  
very experimental (as specifically outlined in the POD), so I can't  
really recommend it for production use until the API is cleaned up.   
However, I do appreciate any feedback or comments re:EUtilities (the  
reason it's out there in the 1.5.2 release).

You might check out this bug report, which relates directly to your  
issue:

http://bugzilla.open-bio.org/show_bug.cgi?id=2109

After I worked out the proxy issue Torsten got it working.  Let me  
know if this doesn't help or fix the problem.
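On Anthony's second question, calling NCBI's E-utilities directly is also an option. A minimal sketch of building an ESearch URL for the 'gene' database and pulling the UIDs out of the `<Id>` elements of the XML reply. The helper names are hypothetical, the regex is a shortcut rather than a real XML parser, and the https base URL is NCBI's current address rather than a detail from this thread. The URL could then be fetched with LWP::Simple's `get()`, subject to the same proxy configuration discussed above.

```perl
use strict;
use warnings;

# Build an ESearch URL; the query term is percent-encoded by hand so the
# sketch needs no non-core modules.
sub esearch_url {
    my ($db, $term) = @_;
    my $base = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi';
    (my $escaped = $term) =~ s/([^A-Za-z0-9\-_.~])/sprintf '%%%02X', ord $1/ge;
    return "$base?db=$db&term=$escaped";
}

# Extract the numeric UIDs from an ESearch XML reply.
sub esearch_ids {
    my ($xml) = @_;
    return $xml =~ m{<Id>(\d+)</Id>}g;
}
```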

chris


From cain at cshl.edu  Tue Dec 19 10:31:50 2006
From: cain at cshl.edu (Scott Cain)
Date: Tue, 19 Dec 2006 10:31:50 -0500
Subject: [Bioperl-l] problems installing bioperl
In-Reply-To: <45880167.9010605@sendu.me.uk>
References: <1166519755.4587adcb141d3@www.studentmail.otago.ac.nz>
	<45880167.9010605@sendu.me.uk>
Message-ID: <1166542310.6981.119.camel@localhost.localdomain>

I really don't think the BioPerl version detection is wrong.  I actually
don't check Bio::Root::Version::VERSION in Makefile.PL, I check
Bio::Graphics::Panel->api_version.  When it doesn't find the correct
api_version, it gives a warning that BioPerl 1.5.2 is not installed.  I
have seen this happen when more than one BioPerl instance is installed
and `perl Makefile.PL` finds the wrong one first.  My suggestion is to
try reinstalling BioPerl and providing the --uninst 1 argument to remove
older versions of BioPerl:

  sudo ./Build install --uninst 1
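Whether more than one BioPerl copy is visible, and which one perl finds first, can be checked before reinstalling. A minimal sketch in plain Perl that lists every copy of a module file reachable through @INC, in search order; `find_module_copies` is a hypothetical helper.

```perl
use strict;
use warnings;
use File::Spec;

# Return every copy of a module file found through @INC, in search order.
# perl loads the first one, so an older BioPerl listed earlier wins.
sub find_module_copies {
    my ($relpath) = @_;    # e.g. 'Bio/Root/Version.pm'
    my @parts = split m{/}, $relpath;
    return grep { -f $_ }
           map  { File::Spec->catfile($_, @parts) }
           grep { !ref $_ } @INC;    # skip code-ref hooks in @INC
}

print "$_\n" for find_module_copies('Bio/Root/Version.pm');
```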

Scott


On Tue, 2006-12-19 at 15:12 +0000, Sendu Bala wrote:
> Stephan Roessner wrote:
> > Dear support team,
> > 
> > I installed bioperl 1.5.2_100 on a Fedora machine to be able to use
> > gbrowse.
> > The installation seems to work (apart from the test failures), but the
> > gbrowse installation tells me that BioPerl 1.0050021 is installed,
> > whereas, of course, it requires 1.52.
> > 
> > Is there any way to find out what went wrong?
> 
> Nothing went wrong with the Bioperl installation (well, except that there
> shouldn't have been any test failures - can you post those please?).
> gbrowse simply defined its Bioperl requirement incorrectly. If you tell
> me exactly where you downloaded gbrowse from and how you went about
> installing it, and provide the exact, complete error message you got
> from it, I might be able to help the authors fix the problem.
> 
> Or I'm pretty sure they can figure it out for themselves :)
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From ferraria at gmail.com  Tue Dec 19 12:06:31 2006
From: ferraria at gmail.com (Anthony Ferrari)
Date: Tue, 19 Dec 2006 18:06:31 +0100
Subject: [Bioperl-l] Problem with : EUtilities - Proxy
In-Reply-To: <b2ec54b90612190840r24fe1aa5ncb9c9def040aed49@mail.gmail.com>
References: <b2ec54b90612190840r24fe1aa5ncb9c9def040aed49@mail.gmail.com>
Message-ID: <b2ec54b90612190906s2b4ddbf8g9b591372a85fdcd@mail.gmail.com>

Hi all,

I've just installed BioPerl 1.5.2 (devel) on a linux mandrake machine with
the cpan shell.
I want to use the Bio::DB::EUtilities to retrieve data (id's) from NCBI
'gene' database (first step of my pipeline).

But the installation of this package doesn't seem to be correct :
The simple example given on the documentation doesn't work. (this one :
http://doc.bioperl.org/bioperl-live/Bio/DB/EUtilities.html#SYNOPSIS)

Here is the error message I got :
"Can't use an undefined value as an ARRAY reference at
/usr/lib/perl5/site_perl/5.8.7/LWP/UserAgent.pm line 779."

In the UserAgent package, line 779 is in the private "_need_proxy"
subroutine and corresponds to the code:    ...if (@{ $self->{'no_proxy'} })
...

If I comment out this line in the UserAgent package and the corresponding "}",
the example works. But obviously, I would prefer to solve the problem in a
proper way :)

Indeed, my computer accesses the internet via an HTTP proxy and I never told
BioPerl about it.
As described on the BioPerl Wiki site, I tried setting an $http_proxy
environment variable, but it still doesn't work.

One last, possibly important, piece of information: during the installation
I saw that the tests 't/EUtilities' were skipped, without any reason given.


So finally I have two questions:
1. Can anybody figure out what my problem is?
2. At the moment, is the Bio::DB::EUtilities package really effective, or
would using the NCBI eutilities directly with the LWP::Simple package be a
good alternative?

Many thanks in advance,
Best Regards,
Anthony Ferrari

From stewarta at nmrc.navy.mil  Tue Dec 19 13:49:57 2006
From: stewarta at nmrc.navy.mil (Andrew Stewart)
Date: Tue, 19 Dec 2006 13:49:57 -0500
Subject: [Bioperl-l] Bio::Tools::Glimmer for glimmer2/3
Message-ID: <4FDC0EAE-0E93-42A6-AFCA-2B2DFB6F7E8D@nmrc.navy.mil>

I see that Bio::Tools::Glimmer documentation clearly states that this  
module is intended for use with GlimmerM (eukaryotic version) only.   
I am wondering if anyone can recall any talk about adapting
Bio::Tools::Glimmer for Glimmer2 / Glimmer3 (prokaryotic version)?
I've searched the list history with little luck other than someone
else asking a similar question.

If not, does anyone have any thoughts on how difficult it might be to  
implement support for glimmer2/3 result parsing?  Perhaps just a  
matter of editing the _parse_predictions method?


--
Andrew Stewart
Research Assistant, Genomics Team
Navy Medical Research Center (NMRC)
Biological Defense Research Directorate (BDRD)
BDRD Annex
12300 Washington Avenue, 2nd Floor
Rockville, MD 20852

email: stewarta at nmrc.navy.mil
phone: 301-231-6700 Ext 270


From rvosa at sfu.ca  Tue Dec 19 13:53:47 2006
From: rvosa at sfu.ca (Rutger Vos)
Date: Tue, 19 Dec 2006 10:53:47 -0800
Subject: [Bioperl-l] problems installing bioperl
Message-ID: <200612191853.kBJIrlW3026344@rm-rstar.sfu.ca>

An embedded and charset-unspecified text was scrubbed...
Name: not available
Url: http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061219/276348b7/attachment.pl 

From cjfields at uiuc.edu  Tue Dec 19 14:31:17 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 19 Dec 2006 13:31:17 -0600
Subject: [Bioperl-l] Bio::Tools::Glimmer for glimmer2/3
In-Reply-To: <4FDC0EAE-0E93-42A6-AFCA-2B2DFB6F7E8D@nmrc.navy.mil>
References: <4FDC0EAE-0E93-42A6-AFCA-2B2DFB6F7E8D@nmrc.navy.mil>
Message-ID: <71E04575-DFD2-4F5A-B268-493D3246CBFA@uiuc.edu>


On Dec 19, 2006, at 12:49 PM, Andrew Stewart wrote:

> I see that Bio::Tools::Glimmer documentation clearly states that this
> module is intended for use with GlimmerM (eukaryotic version) only.
> I am wondering if anyone can recall any talk about adapting
> Bio::Tools::Glimmer for Glimmer2 / Glimmer3 (prokaryotic version)?
> I've searched the list history with little luck other than someone
> else asking a similar question.

There is a thread here:

http://thread.gmane.org/gmane.comp.lang.perl.bio.general/12546/focus=12546

> If not, does anyone have any thoughts on how difficult it might be to
> implement support for glimmer2/3 result parsing?  Perhaps just a
> matter of editing the _parse_predictions method?

It depends on how different the various Glimmer formats are; I'll  
have to look at the ones Torsten added in CVS.  You could always try  
modifying Bio::Tools::Glimmer to parse Glimmer2/3 and GlimmerM  
reports, but based on the mail list thread above it may not be so  
straightforward.
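For a sense of how much work a Glimmer3 parser might be, here is a minimal plain-Perl sketch of reading Glimmer3-style `.predict` output. It assumes a `>` header line per sequence, followed by whitespace-separated columns (ORF id, start, end, reading frame, score), with reverse-strand predictions written as start > end. `parse_glimmer3_predict` is a hypothetical helper, not part of Bio::Tools::Glimmer. The example lines are made up, and the sketch does not try to handle genes that wrap around the origin of a circular genome.

```perl
use strict;
use warnings;

# Hypothetical parser for Glimmer3-style .predict output.
# Assumed layout:
#   >sequence_name optional description
#   orf00001      337     2799  +1    12.56
sub parse_glimmer3_predict {
    my ($fh) = @_;
    my ($seq_id, @predictions);
    while (my $line = <$fh>) {
        chomp $line;
        next unless $line =~ /\S/;                 # skip blank lines
        if ($line =~ /^>(\S+)/) {                  # header starts a new sequence
            $seq_id = $1;
            next;
        }
        my ($orf, $start, $end, $frame, $score) = split ' ', $line;
        push @predictions, {
            seq_id => $seq_id,
            orf    => $orf,
            start  => $start,
            end    => $end,
            strand => ($frame =~ /^-/ ? -1 : 1),   # sign of the frame gives the strand
            score  => $score,
        };
    }
    return @predictions;
}

# Example run on made-up lines held in an in-memory filehandle
my $text = ">contig1\norf00001      337     2799  +1    12.56\norf00003     3733     2801  -2    11.97\n";
open my $fh, '<', \$text or die "Cannot open in-memory string: $!";
my @preds = parse_glimmer3_predict($fh);
printf "%s %s %d..%d strand %d\n", @{$_}{qw(seq_id orf start end strand)} for @preds;
```

Each hash could then be turned into a Bio::SeqFeature::Generic, which is roughly what Bio::Tools::Glimmer does for GlimmerM reports.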

chris


From MEC at stowers-institute.org  Tue Dec 19 14:57:48 2006
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Tue, 19 Dec 2006 13:57:48 -0600
Subject: [Bioperl-l] bp_seqfeature_load /
	Bio::DB::SeqFeature::Store::GFF3Loader problems augmenting
	Flybase annotation
Message-ID: <CED81D34E37D5043A1211565277A51E506E06492@exchkc02.stowers-institute.org>

Lincoln and fellow Bio::DB::SeqFeature travelers,

I find that using bp_seqfeature_load.PLS to load subfeatures of genes
already loaded using bp_seqfeature_load.PLS fails with 

------------- EXCEPTION  -------------
MSG: FBgn0017545 doesn't have a primary id
STACK Bio::DB::SeqFeature::Store::GFF3Loader::build_object_tree_in_tables /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:682
STACK Bio::DB::SeqFeature::Store::GFF3Loader::build_object_tree /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:663
STACK Bio::DB::SeqFeature::Store::GFF3Loader::finish_load /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:372
STACK Bio::DB::SeqFeature::Store::GFF3Loader::load_fh /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:345
STACK Bio::DB::SeqFeature::Store::GFF3Loader::load /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:242
STACK toplevel /home/mec/cvs/bioperl-live/scripts/Bio-SeqFeature-Store/bp_seqfeature_load.PLS:76

Where FBgn0017545 is the ID of a gene previously loaded.

I am unsure how to remedy my situation and welcome any advice on a correct
or improved approach to my problem.

Here's some detail if it helps.  I am developing a pipeline to design
microarray probes capable of distinguishing among splice variants in
Drosophila (using the latest Flybase dmel_r5.1 annotation).  So I

1) load a filtered selection of Flybase annotation using
bp_seqfeature_load.  (for testing purposes, I am using a single gene's
worth of annotation, FBgn0017545.gff, attached).  This is done as
follows:

	> bp_seqfeature_load.PLS  --create FBgn0017545.gff 

2) analyze all the genes in the database, and create GFF3 output in which
each feature has a 'Parent' that is a previously loaded gene (i.e.
FBgn0017545).  (These features represent the unique introns, splice
sites, and exonic design targets. Output of this analysis,
FBgn0017545_matd.gff, is also attached)

3) load these analysis results into the same database, as follows:

	> bp_seqfeature_load.PLS          FBgn0017545_matd.gff

It is at this point that I get the above error.

However, I don't get any error and the data loads fine if I load the two
files together, as follows:

	> bp_seqfeature_load.PLS --create <(cat FBgn0017545.gff FBgn0017545_matd.gff)

So, I suspect that either I am misunderstanding when/how to use
bp_seqfeature_load.PLS or else this use case has not yet arisen and must
be provided for somehow.

I am running against bioperl-live

Thanks for your thoughts and assistance,

Malcolm Cook
Database Applications Manager - Bioinformatics
Stowers Institute for Medical Research - Kansas City, Missouri
 

From Kevin.M.Brown at asu.edu  Tue Dec 19 16:46:19 2006
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Tue, 19 Dec 2006 14:46:19 -0700
Subject: [Bioperl-l] Bio::SimpleAlign
Message-ID: <1A4207F8295607498283FE9E93B775B40270F4E9@EX02.asurite.ad.asu.edu>

I'm working on a script that plays around with alignments of sequences
and one of the things I noticed is that the code for the match method
does not seem to actually use the start/end information when creating
the match between objects in the alignment.  Maybe I'm misunderstanding
what the alignment is supposed to hold in terms of sequence.  The
alignment objects I build up are based on the sequence of a gene and the
sequences of the primers that amplify that gene.

$alignments{$gene->id()}->add_seq(
				new Bio::LocatableSeq(
				-seq   => $seq[0]->seq(),
				-id    => $seq[0]->id(),
				-start => $start,
				-end => $start + $seq[0]->length() - 1,
				-strand => 1
			 )
);
$alignments{$gene->id()}->add_seq(
				new Bio::LocatableSeq(
				-seq   => $seq[1]->seq(),
				-id    => $seq[1]->id(),
				-start => $stop,
				-end => $stop + $seq[1]->length() - 1,
				-strand => -1
				)
);

So, you can see I input a start and stop point for the primer, but when
I use the match function all it does is match the first character of the
gene sequence to the first char of the primer sequences, then the second
gene char to the second in each primer, etc...  This doesn't seem to fit
with the documentation and seems odd that there would be holders for the
start/stop points and not use them when doing things like matching of
sequences in an alignment.


From bix at sendu.me.uk  Tue Dec 19 17:01:22 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 19 Dec 2006 22:01:22 +0000
Subject: [Bioperl-l] problems installing bioperl
In-Reply-To: <200612191853.kBJIrlW3026344@rm-rstar.sfu.ca>
References: <200612191853.kBJIrlW3026344@rm-rstar.sfu.ca>
Message-ID: <45886132.7050505@sendu.me.uk>

Rutger Vos wrote:
> Aren't 1.5.2_100 and 1.0050021 supposed to be equivalent in this weird
> version-string-translation way that makes 5.5 and 5.005 equivalent also?

Yes, 1.5.2_100 and 1.0050021 are equivalent. However, the equivalent of
5.5 is 5.500, not 5.005.
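The arithmetic behind this equivalence can be sketched in a few lines of plain Perl. `dotted_to_decimal` is a hypothetical helper, not part of BioPerl or of Perl's `version` module. It keeps the first component and pads every later component to three digits, including the developer-release number after the underscore. That reading only applies when a string is treated as a dotted version (the way v5.5 becomes 5.005). A plain decimal such as 5.5 is simply 5.500.

```perl
use strict;
use warnings;

# Hypothetical helper: convert a dotted version such as "1.5.2_100" to its
# decimal form by keeping the first component and padding each later
# component (the underscore part included) to three digits.
sub dotted_to_decimal {
    my ($dotted) = @_;
    my ($major, @rest) = split /[._]/, $dotted;
    return $major . '.' . join('', map { sprintf '%03d', $_ } @rest);
}

print dotted_to_decimal('1.5.2_100'), "\n";   # prints 1.005002100, i.e. 1.0050021
print dotted_to_decimal('5.5'), "\n";         # prints 5.005, the dotted (v5.5) reading
print 5.5 == 5.500 ? "5.5 == 5.500\n" : "differ\n";   # plain decimal 5.5 is 5.500
```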

From lstein at cshl.edu  Tue Dec 19 16:58:24 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Tue, 19 Dec 2006 16:58:24 -0500
Subject: [Bioperl-l] bp_seqfeature_load /
	Bio::DB::SeqFeature::Store::GFF3Loader problems augmenting
	Flybase annotation
In-Reply-To: <CED81D34E37D5043A1211565277A51E506E06492@exchkc02.stowers-institute.org>
References: <CED81D34E37D5043A1211565277A51E506E06492@exchkc02.stowers-institute.org>
Message-ID: <6dce9a0b0612191358t4764bfe0g601cd22d09132e55@mail.gmail.com>

Hi Malcolm,

Your second guess was right. The use case of augmenting an existing gene
with additional splice forms isn't provided for. You can get the
functionality by making direct calls to Bio::DB::SeqFeature::Store methods:

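use strict;
use warnings;
use Bio::DB::SeqFeature;
use Bio::DB::SeqFeature::Store;

# Open the database for writing (adaptor and DSN are example values;
# use the ones you gave bp_seqfeature_load.PLS)
my $db = Bio::DB::SeqFeature::Store->new(-adaptor => 'DBI::mysql',
                                         -dsn     => 'dbi:mysql:dmel',
                                         -write   => 1);

# Fetch the previously loaded gene
my @genes = $db->get_features_by_name('FBgn0017545');
@genes == 1 or die "Didn't get exactly one gene";
my $parent = $genes[0];
my $chr    = $parent->seq_id;
my $start  = $parent->start;
my $end    = $parent->end;
my $strand = $parent->strand;

# Create a new splice form (mRNA) on the gene's chromosome and attach it
my $new_splice_form = $db->new_feature(-primary_tag => 'mRNA',
                                       -source      => 'added',
                                       -seq_id      => $chr,
                                       -strand      => $strand,
                                       -start       => $start+10,
                                       -end         => $end,
                                      );
$parent->add_SeqFeature($new_splice_form);

# Add two example exons to the new splice form
for my $pos ([$start+10,$start+100],[$start+200,$end]) {
  my ($e_start,$e_end) = @$pos;
  my $exon = Bio::DB::SeqFeature->new(-primary_tag => 'exon',
                                      -store       => $db,
                                      -seq_id      => $chr,
                                      -strand      => $strand,
                                      -start       => $e_start,
                                      -end         => $e_end);
  $new_splice_form->add_SeqFeature($exon);
}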

I found a bug in updating the seqfeature database when I wrote this script,
so you'll need to get the latest bioperl-live. I think you can use this to
write a splice form updating script.

In order to support the idea of adding new splice forms to an existing gene
using the GFF3 format, I will have to either modify the loader, or write a
separate script (probably better to do the latter). It shouldn't be hard if
you'd like to give it a try.

Lincoln

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu

From rvosa at sfu.ca  Tue Dec 19 23:23:20 2006
From: rvosa at sfu.ca (Rutger Vos)
Date: Tue, 19 Dec 2006 20:23:20 -0800
Subject: [Bioperl-l] suggestions for suitable 'taxon' object
Message-ID: <200612200423.kBK4NKDt009254@rm-rstar.sfu.ca>

An embedded and charset-unspecified text was scrubbed...
Name: not available
Url: http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061219/17ec7ff3/attachment.pl 

From cjfields at uiuc.edu  Wed Dec 20 01:16:47 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 20 Dec 2006 00:16:47 -0600
Subject: [Bioperl-l] suggestions for suitable 'taxon' object
In-Reply-To: <200612200423.kBK4NKDt009254@rm-rstar.sfu.ca>
References: <200612200423.kBK4NKDt009254@rm-rstar.sfu.ca>
Message-ID: <4185E59B-C0DA-49B8-8D71-11183A091FBF@uiuc.edu>


On Dec 19, 2006, at 10:23 PM, Rutger Vos wrote:

> Hi all,
>
> I am looking for a bioperl object that can be abused to function as a
> suitable 'taxon' object, where I mean 'taxon' as understood by the NEXUS
> file format (i.e. not strictly an entity from a taxonomy, but more loosely
> an OTU).
>
> The object would primarily function as a way to relate nodes in trees to
> sequences in an alignment (a foreign key that both nodes and sequences
> refer to), and secondarily as the keeper of the canonical name of the OTU,
> such that a sequence named 'Homo_sapiens|EF177447.1/12-56' and a node
> named 'Homo sapiens (constrained monophyly)' can still be understood to
> refer to the same thing - the OTU 'Homo sapiens sapiens' (for example).

Alignment (SimpleAlign) objects contain Bio::LocatableSeq sequence  
objects; at the moment LocatableSeqs don't store their own annotation  
but they could easily be made or subclassed to be AnnotatableI (i.e.  
they can store annotation collections).  I recently made SimpleAlign  
Annotatable; Jason has also made SimpleAlign implement  
FeatureHolderI, so alignments can store SeqFeatures as well; he may  
have his own designs here.

There may be a wide variety of ways to go about this.  I would  
probably do the following (bear in mind I'm a microbiologist, not a  
computer scientist).  If one could add trees as annotation to the  
alignment (i.e. if trees could be Annotation objects and kept in the  
SimpleAlign's annotation collection), and each sequence in the  
alignment contained reference to a node object of the tree (i.e. if  
Bio::Taxon/Bio::Species objects could also be Annotation objects, but  
kept in a LocatableSeq annotation collection), both could refer to  
the same node object.  This may not be exactly what you want, but  
maybe it's close?

SimpleAlign->AnnoColln->Tree->OTU(Nodes)
    \----->LocSeqs-->AnnoColln-----/

I suppose this could also be done with Seqfeatures...

> I was thinking that a (possibly expanded) Bio::Species might work if there
> was some sensible way of appending references to node and sequence objects
> to it (or otherwise associate them with each other), but I am writing *to
> solicit any and all suggestions*. I am looking for something similar to
> Bio::Phylo::Taxa::Taxon.
>
> Any and all comments and suggestions greatly appreciated!
>
> Best wishes,
>
> Rutger Vos

Sendu would be the best one to speak about Bio::Taxon and  
Bio::Species and may have some ideas on the above.  The current plan  
was to deprecate Bio::Species, but who knows?

chris

From heikki at sanbi.ac.za  Wed Dec 20 05:25:08 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Wed, 20 Dec 2006 12:25:08 +0200
Subject: [Bioperl-l] Bio::SimpleAlign
In-Reply-To: <1A4207F8295607498283FE9E93B775B40270F4E9@EX02.asurite.ad.asu.edu>
References: <1A4207F8295607498283FE9E93B775B40270F4E9@EX02.asurite.ad.asu.edu>
Message-ID: <200612201225.08862.heikki@sanbi.ac.za>

Kevin,

Sequences that are added to the alignment are supposed to be *aligned*. 
SimpleAlign does not do it for you. It seems to me that you are adding 
sequences like this:

nnnnnnnnnnnnnnnnnnnn  1 - 20, "a short gene" 
nnnnnn               21 - 26 "a short primer after the gene"

when you should be doing this

nnnnnnnnnnnnnnnnnnnn        1 - 20, "a short gene" 
--------------------nnnnnn 21 - 26 "a short primer after the gene"

Note that the default way of displaying names in SimpleAlign
is "name/start-end". The name is usually expected to refer to the sequence
from which this subsequence is derived. The display name does not change
if you add gaps.
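A minimal sketch of the padding step follows. It assumes you already know each subsequence's offset from the start of the gene: 0 for the gene itself, 20 for a primer that starts at position 21. `pad_to_alignment` is a hypothetical helper, not a SimpleAlign method. It also fills the shorter rows with trailing gaps so every row has the same width.

```perl
use strict;
use warnings;

# Hypothetical helper: place an ungapped subsequence into an alignment row
# of $width columns, after $offset leading gaps, and fill the remaining
# columns with trailing gaps.
sub pad_to_alignment {
    my ($subseq, $offset, $width) = @_;
    my $row = ('-' x $offset) . $subseq;
    die "Subsequence does not fit in $width columns\n" if length($row) > $width;
    return $row . ('-' x ($width - length $row));
}

# The example above: a 20-residue gene followed by a 6-residue primer (21-26)
my $width  = 26;
my $gene   = pad_to_alignment('n' x 20, 0,  $width);
my $primer = pad_to_alignment('n' x 6,  20, $width);
print "$gene\n$primer\n";
```

Each padded string could then be used as the -seq of a Bio::LocatableSeq. The -start and -end values would still be given in ungapped residue coordinates (21 and 26 for the primer above), since, as we understand it, gap characters are not counted toward start/end.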


Yours,
	-Heikki


On Tuesday 19 December 2006 23:46, Kevin Brown wrote:
> I'm working on a script that plays around with alignments of sequences
> and one of the things I noticed is that the code for the match method
> does not seem to actually use the start/end information when creating
> the match between objects in the alignment.  Maybe I'm misunderstanding
> what the alignment is supposed to hold in terms of sequence.  The
> alignment objects I build up are based on the sequence of a gene and the
> sequences of the primers that amplify that gene.
>
> $alignments{$gene->id()}->add_seq(
> 				new Bio::LocatableSeq(
> 				-seq   => $seq[0]->seq(),
> 				-id    => $seq[0]->id(),
> 				-start => $start,
> 				-end => $start + $seq[0]->length() - 1,
> 				-strand => 1
> 			 )
> );

If your sequence does not contain gaps and the numbering starts from one, you 
can let the object handle start/stop:

my $locseq = Bio::LocatableSeq->new(
      -seq    => 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA',
      -id     => 'A00001',
      -strand => 1
);


> $alignments{$gene->id()}->add_seq(
> 				new Bio::LocatableSeq(
> 				-seq   => $seq[1]->seq(),
> 				-id    => $seq[1]->id(),
> 				-start => $stop,
> 				-end => $stop + $seq[1]->length() - 1,
> 				-strand => -1
> 				)
> );
>
> So, you can see I input a start and stop point for the primer, but when
> I use the match function all it does is match the first character of the
> gene sequence to the first char of the primer sequences, then the second
> gene char to the second in each primer, etc...  This doesn't seem to fit
> with the documentation and seems odd that there would be holders for the
> start/stop points and not use them when doing things like matching of
> sequences in an alignment.

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________

From ferraria at gmail.com  Wed Dec 20 06:04:16 2006
From: ferraria at gmail.com (Anthony Ferrari)
Date: Wed, 20 Dec 2006 12:04:16 +0100
Subject: [Bioperl-l] Problem with : EUtilities - Proxy
In-Reply-To: <6365ACFD-7F5A-4EF1-97EA-BB53A58B9B4D@uiuc.edu>
References: <b2ec54b90612190840r24fe1aa5ncb9c9def040aed49@mail.gmail.com>
	<6365ACFD-7F5A-4EF1-97EA-BB53A58B9B4D@uiuc.edu>
Message-ID: <b2ec54b90612200304r56e1ba5o87963494875c1c43@mail.gmail.com>

On 19/12/06, Chris Fields <cjfields at uiuc.edu> wrote:
>
> First things first: at the moment the BioPerl EUtilities interface is
> very experimental (as specifically outlined in the POD), so I can't
> really recommend it for production use until the API is cleaned up.
> However, I do appreciate any feedback or comments re:EUtilities (the
> reason it's out there in the 1.5.2 release).
>
> You might check out this bug report, which relates directly to your
> issue:
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2109
>
> After I worked out the proxy issue Torsten got it working.  Let me
> know if this doesn't help or fix the problem.
>
> chris
>


I carefully read this bug report, but it doesn't help because the fix has
already been applied to the GenericWebDBI.pm that is now distributed.
So my problem does not come from a deep recursion loop.

As Torsten did, I ran the command "BIOPERLDEBUG=1 perl -I. -w t/EUtilities.t"
to see what's really happening.
And actually, all tests are skipped because of the same error message
-> "Can't use an undefined value as an ARRAY reference at
/usr/lib/perl5/site_perl/5.8.7/LWP/UserAgent.pm line 779."

***
I tried the same command with the modified LWP::UserAgent package (meaning
I commented out line 779 and the corresponding '}') and all 453 tests
passed.
But not always: I ran the tests several times and they often failed,
always on a test called "eXXX->cookie->cookie() query key" (ending with
query key). In those cases, I got back an HTML message indicating that the
error was thrown by an internal server at NCBI. So I guess that sometimes it
is just an NCBI server fault (internal problem), and BioPerl is not involved.
But once more, I commented out a line in a basic package, so it is a bit
hazardous.
***

tony - a little bit lost.

From smane at vbi.vt.edu  Tue Dec 19 14:46:56 2006
From: smane at vbi.vt.edu (Shrinivasrao P. Mane)
Date: Tue, 19 Dec 2006 14:46:56 -0500
Subject: [Bioperl-l] Using Muscle parameter within bioperl
Message-ID: <A8ACF950-F40E-4E8C-927E-23D2391E5074@vbi.vt.edu>

Hi,
I need to run muscle using bioperl. This is how I do it on the command line.

muscle -in inv.fasta -out inv.aln -log inv.log -verbose -quiet

I used the following in my perl script

my $muscle = new Bio::Tools::Run::Alignment::Muscle(-format =>  
'clustalw',  -verbose=>'', -quiet=>'', -log=>'inv.log');

The program runs and produces the result file but it doesn't create a  
log file nor does it stop sending output to STDOUT (-quiet).
Could anybody help me with this?
Thanks
Mane

From cjfields at uiuc.edu  Wed Dec 20 09:09:56 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 20 Dec 2006 08:09:56 -0600
Subject: [Bioperl-l] Problem with : EUtilities - Proxy
In-Reply-To: <b2ec54b90612200304r56e1ba5o87963494875c1c43@mail.gmail.com>
References: <b2ec54b90612190840r24fe1aa5ncb9c9def040aed49@mail.gmail.com>
	<6365ACFD-7F5A-4EF1-97EA-BB53A58B9B4D@uiuc.edu>
	<b2ec54b90612200304r56e1ba5o87963494875c1c43@mail.gmail.com>
Message-ID: <13761416-E03F-46E7-BB43-E5FDA7F9C281@uiuc.edu>


On Dec 20, 2006, at 5:04 AM, Anthony Ferrari wrote:

> You might check out this bug report, which relates directly to your
> issue:
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2109
>
> After I worked out the proxy issue Torsten got it working.  Let me
> know if this doesn't help or fix the problem.
>
> chris
>
>
> I carefully read this bug but that doesn't help because this has  
> already been modified in the now given GenericWebDBI.pm
> So my problem does not come from a deep recursion loop.
>
> As Torsten did, I tried the command  " BIOPERLDEBUG=1 perl -I. -w
> t/EUtilities.t " to see what's really happening.
> And actually, all tests are skipped because of the same message error
> -> "Can't use an undefined value as an ARRAY reference at
> /usr/lib/perl5/site_perl/5.8.7/LWP/UserAgent.pm line 779."
>
> ***
> I tried the same command with the modified LWP::UserAgent package  
> (which means I comment the line 779 and the corresponding '}') and  
> all 453 tests passed.
> But not always. I made the tests several times and  it often  
> failed. And always on a test called "eXXX->cookie->cookie() query  
> key" (ending with query key). In those cases, I got back a html  
> message indicating that the error was thrown by the internal sever  
> of NCBI. So I guess that sometimes it is just NCBI server fault  
> (internal problem), and BioPerl is not implied..
> But once more, I comment a line from a basic package so it is a bit  
> hazardous.
> ***
>
> tony - a little bit lost.

I'm cc'ing Torsten as he has a bit more experience with proxies.

EUtilities is set up to check for a proxy in the environment and also to
accept an explicitly set proxy via $agent->proxy() (see the GenericWebDBI
POD). It would be easy to say this was a bug in LWP, but I think the
problem is that something is undefined (e.g. an environment variable, or
a username/password).

From the bug report, Torsten set his proxy variables using the
following:

--------------------------------------
"Note: I am behind an _authenticating_ proxy.
My $http_proxy and $HTTP_PROXY are both set to
http://USER:PASS at proxy.monash.edu.au:80/"
--------------------------------------

Note the lowercase for $http_proxy, which can make a difference.
After the recursion fix, I'm assuming he made no changes to the env.
settings, and according to the bug everything was fine (is that
correct, Torsten?).

Also LWP::UserAgent has this:

--------------------------------------
"Load proxy settings from *_proxy environment variables. You might  
specify proxies like this (sh-syntax):

       gopher_proxy=http://proxy.my.place/
       wais_proxy=http://proxy.my.place/
       no_proxy="localhost,my.domain"
       export gopher_proxy wais_proxy no_proxy

     csh or tcsh users should use the setenv command to define these  
environment variables.

On systems with case insensitive environment variables there exists a  
name clash between the CGI environment variables and the HTTP_PROXY  
environment variable normally picked up by env_proxy(). Because of  
this HTTP_PROXY is not honored for CGI scripts. The CGI_HTTP_PROXY  
environment variable can be used instead."
--------------------------------------
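When a shell variable such as $http_proxy seems to be ignored, one possibility worth ruling out is that it was set but never exported, so Perl never sees it. The check below runs from inside Perl and prints what the process actually inherited. It uses plain `%ENV` lookups, not LWP. `proxy_env_report` is a hypothetical helper written for this check.

```perl
use strict;
use warnings;

# Hypothetical helper: report the proxy-related environment variables that
# this Perl process can actually see. A variable set in the shell but not
# exported shows up as "(not set)".
sub proxy_env_report {
    my @names = qw(http_proxy HTTP_PROXY no_proxy NO_PROXY CGI_HTTP_PROXY);
    return map {
        sprintf '%-15s = %s', $_, defined $ENV{$_} ? $ENV{$_} : '(not set)'
    } @names;
}

print "$_\n" for proxy_env_report();
```

If http_proxy shows as "(not set)" here, export it (or use setenv in csh/tcsh, as the LWP documentation quoted above says) before running the script. env_proxy() can only pick up what is in the environment.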

chris

From bix at sendu.me.uk  Wed Dec 20 09:08:16 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 20 Dec 2006 14:08:16 +0000
Subject: [Bioperl-l] Using Muscle parameter within bioperl
In-Reply-To: <A8ACF950-F40E-4E8C-927E-23D2391E5074@vbi.vt.edu>
References: <A8ACF950-F40E-4E8C-927E-23D2391E5074@vbi.vt.edu>
Message-ID: <458943D0.10400@sendu.me.uk>

Shrinivasrao P. Mane wrote:
> Hi,
> I need to run muscle using bioperl. This is how I do it in command line.
> 
> muscle -in inv.fasta -out inv.aln -log inv.log -verbose -quiet
> 
> I used the following in perl script
> 
> my $muscle = new Bio::Tools::Run::Alignment::Muscle(-format =>  
> 'clustalw',  -verbose=>'', -quiet=>'', -log=>'inv.log');
> 
> The program runs and produces the result file but it doesn't create a  
> log file nor does it stop sending output to STDOUT (-quiet).
> Could anybody help me with this?

The Muscle arguments don't take dashed args. To make switches active you
need to set them to some true value. So (-verbose => 1, quiet => 1, log
=> 'inv.log'). Verbose may not do what you want, since it is both a
Bioperl option (-verbose, with the dash) and a Muscle option; if you want
the latter, try using verbose => 1 (no dash).
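The sketch below shows why `-quiet => ''` has no effect. It is a minimal, hypothetical model of how a wrapper could map `name => value` pairs onto a Muscle command line: switches are passed only when their value is true, and valued options are passed along with their value. This is not the actual code of Bio::Tools::Run::Alignment::Muscle. The `%is_switch` table and `muscle_args` are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical list of Muscle options that are bare on/off switches.
my %is_switch = map { $_ => 1 } qw(quiet verbose);

# Turn name => value pairs into Muscle command-line arguments.
# Switches are emitted only when their value is true, so '' or 0 leaves
# them off; other options are emitted together with their value.
sub muscle_args {
    my (%param) = @_;
    my @args;
    for my $name (sort keys %param) {
        my $value = $param{$name};
        if ($is_switch{$name}) {
            push @args, "-$name" if $value;
        }
        elsif (defined $value) {
            push @args, "-$name", $value;
        }
    }
    return @args;
}

# verbose => '' is false, so no -verbose switch is passed
print join(' ', muscle_args(quiet => 1, log => 'inv.log', verbose => '')), "\n";
# prints "-log inv.log -quiet"
```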

From bix at sendu.me.uk  Wed Dec 20 09:51:33 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 20 Dec 2006 14:51:33 +0000
Subject: [Bioperl-l] suggestions for suitable 'taxon' object
In-Reply-To: <4185E59B-C0DA-49B8-8D71-11183A091FBF@uiuc.edu>
References: <200612200423.kBK4NKDt009254@rm-rstar.sfu.ca>
	<4185E59B-C0DA-49B8-8D71-11183A091FBF@uiuc.edu>
Message-ID: <45894DF5.1060503@sendu.me.uk>

Chris Fields wrote:
> On Dec 19, 2006, at 10:23 PM, Rutger Vos wrote:
> 
>> Hi all,
>> 
>> I am looking for a bioperl object that can be abused to function as
>> a suitable 'taxon' object, where I mean 'taxon' as understood by
>> the NEXUS file format (i.e. not strictly an entity from a taxonomy,
>> but more loosely an OTU).
>> 
>> The object would primarily function as a way to relate nodes in 
>> trees to sequences in an alignment (a foreign key that both nodes
>> and sequences refer to), and secondarily as the keeper of the
>> canonical name of the OTU, such that a sequence named
>> 'Homo_sapiens|EF177447.1/12-56' and a node named 'Homo sapiens
>> (constrained monophyly)' can still be understood to refer to the 
>> same thing - the OTU 'Homo sapiens sapiens' (for example).

I haven't had time to give your suggestions consideration, but I can say 
that I'm having to do the same thing for a bioperl-run module and my 
work-around is simply to set a custom name on my Bio::Taxon objects. To 
explain, I have the benefit that my tree is made up of Bio::Taxon 
objects, so I call $taxon->name('seq_id', $seq->id). Then when I want to 
know which of my sequences corresponds to a particular taxon, I work out 
which of them has the id given by shift @{$taxon->name('seq_id')}.

Hardly ideal, but it works for now.


>> I was thinking that a (possibly expanded) Bio::Species might work
>>  if there was some sensible way of appending references to node and
>> sequence objects to it (or otherwise associate them with each
>> other), but I am writing *to solicit any and all suggestions*. I am
>> looking for something similar to Bio::Phylo::Taxa::Taxon.
>
> Sendu would be the best one to speak about Bio::Taxon and 
> Bio::Species and may have some ideas on the above.  The current plan
> was to deprecate Bio::Species, but who knows?

Given that we do plan to deprecate Bio::Species, I'd resist the 
temptation to expand on it. Use Bio::Taxon as a base if it has stuff you 
need, or base straight from Bio::Tree::Node if not.

From ferraria at gmail.com  Wed Dec 20 10:40:34 2006
From: ferraria at gmail.com (Anthony Ferrari)
Date: Wed, 20 Dec 2006 16:40:34 +0100
Subject: [Bioperl-l] Problem with : EUtilities - Proxy
In-Reply-To: <13761416-E03F-46E7-BB43-E5FDA7F9C281@uiuc.edu>
References: <b2ec54b90612190840r24fe1aa5ncb9c9def040aed49@mail.gmail.com>
	<6365ACFD-7F5A-4EF1-97EA-BB53A58B9B4D@uiuc.edu>
	<b2ec54b90612200304r56e1ba5o87963494875c1c43@mail.gmail.com>
	<13761416-E03F-46E7-BB43-E5FDA7F9C281@uiuc.edu>
Message-ID: <b2ec54b90612200740x49b3d9d8qa8c01569b63cbdc4@mail.gmail.com>

Defining a "no_proxy" environment variable in my '.bashrc' file solved my
problem. I set it to "localhost".

It indeed corresponds to the line "if (@{ $self->{'no_proxy'} }) ..."
(I guess!)


I really don't know why we are compelled to do this, but let's say that's
the way it is.

It works now !

Thanks a lot.

Tony


On 20/12/06, Chris Fields <cjfields at uiuc.edu> wrote:
>
>
> On Dec 20, 2006, at 5:04 AM, Anthony Ferrari wrote:
>
> > You might check out this bug report, which relates directly to your
> > issue:
> >
> > http://bugzilla.open-bio.org/show_bug.cgi?id=2109
> >
> > After I worked out the proxy issue Torsten got it working.  Let me
> > know if this doesn't help or fix the problem.
> >
> > chris
> >
> >
> > I carefully read this bug but that doesn't help because this has
> > already been modified in the now given GenericWebDBI.pm
> > So my problem does not come from a deep recursion loop.
> >
> > As Torsten did, I tried the command "BIOPERLDEBUG=1 perl -I. -w
> > t/EUtilities.t" to see what's really happening.
> > And actually, all tests are skipped because of the same error message
> > -> "Can't use an undefined value as an ARRAY reference at
> > /usr/lib/perl5/site_perl/5.8.7/LWP/UserAgent.pm line 779."
> >
> > ***
> > I tried the same command with a modified LWP::UserAgent package
> > (i.e. with line 779 and the corresponding '}' commented out) and
> > all 453 tests passed.
> > But not always: I ran the tests several times and they often
> > failed, always on the test called "eXXX->cookie->cookie() query
> > key". In those cases I got back an HTML message indicating that the
> > error was thrown by NCBI's internal server. So I guess that sometimes
> > it is just an NCBI server fault (internal problem), and BioPerl is not
> > involved.
> > But again, this means commenting out a line in a core package, so it
> > is a bit hazardous.
> > ***
> >
> > tony - a little bit lost.
>
> I'm cc'ing Torsten as he has a bit more experience with proxies.
>
> EUtilities is set up to check for an env. proxy and also take a set
> proxy with $agent->proxy() (see GenericWebDBI POD).  It would be easy
> to say this was a bug in LWP, but I think the problem is that
> something is undefined (i.e. an env. variable), or username/password.
>
> From the bug report, Torsten set his proxy variables using the
> following:
>
> --------------------------------------
> "Note: I am behind an _authenticating_ proxy.
> My $http_proxy and $HTTP_PROXY are both set to
> http://USER:PASS at proxy.monash.edu.au:80/"
> --------------------------------------
>
> Note the lowercase for $http_proxy, which can make a difference.
> After the recursion fix, I'm assuming he made no changes to the env.
> settings, and according to the bug everything was fine (is that
> correct Torsten?).
>
> Also LWP::UserAgent has this:
>
> --------------------------------------
> "Load proxy settings from *_proxy environment variables. You might
> specify proxies like this (sh-syntax):
>
>        gopher_proxy=http://proxy.my.place/
>        wais_proxy=http://proxy.my.place/
>        no_proxy="localhost,my.domain"
>        export gopher_proxy wais_proxy no_proxy
>
>      csh or tcsh users should use the setenv command to define these
> environment variables.
>
> On systems with case insensitive environment variables there exists a
> name clash between the CGI environment variables and the HTTP_PROXY
> environment variable normally picked up by env_proxy(). Because of
> this HTTP_PROXY is not honored for CGI scripts. The CGI_HTTP_PROXY
> environment variable can be used instead."
> --------------------------------------
>
> chris
>

From cjfields at uiuc.edu  Wed Dec 20 11:10:48 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 20 Dec 2006 10:10:48 -0600
Subject: [Bioperl-l] Problem with : EUtilities - Proxy
In-Reply-To: <b2ec54b90612200740x49b3d9d8qa8c01569b63cbdc4@mail.gmail.com>
Message-ID: <007901c72451$6ad540a0$15327e82@pyrimidine>

Just to clarify: does it work if you don't have any proxy env. settings?
 
chris



From ferraria at gmail.com  Wed Dec 20 11:59:49 2006
From: ferraria at gmail.com (Anthony Ferrari)
Date: Wed, 20 Dec 2006 17:59:49 +0100
Subject: [Bioperl-l] Problem with : EUtilities - Proxy
In-Reply-To: <007901c72451$6ad540a0$15327e82@pyrimidine>
References: <b2ec54b90612200740x49b3d9d8qa8c01569b63cbdc4@mail.gmail.com>
	<007901c72451$6ad540a0$15327e82@pyrimidine>
Message-ID: <b2ec54b90612200859w225df7qc35f1060f04eb452@mail.gmail.com>

First, I have a $http_proxy env. variable which was automatically defined by
the BioPerl installation (I don't define and export it in my .bash_profile).
So when I log in, $http_proxy=http://ip_address:port/

Next step: two solutions:
1) Define a $no_proxy env. variable in my .bash_profile.
It can be set to 'whatever'.

2) If I do not define $no_proxy, then to make it work I must call the
no_proxy() method on each Bio::DB::EUtilities object I create before I can
call the get_response() method on it.

(The bug is in the 'get_response' call)

And finally without 1) or 2) it doesn't work.

Tony

On 20/12/06, Chris Fields <cjfields at uiuc.edu> wrote:
>
>  Just to clarify: does it work if you don't have any proxy env. settings?
>
> One thing I didn't point out previously is that Bio::DB::GenericWebDBI
> inherits LWP::UserAgent.  You should be able to use $eutil->no_proxy()
> instead of setting it in your env.
> chris
>

From cjfields at uiuc.edu  Wed Dec 20 13:28:09 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 20 Dec 2006 12:28:09 -0600
Subject: [Bioperl-l] Problem with : EUtilities - Proxy
In-Reply-To: <b2ec54b90612200859w225df7qc35f1060f04eb452@mail.gmail.com>
Message-ID: <000301c72464$9a12a070$15327e82@pyrimidine>


> First, I got a $http_proxy env. variable automatically 
> defined by the BioPerl installation (I don't define and 
> export it in my .bash_profile).
> So when I'm logging in,             $http_proxy=http://ip_adress:port/

BioPerl can't permanently set any env. variables out of the box since that
would mean modifying your local .bash_profile or the system profile.  If
you're a user on a system where you're not the sysadmin, then it's more
likely the sysadmin has set up user accounts with an already-defined
$http_proxy variable in the system .bash_profile (which is passed on to all
users).  

The problem I can see (going by what you have above) is there is no
username/password defined, only the address (IP:Port).  I am assuming LWP is
expecting some form of authentication when a proxy is env. defined w/o
username/password included.  If so, you'll have to supply those yourself,
either by redefining $http_proxy to include it in your local .bash_profile,

export http_proxy='http://USER:PASS@proxy.monash.edu.au:80/'

by using $agent->proxy() for including all proxy information, or by using
$agent->authentication() so that a proxy can authorize any outgoing/incoming
requests.  The first may be preferable if you are able to do so since you
wouldn't have to authenticate every agent.

Note that this would also explain why you had an LWP problem with an
undefined array ref: the LWP agent is likely expecting some form of
authentication, probably in the form [username, password], if a proxy env.
variable is found.

> Next step :  two solutions :
> 1) defining an $no_proxy env.variable in my .bash_profile.
> It can be set to 'whatever'.
> 
> 2) If I do not define '$no_proxy'; to make it work, I must call the
> no_proxy() method on each Bio::DB::EUtilities object I create 
> before I can call the get_response() method on it.
> 
> (The bug is in the 'get_response' call)

If you mean when the request is calling proxy_authorization_basic(), that's
not a bug.  If we didn't authorize then it likely wouldn't work for properly
set up proxies (Torsten's worked).  Note that it's supposed to be passing a
username/password from $self->authentication().  

The fact that you can set $no_proxy to anything suggests there is no proxy
in place.  
 
> And finally without 1) or 2) it doesn't work.
> 
> Tony

We can't guarantee that defining no_proxy will always work on your system,
either.  It's possible a proxy was added systemwide but a firewall hasn't
been put in place yet; once it goes up and all requests need to be
authorized, then you'll run into problems again.  Conversely, maybe this was
defined at some point systemwide in the .bash_profile but wasn't removed.
The only one who would know is the sysadmin.

If you aren't the sysadmin, you can contact them to find out about how to
properly set up your proxy, or whether it is even necessary (maybe they
neglected to remove the proxy definition from the system .bash_profile).
Who knows?  

chris


From bix at sendu.me.uk  Wed Dec 20 16:03:03 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 20 Dec 2006 21:03:03 +0000
Subject: [Bioperl-l] Problem with : EUtilities - Proxy
In-Reply-To: <000301c72464$9a12a070$15327e82@pyrimidine>
References: <000301c72464$9a12a070$15327e82@pyrimidine>
Message-ID: <4589A507.60106@sendu.me.uk>

Chris Fields wrote:
>> First, I got a $http_proxy env. variable automatically 
>> defined by the BioPerl installation (I don't define and 
>> export it in my .bash_profile).
>> So when I'm logging in,             $http_proxy=http://ip_adress:port/
> 
> BioPerl can't permanently set any env. variables out of the box since

True, and it doesn't try to set one temporarily either.

To clarify some of the other points Chris made, the proxy variable 
certainly doesn't need username and password to be defined (from LWP's 
point of view), since not all proxies authenticate. Of course accesses 
won't work if authentication is actually required and these aren't set.

There's no reason that no_proxy should have to be set. It is used to say 
what domains shouldn't be proxied. Either this is a real LWP bug, or 
somehow EUtilities or one of its bases is doing something wrong. It 
should be investigated...
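
To make the point about no_proxy concrete: the list only names hosts that should bypass the proxy. The sketch below is a simplified, hypothetical matcher (bypasses_proxy is not an LWP function, and LWP's own matching may differ in detail). Under this reading, no_proxy=localhost (or 'whatever') does not exempt eutils.ncbi.nlm.nih.gov, so setting it should not change where EUtilities requests go, which fits the suspicion that the workaround hides a bug rather than fixing a configuration problem.

```perl
use strict;
use warnings;

# Simplified illustration, not LWP's actual code: decide whether a host
# is exempt from proxying under a comma-separated no_proxy list.
# A host is exempt if it equals an entry or lies inside that domain.
sub bypasses_proxy {
    my ($host, $no_proxy) = @_;
    return 0 unless defined $no_proxy;
    for my $entry (split /\s*,\s*/, $no_proxy) {
        (my $domain = $entry) =~ s/^\.//;   # treat ".my.domain" like "my.domain"
        next unless length $domain;
        return 1 if lc($host) eq lc($domain)
                 || $host =~ /\.\Q$domain\E\z/i;
    }
    return 0;
}

for my $host ('eutils.ncbi.nlm.nih.gov', 'localhost', 'www.my.domain') {
    my $route = bypasses_proxy($host, 'localhost,my.domain') ? 'direct' : 'via proxy';
    print "$host: $route\n";
}
```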

It would be very informative if Anthony could log in when he hasn't done 
anything to his environment variables (and so where the original problem 
manifests) and give us the results of:

perl -e 'while (($key, $val) = each %ENV) { print "$key => $val\n" }'
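
A narrower variant of the same check prints only the proxy-related variables, in both upper and lower case, since the case of a name like $http_proxy versus $HTTP_PROXY can make a difference. proxy_env_vars is a hypothetical helper name; the logic is plain core Perl.

```perl
use strict;
use warnings;

# Hypothetical helper: list only the *_proxy variables (either case)
# from an environment hash, sorted by name, as "NAME => value" strings.
sub proxy_env_vars {
    my ($env) = @_;
    my @names = grep { /_proxy\z/i } keys %$env;
    return map { "$_ => $env->{$_}" } sort @names;
}

print "$_\n" for proxy_env_vars(\%ENV);
```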


From avilella at gmail.com  Wed Dec 20 09:07:17 2006
From: avilella at gmail.com (Albert Vilella)
Date: Wed, 20 Dec 2006 14:07:17 +0000
Subject: [Bioperl-l] Using Muscle parameter within bioperl
In-Reply-To: <A8ACF950-F40E-4E8C-927E-23D2391E5074@vbi.vt.edu>
References: <A8ACF950-F40E-4E8C-927E-23D2391E5074@vbi.vt.edu>
Message-ID: <358f4d650612200607m4324b8f1r91d2d917cd4951bd@mail.gmail.com>

Try something like:

my @params =('verbose'=>0, 'quiet'=>1, 'log'=>'/tmp/inv.log');
my $factory = Bio::Tools::Run::Alignment::Muscle->new(@params);

it works for me with muscle 3.6. The log only gives me a start,
commandstring and end. I dunno if that is what muscle is supposed to
spit out.

    Albert.

On 12/19/06, Shrinivasrao P. Mane <smane at vbi.vt.edu> wrote:
> Hi,
> I need to run muscle using bioperl. This is how I do it in command line.
>
> muscle -in inv.fasta -out inv.aln -log inv.log -verbose -quiet
>
> I used the following in perl script
>
> my $muscle = new Bio::Tools::Run::Alignment::Muscle(-format =>
> 'clustalw',  -verbose=>'', -quiet=>'', -log='inv.log');
>
> The program runs and produces the result file but it doesn't create a
> log file nor does it stop sending output to STDOUT (-quiet).
> Could anybody help me with this?
> Thanks
> Mane
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From cjfields at uiuc.edu  Wed Dec 20 17:46:35 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 20 Dec 2006 16:46:35 -0600
Subject: [Bioperl-l] Problem with : EUtilities - Proxy
In-Reply-To: <4589A507.60106@sendu.me.uk>
Message-ID: <000c01c72488$b6a690b0$15327e82@pyrimidine>


> Chris Fields wrote:
> >> First, I got a $http_proxy env. variable automatically defined by the
> >> BioPerl installation (I don't define and export it in my .bash_profile).
> >> So when I'm logging in, $http_proxy=http://ip_adress:port/
> > 
> > BioPerl can't permanently set any env. variables out of the box since
> 
> True, and it doesn't try to set one temporarily either.
> 
> To clarify some of the other points Chris made, the proxy 
> variable certainly doesn't need username and password to be 
> defined (from LWPs point of view), since not all proxies 
> authenticate. Of course accesses won't work if authentication 
> is actually required and these aren't set.
>
> There's no reason that no_proxy should have to be set. It is 
> used to say what domains shouldn't be proxied. Either this is 
> a real LWP bug, or somehow EUtilities or one of its bases is 
> doing something wrong. It should be investigated...

Actually, after some investigation I reproduced the error and committed a fix.

If I set (on WinXP) HTTP_PROXY to a dummy value I get the same error:

Can't use an undefined value as an ARRAY reference at
C:/Perl/lib/LWP/UserAgent.pm line 787.

It's EUtilities-specific as other WebAgents that have proxy settings do not
have the same problem, though I haven't checked any WebAgent-based classes.
I think this may also partly be an LWP bug as setting env_proxy to
TRUE/FALSE doesn't seem to have an effect, but instantiating with it
(env_proxy => 1) in the constructor fixes the problem.  Anthony, I have
committed a fix for GenericWebDBI and EUtilities to CVS.  Could you try it
out?

-chris


From cjfields at uiuc.edu  Wed Dec 20 18:19:59 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 20 Dec 2006 17:19:59 -0600
Subject: [Bioperl-l] Problem with : EUtilities - Proxy
In-Reply-To: <000301c72464$9a12a070$15327e82@pyrimidine>
Message-ID: <000001c7248d$5e7df450$15327e82@pyrimidine>

> > First, I got a $http_proxy env. variable automatically defined by the
> > BioPerl installation (I don't define and export it in my .bash_profile).
> > So when I'm logging in, $http_proxy=http://ip_adress:port/

Anthony,

Sorry about the prior long-winded response.  I managed to reproduce the
error about five minutes after I responded and traced the problem
back to GenericWebDBI.  The issue seems to be with the LWP::UserAgent
env_proxy method not setting correctly post-instantiation; setting to 0 or 1
doesn't seem to do anything.  If I add it to the list of args for chained
instantiation in the constructor:

    my $self = $class->SUPER::new(@args, env_proxy => 1);

it suddenly works like a charm.  Hard to know why it's being fussy...
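
The error message itself can be reproduced in plain Perl. The LWP::UserAgent line identified earlier in the thread, "if (@{ $self->{'no_proxy'} })", dereferences the no_proxy slot as an array. If a subclass constructor leaves that slot undefined (presumably LWP::UserAgent->new would normally initialise it), the dereference dies under strict refs, and anything that fills the slot, such as a no_proxy environment variable or a no_proxy() call, hides the problem. That this is exactly what happened inside GenericWebDBI is an assumption consistent with the constructor fix above, not something confirmed in the thread; the hash below is a hypothetical stand-in for the agent object.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a user-agent object whose constructor set a
# proxy but never initialised the 'no_proxy' slot.
my $agent = { proxy => { http => 'http://ip_address:port/' } };

# Same kind of test as "if (@{ $self->{'no_proxy'} })": an undefined
# slot cannot be dereferenced as an array under 'use strict'.
my $ok = eval { my $count = @{ $agent->{no_proxy} }; 1 };
print 'uninitialised slot: ', ($ok ? "ok\n" : "died: $@");

# Once the slot holds an array ref, even an empty one, the same test passes.
$agent->{no_proxy} = [];
$ok = eval { my $count = @{ $agent->{no_proxy} }; 1 };
print 'initialised slot:   ', ($ok ? "ok\n" : "died: $@");
```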

I'm going to try reproducing this on a few platforms and check to see if it
has been reported as an LWP bug.  I have also committed a fix to CVS if you
want to test it out.

Chris


From jnewcomer at jhu.edu  Wed Dec 20 20:56:10 2006
From: jnewcomer at jhu.edu (Joe Newcomer)
Date: Wed, 20 Dec 2006 20:56:10 -0500
Subject: [Bioperl-l]  a stupid question
Message-ID: <002101c724a3$2ff80100$bd59dc80@aap.jhu.edu>

Hello Paul Leo,
I am with Johns Hopkins University Advanced Academic Programs.  I am trying
to contact a student named Paul Leo who has registered for Protein
Bioinformatics.  If this is you please email me.  I would like to send you
information about the spring course.

Respectfully, 
Joe Newcomer  (410) 516-5047
Online Education


From anhthu.tieu at gsf.de  Thu Dec 21 05:10:47 2006
From: anhthu.tieu at gsf.de (Anh-Thu Tieu)
Date: Thu, 21 Dec 2006 11:10:47 +0100
Subject: [Bioperl-l] imagemaps with heterogeneous_segments
Message-ID: <458A5DA7.1010802@gsf.de>

Hi,

I use bioperl 1.5.2 and have been wondering whether it is possible to 
apply the image_and_map function with the glyph option 
"heterogeneous_segments". So far I can successfully create an 
underlying imagemap for the entire track. However, what I want is to 
create an imagemap for each single segment on my track/glyph. Does 
anyone know how to realise this? Any help is appreciated.

Thanks a lot.

Anh Thu

From anhthu.tieu at gsf.de  Thu Dec 21 05:12:36 2006
From: anhthu.tieu at gsf.de (Anh-Thu Tieu)
Date: Thu, 21 Dec 2006 11:12:36 +0100
Subject: [Bioperl-l] imagemaps with heterogeneous_segments
Message-ID: <458A5E14.8060409@gsf.de>

Hi,

I use bioperl 1.5.2 and have been wondering whether it is possible to 
apply the image_and_map function with the glyph option 
"heterogeneous_segments". So far I can successfully create an 
underlying imagemap for the entire track. However, what I want is to 
create an imagemap for each single segment on my track/glyph. Does 
anyone know how to realise this? Any help is appreciated.

Thanks a lot.

Anh Thu

From somil.sharma1 at gmail.com  Thu Dec 21 01:22:24 2006
From: somil.sharma1 at gmail.com (Somil Sharma)
Date: Thu, 21 Dec 2006 14:22:24 +0800
Subject: [Bioperl-l] problem
Message-ID: <4e6b524e0612202222t569cba11h3c10c9c11e64185f@mail.gmail.com>

hello

i run this program

#!/use/bin/perl

use Bio::DB::GenBank;

$gb = new Bio::DB::GenBank;
$seq1 = $gb->get_Seq_by_id('MUSIGHBA1');
print $seq1;

and got this error on cmd line--

---------- EXCEPTION  -------------
MSG: WebDBSeqI Request Error:
500 Can't connect to eutils.ncbi.nlm.nih.gov:80 (connect: Unknown error)
Content-Type: text/plain
Client-Date: Thu, 21 Dec 2006 06:28:33 GMT
Client-Warning: Internal response

500 Can't connect to eutils.ncbi.nlm.nih.gov:80 (connect: Unknown error)

STACK Bio::DB::WebDBSeqI::_request C:/Perl/lib/Bio/DB/WebDBSeqI.pm:685
STACK Bio::DB::WebDBSeqI::get_seq_stream C:/Perl/lib/Bio/DB/WebDBSeqI.pm:491
STACK Bio::DB::WebDBSeqI::get_Stream_by_id C:/Perl/lib/Bio/DB/WebDBSeqI.pm:27
STACK Bio::DB::WebDBSeqI::get_Seq_by_id C:/Perl/lib/Bio/DB/WebDBSeqI.pm:145
STACK toplevel C:\Perl\a2.pl:5

plz see if u can help me out.

my ppm is also not able to install Bioperl so i did that also manually.

waiting for ur reply

From granjeau at tagc.univ-mrs.fr  Thu Dec 21 06:14:25 2006
From: granjeau at tagc.univ-mrs.fr (Samuel GRANJEAUD - IR/IFR137)
Date: Thu, 21 Dec 2006 12:14:25 +0100
Subject: [Bioperl-l] BioFetch: Adding databases
Message-ID: <458A6C91.7090000@tagc.univ-mrs.fr>

Hello!

I needed to query the Unisave database at EBI. To date, the only way 
to access it is the dbfetch web service 
(http://www.ebi.ac.uk/cgi-bin/dbfetch). This database is not yet defined 
in the BioFetch package 
(http://doc.bioperl.org/bioperl-live/Bio/DB/BioFetch.html). I wrote 
these few lines to make it work, but I don't think they follow good 
programming practice. Maybe it makes sense to define a method to add 
databases to FORMATMAP, so that BioFetch can keep up as the dbfetch 
service evolves.

Cheers,
--Samuel

use strict;
use warnings;
use Bio::DB::BioFetch;

# Register the Unisave database in BioFetch's package-level format map
$Bio::DB::BioFetch::FORMATMAP{unisave} = {
    default   => 'swiss',
    swissprot => 'swiss',
    fasta     => 'fasta',
    namespace => 'unisave',
};

my $bf  = Bio::DB::BioFetch->new(-db => 'unisave');
my $seq = $bf->get_Seq_by_id('LAM1_MOUSE');

print $seq->display_id(), "\n";
print $seq->seq(), "\n";
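
As for a method for adding databases: the thread does not mention an existing one, so the following is only a sketch of what such a helper might look like. register_biofetch_db is hypothetical and not part of Bio::DB::BioFetch; it works on any FORMATMAP-style hash reference (for real use, \%Bio::DB::BioFetch::FORMATMAP), refuses to overwrite an existing entry, requires a 'default' format, and fills in 'namespace' from the database name when it is not given. Those rules are design choices for the sketch, not documented BioFetch requirements.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::DB::BioFetch: add a dbfetch
# database to a FORMATMAP-style hash of { db => { format => ... } }.
sub register_biofetch_db {
    my ($formatmap, $db, %formats) = @_;
    die "database '$db' is already defined\n" if exists $formatmap->{$db};
    die "a 'default' format is required for '$db'\n"
        unless defined $formats{default};
    $formats{namespace} = $db unless defined $formats{namespace};
    $formatmap->{$db} = { %formats };    # store a copy
    return $formatmap->{$db};
}

# Demonstration on a stand-alone hash; with BioPerl loaded one would
# pass \%Bio::DB::BioFetch::FORMATMAP instead.
my %formatmap;
register_biofetch_db(\%formatmap, 'unisave',
    default   => 'swiss',
    swissprot => 'swiss',
    fasta     => 'fasta',
);
my $entry = $formatmap{unisave};
print join(', ', map { "$_ => $entry->{$_}" } sort keys %$entry), "\n";
```

Because 'unisave' has since been added to the real FORMATMAP, calling this helper for 'unisave' against an updated BioPerl would die by design.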


From cain at cshl.edu  Thu Dec 21 08:56:21 2006
From: cain at cshl.edu (Scott Cain)
Date: Thu, 21 Dec 2006 08:56:21 -0500
Subject: [Bioperl-l] problem
In-Reply-To: <4e6b524e0612202222t569cba11h3c10c9c11e64185f@mail.gmail.com>
References: <4e6b524e0612202222t569cba11h3c10c9c11e64185f@mail.gmail.com>
Message-ID: <1166709381.3739.47.camel@localhost.localdomain>

Hello,

It looks to me like you have a networking problem that doesn't have
anything to do with BioPerl.  When I run your script, I get:

Bio::Seq::RichSeq=HASH(0x97013e0)

Fairly quickly, too.  Can you get to http://eutils.ncbi.nlm.nih.gov/ in
a browser without proxy settings?

As an aside, you probably don't really want the HASH stuff above, so I
modified your script to look like this, with warnings and strict to make
future debugging easier:

#!/usr/bin/perl -w
use strict;

use Bio::DB::GenBank;

my $gb = Bio::DB::GenBank->new;
my $seq1 = $gb->get_Seq_by_id('MUSIGHBA1');
print $seq1->seq, "\n";


Scott


On Thu, 2006-12-21 at 14:22 +0800, Somil Sharma wrote:
> hello
> 
> *i  run this program*
> 
> *#!/use/bin/perl*
> 
> *use Bio::DB::GenBank;*
> 
> *$gb = new Bio::DB::GenBank;
> $seq1 = $gb->get_Seq_by_id('MUSIGHBA1');
> print $seq1;
> *
> 
> *and got this error on cmd line--*
> 
> ---------- *EXCEPTION  -------------
> MSG: WebDBSeqI Request Error:
> 500 Can't connect to eutils.ncbi.nlm.nih.gov:80 (connect: Unknown error)
> Content-Type: text/plain
> Client-Date: Thu, 21 Dec 2006 06:28:33 GMT
> Client-Warning: Internal response*
> 
> *500 Can't connect to eutils.ncbi.nlm.nih.gov:80 (connect: Unknown error)*
> 
> *STACK Bio::DB::WebDBSeqI::_request C:/Perl/lib/Bio/DB/WebDBSeqI.pm:685
> STACK Bio::DB::WebDBSeqI::get_seq_stream C:/Perl/lib/Bio/DB/WebDBSeqI.pm:491
> STACK Bio::DB::WebDBSeqI::get_Stream_by_id
> C:/Perl/lib/Bio/DB/WebDBSeqI.pm:27
> STACK Bio::DB::WebDBSeqI::get_Seq_by_id C:/Perl/lib/Bio/DB/WebDBSeqI.pm:145
> STACK toplevel C:\Perl\a2.pl:5*
> 
> plz see if u can help me out.
> 
> my ppm is also not able to install Bioperl so i did that also manually.
> 
> waiting for ur reply
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From cjfields at uiuc.edu  Thu Dec 21 09:28:07 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 21 Dec 2006 08:28:07 -0600
Subject: [Bioperl-l] BioFetch: Adding databases
In-Reply-To: <458A6C91.7090000@tagc.univ-mrs.fr>
References: <458A6C91.7090000@tagc.univ-mrs.fr>
Message-ID: <193C6D1C-6374-4A86-9FBD-7FA994D5FDDF@uiuc.edu>

I've added this to the BioFetch FORMATMAP as 'unisave' and committed  
to CVS.  Thanks!

chris

On Dec 21, 2006, at 5:14 AM, Samuel GRANJEAUD - IR/IFR137 wrote:

> Hello!
>
> I needed to query the Unisave database at EBI. To date, the only way
> to access it is the dbfetch web service
> (http://www.ebi.ac.uk/cgi-bin/dbfetch). This database is not yet defined
> in the BioFetch package
> (http://doc.bioperl.org/bioperl-live/Bio/DB/BioFetch.html). I wrote
> these few lines to make it work, but I don't think they follow good
> programming practice. Maybe it makes sense to define a method to add
> databases to FORMATMAP, in order to keep up with changes to the dbfetch
> service.
>
> Cheers,
> --Samuel
>
> use Bio::DB::BioFetch;
> $Bio::DB::BioFetch::FORMATMAP{unisave} = {
>     default   => 'swiss',
>     swissprot => 'swiss',
>     fasta     => 'fasta',
>     namespace => 'unisave',
> };
> my $bf = Bio::DB::BioFetch->new(-db => 'unisave');
> my $seq = $bf->get_Seq_by_id('LAM1_MOUSE');
>
> print $seq->display_id(), "\n";
> print $seq->seq(), "\n";
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign
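Samuel's suggestion of a method for adding databases to FORMATMAP could look roughly like the sketch below. The helper name `register_dbfetch_db` is hypothetical (nothing in this thread says BioPerl provides it), and the keys it fills in (`default`, `namespace`) simply mirror the ones used in Samuel's snippet. It operates on whatever hash reference you pass, so with BioPerl installed you would pass `\%Bio::DB::BioFetch::FORMATMAP`; the demonstration uses a local hash so it runs without BioPerl.

```perl
use strict;
use warnings;

# Hypothetical helper: register a dbfetch database in a FORMATMAP-style hash
# (db name => { format name => parser format, ..., namespace => db name }).
# Refuses to overwrite an existing entry and requires a 'default' format.
sub register_dbfetch_db {
    my ($formatmap, $db, %formats) = @_;
    die "Database '$db' is already defined\n" if exists $formatmap->{$db};
    die "A 'default' format is required for '$db'\n"
        unless exists $formats{default};
    # Fill in the namespace the same way Samuel's snippet does, if omitted.
    $formats{namespace} = $db unless exists $formats{namespace};
    $formatmap->{$db} = {%formats};    # store a copy, not the caller's hash
    return $formatmap->{$db};
}

# Stand-alone demonstration on a local hash (no BioPerl needed).
my %formatmap;
register_dbfetch_db(\%formatmap, 'unisave',
    default   => 'swiss',
    swissprot => 'swiss',
    fasta     => 'fasta',
);
print join(', ', map { "$_ => $formatmap{unisave}{$_}" }
                 sort keys %{ $formatmap{unisave} }), "\n";
```

Refusing to overwrite is a design choice: it keeps a local registration from silently replacing an entry that a later BioPerl release adds itself.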


From anhthu.tieu at gsf.de  Thu Dec 21 09:31:45 2006
From: anhthu.tieu at gsf.de (Anh-Thu Tieu)
Date: Thu, 21 Dec 2006 15:31:45 +0100
Subject: [Bioperl-l] multiple glyph elements in one track
Message-ID: <458A9AD1.50907@gsf.de>

Hello,

I use BioPerl 1.5.2. Does anyone know how I could create two separate
glyph elements on the same track with the Bio::Graphics::Panel module?
My aim is to have two different clickable image-map elements on the same
track. So far I can only create two glyph elements (with the transcript2
or generic glyph) per track with a single image-map element: the same
image-map element is used for the entire track, because the coordinates
of the whole glyph (both elements) are returned to the image_and_map
function as one set of coordinates.

Thank you for your help.

Best regards,

Anh Thu

From cain at cshl.edu  Thu Dec 21 09:47:32 2006
From: cain at cshl.edu (Scott Cain)
Date: Thu, 21 Dec 2006 09:47:32 -0500
Subject: [Bioperl-l] multiple glyph elements in one track
In-Reply-To: <458A9AD1.50907@gsf.de>
References: <458A9AD1.50907@gsf.de>
Message-ID: <1166712453.3739.53.camel@localhost.localdomain>

Hello Anh Thu,

You can provide a callback for the -glyph argument that returns a
different glyph for each feature, depending on how you've coded the
callback.  This example is from the perldoc for Bio::Graphics::Panel:

        $panel->add_track(\@exons,
                          -glyph => sub { my $feature = shift;
                                          $feature->source_tag eq 'curated'
                                                    ? 'ellipse' : 'generic'; }
                         );
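To see what such a callback does on its own, here is a small sketch that runs without BioPerl. `MockFeature` is a hypothetical stand-in for a real feature object and only implements the `source_tag()` accessor the callback inspects; the callback body is the one from the perldoc example above.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a feature object: it only provides the
# source_tag() accessor that the glyph callback looks at.
{
    package MockFeature;
    sub new        { my ($class, %args) = @_; return bless {%args}, $class }
    sub source_tag { return $_[0]{source_tag} }
}

# Same shape as the perldoc callback: it receives the feature as its first
# argument and returns the name of the glyph to draw it with.
my $glyph_for = sub {
    my $feature = shift;
    return $feature->source_tag eq 'curated' ? 'ellipse' : 'generic';
};

for my $tag (qw(curated predicted)) {
    my $f = MockFeature->new(source_tag => $tag);
    printf "%-9s -> %s\n", $tag, $glyph_for->($f);
}
```

Passing `$glyph_for` as the `-glyph` value of `add_track` is the pattern the perldoc shows; the mock class only replaces the feature objects you would normally pass in.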

Scott

 
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From cain.cshl at gmail.com  Thu Dec 21 15:03:48 2006
From: cain.cshl at gmail.com (Scott Cain)
Date: Thu, 21 Dec 2006 15:03:48 -0500
Subject: [Bioperl-l] problems installing bioperl
In-Reply-To: <1166729231.458ae00ff184b@www.studentmail.otago.ac.nz>
References: <1166519755.4587adcb141d3@www.studentmail.otago.ac.nz>
	<45880167.9010605@sendu.me.uk>
	<1166542310.6981.119.camel@localhost.localdomain>
	<1166604008.4588f6e87cccc@www.studentmail.otago.ac.nz>
	<1166621113.3739.11.camel@localhost.localdomain>
	<1166642653.45898dddbd8cf@www.studentmail.otago.ac.nz>
	<1166643051.3739.28.camel@localhost.localdomain>
	<1166729231.458ae00ff184b@www.studentmail.otago.ac.nz>
Message-ID: <1166731428.3739.71.camel@localhost.localdomain>

Hi Stephan,

About your bioperl mail: did you cancel it, or did it just disappear?
If the latter, I might have accidentally deleted it, sorry :-/

So 'GBrowse is running' means that you can see the sample yeast chr1
database, browse around, etc., right?  I still don't know what is up with
the warning, but my guess is that everything is working there.

As for your question about writing a callback, the reason it's not
working is that the attributes method returns a list (typically, but not
always, with only one element).  The '>' comparison puts that call in
scalar context, so what your test really checks is "number of elements
in the list > 1200", which is almost always going to be false.  You
should change it to this:

  my $feature = shift;
  my ($score) = $feature->attributes('score');
  if ($score > 1200) {
  ...etc...
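The context problem can be reproduced in plain Perl. `mock_attributes` below is a hypothetical stand-in for the real `attributes()` method; it assumes, as described above, that the method hands back a list, and it ends with `return @values`, which in scalar context yields the element count. (A method that ended with a literal list such as `return (1500)` would behave differently in scalar context, so the exact result depends on how the real method is written.)

```perl
use strict;
use warnings;

# Hypothetical stand-in for $feature->attributes('score'): returns a list of
# values for a tag. Because it ends with "return @values", calling it in
# scalar context yields the number of elements, not the first value.
my %attribute_store = (score => [1500]);

sub mock_attributes {
    my ($tag) = @_;
    my @values = @{ $attribute_store{$tag} || [] };
    return @values;
}

# Buggy: '>' imposes scalar context, so this compares 1 (the count) with 1200.
my $buggy = mock_attributes('score') > 1200 ? 'blue' : 'pink';

# Fixed: the list assignment copies the first value into $score.
my ($score) = mock_attributes('score');
my $fixed = $score > 1200 ? 'blue' : 'pink';

print "buggy: $buggy\nfixed: $fixed\n";
```

The parentheses in `my ($score) = ...` are what make the difference: they turn the assignment into a list assignment, so the call is made in list context.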

Finally, if you really want to test that you are using the correct
BioPerl, you can put this simple CGI script in your cgi-bin directory as
test_biographics.pl, make it world-executable, and go to
http://localhost/cgi-bin/test_biographics.pl (and, yes, I use strict and
warnings even when the script is only 10 lines long :-)  :

#!/usr/bin/perl
use strict;
use warnings;
use Bio::Graphics::Panel;
use CGI qw/:standard/;

print header(),
      start_html,
      p("Bio::Graphics::Panel api_version is ".Bio::Graphics::Panel->api_version),
      p("It should be 1.654 for BioPerl 1.5.2"),
      end_html;
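A complementary check from the command line is to ask Perl which file it actually loads for a module. This is useful when, as discussed further down this thread, more than one BioPerl installation is on the path and the wrong one is found first. The script below is a sketch that uses only core Perl. Its name (`which_module.pl`) is hypothetical, and it reports `api_version` only when the module has such a method, which is the value the GBrowse installer checks on Bio::Graphics::Panel.

```perl
use strict;
use warnings;

# Report which file Perl loads for a module, its $VERSION, and, if the
# module has one, its api_version(). Perl searches @INC in order, so when
# several copies are installed the first match is the one reported.
sub which_module {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # Bio::Graphics::Panel -> Bio/Graphics/Panel.pm
    require $file;                            # dies if the module cannot be found
    my $version = do { no strict 'refs'; ${"${module}::VERSION"} };
    my $api     = $module->can('api_version') ? $module->api_version : undef;
    return ($INC{$file}, $version, $api);
}

# Usage: perl which_module.pl Bio::Graphics::Panel Bio::Root::Version
for my $module (@ARGV) {
    my ($path, $version, $api) = which_module($module);
    printf "%s\n  loaded from: %s\n  \$VERSION:    %s\n  api_version: %s\n",
        $module, $path,
        defined $version ? $version : '(undefined)',
        defined $api     ? $api     : '(none)';
}
```

Run it with the same `perl` binary that GBrowse or the web server uses; a different interpreter may search a different @INC.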

Scott


On Fri, 2006-12-22 at 08:27 +1300, Stephan Roessner wrote:
> Hi Scott,
> 
> I responded to the group, but it didn't get through,
> so I am replying to you directly.
> 
> I installed Class-Base-0.03 using CPAN.
> 
> Reinstalling GBrowse still gives me a warning like:
> Warning: prerequisite Bio::Perl 1.52 not found. We have 1.0050021.
> Writing Makefile for Bio::Graphics::Browser::CAlign
> Writing Makefile for Generic-Genome-Browser
> 
> GBrowse is running, but I cannot access attributes and/or the score column
> of .gff files. Is this related to the warning?
> 
> To get an attribute I use
> 
> my $feature = shift;
> if ($feature->attributes('score') > 1200) {
>   return 'blue';
> } else {
>   return 'pink';
> }
> 
> But I don't get any data back from $feature.
> 
> Can I somehow verify which BioPerl version GBrowse is using?
> 
> Stephan
> 
> 
> 
> Quoting Scott Cain <cain.cshl at gmail.com>:
> 
> > Stephan,
> >
> > Yes, it is in cpan:
> >
> > http://search.cpan.org/~abw/Class-Base-0.03/lib/Class/Base.pm
> >
> > The cpan shell should be able to install it.
> >
> > Whether or not that works, please respond to the mailing list so that
> > the rest of the conversation can be archived.
> >
> > Scott
> >
> >
> > On Thu, 2006-12-21 at 08:24 +1300, Stephan Roessner wrote:
> > > Hi Scott,
> > >
> > > No, I didn't.
> > > I had a look and couldn't find it.
> > > Is it not part of CPAN?
> > >
> > > Stephan
> > >
> > >
> > > Quoting Scott Cain <cain.cshl at gmail.com>:
> > >
> > > > Stephan,
> > > >
> > > > Did you install Class::Base?  It was inadvertently left out of the
> > > > install document, but it is required.
> > > >
> > > > Scott
> > > >
> > > >
> > > > On Wed, 2006-12-20 at 21:40 +1300, Stephan Roessner wrote:
> > > > > Hi all,
> > > > >
> > > > > I did sudo ./Build install --uninst 1 and got the error
> > > > > * ERROR: Configuration was initially created with Module::Build
> > > > > version '0.2805', but we are now using '0.2806'. ...
> > > > >
> > > > > So I ran perl Build.PL and got the message
> > > > > Creating new 'Build' script for 'bioperl' version '1.0050021'.
> > > > >
> > > > > I ran sudo ./Build install --uninst 1 again.
> > > > > It seems to be fine, with no error messages.
> > > > >
> > > > > When I run perl Makefile.PL for GBrowse 1.66-RC2 it results in
> > > > >
> > > > > Warning: prerequisite Bio::Perl 1.52 not found. We have 1.0050021.
> > > > > Warning: prerequisite Class::Base 0 not found.
> > > > > Writing Makefile for Bio::Graphics::Browser::CAlign
> > > > > Writing Makefile for Generic-Genome-Browser
> > > > >
> > > > > GBrowse is running, but I am really having trouble with aggregators
> > > > > when trying to use xyplot. It does not plot anything, so I thought
> > > > > BioPerl could be the problem.
> > > > >
> > > > > Stephan
> > > > >
> > > > >
> > > > >
> > > > > Quoting Scott Cain <cain at cshl.edu>:
> > > > >
> > > > > > I really don't think the BioPerl version detection is wrong.  I
> > > > > > actually don't check Bio::Root::Version::VERSION in Makefile.PL; I
> > > > > > check Bio::Graphics::Panel->api_version.  When it doesn't find the
> > > > > > correct api_version, it gives a warning that BioPerl 1.5.2 is not
> > > > > > installed.  I have seen this happen when more than one BioPerl
> > > > > > instance is installed and `perl Makefile.PL` finds the wrong one
> > > > > > first.  My suggestion is to try reinstalling BioPerl and providing
> > > > > > the --uninst 1 argument to remove older versions of BioPerl:
> > > > > >
> > > > > >   sudo ./Build install --uninst 1
> > > > > >
> > > > > > Scott
> > > > > >
> > > > > >
> > > > > > On Tue, 2006-12-19 at 15:12 +0000, Sendu Bala wrote:
> > > > > > > Stephan Roessner wrote:
> > > > > > > > Dear support team,
> > > > > > > >
> > > > > > > > I installed BioPerl 1.5.2_100 on a Fedora machine to be able
> > > > > > > > to use GBrowse.
> > > > > > > > The installation seems to work (except for the test failures),
> > > > > > > > but the GBrowse installation tells me that Bio::Perl 1.0050021
> > > > > > > > is installed, and of course it requires 1.52.
> > > > > > > >
> > > > > > > > Is there a way to find out what went wrong?
> > > > > > >
> > > > > > > Nothing went wrong with the BioPerl installation (well, except
> > > > > > > that there shouldn't have been any test failures - can you post
> > > > > > > those, please?). GBrowse simply defined its BioPerl requirement
> > > > > > > incorrectly. If you tell me exactly where you downloaded GBrowse
> > > > > > > from and how you went about installing it, and provide the exact,
> > > > > > > complete error message you got from it, I might be able to help
> > > > > > > the authors fix the problem.
> > > > > > >
> > > > > > > Or I'm pretty sure they can figure it out for themselves :)
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory


From rvosa at sfu.ca  Sat Dec 23 17:17:37 2006
From: rvosa at sfu.ca (Rutger Vos)
Date: Sat, 23 Dec 2006 14:17:37 -0800
Subject: [Bioperl-l] [Summary] Re: suggestions for suitable 'taxon' object
In-Reply-To: <200612200423.kBK4NKDt009254@rm-rstar.sfu.ca>
References: <200612200423.kBK4NKDt009254@rm-rstar.sfu.ca>
Message-ID: <458DAB01.6080200@sfu.ca>

The replies I've received so far indicate I should look into Bio::Taxon.
I will probably come back with further questions/discussions as to how
to link and cross-reference taxa, sequences and nodes, but for now I
should first look at the Bio::Taxon API (and unpack my other Christmas
gifts). Thank you for all the comments and suggestions.

Happy holidays!

Rutger


Rutger Vos wrote:
> Hi all,
>
> I am looking for a bioperl object that can be abused to function as a
> suitable 'taxon' object, where I mean 'taxon' as understood by the NEXUS
> file format (i.e. not strictly an entity from a taxonomy, but more loosely
> an OTU). 
>
> The object would primarily function as a way to relate nodes in trees to
> sequences in an alignment (a foreign key that both nodes and sequences refer
> to), and secondarily as the keeper of the canonical name of the OTU, such
> that a sequence named 'Homo_sapiens|EF177447.1/12-56' and a node named 'Homo
> sapiens (constrained monophyly)' can still be understood to refer to the
> same thing - the OTU 'Homo sapiens sapiens' (for example).
>
> I was thinking that a (possibly expanded) Bio::Species might work if there
> was some sensible way of appending references to node and sequence objects
> to it (or otherwise associate them with each other), but I am writing *to
> solicit any and all suggestions*. I am looking for something similar to
> Bio::Phylo::Taxa::Taxon.
>
> Any and all comments and suggestions greatly appreciated!
>
> Best wishes,
>
> Rutger Vos

-- 
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 Rutger A. Vos
 Postdoctoral research fellow
 University of British Columbia
 Personal site: http://www.sfu.ca/~rvosa
        CIPRES: http://www.phylo.org
    Bio::Phylo: http://search.cpan.org/~rvosa/Bio-Phylo/
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++


From paul.boutros at utoronto.ca  Sat Dec 23 22:36:59 2006
From: paul.boutros at utoronto.ca (Paul Boutros)
Date: Sat, 23 Dec 2006 22:36:59 -0500
Subject: [Bioperl-l] Bio::Graphics::Glyph::dna
Message-ID: <20061223223659.7sgfofa44mw4okks@webmail.utoronto.ca>

Hi,

I've been trying to get the dna glyph working and have had some
problems.  This is ActiveState Perl 5.8.8 (build 819) and BioPerl 1.5.2
on WinXP.  I'm starting with a FASTA file, so I've tried:
$panel->add_track(
	$seq,
	-glyph     =>   'dna',
	-do_gc     =>   'true',
	-gc_window =>   'auto'
	);

where $seq is a Bio::Seq object

and I've tried it using a GFF $segment:
my $db = Bio::DB::GFF->new(
          -adaptor=>    'berkeleydb',
          -create =>    1,
          -dsn    =>    'temp'
          );

$db->load_sequence_string(
           $seq->primary_id(),
           $seq->seq()
           );

my $segment = Bio::DB::GFF::Segment->new(
           $db,
           $seq->primary_id(),
           $seq->primary_id(),
           1,
           $seq->length()
           );

$panel->add_track(
	$segment,
	-glyph     =>   'dna',
	-do_gc     =>   'true',
	-gc_window =>   'auto'
	);


From paul.boutros at utoronto.ca  Sat Dec 23 22:46:27 2006
From: paul.boutros at utoronto.ca (Paul Boutros)
Date: Sat, 23 Dec 2006 22:46:27 -0500
Subject: [Bioperl-l] How to use Bio::Graphics::Glyph::dna?
Message-ID: <20061223224627.qezpabv9f74ocowk@webmail.utoronto.ca>

Hello,

I'm trying to get the dna glyph of Bio::Graphics to work and am having
some problems.  I'm starting with a FASTA file, and I am running Perl
5.8.8 (ActiveState build 819) on WinXP with BioPerl 1.5.2.

If I try simply using a Bio::Seq object like this:
$panel->add_track(
	$seq,
	-glyph     =>   'dna',
	-do_gc     =>   'true',
	-gc_window =>   'auto'
	);

I get the error:
Can't locate object method "start" via package "Bio::Seq" at  
C:/Perl/site/lib/Bio/Graphics/FeatureBase.pm line 164.


And if I try creating a Bio::DB::GFF::Segment object like this:
my $db = Bio::DB::GFF->new(
	-adaptor  => 'berkeleydb',
	-create   => 1,
	-dsn      => '/usr/local/share/gff/dmel'
	);

$db->initialize(1);

$db->load_sequence_string(
	$seq->primary_id(),
	$seq->seq()
	);

my $segment = Bio::DB::GFF::Segment->new(
	$db,
	$seq->primary_id(),
	$seq->primary_id(),
	1,
	$seq->length()
	);

$panel->add_track(
	$segment,
	-glyph     =>   'dna',
	-do_gc     =>   'true',
	-gc_window =>   'auto'
	);

I get the error:
------------- EXCEPTION: Bio::Root::NotImplemented -------------
MSG: Abstract method "Bio::FeatureHolderI::get_SeqFeatures" is not implemented by package Bio::DB::GFF::Segment.
This is not your fault - author of Bio::DB::GFF::Segment should be blamed!

STACK: Error::throw
STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359
STACK: Bio::Root::RootI::throw_not_implemented C:/Perl/site/lib/Bio/Root/RootI.pm:522
STACK: Bio::FeatureHolderI::get_SeqFeatures C:/Perl/site/lib/Bio/FeatureHolderI.pm:101
STACK: Bio::Graphics::Glyph::_subfeat C:/Perl/site/lib/Bio/Graphics/Glyph.pm:1186
STACK: Bio::Graphics::Glyph::subfeat C:/Perl/site/lib/Bio/Graphics/Glyph.pm:1167
STACK: Bio::Graphics::Glyph::new C:/Perl/site/lib/Bio/Graphics/Glyph.pm:56
STACK: Bio::Graphics::Glyph::Factory::make_glyph C:/Perl/site/lib/Bio/Graphics/Glyph/Factory.pm:316
STACK: Bio::Graphics::Glyph::new C:/Perl/site/lib/Bio/Graphics/Glyph.pm:81
STACK: Bio::Graphics::Glyph::Factory::make_glyph C:/Perl/site/lib/Bio/Graphics/Glyph/Factory.pm:316
STACK: Bio::Graphics::Panel::_add_track C:/Perl/site/lib/Bio/Graphics/Panel.pm:388
STACK: Bio::Graphics::Panel::_do_add_track C:/Perl/site/lib/Bio/Graphics/Panel.pm:360
STACK: Bio::Graphics::Panel::add_track C:/Perl/site/lib/Bio/Graphics/Panel.pm:288
STACK: create_figure.pl:147
----------------------------------------------------------------

I'm really unsure what to try next, any suggestions much appreciated!
Paul


From lstein at cshl.edu  Sun Dec 24 12:23:18 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Sun, 24 Dec 2006 12:23:18 -0500
Subject: [Bioperl-l] How to use Bio::Graphics::Glyph::dna?
In-Reply-To: <20061223224627.qezpabv9f74ocowk@webmail.utoronto.ca>
References: <20061223224627.qezpabv9f74ocowk@webmail.utoronto.ca>
Message-ID: <6dce9a0b0612240923v24ebafffs5c280d9cb4c65263@mail.gmail.com>

Hi,

You need to use either a Bio::SeqFeature::Generic object (with an attached
Bio::PrimarySeq) or a Bio::Graphics::Feature object. You are not intended to
create Bio::DB::GFF::Segment objects directly.

e.g.
my $dna = Bio::PrimarySeq->new(-seq=>'a'x1000);
my $feature = Bio::SeqFeature::Generic->new(-start=>1,-end=>800);
$feature->attach_seq($dna);

Best,

Lincoln


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu

From tgenahmet at gmail.com  Wed Dec 27 16:38:43 2006
From: tgenahmet at gmail.com (Ahmet Kurdoglu)
Date: Wed, 27 Dec 2006 14:38:43 -0700
Subject: [Bioperl-l] get mRNA details for a gene
Message-ID: <9d8d0e2a0612271338t7cb15a63v5a08f624888b3f7b@mail.gmail.com>

Hi,

This is my first message to the list; I hope I get it right. Here is what
I'm trying to accomplish:

Get the mRNA details for a given gene (e.g. DNASE2B) from its GenBank file.

Using the web interface I can search with this query:
DNASE2B [sym] AND homo sapiens [ORGN] (which returns only one result if you
search the 'gene' database)
and get the GenBank file by clicking on NC_000001.9, where I can see the
details for its two mRNAs. (I eventually need to get the exon locations for
both of its transcripts.)

However, doing this in Perl has proved very difficult for me.
I've tried various methods, including get_Seq_by_id, get_Seq_by_gi, and
get_Stream_by_query. Before I explain in detail what I did, I'd like to hear
your ideas on how to accomplish this.

Thank you.
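For the exon-location part of the goal, it helps to know what an mRNA feature's location string in a GenBank record encodes. The sketch below is a plain-Perl illustration, not BioPerl's own location parser: in BioPerl the feature's location object (a split location for multi-exon mRNAs) can list its pieces itself, so check the Bio::Location::Split documentation before relying on anything here. The function name `exons_from_location` is hypothetical, and the coordinates in the demonstration are made up.

```perl
use strict;
use warnings;

# Minimal sketch: turn a simple GenBank location string such as
# "join(100..200,300..450)" or "complement(join(...))" into [start, end]
# exon pairs. Only an outer complement() and a single join() are handled;
# fuzzy ends (<, >), remote references (ACC:1..10) and nested operators make
# it die rather than guess.
sub exons_from_location {
    my ($loc) = @_;
    my $strand = 1;
    if ($loc =~ /^complement\((.*)\)$/) {
        $strand = -1;
        $loc    = $1;
    }
    $loc =~ s/^join\((.*)\)$/$1/;
    my @exons;
    for my $part (split /,/, $loc) {
        my ($start, $end) = $part =~ /^(\d+)\.\.(\d+)$/
            or die "Unsupported location segment: $part\n";
        push @exons, [$start, $end];
    }
    # join() lists segments in ascending genomic order; on the minus strand
    # the first exon of the transcript is the last segment, so reverse.
    @exons = reverse @exons if $strand < 0;
    return ($strand, @exons);
}

# Made-up coordinates, for illustration only.
my ($strand, @exons) = exons_from_location('complement(join(100..200,300..450))');
for my $i (0 .. $#exons) {
    printf "strand %d, exon %d: %d..%d\n", $strand, $i + 1, @{ $exons[$i] };
}
```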

From sdavis2 at mail.nih.gov  Thu Dec 28 16:57:03 2006
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Thu, 28 Dec 2006 16:57:03 -0500
Subject: [Bioperl-l] [Bioperl-microarray] SOFT parsers
In-Reply-To: <FC173C9E9BE18F45A3241288B723D64C1F16D4@hsv-exmail03.operonads.local>
References: <FC173C9E9BE18F45A3241288B723D64C1F16D4@hsv-exmail03.operonads.local>
Message-ID: <45943DAF.70100@mail.nih.gov>

Michael Muratet US-Huntsville wrote:
> Sean
>
> Thanks. I did consider the bioconductor package and downloaded your
> write-up after it was recommended by the GEO folks. I've looked at R a
> few times, but I never got proficient at it. I'm a lot better with perl.
>
> I've been looking at MINiML, too. It looked like it might be easier to
> parse the SOFT file since the data is in-line with the attributes and
> I'd have to use a SAX parser (not enough memory for DOM) for MINiML.
>
> NCBI must have parsers to get the data into their databases. Do you know
> what they use?
>   
Michael,

You might want to look more closely at the MINiML format specs.
The data tables are stored as separate tab-delimited files with an
external reference in the XML, so DOM parsing is possible with just a
few kB of memory.  Of course, reading all of the data into memory at
once will take a large amount of memory for some datasets.  If you are
going to load into a database, I would suggest reading the MINiML using
DOM and then stepping through the data files one at a time, loading as
you go.
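The "one data file at a time, loading as you go" approach might look like the following plain-Perl sketch. It reads one tab-delimited table line by line and hands each row to a callback (for example, a database insert), so only one row is held in memory at a time. The function name `stream_table` is hypothetical, and so is the assumption about columns: the caller supplies the column names, which for MINiML would come from the column descriptions in the XML (not checked here).

```perl
use strict;
use warnings;

# Minimal sketch: step through one tab-delimited data table row by row,
# passing each row (as a hash keyed by the caller's column names) to a
# callback. Returns the number of rows processed.
sub stream_table {
    my ($path, $columns, $on_row) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!\n";
    my $count = 0;
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;            # strip LF or CRLF line endings
        next if $line eq '';             # skip blank lines
        my %row;
        @row{@$columns} = split /\t/, $line, -1;
        $on_row->(\%row);                # e.g. insert the row into a database
        $count++;
    }
    close $fh;
    return $count;
}
```

A call such as `stream_table('sample_table.txt', [qw(ID_REF VALUE)], sub { my ($row) = @_; ... })` (file name and column list hypothetical) processes the file without ever holding the whole table in memory.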

As for their parsers, I'm not sure what language they use, but writing a 
parser for either SOFT or MINiML isn't at all difficult.  GEO uses a 
very simplified MAGE schema. 

As for R vs. perl, if you are planning on doing analyses of microarray 
data, I would highly suggest looking again at the R/bioconductor 
project.  It will save you reinventing many wheels, such as getting 
annotation like gene ontology and pathways, doing stats, plotting, 
maintaining MIAME-compliant data structures, converting from multiple 
microarray formats, etc. 

Sean

From allenday at ucla.edu  Thu Dec 28 18:21:07 2006
From: allenday at ucla.edu (Allen Day)
Date: Thu, 28 Dec 2006 15:21:07 -0800
Subject: [Bioperl-l] [Bioperl-microarray] SOFT parsers
In-Reply-To: <45943DAF.70100@mail.nih.gov>
References: <FC173C9E9BE18F45A3241288B723D64C1F16D4@hsv-exmail03.operonads.local>
	<45943DAF.70100@mail.nih.gov>
Message-ID: <5c24dcc30612281521o58b9f256sfa36c403f4c30bfa@mail.gmail.com>

> As for R vs. perl, if you are planning on doing analyses of microarray
> data, I would highly suggest looking again at the R/bioconductor
> project.  It will save you reinventing many wheels, such as getting
> annotation like gene ontology and pathways, doing stats, plotting,
> maintaining MIAME-compliant data structures, converting from multiple
> microarray formats, etc.

I'll second this statement WRT the data analysis.  I'm doing all my
analysis in R; Perl is just not good at dealing with large matrices or
plotting.  OTOH, I have also found that R is particularly weak when it
comes to pipelining data and system interfacing.  If your goal is to
do ETL into a local database, you're better off using Perl.

I've found they're both about equally clunky for dealing with the
experimental metadata, with a slight preference for Perl.  That's more
a reflection of the baroque MAGE model than of the programming
languages themselves.

-Allen


From Paul.Boutros at utoronto.ca  Sat Dec 30 02:43:32 2006
From: Paul.Boutros at utoronto.ca (Paul Boutros)
Date: Sat, 30 Dec 2006 02:43:32 -0500
Subject: [Bioperl-l] How to use Bio::Graphics::Glyph::dna?
In-Reply-To: <6dce9a0b0612240923v24ebafffs5c280d9cb4c65263@mail.gmail.com>
Message-ID: <000c01c72be6$34d07e60$ec02a8c0@main>

Hi Lincoln,

Thanks, that worked like a charm!  Can I suggest adding the
example/explanation you gave me to the pod for Bio::Graphics::Glyph::dna?
Here's a patch against the 1.5.2 version of dna.pm to do that.

Paul

 
266c266,274
< in response to the dna() method.
---
> in response to the dna() method.  For example, you can use a
> Bio::SeqFeature::Generic object with an attached Bio::PrimarySeq
> like this:
>    my $dna = Bio::PrimarySeq->new( -seq => 'A' x 1000 );
>    my $feature = Bio::SeqFeature::Generic->new( -start => 1, -end => 800 );
>    $feature->attach_seq($dna);
>    $panel->add_track( $feature, -glyph => 'dna' );
> 
> A Bio::Graphics::Feature object may also be used.

 
  _____  

From: lincoln.stein at gmail.com [mailto:lincoln.stein at gmail.com] On Behalf Of
Lincoln Stein
Sent: Sunday, December 24, 2006 12:23 PM
To: Paul.Boutros at utoronto.ca
Cc: BioPerl Mailing List
Subject: Re: [Bioperl-l] How to use Bio::Graphics::Glyph::dna?

 
Hi,

You need to use either a Bio::SeqFeature::Generic object (with an attached
Bio::PrimarySeq) or a Bio::Graphics::Feature object. You are not intended to
create Bio::DB::GFF::Segment objects directly.

e.g. 
my $dna = Bio::PrimarySeq->new(-seq=>'a'x1000);
my $feature = Bio::SeqFeature::Generic->new(-start=>1,-end=>800);
$feature->attach_seq($dna);

Best,

Lincoln

On 12/23/06, Paul Boutros <paul.boutros at utoronto.ca> wrote:

Hello,

I'm trying to get the dna glyph of Bio::Graphics to work and am having
some problems.  I'm starting with a fasta file, and I am running perl
5.8.8 (ActiveState build 819) on WinXP and BioPerl 1.5.2 

If I try simply using a Bio::Seq object like this:
$panel->add_track(
        $segment,
        -glyph     =>   'dna',
        -do_gc     =>   'true',
        -gc_window =>   'auto' 
        );

I get the error:
Can't locate object method "start" via package "Bio::Seq" at
C:/Perl/site/lib/Bio/Graphics/FeatureBase.pm line 164.


And if I try creating a Bio::DB::GFFSegment object like this: 
my $db = Bio::DB::GFF->new(
        -adaptor  => 'berkeleydb',
        -create   => 1,
        -dsn      => '/usr/local/share/gff/dmel'
        );

$db->initialize(1);

$db->load_sequence_string(
        $seq->primary_id(),
        $seq->seq()
        );

my $segment = Bio::DB::GFF::Segment->new(
        $db,
        $seq->primary_id(),
        $seq->primary_id(), 
        1,
        $seq->length()
        );

$panel->add_track(
        $segment,
        -glyph     =>   'dna',
        -do_gc     =>   'true',
        -gc_window =>   'auto' 
        );

I get the error:
------------- EXCEPTION: Bio::Root::NotImplemented -------------
MSG: Abstract method "Bio::FeatureHolderI::get_SeqFeatures" is not implemented by package Bio::DB::GFF::Segment.
This is not your fault - author of Bio::DB::GFF::Segment should be blamed!

STACK: Error::throw
STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359
STACK: Bio::Root::RootI::throw_not_implemented C:/Perl/site/lib/Bio/Root/RootI.pm:522
STACK: Bio::FeatureHolderI::get_SeqFeatures C:/Perl/site/lib/Bio/FeatureHolderI.pm:101
STACK: Bio::Graphics::Glyph::_subfeat C:/Perl/site/lib/Bio/Graphics/Glyph.pm:1186
STACK: Bio::Graphics::Glyph::subfeat C:/Perl/site/lib/Bio/Graphics/Glyph.pm:1167
STACK: Bio::Graphics::Glyph::new C:/Perl/site/lib/Bio/Graphics/Glyph.pm:56
STACK: Bio::Graphics::Glyph::Factory::make_glyph C:/Perl/site/lib/Bio/Graphics/Glyph/Factory.pm:316
STACK: Bio::Graphics::Glyph::new C:/Perl/site/lib/Bio/Graphics/Glyph.pm:81
STACK: Bio::Graphics::Glyph::Factory::make_glyph C:/Perl/site/lib/Bio/Graphics/Glyph/Factory.pm:316
STACK: Bio::Graphics::Panel::_add_track C:/Perl/site/lib/Bio/Graphics/Panel.pm:388
STACK: Bio::Graphics::Panel::_do_add_track C:/Perl/site/lib/Bio/Graphics/Panel.pm:360
STACK: Bio::Graphics::Panel::add_track C:/Perl/site/lib/Bio/Graphics/Panel.pm:288
STACK: create_figure.pl:147
----------------------------------------------------------------

I'm really unsure what to try next; any suggestions would be much appreciated!
Paul


_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice) 
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu 


From er at xs4all.nl  Sat Dec 30 19:05:16 2006
From: er at xs4all.nl (Erik)
Date: Sun, 31 Dec 2006 01:05:16 +0100 (CET)
Subject: [Bioperl-l] acquiring a local refseq + index
Message-ID: <4632.156.83.1.215.1167523516.squirrel@webmail.xs4all.nl>

Hi all,

I downloaded the refseq files (.gbff) and want to index the lot with
Bio::DB::Flat.

It turns out that there are many cases where the SOURCE and ORGANISM lines
are messed up, sometimes to a degree where the indexing fails on a
Bio::SeqIO::genbank error.

I'd like to change Bio::SeqIO::genbank so that parsing gets at least far
enough to make indexing the refseq files possible, and hopefully to
improve the taxonomic output ($seq->species->binomial is often mutilated
at the moment).

Is it still worthwhile to change parsing modules like Bio::SeqIO::genbank?
Is anyone already working on a rewrite? If so, I may be better off
writing my own indexing scheme.
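Writing one's own indexing scheme largely comes down to recording where each record starts in the file. A minimal pure-Perl sketch of that idea, assuming records begin with a LOCUS line and end with a // line as in GenBank flat files. The names index_genbank and fetch_record are hypothetical, the index is keyed on the LOCUS name only (a fuller indexer would probably also key on ACCESSION), and this is not the on-disk format Bio::DB::Flat uses.

```perl
use strict;
use warnings;

# Map each record's LOCUS name to the byte offset where the record starts.
sub index_genbank {
    my ($file) = @_;
    my %offset;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my $pos = tell $fh;
    while (my $line = <$fh>) {
        $offset{$1} = $pos if $line =~ /^LOCUS\s+(\S+)/;
        $pos = tell $fh;    # start of the next line
    }
    close $fh;
    return \%offset;
}

# Seek to a stored offset and read one record, up to and including '//'.
sub fetch_record {
    my ($file, $offset) = @_;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    seek $fh, $offset, 0 or die "Cannot seek in $file: $!";
    my $record = '';
    while (my $line = <$fh>) {
        $record .= $line;
        last if $line =~ m{^//};
    }
    close $fh;
    return $record;
}
```

In use, index_genbank($file) returns a hash ref that can be saved (for example with a tied DBM hash), and fetch_record($file, $index->{$name}) pulls a single record back out without parsing the rest of the file.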

Below is an outline of my indexing program, which uses Bio::DB::Flat::BDB.
If anyone knows of a better way to get a locally searchable refseq flat
file index, I would be very interested.

Thanks for your help,

Erikjan


-------------
use strict;
use warnings;
use Bio::DB::Flat;

my $refseq_dir = '/data/ftp.ncbi.nih.gov/refseq/release/complete';
my $db = Bio::DB::Flat->new(
   -directory  => $refseq_dir,
   -dbname     => 'refseq',
   -format     => 'genbank',
   -index      => 'bdb',
   -write_flag => 1,
);

# getfiles() is not shown in this outline; it lists the files to index
my @files = getfiles($refseq_dir);
for my $f (@files) {
        $db->build_index($f);
}
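The outline calls getfiles() without showing it. A hypothetical stand-in, assuming the .gbff files sit directly in $refseq_dir and have already been uncompressed, could look like this; it returns full paths sorted by name, ready to hand to build_index.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical getfiles(): full paths of all *.gbff files in $dir, sorted.
sub getfiles {
    my ($dir) = @_;
    opendir my $dh, $dir or die "Cannot open directory $dir: $!";
    my @names = sort grep {
        /\.gbff$/ && -f File::Spec->catfile($dir, $_)    # plain files only
    } readdir $dh;
    closedir $dh;
    return map { File::Spec->catfile($dir, $_) } @names;
}
```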


From hlapp at gmx.net  Sat Dec 30 20:48:33 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 30 Dec 2006 20:48:33 -0500
Subject: [Bioperl-l] acquiring a local refseq + index
In-Reply-To: <4632.156.83.1.215.1167523516.squirrel@webmail.xs4all.nl>
References: <4632.156.83.1.215.1167523516.squirrel@webmail.xs4all.nl>
Message-ID: <A4BD1950-AD1C-4EAA-A2F8-85E7FCEC7C31@gmx.net>

Can you send examples and the resulting error messages? Also, I'm
assuming you are running the 1.5.2 release of Bioperl; if not, that's what
I would try first.

	-hilmar

On Dec 30, 2006, at 7:05 PM, Erik wrote:

> Hi all,
>
> I downloaded the refseq files (.gbff) and want to index the lot with
> Bio::DB::Flat.
>
> It turns out that there are many cases where the SOURCE and  
> ORGANISM lines
> are messed up, sometimes to a degree where the indexing fails on a
> Bio::SeqIO::genbank error.
>
> I'd like to change Bio::SeqIO::genbank to let this parsing go at  
> least so
> far as to make the indexing of the refseq files possible, and  
> hopefully
> improving the taxonomic output ($seq->species->binomial is often  
> mutilated
> at the moment).
>
> Is it still worthwhile to change parsing modules like  
> Bio::SeqIO::genbank?
>  Is anyone already working on a rewrite? Because if this is the  
> case I may
> be better off writing my own indexing scheme?
>
> Below is (outline of) my indexing program, which uses  
> Bio::DB::Flat::BDB.
> If anyone knows of a better way to get a locally searchable refseq  
> flat
> file index, I would be very interested.
>
> Thanks for your help,
>
> Erikjan
>
>
> -------------
> use Bio::DB::Flat;
>
> my $refseq_dir = '/data/ftp.ncbi.nih.gov/refseq/release/complete';
> my $db=Bio::DB::Flat->new(
>    -directory  => $refseq_dir,
>    -dbname     => 'refseq',
>    -format     => 'genbank',
>    -index      => 'bdb',
>    -write_flag => 1,
> );
> my @files = getfiles($refseq_dir);
> for my $f (@files) {
>         $db->build_index($f);
> }

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Sat Dec 30 21:33:23 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 30 Dec 2006 20:33:23 -0600
Subject: [Bioperl-l] acquiring a local refseq + index
In-Reply-To: <A4BD1950-AD1C-4EAA-A2F8-85E7FCEC7C31@gmx.net>
References: <4632.156.83.1.215.1167523516.squirrel@webmail.xs4all.nl>
	<A4BD1950-AD1C-4EAA-A2F8-85E7FCEC7C31@gmx.net>
Message-ID: <76AAAE98-779F-495C-A19A-A1A800B1D392@uiuc.edu>

Agree with Hilmar, in that we need examples.  If you are referring to  
your submitted bug:

http://bugzilla.open-bio.org/show_bug.cgi?id=2167

we could add this in as long as it passes (I'll try giving it a  
workout with my local bacterial seqs tonight or tomorrow).  However,  
in the not-too-distant future your patch would likely be rendered  
obsolete, as any parsing in Bio::SeqIO modules pertaining to  
Bio::Species-related matters will be deprecated in favor of simple  
parsing (more foolproof, less uncertainty) and Bio::Taxon (which has  
optional db lookups using NCBI Taxonomy).  Bio::Species and anything
related to it should be considered marked for deprecation.  Fair warning...
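To make "simple parsing" concrete: one minimal approach is to keep the ORGANISM text verbatim and split the lineage lines on ';', without trying to guess genus, species and subspecies. The sketch below assumes lineage lines are indented by exactly 12 spaces, as in standard GenBank records; parse_organism is a hypothetical helper and not the code in Bio::SeqIO::genbank.

```perl
use strict;
use warnings;

# Minimal "simple parsing" of a GenBank SOURCE/ORGANISM block:
# - the text after ORGANISM is kept verbatim as the organism name;
# - the following lines indented by exactly 12 spaces are joined and
#   split on ';' to give the lineage, with the final '.' dropped.
sub parse_organism {
    my ($block) = @_;
    my ($name, @lineage_lines);
    for my $line (split /\n/, $block) {
        if (!defined $name) {
            $name = $1 if $line =~ /^\s+ORGANISM\s+(.*?)\s*$/;
        }
        elsif ($line =~ /^ {12}(\S.*?)\s*$/) {
            push @lineage_lines, $1;
        }
        else {
            last;    # first non-lineage line ends the block
        }
    }
    (my $lineage = join ' ', @lineage_lines) =~ s/\.$//;
    my @taxa = grep { length } split /;\s*/, $lineage;
    return ($name, \@taxa);
}
```

Because the name is never split, a malformed ORGANISM line still yields a usable string rather than a mangled binomial; any finer interpretation is left to a taxonomy lookup.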

chris

On Dec 30, 2006, at 7:48 PM, Hilmar Lapp wrote:

> Can you send examples and the resulting error messages? Also, I'm
> assuming you are running the 1.5.2 release of Bioperl; if not, that's what
> I would try first.
>
> 	-hilmar
>
> On Dec 30, 2006, at 7:05 PM, Erik wrote:
>
>> Hi all,
>>
>> I downloaded the refseq files (.gbff) and want to index the lot with
>> Bio::DB::Flat.
>>
>> It turns out that there are many cases where the SOURCE and
>> ORGANISM lines
>> are messed up, sometimes to a degree where the indexing fails on a
>> Bio::SeqIO::genbank error.
>>
>> I'd like to change Bio::SeqIO::genbank to let this parsing go at
>> least so
>> far as to make the indexing of the refseq files possible, and
>> hopefully
>> improving the taxonomic output ($seq->species->binomial is often
>> mutilated
>> at the moment).
>>
>> Is it still worthwhile to change parsing modules like
>> Bio::SeqIO::genbank?
>>  Is anyone already working on a rewrite? Because if this is the
>> case I may
>> be better off writing my own indexing scheme?
>>
>> Below is (outline of) my indexing program, which uses
>> Bio::DB::Flat::BDB.
>> If anyone knows of a better way to get a locally searchable refseq
>> flat
>> file index, I would be very interested.
>>
>> Thanks for your help,
>>
>> Erikjan
>>
>>
>> -------------
>> use Bio::DB::Flat;
>>
>> my $refseq_dir = '/data/ftp.ncbi.nih.gov/refseq/release/complete';
>> my $db=Bio::DB::Flat->new(
>>    -directory  => $refseq_dir,
>>    -dbname     => 'refseq',
>>    -format     => 'genbank',
>>    -index      => 'bdb',
>>    -write_flag => 1,
>> );
>> my @files = getfiles($refseq_dir);
>> for my $f (@files) {
>>         $db->build_index($f);
>> }
>>
>
> -- 
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Sun Dec 31 14:36:47 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 31 Dec 2006 13:36:47 -0600
Subject: [Bioperl-l] acquiring a local refseq + index
In-Reply-To: <76AAAE98-779F-495C-A19A-A1A800B1D392@uiuc.edu>
References: <4632.156.83.1.215.1167523516.squirrel@webmail.xs4all.nl>
	<A4BD1950-AD1C-4EAA-A2F8-85E7FCEC7C31@gmx.net>
	<76AAAE98-779F-495C-A19A-A1A800B1D392@uiuc.edu>
Message-ID: <37FB5BDF-25A9-44F0-9E82-964684A73A58@uiuc.edu>

As a follow-up, I have committed the fix Erik posted in Bugzilla.  I
don't know whether it helps with the issue Erik describes below (they
sound unrelated).

chris

On Dec 30, 2006, at 8:33 PM, Chris Fields wrote:

> Agree with Hilmar, in that we need examples.  If you are referring to
> your submitted bug:
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2167
>
> we could add this in as long as it passes (I'll try giving it a
> workout with my local bacterial seqs tonight or tomorrow).  However,
> in the not-too-distant future your patch would likely be rendered
> obsolete, as any parsing in Bio::SeqIO modules pertaining to
> Bio::Species-related matters will be deprecated in favor of simple
> parsing (more foolproof, less uncertainty) and Bio::Taxon (which has
> optional db lookups using NCBI Taxonomy).  Bio::Species and anything
> related to it are considered marked for deprecation.  Fair warning...
>
> chris
>
> On Dec 30, 2006, at 7:48 PM, Hilmar Lapp wrote:
>
>> Can you send examples and the resulting error messages? Also, I'm
>> assuming you are running the 1.5.2 release of Bioperl; if not, that's what
>> I would try first.
>>
>> 	-hilmar
>>
>> On Dec 30, 2006, at 7:05 PM, Erik wrote:
>>
>>> Hi all,
>>>
>>> I downloaded the refseq files (.gbff) and want to index the lot with
>>> Bio::DB::Flat.
>>>
>>> It turns out that there are many cases where the SOURCE and
>>> ORGANISM lines
>>> are messed up, sometimes to a degree where the indexing fails on a
>>> Bio::SeqIO::genbank error.
>>>
>>> I'd like to change Bio::SeqIO::genbank to let this parsing go at
>>> least so
>>> far as to make the indexing of the refseq files possible, and
>>> hopefully
>>> improving the taxonomic output ($seq->species->binomial is often
>>> mutilated
>>> at the moment).
>>>
>>> Is it still worthwhile to change parsing modules like
>>> Bio::SeqIO::genbank?
>>>  Is anyone already working on a rewrite? Because if this is the
>>> case I may
>>> be better off writing my own indexing scheme?
>>>
>>> Below is (outline of) my indexing program, which uses
>>> Bio::DB::Flat::BDB.
>>> If anyone knows of a better way to get a locally searchable refseq
>>> flat
>>> file index, I would be very interested.
>>>
>>> Thanks for your help,
>>>
>>> Erikjan
>>>
>>>
>>> -------------
>>> use Bio::DB::Flat;
>>>
>>> my $refseq_dir = '/data/ftp.ncbi.nih.gov/refseq/release/complete';
>>> my $db=Bio::DB::Flat->new(
>>>    -directory  => $refseq_dir,
>>>    -dbname     => 'refseq',
>>>    -format     => 'genbank',
>>>    -index      => 'bdb',
>>>    -write_flag => 1,
>>> );
>>> my @files = getfiles($refseq_dir);
>>> for my $f (@files) {
>>>         $db->build_index($f);
>>> }
>>>
>>
>> -- 
>> ===========================================================
>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
>> ===========================================================
>>
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From bix at sendu.me.uk  Fri Dec  1 04:00:18 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 01 Dec 2006 09:00:18 +0000
Subject: [Bioperl-l] BLASTing with a seqio/seq object...
In-Reply-To: <456F27E9.70205@york.ac.uk>
References: <01ba01c714a2$b9659c10$15327e82@pyrimidine>
	<456F27E9.70205@york.ac.uk>
Message-ID: <456FEF22.4090004@sendu.me.uk>

Samantha Thompson wrote:

You missed a step...


> use strict;
> use Bio::Perl;
> use Bio::Seq;
> use Bio::SeqIO;
> 
> use Bio::Tools::Run::RemoteBlast;
> use Bio::SearchIO;
> 
> #seq bit
> 
> #$seq_obj = Bio::Seq->new(-format => 'fasta');
> 
> my $seqio_obj = Bio::SeqIO->new(-file => 
> "/biol/people/mres/st537/MalEfasta.txt", -format => 'fasta');
> 
> my $seq_obj = $seqio_obj->next_seq;
> 
> 
> 
> #blast bit
> 
> my $remote_blast = Bio::Tools::Run::RemoteBlast->new (
>          -prog => 'blastp', -db => 'nr', -expect => '1e-15' );
> 
> my $blast_report = $remote_blast->submit_blast($seq_obj);

Go back to the Bptutorial:
http://www.bioperl.org/wiki/Bptutorial.pl#Running_BLAST_.28using_RemoteBlast.pm.29

And you'll see that submit_blast doesn't return a SearchIO object.

For a complete working example see the synopsis for RemoteBlast:
http://doc.bioperl.org/bioperl-live/Bio/Tools/Run/RemoteBlast.html


> #new part for SearchIO...
> 
> while( my $result = $blast_report->next_result ) {
>   while( my $hit = $result->next_hit ) {
>    while( my $hsp = $hit->next_hsp ) {
>     if( $hsp->length('total') > 100 ) {
>      if ( $hsp->percent_identity >= 75 ) {
>       print "Hit= ",       $hit->name,
>             ",Length=",     $hsp->length('total'),
>             ",Percent_id=", $hsp->percent_identity, "\n";
>      }
>     }
>    } 
>   }
> }


From bix at sendu.me.uk  Fri Dec  1 04:03:13 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 01 Dec 2006 09:03:13 +0000
Subject: [Bioperl-l] Error with supplied lineages importing uniprot data
In-Reply-To: <1348.130.49.222.58.1164925169.squirrel@webmail.cs.pitt.edu>
References: <1348.130.49.222.58.1164925169.squirrel@webmail.cs.pitt.edu>
Message-ID: <456FEFD1.4070704@sendu.me.uk>

pelikan at cs.pitt.edu wrote:
> Hello all,
> 
>  I'm running bioperl 1.5.2, bioperl-db 1.5.2 - RC005, under windows,
> without Cygwin. The "make test"s have all completed without error. This
> is my first time dealing with bioperl, so bear with me.
> 
>    I've successfully loaded the most recent taxonomy information using the
> biosql-schema scripts. After this, I attempted to load the most recent
> release of the uniprot flat file dataset with the following command:
> 
> load_seqdatabase.pl -drive mysql -dbname bioseqdb -dbuser root -dbpass
> ********* -format swiss -safe c:\data\uniprot\uniprot_sprot.dat
> 
> I am subsequently greeted by many of the following errors:
> 
> Could not store Q7N3Q6:
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: The supplied lineage does not start near 'Photorhabdus luminescens
> subsp. laumondii'

In your uniprot_sprot.dat file there'll be some kind of entry with that 
Photorhabdus species. Can you post that entry (sans sequence if it has 
one) so I can take a look at it? Maybe post a few that cause problems, 
and a few that don't.


From bix at sendu.me.uk  Fri Dec  1 04:19:09 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 01 Dec 2006 09:19:09 +0000
Subject: [Bioperl-l] Bioperl 1.5.2 RC5 install on WinXP
	ActivePerl5.8.8.819
In-Reply-To: <000301c714b4$7846e790$15327e82@pyrimidine>
References: <000301c714b4$7846e790$15327e82@pyrimidine>
Message-ID: <456FF38D.3070508@sendu.me.uk>

Chris Fields wrote:
>> Nathan S. Haigh wrote:
>>> More updates:
>>>
>>> After the failed install I updated Module::Build, and re-ran the
>>> install, I get:
>>>
>>> -- snip --
>>> Creating new 'Build' script for 'bioperl' version '1.005002005'
>>> Warning: while trying to determine prerequisites for 
>>> S/SE/SENDU/bioperl-1.5.2_005-RCb.tar.gz wi th the help of 
>>> Module::Build the following error occurred: 'Failed to re-load 
>>> 'ModuleBuildBiope
>>> rl': Can't locate ModuleBuildBioperl.pm in @INC (@INC contains: 
>>> _build\lib C:\Perl\site\lib C:\
>>> Perl\lib C:\Documents and Settings\test) at (eval 105) line 1.
>>> '
>>>
>>> Falling back to META.yml for prerequisites 'YAML' not installed, 
>>> cannot parse 'C:\Perl\cpan\build\bioperl-1.5.2_005-RC\META.yml'
>>> -- snip --
>> I had that problem fleetingly and it drove me crazy because 
>> later I couldn't reproduce it. Is it reproducible on your end?
> 
> During Module::Build installation I see this:
> 
> ...
> t\metadata........ok
>         8/43 skipped: YAML_support feature is not enabled

You were pointing out the YAML issue? I think I'm less concerned with 
that (solution: install YAML) and much more concerned with why it can't 
reload ModuleBuildBioperl (claiming it isn't in @INC). The module in 
question is in the same dir as the Build script, so it should be found 
automatically.

The only thing I can think of is that CPAN doesn't manage to chdir to 
the directory. Hopefully I'll be able to reproduce this and then I can 
investigate further.
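Sendu's chdir theory fits how Perl resolves @INC: a relative entry such as _build\lib (visible in the error above) is looked up against the current working directory when require runs, not against the location of the Build script. A small sketch of loading a module from a known directory by making the path absolute first; require_from_dir is a hypothetical helper and this is not how Module::Build itself handles it.

```perl
use strict;
use warnings;
use File::Spec;

# Load $module from $dir regardless of later chdir calls.
# rel2abs resolves $dir against the cwd *now*, so call this (or at least
# compute the absolute path) before anything changes directory.
sub require_from_dir {
    my ($dir, $module) = @_;
    my $abs = File::Spec->rel2abs($dir);
    (my $file = "$module.pm") =~ s{::}{/}g;
    local @INC = ($abs, @INC);    # search $abs first, only for this require
    require $file;
    return $INC{$file};           # full path the module was loaded from
}
```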


From n.haigh at sheffield.ac.uk  Fri Dec  1 04:26:22 2006
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Fri, 01 Dec 2006 09:26:22 +0000
Subject: [Bioperl-l] Bioperl 1.5.2 RC5 install
	onWinXPActivePerl	5.8.8.819
In-Reply-To: <456FF233.6040704@sendu.me.uk>
References: <002401c714c6$53f65080$15327e82@pyrimidine>
	<456F500A.7010707@sheffield.ac.uk>
	<202B1F50-E905-46DE-9EB5-5F206AC04523@uiuc.edu>
	<456FF233.6040704@sendu.me.uk>
Message-ID: <456FF53E.90907@sheffield.ac.uk>

Sendu Bala wrote:
> Chris Fields wrote:
>>
>> I know that setting up the PPM is a pain, but I have to say it is 
>> much faster, and all required PPMs are available.  Which makes me 
>> curious: why bother with trying out a CPAN installation process at 
>> this point, especially when you have to use PPM to install some of 
>> the prereqs properly anyway?
>
> Firstly, problems discovered and resulting fixes will help all 
> platforms, not just Windows. So thanks for trying it out and reporting 
> back. Secondly, the PPM method, like Bundle::BioPerl, is 
> all-or-nothing. The CPAN installation method allows an interactive 
> choice of which optional things to install.
>
> If what you say about DB_File is true, then that's a great shame!
>
>
> So I can do further trouble-shooting of my own, what is the sure-fire 
> way to completely clean-out an ActivePerl install, including any 
> modules you might have installed with PPMs or CPAN?
>
>

In addition, using CPAN allows you to run the test suite easily without 
the need to download it separately and run it after a PPM install.

I don't know of a way to clean out ActivePerl - I use VMware Workstation
and have a virtual machine with a fresh install of WinXP and ActivePerl 
5.8.8.819 - maybe someone else has ideas?

Nath




From bix at sendu.me.uk  Fri Dec  1 04:13:23 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 01 Dec 2006 09:13:23 +0000
Subject: [Bioperl-l] Bioperl 1.5.2 RC5 install
	onWinXPActivePerl	5.8.8.819
In-Reply-To: <202B1F50-E905-46DE-9EB5-5F206AC04523@uiuc.edu>
References: <002401c714c6$53f65080$15327e82@pyrimidine>
	<456F500A.7010707@sheffield.ac.uk>
	<202B1F50-E905-46DE-9EB5-5F206AC04523@uiuc.edu>
Message-ID: <456FF233.6040704@sendu.me.uk>

Chris Fields wrote:
> 
> I know that setting up the PPM is a pain, but I have to say it is much 
> faster, and all required PPMs are available.  Which makes me curious: 
> why bother with trying out a CPAN installation process at this point, 
> especially when you have to use PPM to install some of the prereqs 
> properly anyway?

Firstly, problems discovered and resulting fixes will help all 
platforms, not just Windows. So thanks for trying it out and reporting 
back. Secondly, the PPM method, like Bundle::BioPerl, is all-or-nothing. 
The CPAN installation method allows an interactive choice of which 
optional things to install.

If what you say about DB_File is true, then that's a great shame!


So I can do further trouble-shooting of my own, what is the sure-fire 
way to completely clean-out an ActivePerl install, including any modules 
you might have installed with PPMs or CPAN?


From cjfields at uiuc.edu  Fri Dec  1 09:08:55 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 1 Dec 2006 08:08:55 -0600
Subject: [Bioperl-l] Bioperl 1.5.2 RC5 install
	onWinXPActivePerl	5.8.8.819
In-Reply-To: <456FF233.6040704@sendu.me.uk>
References: <002401c714c6$53f65080$15327e82@pyrimidine>
	<456F500A.7010707@sheffield.ac.uk>
	<202B1F50-E905-46DE-9EB5-5F206AC04523@uiuc.edu>
	<456FF233.6040704@sendu.me.uk>
Message-ID: <10BC5C25-616F-44D5-8CA8-4BD4C3EF82D6@uiuc.edu>


On Dec 1, 2006, at 3:13 AM, Sendu Bala wrote:

> Chris Fields wrote:
>> I know that setting up the PPM is a pain, but I have to say it is  
>> much faster, and all required PPMs are available.  Which makes me  
>> curious: why bother with trying out a CPAN installation process at  
>> this point, especially when you have to use PPM to install some of  
>> the prereqs properly anyway?
>
> Firstly, problems discovered and resulting fixes will help all  
> platforms, not just Windows. So thanks for trying it out and  
> reporting back. Secondly, the PPM method, like Bundle::BioPerl, is  
> all-or-nothing. The CPAN installation method allows an interactive  
> choice of which optional things to install.

Yes, I understand that.  My point is, you are generally forced to use  
PPM anyway due to several modules not installing properly (all the  
'trouble' distributions, like DB_File, are available via PPM).  I can  
see using CPAN as an alternative way of installing Bioperl for a  
distribution, or as the primary method via CVS or manually, but not  
for distributions.  At least not until the kinks are worked out for  
Windows users.

What are the significant issues for a bioperl PPM installation, based  
on the last PPM Nathan set up?  If there is a redirection problem,  
could we just modify the installation docs to address that ('due to  
problem X, you must install the following modules prior to installing  
BioPerl 1.5.2...').

> If what you say about DB_File is true, then that's a great shame!

We need to go through the various prereqs to see which ones need PPM  
vs CPAN.  In general, anything that requires C code compilation (and  
thus needs a recent VC++) will likely be an issue.
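As a starting point for going through the prerequisites, a small pure-Perl sketch that reports which modules already load in a given perl. It only tests loadability; deciding whether a missing module should come from PPM or CPAN is still a manual call. The helper check_modules is hypothetical and the module list is only illustrative (DB_File and XML::Parser both have compiled C parts), not BioPerl's actual prerequisite list.

```perl
use strict;
use warnings;

# Report whether each named module can be loaded by this perl.
sub check_modules {
    my @modules = @_;
    my %status;
    for my $module (@modules) {
        (my $file = "$module.pm") =~ s{::}{/}g;
        # eval traps the die from a failed require
        $status{$module} = eval { require $file; 1 } ? 'installed' : 'missing';
    }
    return \%status;
}

my $status = check_modules(qw(File::Spec DB_File XML::Parser));
printf "%-12s %s\n", $_, $status->{$_} for sort keys %$status;
```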

> So I can do further trouble-shooting of my own, what is the sure- 
> fire way to completely clean-out an ActivePerl install, including  
> any modules you might have installed with PPMs or CPAN?

Not sure, beyond uninstalling and cleaning out the Perl directory (I  
think you might be able to delete the site/ directory, but I haven't  
tried it).  ActivePerl comes preloaded with a number of non-core  
modules which makes it tricky to uninstall them one-by-one.

chris


From cjfields at uiuc.edu  Fri Dec  1 09:10:34 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 1 Dec 2006 08:10:34 -0600
Subject: [Bioperl-l] Bioperl 1.5.2 RC5 install on WinXP
	ActivePerl5.8.8.819
In-Reply-To: <456FF38D.3070508@sendu.me.uk>
References: <000301c714b4$7846e790$15327e82@pyrimidine>
	<456FF38D.3070508@sendu.me.uk>
Message-ID: <6E434A6A-0EA4-4FD6-9DA1-0D5CF196AE36@uiuc.edu>


On Dec 1, 2006, at 3:19 AM, Sendu Bala wrote:

> You were pointing out the YAML issue? I think I'm less concerned  
> with that (solution: install YAML) and much more concerned with why  
> it can't reload ModuleBuildBioperl (claiming it isn't in @INC). The  
> module in question is in the same dir as the Build script, so it  
> should be found automatically.
>
> The only thing I can think of is that CPAN doesn't manage to chdir  
> to the directory. Hopefully I'll be able to reproduce this and then  
> I can investigate further.

My thought was that the two were related in some way.  I'm not sure, to
tell the truth.

-chris


From bix at sendu.me.uk  Fri Dec  1 09:17:41 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 01 Dec 2006 14:17:41 +0000
Subject: [Bioperl-l] Bioperl 1.5.2 RC5 install
	onWinXPActivePerl	5.8.8.819
In-Reply-To: <10BC5C25-616F-44D5-8CA8-4BD4C3EF82D6@uiuc.edu>
References: <002401c714c6$53f65080$15327e82@pyrimidine>
	<456F500A.7010707@sheffield.ac.uk>
	<202B1F50-E905-46DE-9EB5-5F206AC04523@uiuc.edu>
	<456FF233.6040704@sendu.me.uk>
	<10BC5C25-616F-44D5-8CA8-4BD4C3EF82D6@uiuc.edu>
Message-ID: <45703985.5050203@sendu.me.uk>

Chris Fields wrote:
> 
> On Dec 1, 2006, at 3:13 AM, Sendu Bala wrote:
> 
>> Chris Fields wrote:
>>> I know that setting up the PPM is a pain, but I have to say it is 
>>> much faster, and all required PPMs are available.  Which makes me 
>>> curious: why bother with trying out a CPAN installation process at 
>>> this point, especially when you have to use PPM to install some of 
>>> the prereqs properly anyway?
>>
>> Firstly, problems discovered and resulting fixes will help all 
>> platforms, not just Windows. So thanks for trying it out and reporting 
>> back. Secondly, the PPM method, like Bundle::BioPerl, is 
>> all-or-nothing. The CPAN installation method allows an interactive 
>> choice of which optional things to install.
> 
> Yes, I understand that.  My point is, you are generally forced to use 
> PPM anyway due to several modules not installing properly (all the 
> 'trouble' distributions, like DB_File, are available via PPM).  I can 
> see using CPAN as an alternative way of installing Bioperl for a 
> distribution, or as the primary method via CVS or manually, but not for 
> distributions.  At least not until the kinks are worked out for Windows 
> users.

CPAN isn't being suggested as the primary or preferred installation 
method for Windows. That will still be PPM. I'm mentioning CPAN / manual 
installation in the Windows INSTALL docs for the benefit of anyone who 
wants a simple install and test environment when checking out from CVS.


> What are the significant issues for a bioperl PPM installation

None that I'm aware of - I just need to find the time to start looking 
into generating an appropriate PPD. Hopefully Nathan's wiki page on the 
subject will be all I need.


From bix at sendu.me.uk  Fri Dec  1 09:18:43 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 01 Dec 2006 14:18:43 +0000
Subject: [Bioperl-l] Bioperl 1.5.2 RC5 install on WinXP
	ActivePerl5.8.8.819
In-Reply-To: <6E434A6A-0EA4-4FD6-9DA1-0D5CF196AE36@uiuc.edu>
References: <000301c714b4$7846e790$15327e82@pyrimidine>
	<456FF38D.3070508@sendu.me.uk>
	<6E434A6A-0EA4-4FD6-9DA1-0D5CF196AE36@uiuc.edu>
Message-ID: <457039C3.30907@sendu.me.uk>

Chris Fields wrote:
> 
> On Dec 1, 2006, at 3:19 AM, Sendu Bala wrote:
> 
>> You were pointing out the YAML issue? I think I'm less concerned with 
>> that (solution: install YAML) and much more concerned with why it 
>> can't reload ModuleBuildBioperl (claiming it isn't in @INC). The 
>> module in question is in the same dir as the Build script, so it 
>> should be found automatically.
>>
>> The only thing I can think of is that CPAN doesn't manage to chdir to 
>> the directory. Hopefully I'll be able to reproduce this and then I can 
>> investigate further.
> 
> My thought was the two were related in some way.  I'm not sure to tell 
> the truth.

They weren't; using YAML is the fall-back position in case of an
earlier failure.

I've fixed it now in any case.


From gwu at molbio.mgh.harvard.edu  Fri Dec  1 10:19:42 2006
From: gwu at molbio.mgh.harvard.edu (gang wu)
Date: Fri, 01 Dec 2006 10:19:42 -0500
Subject: [Bioperl-l] One more load_seqdatabase.pl question
In-Reply-To: <70B28FBB-0250-4EB8-8775-CD0537369A3D@gmx.net>
References: <4a9ad8800611270907x64a4a4c0jad92bff6641e300@mail.gmail.com>	<53C6D534-6E36-4061-B955-E74537839265@gmx.net>	<456CA667.6010609@molbio.mgh.harvard.edu>
	<ED3F5F49-78A7-4E63-ACB8-5E8F745F0C34@gmx.net>
	<456F5648.6070207@molbio.mgh.harvard.edu>
	<70B28FBB-0250-4EB8-8775-CD0537369A3D@gmx.net>
Message-ID: <4570480E.1020701@molbio.mgh.harvard.edu>

Thanks Hilmar. I did include the -lookup switch on the command line. The
warning messages say that the code tried to "INSERT" rather than
"UPDATE", which sounds like a match was not found. But I was just
loading the same Genbank file for the second time. To test whether it
actually updated the records, I made a minor modification to one of the
COMMENT features. Unfortunately it was not updated. By the way, the test
genbank file has four "COMMENT" features but they are all different. Any
idea what's happening there?

I wonder if it's a bad idea to "UPDATE" a sequence.  Say I got a new
sequence version with 5 features removed, 5 features modified and 5
features new. If only --lookup is included, according to the POD, the 5
new features will be inserted, the 5 modified features will be updated
and the 5 removed features will remain in the database untouched. This
would leave the new sequence record a mixture of old and new versions. I
can't see why anyone would want a sequence like this.
Either include -remove to replace the old version if only one version is
needed, or put the new version under a different namespace if multiple
versions are needed. Do I have the correct understanding of these issues?
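The concern above can be made concrete with a small model. This pure-Perl sketch does not reflect load_seqdatabase.pl's internals or how BioSQL actually identifies features; the feature keys and the helper classify_update are hypothetical. It only encodes the behaviour as read from the POD: with lookup-and-update alone, features missing from the new version end up "stale" because nothing deletes them.

```perl
use strict;
use warnings;

# $old and $new map a (hypothetical) feature key to that feature's content.
# Keys only in $new would be inserted, keys whose content changed would be
# updated, and keys only in $old are stale: nothing removes them.
sub classify_update {
    my ($old, $new) = @_;
    my %class = (insert => [], update => [], unchanged => [], stale => []);
    for my $key (sort keys %$new) {
        if (!exists $old->{$key}) {
            push @{ $class{insert} }, $key;
        }
        elsif ($old->{$key} ne $new->{$key}) {
            push @{ $class{update} }, $key;
        }
        else {
            push @{ $class{unchanged} }, $key;
        }
    }
    $class{stale} = [ grep { !exists $new->{$_} } sort keys %$old ];
    return \%class;
}

my $r = classify_update({ f1 => 'a', f2 => 'b' }, { f2 => 'B', f3 => 'c' });
print "$_: @{ $r->{$_} }\n" for qw(insert update unchanged stale);
```

Any key listed under stale is exactly the old/new mixture described above; replacing the whole entry (as -remove is described to do) avoids it.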

I deeply appreciate your help.

Gang


Hilmar Lapp wrote:
> Right. You need to tell it to look up sequences first if you know that
> you are loading sequences which may be in the database already (see
> the POD of load_seqdatabase.pl, switch --lookup; there are several
> other command line options that control what will happen if a sequence
> entry is already present in the database).
>
> The messages in your report are warnings, not errors. It looks like
> some of the comments are duplicated for a sequence, which doesn't look
> like a reason for concern. It's not so good if you get errors thrown.
>
>     -hilmar
>
> On Nov 30, 2006, at 5:08 PM, gang wu wrote:
>
>> Thanks Hilmar. Do you mean the NVL() clause will make
>> load_seqdatabase.pl fail when updating?
>>
>> I have a problem with updating. It seems load_seqdatabase.pl only tries to
>> insert instead of update. I used one of the test genbank files that come
>> with bioperl-db. Please take a look at the attached output.
>>
>> Thanks.
>>
>> Gang
>>
>> =========================================
>> >perl load_seqdatabase.pl -lookup -host elegans -driver Oracle 
>> -dbname sparc -dbuser biosqldb-sgowner -dbpass PASS -format genbank 
>> -namespace test 
>> /root/.cpan/build/bioperl-db-1.5.2-RC3/scripts/biosql/data/AP000868.gb
>> Loading 
>> /root/.cpan/build/bioperl-db-1.5.2-RC3/scripts/biosql/data/AP000868.gb 
>> ...
>>
>> -------------------- WARNING ---------------------
>> MSG: insert in Bio::DB::BioSQL::CommentAdaptor (driver) failed, 
>> values were ("This sequence was reannotated via the Ensembl system. 
>> Please visit the Ensembl web site, http://www.ensembl.org/ for more 
>> information. ","1") FKs (389109)
>> ORA-00001: unique constraint (BIOSQLDB_SGOWNER.XAK1COMMENT) violated 
>> (DBD ERROR: OCIStmtExecute)
>> ---------------------------------------------------
>>
>> -------------------- WARNING ---------------------
>> MSG: insert in Bio::DB::BioSQL::CommentAdaptor (driver) failed, 
>> values were ("The /gene indicates a unique id for a gene, /cds a 
>> unique id for a translation and a /exon a unique id for an exon. 
>> These ids are maintained wherever possible between versions. For more 
>> information on how to interpret the feature table, please visit 
>> http://www.ensembl.org/Docs/embl.html. ","2") FKs (389109)
>> ORA-00001: unique constraint (BIOSQLDB_SGOWNER.XAK1COMMENT) violated 
>> (DBD ERROR: OCIStmtExecute)
>> ---------------------------------------------------
>> ...
>> ...
>> ==========================================================
>> Hilmar Lapp wrote:
>>> These are the protein translations stored in the feature table as
>>> tags of features, right? You can change the type of the column
>>> (although there may be some issues when you update the column
>>> because the NVL() clause won't work if I recall that correctly), but
>>> doing so will deprive you of any 'normal' searches against that
>>> column. (You can still use functions from the DBMS_LOB package, but
>>> they will be much slower and are completely non-standard.) It is up
>>> to you whether that is too big of a price to pay for having some
>>> redundant protein translations (translating the feature's DNA
>>> sequence should give you the same) in the database.
>>>
>>> I always trimmed those feature tags off (using a custom SeqProcessor).
>>> An alternative is to convert these feature tags into actual bioentries
>>> (i.e., Bio::Seq objects; again, a custom SeqProcessor will allow you
>>> to do that).
>>>
>>> -hilmar
>>>
>>> On Nov 28, 2006, at 4:13 PM, gang wu wrote:
>>>> Hi everyone,
>>>>
>>>> I'm using load_seqdatabase.pl to upload some Genbank genome sequences
>>>> to my Oracle BioSQL database. I saw some errors (see attached warning
>>>> message) related to seqfeature_qualifier_value
>>>> (SG_SEQFEATURE_QUALIFIER_ASSOC.VALUE column), which has a VARCHAR2
>>>> data type with a maximum of 4000 bytes. Has anybody mentioned this
>>>> issue before? Should I just modify the column to a type able to store
>>>> more data, such as LONG or CLOB?
>>>>
>>>> Thanks.
>>>>
>>>> Gang
>>>>
>>>> Log information:
>>>> ============================================
>>>> load_seqdatabase.pl -host elegans -driver Oracle -dbname sparc
>>>> -dbuser biosqldb-sgowner -dbpass PASS -format genbank -namespace
>>>> genbank /genomeseq/arabidopsis//NC_003070.gbk
>>>> Loading /genomeseq/arabidopsis//NC_003070.gbk ...
>>>>
>>>> -------------------- WARNING ---------------------
>>>> MSG: SimpleValueAdaptor::add_assoc: unexpected failure of statement
>>>> execution: ORA-01461: can bind a LONG value only for insert into a
>>>> LONG column (DBD ERROR: error possibly near <*> indicator at char 12
>>>> in 'INSERT INTO <*>seqfeature_qualifier_value (fea_oid, trm_oid,
>>>> value, rank) VALUES (:p1, :p2, :p3, :p4)')
>>>> name: INSERT ASSOC [2] Bio::SeqFeature::Generic;Bio::Annotation::SimpleValue
>>>> values: FK[Bio::SeqFeature::Generic]:14898,
>>>> FK[Bio::Annotation::SimpleValue]:800,
>>>> value:"MVAVTGEVLHLLRRYLGEYVHGLSTEALRISVWKGDVVLKDLKLKAEALNSLKLPVAVKSGFV 
>>>> GTITLKVPWKSLGKEPVIVLIDRVFVLAYPAPDDRTLKFFTLVGTEFAYTNYIPGGRQGKASRNQASADR 
>>>> GTSYFWLMELHGYEAETATLEARAKSKLGSPPQGNSWLGSIIATIIGNLKVSISNVHIRYEDSTRDSSEI 
>>>> LASFFSYFNNICSSNPGHPFAAGITLAKLAAVTMDEEGNETFDTSGALDKLRKSLQLERLALYHDSNSFP 
>>>> WEIEKQWDNITPEEWIEMFEDGIKEQTEHKIKSKWALNRHYLLSPINGSLKYHRLGNQERNNPEIPFERA 
>>>> SVILNDVNVTITEEQYHDWIKLVEVVSRYKTYIEISHLRPMVPVSEAPRLWWRFAAQASLQQKRLWYTRY 
>>>> IQLYANFLQQSSDVNYPEMREIEKDLDSKVILLWRLLAHAKVESVKSKEAAEQRKLKKGGWFSFNWRTEA 
>>>> EDDPEVDSVAGGSKLMEERLTKDEWKAINKLLSHQPDEEMNLYSGKDMQNMTHFLVTVSIGQGAARIVDI 
>>>> NQTEVLCGRFEQLDVTTKFRHRSTQCDVSLRFYGLSAPEGSLAQSVSSERKTNALMASFVNAPIGENIDW 
>>>> RLSATISPCHATIWTESYDRVLEFVKRSNAVSPTVALETAAVLQMKLEEVTRRAQEQLQIVLEEQSRFAL 
>>>> DIDIDAPKVRIPLRASGSSKCSSHFLLDFGNFTLTTMDTRSEEQRQNLYSRFCISGRDIAAFFTDCGSDN 
>>>> QGCSLVMEDFTNQPILSPILEKADNVYSLIDRCGMAVIVDQIKVPHPSYPSTRISIQVPNIGVHFSPTRY 
>>>> MRIMQLFDILYGAMKTYSQAPVDHMPDGIQPWSPTDLASDARILVWKGIGNSVATWQSCRLVLSGLYLYT 
>>>> FESEKSLDYQRYLCMAGRQVFEVPPANIGGSPYCLAVGVRGTDLKKALESSSTWIIEFQGEEKAAWLRGL 
>>>> VQATYQASA
>>>> PLSGDVLGQTSDGDGDFHEPQTRNMKAADLVITGALVETKLYLYGKIKNECDEQVEEVLLLKVLASGGKV 
>>>> HLISSESGLTVRTKLHSLKIKDELQQQQSGSAQYLAYSVLKNEDIQESLGTCDSFDKEMPVGHADDEDAY 
>>>> TDALPEFLSPTEPGTPDMDMIQCSMMMDSDEHVGLEDTEGGFHEKDTSQGKSLCDEVFYEVQGGEFSDFV 
>>>> SVVFLTRSSSSHDYNGIDTQMSIRMSKLEFFCSRPTVVALIGFGFDLSTASYIENDKDANTLVPEKSDSE 
>>>> KETNDESGRIEGLLGYGKDRVVFYLNMNVDNVTVFLNKEDGSQLAMFVQERFVLDIKVHPSSLSVEGTLG 
>>>> NFKLCDKSLDSGNCWSWLCDIRDPGVESLIKFKFSSYSAGDDDYEGYDYSLSGKLSAVRIVFLYRFVQEV 
>>>> TAYFMGLATPHSEEVIKLVDKVGGFEWLIQKDEMDGATAVKLDLSLDTPIIVVPRDSLSKDYIQLDLGQL 
>>>> EVSNEISWHGCPEKDATAVRVDVLHAKILGLNMSVGINGSIGKPMIREGQGLDIFVRRSLRDVFKKVPTL 
>>>> SVEVKIDFLHAVMSDKEYDIIVSCTSMNLFEEPKLPPDFRGSSSGPKAKMRLLADKVNLNSQMIMSRTVT 
>>>> ILAVDINYALLELRNSVNEESSLAHVAVRASEPNSSISWMTSLSETDLYVSVPKVSVLDIRPNTKPEMRL 
>>>> MLGSSVDASKQASSESLPFSLNKGSFKRANSRAVLDFDAPCSTMLLMDYRWRASSQSCVLRVQQPRILAV 
>>>> PDFLLAVGEFFVPALRAITGRDETLDPTNDPITRSRGIVLSEPLYKQTEDVVHLSPRRQLVADSLGIDEY 
>>>> TYDGCGKVISLSEQGEKDLNVGRLEPIIIVGHGKKLRFVNVKIKNGSLLSKCIYLSNDSSCLFSPEDGVD 
>>>> ISMLENASSNPENVLSNAHKSSDVSDTCQYDSKSGQSFTFEAQVVSPEFTFFDGTKSSLDDSSAVEKLLR 
>>>> VKLDFNFM
>>>> YASKEKDIWVRALLKNLVVETGSGLIILDPVDISGGYTSVKEKTNMSLTSTDIYMHLSLSALSLLLNLQS 
>>>> QVTGALQSGNAIPLASCTNFDRIWVSPKENGPRNNLTIWRPQAPSNYVILGDCVTSRAIPPTQAVMAVSN 
>>>> TYGRVRKPIGFNRIGLFSVIQGLEGDNVQHSHNSNECSLWMPVAPVGYTAMGCVANIGSEQPPDHIVYCL 
>>>> SIWRADNVLGAFYAHTSTAAPSKKYSPGLSHCLLWNPLQSKTSSSSDPSSTSGSRSEQSSDQTGNSSGWD 
>>>> ILRSISKATSYHVSTPNFERIWWDKGGDLRRPVSIWRPVPRPGFAILGDSITEGLEPPALGILFKADDSE 
>>>> IAAKPVQFNKVAHIVGKGFDEVFCWFPVAPPGYVSLGCVLSKFDEAPHVDSFCCPRIDLVNQANIYEASV 
>>>> TRSSSSKSSQLWSIWKVDNQACTFLARSDLKRPPSRMAFAVGESVKPKTQENVNAEIKLRCFSLTLLDGL 
>>>> HGMMTPLFDTTVTNIKLATHGRPEAMNAVLISSIAASTFNPQLEAWEPLLEPFDGIFKLETYDTALNQSS 
>>>> KPGKRLRIAATNILNINVSAANLETLGDAVVSWRRQLELEERAAKMKEESAASRESGDLSAFSALDEDDF 
>>>> QTIVVENKLGRDIYLKKLEENSDVVVKLCHDENTSVWVPPPRFSNRLNVADSSREARNYMTVQILEAKGL 
>>>> HIIDDGNSHSFFCTLRLVVDSQGAEPQKLFPQSARTKCVKPSTTIVNDLMECTSKWNELFIFEIPRKGVA 
>>>> RLEVEVTNLAAKAGKGEVVGSLSFPVGHGESTLRKVASVRMLHQSSDAENISSYTLQRKNAEDKHDNGCL 
>>>> LISTSYFEKTTIPNTLRNMESKDFVDGDTGFWIGVRPDDSWHSIRSLLPLCIAPKSLQNDFIAMEVSMRN 
>>>> GRKHATFRCLATVVNDSDVNLEISISSDQNVSSGVSNHNAVIASRSSYVLPWGCLSKDNEQCLHIRPKVE 
>>>> NSHHSYAWGYCIAVSSGCGKDQPFVDQGLLTRQNTIKQSSRASTFFLRLNQLEKKDMLFCCQPSTGSKPL 
>>>> WLSVGADAS
>>>> VLHTDLNTPVYDWKISISSPLKLENRLPCPVKFTVWEKTKEGTYLERQHGVVSSRKSAHVYSADIQRPVY 
>>>> LTLAVHGGWALEKDPIPVLDISSNDSVSSFWFVHQQSKRRLRVSIERDVGETGAAPKTIRFFVPYWITND 
>>>> SYLPLSYRVVEIEPSENVEAGSPCLTRASKSFKKNPVFSMERRHQKKNVRVLESIEDTSPMPSMLSPQES 
>>>> AGRSGVVLFPSQKDSYVSPRIGIAVAARDSDSYSPGISLLELEKKERIDVKAFCKDASYYMLSAVLNMTS 
>>>> DRTKVIHLQPHTLFINRVGVSICLQQCDCQTEEWINPSDPPKLFGWQSSTRLELLKLRVKGYRWSTPFSV 
>>>> FSEGTMRVPVPKEDGTDQLQLRVQVRSGTKNSRYEVIFRPNSISGPYRIENRSMFLPIRYRQVEGVSESW 
>>>> QFLPPNAAASFYWENLGRRHLFELLVDGNDPSNSEKFDIDKIGDYPPRSESGPTRPIRVTILKEDKKNIV 
>>>> RISDWMPAIEPTSSISRRLPASSLSELSGNESQQSHLLASEDSEFHVIVELAELGISVIDHAPEEILYMS 
>>>> VQNLFVAYSTGLGSGLSRFKLRMQGIQVDNQLPLAPMPVLFRPQRTGDKADYILKFSVTLQSNAGLDLRV 
>>>> YPYIDFQGRENTAFLINIHEPIIWRIHEMIQQANLSRLSDPNSTAVSVDPFIQIGVLNFSEVRFRVSMAM 
>>>> SPSQRPRGVLGFWSSLMTALGNTENMPVRISERFHENISMRQSTMINNAIRNVKKDLLGQPLQLLSGVDI 
>>>> LGNASSALGHMSQGIAALSMDKKFIQSRQRQENKGVEDFGDIIREGGGALAKGLFRGVTGILTKPLEGAK 
>>>> SSGVEGFVSGFGKGIIGAAAQPVSGVLDLLSKTTEGANAMRMKIAAAITSDEQLLRRRLPRAVGADSLLR 
>>>> PYNDYRAQGQVILQLAESGSFLGQVDLFKVRGKFALTDAYESHFILPKGKVLMITHRRVILLQQPSNIMG 
>>>> QRKFIPAK
>>>> DACSIQWDILWNDLVTMELSDGKKDPPNSPPSRLILYLKAKPHDPKEQFRVVKCIPNSKQAFDVYSAIDQ 
>>>> AINLYGQNALKGMVKNKVTRPYSPISESSWAEGASQQMPASVTPSSTFGTSPTTSSS",
>>>> rank:"1"
>>>> --------------------------------------------------
>>>> =============================================
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
> --===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
>
>
>
>
>
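Hilmar's approach of trimming oversized feature tags before loading was done in Perl with a custom SeqProcessor. The Python sketch here only illustrates the filtering logic, assuming features are plain dicts mapping a tag name to a list of string values (a hypothetical representation, not the BioPerl object model) and assuming the 4000-byte VARCHAR2 limit Gang mentions, with values measured as UTF-8 bytes.

```python
# Hypothetical sketch: features are modeled as dicts mapping a qualifier tag
# to a list of string values (not the BioPerl object model).
VARCHAR2_MAX_BYTES = 4000  # limit quoted in the thread; byte semantics assumed


def value_bytes(value):
    """Length of a value in bytes, assuming it is stored as UTF-8."""
    return len(value.encode("utf-8"))


def oversized_values(features, limit=VARCHAR2_MAX_BYTES):
    """List (feature_index, tag, byte_length) for every value longer than limit."""
    hits = []
    for idx, qualifiers in enumerate(features):
        for tag, values in qualifiers.items():
            for value in values:
                size = value_bytes(value)
                if size > limit:
                    hits.append((idx, tag, size))
    return hits


def drop_oversized(features, limit=VARCHAR2_MAX_BYTES):
    """Return new feature dicts without oversized values; tags left empty are dropped."""
    trimmed = []
    for qualifiers in features:
        kept = {}
        for tag, values in qualifiers.items():
            small = [v for v in values if value_bytes(v) <= limit]
            if small:
                kept[tag] = small
        trimmed.append(kept)
    return trimmed
```

Reporting first and trimming second keeps the two decisions separate: the report shows how many values (typically long /translation tags) would be lost before anything is discarded.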


From bosborne11 at verizon.net  Fri Dec  1 09:55:18 2006
From: bosborne11 at verizon.net (Brian Osborne)
Date: Fri, 01 Dec 2006 09:55:18 -0500
Subject: [Bioperl-l] An announcement
Message-ID: <C195AC86.BB6A%bosborne11@verizon.net>

bioperl-l,

I would like to call your attention to a job posting, and in doing so I
realize that I'm probably breaking a rule of this list. I apologize and
acknowledge that I've transgressed. I'm doing this because this is
an interesting job that is relevant to a lot of what we do on this mailing
list, and some of you might want to consider it. The posting is here:

http://www.nescent.org/main/employment.html#gmodhelpdesk

I encourage you to pass this on to anyone who you think might be interested.

Thanks again,

Brian O.


From cjfields at uiuc.edu  Fri Dec  1 11:49:32 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 1 Dec 2006 10:49:32 -0600
Subject: [Bioperl-l] Bioperl 1.5.2 RC5 install on WinXP ActivePerl 5.8.8.819
In-Reply-To: <456FF53E.90907@sheffield.ac.uk>
References: <002401c714c6$53f65080$15327e82@pyrimidine>
	<456F500A.7010707@sheffield.ac.uk>
	<202B1F50-E905-46DE-9EB5-5F206AC04523@uiuc.edu>
	<456FF233.6040704@sendu.me.uk> <456FF53E.90907@sheffield.ac.uk>
Message-ID: <D464535F-E70F-44B4-AD48-3CC79181869C@uiuc.edu>


On Dec 1, 2006, at 3:26 AM, Nathan S. Haigh wrote:
...
> In addition, using CPAN allows you to run the test suite easily  
> without the need to download it separately and run it after a PPM  
> install.

A PPM, by design, is supposed to imply that the distribution passes  
tests for the specified platform, at that point in time, after all  
prereqs are installed and any additional postinstall operations  
(install C libraries, modify config files, etc) are complete.  The  
ActiveState automated PPM building process dictates that; if it fails  
any test, it will not be made into a PPM.  It's sort of a 'stamp of  
approval' that all tests pass, so you don't need to run them.

However, a test may fail (and a PPM may not get generated) for pretty
superficial reasons, such as the makefile not specifying that a
module is needed, server issues, etc., so the automated process isn't
foolproof. That's why Kobes and the other repositories are
available, where the PPM/PPD is manually generated and made to work
specifically for Windows (or whatever other platform).

That said, if one were to make a PPM manually, it is completely up to
the person packaging the distribution to follow those rules. You don't
even have to run tests prior to using 'nmake ppd'. We can currently
state, though, that all tests pass when all prereqs are installed for
this distribution. At least at this point in time!

> I don't know of a way to clean out ActivePerl - I use VMWare  
> Workstation and have a virtual machine with a fresh install of  
> WinXP and ActivePerl 5.8.8.819 - maybe someone else has ideas?

I haven't tried it that way. I have Parallels on Mac OS X (I run a
SigmaPlot/Excel combo off it). My tests were using a native WinXP
installation (i.e. not virtualized) on my old Dell. It shouldn't make
a difference; ActivePerl for WinXP should run under VMWare, Parallels,
and the like, since the guest is just WinXP running in a virtual
machine. Windows Vista, on the other hand...

I think with PPM4 you can install to a custom directory.  It may be  
possible to install all new modules to that directory, then you would  
at least have an idea of what was there (though I don't think you can  
delete it directly w/o screwing up the PPM database).

chris


From bix at sendu.me.uk  Fri Dec  1 12:12:49 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 01 Dec 2006 17:12:49 +0000
Subject: [Bioperl-l] Error with supplied lineages importing uniprot data
In-Reply-To: <1348.130.49.222.58.1164925169.squirrel@webmail.cs.pitt.edu>
References: <1348.130.49.222.58.1164925169.squirrel@webmail.cs.pitt.edu>
Message-ID: <45706291.80201@sendu.me.uk>

pelikan at cs.pitt.edu wrote:
> Hello all,
> 
>  I'm running bioperl 1.5.2, bioperl-db 1.5.2 - RC005, under windows,
> without Cygwin. The "make test"s have all completed without error. This
> is my first time dealing with bioperl, so bear with me.
> 
>    I've successfully loaded the most recent taxonomy information using the
> biosql-schema scripts. After this, I attempted to load the most recent
> release of the uniprot flat file dataset with the following command:
> 
> load_seqdatabase.pl -drive mysql -dbname bioseqdb -dbuser root -dbpass
> ********* -format swiss -safe c:\data\uniprot\uniprot_sprot.dat
> 
> I am subsequently greeted by many of the following errors:
> 
> Could not store Q7N3Q6:

I extracted just Q7N3Q6 from 
ftp://ftp.expasy.org/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.dat.gz
and was able to load it in using load_seqdatabase.pl under linux with no 
errors. If you make a file with just that sequence do you still get the 
error?

Is anyone else able to reproduce the problem?
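To follow the suggestion of testing with a file containing just one sequence, an entry can be pulled out of the Swiss-Prot flat file by accession. This is a minimal Python sketch, assuming the standard flat-file layout (accessions on lines starting with "AC   ", each entry terminated by a "//" line); the function names are hypothetical.

```python
import gzip


def iter_entries(lines):
    """Yield Swiss-Prot flat-file entries as strings; each entry ends with '//'."""
    entry = []
    for line in lines:
        entry.append(line)
        if line.startswith("//"):
            yield "".join(entry)
            entry = []


def accessions(entry):
    """Collect accession numbers from the entry's 'AC   ' lines."""
    accs = []
    for line in entry.splitlines():
        if line.startswith("AC   "):
            accs.extend(a.strip() for a in line[5:].split(";") if a.strip())
    return accs


def extract_entry(path, accession):
    """Return the first entry listing `accession` among its ACs, or None."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for entry in iter_entries(handle):
            if accession in accessions(entry):
                return entry
    return None
```

For example, writing the string returned by extract_entry("uniprot_sprot.dat.gz", "Q7N3Q6") to a new file gives a one-entry input for load_seqdatabase.pl. Matching on all AC accessions, not just the first, also catches secondary accessions.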


From cjfields at uiuc.edu  Fri Dec  1 12:57:18 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 1 Dec 2006 11:57:18 -0600
Subject: [Bioperl-l] Bioperl 1.5.2 RC5 install on WinXP ActivePerl 5.8.8.819
In-Reply-To: <45703985.5050203@sendu.me.uk>
Message-ID: <006301c71572$24be8830$15327e82@pyrimidine>


> Chris Fields wrote:
> PPM).  I can 
> > see using CPAN as an alternative way of installing Bioperl for a 
> > distribution, or as the primary method via CVS or manually, but not 
> > for distributions.  At least not until the kinks are worked out for 
> > Windows users.
> 
> CPAN isn't being suggested as the primary or preferred 
> installation method for Windows. That will still be PPM. I'm 
> mentioning CPAN / manual installation in the Windows INSTALL 
> docs for the benefit of anyone who wants a simple install and 
> test environment when checking out from CVS.

That's fine by me. I think the focus is making sure the PPM works, but that
shouldn't hold up the final 1.5.2 release. The PPM for previous releases
was never released concurrently with the distribution (if at all); it
generally followed a final release by a few weeks to a few months.

> > What are the significant issues for a bioperl PPM installation
> 
> None that I'm aware of - I just need to find the time to 
> start looking into generating an appropriate PPD. Hopefully 
> Nathan's wiki page on the subject will be all I need.

I'll try testing it out today and next week (the more people we have looking
into the issue the better). I'm sure that Module::Build hasn't been updated to
use PPM4 XML formatting, but the tags are similar enough. I can always
create a local PPM database using a directory structure similar to
bioperl.org/DIST and test an installation from it.

chris


From n.haigh at sheffield.ac.uk  Fri Dec  1 13:52:55 2006
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Fri, 01 Dec 2006 18:52:55 +0000
Subject: [Bioperl-l] Bioperl 1.5.2 RC5 install on WinXP ActivePerl 5.8.8.819
In-Reply-To: <006301c71572$24be8830$15327e82@pyrimidine>
References: <006301c71572$24be8830$15327e82@pyrimidine>
Message-ID: <45707A07.7000106@sheffield.ac.uk>

Chris Fields wrote:
>> Chris Fields wrote:
>> PPM).  I can 
>>     
>>> see using CPAN as an alternative way of installing Bioperl for a 
>>> distribution, or as the primary method via CVS or manually, but not 
>>> for distributions.  At least not until the kinks are worked out for 
>>> Windows users.
>>>       
>> CPAN isn't being suggested as the primary or preferred 
>> installation method for Windows. That will still be PPM. I'm 
>> mentioning CPAN / manual installation in the Windows INSTALL 
>> docs for the benefit of anyone who wants a simple install and 
>> test environment when checking out from CVS.
>>     
>
> That's fine by me.  I think the focus is making sure the PPM works, but that
> shouldn't hold up the final 1.5.2 release.  The PPM for previous releases
> was never released concurrently with the distribution (if at all); it
> generally followed by a few weeks to a few months past a final release.
>
>   
>>> What are the significant issues for a bioperl PPM installation
>>>       
>> None that I'm aware of - I just need to find the time to 
>> start looking into generating an appropriate PPD. Hopefully 
>> Nathan's wiki page on the subject will be all I need.
>>     
>
> I'll try testing it out today and next week (the more people we have looking
> into the issue the better).  I'm sure that Module::Build hasn't updated to
> using PPM4 XML formatting, but the tags are similar enough.  I can always
> create a local PPM database using a similar directory structure to
> bioperl.org/DIST and test an installation from it.
>
> chris
>   

To clarify a few things about PPM4 XML and to highlight the main 
differences:

1) The use of PROVIDE and REQUIRE tags.
2) PPM4 XML "should" contain PROVIDE tags for ALL bioperl modules.
3) VERSION in PROVIDE and REQUIRE tags should be a float, not a comma
separated tuple as in PPM3 XML.
4) The VERSION in PROVIDE and REQUIRE is used internally for version
comparisons when upgrading, resolving prereqs, etc.
5) All module names should contain '::', either natively according to
their namespace or, if they don't have one natively, appended to the
end, e.g. "GD::".
6) The VERSION in the SOFTPKG element is for human readability only.
7) The NAME in SOFTPKG is used to identify which packages are actually
the same.

Nath
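Rules 3 to 5 above lend themselves to a small sketch. The Python here is a hypothetical helper, not part of PPM or Module::Build: it appends '::' to names that lack it, converts a PPM3-style comma tuple to a float-style string assuming Perl's usual numeric-version convention (each component after the first padded to three digits, so 1.5.2 becomes 1.005002), and emits a PROVIDE element whose NAME/VERSION attribute layout is assumed rather than taken from the PPM4 documentation.

```python
from xml.sax.saxutils import quoteattr


def ppm4_module_name(name):
    """Rule 5: names must contain '::'; append it when the name has none."""
    return name if "::" in name else name + "::"


def ppm3_to_float(version):
    """Convert a PPM3 comma tuple such as '1,5,2,0' to a float-style string.

    Assumes Perl's numify convention: every component after the first
    becomes three decimal digits (1.5.2 -> 1.005002); trailing zeros dropped.
    """
    major, *rest = (int(part) for part in version.split(","))
    digits = "".join(f"{part:03d}" for part in rest).rstrip("0")
    return f"{major}.{digits}" if digits else str(major)


def provide_tag(name, version):
    """Build a PROVIDE element with NAME and VERSION attributes (layout assumed)."""
    return f"<PROVIDE NAME={quoteattr(ppm4_module_name(name))} VERSION={quoteattr(version)}/>"
```

The zero-padding matters for rule 4: compared naively as floats, "1.10" ranks below "1.9", whereas the padded forms 1.010 and 1.009 compare in release order.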




From bix at sendu.me.uk  Fri Dec  1 13:52:44 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 01 Dec 2006 18:52:44 +0000
Subject: [Bioperl-l] Error with supplied lineages importing uniprot data
In-Reply-To: <45706291.80201@sendu.me.uk>
References: <1348.130.49.222.58.1164925169.squirrel@webmail.cs.pitt.edu>
	<45706291.80201@sendu.me.uk>
Message-ID: <457079FC.7010209@sendu.me.uk>

Sendu Bala wrote:
> pelikan at cs.pitt.edu wrote:
[snip]
>> load_seqdatabase.pl -drive mysql -dbname bioseqdb -dbuser root -dbpass
>> ********* -format swiss -safe c:\data\uniprot\uniprot_sprot.dat
>>
>> I am subsequently greeted by many of the following errors:
>>
>> Could not store Q7N3Q6:
> 
> I extracted just Q7N3Q6 from 
> ftp://ftp.expasy.org/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.dat.gz
> and was able to load it in using load_seqdatabase.pl under linux with no 
> errors. If you make a file with just that sequence do you still get the 
> error?
> 
> Is anyone else able to reproduce the problem?

In fact, if I just try to load it a second time, I can reproduce the problem.
The situation is similar to http://bugzilla.bioperl.org/show_bug.cgi?id=2092

And I have a tentative fix that extends Brian's fix there. It's committed to
HEAD only for now. I don't know anything about bioperl-db and don't have the
faintest clue why this is happening, nor the time to figure it out. Can
someone please have a proper look at this and decide if my fix is sane?

All I can say is that the test suites for bioperl-live and bioperl-db
continue to pass, but that isn't really saying much.


PS. Having used load_seqdatabase.pl to load a sequence, how do I remove
it afterwards?


From cjfields at uiuc.edu  Fri Dec  1 14:00:13 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 1 Dec 2006 13:00:13 -0600
Subject: [Bioperl-l] Error with supplied lineages importing uniprot data
In-Reply-To: <45706291.80201@sendu.me.uk>
References: <1348.130.49.222.58.1164925169.squirrel@webmail.cs.pitt.edu>
	<45706291.80201@sendu.me.uk>
Message-ID: <EAE311A7-DB66-4CFC-9598-EA6FCAED9B7F@uiuc.edu>


On Dec 1, 2006, at 11:12 AM, Sendu Bala wrote:

> pelikan at cs.pitt.edu wrote:
>> Hello all,
>>
>>  I'm running bioperl 1.5.2, bioperl-db 1.5.2 - RC005, under windows,
>> without Cygwin. The "make test"s have all completed without error.
>> This is my first time dealing with bioperl, so bear with me.
>>
>>    I've successfully loaded the most recent taxonomy information using the
>> biosql-schema scripts. After this, I attempted to load the most recent
>> release of the uniprot flat file dataset with the following command:
>>
>> load_seqdatabase.pl -drive mysql -dbname bioseqdb -dbuser root -dbpass
>> ********* -format swiss -safe c:\data\uniprot\uniprot_sprot.dat
>>
>> I am subsequently greeted by many of the following errors:
>>
>> Could not store Q7N3Q6:
>
> I extracted just Q7N3Q6 from
> ftp://ftp.expasy.org/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.dat.gz
> and was able to load it in using load_seqdatabase.pl under linux with no
> errors. If you make a file with just that sequence do you still get the
> error?
>
> Is anyone else able to reproduce the problem?

I can reproduce on both WinXP and Mac OS X using the latest
bioperl-db/bioperl-live and a BioSQL database preloaded with taxonomy.
Notably, the bug doesn't show up with a database lacking taxonomy,
where no lookup is used (I guess).

Here's some overly verbose debugging (apologies):

Loading saved.flat ...
attempting to load adaptor class for Bio::Seq::RichSeq
	attempting to load module Bio::DB::BioSQL::RichSeqAdaptor
attempting to load adaptor class for Bio::Seq
	attempting to load module Bio::DB::BioSQL::SeqAdaptor
instantiating adaptor class Bio::DB::BioSQL::SeqAdaptor
attempting to load adaptor class for Bio::Species
	attempting to load module Bio::DB::BioSQL::SpeciesAdaptor
instantiating adaptor class Bio::DB::BioSQL::SpeciesAdaptor
attempting to load adaptor class for Bio::Tree::Tree
	attempting to load module Bio::DB::BioSQL::TreeAdaptor
attempting to load adaptor class for Bio::Root::Root
	attempting to load module Bio::DB::BioSQL::RootAdaptor
attempting to load adaptor class for Bio::Root::RootI
	attempting to load module Bio::DB::BioSQL::RootIAdaptor
	attempting to load module Bio::DB::BioSQL::RootAdaptor
attempting to load adaptor class for Bio::Tree::TreeI
	attempting to load module Bio::DB::BioSQL::TreeIAdaptor
	attempting to load module Bio::DB::BioSQL::TreeAdaptor
attempting to load adaptor class for Bio::Tree::NodeI
	attempting to load module Bio::DB::BioSQL::NodeIAdaptor
	attempting to load module Bio::DB::BioSQL::NodeAdaptor
attempting to load adaptor class for Bio::Tree::TreeFunctionsI
	attempting to load module Bio::DB::BioSQL::TreeFunctionsIAdaptor
	attempting to load module Bio::DB::BioSQL::TreeFunctionsAdaptor
no adaptor found for class Bio::Tree::Tree
attempting to load adaptor class for Bio::DB::Taxonomy::list
	attempting to load module Bio::DB::BioSQL::listAdaptor
attempting to load adaptor class for Bio::DB::Taxonomy
	attempting to load module Bio::DB::BioSQL::TaxonomyAdaptor
no adaptor found for class Bio::DB::Taxonomy::list
attempting to load adaptor class for Bio::Annotation::Collection
	attempting to load module Bio::DB::BioSQL::CollectionAdaptor
attempting to load adaptor class for Bio::AnnotationCollectionI
	attempting to load module Bio::DB::BioSQL::AnnotationCollectionIAdaptor
	attempting to load module Bio::DB::BioSQL::AnnotationCollectionAdaptor
instantiating adaptor class Bio::DB::BioSQL::AnnotationCollectionAdaptor
attempting to load adaptor class for Bio::Annotation::TypeManager
	attempting to load module Bio::DB::BioSQL::TypeManagerAdaptor
no adaptor found for class Bio::Annotation::TypeManager
attempting to load adaptor class for Bio::Annotation::SimpleValue
	attempting to load module Bio::DB::BioSQL::SimpleValueAdaptor
instantiating adaptor class Bio::DB::BioSQL::SimpleValueAdaptor
attempting to load adaptor class for Bio::Annotation::Reference
	attempting to load module Bio::DB::BioSQL::ReferenceAdaptor
instantiating adaptor class Bio::DB::BioSQL::ReferenceAdaptor
attempting to load adaptor class for Bio::Annotation::Comment
	attempting to load module Bio::DB::BioSQL::CommentAdaptor
instantiating adaptor class Bio::DB::BioSQL::CommentAdaptor
attempting to load adaptor class for Bio::Annotation::DBLink
	attempting to load module Bio::DB::BioSQL::DBLinkAdaptor
instantiating adaptor class Bio::DB::BioSQL::DBLinkAdaptor
attempting to load adaptor class for Bio::PrimarySeq
	attempting to load module Bio::DB::BioSQL::PrimarySeqAdaptor
instantiating adaptor class Bio::DB::BioSQL::PrimarySeqAdaptor
attempting to load adaptor class for Bio::SeqFeature::Generic
	attempting to load module Bio::DB::BioSQL::GenericAdaptor
attempting to load adaptor class for Bio::SeqFeatureI
	attempting to load module Bio::DB::BioSQL::SeqFeatureIAdaptor
	attempting to load module Bio::DB::BioSQL::SeqFeatureAdaptor
instantiating adaptor class Bio::DB::BioSQL::SeqFeatureAdaptor
attempting to load adaptor class for Bio::Location::Simple
	attempting to load module Bio::DB::BioSQL::SimpleAdaptor
attempting to load adaptor class for Bio::Location::Atomic
	attempting to load module Bio::DB::BioSQL::AtomicAdaptor
attempting to load adaptor class for Bio::LocationI
	attempting to load module Bio::DB::BioSQL::LocationIAdaptor
	attempting to load module Bio::DB::BioSQL::LocationAdaptor
instantiating adaptor class Bio::DB::BioSQL::LocationAdaptor
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Tree::Tree
no adaptor found for class Bio::DB::Taxonomy::list
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
attempting to load adaptor class for BioNamespace
	attempting to load module Bio::DB::BioSQL::BioNamespaceAdaptor
instantiating adaptor class Bio::DB::BioSQL::BioNamespaceAdaptor
no adaptor found for class Bio::Tree::Tree
no adaptor found for class Bio::DB::Taxonomy::list
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
attempting to load driver for adaptor class Bio::DB::BioSQL::BioNamespaceAdaptor
attempting to load driver for adaptor class Bio::DB::BioSQL::BasePersistenceAdaptor
Using Bio::DB::BioSQL::mysql::BasePersistenceAdaptorDriver as driver peer for Bio::DB::BioSQL::BioNamespaceAdaptor
preparing UK select statement: SELECT biodatabase.biodatabase_id, biodatabase.name, biodatabase.authority FROM biodatabase WHERE name = ?
BioNamespaceAdaptor: binding UK column 1 to "Swiss-Prot" (namespace)
preparing INSERT statement: INSERT INTO biodatabase (name, authority) VALUES (?, ?)
BioNamespaceAdaptor::insert: binding column 1 to "Swiss-Prot" (namespace)
BioNamespaceAdaptor::insert: binding column 2 to "" (authority)
no adaptor found for class Bio::Tree::Tree
no adaptor found for class Bio::DB::Taxonomy::list
attempting to load driver for adaptor class Bio::DB::BioSQL::SpeciesAdaptor
Using Bio::DB::BioSQL::mysql::SpeciesAdaptorDriver as driver peer for Bio::DB::BioSQL::SpeciesAdaptor
preparing UK select statement: SELECT taxon_name.taxon_id, NULL, NULL, taxon.ncbi_taxon_id, taxon_name.name, NULL FROM taxon, taxon_name WHERE taxon.taxon_id = taxon_name.taxon_id AND name_class = ? AND ncbi_taxon_id = ?
SpeciesAdaptor: binding UK column 1 to "scientific name" (name_class)
SpeciesAdaptor: binding UK column 2 to "141679" (ncbi_taxid)
prepare SELECT CLASSIFICATION: SELECT name.name, node.node_rank FROM taxon node, taxon taxon, taxon_name name WHERE name.taxon_id = node.taxon_id AND taxon.left_value BETWEEN node.left_value AND node.right_value AND taxon.taxon_id = ? AND name.name_class = 'scientific name' ORDER BY node.left_value
attempting to load driver for adaptor class Bio::DB::BioSQL::SeqAdaptor
attempting to load driver for adaptor class Bio::DB::BioSQL::PrimarySeqAdaptor
attempting to load driver for adaptor class Bio::DB::BioSQL::BasePersistenceAdaptor
Using Bio::DB::BioSQL::mysql::BasePersistenceAdaptorDriver as driver peer for Bio::DB::BioSQL::SeqAdaptor
Could not store Q7N3Q6:
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: The supplied lineage does not start near 'Photorhabdus luminescens subsp. laumondii'
STACK: Error::throw
STACK: Bio::Root::Root::throw /Users/cjfields/src/bioperl-live/Bio/Root/Root.pm:359
STACK: Bio::Species::classification /Users/cjfields/src/bioperl-live/Bio/Species.pm:166
STACK: Bio::DB::Persistent::PersistentObject::AUTOLOAD /Library/Perl/5.8.6/Bio/DB/Persistent/PersistentObject.pm:552
STACK: Bio::DB::BioSQL::SpeciesAdaptor::populate_from_row /Library/Perl/5.8.6/Bio/DB/BioSQL/SpeciesAdaptor.pm:281
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::_build_object /Library/Perl/5.8.6/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:1305
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key /Library/Perl/5.8.6/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:973
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key /Library/Perl/5.8.6/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:852
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /Library/Perl/5.8.6/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:182
STACK: Bio::DB::Persistent::PersistentObject::create /Library/Perl/5.8.6/Bio/DB/Persistent/PersistentObject.pm:244
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /Library/Perl/5.8.6/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:169
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store /Library/Perl/5.8.6/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251
STACK: Bio::DB::Persistent::PersistentObject::store /Library/Perl/5.8.6/Bio/DB/Persistent/PersistentObject.pm:271
STACK: load_seqdatabase.pl:620
STACK: load_seqdatabase.pl:620
-----------------------------------------------------------

at load_seqdatabase.pl line 633


chris


From cjfields at uiuc.edu  Fri Dec  1 14:01:59 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 1 Dec 2006 13:01:59 -0600
Subject: [Bioperl-l] Bioperl 1.5.2 RC5 install on WinXP ActivePerl 5.8.8.819
In-Reply-To: <45707A07.7000106@sheffield.ac.uk>
References: <006301c71572$24be8830$15327e82@pyrimidine>
	<45707A07.7000106@sheffield.ac.uk>
Message-ID: <C233572F-BD36-4DBE-BE9B-2C097F4C939B@uiuc.edu>


On Dec 1, 2006, at 12:52 PM, Nathan S. Haigh wrote:

> Chris Fields wrote:
>>> Chris Fields wrote:
>>> PPM).  I can
>>>> see using CPAN as an alternative way of installing Bioperl for a  
>>>> distribution, or as the primary method via CVS or manually, but  
>>>> not for distributions.  At least not until the kinks are worked  
>>>> out for Windows users.
>>>>
>>> CPAN isn't being suggested as the primary or preferred  
>>> installation method for Windows. That will still be PPM. I'm  
>>> mentioning CPAN / manual installation in the Windows INSTALL docs  
>>> for the benefit of anyone who wants a simple install and test  
>>> environment when checking out from CVS.
>>>
>>
>> That's fine by me.  I think the focus is making sure the PPM works, but that
>> shouldn't hold up the final 1.5.2 release.  The PPM for previous releases
>> was never released concurrently with the distribution (if at all); it
>> generally followed a final release by a few weeks to a few months.
>>
>>
>>>> What are the significant issues for a bioperl PPM installation
>>>>
>>> None that I'm aware of - I just need to find the time to start  
>>> looking into generating an appropriate PPD. Hopefully Nathan's  
>>> wiki page on the subject will be all I need.
>>>
>>
>> I'll try testing it out today and next week (the more people we have looking
>> into the issue the better).  I'm sure that Module::Build hasn't updated to
>> using PPM4 XML formatting, but the tags are similar enough.  I can always
>> create a local PPM database using a similar directory structure to
>> bioperl.org/DIST and test an installation from it.
>> chris
>>
>
> To clarify a few things about PPM4 XML and to highlight the main differences:
>
> 1) The use of PROVIDE and REQUIRE tags.
> 2) PPM4 XML "should" contain PROVIDE tags for ALL bioperl modules.
> 3) VERSION in PROVIDE and REQUIRE tags should be floats, not the
>    comma-separated tuples used in PPM3 XML.
> 4) The VERSION in PROVIDE and REQUIRE is used internally for version
>    comparisons when upgrading, resolving prerequisites, etc.
> 5) Module names should all contain '::', either natively according to
>    their namespace or, if a name doesn't have one natively, appended to
>    the end, e.g. "GD::".
> 6) The VERSION in the SOFTPKG key is for human readability only.
> 7) The NAME in SOFTPKG is used to identify which packages are actually
>    the same.
>
> Nath

Okay.  Maybe place this in the wiki (PPM page) for future reference?

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign
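Rules 3 and 5 in Nath's PPM4 list above can be illustrated with a small Perl sketch that renders PROVIDE/REQUIRE elements. It is not taken from PPM or Module::Build: the helpers `ppm4_name` and `ppm4_element` are hypothetical, the `NAME="..." VERSION="..."` attribute layout is an assumption about PPM4 PPD files, and the version numbers are made-up placeholders.

```perl
use strict;
use warnings;

# Hypothetical helper: PPM4 wants every module name to contain '::',
# so a bare top-level name such as "GD" becomes "GD::".
sub ppm4_name {
    my ($name) = @_;
    return $name =~ /::/ ? $name : "${name}::";
}

# Hypothetical helper: build one PROVIDE or REQUIRE element. The version must
# be a plain decimal number; PPM3-style comma tuples such as "1,5,2,0" are
# rejected. (The NAME/VERSION attribute layout is an assumption.)
sub ppm4_element {
    my ($tag, $name, $version) = @_;
    die "unknown tag '$tag'\n" unless $tag eq 'PROVIDE' || $tag eq 'REQUIRE';
    die "version '$version' is not a decimal number\n"
        unless $version =~ /^\d+(?:\.\d+)?$/;
    return sprintf '<%s NAME="%s" VERSION="%s"/>',
        $tag, ppm4_name($name), $version;
}

# Placeholder versions, for illustration only.
print ppm4_element('PROVIDE', 'Bio::Seq', '1.005002'), "\n";
print ppm4_element('REQUIRE', 'GD',       '1.3'),      "\n";
```

Running it prints `<PROVIDE NAME="Bio::Seq" VERSION="1.005002"/>` followed by `<REQUIRE NAME="GD::" VERSION="1.3"/>`. Rejecting tuples outright, rather than converting them, keeps the sketch from guessing how PPM4 maps PPM3 versions to floats.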


From n.haigh at sheffield.ac.uk  Fri Dec  1 14:05:38 2006
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Fri, 01 Dec 2006 19:05:38 +0000
Subject: [Bioperl-l] Bioperl 1.5.2 RC5 installonWinXPActivePerl	5.8.8.819
In-Reply-To: <006301c71572$24be8830$15327e82@pyrimidine>
References: <006301c71572$24be8830$15327e82@pyrimidine>
Message-ID: <45707D02.9070504@sheffield.ac.uk>

Chris Fields wrote:
>> Chris Fields wrote:
>> PPM).  I can 
>>     
>>> see using CPAN as an alternative way of installing Bioperl for a 
>>> distribution, or as the primary method via CVS or manually, but not 
>>> for distributions.  At least not until the kinks are worked out for 
>>> Windows users.
>>>       
>> CPAN isn't being suggested as the primary or preferred 
>> installation method for Windows. That will still be PPM. I'm 
>> mentioning CPAN / manual installation in the Windows INSTALL 
>> docs for the benefit of anyone who wants a simple install and 
>> test environment when checking out from CVS.
>>     
>
> That's fine by me.  I think the focus is making sure the PPM works, but that
> shouldn't hold up the final 1.5.2 release.  The PPM for previous releases
> was never released concurrently with the distribution (if at all); it
> generally followed a final release by a few weeks to a few months.
>
>   

Forgot to say, one really annoying thing about PPM is that it seems to
display all the versions of Bioperl defined in the XML file. In
addition, I think a bug in PPM4 means that if a package is available in
ActiveState's repo, PPM4 always wants to install it rather than a more
recent version in a different repo (this includes upgrades). This
results in the following annoying behaviour:
1) If the ActiveState and bioperl repos are both active, searching for
bioperl lists several versions.
2) If you are using the PPM4 GUI and have installed a non-ActiveState
version, it says you can upgrade to the version in ActiveState's
repo (even if that is actually a downgrade).
3) Using ppm-shell, if you choose "install bioperl" or "upgrade bioperl",
it will always install the version in the ActiveState repo.
4) I'm sure there are also some other annoyances.

In the end, this means the best way to install and upgrade bioperl is to
search for bioperl packages and install the latest version by eye rather
than relying on the "upgrade" feature (at least for the time being). You
may also need to remove an old version of bioperl before installing a
more recent one. NOTE: using "upgrade" runs the risk of installing
bioperl 1.2.3 from ActiveState rather than the latest version in any other repo!

I'll update the wiki when I have time.
Nath


>>> What are the significant issues for a bioperl PPM installation
>>>       
>> None that I'm aware of - I just need to find the time to 
>> start looking into generating an appropriate PPD. Hopefully 
>> Nathan's wiki page on the subject will be all I need.
>>     
>
> I'll try testing it out today and next week (the more people we have looking
> into the issue the better).  I'm sure that Module::Build hasn't updated to
> using PPM4 XML formatting, but the tags are similar enough.  I can always
> create a local PPM database using a similar directory structure to
> bioperl.org/DIST and test an installation from it.
>
> chris
>




From cjfields at uiuc.edu  Fri Dec  1 14:06:53 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 1 Dec 2006 13:06:53 -0600
Subject: [Bioperl-l] Error with supplied lineages importing uniprot data
In-Reply-To: <45706291.80201@sendu.me.uk>
References: <1348.130.49.222.58.1164925169.squirrel@webmail.cs.pitt.edu>
	<45706291.80201@sendu.me.uk>
Message-ID: <0B67001A-9642-422E-A9FB-C9611004510E@uiuc.edu>


On Dec 1, 2006, at 11:12 AM, Sendu Bala wrote:

> pelikan at cs.pitt.edu wrote:
>> Hello all,
>>
>>  I'm running bioperl 1.5.2, bioperl-db 1.5.2 - RC005, under windows,
>> without Cygwin. The "make test"s have all completed without error. This
>> is my first time dealing with bioperl, so bear with me.
>>
>>    I've successfully loaded the most recent taxonomy information using the
>> biosql-schema scripts. After this, I attempted to load the most recent
>> release of the uniprot flat file dataset with the following command:
>>
>> load_seqdatabase.pl -drive mysql -dbname bioseqdb -dbuser root -dbpass
>> ********* -format swiss -safe c:\data\uniprot\uniprot_sprot.dat
>>
>> I am subsequently greeted by many of the following errors:
>>
>> Could not store Q7N3Q6:
>
> I extracted just Q7N3Q6 from
> ftp://ftp.expasy.org/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.dat.gz
> and was able to load it in using load_seqdatabase.pl under linux with no
> errors. If you make a file with just that sequence do you still get the
> error?
>
> Is anyone else able to reproduce the problem?

Okay, just updated to get your latest CVS fixes for bioperl-live and  
it passes now for both Mac OS X and WinXP.

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Fri Dec  1 14:09:15 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 1 Dec 2006 13:09:15 -0600
Subject: [Bioperl-l] Error with supplied lineages importing uniprot data
In-Reply-To: <457079FC.7010209@sendu.me.uk>
References: <1348.130.49.222.58.1164925169.squirrel@webmail.cs.pitt.edu>
	<45706291.80201@sendu.me.uk> <457079FC.7010209@sendu.me.uk>
Message-ID: <A85B86B9-3DCD-4855-AC06-675D19E3689E@uiuc.edu>


On Dec 1, 2006, at 12:52 PM, Sendu Bala wrote:

>
> PS. having used load_seqdatabase.pl to load a sequence, how do I remove
> it afterwards?

There's not much documentation on it, but it is demonstrated several
times in the test suite.

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From bix at sendu.me.uk  Fri Dec  1 14:39:17 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 01 Dec 2006 19:39:17 +0000
Subject: [Bioperl-l] Error with supplied lineages importing uniprot data
In-Reply-To: <0B67001A-9642-422E-A9FB-C9611004510E@uiuc.edu>
References: <1348.130.49.222.58.1164925169.squirrel@webmail.cs.pitt.edu>
	<45706291.80201@sendu.me.uk>
	<0B67001A-9642-422E-A9FB-C9611004510E@uiuc.edu>
Message-ID: <457084E5.2050300@sendu.me.uk>

Chris Fields wrote:
> 
> On Dec 1, 2006, at 11:12 AM, Sendu Bala wrote:
> 
>> pelikan at cs.pitt.edu wrote:
>>> Hello all,
>>>
>>>  I'm running bioperl 1.5.2, bioperl-db 1.5.2 - RC005, under windows,
>>> without Cygwin. The "make test"s have all completed without error. This
>>> is my first time dealing with bioperl, so bear with me.
>>>
>>>    I've successfully loaded the most recent taxonomy information using the
>>> biosql-schema scripts. After this, I attempted to load the most recent
>>> release of the uniprot flat file dataset with the following command:
>>>
>>> load_seqdatabase.pl -drive mysql -dbname bioseqdb -dbuser root -dbpass
>>> ********* -format swiss -safe c:\data\uniprot\uniprot_sprot.dat
>>>
>>> I am subsequently greeted by many of the following errors:
>>>
>>> Could not store Q7N3Q6:
>>
>> I extracted just Q7N3Q6 from
>> ftp://ftp.expasy.org/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.dat.gz 
>>
>> and was able to load it in using load_seqdatabase.pl under linux with no
>> errors. If you make a file with just that sequence do you still get the
>> error?
>>
>> Is anyone else able to reproduce the problem?
> 
> Okay, just updated to get your latest CVS fixes for bioperl-live and it 
> passes now for both Mac OS X and WinXP.

Can you confirm if it is actually working correctly though? Like, having 
stored a previously-problem sequence, can you get it back out from the 
database and is its ->species() correct?


From cjfields at uiuc.edu  Fri Dec  1 14:52:13 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 1 Dec 2006 13:52:13 -0600
Subject: [Bioperl-l] Error with supplied lineages importing uniprot data
In-Reply-To: <457084E5.2050300@sendu.me.uk>
Message-ID: <000001c71582$329d4d50$15327e82@pyrimidine>

> > 
> > Okay, just updated to get your latest CVS fixes for bioperl-live and
> > it passes now for both Mac OS X and WinXP.
> 
> Can you confirm if it is actually working correctly though? 
> Like, having stored a previously-problem sequence, can you 
> get it back out from the database and is its ->species() correct?

I would assume so, if we can trust the species tests.  I will have to try it
again over the weekend.  I planned on loading a ton of protein sequences in
anyway, most of which are bacterial; if anything breaks it will probably be
with those.

I think Jason and Hilmar were going to get together about the BioSQL paper
at the hackathon.  That may be a good place to bring up some of the species
issues, if they persist.

chris


From hlapp at gmx.net  Fri Dec  1 20:42:05 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri, 1 Dec 2006 20:42:05 -0500
Subject: [Bioperl-l] Error with supplied lineages importing uniprot data
In-Reply-To: <457079FC.7010209@sendu.me.uk>
References: <1348.130.49.222.58.1164925169.squirrel@webmail.cs.pitt.edu>
	<45706291.80201@sendu.me.uk> <457079FC.7010209@sendu.me.uk>
Message-ID: <8414723F-BA02-4936-8F53-781276C3B526@gmx.net>

Either using SQL:

	-- theoretically you should convince yourself first that there
	-- is only one such record (the unique key is over accession,
	-- version and namespace)
	SQL> DELETE FROM bioentry WHERE accession = 'Q7N3Q6';

or through bioperl-db (see the delete test for examples):

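	use Bio::DB::BioDB;
	use Bio::PrimarySeq;

	my $db = Bio::DB::BioDB->new(....);
	# a bare object carrying only the unique-key fields of the entry
	my $seq = Bio::PrimarySeq->new(-accession_number=>'Q7N3Q6',
	                               -namespace=>'whatever you used when loading');
	my $adp = $db->get_persistence_adaptor($seq);
	# look up the stored entry by its unique key, then delete it
	my $pseq = $adp->find_by_unique_key($seq);
	$pseq->remove();
	$pseq->commit();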

-hilmar

On Dec 1, 2006, at 1:52 PM, Sendu Bala wrote:

> PS. having used load_seqdatabase.pl to load a sequence, how do I remove
> it afterwards?

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From chhalling at verizon.net  Sun Dec  3 20:56:51 2006
From: chhalling at verizon.net (Conrad Halling)
Date: Sun, 03 Dec 2006 20:56:51 -0500
Subject: [Bioperl-l] BioPerl Wiki is down
Message-ID: <45738063.1070504@verizon.net>

When I attempted to navigate to http://www.bioperl.org/, I got the 
following message:

A database query syntax error has occurred. This may indicate a bug in 
the software. The last attempted database query was:

    (SQL query hidden)

from within function "MediaWikiBagOStuff::_doquery". MySQL returned 
error "1205: Lock wait timeout exceeded; try restarting transaction 
(localhost)".

-- 
Conrad Halling
chhalling at verizon.net


From rbirnie at totalise.co.uk  Sun Dec  3 16:38:02 2006
From: rbirnie at totalise.co.uk (richard)
Date: Sun, 3 Dec 2006 21:38:02 +0000
Subject: [Bioperl-l] confused by Bio::Graphics
Message-ID: <200612032138.02522.rbirnie@totalise.co.uk>

Hi all,

I'm having a little trouble getting Bio::Graphics to give me the correct 
output and I'm looking for some help. I am trying to extend from example 5 of 
the Graphics HOWTO on the bioperl wiki using version 1.4 of Bioperl. 
Eventually I intend the script to follow example 6 but I thought I'd try the 
simpler version first.

The basic aim of the script is that it takes as input a file containing a list 
of GenBank IDs plus some other info for alternative transcripts of a gene. 
This information is stored in a hash and the GenBank IDs are used to retrieve 
the appropriate entries from GenBank. I then want to use Bio::Graphics to 
generate a figure from the feature tables showing the CDSs from the 
alternative transcripts. 

So far I have managed to retrieve the GenBank entries, extract the feature
tables and store a reference to these in the hash mentioned above. I've also 
got Bio::Graphics to draw a basic image but some of the details aren't right 
and I don't understand why. I have attached the code I have so far, the input 
file and the output image to this mail. I didn't want to display it all in 
the main message but I'm not actually sure which bit is causing the problem. 
The code is very rough and in need of polishing but I need to get it to work 
correctly first.

These are the problems:
1) As I understand it this:

my $wholeseq = Bio::SeqFeature::Generic->new (
		-start => 1,
		-end => $refseq->length,
		-display_name =>$refseq->display_name
		);

should display the name of the gene (CD133/Prominin1) near the top of the image.
It doesn't; am I misunderstanding or is there an error in the code?

2) In the quoted example the CDS is broken up into smaller regions which are 
then linked together in example 6. This isn't happening in my code and I 
think it should be, I get one solid block for the CDS. I don't understand why 
this is because I'm not clear which parts of the feature table are used to 
define where the CDS should be split. I think this is the relevant bit of 
code:

foreach my $alt_trans (keys %main) {
	foreach my $tag (keys %{ $main{$alt_trans}{'features'} }) {

		my $feature = $main{$alt_trans}{'features'}{$tag};

		$panel->add_track($feature,
				-glyph => 'generic',
				-bgcolor => $colors[$idx++ % @colors],
				-fgcolor => 'black',
				-font2color => 'black',
				-key => $alt_trans,
				-bump => +1,
				-height => 8,
				-label => 1,
				-description => 1,
				) if ($tag eq 'CDS');

}
}

Can anyone tell me what I am doing wrong?

RefSeq entry for the gene of interest is here:
http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=5174386
If I understand correctly the example file used in the HOWTO is this gene:
http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=116805320

Final question: does bioperl come with example scripts and, if so, where would
they normally be found on a Linux system?

If anyone is still reading this, thanks for your patience. Any clarification
will be appreciated.

regards,
Richard
-------------- next part --------------
A non-text attachment was scrubbed...
Name: CD133_graphic_code
Type: application/x-perl
Size: 2702 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061203/d5bd52ae/attachment-0002.bin>
-------------- next part --------------
sequence_ID	Exon_Boundary	Assay_location	Amplicon_length
NM_006017	9 - 10	1118	106
AF027208.1	9 - 10	1118	106
AK027420.1	9 - 10	1312	106
AK027422.1	9 - 10	1334	106
BC012089.1	9 - 10	1289	106
AY449689.1	8 - 9	1054	106
AY449690.1	8 - 9	1054	106
AY449691.1	8 - 9	1054	106
AY449692.1	9 - 10	1081	106
AY449693.1	9 - 10	1081	106
AF507034.1	8 - 9	1091	106
AK075411.1	9 - 10	1289	106
AF117225.1	9 - 10	1334	106
AK226033.1	-	1312	106
DQ895452.1	-	1054	106
-------------- next part --------------
A non-text attachment was scrubbed...
Name: CD133.png
Type: image/png
Size: 4322 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061203/d5bd52ae/attachment-0002.png>

From cjfields at uiuc.edu  Sun Dec  3 22:35:17 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 3 Dec 2006 21:35:17 -0600
Subject: [Bioperl-l] BioPerl Wiki is down
In-Reply-To: <45738063.1070504@verizon.net>
References: <45738063.1070504@verizon.net>
Message-ID: <41422FC7-B579-4B45-B8CC-341B8F462BCB@uiuc.edu>

On Dec 3, 2006, at 7:56 PM, Conrad Halling wrote:

> When I attempted to navigate to http://www.bioperl.org/, I got the
> following message:
>
> A database query syntax error has occurred. This may indicate a bug in
> the software. The last attempted database query was:
>
>     (SQL query hidden)
>
> from within function "MediaWikiBagOStuff::_doquery". MySQL returned
> error "1205: Lock wait timeout exceeded; try restarting transaction
> (localhost)".
>
> -- Conrad Halling
> chhalling at verizon.net

This has been an ongoing problem with the server; I have reported it  
previously to open-bio support.  There have been a few attempts to  
fix it which seem to work short-term but something else must be  
wrong.  Jason?  Chris D?

For my part, Googling found the following link, which indicates that  
this error may be due to heavy server load:

http://tibia.erig.net/TibiaWiki:Bug_reports

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From Derek.Fairley at bll.n-i.nhs.uk  Mon Dec  4 05:18:37 2006
From: Derek.Fairley at bll.n-i.nhs.uk (Fairley, Derek)
Date: Mon, 4 Dec 2006 10:18:37 -0000
Subject: [Bioperl-l] confused by Bio::Graphics
In-Reply-To: <200612032138.02522.rbirnie@totalise.co.uk>
Message-ID: <B4B8F9CCEDA9334F819017E5D711AD1C32C63D@bllmail.bll.n-i.nhs.uk>

Richard,

 
You can find instructions for installing the example scripts directory
here:

http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix#INSTALLING_BIOPERL_SCRIPTS

or you can get individual scripts from here:

http://www.bioperl.org/wiki/Bioperl_scripts11

Derek.

 
From rbirnie at totalise.co.uk  Mon Dec  4 04:30:36 2006
From: rbirnie at totalise.co.uk (rbirnie at totalise.co.uk)
Date: 04 Dec 2006 09:30:36 +0000
Subject: [Bioperl-l] confused by Bio::Graphics
In-Reply-To: <mailman.161.1165197640.2417.bioperl-l@lists.open-bio.org>
References: <mailman.161.1165197640.2417.bioperl-l@lists.open-bio.org>
Message-ID: <BV.WM.2.0.pv.1.0.16.0612040930360.48622@webm7.global.net.uk>

An HTML attachment was scrubbed...
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061204/551f1442/attachment-0002.html>

From bix at sendu.me.uk  Mon Dec  4 09:37:16 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 04 Dec 2006 14:37:16 +0000
Subject: [Bioperl-l] BLASTing with a seqio/seq object...
In-Reply-To: <45706671.9000201@york.ac.uk>
References: <01ba01c714a2$b9659c10$15327e82@pyrimidine>	<456F27E9.70205@york.ac.uk>
	<456FEF22.4090004@sendu.me.uk> <45706671.9000201@york.ac.uk>
Message-ID: <4574329C.2030905@sendu.me.uk>

Samantha Thompson wrote:
> Hi,
> Thanks for all your help so far, I am still trying to understand a 
> couple of things...

You should make sure your replies are sent to the list, as you're likely 
to get a faster response.


[where $blast_report is the value returned by 
Bio::Tools::Run::RemoteBlast->submit_blast($seq_object)]
> when I run this line..
> 
> $searchio = Bio::SearchIO->new(-format => 'blast',
>                                -file   => $blast_report);
> 
> between submitting the blast search and trying to process the searchio object like I was attempting before, I get the following errors back:
> 
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Could not open 1: No such file or directory
[snip]
> Does this mean that my BLAST is failing when I submit it?

No, the -file option of SearchIO->new() takes, unsurprisingly, a 
filename. I'd tell you to pay careful attention to the docs, but sadly 
the RemoteBlast docs are currently wrong.

submit_blast() claims to return 'Blast report object' (which in any case 
certainly wouldn't be a filename) when in fact it returns, as you 
discovered, a (for our purposes) meaningless number.

As I suggested before, you need to look at the synopsis for 
Bio::Tools::Run::RemoteBlast instead.

(having called submit_blast you must do the each_rid loop)
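For reference, a minimal sketch of that loop, modelled on the Bio::Tools::Run::RemoteBlast synopsis: `each_rid`, `retrieve_blast` and `remove_rid` are the factory methods used there, but the wrapper `collect_blast_results` and its polling interval are hypothetical. The return convention assumed here (a negative number on error, 0 while the job is still running, a Bio::SearchIO object once it has finished) should be checked against the installed POD, given the doc problems mentioned above.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the each_rid loop. $factory is assumed to be a
# Bio::Tools::Run::RemoteBlast object on which submit_blast() has already
# been called; returns whatever next_result() yields for each finished RID.
sub collect_blast_results {
    my ($factory, $poll_seconds) = @_;
    $poll_seconds = 5 unless defined $poll_seconds;
    my @results;
    while (my @rids = $factory->each_rid) {
        for my $rid (@rids) {
            my $rc = $factory->retrieve_blast($rid);
            if (!ref $rc) {
                # Not an object: a negative value means the job failed,
                # otherwise it is still running, so wait and ask again.
                if ($rc < 0) {
                    $factory->remove_rid($rid);
                } else {
                    sleep $poll_seconds;
                }
                next;
            }
            # Finished: $rc is a Bio::SearchIO stream, so drain its results.
            while (my $result = $rc->next_result) {
                push @results, $result;
            }
            $factory->remove_rid($rid);
        }
    }
    return @results;
}
```

After `$factory->submit_blast($seq_object)`, `my @results = collect_blast_results($factory);` gives result objects whose hits can be walked with `next_hit`. If you want a report file to hand to Bio::SearchIO's `-file` option, the synopsis calls `$factory->save_output($filename)` inside the loop, before `remove_rid`.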


Does anyone care to go through the POD for RemoteBlast and update it to 
an accurate state?


From bix at sendu.me.uk  Mon Dec  4 09:40:27 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 04 Dec 2006 14:40:27 +0000
Subject: [Bioperl-l] confused by Bio::Graphics
In-Reply-To: <BV.WM.2.0.pv.1.0.16.0612040930360.48622@webm7.global.net.uk>
References: <mailman.161.1165197640.2417.bioperl-l@lists.open-bio.org>
	<BV.WM.2.0.pv.1.0.16.0612040930360.48622@webm7.global.net.uk>
Message-ID: <4574335B.805@sendu.me.uk>

rbirnie at totalise.co.uk wrote:
> Hi all,
> 
> I've just seen my previous mail come through on the digest and I noticed 
> that the code I attached has been scrubbed which means that the message 
> won't make much sense. If I've contravened list rules by posting 
> attachments then apologies, I did look for a posting guide but couldn't 
> see one on the wiki. I deliberately didn't put the whole code in the
> main message because it's quite long. I'm not sure which part is wrong,
> so I don't know which part to post; I'm just not seeing the output I
> would expect from the example. What is the best thing for me to do?

I saw a few attachments on your post (including your code example), so I 
think what you did was fine.


From cjfields at uiuc.edu  Mon Dec  4 10:40:20 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 4 Dec 2006 09:40:20 -0600
Subject: [Bioperl-l] confused by Bio::Graphics
In-Reply-To: <4574335B.805@sendu.me.uk>
Message-ID: <002001c717ba$823c1500$15327e82@pyrimidine>


> rbirnie at totalise.co.uk wrote:
> > Hi all,
> > 
> > I've just seen my previous mail come through on the digest and I
> > noticed that the code I attached has been scrubbed which means that
> > the message won't make much sense. If I've contravened list rules by
> > posting attachments then apologies, I did look for a posting guide but
> > couldn't see one on the wiki. I deliberately didn't put the whole code
> > in the main message because it's quite long. I'm not sure which part
> > is wrong, so I don't know which part to post; I'm just not seeing the
> > output I would expect from the example. What is the best thing for me to do?
> 
> I saw a few attachments on your post (including your code 
> example), so I think what you did was fine.

Same here.  I received a PNG file and two text files (a script and a data
file).

chris

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 

 
From rbirnie at totalise.co.uk  Mon Dec  4 11:06:51 2006
From: rbirnie at totalise.co.uk (rbirnie at totalise.co.uk)
Date: 04 Dec 2006 16:06:51 +0000
Subject: [Bioperl-l] confused by Bio::Graphics
In-Reply-To: <002001c717ba$823c1500$15327e82@pyrimidine>
References: <002001c717ba$823c1500$15327e82@pyrimidine>
Message-ID: <BV.WM.2.0.pv.1.0.16.0612041606510.37306@webm5.global.net.uk>

An HTML attachment was scrubbed...
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061204/22c3c5e0/attachment-0002.html>

From dmessina at wustl.edu  Mon Dec  4 11:46:16 2006
From: dmessina at wustl.edu (David Messina)
Date: Mon, 4 Dec 2006 10:46:16 -0600
Subject: [Bioperl-l] confused by Bio::Graphics
In-Reply-To: <200612032138.02522.rbirnie@totalise.co.uk>
References: <200612032138.02522.rbirnie@totalise.co.uk>
Message-ID: <ACE259C3-DC1C-41CC-88F3-7ACF8B9D66AA@wustl.edu>

Hi Richard,


> [richard]
>
> These are the problems:
> 1) As I understand it this:
>
> my $wholeseq = Bio::SeqFeature::Generic->new (
> 		-start => 1,
> 		-end => $refseq->length,
> 		-display_name =>$refseq->display_name
> 		);
>
> should display the name of the gene (CD133/Prominin1) near the top of image.
> It doesn't, am I misunderstanding or is there an error in the code?

The contents of a sequence object's display_name vary depending on
the type of sequence record; for a sequence object created from a  
Genbank record, it's the value of the LOCUS field on the first line  
of the record.

If you want the gene name, you'll have to dig it out of the feature  
table. If you look at the  Genbank record for your first sequence,  
you'll see that under both the gene and CDS primary features, the  
HUGO gene abbreviation is stored under the "gene" secondary tag, and  
various synonyms are under the "note" and "product" secondary tags.

LOCUS       NM_006017               3794 bp    mRNA    linear   PRI 17-NOV-2006
DEFINITION  Homo sapiens prominin 1 (PROM1), mRNA.
ACCESSION   NM_006017
VERSION     NM_006017.1  GI:5174386
[...skipping irrelevant part of the Genbank record...]
FEATURES             Location/Qualifiers
      source          1..3794
                      /organism="Homo sapiens"
                      /mol_type="mRNA"
                      /db_xref="taxon:9606"
                      /chromosome="4"
                      /map="4p15.32"
      gene            1..3794
                      /gene="PROM1"
                      /note="prominin 1; synonyms: AC133, CD133, PROML1,
                      MSTP061"
                      /db_xref="GeneID:8842"
                      /db_xref="HGNC:9454"
                      /db_xref="HPRD:HPRD_05079"
                      /db_xref="MIM:604365"
      CDS             38..2635
                      /gene="PROM1"
                      /go_component="integral to plasma membrane [pmid 9389720];
                      membrane"
                      /go_process="response to stimulus; visual perception"
                      /note="hProminin; prominin (mouse)-like 1; hematopoietic
                      stem cell antigen"
                      /codon_start=1
                      /product="prominin 1"
                      /protein_id="NP_006008.1"
                      /db_xref="GI:5174387"
                      /db_xref="GeneID:8842"
                      /db_xref="HGNC:9454"
                      /db_xref="HPRD:HPRD_05079"
                      /db_xref="MIM:604365"
[....more...]

In your script, you grab the primary features between lines 34-60.
You can grab the value of the secondary tag you want with something like:

[cribbed from the Feature-Annotation HOWTO]
my @ids;
for my $feat_object ($seq_object->get_SeqFeatures) {
    # keep the value(s) of the "gene" tag from any feature that has one
    push @ids, $feat_object->get_tag_values("gene")
        if $feat_object->has_tag("gene");
}
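A fuller, self-contained version of the snippet above, offered as a hedged sketch: `gene_names` is a made-up helper (not a BioPerl method) that walks the features returned by `get_SeqFeatures`, keeps those carrying a `gene` tag, and returns each distinct value once. BioPerl is only loaded (via `require`) when a GenBank file is named on the command line; the file name in the usage comment is just an example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collect the distinct values of the "gene" tag (the /gene="..." qualifier)
# from all features of a sequence object. Works with any object offering
# get_SeqFeatures(), e.g. one returned by Bio::SeqIO's genbank parser.
sub gene_names {
    my ($seq) = @_;
    my (%seen, @names);
    for my $feat ($seq->get_SeqFeatures) {
        next unless $feat->has_tag('gene');
        for my $name ($feat->get_tag_values('gene')) {
            push @names, $name unless $seen{$name}++;   # keep first occurrence only
        }
    }
    return @names;
}

# Usage (hypothetical file name): perl gene_names.pl NM_006017.gb
if (@ARGV) {
    require Bio::SeqIO;    # loaded only when a file is actually given
    my $in = Bio::SeqIO->new(-file => $ARGV[0], -format => 'genbank');
    while (my $seq = $in->next_seq) {
        # for GenBank input, display_id is the LOCUS name (e.g. NM_006017)
        printf "%s\t%s\n", $seq->display_id, join(',', gene_names($seq));
    }
}
```

Run on the NM_006017 record, this should print the LOCUS name followed by PROM1, since the gene and CDS features carry the same /gene value; that string could then be used wherever your script currently uses display_name.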


> 2) In the quoted example the CDS is broken up into smaller regions which
> are then linked together in example 6. This isn't happening in my code
> and I think it should be; I get one solid block for the CDS. I don't
> understand why, because I'm not clear which parts of the feature table
> are used to define where the CDS should be split. I think this is the
> relevant bit of code:
>
> foreach my $alt_trans (keys %main) {
> 	foreach my $tag (keys %{ $main{$alt_trans}{'features'} }) {
>
> 		my $feature = $main{$alt_trans}{'features'}{$tag};
>
> 		$panel->add_track($feature,
> 				-glyph => 'generic',
> 				-bgcolor => $colors[$idx++ % @colors],
> 				-fgcolor => 'black',
> 				-font2color => 'black',
> 				-key => $alt_trans,
> 				-bump => +1,
> 				-height => 8,
> 				-label => 1,
> 				-description => 1,
> 				) if ($tag eq 'CDS');
>
> }
> }


The problem here is that RefSeq mRNA records don't contain intron-exon
boundary information. I think you'll have to get that from an
assembly record. From the Entrez Gene page for PROM1, I obtained a
GenBank record for the PROM1 genomic locus:

http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?val=NC_000004.10&from=15578955&to=15686664&strand=2&dopt=gb

Saving that as 'PROM1.gb' (the suffix is important) and running the
bp_embl2picture.pl script on it, I got an image similar to Figure 6
(attached).
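One way to see why the mRNA record gives a single block: its CDS has one simple location (38..2635), whereas a genomic record gives the CDS as a join(...) of exon ranges, which BioPerl parses into a split location. The hedged sketch below lists the pieces of each CDS, assuming a GenBank file such as the PROM1.gb above; `cds_segments` is a hypothetical helper, and it relies on `each_Location`, which returns the sub-locations of a split location, or the location itself for a simple one.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return [start, end] pairs for every piece of a feature's location.
# A simple location (e.g. 38..2635) yields one pair; a split location
# (a join(...) in a genomic record) yields one pair per segment.
sub cds_segments {
    my ($feat) = @_;
    return map { [ $_->start, $_->end ] } $feat->location->each_Location;
}

# Usage: perl cds_segments.pl PROM1.gb
if (@ARGV) {
    require Bio::SeqIO;    # loaded only when a file is actually given
    my $in = Bio::SeqIO->new(-file => $ARGV[0], -format => 'genbank');
    while (my $seq = $in->next_seq) {
        for my $feat (grep { $_->primary_tag eq 'CDS' } $seq->get_SeqFeatures) {
            my @segs = cds_segments($feat);
            printf "%s CDS: %d segment(s): %s\n", $seq->display_id, scalar(@segs),
                join(', ', map { "$_->[0]..$_->[1]" } @segs);
        }
    }
}
```

For the mRNA record this should report a single segment; for the genomic record, one segment per exon that contains coding sequence.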

Hope this helps,
Dave


-------------- next part --------------
A non-text attachment was scrubbed...
Name: PROM1.png
Type: image/png
Size: 8646 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061204/4add2cbc/attachment-0002.png>

From bix at sendu.me.uk  Mon Dec  4 14:37:13 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 04 Dec 2006 19:37:13 +0000
Subject: [Bioperl-l] Timeline on the 1.5.2 release?
In-Reply-To: <000001c717db$3ca7b910$15327e82@pyrimidine>
References: <000001c717db$3ca7b910$15327e82@pyrimidine>
Message-ID: <457478E9.3060405@sendu.me.uk>

Chris Fields wrote:
> Sendu,
> 
> Are current plans to still try getting the final 1.5.2 release out
> before the hackathon next week?

Yes, I seriously hope so. I was kind of hoping to see test results from 
you and Nathan on the wiki though...


> There are a few commits I want to make, but I may wait until after
> 1.5.2 is out before I add them.

But don't let the release stop you. As long as you don't commit to the
1.5.2 branch it will be fine.


From cjfields at uiuc.edu  Mon Dec  4 14:34:34 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 4 Dec 2006 13:34:34 -0600
Subject: [Bioperl-l] Timeline on the 1.5.2 release?
Message-ID: <000001c717db$3ca7b910$15327e82@pyrimidine>

Sendu,

Are current plans to still try getting the final 1.5.2 release out before
the hackathon next week?  There are a few commits I want to make, but I may
wait until after 1.5.2 is out before I add them.

chris

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


From cjfields at uiuc.edu  Mon Dec  4 15:23:45 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 4 Dec 2006 14:23:45 -0600
Subject: [Bioperl-l] Timeline on the 1.5.2 release?
In-Reply-To: <457478E9.3060405@sendu.me.uk>
Message-ID: <000001c717e2$19d18e00$15327e82@pyrimidine>

> Chris Fields wrote:
> > Sendu,
> > 
> > Are current plans to still try getting the final 1.5.2 release out 
> > before the hackathon next week?
> 
> Yes, I seriously hope so. I was kind of hoping to see test 
> results from you and Nathan on the wiki though...

Ah, forgot to post those!  Working on that now...

> > There are a few commits I want to make, but I may wait until after
> > 1.5.2 is out before I add them.
> 
> But don't let the release stop you. As long as you don't commit to the
> 1.5.2 branch it will be fine.

There are a few things I plan on adding over the next few weeks, including
some things for Bio::Location::SplitLocation. However, I'm sure some of the
latter will break tests, so I'll be adding them a bit at a time.

It all depends when I can squeeze time in to work on them!

chris 


From pelikan at cs.pitt.edu  Mon Dec  4 17:34:59 2006
From: pelikan at cs.pitt.edu (pelikan at cs.pitt.edu)
Date: Mon, 4 Dec 2006 17:34:59 -0500 (EST)
Subject: [Bioperl-l] Bioperl-db doesn't seem to load all entries
Message-ID: <4812.130.49.222.58.1165271699.squirrel@webmail.cs.pitt.edu>

Hello,

    My system is running bioperl 1.5.2, bioperl-db 1.5.2-005 RC, and the
latest MySQL under Windows, ActivePerl, without Cygwin. I have 4 GB of
memory. "make test" passes fine.

The problem is that the number of entries I end up with is nowhere near what
I expect when I load datasets using load_seqdatabase.pl. For instance, if I
want to load only proteins from Homo sapiens,
I go to UniProt,
use the database search function,
do a text search for Homo sapiens (returns 70914 hits),
export the hits to flat file format (--format swiss) using the data set
manager,
and load it using load_seqdatabase.pl.

Running "select count(*) from bioentry;" returns only 1003 entries.
Moreover, it seems like the entries don't go past the B's in the alphabet:
I can't find any bioentry.description like '%cytochrome%' or '%myoglobin%',
but I can find apolipoproteins, for example.

I know this is an annoying question, but if someone has more experience in
dealing with this issue, I would be grateful for any assistance. I don't
get any error messages, so it's difficult for me to tell what's going on.

-Richard


From n.haigh at sheffield.ac.uk  Tue Dec  5 01:53:34 2006
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Tue, 05 Dec 2006 06:53:34 +0000
Subject: [Bioperl-l] Timeline on the 1.5.2 release?
In-Reply-To: <457478E9.3060405@sendu.me.uk>
References: <000001c717db$3ca7b910$15327e82@pyrimidine>
	<457478E9.3060405@sendu.me.uk>
Message-ID: <4575176E.3020906@sheffield.ac.uk>

Sendu Bala wrote:
> Chris Fields wrote:
>   
>> Sendu,
>>
>> Are current plans to still try getting the final 1.5.2 release out
>> before the hackathon next week?
>>     
>
> Yes, I seriously hope so. I was kind of hoping to see test results from 
> you and Nathan on the wiki though...
>
>
>   

OK, I'll get onto this today.

>> There are a few commits I want to make, but I may wait until after
>> 1.5.2 is out before I add them.
>>     
>
> But don't let the release stop you. As long as you don't commit to the
> 1.5.2 branch it will be fine.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>   


-- 
> A: Yes.
>> Q: Are you sure?
>>     
>>> A: Because it reverses the logical flow of conversation.
>>>       
>>>> Q: Why is top posting frowned upon?
>>>>         
Get Thunderbird <http://www.mozilla.org/products/thunderbird/>


From n.haigh at sheffield.ac.uk  Tue Dec  5 06:43:16 2006
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Tue, 05 Dec 2006 11:43:16 +0000
Subject: [Bioperl-l] Timeline on the 1.5.2 release?
In-Reply-To: <457478E9.3060405@sendu.me.uk>
References: <000001c717db$3ca7b910$15327e82@pyrimidine>
	<457478E9.3060405@sendu.me.uk>
Message-ID: <45755B54.7080902@sheffield.ac.uk>

Sendu Bala wrote:
> Chris Fields wrote:
>   
>> Sendu,
>>
>> Are current plans to still try getting the final 1.5.2 release out
>> before the hackathon next week?
>>     
>
> Yes, I seriously hope so. I was kind of hoping to see test results from 
> you and Nathan on the wiki though...
>
>
>   

I've added my test results for Debian to the wiki.
Nath

>> There are a few commits I want to make, but I may wait until after
>> 1.5.2 is out before I add them.
>>     
>
> But don't let the release stop you. As long as you don't commit to the
> 1.5.2 branch it will be fine.


From bix at sendu.me.uk  Tue Dec  5 06:47:06 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 05 Dec 2006 11:47:06 +0000
Subject: [Bioperl-l] Timeline on the 1.5.2 release?
In-Reply-To: <45755B54.7080902@sheffield.ac.uk>
References: <000001c717db$3ca7b910$15327e82@pyrimidine>
	<457478E9.3060405@sendu.me.uk> <45755B54.7080902@sheffield.ac.uk>
Message-ID: <45755C3A.9050903@sendu.me.uk>

Nathan S. Haigh wrote:
> Sendu Bala wrote:
>> Chris Fields wrote:
>>   
>>> Sendu,
>>>
>>> Are current plans to still try getting the final 1.5.2 release out
>>> before the hackathon next week?
>>>     
>> Yes, I seriously hope so. I was kind of hoping to see test results from 
>> you and Nathan on the wiki though...
>
> I've added my test results for Debian to the wiki.

Thanks (and to Chris as well). I can't tell you how much I loathe and
despise TCoffee and Tmhmm now ;)


From cjfields at uiuc.edu  Tue Dec  5 11:04:38 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 5 Dec 2006 10:04:38 -0600
Subject: [Bioperl-l] Build.PL changes
Message-ID: <001b01c71887$10be3160$15327e82@pyrimidine>

Sendu,

I think the Build.PL commits which force installation of XML::SAX::Expat
should be rolled back.  XML::Simple works with any XML::SAX backend, not
just XML::SAX::Expat, which hasn't been actively maintained since 2003 and
is deprecated in favor of XML::SAX::ExpatXS.  In fact, forcing
XML::SAX::Expat to install as the default XML::SAX backend currently breaks
blastxml parsing.

Note that forcing this also forces one to install the Expat library (now at
version 2), which has some compatibility problems with XML::SAX::Expat (but
not ExpatXS).

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


From qetzal at tutopia.com.br  Wed Dec  6 10:21:20 2006
From: qetzal at tutopia.com.br (giovani)
Date: Wed, 06 Dec 2006 10:21:20 -0500
Subject: [Bioperl-l] Biodiversity graphic
Message-ID: <auto-000222418003@frontend01.cg.ifxnetworks.com>

An HTML attachment was scrubbed...
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061206/9d9e4a09/attachment-0002.html>

From benoit at ebi.ac.uk  Wed Dec  6 12:30:12 2006
From: benoit at ebi.ac.uk (Benoit Ballester)
Date: Wed, 06 Dec 2006 17:30:12 +0000
Subject: [Bioperl-l] Biodiversity graphic
In-Reply-To: <auto-000222418003@frontend01.cg.ifxnetworks.com>
References: <auto-000222418003@frontend01.cg.ifxnetworks.com>
Message-ID: <4576FE24.1030807@ebi.ac.uk>

giovani wrote:
> 
> Hello there. I'm trying to write a program to draw a graph with two
> axes and two data sets on each axis. Does anyone know a tool similar to
> the GD module for drawing this graph? I'm having trouble with GD.
> Here is an example of what I want to do:
> http://libshuff.mib.uga.edu/YvsX.png, and below is the code that I'm
> using with the GD module.


It looks to me like the graph you're pointing to was made with gnuplot.
Why don't you use gnuplot or R instead?

Ben

> 
> #!/usr/bin/perl
> use strict;
> use warnings;
> use GD::Graph::mixed;
> 
> # First row: x-axis labels; each following row is one data set
> my @data = (
>    ["1st","2nd","3rd","4th","5th","6th","7th", "8th", "9th"],
>    [    3,   4,   14,   30,   12,    8,    7,    20,    15],
>    [    2,   8,    2,    5,    3,    1,    3,     4,     1],
>    [    5,  12,   24,   33,   19,    8,    6,    15,    21],
>    [    1,   2,    5,    6,    3,  1.5,    1,     3,     4],
> );
> 
> my $my_graph = GD::Graph::mixed->new();
> $my_graph->set(
>        x_label           => 'X Label',
>        y1_label          => 'Y1 label',
>        y2_label          => 'Y2 label',
>        title             => 'Using two axes',
>        y1_max_value      => 40,
>        y2_max_value      => 8,
>        y_tick_number     => 8,
>        y_label_skip      => 2,
>        long_ticks        => 1,
>        two_axes          => 1,
>        # data sets 1 and 3 use the left axis, 2 and 4 the right axis
>        use_axis          => [1,2,1,2],
>        legend_placement  => 'BR',
>        x_labels_vertical => 1,
>        x_label_position  => 1/2,
> );
> 
> $my_graph->set_legend('X', 'XY', 'diff-X/XY', '95%XY');
> my $gd = $my_graph->plot(\@data) or die $my_graph->error;
> open(my $img, '>', 'graphTest.gif') or die "Cannot open file: $!\n";
> binmode $img;
> print $img $gd->gif;
> close $img;
> 


From gwu at molbio.mgh.harvard.edu  Wed Dec  6 16:12:57 2006
From: gwu at molbio.mgh.harvard.edu (gang wu)
Date: Wed, 06 Dec 2006 16:12:57 -0500
Subject: [Bioperl-l] Biodiversity graphic
In-Reply-To: <auto-000222418003@frontend01.cg.ifxnetworks.com>
References: <auto-000222418003@frontend01.cg.ifxnetworks.com>
Message-ID: <45773259.3010405@molbio.mgh.harvard.edu>

Do you mean the GD code cannot run, or that it does not generate the image
you wanted?

Gang

giovani wrote:
> Hello there. I'm trying to write a program to draw a graph with two
> axes and two data sets on each axis. [...]


From bix at sendu.me.uk  Wed Dec  6 17:39:49 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 06 Dec 2006 22:39:49 +0000
Subject: [Bioperl-l] Bioperl 1.5.2 Release
Message-ID: <457746B5.2020006@sendu.me.uk>

I am proud to announce the final release of Bioperl 1.5.2.

http://www.bioperl.org/wiki/Release_1.5.2

bioperl (core):
cpan>install S/SE/SENDU/bioperl-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-1.5.2_100.tar.bz2
http://bioperl.org/DIST/bioperl-1.5.2_100.zip

bioperl-run:
cpan>install S/SE/SENDU/bioperl-run-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-run-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-run-1.5.2_100.tar.bz2
http://bioperl.org/DIST/bioperl-run-1.5.2_100.zip

bioperl-db:
cpan>install S/SE/SENDU/bioperl-db-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-db-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-db-1.5.2_100.tar.bz2
http://bioperl.org/DIST/bioperl-db-1.5.2_100.zip

bioperl-network:
cpan>install S/SE/SENDU/bioperl-network-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-network-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-network-1.5.2_100.tar.bz2
http://bioperl.org/DIST/bioperl-network-1.5.2_100.zip

http://bioperl.org/DIST/SIGNATURES.md5

(all are also available via CVS, and for Windows users, using the Perl 
Package Manager - see the wiki for details)
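To check a download against SIGNATURES.md5, a small core-Perl sketch using Digest::MD5 may help. It assumes (the announcement does not say so) that the file uses the common md5sum layout of one "<hex digest>  <file name>" pair per line; `file_md5` and `read_signatures` are illustrative names, and the pattern may need adjusting if the layout differs.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5;

# Hex MD5 digest of a file's contents, read in binary mode.
sub file_md5 {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!\n";
    binmode $fh;
    my $digest = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;
    return $digest;
}

# Parse md5sum-style lines ("<32 hex digits>  <file name>") into a hashref
# of file name => expected digest. ASSUMPTION: SIGNATURES.md5 uses this layout.
sub read_signatures {
    my ($fh) = @_;
    my %expected;
    while (my $line = <$fh>) {
        next unless $line =~ /^([0-9a-fA-F]{32})\s+\*?(\S+)\s*$/;
        $expected{$2} = lc $1;
    }
    return \%expected;
}

# Usage: perl check_md5.pl SIGNATURES.md5 bioperl-1.5.2_100.tar.gz
if (@ARGV >= 2) {
    my ($sig_file, @downloads) = @ARGV;
    open my $sfh, '<', $sig_file or die "Cannot open $sig_file: $!\n";
    my $expected = read_signatures($sfh);
    close $sfh;
    for my $file (@downloads) {
        (my $name = $file) =~ s{.*[/\\]}{};    # compare on the bare file name
        my $want = $expected->{$name};
        if (!defined $want) { print "$name: not listed\n"; next }
        print "$name: ", (file_md5($file) eq $want ? 'OK' : 'MISMATCH'), "\n";
    }
}
```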

The other bioperl packages (bioperl-ext, bioperl-gui, bioperl-pedigree 
and bioperl-pipeline) did not see a unified release for 1.5.2.


This release represents a developer release which has been thoroughly
tested. We consider it the most stable (in terms of bugs) version of 
Bioperl and believe it to be suitable for most people. It is marked 
'developer' or even 'unstable' because its API may change on short 
notice. It will also not be maintained or supported beyond the next 
bioperl release.

1.5.2 introduces the following new (core) features:

  * Taxonomy (Bio::Species) overhaul
  * Bio::Map improvements
  * Bio::SearchIO speedup
  * Build.PL installation

For details, and a complete change log, see the wiki.

API documentation is available here: http://doc.bioperl.org/


Acknowledgements:
Innumerable thanks are due for the tireless efforts of Christopher Fields 
(bug fixing, testing, documentation, discussion), Nathan Haigh 
(Windows and prerequisite issues, testing) and Mauricio Herrera Cuadra 
(testing, documentation, support). Feedback and ideas provided by Hilmar 
Lapp, Jason Stajich, Torsten Seemann and others on the mailing list and 
elsewhere proved invaluable. None of this would have been possible 
without the behind-the-scenes work of the open-bio support team. I'd 
also like to acknowledge Andreas J. Koenig for his help with CPAN matters.

Finally, thank you to everyone who tried out the release candidates, and 
especially those that took the time to file bug reports or report problems.


Remember, Bioperl can only go from strength to strength with /your/ 
help. If you'd like to experience the fame and fortune that naturally 
follow becoming a Bioperl developer (?!), become one!
http://www.bioperl.org/wiki/Becoming_a_developer

On behalf of the Bioperl team,
Sendu Bala.


From cjfields at uiuc.edu  Wed Dec  6 21:30:44 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 6 Dec 2006 20:30:44 -0600
Subject: [Bioperl-l] Bioperl 1.5.2 Release
In-Reply-To: <457746B5.2020006@sendu.me.uk>
Message-ID: <000001c719a7$b48beb90$15327e82@pyrimidine>

Great job Sendu!  

A bit of icing on the cake: all the WinXP PPMs (core, db, network, run)
installed without a hitch, following the normal instructions, using PPM4
(both the GUI and the command-line shell) on clean ActiveState installations.
It looks like all the correct prereqs were installed with the shell (only
XML::SAX::ExpatXS was left out of the GUI installation, for reasons outlined
before).

I'll run more tests tomorrow to see if tests pass with the installed bioperl
(this should catch any prereq issues with PPM installation we missed).

chris

> I am proud to announce the final release of Bioperl 1.5.2.
> [...]


From hlapp at gmx.net  Wed Dec  6 22:20:14 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 6 Dec 2006 22:20:14 -0500
Subject: [Bioperl-l] Bioperl-db doesn't seem to load all entries
In-Reply-To: <4812.130.49.222.58.1165271699.squirrel@webmail.cs.pitt.edu>
References: <4812.130.49.222.58.1165271699.squirrel@webmail.cs.pitt.edu>
Message-ID: <8E15592D-6475-4A4D-BA6D-BD669C4233C3@gmx.net>

I seriously doubt that load_seqdatabase.pl would have deliberately  
stopped loading the file. Either there was an error in loading an  
entry (which you should see, and you can also ask the script to just  
keep going by providing the --safe option), or the file only  
contained 1003 entries.

Note that you can get progress logging by using the --logchunk  
option, which will also give you a final count of the number of  
sequences loaded.

I'm not sure how you ran your search and your download on Uniprot. If  
I try what you describe I get 70491 hits, and if I try to export them  
using the data set manager I get the message:

This download mechanism only supports 1000 proteins. The first 1000  
proteins have been added from the selected.

Which perfectly explains what you see.

Did you convince yourself that the file contains 70491 entries? If  
you don't have grep and wc on your windows machine, you can use perl  
one-liners directly, e.g.,

perl -n -e '/^ID / && ++$n; END {print "$n entries\n";}' <your-file-here>
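The same check as a small script may be easier on Windows, where cmd.exe does not treat single quotes as quoting characters, so the one-liner above needs double quotes there. This sketch (script and helper names are only illustrative) counts both "ID" lines and "//" end-of-entry lines in a Swiss-Prot/UniProt flat file; in a complete file the two numbers should match.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count entries in a Swiss-Prot/UniProt flat file read from $fh.
# Each entry starts with an "ID   " line and ends with a "//" line.
sub count_entries {
    my ($fh) = @_;
    my ($ids, $ends) = (0, 0);
    while (my $line = <$fh>) {
        $ids++  if $line =~ /^ID /;
        $ends++ if $line =~ m{^//};
    }
    return ($ids, $ends);
}

# Usage: perl count_entries.pl your-file-here
for my $file (@ARGV) {
    open my $fh, '<', $file or die "Cannot open $file: $!\n";
    my ($ids, $ends) = count_entries($fh);
    close $fh;
    print "$file: $ids ID lines, $ends entry terminators\n";
}
```

If the counts come out near 1000 rather than 70,000, the exported file itself only holds the first batch of entries, which matches the data set manager limit quoted above.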

	-hilmar


-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From lzhtom at hotmail.com  Wed Dec  6 22:13:47 2006
From: lzhtom at hotmail.com (zhihua li)
Date: Thu, 07 Dec 2006 03:13:47 +0000
Subject: [Bioperl-l] different syntaxes for SeqI constructor and Factory
	constructor?
Message-ID: <BAY110-F30C26DE384E916A297FA86C7DC0@phx.gbl>

Hi netters,

Recently I found this:

To construct a new Bio::SeqIO object, I had to write:
$seq_obj=Bio::SeqIO->new(
      -file => '/home/myfile',
      -format => 'Fasta');              # Note the dash before the two arguments.

If I omitted the dash:
$seq_obj=Bio::SeqIO->new(
     file => '/home/myfile',
     format => 'Fasta');
I'd get error:
MSG: Unknown format given or could not determine it []
STACK Bio::SeqIO::new /usr/lib/perl5/site_perl/5.8.7/Bio/SeqIO.pm:377

So it seems to me that the dashes before the arguments are essential.
However, when I tried to build a factory for StandAloneBlast, I found the
opposite.

If the script had the dash:
$blast_obj=Bio::Tools::Run::StandAloneBlast->new(
             -program => 'blastn',
             -database => '/home/mydatabase');

I'd get the error message: 
MSG: Unallowed parameter: - !
STACK Bio::Tools::Run::StandAloneBlast::AUTOLOAD 
/usr/lib/perl5/site_perl/5.8.7/Bio/Tools/Run/StandAloneBlast.pm:433
STACK Bio::Tools::Run::StandAloneBlast::new 
/usr/lib/perl5/site_perl/5.8.7/Bio/Tools/Run/StandAloneBlast.pm:400

If I left out the dash by saying:
$blast_obj=Bio::Tools::Run::StandAloneBlast->new(
             program => 'blastn',
             database => '/home/mydatabase');

Everything is fine.

Now I'm confused. Why do I sometimes have to add the dash, while at other
times I'm not allowed to?

Thanks in advance!



From hlapp at gmx.net  Wed Dec  6 22:56:44 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 6 Dec 2006 22:56:44 -0500
Subject: [Bioperl-l] Bioperl 1.5.2 Release
In-Reply-To: <457746B5.2020006@sendu.me.uk>
References: <457746B5.2020006@sendu.me.uk>
Message-ID: <CE76F074-5897-431C-9E39-9E096DBD1973@gmx.net>

Congrats! Great work, Sendu! Don't forget to celebrate.

	-hilmar

On Dec 6, 2006, at 5:39 PM, Sendu Bala wrote:

> I am proud to announce the final release of Bioperl 1.5.2.
> [...]

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From arareko at campus.iztacala.unam.mx  Wed Dec  6 22:53:21 2006
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Wed, 06 Dec 2006 21:53:21 -0600
Subject: [Bioperl-l] Bioperl 1.5.2 Release
In-Reply-To: <457746B5.2020006@sendu.me.uk>
References: <457746B5.2020006@sendu.me.uk>
Message-ID: <45779031.3050202@campus.iztacala.unam.mx>

This has been a great effort. Congrats and thanks to everyone involved!

Mauricio.

Sendu Bala wrote:
> I am proud to announce the final release of Bioperl 1.5.2.

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From jason at bioperl.org  Thu Dec  7 00:06:36 2006
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 6 Dec 2006 21:06:36 -0800
Subject: [Bioperl-l] Bioperl 1.5.2 Release
In-Reply-To: <457746B5.2020006@sendu.me.uk>
References: <457746B5.2020006@sendu.me.uk>
Message-ID: <41A863C9-1B69-4C7B-9271-C577EDD011BB@bioperl.org>

Hear! Hear!  Excellent work.  Thanks for leading the effort on this
release and for all of the behind-the-scenes work, attention to detail,
and cat herding it took to make this possible.

-jason

On Dec 6, 2006, at 2:39 PM, Sendu Bala wrote:

> I am proud to announce the final release of Bioperl 1.5.2.

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html


From n.haigh at sheffield.ac.uk  Thu Dec  7 02:23:47 2006
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Thu, 07 Dec 2006 07:23:47 +0000
Subject: [Bioperl-l] Bioperl 1.5.2 Release
In-Reply-To: <457746B5.2020006@sendu.me.uk>
References: <457746B5.2020006@sendu.me.uk>
Message-ID: <4577C183.7010501@sheffield.ac.uk>

I know I'm very new to Bioperl development and don't know very much yet,
so I'm probably not the best person to express the views of the Bioperl
developers or users. However, I'm sure I'm safe in saying that on behalf
of everyone associated with Bioperl a *huge* thank you must go out to
Sendu for the gargantuan effort he has put into this release.

Just looking over some of the e-mails he's sent over the past few weeks
alone, it's clear that he has devoted a huge amount of time to the
effort, in some cases with little sleep. Since there is very little
(or should I say no) monetary recognition in such an important and
time-consuming role as "Release Pumpkin", I hope Sendu has a warm glow, safe
in the knowledge that his efforts have helped enormously and are clearly
recognised and fully appreciated by the Bioperl community.

Therefore, I'd just like to reiterate what others have already
said... Well done, excellent work!!!

Nath


From valiente at lsi.upc.edu  Thu Dec  7 03:25:27 2006
From: valiente at lsi.upc.edu (Gabriel Valiente)
Date: Thu, 7 Dec 2006 09:25:27 +0100
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110 species
In-Reply-To: <mailman.8205.1161981511.2493.bioperl-l@lists.open-bio.org>
References: <mailman.8205.1161981511.2493.bioperl-l@lists.open-bio.org>
Message-ID: <4DA1DAE9-92B8-46C1-A3CE-F8D1AE4BB334@lsi.upc.edu>

The following popped out when I input more than 110 species to the
taxonomy2tree script version 1.4:

         (in cleanup)
------------- EXCEPTION  -------------
MSG: Must supply a Bio::Taxon
STACK Bio::DB::Taxonomy::flatfile::ancestor Bio/DB/Taxonomy/flatfile.pm:260
STACK Bio::Taxon::ancestor Bio/Taxon.pm:476
STACK Bio::Taxon::remove_Descendent Bio/Taxon.pm:703
STACK Bio::Tree::Node::ancestor Bio/Tree/Node.pm:346
STACK Bio::Taxon::ancestor Bio/Taxon.pm:466
STACK Bio::Tree::Tree::cleanup_tree Bio/Tree/Tree.pm:325
STACK Bio::Root::Root::DESTROY Bio/Root/Root.pm:409
STACK (eval) taxonomy2tree.pl:0
STACK toplevel taxonomy2tree.pl:0

Any clues? Thanks,

Gabriel


From bix at sendu.me.uk  Thu Dec  7 04:24:39 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 07 Dec 2006 09:24:39 +0000
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110
	species
In-Reply-To: <4DA1DAE9-92B8-46C1-A3CE-F8D1AE4BB334@lsi.upc.edu>
References: <mailman.8205.1161981511.2493.bioperl-l@lists.open-bio.org>
	<4DA1DAE9-92B8-46C1-A3CE-F8D1AE4BB334@lsi.upc.edu>
Message-ID: <4577DDD7.7060208@sendu.me.uk>

Gabriel Valiente wrote:
> The following popped out when I input more than 110 species to the
> taxonomy2tree script version 1.4:
> 
>          (in cleanup)
> ------------- EXCEPTION  -------------
> MSG: Must supply a Bio::Taxon
> STACK Bio::DB::Taxonomy::flatfile::ancestor Bio/DB/Taxonomy/flatfile.pm:260
> STACK Bio::Taxon::ancestor Bio/Taxon.pm:476
> STACK Bio::Taxon::remove_Descendent Bio/Taxon.pm:703
> STACK Bio::Tree::Node::ancestor Bio/Tree/Node.pm:346
> STACK Bio::Taxon::ancestor Bio/Taxon.pm:466
> STACK Bio::Tree::Tree::cleanup_tree Bio/Tree/Tree.pm:325
> STACK Bio::Root::Root::DESTROY Bio/Root/Root.pm:409
> STACK (eval) taxonomy2tree.pl:0
> STACK toplevel taxonomy2tree.pl:0
> 
> Any clues? Thanks,

Are you able to narrow the problem down? What was your command line, 
what species were you using? Does it work with the first 110 species you 
tried? Is there anything special about the 111th?

Do I understand correctly that this was a problem during cleanup only, 
and didn't affect the correctness and completeness of the result?


From bix at sendu.me.uk  Thu Dec  7 04:33:18 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 07 Dec 2006 09:33:18 +0000
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110
	species
In-Reply-To: <4DA1DAE9-92B8-46C1-A3CE-F8D1AE4BB334@lsi.upc.edu>
References: <mailman.8205.1161981511.2493.bioperl-l@lists.open-bio.org>
	<4DA1DAE9-92B8-46C1-A3CE-F8D1AE4BB334@lsi.upc.edu>
Message-ID: <4577DFDE.6000500@sendu.me.uk>

Gabriel Valiente wrote:
> The following popped out when I input more than 110 species to the
> taxonomy2tree script version 1.4:
> 
>          (in cleanup)
> ------------- EXCEPTION  -------------
> MSG: Must supply a Bio::Taxon
> STACK Bio::DB::Taxonomy::flatfile::ancestor Bio/DB/Taxonomy/flatfile.pm:260
> STACK Bio::Taxon::ancestor Bio/Taxon.pm:476
> STACK Bio::Taxon::remove_Descendent Bio/Taxon.pm:703
> STACK Bio::Tree::Node::ancestor Bio/Tree/Node.pm:346
> STACK Bio::Taxon::ancestor Bio/Taxon.pm:466
> STACK Bio::Tree::Tree::cleanup_tree Bio/Tree/Tree.pm:325
> STACK Bio::Root::Root::DESTROY Bio/Root/Root.pm:409
> STACK (eval) taxonomy2tree.pl:0
> STACK toplevel taxonomy2tree.pl:0
> 
> Any clues? Thanks,

Oh, does it work with option -e? Or does it work if you delete your old 
indexes of the nodes and names files and let it re-create them?


From valiente at lsi.upc.edu  Thu Dec  7 04:38:03 2006
From: valiente at lsi.upc.edu (Gabriel Valiente)
Date: Thu, 7 Dec 2006 10:38:03 +0100
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110
	species
In-Reply-To: <4577DDD7.7060208@sendu.me.uk>
References: <mailman.8205.1161981511.2493.bioperl-l@lists.open-bio.org>
	<4DA1DAE9-92B8-46C1-A3CE-F8D1AE4BB334@lsi.upc.edu>
	<4577DDD7.7060208@sendu.me.uk>
Message-ID: <F5C5C9A8-D444-4994-9769-AC5DE68F4A39@lsi.upc.edu>

Hi,

If you run the attached shell script you should be able to reproduce
the problem. It is not about any species in particular, but about the
total number of species: it crashes with more than 120 species. The
resulting tree is not correct; I'm checking it further now. Thanks,

Gabriel

-------------- next part --------------
A non-text attachment was scrubbed...
Name: fetch-bork.sh
Type: application/octet-stream
Size: 7378 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061207/00f0aeda/attachment-0002.obj>
-------------- next part --------------

On Dec 7, 2006, at 10:24 AM, Sendu Bala wrote:

> Gabriel Valiente wrote:
>> The following popped out when I input more than 110 species to the
>> taxonomy2tree script version 1.4:
>>          (in cleanup)
>> ------------- EXCEPTION  -------------
>> MSG: Must supply a Bio::Taxon
>> STACK Bio::DB::Taxonomy::flatfile::ancestor Bio/DB/Taxonomy/flatfile.pm:260
>> STACK Bio::Taxon::ancestor Bio/Taxon.pm:476
>> STACK Bio::Taxon::remove_Descendent Bio/Taxon.pm:703
>> STACK Bio::Tree::Node::ancestor Bio/Tree/Node.pm:346
>> STACK Bio::Taxon::ancestor Bio/Taxon.pm:466
>> STACK Bio::Tree::Tree::cleanup_tree Bio/Tree/Tree.pm:325
>> STACK Bio::Root::Root::DESTROY Bio/Root/Root.pm:409
>> STACK (eval) taxonomy2tree.pl:0
>> STACK toplevel taxonomy2tree.pl:0
>> Any clues? Thanks,
>
> Are you able to narrow the problem down? What was your command  
> line, what species were you using? Does it work with the first 110  
> species you tried? Is there anything special about the 111th?
>
> Do I understand correctly that this was a problem during cleanup  
> only, and didn't affect the correctness and completeness of the  
> result?


From cjfields at uiuc.edu  Thu Dec  7 10:22:47 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 7 Dec 2006 09:22:47 -0600
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on
	110species
In-Reply-To: <F5C5C9A8-D444-4994-9769-AC5DE68F4A39@lsi.upc.edu>
Message-ID: <000001c71a13$8feec840$15327e82@pyrimidine>

> Hi,
> 
> If you run the attached shell script you should be able to 
> reproduce the problem. It is not about any species in 
> particular, but about the total number of species: it crashes
> with more than 120 species. The resulting tree is not
> correct; I'm checking it further now. Thanks,
> 
> Gabriel

Gabriel, 

My guess is that this may have to do with using an old taxonomy dump file.  I got
this to work on WinXP using the latest NCBI taxonomy.  I had to modify
taxonomy2tree and your shell script to get them to play nicely with Windows, but
I didn't get the error and I did get a tree (abbreviated):

(((((("Agrobacterium tumefaciens str. C58","Sinorhizobium meliloti")Rhizobiaceae,...

chris


From cjfields at uiuc.edu  Thu Dec  7 13:44:32 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 7 Dec 2006 12:44:32 -0600
Subject: [Bioperl-l] different syntaxes for SeqI constructor and Factory
	constructor?
In-Reply-To: <BAY110-F30C26DE384E916A297FA86C7DC0@phx.gbl>
References: <BAY110-F30C26DE384E916A297FA86C7DC0@phx.gbl>
Message-ID: <7513E9D5-E055-4EBE-B8CF-538A8DEDB8E9@uiuc.edu>


On Dec 6, 2006, at 9:13 PM, zhihua li wrote:

> Hi netters,
>
> Recently I found this:
>
> For constructing a new SeqI object, I had to write:
> $seq_obj=Bio::SeqIO->new(
>      -file => '/home/myfile',
>      -format => 'Fasta');              #Note the dash before the two arguments.
>
> If I omitted the dash:
> $seq_obj=Bio::SeqIO->new(
>     file => '/home/myfile',
>     format => 'Fasta');
> I'd get error:
> MSG: Unknown format given or could not determine it []
> STACK Bio::SeqIO::new /usr/lib/perl5/site_perl/5.8.7/Bio/SeqIO.pm:377
>
> So it seems to me that the dashes before the arguments are
> essential.  However, when I tried to build a factory for
> StandaloneBlast, I found it was the other way around.
>
> If the script had the dash:
> $blast_obj=Bio::Tools::Run::StandAloneBlast->new(
>             -program => 'blastn',
>             -database => '/home/mydatabase');
>
> I'd get the error message: MSG: Unallowed parameter: - !
> STACK Bio::Tools::Run::StandAloneBlast::AUTOLOAD /usr/lib/perl5/site_perl/5.8.7/Bio/Tools/Run/StandAloneBlast.pm:433
> STACK Bio::Tools::Run::StandAloneBlast::new /usr/lib/perl5/site_perl/5.8.7/Bio/Tools/Run/StandAloneBlast.pm:400
>
> If I left out the dash by saying:
> $blast_obj=Bio::Tools::Run::StandAloneBlast->new(
>             program => 'blastn',
>             database => '/home/mydatabase');
>
> Everything is fine.
>
> Now I'm confused. Why do I sometimes have to add the dash, while at
> other times I'm not allowed to?
>
> Thanks in advance!

I agree that this should be more consistent.  Does anyone know the  
reasoning for this?

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign
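Not every module parses its constructor arguments the same way. The plain-Perl sketch below illustrates one plausible source of the asymmetry: a constructor that looks up only dash-prefixed keys silently ignores undashed ones, while one that strips an optional leading dash accepts both styles. The functions strict_new and tolerant_new are hypothetical and are not BioPerl API; this is an illustration under that assumption, not a reading of the actual SeqIO or StandAloneBlast source.

```perl
use strict;
use warnings;

# Hypothetical constructor that looks up only dash-prefixed keys,
# so an undashed "format" key is silently ignored.
sub strict_new {
    my %args = @_;
    return $args{'-format'};
}

# Hypothetical tolerant constructor: strips an optional leading dash and
# lower-cases each key, so -format, format and -FORMAT are equivalent.
sub tolerant_new {
    my @args = @_;
    my %args;
    while (my ($key, $value) = splice(@args, 0, 2)) {
        $key =~ s/^-//;
        $args{lc $key} = $value;
    }
    return $args{'format'};
}

for my $call ([-file => 'seqs.fa', -format => 'Fasta'],
              [ file => 'seqs.fa',  format => 'Fasta']) {
    my $strict   = strict_new(@$call)   // '(none)';
    my $tolerant = tolerant_new(@$call) // '(none)';
    print "strict: $strict, tolerant: $tolerant\n";
}
# prints:
# strict: Fasta, tolerant: Fasta
# strict: (none), tolerant: Fasta
```

Normalizing keys once at the top of a constructor is a cheap way to make both calling styles work, which is the kind of consistency asked about above.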


From bosborne11 at verizon.net  Thu Dec  7 14:32:21 2006
From: bosborne11 at verizon.net (Brian Osborne)
Date: Thu, 07 Dec 2006 14:32:21 -0500
Subject: [Bioperl-l] different syntaxes for SeqI constructor and Factory
 constructor?
In-Reply-To: <7513E9D5-E055-4EBE-B8CF-538A8DEDB8E9@uiuc.edu>
Message-ID: <C19DD675.BD72%bosborne11@verizon.net>

Chris,

The latest StandAloneBlast takes "dashed parameters", as in:

 @params = (-database => 'swissprot',-outfile => 'blast1.out');
 $factory = Bio::Tools::Run::StandAloneBlast->new(@params);

Or

 my $factory = Bio::Tools::Run::StandAloneBlast->new(-program =>"wublastp",
                                                     -database=>"swissprot",
                                                     -e => 1e-20);

So that's why I asked "what version?"

Someone made the change to allow dashes in @params a few months ago and I
believe that that someone was you!

Brian O.


On 12/7/06 1:44 PM, "Chris Fields" <cjfields at uiuc.edu> wrote:

> I agree that this should be more consistent.  Does anyone know the
> reasoning for this?


From cjfields at uiuc.edu  Thu Dec  7 14:44:19 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 7 Dec 2006 13:44:19 -0600
Subject: [Bioperl-l] different syntaxes for SeqI constructor and Factory
	constructor?
In-Reply-To: <C19DD675.BD72%bosborne11@verizon.net>
References: <C19DD675.BD72%bosborne11@verizon.net>
Message-ID: <A12BC418-6400-46FC-8383-66E21D997E56@uiuc.edu>


On Dec 7, 2006, at 1:32 PM, Brian Osborne wrote:

> Chris,
>
> The latest StandAloneBlast takes "dashed parameters", as in:
>
>  @params = (-database => 'swissprot',-outfile => 'blast1.out');
>  $factory = Bio::Tools::Run::StandAloneBlast->new(@params);
>
> Or
>
>  my $factory = Bio::Tools::Run::StandAloneBlast->new(-program =>"wublastp",
>                                                      -database=>"swissprot",
>                                                      -e => 1e-20);
>
> So that's why I asked "what version?"
>
> Someone made the change to allow dashes in @params a few months ago and I
> believe that that someone was you!
>
> Brian O.

Nope, I plead innocent (at least to this!).  I haven't made any
commits to StandAloneBlast.  These were added by Torsten (see
commits 1.59, 1.60), so you'll need to blame/thank him...

http://tinyurl.com/y7ym9g

So they're now a bit more consistent.  That's not to say  
StandAloneBlast doesn't need some major revisions....

BTW, I didn't see a post from you asking about the version.

Chris


From akarger at CGR.Harvard.edu  Thu Dec  7 16:32:51 2006
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Thu, 7 Dec 2006 16:32:51 -0500
Subject: [Bioperl-l] Using frame info from GFF in getting a Seq->spliced_seq
Message-ID: <B9182BFF5B004245BABC12956EA6322E022BE901@huls5.nucleus.harvard.edu>

I need to know how to get the frame information in exon features
(created by Bio::Tools::GFF) into a whole-gene feature that will be
translated into a protein.

I'm reading in some fungal GFFs generated by Jason Stajich. I

- Use Bio::Tools::GFF to create a feature for each exon in a gene
- Create a Bio::Location::Split object containing each feature's
location
- Create a Bio::SeqFeature::Generic object whose location is the above
BL::Split
- Attach my contig Bio::Seq to the feature
- get the protein with feature->spliced_seq->translate->seq

(Code below)

Unfortunately, I get the wrong result when the GFF features have frame
!= 0. This happens for only a few percent of the exons, but when it
does, I end up translating in the wrong frame.

If I read the docs correctly, Location objects don't have a frame. So
how do I get the correct spliced_seq, which skips one or two bp at the
beginning of certain exons?

I suspect the answer to this is that I'm going about this in completely
the wrong way, in which case, please tell me how I ought to be doing it.

Thanks,
- Amir Karger
Research Computing
Life Sciences Division
Harvard University

P.S. In case you want to see actual code, here it is. After using
Bio::Tools::GFF to create a sorted list of features for each exon
(basically stolen from the module POD), I:
    # Create a new object representing the exons' gene
    my $coding_loc_obj = Bio::Location::Split->new;
    foreach my $exon (@sorted_exons) {
        $coding_loc_obj->add_sub_Location($exon->location);
    }

    # Build a spliced feature representing the whole gene
    my $spliced_feat = Bio::SeqFeature::Generic->new(
        -start   => $coding_loc_obj->start,
        -end     => $coding_loc_obj->end,
        -strand  => $strand_num,
        -primary => "splicedGene",
    );
    $spliced_feat->location($coding_loc_obj);

    # Attach a contig object containing the sequence
    $spliced_feat->attach_seq($contig_obj->bioperl_object);

    # Get the spliced seq once, then translate it to protein
    my $spliced_seq = $spliced_feat->spliced_seq;
    my $coding_seq  = $spliced_seq->seq;
    my $protein     = $spliced_seq->translate->seq;
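Since Location objects carry no frame, one workaround is to remember the frame (phase) column of the first CDS segment in transcript order (the lowest-coordinate exon on the plus strand, the highest on the minus strand), splice as before, and trim that many bases before translating. This is a sketch under assumptions: it assumes the GFF column follows the GFF3/GTF phase convention (the number of bases to skip before the first complete codon) and that the later segments' phases are consistent with one continuous reading frame. trim_to_phase is a hypothetical helper, not a BioPerl method.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl method). $spliced is the spliced
# coding sequence in transcript orientation; $phase is the GFF frame/phase
# (0, 1 or 2) of the 5'-most CDS segment, i.e. the number of bases to skip
# before the first complete codon.
sub trim_to_phase {
    my ($spliced, $phase) = @_;
    die "phase must be 0, 1 or 2\n"
        unless defined $phase && $phase =~ /^[012]\z/;
    my $cds   = substr($spliced, $phase);     # drop the leading partial codon
    my $extra = length($cds) % 3;
    substr($cds, -$extra) = '' if $extra;     # drop a trailing partial codon
    return $cds;
}

print trim_to_phase('GATGAAATAG', 1), "\n";     # prints ATGAAATAG
print trim_to_phase('CCATGAAATAGT', 2), "\n";   # prints ATGAAATAG
```

The trimmed string could then be wrapped with Bio::PrimarySeq->new(-seq => $cds) and translated as before, so translation starts on a codon boundary.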


From bix at sendu.me.uk  Thu Dec  7 17:45:32 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 7 Dec 2006 15:45:32 -0700
Subject: [Bioperl-l] [Bioperl-announce-l] Bioperl 1.5.2 Release
Message-ID: <000001c71a51$671a79d0$6400a8c0@CodonSolutions.local>

I am proud to announce the final release of Bioperl 1.5.2.

http://www.bioperl.org/wiki/Release_1.5.2

bioperl (core):
cpan>install S/SE/SENDU/bioperl-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-1.5.2_100.tar.bz2
http://bioperl.org/DIST/bioperl-1.5.2_100.zip

bioperl-run:
cpan>install S/SE/SENDU/bioperl-run-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-run-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-run-1.5.2_100.tar.bz2
http://bioperl.org/DIST/bioperl-run-1.5.2_100.zip

bioperl-db:
cpan>install S/SE/SENDU/bioperl-db-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-db-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-db-1.5.2_100.tar.bz2
http://bioperl.org/DIST/bioperl-db-1.5.2_100.zip

bioperl-network:
cpan>install S/SE/SENDU/bioperl-network-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-network-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-network-1.5.2_100.tar.bz2
http://bioperl.org/DIST/bioperl-network-1.5.2_100.zip

http://bioperl.org/DIST/SIGNATURES.md5
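
To check a download against SIGNATURES.md5, here is a minimal sketch using the core Digest::MD5 module. It assumes you copy the expected digest for your archive out of SIGNATURES.md5 by hand (the file's layout is not described here); the script name check_md5.pl is hypothetical.

```perl
use strict;
use warnings;
use Digest::MD5;

# Return the hexadecimal MD5 digest of a file, read in binary mode.
sub md5_of_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!\n";
    binmode $fh;
    my $digest = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;
    return $digest;
}

# True if the file's digest matches the expected one (case-insensitive).
sub verify_md5 {
    my ($path, $expected) = @_;
    return md5_of_file($path) eq lc $expected;
}

# Usage: perl check_md5.pl bioperl-1.5.2_100.tar.gz <digest from SIGNATURES.md5>
if (@ARGV == 2) {
    my ($file, $expected) = @ARGV;
    print verify_md5($file, $expected) ? "OK   $file\n" : "FAIL $file\n";
}
```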

(all are also available via CVS, and for Windows users, using the Perl 
Package Manager - see the wiki for details)

The other bioperl packages (bioperl-ext, bioperl-gui, bioperl-pedigree 
and bioperl-pipeline) did not see a unified release for 1.5.2.


This release represents a developer release which has been thoroughly
tested. We consider it the most stable (in terms of bugs) version of 
Bioperl and believe it to be suitable for most people. It is marked 
'developer' or even 'unstable' because its API may change on short 
notice. It will also not be maintained or supported beyond the next 
bioperl release.

1.5.2 introduces the following new (core) features:

  * Taxonomy (Bio::Species) overhaul
  * Bio::Map improvements
  * Bio::SearchIO speedup
  * Build.PL installation

For details, and a complete change log, see the wiki.

API documentation is available here: http://doc.bioperl.org/


Acknowledgements:
Innumerable thanks are due for the tireless efforts of Christopher Fields
(bug fixing, testing, documentation, discussion), Nathan Haigh
(Windows & prerequisite issues, testing) and Mauricio Herrera Cuadra
(testing, documentation, support). Feedback and ideas provided by Hilmar
Lapp, Jason Stajich, Torsten Seemann and others on the mailing list and
elsewhere proved invaluable. None of this would have been possible
without the behind-the-scenes work of the open-bio support team. I'd
also like to acknowledge Andreas J. Koenig for his help with CPAN matters.

Finally, thank you to everyone who tried out the release candidates, and
especially those who took the time to file bug reports or report problems.


Remember, Bioperl can only go from strength to strength with /your/ 
help. If you'd like to experience the fame and fortune that naturally 
follow becoming a Bioperl developer (?!), become one!
http://www.bioperl.org/wiki/Becoming_a_developer

On behalf of the Bioperl team,
Sendu Bala.
_______________________________________________
Bioperl-announce-l mailing list
Bioperl-announce-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-announce-l


From cjfields at uiuc.edu  Thu Dec  7 18:00:43 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 7 Dec 2006 16:00:43 -0700
Subject: [Bioperl-l] [Bioperl-announce-l]  Bioperl 1.5.2 Release
In-Reply-To: <457746B5.2020006@sendu.me.uk>
Message-ID: <000001c71a53$85cb4f10$6400a8c0@CodonSolutions.local>

Great job, Sendu!

A bit of icing on the cake: all the WinXP PPMs (core, db, network, run)
installed without a hitch following the normal instructions with PPM4 (both
GUI and command-line shell) on clean ActiveState installations.  It looks like
all the correct prereqs were installed with the shell (only XML::SAX::ExpatXS
was left out in the GUI installation, for reasons outlined before).

I'll run more tests tomorrow to see if tests pass with the installed bioperl
(this should catch any prereq issues with PPM installation we missed).

chris

> I am proud to announce the final release of Bioperl 1.5.2.


_______________________________________________
Bioperl-announce-l mailing list
Bioperl-announce-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-announce-l


From kaboroev at sfu.ca  Thu Dec  7 17:26:35 2006
From: kaboroev at sfu.ca (Keith Anthony Boroevich)
Date: Thu, 07 Dec 2006 14:26:35 -0800
Subject: [Bioperl-l] Bio::Graphics xyplot
Message-ID: <4578951B.5050206@sfu.ca>

Hi everyone,

I'm attempting to add an xyplot of the phred quality scores to a
Bio::Graphics image, and cannot get it to work.
I have the panel with a track for both the scale and the DNA displaying
properly.  When I attempt to add the xyplot I just get a garbled track
of what look like tiny xyplots, one for each data point.  I have the CVS
version of bioperl-live (updated today) running.  I think what I am missing is
the creation of a "Sequence Feature Group" to hold the individual points
of the plot.  However, I cannot seem to find such an object. This is
what I attempted:

-------BEGIN---CODE-----------
# start panel
my $panel = Bio::Graphics::Panel->new(-length    => $f_seqlen,
                      -width     => $f_seqlen*10,
                      -pad_left  => 10,
                      -pad_right => 10,
                      -grid      => 1
                      );
# add scale
$panel->add_track(arrow => Bio::SeqFeature::Generic->new(-start => 1, -end => $f_seqlen),
                  -double  => 1,
                  -tick    => 2,
                  -fgcolor => 'black');
# add DNA ($feature is of type Bio::SeqFeature::Annotated)
$panel->add_track(dna => $feature);
# get list of quality scores from database
my ($pqs_value) = $dbh->selectrow_array($sql);
my @pqs_value = split(' ', $pqs_value);   # ' ' skips leading/repeated whitespace
# create track
my $track =  $panel->add_track(-glyph        => 'xyplot',
                   -graph_type   => 'points',
                   -point_symbol => 'point',
                   -max_score    => 100,
                   -min_score    => 0,
                   -scale        => 'none');
# add one single-base point feature per quality score (1-based coordinates)
for my $i (1 .. $f_seqlen) {
    $track->add_feature(
        Bio::SeqFeature::Generic->new(-start => $i,
                                      -end   => $i,
                                      -score => $pqs_value[$i - 1]));
}
print $panel->png();
$panel->finished;
------END---CODE----------

I also attempted to create an array of the point features and passed
that by reference to the panel "add_track" as it describes in the xyplot
documentation, but that resulted in the exact same image.
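A minimal sketch of the data-preparation step, in plain Perl with no BioPerl calls. It assumes the scores arrive as one whitespace-separated string, as in the code above; `quality_points` is a hypothetical helper name. Splitting on ' ' rather than /\s/ avoids empty fields when scores are separated by runs of spaces, and sequence position i receives the i-th score, so coordinates start at 1 rather than 0. How the xyplot glyph should receive these points is not settled in this message: one pattern that may be worth trying (an assumption, not confirmed here) is to wrap each record as a Bio::SeqFeature::Generic attached with add_SeqFeature to a single parent feature spanning the sequence, and pass that one parent to add_track with -glyph => 'xyplot', instead of adding every point to the track separately.

```perl
use strict;
use warnings;

# Hypothetical helper (not a Bio::Graphics API): turn a whitespace-separated
# string of phred scores into one 1-based, single-base record per score.
sub quality_points {
    my ($scores) = @_;
    # split ' ' is awk-style: leading and repeated whitespace yield no empty fields
    my @values = split ' ', $scores;
    # sequence position $i + 1 gets the score stored at array index $i
    return map { +{ start => $_ + 1, end => $_ + 1, score => $values[$_] } }
           0 .. $#values;
}

# Usage: each record could then become a Bio::SeqFeature::Generic subfeature
# of one parent feature (assumed approach, see the note above).
my @points = quality_points("35  40\t12");
printf "%d..%d score %s\n", @{$_}{qw(start end score)} for @points;
```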

keith

-- 
 ><)))?> -cGRASP- <?(((><
 Keith Anthony Boroevich
 Davidson Lab
 Dept of Molecular Biology
 Simon Fraser University
 Tel: 604-268-7276


From arareko at campus.iztacala.unam.mx  Thu Dec  7 18:15:53 2006
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Thu, 7 Dec 2006 16:15:53 -0700
Subject: [Bioperl-l] [Bioperl-announce-l]  Bioperl 1.5.2 Release
In-Reply-To: <457746B5.2020006@sendu.me.uk>
References: <457746B5.2020006@sendu.me.uk>
Message-ID: <000001c71a55$a479da60$6400a8c0@CodonSolutions.local>

This has been a great effort. Congrats and thanks to everyone involved!

Mauricio.


-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM



From cain at cshl.edu  Thu Dec  7 17:46:09 2006
From: cain at cshl.edu (Scott Cain)
Date: Thu, 07 Dec 2006 17:46:09 -0500
Subject: [Bioperl-l] Using frame info from GFF in getting a
	Seq->spliced_seq
In-Reply-To: <B9182BFF5B004245BABC12956EA6322E022BE901@huls5.nucleus.harvard.edu>
References: <B9182BFF5B004245BABC12956EA6322E022BE901@huls5.nucleus.harvard.edu>
Message-ID: <1165531569.2569.49.camel@localhost.localdomain>

Amir,

I don't know for sure what the problem is, but here is one possibility:
the number in column 8 of a GFF file is not the frame, it is the phase.
See the GFF3 spec for a description of what the phase is:

  http://www.sequenceontology.org/gff3.shtml

(It doesn't matter if you are using GFF3 or GFF2, as the phase is the
same in both).

Scott


On Thu, 2006-12-07 at 16:32 -0500, Amir Karger wrote:
> I need to know how to get the frame information in exon features
> (created by Bio::Tools::GFF) into a whole-gene feature that will be
> translated into a protein.
> 
> I'm reading in some fungal GFFs generated by Jason Stajich. I
> 
> - Use Bio::Tools::GFF to create a feature for each exon in a gene
> - Create a Bio::Location::Split object containing each feature's
> location
> - Create a Bio::SeqFeature::Generic object whose location is the above
> BL::Split
> - Attach my contig Bio::Seq to the feature
> - get the protein with feature->spliced_seq->translate->seq
> 
> (Code below)
> 
> Unfortunately, I get the wrong result when the GFF features have frame
> != 0. This happens for only a few percent of the exons, but when it
> does, I end up translating in the wrong frame.
> 
> If I read the docs correctly, Location objects don't have a frame. So
> how do I get the correct spliced_seq, which skips one or two bp at the
> beginning of certain exons?
> 
> I suspect the answer to this is that I'm going about this in completely
> the wrong way, in which case, please tell me how I ought to be doing it.
> 
> Thanks,
> - Amir Karger
> Research Computing
> Life Sciences Division
> Harvard University
> 
> P.S. In case you want to see actual code, here it is. After using
> Bio::Tools::GFF to create a sorted list of features for each exon
> (basically stolen from the module POD), I:
>     # Create a new object representing the exons' gene
>     my $coding_loc_obj = new Bio::Location::Split;
>     foreach my $exon (@sorted_exons) {
>         $coding_loc_obj->add_sub_Location($exon->location);
>     }
> 
>     # Build a spliced feature representing the whole gene
>     my $spliced_feat = new Bio::SeqFeature::Generic(
>         -start  => $coding_loc_obj->start,
>         -end    => $coding_loc_obj->end,
>         -strand => $strand_num,
>         -primary=> "splicedGene",
>     );
>     $spliced_feat->location($coding_loc_obj);
> 
>     # Attach a contig object containing the sequence
>     $spliced_feat->attach_seq($contig_obj->bioperl_object);
> 
>     # Get the spliced seq and translate to protein:
>     my $coding_seq = $spliced_feat->spliced_seq->seq;
>     my $protein = $spliced_feat->spliced_seq->translate->seq;
> 
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From cjfields at uiuc.edu  Thu Dec  7 21:52:47 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 7 Dec 2006 20:52:47 -0600
Subject: [Bioperl-l] Using frame info from GFF in getting a
	Seq->spliced_seq
In-Reply-To: <1165531569.2569.49.camel@localhost.localdomain>
Message-ID: <002d01c71a73$f16ecc40$15327e82@pyrimidine>

Another issue is that splittype() is not defined, though I don't think that
would break anything as currently implemented.  However, one thing we have
briefly discussed is having Bio::Location::Split objects exhibit
different (but expected) behaviors based upon the splittype() (order, join,
or bond).  It's one of the things I want to work out for the next release.

If Scott's fix doesn't work and the problem persists, you should file a bug
report with some sample data for us to test out.

chris

> Amir,
> 
> I don't know for sure what the problem is, but here is one possibility:
> the number in column 8 of a GFF file is not the frame, it is the phase.
> See the GFF3 spec for a description of what the phase is:
> 
>   http://www.sequenceontology.org/gff3.shtml
> 
> (It doesn't matter if you are using GFF3 or GFF2, as the phase is the
> same in both).
> 
> Scott


From jason at bioperl.org  Thu Dec  7 21:01:33 2006
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 7 Dec 2006 18:01:33 -0800
Subject: [Bioperl-l] Using frame info from GFF in getting a
	Seq->spliced_seq
In-Reply-To: <B9182BFF5B004245BABC12956EA6322E022BE901@huls5.nucleus.harvard.edu>
References: <B9182BFF5B004245BABC12956EA6322E022BE901@huls5.nucleus.harvard.edu>
Message-ID: <866F6CEE-62BB-4880-9B13-6DDE29EAF94E@bioperl.org>

This was a problem in the gene prediction output, I suspect; more
recent versions of the program should have fixed this.  I do not
currently have free time to deal with the errors in the small number
of ORFs where this has happened.

I think you just need to do
  start -= start- (frame*strand)
for 1st exons.

You could probably also provide the 1st exon's frame to the translate
function as another possibility, but you should try to get the CDS
correct first, depending on your downstream analyses.

-jason


From neetisomaiya at gmail.com  Fri Dec  8 05:21:50 2006
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Fri, 8 Dec 2006 15:51:50 +0530
Subject: [Bioperl-l] need help with phrap parser
Message-ID: <764978cf0612080221o709514a1rf5f97054c5eabb51@mail.gmail.com>

Can anyone point me to a Phrap parser which parses the ace file to extract
which reads make up each contig (e.g. read_a and read_b make contig1; read_d,
read_e and read_z make contig2), and other information about the reads (like
whether the read is complemented or not with respect to the contig, what
region of the contig each read contributes, etc.), basically the AF and BS
lines of the ACE output?

-- 
-Neeti
Even my blood says, B positive


From pmiguel at purdue.edu  Fri Dec  8 09:17:02 2006
From: pmiguel at purdue.edu (Phillip San Miguel)
Date: Fri, 08 Dec 2006 09:17:02 -0500
Subject: [Bioperl-l] need help with phrap parser
In-Reply-To: <764978cf0612080221o709514a1rf5f97054c5eabb51@mail.gmail.com>
References: <764978cf0612080221o709514a1rf5f97054c5eabb51@mail.gmail.com>
Message-ID: <457973DE.6050900@purdue.edu>

neeti somaiya wrote:
> Can anyone point me to a Phrap parser which parses the ace file to extract
> which reads make up each contig (e.g. read_a and read_b make contig1; read_d,
> read_e and read_z make contig2), and other information about the reads (like
> whether the read is complemented or not with respect to the contig, what
> region of the contig each read contributes, etc.), basically the AF and BS
> lines of the ACE output?
>
>   
neeti,

    To find the reads that went into each contig, you do *not* want the BS tagged records. My understanding is that BS is just what consed uses to populate its consensus line from the ace file. 
I write this because of an email sent me by David Gordon in 2001 included here 
without his permission:


> > Phrap writes BS lines which
> > indicate, for each consensus position, which read phrap uses at that
> > position to become the consensus.  These BS ("base segments") are 
> > manipulated by Consed when there are changes to the assembly, such as
> > joins, tears, removing reads, or changing the consensus.
>   
    The simplest way is:

egrep '^(CO|AF|RD) ' acefilename

if you are on a Unix system, or with Perl:

while (<>) {
    print if /^(?:CO|AF|RD) /;
}

But then you would need to parse the fields of interest. You get the 
position/strand in the contig from AF, then you get the length of the 
read from RD.
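A minimal pure-Perl sketch of that field parsing, as an illustration rather than a finished parser. It assumes the phrap ACE layout `CO <contig> <bases> <reads> <segments> <U|C>`, `AF <read> <U|C> <padded start>` and `RD <read> <padded bases> <info items> <tags>`, with a contig's AF lines appearing before its RD lines; `parse_ace_reads` is a hypothetical name. Positions are padded consensus coordinates (gaps counted), and BS lines are ignored, following the advice above.

```perl
use strict;
use warnings;

# Minimal ACE reader (a sketch, not Bio::Assembly::IO::ace): for each contig
# (CO line) collect its reads with orientation and padded start (AF lines)
# and padded read length (RD lines). BS lines are deliberately ignored.
sub parse_ace_reads {
    my ($fh) = @_;
    my (%contigs, $contig);
    while (my $line = <$fh>) {
        if ($line =~ /^CO\s+(\S+)/) {
            $contig = $1;
            $contigs{$contig} = {};
        }
        elsif (defined $contig && $line =~ /^AF\s+(\S+)\s+([UC])\s+(-?\d+)/) {
            # U = uncomplemented (same strand as contig), C = complemented
            $contigs{$contig}{$1} = { strand => ($2 eq 'C' ? -1 : 1), start => $3 };
        }
        elsif (defined $contig && $line =~ /^RD\s+(\S+)\s+(\d+)/) {
            my $read = $contigs{$contig}{$1} or next;    # RD without a matching AF
            $read->{length} = $2;
            $read->{end}    = $read->{start} + $2 - 1;   # padded end position
        }
    }
    return \%contigs;
}

# Illustrative fragment only (real ACE files also carry sequence and quality blocks).
my $ace = <<'ACE';
CO Contig1 12 2 1 U
AF read_a U 1
AF read_b C 4
RD read_a 8 0 0
RD read_b 9 0 0
ACE

open my $fh, '<', \$ace or die "cannot open in-memory ACE text: $!";
my $contigs = parse_ace_reads($fh);
for my $c (sort keys %$contigs) {
    for my $r (sort keys %{ $contigs->{$c} }) {
        my $d = $contigs->{$c}{$r};
        printf "%s\t%s\tstrand %d\t%d..%d\n", $c, $r, $d->{strand}, $d->{start}, $d->{end};
    }
}
```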

There does seem to be a part of bioperl meant to perform
this task, including Bio::Assembly::IO::ace, but it looks like it was
started but never completed.


From cjfields at uiuc.edu  Fri Dec  8 10:17:31 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 8 Dec 2006 09:17:31 -0600
Subject: [Bioperl-l] NAR Database Issue Papers
Message-ID: <000601c71adb$fdd60490$15327e82@pyrimidine>

For those interested, the Nucleic Acids Research Database issue papers have
been popping up in the Advance Access section of the NAR website:

http://nar.oxfordjournals.org/papbyrecent.dtl

Ensembl, UCSC Browser, Entrez Gene, and a number of others of possible interest are
represented.  Of particular note are a few mentions of formatting changes to
UniProt, EMBL, and other records, which should be taken care of in the
latest BioPerl release (fingers crossed!).

chris


From cjfields at uiuc.edu  Fri Dec  8 10:31:19 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 8 Dec 2006 09:31:19 -0600
Subject: [Bioperl-l] need help with phrap parser
In-Reply-To: <457973DE.6050900@purdue.edu>
Message-ID: <000001c71add$ec7147d0$15327e82@pyrimidine>

...
> But then you would need to parse the fields of interest. You get the 
> position/strand in the contig from AF, then you get the length of the 
> read from RD.
> 
> There does seem to be a part of bioperl meant to perform
> this task, including Bio::Assembly::IO::ace, but it looks like it was
> started but never completed.

...and if anyone wants to chip in and work on it, let us know!   The various
Bio::Assembly modules are one of many areas that need some updating.

chris


From akarger at CGR.Harvard.edu  Fri Dec  8 13:25:47 2006
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Fri, 8 Dec 2006 13:25:47 -0500
Subject: [Bioperl-l] Using frame info from GFF in getting a
	Seq->spliced_seq
Message-ID: <B9182BFF5B004245BABC12956EA6322E022BEA6D@huls5.nucleus.harvard.edu>

> This was a problem in the gene prediction output I suspect, more  
> recent versions of the program should have fixed this.  I do not  
> currently have free time to deal with the errors in the small number  
> of ORFs where this has happened.
> 
> I think you just need to do
>   start -= start- (frame*strand)
> for 1st exons.

I used
    if ($strand == 1) { $start += $exon->frame }
    else              { $end   -= $exon->frame }

This took me from 90 translations that had * within the sequence to just
9, out of 5500 CDS in S bayanus.
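The strand-aware adjustment above can be sketched as a standalone Perl subroutine. This is an illustration, not BioPerl code: it assumes each exon has been copied into a plain hash with start, end, strand (+1/-1) and phase keys (for example from the GFF features' start, end, strand and frame values), and `trim_first_exon_phase` is a hypothetical name. Only the first exon in transcription order is trimmed, matching the fix above; on the minus strand that exon is the one with the highest coordinates, and its 5' end is its end coordinate.

```perl
use strict;
use warnings;

# trim_first_exon_phase(@exons): returns copies of the exons, ordered 5' to 3'
# on the transcript, with the first exon's leading phase bases removed.
# Assumes all exons share one strand (hypothetical helper, not a BioPerl API).
sub trim_first_exon_phase {
    my @exons = map { +{ %$_ } } @_;    # shallow copies; input left untouched
    return @exons unless @exons;
    my $strand = $exons[0]{strand};
    # Transcription order: ascending coordinates on +1, descending on -1.
    if ($strand == 1) { @exons = sort { $a->{start} <=> $b->{start} } @exons }
    else              { @exons = sort { $b->{start} <=> $a->{start} } @exons }
    my $first = $exons[0];
    my $phase = $first->{phase} || 0;
    if ($strand == 1) { $first->{start} += $phase }   # 5' end is 'start' on +1
    else              { $first->{end}   -= $phase }   # 5' end is 'end' on -1
    return @exons;
}

my @cds = trim_first_exon_phase(
    { start => 100, end => 130, strand => 1, phase => 2 },
    { start => 200, end => 228, strand => 1, phase => 1 },
);
print join(',', map { join '..', $_->{start}, $_->{end} } @cds), "\n";    # 102..130,200..228
```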

> You can also probably provide the 1st exon's frame to the translate  
> function as another possibility but you should try and get the CDS  
> correct first depending on your downstream analyses.

Yes, I think. Scott Cain pointed out that GFF column 8 is the "phase",
which I had never heard of before. My current, very limited,
understanding is that sometimes you'll have an exon with, say, 31 bp,
followed by an exon with 29 bp. When the intron gets spliced out, you
eventually get an mRNA of 60 bp, which translates to a protein of 20 aa.
But the second exon has a phase of 2, not 0, because you can't just
start translating at the first bp of the second exon and expect to get
nice amino acids.
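Under the GFF3 definition of phase (the number of bases to remove from the start of a feature to reach the first base of the next codon), the phases of successive CDS exons follow from their lengths. A small Perl sketch, assuming exon lengths are given in transcription order and the first exon's phase is known; `exon_phases` is a hypothetical name. For the 31 bp + 29 bp example it gives phases 0 and 2: the first exon ends with one base of its eleventh codon, so the first two bases of the second exon complete that codon, and the remaining 27 bp give nine more codons (20 aa in total).

```perl
use strict;
use warnings;

# exon_phases($first_phase, @lengths): GFF3 phase of each exon, given exon
# lengths in transcription order (hypothetical helper, not a BioPerl call).
sub exon_phases {
    my ($first_phase, @lengths) = @_;
    my @phases = ($first_phase);
    for my $i (0 .. $#lengths - 1) {
        # Bases of exon $i not used by whole codons spill into exon $i + 1.
        # With a positive right operand, Perl's % never returns a negative
        # number, so the result stays in 0..2.
        push @phases, ($phases[-1] - $lengths[$i]) % 3;
    }
    return @phases;
}

my @phases = exon_phases(0, 31, 29);
print "@phases\n";    # 0 2
```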

By the way, whether or not phase is the same thing as frame, when I call
the frame() method on the features created by Bio::Tools::GFF, I get the
phase info. I assume that's a feature (no pun intended), not a bug?

I'm still confused as to why you would have a phase in the first exon,
though. Why not just say the CDS starts 1 or 2 bp later? (This is
probably a bio question, not a bioperl question, but a quick Google
didn't get me an answer. "Phase" isn't a very good search term.)

I guess the real question here, which Jason alludes to, is whether
SeqFeature->spliced_seq ought to take into account the phase information
of the first exon. Right now, it doesn't, so when you call
SeqFeature->spliced_seq->translate, you get gibberish. Are there cases
where you would want spliced_seq to include the first bp or two? Should
there be an option to spliced_seq for whether you want to take phase
information into account?

I can't submit a bug report until we confirm it's a bug.

Thanks,
-Amir Karger



From akarger at CGR.Harvard.edu  Fri Dec  8 13:33:09 2006
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Fri, 8 Dec 2006 13:33:09 -0500
Subject: [Bioperl-l] Using frame info from GFF in getting a
	Seq->spliced_seq
Message-ID: <B9182BFF5B004245BABC12956EA6322E022BEA71@huls5.nucleus.harvard.edu>

> Another issue is the splittype() is not defined, though I don't think that
> would kill anything as currently implemented.  However, one thing we have
> passingly discussed is having Bio::Location::Split objects possibly exhibit
> different (but expected) behaviors based upon the splittype() (order, join,
> or bond).  It's one of the things I want to work out for the next release.

Should I be writing -splittype => "JOIN" or some such in my new()?

-Amir Karger

> 
> chris


From cjfields at uiuc.edu  Fri Dec  8 14:04:55 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 8 Dec 2006 13:04:55 -0600
Subject: [Bioperl-l] Using frame info from GFF in getting a
	Seq->spliced_seq
In-Reply-To: <B9182BFF5B004245BABC12956EA6322E022BEA71@huls5.nucleus.harvard.edu>
Message-ID: <000901c71afb$bf504210$15327e82@pyrimidine>


> > Another issue is the splittype() is not defined, though I don't think
> > that would kill anything as currently implemented.  However, one thing
> > we have passingly discussed is having Bio::Location::Split objects
> > possibly exhibit different (but expected) behaviors based upon the
> > splittype() (order, join, or bond).  It's one of the things I want to
> > work out for the next release.
> 
> Should I be writing -splittype => "JOIN" or some such in my new()?
> 
> -Amir Karger

Looking at the constructor in Location::Split, I missed the fact that 'JOIN'
is the default splittype(), so you don't actually have to set it explicitly;
apologies for that.

If we make any changes that affect how Location::Split behaves, we'll likely
leave the default splittype() as 'JOIN', as it's by far the most common
operator.

chris


From cjfields at uiuc.edu  Fri Dec  8 15:03:16 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 8 Dec 2006 14:03:16 -0600
Subject: [Bioperl-l] Using frame info from GFF in getting
	aSeq->spliced_seq
In-Reply-To: <B9182BFF5B004245BABC12956EA6322E022BEA6D@huls5.nucleus.harvard.edu>
Message-ID: <000001c71b03$e6741e90$15327e82@pyrimidine>

> Yes, I think. Scott Cain pointed out that GFF column 8 is the 
> "phase", which I had never heard of before. My current, very 
> limited, understanding is that sometimes you'll have an exon 
> with, say, 31 bp, followed by an exon with 29 bp. When the 
> intron gets spliced out, you eventually get an mRNA of 60 bp, 
> which translates to a protein of 20 aa.
> But the second exon has a phase of 1, not 0, because you 
> can't just start translating at the first bp of the second 
> exon and expect to get nice amino acids.

I think the use of 'frame' here is meant relative to the DNA sequence (i.e.
ORF searching, 6 frames) and the 'phase' is relative to the mRNA (i.e.
translation, three frames).  At least I think that's what is meant!
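To make the two phase conventions concrete, here is a minimal pure-Perl sketch. The subroutines `cds_phases` and `intron_phases` are hypothetical, not BioPerl methods. The sketch assumes the GFF3 definition, where a CDS segment's phase is the number of bases to remove from its 5' end to reach the first base of the next complete codon, and the usual intron-phase convention (0, 1 or 2, according to how many bases of the interrupted codon precede the intron). For the 31 bp + 29 bp example, the intron falls after the first base of a codon (intron phase 1), so the second CDS segment needs 2 bases to finish that codon (GFF3 phase 2).

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl method): given the GFF3 phase of the
# first CDS segment and the lengths of all segments in transcription
# order, return the GFF3 phase of every segment.
sub cds_phases {
    my ($first_phase, @lengths) = @_;
    my @phases;
    my $consumed = 0;    # total bases in the segments already seen
    for my $len (@lengths) {
        if (!@phases) {
            push @phases, $first_phase;
        }
        else {
            # bases already placed into a still-unfinished codon
            my $leftover = ($consumed - $first_phase) % 3;
            # bases of this segment needed to finish that codon
            push @phases, (3 - $leftover) % 3;
        }
        $consumed += $len;
    }
    return @phases;
}

# Hypothetical helper: intron phase (0, 1 or 2) of each intron, i.e. how
# many bases of the interrupted codon precede the intron.
sub intron_phases {
    my ($first_phase, @lengths) = @_;
    my @introns;
    my $consumed = 0;
    for my $i (0 .. $#lengths - 1) {
        $consumed += $lengths[$i];
        push @introns, ($consumed - $first_phase) % 3;
    }
    return @introns;
}

# The 31 bp + 29 bp example from the thread
my @cds    = cds_phases(0, 31, 29);
my @intron = intron_phases(0, 31, 29);
print "CDS phases:    @cds\n";       # 0 2
print "intron phases: @intron\n";    # 1
```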

> By the way, whether or not phase is the same thing as frame, 
> when I call the frame() method on the features created by 
> Bio::Tools::GFF, I get the phase info. I assume that's a 
> feature (no pun intended), not a bug?
> 
> I'm still confused as to why you would have a phase in the 
> first exon, though. Why not just say the CDS starts 1 or 2 bp 
> later? (This is probably a bio question, not a bioperl 
> question, but a quick Google didn't get me an answer. "Phase" 
> isn't a very good search term.)

It could be b/c the location coordinates delineate the exon coding boundary.
It's conceivable the first exon in a sequence record is not the first exon
of the mRNA (i.e. there may be one or more exons prior to or past the exon
of interest that are in 'remote' sequence records).  Like this admittedly
extreme example (GB acc AF130134):

join(AF130124.1:2563..2964,AF130125.1:21..157,AF130126.1:12..174,
AF130127.1:21..112,AF130128.1:21..162,AF130128.1:281..595,
AF130128.1:661..842,AF130128.1:916..1030,AF130129.1:21..115,
AF130130.1:21..165,AF130131.1:21..125,AF130132.1:21..428,
AF130132.1:492..746,AF130133.1:21..168,AF130133.1:232..401,
AF130133.1:475..906,AF130133.1:970..1107,AF130133.1:1176..1367,21..128)

Also, the ends of the location may be uncertain ('fuzzy'):

join(complement(1009..>1260),complement(AF081827.1:<1..177))

> I guess the real question here, which Jason alludes to, is whether
> SeqFeature->spliced_seq ought to take into account the phase
> information of the first exon. Right now, it doesn't, so when you call
> SeqFeature->spliced_seq->translate, you get gibberish. Are there cases
> where you would want spliced_seq to include the first bp or two?
> Should there be an option to spliced_seq for whether you want to take
> phase information into account?
> 
> I can't submit a bug report until we confirm it's a bug.
> 
> Thanks,
> -Amir Karger

You can already pass the frame or an offset to PrimarySeqI::translate().
Here are the args:

 Args    : -terminator    - character for terminator        default is *
           -unknown       - character for unknown           default is X
           -frame         - frame                           default is 0
           -codontable_id - codon table id                  default is 1
           -complete      - complete CDS expected           default is 0
           -throw         - throw exception if not complete default is 0
           -orf           - find 1st ORF                    default is 0
           -start         - alternative initiation codon
           -codontable    - Bio::Tools::CodonTable object
           -offset        - offset for fuzzy locations      default is 0

The offset comes from some GenBank seqfeatures which have a '/codon_start'
tag indicating which nucleotide to start translation from (1,2,3).  This is
essentially just the phase+1.  We could add a '-phase' argument for
convenience which accepts 0,1,2.
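What the proposed '-phase' convenience would amount to can be sketched in plain Perl. This is not BioPerl code: `translate_with_phase` and `codon_start_to_phase` are hypothetical names, and only the standard genetic code (NCBI table 1) is built in. The sketch simply drops `phase` leading bases before reading codons, which is also why a GenBank codon_start of N corresponds to a phase of N - 1. In BioPerl itself, the existing `-frame` argument of translate() listed above appears to serve the same purpose.

```perl
use strict;
use warnings;

# Standard genetic code (NCBI table 1); codons enumerated in TCAG order.
my @BASES = qw(T C A G);
my $AAS   = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my %CODON;
my $idx = 0;
for my $b1 (@BASES) {
    for my $b2 (@BASES) {
        for my $b3 (@BASES) {
            $CODON{"$b1$b2$b3"} = substr($AAS, $idx++, 1);
        }
    }
}

# Hypothetical helper (not BioPerl's translate()): skip $phase leading
# bases, then translate complete codons. Unknown codons become 'X';
# trailing bases that do not form a full codon are ignored.
sub translate_with_phase {
    my ($dna, $phase) = @_;
    $phase = 0 unless defined $phase;
    die "phase must be 0, 1 or 2\n" unless $phase =~ /^[012]$/;
    (my $coding = uc substr($dna, $phase)) =~ tr/U/T/;
    my $protein = '';
    for (my $pos = 0; $pos + 3 <= length $coding; $pos += 3) {
        my $codon = substr($coding, $pos, 3);
        $protein .= exists $CODON{$codon} ? $CODON{$codon} : 'X';
    }
    return $protein;
}

# A GenBank /codon_start of N (1, 2 or 3) corresponds to phase N - 1.
sub codon_start_to_phase { return $_[0] - 1 }

my $dna = 'GATGAAATAG';    # one extra leading base, then ATG AAA TAG
print translate_with_phase($dna, 0), "\n";                          # DEI
print translate_with_phase($dna, codon_start_to_phase(2)), "\n";    # MK*
```

Translating in the wrong phase gives a plausible-looking but wrong peptide, which is the "gibberish" described earlier in the thread.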

chris


From bobfreemanma at speakeasy.net  Fri Dec  8 15:47:15 2006
From: bobfreemanma at speakeasy.net (Bob Freeman)
Date: Fri, 8 Dec 2006 15:47:15 -0500
Subject: [Bioperl-l] writing blastxml
In-Reply-To: <4b5350650610250820w1498b27dnd155896fbf9a2012@mail.gmail.com>
References: <4b5350650610250728s1a421199if2493c9c4660474d@mail.gmail.com>
	<000301c6f846$d6227760$15327e82@pyrimidine>
	<4b5350650610250820w1498b27dnd155896fbf9a2012@mail.gmail.com>
Message-ID: <p0623090bc19f7f46bd1d@[10.0.107.251]>

Can't seem to find a good post on this to answer my question:

Does anyone know a good way to (re)write BLAST reports in XML format? 
I've got about 30,000 reports I need to rewrite for a (good!) piece 
of java software that will only import xml formatted BLAST reports. 
Right now, all mine are plain text.

I don't think bioperl can do this yet, correct? If not, any 
suggestions, besides reblasting all 30,000? I'd like to save a few 
trees and lumps of coal.

TIA,
Bob

-- 

-----------------------------------------------------
Bob Freeman, Ph.D.
Bioinformatics consultant
51 Downer Avenue, #2
Dorchester, MA  02125
617/699.7057, vox

If brains were taxed, he'd get a refund.
-- Anonymous


From camp_boot at hotmail.com  Sun Dec 10 05:00:55 2006
From: camp_boot at hotmail.com (synapse)
Date: Sun, 10 Dec 2006 10:00:55 +0000 (UTC)
Subject: [Bioperl-l] Driver program for PestFind.pm
Message-ID: <loom.20061210T105614-429@post.gmane.org>

   Dear All, 

   I apologize in advance for my almost total lack of knowledge of perl as a 
programming language. 

   I need to use PestFind program, part of the biop_run package of bioperl. My 
understanding is that I will need a simple wrapper program that will read 
arguments from the command line, and pass them to that module. 

   - Is there such program available that I can just use?

   - Does anyone know if pestfind can work on multiple sequence files (in fasta 
format), or does it only process single sequence files?

   Thanks a lot for the feedback. 


From cjfields at uiuc.edu  Sun Dec 10 13:45:26 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 10 Dec 2006 12:45:26 -0600
Subject: [Bioperl-l] writing blastxml
In-Reply-To: <p0623090bc19f7f46bd1d@[10.0.107.251]>
References: <4b5350650610250728s1a421199if2493c9c4660474d@mail.gmail.com>
	<000301c6f846$d6227760$15327e82@pyrimidine>
	<4b5350650610250820w1498b27dnd155896fbf9a2012@mail.gmail.com>
	<p0623090bc19f7f46bd1d@[10.0.107.251]>
Message-ID: <7FB4EBB9-BEDC-4250-BE2F-3F695D36F350@uiuc.edu>


On Dec 8, 2006, at 2:47 PM, Bob Freeman wrote:

> Can't seem to find a good post on this to answer my question:
>
> Does anyone know a good way to (re)write BLAST reports in XML format?
> I've got about 30,000 reports I need to rewrite for a (good!) piece
> of java software that will only import xml formatted BLAST reports.
> Right now, all mine are plain text.
>
> I don't think bioperl can do this yet, correct? If not, any
> suggestions, besides reblasting all 30,000? I'd like to save a few
> trees and lumps of coal.
>
> TIA,
> Bob

The only BioPerl writers for BLAST reports are in BSML and HTML, not  
BLAST XML.  I don't think there have been any requests for it,  
and no one has really stepped forward to submit one.

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Sun Dec 10 13:55:16 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 10 Dec 2006 12:55:16 -0600
Subject: [Bioperl-l] Driver program for PestFind.pm
In-Reply-To: <loom.20061210T105614-429@post.gmane.org>
References: <loom.20061210T105614-429@post.gmane.org>
Message-ID: <32B0F15D-4144-43B6-AA81-5ED9BA848F45@uiuc.edu>


On Dec 10, 2006, at 4:00 AM, synapse wrote:

>    Dear All,
>
>    I apologize in advance for my almost total lack of knowledge of
> perl as a programming language.
>
>    I need to use PestFind program, part of the biop_run package of
> bioperl. My understanding is that I will need a simple wrapper program
> that will read arguments from the command line, and pass them to that
> module.

PestFind is part of the EMBOSS suite of programs:

http://emboss.sourceforge.net/

The PestFind module in bioperl-run is actually used via Pise.

>    - Is there such program available that I can just use?

See above

>    - Does anyone know if pestfind can work on multiple sequence
> files (in fasta format), or does it only process single sequence files?
>
>    Thanks a lot for the feedback.

No idea there, but the EMBOSS docs should tell you.

chris


From cjfields at uiuc.edu  Mon Dec 11 00:38:32 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 10 Dec 2006 23:38:32 -0600
Subject: [Bioperl-l] bioperl-run parameter question
Message-ID: <163AF1E6-7CEA-4CAC-9BA1-84DBA95C494E@uiuc.edu>

I am writing up a few bioperl-run modules and have a simple question,  
though I don't know if anyone knows the answer.  I was curious as to  
why parameters for most (all?) bioperl-run modules lack the '-'  
preceding them.  This came up re: StandAloneBlast last week  
(something Torsten fixed), but I noticed just about every bioperl-run  
module uses the dashless parameters.

chris


From n.haigh at sheffield.ac.uk  Mon Dec 11 01:44:25 2006
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Mon, 11 Dec 2006 06:44:25 +0000
Subject: [Bioperl-l] bioperl-run parameter question
In-Reply-To: <163AF1E6-7CEA-4CAC-9BA1-84DBA95C494E@uiuc.edu>
References: <163AF1E6-7CEA-4CAC-9BA1-84DBA95C494E@uiuc.edu>
Message-ID: <457CFE49.5010201@sheffield.ac.uk>

Chris Fields wrote:
> I am writing up a few bioperl-run modules and have a simple question,  
> though I don't know if anyone knows the answer.  I was curious as to  
> why parameters for most (all?) bioperl-run modules lack the '-'  
> preceding them.  This came up re: StandAloneBlast last week  
> (something Torsten fixed), but I noticed just about every bioperl-run  
> module uses the dashless parameters.
>
> chris
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>   

No idea!

Is there any reason for/against using dashed/dashless parameters? I
suppose dashed parameters allow you to easily see which tokens on the
command line are parameters and which are values. Should modules be able
to accept both? Should dashed be preferred?

Nath


From cjfields at uiuc.edu  Mon Dec 11 08:06:32 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 11 Dec 2006 07:06:32 -0600
Subject: [Bioperl-l] bioperl-run parameter question
In-Reply-To: <457CFE49.5010201@sheffield.ac.uk>
References: <163AF1E6-7CEA-4CAC-9BA1-84DBA95C494E@uiuc.edu>
	<457CFE49.5010201@sheffield.ac.uk>
Message-ID: <D223B6BF-7C0C-41BF-B267-8C07F82FDD7D@uiuc.edu>


On Dec 11, 2006, at 12:44 AM, Nathan S. Haigh wrote:

> Chris Fields wrote:
>> I am writing up a few bioperl-run modules and have a simple question,
>> though I don't know if anyone knows the answer.  I was curious as to
>> why parameters for most (all?) bioperl-run modules lack the '-'
>> preceding them.  This came up re: StandAloneBlast last week
>> (something Torsten fixed), but I noticed just about every bioperl-run
>> module uses the dashless parameters.
>>
>> chris
>>
>
> No idea!
>
> Is there any reason for/against using dashed/dashless parameters? I
> suppose dashed parameters allow you to easily see which tokens on the
> command line are parameters and which are values. Should modules be
> able to accept both? Should dashed be preferred?
>
> Nath

I'm thinking about it from the point of consistency.  When using a  
mix of core and run modules it can be a bit confusing, particularly  
when (as pointed out in the previous thread on StandAloneBlast) you  
can use only dashed parameters with core modules, while most (all?)  
run modules only accept dashless ones (in most cases some exception  
is thrown).  Torsten fixed this in StandAloneBlast so it accepts  
both, but shouldn't this rule also apply to all run modules?
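Here is a minimal sketch of accepting both styles, under stated assumptions: `normalise_args` and the `%BIOPERL_OPTS` list are hypothetical, and this is not how StandAloneBlast actually does it. A single leading dash is stripped from every key, and BioPerl-level options such as -verbose are separated from options meant for the wrapped program. Case is left alone because many command-line programs treat it as significant (blastall's -e and -E, for instance).

```perl
use strict;
use warnings;

# Hypothetical list of options handled by the BioPerl object itself
# rather than passed on to the wrapped program.
my %BIOPERL_OPTS = map { $_ => 1 } qw(verbose program_dir);

# Hypothetical helper: accept '-name => value' and 'name => value' alike.
# Returns two hashrefs: BioPerl-level options and program options.
sub normalise_args {
    my %args = @_;
    my (%bioperl, %program);
    while (my ($key, $value) = each %args) {
        (my $name = $key) =~ s/^-//;    # strip one leading dash, keep case
        if ($BIOPERL_OPTS{lc $name}) {
            $bioperl{lc $name} = $value;
        }
        else {
            $program{$name} = $value;
        }
    }
    return (\%bioperl, \%program);
}

my ($bp_opts, $prog_opts) = normalise_args(-verbose => 1, '-e' => 10, E => 1);
print "verbose (BioPerl): $bp_opts->{verbose}\n";
print "e (program): $prog_opts->{e}, E (program): $prog_opts->{E}\n";
```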

Much of this is probably due to the donated nature of much  
of the bioperl-run code and Jason's 'cat-herding', and I understand  
that it would be a lot of work to change this for all run modules.   
However, we could at least try to start enforcing some loose rules  
with new bioperl-run wrappers (e.g. implement WrapperBase, use core- 
like parameters, etc).

chris


From akarger at CGR.Harvard.edu  Mon Dec 11 11:20:03 2006
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Mon, 11 Dec 2006 11:20:03 -0500
Subject: [Bioperl-l] Using frame info from GFF in getting
	aSeq->spliced_seq
Message-ID: <B9182BFF5B004245BABC12956EA6322E0254DD2E@huls5.nucleus.harvard.edu>

Chris Fields wrote:
> 
> > Yes, I think. Scott Cain pointed out that GFF column 8 is the 
> > "phase", which I had never heard of before. My current, very 
> > limited, understanding is that sometimes you'll have an exon 
> > with, say, 31 bp, followed by an exon with 29 bp. When the 
> > intron gets spliced out, you eventually get an mRNA of 60 bp, 
> > which translates to a protein of 20 aa.
> > But the second exon has a phase of 1, not 0, because you 
> > can't just start translating at the first bp of the second 
> > exon and expect to get nice amino acids.
> 
> I think the use of 'frame' here is meant relative to the DNA sequence
> (i.e. ORF searching, 6 frames) and the 'phase' is relative to the mRNA
> (i.e. translation, three frames).  At least I think that's what is meant!

I agree. By the way, I'd love a reference to a simple bio-explanation of
what's happening here. Google searches for "coding sequence phase" are
not all that relevant.

> > I'm still confused as to why you would have a phase in the 
> > first exon, though. Why not just say the CDS starts 1 or 2 bp 
> > later? (This is probably a bio question, not a bioperl 
> > question, but a quick Google didn't get me an answer. "Phase" 
> > isn't a very good search term.)
> 
> It could be b/c the location coordinates delineate the exon coding
> boundary. It's conceivable the first exon in a sequence record is not
> the first exon of the mRNA (i.e. there may be one or more exons prior
> to or past the exon of interest that are in 'remote' sequence records).

That's certainly not the case here, because the files have the entire
genomes in them.

> Also, the ends of the location may be uncertain ('fuzzy'):
> 
> join(complement(1009..>1260),complement(AF081827.1:<1..177))

Also not the case here. These locations aren't listed as fuzzy.

Any other thoughts?

> > I guess the real question here, which Jason alludes to, is whether
> > SeqFeature->spliced_seq ought to take into account the phase
> > information of the first exon. Right now, it doesn't, so when you call
> > SeqFeature->spliced_seq->translate, you get gibberish. Are there cases
> > where you would want spliced_seq to include the first bp or two?
> > Should there be an option to spliced_seq for whether you want to take
> > phase information into account?
> 
> You can already pass the frame or an offset to PrimarySeqI::translate().
>  We could add a '-phase' argument for convenience which accepts 0,1,2.

But as Jason pointed out, you should find the problem earlier. What if I
want to get the RNA sequence that will become the protein? then having a
phase arg to translate() doesn't help. Should there be a phase arg to
spliced_seq?

Which raises another bio question: at what point are the first 1 or 2 bp
dropped when you have a phase of 1 or 2? Do they appear in the mRNA? 

-Amir Karger


From bix at sendu.me.uk  Mon Dec 11 13:21:42 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 11 Dec 2006 13:21:42 -0500
Subject: [Bioperl-l] bioperl-run parameter question
In-Reply-To: <163AF1E6-7CEA-4CAC-9BA1-84DBA95C494E@uiuc.edu>
References: <163AF1E6-7CEA-4CAC-9BA1-84DBA95C494E@uiuc.edu>
Message-ID: <457DA1B6.1060706@sendu.me.uk>

Chris Fields wrote:
> I am writing up a few bioperl-run modules and have a simple question,  
> though I don't know if anyone knows the answer.  I was curious as to  
> why parameters for most (all?) bioperl-run modules lack the '-'  
> preceding them.  This came up re: StandAloneBlast last week  
> (something Torsten fixed), but I noticed just about every bioperl-run  
> module uses the dashless parameters.

I didn't follow that particular thread, but from my experience there is 
a useful distinction between bioperl options using the - as normal for 
full consistency with core (eg. -verbose), whilst the options that 
belong to the program the run module is a wrapper for do not take 
dashes. Again, this seems consistent within the run package.

I'd suggest sticking to the current pattern.


Cheers,
Sendu.


From cjfields at uiuc.edu  Mon Dec 11 15:07:16 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 11 Dec 2006 14:07:16 -0600
Subject: [Bioperl-l] bioperl-run parameter question
In-Reply-To: <457DA1B6.1060706@sendu.me.uk>
References: <163AF1E6-7CEA-4CAC-9BA1-84DBA95C494E@uiuc.edu>
	<457DA1B6.1060706@sendu.me.uk>
Message-ID: <F8A9FAC2-A189-463B-B8CA-E66D66863553@uiuc.edu>


On Dec 11, 2006, at 12:21 PM, Sendu Bala wrote:

> Chris Fields wrote:
>> I am writing up a few bioperl-run modules and have a simple  
>> question,  though I don't know if anyone knows the answer.  I was  
>> curious as to  why parameters for most (all?) bioperl-run modules  
>> lack the '-'  preceding them.  This came up re: StandAloneBlast  
>> last week  (something Torsten fixed), but I noticed just about  
>> every bioperl-run  module uses the dashless parameters.
>
> I didn't follow that particular thread, but from my experience  
> there is a useful distinction between bioperl options using the -  
> as normal for full consistency with core (eg. -verbose), whilst the  
> options that belong to the program the run module is a wrapper for  
> do not take dashes. Again, this seems consistent within the run  
> package.

I respectfully disagree that this is a 'useful' distinction.  My main  
point is consistency.  To me, it's counterintuitive to have two  
Bioperl classes, both of which inherit Bio::Root::Root, use two  
different syntaxes for any parameters passed to the constructor, even  
if some are 'program' parameters.  It's also not consistent with  
StandAloneBlast or RemoteBlast, both of which are considered bioperl-run  
modules even though they are in core, and both of which use dashed  
parameters (StandAloneBlast actually allows both).  In fact, it isn't  
consistent within bioperl-run itself.   
Bio::Tools::Run::EMBOSSApplication uses dashes for parameters in a  
hashref!

Okay, judging by the previous examples, 'consistency' isn't a word I  
would use to describe bioperl-run as a whole (back to Jason's 'cat- 
herding' analogy).  It would be easier to let it slide for now,  
especially since changing them would be a serious pain, not to  
mention an API issue.  But shouldn't there be some consistency?

And what about new modules?  Do we follow the historical (possibly  
confusing) 'dashless' route, or use the core-like dashed approach  
(thus breaking from the other run modules)?

> I'd suggest sticking to the current pattern.
>
>
> Cheers,
> Sendu.

I'll allow for both, à la StandAloneBlast.  Doesn't hurt to be safe. ; >

Have fun at the hackathon!

chris


From bix at sendu.me.uk  Mon Dec 11 16:19:55 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 11 Dec 2006 16:19:55 -0500
Subject: [Bioperl-l] bioperl-run parameter question
In-Reply-To: <F8A9FAC2-A189-463B-B8CA-E66D66863553@uiuc.edu>
References: <163AF1E6-7CEA-4CAC-9BA1-84DBA95C494E@uiuc.edu>
	<457DA1B6.1060706@sendu.me.uk>
	<F8A9FAC2-A189-463B-B8CA-E66D66863553@uiuc.edu>
Message-ID: <457DCB7B.8050500@sendu.me.uk>

Chris Fields wrote:
> 
> On Dec 11, 2006, at 12:21 PM, Sendu Bala wrote:
> 
>> Chris Fields wrote:
>>> I am writing up a few bioperl-run modules and have a simple 
>>> question,  though I don't know if anyone knows the answer.  I was 
>>> curious as to  why parameters for most (all?) bioperl-run modules 
>>> lack the '-'  preceding them.  This came up re: StandAloneBlast last 
>>> week  (something Torsten fixed), but I noticed just about every 
>>> bioperl-run  module uses the dashless parameters.
>>
>> I didn't follow that particular thread, but from my experience there 
>> is a useful distinction between bioperl options using the - as normal 
>> for full consistency with core (eg. -verbose), whilst the options that 
>> belong to the program the run module is a wrapper for do not take 
>> dashes. Again, this seems consistent within the run package.
> 
> I respectfully disagree that this is a 'useful' distinction.  My main 
> point is consistency.
[snip]

We're on the same page in terms of what we think would be a Good Thing, 
and allowing both ways (dashed and dashless) sounds reasonable. I was 
just suggesting why bioperl-run might be the way it was. Further to 
that, there is the practical aspect that it is a lot simpler to figure 
out which are the program options so they can be farmed out to the 
AUTOLOAD methods - again something that isn't done in core.

If you come up with some generic way of dealing with options and farming 
to AUTOLOAD, perhaps there's scope for applying it to all the run 
wrappers (ideally via one of their base classes), so they all instantly 
gain dashed-mode capability.


From cjfields at uiuc.edu  Mon Dec 11 17:05:56 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 11 Dec 2006 16:05:56 -0600
Subject: [Bioperl-l] bioperl-run parameter question
In-Reply-To: <457DCB7B.8050500@sendu.me.uk>
References: <163AF1E6-7CEA-4CAC-9BA1-84DBA95C494E@uiuc.edu>
	<457DA1B6.1060706@sendu.me.uk>
	<F8A9FAC2-A189-463B-B8CA-E66D66863553@uiuc.edu>
	<457DCB7B.8050500@sendu.me.uk>
Message-ID: <F046DB23-35C7-414A-8616-46D3C5760B49@uiuc.edu>


On Dec 11, 2006, at 3:19 PM, Sendu Bala wrote:
...

>>
>> I respectfully disagree that this is a 'useful' distinction.  My main
>> point is consistency.
> [snip]
>
> We're on the same page in terms of what we think would be a Good
> Thing, and allowing both ways (dashed and dashless) sounds reasonable.
> I was just suggesting why bioperl-run might be the way it was. Further
> to that, there is the practical aspect that it is a lot simpler to
> figure out which are the program options so they can be farmed out to
> the AUTOLOAD methods - again something that isn't done in core.

Maybe b/c AUTOLOAD is frowned upon for a number of reasons, mainly  
code maintenance.  I'm somewhat neutral on the idea of using AUTOLOAD  
as a short-term solution, though using heredoc and an eval{} block  
works well for me (and shows up when using $self->can('method') or  
when checking for methods via Class::Inspector).
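Here is one possible reading of "heredoc and an eval{} block", sketched under assumptions: the package `My::Run::Wrapper` and its parameter names are hypothetical. Accessor source is built from a heredoc template and compiled with a string eval when the module loads (a block eval{} can only trap errors; it cannot compile generated text). As a result, the methods genuinely exist and can() reports them, which is not the case for methods that are only resolved through AUTOLOAD.

```perl
use strict;
use warnings;

package My::Run::Wrapper;    # hypothetical wrapper class, not in bioperl-run

# Program parameters to generate accessors for (illustrative names only)
my @PARAMS   = qw(evalue wordsize matrix);
my %IS_PARAM = map { $_ => 1 } @PARAMS;

for my $param (@PARAMS) {
    # Heredoc template for a plain get/set accessor
    my $code = <<"END_CODE";
sub $param {
    my \$self = shift;
    \$self->{'_$param'} = shift if \@_;
    return \$self->{'_$param'};
}
END_CODE
    # A string eval compiles the generated source into this package
    eval $code;
    die "Could not create accessor '$param': $@" if $@;
}

sub new {
    my ($class, %args) = @_;
    my $self = bless {}, $class;
    while (my ($key, $value) = each %args) {
        (my $name = $key) =~ s/^-//;    # accept dashed and dashless names
        $self->$name($value) if $IS_PARAM{$name};
    }
    return $self;
}

package main;

my $run = My::Run::Wrapper->new(-evalue => 0.001, matrix => 'BLOSUM62');
print 'can(evalue): ', (My::Run::Wrapper->can('evalue') ? 'yes' : 'no'), "\n";
print 'matrix: ', $run->matrix, "\n";
```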

> If you come up with some generic way of dealing with options and
> farming to AUTOLOAD, perhaps there's scope for applying it to all the
> run wrappers (ideally via one of their base classes), so they all
> instantly gain dashed-mode capability.

I think that's the crux of the problem; they do not all have the same  
base class (except Bio::Root::Root).  Most use WrapperBase.  I  
thought at one point a Run-specific root module would be a good idea,  
but WrapperBase already works well.

I'll go ahead with my modules and think about it some more.  You  
could ask the powers-that-be (jason, hilmar, etc) what they think as  
well.

chris


From bosborne11 at verizon.net  Mon Dec 11 17:24:54 2006
From: bosborne11 at verizon.net (Brian Osborne)
Date: Mon, 11 Dec 2006 17:24:54 -0500
Subject: [Bioperl-l] Using frame info from GFF in getting
	aSeq->spliced_seq
In-Reply-To: <B9182BFF5B004245BABC12956EA6322E0254DD2E@huls5.nucleus.harvard.edu>
Message-ID: <C1A344E6.BE53%bosborne11@verizon.net>

Amir,

Google "intron phase", you will see a number of useful links.

Brian O.


On 12/11/06 11:20 AM, "Amir Karger" <akarger at CGR.Harvard.edu> wrote:

> I agree. By the way, I'd love a reference to a simple bio-explanation of
> what's happening here. Google searches for "coding sequence phase" are
> not all that relevant.


From cjfields at uiuc.edu  Mon Dec 11 22:20:06 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 11 Dec 2006 21:20:06 -0600
Subject: [Bioperl-l] Using frame info from GFF in getting
	aSeq->spliced_seq
In-Reply-To: <B9182BFF5B004245BABC12956EA6322E0254DD2E@huls5.nucleus.harvard.edu>
References: <B9182BFF5B004245BABC12956EA6322E0254DD2E@huls5.nucleus.harvard.edu>
Message-ID: <E6F0CA09-EF9F-42AF-BF67-35E4FDBCAD8C@uiuc.edu>


On Dec 11, 2006, at 10:20 AM, Amir Karger wrote:

>> I think the use of 'frame' here is meant relative to the DNA sequence
>> (i.e. ORF searching, 6 frames) and the 'phase' is relative to the mRNA
>> (i.e. translation, three frames).  At least I think that's what is
>> meant!
>
> I agree. By the way, I'd love a reference to a simple bio-explanation
> of what's happening here. Google searches for "coding sequence phase"
> are not all that relevant.

Ah, Brian found some links I see...

>> It could be b/c the location coordinates delineate the exon coding
>> boundary. It's conceivable the first exon in a sequence record is not
>> the first exon of the mRNA (i.e. there may be one or more exons prior
>> to or past the exon of interest that are in 'remote' sequence records).
>
> That's certainly not the case here, because the files have the entire
> genomes in them.
>
>> Also, the ends of the location may be uncertain ('fuzzy'):
>>
>> join(complement(1009..>1260),complement(AF081827.1:<1..177))
>
> Also not the case here. These locations aren't listed as fuzzy.
>
> Any other thoughts?

Which GFF files did you use?  More specifically, which genes in which  
GFF file?  I saw a reference to S. bayanus, but it's hard to work out  
what could be the problem unless we know a bit more.

>>> I guess the real question here, which Jason alludes to, is whether
>>> SeqFeature->spliced_seq ought to take into account the phase
>>> information of the first exon. Right now, it doesn't, so when you call
>>> SeqFeature->spliced_seq->translate, you get gibberish. Are there cases
>>> where you would want spliced_seq to include the first bp or two?
>>> Should there be an option to spliced_seq for whether you want to take
>>> phase information into account?
>>
>> You can already pass the frame or an offset to PrimarySeqI::translate().
>>  We could add a '-phase' argument for convenience which accepts 0,1,2.
>
> But as Jason pointed out, you should find the problem earlier. What if
> I want to get the RNA sequence that will become the protein? then
> having a phase arg to translate() doesn't help. Should there be a phase
> arg to spliced_seq?

You'll also note Jason mentioned there were possible errors in the  
gene prediction programs which produced the output.

spliced_seq() is supposed to return the DNA sequence of a split  
location by splicing together the sublocation sequences in their  
'join' order.  So, if the first exon was out of phase, once spliced  
they should all be out of phase to the same degree, assuming all  
exons are joined together correctly.   Translating this using the  
phase should produce the correct amino acid sequence.

Note that Jason suggested passing the frame/phase of the first exon  
to translate(), not spliced_seq().  I also suggested translate().
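The splicing behaviour described above, plus trimming to the codon-aligned bases, can be sketched in plain Perl. This is not BioPerl's implementation. `splice_exons` and `coding_dna` are hypothetical helpers. The sketch assumes that coordinates are 1-based and inclusive, that exons are listed in 'join' (transcription) order, and that minus-strand pieces are reverse-complemented individually before concatenation. Because the phase bases stay in the spliced sequence, the phase only has to be applied once, at the start.

```perl
use strict;
use warnings;

# Hypothetical helper: reverse complement of a DNA string
sub revcomp {
    my ($dna) = @_;
    my $s = reverse $dna;
    $s =~ tr/ACGTacgt/TGCAtgca/;
    return $s;
}

# Hypothetical helper mimicking what spliced_seq() is described as doing:
# concatenate exon sequences in 'join' order. Coordinates are 1-based and
# inclusive; on the minus strand each exon is reverse-complemented and the
# exons must already be listed in transcription order.
sub splice_exons {
    my ($genome, $strand, @exons) = @_;    # @exons holds [start, end] pairs
    my $spliced = '';
    for my $exon (@exons) {
        my ($start, $end) = @$exon;
        my $piece = substr($genome, $start - 1, $end - $start + 1);
        $spliced .= $strand < 0 ? revcomp($piece) : $piece;
    }
    return $spliced;
}

# Hypothetical helper: skip $phase leading bases (phase of the first exon)
# and any incomplete trailing codon, leaving exactly 3N coding bases.
sub coding_dna {
    my ($spliced, $phase) = @_;
    my $cds = substr($spliced, $phase);
    return substr($cds, 0, length($cds) - length($cds) % 3);
}

# Toy genome: exon 1 = 3..8, intron = 9..13, exon 2 = 14..17
my $genome  = 'CCGATGAAGTAAGATAGCC';
my $spliced = splice_exons($genome, 1, [3, 8], [14, 17]);
print "$spliced\n";                     # GATGAAATAG
print coding_dna($spliced, 1), "\n";    # ATGAAATAG
```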

> Which raises another bio question: at what point are the first 1 or
> 2 bp dropped when you have a phase of 1 or 2? Do they appear in the
> mRNA?
>
> -Amir Karger

Any sequence present in the sublocations (exons) would be in the  
spliced sequence.  This would have to include those nucleotides in  
exons skipped b/c of the phase since they are part of the coding region.

chris


From neetisomaiya at gmail.com  Tue Dec 12 07:06:20 2006
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Tue, 12 Dec 2006 17:36:20 +0530
Subject: [Bioperl-l] need help in phredPhrap
Message-ID: <764978cf0612120406m796b116dncd3a9e6c82ffe682@mail.gmail.com>

Hi,

I am running phredPhrap, which runs phred, phrap and polyphred. Please refer
to the "Using a reference sequence" section of this link
http://droog.mbt.washington.edu/poly_doc50.html#REFER.
I am using the reference sequence as described in the link above.
With this I am getting the SNP positions on the contig sequence as well as
on the reference sequence.
Does anyone know if there is some output file which can also give me mapping
between contig sequence and reference sequence?
-- 
-Neeti
Even my blood says, B positive


From akarger at CGR.Harvard.edu  Tue Dec 12 11:05:43 2006
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Tue, 12 Dec 2006 11:05:43 -0500
Subject: [Bioperl-l] Using frame info from GFF in getting
	aSeq->spliced_seq
Message-ID: <B9182BFF5B004245BABC12956EA6322E0254DF2D@huls5.nucleus.harvard.edu>

(sorry if this thread is boring people)

Chris Fields wrote: 

> > I agree. By the way, I'd love a reference to a simple
> > bio-explanation of what's happening here. Google searches for
> > "coding sequence phase" are not all that relevant.
> 
> Ah, Brian found some links I see...

Thanks, Brian! Amazing how "coding sequence phase" finds nothing but
"intron phase" finds a ton. This is why you need to actually learn
biology, rather than Googling it.

> Which GFF files did you use?  More specifically, which genes in which
> GFF file?  I saw a reference to S. bayanus, but it's hard to work out
> what could be the problem unless we know a bit more.

http://fungal.genome.duke.edu/annotations/sbay/gff/saccharomyces_bayanus.20031001.AUGUSTUS.gff3.gz
(Thanks for a Really Useful site, Jason!)

c127 (for example) has two lines in that file:
sbay_c127       AUGUSTUS        mRNA    263     723     .       +       .       ID=sbay_c127-g1.1
sbay_c127       AUGUSTUS        CDS     263     723     .       +       1       Parent=sbay_c127-g1.1

Now go to gbrowse page:
http://fungal.genome.duke.edu/cgi-bin/gbrowse/sbay/
Type "sbay_c127:250-300" in the search box. 

As you can see from the translation track, if you start at bp 263, you
hit a stop codon after just a few aas. But if you use frame2/phase 1,
you get no stop codons all the way to the end of the contig.
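Here is a small sketch of reading the phase straight from a GFF3 CDS line. The subroutine `first_codon_start` is hypothetical. It assumes that the nine columns are tab-separated, as GFF3 requires (the lines above are shown padded with spaces), and that phase is counted from the feature's 5' end, i.e. from `end` on the minus strand. For the c127 CDS it reports 264, the base at which translation actually begins.

```perl
use strict;
use warnings;

# Hypothetical helper: split one GFF3 line into its nine tab-separated
# columns and return the coordinate of the first base of the first
# complete codon of a CDS feature.
sub first_codon_start {
    my ($line) = @_;
    chomp $line;
    my ($seqid, $source, $type, $start, $end, $score, $strand, $phase, $attrs)
        = split /\t/, $line;
    die "not a CDS line\n" unless defined $type && $type eq 'CDS';
    die "CDS phase must be 0, 1 or 2\n"
        unless defined $phase && $phase =~ /^[012]$/;
    # Phase is counted from the 5' end: 'start' on +, 'end' on -
    return $strand eq '-' ? $end - $phase : $start + $phase;
}

my $cds = join "\t",
    qw(sbay_c127 AUGUSTUS CDS 263 723 . + 1 Parent=sbay_c127-g1.1);
print first_codon_start($cds), "\n";    # 264
```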

> >> You can already pass the frame or an offset to
> >> PrimarySeqI::translate().  We could add a '-phase' argument for
> >> convenience which accepts 0,1,2.
> >
> >  What if I want to get the RNA sequence that will become the
> > protein? then having a phase arg to translate() doesn't help. Should
> > there be a phase arg to spliced_seq?
> 
> You'll also note Jason mentioned there were possible errors in the  
> gene prediction programs which produced the output

That's certainly possible. No gene prediction program will be perfect.
In this case, though, it's clear that it found a large region without
stop codons in it, and correctly identified the place to start
translating. I guess I'm just surprised that, if it found just one exon
in a gene (in the whole contig) why it would say the exon starts at 263
with a phase 1, instead of just saying it starts at 264.

> spliced_seq() is supposed to return the DNA sequence of a split  
> location by splicing together the sublocation sequences in their  
> 'join' order.  So, if the first exon was out of phase, once spliced  
> they should all be out of phase to the same degree, assuming all  
> exons are joined together correctly.   Translating this using the  
> phase should produce the correct amino acid sequence.
> 
> Note that Jason suggested passing the frame/phase of the first exon  
> to translate(), not spliced_seq().  I also suggested translate().

You're right. This brings the number of translated polypeptide sequences
that have lots of *s in them to 9 instead of 90. 

I guess I have two requests here. The first is, if a person wants to see
exactly which bps are translated to aas -- a nucelotide sequece of
exactly 3N bp starting (usually) with ATG -- then they might want an
argument to spliced_seq that skips the first one or two bp when
necessary. After all, they might want to study the DNA, not the
peptides.

The second request is for "intelligent objects". If my SeqFeatures know
that they're in phase 1, then when I call spliced_seq I want the
resulting objects to know that they're phase one, such that when I call
translate, Bioperl automatically skips the first bp or two. Admittedly,
there might be big ramifications to this.

Both requests of course made in the knowledge that Bioperl is open
source & developers have a lot to do with their time.

-Amir Karger

> > Which raises another bio question: at what point are the first 1 or
> > 2 bp dropped when you have a phase of 1 or 2? Do they appear in the
> > mRNA?
> >
> > -Amir Karger
> 
> Any sequence present in the sublocations (exons) would be in the  
> spliced sequence.  This would have to include those nucleotides in  
> exons skipped b/c of the phase since they are part of the 
> coding region.
> 
> chris
> 


From neetisomaiya at gmail.com  Tue Dec 12 07:14:10 2006
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Tue, 12 Dec 2006 17:44:10 +0530
Subject: [Bioperl-l] needle parser in bioperl?
Message-ID: <764978cf0612120414o1eb77e28l1132eb4fa4cd9e1d@mail.gmail.com>

Hi,

Does anyone know of a bioperl parser for needle output? Basically I want to
know where the target sequence aligns on the template (i.e. the coordinate on
the template where the target aligns).

-- 
-Neeti
Even my blood says, B positive


From cjfields at uiuc.edu  Tue Dec 12 11:57:27 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 12 Dec 2006 10:57:27 -0600
Subject: [Bioperl-l] needle parser in bioperl?
In-Reply-To: <764978cf0612120414o1eb77e28l1132eb4fa4cd9e1d@mail.gmail.com>
References: <764978cf0612120414o1eb77e28l1132eb4fa4cd9e1d@mail.gmail.com>
Message-ID: <C60106D0-9A11-4B67-8B3D-87DF885F1D40@uiuc.edu>


On Dec 12, 2006, at 6:14 AM, neeti somaiya wrote:

> Hi,
>
> Does anyone know of a bioperl parser for needle output? Basically I
> want to know where the target sequence aligns on the template (i.e.
> the coordinate on the template where the target aligns).
>
> -- 
> -Neeti
> Even my blood says, B positive

I answered this a number of months back:

http://tinyurl.com/yzlbx5

Basically, newer versions of EMBOSS have changed the output for the  
AlignIO::emboss parser (which parses needle).  I don't believe the  
parser has been fixed to deal with that, but Jason has pointed out  
you can use MSF output when running needle, then parse using AlignIO  
with the format set to 'msf'.

chris


From bosborne11 at verizon.net  Tue Dec 12 11:51:05 2006
From: bosborne11 at verizon.net (Brian Osborne)
Date: Tue, 12 Dec 2006 11:51:05 -0500
Subject: [Bioperl-l] needle parser in bioperl?
In-Reply-To: <764978cf0612120414o1eb77e28l1132eb4fa4cd9e1d@mail.gmail.com>
Message-ID: <C1A44829.BE76%bosborne11@verizon.net>

Neeti,

EMBOSS' needle and water produce alignments in what Bioperl calls 'emboss'
format, so you can use AlignIO to get SimpleAlign objects. The best
description of how to use SimpleAlign is the documentation in the module.

Brian O.


On 12/12/06 7:14 AM, "neeti somaiya" <neetisomaiya at gmail.com> wrote:

> Hi,
> 
> Does anyone know of a bioperl parser for needle output? Basically I want to
> know where the target sequence aligns on the template (i.e. the coordinate on
> the template where the target aligns).


From kaboroev at sfu.ca  Tue Dec 12 12:14:39 2006
From: kaboroev at sfu.ca (Keith Anthony Boroevich)
Date: Tue, 12 Dec 2006 09:14:39 -0800
Subject: [Bioperl-l] BLAST reports
Message-ID: <457EE37F.4020000@sfu.ca>

Hi everyone,

I would like to manipulate my blast results with bioperl but would also
like to have the html output of the blast.  What would be the best way
of going about this, as I don't see any write functions in any of the
blast modules I have looked at.  Would it be better to create my own
html layout from the blast data than attempt to recover this from bioperl?

keith

p.s. - does anyone know what the most informative blast "alignment view"
output is? XML, I suppose?

-- 
 ><)))?> -cGRASP- <?(((><
 Keith Anthony Boroevich
 Davidson Lab
 Dept of Molecular Biology
 Simon Fraser University
 Tel: 604-268-7276


From cjfields at uiuc.edu  Tue Dec 12 13:45:05 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 12 Dec 2006 12:45:05 -0600
Subject: [Bioperl-l] Using frame info from GFF in getting
	aSeq->spliced_seq
In-Reply-To: <B9182BFF5B004245BABC12956EA6322E0254DF2D@huls5.nucleus.harvard.edu>
References: <B9182BFF5B004245BABC12956EA6322E0254DF2D@huls5.nucleus.harvard.edu>
Message-ID: <E073C68D-F5FD-4C48-A3E4-925B696E956A@uiuc.edu>


On Dec 12, 2006, at 10:05 AM, Amir Karger wrote:
...

> http://fungal.genome.duke.edu/annotations/sbay/gff/saccharomyces_bayanus.20031001.AUGUSTUS.gff3.gz
> (Thanks for a Really Useful site, Jason!)
>
> c127 (for example) has two lines in that file:
> sbay_c127       AUGUSTUS        mRNA    263     723     .       +       .       ID=sbay_c127-g1.1
> sbay_c127       AUGUSTUS        CDS     263     723     .       +       1       Parent=sbay_c127-g1.1
>
> Now go to gbrowse page:
> http://fungal.genome.duke.edu/cgi-bin/gbrowse/sbay/
> Type "sbay_c127:250-300" in the search box.
>
> As you can see from the translation track, if you start at bp 263, you
> hit a stop codon after just a few aas. But if you use frame2/phase 1,
> you get no stop codons all the way to the end of the contig.

Yes, but there are two things.  First, there is no distinct start  
codon.  Second, this is what the top NCBI BLASTX hit for that  
particular exon is:

 >gi|6323195|ref|NP_013267.1| Essential 100kDa subunit of the exocyst complex
(Sec3p, Sec5p, Sec6p, Sec8p, Sec10p, Sec15p, Exo70p, and Exo84p), which has
the essential function of mediating polarized targeting of secretory vesicles
to active sites of exocytosis; Sec10p [Saccharomyces cerevisiae]
  gi|2498891|sp|Q06245|SEC10_YEAST Exocyst complex component SEC10
  gi|1234854|gb|AAB67490.1| L9362.12 gene product
  gi|1781307|emb|CAA70041.1| 100 kD exocyst complex component [Saccharomyces cerevisiae]
Length=871

  Score =  285 bits (728),  Expect = 7e-77
  Identities = 141/152 (92%), Positives = 149/152 (98%), Gaps = 0/152 (0%)
  Frame = +2

Query  2    FNDFYSMGKSDIVEQLRLSKNWKFNLKSVILMKNLLILSSKLETNSIPKTINTKLIIEKY  181
            +NDFYSMGKSDIVEQLRLSKNWK NLKSV LMKNLLILSSKLET+SIPKTINTKL+IEKY
Sbjct  168  YNDFYSMGKSDIVEQLRLSKNWKLNLKSVKLMKNLLILSSKLETSSIPKTINTKLVIEKY  227

Query  182  SEMMENKLLENFNSAYRENNFTKLNEIAIILNNFNGGVNVIQSFINQHDYFIDTKQIDLE  361
            SEMMEN+LLENFNSAYRENNFTKLNEIAIILNNFNGGVNVIQSFINQHDYFIDTKQIDLE
Sbjct  228  SEMMENELLENFNSAYRENNFTKLNEIAIILNNFNGGVNVIQSFINQHDYFIDTKQIDLE  287

Query  362  NEFENVFIKNVKFKERLVDFESHSVIVEASMQ  457
            NEFENVFIKNVKFKE+L+DFE+HSVI+E SMQ
Sbjct  288  NEFENVFIKNVKFKEQLIDFENHSVIIETSMQ  319


Note the query start is well into the predicted coding sequence.   
Both the lack of a start codon and the above BLASTX hit suggest this  
is not actually the first exon in the coding region.  Therefore the  
sequence retrieved from spliced_seq() is only part of the full coding  
region (it seems to lack at least one 3' exon as well).

>>>> You can already pass the frame or an offset to
>>>> PrimarySeqI::translate(). We could add a '-phase' argument for
>>>> convenience which accepts 0,1,2.
>>>
>>> What if I want to get the RNA sequence that will become the protein?
>>> Then having a phase arg to translate() doesn't help. Should there be
>>> a phase arg to spliced_seq?
>>
>> You'll also note Jason mentioned there were possible errors in the
>> gene prediction programs which produced the output
>
> That's certainly possible. No gene prediction program will be perfect.
> In this case, though, it's clear that it found a large region without
> stop codons in it, and correctly identified the place to start
> translating. I guess I'm just surprised that, if it found just one exon
> in a gene (in the whole contig), it would say the exon starts at 263
> with a phase 1 instead of just saying it starts at 264.

Maybe the gene prediction didn't find the first exon, or didn't tie  
the predicted exons together.  Not unusual considering the number of  
predictions made.

>> spliced_seq() is supposed to return the DNA sequence of a split
>> location by splicing together the sublocation sequences in their
>> 'join' order.  So, if the first exon was out of phase, once spliced
>> they should all be out of phase to the same degree, assuming all
>> exons are joined together correctly.   Translating this using the
>> phase should produce the correct amino acid sequence.
>>
>> Note that Jason suggested passing the frame/phase of the first exon
>> to translate(), not spliced_seq().  I also suggested translate().
>
> You're right. This brings the number of translated polypeptide sequences
> that have lots of *s in them to 9 instead of 90.
>
> I guess I have two requests here. The first is, if a person wants to see
> exactly which bps are translated to aas -- a nucleotide sequence of
> exactly 3N bp starting (usually) with ATG -- then they might want an
> argument to spliced_seq that skips the first one or two bp when
> necessary. After all, they might want to study the DNA, not the
> peptides.
>
> The second request is for "intelligent objects". If my SeqFeatures know
> that they're in phase 1, then when I call spliced_seq I want the
> resulting objects to know that they're phase one, such that when I call
> translate, Bioperl automatically skips the first bp or two. Admittedly,
> there might be big ramifications to this.
>
> Both requests of course made in the knowledge that Bioperl is open
> source & developers have a lot to do with their time.
>
> -Amir Karger

You may want to post these as enhancement requests to Bugzilla just  
so we can keep track.  I think passing a phase parameter to  
spliced_seq() can be easily accomplished; it's just a matter of  
returning a subseq of the spliced sequence based on the phase if  
set.  In fact, I am testing it out now.
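The approach described here (return a subseq of the spliced sequence based on the phase) can be sketched on a plain string as below. This is a hypothetical helper, not the code being tested for BioPerl. The optional trailing-codon trimming is an assumption added to cover Amir's "exactly 3N bp" request, and the handling of an invalid phase (warn, then fall back to 0) is also an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: drop the first $phase bases (GFF-style phase,
# 0, 1 or 2) from a spliced coding sequence given as a plain string.
# If $whole_codons is true, also drop a trailing partial codon so the
# result is exactly 3N bases long.
sub trim_by_phase {
    my ($dna, $phase, $whole_codons) = @_;
    $phase = 0 unless defined $phase;
    if ($phase !~ /^[012]\z/) {
        warn "phase must be 0, 1 or 2 (got '$phase'); using 0\n";
        $phase = 0;
    }
    return '' if length($dna) <= $phase;
    my $trimmed = substr($dna, $phase);
    if ($whole_codons) {
        my $extra = length($trimmed) % 3;
        $trimmed = substr($trimmed, 0, length($trimmed) - $extra);
    }
    return $trimmed;
}

print trim_by_phase('AATGGCCTAGC', 1, 1), "\n";   # prints ATGGCCTAG
```

Keeping the 3N trimming optional matches the concern below that removing nucleotides should not be the default.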

The second may be more problematic, since there may be a time when  
one would want those extra nucleotides, so I don't think we would  
want removal of said nucleotides to be the default behavior.

Chris


From dmessina at wustl.edu  Tue Dec 12 13:44:29 2006
From: dmessina at wustl.edu (David Messina)
Date: Tue, 12 Dec 2006 12:44:29 -0600
Subject: [Bioperl-l] BLAST reports
In-Reply-To: <457EE37F.4020000@sfu.ca>
References: <457EE37F.4020000@sfu.ca>
Message-ID: <083B4D17-CC7A-406C-9037-4DA5DC31AA05@wustl.edu>

Hi Keith,

Take a look at:
http://www.bioperl.org/wiki/HOWTO:SearchIO

You can read in a whole bunch of different blast formats (see Table  
1), and it is possible to write out in HTML. See:

http://www.bioperl.org/wiki/HOWTO:SearchIO#Writing_and_formatting_output


I'm not sure what you mean by the most informative blast output. If  
you mean which one gives the most information, I'm pretty sure the  
standard Blast report has everything.


Dave


From neetisomaiya at gmail.com  Tue Dec 12 07:09:39 2006
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Tue, 12 Dec 2006 17:39:39 +0530
Subject: [Bioperl-l] problem in running needle
Message-ID: <764978cf0612120409tc857053s7059e62a7f8aafc8@mail.gmail.com>

I am trying to run needle for the attached two sequence files, on a linux
machine. It says "Uncaught exception:  Assertion failed, raised at ajmem.c:187".
Can anyone tell me what could be causing this?

-- 
-Neeti
Even my blood says, B positive
-------------- next part --------------
A non-text attachment was scrubbed...
Name: SEQ_1.REF
Type: application/octet-stream
Size: 44208 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061212/2f733c0d/attachment-0004.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: seq_of_contig11
Type: application/octet-stream
Size: 44344 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061212/2f733c0d/attachment-0005.obj>

From cjfields at uiuc.edu  Tue Dec 12 15:55:07 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 12 Dec 2006 14:55:07 -0600
Subject: [Bioperl-l] problem in running needle
In-Reply-To: <764978cf0612120409tc857053s7059e62a7f8aafc8@mail.gmail.com>
References: <764978cf0612120409tc857053s7059e62a7f8aafc8@mail.gmail.com>
Message-ID: <E5BB270E-46D1-4A8C-A268-938FF8235B67@uiuc.edu>


On Dec 12, 2006, at 6:09 AM, neeti somaiya wrote:

> I am trying to run needle for the attached two sequence files, on a
> linux machine. It says "Uncaught exception:  Assertion failed, raised
> at ajmem.c:187".
> Can anyone tell me what could be causing this?
>
> -- 
> -Neeti
> Even my blood says, B positive
> <SEQ_1.REF>
> <seq_of_contig11>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

This would be an EMBOSS error, not a BioPerl error.  Maybe the emboss  
list is the best place for this question?

http://emboss.open-bio.org/mailman/listinfo/emboss

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Tue Dec 12 16:30:30 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 12 Dec 2006 15:30:30 -0600
Subject: [Bioperl-l] Using frame info from GFF in getting
	aSeq->spliced_seq
In-Reply-To: <B9182BFF5B004245BABC12956EA6322E0254DF2D@huls5.nucleus.harvard.edu>
References: <B9182BFF5B004245BABC12956EA6322E0254DF2D@huls5.nucleus.harvard.edu>
Message-ID: <093AE0FF-3C88-4F97-B33F-836B295E3DE3@uiuc.edu>


On Dec 12, 2006, at 10:05 AM, Amir Karger wrote:

>> Note that Jason suggested passing the frame/phase of the first exon
>> to translate(), not spliced_seq().  I also suggested translate().
>
> You're right. This brings the number of translated polypeptide sequences
> that have lots of *s in them to 9 instead of 90.
>
> I guess I have two requests here. The first is, if a person wants to see
> exactly which bps are translated to aas -- a nucleotide sequence of
> exactly 3N bp starting (usually) with ATG -- then they might want an
> argument to spliced_seq that skips the first one or two bp when
> necessary. After all, they might want to study the DNA, not the
> peptides.
>
> The second request is for "intelligent objects". If my SeqFeatures know
> that they're in phase 1, then when I call spliced_seq I want the
> resulting objects to know that they're phase one, such that when I call
> translate, Bioperl automatically skips the first bp or two. Admittedly,
> there might be big ramifications to this.
>
> Both requests of course made in the knowledge that Bioperl is open
> source & developers have a lot to do with their time.
>
> -Amir Karger
...

Amir,

I committed some code to CVS where I added a -phase parameter option  
to SeqFeatureI::spliced_seq().  I also added some tests to SeqFeature.t.

If you run the following after creating the SeqFeature object $sf  
(the seq object is $seq):

$sf->attach_seq($seq);

for my $phase (-1..3) {
     my $spliced = $sf->spliced_seq(-phase => $phase);
     print $spliced->seq,"\n";
     print $spliced->translate->seq,"\n";
}

You should get warnings for any value other than 0, 1, or 2.

I'll also note that the sequence you are having trouble with  
(sbay_c127) is 712 bp, so it doesn't contain the complete coding  
region.  I used it in the test case in SeqFeature.t.

Chris


From boris.steipe at utoronto.ca  Tue Dec 12 16:26:14 2006
From: boris.steipe at utoronto.ca (Boris Steipe)
Date: Tue, 12 Dec 2006 16:26:14 -0500
Subject: [Bioperl-l] problem in running needle
In-Reply-To: <764978cf0612120409tc857053s7059e62a7f8aafc8@mail.gmail.com>
References: <764978cf0612120409tc857053s7059e62a7f8aafc8@mail.gmail.com>
Message-ID: <F0B737D0-8555-4723-8B8D-50DAFF522AC8@utoronto.ca>

Looks like a memory allocation problem. Your whole sequence is in one  
single line, throwing a few linebreaks in there every 80th character  
or so will probably do the trick.
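If that diagnosis is right (that the assertion comes from one very long sequence line; this has not been verified here), re-wrapping the FASTA files should be enough. A minimal pure-Perl sketch that re-wraps sequence lines to a fixed width while copying header lines unchanged:

```perl
use strict;
use warnings;

# Re-wrap FASTA text so that no sequence line is longer than $width
# characters (default 80). Header lines (starting with '>') are copied
# unchanged; whitespace inside sequence lines is removed before wrapping.
sub wrap_fasta {
    my ($text, $width) = @_;
    $width ||= 80;
    my ($out, $seq) = ('', '');
    my $flush = sub {
        # 4-argument substr removes and returns the next $width characters
        $out .= substr($seq, 0, $width, '') . "\n" while length $seq;
    };
    for my $line (split /\n/, $text) {
        $line =~ s/\r$//;          # tolerate DOS line endings
        if ($line =~ /^>/) {
            $flush->();
            $out .= "$line\n";
        }
        else {
            $line =~ s/\s+//g;
            $seq .= $line;
        }
    }
    $flush->();
    return $out;
}
```

To fix a file, slurp it (for example with local $/ = undef) and write the result of wrap_fasta($text, 80) to a new file before running needle again.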

HTH
Boris

On 12-Dec-06, at 7:09 AM, neeti somaiya wrote:

> I am trying to run needle for the attached two sequence files, on a
> linux machine. It says "Uncaught exception:  Assertion failed, raised
> at ajmem.c:187".
> Can anyone tell me what could be causing this?
>
> -- 
> -Neeti
> Even my blood says, B positive
> <SEQ_1.REF>
> <seq_of_contig11>


From Derek.Fairley at bll.n-i.nhs.uk  Wed Dec 13 05:00:16 2006
From: Derek.Fairley at bll.n-i.nhs.uk (Fairley, Derek)
Date: Wed, 13 Dec 2006 10:00:16 -0000
Subject: [Bioperl-l] BLAST reports
In-Reply-To: <457EE37F.4020000@sfu.ca>
Message-ID: <B4B8F9CCEDA9334F819017E5D711AD1C32C657@bllmail.bll.n-i.nhs.uk>

Hi Keith,

>I would like to manipulate my blast results with bioperl but would also
>like to have the html output of the blast.  What would be the best way
>of going about this, as I don't see any write functions in any of the
>blast modules I have looked at.  Would it be better to create my own
>html layout from the blast data then attempt to recover this from bioperl?

Take a look at some of the example scripts here:
http://www.bioperl.org/wiki/Bioperl_scripts
Depending on your Bioperl installation, you may already have these in your /scripts directory or similar. The /examples/searchio/htmlwriter.pl script may be a good starting point.

>p.s. - does anyone know what the most informative blast "alignment view"
>output is? xml i suppose?

Assuming you want to get the HSPs, parsing blastxml reports seems to be the most reliable approach. Again, there's a useful script for this: take a look at /scripts/utilities/search2alnblocks.pls.

Derek.




From cjfields at uiuc.edu  Wed Dec 13 13:02:14 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 13 Dec 2006 12:02:14 -0600
Subject: [Bioperl-l] Proposal for Meta data
Message-ID: <B3EF69DB-9C01-4F42-A4E4-898613D872F9@uiuc.edu>

I am working on a few things related to RNA structure and
have a few questions, specifically about Meta data.  This is sort of  
a proposal, but I would like to get everybody's thoughts about this  
to gauge what everyone thinks.  Jason, sorry to bug you but I thought  
it might be something that would be of use phylohackathon-wise.

Heikki has several modules which add meta data to sequences
(Bio::Seq::Meta).  In this case, the meta data is stored as a string
(Bio::Seq::Meta) or an array (Bio::Seq::Meta::Array).  In both cases  
you can have multiple types of meta data for a sequence based on a  
particular tag.  However, this also assumes that the meta data is  
somehow attached strictly to sequence data of some type.  It also  
doesn't allow for having mixed meta data types for a single sequence,  
such as attaching array data and string data to the same sequence.

Hence, I was thinking of having a simple, generic meta data type
(Bio::Meta), one which could encompass simple strings  
(Bio::Meta::Simple), arrays (Bio::Meta::Array), or any other  
structured type of data.  This could be used to annotate any  
PrimarySeq, LocatableSeq, SimpleAlign, SeqFeature, or what-have-you,  
maybe in a collection (similar to AnnotationCollection).  I thought  
something like this may be of general use for any PrimarySeq  
(quality, structure), alignments like NEXUS and Stockholm,  
SeqFeatures where structure could be stored (tRNA or riboswitches), etc.

However, this also seems to fall into the category of sequence  
annotation.  So, would it be better to have a set of Bio::Annotation  
classes used for this purpose?

Flames and jibes welcome; I'm wearing my asbestos suit today....

chris


From stewarta at nmrc.navy.mil  Wed Dec 13 20:06:14 2006
From: stewarta at nmrc.navy.mil (Andrew Stewart)
Date: Wed, 13 Dec 2006 20:06:14 -0500
Subject: [Bioperl-l] StandAloneBlast->blastall array of Bio::Seq objects
Message-ID: <3A26D139-1963-4E47-8A70-910B3886AE18@nmrc.navy.mil>

I am trying to StandAloneBlast->blastall an array of Bio::Seq
objects.  The documentation claims that blastall can be passed a file  
name, a Bio::Seq object, or an array of Bio::Seq objects, while the  
usage suggests that a reference to an array of Bio::Seq objects is  
what must be passed to blastall.

(from http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Tools/Run/StandAloneBlast.html#POD5)
Usage:
    $seq_array_ref = \@seq_array;  # where @seq_array is an array of Bio::Seq objects
    $blast_report = $factory->blastall(\@seq_array);

Should this be...
$report = $factory->blastall(@seq_array);
or
$report = $factory->blastall(\@seq_array);
???

And if you are blastall'ing an array of Seq objects, then does  
blastall just return one big blast report or should I be expecting an  
array of blast reports?

I've tried $report = $factory->blastall(@seq_array); which seems to  
work ok, except that when I process the results, there are only  
results for the first Seq object in the array.


-Andrew

--
Andrew Stewart
Research Assistant, Genomics Team
Navy Medical Research Center (NMRC)
Biological Defense Research Directorate (BDRD)
BDRD Annex
12300 Washington Avenue, 2nd Floor
Rockville, MD 20852

email: stewarta at nmrc.navy.mil
phone: 301-231-6700 Ext 270


From arareko at campus.iztacala.unam.mx  Wed Dec 13 20:37:27 2006
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Wed, 13 Dec 2006 19:37:27 -0600
Subject: [Bioperl-l] BioPerl page in Wikipedia
Message-ID: <4580AAD7.3000900@campus.iztacala.unam.mx>

Folks,

I've updated a little bit of the BioPerl page in the Wikipedia. I think 
it would be nice if we expand the article a little bit more since it's 
tagged as a "stub". Here's the link:

http://en.wikipedia.org/wiki/BioPerl

Cheers,
Mauricio.

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From lubapardo at gmail.com  Thu Dec 14 05:54:07 2006
From: lubapardo at gmail.com (Luba Pardo)
Date: Thu, 14 Dec 2006 11:54:07 +0100
Subject: [Bioperl-l] (no subject)
Message-ID: <58ff33550612140254gc7c52afs279b65390d40cda1@mail.gmail.com>

Hello,
I am new to BioPerl and I have been trying to run the examples available in
bptutorial.pl and other basic literature. I have installed the latest
release of bioperl 1.5.2 in a /usr/local/src directory. Any time I try to
retrieve the SwissProt and EMBL databases I get an error. With GenBank
it seems to be fine. I wonder if the installation was not successful, as I
would expect access to these databases to be included in the BioPerl core
modules. In addition, I would like to ask whether, to run ClustalW within
BioPerl, I need to download and install it in the same directory in which
I have installed bioperl, or whether it is included in the Bio::Align
module.
I am not sure whether this is the best place to ask these very basic
questions. If not, could anyone please refer me to the proper e-mail
address?
Thank you very much in advance.

Luba Pardo MD, PhD


From bix at sendu.me.uk  Thu Dec 14 09:10:43 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 14 Dec 2006 09:10:43 -0500
Subject: [Bioperl-l] StandAloneBlast->blastall array of Bio::Seq objects
In-Reply-To: <3A26D139-1963-4E47-8A70-910B3886AE18@nmrc.navy.mil>
References: <3A26D139-1963-4E47-8A70-910B3886AE18@nmrc.navy.mil>
Message-ID: <45815B63.1020003@sendu.me.uk>

Andrew Stewart wrote:
> I am trying to StandAloneBlast->blastall an array of Bio::Seq
> objects.  The documentation claims that blastall can be passed a file
> name,

You're referring to 'In addition, sequence input may be in the form of
either a Bio::Seq object or an array of Bio::Seq objects'? I agree
it's not clear, but supplying a reference to an array is still supplying
an array. Anyway, I'll clarify it.


In any case, the usage for the method is what you should pay attention to:

> Usage:
>     $seq_array_ref = \@seq_array;  # where @seq_array is an array of Bio::Seq objects
>     $blast_report = $factory->blastall(\@seq_array);
> 
> Should this be...
> $report = $factory->blastall(@seq_array);
> or
> $report = $factory->blastall(\@seq_array);
> ???

It should be exactly what it says. A reference to the array.
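The difference between the two calls is ordinary Perl argument passing. The sketch below uses a made-up method, count_inputs, written in the common my ($self, $input) = @_ style. It has not been checked that StandAloneBlast unpacks its arguments exactly this way, but if it does, passing @seq_array instead of \@seq_array would hand it only the first sequence, which would be consistent with results appearing only for the first Seq object.

```perl
use strict;
use warnings;

# Made-up method in the common "my ($self, $input) = @_" style. Anything
# after the first real argument is silently ignored.
sub count_inputs {
    my ($self, $input) = @_;
    return ref($input) eq 'ARRAY' ? scalar(@$input) : 1;
}

my @seq_array = ('seq1', 'seq2', 'seq3');       # stand-ins for Bio::Seq objects
my $by_ref  = count_inputs(undef, \@seq_array); # 3: the whole array arrives
my $by_list = count_inputs(undef, @seq_array);  # 1: only 'seq1' reaches $input
print "by reference: $by_ref, by list: $by_list\n";
```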


> And if you are blastall'ing an array of Seq objects, then does  
> blastall just return one big blast report or should I be expecting an  
> array of blast reports?

Returns : Reference to a Blast object or BPlite object
            containing the blast report.

That means, just one big object, not an array.


From bix at sendu.me.uk  Thu Dec 14 09:42:18 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 14 Dec 2006 09:42:18 -0500
Subject: [Bioperl-l] (no subject)
In-Reply-To: <58ff33550612140254gc7c52afs279b65390d40cda1@mail.gmail.com>
References: <58ff33550612140254gc7c52afs279b65390d40cda1@mail.gmail.com>
Message-ID: <458162CA.5030803@sendu.me.uk>

Luba Pardo wrote:
> Hello, I am new to bioperl and I have been trying to run the examples
> available in bptutorial.pl and other basic literature. I have
> installed the latest release of bioperl 1.5.2 in a usr/local/src
> directory. Any time I try to retrieve the SwissProt and EMBL
> databases it gives me an error.

What exactly are you trying? Paste some relevant code along with the
exact error message you get when running that code.


> I wonder if the installation was not successful, as I would expect
> access to these databases to be included in the BioPerl core
> modules.

They should work with just core installed.


> In addition, I would like to ask whether to run ClustalW within
> the setting of BioPerl I need to download and install it in the same
> directory in which I have installed bioperl, or is it included in the
> module of Bio::Align.

The ClustalW module is in the bioperl-run package, so install that in
the same way you installed bioperl (core). You need to download and
install the actual ClustalW program according to its own instructions.
You let Bioperl know where you installed ClustalW by, e.g., setting an
environment variable.

See 
http://doc.bioperl.org/bioperl-run/Bio/Tools/Run/Alignment/Clustalw.html#DESCRIPTION
for details.


> I am not sure whether this is the best place to ask these very basic 
> questions. If not, could anyone please refer me to the proper e mail 
> account?

It's certainly the correct place; I hope we can resolve your problems.


From neetisomaiya at gmail.com  Thu Dec 14 03:02:37 2006
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Thu, 14 Dec 2006 13:32:37 +0530
Subject: [Bioperl-l] needle parser in bioperl?
In-Reply-To: <C60106D0-9A11-4B67-8B3D-87DF885F1D40@uiuc.edu>
References: <764978cf0612120414o1eb77e28l1132eb4fa4cd9e1d@mail.gmail.com>
	<C60106D0-9A11-4B67-8B3D-87DF885F1D40@uiuc.edu>
Message-ID: <764978cf0612140002m2a8c4268ma4b55f12412c5e9d@mail.gmail.com>

How do I run needle specifying that I want the MSF format, on a linux box?
The help doesn't show me any format option. Is there anything available to
parse MSF format?
Please find an example alignment file attached. Here the seq_of_contig
aligns with the reference sequence (i.e. SEQ_1.REF) starting at position
(coordinate) 8918 of SEQ_1.REF. I basically want this coordinate from the
output alignment, how can I parse the result to get this?
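One way to get that coordinate, once the two gapped rows of the alignment are available as strings (for example the gapped sequences from an MSF file read with Bio::AlignIO using -format => 'msf', as suggested above), is to walk the columns and count reference residues until the first non-gap query residue. EMBOSS programs generally take an -aformat qualifier for alignment output format, but check needle -help -full on your installation. Below is a hypothetical pure-Perl helper, not part of BioPerl, shown on made-up toy rows:

```perl
use strict;
use warnings;

# Hypothetical helper: given the gapped reference row, the gapped query row
# (same length, '-' or '.' for gaps) and the reference coordinate of the first
# residue in the reference row, return the reference coordinate opposite the
# query's first residue, or undef if that residue faces a gap in the reference.
sub ref_coord_of_query_start {
    my ($ref_row, $query_row, $ref_start) = @_;
    die "alignment rows differ in length\n"
        unless length($ref_row) == length($query_row);
    my $pos = $ref_start - 1;    # coordinate of the last reference residue seen
    for my $i (0 .. length($ref_row) - 1) {
        my $ref_is_residue = substr($ref_row, $i, 1) !~ /[-.]/;
        $pos++ if $ref_is_residue;
        next if substr($query_row, $i, 1) =~ /[-.]/;
        return $ref_is_residue ? $pos : undef;   # first query residue
    }
    return;                      # query row contains only gaps
}

print ref_coord_of_query_start('ACGTACGTAC', '----ACG-AC', 1), "\n";   # prints 5
```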

On 12/12/06, Chris Fields <cjfields at uiuc.edu> wrote:
>
>
> On Dec 12, 2006, at 6:14 AM, neeti somaiya wrote:
>
> > Hi,
> >
> > Does anyone know of a bioperl parser for needle output? Basically I
> > want to know where the target sequence aligns on the template (i.e.
> > the coordinate on the template where the target aligns).
> >
> > --
> > -Neeti
> > Even my blood says, B positive
>
> I answered this a number of months back:
>
> http://tinyurl.com/yzlbx5
>
> Basically, newer versions of EMBOSS have changed the output for the
> AlignIO::emboss parser (which parses needle).  I don't believe the
> parser has been fixed to deal with that, but Jason has pointed out
> you can use MSF output when running needle, then parse using AlignIO
> with the format set to 'msf'.
>
> chris
>


-- 
-Neeti
Even my blood says, B positive
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 3.out
Type: application/octet-stream
Size: 204960 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061214/1416cef5/attachment-0002.obj>

From stewarta at nmrc.navy.mil  Thu Dec 14 11:34:43 2006
From: stewarta at nmrc.navy.mil (Andrew Stewart)
Date: Thu, 14 Dec 2006 11:34:43 -0500
Subject: [Bioperl-l] StandAloneBlast->blastall array of Bio::Seq objects
In-Reply-To: <45815B63.1020003@sendu.me.uk>
References: <3A26D139-1963-4E47-8A70-910B3886AE18@nmrc.navy.mil>
	<45815B63.1020003@sendu.me.uk>
Message-ID: <2DAAB59E-A4F9-4E2F-B1E5-F34376B5D1E0@nmrc.navy.mil>

Thanks for the reply, Sendu.

So I've tried passing a reference to an array of Seq objects with the  
following code...
	
    push @blast_run, $factory->blastall(\@query);  # where @query is an array of Bio::Seq objects

(In case you're wondering, I'm pushing the report into an array of  
reports because I'm running several instances of blastall with  
different parameters each time.)

....and it throws me the following exception...

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: blastall call crashed: 11 /common/bin/blastall -p  blastp  -d  "/common/data/BACILLUS.pep"  -i  /tmp/Z69hzaqEbR  -o  /tmp/02Zja7AF3E

STACK: Error::throw
STACK: Bio::Root::Root::throw /sw/lib/perl5/5.8.6/Bio/Root/Root.pm:328
STACK: Bio::Tools::Run::StandAloneBlast::_runblast /sw/lib/perl5/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:759
STACK: Bio::Tools::Run::StandAloneBlast::_generic_local_blast /sw/lib/perl5/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:706
STACK: Bio::Tools::Run::StandAloneBlast::blastall /sw/lib/perl5/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:557
STACK: main::run_blastall ./new_blast_script.pl:215
STACK: ./new_blast_script.pl:115
-----------------------------------------------------------

And % more -Nl 759 /path/to/Bio/Tools/Run/StandAloneBlast.pm returns...
757         my $status = system($commandstring);
758
759         $self->throw("$executable call crashed: $? $commandstring\n")
760           unless ($status==0) ;

So it looks like the system call isn't returning a happy $status.  At  
this point I'm pretty much stuck, though.  Blastall works just fine  
if I only send it a single Seq object.  Looking at _setinput, it  
appears a reference to an array of Seq objects should end up creating  
a multi-fasta file.  The only possibilities I can think of to explain
this are:

- The -i file isn't being created for some reason when a (ref to an)
  array of Seqs is passed.
- There is something wrong with the -i file that is created and sent
  to blastall.
- Something else is wrong with the $commandstring being sent to the
  system call.

Does anyone see something here that I don't?


Thanks,
Andrew


On Dec 14, 2006, at 9:10 AM, Sendu Bala wrote:

> Andrew Stewart wrote:
>> I am trying to StandAloneBlast->blastall an array of Bio::Seq
>> objects.  The documentation claims that blastall can be passed a
>> file name,
>
> You're referring to 'In addition, sequence input may be in the form
> of either a Bio::Seq object or an array of Bio::Seq objects'? I
> agree it's not clear, but supplying a reference to an array is still
> supplying an array. Anyway, I'll clarify it.
>
>
> In any case, the usage for the method is what you should pay  
> attention to:
>
>> Usage:
>> 	$seq_array_ref = \@seq_array;  # where @seq_array is an array of Bio::Seq objects
>> 	$blast_report = $factory->blastall(\@seq_array);
>> Should this be...
>> $report = $factory->blastall(@seq_array);
>> or
>> $report = $factory->blastall(\@seq_array);
>> ???
>
> It should be exactly what it says. A reference to the array.
>
>
>> And if you are blastall'ing an array of Seq objects, then does   
>> blastall just return one big blast report or should I be expecting  
>> an  array of blast reports?
>
> Returns : Reference to a Blast object or BPlite object
>            containing the blast report.
>
> That means, just one big object, not an array.


--
Andrew Stewart
Research Assistant, Genomics Team
Navy Medical Research Center (NMRC)
Biological Defense Research Directorate (BDRD)
BDRD Annex
12300 Washington Avenue, 2nd Floor
Rockville, MD 20852

email: stewarta at nmrc.navy.mil
phone: 301-231-6700 Ext 270


From cjfields at uiuc.edu  Thu Dec 14 12:03:12 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 14 Dec 2006 11:03:12 -0600
Subject: [Bioperl-l] StandAloneBlast->blastall array of Bio::Seq objects
In-Reply-To: <2DAAB59E-A4F9-4E2F-B1E5-F34376B5D1E0@nmrc.navy.mil>
References: <3A26D139-1963-4E47-8A70-910B3886AE18@nmrc.navy.mil>
	<45815B63.1020003@sendu.me.uk>
	<2DAAB59E-A4F9-4E2F-B1E5-F34376B5D1E0@nmrc.navy.mil>
Message-ID: <88DDC5EA-C4BE-48FB-B259-B6584F5F86B1@uiuc.edu>


On Dec 14, 2006, at 10:34 AM, Andrew Stewart wrote:

> Thanks for the reply, Sendu.
>
> So I've tried passing a reference to an array of Seq objects with the
> following code...
> 	
> 	push @blast_run, $factory->blastall(\@query);  # where @query is an array of Bio::Seq objects
>
> (In case you're wondering, I'm pushing the report into an array of
> reports because I'm running several instances of blastall with
> different parameters each time.)
>
> ....and it throws me the following exception...
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: blastall call crashed: 11 /common/bin/blastall -p  blastp  -d  "/common/data/BACILLUS.pep"  -i  /tmp/Z69hzaqEbR  -o  /tmp/02Zja7AF3E
>
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /sw/lib/perl5/5.8.6/Bio/Root/Root.pm:328
> STACK: Bio::Tools::Run::StandAloneBlast::_runblast /sw/lib/perl5/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:759
> STACK: Bio::Tools::Run::StandAloneBlast::_generic_local_blast /sw/lib/perl5/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:706
> STACK: Bio::Tools::Run::StandAloneBlast::blastall /sw/lib/perl5/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:557
> STACK: main::run_blastall ./new_blast_script.pl:215
> STACK: ./new_blast_script.pl:115
> -----------------------------------------------------------
>
> And % more -Nl 759 /path/to/Bio/Tools/Run/StandAloneBlast.pm  
> returns...
> 757         my $status = system($commandstring);
> 758
> 759         $self->throw("$executable call crashed: $? $commandstring \n")
> 760           unless ($status==0) ;
>
> So it looks like the system call isn't returning a happy $status.  At
> this point I'm pretty much stuck, though.  Blastall works just fine
> if I only send it a single Seq object.  Looking at _setinput, it
> appears a reference to an array of Seq objects should end up creating
> a multi-FASTA file.  The only possibilities I can think of to explain
> this are:
>
> - The -i file isn't being created for some reason when a (reference
> to an) array of Seqs is passed
> - There is something wrong with the -i file that is created and sent
> to blastall.
> - Something else is wrong with the $commandstring being sent to the
> system call.
>
> Does anyone see something here that I don't?

The error pops up when the executable returns a bad status, so maybe  
it's choking on too many input sequences (i.e. Bioperl is doing  
everything correctly, but you are attempting to BLAST too many  
sequences in one go).  How many sequences are you attempting to use  
as input?  What happens when you use fewer input sequences?

chris


From stewarta at nmrc.navy.mil  Thu Dec 14 12:49:45 2006
From: stewarta at nmrc.navy.mil (Andrew Stewart)
Date: Thu, 14 Dec 2006 12:49:45 -0500
Subject: [Bioperl-l] StandAloneBlast->blastall array of Bio::Seq objects
In-Reply-To: <88DDC5EA-C4BE-48FB-B259-B6584F5F86B1@uiuc.edu>
References: <3A26D139-1963-4E47-8A70-910B3886AE18@nmrc.navy.mil>
	<45815B63.1020003@sendu.me.uk>
	<2DAAB59E-A4F9-4E2F-B1E5-F34376B5D1E0@nmrc.navy.mil>
	<88DDC5EA-C4BE-48FB-B259-B6584F5F86B1@uiuc.edu>
Message-ID: <704E0191-A0E3-4DD2-A8F4-A0B9BE8E3AEE@nmrc.navy.mil>

> So can you look at the tempfile that is created and see if it is sane?
>
> Set -save_tempfiles => 1 when you initialize the factory object or do
> $factory->save_tempfiles(1)
> before calling the blastall.
>
> -jason
>

Jason,
I was actually wondering how to do that.  Thanks.  Odd though, it  
still doesn't seem to be saving the tempfiles.  Might not matter  
though, because...

> The error pops up when the executable returns a bad status, so  
> maybe it's choking on too many input sequences (i.e. Bioperl is  
> doing everything correctly, but you are attempting to BLAST too  
> many sequences in one go).  How many sequences are you attempting  
> to use as input?  What happens when you use fewer input sequences?
>
> chris
>

I was processing 738 sequences for input.  I cut that down to 20  
sequences and I'm getting some other exception thrown further  
downstream, so it appears you may be correct.  You don't happen to  
know what the max number of sequences that blastall allows for input,  
would ya? ;)  I suppose I'll have to break @query down into smaller  
doses or something.
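Splitting @query is straightforward: take fixed-size slices of the array and pass each slice to blastall as an array reference, collecting one report per batch. A minimal sketch, assuming a batch size of 50 (an arbitrary choice, not a documented blastall limit); `batches` is a hypothetical helper, and the blastall call is left as a comment mirroring the usage quoted in this thread so the sketch runs without BioPerl or BLAST installed:

```perl
use strict;
use warnings;

# Hypothetical helper: split a list into array refs of at most $size items.
sub batches {
    my ($size, @items) = @_;
    die "batch size must be positive\n" unless $size > 0;
    my @out;
    push @out, [ splice(@items, 0, $size) ] while @items;
    return @out;
}

# Stand-in data; in the real script these would be Bio::Seq objects.
my @query = map { "seq$_" } 1 .. 738;

my @blast_run;
for my $chunk (batches(50, @query)) {
    # In the real script, something like:
    #   push @blast_run, $factory->blastall($chunk);
    # ($chunk is already an array reference, as the usage docs require.)
    push @blast_run, scalar @$chunk;    # placeholder: record the batch size
}
print scalar(@blast_run), " batches\n";
```

With 738 sequences and a batch size of 50 this gives 15 batches, the last one holding 38 sequences.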

Thanks,
Andrew




--
Andrew Stewart
Research Assistant, Genomics Team
Navy Medical Research Center (NMRC)
Biological Defense Research Directorate (BDRD)
BDRD Annex
12300 Washington Avenue, 2nd Floor
Rockville, MD 20852

email: stewarta at nmrc.navy.mil
phone: 301-231-6700 Ext 270


From Derek.Fairley at bll.n-i.nhs.uk  Thu Dec 14 12:58:10 2006
From: Derek.Fairley at bll.n-i.nhs.uk (Fairley, Derek)
Date: Thu, 14 Dec 2006 17:58:10 -0000
Subject: [Bioperl-l] needle parser in bioperl?
In-Reply-To: <764978cf0612140002m2a8c4268ma4b55f12412c5e9d@mail.gmail.com>
Message-ID: <B4B8F9CCEDA9334F819017E5D711AD1C32C669@bllmail.bll.n-i.nhs.uk>

Neeti,

 
From http://emboss.sourceforge.net/apps/cvs/needle.html:

"The results can be output in one of several styles by using the
command-line qualifier -aformat xxx, where 'xxx' is replaced by the name
of the required format. Some of the alignment formats can cope with an
unlimited number of sequences, while others are only for pairs of
sequences.

The available multiple alignment format names are: unknown, multiple,
simple, fasta, msf, trace, srs

The available pairwise alignment format names are: pair, markx0, markx1,
markx2, markx3, markx10, srspair, score

See: http://emboss.sf.net/docs/themes/AlignFormats.html for further
information on alignment formats."

 
Not sure, based on this, whether you can get a pairwise alignment in .msf
format (i.e. with -aformat msf); I can't think of a good reason why not. The
BioPerl Bio::AlignIO module will allow you to parse alignments in .msf format.
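Once needle's output is parsed (for example with Bio::AlignIO and -format => 'msf'), the coordinate Neeti asks about can be read off the two aligned rows: count the reference residues that lie before the first column in which the contig has a residue, and add that count to the reference row's start position. A minimal plain-Perl sketch, assuming you already have the two gapped rows as equal-length strings and know the reference's start (1 for a full-length reference); `ref_coord_of_query_start` is a hypothetical name, and treating '-', '.' and '~' as gap characters is an assumption about the output:

```perl
use strict;
use warnings;

# Hypothetical helper: given two equal-length gapped rows of a pairwise
# alignment, return the reference coordinate at which the query's first
# residue is aligned. If the reference has a gap in that column, the
# coordinate of the next reference residue is returned. Returns undef if
# the query row contains no residues at all.
sub ref_coord_of_query_start {
    my ($ref_row, $query_row, $ref_start) = @_;
    my $is_gap = qr/^[-.~]$/;    # assumed gap characters
    my $ref_residues_seen = 0;
    for my $col (0 .. length($query_row) - 1) {
        my $q = substr($query_row, $col, 1);
        return $ref_start + $ref_residues_seen if $q !~ $is_gap;
        my $r = substr($ref_row, $col, 1);
        $ref_residues_seen++ if $r !~ $is_gap;
    }
    return;
}

my $ref   = 'ACGT..ACGT';
my $query = '...TTAACG.';
print ref_coord_of_query_start($ref, $query, 1), "\n";
```

If the reference row itself starts partway through the reference sequence, pass that start instead of 1.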

 
HTH,

Derek.

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of neeti somaiya
Sent: 14 December 2006 08:03
To: Chris Fields; bioperl-l
Subject: Re: [Bioperl-l] needle parser in bioperl?

 
How do I run needle specifying that I want the MSF format, on a Linux
box?

The help doesn't show me any format option. Is there anything available
to parse MSF format?

Please find an example alignment file attached. Here the seq_of_contig
aligns with the reference sequence (i.e. SEQ_1.REF) starting at position
(coordinate) 8918 of SEQ_1.REF. I basically want this coordinate from the
output alignment; how can I parse the result to get this?

On 12/12/06, Chris Fields <cjfields at uiuc.edu> wrote:
>
> On Dec 12, 2006, at 6:14 AM, neeti somaiya wrote:
>
> > Hi,
> >
> > Does anyone know of a bioperl parser for needle output? Basically I
> > want to know where the target sequence aligns on the template (i.e. the
> > coordinate on the template where the target aligns).
> >
> > --
> > -Neeti
> > Even my blood says, B positive
>
> I answered this a number of months back:
>
> http://tinyurl.com/yzlbx5
>
> Basically, newer versions of EMBOSS have changed the output parsed by
> the AlignIO::emboss parser (which parses needle).  I don't believe the
> parser has been fixed to deal with that, but Jason has pointed out
> you can use MSF output when running needle, then parse using AlignIO
> with the format set to 'msf'.
>
> chris
>

-- 
-Neeti
Even my blood says, B positive


From cjfields at uiuc.edu  Thu Dec 14 13:36:09 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 14 Dec 2006 12:36:09 -0600
Subject: [Bioperl-l] StandAloneBlast->blastall array of Bio::Seq objects
In-Reply-To: <704E0191-A0E3-4DD2-A8F4-A0B9BE8E3AEE@nmrc.navy.mil>
References: <3A26D139-1963-4E47-8A70-910B3886AE18@nmrc.navy.mil>
	<45815B63.1020003@sendu.me.uk>
	<2DAAB59E-A4F9-4E2F-B1E5-F34376B5D1E0@nmrc.navy.mil>
	<88DDC5EA-C4BE-48FB-B259-B6584F5F86B1@uiuc.edu>
	<704E0191-A0E3-4DD2-A8F4-A0B9BE8E3AEE@nmrc.navy.mil>
Message-ID: <97FE8E3C-58F2-406F-909D-DD479E594530@uiuc.edu>


On Dec 14, 2006, at 11:49 AM, Andrew Stewart wrote:

>> So can you look at the tempfile that is created and see if it is  
>> sane?
>>
>> Set -save_tempfiles => 1 when you initialize the factory object  
>> or do
>> $factory->save_tempfiles(1)
>> before calling the blastall.
>>
>> -jason
>>
>
> Jason,
> I was actually wondering how to do that.  Thanks.  Odd though, it
> still doesn't seem to be saving the tempfiles.  Might not matter

That needs to be checked out.  Can anyone verify that?

>> The error pops up when the executable returns a bad status, so
>> maybe it's choking on too many input sequences (i.e. Bioperl is
>> doing everything correctly, but you are attempting to BLAST too
>> many sequences in one go).  How many sequences are you attempting
>> to use as input?  What happens when you use fewer input sequences?
>>
>> chris
>>
>
> I was processing 738 sequences for input.  I cut that down to 20
> sequences and I'm getting some other exception thrown further
> downstream, so it appears you may be correct.  You don't happen to
> know what the max number of sequences that blastall allows for input,
> would ya? ;)  I suppose I'll have to break @query down into smaller
> doses or something.
>
> Thanks,
> Andrew

It was a shot in the dark, really.  The fact that the return status  
was bad could be due to a number of problems (permissions issues, bad  
data, etc).  The fact that a single sequence worked indicated that  
permissions and output format likely weren't to blame.  The only  
other thing left was a problem with blastall itself.

BTW, the blast docs do not indicate whether there is a maximum number  
of sequences.  There may be a point where available memory becomes  
the limiting issue.

chris


From vaughn at cshl.edu  Thu Dec 14 14:09:34 2006
From: vaughn at cshl.edu (Matthew Vaughn)
Date: Thu, 14 Dec 2006 14:09:34 -0500
Subject: [Bioperl-l] Bio::SeqFeature::Annotated and mandatory type checking
Message-ID: <637A2459-4115-466F-BD8D-036D5E9114F8@cshl.edu>

Dear all,

I'm trying to bring some of my code into compliance with BioPerl  
1.5.2 and am running into some design decisions that I am unclear on.  
Can I ask why Bio::SeqFeature::Annotated enforces mandatory checking  
of the 'type' against SOFA? It seems to me that this should be  
optional behavior as is the case with the Bio::FeatureIO family. I'd  
be happy to write the patch if there is any agreement with me on this  
case.

Thanks,

Matt

--
Matthew W. Vaughn, Ph.D.
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724

phone: (516) 367-8469



From jason at bioperl.org  Thu Dec 14 11:59:20 2006
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 14 Dec 2006 11:59:20 -0500
Subject: [Bioperl-l] StandAloneBlast->blastall array of Bio::Seq objects
In-Reply-To: <2DAAB59E-A4F9-4E2F-B1E5-F34376B5D1E0@nmrc.navy.mil>
References: <3A26D139-1963-4E47-8A70-910B3886AE18@nmrc.navy.mil>
	<45815B63.1020003@sendu.me.uk>
	<2DAAB59E-A4F9-4E2F-B1E5-F34376B5D1E0@nmrc.navy.mil>
Message-ID: <640E2BB7-33F3-44C9-B903-9DDA54F02D12@bioperl.org>

So can you look at the tempfile that is created and see if it is sane?

Set -save_tempfiles => 1 when you initialize the factory object or do
$factory->save_tempfiles(1)
before calling the blastall.
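Once the tempfile is kept, a quick way to judge whether it is sane is to count the records and flag obvious problems such as sequence lines before the first header, headers without an id, records with no sequence, or unexpected characters in sequence lines. A minimal plain-Perl sketch; `check_fasta` and the particular checks are illustrative assumptions, not what BioPerl itself validates:

```perl
use strict;
use warnings;

# Hypothetical helper: scan FASTA text and return (record count, problems).
sub check_fasta {
    my ($text) = @_;
    my $records = 0;
    my @problems;
    my ($current_id, $residues);
    my $line_no = 0;

    # Report the record that just ended if it contained no residues.
    my $finish_record = sub {
        push @problems, "record '$current_id' has no sequence"
            if defined $current_id && !$residues;
    };

    for my $line (split /\n/, $text) {
        $line_no++;
        $line =~ s/\r$//;                       # tolerate DOS line endings
        next if $line =~ /^\s*$/;               # ignore blank lines
        if ($line =~ /^>(\S*)/) {               # header line
            my $id = $1;
            $finish_record->();
            $records++;
            push @problems, "line $line_no: header has no id" unless length $id;
            $current_id = length $id ? $id : "line $line_no";
            $residues   = 0;
        }
        elsif (!defined $current_id) {
            push @problems, "line $line_no: sequence data before first header";
        }
        elsif ($line =~ /[^A-Za-z*.\-]/) {
            push @problems, "line $line_no: unexpected characters";
        }
        else {
            $residues += length $line;
        }
    }
    $finish_record->();
    return ($records, @problems);
}

my ($count, @issues) = check_fasta(">q1\nMKVLAA\n>q2\n");
print "$count records\n";
print "  $_\n" for @issues;
```

To run it on the saved tempfile, slurp the file first, e.g. `open my $fh, '<', $path or die $!; my $text = do { local $/; <$fh> };`, where `$path` is whatever `-i` file the command line reported.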

-jason


From stewarta at nmrc.navy.mil  Thu Dec 14 16:23:07 2006
From: stewarta at nmrc.navy.mil (Andrew Stewart)
Date: Thu, 14 Dec 2006 16:23:07 -0500
Subject: [Bioperl-l] StandAloneBlast->blastall array of Bio::Seq objects
In-Reply-To: <97FE8E3C-58F2-406F-909D-DD479E594530@uiuc.edu>
References: <3A26D139-1963-4E47-8A70-910B3886AE18@nmrc.navy.mil>
	<45815B63.1020003@sendu.me.uk>
	<2DAAB59E-A4F9-4E2F-B1E5-F34376B5D1E0@nmrc.navy.mil>
	<88DDC5EA-C4BE-48FB-B259-B6584F5F86B1@uiuc.edu>
	<704E0191-A0E3-4DD2-A8F4-A0B9BE8E3AEE@nmrc.navy.mil>
	<97FE8E3C-58F2-406F-909D-DD479E594530@uiuc.edu>
Message-ID: <E1CF879B-7A07-4CE7-A0D0-C7749ECFF8FC@nmrc.navy.mil>

> It was a shot in the dark, really.  The fact that the return status  
> was bad could be due to a number of problems (permissions issues,  
> bad data, etc).  The fact that a single sequence worked indicated  
> that permissions and output format likely weren't to blame.  The  
> only other thing left was a problem with blastall itself.
>
> BTW, the blast docs do not indicate whether there is a maximum  
> number of sequences.  There may be a point where available memory  
> becomes the limiting issue.
>
> chris

Interesting.  I ran the 738-sequence dataset through blastall  
manually and the report only returned 198 of the 738 expected  
results.  Not only that, it seems to have cut off right in the  
middle of the 198th result, and a segmentation fault was reported.  I  
removed the 198th sequence, wondering if it might be some issue with  
the input, and the segmentation fault occurred again, with the results  
ending on the 210th result.  I put the 198th sequence back in, but  
at the start of the file, and sure enough the segmentation fault  
occurred earlier.  I think we can rule out the size of the input or  
the number of sequences as the source of error here.  I'm more inclined  
to think it has something to do with the BLAST databases being  
queried.

I found an old discussion on a problem that sounds fairly similar to  
this one, for anyone interested.
http://bioinformatics.org/pipermail/bioclusters/2004-June/001742.html

I think I'll try to work around the problem for now.

andrew




--
Andrew Stewart
Research Assistant, Genomics Team
Navy Medical Research Center (NMRC)
Biological Defense Research Directorate (BDRD)
BDRD Annex
12300 Washington Avenue, 2nd Floor
Rockville, MD 20852

email: stewarta at nmrc.navy.mil
phone: 301-231-6700 Ext 270


From lincoln.stein at gmail.com  Thu Dec 14 15:24:56 2006
From: lincoln.stein at gmail.com (Lincoln Stein)
Date: Thu, 14 Dec 2006 15:24:56 -0500
Subject: [Bioperl-l] Bio::Graphics xyplot
In-Reply-To: <4578951B.5050206@sfu.ca>
References: <4578951B.5050206@sfu.ca>
Message-ID: <6dce9a0b0612141224r1ef7cce2s6e6123461c3827d8@mail.gmail.com>

Hi,

The way it works is that you create a single feature that spans the entire
range of the xyplot. It contains subfeatures, each of which has a score. The
graph points correspond to each of the subfeatures.
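Applied to the script quoted below, that would mean one parent Bio::SeqFeature::Generic spanning 1..$f_seqlen, with one scored subfeature per base attached via its add_SeqFeature method, and only the parent passed to the xyplot track. The sketch below covers just the data-preparation half in plain Perl so that it runs without BioPerl; `quality_points` is a hypothetical helper. It handles two details: BioPerl feature coordinates are 1-based, so (assuming the first score belongs to base 1) a loop starting at $i = 0 shifts every score one base to the left; and `split ' '`, unlike `split(/\s/, ...)`, does not produce empty scores from runs of whitespace.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a whitespace-separated string of per-base quality
# scores into [start, end, score] triples using 1-based coordinates.
sub quality_points {
    my ($score_string) = @_;
    my @scores = split ' ', $score_string;   # ' ' also skips leading whitespace
    return map { [ $_ + 1, $_ + 1, $scores[$_] ] } 0 .. $#scores;
}

my @points = quality_points('20 35 40 12');
for my $p (@points) {
    # With BioPerl, each triple would become a subfeature of one parent, e.g.
    #   $parent->add_SeqFeature(Bio::SeqFeature::Generic->new(
    #       -start => $p->[0], -end => $p->[1], -score => $p->[2]));
    print join("\t", @$p), "\n";
}
```

The parent itself would then span 1 to the number of points, and it is that single feature that goes into the xyplot track.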

Lincoln

On 12/7/06, Keith Anthony Boroevich <kaboroev at sfu.ca> wrote:
>
> Hi everyone,
>
> I'm attempting to add an xyplot of the phred quality scores to a
> Bio::Graphics image, and cannot get it to work.
> I have the panel with a track for both the scale and the DNA displaying
> properly.  When I attempt to add the xyplot I just get a garbled track
> of what look like tiny xyplots for each data point.  I have the CVS
> version (updated today) of bioperl-live running.  I think what I am
> missing is the creation of a "Sequence Feature Group" to hold the
> individual points of the plot.  However, I cannot seem to find such an
> object. This is what I attempted:
>
> -------BEGIN---CODE-----------
> # start panel
> my $panel = Bio::Graphics::Panel->new(-length    => $f_seqlen,
>                       -width     => $f_seqlen*10,
>                       -pad_left  => 10,
>                       -pad_right => 10,
>                       -grid      => 1
>                       );
> # add scale
> $panel->add_track(arrow =>
> Bio::SeqFeature::Generic->new(-start=>1,-end=>$f_seqlen),
>               -double  => 1,
>               -tick    => 2,
>               -fgcolor => 'black');
> # add DNA ($feature is of type Bio::SeqFeature::Annotated)
> $panel->add_track(dna => $feature);
> # get list of quality scores from database
> my ($pqs_value) = $dbh->selectrow_array($sql);
> my @pqs_value = split(/\s/,$pqs_value);
> # create track
> my $track =  $panel->add_track(-glyph        => 'xyplot',
>                    -graph_type   => 'points',
>                    -point_symbol => 'point',
>                    -max_score    => 100,
>                    -min_score    => 0,
>                    -scale        => 'none');
> # add "subfeatures" to the track
> for (my $i=0;$i<$f_seqlen;$i++) {
>     $track->add_feature(Bio::SeqFeature::Generic->new(-start=>$i,-end=>$i,-score=>$pqs_value[$i]));
> }
> print $panel->png();
> $panel->finished;
> ------END---CODE----------
>
> I also attempted to create an array of the point features and passed
> that by reference to the panel "add_track" as it describes in the xyplot
> documentation, but that resulted in the exact same image.
>
> keith
>
> --
> ><)))?> -cGRASP- <?(((><
> Keith Anthony Boroevich
> Davidson Lab
> Dept of Molecular Biology
> Simon Fraser University
> Tel: 604-268-7276
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu


From bix at sendu.me.uk  Thu Dec 14 17:15:07 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 14 Dec 2006 17:15:07 -0500
Subject: [Bioperl-l] Bio::SeqFeature::Annotated and mandatory type
	checking
In-Reply-To: <637A2459-4115-466F-BD8D-036D5E9114F8@cshl.edu>
References: <637A2459-4115-466F-BD8D-036D5E9114F8@cshl.edu>
Message-ID: <4581CCEB.20206@sendu.me.uk>

Matthew Vaughn wrote:
> Dear all,
> 
> I'm trying to bring some of my code into compliance with BioPerl 
> 1.5.2 and am running into some design decisions that I am unclear on. 
> Can I ask why Bio::SeqFeature::Annotated enforces mandatory checking of 
> the 'type' against SOFA? It seems to me that this should be optional 
> behavior as is the case with the Bio::FeatureIO family. I'd be happy to 
> write the patch if there is any agreement with me on this case.

Lots of people seem to have worked on it over the years, but perhaps 
Scott Cain is the person to talk to?

revision 1.4
date: 2004/09/25 11:41:29;  author: scain;  state: Exp;  lines: +1 -1
two things:
   * adding SOFA as an available ontology to DocumentRegistry.pm
   * modifying FeatureIO::gff to use SOFA to validate, and to parse Ontology_term


From lincoln.stein at gmail.com  Thu Dec 14 16:56:41 2006
From: lincoln.stein at gmail.com (Lincoln Stein)
Date: Thu, 14 Dec 2006 16:56:41 -0500
Subject: [Bioperl-l] [Gmod-gbrowse] xyplot data alignment problem?
In-Reply-To: <EA0BFA4F-8182-4C40-92DA-218CE3F48D8B@genomics.princeton.edu>
References: <EA0BFA4F-8182-4C40-92DA-218CE3F48D8B@genomics.princeton.edu>
Message-ID: <6dce9a0b0612141356u63afe2dak7e1d8dad93408312@mail.gmail.com>

Hi All,

I'm afraid that the xyplot glyph that is in the recent bioperl release has
an error that causes the content to be printed to the right of the correct
position. Unfortunately this wasn't caught before the release because the
glyph was only tested on very large (whole genome) features.

You will need to do a CVS update to get a fixed version from bioperl-live. A
future bugfix release of gbrowse will patch this glyph for you
automatically.

Lincoln

On 12/12/06, Kara Dolinski <kara at genomics.princeton.edu> wrote:
>
> Hi,
> I'm having a problem getting features and an xyplot properly aligned in
> Gbrowse.  For example, see this page:
>
> http://tinyurl.com/ylbq3q
>
> The feature in the "CENPK SNPs" track should actually be around the peak
> of the graph in the "CENPK prediction signal" xyplot, i.e. the SNP feature
> is at position 79, and the xyplot axes and data should span from 61 to 95.
> However, as you can see, the data in the xyplot are oddly separated from
> the axes (which seem to be in the correct place), with the data shifted over
> to about position 120-155.
> This occurs elsewhere, not just at the ends of the chromosomes.
>
> When I zoom to ~80 bp, all is well, see:
>
> http://tinyurl.com/yzav8k
>
> The relevant snippets from the GFF and the config files are below.
>
> Thanks!
> Kara
>
> GFF:
>
> chrI SNPScanner CENPK_GRAPH 61 95 41.9883 . . ID=CENPK_all_peaks;Name=CENPK_peak0;PEAK=peak0;Note=score is 41.9883
> chrI SNPScanner CENPK_CALL 79 79 41.9883 . . ID=CENPK_all_peaks;Name=CENPK_peak0;PEAK=peak0;Note=score is 41.9883
> chrI SNPScanner CENPK_SCORE 61 61 2.24506 . . ID=CENPK_all_peaks;Name=chrI61;PEAK=peak0;Note=score is 2.24506
> chrI SNPScanner CENPK_SCORE 62 62 3.26837 . . ID=CENPK_all_peaks;Name=chrI62;PEAK=peak0;Note=score is 3.26837
> chrI SNPScanner CENPK_SCORE 63 63 1.39938 . . ID=CENPK_all_peaks;Name=chrI63;PEAK=peak0;Note=score is 1.39938
> chrI SNPScanner CENPK_SCORE 64 64 1.4039 . . ID=CENPK_all_peaks;Name=chrI64;PEAK=peak0;Note=score is 1.4039
> chrI SNPScanner CENPK_SCORE 65 65 9.16134 . . ID=CENPK_all_peaks;Name=chrI65;PEAK=peak0;Note=score is 9.16134
> chrI SNPScanner CENPK_SCORE 66 66 10.1413 . . ID=CENPK_all_peaks;Name=chrI66;PEAK=peak0;Note=score is 10.1413
> chrI SNPScanner CENPK_SCORE 67 67 12.9256 . . ID=CENPK_all_peaks;Name=chrI67;PEAK=peak0;Note=score is 12.9256
> chrI SNPScanner CENPK_SCORE 68 68 13.195 . . ID=CENPK_all_peaks;Name=chrI68;PEAK=peak0;Note=score is 13.195
> chrI SNPScanner CENPK_SCORE 69 69 22.7127 . . ID=CENPK_all_peaks;Name=chrI69;PEAK=peak0;Note=score is 22.7127
> chrI SNPScanner CENPK_SCORE 70 70 23.8289 . . ID=CENPK_all_peaks;Name=chrI70;PEAK=peak0;Note=score is 23.8289
> chrI SNPScanner CENPK_SCORE 71 71 21.9123 . . ID=CENPK_all_peaks;Name=chrI71;PEAK=peak0;Note=score is 21.9123
> chrI SNPScanner CENPK_SCORE 72 72 28.3344 . . ID=CENPK_all_peaks;Name=chrI72;PEAK=peak0;Note=score is 28.3344
> chrI SNPScanner CENPK_SCORE 73 73 35.0436 . . ID=CENPK_all_peaks;Name=chrI73;PEAK=peak0;Note=score is 35.0436
> chrI SNPScanner CENPK_SCORE 74 74 37.361 . . ID=CENPK_all_peaks;Name=chrI74;PEAK=peak0;Note=score is 37.361
> chrI SNPScanner CENPK_SCORE 75 75 39.5408 . . ID=CENPK_all_peaks;Name=chrI75;PEAK=peak0;Note=score is 39.5408
> chrI SNPScanner CENPK_SCORE 76 76 28.2008 . . ID=CENPK_all_peaks;Name=chrI76;PEAK=peak0;Note=score is 28.2008
> chrI SNPScanner CENPK_SCORE 77 77 32.6254 . . ID=CENPK_all_peaks;Name=chrI77;PEAK=peak0;Note=score is 32.6254
> chrI SNPScanner CENPK_SCORE 78 78 36.0832 . . ID=CENPK_all_peaks;Name=chrI78;PEAK=peak0;Note=score is 36.0832
> chrI SNPScanner CENPK_SCORE 79 79 41.9883 . . ID=CENPK_all_peaks;Name=chrI79;PEAK=peak0;Note=score is 41.9883
> chrI SNPScanner CENPK_SCORE 80 80 32.1205 . . ID=CENPK_all_peaks;Name=chrI80;PEAK=peak0;Note=score is 32.1205
> chrI SNPScanner CENPK_SCORE 81 81 41.3048 . . ID=CENPK_all_peaks;Name=chrI81;PEAK=peak0;Note=score is 41.3048
> chrI SNPScanner CENPK_SCORE 82 82 30.7975 . . ID=CENPK_all_peaks;Name=chrI82;PEAK=peak0;Note=score is 30.7975
> chrI SNPScanner CENPK_SCORE 83 83 29.4282 . . ID=CENPK_all_peaks;Name=chrI83;PEAK=peak0;Note=score is 29.4282
> chrI SNPScanner CENPK_SCORE 84 84 35.3586 . . ID=CENPK_all_peaks;Name=chrI84;PEAK=peak0;Note=score is 35.3586
> chrI SNPScanner CENPK_SCORE 85 85 34.1426 . . ID=CENPK_all_peaks;Name=chrI85;PEAK=peak0;Note=score is 34.1426
> chrI SNPScanner CENPK_SCORE 86 86 30.2966 . . ID=CENPK_all_peaks;Name=chrI86;PEAK=peak0;Note=score
> is 30.2966
> chrI SNPScanner CENPK_SCORE 87 87 17.8402 . . ID=CENPK_all_peaks;Name=chrI87;PEAK=peak0;Note=score
> is 17.8402
> chrI SNPScanner CENPK_SCORE 88 88 15.2637 . . ID=CENPK_all_peaks;Name=chrI88;PEAK=peak0;Note=score
> is 15.2637
> chrI SNPScanner CENPK_SCORE 89 89 12.657 . . ID=CENPK_all_peaks;Name=chrI89;PEAK=peak0;Note=score
> is 12.657
> chrI SNPScanner CENPK_SCORE 90 90 10.2033 . . ID=CENPK_all_peaks;Name=chrI90;PEAK=peak0;Note=score
> is 10.2033
> chrI SNPScanner CENPK_SCORE 91 91 9.40143 . . ID=CENPK_all_peaks;Name=chrI91;PEAK=peak0;Note=score
> is 9.40143
> chrI SNPScanner CENPK_SCORE 92 92 6.56273 . . ID=CENPK_all_peaks;Name=chrI92;PEAK=peak0;Note=score
> is 6.56273
> chrI SNPScanner CENPK_SCORE 93 93 3.66211 . . ID=CENPK_all_peaks;Name=chrI93;PEAK=peak0;Note=score
> is 3.66211
> chrI SNPScanner CENPK_SCORE 94 94 0.394194 . . ID=CENPK_all_peaks;Name=chrI94;PEAK=peak0;Note=score
> is 0.394194
>
> CONFIG:
>
>
> GRAPH_CENPK{CENPK_SCORE/CENPK_GRAPH}
>
> [CENPK_all_scores_graph]
> feature = GRAPH_CENPK:SNPScanner
> glyph = xyplot
> graph_type = boxes
> fgcolor = purple
> bgcolor = purple
> height = 100
> min_score = 0
> max_score = 110
> label = 0
> key = CENPK prediction signal
> link =
> category = SNPs: signal graphs
>
>
> _______________________________________________
> Gmod-gbrowse mailing list
> Gmod-gbrowse at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse
>
>
>


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu


From dmessina at wustl.edu  Thu Dec 14 20:45:24 2006
From: dmessina at wustl.edu (David Messina)
Date: Thu, 14 Dec 2006 19:45:24 -0600
Subject: [Bioperl-l] Proposal for Meta data
In-Reply-To: <B3EF69DB-9C01-4F42-A4E4-898613D872F9@uiuc.edu>
References: <B3EF69DB-9C01-4F42-A4E4-898613D872F9@uiuc.edu>
Message-ID: <5DB6475C-109D-406D-B4BA-D2248AE3F987@wustl.edu>

Hey Chris,

My thoughts below.

> [Chris]
> This could be used to annotate any
> PrimarySeq, LocatableSeq, SimpleAlign, SeqFeature, or what-have-you,
> maybe in a collection (similar to AnnotationCollection).  I thought
> something like this may be of general use for any PrimarySeq
> (quality, structure), alignments like NEXUS and Stockholm,
> SeqFeatures where structure could be stored (tRNA or riboswitches),  
> etc.
>
> However, this also seems to fall into the category of sequence
> annotation.  So, would it be better to have a set of Bio::Annotation
> classes used for this purpose?


To me, all meta data is equal. That is, your classic Genbank feature  
annotation and a user's arbitrary meta-tag like "Bob thinks this is a  
kinase domain" aren't different in kind even if they are different in  
content.

As resequencing projects multiply, the ability to create arbitrary  
meta tags, attach them to different types of objects, and use those  
tags to link them together will become desirable, if not essential.

Keeping a common interface to all of these meta data types would be  
advantageous, plus new users won't have to determine whether they  
need to use Bio::Meta objects or Bio::Annotation objects.

So I would argue for all of the meta data types to live "under one  
roof". Which roof isn't as important. Bio::Annotation, since it  
already exists for today's meta data, seems like a reasonable choice.  
(assuming Annotation objects are flexible enough to be extended as  
you propose)

There, and no flames or jibes even. :)

Dave


From cjfields at uiuc.edu  Thu Dec 14 21:21:10 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 14 Dec 2006 20:21:10 -0600
Subject: [Bioperl-l] Proposal for Meta data
In-Reply-To: <5DB6475C-109D-406D-B4BA-D2248AE3F987@wustl.edu>
References: <B3EF69DB-9C01-4F42-A4E4-898613D872F9@uiuc.edu>
	<5DB6475C-109D-406D-B4BA-D2248AE3F987@wustl.edu>
Message-ID: <9F172B90-B065-4A42-A54F-140360132B3B@uiuc.edu>


On Dec 14, 2006, at 7:45 PM, David Messina wrote:

> Hey Chris,
>
> My thoughts below.
>
>> [Chris]
>> This could be used to annotate any
>> PrimarySeq, LocatableSeq, SimpleAlign, SeqFeature, or what-have-you,
>> maybe in a collection (similar to AnnotationCollection).  I thought
>> something like this may be of general use for any PrimarySeq
>> (quality, structure), alignments like NEXUS and Stockholm,
>> SeqFeatures where structure could be stored (tRNA or riboswitches),
>> etc.
>>
>> However, this also seems to fall into the category of sequence
>> annotation.  So, would it be better to have a set of Bio::Annotation
>> classes used for this purpose?
>
>
> To me, all meta data is equal. That is, your classic Genbank feature
> annotation and a user's arbitrary meta-tag like "Bob thinks this is a
> kinase domain" aren't different in kind even if they are different in
> content.
>
> As resequencing projects multiply, the ability to create arbitrary
> meta tags, attach them to different types of objects, and use those
> tags to link them together will become desirable, if not essential.
>
> Keeping a common interface to all of these meta data types would be
> advantageous, plus new users won't have to determine whether they
> need to use Bio::Meta objects or Bio::Annotation objects.
>
> So I would argue for all of the meta data types to live "under one
> roof". Which roof isn't as important. Bio::Annotation, since it
> already exists for today's meta data, seems like a reasonable choice.
> (assuming Annotation objects are flexible enough to be extended as
> you propose)
>
> There, and no flames or jibes even. :)

I guess what I want to know is whether there should be a  
distinction between 'normal' sequence annotation (comments,  
references, and so on) and annotation that could be best described as  
position-specific (like RNA or protein structural annotation).  The  
current meta implementation is for sequence data only; I felt it  
would be nice to have a generic implementation that would be  
applicable to any object data.

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From dmessina at wustl.edu  Thu Dec 14 21:46:27 2006
From: dmessina at wustl.edu (David Messina)
Date: Thu, 14 Dec 2006 20:46:27 -0600
Subject: [Bioperl-l] Proposal for Meta data
In-Reply-To: <9F172B90-B065-4A42-A54F-140360132B3B@uiuc.edu>
References: <B3EF69DB-9C01-4F42-A4E4-898613D872F9@uiuc.edu>
	<5DB6475C-109D-406D-B4BA-D2248AE3F987@wustl.edu>
	<9F172B90-B065-4A42-A54F-140360132B3B@uiuc.edu>
Message-ID: <9C72012A-EFD7-42DD-93F8-578251CFDE01@wustl.edu>

And it all seemed so clear to me when I wrote it. :)

> whether there should be a distinction

I would argue no because it would contravene a s


> a generic implementation that would be applicable to any object data.

I wholeheartedly agree that this is the way to go. A generic  
implementation would allow arbitrary object data while maintaining a  
standard interface.


From dmessina at wustl.edu  Thu Dec 14 21:46:27 2006
From: dmessina at wustl.edu (David Messina)
Date: Thu, 14 Dec 2006 20:46:27 -0600
Subject: [Bioperl-l] Proposal for Meta data
Message-ID: <E4629E7B-E42C-4B93-869F-FE26035052A0@wustl.edu>

[oops, accidentally hit send midsentence]


And it all seemed so clear to me when I wrote it. :)


> whether there should be a distinction

I would argue no because it would contravene a standard interface.


> a generic implementation that would be applicable to any object data.

I wholeheartedly agree that this is the way to go. A generic  
implementation would allow arbitrary object data while maintaining a  
standard interface.


Dave


From neetisomaiya at gmail.com  Fri Dec 15 00:21:42 2006
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Fri, 15 Dec 2006 10:51:42 +0530
Subject: [Bioperl-l] needle parser in bioperl?
In-Reply-To: <B4B8F9CCEDA9334F819017E5D711AD1C32C669@bllmail.bll.n-i.nhs.uk>
References: <764978cf0612140002m2a8c4268ma4b55f12412c5e9d@mail.gmail.com>
	<B4B8F9CCEDA9334F819017E5D711AD1C32C669@bllmail.bll.n-i.nhs.uk>
Message-ID: <764978cf0612142121s547a54dbu54b839f71d171f81@mail.gmail.com>

Hi,

Thanks a lot for your response.
I ran needle like this
 /usr/local/bin/./needle SEQ_1.REF seq_of_contig1 -aformat msf 1.out
It gave me the output in format msf.
But now my problem is, if I use Bio::AlignIO module of Bioperl, how can I
get the alignment start and stop coordinates on the sequence. I mean
something like hsp->query->start which gives us the alignment start position
on query sequence in a blast output when using Bio::SearchIO.
Please help.
Like I explained with an example in my previous mail, I want the coordinate
where the alignment starts on the sequence.

~Neeti.

On 12/14/06, Fairley, Derek <Derek.Fairley at bll.n-i.nhs.uk> wrote:
>
> Neeti,
>
> From http://emboss.sourceforge.net/apps/cvs/needle.html:
>
> "The results can be output in one of several styles by using the
> command-line qualifier -aformat xxx, where 'xxx' is replaced by the name of
> the required format. Some of the alignment formats can cope with an
> unlimited number of sequences, while others are only for pairs of sequences.
>
> The available multiple alignment format names are: unknown, multiple,
> simple, fasta, msf, trace, srs
>
> The available pairwise alignment format names are: pair, markx0, markx1,
> markx2, markx3, markx10, srspair, score
>
> See: http://emboss.sf.net/docs/themes/AlignFormats.html for further
> information on alignment formats."
>
> Not sure based on this whether you can get pairwise alignment in .msf
> format; can't think of a good reason why not. The BioPerl Bio::AlignIO module
> will allow you to parse alignments in .msf format.
>
> HTH,
>
> Derek.
>
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:
> bioperl-l-bounces at lists.open-bio.org] On Behalf Of neeti somaiya
> Sent: 14 December 2006 08:03
> To: Chris Fields; bioperl-l
> Subject: Re: [Bioperl-l] needle parser in bioperl?
>
> How do I run needle specifying that I want the MSF format, on a linux box?
> The help doesn't show me any format option. Is there anything available to
> parse MSF format?
> Please find an example alignment file attached. Here the seq_of_contig
> aligns with the reference sequence (i.e. SEQ_1.REF) starting at position
> (coordinate) 8918 of SEQ_1.REF. I basically want this coordinate from the
> output alignment, how can I parse the result to get this?
>
> On 12/12/06, Chris Fields <cjfields at uiuc.edu> wrote:
> >
> >
> > On Dec 12, 2006, at 6:14 AM, neeti somaiya wrote:
> >
> > > Hi,
> > >
> > > Does anyone know of a bioperl parser for needle output? Basically I
> > > want to know where the target sequence aligns on the template (i.e. the
> > > coordinate on the template where the target aligns).
> > >
> > > --
> > > -Neeti
> > > Even my blood says, B positive
> >
> > I answered this a number of months back:
> >
> > http://tinyurl.com/yzlbx5
> >
> > Basically, newer versions of EMBOSS have changed the output for the
> > AlignIO::emboss parser (which parses needle).  I don't believe the
> > parser has been fixed to deal with that, but Jason has pointed out
> > you can use MSF output when running needle, then parse using AlignIO
> > with the format set to 'msf'.
> >
> > chris
> >
>
> --
> -Neeti
> Even my blood says, B positive
>


-- 
-Neeti
Even my blood says, B positive


From Derek.Fairley at bll.n-i.nhs.uk  Fri Dec 15 04:57:35 2006
From: Derek.Fairley at bll.n-i.nhs.uk (Fairley, Derek)
Date: Fri, 15 Dec 2006 09:57:35 -0000
Subject: [Bioperl-l] needle parser in bioperl?
In-Reply-To: <764978cf0612142121s547a54dbu54b839f71d171f81@mail.gmail.com>
Message-ID: <B4B8F9CCEDA9334F819017E5D711AD1C32C66A@bllmail.bll.n-i.nhs.uk>

Neeti,

In lieu of a response from a BioPerl guru... why not use Needle to generate your pairwise alignment in fasta format, rather than msf format? The sequence you want should correspond to a single HSP which you can get directly from the fasta alignment with Bio::SearchIO: http://www.bioperl.org/wiki/Module:Bio::SearchIO. You may not need to use Bio::AlignIO at all. 

Derek.


-----Original Message-----
From: neeti somaiya [mailto:neetisomaiya at gmail.com] 
Sent: 15 December 2006 05:22
To: Fairley, Derek; bioperl-l
Subject: Re: [Bioperl-l] needle parser in bioperl?

Hi,

Thanks a lot for your response.
I ran needle like this 
/usr/local/bin/./needle SEQ_1.REF seq_of_contig1 -aformat msf 1.out
It gave me the output in format msf.
But now my problem is, if I use Bio::AlignIO module of Bioperl, how can I get the alignment start and stop coordinates on the sequence. I mean something like hsp->query->start which gives us the alignment start position on query sequence in a blast output when using Bio::SearchIO.
Please help.
Like I explained with an example in my previous mail, I want the coordinate where the alignment starts on the sequence.

~Neeti.
On 12/14/06, Fairley, Derek <Derek.Fairley at bll.n-i.nhs.uk> wrote:
Neeti,

From http://emboss.sourceforge.net/apps/cvs/needle.html :

"The results can be output in one of several styles by using the command-line qualifier -aformat xxx, where 'xxx' is replaced by the name of the required format. Some of the alignment formats can cope with an unlimited number of sequences, while others are only for pairs of sequences. 

The available multiple alignment format names are: unknown, multiple, simple, fasta, msf, trace, srs 

The available pairwise alignment format names are: pair, markx0, markx1, markx2, markx3, markx10, srspair, score 

See: http://emboss.sf.net/docs/themes/AlignFormats.html for further information on alignment formats."

Not sure based on this whether you can get pairwise alignment in .msf format; can't think of a good reason why not. The BioPerl Bio::AlignIO module will allow you to parse alignments in .msf format.

HTH,

Derek.

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of neeti somaiya
Sent: 14 December 2006 08:03
To: Chris Fields; bioperl-l
Subject: Re: [Bioperl-l] needle parser in bioperl?

How do I run needle specifying that I want the MSF format, on a linux box?
The help doesn't show me any format option. Is there anything available to
parse MSF format?
Please find an example alignment file attached. Here the seq_of_contig
aligns with the reference sequence (i.e. SEQ_1.REF) starting at position
(coordinate) 8918 of SEQ_1.REF. I basically want this coordinate from the
output alignment, how can I parse the result to get this?

On 12/12/06, Chris Fields <cjfields at uiuc.edu > wrote:
>
>
> On Dec 12, 2006, at 6:14 AM, neeti somaiya wrote:
>
> > Hi,
> >
> > Does anyone know of a bioperl parser for needle output, basically I
> > won't
> > where the target sequence aligns on the template (i.e. coordinate
> > on the
> > template where the taget aligns).
> >
> > --
> > -Neeti
> > Even my blood says, B positive
>
> I answered this a number of months back:
>
> http://tinyurl.com/yzlbx5 
>
> Basically, newer versions of EMBOSS have changed the output for the
> AlignIO::emboss parser (which parses needle).  I don't believe the
> parser has been fixed to deal with that, but Jason has pointed out
> you can use MSF output when running needle, then parse using AlignIO
> with the format set to 'msf'.
>
> chris
>
-- 
-Neeti
Even my blood says, B positive


-- 
-Neeti
Even my blood says, B positive 


From cain at cshl.edu  Fri Dec 15 00:01:36 2006
From: cain at cshl.edu (Scott Cain)
Date: Fri, 15 Dec 2006 00:01:36 -0500
Subject: [Bioperl-l] Bio::SeqFeature::Annotated and mandatory
	type	checking
In-Reply-To: <4581CCEB.20206@sendu.me.uk>
References: <637A2459-4115-466F-BD8D-036D5E9114F8@cshl.edu>
	<4581CCEB.20206@sendu.me.uk>
Message-ID: <1166158897.2569.335.camel@localhost.localdomain>

As much as I would like to take credit for this :-)  Allen Day wrote the
original code, and then Chris Fields tried to fix it so that it actually
worked :-)  I think it would be a good idea to have a validate_terms
option like Bio::FeatureIO::gff.

Scott

On Thu, 2006-12-14 at 17:15 -0500, Sendu Bala wrote:
> Matthew Vaughn wrote:
> > Dear all,
> > 
> > I'm trying to bring some of my code into compliance with the BioPerl 
> > 1.5.2 and am running into some design decisions that I am unclear on. 
> > Can I ask why Bio::SeqFeature::Annotated enforces mandatory checking of 
> > the 'type' against SOFA? It seems to me that this should be optional 
> > behavior as is the case with the Bio::FeatureIO family. I'd be happy to 
> > write the patch if there is any agreement with me on this case.
> 
> Lots of people seem to have worked on it over the years, but perhaps 
> Scott Cain is the person to talk to?
> 
> revision 1.4
> date: 2004/09/25 11:41:29;  author: scain;  state: Exp;  lines: +1 -1
> two things:
>    * adding SOFA as an available ontology to DocumentRegistry.pm
>    * modifying FeatureIO::gff to use SOFA to validate, and to parse 
> Ontology_term
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From neetisomaiya at gmail.com  Fri Dec 15 07:46:08 2006
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Fri, 15 Dec 2006 18:16:08 +0530
Subject: [Bioperl-l] needle parser in bioperl?
In-Reply-To: <B4B8F9CCEDA9334F819017E5D711AD1C32C66A@bllmail.bll.n-i.nhs.uk>
References: <764978cf0612142121s547a54dbu54b839f71d171f81@mail.gmail.com>
	<B4B8F9CCEDA9334F819017E5D711AD1C32C66A@bllmail.bll.n-i.nhs.uk>
Message-ID: <764978cf0612150446r46e5f64tc6bf0b198cf618c5@mail.gmail.com>

I ran needle like this

/usr/local/bin/./needle SEQ_1.REF seq_of_contig1 -aformat fasta 1.out

Please find the output attached.

When I run the following :-

use Bio::SearchIO;

my $io = Bio::SearchIO->new(-file   => "1.out",
                            -format => "fasta" );

while ( my $result = $io->next_result() )
{
    while ( my $hit = $result->next_hit )
    {
        print "yes\n";
    }
}


It says :-

-------------------- WARNING ---------------------
MSG: unrecognized FASTA Family report file!
---------------------------------------------------

What should I do?

~Neeti.

On 12/15/06, Fairley, Derek <Derek.Fairley at bll.n-i.nhs.uk> wrote:
>
> Neeti,
>
> In lieu of a response from a BioPerl guru... why not use Needle to
> generate your pairwise alignment in fasta format, rather than msf format?
> The sequence you want should correspond to a single HSP which you can get
> directly from the fasta alignment with Bio::SearchIO:
> http://www.bioperl.org/wiki/Module:Bio::SearchIO. You may not need to use
> Bio::AlignIO at all.
>
> Derek.
>
>
> -----Original Message-----
> From: neeti somaiya [mailto:neetisomaiya at gmail.com]
> Sent: 15 December 2006 05:22
> To: Fairley, Derek; bioperl-l
> Subject: Re: [Bioperl-l] needle parser in bioperl?
>
> Hi,
>
> Thanks a lot for your response.
> I ran needle like this
> /usr/local/bin/./needle SEQ_1.REF seq_of_contig1 -aformat msf 1.out
> It gave me the output in format msf.
> But now my problem is, if I use Bio::AlignIO module of Bioperl, how can I
> get the alignment start and stop coordinates on the sequence. I mean
> something like hsp->query->start which gives us the alignment start position
> on query sequence in a blast output when using Bio::SearchIO.
> Please help.
> Like I explained with an example in my previous mail, I want the
> coordinate where the alignment starts on the sequence.
>
> ~Neeti.
> On 12/14/06, Fairley, Derek <Derek.Fairley at bll.n-i.nhs.uk> wrote:
> Neeti,
>
> From http://emboss.sourceforge.net/apps/cvs/needle.html :
>
> "The results can be output in one of several styles by using the
> command-line qualifier -aformat xxx, where 'xxx' is replaced by the name of
> the required format. Some of the alignment formats can cope with an
> unlimited number of sequences, while others are only for pairs of sequences.
>
> The available multiple alignment format names are: unknown, multiple,
> simple, fasta, msf, trace, srs
>
> The available pairwise alignment format names are: pair, markx0, markx1,
> markx2, markx3, markx10, srspair, score
>
> See: http://emboss.sf.net/docs/themes/AlignFormats.html for further
> information on alignment formats."
>
> Not sure based on this whether you can get pairwise alignment in .msf
> format; can't think of a good reason why not. The BioPerl Bio::AlignIO module
> will allow you to parse alignments in .msf format.
>
> HTH,
>
> Derek.
>
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:
> bioperl-l-bounces at lists.open-bio.org] On Behalf Of neeti somaiya
> Sent: 14 December 2006 08:03
> To: Chris Fields; bioperl-l
> Subject: Re: [Bioperl-l] needle parser in bioperl?
>
> How do I run needle specifying that I want the MSF format, on a linux box?
> The help doesn't show me any format option. Is there anything available to
> parse MSF format?
> Please find an example alignment file attached. Here the seq_of_contig
> aligns with the reference sequence (i.e. SEQ_1.REF) starting at position
> (coordinate) 8918 of SEQ_1.REF. I basically want this coordinate from the
> output alignment, how can I parse the result to get this?
>
> On 12/12/06, Chris Fields <cjfields at uiuc.edu > wrote:
> >
> >
> > On Dec 12, 2006, at 6:14 AM, neeti somaiya wrote:
> >
> > > Hi,
> > >
> > > Does anyone know of a bioperl parser for needle output? Basically I
> > > want to know where the target sequence aligns on the template (i.e. the
> > > coordinate on the template where the target aligns).
> > >
> > > --
> > > -Neeti
> > > Even my blood says, B positive
> >
> > I answered this a number of months back:
> >
> > http://tinyurl.com/yzlbx5
> >
> > Basically, newer versions of EMBOSS have changed the output for the
> > AlignIO::emboss parser (which parses needle). I don't believe the
> > parser has been fixed to deal with that, but Jason has pointed out
> > you can use MSF output when running needle, then parse using AlignIO
> > with the format set to 'msf'.
> >
> > chris
> >
>
>
>
> --
> -Neeti
> Even my blood says, B positive
>
>
>
> --
> -Neeti
> Even my blood says, B positive
>


-- 
-Neeti
Even my blood says, B positive
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 1.out
Type: application/octet-stream
Size: 90277 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061215/34b05d03/attachment-0002.obj>

From jason at bioperl.org  Fri Dec 15 09:28:13 2006
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 15 Dec 2006 09:28:13 -0500
Subject: [Bioperl-l] Proposal for Meta data
In-Reply-To: <9F172B90-B065-4A42-A54F-140360132B3B@uiuc.edu>
References: <B3EF69DB-9C01-4F42-A4E4-898613D872F9@uiuc.edu>
	<5DB6475C-109D-406D-B4BA-D2248AE3F987@wustl.edu>
	<9F172B90-B065-4A42-A54F-140360132B3B@uiuc.edu>
Message-ID: <32BE3FCF-C788-438F-8A4A-8A586DD6C569@bioperl.org>


On Dec 14, 2006, at 9:21 PM, Chris Fields wrote:

>
> On Dec 14, 2006, at 7:45 PM, David Messina wrote:
>
>> Hey Chris,
>>
>> My thoughts below.
>>
>>> [Chris]
>>> This could be used to annotate any
>>> PrimarySeq, LocatableSeq, SimpleAlign, SeqFeature, or what-have-you,
>>> maybe in a collection (similar to AnnotationCollection).  I thought
>>> something like this may be of general use for any PrimarySeq
>>> (quality, structure), alignments like NEXUS and Stockholm,
>>> SeqFeatures where structure could be stored (tRNA or riboswitches),
>>> etc.
>>>
>>> However, this also seems to fall into the category of sequence
>>> annotation.  So, would it be better to have a set of Bio::Annotation
>>> classes used for this purpose?
>>
>>
>> To me, all meta data is equal. That is, your classic Genbank feature
>> annotation and a user's arbitrary meta-tag like "Bob thinks this is a
>> kinase domain" aren't different in kind even if they are different in
>> content.
>>
>> As resequencing projects multiply, the ability to create arbitrary
>> meta tags, attach them to different types of objects, and use those
>> tags to link them together will become desirable, if not essential.
>>
>> Keeping a common interface to all of these meta data types would be
>> advantageous, plus new users won't have to determine whether they
>> need to use Bio::Meta objects or Bio::Annotation objects.
>>
>> So I would argue for all of the meta data types to live "under one
>> roof". Which roof isn't as important. Bio::Annotation, since it
>> already exists for today's meta data, seems like a reasonable choice.
>> (assuming Annotation objects are flexible enough to be extended as
>> you propose)
>>
>> There, and no flames or jibes even. :)
>
> I guess what I want to know is whether there should be a
> distinction between 'normal' sequence annotation (comments,
> references, and so on) and annotation that could be best described as
> position-specific (like RNA or protein structural annotation).  The
> current meta implementation is for sequence data only; I felt it
> would be nice to have a generic implementation that would be
> applicable to any object data.

my stream-of-consciousness for right now:

I was thinking Bio::Annotation is where this should go - that system  
doesn't have anything about it that makes it explicitly sequence  
related. What we're trying to hammer out here on the Alignment side -
which fits with your RNA example - is to have features (basically
SeqFeatures) associated with alignments, so columns can be annotated
to cover things like character sets and partitions for phylogenetic
analyses.  As for data which annotates non-contiguous things like
RNA stems, we may have to be more creative about that or model it with
a splitLocation.

So currently we've added code so that an Alignment is-a  
Bio::AnnotableI and is-a Bio::FeatureHolderI to move towards this  
end, with the goal of being able to capture more of the data that can  
be represented in a NEXUS file.

It feels more like a hack than an elegant Meta-data solution, but I
am not totally sure what you are thinking about doing with the data at
this point; perhaps I need to spend more time thinking about it.
Or are you worried about the idea of whether the semantic mapping of  
the data into features or annotations is confusing users?


From jason at bioperl.org  Fri Dec 15 09:48:32 2006
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 15 Dec 2006 09:48:32 -0500
Subject: [Bioperl-l] needle parser in bioperl?
In-Reply-To: <764978cf0612150446r46e5f64tc6bf0b198cf618c5@mail.gmail.com>
References: <764978cf0612142121s547a54dbu54b839f71d171f81@mail.gmail.com>
	<B4B8F9CCEDA9334F819017E5D711AD1C32C66A@bllmail.bll.n-i.nhs.uk>
	<764978cf0612150446r46e5f64tc6bf0b198cf618c5@mail.gmail.com>
Message-ID: <42CB9018-72CD-433E-A42F-152D63D2F584@bioperl.org>

I get the impression you are trying to use the wrong tool for the  
job.  Can you explain a little more generally what you want to do?

Semantically, FASTA in Bio::SearchIO is very different from FASTA in
Bio::AlignIO.  We explain this on the wiki; please have a look at the
FASTA page.

  - do not use Bio::SearchIO to parse multi-fasta alignment output;
    Bio::SearchIO is for pairwise alignment reports
  - use Bio::AlignIO for a multi-fasta format or for msf - you just
    provide a different value to '-format'.
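A minimal core-Perl sketch of the distinction above: needle run with `-aformat fasta` writes gapped sequences (an alignment), so the file is read as ordered records of id plus gapped sequence, not as hits and HSPs. It assumes the file holds only `>` header lines and gapped sequence lines (blank and `#` lines are skipped as a precaution); the subroutine name `read_fasta_alignment` is hypothetical, and in BioPerl itself `Bio::AlignIO` with `-format => 'fasta'` is the parser to use.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read a multi-FASTA alignment from a filehandle into a list of
# [id, gapped_sequence] pairs, in file order. Gap characters are kept,
# because alignment columns are what matter downstream.
sub read_fasta_alignment {
    my ($fh) = @_;
    my @records;
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/\r$//;                  # tolerate CRLF line endings
        next if $line =~ /^\s*(#|$)/;      # skip blank and '#' comment lines
        if ($line =~ /^>(\S+)/) {
            push @records, [ $1, '' ];     # start a new record
        }
        elsif (@records) {
            $line =~ s/\s+//g;             # drop embedded whitespace
            $records[-1][1] .= $line;      # append to the current sequence
        }
    }
    return @records;
}

# Usage with an in-memory file; these two sequences are made up.
my $text = ">ref\nACGT--\nACGT\n>contig\n--GTTT\nAC--\n";
open my $fh, '<', \$text or die "cannot open in-memory file: $!";
my @aln = read_fasta_alignment($fh);
close $fh;
printf "%s\t%s\n", @$_ for @aln;
```

Sequence lines are concatenated per record, so wrapped output is handled; any text before the first `>` header is ignored.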

But none of that is going to help you get start/end for your
alignment, because that is not part of the output format. Do the
experiment of looking at the file and figuring out which fields you
actually want output; if they don't exist, then either you have a
format that won't work for your question, or you will have to
calculate additional .  If you are trying to align transcripts to a
genome, please consider tools that are built for it (and referenced
on the wiki), like Sim4, est2genome, exonerate, BLAT.
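Since the start/end on the reference is not in the needle output, it has to be calculated from the alignment columns. A minimal core-Perl sketch, assuming you already have the two gapped, equal-length strings (reference first) from whatever parser you used, for example the `seq` of each sequence returned by `Bio::AlignIO`. The subroutine `query_span_on_reference` is hypothetical, and it assumes the reference starts at residue 1 in the alignment, which should hold for a global needle alignment of full sequences. It walks the columns, counting reference residues, and records the first and last reference positions paired with a query residue.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the 1-based first and last reference coordinates that are paired
# with a query residue, given two equal-length gapped strings from one
# alignment (reference first). '-' and '.' are treated as gaps.
# Assumes the reference starts at residue 1 in the alignment (true for a
# global needle alignment of the full sequences). Returns (undef, undef)
# if no query residue is paired with a reference residue.
sub query_span_on_reference {
    my ($ref, $qry) = @_;
    die "aligned strings differ in length\n"
        unless length($ref) == length($qry);

    my ($ref_pos, $first, $last) = (0, undef, undef);
    for my $col (0 .. length($ref) - 1) {
        my $ref_is_residue = substr($ref, $col, 1) !~ /[-.]/;
        my $qry_is_residue = substr($qry, $col, 1) !~ /[-.]/;
        $ref_pos++ if $ref_is_residue;     # reference residues seen so far
        next unless $ref_is_residue && $qry_is_residue;
        $first = $ref_pos unless defined $first;
        $last  = $ref_pos;
    }
    return ($first, $last);
}

# Usage with made-up aligned strings: the contig's first paired residue sits
# opposite reference residue 3 and its last opposite reference residue 6.
my ($start, $end) = query_span_on_reference('ACGT--ACGT', '--GTTTAC--');
print "contig aligns to reference $start..$end\n";
```

Columns where the query has a residue but the reference has a gap (insertions in the query) are skipped, so the span covers only reference positions actually paired with query residues. BioPerl's `Bio::LocatableSeq` also provides column/residue mapping methods (e.g. `location_from_column`), which may be worth checking in your installed version.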

-jason
On Dec 15, 2006, at 7:46 AM, neeti somaiya wrote:

> I ran needle like this
>
> /usr/local/bin/./needle SEQ_1.REF seq_of_contig1 -aformat fasta 1.out
>
> Please find the output attached.
>
> When I run the following :-
>
> use Bio::SearchIO;
>
> my $io = Bio::SearchIO->new(-file   => "1.out",
>                           -format => "fasta" );
>
> while ( my $result = $io->next_result() )
> {
>       while( my $hit = $result->next_hit)
>      {
>
>               print "yes\n";
>       }
> }
>
>
> It says :-
>
> -------------------- WARNING ---------------------
> MSG: unrecognized FASTA Family report file!
> ---------------------------------------------------
>
> What should I do?
>
> ~Neeti.
>
> On 12/15/06, Fairley, Derek <Derek.Fairley at bll.n-i.nhs.uk> wrote:
>>
>> Neeti,
>>
>> In lieu of a response from a BioPerl guru... why not use Needle to
>> generate your pairwise alignment in fasta format, rather than msf
>> format? The sequence you want should correspond to a single HSP which
>> you can get directly from the fasta alignment with Bio::SearchIO:
>> http://www.bioperl.org/wiki/Module:Bio::SearchIO. You may not need to
>> use Bio::AlignIO at all.
>>
>> Derek.
>>
>>
>> -----Original Message-----
>> From: neeti somaiya [mailto:neetisomaiya at gmail.com]
>> Sent: 15 December 2006 05:22
>> To: Fairley, Derek; bioperl-l
>> Subject: Re: [Bioperl-l] needle parser in bioperl?
>>
>> Hi,
>>
>> Thanks a lot for your response.
>> I ran needle like this:
>> /usr/local/bin/./needle SEQ_1.REF seq_of_contig1 -aformat msf 1.out
>> It gave me the output in MSF format.
>> But now my problem is: if I use the Bio::AlignIO module of BioPerl, how
>> can I get the alignment start and stop coordinates on the sequence? I
>> mean something like hsp->query->start, which gives us the alignment
>> start position on the query sequence in a BLAST output when using
>> Bio::SearchIO.
>> Please help.
>> Like I explained with an example in my previous mail, I want the
>> coordinate where the alignment starts on the sequence.
>>
>> ~Neeti.
>> On 12/14/06, Fairley, Derek <Derek.Fairley at bll.n-i.nhs.uk> wrote:
>> Neeti,
>>
>> From http://emboss.sourceforge.net/apps/cvs/needle.html :
>>
>> "The results can be output in one of several styles by using the
>> command-line qualifier -aformat xxx, where 'xxx' is replaced by the
>> name of the required format. Some of the alignment formats can cope
>> with an unlimited number of sequences, while others are only for pairs
>> of sequences.
>>
>> The available multiple alignment format names are: unknown, multiple,
>> simple, fasta, msf, trace, srs
>>
>> The available pairwise alignment format names are: pair, markx0,
>> markx1, markx2, markx3, markx10, srspair, score
>>
>> See: http://emboss.sf.net/docs/themes/AlignFormats.html for further
>> information on alignment formats."
>>
>> Not sure based on this whether you can get a pairwise alignment in .msf
>> format; can't think of a good reason why not. The BioPerl Bio::AlignIO
>> module will allow you to parse alignments in .msf format.
>>
>> HTH,
>>
>> Derek.
>>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:
>> bioperl-l-bounces at lists.open-bio.org] On Behalf Of neeti somaiya
>> Sent: 14 December 2006 08:03
>> To: Chris Fields; bioperl-l
>> Subject: Re: [Bioperl-l] needle parser in bioperl?
>>
>> How do I run needle specifying that I want the MSF format, on a Linux
>> box? The help doesn't show me any format option. Is there anything
>> available to parse MSF format?
>> Please find an example alignment file attached. Here the seq_of_contig
>> aligns with the reference sequence (i.e. SEQ_1.REF) starting at
>> position (coordinate) 8918 of SEQ_1.REF. I basically want this
>> coordinate from the output alignment; how can I parse the result to
>> get this?
>>
>> On 12/12/06, Chris Fields <cjfields at uiuc.edu > wrote:
>> >
>> >
>> > On Dec 12, 2006, at 6:14 AM, neeti somaiya wrote:
>> >
>> > > Hi,
>> > >
>> > > Does anyone know of a bioperl parser for needle output? Basically I
>> > > want to know where the target sequence aligns on the template (i.e.
>> > > the coordinate on the template where the target aligns).
>> > >
>> > > --
>> > > -Neeti
>> > > Even my blood says, B positive
>> >
>> > I answered this a number of months back:
>> >
>> > http://tinyurl.com/yzlbx5
>> >
>> > Basically, newer versions of EMBOSS have changed the output format
>> > read by the AlignIO::emboss parser (which parses needle). I don't
>> > believe the parser has been fixed to deal with that, but Jason has
>> > pointed out you can use MSF output when running needle, then parse
>> > using AlignIO with the format set to 'msf'.
>> >
>> > chris
>> >
> <1.out>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html


From lubapardo at gmail.com  Fri Dec 15 11:39:11 2006
From: lubapardo at gmail.com (Luba Pardo)
Date: Fri, 15 Dec 2006 17:39:11 +0100
Subject: [Bioperl-l] NO BLAST
Message-ID: <58ff33550612150839i40409b06pe427bcd77d3f208@mail.gmail.com>

Hello,
I am having trouble using the module Bio::Tools::Run::StandAloneBlast.

I get the following error message: cannot find path to blastall.
The code I used is (modified from the Beginners HOWTO):

#! /local/bin/perl -w

use strict;

use Bio::Seq;
use Bio::SeqIO;
use Bio::DB::GenBank;
use Bio::Tools::Run::StandAloneBlast;

my $db_object = Bio::DB::GenBank->new;

#my $seq_ob = $db_object->get_Seq_by_id('NM_004043');

#my $seq = Bio::SeqIO->new(-file => "> out.fasta", -format => 'fasta');

#$seq->write_seq($seq_ob);

#print $seq;

# blastall itself must be findable (typically on PATH or via the
# BLASTDIR environment variable), otherwise the run cannot start.
my @params = (program  => 'blastn',
              database => 'db.fa');

my $blast_obj = Bio::Tools::Run::StandAloneBlast->new(@params);

my $seq_obj = Bio::Seq->new(-id  => "testquery",
                            -seq => "TTTAAATATATTTTGAAGTATAGATTATATGTT");

my $report_obj = $blast_obj->blastall($seq_obj);

my $result_obj = $report_obj->next_result;

print $result_obj->num_hits, "\n";

Whether I create a sequence de novo or retrieve one from the Internet, I get
the same message.
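Since the message is about locating the program rather than about the sequence, a quick check is to look for blastall in the places it is usually expected: the directory named by the BLASTDIR environment variable, or a directory on PATH. The helper below is a minimal core-Perl sketch of that check; find_executable is a hypothetical name, not part of BioPerl, and on Windows the file would be blastall.exe.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return the full path of an executable found in
# $ENV{BLASTDIR} or in one of the PATH directories, or undef if absent.
sub find_executable {
    my ($name) = @_;
    my @dirs = grep { defined $_ && length $_ } ($ENV{BLASTDIR}, File::Spec->path);
    for my $dir (@dirs) {
        my $candidate = File::Spec->catfile($dir, $name);
        return $candidate if -f $candidate && -x _;   # reuse the stat from -f
    }
    return;
}

my $path = find_executable('blastall');
print defined $path ? "found: $path\n" : "blastall not found\n";
```

If it reports "not found", adding the BLAST bin directory to PATH, or setting BLASTDIR in the shell before running the script, is the first thing to try.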


From cjfields at uiuc.edu  Fri Dec 15 12:23:27 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 15 Dec 2006 11:23:27 -0600
Subject: [Bioperl-l] Proposal for Meta data
In-Reply-To: <32BE3FCF-C788-438F-8A4A-8A586DD6C569@bioperl.org>
References: <B3EF69DB-9C01-4F42-A4E4-898613D872F9@uiuc.edu>
	<5DB6475C-109D-406D-B4BA-D2248AE3F987@wustl.edu>
	<9F172B90-B065-4A42-A54F-140360132B3B@uiuc.edu>
	<32BE3FCF-C788-438F-8A4A-8A586DD6C569@bioperl.org>
Message-ID: <F302B7DD-C806-4A6F-ACDF-9F27A84E0BF0@uiuc.edu>


On Dec 15, 2006, at 8:28 AM, Jason Stajich wrote:

>
> On Dec 14, 2006, at 9:21 PM, Chris Fields wrote:
>
>>
>> On Dec 14, 2006, at 7:45 PM, David Messina wrote:
>>
>>> Hey Chris,
>>>
>>> My thoughts below.
>>>
>>>> [Chris]
>>>> This could be used to annotate any PrimarySeq, LocatableSeq,
>>>> SimpleAlign, SeqFeature, or what-have-you, maybe in a collection
>>>> (similar to AnnotationCollection).  I thought something like this
>>>> may be of general use for any PrimarySeq (quality, structure),
>>>> alignments like NEXUS and Stockholm, SeqFeatures where structure
>>>> could be stored (tRNA or riboswitches), etc.
>>>>
>>>> However, this also seems to fall into the category of sequence
>>>> annotation.  So, would it be better to have a set of Bio::Annotation
>>>> classes used for this purpose?
>>>
>>>
>>> To me, all meta data is equal. That is, your classic GenBank feature
>>> annotation and a user's arbitrary meta-tag like "Bob thinks this is a
>>> kinase domain" aren't different in kind even if they are different in
>>> content.
>>>
>>> As resequencing projects multiply, the ability to create arbitrary
>>> meta tags, attach them to different types of objects, and use those
>>> tags to link them together will become desirable, if not essential.
>>>
>>> Keeping a common interface to all of these meta data types would be
>>> advantageous, plus new users won't have to determine whether they
>>> need to use Bio::Meta objects or Bio::Annotation objects.
>>>
>>> So I would argue for all of the meta data types to live "under one
>>> roof". Which roof isn't as important. Bio::Annotation, since it
>>> already exists for today's meta data, seems like a reasonable choice
>>> (assuming Annotation objects are flexible enough to be extended as
>>> you propose).
>>>
>>> There, and no flames or jibes even. :)
>>
>> I guess what I want to know is whether there should be a
>> distinction between 'normal' sequence annotation (comments,
>> references, and so on) and annotation that could be best described as
>> position-specific (like RNA or protein structural annotation).  The
>> current meta implementation is for sequence data only; I felt it
>> would be nice to have a generic implementation that would be
>> applicable to any object data.
>
> my stream-of-consciousness for right now:
>
> I was thinking Bio::Annotation is where this should go - that
> system doesn't have anything about it that makes it explicitly
> sequence related. What we're trying to hammer out here on the
> Alignment side - which fits with your RNA example - is to have
> features (basically SeqFeatures) associated with alignments so
> columns can be annotated to cover things like character sets and
> partitions for phylogenetic analyses.  As for data which annotates
> non-contiguous things like RNA stems, we may have to be more
> creative about that or model it with a split location.
>
> So currently we've added code so that an Alignment is-a
> Bio::AnnotatableI and is-a Bio::FeatureHolderI to move towards this
> end, with the goal of being able to capture more of the data that
> can be represented in a NEXUS file.
>
> It feels more like a hack than an elegant meta-data solution, but I
> am not totally sure what data you are thinking about at this point;
> perhaps I need to spend more time thinking about it.
> Or are you worried that the semantic mapping of the data into
> features or annotations will confuse users?

Sorry in advance for the longish response here...

My original thought was to have a generic abstract class capable of  
positionally describing data in any other class, similar to
Heikki's Bio::Seq::MetaI but not constrained to sequence data only.   
Implementing classes would be capable of having different data  
structures based on their use (simple string, array, AoA, AoH, AoO).   
One MetaCollection class to contain them all in a tag-like system, so  
you could have mixed data types describe the same object.  The latter  
Collection class is so similar to AnnotationCollection that I agree  
Bio::Annotation would be the best place for this.

The way I reconfigured Stockholm alignment parsing/writing is to use  
Bio::Seq::Meta objects (which are LocatableSeq).  Each Seq::Meta is  
capable of holding a sequence and several meta strings, stored as  
tags or 'names'.  However, there is no Meta object for alignments  
(for RNA/protein structure consensus and other Rfam/Pfam markup); I  
hacked around this by using a Bio::Seq::Meta w/o a seq, but I would  
rather have a generic Meta object independent of the sequence cruft.

So for this partial Pfam alignment,

Q92SV1_RHIME/122-299         LAMALNLARGI...VDADVDF..REG
#=GR Q92SV1_RHIME/122-299 pAS .........................
Q883D2_PSESM/110-290         LGLMLGLRRRL...FDGNGAV..KRS
Q8ZXP5_PYRAE/91-262          LALLLAPYKRI...IQYGEKM..KRG
#=GR Q8ZXP5_PYRAE/91-262 SS  HHHHHHHHTTH...HHHHHHX..HTT
#=GR Q8ZXP5_PYRAE/91-262 SA  00000000000...120030X..474
#=GC SS_cons                 HHHHHHHHTTH...HHHHHHH..HTT
#=GC SA_cons                 03002200312...1312414..676
#=GC seq_cons                luhhLuhsRpl...hthppth..+pG
//

'#=GC' lines would be in generic meta string objects in the  
alignment, while '#=GR' tags would be in similar meta objects in the  
relevant sequences.  As long as both aren't AnnotatableI this isn't  
an issue.
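That split can be illustrated with a minimal plain-Perl sketch, using lines from the excerpt above: #=GC lines land in an alignment-level store and #=GR lines in a store keyed by sequence name. The function name split_stockholm_meta is made up, and plain hashes stand in for the Bio::Seq::Meta objects the real Bio::AlignIO::stockholm code uses.

```perl
use strict;
use warnings;

# Hypothetical sketch (not Bio::AlignIO::stockholm): sort Stockholm lines
# into alignment-level meta (#=GC), per-sequence meta (#=GR) and the
# sequences themselves. Values are appended with .= so interleaved
# blocks of the same alignment are stitched back together.
sub split_stockholm_meta {
    my (%seqs, %seq_meta, %aln_meta);
    for my $line (@_) {
        next if $line =~ m{^//} || $line !~ /\S/;
        if ($line =~ /^#=GC\s+(\S+)\s+(\S+)/) {
            $aln_meta{$1} .= $2;             # e.g. SS_cons, SA_cons, seq_cons
        }
        elsif ($line =~ /^#=GR\s+(\S+)\s+(\S+)\s+(\S+)/) {
            $seq_meta{$1}{$2} .= $3;         # e.g. SS or SA for one sequence
        }
        elsif ($line !~ /^#/ && $line =~ /^(\S+)\s+(\S+)\s*$/) {
            $seqs{$1} .= $2;                 # ordinary name/sequence line
        }
    }
    return (\%seqs, \%seq_meta, \%aln_meta);
}

my ($seqs, $seq_meta, $aln_meta) = split_stockholm_meta(
    'Q8ZXP5_PYRAE/91-262          LALLLAPYKRI...IQYGEKM..KRG',
    '#=GR Q8ZXP5_PYRAE/91-262 SS  HHHHHHHHTTH...HHHHHHX..HTT',
    '#=GC SS_cons                 HHHHHHHHTTH...HHHHHHH..HTT',
    '//',
);
print "alignment SS_cons: $aln_meta->{SS_cons}\n";
print "sequence SS:       $seq_meta->{'Q8ZXP5_PYRAE/91-262'}{SS}\n";
```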

Similarly, NEXUS files which contained any position-based values  
could hold a meta string/array object in a similar tag.

The basic scheme is:

                     |--String
                     |
Annotation::Meta-----|--Array
                     |
                     |--HorriblyComplexDataStruct
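One way the String and Array branches of that scheme might look, as a plain-Perl sketch. The package names My::Meta, My::Meta::String and My::Meta::Array and the value_at/size methods are hypothetical, not existing BioPerl classes; the point is only that both storage types answer the same positional question.

```perl
use strict;
use warnings;

# Hypothetical base class: a meta object carries a tag and describes
# positions 1..N of whatever it is attached to.
package My::Meta;
sub new {
    my ($class, %args) = @_;
    my $self = { %args };
    return bless $self, $class;
}
sub tag { return $_[0]{-tag} }

# String storage: one character per position (like an SS_cons line).
package My::Meta::String;
our @ISA = ('My::Meta');
sub value_at { my ($self, $pos) = @_; return substr($self->{-value}, $pos - 1, 1) }
sub size     { return length $_[0]{-value} }

# Array storage: one arbitrary value per position (e.g. numeric scores).
package My::Meta::Array;
our @ISA = ('My::Meta');
sub value_at { my ($self, $pos) = @_; return $self->{-value}[$pos - 1] }
sub size     { return scalar @{ $_[0]{-value} } }

package main;

my @meta = (
    My::Meta::String->new(-tag => 'SS_cons', -value => 'HHHHHHHHTTH'),
    My::Meta::Array->new(-tag  => 'quality', -value => [30, 32, 28]),
);
printf "%s at 2: %s\n", $_->tag, $_->value_at(2) for @meta;
```

A HorriblyComplexDataStruct branch would just be another subclass with its own storage behind the same value_at/size interface.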

Then I started thinking about where this could be applied, and  
whether a true Meta object needs to be constrained only to describing  
position-based data.  This somewhat relates to this bug:

http://bugzilla.open-bio.org/show_bug.cgi?id=1825

which seems to need a simple but unconstrained hash-of-arrays-based  
meta object.

Then my head appropriately exploded...

Hope everything is going well at the hackathon!  Looks like some  
interesting stuff coming out of it.

chris


From cjfields at uiuc.edu  Fri Dec 15 12:49:45 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 15 Dec 2006 11:49:45 -0600
Subject: [Bioperl-l] Bio::SeqFeature::Annotated and mandatory
	type	checking
In-Reply-To: <1166158897.2569.335.camel@localhost.localdomain>
References: <637A2459-4115-466F-BD8D-036D5E9114F8@cshl.edu>
	<4581CCEB.20206@sendu.me.uk>
	<1166158897.2569.335.camel@localhost.localdomain>
Message-ID: <9B984087-C843-440A-B3E1-F7DEC65160E7@uiuc.edu>

On Dec 14, 2006, at 11:01 PM, Scott Cain wrote:

> As much as I would like to take credit for this :-)  Allen Day wrote
> the original code, and then Chris Fields tried to fix it so that it
> actually worked :-)  I think it would be a good idea to have a
> validate_terms option like Bio::FeatureIO::gff.
>
> Scott

I did ?!?  I committed a bug fix a while back:

Revision 1.34
Sun Jul 23 18:00:50 2006 UTC (4 months, 3 weeks ago) by cjfields
Branch: MAIN
CVS Tags: branch-experimental
Branch point for: branch-1-5-2
Changes since 1.33: +155 -33 lines
Diff to previous 1.33

Bug 2026; Robert's enhancements

To tell the truth I don't know if this is where the mandatory checks
were added; I'm not too familiar with SeqFeature::Annotated yet.

I agree with Scott (and Matthew) that SOFA checks should be  
optional.  Matthew, can you write up a patch and maybe some tests?

chris


From stewarta at nmrc.navy.mil  Thu Dec 14 18:30:11 2006
From: stewarta at nmrc.navy.mil (Andrew Stewart)
Date: Thu, 14 Dec 2006 18:30:11 -0500
Subject: [Bioperl-l] Bio::SearchIO::blast::next_result exception thrown
Message-ID: <968A2A44-82C5-4505-8F50-ABC4D57171F3@nmrc.navy.mil>

I'm getting the following exception...

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: no data for midline     Posted date:  Dec 14, 2006  2:52 PM
STACK: Error::throw
STACK: Bio::Root::Root::throw /sw/lib/perl5/5.8.6/Bio/Root/Root.pm:328
STACK: Bio::SearchIO::blast::next_result /sw/lib/perl5/5.8.6/Bio/SearchIO/blast.pm:1172
STACK: main::process_reports ./new_blast_script.pl:254
STACK: ./new_blast_script.pl:132
-----------------------------------------------------------


next_result is a pretty dense chunk of code to decipher.  I was  
wondering if anyone more familiar with that code might know what the  
"no data for midline $_" exception is referring to?

For context:

    1161                if( /^((Query|Sbjct):?\s+(\-?\d+)\s*)(\S+)\s+(\-?\d+)/ ) {
    1162                    my ($full,$type,$start,$str,$end) = ($1,$2,$3,$4,$5);
    1163                    if( $str eq '-' ) {
    1164                        $i = 3 if $type eq 'Sbjct';
    1165                    } else {
    1166                        $data{$type} = $str;
    1167                    }
    1168                    $len = length($full);
    1169                    $self->{"\_$type"}->{'begin'} = $start unless $self->{"_$type"}->{'begin'};
    1170                    $self->{"\_$type"}->{'end'} = $end;
    1171                } else {
    1172                    $self->throw("no data for midline $_")
    1173                        unless (defined $_ && defined $len);
    1174                    $data{'Mid'} = substr($_,$len);
    1175                }
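A simplified re-creation of the branch above shows when the exception fires. A Query/Sbjct line records the width of its "Query: 1   " prefix in $len, and any other line is treated as the midline and read from column $len onward. If a block does not start with a Query/Sbjct line (for example, the "Posted date:" footer turns up where an alignment block was expected), $len is still undefined and the parser throws. This is a hypothetical sketch, not the real Bio::SearchIO::blast code; parse_block and the sample lines are made up.

```perl
use strict;
use warnings;

# Hypothetical, simplified re-creation of the quoted branch: not the real
# Bio::SearchIO::blast code. $len is the width of the "Query: 1   " prefix,
# used to cut the midline out of the following line.
sub parse_block {
    my @lines = @_;
    my (%data, $len);
    for (@lines) {
        if (/^((Query|Sbjct):?\s+(\-?\d+)\s*)(\S+)\s+(\-?\d+)/) {
            my ($full, $type, $str) = ($1, $2, $4);
            $data{$type} = $str;
            $len = length $full;
        }
        else {
            # No Query/Sbjct line seen yet, so there is no prefix width.
            die "no data for midline $_\n" unless defined $len;
            $data{Mid} = substr($_, $len);
        }
    }
    return \%data;
}

my $block = parse_block(
    'Query: 1   ACGTACGT 8',
    (' ' x 11) . '|||| |||',
    'Sbjct: 11  ACGTTCGT 18',
);
print "$block->{Query}\n$block->{Mid}\n$block->{Sbjct}\n";

# A footer line where an alignment block was expected triggers the error.
eval { parse_block('  Posted date:  Dec 14, 2006  2:52 PM') };
print "error: $@" if $@;
```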


--
Andrew Stewart
Research Assistant, Genomics Team
Navy Medical Research Center (NMRC)
Biological Defense Research Directorate (BDRD)
BDRD Annex
12300 Washington Avenue, 2nd Floor
Rockville, MD 20852

email: stewarta at nmrc.navy.mil
phone: 301-231-6700 Ext 270


From jason at bioperl.org  Fri Dec 15 13:56:13 2006
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 15 Dec 2006 13:56:13 -0500
Subject: [Bioperl-l] Bio::SearchIO::blast::next_result exception thrown
In-Reply-To: <968A2A44-82C5-4505-8F50-ABC4D57171F3@nmrc.navy.mil>
References: <968A2A44-82C5-4505-8F50-ABC4D57171F3@nmrc.navy.mil>
Message-ID: <B07BB616-28A3-435A-9C43-38CEF0F01E53@bioperl.org>

It means the parser is expecting an alignment block of data and there
is none (or none in the context where it expects one), so something
in the report is tripping it up.

I'm not sure reading the code is going to help you - what someone
will have to do is figure out what is different about this report
from reports that do work with the parser.
You'll do better if you just provide an example report that is
failing as a bug report.

Providing the version of BLAST you are using and the version of
BioPerl will help.  I seem to remember NCBI changing the BLAST text
format, and that will break the parser if it is a significant change.

As has been mentioned in the past, this game of cat and mouse with
format changes means things will periodically break. If you need
something rock-solid that is always going to work, I guess XML is the
better route to go.

-jason
On Dec 14, 2006, at 6:30 PM, Andrew Stewart wrote:

> I'm getting the following exception...
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: no data for midline     Posted date:  Dec 14, 2006  2:52 PM
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /sw/lib/perl5/5.8.6/Bio/Root/Root.pm:328
> STACK: Bio::SearchIO::blast::next_result /sw/lib/perl5/5.8.6/Bio/SearchIO/blast.pm:1172
> STACK: main::process_reports ./new_blast_script.pl:254
> STACK: ./new_blast_script.pl:132
> -----------------------------------------------------------
>
>
> next_result is a pretty dense chunk of code to decipher.  I was
> wondering if anyone more familiar with that code might know what the
> "no data for midline $_" exception is referring to?
>


From cjfields at uiuc.edu  Fri Dec 15 14:21:32 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 15 Dec 2006 13:21:32 -0600
Subject: [Bioperl-l] Bio::SearchIO::blast::next_result exception thrown
In-Reply-To: <B07BB616-28A3-435A-9C43-38CEF0F01E53@bioperl.org>
References: <968A2A44-82C5-4505-8F50-ABC4D57171F3@nmrc.navy.mil>
	<B07BB616-28A3-435A-9C43-38CEF0F01E53@bioperl.org>
Message-ID: <6A0D17FA-CB98-4937-998E-11B87FB9CBBD@uiuc.edu>


On Dec 15, 2006, at 12:56 PM, Jason Stajich wrote:

> It means it is expecting alignment block of data and there is none
> (or there is none in the context it is expecting it) - so something
> is wrong with the report as it gets tripped up.
>
> I'm not sure reading the code is going to help you - what someone
> will have to do is figure out what is different about this report
> than reports that do work for the parser.
> You'll do better if you just provide an example report that is
> failing as a bug report.
>
> Providing the version of BLAST you are using and version of bioperl
> will help.  I seem to remember NCBI changing the BLAST text format so
> that will break the parser if it is a significant change.
>
> As has been mentioned in the past, this playing cat and mouse with
> format changes means things will periodically break. If you need rock-
> solid always going to work, I guess the XML is better route to go.
>
> -jason

I agree that XML is the only reliable way to go, though I have been
reading on the BioPython group about some issues with newer (2.2.13
or later) BLAST XML output for reports containing multiple BLAST
queries.  I don't know if this affects BioPerl or not.

As for the 'midline' error, there was a similar error a while back
(fixed for the 1.5.2 release) that had to do with extra lines in the
alignment section of some BLAST reports.  Unless we have a demo BLAST
report and sample code we can't do much about it (we need to
reproduce the error in order to fix it), so the best thing to do is to
file a bug report.

chris

> On Dec 14, 2006, at 6:30 PM, Andrew Stewart wrote:
>
>> I'm getting the following exception...
>>
>> ------------- EXCEPTION: Bio::Root::Exception -------------
>> MSG: no data for midline     Posted date:  Dec 14, 2006  2:52 PM
>> STACK: Error::throw
>> STACK: Bio::Root::Root::throw /sw/lib/perl5/5.8.6/Bio/Root/Root.pm:328
>> STACK: Bio::SearchIO::blast::next_result /sw/lib/perl5/5.8.6/Bio/SearchIO/blast.pm:1172
>> STACK: main::process_reports ./new_blast_script.pl:254
>> STACK: ./new_blast_script.pl:132
>> -----------------------------------------------------------
>>
>>
>> next_result is a pretty dense chunk of code to decipher.  I was
>> wondering if anyone more familiar with that code might know what the
>> "no data for midline $_" exception is referring to?
>>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From vaughn at cshl.edu  Fri Dec 15 13:05:47 2006
From: vaughn at cshl.edu (Matthew Vaughn)
Date: Fri, 15 Dec 2006 13:05:47 -0500
Subject: [Bioperl-l] Bio::SeqFeature::Annotated and mandatory type
	checking
In-Reply-To: <9B984087-C843-440A-B3E1-F7DEC65160E7@uiuc.edu>
References: <637A2459-4115-466F-BD8D-036D5E9114F8@cshl.edu>
	<4581CCEB.20206@sendu.me.uk>
	<1166158897.2569.335.camel@localhost.localdomain>
	<9B984087-C843-440A-B3E1-F7DEC65160E7@uiuc.edu>
Message-ID: <ed625e0e0612151005o2641f019ndb5cf0ac6582e2d6@mail.gmail.com>

Yes, I will. I am working on it today. It's a little more complicated
to fix this than I expected because SeqFeature::Annotated->type()
returns a Bio::AnnotationI rather than a simple scalar like it used
to.

On 12/15/06, Chris Fields <cjfields at uiuc.edu> wrote:
> On Dec 14, 2006, at 11:01 PM, Scott Cain wrote:
>
> > As much as I would like to take credit for this :-)  Allen Day
> > wrote the
> > original code, and then Chris Fields tried to fix it so that it
> > actually
> > worked :-)  I think it would be a good idea to have a validate_terms
> > option like Bio::FeatureIO::gff.
> >
> > Scott
>
> I did ?!?  I committed a bug fix a while back:
>
> Revision 1.34 / (view) - annotate - [select for diffs] ,
> Sun Jul 23 18:00:50 2006 UTC (4 months, 3 weeks ago) by cjfields
> Branch: MAIN
> CVS Tags: branch-experimental
> Branch point for: branch-1-5-2
> Changes since 1.33: +155 -33 lines
> Diff to previous 1.33
>
> Bug 2026; Robert's enhancements
>
> To tell the truth I don't know if this is where the mandatory checks
> were added in; I'm not too familiar with SeqFeature::Annotation yet.
>
> I agree with Scott (and Matthew) that SOFA checks should be
> optional.  Matthew, can you write up a patch and maybe some tests?
>
> chris
>
>
>
>


From valiente at lsi.upc.edu  Fri Dec 15 19:45:27 2006
From: valiente at lsi.upc.edu (Gabriel Valiente)
Date: Sat, 16 Dec 2006 01:45:27 +0100
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110
	species
In-Reply-To: <4577EFD3.7090904@sendu.me.uk>
References: <F5C5C9A8-D444-4994-9769-AC5DE68F4A39@lsi.upc.edu>
	<68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu>
	<4577E4A2.5090303@sendu.me.uk>
	<B290BEF7-81D6-4C0A-9EDA-348B8A75EEC8@lsi.upc.edu>
	<4577EAAF.7030509@sendu.me.uk>
	<0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu>
	<4577EFD3.7090904@sendu.me.uk>
Message-ID: <250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu>

> I don't think that can be true. Your error message contains 'Must
> supply a Bio::Taxon'. Bio::Taxon only exists in 1.5.2 (or cvs live).
>
> If you uninstall the fink installation and install 1.5.2 using cpan  
> (with root privileges by going sudo cpan) that should at least get  
> rid of the error messages...
>
>
>> The tree is not correct (I've parsed it from R to have a double
>> check) but don't know yet what the problem is with it.
>
> ... But if the tree is wrong anyway... Let me know what you find out.

I've uninstalled the fink installation and used the cvs instead, and
the error message is gone. However, on a larger set of 190 species,
all of which are present in the NCBI taxonomy, the resulting tree has
only 178 taxa. I suspect something is wrong with the merge_lineage
method in the major rewrite of the taxonomy2tree script. Can someone
please check this? I'm attaching the script call with the 190
species. Thanks,

Gabriel

-------------- next part --------------
A non-text attachment was scrubbed...
Name: fetch-bork.sh
Type: application/octet-stream
Size: 7378 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061216/5e392593/attachment-0002.obj>
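To narrow down which 12 species went missing, a plain set difference between the input names and the tree's leaf names may help. missing_taxa is a hypothetical helper, and how the leaf names are obtained is an assumption left to the reader (for example, loading the output tree with Bio::TreeIO and reading each node's id from get_leaf_nodes). The sample names are only placeholders.

```perl
use strict;
use warnings;

# Hypothetical helper: return the input names that are not among the
# tree's leaf names, preserving input order. Assumes names are spelled
# identically in both lists.
sub missing_taxa {
    my ($input, $leaves) = @_;
    my %in_tree = map { ($_ => 1) } @$leaves;
    return grep { !$in_tree{$_} } @$input;
}

my @missing = missing_taxa(
    ['Homo sapiens', 'Mus musculus', 'Pan troglodytes'],
    ['Homo sapiens', 'Pan troglodytes'],
);
print scalar(@missing), " missing: @missing\n";
```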

From lincoln.stein at gmail.com  Fri Dec 15 11:02:27 2006
From: lincoln.stein at gmail.com (Lincoln Stein)
Date: Fri, 15 Dec 2006 11:02:27 -0500
Subject: [Bioperl-l] [Gmod-gbrowse] xyplot data alignment problem?
In-Reply-To: <6dce9a0b0612141356u63afe2dak7e1d8dad93408312@mail.gmail.com>
References: <EA0BFA4F-8182-4C40-92DA-218CE3F48D8B@genomics.princeton.edu>
	<6dce9a0b0612141356u63afe2dak7e1d8dad93408312@mail.gmail.com>
Message-ID: <6dce9a0b0612150802x354a02a8ib17fbd882379c63c@mail.gmail.com>

This is very embarrassing for me, particularly since I spent a lot of time
validating that Bio::Graphics was working properly before the 1.5.2 release
went out. How long before there is a 1.5.3 release? How about a 1.5.2.1
release?

Lincoln

On 12/14/06, Lincoln Stein <lincoln.stein at gmail.com> wrote:
>
> Hi All,
>
> I'm afraid that the xyplot glyph that is in the recent bioperl release has
> an error that causes the content to be printed to the right of the correct
> position. Unfortunately this wasn't caught before the release because the
> glyph was only tested on very large (whole genome) features.
>
> You will need to do a CVS update to get a fixed version from bioperl-live.
> A future bugfix release of gbrowse will patch this glyph for you
> automatically.
>
> Lincoln
>
> On 12/12/06, Kara Dolinski <kara at genomics.princeton.edu> wrote:
> >
> > Hi,
> > I'm having a problem getting features and an xyplot properly aligned in
> > Gbrowse.  For example, see this page:
> >
> > http://tinyurl.com/ylbq3q
> >
> > The feature in the "CENPK SNPs" track should actually be around the peak
> > of the graph in the "CENPK prediction signal" xyplot  ie. the SNP
> > feature is at position 79, and the xyplot axes and data should span from
> > 61 - 95.  However, as you can see, the data in the xyplot are oddly
> > separated from the axes (which seem to be in the correct place), with the
> > data shifted over to about position 120-155.
> > This occurs elsewhere, not just at the ends of the chromosomes.
> >
> > When I zoom to ~80 bp, all is well, see:
> >
> > http://tinyurl.com/yzav8k
> >
> > The relevant snippets from the GFF and the config files are below.
> >
> > Thanks!
> > Kara
> >
> > GFF:
> >
> > chrI SNPScanner CENPK_GRAPH 61 95 41.9883 . . ID=CENPK_all_peaks;Name=CENPK_peak0;PEAK=peak0;Note=score is 41.9883
> > chrI SNPScanner CENPK_CALL 79 79 41.9883 . . ID=CENPK_all_peaks;Name=CENPK_peak0;PEAK=peak0;Note=score is 41.9883
> > chrI SNPScanner CENPK_SCORE 61 61 2.24506 . . ID=CENPK_all_peaks;Name=chrI61;PEAK=peak0;Note=score is 2.24506
> > chrI SNPScanner CENPK_SCORE 62 62 3.26837 . . ID=CENPK_all_peaks;Name=chrI62;PEAK=peak0;Note=score is 3.26837
> > chrI SNPScanner CENPK_SCORE 63 63 1.39938 . . ID=CENPK_all_peaks;Name=chrI63;PEAK=peak0;Note=score is 1.39938
> > chrI SNPScanner CENPK_SCORE 64 64 1.4039 . . ID=CENPK_all_peaks;Name=chrI64;PEAK=peak0;Note=score is 1.4039
> > chrI SNPScanner CENPK_SCORE 65 65 9.16134 . . ID=CENPK_all_peaks;Name=chrI65;PEAK=peak0;Note=score is 9.16134
> > chrI SNPScanner CENPK_SCORE 66 66 10.1413 . . ID=CENPK_all_peaks;Name=chrI66;PEAK=peak0;Note=score is 10.1413
> > chrI SNPScanner CENPK_SCORE 67 67 12.9256 . . ID=CENPK_all_peaks;Name=chrI67;PEAK=peak0;Note=score is 12.9256
> > chrI SNPScanner CENPK_SCORE 68 68 13.195 . . ID=CENPK_all_peaks;Name=chrI68;PEAK=peak0;Note=score is 13.195
> > chrI SNPScanner CENPK_SCORE 69 69 22.7127 . . ID=CENPK_all_peaks;Name=chrI69;PEAK=peak0;Note=score is 22.7127
> > chrI SNPScanner CENPK_SCORE 70 70 23.8289 . . ID=CENPK_all_peaks;Name=chrI70;PEAK=peak0;Note=score is 23.8289
> > chrI SNPScanner CENPK_SCORE 71 71 21.9123 . . ID=CENPK_all_peaks;Name=chrI71;PEAK=peak0;Note=score is 21.9123
> > chrI SNPScanner CENPK_SCORE 72 72 28.3344 . . ID=CENPK_all_peaks;Name=chrI72;PEAK=peak0;Note=score is 28.3344
> > chrI SNPScanner CENPK_SCORE 73 73 35.0436 . . ID=CENPK_all_peaks;Name=chrI73;PEAK=peak0;Note=score is 35.0436
> > chrI SNPScanner CENPK_SCORE 74 74 37.361 . . ID=CENPK_all_peaks;Name=chrI74;PEAK=peak0;Note=score is 37.361
> > chrI SNPScanner CENPK_SCORE 75 75 39.5408 . . ID=CENPK_all_peaks;Name=chrI75;PEAK=peak0;Note=score is 39.5408
> > chrI SNPScanner CENPK_SCORE 76 76 28.2008 . . ID=CENPK_all_peaks;Name=chrI76;PEAK=peak0;Note=score is 28.2008
> > chrI SNPScanner CENPK_SCORE 77 77 32.6254 . . ID=CENPK_all_peaks;Name=chrI77;PEAK=peak0;Note=score is 32.6254
> > chrI SNPScanner CENPK_SCORE 78 78 36.0832 . . ID=CENPK_all_peaks;Name=chrI78;PEAK=peak0;Note=score is 36.0832
> > chrI SNPScanner CENPK_SCORE 79 79 41.9883 . . ID=CENPK_all_peaks;Name=chrI79;PEAK=peak0;Note=score is 41.9883
> > chrI SNPScanner CENPK_SCORE 80 80 32.1205 . . ID=CENPK_all_peaks;Name=chrI80;PEAK=peak0;Note=score is 32.1205
> > chrI SNPScanner CENPK_SCORE 81 81 41.3048 . . ID=CENPK_all_peaks;Name=chrI81;PEAK=peak0;Note=score is 41.3048
> > chrI SNPScanner CENPK_SCORE 82 82 30.7975 . . ID=CENPK_all_peaks;Name=chrI82;PEAK=peak0;Note=score is 30.7975
> > chrI SNPScanner CENPK_SCORE 83 83 29.4282 . . ID=CENPK_all_peaks;Name=chrI83;PEAK=peak0;Note=score is 29.4282
> > chrI SNPScanner CENPK_SCORE 84 84 35.3586 . . ID=CENPK_all_peaks;Name=chrI84;PEAK=peak0;Note=score is 35.3586
> > chrI SNPScanner CENPK_SCORE 85 85 34.1426 . . ID=CENPK_all_peaks;Name=chrI85;PEAK=peak0;Note=score is 34.1426
> > chrI SNPScanner CENPK_SCORE 86 86 30.2966 . . ID=CENPK_all_peaks;Name=chrI86;PEAK=peak0;Note=score is 30.2966
> > chrI SNPScanner CENPK_SCORE 87 87 17.8402 . . ID=CENPK_all_peaks;Name=chrI87;PEAK=peak0;Note=score is 17.8402
> > chrI SNPScanner CENPK_SCORE 88 88 15.2637 . . ID=CENPK_all_peaks;Name=chrI88;PEAK=peak0;Note=score is 15.2637
> > chrI SNPScanner CENPK_SCORE 89 89 12.657 . . ID=CENPK_all_peaks;Name=chrI89;PEAK=peak0;Note=score is 12.657
> > chrI SNPScanner CENPK_SCORE 90 90 10.2033 . . ID=CENPK_all_peaks;Name=chrI90;PEAK=peak0;Note=score is 10.2033
> > chrI SNPScanner CENPK_SCORE 91 91 9.40143 . . ID=CENPK_all_peaks;Name=chrI91;PEAK=peak0;Note=score is 9.40143
> > chrI SNPScanner CENPK_SCORE 92 92 6.56273 . . ID=CENPK_all_peaks;Name=chrI92;PEAK=peak0;Note=score is 6.56273
> > chrI SNPScanner CENPK_SCORE 93 93 3.66211 . . ID=CENPK_all_peaks;Name=chrI93;PEAK=peak0;Note=score is 3.66211
> > chrI SNPScanner CENPK_SCORE 94 94 0.394194 . . ID=CENPK_all_peaks;Name=chrI94;PEAK=peak0;Note=score is 0.394194
> >
> > CONFIG:
> >
> >
> > GRAPH_CENPK{CENPK_SCORE/CENPK_GRAPH}
> >
> > [CENPK_all_scores_graph]
> > feature = GRAPH_CENPK:SNPScanner
> > glyph = xyplot
> > graph_type = boxes
> > fgcolor = purple
> > bgcolor = purple
> > height = 100
> > min_score = 0
> > max_score = 110
> > label = 0
> > key = CENPK prediction signal
> > link =
> > category = SNPs: signal graphs
> >
> >
> >
> > -------------------------------------------------------------------------
> > Take Surveys. Earn Cash. Influence the Future of IT
> > Join SourceForge.net's Techsay panel and you'll get the chance to share
> > your
> > opinions on IT & business topics through brief surveys - and earn cash
> > http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
> >
> >
> > _______________________________________________
> > Gmod-gbrowse mailing list
> > Gmod-gbrowse at lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse
> >
> >
> >
>
>
> --
> Lincoln D. Stein
> Cold Spring Harbor Laboratory
> 1 Bungtown Road
> Cold Spring Harbor, NY 11724
> (516) 367-8380 (voice)
> (516) 367-8389 (fax)
> FOR URGENT MESSAGES & SCHEDULING,
> PLEASE CONTACT MY ASSISTANT,
> SANDRA MICHELSEN, AT michelse at cshl.edu
>




From cjfields at uiuc.edu  Sat Dec 16 01:10:07 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 16 Dec 2006 00:10:07 -0600
Subject: [Bioperl-l] [Gmod-gbrowse] xyplot data alignment problem?
In-Reply-To: <6dce9a0b0612150802x354a02a8ib17fbd882379c63c@mail.gmail.com>
References: <EA0BFA4F-8182-4C40-92DA-218CE3F48D8B@genomics.princeton.edu>
	<6dce9a0b0612141356u63afe2dak7e1d8dad93408312@mail.gmail.com>
	<6dce9a0b0612150802x354a02a8ib17fbd882379c63c@mail.gmail.com>
Message-ID: <70A5E333-8CF5-49D3-84AC-7A6A02791B5C@uiuc.edu>

We could feasibly have regular point releases of the 1.5 dev. series  
for bug fixes; I guess it just depends on how often these should come  
out and what critical tests must pass for a release to go forward.   
Sendu's already done a ton of work towards getting BioPerl switched  
over to Module::Build and Test::More, and fixing bugs.  As Hilmar has  
pointed out in the past, this is a developer's series, so not every  
test needs to pass before a release goes out.

When would you like this to go out?

chris

On Dec 15, 2006, at 10:02 AM, Lincoln Stein wrote:

> This is very embarrassing for me, particularly since I spent a lot of time
> validating that Bio::Graphics was working properly before the 1.5.2 release
> went out. How long before there is a 1.5.3 release? How about a 1.5.2.1
> release?
>
> Lincoln
>
> On 12/14/06, Lincoln Stein <lincoln.stein at gmail.com> wrote:
>>
>> Hi All,
>>
>> I'm afraid that the xyplot glyph that is in the recent bioperl release has
>> an error that causes the content to be printed to the right of the correct
>> position. Unfortunately this wasn't caught before the release because the
>> glyph was only tested on very large (whole genome) features.
>>
>> You will need to do a CVS update to get a fixed version from bioperl-live.
>> A future bugfix release of gbrowse will patch this glyph for you
>> automatically.
>>
>> Lincoln
>>
>> On 12/12/06, Kara Dolinski <kara at genomics.princeton.edu> wrote:
>>>
>>> Hi,
>>> I'm having a problem getting features and an xyplot properly aligned in
>>> Gbrowse. For example, see this page:
>>>
>>> http://tinyurl.com/ylbq3q
>>>
>>> The feature in the "CENPK SNPs" track should actually be around the peak
>>> of the graph in the "CENPK prediction signal" xyplot, i.e. the SNP
>>> feature is at position 79, and the xyplot axes and data should span
>>> from 61 to 95. However, as you can see, the data in the xyplot are oddly
>>> separated from the axes (which seem to be in the correct place), with
>>> the data shifted over to about position 120-155.
>>> This occurs elsewhere, not just at the ends of the chromosomes.
>>>
>>> When I zoom to ~80 bp, all is well, see:
>>>
>>> http://tinyurl.com/yzav8k
>>>
>>> The relevant snippets from the GFF and the config files are below.
>>>
>>> Thanks!
>>> Kara
>>>
>>> GFF:
>>>
>>> chrI SNPScanner CENPK_GRAPH 61 95 41.9883 . . ID=CENPK_all_peaks;Name=CENPK_peak0;PEAK=peak0;Note=score is 41.9883
>>> chrI SNPScanner CENPK_CALL 79 79 41.9883 . . ID=CENPK_all_peaks;Name=CENPK_peak0;PEAK=peak0;Note=score is 41.9883
>>> chrI SNPScanner CENPK_SCORE 61 61 2.24506 . . ID=CENPK_all_peaks;Name=chrI61;PEAK=peak0;Note=score is 2.24506
>>> chrI SNPScanner CENPK_SCORE 62 62 3.26837 . . ID=CENPK_all_peaks;Name=chrI62;PEAK=peak0;Note=score is 3.26837
>>> chrI SNPScanner CENPK_SCORE 63 63 1.39938 . . ID=CENPK_all_peaks;Name=chrI63;PEAK=peak0;Note=score is 1.39938
>>> chrI SNPScanner CENPK_SCORE 64 64 1.4039 . . ID=CENPK_all_peaks;Name=chrI64;PEAK=peak0;Note=score is 1.4039
>>> chrI SNPScanner CENPK_SCORE 65 65 9.16134 . . ID=CENPK_all_peaks;Name=chrI65;PEAK=peak0;Note=score is 9.16134
>>> chrI SNPScanner CENPK_SCORE 66 66 10.1413 . . ID=CENPK_all_peaks;Name=chrI66;PEAK=peak0;Note=score is 10.1413
>>> chrI SNPScanner CENPK_SCORE 67 67 12.9256 . . ID=CENPK_all_peaks;Name=chrI67;PEAK=peak0;Note=score is 12.9256
>>> chrI SNPScanner CENPK_SCORE 68 68 13.195 . . ID=CENPK_all_peaks;Name=chrI68;PEAK=peak0;Note=score is 13.195
>>> chrI SNPScanner CENPK_SCORE 69 69 22.7127 . . ID=CENPK_all_peaks;Name=chrI69;PEAK=peak0;Note=score is 22.7127
>>> chrI SNPScanner CENPK_SCORE 70 70 23.8289 . . ID=CENPK_all_peaks;Name=chrI70;PEAK=peak0;Note=score is 23.8289
>>> chrI SNPScanner CENPK_SCORE 71 71 21.9123 . . ID=CENPK_all_peaks;Name=chrI71;PEAK=peak0;Note=score is 21.9123
>>> chrI SNPScanner CENPK_SCORE 72 72 28.3344 . . ID=CENPK_all_peaks;Name=chrI72;PEAK=peak0;Note=score is 28.3344
>>> chrI SNPScanner CENPK_SCORE 73 73 35.0436 . . ID=CENPK_all_peaks;Name=chrI73;PEAK=peak0;Note=score is 35.0436
>>> chrI SNPScanner CENPK_SCORE 74 74 37.361 . . ID=CENPK_all_peaks;Name=chrI74;PEAK=peak0;Note=score is 37.361
>>> chrI SNPScanner CENPK_SCORE 75 75 39.5408 . . ID=CENPK_all_peaks;Name=chrI75;PEAK=peak0;Note=score is 39.5408
>>> chrI SNPScanner CENPK_SCORE 76 76 28.2008 . . ID=CENPK_all_peaks;Name=chrI76;PEAK=peak0;Note=score is 28.2008
>>> chrI SNPScanner CENPK_SCORE 77 77 32.6254 . . ID=CENPK_all_peaks;Name=chrI77;PEAK=peak0;Note=score is 32.6254
>>> chrI SNPScanner CENPK_SCORE 78 78 36.0832 . . ID=CENPK_all_peaks;Name=chrI78;PEAK=peak0;Note=score is 36.0832
>>> chrI SNPScanner CENPK_SCORE 79 79 41.9883 . . ID=CENPK_all_peaks;Name=chrI79;PEAK=peak0;Note=score is 41.9883
>>> chrI SNPScanner CENPK_SCORE 80 80 32.1205 . . ID=CENPK_all_peaks;Name=chrI80;PEAK=peak0;Note=score is 32.1205
>>> chrI SNPScanner CENPK_SCORE 81 81 41.3048 . . ID=CENPK_all_peaks;Name=chrI81;PEAK=peak0;Note=score is 41.3048
>>> chrI SNPScanner CENPK_SCORE 82 82 30.7975 . . ID=CENPK_all_peaks;Name=chrI82;PEAK=peak0;Note=score is 30.7975
>>> chrI SNPScanner CENPK_SCORE 83 83 29.4282 . . ID=CENPK_all_peaks;Name=chrI83;PEAK=peak0;Note=score is 29.4282
>>> chrI SNPScanner CENPK_SCORE 84 84 35.3586 . . ID=CENPK_all_peaks;Name=chrI84;PEAK=peak0;Note=score is 35.3586
>>> chrI SNPScanner CENPK_SCORE 85 85 34.1426 . . ID=CENPK_all_peaks;Name=chrI85;PEAK=peak0;Note=score is 34.1426
>>> chrI SNPScanner CENPK_SCORE 86 86 30.2966 . . ID=CENPK_all_peaks;Name=chrI86;PEAK=peak0;Note=score is 30.2966
>>> chrI SNPScanner CENPK_SCORE 87 87 17.8402 . . ID=CENPK_all_peaks;Name=chrI87;PEAK=peak0;Note=score is 17.8402
>>> chrI SNPScanner CENPK_SCORE 88 88 15.2637 . . ID=CENPK_all_peaks;Name=chrI88;PEAK=peak0;Note=score is 15.2637
>>> chrI SNPScanner CENPK_SCORE 89 89 12.657 . . ID=CENPK_all_peaks;Name=chrI89;PEAK=peak0;Note=score is 12.657
>>> chrI SNPScanner CENPK_SCORE 90 90 10.2033 . . ID=CENPK_all_peaks;Name=chrI90;PEAK=peak0;Note=score is 10.2033
>>> chrI SNPScanner CENPK_SCORE 91 91 9.40143 . . ID=CENPK_all_peaks;Name=chrI91;PEAK=peak0;Note=score is 9.40143
>>> chrI SNPScanner CENPK_SCORE 92 92 6.56273 . . ID=CENPK_all_peaks;Name=chrI92;PEAK=peak0;Note=score is 6.56273
>>> chrI SNPScanner CENPK_SCORE 93 93 3.66211 . . ID=CENPK_all_peaks;Name=chrI93;PEAK=peak0;Note=score is 3.66211
>>> chrI SNPScanner CENPK_SCORE 94 94 0.394194 . . ID=CENPK_all_peaks;Name=chrI94;PEAK=peak0;Note=score is 0.394194
>>>
>>> CONFIG:
>>>
>>>
>>> GRAPH_CENPK{CENPK_SCORE/CENPK_GRAPH}
>>>
>>> [CENPK_all_scores_graph]
>>> feature = GRAPH_CENPK:SNPScanner
>>> glyph = xyplot
>>> graph_type = boxes
>>> fgcolor = purple
>>> bgcolor = purple
>>> height = 100
>>> min_score = 0
>>> max_score = 110
>>> label = 0
>>> key = CENPK prediction signal
>>> link =
>>> category = SNPs: signal graphs
>>>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Sat Dec 16 01:28:47 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 16 Dec 2006 00:28:47 -0600
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110
	species
In-Reply-To: <250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu>
References: <F5C5C9A8-D444-4994-9769-AC5DE68F4A39@lsi.upc.edu>
	<68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu>
	<4577E4A2.5090303@sendu.me.uk>
	<B290BEF7-81D6-4C0A-9EDA-348B8A75EEC8@lsi.upc.edu>
	<4577EAAF.7030509@sendu.me.uk>
	<0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu>
	<4577EFD3.7090904@sendu.me.uk>
	<250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu>
Message-ID: <C91DCC7B-E368-475D-B83A-AC301A49624B@uiuc.edu>


On Dec 15, 2006, at 6:45 PM, Gabriel Valiente wrote:

>> I don't think that can be true. Your error message contains 'Must supply
>> a Bio::Taxon'. Bio::Taxon only exists in 1.5.2 (or cvs live).
>>
>> If you uninstall the fink installation and install 1.5.2 using cpan
>> (with root privileges by going sudo cpan) that should at least get rid
>> of the error messages...
>>
>>
>>> The tree is not correct (I've parsed it from R to have a double
>>> check) but don't know yet what the problem is with it.
>>
>> ... But if the tree is wrong anyway... Let me know what you find out.
>
> I've uninstalled the fink installation and used the cvs instead, and the
> error message is gone. However, on a larger set of 190 species, which
> are all present in the NCBI taxonomy, the resulting tree has only 178
> taxa. I suspect something must be wrong with the merge_lineage method
> in the major rewrite of the taxonomy2tree script. Can someone please
> check this? I'm attaching the 190 species call to the script. Thanks,
>
> Gabriel

I can confirm that. It is definitely dropping them in merge_lineage();
if you add a call to get_leaf_nodes to check how many are present
after each merge_lineage() call, you can see it dropping nodes along
the trace.

in taxonomy2tree.pl:

my $ct = 0;
for my $name (@species) {
   my $ncbi_id = $db->get_taxonid($name);
   if ($ncbi_id) {
     #print "Species: $name\n\tTaxID: $ncbi_id\n";
     #$ids{$ncbi_id}++;
     my $node = $db->get_taxon(-taxonid => $ncbi_id);

     if ($tree) {
       $tree->merge_lineage($node);
     }
     else {
       $tree = Bio::Tree::Tree->new(-node => $node);
     }
     # leaf count after each merge; with a correct merge it never goes down
     my @leaves = $tree->get_leaf_nodes;
     printf("%-3d: Nodes: %-4d\n", $ct, scalar(@leaves));
   }
   else {
     warn "no NCBI Taxonomy node for species ", $name, "\n";
   }
   $ct++;
}

chris
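For reference, the invariant that leaf-count printout is checking: if you merge N lineages and no input species is an ancestor of another input species, the merged tree should have exactly N leaves, so 190 species in should mean 190 taxa out. The toy model below is a hypothetical pure-Perl sketch of that invariant only; merge_lineages is not a BioPerl function, it is not how Bio::Tree::TreeFunctionsI::merge_lineage works, and it keys nodes by name purely for brevity (real code should key on NCBI taxon IDs, since names can repeat).

```perl
use strict;
use warnings;

# Hypothetical toy model (NOT Bio::Tree::TreeFunctionsI::merge_lineage).
# Each lineage is an array ref of taxon names from root down to a species.
# Merging records parent->child edges, so shared ancestors are stored once.
sub merge_lineages {
    my @lineages = @_;
    my %children;    # parent name => { child name => 1 }
    my %seen;        # every node name present in the merged tree
    for my $lineage (@lineages) {
        $seen{$_} = 1 for @$lineage;
        for my $i (1 .. $#$lineage) {
            $children{ $lineage->[$i - 1] }{ $lineage->[$i] } = 1;
        }
    }
    # Leaves are the nodes that never appear as a parent.
    return grep { !exists $children{$_} } sort keys %seen;
}

my @leaves = merge_lineages(
    [qw(root Eukaryota Metazoa Homo_sapiens)],
    [qw(root Eukaryota Metazoa Mus_musculus)],
    [qw(root Eukaryota Fungi Saccharomyces_cerevisiae)],
);
# prints "3 leaves: Homo_sapiens, Mus_musculus, Saccharomyces_cerevisiae"
printf "%d leaves: %s\n", scalar(@leaves), join(', ', @leaves);
```

Three distinct species in, three leaves out; a merge that reports fewer leaves than distinct input species (outside the ancestor case) has lost something.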


From bix at sendu.me.uk  Sat Dec 16 09:37:49 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Sat, 16 Dec 2006 14:37:49 +0000
Subject: [Bioperl-l] [Gmod-gbrowse] xyplot data alignment problem?
In-Reply-To: <6dce9a0b0612150802x354a02a8ib17fbd882379c63c@mail.gmail.com>
References: <EA0BFA4F-8182-4C40-92DA-218CE3F48D8B@genomics.princeton.edu>	<6dce9a0b0612141356u63afe2dak7e1d8dad93408312@mail.gmail.com>
	<6dce9a0b0612150802x354a02a8ib17fbd882379c63c@mail.gmail.com>
Message-ID: <458404BD.8030908@sendu.me.uk>

Lincoln Stein wrote:
> This is very embarrassing for me, particularly since I spent a lot of time
> validating that Bio::Graphics was working properly before the 1.5.2 release
> went out. How long before there is a 1.5.3 release? How about a 1.5.2.1 release?

I'm happy to try a point release for critical bug fixes. Why don't you 
commit the necessary fixes to branch-1-5-2 and let me know when you're 
happy, and I'll do 1.5.2.1.


Cheers,
Sendu.


From bix at sendu.me.uk  Sat Dec 16 09:47:57 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Sat, 16 Dec 2006 14:47:57 +0000
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on
	110	species
In-Reply-To: <250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu>
References: <F5C5C9A8-D444-4994-9769-AC5DE68F4A39@lsi.upc.edu>	<68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu>	<4577E4A2.5090303@sendu.me.uk>	<B290BEF7-81D6-4C0A-9EDA-348B8A75EEC8@lsi.upc.edu>	<4577EAAF.7030509@sendu.me.uk>	<0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu>	<4577EFD3.7090904@sendu.me.uk>
	<250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu>
Message-ID: <4584071D.3070005@sendu.me.uk>

Gabriel Valiente wrote:
>> I don't think that can be true. Your error message contains 'Must supply
>> a Bio::Taxon'. Bio::Taxon only exists in 1.5.2 (or cvs live).
>>
>> If you uninstall the fink installation and install 1.5.2 using cpan 
>> (with root privileges by going sudo cpan) that should at least get rid 
>> of the error messages...
>>
>>
>>> The tree is not correct (I've parsed it from R to have a double
>>> check) but don't know yet what the problem is with it.
>>
>> ... But if the tree is wrong anyway... Let me know what you find out.
> 
> I've uninstalled the fink installation and used the cvs instead, and the 
> error message is gone. However, on a larger set of 190 species, which 
> are all present in the NCBI taxonomy, the resulting tree has only 178 
> taxa. I suspect something must be wrong with the merge_lineage method 
> in the major rewrite of the taxonomy2tree script. Can someone please 
> check this? I'm attaching the 190 species call to the script. Thanks,

Ok, I'll look into it. You're also welcome to take the code from your 
original taxonomy2tree script and see whether you can merge it into, or 
use it to replace, the appropriate Bio::Tree::TreeFunctionsI methods so 
that it works correctly. Indeed, does your original version of the 
script work on this data set?


Cheers,
Sendu.


From cjfields at uiuc.edu  Sat Dec 16 10:18:50 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 16 Dec 2006 09:18:50 -0600
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on
	110	species
In-Reply-To: <4584071D.3070005@sendu.me.uk>
References: <F5C5C9A8-D444-4994-9769-AC5DE68F4A39@lsi.upc.edu>	<68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu>	<4577E4A2.5090303@sendu.me.uk>	<B290BEF7-81D6-4C0A-9EDA-348B8A75EEC8@lsi.upc.edu>	<4577EAAF.7030509@sendu.me.uk>	<0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu>	<4577EFD3.7090904@sendu.me.uk>
	<250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu>
	<4584071D.3070005@sendu.me.uk>
Message-ID: <6AE33842-B2E7-4E9B-B80D-68A058045818@uiuc.edu>


On Dec 16, 2006, at 8:47 AM, Sendu Bala wrote:

> Gabriel Valiente wrote:
>>> I don't think that can be true. Your error message contains 'Must supply
>>> a Bio::Taxon'. Bio::Taxon only exists in 1.5.2 (or cvs live).
>>>
>>> If you uninstall the fink installation and install 1.5.2 using cpan
>>> (with root privileges by going sudo cpan) that should at least get rid
>>> of the error messages...
>>>
>>>
>>>> The tree is not correct (I've parsed it from R to have a double
>>>> check) but don't know yet what the problem is with it.
>>>
>>> ... But if the tree is wrong anyway... Let me know what you find out.
>>
>> I've uninstalled the fink installation and used the cvs instead, and the
>> error message is gone. However, on a larger set of 190 species, which
>> are all present in the NCBI taxonomy, the resulting tree has only 178
>> taxa. I suspect something must be wrong with the merge_lineage method
>> in the major rewrite of the taxonomy2tree script. Can someone please
>> check this? I'm attaching the 190 species call to the script. Thanks,
>
> Ok, I'll look into it. You're also welcome to see if you can take your
> own code from your original taxonomy2tree script and see if you can
> merge/replace the appropriate Bio::Tree::TreeFunctionsI methods with
> your algorithms to get it working correctly. Indeed, does your original
> version of the script work on this data set?
>
>
> Cheers,
> Sendu.

Sendu,

Don't know if it helps, but when I tried Gabriel's shell script last
night I ran a modification of taxonomy2tree to see what would pop
up. Everything is fine up to about 100 iterations, then merge_lineage()
starts dropping leaf nodes.

chris 
  

From bix at sendu.me.uk  Sat Dec 16 10:33:35 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Sat, 16 Dec 2006 15:33:35 +0000
Subject: [Bioperl-l] NO BLAST
In-Reply-To: <58ff33550612150839i40409b06pe427bcd77d3f208@mail.gmail.com>
References: <58ff33550612150839i40409b06pe427bcd77d3f208@mail.gmail.com>
Message-ID: <458411CF.8000707@sendu.me.uk>

Luba Pardo wrote:
> Hello,
> I am having trouble using the module Bio::Tools::Run::StandAloneBlast.
>
> I got the following error message: cannot find path to blastall.
> The code I used is (modified from HOWTObeginners):

Bioperl doesn't know where you installed blast. If you've actually 
installed it, you can set the environment variable BLASTDIR to point to 
the directory that contains the blastall executable.
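If you want to check what your script can actually see before constructing the StandAloneBlast factory, a quick pure-Perl diagnostic can help. The find_blastall helper below is hypothetical (it is not part of BioPerl) and does not reproduce StandAloneBlast's own lookup; it simply reports whether a blastall executable exists in $ENV{BLASTDIR} or, failing that, on $PATH. You would normally set the variable in your shell before running the script, e.g. export BLASTDIR=/path/to/blast/bin (the path is a placeholder).

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical diagnostic helper, not a BioPerl function: look for a
# blastall executable in $ENV{BLASTDIR} first, then in each $PATH entry.
sub find_blastall {
    my @dirs;
    push @dirs, $ENV{BLASTDIR}
        if defined $ENV{BLASTDIR} && length $ENV{BLASTDIR};
    push @dirs, File::Spec->path;    # directories listed in $PATH
    for my $dir (@dirs) {
        for my $name ('blastall', 'blastall.exe') {
            my $candidate = File::Spec->catfile($dir, $name);
            # reuse the stat buffer from -f for the executable test
            return $candidate if -f $candidate && -x _;
        }
    }
    return;    # not found anywhere
}

my $found = find_blastall();
print defined $found
    ? "blastall found at $found\n"
    : "blastall not found; set BLASTDIR to its directory\n";
```

If this reports nothing, the "cannot find path to blastall" error is expected and the BLASTDIR setting (or the installation itself) is the thing to fix.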


From cain.cshl at gmail.com  Fri Dec 15 13:09:48 2006
From: cain.cshl at gmail.com (Scott Cain)
Date: Fri, 15 Dec 2006 13:09:48 -0500
Subject: [Bioperl-l] Bio::SeqFeature::Annotated and
	mandatory	type	checking
In-Reply-To: <9B984087-C843-440A-B3E1-F7DEC65160E7@uiuc.edu>
References: <637A2459-4115-466F-BD8D-036D5E9114F8@cshl.edu>
	<4581CCEB.20206@sendu.me.uk>
	<1166158897.2569.335.camel@localhost.localdomain>
	<9B984087-C843-440A-B3E1-F7DEC65160E7@uiuc.edu>
Message-ID: <1166206188.2569.380.camel@localhost.localdomain>

On Fri, 2006-12-15 at 11:49 -0600, Chris Fields wrote:
> 
> To tell the truth I don't know if this is where the mandatory checks  
> were added in; I'm not too familiar with SeqFeature::Annotation yet.
> 
> I agree with Scott (and Matthew) that SOFA checks should be  
> optional.  Matthew, can you write up a patch and maybe some tests?
> 
> chris
> 
That's not where they were added in; it's just that they hadn't been fully
implemented before then, so they didn't work (which probably meant they
weren't mandatory, though I don't remember; it could be that it just
croaked).

Scott

-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory


From hlapp at gmx.net  Sun Dec 17 01:02:04 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sun, 17 Dec 2006 01:02:04 -0500
Subject: [Bioperl-l] [Gmod-gbrowse] xyplot data alignment problem?
In-Reply-To: <458404BD.8030908@sendu.me.uk>
References: <EA0BFA4F-8182-4C40-92DA-218CE3F48D8B@genomics.princeton.edu>	<6dce9a0b0612141356u63afe2dak7e1d8dad93408312@mail.gmail.com>
	<6dce9a0b0612150802x354a02a8ib17fbd882379c63c@mail.gmail.com>
	<458404BD.8030908@sendu.me.uk>
Message-ID: <733825EE-0426-4D12-A02F-B8825CDEBBA9@gmx.net>


On Dec 16, 2006, at 9:37 AM, Sendu Bala wrote:

> Lincoln Stein wrote:
>> This is very embarrassing for me, particularly since I spent a lot of time
>> validating that Bio::Graphics was working properly before the 1.5.2 release
>> went out. How long before there is a 1.5.3 release? How about a 1.5.2.1
>> release?
>
> I'm happy to try a point release for critical bug fixes. Why don't you
> commit the necessary fixes to branch-1-5-2 and let me know when you're
> happy, and I'll do 1.5.2.1.

Feel free to do that, but why not make a 1.5.3 off the main trunk?  
1.5.2.1 may be adding more to the version confusion (developer/stable/ 
point-release/etc) than it is worth, and there is no shame in  
releasing new developer versions every few weeks.

My $0.02 ...

	-hilmar


>
> Cheers,
> Sendu.

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From fgarret at ub.edu  Mon Dec 18 07:07:02 2006
From: fgarret at ub.edu (Filipe Garrett)
Date: Mon, 18 Dec 2006 13:07:02 +0100
Subject: [Bioperl-l] codeml
Message-ID: <45868466.508@ub.edu>

Hi all,

I've been using BioPerl's PAML module (specifically the codeml part), but 
with just one tree.

Since the program accepts several trees as input (and runs the analysis 
for each tree, outputting the difference in likelihoods for each one), I 
was wondering whether there is some way to do this through BioPerl?

thanks in adv,
FG


From heikki at sanbi.ac.za  Mon Dec 18 08:51:50 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Mon, 18 Dec 2006 15:51:50 +0200
Subject: [Bioperl-l] Proposal for Meta data
In-Reply-To: <F302B7DD-C806-4A6F-ACDF-9F27A84E0BF0@uiuc.edu>
References: <B3EF69DB-9C01-4F42-A4E4-898613D872F9@uiuc.edu>
	<32BE3FCF-C788-438F-8A4A-8A586DD6C569@bioperl.org>
	<F302B7DD-C806-4A6F-ACDF-9F27A84E0BF0@uiuc.edu>
Message-ID: <200612181551.51277.heikki@sanbi.ac.za>


Reading the discussion, I think it is time to draw some guidelines.

1. Base the Meta implementation on real use cases.

   MSA is a good example.

2. Allow generalisations

   If you can see another implementation of the same idea that can be merged 
   with the first, do it, but don't worry if you cannot.


The most difficult question is how to separate case-specific attributes that 
are best implemented by subclassing with additional methods from truly widely 
variable meta data that is best done as a parallel track meta information 
holding class.

The problem I see with undefined, totally open meta annotation is that if you 
can put anything in there, it is also totally confusing to a user. If you can 
put anything in, how do you know what to get out and that it is 
there?

That leads to the third guideline:

3. Use separate meta classes only when there are several different ways of 
encoding data that is present in large numbers *and* when you expect 
to assess the data computationally rather than just check whether an 
attribute is there. 


	-Heikki


On Friday 15 December 2006 19:23, Chris Fields wrote:
> On Dec 15, 2006, at 8:28 AM, Jason Stajich wrote:
> > On Dec 14, 2006, at 9:21 PM, Chris Fields wrote:
> >> On Dec 14, 2006, at 7:45 PM, David Messina wrote:
> >>> Hey Chris,
> >>>
> >>> My thoughts below.
> >>>
> >>>> [Chris]
> >>>> This could be used to annotate any
> >>>> PrimarySeq, LocatableSeq, SimpleAlign, SeqFeature, or what-have-
> >>>> you,
> >>>> maybe in a collection (similar to AnnotationCollection).  I thought
> >>>> something like this may be of general use for any PrimarySeq
> >>>> (quality, structure), alignments like NEXUS and Stockholm,
> >>>> SeqFeatures where structure could be stored (tRNA or riboswitches),
> >>>> etc.
> >>>>
> >>>> However, this also seems to fall into the category of sequence
> >>>> annotation.  So, would it be better to have a set of
> >>>> Bio::Annotation
> >>>> classes used for this purpose?
> >>>
> >>> To me, all meta data is equal. That is, your classic Genbank feature
> >>> annotation and a user's arbitrary meta-tag like "Bob thinks this
> >>> is a
> >>> kinase domain" aren't different in kind even if they are
> >>> different in
> >>> content.
> >>>
> >>> As resequencing projects multiply, the ability to create arbitrary
> >>> meta tags, attach them to different types of objects, and use those
> >>> tags to link them together will become desirable, if not essential.
> >>>
> >>> Keeping a common interface to all of these meta data types would be
> >>> advantageous, plus new users won't have to determine whether they
> >>> need to use Bio::Meta objects or Bio::Annotation objects.
> >>>
> >>> So I would argue for all of the meta data types to live "under one
> >>> roof". Which roof isn't as important. Bio::Annotation, since it
> >>> already exists for today's meta data, seems like a reasonable
> >>> choice.
> >>> (assuming Annotation objects are flexible enough to be extended as
> >>> you propose)
> >>>
> >>> There, and no flames or jibes even. :)
> >>
> >> I guess what I want to know is whether there should to be a
> >> distinction between 'normal' sequence annotation (comments,
> >> references, and so on) and annotation that could be best described as
> >> position-specific (like RNA or protein structural annotation).  The
> >> current meta implementation is for sequence data only; I felt it
> >> would be nice to have a generic implementation that would be
> >> applicable to any object data.
> >
> > my stream-of-consciousness for right now:
> >
> > I was thinking Bio::Annotation is where this should go - that
> > system doesn't have anything about it that makes it explicitly
> > sequence related. What we're trying to hammer out here on the
> > Alignment side - which fits with your RNA example - is have
> > features, basically SeqFeatures - associated with alignments so
> > columns can be annotated to cover things like character sets and
> > partitions for phylogenetic analyses.  As for data which annotates
> > non-contiguous things like RNAstems we may have  to be more
> > creative about that or model it with a splitLocation.
> >
> > So currently we've added code so that an Alignment is-a
> > Bio::AnnotableI and is-a Bio::FeatureHolderI to move towards this
> > end, with the goal of being able to capture more of the data that
> > can be represented in a NEXUS file.
> >
> > It feels more like a hack than an elegant Meta-data solution, but I
> > am totally sure whether the data you are thinking about doing at
> > this point, perhaps I need to spend more time thinking about it.
> > Or are you worried about the idea of whether the semantic mapping
> > of the data into features or annotations is confusing users?
>
> Sorry in advance for the longish response here...
>
> My original thought was to have a generic abstract class capable of
> positionally describing data in any other class, similar to
> Heikki's Bio::Seq::MetaI but not constrained to sequence data only.
> Implementing classes would be capable of having different data
> structures based on their use (simple string, array, AoA, AoH, AoO).
> One MetaCollection class to contain them all in a tag-like system, so
> you could have mixed data types describe the same object.  The latter
> Collection class is so similar to AnnotationCollection that I agree
> Bio::Annotation would be the best place for this.
>
> The way I reconfigured Stockholm alignment parsing/writing is to use
> Bio::Seq::Meta objects (which are LocatableSeq).  Each Seq::Meta is
> capable of holding a sequence and several meta strings, stored as
> tags or 'names'.  However, there is no Meta object for alignments
> (for RNA/protein structure consensus and other Rfam/Pfam markup); I
> hacked around this by using a Bio::Seq::Meta w/o a seq, but I would
> rather have a generic Meta object independent of the sequence cruft.
>
> So for this partial Pfam alignment,
>
> Q92SV1_RHIME/122-299         LAMALNLARGI...VDADVDF..REG
> #=GR Q92SV1_RHIME/122-299 pAS .........................
> Q883D2_PSESM/110-290         LGLMLGLRRRL...FDGNGAV..KRS
> Q8ZXP5_PYRAE/91-262          LALLLAPYKRI...IQYGEKM..KRG
> #=GR Q8ZXP5_PYRAE/91-262 SS  HHHHHHHHTTH...HHHHHHX..HTT
> #=GR Q8ZXP5_PYRAE/91-262 SA  00000000000...120030X..474
> #=GC SS_cons                 HHHHHHHHTTH...HHHHHHH..HTT
> #=GC SA_cons                 03002200312...1312414..676
> #=GC seq_cons                luhhLuhsRpl...hthppth..+pG
> //
>
> '#=GC' lines would be in generic meta string objects in the
> alignment, while '#=GR' tags would be in similar meta objects in the
> relevant sequences.  As long as both aren't AnnotatableI this isn't
> an issue.
>
> Similarly, NEXUS files which contained any position-based values
> could hold a meta string/array object in a similar tag.
>
> The basic scheme is:
>                     |--String
> Annotation::Meta----|--Array
>                     |--HorriblyComplexDataStruct
>
> Then I started thinking about where this could be applied, and
> whether a true Meta object needs to be constrained only to describing
> position-based data.  This somewhat relates to this bug:
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=1825
>
> which seems to need a simple but unconstrained hash-of-arrays-based
> meta object.
>
> Then my head appropriately exploded...
>
> Hope everything is going well at the hackathon!  Looks like some
> interesting stuff coming out of it.
>
> chris

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From fgarret at ub.edu  Mon Dec 18 11:18:31 2006
From: fgarret at ub.edu (Filipe Garrett)
Date: Mon, 18 Dec 2006 17:18:31 +0100
Subject: [Bioperl-l] PAML files
Message-ID: <4586BF57.4090002@ub.edu>

Hi all,

does anyone know how to get the name of the .ctl file created by the
PAML module? Inside the tmp directory there are 2 files with random
names (tree and ctl). Why do they have random names? Wouldn't it be
easier to assign them fixed names, for instance "codeml.ctl" and
"tree.nwk"?

thanks in adv,
FG


From bix at sendu.me.uk  Mon Dec 18 11:15:21 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 18 Dec 2006 16:15:21 +0000
Subject: [Bioperl-l] [Gmod-gbrowse] xyplot data alignment problem?
In-Reply-To: <733825EE-0426-4D12-A02F-B8825CDEBBA9@gmx.net>
References: <EA0BFA4F-8182-4C40-92DA-218CE3F48D8B@genomics.princeton.edu>	<6dce9a0b0612141356u63afe2dak7e1d8dad93408312@mail.gmail.com>
	<6dce9a0b0612150802x354a02a8ib17fbd882379c63c@mail.gmail.com>
	<458404BD.8030908@sendu.me.uk>
	<733825EE-0426-4D12-A02F-B8825CDEBBA9@gmx.net>
Message-ID: <4586BE99.7020308@sendu.me.uk>

Hilmar Lapp wrote:
> 
> On Dec 16, 2006, at 9:37 AM, Sendu Bala wrote:
> 
>> Lincoln Stein wrote:
>>> This is very embarrassing for me, particularly since I spent a lot
>>> of time validating that Bio::Graphics was working properly before
>>> the 1.5.2 release went out. How long before there is a 1.5.3
>>> release? How about a 1.5.2.1 release?
>> 
>> I'm happy to try a point release for critical bug fixes. Why don't
>> you commit the necessary fixes to branch-1-5-2 and let me know when
>> you're happy, and I'll do 1.5.2.1.
> 
> Feel free to do that, but why not make a 1.5.3 off the main trunk? 
> 1.5.2.1 may be adding more to the version confusion 
> (developer/stable/point-release/etc) than it is worth,

My feeling is that 1.5.3 should be reserved for some significant changes
and new features, and not just a few bug fixes. I'd say this causes less
confusion amongst users - they can associate '1.5.2' with a certain API
and feature-set, and the specific name of the file they download and
install (bioperl-1.5.2_100.tar.gz vs bioperl-1.5.2_101.tar.gz) won't
matter at all to them.

I also won't have to make some major announcement about it; it will
simply be the most recent developer version of bioperl available so new
users trying to get 1.5.2 will end up getting 1.5.2.1, whilst existing
1.5.2 users will only feel compelled to get it if they suffer from the
bugs fixed.


> and there is no shame in releasing new developer versions every few
> weeks.

I think doing frequent releases is inadvisable; such a quick release
won't have had much testing so we shouldn't encourage people to install
it: encouragement is implicit when a major new version comes out like
1.5.3. People who want to live on the edge can and should be using a
CVS checkout.


From bix at sendu.me.uk  Mon Dec 18 14:15:16 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 18 Dec 2006 19:15:16 +0000
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110 species
In-Reply-To: <C91DCC7B-E368-475D-B83A-AC301A49624B@uiuc.edu>
References: <F5C5C9A8-D444-4994-9769-AC5DE68F4A39@lsi.upc.edu>	<68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu>	<4577E4A2.5090303@sendu.me.uk>	<B290BEF7-81D6-4C0A-9EDA-348B8A75EEC8@lsi.upc.edu>	<4577EAAF.7030509@sendu.me.uk>	<0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu>	<4577EFD3.7090904@sendu.me.uk>	<250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu>
	<C91DCC7B-E368-475D-B83A-AC301A49624B@uiuc.edu>
Message-ID: <4586E8C4.6030306@sendu.me.uk>

Chris Fields wrote:
> On Dec 15, 2006, at 6:45 PM, Gabriel Valiente wrote:
> 
>> However, on a larger set of 190 species, which are all present in
>> the NCBI taxonomy, the resulting tree has only 178 taxa. I suspect,
>> something must be wrong with the merge_lineage method in the major
>> rewrite of the taxonomy2tree script. Can someone please check this?
>> I'm attaching the 190 species call to the script. Thanks,
>> 
>> Gabriel
> 
> I can confirm that.  It is definitely dropping them in
> merge_lineage(); if you add a call to get_leaf_nodes to check how
> many are present after each merge_lineage() call, you can see it
> dropping nodes along the trace.

I can confirm the 'dropped' nodes, but I maintain that this is not a bug.

For example, the first 'drop' happens for the 101st species which is
'Leptospira interrogans serovar Copenhageni'. This is a variation
(descendant) of species 24: 'Leptospira interrogans'. So when the
variation is added it becomes a leaf and 'Leptospira interrogans' is no
longer a leaf, so the overall number of leaves does not increase.

The next drop is for species 103 'Prochlorococcus marinus subsp.
pastoris str. CCMP1986', a subspecies of species 63 'Prochlorococcus marinus'.
Same deal. I didn't check any others, but suspect the same issue arises
in all cases.
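To make the leaf-count argument concrete, here is a toy sketch in plain Perl. It is not the taxonomy2tree.pl or Bio::Tree code, and the lineages below are hypothetical and heavily simplified: merging a lineage that extends an existing leaf turns that leaf into an internal node, so the number of leaves stays the same.

```perl
use strict;
use warnings;

# Toy model only: the tree is a hash of node name => { child name => 1 }.
# Each lineage is a list of names from the root down to the taxon.
sub add_lineage {
    my ($tree, @lineage) = @_;
    $tree->{ $lineage[0] } ||= {};
    for my $i (1 .. $#lineage) {
        $tree->{ $lineage[$i - 1] }{ $lineage[$i] } = 1;   # link parent -> child
        $tree->{ $lineage[$i] } ||= {};                     # make sure child exists
    }
    return $tree;
}

# A leaf is a node with no children.
sub leaf_count {
    my ($tree) = @_;
    return scalar grep { !%{ $tree->{$_} } } keys %$tree;
}
```

After merging a Leptospira lineage and a Prochlorococcus lineage, leaf_count() returns 2; merging the serovar lineage on top still gives 2, because 'Leptospira interrogans' is now internal.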

Gabriel, please confirm this isn't a bug, or suggest how you propose to
see your taxa when they are not all leaves of the tree.


PS. I changed the merge_lineage() algorithm to make it 18x faster (from an
absurd 3 minutes for making the 190-species tree to a more reasonable 10 s),
without changing the tree produced.


From fgarret at ub.edu  Mon Dec 18 15:01:38 2006
From: fgarret at ub.edu (Filipe Garrett)
Date: Mon, 18 Dec 2006 21:01:38 +0100
Subject: [Bioperl-l] PAML files
In-Reply-To: <34C4970D-6F93-4CE4-878C-5FA4C916AAEC@bioperl.org>
References: <4586BF57.4090002@ub.edu>
	<34C4970D-6F93-4CE4-878C-5FA4C916AAEC@bioperl.org>
Message-ID: <4586F3A2.4010607@ub.edu>


Hi Jason,

This question is related to the one I asked earlier today.
I need to run codeml with 3 tree topologies. I looked at the codeml module,
but it only accepts one tree as input, so I thought of using the module
to prepare all the files; then I would just have to run codeml in batch
with each new tree file. But for that I need to know which file is the
ctl file.

Any idea?
FG
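The batch approach described above might look roughly like this in plain Perl: one tree file and one codeml control file per topology. write_codeml_batch and the file names are hypothetical; only the seqfile, treefile and outfile control variables are set explicitly, and any other codeml options are copied through unchecked.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: for each Newick tree string, write tree_N.nwk and a
# matching codeml_N.ctl into $dir. Only seqfile, treefile and outfile are
# set here; %$extra carries any other codeml options, copied unchecked.
sub write_codeml_batch {
    my ($dir, $alignment, $trees, $extra) = @_;
    my @ctl_files;
    for my $i (0 .. $#$trees) {
        my $n        = $i + 1;
        my $treefile = File::Spec->catfile($dir, "tree_$n.nwk");
        my $ctlfile  = File::Spec->catfile($dir, "codeml_$n.ctl");

        open my $tfh, '>', $treefile or die "Cannot write $treefile: $!";
        print {$tfh} "$trees->[$i]\n";
        close $tfh or die "Cannot close $treefile: $!";

        my %opt = (
            %{ $extra || {} },
            seqfile  => $alignment,
            treefile => $treefile,
            outfile  => File::Spec->catfile($dir, "mlc_$n"),
        );
        open my $cfh, '>', $ctlfile or die "Cannot write $ctlfile: $!";
        print {$cfh} "$_ = $opt{$_}\n" for sort keys %opt;
        close $cfh or die "Cannot close $ctlfile: $!";

        push @ctl_files, $ctlfile;
    }
    return @ctl_files;
}
```

Each control file could then be run in turn, for example with system('codeml', $ctl), assuming your codeml build takes the control-file name as its argument. codeml also writes auxiliary files (such as rst) into the current directory, so running each job from its own directory may be safer.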

Jason Stajich wrote:
> They are temporary names, so they are deliberately random, and there is no
> intention of you needing them after a run since they are cleaned up on
> the fly. We use an internal method for generating tempfiles that takes
> care of cleanup afterwards.  I suppose since we do all the work within a
> temp directory that is cleaned up, one could have a fixed name for the
> tree, alignment, and ctl files, but honestly we never expect people to be
> reading these filenames as they are intended to be transient.
> 
> What problem are you having that you need the filename?
> 
> -jason
> On Dec 18, 2006, at 11:18 AM, Filipe Garrett wrote:
> 
>> Hi all,
>>
>> does anyone knows how to get the name of the .ctl file created by the 
>> PAML module? Inside the tmp directory there are 2 files with random 
>> names (tree and ctl). Why do they have random names?? Wouldn't it be 
>> easier to assign them a fixed name?? For instance "codeml.ctl" and 
>> "tree.nwk"??
>>
>> thanks in adv,
>> FG
> 
> --
> Jason Stajich
> jason at bioperl.org
> http://jason.open-bio.org/
> 
> 


From fgarret at ub.edu  Mon Dec 18 15:07:46 2006
From: fgarret at ub.edu (Filipe Garrett)
Date: Mon, 18 Dec 2006 21:07:46 +0100
Subject: [Bioperl-l] codeml
In-Reply-To: <7150593C-C159-4418-8FB3-9D7906C37E15@bioperl.org>
References: <45868466.508@ub.edu>
	<7150593C-C159-4418-8FB3-9D7906C37E15@bioperl.org>
Message-ID: <4586F512.1030209@ub.edu>


Right now it's impossible for me to write it.
By February or March I should have more time but I'll let you know.

FG

Jason Stajich wrote:
> This is a shortcoming in the Run::Phylo::PAML::Codeml implementation - I
> guess we'll need to allow the -tree option to accept an arrayref of trees.
> Are you willing to try to write this patch?  It should be added as a
> bug/feature request to bugzilla so it can be corrected in short order.
> 
> -jason
> On Dec 18, 2006, at 7:07 AM, Filipe Garrett wrote:
> 
>> Hi all,
>>
>> I've been using bioperl's PAML module (specifically the codeml part) but 
>> with just one tree.
>>
>> Since the program accepts several trees as input (and runs the analysis 
>> for each tree outputting the difference in likelihoods for each one) I 
>> was wondering if there's some way to do it through bioperl?
>>
>> thanks in adv,
>> FG
> 
> --
> Jason Stajich 
> Miller Research Fellow
> University of California, Berkeley
> lab: 510.642.8441
> http://pmb.berkeley.edu/~taylor/people/js.html
> 
> 


From cjfields at uiuc.edu  Mon Dec 18 15:55:55 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 18 Dec 2006 14:55:55 -0600
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110 species
In-Reply-To: <4586E8C4.6030306@sendu.me.uk>
References: <F5C5C9A8-D444-4994-9769-AC5DE68F4A39@lsi.upc.edu>	<68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu>	<4577E4A2.5090303@sendu.me.uk>	<B290BEF7-81D6-4C0A-9EDA-348B8A75EEC8@lsi.upc.edu>	<4577EAAF.7030509@sendu.me.uk>	<0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu>	<4577EFD3.7090904@sendu.me.uk>	<250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu>
	<C91DCC7B-E368-475D-B83A-AC301A49624B@uiuc.edu>
	<4586E8C4.6030306@sendu.me.uk>
Message-ID: <63C1DC7D-2830-436A-BE95-7ECE3748C84D@uiuc.edu>


On Dec 18, 2006, at 1:15 PM, Sendu Bala wrote:

> Chris Fields wrote:
>> On Dec 15, 2006, at 6:45 PM, Gabriel Valiente wrote:
>>
>>> However, on a larger set of 190 species, which are all present in
>>> the NCBI taxonomy, the resulting tree has only 178 taxa. I suspect,
>>> something must be wrong with the merge_lineage method in the major
>>> rewrite of the taxonomy2tree script. Can someone please check this?
>>> I'm attaching the 190 species call to the script. Thanks,
>>>
>>> Gabriel
>>
>> I can confirm that.  It is definitely dropping them in
>> merge_lineage(); if you add a call to get_leaf_nodes to check how
>> many are present after each merge_lineage() call, you can see it
>> dropping nodes along the trace.
>
> I confirm the 'dropped' nodes, but also claim that this is no bug.
>
> For example, the first 'drop' happens for the 101st species which is
> 'Leptospira interrogans serovar Copenhageni'. This is a variation
> (descendant) of species 24: 'Leptospira interrogans'. So when the
> variation is added it becomes a leaf and 'Leptospira interrogans' is no
> longer a leaf, so the overall number of leaves does not increase.
>
> The next drop is for species 103 'Prochlorococcus marinus subsp.
> pastoris str. CCMP1986', a subspecies of 63 'Prochlorococcus marinus'.
> Same deal. I didn't check any others, but suspect the same issue arises
> in all cases.

Makes sense now.  I personally would consider this a bug since the  
results are unexpected (so the docs need to be modified in order to  
clarify).  Some say tomato...

I suppose this is one of the issues one might run into when using  
NCBI taxonomy to build trees.

> Gabriel, please confirm this isn't a bug, or suggest how you propose to
> see your taxa when they are not all leaves of the tree.

Having the nodes appear internally seems semantically correct to me.   
Is there any other way?

> PS. I changed the merge_lineage() algorithm to be 18x faster (from the
> absurd 3mins for making the 190 species tree to a more reasonable 10s),
> without changing the tree produced.

Definitely an improvement!

chris


From jason at bioperl.org  Mon Dec 18 14:33:32 2006
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 18 Dec 2006 14:33:32 -0500
Subject: [Bioperl-l] PAML files
In-Reply-To: <4586BF57.4090002@ub.edu>
References: <4586BF57.4090002@ub.edu>
Message-ID: <34C4970D-6F93-4CE4-878C-5FA4C916AAEC@bioperl.org>

They are temporary names, so they are deliberately random, and there is
no intention of you needing them after a run since they are cleaned
up on the fly. We use an internal method for generating tempfiles
that takes care of cleanup afterwards.  I suppose since we do all the
work within a temp directory that is cleaned up, one could have a
fixed name for the tree, alignment, and ctl files, but honestly we
never expect people to be reading these filenames as they are
intended to be transient.

What problem are you having that you need the filename?

-jason
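The fixed-name idea above can be sketched with the core File::Temp module: the directory is unique and removed at exit, so the files inside it can have predictable names. This is only an illustration; prepare_run_dir is a hypothetical helper, not BioPerl's internal tempfile method.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Create a private temporary directory (removed automatically when the
# program exits) and write each file into it under a fixed name. Because
# the directory itself is unique, fixed names such as codeml.ctl cannot
# collide between concurrent runs.
sub prepare_run_dir {
    my (%content) = @_;    # file name => file contents
    my $dir = tempdir(CLEANUP => 1);
    for my $name (sort keys %content) {
        my $path = File::Spec->catfile($dir, $name);
        open my $fh, '>', $path or die "Cannot write $path: $!";
        print {$fh} $content{$name};
        close $fh or die "Cannot close $path: $!";
    }
    return $dir;
}
```

Usage would be along the lines of prepare_run_dir('codeml.ctl' => $ctl_text, 'tree.nwk' => $newick), after which the caller knows exactly where each file lives.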
On Dec 18, 2006, at 11:18 AM, Filipe Garrett wrote:

> Hi all,
>
> does anyone knows how to get the name of the .ctl file created by the
> PAML module? Inside the tmp directory there are 2 files with random
> names (tree and ctl). Why do they have random names?? Wouldn't it be
> easier to assign them a fixed name?? For instance "codeml.ctl" and
> "tree.nwk"??
>
> thanks in adv,
> FG

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From cjm at fruitfly.org  Mon Dec 18 16:50:00 2006
From: cjm at fruitfly.org (Chris Mungall)
Date: Mon, 18 Dec 2006 13:50:00 -0800
Subject: [Bioperl-l] Proposal for Meta data
In-Reply-To: <200612181551.51277.heikki@sanbi.ac.za>
References: <B3EF69DB-9C01-4F42-A4E4-898613D872F9@uiuc.edu>
	<32BE3FCF-C788-438F-8A4A-8A586DD6C569@bioperl.org>
	<F302B7DD-C806-4A6F-ACDF-9F27A84E0BF0@uiuc.edu>
	<200612181551.51277.heikki@sanbi.ac.za>
Message-ID: <6747C74C-8A49-4169-8A3B-8A26134C3B0D@fruitfly.org>


I agree with everything Heikki is saying, I just wanted to highlight  
one paragraph:

> The problem I see with undefined, totally open meta annotation, is that if you
> can put anything in there, it is also totally confusing to a user. If you can
> put anything in, how do you know what to get out and know that it is
> there?

One solution is to give your annotation/metadata-model formal  
computational semantics and use ontologies to give additional  
semantics to your metadata tags. This provides both user information  
in the form of documentation, and a means of specifying to the  
computer exactly what should be done with the tags.

This is probably overkill for bioperl; but if the use cases being  
proposed do lean in the direction of a new metadata system that is  
not necessarily backwards compatible with the existing one, then I'd  
recommend checking out what's already out there before re-inventing  
the wheel. Perl RDF libraries are getting a little better.

If anyone is interested in pursuing this sort of thing (probably on a
branch), let me know.

On Dec 18, 2006, at 5:51 AM, Heikki Lehvaslaiho wrote:

>
> Reading the discussion, I think it is time to draw some guidelines.
>
> 1. Base the Meta implementation on real use cases.
>
>    MSA is a good example.
>
> 2. Allow generalisations
>
>    If you can see another implementation of the same idea that can be merged
>    with the first, do it, but do not hurt yourself if you cannot.
>
>
> The most difficult question is how to separate case-specific attributes that
> are best implemented by subclassing with additional methods from truly widely
> variable meta data that is best done as a parallel track meta information
> holding class.
>
> The problem I see with undefined, totally open meta annotation, is that if you
> can put anything in there, it is also totally confusing to a user. If you can
> put anything in, how do you know what to get out and know that it is
> there?
>
> That leads to the third guideline:
>
> 3. Use separate meta classes only when there are several different ways of
> encoding data that is present in large numbers *and* when you are expecting
> to be assessing the data computationally rather than just checking if an
> attribute is there.
>
>
> 	-Heikki
>
>
>
> On Friday 15 December 2006 19:23, Chris Fields wrote:
>> On Dec 15, 2006, at 8:28 AM, Jason Stajich wrote:
>>> On Dec 14, 2006, at 9:21 PM, Chris Fields wrote:
>>>> On Dec 14, 2006, at 7:45 PM, David Messina wrote:
>>>>> Hey Chris,
>>>>>
>>>>> My thoughts below.
>>>>>
>>>>>> [Chris]
>>>>>> This could be used to annotate any PrimarySeq, LocatableSeq,
>>>>>> SimpleAlign, SeqFeature, or what-have-you, maybe in a collection
>>>>>> (similar to AnnotationCollection).  I thought something like this
>>>>>> may be of general use for any PrimarySeq (quality, structure),
>>>>>> alignments like NEXUS and Stockholm, SeqFeatures where structure
>>>>>> could be stored (tRNA or riboswitches), etc.
>>>>>>
>>>>>> However, this also seems to fall into the category of sequence
>>>>>> annotation.  So, would it be better to have a set of Bio::Annotation
>>>>>> classes used for this purpose?
>>>>>
>>>>> To me, all meta data is equal. That is, your classic Genbank feature
>>>>> annotation and a user's arbitrary meta-tag like "Bob thinks this is a
>>>>> kinase domain" aren't different in kind even if they are different in
>>>>> content.
>>>>>
>>>>> As resequencing projects multiply, the ability to create arbitrary
>>>>> meta tags, attach them to different types of objects, and use those
>>>>> tags to link them together will become desirable, if not essential.
>>>>>
>>>>> Keeping a common interface to all of these meta data types would be
>>>>> advantageous, plus new users won't have to determine whether they
>>>>> need to use Bio::Meta objects or Bio::Annotation objects.
>>>>>
>>>>> So I would argue for all of the meta data types to live "under one
>>>>> roof". Which roof isn't as important. Bio::Annotation, since it
>>>>> already exists for today's meta data, seems like a reasonable choice
>>>>> (assuming Annotation objects are flexible enough to be extended as
>>>>> you propose).
>>>>>
>>>>> There, and no flames or jibes even. :)
>>>>
>>>> I guess what I want to know is whether there should be a
>>>> distinction between 'normal' sequence annotation (comments,
>>>> references, and so on) and annotation that could best be described as
>>>> position-specific (like RNA or protein structural annotation).  The
>>>> current meta implementation is for sequence data only; I felt it
>>>> would be nice to have a generic implementation that would be
>>>> applicable to any object data.
>>>
>>> my stream-of-consciousness for right now:
>>>
>>> I was thinking Bio::Annotation is where this should go - that
>>> system doesn't have anything about it that makes it explicitly
>>> sequence related. What we're trying to hammer out here on the
>>> Alignment side - which fits with your RNA example - is have
>>> features, basically SeqFeatures - associated with alignments so
>>> columns can be annotated to cover things like character sets and
>>> partitions for phylogenetic analyses.  As for data which annotates
>>> non-contiguous things like RNAstems we may have  to be more
>>> creative about that or model it with a splitLocation.
>>>
>>> So currently we've added code so that an Alignment is-a
>>> Bio::AnnotatableI and is-a Bio::FeatureHolderI to move towards this
>>> end, with the goal of being able to capture more of the data that
>>> can be represented in a NEXUS file.
>>>
>>> It feels more like a hack than an elegant Meta-data solution, but I
>>> am not totally sure what data you are thinking about handling at
>>> this point; perhaps I need to spend more time thinking about it.
>>> Or are you worried about the idea of whether the semantic mapping
>>> of the data into features or annotations is confusing users?
>>
>> Sorry in advance for the longish response here...
>>
>> My original thought was to have a generic abstract class capable of
>> positionally describing data in any other class, similar to
>> Heikki's Bio::Seq::MetaI but not constrained to sequence data only.
>> Implementing classes would be capable of having different data
>> structures based on their use (simple string, array, AoA, AoH, AoO).
>> One MetaCollection class to contain them all in a tag-like system, so
>> you could have mixed data types describe the same object.  The latter
>> Collection class is so similar to AnnotationCollection that I agree
>> Bio::Annotation would be the best place for this.
>>
>> The way I reconfigured Stockholm alignment parsing/writing is to use
>> Bio::Seq::Meta objects (which are LocatableSeq).  Each Seq::Meta is
>> capable of holding a sequence and several meta strings, stored as
>> tags or 'names'.  However, there is no Meta object for alignments
>> (for RNA/protein structure consensus and other Rfam/Pfam markup); I
>> hacked around this by using a Bio::Seq::Meta w/o a seq, but I would
>> rather have a generic Meta object independent of the sequence cruft.
>>
>> So for this partial Pfam alignment,
>>
>> Q92SV1_RHIME/122-299         LAMALNLARGI...VDADVDF..REG
>> #=GR Q92SV1_RHIME/122-299 pAS .........................
>> Q883D2_PSESM/110-290         LGLMLGLRRRL...FDGNGAV..KRS
>> Q8ZXP5_PYRAE/91-262          LALLLAPYKRI...IQYGEKM..KRG
>> #=GR Q8ZXP5_PYRAE/91-262 SS  HHHHHHHHTTH...HHHHHHX..HTT
>> #=GR Q8ZXP5_PYRAE/91-262 SA  00000000000...120030X..474
>> #=GC SS_cons                 HHHHHHHHTTH...HHHHHHH..HTT
>> #=GC SA_cons                 03002200312...1312414..676
>> #=GC seq_cons                luhhLuhsRpl...hthppth..+pG
>> //
>>
>> '#=GC' lines would be in generic meta string objects in the
>> alignment, while '#=GR' tags would be in similar meta objects in the
>> relevant sequences.  As long as both aren't AnnotatableI this isn't
>> an issue.
>>
>> Similarly, NEXUS files which contained any position-based values
>> could hold a meta string/array object in a similar tag.
>>
>> The basic scheme is:
>>                      |--String
>>
>> Annotation::Meta----|--Array
>>
>>                      |--HorriblyComplexDataStruct
>>
>> Then I started thinking about where this could be applied, and
>> whether a true Meta object needs to be constrained only to describing
>> position-based data.  This somewhat relates to this bug:
>>
>> http://bugzilla.open-bio.org/show_bug.cgi?id=1825
>>
>> which seems to need a simple but unconstrained hash-of-arrays-based
>> meta object.
>>
>> Then my head appropriately exploded...
>>
>> Hope everything is going well at the hackathon!  Looks like some
>> interesting stuff coming out of it.
>>
>> chris
>
>


From jason at bioperl.org  Mon Dec 18 14:35:50 2006
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 18 Dec 2006 14:35:50 -0500
Subject: [Bioperl-l] codeml
In-Reply-To: <45868466.508@ub.edu>
References: <45868466.508@ub.edu>
Message-ID: <7150593C-C159-4418-8FB3-9D7906C37E15@bioperl.org>

This is a shortcoming in the Run::Phylo::PAML::Codeml implementation -
I guess we'll need to allow the -tree option to accept an arrayref
of trees.
Are you willing to try to write this patch?  It should be added as a
bug/feature request to bugzilla so it can be corrected in short order.

-jason
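A rough sketch of what accepting an arrayref for -tree could mean on the file-writing side: normalise the argument to a list and write all topologies into one tree file. The helper name is hypothetical, the trees are plain Newick strings rather than Bio::Tree objects, and the "<number of species> <number of trees>" header line is an assumption about the PAML tree-file layout that should be checked against the PAML manual.

```perl
use strict;
use warnings;

# Hypothetical helper: accept one Newick string or an arrayref of them (as
# an arrayref-aware -tree option might) and build the text of a single
# tree file holding all topologies. The "<species> <trees>" header line is
# an assumption about the PAML tree-file layout.
sub trees_to_paml_treefile {
    my ($tree_arg, $nspecies) = @_;
    my @trees = ref $tree_arg eq 'ARRAY' ? @$tree_arg : ($tree_arg);
    my $text  = sprintf "%d %d\n", $nspecies, scalar @trees;
    $text .= "$_\n" for @trees;
    return $text;
}
```

Inside a wrapper, the returned text would be written to the file named by codeml's treefile option; converting Bio::Tree objects to Newick strings is left out here.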
On Dec 18, 2006, at 7:07 AM, Filipe Garrett wrote:

> Hi all,
>
> I've been using bioperl's PAML module (specifically the codeml part) but
> with just one tree.
>
> Since the program accepts several trees as input (and runs the analysis
> for each tree outputting the difference in likelihoods for each one) I
> was wondering if there's some way to do it through bioperl?
>
> thanks in adv,
> FG

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html


From gowthaman.ramasamy at sbri.org  Mon Dec 18 17:19:09 2006
From: gowthaman.ramasamy at sbri.org (Gowthaman Ramasamy)
Date: Mon, 18 Dec 2006 14:19:09 -0800
Subject: [Bioperl-l] module to find out primer binding sites in a genome sequence
Message-ID: <A4D285B054CE4641A93F1B2046B2B3CD0761E4@mail01.sbri.org>


Hi List,
Is there any module in bioperl that can find primer binding sites in a genomic sequence?
I am interested in finding locations with a few mismatches along the primer, not just exact matches (which is a trivial task).

Many thanks in advance,
gowtham
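For reference, the brute-force version of a mismatch-tolerant search is short enough to write directly in Perl. This sketch is not a BioPerl module: it slides the primer, and its reverse complement, along the sequence and reports every window within a given Hamming distance. It does not handle gaps or IUPAC ambiguity codes and scans the whole sequence, so for large genomes an indexed or alignment-based search tool would likely be more practical.

```perl
use strict;
use warnings;

# Reverse complement of a plain A/C/G/T string (ambiguity codes untouched).
sub revcomp {
    my ($s) = @_;
    ($s = reverse uc $s) =~ tr/ACGT/TGCA/;
    return $s;
}

# Report every window of $seq that matches the primer, or its reverse
# complement, with at most $max_mm mismatches (Hamming distance, no gaps).
sub find_primer_sites {
    my ($seq, $primer, $max_mm) = @_;
    $seq = uc $seq;
    my $len   = length $primer;
    my %probe = ('+' => uc $primer, '-' => revcomp($primer));
    my @hits;
    for my $start (0 .. length($seq) - $len) {
        my $window = substr $seq, $start, $len;
        for my $strand ('+', '-') {
            # String XOR gives "\0" wherever the two characters agree,
            # so counting the non-"\0" bytes counts the mismatches.
            my $mm = ($window ^ $probe{$strand}) =~ tr/\0//c;
            push @hits, { start => $start + 1, strand => $strand, mismatches => $mm }
                if $mm <= $max_mm;
        }
    }
    return @hits;
}
```

Each hit records the 1-based start of the matching window in the input sequence, the strand the primer matched on, and the mismatch count.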


From cjfields at uiuc.edu  Mon Dec 18 17:33:34 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 18 Dec 2006 16:33:34 -0600
Subject: [Bioperl-l] Proposal for Meta data
In-Reply-To: <200612181551.51277.heikki@sanbi.ac.za>
References: <B3EF69DB-9C01-4F42-A4E4-898613D872F9@uiuc.edu>
	<32BE3FCF-C788-438F-8A4A-8A586DD6C569@bioperl.org>
	<F302B7DD-C806-4A6F-ACDF-9F27A84E0BF0@uiuc.edu>
	<200612181551.51277.heikki@sanbi.ac.za>
Message-ID: <FBD2CED3-EBE7-4CB9-8969-70C7A5931A04@uiuc.edu>


On Dec 18, 2006, at 7:51 AM, Heikki Lehvaslaiho wrote:

>
> Reading the discussion, I think it is time to draw some guidelines.
>
> 1. Base the Meta implementation on real use cases.
>
>    MSA is a good example.

AlignIO::stockholm is where I'll initially test it out.

> 2. Allow generalisations
>
>    If you can see another implementation of the same idea that can be merged
>    with the first, do it, but do not hurt yourself if you cannot.

I agree.

> The most difficult question is how to separate case-specific attributes that
> are best implemented by subclassing with additional methods from truly widely
> variable meta data that is best done as a parallel track meta information
> holding class.

I would probably start with a general Bio::Annotation::MetaI abstract
class, which supplements AnnotationI with general meta-specific
methods (meta, meta_text, named_meta, etc.).  This could be implemented in
whatever way one wanted (RNA structure as strings, quality data as
arrays, etc.) under the constraints of the interface description.

Multiple meta objects, potentially of mixed data types, could be  
added in an AnnotationCollection along with other Bio::Annotation  
data, or stored in a nested meta-specific AnnotationCollection object  
(I favor the former as it's simpler).  So you could have an  
alignment, sequence, seqfeature (anything that is AnnotatableI) with  
a regular AnnotationCollection also containing possibly multiple meta  
objects, each meta object also containing possibly more than one set  
of meta data.

The key issue I have is whether or not to constrain these to  
describing positional data, similar to Bio::Seq::Meta, by ensuring  
that the data is_flush(), etc.  My current inclination is 'no', and  
to have a separate abstract class which describes these methods,  
implementing those separately.
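A hypothetical sketch of the split described above: an unconstrained named-meta container, plus a separate positional subclass that adds an is_flush() check. The package names My::Meta and My::Meta::Positional are made up for illustration and are not Bio::Annotation or Bio::Seq::Meta classes.

```perl
use strict;
use warnings;

package My::Meta;    # hypothetical: unconstrained name => data store

sub new { my ($class) = @_; return bless { meta => {} }, $class }

# Set (if a value is given) and return the meta data stored under $name.
sub named_meta {
    my ($self, $name, $value) = @_;
    $self->{meta}{$name} = $value if defined $value;
    return $self->{meta}{$name};
}

sub meta_names { my ($self) = @_; return sort keys %{ $self->{meta} } }

package My::Meta::Positional;    # hypothetical: adds positional checks
our @ISA = ('My::Meta');

# True when every meta entry (string or arrayref) spans exactly $length
# positions, e.g. the length of the alignment it describes.
sub is_flush {
    my ($self, $length) = @_;
    for my $name ($self->meta_names) {
        my $v = $self->named_meta($name);
        my $n = ref $v eq 'ARRAY' ? scalar @$v : length $v;
        return 0 unless $n == $length;
    }
    return 1;
}

package main;
```

Keeping is_flush() out of the base class means non-positional data (say, a hash-of-arrays like the one bug 1825 seems to need) can live in My::Meta without pretending to be flush with anything.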

> The problem I see with undefined, totally open meta annotation, is that if you
> can put anything in there, it is also totally confusing to a user. If you can
> put anything in, how do you know what to get out and know that it is
> there?
>
> That leads to the third guideline:
>
> 3. Use separate meta classes only when there are several different ways of
> encoding data that is present in large numbers *and* when you are expecting
> to be assessing the data computationally rather than just checking if an
> attribute is there.
>
>
> 	-Heikki

The initial use case for this would be simple data strings for  
alignment data.  I already have a partial implementation in place for  
stockholm using Bio::Seq::Meta (which led me to this proposal!).  I  
like Chris M.'s idea of ensuring that meta implementations use some  
sort of formalized ontology, but I'll probably start out very simple  
and work up from there.
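As a starting point for the Stockholm case, the split between #=GC (alignment-level) and #=GR (per-sequence) lines in the Pfam excerpt quoted in this thread can be sketched in plain Perl. This is not the AlignIO::stockholm code; parse_stockholm_meta is hypothetical, skips #=GF/#=GS markup, and simply concatenates data by name across blocks.

```perl
use strict;
use warnings;

# Hypothetical parser: '#=GC <tag> <data>' lines describe the whole
# alignment, '#=GR <seqname> <tag> <data>' lines describe one sequence.
# Data for the same key is concatenated so interleaved (multi-block)
# files also work. Other '#' lines (#=GF, #=GS, the header) are skipped.
sub parse_stockholm_meta {
    my @lines = @_;
    my (%seq, %gc, %gr);
    for my $line (@lines) {
        next if $line =~ /^\s*$/ || $line =~ m{^//};
        if ($line =~ /^#=GC\s+(\S+)\s+(\S+)/) {
            $gc{$1} .= $2;               # alignment-level meta, e.g. SS_cons
        }
        elsif ($line =~ /^#=GR\s+(\S+)\s+(\S+)\s+(\S+)/) {
            $gr{$1}{$2} .= $3;           # per-sequence meta, e.g. SS, SA
        }
        elsif ($line =~ /^#/) {
            next;                        # header and other markup
        }
        elsif ($line =~ /^(\S+)\s+(\S+)/) {
            $seq{$1} .= $2;              # ordinary sequence line
        }
    }
    return (\%seq, \%gc, \%gr);
}
```

The three returned hashes map naturally onto the proposal: %gc would feed meta objects on the alignment, and each entry of %gr would feed meta objects on the named sequence.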

chris


From cjfields at uiuc.edu  Mon Dec 18 17:38:14 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 18 Dec 2006 16:38:14 -0600
Subject: [Bioperl-l] [Gmod-gbrowse] xyplot data alignment problem?
In-Reply-To: <4586BE99.7020308@sendu.me.uk>
References: <EA0BFA4F-8182-4C40-92DA-218CE3F48D8B@genomics.princeton.edu>	<6dce9a0b0612141356u63afe2dak7e1d8dad93408312@mail.gmail.com>
	<6dce9a0b0612150802x354a02a8ib17fbd882379c63c@mail.gmail.com>
	<458404BD.8030908@sendu.me.uk>
	<733825EE-0426-4D12-A02F-B8825CDEBBA9@gmx.net>
	<4586BE99.7020308@sendu.me.uk>
Message-ID: <6AD475AE-7F5E-4612-BC24-73B65AA47F30@uiuc.edu>


On Dec 18, 2006, at 10:15 AM, Sendu Bala wrote:

> Hilmar Lapp wrote:
>>
>> On Dec 16, 2006, at 9:37 AM, Sendu Bala wrote:
>>
>>> Lincoln Stein wrote:
>>>> This is very embarrassing for me, particularly since I spent a lot
>>>> of time validating that Bio::Graphics was working properly before
>>>> the 1.5.2 release went out. How long before there is a 1.5.3
>>>> release? How about a 1.5.2.1 release?
>>>
>>> I'm happy to try a point release for critical bug fixes. Why don't
>>> you commit the necessary fixes to branch-1-5-2 and let me know when
>>> you're happy, and I'll do 1.5.2.1.
>>
>> Feel free to do that, but why not make a 1.5.3 off the main trunk?
>> 1.5.2.1 may be adding more to the version confusion
>> (developer/stable/point-release/etc) than it is worth,
>
> My feeling is that 1.5.3 should be reserved for some significant changes
> and new features, and not just a few bug fixes. I'd say this causes less
> confusion amongst users - they can associate '1.5.2' with a certain API
> and feature-set, and the specific name of the file they download and
> install (bioperl-1.5.2_100.tar.gz vs bioperl-1.5.2_101.tar.gz) won't
> matter at all to them.
>
> I also won't have to make some major announcement about it; it will
> simply be the most recent developer version of bioperl available so new
> users trying to get 1.5.2 will end up getting 1.5.2.1, whilst existing
> 1.5.2 users will only feel compelled to get it if they suffer from the
> bugs fixed.
>
>
>> and there is no shame in releasing new developer versions every few
>> weeks.
>
> I think doing frequent releases is inadvisable; such a quick release
> won't have had much testing so we shouldn't encourage people to install
> it: encouragement is implicit when a major new version comes out like
> 1.5.3. People who want to live on the edge can and should be using a
> CVS checkout.

I thought that 1.5.2 was considered a point release for the 1.5 dev  
series, for bug fixes along with the potential for added/experimental  
features.  Similarly, 1.6.x releases would be point releases for bug  
fixes only with all tests passing (no added features since it is a  
stable release series).  I guess one could reason that 1.5.x releases  
have both bug fixes and new features, while 1.5.x.y releases are  
simply bug fixes for the 1.5.x branch (no new features).  We probably  
should add something to the FAQ and maybe make a few changes to the  
1.5.2 wiki page.

I think having a 1.5.2.1 release is feasible as a quick one-off to  
get Lincoln's fixes in, since you would make them off the 1.5.2  
branch anyway (so I guess it could be considered a bug release from  
that branch).  It's probably not something we should make a habit of,  
but then again I'm not the Pumpkin!

chris


From bix at sendu.me.uk  Mon Dec 18 17:50:11 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 18 Dec 2006 22:50:11 +0000
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on
	110	species
In-Reply-To: <63C1DC7D-2830-436A-BE95-7ECE3748C84D@uiuc.edu>
References: <F5C5C9A8-D444-4994-9769-AC5DE68F4A39@lsi.upc.edu>	<68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu>	<4577E4A2.5090303@sendu.me.uk>	<B290BEF7-81D6-4C0A-9EDA-348B8A75EEC8@lsi.upc.edu>	<4577EAAF.7030509@sendu.me.uk>	<0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu>	<4577EFD3.7090904@sendu.me.uk>	<250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu>
	<C91DCC7B-E368-475D-B83A-AC301A49624B@uiuc.edu>
	<4586E8C4.6030306@sendu.me.uk>
	<63C1DC7D-2830-436A-BE95-7ECE3748C84D@uiuc.edu>
Message-ID: <45871B23.8070103@sendu.me.uk>

Chris Fields wrote:
> 
> On Dec 18, 2006, at 1:15 PM, Sendu Bala wrote:
> 
>> For example, the first 'drop' happens for the 101st species which is
>> 'Leptospira interrogans serovar Copenhageni'. This is a variation
>> (descendant) of species 24: 'Leptospira interrogans'. So when the
>> variation is added it becomes a leaf and 'Leptospira interrogans' is no
>> longer a leaf, so the overall number of leaves does not increase.
>
> Makes sense now.  I personally would consider this a bug since the 
> results are unexpected (so the docs need to be modified in order to 
> clarify).  Some say tomato...
> 
> I suppose this is one of the issues one might run into when using NCBI 
> taxonomy to build trees.

No, the tree produced is perfectly fine. The taxonomy2tree.pl script 
deliberately then does:

# simple paths are contracted by removing degree one nodes
$tree->contract_linear_paths;

Because that is what Gabriel's script originally did.


>> Gabriel, please confirm this isn't a bug, or suggest how you propose to
>> see your taxa when they are not all leaves of the tree.
> 
> Having the nodes appear internally seems semantically correct to me.  Is 
> there any other way?

I suppose if we want to see all the input species output again we have 
to make contract_linear_paths() aware of nodes we want to keep, even 
when they are degree one nodes. Gabriel, is that what you want to see?
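To make the idea concrete, here is a minimal Perl sketch of a contraction that leaves protected nodes alone. It is not BioPerl's contract_linear_paths(); the tree is assumed to be a plain hash of parent => [children], and the helper names (`_contract`, `contract_keeping`) and the toy Leptospira data are hypothetical. With an empty keep-set the species node 'Leptospira interrogans' is spliced out exactly as described above; putting it in the keep-set leaves it as an internal node above its serovar.

```perl
use strict;
use warnings;

# Hypothetical helper (not BioPerl's contract_linear_paths): walk a rooted tree
# stored as parent => [children] and splice out every non-root node that has
# exactly one child, unless that node is listed in %$keep.
sub _contract {
    my ($children, $keep, $node, $new) = @_;
    my @kids;
    for my $kid (@{ $children->{$node} || [] }) {
        my $next = $kid;
        # follow a linear path downwards past removable single-child nodes
        while (@{ $children->{$next} || [] } == 1 && !$keep->{$next}) {
            $next = $children->{$next}[0];
        }
        push @kids, $next;
        _contract($children, $keep, $next, $new);
    }
    $new->{$node} = \@kids if @kids;
    return;
}

# Returns the contracted tree as a new parent => [children] hash ref.
sub contract_keeping {
    my ($children, $root, $keep) = @_;
    my %new;
    _contract($children, $keep || {}, $root, \%new);
    return \%new;
}

# Toy lineage data modelled on the Leptospira example in this thread
my %tree = (
    'root'                   => ['Leptospira', 'Escherichia'],
    'Leptospira'             => ['Leptospira interrogans'],
    'Leptospira interrogans' => ['L. interrogans serovar Copenhageni'],
    'Escherichia'            => ['Escherichia coli'],
);

for my $keep ({}, { 'Leptospira interrogans' => 1 }) {
    my $t = contract_keeping(\%tree, 'root', $keep);
    print 'keeping: ', (join(', ', sort keys %$keep) || 'nothing'), "\n";
    print "  $_ => [", join(', ', @{ $t->{$_} }), "]\n" for sort keys %$t;
}
```

The leaves are the same in both runs; only the set of surviving internal nodes differs.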


From cjfields at uiuc.edu  Mon Dec 18 18:14:23 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 18 Dec 2006 17:14:23 -0600
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on
	110	species
In-Reply-To: <45871B23.8070103@sendu.me.uk>
References: <F5C5C9A8-D444-4994-9769-AC5DE68F4A39@lsi.upc.edu>	<68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu>	<4577E4A2.5090303@sendu.me.uk>	<B290BEF7-81D6-4C0A-9EDA-348B8A75EEC8@lsi.upc.edu>	<4577EAAF.7030509@sendu.me.uk>	<0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu>	<4577EFD3.7090904@sendu.me.uk>	<250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu>
	<C91DCC7B-E368-475D-B83A-AC301A49624B@uiuc.edu>
	<4586E8C4.6030306@sendu.me.uk>
	<63C1DC7D-2830-436A-BE95-7ECE3748C84D@uiuc.edu>
	<45871B23.8070103@sendu.me.uk>
Message-ID: <CE808784-8068-44C5-82A8-BE852890E4DF@uiuc.edu>


On Dec 18, 2006, at 4:50 PM, Sendu Bala wrote:

> Chris Fields wrote:
>> On Dec 18, 2006, at 1:15 PM, Sendu Bala wrote:
>>> For example, the first 'drop' happens for the 101st species which is
>>> 'Leptospira interrogans serovar Copenhageni'. This is a variation
>>> (descendant) of species 24: 'Leptospira interrogans'. So when the
>>> variation is added it becomes a leaf and 'Leptospira interrogans' is no
>>> longer a leaf, so the overall number of leaves does not increase.
>>
>> Makes sense now.  I personally would consider this a bug since the
>> results are unexpected (so the docs need to be modified in order to
>> clarify).  Some say tomato...
>> I suppose this is one of the issues one might run into when using
>> NCBI taxonomy to build trees.
>
> No, the tree produced is perfectly fine. The taxonomy2tree.pl  
> script deliberately then does:
>
> # simple paths are contracted by removing degree one nodes
> $tree->contract_linear_paths;
>
> Because that is what Gabriel's script originally did.

I think you misunderstood me.  The tree is fine; the data used to
make the tree (NCBI taxonomy) is the issue.  One of the clear caveats
that NCBI attaches to their taxonomy data is that it should not be the
'primary source for taxonomic or phylogenetic information':

http://tinyurl.com/y3k624

I think it works as a good guide as long as one takes the above into  
consideration.  That and the fact that not all taxids attached to  
sequence data will represent leaf nodes.

chris


From cjfields at uiuc.edu  Mon Dec 18 18:15:56 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 18 Dec 2006 17:15:56 -0600
Subject: [Bioperl-l] Proposal for Meta data
In-Reply-To: <6747C74C-8A49-4169-8A3B-8A26134C3B0D@fruitfly.org>
References: <B3EF69DB-9C01-4F42-A4E4-898613D872F9@uiuc.edu>
	<32BE3FCF-C788-438F-8A4A-8A586DD6C569@bioperl.org>
	<F302B7DD-C806-4A6F-ACDF-9F27A84E0BF0@uiuc.edu>
	<200612181551.51277.heikki@sanbi.ac.za>
	<6747C74C-8A49-4169-8A3B-8A26134C3B0D@fruitfly.org>
Message-ID: <16D6DB51-C2CB-4E89-A597-4672FAA6681B@uiuc.edu>


On Dec 18, 2006, at 3:50 PM, Chris Mungall wrote:

>
> I agree with everything Heikki is saying, I just wanted to highlight
> one paragraph:
>
>> The problem I see with undefined, totally open meta annotation is
>> that if you can put anything in there, it is also totally confusing
>> to a user. If you can put anything in, how do you know what to get
>> out and know that it is there?
>
> One solution is to give your annotation/metadata-model formal
> computational semantics and use ontologies to give additional
> semantics to your metadata tags. This provides both user information
> in the form of documentation, and a means of specifying to the
> computer exactly what should be done with the tags.
>
> This is probably overkill for bioperl; but if the use cases being
> proposed do lean in the direction of a new metadata system that is
> not necessarily backwards compatible with the existing one, then I'd
> recommend checking out what's already out there before re-inventing
> the wheel. Perl RDF libraries are getting a little better.
>
> If anyone is interested in pursuing this sort of thing (probably on a
> branch), let me know
...

I like the idea of using ontologies (although that's one of my
many weak points!).  I'll likely start off with simple examples using
meta data, then progress from there.  It is a developer series, after
all!

Thanks everybody!  I think I have an idea on how to at least get  
started.

chris


From bix at sendu.me.uk  Mon Dec 18 18:27:15 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 18 Dec 2006 23:27:15 +0000
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on
	110	species
In-Reply-To: <CE808784-8068-44C5-82A8-BE852890E4DF@uiuc.edu>
References: <F5C5C9A8-D444-4994-9769-AC5DE68F4A39@lsi.upc.edu>	<68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu>	<4577E4A2.5090303@sendu.me.uk>	<B290BEF7-81D6-4C0A-9EDA-348B8A75EEC8@lsi.upc.edu>	<4577EAAF.7030509@sendu.me.uk>	<0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu>	<4577EFD3.7090904@sendu.me.uk>	<250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu>
	<C91DCC7B-E368-475D-B83A-AC301A49624B@uiuc.edu>
	<4586E8C4.6030306@sendu.me.uk>
	<63C1DC7D-2830-436A-BE95-7ECE3748C84D@uiuc.edu>
	<45871B23.8070103@sendu.me.uk>
	<CE808784-8068-44C5-82A8-BE852890E4DF@uiuc.edu>
Message-ID: <458723D3.4010908@sendu.me.uk>

Chris Fields wrote:
> 
> On Dec 18, 2006, at 4:50 PM, Sendu Bala wrote:
> 
>> Chris Fields wrote:
>>> On Dec 18, 2006, at 1:15 PM, Sendu Bala wrote:
>>>> For example, the first 'drop' happens for the 101st species which is
>>>> 'Leptospira interrogans serovar Copenhageni'. This is a variation
>>>> (descendant) of species 24: 'Leptospira interrogans'. So when the
>>>> variation is added it becomes a leaf and 'Leptospira interrogans' is no
>>>> longer a leaf, so the overall number of leaves does not increase.
>>>
>>> Makes sense now.  I personally would consider this a bug since the 
>>> results are unexpected (so the docs need to be modified in order to 
>>> clarify).  Some say tomato...
>>> I suppose this is one of the issues one might run into when using 
>>> NCBI taxonomy to build trees.
>>
>> No, the tree produced is perfectly fine. The taxonomy2tree.pl script 
>> deliberately then does:
>>
>> # simple paths are contracted by removing degree one nodes
>> $tree->contract_linear_paths;
>>
>> Because that is what Gabriel's script originally did.
> 
> I think you misunderstood me.  The tree is fine; the data used to make 
> the tree (NCBI taxonomy) is the issue.

In what way is it the issue? The database is also fine as far as I can 
see, in so far as it is not causing any problems in this instance.

Gabriel asked for a tree featuring a species and its subspecies. The 
NCBI taxonomy database provided Bioperl the correct data to build such a 
tree. Then Gabriel asked to remove the degree one nodes of his tree. His 
problem was that doing that happened to (correctly) remove the species 
node. If he wants to see both his species and his subspecies he must 
either not remove degree one nodes, or alter the method of doing so to 
keep desired nodes. There is no possible way for NCBI to improve matters 
here.


From bix at sendu.me.uk  Mon Dec 18 18:45:59 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 18 Dec 2006 23:45:59 +0000
Subject: [Bioperl-l] module to find out primer binding sites in a genome
 sequence
In-Reply-To: <A4D285B054CE4641A93F1B2046B2B3CD0761E4@mail01.sbri.org>
References: <A4D285B054CE4641A93F1B2046B2B3CD0761E4@mail01.sbri.org>
Message-ID: <45872837.6050403@sendu.me.uk>

Gowthaman Ramasamy wrote:
> Hi List, Is there any module in bioperl which can find out the primer
> binding sites in a genomic sequence. I am interested in finding
> locations with few mismatches along the primer...not just the exact
> match (which is a very trivial task)

There's no module dedicated to that task, but Bioperl may help you to
answer the question.

Probably the easiest, most reliable and clearest thing to do is a BLAST
search with settings appropriate for short sequences with few
mismatches. You can then write a script that only considers hits for
your forward primer that are a 'primable' distance from a hit to your
reverse primer (and checks that their orientations are correct as well).

Or use an e-PCR tool.
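A minimal Perl sketch of the filtering step, assuming the BLAST hits have already been reduced (for example by parsing the report with Bio::SearchIO) to plain hashes holding 1-based genome start/end and the strand the primer anneals to. The hit structure, `primable_pairs`, and the 2000 bp product limit are hypothetical choices for illustration.

```perl
use strict;
use warnings;

# Hypothetical hit records (plain hashes, not BioPerl objects): 1-based genome
# coordinates with start <= end, plus the strand the primer anneals to (1 / -1).
# A product is possible when a plus-strand hit lies upstream of a minus-strand
# hit (so the primers face each other) and the span is at most $max_product bp.
sub primable_pairs {
    my ($fwd_hits, $rev_hits, $max_product) = @_;
    my @pairs;
    # try both orientations: forward primer on the plus strand with the reverse
    # primer on the minus strand, and the other way round
    for my $combo ([$fwd_hits, $rev_hits], [$rev_hits, $fwd_hits]) {
        my ($plus_hits, $minus_hits) = @$combo;
        for my $p (grep { $_->{strand} == 1 } @$plus_hits) {
            for my $m (grep { $_->{strand} == -1 } @$minus_hits) {
                next unless $m->{end} > $p->{end};         # must face each other
                my $size = $m->{end} - $p->{start} + 1;    # predicted product length
                push @pairs, { plus => $p, minus => $m, size => $size }
                    if $size <= $max_product;
            }
        }
    }
    return @pairs;
}

# Made-up example hits
my @fwd = ({ start => 100,  end => 119,  strand => 1 },
           { start => 5000, end => 5019, strand => 1 });
my @rev = ({ start => 400,   end => 419,   strand => -1 },
           { start => 90000, end => 90019, strand => -1 });

for my $pair (primable_pairs(\@fwd, \@rev, 2000)) {
    printf "%d bp product from %d to %d\n",
        $pair->{size}, $pair->{plus}{start}, $pair->{minus}{end};
}
```

Checking both orientations matters because the genome may carry the target on either strand relative to how the primers were named.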


From Kevin.M.Brown at asu.edu  Mon Dec 18 18:52:20 2006
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Mon, 18 Dec 2006 16:52:20 -0700
Subject: [Bioperl-l] module to find out primer binding sites in a genome
	sequence
Message-ID: <1A4207F8295607498283FE9E93B775B40270F3BB@EX02.asurite.ad.asu.edu>

A function I use to find the first landing site for a primer.  It should
be modifiable to handle multiple occurrences:

=head1 C<match>

Searches $in_that for an approximate, ungapped match to the primer $this
and returns the 1-based position at which the match starts, or 0 if no
match is found.  A match requires more than 80% of the primer's bases to
agree.

	match($this, $in_that)

=cut

sub match
{
	my ($primer, $gene) = @_;
	return 0 unless length $primer;
	my $start   = 0;
	my $pattern = "";
	for my $i (0 .. length($primer) - 1)
	{
		# extend the pattern by the next base of the primer
		$pattern .= substr($primer, $i, 1);
		if ($gene =~ m/$pattern/i)
		{
			# $-[0] is the 0-based offset of the match; report it 1-based
			$start = $-[0] + 1;
		}
		else
		{
			# this base does not fit: turn it into a wildcard
			$start = 0;
			chop($pattern);
			$pattern .= '.';
		}
	}
	# if the last base became a wildcard, the pattern was never re-tested
	if ($pattern =~ /\.$/ && $gene =~ m/$pattern/i)
	{
		$start = $-[0] + 1;
	}
	# count the primer bases that matched (everything except wildcards)
	(my $matched = $pattern) =~ s/\.//g;

	if ((length($matched) / length($primer)) > .8)
	{
		return $start;
	}
	return 0;
}
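The `match` function above stops at the first landing site. A hedged alternative for finding every site, sketched in Perl: slide the primer along the sequence and count mismatches in each window. This is a swapped-in brute-force technique, not the function above; `primer_sites` and the example sequences are hypothetical, it allows no gaps, and it scans only one strand (search again with the primer's reverse complement to cover the other).

```perl
use strict;
use warnings;

# Brute-force sketch: report every 1-based position where $primer aligns to
# $genome without gaps and with at most $max_mm mismatching bases.
# Cost is O(length(genome) * length(primer)); only the given strand is scanned.
sub primer_sites {
    my ($primer, $genome, $max_mm) = @_;
    my ($p, $g) = (uc $primer, uc $genome);
    my $plen = length $p;
    my @sites;
    for my $i (0 .. length($g) - $plen) {
        # string XOR gives "\0" wherever the two bases are identical
        my $diff = substr($g, $i, $plen) ^ $p;
        my $mm   = ($diff =~ tr/\0//c);    # count the non-identical positions
        push @sites, [$i + 1, $mm] if $mm <= $max_mm;
    }
    return @sites;
}

for my $site (primer_sites('ACGTAC', 'TTACGTACGGACGAACTT', 1)) {
    printf "position %d, %d mismatch(es)\n", @$site;
}
```

For a whole genome this is slow compared with BLAST or an e-PCR tool, but it is exhaustive for the chosen mismatch limit.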



From torsten.seemann at infotech.monash.edu.au  Mon Dec 18 18:52:58 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Tue, 19 Dec 2006 10:52:58 +1100
Subject: [Bioperl-l] module to find out primer binding sites in a genome
 sequence
In-Reply-To: <A4D285B054CE4641A93F1B2046B2B3CD0761E4@mail01.sbri.org>
References: <A4D285B054CE4641A93F1B2046B2B3CD0761E4@mail01.sbri.org>
Message-ID: <458729DA.9030909@infotech.monash.edu.au>

Gowthaman Ramasamy wrote:
> Hi List,
> Is there any module in bioperl which can find out the primer binding sites in a genomic sequence.
> I am interested in finding locations with few mismatches along the primer...not just the exact match (which is a very trivial task)

This FAQ question may help:
http://www.bioperl.org/wiki/FAQ#How_do_I_do_motif_searches_with_BioPerl.3F_Can_I_do_.22find_all_sequences_that_are_75.25_identical.22_to_a_given_motif.3F

This software may help:
http://frodo.wi.mit.edu/cgi-bin/primer3/primer3_www.cgi

-- 
Dr Torsten Seemann               http://www.vicbioinformatics.com
Victorian Bioinformatics Consortium, Monash University, Australia


From sdavis2 at mail.nih.gov  Mon Dec 18 21:16:19 2006
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Mon, 18 Dec 2006 21:16:19 -0500
Subject: [Bioperl-l] module to find out primer binding sites in a genome
 sequence
In-Reply-To: <A4D285B054CE4641A93F1B2046B2B3CD0761E4@mail01.sbri.org>
References: <A4D285B054CE4641A93F1B2046B2B3CD0761E4@mail01.sbri.org>
Message-ID: <45874B73.7010600@mail.nih.gov>

Gowthaman Ramasamy wrote:
> Hi List,
> Is there any module in bioperl which can find out the primer binding sites in a genomic sequence.
> I am interested in finding locations with few mismatches along the primer...not just the exact match (which is a very trivial task)
>   

See here:

http://genome.ucsc.edu/cgi-bin/hgPcr?command=start

It is designed for exactly this task, is very fast, is available as an 
executable or web-based (though watch the usage requirements), and the 
output can be parsed rather easily.

Sean


From cjfields at uiuc.edu  Mon Dec 18 21:30:04 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 18 Dec 2006 20:30:04 -0600
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on
	110	species
In-Reply-To: <458723D3.4010908@sendu.me.uk>
References: <F5C5C9A8-D444-4994-9769-AC5DE68F4A39@lsi.upc.edu>	<68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu>	<4577E4A2.5090303@sendu.me.uk>	<B290BEF7-81D6-4C0A-9EDA-348B8A75EEC8@lsi.upc.edu>	<4577EAAF.7030509@sendu.me.uk>	<0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu>	<4577EFD3.7090904@sendu.me.uk>	<250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu>
	<C91DCC7B-E368-475D-B83A-AC301A49624B@uiuc.edu>
	<4586E8C4.6030306@sendu.me.uk>
	<63C1DC7D-2830-436A-BE95-7ECE3748C84D@uiuc.edu>
	<45871B23.8070103@sendu.me.uk>
	<CE808784-8068-44C5-82A8-BE852890E4DF@uiuc.edu>
	<458723D3.4010908@sendu.me.uk>
Message-ID: <2638D8ED-A3B3-4EF8-978E-216C5F875D88@uiuc.edu>

>> I think you misunderstood me.  The tree is fine; the data used to make
>> the tree (NCBI taxonomy) is the issue.
>
> In what way is it the issue? The database is also fine as far as I can
> see, in so far as it is not causing any problems in this instance.

I should maybe have clarified a bit more: what I said has nothing to  
do with the structure of the database itself.  I was just pointing  
out that NCBI Taxonomy isn't the best source of data for building a  
phylogenetic tree, something NCBI also points out (the link in my  
last post).  Not a big deal, really.

> Gabriel asked for a tree featuring a species and its subspecies. The
> NCBI taxonomy database provided Bioperl the correct data to build such a
> tree. Then Gabriel asked to remove the degree one nodes of his tree. His
> problem was that doing that happened to (correctly) remove the species
> node. If he wants to see both his species and his subspecies he must
> either not remove degree one nodes, or alter the method of doing so to
> keep desired nodes. There is no possible way for NCBI to improve matters
> here.

Actually, there isn't any way they could without digging through the
literature in many cases.  The problem is incomplete taxonomic
information for nodes derived from older sequence data, where a genus
and species were designated but nothing else (strain, etc.) is known.

Again, I merely was pointing out what I had mentioned above.  I  
wasn't criticizing you, Gabriel, or the methodology here.  Honest!

chris


From avilella at gmail.com  Mon Dec 18 16:43:27 2006
From: avilella at gmail.com (Albert Vilella)
Date: Mon, 18 Dec 2006 21:43:27 +0000
Subject: [Bioperl-l] PAML files
In-Reply-To: <4586F3A2.4010607@ub.edu>
References: <4586BF57.4090002@ub.edu>
	<34C4970D-6F93-4CE4-878C-5FA4C916AAEC@bioperl.org>
	<4586F3A2.4010607@ub.edu>
Message-ID: <358f4d650612181343o5bd51169w7b46cceb34a5c92b@mail.gmail.com>

Filipe, if you need to create the ctl file but not run the job, you
can use the prepare() method of the bioperl-run Codeml module.

Also, there are tmpdir and save_tempfiles methods that will keep the
files where you want them. As for the naming, you can add ".tree" and
".aln" extensions to the temporary file names if you want, by altering
the $temptreefile and $tempseqfile variables in
bioperl-run/Bio/Tools/Run/Phylo/PAML/Codeml.pm (cvs head version).

If you want, you can also add a couple of getters/setters there:

sub alnfilename{
    my $self = shift;

    return $self->{'alnfilename'} = shift if @_;
    return $self->{'alnfilename'};
}

and substitute $self->{'alnfilename'} for $tempseqfile in the
corresponding IO calls:

$codeml->alnfilename("/path/name");
$codeml->prepare;
...
$codeml->run;

What I usually do is keep the aln and tree files in a different
place. Codeml will create its temporary files for the run somewhere
else, and then delete all of them when done.
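For Filipe's three-topology case, one possible route is to write the control files yourself and run codeml on each. A minimal sketch, assuming codeml's `key = value` control-file layout; the file names, `write_ctl_files`, and the `runmode` value are placeholders rather than recommended settings, and this bypasses the Codeml module entirely.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Write one codeml control (.ctl) file per tree topology into $dir, all sharing
# the same alignment and settings, in codeml's "key = value" layout.
# The settings passed in below are placeholders, not recommended values.
sub write_ctl_files {
    my ($dir, $settings, @treefiles) = @_;
    my @ctl_files;
    for my $n (1 .. @treefiles) {
        my %ctl = (
            %$settings,
            treefile => $treefiles[$n - 1],
            outfile  => "mlc_tree$n",    # keep each run's results separate
        );
        my $ctl_file = File::Spec->catfile($dir, "codeml_tree$n.ctl");
        open my $fh, '>', $ctl_file or die "Cannot write $ctl_file: $!";
        printf {$fh} "%12s = %s\n", $_, $ctl{$_} for sort keys %ctl;
        close $fh or die "Cannot close $ctl_file: $!";
        push @ctl_files, $ctl_file;
    }
    return @ctl_files;
}

my $dir  = tempdir(CLEANUP => 1);
my @ctls = write_ctl_files($dir, { seqfile => 'alignment.phy', runmode => 0 },
                           'tree1.nwk', 'tree2.nwk', 'tree3.nwk');
print "$_\n" for @ctls;
# Each file could then be run in batch, e.g.: system('codeml', $_) for @ctls;
```

Relative paths such as `alignment.phy` are resolved by codeml against its working directory, so run it from wherever those files live.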

Cheers,

    Albert.

On 12/18/06, Filipe Garrett <fgarret at ub.edu> wrote:
>
> Hi Jason,
>
> This question is related to the one I asked earlier today.
> I need to run codeml with 3 tree topologies. I looked at the codeml module
> but it only accepts one tree as input, so I thought of using the codeml
> module to prepare all the files and then I would just have to run
> codeml with each new tree file in batch. But for that I need to know
> which one is the ctl file.
>
> any idea?
> FG
>
> Jason Stajich wrote:
> > They are temporary names, so they are deliberately random and there is no
> > intention of you needing them after a run since they are cleaned up on
> > the fly. We use an internal method for generating tempfiles that takes
> > care of cleanup afterwards.  I suppose since we do all the work within a
> > temp directory that is cleaned up, one could have a fixed name for the
> > tree, alignment, and ctl files but honestly we never expect people to be
> > reading these filenames as they are intended to be transient.
> >
> > What problem are you having that you need the filename?
> >
> > -jason
> > On Dec 18, 2006, at 11:18 AM, Filipe Garrett wrote:
> >
> >> Hi all,
> >>
> >> does anyone know how to get the name of the .ctl file created by the
> >> PAML module? Inside the tmp directory there are 2 files with random
> >> names (tree and ctl). Why do they have random names?? Wouldn't it be
> >> easier to assign them a fixed name?? For instance "codeml.ctl" and
> >> "tree.nwk"??
> >>
> >> thanks in adv,
> >> FG
> >
> > --
> > Jason Stajich
> > jason at bioperl.org
> > http://jason.open-bio.org/
> >
> >
>


From valiente at lsi.upc.edu  Mon Dec 18 23:18:20 2006
From: valiente at lsi.upc.edu (Gabriel Valiente)
Date: Tue, 19 Dec 2006 13:18:20 +0900
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on
	110	species
In-Reply-To: <2638D8ED-A3B3-4EF8-978E-216C5F875D88@uiuc.edu>
References: <F5C5C9A8-D444-4994-9769-AC5DE68F4A39@lsi.upc.edu>	<68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu>	<4577E4A2.5090303@sendu.me.uk>	<B290BEF7-81D6-4C0A-9EDA-348B8A75EEC8@lsi.upc.edu>	<4577EAAF.7030509@sendu.me.uk>	<0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu>	<4577EFD3.7090904@sendu.me.uk>	<250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu>
	<C91DCC7B-E368-475D-B83A-AC301A49624B@uiuc.edu>
	<4586E8C4.6030306@sendu.me.uk>
	<63C1DC7D-2830-436A-BE95-7ECE3748C84D@uiuc.edu>
	<45871B23.8070103@sendu.me.uk>
	<CE808784-8068-44C5-82A8-BE852890E4DF@uiuc.edu>
	<458723D3.4010908@sendu.me.uk>
	<2638D8ED-A3B3-4EF8-978E-216C5F875D88@uiuc.edu>
Message-ID: <287263A7-A84A-413E-AA9D-9258261A90C1@lsi.upc.edu>

Thanks a lot for the prompt answer and follow-up discussion. I think
this turned out not to be a bug in the merge_lineage() code but a
direct consequence of building a phylogenetic tree instead of a
taxonomic tree, i.e. one with internal node labels.

In order to reconstruct the NCBI taxonomy for the set of species  
present in a given phylogenetic tree, the only reasonable work-around  
seems to be a first step of merging lineages and contracting linear  
paths with the current implementation, followed by a second step of  
restricting the given phylogenetic tree to the set of species present  
in the obtained NCBI taxonomy. But this does not affect the code for  
merge_lineage().
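The second step described here, restricting a tree to a chosen set of species, can be sketched in plain Perl. This is a hypothetical illustration, not existing BioPerl code: the tree is a hash of parent => [children], `restrict_tree` is an invented name, and only leaves are matched against the wanted set.

```perl
use strict;
use warnings;

# Hypothetical sketch: restrict a rooted tree (parent => [children]) to the
# leaves named in %$wanted. Unwanted leaves are pruned, nodes left with no
# children are dropped, and nodes left with one child are spliced out.
# Fills %$new with the restricted tree and returns the name of its root
# (undef if no wanted leaf remains).
sub restrict_tree {
    my ($children, $node, $wanted, $new) = @_;
    my @kids = @{ $children->{$node} || [] };
    unless (@kids) {                          # a leaf: keep it only if wanted
        return $wanted->{$node} ? $node : undef;
    }
    my @kept = grep { defined } map { restrict_tree($children, $_, $wanted, $new) } @kids;
    return undef    if !@kept;                # nothing wanted below this node
    return $kept[0] if @kept == 1;            # splice out a single-child node
    $new->{$node} = \@kept;
    return $node;
}

# A toy phylogenetic tree whose leaves are species
my %phylo = (
    root => ['n1', 'n2'],
    n1   => ['A', 'B'],
    n2   => ['C', 'D'],
);
my %wanted = map { $_ => 1 } qw(A B C);    # e.g. species found in the NCBI taxonomy
my %restricted;
my $new_root = restrict_tree(\%phylo, 'root', \%wanted, \%restricted);
print "$_ => [@{ $restricted{$_} }]\n" for sort keys %restricted;
```

Here leaf D is pruned, so its parent n2 is left with a single child and is spliced out, leaving C attached directly to the root.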

Gabriel



From cjfields at uiuc.edu  Mon Dec 18 23:41:16 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 18 Dec 2006 22:41:16 -0600
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on
	110	species
In-Reply-To: <287263A7-A84A-413E-AA9D-9258261A90C1@lsi.upc.edu>
References: <F5C5C9A8-D444-4994-9769-AC5DE68F4A39@lsi.upc.edu>	<68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu>	<4577E4A2.5090303@sendu.me.uk>	<B290BEF7-81D6-4C0A-9EDA-348B8A75EEC8@lsi.upc.edu>	<4577EAAF.7030509@sendu.me.uk>	<0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu>	<4577EFD3.7090904@sendu.me.uk>	<250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu>
	<C91DCC7B-E368-475D-B83A-AC301A49624B@uiuc.edu>
	<4586E8C4.6030306@sendu.me.uk>
	<63C1DC7D-2830-436A-BE95-7ECE3748C84D@uiuc.edu>
	<45871B23.8070103@sendu.me.uk>
	<CE808784-8068-44C5-82A8-BE852890E4DF@uiuc.edu>
	<458723D3.4010908@sendu.me.uk>
	<2638D8ED-A3B3-4EF8-978E-216C5F875D88@uiuc.edu>
	<287263A7-A84A-413E-AA9D-9258261A90C1@lsi.upc.edu>
Message-ID: <D72C19DB-B551-414E-96AF-113B32A34BCB@uiuc.edu>


On Dec 18, 2006, at 10:18 PM, Gabriel Valiente wrote:

> Thanks a lot for the prompt answer and follow-up discussion. I  
> think this turned out not to be a bug in the merge_lineage() code  
> but a direct consequence of building a phylogenetic tree instead of  
> a taxonomic tree, aka with internal node labels.
>
> In order to reconstruct the NCBI taxonomy for the set of species  
> present in a given phylogenetic tree, the only reasonable work-around
> seems to be a first step of merging lineages and contracting
> linear paths with the current implementation, followed by a second  
> step of restricting the given phylogenetic tree to the set of  
> species present in the obtained NCBI taxonomy. But this does not  
> affect the code for merge_lineage().
>
> Gabriel

I did notice one thing, though it's minor: if you use the option to
retrieve the data from Entrez, a couple of species aren't found (even
though they show up in a local taxonomy search).  I think both were
E. coli strains.

chris


From DGroskreutz at twt.com  Tue Dec 19 02:00:40 2006
From: DGroskreutz at twt.com (DGroskreutz at twt.com)
Date: Tue, 19 Dec 2006 01:00:40 -0600
Subject: [Bioperl-l] CN=Deb Groskreutz/OU=MSN/O=TWT is out of the office.
Message-ID: <OFEB7AC000.56E72ED8-ON86257249.002683B4-86257249.002683B4@twt.com>


I will be out of the office starting  12/18/2006 and will not return until
01/02/2007.


NOTICE OF CONFIDENTIALITY:
The information contained in this communication, including attachments, is intended for the specific delivery to and use by the individual(s) to whom it is addressed. This email includes confidential information that may be attorney-client privileged. Any review, retransmission, dissemination, or unauthorized use of this communication is strictly prohibited and may be unlawful. If you have received this communication in error, please reply to the sender immediately and delete the original communication and any copy of it from your computer system, including all attachments.


From michael.watson at bbsrc.ac.uk  Tue Dec 19 07:20:56 2006
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Tue, 19 Dec 2006 12:20:56 -0000
Subject: [Bioperl-l] Problems with EMBL entries and fasta IDs?
Message-ID: <8975119BCD0AC5419D61A9CF1A923E9503E2E67F@iahce2ksrv1.iah.bbsrc.ac.uk>

Hi

I'm using bioperl-1.4.  I did a Google search for this but couldn't
find anything.  If this is fixed in 1.5.2 then forgive me.

I'm getting a warning:

MSG: No whitespace allowed in FASTA ID [unknown id]

When trying to convert from EMBL format to fasta.  The offending
sequence is CK234114:

ID   CK234114; SV 1; linear; mRNA; EST; VRT; 244 BP.
XX
AC   CK234114;
XX
DT   03-MAR-2004 (Rel. 79, Created)
DT   03-MAR-2004 (Rel. 79, Last updated, Version 1)
XX
DE   SB010002000A01 JUWNL1 Normalized Zebra Finch Juvenile Telencephalon cDNA
DE   Library SB01 Taeniopygia guttata cDNA clone SB010002000A01 5', mRNA
DE   sequence.
Etc

Any advice?

Mick

The information contained in this message may be confidential or legally
privileged and is intended solely for the addressee. If you have
received this message in error please delete it & notify the originator
immediately.
Unauthorised use, disclosure, copying or alteration of this message is
forbidden & may be unlawful. 
The contents of this e-mail are the views of the sender and do not
necessarily represent the views of the Institute. 
This email and associated attachments has been checked locally for
viruses but we can accept no responsibility once it has left our
systems.
Communications on Institute computers are monitored to secure the
effective operation of the systems and for other lawful purposes. 


From michael.watson at bbsrc.ac.uk  Tue Dec 19 07:27:59 2006
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Tue, 19 Dec 2006 12:27:59 -0000
Subject: [Bioperl-l] Problems with EMBL entries and fasta IDs?
In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E9503E2E67F@iahce2ksrv1.iah.bbsrc.ac.uk>
Message-ID: <8975119BCD0AC5419D61A9CF1A923E9503E2E682@iahce2ksrv1.iah.bbsrc.ac.uk>

Sorry, problem solved.

Mick 



From roest216 at student.otago.ac.nz  Tue Dec 19 04:15:55 2006
From: roest216 at student.otago.ac.nz (Stephan Roessner)
Date: Tue, 19 Dec 2006 22:15:55 +1300
Subject: [Bioperl-l] problems installing bioperl
Message-ID: <1166519755.4587adcb141d3@www.studentmail.otago.ac.nz>

Dear support team,

I installed bioperl 1.5.2_100 on a Fedora machine to be able to use
gbrowse.
The installation seems to work (apart from the test failures) but the
gbrowse installation tells me that BioPerl 1.0050021 is installed, but
of course it requires 1.52.

Is there any way to find out what went wrong?

thanks a lot,
Stephan


From bix at sendu.me.uk  Tue Dec 19 10:12:39 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 19 Dec 2006 15:12:39 +0000
Subject: [Bioperl-l] problems installing bioperl
In-Reply-To: <1166519755.4587adcb141d3@www.studentmail.otago.ac.nz>
References: <1166519755.4587adcb141d3@www.studentmail.otago.ac.nz>
Message-ID: <45880167.9010605@sendu.me.uk>

Stephan Roessner wrote:
> Dear support team,
> 
> I installed bioperl 1.5.2_100 on a Fedora machine to be able to use
> gbrowse.
> The installation seems to work (apart from the test failures) but the
> gbrowse installation tells me that BioPerl 1.0050021 is installed, but
> of course it requires 1.52.
> 
> Is there any way to find out what went wrong?

Nothing went wrong with the Bioperl installation (well, except that there 
shouldn't have been any test failures - can you post those please?). 
gbrowse simply defined its Bioperl requirement incorrectly. If you tell 
me exactly where you downloaded gbrowse from and how you went about 
installing it, and provide the exact, complete error message you got 
from it, I might be able to help the authors fix the problem.

Or I'm pretty sure they can figure it out for themselves :)


From cjfields at uiuc.edu  Tue Dec 19 11:05:01 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 19 Dec 2006 10:05:01 -0600
Subject: [Bioperl-l] [Gmod-gbrowse]  problems installing bioperl
In-Reply-To: <1166542310.6981.119.camel@localhost.localdomain>
References: <1166519755.4587adcb141d3@www.studentmail.otago.ac.nz>
	<45880167.9010605@sendu.me.uk>
	<1166542310.6981.119.camel@localhost.localdomain>
Message-ID: <8D5C45A3-A90A-49D7-A7E7-888C977759AC@uiuc.edu>


On Dec 19, 2006, at 9:31 AM, Scott Cain wrote:

> I really don't think the BioPerl version detection is wrong.  I actually
> don't check Bio::Root::Version::VERSION in Makefile.PL, I check
> Bio::Graphics::Panel->api_version.  When it doesn't find the correct
> api_version, it gives a warning that BioPerl 1.5.2 is not installed.  I
> have seen this happen when more than one BioPerl instance is installed
> and `perl Makefile.PL` finds the wrong one first.  My suggestion is to
> try reinstalling BioPerl and providing the --uninst 1 argument to remove
> older versions of BioPerl:
>
>   sudo ./Build install --uninst 1
>
> Scott

Could having two Bioperl instances explain the test failures?  I'm  
not sure (maybe Sendu can answer this), but I would assume  
Module::Build uses the current working directory for test runs.

chris


From bix at sendu.me.uk  Tue Dec 19 12:02:34 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 19 Dec 2006 17:02:34 +0000
Subject: [Bioperl-l] [Gmod-gbrowse]  problems installing bioperl
In-Reply-To: <8D5C45A3-A90A-49D7-A7E7-888C977759AC@uiuc.edu>
References: <1166519755.4587adcb141d3@www.studentmail.otago.ac.nz>
	<45880167.9010605@sendu.me.uk>
	<1166542310.6981.119.camel@localhost.localdomain>
	<8D5C45A3-A90A-49D7-A7E7-888C977759AC@uiuc.edu>
Message-ID: <45881B2A.8060907@sendu.me.uk>

Chris Fields wrote:
> 
> On Dec 19, 2006, at 9:31 AM, Scott Cain wrote:
> 
>> I really don't think the BioPerl version detection is wrong.  I actually
>> don't check Bio::Root::Version::VERSION in Makefile.PL, I check
>> Bio::Graphics::Panel->api_version.  When it doesn't find the correct
>> api_version, it gives a warning that BioPerl 1.5.2 is not installed.  I
>> have seen this happen when more than one BioPerl instance is installed
>> and `perl Makefile.PL` finds the wrong one first.  My suggestion is to
>> try reinstalling BioPerl and providing the --uninst 1 argument to remove
>> older versions of BioPerl:
>>
>>   sudo ./Build install --uninst 1
>>
>> Scott
> 
> Could having two Bioperl instances explain the test failures?  I'm not 
> sure (maybe Sendu can answer this), but I would assume Module::Build 
> uses the current working directory for test runs.

It does, so that shouldn't be an issue for the test failures.


From ferraria at gmail.com  Tue Dec 19 11:40:05 2006
From: ferraria at gmail.com (Anthony Ferrari)
Date: Tue, 19 Dec 2006 17:40:05 +0100
Subject: [Bioperl-l] Problem with : EUtilities - Proxy
Message-ID: <b2ec54b90612190840r24fe1aa5ncb9c9def040aed49@mail.gmail.com>

Hi all,

I've just installed BioPerl 1.5.2 (devel) on a Linux Mandrake machine using
the CPAN shell.
I want to use Bio::DB::EUtilities to retrieve data (IDs) from the NCBI
'gene' database (the first step of my pipeline).

But the installation of this package doesn't seem to be correct:
the simple example given in the documentation doesn't work (this one:
http://doc.bioperl.org/bioperl-live/Bio/DB/EUtilities.html#SYNOPSIS).

Here is the error message I got:
"Can't use an undefined value as an ARRAY reference at
/usr/lib/perl5/site_perl/5.8.7/LWP/UserAgent.pm line 779."

In the UserAgent package, line 779 is in the private "_need_proxy"
subroutine and corresponds to the code:    ...if (@{ $self->{'no_proxy'} })
...

If I comment out this line in the UserAgent package and the corresponding "}",
the example works. But obviously, I would prefer to solve the problem
properly :)

My computer accesses the internet via an HTTP proxy, and I never told
BioPerl about it.
As I read on the BioPerl Wiki, I tried setting an $http_proxy
environment variable, but it still doesn't work.

One last, possibly important, piece of information: during the installation
I saw that the 't/EUtilities' tests were skipped for an unstated reason.


So finally I have two questions:
1. Can anybody figure out what my problem is?
2. At the moment, is the Bio::DB::EUtilities package really effective, or
would using the NCBI eutilities directly with the LWP::Simple package be a
good alternative?

Many thanks in advance,
Best Regards,
Anthony Ferrari


From bix at sendu.me.uk  Tue Dec 19 12:06:03 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 19 Dec 2006 17:06:03 +0000
Subject: [Bioperl-l] problems installing bioperl
In-Reply-To: <1166542310.6981.119.camel@localhost.localdomain>
References: <1166519755.4587adcb141d3@www.studentmail.otago.ac.nz>	
	<45880167.9010605@sendu.me.uk>
	<1166542310.6981.119.camel@localhost.localdomain>
Message-ID: <45881BFB.7020008@sendu.me.uk>

Scott Cain wrote:
> I really don't think the BioPerl version detection is wrong.  I actually
> don't check Bio::Root::Version::VERSION in Makefile.PL, I check
> Bio::Graphics::Panel->api_version.  When it doesn't find the correct
> api_version, it gives a warning that BioPerl 1.5.2 is not installed.  I
> have seen this happen when more than one BioPerl instance is installed
> and `perl Makefile.PL` finds the wrong one first.

Yes, I saw that, which is why I thought I must be looking at something 
different to what the OP had installed.


> My suggestion is to try reinstalling BioPerl and providing the --uninst 1 argument to remove
> older versions of BioPerl:
> 
>   sudo ./Build install --uninst 1

My confusion is that he has definitely installed 1.5.2 and this version 
is being polled for its version number (by something!) and returning the 
correct '1.0050021', whilst the something expects '1.52'. Anyway, this 
can only be resolved if Stephan provides the real error message and its 
context.


From cjfields at uiuc.edu  Tue Dec 19 12:27:24 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 19 Dec 2006 11:27:24 -0600
Subject: [Bioperl-l] Problem with : EUtilities - Proxy
In-Reply-To: <b2ec54b90612190840r24fe1aa5ncb9c9def040aed49@mail.gmail.com>
References: <b2ec54b90612190840r24fe1aa5ncb9c9def040aed49@mail.gmail.com>
Message-ID: <6365ACFD-7F5A-4EF1-97EA-BB53A58B9B4D@uiuc.edu>


On Dec 19, 2006, at 10:40 AM, Anthony Ferrari wrote:

> Hi all,
>
> I've just installed BioPerl 1.5.2 (devel) on a Linux Mandrake machine
> using the CPAN shell.
> I want to use Bio::DB::EUtilities to retrieve data (IDs) from the NCBI
> 'gene' database (the first step of my pipeline).
>
> But the installation of this package doesn't seem to be correct:
> the simple example given in the documentation doesn't work (this one:
> http://doc.bioperl.org/bioperl-live/Bio/DB/EUtilities.html#SYNOPSIS).
>
> Here is the error message I got:
> "Can't use an undefined value as an ARRAY reference at
> /usr/lib/perl5/site_perl/5.8.7/LWP/UserAgent.pm line 779."
>
> In the UserAgent package, line 779 is in the private "_need_proxy"
> subroutine and corresponds to the code:    ...if (@{ $self->{'no_proxy'} })
> ...
>
> If I comment out this line in the UserAgent package and the corresponding
> "}", the example works. But obviously, I would prefer to solve the problem
> properly :)
>
> My computer accesses the internet via an HTTP proxy, and I never told
> BioPerl about it.
> As I read on the BioPerl Wiki, I tried setting an $http_proxy environment
> variable, but it still doesn't work.
>
> One last, possibly important, piece of information: during the
> installation I saw that the 't/EUtilities' tests were skipped for an
> unstated reason.
>
>
> So finally I have two questions:
> 1. Can anybody figure out what my problem is?
> 2. At the moment, is the Bio::DB::EUtilities package really effective, or
> would using the NCBI eutilities directly with the LWP::Simple package be a
> good alternative?
>
> Many thanks in advance,
> Best Regards,
> Anthony Ferrari

First things first: at the moment the BioPerl EUtilities interface is  
very experimental (as specifically outlined in the POD), so I can't  
really recommend it for production use until the API is cleaned up.   
However, I do appreciate any feedback or comments on EUtilities (which is  
why it's included in the 1.5.2 release).

You might check out this bug report, which relates directly to your  
issue:

http://bugzilla.open-bio.org/show_bug.cgi?id=2109

After I worked out the proxy issue Torsten got it working.  Let me  
know if this doesn't help or fix the problem.
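The bug report linked above has the actual fix. As a hedged illustration of the error message itself, the core-Perl sketch below reproduces it with a plain hash standing in for the user agent object. It assumes the cause is an object whose `no_proxy` slot was never set to an array reference (for example because the normal LWP::UserAgent constructor was bypassed); that is an assumption, not a diagnosis. The sub names are hypothetical, and the `|| []` guard shows the defensive idiom. With a real LWP::UserAgent, proxies are normally configured through its `env_proxy` or `proxy` methods rather than by editing UserAgent.pm.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-ins: plain hashes play the role of the user agent
# object and mimic only the shape of the failing check quoted above.
sub has_no_proxy_entries_unsafe {
    my ($self) = @_;
    # Under "use strict", this dies when $self->{no_proxy} is undef.
    return @{ $self->{'no_proxy'} } ? 1 : 0;
}

sub has_no_proxy_entries_safe {
    my ($self) = @_;
    # Defensive variant: a missing list is treated as an empty one.
    return @{ $self->{'no_proxy'} || [] } ? 1 : 0;
}

my $half_built  = {};                              # no_proxy never initialised
my $fully_built = { no_proxy => ['localhost'] };   # no_proxy is an array ref

my $ok = eval { has_no_proxy_entries_unsafe($half_built); 1 };
print $ok ? "no error\n" : "error: $@";
print 'safe check on half-built object: ', has_no_proxy_entries_safe($half_built), "\n";
print 'check on fully built object: ', has_no_proxy_entries_unsafe($fully_built), "\n";
```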

chris


From cain at cshl.edu  Tue Dec 19 10:31:50 2006
From: cain at cshl.edu (Scott Cain)
Date: Tue, 19 Dec 2006 10:31:50 -0500
Subject: [Bioperl-l] problems installing bioperl
In-Reply-To: <45880167.9010605@sendu.me.uk>
References: <1166519755.4587adcb141d3@www.studentmail.otago.ac.nz>
	<45880167.9010605@sendu.me.uk>
Message-ID: <1166542310.6981.119.camel@localhost.localdomain>

I really don't think the BioPerl version detection is wrong.  I actually
don't check Bio::Root::Version::VERSION in Makefile.PL, I check
Bio::Graphics::Panel->api_version.  When it doesn't find the correct
api_version, it gives a warning that BioPerl 1.5.2 is not installed.  I
have seen this happen when more than one BioPerl instance is installed
and `perl Makefile.PL` finds the wrong one first.  My suggestion is to
try reinstalling BioPerl and providing the --uninst 1 argument to remove
older versions of BioPerl:

  sudo ./Build install --uninst 1
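To check whether `perl Makefile.PL` could be picking up an older BioPerl first, one option is to list every copy of a module file on @INC in Perl's search order; the first path printed is the copy `use` would load. This is a minimal core-Perl sketch, and `find_module_copies` is a hypothetical helper, not part of BioPerl.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return every path under @dirs that holds $module,
# in the order given (Perl searches @INC in this order).
sub find_module_copies {
    my ($module, @dirs) = @_;
    my @parts = split /::/, $module;   # Bio::Root::Version -> (Bio, Root, Version)
    $parts[-1] .= '.pm';
    my @found;
    for my $dir (@dirs) {
        next if ref $dir;              # @INC may contain code refs; skip them
        my $path = File::Spec->catfile($dir, @parts);
        push @found, $path if -f $path;
    }
    return @found;
}

my @copies = find_module_copies('Bio::Root::Version', @INC);
print "No copy of Bio::Root::Version found on \@INC\n" unless @copies;
print "$_\n" for @copies;
```

Run it with the same `perl` that runs Makefile.PL. To see only the copy that is actually loaded, `perl -MBio::Root::Version -le 'print $INC{q{Bio/Root/Version.pm}}'` prints its path.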

Scott


On Tue, 2006-12-19 at 15:12 +0000, Sendu Bala wrote:
> Stephan Roessner wrote:
> > Dear support team,
> > 
> > I installed bioperl 1.5.2_100 on a Fedora machine to be able to use
> > gbrowse.
> > The installation seems to work (except for the test failures), but the
> > gbrowse installation tells me that BioPerl 1.0050021 is installed, while
> > of course it requires 1.52.
> > 
> > Is there any way to find out what went wrong?
> 
> Nothing went wrong with the Bioperl installation (well, except that there 
> shouldn't have been any test failures - can you post those please?). 
> gbrowse simply defined its Bioperl requirement incorrectly. If you tell 
> me exactly where you downloaded gbrowse from and how you went about 
> installing it, and provide the exact, complete error message you got 
> from it, I might be able to help the authors fix the problem.
> 
> Or I'm pretty sure they can figure it out for themselves :)
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From ferraria at gmail.com  Tue Dec 19 12:06:31 2006
From: ferraria at gmail.com (Anthony Ferrari)
Date: Tue, 19 Dec 2006 18:06:31 +0100
Subject: [Bioperl-l] Problem with : EUtilities - Proxy
In-Reply-To: <b2ec54b90612190840r24fe1aa5ncb9c9def040aed49@mail.gmail.com>
References: <b2ec54b90612190840r24fe1aa5ncb9c9def040aed49@mail.gmail.com>
Message-ID: <b2ec54b90612190906s2b4ddbf8g9b591372a85fdcd@mail.gmail.com>

Hi all,

I've just installed BioPerl 1.5.2 (devel) on a Linux Mandrake machine using
the CPAN shell.
I want to use Bio::DB::EUtilities to retrieve data (IDs) from the NCBI
'gene' database (the first step of my pipeline).

But the installation of this package doesn't seem to be correct:
the simple example given in the documentation doesn't work (this one:
http://doc.bioperl.org/bioperl-live/Bio/DB/EUtilities.html#SYNOPSIS).

Here is the error message I got:
"Can't use an undefined value as an ARRAY reference at
/usr/lib/perl5/site_perl/5.8.7/LWP/UserAgent.pm line 779."

In the UserAgent package, line 779 is in the private "_need_proxy"
subroutine and corresponds to the code:    ...if (@{ $self->{'no_proxy'} })
...

If I comment out this line in the UserAgent package and the corresponding "}",
the example works. But obviously, I would prefer to solve the problem
properly :)

My computer accesses the internet via an HTTP proxy, and I never told
BioPerl about it.
As I read on the BioPerl Wiki, I tried setting an $http_proxy
environment variable, but it still doesn't work.

One last, possibly important, piece of information: during the installation
I saw that the 't/EUtilities' tests were skipped for an unstated reason.


So finally I have two questions:
1. Can anybody figure out what my problem is?
2. At the moment, is the Bio::DB::EUtilities package really effective, or
would using the NCBI eutilities directly with the LWP::Simple package be a
good alternative?

Many thanks in advance,
Best Regards,
Anthony Ferrari


From stewarta at nmrc.navy.mil  Tue Dec 19 13:49:57 2006
From: stewarta at nmrc.navy.mil (Andrew Stewart)
Date: Tue, 19 Dec 2006 13:49:57 -0500
Subject: [Bioperl-l] Bio::Tools::Glimmer for glimmer2/3
Message-ID: <4FDC0EAE-0E93-42A6-AFCA-2B2DFB6F7E8D@nmrc.navy.mil>

I see that Bio::Tools::Glimmer documentation clearly states that this  
module is intended for use with GlimmerM (eukaryotic version) only.   
I am wondering if anyone can recall any talk about adapting  
Bio::Tools::Glimmer for Glimmer2 / Glimmer3 (prokaryotic version)?   
I've searched the list history with little luck, other than finding someone  
else asking a similar question.

If not, does anyone have any thoughts on how difficult it might be to  
implement support for glimmer2/3 result parsing?  Perhaps just a  
matter of editing the _parse_predictions method?


--
Andrew Stewart
Research Assistant, Genomics Team
Navy Medical Research Center (NMRC)
Biological Defense Research Directorate (BDRD)
BDRD Annex
12300 Washington Avenue, 2nd Floor
Rockville, MD 20852

email: stewarta at nmrc.navy.mil
phone: 301-231-6700 Ext 270


From rvosa at sfu.ca  Tue Dec 19 13:53:47 2006
From: rvosa at sfu.ca (Rutger Vos)
Date: Tue, 19 Dec 2006 10:53:47 -0800
Subject: [Bioperl-l] problems installing bioperl
Message-ID: <200612191853.kBJIrlW3026344@rm-rstar.sfu.ca>

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061219/276348b7/attachment-0001.pl>

From cjfields at uiuc.edu  Tue Dec 19 14:31:17 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 19 Dec 2006 13:31:17 -0600
Subject: [Bioperl-l] Bio::Tools::Glimmer for glimmer2/3
In-Reply-To: <4FDC0EAE-0E93-42A6-AFCA-2B2DFB6F7E8D@nmrc.navy.mil>
References: <4FDC0EAE-0E93-42A6-AFCA-2B2DFB6F7E8D@nmrc.navy.mil>
Message-ID: <71E04575-DFD2-4F5A-B268-493D3246CBFA@uiuc.edu>


On Dec 19, 2006, at 12:49 PM, Andrew Stewart wrote:

> I see that Bio::Tools::Glimmer documentation clearly states that this
> module is intended for use with GlimmerM (eukaryotic version) only.
> I am wondering if anyone can recall any talk about adapting
> Bio::Tools::Glimmer for Glimmer2 / Glimmer3 (prokaryotic version)?
> I've searched the list history with little luck, other than finding someone
> else asking a similar question.

There is a thread here:

http://thread.gmane.org/gmane.comp.lang.perl.bio.general/12546/focus=12546

> If not, does anyone have any thoughts on how difficult it might be to
> implement support for glimmer2/3 result parsing?  Perhaps just a
> matter of editing the _parse_predictions method?

It depends on how different the various Glimmer formats are; I'll  
have to look at the ones Torsten added in CVS.  You could always try  
modifying Bio::Tools::Glimmer to parse Glimmer2/3 and GlimmerM  
reports, but based on the mail list thread above it may not be so  
straightforward.
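As a rough gauge of the parsing effort involved, here is a standalone sketch for Glimmer3 `.predict` output. The column layout it expects (a `>name` header, then ORF id, start, end, reading frame and score per line, with start greater than end on the reverse strand) is an assumption that should be checked against real Glimmer3 files, and `parse_glimmer3_predict` is a hypothetical sub, not part of Bio::Tools::Glimmer.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical parser for Glimmer3 ".predict" files. ASSUMED layout:
#   >sequence_name
#   orf00001      108      809  +3     4.51
#   orf00003     1195      908  -2     2.96
# i.e. ORF id, start, end, reading frame, score. Genes that wrap around the
# origin of a circular genome are not handled.
sub parse_glimmer3_predict {
    my ($fh) = @_;
    my ($seq_id, @genes);
    while (my $line = <$fh>) {
        chomp $line;
        next unless $line =~ /\S/;                       # skip blank lines
        if ($line =~ /^>(\S+)/) { $seq_id = $1; next }   # new sequence header
        my ($orf, $from, $to, $frame, $score) = split ' ', $line;
        my $strand = $frame =~ /^-/ ? -1 : 1;
        # Report coordinates low-to-high, keeping the strand separately.
        my ($start, $end) = $strand == 1 ? ($from, $to) : ($to, $from);
        push @genes, { seq_id => $seq_id, id => $orf, start => $start,
                       end => $end, strand => $strand, frame => $frame,
                       score => $score };
    }
    return \@genes;
}

my $example = join "\n", '>contig1',
    'orf00001      108      809  +3     4.51',
    'orf00003     1195      908  -2     2.96', '';
open my $fh, '<', \$example or die "Cannot open in-memory file: $!";
for my $gene (@{ parse_glimmer3_predict($fh) }) {
    printf "%s %s %d-%d strand %+d score %s\n",
        @{$gene}{qw(seq_id id start end strand score)};
}
```

Turning each hash into a Bio::SeqFeature::Generic would be the remaining step if this were folded into a Bio::Tools parser.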

chris


From MEC at stowers-institute.org  Tue Dec 19 14:57:48 2006
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Tue, 19 Dec 2006 13:57:48 -0600
Subject: [Bioperl-l] bp_seqfeature_load /
	Bio::DB::SeqFeature::Store::GFF3Loader problems augmenting
	Flybase annotation
Message-ID: <CED81D34E37D5043A1211565277A51E506E06492@exchkc02.stowers-institute.org>

Lincoln and fellow Bio::DB::SeqFeature travelers,

I find that using bp_seqfeature_load.PLS to load subfeatures of genes
already loaded using bp_seqfeature_load.PLS fails with 

------------- EXCEPTION  -------------
MSG: FBgn0017545 doesn't have a primary id
STACK
Bio::DB::SeqFeature::Store::GFF3Loader::build_object_tree_in_tables
/home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:682
STACK Bio::DB::SeqFeature::Store::GFF3Loader::build_object_tree
/home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:663
STACK Bio::DB::SeqFeature::Store::GFF3Loader::finish_load
/home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:372
STACK Bio::DB::SeqFeature::Store::GFF3Loader::load_fh
/home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:345
STACK Bio::DB::SeqFeature::Store::GFF3Loader::load
/home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:242
STACK toplevel
/home/mec/cvs/bioperl-live/scripts/Bio-SeqFeature-Store/bp_seqfeature_load.PLS:76

Where FBgn0017545 is the ID of a gene previously loaded.

I am unsure how to remedy my situation and welcome any advice on a correct
or improved approach to my problem.

Here's some detail if it helps.  I am developing a pipeline to design
microarray probes capable of distinguishing among splice variants in
Drosophila (using the latest Flybase dmel_r5.1 annotation).  So I

1) load a filtered selection of Flybase annotation using
bp_seqfeature_load.  (for testing purposes, I am using a single gene's
worth of annotation, FBgn0017545.gff, attached).  This is done as
follows:

	> bp_seqfeature_load.PLS  --create FBgn0017545.gff 

2) analyze all the genes in the database, and create GFF3 output in which
each feature has a 'Parent' that is a previously loaded gene (i.e.
FBgn0017545).  (These features represent the unique introns, splice
sites, and exonic design targets. Output of this analysis,
FBgn0017545_matd.gff, is also attached)

3) load these analysis results into the same database, as follows:

	> bp_seqfeature_load.PLS          FBgn0017545_matd.gff

It is at this point that I get the above error.

However, I don't get any error and the data loads fine if I load the two
files together, as follows:

	> bp_seqfeature_load.PLS --create <(cat FBgn0017545.gff FBgn0017545_matd.gff)

So, I suspect that either I am misunderstanding when/how to use
bp_seqfeature_load.PLS or else this use case has not yet arisen and must
be provided for somehow.

I am running against bioperl-live

Thanks for your thoughts and assistance,

Malcolm Cook
Database Applications Manager - Bioinformatics
Stowers Institute for Medical Research - Kansas City, Missouri
 

From Kevin.M.Brown at asu.edu  Tue Dec 19 16:46:19 2006
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Tue, 19 Dec 2006 14:46:19 -0700
Subject: [Bioperl-l] Bio::SimpleAlign
Message-ID: <1A4207F8295607498283FE9E93B775B40270F4E9@EX02.asurite.ad.asu.edu>

I'm working on a script that plays around with alignments of sequences
and one of the things I noticed is that the code for the match method
does not seem to actually use the start/end information when creating
the match between objects in the alignment.  Maybe I'm misunderstanding
what the alignment is supposed to hold in terms of sequence.  The
alignment objects I build up are based on the sequence of a gene and the
sequences of the primers that amplify that gene.

$alignments{$gene->id()}->add_seq(
				new Bio::LocatableSeq(
				-seq   => $seq[0]->seq(),
				-id    => $seq[0]->id(),
				-start => $start,
				-end => $start + $seq[0]->length() - 1,
				-strand => 1
			 )
);
$alignments{$gene->id()}->add_seq(
				new Bio::LocatableSeq(
				-seq   => $seq[1]->seq(),
				-id    => $seq[1]->id(),
				-start => $stop,
				-end => $stop + $seq[1]->length() - 1,
				-strand => -1
				)
);

So, you can see I input a start and stop point for the primer, but when
I use the match function all it does is match the first character of the
gene sequence to the first char of the primer sequences, then the second
gene char to the second in each primer, etc...  This doesn't seem to fit
with the documentation, and it seems odd that there would be holders for the
start/stop points and not use them when doing things like matching of
sequences in an alignment.


From bix at sendu.me.uk  Tue Dec 19 17:01:22 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 19 Dec 2006 22:01:22 +0000
Subject: [Bioperl-l] problems installing bioperl
In-Reply-To: <200612191853.kBJIrlW3026344@rm-rstar.sfu.ca>
References: <200612191853.kBJIrlW3026344@rm-rstar.sfu.ca>
Message-ID: <45886132.7050505@sendu.me.uk>

Rutger Vos wrote:
> Aren't 1.5.2_100 and 1.0050021 supposed to be equivalent in this weird
> version-string-translation way that makes 5.5 and 5.005 equivalent also?

Yes, 1.5.2_100 and 1.0050021 are equivalent. The equivalent of 5.5 is 
5.500 however.
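A minimal sketch of the dotted-to-decimal mapping discussed here, assuming the convention implied by the 1.5.2_100 to 1.0050021 example: every component after the first is zero-padded to three digits, and a `_NNN` developer suffix is appended. `dotted_to_decimal` is a hypothetical helper, not a BioPerl or version.pm function.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper. Assumed convention: every component after the first
# is zero-padded to three digits, and a "_NNN" developer suffix is appended.
sub dotted_to_decimal {
    my ($version) = @_;
    $version =~ s/^v//;                          # tolerate a leading "v"
    my ($main, $dev) = split /_/, $version, 2;
    my ($major, @rest) = split /\./, $main;
    my $decimal = $major . '.' . join('', map { sprintf '%03d', $_ } @rest);
    $decimal .= $dev if defined $dev;
    return $decimal + 0;                         # numify: trailing zeros drop off
}

print dotted_to_decimal('1.5.2_100'), "\n";      # prints 1.0050021
print dotted_to_decimal('v5.5'), "\n";           # prints 5.005
```

The padding only applies to dotted strings. A version that is already decimal, such as 5.5, stays 5.5 (that is, 5.500), while the v-string v5.5 corresponds to 5.005.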


From lstein at cshl.edu  Tue Dec 19 16:58:24 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Tue, 19 Dec 2006 16:58:24 -0500
Subject: [Bioperl-l] bp_seqfeature_load /
	Bio::DB::SeqFeature::Store::GFF3Loader problems augmenting
	Flybase annotation
In-Reply-To: <CED81D34E37D5043A1211565277A51E506E06492@exchkc02.stowers-institute.org>
References: <CED81D34E37D5043A1211565277A51E506E06492@exchkc02.stowers-institute.org>
Message-ID: <6dce9a0b0612191358t4764bfe0g601cd22d09132e55@mail.gmail.com>

Hi Malcolm,

Your second guess was right. The use case of augmenting an existing gene
with additional splice forms isn't provided for. You can get the
functionality by making direct calls to Bio::DB::SeqFeature::Store methods:

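use strict;
use warnings;
use Bio::DB::SeqFeature::Store;
use Bio::DB::SeqFeature;

# Open the existing database. The adaptor and DSN are placeholders:
# substitute the ones used when the annotation was loaded.
my $db = Bio::DB::SeqFeature::Store->new(-adaptor => 'DBI::mysql',
                                         -dsn     => 'dbi:mysql:test');

my @genes = $db->get_features_by_name('FBgn0017545');
@genes == 1 or die "Didn't get exactly one gene";

my $parent = $genes[0];
my $chr    = $parent->seq_id;
my $start  = $parent->start;
my $end    = $parent->end;
my $strand = $parent->strand;

# Create a new mRNA in the store and attach it to the existing gene.
my $new_splice_form = $db->new_feature(-primary_tag => 'mRNA',
                                       -source      => 'added',
                                       -seq_id      => $chr,
                                       -strand      => $strand,
                                       -start       => $start+10,
                                       -end         => $end,
                                      );
$parent->add_SeqFeature($new_splice_form);

# Add two exons (example coordinates) to the new splice form.
for my $pos ([$start+10,$start+100],[$start+200,$end]) {
  my ($e_start,$e_end) = @$pos;
  my $exon = Bio::DB::SeqFeature->new(-primary_tag => 'exon',
                                      -store       => $db,
                                      -seq_id      => $chr,
                                      -strand      => $strand,
                                      -start       => $e_start,
                                      -end         => $e_end);
  $new_splice_form->add_SeqFeature($exon);
}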

I found a bug in updating the seqfeature database when I wrote this script,
so you'll have to get the latest bioperl-live. I think you can use this to
write a splice form updating script.

In order to support the idea of adding new splice forms to an existing gene
using the GFF3 format, I will have to either modify the loader, or write a
separate script (probably better to do the latter). It shouldn't be hard if
you'd like to give it a try.

Lincoln

On 12/19/06, Cook, Malcolm <MEC at stowers-institute.org> wrote:
>
> Lincoln and fellow Bio::DB::SeqFeature travelers,
>
> I find that using bp_seqfeature_load.PLS to load subfeatures of genes
> already loaded using bp_seqfeature_load.PLS fails with
>
> ------------- EXCEPTION  -------------
> MSG: FBgn0017545 doesn't have a primary id
> STACK
> Bio::DB::SeqFeature::Store::GFF3Loader::build_object_tree_in_tables
> /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:682
> STACK Bio::DB::SeqFeature::Store::GFF3Loader::build_object_tree
> /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:663
> STACK Bio::DB::SeqFeature::Store::GFF3Loader::finish_load
> /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:372
> STACK Bio::DB::SeqFeature::Store::GFF3Loader::load_fh
> /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:345
> STACK Bio::DB::SeqFeature::Store::GFF3Loader::load
> /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:242
> STACK toplevel
> /home/mec/cvs/bioperl-live/scripts/Bio-SeqFeature-Store/bp_seqfeature_load.PLS:76
>
> Where FBgn0017545 is the ID of a gene previously loaded.
>
> I am unsure how to remedy my situation and welcome any advice on a correct
> or improved approach to my problem.
>
> Here's some detail if it helps.  I am developing a pipeline to design
> microarray probes capable of distinguishing among splice variants in
> Drosophila (using the latest Flybase dmel_r5.1 annotation).  So I
>
> 1) load a filtered selection of Flybase annotation using
> bp_seqfeature_load.  (for testing purposes, I am using a single gene's
> worth of annotation, FBgn0017545.gff, attached).  This is done as
> follows:
>
>         > bp_seqfeature_load.PLS  --create FBgn0017545.gff
>
> 2) analyze all the genes in the database, and create GFF3 output in which
> each feature has a 'Parent' that is a previously loaded gene (i.e.
> FBgn0017545).  (These features represent the unique introns, splice
> sites, and exonic design targets. Output of this analysis,
> FBgn0017545_matd.gff, is also attached)
>
> 3) load these analysis results into the same database, as follows:
>
>         > bp_seqfeature_load.PLS          FBgn0017545_matd.gff
>
> It is at this point that I get the above error.
>
> However, I don't get any error and the data loads fine if I load the two
> files together, as follows:
>
>         > bp_seqfeature_load.PLS --create <(cat FBgn0017545.gff FBgn0017545_matd.gff)
>
> So, I suspect that either I am misunderstanding when/how to use
> bp_seqfeature_load.PLS or else this use case has not yet arisen and must
> be provided for somehow.
>
> I am running against bioperl-live
>
> Thanks for your thoughts and assistance,
>
> Malcolm Cook
> Database Applications Manager - Bioinformatics
> Stowers Institute for Medical Research - Kansas City, Missouri
>
>


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu


From rvosa at sfu.ca  Tue Dec 19 23:23:20 2006
From: rvosa at sfu.ca (Rutger Vos)
Date: Tue, 19 Dec 2006 20:23:20 -0800
Subject: [Bioperl-l] suggestions for suitable 'taxon' object
Message-ID: <200612200423.kBK4NKDt009254@rm-rstar.sfu.ca>

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061219/17ec7ff3/attachment-0001.pl>

From cjfields at uiuc.edu  Wed Dec 20 01:16:47 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 20 Dec 2006 00:16:47 -0600
Subject: [Bioperl-l] suggestions for suitable 'taxon' object
In-Reply-To: <200612200423.kBK4NKDt009254@rm-rstar.sfu.ca>
References: <200612200423.kBK4NKDt009254@rm-rstar.sfu.ca>
Message-ID: <4185E59B-C0DA-49B8-8D71-11183A091FBF@uiuc.edu>


On Dec 19, 2006, at 10:23 PM, Rutger Vos wrote:

> Hi all,
>
> I am looking for a bioperl object that can be abused to function as a
> suitable 'taxon' object, where I mean 'taxon' as understood by the NEXUS
> file format (i.e. not strictly an entity from a taxonomy, but more loosely
> an OTU).
>
> The object would primarily function as a way to relate nodes in trees to
> sequences in an alignment (a foreign key that both nodes and sequences refer
> to), and secondarily as the keeper of the canonical name of the OTU, such
> that a sequence named 'Homo_sapiens|EF177447.1/12-56' and a node named 'Homo
> sapiens (constrained monophyly)' can still be understood to refer to the
> same thing - the OTU 'Homo sapiens sapiens' (for example).

Alignment (SimpleAlign) objects contain Bio::LocatableSeq sequence  
objects; at the moment LocatableSeqs don't store their own annotation  
but they could easily be made or subclassed to be AnnotatableI (i.e.  
they can store annotation collections).  I recently made SimpleAlign  
Annotatable; Jason has also made SimpleAlign implement  
FeatureHolderI, so alignments can store SeqFeatures as well; he may  
have his own designs here.

There may be a wide variety of ways to go about this.  I would  
probably do the following (bear in mind I'm a microbiologist, not a  
computer scientist).  If one could add trees as annotation to the  
alignment (i.e. if trees could be Annotation objects and kept in the  
SimpleAlign's annotation collection), and each sequence in the  
alignment contained a reference to a node object of the tree (i.e. if  
Bio::Taxon/Bio::Species objects could also be Annotation objects, but  
kept in a LocatableSeq annotation collection), both could refer to  
the same node object.  This may not be exactly what you want, but  
maybe it's close?

SimpleAlign->AnnoColln->Tree->OTU(Nodes)
    \----->LocSeqs-->AnnoColln-----/

I suppose this could also be done with Seqfeatures...
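The layout in the diagram above amounts to alignment rows and tree nodes each holding a reference to one shared OTU object, which then acts as the foreign key between them. A BioPerl-free sketch of that idea, using plain hashes; the field names (`otu`, `label`, `name`) are hypothetical and do not belong to any BioPerl class.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sketch: one OTU record, referenced by both an alignment row
# and a tree node, acts as the shared "foreign key" between them.
my $otu = { name => 'Homo sapiens sapiens' };

my @aligned_seqs = ( { id    => 'Homo_sapiens|EF177447.1/12-56',        otu => $otu } );
my @tree_nodes   = ( { label => 'Homo sapiens (constrained monophyly)', otu => $otu } );

# Pair rows and nodes by comparing the OTU references themselves
# (== on references compares addresses), not their differing labels.
for my $seq (@aligned_seqs) {
    for my $node (grep { $_->{otu} == $seq->{otu} } @tree_nodes) {
        printf "%s <-> %s (OTU: %s)\n", $seq->{id}, $node->{label}, $seq->{otu}{name};
    }
}
```

Because the link is the shared reference, the differently formatted sequence and node labels never need to be parsed to decide that they describe the same OTU.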

> I was thinking that a (possibly expanded) Bio::Species might work if there
> was some sensible way of appending references to node and sequence objects
> to it (or otherwise associating them with each other), but I am writing *to
> solicit any and all suggestions*. I am looking for something similar to
> Bio::Phylo::Taxa::Taxon.
>
> Any and all comments and suggestions greatly appreciated!
>
> Best wishes,
>
> Rutger Vos

Sendu would be the best one to speak about Bio::Taxon and  
Bio::Species and may have some ideas on the above.  The current plan  
was to deprecate Bio::Species, but who knows?

chris


From heikki at sanbi.ac.za  Wed Dec 20 05:25:08 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Wed, 20 Dec 2006 12:25:08 +0200
Subject: [Bioperl-l] Bio::SimpleAlign
In-Reply-To: <1A4207F8295607498283FE9E93B775B40270F4E9@EX02.asurite.ad.asu.edu>
References: <1A4207F8295607498283FE9E93B775B40270F4E9@EX02.asurite.ad.asu.edu>
Message-ID: <200612201225.08862.heikki@sanbi.ac.za>

Kevin,

Sequences that are added to the alignment are supposed to be *aligned*. 
SimpleAlign does not do it for you. It seems to me that you are adding 
sequences like this:

nnnnnnnnnnnnnnnnnnnn  1 - 20, "a short gene" 
nnnnnn               21 - 26 "a short primer after the gene"

when you should be doing this

nnnnnnnnnnnnnnnnnnnn        1 - 20, "a short gene" 
--------------------nnnnnn 21 - 26 "a short primer after the gene"

Note that the default way of displaying names in SimpleAlign 
is "name/start-end". The name is usually expected to refer to the sequence 
from which this subsequence is derived. The displayname does not change 
if you add gaps.
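A minimal, BioPerl-free sketch of the padding step described above, so that each primer row already carries its gaps before it is added to the alignment. It assumes 1-based coordinates and a primer given on the same strand as the gene (a reverse primer would first need reverse-complementing, which is not shown); `pad_to_alignment` is a hypothetical helper.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: pad an ungapped primer with '-' so it sits at
# 1-based position $start in an alignment $aln_length columns wide.
sub pad_to_alignment {
    my ($primer, $start, $aln_length) = @_;
    my $leading  = $start - 1;
    my $trailing = $aln_length - $leading - length($primer);
    die "primer does not fit in the alignment\n" if $leading < 0 || $trailing < 0;
    return ('-' x $leading) . $primer . ('-' x $trailing);
}

my $gene   = 'ATGGCCATTGTAATGGGCCGCTGA';
my $primer = 'TTGTAATG';
my $start  = index($gene, $primer) + 1;   # 1-based position of the primer (8 here)
print "$gene\n";
print pad_to_alignment($primer, $start, length($gene)), "\n";
```

The padded string could then be passed as `-seq` to Bio::LocatableSeq, with `-start` and `-end` set to the primer's ungapped coordinates ($start and $start + length($primer) - 1).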


Yours,
	-Heikki


On Tuesday 19 December 2006 23:46, Kevin Brown wrote:
> I'm working on a script that plays around with alignments of sequences
> and one of the things I noticed is that the code for the match method
> does not seem to actually use the start/end information when creating
> the match between objects in the alignment.  Maybe I'm misunderstanding
> what the alignment is supposed to hold in terms of sequence.  The
> alignment objects I build up are based on the sequence of a gene and the
> sequences of the primers that amplify that gene.
>
> $alignments{$gene->id()}->add_seq(
> 				new Bio::LocatableSeq(
> 				-seq   => $seq[0]->seq(),
> 				-id    => $seq[0]->id(),
> 				-start => $start,
> 				-end => $start + $seq[0]->length() - 1,
> 				-strand => 1
> 			 )
> );

If your sequence does not contain gaps and the numbering starts from one, you 
can let the object handle start/stop:

use Bio::LocatableSeq;

my $seq = Bio::LocatableSeq->new(
      -seq    => 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA',
      -id     => 'A00001',
      -strand => 1,
);   # no -start/-end: left for the object to work out


> $alignments{$gene->id()}->add_seq(
> 				new Bio::LocatableSeq(
> 				-seq   => $seq[1]->seq(),
> 				-id    => $seq[1]->id(),
> 				-start => $stop,
> 				-end => $stop + $seq[1]->length() - 1,
> 				-strand => -1
> 				)
> );
>
> So, you can see I input a start and stop point for the primer, but when
> I use the match function all it does is match the first character of the
> gene sequence to the first char of the primer sequences, then the second
> gene char to the second in each primer, etc...  This doesn't seem to fit
> with the documentation and seems odd that there would be holders for the
> start/stop points and not use them when doing things like matching of
> sequences in an alignment.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From ferraria at gmail.com  Wed Dec 20 06:04:16 2006
From: ferraria at gmail.com (Anthony Ferrari)
Date: Wed, 20 Dec 2006 12:04:16 +0100
Subject: [Bioperl-l] Problem with : EUtilities - Proxy
In-Reply-To: <6365ACFD-7F5A-4EF1-97EA-BB53A58B9B4D@uiuc.edu>
References: <b2ec54b90612190840r24fe1aa5ncb9c9def040aed49@mail.gmail.com>
	<6365ACFD-7F5A-4EF1-97EA-BB53A58B9B4D@uiuc.edu>
Message-ID: <b2ec54b90612200304r56e1ba5o87963494875c1c43@mail.gmail.com>

On 19/12/06, Chris Fields <cjfields at uiuc.edu> wrote:
>
>
> On Dec 19, 2006, at 10:40 AM, Anthony Ferrari wrote:
>
> > Hi all,
> >
> > I've just installed BioPerl 1.5.2 (devel) on a linux mandrake machine with
> > the cpan shell.
> > I want to use the Bio::DB::EUtilities to retrieve data (id's) from NCBI
> > 'gene' database (first step of my pipeline).
> >
> > But the installation of this package doesn't seem to be correct :
> > The simple example given on the documentation doesn't work. (this one :
> > http://doc.bioperl.org/bioperl-live/Bio/DB/EUtilities.html#SYNOPSIS)
> >
> > Here is the error message I got :
> > "Can't use an undefined value as an ARRAY reference at
> > /usr/lib/perl5/site_perl/5.8.7/LWP/UserAgent.pm line 779."
> >
> > In the UserAgent package, line 779 is in the private "_need_proxy"
> > subroutine and corresponds to the code :    ...if (@{ $self->{'no_proxy'} })
> > ...
> >
> > If I comment this line in the UserAgent package and the corresponding "}",
> > the example works. But obviously, I prefer to solve the problem in a regular
> > way :)
> >
> > Indeed, my computer accesses the internet via a http proxy and I didn't tell
> > this to BioPerl at any moment.
> > As I read on the BioPerl Wiki site, I tried to configure an $http_proxy
> > environment variable but it still doesn't work.
> >
> > One last maybe important information is that I saw during the installation
> > that the tests 't/EUtilities' were skipped because of an unrevealed reason.
> >
> >
> > So finally I got two questions :
> > 1. Is there somebody who can figure out what is my problem ?
> > 2. At the moment, is the Bio::DB::EUtilities package really efficient or
> > using directly the NCBI eutilities with the LWP::Simple package could be an
> > good alternative ?
> >
> > Many thanks in advance,
> > Best Regards,
> > Anthony Ferrari
>
> First things first: at the moment the BioPerl EUtilities interface is
> very experimental (as specifically outlined in the POD), so I can't
> really recommend it for production use until the API is cleaned up.
> However, I do appreciate any feedback or comments re:EUtilities (the
> reason it's out there in the 1.5.2 release).
>
> You might check out this bug report, which relates directly to your
> issue:
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2109
>
> After I worked out the proxy issue Torsten got it working.  Let me
> know if this doesn't help or fix the problem.
>
> chris
>


I carefully read this bug report, but it doesn't help because the fix it
describes has already been applied to the GenericWebDBI.pm that now ships.
So my problem does not come from a deep recursion loop.

As Torsten did, I ran "BIOPERLDEBUG=1 perl -I. -w t/EUtilities.t" to see
what's really happening.
Actually, all tests are skipped because of the same error message:
-> "Can't use an undefined value as an ARRAY reference at
/usr/lib/perl5/site_perl/5.8.7/LWP/UserAgent.pm line 779."

***
I ran the same command with the modified LWP::UserAgent package (that is,
with line 779 and the corresponding '}' commented out) and all 453 tests
passed.
But not always. I ran the tests several times and they often failed, always
on a test called "eXXX->cookie->cookie() query key" (ending with "query
key"). In those cases, I got back an HTML message indicating that the error
was thrown by NCBI's internal server. So I guess that sometimes it is just
an NCBI server fault (an internal problem), and BioPerl is not involved.
But once more, I commented out a line in a basic package, so it is a bit
hazardous.
***

tony - a little bit lost.


From smane at vbi.vt.edu  Tue Dec 19 14:46:56 2006
From: smane at vbi.vt.edu (Shrinivasrao P. Mane)
Date: Tue, 19 Dec 2006 14:46:56 -0500
Subject: [Bioperl-l] Using Muscle parameter within bioperl
Message-ID: <A8ACF950-F40E-4E8C-927E-23D2391E5074@vbi.vt.edu>

Hi,
I need to run muscle using bioperl. This is how I do it in command line.

muscle -in inv.fasta -out inv.aln -log inv.log -verbose -quiet

I used the following in perl script

my $muscle = new Bio::Tools::Run::Alignment::Muscle(-format =>  
'clustalw',  -verbose=>'', -quiet=>'', -log='inv.log');

The program runs and produces the result file but it doesn't create a  
log file nor does it stop sending output to STDOUT (-quiet).
Could anybody help me with this?
Thanks
Mane


From cjfields at uiuc.edu  Wed Dec 20 09:09:56 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 20 Dec 2006 08:09:56 -0600
Subject: [Bioperl-l] Problem with : EUtilities - Proxy
In-Reply-To: <b2ec54b90612200304r56e1ba5o87963494875c1c43@mail.gmail.com>
References: <b2ec54b90612190840r24fe1aa5ncb9c9def040aed49@mail.gmail.com>
	<6365ACFD-7F5A-4EF1-97EA-BB53A58B9B4D@uiuc.edu>
	<b2ec54b90612200304r56e1ba5o87963494875c1c43@mail.gmail.com>
Message-ID: <13761416-E03F-46E7-BB43-E5FDA7F9C281@uiuc.edu>


On Dec 20, 2006, at 5:04 AM, Anthony Ferrari wrote:

> You might check out this bug report, which relates directly to your
> issue:
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2109
>
> After I worked out the proxy issue Torsten got it working.  Let me
> know if this doesn't help or fix the problem.
>
> chris
>
>
> I carefully read this bug but that doesn't help because this has  
> already been modified in the now given GenericWebDBI.pm
> So my problem does not come from a deep recursion loop.
>
> As Torsten did, I tried the command  " BIOPERLDEBUG=1 perl -I. -w
> t/EUtilities.t " to see what's really happening.
> And actually, all tests are skipped because of the same message error
> -> "Can't use an undefined value as an ARRAY reference at
> /usr/lib/perl5/site_perl/5.8.7/LWP/UserAgent.pm line 779."
>
> ***
> I tried the same command with the modified LWP::UserAgent package  
> (which means I comment the line 779 and the corresponding '}') and  
> all 453 tests passed.
> But not always. I made the tests several times and  it often  
> failed. And always on a test called "eXXX->cookie->cookie() query  
> key" (ending with query key). In those cases, I got back a html  
> message indicating that the error was thrown by the internal sever  
> of NCBI. So I guess that sometimes it is just NCBI server fault  
> (internal problem), and BioPerl is not implied..
> But once more, I comment a line from a basic package so it is a bit  
> hazardous.
> ***
>
> tony - a little bit lost.

I'm cc'ing Torsten as he has a bit more experience with proxies.

EUtilities is set up to check for an env. proxy and also take a set  
proxy with $agent->proxy() (see GenericWebDBI POD).  It would be easy  
to say this was a bug in LWP, but I think the problem is that  
something is undefined (i.e. an env. variable), or username/password.

 From the bug report, Torsten set his proxy variables using the  
following:

--------------------------------------
"Note: I am behind an _authenticating_ proxy.
My $http_proxy and $HTTP_PROXY are both set to
http://USER:PASS at proxy.monash.edu.au:80/"
--------------------------------------

Note the lowercase for $http_proxy, which can make a difference.   
After the recursion fix, I'm assuming he made no changes to the env.  
settings, and according to the bug everything was fine (is that  
correct, Torsten?).

Also LWP::UserAgent has this:

--------------------------------------
"Load proxy settings from *_proxy environment variables. You might  
specify proxies like this (sh-syntax):

       gopher_proxy=http://proxy.my.place/
       wais_proxy=http://proxy.my.place/
       no_proxy="localhost,my.domain"
       export gopher_proxy wais_proxy no_proxy

     csh or tcsh users should use the setenv command to define these  
environment variables.

On systems with case insensitive environment variables there exists a  
name clash between the CGI environment variables and the HTTP_PROXY  
environment variable normally picked up by env_proxy(). Because of  
this HTTP_PROXY is not honored for CGI scripts. The CGI_HTTP_PROXY  
environment variable can be used instead."
--------------------------------------

chris


From bix at sendu.me.uk  Wed Dec 20 09:08:16 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 20 Dec 2006 14:08:16 +0000
Subject: [Bioperl-l] Using Muscle parameter within bioperl
In-Reply-To: <A8ACF950-F40E-4E8C-927E-23D2391E5074@vbi.vt.edu>
References: <A8ACF950-F40E-4E8C-927E-23D2391E5074@vbi.vt.edu>
Message-ID: <458943D0.10400@sendu.me.uk>

Shrinivasrao P. Mane wrote:
> Hi,
> I need to run muscle using bioperl. This is how I do it in command line.
> 
> muscle -in inv.fasta -out inv.aln -log inv.log -verbose -quiet
> 
> I used the following in perl script
> 
> my $muscle = new Bio::Tools::Run::Alignment::Muscle(-format =>  
> 'clustalw',  -verbose=>'', -quiet=>'', -log='inv.log');
> 
> The program runs and produces the result file but it doesn't create a  
> log file nor does it stop sending output to STDOUT (-quiet).
> Could anybody help me with this?

The Muscle arguments don't take dashed args. To make switches active you 
need to set them to some true value. So (-verbose => 1, quiet => 1, log 
=> 'inv.log'). Verbose may not do what you want since it is both a 
Bioperl option and a Muscle option; if you want the latter try using 
verbose => 1.
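To illustrate why switches need a true value and why dashed names are not picked up, here is a toy model in plain Perl. It is hypothetical: build_cmdline() and its lists of switches and options are assumptions made for illustration, not the actual internals of Bio::Tools::Run::Alignment::Muscle. In this model a switch is emitted only when its value is true, and unrecognised names (such as '-quiet') are skipped.

```perl
use strict;
use warnings;

# Hypothetical model: which parameters are boolean switches and which
# take a value. The real wrapper's lists may differ.
my %SWITCHES = map { $_ => 1 } qw(quiet verbose);
my %OPTIONS  = map { $_ => 1 } qw(in out log);

sub build_cmdline {
    my ($program, %params) = @_;
    my @args = ($program);
    for my $name (sort keys %params) {
        my $value = $params{$name};
        if ($SWITCHES{$name}) {
            push @args, "-$name" if $value;   # only a true value turns it on
        }
        elsif ($OPTIONS{$name} && defined $value) {
            push @args, "-$name", $value;     # options carry an argument
        }
        # anything else (e.g. a dashed '-quiet') is not recognised, so skipped
    }
    return join ' ', @args;
}

print build_cmdline('muscle', '-quiet' => '', '-log' => 'inv.log'), "\n";
print build_cmdline('muscle', quiet => 1, log => 'inv.log'), "\n";
```

The first call drops everything: the names are dashed and the switch value is an empty (false) string. The second yields the quiet switch plus the log option.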


From bix at sendu.me.uk  Wed Dec 20 09:51:33 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 20 Dec 2006 14:51:33 +0000
Subject: [Bioperl-l] suggestions for suitable 'taxon' object
In-Reply-To: <4185E59B-C0DA-49B8-8D71-11183A091FBF@uiuc.edu>
References: <200612200423.kBK4NKDt009254@rm-rstar.sfu.ca>
	<4185E59B-C0DA-49B8-8D71-11183A091FBF@uiuc.edu>
Message-ID: <45894DF5.1060503@sendu.me.uk>

Chris Fields wrote:
> On Dec 19, 2006, at 10:23 PM, Rutger Vos wrote:
> 
>> Hi all,
>> 
>> I am looking for a bioperl object that can be abused to function as
>> a suitable 'taxon' object, where I mean 'taxon' as understood by
>> the NEXUS file format (i.e. not strictly an entity from a taxonomy,
>> but more loosely an OTU).
>> 
>> The object would primarily function as a way to relate nodes in 
>> trees to sequences in an alignment (a foreign key that both nodes
>> and sequences refer to), and secondarily as the keeper of the
>> canonical name of the OTU, such that a sequence named
>> 'Homo_sapiens|EF177447.1/12-56' and a node named 'Homo sapiens
>> (constrained monophyly)' can still be understood to refer to the 
>> same thing - the OTU 'Homo sapiens sapiens' (for example).

I haven't had time to give your suggestions consideration, but I can say 
that I'm having to do the same thing for a bioperl-run module and my 
work-around is simply to set a custom name on my Bio::Taxon objects. To 
explain, I have the benefit that my tree is made up of Bio::Taxon 
objects, so I call $taxon->name('seq_id', $seq->id). Then when I want to 
know which of my sequences corresponds to a particular taxon, I work out 
which of them has the id given by shift @{$taxon->name('seq_id')}.

Hardly ideal, but it works for now.
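The lookup step of this workaround amounts to an id index. Here is a sketch in plain Perl, in which hashes stand in for the Bio::Taxon objects and sequences. In the real code the id would come from shift @{$taxon->name('seq_id')}, as described above. The ids seq1/seq2, the residues and the helper seq_for_taxon() are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-ins: plain hashes instead of Bio::Taxon / Bio::Seq objects.
my @seqs = (
    { id => 'seq1', seq => 'ACGTACGT' },
    { id => 'seq2', seq => 'TTGACCGA' },
);
my @taxa = (
    { name => 'Homo sapiens',    seq_id => 'seq1' },
    { name => 'Pan troglodytes', seq_id => 'seq2' },
);

# Build the id => sequence index once, instead of scanning @seqs per taxon.
my %seq_by_id = map { $_->{id} => $_ } @seqs;

# Return the sequence record a taxon points at, or undef if there is none.
sub seq_for_taxon {
    my ($taxon) = @_;
    return $seq_by_id{ $taxon->{seq_id} };
}

for my $taxon (@taxa) {
    my $seq = seq_for_taxon($taxon)
        or die "no sequence for $taxon->{name}\n";
    print "$taxon->{name} => $seq->{id}\n";
}
```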


>> I was thinking that a (possibly expanded) Bio::Species might work
>>  if there was some sensible way of appending references to node and
>> sequence objects to it (or otherwise associate them with each
>> other), but I am writing *to solicit any and all suggestions*. I am
>> looking for something similar to Bio::Phylo::Taxa::Taxon.
>
> Sendu would be the best one to speak about Bio::Taxon and 
> Bio::Species and may have some ideas on the above.  The current plan
> was to deprecate Bio::Species, but who knows?

Given that we do plan to deprecate Bio::Species, I'd resist the 
temptation to expand on it. Use Bio::Taxon as a base if it has stuff you 
need, or base straight from Bio::Tree::Node if not.


From ferraria at gmail.com  Wed Dec 20 10:40:34 2006
From: ferraria at gmail.com (Anthony Ferrari)
Date: Wed, 20 Dec 2006 16:40:34 +0100
Subject: [Bioperl-l] Problem with : EUtilities - Proxy
In-Reply-To: <13761416-E03F-46E7-BB43-E5FDA7F9C281@uiuc.edu>
References: <b2ec54b90612190840r24fe1aa5ncb9c9def040aed49@mail.gmail.com>
	<6365ACFD-7F5A-4EF1-97EA-BB53A58B9B4D@uiuc.edu>
	<b2ec54b90612200304r56e1ba5o87963494875c1c43@mail.gmail.com>
	<13761416-E03F-46E7-BB43-E5FDA7F9C281@uiuc.edu>
Message-ID: <b2ec54b90612200740x49b3d9d8qa8c01569b63cbdc4@mail.gmail.com>

Defining a "no_proxy" environment variable in my '.bashrc' file solved my
problem. I set it to "localhost".

It indeed corresponds to the line...       [    ...if (@{
$self->{'no_proxy'} }) ...    ]   (I guess!)


I really don't know why we are compelled to do this, but let's say that's
the way it is.

It works now !

Thanks a lot.

Tony


On 20/12/06, Chris Fields <cjfields at uiuc.edu> wrote:
>
>
> On Dec 20, 2006, at 5:04 AM, Anthony Ferrari wrote:
>
> > You might check out this bug report, which relates directly to your
> > issue:
> >
> > http://bugzilla.open-bio.org/show_bug.cgi?id=2109
> >
> > After I worked out the proxy issue Torsten got it working.  Let me
> > know if this doesn't help or fix the problem.
> >
> > chris
> >
> >
> > I carefully read this bug but that doesn't help because this has
> > already been modified in the now given GenericWebDBI.pm
> > So my problem does not come from a deep recursion loop.
> >
> > As Torsten did, I tried the command  " BIOPERLDEBUG=1 perl -I. -w
> > t/EUtilities.t " to see what's really happening.
> > And actually, all tests are skipped because of the same message error
> > -> "Can't use an undefined value as an ARRAY reference at
> > /usr/lib/perl5/site_perl/5.8.7/LWP/UserAgent.pm line 779."
> >
> > ***
> > I tried the same command with the modified LWP::UserAgent package
> > (which means I comment the line 779 and the corresponding '}') and
> > all 453 tests passed.
> > But not always. I made the tests several times and  it often
> > failed. And always on a test called "eXXX->cookie->cookie() query
> > key" (ending with query key). In those cases, I got back a html
> > message indicating that the error was thrown by the internal sever
> > of NCBI. So I guess that sometimes it is just NCBI server fault
> > (internal problem), and BioPerl is not implied..
> > But once more, I comment a line from a basic package so it is a bit
> > hazardous.
> > ***
> >
> > tony - a little bit lost.
>
> I'm cc'ing Torsten as he has a bit more experience with proxies.
>
> EUtilities is set up to check for an env. proxy and also take a set
> proxy with $agent->proxy() (see GenericWebDBI POD).  It would be easy
> to say this was a bug in LWP, but I think the problem is that
> something is undefined (i.e. an env. variable), or username/password.
>
> From the bug report, Torsten set his proxy variables using the
> following:
>
> --------------------------------------
> "Note: I am behind an _authenticating_ proxy.
> My $http_proxy and $HTTP_PROXY are both set to
> http://USER:PASS at proxy.monash.edu.au:80/"
> --------------------------------------
>
> Note the lowercase for $http_proxy, which can make a difference.
> After the recursion fix, I'm assuming he made no changes to the env.
> settings, and according to the bug everything was fine (is that
> correct, Torsten?).
>
> Also LWP::UserAgent has this:
>
> --------------------------------------
> "Load proxy settings from *_proxy environment variables. You might
> specify proxies like this (sh-syntax):
>
>        gopher_proxy=http://proxy.my.place/
>        wais_proxy=http://proxy.my.place/
>        no_proxy="localhost,my.domain"
>        export gopher_proxy wais_proxy no_proxy
>
>      csh or tcsh users should use the setenv command to define these
> environment variables.
>
> On systems with case insensitive environment variables there exists a
> name clash between the CGI environment variables and the HTTP_PROXY
> environment variable normally picked up by env_proxy(). Because of
> this HTTP_PROXY is not honored for CGI scripts. The CGI_HTTP_PROXY
> environment variable can be used instead."
> --------------------------------------
>
> chris
>


From cjfields at uiuc.edu  Wed Dec 20 11:10:48 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 20 Dec 2006 10:10:48 -0600
Subject: [Bioperl-l] Problem with : EUtilities - Proxy
In-Reply-To: <b2ec54b90612200740x49b3d9d8qa8c01569b63cbdc4@mail.gmail.com>
Message-ID: <007901c72451$6ad540a0$15327e82@pyrimidine>

Just to clarify: does it work if you don't have any proxy env. settings?
 
chris


  _____  

From: Anthony Ferrari [mailto:ferraria at gmail.com] 
Sent: Wednesday, December 20, 2006 9:41 AM
To: Chris Fields
Cc: bioperl-l List; Torsten Seemann
Subject: Re: [Bioperl-l] Problem with : EUtilities - Proxy


Defining a "no_proxy" environment variable in my '.bashrc' file solved my
problem. I set it to "localhost".

It indeed corresponds to the line...       [    ...if (@{
$self->{'no_proxy'} }) ...    ]   (I guess!) 


I really don't know why we are compelled to do this, but let's say that's
the way it is.

It works now !

Thanks a lot.

Tony


On 20/12/06, Chris Fields <cjfields at uiuc.edu> wrote: 


On Dec 20, 2006, at 5:04 AM, Anthony Ferrari wrote:

> You might check out this bug report, which relates directly to your
> issue:
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2109
>
> After I worked out the proxy issue Torsten got it working.  Let me
> know if this doesn't help or fix the problem.
>
> chris
> 
>
> I carefully read this bug but that doesn't help because this has
> already been modified in the now given GenericWebDBI.pm
> So my problem does not come from a deep recursion loop.
> 
> As Torsten did, I tried the command  " BIOPERLDEBUG=1 perl -I. -w
> t/EUtilities.t " to see what's really happening.
> And actually, all tests are skipped because of the same message error
> -> "Can't use an undefined value as an ARRAY reference at
> /usr/lib/perl5/site_perl/5.8.7/LWP/UserAgent.pm line 779."
>
> ***
> I tried the same command with the modified LWP::UserAgent package 
> (which means I comment the line 779 and the corresponding '}') and
> all 453 tests passed.
> But not always. I made the tests several times and  it often
> failed. And always on a test called "eXXX->cookie->cookie() query 
> key" (ending with query key). In those cases, I got back a html
> message indicating that the error was thrown by the internal sever
> of NCBI. So I guess that sometimes it is just NCBI server fault 
> (internal problem), and BioPerl is not implied..
> But once more, I comment a line from a basic package so it is a bit
> hazardous.
> ***
>
> tony - a little bit lost.

I'm cc'ing Torsten as he has a bit more experience with proxies. 

EUtilities is set up to check for an env. proxy and also take a set
proxy with $agent->proxy() (see GenericWebDBI POD).  It would be easy
to say this was a bug in LWP, but I think the problem is that
something is undefined (i.e. an env. variable), or username/password.

>From the bug report, Torsten set his proxy variables using the
following:

--------------------------------------
"Note: I am behind an _authenticating_ proxy. 
My $http_proxy and $HTTP_PROXY are both set to
http://USER:PASS at proxy.monash.edu.au:80/"
--------------------------------------

Note the lowercase for $http_proxy, which can make a difference. 
After the recursion fix, I'm assuming he made no changes to the env.
settings, and according to the bug everything was fine (is that
correct, Torsten?).

Also LWP::UserAgent has this:

-------------------------------------- 
"Load proxy settings from *_proxy environment variables. You might
specify proxies like this (sh-syntax):

       gopher_proxy=http://proxy.my.place/
       wais_proxy=http://proxy.my.place/
       no_proxy="localhost,my.domain"
       export gopher_proxy wais_proxy no_proxy

     csh or tcsh users should use the setenv command to define these 
environment variables.

On systems with case insensitive environment variables there exists a
name clash between the CGI environment variables and the HTTP_PROXY
environment variable normally picked up by env_proxy(). Because of 
this HTTP_PROXY is not honored for CGI scripts. The CGI_HTTP_PROXY
environment variable can be used instead."
--------------------------------------

chris


From ferraria at gmail.com  Wed Dec 20 11:59:49 2006
From: ferraria at gmail.com (Anthony Ferrari)
Date: Wed, 20 Dec 2006 17:59:49 +0100
Subject: [Bioperl-l] Problem with : EUtilities - Proxy
In-Reply-To: <007901c72451$6ad540a0$15327e82@pyrimidine>
References: <b2ec54b90612200740x49b3d9d8qa8c01569b63cbdc4@mail.gmail.com>
	<007901c72451$6ad540a0$15327e82@pyrimidine>
Message-ID: <b2ec54b90612200859w225df7qc35f1060f04eb452@mail.gmail.com>

First, I got a $http_proxy env. variable automatically defined by the
BioPerl installation (I don't define and export it in my .bash_profile).
So when I'm logging in, $http_proxy=http://ip_address:port/

Next step: two solutions:
1) defining a $no_proxy env. variable in my .bash_profile.
It can be set to 'whatever'.

2) If I do not define '$no_proxy', then to make it work I must call the
no_proxy() method on each Bio::DB::EUtilities object I create before I can
call the get_response() method on it.

(The bug is in the 'get_response' call.)

And finally without 1) or 2) it doesn't work.
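Solution 1) can presumably also be applied from inside the script instead of .bash_profile. Here is a minimal sketch. It assumes the agent reads %ENV when it sets up its proxy configuration (as LWP::UserAgent's env_proxy() does), so the variable has to be set before the Bio::DB::EUtilities object is created. 'localhost' is simply the value that worked above.

```perl
use strict;
use warnings;

# Workaround sketch: give no_proxy a value before any LWP-based agent
# (such as a Bio::DB::EUtilities object) is created, without overriding
# a value the user has already exported.
$ENV{no_proxy} = 'localhost' unless defined $ENV{no_proxy};

# ... create and use the Bio::DB::EUtilities object here as usual ...
print "no_proxy is now '$ENV{no_proxy}'\n";
```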

Tony

On 20/12/06, Chris Fields <cjfields at uiuc.edu> wrote:
>
>  Just to clarify: does it work if you don't have any proxy env. settings?
>
> One thing I didn't point out previously is that Bio::DB::GenericWebDBI
> inherits LWP::UserAgent.  You should be able to use $eutil->no_proxy()
> instead of setting it in your env.
> chris
>
>  ------------------------------
> *From:* Anthony Ferrari [mailto:ferraria at gmail.com]
> *Sent:* Wednesday, December 20, 2006 9:41 AM
> *To:* Chris Fields
> *Cc:* bioperl-l List; Torsten Seemann
> *Subject:* Re: [Bioperl-l] Problem with : EUtilities - Proxy
>
> Defining a "no_proxy" environment variable in my '.bashrc' file solved my
> problem. I set it to "localhost".
>
> It indeed corresponds to the line...       [    ...if (@{
> $self->{'no_proxy'} }) ...    ]   (I guess!)
>
>
> I really don't know why we are compelled to do this, but let's say that's
> the way it is.
>
> It works now !
>
> Thanks a lot.
>
> Tony
>
>
>
>
> On 20/12/06, Chris Fields <cjfields at uiuc.edu> wrote:
> >
> >
> > On Dec 20, 2006, at 5:04 AM, Anthony Ferrari wrote:
> >
> > > You might check out this bug report, which relates directly to your
> > > issue:
> > >
> > > http://bugzilla.open-bio.org/show_bug.cgi?id=2109
> > >
> > > After I worked out the proxy issue Torsten got it working.  Let me
> > > know if this doesn't help or fix the problem.
> > >
> > > chris
> > >
> > >
> > > I carefully read this bug but that doesn't help because this has
> > > already been modified in the now given GenericWebDBI.pm
> > > So my problem does not come from a deep recursion loop.
> > >
> > > As Torsten did, I tried the command  " BIOPERLDEBUG=1 perl -I. -w
> > > t/EUtilities.t " to see what's really happening.
> > > And actually, all tests are skipped because of the same message error
> > > -> "Can't use an undefined value as an ARRAY reference at
> > > /usr/lib/perl5/site_perl/5.8.7/LWP/UserAgent.pm line 779."
> > >
> > > ***
> > > I tried the same command with the modified LWP::UserAgent package
> > > (which means I comment the line 779 and the corresponding '}') and
> > > all 453 tests passed.
> > > But not always. I made the tests several times and  it often
> > > failed. And always on a test called "eXXX->cookie->cookie() query
> > > key" (ending with query key). In those cases, I got back a html
> > > message indicating that the error was thrown by the internal sever
> > > of NCBI. So I guess that sometimes it is just NCBI server fault
> > > (internal problem), and BioPerl is not implied..
> > > But once more, I comment a line from a basic package so it is a bit
> > > hazardous.
> > > ***
> > >
> > > tony - a little bit lost.
> >
> > I'm cc'ing Torsten as he has a bit more experience with proxies.
> >
> > EUtilities is set up to check for an env. proxy and also take a set
> > proxy with $agent->proxy() (see GenericWebDBI POD).  It would be easy
> > to say this was a bug in LWP, but I think the problem is that
> > something is undefined (i.e. an env. variable), or username/password.
> >
> > From the bug report, Torsten set his proxy variables using the
> > following:
> >
> > --------------------------------------
> > "Note: I am behind an _authenticating_ proxy.
> > My $http_proxy and $HTTP_PROXY are both set to
> > http://USER:PASS at proxy.monash.edu.au:80/"
> > --------------------------------------
> >
> > Note the lowercase for $http_proxy, which can make a difference.
> > After the recursion fix, I'm assuming he made no changes to the env.
> > settings, and according to the bug everything was fine (is that
> > correct, Torsten?).
> >
> > Also LWP::UserAgent has this:
> >
> > --------------------------------------
> > "Load proxy settings from *_proxy environment variables. You might
> > specify proxies like this (sh-syntax):
> >
> >        gopher_proxy=http://proxy.my.place/
> >        wais_proxy=http://proxy.my.place/
> >        no_proxy="localhost,my.domain"
> >        export gopher_proxy wais_proxy no_proxy
> >
> >      csh or tcsh users should use the setenv command to define these
> > environment variables.
> >
> > On systems with case insensitive environment variables there exists a
> > name clash between the CGI environment variables and the HTTP_PROXY
> > environment variable normally picked up by env_proxy(). Because of
> > this HTTP_PROXY is not honored for CGI scripts. The CGI_HTTP_PROXY
> > environment variable can be used instead."
> > --------------------------------------
> >
> > chris
> >
>
>


From cjfields at uiuc.edu  Wed Dec 20 13:28:09 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 20 Dec 2006 12:28:09 -0600
Subject: [Bioperl-l] Problem with : EUtilities - Proxy
In-Reply-To: <b2ec54b90612200859w225df7qc35f1060f04eb452@mail.gmail.com>
Message-ID: <000301c72464$9a12a070$15327e82@pyrimidine>


> First, I got a $http_proxy env. variable automatically 
> defined by the BioPerl installation (I don't define and 
> export it in my .bash_profile).
> So when I'm logging in, $http_proxy=http://ip_address:port/

BioPerl can't permanently set any env. variables out of the box since that
would mean modifying your local .bash_profile or the system profile.  If
you're a user on a system where you're not the sysadmin, then it's more
likely the sysadmin has set up user accounts with an already-defined
$http_proxy variable in the system .bash_profile (which is passed on to all
users).  

The problem I can see (going by what you have above) is there is no
username/password defined, only the address (IP:Port).  I am assuming LWP is
expecting some form of authentication when a proxy is env. defined w/o
username/password included.  If so, you'll have to supply those yourself,
either by redefining $http_proxy to include it in your local .bash_profile,

export http_proxy='http://USER:PASS@proxy.monash.edu.au:80/'

by using $agent->proxy() for including all proxy information, or by using
$agent->authentication() so that a proxy can authorize any outgoing/incoming
requests.  The first may be preferable if you are able to do so, since you
wouldn't have to authenticate every agent.

Note that this would also explain why you had an LWP problem with an
undefined array ref: the LWP agent is likely expecting some form of
authentication, probably in the form [username, password], if a proxy env.
variable is found.

> Next step :  two solutions :
> 1) defining an $no_proxy env.variable in my .bash_profile.
> It can be set to 'whatever'.
> 
> 2) If I do not define '$no_proxy'; to make it work, I must call the
> no_proxy() method on each Bio::DB::EUtilities object I create 
> before I can call the get_response() method on it.
> 
> (The bug is in the 'get_response' call)

If you mean when the request is calling proxy_authorization_basic(), that's
not a bug.  If we didn't authorize then it likely wouldn't work for properly
set up proxies (Torsten's worked).  Note that it's supposed to be passing a
username/password from $self->authentication().  

The fact that you can set $no_proxy to anything suggests there is no proxy
in place.  
 
> And finally without 1) or 2) it doesn't work.
> 
> Tony

We can't guarantee that defining no_proxy will always work on your system,
either.  It's possible a proxy was added systemwide but a firewall hasn't
been put in place yet; once it goes up and all requests need to be
authorized, then you'll run into problems again.  Conversely, maybe this was
defined at some point systemwide in the .bash_profile but wasn't removed.
The only one who would know is the sysadmin.

If you aren't the sysadmin, you can contact them to find out about how to
properly set up your proxy, or whether it is even necessary (maybe they
neglected to remove the proxy definition from the system .bash_profile).
Who knows?  

chris


From bix at sendu.me.uk  Wed Dec 20 16:03:03 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 20 Dec 2006 21:03:03 +0000
Subject: [Bioperl-l] Problem with : EUtilities - Proxy
In-Reply-To: <000301c72464$9a12a070$15327e82@pyrimidine>
References: <000301c72464$9a12a070$15327e82@pyrimidine>
Message-ID: <4589A507.60106@sendu.me.uk>

Chris Fields wrote:
>> First, I got a $http_proxy env. variable automatically 
>> defined by the BioPerl installation (I don't define and 
>> export it in my .bash_profile).
>> So when I'm logging in,             $http_proxy=http://ip_adress:port/
> 
> BioPerl can't permanently set any env. variables out of the box since

True, and it doesn't try to set one temporarily either.

To clarify some of the other points Chris made, the proxy variable 
certainly doesn't need a username and password to be defined (from LWP's 
point of view), since not all proxies authenticate. Of course accesses 
won't work if authentication is actually required and these aren't set.

There's no reason that no_proxy should have to be set. It is used to say 
what domains shouldn't be proxied. Either this is a real LWP bug, or 
somehow EUtilities or one of its bases is doing something wrong. It 
should be investigated...
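For reference, `no_proxy` is conventionally a comma-separated list of domains that should be contacted directly rather than through the proxy. The sketch below is a simplified suffix match for illustration only. The helper `bypasses_proxy` is hypothetical and is not LWP's internal code. Under this reading, setting `no_proxy` to an arbitrary word such as 'whatever' should not, by itself, change how eutils.ncbi.nlm.nih.gov is reached.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (simplified, not LWP's own code): true if $host
# should bypass the proxy according to a comma-separated no_proxy list.
sub bypasses_proxy {
    my ($host, $no_proxy) = @_;
    return 0 unless defined $no_proxy && length $no_proxy;
    for my $entry (split /\s*,\s*/, $no_proxy) {
        (my $domain = $entry) =~ s/^\.//;   # ".nih.gov" behaves like "nih.gov"
        next unless length $domain;
        # Match the host itself or any subdomain of $domain.
        return 1 if $host =~ /(?:^|\.)\Q$domain\E$/i;
    }
    return 0;
}

my $host = 'eutils.ncbi.nlm.nih.gov';
for my $setting ('whatever', 'nih.gov', '') {
    printf "no_proxy=%-10s -> %s\n", "'$setting'",
        bypasses_proxy($host, $setting) ? 'direct' : 'via proxy';
}
```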

It would be very informative if Anthony could log in without having 
changed any of his environment variables (so that the original problem 
manifests) and give us the results of:

perl -e 'while (($key, $val) = each %ENV) { print "$key => $val\n" }'
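The following sketch is a narrower variant of that one-liner, written as a small script. It prints only the proxy-related variables (names ending in `_proxy`, in any case) in sorted order, which can make a stray `http_proxy` or `HTTP_PROXY` entry easier to spot. The helper name `proxy_vars` is illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the sorted names of proxy-related entries (http_proxy,
# HTTP_PROXY, no_proxy, ...) from a hash of environment settings.
sub proxy_vars {
    my ($env) = @_;
    return [ sort { $a cmp $b } grep { /_proxy$/i } keys %$env ];
}

for my $key (@{ proxy_vars(\%ENV) }) {
    print "$key => $ENV{$key}\n";
}
```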


From avilella at gmail.com  Wed Dec 20 09:07:17 2006
From: avilella at gmail.com (Albert Vilella)
Date: Wed, 20 Dec 2006 14:07:17 +0000
Subject: [Bioperl-l] Using Muscle parameter within bioperl
In-Reply-To: <A8ACF950-F40E-4E8C-927E-23D2391E5074@vbi.vt.edu>
References: <A8ACF950-F40E-4E8C-927E-23D2391E5074@vbi.vt.edu>
Message-ID: <358f4d650612200607m4324b8f1r91d2d917cd4951bd@mail.gmail.com>

Try something like:

my @params =('verbose'=>0, 'quiet'=>1, 'log'=>'/tmp/inv.log');
my $factory = Bio::Tools::Run::Alignment::Muscle->new(@params);

It works for me with muscle 3.6. The log only gives me a start,
command string and end; I don't know if that is what muscle is
supposed to spit out.
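If the wrapper still does not pass the `-log` and `-quiet` options through, one possible fallback is to run muscle directly with the flags from the original command line. The sketch below only builds the argument list. The helper `muscle_command` is hypothetical and is not part of bioperl-run. Uncommenting the `system` call assumes a `muscle` binary on the PATH. Passing a list to `system` avoids shell quoting problems.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build the muscle argument list used on the
# command line in the original question.
sub muscle_command {
    my (%opt) = @_;
    my @cmd = ('muscle', '-in', $opt{in}, '-out', $opt{out});
    push @cmd, '-log', $opt{log} if defined $opt{log};
    push @cmd, '-verbose'        if $opt{verbose};
    push @cmd, '-quiet'          if $opt{quiet};
    return @cmd;
}

my @cmd = muscle_command(in => 'inv.fasta', out => 'inv.aln',
                         log => 'inv.log', verbose => 1, quiet => 1);
print "@cmd\n";
# To actually run it (requires muscle on the PATH):
# system(@cmd) == 0 or die "muscle failed: exit status $?\n";
```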

    Albert.

On 12/19/06, Shrinivasrao P. Mane <smane at vbi.vt.edu> wrote:
> Hi,
> I need to run muscle using bioperl. This is how I do it in command line.
>
> muscle -in inv.fasta -out inv.aln -log inv.log -verbose -quiet
>
> I used the following in perl script
>
> my $muscle = new Bio::Tools::Run::Alignment::Muscle(-format =>
> 'clustalw',  -verbose=>'', -quiet=>'', -log='inv.log');
>
> The program runs and produces the result file but it doesn't create a
> log file nor does it stop sending output to STDOUT (-quiet).
> Could anybody help me with this?
> Thanks
> Mane
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From cjfields at uiuc.edu  Wed Dec 20 17:46:35 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 20 Dec 2006 16:46:35 -0600
Subject: [Bioperl-l] Problem with : EUtilities - Proxy
In-Reply-To: <4589A507.60106@sendu.me.uk>
Message-ID: <000c01c72488$b6a690b0$15327e82@pyrimidine>


> Chris Fields wrote:
> >> First, I got a $http_proxy env. variable automatically 
> defined by the 
> >> BioPerl installation (I don't define and export it in my 
> >> .bash_profile).
> >> So when I'm logging in,             
> $http_proxy=http://ip_adress:port/
> > 
> > BioPerl can't permanently set any env. variables out of the 
> box since
> 
> True, and it doesn't try to set one temporarily either.
> 
> To clarify some of the other points Chris made, the proxy 
> variable certainly doesn't need username and password to be 
> defined (from LWPs point of view), since not all proxies 
> authenticate. Of course accesses won't work if authentication 
> is actually required and these aren't set.
>
> There's no reason that no_proxy should have to be set. It is 
> used to say what domains shouldn't be proxied. Either this is 
> a real LWP bug, or somehow EUtilities or one of its bases is 
> doing something wrong. It should be investigated...

Actually, after some investigation I reproduced the error and committed a fix.

If I set HTTP_PROXY to a dummy value (on WinXP), I get the same error:

Can't use an undefined value as an ARRAY reference at
C:/Perl/lib/LWP/UserAgent.pm line 787.

It's EUtilities-specific, as other WebAgents that have proxy settings do not
have the same problem, though I haven't checked any WebAgent-based classes.
I think this may also partly be an LWP bug, as setting env_proxy to
TRUE/FALSE after instantiation doesn't seem to have an effect, but passing it
(env_proxy => 1) to the constructor fixes the problem.  Anthony, I have
committed a fix for GenericWebDBI and EUtilities to CVS.  Could you try it
out?

-chris


From cjfields at uiuc.edu  Wed Dec 20 18:19:59 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 20 Dec 2006 17:19:59 -0600
Subject: [Bioperl-l] Problem with : EUtilities - Proxy
In-Reply-To: <000301c72464$9a12a070$15327e82@pyrimidine>
Message-ID: <000001c7248d$5e7df450$15327e82@pyrimidine>

> > First, I got a $http_proxy env. variable automatically 
> defined by the 
> > BioPerl installation (I don't define and export it in my 
> > .bash_profile).
> > So when I'm logging in,             
> $http_proxy=http://ip_adress:port/

Anthony,

Sorry about the prior long-winded response.  I managed to reproduce the
error about five minutes after I responded and traced the problem back
to GenericWebDBI.  The issue seems to be that the LWP::UserAgent
env_proxy method doesn't take effect when called after instantiation;
setting it to 0 or 1 doesn't seem to do anything.  If I add it to the
list of args for chained instantiation in the constructor:

    my $self = $class->SUPER::new(@args, env_proxy => 1);

it suddenly works like a charm.  Hard to know why it's being fussy...
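The constructor-chaining pattern in that line can be illustrated in plain Perl. `Demo::Agent` and `Demo::WebDB` below are hypothetical stand-ins, not the real LWP::UserAgent or GenericWebDBI code. The subclass appends `env_proxy => 1` to the caller's arguments. When a list is assigned to a hash, later keys win, so the forced value overrides any `env_proxy` the caller passed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package Demo::Agent;          # stand-in for a base class such as LWP::UserAgent
sub new {
    my ($class, %args) = @_;  # later duplicate keys overwrite earlier ones
    return bless { env_proxy => $args{env_proxy} ? 1 : 0 }, $class;
}

package Demo::WebDB;          # stand-in for a subclass like GenericWebDBI
our @ISA = ('Demo::Agent');
sub new {
    my ($class, @args) = @_;
    # Chain to the parent constructor, always asking for env_proxy.
    my $self = $class->SUPER::new(@args, env_proxy => 1);
    return $self;
}

package main;
my $db = Demo::WebDB->new(env_proxy => 0);
print "env_proxy is $db->{env_proxy}\n";   # the forced value wins
```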

I'm going to try reproducing this on a few platforms and check to see if it
has been reported as an LWP bug.  I have also committed a fix to CVS if you
want to test it out.

Chris


From jnewcomer at jhu.edu  Wed Dec 20 20:56:10 2006
From: jnewcomer at jhu.edu (Joe Newcomer)
Date: Wed, 20 Dec 2006 20:56:10 -0500
Subject: [Bioperl-l]  a stupid question
Message-ID: <002101c724a3$2ff80100$bd59dc80@aap.jhu.edu>

Hello Paul Leo,
I am with Johns Hopkins University Advanced Academic Programs.  I am trying
to contact a student named Paul Leo who has registered for Protein
Bioinformatics.  If this is you please email me.  I would like to send you
information about the spring course.

Respectfully, 
Joe Newcomer  (410) 516-5047
Online Education


From anhthu.tieu at gsf.de  Thu Dec 21 05:10:47 2006
From: anhthu.tieu at gsf.de (Anh-Thu Tieu)
Date: Thu, 21 Dec 2006 11:10:47 +0100
Subject: [Bioperl-l] imagemaps with heterogeneous_segments
Message-ID: <458A5DA7.1010802@gsf.de>

Hi,

I use bioperl 1.5.2 and have been wondering whether it is possible to 
apply the image_and_map function with the glyph option 
"heterogeneous_segments". Up to now I can successfully create an 
underlying imagemap for the entire track. However, what I want is to 
create an imagemap for each single segment on my track/glyph. Does 
anyone know how to realise this? Any help is appreciated.

Thanks a lot.

Anh Thu


From somil.sharma1 at gmail.com  Thu Dec 21 01:22:24 2006
From: somil.sharma1 at gmail.com (Somil Sharma)
Date: Thu, 21 Dec 2006 14:22:24 +0800
Subject: [Bioperl-l] problem
Message-ID: <4e6b524e0612202222t569cba11h3c10c9c11e64185f@mail.gmail.com>

hello

i  run this program

#!/use/bin/perl

use Bio::DB::GenBank;

$gb = new Bio::DB::GenBank;
$seq1 = $gb->get_Seq_by_id('MUSIGHBA1');
print $seq1;

and got this error on cmd line--

---------- EXCEPTION  -------------
MSG: WebDBSeqI Request Error:
500 Can't connect to eutils.ncbi.nlm.nih.gov:80 (connect: Unknown error)
Content-Type: text/plain
Client-Date: Thu, 21 Dec 2006 06:28:33 GMT
Client-Warning: Internal response

500 Can't connect to eutils.ncbi.nlm.nih.gov:80 (connect: Unknown error)

STACK Bio::DB::WebDBSeqI::_request C:/Perl/lib/Bio/DB/WebDBSeqI.pm:685
STACK Bio::DB::WebDBSeqI::get_seq_stream C:/Perl/lib/Bio/DB/WebDBSeqI.pm:491
STACK Bio::DB::WebDBSeqI::get_Stream_by_id C:/Perl/lib/Bio/DB/WebDBSeqI.pm:27
STACK Bio::DB::WebDBSeqI::get_Seq_by_id C:/Perl/lib/Bio/DB/WebDBSeqI.pm:145
STACK toplevel C:\Perl\a2.pl:5

plz see if u can help me out.

my ppm is also not able to install Bioperl so i did that also manually.

waiting for ur reply


From granjeau at tagc.univ-mrs.fr  Thu Dec 21 06:14:25 2006
From: granjeau at tagc.univ-mrs.fr (Samuel GRANJEAUD - IR/IFR137)
Date: Thu, 21 Dec 2006 12:14:25 +0100
Subject: [Bioperl-l] BioFetch: Adding databases
Message-ID: <458A6C91.7090000@tagc.univ-mrs.fr>

Hello!

I needed to query the Unisave database at EBI. To date, the only way 
to access it is the dbfetch web service 
(http://www.ebi.ac.uk/cgi-bin/dbfetch). This database is not yet defined 
in the BioFetch package 
(http://doc.bioperl.org/bioperl-live/Bio/DB/BioFetch.html). I wrote 
these few lines to make it work, but I don't think it is good 
programming practice. Maybe it makes sense to define a method to add 
databases to FORMATMAP, in order to keep up with changes to the dbfetch service.

Cheers,
--Samuel

use strict;
use warnings;
use Bio::DB::BioFetch;

# Register the Unisave database in BioFetch's format map.
$Bio::DB::BioFetch::FORMATMAP{unisave} = {
    default   => 'swiss',
    swissprot => 'swiss',
    fasta     => 'fasta',
    namespace => 'unisave',
};
my $bf = Bio::DB::BioFetch->new(-db => 'unisave');
my $seq = $bf->get_Seq_by_id('LAM1_MOUSE');

print $seq->display_id(), "\n";
print $seq->seq(), "\n";
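Samuel's suggestion of a method for registering additional dbfetch databases might look roughly like the sketch below. It works on a plain hash with the same shape as the `FORMATMAP` entry above. `add_format_map` is a hypothetical helper and is not part of Bio::DB::BioFetch. With the real module, one would presumably pass `\%Bio::DB::BioFetch::FORMATMAP` instead of the local hash.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: register a dbfetch database in a hash shaped like
# Bio::DB::BioFetch's %FORMATMAP (format name => Bio::SeqIO format).
sub add_format_map {
    my ($map, $db, %formats) = @_;
    die "add_format_map: '$db' needs a default format\n"
        unless defined $formats{default};
    # Default the namespace to the database name, as in the unisave entry.
    $formats{namespace} = $db unless defined $formats{namespace};
    $map->{$db} = { %formats };
    return $map->{$db};
}

my %format_map;   # stands in for %Bio::DB::BioFetch::FORMATMAP
add_format_map(\%format_map, 'unisave',
    default   => 'swiss',
    swissprot => 'swiss',
    fasta     => 'fasta',
);

for my $key (sort keys %{ $format_map{unisave} }) {
    print "$key => $format_map{unisave}{$key}\n";
}
```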


From cain at cshl.edu  Thu Dec 21 08:56:21 2006
From: cain at cshl.edu (Scott Cain)
Date: Thu, 21 Dec 2006 08:56:21 -0500
Subject: [Bioperl-l] problem
In-Reply-To: <4e6b524e0612202222t569cba11h3c10c9c11e64185f@mail.gmail.com>
References: <4e6b524e0612202222t569cba11h3c10c9c11e64185f@mail.gmail.com>
Message-ID: <1166709381.3739.47.camel@localhost.localdomain>

Hello,

It looks to me like you have a networking problem that doesn't have
anything to do with BioPerl.  When I run your script, I get:

Bio::Seq::RichSeq=HASH(0x97013e0)

Fairly quickly, too.  Can you get to http://eutils.ncbi.nlm.nih.gov/ in
a browser without proxy settings?

As an aside, you probably don't really want the HASH stuff above, so I
modified your script to look like this, with warnings and strict to make
future debugging easier:

#!/usr/bin/perl -w
use strict;

use Bio::DB::GenBank;

my $gb = Bio::DB::GenBank->new;
my $seq1 = $gb->get_Seq_by_id('MUSIGHBA1');
print $seq1->seq, "\n";


Scott


On Thu, 2006-12-21 at 14:22 +0800, Somil Sharma wrote:
> hello
> 
> i  run this program
> 
> #!/use/bin/perl
> 
> use Bio::DB::GenBank;
> 
> $gb = new Bio::DB::GenBank;
> $seq1 = $gb->get_Seq_by_id('MUSIGHBA1');
> print $seq1;
> 
> and got this error on cmd line--
> 
> ---------- EXCEPTION  -------------
> MSG: WebDBSeqI Request Error:
> 500 Can't connect to eutils.ncbi.nlm.nih.gov:80 (connect: Unknown error)
> Content-Type: text/plain
> Client-Date: Thu, 21 Dec 2006 06:28:33 GMT
> Client-Warning: Internal response
> 
> 500 Can't connect to eutils.ncbi.nlm.nih.gov:80 (connect: Unknown error)
> 
> STACK Bio::DB::WebDBSeqI::_request C:/Perl/lib/Bio/DB/WebDBSeqI.pm:685
> STACK Bio::DB::WebDBSeqI::get_seq_stream C:/Perl/lib/Bio/DB/WebDBSeqI.pm:491
> STACK Bio::DB::WebDBSeqI::get_Stream_by_id C:/Perl/lib/Bio/DB/WebDBSeqI.pm:27
> STACK Bio::DB::WebDBSeqI::get_Seq_by_id C:/Perl/lib/Bio/DB/WebDBSeqI.pm:145
> STACK toplevel C:\Perl\a2.pl:5
> 
> plz see if u can help me out.
> 
> my ppm is also not able to install Bioperl so i did that also manually.
> 
> waiting for ur reply
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061221/f63031e2/attachment-0002.bin>

From cjfields at uiuc.edu  Thu Dec 21 09:28:07 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 21 Dec 2006 08:28:07 -0600
Subject: [Bioperl-l] BioFetch: Adding databases
In-Reply-To: <458A6C91.7090000@tagc.univ-mrs.fr>
References: <458A6C91.7090000@tagc.univ-mrs.fr>
Message-ID: <193C6D1C-6374-4A86-9FBD-7FA994D5FDDF@uiuc.edu>

I've added this to the BioFetch FORMATMAP as 'unisave' and committed  
to CVS.  Thanks!

chris

On Dec 21, 2006, at 5:14 AM, Samuel GRANJEAUD - IR/IFR137 wrote:

> Hello!
>
> I needed to query the Unisave database at EBI. Up to date, the only  
> way
> to access it is the dbfetch web service
> (http://www.ebi.ac.uk/cgi-bin/dbfetch). This database is not yet  
> defined
> in the BioFetch package
> (http://doc.bioperl.org/bioperl-live/Bio/DB/BioFetch.html). I wrote
> these few lines to make it work, but I don't think it fits a good
> programming practice. May be it makes sense to defined a method to add
> databases to FORMATMAP, in order to follow the dbfetch service  
> evolutions.
>
> Cheers,
> --Samuel
>
> use Bio::DB::BioFetch;
> $Bio::DB::BioFetch::FORMATMAP{unisave} = {
> default   => 'swiss',
> swissprot => 'swiss',
> fasta     => 'fasta',
> namespace => 'unisave',
> };
> my $bf = new Bio::DB::BioFetch(-db=>'unisave');
> my $seq = $bf->get_Seq_by_id('LAM1_MOUSE');
>
> print $seq->display_id();
> print $seq->seq();
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From anhthu.tieu at gsf.de  Thu Dec 21 09:31:45 2006
From: anhthu.tieu at gsf.de (Anh-Thu Tieu)
Date: Thu, 21 Dec 2006 15:31:45 +0100
Subject: [Bioperl-l] multiple glyph elements in one track
Message-ID: <458A9AD1.50907@gsf.de>

Hello,

I use bioperl 1.5.2. Does anyone know how I could create two separate 
glyph elements on the same track with the Bio::Graphics::Panel module? 
My aim is to have two (e.g. two different) clickable imagemap elements 
on the same track. Until now I can merely create two glyph elements 
(transcript2 or generic options) per track with only one imagemap 
element (i.e. the same imagemap element is used for the entire track, as 
the coordinates of the entire glyph (=both elements) are returned to the 
image_and_map function as one set of coordinates).

Thank you for your help.

Best regards,

Anh Thu


From cain at cshl.edu  Thu Dec 21 09:47:32 2006
From: cain at cshl.edu (Scott Cain)
Date: Thu, 21 Dec 2006 09:47:32 -0500
Subject: [Bioperl-l] multiple glyph elements in one track
In-Reply-To: <458A9AD1.50907@gsf.de>
References: <458A9AD1.50907@gsf.de>
Message-ID: <1166712453.3739.53.camel@localhost.localdomain>

Hello Anh Thu,

You can provide a callback for the glyph argument that returns different
glyphs depending on what you want to do (i.e., how you've coded your
callback).  This example is from the perldoc for Bio::Graphics::Panel:

        $panel->add_track(\@exons,
                          -glyph => sub { my $feature = shift;
                                          $feature->source_tag eq 'curated'
                                                    ? 'ellipse' : 'generic'; }
                         );
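The `-glyph` callback is an ordinary code reference that receives a feature and returns a glyph name, so its logic can be tested outside a panel. The sketch uses a hypothetical `StubFeature` class that provides only `source_tag`. In a real panel, the features would come from your own data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package StubFeature;   # hypothetical minimal feature with a source_tag
sub new        { my ($class, $tag) = @_; return bless { tag => $tag }, $class }
sub source_tag { $_[0]{tag} }

package main;

# Same shape as the -glyph callback above: feature in, glyph name out.
my $glyph_for = sub {
    my $feature = shift;
    return $feature->source_tag eq 'curated' ? 'ellipse' : 'generic';
};

for my $tag (qw(curated predicted)) {
    printf "%-9s -> %s\n", $tag, $glyph_for->(StubFeature->new($tag));
}
```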

Scott

 
On Thu, 2006-12-21 at 15:31 +0100, Anh-Thu Tieu wrote:
> Hello,
> 
>  I use bioperl 1.5.2. Does anyone know how I could create two seperate 
> glyph elements on the same track with the Bio::Graphics::Panel module? 
> My aim is to have two (e.g. two different) clickable imagemap elements 
> on the same track. Until now I can merely create two glyph elements 
> (transcript2 or generic options) per track with only one imagemap 
> element (e.g. the same imagemap element is used for the entire track as 
> the entire (=both elements) glyph's coordinates are returned to the 
> image_and_map function as one set of coordinate).
> 
> Thank you for your help.
> 
> Best regards,
> 
> Anh Thu
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From cain.cshl at gmail.com  Thu Dec 21 15:03:48 2006
From: cain.cshl at gmail.com (Scott Cain)
Date: Thu, 21 Dec 2006 15:03:48 -0500
Subject: [Bioperl-l] problems installing bioperl
In-Reply-To: <1166729231.458ae00ff184b@www.studentmail.otago.ac.nz>
References: <1166519755.4587adcb141d3@www.studentmail.otago.ac.nz>
	<45880167.9010605@sendu.me.uk>
	<1166542310.6981.119.camel@localhost.localdomain>
	<1166604008.4588f6e87cccc@www.studentmail.otago.ac.nz>
	<1166621113.3739.11.camel@localhost.localdomain>
	<1166642653.45898dddbd8cf@www.studentmail.otago.ac.nz>
	<1166643051.3739.28.camel@localhost.localdomain>
	<1166729231.458ae00ff184b@www.studentmail.otago.ac.nz>
Message-ID: <1166731428.3739.71.camel@localhost.localdomain>

Hi Stephan,

About your bioperl mail: did you cancel it, or did it just disappear?
If the latter, I might have accidentally deleted it, sorry :-/

So 'GBrowse is running' means that you can see the sample yeast chr1
database, browse around, etc, right?  I still don't know what is up with
the warning but my guess is that everything is working there.

As for your question about writing a callback, the reason it's not
working is that the attributes method returns a list (typically but not
always with only one element), so what your test is really checking is
"number of elements in the list > 1200", which is almost always going to
be false.  You should change it to this:

  my $feature = shift;
  my ($score) = $feature->attributes('score');
  if ($score > 1200) {
  ...etc...
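The following example reproduces the effect in plain Perl. `attributes_stub` is a hypothetical stand-in. It assumes the real method returns an array, as described above. In scalar context (for example, inside a numeric comparison), an array yields its element count. In contrast, `my ($score) = ...` assigns in list context and captures the first element.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for a method that returns a list of values.
sub attributes_stub {
    my @values = (1500);
    return @values;
}

my $in_scalar = attributes_stub();    # scalar context: number of elements
my ($score)   = attributes_stub();    # list context: first element

print "scalar context gives $in_scalar, list context gives $score\n";
print "score test: ", ($score > 1200 ? 'blue' : 'pink'), "\n";
```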

Finally, if you really want to test that you are using the correct
bioperl, you can put this simple cgi in your cgi-bin directory as
test_biographics.pl, set it as world executable and go to
http://localhost/cgi-bin/test_biographics.pl (and, yes, I use strict and
warnings even when the script is only 10 lines long :-)  :

#!/usr/bin/perl
use strict;
use warnings;
use Bio::Graphics::Panel;
use CGI qw/:standard/;

print header(),
      start_html,
      p("Bio::Graphics::Panel api_version is ".Bio::Graphics::Panel->api_version),
      p("It should be 1.654 for BioPerl 1.5.2"),
      end_html;
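If more than one BioPerl is installed, it also helps to check which file Perl actually loads. This sketch relies on `%INC`, which maps each loaded module file to the path it came from. It defaults to the core module List::Util, so it runs anywhere. If you pass `Bio::Graphics::Panel` as an argument, under the same perl and @INC that GBrowse uses, it should show which copy is picked up. The helper name `module_path` is illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Load a module by name and return the file Perl actually loaded it from.
sub module_path {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # Bio::Graphics::Panel -> Bio/Graphics/Panel.pm
    require $file;                            # dies if the module is not in @INC
    return $INC{$file};
}

my $module  = @ARGV ? $ARGV[0] : 'List::Util';
my $path    = module_path($module);
my $version = $module->VERSION;
print "$module ", (defined $version ? $version : '(no $VERSION)'), " from $path\n";
```

For example, save it under any name and run `perl which_module.pl Bio::Graphics::Panel`.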

Scott


On Fri, 2006-12-22 at 08:27 +1300, Stephan Roessner wrote:
> Hi Scott,
> 
> responded to group but did get through.
> So I reply back to you.
> 
> I installed Class-Base-0.03 using CPAN.
> 
> Reinstalling GBrowse gives me still a warning like:
> Warning: prerequisite Bio::Perl 1.52 not found. We have 1.0050021.
> Writing Makefile for Bio::Graphocs::Browser::CAlign
> Writing Makefile for Generic-Genome-Browser.
> 
> GBrowse is running but I cannot access attributes and/or the score column
> of .gff files. Is this related to the warning?
> 
> To get an attribute I use
> 
> my $feature = shift;
> if ($feature->attributes('score') > 1200) {
>   return 'blue';
> } else {
>   return 'pink';
> }
> But I retrieve not data using $feature->
> 
> Can I somehaow verify what bioperl version GBrowse is using?
> 
> Stephan,
> 
> 
> 
> Quoting Scott Cain <cain.cshl at gmail.com>:
> 
> > Stephan,
> >
> > Yes, it is in cpan:
> >
> > http://search.cpan.org/~abw/Class-Base-0.03/lib/Class/Base.pm
> >
> > The cpan shell should be able to install it.
> >
> > Whether or not that works, please respond to the mailing list so that
> > the rest of the conversation can be archived.
> >
> > Scott
> >
> >
> > On Thu, 2006-12-21 at 08:24 +1300, Stephan Roessner wrote:
> > > Hi Scott,
> > >
> > > No I didn't.
> > > I had a look and couldn't find it.
> > > It is not part of CPAN?
> > >
> > > Stephan
> > >
> > >
> > > Quoting Scott Cain <cain.cshl at gmail.com>:
> > >
> > > > Stephan,
> > > >
> > > > Did you install Class::Base?  It was inadvertantly left out the install
> > > > document, but is required.
> > > >
> > > > Scott
> > > >
> > > >
> > > > On Wed, 2006-12-20 at 21:40 +1300, Stephan Roessner wrote:
> > > > > Hi all,
> > > > >
> > > > > I did sudo ./Build install --uninst 1 and got the error
> > > > > * ERROR: Confiduration was initially created with MOdule::Build version
> > > > > '0.2805', but we are now using '0.2806'. ...
> > > > >
> > > > > So I ran perl Build.PL and got the message
> > > > > Creating new 'Buid' script for 'bioperl' verion '1.0050021'.
> > > > >
> > > > > I did run sudo ./Build install --uninst 1 again.
> > > > > Seems to be fine with no error messages.
> > > > >
> > > > > When I run perl Makefile.PL for GBrowse 1.66-RC2 it results in
> > > > >
> > > > > Warning: prerequisite Bio::Perl 1.52 not found. We have 1.0050021.
> > > > > Warning: prerequisite Class::Base 0 not found.
> > > > > Writing Makefile for Bio::Graphocs::Browser::CAlign
> > > > > Writing Makefile for Generic-Genome-Browser
> > > > >
> > > > > GBrowse is running but I have really troubles with aggregators trying to
> > > > > use xyplot. It does not plot anything. So I thought the bioperl could be
> > > > > the problem.
> > > > >
> > > > > Stephan
> > > > >
> > > > >
> > > > >
> > > > > Quoting Scott Cain <cain at cshl.edu>:
> > > > >
> > > > > > I really don't think the BioPerl version detection is wrong.  I
> > > > > > actually don't check Bio::Root::Version::VERSION in Makefile.PL, I
> > > > > > check Bio::Graphics::Panel->api_version.  When it doesn't find the
> > > > > > correct api_version, it gives a warning the BioPerl 1.5.2 is not
> > > > > > installed.  I have seen this happen when more than one BioPerl
> > > > > > instance is installed and `perl Makefile.PL` finds the wrong one
> > > > > > first.  My suggestion is to try reinstalling BioPerl and providing
> > > > > > the --uninst 1 argument to remove older versions of BioPerl:
> > > > > >
> > > > > >   sudo ./Build install --uninst 1
> > > > > >
> > > > > > Scott
> > > > > >
> > > > > >
> > > > > > On Tue, 2006-12-19 at 15:12 +0000, Sendu Bala wrote:
> > > > > > > Stephan Roessner wrote:
> > > > > > > > Dear support team,
> > > > > > > >
> > > > > > > > I installed bioperl 1.5.2_100 on a ferdora machine to be able to
> > > > > > > > use gbrowse.
> > > > > > > > The installation seems to work (except of the test failures) but
> > > > > > > > the gbrowse installation tells me that BIO::pERL 1.0050021 is
> > > > > > > > installed, but of course it requires 1.52.
> > > > > > > >
> > > > > > > > Is there a chance to find out what went wrong?
> > > > > > >
> > > > > > > Nothing went wrong with the Bioperl installation (well, expect there
> > > > > > > shouldn't have been any test failures - can you post those please?).
> > > > > > > gbrowse simply defined its Bioperl requirement incorrectly. If you
> > > > > > > tell me exactly where you downloaded gbrowse from and how you went
> > > > > > > about installing it, and provide the exact, complete error message
> > > > > > > you got from it, I might be able help the authors fix the problem.
> > > > > > >
> > > > > > > Or I'm pretty sure they can figure it our for themselves :)
> > > > > > --
> > > > > > ------------------------------------------------------------------------
> > > > > > Scott Cain, Ph. D.                                         cain at cshl.edu
> > > > > > GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
> > > > > > Cold Spring Harbor Laboratory
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > --
> > > > ------------------------------------------------------------------------
> > > > Scott Cain, Ph. D.                                   cain.cshl at gmail.com
> > > > GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
> > > > Cold Spring Harbor Laboratory
> > > >
> > > >
> > >
> > >
> > >
> > --
> > ------------------------------------------------------------------------
> > Scott Cain, Ph. D.                                   cain.cshl at gmail.com
> > GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
> > Cold Spring Harbor Laboratory
> >
> >
> 
> 
> 
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory


From rvosa at sfu.ca  Sat Dec 23 17:17:37 2006
From: rvosa at sfu.ca (Rutger Vos)
Date: Sat, 23 Dec 2006 14:17:37 -0800
Subject: [Bioperl-l] [Summary] Re: suggestions for suitable 'taxon' object
In-Reply-To: <200612200423.kBK4NKDt009254@rm-rstar.sfu.ca>
References: <200612200423.kBK4NKDt009254@rm-rstar.sfu.ca>
Message-ID: <458DAB01.6080200@sfu.ca>

The replies I've received so far indicate I should look into Bio::Taxon. 
I will probably come back with further questions/discussions as to how 
to link and cross-reference taxa, sequences and nodes, but for now I 
should first look at the Bio::Taxon API (and unpack my other Christmas 
gifts). Thank you for all comments and suggestions.

Happy holidays!

Rutger


Rutger Vos wrote:
> Hi all,
>
> I am looking for a bioperl object that can be abused to function as a
> suitable 'taxon' object, where I mean 'taxon' as understood by the NEXUS
> file format (i.e. not strictly an entity from a taxonomy, but more loosely
> an OTU). 
>
> The object would primarily function as a way to relate nodes in trees to
> sequences in an alignment (a foreign key that both nodes and sequences refer
> to), and secondarily as the keeper of the canonical name of the OTU, such
> that a sequence named 'Homo_sapiens|EF177447.1/12-56' and a node named 'Homo
> sapiens (constrained monophyly)' can still be understood to refer to the
> same thing - the OTU 'Homo sapiens sapiens' (for example).
>
> I was thinking that a (possibly expanded) Bio::Species might work if there
> was some sensible way of appending references to node and sequence objects
> to it (or otherwise associate them with each other), but I am writing *to
> solicit any and all suggestions*. I am looking for something similar to
> Bio::Phylo::Taxa::Taxon.
>
> Any and all comments and suggestions greatly appreciated!
>
> Best wishes,
>
> Rutger Vos

-- 
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 Rutger A. Vos
 Postdoctoral research fellow
 University of British Columbia
 Personal site: http://www.sfu.ca/~rvosa
        CIPRES: http://www.phylo.org
    Bio::Phylo: http://search.cpan.org/~rvosa/Bio-Phylo/
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++


From paul.boutros at utoronto.ca  Sat Dec 23 22:36:59 2006
From: paul.boutros at utoronto.ca (Paul Boutros)
Date: Sat, 23 Dec 2006 22:36:59 -0500
Subject: [Bioperl-l] Bio::Graphics::Glyph::dna
Message-ID: <20061223223659.7sgfofa44mw4okks@webmail.utoronto.ca>

Hi,

I've been trying to get the dna glyph working and have had some
problems.  This is ActiveState Perl 5.8.8 (build 819) and BioPerl 1.5.2
on WinXP.  I'm starting with a FASTA file, so I've tried:
$panel->add_track(
	$seq,
	-glyph     =>   'dna',
	-do_gc     =>   'true',
	-gc_window =>   'auto'
	);

where $seq is a Bio::Seq object

and I've tried it using a GFF $segment:
my $db = Bio::DB::GFF->new(
          -adaptor=>    'berkeleydb',
          -create =>    1,
          -dsn    =>    'temp'
          );

$db->load_sequence_string(
           $seq->primary_id(),
           $seq->seq()
           );

my $segment = Bio::DB::GFF::Segment->new(
           $db,
           $seq->primary_id(),
           $seq->primary_id(),
           1,
           $seq->length()
           );

$panel->add_track(
	$segment,
	-glyph     =>   'dna',
	-do_gc     =>   'true',
	-gc_window =>   'auto'
	);


From paul.boutros at utoronto.ca  Sat Dec 23 22:46:27 2006
From: paul.boutros at utoronto.ca (Paul Boutros)
Date: Sat, 23 Dec 2006 22:46:27 -0500
Subject: [Bioperl-l] How to use Bio::Graphics::Glyph::dna?
Message-ID: <20061223224627.qezpabv9f74ocowk@webmail.utoronto.ca>

Hello,

I'm trying to get the dna glyph of Bio::Graphics to work and am having
some problems.  I'm starting with a FASTA file, and I am running Perl
5.8.8 (ActiveState build 819) on WinXP and BioPerl 1.5.2.

If I try simply using a Bio::Seq object like this:
$panel->add_track(
	$seq,
	-glyph     =>   'dna',
	-do_gc     =>   'true',
	-gc_window =>   'auto'
	);

I get the error:
Can't locate object method "start" via package "Bio::Seq" at C:/Perl/site/lib/Bio/Graphics/FeatureBase.pm line 164.

And if I try creating a Bio::DB::GFF::Segment object like this:
my $db = Bio::DB::GFF->new(
	-adaptor  => 'berkeleydb',
	-create   => 1,
	-dsn      => '/usr/local/share/gff/dmel'
	);

$db->initialize(1);

$db->load_sequence_string(
	$seq->primary_id(),
	$seq->seq()
	);

my $segment = Bio::DB::GFF::Segment->new(
	$db,
	$seq->primary_id(),
	$seq->primary_id(),
	1,
	$seq->length()
	);

$panel->add_track(
	$segment,
	-glyph     =>   'dna',
	-do_gc     =>   'true',
	-gc_window =>   'auto'
	);

I get the error:
------------- EXCEPTION: Bio::Root::NotImplemented -------------
MSG: Abstract method "Bio::FeatureHolderI::get_SeqFeatures" is not implemented by package Bio::DB::GFF::Segment.
This is not your fault - author of Bio::DB::GFF::Segment should be blamed!

STACK: Error::throw
STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359
STACK: Bio::Root::RootI::throw_not_implemented C:/Perl/site/lib/Bio/Root/RootI.pm:522
STACK: Bio::FeatureHolderI::get_SeqFeatures C:/Perl/site/lib/Bio/FeatureHolderI.pm:101
STACK: Bio::Graphics::Glyph::_subfeat C:/Perl/site/lib/Bio/Graphics/Glyph.pm:1186
STACK: Bio::Graphics::Glyph::subfeat C:/Perl/site/lib/Bio/Graphics/Glyph.pm:1167
STACK: Bio::Graphics::Glyph::new C:/Perl/site/lib/Bio/Graphics/Glyph.pm:56
STACK: Bio::Graphics::Glyph::Factory::make_glyph C:/Perl/site/lib/Bio/Graphics/Glyph/Factory.pm:316
STACK: Bio::Graphics::Glyph::new C:/Perl/site/lib/Bio/Graphics/Glyph.pm:81
STACK: Bio::Graphics::Glyph::Factory::make_glyph C:/Perl/site/lib/Bio/Graphics/Glyph/Factory.pm:316
STACK: Bio::Graphics::Panel::_add_track C:/Perl/site/lib/Bio/Graphics/Panel.pm:388
STACK: Bio::Graphics::Panel::_do_add_track C:/Perl/site/lib/Bio/Graphics/Panel.pm:360
STACK: Bio::Graphics::Panel::add_track C:/Perl/site/lib/Bio/Graphics/Panel.pm:288
STACK: create_figure.pl:147
----------------------------------------------------------------

I'm really unsure what to try next, any suggestions much appreciated!
Paul


From lstein at cshl.edu  Sun Dec 24 12:23:18 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Sun, 24 Dec 2006 12:23:18 -0500
Subject: [Bioperl-l] How to use Bio::Graphics::Glyph::dna?
In-Reply-To: <20061223224627.qezpabv9f74ocowk@webmail.utoronto.ca>
References: <20061223224627.qezpabv9f74ocowk@webmail.utoronto.ca>
Message-ID: <6dce9a0b0612240923v24ebafffs5c280d9cb4c65263@mail.gmail.com>

Hi,

You need to use either a Bio::SeqFeature::Generic object (with an attached
Bio::PrimarySeq) or a Bio::Graphics::Feature object. You are not intended to
create Bio::DB::GFF::Segment objects directly.

e.g.
my $dna = Bio::PrimarySeq->new(-seq=>'a'x1000);
my $feature = Bio::SeqFeature::Generic->new(-start=>1,-end=>800);
$feature->attach_seq($dna);

Best,

Lincoln



-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu


From tgenahmet at gmail.com  Wed Dec 27 16:38:43 2006
From: tgenahmet at gmail.com (Ahmet Kurdoglu)
Date: Wed, 27 Dec 2006 14:38:43 -0700
Subject: [Bioperl-l] get mRNA details for a gene
Message-ID: <9d8d0e2a0612271338t7cb15a63v5a08f624888b3f7b@mail.gmail.com>

Hi,

This is my first message to the list. I hope I get it right. Here is what
I'm trying to accomplish:

Get the mRNA details for a given gene (e.g. DNASE2B) from its GenBank file.

Using the web interface, I can search the 'gene' database with this query:
DNASE2B [sym] AND homo sapiens [ORGN] (which returns only one result),
then get the GenBank file by clicking on NC_000001.9, where I can see the
details for its two mRNAs. (I eventually need to get the exon locations
for both of its transcripts.)

However, doing this in Perl has proved very difficult for me. I've tried
various methods, including get_Seq_by_id, get_Seq_by_gi, and
get_Stream_by_query. Before I explain in detail what I did, I'd like to
hear your ideas on how to accomplish this.

Thank you.
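For the exon-location part of this question: in a GenBank record, each mRNA feature's location is written as a list of exon ranges such as join(1000..1200,1500..1700), possibly wrapped in complement(...). Within BioPerl, a feature's location object offers each_Location for walking these sub-ranges; the sketch below does the equivalent in plain Perl on the raw location string, purely as an illustration. parse_exons is a hypothetical helper, and it assumes simple local ranges only (no per-exon complement(...) and no references to other accessions).

```perl
use strict;
use warnings;

# Hypothetical helper: split a simple GenBank location string into a
# strand (+1 or -1) and a list of [start, end] exon ranges.
# Handles an outer complement(...), an outer join(...), and the partial
# markers < and >. Per-exon complement(...) and remote references are
# not supported and cause a die().
sub parse_exons {
    my ($loc) = @_;
    my $strand = 1;
    if ($loc =~ /^complement\((.*)\)$/) {    # whole feature on minus strand
        $strand = -1;
        $loc    = $1;
    }
    $loc = $1 if $loc =~ /^join\((.*)\)$/;    # strip an outer join(...)
    my @exons;
    for my $part (split /,/, $loc) {
        $part =~ s/[<>]//g;                   # drop partial-end markers
        my ($start, $end) = $part =~ /^(\d+)\.\.(\d+)$/
            or die "unsupported location component: $part\n";
        push @exons, [ $start, $end ];
    }
    return ($strand, @exons);
}

# Example: a two-exon transcript on the minus strand, partial at both ends
my ($strand, @exons) = parse_exons('complement(join(<100..250,400..>520))');
printf "strand %+d: %s\n", $strand, join ', ', map { "$_->[0]-$_->[1]" } @exons;
```

The coordinates come back exactly as written in the record, i.e. relative to the sequence the feature sits on (here that would be the NC_000001.9 contig).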


From sdavis2 at mail.nih.gov  Thu Dec 28 16:57:03 2006
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Thu, 28 Dec 2006 16:57:03 -0500
Subject: [Bioperl-l] [Bioperl-microarray] SOFT parsers
In-Reply-To: <FC173C9E9BE18F45A3241288B723D64C1F16D4@hsv-exmail03.operonads.local>
References: <FC173C9E9BE18F45A3241288B723D64C1F16D4@hsv-exmail03.operonads.local>
Message-ID: <45943DAF.70100@mail.nih.gov>

Michael Muratet US-Huntsville wrote:
> Sean
>
> Thanks. I did consider the bioconductor package and downloaded your
> write-up after it was recommended by the GEO folks. I've looked at R a
> few times, but I never got proficient at it. I'm a lot better with perl.
>
> I've been looking at MINiML, too. It looked like it might be easier to
> parse the SOFT file since the data is in-line with the attributes and
> I'd have to use a SAX parser (not enough memory for DOM) for MINiML.
>
> NCBI must have parsers to get the data into their databases. Do you know
> what they use?
>   
Michael,

You might want to look more specifically at the MINiML format specs.  
The data tables are stored as separate tab-delimited files with an 
external reference in the XML, so DOM parsing is possible with just a 
few kB of memory.  Of course, reading all of the data into memory at
once will take a large amount of memory for some datasets.  If you are
going to load into a database, I would suggest reading the MINiML using 
DOM and then stepping through the data files one at a time, loading as 
you go.
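The file-at-a-time loading described above can be sketched in plain Perl as follows. It assumes the external data-table file names have already been pulled out of the MINiML XML, that each file is plain tab-delimited text, and that load_row stands in for your own database insert; stream_table and load_row are hypothetical names. Only one row is held in memory at a time, whatever the size of the file.

```perl
use strict;
use warnings;

# Stream one tab-delimited data table, calling $on_row->(\@fields) for
# each non-empty line, so memory use does not grow with the file size.
sub stream_table {
    my ($path, $on_row) = @_;
    open my $fh, '<', $path or die "cannot open $path: $!\n";
    my $n = 0;
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/\r$//;                          # tolerate Windows line endings
        next if $line eq '';
        $on_row->([ split /\t/, $line, -1 ]);      # -1 keeps trailing empty fields
        $n++;
    }
    close $fh;
    return $n;                                     # number of rows handed off
}

# Hypothetical usage: @table_files taken from the MINiML XML, and
# load_row wrapping a database insert.
# for my $file (@table_files) {
#     my $rows = stream_table($file, \&load_row);
#     print "$file: $rows rows\n";
# }
```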

As for their parsers, I'm not sure what language they use, but writing a 
parser for either SOFT or MINiML isn't at all difficult.  GEO uses a 
very simplified MAGE schema. 
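To illustrate the point that a SOFT parser is not hard to write, here is a minimal streaming sketch in plain Perl. It assumes the usual SOFT line conventions: a line starting with ^ opens an entity (for example ^SAMPLE), lines starting with ! are name = value attributes, lines starting with # describe table columns, and tab-delimited data rows sit between a !..._table_begin and !..._table_end pair whose first row holds the column names. It is not NCBI's parser, and parse_soft and its callback names are hypothetical. Because rows are handed to a callback one at a time, it avoids holding a whole platform or series in memory.

```perl
use strict;
use warnings;

# Minimal streaming SOFT reader (a sketch, not NCBI's parser).
# Callbacks, all optional:
#   entity    => sub { my ($type, $id) = @_ }        # "^SAMPLE = ..." lines
#   attribute => sub { my ($name, $value) = @_ }     # "!Sample_title = ..." lines
#   row       => sub { my ($type, $id, $row) = @_ }  # $row: hashref keyed by column name
# Column-description lines ("#ID_REF = ...") are skipped in this sketch.
sub parse_soft {
    my ($fh, %cb) = @_;
    my ($type, $id, $in_table, $header) = ('', '', 0, undef);
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/\r$//;
        if ($line =~ /^!\w+_table_begin\s*$/) {
            ($in_table, $header) = (1, undef);
        }
        elsif ($line =~ /^!\w+_table_end\s*$/) {
            $in_table = 0;
        }
        elsif ($in_table) {
            my @fields = split /\t/, $line, -1;
            if (!defined $header) {    # first table line holds the column names
                $header = \@fields;
                next;
            }
            my %row;
            @row{@$header} = @fields;
            $cb{row}->($type, $id, \%row) if $cb{row};
        }
        elsif ($line =~ /^\^(\w+)\s*=\s*(.*)$/) {
            ($type, $id) = ($1, $2);
            $cb{entity}->($type, $id) if $cb{entity};
        }
        elsif ($line =~ /^!(\w+)\s*=\s*(.*)$/) {
            $cb{attribute}->($1, $2) if $cb{attribute};
        }
    }
    return;
}
```

For a real file, open it with open my $fh, '<', $path and pass the handle in; a database insert would go in the row callback.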

As for R vs. perl, if you are planning on doing analyses of microarray 
data, I would highly suggest looking again at the R/bioconductor 
project.  It will save you reinventing many wheels, such as getting 
annotation like gene ontology and pathways, doing stats, plotting, 
maintaining MIAME-compliant data structures, converting from multiple 
microarray formats, etc. 

Sean


From allenday at ucla.edu  Thu Dec 28 18:21:07 2006
From: allenday at ucla.edu (Allen Day)
Date: Thu, 28 Dec 2006 15:21:07 -0800
Subject: [Bioperl-l] [Bioperl-microarray] SOFT parsers
In-Reply-To: <45943DAF.70100@mail.nih.gov>
References: <FC173C9E9BE18F45A3241288B723D64C1F16D4@hsv-exmail03.operonads.local>
	<45943DAF.70100@mail.nih.gov>
Message-ID: <5c24dcc30612281521o58b9f256sfa36c403f4c30bfa@mail.gmail.com>

> As for R vs. perl, if you are planning on doing analyses of microarray
> data, I would highly suggest looking again at the R/bioconductor
> project.  It will save you reinventing many wheels, such as getting
> annotation like gene ontology and pathways, doing stats, plotting,
> maintaining MIAME-compliant data structures, converting from multiple
> microarray formats, etc.

I'll second this statement WRT the data analysis.  I'm doing all my
analysis in R, Perl is just not good at dealing with large matrices or
plotting.  OTOH, I have also found that R is particularly weak when it
comes to pipelining data and system interfacing.  If your goal is to
do ETL to a local database you're better off using Perl.

I've found they're both about equally clunky for dealing with the
experimental metadata, with a slight preference for Perl.  That's more
a reflection of the baroque MAGE model though than the programming
languages themselves.

-Allen



From Paul.Boutros at utoronto.ca  Sat Dec 30 02:43:32 2006
From: Paul.Boutros at utoronto.ca (Paul Boutros)
Date: Sat, 30 Dec 2006 02:43:32 -0500
Subject: [Bioperl-l] How to use Bio::Graphics::Glyph::dna?
In-Reply-To: <6dce9a0b0612240923v24ebafffs5c280d9cb4c65263@mail.gmail.com>
Message-ID: <000c01c72be6$34d07e60$ec02a8c0@main>

Hi Lincoln,

Thanks, that worked like a charm!  Can I suggest adding the
example/explanation you gave me to the pod for Bio::Graphics::Glyph::dna?
Here's a patch against the 1.5.2 version of dna.pm to do that.

Paul

266c266,274
< in response to the dna() method.
---
> in response to the dna() method.  For example, you can use a
> Bio::SeqFeature::Generic object with an attached Bio::PrimarySeq
> like this:
>    my $dna = Bio::PrimarySeq->new( -seq => 'A' x 1000 );
>    my $feature = Bio::SeqFeature::Generic->new( -start => 1, -end => 800 );
>    $feature->attach_seq($dna);
>    $panel->add_track( $feature, -glyph => 'dna' );
> 
> A Bio::Graphics::Feature object may also be used.



From er at xs4all.nl  Sat Dec 30 19:05:16 2006
From: er at xs4all.nl (Erik)
Date: Sun, 31 Dec 2006 01:05:16 +0100 (CET)
Subject: [Bioperl-l] acquiring a local refseq + index
Message-ID: <4632.156.83.1.215.1167523516.squirrel@webmail.xs4all.nl>

Hi all,

I downloaded the refseq files (.gbff) and want to index the lot with
Bio::DB::Flat.

It turns out that there are many cases where the SOURCE and ORGANISM lines
are messed up, sometimes to a degree where the indexing fails on a
Bio::SeqIO::genbank error.

I'd like to change Bio::SeqIO::genbank so that parsing gets at least far
enough to make indexing the RefSeq files possible, and hopefully to
improve the taxonomic output ($seq->species->binomial is often mutilated
at the moment).

Is it still worthwhile to change parsing modules like Bio::SeqIO::genbank?
Is anyone already working on a rewrite? If so, I may be better off
writing my own indexing scheme.

Below is an outline of my indexing program, which uses Bio::DB::Flat::BDB.
If anyone knows of a better way to get a locally searchable RefSeq flat
file index, I would be very interested.

Thanks for your help,

Erikjan


-------------
use strict;
use warnings;
use Bio::DB::Flat;

my $refseq_dir = '/data/ftp.ncbi.nih.gov/refseq/release/complete';
my $db = Bio::DB::Flat->new(
   -directory  => $refseq_dir,
   -dbname     => 'refseq',
   -format     => 'genbank',
   -index      => 'bdb',
   -write_flag => 1,
);

# getfiles() lists the .gbff files in $refseq_dir (helper not shown in this outline)
my @files = getfiles($refseq_dir);
for my $f (@files) {
        $db->build_index($f);    # add each flat file to the BDB index
}


From hlapp at gmx.net  Sat Dec 30 20:48:33 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 30 Dec 2006 20:48:33 -0500
Subject: [Bioperl-l] acquiring a local refseq + index
In-Reply-To: <4632.156.83.1.215.1167523516.squirrel@webmail.xs4all.nl>
References: <4632.156.83.1.215.1167523516.squirrel@webmail.xs4all.nl>
Message-ID: <A4BD1950-AD1C-4EAA-A2F8-85E7FCEC7C31@gmx.net>

Can you send examples and the resulting error messages? Also, I'm
assuming you're running the 1.5.2 release of Bioperl; if not, that's what
I would try first.

	-hilmar


-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Sat Dec 30 21:33:23 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 30 Dec 2006 20:33:23 -0600
Subject: [Bioperl-l] acquiring a local refseq + index
In-Reply-To: <A4BD1950-AD1C-4EAA-A2F8-85E7FCEC7C31@gmx.net>
References: <4632.156.83.1.215.1167523516.squirrel@webmail.xs4all.nl>
	<A4BD1950-AD1C-4EAA-A2F8-85E7FCEC7C31@gmx.net>
Message-ID: <76AAAE98-779F-495C-A19A-A1A800B1D392@uiuc.edu>

Agree with Hilmar, in that we need examples.  If you are referring to  
your submitted bug:

http://bugzilla.open-bio.org/show_bug.cgi?id=2167

we could add this in as long as it passes (I'll try giving it a  
workout with my local bacterial seqs tonight or tomorrow).  However,  
in the not-too-distant future your patch would likely be rendered  
obsolete, as any parsing in Bio::SeqIO modules pertaining to  
Bio::Species-related matters will be deprecated in favor of simple  
parsing (more foolproof, less uncertainty) and Bio::Taxon (which has  
optional db lookups using NCBI Taxonomy).  Bio::Species and anything
related to it should be considered marked for deprecation.  Fair warning...
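As a rough illustration of what "simple parsing" of the taxonomy lines could look like, here is a pure-Perl sketch that reads the SOURCE string, the ORGANISM name and the lineage from a single GenBank record, without trying to split the name into genus, species and subspecies (the step that tends to mangle binomial). It assumes the standard layout, where the organism name fits on the ORGANISM line and the lineage continues on lines indented by 12 spaces, ending with a period; parse_source_block is a hypothetical helper, not part of Bio::SeqIO::genbank.

```perl
use strict;
use warnings;

# Hypothetical helper: given the text of one GenBank record, return
# (source, organism, @lineage). Assumes the organism name fits on the
# ORGANISM line and that lineage lines are indented by 12 spaces.
sub parse_source_block {
    my ($record) = @_;
    my ($source, $organism, @lineage_lines);
    my @lines = split /\n/, $record;
    for my $i (0 .. $#lines) {
        if ($lines[$i] =~ /^SOURCE\s+(.*\S)/) {
            $source = $1;
        }
        elsif ($lines[$i] =~ /^\s+ORGANISM\s+(.*\S)/) {
            $organism = $1;
            # collect continuation lines until the next keyword line
            for my $j ($i + 1 .. $#lines) {
                last unless $lines[$j] =~ /^ {12}(\S.*)$/;
                push @lineage_lines, $1;
            }
            last;
        }
    }
    my $lineage = join ' ', @lineage_lines;
    $lineage =~ s/\.\s*$//;                        # drop the terminal period
    my @lineage = grep { length } split /;\s*/, $lineage;
    return ($source, $organism, @lineage);
}
```

Run over each record (records end with a line containing only //), this returns the organism string and lineage exactly as printed in the file, leaving any genus/species split to a separate, later step.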

chris


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Sun Dec 31 14:36:47 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 31 Dec 2006 13:36:47 -0600
Subject: [Bioperl-l] acquiring a local refseq + index
In-Reply-To: <76AAAE98-779F-495C-A19A-A1A800B1D392@uiuc.edu>
References: <4632.156.83.1.215.1167523516.squirrel@webmail.xs4all.nl>
	<A4BD1950-AD1C-4EAA-A2F8-85E7FCEC7C31@gmx.net>
	<76AAAE98-779F-495C-A19A-A1A800B1D392@uiuc.edu>
Message-ID: <37FB5BDF-25A9-44F0-9E82-964684A73A58@uiuc.edu>

As a followup, I have committed the fix Erik submitted to Bugzilla.  I
don't know whether it helps with the indexing issue Erik describes (the
two sound unrelated).

chris


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From n.haigh at sheffield.ac.uk  Fri Dec  1 02:47:03 2006
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Fri, 01 Dec 2006 07:47:03 +0000
Subject: [Bioperl-l] Upgrading my BioPerl RC via ppm?
In-Reply-To: <519167.29410.qm@web50804.mail.yahoo.com>
References: <519167.29410.qm@web50804.mail.yahoo.com>
Message-ID: <456FDDF7.1080403@sheffield.ac.uk>

Caitlin wrote:
> Hi all.
>
> I'm currently using BioPerl 1.5.2 RC2 but I've seen multiple references
> to 1.5.2 RC5. Can anyone tell me how to upgrade to the latest version?
> The ppm GUI (ActivePerl Build 819) doesn't include any BioPerl packages
> among those deemed upgradable.
>
> Thanks,
>
> ~Katie
>
>
>   

Hi Katie,

Currently there is not an RC5 PPM package available - we are hoping to
have the official 1.5.2 release out pretty soon and there will
definitely be a PPM package for that! Are you experiencing any problems
with your current version of bioperl? If not, there is no need to worry,
once we've released an updated PPM package your PPM GUI should then be
able to see it as an upgrade - hopefully! :o)

Sendu, I know you were working on automatically generating PPM packages
- what is the current situation with regards to this?

Nath




From bix at sendu.me.uk  Fri Dec  1 04:00:18 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 01 Dec 2006 09:00:18 +0000
Subject: [Bioperl-l] BLASTing with a seqio/seq object...
In-Reply-To: <456F27E9.70205@york.ac.uk>
References: <01ba01c714a2$b9659c10$15327e82@pyrimidine>
	<456F27E9.70205@york.ac.uk>
Message-ID: <456FEF22.4090004@sendu.me.uk>

Samantha Thompson wrote:

You missed a step...


> use strict;
> use Bio::Perl;
> use Bio::Seq;
> use Bio::SeqIO;
> 
> use Bio::Tools::Run::RemoteBlast;
> use Bio::SearchIO;
> 
> #seq bit
> 
> #$seq_obj = Bio::Seq->new(-format => 'fasta');
> 
> my $seqio_obj = Bio::SeqIO->new(-file => 
> "/biol/people/mres/st537/MalEfasta.txt", -format => 'fasta');
> 
> my $seq_obj = $seqio_obj->next_seq;
> 
> 
> 
> #blast bit
> 
> my $remote_blast = Bio::Tools::Run::RemoteBlast->new (
>          -prog => 'blastp', -db => 'nr', -expect => '1e-15' );
> 
> my $blast_report = $remote_blast->submit_blast($seq_obj);

Go back to the Bptutorial:
http://www.bioperl.org/wiki/Bptutorial.pl#Running_BLAST_.28using_RemoteBlast.pm.29

And you'll see that submit_blast doesn't return a SearchIO object.

For a complete working example see the synopsis for RemoteBlast:
http://doc.bioperl.org/bioperl-live/Bio/Tools/Run/RemoteBlast.html


> #new part for SearchIO...
> 
> while( my $result = $blast_report->next_result ) {
>   while( my $hit = $result->next_hit ) {
>    while( my $hsp = $hit->next_hsp ) {
>     if( $hsp->length('total') > 100 ) {
>      if ( $hsp->percent_identity >= 75 ) {
>       print "Hit= ",       $hit->name,
>             ",Length=",     $hsp->length('total'),
>             ",Percent_id=", $hsp->percent_identity, "\n";
>      }
>     }
>    } 
>   }
> }


From bix at sendu.me.uk  Fri Dec  1 04:03:13 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 01 Dec 2006 09:03:13 +0000
Subject: [Bioperl-l] Error with supplied lineages importing uniprot data
In-Reply-To: <1348.130.49.222.58.1164925169.squirrel@webmail.cs.pitt.edu>
References: <1348.130.49.222.58.1164925169.squirrel@webmail.cs.pitt.edu>
Message-ID: <456FEFD1.4070704@sendu.me.uk>

pelikan at cs.pitt.edu wrote:
> Hello all,
> 
>  I'm running bioperl 1.5.2, bioperl-db 1.5.2 - RC005, under windows,
> without Cygwin. The "make test"s have all completed without error. This
> is my first time dealing with bioperl, so bear with me.
> 
>    I've successfully loaded the most recent taxonomy information using the
> biosql-schema scripts. After this, I attempted to load the most recent
> release of the uniprot flat file dataset with the following command:
> 
> load_seqdatabase.pl -drive mysql -dbname bioseqdb -dbuser root -dbpass
> ********* -format swiss -safe c:\data\uniprot\uniprot_sprot.dat
> 
> I am subsequently greeted by many of the following errors:
> 
> Could not store Q7N3Q6:
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: The supplied lineage does not start near 'Photorhabdus luminescens
> subsp. laumondii'

In your uniprot_sprot.dat file there'll be some kind of entry with that 
Photorhabdus species. Can you post that entry (sans sequence if it has 
one) so I can take a look at it? Maybe post a few that cause problems, 
and a few that don't.


From bix at sendu.me.uk  Fri Dec  1 04:19:09 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 01 Dec 2006 09:19:09 +0000
Subject: [Bioperl-l] Bioperl 1.5.2 RC5 install on WinXP
	ActivePerl5.8.8.819
In-Reply-To: <000301c714b4$7846e790$15327e82@pyrimidine>
References: <000301c714b4$7846e790$15327e82@pyrimidine>
Message-ID: <456FF38D.3070508@sendu.me.uk>

Chris Fields wrote:
>> Nathan S. Haigh wrote:
>>> More updates:
>>>
>>> After the failed install I updated Module::Build and re-ran the
>>> install, and I get:
>>>
>>> -- snip --
>>> Creating new 'Build' script for 'bioperl' version '1.005002005'
>>> Warning: while trying to determine prerequisites for
>>> S/SE/SENDU/bioperl-1.5.2_005-RCb.tar.gz with the help of
>>> Module::Build the following error occurred: 'Failed to re-load
>>> 'ModuleBuildBioperl': Can't locate ModuleBuildBioperl.pm in @INC (@INC contains:
>>> _build\lib C:\Perl\site\lib C:\Perl\lib C:\Documents and Settings\test) at (eval 105) line 1.
>>> '
>>>
>>> Falling back to META.yml for prerequisites 'YAML' not installed, 
>>> cannot parse 'C:\Perl\cpan\build\bioperl-1.5.2_005-RC\META.yml'
>>> -- snip --
>> I had that problem fleetingly and it drove me crazy because 
>> later I couldn't reproduce it. Is it reproducible on your end?
> 
> During Module::Build installation I see this:
> 
> ...
> t\metadata........ok
>         8/43 skipped: YAML_support feature is not enabled

You were pointing out the YAML issue? I think I'm less concerned with 
that (solution: install YAML) and much more concerned with why it can't 
reload ModuleBuildBioperl (claiming it isn't in @INC). The module in 
question is in the same dir as the Build script, so it should be found 
automatically.

The only thing I can think of is that CPAN doesn't manage to chdir to 
the directory. Hopefully I'll be able to reproduce this and then I can 
investigate further.


From n.haigh at sheffield.ac.uk  Fri Dec  1 04:26:22 2006
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Fri, 01 Dec 2006 09:26:22 +0000
Subject: [Bioperl-l] Bioperl 1.5.2 RC5 install
	onWinXPActivePerl	5.8.8.819
In-Reply-To: <456FF233.6040704@sendu.me.uk>
References: <002401c714c6$53f65080$15327e82@pyrimidine>
	<456F500A.7010707@sheffield.ac.uk>
	<202B1F50-E905-46DE-9EB5-5F206AC04523@uiuc.edu>
	<456FF233.6040704@sendu.me.uk>
Message-ID: <456FF53E.90907@sheffield.ac.uk>

Sendu Bala wrote:
> Chris Fields wrote:
>>
>> I know that setting up the PPM is a pain, but I have to say it is 
>> much faster, and all required PPMs are available.  Which makes me 
>> curious: why bother with trying out a CPAN installation process at 
>> this point, especially when you have to use PPM to install some of 
>> the prereqs properly anyway?
>
> Firstly, problems discovered and resulting fixes will help all 
> platforms, not just Windows. So thanks for trying it out and reporting 
> back. Secondly, the PPM method, like Bundle::BioPerl, is 
> all-or-nothing. The CPAN installation method allows an interactive 
> choice of which optional things to install.
>
> If what you say about DB_File is true, then that's a great shame!
>
>
> So I can do further trouble-shooting of my own, what is the sure-fire 
> way to completely clean-out an ActivePerl install, including any 
> modules you might have installed with PPMs or CPAN?
>
>

In addition, using CPAN allows you to run the test suite easily without 
the need to download it separately and run it after a PPM install.

I don't know of a way to clean out ActivePerl - I use VMWare Workstation 
and have a virtual machine with a fresh install of WinXP and ActivePerl 
5.8.8.819 - maybe someone else has ideas?

Nath




From bix at sendu.me.uk  Fri Dec  1 04:13:23 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 01 Dec 2006 09:13:23 +0000
Subject: [Bioperl-l] Bioperl 1.5.2 RC5 install
	onWinXPActivePerl	5.8.8.819
In-Reply-To: <202B1F50-E905-46DE-9EB5-5F206AC04523@uiuc.edu>
References: <002401c714c6$53f65080$15327e82@pyrimidine>
	<456F500A.7010707@sheffield.ac.uk>
	<202B1F50-E905-46DE-9EB5-5F206AC04523@uiuc.edu>
Message-ID: <456FF233.6040704@sendu.me.uk>

Chris Fields wrote:
> 
> I know that setting up the PPM is a pain, but I have to say it is much 
> faster, and all required PPMs are available.  Which makes me curious: 
> why bother with trying out a CPAN installation process at this point, 
> especially when you have to use PPM to install some of the prereqs 
> properly anyway?

Firstly, problems discovered and resulting fixes will help all 
platforms, not just Windows. So thanks for trying it out and reporting 
back. Secondly, the PPM method, like Bundle::BioPerl, is all-or-nothing. 
The CPAN installation method allows an interactive choice of which 
optional things to install.

If what you say about DB_File is true, then that's a great shame!


So I can do further troubleshooting of my own: what is the sure-fire 
way to completely clean out an ActivePerl install, including any modules 
you might have installed with PPMs or CPAN?


From cjfields at uiuc.edu  Fri Dec  1 09:08:55 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 1 Dec 2006 08:08:55 -0600
Subject: [Bioperl-l] Bioperl 1.5.2 RC5 install
	onWinXPActivePerl	5.8.8.819
In-Reply-To: <456FF233.6040704@sendu.me.uk>
References: <002401c714c6$53f65080$15327e82@pyrimidine>
	<456F500A.7010707@sheffield.ac.uk>
	<202B1F50-E905-46DE-9EB5-5F206AC04523@uiuc.edu>
	<456FF233.6040704@sendu.me.uk>
Message-ID: <10BC5C25-616F-44D5-8CA8-4BD4C3EF82D6@uiuc.edu>


On Dec 1, 2006, at 3:13 AM, Sendu Bala wrote:

> Chris Fields wrote:
>> I know that setting up the PPM is a pain, but I have to say it is  
>> much faster, and all required PPMs are available.  Which makes me  
>> curious: why bother with trying out a CPAN installation process at  
>> this point, especially when you have to use PPM to install some of  
>> the prereqs properly anyway?
>
> Firstly, problems discovered and resulting fixes will help all  
> platforms, not just Windows. So thanks for trying it out and  
> reporting back. Secondly, the PPM method, like Bundle::BioPerl, is  
> all-or-nothing. The CPAN installation method allows an interactive  
> choice of which optional things to install.

Yes, I understand that.  My point is, you are generally forced to use  
PPM anyway due to several modules not installing properly (all the  
'trouble' distributions, like DB_File, are available via PPM).  I can  
see using CPAN as an alternative way of installing a BioPerl  
distribution, or as the primary method for CVS checkouts or manual  
installs, but not as the primary method for distributions.  At least  
not until the kinks are worked out for Windows users.

What are the significant issues for a bioperl PPM installation, based  
on the last PPM Nathan set up?  If there is a redirection problem,  
could we just modify the installation docs to address that ('due to  
problem X, you must install the following modules prior to installing  
BioPerl 1.5.2...').

> If what you say about DB_File is true, then that's a great shame!

We need to go through the various prereqs to see which ones need PPM  
vs CPAN.  In general, anything that requires C code compilation (and  
thus needs a recent VC++) will likely be an issue.

> So I can do further trouble-shooting of my own, what is the sure- 
> fire way to completely clean-out an ActivePerl install, including  
> any modules you might have installed with PPMs or CPAN?

Not sure, beyond uninstalling and cleaning out the Perl directory (I  
think you might be able to delete the site/ directory, but I haven't  
tried it).  ActivePerl comes preloaded with a number of non-core  
modules which makes it tricky to uninstall them one-by-one.

chris


From cjfields at uiuc.edu  Fri Dec  1 09:10:34 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 1 Dec 2006 08:10:34 -0600
Subject: [Bioperl-l] Bioperl 1.5.2 RC5 install on WinXP
	ActivePerl5.8.8.819
In-Reply-To: <456FF38D.3070508@sendu.me.uk>
References: <000301c714b4$7846e790$15327e82@pyrimidine>
	<456FF38D.3070508@sendu.me.uk>
Message-ID: <6E434A6A-0EA4-4FD6-9DA1-0D5CF196AE36@uiuc.edu>


On Dec 1, 2006, at 3:19 AM, Sendu Bala wrote:

> You were pointing out the YAML issue? I think I'm less concerned  
> with that (solution: install YAML) and much more concerned with why  
> it can't reload ModuleBuildBioperl (claiming it isn't in @INC). The  
> module in question is in the same dir as the Build script, so it  
> should be found automatically.
>
> The only thing I can think of is that CPAN doesn't manage to chdir  
> to the directory. Hopefully I'll be able to reproduce this and then  
> I can investigate further.

My thought was the two were related in some way.  I'm not sure, to  
tell the truth.

-chris


From bix at sendu.me.uk  Fri Dec  1 09:17:41 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 01 Dec 2006 14:17:41 +0000
Subject: [Bioperl-l] Bioperl 1.5.2 RC5 install
	onWinXPActivePerl	5.8.8.819
In-Reply-To: <10BC5C25-616F-44D5-8CA8-4BD4C3EF82D6@uiuc.edu>
References: <002401c714c6$53f65080$15327e82@pyrimidine>
	<456F500A.7010707@sheffield.ac.uk>
	<202B1F50-E905-46DE-9EB5-5F206AC04523@uiuc.edu>
	<456FF233.6040704@sendu.me.uk>
	<10BC5C25-616F-44D5-8CA8-4BD4C3EF82D6@uiuc.edu>
Message-ID: <45703985.5050203@sendu.me.uk>

Chris Fields wrote:
> 
> On Dec 1, 2006, at 3:13 AM, Sendu Bala wrote:
> 
>> Chris Fields wrote:
>>> I know that setting up the PPM is a pain, but I have to say it is 
>>> much faster, and all required PPMs are available.  Which makes me 
>>> curious: why bother with trying out a CPAN installation process at 
>>> this point, especially when you have to use PPM to install some of 
>>> the prereqs properly anyway?
>>
>> Firstly, problems discovered and resulting fixes will help all 
>> platforms, not just Windows. So thanks for trying it out and reporting 
>> back. Secondly, the PPM method, like Bundle::BioPerl, is 
>> all-or-nothing. The CPAN installation method allows an interactive 
>> choice of which optional things to install.
> 
> Yes, I understand that.  My point is, you are generally forced to use 
> PPM anyway due to several modules not installing properly (all the 
> 'trouble' distributions, like DB_File, are available via PPM).  I can 
> see using CPAN as an alternative way of installing Bioperl for a 
> distribution, or as the primary method via CVS or manually, but not for 
> distributions.  At least not until the kinks are worked out for Windows 
> users.

CPAN isn't being suggested as the primary or preferred installation 
method for Windows. That will still be PPM. I'm mentioning CPAN / manual 
installation in the Windows INSTALL docs for the benefit of anyone who 
wants a simple install and test environment when checking out from CVS.


> What are the significant issues for a bioperl PPM installation

None that I'm aware of - I just need to find the time to start looking 
into generating an appropriate PPD. Hopefully Nathan's wiki page on the 
subject will be all I need.


From bix at sendu.me.uk  Fri Dec  1 09:18:43 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 01 Dec 2006 14:18:43 +0000
Subject: [Bioperl-l] Bioperl 1.5.2 RC5 install on WinXP
	ActivePerl5.8.8.819
In-Reply-To: <6E434A6A-0EA4-4FD6-9DA1-0D5CF196AE36@uiuc.edu>
References: <000301c714b4$7846e790$15327e82@pyrimidine>
	<456FF38D.3070508@sendu.me.uk>
	<6E434A6A-0EA4-4FD6-9DA1-0D5CF196AE36@uiuc.edu>
Message-ID: <457039C3.30907@sendu.me.uk>

Chris Fields wrote:
> 
> On Dec 1, 2006, at 3:19 AM, Sendu Bala wrote:
> 
>> You were pointing out the YAML issue? I think I'm less concerned with 
>> that (solution: install YAML) and much more concerned with why it 
>> can't reload ModuleBuildBioperl (claiming it isn't in @INC). The 
>> module in question is in the same dir as the Build script, so it 
>> should be found automatically.
>>
>> The only thing I can think of is that CPAN doesn't manage to chdir to 
>> the directory. Hopefully I'll be able to reproduce this and then I can 
>> investigate further.
> 
> My thought was the two were related in some way.  I'm not sure to tell 
> the truth.

They weren't; using YAML is the fall-back position in case of an earlier 
failure.

I've fixed it now in any case.


From gwu at molbio.mgh.harvard.edu  Fri Dec  1 10:19:42 2006
From: gwu at molbio.mgh.harvard.edu (gang wu)
Date: Fri, 01 Dec 2006 10:19:42 -0500
Subject: [Bioperl-l] One more load_seqdatabase.pl question
In-Reply-To: <70B28FBB-0250-4EB8-8775-CD0537369A3D@gmx.net>
References: <4a9ad8800611270907x64a4a4c0jad92bff6641e300@mail.gmail.com>	<53C6D534-6E36-4061-B955-E74537839265@gmx.net>	<456CA667.6010609@molbio.mgh.harvard.edu>
	<ED3F5F49-78A7-4E63-ACB8-5E8F745F0C34@gmx.net>
	<456F5648.6070207@molbio.mgh.harvard.edu>
	<70B28FBB-0250-4EB8-8775-CD0537369A3D@gmx.net>
Message-ID: <4570480E.1020701@molbio.mgh.harvard.edu>

Thanks Hilmar. I did include the -lookup switch on the command line. The 
warning messages say that the code attempted an "INSERT" (which failed) 
instead of an "UPDATE", which suggests a match was not found. But I was 
just loading the same GenBank file for the second time. To test whether 
it actually updated the records, I made a minor modification to one of 
the COMMENT features. Unfortunately it was not updated. By the way, the 
test GenBank file has four "COMMENT" features, but they are all 
different. Any idea what's happening there?

I wonder if it's a bad idea to "UPDATE" a sequence. Say I get a new 
sequence version with 5 features removed, 5 features modified and 5 new 
features. If only --lookup is included, then according to the POD the 5 
new features will be inserted, the 5 modified features will be updated 
and the 5 removed features will be left in the database untouched. This 
would leave the updated sequence record a mixture of old and new 
versions. I don't see a reason anyone would want a sequence like this. 
Either include -remove to replace the old version if only one version is 
needed, or put the new version under a different namespace if multiple 
versions are needed. Do I have the correct understanding of these issues?
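The arithmetic in the scenario above can be modelled with plain dictionaries. This is a toy Python sketch of the behaviour as Gang reads it from the POD, assuming features are matched by a simple name key; it is not how bioperl-db actually matches rows (that is done through BioSQL's unique keys), and the function names are hypothetical.

```python
from typing import Dict


def merge_lookup_only(stored: Dict[str, str], incoming: Dict[str, str]) -> Dict[str, str]:
    """Lookup-only update as described: insert new, update existing, delete nothing."""
    merged = dict(stored)
    merged.update(incoming)
    return merged


def replace_with_remove(stored: Dict[str, str], incoming: Dict[str, str]) -> Dict[str, str]:
    """The -remove route: the old entry is dropped and the new version loaded fresh."""
    return dict(incoming)


# 20 stored features: f1-f5 are dropped in the new version, f6-f10 are
# modified, f11-f20 are unchanged, and f21-f25 are new.
old = {f"f{i}": "v1" for i in range(1, 21)}
new = {f"f{i}": "v1" for i in range(11, 21)}
new.update({f"f{i}": "v2" for i in range(6, 11)})
new.update({f"f{i}": "v2" for i in range(21, 26)})
```

Under this model `merge_lookup_only(old, new)` holds 25 features, including the five stale ones (f1-f5) that the new version no longer has, while `replace_with_remove(old, new)` holds exactly the 20 features of the new version.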

I deeply appreciate your help.

Gang


Hilmar Lapp wrote:
> Right. You need to tell it to lookup sequences first if you know that 
> you are loading sequences which may be in the database already (see 
> the POD of load_seqdatabase.pl, switch --lookup; there are several 
> other command line options that control what will happen if a sequence 
> entry is already present in the database.).
>
> The messages in your report are warnings, not errors. It looks like 
> some of the comments are duplicated for a sequence; that doesn't look 
> like a reason for concern. It's not so good if you get errors thrown.
>
>     -hilmar
>
> On Nov 30, 2006, at 5:08 PM, gang wu wrote:
>
>> Thanks Hilmar. Do you mean the NVL() clause will make 
>> load_seqdatabase.pl not work when update?
>>
>> I have a problem with updating. It seems load_seqdatabase.pl only tries 
>> to insert instead of update. I used one of the test GenBank files that 
>> come with bioperl-db. Please take a look at the attached output.
>>
>> Thanks.
>>
>> Gang
>>
>> =========================================
>> >perl load_seqdatabase.pl -lookup -host elegans -driver Oracle 
>> -dbname sparc -dbuser biosqldb-sgowner -dbpass PASS -format genbank 
>> -namespace test 
>> /root/.cpan/build/bioperl-db-1.5.2-RC3/scripts/biosql/data/AP000868.gb
>> Loading 
>> /root/.cpan/build/bioperl-db-1.5.2-RC3/scripts/biosql/data/AP000868.gb 
>> ...
>>
>> -------------------- WARNING ---------------------
>> MSG: insert in Bio::DB::BioSQL::CommentAdaptor (driver) failed, 
>> values were ("This sequence was reannotated via the Ensembl system. 
>> Please visit the Ensembl web site, http://www.ensembl.org/ for more 
>> information. ","1") FKs (389109)
>> ORA-00001: unique constraint (BIOSQLDB_SGOWNER.XAK1COMMENT) violated 
>> (DBD ERROR: OCIStmtExecute)
>> ---------------------------------------------------
>>
>> -------------------- WARNING ---------------------
>> MSG: insert in Bio::DB::BioSQL::CommentAdaptor (driver) failed, 
>> values were ("The /gene indicates a unique id for a gene, /cds a 
>> unique id for a translation and a /exon a unique id for an exon. 
>> These ids are maintained wherever possible between versions. For more 
>> information on how to interpret the feature table, please visit 
>> http://www.ensembl.org/Docs/embl.html. ","2") FKs (389109)
>> ORA-00001: unique constraint (BIOSQLDB_SGOWNER.XAK1COMMENT) violated 
>> (DBD ERROR: OCIStmtExecute)
>> ---------------------------------------------------
>> ...
>> ...
>> ==========================================================
>> Hilmar Lapp wrote:
>>> These are the protein translations stored in the feature table as 
>>> tags of features, right? You can change the type of the column 
>>> (although there may be some issues when you update the column 
>>> because the NVL() clause won't work if I recall that correctly), but 
>>> doing so will deprive you of any 'normal' searches against that 
>>> column. (You can still use functions from the DBMS_LOB package, but 
>>> they will be much slower and are completely non-standard.) It is up 
>>> to you whether that is too big of a price to pay for having some 
>>> redundant protein translations (translating the feature's DNA 
>>> sequence should give you the same) in the database. I always trimmed 
>>> those feature tags off (using a custom SeqProcessor). An alternative 
>>> is to convert these feature tags into actual bioentries (i.e., 
>>> Bio::Seq objects; again, a custom SeqProcessor will allow you to do 
>>> that).
>>>
>>> -hilmar
>>>
>>> On Nov 28, 2006, at 4:13 PM, gang wu wrote:
>>>> Hi everyone,
>>>>
>>>> I'm using load_seqdatabase.pl to upload some GenBank genome sequences 
>>>> to my Oracle BioSQL database. I saw some errors (see the attached 
>>>> warning message) related to seqfeature_qualifier_value 
>>>> (SG_SEQFEATURE_QUALIFIER_ASSOC.VALUE column), which has a Varchar2 
>>>> data type with a maximum of 4000 bytes. Has anybody mentioned this 
>>>> issue before? Should I just modify the column to a type able to store 
>>>> more data, such as LONG or CLOB?
>>>>
>>>> Thanks.
>>>>
>>>> Gang
>>>>
>>>> Log information:
>>>> ============================================
>>>> load_seqdatabase.pl -host elegans -driver Oracle -dbname sparc 
>>>> -dbuser biosqldb-sgowner -dbpass PASS -format genbank -namespace 
>>>> genbank /genomeseq/arabidopsis//NC_003070.gbk
>>>> Loading /genomeseq/arabidopsis//NC_003070.gbk ...
>>>> -------------------- WARNING ---------------------
>>>> MSG: SimpleValueAdaptor::add_assoc: unexpected failure of statement 
>>>> execution: ORA-01461: can bind a LONG value only for insert into a 
>>>> LONG column (DBD ERROR: error possibly near <*> indicator at char 12 
>>>> in 'INSERT INTO <*>seqfeature_qualifier_value (fea_oid, trm_oid, 
>>>> value, rank) VALUES (:p1, :p2, :p3, :p4)')
>>>> name: INSERT ASSOC [2] Bio::SeqFeature::Generic;Bio::Annotation::SimpleValue
>>>> values: FK[Bio::SeqFeature::Generic]:14898, 
>>>> FK[Bio::Annotation::SimpleValue]:800, 
>>>> value:"MVAVTGEVLHLLRRYLGEYVHGLSTEALRISVWKGDVVLKDLKLKAEALNSLKLPVAVKSGFV 
>>>> GTITLKVPWKSLGKEPVIVLIDRVFVLAYPAPDDRTLKFFTLVGTEFAYTNYIPGGRQGKASRNQASADR 
>>>> GTSYFWLMELHGYEAETATLEARAKSKLGSPPQGNSWLGSIIATIIGNLKVSISNVHIRYEDSTRDSSEI 
>>>> LASFFSYFNNICSSNPGHPFAAGITLAKLAAVTMDEEGNETFDTSGALDKLRKSLQLERLALYHDSNSFP 
>>>> WEIEKQWDNITPEEWIEMFEDGIKEQTEHKIKSKWALNRHYLLSPINGSLKYHRLGNQERNNPEIPFERA 
>>>> SVILNDVNVTITEEQYHDWIKLVEVVSRYKTYIEISHLRPMVPVSEAPRLWWRFAAQASLQQKRLWYTRY 
>>>> IQLYANFLQQSSDVNYPEMREIEKDLDSKVILLWRLLAHAKVESVKSKEAAEQRKLKKGGWFSFNWRTEA 
>>>> EDDPEVDSVAGGSKLMEERLTKDEWKAINKLLSHQPDEEMNLYSGKDMQNMTHFLVTVSIGQGAARIVDI 
>>>> NQTEVLCGRFEQLDVTTKFRHRSTQCDVSLRFYGLSAPEGSLAQSVSSERKTNALMASFVNAPIGENIDW 
>>>> RLSATISPCHATIWTESYDRVLEFVKRSNAVSPTVALETAAVLQMKLEEVTRRAQEQLQIVLEEQSRFAL 
>>>> DIDIDAPKVRIPLRASGSSKCSSHFLLDFGNFTLTTMDTRSEEQRQNLYSRFCISGRDIAAFFTDCGSDN 
>>>> QGCSLVMEDFTNQPILSPILEKADNVYSLIDRCGMAVIVDQIKVPHPSYPSTRISIQVPNIGVHFSPTRY 
>>>> MRIMQLFDILYGAMKTYSQAPVDHMPDGIQPWSPTDLASDARILVWKGIGNSVATWQSCRLVLSGLYLYT 
>>>> FESEKSLDYQRYLCMAGRQVFEVPPANIGGSPYCLAVGVRGTDLKKALESSSTWIIEFQGEEKAAWLRGL 
>>>> VQATYQASA! 
>>>> PLSGDVLGQTSDGDGDFHEPQTRNMKAADLVITGALVETKLYLYGKIKNECDEQVEEVLLLKVLASGGKV 
>>>> HLISSESGLTVRTKLHSLKIKDELQQQQSGSAQYLAYSVLKNEDIQESLGTCDSFDKEMPVGHADDEDAY 
>>>> TDALPEFLSPTEPGTPDMDMIQCSMMMDSDEHVGLEDTEGGFHEKDTSQGKSLCDEVFYEVQGGEFSDFV 
>>>> SVVFLTRSSSSHDYNGIDTQMSIRMSKLEFFCSRPTVVALIGFGFDLSTASYIENDKDANTLVPEKSDSE 
>>>> KETNDESGRIEGLLGYGKDRVVFYLNMNVDNVTVFLNKEDGSQLAMFVQERFVLDIKVHPSSLSVEGTLG 
>>>> NFKLCDKSLDSGNCWSWLCDIRDPGVESLIKFKFSSYSAGDDDYEGYDYSLSGKLSAVRIVFLYRFVQEV 
>>>> TAYFMGLATPHSEEVIKLVDKVGGFEWLIQKDEMDGATAVKLDLSLDTPIIVVPRDSLSKDYIQLDLGQL 
>>>> EVSNEISWHGCPEKDATAVRVDVLHAKILGLNMSVGINGSIGKPMIREGQGLDIFVRRSLRDVFKKVPTL 
>>>> SVEVKIDFLHAVMSDKEYDIIVSCTSMNLFEEPKLPPDFRGSSSGPKAKMRLLADKVNLNSQMIMSRTVT 
>>>> ILAVDINYALLELRNSVNEESSLAHVAVRASEPNSSISWMTSLSETDLYVSVPKVSVLDIRPNTKPEMRL 
>>>> MLGSSVDASKQASSESLPFSLNKGSFKRANSRAVLDFDAPCSTMLLMDYRWRASSQSCVLRVQQPRILAV 
>>>> PDFLLAVGEFFVPALRAITGRDETLDPTNDPITRSRGIVLSEPLYKQTEDVVHLSPRRQLVADSLGIDEY 
>>>> TYDGCGKVISLSEQGEKDLNVGRLEPIIIVGHGKKLRFVNVKIKNGSLLSKCIYLSNDSSCLFSPEDGVD 
>>>> ISMLENASSNPENVLSNAHKSSDVSDTCQYDSKSGQSFTFEAQVVSPEFTFFDGTKSSLDDSSAVEKLLR 
>>>> VKLDFNFM! 
>>>> YASKEKDIWVRALLKNLVVETGSGLIILDPVDISGGYTSVKEKTNMSLTSTDIYMHLSLSALSLLLNLQS 
>>>> QVTGALQSGNAIPLASCTNFDRIWVSPKENGPRNNLTIWRPQAPSNYVILGDCVTSRAIPPTQAVMAVSN 
>>>> TYGRVRKPIGFNRIGLFSVIQGLEGDNVQHSHNSNECSLWMPVAPVGYTAMGCVANIGSEQPPDHIVYCL 
>>>> SIWRADNVLGAFYAHTSTAAPSKKYSPGLSHCLLWNPLQSKTSSSSDPSSTSGSRSEQSSDQTGNSSGWD 
>>>> ILRSISKATSYHVSTPNFERIWWDKGGDLRRPVSIWRPVPRPGFAILGDSITEGLEPPALGILFKADDSE 
>>>> IAAKPVQFNKVAHIVGKGFDEVFCWFPVAPPGYVSLGCVLSKFDEAPHVDSFCCPRIDLVNQANIYEASV 
>>>> TRSSSSKSSQLWSIWKVDNQACTFLARSDLKRPPSRMAFAVGESVKPKTQENVNAEIKLRCFSLTLLDGL 
>>>> HGMMTPLFDTTVTNIKLATHGRPEAMNAVLISSIAASTFNPQLEAWEPLLEPFDGIFKLETYDTALNQSS 
>>>> KPGKRLRIAATNILNINVSAANLETLGDAVVSWRRQLELEERAAKMKEESAASRESGDLSAFSALDEDDF 
>>>> QTIVVENKLGRDIYLKKLEENSDVVVKLCHDENTSVWVPPPRFSNRLNVADSSREARNYMTVQILEAKGL 
>>>> HIIDDGNSHSFFCTLRLVVDSQGAEPQKLFPQSARTKCVKPSTTIVNDLMECTSKWNELFIFEIPRKGVA 
>>>> RLEVEVTNLAAKAGKGEVVGSLSFPVGHGESTLRKVASVRMLHQSSDAENISSYTLQRKNAEDKHDNGCL 
>>>> LISTSYFEKTTIPNTLRNMESKDFVDGDTGFWIGVRPDDSWHSIRSLLPLCIAPKSLQNDFIAMEVSMRN 
>>>> GRKHATFRCLATVVNDSDVNLEISISSDQNVSSGVSNHNAVIASRSSYVLPWGCLSKDNEQCLHIRPKVE 
>>>> NSHHSYAWGYCIAVSSGCGKDQPFVDQGLLTRQNTIKQSSRASTFFLRLNQLEKKDMLFCCQPSTGSKPL 
>>>> WLSVGADAS! 
>>>> VLHTDLNTPVYDWKISISSPLKLENRLPCPVKFTVWEKTKEGTYLERQHGVVSSRKSAHVYSADIQRPVY 
>>>> LTLAVHGGWALEKDPIPVLDISSNDSVSSFWFVHQQSKRRLRVSIERDVGETGAAPKTIRFFVPYWITND 
>>>> SYLPLSYRVVEIEPSENVEAGSPCLTRASKSFKKNPVFSMERRHQKKNVRVLESIEDTSPMPSMLSPQES 
>>>> AGRSGVVLFPSQKDSYVSPRIGIAVAARDSDSYSPGISLLELEKKERIDVKAFCKDASYYMLSAVLNMTS 
>>>> DRTKVIHLQPHTLFINRVGVSICLQQCDCQTEEWINPSDPPKLFGWQSSTRLELLKLRVKGYRWSTPFSV 
>>>> FSEGTMRVPVPKEDGTDQLQLRVQVRSGTKNSRYEVIFRPNSISGPYRIENRSMFLPIRYRQVEGVSESW 
>>>> QFLPPNAAASFYWENLGRRHLFELLVDGNDPSNSEKFDIDKIGDYPPRSESGPTRPIRVTILKEDKKNIV 
>>>> RISDWMPAIEPTSSISRRLPASSLSELSGNESQQSHLLASEDSEFHVIVELAELGISVIDHAPEEILYMS 
>>>> VQNLFVAYSTGLGSGLSRFKLRMQGIQVDNQLPLAPMPVLFRPQRTGDKADYILKFSVTLQSNAGLDLRV 
>>>> YPYIDFQGRENTAFLINIHEPIIWRIHEMIQQANLSRLSDPNSTAVSVDPFIQIGVLNFSEVRFRVSMAM 
>>>> SPSQRPRGVLGFWSSLMTALGNTENMPVRISERFHENISMRQSTMINNAIRNVKKDLLGQPLQLLSGVDI 
>>>> LGNASSALGHMSQGIAALSMDKKFIQSRQRQENKGVEDFGDIIREGGGALAKGLFRGVTGILTKPLEGAK 
>>>> SSGVEGFVSGFGKGIIGAAAQPVSGVLDLLSKTTEGANAMRMKIAAAITSDEQLLRRRLPRAVGADSLLR 
>>>> PYNDYRAQGQVILQLAESGSFLGQVDLFKVRGKFALTDAYESHFILPKGKVLMITHRRVILLQQPSNIMG 
>>>> QRKFIPAK! 
>>>> DACSIQWDILWNDLVTMELSDGKKDPPNSPPSRLILYLKAKPHDPKEQFRVVKCIPNSKQAFDVYSAIDQ 
>>>> AINLYGQNALKGMVKNKVTRPYSPISESSWAEGASQQMPASVTPSSTFGTSPTTSSS", 
>>>> rank:"1" -------------------------------------------------- 
>>>> =============================================   
>>
>
> --===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
>
>
>
>
>
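Gang's original report notes that the qualifier VALUE column is a VARCHAR2 capped at 4000 bytes. An ORA-01461 like the one in his log is commonly raised when a bound string is longer than that limit. The long protein translation in his log is one such value. A minimal Python sketch for flagging oversized qualifier values before loading, assuming qualifiers are available as (tag, value) pairs and that the database character set matches the chosen encoding (UTF-8 here, where some characters take more than one byte). The function name is hypothetical.

```python
from typing import Iterable, List, Tuple

VARCHAR2_MAX_BYTES = 4000  # column limit cited in the thread


def oversized_values(
    qualifiers: Iterable[Tuple[str, str]],
    limit: int = VARCHAR2_MAX_BYTES,
    encoding: str = "utf-8",
) -> List[Tuple[str, int]]:
    """Return (tag, byte_length) for every value too long for the column."""
    hits: List[Tuple[str, int]] = []
    for tag, value in qualifiers:
        size = len(value.encode(encoding))  # limit is in bytes, not characters
        if size > limit:
            hits.append((tag, size))
    return hits
```

Values flagged this way are the candidates Hilmar suggests trimming, or converting to separate bioentries, with a custom SeqProcessor instead of widening the column.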


From bosborne11 at verizon.net  Fri Dec  1 09:55:18 2006
From: bosborne11 at verizon.net (Brian Osborne)
Date: Fri, 01 Dec 2006 09:55:18 -0500
Subject: [Bioperl-l] An announcement
Message-ID: <C195AC86.BB6A%bosborne11@verizon.net>

bioperl-l,

I would like to call your attention to a job posting, and in doing so I
realize that I'm probably breaking a rule of this list. I apologize and
acknowledge that I've transgressed. The reason I do this is that this is
an interesting job that is relevant to a lot of what we do in this mailing
list, and some of you might want to consider it. The posting is here:

http://www.nescent.org/main/employment.html#gmodhelpdesk

I encourage you to pass this on to anyone who you think might be interested.

Thanks again,

Brian O.


From cjfields at uiuc.edu  Fri Dec  1 11:49:32 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 1 Dec 2006 10:49:32 -0600
Subject: [Bioperl-l] Bioperl 1.5.2 RC5 install
	onWinXPActivePerl	5.8.8.819
In-Reply-To: <456FF53E.90907@sheffield.ac.uk>
References: <002401c714c6$53f65080$15327e82@pyrimidine>
	<456F500A.7010707@sheffield.ac.uk>
	<202B1F50-E905-46DE-9EB5-5F206AC04523@uiuc.edu>
	<456FF233.6040704@sendu.me.uk> <456FF53E.90907@sheffield.ac.uk>
Message-ID: <D464535F-E70F-44B4-AD48-3CC79181869C@uiuc.edu>


On Dec 1, 2006, at 3:26 AM, Nathan S. Haigh wrote:
...
> In addition, using CPAN allows you to run the test suite easily  
> without the need to download it separately and run it after a PPM  
> install.

A PPM, by design, is supposed to imply that the distribution passes  
tests for the specified platform, at that point in time, after all  
prereqs are installed and any additional postinstall operations  
(install C libraries, modify config files, etc) are complete.  The  
ActiveState automated PPM building process dictates that; if it fails  
any test, it will not be made into a PPM.  It's sort of a 'stamp of  
approval' that all tests pass, so you don't need to run them.

However, a test may fail (and a PPM may not get generated) for pretty  
superficial reasons, such as the makefile not specifying that a  
module is needed, server issues, etc., so the automated process isn't  
foolproof.  That's why Kobes and the other repositories are  
available, where the PPM/PPD is manually generated and made to work  
specifically for Windows (or whatever other platform).

That said, it is completely up to the person packaging the  
distribution to follow those rules if one were to make a PPM  
manually.  You don't even have to run tests prior to using 'nmake  
ppd'.  We can currently state, though, that all tests pass when all  
prereqs are installed for this distribution.  At least at this point  
in time!

> I don't know of a way to clean out ActivePerl - I use VMWare  
> Workstation and have a virtual machine with a fresh install of  
> WinXP and ActivePerl 5.8.8.819 - maybe someone else has ideas?

I haven't tried it that way.  I have Parallels on Mac OS X (I run a  
SigmaPlot/Excel combo off it).  My tests were using a native WinXP  
installation (i.e. not virtually) on my old Dell.  It shouldn't make  
a difference; VMWare, Parallels, and the like should all run  
ActivePerl for WinXP since it's a virtual machine.  Windows Vista, on  
the other hand...

I think with PPM4 you can install to a custom directory.  It may be  
possible to install all new modules to that directory, then you would  
at least have an idea of what was there (though I don't think you can  
delete it directly w/o screwing up the PPM database).

chris


From bix at sendu.me.uk  Fri Dec  1 12:12:49 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 01 Dec 2006 17:12:49 +0000
Subject: [Bioperl-l] Error with supplied lineages importing uniprot data
In-Reply-To: <1348.130.49.222.58.1164925169.squirrel@webmail.cs.pitt.edu>
References: <1348.130.49.222.58.1164925169.squirrel@webmail.cs.pitt.edu>
Message-ID: <45706291.80201@sendu.me.uk>

pelikan at cs.pitt.edu wrote:
> Hello all,
> 
>  I'm running bioperl 1.5.2, bioperl-db 1.5.2 - RC005, under windows,
> without Cygwin. The "make test"s have all completed without error. This
> is my first time dealing with bioperl, so bear with me.
> 
>    I've successfully loaded the most recent taxonomy information using the
> biosql-schema scripts. After this, I attempted to load the most recent
> release of the uniprot flat file dataset with the following command:
> 
> load_seqdatabase.pl -drive mysql -dbname bioseqdb -dbuser root -dbpass
> ********* -format swiss -safe c:\data\uniprot\uniprot_sprot.dat
> 
> I am subsequently greeted by many of the following errors:
> 
> Could not store Q7N3Q6:

I extracted just Q7N3Q6 from 
ftp://ftp.expasy.org/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.dat.gz
and was able to load it in using load_seqdatabase.pl under linux with no 
errors. If you make a file with just that sequence do you still get the 
error?
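Producing a file with just one entry, as suggested above, only requires splitting the flat file on its `//` terminators and matching the accession on the AC lines. Here is a minimal Python sketch, assuming the standard UniProtKB/Swiss-Prot layout (accessions listed on `AC` lines, separated and terminated by `;`). The helper names are hypothetical.

```python
import gzip
from typing import Iterator, List, TextIO


def iter_entries(handle: TextIO) -> Iterator[List[str]]:
    """Yield each flat-file entry as a list of lines; an entry ends with '//'."""
    entry: List[str] = []
    for line in handle:
        entry.append(line)
        if line.startswith("//"):
            yield entry
            entry = []


def entry_accessions(entry: List[str]) -> List[str]:
    """Collect accessions from AC lines such as 'AC   Q7N3Q6;'."""
    accs: List[str] = []
    for line in entry:
        if line.startswith("AC   "):
            accs.extend(a.strip() for a in line[5:].split(";") if a.strip())
    return accs


def extract_entry(src_path: str, accession: str, out_path: str) -> bool:
    """Write the entry carrying `accession` to out_path; return False if absent."""
    opener = gzip.open if src_path.endswith(".gz") else open
    with opener(src_path, "rt") as src:
        for entry in iter_entries(src):
            if accession in entry_accessions(entry):
                with open(out_path, "w") as out:
                    out.writelines(entry)  # entry is copied verbatim, sequence included
                return True
    return False
```

The resulting one-entry file can then be passed to load_seqdatabase.pl with the same switches as before. Secondary accessions on the AC line are matched too, so an entry is found even if the queried accession is no longer its primary one.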

Is anyone else able to reproduce the problem?


From cjfields at uiuc.edu  Fri Dec  1 12:57:18 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 1 Dec 2006 11:57:18 -0600
Subject: [Bioperl-l] Bioperl 1.5.2 RC5 installonWinXPActivePerl	5.8.8.819
In-Reply-To: <45703985.5050203@sendu.me.uk>
Message-ID: <006301c71572$24be8830$15327e82@pyrimidine>


> Chris Fields wrote:
> PPM).  I can 
> > see using CPAN as an alternative way of installing Bioperl for a 
> > distribution, or as the primary method via CVS or manually, but not 
> > for distributions.  At least not until the kinks are worked out for 
> > Windows users.
> 
> CPAN isn't being suggested as the primary or preferred 
> installation method for Windows. That will still be PPM. I'm 
> mentioning CPAN / manual installation in the Windows INSTALL 
> docs for the benefit of anyone who wants a simple install and 
> test environment when checking out from CVS.

That's fine by me.  I think the focus is making sure the PPM works, but that
shouldn't hold up the final 1.5.2 release.  The PPM for previous releases
was never released concurrently with the distribution (if at all); it
generally followed the final release by a few weeks to a few months.

> > What are the significant issues for a bioperl PPM installation
> 
> None that I'm aware of - I just need to find the time to 
> start looking into generating an appropriate PPD. Hopefully 
> Nathan's wiki page on the subject will be all I need.

I'll try testing it out today and next week (the more people we have looking
into the issue the better).  I'm sure that Module::Build hasn't updated to
using PPM4 XML formatting, but the tags are similar enough.  I can always
create a local PPM database using a similar directory structure to
bioperl.org/DIST and test an installation from it.

chris


From n.haigh at sheffield.ac.uk  Fri Dec  1 13:52:55 2006
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Fri, 01 Dec 2006 18:52:55 +0000
Subject: [Bioperl-l] Bioperl 1.5.2 RC5 installonWinXPActivePerl	5.8.8.819
In-Reply-To: <006301c71572$24be8830$15327e82@pyrimidine>
References: <006301c71572$24be8830$15327e82@pyrimidine>
Message-ID: <45707A07.7000106@sheffield.ac.uk>

Chris Fields wrote:
>> Chris Fields wrote:
>> PPM).  I can 
>>     
>>> see using CPAN as an alternative way of installing Bioperl for a 
>>> distribution, or as the primary method via CVS or manually, but not 
>>> for distributions.  At least not until the kinks are worked out for 
>>> Windows users.
>>>       
>> CPAN isn't being suggested as the primary or preferred 
>> installation method for Windows. That will still be PPM. I'm 
>> mentioning CPAN / manual installation in the Windows INSTALL 
>> docs for the benefit of anyone who wants a simple install and 
>> test environment when checking out from CVS.
>>     
>
> That's fine by me.  I think the focus is making sure the PPM works, but that
> shouldn't hold up the final 1.5.2 release.  The PPM for previous releases
> was never released concurrently with the distribution (if at all); it
> generally followed by a few weeks to a few months past a final release.
>
>   
>>> What are the significant issues for a bioperl PPM installation
>>>       
>> None that I'm aware of - I just need to find the time to 
>> start looking into generating an appropriate PPD. Hopefully 
>> Nathan's wiki page on the subject will be all I need.
>>     
>
> I'll try testing it out today and next week (the more people we have looking
> into the issue the better).  I'm sure that Module::Build hasn't updated to
> using PPM4 XML formatting, but the tags are similar enough.  I can always
> create a local PPM database using a similar directory structure to
> bioperl.org/DIST and test an installation from it.
>
> chris
>   

To clarify a few things about PPM4 XML and to highlight the main 
differences:

1) The use of PROVIDE and REQUIRE tags
2) PPM4 XML "should" contain PROVIDE tags for ALL bioperl modules.
3) VERSION in PROVIDE and REQUIRE tags should be floats, not comma 
separated tuples like PPM3 XML
4) the VERSION in PROVIDE and REQUIRE are used internally to do version 
comparisons for upgrades and solving prereqs etc
5) Module names should all contain '::' either natively according their 
namespace, if it doesn't have one natively, then one is appended to the 
end e.g. "GD::"
6) the VERSION in the SOFTPKG key is for human readability only
7) the NAME in SOFTPKG is used to identify which packages are actually 
the same.
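To make rules 3 and 5 concrete, here is a minimal Perl sketch that formats PROVIDE/REQUIRE tags. The helper names (ppm4_module_name, ppm4_version, ppm4_tag) are hypothetical, and converting tuples such as "1.5.2" or "1,5,2,0" to "1.005002..." is an assumption borrowed from the usual Perl numeric-version convention, not something confirmed to be what PPM4 expects.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Rule 5: module names must contain '::'. A name with no namespace
# separator (e.g. "GD") gets '::' appended ("GD::").
sub ppm4_module_name {
    my ($name) = @_;
    return $name =~ /::/ ? $name : "${name}::";
}

# Rule 3: versions must be plain numbers, not PPM3-style tuples.
# ASSUMPTION: tuples such as "1.5.2" or "1,5,2,0" are converted with the
# usual Perl convention of three digits per component after the first
# ("1.5.2" -> "1.005002"); plain decimals like "2.35" are kept as-is.
sub ppm4_version {
    my ($version) = @_;
    return $version if $version =~ /^\d+(?:\.\d+)?$/;
    my ($major, @rest) = split /[.,]/, $version;
    die "not a numeric version: '$version'\n"
        if !defined $major || grep { !/^\d+$/ } $major, @rest;
    return $major . '.' . join '', map { sprintf '%03d', $_ } @rest;
}

# Build a single PROVIDE or REQUIRE tag (both carry NAME and VERSION).
sub ppm4_tag {
    my ($type, $name, $version) = @_;
    die "unknown tag type '$type'\n" unless $type =~ /^(?:PROVIDE|REQUIRE)$/;
    return sprintf '<%s NAME="%s" VERSION="%s"/>',
        $type, ppm4_module_name($name), ppm4_version($version);
}

# Illustrative calls with made-up names and versions.
print ppm4_tag('PROVIDE', 'Bio::SeqIO', '1,5,2,0'), "\n";
print ppm4_tag('REQUIRE', 'GD', '2.35'), "\n";
```

Because rule 4 says these numbers drive upgrade and prereq decisions, a consistent conversion matters more than the particular convention chosen.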

Nath




From bix at sendu.me.uk  Fri Dec  1 13:52:44 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 01 Dec 2006 18:52:44 +0000
Subject: [Bioperl-l] Error with supplied lineages importing uniprot data
In-Reply-To: <45706291.80201@sendu.me.uk>
References: <1348.130.49.222.58.1164925169.squirrel@webmail.cs.pitt.edu>
	<45706291.80201@sendu.me.uk>
Message-ID: <457079FC.7010209@sendu.me.uk>

Sendu Bala wrote:
> pelikan at cs.pitt.edu wrote:
[snip]
>> load_seqdatabase.pl -drive mysql -dbname bioseqdb -dbuser root -dbpass
>> ********* -format swiss -safe c:\data\uniprot\uniprot_sprot.dat
>>
>> I am subsequently greeted by many of the following errors:
>>
>> Could not store Q7N3Q6:
> 
> I extracted just Q7N3Q6 from 
> ftp://ftp.expasy.org/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.dat.gz
> and was able to load it in using load_seqdatabase.pl under linux with no 
> errors. If you make a file with just that sequence do you still get the 
> error?
> 
> Is anyone else able to reproduce the problem?

In fact, if I try to load it a second time, I can reproduce the problem.
The situation is similar to http://bugzilla.bioperl.org/show_bug.cgi?id=2092

And I have a tentative fix that extends Brian's fix there. Committed to 
HEAD only atm. I don't know anything about bioperl-db and don't have the 
faintest clue why this is happening, nor the time to figure it out. Can 
someone please have a proper look at this and decide if my fix is sane?

All I can say is that the test suites for bioperl-live and bioperl-db
continue to pass, but that isn't really saying much.


PS. Having used load_seqdatabase.pl to load a sequence, how do I remove
it afterwards?


From cjfields at uiuc.edu  Fri Dec  1 14:00:13 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 1 Dec 2006 13:00:13 -0600
Subject: [Bioperl-l] Error with supplied lineages importing uniprot data
In-Reply-To: <45706291.80201@sendu.me.uk>
References: <1348.130.49.222.58.1164925169.squirrel@webmail.cs.pitt.edu>
	<45706291.80201@sendu.me.uk>
Message-ID: <EAE311A7-DB66-4CFC-9598-EA6FCAED9B7F@uiuc.edu>


On Dec 1, 2006, at 11:12 AM, Sendu Bala wrote:

> pelikan at cs.pitt.edu wrote:
>> Hello all,
>>
>>  I'm running bioperl 1.5.2, bioperl-db 1.5.2 - RC005, under windows,
>> without Cygwin. The "make test"s have all completed without error. This
>> is my first time dealing with bioperl, so bear with me.
>>
>>    I've successfully loaded the most recent taxonomy information using the
>> biosql-schema scripts. After this, I attempted to load the most recent
>> release of the uniprot flat file dataset with the following command:
>>
>> load_seqdatabase.pl -drive mysql -dbname bioseqdb -dbuser root -dbpass
>> ********* -format swiss -safe c:\data\uniprot\uniprot_sprot.dat
>>
>> I am subsequently greeted by many of the following errors:
>>
>> Could not store Q7N3Q6:
>
> I extracted just Q7N3Q6 from
> ftp://ftp.expasy.org/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.dat.gz
> and was able to load it in using load_seqdatabase.pl under linux with no
> errors. If you make a file with just that sequence do you still get the
> error?
>
> Is anyone else able to reproduce the problem?

I can reproduce it on both WinXP and Mac OS X using the latest
bioperl-db/bioperl-live and a BioSQL database preloaded with taxonomy.
Notably, the bug doesn't show up with a database lacking taxonomy,
where (I guess) no lookup is used.

Here's some overly verbose debugging (apologies):

Loading saved.flat ...
attempting to load adaptor class for Bio::Seq::RichSeq
	attempting to load module Bio::DB::BioSQL::RichSeqAdaptor
attempting to load adaptor class for Bio::Seq
	attempting to load module Bio::DB::BioSQL::SeqAdaptor
instantiating adaptor class Bio::DB::BioSQL::SeqAdaptor
attempting to load adaptor class for Bio::Species
	attempting to load module Bio::DB::BioSQL::SpeciesAdaptor
instantiating adaptor class Bio::DB::BioSQL::SpeciesAdaptor
attempting to load adaptor class for Bio::Tree::Tree
	attempting to load module Bio::DB::BioSQL::TreeAdaptor
attempting to load adaptor class for Bio::Root::Root
	attempting to load module Bio::DB::BioSQL::RootAdaptor
attempting to load adaptor class for Bio::Root::RootI
	attempting to load module Bio::DB::BioSQL::RootIAdaptor
	attempting to load module Bio::DB::BioSQL::RootAdaptor
attempting to load adaptor class for Bio::Tree::TreeI
	attempting to load module Bio::DB::BioSQL::TreeIAdaptor
	attempting to load module Bio::DB::BioSQL::TreeAdaptor
attempting to load adaptor class for Bio::Tree::NodeI
	attempting to load module Bio::DB::BioSQL::NodeIAdaptor
	attempting to load module Bio::DB::BioSQL::NodeAdaptor
attempting to load adaptor class for Bio::Tree::TreeFunctionsI
	attempting to load module Bio::DB::BioSQL::TreeFunctionsIAdaptor
	attempting to load module Bio::DB::BioSQL::TreeFunctionsAdaptor
no adaptor found for class Bio::Tree::Tree
attempting to load adaptor class for Bio::DB::Taxonomy::list
	attempting to load module Bio::DB::BioSQL::listAdaptor
attempting to load adaptor class for Bio::DB::Taxonomy
	attempting to load module Bio::DB::BioSQL::TaxonomyAdaptor
no adaptor found for class Bio::DB::Taxonomy::list
attempting to load adaptor class for Bio::Annotation::Collection
	attempting to load module Bio::DB::BioSQL::CollectionAdaptor
attempting to load adaptor class for Bio::AnnotationCollectionI
	attempting to load module Bio::DB::BioSQL::AnnotationCollectionIAdaptor
	attempting to load module Bio::DB::BioSQL::AnnotationCollectionAdaptor
instantiating adaptor class Bio::DB::BioSQL::AnnotationCollectionAdaptor
attempting to load adaptor class for Bio::Annotation::TypeManager
	attempting to load module Bio::DB::BioSQL::TypeManagerAdaptor
no adaptor found for class Bio::Annotation::TypeManager
attempting to load adaptor class for Bio::Annotation::SimpleValue
	attempting to load module Bio::DB::BioSQL::SimpleValueAdaptor
instantiating adaptor class Bio::DB::BioSQL::SimpleValueAdaptor
attempting to load adaptor class for Bio::Annotation::Reference
	attempting to load module Bio::DB::BioSQL::ReferenceAdaptor
instantiating adaptor class Bio::DB::BioSQL::ReferenceAdaptor
attempting to load adaptor class for Bio::Annotation::Comment
	attempting to load module Bio::DB::BioSQL::CommentAdaptor
instantiating adaptor class Bio::DB::BioSQL::CommentAdaptor
attempting to load adaptor class for Bio::Annotation::DBLink
	attempting to load module Bio::DB::BioSQL::DBLinkAdaptor
instantiating adaptor class Bio::DB::BioSQL::DBLinkAdaptor
attempting to load adaptor class for Bio::PrimarySeq
	attempting to load module Bio::DB::BioSQL::PrimarySeqAdaptor
instantiating adaptor class Bio::DB::BioSQL::PrimarySeqAdaptor
attempting to load adaptor class for Bio::SeqFeature::Generic
	attempting to load module Bio::DB::BioSQL::GenericAdaptor
attempting to load adaptor class for Bio::SeqFeatureI
	attempting to load module Bio::DB::BioSQL::SeqFeatureIAdaptor
	attempting to load module Bio::DB::BioSQL::SeqFeatureAdaptor
instantiating adaptor class Bio::DB::BioSQL::SeqFeatureAdaptor
attempting to load adaptor class for Bio::Location::Simple
	attempting to load module Bio::DB::BioSQL::SimpleAdaptor
attempting to load adaptor class for Bio::Location::Atomic
	attempting to load module Bio::DB::BioSQL::AtomicAdaptor
attempting to load adaptor class for Bio::LocationI
	attempting to load module Bio::DB::BioSQL::LocationIAdaptor
	attempting to load module Bio::DB::BioSQL::LocationAdaptor
instantiating adaptor class Bio::DB::BioSQL::LocationAdaptor
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Tree::Tree
no adaptor found for class Bio::DB::Taxonomy::list
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
attempting to load adaptor class for BioNamespace
	attempting to load module Bio::DB::BioSQL::BioNamespaceAdaptor
instantiating adaptor class Bio::DB::BioSQL::BioNamespaceAdaptor
no adaptor found for class Bio::Tree::Tree
no adaptor found for class Bio::DB::Taxonomy::list
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
attempting to load driver for adaptor class Bio::DB::BioSQL::BioNamespaceAdaptor
attempting to load driver for adaptor class Bio::DB::BioSQL::BasePersistenceAdaptor
Using Bio::DB::BioSQL::mysql::BasePersistenceAdaptorDriver as driver peer for Bio::DB::BioSQL::BioNamespaceAdaptor
preparing UK select statement: SELECT biodatabase.biodatabase_id, biodatabase.name, biodatabase.authority FROM biodatabase WHERE name = ?
BioNamespaceAdaptor: binding UK column 1 to "Swiss-Prot" (namespace)
preparing INSERT statement: INSERT INTO biodatabase (name, authority) VALUES (?, ?)
BioNamespaceAdaptor::insert: binding column 1 to "Swiss-Prot" (namespace)
BioNamespaceAdaptor::insert: binding column 2 to "" (authority)
no adaptor found for class Bio::Tree::Tree
no adaptor found for class Bio::DB::Taxonomy::list
attempting to load driver for adaptor class Bio::DB::BioSQL::SpeciesAdaptor
Using Bio::DB::BioSQL::mysql::SpeciesAdaptorDriver as driver peer for Bio::DB::BioSQL::SpeciesAdaptor
preparing UK select statement: SELECT taxon_name.taxon_id, NULL, NULL, taxon.ncbi_taxon_id, taxon_name.name, NULL FROM taxon, taxon_name WHERE taxon.taxon_id = taxon_name.taxon_id AND name_class = ? AND ncbi_taxon_id = ?
SpeciesAdaptor: binding UK column 1 to "scientific name" (name_class)
SpeciesAdaptor: binding UK column 2 to "141679" (ncbi_taxid)
prepare SELECT CLASSIFICATION: SELECT name.name, node.node_rank FROM taxon node, taxon taxon, taxon_name name WHERE name.taxon_id = node.taxon_id AND taxon.left_value BETWEEN node.left_value AND node.right_value AND taxon.taxon_id = ? AND name.name_class = 'scientific name' ORDER BY node.left_value
attempting to load driver for adaptor class Bio::DB::BioSQL::SeqAdaptor
attempting to load driver for adaptor class Bio::DB::BioSQL::PrimarySeqAdaptor
attempting to load driver for adaptor class Bio::DB::BioSQL::BasePersistenceAdaptor
Using Bio::DB::BioSQL::mysql::BasePersistenceAdaptorDriver as driver peer for Bio::DB::BioSQL::SeqAdaptor
Could not store Q7N3Q6:
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: The supplied lineage does not start near 'Photorhabdus luminescens subsp. laumondii'
STACK: Error::throw
STACK: Bio::Root::Root::throw /Users/cjfields/src/bioperl-live/Bio/Root/Root.pm:359
STACK: Bio::Species::classification /Users/cjfields/src/bioperl-live/Bio/Species.pm:166
STACK: Bio::DB::Persistent::PersistentObject::AUTOLOAD /Library/Perl/5.8.6/Bio/DB/Persistent/PersistentObject.pm:552
STACK: Bio::DB::BioSQL::SpeciesAdaptor::populate_from_row /Library/Perl/5.8.6/Bio/DB/BioSQL/SpeciesAdaptor.pm:281
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::_build_object /Library/Perl/5.8.6/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:1305
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key /Library/Perl/5.8.6/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:973
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key /Library/Perl/5.8.6/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:852
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /Library/Perl/5.8.6/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:182
STACK: Bio::DB::Persistent::PersistentObject::create /Library/Perl/5.8.6/Bio/DB/Persistent/PersistentObject.pm:244
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /Library/Perl/5.8.6/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:169
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store /Library/Perl/5.8.6/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251
STACK: Bio::DB::Persistent::PersistentObject::store /Library/Perl/5.8.6/Bio/DB/Persistent/PersistentObject.pm:271
STACK: load_seqdatabase.pl:620
-----------------------------------------------------------

at load_seqdatabase.pl line 633


chris


From cjfields at uiuc.edu  Fri Dec  1 14:01:59 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 1 Dec 2006 13:01:59 -0600
Subject: [Bioperl-l] Bioperl 1.5.2 RC5 installonWinXPActivePerl	5.8.8.819
In-Reply-To: <45707A07.7000106@sheffield.ac.uk>
References: <006301c71572$24be8830$15327e82@pyrimidine>
	<45707A07.7000106@sheffield.ac.uk>
Message-ID: <C233572F-BD36-4DBE-BE9B-2C097F4C939B@uiuc.edu>


On Dec 1, 2006, at 12:52 PM, Nathan S. Haigh wrote:

> Chris Fields wrote:
>>> Chris Fields wrote:
>>> PPM).  I can
>>>> see using CPAN as an alternative way of installing Bioperl for a  
>>>> distribution, or as the primary method via CVS or manually, but  
>>>> not for distributions.  At least not until the kinks are worked  
>>>> out for Windows users.
>>>>
>>> CPAN isn't being suggested as the primary or preferred  
>>> installation method for Windows. That will still be PPM. I'm  
>>> mentioning CPAN / manual installation in the Windows INSTALL docs  
>>> for the benefit of anyone who wants a simple install and test  
>>> environment when checking out from CVS.
>>>
>>
>> That's fine by me.  I think the focus is making sure the PPM works, but that
>> shouldn't hold up the final 1.5.2 release.  The PPM for previous releases
>> was never released concurrently with the distribution (if at all); it
>> generally followed a few weeks to a few months after the final release.
>>
>>
>>>> What are the significant issues for a bioperl PPM installation
>>>>
>>> None that I'm aware of - I just need to find the time to start  
>>> looking into generating an appropriate PPD. Hopefully Nathan's  
>>> wiki page on the subject will be all I need.
>>>
>>
>> I'll try testing it out today and next week (the more people we have looking
>> into the issue the better).  I'm sure that Module::Build hasn't updated to
>> using PPM4 XML formatting, but the tags are similar enough.  I can always
>> create a local PPM database using a similar directory structure to
>> bioperl.org/DIST and test an installation from it.
>>
>> chris
>>
>
> To clarify a few things about PPM4 XML and to highlight the main
> differences:
>
> 1) The use of PROVIDE and REQUIRE tags
> 2) PPM4 XML "should" contain PROVIDE tags for ALL bioperl modules.
> 3) VERSION in PROVIDE and REQUIRE tags should be floats, not
> comma-separated tuples as in PPM3 XML
> 4) the VERSION in PROVIDE and REQUIRE is used internally for version
> comparisons when upgrading, resolving prereqs, etc.
> 5) Module names should all contain '::', either natively according to
> their namespace or, if they don't have one natively, appended to the
> end, e.g. "GD::"
> 6) the VERSION in the SOFTPKG key is for human readability only
> 7) the NAME in SOFTPKG is used to identify which packages are
> actually the same.
>
> Nath

Okay.  Maybe place this in the wiki (PPM page) for future reference?

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From n.haigh at sheffield.ac.uk  Fri Dec  1 14:05:38 2006
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Fri, 01 Dec 2006 19:05:38 +0000
Subject: [Bioperl-l] Bioperl 1.5.2 RC5 installonWinXPActivePerl	5.8.8.819
In-Reply-To: <006301c71572$24be8830$15327e82@pyrimidine>
References: <006301c71572$24be8830$15327e82@pyrimidine>
Message-ID: <45707D02.9070504@sheffield.ac.uk>

Chris Fields wrote:
>> Chris Fields wrote:
>> PPM).  I can 
>>     
>>> see using CPAN as an alternative way of installing Bioperl for a 
>>> distribution, or as the primary method via CVS or manually, but not 
>>> for distributions.  At least not until the kinks are worked out for 
>>> Windows users.
>>>       
>> CPAN isn't being suggested as the primary or preferred 
>> installation method for Windows. That will still be PPM. I'm 
>> mentioning CPAN / manual installation in the Windows INSTALL 
>> docs for the benefit of anyone who wants a simple install and 
>> test environment when checking out from CVS.
>>     
>
> That's fine by me.  I think the focus is making sure the PPM works, but that
> shouldn't hold up the final 1.5.2 release.  The PPM for previous releases
> was never released concurrently with the distribution (if at all); it
> generally followed a few weeks to a few months after the final release.
>
>   

Forgot to say: one really annoying thing about PPM is that it seems to
display all the versions of Bioperl defined in the XML file. In
addition, I think a bug in PPM4 means that if a package is available in
ActiveState's repo, PPM4 always wants to install it rather than a more
recent version in a different repo (this includes upgrades). This
results in the following annoying behaviour:
1) If the ActiveState and bioperl repos are both active, searching for
bioperl lists several versions.
2) If you are using the PPM4 GUI and have installed a non-ActiveState
version, it says you can upgrade to the version in ActiveState's repo
(even if it's actually a downgrade).
3) Using ppm-shell, if you choose "install bioperl" or "upgrade bioperl",
it will always install the version in the ActiveState repo.
4) I'm sure there are also some other annoyances.

In the end, this means the best way to install and upgrade bioperl is to
search for bioperl packages and install the latest version by eye, rather
than relying on the "upgrade" feature (at least for the time being). You
may also need to remove an old version of bioperl before installing a
more recent version. NOTE: using "upgrade" runs the risk of installing
bioperl 1.2.3 from activestate and not the latest version in any other repo!
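The manual workaround described above (look at the versions each repository offers and pick the newest yourself) amounts to a version comparison that ignores which repository a package comes from. The Perl sketch below shows only that comparison logic; it does not call PPM, handles plain dotted numeric versions only, and the repository labels and the 1.5.2 entry are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Compare two dotted version strings component by component
# ("1.5.2" vs "1.2.3"); missing components count as 0.
sub cmp_versions {
    my @l = split /\./, $_[0];
    my @r = split /\./, $_[1];
    while (@l || @r) {
        my $x = @l ? shift @l : 0;
        my $y = @r ? shift @r : 0;
        return $x <=> $y if $x != $y;
    }
    return 0;
}

# Given search hits as [repository, version] pairs, return the newest
# one, whichever repository it comes from.
sub newest {
    my ($best) = sort { cmp_versions($b->[1], $a->[1]) } @_;
    return $best;
}

# Hypothetical search hits: ActiveState's 1.2.3 and an illustrative
# newer build from the bioperl.org repository.
my $pick = newest([ 'ActiveState', '1.2.3' ], [ 'bioperl.org', '1.5.2' ]);
print "install $pick->[1] from $pick->[0]\n";
```

Comparing component by component avoids the classic trap of treating "1.10" as smaller than "1.2".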

I'll update the wiki when I have time.
Nath


>>> What are the significant issues for a bioperl PPM installation
>>>       
>> None that I'm aware of - I just need to find the time to 
>> start looking into generating an appropriate PPD. Hopefully 
>> Nathan's wiki page on the subject will be all I need.
>>     
>
> I'll try testing it out today and next week (the more people we have looking
> into the issue the better).  I'm sure that Module::Build hasn't updated to
> using PPM4 XML formatting, but the tags are similar enough.  I can always
> create a local PPM database using a similar directory structure to
> bioperl.org/DIST and test an installation from it.
>
> chris
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>




From cjfields at uiuc.edu  Fri Dec  1 14:06:53 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 1 Dec 2006 13:06:53 -0600
Subject: [Bioperl-l] Error with supplied lineages importing uniprot data
In-Reply-To: <45706291.80201@sendu.me.uk>
References: <1348.130.49.222.58.1164925169.squirrel@webmail.cs.pitt.edu>
	<45706291.80201@sendu.me.uk>
Message-ID: <0B67001A-9642-422E-A9FB-C9611004510E@uiuc.edu>


On Dec 1, 2006, at 11:12 AM, Sendu Bala wrote:

> pelikan at cs.pitt.edu wrote:
>> Hello all,
>>
>>  I'm running bioperl 1.5.2, bioperl-db 1.5.2 - RC005, under windows,
>> without Cygwin. The "make test"s have all completed without error. This
>> is my first time dealing with bioperl, so bear with me.
>>
>>    I've successfully loaded the most recent taxonomy information using the
>> biosql-schema scripts. After this, I attempted to load the most recent
>> release of the uniprot flat file dataset with the following command:
>>
>> load_seqdatabase.pl -drive mysql -dbname bioseqdb -dbuser root -dbpass
>> ********* -format swiss -safe c:\data\uniprot\uniprot_sprot.dat
>>
>> I am subsequently greeted by many of the following errors:
>>
>> Could not store Q7N3Q6:
>
> I extracted just Q7N3Q6 from
> ftp://ftp.expasy.org/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.dat.gz
> and was able to load it in using load_seqdatabase.pl under linux with no
> errors. If you make a file with just that sequence do you still get the
> error?
>
> Is anyone else able to reproduce the problem?

Okay, just updated to get your latest CVS fixes for bioperl-live and  
it passes now for both Mac OS X and WinXP.

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Fri Dec  1 14:09:15 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 1 Dec 2006 13:09:15 -0600
Subject: [Bioperl-l] Error with supplied lineages importing uniprot data
In-Reply-To: <457079FC.7010209@sendu.me.uk>
References: <1348.130.49.222.58.1164925169.squirrel@webmail.cs.pitt.edu>
	<45706291.80201@sendu.me.uk> <457079FC.7010209@sendu.me.uk>
Message-ID: <A85B86B9-3DCD-4855-AC06-675D19E3689E@uiuc.edu>


On Dec 1, 2006, at 12:52 PM, Sendu Bala wrote:

>
> PS. having used load_seqdatabase.pl to load a sequence, how do I remove
> it afterwards?

There's not much documentation on it, but it is demonstrated several
times in the test suite.

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From bix at sendu.me.uk  Fri Dec  1 14:39:17 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 01 Dec 2006 19:39:17 +0000
Subject: [Bioperl-l] Error with supplied lineages importing uniprot data
In-Reply-To: <0B67001A-9642-422E-A9FB-C9611004510E@uiuc.edu>
References: <1348.130.49.222.58.1164925169.squirrel@webmail.cs.pitt.edu>
	<45706291.80201@sendu.me.uk>
	<0B67001A-9642-422E-A9FB-C9611004510E@uiuc.edu>
Message-ID: <457084E5.2050300@sendu.me.uk>

Chris Fields wrote:
> 
> On Dec 1, 2006, at 11:12 AM, Sendu Bala wrote:
> 
>> pelikan at cs.pitt.edu wrote:
>>> Hello all,
>>>
>>>  I'm running bioperl 1.5.2, bioperl-db 1.5.2 - RC005, under windows,
>>> without Cygwin. The "make test"s have all completed without error. This
>>> is my first time dealing with bioperl, so bear with me.
>>>
>>>    I've successfully loaded the most recent taxonomy information 
>>> using the
>>> biosql-schema scripts. After this, I attempted to load the most recent
>>> release of the uniprot flat file dataset with the following command:
>>>
>>> load_seqdatabase.pl -drive mysql -dbname bioseqdb -dbuser root -dbpass
>>> ********* -format swiss -safe c:\data\uniprot\uniprot_sprot.dat
>>>
>>> I am subsequently greeted by many of the following errors:
>>>
>>> Could not store Q7N3Q6:
>>
>> I extracted just Q7N3Q6 from
>> ftp://ftp.expasy.org/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.dat.gz 
>>
>> and was able to load it in using load_seqdatabase.pl under linux with no
>> errors. If you make a file with just that sequence do you still get the
>> error?
>>
>> Is anyone else able to reproduce the problem?
> 
> Okay, just updated to get your latest CVS fixes for bioperl-live and it 
> passes now for both Mac OS X and WinXP.

Can you confirm if it is actually working correctly though? Like, having 
stored a previously-problem sequence, can you get it back out from the 
database and is its ->species() correct?


From cjfields at uiuc.edu  Fri Dec  1 14:52:13 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 1 Dec 2006 13:52:13 -0600
Subject: [Bioperl-l] Error with supplied lineages importing uniprot data
In-Reply-To: <457084E5.2050300@sendu.me.uk>
Message-ID: <000001c71582$329d4d50$15327e82@pyrimidine>

> > 
> > Okay, just updated to get your latest CVS fixes for bioperl-live and
> > it passes now for both Mac OS X and WinXP.
> 
> Can you confirm if it is actually working correctly though? 
> Like, having stored a previously-problem sequence, can you 
> get it back out from the database and is its ->species() correct?

I would assume so, if we can trust the species tests.  I will have to try it
again over the weekend.  I planned on loading a ton of protein sequences in
anyway, most of which are bacterial; if anything breaks it will probably be
with those.

I think Jason and Hilmar were going to get together about the BioSQL paper
at the hackathon.  That may be a good place to bring up some of the species
issues, if they persist.

chris


From hlapp at gmx.net  Fri Dec  1 20:42:05 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri, 1 Dec 2006 20:42:05 -0500
Subject: [Bioperl-l] Error with supplied lineages importing uniprot data
In-Reply-To: <457079FC.7010209@sendu.me.uk>
References: <1348.130.49.222.58.1164925169.squirrel@webmail.cs.pitt.edu>
	<45706291.80201@sendu.me.uk> <457079FC.7010209@sendu.me.uk>
Message-ID: <8414723F-BA02-4936-8F53-781276C3B526@gmx.net>

Either using SQL:

	-- theoretically you should convince yourself first that there
	-- is only one such record (the UK is over acc,version,namespace)
	SQL> DELETE FROM bioentry WHERE accession = 'Q7N3Q6';

or through bioperl-db (see the delete test for examples):

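	use Bio::DB::BioDB;
	use Bio::PrimarySeq;

	my $db = Bio::DB::BioDB->new(....);
	my $seq = Bio::PrimarySeq->new(-accession_number=>'Q7N3Q6',
	                               -namespace=>'whatever you used when loading');
	# look up the stored copy of the sequence by its unique key
	my $adp = $db->get_persistence_adaptor($seq);
	my $pseq = $adp->find_by_unique_key($seq);
	die "Q7N3Q6 not found in the database\n" unless $pseq;
	# delete it, then commit the transaction
	$pseq->remove();
	$pseq->commit();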

-hilmar

On Dec 1, 2006, at 1:52 PM, Sendu Bala wrote:

> PS. having used load_seqdatabase.pl to load a sequence, how do I remove
> it afterwards?

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From chhalling at verizon.net  Sun Dec  3 20:56:51 2006
From: chhalling at verizon.net (Conrad Halling)
Date: Sun, 03 Dec 2006 20:56:51 -0500
Subject: [Bioperl-l] BioPerl Wiki is down
Message-ID: <45738063.1070504@verizon.net>

When I attempted to navigate to http://www.bioperl.org/, I got the 
following message:

A database query syntax error has occurred. This may indicate a bug in 
the software. The last attempted database query was:

    (SQL query hidden)

from within function "MediaWikiBagOStuff::_doquery". MySQL returned 
error "1205: Lock wait timeout exceeded; try restarting transaction 
(localhost)".

-- 
Conrad Halling
chhalling at verizon.net


From rbirnie at totalise.co.uk  Sun Dec  3 16:38:02 2006
From: rbirnie at totalise.co.uk (richard)
Date: Sun, 3 Dec 2006 21:38:02 +0000
Subject: [Bioperl-l] confused by Bio::Graphics
Message-ID: <200612032138.02522.rbirnie@totalise.co.uk>

Hi all,

I'm having a little trouble getting Bio::Graphics to give me the correct 
output and I'm looking for some help. I am trying to extend from example 5 of 
the Graphics HOWTO on the bioperl wiki using version 1.4 of Bioperl. 
Eventually I intend the script to follow example 6 but I thought I'd try the 
simpler version first.

The basic aim of the script is that it takes as input a file containing a list 
of GenBank IDs plus some other info for alternative transcripts of a gene. 
This information is stored in a hash and the GenBank IDs are used to retrieve 
the appropriate entries from GenBank. I then want to use Bio::Graphics to 
generate a figure from the feature tables showing the CDSs from the 
alternative transcripts. 

So far I have managed to retrieve the GenBank entries extract the feature 
tables and store a reference to these in the hash mentioned above. I've also 
got Bio::Graphics to draw a basic image but some of the details aren't right 
and I don't understand why. I have attached the code I have so far, the input 
file and the output image to this mail. I didn't want to display it all in 
the main message but I'm not actually sure which bit is causing the problem. 
The code is very rough and in need of polishing but I need to get it to work 
correctly first.

These are the problems:
1) As I understand it this:

my $wholeseq = Bio::SeqFeature::Generic->new (
		-start => 1,
		-end => $refseq->length,
		-display_name =>$refseq->display_name
		);

should display the name of the gene (CD133/Prominin1) near the top of image. 
It doesn't, am I misunderstanding or is there an error in the code?

2) In the quoted example the CDS is broken up into smaller regions which are 
then linked together in example 6. This isn't happening in my code and I 
think it should be, I get one solid block for the CDS. I don't understand why 
this is because I'm not clear which parts of the feature table are used to 
define where the CDS should be split. I think this is the relevant bit of 
code:

foreach my $alt_trans (keys %main) {
	foreach my $tag (keys %{ $main{$alt_trans}{'features'} }) {

		my $feature = $main{$alt_trans}{'features'}{$tag};

		$panel->add_track($feature,
				-glyph => 'generic',
				-bgcolor => $colors[$idx++ % @colors],
				-fgcolor => 'black',
				-font2color => 'black',
				-key => $alt_trans,
				-bump => +1,
				-height => 8,
				-label => 1,
				-description => 1,
				) if ($tag eq 'CDS');

	}
}

Can anyone tell me what I am doing wrong?

RefSeq entry for the gene of interest is here:
http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=5174386
If I understand correctly the example file used in the HOWTO is this gene:
http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=116805320

Final question: does bioperl come with example scripts, and if so, where
would they normally be found on a Linux system?

If anyone is still reading this thanks for your patience. Any clarification 
will be appreciated.

regards,
Richard
-------------- next part --------------
A non-text attachment was scrubbed...
Name: CD133_graphic_code
Type: application/x-perl
Size: 2702 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061203/d5bd52ae/attachment-0003.bin>
-------------- next part --------------
sequence_ID	Exon_Boundary	Assay_location	Amplicon_length
NM_006017	9 - 10	1118	106
AF027208.1	9 - 10	1118	106
AK027420.1	9 - 10	1312	106
AK027422.1	9 - 10	1334	106
BC012089.1	9 - 10	1289	106
AY449689.1	8 - 9	1054	106
AY449690.1	8 - 9	1054	106
AY449691.1	8 - 9	1054	106
AY449692.1	9 - 10	1081	106
AY449693.1	9 - 10	1081	106
AF507034.1	8 - 9	1091	106
AK075411.1	9 - 10	1289	106
AF117225.1	9 - 10	1334	106
AK226033.1	-	1312	106
DQ895452.1	-	1054	106
-------------- next part --------------
A non-text attachment was scrubbed...
Name: CD133.png
Type: image/png
Size: 4322 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061203/d5bd52ae/attachment-0003.png>

From cjfields at uiuc.edu  Sun Dec  3 22:35:17 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 3 Dec 2006 21:35:17 -0600
Subject: [Bioperl-l] BioPerl Wiki is down
In-Reply-To: <45738063.1070504@verizon.net>
References: <45738063.1070504@verizon.net>
Message-ID: <41422FC7-B579-4B45-B8CC-341B8F462BCB@uiuc.edu>

On Dec 3, 2006, at 7:56 PM, Conrad Halling wrote:

> When I attempted to navigate to http://www.bioperl.org/, I got the
> following message:
>
> A database query syntax error has occurred. This may indicate a bug in
> the software. The last attempted database query was:
>
>     (SQL query hidden)
>
> from within function "MediaWikiBagOStuff::_doquery". MySQL returned
> error "1205: Lock wait timeout exceeded; try restarting transaction
> (localhost)".
>
> -- Conrad Halling
> chhalling at verizon.net

This has been an ongoing problem with the server; I have reported it  
previously to open-bio support.  There have been a few attempts to  
fix it which seem to work short-term but something else must be  
wrong.  Jason?  Chris D?

For my part, Googling found the following link, which indicates that  
this error may be due to heavy server load:

http://tibia.erig.net/TibiaWiki:Bug_reports

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From Derek.Fairley at bll.n-i.nhs.uk  Mon Dec  4 05:18:37 2006
From: Derek.Fairley at bll.n-i.nhs.uk (Fairley, Derek)
Date: Mon, 4 Dec 2006 10:18:37 -0000
Subject: [Bioperl-l] confused by Bio::Graphics
In-Reply-To: <200612032138.02522.rbirnie@totalise.co.uk>
Message-ID: <B4B8F9CCEDA9334F819017E5D711AD1C32C63D@bllmail.bll.n-i.nhs.uk>

Richard,

You can find instructions for installing the example scripts directory
here:

http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix#INSTALLING_BIOPERL_SCRIPTS

or you can get individual scripts from here:

http://www.bioperl.org/wiki/Bioperl_scripts

Derek.

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of richard
Sent: 03 December 2006 21:38
To: Bioperl list
Subject: [Bioperl-l] confused by Bio::Graphics

Hi all,

I'm having a little trouble getting Bio::Graphics to give me the correct
output and I'm looking for some help. I am trying to extend from example 5 of
the Graphics HOWTO on the bioperl wiki using version 1.4 of Bioperl.
Eventually I intend the script to follow example 6 but I thought I'd try the
simpler version first.

The basic aim of the script is that it takes as input a file containing a
list of GenBank IDs plus some other info for alternative transcripts of a
gene. This information is stored in a hash and the GenBank IDs are used to
retrieve the appropriate entries from GenBank. I then want to use
Bio::Graphics to generate a figure from the feature tables showing the CDSs
from the alternative transcripts.

So far I have managed to retrieve the GenBank entries, extract the feature
tables and store a reference to these in the hash mentioned above. I've also
got Bio::Graphics to draw a basic image but some of the details aren't right
and I don't understand why. I have attached the code I have so far, the
input file and the output image to this mail. I didn't want to display it
all in the main message but I'm not actually sure which bit is causing the
problem. The code is very rough and in need of polishing but I need to get
it to work correctly first.

These are the problems:

1) As I understand it this:

my $wholeseq = Bio::SeqFeature::Generic->new (
            -start => 1,
            -end => $refseq->length,
            -display_name =>$refseq->display_name
            );

should display the name of the gene (CD133/Prominin1) near the top of the
image. It doesn't; am I misunderstanding or is there an error in the code?

2) In the quoted example the CDS is broken up into smaller regions which are
then linked together in example 6. This isn't happening in my code and I
think it should be; I get one solid block for the CDS. I don't understand
why this is because I'm not clear which parts of the feature table are used
to define where the CDS should be split. I think this is the relevant bit of
code:

foreach my $alt_trans (keys %main) {
      foreach my $tag (keys %{ $main{$alt_trans}{'features'} }) {

            my $feature = $main{$alt_trans}{'features'}{$tag};

            $panel->add_track($feature,
                        -glyph => 'generic',
                        -bgcolor => $colors[$idx++ % @colors],
                        -fgcolor => 'black',
                        -font2color => 'black',
                        -key => $alt_trans,
                        -bump => +1,
                        -height => 8,
                        -label => 1,
                        -description => 1,
                        ) if ($tag eq 'CDS');
      }
}

Can anyone tell me what I am doing wrong?

RefSeq entry for the gene of interest is here:
http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=5174386

If I understand correctly the example file used in the HOWTO is this gene:
http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=116805320

Final question: does bioperl come with example scripts, and if so, where
would they normally be found on a Linux system?

If anyone is still reading this, thanks for your patience. Any
clarification will be appreciated.

regards,
Richard

From rbirnie at totalise.co.uk  Mon Dec  4 04:30:36 2006
From: rbirnie at totalise.co.uk (rbirnie at totalise.co.uk)
Date: 04 Dec 2006 09:30:36 +0000
Subject: [Bioperl-l] confused by Bio::Graphics
In-Reply-To: <mailman.161.1165197640.2417.bioperl-l@lists.open-bio.org>
References: <mailman.161.1165197640.2417.bioperl-l@lists.open-bio.org>
Message-ID: <BV.WM.2.0.pv.1.0.16.0612040930360.48622@webm7.global.net.uk>

An HTML attachment was scrubbed...
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061204/551f1442/attachment-0003.html>

From bix at sendu.me.uk  Mon Dec  4 09:37:16 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 04 Dec 2006 14:37:16 +0000
Subject: [Bioperl-l] BLASTing with a seqio/seq object...
In-Reply-To: <45706671.9000201@york.ac.uk>
References: <01ba01c714a2$b9659c10$15327e82@pyrimidine>	<456F27E9.70205@york.ac.uk>
	<456FEF22.4090004@sendu.me.uk> <45706671.9000201@york.ac.uk>
Message-ID: <4574329C.2030905@sendu.me.uk>

Samantha Thompson wrote:
> Hi,
> Thanks for all your help so far, I am still trying to understand a 
> couple of things...

You should make sure your replies are sent to the list, as you're likely 
to get a faster response.


[where $blast_report is the value returned by 
Bio::Tools::Run::RemoteBlast->submit_blast($seq_object)]
> when I run this line..
> 
> $searchio = Bio::SearchIO->new(-format => 'blast',
>                                -file   => $blast_report);
> 
> between submitting the blast search and trying to process the searchio object as I was attempting before, I get the following errors back:
> 
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Could not open 1: No such file or directory
[snip]
> Does this mean that my BLAST is failing when I submit it?

No, the -file option of SearchIO->new() takes, unsurprisingly, a 
filename. I'd tell you to pay careful attention to the docs, but sadly 
the RemoteBlast docs are currently wrong.

submit_blast() claims to return 'Blast report object' (which in any case 
certainly wouldn't be a filename) when in fact it returns, as you 
discovered, a (for our purposes) meaningless number.

As I suggested before, you need to look at the synopsis for 
Bio::Tools::Run::RemoteBlast instead.

(Having called submit_blast(), you must then run the each_rid() loop shown in that synopsis.)
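The polling step from the RemoteBlast synopsis can be pulled out into a subroutine. This is a minimal sketch, assuming $factory is a Bio::Tools::Run::RemoteBlast object created with -readmethod => 'SearchIO' on which submit_blast() has already been called; the name collect_blast_results and the poll interval are illustrative choices, not part of the module. Following the synopsis, a non-reference return from retrieve_blast() means the job is still running (or has failed, if negative), while a reference is a Bio::SearchIO stream over the finished report.

```perl
use strict;
use warnings;

# Poll every outstanding request ID (RID) until each one has either
# returned a report or failed, collecting the parsed results.
sub collect_blast_results {
    my ($factory, $poll_seconds) = @_;
    $poll_seconds = 5 unless defined $poll_seconds;
    my @results;
    while (my @rids = $factory->each_rid) {
        for my $rid (@rids) {
            my $rc = $factory->retrieve_blast($rid);
            if (!ref $rc) {
                # Plain number: negative means the request failed,
                # otherwise the search is still running on the server.
                $factory->remove_rid($rid) if $rc < 0;
                sleep $poll_seconds;
            }
            else {
                # $rc is a Bio::SearchIO stream over the finished report.
                while (my $result = $rc->next_result) {
                    push @results, $result;
                }
                $factory->remove_rid($rid);
            }
        }
    }
    return @results;
}
```

Typical use, with parameters taken from the module's synopsis: my $factory = Bio::Tools::Run::RemoteBlast->new(-prog => 'blastp', -data => 'swissprot', -readmethod => 'SearchIO'); $factory->submit_blast($seq_object); my @results = collect_blast_results($factory); then the usual next_hit/next_hsp loops on each result. Check the POD of your installed version, as it may differ.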


Does anyone care to go through the POD for RemoteBlast and update it to 
an accurate state?


From bix at sendu.me.uk  Mon Dec  4 09:40:27 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 04 Dec 2006 14:40:27 +0000
Subject: [Bioperl-l] confused by Bio::Graphics
In-Reply-To: <BV.WM.2.0.pv.1.0.16.0612040930360.48622@webm7.global.net.uk>
References: <mailman.161.1165197640.2417.bioperl-l@lists.open-bio.org>
	<BV.WM.2.0.pv.1.0.16.0612040930360.48622@webm7.global.net.uk>
Message-ID: <4574335B.805@sendu.me.uk>

rbirnie at totalise.co.uk wrote:
> Hi all,
> 
> I've just seen my previous mail come through on the digest and I noticed 
> that the code I attached has been scrubbed which means that the message 
> won't make much sense. If I've contravened list rules by posting 
> attachments then apologies, I did look for a posting guide but couldn't 
> see one on the wiki. I deliberately didn't put the whole code in the 
> main message because it's quite long. I'm not sure which part is wrong, 
> so I don't know which part to post; I'm just not seeing the output I 
> would expect from the example. What is the best thing for me to do?

I saw a few attachments on your post (including your code example), so I 
think what you did was fine.


From cjfields at uiuc.edu  Mon Dec  4 10:40:20 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 4 Dec 2006 09:40:20 -0600
Subject: [Bioperl-l] confused by Bio::Graphics
In-Reply-To: <4574335B.805@sendu.me.uk>
Message-ID: <002001c717ba$823c1500$15327e82@pyrimidine>


> rbirnie at totalise.co.uk wrote:
> > Hi all,
> > 
> > I've just seen my previous mail come through on the digest and I 
> > noticed that the code I attached has been scrubbed which means that 
> > the message won't make much sense. If I've contravened list rules by 
> > posting attachments then apologies, I did look for a posting guide but 
> > couldn't see one on the wiki. I deliberately didn't put the whole code 
> > in the main message because it's quite long. I'm not sure which part 
> > is wrong, so I don't know which part to post; I'm just not seeing the 
> > output I would expect from the example. What is the best thing for me 
> > to do?
> 
> I saw a few attachments on your post (including your code 
> example), so I think what you did was fine.

Same here.  I received a PNG file and two text files (a script and a data
file).

chris

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 

 
From rbirnie at totalise.co.uk  Mon Dec  4 11:06:51 2006
From: rbirnie at totalise.co.uk (rbirnie at totalise.co.uk)
Date: 04 Dec 2006 16:06:51 +0000
Subject: [Bioperl-l] confused by Bio::Graphics
In-Reply-To: <002001c717ba$823c1500$15327e82@pyrimidine>
References: <002001c717ba$823c1500$15327e82@pyrimidine>
Message-ID: <BV.WM.2.0.pv.1.0.16.0612041606510.37306@webm5.global.net.uk>

An HTML attachment was scrubbed...
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061204/22c3c5e0/attachment-0003.html>

From dmessina at wustl.edu  Mon Dec  4 11:46:16 2006
From: dmessina at wustl.edu (David Messina)
Date: Mon, 4 Dec 2006 10:46:16 -0600
Subject: [Bioperl-l] confused by Bio::Graphics
In-Reply-To: <200612032138.02522.rbirnie@totalise.co.uk>
References: <200612032138.02522.rbirnie@totalise.co.uk>
Message-ID: <ACE259C3-DC1C-41CC-88F3-7ACF8B9D66AA@wustl.edu>

Hi Richard,


> [richard]
>
> These are the problems:
> 1) As I understand it this:
>
> my $wholeseq = Bio::SeqFeature::Generic->new (
> 		-start => 1,
> 		-end => $refseq->length,
> 		-display_name =>$refseq->display_name
> 		);
>
> should display the name of the gene (CD133/Prominin1) near the top  
> of image.
> It doesn't, am I misunderstanding or is there an error in the code?

The contents of a sequence object's display_name vary depending on
the type of sequence record; for a sequence object created from a
Genbank record, it's the value of the LOCUS field on the first line
of the record.

If you want the gene name, you'll have to dig it out of the feature  
table. If you look at the  Genbank record for your first sequence,  
you'll see that under both the gene and CDS primary features, the  
HUGO gene abbreviation is stored under the "gene" secondary tag, and  
various synonyms are under the "note" and "product" secondary tags.

LOCUS       NM_006017               3794 bp    mRNA    linear   PRI 17-NOV-2006
DEFINITION  Homo sapiens prominin 1 (PROM1), mRNA.
ACCESSION   NM_006017
VERSION     NM_006017.1  GI:5174386
[...skipping irrelevant part of the Genbank record...]
FEATURES             Location/Qualifiers
      source          1..3794
                      /organism="Homo sapiens"
                      /mol_type="mRNA"
                      /db_xref="taxon:9606"
                      /chromosome="4"
                      /map="4p15.32"
      gene            1..3794
                      /gene="PROM1"
                      /note="prominin 1; synonyms: AC133, CD133, PROML1,
                      MSTP061"
                      /db_xref="GeneID:8842"
                      /db_xref="HGNC:9454"
                      /db_xref="HPRD:HPRD_05079"
                      /db_xref="MIM:604365"
      CDS             38..2635
                      /gene="PROM1"
                      /go_component="integral to plasma membrane [pmid 9389720];
                      membrane"
                      /go_process="response to stimulus; visual perception"
                      /note="hProminin; prominin (mouse)-like 1; hematopoietic
                      stem cell antigen"
                      /codon_start=1
                      /product="prominin 1"
                      /protein_id="NP_006008.1"
                      /db_xref="GI:5174387"
                      /db_xref="GeneID:8842"
                      /db_xref="HGNC:9454"
                      /db_xref="HPRD:HPRD_05079"
                      /db_xref="MIM:604365"
[....more...]

In your script, you grab the primary features between lines 34-60.  
You can grab the secondary feature you want with something like:

[cribbed from the Feature-Annotation HOWTO]
my @ids;
for my $feat_object ($seq_object->get_SeqFeatures) {
    # collect every /gene="..." qualifier value, e.g. PROM1
    push @ids, $feat_object->get_tag_values("gene") if ($feat_object->has_tag("gene"));
}
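Building on that, a small helper could supply the gene name as the display_name of the whole-sequence feature. This is a sketch, not code from the HOWTO: gene_name_for is a hypothetical name, and it assumes $seq is a Bio::SeqI object such as the $refseq fetched from GenBank. It returns the first /gene qualifier it finds and falls back to display_name (the LOCUS name) if there is none.

```perl
use strict;
use warnings;

# Return the first /gene qualifier in the feature table, or the
# sequence's display_name (LOCUS name for GenBank) if none is found.
sub gene_name_for {
    my ($seq) = @_;
    for my $feat ($seq->get_SeqFeatures) {
        next unless $feat->has_tag('gene');
        my ($name) = $feat->get_tag_values('gene');
        return $name if defined $name && length $name;
    }
    return $seq->display_name;
}
```

With it, the constructor from question 1 would pass -display_name => gene_name_for($refseq), which for NM_006017 should give PROM1 rather than the LOCUS name.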


> 2) In the quoted example the CDS is broken up into smaller regions  
> which are
> then linked together in example 6. This isn't happening in my code  
> and I
> think it should be, I get one solid block for the CDS. I don't  
> understand why
> this is because I'm not clear which parts of the feature table are  
> used to
> define where the CDS should be split. I think this is the relevant  
> bit of
> code:
>
> foreach my $alt_trans (keys %main) {
> 	foreach my $tag (keys %{ $main{$alt_trans}{'features'} }) {
>
> 		my $feature = $main{$alt_trans}{'features'}{$tag};
>
> 		$panel->add_track($feature,
> 				-glyph => 'generic',
> 				-bgcolor => $colors[$idx++ % @colors],
> 				-fgcolor => 'black',
> 				-font2color => 'black',
> 				-key => $alt_trans,
> 				-bump => +1,
> 				-height => 8,
> 				-label => 1,
> 				-description => 1,
> 				) if ($tag eq 'CDS');
>
> }
> }


The problem here is that RefSeq mRNA records don't contain intron-exon
boundary information. I think you'll have to get that from an
assembly record. From the Entrez gene page for PROM1, I obtained a
Genbank record for the PROM1 genomic locus:

http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?val=NC_000004.10&from=15578955&to=15686664&strand=2&dopt=gb

Saving that as 'PROM1.gb' (the suffix is important), and running the  
bp_embl2picture.pl script on it, I got an image similar to Figure 6  
(attached).
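To see why the mRNA CDS is drawn as one block, it can help to list the pieces of each CDS location. A minimal sketch, assuming $feat is a Bio::SeqFeatureI whose location() returns a Bio::LocationI; exon_spans is a hypothetical helper. For a join(...) CDS in a genomic record the location is a split location, and each_Location returns one sub-location per exon; for the RefSeq mRNA CDS (38..2635) it returns the single location itself.

```perl
use strict;
use warnings;

# List the parts of a feature's location as [start, end] pairs.
# One pair for a simple range; one pair per exon for a join(...) location.
sub exon_spans {
    my ($feat) = @_;
    return map { my $loc = $_; [ $loc->start, $loc->end ] }
           $feat->location->each_Location;
}

# Usage: for my $span (exon_spans($cds)) { printf "%d..%d\n", @$span }
```

Running it over the CDS features from NM_006017 and from the PROM1.gb genomic record should show one span versus one span per coding exon.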

Hope this helps,
Dave


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061204/4add2cbc/attachment-0003.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: PROM1.png
Type: image/png
Size: 8646 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061204/4add2cbc/attachment-0003.png>

From bix at sendu.me.uk  Mon Dec  4 14:37:13 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 04 Dec 2006 19:37:13 +0000
Subject: [Bioperl-l] Timeline on the 1.5.2 release?
In-Reply-To: <000001c717db$3ca7b910$15327e82@pyrimidine>
References: <000001c717db$3ca7b910$15327e82@pyrimidine>
Message-ID: <457478E9.3060405@sendu.me.uk>

Chris Fields wrote:
> Sendu,
> 
> Are current plans to still try getting the final 1.5.2 release out
> before the hackathon next week?

Yes, I seriously hope so. I was kind of hoping to see test results from 
you and Nathan on the wiki though...


> There are a few commits I want to make, but I may wait until after
> 1.5.2 is out before I add them.

But don't let the release stop you. As long as you don't commit to the
1.5.2 branch it will be fine.


From cjfields at uiuc.edu  Mon Dec  4 14:34:34 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 4 Dec 2006 13:34:34 -0600
Subject: [Bioperl-l] Timeline on the 1.5.2 release?
Message-ID: <000001c717db$3ca7b910$15327e82@pyrimidine>

Sendu,

Are current plans to still try getting the final 1.5.2 release out before
the hackathon next week?  There are a few commits I want to make, but I may
wait until after 1.5.2 is out before I add them.

chris

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


From cjfields at uiuc.edu  Mon Dec  4 15:23:45 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 4 Dec 2006 14:23:45 -0600
Subject: [Bioperl-l] Timeline on the 1.5.2 release?
In-Reply-To: <457478E9.3060405@sendu.me.uk>
Message-ID: <000001c717e2$19d18e00$15327e82@pyrimidine>

> Chris Fields wrote:
> > Sendu,
> > 
> > Are current plans to still try getting the final 1.5.2 release out 
> > before the hackathon next week?
> 
> Yes, I seriously hope so. I was kind of hoping to see test 
> results from you and Nathan on the wiki though...

Ah, forgot to post those!  Working on that now...

> > There are a few commits I want to make, but I may wait until after
> > 1.5.2 is out before I add them.
> 
> But don't let the release stop you. As long as you don't commit to the
> 1.5.2 branch it will be fine.

There are a few things I plan on adding over the next few weeks, including
some things for Bio::Location::SplitLocation.  However I'm sure some of the
latter will break tests, so I'll be adding it in a bit at a time.

It all depends when I can squeeze time in to work on them!

chris 


From pelikan at cs.pitt.edu  Mon Dec  4 17:34:59 2006
From: pelikan at cs.pitt.edu (pelikan at cs.pitt.edu)
Date: Mon, 4 Dec 2006 17:34:59 -0500 (EST)
Subject: [Bioperl-l] Bioperl-db doesn't seem to load all entries
Message-ID: <4812.130.49.222.58.1165271699.squirrel@webmail.cs.pitt.edu>

Hello,

    My system is running bioperl 1.5.2, bioperl-db 1.5.2-005 RC, and the
latest MySQL under Windows, with ActivePerl and without Cygwin. I have 4 GB
of memory. "make test" passes fine.

The problem is that the number of entries loaded with load_seqdatabase.pl
is nowhere near the number I export. For instance, to load only proteins
from Homo sapiens, I
go to UniProt,
use the database search function,
do a text search for Homo sapiens (returns 70914 hits),
export the hits to flat file format (--format swiss) using the data set
manager,
and load it using load_seqdatabase.pl.

The result of "select count(*) from bioentry;" is only 1003 entries.
Moreover, the entries don't seem to go past the B's in the alphabet:
I can't find bioentry.descriptions like '%cytochrome%' or '%myoglobin%',
but I can find apolipoproteins, for example.

I know this is an annoying question, but if someone has more experience in
dealing with this issue, I would be grateful for any assistance. I don't
get any error messages, so it's difficult for me to tell what's going on.
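One quick check, before looking at load_seqdatabase.pl itself, is whether the exported flat file really contains all 70914 entries. The sketch below counts ID lines and // record terminators in a Swiss-Prot-format file; count_swiss_records is a hypothetical helper, and it assumes the standard layout in which each entry starts with an "ID   " line and ends with "//". A count near 1003 would point at the export, a count near 70914 at the loading step; it is not a diagnosis in itself.

```perl
use strict;
use warnings;

# Count entries in a Swiss-Prot/UniProt flat file: "ID" lines open an
# entry and "//" lines close it, so the two totals should agree.
sub count_swiss_records {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my ($ids, $ends) = (0, 0);
    while (my $line = <$fh>) {
        $ids++  if $line =~ /^ID   /;
        $ends++ if $line =~ m{^//};
    }
    close $fh;
    return ($ids, $ends);
}

# Usage (file name is made up):
# my ($ids, $ends) = count_swiss_records('human.swiss');
# print "$ids ID lines, $ends terminators\n";
```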

-Richard


From n.haigh at sheffield.ac.uk  Tue Dec  5 01:53:34 2006
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Tue, 05 Dec 2006 06:53:34 +0000
Subject: [Bioperl-l] Timeline on the 1.5.2 release?
In-Reply-To: <457478E9.3060405@sendu.me.uk>
References: <000001c717db$3ca7b910$15327e82@pyrimidine>
	<457478E9.3060405@sendu.me.uk>
Message-ID: <4575176E.3020906@sheffield.ac.uk>

Sendu Bala wrote:
> Chris Fields wrote:
>   
>> Sendu,
>>
>> Are current plans to still try getting the final 1.5.2 release out
>> before the hackathon next week?
>>     
>
> Yes, I seriously hope so. I was kind of hoping to see test results from 
> you and Nathan on the wiki though...
>
>
>   

OK, I'll get onto this today.

>> There are a few commits I want to make, but I may wait until after
>> 1.5.2 is out before I add them.
>>     
>
> But don't let the release stop you. As long as you don't commit to the
> 1.5.2 branch it will be fine.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>   


-- 
> A: Yes.
>> Q: Are you sure?
>>     
>>> A: Because it reverses the logical flow of conversation.
>>>       
>>>> Q: Why is top posting frowned upon?
>>>>         
Get Thunderbird <http://www.mozilla.org/products/thunderbird/>


From n.haigh at sheffield.ac.uk  Tue Dec  5 06:43:16 2006
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Tue, 05 Dec 2006 11:43:16 +0000
Subject: [Bioperl-l] Timeline on the 1.5.2 release?
In-Reply-To: <457478E9.3060405@sendu.me.uk>
References: <000001c717db$3ca7b910$15327e82@pyrimidine>
	<457478E9.3060405@sendu.me.uk>
Message-ID: <45755B54.7080902@sheffield.ac.uk>

Sendu Bala wrote:
> Chris Fields wrote:
>   
>> Sendu,
>>
>> Are current plans to still try getting the final 1.5.2 release out
>> before the hackathon next week?
>>     
>
> Yes, I seriously hope so. I was kind of hoping to see test results from 
> you and Nathan on the wiki though...
>
>
>   

I've added my test results for Debian to the wiki.
Nath

>> There are a few commits I want to make, but I may wait until after
>> 1.5.2 is out before I add them.
>>     
>
> But don't let the release stop you. As long as you don't commit to the
> 1.5.2 branch it will be fine.




From bix at sendu.me.uk  Tue Dec  5 06:47:06 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 05 Dec 2006 11:47:06 +0000
Subject: [Bioperl-l] Timeline on the 1.5.2 release?
In-Reply-To: <45755B54.7080902@sheffield.ac.uk>
References: <000001c717db$3ca7b910$15327e82@pyrimidine>
	<457478E9.3060405@sendu.me.uk> <45755B54.7080902@sheffield.ac.uk>
Message-ID: <45755C3A.9050903@sendu.me.uk>

Nathan S. Haigh wrote:
> Sendu Bala wrote:
>> Chris Fields wrote:
>>   
>>> Sendu,
>>>
>>> Are current plans to still try getting the final 1.5.2 release out
>>> before the hackathon next week?
>>>     
>> Yes, I seriously hope so. I was kind of hoping to see test results from 
>> you and Nathan on the wiki though...
>
> I've added my test results for Debian to the wiki.

Thanks (and to Chris as well). I can't tell you how much I loathe and 
despise TCoffee and Tmhmm now ;)


From cjfields at uiuc.edu  Tue Dec  5 11:04:38 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 5 Dec 2006 10:04:38 -0600
Subject: [Bioperl-l] Build.PL changes
Message-ID: <001b01c71887$10be3160$15327e82@pyrimidine>

Sendu,

I think the Build.PL commits which force installation of XML::SAX::Expat
should be rolled back.  XML::Simple works with any XML::SAX backend, not
just XML::SAX::Expat, which hasn't been actively maintained since 2003 and
is deprecated in favor of XML::SAX::ExpatXS.  In fact, forcing
XML::SAX::Expat to install as the default XML::SAX backend currently breaks
blastxml parsing.

Note that forcing this also forces one to install the Expat library (now at
version 2), which has some compatibility problems with XML::SAX::Expat (but
not with ExpatXS).

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


From qetzal at tutopia.com.br  Wed Dec  6 10:21:20 2006
From: qetzal at tutopia.com.br (giovani)
Date: Wed, 06 Dec 2006 10:21:20 -0500
Subject: [Bioperl-l] Biodiversity graphic
Message-ID: <auto-000222418003@frontend01.cg.ifxnetworks.com>

An HTML attachment was scrubbed...
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061206/9d9e4a09/attachment-0003.html>

From benoit at ebi.ac.uk  Wed Dec  6 12:30:12 2006
From: benoit at ebi.ac.uk (Benoit Ballester)
Date: Wed, 06 Dec 2006 17:30:12 +0000
Subject: [Bioperl-l] Biodiversity graphic
In-Reply-To: <auto-000222418003@frontend01.cg.ifxnetworks.com>
References: <auto-000222418003@frontend01.cg.ifxnetworks.com>
Message-ID: <4576FE24.1030807@ebi.ac.uk>

giovani wrote:
> 
> Hello there. I'm trying to write a program to draw a graph with two 
> axes and two data sets for each axis. Does anyone know a tool similar to 
> the GD module for drawing this graph? With GD I'm having trouble. 
> Here is an example of what I want to do: 
> http://libshuff.mib.uga.edu/YvsX.png, and below is the code that I'm 
> using with the GD module.


It looks to me as if the graph you are pointing to was made with gnuplot.
Why don't you use gnuplot or R instead?

Ben

> 
> #!/usr/bin/perl
> use strict;
> use warnings;
> use GD::Graph::mixed;
> 
> # First row holds the x-axis labels; each following row is one data set.
> my @data = (
>    ["1st","2nd","3rd","4th","5th","6th","7th", "8th", "9th"],
>    [    3,   4,   14,   30,   12,    8,    7,    20,    15],
>    [    2,   8,    2,    5,    3,  1,    3,     4,     1],
>    [    5,   12,   24,   33,   19,    8,    6,    15,    21],
>    [    1,    2,    5,    6,    3,  1.5,    1,     3,     4],
> );
> 
> my $my_graph = GD::Graph::mixed->new();
> $my_graph->set(
>        x_label => 'X Label',
>        y1_label => 'Y1 label',
>        y2_label => 'Y2 label',
>        title => 'Using two axes',
>        y1_max_value => 40,
>        y2_max_value => 8,
>        y_tick_number => 8,
>        y_label_skip => 2,
>        long_ticks => 1,
>        two_axes => 1,
>        use_axis => [1,2,1,2],
>        legend_placement => 'BR',
>        x_labels_vertical => 1,
>        x_label_position => 1/2,
> );
> 
> $my_graph->set_legend( 'X', 'XY', 'diff-X/XY', '95%XY');
> my $gd = $my_graph->plot(\@data) or die $my_graph->error;
> # Write the image; on failure open() leaves the OS error in $!.
> open(my $img, '>', 'graphTest.gif') or die "Cannot open graphTest.gif: $!\n";
> binmode $img;
> print {$img} $gd->gif;
> close $img;
> 


From gwu at molbio.mgh.harvard.edu  Wed Dec  6 16:12:57 2006
From: gwu at molbio.mgh.harvard.edu (gang wu)
Date: Wed, 06 Dec 2006 16:12:57 -0500
Subject: [Bioperl-l] Biodiversity graphic
In-Reply-To: <auto-000222418003@frontend01.cg.ifxnetworks.com>
References: <auto-000222418003@frontend01.cg.ifxnetworks.com>
Message-ID: <45773259.3010405@molbio.mgh.harvard.edu>

Do you mean the GD code cannot run, or that it does not generate the image
you wanted?

Gang



From bix at sendu.me.uk  Wed Dec  6 17:39:49 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 06 Dec 2006 22:39:49 +0000
Subject: [Bioperl-l] Bioperl 1.5.2 Release
Message-ID: <457746B5.2020006@sendu.me.uk>

I am proud to announce the final release of Bioperl 1.5.2.

http://www.bioperl.org/wiki/Release_1.5.2

bioperl (core):
cpan>install S/SE/SENDU/bioperl-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-1.5.2_100.tar.bz2
http://bioperl.org/DIST/bioperl-1.5.2_100.zip

bioperl-run:
cpan>install S/SE/SENDU/bioperl-run-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-run-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-run-1.5.2_100.tar.bz2
http://bioperl.org/DIST/bioperl-run-1.5.2_100.zip

bioperl-db:
cpan>install S/SE/SENDU/bioperl-db-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-db-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-db-1.5.2_100.tar.bz2
http://bioperl.org/DIST/bioperl-db-1.5.2_100.zip

bioperl-network:
cpan>install S/SE/SENDU/bioperl-network-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-network-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-network-1.5.2_100.tar.bz2
http://bioperl.org/DIST/bioperl-network-1.5.2_100.zip

http://bioperl.org/DIST/SIGNATURES.md5

(all are also available via CVS, and for Windows users, using the Perl 
Package Manager - see the wiki for details)

The other bioperl packages (bioperl-ext, bioperl-gui, bioperl-pedigree 
and bioperl-pipeline) did not see a unified release for 1.5.2.


This release represents a developer release which has been thoroughly
tested. We consider it the most stable (in terms of bugs) version of 
Bioperl and believe it to be suitable for most people. It is marked 
'developer' or even 'unstable' because its API may change on short 
notice. It will also not be maintained or supported beyond the next 
bioperl release.

1.5.2 introduces the following new (core) features:

  * Taxonomy (Bio::Species) overhaul
  * Bio::Map improvements
  * Bio::SearchIO speedup
  * Build.PL installation

For details, and a complete change log, see the wiki.

API documentation is available here: http://doc.bioperl.org/


Acknowledgements:
Innumerable thanks are due for the tireless efforts of Christopher Fields 
(bug fixing, testing, documentation, discussion), Nathan Haigh 
(Windows and prerequisite issues, testing) and Mauricio Herrera Cuadra 
(testing, documentation, support). Feedback and ideas provided by Hilmar 
Lapp, Jason Stajich, Torsten Seemann and others on the mailing list and 
elsewhere proved invaluable. None of this would have been possible 
without the behind-the-scenes work of the open-bio support team. I'd 
also like to acknowledge Andreas J. Koenig for his help with CPAN matters.

Finally, thank you to everyone who tried out the release candidates, and 
especially those that took the time to file bug reports or report problems.


Remember, Bioperl can only go from strength to strength with /your/ 
help. If you'd like to experience the fame and fortune that naturally 
follow becoming a Bioperl developer (?!), become one!
http://www.bioperl.org/wiki/Becoming_a_developer

On behalf of the Bioperl team,
Sendu Bala.


From cjfields at uiuc.edu  Wed Dec  6 21:30:44 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 6 Dec 2006 20:30:44 -0600
Subject: [Bioperl-l] Bioperl 1.5.2 Release
In-Reply-To: <457746B5.2020006@sendu.me.uk>
Message-ID: <000001c719a7$b48beb90$15327e82@pyrimidine>

Great job Sendu!  

A bit of icing on the cake: all the WinXP PPMs (core, db, network, run)
installed without a hitch, following the normal instructions, using PPM4
(both the GUI and the command-line shell) on clean ActiveState installations.
It looks like all the correct prereqs were installed with the shell (only
XML::SAX::ExpatXS was left out of the GUI installation, for reasons outlined
before).

I'll run more tests tomorrow to see if tests pass with the installed bioperl
(this should catch any prereq issues with PPM installation we missed).

chris

> I am proud to announce the final release of Bioperl 1.5.2.
> 
> http://www.bioperl.org/wiki/Release_1.5.2
> 
> bioperl (core):
> cpan>install S/SE/SENDU/bioperl-1.5.2_100.tar.gz
> http://bioperl.org/DIST/bioperl-1.5.2_100.tar.gz
> http://bioperl.org/DIST/bioperl-1.5.2_100.tar.bz2
> http://bioperl.org/DIST/bioperl-1.5.2_100.zip
> 
> bioperl-run:
> cpan>install S/SE/SENDU/bioperl-run-1.5.2_100.tar.gz
> http://bioperl.org/DIST/bioperl-run-1.5.2_100.tar.gz
> http://bioperl.org/DIST/bioperl-run-1.5.2_100.tar.bz2
> http://bioperl.org/DIST/bioperl-run-1.5.2_100.zip
> 
> bioperl-db:
> cpan>install S/SE/SENDU/bioperl-db-1.5.2_100.tar.gz
> http://bioperl.org/DIST/bioperl-db-1.5.2_100.tar.gz
> http://bioperl.org/DIST/bioperl-db-1.5.2_100.tar.bz2
> http://bioperl.org/DIST/bioperl-db-1.5.2_100.zip
> 
> bioperl-network:
> cpan>install S/SE/SENDU/bioperl-network-1.5.2_100.tar.gz
> http://bioperl.org/DIST/bioperl-network-1.5.2_100.tar.gz
> http://bioperl.org/DIST/bioperl-network-1.5.2_100.tar.bz2
> http://bioperl.org/DIST/bioperl-network-1.5.2_100.zip
> 
> http://bioperl.org/DIST/SIGNATURES.md5
> 
> (all are also available via CVS, and for Windows users, using 
> the Perl Package Manager - see the wiki for details)
> 
> The other bioperl packages (bioperl-ext, bioperl-gui, 
> bioperl-pedigree and bioperl-pipeline) did not see a unified 
> release for 1.5.2.


From hlapp at gmx.net  Wed Dec  6 22:20:14 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 6 Dec 2006 22:20:14 -0500
Subject: [Bioperl-l] Bioperl-db doesn't seem to load all entries
In-Reply-To: <4812.130.49.222.58.1165271699.squirrel@webmail.cs.pitt.edu>
References: <4812.130.49.222.58.1165271699.squirrel@webmail.cs.pitt.edu>
Message-ID: <8E15592D-6475-4A4D-BA6D-BD669C4233C3@gmx.net>

I seriously doubt that load_seqdatabase.pl would have deliberately  
stopped loading the file. Either there was an error in loading an  
entry (which you should see, and you can also ask the script to just  
keep going by providing the --safe option), or the file only  
contained 1003 entries.

Note that you can get progress logging by using the --logchunk  
option, which will also give you a final count of the number of  
sequences loaded.

I'm not sure how you ran your search and your download on UniProt. If
I try what you describe, I get 70491 hits, and if I try to export them
using the data set manager, I get the message:

This download mechanism only supports 1000 proteins. The first 1000  
proteins have been added from the selected.

Which perfectly explains what you see.

Did you convince yourself that the file contains 70491 entries? If
you don't have grep and wc on your Windows machine, you can use Perl
one-liners directly, e.g.,

perl -n -e '/^ID / && ++$n; END {print "$n entries\n";}' <your-file-here>
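On Windows, cmd.exe does not treat single quotes as quoting characters, so the one-liner above may need its quotes swapped there. One way to sidestep shell quoting altogether is to keep the check in a small script. The sketch below is only an illustration, not part of bioperl-db; the script name `count_entries.pl` and the `count_entries()` helper are hypothetical. In addition to counting `ID` lines, it counts the `//` lines that terminate each Swiss-Prot flat-file record, so a mismatch between the two counts suggests a truncated file.

```perl
#!/usr/bin/perl
# Hypothetical helper script (count_entries.pl): cross-checks a Swiss-Prot /
# UniProt flat file by counting entry start lines ("ID   ...") and record
# terminator lines ("//"). Differing counts suggest a truncated file.
use strict;
use warnings;

# Returns (number of ID lines, number of // terminator lines) read from $fh.
sub count_entries {
    my ($fh) = @_;
    my ($ids, $terminators) = (0, 0);
    while (my $line = <$fh>) {
        $ids++         if $line =~ /^ID /;
        $terminators++ if $line =~ m{^//};
    }
    return ($ids, $terminators);
}

# Only does work when file names are given on the command line.
if (@ARGV) {
    for my $file (@ARGV) {
        open my $fh, '<', $file or die "Cannot open $file: $!\n";
        my ($ids, $terminators) = count_entries($fh);
        close $fh;
        print "$file: $ids entries (ID lines), $terminators record terminators\n";
        warn "$file: counts differ, the file may be truncated\n"
            if $ids != $terminators;
    }
}
```

Usage (file name hypothetical): `perl count_entries.pl uniprot_human.dat`. With no file names on the command line the script does nothing, which keeps `count_entries()` easy to reuse on any open filehandle.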

	-hilmar

On Dec 4, 2006, at 5:34 PM, pelikan at cs.pitt.edu wrote:

> Hello,
>
>     My system is running bioperl 1.5.2, bioperl-db 1.5.2-005 RC, and the
> latest MySQL under Windows, ActivePerl, without Cygwin. I have 4 GB
> memory. "make test"s pass fine.
>
> The problem is that I'm not getting similar numbers of anything when I
> load datasets using load_seqdatabase.pl. For instance, if I want to load
> only proteins from Homo sapiens,
> I go to UniProt,
> use the database search function,
> do a text search for Homo sapiens (returns 70914 hits),
> export the hits to flat file format (--format swiss) using the data set
> manager,
> and load it using load_seqdatabase.pl.
>
> The result of "select count(*) from bioentry;" is only 1003 entries.
> Moreover, it seems like the entries don't go past the B's in the alphabet:
> I can't find bioentry.descriptions like '%cytochrome%' or '%myoglobin%',
> but I can find apolipoproteins, for example.
>
> I know this is an annoying question, but if someone has more experience in
> dealing with this issue, I would be grateful for any assistance. I don't
> get any error messages, so it's difficult for me to tell what's going on.
>
> -Richard
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From lzhtom at hotmail.com  Wed Dec  6 22:13:47 2006
From: lzhtom at hotmail.com (zhihua li)
Date: Thu, 07 Dec 2006 03:13:47 +0000
Subject: [Bioperl-l] different syntaxes for SeqI constructor and Factory
	constructor?
Message-ID: <BAY110-F30C26DE384E916A297FA86C7DC0@phx.gbl>

Hi netters,

Recently I found this:

For constructing a new SeqIO object, I had to write:
$seq_obj=Bio::SeqIO->new(
      -file => '/home/myfile',
      -format => 'Fasta');              # Note the dash before the two arguments.

If I omitted the dash:
$seq_obj=Bio::SeqIO->new(
     file => '/home/myfile',
     format => 'Fasta');
I'd get error:
MSG: Unknown format given or could not determine it []
STACK Bio::SeqIO::new /usr/lib/perl5/site_perl/5.8.7/Bio/SeqIO.pm:377

So it seems to me that the dashes before the arguments are essential.
However, when I tried to build a factory for StandAloneBlast, I found the
opposite.

If the script had the dash:
$blast_obj=Bio::Tools::Run::StandAloneBlast->new(
             -program => 'blastn',
             -database => '/home/mydatabase');

I'd get the error message: 
MSG: Unallowed parameter: - !
STACK Bio::Tools::Run::StandAloneBlast::AUTOLOAD /usr/lib/perl5/site_perl/5.8.7/Bio/Tools/Run/StandAloneBlast.pm:433
STACK Bio::Tools::Run::StandAloneBlast::new /usr/lib/perl5/site_perl/5.8.7/Bio/Tools/Run/StandAloneBlast.pm:400

If I left out the dash by saying:
$blast_obj=Bio::Tools::Run::StandAloneBlast->new(
             program => 'blastn',
             database => '/home/mydatabase');

Everything is fine.

Now I'm confused. Why do I sometimes have to add the dash, while other times
I'm not allowed to?

Thanks in advance!



From hlapp at gmx.net  Wed Dec  6 22:56:44 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 6 Dec 2006 22:56:44 -0500
Subject: [Bioperl-l] Bioperl 1.5.2 Release
In-Reply-To: <457746B5.2020006@sendu.me.uk>
References: <457746B5.2020006@sendu.me.uk>
Message-ID: <CE76F074-5897-431C-9E39-9E096DBD1973@gmx.net>

Congrats! Great work, Sendu! Don't forget to celebrate.

	-hilmar

On Dec 6, 2006, at 5:39 PM, Sendu Bala wrote:

> I am proud to announce the final release of Bioperl 1.5.2.

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From arareko at campus.iztacala.unam.mx  Wed Dec  6 22:53:21 2006
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Wed, 06 Dec 2006 21:53:21 -0600
Subject: [Bioperl-l] Bioperl 1.5.2 Release
In-Reply-To: <457746B5.2020006@sendu.me.uk>
References: <457746B5.2020006@sendu.me.uk>
Message-ID: <45779031.3050202@campus.iztacala.unam.mx>

This has been a great effort. Congrats and thanks to everyone involved!

Mauricio.

Sendu Bala wrote:
> I am proud to announce the final release of Bioperl 1.5.2.

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From jason at bioperl.org  Thu Dec  7 00:06:36 2006
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 6 Dec 2006 21:06:36 -0800
Subject: [Bioperl-l] Bioperl 1.5.2 Release
In-Reply-To: <457746B5.2020006@sendu.me.uk>
References: <457746B5.2020006@sendu.me.uk>
Message-ID: <41A863C9-1B69-4C7B-9271-C577EDD011BB@bioperl.org>

Hear! Hear! Excellent work. Thanks for leading the effort on this
release and for all of the behind-the-scenes work, attention to detail,
and cat herding it took to make this possible.

-jason

On Dec 6, 2006, at 2:39 PM, Sendu Bala wrote:

> I am proud to announce the final release of Bioperl 1.5.2.

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html


From n.haigh at sheffield.ac.uk  Thu Dec  7 02:23:47 2006
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Thu, 07 Dec 2006 07:23:47 +0000
Subject: [Bioperl-l] Bioperl 1.5.2 Release
In-Reply-To: <457746B5.2020006@sendu.me.uk>
References: <457746B5.2020006@sendu.me.uk>
Message-ID: <4577C183.7010501@sheffield.ac.uk>

I know I'm very new to Bioperl development and don't know very much yet,
so I'm probably not the best person to express the views of the Bioperl
developers or users. However, I'm sure I'm safe in saying that on behalf
of everyone associated with Bioperl a *huge* thank you must go out to
Sendu for the gargantuan effort he has put into this release.

Just looking over some of the e-mails he's sent over the past few weeks
alone, it's clear that he has devoted a huge amount of time to the
effort, in some cases with little sleep. Since there is very little
(or should I say no) monetary recognition in such an important and
time-consuming role as "Release Pumpkin", I hope Sendu has a warm glow, safe
in the knowledge that his efforts have helped enormously and are clearly
recognised and fully appreciated by the Bioperl community.

Therefore, I'd just like to reiterate what others have already
said... Well done, excellent work!!!

Nath


From valiente at lsi.upc.edu  Thu Dec  7 03:25:27 2006
From: valiente at lsi.upc.edu (Gabriel Valiente)
Date: Thu, 7 Dec 2006 09:25:27 +0100
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110 species
In-Reply-To: <mailman.8205.1161981511.2493.bioperl-l@lists.open-bio.org>
References: <mailman.8205.1161981511.2493.bioperl-l@lists.open-bio.org>
Message-ID: <4DA1DAE9-92B8-46C1-A3CE-F8D1AE4BB334@lsi.upc.edu>

The following exception popped up when I input more than 110 species to
the taxonomy2tree script, version 1.4:

         (in cleanup)
------------- EXCEPTION  -------------
MSG: Must supply a Bio::Taxon
STACK Bio::DB::Taxonomy::flatfile::ancestor Bio/DB/Taxonomy/flatfile.pm:260
STACK Bio::Taxon::ancestor Bio/Taxon.pm:476
STACK Bio::Taxon::remove_Descendent Bio/Taxon.pm:703
STACK Bio::Tree::Node::ancestor Bio/Tree/Node.pm:346
STACK Bio::Taxon::ancestor Bio/Taxon.pm:466
STACK Bio::Tree::Tree::cleanup_tree Bio/Tree/Tree.pm:325
STACK Bio::Root::Root::DESTROY Bio/Root/Root.pm:409
STACK (eval) taxonomy2tree.pl:0
STACK toplevel taxonomy2tree.pl:0

Any clues? Thanks,

Gabriel


From bix at sendu.me.uk  Thu Dec  7 04:24:39 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 07 Dec 2006 09:24:39 +0000
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110
	species
In-Reply-To: <4DA1DAE9-92B8-46C1-A3CE-F8D1AE4BB334@lsi.upc.edu>
References: <mailman.8205.1161981511.2493.bioperl-l@lists.open-bio.org>
	<4DA1DAE9-92B8-46C1-A3CE-F8D1AE4BB334@lsi.upc.edu>
Message-ID: <4577DDD7.7060208@sendu.me.uk>

Gabriel Valiente wrote:
> The following popped out when input more the 110 species to  
> taxonomy2tree script version 1.4:
> 
>          (in cleanup)
> ------------- EXCEPTION  -------------
> MSG: Must supply a Bio::Taxon
> STACK Bio::DB::Taxonomy::flatfile::ancestor Bio/DB/Taxonomy/flatfile.pm:260
> STACK Bio::Taxon::ancestor Bio/Taxon.pm:476
> STACK Bio::Taxon::remove_Descendent Bio/Taxon.pm:703
> STACK Bio::Tree::Node::ancestor Bio/Tree/Node.pm:346
> STACK Bio::Taxon::ancestor Bio/Taxon.pm:466
> STACK Bio::Tree::Tree::cleanup_tree Bio/Tree/Tree.pm:325
> STACK Bio::Root::Root::DESTROY Bio/Root/Root.pm:409
> STACK (eval) taxonomy2tree.pl:0
> STACK toplevel taxonomy2tree.pl:0
> 
> Any clues? Thanks,

Are you able to narrow the problem down? What was your command line, 
what species were you using? Does it work with the first 110 species you 
tried? Is there anything special about the 111th?

Do I understand correctly that this was a problem during cleanup only, 
and didn't affect the correctness and completeness of the result?


From bix at sendu.me.uk  Thu Dec  7 04:33:18 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 07 Dec 2006 09:33:18 +0000
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110
	species
In-Reply-To: <4DA1DAE9-92B8-46C1-A3CE-F8D1AE4BB334@lsi.upc.edu>
References: <mailman.8205.1161981511.2493.bioperl-l@lists.open-bio.org>
	<4DA1DAE9-92B8-46C1-A3CE-F8D1AE4BB334@lsi.upc.edu>
Message-ID: <4577DFDE.6000500@sendu.me.uk>

Gabriel Valiente wrote:
> The following popped out when input more the 110 species to  
> taxonomy2tree script version 1.4:
> 
>          (in cleanup)
> ------------- EXCEPTION  -------------
> MSG: Must supply a Bio::Taxon
> STACK Bio::DB::Taxonomy::flatfile::ancestor Bio/DB/Taxonomy/flatfile.pm:260
> STACK Bio::Taxon::ancestor Bio/Taxon.pm:476
> STACK Bio::Taxon::remove_Descendent Bio/Taxon.pm:703
> STACK Bio::Tree::Node::ancestor Bio/Tree/Node.pm:346
> STACK Bio::Taxon::ancestor Bio/Taxon.pm:466
> STACK Bio::Tree::Tree::cleanup_tree Bio/Tree/Tree.pm:325
> STACK Bio::Root::Root::DESTROY Bio/Root/Root.pm:409
> STACK (eval) taxonomy2tree.pl:0
> STACK toplevel taxonomy2tree.pl:0
> 
> Any clues? Thanks,

Oh, does it work with option -e? Or does it work if you delete your old 
indexes of the nodes and names files and let it re-create them?


From valiente at lsi.upc.edu  Thu Dec  7 04:38:03 2006
From: valiente at lsi.upc.edu (Gabriel Valiente)
Date: Thu, 7 Dec 2006 10:38:03 +0100
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110
	species
In-Reply-To: <4577DDD7.7060208@sendu.me.uk>
References: <mailman.8205.1161981511.2493.bioperl-l@lists.open-bio.org>
	<4DA1DAE9-92B8-46C1-A3CE-F8D1AE4BB334@lsi.upc.edu>
	<4577DDD7.7060208@sendu.me.uk>
Message-ID: <F5C5C9A8-D444-4994-9769-AC5DE68F4A39@lsi.upc.edu>

Hi,

If you run the attached shell script, you should be able to reproduce
the problem. It is not about any species in particular, but about the
total number of species: it crashes with more than 120 species. The
resulting tree is not correct; I'm checking it further now. Thanks,

Gabriel

-------------- next part --------------
A non-text attachment was scrubbed...
Name: fetch-bork.sh
Type: application/octet-stream
Size: 7378 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061207/00f0aeda/attachment-0003.obj>
-------------- next part --------------

On Dec 7, 2006, at 10:24 AM, Sendu Bala wrote:

> Gabriel Valiente wrote:
>> The following popped out when input more the 110 species to   
>> taxonomy2tree script version 1.4:
>>          (in cleanup)
>> ------------- EXCEPTION  -------------
>> MSG: Must supply a Bio::Taxon
>> STACK Bio::DB::Taxonomy::flatfile::ancestor Bio/DB/Taxonomy/flatfile.pm:260
>> STACK Bio::Taxon::ancestor Bio/Taxon.pm:476
>> STACK Bio::Taxon::remove_Descendent Bio/Taxon.pm:703
>> STACK Bio::Tree::Node::ancestor Bio/Tree/Node.pm:346
>> STACK Bio::Taxon::ancestor Bio/Taxon.pm:466
>> STACK Bio::Tree::Tree::cleanup_tree Bio/Tree/Tree.pm:325
>> STACK Bio::Root::Root::DESTROY Bio/Root/Root.pm:409
>> STACK (eval) taxonomy2tree.pl:0
>> STACK toplevel taxonomy2tree.pl:0
>> Any clues? Thanks,
>
> Are you able to narrow the problem down? What was your command  
> line, what species were you using? Does it work with the first 110  
> species you tried? Is there anything special about the 111th?
>
> Do I understand correctly that this was a problem during cleanup  
> only, and didn't affect the correctness and completeness of the  
> result?


From cjfields at uiuc.edu  Thu Dec  7 10:22:47 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 7 Dec 2006 09:22:47 -0600
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on
	110species
In-Reply-To: <F5C5C9A8-D444-4994-9769-AC5DE68F4A39@lsi.upc.edu>
Message-ID: <000001c71a13$8feec840$15327e82@pyrimidine>

> Hi,
> 
> If you run the attached shell script you should be able to 
> reproduce the problem. It is not about any species in 
> particular, but about the total number of species: it crushes 
> with more than 120 species. The resulting tree is not 
> correct, I'm checking it further now. Thanks,
> 
> Gabriel

Gabriel, 

My guess is this may have to do with using an old taxonomy dump file. I got
this to work on WinXP using the latest NCBI taxonomy. I had to modify
taxonomy2tree and your shell script to get them to play nice with Windows, but
I didn't get the error and I did get a tree (abbreviated):

(((((("Agrobacterium tumefaciens str. C58","Sinorhizobium meliloti")Rhizobiaceae,...

chris


From cjfields at uiuc.edu  Thu Dec  7 13:44:32 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 7 Dec 2006 12:44:32 -0600
Subject: [Bioperl-l] different syntaxes for SeqI constructor and Factory
	constructor?
In-Reply-To: <BAY110-F30C26DE384E916A297FA86C7DC0@phx.gbl>
References: <BAY110-F30C26DE384E916A297FA86C7DC0@phx.gbl>
Message-ID: <7513E9D5-E055-4EBE-B8CF-538A8DEDB8E9@uiuc.edu>


On Dec 6, 2006, at 9:13 PM, zhihua li wrote:

> Hi netters,
>
> Recently I found this:
>
> For constructing a new SeqI object, I had to write:
> $seq_obj=Bio::SeqIO->new(
>      -file => '/home/myfile',
>      -format => 'Fasta');              #Note the dash before the two arguments.
>
> If I omitted the dash:
> $seq_obj=Bio::SeqIO->new(
>     file => '/home/myfile',
>     format => 'Fasta');
> I'd get error:
> MSG: Unknown format given or could not determine it []
> STACK Bio::SeqIO::new /usr/lib/perl5/site_perl/5.8.7/Bio/SeqIO.pm:377
>
> So it seems to me that the dashes before the arguments are  
> essential.  However, when I tried to build a factory for  
> StandaloneBlast, I found the other way around.
>
> If the script had the dash:
> $blast_obj=Bio::Tools::Run::StandAloneBlast->new(
>             -program => 'blastn',
>             -database => '/home/mydatabase');
>
> I'd get the error message:
> MSG: Unallowed parameter: - !
> STACK Bio::Tools::Run::StandAloneBlast::AUTOLOAD /usr/lib/perl5/site_perl/5.8.7/Bio/Tools/Run/StandAloneBlast.pm:433
> STACK Bio::Tools::Run::StandAloneBlast::new /usr/lib/perl5/site_perl/5.8.7/Bio/Tools/Run/StandAloneBlast.pm:400
>
> If I left out the dash by saying:
> $blast_obj=Bio::Tools::Run::StandAloneBlast->new(
>             program => 'blastn',
>             database => '/home/mydatabase');
>
> Everyting is fine.
>
> Now I'm confused. Why sometimes I have to add the dash, while  
> sometimes I'm not allowed to?
>
> Thanks in advance!

I agree that this should be more consistent.  Does anyone know the  
reasoning for this?
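For readers wondering what the leading dash buys, here is a simplified sketch modelled on the `_rearrange()` helper in Bio::Root::RootI, which many BioPerl modules use to parse `-name => value` arguments. It is an illustration, not the library's code: `rearrange()` below is a hypothetical standalone copy, and individual modules (including the older StandAloneBlast discussed here) may parse their arguments differently. Keys are upper-cased and stripped of dashes, so `-format`, `-Format` and `-FORMAT` are equivalent; if the first argument has no leading dash, the list is handed back unchanged as positional values, so undashed keys are never matched by name.

```perl
#!/usr/bin/perl
# Hypothetical standalone sketch modelled on Bio::Root::RootI::_rearrange();
# not the BioPerl implementation itself.
use strict;
use warnings;

# rearrange(\@order, @args): return the values of the named arguments listed
# in @order (compared case-insensitively), in that order.
sub rearrange {
    my ($order, @args) = @_;

    # No leading dash on the first argument: treat the list as positional
    # values and return it unchanged, so undashed keys are never matched.
    return @args unless defined $args[0] && substr($args[0], 0, 1) eq '-';

    my %param;
    while (@args) {
        # Upper-case the key and delete every dash: -format becomes FORMAT
        (my $key = shift @args) =~ tr/a-z\-/A-Z/d;
        $param{$key} = shift @args;
    }
    return @param{ map { uc($_) } @$order };
}

# Dashed, mixed-case keys are matched by name.
my ($file, $format) = rearrange([qw(FILE FORMAT)],
                                -file => 'myseqs.fa', -Format => 'Fasta');
print "dashed:   file=$file format=$format\n";

# Undashed keys fall through as positional values.
my ($first, $second) = rearrange([qw(FILE FORMAT)],
                                 file => 'myseqs.fa', format => 'Fasta');
print "undashed: first=$first second=$second\n";
```

Run as-is, the dashed call yields `file=myseqs.fa format=Fasta`, while the undashed call yields `first=file second=myseqs.fa`: without dashes, the key names themselves end up being treated as values.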

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From bosborne11 at verizon.net  Thu Dec  7 14:32:21 2006
From: bosborne11 at verizon.net (Brian Osborne)
Date: Thu, 07 Dec 2006 14:32:21 -0500
Subject: [Bioperl-l] different syntaxes for SeqI constructor and Factory
 constructor?
In-Reply-To: <7513E9D5-E055-4EBE-B8CF-538A8DEDB8E9@uiuc.edu>
Message-ID: <C19DD675.BD72%bosborne11@verizon.net>

Chris,

The latest StandAloneBlast takes "dashed parameters", as in:

 my @params = (-database => 'swissprot', -outfile => 'blast1.out');
 my $factory = Bio::Tools::Run::StandAloneBlast->new(@params);

Or

 my $factory = Bio::Tools::Run::StandAloneBlast->new(-program =>"wublastp",
                                                     -database=>"swissprot",
                                                     -e => 1e-20);

So that's why I asked "what version?"

Someone made the change to allow dashes in @params a few months ago and I
believe that that someone was you!

Brian O.


On 12/7/06 1:44 PM, "Chris Fields" <cjfields at uiuc.edu> wrote:

> 
> On Dec 6, 2006, at 9:13 PM, zhihua li wrote:
> 
>> Hi netters,
>> 
>> Recently I found this:
>> 
>> For constructing a new SeqI object, I had to write:
>> $seq_obj=Bio::SeqIO->new(
>>      -file => '/home/myfile',
>>      -format => 'Fasta');              #Note the dash before the two arguments.
>> 
>> If I omitted the dash:
>> $seq_obj=Bio::SeqIO->new(
>>     file => '/home/myfile',
>>     format => 'Fasta');
>> I'd get error:
>> MSG: Unknown format given or could not determine it []
>> STACK Bio::SeqIO::new /usr/lib/perl5/site_perl/5.8.7/Bio/SeqIO.pm:377
>> 
>> So it seems to me that the dashes before the arguments are
>> essential.  However, when I tried to build a factory for
>> StandaloneBlast, I found the other way around.
>> 
>> If the script had the dash:
>> $blast_obj=Bio::Tools::Run::StandAloneBlast->new(
>>             -program => 'blastn',
>>             -database => '/home/mydatabase');
>> 
>> I'd get the error message: MSG: Unallowed parameter: - !
>> STACK Bio::Tools::Run::StandAloneBlast::AUTOLOAD /usr/lib/perl5/site_perl/5.8.7/Bio/Tools/Run/StandAloneBlast.pm:433
>> STACK Bio::Tools::Run::StandAloneBlast::new /usr/lib/perl5/site_perl/5.8.7/Bio/Tools/Run/StandAloneBlast.pm:400
>> 
>> If I left out the dash by saying:
>> $blast_obj=Bio::Tools::Run::StandAloneBlast->new(
>>             program => 'blastn',
>>             database => '/home/mydatabase');
>> 
>> Everything is fine.
>> 
>> Now I'm confused. Why do I sometimes have to add the dash, while other
>> times I'm not allowed to?
>> 
>> Thanks in advance!
> 
> I agree that this should be more consistent.  Does anyone know the
> reasoning for this?
> 
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
> 
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at uiuc.edu  Thu Dec  7 14:44:19 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 7 Dec 2006 13:44:19 -0600
Subject: [Bioperl-l] different syntaxes for SeqI constructor and Factory
	constructor?
In-Reply-To: <C19DD675.BD72%bosborne11@verizon.net>
References: <C19DD675.BD72%bosborne11@verizon.net>
Message-ID: <A12BC418-6400-46FC-8383-66E21D997E56@uiuc.edu>


On Dec 7, 2006, at 1:32 PM, Brian Osborne wrote:

> Chris,
>
> The latest StandAloneBlast takes "dashed parameters", as in:
>
>  @params = (-database => 'swissprot',-outfile => 'blast1.out');
>  $factory = Bio::Tools::Run::StandAloneBlast->new(@params);
>
> Or
>
>  my $factory = Bio::Tools::Run::StandAloneBlast->new(-program =>"wublastp",
>                                                      -database=>"swissprot",
>                                                      -e => 1e-20);
>
> So that's why I asked "what version?"
>
> Someone made the change to allow dashes in @params a few months ago  
> and I
> believe that that someone was you!
>
> Brian O.

Nope, I plead innocent (at least to this!).  I haven't made any  
commits to StandAloneBlast.  These were added in by Torsten (see  
commits 1.59, 1.60), so you'll need to blame/thank him...

http://tinyurl.com/y7ym9g

So they're now a bit more consistent.  That's not to say  
StandAloneBlast doesn't need some major revisions....
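For what it's worth, a constructor can accept both spellings by normalizing each key before looking it up. The sketch below is hypothetical plain Perl (normalize_args is an illustrative name, not a BioPerl method, and this is not how StandAloneBlast itself is written): it lowercases each key and drops a single leading dash.

```perl
use strict;
use warnings;

# Hypothetical helper: accept both "-program => 'blastn'" and
# "program => 'blastn'" by lowercasing each key and dropping one leading dash.
sub normalize_args {
    my (@pairs) = @_;
    die "Odd number of arguments\n" if @pairs % 2;
    my %args;
    while (my ($key, $value) = splice @pairs, 0, 2) {
        (my $clean = lc $key) =~ s/^-//;
        $args{$clean} = $value;
    }
    return \%args;
}

# Both calling styles yield the same normalized hash.
my $dashed   = normalize_args(-program => 'blastn', -database => '/home/mydatabase');
my $undashed = normalize_args(program  => 'blastn', database  => '/home/mydatabase');
print "$dashed->{program} $undashed->{database}\n";
```

Normalizing once at the top of new() keeps the rest of the constructor independent of which calling style the user chose.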

BTW, I didn't see a post from you asking about the version.

Chris


From akarger at CGR.Harvard.edu  Thu Dec  7 16:32:51 2006
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Thu, 7 Dec 2006 16:32:51 -0500
Subject: [Bioperl-l] Using frame info from GFF in getting a Seq->spliced_seq
Message-ID: <B9182BFF5B004245BABC12956EA6322E022BE901@huls5.nucleus.harvard.edu>

I need to know how to get the frame information in exon features
(created by Bio::Tools::GFF) into a whole-gene feature that will be
translated into a protein.

I'm reading in some fungal GFFs generated by Jason Stajich. I

- Use Bio::Tools::GFF to create a feature for each exon in a gene
- Create a Bio::Location::Split object containing each feature's
location
- Create a Bio::SeqFeature::Generic object whose location is the above
BL::Split
- Attach my contig Bio::Seq to the feature
- get the protein with feature->spliced_seq->translate->seq

(Code below)

Unfortunately, I get the wrong result when the GFF features have frame
!= 0. This happens for only a few percent of the exons, but when it
does, I end up translating in the wrong frame.

If I read the docs correctly, Location objects don't have a frame. So
how do I get the correct spliced_seq, which skips one or two bp at the
beginning of certain exons?

I suspect the answer to this is that I'm going about this in completely
the wrong way, in which case, please tell me how I ought to be doing it.

Thanks,
- Amir Karger
Research Computing
Life Sciences Division
Harvard University

P.S. In case you want to see actual code, here it is. After using
Bio::Tools::GFF to create a sorted list of features for each exon
(basically stolen from the module POD), I:
    # Create a new object representing the exons' gene
    my $coding_loc_obj = Bio::Location::Split->new;
    foreach my $exon (@sorted_exons) {
        $coding_loc_obj->add_sub_Location($exon->location);
    }

    # Build a spliced feature representing the whole gene
    my $spliced_feat = Bio::SeqFeature::Generic->new(
        -start  => $coding_loc_obj->start,
        -end    => $coding_loc_obj->end,
        -strand => $strand_num,
        -primary=> "splicedGene",
    );
    $spliced_feat->location($coding_loc_obj);

    # Attach a contig object containing the sequence
    $spliced_feat->attach_seq($contig_obj->bioperl_object);

    # Get the spliced seq and translate to protein:
    my $coding_seq = $spliced_feat->spliced_seq->seq;
    my $protein = $spliced_feat->spliced_seq->translate->seq;


From bix at sendu.me.uk  Thu Dec  7 17:45:32 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 7 Dec 2006 15:45:32 -0700
Subject: [Bioperl-l] [Bioperl-announce-l] Bioperl 1.5.2 Release
Message-ID: <000001c71a51$671a79d0$6400a8c0@CodonSolutions.local>

I am proud to announce the final release of Bioperl 1.5.2.

http://www.bioperl.org/wiki/Release_1.5.2

bioperl (core):
cpan>install S/SE/SENDU/bioperl-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-1.5.2_100.tar.bz2
http://bioperl.org/DIST/bioperl-1.5.2_100.zip

bioperl-run:
cpan>install S/SE/SENDU/bioperl-run-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-run-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-run-1.5.2_100.tar.bz2
http://bioperl.org/DIST/bioperl-run-1.5.2_100.zip

bioperl-db:
cpan>install S/SE/SENDU/bioperl-db-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-db-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-db-1.5.2_100.tar.bz2
http://bioperl.org/DIST/bioperl-db-1.5.2_100.zip

bioperl-network:
cpan>install S/SE/SENDU/bioperl-network-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-network-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-network-1.5.2_100.tar.bz2
http://bioperl.org/DIST/bioperl-network-1.5.2_100.zip

http://bioperl.org/DIST/SIGNATURES.md5

(all are also available via CVS, and for Windows users, using the Perl 
Package Manager - see the wiki for details)
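The announcement points to an MD5 signatures file for the downloads. A hedged sketch of checking a downloaded archive against it with the core Digest::MD5 module, assuming (not confirmed here) that SIGNATURES.md5 uses the common md5sum layout of "<32 hex digits>  <filename>" per line; verify_md5 is a hypothetical name:

```perl
use strict;
use warnings;
use Digest::MD5;

# Hypothetical helper. Assumes md5sum-style lines: "<32 hex digits>  <filename>".
# Returns 1 on match, 0 on mismatch, undef if the file is not listed.
sub verify_md5 {
    my ($path, $signatures_text) = @_;
    my ($name) = $path =~ m{([^/\\]+)\z};    # compare on the bare file name
    my %expected;
    for my $line (split /\n/, $signatures_text) {
        $expected{$2} = lc $1 if $line =~ /^([0-9a-fA-F]{32})\s+\*?(\S+)/;
    }
    return undef unless defined $name && exists $expected{$name};
    open my $fh, '<', $path or die "Cannot open $path: $!";
    binmode $fh;
    my $digest = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;
    return $digest eq $expected{$name} ? 1 : 0;
}

# Example (file names are placeholders):
# my $ok = verify_md5('bioperl-1.5.2_100.tar.gz', $contents_of_SIGNATURES_md5);
```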

The other bioperl packages (bioperl-ext, bioperl-gui, bioperl-pedigree 
and bioperl-pipeline) did not see a unified release for 1.5.2.


This release represents a developer release which has been thoroughly
tested. We consider it the most stable (in terms of bugs) version of 
Bioperl and believe it to be suitable for most people. It is marked 
'developer' or even 'unstable' because its API may change on short 
notice. It will also not be maintained or supported beyond the next 
bioperl release.

1.5.2 introduces the following new (core) features:

  * Taxonomy (Bio::Species) overhaul
  * Bio::Map improvements
  * Bio::SearchIO speedup
  * Build.PL installation

For details, and a complete change log, see the wiki.

API documentation is available here: http://doc.bioperl.org/


Acknowledgements:
Innumerable thanks are due for the tireless efforts of Christopher Fields
(bug fixing, testing, documentation, discussion), Nathan Haigh
(Windows & prerequisite issues, testing) and Mauricio Herrera Cuadra
(testing, documentation, support). Feedback and ideas provided by Hilmar 
Lapp, Jason Stajich, Torsten Seemann and others on the mailing list and 
elsewhere proved invaluable. None of this would have been possible 
without the behind-the-scenes work of the open-bio support team. I'd 
also like to acknowledge Andreas J. Koenig for his help with CPAN matters.

Finally, thank you to everyone who tried out the release candidates, and 
especially those that took the time to file bug reports or report problems.


Remember, Bioperl can only go from strength to strength with /your/ 
help. If you'd like to experience the fame and fortune that naturally 
follow becoming a Bioperl developer (?!), become one!
http://www.bioperl.org/wiki/Becoming_a_developer

On behalf of the Bioperl team,
Sendu Bala.
_______________________________________________
Bioperl-announce-l mailing list
Bioperl-announce-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-announce-l


From cjfields at uiuc.edu  Thu Dec  7 18:00:43 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 7 Dec 2006 16:00:43 -0700
Subject: [Bioperl-l] [Bioperl-announce-l]  Bioperl 1.5.2 Release
In-Reply-To: <457746B5.2020006@sendu.me.uk>
Message-ID: <000001c71a53$85cb4f10$6400a8c0@CodonSolutions.local>

Great job Sendu!  

A bit of icing on the cake: all the WinXP PPMs (core, db, network, run)
installed w/o a hitch following normal instructions using PPM4 (GUI and
command line shell) using clean ActiveState installations.  Looks like all
the correct prereqs were installed with shell (only XML::SAX::ExpatXS was
left out in the GUI installation for reasons outlined before).  

I'll run more tests tomorrow to see if tests pass with the installed bioperl
(this should catch any prereq issues with PPM installation we missed).

chris

> I am proud to announce the final release of Bioperl 1.5.2.
> 
> http://www.bioperl.org/wiki/Release_1.5.2
> 
> bioperl (core):
> cpan>install S/SE/SENDU/bioperl-1.5.2_100.tar.gz
> http://bioperl.org/DIST/bioperl-1.5.2_100.tar.gz
> http://bioperl.org/DIST/bioperl-1.5.2_100.tar.bz2
> http://bioperl.org/DIST/bioperl-1.5.2_100.zip
> 
> bioperl-run:
> cpan>install S/SE/SENDU/bioperl-run-1.5.2_100.tar.gz
> http://bioperl.org/DIST/bioperl-run-1.5.2_100.tar.gz
> http://bioperl.org/DIST/bioperl-run-1.5.2_100.tar.bz2
> http://bioperl.org/DIST/bioperl-run-1.5.2_100.zip
> 
> bioperl-db:
> cpan>install S/SE/SENDU/bioperl-db-1.5.2_100.tar.gz
> http://bioperl.org/DIST/bioperl-db-1.5.2_100.tar.gz
> http://bioperl.org/DIST/bioperl-db-1.5.2_100.tar.bz2
> http://bioperl.org/DIST/bioperl-db-1.5.2_100.zip
> 
> bioperl-network:
> cpan>install S/SE/SENDU/bioperl-network-1.5.2_100.tar.gz
> http://bioperl.org/DIST/bioperl-network-1.5.2_100.tar.gz
> http://bioperl.org/DIST/bioperl-network-1.5.2_100.tar.bz2
> http://bioperl.org/DIST/bioperl-network-1.5.2_100.zip
> 
> http://bioperl.org/DIST/SIGNATURES.md5
> 
> (all are also available via CVS, and for Windows users, using 
> the Perl Package Manager - see the wiki for details)
> 
> The other bioperl packages (bioperl-ext, bioperl-gui, 
> bioperl-pedigree and bioperl-pipeline) did not see a unified 
> release for 1.5.2.
> 
> 
> 
> This release represents a developer release which has been thoroughly
> tested. We consider it the most stable (in terms of bugs) version of 
> Bioperl and believe it to be suitable for most people. It is marked 
> 'developer' or even 'unstable' because its API may change on short 
> notice. It will also not be maintained or supported beyond the next 
> bioperl release.
> 
> 1.5.2 introduces the following new (core) features:
> 
>   * Taxonomy (Bio::Species) overhaul
>   * Bio::Map improvements
>   * Bio::SearchIO speedup
>   * Build.PL installation
> 
> For details, and a complete change log, see the wiki.
> 
> API documentation is available here: http://doc.bioperl.org/
> 
> 
> Acknowledgements:
> Innumerable thanks are due for the tireless efforts of Christopher Fields
> (bug fixing, testing, documentation, discussion), Nathan Haigh
> (Windows & prerequisite issues, testing) and Mauricio Herrera Cuadra
> (testing, documentation, support). Feedback and ideas provided by Hilmar
> Lapp, Jason Stajich, Torsten Seemann and others on the mailing list and
> elsewhere proved invaluable. None of this would have been possible
> without the behind-the-scenes work of the open-bio support team. I'd
> also like to acknowledge Andreas J. Koenig for his help with CPAN matters.
>
> Finally, thank you to everyone who tried out the release candidates, and
> especially those that took the time to file bug reports or report problems.
> 
> 
> Remember, Bioperl can only go from strength to strength with /your/ 
> help. If you'd like to experience the fame and fortune that naturally 
> follow becoming a Bioperl developer (?!), become one!
> http://www.bioperl.org/wiki/Becoming_a_developer
> 
> On behalf of the Bioperl team,
> Sendu Bala.




From kaboroev at sfu.ca  Thu Dec  7 17:26:35 2006
From: kaboroev at sfu.ca (Keith Anthony Boroevich)
Date: Thu, 07 Dec 2006 14:26:35 -0800
Subject: [Bioperl-l] Bio::Graphics xyplot
Message-ID: <4578951B.5050206@sfu.ca>

Hi everyone,

I'm attempting to add an xyplot of the phred quality scores to a
Bio::Graphics image, and cannot get it to work.
I have the panel with a track for both the scale and the DNA displaying
properly.  When I attempt to add the xyplot I just get a garbled track
of what looks like tiny xyplots for each datapoint.  I have the CVS version
(updated today) of bioperl-live running.  I think what I am missing is
the creation of a "Sequence Feature Group" to hold the individual points
of the plot.  However, I cannot seem to find such an object. This is
what I attempted:

-------BEGIN---CODE-----------
# start panel
my $panel = Bio::Graphics::Panel->new(-length    => $f_seqlen,
                      -width     => $f_seqlen*10,
                      -pad_left  => 10,
                      -pad_right => 10,
                      -grid      => 1
                      );
# add scale
$panel->add_track(arrow =>
Bio::SeqFeature::Generic->new(-start=>1,-end=>$f_seqlen),
              -double  => 1,
              -tick    => 2,
              -fgcolor => 'black');
# add DNA ($feature is of type Bio::SeqFeature::Annotated)
$panel->add_track(dna => $feature);
# get list of quality scores from database
my ($pqs_value) = $dbh->selectrow_array($sql);
my @pqs_value = split(/\s/,$pqs_value);
# create track
my $track =  $panel->add_track(-glyph        => 'xyplot',
                   -graph_type   => 'points',
                   -point_symbol => 'point',
                   -max_score    => 100,
                   -min_score    => 0,
                   -scale        => 'none');
# add one point "subfeature" per base to the track (feature coordinates are 1-based)
for (my $i=0;$i<$f_seqlen;$i++) {
    $track->add_feature(
        Bio::SeqFeature::Generic->new(-start=>$i+1,-end=>$i+1,-score=>$pqs_value[$i])
    );
}
print $panel->png();
$panel->finished;
------END---CODE----------

I also attempted to create an array of the point features and passed
that by reference to the panel "add_track" as it describes in the xyplot
documentation, but that resulted in the exact same image.
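Whatever the fix for the track grouping turns out to be, the per-base points themselves need 1-based positions (BioPerl feature coordinates start at 1), and splitting on a single whitespace character yields empty fields when scores are separated by more than one space. A small, hypothetical plain-Perl helper (quality_points is an illustrative name) that prepares the (position, score) pairs:

```perl
use strict;
use warnings;

# Hypothetical helper: turn a whitespace-separated string of phred scores
# into [position, score] pairs with 1-based positions.
sub quality_points {
    my ($qual_string) = @_;
    my @scores = split ' ', $qual_string;   # ' ' splits on runs of whitespace
    return map { [ $_ + 1, $scores[$_] ] } 0 .. $#scores;
}

for my $point (quality_points(" 20 30  40\n")) {
    printf "%d\t%d\n", @$point;
}
```

Each pair could then become a Bio::SeqFeature::Generic with -start and -end set to the position and -score set to the score, as in the code above.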

keith

-- 
 ><)))?> -cGRASP- <?(((><
 Keith Anthony Boroevich
 Davidson Lab
 Dept of Molecular Biology
 Simon Fraser University
 Tel: 604-268-7276


From arareko at campus.iztacala.unam.mx  Thu Dec  7 18:15:53 2006
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Thu, 7 Dec 2006 16:15:53 -0700
Subject: [Bioperl-l] [Bioperl-announce-l]  Bioperl 1.5.2 Release
In-Reply-To: <457746B5.2020006@sendu.me.uk>
References: <457746B5.2020006@sendu.me.uk>
Message-ID: <000001c71a55$a479da60$6400a8c0@CodonSolutions.local>

This has been a great effort. Congrats and thanks to everyone involved!

Mauricio.

Sendu Bala wrote:
> I am proud to announce the final release of Bioperl 1.5.2.
> 
> http://www.bioperl.org/wiki/Release_1.5.2
> 
> bioperl (core):
> cpan>install S/SE/SENDU/bioperl-1.5.2_100.tar.gz
> http://bioperl.org/DIST/bioperl-1.5.2_100.tar.gz
> http://bioperl.org/DIST/bioperl-1.5.2_100.tar.bz2
> http://bioperl.org/DIST/bioperl-1.5.2_100.zip
> 
> bioperl-run:
> cpan>install S/SE/SENDU/bioperl-run-1.5.2_100.tar.gz
> http://bioperl.org/DIST/bioperl-run-1.5.2_100.tar.gz
> http://bioperl.org/DIST/bioperl-run-1.5.2_100.tar.bz2
> http://bioperl.org/DIST/bioperl-run-1.5.2_100.zip
> 
> bioperl-db:
> cpan>install S/SE/SENDU/bioperl-db-1.5.2_100.tar.gz
> http://bioperl.org/DIST/bioperl-db-1.5.2_100.tar.gz
> http://bioperl.org/DIST/bioperl-db-1.5.2_100.tar.bz2
> http://bioperl.org/DIST/bioperl-db-1.5.2_100.zip
> 
> bioperl-network:
> cpan>install S/SE/SENDU/bioperl-network-1.5.2_100.tar.gz
> http://bioperl.org/DIST/bioperl-network-1.5.2_100.tar.gz
> http://bioperl.org/DIST/bioperl-network-1.5.2_100.tar.bz2
> http://bioperl.org/DIST/bioperl-network-1.5.2_100.zip
> 
> http://bioperl.org/DIST/SIGNATURES.md5
> 
> (all are also available via CVS, and for Windows users, using the Perl 
> Package Manager - see the wiki for details)
> 
> The other bioperl packages (bioperl-ext, bioperl-gui, bioperl-pedigree 
> and bioperl-pipeline) did not see a unified release for 1.5.2.
> 
> 
> 
> This release represents a developer release which has been thoroughly
> tested. We consider it the most stable (in terms of bugs) version of 
> Bioperl and believe it to be suitable for most people. It is marked 
> 'developer' or even 'unstable' because its API may change on short 
> notice. It will also not be maintained or supported beyond the next 
> bioperl release.
> 
> 1.5.2 introduces the following new (core) features:
> 
>   * Taxonomy (Bio::Species) overhaul
>   * Bio::Map improvements
>   * Bio::SearchIO speedup
>   * Build.PL installation
> 
> For details, and a complete change log, see the wiki.
> 
> API documentation is available here: http://doc.bioperl.org/
> 
> 
> Acknowledgements:
> Enumerable thanks are due for the tireless efforts of Christopher Fields 
> (bug fixing, testing, documentation, discussion), Nathan Haigh 
> (Windows&pre-requisite issues, testing) and Mauricio Herrera Cuadra 
> (testing, documentation, support). Feedback and ideas provided by Hilmar 
> Lapp, Jason Stajich, Torsten Seemann and others on the mailing list and 
> elsewhere proved invaluable. None of this would have been possible 
> without the behind-the-scenes work of the open-bio support team. I'd 
> also like to acknowledge Andreas J. Koenig for his help with CPAN matters.
> 
> Finally, thank you to everyone who tried out the release candidates, and 
> especially those that took the time to file bug reports or report problems.
> 
> 
> Remember, Bioperl can only go from strength to strength with /your/ 
> help. If you'd like to experience the fame and fortune that naturally 
> follow becoming a Bioperl developer (?!), become one!
> http://www.bioperl.org/wiki/Becoming_a_developer
> 
> On behalf of the Bioperl team,
> Sendu Bala.
> 

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM



From cain at cshl.edu  Thu Dec  7 17:46:09 2006
From: cain at cshl.edu (Scott Cain)
Date: Thu, 07 Dec 2006 17:46:09 -0500
Subject: [Bioperl-l] Using frame info from GFF in getting
	a	Seq->spliced_seq
In-Reply-To: <B9182BFF5B004245BABC12956EA6322E022BE901@huls5.nucleus.harvard.edu>
References: <B9182BFF5B004245BABC12956EA6322E022BE901@huls5.nucleus.harvard.edu>
Message-ID: <1165531569.2569.49.camel@localhost.localdomain>

Amir,

I don't know for sure what the problem is, but here is one possibility:
the number in column 8 of a GFF file is not the frame, it is the phase.
See the GFF3 spec for a description of what the phase is:

  http://www.sequenceontology.org/gff3.shtml

(It doesn't matter if you are using GFF3 or GFF2, as the phase is the
same in both).
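The GFF3 specification defines phase as the number of bases that should be removed from the beginning of a CDS feature to reach the first base of the next codon. A minimal plain-Perl sketch of applying that to a segment's sequence, assuming the sequence is already oriented 5' to 3' along the coding strand; trim_by_phase is a hypothetical name, not a BioPerl method:

```perl
use strict;
use warnings;

# Drop the leading bases indicated by a CDS feature's GFF phase (column 8)
# so that the returned string starts on a codon boundary. Assumes $seq is
# already oriented 5' to 3' in the coding direction.
sub trim_by_phase {
    my ($seq, $phase) = @_;
    die "GFF phase must be 0, 1 or 2\n"
        unless defined $phase && $phase =~ /^[012]\z/;
    return substr($seq, $phase);
}

print trim_by_phase('GCATGAAA', 2), "\n";
```

Non-CDS features carry "." in column 8, which this sketch deliberately rejects.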

Scott


On Thu, 2006-12-07 at 16:32 -0500, Amir Karger wrote:
> I need to know how to get the frame information in exon features
> (created by Bio::Tools::GFF) into a whole-gene feature that will be
> translated into a protein.
> 
> I'm reading in some fungal GFFs generated by Jason Stajich. I
> 
> - Use Bio::Tools::GFF to create a feature for each exon in a gene
> - Create a Bio::Location::Split object containing each feature's
> location
> - Create a Bio::SeqFeature::Generic object whose location is the above
> BL::Split
> - Attach my contig Bio::Seq to the feature
> - get the protein with feature->spliced_seq->translate->seq
> 
> (Code below)
> 
> Unfortunately, I get the wrong result when the GFF features have frame
> != 0. This happens for only a few percent of the exons, but when it
> does, I end up translating in the wrong frame.
> 
> If I read the docs correctly, Location objects don't have a frame. So
> how do I get the correct spliced_seq, which skips one or two bp at the
> beginning of certain exons?
> 
> I suspect the answer to this is that I'm going about this in completely
> the wrong way, in which case, please tell me how I ought to be doing it.
> 
> Thanks,
> - Amir Karger
> Research Computing
> Life Sciences Division
> Harvard University
> 
> P.S. In case you want to see actual code, here it is. After using
> Bio::Tools::GFF to create a sorted list of features for each exon
> (basically stolen from the module POD), I:
>     # Create a new object representing the exons' gene
>     my $coding_loc_obj = new Bio::Location::Split;
>     foreach my $exon (@sorted_exons) {
>         $coding_loc_obj->add_sub_Location($exon->location);
>     }
> 
>     # Build a spliced feature representing the whole gene
>     my $spliced_feat = new Bio::SeqFeature::Generic(
>         -start  => $coding_loc_obj->start,
>         -end    => $coding_loc_obj->end,
>         -strand => $strand_num,
>         -primary=> "splicedGene",
>     );
>     $spliced_feat->location($coding_loc_obj);
> 
>     # Attach a contig object containing the sequence
>     $spliced_feat->attach_seq($contig_obj->bioperl_object);
> 
>     # Get the spliced seq and translate to protein:
>     my $coding_seq = $spliced_feat->spliced_seq->seq;
>     my $protein = $spliced_feat->spliced_seq->translate->seq;
> 
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From cjfields at uiuc.edu  Thu Dec  7 21:52:47 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 7 Dec 2006 20:52:47 -0600
Subject: [Bioperl-l] Using frame info from GFF in
	getting a Seq->spliced_seq
In-Reply-To: <1165531569.2569.49.camel@localhost.localdomain>
Message-ID: <002d01c71a73$f16ecc40$15327e82@pyrimidine>

Another issue is that splittype() is not defined, though I don't think that
would break anything as currently implemented.  However, one thing we have
briefly discussed is having Bio::Location::Split objects exhibit
different (but expected) behaviors based on their splittype() (order, join,
or bond).  It's one of the things I want to work out for the next release.

If Scott's fix doesn't work and the problem persists, you should file a bug
report with some sample data for us to test out.

chris

> Amir,
> 
> I don't know for sure what the problem is, but here is one possibility:
> the number in column 8 of a GFF file is not the frame, it is the phase.
> See the GFF3 spec for a description of what the phase is:
> 
>   http://www.sequenceontology.org/gff3.shtml
> 
> (It doesn't matter if you are using GFF3 or GFF2, as the 
> phase is the same in both).
> 
> Scott
> 
> 
> On Thu, 2006-12-07 at 16:32 -0500, Amir Karger wrote:
> > I need to know how to get the frame information in exon features 
> > (created by Bio::Tools::GFF) into a whole-gene feature that will be 
> > translated into a protein.
> > 
> > I'm reading in some fungal GFFs generated by Jason Stajich. I
> > 
> > - Use Bio::Tools::GFF to create a feature for each exon in a gene
> > - Create a Bio::Location::Split object containing each feature's 
> > location
> > - Create a Bio::SeqFeature::Generic object whose location is the above
> > BL::Split
> > - Attach my contig Bio::Seq to the feature
> > - get the protein with feature->spliced_seq->translate->seq
> > 
> > (Code below)
> > 
> > Unfortunately, I get the wrong result when the GFF features have frame
> > != 0. This happens for only a few percent of the exons, but when it
> > does, I end up translating in the wrong frame.
> > 
> > If I read the docs correctly, Location objects don't have a frame. So
> > how do I get the correct spliced_seq, which skips one or two bp at the
> > beginning of certain exons?
> > 
> > I suspect the answer to this is that I'm going about this in completely
> > the wrong way, in which case, please tell me how I ought to be doing it.
> > 
> > Thanks,
> > - Amir Karger
> > Research Computing
> > Life Sciences Division
> > Harvard University
> > 
> > P.S. In case you want to see actual code, here it is. After using 
> > Bio::Tools::GFF to create a sorted list of features for each exon 
> > (basically stolen from the module POD), I:
> >     # Create a new object representing the exons' gene
> >     my $coding_loc_obj = new Bio::Location::Split;
> >     foreach my $exon (@sorted_exons) {
> >         $coding_loc_obj->add_sub_Location($exon->location);
> >     }
> > 
> >     # Build a spliced feature representing the whole gene
> >     my $spliced_feat = new Bio::SeqFeature::Generic(
> >         -start  => $coding_loc_obj->start,
> >         -end    => $coding_loc_obj->end,
> >         -strand => $strand_num,
> >         -primary=> "splicedGene",
> >     );
> >     $spliced_feat->location($coding_loc_obj);
> > 
> >     # Attach a contig object containing the sequence
> >     $spliced_feat->attach_seq($contig_obj->bioperl_object);
> > 
> >     # Get the spliced seq and translate to protein:
> >     my $coding_seq = $spliced_feat->spliced_seq->seq;
> >     my $protein = $spliced_feat->spliced_seq->translate->seq;


From jason at bioperl.org  Thu Dec  7 21:01:33 2006
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 7 Dec 2006 18:01:33 -0800
Subject: [Bioperl-l] Using frame info from GFF in getting a
	Seq->spliced_seq
In-Reply-To: <B9182BFF5B004245BABC12956EA6322E022BE901@huls5.nucleus.harvard.edu>
References: <B9182BFF5B004245BABC12956EA6322E022BE901@huls5.nucleus.harvard.edu>
Message-ID: <866F6CEE-62BB-4880-9B13-6DDE29EAF94E@bioperl.org>

This was a problem in the gene prediction output, I suspect; more
recent versions of the program should have fixed it.  I do not
currently have free time to deal with the errors in the small number
of ORFs where this has happened.

I think you just need to do
  start -= start- (frame*strand)
for 1st exons.

You can also probably provide the 1st exon's frame to the translate  
function as another possibility but you should try and get the CDS  
correct first depending on your downstream analyses.
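Jason's suggestion amounts to moving the 5' end of the first exon inward by its frame value. Assuming column 8 holds the GFF3 phase (the number of bases to skip before the first complete codon, as Scott notes) and that coordinates are 1-based and inclusive, that adjustment could be sketched as the hypothetical helper below. It is one interpretation of the advice, not Jason's exact formula.

```perl
use strict;
use warnings;

# Hypothetical helper: trim the 5' end of the first CDS segment by its GFF
# phase. On the plus strand the 5' end is the start coordinate; on the
# minus strand it is the end coordinate.
sub adjust_first_exon {
    my ($start, $end, $strand, $phase) = @_;
    # Phase 0, '.' or undef: nothing to skip.
    return ($start, $end) unless defined $phase && $phase =~ /^[12]\z/;
    if ($strand >= 0) {
        $start += $phase;
    }
    else {
        $end -= $phase;
    }
    return ($start, $end);
}

my ($s, $e) = adjust_first_exon(1001, 1200, -1, 2);
print "$s..$e\n";
```

Only the first segment needs this; once the CDS starts on a codon boundary, the later exons' phases follow from the spliced length.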

-jason
On Dec 7, 2006, at 1:32 PM, Amir Karger wrote:

> I need to know how to get the frame information in exon features
> (created by Bio::Tools::GFF) into a whole-gene feature that will be
> translated into a protein.
>
> I'm reading in some fungal GFFs generated by Jason Stajich. I
>
> - Use Bio::Tools::GFF to create a feature for each exon in a gene
> - Create a Bio::Location::Split object containing each feature's
> location
> - Create a Bio::SeqFeature::Generic object whose location is the above
> BL::Split
> - Attach my contig Bio::Seq to the feature
> - get the protein with feature->spliced_seq->translate->seq
>
> (Code below)
>
> Unfortunately, I get the wrong result when the GFF features have frame
> != 0. This happens for only a few percent of the exons, but when it
> does, I end up translating in the wrong frame.
>
> If I read the docs correctly, Location objects don't have a frame. So
> how do I get the correct spliced_seq, which skips one or two bp at the
> beginning of certain exons?
>
> I suspect the answer to this is that I'm going about this in  
> completely
> the wrong way, in which case, please tell me how I ought to be  
> doing it.
>
> Thanks,
> - Amir Karger
> Research Computing
> Life Sciences Division
> Harvard University
>
> P.S. In case you want to see actual code, here it is. After using
> Bio::Tools::GFF to create a sorted list of features for each exon
> (basically stolen from the module POD), I:
>     # Create a new object representing the exons' gene
>     my $coding_loc_obj = new Bio::Location::Split;
>     foreach my $exon (@sorted_exons) {
>         $coding_loc_obj->add_sub_Location($exon->location);
>     }
>
>     # Build a spliced feature representing the whole gene
>     my $spliced_feat = new Bio::SeqFeature::Generic(
>         -start  => $coding_loc_obj->start,
>         -end    => $coding_loc_obj->end,
>         -strand => $strand_num,
>         -primary=> "splicedGene",
>     );
>     $spliced_feat->location($coding_loc_obj);
>
>     # Attach a contig object containing the sequence
>     $spliced_feat->attach_seq($contig_obj->bioperl_object);
>
>     # Get the spliced seq and translate to protein:
>     my $coding_seq = $spliced_feat->spliced_seq->seq;
>     my $protein = $spliced_feat->spliced_seq->translate->seq;


From neetisomaiya at gmail.com  Fri Dec  8 05:21:50 2006
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Fri, 8 Dec 2006 15:51:50 +0530
Subject: [Bioperl-l] need help with phrap parser
Message-ID: <764978cf0612080221o709514a1rf5f97054c5eabb51@mail.gmail.com>

Can anyone point me to a Phrap parser that parses the ACE file to extract
which reads make up each contig (e.g. read_a and read_b make contig1; read_d,
read_e and read_z make contig2), along with other information about the reads
(such as whether each read is complemented with respect to the contig, and
which region of the contig each read contributes)? Basically, I need the AF
and BS lines of the ACE output.

-- 
-Neeti
Even my blood says, B positive


From pmiguel at purdue.edu  Fri Dec  8 09:17:02 2006
From: pmiguel at purdue.edu (Phillip San Miguel)
Date: Fri, 08 Dec 2006 09:17:02 -0500
Subject: [Bioperl-l] need help with phrap parser
In-Reply-To: <764978cf0612080221o709514a1rf5f97054c5eabb51@mail.gmail.com>
References: <764978cf0612080221o709514a1rf5f97054c5eabb51@mail.gmail.com>
Message-ID: <457973DE.6050900@purdue.edu>

neeti somaiya wrote:
> Can anyone point me to a Phrap parser that parses the ACE file to extract
> which reads make up each contig (e.g. read_a and read_b make contig1; read_d,
> read_e and read_z make contig2), along with other information about the reads
> (such as whether each read is complemented with respect to the contig, and
> which region of the contig each read contributes)? Basically, I need the AF
> and BS lines of the ACE output.
>
>   
neeti,

    To find the reads that went into each contig, you do *not* want the BS tagged records. My understanding is that BS is just what consed uses to populate its consensus line from the ace file. 
I write this because of an email David Gordon sent me in 2001, included here
without his permission:


> > Phrap writes BS lines which
> > indicate, for each consensus position, which read phrap uses at that
> > position to become the consensus.  These BS ("base segments") are 
> > manipulated by Consed when there are changes to the assembly, such as
> > joins, tears, removing reads, or changing the consensus.
>   
    The simplest way is:

egrep '^(CO|AF|RD) ' acefilename

if you are on a unix system. Or with perl:

while (<>) {
    print if /^(?:CO|AF|RD)\s/;
}

But then you would need to parse the fields of interest. You get the 
position/strand in the contig from AF, then you get the length of the 
read from RD.
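As a rough illustration of that field parsing, here is a minimal pure-Perl sketch. The parse_ace name is hypothetical and this is not Bio::Assembly::IO::ace. It assumes the usual consed/phrap ACE line layouts "CO <contig> <bases> <reads> <segments> <U|C>", "AF <read> <U|C> <padded start>" and "RD <read> <padded bases> <info items> <tags>", and records each read's orientation, start and padded length per contig.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: collect the reads of each contig from an ACE file.
# Returns { contig => { read => { complemented, start, length } } }.
sub parse_ace {
    my ($fh) = @_;
    my (%contigs, $contig);
    while (my $line = <$fh>) {
        if ($line =~ /^CO\s+(\S+)/) {
            # Start of a new contig section
            $contig = $1;
            $contigs{$contig} = {};
        }
        elsif (defined $contig && $line =~ /^AF\s+(\S+)\s+([UC])\s+(-?\d+)/) {
            # AF: orientation (C = complemented) and padded start in the contig;
            # the start may be negative if the read overhangs the contig's left end
            $contigs{$contig}{$1}{complemented} = $2 eq 'C' ? 1 : 0;
            $contigs{$contig}{$1}{start}        = $3;
        }
        elsif (defined $contig && $line =~ /^RD\s+(\S+)\s+(\d+)/) {
            # RD: number of padded bases in the read
            $contigs{$contig}{$1}{length} = $2;
        }
    }
    return \%contigs;
}

# Usage: perl parse_ace.pl assembly.ace
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!\n";
    my $contigs = parse_ace($fh);
    for my $contig (sort keys %$contigs) {
        for my $read (sort keys %{ $contigs->{$contig} }) {
            my $r = $contigs->{$contig}{$read};
            printf "%s\t%s\t%s\tstart=%d\tlength=%d\n",
                $contig, $read, ($r->{complemented} ? 'C' : 'U'),
                $r->{start} // 0, $r->{length} // 0;
        }
    }
}
```

The anchored regexes (^CO, ^AF, ^RD followed by whitespace) avoid matching sequence or BS lines; the read's end in the contig is then roughly start + length - 1 in padded coordinates.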

There does seem to be a part of bioperl meant to perform this task
(Bio::Assembly::IO::ace), but it looks like it was started and never
completed.


From cjfields at uiuc.edu  Fri Dec  8 10:17:31 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 8 Dec 2006 09:17:31 -0600
Subject: [Bioperl-l] NAR Database Issue Papers
Message-ID: <000601c71adb$fdd60490$15327e82@pyrimidine>

For those interested, the Nucleic Acids Research Database issue papers have
been popping up in the Advance Access section of the NAR website:

http://nar.oxfordjournals.org/papbyrecent.dtl

Ensembl, UCSC Browser, Entrez Gene, and a number of others of possible interest are
represented.  Of particular note are a few mentions of formatting changes to
UniProt, EMBL, and other records, which should be taken care of in the
latest BioPerl release (fingers crossed!).

chris


From cjfields at uiuc.edu  Fri Dec  8 10:31:19 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 8 Dec 2006 09:31:19 -0600
Subject: [Bioperl-l] need help with phrap parser
In-Reply-To: <457973DE.6050900@purdue.edu>
Message-ID: <000001c71add$ec7147d0$15327e82@pyrimidine>

...
> But then you would need to parse the fields of interest. You get the 
> position/strand in the contig from AF, then you get the length of the 
> read from RD.
> 
> There does look like there is a part of bioperl that meant to perform 
> this task--including Bio::Assembly::IO::ace but it looks like it was 
> started, but never completed.

...and if anyone wants to chip in and work on it, let us know!   The various
Bio::Assembly modules are one of many areas that needs some updating.

chris


From akarger at CGR.Harvard.edu  Fri Dec  8 13:25:47 2006
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Fri, 8 Dec 2006 13:25:47 -0500
Subject: [Bioperl-l] Using frame info from GFF in getting a
	Seq->spliced_seq
Message-ID: <B9182BFF5B004245BABC12956EA6322E022BEA6D@huls5.nucleus.harvard.edu>

> This was a problem in the gene prediction output I suspect, more  
> recent versions of the program should have fixed this.  I do not  
> currently have free time to deal with the errors in the small number  
> of ORFs where this has happened.
> 
> I think you just need to do
>   start -= start- (frame*strand)
> for 1st exons.

I used
    if ($exon->strand == 1) { $exon->start($exon->start + $exon->frame) }
    else                    { $exon->end($exon->end - $exon->frame) }

This took me from 90 translations that had * within the sequence to just
9, out of 5500 CDS in S. bayanus.

> You can also probably provide the 1st exon's frame to the translate  
> function as another possibility but you should try and get the CDS  
> correct first depending on your downstream analyses.

Yes, I think. Scott Cain pointed out that GFF column 8 is the "phase",
which I had never heard of before. My current, very limited,
understanding is that sometimes you'll have an exon with, say, 31 bp,
followed by an exon with 29 bp. When the intron gets spliced out, you
eventually get an mRNA of 60 bp, which translates to a protein of 20 aa.
But the second exon has a phase of 1, not 0, because you can't just
start translating at the first bp of the second exon and expect to get
nice amino acids.
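Under the GFF3 definition, the phase is the number of bases to remove from the beginning of a CDS feature to reach the first base of the next codon, so each exon's phase follows from the lengths of the CDS segments before it. A minimal pure-Perl sketch, assuming the segments are listed in transcription order and the phase of the first one is known (cds_phases is a hypothetical name). By this definition the 31 bp / 29 bp example gives the second exon a phase of 2: one base is left over from the first exon, and two more are needed to complete that codon.

```perl
use strict;
use warnings;

# Compute GFF3-style phases for consecutive CDS segments.
# $first_phase: phase of the first segment (0, 1 or 2).
# @lengths: segment lengths in bp, in transcription order.
# Returns one phase per segment.
sub cds_phases {
    my ($first_phase, @lengths) = @_;
    my @phases = ($first_phase);
    for my $i (0 .. $#lengths - 1) {
        # Bases of segment $i left over after its complete codons
        my $leftover = ($lengths[$i] - $phases[$i]) % 3;
        # The next segment must first supply the bases that finish that codon
        push @phases, (3 - $leftover) % 3;
    }
    return @phases;
}

my @phases = cds_phases(0, 31, 29);
print "@phases\n";   # prints "0 2"
```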

By the way, whether or not phase is the same thing as frame, when I call
the frame() method on the features created by Bio::Tools::GFF, I get the
phase info. I assume that's a feature (no pun intended), not a bug?

I'm still confused as to why you would have a phase in the first exon,
though. Why not just say the CDS starts 1 or 2 bp later? (This is
probably a bio question, not a bioperl question, but a quick Google
didn't get me an answer. "Phase" isn't a very good search term.)

I guess the real question here, which Jason alludes to, is whether
SeqFeature->spliced_seq ought to take into account the phase information
of the first exon. Right now, it doesn't, so when you call
SeqFeature->spliced_seq->translate, you get gibberish. Are there cases
where you would want spliced_seq to include the first bp or two? Should
there be an option to spliced_seq for whether you want to take phase
information into account?

I can't submit a bug report until we confirm it's a bug.

Thanks,
-Amir Karger

> -jason
> On Dec 7, 2006, at 1:32 PM, Amir Karger wrote:
> 
> > I need to know how to get the frame information in exon features
> > (created by Bio::Tools::GFF) into a whole-gene feature that will be
> > translated into a protein.
> >
> > I'm reading in some fungal GFFs generated by Jason Stajich. I
> >
> > - Use Bio::Tools::GFF to create a feature for each exon in a gene
> > - Create a Bio::Location::Split object containing each feature's
> > location
> > - Create a Bio::SeqFeature::Generic object whose location 
> is the above
> > BL::Split
> > - Attach my contig Bio::Seq to the feature
> > - get the protein with feature->spliced_seq->translate->seq
> >
> > (Code below)
> >
> > Unfortunately, I get the wrong result when the GFF features 
> have frame
> > != 0. This happens for only a few percent of the exons, but when it
> > does, I end up translating in the wrong frame.
> >
> > If I read the docs correctly, Location objects don't have a 
> frame. So
> > how do I get the correct spliced_seq, which skips one or 
> two bp at the
> > beginning of certain exons?
> >
> > I suspect the answer to this is that I'm going about this in  
> > completely
> > the wrong way, in which case, please tell me how I ought to be  
> > doing it.
> >
> > Thanks,
> > - Amir Karger
> > Research Computing
> > Life Sciences Division
> > Harvard University
> >
> > P.S. In case you want to see actual code, here it is. After using
> > Bio::Tools::GFF to create a sorted list of features for each exon
> > (basically stolen from the module POD), I:
> >     # Create a new object representing the exons' gene
> >     my $coding_loc_obj = new Bio::Location::Split;
> >     foreach my $exon (@sorted_exons) {
> >         $coding_loc_obj->add_sub_Location($exon->location);
> >     }
> >
> >     # Build a spliced feature representing the whole gene
> >     my $spliced_feat = new Bio::SeqFeature::Generic(
> >         -start  => $coding_loc_obj->start,
> >         -end    => $coding_loc_obj->end,
> >         -strand => $strand_num,
> >         -primary=> "splicedGene",
> >     );
> >     $spliced_feat->location($coding_loc_obj);
> >
> >     # Attach a contig object containing the sequence
> >     $spliced_feat->attach_seq($contig_obj->bioperl_object);
> >
> >     # Get the spliced seq and translate to protein:
> >     my $coding_seq = $spliced_feat->spliced_seq->seq;
> >     my $protein = $spliced_feat->spliced_seq->translate->seq;
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> 


From akarger at CGR.Harvard.edu  Fri Dec  8 13:33:09 2006
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Fri, 8 Dec 2006 13:33:09 -0500
Subject: [Bioperl-l] Using frame info from GFF in getting a
	Seq->spliced_seq
Message-ID: <B9182BFF5B004245BABC12956EA6322E022BEA71@huls5.nucleus.harvard.edu>

> Another issue is the splittype() is not defined, though I 
> don't think that
> would kill anything as currently implemented.  However, one 
> thing we have
> passingly discussed is having Bio::Location::Split objects 
> possibly exhibit
> different (but expected) behaviors based upon the splittype() 
> (order, join,
> or bond).  It's one of the things I want to work out for the 
> next release.

Should I be writing -splittype => "JOIN" or some such in my new()?

-Amir Karger

> 
> chris
> 
> > Amir,
> > 
> > I don't know for sure what the problem is, but here is one 
> > possibility:
> > the number in column 8 of a GFF file is not the frame, it is 
> > the phase.
> > See the GFF3 spec for a description of what the phase is:
> > 
> >   http://www.sequenceontology.org/gff3.shtml
> > 
> > (It doesn't matter if you are using GFF3 or GFF2, as the 
> > phase is the same in both).
> > 
> > Scott
> > 
> > 
> > On Thu, 2006-12-07 at 16:32 -0500, Amir Karger wrote:
> > > I need to know how to get the frame information in exon features 
> > > (created by Bio::Tools::GFF) into a whole-gene feature 
> that will be 
> > > translated into a protein.
> > > 
> > > I'm reading in some fungal GFFs generated by Jason Stajich. I
> > > 
> > > - Use Bio::Tools::GFF to create a feature for each exon in a gene
> > > - Create a Bio::Location::Split object containing each feature's 
> > > location
> > > - Create a Bio::SeqFeature::Generic object whose location 
> > is the above 
> > > BL::Split
> > > - Attach my contig Bio::Seq to the feature
> > > - get the protein with feature->spliced_seq->translate->seq
> > > 
> > > (Code below)
> > > 
> > > Unfortunately, I get the wrong result when the GFF features 
> > have frame 
> > > != 0. This happens for only a few percent of the exons, 
> but when it 
> > > does, I end up translating in the wrong frame.
> > > 
> > > If I read the docs correctly, Location objects don't have a 
> > frame. So 
> > > how do I get the correct spliced_seq, which skips one or 
> > two bp at the 
> > > beginning of certain exons?
> > > 
> > > I suspect the answer to this is that I'm going about this in 
> > > completely the wrong way, in which case, please tell me how 
> > I ought to be doing it.
> > > 
> > > Thanks,
> > > - Amir Karger
> > > Research Computing
> > > Life Sciences Division
> > > Harvard University
> > > 
> > > P.S. In case you want to see actual code, here it is. After using 
> > > Bio::Tools::GFF to create a sorted list of features for each exon 
> > > (basically stolen from the module POD), I:
> > >     # Create a new object representing the exons' gene
> > >     my $coding_loc_obj = new Bio::Location::Split;
> > >     foreach my $exon (@sorted_exons) {
> > >         $coding_loc_obj->add_sub_Location($exon->location);
> > >     }
> > > 
> > >     # Build a spliced feature representing the whole gene
> > >     my $spliced_feat = new Bio::SeqFeature::Generic(
> > >         -start  => $coding_loc_obj->start,
> > >         -end    => $coding_loc_obj->end,
> > >         -strand => $strand_num,
> > >         -primary=> "splicedGene",
> > >     );
> > >     $spliced_feat->location($coding_loc_obj);
> > > 
> > >     # Attach a contig object containing the sequence
> > >     $spliced_feat->attach_seq($contig_obj->bioperl_object);
> > > 
> > >     # Get the spliced seq and translate to protein:
> > >     my $coding_seq = $spliced_feat->spliced_seq->seq;
> > >     my $protein = $spliced_feat->spliced_seq->translate->seq;
> 
> 
> 
> 


From cjfields at uiuc.edu  Fri Dec  8 14:04:55 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 8 Dec 2006 13:04:55 -0600
Subject: [Bioperl-l] Using frame info from GFF in getting a
	Seq->spliced_seq
In-Reply-To: <B9182BFF5B004245BABC12956EA6322E022BEA71@huls5.nucleus.harvard.edu>
Message-ID: <000901c71afb$bf504210$15327e82@pyrimidine>


> > Another issue is the splittype() is not defined, though I 
> don't think 
> > that would kill anything as currently implemented.  
> However, one thing 
> > we have passingly discussed is having Bio::Location::Split objects 
> > possibly exhibit different (but expected) behaviors based upon the 
> > splittype() (order, join, or bond).  It's one of the things 
> I want to 
> > work out for the next release.
> 
> Should I be writing -splittype => "JOIN" or some such in my new()?
> 
> -Amir Karger

I missed, when looking at the constructor in Location::Split, that 'JOIN'
is the default splittype(), so you don't actually have to set it
explicitly; apologies for that.

If we make any changes that affect how Location::Split behaves, we'll likely
leave the default splittype() as 'JOIN', as it's by far the most commonly
used operator.

chris


From cjfields at uiuc.edu  Fri Dec  8 15:03:16 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 8 Dec 2006 14:03:16 -0600
Subject: [Bioperl-l] Using frame info from GFF in getting a
	Seq->spliced_seq
In-Reply-To: <B9182BFF5B004245BABC12956EA6322E022BEA6D@huls5.nucleus.harvard.edu>
Message-ID: <000001c71b03$e6741e90$15327e82@pyrimidine>

> Yes, I think. Scott Cain pointed out that GFF column 8 is the 
> "phase", which I had never heard of before. My current, very 
> limited, understanding is that sometimes you'll have an exon 
> with, say, 31 bp, followed by an exon with 29 bp. When the 
> intron gets spliced out, you eventually get an mRNA of 60 bp, 
> which translates to a protein of 20 aa.
> But the second exon has a phase of 1, not 0, because you 
> can't just start translating at the first bp of the second 
> exon and expect to get nice amino acids.

I think the use of 'frame' here is meant relative to the DNA sequence (i.e.
ORF searching, 6 frames) and the 'phase' is relative to the mRNA (i.e.
translation, three frames).  At least I think that's what is meant!

> By the way, whether or not phase is the same thing as frame, 
> when I call the frame() method on the features created by 
> Bio::Tools::GFF, I get the phase info. I assume that's a 
> feature (no pun intended), not a bug?
> 
> I'm still confused as to why you would have a phase in the 
> first exon, though. Why not just say the CDS starts 1 or 2 bp 
> later? (This is probably a bio question, not a bioperl 
> question, but a quick Google didn't get me an answer. "Phase" 
> isn't a very good search term.)

It could be because the location coordinates delineate the exon coding boundary.
It's conceivable the first exon in a sequence record is not the first exon
of the mRNA (i.e. there may be one or more exons prior to or past the exon
of interest that are in 'remote' sequence records).  Like this admittedly
extreme example (GB acc AF130134):

join(AF130124.1:2563..2964,AF130125.1:21..157,AF130126.1:12..174,
AF130127.1:21..112,AF130128.1:21..162,AF130128.1:281..595,
AF130128.1:661..842,AF130128.1:916..1030,AF130129.1:21..115,
AF130130.1:21..165,AF130131.1:21..125,AF130132.1:21..428,
AF130132.1:492..746,AF130133.1:21..168,AF130133.1:232..401,
AF130133.1:475..906,AF130133.1:970..1107,AF130133.1:1176..1367,21..128)

Also, the ends of the location may be uncertain ('fuzzy'):

join(complement(1009..>1260),complement(AF081827.1:<1..177))

> I guess the real question here, which Jason alludes to, is whether
> SeqFeature->spliced_seq ought to take into account the phase 
> information
> of the first exon. Right now, it doesn't, so when you call
> SeqFeature->spliced_seq->translate, you get gibberish. Are there cases
> where you would want spliced_seq to include the first bp or 
> two? Should there be an option to spliced_seq for whether you 
> want to take phase information into account?
> 
> I can't submit a bug report until we confirm it's a bug.
> 
> Thanks,
> -Amir Karger

You can already pass the frame or an offset to PrimarySeqI::translate().
Here are the args:

 Args    : -terminator    - character for terminator        default is *
           -unknown       - character for unknown           default is X
           -frame         - frame                           default is 0
           -codontable_id - codon table id                  default is 1
           -complete      - complete CDS expected           default is 0
           -throw         - throw exception if not complete default is 0
           -orf           - find 1st ORF                    default is 0
           -start         - alternative initiation codon
           -codontable    - Bio::Tools::CodonTable object
           -offset        - offset for fuzzy locations      default is 0

The offset comes from some GenBank seqfeatures which have a '/codon_start'
tag indicating which nucleotide to start translation from (1, 2 or 3). This
is essentially just the phase + 1. We could add a '-phase' argument for
convenience which accepts 0, 1 or 2.
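To illustrate what a frame/offset (or the proposed -phase) argument amounts to, here is a standalone pure-Perl sketch that skips phase leading bases before translating with the standard genetic code (NCBI table 1). It is not BioPerl's implementation and translate_with_phase is a hypothetical name; within BioPerl you would pass the documented -frame argument to translate() instead.

```perl
use strict;
use warnings;

# Standard genetic code (NCBI table 1), codons ordered by bases T, C, A, G.
my $AA = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my %BASE_INDEX = (T => 0, C => 1, A => 2, G => 3);

# Hypothetical helper: translate a CDS after skipping $phase (0, 1 or 2)
# leading bases. Codons containing non-ACGT characters become 'X';
# a trailing partial codon is ignored.
sub translate_with_phase {
    my ($dna, $phase) = @_;
    $dna = uc $dna;
    $dna =~ tr/U/T/;    # accept RNA as well as DNA
    my $protein = '';
    for (my $i = $phase; $i + 3 <= length $dna; $i += 3) {
        my $codon = substr($dna, $i, 3);
        if ($codon =~ /^[TCAG]{3}$/) {
            my ($b1, $b2, $b3) = map { $BASE_INDEX{$_} } split //, $codon;
            $protein .= substr($AA, 16 * $b1 + 4 * $b2 + $b3, 1);
        }
        else {
            $protein .= 'X';
        }
    }
    return $protein;
}

print translate_with_phase('GATGGCCTAA', 1), "\n";   # prints "MA*"
```

Trimming the spliced sequence this way, rather than at translation time, would also give the phase-corrected coding sequence itself.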

chris


From bobfreemanma at speakeasy.net  Fri Dec  8 15:47:15 2006
From: bobfreemanma at speakeasy.net (Bob Freeman)
Date: Fri, 8 Dec 2006 15:47:15 -0500
Subject: [Bioperl-l] writing blastxml
In-Reply-To: <4b5350650610250820w1498b27dnd155896fbf9a2012@mail.gmail.com>
References: <4b5350650610250728s1a421199if2493c9c4660474d@mail.gmail.com>
	<000301c6f846$d6227760$15327e82@pyrimidine>
	<4b5350650610250820w1498b27dnd155896fbf9a2012@mail.gmail.com>
Message-ID: <p0623090bc19f7f46bd1d@[10.0.107.251]>

Can't seem to find a good post on this to answer my question:

Does anyone know a good way to (re)write BLAST reports in XML format? 
I've got about 30,000 reports I need to rewrite for a (good!) piece 
of java software that will only import xml formatted BLAST reports. 
Right now, all mine are plain text.

I don't think bioperl can do this yet, correct? If not, any 
suggestions, besides reblasting all 30,000? I'd like to save a few 
trees and lumps of coal.

TIA,
Bob

-- 

-----------------------------------------------------
Bob Freeman, Ph.D.
Bioinformatics consultant
51 Downer Avenue, #2
Dorchester, MA  02125
617/699.7057, vox

If brains were taxed, he'd get a refund.
-- Anonymous


From camp_boot at hotmail.com  Sun Dec 10 05:00:55 2006
From: camp_boot at hotmail.com (synapse)
Date: Sun, 10 Dec 2006 10:00:55 +0000 (UTC)
Subject: [Bioperl-l] Driver program for PestFind.pm
Message-ID: <loom.20061210T105614-429@post.gmane.org>

   Dear All, 

   I apologize in advance for my almost total lack of knowledge of perl as a 
programming language. 

   I need to use PestFind program, part of the biop_run package of bioperl. My 
understanding is that I will need a simple wrapper program that will read 
arguments from the command line, and pass them to that module. 

   - Is there such program available that I can just use?

   - Does anyone know if pestfind can work on multiple sequence files (in fasta 
format), or does it only process single sequence files?

   Thanks a lot for the feedback. 


From cjfields at uiuc.edu  Sun Dec 10 13:45:26 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 10 Dec 2006 12:45:26 -0600
Subject: [Bioperl-l] writing blastxml
In-Reply-To: <p0623090bc19f7f46bd1d@[10.0.107.251]>
References: <4b5350650610250728s1a421199if2493c9c4660474d@mail.gmail.com>
	<000301c6f846$d6227760$15327e82@pyrimidine>
	<4b5350650610250820w1498b27dnd155896fbf9a2012@mail.gmail.com>
	<p0623090bc19f7f46bd1d@[10.0.107.251]>
Message-ID: <7FB4EBB9-BEDC-4250-BE2F-3F695D36F350@uiuc.edu>


On Dec 8, 2006, at 2:47 PM, Bob Freeman wrote:

> Can't seem to find a good post on this to answer my question:
>
> Does anyone know a good way to (re)write BLAST reports in XML format?
> I've got about 30,000 reports I need to rewrite for a (good!) piece
> of java software that will only import xml formatted BLAST reports.
> Right now, all mine are plain text.
>
> I don't think bioperl can do this yet, correct? If not, any
> suggestions, besides reblasting all 30,000? I'd like to save a few
> trees and lumps of coal.
>
> TIA,
> Bob

The only BioPerl writers for BLAST reports are in BSML and HTML, not
BLAST XML.  I don't think there have been any requests for it,
and no one has really stepped forward to submit one.

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Sun Dec 10 13:55:16 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 10 Dec 2006 12:55:16 -0600
Subject: [Bioperl-l] Driver program for PestFind.pm
In-Reply-To: <loom.20061210T105614-429@post.gmane.org>
References: <loom.20061210T105614-429@post.gmane.org>
Message-ID: <32B0F15D-4144-43B6-AA81-5ED9BA848F45@uiuc.edu>


On Dec 10, 2006, at 4:00 AM, synapse wrote:

>    Dear All,
>
>    I apologize in advance for my almost total lack of knowledge of  
> perl as a
> programming language.
>
>    I need to use PestFind program, part of the biop_run package of  
> bioperl. My
> understanding is that I will need a simple wrapper program that  
> will read
> arguments from the command line, and pass them to that module.

PestFind is part of the EMBOSS suite of programs:

http://emboss.sourceforge.net/

The PestFind module in bioperl-run is actually used via Pise.

>    - Is there such program available that I can just use?

See above

>    - Does anyone know if pestfind can work on multiple sequence  
> files (in fasta
> format), or does it only process single sequence files?
>
>    Thanks a lot for the feedback.

No idea there, but the EMBOSS docs should tell you.

chris


From cjfields at uiuc.edu  Mon Dec 11 00:38:32 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 10 Dec 2006 23:38:32 -0600
Subject: [Bioperl-l] bioperl-run parameter question
Message-ID: <163AF1E6-7CEA-4CAC-9BA1-84DBA95C494E@uiuc.edu>

I am writing up a few bioperl-run modules and have a simple question,  
though I don't know if anyone knows the answer.  I was curious as to  
why parameters for most (all?) bioperl-run modules lack the '-'  
preceding them.  This came up re: StandAloneBlast last week  
(something Torsten fixed), but I noticed just about every bioperl-run  
module uses the dashless parameters.

chris


From n.haigh at sheffield.ac.uk  Mon Dec 11 01:44:25 2006
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Mon, 11 Dec 2006 06:44:25 +0000
Subject: [Bioperl-l] bioperl-run parameter question
In-Reply-To: <163AF1E6-7CEA-4CAC-9BA1-84DBA95C494E@uiuc.edu>
References: <163AF1E6-7CEA-4CAC-9BA1-84DBA95C494E@uiuc.edu>
Message-ID: <457CFE49.5010201@sheffield.ac.uk>

Chris Fields wrote:
> I am writing up a few bioperl-run modules and have a simple question,  
> though I don't know if anyone knows the answer.  I was curious as to  
> why parameters for most (all?) bioperl-run modules lack the '-'  
> preceding them.  This came up re: StandAloneBlast last week  
> (something Torsten fixed), but I noticed just about every bioperl-run  
> module uses the dashless parameters.
>
> chris
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>   

No idea!

Is there any reason for/against using dashed/dashless parameters? I
suppose dashed parameters allow you to easily see which tokens on the
command line are parameters and which are values. Should modules be able
to accept both? Should dashed be preferred?

Nath


From cjfields at uiuc.edu  Mon Dec 11 08:06:32 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 11 Dec 2006 07:06:32 -0600
Subject: [Bioperl-l] bioperl-run parameter question
In-Reply-To: <457CFE49.5010201@sheffield.ac.uk>
References: <163AF1E6-7CEA-4CAC-9BA1-84DBA95C494E@uiuc.edu>
	<457CFE49.5010201@sheffield.ac.uk>
Message-ID: <D223B6BF-7C0C-41BF-B267-8C07F82FDD7D@uiuc.edu>


On Dec 11, 2006, at 12:44 AM, Nathan S. Haigh wrote:

> Chris Fields wrote:
>> I am writing up a few bioperl-run modules and have a simple question,
>> though I don't know if anyone knows the answer.  I was curious as to
>> why parameters for most (all?) bioperl-run modules lack the '-'
>> preceding them.  This came up re: StandAloneBlast last week
>> (something Torsten fixed), but I noticed just about every bioperl-run
>> module uses the dashless parameters.
>>
>> chris
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
> No idea!
>
> Is there any reason for/against using dashed/dashless parameters? I
> suppose dshed parameters allow you to easy see which tokens on the
> command line are parameters and which are values. Should modules be  
> able
> to accept both? Should dashed be preferred?
>
> Nath
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

I'm thinking about it from the point of consistency.  When using a  
mix of core and run modules it can be a bit confusing, particularly  
when (as pointed out in the previous thread on StandAloneBlast) you  
can use only dashed parameters with core modules, while most (all?)  
run modules only accept dashless ones (in most cases some exception  
is thrown).  Torsten fixed this in StandAloneBlast so it accepts  
both, but shouldn't this rule also apply to all run modules?

Much of this is probably due to the donated nature of much
of the bioperl-run code and Jason's 'cat-herding', and I understand
that it would be a lot of work to change this for all run modules.
However, we could at least try to start enforcing some loose rules
with new bioperl-run wrappers (e.g. implement WrapperBase, use
core-like parameters, etc).

chris


From akarger at CGR.Harvard.edu  Mon Dec 11 11:20:03 2006
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Mon, 11 Dec 2006 11:20:03 -0500
Subject: [Bioperl-l] Using frame info from GFF in getting a
	Seq->spliced_seq
Message-ID: <B9182BFF5B004245BABC12956EA6322E0254DD2E@huls5.nucleus.harvard.edu>

Chris Fields wrote:
> 
> > Yes, I think. Scott Cain pointed out that GFF column 8 is the 
> > "phase", which I had never heard of before. My current, very 
> > limited, understanding is that sometimes you'll have an exon 
> > with, say, 31 bp, followed by an exon with 29 bp. When the 
> > intron gets spliced out, you eventually get an mRNA of 60 bp, 
> > which translates to a protein of 20 aa.
> > But the second exon has a phase of 1, not 0, because you 
> > can't just start translating at the first bp of the second 
> > exon and expect to get nice amino acids.
> 
> I think the use of 'frame' here is meant relative to the DNA 
> sequence (i.e.
> ORF searching, 6 frames) and the 'phase' is relative to the mRNA (i.e.
> translation, three frames).  At least I think that's what is meant!

I agree. By the way, I'd love a reference to a simple bio-explanation of
what's happening here. Google searches for "coding sequence phase" are
not all that relevant.

> > I'm still confused as to why you would have a phase in the 
> > first exon, though. Why not just say the CDS starts 1 or 2 bp 
> > later? (This is probably a bio question, not a bioperl 
> > question, but a quick Google didn't get me an answer. "Phase" 
> > isn't a very good search term.)
> 
> It could be b/c the location coordinates delineate the exon 
> coding boundary.
> It's conceivable the first exon in a sequence record is not 
> the first exon
> of the mRNA (i.e. there may be one or more exons prior to or 
> past the exon
> of interest that are in 'remote' sequence records).

That's certainly not the case here, because the files have the entire
genomes in them.

> Also, the ends of the lcoation may be uncertain ('fuzzy'):
> 
> join(complement(1009..>1260),complement(AF081827.1:<1..177))

Also not the case here. These locations aren't listed as fuzzy.

Any other thoughts?

> > I guess the real question here, which Jason alludes to, is whether
> > SeqFeature->spliced_seq ought to take into account the phase 
> > information
> > of the first exon. Right now, it doesn't, so when you call
> > SeqFeature->spliced_seq->translate, you get gibberish. Are 
> there cases
> > where you would want spliced_seq to include the first bp or 
> > two? Should there be an option to spliced_seq for whether you 
> > want to take phase information into account?
> 
> You can already pass the frame or an offset to 
> PrimarySeqI::translate().
>  We could add a '-phase' argument for
> convenience which accepts 0,1,2.

But as Jason pointed out, you should find the problem earlier. What if I
want to get the RNA sequence that will become the protein? then having a
phase arg to translate() doesn't help. Should there be a phase arg to
spliced_seq?

Which raises another bio question: at what point are the first 1 or 2 bp
dropped when you have a phase of 1 or 2? Do they appear in the mRNA? 

-Amir Karger


From bix at sendu.me.uk  Mon Dec 11 13:21:42 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 11 Dec 2006 13:21:42 -0500
Subject: [Bioperl-l] bioperl-run parameter question
In-Reply-To: <163AF1E6-7CEA-4CAC-9BA1-84DBA95C494E@uiuc.edu>
References: <163AF1E6-7CEA-4CAC-9BA1-84DBA95C494E@uiuc.edu>
Message-ID: <457DA1B6.1060706@sendu.me.uk>

Chris Fields wrote:
> I am writing up a few bioperl-run modules and have a simple question,  
> though I don't know if anyone knows the answer.  I was curious as to  
> why parameters for most (all?) bioperl-run modules lack the '-'  
> preceding them.  This came up re: StandAloneBlast last week  
> (something Torsten fixed), but I noticed just about every bioperl-run  
> module uses the dashless parameters.

I didn't follow that particular thread, but from my experience there is 
a useful distinction between bioperl options using the - as normal for 
full consistency with core (eg. -verbose), whilst the options that 
belong to the program the run module is a wrapper for do not take 
dashes. Again, this seems consistent within the run package.

I'd suggest sticking to the current pattern.


Cheers,
Sendu.


From cjfields at uiuc.edu  Mon Dec 11 15:07:16 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 11 Dec 2006 14:07:16 -0600
Subject: [Bioperl-l] bioperl-run parameter question
In-Reply-To: <457DA1B6.1060706@sendu.me.uk>
References: <163AF1E6-7CEA-4CAC-9BA1-84DBA95C494E@uiuc.edu>
	<457DA1B6.1060706@sendu.me.uk>
Message-ID: <F8A9FAC2-A189-463B-B8CA-E66D66863553@uiuc.edu>


On Dec 11, 2006, at 12:21 PM, Sendu Bala wrote:

> Chris Fields wrote:
>> I am writing up a few bioperl-run modules and have a simple  
>> question,  though I don't know if anyone knows the answer.  I was  
>> curious as to  why parameters for most (all?) bioperl-run modules  
>> lack the '-'  preceding them.  This came up re: StandAloneBlast  
>> last week  (something Torsten fixed), but I noticed just about  
>> every bioperl-run  module uses the dashless parameters.
>
> I didn't follow that particular thread, but from my experience  
> there is a useful distinction between bioperl options using the -  
> as normal for full consistency with core (eg. -verbose), whilst the  
> options that belong to the program the run module is a wrapper for  
> do not take dashes. Again, this seems consistent within the run  
> package.

I respectfully disagree that this is a 'useful' distinction.  My main
point is consistency.  To me, it's counterintuitive to have two
Bioperl classes, both of which inherit from Bio::Root::Root, use two
different syntaxes for any parameters passed to the constructor, even
if some are 'program' parameters.  It's also not consistent with
StandAloneBlast or RemoteBlast, both of which are considered bioperl-run
modules even though they are in core, and both of which use dashed
parameters (StandAloneBlast actually allows both).  In fact, it isn't
consistent within bioperl-run itself.
Bio::Tools::Run::EMBOSSApplication uses dashes for parameters in a
hashref!

Okay, judging by the previous examples, 'consistency' isn't a word I  
would use to describe bioperl-run as a whole (back to Jason's
'cat-herding' analogy).  It would be easier to let it slide for now,
especially since changing them would be a serious pain, not to  
mention an API issue.  But shouldn't there be some consistency?

And what about new modules?  Do we follow the historical (possibly  
confusing) 'dashless' route, or use the core-like dashed approach  
(thus breaking from the other run modules)?

> I'd suggest sticking to the current pattern.
>
>
> Cheers,
> Sendu.

I'll allow for both, à la StandAloneBlast.  Doesn't hurt to be safe. ; >
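Here is a minimal plain-Perl sketch of what "allowing both" could look like. It assumes the wrapper collects its constructor arguments as key/value pairs. `normalize_params` is a hypothetical helper, not an existing BioPerl method.

```perl
use strict;
use warnings;

# Hypothetical helper (not an existing BioPerl method): accept constructor
# arguments written either core-style (-evalue => ...) or bioperl-run style
# (evalue => ...) by stripping one leading dash from each key. Key case is
# left alone because many program options are case-sensitive.
sub normalize_params {
    my @pairs = @_;
    die "odd number of arguments\n" if @pairs % 2;
    my %clean;
    while (my ($key, $value) = splice @pairs, 0, 2) {
        (my $name = $key) =~ s/^-//;
        $clean{$name} = $value;    # if both spellings are given, the later one wins
    }
    return \%clean;
}

my $p = normalize_params(-verbose => 1, evalue => '1e-5', -program => 'blastn');
print join(', ', map { "$_=$p->{$_}" } sort keys %$p), "\n";
# prints: evalue=1e-5, program=blastn, verbose=1
```

The pairs are processed in argument order, so the outcome is predictable if a caller mixes both spellings of the same option.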

Have fun at the hackathon!

chris


From bix at sendu.me.uk  Mon Dec 11 16:19:55 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 11 Dec 2006 16:19:55 -0500
Subject: [Bioperl-l] bioperl-run parameter question
In-Reply-To: <F8A9FAC2-A189-463B-B8CA-E66D66863553@uiuc.edu>
References: <163AF1E6-7CEA-4CAC-9BA1-84DBA95C494E@uiuc.edu>
	<457DA1B6.1060706@sendu.me.uk>
	<F8A9FAC2-A189-463B-B8CA-E66D66863553@uiuc.edu>
Message-ID: <457DCB7B.8050500@sendu.me.uk>

Chris Fields wrote:
> 
> On Dec 11, 2006, at 12:21 PM, Sendu Bala wrote:
> 
>> Chris Fields wrote:
>>> I am writing up a few bioperl-run modules and have a simple 
>>> question,  though I don't know if anyone knows the answer.  I was 
>>> curious as to  why parameters for most (all?) bioperl-run modules 
>>> lack the '-'  preceding them.  This came up re: StandAloneBlast last 
>>> week  (something Torsten fixed), but I noticed just about every 
>>> bioperl-run  module uses the dashless parameters.
>>
>> I didn't follow that particular thread, but from my experience there 
>> is a useful distinction between bioperl options using the - as normal 
>> for full consistency with core (eg. -verbose), whilst the options that 
>> belong to the program the run module is a wrapper for do not take 
>> dashes. Again, this seems consistent within the run package.
> 
> I respectfully disagree that this is a 'useful' distinction.  My main 
> point is consistency.
[snip]

We're on the same page in terms of what we think would be a Good Thing, 
and allowing both ways (dashed and dashless) sounds reasonable. I was 
just suggesting why bioperl-run might be the way it was. Further to 
that, there is the practical aspect that it is a lot simpler to figure 
out which are the program options so they can be farmed out to the 
AUTOLOAD methods - again something that isn't done in core.

If you come up with some generic way of dealing with options and farming 
to AUTOLOAD, perhaps there's scope for applying it to all the run 
wrappers (ideally via one of their base classes), so they all instantly 
gain dashed-mode capability.


From cjfields at uiuc.edu  Mon Dec 11 17:05:56 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 11 Dec 2006 16:05:56 -0600
Subject: [Bioperl-l] bioperl-run parameter question
In-Reply-To: <457DCB7B.8050500@sendu.me.uk>
References: <163AF1E6-7CEA-4CAC-9BA1-84DBA95C494E@uiuc.edu>
	<457DA1B6.1060706@sendu.me.uk>
	<F8A9FAC2-A189-463B-B8CA-E66D66863553@uiuc.edu>
	<457DCB7B.8050500@sendu.me.uk>
Message-ID: <F046DB23-35C7-414A-8616-46D3C5760B49@uiuc.edu>


On Dec 11, 2006, at 3:19 PM, Sendu Bala wrote:
...

>>
>> I respectfully disagree that this is a 'useful' distinction.  My main
>> point is consistency.
> [snip]
>
> We're on the same page in terms of what we think would be a Good Thing,
> and allowing both ways (dashed and dashless) sounds reasonable. I was
> just suggesting why bioperl-run might be the way it was. Further to
> that, there is the practical aspect that it is a lot simpler to figure
> out which are the program options so they can be farmed out to the
> AUTOLOAD methods - again something that isn't done in core.

Maybe b/c AUTOLOAD is frowned upon for a number of reasons, mainly  
code maintenance.  I'm somewhat neutral on the idea of using AUTOLOAD  
as a short-term solution, though using heredoc and an eval{} block  
works well for me (and shows up when using $self->can('method') or  
when checking for methods via Class::Inspector).
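For comparison with AUTOLOAD, here is a sketch of generating accessors from a heredoc template. Compiling code that is held in a string requires a string eval (`eval $code`), not just an `eval { }` block. The class name `My::Run::Wrapper` and its option list are hypothetical. The point is that the generated methods are real subs, so `can()` finds them.

```perl
use strict;
use warnings;

package My::Run::Wrapper;    # hypothetical class, not a real bioperl-run module

# Program options this hypothetical wrapper knows about.
my @PROGRAM_PARAMS = qw(evalue matrix outfile);

# Build one get/set accessor per option from a heredoc template and compile
# it with a string eval, so the methods exist in the symbol table (and are
# therefore visible to can()) instead of being resolved by AUTOLOAD.
for my $param (@PROGRAM_PARAMS) {
    my $code = <<"END";
sub $param {
    my \$self = shift;
    \$self->{'_$param'} = shift if \@_;
    return \$self->{'_$param'};
}
END
    eval $code;
    die "could not build accessor '$param': $@" if $@;
}

sub new {
    my ($class, %args) = @_;
    my $self = bless {}, $class;
    for my $key (keys %args) {
        (my $name = $key) =~ s/^-//;    # accept dashed or dashless keys
        next unless grep { $_ eq $name } @PROGRAM_PARAMS;
        $self->$name($args{$key});
    }
    return $self;
}

package main;

my $w = My::Run::Wrapper->new(-evalue => 0.001, matrix => 'BLOSUM62');
print $w->evalue, ' ', $w->matrix, "\n";                    # prints: 0.001 BLOSUM62
print My::Run::Wrapper->can('outfile') ? "yes\n" : "no\n";  # prints: yes
```

In a real run wrapper the option list would come from the program being wrapped. Keeping the generation in one shared base class is what would let every wrapper gain dashed-mode capability at once.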

> If you come up with some generic way of dealing with options and farming
> to AUTOLOAD, perhaps there's scope for applying it to all the run
> wrappers (ideally via one of their base classes), so they all instantly
> gain dashed-mode capability.

I think that's the crux of the problem; they do not all have the same  
base class (except Bio::Root::Root).  Most use WrapperBase.  I  
thought at one point a Run-specific root module would be a good idea,  
but WrapperBase already works well.

I'll go ahead with my modules and think about it some more.  You  
could ask the powers-that-be (jason, hilmar, etc) what they think as  
well.

chris


From bosborne11 at verizon.net  Mon Dec 11 17:24:54 2006
From: bosborne11 at verizon.net (Brian Osborne)
Date: Mon, 11 Dec 2006 17:24:54 -0500
Subject: [Bioperl-l] Using frame info from GFF in getting
	aSeq->spliced_seq
In-Reply-To: <B9182BFF5B004245BABC12956EA6322E0254DD2E@huls5.nucleus.harvard.edu>
Message-ID: <C1A344E6.BE53%bosborne11@verizon.net>

Amir,

Google "intron phase", you will see a number of useful links.

Brian O.


On 12/11/06 11:20 AM, "Amir Karger" <akarger at CGR.Harvard.edu> wrote:

> I agree. By the way, I'd love a reference to a simple bio-explanation of
> what's happening here. Google searches for "coding sequence phase" are
> not all that relevant.


From cjfields at uiuc.edu  Mon Dec 11 22:20:06 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 11 Dec 2006 21:20:06 -0600
Subject: [Bioperl-l] Using frame info from GFF in getting
	aSeq->spliced_seq
In-Reply-To: <B9182BFF5B004245BABC12956EA6322E0254DD2E@huls5.nucleus.harvard.edu>
References: <B9182BFF5B004245BABC12956EA6322E0254DD2E@huls5.nucleus.harvard.edu>
Message-ID: <E6F0CA09-EF9F-42AF-BF67-35E4FDBCAD8C@uiuc.edu>


On Dec 11, 2006, at 10:20 AM, Amir Karger wrote:

>> I think the use of 'frame' here is meant relative to the DNA
>> sequence (i.e. ORF searching, 6 frames) and the 'phase' is relative
>> to the mRNA (i.e. translation, three frames).  At least I think
>> that's what is meant!
>
> I agree. By the way, I'd love a reference to a simple bio-explanation
> of what's happening here. Google searches for "coding sequence phase"
> are not all that relevant.

Ah, Brian found some links I see...

>> It could be b/c the location coordinates delineate the exon coding
>> boundary.  It's conceivable the first exon in a sequence record is
>> not the first exon of the mRNA (i.e. there may be one or more exons
>> prior to or past the exon of interest that are in 'remote' sequence
>> records).
>
> That's certainly not the case here, because the files have the entire
> genomes in them.
>
>> Also, the ends of the location may be uncertain ('fuzzy'):
>>
>> join(complement(1009..>1260),complement(AF081827.1:<1..177))
>
> Also not the case here. These locations aren't listed as fuzzy.
>
> Any other thoughts?

Which GFF files did you use?  More specifically, which genes in which  
GFF file?  I saw a reference to S. bayanus, but it's hard to work out  
what could be the problem unless we know a bit more.

>>> I guess the real question here, which Jason alludes to, is whether
>>> SeqFeature->spliced_seq ought to take into account the phase
>>> information of the first exon. Right now, it doesn't, so when you call
>>> SeqFeature->spliced_seq->translate, you get gibberish. Are there cases
>>> where you would want spliced_seq to include the first bp or two?
>>> Should there be an option to spliced_seq for whether you want to take
>>> phase information into account?
>>
>> You can already pass the frame or an offset to PrimarySeqI::translate().
>> We could add a '-phase' argument for convenience which accepts 0,1,2.
>
> But as Jason pointed out, you should find the problem earlier. What if I
> want to get the RNA sequence that will become the protein? Then having a
> phase arg to translate() doesn't help. Should there be a phase arg to
> spliced_seq?

You'll also note Jason mentioned there were possible errors in the
gene prediction programs which produced the output.

spliced_seq() is supposed to return the DNA sequence of a split  
location by splicing together the sublocation sequences in their  
'join' order.  So, if the first exon was out of phase, once spliced  
they should all be out of phase to the same degree, assuming all  
exons are joined together correctly.   Translating this using the  
phase should produce the correct amino acid sequence.

Note that Jason suggested passing the frame/phase of the first exon  
to translate(), not spliced_seq().  I also suggested translate().

> Which raises another bio question: at what point are the first 1 or 2 bp
> dropped when you have a phase of 1 or 2? Do they appear in the mRNA?
>
> -Amir Karger

Any sequence present in the sublocations (exons) would be in the  
spliced sequence.  This would have to include those nucleotides in  
exons skipped b/c of the phase since they are part of the coding region.

chris


From neetisomaiya at gmail.com  Tue Dec 12 07:06:20 2006
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Tue, 12 Dec 2006 17:36:20 +0530
Subject: [Bioperl-l] need help in phredPhrap
Message-ID: <764978cf0612120406m796b116dncd3a9e6c82ffe682@mail.gmail.com>

Hi,

I am running phredPhrap, which runs phred, phrap and polyphred. Please refer
to the "Using a reference sequence" section of this link:
http://droog.mbt.washington.edu/poly_doc50.html#REFER.
I am using the reference sequence as described in the link above.
With this I am getting the SNP positions on the contig sequence as well as
on the reference sequence.
Does anyone know if there is some output file that can also give me a mapping
between the contig sequence and the reference sequence?
-- 
-Neeti
Even my blood says, B positive


From akarger at CGR.Harvard.edu  Tue Dec 12 11:05:43 2006
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Tue, 12 Dec 2006 11:05:43 -0500
Subject: [Bioperl-l] Using frame info from GFF in getting
	aSeq->spliced_seq
Message-ID: <B9182BFF5B004245BABC12956EA6322E0254DF2D@huls5.nucleus.harvard.edu>

(sorry if this thread is boring people)

Chris Fields wrote: 

> > I agree. By the way, I'd love a reference to a simple bio-explanation
> > of what's happening here. Google searches for "coding sequence phase"
> > are not all that relevant.
> 
> Ah, Brian found some links I see...

Thanks, Brian! Amazing how "coding sequence phase" finds nothing but
"intron phase" finds a ton. This is why you need to actually learn
biology, rather than Googling it.

> Which GFF files did you use?  More specifically, which genes in which
> GFF file?  I saw a reference to S. bayanus, but it's hard to work out
> what could be the problem unless we know a bit more.

http://fungal.genome.duke.edu/annotations/sbay/gff/saccharomyces_bayanus.20031001.AUGUSTUS.gff3.gz
(Thanks for a Really Useful site, Jason!)

c127 (for example) has two lines in that file:
sbay_c127       AUGUSTUS        mRNA    263     723     .       +       .       ID=sbay_c127-g1.1
sbay_c127       AUGUSTUS        CDS     263     723     .       +       1       Parent=sbay_c127-g1.1

Now go to gbrowse page:
http://fungal.genome.duke.edu/cgi-bin/gbrowse/sbay/
Type "sbay_c127:250-300" in the search box. 

As you can see from the translation track, if you start at bp 263, you
hit a stop codon after just a few aas. But if you use frame2/phase 1,
you get no stop codons all the way to the end of the contig.
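To make the phase column concrete, consider how GFF3 defines it. Column 8 of a CDS line is the number of bases to remove from the start of the feature to reach the first base of the next codon. A phase of 1 at position 263 on the + strand therefore means the first complete codon begins at 264. The minimal sketch below parses the two lines above. `first_codon_start` is a hypothetical helper name.

```perl
use strict;
use warnings;

# The two AUGUSTUS lines for sbay_c127 quoted above, as tab-separated GFF3.
my @gff = (
    join("\t", qw(sbay_c127 AUGUSTUS mRNA 263 723 . + .), 'ID=sbay_c127-g1.1'),
    join("\t", qw(sbay_c127 AUGUSTUS CDS 263 723 . + 1), 'Parent=sbay_c127-g1.1'),
);

# GFF3 phase (column 8) is the number of bases to remove from the 5' end of
# a CDS feature to reach the first base of the next codon, so the first
# complete codon starts phase bases in from the start (+) or end (-).
sub first_codon_start {
    my ($start, $end, $strand, $phase) = @_;
    return $strand eq '-' ? $end - $phase : $start + $phase;
}

for my $line (@gff) {
    my ($seqid, undef, $type, $start, $end, undef, $strand, $phase) = split /\t/, $line;
    next unless $type eq 'CDS';
    my $codon = first_codon_start($start, $end, $strand, $phase);
    print "$seqid CDS $start..$end phase $phase: first complete codon at $codon\n";
}
# prints: sbay_c127 CDS 263..723 phase 1: first complete codon at 264
```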

> >> You can already pass the frame or an offset to
> >> PrimarySeqI::translate(). We could add a '-phase' argument for
> >> convenience which accepts 0,1,2.
> >
> > What if I want to get the RNA sequence that will become the protein?
> > Then having a phase arg to translate() doesn't help. Should there be a
> > phase arg to spliced_seq?
> 
> You'll also note Jason mentioned there were possible errors in the  
> gene prediction programs which produced the output

That's certainly possible. No gene prediction program will be perfect.
In this case, though, it's clear that it found a large region without
stop codons in it, and correctly identified the place to start
translating. I guess I'm just surprised that, if it found just one exon
in a gene (in the whole contig), it would say the exon starts at 263
with a phase of 1, instead of just saying it starts at 264.

> spliced_seq() is supposed to return the DNA sequence of a split  
> location by splicing together the sublocation sequences in their  
> 'join' order.  So, if the first exon was out of phase, once spliced  
> they should all be out of phase to the same degree, assuming all  
> exons are joined together correctly.   Translating this using the  
> phase should produce the correct amino acid sequence.
> 
> Note that Jason suggested passing the frame/phase of the first exon  
> to translate(), not spliced_seq().  I also suggested translate().

You're right. This brings the number of translated polypeptide sequences
that have lots of *s in them to 9 instead of 90. 

I guess I have two requests here. The first is, if a person wants to see
exactly which bps are translated to aas -- a nucleotide sequence of
exactly 3N bp starting (usually) with ATG -- then they might want an
argument to spliced_seq that skips the first one or two bp when
necessary. After all, they might want to study the DNA, not the
peptides.

The second request is for "intelligent objects". If my SeqFeatures know
that they're in phase 1, then when I call spliced_seq I want the
resulting objects to know that they're phase one, such that when I call
translate, Bioperl automatically skips the first bp or two. Admittedly,
there might be big ramifications to this.

Both requests of course made in the knowledge that Bioperl is open
source & developers have a lot to do with their time.

-Amir Karger

> > Which raises another bio question: at what point are the first 1 or
> > 2 bp dropped when you have a phase of 1 or 2? Do they appear in the mRNA?
> >
> > -Amir Karger
> 
> Any sequence present in the sublocations (exons) would be in the
> spliced sequence.  This would have to include those nucleotides in
> exons skipped b/c of the phase since they are part of the coding region.
> 
> chris
> 


From neetisomaiya at gmail.com  Tue Dec 12 07:14:10 2006
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Tue, 12 Dec 2006 17:44:10 +0530
Subject: [Bioperl-l] needle parser in bioperl?
Message-ID: <764978cf0612120414o1eb77e28l1132eb4fa4cd9e1d@mail.gmail.com>

Hi,

Does anyone know of a bioperl parser for needle output? Basically I want to know
where the target sequence aligns on the template (i.e. the coordinate on the
template where the target aligns).

-- 
-Neeti
Even my blood says, B positive


From cjfields at uiuc.edu  Tue Dec 12 11:57:27 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 12 Dec 2006 10:57:27 -0600
Subject: [Bioperl-l] needle parser in bioperl?
In-Reply-To: <764978cf0612120414o1eb77e28l1132eb4fa4cd9e1d@mail.gmail.com>
References: <764978cf0612120414o1eb77e28l1132eb4fa4cd9e1d@mail.gmail.com>
Message-ID: <C60106D0-9A11-4B67-8B3D-87DF885F1D40@uiuc.edu>


On Dec 12, 2006, at 6:14 AM, neeti somaiya wrote:

> Hi,
>
> Does anyone know of a bioperl parser for needle output? Basically I want
> to know where the target sequence aligns on the template (i.e. the
> coordinate on the template where the target aligns).
>
> -- 
> -Neeti
> Even my blood says, B positive

I answered this a number of months back:

http://tinyurl.com/yzlbx5

Basically, newer versions of EMBOSS have changed the output format that
the AlignIO::emboss parser (which parses needle output) expects.  I don't
believe the parser has been fixed to deal with that, but Jason has pointed
out you can use MSF output when running needle, then parse it using AlignIO
with the format set to 'msf'.

chris


From bosborne11 at verizon.net  Tue Dec 12 11:51:05 2006
From: bosborne11 at verizon.net (Brian Osborne)
Date: Tue, 12 Dec 2006 11:51:05 -0500
Subject: [Bioperl-l] needle parser in bioperl?
In-Reply-To: <764978cf0612120414o1eb77e28l1132eb4fa4cd9e1d@mail.gmail.com>
Message-ID: <C1A44829.BE76%bosborne11@verizon.net>

Neeti,

EMBOSS' needle and water produce alignments in what Bioperl calls 'emboss'
format, so you can use AlignIO to get SimpleAlign objects. The best
description of how to use SimpleAlign is the documentation in the module.
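Once the pair of aligned rows has been pulled out of the alignment, the template coordinate can be computed by walking the columns. The rows might come, for example, as gapped strings from the two sequences in a SimpleAlign. The plain-Perl sketch below does this. `target_span_on_template` is a hypothetical helper. It assumes both rows have equal length, that gaps are written as '-' or '.', and that you pass in the residue number at which the template row starts.

```perl
use strict;
use warnings;

# Hypothetical helper: given the target and template rows of a pairwise
# alignment as equal-length gapped strings ('-' or '.' for gaps), plus the
# residue number at which the template row starts, return the first and last
# template positions that are aligned to a target residue.
sub target_span_on_template {
    my ($target, $template, $template_start) = @_;
    die "aligned strings differ in length\n" if length $target != length $template;
    my $pos = $template_start - 1;    # template residue number seen so far
    my ($first, $last);
    for my $i (0 .. length($target) - 1) {
        my $t = substr $target, $i, 1;
        my $r = substr $template, $i, 1;
        next if $r eq '-' || $r eq '.';    # gap in template: no template position
        $pos++;
        next if $t eq '-' || $t eq '.';    # template residue facing a target gap
        $first //= $pos;
        $last = $pos;
    }
    return ($first, $last);
}

my ($from, $to) = target_span_on_template('--ACGT-A', 'TTACGTCA', 1);
print "target aligns to template $from..$to\n";    # prints: target aligns to template 3..8
```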

Brian O.


On 12/12/06 7:14 AM, "neeti somaiya" <neetisomaiya at gmail.com> wrote:

> Hi,
> 
> Does anyone know of a bioperl parser for needle output? Basically I want to know
> where the target sequence aligns on the template (i.e. the coordinate on the
> template where the target aligns).


From kaboroev at sfu.ca  Tue Dec 12 12:14:39 2006
From: kaboroev at sfu.ca (Keith Anthony Boroevich)
Date: Tue, 12 Dec 2006 09:14:39 -0800
Subject: [Bioperl-l] BLAST reports
Message-ID: <457EE37F.4020000@sfu.ca>

Hi everyone,

I would like to manipulate my blast results with bioperl but would also
like to have the html output of the blast.  What would be the best way
of going about this, as I don't see any write functions in any of the
blast modules I have looked at.  Would it be better to create my own
html layout from the blast data then attempt to recover this from bioperl?

keith

p.s. - does anyone know what the most informative blast "alignment view"
output is? xml i suppose?

-- 
 ><)))?> -cGRASP- <?(((><
 Keith Anthony Boroevich
 Davidson Lab
 Dept of Molecular Biology
 Simon Fraser University
 Tel: 604-268-7276


From cjfields at uiuc.edu  Tue Dec 12 13:45:05 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 12 Dec 2006 12:45:05 -0600
Subject: [Bioperl-l] Using frame info from GFF in getting
	aSeq->spliced_seq
In-Reply-To: <B9182BFF5B004245BABC12956EA6322E0254DF2D@huls5.nucleus.harvard.edu>
References: <B9182BFF5B004245BABC12956EA6322E0254DF2D@huls5.nucleus.harvard.edu>
Message-ID: <E073C68D-F5FD-4C48-A3E4-925B696E956A@uiuc.edu>


On Dec 12, 2006, at 10:05 AM, Amir Karger wrote:
...

> http://fungal.genome.duke.edu/annotations/sbay/gff/saccharomyces_bayanus.20031001.AUGUSTUS.gff3.gz
> (Thanks for a Really Useful site, Jason!)
>
> c127 (for example) has two lines in that file:
> sbay_c127       AUGUSTUS        mRNA    263     723     .       +       .       ID=sbay_c127-g1.1
> sbay_c127       AUGUSTUS        CDS     263     723     .       +       1       Parent=sbay_c127-g1.1
>
> Now go to gbrowse page:
> http://fungal.genome.duke.edu/cgi-bin/gbrowse/sbay/
> Type "sbay_c127:250-300" in the search box.
>
> As you can see from the translation track, if you start at bp 263, you
> hit a stop codon after just a few aas. But if you use frame2/phase 1,
> you get no stop codons all the way to the end of the contig.

Yes, but there are two things.  First, there is no distinct start  
codon.  Second, this is what the top NCBI BLASTX hit for that  
particular exon is:

>gi|6323195|ref|NP_013267.1| Gene info Essential 100kDa subunit of the exocyst complex (Sec3p, Sec5p,
Sec6p, Sec8p, Sec10p, Sec15p, Exo70p, and Exo84p), which has
the essential function of mediating polarized targeting of
secretory vesicles to active sites of exocytosis; Sec10p [Saccharomyces
cerevisiae]
 gi|2498891|sp|Q06245|SEC10_YEAST Gene info Exocyst complex component SEC10
 gi|1234854|gb|AAB67490.1| Gene info L9362.12 gene product
 gi|1781307|emb|CAA70041.1| Gene info 100 kD exocyst complex component [Saccharomyces cerevisiae]
Length=871

 Score =  285 bits (728),  Expect = 7e-77
 Identities = 141/152 (92%), Positives = 149/152 (98%), Gaps = 0/152 (0%)
 Frame = +2

Query  2    FNDFYSMGKSDIVEQLRLSKNWKFNLKSVILMKNLLILSSKLETNSIPKTINTKLIIEKY  181
            +NDFYSMGKSDIVEQLRLSKNWK NLKSV LMKNLLILSSKLET+SIPKTINTKL+IEKY
Sbjct  168  YNDFYSMGKSDIVEQLRLSKNWKLNLKSVKLMKNLLILSSKLETSSIPKTINTKLVIEKY  227

Query  182  SEMMENKLLENFNSAYRENNFTKLNEIAIILNNFNGGVNVIQSFINQHDYFIDTKQIDLE  361
            SEMMEN+LLENFNSAYRENNFTKLNEIAIILNNFNGGVNVIQSFINQHDYFIDTKQIDLE
Sbjct  228  SEMMENELLENFNSAYRENNFTKLNEIAIILNNFNGGVNVIQSFINQHDYFIDTKQIDLE  287

Query  362  NEFENVFIKNVKFKERLVDFESHSVIVEASMQ  457
            NEFENVFIKNVKFKE+L+DFE+HSVI+E SMQ
Sbjct  288  NEFENVFIKNVKFKEQLIDFENHSVIIETSMQ  319


Note the query start is well into the predicted coding sequence.   
Both the lack of a start codon and the above BLASTX hit suggest this  
is not actually the first exon in the coding region.  Therefore the  
sequence retrieved from spliced_seq() is only part of the full coding  
region (it seems to lack at least one 3' exon as well).

>>>> You can already pass the frame or an offset to
>>>> PrimarySeqI::translate(). We could add a '-phase' argument for
>>>> convenience which accepts 0,1,2.
>>>
>>> What if I want to get the RNA sequence that will become the protein?
>>> Then having a phase arg to translate() doesn't help. Should there be a
>>> phase arg to spliced_seq?
>>
>> You'll also note Jason mentioned there were possible errors in the
>> gene prediction programs which produced the output
>
> That's certainly possible. No gene prediction program will be perfect.
> In this case, though, it's clear that it found a large region without
> stop codons in it, and correctly identified the place to start
> translating. I guess I'm just surprised that, if it found just one exon
> in a gene (in the whole contig), it would say the exon starts at 263
> with a phase of 1, instead of just saying it starts at 264.

Maybe the gene prediction didn't find the first exon, or didn't tie  
the predicted exons together.  Not unusual considering the number of  
predictions made.

>> spliced_seq() is supposed to return the DNA sequence of a split
>> location by splicing together the sublocation sequences in their
>> 'join' order.  So, if the first exon was out of phase, once spliced
>> they should all be out of phase to the same degree, assuming all
>> exons are joined together correctly.   Translating this using the
>> phase should produce the correct amino acid sequence.
>>
>> Note that Jason suggested passing the frame/phase of the first exon
>> to translate(), not spliced_seq().  I also suggested translate().
>
> You're right. This brings the number of translated polypeptide sequences
> that have lots of *s in them to 9 instead of 90.
>
> I guess I have two requests here. The first is, if a person wants to see
> exactly which bps are translated to aas -- a nucleotide sequence of
> exactly 3N bp starting (usually) with ATG -- then they might want an
> argument to spliced_seq that skips the first one or two bp when
> necessary. After all, they might want to study the DNA, not the
> peptides.
>
> The second request is for "intelligent objects". If my SeqFeatures know
> that they're in phase 1, then when I call spliced_seq I want the
> resulting objects to know that they're phase one, such that when I call
> translate, Bioperl automatically skips the first bp or two. Admittedly,
> there might be big ramifications to this.
>
> Both requests of course made in the knowledge that Bioperl is open
> source & developers have a lot to do with their time.
>
> -Amir Karger

You may want to post these as enhancement requests to Bugzilla just  
so we can keep track.  I think passing a phase parameter to  
spliced_seq() can be easily accomplished; it's just a matter of  
returning a subseq of the spliced sequence based on the phase if  
set.  In fact, I am testing it out now.
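The plain-string sketch below illustrates the subseq idea described above. It is aimed at readers who want the 3N bp coding DNA rather than the protein. This is not the code committed to CVS: `phase_trimmed` and its optional whole-codon flag are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper, not the code committed to CVS: drop $phase leading
# bases (GFF3 phase, 0-2) from a spliced CDS string and, if requested, trim
# any incomplete trailing codon so the result is exactly 3N bp long.
sub phase_trimmed {
    my ($spliced, $phase, $whole_codons) = @_;
    $phase //= 0;
    die "phase must be 0, 1 or 2\n" unless $phase =~ /\A[012]\z/;
    my $cds = substr $spliced, $phase;
    $cds = substr($cds, 0, 3 * int(length($cds) / 3)) if $whole_codons;
    return $cds;
}

print phase_trimmed('GATGAAATAG', 1), "\n";        # prints: ATGAAATAG
print phase_trimmed('GGATGAAATAGC', 2, 1), "\n";   # prints: ATGAAATAG
```

Leaving the default at phase 0 keeps the untrimmed behaviour, consistent with not making removal of those nucleotides the default.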

The second may be more problematic, since there may be a time when  
one would want those extra nucleotides, so I don't think we would  
want removal of said nucleotides to be the default behavior.

Chris


From dmessina at wustl.edu  Tue Dec 12 13:44:29 2006
From: dmessina at wustl.edu (David Messina)
Date: Tue, 12 Dec 2006 12:44:29 -0600
Subject: [Bioperl-l] BLAST reports
In-Reply-To: <457EE37F.4020000@sfu.ca>
References: <457EE37F.4020000@sfu.ca>
Message-ID: <083B4D17-CC7A-406C-9037-4DA5DC31AA05@wustl.edu>

Hi Keith,

Take a look at:
http://www.bioperl.org/wiki/HOWTO:SearchIO

You can read in a whole bunch of different blast formats (see Table  
1), and it is possible to write out in HTML. See:

http://www.bioperl.org/wiki/HOWTO:SearchIO#Writing_and_formatting_output


I'm not sure what you mean by the most informative blast output. If  
you mean which one gives the most information, I'm pretty sure the  
standard Blast report has everything.


Dave


From neetisomaiya at gmail.com  Tue Dec 12 07:09:39 2006
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Tue, 12 Dec 2006 17:39:39 +0530
Subject: [Bioperl-l] problem in running needle
Message-ID: <764978cf0612120409tc857053s7059e62a7f8aafc8@mail.gmail.com>

I am trying to run needle for the attached two sequence files, on a linux
machine. It says "Uncaught exception:  Assertion failed, raised at ajmem.c:187".
Can anyone tell me what could be causing this?

-- 
-Neeti
Even my blood says, B positive
-------------- next part --------------
A non-text attachment was scrubbed...
Name: SEQ_1.REF
Type: application/octet-stream
Size: 44208 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061212/2f733c0d/attachment-0006.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: seq_of_contig11
Type: application/octet-stream
Size: 44344 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061212/2f733c0d/attachment-0007.obj>

From cjfields at uiuc.edu  Tue Dec 12 15:55:07 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 12 Dec 2006 14:55:07 -0600
Subject: [Bioperl-l] problem in running needle
In-Reply-To: <764978cf0612120409tc857053s7059e62a7f8aafc8@mail.gmail.com>
References: <764978cf0612120409tc857053s7059e62a7f8aafc8@mail.gmail.com>
Message-ID: <E5BB270E-46D1-4A8C-A268-938FF8235B67@uiuc.edu>


On Dec 12, 2006, at 6:09 AM, neeti somaiya wrote:

> I am trying to run needle for the attached two sequence files, on a
> linux machine. It says "Uncaught exception:  Assertion failed, raised
> at ajmem.c:187".
> Can anyone tell me what this could be coz of?
>
> -- 
> -Neeti
> Even my blood says, B positive
> <SEQ_1.REF>
> <seq_of_contig11>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

This would be an EMBOSS error, not a BioPerl error.  Maybe the emboss  
list is the best place for this question?

http://emboss.open-bio.org/mailman/listinfo/emboss

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Tue Dec 12 16:30:30 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 12 Dec 2006 15:30:30 -0600
Subject: [Bioperl-l] Using frame info from GFF in getting
	aSeq->spliced_seq
In-Reply-To: <B9182BFF5B004245BABC12956EA6322E0254DF2D@huls5.nucleus.harvard.edu>
References: <B9182BFF5B004245BABC12956EA6322E0254DF2D@huls5.nucleus.harvard.edu>
Message-ID: <093AE0FF-3C88-4F97-B33F-836B295E3DE3@uiuc.edu>


On Dec 12, 2006, at 10:05 AM, Amir Karger wrote:

>> Note that Jason suggested passing the frame/phase of the first exon
>> to translate(), not spliced_seq().  I also suggested translate().
>
> You're right. This brings the number of translated polypeptide  
> sequences
> that have lots of *s in them to 9 instead of 90.
>
> I guess I have two requests here. The first is, if a person wants  
> to see
> exactly which bps are translated to aas -- a nucelotide sequece of
> exactly 3N bp starting (usually) with ATG -- then they might want an
> argument to spliced_seq that skips the first one or two bp when
> necessary. After all, they might want to study the DNA, not the
> peptides.
>
> The second request is for "intelligent objects". If my SeqFeatures  
> know
> that they're in phase 1, then when I call spliced_seq I want the
> resulting objects to know that they're phase one, such that when I  
> call
> translate, Bioperl automatically skips the first bp or two.  
> Admittedly,
> there might be big ramifications to this.
>
> Both requests of course made in the knowledge that Bioperl is open
> source & developers have a lot to do with their time.
>
> -Amir Karger
...

Amir,

I committed some code to CVS where I added a -phase parameter option  
to SeqFeatureI::spliced_seq().  I also added some tests to SeqFeature.t.

If you run the following after creating the SeqFeature object $sf  
(the seq object is $seq):

$sf->attach_seq($seq);

for my $phase (-1..3) {
     my $spliced = $sf->spliced_seq(-phase => $phase);
     print $spliced->seq,"\n";
     print $spliced->translate->seq,"\n";
}

You should get warnings for any value other than 0, 1, or 2.

I'll also note that the sequence you are having trouble with  
(sbay_c127) is 712 bp, so it doesn't contain the complete coding  
region.  I used it in the test case in SeqFeature.t.

Chris


From boris.steipe at utoronto.ca  Tue Dec 12 16:26:14 2006
From: boris.steipe at utoronto.ca (Boris Steipe)
Date: Tue, 12 Dec 2006 16:26:14 -0500
Subject: [Bioperl-l] problem in running needle
In-Reply-To: <764978cf0612120409tc857053s7059e62a7f8aafc8@mail.gmail.com>
References: <764978cf0612120409tc857053s7059e62a7f8aafc8@mail.gmail.com>
Message-ID: <F0B737D0-8555-4723-8B8D-50DAFF522AC8@utoronto.ca>

Looks like a memory allocation problem. Your whole sequence is in one  
single line, throwing a few linebreaks in there every 80th character  
or so will probably do the trick.
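If overly long lines are indeed the cause (this is Boris's diagnosis, not confirmed here), a small plain-Perl sketch like the following can re-wrap a FASTA file before it is passed to needle. The script name and the default width of 80 are assumptions.

```perl
use strict;
use warnings;

# Re-wrap the sequence lines of FASTA text to a fixed width (default 80),
# leaving '>' header lines untouched.
sub wrap_fasta {
    my ($text, $width) = @_;
    $width ||= 80;
    my @out;
    my $seq = '';
    my $flush = sub {
        push @out, $seq =~ /(.{1,$width})/g;    # split accumulated sequence into chunks
        $seq = '';
    };
    for my $line (split /\n/, $text) {
        if ($line =~ /^>/) {
            $flush->();
            push @out, $line;
        }
        else {
            (my $clean = $line) =~ s/\s+//g;
            $seq .= $clean;
        }
    }
    $flush->();
    return join("\n", @out) . "\n";
}

# Usage (hypothetical file names): perl wrap_fasta.pl SEQ_1.REF > SEQ_1.wrapped.fa
print wrap_fasta(do { local $/; <> }) if @ARGV;
```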

HTH
Boris

On 12-Dec-06, at 7:09 AM, neeti somaiya wrote:

> I am trying to run needle for the attached two sequence files, on a
> linux machine. It says "Uncaught exception:  Assertion failed, raised
> at ajmem.c:187".
> Can anyone tell me what this could be coz of?
>
> -- 
> -Neeti
> Even my blood says, B positive
> <SEQ_1.REF>
> <seq_of_contig11>


From Derek.Fairley at bll.n-i.nhs.uk  Wed Dec 13 05:00:16 2006
From: Derek.Fairley at bll.n-i.nhs.uk (Fairley, Derek)
Date: Wed, 13 Dec 2006 10:00:16 -0000
Subject: [Bioperl-l] BLAST reports
In-Reply-To: <457EE37F.4020000@sfu.ca>
Message-ID: <B4B8F9CCEDA9334F819017E5D711AD1C32C657@bllmail.bll.n-i.nhs.uk>

Hi Keith,

>I would like to manipulate my blast results with bioperl but would also
>like to have the html output of the blast.  What would be the best way
>of going about this, as I don't see any write functions in any of the
>blast modules I have looked at.  Would it be better to create my own
>html layout from the blast data then attempt to recover this from bioperl?

Take a look at some of the example scripts here:
http://www.bioperl.org/wiki/Bioperl_scripts
Depending on your Bioperl installation, you may already have these in your /scripts directory or similar. The /examples/searchio/htmlwriter.pl script may be a good starting point.

>p.s. - does anyone know what the most informative blast "alignment view"
>output is? xml i suppose?

Assuming you want to get the HSPs, parsing blastxml reports seems to be the most reliable approach. Again, there's a useful script for this: take a look at /scripts/utilities/search2alnblocks.pls.

Derek.




From cjfields at uiuc.edu  Wed Dec 13 13:02:14 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 13 Dec 2006 12:02:14 -0600
Subject: [Bioperl-l] Proposal for Meta data
Message-ID: <B3EF69DB-9C01-4F42-A4E4-898613D872F9@uiuc.edu>

I am working on a few things related to RNA structure and
have a few questions, specifically about Meta data.  This is sort of
a proposal, and I would like to get everybody's thoughts on it.
Jason, sorry to bug you but I thought
it might be something that would be of use phylohackathon-wise.

Heikki has several modules present which add meta data to sequences  
(Bio::Seq::Meta).  In this case, the meta data is stored as a string  
(Bio::Seq::Meta) or an array (Bio::Seq::Meta::Array).  In both cases  
you can have multiple types of meta data for a sequence based on a  
particular tag.  However, this also assumes that the meta data is  
somehow attached strictly to sequence data of some type.  It also  
doesn't allow for having mixed meta data types for a single sequence,  
such as attaching array data and string data to the same sequence.

Hence, I was thinking of having a simple, generic meta data type  
(Bio::Meta), one which could encompass simple strings  
(Bio::Meta::Simple), arrays (Bio::Meta::Array), or any other  
structured type of data.  This could be used to annotate any  
PrimarySeq, LocatableSeq, SimpleAlign, SeqFeature, or what-have-you,  
maybe in a collection (similar to AnnotationCollection).  I thought  
something like this may be of general use for any PrimarySeq  
(quality, structure), alignments like NEXUS and Stockholm,  
SeqFeatures where structure could be stored (tRNA or riboswitches), etc.
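A minimal plain-Perl sketch of the "mixed types on one sequence" idea. The package name HypotheticalMeta::Collection and its methods are invented for illustration. They are not part of BioPerl and are not the Bio::Seq::Meta interface. Each tag maps to either a plain string or an array reference, so string and array meta data can coexist for the same sequence.

```perl
use strict;
use warnings;

package HypotheticalMeta::Collection;   # hypothetical, not a BioPerl class

sub new {
    my ($class) = @_;
    return bless { entries => {} }, $class;
}

# Store meta data under a tag; the value may be a string or an array reference.
sub add_meta {
    my ($self, $tag, $value) = @_;
    my $type = ref $value;
    die "unsupported meta data type '$type' for tag '$tag'\n"
        unless $type eq '' || $type eq 'ARRAY';
    $self->{entries}{$tag} = $value;
    return $self;
}

# Retrieve the value stored under a tag (undef if absent).
sub get_meta {
    my ($self, $tag) = @_;
    return $self->{entries}{$tag};
}

# Sorted list of the tags currently stored.
sub tags {
    my ($self) = @_;
    return sort keys %{ $self->{entries} };
}

package main;

# One sequence carrying both string meta data (secondary structure)
# and array meta data (per-residue quality values).
my $meta = HypotheticalMeta::Collection->new;
$meta->add_meta(structure => '((..))');
$meta->add_meta(quality   => [30, 30, 20, 20, 30, 30]);
print join(',', $meta->tags), "\n";   # prints "quality,structure"
```

Whether such a container belongs under Bio::Annotation or a new namespace is exactly the open question below. The sketch only shows that a per-tag store does not need to commit to a single data type.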

However, this also seems to fall into the category of sequence  
annotation.  So, would it be better to have a set of Bio::Annotation  
classes used for this purpose?

Flames and jibes welcome; I'm wearing my asbestos suit today....

chris


From stewarta at nmrc.navy.mil  Wed Dec 13 20:06:14 2006
From: stewarta at nmrc.navy.mil (Andrew Stewart)
Date: Wed, 13 Dec 2006 20:06:14 -0500
Subject: [Bioperl-l] StandAloneBlast->blastall array of Bio::Seq objects
Message-ID: <3A26D139-1963-4E47-8A70-910B3886AE18@nmrc.navy.mil>

I am trying to StandAloneBlast->blastall an array of Bio::Seq  
objects.  The documentation claims that blastall can be passed a file  
name, a Bio::Seq object, or an array of Bio::Seq objects, while the  
usage suggests that a reference to an array of Bio::Seq objects is  
what must be passed to blastall.

(from http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Tools/Run/StandAloneBlast.html#POD5)
Usage:
	$seq_array_ref = \@seq_array;  # where @seq_array is an array of Bio::Seq objects
	$blast_report = $factory->blastall(\@seq_array);

Should this be...
$report = $factory->blastall(@seq_array);
or
$report = $factory->blastall(\@seq_array);
???

And if you are blastall'ing an array of Seq objects, then does  
blastall just return one big blast report or should I be expecting an  
array of blast reports?

I've tried $report = $factory->blastall(@seq_array); which seems to  
work ok, except that when I process the results, there are only  
results for the first Seq object in the array.


-Andrew

--
Andrew Stewart
Research Assistant, Genomics Team
Navy Medical Research Center (NMRC)
Biological Defense Research Directorate (BDRD)
BDRD Annex
12300 Washington Avenue, 2nd Floor
Rockville, MD 20852

email: stewarta at nmrc.navy.mil
phone: 301-231-6700 Ext 270


From arareko at campus.iztacala.unam.mx  Wed Dec 13 20:37:27 2006
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Wed, 13 Dec 2006 19:37:27 -0600
Subject: [Bioperl-l] BioPerl page in Wikipedia
Message-ID: <4580AAD7.3000900@campus.iztacala.unam.mx>

Folks,

I've updated the BioPerl page on Wikipedia a little. I think it would 
be nice if we expanded the article further, since it's tagged as a 
"stub". Here's the link:

http://en.wikipedia.org/wiki/BioPerl

Cheers,
Mauricio.

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From lubapardo at gmail.com  Thu Dec 14 05:54:07 2006
From: lubapardo at gmail.com (Luba Pardo)
Date: Thu, 14 Dec 2006 11:54:07 +0100
Subject: [Bioperl-l] (no subject)
Message-ID: <58ff33550612140254gc7c52afs279b65390d40cda1@mail.gmail.com>

Hello,
I am new to bioperl and I have been trying to run the examples available in
bptutorial.pl and other basic literature. I have installed the latest
release of bioperl, 1.5.2, in a usr/local/src directory. Any time I try to
retrieve entries from the SwissProt and EMBL databases I get an error; with
GenBank it seems to be fine. I wonder if the installation was not successful,
as I would expect access to these databases to be included in the modules of
BioPerl Core. In addition, I would like to ask whether, to run ClustalW within
BioPerl, I need to download and install it in the same directory in which I
have installed bioperl, or whether it is included in the Bio::Align module.
I am not sure whether this is the best place to ask these very basic
questions. If not, could anyone please refer me to the proper email
address?
Thank you very much in advance.

Luba Pardo MD, PhD


From bix at sendu.me.uk  Thu Dec 14 09:10:43 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 14 Dec 2006 09:10:43 -0500
Subject: [Bioperl-l] StandAloneBlast->blastall array of Bio::Seq objects
In-Reply-To: <3A26D139-1963-4E47-8A70-910B3886AE18@nmrc.navy.mil>
References: <3A26D139-1963-4E47-8A70-910B3886AE18@nmrc.navy.mil>
Message-ID: <45815B63.1020003@sendu.me.uk>

Andrew Stewart wrote:
> I am trying to StandAloneBlast->blastall an array of Bio::Seq  
> objects.  The documentation claims that blastall can be passed a file  
> name,

You're referring to 'In addition, sequence input may be in the form of 
either a Bio::Seq object or or an array of Bio::Seq objects'? I agree 
it's not clear, but supplying a reference to an array is still supplying 
an array. Anyway, I'll clarify it.


In any case, the usage for the method is what you should pay attention to:

> Usage:
> 	$seq_array_ref = \@seq_array;  # where @seq_array is an array of Bio::Seq objects
> 	$blast_report = $factory->blastall(\@seq_array);
> 
> Should this be...
> $report = $factory->blastall(@seq_array);
> or
> $report = $factory->blastall(\@seq_array);
> ???

It should be exactly what it says. A reference to the array.
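The following sketch may explain why passing @seq_array appears to work but only the first sequence gets results. Perl flattens an array into the argument list. A method that reads a single input argument therefore sees only the first element. The sketch assumes, for illustration, a method that takes its input as the one argument after $self and accepts either a single object or an array reference. FakeFactory is a hypothetical stand-in that counts queries instead of running BLAST.

```perl
use strict;
use warnings;

package FakeFactory;   # hypothetical stand-in, not Bio::Tools::Run::StandAloneBlast

sub new { return bless {}, shift }

# Assumed here: the method reads exactly one input argument.
# An array reference is expanded; anything else is treated as one query.
sub blastall {
    my ($self, $input) = @_;
    my @queries = ref($input) eq 'ARRAY' ? @$input : ($input);
    return scalar @queries;   # number of queries that would be written out
}

package main;

my @seq_array = qw(seq1 seq2 seq3);   # stand-ins for Bio::Seq objects
my $factory   = FakeFactory->new;

my $n_flat = $factory->blastall(@seq_array);    # extra elements silently ignored
my $n_ref  = $factory->blastall(\@seq_array);   # whole array passed as one argument
print "flattened list: $n_flat, array reference: $n_ref\n";   # prints 1 and 3
```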


> And if you are blastall'ing an array of Seq objects, then does  
> blastall just return one big blast report or should I be expecting an  
> array of blast reports?

Returns : Reference to a Blast object or BPlite object
            containing the blast report.

That means, just one big object, not an array.


From bix at sendu.me.uk  Thu Dec 14 09:42:18 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 14 Dec 2006 09:42:18 -0500
Subject: [Bioperl-l] (no subject)
In-Reply-To: <58ff33550612140254gc7c52afs279b65390d40cda1@mail.gmail.com>
References: <58ff33550612140254gc7c52afs279b65390d40cda1@mail.gmail.com>
Message-ID: <458162CA.5030803@sendu.me.uk>

Luba Pardo wrote:
> Hello, I am new to bioperl and I have been trying to run the examples
> available in bptutorial.pl and other basic literature. I have
> installed the latest release of bioperl 1.5.2 in a usr/local/src
> directory. Any time I try to retrieve the SwissProt and EMBL
> databases it gives me an error.

What exactly are you trying? Paste some relevant code along with the
exact error message you get when running that code.


> I wonder if the installation was not successful, as  I would expect
> that these databases accesses were included in the modules of BioPerl
> Core.

They should work with just core installed.


> In addition, I would like to ask whether to run ClustalW within
> the setting of BioPerl I need to download and install it in the same 
> directory in which I have installed bioperl, or is it included in the
> module of Bio::Align.

The ClustalW module is in the bioperl-run package, so install that in
the same way you installed bioperl (core). The actual ClustalW program 
you need to download and install according to its own instructions. You 
let Bioperl know where you installed ClustalW by, e.g., setting an 
environment variable.

See 
http://doc.bioperl.org/bioperl-run/Bio/Tools/Run/Alignment/Clustalw.html#DESCRIPTION
for details.


> I am not sure whether this is the best place to ask these very basic 
> questions. If not, could anyone please refer me to the proper e mail 
> account?

It's certainly the correct place; I hope we can resolve your problems.


From neetisomaiya at gmail.com  Thu Dec 14 03:02:37 2006
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Thu, 14 Dec 2006 13:32:37 +0530
Subject: [Bioperl-l] needle parser in bioperl?
In-Reply-To: <C60106D0-9A11-4B67-8B3D-87DF885F1D40@uiuc.edu>
References: <764978cf0612120414o1eb77e28l1132eb4fa4cd9e1d@mail.gmail.com>
	<C60106D0-9A11-4B67-8B3D-87DF885F1D40@uiuc.edu>
Message-ID: <764978cf0612140002m2a8c4268ma4b55f12412c5e9d@mail.gmail.com>

How do I run needle specifying that I want the MSF format, on a linux box?
The help doesn't show me any format option. Is there anything available to
parse MSF format?
Please find an example alignment file attached. Here the seq_of_contig
aligns with the reference sequence (i.e. SEQ_1.REF) starting at position
(coordinate) 8918 of SEQ_1.REF. I basically want this coordinate from the
output alignment, how can I parse the result to get this?

On 12/12/06, Chris Fields <cjfields at uiuc.edu> wrote:
>
>
> On Dec 12, 2006, at 6:14 AM, neeti somaiya wrote:
>
> > Hi,
> >
> > Does anyone know of a bioperl parser for needle output? Basically I
> > want to know where the target sequence aligns on the template (i.e.
> > the coordinate on the template where the target aligns).
> >
> > --
> > -Neeti
> > Even my blood says, B positive
>
> I answered this a number of months back:
>
> http://tinyurl.com/yzlbx5
>
> Basically, newer versions of EMBOSS have changed the output for the
> AlignIO::emboss parser (which parses needle).  I don't believe the
> parser has been fixed to deal with that, but Jason has pointed out
> you can use MSF output when running needle, then parse using AlignIO
> with the format set to 'msf'.
>
> chris
>


-- 
-Neeti
Even my blood says, B positive
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 3.out
Type: application/octet-stream
Size: 204960 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061214/1416cef5/attachment-0003.obj>

From stewarta at nmrc.navy.mil  Thu Dec 14 11:34:43 2006
From: stewarta at nmrc.navy.mil (Andrew Stewart)
Date: Thu, 14 Dec 2006 11:34:43 -0500
Subject: [Bioperl-l] StandAloneBlast->blastall array of Bio::Seq objects
In-Reply-To: <45815B63.1020003@sendu.me.uk>
References: <3A26D139-1963-4E47-8A70-910B3886AE18@nmrc.navy.mil>
	<45815B63.1020003@sendu.me.uk>
Message-ID: <2DAAB59E-A4F9-4E2F-B1E5-F34376B5D1E0@nmrc.navy.mil>

Thanks for the reply, Sendu.

So I've tried passing a reference to an array of Seq objects with the  
following code...
	
	push @blast_run, $factory->blastall(\@query);  # where @query is an array of Bio::Seq objects

(In case you're wondering, I'm pushing the report into an array of  
reports because I'm running several instances of blastall with  
different parameters each time.)

....and it throws me the following exception...

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: blastall call crashed: 11 /common/bin/blastall -p  blastp  -d  "/common/data/BACILLUS.pep"  -i  /tmp/Z69hzaqEbR  -o  /tmp/02Zja7AF3E

STACK: Error::throw
STACK: Bio::Root::Root::throw /sw/lib/perl5/5.8.6/Bio/Root/Root.pm:328
STACK: Bio::Tools::Run::StandAloneBlast::_runblast /sw/lib/perl5/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:759
STACK: Bio::Tools::Run::StandAloneBlast::_generic_local_blast /sw/lib/perl5/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:706
STACK: Bio::Tools::Run::StandAloneBlast::blastall /sw/lib/perl5/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:557
STACK: main::run_blastall ./new_blast_script.pl:215
STACK: ./new_blast_script.pl:115
-----------------------------------------------------------

And % more -Nl 759 /path/to/Bio/Tools/Run/StandAloneBlast.pm returns...
757         my $status = system($commandstring);
758
759         $self->throw("$executable call crashed: $? $commandstring\n")
760           unless ($status==0) ;

So it looks like the system call isn't returning a happy $status.  At  
this point I'm pretty much stuck, though.  Blastall works just fine  
if I only send it a single Seq object.  Looking at _setinput, it  
appears a reference to an array of Seq objects should end up creating  
a multi-fasta file.  The only possibilities I can think of to explain  
this are...

- The -i file isn't being created for some reason when a (ref to an) array
of Seqs is passed
- There is something wrong with the -i file that is created and sent  
to blastall.
- Something else is wrong with the $commandstring being sent to the  
system call.

Does anyone see something here that I don't?


Thanks,
Andrew


On Dec 14, 2006, at 9:10 AM, Sendu Bala wrote:

> Andrew Stewart wrote:
>> I am trying to StandAloneBlast->blastall an array of Bio::Seq   
>> objects.  The documentation claims that blastall can be passed a  
>> file  name,
>
> You're referring to 'In addition, sequence input may be in the form  
> of either a Bio::Seq object or or an array of Bio::Seq objects'? I  
> agree it's not clear, but supplying a reference to an array is still  
> supplying an array. Anyway, I'll clarify it.
>
>
> In any case, the usage for the method is what you should pay  
> attention to:
>
>> Usage:
>> 	$seq_array_ref = \@seq_array;  # where @seq_array is an array of Bio::Seq objects
>> 	$blast_report = $factory->blastall(\@seq_array);
>> Should this be...
>> $report = $factory->blastall(@seq_array);
>> or
>> $report = $factory->blastall(\@seq_array);
>> ???
>
> It should be exactly what it says. A reference to the array.
>
>
>> And if you are blastall'ing an array of Seq objects, then does   
>> blastall just return one big blast report or should I be expecting  
>> an  array of blast reports?
>
> Returns : Reference to a Blast object or BPlite object
>            containing the blast report.
>
> That means, just one big object, not an array.


--
Andrew Stewart
Research Assistant, Genomics Team
Navy Medical Research Center (NMRC)
Biological Defense Research Directorate (BDRD)
BDRD Annex
12300 Washington Avenue, 2nd Floor
Rockville, MD 20852

email: stewarta at nmrc.navy.mil
phone: 301-231-6700 Ext 270


From cjfields at uiuc.edu  Thu Dec 14 12:03:12 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 14 Dec 2006 11:03:12 -0600
Subject: [Bioperl-l] StandAloneBlast->blastall array of Bio::Seq objects
In-Reply-To: <2DAAB59E-A4F9-4E2F-B1E5-F34376B5D1E0@nmrc.navy.mil>
References: <3A26D139-1963-4E47-8A70-910B3886AE18@nmrc.navy.mil>
	<45815B63.1020003@sendu.me.uk>
	<2DAAB59E-A4F9-4E2F-B1E5-F34376B5D1E0@nmrc.navy.mil>
Message-ID: <88DDC5EA-C4BE-48FB-B259-B6584F5F86B1@uiuc.edu>


On Dec 14, 2006, at 10:34 AM, Andrew Stewart wrote:

> Thanks for the reply, Sendu.
>
> So I've tried passing a reference to an array of Seq objects with the
> following code...
> 	
> 	push @blast_run, $factory->blastall(\@query);  # where @query is an array of Bio::Seq objects
>
> (In case you're wondering, I'm pushing the report into an array of
> reports because I'm running several instances of blastall with
> different parameters each time.)
>
> ....and it throws me the following exception...
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: blastall call crashed: 11 /common/bin/blastall -p  blastp  -d  "/common/data/BACILLUS.pep"  -i  /tmp/Z69hzaqEbR  -o  /tmp/02Zja7AF3E
>
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /sw/lib/perl5/5.8.6/Bio/Root/Root.pm:328
> STACK: Bio::Tools::Run::StandAloneBlast::_runblast /sw/lib/perl5/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:759
> STACK: Bio::Tools::Run::StandAloneBlast::_generic_local_blast /sw/lib/perl5/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:706
> STACK: Bio::Tools::Run::StandAloneBlast::blastall /sw/lib/perl5/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:557
> STACK: main::run_blastall ./new_blast_script.pl:215
> STACK: ./new_blast_script.pl:115
> -----------------------------------------------------------
>
> And % more -Nl 759 /path/to/Bio/Tools/Run/StandAloneBlast.pm  
> returns...
> 757         my $status = system($commandstring);
> 758
> 759         $self->throw("$executable call crashed: $? $commandstring\n")
> 760           unless ($status==0) ;
>
> So it looks like the system call isn't returning a happy $status.  At
> this point I'm pretty much stuck, though.  Blastall works just fine
> if I only send it a single Seq object.  Looking at _setinput, it
> appears a reference to an array of Seq objects should end up creating
> a multi-fasta file.  The only possibilities I can think of to explain
> this are...
>
> - The -i file isn't being created for some reason when a (ref to an) array
> of Seqs is passed
> - There is something wrong with the -i file that is created and sent
> to blastall.
> - Something else is wrong with the $commandstring being sent to the
> system call.
>
> Does anyone see something here that I don't?

The error pops up when the executable returns a bad status, so maybe  
it's choking on too many input sequences (i.e. Bioperl is doing  
everything correctly, but you are attempting to BLAST too many  
sequences in one go).  How many sequences are you attempting to use  
as input?  What happens when you use fewer input sequences?

chris


From stewarta at nmrc.navy.mil  Thu Dec 14 12:49:45 2006
From: stewarta at nmrc.navy.mil (Andrew Stewart)
Date: Thu, 14 Dec 2006 12:49:45 -0500
Subject: [Bioperl-l] StandAloneBlast->blastall array of Bio::Seq objects
In-Reply-To: <88DDC5EA-C4BE-48FB-B259-B6584F5F86B1@uiuc.edu>
References: <3A26D139-1963-4E47-8A70-910B3886AE18@nmrc.navy.mil>
	<45815B63.1020003@sendu.me.uk>
	<2DAAB59E-A4F9-4E2F-B1E5-F34376B5D1E0@nmrc.navy.mil>
	<88DDC5EA-C4BE-48FB-B259-B6584F5F86B1@uiuc.edu>
Message-ID: <704E0191-A0E3-4DD2-A8F4-A0B9BE8E3AEE@nmrc.navy.mil>

> So can you look at the tempfile that is created and see if it is sane?
>
> Set -save_tempfiles => 1 when you initialize the factory object or do
> $factory->save_tempfiles(1)
> before calling the blastall.
>
> -jason
>

Jason,
I was actually wondering how to do that.  Thanks.  Odd though, it  
still doesn't seem to be saving the tempfiles.  Might not matter  
though, because...

> The error pops up when the executable returns a bad status, so  
> maybe it's choking on too many input sequences (i.e. Bioperl is  
> doing everything correctly, but you are attempting to BLAST too  
> many sequences in one go).  How many sequences are you attempting  
> to use as input?  What happens when you use fewer input sequences?
>
> chris
>

I was processing 738 sequences for input.  I cut that down to 20  
sequences and I'm getting some other exception thrown further  
downstream, so it appears you may be correct.  You don't happen to  
know what the max number of sequences that blastall allows for input,  
would ya? ;)  I suppose I'll have to break @query down into smaller  
doses or something.

Thanks,
Andrew


On Dec 14, 2006, at 12:03 PM, Chris Fields wrote:

>
> On Dec 14, 2006, at 10:34 AM, Andrew Stewart wrote:
>
>> Thanks for the reply, Sendu.
>>
>> So I've tried passing a reference to an array of Seq objects with the
>> following code...
>> 	
>> 	push @blast_run, $factory->blastall(\@query);  # where @query is an array of Bio::Seq objects
>>
>> (In case you're wondering, I'm pushing the report into an array of
>> reports because I'm running several instances of blastall with
>> different parameters each time.)
>>
>> ....and it throws me the following exception...
>>
>> ------------- EXCEPTION: Bio::Root::Exception -------------
>> MSG: blastall call crashed: 11 /common/bin/blastall -p  blastp  -d  "/common/data/BACILLUS.pep"  -i  /tmp/Z69hzaqEbR  -o  /tmp/02Zja7AF3E
>>
>> STACK: Error::throw
>> STACK: Bio::Root::Root::throw /sw/lib/perl5/5.8.6/Bio/Root/Root.pm:328
>> STACK: Bio::Tools::Run::StandAloneBlast::_runblast /sw/lib/perl5/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:759
>> STACK: Bio::Tools::Run::StandAloneBlast::_generic_local_blast /sw/lib/perl5/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:706
>> STACK: Bio::Tools::Run::StandAloneBlast::blastall /sw/lib/perl5/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:557
>> STACK: main::run_blastall ./new_blast_script.pl:215
>> STACK: ./new_blast_script.pl:115
>> -----------------------------------------------------------
>>
>> And % more -Nl 759 /path/to/Bio/Tools/Run/StandAloneBlast.pm  
>> returns...
>> 757         my $status = system($commandstring);
>> 758
>> 759         $self->throw("$executable call crashed: $? $commandstring\n")
>> 760           unless ($status==0) ;
>>
>> So it looks like the system call isn't returning a happy $status.  At
>> this point I'm pretty much stuck, though.  Blastall works just fine
>> if I only send it a single Seq object.  Looking at _setinput, it
>> appears a reference to an array of Seq objects should end up creating
>> a multi-fasta file.  The only possibilities I can think of to explain
>> this are...
>>
>> - The -i file isn't being created for some reason when a (ref to an) array
>> of Seqs is passed
>> - There is something wrong with the -i file that is created and sent
>> to blastall.
>> - Something else is wrong with the $commandstring being sent to the
>> system call.
>>
>> Does anyone see something here that I don't?
>
> The error pops up when the executable returns a bad status, so  
> maybe it's choking on too many input sequences (i.e. Bioperl is  
> doing everything correctly, but you are attempting to BLAST too  
> many sequences in one go).  How many sequences are you attempting  
> to use as input?  What happens when you use fewer input sequences?
>
> chris
>


--
Andrew Stewart
Research Assistant, Genomics Team
Navy Medical Research Center (NMRC)
Biological Defense Research Directorate (BDRD)
BDRD Annex
12300 Washington Avenue, 2nd Floor
Rockville, MD 20852

email: stewarta at nmrc.navy.mil
phone: 301-231-6700 Ext 270


From Derek.Fairley at bll.n-i.nhs.uk  Thu Dec 14 12:58:10 2006
From: Derek.Fairley at bll.n-i.nhs.uk (Fairley, Derek)
Date: Thu, 14 Dec 2006 17:58:10 -0000
Subject: [Bioperl-l] needle parser in bioperl?
In-Reply-To: <764978cf0612140002m2a8c4268ma4b55f12412c5e9d@mail.gmail.com>
Message-ID: <B4B8F9CCEDA9334F819017E5D711AD1C32C669@bllmail.bll.n-i.nhs.uk>

Neeti,

 
From http://emboss.sourceforge.net/apps/cvs/needle.html:

 
"The results can be output in one of several styles by using the
command-line qualifier -aformat xxx, where 'xxx' is replaced by the name
of the required format. Some of the alignment formats can cope with an
unlimited number of sequences, while others are only for pairs of
sequences. 

 
The available multiple alignment format names are: unknown, multiple,
simple, fasta, msf, trace, srs 

 
The available pairwise alignment format names are: pair, markx0, markx1,
markx2, markx3, markx10, srspair, score 

 
See: http://emboss.sf.net/docs/themes/AlignFormats.html for further
information on alignment formats."

 
Not sure based on this whether you can get pairwise alignment in .msf
format; can't think of a good reason why not. The BioPerl Bio::AlignIO
module will allow you to parse alignments in .msf format.
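Once the alignment is parsed, the coordinate you're after is simple column arithmetic. With Bio::AlignIO and -format => 'msf', each sequence's gapped row should be available from its seq method. The plain-Perl sketch below assumes you already have the two gapped rows and the reference's start coordinate. The function name ref_coord_of_target_start is hypothetical. It treats '.', '-' and '~' as gap characters and returns the reference coordinate of the first reference residue at or after the column where the target's first residue sits.

```perl
use strict;
use warnings;

my $GAP = qr/[-.~]/;   # gap characters assumed for MSF/EMBOSS output

# Given the gapped reference and target rows of a pairwise alignment and the
# coordinate of the reference's first residue, return the reference coordinate
# where the target starts aligning (undef if the target row is all gaps).
sub ref_coord_of_target_start {
    my ($ref_aln, $tgt_aln, $ref_start) = @_;
    die "rows differ in length\n" unless length($ref_aln) == length($tgt_aln);
    $ref_start = 1 unless defined $ref_start;
    my $ref_seen  = 0;   # reference residues in columns before the current one
    my $in_target = 0;   # set once the target's first residue has been reached
    for my $col (0 .. length($ref_aln) - 1) {
        my $r = substr($ref_aln, $col, 1);
        my $t = substr($tgt_aln, $col, 1);
        $in_target = 1 if $t !~ $GAP;
        # First reference residue at or after the target's start column.
        return $ref_start + $ref_seen if $in_target && $r !~ $GAP;
        $ref_seen++ if $r !~ $GAP;
    }
    return;
}

my $coord = ref_coord_of_target_start('ACGTACGTAC', '....ACG...', 1);
print "$coord\n";   # prints 5
```

If the reference row has a gap where the target begins, the sketch reports the next reference residue. Whether that is the convention you want depends on how you intend to use the coordinate.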

 
HTH,

 
Derek.

 
-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of neeti somaiya
Sent: 14 December 2006 08:03
To: Chris Fields; bioperl-l
Subject: Re: [Bioperl-l] needle parser in bioperl?

 
How do I run needle specifying that I want the MSF format, on a linux
box? The help doesn't show me any format option. Is there anything
available to parse MSF format?
Please find an example alignment file attached. Here the seq_of_contig
aligns with the reference sequence (i.e. SEQ_1.REF) starting at position
(coordinate) 8918 of SEQ_1.REF. I basically want this coordinate from
the output alignment, how can I parse the result to get this?

On 12/12/06, Chris Fields <cjfields at uiuc.edu> wrote:
>
>
> On Dec 12, 2006, at 6:14 AM, neeti somaiya wrote:
>
> > Hi,
> >
> > Does anyone know of a bioperl parser for needle output? Basically I
> > want to know where the target sequence aligns on the template (i.e.
> > the coordinate on the template where the target aligns).
> >
> > --
> > -Neeti
> > Even my blood says, B positive
>
> I answered this a number of months back:
>
> http://tinyurl.com/yzlbx5
>
> Basically, newer versions of EMBOSS have changed the output for the
> AlignIO::emboss parser (which parses needle).  I don't believe the
> parser has been fixed to deal with that, but Jason has pointed out
> you can use MSF output when running needle, then parse using AlignIO
> with the format set to 'msf'.
>
> chris
>

-- 
-Neeti
Even my blood says, B positive


From cjfields at uiuc.edu  Thu Dec 14 13:36:09 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 14 Dec 2006 12:36:09 -0600
Subject: [Bioperl-l] StandAloneBlast->blastall array of Bio::Seq objects
In-Reply-To: <704E0191-A0E3-4DD2-A8F4-A0B9BE8E3AEE@nmrc.navy.mil>
References: <3A26D139-1963-4E47-8A70-910B3886AE18@nmrc.navy.mil>
	<45815B63.1020003@sendu.me.uk>
	<2DAAB59E-A4F9-4E2F-B1E5-F34376B5D1E0@nmrc.navy.mil>
	<88DDC5EA-C4BE-48FB-B259-B6584F5F86B1@uiuc.edu>
	<704E0191-A0E3-4DD2-A8F4-A0B9BE8E3AEE@nmrc.navy.mil>
Message-ID: <97FE8E3C-58F2-406F-909D-DD479E594530@uiuc.edu>


On Dec 14, 2006, at 11:49 AM, Andrew Stewart wrote:

>> So can you look at the tempfile that is created and see if it is  
>> sane?
>>
>> Set -save_tempfiles => 1 when you initialize the factory object  
>> or do
>> $factory->save_tempfiles(1)
>> before calling the blastall.
>>
>> -jason
>>
>
> Jason,
> I was actually wondering how to do that.  Thanks.  Odd though, it
> still doesn't seem to be saving the tempfiles.  Might not matter

That needs to be checked out.  Can anyone verify that?

>> The error pops up when the executable returns a bad status, so
>> maybe it's choking on too many input sequences (i.e. Bioperl is
>> doing everything correctly, but you are attempting to BLAST too
>> many sequences in one go).  How many sequences are you attempting
>> to use as input?  What happens when you use fewer input sequences?
>>
>> chris
>>
>
> I was processing 738 sequences for input.  I cut that down to 20
> sequences and I'm getting some other exception thrown further
> downstream, so it appears you may be correct.  You don't happen to
> know what the max number of sequences that blastall allows for input,
> would ya? ;)  I suppose I'll have to break @query down into smaller
> doses or something.
>
> Thanks,
> Andrew

It was a shot in the dark, really.  The fact that the return status  
was bad could be due to a number of problems (permissions issues, bad  
data, etc).  The fact that a single sequence worked indicated that  
permissions and output format likely weren't to blame.  The only  
other thing left was a problem with blastall itself.
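The 11 in the exception message is the raw $? that Perl sets after system() returns. The low 7 bits of $? hold the number of the signal that killed the child. Bit 0x80 flags a core dump, and $? >> 8 is the exit code. So 11 means blastall did not exit with an error code at all. It was killed by signal 11, which on Linux and Mac OS X is SIGSEGV (a segmentation fault). That is consistent with blastall itself crashing. The helper below is a small sketch for decoding such values. decode_child_status is a hypothetical name, and signal names come from Perl's core Config module.

```perl
use strict;
use warnings;
use Config;

# Signal names indexed by signal number, as recorded when perl was built.
my @SIG_NAMES = split ' ', $Config{sig_name};

# Break a raw $? value (as set by system) into its documented parts.
sub decode_child_status {
    my ($status) = @_;
    my $signal = $status & 127;
    return {
        exit_code   => $status >> 8,
        signal      => $signal,
        signal_name => $signal ? $SIG_NAMES[$signal] : undef,
        core_dumped => ($status & 128) ? 1 : 0,
    };
}

my $d = decode_child_status(11);   # the value from the blastall exception
printf "exit=%d signal=%d (%s) core=%d\n",
    @$d{qw(exit_code signal signal_name core_dumped)};
# on Linux this prints: exit=0 signal=11 (SEGV) core=0
```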

BTW, the blast docs do not indicate whether there is a maximum number  
of sequences.  There may be a point where available memory becomes  
the limiting issue.

chris


From vaughn at cshl.edu  Thu Dec 14 14:09:34 2006
From: vaughn at cshl.edu (Matthew Vaughn)
Date: Thu, 14 Dec 2006 14:09:34 -0500
Subject: [Bioperl-l] Bio::SeqFeature::Annotated and mandatory type checking
Message-ID: <637A2459-4115-466F-BD8D-036D5E9114F8@cshl.edu>

Dear all,

I'm trying to bring some of my code into compliance with BioPerl  
1.5.2 and am running into some design decisions that I am unclear on.  
Can I ask why Bio::SeqFeature::Annotated enforces mandatory checking  
of the 'type' against SOFA? It seems to me that this should be  
optional behavior as is the case with the Bio::FeatureIO family. I'd  
be happy to write the patch if there is any agreement with me on this  
case.

Thanks,

Matt

--
Matthew W. Vaughn, Ph.D.
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724

phone: (516) 367-8469


-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 2413 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061214/59a9ac32/attachment-0003.bin>

From jason at bioperl.org  Thu Dec 14 11:59:20 2006
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 14 Dec 2006 11:59:20 -0500
Subject: [Bioperl-l] StandAloneBlast->blastall array of Bio::Seq objects
In-Reply-To: <2DAAB59E-A4F9-4E2F-B1E5-F34376B5D1E0@nmrc.navy.mil>
References: <3A26D139-1963-4E47-8A70-910B3886AE18@nmrc.navy.mil>
	<45815B63.1020003@sendu.me.uk>
	<2DAAB59E-A4F9-4E2F-B1E5-F34376B5D1E0@nmrc.navy.mil>
Message-ID: <640E2BB7-33F3-44C9-B903-9DDA54F02D12@bioperl.org>

So can you look at the tempfile that is created and see if it is sane?

Set -save_tempfiles => 1 when you initialize the factory object or do
$factory->save_tempfiles(1)
before calling the blastall.

-jason
On Dec 14, 2006, at 11:34 AM, Andrew Stewart wrote:

> Thanks for the reply, Sendu.
>
> So I've tried passing a reference to an array of Seq objects with the
> following code...
> 	
> 	push @blast_run, $factory->blastall(\@query);  # where @query is an array of Bio::Seq objects
>
> (In case you're wondering, I'm pushing the report into an array of
> reports because I'm running several instances of blastall with
> different parameters each time.)
>
> ....and it throws me the following exception...
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: blastall call crashed: 11 /common/bin/blastall -p  blastp  -d  "/common/data/BACILLUS.pep"  -i  /tmp/Z69hzaqEbR  -o  /tmp/02Zja7AF3E
>
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /sw/lib/perl5/5.8.6/Bio/Root/Root.pm:328
> STACK: Bio::Tools::Run::StandAloneBlast::_runblast /sw/lib/perl5/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:759
> STACK: Bio::Tools::Run::StandAloneBlast::_generic_local_blast /sw/lib/perl5/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:706
> STACK: Bio::Tools::Run::StandAloneBlast::blastall /sw/lib/perl5/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:557
> STACK: main::run_blastall ./new_blast_script.pl:215
> STACK: ./new_blast_script.pl:115
> -----------------------------------------------------------
>
> And % more -Nl 759 /path/to/Bio/Tools/Run/StandAloneBlast.pm  
> returns...
> 757         my $status = system($commandstring);
> 758
> 759         $self->throw("$executable call crashed: $? $commandstring\n")
> 760           unless ($status==0) ;
>
> So it looks like the system call isn't returning a happy $status.  At
> this point I'm pretty much stuck, though.  Blastall works just fine
> if I only send it a single Seq object.  Looking at _setinput, it
> appears a reference to an array of Seq objects should end up creating
> a multi-fasta file.  The only possibilities I can think of to explain
> this are...
>
> - The -i file isn't being created for some reason when a (ref to an) array
> of Seqs is passed
> - There is something wrong with the -i file that is created and sent
> to blastall.
> - Something else is wrong with the $commandstring being sent to the
> system call.
>
> Does anyone see something here that I don't?
>
>
> Thanks,
> Andrew
>
>
>
> On Dec 14, 2006, at 9:10 AM, Sendu Bala wrote:
>
>> Andrew Stewart wrote:
>>> I am trying to StandAloneBlast->blastall an array of Bio::Seq
>>> objects.  The documentation claims that blastall can be passed a
>>> file  name,
>>
>> You're referring to 'In addition, sequence input may be in the form
>> of either a Bio::Seq object or or an array of Bio::Seq objects'? I
>> agree it's not clear, but supplying a reference to an array is still
>> supplying an array. Anyway, I'll clarify it.
>>
>>
>> In any case, the usage for the method is what you should pay
>> attention to:
>>
>>> Usage:
>>> 	$seq_array_ref = \@seq_array;  # where @seq_array is an array of Bio::Seq objects
>>> 	$blast_report = $factory->blastall(\@seq_array);
>>> Should this be...
>>> $report = $factory->blastall(@seq_array);
>>> or
>>> $report = $factory->blastall(\@seq_array);
>>> ???
>>
>> It should be exactly what it says. A reference to the array.
>>
>>
>>> And if you are blastall'ing an array of Seq objects, then does
>>> blastall just return one big blast report or should I be expecting
>>> an  array of blast reports?
>>
>> Returns : Reference to a Blast object or BPlite object
>>            containing the blast report.
>>
>> That means, just one big object, not an array.
>
>
>
> --
> Andrew Stewart
> Research Assistant, Genomics Team
> Navy Medical Research Center (NMRC)
> Biological Defense Research Directorate (BDRD)
> BDRD Annex
> 12300 Washington Avenue, 2nd Floor
> Rockville, MD 20852
>
> email: stewarta at nmrc.navy.mil
> phone: 301-231-6700 Ext 270
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From stewarta at nmrc.navy.mil  Thu Dec 14 16:23:07 2006
From: stewarta at nmrc.navy.mil (Andrew Stewart)
Date: Thu, 14 Dec 2006 16:23:07 -0500
Subject: [Bioperl-l] StandAloneBlast->blastall array of Bio::Seq objects
In-Reply-To: <97FE8E3C-58F2-406F-909D-DD479E594530@uiuc.edu>
References: <3A26D139-1963-4E47-8A70-910B3886AE18@nmrc.navy.mil>
	<45815B63.1020003@sendu.me.uk>
	<2DAAB59E-A4F9-4E2F-B1E5-F34376B5D1E0@nmrc.navy.mil>
	<88DDC5EA-C4BE-48FB-B259-B6584F5F86B1@uiuc.edu>
	<704E0191-A0E3-4DD2-A8F4-A0B9BE8E3AEE@nmrc.navy.mil>
	<97FE8E3C-58F2-406F-909D-DD479E594530@uiuc.edu>
Message-ID: <E1CF879B-7A07-4CE7-A0D0-C7749ECFF8FC@nmrc.navy.mil>

> It was a shot in the dark, really.  The fact that the return status  
> was bad could be due to a number of problems (permissions issues,  
> bad data, etc).  The fact that a single sequence worked indicated  
> that permissions and output format likely weren't to blame.  The  
> only other thing left was a problem with blastall itself.
>
> BTW, the blast docs do not indicate whether there is a maximum  
> number of sequences.  There may be a point where available memory  
> becomes the limiting issue.
>
> chris

Interesting.  I ran the 738-sequence dataset through blastall  
manually and the report only returned 198 of the 738 expected  
results.  Not only that, it seems to have just cut off right in the  
middle of the 198th result and a Segmentation fault was reported.   I  
removed the 198th sequence, wondering if it might be some issue with  
the input, and the segmentation fault occurred again with the results
ending on the 210th result.  I stuck the 198th sequence back in, but
at the start of the file and sure enough the segmentation fault
occurred earlier.  I think we can rule out the size of the input or
number of sequences as the source of error here.  I'm more inclined  
to think it has something to do with the blast databases being  
queried against.
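The `$?` value that StandAloneBlast prints in its "call crashed" message packs two things into one number: the program's exit code sits in the high byte, and if the program was killed by a signal, the signal number sits in the low seven bits. A segmentation fault is signal 11 on typical Unix systems, so decoding `$?` can tell a blastall crash apart from an ordinary error exit without rerunning it by hand. The sketch below is a minimal helper under that assumption; `describe_status` is a hypothetical name, not a BioPerl function. If the command is run through a shell, the shell may instead report such a crash as exit status 139 (128 + 11).

```perl
use strict;
use warnings;

# Decode a raw $? value from system() into a short description.
# describe_status() is a hypothetical helper, not part of BioPerl.
sub describe_status {
    my ($status) = @_;
    # -1 means the program could not be launched at all ($! holds the reason right after system()).
    return 'could not be started' if $status == -1;
    # Low 7 bits: signal that killed the child; bit 7 (value 128): core dump flag.
    if (my $signal = $status & 127) {
        return "killed by signal $signal" . (($status & 128) ? ' (core dumped)' : '');
    }
    # High byte: the ordinary exit code.
    return 'exited with status ' . ($status >> 8);
}

# Example: run a child perl that exits with code 3.
system($^X, '-e', 'exit 3');
print 'child ', describe_status($?), "\n";   # prints "child exited with status 3"
```

In a script that calls `system($commandstring)` directly, the same helper could be used as `warn 'blastall ', describe_status($?), "\n" unless $status == 0;`.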

I found an old discussion on a problem that sounds fairly similar to  
this one, for anyone interested.
http://bioinformatics.org/pipermail/bioclusters/2004-June/001742.html

I think I'll try to work around the problem for now.
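The workaround of breaking `@query` into smaller doses can be sketched as below. Since `blastall` takes a reference to an array of `Bio::Seq` objects (per Sendu's reply), each batch is built as an array reference. The helper name `batches` and the batch size of 100 are assumptions for illustration. Given that the crash appeared to follow particular sequences or databases rather than input size, batching may only limit how much output a single crash loses rather than prevent it.

```perl
use strict;
use warnings;

# Split a list into consecutive array references of at most $size items.
# batches() is a hypothetical helper, not part of BioPerl.
sub batches {
    my ($size, @items) = @_;
    die "batch size must be a positive integer\n"
        unless defined $size && $size =~ /^\d+$/ && $size > 0;
    my @out;
    # splice() removes up to $size items from the front; the last batch may be shorter.
    push @out, [ splice(@items, 0, $size) ] while @items;
    return @out;
}

# Hypothetical usage with a StandAloneBlast factory (not run here):
#   for my $batch (batches(100, @query)) {
#       my $report = $factory->blastall($batch);   # $batch is an array reference
#       # ... process $report ...
#   }

my @groups = batches(3, 1 .. 7);
print scalar(@groups), " batches; last has ", scalar(@{ $groups[-1] }), " item(s)\n";
# prints "3 batches; last has 1 item(s)"
```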

andrew


On Dec 14, 2006, at 1:36 PM, Chris Fields wrote:

>
> On Dec 14, 2006, at 11:49 AM, Andrew Stewart wrote:
>
>>> So can you look at the tempfile that is created and see if it is  
>>> sane?
>>>
>>> Set -save_tempfiles => 1 when you initialize the factory object
>>> or do
>>> $factory->save_tempfiles(1)
>>> before calling the blastall.
>>>
>>> -jason
>>>
>>
>> Jason,
>> I was actually wondering how to do that.  Thanks.  Odd though, it
>> still doesn't seem to be saving the tempfiles.  Might not matter
>
> That needs to be checked out.  Can anyone verify that?
>
>>> The error pops up when the executable returns a bad status, so
>>> maybe it's choking on too many input sequences (i.e. Bioperl is
>>> doing everything correctly, but you are attempting to BLAST too
>>> many sequences in one go).  How many sequences are you attempting
>>> to use as input?  What happens when you use fewer input sequences?
>>>
>>> chris
>>>
>>
>> I was processing 738 sequences for input.  I cut that down to 20
>> sequences and I'm getting some other exception thrown further
>> downstream, so it appears you may be correct.  You don't happen to
>> know what the max number of sequences that blastall allows for input,
>> would ya? ;)  I suppose I'll have to break @query down into smaller
>> doses or something.
>>
>> Thanks,
>> Andrew


--
Andrew Stewart
Research Assistant, Genomics Team
Navy Medical Research Center (NMRC)
Biological Defense Research Directorate (BDRD)
BDRD Annex
12300 Washington Avenue, 2nd Floor
Rockville, MD 20852

email: stewarta at nmrc.navy.mil
phone: 301-231-6700 Ext 270


From lincoln.stein at gmail.com  Thu Dec 14 15:24:56 2006
From: lincoln.stein at gmail.com (Lincoln Stein)
Date: Thu, 14 Dec 2006 15:24:56 -0500
Subject: [Bioperl-l] Bio::Graphics xyplot
In-Reply-To: <4578951B.5050206@sfu.ca>
References: <4578951B.5050206@sfu.ca>
Message-ID: <6dce9a0b0612141224r1ef7cce2s6e6123461c3827d8@mail.gmail.com>

Hi,

The way it works is that you create a single feature that spans the entire
range of the xyplot. It contains subfeatures, each of which has a score. The
graph points correspond to each of the subfeatures.
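Following that description, the per-base points become scored sub-features of one parent feature that spans the whole sequence. The pure-Perl half of this is turning the quality string into 1-based positions; the sketch below does only that, and the BioPerl wiring is shown as comments because it needs Bio::Graphics installed and assumes `Bio::SeqFeature::Generic`'s `add_sub_SeqFeature` method. Two details in the loop quoted below are worth checking: it starts at position 0, while sequence features are 1-based, and `split(/\s/, ...)` yields empty fields wherever whitespace repeats, whereas `split(' ', ...)` does not. `quality_points` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Convert a whitespace-separated quality string into [position, score] pairs,
# using 1-based positions (the first base is position 1).
# quality_points() is a hypothetical helper, not part of BioPerl.
sub quality_points {
    my ($qual_string) = @_;
    my @scores = split ' ', $qual_string;   # ' ' ignores leading and repeated whitespace
    return map { [ $_ + 1, $scores[$_] ] } 0 .. $#scores;
}

# Assumed BioPerl wiring (not run here): one parent feature spanning the sequence,
# one scored sub-feature per base, and the parent passed to add_track.
#   my $parent = Bio::SeqFeature::Generic->new(-start => 1, -end => $f_seqlen);
#   for my $p (quality_points($pqs_value)) {
#       $parent->add_sub_SeqFeature(Bio::SeqFeature::Generic->new(
#           -start => $p->[0], -end => $p->[0], -score => $p->[1]));
#   }
#   $panel->add_track($parent, -glyph => 'xyplot', -graph_type => 'points',
#                     -min_score => 0, -max_score => 100);

my @points = quality_points("  20 30\t40 ");
print join(' ', map { "$_->[0]:$_->[1]" } @points), "\n";   # prints "1:20 2:30 3:40"
```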

Lincoln

On 12/7/06, Keith Anthony Boroevich <kaboroev at sfu.ca> wrote:
>
> Hi everyone,
>
> I'm attempting to add an xyplot of the phred quality scores to an
> Bio::Graphics image, and cannot get it to work.
> I have the panel with a track for both the scale and the DNA displaying
> properly.  When I attempt to add the xyplot I just get a garbled track
> of, what looks like, tiny xyplots for each datapoint.  I have the cvs
> (updated today) of bioperl-live running.  I think what I am missing is
> the creation of a "Sequence Feature Group" to hold the individual points
> of the plot.  However, I cannot seem to find such an object. This is
> what I attempted:
>
> -------BEGIN---CODE-----------
> # start panel
> my $panel = Bio::Graphics::Panel->new(-length    => $f_seqlen,
>                       -width     => $f_seqlen*10,
>                       -pad_left  => 10,
>                       -pad_right => 10,
>                       -grid      => 1
>                       );
> # add scale
> $panel->add_track(arrow =>
> Bio::SeqFeature::Generic->new(-start=>1,-end=>$f_seqlen),
>               -double  => 1,
>               -tick    => 2,
>               -fgcolor => 'black');
> # add DNA ($feature is of type Bio::SeqFeature::Annotated)
> $panel->add_track(dna => $feature);
> # get list of quality scores from database
> my ($pqs_value) = $dbh->selectrow_array($sql);
> my @pqs_value = split(/\s/,$pqs_value);
> # create track
> my $track =  $panel->add_track(-glyph        => 'xyplot',
>                    -graph_type   => 'points',
>                    -point_symbol => 'point',
>                    -max_score    => 100,
>                    -min_score    => 0,
>                    -scale        => 'none');
> # add "subfeatures" to
> for (my $i=0;$i<$f_seqlen;$i++) {
>     $track->add_feature(Bio::SeqFeature::Generic->new(-start=>$i,-end=>$i,-score=>$pqs_value[$i]));
> }
> print $panel->png();
> $panel->finished;
> ------END---CODE----------
>
> I also attempted to create an array of the point features and passed
> that by reference to the panel "add_track" as it describes in the xyplot
> documentation, but that resulted in the exact same image.
>
> keith
>
> --
> ><)))?> -cGRASP- <?(((><
> Keith Anthony Boroevich
> Davidson Lab
> Dept of Molecular Biology
> Simon Fraser University
> Tel: 604-268-7276
>
>


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu


From bix at sendu.me.uk  Thu Dec 14 17:15:07 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 14 Dec 2006 17:15:07 -0500
Subject: [Bioperl-l] Bio::SeqFeature::Annotated and mandatory type
	checking
In-Reply-To: <637A2459-4115-466F-BD8D-036D5E9114F8@cshl.edu>
References: <637A2459-4115-466F-BD8D-036D5E9114F8@cshl.edu>
Message-ID: <4581CCEB.20206@sendu.me.uk>

Matthew Vaughn wrote:
> Dear all,
> 
> I'm trying to bring some of my code into compliance with the BioPerl 
> 1.5.2 and am running into some design decisions that I am unclear on. 
> Can I ask why Bio::SeqFeature::Annotated enforces mandatory checking of 
> the 'type' against SOFA? It seems to me that this should be optional 
> behavior as is the case with the Bio::FeatureIO family. I'd be happy to 
> write the patch if there is any agreement with me on this case.

Lots of people seem to have worked on it over the years, but perhaps 
Scott Cain is the person to talk to?

revision 1.4
date: 2004/09/25 11:41:29;  author: scain;  state: Exp;  lines: +1 -1
two things:
   * adding SOFA as an available ontology to DocumentRegistry.pm
   * modifying FeatureIO::gff to use SOFA to validate, and to parse 
Ontology_term


From lincoln.stein at gmail.com  Thu Dec 14 16:56:41 2006
From: lincoln.stein at gmail.com (Lincoln Stein)
Date: Thu, 14 Dec 2006 16:56:41 -0500
Subject: [Bioperl-l] [Gmod-gbrowse] xyplot data alignment problem?
In-Reply-To: <EA0BFA4F-8182-4C40-92DA-218CE3F48D8B@genomics.princeton.edu>
References: <EA0BFA4F-8182-4C40-92DA-218CE3F48D8B@genomics.princeton.edu>
Message-ID: <6dce9a0b0612141356u63afe2dak7e1d8dad93408312@mail.gmail.com>

Hi All,

I'm afraid that the xyplot glyph that is in the recent bioperl release has
an error that causes the content to be printed to the right of the correct
position. Unfortunately this wasn't caught before the release because the
glyph was only tested on very large (whole genome) features.

You will need to do a CVS update to get a fixed version from bioperl-live. A
future bugfix release of gbrowse will patch this glyph for you
automatically.

Lincoln

On 12/12/06, Kara Dolinski <kara at genomics.princeton.edu> wrote:
>
> Hi,
> I'm having a problem getting features and an xyplot properly aligned in
> Gbrowse.  For example, see this page:
>
> http://tinyurl.com/ylbq3q
>
> The feature in the "CENPK SNPs" track should actually be around the peak
> of the graph in the "CENPK prediction signal" xyplot  ie. the SNP feature
> is at position 79, and the xyplot axes and data should span from 61 - 95.
> However, as you can see, the data in the xyplot are oddly separated from
> the axes (which seem to be in the correct place), with the data shifted over
> to about position 120-155.
> This occurs elsewhere, not just at the ends of the chromosomes.
>
> When I zoom to ~80 bp, all is well, see:
>
> http://tinyurl.com/yzav8k
>
> The relevant snippets from the GFF and the config files are below.
>
> Thanks!
> Kara
>
> GFF:
>
> chrI SNPScanner CENPK_GRAPH 61 95 41.9883 . . ID=CENPK_all_peaks;Name=CENPK_peak0;PEAK=peak0;Note=score is 41.9883
> chrI SNPScanner CENPK_CALL 79 79 41.9883 . . ID=CENPK_all_peaks;Name=CENPK_peak0;PEAK=peak0;Note=score is 41.9883
> chrI SNPScanner CENPK_SCORE 61 61 2.24506 . . ID=CENPK_all_peaks;Name=chrI61;PEAK=peak0;Note=score is 2.24506
> chrI SNPScanner CENPK_SCORE 62 62 3.26837 . . ID=CENPK_all_peaks;Name=chrI62;PEAK=peak0;Note=score is 3.26837
> chrI SNPScanner CENPK_SCORE 63 63 1.39938 . . ID=CENPK_all_peaks;Name=chrI63;PEAK=peak0;Note=score is 1.39938
> chrI SNPScanner CENPK_SCORE 64 64 1.4039 . . ID=CENPK_all_peaks;Name=chrI64;PEAK=peak0;Note=score is 1.4039
> chrI SNPScanner CENPK_SCORE 65 65 9.16134 . . ID=CENPK_all_peaks;Name=chrI65;PEAK=peak0;Note=score is 9.16134
> chrI SNPScanner CENPK_SCORE 66 66 10.1413 . . ID=CENPK_all_peaks;Name=chrI66;PEAK=peak0;Note=score is 10.1413
> chrI SNPScanner CENPK_SCORE 67 67 12.9256 . . ID=CENPK_all_peaks;Name=chrI67;PEAK=peak0;Note=score is 12.9256
> chrI SNPScanner CENPK_SCORE 68 68 13.195 . . ID=CENPK_all_peaks;Name=chrI68;PEAK=peak0;Note=score is 13.195
> chrI SNPScanner CENPK_SCORE 69 69 22.7127 . . ID=CENPK_all_peaks;Name=chrI69;PEAK=peak0;Note=score is 22.7127
> chrI SNPScanner CENPK_SCORE 70 70 23.8289 . . ID=CENPK_all_peaks;Name=chrI70;PEAK=peak0;Note=score is 23.8289
> chrI SNPScanner CENPK_SCORE 71 71 21.9123 . . ID=CENPK_all_peaks;Name=chrI71;PEAK=peak0;Note=score is 21.9123
> chrI SNPScanner CENPK_SCORE 72 72 28.3344 . . ID=CENPK_all_peaks;Name=chrI72;PEAK=peak0;Note=score is 28.3344
> chrI SNPScanner CENPK_SCORE 73 73 35.0436 . . ID=CENPK_all_peaks;Name=chrI73;PEAK=peak0;Note=score is 35.0436
> chrI SNPScanner CENPK_SCORE 74 74 37.361 . . ID=CENPK_all_peaks;Name=chrI74;PEAK=peak0;Note=score is 37.361
> chrI SNPScanner CENPK_SCORE 75 75 39.5408 . . ID=CENPK_all_peaks;Name=chrI75;PEAK=peak0;Note=score is 39.5408
> chrI SNPScanner CENPK_SCORE 76 76 28.2008 . . ID=CENPK_all_peaks;Name=chrI76;PEAK=peak0;Note=score is 28.2008
> chrI SNPScanner CENPK_SCORE 77 77 32.6254 . . ID=CENPK_all_peaks;Name=chrI77;PEAK=peak0;Note=score is 32.6254
> chrI SNPScanner CENPK_SCORE 78 78 36.0832 . . ID=CENPK_all_peaks;Name=chrI78;PEAK=peak0;Note=score is 36.0832
> chrI SNPScanner CENPK_SCORE 79 79 41.9883 . . ID=CENPK_all_peaks;Name=chrI79;PEAK=peak0;Note=score is 41.9883
> chrI SNPScanner CENPK_SCORE 80 80 32.1205 . . ID=CENPK_all_peaks;Name=chrI80;PEAK=peak0;Note=score is 32.1205
> chrI SNPScanner CENPK_SCORE 81 81 41.3048 . . ID=CENPK_all_peaks;Name=chrI81;PEAK=peak0;Note=score is 41.3048
> chrI SNPScanner CENPK_SCORE 82 82 30.7975 . . ID=CENPK_all_peaks;Name=chrI82;PEAK=peak0;Note=score is 30.7975
> chrI SNPScanner CENPK_SCORE 83 83 29.4282 . . ID=CENPK_all_peaks;Name=chrI83;PEAK=peak0;Note=score is 29.4282
> chrI SNPScanner CENPK_SCORE 84 84 35.3586 . . ID=CENPK_all_peaks;Name=chrI84;PEAK=peak0;Note=score is 35.3586
> chrI SNPScanner CENPK_SCORE 85 85 34.1426 . . ID=CENPK_all_peaks;Name=chrI85;PEAK=peak0;Note=score is 34.1426
> chrI SNPScanner CENPK_SCORE 86 86 30.2966 . . ID=CENPK_all_peaks;Name=chrI86;PEAK=peak0;Note=score is 30.2966
> chrI SNPScanner CENPK_SCORE 87 87 17.8402 . . ID=CENPK_all_peaks;Name=chrI87;PEAK=peak0;Note=score is 17.8402
> chrI SNPScanner CENPK_SCORE 88 88 15.2637 . . ID=CENPK_all_peaks;Name=chrI88;PEAK=peak0;Note=score is 15.2637
> chrI SNPScanner CENPK_SCORE 89 89 12.657 . . ID=CENPK_all_peaks;Name=chrI89;PEAK=peak0;Note=score is 12.657
> chrI SNPScanner CENPK_SCORE 90 90 10.2033 . . ID=CENPK_all_peaks;Name=chrI90;PEAK=peak0;Note=score is 10.2033
> chrI SNPScanner CENPK_SCORE 91 91 9.40143 . . ID=CENPK_all_peaks;Name=chrI91;PEAK=peak0;Note=score is 9.40143
> chrI SNPScanner CENPK_SCORE 92 92 6.56273 . . ID=CENPK_all_peaks;Name=chrI92;PEAK=peak0;Note=score is 6.56273
> chrI SNPScanner CENPK_SCORE 93 93 3.66211 . . ID=CENPK_all_peaks;Name=chrI93;PEAK=peak0;Note=score is 3.66211
> chrI SNPScanner CENPK_SCORE 94 94 0.394194 . . ID=CENPK_all_peaks;Name=chrI94;PEAK=peak0;Note=score is 0.394194
>
> CONFIG:
>
>
> GRAPH_CENPK{CENPK_SCORE/CENPK_GRAPH}
>
> [CENPK_all_scores_graph]
> feature = GRAPH_CENPK:SNPScanner
> glyph = xyplot
> graph_type = boxes
> fgcolor = purple
> bgcolor = purple
> height = 100
> min_score = 0
> max_score = 110
> label = 0
> key = CENPK prediction signal
> link =
> category = SNPs: signal graphs
>
>
>
> _______________________________________________
> Gmod-gbrowse mailing list
> Gmod-gbrowse at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse
>
>
>


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu


From dmessina at wustl.edu  Thu Dec 14 20:45:24 2006
From: dmessina at wustl.edu (David Messina)
Date: Thu, 14 Dec 2006 19:45:24 -0600
Subject: [Bioperl-l] Proposal for Meta data
In-Reply-To: <B3EF69DB-9C01-4F42-A4E4-898613D872F9@uiuc.edu>
References: <B3EF69DB-9C01-4F42-A4E4-898613D872F9@uiuc.edu>
Message-ID: <5DB6475C-109D-406D-B4BA-D2248AE3F987@wustl.edu>

Hey Chris,

My thoughts below.

> [Chris]
> This could be used to annotate any
> PrimarySeq, LocatableSeq, SimpleAlign, SeqFeature, or what-have-you,
> maybe in a collection (similar to AnnotationCollection).  I thought
> something like this may be of general use for any PrimarySeq
> (quality, structure), alignments like NEXUS and Stockholm,
> SeqFeatures where structure could be stored (tRNA or riboswitches),  
> etc.
>
> However, this also seems to fall into the category of sequence
> annotation.  So, would it be better to have a set of Bio::Annotation
> classes used for this purpose?


To me, all meta data is equal. That is, your classic Genbank feature  
annotation and a user's arbitrary meta-tag like "Bob thinks this is a  
kinase domain" aren't different in kind even if they are different in  
content.

As resequencing projects multiply, the ability to create arbitrary  
meta tags, attach them to different types of objects, and use those  
tags to link them together will become desirable, if not essential.

Keeping a common interface to all of these meta data types would be  
advantageous, plus new users won't have to determine whether they  
need to use Bio::Meta objects or Bio::Annotation objects.

So I would argue for all of the meta data types to live "under one  
roof". Which roof isn't as important. Bio::Annotation, since it  
already exists for today's meta data, seems like a reasonable choice.  
(assuming Annotation objects are flexible enough to be extended as  
you propose)

There, and no flames or jibes even. :)

Dave


From cjfields at uiuc.edu  Thu Dec 14 21:21:10 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 14 Dec 2006 20:21:10 -0600
Subject: [Bioperl-l] Proposal for Meta data
In-Reply-To: <5DB6475C-109D-406D-B4BA-D2248AE3F987@wustl.edu>
References: <B3EF69DB-9C01-4F42-A4E4-898613D872F9@uiuc.edu>
	<5DB6475C-109D-406D-B4BA-D2248AE3F987@wustl.edu>
Message-ID: <9F172B90-B065-4A42-A54F-140360132B3B@uiuc.edu>


On Dec 14, 2006, at 7:45 PM, David Messina wrote:

> Hey Chris,
>
> My thoughts below.
>
>> [Chris]
>> This could be used to annotate any
>> PrimarySeq, LocatableSeq, SimpleAlign, SeqFeature, or what-have-you,
>> maybe in a collection (similar to AnnotationCollection).  I thought
>> something like this may be of general use for any PrimarySeq
>> (quality, structure), alignments like NEXUS and Stockholm,
>> SeqFeatures where structure could be stored (tRNA or riboswitches),
>> etc.
>>
>> However, this also seems to fall into the category of sequence
>> annotation.  So, would it be better to have a set of Bio::Annotation
>> classes used for this purpose?
>
>
> To me, all meta data is equal. That is, your classic Genbank feature
> annotation and a user's arbitrary meta-tag like "Bob thinks this is a
> kinase domain" aren't different in kind even if they are different in
> content.
>
> As resequencing projects multiply, the ability to create arbitrary
> meta tags, attach them to different types of objects, and use those
> tags to link them together will become desirable, if not essential.
>
> Keeping a common interface to all of these meta data types would be
> advantageous, plus new users won't have to determine whether they
> need to use Bio::Meta objects or Bio::Annotation objects.
>
> So I would argue for all of the meta data types to live "under one
> roof". Which roof isn't as important. Bio::Annotation, since it
> already exists for today's meta data, seems like a reasonable choice.
> (assuming Annotation objects are flexible enough to be extended as
> you propose)
>
> There, and no flames or jibes even. :)

I guess what I want to know is whether there should be a
distinction between 'normal' sequence annotation (comments,  
references, and so on) and annotation that could be best described as  
position-specific (like RNA or protein structural annotation).  The  
current meta implementation is for sequence data only; I felt it  
would be nice to have a generic implementation that would be  
applicable to any object data.

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From dmessina at wustl.edu  Thu Dec 14 21:46:27 2006
From: dmessina at wustl.edu (David Messina)
Date: Thu, 14 Dec 2006 20:46:27 -0600
Subject: [Bioperl-l] Proposal for Meta data
In-Reply-To: <9F172B90-B065-4A42-A54F-140360132B3B@uiuc.edu>
References: <B3EF69DB-9C01-4F42-A4E4-898613D872F9@uiuc.edu>
	<5DB6475C-109D-406D-B4BA-D2248AE3F987@wustl.edu>
	<9F172B90-B065-4A42-A54F-140360132B3B@uiuc.edu>
Message-ID: <9C72012A-EFD7-42DD-93F8-578251CFDE01@wustl.edu>

And it all seemed so clear to me when I wrote it. :)

> whether there should be a distinction

I would argue no because it would contravene a s


> a generic implementation that would be applicable to any object data.

I wholeheartedly agree that this is the way to go. A generic  
implementation would allow arbitrary object data while maintaining a  
standard interface.


From dmessina at wustl.edu  Thu Dec 14 21:46:27 2006
From: dmessina at wustl.edu (David Messina)
Date: Thu, 14 Dec 2006 20:46:27 -0600
Subject: [Bioperl-l] Proposal for Meta data
Message-ID: <E4629E7B-E42C-4B93-869F-FE26035052A0@wustl.edu>

[oops, accidentally hit send midsentence]


And it all seemed so clear to me when I wrote it. :)


> whether there should be a distinction

I would argue no because it would contravene a standard interface.


> a generic implementation that would be applicable to any object data.

I wholeheartedly agree that this is the way to go. A generic  
implementation would allow arbitrary object data while maintaining a  
standard interface.


Dave


From neetisomaiya at gmail.com  Fri Dec 15 00:21:42 2006
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Fri, 15 Dec 2006 10:51:42 +0530
Subject: [Bioperl-l] needle parser in bioperl?
In-Reply-To: <B4B8F9CCEDA9334F819017E5D711AD1C32C669@bllmail.bll.n-i.nhs.uk>
References: <764978cf0612140002m2a8c4268ma4b55f12412c5e9d@mail.gmail.com>
	<B4B8F9CCEDA9334F819017E5D711AD1C32C669@bllmail.bll.n-i.nhs.uk>
Message-ID: <764978cf0612142121s547a54dbu54b839f71d171f81@mail.gmail.com>

Hi,

Thanks a lot for your response.
I ran needle like this
 /usr/local/bin/./needle SEQ_1.REF seq_of_contig1 -aformat msf 1.out
It gave me the output in format msf.
But now my problem is, if I use Bio::AlignIO module of Bioperl, how can I
get the alignment start and stop coordinates on the sequence. I mean
something like hsp->query->start which gives us the alignment start position
on query sequence in a blast output when using Bio::SearchIO.
Please help.
Like I explained with an example in my previous mail, I want the coordinate
where the alignment starts on the sequence.

~Neeti.
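Because needle produces a global alignment, the MSF (or aligned FASTA) output contains both full sequences padded with gap characters, so the start and end of the aligned region on each sequence can be recovered by counting residues column by column. The sketch below assumes you already have the two aligned strings, for example from `$aln->each_seq` and `$seq->seq` after reading the file with Bio::AlignIO; `aligned_range` is a hypothetical helper, and treating `-`, `.` and `~` as gap characters is an assumption about the output. Note also that Bio::SearchIO's `fasta` format is written for reports from the FASTA search programs, not for FASTA-formatted alignment files.

```perl
use strict;
use warnings;

# Given two equal-length aligned strings (reference row, query row), return the
# 1-based (ref_start, ref_end, query_start, query_end) of the region where both
# rows have residues. aligned_range() is a hypothetical helper, not BioPerl.
sub aligned_range {
    my ($ref_aln, $qry_aln) = @_;
    die "aligned strings differ in length\n"
        unless length($ref_aln) == length($qry_aln);
    my ($rpos, $qpos) = (0, 0);           # residues seen so far in each row
    my ($rstart, $rend, $qstart, $qend);
    for my $col (0 .. length($ref_aln) - 1) {
        my $r_res = substr($ref_aln, $col, 1) !~ /[-.~]/;
        my $q_res = substr($qry_aln, $col, 1) !~ /[-.~]/;
        $rpos++ if $r_res;
        $qpos++ if $q_res;
        next unless $r_res && $q_res;     # only columns pairing two residues count
        ($rstart, $qstart) = ($rpos, $qpos) unless defined $rstart;
        ($rend, $qend) = ($rpos, $qpos);
    }
    return ($rstart, $rend, $qstart, $qend);   # all undef if no such column exists
}

my @range = aligned_range('ACGTACGTAC', '....ACG-AC');
print "reference $range[0]-$range[1], query $range[2]-$range[3]\n";
# prints "reference 5-10, query 1-5"
```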

On 12/14/06, Fairley, Derek <Derek.Fairley at bll.n-i.nhs.uk> wrote:
>
>  Neeti,
>
>
>
> From http://emboss.sourceforge.net/apps/cvs/needle.html:
>
>
>
> "The results can be output in one of several styles by using the
> command-line qualifier -aformat xxx, where 'xxx' is replaced by the name of
> the required format. Some of the alignment formats can cope with an
> unlimited number of sequences, while others are only for pairs of sequences.
>
>
>
>
> The available multiple alignment format names are: unknown, multiple,
> simple, fasta, msf, trace, srs
>
>
>
> The available pairwise alignment format names are: pair, markx0, markx1,
> markx2, markx3, markx10, srspair, score
>
>
>
> See: http://emboss.sf.net/docs/themes/AlignFormats.html for further
> information on alignment formats."
>
>
>
> Not sure based on this whether you can get pairwise alignment in .msf
> format; can't think of a good reason why not. The BioPerl Bio::AlignIO module
> will allow you to parse alignments in .msf format.
>
>
>
> HTH,
>
>
>
> Derek.
>
>
>
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:
> bioperl-l-bounces at lists.open-bio.org] On Behalf Of neeti somaiya
> Sent: 14 December 2006 08:03
> To: Chris Fields; bioperl-l
> Subject: Re: [Bioperl-l] needle parser in bioperl?
>
>
>
> How do I run needle specifying that I want the MSF format, on a linux box?
> The help doesn't show me any format option. Is there anything available to
> parse MSF format?
>
> Please find an example alignment file attached. Here the seq_of_contig
> aligns with the reference sequence (i.e. SEQ_1.REF) starting at position
> (coordinate) 8918 of SEQ_1.REF. I basically want this coordinate from the
> output alignment, how can I parse the result to get this?
>
> On 12/12/06, Chris Fields <cjfields at uiuc.edu> wrote:
> >
> > On Dec 12, 2006, at 6:14 AM, neeti somaiya wrote:
> >
> > > Hi,
> > >
> > > Does anyone know of a bioperl parser for needle output? Basically I
> > > want to know where the target sequence aligns on the template (i.e. the
> > > coordinate on the template where the target aligns).
> > >
> > > --
> > > -Neeti
> > > Even my blood says, B positive
> >
> > I answered this a number of months back:
> >
> > http://tinyurl.com/yzlbx5
> >
> > Basically, newer versions of EMBOSS have changed the output for the
> > AlignIO::emboss parser (which parses needle).  I don't believe the
> > parser has been fixed to deal with that, but Jason has pointed out
> > you can use MSF output when running needle, then parse using AlignIO
> > with the format set to 'msf'.
> >
> > chris
>
> --
> -Neeti
> Even my blood says, B positive
>


-- 
-Neeti
Even my blood says, B positive


From Derek.Fairley at bll.n-i.nhs.uk  Fri Dec 15 04:57:35 2006
From: Derek.Fairley at bll.n-i.nhs.uk (Fairley, Derek)
Date: Fri, 15 Dec 2006 09:57:35 -0000
Subject: [Bioperl-l] needle parser in bioperl?
In-Reply-To: <764978cf0612142121s547a54dbu54b839f71d171f81@mail.gmail.com>
Message-ID: <B4B8F9CCEDA9334F819017E5D711AD1C32C66A@bllmail.bll.n-i.nhs.uk>

Neeti,

In lieu of a response from a BioPerl guru... why not use Needle to generate your pairwise alignment in fasta format, rather than msf format? The sequence you want should correspond to a single HSP which you can get directly from the fasta alignment with Bio::SearchIO: http://www.bioperl.org/wiki/Module:Bio::SearchIO. You may not need to use Bio::AlignIO at all. 

Derek.


-----Original Message-----
From: neeti somaiya [mailto:neetisomaiya at gmail.com] 
Sent: 15 December 2006 05:22
To: Fairley, Derek; bioperl-l
Subject: Re: [Bioperl-l] needle parser in bioperl?

Hi,

Thanks a lot for your response.
I ran needle like this 
/usr/local/bin/./needle SEQ_1.REF seq_of_contig1 -aformat msf 1.out
It gave me the output in format msf.
But now my problem is, if I use Bio::AlignIO module of Bioperl, how can I get the alignment start and stop coordinates on the sequence. I mean something like hsp->query->start which gives us the alignment start position on query sequence in a blast output when using Bio::SearchIO.
Please help.
Like I explained with an example in my previous mail, I want the coordinate where the alignment starts on the sequence.

~Neeti.
On 12/14/06, Fairley, Derek <Derek.Fairley at bll.n-i.nhs.uk> wrote:
Neeti,

From http://emboss.sourceforge.net/apps/cvs/needle.html :

"The results can be output in one of several styles by using the command-line qualifier -aformat xxx, where 'xxx' is replaced by the name of the required format. Some of the alignment formats can cope with an unlimited number of sequences, while others are only for pairs of sequences.

The available multiple alignment format names are: unknown, multiple, simple, fasta, msf, trace, srs

The available pairwise alignment format names are: pair, markx0, markx1, markx2, markx3, markx10, srspair, score

See: http://emboss.sf.net/docs/themes/AlignFormats.html for further information on alignment formats."

Not sure based on this whether you can get pairwise alignment in .msf format; can't think of a good reason why not. The BioPerl Bio::AlignIO module will allow you to parse alignments in .msf format.

HTH,

Derek.

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of neeti somaiya
Sent: 14 December 2006 08:03
To: Chris Fields; bioperl-l
Subject: Re: [Bioperl-l] needle parser in bioperl?

How do I run needle specifying that I want the MSF format, on a linux box?
The help doesn't show me any format option. Is there anything available to
parse MSF format?
Please find an example alignment file attached. Here the seq_of_contig
aligns with the reference sequence (i.e. SEQ_1.REF) starting at position
(coordinate) 8918 of SEQ_1.REF. I basically want this coordinate from the
output alignment, how can I parse the result to get this?

On 12/12/06, Chris Fields <cjfields at uiuc.edu> wrote:
>
>
> On Dec 12, 2006, at 6:14 AM, neeti somaiya wrote:
>
> > Hi,
> >
> > Does anyone know of a bioperl parser for needle output? Basically I
> > want to know where the target sequence aligns on the template (i.e. the
> > coordinate on the template where the target aligns).
> >
> > --
> > -Neeti
> > Even my blood says, B positive
>
> I answered this a number of months back:
>
> http://tinyurl.com/yzlbx5 
>
> Basically, newer versions of EMBOSS have changed the output for the
> AlignIO::emboss parser (which parses needle).  I don't believe the
> parser has been fixed to deal with that, but Jason has pointed out
> you can use MSF output when running needle, then parse using AlignIO
> with the format set to 'msf'.
>
> chris
>
-- 
-Neeti
Even my blood says, B positive


-- 
-Neeti
Even my blood says, B positive 


From cain at cshl.edu  Fri Dec 15 00:01:36 2006
From: cain at cshl.edu (Scott Cain)
Date: Fri, 15 Dec 2006 00:01:36 -0500
Subject: [Bioperl-l] Bio::SeqFeature::Annotated and mandatory
	type	checking
In-Reply-To: <4581CCEB.20206@sendu.me.uk>
References: <637A2459-4115-466F-BD8D-036D5E9114F8@cshl.edu>
	<4581CCEB.20206@sendu.me.uk>
Message-ID: <1166158897.2569.335.camel@localhost.localdomain>

As much as I would like to take credit for this :-)  Allen Day wrote the
original code, and then Chris Fields tried to fix it so that it actually
worked :-)  I think it would be a good idea to have a validate_terms
option like Bio::FeatureIO::gff.

Scott
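The opt-in behaviour Matthew and Scott describe could look roughly like the sketch below: the feature type is checked only when the caller asks for it. Everything here is hypothetical: the `My::AnnotatedFeature` class, its `-validate_terms` argument, and the short hard-coded term list, which stands in for a real SOFA lookup. Leaving validation off by default matches Matthew's request; whether BioPerl itself should default to on for backward compatibility is a separate decision.

```perl
use strict;
use warnings;

package My::AnnotatedFeature;   # hypothetical class, not part of BioPerl

# Stand-in for an ontology lookup; a real implementation would consult SOFA.
my %KNOWN_TYPES = map { ($_ => 1) } qw(gene mRNA exon CDS SNP);

sub new {
    my ($class, %args) = @_;
    my $self = bless {
        type           => $args{-type},
        validate_terms => $args{-validate_terms} ? 1 : 0,
    }, $class;
    # Only check the type when the caller opted in.
    $self->_check_type if $self->{validate_terms};
    return $self;
}

sub _check_type {
    my ($self) = @_;
    my $type = $self->{type};
    die "type '" . (defined $type ? $type : '') . "' is not a known term\n"
        unless defined $type && $KNOWN_TYPES{$type};
}

sub type { $_[0]{type} }

package main;

my $loose = My::AnnotatedFeature->new(-type => 'CENPK_SCORE');
print 'accepted: ', $loose->type, "\n";   # prints "accepted: CENPK_SCORE"
eval { My::AnnotatedFeature->new(-type => 'CENPK_SCORE', -validate_terms => 1) };
print 'rejected: ', $@;                   # prints the message from _check_type
```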

On Thu, 2006-12-14 at 17:15 -0500, Sendu Bala wrote:
> Matthew Vaughn wrote:
> > Dear all,
> > 
> > I'm trying to bring some of my code into compliance with the BioPerl 
> > 1.5.2 and am running into some design decisions that I am unclear on. 
> > Can I ask why Bio::SeqFeature::Annotated enforces mandatory checking of 
> > the 'type' against SOFA? It seems to me that this should be optional 
> > behavior as is the case with the Bio::FeatureIO family. I'd be happy to 
> > write the patch if there is any agreement with me on this case.
> 
> Lots of people seem to have worked on it over the years, but perhaps 
> Scott Cain is the person to talk to?
> 
> revision 1.4
> date: 2004/09/25 11:41:29;  author: scain;  state: Exp;  lines: +1 -1
> two things:
>    * adding SOFA as an available ontology to DocumentRegistry.pm
>    * modifying FeatureIO::gff to use SOFA to validate, and to parse 
> Ontology_term
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From neetisomaiya at gmail.com  Fri Dec 15 07:46:08 2006
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Fri, 15 Dec 2006 18:16:08 +0530
Subject: [Bioperl-l] needle parser in bioperl?
In-Reply-To: <B4B8F9CCEDA9334F819017E5D711AD1C32C66A@bllmail.bll.n-i.nhs.uk>
References: <764978cf0612142121s547a54dbu54b839f71d171f81@mail.gmail.com>
	<B4B8F9CCEDA9334F819017E5D711AD1C32C66A@bllmail.bll.n-i.nhs.uk>
Message-ID: <764978cf0612150446r46e5f64tc6bf0b198cf618c5@mail.gmail.com>

I ran needle like this

/usr/local/bin/./needle SEQ_1.REF seq_of_contig1 -aformat fasta 1.out

Please find the output attached.

When I run the following:

use strict;
use warnings;
use Bio::SearchIO;

my $io = Bio::SearchIO->new(-file   => "1.out",
                            -format => "fasta");

while (my $result = $io->next_result()) {
    while (my $hit = $result->next_hit) {
        print "yes\n";
    }
}


It says:

-------------------- WARNING ---------------------
MSG: unrecognized FASTA Family report file!
---------------------------------------------------

What should I do?

~Neeti.

On 12/15/06, Fairley, Derek <Derek.Fairley at bll.n-i.nhs.uk> wrote:
>
> Neeti,
>
> In lieu of a response from a BioPerl guru... why not use Needle to
> generate your pairwise alignment in fasta format, rather than msf format?
> The sequence you want should correspond to a single HSP which you can get
> directly from the fasta alignment with Bio::SearchIO:
> http://www.bioperl.org/wiki/Module:Bio::SearchIO. You may not need to use
> Bio::AlignIO at all.
>
> Derek.
>
>
> -----Original Message-----
> From: neeti somaiya [mailto:neetisomaiya at gmail.com]
> Sent: 15 December 2006 05:22
> To: Fairley, Derek; bioperl-l
> Subject: Re: [Bioperl-l] needle parser in bioperl?
>
> Hi,
>
> Thanks a lot for your response.
> I ran needle like this
> /usr/local/bin/./needle SEQ_1.REF seq_of_contig1 -aformat msf 1.out
> It gave me the output in MSF format.
> But now my problem is: if I use the Bio::AlignIO module of BioPerl, how can I
> get the alignment start and stop coordinates on the sequence? I mean
> something like $hsp->query->start, which gives the alignment start position
> on the query sequence in BLAST output when using Bio::SearchIO.
> Please help.
> Like I explained with an example in my previous mail, I want the
> coordinate where the alignment starts on the sequence.
>
> ~Neeti.
> On 12/14/06, Fairley, Derek <Derek.Fairley at bll.n-i.nhs.uk> wrote:
> Neeti,
>
> From http://emboss.sourceforge.net/apps/cvs/needle.html :
>
> "The results can be output in one of several styles by using the
> command-line qualifier -aformat xxx, where 'xxx' is replaced by the name of
> the required format. Some of the alignment formats can cope with an
> unlimited number of sequences, while others are only for pairs of sequences.
>
> The available multiple alignment format names are: unknown, multiple,
> simple, fasta, msf, trace, srs
>
> The available pairwise alignment format names are: pair, markx0, markx1,
> markx2, markx3, markx10, srspair, score
>
> See: http://emboss.sf.net/docs/themes/AlignFormats.html for further
> information on alignment formats."
>
> Not sure, based on this, whether you can get a pairwise alignment in MSF
> format; I can't think of a good reason why not. The BioPerl Bio::AlignIO
> module will allow you to parse alignments in MSF format.
>
> HTH,
>
> Derek.
>
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:
> bioperl-l-bounces at lists.open-bio.org] On Behalf Of neeti somaiya
> Sent: 14 December 2006 08:03
> To: Chris Fields; bioperl-l
> Subject: Re: [Bioperl-l] needle parser in bioperl?
>
> How do I run needle specifying that I want MSF format, on a Linux box?
> The help doesn't show me any format option. Is there anything available to
> parse MSF format?
> Please find an example alignment file attached. Here the seq_of_contig
> aligns with the reference sequence (i.e. SEQ_1.REF) starting at position
> (coordinate) 8918 of SEQ_1.REF. I basically want this coordinate from the
> output alignment, how can I parse the result to get this?
>
> On 12/12/06, Chris Fields <cjfields at uiuc.edu > wrote:
> >
> >
> > On Dec 12, 2006, at 6:14 AM, neeti somaiya wrote:
> >
> > > Hi,
> > >
> > > Does anyone know of a BioPerl parser for needle output? Basically I
> > > want to know where the target sequence aligns on the template (i.e. the
> > > coordinate on the template where the target aligns).
> > >
> > > --
> > > -Neeti
> > > Even my blood says, B positive
> >
> > I answered this a number of months back:
> >
> > http://tinyurl.com/yzlbx5
> >
> > Basically, newer versions of EMBOSS have changed the output for the
> > AlignIO::emboss parser (which parses needle). I don't believe the
> > parser has been fixed to deal with that, but Jason has pointed out
> > you can use MSF output when running needle, then parse using AlignIO
> > with the format set to 'msf'.
> >
> > chris
> >
>
>
>
> --
> -Neeti
> Even my blood says, B positive
>
>
>
> --
> -Neeti
> Even my blood says, B positive
>


-- 
-Neeti
Even my blood says, B positive
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 1.out
Type: application/octet-stream
Size: 90277 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061215/34b05d03/attachment-0003.obj>

From jason at bioperl.org  Fri Dec 15 09:28:13 2006
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 15 Dec 2006 09:28:13 -0500
Subject: [Bioperl-l] Proposal for Meta data
In-Reply-To: <9F172B90-B065-4A42-A54F-140360132B3B@uiuc.edu>
References: <B3EF69DB-9C01-4F42-A4E4-898613D872F9@uiuc.edu>
	<5DB6475C-109D-406D-B4BA-D2248AE3F987@wustl.edu>
	<9F172B90-B065-4A42-A54F-140360132B3B@uiuc.edu>
Message-ID: <32BE3FCF-C788-438F-8A4A-8A586DD6C569@bioperl.org>


On Dec 14, 2006, at 9:21 PM, Chris Fields wrote:

>
> On Dec 14, 2006, at 7:45 PM, David Messina wrote:
>
>> Hey Chris,
>>
>> My thoughts below.
>>
>>> [Chris]
>>> This could be used to annotate any
>>> PrimarySeq, LocatableSeq, SimpleAlign, SeqFeature, or what-have-you,
>>> maybe in a collection (similar to AnnotationCollection).  I thought
>>> something like this may be of general use for any PrimarySeq
>>> (quality, structure), alignments like NEXUS and Stockholm,
>>> SeqFeatures where structure could be stored (tRNA or riboswitches),
>>> etc.
>>>
>>> However, this also seems to fall into the category of sequence
>>> annotation.  So, would it be better to have a set of Bio::Annotation
>>> classes used for this purpose?
>>
>>
>> To me, all meta data is equal. That is, your classic Genbank feature
>> annotation and a user's arbitrary meta-tag like "Bob thinks this is a
>> kinase domain" aren't different in kind even if they are different in
>> content.
>>
>> As resequencing projects multiply, the ability to create arbitrary
>> meta tags, attach them to different types of objects, and use those
>> tags to link them together will become desirable, if not essential.
>>
>> Keeping a common interface to all of these meta data types would be
>> advantageous, plus new users won't have to determine whether they
>> need to use Bio::Meta objects or Bio::Annotation objects.
>>
>> So I would argue for all of the meta data types to live "under one
>> roof". Which roof isn't as important. Bio::Annotation, since it
>> already exists for today's meta data, seems like a reasonable choice.
>> (assuming Annotation objects are flexible enough to be extended as
>> you propose)
>>
>> There, and no flames or jibes even. :)
>
> I guess what I want to know is whether there should be a
> distinction between 'normal' sequence annotation (comments,
> references, and so on) and annotation that could be best described as
> position-specific (like RNA or protein structural annotation).  The
> current meta implementation is for sequence data only; I felt it
> would be nice to have a generic implementation that would be
> applicable to any object data.

my stream-of-consciousness for right now:

I was thinking Bio::Annotation is where this should go - that system
doesn't have anything about it that makes it explicitly sequence
related. What we're trying to hammer out here on the Alignment side -
which fits with your RNA example - is to have features, basically
SeqFeatures, associated with alignments so columns can be annotated
to cover things like character sets and partitions for phylogenetic
analyses.  As for data which annotates non-contiguous things like
RNA stems, we may have to be more creative about that or model it with
a splitLocation.

So currently we've added code so that an Alignment is-a
Bio::AnnotatableI and is-a Bio::FeatureHolderI to move towards this
end, with the goal of being able to capture more of the data that can  
be represented in a NEXUS file.

It feels more like a hack than an elegant metadata solution, but I
am not totally sure what kind of data you are thinking about at this
point; perhaps I need to spend more time thinking about it.
Or are you worried that the semantic mapping of the data into
features or annotations will confuse users?


From jason at bioperl.org  Fri Dec 15 09:48:32 2006
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 15 Dec 2006 09:48:32 -0500
Subject: [Bioperl-l] needle parser in bioperl?
In-Reply-To: <764978cf0612150446r46e5f64tc6bf0b198cf618c5@mail.gmail.com>
References: <764978cf0612142121s547a54dbu54b839f71d171f81@mail.gmail.com>
	<B4B8F9CCEDA9334F819017E5D711AD1C32C66A@bllmail.bll.n-i.nhs.uk>
	<764978cf0612150446r46e5f64tc6bf0b198cf618c5@mail.gmail.com>
Message-ID: <42CB9018-72CD-433E-A42F-152D63D2F584@bioperl.org>

I get the impression you are trying to use the wrong tool for the  
job.  Can you explain a little more generally what you want to do?

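Semantically, FASTA in Bio::SearchIO is very different from FASTA in
Bio::AlignIO: SearchIO's 'fasta' format is the report produced by the
FASTA search programs, while AlignIO's 'fasta' format is a set of
aligned (gapped) sequences.  We explain this on the wiki; please have
a look at the FASTA page.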

  - Do not use Bio::SearchIO to parse multi-FASTA alignment output;
    Bio::SearchIO is for pairwise alignment reports.
  - Use Bio::AlignIO for multi-FASTA or MSF alignments; you just
    provide a different value to '-format'.

But none of that is going to help you get start/end for your
alignment, because that is not part of the output format.  Do the
experiment of looking at the file and figuring out which fields you
actually want output; if they don't exist, then either you have a
format that won't work for your question, or you will have to
calculate the additional information yourself.  If you are trying to
align transcripts to a genome, please consider tools that are built
for it (and referenced on the wiki), like Sim4, est2genome, exonerate
and BLAT.
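
As an illustration of "calculate it yourself", the sketch below maps where
a gapped target row starts and ends on the template row of a pairwise
alignment. It assumes you already have both rows as plain strings (for
example via $aln->each_seq and $seq->seq after reading needle's MSF or
FASTA output with Bio::AlignIO) and that gaps are written as '-' or '.'.
The function name template_span is hypothetical, not a BioPerl method.

```perl
use strict;
use warnings;

# Given two equal-length gapped rows, return the template coordinates of the
# first and last template residues covered by the target, or an empty list
# if the target row is all gaps. If the template has a gap at the first
# target column, the start points at the next template residue.
sub template_span {
    my ($template_row, $target_row, $template_start) = @_;
    $template_start = 1 unless defined $template_start;
    die "alignment rows differ in length\n"
        unless length($template_row) == length($target_row);

    my $is_gap = qr/[-.]/;
    my ($first_col, $last_col);
    for my $col (0 .. length($target_row) - 1) {
        next if substr($target_row, $col, 1) =~ $is_gap;
        $first_col = $col unless defined $first_col;
        $last_col  = $col;
    }
    return unless defined $first_col;

    # Count template residues (non-gap characters) up to each column.
    my $before_first = () = substr($template_row, 0, $first_col) =~ /[^-.]/g;
    my $through_last = () = substr($template_row, 0, $last_col + 1) =~ /[^-.]/g;

    return ($template_start + $before_first,
            $template_start + $through_last - 1);
}

my ($start, $end) = template_span('ACGTACGTAC', '----ACG---');
print "target covers template $start..$end\n";    # target covers template 5..7
```

For a global needle alignment both rows span the full input sequences, so
the template start is normally 1.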

-jason

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html


From lubapardo at gmail.com  Fri Dec 15 11:39:11 2006
From: lubapardo at gmail.com (Luba Pardo)
Date: Fri, 15 Dec 2006 17:39:11 +0100
Subject: [Bioperl-l] NO BLAST
Message-ID: <58ff33550612150839i40409b06pe427bcd77d3f208@mail.gmail.com>

Hello,
I am having trouble using the module Bio::Tools::Run::StandAloneBlast.

I got the following error message: cannot find path to blastall.
The code I used (modified from the HOWTO:Beginners page) is:

#! /local/bin/perl -w

use strict;

use Bio::Seq;
use Bio::SeqIO;
use Bio::DB::GenBank;
use Bio::Tools::Run::StandAloneBlast;

my $db_object = Bio::DB::GenBank->new;

#my $seq_ob = $db_object->get_Seq_by_id('NM_004043');
#my $seq = Bio::SeqIO->new(-file => ">out.fasta", -format => 'fasta');
#$seq->write_seq($seq_ob);

my @params = (program  => 'blastn',
              database => 'db.fa');

my $blast_obj = Bio::Tools::Run::StandAloneBlast->new(@params);

my $seq_obj = Bio::Seq->new(-id  => "testquery",
                            -seq => "TTTAAATATATTTTGAAGTATAGATTATATGTT");

my $report_obj = $blast_obj->blastall($seq_obj);
my $result_obj = $report_obj->next_result;

print $result_obj->num_hits, "\n";

Whether I create a sequence de novo or retrieve one from the Internet, I get
the same message.
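
The "cannot find path to blastall" message suggests the blastall
executable is not where StandAloneBlast looks for it; the module's
documentation points to the BLASTDIR environment variable and the
directories on PATH for this. The hypothetical helper below is only a
diagnostic sketch that repeats that kind of search by hand on a Unix-like
system (it looks for a file named exactly 'blastall', so Windows would
need 'blastall.exe'); it is not the code StandAloneBlast runs internally.

```perl
use strict;
use warnings;
use File::Spec;

# Look for an executable 'blastall' in $ENV{BLASTDIR} first, then in each
# directory on PATH. Returns the full path, or undef if nothing is found.
sub find_blastall {
    my @dirs;
    push @dirs, $ENV{BLASTDIR} if defined $ENV{BLASTDIR} && length $ENV{BLASTDIR};
    push @dirs, File::Spec->path;
    for my $dir (@dirs) {
        my $candidate = File::Spec->catfile($dir, 'blastall');
        return $candidate if -f $candidate && -x _;
    }
    return;
}

my $found = find_blastall();
print defined $found ? "blastall found at $found\n"
                     : "blastall not found; set BLASTDIR or add it to PATH\n";
```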


From cjfields at uiuc.edu  Fri Dec 15 12:23:27 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 15 Dec 2006 11:23:27 -0600
Subject: [Bioperl-l] Proposal for Meta data
In-Reply-To: <32BE3FCF-C788-438F-8A4A-8A586DD6C569@bioperl.org>
References: <B3EF69DB-9C01-4F42-A4E4-898613D872F9@uiuc.edu>
	<5DB6475C-109D-406D-B4BA-D2248AE3F987@wustl.edu>
	<9F172B90-B065-4A42-A54F-140360132B3B@uiuc.edu>
	<32BE3FCF-C788-438F-8A4A-8A586DD6C569@bioperl.org>
Message-ID: <F302B7DD-C806-4A6F-ACDF-9F27A84E0BF0@uiuc.edu>


On Dec 15, 2006, at 8:28 AM, Jason Stajich wrote:

>
> On Dec 14, 2006, at 9:21 PM, Chris Fields wrote:
>
>>
>> On Dec 14, 2006, at 7:45 PM, David Messina wrote:
>>
>>> Hey Chris,
>>>
>>> My thoughts below.
>>>
>>>> [Chris]
>>>> This could be used to annotate any
>>>> PrimarySeq, LocatableSeq, SimpleAlign, SeqFeature, or what-have-you,
>>>> maybe in a collection (similar to AnnotationCollection).  I thought
>>>> something like this may be of general use for any PrimarySeq
>>>> (quality, structure), alignments like NEXUS and Stockholm,
>>>> SeqFeatures where structure could be stored (tRNA or riboswitches),
>>>> etc.
>>>>
>>>> However, this also seems to fall into the category of sequence
>>>> annotation.  So, would it be better to have a set of  
>>>> Bio::Annotation
>>>> classes used for this purpose?
>>>
>>>
>>> To me, all meta data is equal. That is, your classic Genbank feature
>>> annotation and a user's arbitrary meta-tag like "Bob thinks this  
>>> is a
>>> kinase domain" aren't different in kind even if they are  
>>> different in
>>> content.
>>>
>>> As resequencing projects multiply, the ability to create arbitrary
>>> meta tags, attach them to different types of objects, and use those
>>> tags to link them together will become desirable, if not essential.
>>>
>>> Keeping a common interface to all of these meta data types would be
>>> advantageous, plus new users won't have to determine whether they
>>> need to use Bio::Meta objects or Bio::Annotation objects.
>>>
>>> So I would argue for all of the meta data types to live "under one
>>> roof". Which roof isn't as important. Bio::Annotation, since it
>>> already exists for today's meta data, seems like a reasonable  
>>> choice.
>>> (assuming Annotation objects are flexible enough to be extended as
>>> you propose)
>>>
>>> There, and no flames or jibes even. :)
>>
>> I guess what I want to know is whether there should be a
>> distinction between 'normal' sequence annotation (comments,
>> references, and so on) and annotation that could be best described as
>> position-specific (like RNA or protein structural annotation).  The
>> current meta implementation is for sequence data only; I felt it
>> would be nice to have a generic implementation that would be
>> applicable to any object data.
>
> my stream-of-consciousness for right now:
>
> I was thinking Bio::Annotation is where this should go - that
> system doesn't have anything about it that makes it explicitly
> sequence related. What we're trying to hammer out here on the
> Alignment side - which fits with your RNA example - is to have
> features, basically SeqFeatures, associated with alignments so
> columns can be annotated to cover things like character sets and
> partitions for phylogenetic analyses.  As for data which annotates
> non-contiguous things like RNA stems, we may have to be more
> creative about that or model it with a splitLocation.
>
> So currently we've added code so that an Alignment is-a
> Bio::AnnotatableI and is-a Bio::FeatureHolderI to move towards this
> end, with the goal of being able to capture more of the data that  
> can be represented in a NEXUS file.
>
> It feels more like a hack than an elegant metadata solution, but I
> am not totally sure what kind of data you are thinking about at
> this point; perhaps I need to spend more time thinking about it.
> Or are you worried that the semantic mapping of the data into
> features or annotations will confuse users?

Sorry in advance for the longish response here...

My original thought was to have a generic abstract class capable of  
positionally describing data in any other class, similar to
Heikki's Bio::Seq::MetaI but not constrained to sequence data only.   
Implementing classes would be capable of having different data  
structures based on their use (simple string, array, AoA, AoH, AoO).   
One MetaCollection class to contain them all in a tag-like system, so  
you could have mixed data types describe the same object.  The latter  
Collection class is so similar to AnnotationCollection that I agree  
Bio::Annotation would be the best place for this.

The way I reconfigured Stockholm alignment parsing/writing is to use  
Bio::Seq::Meta objects (which are LocatableSeq).  Each Seq::Meta is  
capable of holding a sequence and several meta strings, stored as  
tags or 'names'.  However, there is no Meta object for alignments  
(for RNA/protein structure consensus and other Rfam/Pfam markup); I  
hacked around this by using a Bio::Seq::Meta w/o a seq, but I would  
rather have a generic Meta object independent of the sequence cruft.

So for this partial Pfam alignment,

Q92SV1_RHIME/122-299         LAMALNLARGI...VDADVDF..REG
#=GR Q92SV1_RHIME/122-299 pAS .........................
Q883D2_PSESM/110-290         LGLMLGLRRRL...FDGNGAV..KRS
Q8ZXP5_PYRAE/91-262          LALLLAPYKRI...IQYGEKM..KRG
#=GR Q8ZXP5_PYRAE/91-262 SS  HHHHHHHHTTH...HHHHHHX..HTT
#=GR Q8ZXP5_PYRAE/91-262 SA  00000000000...120030X..474
#=GC SS_cons                 HHHHHHHHTTH...HHHHHHH..HTT
#=GC SA_cons                 03002200312...1312414..676
#=GC seq_cons                luhhLuhsRpl...hthppth..+pG
//

'#=GC' lines would be in generic meta string objects in the  
alignment, while '#=GR' tags would be in similar meta objects in the  
relevant sequences.  As long as both aren't AnnotatableI this isn't  
an issue.
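
The split described above, with #=GC markup belonging to the alignment
and #=GR markup belonging to individual sequences, can be sketched in
plain Perl as tag-keyed strings, roughly the "String" branch of the scheme
below. The helper collect_column_markup is hypothetical and is not
BioPerl's Stockholm parser (Bio::AlignIO::stockholm); it assumes each
markup value contains no internal whitespace.

```perl
use strict;
use warnings;

# '#=GC <tag> <text>' annotates the whole alignment;
# '#=GR <seqname> <tag> <text>' annotates one sequence.
# Text for a repeated tag is concatenated, so interleaved blocks also work.
# Ordinary sequence lines are ignored.
sub collect_column_markup {
    my @lines = @_;
    my (%aln_meta, %seq_meta);
    for my $line (@lines) {
        if ($line =~ /^#=GC\s+(\S+)\s+(\S+)\s*$/) {
            $aln_meta{$1} .= $2;
        }
        elsif ($line =~ /^#=GR\s+(\S+)\s+(\S+)\s+(\S+)\s*$/) {
            $seq_meta{$1}{$2} .= $3;
        }
    }
    return (\%aln_meta, \%seq_meta);
}

my ($aln_meta, $seq_meta) = collect_column_markup(
    '#=GC SS_cons                 HHHHHHHHTTH...HHHHHHH..HTT',
    '#=GR Q8ZXP5_PYRAE/91-262 SS  HHHHHHHHTTH...HHHHHHX..HTT',
);
print "$aln_meta->{SS_cons}\n";                      # HHHHHHHHTTH...HHHHHHH..HTT
print "$seq_meta->{'Q8ZXP5_PYRAE/91-262'}{SS}\n";    # HHHHHHHHTTH...HHHHHHX..HTT
```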

Similarly, NEXUS files which contained any position-based values  
could hold a meta string/array object in a similar tag.

The basic scheme is:

                     |--String
                     |
Annotation::Meta----|--Array
                     |
                     |--HorriblyComplexDataStruct

Then I started thinking about where this could be applied, and  
whether a true Meta object needs to be constrained only to describing  
position-based data.  This somewhat relates to this bug:

http://bugzilla.open-bio.org/show_bug.cgi?id=1825

which seems to need a simple but unconstrained hash-of-arrays-based  
meta object.

Then my head appropriately exploded...

Hope everything is going well at the hackathon!  Looks like some  
interesting stuff coming out of it.

chris


From cjfields at uiuc.edu  Fri Dec 15 12:49:45 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 15 Dec 2006 11:49:45 -0600
Subject: [Bioperl-l] Bio::SeqFeature::Annotated and mandatory
	type	checking
In-Reply-To: <1166158897.2569.335.camel@localhost.localdomain>
References: <637A2459-4115-466F-BD8D-036D5E9114F8@cshl.edu>
	<4581CCEB.20206@sendu.me.uk>
	<1166158897.2569.335.camel@localhost.localdomain>
Message-ID: <9B984087-C843-440A-B3E1-F7DEC65160E7@uiuc.edu>

On Dec 14, 2006, at 11:01 PM, Scott Cain wrote:

> As much as I would like to take credit for this :-)  Allen Day  
> wrote the
> original code, and then Chris Fields tried to fix it so that it  
> actually
> worked :-)  I think it would be a good idea to have a validate_terms
> option like Bio::FeatureIO::gff.
>
> Scott

I did ?!?  I committed a bug fix a while back:

Revision 1.34
Sun Jul 23 18:00:50 2006 UTC (4 months, 3 weeks ago) by cjfields
Branch: MAIN
CVS Tags: branch-experimental
Branch point for: branch-1-5-2
Changes since 1.33: +155 -33 lines

Bug 2026; Robert's enhancements

To tell the truth I don't know if this is where the mandatory checks
were added in; I'm not too familiar with SeqFeature::Annotated yet.

I agree with Scott (and Matthew) that SOFA checks should be  
optional.  Matthew, can you write up a patch and maybe some tests?

chris


From stewarta at nmrc.navy.mil  Thu Dec 14 18:30:11 2006
From: stewarta at nmrc.navy.mil (Andrew Stewart)
Date: Thu, 14 Dec 2006 18:30:11 -0500
Subject: [Bioperl-l] Bio::SearchIO::blast::next_result exception thrown
Message-ID: <968A2A44-82C5-4505-8F50-ABC4D57171F3@nmrc.navy.mil>

I'm getting the following exception...

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: no data for midline     Posted date:  Dec 14, 2006  2:52 PM
STACK: Error::throw
STACK: Bio::Root::Root::throw /sw/lib/perl5/5.8.6/Bio/Root/Root.pm:328
STACK: Bio::SearchIO::blast::next_result /sw/lib/perl5/5.8.6/Bio/SearchIO/blast.pm:1172
STACK: main::process_reports ./new_blast_script.pl:254
STACK: ./new_blast_script.pl:132
-----------------------------------------------------------


next_result is a pretty dense chunk of code to decipher.  I was  
wondering if anyone more familiar with that code might know what the  
"no data for midline $_" exception is referring to?

For context:

    1161                if( /^((Query|Sbjct):?\s+(\-?\d+)\s*)(\S+)\s+(\-?\d+)/ ) {
    1162                    my ($full,$type,$start,$str,$end) = ($1,$2,$3,$4,$5);
    1163                    if( $str eq '-' ) {
    1164                        $i = 3 if $type eq 'Sbjct';
    1165                    } else {
    1166                        $data{$type} = $str;
    1167                    }
    1168                    $len = length($full);
    1169                    $self->{"\_$type"}->{'begin'} = $start unless $self->{"_$type"}->{'begin'};
    1170                    $self->{"\_$type"}->{'end'} = $end;
    1171                } else {
    1172                    $self->throw("no data for midline $_")
    1173                        unless (defined $_ && defined $len);
    1174                    $data{'Mid'} = substr($_,$len);
    1175                }
    1175                }


--
Andrew Stewart
Research Assistant, Genomics Team
Navy Medical Research Center (NMRC)
Biological Defense Research Directorate (BDRD)
BDRD Annex
12300 Washington Avenue, 2nd Floor
Rockville, MD 20852

email: stewarta at nmrc.navy.mil
phone: 301-231-6700 Ext 270


From jason at bioperl.org  Fri Dec 15 13:56:13 2006
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 15 Dec 2006 13:56:13 -0500
Subject: [Bioperl-l] Bio::SearchIO::blast::next_result exception thrown
In-Reply-To: <968A2A44-82C5-4505-8F50-ABC4D57171F3@nmrc.navy.mil>
References: <968A2A44-82C5-4505-8F50-ABC4D57171F3@nmrc.navy.mil>
Message-ID: <B07BB616-28A3-435A-9C43-38CEF0F01E53@bioperl.org>

It means it is expecting an alignment block of data and there is none
(or there is none in the context where it expects one), so something
in the report is tripping up the parser.
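
This can be seen in the excerpt Andrew posted: a line inside an alignment
block either matches the Query/Sbjct pattern, which records the label
width in $len, or it is treated as the midline and cut at that width. The
hypothetical helper below re-creates just that branch with the same
regular expression; it is a sketch for understanding the message, not the
parser itself. It shows why a stray line such as the 'Posted date' one in
the exception text, arriving when no Query/Sbjct row has set $len, ends in
"no data for midline".

```perl
use strict;
use warnings;

# Re-creates only the branch shown around line 1172 of the excerpt:
# a Query/Sbjct row records the width of its label; any other line is
# treated as the midline and needs that width to be known already.
sub classify_alignment_line {
    my ($line, $label_width) = @_;
    if ($line =~ /^((Query|Sbjct):?\s+(-?\d+)\s*)(\S+)\s+(-?\d+)/) {
        return ('row', length $1);    # the parser stores this width in $len
    }
    # The real parser throws "no data for midline" in this situation.
    return ('error', undef) unless defined $label_width;
    return ('midline', $label_width);
}

my ($kind, $width) = classify_alignment_line('Query: 1   ACGTACGT 8');
print "$kind $width\n";    # row 11
($kind) = classify_alignment_line('    Posted date:  Dec 14, 2006  2:52 PM');
print "$kind\n";           # error
```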

I'm not sure reading the code is going to help you; what someone
will have to do is figure out what is different about this report
compared with reports that the parser does handle.
You'll do better if you just submit an example report that is
failing as a bug report.

Providing the version of BLAST you are using and the version of BioPerl
will help.  I seem to remember NCBI changing the BLAST text format, which
will break the parser if it is a significant change.

As has been mentioned in the past, this playing cat and mouse with
format changes means things will periodically break. If you need
something rock-solid that will always work, I guess XML is the better
route to go.

-jason


From cjfields at uiuc.edu  Fri Dec 15 14:21:32 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 15 Dec 2006 13:21:32 -0600
Subject: [Bioperl-l] Bio::SearchIO::blast::next_result exception thrown
In-Reply-To: <B07BB616-28A3-435A-9C43-38CEF0F01E53@bioperl.org>
References: <968A2A44-82C5-4505-8F50-ABC4D57171F3@nmrc.navy.mil>
	<B07BB616-28A3-435A-9C43-38CEF0F01E53@bioperl.org>
Message-ID: <6A0D17FA-CB98-4937-998E-11B87FB9CBBD@uiuc.edu>


On Dec 15, 2006, at 12:56 PM, Jason Stajich wrote:

> It means it is expecting alignment block of data and there is none
> (or there is none in the context it is expecting it) - so something
> is wrong with the report as it gets tripped up.
>
> I'm not sure reading the code is going to help you - what someone
> will have to do is figure out what is different about this report
> than reports that do work for the parser.
> You'll do better if you just provide an example report that is
> failing as a bug report.
>
> Providing the version of BLAST you are using and the version of bioperl
> will help.  I seem to remember NCBI changing the BLAST text format, which
> will break the parser if the change is significant.
>
> As has been mentioned in the past, playing cat and mouse with
> format changes means things will periodically break. If you need
> rock-solid parsing that will always work, I guess XML is the better route to go.
>
> -jason

I agree that XML is the only reliable way to go, though I have been
reading on the BioPython group about some issues with newer (2.2.13
or greater) BLAST XML output for reports with multiple BLAST
queries.  I don't know whether this affects Bioperl.

As for the 'midline' error, there was a similar error a while back  
(fixed for the 1.5.2 release) that had to do with extra lines in the  
alignment section in some BLAST reports.  Unless we have a demo BLAST  
report and sample code we can't do much about it (we need to  
reproduce the error in order to fix it), so the best thing to do is
to file a bug report.
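Since XML is suggested as the more robust route, a script that may be handed either kind of report could pick the Bio::SearchIO format by peeking at the start of the file. A minimal sketch, assuming XML reports were produced with blastall -m 7 and therefore begin with an XML declaration; guess_searchio_format is a hypothetical helper, not part of BioPerl.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): look at the first non-blank
# line of a BLAST report and return the Bio::SearchIO format name to use.
# Assumes XML output (blastall -m 7) starts with an "<?xml" declaration;
# anything else is treated as the plain-text report format.
sub guess_searchio_format {
    my ($fh) = @_;
    while (my $line = <$fh>) {
        next if $line =~ /^\s*$/;    # skip leading blank lines
        return $line =~ /^\s*<\?xml/ ? 'blastxml' : 'blast';
    }
    return;                          # empty input: no guess
}
```

The returned name would then be passed as -format to Bio::SearchIO->new together with -file. The helper consumes lines from the handle, so open the report again (or seek back to the start) before parsing it.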

chris

> On Dec 14, 2006, at 6:30 PM, Andrew Stewart wrote:
>
>> I'm getting the following exception...
>>
>> ------------- EXCEPTION: Bio::Root::Exception -------------
>> MSG: no data for midline     Posted date:  Dec 14, 2006  2:52 PM
>> STACK: Error::throw
>> STACK: Bio::Root::Root::throw /sw/lib/perl5/5.8.6/Bio/Root/Root.pm:328
>> STACK: Bio::SearchIO::blast::next_result /sw/lib/perl5/5.8.6/Bio/SearchIO/blast.pm:1172
>> STACK: main::process_reports ./new_blast_script.pl:254
>> STACK: ./new_blast_script.pl:132
>> -----------------------------------------------------------
>>
>>
>> next_result is a pretty dense chunk of code to decipher.  I was
>> wondering if anyone more familiar with that code might know what the
>> "no data for midline $_" exception is referring to?
>>
>>
>> --
>> Andrew Stewart
>> Research Assistant, Genomics Team
>> Navy Medical Research Center (NMRC)
>> Biological Defense Research Directorate (BDRD)
>> BDRD Annex
>> 12300 Washington Avenue, 2nd Floor
>> Rockville, MD 20852
>>
>> email: stewarta at nmrc.navy.mil
>> phone: 301-231-6700 Ext 270
>>
>>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From vaughn at cshl.edu  Fri Dec 15 13:05:47 2006
From: vaughn at cshl.edu (Matthew Vaughn)
Date: Fri, 15 Dec 2006 13:05:47 -0500
Subject: [Bioperl-l] Bio::SeqFeature::Annotated and mandatory type
	checking
In-Reply-To: <9B984087-C843-440A-B3E1-F7DEC65160E7@uiuc.edu>
References: <637A2459-4115-466F-BD8D-036D5E9114F8@cshl.edu>
	<4581CCEB.20206@sendu.me.uk>
	<1166158897.2569.335.camel@localhost.localdomain>
	<9B984087-C843-440A-B3E1-F7DEC65160E7@uiuc.edu>
Message-ID: <ed625e0e0612151005o2641f019ndb5cf0ac6582e2d6@mail.gmail.com>

Yes, I will. I am working on it today. It's a little more complicated
to fix this than I expected because SeqFeature::Annotated->type()
returns a Bio::AnnotationI rather than a simple scalar as it used
to.

On 12/15/06, Chris Fields <cjfields at uiuc.edu> wrote:
> On Dec 14, 2006, at 11:01 PM, Scott Cain wrote:
>
> > As much as I would like to take credit for this :-)  Allen Day
> > wrote the original code, and then Chris Fields tried to fix it so
> > that it actually worked :-)  I think it would be a good idea to have
> > a validate_terms option like Bio::FeatureIO::gff.
> >
> > Scott
>
> I did ?!?  I committed a bug fix a while back:
>
> Revision 1.34
> Sun Jul 23 18:00:50 2006 UTC (4 months, 3 weeks ago) by cjfields
> Branch: MAIN
> CVS Tags: branch-experimental
> Branch point for: branch-1-5-2
> Changes since 1.33: +155 -33 lines
>
> Bug 2026; Robert's enhancements
>
> To tell the truth I don't know if this is where the mandatory checks
> were added in; I'm not too familiar with SeqFeature::Annotated yet.
>
> I agree with Scott (and Matthew) that SOFA checks should be
> optional.  Matthew, can you write up a patch and maybe some tests?
>
> chris
>
>
>
>


From valiente at lsi.upc.edu  Fri Dec 15 19:45:27 2006
From: valiente at lsi.upc.edu (Gabriel Valiente)
Date: Sat, 16 Dec 2006 01:45:27 +0100
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110
	species
In-Reply-To: <4577EFD3.7090904@sendu.me.uk>
References: <F5C5C9A8-D444-4994-9769-AC5DE68F4A39@lsi.upc.edu>
	<68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu>
	<4577E4A2.5090303@sendu.me.uk>
	<B290BEF7-81D6-4C0A-9EDA-348B8A75EEC8@lsi.upc.edu>
	<4577EAAF.7030509@sendu.me.uk>
	<0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu>
	<4577EFD3.7090904@sendu.me.uk>
Message-ID: <250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu>

> I don't think that can be true. Your error message contains 'Must supply
> a Bio::Taxon'. Bio::Taxon only exists in 1.5.2 (or cvs live).
>
> If you uninstall the fink installation and install 1.5.2 using cpan
> (with root privileges by going sudo cpan) that should at least get
> rid of the error messages...
>
>
>> The tree is not correct (I've parsed it from R to have a double
>> check) but don't know yet what the problem is with it.
>
> ... But if the tree is wrong anyway... Let me know what you find out.

I've uninstalled the fink installation and used the cvs instead, and  
the error message is gone. However, on a larger set of 190 species,  
which are all present in the NCBI taxonomy, the resulting tree has  
only 178 taxa. I suspect something must be wrong with the
merge_lineage method in the major rewrite of the taxonomy2tree  
script. Can someone please check this? I'm attaching the 190 species  
call to the script. Thanks,

Gabriel

-------------- next part --------------
A non-text attachment was scrubbed...
Name: fetch-bork.sh
Type: application/octet-stream
Size: 7378 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061216/5e392593/attachment-0003.obj>

From lincoln.stein at gmail.com  Fri Dec 15 11:02:27 2006
From: lincoln.stein at gmail.com (Lincoln Stein)
Date: Fri, 15 Dec 2006 11:02:27 -0500
Subject: [Bioperl-l] [Gmod-gbrowse] xyplot data alignment problem?
In-Reply-To: <6dce9a0b0612141356u63afe2dak7e1d8dad93408312@mail.gmail.com>
References: <EA0BFA4F-8182-4C40-92DA-218CE3F48D8B@genomics.princeton.edu>
	<6dce9a0b0612141356u63afe2dak7e1d8dad93408312@mail.gmail.com>
Message-ID: <6dce9a0b0612150802x354a02a8ib17fbd882379c63c@mail.gmail.com>

This is very embarrassing for me, particularly since I spent a lot of time
validating that Bio::Graphics was working properly before the 1.5.2 release
went out. How long before there is a 1.5.3 release? How about a 1.5.2.1 release?

Lincoln

On 12/14/06, Lincoln Stein <lincoln.stein at gmail.com> wrote:
>
> Hi All,
>
> I'm afraid that the xyplot glyph that is in the recent bioperl release has
> an error that causes the content to be printed to the right of the correct
> position. Unfortunately this wasn't caught before the release because the
> glyph was only tested on very large (whole genome) features.
>
> You will need to do a CVS update to get a fixed version from bioperl-live.
> A future bugfix release of gbrowse will patch this glyph for you
> automatically.
>
> Lincoln
>
> On 12/12/06, Kara Dolinski <kara at genomics.princeton.edu> wrote:
> >
> > Hi,
> > I'm having a problem getting features and an xyplot properly aligned in
> > Gbrowse.  For example, see this page:
> >
> > http://tinyurl.com/ylbq3q
> >
> > The feature in the "CENPK SNPs" track should actually be around the peak
> > of the graph in the "CENPK prediction signal" xyplot, i.e. the SNP
> > feature is at position 79, and the xyplot axes and data should span from
> > 61 - 95.  However, as you can see, the data in the xyplot are oddly
> > separated from the axes (which seem to be in the correct place), with the
> > data shifted over to about position 120-155.
> > This occurs elsewhere, not just at the ends of the chromosomes.
> >
> > When I zoom to ~80 bp, all is well, see:
> >
> > http://tinyurl.com/yzav8k
> >
> > The relevant snippets from the GFF and the config files are below.
> >
> > Thanks!
> > Kara
> >
> > GFF:
> >
> > chrI SNPScanner CENPK_GRAPH 61 95 41.9883 . . ID=CENPK_all_peaks;Name=CENPK_peak0;PEAK=peak0;Note=score is 41.9883
> > chrI SNPScanner CENPK_CALL 79 79 41.9883 . . ID=CENPK_all_peaks;Name=CENPK_peak0;PEAK=peak0;Note=score is 41.9883
> > chrI SNPScanner CENPK_SCORE 61 61 2.24506 . . ID=CENPK_all_peaks;Name=chrI61;PEAK=peak0;Note=score is 2.24506
> > chrI SNPScanner CENPK_SCORE 62 62 3.26837 . . ID=CENPK_all_peaks;Name=chrI62;PEAK=peak0;Note=score is 3.26837
> > chrI SNPScanner CENPK_SCORE 63 63 1.39938 . . ID=CENPK_all_peaks;Name=chrI63;PEAK=peak0;Note=score is 1.39938
> > chrI SNPScanner CENPK_SCORE 64 64 1.4039 . . ID=CENPK_all_peaks;Name=chrI64;PEAK=peak0;Note=score is 1.4039
> > chrI SNPScanner CENPK_SCORE 65 65 9.16134 . . ID=CENPK_all_peaks;Name=chrI65;PEAK=peak0;Note=score is 9.16134
> > chrI SNPScanner CENPK_SCORE 66 66 10.1413 . . ID=CENPK_all_peaks;Name=chrI66;PEAK=peak0;Note=score is 10.1413
> > chrI SNPScanner CENPK_SCORE 67 67 12.9256 . . ID=CENPK_all_peaks;Name=chrI67;PEAK=peak0;Note=score is 12.9256
> > chrI SNPScanner CENPK_SCORE 68 68 13.195 . . ID=CENPK_all_peaks;Name=chrI68;PEAK=peak0;Note=score is 13.195
> > chrI SNPScanner CENPK_SCORE 69 69 22.7127 . . ID=CENPK_all_peaks;Name=chrI69;PEAK=peak0;Note=score is 22.7127
> > chrI SNPScanner CENPK_SCORE 70 70 23.8289 . . ID=CENPK_all_peaks;Name=chrI70;PEAK=peak0;Note=score is 23.8289
> > chrI SNPScanner CENPK_SCORE 71 71 21.9123 . . ID=CENPK_all_peaks;Name=chrI71;PEAK=peak0;Note=score is 21.9123
> > chrI SNPScanner CENPK_SCORE 72 72 28.3344 . . ID=CENPK_all_peaks;Name=chrI72;PEAK=peak0;Note=score is 28.3344
> > chrI SNPScanner CENPK_SCORE 73 73 35.0436 . . ID=CENPK_all_peaks;Name=chrI73;PEAK=peak0;Note=score is 35.0436
> > chrI SNPScanner CENPK_SCORE 74 74 37.361 . . ID=CENPK_all_peaks;Name=chrI74;PEAK=peak0;Note=score is 37.361
> > chrI SNPScanner CENPK_SCORE 75 75 39.5408 . . ID=CENPK_all_peaks;Name=chrI75;PEAK=peak0;Note=score is 39.5408
> > chrI SNPScanner CENPK_SCORE 76 76 28.2008 . . ID=CENPK_all_peaks;Name=chrI76;PEAK=peak0;Note=score is 28.2008
> > chrI SNPScanner CENPK_SCORE 77 77 32.6254 . . ID=CENPK_all_peaks;Name=chrI77;PEAK=peak0;Note=score is 32.6254
> > chrI SNPScanner CENPK_SCORE 78 78 36.0832 . . ID=CENPK_all_peaks;Name=chrI78;PEAK=peak0;Note=score is 36.0832
> > chrI SNPScanner CENPK_SCORE 79 79 41.9883 . . ID=CENPK_all_peaks;Name=chrI79;PEAK=peak0;Note=score is 41.9883
> > chrI SNPScanner CENPK_SCORE 80 80 32.1205 . . ID=CENPK_all_peaks;Name=chrI80;PEAK=peak0;Note=score is 32.1205
> > chrI SNPScanner CENPK_SCORE 81 81 41.3048 . . ID=CENPK_all_peaks;Name=chrI81;PEAK=peak0;Note=score is 41.3048
> > chrI SNPScanner CENPK_SCORE 82 82 30.7975 . . ID=CENPK_all_peaks;Name=chrI82;PEAK=peak0;Note=score is 30.7975
> > chrI SNPScanner CENPK_SCORE 83 83 29.4282 . . ID=CENPK_all_peaks;Name=chrI83;PEAK=peak0;Note=score is 29.4282
> > chrI SNPScanner CENPK_SCORE 84 84 35.3586 . . ID=CENPK_all_peaks;Name=chrI84;PEAK=peak0;Note=score is 35.3586
> > chrI SNPScanner CENPK_SCORE 85 85 34.1426 . . ID=CENPK_all_peaks;Name=chrI85;PEAK=peak0;Note=score is 34.1426
> > chrI SNPScanner CENPK_SCORE 86 86 30.2966 . . ID=CENPK_all_peaks;Name=chrI86;PEAK=peak0;Note=score is 30.2966
> > chrI SNPScanner CENPK_SCORE 87 87 17.8402 . . ID=CENPK_all_peaks;Name=chrI87;PEAK=peak0;Note=score is 17.8402
> > chrI SNPScanner CENPK_SCORE 88 88 15.2637 . . ID=CENPK_all_peaks;Name=chrI88;PEAK=peak0;Note=score is 15.2637
> > chrI SNPScanner CENPK_SCORE 89 89 12.657 . . ID=CENPK_all_peaks;Name=chrI89;PEAK=peak0;Note=score is 12.657
> > chrI SNPScanner CENPK_SCORE 90 90 10.2033 . . ID=CENPK_all_peaks;Name=chrI90;PEAK=peak0;Note=score is 10.2033
> > chrI SNPScanner CENPK_SCORE 91 91 9.40143 . . ID=CENPK_all_peaks;Name=chrI91;PEAK=peak0;Note=score is 9.40143
> > chrI SNPScanner CENPK_SCORE 92 92 6.56273 . . ID=CENPK_all_peaks;Name=chrI92;PEAK=peak0;Note=score is 6.56273
> > chrI SNPScanner CENPK_SCORE 93 93 3.66211 . . ID=CENPK_all_peaks;Name=chrI93;PEAK=peak0;Note=score is 3.66211
> > chrI SNPScanner CENPK_SCORE 94 94 0.394194 . . ID=CENPK_all_peaks;Name=chrI94;PEAK=peak0;Note=score is 0.394194
> >
> > CONFIG:
> >
> >
> > GRAPH_CENPK{CENPK_SCORE/CENPK_GRAPH}
> >
> > [CENPK_all_scores_graph]
> > feature = GRAPH_CENPK:SNPScanner
> > glyph = xyplot
> > graph_type = boxes
> > fgcolor = purple
> > bgcolor = purple
> > height = 100
> > min_score = 0
> > max_score = 110
> > label = 0
> > key = CENPK prediction signal
> > link =
> > category = SNPs: signal graphs
> >
> >
> >
> >
> > _______________________________________________
> > Gmod-gbrowse mailing list
> > Gmod-gbrowse at lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse
> >
> >
> >
>
>
> --
> Lincoln D. Stein
> Cold Spring Harbor Laboratory
> 1 Bungtown Road
> Cold Spring Harbor, NY 11724
> (516) 367-8380 (voice)
> (516) 367-8389 (fax)
> FOR URGENT MESSAGES & SCHEDULING,
> PLEASE CONTACT MY ASSISTANT,
> SANDRA MICHELSEN, AT michelse at cshl.edu
>


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu


From cjfields at uiuc.edu  Sat Dec 16 01:10:07 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 16 Dec 2006 00:10:07 -0600
Subject: [Bioperl-l] [Gmod-gbrowse] xyplot data alignment problem?
In-Reply-To: <6dce9a0b0612150802x354a02a8ib17fbd882379c63c@mail.gmail.com>
References: <EA0BFA4F-8182-4C40-92DA-218CE3F48D8B@genomics.princeton.edu>
	<6dce9a0b0612141356u63afe2dak7e1d8dad93408312@mail.gmail.com>
	<6dce9a0b0612150802x354a02a8ib17fbd882379c63c@mail.gmail.com>
Message-ID: <70A5E333-8CF5-49D3-84AC-7A6A02791B5C@uiuc.edu>

We could feasibly have regular point releases of the 1.5 dev. series  
for bug fixes; I guess it just depends on how often these should come  
out and what critical tests must pass for a release to go forward.   
Sendu's already done a ton of work towards getting BioPerl switched  
over to Module::Build and Test::More, and fixing bugs.  As Hilmar has  
pointed out in the past, this is a developer's series, so not every  
test needs to pass before a release goes out.

When would you like this to go out?

chris

On Dec 15, 2006, at 10:02 AM, Lincoln Stein wrote:

> This is very embarrassing for me, particularly since I spent a lot of time
> validating that Bio::Graphics was working properly before the 1.5.2 release
> went out. How long before there is a 1.5.3 release? How about a 1.5.2.1 release?
>
> Lincoln

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Sat Dec 16 01:28:47 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 16 Dec 2006 00:28:47 -0600
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110
	species
In-Reply-To: <250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu>
References: <F5C5C9A8-D444-4994-9769-AC5DE68F4A39@lsi.upc.edu>
	<68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu>
	<4577E4A2.5090303@sendu.me.uk>
	<B290BEF7-81D6-4C0A-9EDA-348B8A75EEC8@lsi.upc.edu>
	<4577EAAF.7030509@sendu.me.uk>
	<0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu>
	<4577EFD3.7090904@sendu.me.uk>
	<250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu>
Message-ID: <C91DCC7B-E368-475D-B83A-AC301A49624B@uiuc.edu>


On Dec 15, 2006, at 6:45 PM, Gabriel Valiente wrote:

>> I don't think that can be true. Your error message contains 'Must  
>> supply
>> a Bio::Taxon'. Bio::Taxon only exists in 1.5.2 (or cvs live).
>>
>> If you uninstall the fink installation and install 1.5.2 using  
>> cpan (with root privileges by going sudo cpan) that should at  
>> least get rid of the error messages...
>>
>>
>>> The tree is not correct (I've parsed it from R to have a double
>>> check) but don't know yet what the problem is with it.
>>
>> ... But if the tree is wrong anyway... Let me know what you find out.
>
> I've uninstalled the fink installation and used the cvs instead,  
> and the error message is gone. However, on a larger set of 190  
> species, which are all present in the NCBI taxonomy, the resulting  
> tree has only 178 taxa. I suspect, something must be wrong with the  
> merge_lineage method in the major rewrite of the taxonomy2tree  
> script. Can someone please check this? I'm attaching the 190  
> species call to the script. Thanks,
>
> Gabriel

I can confirm that.  It is definitely dropping them in merge_lineage();
if you add a call to get_leaf_nodes() after each merge_lineage() call
to check how many leaves are present, you can see nodes being dropped
along the way.

In taxonomy2tree.pl:

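# $db, @species and $tree are set up earlier in taxonomy2tree.pl
my $ct = 0;                        # iteration counter for the progress output
my ($treect, $mergect) = (0, 0);   # "= 0" alone would only initialize $treect
for my $name (@species) {
   my $ncbi_id = $db->get_taxonid($name);
   if ($ncbi_id) {
     #print "Species: $name\n\tTaxID: $ncbi_id\n";
     #$ids{$ncbi_id}++;
     my $node = $db->get_taxon(-taxonid => $ncbi_id);

     if ($tree) {
       # add this species' lineage to the existing tree
       $tree->merge_lineage($node);
     }
     else {
       # first species: start the tree from its lineage
       $tree = Bio::Tree::Tree->new(-node => $node);
     }
     # leaf count after each merge; a decrease means leaves were lost
     printf("%-3d: Nodes: %-4d\n", $ct, scalar($tree->get_leaf_nodes));
   }
   else {
     warn "no NCBI Taxonomy node for species ", $name, "\n";
   }
   $ct++;
}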
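To pin down where leaves start disappearing, the leaf counts printed by the loop could be collected into an array and scanned afterwards. A minimal sketch; find_leaf_drops is a hypothetical helper (not part of BioPerl or taxonomy2tree.pl), and it assumes that a count going down between consecutive iterations is the symptom worth flagging.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl or taxonomy2tree.pl): given the
# leaf counts recorded after each iteration of the merge loop, return the
# iteration indices at which the count went down compared with the
# previous iteration, i.e. where merge_lineage() appears to drop leaves.
sub find_leaf_drops {
    my @counts = @_;
    my @drops;
    for my $i (1 .. $#counts) {
        push @drops, $i if $counts[$i] < $counts[$i - 1];
    }
    return @drops;
}

# Illustrative counts only: the count goes down at iterations 4 and 6.
my @drops = find_leaf_drops(1, 2, 3, 3, 2, 4, 1);
print "leaf count dropped at iteration(s): @drops\n";
```

Inside the loop, something like push @counts, scalar $tree->get_leaf_nodes; after each merge would collect the counts. A count that merely stays flat is deliberately not flagged, since a species whose lineage is already in the tree may not add a new leaf.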

chris


From bix at sendu.me.uk  Sat Dec 16 09:37:49 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Sat, 16 Dec 2006 14:37:49 +0000
Subject: [Bioperl-l] [Gmod-gbrowse] xyplot data alignment problem?
In-Reply-To: <6dce9a0b0612150802x354a02a8ib17fbd882379c63c@mail.gmail.com>
References: <EA0BFA4F-8182-4C40-92DA-218CE3F48D8B@genomics.princeton.edu>	<6dce9a0b0612141356u63afe2dak7e1d8dad93408312@mail.gmail.com>
	<6dce9a0b0612150802x354a02a8ib17fbd882379c63c@mail.gmail.com>
Message-ID: <458404BD.8030908@sendu.me.uk>

Lincoln Stein wrote:
> This is very embarrassing for me, particularly since I spent a lot of time
> validating that Bio::Graphics was working properly before the 1.5.2 release
> went out. How long before there is a 1.5.3 release? How about a 1.5.2.1 release?

I'm happy to try a point release for critical bug fixes. Why don't you 
commit the necessary fixes to branch-1-5-2 and let me know when you're 
happy, and I'll do 1.5.2.1.


Cheers,
Sendu.


From bix at sendu.me.uk  Sat Dec 16 09:47:57 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Sat, 16 Dec 2006 14:47:57 +0000
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on
	110	species
In-Reply-To: <250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu>
References: <F5C5C9A8-D444-4994-9769-AC5DE68F4A39@lsi.upc.edu>	<68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu>	<4577E4A2.5090303@sendu.me.uk>	<B290BEF7-81D6-4C0A-9EDA-348B8A75EEC8@lsi.upc.edu>	<4577EAAF.7030509@sendu.me.uk>	<0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu>	<4577EFD3.7090904@sendu.me.uk>
	<250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu>
Message-ID: <4584071D.3070005@sendu.me.uk>

Gabriel Valiente wrote:
>> I don't think that can be true. Your error message contains 'Must supply
>> a Bio::Taxon'. Bio::Taxon only exists in 1.5.2 (or cvs live).
>>
>> If you uninstall the fink installation and install 1.5.2 using cpan 
>> (with root privileges by going sudo cpan) that should at least get rid 
>> of the error messages...
>>
>>
>>> The tree is not correct (I've parsed it from R to have a double
>>> check) but don't know yet what the problem is with it.
>>
>> ... But if the tree is wrong anyway... Let me know what you find out.
> 
> I've uninstalled the fink installation and used the cvs instead, and the 
> error message is gone. However, on a larger set of 190 species, which 
> are all present in the NCBI taxonomy, the resulting tree has only 178 
> taxa. I suspect, something must be wrong with the merge_lineage method 
> in the major rewrite of the taxonomy2tree script. Can someone please 
> check this? I'm attaching the 190 species call to the script. Thanks,

Ok, I'll look into it. You're also welcome to see if you can take your 
own code from your original taxonomy2tree script and see if you can 
merge/replace the appropriate Bio::Tree::TreeFunctionsI methods with 
your algorithms to get it working correctly. Indeed, does your original 
version of the script work on this data set?


Cheers,
Sendu.


From cjfields at uiuc.edu  Sat Dec 16 10:18:50 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 16 Dec 2006 09:18:50 -0600
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on
	110	species
In-Reply-To: <4584071D.3070005@sendu.me.uk>
References: <F5C5C9A8-D444-4994-9769-AC5DE68F4A39@lsi.upc.edu>	<68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu>	<4577E4A2.5090303@sendu.me.uk>	<B290BEF7-81D6-4C0A-9EDA-348B8A75EEC8@lsi.upc.edu>	<4577EAAF.7030509@sendu.me.uk>	<0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu>	<4577EFD3.7090904@sendu.me.uk>
	<250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu>
	<4584071D.3070005@sendu.me.uk>
Message-ID: <6AE33842-B2E7-4E9B-B80D-68A058045818@uiuc.edu>


On Dec 16, 2006, at 8:47 AM, Sendu Bala wrote:

> Gabriel Valiente wrote:
>>> I don't think that can be true. Your error message contains 'Must supply
>>> a Bio::Taxon'. Bio::Taxon only exists in 1.5.2 (or cvs live).
>>>
>>> If you uninstall the fink installation and install 1.5.2 using cpan
>>> (with root privileges by going sudo cpan) that should at least get rid
>>> of the error messages...
>>>
>>>
>>>> The tree is not correct (I've parsed it from R to have a double
>>>> check) but don't know yet what the problem is with it.
>>>
>>> ... But if the tree is wrong anyway... Let me know what you find out.
>>
>> I've uninstalled the fink installation and used the cvs instead, and the
>> error message is gone. However, on a larger set of 190 species, which
>> are all present in the NCBI taxonomy, the resulting tree has only 178
>> taxa. I suspect, something must be wrong with the merge_lineage method
>> in the major rewrite of the taxonomy2tree script. Can someone please
>> check this? I'm attaching the 190 species call to the script. Thanks,
>
> Ok, I'll look into it. You're also welcome to see if you can take your
> own code from your original taxonomy2tree script and see if you can
> merge/replace the appropriate Bio::Tree::TreeFunctionsI methods with
> your algorithms to get it working correctly. Indeed, does your original
> version of the script work on this data set?
>
>
> Cheers,
> Sendu.

Sendu,

Don't know if it helps, but when I tried Gabriel's shell script last  
night I ran a modification of taxonomy2tree to see what would pop  
up.  Everything is fine up to about 100 iterations, then merge_lineage()
starts dropping leaf nodes.

chris 
  

From bix at sendu.me.uk  Sat Dec 16 10:33:35 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Sat, 16 Dec 2006 15:33:35 +0000
Subject: [Bioperl-l] NO BLAST
In-Reply-To: <58ff33550612150839i40409b06pe427bcd77d3f208@mail.gmail.com>
References: <58ff33550612150839i40409b06pe427bcd77d3f208@mail.gmail.com>
Message-ID: <458411CF.8000707@sendu.me.uk>

Luba Pardo wrote:
> Hello,
> I am having trouble using the module Bio::Tools::Run::StandAloneBlast;
> I got the following error message: cannot find path to blastall.
> The code I used is (modified from HOWTObeginners):

Bioperl doesn't know where you installed blast. If you've actually 
installed it, you can set the environment variable BLASTDIR to point to 
the directory that contains the blastall executable.
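For anyone hitting the same error, here is a minimal core-Perl sketch (it does not load BioPerl) that checks whether the directory named in BLASTDIR really contains an executable called `blastall`. It assumes a Unix-like system; on Windows the file would be `blastall.exe`. The helper `find_blastall` is hypothetical and not part of BioPerl. StandAloneBlast may read BLASTDIR when the module is loaded, so it is probably safest to set the variable in the shell, or in a BEGIN block placed before the `use` line.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper (not a BioPerl API): return the full path to an
# executable named 'blastall' inside $dir, or undef if there is none.
sub find_blastall {
    my ($dir) = @_;
    return unless defined $dir && -d $dir;
    my $path = File::Spec->catfile($dir, 'blastall');
    return unless -f $path && -x _;    # '_' reuses the stat from -f
    return $path;
}

# Assumption: StandAloneBlast may pick up BLASTDIR at load time, so set it
# early, e.g. BEGIN { $ENV{BLASTDIR} = '/path/to/blast/bin' } (placeholder path).
my $found = find_blastall($ENV{BLASTDIR});
if ($found) {
    print "blastall found at $found\n";
}
else {
    my $dir = defined $ENV{BLASTDIR} ? $ENV{BLASTDIR} : '(unset)';
    print "no executable blastall in BLASTDIR=$dir\n";
}
```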


From cain.cshl at gmail.com  Fri Dec 15 13:09:48 2006
From: cain.cshl at gmail.com (Scott Cain)
Date: Fri, 15 Dec 2006 13:09:48 -0500
Subject: [Bioperl-l] Bio::SeqFeature::Annotated and mandatory type checking
In-Reply-To: <9B984087-C843-440A-B3E1-F7DEC65160E7@uiuc.edu>
References: <637A2459-4115-466F-BD8D-036D5E9114F8@cshl.edu>
	<4581CCEB.20206@sendu.me.uk>
	<1166158897.2569.335.camel@localhost.localdomain>
	<9B984087-C843-440A-B3E1-F7DEC65160E7@uiuc.edu>
Message-ID: <1166206188.2569.380.camel@localhost.localdomain>

On Fri, 2006-12-15 at 11:49 -0600, Chris Fields wrote:
> 
> To tell the truth I don't know if this is where the mandatory checks
> were added in; I'm not too familiar with SeqFeature::Annotated yet.
> 
> I agree with Scott (and Matthew) that SOFA checks should be  
> optional.  Matthew, can you write up a patch and maybe some tests?
> 
> chris
> 
That's not where they were added; it's just that they hadn't been fully
implemented before then, so they didn't work (which probably meant they
weren't mandatory, though I don't remember; it could be that it just
croaked).

Scott

-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory


From hlapp at gmx.net  Sun Dec 17 01:02:04 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sun, 17 Dec 2006 01:02:04 -0500
Subject: [Bioperl-l] [Gmod-gbrowse] xyplot data alignment problem?
In-Reply-To: <458404BD.8030908@sendu.me.uk>
References: <EA0BFA4F-8182-4C40-92DA-218CE3F48D8B@genomics.princeton.edu>	<6dce9a0b0612141356u63afe2dak7e1d8dad93408312@mail.gmail.com>
	<6dce9a0b0612150802x354a02a8ib17fbd882379c63c@mail.gmail.com>
	<458404BD.8030908@sendu.me.uk>
Message-ID: <733825EE-0426-4D12-A02F-B8825CDEBBA9@gmx.net>


On Dec 16, 2006, at 9:37 AM, Sendu Bala wrote:

> Lincoln Stein wrote:
>> This is very embarrassing for me, particularly since I spent a lot of
>> time validating that Bio::Graphics was working properly before the
>> 1.5.2 release went out. How long before there is a 1.5.3 release? How
>> about a 1.5.2.1 release?
>
> I'm happy to try a point release for critical bug fixes. Why don't you
> commit the necessary fixes to branch-1-5-2 and let me know when you're
> happy, and I'll do 1.5.2.1.

Feel free to do that, but why not make a 1.5.3 off the main trunk?
1.5.2.1 may be adding more to the version confusion
(developer/stable/point-release/etc) than it is worth, and there is no
shame in releasing new developer versions every few weeks.

My $0.02 ...

	-hilmar


>
> Cheers,
> Sendu.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From fgarret at ub.edu  Mon Dec 18 07:07:02 2006
From: fgarret at ub.edu (Filipe Garrett)
Date: Mon, 18 Dec 2006 13:07:02 +0100
Subject: [Bioperl-l] codeml
Message-ID: <45868466.508@ub.edu>

Hi all,

I've been using bioperl's PAML module (specifically the codeml part) but 
with just one tree.

Since the program accepts several trees as input (and runs the analysis 
for each tree outputting the difference in likelihoods for each one) I 
was wondering if there's some way to do it through bioperl?

thanks in adv,
FG


From heikki at sanbi.ac.za  Mon Dec 18 08:51:50 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Mon, 18 Dec 2006 15:51:50 +0200
Subject: [Bioperl-l] Proposal for Meta data
In-Reply-To: <F302B7DD-C806-4A6F-ACDF-9F27A84E0BF0@uiuc.edu>
References: <B3EF69DB-9C01-4F42-A4E4-898613D872F9@uiuc.edu>
	<32BE3FCF-C788-438F-8A4A-8A586DD6C569@bioperl.org>
	<F302B7DD-C806-4A6F-ACDF-9F27A84E0BF0@uiuc.edu>
Message-ID: <200612181551.51277.heikki@sanbi.ac.za>


Reading the discussion, I think it is time to draw some guidelines.

1. Base the Meta implementation on real use cases.

   MSA is a good example.

2. Allow generalisations

   If you can see another implementation of the same idea that can be merged
   with the first, do it, but do not hurt yourself if you cannot.


The most difficult question is how to separate case-specific attributes that
are best implemented by subclassing with additional methods from truly widely
variable meta data that is best handled by a parallel-track class holding the
meta information.

The problem I see with undefined, totally open meta annotation is that if you
can put anything in there, it is also totally confusing to a user. If you can
put anything in, how do you know what to get out and know that it is
there?

That leads to the third guideline:

3. Use separate meta classes only when there are several different ways of
encoding data that is present in large numbers *and* when you expect to
assess the data computationally rather than just check whether an
attribute is there.


	-Heikki


On Friday 15 December 2006 19:23, Chris Fields wrote:
> On Dec 15, 2006, at 8:28 AM, Jason Stajich wrote:
> > On Dec 14, 2006, at 9:21 PM, Chris Fields wrote:
> >> On Dec 14, 2006, at 7:45 PM, David Messina wrote:
> >>> Hey Chris,
> >>>
> >>> My thoughts below.
> >>>
> >>>> [Chris]
> >>>> This could be used to annotate any PrimarySeq, LocatableSeq,
> >>>> SimpleAlign, SeqFeature, or what-have-you, maybe in a collection
> >>>> (similar to AnnotationCollection).  I thought something like this
> >>>> may be of general use for any PrimarySeq (quality, structure),
> >>>> alignments like NEXUS and Stockholm, SeqFeatures where structure
> >>>> could be stored (tRNA or riboswitches), etc.
> >>>>
> >>>> However, this also seems to fall into the category of sequence
> >>>> annotation.  So, would it be better to have a set of Bio::Annotation
> >>>> classes used for this purpose?
> >>>
> >>> To me, all meta data is equal. That is, your classic Genbank feature
> >>> annotation and a user's arbitrary meta-tag like "Bob thinks this is a
> >>> kinase domain" aren't different in kind even if they are different in
> >>> content.
> >>>
> >>> As resequencing projects multiply, the ability to create arbitrary
> >>> meta tags, attach them to different types of objects, and use those
> >>> tags to link them together will become desirable, if not essential.
> >>>
> >>> Keeping a common interface to all of these meta data types would be
> >>> advantageous, plus new users won't have to determine whether they
> >>> need to use Bio::Meta objects or Bio::Annotation objects.
> >>>
> >>> So I would argue for all of the meta data types to live "under one
> >>> roof". Which roof isn't as important. Bio::Annotation, since it
> >>> already exists for today's meta data, seems like a reasonable choice
> >>> (assuming Annotation objects are flexible enough to be extended as
> >>> you propose).
> >>>
> >>> There, and no flames or jibes even. :)
> >>
> >> I guess what I want to know is whether there should be a
> >> distinction between 'normal' sequence annotation (comments,
> >> references, and so on) and annotation that could be best described as
> >> position-specific (like RNA or protein structural annotation).  The
> >> current meta implementation is for sequence data only; I felt it
> >> would be nice to have a generic implementation that would be
> >> applicable to any object data.
> >
> > my stream-of-consciousness for right now:
> >
> > I was thinking Bio::Annotation is where this should go - that
> > system doesn't have anything about it that makes it explicitly
> > sequence related. What we're trying to hammer out here on the
> > Alignment side - which fits with your RNA example - is to have
> > features, basically SeqFeatures - associated with alignments so
> > columns can be annotated to cover things like character sets and
> > partitions for phylogenetic analyses.  As for data which annotates
> > non-contiguous things like RNA stems, we may have to be more
> > creative about that or model it with a splitLocation.
> >
> > So currently we've added code so that an Alignment is-a
> > Bio::AnnotatableI and is-a Bio::FeatureHolderI to move towards this
> > end, with the goal of being able to capture more of the data that
> > can be represented in a NEXUS file.
> >
> > It feels more like a hack than an elegant Meta-data solution, but I
> > am not totally sure what data you are thinking about at this point;
> > perhaps I need to spend more time thinking about it.
> > Or are you worried about the idea of whether the semantic mapping
> > of the data into features or annotations is confusing users?
>
> Sorry in advance for the longish response here...
>
> My original thought was to have a generic abstract class capable of
> positionally describing data in any other class, similar to
> Heikki's Bio::Seq::MetaI but not constrained to sequence data only.
> Implementing classes would be capable of having different data
> structures based on their use (simple string, array, AoA, AoH, AoO).
> One MetaCollection class to contain them all in a tag-like system, so
> you could have mixed data types describe the same object.  The latter
> Collection class is so similar to AnnotationCollection that I agree
> Bio::Annotation would be the best place for this.
>
> The way I reconfigured Stockholm alignment parsing/writing is to use
> Bio::Seq::Meta objects (which are LocatableSeq).  Each Seq::Meta is
> capable of holding a sequence and several meta strings, stored as
> tags or 'names'.  However, there is no Meta object for alignments
> (for RNA/protein structure consensus and other Rfam/Pfam markup); I
> hacked around this by using a Bio::Seq::Meta w/o a seq, but I would
> rather have a generic Meta object independent of the sequence cruft.
>
> So for this partial Pfam alignment,
>
> Q92SV1_RHIME/122-299         LAMALNLARGI...VDADVDF..REG
> #=GR Q92SV1_RHIME/122-299 pAS .........................
> Q883D2_PSESM/110-290         LGLMLGLRRRL...FDGNGAV..KRS
> Q8ZXP5_PYRAE/91-262          LALLLAPYKRI...IQYGEKM..KRG
> #=GR Q8ZXP5_PYRAE/91-262 SS  HHHHHHHHTTH...HHHHHHX..HTT
> #=GR Q8ZXP5_PYRAE/91-262 SA  00000000000...120030X..474
> #=GC SS_cons                 HHHHHHHHTTH...HHHHHHH..HTT
> #=GC SA_cons                 03002200312...1312414..676
> #=GC seq_cons                luhhLuhsRpl...hthppth..+pG
> //
>
> '#=GC' lines would be in generic meta string objects in the
> alignment, while '#=GR' tags would be in similar meta objects in the
> relevant sequences.  As long as both aren't AnnotatableI this isn't
> an issue.
>
> Similarly, NEXUS files which contained any position-based values
> could hold a meta string/array object in a similar tag.
>
> The basic scheme is:
>                     |--String
> Annotation::Meta----|--Array
>                     |--HorriblyComplexDataStruct
>
> Then I started thinking about where this could be applied, and
> whether a true Meta object needs to be constrained only to describing
> position-based data.  This somewhat relates to this bug:
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=1825
>
> which seems to need a simple but unconstrained hash-of-arrays-based
> meta object.
>
> Then my head appropriately exploded...
>
> Hope everything is going well at the hackathon!  Looks like some
> interesting stuff coming out of it.
>
> chris

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From fgarret at ub.edu  Mon Dec 18 11:18:31 2006
From: fgarret at ub.edu (Filipe Garrett)
Date: Mon, 18 Dec 2006 17:18:31 +0100
Subject: [Bioperl-l] PAML files
Message-ID: <4586BF57.4090002@ub.edu>

Hi all,

does anyone know how to get the name of the .ctl file created by the
PAML module? Inside the tmp directory there are 2 files with random 
names (tree and ctl). Why do they have random names?? Wouldn't it be 
easier to assign them a fixed name?? For instance "codeml.ctl" and 
"tree.nwk"??

thanks in adv,
FG


From bix at sendu.me.uk  Mon Dec 18 11:15:21 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 18 Dec 2006 16:15:21 +0000
Subject: [Bioperl-l] [Gmod-gbrowse] xyplot data alignment problem?
In-Reply-To: <733825EE-0426-4D12-A02F-B8825CDEBBA9@gmx.net>
References: <EA0BFA4F-8182-4C40-92DA-218CE3F48D8B@genomics.princeton.edu>	<6dce9a0b0612141356u63afe2dak7e1d8dad93408312@mail.gmail.com>
	<6dce9a0b0612150802x354a02a8ib17fbd882379c63c@mail.gmail.com>
	<458404BD.8030908@sendu.me.uk>
	<733825EE-0426-4D12-A02F-B8825CDEBBA9@gmx.net>
Message-ID: <4586BE99.7020308@sendu.me.uk>

Hilmar Lapp wrote:
> 
> On Dec 16, 2006, at 9:37 AM, Sendu Bala wrote:
> 
>> Lincoln Stein wrote:
>>> This is very embarassing for me, particularly since I spent a lot
>>> of time validating that Bio::Graphics was working properly before
>>> the 1.5.2 release went out. How long before there is a 1.5.3
>>> release? How about a 1.5.2.1release?
>> 
>> I'm happy to try a point release for critical bug fixes. Why don't
>> you commit the necessary fixes to branch-1-5-2 and let me know when
>> you're happy, and I'll do 1.5.2.1.
> 
> Feel free to do that, but why not make a 1.5.3 off the main trunk? 
> 1.5.2.1 may be adding more to the version confusion 
> (developer/stable/point-release/etc) than it is worth,

My feeling is that 1.5.3 should be reserved for some significant changes
and new features, and not just a few bug fixes. I'd say this causes less
confusion amongst users - they can associate '1.5.2' with a certain API
and feature-set, and the specific name of the file they download and
install (bioperl-1.5.2_100.tar.gz vs bioperl-1.5.2_101.tar.gz) won't
matter at all to them.

I also won't have to make some major announcement about it; it will
simply be the most recent developer version of bioperl available so new
users trying to get 1.5.2 will end up getting 1.5.2.1, whilst existing
1.5.2 users will only feel compelled to get it if they suffer from the
bugs fixed.


> and there is no shame in releasing new developer versions every few
> weeks.

I think frequent releases are inadvisable; such a quick release
won't have had much testing so we shouldn't encourage people to install
it: encouragement is implicit when a major new version comes out like
1.5.3. People who want to live on the edge can and should be using a
CVS checkout.


From bix at sendu.me.uk  Mon Dec 18 14:15:16 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 18 Dec 2006 19:15:16 +0000
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110 species
In-Reply-To: <C91DCC7B-E368-475D-B83A-AC301A49624B@uiuc.edu>
References: <F5C5C9A8-D444-4994-9769-AC5DE68F4A39@lsi.upc.edu>	<68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu>	<4577E4A2.5090303@sendu.me.uk>	<B290BEF7-81D6-4C0A-9EDA-348B8A75EEC8@lsi.upc.edu>	<4577EAAF.7030509@sendu.me.uk>	<0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu>	<4577EFD3.7090904@sendu.me.uk>	<250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu>
	<C91DCC7B-E368-475D-B83A-AC301A49624B@uiuc.edu>
Message-ID: <4586E8C4.6030306@sendu.me.uk>

Chris Fields wrote:
> On Dec 15, 2006, at 6:45 PM, Gabriel Valiente wrote:
> 
>> However, on a larger set of 190 species, which are all present in
>> the NCBI taxonomy, the resulting tree has only 178 taxa. I suspect
>> something must be wrong with the merge_lineage method in the major
>> rewrite of the taxonomy2tree script. Can someone please check this?
>> I'm attaching the 190 species call to the script. Thanks,
>> 
>> Gabriel
> 
> I can confirm that.  It is definitely dropping them in merge_lineage();
> if you add a call to get_leaf_nodes to check how many are present after
> each merge_lineage() call, you can see it dropping nodes along the
> trace.

I confirm the 'dropped' nodes, but also claim that this is no bug.

For example, the first 'drop' happens for the 101st species which is
'Leptospira interrogans serovar Copenhageni'. This is a variation
(descendant) of species 24: 'Leptospira interrogans'. So when the
variation is added it becomes a leaf and 'Leptospira interrogans' is no
longer a leaf, so the overall number of leaves does not increase.

The next drop is for species 103 'Prochlorococcus marinus subsp.
pastoris str. CCMP1986', a subspecies of 63 'Prochlorococcus marinus'.
Same deal. I didn't check any others, but suspect the same issue arises
in all cases.
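The leaf-count arithmetic can be illustrated with a minimal core-Perl sketch that does not use BioPerl. It merges root-to-taxon lineages into a plain parent-to-children hash, loosely in the spirit of merge_lineage(), and counts the leaves. The helper names and the shortened lineages are hypothetical simplifications, not the real NCBI lineages or the BioPerl implementation.

```perl
use strict;
use warnings;

# Hypothetical helper: merge one lineage (names from root to taxon) into
# %$children, which maps every node name to a hash of its child names.
sub merge_lineage_plain {
    my ($children, @lineage) = @_;
    for my $i (0 .. $#lineage) {
        $children->{ $lineage[$i] } ||= {};
        $children->{ $lineage[ $i - 1 ] }{ $lineage[$i] } = 1 if $i > 0;
    }
    return;
}

# Hypothetical helper: a leaf is any node with no children.
sub count_leaves {
    my ($children) = @_;
    return scalar grep { !%{ $children->{$_} } } keys %{$children};
}

my %tree;
merge_lineage_plain(\%tree, 'Bacteria', 'Spirochaetes', 'Leptospira interrogans');
merge_lineage_plain(\%tree, 'Bacteria', 'Cyanobacteria', 'Prochlorococcus marinus');
print count_leaves(\%tree), "\n";    # prints 2: two species, two leaves

# Adding a serovar below an existing species makes that species internal,
# so three taxa have now been added but the leaf count stays at 2.
merge_lineage_plain(\%tree, 'Bacteria', 'Spirochaetes', 'Leptospira interrogans',
    'Leptospira interrogans serovar Copenhageni');
print count_leaves(\%tree), "\n";    # prints 2
```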

Gabriel, please confirm this isn't a bug, or suggest how you propose to
see your taxa when they are not all leaves of the tree.


PS. I changed the merge_lineage() algorithm to be 18x faster (from the 
absurd 3mins for making the 190 species tree to a more reasonable 10s), 
without changing the tree produced.


From fgarret at ub.edu  Mon Dec 18 15:01:38 2006
From: fgarret at ub.edu (Filipe Garrett)
Date: Mon, 18 Dec 2006 21:01:38 +0100
Subject: [Bioperl-l] PAML files
In-Reply-To: <34C4970D-6F93-4CE4-878C-5FA4C916AAEC@bioperl.org>
References: <4586BF57.4090002@ub.edu>
	<34C4970D-6F93-4CE4-878C-5FA4C916AAEC@bioperl.org>
Message-ID: <4586F3A2.4010607@ub.edu>


Hi Jason,

This question is related to the one I asked earlier today.
I need to run codeml with 3 tree topologies. I looked at the codeml module,
but it only accepts one tree as input, so I thought of using the module
to prepare all the files; then I would just have to run codeml in batch
with the new tree file. But for that I need to know which one is the ctl
file.
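One possible workaround, until the module accepts several trees, is to write all topologies into one tree file and point codeml's control file at it through its `treefile` variable. Below is a minimal core-Perl sketch. It assumes PAML's tree-file layout of a first line giving the number of species and the number of trees, followed by one Newick tree per line; please check this against the PAML manual for your version. The helper name, the file name `trees.nwk` and the four-taxon topologies are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: write several Newick trees to one PAML-style tree file.
# Assumed layout: "<number of species> <number of trees>" on the first line,
# then one tree per line.
sub write_paml_treefile {
    my ($file, $nspecies, @trees) = @_;
    open my $fh, '>', $file or die "Cannot write $file: $!";
    print {$fh} "$nspecies ", scalar(@trees), "\n";
    print {$fh} "$_\n" for @trees;
    close $fh or die "Cannot close $file: $!";
    return scalar @trees;
}

# Hypothetical example: the three unrooted topologies for four taxa A-D,
# written with a basal trifurcation so that they stay unrooted.
my @topologies = (
    '(A,B,(C,D));',
    '(A,C,(B,D));',
    '(A,D,(B,C));',
);
my $written = write_paml_treefile('trees.nwk', 4, @topologies);
print "wrote $written trees to trees.nwk\n";
```

codeml would then be run once with `treefile = trees.nwk` in the control file, analysing each tree in turn as described above.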

any idea?
FG

Jason Stajich wrote:
> They are temporary names so they are deliberately random and there is no 
> intention of you needing them after a run, since they are cleaned up on
> the fly. We use an internal method for generating tempfiles that takes 
> care of cleanup afterwards.  I suppose since we do all the work within a 
> temp directory that is cleaned up, one could have a fixed name for the 
> tree, alignment, and ctl files but honestly we never expect people to be 
> reading these filenames as they are intended to be transient.
> 
> What problem are you having that you need the filename?
> 
> -jason
> On Dec 18, 2006, at 11:18 AM, Filipe Garrett wrote:
> 
>> Hi all,
>>
>> does anyone know how to get the name of the .ctl file created by the
>> PAML module? Inside the tmp directory there are 2 files with random 
>> names (tree and ctl). Why do they have random names?? Wouldn't it be 
>> easier to assign them a fixed name?? For instance "codeml.ctl" and 
>> "tree.nwk"??
>>
>> thanks in adv,
>> FG
> 
> --
> Jason Stajich
> jason at bioperl.org <mailto:jason at bioperl.org>
> http://jason.open-bio.org/
> 
> 


From fgarret at ub.edu  Mon Dec 18 15:07:46 2006
From: fgarret at ub.edu (Filipe Garrett)
Date: Mon, 18 Dec 2006 21:07:46 +0100
Subject: [Bioperl-l] codeml
In-Reply-To: <7150593C-C159-4418-8FB3-9D7906C37E15@bioperl.org>
References: <45868466.508@ub.edu>
	<7150593C-C159-4418-8FB3-9D7906C37E15@bioperl.org>
Message-ID: <4586F512.1030209@ub.edu>


Right now it's impossible for me to write it.
By February or March I should have more time but I'll let you know.

FG

Jason Stajich wrote:
> This is a shortcoming in the Run::Phylo::PAML::Codeml implementation - I
> guess we'll need to allow the -tree option to accept an arrayref of trees.
> Are you willing to try writing this patch?  It should be added as a
> bug/feature request to bugzilla so it can be corrected in short order.
> 
> -jason
> On Dec 18, 2006, at 7:07 AM, Filipe Garrett wrote:
> 
>> Hi all,
>>
>> I've been using bioperl's PAML module (specifically the codeml part) but 
>> with just one tree.
>>
>> Since the program accepts several trees as input (and runs the analysis 
>> for each tree outputting the difference in likelihoods for each one) I 
>> was wondering if there's some way to do it through bioperl?
>>
>> thanks in adv,
>> FG
> 
> --
> Jason Stajich 
> Miller Research Fellow
> University of California, Berkeley
> lab: 510.642.8441
> http://pmb.berkeley.edu/~taylor/people/js.html
> 
> 


From cjfields at uiuc.edu  Mon Dec 18 15:55:55 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 18 Dec 2006 14:55:55 -0600
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110 species
In-Reply-To: <4586E8C4.6030306@sendu.me.uk>
References: <F5C5C9A8-D444-4994-9769-AC5DE68F4A39@lsi.upc.edu>	<68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu>	<4577E4A2.5090303@sendu.me.uk>	<B290BEF7-81D6-4C0A-9EDA-348B8A75EEC8@lsi.upc.edu>	<4577EAAF.7030509@sendu.me.uk>	<0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu>	<4577EFD3.7090904@sendu.me.uk>	<250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu>
	<C91DCC7B-E368-475D-B83A-AC301A49624B@uiuc.edu>
	<4586E8C4.6030306@sendu.me.uk>
Message-ID: <63C1DC7D-2830-436A-BE95-7ECE3748C84D@uiuc.edu>


On Dec 18, 2006, at 1:15 PM, Sendu Bala wrote:

> Chris Fields wrote:
>> On Dec 15, 2006, at 6:45 PM, Gabriel Valiente wrote:
>>
>>> However, on a larger set of 190 species, which are all present in
>>> the NCBI taxonomy, the resulting tree has only 178 taxa. I suspect
>>> something must be wrong with the merge_lineage method in the major
>>> rewrite of the taxonomy2tree script. Can someone please check this?
>>> I'm attaching the 190 species call to the script. Thanks,
>>>
>>> Gabriel
>>
>> I can confirm that.  It is definitely dropping them in merge_lineage();
>> if you add a call to get_leaf_nodes to check how many are present after
>> each merge_lineage() call, you can see it dropping nodes along the
>> trace.
>
> I confirm the 'dropped' nodes, but also claim that this is no bug.
>
> For example, the first 'drop' happens for the 101st species which is
> 'Leptospira interrogans serovar Copenhageni'. This is a variation
> (descendant) of species 24: 'Leptospira interrogans'. So when the
> variation is added it becomes a leaf and 'Leptospira interrogans' is no
> longer a leaf, so the overall number of leaves does not increase.
>
> The next drop is for species 103 'Prochlorococcus marinus subsp.
> pastoris str. CCMP1986', a subspecies of 63 'Prochlorococcus marinus'.
> Same deal. I didn't check any others, but suspect the same issue arises
> in all cases.

Makes sense now.  I personally would consider this a bug since the  
results are unexpected (so the docs need to be modified in order to  
clarify).  Some say tomato...

I suppose this is one of the issues one might run into when using  
NCBI taxonomy to build trees.

> Gabriel, please confirm this isn't a bug, or suggest how you propose to
> see your taxa when they are not all leaves of the tree.

Having the nodes appear internally seems semantically correct to me.   
Is there any other way?

> PS. I changed the merge_lineage() algorithm to be 18x faster (from the
> absurd 3mins for making the 190 species tree to a more reasonable 10s),
> without changing the tree produced.

Definitely an improvement!

chris


From jason at bioperl.org  Mon Dec 18 14:33:32 2006
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 18 Dec 2006 14:33:32 -0500
Subject: [Bioperl-l] PAML files
In-Reply-To: <4586BF57.4090002@ub.edu>
References: <4586BF57.4090002@ub.edu>
Message-ID: <34C4970D-6F93-4CE4-878C-5FA4C916AAEC@bioperl.org>

They are temporary names so they are deliberately random and there is  
no intention of you needing them after a run, since they are cleaned  
up on the fly. We use an internal method for generating tempfiles  
that takes care of cleanup afterwards.  I suppose since we do all the  
work within a temp directory that is cleaned up, one could have a  
fixed name for the tree, alignment, and ctl files but honestly we  
never expect people to be reading these filenames as they are  
intended to be transient.

What problem are you having that you need the filename?

-jason
On Dec 18, 2006, at 11:18 AM, Filipe Garrett wrote:

> Hi all,
>
> does anyone know how to get the name of the .ctl file created by the
> PAML module? Inside the tmp directory there are 2 files with random
> names (tree and ctl). Why do they have random names?? Wouldn't it be
> easier to assign them a fixed name?? For instance "codeml.ctl" and
> "tree.nwk"??
>
> thanks in adv,
> FG

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From cjm at fruitfly.org  Mon Dec 18 16:50:00 2006
From: cjm at fruitfly.org (Chris Mungall)
Date: Mon, 18 Dec 2006 13:50:00 -0800
Subject: [Bioperl-l] Proposal for Meta data
In-Reply-To: <200612181551.51277.heikki@sanbi.ac.za>
References: <B3EF69DB-9C01-4F42-A4E4-898613D872F9@uiuc.edu>
	<32BE3FCF-C788-438F-8A4A-8A586DD6C569@bioperl.org>
	<F302B7DD-C806-4A6F-ACDF-9F27A84E0BF0@uiuc.edu>
	<200612181551.51277.heikki@sanbi.ac.za>
Message-ID: <6747C74C-8A49-4169-8A3B-8A26134C3B0D@fruitfly.org>


I agree with everything Heikki is saying, I just wanted to highlight  
one paragraph:

> The problem I see with undefined, totally open meta annotation is
> that if you can put anything in there, it is also totally confusing to
> a user. If you can put anything in, how do you know what to get out
> and know that it is there?

One solution is to give your annotation/metadata-model formal  
computational semantics and use ontologies to give additional  
semantics to your metadata tags. This provides both user information  
in the form of documentation, and a means of specifying to the  
computer exactly what should be done with the tags.

This is probably overkill for bioperl; but if the use cases being  
proposed do lean in the direction of a new metadata system that is  
not necessarily backwards compatible with the existing one, then I'd  
recommend checking out what's already out there before re-inventing  
the wheel. Perl RDF libraries are getting a little better.

If anyone is interested in pursuing this sort of thing (probably on a
branch), let me know.

On Dec 18, 2006, at 5:51 AM, Heikki Lehvaslaiho wrote:

>
> Reading the discussion, I think it is time to draw some guidelines.
>
> 1. Base the Meta implementation on real use cases.
>
>    MSA is a good example.
>
> 2. Allow generalisations
>
>    If you can see another implementation of the same idea that can be
>    merged with the first, do it, but do not hurt yourself if you cannot.
>
>
> The most difficult question is how to separate case-specific
> attributes that are best implemented by subclassing with additional
> methods from truly widely variable meta data that is best handled by a
> parallel-track class holding the meta information.
>
> The problem I see with undefined, totally open meta annotation is
> that if you can put anything in there, it is also totally confusing to
> a user. If you can put anything in, how do you know what to get out
> and know that it is there?
>
> That leads to the third guideline:
>
> 3. Use separate meta classes only when there are several different
> ways of encoding data that is present in large numbers *and* when you
> expect to assess the data computationally rather than just check
> whether an attribute is there.
>
>
> 	-Heikki
>
>
>
> On Friday 15 December 2006 19:23, Chris Fields wrote:
>> On Dec 15, 2006, at 8:28 AM, Jason Stajich wrote:
>>> On Dec 14, 2006, at 9:21 PM, Chris Fields wrote:
>>>> On Dec 14, 2006, at 7:45 PM, David Messina wrote:
>>>>> Hey Chris,
>>>>>
>>>>> My thoughts below.
>>>>>
>>>>>> [Chris]
>>>>>> This could be used to annotate any PrimarySeq, LocatableSeq,
>>>>>> SimpleAlign, SeqFeature, or what-have-you, maybe in a collection
>>>>>> (similar to AnnotationCollection).  I thought something like this
>>>>>> may be of general use for any PrimarySeq (quality, structure),
>>>>>> alignments like NEXUS and Stockholm, SeqFeatures where structure
>>>>>> could be stored (tRNA or riboswitches), etc.
>>>>>>
>>>>>> However, this also seems to fall into the category of sequence
>>>>>> annotation.  So, would it be better to have a set of Bio::Annotation
>>>>>> classes used for this purpose?
>>>>>
>>>>> To me, all meta data is equal. That is, your classic Genbank feature
>>>>> annotation and a user's arbitrary meta-tag like "Bob thinks this is a
>>>>> kinase domain" aren't different in kind even if they are different in
>>>>> content.
>>>>>
>>>>> As resequencing projects multiply, the ability to create arbitrary
>>>>> meta tags, attach them to different types of objects, and use those
>>>>> tags to link them together will become desirable, if not essential.
>>>>>
>>>>> Keeping a common interface to all of these meta data types would be
>>>>> advantageous, plus new users won't have to determine whether they
>>>>> need to use Bio::Meta objects or Bio::Annotation objects.
>>>>>
>>>>> So I would argue for all of the meta data types to live "under one
>>>>> roof". Which roof isn't as important. Bio::Annotation, since it
>>>>> already exists for today's meta data, seems like a reasonable choice
>>>>> (assuming Annotation objects are flexible enough to be extended as
>>>>> you propose).
>>>>>
>>>>> There, and no flames or jibes even. :)
>>>>
>>>> I guess what I want to know is whether there should be a
>>>> distinction between 'normal' sequence annotation (comments,
>>>> references, and so on) and annotation that could be best described as
>>>> position-specific (like RNA or protein structural annotation).  The
>>>> current meta implementation is for sequence data only; I felt it
>>>> would be nice to have a generic implementation that would be
>>>> applicable to any object data.
>>>
>>> my stream-of-consciousness for right now:
>>>
>>> I was thinking Bio::Annotation is where this should go - that
>>> system doesn't have anything about it that makes it explicitly
>>> sequence related. What we're trying to hammer out here on the
>>> Alignment side - which fits with your RNA example - is to have
>>> features, basically SeqFeatures - associated with alignments so
>>> columns can be annotated to cover things like character sets and
>>> partitions for phylogenetic analyses.  As for data which annotates
>>> non-contiguous things like RNAstems we may have  to be more
>>> creative about that or model it with a splitLocation.
>>>
>>> So currently we've added code so that an Alignment is-a
>>> Bio::AnnotableI and is-a Bio::FeatureHolderI to move towards this
>>> end, with the goal of being able to capture more of the data that
>>> can be represented in a NEXUS file.
>>>
>>> It feels more like a hack than an elegant Meta-data solution, but I
>>> am not totally sure what data you are thinking about handling at
>>> this point; perhaps I need to spend more time thinking about it.
>>> Or are you worried that the semantic mapping of the data into
>>> features or annotations will confuse users?
>>
>> Sorry in advance for the longish response here...
>>
>> My original thought was to have a generic abstract class capable of
>> positionally describing data in any other class, similar to
>> Heikki's Bio::Seq::MetaI but not constrained to sequence data only.
>> Implementing classes would be capable of having different data
>> structures based on their use (simple string, array, AoA, AoH, AoO).
>> One MetaCollection class to contain them all in a tag-like system, so
>> you could have mixed data types describe the same object.  The latter
>> Collection class is so similar to AnnotationCollection that I agree
>> Bio::Annotation would be the best place for this.
>>
>> The way I reconfigured Stockholm alignment parsing/writing is to use
>> Bio::Seq::Meta objects (which are LocatableSeq).  Each Seq::Meta is
>> capable of holding a sequence and several meta strings, stored as
>> tags or 'names'.  However, there is no Meta object for alignments
>> (for RNA/protein structure consensus and other Rfam/Pfam markup); I
>> hacked around this by using a Bio::Seq::Meta w/o a seq, but I would
>> rather have a generic Meta object independent of the sequence cruft.
>>
>> So for this partial Pfam alignment,
>>
>> Q92SV1_RHIME/122-299         LAMALNLARGI...VDADVDF..REG
>> #=GR Q92SV1_RHIME/122-299 pAS .........................
>> Q883D2_PSESM/110-290         LGLMLGLRRRL...FDGNGAV..KRS
>> Q8ZXP5_PYRAE/91-262          LALLLAPYKRI...IQYGEKM..KRG
>> #=GR Q8ZXP5_PYRAE/91-262 SS  HHHHHHHHTTH...HHHHHHX..HTT
>> #=GR Q8ZXP5_PYRAE/91-262 SA  00000000000...120030X..474
>> #=GC SS_cons                 HHHHHHHHTTH...HHHHHHH..HTT
>> #=GC SA_cons                 03002200312...1312414..676
>> #=GC seq_cons                luhhLuhsRpl...hthppth..+pG
>> //
>>
>> '#=GC' lines would be in generic meta string objects in the
>> alignment, while '#=GR' tags would be in similar meta objects in the
>> relevant sequences.  As long as both aren't AnnotatableI this isn't
>> an issue.
>>
>> Similarly, NEXUS files which contained any position-based values
>> could hold a meta string/array object in a similar tag.
>>
>> The basic scheme is:
>>                     |--String
>> Annotation::Meta----|--Array
>>                     |--HorriblyComplexDataStruct
>>
>> Then I started thinking about where this could be applied, and
>> whether a true Meta object needs to be constrained only to describing
>> position-based data.  This somewhat relates to this bug:
>>
>> http://bugzilla.open-bio.org/show_bug.cgi?id=1825
>>
>> which seems to need a simple but unconstrained hash-of-arrays-based
>> meta object.
>>
>> Then my head appropriately exploded...
>>
>> Hope everything is going well at the hackathon!  Looks like some
>> interesting stuff coming out of it.
>>
>> chris
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> -- 
> ______ _/      _/_____________________________________________________
>       _/      _/
>      _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
>     _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
>    _/  _/  _/  SANBI, South African National Bioinformatics Institute
>   _/  _/  _/  University of Western Cape, South Africa
>      _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
> ___ _/_/_/_/_/________________________________________________________
>
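For the '#=GC'/'#=GR' split in the quoted Pfam example, here is a minimal core-Perl sketch of how the two kinds of markup could be separated while reading: '#=GC' strings go into one alignment-level hash, '#=GR' strings into a per-sequence hash keyed by sequence name and tag. The function name and hash layout are hypothetical, not how Bio::AlignIO::stockholm stores them, and '#=GF'/'#=GS' lines are simply skipped.

```perl
use strict;
use warnings;

# Hypothetical helper (not BioPerl's Stockholm parser): collect sequences,
# alignment-wide '#=GC' strings and per-sequence '#=GR' strings.
# Interleaved blocks are handled by appending each new chunk.
sub parse_stockholm_meta {
    my ($text) = @_;
    my (%seqs, %aln_meta, %seq_meta);
    for my $line (split /\n/, $text) {
        next if $line =~ /^\s*$/ || $line eq '//';
        if ($line =~ /^#=GC\s+(\S+)\s+(\S+)\s*$/) {
            $aln_meta{$1} .= $2;              # e.g. SS_cons, seq_cons
        }
        elsif ($line =~ /^#=GR\s+(\S+)\s+(\S+)\s+(\S+)\s*$/) {
            $seq_meta{$1}{$2} .= $3;          # e.g. SS or SA for one sequence
        }
        elsif ($line =~ /^#/) {
            next;                             # header, #=GF, #=GS: ignored here
        }
        elsif ($line =~ /^(\S+)\s+(\S+)\s*$/) {
            $seqs{$1} .= $2;                  # ordinary sequence line
        }
    }
    return (\%seqs, \%aln_meta, \%seq_meta);
}

my $pfam = <<'END';
Q8ZXP5_PYRAE/91-262          LALLLAPYKRI
#=GR Q8ZXP5_PYRAE/91-262 SS  HHHHHHHHTTH
#=GC SS_cons                 HHHHHHHHTTH
//
END

my ($seqs, $aln_meta, $seq_meta) = parse_stockholm_meta($pfam);
print "$aln_meta->{SS_cons}\n";                      # HHHHHHHHTTH
print "$seq_meta->{'Q8ZXP5_PYRAE/91-262'}{SS}\n";    # HHHHHHHHTTH
```

Keeping the two hashes apart mirrors the idea that alignment-level meta data should not need a sequence-less Bio::Seq::Meta to hold it.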


From jason at bioperl.org  Mon Dec 18 14:35:50 2006
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 18 Dec 2006 14:35:50 -0500
Subject: [Bioperl-l] codeml
In-Reply-To: <45868466.508@ub.edu>
References: <45868466.508@ub.edu>
Message-ID: <7150593C-C159-4418-8FB3-9D7906C37E15@bioperl.org>

This is a shortcoming in the Run::Phylo::PAML::Codeml implementation -
I guess we'll need to allow the -tree option to accept an arrayref
of trees.
Are you willing to try to write this patch?  It should be added as a
bug/feature request to bugzilla so it can be corrected in short order.
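A minimal sketch of what accepting several trees might involve, assuming trees are passed as Newick strings (real Bio::Tree objects would first need writing out with Bio::TreeIO). It relies on PAML's tree-file layout where, as we read the PAML documentation, the first line gives the number of species and the number of trees. The helper write_paml_treefile is hypothetical and not part of Bio::Tools::Run::Phylo::PAML::Codeml.

```perl
use strict;
use warnings;

# Hypothetical helper: accept one Newick string or an arrayref of them
# and write a PAML tree file whose first line is "<species> <trees>".
sub write_paml_treefile {
    my ($fh, $nspecies, $trees) = @_;
    die "no trees given\n" unless defined $trees;
    my @trees = ref($trees) eq 'ARRAY' ? @$trees : ($trees);
    die "empty tree list\n" unless @trees;
    print {$fh} "$nspecies ", scalar(@trees), "\n";
    for my $newick (@trees) {
        # each tree must be terminated by a semicolon
        my $line = $newick =~ /;\s*$/ ? $newick : "$newick;";
        print {$fh} "$line\n";
    }
    return scalar @trees;
}

# Example: three species, two alternative topologies, written to memory.
open my $out, '>', \my $buf or die "cannot open in-memory file: $!";
write_paml_treefile($out, 3, ['((a,b),c)', '((a,c),b);']);
close $out;
print $buf;
# 3 2
# ((a,b),c);
# ((a,c),b);
```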

-jason
On Dec 18, 2006, at 7:07 AM, Filipe Garrett wrote:

> Hi all,
>
> I've been using bioperl's PAML module (specifically the codeml  
> part) but
> with just one tree.
>
> Since the program accepts several trees as input (and runs the  
> analysis
> for each tree outputting the difference in likelihoods for each one) I
> was wondering if there's some way to do it through bioperl?
>
> thanks in adv,
> FG

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html


From gowthaman.ramasamy at sbri.org  Mon Dec 18 17:19:09 2006
From: gowthaman.ramasamy at sbri.org (Gowthaman Ramasamy)
Date: Mon, 18 Dec 2006 14:19:09 -0800
Subject: [Bioperl-l] module to find out primer binding sites in a genome
	sequence
Message-ID: <A4D285B054CE4641A93F1B2046B2B3CD0761E4@mail01.sbri.org>


Hi List,
Is there any module in bioperl which can find out the primer binding sites in a genomic sequence.
I am interested in finding locations with few mismatches along the primer...not just the exact match (which is a very trivial task)

Many thanks in advance,
gowtham


From cjfields at uiuc.edu  Mon Dec 18 17:33:34 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 18 Dec 2006 16:33:34 -0600
Subject: [Bioperl-l] Proposal for Meta data
In-Reply-To: <200612181551.51277.heikki@sanbi.ac.za>
References: <B3EF69DB-9C01-4F42-A4E4-898613D872F9@uiuc.edu>
	<32BE3FCF-C788-438F-8A4A-8A586DD6C569@bioperl.org>
	<F302B7DD-C806-4A6F-ACDF-9F27A84E0BF0@uiuc.edu>
	<200612181551.51277.heikki@sanbi.ac.za>
Message-ID: <FBD2CED3-EBE7-4CB9-8969-70C7A5931A04@uiuc.edu>


On Dec 18, 2006, at 7:51 AM, Heikki Lehvaslaiho wrote:

>
> Reading the discussion, I think it is time to draw some guidelines.
>
> 1. Base the Meta implementation on real use cases.
>
>    MSA is a good example.

AlignIO::stockholm is where I'll initially test it out.

> 2. Allow generalisations
>
>    If you can see another implementation of the same idea that can be
>    merged with the first, do it, but do not hurt yourself if you cannot.

I agree.

> The most difficult question is how to separate case-specific attributes
> that are best implemented by subclassing with additional methods from
> truly widely variable meta data that is best done as a parallel track
> meta information holding class.

I would probably start with a general Bio::Annotation::MetaI abstract  
class, which supplements AnnotationI with general meta-specific  
methods (meta, meta_text, named_meta, etc)?  Implement this in  
whatever way one wanted (RNA structure as strings, quality data as  
arrays, etc) under the constraints of the interface description.

Multiple meta objects, potentially of mixed data types, could be  
added in an AnnotationCollection along with other Bio::Annotation  
data, or stored in a nested meta-specific AnnotationCollection object  
(I favor the former as it's simpler).  So you could have an  
alignment, sequence, seqfeature (anything that is AnnotatableI) with  
a regular AnnotationCollection also containing possibly multiple meta  
objects, each meta object also containing possibly more than one set  
of meta data.

The key issue I have is whether or not to constrain these to  
describing positional data, similar to Bio::Seq::Meta, by ensuring  
that the data is_flush(), etc.  My current inclination is 'no', and  
to have a separate abstract class which describes these methods,  
implementing those separately.
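A rough sketch of the "no positional constraint by default" idea: a container of named meta strings whose is_flush() check only applies when a sequence length has been supplied. The package My::Meta::String and its methods are hypothetical illustrations of the proposal, not an existing BioPerl interface.

```perl
use strict;
use warnings;

# Hypothetical string-valued meta container; names are illustrative only.
package My::Meta::String;

sub new {
    my ($class, %args) = @_;
    return bless { meta => {}, seq_length => $args{seq_length} }, $class;
}

# Get or set one named meta string (e.g. 'SS_cons').
sub named_meta {
    my ($self, $name, $value) = @_;
    $self->{meta}{$name} = $value if defined $value;
    return $self->{meta}{$name};
}

sub meta_names { return sort keys %{ $_[0]{meta} } }

# Optional positional check: true only if a length was given and every
# meta string has exactly that length.
sub is_flush {
    my ($self) = @_;
    my $len = $self->{seq_length};
    return 0 unless defined $len;
    for my $name ($self->meta_names) {
        return 0 unless length($self->{meta}{$name}) == $len;
    }
    return 1;
}

package main;

my $meta = My::Meta::String->new(seq_length => 11);
$meta->named_meta(SS_cons => 'HHHHHHHHTTH');
$meta->named_meta(SA_cons => '03002200312');
print join(',', $meta->meta_names), ' flush=', $meta->is_flush, "\n";
# SA_cons,SS_cons flush=1
```

Leaving the length undefined gives an unconstrained container, so the positional rules stay opt-in.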

> The problem I see with undefined, totally open meta annotation is that if
> you can put anything in there, it is also totally confusing to a user. If
> you can put anything in, how do you know what to get out and know that it
> is there?
>
> That leads to the third guideline:
>
> 3. Use separate meta classes only when there are several different ways of
> encoding data that is present in large numbers *and* when you are expecting
> to be assessing the data computationally rather than just checking if an
> attribute is there.
>
>
> 	-Heikki

The initial use case for this would be simple data strings for  
alignment data.  I already have a partial implementation in place for  
stockholm using Bio::Seq::Meta (which led me to this proposal!).  I  
like Chris M.'s idea of ensuring that meta implementations use some  
sort of formalized ontology, but I'll probably start out very simple  
and work up from there.

chris


From cjfields at uiuc.edu  Mon Dec 18 17:38:14 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 18 Dec 2006 16:38:14 -0600
Subject: [Bioperl-l] [Gmod-gbrowse] xyplot data alignment problem?
In-Reply-To: <4586BE99.7020308@sendu.me.uk>
References: <EA0BFA4F-8182-4C40-92DA-218CE3F48D8B@genomics.princeton.edu>	<6dce9a0b0612141356u63afe2dak7e1d8dad93408312@mail.gmail.com>
	<6dce9a0b0612150802x354a02a8ib17fbd882379c63c@mail.gmail.com>
	<458404BD.8030908@sendu.me.uk>
	<733825EE-0426-4D12-A02F-B8825CDEBBA9@gmx.net>
	<4586BE99.7020308@sendu.me.uk>
Message-ID: <6AD475AE-7F5E-4612-BC24-73B65AA47F30@uiuc.edu>


On Dec 18, 2006, at 10:15 AM, Sendu Bala wrote:

> Hilmar Lapp wrote:
>>
>> On Dec 16, 2006, at 9:37 AM, Sendu Bala wrote:
>>
>>> Lincoln Stein wrote:
>>>> This is very embarrassing for me, particularly since I spent a lot
>>>> of time validating that Bio::Graphics was working properly before
>>>> the 1.5.2 release went out. How long before there is a 1.5.3
>>>> release? How about a 1.5.2.1 release?
>>>
>>> I'm happy to try a point release for critical bug fixes. Why don't
>>> you commit the necessary fixes to branch-1-5-2 and let me know when
>>> you're happy, and I'll do 1.5.2.1.
>>
>> Feel free to do that, but why not make a 1.5.3 off the main trunk?
>> 1.5.2.1 may be adding more to the version confusion
>> (developer/stable/point-release/etc) than it is worth,
>
> My feeling is that 1.5.3 should be reserved for some significant  
> changes
> and new features, and not just a few bug fixes. I'd say this causes  
> less
> confusion amongst users - they can associate '1.5.2' with a certain  
> API
> and feature-set, and the specific name of the file they download and
> install (bioperl-1.5.2_100.tar.gz vs bioperl-1.5.2_101.tar.gz) won't
> matter at all to them.
>
> I also won't have to make some major announcement about it; it will
> simply be the most recent developer version of bioperl available so  
> new
> users trying to get 1.5.2 will end up getting 1.5.2.1, whilst existing
> 1.5.2 users will only feel compelled to get it if they suffer from the
> bugs fixed.
>
>
>> and there is no shame in releasing new developer versions every few
>> weeks.
>
> I think frequent releases are inadvisable; such a quick release won't
> have had much testing, so we shouldn't encourage people to install it:
> encouragement is implicit when a major new version like 1.5.3 comes out.
> People who want to live on the edge can and should be using a CVS
> checkout.

I thought that 1.5.2 was considered a point release for the 1.5 dev  
series, for bug fixes along with the potential for added/experimental  
features.  Similarly, 1.6.x releases would be point releases for bug  
fixes only with all tests passing (no added features since it is a  
stable release series).  I guess one could reason that 1.5.x releases  
have both bug fixes and new features, while 1.5.x.y releases are  
simply bug fixes for the 1.5.x branch (no new features).  We probably  
should add something to the FAQ and maybe make a few changes to the  
1.5.2 wiki page.

I think having a 1.5.2.1 release is feasible as a quick one-off to  
get Lincoln's fixes in, since you would make them off the 1.5.2  
branch anyway (so I guess it could be considered a bug release from  
that branch).  It's probably not something we should make a habit of,  
but then again I'm not the Pumpkin!

chris


From bix at sendu.me.uk  Mon Dec 18 17:50:11 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 18 Dec 2006 22:50:11 +0000
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on
	110	species
In-Reply-To: <63C1DC7D-2830-436A-BE95-7ECE3748C84D@uiuc.edu>
References: <F5C5C9A8-D444-4994-9769-AC5DE68F4A39@lsi.upc.edu>	<68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu>	<4577E4A2.5090303@sendu.me.uk>	<B290BEF7-81D6-4C0A-9EDA-348B8A75EEC8@lsi.upc.edu>	<4577EAAF.7030509@sendu.me.uk>	<0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu>	<4577EFD3.7090904@sendu.me.uk>	<250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu>
	<C91DCC7B-E368-475D-B83A-AC301A49624B@uiuc.edu>
	<4586E8C4.6030306@sendu.me.uk>
	<63C1DC7D-2830-436A-BE95-7ECE3748C84D@uiuc.edu>
Message-ID: <45871B23.8070103@sendu.me.uk>

Chris Fields wrote:
> 
> On Dec 18, 2006, at 1:15 PM, Sendu Bala wrote:
> 
>> For example, the first 'drop' happens for the 101st species which is
>> 'Leptospira interrogans serovar Copenhageni'. This is a variation
>> (descendant) of species 24: 'Leptospira interrogans'. So when the
>> variation is added it becomes a leaf and 'Leptospira interrogans' is no
>> longer a leaf, so the overall number of leaves does not increase.
>
> Makes sense now.  I personally would consider this a bug since the 
> results are unexpected (so the docs need to be modified in order to 
> clarify).  Some say tomato...
> 
> I suppose this is one of the issues one might run into when using NCBI 
> taxonomy to build trees.

No, the tree produced is perfectly fine. The taxonomy2tree.pl script 
deliberately then does:

# simple paths are contracted by removing degree one nodes
$tree->contract_linear_paths;

Because that is what Gabriel's script originally did.


>> Gabriel, please confirm this isn't a bug, or suggest how you propose to
>> see your taxa when they are not all leaves of the tree.
> 
> Having the nodes appear internally seems semantically correct to me.  Is 
> there any other way?

I suppose if we want to see all the input species output again we have 
to make contract_linear_paths() aware of nodes we want to keep, even 
when they are degree one nodes. Gabriel, is that what you want to see?
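A minimal sketch of a contraction that honours a list of nodes to keep, using a plain hash of child lists rather than Bio::Tree objects. The function contract_keep is hypothetical and is not BioPerl's contract_linear_paths(); in this sketch the root is always kept. With the Leptospira example from this thread, keeping the species node leaves it as an internal node above its serovar.

```perl
use strict;
use warnings;

# Hypothetical sketch (not BioPerl's contract_linear_paths): drop every
# non-root node that has exactly one child, unless it is listed in
# %$keep, and attach its descendants to the nearest surviving ancestor.
# $children maps a node name to an arrayref of child names.
sub contract_keep {
    my ($children, $root, $keep) = @_;
    $keep ||= {};
    my %out;
    my $walk;
    $walk = sub {
        my ($node) = @_;
        my @kids = @{ $children->{$node} || [] };
        # contract the subtrees first; each returns its surviving top node(s)
        my @reps = map { $walk->($_) } @kids;
        return @reps if @kids == 1 && $node ne $root && !$keep->{$node};
        $out{$node} = \@reps if @reps;
        return ($node);
    };
    $walk->($root);
    undef $walk;    # break the closure's reference to itself
    return \%out;   # contracted child lists; leaves have no entry
}

my %kids = (
    'Leptospira'     => ['L. interrogans', 'L. borgpetersenii'],
    'L. interrogans' => ['L. interrogans serovar Copenhageni'],
);
my $plain = contract_keep(\%kids, 'Leptospira');
print join(', ', @{ $plain->{'Leptospira'} }), "\n";
# L. interrogans serovar Copenhageni, L. borgpetersenii
my $kept = contract_keep(\%kids, 'Leptospira', { 'L. interrogans' => 1 });
print join(', ', @{ $kept->{'Leptospira'} }), "\n";
# L. interrogans, L. borgpetersenii
```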


From cjfields at uiuc.edu  Mon Dec 18 18:14:23 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 18 Dec 2006 17:14:23 -0600
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on
	110	species
In-Reply-To: <45871B23.8070103@sendu.me.uk>
References: <F5C5C9A8-D444-4994-9769-AC5DE68F4A39@lsi.upc.edu>	<68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu>	<4577E4A2.5090303@sendu.me.uk>	<B290BEF7-81D6-4C0A-9EDA-348B8A75EEC8@lsi.upc.edu>	<4577EAAF.7030509@sendu.me.uk>	<0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu>	<4577EFD3.7090904@sendu.me.uk>	<250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu>
	<C91DCC7B-E368-475D-B83A-AC301A49624B@uiuc.edu>
	<4586E8C4.6030306@sendu.me.uk>
	<63C1DC7D-2830-436A-BE95-7ECE3748C84D@uiuc.edu>
	<45871B23.8070103@sendu.me.uk>
Message-ID: <CE808784-8068-44C5-82A8-BE852890E4DF@uiuc.edu>


On Dec 18, 2006, at 4:50 PM, Sendu Bala wrote:

> Chris Fields wrote:
>> On Dec 18, 2006, at 1:15 PM, Sendu Bala wrote:
>>> For example, the first 'drop' happens for the 101st species which is
>>> 'Leptospira interrogans serovar Copenhageni'. This is a variation
>>> (descendant) of species 24: 'Leptospira interrogans'. So when the
>>> variation is added it becomes a leaf and 'Leptospira interrogans'  
>>> is no
>>> longer a leaf, so the overall number of leaves does not increase.
>>
>> Makes sense now.  I personally would consider this a bug since the  
>> results are unexpected (so the docs need to be modified in order  
>> to clarify).  Some say tomato...
>> I suppose this is one of the issues one might run into when using  
>> NCBI taxonomy to build trees.
>
> No, the tree produced is perfectly fine. The taxonomy2tree.pl  
> script deliberately then does:
>
> # simple paths are contracted by removing degree one nodes
> $tree->contract_linear_paths;
>
> Because that is what Gabriel's script originally did.

I think you misunderstood me.  The tree is fine; the data used to  
make the tree (NCBI taxonomy) is the issue.  One of the clear caveats  
that NCBI attaches to their taxonomy data is that it should not be the
'primary source for taxonomic or phylogenetic information':

http://tinyurl.com/y3k624

I think it works as a good guide as long as one takes the above into  
consideration.  That and the fact that not all taxids attached to  
sequence data will represent leaf nodes.

chris


From cjfields at uiuc.edu  Mon Dec 18 18:15:56 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 18 Dec 2006 17:15:56 -0600
Subject: [Bioperl-l] Proposal for Meta data
In-Reply-To: <6747C74C-8A49-4169-8A3B-8A26134C3B0D@fruitfly.org>
References: <B3EF69DB-9C01-4F42-A4E4-898613D872F9@uiuc.edu>
	<32BE3FCF-C788-438F-8A4A-8A586DD6C569@bioperl.org>
	<F302B7DD-C806-4A6F-ACDF-9F27A84E0BF0@uiuc.edu>
	<200612181551.51277.heikki@sanbi.ac.za>
	<6747C74C-8A49-4169-8A3B-8A26134C3B0D@fruitfly.org>
Message-ID: <16D6DB51-C2CB-4E89-A597-4672FAA6681B@uiuc.edu>


On Dec 18, 2006, at 3:50 PM, Chris Mungall wrote:

>
> I agree with everything Heikki is saying, I just wanted to highlight
> one paragraph:
>
>> The problem I see with undefined, totally open meta annotation, is
>> that if you
>> can put anything in there, it is also totally confusing to a user.
>> If you can
>> put anything in, how do you know what to get get out and know that
>> it is
>> there?
>
> One solution is to give your annotation/metadata-model formal
> computational semantics and use ontologies to give additional
> semantics to your metadata tags. This provides both user information
> in the form of documentation, and a means of specifying to the
> computer exactly what should be done with the tags.
>
> This is probably overkill for bioperl; but if the use cases being
> proposed do lean in the direction of a new metadata system that is
> not necessarily backwards compatible with the existing one, then I'd
> recommend checking out what's already out there before re-inventing
> the wheel. Perl RDF libraries are getting a little better.
>
> If anyone is interested in pursuing this sort of thing (probably on a
> branch), let me know
...

I like the idea of using ontologies (although that's one of my
many weak points!).  I'll likely start off with simple examples using  
meta data initially, then progress from there.  It is a developer  
series, after all!

Thanks everybody!  I think I have an idea on how to at least get  
started.

chris


From bix at sendu.me.uk  Mon Dec 18 18:27:15 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 18 Dec 2006 23:27:15 +0000
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on
	110	species
In-Reply-To: <CE808784-8068-44C5-82A8-BE852890E4DF@uiuc.edu>
References: <F5C5C9A8-D444-4994-9769-AC5DE68F4A39@lsi.upc.edu>	<68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu>	<4577E4A2.5090303@sendu.me.uk>	<B290BEF7-81D6-4C0A-9EDA-348B8A75EEC8@lsi.upc.edu>	<4577EAAF.7030509@sendu.me.uk>	<0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu>	<4577EFD3.7090904@sendu.me.uk>	<250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu>
	<C91DCC7B-E368-475D-B83A-AC301A49624B@uiuc.edu>
	<4586E8C4.6030306@sendu.me.uk>
	<63C1DC7D-2830-436A-BE95-7ECE3748C84D@uiuc.edu>
	<45871B23.8070103@sendu.me.uk>
	<CE808784-8068-44C5-82A8-BE852890E4DF@uiuc.edu>
Message-ID: <458723D3.4010908@sendu.me.uk>

Chris Fields wrote:
> 
> On Dec 18, 2006, at 4:50 PM, Sendu Bala wrote:
> 
>> Chris Fields wrote:
>>> On Dec 18, 2006, at 1:15 PM, Sendu Bala wrote:
>>>> For example, the first 'drop' happens for the 101st species which is
>>>> 'Leptospira interrogans serovar Copenhageni'. This is a variation
>>>> (descendant) of species 24: 'Leptospira interrogans'. So when the
>>>> variation is added it becomes a leaf and 'Leptospira interrogans' is no
>>>> longer a leaf, so the overall number of leaves does not increase.
>>>
>>> Makes sense now.  I personally would consider this a bug since the 
>>> results are unexpected (so the docs need to be modified in order to 
>>> clarify).  Some say tomato...
>>> I suppose this is one of the issues one might run into when using 
>>> NCBI taxonomy to build trees.
>>
>> No, the tree produced is perfectly fine. The taxonomy2tree.pl script 
>> deliberately then does:
>>
>> # simple paths are contracted by removing degree one nodes
>> $tree->contract_linear_paths;
>>
>> Because that is what Gabriel's script originally did.
> 
> I think you misunderstood me.  The tree is fine; the data used to make 
> the tree (NCBI taxonomy) is the issue.

In what way is it the issue? The database is also fine as far as I can 
see, in so far as it is not causing any problems in this instance.

Gabriel asked for a tree featuring a species and its subspecies. The 
NCBI taxonomy database provided Bioperl the correct data to build such a 
tree. Then Gabriel asked to remove the degree one nodes of his tree. His 
problem was that doing that happened to (correctly) remove the species 
node. If he wants to see both his species and his subspecies he must 
either not remove degree one nodes, or alter the method of doing so to 
keep desired nodes. There is no possible way for NCBI to improve matters 
here.


From bix at sendu.me.uk  Mon Dec 18 18:45:59 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 18 Dec 2006 23:45:59 +0000
Subject: [Bioperl-l] module to find out primer binding sites in a genome
 sequence
In-Reply-To: <A4D285B054CE4641A93F1B2046B2B3CD0761E4@mail01.sbri.org>
References: <A4D285B054CE4641A93F1B2046B2B3CD0761E4@mail01.sbri.org>
Message-ID: <45872837.6050403@sendu.me.uk>

Gowthaman Ramasamy wrote:
> Hi List, Is there any module in bioperl which can find out the primer
> binding sites in a genomic sequence. I am interested in finding
> locations with few mismatches along the primer...not just the exact
> match (which is a very trivial task)

There's no module dedicated to that task, but Bioperl may help you to
answer the question.

Probably the easiest, most reliable and clearest approach is to run a
Blast with settings appropriate for short sequences with few mismatches.
You can then write a script to only consider hits for your forward primer
that are a 'primable' distance from a hit to your reverse primer (and
check that their orientations are correct as well).

Or use some e-pcr tool.
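A sketch of the pairing step, assuming the BLAST hits have already been reduced to plain hashrefs holding 1-based subject coordinates and strand (for example copied from the HSPs returned by Bio::SearchIO). The function primable_pairs and its hash keys are hypothetical. It only checks the forward-on-plus/reverse-on-minus orientation; calling it with the two hit lists swapped checks the opposite one (the 'fwd'/'rev' labels are then swapped too).

```perl
use strict;
use warnings;

# Hypothetical sketch: pair forward-primer hits on the plus strand with
# reverse-primer hits on the minus strand that lie downstream, keeping
# only pairs whose implied product is at most $max_len bases.
# Each hit is a hashref { start => ..., end => ..., strand => 1 or -1 }
# in 1-based subject (genome) coordinates with start <= end.
sub primable_pairs {
    my ($fwd_hits, $rev_hits, $max_len) = @_;
    my @pairs;
    for my $f (grep { $_->{strand} == 1 } @$fwd_hits) {
        for my $r (grep { $_->{strand} == -1 } @$rev_hits) {
            next unless $r->{start} > $f->{end};      # must be downstream
            my $size = $r->{end} - $f->{start} + 1;   # implied amplicon length
            push @pairs, { fwd => $f, rev => $r, size => $size }
                if $size <= $max_len;
        }
    }
    my @sorted = sort { $a->{size} <=> $b->{size} } @pairs;
    return @sorted;    # shortest products first
}

my @fwd = ({ start => 100,  end => 119,  strand => 1 },
           { start => 5000, end => 5019, strand => -1 });
my @rev = ({ start => 480,  end => 499,  strand => -1 },
           { start => 9000, end => 9019, strand => -1 });
for my $p (primable_pairs(\@fwd, \@rev, 2000)) {
    print "product $p->{fwd}{start}..$p->{rev}{end} ($p->{size} bp)\n";
}
# product 100..499 (400 bp)
```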


From Kevin.M.Brown at asu.edu  Mon Dec 18 18:52:20 2006
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Mon, 18 Dec 2006 16:52:20 -0700
Subject: [Bioperl-l] module to find out primer binding sites in a genome
	sequence
Message-ID: <1A4207F8295607498283FE9E93B775B40270F3BB@EX02.asurite.ad.asu.edu>

A function I use to find the first landing site for a primer.  It should be
modifiable to handle multiple occurrences:

=head1 C<match>

Match searches for a near alignment between two strings and returns the
1-based position at which the two strings align, or 0 unless more than
80% of the primer bases match.

	match($this, $in_that)
	
=cut

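sub match
{
	my ($primer, $gene) = @_;
	my $start   = 0;
	my $pattern = "";

	# Grow the pattern one primer base at a time. A base that breaks the
	# match is replaced by '.', so it counts as a mismatch from then on
	# (the choice is greedy and never revisited).
	for (my $i = 0 ; $i < length($primer) ; $i++)
	{
		$pattern .= substr($primer, $i, 1);
		pos($gene) = 0;
		if ($gene =~ m/$pattern/gi)
		{
			# pos() is just past the match; convert to a 1-based start
			$start = pos($gene) - length($pattern) + 1;
		}
		else
		{
			$start = 0;
			chop($pattern);
			$pattern .= '.';
		}
	}

	# If the last base was wildcarded, locate the final pattern again
	if ($pattern =~ /\.$/)
	{
		pos($gene) = 0;
		if ($gene =~ m/$pattern/gi)
		{
			$start = pos($gene) - length($pattern) + 1;
		}
	}

	# What remains after removing the wildcards are the matched bases
	$pattern =~ s/\.//g;

	if ((length($pattern) / length($primer)) > .8)
	{
		return $start;
	}
	return 0;
}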
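As an alternative to the greedy wildcarding in match() above, which can settle on a poorer site once an early base has been replaced by '.', here is a brute-force sketch that reports every ungapped window within a given number of mismatches, on both strands. It is a plain Hamming-distance scan (no indels) costing O(sequence length x primer length), and the name primer_sites is hypothetical.

```perl
use strict;
use warnings;

# Sketch: report every window of $seq within $max_mm mismatches
# (Hamming distance, no gaps) of $primer or its reverse complement.
# Positions are 1-based on the given strand of $seq; for strand -1 the
# start is the leftmost base of the window the reverse complement matches.
sub primer_sites {
    my ($primer, $seq, $max_mm) = @_;
    my $p = uc $primer;
    my $s = uc $seq;
    (my $rc = reverse $p) =~ tr/ACGT/TGCA/;
    my $len = length $p;
    my @hits;
    for my $i (0 .. length($s) - $len) {
        my $window = substr($s, $i, $len);
        for my $probe ([$p, 1], [$rc, -1]) {
            my ($query, $strand) = @$probe;
            my $mm = 0;
            for my $j (0 .. $len - 1) {
                next if substr($window, $j, 1) eq substr($query, $j, 1);
                last if ++$mm > $max_mm;    # give up on this window early
            }
            push @hits, { start => $i + 1, strand => $strand, mismatches => $mm }
                if $mm <= $max_mm;
        }
    }
    return @hits;
}

for my $h (primer_sites('ACGTTG', 'ttACGATGccCAACGTaa', 1)) {
    printf "pos %d strand %+d mismatches %d\n", @$h{qw(start strand mismatches)};
}
# pos 3 strand +1 mismatches 1
# pos 11 strand -1 mismatches 0
```

For a whole genome this is slow compared with BLAST or an e-PCR tool, but it is simple and exhaustive for a short region.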

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Sendu Bala
> Sent: Monday, December 18, 2006 4:46 PM
> To: Gowthaman Ramasamy
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] module to find out primer binding 
> sites in a genome sequence
> 
> Gowthaman Ramasamy wrote:
> > Hi List, Is there any module in bioperl which can find out 
> the primer
> > binding sites in a genomic sequence. I am interested in finding
> > locations with few mismatches along the primer...not just the exact
> > match (which is a very trivial task)
> 
> There's no module dedicated to that task, but Bioperl may help you to
> answer the question.
> 
> Probably the easiest/reliable/clear thing to do is to do a Blast with
> appropriate settings for short sequence with few mismatches. You can
> write a script to only consider hits for your forward primer 
> that are a
> 'primable' distance from a hit to your reverse primer (and check their
> orientations are correct as well).
> 
> Or use some e-pcr tool.
> 


From torsten.seemann at infotech.monash.edu.au  Mon Dec 18 18:52:58 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Tue, 19 Dec 2006 10:52:58 +1100
Subject: [Bioperl-l] module to find out primer binding sites in a genome
 sequence
In-Reply-To: <A4D285B054CE4641A93F1B2046B2B3CD0761E4@mail01.sbri.org>
References: <A4D285B054CE4641A93F1B2046B2B3CD0761E4@mail01.sbri.org>
Message-ID: <458729DA.9030909@infotech.monash.edu.au>

Gowthaman Ramasamy wrote:
> Hi List,
> Is there any module in bioperl which can find out the primer binding sites in a genomic sequence.
> I am interested in finding locations with few mismatches along the primer...not just the exact match (which is a very trivial task)

This FAQ question may help:
http://www.bioperl.org/wiki/FAQ#How_do_I_do_motif_searches_with_BioPerl.3F_Can_I_do_.22find_all_sequences_that_are_75.25_identical.22_to_a_given_motif.3F

This software may help:
http://frodo.wi.mit.edu/cgi-bin/primer3/primer3_www.cgi

-- 
Dr Torsten Seemann               http://www.vicbioinformatics.com
Victorian Bioinformatics Consortium, Monash University, Australia


From sdavis2 at mail.nih.gov  Mon Dec 18 21:16:19 2006
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Mon, 18 Dec 2006 21:16:19 -0500
Subject: [Bioperl-l] module to find out primer binding sites in a genome
 sequence
In-Reply-To: <A4D285B054CE4641A93F1B2046B2B3CD0761E4@mail01.sbri.org>
References: <A4D285B054CE4641A93F1B2046B2B3CD0761E4@mail01.sbri.org>
Message-ID: <45874B73.7010600@mail.nih.gov>

Gowthaman Ramasamy wrote:
> Hi List,
> Is there any module in bioperl which can find out the primer binding sites in a genomic sequence.
> I am interested in finding locations with few mismatches along the primer...not just the exact match (which is a very trivial task)
>   

See here:

http://genome.ucsc.edu/cgi-bin/hgPcr?command=start

It is designed for exactly this task, is very fast, is available as an 
executable or web-based (though watch the usage requirements), and the 
output can be parsed rather easily.

Sean


From cjfields at uiuc.edu  Mon Dec 18 21:30:04 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 18 Dec 2006 20:30:04 -0600
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on
	110	species
In-Reply-To: <458723D3.4010908@sendu.me.uk>
References: <F5C5C9A8-D444-4994-9769-AC5DE68F4A39@lsi.upc.edu>	<68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu>	<4577E4A2.5090303@sendu.me.uk>	<B290BEF7-81D6-4C0A-9EDA-348B8A75EEC8@lsi.upc.edu>	<4577EAAF.7030509@sendu.me.uk>	<0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu>	<4577EFD3.7090904@sendu.me.uk>	<250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu>
	<C91DCC7B-E368-475D-B83A-AC301A49624B@uiuc.edu>
	<4586E8C4.6030306@sendu.me.uk>
	<63C1DC7D-2830-436A-BE95-7ECE3748C84D@uiuc.edu>
	<45871B23.8070103@sendu.me.uk>
	<CE808784-8068-44C5-82A8-BE852890E4DF@uiuc.edu>
	<458723D3.4010908@sendu.me.uk>
Message-ID: <2638D8ED-A3B3-4EF8-978E-216C5F875D88@uiuc.edu>

>> I think you misunderstood me.  The tree is fine; the data used to  
>> make
>> the tree (NCBI taxonomy) is the issue.
>
> In what way is it the issue? The database is also fine as far as I can
> see, in so far as it is not causing any problems in this instance.

I should maybe have clarified a bit more: what I said has nothing to  
do with the structure of the database itself.  I was just pointing  
out that NCBI Taxonomy isn't the best source of data for building a  
phylogenetic tree, something NCBI also points out (the link in my  
last post).  Not a big deal, really.

> Gabriel asked for a tree featuring a species and its subspecies. The
> NCBI taxonomy database provided Bioperl the correct data to build  
> such a
> tree. Then Gabriel asked to remove the degree one nodes of his  
> tree. His
> problem was that doing that happened to (correctly) remove the species
> node. If he wants to see both his species and his subspecies he must
> either not remove degree one nodes, or alter the method of doing so to
> keep desired nodes. There is no possible way for NCBI to improve  
> matters
> here.

Actually, there isn't any way they could w/o digging through the  
literature in many cases.  The problem is incomplete taxonomic  
information for nodes derived from older sequence data, where a genus  
and species was designated but nothing else (strain, etc) is known.

Again, I merely was pointing out what I had mentioned above.  I  
wasn't criticizing you, Gabriel, or the methodology here.  Honest!

chris


From avilella at gmail.com  Mon Dec 18 16:43:27 2006
From: avilella at gmail.com (Albert Vilella)
Date: Mon, 18 Dec 2006 21:43:27 +0000
Subject: [Bioperl-l] PAML files
In-Reply-To: <4586F3A2.4010607@ub.edu>
References: <4586BF57.4090002@ub.edu>
	<34C4970D-6F93-4CE4-878C-5FA4C916AAEC@bioperl.org>
	<4586F3A2.4010607@ub.edu>
Message-ID: <358f4d650612181343o5bd51169w7b46cceb34a5c92b@mail.gmail.com>

Filipe, if you need to create the ctl file but not run the job, you
can use the "prepare" method in Codeml run.

Also, there is a tmpdir and save_tempfiles method that will keep the
files where you want. About the naming, you can add a ".tree" and
".aln" extension to the tempnames if you want, by altering the
$temptreefile and $tempseqfile variables in
bioperl-run/Bio/Tools/Run/Phylo/PAML/Codeml.pm (cvs head version).

If you want, you can also add a couple of getters/setters there:

sub alnfilename{
    my $self = shift;

    return $self->{'alnfilename'} = shift if @_;
    return $self->{'alnfilename'};
}

and replace those $tempseqfile io calls with
$self->{'alnfilename'} io calls.

$codeml->alnfilename("/path/name");
$codeml->prepare;
...
$codeml->run;

What I usually do is keep the aln and tree files in a different
place. Codeml will create the tmp files for running somewhere, and
then delete all the stuff when done.
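On Jason's point (quoted below) that fixed names would be safe inside a directory that is cleaned up anyway, here is a minimal sketch using the core File::Temp module. The helper write_run_files, the file names and the three control-file keys are illustrative assumptions (a real codeml.ctl needs many more settings), and it assumes codeml would be started from inside that directory so the relative names resolve.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical sketch: inside a private temporary directory that is
# deleted at program exit, fixed file names cannot collide with other
# runs. Only three codeml.ctl keys are written here.
sub write_run_files {
    my ($dir, $aln_text, $tree_text) = @_;
    my %content = (
        'aln.phy'    => $aln_text,
        'tree.nwk'   => $tree_text,
        'codeml.ctl' => "seqfile = aln.phy\ntreefile = tree.nwk\noutfile = mlc\n",
    );
    my %path;
    for my $name (sort keys %content) {
        $path{$name} = File::Spec->catfile($dir, $name);
        open my $fh, '>', $path{$name} or die "cannot write $path{$name}: $!";
        print {$fh} $content{$name};
        close $fh or die "cannot close $path{$name}: $!";
    }
    return \%path;    # file name => full path
}

my $dir   = tempdir(CLEANUP => 1);    # removed automatically at exit
my $files = write_run_files($dir, "  3 6\na  ATGATG\nb  ATGATC\nc  ATGCTG\n",
                            "((a,b),c);\n");
print "control file: $files->{'codeml.ctl'}\n";
```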

Cheers,

    Albert.

On 12/18/06, Filipe Garrett <fgarret at ub.edu> wrote:
>
> Hi Jason,
>
> This question is related to the one I asked earlier today.
> I need to run codeml with 3 tree topologies. I looked on codeml module
> but it only accepts one tree as input so I thought of using the codeml
> module to prepare all the files and then I would just have to run the
> codeml with the new tree file in batch. But for that I need to know
> which one is the ctl file.
>
> any idea?
> FG
>
> Jason Stajich wrote:
> > They are temporary names so they are deliberately random and there is no
> > intention of you needing them after a run since it is to be cleaned up on
> > the fly. We use an internal method for generating tempfiles that takes
> > care of cleanup afterwards.  I suppose since we do all the work within a
> > temp directory that is cleaned up, one could have a fixed name for the
> > tree, alignment, and ctl files but honestly we never expect people to be
> > reading these filenames as they are intended to be transient.
> >
> > What problem are you having that you need the filename?
> >
> > -jason
> > On Dec 18, 2006, at 11:18 AM, Filipe Garrett wrote:
> >
> >> Hi all,
> >>
> >> does anyone know how to get the name of the .ctl file created by the
> >> PAML module? Inside the tmp directory there are 2 files with random
> >> names (tree and ctl). Why do they have random names?? Wouldn't it be
> >> easier to assign them a fixed name?? For instance "codeml.ctl" and
> >> "tree.nwk"??
> >>
> >> thanks in adv,
> >> FG
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> > --
> > Jason Stajich
> > jason at bioperl.org
> > http://jason.open-bio.org/
> >
> >
>


From valiente at lsi.upc.edu  Mon Dec 18 23:18:20 2006
From: valiente at lsi.upc.edu (Gabriel Valiente)
Date: Tue, 19 Dec 2006 13:18:20 +0900
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on
	110	species
In-Reply-To: <2638D8ED-A3B3-4EF8-978E-216C5F875D88@uiuc.edu>
References: <F5C5C9A8-D444-4994-9769-AC5DE68F4A39@lsi.upc.edu>	<68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu>	<4577E4A2.5090303@sendu.me.uk>	<B290BEF7-81D6-4C0A-9EDA-348B8A75EEC8@lsi.upc.edu>	<4577EAAF.7030509@sendu.me.uk>	<0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu>	<4577EFD3.7090904@sendu.me.uk>	<250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu>
	<C91DCC7B-E368-475D-B83A-AC301A49624B@uiuc.edu>
	<4586E8C4.6030306@sendu.me.uk>
	<63C1DC7D-2830-436A-BE95-7ECE3748C84D@uiuc.edu>
	<45871B23.8070103@sendu.me.uk>
	<CE808784-8068-44C5-82A8-BE852890E4DF@uiuc.edu>
	<458723D3.4010908@sendu.me.uk>
	<2638D8ED-A3B3-4EF8-978E-216C5F875D88@uiuc.edu>
Message-ID: <287263A7-A84A-413E-AA9D-9258261A90C1@lsi.upc.edu>

Thanks a lot for the prompt answer and follow-up discussion. I think
this turned out not to be a bug in the merge_lineage() code but a
direct consequence of building a phylogenetic tree instead of a
taxonomic tree (i.e. one with internal node labels).

In order to reconstruct the NCBI taxonomy for the set of species  
present in a given phylogenetic tree, the only reasonable work-around  
seems to be a first step of merging lineages and contracting linear  
paths with the current implementation, followed by a second step of  
restricting the given phylogenetic tree to the set of species present  
in the obtained NCBI taxonomy. But this does not affect the code for  
merge_lineage().

Gabriel
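The second step of the workaround (restricting a tree to a set of species and contracting nodes that are left with a single child) can be illustrated in plain Perl. This is a minimal sketch on a hypothetical hash-based tree (`{ name, children }`), not BioPerl's Bio::Tree code or the merge_lineage() implementation; `restrict_tree` and `to_newick` are illustrative names.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical tree representation: each node is
# { name => ..., children => [...] }; leaves have no children.

# Keep only the leaves named in %$keep, drop emptied subtrees, and
# contract any internal node left with exactly one child.
sub restrict_tree {
    my ($node, $keep) = @_;
    my @kids = @{ $node->{children} || [] };
    if (!@kids) {                                   # leaf
        return $keep->{ $node->{name} } ? { name => $node->{name} } : ();
    }
    my @kept = map { restrict_tree($_, $keep) } @kids;
    return ()       if @kept == 0;                  # whole subtree pruned
    return $kept[0] if @kept == 1;                  # contract degree-one node
    return { name => $node->{name}, children => \@kept };
}

# Serialise a node as Newick (no branch lengths, names left unquoted).
sub to_newick {
    my ($node) = @_;
    my $label = defined $node->{name} ? $node->{name} : '';
    return $label unless $node->{children};
    return '(' . join(',', map { to_newick($_) } @{ $node->{children} }) . ')' . $label;
}

my $tree = { name => 'genus', children => [
    { name => 'sp1', children => [ { name => 'sp1_a' }, { name => 'sp1_b' } ] },
    { name => 'sp2' },
    { name => 'sp3' },
] };
my $kept = restrict_tree($tree, { map { $_ => 1 } qw(sp1_a sp2) });
print to_newick($kept), "\n";    # (sp1_a,sp2)genus
```

Note that the `sp1` label disappears in the output: once only one of its subspecies survives it becomes a degree-one node and is contracted, which is the same effect discussed in the quoted messages below.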

>>> I think you misunderstood me.  The tree is fine; the data used to
>>> make the tree (NCBI taxonomy) is the issue.
>>
>> In what way is it the issue? The database is also fine as far as I
>> can see, in so far as it is not causing any problems in this instance.
>
> I should maybe have clarified a bit more: what I said has nothing  
> to do with the structure of the database itself.  I was just  
> pointing out that NCBI Taxonomy isn't the best source of data for  
> building a phylogenetic tree, something NCBI also points out (the  
> link in my last post).  Not a big deal, really.
>
>> Gabriel asked for a tree featuring a species and its subspecies. The
>> NCBI taxonomy database provided Bioperl the correct data to build such
>> a tree. Then Gabriel asked to remove the degree one nodes of his tree.
>> His problem was that doing that happened to (correctly) remove the
>> species node. If he wants to see both his species and his subspecies
>> he must either not remove degree one nodes, or alter the method of
>> doing so to keep desired nodes. There is no possible way for NCBI to
>> improve matters here.
>
> Actually, there isn't any way they could w/o digging through the  
> literature in many cases.  The problem is incomplete taxonomic  
> information for nodes derived from older sequence data, where a  
> genus and species was designated but nothing else (strain, etc) is  
> known.
>
> Again, I merely was pointing out what I had mentioned above.  I  
> wasn't criticizing you, Gabriel, or the methodology here.  Honest!
>
> chris


From cjfields at uiuc.edu  Mon Dec 18 23:41:16 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 18 Dec 2006 22:41:16 -0600
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on
	110	species
In-Reply-To: <287263A7-A84A-413E-AA9D-9258261A90C1@lsi.upc.edu>
References: <F5C5C9A8-D444-4994-9769-AC5DE68F4A39@lsi.upc.edu>	<68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu>	<4577E4A2.5090303@sendu.me.uk>	<B290BEF7-81D6-4C0A-9EDA-348B8A75EEC8@lsi.upc.edu>	<4577EAAF.7030509@sendu.me.uk>	<0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu>	<4577EFD3.7090904@sendu.me.uk>	<250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu>
	<C91DCC7B-E368-475D-B83A-AC301A49624B@uiuc.edu>
	<4586E8C4.6030306@sendu.me.uk>
	<63C1DC7D-2830-436A-BE95-7ECE3748C84D@uiuc.edu>
	<45871B23.8070103@sendu.me.uk>
	<CE808784-8068-44C5-82A8-BE852890E4DF@uiuc.edu>
	<458723D3.4010908@sendu.me.uk>
	<2638D8ED-A3B3-4EF8-978E-216C5F875D88@uiuc.edu>
	<287263A7-A84A-413E-AA9D-9258261A90C1@lsi.upc.edu>
Message-ID: <D72C19DB-B551-414E-96AF-113B32A34BCB@uiuc.edu>


On Dec 18, 2006, at 10:18 PM, Gabriel Valiente wrote:

> Thanks a lot for the prompt answer and follow-up discussion. I
> think this turned out not to be a bug in the merge_lineage() code
> but a direct consequence of building a phylogenetic tree instead of
> a taxonomic tree (i.e. one with internal node labels).
>
> In order to reconstruct the NCBI taxonomy for the set of species
> present in a given phylogenetic tree, the only reasonable
> workaround seems to be a first step of merging lineages and
> contracting linear paths with the current implementation, followed
> by a second step of restricting the given phylogenetic tree to the
> set of species present in the obtained NCBI taxonomy. But this does
> not affect the code for merge_lineage().
>
> Gabriel

I did notice one thing, though it's minor: if you use the option to  
retrieve the data from Entrez, a few species aren't found (even  
though they show up in a local taxonomy search).  I think both were  
E. coli strains.

chris


From DGroskreutz at twt.com  Tue Dec 19 02:00:40 2006
From: DGroskreutz at twt.com (DGroskreutz at twt.com)
Date: Tue, 19 Dec 2006 01:00:40 -0600
Subject: [Bioperl-l] CN=Deb Groskreutz/OU=MSN/O=TWT is out of the office.
Message-ID: <OFEB7AC000.56E72ED8-ON86257249.002683B4-86257249.002683B4@twt.com>


I will be out of the office starting  12/18/2006 and will not return until
01/02/2007.


NOTICE OF CONFIDENTIALITY:
The information contained in this communication, including attachments, is intended for the specific delivery to and use by the individual(s) to whom it is addressed. This email includes confidential information that may be attorney-client privileged. Any review, retransmission, dissemination, or unauthorized use of this communication is strictly prohibited and may be unlawful. If you have received this communication in error, please reply to the sender immediately and delete the original communication and any copy of it from your computer system, including all attachments.


From michael.watson at bbsrc.ac.uk  Tue Dec 19 07:20:56 2006
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Tue, 19 Dec 2006 12:20:56 -0000
Subject: [Bioperl-l] Problems with EMBL entries and fasta IDs?
Message-ID: <8975119BCD0AC5419D61A9CF1A923E9503E2E67F@iahce2ksrv1.iah.bbsrc.ac.uk>

Hi

I'm using bioperl-1.4.  I did do a Google search for this but couldn't
find anything.  If this is fixed in 1.5.2 then forgive me.

I'm getting a warning:

MSG: No whitespace allowed in FASTA ID [unknown id]

When trying to convert from EMBL format to fasta.  The offending
sequence is CK234114:

ID   CK234114; SV 1; linear; mRNA; EST; VRT; 244 BP.
XX
AC   CK234114;
XX
DT   03-MAR-2004 (Rel. 79, Created)
DT   03-MAR-2004 (Rel. 79, Last updated, Version 1)
XX
DE   SB010002000A01 JUWNL1 Normalized Zebra Finch Juvenile Telencephalon cDNA
DE   Library SB01 Taeniopygia guttata cDNA clone SB010002000A01 5', mRNA
DE   sequence.
Etc

Any advice?

Mick

The information contained in this message may be confidential or legally
privileged and is intended solely for the addressee. If you have
received this message in error please delete it & notify the originator
immediately.
Unauthorised use, disclosure, copying or alteration of this message is
forbidden & may be unlawful. 
The contents of this e-mail are the views of the sender and do not
necessarily represent the views of the Institute. 
This email and associated attachments has been checked locally for
viruses but we can accept no responsibility once it has left our
systems.
Communications on Institute computers are monitored to secure the
effective operation of the systems and for other lawful purposes. 
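The ID in brackets in the warning above, `unknown id`, itself contains a space, which suggests that no identifier was taken from the ID line at all (an inference; the thread does not say what the eventual fix was). That ID line uses the semicolon-separated layout: primary accession; SV version; topology; molecule type; data class; taxonomic division; length. A minimal core-Perl sketch of pulling the accession out of such a line, plus a hypothetical fallback that makes an ID FASTA-safe; neither `embl_accession` nor `fasta_safe_id` is part of BioPerl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the primary accession from an EMBL "ID" line, or undef if the
# line doesn't look like one. Assumes the accession is the first token
# after "ID", ended by ';' or whitespace.
sub embl_accession {
    my ($id_line) = @_;
    return $id_line =~ /^ID\s+([^;\s]+)/ ? $1 : undef;
}

# FASTA IDs must not contain whitespace; as an illustrative fallback,
# replace each run of whitespace with an underscore.
sub fasta_safe_id {
    my ($id) = @_;
    (my $safe = $id) =~ s/\s+/_/g;
    return $safe;
}

my $line = 'ID   CK234114; SV 1; linear; mRNA; EST; VRT; 244 BP.';
print embl_accession($line), "\n";        # CK234114
print fasta_safe_id('unknown id'), "\n";  # unknown_id
```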


From michael.watson at bbsrc.ac.uk  Tue Dec 19 07:27:59 2006
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Tue, 19 Dec 2006 12:27:59 -0000
Subject: [Bioperl-l] Problems with EMBL entries and fasta IDs?
In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E9503E2E67F@iahce2ksrv1.iah.bbsrc.ac.uk>
Message-ID: <8975119BCD0AC5419D61A9CF1A923E9503E2E682@iahce2ksrv1.iah.bbsrc.ac.uk>

Sorry, problem solved.

Mick 



From roest216 at student.otago.ac.nz  Tue Dec 19 04:15:55 2006
From: roest216 at student.otago.ac.nz (Stephan Roessner)
Date: Tue, 19 Dec 2006 22:15:55 +1300
Subject: [Bioperl-l] problems installing bioperl
Message-ID: <1166519755.4587adcb141d3@www.studentmail.otago.ac.nz>

Dear support team,

I installed bioperl 1.5.2_100 on a Fedora machine to be able to use
gbrowse.
The installation seems to work (apart from the test failures) but the
gbrowse installation tells me that Bio::Perl 1.0050021 is installed, but
of course it requires 1.52.

Is there a chance to find out what went wrong?

thanks a lot,
Stephan


From bix at sendu.me.uk  Tue Dec 19 10:12:39 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 19 Dec 2006 15:12:39 +0000
Subject: [Bioperl-l] problems installing bioperl
In-Reply-To: <1166519755.4587adcb141d3@www.studentmail.otago.ac.nz>
References: <1166519755.4587adcb141d3@www.studentmail.otago.ac.nz>
Message-ID: <45880167.9010605@sendu.me.uk>

Stephan Roessner wrote:
> Dear support team,
> 
> I installed bioperl 1.5.2_100 on a Fedora machine to be able to use
> gbrowse.
> The installation seems to work (apart from the test failures) but the
> gbrowse installation tells me that Bio::Perl 1.0050021 is installed, but
> of course it requires 1.52.
> 
> Is there a chance to find out what went wrong?

Nothing went wrong with the Bioperl installation (well, except there
shouldn't have been any test failures - can you post those please?).
gbrowse simply defined its Bioperl requirement incorrectly. If you tell
me exactly where you downloaded gbrowse from and how you went about
installing it, and provide the exact, complete error message you got
from it, I might be able to help the authors fix the problem.

Or I'm pretty sure they can figure it out for themselves :)


From cjfields at uiuc.edu  Tue Dec 19 11:05:01 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 19 Dec 2006 10:05:01 -0600
Subject: [Bioperl-l] [Gmod-gbrowse]  problems installing bioperl
In-Reply-To: <1166542310.6981.119.camel@localhost.localdomain>
References: <1166519755.4587adcb141d3@www.studentmail.otago.ac.nz>
	<45880167.9010605@sendu.me.uk>
	<1166542310.6981.119.camel@localhost.localdomain>
Message-ID: <8D5C45A3-A90A-49D7-A7E7-888C977759AC@uiuc.edu>


On Dec 19, 2006, at 9:31 AM, Scott Cain wrote:

> I really don't think the BioPerl version detection is wrong.  I  
> actually
> don't check Bio::Root::Version::VERSION in Makefile.PL, I check
> Bio::Graphics::Panel->api_version.  When it doesn't find the correct
> api_version, it gives a warning that BioPerl 1.5.2 is not installed.  I
> have seen this happen when more than one BioPerl instance is installed
> and `perl Makefile.PL` finds the wrong one first.  My suggestion is to
> try reinstalling BioPerl and providing the --uninst 1 argument to  
> remove
> older versions of BioPerl:
>
>   sudo ./Build install --uninst 1
>
> Scott

Could having two Bioperl instances explain the test failures?  I'm  
not sure (maybe Sendu can answer this), but I would assume  
Module::Build uses the current working directory for test runs.

chris


From bix at sendu.me.uk  Tue Dec 19 12:02:34 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 19 Dec 2006 17:02:34 +0000
Subject: [Bioperl-l] [Gmod-gbrowse]  problems installing bioperl
In-Reply-To: <8D5C45A3-A90A-49D7-A7E7-888C977759AC@uiuc.edu>
References: <1166519755.4587adcb141d3@www.studentmail.otago.ac.nz>
	<45880167.9010605@sendu.me.uk>
	<1166542310.6981.119.camel@localhost.localdomain>
	<8D5C45A3-A90A-49D7-A7E7-888C977759AC@uiuc.edu>
Message-ID: <45881B2A.8060907@sendu.me.uk>

Chris Fields wrote:
> 
> On Dec 19, 2006, at 9:31 AM, Scott Cain wrote:
> 
>> I really don't think the BioPerl version detection is wrong.  I actually
>> don't check Bio::Root::Version::VERSION in Makefile.PL, I check
>> Bio::Graphics::Panel->api_version.  When it doesn't find the correct
>> api_version, it gives a warning that BioPerl 1.5.2 is not installed.  I
>> have seen this happen when more than one BioPerl instance is installed
>> and `perl Makefile.PL` finds the wrong one first.  My suggestion is to
>> try reinstalling BioPerl and providing the --uninst 1 argument to remove
>> older versions of BioPerl:
>>
>>   sudo ./Build install --uninst 1
>>
>> Scott
> 
> Could having two Bioperl instances explain the test failures?  I'm not 
> sure (maybe Sendu can answer this), but I would assume Module::Build 
> uses the current working directory for test runs.

It does, so that shouldn't be an issue for the test failures.
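Scott's diagnosis (several BioPerl installations, with `perl Makefile.PL` finding an older one first) can be checked by listing every copy of a module visible along @INC. A minimal core-Perl sketch; `find_module_copies` is a hypothetical helper, not part of BioPerl or GBrowse. Bio::Graphics::Panel is the default only because that is the module whose api_version Scott says the GBrowse Makefile.PL checks.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# List every copy of a module's .pm file along the given directories
# (default: @INC), in search order. Perl loads the first one it finds,
# so an older copy earlier in @INC shadows a newer install.
sub find_module_copies {
    my ($module, @dirs) = @_;
    @dirs = @INC unless @dirs;
    my @parts = split /::/, $module;       # Bio::Graphics::Panel
    $parts[-1] .= '.pm';                   # -> Bio, Graphics, Panel.pm
    my @found;
    for my $dir (@dirs) {
        next if ref $dir;                  # @INC may hold code-ref hooks
        my $path = File::Spec->catfile($dir, @parts);
        push @found, $path if -f $path;
    }
    return @found;
}

my $module = @ARGV ? $ARGV[0] : 'Bio::Graphics::Panel';
my @copies = find_module_copies($module);
print "$_\n" for @copies;
print "More than one copy of $module found; Perl will load the first.\n"
    if @copies > 1;
```

Passing a different module name on the command line (for example Bio::Root::Version) checks that module instead.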


From ferraria at gmail.com  Tue Dec 19 11:40:05 2006
From: ferraria at gmail.com (Anthony Ferrari)
Date: Tue, 19 Dec 2006 17:40:05 +0100
Subject: [Bioperl-l] Problem with : EUtilities - Proxy
Message-ID: <b2ec54b90612190840r24fe1aa5ncb9c9def040aed49@mail.gmail.com>

Hi all,

I've just installed BioPerl 1.5.2 (devel) on a Linux Mandrake machine with
the CPAN shell.
I want to use Bio::DB::EUtilities to retrieve data (IDs) from the NCBI
'gene' database (the first step of my pipeline).

But the installation of this package doesn't seem to be correct:
the simple example given in the documentation doesn't work (this one:
http://doc.bioperl.org/bioperl-live/Bio/DB/EUtilities.html#SYNOPSIS).

Here is the error message I got:
"Can't use an undefined value as an ARRAY reference at
/usr/lib/perl5/site_perl/5.8.7/LWP/UserAgent.pm line 779."

In the UserAgent package, line 779 is in the private "_need_proxy"
subroutine and corresponds to the code:    ...if (@{ $self->{'no_proxy'} })
...

If I comment out this line in the UserAgent package and the corresponding "}",
the example works. But obviously, I would prefer to solve the problem in a
proper way :)

Indeed, my computer accesses the internet via an HTTP proxy and I never told
BioPerl about it.
As I read on the BioPerl Wiki site, I tried to configure an $http_proxy
environment variable but it still doesn't work.

One last, maybe important, piece of information: I saw during the installation
that the tests 't/EUtilities' were skipped for an unstated reason.


So finally I have two questions:
1. Can anybody figure out what my problem is?
2. At the moment, is the Bio::DB::EUtilities package really effective, or
would using the NCBI eutilities directly with the LWP::Simple package be a
good alternative?

Many thanks in advance,
Best Regards,
Anthony Ferrari
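On the second question, a minimal sketch of calling NCBI's esearch directly as an alternative to Bio::DB::EUtilities. It uses LWP::UserAgent rather than LWP::Simple so that `env_proxy()` can pick up the http_proxy environment variable; whether this avoids the `no_proxy` error above has not been checked. The helper names (`url_encode`, `esearch_url`, `esearch_ids`) are hypothetical, and the ID extraction is a simple regex over the XML reply rather than a real XML parse.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# NCBI esearch endpoint as addressed in 2006; NCBI now serves the
# E-utilities over https, so adjust the scheme for current use.
my $ESEARCH = 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi';

# Percent-encode everything except unreserved URL characters
# (a core-Perl stand-in for URI::Escape::uri_escape).
sub url_encode {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-_.~])/sprintf('%%%02X', ord $1)/ge;
    return $s;
}

sub esearch_url {
    my ($db, $term, $retmax) = @_;
    $retmax = 20 unless defined $retmax;
    return sprintf '%s?db=%s&term=%s&retmax=%d',
        $ESEARCH, url_encode($db), url_encode($term), $retmax;
}

# Run the query and return the <Id> values from the XML reply.
# env_proxy() makes LWP::UserAgent read http_proxy / no_proxy from the
# environment, so a proxy exported there is used for the request.
sub esearch_ids {
    my ($db, $term, $retmax) = @_;
    require LWP::UserAgent;                 # loaded only when fetching
    my $ua = LWP::UserAgent->new;
    $ua->env_proxy;
    my $res = $ua->get(esearch_url($db, $term, $retmax));
    die 'esearch failed: ', $res->status_line, "\n" unless $res->is_success;
    my @ids = $res->decoded_content =~ m{<Id>(\d+)</Id>}g;
    return @ids;
}

# Only hit the network when a query is given on the command line.
if (@ARGV) {
    print "$_\n" for esearch_ids('gene', join(' ', @ARGV));
}
```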


From bix at sendu.me.uk  Tue Dec 19 12:06:03 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 19 Dec 2006 17:06:03 +0000
Subject: [Bioperl-l] problems installing bioperl
In-Reply-To: <1166542310.6981.119.camel@localhost.localdomain>
References: <1166519755.4587adcb141d3@www.studentmail.otago.ac.nz>	
	<45880167.9010605@sendu.me.uk>
	<1166542310.6981.119.camel@localhost.localdomain>
Message-ID: <45881BFB.7020008@sendu.me.uk>

Scott Cain wrote:
> I really don't think the BioPerl version detection is wrong.  I actually
> don't check Bio::Root::Version::VERSION in Makefile.PL, I check
> Bio::Graphics::Panel->api_version.  When it doesn't find the correct
> api_version, it gives a warning that BioPerl 1.5.2 is not installed.  I
> have seen this happen when more than one BioPerl instance is installed
> and `perl Makefile.PL` finds the wrong one first.

Yes, I saw that, which is why I thought I must be looking at something 
different to what the OP had installed.


> My suggestion is to try reinstalling BioPerl and providing the --uninst 1 argument to remove
> older versions of BioPerl:
> 
>   sudo ./Build install --uninst 1

My confusion is that he has definitely installed 1.5.2 and this version 
is being polled for its version number (by something!) and returning the 
correct '1.0050021', whilst the something expects '1.52'. Anyway, this 
can only be resolved if Stephan provides the real error message and its 
context.
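The point that '1.0050021' is the correct number for 1.5.2_100 can be checked by hand: the dotted release maps to 1.005002 (each component after the first padded to three digits) with the developer suffix appended, which reproduces the number reported in this thread. A requirement written as 1.52 is numerically larger than that, so a numeric check of the installed 1.0050021 against 1.52 will fail. A minimal sketch; `dotted_to_decimal` is a hypothetical helper, not BioPerl or GBrowse code, and it assumes every component after the first is below 1000.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Map a dotted release such as "1.5.2_100" to its decimal form:
# major version, then each later component zero-padded to three digits,
# then any developer ("_NNN") suffix, with trailing zeros dropped.
sub dotted_to_decimal {
    my ($dotted) = @_;
    my ($release, $devel) = split /_/, $dotted, 2;
    my ($major, @rest) = split /\./, $release;
    my $decimal = $major . '.' . join('', map { sprintf '%03d', $_ } @rest);
    $decimal .= $devel if defined $devel;
    $decimal =~ s/(\.\d*?[1-9])0+$/$1/;    # 1.005002100 -> 1.0050021
    return $decimal;
}

my $installed = dotted_to_decimal('1.5.2_100');
print "1.5.2_100 -> $installed\n";             # 1.5.2_100 -> 1.0050021
print "a numeric check for >= 1.52 would fail\n" if $installed < 1.52;
```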


From cjfields at uiuc.edu  Tue Dec 19 12:27:24 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 19 Dec 2006 11:27:24 -0600
Subject: [Bioperl-l] Problem with : EUtilities - Proxy
In-Reply-To: <b2ec54b90612190840r24fe1aa5ncb9c9def040aed49@mail.gmail.com>
References: <b2ec54b90612190840r24fe1aa5ncb9c9def040aed49@mail.gmail.com>
Message-ID: <6365ACFD-7F5A-4EF1-97EA-BB53A58B9B4D@uiuc.edu>


On Dec 19, 2006, at 10:40 AM, Anthony Ferrari wrote:

> Hi all,
>
> I've just installed BioPerl 1.5.2 (devel) on a Linux Mandrake machine
> with the CPAN shell.
> I want to use Bio::DB::EUtilities to retrieve data (IDs) from the NCBI
> 'gene' database (the first step of my pipeline).
>
> But the installation of this package doesn't seem to be correct:
> the simple example given in the documentation doesn't work (this one:
> http://doc.bioperl.org/bioperl-live/Bio/DB/EUtilities.html#SYNOPSIS).
>
> Here is the error message I got:
> "Can't use an undefined value as an ARRAY reference at
> /usr/lib/perl5/site_perl/5.8.7/LWP/UserAgent.pm line 779."
>
> In the UserAgent package, line 779 is in the private "_need_proxy"
> subroutine and corresponds to the code:    ...if (@{ $self->{'no_proxy'} })
> ...
>
> If I comment out this line in the UserAgent package and the
> corresponding "}", the example works. But obviously, I would prefer to
> solve the problem in a proper way :)
>
> Indeed, my computer accesses the internet via an HTTP proxy and I never
> told BioPerl about it.
> As I read on the BioPerl Wiki site, I tried to configure an $http_proxy
> environment variable but it still doesn't work.
>
> One last, maybe important, piece of information: I saw during the
> installation that the tests 't/EUtilities' were skipped for an
> unstated reason.
>
>
> So finally I have two questions:
> 1. Can anybody figure out what my problem is?
> 2. At the moment, is the Bio::DB::EUtilities package really effective,
> or would using the NCBI eutilities directly with the LWP::Simple package
> be a good alternative?
>
> Many thanks in advance,
> Best Regards,
> Anthony Ferrari

First things first: at the moment the BioPerl EUtilities interface is  
very experimental (as specifically outlined in the POD), so I can't  
really recommend it for production use until the API is cleaned up.   
However, I do appreciate any feedback or comments re:EUtilities (the  
reason it's out there in the 1.5.2 release).

You might check out this bug report, which relates directly to your  
issue:

http://bugzilla.open-bio.org/show_bug.cgi?id=2109

After I worked out the proxy issue Torsten got it working.  Let me  
know if this doesn't help or fix the problem.

chris


From cain at cshl.edu  Tue Dec 19 10:31:50 2006
From: cain at cshl.edu (Scott Cain)
Date: Tue, 19 Dec 2006 10:31:50 -0500
Subject: [Bioperl-l] problems installing bioperl
In-Reply-To: <45880167.9010605@sendu.me.uk>
References: <1166519755.4587adcb141d3@www.studentmail.otago.ac.nz>
	<45880167.9010605@sendu.me.uk>
Message-ID: <1166542310.6981.119.camel@localhost.localdomain>

I really don't think the BioPerl version detection is wrong.  I actually
don't check Bio::Root::Version::VERSION in Makefile.PL, I check
Bio::Graphics::Panel->api_version.  When it doesn't find the correct
api_version, it gives a warning that BioPerl 1.5.2 is not installed.  I
have seen this happen when more than one BioPerl instance is installed
and `perl Makefile.PL` finds the wrong one first.  My suggestion is to
try reinstalling BioPerl and providing the --uninst 1 argument to remove
older versions of BioPerl:

  sudo ./Build install --uninst 1

Scott


On Tue, 2006-12-19 at 15:12 +0000, Sendu Bala wrote:
> Stephan Roessner wrote:
> > Dear support team,
> > 
> > I installed bioperl 1.5.2_100 on a Fedora machine to be able to use
> > gbrowse.
> > The installation seems to work (apart from the test failures) but the
> > gbrowse installation tells me that Bio::Perl 1.0050021 is installed, but
> > of course it requires 1.52.
> > 
> > Is there a chance to find out what went wrong?
> 
> Nothing went wrong with the Bioperl installation (well, except there
> shouldn't have been any test failures - can you post those please?).
> gbrowse simply defined its Bioperl requirement incorrectly. If you tell
> me exactly where you downloaded gbrowse from and how you went about
> installing it, and provide the exact, complete error message you got
> from it, I might be able to help the authors fix the problem.
>
> Or I'm pretty sure they can figure it out for themselves :)
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From ferraria at gmail.com  Tue Dec 19 12:06:31 2006
From: ferraria at gmail.com (Anthony Ferrari)
Date: Tue, 19 Dec 2006 18:06:31 +0100
Subject: [Bioperl-l] Problem with : EUtilities - Proxy
In-Reply-To: <b2ec54b90612190840r24fe1aa5ncb9c9def040aed49@mail.gmail.com>
References: <b2ec54b90612190840r24fe1aa5ncb9c9def040aed49@mail.gmail.com>
Message-ID: <b2ec54b90612190906s2b4ddbf8g9b591372a85fdcd@mail.gmail.com>

Hi all,

I've just installed BioPerl 1.5.2 (devel) on a Linux Mandrake machine with
the CPAN shell.
I want to use Bio::DB::EUtilities to retrieve data (IDs) from the NCBI
'gene' database (the first step of my pipeline).

But the installation of this package doesn't seem to be correct:
the simple example given in the documentation doesn't work (this one:
http://doc.bioperl.org/bioperl-live/Bio/DB/EUtilities.html#SYNOPSIS).

Here is the error message I got:
"Can't use an undefined value as an ARRAY reference at
/usr/lib/perl5/site_perl/5.8.7/LWP/UserAgent.pm line 779."

In the UserAgent package, line 779 is in the private "_need_proxy"
subroutine and corresponds to the code:    ...if (@{ $self->{'no_proxy'} })
...

If I comment out this line in the UserAgent package and the corresponding "}",
the example works. But obviously, I would prefer to solve the problem in a
proper way :)

Indeed, my computer accesses the internet via an HTTP proxy and I never told
BioPerl about it.
As I read on the BioPerl Wiki site, I tried to configure an $http_proxy
environment variable but it still doesn't work.

One last, maybe important, piece of information: I saw during the installation
that the tests 't/EUtilities' were skipped for an unstated reason.


So finally I have two questions:
1. Can anybody figure out what my problem is?
2. At the moment, is the Bio::DB::EUtilities package really effective, or
would using the NCBI eutilities directly with the LWP::Simple package be a
good alternative?

Many thanks in advance,
Best Regards,
Anthony Ferrari


From stewarta at nmrc.navy.mil  Tue Dec 19 13:49:57 2006
From: stewarta at nmrc.navy.mil (Andrew Stewart)
Date: Tue, 19 Dec 2006 13:49:57 -0500
Subject: [Bioperl-l] Bio::Tools::Glimmer for glimmer2/3
Message-ID: <4FDC0EAE-0E93-42A6-AFCA-2B2DFB6F7E8D@nmrc.navy.mil>

I see that the Bio::Tools::Glimmer documentation clearly states that this
module is intended for use with GlimmerM (eukaryotic version) only.
I am wondering if anyone can recall any talk about adapting
Bio::Tools::Glimmer for Glimmer2 / Glimmer3 (prokaryotic version)?
I've searched the list history with little luck other than someone
else asking a similar question.

If not, does anyone have any thoughts on how difficult it might be to  
implement support for glimmer2/3 result parsing?  Perhaps just a  
matter of editing the _parse_predictions method?


--
Andrew Stewart
Research Assistant, Genomics Team
Navy Medical Research Center (NMRC)
Biological Defense Research Directorate (BDRD)
BDRD Annex
12300 Washington Avenue, 2nd Floor
Rockville, MD 20852

email: stewarta at nmrc.navy.mil
phone: 301-231-6700 Ext 270


From rvosa at sfu.ca  Tue Dec 19 13:53:47 2006
From: rvosa at sfu.ca (Rutger Vos)
Date: Tue, 19 Dec 2006 10:53:47 -0800
Subject: [Bioperl-l] problems installing bioperl
Message-ID: <200612191853.kBJIrlW3026344@rm-rstar.sfu.ca>

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061219/276348b7/attachment-0002.pl>

From cjfields at uiuc.edu  Tue Dec 19 14:31:17 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 19 Dec 2006 13:31:17 -0600
Subject: [Bioperl-l] Bio::Tools::Glimmer for glimmer2/3
In-Reply-To: <4FDC0EAE-0E93-42A6-AFCA-2B2DFB6F7E8D@nmrc.navy.mil>
References: <4FDC0EAE-0E93-42A6-AFCA-2B2DFB6F7E8D@nmrc.navy.mil>
Message-ID: <71E04575-DFD2-4F5A-B268-493D3246CBFA@uiuc.edu>


On Dec 19, 2006, at 12:49 PM, Andrew Stewart wrote:

> I see that the Bio::Tools::Glimmer documentation clearly states that this
> module is intended for use with GlimmerM (eukaryotic version) only.
> I am wondering if anyone can recall any talk about adapting
> Bio::Tools::Glimmer for Glimmer2 / Glimmer3 (prokaryotic version)?
> I've searched the list history with little luck other than someone
> else asking a similar question.

There is a thread here:

http://thread.gmane.org/gmane.comp.lang.perl.bio.general/12546/focus=12546

> If not, does anyone have any thoughts on how difficult it might be to
> implement support for glimmer2/3 result parsing?  Perhaps just a
> matter of editing the _parse_predictions method?

It depends on how different the various Glimmer formats are; I'll  
have to look at the ones Torsten added in CVS.  You could always try  
modifying Bio::Tools::Glimmer to parse Glimmer2/3 and GlimmerM  
reports, but based on the mail list thread above it may not be so  
straightforward.

chris
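To gauge what Glimmer3 support might involve, here is a minimal sketch of a standalone parser. It assumes Glimmer3's .predict layout: a '>' header line per sequence, then one whitespace-separated line per gene giving ID, start, end, signed reading frame and score. That layout is an assumption to verify against real output; Glimmer2 output differs, and this is not the existing Bio::Tools::Glimmer _parse_predictions code. `parse_glimmer3_predict` is a hypothetical name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse Glimmer3-style ".predict" text from a filehandle into hashrefs.
# Assumed layout (verify against real output):
#   >contig_name
#   orf00001      108      455  +3     3.05
sub parse_glimmer3_predict {
    my ($fh) = @_;
    my ($seq_id, @genes);
    while (my $line = <$fh>) {
        chomp $line;
        next unless $line =~ /\S/;
        if ($line =~ /^>(\S+)/) {                # header for the next sequence
            $seq_id = $1;
            next;
        }
        my ($id, $start, $end, $frame, $score) = split ' ', $line;
        next unless defined $score;
        my ($sign, $frame_no) = $frame =~ /^([+-])([123])$/ or next;
        my $strand = $sign eq '+' ? 1 : -1;
        # Minus-strand genes are reported with start > end; store them
        # low-to-high. Genes wrapping round a circular origin are not handled.
        ($start, $end) = ($end, $start) if $strand < 0;
        push @genes, {
            seq_id => $seq_id, id => $id, start => $start, end => $end,
            strand => $strand, frame => $frame_no, score => $score,
        };
    }
    return @genes;
}

if (@ARGV) {
    open my $in, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!\n";
    for my $g (parse_glimmer3_predict($in)) {
        print join("\t", @{$g}{qw(seq_id id start end strand score)}), "\n";
    }
}
```

Inside Bio::Tools::Glimmer the same line handling would presumably feed feature objects rather than plain hashes.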


From MEC at stowers-institute.org  Tue Dec 19 14:57:48 2006
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Tue, 19 Dec 2006 13:57:48 -0600
Subject: [Bioperl-l] bp_seqfeature_load /
	Bio::DB::SeqFeature::Store::GFF3Loader problems augmenting
	Flybase annotation
Message-ID: <CED81D34E37D5043A1211565277A51E506E06492@exchkc02.stowers-institute.org>

Lincoln and fellow Bio::DB::SeqFeature travelers,

I find that using bp_seqfeature_load.PLS to load subfeatures of genes
already loaded using bp_seqfeature_load.PLS fails with 

------------- EXCEPTION  -------------
MSG: FBgn0017545 doesn't have a primary id
STACK
Bio::DB::SeqFeature::Store::GFF3Loader::build_object_tree_in_tables
/home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:682
STACK Bio::DB::SeqFeature::Store::GFF3Loader::build_object_tree
/home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:663
STACK Bio::DB::SeqFeature::Store::GFF3Loader::finish_load
/home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:372
STACK Bio::DB::SeqFeature::Store::GFF3Loader::load_fh
/home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:345
STACK Bio::DB::SeqFeature::Store::GFF3Loader::load
/home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:242
STACK toplevel
/home/mec/cvs/bioperl-live/scripts/Bio-SeqFeature-Store/bp_seqfeature_load.PLS:76

Where FBgn0017545 is the ID of a gene previously loaded.

I am unsure how to remedy my situation and welcome any advice on a correct
or improved approach to my problem.

Here's some detail if it helps.  I am developing a pipeline to design
microarray probes capable of distinguishing among splice variants in
Drosophila (using the latest Flybase dmel_r5.1 annotation).  So I

1) load a filtered selection of Flybase annotation using
bp_seqfeature_load.  (for testing purposes, I am using a single gene's
worth of annotation, FBgn0017545.gff, attached).  This is done as
follows:

	> bp_seqfeature_load.PLS  --create FBgn0017545.gff 

2) analyze all the genes in the database, and create GFF3 output in which
each feature has a 'Parent' that is a previously loaded gene (i.e.
FBgn0017545).  (These features represent the unique introns, splice
sites, and exonic design targets. The output of this analysis,
FBgn0017545_matd.gff, is also attached.)

3) load these analysis results into the same database, as follows:

	> bp_seqfeature_load.PLS          FBgn0017545_matd.gff

It is at this point that I get the above error.

However, I don't get any error and the data loads fine if I load the two
files together, as follows:

	> bp_seqfeature_load.PLS --create <(cat FBgn0017545.gff FBgn0017545_matd.gff)

So, I suspect that either I am misunderstanding when/how to use
bp_seqfeature_load.PLS or else this use case has not yet arisen and must
be provided for somehow.

I am running against bioperl-live

Thanks for your thoughts and assistance,

Malcolm Cook
Database Applications Manager - Bioinformatics
Stowers Institute for Medical Research - Kansas City, Missouri
 

From Kevin.M.Brown at asu.edu  Tue Dec 19 16:46:19 2006
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Tue, 19 Dec 2006 14:46:19 -0700
Subject: [Bioperl-l] Bio::SimpleAlign
Message-ID: <1A4207F8295607498283FE9E93B775B40270F4E9@EX02.asurite.ad.asu.edu>

I'm working on a script that plays around with alignments of sequences
and one of the things I noticed is that the code for the match method
does not seem to actually use the start/end information when creating
the match between objects in the alignment.  Maybe I'm misunderstanding
what the alignment is supposed to hold in terms of sequence.  The
alignment objects I build up are based on the sequence of a gene and the
sequences of the primers that amplify that gene.

$alignments{$gene->id()}->add_seq(
    Bio::LocatableSeq->new(
        -seq    => $seq[0]->seq(),
        -id     => $seq[0]->id(),
        -start  => $start,
        -end    => $start + $seq[0]->length() - 1,
        -strand => 1,
    )
);
$alignments{$gene->id()}->add_seq(
    Bio::LocatableSeq->new(
        -seq    => $seq[1]->seq(),
        -id     => $seq[1]->id(),
        -start  => $stop,
        -end    => $stop + $seq[1]->length() - 1,
        -strand => -1,
    )
);

So, you can see I input a start and stop point for the primer, but when
I use the match function all it does is match the first character of the
gene sequence to the first char of the primer sequences, then the second
gene char to the second in each primer, etc...  This doesn't seem to fit
with the documentation and seems odd that there would be holders for the
start/stop points and not use them when doing things like matching of
sequences in an alignment.


From bix at sendu.me.uk  Tue Dec 19 17:01:22 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 19 Dec 2006 22:01:22 +0000
Subject: [Bioperl-l] problems installing bioperl
In-Reply-To: <200612191853.kBJIrlW3026344@rm-rstar.sfu.ca>
References: <200612191853.kBJIrlW3026344@rm-rstar.sfu.ca>
Message-ID: <45886132.7050505@sendu.me.uk>

Rutger Vos wrote:
> Aren't 1.5.2_100 and 1.0050021 supposed to be equivalent in in this weird
> version-string-translation way that makes 5.5 and 5.005 equivalent also?

Yes, 1.5.2_100 and 1.0050021 are equivalent. The equivalent of 5.5, however,
is 5.500 (it is 5.005 that corresponds to 5.5.0).

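Here is a minimal plain-Perl sketch of that convention, for illustration only. It assumes that each component after the first, including a trailing _NNN developer suffix, becomes one zero-padded three-digit group. The helper names dotted_to_decimal and decimal_to_dotted are hypothetical and are not part of BioPerl or the version module.

```perl
use strict;
use warnings;

# Dotted -> decimal: the first component stays as is, every later component
# (including a trailing _NNN developer suffix) becomes a three-digit group.
sub dotted_to_decimal {
    my ($dotted) = @_;
    my ($major, @rest) = split /[._]/, $dotted;
    return $major unless @rest;
    return $major . '.' . join '', map { sprintf '%03d', $_ } @rest;
}

# Decimal -> dotted: read the fractional part three digits at a time,
# padding the last group with trailing zeros (so 5.5 is read as 5.500).
sub decimal_to_dotted {
    my ($decimal) = @_;
    my ($major, $frac) = split /\./, $decimal;
    $frac = '' unless defined $frac;
    $frac .= '0' while length($frac) % 3;
    return join '.', $major, map { $_ + 0 } unpack '(A3)*', $frac;
}

print dotted_to_decimal('1.5.2_100'), "\n";   # 1.005002100, i.e. 1.0050021
print decimal_to_dotted('5.5'), "\n";         # 5.500
print decimal_to_dotted('5.005'), "\n";       # 5.5
```

Both conversions rely only on string handling, so no floating-point rounding enters the comparison.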

From lstein at cshl.edu  Tue Dec 19 16:58:24 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Tue, 19 Dec 2006 16:58:24 -0500
Subject: [Bioperl-l] bp_seqfeature_load /
	Bio::DB::SeqFeature::Store::GFF3Loader problems augmenting
	Flybase annotation
In-Reply-To: <CED81D34E37D5043A1211565277A51E506E06492@exchkc02.stowers-institute.org>
References: <CED81D34E37D5043A1211565277A51E506E06492@exchkc02.stowers-institute.org>
Message-ID: <6dce9a0b0612191358t4764bfe0g601cd22d09132e55@mail.gmail.com>

Hi Malcolm,

Your second guess was right. The use case of augmenting an existing gene
with additional splice forms isn't provided for. You can get the
functionality by making direct calls to Bio::DB::SeqFeature::Store methods:

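use strict;
use warnings;
use Bio::DB::SeqFeature::Store;
use Bio::DB::SeqFeature;

# Open the existing database for writing (adaptor/dsn are placeholders;
# use the same settings the loader used).
my $db = Bio::DB::SeqFeature::Store->new(-adaptor => 'DBI::mysql',
                                          -dsn     => 'dbi:mysql:test',
                                          -write   => 1);

# Fetch the previously loaded gene that will act as the parent.
my @genes = $db->get_features_by_name('FBgn0017545');
@genes == 1 or die "Didn't get exactly one gene";
my $parent = $genes[0];

my $chr    = $parent->seq_id;
my $start  = $parent->start;
my $end    = $parent->end;
my $strand = $parent->strand;

# Create and store a new mRNA on the gene's own sequence, then attach it.
my $new_splice_form = $db->new_feature(-primary_tag => 'mRNA',
                                       -source      => 'added',
                                       -seq_id      => $chr,
                                       -strand      => $strand,
                                       -start       => $start+10,
                                       -end         => $end,
                                      );
$parent->add_SeqFeature($new_splice_form);

# Add two exons (example coordinates) to the new splice form.
for my $pos ([$start+10,$start+100],[$start+200,$end]) {
  my ($e_start,$e_end) = @$pos;
  my $exon = Bio::DB::SeqFeature->new(-primary_tag => 'exon',
                                      -store       => $db,
                                      -seq_id      => $chr,
                                      -strand      => $strand,
                                      -start       => $e_start,
                                      -end         => $e_end);
  $new_splice_form->add_SeqFeature($exon);
}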

I found a bug in updating the seqfeature database when I wrote this script,
so you'll have to get the latest bioperl-live. I think you can use this to
write a splice form updating script.

In order to support the idea of adding new splice forms to an existing gene
using the GFF3 format, I will have to either modify the loader, or write a
separate script (probably better to do the latter). It shouldn't be hard if
you'd like to give it a try.
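A possible first step for such a separate script is a rough sketch that lists the Parent IDs a GFF3 file refers to but does not define itself. Going by the exception above, those appear to be the features whose parents the loader cannot resolve when the parents came from an earlier load. The function name external_parents and the toy records are hypothetical. The parsing is deliberately simplified: there is no URL-unescaping, and any line with fewer than nine columns is skipped.

```perl
use strict;
use warnings;

# Return the Parent IDs that a GFF3 stream references but never defines
# with its own ID= attribute. Simplified parsing: tab-separated columns,
# column 9 holds ';'-separated tag=value pairs, no unescaping.
sub external_parents {
    my ($fh) = @_;
    my (%defined, %wanted);
    while (my $line = <$fh>) {
        next if $line =~ /^#/ || $line !~ /\S/;   # comments, directives, blanks
        chomp $line;
        my @col = split /\t/, $line;
        next unless @col >= 9;                    # not a feature line
        for my $pair (split /;/, $col[8]) {
            my ($tag, $value) = split /=/, $pair, 2;
            next unless defined $value;
            if ($tag eq 'ID') {
                $defined{$value} = 1;
            }
            elsif ($tag eq 'Parent') {
                $wanted{$_} = 1 for split /,/, $value;   # Parent may list several IDs
            }
        }
    }
    my @missing = grep { !$defined{$_} } keys %wanted;
    return sort @missing;
}

# Toy records (hypothetical seq_id, source, types and coordinates).
my $gff = join '', map { join("\t", @$_) . "\n" } (
    [ 'ctg1', 'matd', 'intron', 1000, 1100, '.', '+', '.', 'ID=i1;Parent=FBgn0017545' ],
    [ 'ctg1', 'matd', 'target', 1200, 1260, '.', '+', '.', 'ID=t1;Parent=i1' ],
);
open my $fh, '<', \$gff or die "cannot open in-memory GFF: $!";
print "$_\n" for external_parents($fh);
close $fh;
```

The reported IDs could then be looked up with $db->get_features_by_name, and their children attached through add_SeqFeature as in the script above.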

Lincoln


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu


From rvosa at sfu.ca  Tue Dec 19 23:23:20 2006
From: rvosa at sfu.ca (Rutger Vos)
Date: Tue, 19 Dec 2006 20:23:20 -0800
Subject: [Bioperl-l] suggestions for suitable 'taxon' object
Message-ID: <200612200423.kBK4NKDt009254@rm-rstar.sfu.ca>

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061219/17ec7ff3/attachment-0002.pl>

From cjfields at uiuc.edu  Wed Dec 20 01:16:47 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 20 Dec 2006 00:16:47 -0600
Subject: [Bioperl-l] suggestions for suitable 'taxon' object
In-Reply-To: <200612200423.kBK4NKDt009254@rm-rstar.sfu.ca>
References: <200612200423.kBK4NKDt009254@rm-rstar.sfu.ca>
Message-ID: <4185E59B-C0DA-49B8-8D71-11183A091FBF@uiuc.edu>


On Dec 19, 2006, at 10:23 PM, Rutger Vos wrote:

> Hi all,
>
> I am looking for a bioperl object that can be abused to function as a
> suitable 'taxon' object, where I mean 'taxon' as understood by the NEXUS
> file format (i.e. not strictly an entity from a taxonomy, but more loosely
> an OTU).
>
> The object would primarily function as a way to relate nodes in trees to
> sequences in an alignment (a foreign key that both nodes and sequences
> refer to), and secondarily as the keeper of the canonical name of the
> OTU, such that a sequence named 'Homo_sapiens|EF177447.1/12-56' and a
> node named 'Homo sapiens (constrained monophyly)' can still be understood
> to refer to the same thing - the OTU 'Homo sapiens sapiens' (for example).

Alignment (SimpleAlign) objects contain Bio::LocatableSeq sequence  
objects; at the moment LocatableSeqs don't store their own annotation  
but they could easily be made or subclassed to be AnnotatableI (i.e.  
they can store annotation collections).  I recently made SimpleAlign  
Annotatable; Jason has also made SimpleAlign implement  
FeatureHolderI, so alignments can store SeqFeatures as well; he may  
have his own designs here.

There may be a wide variety of ways to go about this.  I would  
probably do the following (bear in mind I'm a microbiologist, not a  
computer scientist).  If one could add trees as annotation to the  
alignment (i.e. if trees could be Annotation objects and kept in the  
SimpleAlign's annotation collection), and each sequence in the  
alignment contained reference to a node object of the tree (i.e. if  
Bio::Taxon/Bio::Species objects could also be Annotation objects, but  
kept in a LocatableSeq annotation collection), both could refer to  
the same node object.  This may not be exactly what you want, but  
maybe it's close?

SimpleAlign->AnnoColln->Tree->OTU(Nodes)
    \----->LocSeqs-->AnnoColln-----/

I suppose this could also be done with Seqfeatures...
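Here is a rough plain-Perl illustration of the shared-object idea, using hashes instead of BioPerl classes. An alignment row and a tree node whose labels normalise to the same name get back the very same OTU reference. The canonical_name heuristic (keep the first two words, drop accessions and parenthetical notes) and the otu_for helper are hypothetical. Real data would more likely need an explicit, curated mapping, for example to reach 'Homo sapiens sapiens' rather than 'Homo sapiens'.

```perl
use strict;
use warnings;

# One OTU record per canonical name; rows and nodes share these references.
my %otu_by_name;

# Hypothetical normaliser: cut at '|' or '(', turn underscores into spaces,
# and keep at most the first two words ("Genus species").
sub canonical_name {
    my ($label) = @_;
    $label =~ s/[|(].*//;
    $label =~ tr/_/ /;
    $label =~ s/^\s+|\s+$//g;
    my @words = split ' ', $label;
    return join ' ', @words[0 .. ($#words < 1 ? $#words : 1)];
}

# Return the shared OTU hashref for a label, creating it on first use.
sub otu_for {
    my ($label) = @_;
    my $name = canonical_name($label);
    return $otu_by_name{$name} ||= { name => $name };
}

my $seq_otu  = otu_for('Homo_sapiens|EF177447.1/12-56');
my $node_otu = otu_for('Homo sapiens (constrained monophyly)');
my $msg = ($seq_otu == $node_otu) ? "same OTU: $seq_otu->{name}\n" : "different OTUs\n";
print $msg;
```

Because both sides hold a reference to the same record, the OTU acts as the foreign key, and any extra data attached to it is visible from both the alignment and the tree.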

> I was thinking that a (possibly expanded) Bio::Species might work if
> there was some sensible way of appending references to node and sequence
> objects to it (or otherwise associate them with each other), but I am
> writing *to solicit any and all suggestions*. I am looking for something
> similar to Bio::Phylo::Taxa::Taxon.
>
> Any and all comments and suggestions greatly appreciated!
>
> Best wishes,
>
> Rutger Vos

Sendu would be the best one to speak about Bio::Taxon and  
Bio::Species and may have some ideas on the above.  The current plan  
was to deprecate Bio::Species, but who knows?

chris


From heikki at sanbi.ac.za  Wed Dec 20 05:25:08 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Wed, 20 Dec 2006 12:25:08 +0200
Subject: [Bioperl-l] Bio::SimpleAlign
In-Reply-To: <1A4207F8295607498283FE9E93B775B40270F4E9@EX02.asurite.ad.asu.edu>
References: <1A4207F8295607498283FE9E93B775B40270F4E9@EX02.asurite.ad.asu.edu>
Message-ID: <200612201225.08862.heikki@sanbi.ac.za>

Kevin,

Sequences that are added to the alignment are supposed to be *aligned*. 
SimpleAlign does not do it for you. It seems to me that you are adding 
sequences like this:

nnnnnnnnnnnnnnnnnnnn  1 - 20, "a short gene" 
nnnnnn               21 - 26 "a short primer after the gene"

when you should be doing this

nnnnnnnnnnnnnnnnnnnn        1 - 20, "a short gene" 
--------------------nnnnnn 21 - 26 "a short primer after the gene"

Note that the default way of displaying names in SimpleAlign 
is "name/start-end". The name is usually expected to refer to the sequence 
from which this subsequence is derived. The display name does not change 
if you add gaps.
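To make the point concrete, here is a small plain-Perl sketch. It makes no BioPerl calls, and pad_to_alignment and match_columns are hypothetical helpers. It pads a primer with leading and trailing gaps so that column i of every row corresponds to position i of the gene, then compares the rows column by column. It assumes the primer is given on the gene's strand; a reverse-strand primer would first have to be reverse-complemented.

```perl
use strict;
use warnings;

# Place a subsequence in an alignment row: 1-based, inclusive coordinates,
# gap characters before and after so the row has the full alignment length.
sub pad_to_alignment {
    my ($subseq, $start, $aln_length) = @_;
    my $end = $start + length($subseq) - 1;
    die "subsequence runs past the alignment\n" if $end > $aln_length;
    return ('-' x ($start - 1)) . $subseq . ('-' x ($aln_length - $end));
}

# Compare two equal-length rows column by column: '*' where both rows carry
# the same residue, ' ' otherwise (gap columns never count as a match).
sub match_columns {
    my ($row1, $row2) = @_;
    my $line = '';
    for my $i (0 .. length($row1) - 1) {
        my ($c1, $c2) = (substr($row1, $i, 1), substr($row2, $i, 1));
        $line .= ($c1 eq $c2 && $c1 ne '-') ? '*' : ' ';
    }
    return $line;
}

my $gene   = 'ATGCGTACGTTAGCCGATCG';                 # 20 nt, positions 1-20
my $primer = pad_to_alignment('GTTAGC', 9, length $gene);
print "$gene\n$primer\n", match_columns($gene, $primer), "\n";
```

The padded string, not the bare primer, is what would go into the -seq argument of Bio::LocatableSeq before the sequence is added to the alignment.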


Yours,
	-Heikki


On Tuesday 19 December 2006 23:46, Kevin Brown wrote:
> I'm working on a script that plays around with alignments of sequences
> and one of the things I noticed is that the code for the match method
> does not seem to actually use the start/end information when creating
> the match between objects in the alignment.  Maybe I'm misunderstanding
> what the alignment is supposed to hold in terms of sequence.  The
> alignment objects I build up are based on the sequence of a gene and the
> sequences of the primers that amplify that gene.
>
> $alignments{$gene->id()}->add_seq(
> 				new Bio::LocatableSeq(
> 				-seq   => $seq[0]->seq(),
> 				-id    => $seq[0]->id(),
> 				-start => $start,
> 				-end => $start + $seq[0]->length() - 1,
> 				-strand => 1
> 			 )
> );

If your sequence does not contain gaps and the numbering starts from one, you 
can let the object handle start/stop:

use Bio::LocatableSeq;

# start and end are left for the object to work out
my $seq = Bio::LocatableSeq->new(
      -seq    => 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA',
      -id     => 'A00001',
      -strand => 1,
);

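The start/end bookkeeping mentioned above amounts to counting residues. The plain-Perl sketch below shows that arithmetic and reproduces the 21 - 26 range from the diagram above. ungapped_end is a hypothetical helper, and treating both '-' and '.' as gap characters is an assumption.

```perl
use strict;
use warnings;

# With 1-based inclusive coordinates, the end of an aligned subsequence is
# its start plus the number of residues (gap characters excluded) minus one.
sub ungapped_end {
    my ($aligned, $start) = @_;
    $start = 1 unless defined $start;
    (my $residues = $aligned) =~ s/[-.]//g;   # drop '-' and '.' gaps
    return $start + length($residues) - 1;
}

print ungapped_end('A' x 30), "\n";                       # 30
print ungapped_end(('-' x 20) . 'GTTAGC', 21), "\n";      # 26
```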

> $alignments{$gene->id()}->add_seq(
> 				new Bio::LocatableSeq(
> 				-seq   => $seq[1]->seq(),
> 				-id    => $seq[1]->id(),
> 				-start => $stop,
> 				-end => $stop + $seq[1]->length() - 1,
> 				-strand => -1
> 				)
> );
>
> So, you can see I input a start and stop point for the primer, but when
> I use the match function all it does is match the first character of the
> gene sequence to the first char of the primer sequences, then the second
> gene char to the second in each primer, etc...  This doesn't seem to fit
> with the documentation and seems odd that there would be holders for the
> start/stop points and not use them when doing things like matching of
> sequences in an alignment.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From ferraria at gmail.com  Wed Dec 20 06:04:16 2006
From: ferraria at gmail.com (Anthony Ferrari)
Date: Wed, 20 Dec 2006 12:04:16 +0100
Subject: [Bioperl-l] Problem with : EUtilities - Proxy
In-Reply-To: <6365ACFD-7F5A-4EF1-97EA-BB53A58B9B4D@uiuc.edu>
References: <b2ec54b90612190840r24fe1aa5ncb9c9def040aed49@mail.gmail.com>
	<6365ACFD-7F5A-4EF1-97EA-BB53A58B9B4D@uiuc.edu>
Message-ID: <b2ec54b90612200304r56e1ba5o87963494875c1c43@mail.gmail.com>

On 19/12/06, Chris Fields <cjfields at uiuc.edu> wrote:
>
>
> On Dec 19, 2006, at 10:40 AM, Anthony Ferrari wrote:
>
> > Hi all,
> >
> > I've just installed BioPerl 1.5.2 (devel) on a linux mandrake
> > machine with
> > the cpan shell.
> > I want to use the Bio::DB::EUtilities to retrieve data (id's) from
> > NCBI
> > 'gene' database (first step of my pipeline).
> >
> > But the installation of this package doesn't seem to be correct :
> > The simple example given on the documentation doesn't work. (this
> > one :
> > http://doc.bioperl.org/bioperl-live/Bio/DB/EUtilities.html#SYNOPSIS)
> >
> > Here is the error message I got :
> > "Can't use an undefined value as an ARRAY reference at
> > /usr/lib/perl5/site_perl/5.8.7/LWP/UserAgent.pm line 779."
> >
> > In the UserAgent package, line 779 is in the private "_need_proxy"
> > subroutine and corresponds to the code :    ...if (@{ $self->
> > {'no_proxy'} })
> > ...
> >
> > If I comment this line in the UserAgent package and the
> > corresponding "}",
> > the example works. But obviously, I prefer to solve the problem in
> > a regular
> > way :)
> >
> > Indeed, my computer accesses the internet via a http proxy and I
> > didn't tell
> > this to BioPerl at any moment.
> > As I read on the BioPerl Wiki site, I tried to configure an
> > $http_proxy
> > environment variable but it still doesn't work.
> >
> > One last maybe important information is that I saw during the
> > installation
> > that the tests 't/EUtilities' were skipped because of an unrevealed
> > reason.
> >
> >
> > So finally I got two questions :
> > 1. Is there somebody who can figure out what is my problem ?
> > 2. At the moment, is the Bio::DB::EUtilities package really
> > efficient or
> > using directly the NCBI eutilities with the LWP::Simple package
> > could be an
> > good alternative ?
> >
> > Many thanks in advance,
> > Best Regards,
> > Anthony Ferrari
>
> First things first: at the moment the BioPerl EUtilities interface is
> very experimental (as specifically outlined in the POD), so I can't
> really recommend it for production use until the API is cleaned up.
> However, I do appreciate any feedback or comments re:EUtilities (the
> reason it's out there in the 1.5.2 release).
>
> You might check out this bug report, which relates directly to your
> issue:
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2109
>
> After I worked out the proxy issue Torsten got it working.  Let me
> know if this doesn't help or fix the problem.
>
> chris
>


I carefully read this bug report, but it doesn't help because that fix is
already in the current GenericWebDBI.pm.
So my problem does not come from a deep recursion loop.

As Torsten did, I ran the command "BIOPERLDEBUG=1 perl -I. -w t/EUtilities.t"
to see what's really happening.
And actually, all tests are skipped because of the same error message
-> "Can't use an undefined value as an ARRAY reference at
/usr/lib/perl5/site_perl/5.8.7/LWP/UserAgent.pm line 779."

***
I tried the same command with the modified LWP::UserAgent package (which
means I comment the line 779 and the corresponding '}') and all 453 tests
passed.
But not always. I made the tests several times and  it often failed. And
always on a test called "eXXX->cookie->cookie() query key" (ending with
query key). In those cases, I got back a html message indicating that the
error was thrown by the internal sever of NCBI. So I guess that sometimes it
is just NCBI server fault (internal problem), and BioPerl is not implied..
But once more, I comment a line from a basic package so it is a bit
hazardous.
***

tony - a little bit lost.


From smane at vbi.vt.edu  Tue Dec 19 14:46:56 2006
From: smane at vbi.vt.edu (Shrinivasrao P. Mane)
Date: Tue, 19 Dec 2006 14:46:56 -0500
Subject: [Bioperl-l] Using Muscle parameter within bioperl
Message-ID: <A8ACF950-F40E-4E8C-927E-23D2391E5074@vbi.vt.edu>

Hi,
I need to run muscle using bioperl. This is how I do it in command line.

muscle -in inv.fasta -out inv.aln -log inv.log -verbose -quiet

I used the following in perl script

my $muscle = new Bio::Tools::Run::Alignment::Muscle(-format =>  
'clustalw',  -verbose=>'', -quiet=>'', -log='inv.log');

The program runs and produces the result file but it doesn't create a  
log file nor does it stop sending output to STDOUT (-quiet).
Could anybody help me with this?
Thanks
Mane


From cjfields at uiuc.edu  Wed Dec 20 09:09:56 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 20 Dec 2006 08:09:56 -0600
Subject: [Bioperl-l] Problem with : EUtilities - Proxy
In-Reply-To: <b2ec54b90612200304r56e1ba5o87963494875c1c43@mail.gmail.com>
References: <b2ec54b90612190840r24fe1aa5ncb9c9def040aed49@mail.gmail.com>
	<6365ACFD-7F5A-4EF1-97EA-BB53A58B9B4D@uiuc.edu>
	<b2ec54b90612200304r56e1ba5o87963494875c1c43@mail.gmail.com>
Message-ID: <13761416-E03F-46E7-BB43-E5FDA7F9C281@uiuc.edu>


On Dec 20, 2006, at 5:04 AM, Anthony Ferrari wrote:

> You might check out this bug report, which relates directly to your
> issue:
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2109
>
> After I worked out the proxy issue Torsten got it working.  Let me
> know if this doesn't help or fix the problem.
>
> chris
>
>
> I carefully read this bug but that doesn't help because this has  
> already been modified in the now given GenericWebDBI.pm
> So my problem does not come from a deep recursion loop.
>
> As Torsten did, I tried the command  " BIOPERLDEBUG=1 perl -I. -w t/ 
> EUtilities.t " to see what's really happening.
> And actually, all tests are skipped because of the same message error
> -> "Can't use an undefined value as an ARRAY reference at /usr/lib/ 
> perl5/site_perl/5.8.7/LWP/UserAgent.pm line 779."
>
> ***
> I tried the same command with the modified LWP::UserAgent package  
> (which means I comment the line 779 and the corresponding '}') and  
> all 453 tests passed.
> But not always. I made the tests several times and  it often  
> failed. And always on a test called "eXXX->cookie->cookie() query  
> key" (ending with query key). In those cases, I got back a html  
> message indicating that the error was thrown by the internal sever  
> of NCBI. So I guess that sometimes it is just NCBI server fault  
> (internal problem), and BioPerl is not implied..
> But once more, I comment a line from a basic package so it is a bit  
> hazardous.
> ***
>
> tony - a little bit lost.

I'm cc'ing Torsten as he has a bit more experience with proxies.

EUtilities is set up to check for an env. proxy and also to accept a proxy
set with $agent->proxy() (see the GenericWebDBI POD).  It would be easy
to say this was a bug in LWP, but I think the problem is that
something is undefined (e.g. an env. variable or a username/password).

From the bug report, Torsten set his proxy variables using the
following:

--------------------------------------
"Note: I am behind an _authenticating_ proxy.
My $http_proxy and $HTTP_PROXY are both set to
http://USER:PASS at proxy.monash.edu.au:80/"
--------------------------------------

Note the lowercase for $http_proxy, which can make a difference.
After the recursion fix, I'm assuming he made no changes to the env.
settings, and according to the bug everything was fine (is that
correct, Torsten?).

Also LWP::UserAgent has this:

--------------------------------------
"Load proxy settings from *_proxy environment variables. You might  
specify proxies like this (sh-syntax):

       gopher_proxy=http://proxy.my.place/
       wais_proxy=http://proxy.my.place/
       no_proxy="localhost,my.domain"
       export gopher_proxy wais_proxy no_proxy

     csh or tcsh users should use the setenv command to define these  
environment variables.

On systems with case insensitive environment variables there exists a  
name clash between the CGI environment variables and the HTTP_PROXY  
environment variable normally picked up by env_proxy(). Because of  
this HTTP_PROXY is not honored for CGI scripts. The CGI_HTTP_PROXY  
environment variable can be used instead."
--------------------------------------

chris


From bix at sendu.me.uk  Wed Dec 20 09:08:16 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 20 Dec 2006 14:08:16 +0000
Subject: [Bioperl-l] Using Muscle parameter within bioperl
In-Reply-To: <A8ACF950-F40E-4E8C-927E-23D2391E5074@vbi.vt.edu>
References: <A8ACF950-F40E-4E8C-927E-23D2391E5074@vbi.vt.edu>
Message-ID: <458943D0.10400@sendu.me.uk>

Shrinivasrao P. Mane wrote:
> Hi,
> I need to run muscle using bioperl. This is how I do it in command line.
> 
> muscle -in inv.fasta -out inv.aln -log inv.log -verbose -quiet
> 
> I used the following in perl script
> 
> my $muscle = new Bio::Tools::Run::Alignment::Muscle(-format =>  
> 'clustalw',  -verbose=>'', -quiet=>'', -log='inv.log');
> 
> The program runs and produces the result file but it doesn't create a  
> log file nor does it stop sending output to STDOUT (-quiet).
> Could anybody help me with this?

The Muscle arguments don't take dashed args, and to make a switch active
you need to set it to some true value. So (-verbose => 1, quiet => 1, log
=> 'inv.log'). Verbose may not do what you want, since it is both a
Bioperl option and a Muscle option; if you want the latter, try using
verbose => 1 (without the dash).

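The helper below is hypothetical and is not bioperl-run's actual code. It shows why quiet => '' leaves the switch off: a switch is only passed on when its value is true in Perl, and the empty string is false. The lists of switches and value-taking parameters are assumptions covering just the options in this thread.

```perl
use strict;
use warnings;

# Turn option pairs into command-line arguments. On/off switches are
# emitted only when their value is true; value-taking parameters are
# always emitted together with their value.
sub build_muscle_args {
    my (%opt) = @_;
    my %is_switch = map { $_ => 1 } qw(quiet verbose);
    my %is_param  = map { $_ => 1 } qw(in out log);
    my @args;
    for my $key (sort keys %opt) {
        if ($is_switch{$key}) {
            push @args, "-$key" if $opt{$key};
        }
        elsif ($is_param{$key}) {
            push @args, "-$key", $opt{$key};
        }
    }
    return @args;
}

print join(' ', 'muscle', build_muscle_args(quiet => 1,  log => 'inv.log')), "\n";
print join(' ', 'muscle', build_muscle_args(quiet => '', log => 'inv.log')), "\n";
```

With the real wrapper, the corresponding constructor call following the advice above would be Bio::Tools::Run::Alignment::Muscle->new(quiet => 1, log => 'inv.log').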

From bix at sendu.me.uk  Wed Dec 20 09:51:33 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 20 Dec 2006 14:51:33 +0000
Subject: [Bioperl-l] suggestions for suitable 'taxon' object
In-Reply-To: <4185E59B-C0DA-49B8-8D71-11183A091FBF@uiuc.edu>
References: <200612200423.kBK4NKDt009254@rm-rstar.sfu.ca>
	<4185E59B-C0DA-49B8-8D71-11183A091FBF@uiuc.edu>
Message-ID: <45894DF5.1060503@sendu.me.uk>

Chris Fields wrote:
> On Dec 19, 2006, at 10:23 PM, Rutger Vos wrote:
> 
>> Hi all,
>> 
>> I am looking for a bioperl object that can be abused to function as
>> a suitable 'taxon' object, where I mean 'taxon' as understood by
>> the NEXUS file format (i.e. not strictly an entity from a taxonomy,
>> but more loosely an OTU).
>> 
>> The object would primarily function as a way to relate nodes in 
>> trees to sequences in an alignment (a foreign key that both nodes
>> and sequences refer to), and secondarily as the keeper of the
>> canonical name of the OTU, such that a sequence named
>> 'Homo_sapiens|EF177447.1/12-56' and a node named 'Homo sapiens
>> (constrained monophyly)' can still be understood to refer to the 
>> same thing - the OTU 'Homo sapiens sapiens' (for example).

I haven't had time to give your suggestions consideration, but I can say 
that I'm having to do the same thing for a bioperl-run module and my 
work-around is simply to set a custom name on my Bio::Taxon objects. To 
explain, I have the benefit that my tree is made up of Bio::Taxon 
objects, so I call $taxon->name('seq_id', $seq->id). Then when I want to 
know which of my sequences corresponds to a particular taxon, I work out 
which of them has the id given by shift @{$taxon->name('seq_id')}.

Hardly ideal, but it works for now.


>> I was thinking that a (possibly expanded) Bio::Species might work
>>  if there was some sensible way of appending references to node and
>> sequence objects to it (or otherwise associate them with each
>> other), but I am writing *to solicit any and all suggestions*. I am
>> looking for something similar to Bio::Phylo::Taxa::Taxon.
>
> Sendu would be the best one to speak about Bio::Taxon and 
> Bio::Species and may have some ideas on the above.  The current plan
> was to deprecate Bio::Species, but who knows?

Given that we do plan to deprecate Bio::Species, I'd resist the 
temptation to expand on it. Use Bio::Taxon as a base if it has stuff you 
need, or base straight from Bio::Tree::Node if not.


From ferraria at gmail.com  Wed Dec 20 10:40:34 2006
From: ferraria at gmail.com (Anthony Ferrari)
Date: Wed, 20 Dec 2006 16:40:34 +0100
Subject: [Bioperl-l] Problem with : EUtilities - Proxy
In-Reply-To: <13761416-E03F-46E7-BB43-E5FDA7F9C281@uiuc.edu>
References: <b2ec54b90612190840r24fe1aa5ncb9c9def040aed49@mail.gmail.com>
	<6365ACFD-7F5A-4EF1-97EA-BB53A58B9B4D@uiuc.edu>
	<b2ec54b90612200304r56e1ba5o87963494875c1c43@mail.gmail.com>
	<13761416-E03F-46E7-BB43-E5FDA7F9C281@uiuc.edu>
Message-ID: <b2ec54b90612200740x49b3d9d8qa8c01569b63cbdc4@mail.gmail.com>

Defining a "no_proxy" environment variable in my '.bashrc' file solved my
problem. I set it to "localhost".

It indeed corresponds to the line [ if (@{ $self->{'no_proxy'} }) ... ]
(I guess!)


I really don't know why we are compelled to do this, but let's say that's
the way it is.
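The same workaround can also be applied from inside a script rather than in ~/.bashrc, where the shell form is export no_proxy=localhost. The minimal sketch below fills in no_proxy only when it is unset or empty; ensure_no_proxy is a hypothetical helper. It assumes the call happens before LWP::UserAgent's env_proxy() reads the *_proxy variables, which presumably means before the EUtilities agent is created.

```perl
use strict;
use warnings;

# Fill in no_proxy only if it is missing or empty; an existing value
# (e.g. "localhost,my.domain") is left untouched. Returns the value in use.
sub ensure_no_proxy {
    my ($env, $default) = @_;
    unless (defined $env->{no_proxy} && length $env->{no_proxy}) {
        $env->{no_proxy} = $default;
    }
    return $env->{no_proxy};
}

# Call this early, before any LWP::UserAgent picks up proxy settings from %ENV.
ensure_no_proxy(\%ENV, 'localhost');
```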

It works now !

Thanks a lot.

Tony




From cjfields at uiuc.edu  Wed Dec 20 11:10:48 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 20 Dec 2006 10:10:48 -0600
Subject: [Bioperl-l] Problem with : EUtilities - Proxy
In-Reply-To: <b2ec54b90612200740x49b3d9d8qa8c01569b63cbdc4@mail.gmail.com>
Message-ID: <007901c72451$6ad540a0$15327e82@pyrimidine>

Just to clarify: does it work if you don't have any proxy env. settings?
 
chris


  _____  

From: Anthony Ferrari [mailto:ferraria at gmail.com] 
Sent: Wednesday, December 20, 2006 9:41 AM
To: Chris Fields
Cc: bioperl-l List; Torsten Seemann
Subject: Re: [Bioperl-l] Problem with : EUtilities - Proxy


Defining a "no_proxy" environment variable in my '.bashrc' file solved my
problem. I set it to "localhost".

It indeed corresponds to the line "if (@{ $self->{'no_proxy'} }) ..." (I guess!)


I really don't know why we are compelled to do this, but let's say that's
the way it is.

It works now !

Thanks a lot.

Tony


On 20/12/06, Chris Fields <cjfields at uiuc.edu> wrote: 


On Dec 20, 2006, at 5:04 AM, Anthony Ferrari wrote:

> You might check out this bug report, which relates directly to your
> issue:
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2109
>
> After I worked out the proxy issue Torsten got it working.  Let me
> know if this doesn't help or fix the problem.
>
> chris
> 
>
> I carefully read this bug but that doesn't help because this has
> already been modified in the now given GenericWebDBI.pm
> So my problem does not come from a deep recursion loop.
> 
> As Torsten did, I tried the command "BIOPERLDEBUG=1 perl -I. -w t/EUtilities.t"
> to see what's really happening.
> And actually, all tests are skipped because of the same error message
> -> "Can't use an undefined value as an ARRAY reference at
> /usr/lib/perl5/site_perl/5.8.7/LWP/UserAgent.pm line 779."
>
> ***
> I tried the same command with a modified LWP::UserAgent package
> (I commented out line 779 and the corresponding '}') and
> all 453 tests passed.
> But not always. I ran the tests several times and it often
> failed, always on a test called "eXXX->cookie->cookie() query
> key" (ending with query key). In those cases, I got back an HTML
> message indicating that the error was thrown by the internal server
> of NCBI. So I guess that sometimes it is just an NCBI server fault
> (internal problem), and BioPerl is not involved.
> But once more, I commented out a line in a basic package, so it is a bit
> hazardous.
> ***
>
> tony - a little bit lost.

I'm cc'ing Torsten as he has a bit more experience with proxies. 

EUtilities is set up to check for an env. proxy and also take a set
proxy with $agent->proxy() (see GenericWebDBI POD).  It would be easy
to say this was a bug in LWP, but I think the problem is that
something is undefined (i.e. an env. variable), or username/password.

From the bug report, Torsten set his proxy variables using the
following:

--------------------------------------
"Note: I am behind an _authenticating_ proxy. 
My $http_proxy and $HTTP_PROXY are both set to
http://USER:PASS at proxy.monash.edu.au:80/"
--------------------------------------

Note the lowercase for $http_proxy, which can make a difference. 
After the recursion fix, I'm assuming he made no changes to the env.
settings, and according to the bug everything was fine (is that
correct Torsten?).

Also LWP::UserAgent has this:

-------------------------------------- 
"Load proxy settings from *_proxy environment variables. You might
specify proxies like this (sh-syntax):

       gopher_proxy=http://proxy.my.place/
       wais_proxy=http://proxy.my.place/
       no_proxy="localhost,my.domain"
       export gopher_proxy wais_proxy no_proxy

     csh or tcsh users should use the setenv command to define these 
environment variables.

On systems with case insensitive environment variables there exists a
name clash between the CGI environment variables and the HTTP_PROXY
environment variable normally picked up by env_proxy(). Because of 
this HTTP_PROXY is not honored for CGI scripts. The CGI_HTTP_PROXY
environment variable can be used instead."
--------------------------------------
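As a rough illustration of the convention described in that excerpt (one <scheme>_proxy variable per URL scheme, plus a comma-separated no_proxy list), here is a core-Perl sketch that turns an environment-style hash into a scheme-to-proxy map. It is an assumption-based model for inspecting your own settings, not LWP::UserAgent's actual env_proxy() code, and the sub name proxy_settings_from_env is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of LWP): read proxy settings from an
# environment-style hash following the *_proxy convention quoted above.
# Returns a hashref of scheme => proxy URL and an arrayref of no_proxy domains.
sub proxy_settings_from_env {
    my (%env) = @_;
    my (%proxy, @no_proxy);

    # Sorted order means a lowercase key (e.g. http_proxy) is visited after
    # its uppercase twin (HTTP_PROXY) and therefore wins in this sketch.
    for my $key (sort keys %env) {
        next unless $key =~ /^([a-z]+)_proxy$/i;
        my ($scheme, $value) = (lc $1, $env{$key});
        next unless defined $value && length $value;

        if ($scheme eq 'no') {
            # no_proxy: comma-separated domains that should be reached directly
            for my $domain (split /,/, $value) {
                $domain =~ s/^\s+|\s+$//g;
                push @no_proxy, $domain if length $domain;
            }
        }
        else {
            $proxy{$scheme} = $value;
        }
    }
    return (\%proxy, \@no_proxy);
}

# Example using the values from the LWP documentation excerpt
my ($proxy, $no_proxy) = proxy_settings_from_env(
    gopher_proxy => 'http://proxy.my.place/',
    wais_proxy   => 'http://proxy.my.place/',
    no_proxy     => 'localhost,my.domain',
);
print "$_ => $proxy->{$_}\n" for sort keys %$proxy;
print "no_proxy: @$no_proxy\n";
```

Calling proxy_settings_from_env(%ENV) instead would show what the current login shell actually exports.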

chris


From ferraria at gmail.com  Wed Dec 20 11:59:49 2006
From: ferraria at gmail.com (Anthony Ferrari)
Date: Wed, 20 Dec 2006 17:59:49 +0100
Subject: [Bioperl-l] Problem with : EUtilities - Proxy
In-Reply-To: <007901c72451$6ad540a0$15327e82@pyrimidine>
References: <b2ec54b90612200740x49b3d9d8qa8c01569b63cbdc4@mail.gmail.com>
	<007901c72451$6ad540a0$15327e82@pyrimidine>
Message-ID: <b2ec54b90612200859w225df7qc35f1060f04eb452@mail.gmail.com>

First, I got a $http_proxy env. variable automatically defined by the
BioPerl installation (I don't define and export it in my .bash_profile).
So when I'm logging in, $http_proxy=http://ip_adress:port/

Next step :  two solutions :
1) defining a $no_proxy env. variable in my .bash_profile.
It can be set to any value (e.g. 'whatever').

2) If I do not define '$no_proxy'; to make it work, I must call the
no_proxy() method on each Bio::DB::EUtilities object I create before I can
call the get_response() method on it.

(The bug is in the 'get_response' call)

And finally without 1) or 2) it doesn't work.

Tony

On 20/12/06, Chris Fields <cjfields at uiuc.edu> wrote:
>
>  Just to clarify: does it work if you don't have any proxy env. settings?
>
> One thing I didn't point out previously is that Bio::DB::GenericWebDBI
> inherits LWP::UserAgent.  You should be able to use $eutil->no_proxy()
> instead of setting it in your env.
> chris
>


From cjfields at uiuc.edu  Wed Dec 20 13:28:09 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 20 Dec 2006 12:28:09 -0600
Subject: [Bioperl-l] Problem with : EUtilities - Proxy
In-Reply-To: <b2ec54b90612200859w225df7qc35f1060f04eb452@mail.gmail.com>
Message-ID: <000301c72464$9a12a070$15327e82@pyrimidine>


> First, I got a $http_proxy env. variable automatically 
> defined by the BioPerl installation (I don't define and 
> export it in my .bash_profile).
> So when I'm logging in, $http_proxy=http://ip_adress:port/

BioPerl can't permanently set any env. variables out of the box since that
would mean modifying your local .bash_profile or the system profile.  If
you're a user on a system where you're not the sysadmin, then it's more
likely the sysadmin has set up user accounts with an already-defined
$http_proxy variable in the system .bash_profile (which is passed on to all
users).  

The problem I can see (going by what you have above) is there is no
username/password defined, only the address (IP:Port).  I am assuming LWP is
expecting some form of authentication when a proxy is env. defined w/o
username/password included.  If so, you'll have to supply those yourself,
either by redefining $http_proxy to include it in your local .bash_profile,

export http_proxy='http://USER:PASS@proxy.monash.edu.au:80/'

by using $agent->proxy() for including all proxy information, or by using
$agent->authentication() so that a proxy can authorize any outgoing/incoming
requests.  The first may be preferable if you are able to do so since you
wouldn't have to authenticate every agent.

Note that this would also explain why you had an LWP problem with an
undefined array ref: the LWP agent is likely expecting some form of
authentication, probably in the form [username, password], if a proxy env.
variable is found.

> Next step :  two solutions :
> 1) defining an $no_proxy env.variable in my .bash_profile.
> It can be set to 'whatever'.
> 
> 2) If I do not define '$no_proxy'; to make it work, I must call the
> no_proxy() method on each Bio::DB::EUtilities object I create 
> before I can call the get_response() method on it.
> 
> (The bug is in the 'get_response' call)

If you mean when the request is calling proxy_authorization_basic(), that's
not a bug.  If we didn't authorize then it likely wouldn't work for properly
set up proxies (Torsten's worked).  Note that it's supposed to be passing a
username/password from $self->authentication().  

The fact that you can set $no_proxy to anything suggests there is no proxy
in place.  
 
> And finally without 1) or 2) it doesn't work.
> 
> Tony

We can't guarantee that defining no_proxy will always work on your system,
either.  It's possible a proxy was added systemwide but a firewall hasn't
been put in place yet; once it goes up and all requests need to be
authorized, then you'll run into problems again.  Conversely, maybe this was
defined at some point systemwide in the .bash_profile but wasn't removed.
The only one who would know is the sysadmin.

If you aren't the sysadmin, you can contact them to find out about how to
properly set up your proxy, or whether it is even necessary (maybe they
neglected to remove the proxy definition from the system .bash_profile).
Who knows?  

chris


From bix at sendu.me.uk  Wed Dec 20 16:03:03 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 20 Dec 2006 21:03:03 +0000
Subject: [Bioperl-l] Problem with : EUtilities - Proxy
In-Reply-To: <000301c72464$9a12a070$15327e82@pyrimidine>
References: <000301c72464$9a12a070$15327e82@pyrimidine>
Message-ID: <4589A507.60106@sendu.me.uk>

Chris Fields wrote:
>> First, I got a $http_proxy env. variable automatically 
>> defined by the BioPerl installation (I don't define and 
>> export it in my .bash_profile).
>> So when I'm logging in,             $http_proxy=http://ip_adress:port/
> 
> BioPerl can't permanently set any env. variables out of the box since

True, and it doesn't try to set one temporarily either.

To clarify some of the other points Chris made, the proxy variable
certainly doesn't need username and password to be defined (from LWP's
point of view), since not all proxies authenticate. Of course, requests
won't work if authentication is actually required and these aren't set.

There's no reason that no_proxy should have to be set. It is used to say 
what domains shouldn't be proxied. Either this is a real LWP bug, or 
somehow EUtilities or one of its bases is doing something wrong. It 
should be investigated...

It would be very informative if Anthony could log in when he hasn't done 
anything to his environment variables (and so where the original problem 
manifests) and give us the results of:

perl -e 'while (($key, $val) = each %ENV) { print "$key => $val\n" }'
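Since a full dump may include an authenticating proxy URL with the username and password embedded, a variant that prints only the proxy-related variables and masks any embedded password may be safer to post to the list. A minimal core-Perl sketch; mask_proxy_password is a hypothetical helper name, and the masking assumes the usual scheme://user:pass@host form:

```perl
use strict;
use warnings;

# Hypothetical helper: replace the password in scheme://user:pass@host/...
# with asterisks; URLs without embedded credentials are returned unchanged.
sub mask_proxy_password {
    my ($url) = @_;
    (my $masked = $url) =~ s{^([a-z][a-z0-9+.-]*://[^:/\@]+):[^\@/]*\@}{$1:****\@}i;
    return $masked;
}

# Print only the *_proxy variables (any case), sorted, with passwords masked.
for my $key (sort grep { /_proxy$/i } keys %ENV) {
    print "$key => ", mask_proxy_password($ENV{$key}), "\n";
}
```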


From avilella at gmail.com  Wed Dec 20 09:07:17 2006
From: avilella at gmail.com (Albert Vilella)
Date: Wed, 20 Dec 2006 14:07:17 +0000
Subject: [Bioperl-l] Using Muscle parameter within bioperl
In-Reply-To: <A8ACF950-F40E-4E8C-927E-23D2391E5074@vbi.vt.edu>
References: <A8ACF950-F40E-4E8C-927E-23D2391E5074@vbi.vt.edu>
Message-ID: <358f4d650612200607m4324b8f1r91d2d917cd4951bd@mail.gmail.com>

Try something like:

my @params =('verbose'=>0, 'quiet'=>1, 'log'=>'/tmp/inv.log');
my $factory = Bio::Tools::Run::Alignment::Muscle->new(@params);

It works for me with muscle 3.6. The log only gives me a start,
command string and end. I don't know if that is what muscle is supposed to
spit out.

    Albert.

On 12/19/06, Shrinivasrao P. Mane <smane at vbi.vt.edu> wrote:
> Hi,
> I need to run muscle using bioperl. This is how I do it in command line.
>
> muscle -in inv.fasta -out inv.aln -log inv.log -verbose -quiet
>
> I used the following in perl script
>
> my $muscle = new Bio::Tools::Run::Alignment::Muscle(-format =>
> 'clustalw',  -verbose=>'', -quiet=>'', -log='inv.log');
>
> The program runs and produces the result file but it doesn't create a
> log file nor does it stop sending output to STDOUT (-quiet).
> Could anybody help me with this?
> Thanks
> Mane
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From cjfields at uiuc.edu  Wed Dec 20 17:46:35 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 20 Dec 2006 16:46:35 -0600
Subject: [Bioperl-l] Problem with : EUtilities - Proxy
In-Reply-To: <4589A507.60106@sendu.me.uk>
Message-ID: <000c01c72488$b6a690b0$15327e82@pyrimidine>


> Chris Fields wrote:
> >> First, I got a $http_proxy env. variable automatically
> >> defined by the BioPerl installation (I don't define and
> >> export it in my .bash_profile).
> >> So when I'm logging in, $http_proxy=http://ip_adress:port/
> > 
> > BioPerl can't permanently set any env. variables out of the
> > box since
> 
> True, and it doesn't try to set one temporarily either.
> 
> To clarify some of the other points Chris made, the proxy 
> variable certainly doesn't need username and password to be 
> defined (from LWPs point of view), since not all proxies 
> authenticate. Of course accesses won't work if authentication 
> is actually required and these aren't set.
>
> There's no reason that no_proxy should have to be set. It is 
> used to say what domains shouldn't be proxied. Either this is 
> a real LWP bug, or somehow EUtilities or one of its bases is 
> doing something wrong. It should be investigated...

Actually, after some investigation I reproduced the error and committed a fix.


If I set (on WinXP) HTTP_PROXY to a dummy variable I get the same error:

Can't use an undefined value as an ARRAY reference at
C:/Perl/lib/LWP/UserAgent.pm line 787.

It's EUtilities-specific as other WebAgents that have proxy settings do not
have the same problem, though I haven't checked any WebAgent-based classes.
I think this may also partly be an LWP bug as setting env_proxy to
TRUE/FALSE doesn't seem to have an effect, but instantiating with it
(env_proxy => 1) in the constructor fixes the problem.  Anthony, I have
committed a fix to CVS to GenericWebDBI and EUtilities.  Could you try it
out?

-chris


From cjfields at uiuc.edu  Wed Dec 20 18:19:59 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 20 Dec 2006 17:19:59 -0600
Subject: [Bioperl-l] Problem with : EUtilities - Proxy
In-Reply-To: <000301c72464$9a12a070$15327e82@pyrimidine>
Message-ID: <000001c7248d$5e7df450$15327e82@pyrimidine>

> > First, I got a $http_proxy env. variable automatically
> > defined by the BioPerl installation (I don't define and
> > export it in my .bash_profile).
> > So when I'm logging in, $http_proxy=http://ip_adress:port/

Anthony,

Sorry about the prior long-winded response.  I managed to reproduce the
error about five minutes after I responded and traced the problem
back to GenericWebDBI.  The issue seems to be with the LWP::UserAgent
env_proxy method not setting correctly post-instantiation; setting to 0 or 1
doesn't seem to do anything.  If I add it to the list of args for chained
instantiation in the constructor:

    my $self = $class->SUPER::new(@args, env_proxy => 1);

it suddenly works like a charm.  Hard to know why it's being fussy...

I'm going to try reproducing this on a few platforms and check to see if it
has been reported as an LWP bug.  I have also committed a fix to CVS if you
want to test it out.

Chris


From jnewcomer at jhu.edu  Wed Dec 20 20:56:10 2006
From: jnewcomer at jhu.edu (Joe Newcomer)
Date: Wed, 20 Dec 2006 20:56:10 -0500
Subject: [Bioperl-l]  a stupid question
Message-ID: <002101c724a3$2ff80100$bd59dc80@aap.jhu.edu>

Hello Paul Leo,
I am with Johns Hopkins University Advanced Academic Programs.  I am trying
to contact a student named Paul Leo who has registered for Protein
Bioinformatics.  If this is you please email me.  I would like to send you
information about the spring course.

Respectfully, 
Joe Newcomer  (410) 516-5047
Online Education


From anhthu.tieu at gsf.de  Thu Dec 21 05:10:47 2006
From: anhthu.tieu at gsf.de (Anh-Thu Tieu)
Date: Thu, 21 Dec 2006 11:10:47 +0100
Subject: [Bioperl-l] imagemaps with heterogeneous_segments
Message-ID: <458A5DA7.1010802@gsf.de>

Hi,

I use bioperl 1.5.2 and have been wondering whether it is possible to
apply the image_and_map function with the glyph option
"heterogeneous_segments". Up to now I can successfully create an
underlying imagemap for the entire track. However, what I want is to
create an imagemap for each single segment on my track/glyph. Does
anyone know how to realise this? Any help is appreciated.

Thanks a lot.

Anh Thu


From anhthu.tieu at gsf.de  Thu Dec 21 05:12:36 2006
From: anhthu.tieu at gsf.de (Anh-Thu Tieu)
Date: Thu, 21 Dec 2006 11:12:36 +0100
Subject: [Bioperl-l] imagemaps with heterogeneous_segments
Message-ID: <458A5E14.8060409@gsf.de>

Hi,

I use bioperl 1.5.2 and have been wondering whether it is possible to
apply the image_and_map function with the glyph option
"heterogeneous_segments". Up to now I can successfully create an
underlying imagemap for the entire track. However, what I want is to
create an imagemap for each single segment on my track/glyph. Does
anyone know how to realise this? Any help is appreciated.

Thanks a lot.

Anh Thu


From somil.sharma1 at gmail.com  Thu Dec 21 01:22:24 2006
From: somil.sharma1 at gmail.com (Somil Sharma)
Date: Thu, 21 Dec 2006 14:22:24 +0800
Subject: [Bioperl-l] problem
Message-ID: <4e6b524e0612202222t569cba11h3c10c9c11e64185f@mail.gmail.com>

hello

I ran this program:

#!/use/bin/perl

use Bio::DB::GenBank;

$gb = new Bio::DB::GenBank;
$seq1 = $gb->get_Seq_by_id('MUSIGHBA1');
print $seq1;

and got this error on the command line:

---------- EXCEPTION  -------------
MSG: WebDBSeqI Request Error:
500 Can't connect to eutils.ncbi.nlm.nih.gov:80 (connect: Unknown error)
Content-Type: text/plain
Client-Date: Thu, 21 Dec 2006 06:28:33 GMT
Client-Warning: Internal response

500 Can't connect to eutils.ncbi.nlm.nih.gov:80 (connect: Unknown error)

STACK Bio::DB::WebDBSeqI::_request C:/Perl/lib/Bio/DB/WebDBSeqI.pm:685
STACK Bio::DB::WebDBSeqI::get_seq_stream C:/Perl/lib/Bio/DB/WebDBSeqI.pm:491
STACK Bio::DB::WebDBSeqI::get_Stream_by_id C:/Perl/lib/Bio/DB/WebDBSeqI.pm:27
STACK Bio::DB::WebDBSeqI::get_Seq_by_id C:/Perl/lib/Bio/DB/WebDBSeqI.pm:145
STACK toplevel C:\Perl\a2.pl:5

plz see if u can help me out.

my ppm is also not able to install Bioperl so i did that also manually.

waiting for ur reply


From granjeau at tagc.univ-mrs.fr  Thu Dec 21 06:14:25 2006
From: granjeau at tagc.univ-mrs.fr (Samuel GRANJEAUD - IR/IFR137)
Date: Thu, 21 Dec 2006 12:14:25 +0100
Subject: [Bioperl-l] BioFetch: Adding databases
Message-ID: <458A6C91.7090000@tagc.univ-mrs.fr>

Hello!

I needed to query the Unisave database at EBI. To date, the only way
to access it is the dbfetch web service
(http://www.ebi.ac.uk/cgi-bin/dbfetch). This database is not yet defined
in the BioFetch package
(http://doc.bioperl.org/bioperl-live/Bio/DB/BioFetch.html). I wrote
these few lines to make it work, but I don't think this is good
programming practice. Maybe it makes sense to define a method to add
databases to FORMATMAP, in order to keep up with changes to the dbfetch service.

Cheers,
--Samuel

use Bio::DB::BioFetch;

# Register Unisave in BioFetch's format map before creating the object
$Bio::DB::BioFetch::FORMATMAP{unisave} = {
    default   => 'swiss',
    swissprot => 'swiss',
    fasta     => 'fasta',
    namespace => 'unisave',
};
my $bf  = Bio::DB::BioFetch->new(-db => 'unisave');
my $seq = $bf->get_Seq_by_id('LAM1_MOUSE');

print $seq->display_id(), "\n";
print $seq->seq(), "\n";
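One possible shape for the method suggested above is sketched below. add_biofetch_db is hypothetical and not part of Bio::DB::BioFetch; it assumes the FORMATMAP layout shown above (format names mapped to Bio::SeqIO formats, plus a namespace key) and works on any hash reference, so the demo uses a local hash instead of the real package variable:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::DB::BioFetch): add one dbfetch
# database to a FORMATMAP-style hash. In real use $map would presumably be
# \%Bio::DB::BioFetch::FORMATMAP; here a local hash keeps the demo standalone.
#   $db      - database name as dbfetch knows it, e.g. 'unisave'
#   %formats - dbfetch format name => Bio::SeqIO format, including 'default'
sub add_biofetch_db {
    my ($map, $db, %formats) = @_;
    die "database '$db' is already defined\n" if exists $map->{$db};
    die "a 'default' format is required for '$db'\n" unless exists $formats{default};

    # namespace defaults to the database name, as in the unisave entry above
    $map->{$db} = { namespace => $db, %formats };
    return $map->{$db};
}

my %formatmap;
add_biofetch_db(\%formatmap, 'unisave',
    default   => 'swiss',
    swissprot => 'swiss',
    fasta     => 'fasta',
);
my $entry = $formatmap{unisave};
print join(', ', map { "$_ => $entry->{$_}" } sort keys %$entry), "\n";
```

Refusing to overwrite an existing name is a design choice of this sketch, so a typo cannot silently replace a built-in database entry.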


From cain at cshl.edu  Thu Dec 21 08:56:21 2006
From: cain at cshl.edu (Scott Cain)
Date: Thu, 21 Dec 2006 08:56:21 -0500
Subject: [Bioperl-l] problem
In-Reply-To: <4e6b524e0612202222t569cba11h3c10c9c11e64185f@mail.gmail.com>
References: <4e6b524e0612202222t569cba11h3c10c9c11e64185f@mail.gmail.com>
Message-ID: <1166709381.3739.47.camel@localhost.localdomain>

Hello,

It looks to me like you have a networking problem that doesn't have
anything to do with BioPerl.  When I run your script, I get:

Bio::Seq::RichSeq=HASH(0x97013e0)

Fairly quickly, too.  Can you get to http://eutils.ncbi.nlm.nih.gov/ in
a browser without proxy settings?

As an aside, you probably don't really want the HASH stuff above, so I
modified your script to look like this, with warnings and strict to make
future debugging easier:

#!/usr/bin/perl -w
use strict;

use Bio::DB::GenBank;

my $gb = Bio::DB::GenBank->new;
my $seq1 = $gb->get_Seq_by_id('MUSIGHBA1');
print $seq1->seq, "\n";


Scott


On Thu, 2006-12-21 at 14:22 +0800, Somil Sharma wrote:
> hello
> 
> I ran this program:
> 
> #!/use/bin/perl
> 
> use Bio::DB::GenBank;
> 
> $gb = new Bio::DB::GenBank;
> $seq1 = $gb->get_Seq_by_id('MUSIGHBA1');
> print $seq1;
> 
> and got this error on the command line:
> 
> ---------- EXCEPTION  -------------
> MSG: WebDBSeqI Request Error:
> 500 Can't connect to eutils.ncbi.nlm.nih.gov:80 (connect: Unknown error)
> Content-Type: text/plain
> Client-Date: Thu, 21 Dec 2006 06:28:33 GMT
> Client-Warning: Internal response
> 
> 500 Can't connect to eutils.ncbi.nlm.nih.gov:80 (connect: Unknown error)
> 
> STACK Bio::DB::WebDBSeqI::_request C:/Perl/lib/Bio/DB/WebDBSeqI.pm:685
> STACK Bio::DB::WebDBSeqI::get_seq_stream C:/Perl/lib/Bio/DB/WebDBSeqI.pm:491
> STACK Bio::DB::WebDBSeqI::get_Stream_by_id C:/Perl/lib/Bio/DB/WebDBSeqI.pm:27
> STACK Bio::DB::WebDBSeqI::get_Seq_by_id C:/Perl/lib/Bio/DB/WebDBSeqI.pm:145
> STACK toplevel C:\Perl\a2.pl:5
> 
> plz see if u can help me out.
> 
> my ppm is also not able to install Bioperl so i did that also manually.
> 
> waiting for ur reply
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061221/f63031e2/attachment-0003.bin>

From cjfields at uiuc.edu  Thu Dec 21 09:28:07 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 21 Dec 2006 08:28:07 -0600
Subject: [Bioperl-l] BioFetch: Adding databases
In-Reply-To: <458A6C91.7090000@tagc.univ-mrs.fr>
References: <458A6C91.7090000@tagc.univ-mrs.fr>
Message-ID: <193C6D1C-6374-4A86-9FBD-7FA994D5FDDF@uiuc.edu>

I've added this to the BioFetch FORMATMAP as 'unisave' and committed  
to CVS.  Thanks!

chris

On Dec 21, 2006, at 5:14 AM, Samuel GRANJEAUD - IR/IFR137 wrote:

> Hello!
>
> I needed to query the Unisave database at EBI. To date, the only way
> to access it is the dbfetch web service
> (http://www.ebi.ac.uk/cgi-bin/dbfetch). This database is not yet defined
> in the BioFetch package
> (http://doc.bioperl.org/bioperl-live/Bio/DB/BioFetch.html). I wrote
> these few lines to make it work, but I don't think this is good
> programming practice. Maybe it makes sense to define a method to add
> databases to FORMATMAP, in order to keep up with changes to the dbfetch service.
>
> Cheers,
> --Samuel
>
> use Bio::DB::BioFetch;
> $Bio::DB::BioFetch::FORMATMAP{unisave} = {
> default   => 'swiss',
> swissprot => 'swiss',
> fasta     => 'fasta',
> namespace => 'unisave',
> };
> my $bf = new Bio::DB::BioFetch(-db=>'unisave');
> my $seq = $bf->get_Seq_by_id('LAM1_MOUSE');
>
> print $seq->display_id();
> print $seq->seq();
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From anhthu.tieu at gsf.de  Thu Dec 21 09:31:45 2006
From: anhthu.tieu at gsf.de (Anh-Thu Tieu)
Date: Thu, 21 Dec 2006 15:31:45 +0100
Subject: [Bioperl-l] multiple glyph elements in one track
Message-ID: <458A9AD1.50907@gsf.de>

Hello,

I use bioperl 1.5.2. Does anyone know how I could create two separate
glyph elements on the same track with the Bio::Graphics::Panel module?
My aim is to have two (e.g. two different) clickable imagemap elements
on the same track. Until now I can only create two glyph elements
(transcript2 or generic options) per track with a single imagemap
element (i.e. the same imagemap element is used for the entire track, as
the coordinates of the entire glyph (both elements) are returned to the
image_and_map function as one set of coordinates).

Thank you for your help.

Best regards,

Anh Thu


From cain at cshl.edu  Thu Dec 21 09:47:32 2006
From: cain at cshl.edu (Scott Cain)
Date: Thu, 21 Dec 2006 09:47:32 -0500
Subject: [Bioperl-l] multiple glyph elements in one track
In-Reply-To: <458A9AD1.50907@gsf.de>
References: <458A9AD1.50907@gsf.de>
Message-ID: <1166712453.3739.53.camel@localhost.localdomain>

Hello Anh Thu,

You can provide a callback for the glyph argument that returns different
glyphs depending on what you want to do (i.e., how you've coded your
callback).  This example is from the perldoc for Bio::Graphics::Panel:

        $panel->add_track(\@exons,
                          -glyph => sub { my $feature = shift;
                                          $feature->source_tag eq 'curated'
                                                    ? 'ellipse' : 'generic'; }
                         );
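To check which glyph such a callback would pick without rendering a panel, the callback can be called directly on stand-in objects. In this sketch MockFeature is a hypothetical class providing only the source_tag method the callback uses; real feature objects are assumed to supply the same method:

```perl
use strict;
use warnings;

# Hypothetical stand-in for a feature object; only source_tag() is needed.
package MockFeature;
sub new {
    my ($class, %args) = @_;
    my $self = { %args };
    return bless $self, $class;
}
sub source_tag { return $_[0]{source_tag} }

package main;

# The glyph callback from the POD example: 'curated' features are drawn
# as ellipses, everything else with the generic glyph.
my $glyph_for = sub {
    my $feature = shift;
    return $feature->source_tag eq 'curated' ? 'ellipse' : 'generic';
};

for my $tag (qw(curated predicted)) {
    printf "%-9s => %s\n", $tag, $glyph_for->(MockFeature->new(source_tag => $tag));
}
```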

Scott

 
On Thu, 2006-12-21 at 15:31 +0100, Anh-Thu Tieu wrote:
> Hello,
> 
>  I use bioperl 1.5.2. Does anyone know how I could create two seperate 
> glyph elements on the same track with the Bio::Graphics::Panel module? 
> My aim is to have two (e.g. two different) clickable imagemap elements 
> on the same track. Until now I can merely create two glyph elements 
> (transcript2 or generic options) per track with only one imagemap 
> element (e.g. the same imagemap element is used for the entire track as 
> the entire (=both elements) glyph's coordinates are returned to the 
> image_and_map function as one set of coordinate).
> 
> Thank you for your help.
> 
> Best regards,
> 
> Anh Thu
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061221/9ec29c3e/attachment-0003.bin>

From cain.cshl at gmail.com  Thu Dec 21 15:03:48 2006
From: cain.cshl at gmail.com (Scott Cain)
Date: Thu, 21 Dec 2006 15:03:48 -0500
Subject: [Bioperl-l] problems installing bioperl
In-Reply-To: <1166729231.458ae00ff184b@www.studentmail.otago.ac.nz>
References: <1166519755.4587adcb141d3@www.studentmail.otago.ac.nz>
	<45880167.9010605@sendu.me.uk>
	<1166542310.6981.119.camel@localhost.localdomain>
	<1166604008.4588f6e87cccc@www.studentmail.otago.ac.nz>
	<1166621113.3739.11.camel@localhost.localdomain>
	<1166642653.45898dddbd8cf@www.studentmail.otago.ac.nz>
	<1166643051.3739.28.camel@localhost.localdomain>
	<1166729231.458ae00ff184b@www.studentmail.otago.ac.nz>
Message-ID: <1166731428.3739.71.camel@localhost.localdomain>

Hi Stephan,

About your bioperl mail: did you cancel it, or did it just disappear?
If the latter, I might have accidentally deleted it, sorry :-/

So 'GBrowse is running' means that you can see the sample yeast chr1
database, browse around, etc, right?  I still don't know what is up with
the warning but my guess is that everything is working there.

As for your question about writing a callback, the reason it's not
working is that the attributes method returns a list (typically but not
always with only one element), so what you are really doing in your test
is this "number of elements in the list > 1200", which is almost always
going to be false.  You should change it to this:

  my $feature = shift;
  my ($score) = $feature->attributes('score');
  if ($score > 1200) {
  ...etc...

Finally, if you really want to test that you are using the correct
bioperl, you can put this simple cgi in your cgi-bin directory as
test_biographics.pl, set it as world executable and go to
http://localhost/cgi-bin/test_biographics.pl (and, yes, I use strict and
warnings even when the script is only 10 lines long :-)  :

#!/usr/bin/perl
use strict;
use warnings;
use Bio::Graphics::Panel;
use CGI qw/:standard/;

print header(),
      start_html,
      p("Bio::Graphics::Panel api_version is ".Bio::Graphics::Panel->api_version),
      p("It should be 1.654 for BioPerl 1.5.2"),
      end_html;
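
As noted further down this thread, `perl Makefile.PL` can pick up the wrong copy when more than one BioPerl is installed. A complementary, BioPerl-independent check is to list every directory on @INC that holds a given module file; perl loads the first one found. This is a minimal core-Perl sketch, and the function name `module_copies` is hypothetical.

```perl
use strict;
use warnings;
use File::Spec;

# Return every path under @dirs where $module's .pm file exists,
# in search order; perl loads the first match.
sub module_copies {
    my ($module, @dirs) = @_;
    my @parts = split /::/, $module;
    $parts[-1] .= '.pm';
    my @found;
    for my $dir (grep { !ref } @dirs) {    # skip code refs sometimes placed on @INC
        my $path = File::Spec->catfile($dir, @parts);
        push @found, $path if -f $path;
    }
    return @found;
}

my @copies = module_copies('Bio::Graphics::Panel', @INC);
print @copies ? join("\n", @copies) . "\n" : "Bio::Graphics::Panel not found on \@INC\n";
```

More than one line of output suggests the --uninst 1 reinstall is worth trying.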

Scott


On Fri, 2006-12-22 at 08:27 +1300, Stephan Roessner wrote:
> Hi Scott,
> 
> I responded to the group, but it didn't get through,
> so I'm replying to you directly.
> 
> I installed Class-Base-0.03 using CPAN.
> 
> Reinstalling GBrowse gives me still a warning like:
> Warning: prerequisite Bio::Perl 1.52 not found. We have 1.0050021.
> Writing Makefile for Bio::Graphics::Browser::CAlign
> Writing Makefile for Generic-Genome-Browser.
> 
> GBrowse is running but I cannot access attributes and/or the score column
> of .gff files. Is this related to the warning?
> 
> To get an attribute I use
> 
> my $feature = shift;
>                 if ($feature->attributes('score') > 1200) {
>                   return 'blue';
>                 } else {
>                   return 'pink';
>                 }
> But I don't get any data back using $feature->
> 
> Can I somehow verify what bioperl version GBrowse is using?
> 
> Stephan,
> 
> 
> 
> Quoting Scott Cain <cain.cshl at gmail.com>:
> 
> > Stephan,
> >
> > Yes, it is in cpan:
> >
> > http://search.cpan.org/~abw/Class-Base-0.03/lib/Class/Base.pm
> >
> > The cpan shell should be able to install it.
> >
> > Whether or not that works, please respond to the mailing list so that
> > the rest of the conversation can be archived.
> >
> > Scott
> >
> >
> > On Thu, 2006-12-21 at 08:24 +1300, Stephan Roessner wrote:
> > > Hi Scott,
> > >
> > > No I didn't.
> > > I had a look and couldn't find it.
> > > It is not part of CPAN?
> > >
> > > Stephan
> > >
> > >
> > > Quoting Scott Cain <cain.cshl at gmail.com>:
> > >
> > > > Stephan,
> > > >
> > > > Did you install Class::Base?  It was inadvertently left out of the
> > > > install document, but is required.
> > > >
> > > > Scott
> > > >
> > > >
> > > > On Wed, 2006-12-20 at 21:40 +1300, Stephan Roessner wrote:
> > > > > Hi all,
> > > > >
> > > > > I did sudo ./Build install --uninst 1 and got the error
> > > > > * ERROR: Configuration was initially created with Module::Build
> > > > > version '0.2805', but we are now using '0.2806'. ...
> > > > >
> > > > > So I ran perl Build.PL and got the message
> > > > > Creating new 'Build' script for 'bioperl' version '1.0050021'.
> > > > >
> > > > > I ran sudo ./Build install --uninst 1 again.
> > > > > It seems to be fine, with no error messages.
> > > > >
> > > > > When I run perl Makefile.PL for GBrowse 1.66-RC2 it results in
> > > > >
> > > > > Warning: prerequisite Bio::Perl 1.52 not found. We have 1.0050021.
> > > > > Warning: prerequisite Class::Base 0 not found.
> > > > > Writing Makefile for Bio::Graphics::Browser::CAlign
> > > > > Writing Makefile for Generic-Genome-Browser
> > > > >
> > > > > GBrowse is running, but I am having real trouble with aggregators when
> > > > > trying to use xyplot. It does not plot anything, so I thought BioPerl
> > > > > could be the problem.
> > > > >
> > > > > Stephan
> > > > >
> > > > >
> > > > >
> > > > > Quoting Scott Cain <cain at cshl.edu>:
> > > > >
> > > > > > I really don't think the BioPerl version detection is wrong.  I
> > > > > > actually don't check Bio::Root::Version::VERSION in Makefile.PL; I
> > > > > > check Bio::Graphics::Panel->api_version.  When it doesn't find the
> > > > > > correct api_version, it gives a warning that BioPerl 1.5.2 is not
> > > > > > installed.  I have seen this happen when more than one BioPerl
> > > > > > instance is installed and `perl Makefile.PL` finds the wrong one
> > > > > > first.  My suggestion is to try reinstalling BioPerl and providing
> > > > > > the --uninst 1 argument to remove older versions of BioPerl:
> > > > > >
> > > > > >   sudo ./Build install --uninst 1
> > > > > >
> > > > > > Scott
> > > > > >
> > > > > >
> > > > > > On Tue, 2006-12-19 at 15:12 +0000, Sendu Bala wrote:
> > > > > > > Stephan Roessner wrote:
> > > > > > > > Dear support team,
> > > > > > > >
> > > > > > > > I installed bioperl 1.5.2_100 on a Fedora machine to be able to
> > > > > > > > use gbrowse.
> > > > > > > > The installation seems to work (except for the test failures),
> > > > > > > > but the gbrowse installation tells me that Bio::Perl 1.0050021 is
> > > > > > > > installed, when of course it requires 1.52.
> > > > > > > >
> > > > > > > > Is there a chance to find out what went wrong?
> > > > > > >
> > > > > > > Nothing went wrong with the Bioperl installation (well, except there
> > > > > > > shouldn't have been any test failures - can you post those please?).
> > > > > > > gbrowse simply defined its Bioperl requirement incorrectly.  If you
> > > > > > > tell me exactly where you downloaded gbrowse from and how you went
> > > > > > > about installing it, and provide the exact, complete error message
> > > > > > > you got from it, I might be able to help the authors fix the problem.
> > > > > > >
> > > > > > > Or I'm pretty sure they can figure it out for themselves :)
> > > > > > > _______________________________________________
> > > > > > > Bioperl-l mailing list
> > > > > > > Bioperl-l at lists.open-bio.org
> > > > > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory


From rvosa at sfu.ca  Sat Dec 23 17:17:37 2006
From: rvosa at sfu.ca (Rutger Vos)
Date: Sat, 23 Dec 2006 14:17:37 -0800
Subject: [Bioperl-l] [Summary] Re: suggestions for suitable 'taxon' object
In-Reply-To: <200612200423.kBK4NKDt009254@rm-rstar.sfu.ca>
References: <200612200423.kBK4NKDt009254@rm-rstar.sfu.ca>
Message-ID: <458DAB01.6080200@sfu.ca>

The replies I've received so far indicate I should look into Bio::Taxon.
I will probably come back with further questions/discussions as to how
to link and cross-reference taxa, sequences and nodes, but for now I
should first look at the Bio::Taxon API (and unpack my other Christmas
gifts). Thank you for all comments and suggestions.

Happy holidays!

Rutger


Rutger Vos wrote:
> Hi all,
>
> I am looking for a bioperl object that can be abused to function as a
> suitable 'taxon' object, where I mean 'taxon' as understood by the NEXUS
> file format (i.e. not strictly an entity from a taxonomy, but more loosely
> an OTU). 
>
> The object would primarily function as a way to relate nodes in trees to
> sequences in an alignment (a foreign key that both nodes and sequences refer
> to), and secondarily as the keeper of the canonical name of the OTU, such
> that a sequence named 'Homo_sapiens|EF177447.1/12-56' and a node named 'Homo
> sapiens (constrained monophyly)' can still be understood to refer to the
> same thing - the OTU 'Homo sapiens sapiens' (for example).
>
> I was thinking that a (possibly expanded) Bio::Species might work if there
> was some sensible way of appending references to node and sequence objects
> to it (or otherwise associate them with each other), but I am writing *to
> solicit any and all suggestions*. I am looking for something similar to
> Bio::Phylo::Taxa::Taxon.
>
> Any and all comments and suggestions greatly appreciated!
>
> Best wishes,
>
> Rutger Vos

-- 
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 Rutger A. Vos
 Postdoctoral research fellow
 University of British Columbia
 Personal site: http://www.sfu.ca/~rvosa
        CIPRES: http://www.phylo.org
    Bio::Phylo: http://search.cpan.org/~rvosa/Bio-Phylo/
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++


From paul.boutros at utoronto.ca  Sat Dec 23 22:36:59 2006
From: paul.boutros at utoronto.ca (Paul Boutros)
Date: Sat, 23 Dec 2006 22:36:59 -0500
Subject: [Bioperl-l] Bio::Graphics::Glyph::dna
Message-ID: <20061223223659.7sgfofa44mw4okks@webmail.utoronto.ca>

Hi,

I've been trying to get the dna glyph working and have had some
problems.  This is ActiveState perl 5.8.8 (build 819) and BioPerl 1.5.2
on WinXP.  I'm starting with a FASTA file, so I've tried:
$panel->add_track(
	$seq,
	-glyph     =>   'dna',
	-do_gc     =>   'true',
	-gc_window =>   'auto'
	);

where $seq is a Bio::Seq object

and I've tried it using a GFF $segment:
my $db = Bio::DB::GFF->new(
          -adaptor=>    'berkeleydb',
          -create =>    1,
          -dsn    =>    'temp'
          );

$db->load_sequence_string(
           $seq->primary_id(),
           $seq->seq()
           );

my $segment = Bio::DB::GFF::Segment->new(
           $db,
           $seq->primary_id(),
           $seq->primary_id(),
           1,
           $seq->length()
           );

$panel->add_track(
	$segment,
	-glyph     =>   'dna',
	-do_gc     =>   'true',
	-gc_window =>   'auto'
	);


From paul.boutros at utoronto.ca  Sat Dec 23 22:46:27 2006
From: paul.boutros at utoronto.ca (Paul Boutros)
Date: Sat, 23 Dec 2006 22:46:27 -0500
Subject: [Bioperl-l] How to use Bio::Graphics::Glyph::dna?
Message-ID: <20061223224627.qezpabv9f74ocowk@webmail.utoronto.ca>

Hello,

I'm trying to get the dna glyph of Bio::Graphics to work and am having
some problems.  I'm starting with a fasta file, and I am running perl
5.8.8 (ActiveState build 819) on WinXP and BioPerl 1.5.2.

If I try simply using a Bio::Seq object like this:
$panel->add_track(
	$seq,
	-glyph     =>   'dna',
	-do_gc     =>   'true',
	-gc_window =>   'auto'
	);

I get the error:
Can't locate object method "start" via package "Bio::Seq" at  
C:/Perl/site/lib/Bio/Graphics/FeatureBase.pm line 164.


And if I try creating a Bio::DB::GFF::Segment object like this:
my $db = Bio::DB::GFF->new(
	-adaptor  => 'berkeleydb',
	-create   => 1,
	-dsn      => '/usr/local/share/gff/dmel'
	);

$db->initialize(1);

$db->load_sequence_string(
	$seq->primary_id(),
	$seq->seq()
	);

my $segment = Bio::DB::GFF::Segment->new(
	$db,
	$seq->primary_id(),
	$seq->primary_id(),
	1,
	$seq->length()
	);

$panel->add_track(
	$segment,
	-glyph     =>   'dna',
	-do_gc     =>   'true',
	-gc_window =>   'auto'
	);

I get the error:
------------- EXCEPTION: Bio::Root::NotImplemented -------------
MSG: Abstract method "Bio::FeatureHolderI::get_SeqFeatures" is not implemented by package Bio::DB::GFF::Segment.
This is not your fault - author of Bio::DB::GFF::Segment should be blamed!

STACK: Error::throw
STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359
STACK: Bio::Root::RootI::throw_not_implemented C:/Perl/site/lib/Bio/Root/RootI.pm:522
STACK: Bio::FeatureHolderI::get_SeqFeatures C:/Perl/site/lib/Bio/FeatureHolderI.pm:101
STACK: Bio::Graphics::Glyph::_subfeat C:/Perl/site/lib/Bio/Graphics/Glyph.pm:1186
STACK: Bio::Graphics::Glyph::subfeat C:/Perl/site/lib/Bio/Graphics/Glyph.pm:1167
STACK: Bio::Graphics::Glyph::new C:/Perl/site/lib/Bio/Graphics/Glyph.pm:56
STACK: Bio::Graphics::Glyph::Factory::make_glyph C:/Perl/site/lib/Bio/Graphics/Glyph/Factory.pm:316
STACK: Bio::Graphics::Glyph::new C:/Perl/site/lib/Bio/Graphics/Glyph.pm:81
STACK: Bio::Graphics::Glyph::Factory::make_glyph C:/Perl/site/lib/Bio/Graphics/Glyph/Factory.pm:316
STACK: Bio::Graphics::Panel::_add_track C:/Perl/site/lib/Bio/Graphics/Panel.pm:388
STACK: Bio::Graphics::Panel::_do_add_track C:/Perl/site/lib/Bio/Graphics/Panel.pm:360
STACK: Bio::Graphics::Panel::add_track C:/Perl/site/lib/Bio/Graphics/Panel.pm:288
STACK: create_figure.pl:147
----------------------------------------------------------------

I'm really unsure what to try next, any suggestions much appreciated!
Paul


From lstein at cshl.edu  Sun Dec 24 12:23:18 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Sun, 24 Dec 2006 12:23:18 -0500
Subject: [Bioperl-l] How to use Bio::Graphics::Glyph::dna?
In-Reply-To: <20061223224627.qezpabv9f74ocowk@webmail.utoronto.ca>
References: <20061223224627.qezpabv9f74ocowk@webmail.utoronto.ca>
Message-ID: <6dce9a0b0612240923v24ebafffs5c280d9cb4c65263@mail.gmail.com>

Hi,

You need to use either a Bio::SeqFeature::Generic object (with an attached
Bio::PrimarySeq) or a Bio::Graphics::Feature object. You are not intended to
create Bio::DB::GFF::Segment objects directly.

e.g.
use Bio::PrimarySeq;
use Bio::SeqFeature::Generic;
my $dna = Bio::PrimarySeq->new(-seq=>'a'x1000);                    # 1000 bp sequence
my $feature = Bio::SeqFeature::Generic->new(-start=>1,-end=>800);  # feature over bases 1-800
$feature->attach_seq($dna);

Best,

Lincoln


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu


From tgenahmet at gmail.com  Wed Dec 27 16:38:43 2006
From: tgenahmet at gmail.com (Ahmet Kurdoglu)
Date: Wed, 27 Dec 2006 14:38:43 -0700
Subject: [Bioperl-l] get mRNA details for a gene
Message-ID: <9d8d0e2a0612271338t7cb15a63v5a08f624888b3f7b@mail.gmail.com>

Hi,

This is my first message to the list. I hope I get it right. Here is what
I'm trying to accomplish:

Get the mRNA details for a given gene (ex. DNASE2B) from its GenBank file.

Using the web interface I can search with this query:
DNASE2B [sym] AND homo sapiens [ORGN] (returns only one result if you search
the 'gene' database)
and get the GenBank file by clicking on NC_000001.9 and I can see the
details for its two mRNAs. (I eventually need to get exon locations for both
of its transcripts)

However trying to do this in Perl has proved to be very difficult for me.
I've tried various methods, including get_Seq_by_id, get_Seq_by_gi, and
get_Stream_by_query. Before I explain in detail what I did I'd like to hear
your ideas on how to accomplish this.

Thank you.


From sdavis2 at mail.nih.gov  Thu Dec 28 16:57:03 2006
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Thu, 28 Dec 2006 16:57:03 -0500
Subject: [Bioperl-l] [Bioperl-microarray] SOFT parsers
In-Reply-To: <FC173C9E9BE18F45A3241288B723D64C1F16D4@hsv-exmail03.operonads.local>
References: <FC173C9E9BE18F45A3241288B723D64C1F16D4@hsv-exmail03.operonads.local>
Message-ID: <45943DAF.70100@mail.nih.gov>

Michael Muratet US-Huntsville wrote:
> Sean
>
> Thanks. I did consider the bioconductor package and downloaded your
> write-up after it was recommended by the GEO folks. I've looked at R a
> few times, but I never got proficient at it. I'm a lot better with perl.
>
> I've been looking at MINiML, too. It looked like it might be easier to
> parse the SOFT file since the data is in-line with the attributes and
> I'd have to use a SAX parser (not enough memory for DOM) for MINiML.
>
> NCBI must have parsers to get the data into their databases. Do you know
> what they use?
>   
Michael,

You might want to look more specifically at the MINiML format specs.  
The data tables are stored as separate tab-delimited files with an 
external reference in the XML, so DOM parsing is possible with just a 
few kB of memory.  Of course, to read in all of the data into memory at 
once will take a large amount of memory for some datasets.  If you are 
going to load into a database, I would suggest reading the MINiML using 
DOM and then stepping through the data files one at a time, loading as 
you go.
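
The load-as-you-go approach can be sketched in core Perl: read one external data table at a time, line by line, so memory use stays flat however large the table is. This sketch assumes the column names have already been taken from the MINiML XML and that each data file holds plain tab-separated rows; the `$load_row` callback is a hypothetical stand-in for your database insert.

```perl
use strict;
use warnings;

# Stream one tab-delimited data table, calling $load_row->(\%row) for each
# line. Only the current line is held in memory. Returns the row count.
sub stream_table {
    my ($path, $columns, $load_row) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my $n = 0;
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;       # strip Unix or DOS line endings
        next if $line eq '';
        my %row;
        @row{@$columns} = split /\t/, $line, -1;
        $load_row->(\%row);         # e.g. a database insert
        $n++;
    }
    close $fh;
    return $n;
}
```

Calling it once per data file referenced in the XML keeps only one table open at a time.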

As for their parsers, I'm not sure what language they use, but writing a 
parser for either SOFT or MINiML isn't at all difficult.  GEO uses a 
very simplified MAGE schema. 
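
To illustrate that a SOFT parser is not much work, here is a minimal line-oriented sketch. It assumes the usual SOFT conventions as I understand them (`^` starts an entity, `!` is an attribute, `#` describes a data-table column, and the lines between `!..._table_begin` and `!..._table_end` are a header row followed by tab-separated data rows); please check these against the GEO SOFT documentation. It keeps table rows in memory, so for large files the `push` of rows would be replaced by a database insert.

```perl
use strict;
use warnings;

# Minimal SOFT reader (sketch). Returns a list of entity hashrefs:
#   { type, id, attrs => { name => [values] }, columns => [names],
#     header => [table column names], rows => [ [fields], ... ] }
sub parse_soft {
    my ($fh) = @_;
    my (@entities, $cur);
    my $in_table = 0;    # 0 = outside table, 1 = expecting header, 2 = data rows
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;
        next if $line eq '';
        if ($in_table) {
            if    ($line =~ /^!\w+_table_end\s*$/) { $in_table = 0 }
            elsif ($in_table == 1) { $cur->{header} = [ split /\t/, $line, -1 ]; $in_table = 2 }
            else                   { push @{ $cur->{rows} }, [ split /\t/, $line, -1 ] }
            next;
        }
        if ($line =~ /^\^(\w+)\s*=\s*(.*)$/) {          # entity, e.g. ^SAMPLE = GSM1
            $cur = { type => uc $1, id => $2, attrs => {},
                     columns => [], header => [], rows => [] };
            push @entities, $cur;
        }
        elsif (!$cur) {
            next;                                         # skip lines before the first entity
        }
        elsif ($line =~ /^!\w+_table_begin\s*$/) {
            $in_table = 1;
        }
        elsif ($line =~ /^!(\w+)\s*=\s*(.*)$/) {         # attribute, e.g. !Sample_title = ...
            push @{ $cur->{attrs}{$1} }, $2;
        }
        elsif ($line =~ /^#(\w+)\s*=/) {                  # column description, e.g. #VALUE = ...
            push @{ $cur->{columns} }, $1;
        }
    }
    return @entities;
}
```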

As for R vs. perl, if you are planning on doing analyses of microarray 
data, I would highly suggest looking again at the R/bioconductor 
project.  It will save you reinventing many wheels, such as getting 
annotation like gene ontology and pathways, doing stats, plotting, 
maintaining MIAME-compliant data structures, converting from multiple 
microarray formats, etc. 

Sean


From allenday at ucla.edu  Thu Dec 28 18:21:07 2006
From: allenday at ucla.edu (Allen Day)
Date: Thu, 28 Dec 2006 15:21:07 -0800
Subject: [Bioperl-l] [Bioperl-microarray] SOFT parsers
In-Reply-To: <45943DAF.70100@mail.nih.gov>
References: <FC173C9E9BE18F45A3241288B723D64C1F16D4@hsv-exmail03.operonads.local>
	<45943DAF.70100@mail.nih.gov>
Message-ID: <5c24dcc30612281521o58b9f256sfa36c403f4c30bfa@mail.gmail.com>

> As for R vs. perl, if you are planning on doing analyses of microarray
> data, I would highly suggest looking again at the R/bioconductor
> project.  It will save you reinventing many wheels, such as getting
> annotation like gene ontology and pathways, doing stats, plotting,
> maintaining MIAME-compliant data structures, converting from multiple
> microarray formats, etc.

I'll second this statement WRT the data analysis.  I'm doing all my
analysis in R, Perl is just not good at dealing with large matrices or
plotting.  OTOH, I have also found that R is particularly weak when it
comes to pipelining data and system interfacing.  If your goal is to
do ETL to a local database you're better off using Perl.

I've found they're both about equally clunky for dealing with the
experimental metadata, with a slight preference for Perl.  That's more
a reflection of the baroque MAGE model though than the programming
languages themselves.

-Allen



From Paul.Boutros at utoronto.ca  Sat Dec 30 02:43:32 2006
From: Paul.Boutros at utoronto.ca (Paul Boutros)
Date: Sat, 30 Dec 2006 02:43:32 -0500
Subject: [Bioperl-l] How to use Bio::Graphics::Glyph::dna?
In-Reply-To: <6dce9a0b0612240923v24ebafffs5c280d9cb4c65263@mail.gmail.com>
Message-ID: <000c01c72be6$34d07e60$ec02a8c0@main>

Hi Lincoln,

Thanks, that worked like a charm!  Can I suggest adding the
example/explanation you gave me to the pod for Bio::Graphics::Glyph::dna?
Here's a patch against the 1.5.2 version of dna.pm to do that.

Paul

 
266c266,274
< in response to the dna() method.
---
> in response to the dna() method.  For example, you can use a
> Bio::SeqFeature::Generic object with an attached Bio::PrimarySeq
> like this:
>    my $dna = Bio::PrimarySeq->new( -seq => 'A' x 1000 );
>    my $feature = Bio::SeqFeature::Generic->new( -start => 1, -end => 800 );
>    $feature->attach_seq($dna);
>    $panel->add_track( $feature, -glyph => 'dna' );
> 
> A Bio::Graphics::Feature object may also be used.

 


From er at xs4all.nl  Sat Dec 30 19:05:16 2006
From: er at xs4all.nl (Erik)
Date: Sun, 31 Dec 2006 01:05:16 +0100 (CET)
Subject: [Bioperl-l] acquiring a local refseq + index
Message-ID: <4632.156.83.1.215.1167523516.squirrel@webmail.xs4all.nl>

Hi all,

I downloaded the refseq files (.gbff) and want to index the lot with
Bio::DB::Flat.

It turns out that there are many cases where the SOURCE and ORGANISM lines
are messed up, sometimes to a degree where the indexing fails on a
Bio::SeqIO::genbank error.

I'd like to change Bio::SeqIO::genbank to let this parsing go at least so
far as to make the indexing of the refseq files possible, and hopefully
improving the taxonomic output ($seq->species->binomial is often mutilated
at the moment).

Is it still worthwhile to change parsing modules like Bio::SeqIO::genbank?
 Is anyone already working on a rewrite? Because if this is the case I may
be better off writing my own indexing scheme?

Below is (an outline of) my indexing program, which uses Bio::DB::Flat::BDB.
If anyone knows of a better way to get a locally searchable refseq flat
file index, I would be very interested.

Thanks for your help,

Erikjan


-------------
use strict;
use warnings;
use Bio::DB::Flat;

my $refseq_dir = '/data/ftp.ncbi.nih.gov/refseq/release/complete';
my $db = Bio::DB::Flat->new(
   -directory  => $refseq_dir,
   -dbname     => 'refseq',
   -format     => 'genbank',
   -index      => 'bdb',
   -write_flag => 1,
);
my @files = getfiles($refseq_dir);   # getfiles() is a helper not shown here
for my $f (@files) {
        $db->build_index($f);        # "db->" (no sigil) would call a method on the bareword "db"
}
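
getfiles() is not shown in the message. A hypothetical version, assuming the release directory holds the uncompressed .gbff files described above, could look like this; it returns full paths in sorted order, which is what the loop passes to build_index().

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical getfiles(): list the RefSeq GenBank flat files (*.gbff)
# in a directory, as full paths sorted for a reproducible indexing order.
sub getfiles {
    my ($dir) = @_;
    opendir my $dh, $dir or die "Cannot open $dir: $!";
    my @names = grep { /\.gbff\z/ && -f File::Spec->catfile($dir, $_) } readdir $dh;
    closedir $dh;
    return map { File::Spec->catfile($dir, $_) } sort @names;
}
```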


From hlapp at gmx.net  Sat Dec 30 20:48:33 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 30 Dec 2006 20:48:33 -0500
Subject: [Bioperl-l] acquiring a local refseq + index
In-Reply-To: <4632.156.83.1.215.1167523516.squirrel@webmail.xs4all.nl>
References: <4632.156.83.1.215.1167523516.squirrel@webmail.xs4all.nl>
Message-ID: <A4BD1950-AD1C-4EAA-A2F8-85E7FCEC7C31@gmx.net>

Can you send examples and the resulting error messages? Also, I'm
assuming you are running the 1.5.2 release of Bioperl; if not, that's what
I would try first.

	-hilmar


-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Sat Dec 30 21:33:23 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 30 Dec 2006 20:33:23 -0600
Subject: [Bioperl-l] acquiring a local refseq + index
In-Reply-To: <A4BD1950-AD1C-4EAA-A2F8-85E7FCEC7C31@gmx.net>
References: <4632.156.83.1.215.1167523516.squirrel@webmail.xs4all.nl>
	<A4BD1950-AD1C-4EAA-A2F8-85E7FCEC7C31@gmx.net>
Message-ID: <76AAAE98-779F-495C-A19A-A1A800B1D392@uiuc.edu>

Agree with Hilmar, in that we need examples.  If you are referring to  
your submitted bug:

http://bugzilla.open-bio.org/show_bug.cgi?id=2167

we could add this in as long as it passes (I'll try giving it a  
workout with my local bacterial seqs tonight or tomorrow).  However,  
in the not-too-distant future your patch would likely be rendered  
obsolete, as any parsing in Bio::SeqIO modules pertaining to  
Bio::Species-related matters will be deprecated in favor of simple  
parsing (more foolproof, less uncertainty) and Bio::Taxon (which has  
optional db lookups using NCBI Taxonomy).  Bio::Species and anything
related to it should be considered marked for deprecation.  Fair warning...

chris

On Dec 30, 2006, at 7:48 PM, Hilmar Lapp wrote:

> Can you send examples and the resulting error messages? Also, I'm
> assuming you are running the 1.5.2 release of Bioperl; if not, that's what
> I would try first.
>
> 	-hilmar
>
> On Dec 30, 2006, at 7:05 PM, Erik wrote:
>
>> Hi all,
>>
>> I downloaded the refseq files (.gbff) and want to index the lot with
>> Bio::DB::Flat.
>>
>> It turns out that there are many cases where the SOURCE and ORGANISM
>> lines are messed up, sometimes to a degree where the indexing fails on
>> a Bio::SeqIO::genbank error.
>>
>> I'd like to change Bio::SeqIO::genbank to let this parsing go at least
>> far enough to make the indexing of the refseq files possible, and
>> hopefully improve the taxonomic output ($seq->species->binomial is
>> often mutilated at the moment).
>>
>> Is it still worthwhile to change parsing modules like
>> Bio::SeqIO::genbank? Is anyone already working on a rewrite? If that
>> is the case, I may be better off writing my own indexing scheme.
>>
>> Below is (an outline of) my indexing program, which uses
>> Bio::DB::Flat::BDB. If anyone knows of a better way to get a locally
>> searchable refseq flat file index, I would be very interested.
>>
>> Thanks for your help,
>>
>> Erikjan
>>
>>
>> -------------
>> use Bio::DB::Flat;
>>
>> my $refseq_dir = '/data/ftp.ncbi.nih.gov/refseq/release/complete';
>> my $db=Bio::DB::Flat->new(
>>    -directory  => $refseq_dir,
>>    -dbname     => 'refseq',
>>    -format     => 'genbank',
>>    -index      => 'bdb',
>>    -write_flag => 1,
>> );
>> my @files = getfiles($refseq_dir);
>> for my $f (@files) {
>>         $db->build_index($f);
>> }
>>
>
> -- 
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Sun Dec 31 14:36:47 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 31 Dec 2006 13:36:47 -0600
Subject: [Bioperl-l] acquiring a local refseq + index
In-Reply-To: <76AAAE98-779F-495C-A19A-A1A800B1D392@uiuc.edu>
References: <4632.156.83.1.215.1167523516.squirrel@webmail.xs4all.nl>
	<A4BD1950-AD1C-4EAA-A2F8-85E7FCEC7C31@gmx.net>
	<76AAAE98-779F-495C-A19A-A1A800B1D392@uiuc.edu>
Message-ID: <37FB5BDF-25A9-44F0-9E82-964684A73A58@uiuc.edu>

As a followup, I have committed the fix Erik had in Bugzilla.  I  
don't know if this helps with the below issue Erik describes (they  
sound unrelated).

chris

On Dec 30, 2006, at 8:33 PM, Chris Fields wrote:

> Agree with Hilmar, in that we need examples.  If you are referring to
> your submitted bug:
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2167
>
> we could add this in as long as it passes (I'll try giving it a
> workout with my local bacterial seqs tonight or tomorrow).  However,
> in the not-too-distant future your patch would likely be rendered
> obsolete, as any parsing in Bio::SeqIO modules pertaining to
> Bio::Species-related matters will be deprecated in favor of simple
> parsing (more foolproof, less uncertainty) and Bio::Taxon (which has
> optional db lookups using NCBI Taxonomy).  Bio::Species and anything
> related to it are considered marked for deprecation.  Fair warning...
>
> chris
>
> On Dec 30, 2006, at 7:48 PM, Hilmar Lapp wrote:
>
>> Can you send examples and the resulting error messages? Also, I'm
>> assuming you are running the 1.5.2 release of Bioperl; if not, that's what
>> I would try first.
>>
>> 	-hilmar
>>
>> On Dec 30, 2006, at 7:05 PM, Erik wrote:
>>
>>> Hi all,
>>>
>>> I downloaded the refseq files (.gbff) and want to index the lot with
>>> Bio::DB::Flat.
>>>
>>> It turns out that there are many cases where the SOURCE and ORGANISM
>>> lines are messed up, sometimes to a degree where the indexing fails on
>>> a Bio::SeqIO::genbank error.
>>>
>>> I'd like to change Bio::SeqIO::genbank to let this parsing go at least
>>> far enough to make the indexing of the refseq files possible, and
>>> hopefully improve the taxonomic output ($seq->species->binomial is
>>> often mutilated at the moment).
>>>
>>> Is it still worthwhile to change parsing modules like
>>> Bio::SeqIO::genbank? Is anyone already working on a rewrite? If that
>>> is the case, I may be better off writing my own indexing scheme.
>>>
>>> Below is (an outline of) my indexing program, which uses
>>> Bio::DB::Flat::BDB. If anyone knows of a better way to get a locally
>>> searchable refseq flat file index, I would be very interested.
>>>
>>> Thanks for your help,
>>>
>>> Erikjan
>>>
>>>
>>> -------------
>>> use Bio::DB::Flat;
>>>
>>> my $refseq_dir = '/data/ftp.ncbi.nih.gov/refseq/release/complete';
>>> my $db=Bio::DB::Flat->new(
>>>    -directory  => $refseq_dir,
>>>    -dbname     => 'refseq',
>>>    -format     => 'genbank',
>>>    -index      => 'bdb',
>>>    -write_flag => 1,
>>> );
>>> my @files = getfiles($refseq_dir);
>>> for my $f (@files) {
>>>         $db->build_index($f);
>>> }
>>>
>>
>> -- 
>> ===========================================================
>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
>> ===========================================================
>>
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From arareko at campus.iztacala.unam.mx  Fri Dec  1 00:56:02 2006
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Thu, 30 Nov 2006 18:56:02 -0600
Subject: [Bioperl-l] [Root-l] Intermittent MySQL problems on BioPerl wiki
In-Reply-To: <E469E539-4DFA-4739-8A66-2BFDE3A89E54@sonsorol.org>
References: <000201c714b3$6198e4e0$15327e82@pyrimidine>
	<E469E539-4DFA-4739-8A66-2BFDE3A89E54@sonsorol.org>
Message-ID: <456F7DA2.7000408@campus.iztacala.unam.mx>

Chris & Chris,

I've run the maintenance scripts for MediaWiki (just in case they
weren't run in the upgrade to 1.8.2), restarted Apache (with no
significant change in website response), then rebooted the machine
(it seems a MySQL restart didn't do the trick), and apparently it's
behaving much better. Please check if the reported error still happens.

Regards,
Mauricio.

Chris Dagdigian wrote:
> Reports like this need to go to support at helpdesk.open-bio.org so that  
> they enter our RT helpdesk queue --  the main reason is that  
> sometimes emails to the root-l at open-bio.org administrators mailing  
> list can get lost in the shuffle.
> 
> I am going to bounce this message into RT and will restart mysql on  
> the portal box. This is probably something we should be doing anyway  
> to free up memory -- the wikis in particular seem to be pretty hard  
> on mysql and free memory.
> 
> -Chris
> 
> On Nov 30, 2006, at 2:11 PM, Chris Fields wrote:
> 
>> I'm seeing some MySQL errors on the Bioperl wiki (using Firefox 2 and
>> WinXP):
>>
>> Database error
>> From BioPerl
>> A database query syntax error has occurred. This may indicate a bug in
>> the software. The last attempted database query was:
>>
>>     (SQL query hidden)
>>
>> from within function "MediaWikiBagOStuff::_doquery". MySQL returned
>> error "1205: Lock wait timeout exceeded; try restarting transaction
>> (localhost)".
>>
>> This occurs intermittently when editing pages, logging in, etc. Also,
>> pages seem much slower to load in the browser.
>>
>> Christopher Fields
>> Postdoctoral Researcher - Switzer Lab
>> Dept. of Biochemistry
>> University of Illinois Urbana-Champaign
>>
>>
>> _______________________________________________
>> Root-l mailing list
>> Root-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/root-l
> 

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From n.haigh at sheffield.ac.uk  Fri Dec  1 07:47:03 2006
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Fri, 01 Dec 2006 07:47:03 +0000
Subject: [Bioperl-l] Upgrading my BioPerl RC via ppm?
In-Reply-To: <519167.29410.qm@web50804.mail.yahoo.com>
References: <519167.29410.qm@web50804.mail.yahoo.com>
Message-ID: <456FDDF7.1080403@sheffield.ac.uk>

Caitlin wrote:
> Hi all.
>
> I'm currently using BioPerl 1.5.2 RC2 but I've seen multiple references
> to 1.5.2 RC5. Can anyone tell me how to upgrade to the latest version?
> The ppm GUI (ActivePerl Build 819) doesn't include any BioPerl packages
> among those deemed upgradable.
>
> Thanks,
>
> ~Katie
>
>
>   

Hi Katie,

Currently there is no RC5 PPM package available - we are hoping to
have the official 1.5.2 release out pretty soon and there will
definitely be a PPM package for that! Are you experiencing any problems
with your current version of bioperl? If not, there is no need to worry:
once we've released an updated PPM package, your PPM GUI should then be
able to see it as an upgrade - hopefully! :o)

Sendu, I know you were working on automatically generating PPM packages
- what is the current situation with regards to this?

Nath




From bix at sendu.me.uk  Fri Dec  1 09:00:18 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 01 Dec 2006 09:00:18 +0000
Subject: [Bioperl-l] BLASTing with a seqio/seq object...
In-Reply-To: <456F27E9.70205@york.ac.uk>
References: <01ba01c714a2$b9659c10$15327e82@pyrimidine>
	<456F27E9.70205@york.ac.uk>
Message-ID: <456FEF22.4090004@sendu.me.uk>

Samantha Thompson wrote:

You missed a step...


> use strict;
> use Bio::Perl;
> use Bio::Seq;
> use Bio::SeqIO;
> 
> use Bio::Tools::Run::RemoteBlast;
> use Bio::SearchIO;
> 
> #seq bit
> 
> #$seq_obj = Bio::Seq->new(-format => 'fasta');
> 
> my $seqio_obj = Bio::SeqIO->new(-file => 
> "/biol/people/mres/st537/MalEfasta.txt", -format => 'fasta');
> 
> my $seq_obj = $seqio_obj->next_seq;
> 
> 
> 
> #blast bit
> 
> my $remote_blast = Bio::Tools::Run::RemoteBlast->new (
>          -prog => 'blastp', -db => 'nr', -expect => '1e-15' );
> 
> my $blast_report = $remote_blast->submit_blast($seq_obj);

Go back to the Bptutorial:
http://www.bioperl.org/wiki/Bptutorial.pl#Running_BLAST_.28using_RemoteBlast.pm.29

And you'll see that submit_blast doesn't return a SearchIO object.

For a complete working example see the synopsis for RemoteBlast:
http://doc.bioperl.org/bioperl-live/Bio/Tools/Run/RemoteBlast.html


> #new part for SearchIO...
> 
> while( my $result = $blast_report->next_result ) {
>   while( my $hit = $result->next_hit ) {
>    while( my $hsp = $hit->next_hsp ) {
>     if( $hsp->length('total') > 100 ) {
>      if ( $hsp->percent_identity >= 75 ) {
>       print "Hit= ",       $hit->name,
>             ",Length=",     $hsp->length('total'),
>             ",Percent_id=", $hsp->percent_identity, "\n";
>      }
>     }
>    } 
>   }
> }


From bix at sendu.me.uk  Fri Dec  1 09:03:13 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 01 Dec 2006 09:03:13 +0000
Subject: [Bioperl-l] Error with supplied lineages importing uniprot data
In-Reply-To: <1348.130.49.222.58.1164925169.squirrel@webmail.cs.pitt.edu>
References: <1348.130.49.222.58.1164925169.squirrel@webmail.cs.pitt.edu>
Message-ID: <456FEFD1.4070704@sendu.me.uk>

pelikan at cs.pitt.edu wrote:
> Hello all,
> 
>  I'm running bioperl 1.5.2, bioperl-db 1.5.2 - RC005, under windows,
> without Cygwin. The "make test"s have all completed without error. This
> is my first time dealing with bioperl, so bear with me.
> 
>    I've successfully loaded the most recent taxonomy information using the
> biosql-schema scripts. After this, I attempted to load the most recent
> release of the uniprot flat file dataset with the following command:
> 
> load_seqdatabase.pl -drive mysql -dbname bioseqdb -dbuser root -dbpass
> ********* -format swiss -safe c:\data\uniprot\uniprot_sprot.dat
> 
> I am subsequently greeted by many of the following errors:
> 
> Could not store Q7N3Q6:
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: The supplied lineage does not start near 'Photorhabdus luminescens
> subsp. laumondii'

In your uniprot_sprot.dat file there'll be some kind of entry with that 
Photorhabdus species. Can you post that entry (sans sequence if it has 
one) so I can take a look at it? Maybe post a few that cause problems, 
and a few that don't.


From bix at sendu.me.uk  Fri Dec  1 09:19:09 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 01 Dec 2006 09:19:09 +0000
Subject: [Bioperl-l] Bioperl 1.5.2 RC5 install on WinXP
	ActivePerl5.8.8.819
In-Reply-To: <000301c714b4$7846e790$15327e82@pyrimidine>
References: <000301c714b4$7846e790$15327e82@pyrimidine>
Message-ID: <456FF38D.3070508@sendu.me.uk>

Chris Fields wrote:
>> Nathan S. Haigh wrote:
>>> More updates:
>>>
>>> After the failed install I updated Module::Build and re-ran the
>>> install, and I get:
>>>
>>> -- snip --
>>> Creating new 'Build' script for 'bioperl' version '1.005002005'
>>> Warning: while trying to determine prerequisites for
>>> S/SE/SENDU/bioperl-1.5.2_005-RCb.tar.gz with the help of
>>> Module::Build the following error occurred: 'Failed to re-load
>>> 'ModuleBuildBioperl': Can't locate ModuleBuildBioperl.pm in @INC
>>> (@INC contains: _build\lib C:\Perl\site\lib C:\Perl\lib
>>> C:\Documents and Settings\test) at (eval 105) line 1.
>>> '
>>>
>>> Falling back to META.yml for prerequisites 'YAML' not installed, 
>>> cannot parse 'C:\Perl\cpan\build\bioperl-1.5.2_005-RC\META.yml'
>>> -- snip --
>> I had that problem fleetingly and it drove me crazy because 
>> later I couldn't reproduce it. Is it reproducible on your end?
> 
> During Module::Build installation I see this:
> 
> ...
> t\metadata........ok
>         8/43 skipped: YAML_support feature is not enabled

You were pointing out the YAML issue? I think I'm less concerned with 
that (solution: install YAML) and much more concerned with why it can't 
reload ModuleBuildBioperl (claiming it isn't in @INC). The module in 
question is in the same dir as the Build script, so it should be found 
automatically.

The only thing I can think of is that CPAN doesn't manage to chdir to 
the directory. Hopefully I'll be able to reproduce this and then I can 
investigate further.


From n.haigh at sheffield.ac.uk  Fri Dec  1 09:26:22 2006
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Fri, 01 Dec 2006 09:26:22 +0000
Subject: [Bioperl-l] Bioperl 1.5.2 RC5 install
	onWinXPActivePerl	5.8.8.819
In-Reply-To: <456FF233.6040704@sendu.me.uk>
References: <002401c714c6$53f65080$15327e82@pyrimidine>
	<456F500A.7010707@sheffield.ac.uk>
	<202B1F50-E905-46DE-9EB5-5F206AC04523@uiuc.edu>
	<456FF233.6040704@sendu.me.uk>
Message-ID: <456FF53E.90907@sheffield.ac.uk>

Sendu Bala wrote:
> Chris Fields wrote:
>>
>> I know that setting up the PPM is a pain, but I have to say it is 
>> much faster, and all required PPMs are available.  Which makes me 
>> curious: why bother with trying out a CPAN installation process at 
>> this point, especially when you have to use PPM to install some of 
>> the prereqs properly anyway?
>
> Firstly, problems discovered and resulting fixes will help all 
> platforms, not just Windows. So thanks for trying it out and reporting 
> back. Secondly, the PPM method, like Bundle::BioPerl, is 
> all-or-nothing. The CPAN installation method allows an interactive 
> choice of which optional things to install.
>
> If what you say about DB_File is true, then that's a great shame!
>
>
> So I can do further trouble-shooting of my own, what is the sure-fire 
> way to completely clean-out an ActivePerl install, including any 
> modules you might have installed with PPMs or CPAN?
>
>

In addition, using CPAN allows you to run the test suite easily without 
the need to download it separately and run it after a PPM install.

I don't know of a way to clean out ActivePerl - I use VMWare Workstation 
and have a virtual machine with a fresh install of WinXP and ActivePerl 
5.8.8.819 - maybe someone else has ideas?

Nath




From bix at sendu.me.uk  Fri Dec  1 09:13:23 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 01 Dec 2006 09:13:23 +0000
Subject: [Bioperl-l] Bioperl 1.5.2 RC5 install
	onWinXPActivePerl	5.8.8.819
In-Reply-To: <202B1F50-E905-46DE-9EB5-5F206AC04523@uiuc.edu>
References: <002401c714c6$53f65080$15327e82@pyrimidine>
	<456F500A.7010707@sheffield.ac.uk>
	<202B1F50-E905-46DE-9EB5-5F206AC04523@uiuc.edu>
Message-ID: <456FF233.6040704@sendu.me.uk>

Chris Fields wrote:
> 
> I know that setting up the PPM is a pain, but I have to say it is much 
> faster, and all required PPMs are available.  Which makes me curious: 
> why bother with trying out a CPAN installation process at this point, 
> especially when you have to use PPM to install some of the prereqs 
> properly anyway?

Firstly, problems discovered and resulting fixes will help all 
platforms, not just Windows. So thanks for trying it out and reporting 
back. Secondly, the PPM method, like Bundle::BioPerl, is all-or-nothing. 
The CPAN installation method allows an interactive choice of which 
optional things to install.

If what you say about DB_File is true, then that's a great shame!


So I can do further trouble-shooting of my own, what is the sure-fire 
way to completely clean-out an ActivePerl install, including any modules 
you might have installed with PPMs or CPAN?


From cjfields at uiuc.edu  Fri Dec  1 14:08:55 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 1 Dec 2006 08:08:55 -0600
Subject: [Bioperl-l] Bioperl 1.5.2 RC5 install
	onWinXPActivePerl	5.8.8.819
In-Reply-To: <456FF233.6040704@sendu.me.uk>
References: <002401c714c6$53f65080$15327e82@pyrimidine>
	<456F500A.7010707@sheffield.ac.uk>
	<202B1F50-E905-46DE-9EB5-5F206AC04523@uiuc.edu>
	<456FF233.6040704@sendu.me.uk>
Message-ID: <10BC5C25-616F-44D5-8CA8-4BD4C3EF82D6@uiuc.edu>


On Dec 1, 2006, at 3:13 AM, Sendu Bala wrote:

> Chris Fields wrote:
>> I know that setting up the PPM is a pain, but I have to say it is  
>> much faster, and all required PPMs are available.  Which makes me  
>> curious: why bother with trying out a CPAN installation process at  
>> this point, especially when you have to use PPM to install some of  
>> the prereqs properly anyway?
>
> Firstly, problems discovered and resulting fixes will help all  
> platforms, not just Windows. So thanks for trying it out and  
> reporting back. Secondly, the PPM method, like Bundle::BioPerl, is  
> all-or-nothing. The CPAN installation method allows an interactive  
> choice of which optional things to install.

Yes, I understand that.  My point is, you are generally forced to use  
PPM anyway due to several modules not installing properly (all the  
'trouble' distributions, like DB_File, are available via PPM).  I can  
see using CPAN as an alternative way of installing Bioperl for a  
distribution, or as the primary method via CVS or manually, but not as  
the primary method for distributions.  At least not until the kinks are  
worked out for Windows users.

What are the significant issues for a bioperl PPM installation, based  
on the last PPM Nathan set up?  If there is a redirection problem,  
could we just modify the installation docs to address that ('due to  
problem X, you must install the following modules prior to installing  
BioPerl 1.5.2...').

> If what you say about DB_File is true, then that's a great shame!

We need to go through the various prereqs to see which ones need PPM  
vs CPAN.  In general, anything that requires C code compilation (and  
thus needs a recent VC++) will likely be an issue.

> So I can do further trouble-shooting of my own, what is the sure- 
> fire way to completely clean-out an ActivePerl install, including  
> any modules you might have installed with PPMs or CPAN?

Not sure, beyond uninstalling and cleaning out the Perl directory (I  
think you might be able to delete the site/ directory, but I haven't  
tried it).  ActivePerl comes preloaded with a number of non-core  
modules which makes it tricky to uninstall them one-by-one.

chris


From cjfields at uiuc.edu  Fri Dec  1 14:10:34 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 1 Dec 2006 08:10:34 -0600
Subject: [Bioperl-l] Bioperl 1.5.2 RC5 install on WinXP
	ActivePerl5.8.8.819
In-Reply-To: <456FF38D.3070508@sendu.me.uk>
References: <000301c714b4$7846e790$15327e82@pyrimidine>
	<456FF38D.3070508@sendu.me.uk>
Message-ID: <6E434A6A-0EA4-4FD6-9DA1-0D5CF196AE36@uiuc.edu>


On Dec 1, 2006, at 3:19 AM, Sendu Bala wrote:

> You were pointing out the YAML issue? I think I'm less concerned  
> with that (solution: install YAML) and much more concerned with why  
> it can't reload ModuleBuildBioperl (claiming it isn't in @INC). The  
> module in question is in the same dir as the Build script, so it  
> should be found automatically.
>
> The only thing I can think of is that CPAN doesn't manage to chdir  
> to the directory. Hopefully I'll be able to reproduce this and then  
> I can investigate further.

My thought was the two were related in some way.  I'm not sure, to  
tell the truth.

-chris


From bix at sendu.me.uk  Fri Dec  1 14:17:41 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 01 Dec 2006 14:17:41 +0000
Subject: [Bioperl-l] Bioperl 1.5.2 RC5 install
	onWinXPActivePerl	5.8.8.819
In-Reply-To: <10BC5C25-616F-44D5-8CA8-4BD4C3EF82D6@uiuc.edu>
References: <002401c714c6$53f65080$15327e82@pyrimidine>
	<456F500A.7010707@sheffield.ac.uk>
	<202B1F50-E905-46DE-9EB5-5F206AC04523@uiuc.edu>
	<456FF233.6040704@sendu.me.uk>
	<10BC5C25-616F-44D5-8CA8-4BD4C3EF82D6@uiuc.edu>
Message-ID: <45703985.5050203@sendu.me.uk>

Chris Fields wrote:
> 
> On Dec 1, 2006, at 3:13 AM, Sendu Bala wrote:
> 
>> Chris Fields wrote:
>>> I know that setting up the PPM is a pain, but I have to say it is 
>>> much faster, and all required PPMs are available.  Which makes me 
>>> curious: why bother with trying out a CPAN installation process at 
>>> this point, especially when you have to use PPM to install some of 
>>> the prereqs properly anyway?
>>
>> Firstly, problems discovered and resulting fixes will help all 
>> platforms, not just Windows. So thanks for trying it out and reporting 
>> back. Secondly, the PPM method, like Bundle::BioPerl, is 
>> all-or-nothing. The CPAN installation method allows an interactive 
>> choice of which optional things to install.
> 
> Yes, I understand that.  My point is, you are generally forced to use 
> PPM anyway due to several modules not installing properly (all the 
> 'trouble' distributions, like DB_File, are available via PPM).  I can 
> see using CPAN as an alternative way of installing Bioperl for a 
> distribution, or as the primary method via CVS or manually, but not for 
> distributions.  At least not until the kinks are worked out for Windows 
> users.

CPAN isn't being suggested as the primary or preferred installation 
method for Windows. That will still be PPM. I'm mentioning CPAN / manual 
installation in the Windows INSTALL docs for the benefit of anyone who 
wants a simple install and test environment when checking out from CVS.


> What are the significant issues for a bioperl PPM installation

None that I'm aware of - I just need to find the time to start looking 
into generating an appropriate PPD. Hopefully Nathan's wiki page on the 
subject will be all I need.


From bix at sendu.me.uk  Fri Dec  1 14:18:43 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 01 Dec 2006 14:18:43 +0000
Subject: [Bioperl-l] Bioperl 1.5.2 RC5 install on WinXP
	ActivePerl5.8.8.819
In-Reply-To: <6E434A6A-0EA4-4FD6-9DA1-0D5CF196AE36@uiuc.edu>
References: <000301c714b4$7846e790$15327e82@pyrimidine>
	<456FF38D.3070508@sendu.me.uk>
	<6E434A6A-0EA4-4FD6-9DA1-0D5CF196AE36@uiuc.edu>
Message-ID: <457039C3.30907@sendu.me.uk>

Chris Fields wrote:
> 
> On Dec 1, 2006, at 3:19 AM, Sendu Bala wrote:
> 
>> You were pointing out the YAML issue? I think I'm less concerned with 
>> that (solution: install YAML) and much more concerned with why it 
>> can't reload ModuleBuildBioperl (claiming it isn't in @INC). The 
>> module in question is in the same dir as the Build script, so it 
>> should be found automatically.
>>
>> The only thing I can think of is that CPAN doesn't manage to chdir to 
>> the directory. Hopefully I'll be able to reproduce this and then I can 
>> investigate further.
> 
> My thought was the two were related in some way.  I'm not sure, to tell 
> the truth.

They weren't; using YAML is the fall-back position in case of earlier 
failure.

I've fixed it now in any case.


From gwu at molbio.mgh.harvard.edu  Fri Dec  1 15:19:42 2006
From: gwu at molbio.mgh.harvard.edu (gang wu)
Date: Fri, 01 Dec 2006 10:19:42 -0500
Subject: [Bioperl-l] One more load_seqdatabase.pl question
In-Reply-To: <70B28FBB-0250-4EB8-8775-CD0537369A3D@gmx.net>
References: <4a9ad8800611270907x64a4a4c0jad92bff6641e300@mail.gmail.com>	<53C6D534-6E36-4061-B955-E74537839265@gmx.net>	<456CA667.6010609@molbio.mgh.harvard.edu>
	<ED3F5F49-78A7-4E63-ACB8-5E8F745F0C34@gmx.net>
	<456F5648.6070207@molbio.mgh.harvard.edu>
	<70B28FBB-0250-4EB8-8775-CD0537369A3D@gmx.net>
Message-ID: <4570480E.1020701@molbio.mgh.harvard.edu>

Thanks Hilmar. I did include the -lookup switch on the command line. The 
warning messages say that the code failed to "INSERT" (rather than doing 
an "UPDATE"), which sounds like a match was not found. But I was just 
loading the same Genbank file for the second time. To test whether it 
actually updated the records, I made a minor modification to one of the 
COMMENT features. Unfortunately it was not updated. By the way, the test 
genbank file has four "COMMENT" features, but they are all different. Any 
idea what's happening there?

I wonder if it's a bad idea to "UPDATE" a sequence. Say I got a new 
sequence version with 5 features removed, 5 features modified and 5 
features added. If only --lookup is included, according to the POD, the 5 
new features will be inserted, the 5 modified features will be updated 
and the 5 removed features will remain in the database untouched. This 
would leave the new sequence record a mixture of old and new versions. I 
don't see a reason anyone would want a sequence like this. Either 
include -remove to replace the old version if only one version is 
needed, or put the new version under a different namespace if multiple 
versions are needed. Do I have the correct understanding of these issues?
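The scenario described above can be made concrete with a toy model. This is plain Perl, not bioperl-db code: an entry is reduced to a hash keyed by a made-up feature id, and the two subroutines simply mirror the behaviour attributed here to --lookup alone versus removing the old entry before loading, assuming that reading of the POD is correct.

```perl
use strict;
use warnings;

# Toy model, not bioperl-db: a stored entry is a hash of
# feature id => feature content (ids and contents are made up).

# --lookup alone, as described: incoming features are inserted or
# updated, but features absent from the new version are left in place.
sub merge_lookup_only {
    my ($stored, $incoming) = @_;
    my %merged = %$stored;
    $merged{$_} = $incoming->{$_} for keys %$incoming;
    return \%merged;
}

# Removing the old entry first: only the new version's features remain.
sub replace_entry {
    my ($stored, $incoming) = @_;
    return { %$incoming };
}

my %old = (f1 => 'removed', f2 => 'old text', f3 => 'same');
my %new = (f2 => 'new text', f3 => 'same', f4 => 'added');

my $mixed = merge_lookup_only(\%old, \%new);
my $clean = replace_entry(\%old, \%new);
print join(',', sort keys %$mixed), "\n";   # f1,f2,f3,f4 (stale f1 kept)
print join(',', sort keys %$clean), "\n";   # f2,f3,f4
```

The stale `f1` surviving in the first result is exactly the "mixture of old and new versions" being worried about.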

I deeply appreciate your help.

Gang


Hilmar Lapp wrote:
> Right. You need to tell it to lookup sequences first if you know that 
> you are loading sequences which may be in the database already (see 
> the POD of load_seqdatabase.pl, switch --lookup; there are several 
> other command line options that control what will happen if a sequence 
> entry is already present in the database.).
>
> The messages in your report are warnings, not errors. It looks like 
> some of the comments are duplicated for a sequence; it doesn't look 
> like a reason for concern. It's not so good if you get errors thrown.
>
>     -hilmar
>
> On Nov 30, 2006, at 5:08 PM, gang wu wrote:
>
>> Thanks Hilmar. Do you mean the NVL() clause will make 
>> load_seqdatabase.pl not work when updating?
>>
>> I have a problem with updating. It seems load_seqdatabase.pl only tries to 
>> insert instead of update. I used one of the test genbank files coming 
>> with bioperl-db. Please take a look at the attached output.
>>
>> Thanks.
>>
>> Gang
>>
>> =========================================
>> >perl load_seqdatabase.pl -lookup -host elegans -driver Oracle 
>> -dbname sparc -dbuser biosqldb-sgowner -dbpass PASS -format genbank 
>> -namespace test 
>> /root/.cpan/build/bioperl-db-1.5.2-RC3/scripts/biosql/data/AP000868.gb
>> Loading 
>> /root/.cpan/build/bioperl-db-1.5.2-RC3/scripts/biosql/data/AP000868.gb 
>> ...
>>
>> -------------------- WARNING ---------------------
>> MSG: insert in Bio::DB::BioSQL::CommentAdaptor (driver) failed, 
>> values were ("This sequence was reannotated via the Ensembl system. 
>> Please visit the Ensembl web site, http://www.ensembl.org/ for more 
>> information. ","1") FKs (389109)
>> ORA-00001: unique constraint (BIOSQLDB_SGOWNER.XAK1COMMENT) violated 
>> (DBD ERROR: OCIStmtExecute)
>> ---------------------------------------------------
>>
>> -------------------- WARNING ---------------------
>> MSG: insert in Bio::DB::BioSQL::CommentAdaptor (driver) failed, 
>> values were ("The /gene indicates a unique id for a gene, /cds a 
>> unique id for a translation and a /exon a unique id for an exon. 
>> These ids are maintained wherever possible between versions. For more 
>> information on how to interpret the feature table, please visit 
>> http://www.ensembl.org/Docs/embl.html. ","2") FKs (389109)
>> ORA-00001: unique constraint (BIOSQLDB_SGOWNER.XAK1COMMENT) violated 
>> (DBD ERROR: OCIStmtExecute)
>> ---------------------------------------------------
>> ...
>> ...
>> ==========================================================
>> Hilmar Lapp wrote:
>>> These are the protein translations stored in the feature table as
>>> tags of features, right? You can change the type of the column
>>> (although there may be some issues when you update the column
>>> because the NVL() clause won't work if I recall that correctly), but
>>> doing so will deprive you of any 'normal' searches against that
>>> column. (You can still use functions from the DBMS_LOB package, but
>>> they will be much slower and are completely non-standard.) It is up
>>> to you whether that is too big of a price to pay for having some
>>> redundant protein translations (translating the feature's DNA
>>> sequence should give you the same) in the database. I always trimmed
>>> those feature tags off (using a custom SeqProcessor). An alternative
>>> is to convert these feature tags into actual bioentries (i.e.,
>>> Bio::Seq objects; again, a custom SeqProcessor will allow you to do
>>> that).
>>>
>>> -hilmar
>>>
>>> On Nov 28, 2006, at 4:13 PM, gang wu wrote:
>>>> Hi everyone,
>>>>
>>>> I'm using load_seqdatabase.pl to upload some Genbank genome sequences
>>>> to my Oracle BioSQL database. I saw some errors (see attached warning
>>>> message) related to seqfeature_qualifier_value
>>>> (SG_SEQFEATURE_QUALIFIER_ASSOC.VALUE column), which has a Varchar2
>>>> data type with a maximum of 4000 bytes. Did anybody mention this issue
>>>> before? Should I just modify the column to a type able to store more
>>>> data, such as LONG or CLOB?
>>>>
>>>> Thanks.
>>>>
>>>> Gang
>>>>
>>>> Log information:
>>>> ============================================
>>>> load_seqdatabase.pl -host elegans -driver Oracle -dbname sparc
>>>> -dbuser biosqldb-sgowner -dbpass PASS -format genbank -namespace
>>>> genbank /genomeseq/arabidopsis//NC_003070.gbk
>>>> Loading /genomeseq/arabidopsis//NC_003070.gbk ...
>>>>
>>>> -------------------- WARNING ---------------------
>>>> MSG: SimpleValueAdaptor::add_assoc: unexpected failure of statement
>>>> execution: ORA-01461: can bind a LONG value only for insert into a
>>>> LONG column (DBD ERROR: error possibly near <*> indicator at char 12
>>>> in 'INSERT INTO <*>seqfeature_qualifier_value (fea_oid, trm_oid,
>>>> value, rank) VALUES (:p1, :p2, :p3, :p4)')
>>>> name: INSERT ASSOC [2] Bio::SeqFeature::Generic;Bio::Annotation::SimpleValue
>>>> values: FK[Bio::SeqFeature::Generic]:14898,
>>>> FK[Bio::Annotation::SimpleValue]:800,
>>>> value:"MVAVTGEVLHLLRRYLGEYVHGLSTEALRISVWKGDVVLKDLKLKAEALNSLKLPVAVKSGFV 
>>>> GTITLKVPWKSLGKEPVIVLIDRVFVLAYPAPDDRTLKFFTLVGTEFAYTNYIPGGRQGKASRNQASADR 
>>>> GTSYFWLMELHGYEAETATLEARAKSKLGSPPQGNSWLGSIIATIIGNLKVSISNVHIRYEDSTRDSSEI 
>>>> LASFFSYFNNICSSNPGHPFAAGITLAKLAAVTMDEEGNETFDTSGALDKLRKSLQLERLALYHDSNSFP 
>>>> WEIEKQWDNITPEEWIEMFEDGIKEQTEHKIKSKWALNRHYLLSPINGSLKYHRLGNQERNNPEIPFERA 
>>>> SVILNDVNVTITEEQYHDWIKLVEVVSRYKTYIEISHLRPMVPVSEAPRLWWRFAAQASLQQKRLWYTRY 
>>>> IQLYANFLQQSSDVNYPEMREIEKDLDSKVILLWRLLAHAKVESVKSKEAAEQRKLKKGGWFSFNWRTEA 
>>>> EDDPEVDSVAGGSKLMEERLTKDEWKAINKLLSHQPDEEMNLYSGKDMQNMTHFLVTVSIGQGAARIVDI 
>>>> NQTEVLCGRFEQLDVTTKFRHRSTQCDVSLRFYGLSAPEGSLAQSVSSERKTNALMASFVNAPIGENIDW 
>>>> RLSATISPCHATIWTESYDRVLEFVKRSNAVSPTVALETAAVLQMKLEEVTRRAQEQLQIVLEEQSRFAL 
>>>> DIDIDAPKVRIPLRASGSSKCSSHFLLDFGNFTLTTMDTRSEEQRQNLYSRFCISGRDIAAFFTDCGSDN 
>>>> QGCSLVMEDFTNQPILSPILEKADNVYSLIDRCGMAVIVDQIKVPHPSYPSTRISIQVPNIGVHFSPTRY 
>>>> MRIMQLFDILYGAMKTYSQAPVDHMPDGIQPWSPTDLASDARILVWKGIGNSVATWQSCRLVLSGLYLYT 
>>>> FESEKSLDYQRYLCMAGRQVFEVPPANIGGSPYCLAVGVRGTDLKKALESSSTWIIEFQGEEKAAWLRGL 
>>>> VQATYQASA
>>>> PLSGDVLGQTSDGDGDFHEPQTRNMKAADLVITGALVETKLYLYGKIKNECDEQVEEVLLLKVLASGGKV 
>>>> HLISSESGLTVRTKLHSLKIKDELQQQQSGSAQYLAYSVLKNEDIQESLGTCDSFDKEMPVGHADDEDAY 
>>>> TDALPEFLSPTEPGTPDMDMIQCSMMMDSDEHVGLEDTEGGFHEKDTSQGKSLCDEVFYEVQGGEFSDFV 
>>>> SVVFLTRSSSSHDYNGIDTQMSIRMSKLEFFCSRPTVVALIGFGFDLSTASYIENDKDANTLVPEKSDSE 
>>>> KETNDESGRIEGLLGYGKDRVVFYLNMNVDNVTVFLNKEDGSQLAMFVQERFVLDIKVHPSSLSVEGTLG 
>>>> NFKLCDKSLDSGNCWSWLCDIRDPGVESLIKFKFSSYSAGDDDYEGYDYSLSGKLSAVRIVFLYRFVQEV 
>>>> TAYFMGLATPHSEEVIKLVDKVGGFEWLIQKDEMDGATAVKLDLSLDTPIIVVPRDSLSKDYIQLDLGQL 
>>>> EVSNEISWHGCPEKDATAVRVDVLHAKILGLNMSVGINGSIGKPMIREGQGLDIFVRRSLRDVFKKVPTL 
>>>> SVEVKIDFLHAVMSDKEYDIIVSCTSMNLFEEPKLPPDFRGSSSGPKAKMRLLADKVNLNSQMIMSRTVT 
>>>> ILAVDINYALLELRNSVNEESSLAHVAVRASEPNSSISWMTSLSETDLYVSVPKVSVLDIRPNTKPEMRL 
>>>> MLGSSVDASKQASSESLPFSLNKGSFKRANSRAVLDFDAPCSTMLLMDYRWRASSQSCVLRVQQPRILAV 
>>>> PDFLLAVGEFFVPALRAITGRDETLDPTNDPITRSRGIVLSEPLYKQTEDVVHLSPRRQLVADSLGIDEY 
>>>> TYDGCGKVISLSEQGEKDLNVGRLEPIIIVGHGKKLRFVNVKIKNGSLLSKCIYLSNDSSCLFSPEDGVD 
>>>> ISMLENASSNPENVLSNAHKSSDVSDTCQYDSKSGQSFTFEAQVVSPEFTFFDGTKSSLDDSSAVEKLLR 
>>>> VKLDFNFM
>>>> YASKEKDIWVRALLKNLVVETGSGLIILDPVDISGGYTSVKEKTNMSLTSTDIYMHLSLSALSLLLNLQS 
>>>> QVTGALQSGNAIPLASCTNFDRIWVSPKENGPRNNLTIWRPQAPSNYVILGDCVTSRAIPPTQAVMAVSN 
>>>> TYGRVRKPIGFNRIGLFSVIQGLEGDNVQHSHNSNECSLWMPVAPVGYTAMGCVANIGSEQPPDHIVYCL 
>>>> SIWRADNVLGAFYAHTSTAAPSKKYSPGLSHCLLWNPLQSKTSSSSDPSSTSGSRSEQSSDQTGNSSGWD 
>>>> ILRSISKATSYHVSTPNFERIWWDKGGDLRRPVSIWRPVPRPGFAILGDSITEGLEPPALGILFKADDSE 
>>>> IAAKPVQFNKVAHIVGKGFDEVFCWFPVAPPGYVSLGCVLSKFDEAPHVDSFCCPRIDLVNQANIYEASV 
>>>> TRSSSSKSSQLWSIWKVDNQACTFLARSDLKRPPSRMAFAVGESVKPKTQENVNAEIKLRCFSLTLLDGL 
>>>> HGMMTPLFDTTVTNIKLATHGRPEAMNAVLISSIAASTFNPQLEAWEPLLEPFDGIFKLETYDTALNQSS 
>>>> KPGKRLRIAATNILNINVSAANLETLGDAVVSWRRQLELEERAAKMKEESAASRESGDLSAFSALDEDDF 
>>>> QTIVVENKLGRDIYLKKLEENSDVVVKLCHDENTSVWVPPPRFSNRLNVADSSREARNYMTVQILEAKGL 
>>>> HIIDDGNSHSFFCTLRLVVDSQGAEPQKLFPQSARTKCVKPSTTIVNDLMECTSKWNELFIFEIPRKGVA 
>>>> RLEVEVTNLAAKAGKGEVVGSLSFPVGHGESTLRKVASVRMLHQSSDAENISSYTLQRKNAEDKHDNGCL 
>>>> LISTSYFEKTTIPNTLRNMESKDFVDGDTGFWIGVRPDDSWHSIRSLLPLCIAPKSLQNDFIAMEVSMRN 
>>>> GRKHATFRCLATVVNDSDVNLEISISSDQNVSSGVSNHNAVIASRSSYVLPWGCLSKDNEQCLHIRPKVE 
>>>> NSHHSYAWGYCIAVSSGCGKDQPFVDQGLLTRQNTIKQSSRASTFFLRLNQLEKKDMLFCCQPSTGSKPL 
>>>> WLSVGADAS
>>>> VLHTDLNTPVYDWKISISSPLKLENRLPCPVKFTVWEKTKEGTYLERQHGVVSSRKSAHVYSADIQRPVY 
>>>> LTLAVHGGWALEKDPIPVLDISSNDSVSSFWFVHQQSKRRLRVSIERDVGETGAAPKTIRFFVPYWITND 
>>>> SYLPLSYRVVEIEPSENVEAGSPCLTRASKSFKKNPVFSMERRHQKKNVRVLESIEDTSPMPSMLSPQES 
>>>> AGRSGVVLFPSQKDSYVSPRIGIAVAARDSDSYSPGISLLELEKKERIDVKAFCKDASYYMLSAVLNMTS 
>>>> DRTKVIHLQPHTLFINRVGVSICLQQCDCQTEEWINPSDPPKLFGWQSSTRLELLKLRVKGYRWSTPFSV 
>>>> FSEGTMRVPVPKEDGTDQLQLRVQVRSGTKNSRYEVIFRPNSISGPYRIENRSMFLPIRYRQVEGVSESW 
>>>> QFLPPNAAASFYWENLGRRHLFELLVDGNDPSNSEKFDIDKIGDYPPRSESGPTRPIRVTILKEDKKNIV 
>>>> RISDWMPAIEPTSSISRRLPASSLSELSGNESQQSHLLASEDSEFHVIVELAELGISVIDHAPEEILYMS 
>>>> VQNLFVAYSTGLGSGLSRFKLRMQGIQVDNQLPLAPMPVLFRPQRTGDKADYILKFSVTLQSNAGLDLRV 
>>>> YPYIDFQGRENTAFLINIHEPIIWRIHEMIQQANLSRLSDPNSTAVSVDPFIQIGVLNFSEVRFRVSMAM 
>>>> SPSQRPRGVLGFWSSLMTALGNTENMPVRISERFHENISMRQSTMINNAIRNVKKDLLGQPLQLLSGVDI 
>>>> LGNASSALGHMSQGIAALSMDKKFIQSRQRQENKGVEDFGDIIREGGGALAKGLFRGVTGILTKPLEGAK 
>>>> SSGVEGFVSGFGKGIIGAAAQPVSGVLDLLSKTTEGANAMRMKIAAAITSDEQLLRRRLPRAVGADSLLR 
>>>> PYNDYRAQGQVILQLAESGSFLGQVDLFKVRGKFALTDAYESHFILPKGKVLMITHRRVILLQQPSNIMG 
>>>> QRKFIPAK
>>>> DACSIQWDILWNDLVTMELSDGKKDPPNSPPSRLILYLKAKPHDPKEQFRVVKCIPNSKQAFDVYSAIDQ 
>>>> AINLYGQNALKGMVKNKVTRPYSPISESSWAEGASQQMPASVTPSSTFGTSPTTSSS", 
>>>> rank:"1"
>>>> --------------------------------------------------
>>>> =============================================
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
> -- 
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
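
A minimal sketch of the "trim those feature tags off" option Hilmar describes, written in Python purely for illustration. It is not BioPerl's SeqProcessor API: the function name and the dict-of-lists representation of a feature's tag/value pairs are hypothetical. It assumes Oracle's default byte length semantics, under which a VARCHAR2 column holds at most 4000 bytes, so the check uses the encoded byte length rather than the character count.

```python
# Hypothetical pre-load filter (not part of BioPerl or bioperl-db): split a
# feature's tag/value pairs into values that fit an Oracle VARCHAR2(4000)
# column and values that are too long to insert.
ORACLE_VARCHAR2_MAX_BYTES = 4000


def split_oversized_qualifiers(qualifiers, max_bytes=ORACLE_VARCHAR2_MAX_BYTES,
                               encoding="utf-8"):
    """Return (kept, dropped), each a dict mapping tag -> list of values.

    `qualifiers` maps a tag name (e.g. "translation") to a list of strings.
    The limit is compared against the encoded byte length, not len(value).
    """
    kept, dropped = {}, {}
    for tag, values in qualifiers.items():
        for value in values:
            too_long = len(value.encode(encoding)) > max_bytes
            target = dropped if too_long else kept
            target.setdefault(tag, []).append(value)
    return kept, dropped


if __name__ == "__main__":
    # Made-up values: a long protein translation like the one in the log above.
    feature_tags = {
        "gene": ["example_gene"],
        "translation": ["M" * 4500],
    }
    kept, dropped = split_oversized_qualifiers(feature_tags)
    print(sorted(kept))     # ['gene']
    print(sorted(dropped))  # ['translation']
```

Discarding the `dropped` values corresponds to the trimming option; the other alternative mentioned, turning such tags into separate bioentries, would route them elsewhere instead.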


From bosborne11 at verizon.net  Fri Dec  1 14:55:18 2006
From: bosborne11 at verizon.net (Brian Osborne)
Date: Fri, 01 Dec 2006 09:55:18 -0500
Subject: [Bioperl-l] An announcement
Message-ID: <C195AC86.BB6A%bosborne11@verizon.net>

bioperl-l,

I would like to call your attention to a job posting, and in doing so I
realize that I'm probably breaking a rule of this list. I apologize and
acknowledge that I've transgressed. I'm doing this because this is
an interesting job that is relevant to a lot of what we do in this mailing
list, and some of you might want to consider it. The posting is here:

http://www.nescent.org/main/employment.html#gmodhelpdesk

I encourage you to pass this on to anyone who you think might be interested.

Thanks again,

Brian O.


From cjfields at uiuc.edu  Fri Dec  1 16:49:32 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 1 Dec 2006 10:49:32 -0600
Subject: [Bioperl-l] Bioperl 1.5.2 RC5 install on WinXP ActivePerl 5.8.8.819
In-Reply-To: <456FF53E.90907@sheffield.ac.uk>
References: <002401c714c6$53f65080$15327e82@pyrimidine>
	<456F500A.7010707@sheffield.ac.uk>
	<202B1F50-E905-46DE-9EB5-5F206AC04523@uiuc.edu>
	<456FF233.6040704@sendu.me.uk> <456FF53E.90907@sheffield.ac.uk>
Message-ID: <D464535F-E70F-44B4-AD48-3CC79181869C@uiuc.edu>


On Dec 1, 2006, at 3:26 AM, Nathan S. Haigh wrote:
...
> In addition, using CPAN allows you to run the test suite easily  
> without the need to download it separately and run it after a PPM  
> install.

A PPM, by design, is supposed to imply that the distribution passes  
tests for the specified platform, at that point in time, after all  
prereqs are installed and any additional postinstall operations  
(install C libraries, modify config files, etc) are complete.  The  
ActiveState automated PPM building process dictates that; if it fails  
any test, it will not be made into a PPM.  It's sort of a 'stamp of  
approval' that all tests pass, so you don't need to run them.

However, a test may fail (and a PPM may not get generated) for pretty  
superficial reasons, such as the makefile not specifying that a  
module is needed, server issues, etc., so the automated process isn't
foolproof.  That's why Kobes and the other repositories are  
available, where the PPM/PPD is manually generated and made to work  
specifically for Windows (or whatever other platform).

That said, if one were to make a PPM manually, it is completely up to
the person packaging the distribution to follow those rules.  You don't
even have to run tests prior to using 'nmake ppd'.  We can currently
state, though, that all tests pass when all  
prereqs are installed for this distribution.  At least at this point  
in time!

> I don't know of a way to clean out ActivePerl - I use VMWare  
> Workstation and have a virtual machine with a fresh install of  
> WinXP and ActivePerl 5.8.8.819 - maybe someone else has ideas?

I haven't tried it that way.  I have Parallels on Mac OS X (I run a  
SigmaPlot/Excel combo off it).  My tests were using a native WinXP  
installation (i.e., not virtualized) on my old Dell.  It shouldn't make  
a difference; VMWare, Parallels, and the like should all run  
ActivePerl for WinXP since it's a virtual machine.  Windows Vista, on  
the other hand...

I think with PPM4 you can install to a custom directory.  It may be  
possible to install all new modules to that directory, then you would  
at least have an idea of what was there (though I don't think you can  
delete it directly w/o screwing up the PPM database).

chris


From bix at sendu.me.uk  Fri Dec  1 17:12:49 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 01 Dec 2006 17:12:49 +0000
Subject: [Bioperl-l] Error with supplied lineages importing uniprot data
In-Reply-To: <1348.130.49.222.58.1164925169.squirrel@webmail.cs.pitt.edu>
References: <1348.130.49.222.58.1164925169.squirrel@webmail.cs.pitt.edu>
Message-ID: <45706291.80201@sendu.me.uk>

pelikan at cs.pitt.edu wrote:
> Hello all,
> 
>  I'm running bioperl 1.5.2, bioperl-db 1.5.2 - RC005, under windows,
> without Cygwin. The "make test"s have all completed without error. This
> is my first time dealing with bioperl, so bear with me.
> 
>    I've successfully loaded the most recent taxonomy information using the
> biosql-schema scripts. After this, I attempted to load the most recent
> release of the uniprot flat file dataset with the following command:
> 
> load_seqdatabase.pl -drive mysql -dbname bioseqdb -dbuser root -dbpass
> ********* -format swiss -safe c:\data\uniprot\uniprot_sprot.dat
> 
> I am subsequently greeted by many of the following errors:
> 
> Could not store Q7N3Q6:

I extracted just Q7N3Q6 from 
ftp://ftp.expasy.org/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.dat.gz
and was able to load it in using load_seqdatabase.pl under linux with no 
errors. If you make a file with just that sequence do you still get the 
error?

Is anyone else able to reproduce the problem?


From cjfields at uiuc.edu  Fri Dec  1 17:57:18 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 1 Dec 2006 11:57:18 -0600
Subject: [Bioperl-l] Bioperl 1.5.2 RC5 install on WinXP ActivePerl 5.8.8.819
In-Reply-To: <45703985.5050203@sendu.me.uk>
Message-ID: <006301c71572$24be8830$15327e82@pyrimidine>


> Chris Fields wrote:
> PPM).  I can 
> > see using CPAN as an alternative way of installing Bioperl for a 
> > distribution, or as the primary method via CVS or manually, but not 
> > for distributions.  At least not until the kinks are worked out for 
> > Windows users.
> 
> CPAN isn't being suggested as the primary or preferred 
> installation method for Windows. That will still be PPM. I'm 
> mentioning CPAN / manual installation in the Windows INSTALL 
> docs for the benefit of anyone who wants a simple install and 
> test environment when checking out from CVS.

That's fine by me.  I think the focus is making sure the PPM works, but that
shouldn't hold up the final 1.5.2 release.  The PPM for previous releases
was never released concurrently with the distribution (if at all); it
generally followed a final release by a few weeks to a few months.

> > What are the significant issues for a bioperl PPM installation
> 
> None that I'm aware of - I just need to find the time to 
> start looking into generating an appropriate PPD. Hopefully 
> Nathan's wiki page on the subject will be all I need.

I'll try testing it out today and next week (the more people we have looking
into the issue the better).  I'm sure that Module::Build hasn't been updated to
use PPM4 XML formatting, but the tags are similar enough.  I can always
create a local PPM database using a similar directory structure to
bioperl.org/DIST and test an installation from it.

chris


From n.haigh at sheffield.ac.uk  Fri Dec  1 18:52:55 2006
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Fri, 01 Dec 2006 18:52:55 +0000
Subject: [Bioperl-l] Bioperl 1.5.2 RC5 install on WinXP ActivePerl 5.8.8.819
In-Reply-To: <006301c71572$24be8830$15327e82@pyrimidine>
References: <006301c71572$24be8830$15327e82@pyrimidine>
Message-ID: <45707A07.7000106@sheffield.ac.uk>

Chris Fields wrote:
>> Chris Fields wrote:
>> PPM).  I can 
>>     
>>> see using CPAN as an alternative way of installing Bioperl for a 
>>> distribution, or as the primary method via CVS or manually, but not 
>>> for distributions.  At least not until the kinks are worked out for 
>>> Windows users.
>>>       
>> CPAN isn't being suggested as the primary or preferred 
>> installation method for Windows. That will still be PPM. I'm 
>> mentioning CPAN / manual installation in the Windows INSTALL 
>> docs for the benefit of anyone who wants a simple install and 
>> test environment when checking out from CVS.
>>     
>
> That's fine by me.  I think the focus is making sure the PPM works, but that
> shouldn't hold up the final 1.5.2 release.  The PPM for previous releases
> was never released concurrently with the distribution (if at all); it
> generally followed by a few weeks to a few months past a final release.
>
>   
>>> What are the significant issues for a bioperl PPM installation
>>>       
>> None that I'm aware of - I just need to find the time to 
>> start looking into generating an appropriate PPD. Hopefully 
>> Nathan's wiki page on the subject will be all I need.
>>     
>
> I'll try testing it out today and next week (the more people we have looking
> into the issue the better).  I'm sure that Module::Build hasn't updated to
> using PPM4 XML formatting, but the tags are similar enough.  I can always
> create a local PPM database using a similar directory structure to
> bioperl.org/DIST and test an installation from it.
>
> chris
>   

To clarify a few things about PPM4 XML and to highlight the main 
differences:

1) The use of PROVIDE and REQUIRE tags.
2) PPM4 XML "should" contain PROVIDE tags for ALL bioperl modules.
3) VERSION in PROVIDE and REQUIRE tags should be floats, not
comma-separated tuples as in PPM3 XML.
4) The VERSION in PROVIDE and REQUIRE is used internally for version
comparisons when upgrading, resolving prereqs, etc.
5) Module names should all contain '::', either natively according to
their namespace or, if they don't have one natively, appended to the
end, e.g. "GD::".
6) The VERSION in the SOFTPKG key is for human readability only.
7) The NAME in SOFTPKG is used to identify which packages are actually
the same.
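
To illustrate points 3 and 4, here is a small Python sketch of why comparable float versions matter. It assumes, beyond what is stated above, that a PPM3-style tuple such as "1,5,2,0" maps to a float using the same convention as Perl's version.pm numify, where each component after the first is zero-padded to three digits, so "1,5,2,0" becomes 1.005002. The function name is hypothetical and this is not PPM4's own code.

```python
def ppm3_tuple_to_float(version):
    """Convert a PPM3-style comma-separated version (e.g. "1,5,2,0") to a float.

    Assumed convention (Perl version.pm style numification): the first
    component is the integer part and every later component is zero-padded
    to three decimal digits.
    """
    major, *rest = (int(part) for part in version.split(","))
    if any(part > 999 for part in rest):
        raise ValueError(f"component does not fit in 3 digits: {version!r}")
    return float(f"{major}." + "".join(f"{part:03d}" for part in rest))


if __name__ == "__main__":
    print(ppm3_tuple_to_float("1,5,2,0"))  # 1.005002
    # Comparing the raw tuple strings gets the order wrong...
    print("1,10,0" > "1,9,0")  # False
    # ...whereas the converted floats compare as intended.
    print(ppm3_tuple_to_float("1,10,0") > ppm3_tuple_to_float("1,9,0"))  # True
```
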

Nath




From bix at sendu.me.uk  Fri Dec  1 18:52:44 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 01 Dec 2006 18:52:44 +0000
Subject: [Bioperl-l] Error with supplied lineages importing uniprot data
In-Reply-To: <45706291.80201@sendu.me.uk>
References: <1348.130.49.222.58.1164925169.squirrel@webmail.cs.pitt.edu>
	<45706291.80201@sendu.me.uk>
Message-ID: <457079FC.7010209@sendu.me.uk>

Sendu Bala wrote:
> pelikan at cs.pitt.edu wrote:
[snip]
>> load_seqdatabase.pl -drive mysql -dbname bioseqdb -dbuser root -dbpass
>> ********* -format swiss -safe c:\data\uniprot\uniprot_sprot.dat
>>
>> I am subsequently greeted by many of the following errors:
>>
>> Could not store Q7N3Q6:
> 
> I extracted just Q7N3Q6 from 
> ftp://ftp.expasy.org/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.dat.gz
> and was able to load it in using load_seqdatabase.pl under linux with no 
> errors. If you make a file with just that sequence do you still get the 
> error?
> 
> Is anyone else able to reproduce the problem?

In fact, if I just try to load it again, I reproduce the problem.
The situation is similar to http://bugzilla.bioperl.org/show_bug.cgi?id=2092

And I have a tentative fix that extends Brian's fix there. Committed to 
HEAD only atm. I don't know anything about bioperl-db and don't have the 
faintest clue why this is happening, nor the time to figure it out. Can 
someone please have a proper look at this and decide if my fix is sane?

All I can say is that the test suites for bioperl-live and bioperl-db 
continue to pass, but that isn't really saying much.


PS. Having used load_seqdatabase.pl to load a sequence, how do I remove 
it afterwards?


From cjfields at uiuc.edu  Fri Dec  1 19:00:13 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 1 Dec 2006 13:00:13 -0600
Subject: [Bioperl-l] Error with supplied lineages importing uniprot data
In-Reply-To: <45706291.80201@sendu.me.uk>
References: <1348.130.49.222.58.1164925169.squirrel@webmail.cs.pitt.edu>
	<45706291.80201@sendu.me.uk>
Message-ID: <EAE311A7-DB66-4CFC-9598-EA6FCAED9B7F@uiuc.edu>


On Dec 1, 2006, at 11:12 AM, Sendu Bala wrote:

> pelikan at cs.pitt.edu wrote:
>> Hello all,
>>
>>  I'm running bioperl 1.5.2, bioperl-db 1.5.2 - RC005, under windows,
>> without Cygwin. The "make test"s have all completed without error. This
>> is my first time dealing with bioperl, so bear with me.
>>
>>    I've successfully loaded the most recent taxonomy information using the
>> biosql-schema scripts. After this, I attempted to load the most recent
>> release of the uniprot flat file dataset with the following command:
>>
>> load_seqdatabase.pl -drive mysql -dbname bioseqdb -dbuser root -dbpass
>> ********* -format swiss -safe c:\data\uniprot\uniprot_sprot.dat
>>
>> I am subsequently greeted by many of the following errors:
>>
>> Could not store Q7N3Q6:
>
> I extracted just Q7N3Q6 from
> ftp://ftp.expasy.org/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.dat.gz
> and was able to load it in using load_seqdatabase.pl under linux with no
> errors. If you make a file with just that sequence do you still get the
> error?
>
> Is anyone else able to reproduce the problem?

I can reproduce on both WinXP and Mac OS X using the latest
bioperl-db/bioperl-live and a BioSQL database preloaded with taxonomy.
Notably, the bug doesn't show up with a database lacking taxonomy,
where no lookup is used (I guess).

Here's some overly verbose debugging (apologies):

Loading saved.flat ...
attempting to load adaptor class for Bio::Seq::RichSeq
	attempting to load module Bio::DB::BioSQL::RichSeqAdaptor
attempting to load adaptor class for Bio::Seq
	attempting to load module Bio::DB::BioSQL::SeqAdaptor
instantiating adaptor class Bio::DB::BioSQL::SeqAdaptor
attempting to load adaptor class for Bio::Species
	attempting to load module Bio::DB::BioSQL::SpeciesAdaptor
instantiating adaptor class Bio::DB::BioSQL::SpeciesAdaptor
attempting to load adaptor class for Bio::Tree::Tree
	attempting to load module Bio::DB::BioSQL::TreeAdaptor
attempting to load adaptor class for Bio::Root::Root
	attempting to load module Bio::DB::BioSQL::RootAdaptor
attempting to load adaptor class for Bio::Root::RootI
	attempting to load module Bio::DB::BioSQL::RootIAdaptor
	attempting to load module Bio::DB::BioSQL::RootAdaptor
attempting to load adaptor class for Bio::Tree::TreeI
	attempting to load module Bio::DB::BioSQL::TreeIAdaptor
	attempting to load module Bio::DB::BioSQL::TreeAdaptor
attempting to load adaptor class for Bio::Tree::NodeI
	attempting to load module Bio::DB::BioSQL::NodeIAdaptor
	attempting to load module Bio::DB::BioSQL::NodeAdaptor
attempting to load adaptor class for Bio::Tree::TreeFunctionsI
	attempting to load module Bio::DB::BioSQL::TreeFunctionsIAdaptor
	attempting to load module Bio::DB::BioSQL::TreeFunctionsAdaptor
no adaptor found for class Bio::Tree::Tree
attempting to load adaptor class for Bio::DB::Taxonomy::list
	attempting to load module Bio::DB::BioSQL::listAdaptor
attempting to load adaptor class for Bio::DB::Taxonomy
	attempting to load module Bio::DB::BioSQL::TaxonomyAdaptor
no adaptor found for class Bio::DB::Taxonomy::list
attempting to load adaptor class for Bio::Annotation::Collection
	attempting to load module Bio::DB::BioSQL::CollectionAdaptor
attempting to load adaptor class for Bio::AnnotationCollectionI
	attempting to load module Bio::DB::BioSQL::AnnotationCollectionIAdaptor
	attempting to load module Bio::DB::BioSQL::AnnotationCollectionAdaptor
instantiating adaptor class Bio::DB::BioSQL::AnnotationCollectionAdaptor
attempting to load adaptor class for Bio::Annotation::TypeManager
	attempting to load module Bio::DB::BioSQL::TypeManagerAdaptor
no adaptor found for class Bio::Annotation::TypeManager
attempting to load adaptor class for Bio::Annotation::SimpleValue
	attempting to load module Bio::DB::BioSQL::SimpleValueAdaptor
instantiating adaptor class Bio::DB::BioSQL::SimpleValueAdaptor
attempting to load adaptor class for Bio::Annotation::Reference
	attempting to load module Bio::DB::BioSQL::ReferenceAdaptor
instantiating adaptor class Bio::DB::BioSQL::ReferenceAdaptor
attempting to load adaptor class for Bio::Annotation::Comment
	attempting to load module Bio::DB::BioSQL::CommentAdaptor
instantiating adaptor class Bio::DB::BioSQL::CommentAdaptor
attempting to load adaptor class for Bio::Annotation::DBLink
	attempting to load module Bio::DB::BioSQL::DBLinkAdaptor
instantiating adaptor class Bio::DB::BioSQL::DBLinkAdaptor
attempting to load adaptor class for Bio::PrimarySeq
	attempting to load module Bio::DB::BioSQL::PrimarySeqAdaptor
instantiating adaptor class Bio::DB::BioSQL::PrimarySeqAdaptor
attempting to load adaptor class for Bio::SeqFeature::Generic
	attempting to load module Bio::DB::BioSQL::GenericAdaptor
attempting to load adaptor class for Bio::SeqFeatureI
	attempting to load module Bio::DB::BioSQL::SeqFeatureIAdaptor
	attempting to load module Bio::DB::BioSQL::SeqFeatureAdaptor
instantiating adaptor class Bio::DB::BioSQL::SeqFeatureAdaptor
attempting to load adaptor class for Bio::Location::Simple
	attempting to load module Bio::DB::BioSQL::SimpleAdaptor
attempting to load adaptor class for Bio::Location::Atomic
	attempting to load module Bio::DB::BioSQL::AtomicAdaptor
attempting to load adaptor class for Bio::LocationI
	attempting to load module Bio::DB::BioSQL::LocationIAdaptor
	attempting to load module Bio::DB::BioSQL::LocationAdaptor
instantiating adaptor class Bio::DB::BioSQL::LocationAdaptor
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Tree::Tree
no adaptor found for class Bio::DB::Taxonomy::list
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
attempting to load adaptor class for BioNamespace
	attempting to load module Bio::DB::BioSQL::BioNamespaceAdaptor
instantiating adaptor class Bio::DB::BioSQL::BioNamespaceAdaptor
no adaptor found for class Bio::Tree::Tree
no adaptor found for class Bio::DB::Taxonomy::list
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
attempting to load driver for adaptor class Bio::DB::BioSQL::BioNamespaceAdaptor
attempting to load driver for adaptor class Bio::DB::BioSQL::BasePersistenceAdaptor
Using Bio::DB::BioSQL::mysql::BasePersistenceAdaptorDriver as driver peer for Bio::DB::BioSQL::BioNamespaceAdaptor
preparing UK select statement: SELECT biodatabase.biodatabase_id, biodatabase.name, biodatabase.authority FROM biodatabase WHERE name = ?
BioNamespaceAdaptor: binding UK column 1 to "Swiss-Prot" (namespace)
preparing INSERT statement: INSERT INTO biodatabase (name, authority) VALUES (?, ?)
BioNamespaceAdaptor::insert: binding column 1 to "Swiss-Prot" (namespace)
BioNamespaceAdaptor::insert: binding column 2 to "" (authority)
no adaptor found for class Bio::Tree::Tree
no adaptor found for class Bio::DB::Taxonomy::list
attempting to load driver for adaptor class Bio::DB::BioSQL::SpeciesAdaptor
Using Bio::DB::BioSQL::mysql::SpeciesAdaptorDriver as driver peer for Bio::DB::BioSQL::SpeciesAdaptor
preparing UK select statement: SELECT taxon_name.taxon_id, NULL, NULL, taxon.ncbi_taxon_id, taxon_name.name, NULL FROM taxon, taxon_name WHERE taxon.taxon_id = taxon_name.taxon_id AND name_class = ? AND ncbi_taxon_id = ?
SpeciesAdaptor: binding UK column 1 to "scientific name" (name_class)
SpeciesAdaptor: binding UK column 2 to "141679" (ncbi_taxid)
prepare SELECT CLASSIFICATION: SELECT name.name, node.node_rank FROM taxon node, taxon taxon, taxon_name name WHERE name.taxon_id = node.taxon_id AND taxon.left_value BETWEEN node.left_value AND node.right_value AND taxon.taxon_id = ? AND name.name_class = 'scientific name' ORDER BY node.left_value
attempting to load driver for adaptor class Bio::DB::BioSQL::SeqAdaptor
attempting to load driver for adaptor class Bio::DB::BioSQL::PrimarySeqAdaptor
attempting to load driver for adaptor class Bio::DB::BioSQL::BasePersistenceAdaptor
Using Bio::DB::BioSQL::mysql::BasePersistenceAdaptorDriver as driver peer for Bio::DB::BioSQL::SeqAdaptor
Could not store Q7N3Q6:
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: The supplied lineage does not start near 'Photorhabdus luminescens subsp. laumondii'
STACK: Error::throw
STACK: Bio::Root::Root::throw /Users/cjfields/src/bioperl-live/Bio/Root/Root.pm:359
STACK: Bio::Species::classification /Users/cjfields/src/bioperl-live/Bio/Species.pm:166
STACK: Bio::DB::Persistent::PersistentObject::AUTOLOAD /Library/Perl/5.8.6/Bio/DB/Persistent/PersistentObject.pm:552
STACK: Bio::DB::BioSQL::SpeciesAdaptor::populate_from_row /Library/Perl/5.8.6/Bio/DB/BioSQL/SpeciesAdaptor.pm:281
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::_build_object /Library/Perl/5.8.6/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:1305
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key /Library/Perl/5.8.6/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:973
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key /Library/Perl/5.8.6/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:852
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /Library/Perl/5.8.6/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:182
STACK: Bio::DB::Persistent::PersistentObject::create /Library/Perl/5.8.6/Bio/DB/Persistent/PersistentObject.pm:244
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /Library/Perl/5.8.6/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:169
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store /Library/Perl/5.8.6/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251
STACK: Bio::DB::Persistent::PersistentObject::store /Library/Perl/5.8.6/Bio/DB/Persistent/PersistentObject.pm:271
STACK: load_seqdatabase.pl:620
-----------------------------------------------------------

at load_seqdatabase.pl line 633


chris


From cjfields at uiuc.edu  Fri Dec  1 19:01:59 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 1 Dec 2006 13:01:59 -0600
Subject: [Bioperl-l] Bioperl 1.5.2 RC5 install on WinXP ActivePerl 5.8.8.819
In-Reply-To: <45707A07.7000106@sheffield.ac.uk>
References: <006301c71572$24be8830$15327e82@pyrimidine>
	<45707A07.7000106@sheffield.ac.uk>
Message-ID: <C233572F-BD36-4DBE-BE9B-2C097F4C939B@uiuc.edu>


On Dec 1, 2006, at 12:52 PM, Nathan S. Haigh wrote:

> Chris Fields wrote:
>>> Chris Fields wrote:
>>> PPM).  I can
>>>> see using CPAN as an alternative way of installing Bioperl for a  
>>>> distribution, or as the primary method via CVS or manually, but  
>>>> not for distributions.  At least not until the kinks are worked  
>>>> out for Windows users.
>>>>
>>> CPAN isn't being suggested as the primary or preferred  
>>> installation method for Windows. That will still be PPM. I'm  
>>> mentioning CPAN / manual installation in the Windows INSTALL docs  
>>> for the benefit of anyone who wants a simple install and test  
>>> environment when checking out from CVS.
>>>
>>
>> That's fine by me.  I think the focus is making sure the PPM  
>> works, but that
>> shouldn't hold up the final 1.5.2 release.  The PPM for previous  
>> releases
>> was never released concurrently with the distribution (if at all); it
>> generally followed by a few weeks to a few months past a final  
>> release.
>>
>>
>>>> What are the significant issues for a bioperl PPM installation
>>>>
>>> None that I'm aware of - I just need to find the time to start  
>>> looking into generating an appropriate PPD. Hopefully Nathan's  
>>> wiki page on the subject will be all I need.
>>>
>>
>> I'll try testing it out today and next week (the more people we  
>> have looking
>> into the issue the better).  I'm sure that Module::Build hasn't  
>> updated to
>> using PPM4 XML formatting, but the tags are similar enough.  I can  
>> always
>> create a local PPM database using a similar directory structure to
>> bioperl.org/DIST and test an installation from it.
>>
>> chris
>>
>
> To clarify a few things about PPM4 XML and to highlight the main  
> differences:
>
> 1) The use of PROVIDE and REQUIRE tags
> 2) PPM4 XML "should" contain PROVIDE tags for ALL bioperl modules.
> 3) VERSION in PROVIDE and REQUIRE tags should be floats, not comma  
> separated tuples like PPM3 XML
> 4) the VERSION in PROVIDE and REQUIRE are used internally to do  
> version comparisons for upgrades and solving prereqs etc
> 5) Module names should all contain '::' either natively according  
> their namespace, if it doesn't have one natively, then one is  
> appended to the end e.g. "GD::"
> 6) the VERSION in the SOFTPKG key is for human readability only
> 7) the NAME in SOFTPKG is used to identify which packages are  
> actually the same.
>
> Nath

Okay.  Maybe place this in the wiki (PPM page) for future reference?

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From n.haigh at sheffield.ac.uk  Fri Dec  1 19:05:38 2006
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Fri, 01 Dec 2006 19:05:38 +0000
Subject: [Bioperl-l] Bioperl 1.5.2 RC5 install on WinXP ActivePerl 5.8.8.819
In-Reply-To: <006301c71572$24be8830$15327e82@pyrimidine>
References: <006301c71572$24be8830$15327e82@pyrimidine>
Message-ID: <45707D02.9070504@sheffield.ac.uk>

Chris Fields wrote:
>> Chris Fields wrote:
>> PPM).  I can 
>>     
>>> see using CPAN as an alternative way of installing Bioperl for a 
>>> distribution, or as the primary method via CVS or manually, but not 
>>> for distributions.  At least not until the kinks are worked out for 
>>> Windows users.
>>>       
>> CPAN isn't being suggested as the primary or preferred 
>> installation method for Windows. That will still be PPM. I'm 
>> mentioning CPAN / manual installation in the Windows INSTALL 
>> docs for the benefit of anyone who wants a simple install and 
>> test environment when checking out from CVS.
>>     
>
> That's fine by me.  I think the focus is making sure the PPM works, but that
> shouldn't hold up the final 1.5.2 release.  The PPM for previous releases
> was never released concurrently with the distribution (if at all); it
> generally followed a final release by a few weeks to a few months.
>
>   

Forgot to say: one really annoying thing about PPM is that it seems to
display all the versions of Bioperl defined in the XML file. In
addition, I think a bug in PPM4 means that if a package is available in
ActiveState's repo, PPM4 always wants to install it rather than a more
recent version in a different repo (this includes upgrades). This
results in the following annoying behaviour:
1) If the ActiveState and Bioperl repos are both active, searching for
bioperl lists several versions.
2) If you are using the PPM4 GUI and have installed a non-ActiveState
version, it says you can upgrade to the version in ActiveState's
repo (even if it's actually a downgrade).
3) Using ppm-shell, if you choose "install bioperl" or "upgrade bioperl",
it will always install the version in the ActiveState repo.
4) I'm sure there are also some other annoyances.

In the end, this means the best way to install and upgrade bioperl is to
search for bioperl packages and install the latest version by eye rather
than relying on the "upgrade" feature (at least for the time being). You
may also need to remove an old version of bioperl before installing a
more recent version. NOTE: using "upgrade" runs the risk of installing
bioperl 1.2.3 from ActiveState rather than the latest version in another repo!
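
Picking the latest version "by eye" amounts to comparing dotted versions component by component across repositories. A minimal Perl sketch of that comparison follows; the helper names and the repository labels in the usage line are hypothetical, and this is not how PPM4 itself resolves versions.

```perl
use strict;
use warnings;

# Compare two dotted version strings component by component
# ("1.5.10" is newer than "1.5.2"; missing components count as 0).
sub version_cmp {
    my @x = split /\./, $_[0];
    my @y = split /\./, $_[1];
    while (@x || @y) {
        my $cmp = (shift(@x) // 0) <=> (shift(@y) // 0);
        return $cmp if $cmp;
    }
    return 0;
}

# Hypothetical helper: given [repository, version] pairs as listed by a
# PPM search, return the pair carrying the newest version.
sub newest_package {
    my @sorted = sort { version_cmp($b->[1], $a->[1]) } @_;
    return $sorted[0];
}

my $pick = newest_package([ 'ActiveState', '1.2.3' ], [ 'bioperl.org', '1.5.2' ]);
print "Install bioperl $pick->[1] from $pick->[0]\n";
```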

I'll update the wiki when I have time.
Nath


>>> What are the significant issues for a bioperl PPM installation?
>>>       
>> None that I'm aware of - I just need to find the time to 
>> start looking into generating an appropriate PPD. Hopefully 
>> Nathan's wiki page on the subject will be all I need.
>>     
>
> I'll try testing it out today and next week (the more people we have looking
> into the issue the better).  I'm sure that Module::Build hasn't updated to
> using PPM4 XML formatting, but the tags are similar enough.  I can always
> create a local PPM database using a similar directory structure to
> bioperl.org/DIST and test an installation from it.
>
> chris
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at uiuc.edu  Fri Dec  1 19:06:53 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 1 Dec 2006 13:06:53 -0600
Subject: [Bioperl-l] Error with supplied lineages importing uniprot data
In-Reply-To: <45706291.80201@sendu.me.uk>
References: <1348.130.49.222.58.1164925169.squirrel@webmail.cs.pitt.edu>
	<45706291.80201@sendu.me.uk>
Message-ID: <0B67001A-9642-422E-A9FB-C9611004510E@uiuc.edu>


On Dec 1, 2006, at 11:12 AM, Sendu Bala wrote:

> pelikan at cs.pitt.edu wrote:
>> Hello all,
>>
>>  I'm running bioperl 1.5.2, bioperl-db 1.5.2 - RC005, under windows,
>> without Cygwin. The "make test"s have all completed without error. This
>> is my first time dealing with bioperl, so bear with me.
>>
>>    I've successfully loaded the most recent taxonomy information using the
>> biosql-schema scripts. After this, I attempted to load the most recent
>> release of the uniprot flat file dataset with the following command:
>>
>> load_seqdatabase.pl -drive mysql -dbname bioseqdb -dbuser root -dbpass
>> ********* -format swiss -safe c:\data\uniprot\uniprot_sprot.dat
>>
>> I am subsequently greeted by many of the following errors:
>>
>> Could not store Q7N3Q6:
>
> I extracted just Q7N3Q6 from
> ftp://ftp.expasy.org/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.dat.gz
> and was able to load it in using load_seqdatabase.pl under linux with no
> errors. If you make a file with just that sequence do you still get the
> error?
>
> Is anyone else able to reproduce the problem?

Okay, just updated to get your latest CVS fixes for bioperl-live and  
it passes now for both Mac OS X and WinXP.

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Fri Dec  1 19:09:15 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 1 Dec 2006 13:09:15 -0600
Subject: [Bioperl-l] Error with supplied lineages importing uniprot data
In-Reply-To: <457079FC.7010209@sendu.me.uk>
References: <1348.130.49.222.58.1164925169.squirrel@webmail.cs.pitt.edu>
	<45706291.80201@sendu.me.uk> <457079FC.7010209@sendu.me.uk>
Message-ID: <A85B86B9-3DCD-4855-AC06-675D19E3689E@uiuc.edu>


On Dec 1, 2006, at 12:52 PM, Sendu Bala wrote:

>
> PS. having used load_seqdatabase.pl to load a sequence, how do I remove
> it afterwards?

There's not much documentation on it, but it's demonstrated several
times in the test suite.

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From bix at sendu.me.uk  Fri Dec  1 19:39:17 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 01 Dec 2006 19:39:17 +0000
Subject: [Bioperl-l] Error with supplied lineages importing uniprot data
In-Reply-To: <0B67001A-9642-422E-A9FB-C9611004510E@uiuc.edu>
References: <1348.130.49.222.58.1164925169.squirrel@webmail.cs.pitt.edu>
	<45706291.80201@sendu.me.uk>
	<0B67001A-9642-422E-A9FB-C9611004510E@uiuc.edu>
Message-ID: <457084E5.2050300@sendu.me.uk>

Chris Fields wrote:
> 
> On Dec 1, 2006, at 11:12 AM, Sendu Bala wrote:
> 
>> pelikan at cs.pitt.edu wrote:
>>> Hello all,
>>>
>>>  I'm running bioperl 1.5.2, bioperl-db 1.5.2 - RC005, under windows,
>>> without Cygwin. The "make test"s have all completed without error. This
>>> is my first time dealing with bioperl, so bear with me.
>>>
>>>    I've successfully loaded the most recent taxonomy information 
>>> using the
>>> biosql-schema scripts. After this, I attempted to load the most recent
>>> release of the uniprot flat file dataset with the following command:
>>>
>>> load_seqdatabase.pl -drive mysql -dbname bioseqdb -dbuser root -dbpass
>>> ********* -format swiss -safe c:\data\uniprot\uniprot_sprot.dat
>>>
>>> I am subsequently greeted by many of the following errors:
>>>
>>> Could not store Q7N3Q6:
>>
>> I extracted just Q7N3Q6 from
>> ftp://ftp.expasy.org/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.dat.gz 
>>
>> and was able to load it in using load_seqdatabase.pl under linux with no
>> errors. If you make a file with just that sequence do you still get the
>> error?
>>
>> Is anyone else able to reproduce the problem?
> 
> Okay, just updated to get your latest CVS fixes for bioperl-live and it 
> passes now for both Mac OS X and WinXP.

Can you confirm that it is actually working correctly, though? That is, having
stored a previously problematic sequence, can you get it back out of the
database, and is its ->species() correct?


From cjfields at uiuc.edu  Fri Dec  1 19:52:13 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 1 Dec 2006 13:52:13 -0600
Subject: [Bioperl-l] Error with supplied lineages importing uniprot data
In-Reply-To: <457084E5.2050300@sendu.me.uk>
Message-ID: <000001c71582$329d4d50$15327e82@pyrimidine>

> > 
> > Okay, just updated to get your latest CVS fixes for bioperl-live and
> > it passes now for both Mac OS X and WinXP.
> 
> Can you confirm if it is actually working correctly though?
> Like, having stored a previously-problem sequence, can you
> get it back out from the database and is its ->species() correct?

I would assume so, if we can trust the species tests.  I will have to try it
again over the weekend.  I planned on loading a ton of protein sequences in
anyway, most of which are bacterial; if anything breaks it will probably be
with those.

I think Jason and Hilmar were going to get together about the BioSQL paper
at the hackathon.  That may be a good place to bring up some of the species
issues, if they persist.

chris


From hlapp at gmx.net  Sat Dec  2 01:42:05 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri, 1 Dec 2006 20:42:05 -0500
Subject: [Bioperl-l] Error with supplied lineages importing uniprot data
In-Reply-To: <457079FC.7010209@sendu.me.uk>
References: <1348.130.49.222.58.1164925169.squirrel@webmail.cs.pitt.edu>
	<45706291.80201@sendu.me.uk> <457079FC.7010209@sendu.me.uk>
Message-ID: <8414723F-BA02-4936-8F53-781276C3B526@gmx.net>

Either using SQL:

	-- theoretically you should convince yourself first that there
	-- is only one such record (the unique key is over accession,
	-- version and namespace)
	SQL> DELETE FROM bioentry WHERE accession = 'Q7N3Q6';

or through bioperl-db (see the delete test for examples):

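	use Bio::DB::BioDB;
	use Bio::PrimarySeq;

	my $db = Bio::DB::BioDB->new(....);
	# template object carrying the unique key (accession + namespace)
	my $seq = Bio::PrimarySeq->new(-accession_number => 'Q7N3Q6',
	                               -namespace => 'whatever you used when loading');
	my $adp = $db->get_persistence_adaptor($seq);
	# look up the stored entry by its unique key
	my $pseq = $adp->find_by_unique_key($seq)
	    or die "Q7N3Q6 not found in the database\n";
	$pseq->remove();   # delete the entry
	$pseq->commit();   # commit the transaction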

-hilmar

On Dec 1, 2006, at 1:52 PM, Sendu Bala wrote:

> PS. having used load_seqdatabase.pl to load a sequence, how do I remove
> it afterwards?

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From chhalling at verizon.net  Mon Dec  4 01:56:51 2006
From: chhalling at verizon.net (Conrad Halling)
Date: Sun, 03 Dec 2006 20:56:51 -0500
Subject: [Bioperl-l] BioPerl Wiki is down
Message-ID: <45738063.1070504@verizon.net>

When I attempted to navigate to http://www.bioperl.org/, I got the 
following message:

A database query syntax error has occurred. This may indicate a bug in 
the software. The last attempted database query was:

    (SQL query hidden)

from within function "MediaWikiBagOStuff::_doquery". MySQL returned 
error "1205: Lock wait timeout exceeded; try restarting transaction 
(localhost)".

-- 
Conrad Halling
chhalling at verizon.net


From rbirnie at totalise.co.uk  Sun Dec  3 21:38:02 2006
From: rbirnie at totalise.co.uk (richard)
Date: Sun, 3 Dec 2006 21:38:02 +0000
Subject: [Bioperl-l] confused by Bio::Graphics
Message-ID: <200612032138.02522.rbirnie@totalise.co.uk>

Hi all,

I'm having a little trouble getting Bio::Graphics to give me the correct 
output and I'm looking for some help. I am trying to extend from example 5 of 
the Graphics HOWTO on the bioperl wiki using version 1.4 of Bioperl. 
Eventually I intend the script to follow example 6 but I thought I'd try the 
simpler version first.

The basic aim of the script is that it takes as input a file containing a list 
of GenBank IDs plus some other info for alternative transcripts of a gene. 
This information is stored in a hash and the GenBank IDs are used to retrieve 
the appropriate entries from GenBank. I then want to use Bio::Graphics to 
generate a figure from the feature tables showing the CDSs from the 
alternative transcripts. 
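
Since the input file is tab-delimited with a header row (see the attached input file at the end of this message), loading it into a hash keyed by GenBank ID might look something like the minimal Perl sketch below. It is not the attached script, which was scrubbed from the archive; the read_transcripts name and the per-column hash layout are illustrative assumptions.

```perl
use strict;
use warnings;

# Hypothetical sketch: load the tab-delimited transcript list (header row:
# sequence_ID, Exon_Boundary, Assay_location, Amplicon_length) into a hash
# keyed by GenBank ID, ready for fetching each entry from GenBank.
sub read_transcripts {
    my ($fh) = @_;
    my $header = <$fh>;
    chomp $header;
    my @cols = split /\t/, $header;
    my %main;
    while (my $line = <$fh>) {
        chomp $line;
        next unless $line =~ /\S/;               # skip blank lines
        my %row;
        @row{@cols} = split /\t/, $line;         # column name => value
        $main{ $row{sequence_ID} } = \%row;      # one record per transcript
    }
    return \%main;
}

# Usage with two rows from the attached input file, read from a string
my $data = "sequence_ID\tExon_Boundary\tAssay_location\tAmplicon_length\n"
         . "NM_006017\t9 - 10\t1118\t106\n"
         . "AY449689.1\t8 - 9\t1054\t106\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my $main = read_transcripts($fh);
print "$_ assay at $main->{$_}{Assay_location}\n" for sort keys %$main;
```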

So far I have managed to retrieve the GenBank entries, extract the feature
tables and store a reference to these in the hash mentioned above. I've also
got Bio::Graphics to draw a basic image but some of the details aren't right 
and I don't understand why. I have attached the code I have so far, the input 
file and the output image to this mail. I didn't want to display it all in 
the main message but I'm not actually sure which bit is causing the problem. 
The code is very rough and in need of polishing but I need to get it to work 
correctly first.

These are the problems:
1) As I understand it this:

my $wholeseq = Bio::SeqFeature::Generic->new (
		-start => 1,
		-end => $refseq->length,
		-display_name =>$refseq->display_name
		);

should display the name of the gene (CD133/Prominin1) near the top of the image.
It doesn't, am I misunderstanding or is there an error in the code?

2) In the quoted example the CDS is broken up into smaller regions which are
then linked together in example 6. This isn't happening in my code and I
think it should be; I get one solid block for the CDS. I don't understand why,
because I'm not clear which parts of the feature table are used to
define where the CDS should be split. I think this is the relevant bit of
code:

foreach my $alt_trans (keys %main) {
	foreach my $tag (keys %{ $main{$alt_trans}{'features'} }) {

		my $feature = $main{$alt_trans}{'features'}{$tag};

		$panel->add_track($feature,
				-glyph => 'generic',
				-bgcolor => $colors[$idx++ % @colors],
				-fgcolor => 'black',
				-font2color => 'black',
				-key => $alt_trans,
				-bump => +1,
				-height => 8,
				-label => 1,
				-description => 1,
				) if ($tag eq 'CDS');

}
}

Can anyone tell me what I am doing wrong?

RefSeq entry for the gene of interest is here:
http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=5174386
If I understand correctly the example file used in the HOWTO is this gene:
http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=116805320

Final question: does bioperl come with example scripts and, if so, where would
they normally be found on a Linux system?

If anyone is still reading this, thanks for your patience. Any clarification
will be appreciated.

regards,
Richard
-------------- next part --------------
A non-text attachment was scrubbed...
Name: CD133_graphic_code
Type: application/x-perl
Size: 2702 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061203/d5bd52ae/attachment.pl>
-------------- next part --------------
sequence_ID	Exon_Boundary	Assay_location	Amplicon_length
NM_006017	9 - 10	1118	106
AF027208.1	9 - 10	1118	106
AK027420.1	9 - 10	1312	106
AK027422.1	9 - 10	1334	106
BC012089.1	9 - 10	1289	106
AY449689.1	8 - 9	1054	106
AY449690.1	8 - 9	1054	106
AY449691.1	8 - 9	1054	106
AY449692.1	9 - 10	1081	106
AY449693.1	9 - 10	1081	106
AF507034.1	8 - 9	1091	106
AK075411.1	9 - 10	1289	106
AF117225.1	9 - 10	1334	106
AK226033.1	-	1312	106
DQ895452.1	-	1054	106
-------------- next part --------------
A non-text attachment was scrubbed...
Name: CD133.png
Type: image/png
Size: 4322 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061203/d5bd52ae/attachment-0004.png>

From cjfields at uiuc.edu  Mon Dec  4 03:35:17 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 3 Dec 2006 21:35:17 -0600
Subject: [Bioperl-l] BioPerl Wiki is down
In-Reply-To: <45738063.1070504@verizon.net>
References: <45738063.1070504@verizon.net>
Message-ID: <41422FC7-B579-4B45-B8CC-341B8F462BCB@uiuc.edu>

On Dec 3, 2006, at 7:56 PM, Conrad Halling wrote:

> When I attempted to navigate to http://www.bioperl.org/, I got the
> following message:
>
> A database query syntax error has occurred. This may indicate a bug in
> the software. The last attempted database query was:
>
>     (SQL query hidden)
>
> from within function "MediaWikiBagOStuff::_doquery". MySQL returned
> error "1205: Lock wait timeout exceeded; try restarting transaction
> (localhost)".
>
> -- Conrad Halling
> chhalling at verizon.net

This has been an ongoing problem with the server; I have reported it  
previously to open-bio support.  There have been a few attempts to  
fix it which seem to work short-term but something else must be  
wrong.  Jason?  Chris D?

For my part, Googling found the following link, which indicates that  
this error may be due to heavy server load:

http://tibia.erig.net/TibiaWiki:Bug_reports

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From Derek.Fairley at bll.n-i.nhs.uk  Mon Dec  4 10:18:37 2006
From: Derek.Fairley at bll.n-i.nhs.uk (Fairley, Derek)
Date: Mon, 4 Dec 2006 10:18:37 -0000
Subject: [Bioperl-l] confused by Bio::Graphics
In-Reply-To: <200612032138.02522.rbirnie@totalise.co.uk>
Message-ID: <B4B8F9CCEDA9334F819017E5D711AD1C32C63D@bllmail.bll.n-i.nhs.uk>

Richard,

You can find instructions for installing the example scripts directory
here:

http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix#INSTALLING_BIOPERL_SCRIPTS

or you can get individual scripts from here:

http://www.bioperl.org/wiki/Bioperl_scripts11

Derek.

 
From rbirnie at totalise.co.uk  Mon Dec  4 09:30:36 2006
From: rbirnie at totalise.co.uk (rbirnie at totalise.co.uk)
Date: 04 Dec 2006 09:30:36 +0000
Subject: [Bioperl-l] confused by Bio::Graphics
In-Reply-To: <mailman.161.1165197640.2417.bioperl-l@lists.open-bio.org>
References: <mailman.161.1165197640.2417.bioperl-l@lists.open-bio.org>
Message-ID: <BV.WM.2.0.pv.1.0.16.0612040930360.48622@webm7.global.net.uk>

An HTML attachment was scrubbed...
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061204/551f1442/attachment-0004.html>

From bix at sendu.me.uk  Mon Dec  4 14:37:16 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 04 Dec 2006 14:37:16 +0000
Subject: [Bioperl-l] BLASTing with a seqio/seq object...
In-Reply-To: <45706671.9000201@york.ac.uk>
References: <01ba01c714a2$b9659c10$15327e82@pyrimidine>	<456F27E9.70205@york.ac.uk>
	<456FEF22.4090004@sendu.me.uk> <45706671.9000201@york.ac.uk>
Message-ID: <4574329C.2030905@sendu.me.uk>

Samantha Thompson wrote:
> Hi,
> Thanks for all your help so far, I am still trying to understand a 
> couple of things...

You should make sure your replies are sent to the list, as you're likely 
to get a faster response.


[where $blast_report is the value returned by 
Bio::Tools::Run::RemoteBlast->submit_blast($seq_object)]
> when I run this line..
> 
> $searchio = Bio::SearchIO->new(-format => 'blast',
>                                -file   => $blast_report);
> 
> between submitting the blast search and trying to process the searchio object like I was attempting before I get the following errors back:
> 
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Could not open 1: No such file or directory
[snip]
> Does this mean that my BLAST is failing when I submit it?

No, the -file option of SearchIO->new() takes, unsurprisingly, a 
filename. I'd tell you to pay careful attention to the docs, but sadly 
the RemoteBlast docs are currently wrong.

submit_blast() claims to return 'Blast report object' (which in any case 
certainly wouldn't be a filename) when in fact it returns, as you 
discovered, a (for our purposes) meaningless number.

As I suggested before, you need to look at the synopsis for 
Bio::Tools::Run::RemoteBlast instead.

(Having called submit_blast(), you must then do the each_rid() loop from that
synopsis, calling retrieve_blast() on each RID until the result is ready.)


Does anyone care to go through the POD for RemoteBlast and update it to 
an accurate state?


From bix at sendu.me.uk  Mon Dec  4 14:40:27 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 04 Dec 2006 14:40:27 +0000
Subject: [Bioperl-l] confused by Bio::Graphics
In-Reply-To: <BV.WM.2.0.pv.1.0.16.0612040930360.48622@webm7.global.net.uk>
References: <mailman.161.1165197640.2417.bioperl-l@lists.open-bio.org>
	<BV.WM.2.0.pv.1.0.16.0612040930360.48622@webm7.global.net.uk>
Message-ID: <4574335B.805@sendu.me.uk>

rbirnie at totalise.co.uk wrote:
> Hi all,
> 
> I've just seen my previous mail come through on the digest and I noticed 
> that the code I attached has been scrubbed which means that the message 
> won't make much sense. If I've contravened list rules by posting 
> attachments then apologies, I did look for a posting guide but couldn't 
> see one on the wiki. I deliberately didn't put the whole code in the 
> main message because it's quite long. I'm not sure which part is wrong 
> so I don't know which part to post I'm just not seeing the output I 
> would expect from the example. What is the best thing for me to do?

I saw a few attachments on your post (including your code example), so I 
think what you did was fine.


From cjfields at uiuc.edu  Mon Dec  4 15:40:20 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 4 Dec 2006 09:40:20 -0600
Subject: [Bioperl-l] confused by Bio::Graphics
In-Reply-To: <4574335B.805@sendu.me.uk>
Message-ID: <002001c717ba$823c1500$15327e82@pyrimidine>


> rbirnie at totalise.co.uk wrote:
> > Hi all,
> > 
> > I've just seen my previous mail come through on the digest and I
> > noticed that the code I attached has been scrubbed which means that
> > the message won't make much sense. If I've contravened list rules by
> > posting attachments then apologies, I did look for a posting guide but
> > couldn't see one on the wiki. I deliberately didn't put the whole code
> > in the main message because it's quite long. I'm not sure which part
> > is wrong so I don't know which part to post I'm just not seeing the
> > output I would expect from the example. What is the best thing for me to do?
> 
> I saw a few attachments on your post (including your code
> example), so I think what you did was fine.

Same here.  I received a PNG file and two text files (a script and a data
file).

chris

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 

 
From rbirnie at totalise.co.uk  Mon Dec  4 16:06:51 2006
From: rbirnie at totalise.co.uk (rbirnie at totalise.co.uk)
Date: 04 Dec 2006 16:06:51 +0000
Subject: [Bioperl-l] confused by Bio::Graphics
In-Reply-To: <002001c717ba$823c1500$15327e82@pyrimidine>
References: <002001c717ba$823c1500$15327e82@pyrimidine>
Message-ID: <BV.WM.2.0.pv.1.0.16.0612041606510.37306@webm5.global.net.uk>

An HTML attachment was scrubbed...
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061204/22c3c5e0/attachment-0004.html>

From dmessina at wustl.edu  Mon Dec  4 16:46:16 2006
From: dmessina at wustl.edu (David Messina)
Date: Mon, 4 Dec 2006 10:46:16 -0600
Subject: [Bioperl-l] confused by Bio::Graphics
In-Reply-To: <200612032138.02522.rbirnie@totalise.co.uk>
References: <200612032138.02522.rbirnie@totalise.co.uk>
Message-ID: <ACE259C3-DC1C-41CC-88F3-7ACF8B9D66AA@wustl.edu>

Hi Richard,


> [richard]
>
> These are the problems:
> 1) As I understand it this:
>
> my $wholeseq = Bio::SeqFeature::Generic->new (
> 		-start => 1,
> 		-end => $refseq->length,
> 		-display_name =>$refseq->display_name
> 		);
>
> should display the name of the gene (CD133/Prominin1) near the top of image.
> It doesn't, am I misunderstanding or is there an error in the code?

The contents of a sequence object's display_name vary depending on
the type of sequence record; for a sequence object created from a
Genbank record, it's the locus name from the LOCUS line (the first line
of the record), i.e. NM_006017 here.

If you want the gene name, you'll have to dig it out of the feature  
table. If you look at the  Genbank record for your first sequence,  
you'll see that under both the gene and CDS primary features, the  
HUGO gene abbreviation is stored under the "gene" secondary tag, and  
various synonyms are under the "note" and "product" secondary tags.

LOCUS       NM_006017               3794 bp    mRNA    linear   PRI 17-NOV-2006
DEFINITION  Homo sapiens prominin 1 (PROM1), mRNA.
ACCESSION   NM_006017
VERSION     NM_006017.1  GI:5174386
[...skipping irrelevant part of the Genbank record...]
FEATURES             Location/Qualifiers
      source          1..3794
                      /organism="Homo sapiens"
                      /mol_type="mRNA"
                      /db_xref="taxon:9606"
                      /chromosome="4"
                      /map="4p15.32"
      gene            1..3794
                      /gene="PROM1"
                      /note="prominin 1; synonyms: AC133, CD133, PROML1,
                      MSTP061"
                      /db_xref="GeneID:8842"
                      /db_xref="HGNC:9454"
                      /db_xref="HPRD:HPRD_05079"
                      /db_xref="MIM:604365"
      CDS             38..2635
                      /gene="PROM1"
                      /go_component="integral to plasma membrane [pmid 9389720];
                      membrane"
                      /go_process="response to stimulus; visual perception"
                      /note="hProminin; prominin (mouse)-like 1; hematopoietic
                      stem cell antigen"
                      /codon_start=1
                      /product="prominin 1"
                      /protein_id="NP_006008.1"
                      /db_xref="GI:5174387"
                      /db_xref="GeneID:8842"
                      /db_xref="HGNC:9454"
                      /db_xref="HPRD:HPRD_05079"
                      /db_xref="MIM:604365"
[....more...]

In your script, you grab the primary features between lines 34-60.
You can grab the secondary tag value you want with something like:

[cribbed from the Feature-Annotation HOWTO]
my @ids;
for my $feat_object ($seq_object->get_SeqFeatures) {
    # collect the value(s) of the /gene qualifier, e.g. "PROM1"
    push @ids, $feat_object->get_tag_values("gene")
        if $feat_object->has_tag("gene");
}


> 2) In the quoted example the CDS is broken up into smaller regions
> which are then linked together in example 6. This isn't happening in
> my code and I think it should be, I get one solid block for the CDS.
> I don't understand why this is because I'm not clear which parts of
> the feature table are used to define where the CDS should be split.
> I think this is the relevant bit of code:
>
> foreach my $alt_trans (keys %main) {
> 	foreach my $tag (keys %{ $main{$alt_trans}{'features'} }) {
>
> 		my $feature = $main{$alt_trans}{'features'}{$tag};
>
> 		$panel->add_track($feature,
> 				-glyph => 'generic',
> 				-bgcolor => $colors[$idx++ % @colors],
> 				-fgcolor => 'black',
> 				-font2color => 'black',
> 				-key => $alt_trans,
> 				-bump => +1,
> 				-height => 8,
> 				-label => 1,
> 				-description => 1,
> 				) if ($tag eq 'CDS');
>
> }
> }


The problem here is that RefSeq mRNA records don't contain intron-exon
boundary information. I think you'll have to get that from an
assembly record. From the Entrez gene page for PROM1, I obtained a  
Genbank record for the PROM1 genomic locus:

http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?val=NC_000004.10&from=15578955&to=15686664&strand=2&dopt=gb

Saving that as 'PROM1.gb' (the suffix is important), and running the  
bp_embl2picture.pl script on it, I got an image similar to Figure 6  
(attached).
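
As a rough illustration of which part of the feature table defines the split: on a genomic record a CDS location looks like join(a..b,c..d,...), one range per exon, whereas the RefSeq mRNA CDS above is the single range 38..2635, so it can only be drawn as one block. The location_segments helper below is hypothetical, handles only simple forms (a plain range, join(), and an outer complement()), and is not BioPerl's own location parser; within BioPerl such a CDS gets a split location object (Bio::Location::Split) whose sub_Location() method returns the pieces. The join() coordinates in the usage lines are made-up values.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT BioPerl's location parser: split a simple GenBank
# feature location into [start, end] segments. Handles a plain range
# ("38..2635"), join(...) of ranges, and an outer complement(...).
sub location_segments {
    my ($loc) = @_;
    $loc =~ s/\s+//g;                          # joined locations may wrap
    my $strand = 1;
    if ($loc =~ /^complement\((.*)\)$/) {      # minus-strand feature
        ($loc, $strand) = ($1, -1);
    }
    $loc = $1 if $loc =~ /^join\((.*)\)$/;     # one range per exon
    my @segments = map { [ /^<?(\d+)\.\.>?(\d+)$/ ] } split /,/, $loc;
    return { strand => $strand, segments => \@segments };
}

# Illustrative locations: a single-range mRNA CDS vs. a made-up exon join
for my $loc ('38..2635', 'join(100..200,300..450,600..700)') {
    my $parsed = location_segments($loc);
    printf "%-34s -> %d segment(s)\n", $loc, scalar @{ $parsed->{segments} };
}
```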

Hope this helps,
Dave


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061204/4add2cbc/attachment-0004.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: PROM1.png
Type: image/png
Size: 8646 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061204/4add2cbc/attachment-0004.png>

From bix at sendu.me.uk  Mon Dec  4 19:37:13 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 04 Dec 2006 19:37:13 +0000
Subject: [Bioperl-l] Timeline on the 1.5.2 release?
In-Reply-To: <000001c717db$3ca7b910$15327e82@pyrimidine>
References: <000001c717db$3ca7b910$15327e82@pyrimidine>
Message-ID: <457478E9.3060405@sendu.me.uk>

Chris Fields wrote:
> Sendu,
> 
> Are current plans to still try getting the final 1.5.2 release out
> before the hackathon next week?

Yes, I seriously hope so. I was kind of hoping to see test results from 
you and Nathan on the wiki though...


> There are a few commits I want to make, but I may wait until after
> 1.5.2 is out before I add them.

But don't let the release stop you. As long as you don't commit to the
1.5.2 branch it will be fine.


From cjfields at uiuc.edu  Mon Dec  4 19:34:34 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 4 Dec 2006 13:34:34 -0600
Subject: [Bioperl-l] Timeline on the 1.5.2 release?
Message-ID: <000001c717db$3ca7b910$15327e82@pyrimidine>

Sendu,

Are current plans to still try getting the final 1.5.2 release out before
the hackathon next week?  There are a few commits I want to make, but I may
wait until after 1.5.2 is out before I add them.

chris

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


From cjfields at uiuc.edu  Mon Dec  4 20:23:45 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 4 Dec 2006 14:23:45 -0600
Subject: [Bioperl-l] Timeline on the 1.5.2 release?
In-Reply-To: <457478E9.3060405@sendu.me.uk>
Message-ID: <000001c717e2$19d18e00$15327e82@pyrimidine>

> Chris Fields wrote:
> > Sendu,
> > 
> > Are current plans to still try getting the final 1.5.2 release out 
> > before the hackathon next week?
> 
> Yes, I seriously hope so. I was kind of hoping to see test 
> results from you and Nathan on the wiki though...

Ah, forgot to post those!  Working on that now...

> > There are a few commits I want to make, but I may wait until after
> > 1.5.2 is out before I add them.
> 
> But don't let the release stop you. As long as you don't commit to the
> 1.5.2 branch it will be fine.

There are a few things I plan on adding over the next few weeks, including
some things for Bio::Location::SplitLocation. However, I'm sure some of the
latter will break tests, so I'll be adding them a bit at a time.

It all depends when I can squeeze time in to work on them!

chris 


From pelikan at cs.pitt.edu  Mon Dec  4 22:34:59 2006
From: pelikan at cs.pitt.edu (pelikan at cs.pitt.edu)
Date: Mon, 4 Dec 2006 17:34:59 -0500 (EST)
Subject: [Bioperl-l] Bioperl-db doesn't seem to load all entries
Message-ID: <4812.130.49.222.58.1165271699.squirrel@webmail.cs.pitt.edu>

Hello,

    My system is running bioperl 1.5.2, bioperl-db 1.5.2-005 RC, and the
latest MySQL under Windows with ActivePerl, without Cygwin. I have 4 GB
of memory. "make test" passes fine.

The problem is that the number of entries loaded doesn't match what I
expect when I load datasets using load_seqdatabase.pl. For instance, if I
want to load only proteins from Homo sapiens,
I go to UniProt,
use the database search function,
do a text search for Homo sapiens (returns 70914 hits),
export the hits to flat file format (--format swiss) using the data set
manager,
and load it using load_seqdatabase.pl.

Running "select count(*) from bioentry;" returns only 1003 entries.
Moreover, the entries don't seem to go past the B's in the alphabet:
I can't find bioentry.description values like '%cytochrome%' or '%myoglobin%',
but I can find apolipoproteins, for example.

I know this is an annoying question, but if someone has more experience in
dealing with this issue, I would be grateful for any assistance. I don't
get any error messages, so it's difficult for me to tell what's going on.

-Richard


From n.haigh at sheffield.ac.uk  Tue Dec  5 06:53:34 2006
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Tue, 05 Dec 2006 06:53:34 +0000
Subject: [Bioperl-l] Timeline on the 1.5.2 release?
In-Reply-To: <457478E9.3060405@sendu.me.uk>
References: <000001c717db$3ca7b910$15327e82@pyrimidine>
	<457478E9.3060405@sendu.me.uk>
Message-ID: <4575176E.3020906@sheffield.ac.uk>

Sendu Bala wrote:
> Chris Fields wrote:
>   
>> Sendu,
>>
>> Are current plans to still try getting the final 1.5.2 release out
>> before the hackathon next week?
>>     
>
> Yes, I seriously hope so. I was kind of hoping to see test results from 
> you and Nathan on the wiki though...
>
>
>   

OK, I'll get onto this today.

>> There are a few commits I want to make, but I may wait until after
>> 1.5.2 is out before I add them.
>>     
>
> But don't let the release stop you. As long as you don't commit to the
> 1.5.2 branch it will be fine.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>   


-- 
> A: Yes.
>> Q: Are you sure?
>>     
>>> A: Because it reverses the logical flow of conversation.
>>>       
>>>> Q: Why is top posting frowned upon?
>>>>         
Get Thunderbird <http://www.mozilla.org/products/thunderbird/>


From n.haigh at sheffield.ac.uk  Tue Dec  5 11:43:16 2006
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Tue, 05 Dec 2006 11:43:16 +0000
Subject: [Bioperl-l] Timeline on the 1.5.2 release?
In-Reply-To: <457478E9.3060405@sendu.me.uk>
References: <000001c717db$3ca7b910$15327e82@pyrimidine>
	<457478E9.3060405@sendu.me.uk>
Message-ID: <45755B54.7080902@sheffield.ac.uk>

Sendu Bala wrote:
> Chris Fields wrote:
>   
>> Sendu,
>>
>> Are current plans to still try getting the final 1.5.2 release out
>> before the hackathon next week?
>>     
>
> Yes, I seriously hope so. I was kind of hoping to see test results from 
> you and Nathan on the wiki though...
>
>
>   

I've added my test results for Debian to the wiki.
Nath

>> There are a few commits I want to make, but I may wait until after
>> 1.5.2 is out before I add them.
>>     
>
> But don't let the release stop you. As long as you don't commit to the
> 1.5.2 branch it will be fine.




From bix at sendu.me.uk  Tue Dec  5 11:47:06 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 05 Dec 2006 11:47:06 +0000
Subject: [Bioperl-l] Timeline on the 1.5.2 release?
In-Reply-To: <45755B54.7080902@sheffield.ac.uk>
References: <000001c717db$3ca7b910$15327e82@pyrimidine>
	<457478E9.3060405@sendu.me.uk> <45755B54.7080902@sheffield.ac.uk>
Message-ID: <45755C3A.9050903@sendu.me.uk>

Nathan S. Haigh wrote:
> Sendu Bala wrote:
>> Chris Fields wrote:
>>   
>>> Sendu,
>>>
>>> Are current plans to still try getting the final 1.5.2 release out
>>> before the hackathon next week?
>>>     
>> Yes, I seriously hope so. I was kind of hoping to see test results from 
>> you and Nathan on the wiki though...
>
> I've added my test results for Debian to the wiki.

Thanks (and to Chris as well). I can't tell you how much I loathe and
despise TCoffee and Tmhmm now ;)


From cjfields at uiuc.edu  Tue Dec  5 16:04:38 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 5 Dec 2006 10:04:38 -0600
Subject: [Bioperl-l] Build.PL changes
Message-ID: <001b01c71887$10be3160$15327e82@pyrimidine>

Sendu,

I think the Build.PL commits which force installation of XML::SAX::Expat
should be rolled back.  XML::Simple works with any XML::SAX backend, not
just XML::SAX::Expat, which hasn't been actively maintained since 2003 and
is deprecated in favor of XML::SAX::ExpatXS.  In fact, forcing
XML::SAX::Expat to install as the default XML::SAX backend currently breaks
blastxml parsing.

Note that forcing this also forces one to install the Expat library (now at
v2), which has some compatibility problems with XML::SAX::Expat (but
not with ExpatXS).
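Rather than forcing one backend at install time, a script can prefer whichever SAX backend is actually present. A minimal sketch, assuming the ExpatXS, then Expat, then PurePerl preference order implied above. first_loadable is a hypothetical helper, and $XML::SAX::ParserPackage is the package variable that XML::SAX::ParserFactory's documentation describes for choosing a parser (worth double-checking against the installed version).

```perl
use strict;
use warnings;

# Hypothetical helper: return the first module that can be loaded, so a
# script can prefer a backend without hard-requiring it.
sub first_loadable {
    for my $module (@_) {
        (my $file = "$module.pm") =~ s{::}{/}g;
        return $module if eval { require $file; 1 };
    }
    return;
}

my $backend = first_loadable(qw(XML::SAX::ExpatXS XML::SAX::Expat XML::SAX::PurePerl));
if (defined $backend) {
    no warnings 'once';
    # XML::SAX::ParserFactory consults this variable when picking a parser.
    $XML::SAX::ParserPackage = $backend;
    print "Preferred SAX backend: $backend\n";
}
else {
    print "No SAX backend available\n";
}
```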



From qetzal at tutopia.com.br  Wed Dec  6 15:21:20 2006
From: qetzal at tutopia.com.br (giovani)
Date: Wed, 06 Dec 2006 10:21:20 -0500
Subject: [Bioperl-l] Biodiversity graphic
Message-ID: <auto-000222418003@frontend01.cg.ifxnetworks.com>

An HTML attachment was scrubbed...
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061206/9d9e4a09/attachment-0004.html>

From benoit at ebi.ac.uk  Wed Dec  6 17:30:12 2006
From: benoit at ebi.ac.uk (Benoit Ballester)
Date: Wed, 06 Dec 2006 17:30:12 +0000
Subject: [Bioperl-l] Biodiversity graphic
In-Reply-To: <auto-000222418003@frontend01.cg.ifxnetworks.com>
References: <auto-000222418003@frontend01.cg.ifxnetworks.com>
Message-ID: <4576FE24.1030807@ebi.ac.uk>

giovani wrote:
> 
> Hello there. I'm trying to write a program to draw a graph with two
> axes and two data sets on each axis. Does anyone know of a tool similar
> to the GD module for making this graph? I'm having trouble with GD.
> Here is an example of what I want to do:
> http://libshuff.mib.uga.edu/YvsX.png, and below is the code that I'm
> using with the GD module.


It looks to me like the graph you're pointing to was made with gnuplot.
Why don't you use gnuplot or R instead?
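Following that suggestion, here is a rough sketch of driving gnuplot from Perl for the same two-axis layout. It writes a tab-separated data file and a gnuplot script that assigns each series to the x1y1 or x1y2 axes, mirroring use_axis => [1,2,1,2] in the GD::Graph code below. It assumes a gnuplot build with the png terminal; the helpers write_data and gnuplot_script and the output file names are hypothetical.

```perl
use strict;
use warnings;

# Data from the GD::Graph example: x labels, then four series.
# Series 1 and 3 use the left (y1) axis, series 2 and 4 the right (y2).
my @labels = qw(1st 2nd 3rd 4th 5th 6th 7th 8th 9th);
my @series = (
    [3, 4,  14, 30, 12, 8,   7, 20, 15],
    [2, 8,  2,  5,  3,  1,   3, 4,  1],
    [5, 12, 24, 33, 19, 8,   6, 15, 21],
    [1, 2,  5,  6,  3,  1.5, 1, 3,  4],
);
my @titles   = ('X', 'XY', 'diff-X/XY', '95%XY');
my @use_axis = (1, 2, 1, 2);

# One row per x label: the label, then one column per series.
sub write_data {
    my ($file) = @_;
    open my $out, '>', $file or die "Cannot write $file: $!\n";
    for my $r (0 .. $#labels) {
        print {$out} join("\t", $labels[$r], map { $_->[$r] } @series), "\n";
    }
    close $out;
}

# Build a gnuplot script; column 1 supplies the x tick labels.
sub gnuplot_script {
    my ($datafile, $png) = @_;
    my @plots;
    for my $i (0 .. $#titles) {
        push @plots, sprintf("'%s' using %d:xtic(1) with linespoints axes x1y%d title '%s'",
                             $datafile, $i + 2, $use_axis[$i], $titles[$i]);
    }
    return join("\n",
        "set terminal png",
        "set output '$png'",
        "set title 'Using two axes'",
        "set xlabel 'X Label'",
        "set ylabel 'Y1 label'",
        "set y2label 'Y2 label'",
        "set yrange [0:40]",
        "set y2range [0:8]",
        "set ytics nomirror",
        "set y2tics",
        "plot " . join(", \\\n     ", @plots),
    ) . "\n";
}

# Usage: perl twoaxes.pl graphTest   (then run: gnuplot graphTest.gp)
if (@ARGV) {
    my $base = $ARGV[0];
    write_data("$base.dat");
    open my $gp, '>', "$base.gp" or die "Cannot write $base.gp: $!\n";
    print {$gp} gnuplot_script("$base.dat", "$base.png");
    close $gp;
}
```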

Ben

> 
> #!/usr/bin/perl
> use strict;
> use warnings;
> use GD::Graph::mixed;
> 
> # Row 0: x-axis labels; rows 1-4: the four data sets.
> my @data = (
>    ["1st","2nd","3rd","4th","5th","6th","7th","8th","9th"],
>    [    3,    4,   14,   30,   12,    8,    7,   20,   15],
>    [    2,    8,    2,    5,    3,    1,    3,    4,    1],
>    [    5,   12,   24,   33,   19,    8,    6,   15,   21],
>    [    1,    2,    5,    6,    3,  1.5,    1,    3,    4],
> );
> 
> my $my_graph = GD::Graph::mixed->new();
> $my_graph->set(
>        x_label           => 'X Label',
>        y1_label          => 'Y1 label',
>        y2_label          => 'Y2 label',
>        title             => 'Using two axes',
>        y1_max_value      => 40,
>        y2_max_value      => 8,
>        y_tick_number     => 8,
>        y_label_skip      => 2,
>        long_ticks        => 1,
>        two_axes          => 1,
>        # data sets 1 and 3 on the left axis, 2 and 4 on the right
>        use_axis          => [1, 2, 1, 2],
>        legend_placement  => 'BR',
>        x_labels_vertical => 1,
>        x_label_position  => 1/2,
> );
> 
> $my_graph->set_legend('X', 'XY', 'diff-X/XY', '95%XY');
> my $gd = $my_graph->plot(\@data) or die $my_graph->error;
> open(my $img, '>', 'graphTest.gif') or die "Cannot open file: $!\n";
> binmode $img;
> print {$img} $gd->gif;
> close $img;


From gwu at molbio.mgh.harvard.edu  Wed Dec  6 21:12:57 2006
From: gwu at molbio.mgh.harvard.edu (gang wu)
Date: Wed, 06 Dec 2006 16:12:57 -0500
Subject: [Bioperl-l] Biodiversity graphic
In-Reply-To: <auto-000222418003@frontend01.cg.ifxnetworks.com>
References: <auto-000222418003@frontend01.cg.ifxnetworks.com>
Message-ID: <45773259.3010405@molbio.mgh.harvard.edu>

Do you mean the GD code cannot run, or that it does not generate the image
you wanted?

Gang



From bix at sendu.me.uk  Wed Dec  6 22:39:49 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 06 Dec 2006 22:39:49 +0000
Subject: [Bioperl-l] Bioperl 1.5.2 Release
Message-ID: <457746B5.2020006@sendu.me.uk>

I am proud to announce the final release of Bioperl 1.5.2.

http://www.bioperl.org/wiki/Release_1.5.2

bioperl (core):
cpan>install S/SE/SENDU/bioperl-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-1.5.2_100.tar.bz2
http://bioperl.org/DIST/bioperl-1.5.2_100.zip

bioperl-run:
cpan>install S/SE/SENDU/bioperl-run-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-run-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-run-1.5.2_100.tar.bz2
http://bioperl.org/DIST/bioperl-run-1.5.2_100.zip

bioperl-db:
cpan>install S/SE/SENDU/bioperl-db-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-db-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-db-1.5.2_100.tar.bz2
http://bioperl.org/DIST/bioperl-db-1.5.2_100.zip

bioperl-network:
cpan>install S/SE/SENDU/bioperl-network-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-network-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-network-1.5.2_100.tar.bz2
http://bioperl.org/DIST/bioperl-network-1.5.2_100.zip

http://bioperl.org/DIST/SIGNATURES.md5

(all are also available via CVS, and for Windows users, using the Perl 
Package Manager - see the wiki for details)
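To check a downloaded archive against SIGNATURES.md5 without md5sum (for example on Windows), a minimal sketch using the core Digest::MD5 module. It assumes SIGNATURES.md5 uses md5sum-style lines ("<32 hex digits>  <file name>"); verify_md5_list is a hypothetical helper name.

```perl
use strict;
use warnings;
use Digest::MD5;

# Check every "<md5>  <file>" line of a checksum list against files in $dir.
# Returns a hash ref mapping file name to OK, MISMATCH or MISSING.
sub verify_md5_list {
    my ($list_file, $dir) = @_;
    $dir = '.' unless defined $dir;
    my %result;
    open my $list, '<', $list_file or die "Cannot open $list_file: $!\n";
    while (my $line = <$list>) {
        next unless $line =~ /^([0-9a-fA-F]{32})\s+\*?(\S+)\s*$/;
        my ($expected, $name) = (lc $1, $2);
        if (open my $fh, '<', "$dir/$name") {
            binmode $fh;    # checksums must be computed on raw bytes
            my $got = Digest::MD5->new->addfile($fh)->hexdigest;
            $result{$name} = $got eq $expected ? 'OK' : 'MISMATCH';
            close $fh;
        }
        else {
            $result{$name} = 'MISSING';
        }
    }
    close $list;
    return \%result;
}

# Usage: perl checkmd5.pl SIGNATURES.md5 [download-dir]
if (@ARGV) {
    my $result = verify_md5_list(@ARGV);
    print "$_: $result->{$_}\n" for sort keys %$result;
}
```

Archives you did not download are simply reported as MISSING.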

The other bioperl packages (bioperl-ext, bioperl-gui, bioperl-pedigree 
and bioperl-pipeline) did not see a unified release for 1.5.2.


This release represents a developer release which has been thoroughly
tested. We consider it the most stable (in terms of bugs) version of 
Bioperl and believe it to be suitable for most people. It is marked 
'developer' or even 'unstable' because its API may change on short 
notice. It will also not be maintained or supported beyond the next 
bioperl release.

1.5.2 introduces the following new (core) features:

  * Taxonomy (Bio::Species) overhaul
  * Bio::Map improvements
  * Bio::SearchIO speedup
  * Build.PL installation

For details, and a complete change log, see the wiki.

API documentation is available here: http://doc.bioperl.org/


Acknowledgements:
Innumerable thanks are due for the tireless efforts of Christopher Fields
(bug fixing, testing, documentation, discussion), Nathan Haigh
(Windows and prerequisite issues, testing) and Mauricio Herrera Cuadra
(testing, documentation, support). Feedback and ideas provided by Hilmar 
Lapp, Jason Stajich, Torsten Seemann and others on the mailing list and 
elsewhere proved invaluable. None of this would have been possible 
without the behind-the-scenes work of the open-bio support team. I'd 
also like to acknowledge Andreas J. Koenig for his help with CPAN matters.

Finally, thank you to everyone who tried out the release candidates, and 
especially those that took the time to file bug reports or report problems.


Remember, Bioperl can only go from strength to strength with /your/ 
help. If you'd like to experience the fame and fortune that naturally 
follow becoming a Bioperl developer (?!), become one!
http://www.bioperl.org/wiki/Becoming_a_developer

On behalf of the Bioperl team,
Sendu Bala.


From cjfields at uiuc.edu  Thu Dec  7 02:30:44 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 6 Dec 2006 20:30:44 -0600
Subject: [Bioperl-l] Bioperl 1.5.2 Release
In-Reply-To: <457746B5.2020006@sendu.me.uk>
Message-ID: <000001c719a7$b48beb90$15327e82@pyrimidine>

Great job Sendu!  

A bit of icing on the cake: all the WinXP PPMs (core, db, network, run)
installed without a hitch following the normal instructions using PPM4 (GUI
and command-line shell) on clean ActiveState installations. It looks like
all the correct prereqs were installed with the shell (only XML::SAX::ExpatXS
was left out in the GUI installation, for reasons outlined before).

I'll run more tests tomorrow to see if tests pass with the installed bioperl
(this should catch any prereq issues with PPM installation we missed).

chris



From hlapp at gmx.net  Thu Dec  7 03:20:14 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 6 Dec 2006 22:20:14 -0500
Subject: [Bioperl-l] Bioperl-db doesn't seem to load all entries
In-Reply-To: <4812.130.49.222.58.1165271699.squirrel@webmail.cs.pitt.edu>
References: <4812.130.49.222.58.1165271699.squirrel@webmail.cs.pitt.edu>
Message-ID: <8E15592D-6475-4A4D-BA6D-BD669C4233C3@gmx.net>

I seriously doubt that load_seqdatabase.pl would have deliberately  
stopped loading the file. Either there was an error in loading an  
entry (which you should see, and you can also ask the script to just  
keep going by providing the --safe option), or the file only  
contained 1003 entries.

Note that you can get progress logging by using the --logchunk  
option, which will also give you a final count of the number of  
sequences loaded.

I'm not sure how you ran your search and your download on UniProt. If
I try what you describe I get 70491 hits, and if I try to export them
using the data set manager I get the message:

This download mechanism only supports 1000 proteins. The first 1000  
proteins have been added from the selected.

Which perfectly explains what you see.

Did you convince yourself that the file contains 70491 entries? If  
you don't have grep and wc on your windows machine, you can use perl  
one-liners directly, e.g.,

perl -n -e '/^ID / && ++$n; END {print "$n entries\n";}' <your-file-here>
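A slightly longer version of the same check, as a sketch: it counts both the ID lines and the // lines that terminate each Swiss-Prot/UniProt flat-file record, so a truncated download shows up as a mismatch. It assumes the standard flat-file layout; count_swiss_entries is a hypothetical helper name. The total is what "select count(*) from bioentry" should roughly match after loading.

```perl
use strict;
use warnings;

# Count records two ways: "ID" lines and "//" record terminators.
# In a complete Swiss-Prot/UniProt flat file the two totals agree.
sub count_swiss_entries {
    my ($fh) = @_;
    my ($ids, $terminators) = (0, 0);
    while (my $line = <$fh>) {
        $ids++         if $line =~ /^ID\s/;
        $terminators++ if $line =~ m{^//};
    }
    return ($ids, $terminators);
}

# Usage: perl count_entries.pl uniprot_export.txt
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!\n";
    my ($ids, $ends) = count_swiss_entries($fh);
    close $fh;
    print "$ids ID lines, $ends record terminators\n";
    warn "Counts differ: the file may be truncated\n" if $ids != $ends;
}
```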

	-hilmar


-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From lzhtom at hotmail.com  Thu Dec  7 03:13:47 2006
From: lzhtom at hotmail.com (zhihua li)
Date: Thu, 07 Dec 2006 03:13:47 +0000
Subject: [Bioperl-l] different syntaxes for SeqI constructor and Factory
	constructor?
Message-ID: <BAY110-F30C26DE384E916A297FA86C7DC0@phx.gbl>

Hi netters,

Recently I found this:

For constructing a new Bio::SeqIO object, I had to write:
$seq_obj=Bio::SeqIO->new(
      -file => '/home/myfile',
      -format => 'Fasta');              #Note the dash before the two arguments.

If I omitted the dash:
$seq_obj=Bio::SeqIO->new(
     file => '/home/myfile',
     format => 'Fasta');
I'd get error:
MSG: Unknown format given or could not determine it []
STACK Bio::SeqIO::new /usr/lib/perl5/site_perl/5.8.7/Bio/SeqIO.pm:377

So it seems to me that the dashes before the arguments are essential.
However, when I tried to build a factory for StandAloneBlast, I found the
opposite.

If the script had the dash:
$blast_obj=Bio::Tools::Run::StandAloneBlast->new(
             -program => 'blastn',
             -database => '/home/mydatabase');

I'd get the error message: 
MSG: Unallowed parameter: - !
STACK Bio::Tools::Run::StandAloneBlast::AUTOLOAD 
/usr/lib/perl5/site_perl/5.8.7/Bio/Tools/Run/StandAloneBlast.pm:433
STACK Bio::Tools::Run::StandAloneBlast::new 
/usr/lib/perl5/site_perl/5.8.7/Bio/Tools/Run/StandAloneBlast.pm:400

If I left out the dash by saying:
$blast_obj=Bio::Tools::Run::StandAloneBlast->new(
             program => 'blastn',
             database => '/home/mydatabase');

Everything is fine.

Now I'm confused. Why do I sometimes have to add the dash, while other
times I'm not allowed to?

Thanks in advance!
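In this BioPerl release, Bio::SeqIO->new appears to look its options up under the dashed keys (-file, -format), so an undashed format is never seen and the format cannot be determined, which fits the error above. StandAloneBlast evidently validates its parameters differently, as the "Unallowed parameter" error shows; that code path is not modelled here. The sketch below is a simplified, hypothetical imitation of dash-keyed argument handling (make_reader and make_reader_lenient are not BioPerl functions) that shows why the dash matters and how a constructor could accept both forms.

```perl
use strict;
use warnings;

# Hypothetical constructor in the dash-keyed style: keys are lower-cased
# and then looked up with their leading dash.
sub make_reader {
    my %raw   = @_;
    my %param = map { (lc($_) => $raw{$_}) } keys %raw;
    my $format = $param{'-format'};
    die "Unknown format given or could not determine it []\n" unless defined $format;
    return { file => $param{'-file'}, format => lc $format };
}

# A more forgiving (also hypothetical) variant: add the dash if it is missing.
sub make_reader_lenient {
    my %raw   = @_;
    my %param = map { my $k = lc $_; $k =~ s/^(?!-)/-/; ($k => $raw{$_}) } keys %raw;
    return make_reader(%param);
}

my $ok = make_reader(-file => '/home/myfile', -format => 'Fasta');
print "$ok->{format}\n";                             # prints "fasta"

my $bad = eval { make_reader(file => '/home/myfile', format => 'Fasta') };
print defined $bad ? "accepted\n" : "rejected: $@";  # undashed keys are not found

my $both = make_reader_lenient(file => '/home/myfile', format => 'Fasta');
print "$both->{format}\n";                           # prints "fasta"
```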



From hlapp at gmx.net  Thu Dec  7 03:56:44 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 6 Dec 2006 22:56:44 -0500
Subject: [Bioperl-l] Bioperl 1.5.2 Release
In-Reply-To: <457746B5.2020006@sendu.me.uk>
References: <457746B5.2020006@sendu.me.uk>
Message-ID: <CE76F074-5897-431C-9E39-9E096DBD1973@gmx.net>

Congrats! Great work, Sendu! Don't forget to celebrate.

	-hilmar




From arareko at campus.iztacala.unam.mx  Thu Dec  7 03:53:21 2006
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Wed, 06 Dec 2006 21:53:21 -0600
Subject: [Bioperl-l] Bioperl 1.5.2 Release
In-Reply-To: <457746B5.2020006@sendu.me.uk>
References: <457746B5.2020006@sendu.me.uk>
Message-ID: <45779031.3050202@campus.iztacala.unam.mx>

This has been a great effort. Congrats and thanks to everyone involved!

Mauricio.


-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From jason at bioperl.org  Thu Dec  7 05:06:36 2006
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 6 Dec 2006 21:06:36 -0800
Subject: [Bioperl-l] Bioperl 1.5.2 Release
In-Reply-To: <457746B5.2020006@sendu.me.uk>
References: <457746B5.2020006@sendu.me.uk>
Message-ID: <41A863C9-1B69-4C7B-9271-C577EDD011BB@bioperl.org>

Hear, hear!  Excellent work.  Thanks for leading the effort on this
release, and for all of the behind-the-scenes work, attention to detail,
and cat herding it took to make this possible.

-jason

On Dec 6, 2006, at 2:39 PM, Sendu Bala wrote:

> I am proud to announce the final release of Bioperl 1.5.2.
>
> http://www.bioperl.org/wiki/Release_1.5.2
>
> bioperl (core):
> cpan>install S/SE/SENDU/bioperl-1.5.2_100.tar.gz
> http://bioperl.org/DIST/bioperl-1.5.2_100.tar.gz
> http://bioperl.org/DIST/bioperl-1.5.2_100.tar.bz2
> http://bioperl.org/DIST/bioperl-1.5.2_100.zip
>
> bioperl-run:
> cpan>install S/SE/SENDU/bioperl-run-1.5.2_100.tar.gz
> http://bioperl.org/DIST/bioperl-run-1.5.2_100.tar.gz
> http://bioperl.org/DIST/bioperl-run-1.5.2_100.tar.bz2
> http://bioperl.org/DIST/bioperl-run-1.5.2_100.zip
>
> bioperl-db:
> cpan>install S/SE/SENDU/bioperl-db-1.5.2_100.tar.gz
> http://bioperl.org/DIST/bioperl-db-1.5.2_100.tar.gz
> http://bioperl.org/DIST/bioperl-db-1.5.2_100.tar.bz2
> http://bioperl.org/DIST/bioperl-db-1.5.2_100.zip
>
> bioperl-network:
> cpan>install S/SE/SENDU/bioperl-network-1.5.2_100.tar.gz
> http://bioperl.org/DIST/bioperl-network-1.5.2_100.tar.gz
> http://bioperl.org/DIST/bioperl-network-1.5.2_100.tar.bz2
> http://bioperl.org/DIST/bioperl-network-1.5.2_100.zip
>
> http://bioperl.org/DIST/SIGNATURES.md5
>
> (all are also available via CVS, and for Windows users, using the Perl
> Package Manager - see the wiki for details)
>
> The other bioperl packages (bioperl-ext, bioperl-gui, bioperl-pedigree
> and bioperl-pipeline) did not see a unified release for 1.5.2.
>
>
>
> This release represents a developer release which has been thoroughly
> tested. We consider it the most stable (in terms of bugs) version of
> Bioperl and believe it to be suitable for most people. It is marked
> 'developer' or even 'unstable' because its API may change on short
> notice. It will also not be maintained or supported beyond the next
> bioperl release.
>
> 1.5.2 introduces the following new (core) features:
>
>   * Taxonomy (Bio::Species) overhaul
>   * Bio::Map improvements
>   * Bio::SearchIO speedup
>   * Build.PL installation
>
> For details, and a complete change log, see the wiki.
>
> API documentation is available here: http://doc.bioperl.org/
>
>
> Acknowledgements:
> Innumerable thanks are due for the tireless efforts of Christopher Fields
> (bug fixing, testing, documentation, discussion), Nathan Haigh
> (Windows&pre-requisite issues, testing) and Mauricio Herrera Cuadra
> (testing, documentation, support). Feedback and ideas provided by Hilmar
> Lapp, Jason Stajich, Torsten Seemann and others on the mailing list and
> elsewhere proved invaluable. None of this would have been possible
> without the behind-the-scenes work of the open-bio support team. I'd
> also like to acknowledge Andreas J. Koenig for his help with CPAN matters.
>
> Finally, thank you to everyone who tried out the release candidates, and
> especially those that took the time to file bug reports or report problems.
>
>
> Remember, Bioperl can only go from strength to strength with /your/
> help. If you'd like to experience the fame and fortune that naturally
> follow becoming a Bioperl developer (?!), become one!
> http://www.bioperl.org/wiki/Becoming_a_developer
>
> On behalf of the Bioperl team,
> Sendu Bala.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html


From n.haigh at sheffield.ac.uk  Thu Dec  7 07:23:47 2006
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Thu, 07 Dec 2006 07:23:47 +0000
Subject: [Bioperl-l] Bioperl 1.5.2 Release
In-Reply-To: <457746B5.2020006@sendu.me.uk>
References: <457746B5.2020006@sendu.me.uk>
Message-ID: <4577C183.7010501@sheffield.ac.uk>

I know I'm very new to Bioperl development and don't know very much yet,
so I'm probably not the best person to express the views of the Bioperl
developers or users. However, I'm sure I'm safe in saying that on behalf
of everyone associated with Bioperl a *huge* thank you must go out to
Sendu for the gargantuan effort he has put into this release.

Just looking over some of the e-mails he's sent over the past few weeks
alone, it's clear that he has devoted a huge amount of time to the
effort and in some cases with little sleep. Since there is very little
(or should I say no) monetary recognition in such an important and time
consuming role as "Release Pumpkin", I hope Sendu has a warm glow, safe
in the knowledge that his efforts have helped enormously and are clearly
recognised and fully appreciated by the Bioperl community.

Therefore, I'd just like to reiterate what others have already
said... Well done, excellent work!!!

Nath


From valiente at lsi.upc.edu  Thu Dec  7 08:25:27 2006
From: valiente at lsi.upc.edu (Gabriel Valiente)
Date: Thu, 7 Dec 2006 09:25:27 +0100
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110 species
In-Reply-To: <mailman.8205.1161981511.2493.bioperl-l@lists.open-bio.org>
References: <mailman.8205.1161981511.2493.bioperl-l@lists.open-bio.org>
Message-ID: <4DA1DAE9-92B8-46C1-A3CE-F8D1AE4BB334@lsi.upc.edu>

The following popped out when I input more than 110 species to the
taxonomy2tree script version 1.4:

         (in cleanup)
------------- EXCEPTION  -------------
MSG: Must supply a Bio::Taxon
STACK Bio::DB::Taxonomy::flatfile::ancestor Bio/DB/Taxonomy/flatfile.pm:260
STACK Bio::Taxon::ancestor Bio/Taxon.pm:476
STACK Bio::Taxon::remove_Descendent Bio/Taxon.pm:703
STACK Bio::Tree::Node::ancestor Bio/Tree/Node.pm:346
STACK Bio::Taxon::ancestor Bio/Taxon.pm:466
STACK Bio::Tree::Tree::cleanup_tree Bio/Tree/Tree.pm:325
STACK Bio::Root::Root::DESTROY Bio/Root/Root.pm:409
STACK (eval) taxonomy2tree.pl:0
STACK toplevel taxonomy2tree.pl:0

Any clues? Thanks,

Gabriel


From bix at sendu.me.uk  Thu Dec  7 09:24:39 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 07 Dec 2006 09:24:39 +0000
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110
	species
In-Reply-To: <4DA1DAE9-92B8-46C1-A3CE-F8D1AE4BB334@lsi.upc.edu>
References: <mailman.8205.1161981511.2493.bioperl-l@lists.open-bio.org>
	<4DA1DAE9-92B8-46C1-A3CE-F8D1AE4BB334@lsi.upc.edu>
Message-ID: <4577DDD7.7060208@sendu.me.uk>

Gabriel Valiente wrote:
> The following popped out when I input more than 110 species to the
> taxonomy2tree script version 1.4:
> 
>          (in cleanup)
> ------------- EXCEPTION  -------------
> MSG: Must supply a Bio::Taxon
> STACK Bio::DB::Taxonomy::flatfile::ancestor Bio/DB/Taxonomy/flatfile.pm:260
> STACK Bio::Taxon::ancestor Bio/Taxon.pm:476
> STACK Bio::Taxon::remove_Descendent Bio/Taxon.pm:703
> STACK Bio::Tree::Node::ancestor Bio/Tree/Node.pm:346
> STACK Bio::Taxon::ancestor Bio/Taxon.pm:466
> STACK Bio::Tree::Tree::cleanup_tree Bio/Tree/Tree.pm:325
> STACK Bio::Root::Root::DESTROY Bio/Root/Root.pm:409
> STACK (eval) taxonomy2tree.pl:0
> STACK toplevel taxonomy2tree.pl:0
> 
> Any clues? Thanks,

Are you able to narrow the problem down? What was your command line, 
what species were you using? Does it work with the first 110 species you 
tried? Is there anything special about the 111th?

Do I understand correctly that this was a problem during cleanup only, 
and didn't affect the correctness and completeness of the result?


From bix at sendu.me.uk  Thu Dec  7 09:33:18 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 07 Dec 2006 09:33:18 +0000
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110
	species
In-Reply-To: <4DA1DAE9-92B8-46C1-A3CE-F8D1AE4BB334@lsi.upc.edu>
References: <mailman.8205.1161981511.2493.bioperl-l@lists.open-bio.org>
	<4DA1DAE9-92B8-46C1-A3CE-F8D1AE4BB334@lsi.upc.edu>
Message-ID: <4577DFDE.6000500@sendu.me.uk>

Gabriel Valiente wrote:
> The following popped out when I input more than 110 species to the
> taxonomy2tree script version 1.4:
> 
>          (in cleanup)
> ------------- EXCEPTION  -------------
> MSG: Must supply a Bio::Taxon
> STACK Bio::DB::Taxonomy::flatfile::ancestor Bio/DB/Taxonomy/flatfile.pm:260
> STACK Bio::Taxon::ancestor Bio/Taxon.pm:476
> STACK Bio::Taxon::remove_Descendent Bio/Taxon.pm:703
> STACK Bio::Tree::Node::ancestor Bio/Tree/Node.pm:346
> STACK Bio::Taxon::ancestor Bio/Taxon.pm:466
> STACK Bio::Tree::Tree::cleanup_tree Bio/Tree/Tree.pm:325
> STACK Bio::Root::Root::DESTROY Bio/Root/Root.pm:409
> STACK (eval) taxonomy2tree.pl:0
> STACK toplevel taxonomy2tree.pl:0
> 
> Any clues? Thanks,

Oh, does it work with option -e? Or does it work if you delete your old 
indexes of the nodes and names files and let it re-create them?


From valiente at lsi.upc.edu  Thu Dec  7 09:38:03 2006
From: valiente at lsi.upc.edu (Gabriel Valiente)
Date: Thu, 7 Dec 2006 10:38:03 +0100
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110
	species
In-Reply-To: <4577DDD7.7060208@sendu.me.uk>
References: <mailman.8205.1161981511.2493.bioperl-l@lists.open-bio.org>
	<4DA1DAE9-92B8-46C1-A3CE-F8D1AE4BB334@lsi.upc.edu>
	<4577DDD7.7060208@sendu.me.uk>
Message-ID: <F5C5C9A8-D444-4994-9769-AC5DE68F4A39@lsi.upc.edu>

Hi,

If you run the attached shell script you should be able to reproduce  
the problem. It is not about any species in particular, but about the  
total number of species: it crashes with more than 120 species. The
resulting tree is not correct; I'm checking it further now. Thanks,

Gabriel

-------------- next part --------------
A non-text attachment was scrubbed...
Name: fetch-bork.sh
Type: application/octet-stream
Size: 7378 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061207/00f0aeda/attachment-0004.obj>
-------------- next part --------------

On Dec 7, 2006, at 10:24 AM, Sendu Bala wrote:

> Gabriel Valiente wrote:
>> The following popped out when I input more than 110 species to the
>> taxonomy2tree script version 1.4:
>>          (in cleanup)
>> ------------- EXCEPTION  -------------
>> MSG: Must supply a Bio::Taxon
>> STACK Bio::DB::Taxonomy::flatfile::ancestor Bio/DB/Taxonomy/flatfile.pm:260
>> STACK Bio::Taxon::ancestor Bio/Taxon.pm:476
>> STACK Bio::Taxon::remove_Descendent Bio/Taxon.pm:703
>> STACK Bio::Tree::Node::ancestor Bio/Tree/Node.pm:346
>> STACK Bio::Taxon::ancestor Bio/Taxon.pm:466
>> STACK Bio::Tree::Tree::cleanup_tree Bio/Tree/Tree.pm:325
>> STACK Bio::Root::Root::DESTROY Bio/Root/Root.pm:409
>> STACK (eval) taxonomy2tree.pl:0
>> STACK toplevel taxonomy2tree.pl:0
>> Any clues? Thanks,
>
> Are you able to narrow the problem down? What was your command  
> line, what species were you using? Does it work with the first 110  
> species you tried? Is there anything special about the 111th?
>
> Do I understand correctly that this was a problem during cleanup  
> only, and didn't affect the correctness and completeness of the  
> result?


From cjfields at uiuc.edu  Thu Dec  7 15:22:47 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 7 Dec 2006 09:22:47 -0600
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on
	110species
In-Reply-To: <F5C5C9A8-D444-4994-9769-AC5DE68F4A39@lsi.upc.edu>
Message-ID: <000001c71a13$8feec840$15327e82@pyrimidine>

> Hi,
> 
> If you run the attached shell script you should be able to 
> reproduce the problem. It is not about any species in 
> particular, but about the total number of species: it crashes
> with more than 120 species. The resulting tree is not
> correct; I'm checking it further now. Thanks,
> 
> Gabriel

Gabriel, 

My guess is this may have to do with using an old taxonomy dump file.  I got
this to work on winXP using the latest NCBI taxonomy.  I had to modify
taxonomy2tree and your shell script to get it to play nice with Windows, but
I didn't get the error, and I did get a tree (abbreviated):

(((((("Agrobacterium tumefaciens str. C58","Sinorhizobium meliloti")Rhizobiaceae,...

chris


From cjfields at uiuc.edu  Thu Dec  7 18:44:32 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 7 Dec 2006 12:44:32 -0600
Subject: [Bioperl-l] different syntaxes for SeqI constructor and Factory
	constructor?
In-Reply-To: <BAY110-F30C26DE384E916A297FA86C7DC0@phx.gbl>
References: <BAY110-F30C26DE384E916A297FA86C7DC0@phx.gbl>
Message-ID: <7513E9D5-E055-4EBE-B8CF-538A8DEDB8E9@uiuc.edu>


On Dec 6, 2006, at 9:13 PM, zhihua li wrote:

> Hi netters,
>
> Recently I found this:
>
> For constructing a new SeqI object, I had to write:
> $seq_obj=Bio::SeqIO->new(
>      -file => '/home/myfile',
>      -format => 'Fasta');              #Note the dash before the two arguments.
>
> If I omitted the dash:
> $seq_obj=Bio::SeqIO->new(
>     file => '/home/myfile',
>     format => 'Fasta');
> I'd get error:
> MSG: Unknown format given or could not determine it []
> STACK Bio::SeqIO::new /usr/lib/perl5/site_perl/5.8.7/Bio/SeqIO.pm:377
>
> So it seems to me that the dashes before the arguments are  
> essential.  However, when I tried to build a factory for  
> StandaloneBlast, I found the other way around.
>
> If the script had the dash:
> $blast_obj=Bio::Tools::Run::StandAloneBlast->new(
>             -program => 'blastn',
>             -database => '/home/mydatabase');
>
> I'd get the error message: MSG: Unallowed parameter: - !
> STACK Bio::Tools::Run::StandAloneBlast::AUTOLOAD /usr/lib/perl5/site_perl/5.8.7/Bio/Tools/Run/StandAloneBlast.pm:433
> STACK Bio::Tools::Run::StandAloneBlast::new /usr/lib/perl5/site_perl/5.8.7/Bio/Tools/Run/StandAloneBlast.pm:400
>
> If I left out the dash by saying:
> $blast_obj=Bio::Tools::Run::StandAloneBlast->new(
>             program => 'blastn',
>             database => '/home/mydatabase');
>
> Everything is fine.
>
> Now I'm confused. Why do I sometimes have to add the dash, while other
> times I'm not allowed to?
>
> Thanks in advance!

I agree that this should be more consistent.  Does anyone know the  
reasoning for this?

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From bosborne11 at verizon.net  Thu Dec  7 19:32:21 2006
From: bosborne11 at verizon.net (Brian Osborne)
Date: Thu, 07 Dec 2006 14:32:21 -0500
Subject: [Bioperl-l] different syntaxes for SeqI constructor and Factory
 constructor?
In-Reply-To: <7513E9D5-E055-4EBE-B8CF-538A8DEDB8E9@uiuc.edu>
Message-ID: <C19DD675.BD72%bosborne11@verizon.net>

Chris,

The latest StandAloneBlast takes "dashed parameters", as in:

 my @params = (-database => 'swissprot', -outfile => 'blast1.out');
 my $factory = Bio::Tools::Run::StandAloneBlast->new(@params);

Or

 my $factory = Bio::Tools::Run::StandAloneBlast->new(-program =>"wublastp",
                                                     -database=>"swissprot",
                                                     -e => 1e-20);

So that's why I asked "what version?"

Someone made the change to allow dashes in @params a few months ago and I
believe that that someone was you!

Brian O.


On 12/7/06 1:44 PM, "Chris Fields" <cjfields at uiuc.edu> wrote:

> 
> On Dec 6, 2006, at 9:13 PM, zhihua li wrote:
> 
>> Hi netters,
>> 
>> Recently I found this:
>> 
>> For constructing a new SeqI object, I had to write:
>> $seq_obj=Bio::SeqIO->new(
>>      -file => '/home/myfile',
>>      -format => 'Fasta');              #Note the dash before the two arguments.
>> 
>> If I omitted the dash:
>> $seq_obj=Bio::SeqIO->new(
>>     file => '/home/myfile',
>>     format => 'Fasta');
>> I'd get error:
>> MSG: Unknown format given or could not determine it []
>> STACK Bio::SeqIO::new /usr/lib/perl5/site_perl/5.8.7/Bio/SeqIO.pm:377
>> 
>> So it seems to me that the dashes before the arguments are
>> essential.  However, when I tried to build a factory for
>> StandaloneBlast, I found the other way around.
>> 
>> If the script had the dash:
>> $blast_obj=Bio::Tools::Run::StandAloneBlast->new(
>>             -program => 'blastn',
>>             -database => '/home/mydatabase');
>> 
>> I'd get the error message: MSG: Unallowed parameter: - !
>> STACK Bio::Tools::Run::StandAloneBlast::AUTOLOAD /usr/lib/perl5/site_perl/5.8.7/Bio/Tools/Run/StandAloneBlast.pm:433
>> STACK Bio::Tools::Run::StandAloneBlast::new /usr/lib/perl5/site_perl/5.8.7/Bio/Tools/Run/StandAloneBlast.pm:400
>> 
>> If I left out the dash by saying:
>> $blast_obj=Bio::Tools::Run::StandAloneBlast->new(
>>             program => 'blastn',
>>             database => '/home/mydatabase');
>> 
>> Everything is fine.
>> 
>> Now I'm confused. Why do I sometimes have to add the dash, while other
>> times I'm not allowed to?
>> 
>> Thanks in advance!
> 
> I agree that this should be more consistent.  Does anyone know the
> reasoning for this?
> 
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
> 
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at uiuc.edu  Thu Dec  7 19:44:19 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 7 Dec 2006 13:44:19 -0600
Subject: [Bioperl-l] different syntaxes for SeqI constructor and Factory
	constructor?
In-Reply-To: <C19DD675.BD72%bosborne11@verizon.net>
References: <C19DD675.BD72%bosborne11@verizon.net>
Message-ID: <A12BC418-6400-46FC-8383-66E21D997E56@uiuc.edu>


On Dec 7, 2006, at 1:32 PM, Brian Osborne wrote:

> Chris,
>
> The latest StandAloneBlast takes "dashed parameters", as in:
>
>  @params = (-database => 'swissprot',-outfile => 'blast1.out');
>  $factory = Bio::Tools::Run::StandAloneBlast->new(@params);
>
> Or
>
>  my $factory = Bio::Tools::Run::StandAloneBlast->new(-program =>"wublastp",
>                                                      -database=>"swissprot",
>                                                      -e => 1e-20);
>
> So that's why I asked "what version?"
>
> Someone made the change to allow dashes in @params a few months ago and I
> believe that that someone was you!
>
> Brian O.

Nope, I plead innocent (at least to this!).  I haven't made any  
commits to StandAloneBlast.  These were added in by Torsten (see  
commits 1.59, 1.60), so you'll need to blame/thank him...

http://tinyurl.com/y7ym9g

So they're now a bit more consistent.  That's not to say  
StandAloneBlast doesn't need some major revisions....
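For readers puzzled by the dash question, here is a purely illustrative sketch. It does not reproduce BioPerl's internals: `normalize_args` is a hypothetical helper, not a BioPerl function. It shows one way a constructor could accept `-program`, `program` and `-PROGRAM` interchangeably, by lowercasing each key and stripping a single leading dash. Actual BioPerl modules parse their own arguments and may not behave this way.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (NOT part of BioPerl): normalize named arguments so
# that '-file', 'file' and '-FILE' all end up under the key 'file'.
sub normalize_args {
    my @args = @_;
    die "normalize_args: odd number of arguments\n" if @args % 2;
    my %normalized;
    while (@args) {
        my ($key, $value) = splice(@args, 0, 2);
        (my $name = lc $key) =~ s/^-//;    # lowercase, drop one leading dash
        $normalized{$name} = $value;
    }
    return %normalized;
}

# Both calling styles from the original question yield the same keys.
my %dashed   = normalize_args(-program => 'blastn', -database => '/home/mydatabase');
my %undashed = normalize_args( program => 'blastn',  database => '/home/mydatabase');

for my $key (sort keys %dashed) {
    printf "%-8s dashed=%s undashed=%s\n", $key, $dashed{$key}, $undashed{$key};
}
```

With a helper like this, both call styles would resolve to the same option names. Whether BioPerl should adopt such a convention across modules is the open consistency question raised in this thread.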

BTW, I didn't see a post from you asking about the version.

Chris


From akarger at CGR.Harvard.edu  Thu Dec  7 21:32:51 2006
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Thu, 7 Dec 2006 16:32:51 -0500
Subject: [Bioperl-l] Using frame info from GFF in getting a Seq->spliced_seq
Message-ID: <B9182BFF5B004245BABC12956EA6322E022BE901@huls5.nucleus.harvard.edu>

I need to know how to get the frame information in exon features
(created by Bio::Tools::GFF) into a whole-gene feature that will be
translated into a protein.

I'm reading in some fungal GFFs generated by Jason Stajich. I

- Use Bio::Tools::GFF to create a feature for each exon in a gene
- Create a Bio::Location::Split object containing each feature's
  location
- Create a Bio::SeqFeature::Generic object whose location is the above
  BL::Split
- Attach my contig Bio::Seq to the feature
- Get the protein with feature->spliced_seq->translate->seq

(Code below)

Unfortunately, I get the wrong result when the GFF features have frame
!= 0. This happens for only a few percent of the exons, but when it
does, I end up translating in the wrong frame.

If I read the docs correctly, Location objects don't have a frame. So
how do I get the correct spliced_seq, which skips one or two bp at the
beginning of certain exons?

I suspect the answer to this is that I'm going about this in completely
the wrong way, in which case, please tell me how I ought to be doing it.

Thanks,
- Amir Karger
Research Computing
Life Sciences Division
Harvard University

P.S. In case you want to see actual code, here it is. After using
Bio::Tools::GFF to create a sorted list of features for each exon
(basically stolen from the module POD), I:
    # Create a new object representing the exons' gene
    my $coding_loc_obj = Bio::Location::Split->new;
    foreach my $exon (@sorted_exons) {
        $coding_loc_obj->add_sub_Location($exon->location);
    }

    # Build a spliced feature representing the whole gene
    my $spliced_feat = Bio::SeqFeature::Generic->new(
        -start   => $coding_loc_obj->start,
        -end     => $coding_loc_obj->end,
        -strand  => $strand_num,
        -primary => "splicedGene",
    );
    $spliced_feat->location($coding_loc_obj);

    # Attach a contig object containing the sequence
    $spliced_feat->attach_seq($contig_obj->bioperl_object);

    # Get the spliced seq once, then translate it to protein:
    my $spliced    = $spliced_feat->spliced_seq;
    my $coding_seq = $spliced->seq;
    my $protein    = $spliced->translate->seq;


From bix at sendu.me.uk  Thu Dec  7 22:45:32 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 7 Dec 2006 15:45:32 -0700
Subject: [Bioperl-l] [Bioperl-announce-l] Bioperl 1.5.2 Release
Message-ID: <000001c71a51$671a79d0$6400a8c0@CodonSolutions.local>

I am proud to announce the final release of Bioperl 1.5.2.

http://www.bioperl.org/wiki/Release_1.5.2

bioperl (core):
cpan>install S/SE/SENDU/bioperl-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-1.5.2_100.tar.bz2
http://bioperl.org/DIST/bioperl-1.5.2_100.zip

bioperl-run:
cpan>install S/SE/SENDU/bioperl-run-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-run-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-run-1.5.2_100.tar.bz2
http://bioperl.org/DIST/bioperl-run-1.5.2_100.zip

bioperl-db:
cpan>install S/SE/SENDU/bioperl-db-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-db-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-db-1.5.2_100.tar.bz2
http://bioperl.org/DIST/bioperl-db-1.5.2_100.zip

bioperl-network:
cpan>install S/SE/SENDU/bioperl-network-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-network-1.5.2_100.tar.gz
http://bioperl.org/DIST/bioperl-network-1.5.2_100.tar.bz2
http://bioperl.org/DIST/bioperl-network-1.5.2_100.zip

http://bioperl.org/DIST/SIGNATURES.md5

(all are also available via CVS, and for Windows users, using the Perl 
Package Manager - see the wiki for details)

The other bioperl packages (bioperl-ext, bioperl-gui, bioperl-pedigree 
and bioperl-pipeline) did not see a unified release for 1.5.2.


This release represents a developer release which has been thoroughly
tested. We consider it the most stable (in terms of bugs) version of 
Bioperl and believe it to be suitable for most people. It is marked 
'developer' or even 'unstable' because its API may change on short 
notice. It will also not be maintained or supported beyond the next 
bioperl release.

1.5.2 introduces the following new (core) features:

  * Taxonomy (Bio::Species) overhaul
  * Bio::Map improvements
  * Bio::SearchIO speedup
  * Build.PL installation

For details, and a complete change log, see the wiki.

API documentation is available here: http://doc.bioperl.org/


Acknowledgements:
Innumerable thanks are due for the tireless efforts of Christopher Fields 
(bug fixing, testing, documentation, discussion), Nathan Haigh 
(Windows&pre-requisite issues, testing) and Mauricio Herrera Cuadra 
(testing, documentation, support). Feedback and ideas provided by Hilmar 
Lapp, Jason Stajich, Torsten Seemann and others on the mailing list and 
elsewhere proved invaluable. None of this would have been possible 
without the behind-the-scenes work of the open-bio support team. I'd 
also like to acknowledge Andreas J. Koenig for his help with CPAN matters.

Finally, thank you to everyone who tried out the release candidates, and 
especially those that took the time to file bug reports or report problems.


Remember, Bioperl can only go from strength to strength with /your/ 
help. If you'd like to experience the fame and fortune that naturally 
follow becoming a Bioperl developer (?!), become one!
http://www.bioperl.org/wiki/Becoming_a_developer

On behalf of the Bioperl team,
Sendu Bala.
_______________________________________________
Bioperl-announce-l mailing list
Bioperl-announce-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-announce-l


From cjfields at uiuc.edu  Thu Dec  7 23:00:43 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 7 Dec 2006 16:00:43 -0700
Subject: [Bioperl-l] [Bioperl-announce-l]  Bioperl 1.5.2 Release
In-Reply-To: <457746B5.2020006@sendu.me.uk>
Message-ID: <000001c71a53$85cb4f10$6400a8c0@CodonSolutions.local>

Great job Sendu!  

A bit of icing on the cake: all the WinXP PPMs (core, db, network, run)
installed w/o a hitch following normal instructions using PPM4 (GUI and
command line shell) using clean ActiveState installations.  Looks like all
the correct prereqs were installed with shell (only XML::SAX::ExpatXS was
left out in the GUI installation for reasons outlined before).  

I'll run more tests tomorrow to see if tests pass with the installed bioperl
(this should catch any prereq issues with PPM installation we missed).

chris



_______________________________________________
Bioperl-announce-l mailing list
Bioperl-announce-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-announce-l


From kaboroev at sfu.ca  Thu Dec  7 22:26:35 2006
From: kaboroev at sfu.ca (Keith Anthony Boroevich)
Date: Thu, 07 Dec 2006 14:26:35 -0800
Subject: [Bioperl-l] Bio::Graphics xyplot
Message-ID: <4578951B.5050206@sfu.ca>

Hi everyone,

I'm attempting to add an xyplot of the phred quality scores to a
Bio::Graphics image, and cannot get it to work.
I have the panel with a track for both the scale and the DNA displaying
properly.  When I attempt to add the xyplot I just get a garbled track
of what look like tiny xyplots, one for each data point.  I am running
the CVS version of bioperl-live (updated today).  I think what I am missing is
the creation of a "Sequence Feature Group" to hold the individual points
of the plot.  However, I cannot seem to find such an object. This is
what I attempted:

-------BEGIN---CODE-----------
# start panel
my $panel = Bio::Graphics::Panel->new(-length    => $f_seqlen,
                      -width     => $f_seqlen*10,
                      -pad_left  => 10,
                      -pad_right => 10,
                      -grid      => 1
                      );
# add scale
$panel->add_track(arrow =>
Bio::SeqFeature::Generic->new(-start=>1,-end=>$f_seqlen),
              -double  => 1,
              -tick    => 2,
              -fgcolor => 'black');
# add DNA ($feature is of type Bio::SeqFeature::Annotated)
$panel->add_track(dna => $feature);
# get list of quality scores from database
my ($pqs_value) = $dbh->selectrow_array($sql);
my @pqs_value = split(/\s/,$pqs_value);
# create track
my $track =  $panel->add_track(-glyph        => 'xyplot',
                   -graph_type   => 'points',
                   -point_symbol => 'point',
                   -max_score    => 100,
                   -min_score    => 0,
                   -scale        => 'none');
# add "subfeatures" to the track
for (my $i=0;$i<$f_seqlen;$i++) {
    $track->add_feature(Bio::SeqFeature::Generic->new(-start=>$i,-end=>$i,-score=>$pqs_value[$i]));
}
print $panel->png();
$panel->finished;
------END---CODE----------

I also attempted to create an array of the point features and passed
that by reference to the panel "add_track" as it describes in the xyplot
documentation, but that resulted in the exact same image.

keith

-- 
 ><)))?> -cGRASP- <?(((><
 Keith Anthony Boroevich
 Davidson Lab
 Dept of Molecular Biology
 Simon Fraser University
 Tel: 604-268-7276


From arareko at campus.iztacala.unam.mx  Thu Dec  7 23:15:53 2006
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Thu, 7 Dec 2006 16:15:53 -0700
Subject: [Bioperl-l] [Bioperl-announce-l]  Bioperl 1.5.2 Release
In-Reply-To: <457746B5.2020006@sendu.me.uk>
References: <457746B5.2020006@sendu.me.uk>
Message-ID: <000001c71a55$a479da60$6400a8c0@CodonSolutions.local>

This has been a great effort. Congrats and thanks to everyone involved!

Mauricio.

Sendu Bala wrote:
> I am proud to announce the final release of Bioperl 1.5.2.
> 
> http://www.bioperl.org/wiki/Release_1.5.2
> 
> bioperl (core):
> cpan>install S/SE/SENDU/bioperl-1.5.2_100.tar.gz
> http://bioperl.org/DIST/bioperl-1.5.2_100.tar.gz
> http://bioperl.org/DIST/bioperl-1.5.2_100.tar.bz2
> http://bioperl.org/DIST/bioperl-1.5.2_100.zip
> 
> bioperl-run:
> cpan>install S/SE/SENDU/bioperl-run-1.5.2_100.tar.gz
> http://bioperl.org/DIST/bioperl-run-1.5.2_100.tar.gz
> http://bioperl.org/DIST/bioperl-run-1.5.2_100.tar.bz2
> http://bioperl.org/DIST/bioperl-run-1.5.2_100.zip
> 
> bioperl-db:
> cpan>install S/SE/SENDU/bioperl-db-1.5.2_100.tar.gz
> http://bioperl.org/DIST/bioperl-db-1.5.2_100.tar.gz
> http://bioperl.org/DIST/bioperl-db-1.5.2_100.tar.bz2
> http://bioperl.org/DIST/bioperl-db-1.5.2_100.zip
> 
> bioperl-network:
> cpan>install S/SE/SENDU/bioperl-network-1.5.2_100.tar.gz
> http://bioperl.org/DIST/bioperl-network-1.5.2_100.tar.gz
> http://bioperl.org/DIST/bioperl-network-1.5.2_100.tar.bz2
> http://bioperl.org/DIST/bioperl-network-1.5.2_100.zip
> 
> http://bioperl.org/DIST/SIGNATURES.md5
> 
> (all are also available via CVS, and for Windows users, using the Perl 
> Package Manager - see the wiki for details)
> 
> The other bioperl packages (bioperl-ext, bioperl-gui, bioperl-pedigree 
> and bioperl-pipeline) did not see a unified release for 1.5.2.
> 
> 
> 
> This release represents a developer release which has been thoroughly
> tested. We consider it the most stable (in terms of bugs) version of 
> Bioperl and believe it to be suitable for most people. It is marked 
> 'developer' or even 'unstable' because its API may change on short 
> notice. It will also not be maintained or supported beyond the next 
> bioperl release.
> 
> 1.5.2 introduces the following new (core) features:
> 
>   * Taxonomy (Bio::Species) overhaul
>   * Bio::Map improvements
>   * Bio::SearchIO speedup
>   * Build.PL installation
> 
> For details, and a complete change log, see the wiki.
> 
> API documentation is available here: http://doc.bioperl.org/
> 
> 
> Acknowledgements:
> Innumerable thanks are due for the tireless efforts of Christopher Fields
> (bug fixing, testing, documentation, discussion), Nathan Haigh 
> (Windows&pre-requisite issues, testing) and Mauricio Herrera Cuadra 
> (testing, documentation, support). Feedback and ideas provided by Hilmar 
> Lapp, Jason Stajich, Torsten Seemann and others on the mailing list and 
> elsewhere proved invaluable. None of this would have been possible 
> without the behind-the-scenes work of the open-bio support team. I'd 
> also like to acknowledge Andreas J. Koenig for his help with CPAN matters.
> 
> Finally, thank you to everyone who tried out the release candidates, and 
> especially those that took the time to file bug reports or report problems.
> 
> 
> Remember, Bioperl can only go from strength to strength with /your/ 
> help. If you'd like to experience the fame and fortune that naturally 
> follow becoming a Bioperl developer (?!), become one!
> http://www.bioperl.org/wiki/Becoming_a_developer
> 
> On behalf of the Bioperl team,
> Sendu Bala.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM

_______________________________________________
Bioperl-announce-l mailing list
Bioperl-announce-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-announce-l


From cain at cshl.edu  Thu Dec  7 22:46:09 2006
From: cain at cshl.edu (Scott Cain)
Date: Thu, 07 Dec 2006 17:46:09 -0500
Subject: [Bioperl-l] Using frame info from GFF in getting
	a	Seq->spliced_seq
In-Reply-To: <B9182BFF5B004245BABC12956EA6322E022BE901@huls5.nucleus.harvard.edu>
References: <B9182BFF5B004245BABC12956EA6322E022BE901@huls5.nucleus.harvard.edu>
Message-ID: <1165531569.2569.49.camel@localhost.localdomain>

Amir,

I don't know for sure what the problem is, but here is one possibility:
the number in column 8 of a GFF file is not the frame, it is the phase.
See the GFF3 spec for a description of what the phase is:

  http://www.sequenceontology.org/gff3.shtml

(It doesn't matter if you are using GFF3 or GFF2, as the phase is the
same in both).
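Column 8 can be read without any BioPerl machinery. The following is a minimal pure-Perl sketch (not how Bio::Tools::GFF works internally) that splits one tab-delimited GFF line into its nine columns; the function name parse_gff_line and the sample CDS line are made up for illustration.

```perl
use strict;
use warnings;

# Split one GFF line into its nine tab-separated columns and return
# them as a hash reference. Column 8 (index 7) holds the phase for CDS
# features: 0, 1 or 2, or '.' when it does not apply.
sub parse_gff_line {
    my ($line) = @_;
    chomp $line;
    my @col = split /\t/, $line, 9;
    die "expected 9 columns, got " . scalar(@col) . "\n" unless @col == 9;
    my %f;
    @f{qw(seqid source type start end score strand phase attributes)} = @col;
    return \%f;
}

# Hypothetical CDS line, for illustration only.
my $f = parse_gff_line(join("\t",
    'contig1', 'pred', 'CDS', 1001, 1029, '.', '+', 2, 'Parent=gene1'));
print "phase of $f->{type} at $f->{start}..$f->{end}: $f->{phase}\n";
```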

Scott


On Thu, 2006-12-07 at 16:32 -0500, Amir Karger wrote:
> I need to know how to get the frame information in exon features
> (created by Bio::Tools::GFF) into a whole-gene feature that will be
> translated into a protein.
> 
> I'm reading in some fungal GFFs generated by Jason Stajich. I
> 
> - Use Bio::Tools::GFF to create a feature for each exon in a gene
> - Create a Bio::Location::Split object containing each feature's
> location
> - Create a Bio::SeqFeature::Generic object whose location is the above
> BL::Split
> - Attach my contig Bio::Seq to the feature
> - get the protein with feature->spliced_seq->translate->seq
> 
> (Code below)
> 
> Unfortunately, I get the wrong result when the GFF features have frame
> != 0. This happens for only a few percent of the exons, but when it
> does, I end up translating in the wrong frame.
> 
> If I read the docs correctly, Location objects don't have a frame. So
> how do I get the correct spliced_seq, which skips one or two bp at the
> beginning of certain exons?
> 
> I suspect the answer to this is that I'm going about this in completely
> the wrong way, in which case, please tell me how I ought to be doing it.
> 
> Thanks,
> - Amir Karger
> Research Computing
> Life Sciences Division
> Harvard University
> 
> P.S. In case you want to see actual code, here it is. After using
> Bio::Tools::GFF to create a sorted list of features for each exon
> (basically stolen from the module POD), I:
>     # Create a new object representing the exons' gene
>     my $coding_loc_obj = new Bio::Location::Split;
>     foreach my $exon (@sorted_exons) {
>         $coding_loc_obj->add_sub_Location($exon->location);
>     }
> 
>     # Build a spliced feature representing the whole gene
>     my $spliced_feat = new Bio::SeqFeature::Generic(
>         -start  => $coding_loc_obj->start,
>         -end    => $coding_loc_obj->end,
>         -strand => $strand_num,
>         -primary=> "splicedGene",
>     );
>     $spliced_feat->location($coding_loc_obj);
> 
>     # Attach a contig object containing the sequence
>     $spliced_feat->attach_seq($contig_obj->bioperl_object);
> 
>     # Get the spliced seq and translate to protein:
>     my $coding_seq = $spliced_feat->spliced_seq->seq;
>     my $protein = $spliced_feat->spliced_seq->translate->seq;
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From cjfields at uiuc.edu  Fri Dec  8 02:52:47 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 7 Dec 2006 20:52:47 -0600
Subject: [Bioperl-l] Using frame info from GFF in getting a Seq->spliced_seq
In-Reply-To: <1165531569.2569.49.camel@localhost.localdomain>
Message-ID: <002d01c71a73$f16ecc40$15327e82@pyrimidine>

Another issue is that splittype() is not defined, though I don't think that
would kill anything as currently implemented.  However, one thing we have
passingly discussed is having Bio::Location::Split objects possibly exhibit
different (but expected) behaviors based upon the splittype() (order, join,
or bond).  It's one of the things I want to work out for the next release.

If Scott's fix doesn't work and the problem persists, you should file a bug
report with some sample data for us to test out.

chris

> Amir,
> 
> I don't know for sure what the problem is, but here is one 
> possibility:
> the number in column 8 of a GFF file is not the frame, it is 
> the phase.
> See the GFF3 spec for a description of what the phase is:
> 
>   http://www.sequenceontology.org/gff3.shtml
> 
> (It doesn't matter if you are using GFF3 or GFF2, as the 
> phase is the same in both).
> 
> Scott


From jason at bioperl.org  Fri Dec  8 02:01:33 2006
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 7 Dec 2006 18:01:33 -0800
Subject: [Bioperl-l] Using frame info from GFF in getting a
	Seq->spliced_seq
In-Reply-To: <B9182BFF5B004245BABC12956EA6322E022BE901@huls5.nucleus.harvard.edu>
References: <B9182BFF5B004245BABC12956EA6322E022BE901@huls5.nucleus.harvard.edu>
Message-ID: <866F6CEE-62BB-4880-9B13-6DDE29EAF94E@bioperl.org>

This was a problem in the gene prediction output, I suspect; more
recent versions of the program should have fixed this. I do not
currently have free time to deal with the errors in the small number
of ORFs where this has happened.

I think you just need to do
  start -= start- (frame*strand)
for 1st exons.

You can also probably provide the 1st exon's frame to the translate  
function as another possibility but you should try and get the CDS  
correct first depending on your downstream analyses.

-jason


From neetisomaiya at gmail.com  Fri Dec  8 10:21:50 2006
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Fri, 8 Dec 2006 15:51:50 +0530
Subject: [Bioperl-l] need help with phrap parser
Message-ID: <764978cf0612080221o709514a1rf5f97054c5eabb51@mail.gmail.com>

Can anyone point me to a Phrap parser which parses the ace file to extract
which reads make up each contig (e.g. read_a and read_b make contig1; read_d,
read_e and read_z make contig2), along with other information about the reads
(such as whether each read is complemented with respect to the contig, and
which region of the contig each read contributes)? Basically I need the AF
and BS lines of the ACE output.

-- 
-Neeti
Even my blood says, B positive


From pmiguel at purdue.edu  Fri Dec  8 14:17:02 2006
From: pmiguel at purdue.edu (Phillip San Miguel)
Date: Fri, 08 Dec 2006 09:17:02 -0500
Subject: [Bioperl-l] need help with phrap parser
In-Reply-To: <764978cf0612080221o709514a1rf5f97054c5eabb51@mail.gmail.com>
References: <764978cf0612080221o709514a1rf5f97054c5eabb51@mail.gmail.com>
Message-ID: <457973DE.6050900@purdue.edu>

neeti somaiya wrote:
> Can anyone point me to a Phrap parser which parses the ace file to extract
> what reads make up each contig (eg. read_a and read_b make contig1; read_d
> read_e and read_z make contig2, and other information of the reads (like
> whether the read is complemented or not with respect to the contig, what
> region of the contig does each read contribute etc), basically the AF and BS
> lines of the ACE output.
>
>   
neeti,

To find the reads that went into each contig, you do *not* want the BS-tagged
records. My understanding is that BS is just what Consed uses to populate its
consensus line from the ace file. I write this because of an email sent to me
by David Gordon in 2001, included here without his permission:


> > Phrap writes BS lines which
> > indicate, for each consensus position, which read phrap uses at that
> > position to become the consensus.  These BS ("base segments") are 
> > manipulated by Consed when there are changes to the assembly, such as
> > joins, tears, removing reads, or changing the consensus.
>   
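The simplest way is:

egrep '^(CO|AF|RD) ' acefilename

if you are on a Unix system, or with Perl:

use strict;
use warnings;
# anchor all three tags so that only CO, AF and RD header lines match
while (<>) {
    print if /^(?:CO|AF|RD) /;
}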

But then you would need to parse the fields of interest. You get the 
position/strand in the contig from AF, then you get the length of the 
read from RD.
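Following that recipe, here is a minimal pure-Perl sketch that collects, for each contig, its reads with orientation and padded start (from AF) and padded length (from RD). It assumes the usual Consed ACE layout (CO <name> <bases> <reads> <segments> <U|C>, AF <read> <U|C> <padded start>, RD <read> <padded bases> <info items> <tags>); parse_ace_reads and the tiny ACE fragment are made up for illustration, and BS, QA and DS lines are simply ignored.

```perl
use strict;
use warnings;

# For each contig, record its reads with orientation (U = uncomplemented,
# C = complemented), padded start on the consensus (AF lines) and padded
# read length (RD lines). Sequence lines never contain whitespace, so the
# "\s+" after each tag keeps them from matching.
sub parse_ace_reads {
    my ($fh) = @_;
    my (%contigs, $contig);
    while (my $line = <$fh>) {
        if ($line =~ /^CO\s+(\S+)/) {
            $contig = $1;
            $contigs{$contig} = {};
        }
        elsif ($line =~ /^AF\s+(\S+)\s+([UC])\s+(-?\d+)/ && defined $contig) {
            # the padded start can be negative if a read overhangs the contig
            $contigs{$contig}{$1}{strand} = $2;
            $contigs{$contig}{$1}{start}  = $3;
        }
        elsif ($line =~ /^RD\s+(\S+)\s+(\d+)/ && defined $contig) {
            $contigs{$contig}{$1}{length} = $2;
        }
    }
    return \%contigs;
}

# Tiny made-up ACE fragment, read from an in-memory filehandle.
my $ace = <<'ACE';
CO Contig1 20 2 1 U
ACGTACGTACGTACGTACGT

AF read_a U 1
AF read_b C 6
RD read_a 12 0 0
ACGTACGTACGT

RD read_b 15 0 0
ACGTACGTACGTACG
ACE
open my $in, '<', \$ace or die "cannot open in-memory string: $!";
my $reads = parse_ace_reads($in);
for my $read (sort keys %{ $reads->{Contig1} }) {
    my $r = $reads->{Contig1}{$read};
    printf "%s %s start=%d length=%d\n",
        $read, $r->{strand}, $r->{start}, $r->{length};
}
```

On a real file you would open the .ace file yourself and pass that filehandle to parse_ace_reads() instead of the in-memory string.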

There does seem to be a part of BioPerl meant to perform this task,
Bio::Assembly::IO::ace, but it looks like it was started and never
completed.


From cjfields at uiuc.edu  Fri Dec  8 15:17:31 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 8 Dec 2006 09:17:31 -0600
Subject: [Bioperl-l] NAR Database Issue Papers
Message-ID: <000601c71adb$fdd60490$15327e82@pyrimidine>

For those interested, the Nucleic Acids Research Database issue papers have
been popping up in the Advance Access section of the NAR website:

http://nar.oxfordjournals.org/papbyrecent.dtl

Ensembl, UCSC Browser, Entrez Gene, and a number of others of possible interest are
represented.  Of particular note are a few mentions of formatting changes to
UniProt, EMBL, and other records, which should be taken care of in the
latest BioPerl release (fingers crossed!).

chris


From cjfields at uiuc.edu  Fri Dec  8 15:31:19 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 8 Dec 2006 09:31:19 -0600
Subject: [Bioperl-l] need help with phrap parser
In-Reply-To: <457973DE.6050900@purdue.edu>
Message-ID: <000001c71add$ec7147d0$15327e82@pyrimidine>

...
> But then you would need to parse the fields of interest. You get the 
> position/strand in the contig from AF, then you get the length of the 
> read from RD.
> 
> There does look like there is a part of bioperl that meant to perform 
> this task--including Bio::Assembly::IO::ace but it looks like it was 
> started, but never completed.

...and if anyone wants to chip in and work on it, let us know! The various
Bio::Assembly modules are one of many areas that need some updating.

chris


From akarger at CGR.Harvard.edu  Fri Dec  8 18:25:47 2006
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Fri, 8 Dec 2006 13:25:47 -0500
Subject: [Bioperl-l] Using frame info from GFF in getting a
	Seq->spliced_seq
Message-ID: <B9182BFF5B004245BABC12956EA6322E022BEA6D@huls5.nucleus.harvard.edu>

> This was a problem in the gene prediction output I suspect, more  
> recent versions of the program should have fixed this.  I do not  
> currently have free time to deal with the errors in the small number  
> of ORFs where this has happened.
> 
> I think you just need to do
>   start -= start- (frame*strand)
> for 1st exons.

I used
    if ($strand == 1) { $start += $exon->frame }
    else              { $end   -= $exon->frame }

This took me from 90 translations that had an internal stop (*) to just
9, out of 5500 CDSs in S. bayanus.
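The same adjustment as a small self-contained Perl function, for anyone wanting to reuse it. This is a sketch assuming GFF-style coordinates (start <= end on both strands) and strand encoded as 1/-1; trim_first_exon is a hypothetical name, and in a real script the values would come from the exon feature's start, end, strand and frame methods.

```perl
use strict;
use warnings;

# Trim a first CDS exon so that it begins on a complete codon.
# Coordinates are 1-based with $start <= $end on both strands (as in GFF);
# $strand is 1 or -1 and $phase is the GFF column-8 value (0, 1 or 2).
# The 5' end is $start on the plus strand and $end on the minus strand.
sub trim_first_exon {
    my ($start, $end, $strand, $phase) = @_;
    $phase = 0 if !defined $phase || $phase eq '.';
    if ($strand == 1) { $start += $phase }
    else              { $end   -= $phase }
    return ($start, $end);
}

my ($s, $e) = trim_first_exon(100, 160, -1, 2);
print "$s..$e\n";   # 100..158
```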

> You can also probably provide the 1st exon's frame to the translate  
> function as another possibility but you should try and get the CDS  
> correct first depending on your downstream analyses.

Yes, I think. Scott Cain pointed out that GFF column 8 is the "phase",
which I had never heard of before. My current, very limited,
understanding is that sometimes you'll have an exon with, say, 31 bp,
followed by an exon with 29 bp. When the intron gets spliced out, you
eventually get an mRNA of 60 bp, which translates to a protein of 20 aa.
But the second exon has a phase of 1, not 0, because you can't just
start translating at the first bp of the second exon and expect to get
nice amino acids.
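For reference, the GFF3 spec defines phase as the number of bases to remove from the beginning of a CDS feature to reach the first base of the next codon. The minimal Perl sketch below (cds_phases is a hypothetical helper) computes that value for consecutive CDS segments, assuming the first one starts on a codon boundary. For the 31 bp + 29 bp example it gives phases 0 and 2: the last base of the first exon begins a codon, and the first two bases of the second exon complete it.

```perl
use strict;
use warnings;

# Compute GFF3 phases for consecutive CDS segments of one transcript,
# given their lengths in transcription order, assuming the first
# segment starts with a complete codon (phase 0).
sub cds_phases {
    my @lengths = @_;
    my ($coding_so_far, @phases) = (0);
    for my $len (@lengths) {
        # bases still needed to finish the codon left open by earlier segments
        push @phases, (3 - $coding_so_far % 3) % 3;
        $coding_so_far += $len;
    }
    return @phases;
}

print join(',', cds_phases(31, 29)), "\n";   # 0,2
```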

By the way, whether or not phase is the same thing as frame, when I call
the frame() method on the features created by Bio::Tools::GFF, I get the
phase info. I assume that's a feature (no pun intended), not a bug?

I'm still confused as to why you would have a phase in the first exon,
though. Why not just say the CDS starts 1 or 2 bp later? (This is
probably a bio question, not a bioperl question, but a quick Google
didn't get me an answer. "Phase" isn't a very good search term.)

I guess the real question here, which Jason alludes to, is whether
SeqFeature->spliced_seq ought to take into account the phase information
of the first exon. Right now, it doesn't, so when you call
SeqFeature->spliced_seq->translate, you get gibberish. Are there cases
where you would want spliced_seq to include the first bp or two? Should
there be an option to spliced_seq for whether you want to take phase
information into account?

I can't submit a bug report until we confirm it's a bug.

Thanks,
-Amir Karger



From akarger at CGR.Harvard.edu  Fri Dec  8 18:33:09 2006
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Fri, 8 Dec 2006 13:33:09 -0500
Subject: [Bioperl-l] Using frame info from GFF in getting a Seq->spliced_seq
Message-ID: <B9182BFF5B004245BABC12956EA6322E022BEA71@huls5.nucleus.harvard.edu>

> Another issue is the splittype() is not defined, though I 
> don't think that
> would kill anything as currently implemented.  However, one 
> thing we have
> passingly discussed is having Bio::Location::Split objects 
> possibly exhibit
> different (but expected) behaviors based upon the splittype() 
> (order, join,
> or bond).  It's one of the things I want to work out for the 
> next release.

Should I be writing -splittype => "JOIN" or some such in my new()?

-Amir Karger



From cjfields at uiuc.edu  Fri Dec  8 19:04:55 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 8 Dec 2006 13:04:55 -0600
Subject: [Bioperl-l] Using frame info from GFF in getting a Seq->spliced_seq
In-Reply-To: <B9182BFF5B004245BABC12956EA6322E022BEA71@huls5.nucleus.harvard.edu>
Message-ID: <000901c71afb$bf504210$15327e82@pyrimidine>


> > Another issue is the splittype() is not defined, though I 
> don't think 
> > that would kill anything as currently implemented.  
> However, one thing 
> > we have passingly discussed is having Bio::Location::Split objects 
> > possibly exhibit different (but expected) behaviors based upon the 
> > splittype() (order, join, or bond).  It's one of the things 
> I want to 
> > work out for the next release.
> 
> Should I be writing -splittype => "JOIN" or some such in my new()?
> 
> -Amir Karger

I missed, when looking at the constructor in Location::Split, that 'JOIN' is
the default splittype(), so you actually don't have to set it explicitly;
apologies for that.

If we make any changes that affect how Location::Split behaves we'll likely
leave the default splittype() as 'JOIN' as it's by far the most common join
operator.  

chris


From cjfields at uiuc.edu  Fri Dec  8 20:03:16 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 8 Dec 2006 14:03:16 -0600
Subject: [Bioperl-l] Using frame info from GFF in getting a Seq->spliced_seq
In-Reply-To: <B9182BFF5B004245BABC12956EA6322E022BEA6D@huls5.nucleus.harvard.edu>
Message-ID: <000001c71b03$e6741e90$15327e82@pyrimidine>

> Yes, I think. Scott Cain pointed out that GFF column 8 is the 
> "phase", which I had never heard of before. My current, very 
> limited, understanding is that sometimes you'll have an exon 
> with, say, 31 bp, followed by an exon with 29 bp. When the 
> intron gets spliced out, you eventually get an mRNA of 60 bp, 
> which translates to a protein of 20 aa.
> But the second exon has a phase of 1, not 0, because you 
> can't just start translating at the first bp of the second 
> exon and expect to get nice amino acids.

I think the use of 'frame' here is meant relative to the DNA sequence (i.e.
ORF searching, 6 frames) and the 'phase' is relative to the mRNA (i.e.
translation, three frames).  At least I think that's what is meant!

> By the way, whether or not phase is the same thing as frame, 
> when I call the frame() method on the features created by 
> Bio::Tools::GFF, I get the phase info. I assume that's a 
> feature (no pun intended), not a bug?
> 
> I'm still confused as to why you would have a phase in the 
> first exon, though. Why not just say the CDS starts 1 or 2 bp 
> later? (This is probably a bio question, not a bioperl 
> question, but a quick Google didn't get me an answer. "Phase" 
> isn't a very good search term.)

It could be b/c the location coordinates delineate the exon coding boundary.
It's conceivable the first exon in a sequence record is not the first exon
of the mRNA (i.e. there may be one or more exons prior to or past the exon
of interest that are in 'remote' sequence records).  Like this admittedly
extreme example (GB acc AF130134):

join(AF130124.1:2563..2964,AF130125.1:21..157,AF130126.1:12..174,
AF130127.1:21..112,AF130128.1:21..162,AF130128.1:281..595,
AF130128.1:661..842,AF130128.1:916..1030,AF130129.1:21..115,
AF130130.1:21..165,AF130131.1:21..125,AF130132.1:21..428,
AF130132.1:492..746,AF130133.1:21..168,AF130133.1:232..401,
AF130133.1:475..906,AF130133.1:970..1107,AF130133.1:1176..1367,21..128)

Also, the ends of the location may be uncertain ('fuzzy'):

join(complement(1009..>1260),complement(AF081827.1:<1..177))

> I guess the real question here, which Jason alludes to, is whether
> SeqFeature->spliced_seq ought to take into account the phase 
> information
> of the first exon. Right now, it doesn't, so when you call
> SeqFeature->spliced_seq->translate, you get gibberish. Are there cases
> where you would want spliced_seq to include the first bp or 
> two? Should there be an option to spliced_seq for whether you 
> want to take phase information into account?
> 
> I can't submit a bug report until we confirm it's a bug.
> 
> Thanks,
> -Amir Karger

You can already pass the frame or an offset to PrimarySeqI::translate().
Here are the args:

 Args    : -terminator    - character for terminator        default is *
           -unknown       - character for unknown           default is X
           -frame         - frame                           default is 0
           -codontable_id - codon table id                  default is 1
           -complete      - complete CDS expected           default is 0
           -throw         - throw exception if not complete default is 0
           -orf           - find 1st ORF                    default is 0
           -start         - alternative initiation codon
           -codontable    - Bio::Tools::CodonTable object
           -offset        - offset for fuzzy locations      default is 0

The offset comes from some GenBank seqfeatures which have a '/codon_start'
tag indicating which nucleotide to start translation from (1,2,3).  This is
essentially just the phase+1.  We could add a '-phase' argument for
convenience which accepts 0,1,2.
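The codon_start/phase relationship can be made concrete with a minimal pure-Perl sketch. It translates a DNA string after skipping phase leading bases (phase = codon_start - 1), using the NCBI standard code (table 1) packed into a 64-character string. The helper names codon_start_to_phase and translate_phase are hypothetical and not part of BioPerl. Inside BioPerl itself you would pass -frame to translate() as listed above.

```perl
use strict;
use warnings;

# NCBI standard genetic code (table 1). Codons are indexed in TCAG order:
# index = 16*first + 4*second + third, with T=0, C=1, A=2, G=3.
my $AA   = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my %BASE = (T => 0, C => 1, A => 2, G => 3);

# GenBank /codon_start is 1-based (1, 2, 3); GFF phase is 0-based (0, 1, 2).
sub codon_start_to_phase { return $_[0] - 1 }

# Translate $dna after skipping $phase leading bases. A trailing partial
# codon is ignored, and any codon containing a non-ACGT base becomes 'X'.
sub translate_phase {
    my ($dna, $phase) = @_;
    my $coding = uc substr($dna, $phase || 0);
    $coding =~ tr/U/T/;
    my $protein = '';
    for (my $i = 0; $i + 3 <= length $coding; $i += 3) {
        my @b = split //, substr($coding, $i, 3);
        if (grep { !exists $BASE{$_} } @b) {
            $protein .= 'X';
            next;
        }
        $protein .= substr($AA, 16 * $BASE{$b[0]} + 4 * $BASE{$b[1]} + $BASE{$b[2]}, 1);
    }
    return $protein;
}

# codon_start=2 means "skip one base", i.e. phase 1.
my $prot = translate_phase('CATGGCCTAA', codon_start_to_phase(2));
print "$prot\n";    # prints MA*
```

A proposed '-phase' argument would amount to the same skip as codon_start - 1, which is why the two conventions are interchangeable.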

chris


From bobfreemanma at speakeasy.net  Fri Dec  8 20:47:15 2006
From: bobfreemanma at speakeasy.net (Bob Freeman)
Date: Fri, 8 Dec 2006 15:47:15 -0500
Subject: [Bioperl-l] writing blastxml
In-Reply-To: <4b5350650610250820w1498b27dnd155896fbf9a2012@mail.gmail.com>
References: <4b5350650610250728s1a421199if2493c9c4660474d@mail.gmail.com>
	<000301c6f846$d6227760$15327e82@pyrimidine>
	<4b5350650610250820w1498b27dnd155896fbf9a2012@mail.gmail.com>
Message-ID: <p0623090bc19f7f46bd1d@[10.0.107.251]>

Can't seem to find a good post on this to answer my question:

Does anyone know a good way to (re)write BLAST reports in XML format? 
I've got about 30,000 reports I need to rewrite for a (good!) piece 
of java software that will only import xml formatted BLAST reports. 
Right now, all mine are plain text.

I don't think bioperl can do this yet, correct? If not, any 
suggestions, besides reblasting all 30,000? I'd like to save a few 
trees and lumps of coal.

TIA,
Bob

-- 

-----------------------------------------------------
Bob Freeman, Ph.D.
Bioinformatics consultant
51 Downer Avenue, #2
Dorchester, MA  02125
617/699.7057, vox

If brains were taxed, he'd get a refund.
-- Anonymous


From camp_boot at hotmail.com  Sun Dec 10 10:00:55 2006
From: camp_boot at hotmail.com (synapse)
Date: Sun, 10 Dec 2006 10:00:55 +0000 (UTC)
Subject: [Bioperl-l] Driver program for PestFind.pm
Message-ID: <loom.20061210T105614-429@post.gmane.org>

   Dear All, 

   I apologize in advance for my almost total lack of knowledge of perl as a 
programming language. 

   I need to use PestFind program, part of the biop_run package of bioperl. My 
understanding is that I will need a simple wrapper program that will read 
arguments from the command line, and pass them to that module. 

   - Is there such program available that I can just use?

   - Does anyone know if pestfind can work on multiple sequence files (in fasta 
format), or does it only process single sequence files?

   Thanks a lot for the feedback. 


From cjfields at uiuc.edu  Sun Dec 10 18:45:26 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 10 Dec 2006 12:45:26 -0600
Subject: [Bioperl-l] writing blastxml
In-Reply-To: <p0623090bc19f7f46bd1d@[10.0.107.251]>
References: <4b5350650610250728s1a421199if2493c9c4660474d@mail.gmail.com>
	<000301c6f846$d6227760$15327e82@pyrimidine>
	<4b5350650610250820w1498b27dnd155896fbf9a2012@mail.gmail.com>
	<p0623090bc19f7f46bd1d@[10.0.107.251]>
Message-ID: <7FB4EBB9-BEDC-4250-BE2F-3F695D36F350@uiuc.edu>


On Dec 8, 2006, at 2:47 PM, Bob Freeman wrote:

> Can't seem to find a good post on this to answer my question:
>
> Does anyone know a good way to (re)write BLAST reports in XML format?
> I've got about 30,000 reports I need to rewrite for a (good!) piece
> of java software that will only import xml formatted BLAST reports.
> Right now, all mine are plain text.
>
> I don't think bioperl can do this yet, correct? If not, any
> suggestions, besides reblasting all 30,000? I'd like to save a few
> trees and lumps of coal.
>
> TIA,
> Bob

The only BioPerl writers for BLAST reports are in BSML and HTML, not  
BLAST XML.  I don't think there have been any requests for it,
and no one has really stepped forward to submit one.

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Sun Dec 10 18:55:16 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 10 Dec 2006 12:55:16 -0600
Subject: [Bioperl-l] Driver program for PestFind.pm
In-Reply-To: <loom.20061210T105614-429@post.gmane.org>
References: <loom.20061210T105614-429@post.gmane.org>
Message-ID: <32B0F15D-4144-43B6-AA81-5ED9BA848F45@uiuc.edu>


On Dec 10, 2006, at 4:00 AM, synapse wrote:

>    Dear All,
>
>    I apologize in advance for my almost total lack of knowledge of
> perl as a programming language.
>
>    I need to use PestFind program, part of the biop_run package of
> bioperl. My understanding is that I will need a simple wrapper
> program that will read arguments from the command line, and pass
> them to that module.

PestFind (the EMBOSS program epestfind) is part of the EMBOSS suite of programs:

http://emboss.sourceforge.net/

The PestFind module in bioperl-run is actually used via Pise.

>    - Is there such program available that I can just use?

See above

>    - Does anyone know if pestfind can work on multiple sequence
> files (in fasta format), or does it only process single sequence files?
>
>    Thanks a lot for the feedback.

No idea there, but the EMBOSS docs should tell you.
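For the "simple wrapper" question, here is a minimal sketch that runs the EMBOSS program directly on each file named on the command line. It assumes epestfind is on your PATH and uses only the generic EMBOSS qualifiers -sequence, -outfile and -auto. The helper name emboss_cmd and the output file naming are illustrative. Program-specific options (window size, graph output, etc.) should be taken from the EMBOSS documentation.

```perl
use strict;
use warnings;

# Build the argument list for one EMBOSS run. Only the generic
# -sequence/-outfile qualifiers are set here; extra ones go in @extra.
sub emboss_cmd {
    my ($program, $seqfile, $outfile, @extra) = @_;
    return ($program, '-sequence', $seqfile, '-outfile', $outfile, @extra);
}

# One run per FASTA file given on the command line. List-form system()
# bypasses the shell, so file names with spaces need no quoting.
for my $file (@ARGV) {
    (my $base = $file) =~ s/\.\w+$//;
    my @cmd = emboss_cmd('epestfind', $file, "$base.epestfind", '-auto');
    system(@cmd) == 0 or warn "@cmd failed: $?\n";
}
```

Saved as, say, run_pestfind.pl, it would be called as "perl run_pestfind.pl seq1.fa seq2.fa".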

chris


From cjfields at uiuc.edu  Mon Dec 11 05:38:32 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 10 Dec 2006 23:38:32 -0600
Subject: [Bioperl-l] bioperl-run parameter question
Message-ID: <163AF1E6-7CEA-4CAC-9BA1-84DBA95C494E@uiuc.edu>

I am writing up a few bioperl-run modules and have a simple question,  
though I don't know if anyone knows the answer.  I was curious as to  
why parameters for most (all?) bioperl-run modules lack the '-'  
preceding them.  This came up re: StandAloneBlast last week  
(something Torsten fixed), but I noticed just about every bioperl-run  
module uses the dashless parameters.

chris


From n.haigh at sheffield.ac.uk  Mon Dec 11 06:44:25 2006
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Mon, 11 Dec 2006 06:44:25 +0000
Subject: [Bioperl-l] bioperl-run parameter question
In-Reply-To: <163AF1E6-7CEA-4CAC-9BA1-84DBA95C494E@uiuc.edu>
References: <163AF1E6-7CEA-4CAC-9BA1-84DBA95C494E@uiuc.edu>
Message-ID: <457CFE49.5010201@sheffield.ac.uk>

Chris Fields wrote:
> I am writing up a few bioperl-run modules and have a simple question,  
> though I don't know if anyone knows the answer.  I was curious as to  
> why parameters for most (all?) bioperl-run modules lack the '-'  
> preceding them.  This came up re: StandAloneBlast last week  
> (something Torsten fixed), but I noticed just about every bioperl-run  
> module uses the dashless parameters.
>
> chris
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>   

No idea!

Is there any reason for/against using dashed/dashless parameters? I
suppose dashed parameters allow you to easily see which tokens on the
command line are parameters and which are values. Should modules be able
to accept both? Should dashed be preferred?

Nath


From cjfields at uiuc.edu  Mon Dec 11 13:06:32 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 11 Dec 2006 07:06:32 -0600
Subject: [Bioperl-l] bioperl-run parameter question
In-Reply-To: <457CFE49.5010201@sheffield.ac.uk>
References: <163AF1E6-7CEA-4CAC-9BA1-84DBA95C494E@uiuc.edu>
	<457CFE49.5010201@sheffield.ac.uk>
Message-ID: <D223B6BF-7C0C-41BF-B267-8C07F82FDD7D@uiuc.edu>


On Dec 11, 2006, at 12:44 AM, Nathan S. Haigh wrote:

> Chris Fields wrote:
>> I am writing up a few bioperl-run modules and have a simple question,
>> though I don't know if anyone knows the answer.  I was curious as to
>> why parameters for most (all?) bioperl-run modules lack the '-'
>> preceding them.  This came up re: StandAloneBlast last week
>> (something Torsten fixed), but I noticed just about every bioperl-run
>> module uses the dashless parameters.
>>
>> chris
>>
>
> No idea!
>
> Is there any reason for/against using dashed/dashless parameters? I
> suppose dashed parameters allow you to easily see which tokens on the
> command line are parameters and which are values. Should modules be
> able to accept both? Should dashed be preferred?
>
> Nath

I'm thinking about it from the point of consistency.  When using a  
mix of core and run modules it can be a bit confusing, particularly  
when (as pointed out in the previous thread on StandAloneBlast) you  
can use only dashed parameters with core modules, while most (all?)  
run modules only accept dashless ones (in most cases some exception  
is thrown).  Torsten fixed this in StandAloneBlast so it accepts  
both, but shouldn't this rule also apply to all run modules?
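Accepting both styles can be as simple as stripping an optional leading dash from each key before storing it. The sketch below shows one way to do that. The helper is hypothetical and is not the code Torsten added to StandAloneBlast. Case is deliberately preserved because some programs (blastall, for example) treat -F and -f as different options.

```perl
use strict;
use warnings;

# Accept 'name => value' pairs whose names may or may not carry a leading
# dash, and return them keyed by the dashless name (case preserved).
sub normalize_params {
    my @args = @_;
    die "odd number of arguments\n" if @args % 2;
    my %p;
    while (my ($k, $v) = splice(@args, 0, 2)) {
        (my $name = $k) =~ s/^-//;
        $p{$name} = $v;
    }
    return %p;
}

# Single-letter dashed keys are quoted so Perl never sees them as filetests.
my %opts = normalize_params('-e' => 10, p => 'blastp', '-F' => 'F');
# %opts is now (e => 10, p => 'blastp', F => 'F')
```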

Much of this is probably due to the donated nature of much
of the bioperl-run code and Jason's 'cat-herding', and I understand
that it would be a lot of work to change this for all run modules.
However, we could at least try to start enforcing some loose rules
with new bioperl-run wrappers (e.g. implement WrapperBase, use core-like
parameters, etc).

chris


From akarger at CGR.Harvard.edu  Mon Dec 11 16:20:03 2006
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Mon, 11 Dec 2006 11:20:03 -0500
Subject: [Bioperl-l] Using frame info from GFF in getting
	aSeq->spliced_seq
Message-ID: <B9182BFF5B004245BABC12956EA6322E0254DD2E@huls5.nucleus.harvard.edu>

Chris Fields wrote:
> 
> > Yes, I think. Scott Cain pointed out that GFF column 8 is the 
> > "phase", which I had never heard of before. My current, very 
> > limited, understanding is that sometimes you'll have an exon 
> > with, say, 31 bp, followed by an exon with 29 bp. When the 
> > intron gets spliced out, you eventually get an mRNA of 60 bp, 
> > which translates to a protein of 20 aa.
> > But the second exon has a phase of 1, not 0, because you 
> > can't just start translating at the first bp of the second 
> > exon and expect to get nice amino acids.
> 
> I think the use of 'frame' here is meant relative to the DNA
> sequence (i.e. ORF searching, 6 frames) and the 'phase' is relative
> to the mRNA (i.e. translation, three frames).  At least I think
> that's what is meant!

I agree. By the way, I'd love a reference to a simple bio-explanation of
what's happening here. Google searches for "coding sequence phase" are
not all that relevant.
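The 31 bp + 29 bp example quoted above can be worked through with the GFF3 definition of phase. The specification defines phase as the number of bases to remove from the start of a CDS segment to reach the first base of the next codon. By that definition the second exon gets phase 2 rather than 1. The 31st base of exon 1 is the first base of codon 11, so the first two bases of exon 2 finish that codon, and 31 + 29 = 60 bp gives the 20 codons. The pure-Perl sketch below does the bookkeeping; the function name cds_phases is hypothetical.

```perl
use strict;
use warnings;

# GFF3 phase = number of bases to skip at the start of a CDS segment to
# reach the first base of the next codon. Given the phase of the first
# segment and the segment lengths in transcript order, return all phases.
sub cds_phases {
    my ($first_phase, @lengths) = @_;
    my @phases = ($first_phase);
    for my $i (0 .. $#lengths - 1) {
        # Bases of segment $i that fall in codons starting inside it.
        my $used = $lengths[$i] - $phases[$i];
        push @phases, (3 - $used % 3) % 3;
    }
    return @phases;
}

my @p = cds_phases(0, 31, 29);
print "@p\n";    # prints "0 2"
```

If the first exon itself had phase 1, its remaining 30 bases would be exactly 10 codons, and cds_phases(1, 31, 29) gives "1 0".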

> > I'm still confused as to why you would have a phase in the 
> > first exon, though. Why not just say the CDS starts 1 or 2 bp 
> > later? (This is probably a bio question, not a bioperl 
> > question, but a quick Google didn't get me an answer. "Phase" 
> > isn't a very good search term.)
> 
> It could be b/c the location coordinates delineate the exon
> coding boundary. It's conceivable the first exon in a sequence
> record is not the first exon of the mRNA (i.e. there may be one or
> more exons prior to or past the exon of interest that are in
> 'remote' sequence records).

That's certainly not the case here, because the files have the entire
genomes in them.

> Also, the ends of the location may be uncertain ('fuzzy'):
> 
> join(complement(1009..>1260),complement(AF081827.1:<1..177))

Also not the case here. These locations aren't listed as fuzzy.

Any other thoughts?

> > I guess the real question here, which Jason alludes to, is whether
> > SeqFeature->spliced_seq ought to take into account the phase information
> > of the first exon. Right now, it doesn't, so when you call
> > SeqFeature->spliced_seq->translate, you get gibberish. Are there cases
> > where you would want spliced_seq to include the first bp or 
> > two? Should there be an option to spliced_seq for whether you 
> > want to take phase information into account?
> 
> You can already pass the frame or an offset to PrimarySeqI::translate().
> We could add a '-phase' argument for convenience which accepts 0,1,2.

But as Jason pointed out, you should find the problem earlier. What if I
want to get the RNA sequence that will become the protein? Then having a
phase arg to translate() doesn't help. Should there be a phase arg to
spliced_seq?

Which raises another bio question: at what point are the first 1 or 2 bp
dropped when you have a phase of 1 or 2? Do they appear in the mRNA? 

-Amir Karger


From bix at sendu.me.uk  Mon Dec 11 18:21:42 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 11 Dec 2006 13:21:42 -0500
Subject: [Bioperl-l] bioperl-run parameter question
In-Reply-To: <163AF1E6-7CEA-4CAC-9BA1-84DBA95C494E@uiuc.edu>
References: <163AF1E6-7CEA-4CAC-9BA1-84DBA95C494E@uiuc.edu>
Message-ID: <457DA1B6.1060706@sendu.me.uk>

Chris Fields wrote:
> I am writing up a few bioperl-run modules and have a simple question,  
> though I don't know if anyone knows the answer.  I was curious as to  
> why parameters for most (all?) bioperl-run modules lack the '-'  
> preceding them.  This came up re: StandAloneBlast last week  
> (something Torsten fixed), but I noticed just about every bioperl-run  
> module uses the dashless parameters.

I didn't follow that particular thread, but from my experience there is 
a useful distinction between bioperl options using the - as normal for 
full consistency with core (eg. -verbose), whilst the options that 
belong to the program the run module is a wrapper for do not take 
dashes. Again, this seems consistent within the run package.

I'd suggest sticking to the current pattern.


Cheers,
Sendu.


From cjfields at uiuc.edu  Mon Dec 11 20:07:16 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 11 Dec 2006 14:07:16 -0600
Subject: [Bioperl-l] bioperl-run parameter question
In-Reply-To: <457DA1B6.1060706@sendu.me.uk>
References: <163AF1E6-7CEA-4CAC-9BA1-84DBA95C494E@uiuc.edu>
	<457DA1B6.1060706@sendu.me.uk>
Message-ID: <F8A9FAC2-A189-463B-B8CA-E66D66863553@uiuc.edu>


On Dec 11, 2006, at 12:21 PM, Sendu Bala wrote:

> Chris Fields wrote:
>> I am writing up a few bioperl-run modules and have a simple  
>> question,  though I don't know if anyone knows the answer.  I was  
>> curious as to  why parameters for most (all?) bioperl-run modules  
>> lack the '-'  preceding them.  This came up re: StandAloneBlast  
>> last week  (something Torsten fixed), but I noticed just about  
>> every bioperl-run  module uses the dashless parameters.
>
> I didn't follow that particular thread, but from my experience  
> there is a useful distinction between bioperl options using the -  
> as normal for full consistency with core (eg. -verbose), whilst the  
> options that belong to the program the run module is a wrapper for  
> do not take dashes. Again, this seems consistent within the run  
> package.

I respectfully disagree that this is a 'useful' distinction.  My main  
point is consistency.  To me, it's counterintuitive to have two  
Bioperl classes, both of which inherit Bio::Root::Root, use two
different syntaxes for any parameters passed to the constructor, even
if some are 'program' parameters.  It's also not consistent with
StandAloneBlast or RemoteBlast, both of which are considered bioperl-run
modules even though they are in core, and both of which use dashed
parameters (StandAloneBlast actually allows both).  In fact, it isn't  
consistent within bioperl-run itself.   
Bio::Tools::Run::EMBOSSApplication uses dashes for parameters in a  
hashref!

Okay, judging by the previous examples, 'consistency' isn't a word I
would use to describe bioperl-run as a whole (back to Jason's
'cat-herding' analogy).  It would be easier to let it slide for now,
especially since changing them would be a serious pain, not to  
mention an API issue.  But shouldn't there be some consistency?

And what about new modules?  Do we follow the historical (possibly  
confusing) 'dashless' route, or use the core-like dashed approach  
(thus breaking from the other run modules)?

> I'd suggest sticking to the current pattern.
>
>
> Cheers,
> Sendu.

I'll allow for both, à la StandAloneBlast.  Doesn't hurt to be safe. ; >

Have fun at the hackathon!

chris


From bix at sendu.me.uk  Mon Dec 11 21:19:55 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 11 Dec 2006 16:19:55 -0500
Subject: [Bioperl-l] bioperl-run parameter question
In-Reply-To: <F8A9FAC2-A189-463B-B8CA-E66D66863553@uiuc.edu>
References: <163AF1E6-7CEA-4CAC-9BA1-84DBA95C494E@uiuc.edu>
	<457DA1B6.1060706@sendu.me.uk>
	<F8A9FAC2-A189-463B-B8CA-E66D66863553@uiuc.edu>
Message-ID: <457DCB7B.8050500@sendu.me.uk>

Chris Fields wrote:
> 
> On Dec 11, 2006, at 12:21 PM, Sendu Bala wrote:
> 
>> Chris Fields wrote:
>>> I am writing up a few bioperl-run modules and have a simple 
>>> question,  though I don't know if anyone knows the answer.  I was 
>>> curious as to  why parameters for most (all?) bioperl-run modules 
>>> lack the '-'  preceding them.  This came up re: StandAloneBlast last 
>>> week  (something Torsten fixed), but I noticed just about every 
>>> bioperl-run  module uses the dashless parameters.
>>
>> I didn't follow that particular thread, but from my experience there 
>> is a useful distinction between bioperl options using the - as normal 
>> for full consistency with core (eg. -verbose), whilst the options that 
>> belong to the program the run module is a wrapper for do not take 
>> dashes. Again, this seems consistent within the run package.
> 
> I respectfully disagree that this is a 'useful' distinction.  My main 
> point is consistency.
[snip]

We're on the same page in terms of what we think would be a Good Thing, 
and allowing both ways (dashed and dashless) sounds reasonable. I was 
just suggesting why bioperl-run might be the way it was. Further to 
that, there is the practical aspect that it is a lot simpler to figure 
out which are the program options so they can be farmed out to the 
AUTOLOAD methods - again something that isn't done in core.

If you come up with some generic way of dealing with options and farming 
to AUTOLOAD, perhaps there's scope for applying it to all the run 
wrappers (ideally via one of their base classes), so they all instantly 
gain dashed-mode capability.
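Once a wrapper knows its program's option names, the dash no longer has to signal which options belong to the program. The hypothetical sketch below relies on that. %PROGRAM_PARAMS stands in for whatever option list a given run module keeps, and the blastall-style names are only examples.

```perl
use strict;
use warnings;

# Hypothetical list of options understood by the wrapped program.
my %PROGRAM_PARAMS = map { $_ => 1 } qw(e F p d outfile);

# Split constructor args into BioPerl-level options and program options.
# A name counts as a program option if its dashless form is listed in
# %PROGRAM_PARAMS, so 'e' and '-e' are treated the same. BioPerl-level
# names are lower-cased (so -Verbose and -verbose agree); program names
# keep their case.
sub split_params {
    my @args = @_;
    my (%bioperl, %program);
    while (my ($k, $v) = splice(@args, 0, 2)) {
        (my $bare = $k) =~ s/^-//;
        if ($PROGRAM_PARAMS{$bare}) { $program{$bare} = $v }
        else                        { $bioperl{lc $bare} = $v }
    }
    return (\%bioperl, \%program);
}

my ($bp, $prog) = split_params(-verbose => 1, e => 1e-5, '-F' => 'F');
# $bp   is { verbose => 1 }
# $prog is { e => 1e-5, F => 'F' }
```

The program options could then be dispatched to accessors with the same names, which is where a shared base class would come in.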


From cjfields at uiuc.edu  Mon Dec 11 22:05:56 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 11 Dec 2006 16:05:56 -0600
Subject: [Bioperl-l] bioperl-run parameter question
In-Reply-To: <457DCB7B.8050500@sendu.me.uk>
References: <163AF1E6-7CEA-4CAC-9BA1-84DBA95C494E@uiuc.edu>
	<457DA1B6.1060706@sendu.me.uk>
	<F8A9FAC2-A189-463B-B8CA-E66D66863553@uiuc.edu>
	<457DCB7B.8050500@sendu.me.uk>
Message-ID: <F046DB23-35C7-414A-8616-46D3C5760B49@uiuc.edu>


On Dec 11, 2006, at 3:19 PM, Sendu Bala wrote:
...

>>
>> I respectfully disagree that this is a 'useful' distinction.  My main
>> point is consistency.
> [snip]
>
> We're on the same page in terms of what we think would be a Good Thing,
> and allowing both ways (dashed and dashless) sounds reasonable. I was
> just suggesting why bioperl-run might be the way it was. Further to
> that, there is the practical aspect that it is a lot simpler to figure
> out which are the program options so they can be farmed out to the
> AUTOLOAD methods - again something that isn't done in core.

Maybe b/c AUTOLOAD is frowned upon for a number of reasons, mainly  
code maintenance.  I'm somewhat neutral on the idea of using AUTOLOAD  
as a short-term solution, though using heredoc and an eval{} block  
works well for me (and shows up when using $self->can('method') or  
when checking for methods via Class::Inspector).
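The "real methods instead of AUTOLOAD" idea can be sketched with closures installed into the symbol table at load time. This swaps in a closure-based technique for the heredoc/string-eval approach described above; the effect is the same. My::Wrapper and its option list are hypothetical.

```perl
package My::Wrapper;    # hypothetical wrapper class
use strict;
use warnings;

my @PROGRAM_PARAMS = qw(e F p d outfile);

# Install one get/set accessor per program option when the module loads.
# Unlike AUTOLOAD, these are real subs, so can() (and tools such as
# Class::Inspector) can see them.
for my $param (@PROGRAM_PARAMS) {
    no strict 'refs';
    *{__PACKAGE__ . "::$param"} = sub {
        my $self = shift;
        $self->{$param} = shift if @_;
        return $self->{$param};
    };
}

sub new {
    my ($class, %args) = @_;
    my $self = {%args};
    return bless $self, $class;
}

package main;

my $w = My::Wrapper->new(e => 10);
$w->F('T');
print join(' ', $w->e, $w->F), "\n";                  # prints "10 T"
print My::Wrapper->can('outfile') ? "yes\n" : "no\n";  # prints "yes"
```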

> If you come up with some generic way of dealing with options and
> farming to AUTOLOAD, perhaps there's scope for applying it to all the
> run wrappers (ideally via one of their base classes), so they all
> instantly gain dashed-mode capability.

I think that's the crux of the problem; they do not all have the same  
base class (except Bio::Root::Root).  Most use WrapperBase.  I  
thought at one point a Run-specific root module would be a good idea,  
but WrapperBase already works well.

I'll go ahead with my modules and think about it some more.  You  
could ask the powers-that-be (jason, hilmar, etc) what they think as  
well.

chris


From bosborne11 at verizon.net  Mon Dec 11 22:24:54 2006
From: bosborne11 at verizon.net (Brian Osborne)
Date: Mon, 11 Dec 2006 17:24:54 -0500
Subject: [Bioperl-l] Using frame info from GFF in getting
	aSeq->spliced_seq
In-Reply-To: <B9182BFF5B004245BABC12956EA6322E0254DD2E@huls5.nucleus.harvard.edu>
Message-ID: <C1A344E6.BE53%bosborne11@verizon.net>

Amir,

Google "intron phase", you will see a number of useful links.

Brian O.


On 12/11/06 11:20 AM, "Amir Karger" <akarger at CGR.Harvard.edu> wrote:

> I agree. By the way, I'd love a reference to a simple bio-explanation of
> what's happening here. Google searches for "coding sequence phase" are
> not all that relevant.


From cjfields at uiuc.edu  Tue Dec 12 03:20:06 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 11 Dec 2006 21:20:06 -0600
Subject: [Bioperl-l] Using frame info from GFF in getting
	aSeq->spliced_seq
In-Reply-To: <B9182BFF5B004245BABC12956EA6322E0254DD2E@huls5.nucleus.harvard.edu>
References: <B9182BFF5B004245BABC12956EA6322E0254DD2E@huls5.nucleus.harvard.edu>
Message-ID: <E6F0CA09-EF9F-42AF-BF67-35E4FDBCAD8C@uiuc.edu>


On Dec 11, 2006, at 10:20 AM, Amir Karger wrote:

>> I think the use of 'frame' here is meant relative to the DNA
>> sequence (i.e. ORF searching, 6 frames) and the 'phase' is relative
>> to the mRNA (i.e. translation, three frames).  At least I think
>> that's what is meant!
>
> I agree. By the way, I'd love a reference to a simple
> bio-explanation of what's happening here. Google searches for
> "coding sequence phase" are not all that relevant.

Ah, Brian found some links I see...

>> It could be b/c the location coordinates delineate the exon
>> coding boundary. It's conceivable the first exon in a sequence
>> record is not the first exon of the mRNA (i.e. there may be one or
>> more exons prior to or past the exon of interest that are in
>> 'remote' sequence records).
>
> That's certainly not the case here, because the files have the entire
> genomes in them.
>
>> Also, the ends of the location may be uncertain ('fuzzy'):
>>
>> join(complement(1009..>1260),complement(AF081827.1:<1..177))
>
> Also not the case here. These locations aren't listed as fuzzy.
>
> Any other thoughts?

Which GFF files did you use?  More specifically, which genes in which  
GFF file?  I saw a reference to S. bayanus, but it's hard to work out  
what could be the problem unless we know a bit more.

>>> I guess the real question here, which Jason alludes to, is whether
>>> SeqFeature->spliced_seq ought to take into account the phase information
>>> of the first exon. Right now, it doesn't, so when you call
>>> SeqFeature->spliced_seq->translate, you get gibberish. Are there cases
>>> where you would want spliced_seq to include the first bp or
>>> two? Should there be an option to spliced_seq for whether you
>>> want to take phase information into account?
>>
>> You can already pass the frame or an offset to PrimarySeqI::translate().
>> We could add a '-phase' argument for convenience which accepts 0,1,2.
>
> But as Jason pointed out, you should find the problem earlier. What if I
> want to get the RNA sequence that will become the protein? Then having a
> phase arg to translate() doesn't help. Should there be a phase arg to
> spliced_seq?

You'll also note Jason mentioned there were possible errors in the
gene prediction programs which produced the output.

spliced_seq() is supposed to return the DNA sequence of a split  
location by splicing together the sublocation sequences in their  
'join' order.  So, if the first exon was out of phase, once spliced  
they should all be out of phase to the same degree, assuming all  
exons are joined together correctly.   Translating this using the  
phase should produce the correct amino acid sequence.
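A toy pure-Perl illustration of that splicing follows. It assumes plus-strand, 1-based inclusive coordinates, and splice_exons is a hypothetical helper, not SeqFeature::spliced_seq itself. The leading base of exon 1 stays in the spliced string. Skipping it once, by the first exon's phase, puts every later codon in frame.

```perl
use strict;
use warnings;

# Concatenate 1-based, inclusive [start, end] sublocations of $genomic in
# the order given (plus strand only; no reverse complementing).
sub splice_exons {
    my ($genomic, @exons) = @_;
    return join '', map { substr($genomic, $_->[0] - 1, $_->[1] - $_->[0] + 1) } @exons;
}

# Toy contig: exon 1 = 2..6, intron = 7..11 (lower case), exon 2 = 12..16.
my $genomic = 'cCATGGgtaagCCTAAa';
my $spliced = splice_exons($genomic, [2, 6], [12, 16]);    # 'CATGGCCTAA'

# Exon 1 has phase 1, so skip one base, once, on the spliced sequence.
my $in_frame = substr($spliced, 1);                          # ATG GCC TAA
print "$spliced\n$in_frame\n";
```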

Note that Jason suggested passing the frame/phase of the first exon  
to translate(), not spliced_seq().  I also suggested translate().

> Which raises another bio question: at what point are the first 1 or
> 2 bp dropped when you have a phase of 1 or 2? Do they appear in the mRNA?
>
> -Amir Karger

Any sequence present in the sublocations (exons) would be in the  
spliced sequence.  This would have to include those nucleotides in  
exons skipped b/c of the phase since they are part of the coding region.

chris


From neetisomaiya at gmail.com  Tue Dec 12 12:06:20 2006
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Tue, 12 Dec 2006 17:36:20 +0530
Subject: [Bioperl-l] need help in phredPhrap
Message-ID: <764978cf0612120406m796b116dncd3a9e6c82ffe682@mail.gmail.com>

Hi,

I am running phredPhrap, which runs phred, phrap and polyphred. Please refer
to the "Using a reference sequence" section of this link
http://droog.mbt.washington.edu/poly_doc50.html#REFER.
I am using the reference sequence as described in the link above.
With this I am getting the SNP positions on the contig sequence as well as
on the reference sequence.
Does anyone know if there is some output file which can also give me mapping
between contig sequence and reference sequence?
-- 
-Neeti
Even my blood says, B positive


From akarger at CGR.Harvard.edu  Tue Dec 12 16:05:43 2006
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Tue, 12 Dec 2006 11:05:43 -0500
Subject: [Bioperl-l] Using frame info from GFF in getting
	aSeq->spliced_seq
Message-ID: <B9182BFF5B004245BABC12956EA6322E0254DF2D@huls5.nucleus.harvard.edu>

(sorry if this thread is boring people)

Chris Fields wrote: 

> > I agree. By the way, I'd love a reference to a simple
> > bio-explanation of what's happening here. Google searches for
> > "coding sequence phase" are not all that relevant.
> 
> Ah, Brian found some links I see...

Thanks, Brian! Amazing how "coding sequence phase" finds nothing but
"intron phase" finds a ton. This is why you need to actually learn
biology, rather than Googling it.

> Which GFF files did you use?  More specifically, which genes
> in which GFF file?  I saw a reference to S. bayanus, but it's hard
> to work out what could be the problem unless we know a bit more.

http://fungal.genome.duke.edu/annotations/sbay/gff/saccharomyces_bayanus.20031001.AUGUSTUS.gff3.gz
(Thanks for a Really Useful site, Jason!)

c127 (for example) has two lines in that file:
sbay_c127       AUGUSTUS        mRNA    263     723     .       +       .       ID=sbay_c127-g1.1
sbay_c127       AUGUSTUS        CDS     263     723     .       +       1       Parent=sbay_c127-g1.1

Now go to gbrowse page:
http://fungal.genome.duke.edu/cgi-bin/gbrowse/sbay/
Type "sbay_c127:250-300" in the search box. 

As you can see from the translation track, if you start at bp 263, you
hit a stop codon after just a few aas. But if you use frame2/phase 1,
you get no stop codons all the way to the end of the contig.
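The same check can be done offline by counting stop codons in each of the three phases of the CDS region. Below is a pure-Perl sketch; the function name stops_per_phase is hypothetical. For sbay_c127 you would pass it bases 263..723 of the contig, which are not reproduced here. The open phase should show at most a terminal stop.

```perl
use strict;
use warnings;

my %STOP = map { $_ => 1 } qw(TAA TAG TGA);

# Count stop codons when reading $dna from offset 0, 1 and 2
# (phase 0, 1 and 2 on the plus strand). Returns three counts.
sub stops_per_phase {
    my $dna = uc shift;
    my @counts;
    for my $phase (0 .. 2) {
        my $n = 0;
        for (my $i = $phase; $i + 3 <= length $dna; $i += 3) {
            $n++ if $STOP{ substr($dna, $i, 3) };
        }
        push @counts, $n;
    }
    return @counts;
}

my @n = stops_per_phase('TAATGGCCA');
print "@n\n";    # prints "1 0 0"
```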

> >> You can already pass the frame or an offset to PrimarySeqI::translate().
> >> We could add a '-phase' argument for convenience which accepts 0,1,2.
> >
> > What if I want to get the RNA sequence that will become the protein?
> > Then having a phase arg to translate() doesn't help. Should there be a
> > phase arg to spliced_seq?
> 
> You'll also note Jason mentioned there were possible errors in the  
> gene prediction programs which produced the output.

That's certainly possible. No gene prediction program will be perfect.
In this case, though, it's clear that it found a large region without
stop codons in it, and correctly identified the place to start
translating. I guess I'm just surprised that, if it found just one exon
in a gene (in the whole contig) why it would say the exon starts at 263
with a phase 1, instead of just saying it starts at 264.
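The CDS line above shows what the phase column implies. In the small pure-Perl sketch below, parse_gff3_line and first_codon_base are hypothetical helpers. On the minus strand, phase is counted from the feature's end. With start 263 and phase 1 on '+', the first complete codon begins at 264.

```perl
use strict;
use warnings;

# Parse one GFF3 line (9 tab-separated columns) into a hash ref.
sub parse_gff3_line {
    my $line = shift;
    chomp $line;
    my @f = split /\t/, $line;
    die "expected 9 columns\n" unless @f == 9;
    my %attr = map { my ($k, $v) = split /=/, $_, 2; ($k, $v) } split /;/, $f[8];
    return { seqid => $f[0], source => $f[1], type => $f[2],
             start => $f[3], end => $f[4], score => $f[5],
             strand => $f[6], phase => $f[7], attr => \%attr };
}

# First base of the first complete codon in a CDS feature: skip `phase`
# bases from the feature's 5' end (start on '+', end on '-').
sub first_codon_base {
    my $f = shift;
    my $phase = $f->{phase} eq '.' ? 0 : $f->{phase};
    return $f->{strand} eq '-' ? $f->{end} - $phase : $f->{start} + $phase;
}

my $cds = parse_gff3_line(join "\t",
    qw(sbay_c127 AUGUSTUS CDS 263 723 . + 1 Parent=sbay_c127-g1.1));
my $first = first_codon_base($cds);
print "$first\n";    # prints 264
```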

> spliced_seq() is supposed to return the DNA sequence of a split  
> location by splicing together the sublocation sequences in their  
> 'join' order.  So, if the first exon was out of phase, once spliced  
> they should all be out of phase to the same degree, assuming all  
> exons are joined together correctly.   Translating this using the  
> phase should produce the correct amino acid sequence.
> 
> Note that Jason suggested passing the frame/phase of the first exon  
> to translate(), not spliced_seq().  I also suggested translate().

You're right. This brings the number of translated polypeptide sequences
that have lots of *s in them to 9 instead of 90. 

I guess I have two requests here. The first is, if a person wants to see
exactly which bps are translated to aas -- a nucleotide sequence of
exactly 3N bp starting (usually) with ATG -- then they might want an
argument to spliced_seq that skips the first one or two bp when
necessary. After all, they might want to study the DNA, not the
peptides.
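The first request amounts to something like the hypothetical helper below, which is a sketch rather than a proposed spliced_seq() API. It drops the phase bases at the 5' end and any trailing partial codon, leaving exactly 3N bp.

```perl
use strict;
use warnings;

# Return only the bases that are read as codons: drop $phase leading
# bases and any trailing partial codon, so the length is a multiple of 3.
sub coding_dna {
    my ($spliced, $phase) = @_;
    my $rest = substr($spliced, $phase || 0);
    return substr($rest, 0, length($rest) - length($rest) % 3);
}

my $cds = coding_dna('CATGGCCTAAG', 1);
print "$cds\n";    # prints ATGGCCTAA (9 bp)
```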

The second request is for "intelligent objects". If my SeqFeatures know
that they're in phase 1, then when I call spliced_seq I want the
resulting objects to know that they're phase one, such that when I call
translate, Bioperl automatically skips the first bp or two. Admittedly,
there might be big ramifications to this.

Both requests of course made in the knowledge that Bioperl is open
source & developers have a lot to do with their time.

-Amir Karger

> > Which raises another bio question: at what point are the first 1 or
> > 2 bp dropped when you have a phase of 1 or 2? Do they appear in the mRNA?
> >
> > -Amir Karger
> 
> Any sequence present in the sublocations (exons) would be in the
> spliced sequence.  This would have to include those nucleotides in
> exons skipped b/c of the phase since they are part of the coding region.
> 
> chris
> 


From neetisomaiya at gmail.com  Tue Dec 12 12:14:10 2006
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Tue, 12 Dec 2006 17:44:10 +0530
Subject: [Bioperl-l] needle parser in bioperl?
Message-ID: <764978cf0612120414o1eb77e28l1132eb4fa4cd9e1d@mail.gmail.com>

Hi,

Does anyone know of a bioperl parser for needle output? Basically I want
to know where the target sequence aligns on the template (i.e. the
coordinate on the template where the target aligns).

-- 
-Neeti
Even my blood says, B positive


From cjfields at uiuc.edu  Tue Dec 12 16:57:27 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 12 Dec 2006 10:57:27 -0600
Subject: [Bioperl-l] needle parser in bioperl?
In-Reply-To: <764978cf0612120414o1eb77e28l1132eb4fa4cd9e1d@mail.gmail.com>
References: <764978cf0612120414o1eb77e28l1132eb4fa4cd9e1d@mail.gmail.com>
Message-ID: <C60106D0-9A11-4B67-8B3D-87DF885F1D40@uiuc.edu>


On Dec 12, 2006, at 6:14 AM, neeti somaiya wrote:

> Hi,
>
> Does anyone know of a bioperl parser for needle output? Basically I want
> to know where the target sequence aligns on the template (i.e. the
> coordinate on the template where the target aligns).
>
> -- 
> -Neeti
> Even my blood says, B positive

I answered this a number of months back:

http://tinyurl.com/yzlbx5

Basically, newer versions of EMBOSS have changed the output for the  
AlignIO::emboss parser (which parses needle).  I don't believe the  
parser has been fixed to deal with that, but Jason has pointed out  
you can use MSF output when running needle, then parse using AlignIO  
with the format set to 'msf'.
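With the two gapped rows in hand (for example from needle's MSF output read with AlignIO), mapping the target onto template coordinates takes a single pass over the columns. The pure-Perl sketch below uses a hypothetical function name. It treats '-', '.' and '~' as gap characters and reports the first and last template positions where both sequences have a residue.

```perl
use strict;
use warnings;

my $GAP = qr/^[-.~]$/;

# $tmpl and $target are equal-length gapped rows of a pairwise alignment.
# $tmpl_start is the template coordinate of its first residue in the
# alignment (1 if the whole template is aligned). Returns (first, last).
sub target_span_on_template {
    my ($tmpl, $target, $tmpl_start) = @_;
    $tmpl_start = 1 unless defined $tmpl_start;
    my ($pos, $first, $last) = ($tmpl_start - 1);
    for my $col (0 .. length($tmpl) - 1) {
        my $t = substr($tmpl, $col, 1);
        my $q = substr($target, $col, 1);
        next if $t =~ $GAP;            # template gap: no coordinate used
        $pos++;
        next if $q =~ $GAP;            # target gap: template residue unmatched
        $first = $pos unless defined $first;
        $last  = $pos;
    }
    return ($first, $last);
}

my ($from, $to) = target_span_on_template('ACGTACGTAC', '---TACG-A-');
print "$from..$to\n";    # prints 4..9
```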

chris


From bosborne11 at verizon.net  Tue Dec 12 16:51:05 2006
From: bosborne11 at verizon.net (Brian Osborne)
Date: Tue, 12 Dec 2006 11:51:05 -0500
Subject: [Bioperl-l] needle parser in bioperl?
In-Reply-To: <764978cf0612120414o1eb77e28l1132eb4fa4cd9e1d@mail.gmail.com>
Message-ID: <C1A44829.BE76%bosborne11@verizon.net>

Neeti,

EMBOSS' needle and water produce alignments in what Bioperl calls 'emboss'
format, so you can use AlignIO to get SimpleAlign objects. The best
description of how to use SimpleAlign is the documentation in the module.

Brian O.


On 12/12/06 7:14 AM, "neeti somaiya" <neetisomaiya at gmail.com> wrote:

> Hi,
> 
> Does anyone know of a bioperl parser for needle output? Basically I want to know
> where the target sequence aligns on the template (i.e. the coordinate on the
> template where the target aligns).


From kaboroev at sfu.ca  Tue Dec 12 17:14:39 2006
From: kaboroev at sfu.ca (Keith Anthony Boroevich)
Date: Tue, 12 Dec 2006 09:14:39 -0800
Subject: [Bioperl-l] BLAST reports
Message-ID: <457EE37F.4020000@sfu.ca>

Hi everyone,

I would like to manipulate my blast results with bioperl but would also
like to have the html output of the blast.  What would be the best way
of going about this, as I don't see any write functions in any of the
blast modules I have looked at.  Would it be better to create my own
html layout from the blast data than attempt to recover this from bioperl?

keith

p.s. - does anyone know what the most informative blast "alignment view"
output is? xml i suppose?

-- 
 ><)))?> -cGRASP- <?(((><
 Keith Anthony Boroevich
 Davidson Lab
 Dept of Molecular Biology
 Simon Fraser University
 Tel: 604-268-7276


From cjfields at uiuc.edu  Tue Dec 12 18:45:05 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 12 Dec 2006 12:45:05 -0600
Subject: [Bioperl-l] Using frame info from GFF in getting
	aSeq->spliced_seq
In-Reply-To: <B9182BFF5B004245BABC12956EA6322E0254DF2D@huls5.nucleus.harvard.edu>
References: <B9182BFF5B004245BABC12956EA6322E0254DF2D@huls5.nucleus.harvard.edu>
Message-ID: <E073C68D-F5FD-4C48-A3E4-925B696E956A@uiuc.edu>


On Dec 12, 2006, at 10:05 AM, Amir Karger wrote:
...

> http://fungal.genome.duke.edu/annotations/sbay/gff/saccharomyces_bayanus.20031001.AUGUSTUS.gff3.gz
> (Thanks for a Really Useful site, Jason!)
>
> c127 (for example) has two lines in that file:
> sbay_c127       AUGUSTUS        mRNA    263     723     .       +
> .       ID=sbay_c127-g1.1
> sbay_c127       AUGUSTUS        CDS     263     723     .       +
> 1       Parent=sbay_c127-g1.1
>
> Now go to gbrowse page:
> http://fungal.genome.duke.edu/cgi-bin/gbrowse/sbay/
> Type "sbay_c127:250-300" in the search box.
>
> As you can see from the translation track, if you start at bp 263, you
> hit a stop codon after just a few aas. But if you use frame2/phase 1,
> you get no stop codons all the way to the end of the contig.

Yes, but there are two things.  First, there is no distinct start  
codon.  Second, this is what the top NCBI BLASTX hit for that  
particular exon is:

 >gi|6323195|ref|NP_013267.1| Essential 100kDa subunit of the exocyst complex (Sec3p, Sec5p, Sec6p, Sec8p, Sec10p, Sec15p, Exo70p, and Exo84p), which has the essential function of mediating polarized targeting of secretory vesicles to active sites of exocytosis; Sec10p [Saccharomyces cerevisiae]
  gi|2498891|sp|Q06245|SEC10_YEAST Exocyst complex component SEC10
  gi|1234854|gb|AAB67490.1| L9362.12 gene product
  gi|1781307|emb|CAA70041.1| 100 kD exocyst complex component [Saccharomyces cerevisiae]
Length=871

  Score =  285 bits (728),  Expect = 7e-77
  Identities = 141/152 (92%), Positives = 149/152 (98%), Gaps = 0/152 (0%)
  Frame = +2

Query  2    FNDFYSMGKSDIVEQLRLSKNWKFNLKSVILMKNLLILSSKLETNSIPKTINTKLIIEKY  181
            +NDFYSMGKSDIVEQLRLSKNWK NLKSV LMKNLLILSSKLET+SIPKTINTKL+IEKY
Sbjct  168  YNDFYSMGKSDIVEQLRLSKNWKLNLKSVKLMKNLLILSSKLETSSIPKTINTKLVIEKY  227

Query  182  SEMMENKLLENFNSAYRENNFTKLNEIAIILNNFNGGVNVIQSFINQHDYFIDTKQIDLE  361
            SEMMEN+LLENFNSAYRENNFTKLNEIAIILNNFNGGVNVIQSFINQHDYFIDTKQIDLE
Sbjct  228  SEMMENELLENFNSAYRENNFTKLNEIAIILNNFNGGVNVIQSFINQHDYFIDTKQIDLE  287

Query  362  NEFENVFIKNVKFKERLVDFESHSVIVEASMQ  457
            NEFENVFIKNVKFKE+L+DFE+HSVI+E SMQ
Sbjct  288  NEFENVFIKNVKFKEQLIDFENHSVIIETSMQ  319


Note that the match starts well into the subject's coding sequence (residue 168 of 871).
Both the lack of a start codon and the above BLASTX hit suggest this  
is not actually the first exon in the coding region.  Therefore the  
sequence retrieved from spliced_seq() is only part of the full coding  
region (it seems to lack at least one 3' exon as well).

>>>> You can already pass the frame or an offset to
>>>> PrimarySeqI::translate().  We could add a '-phase' argument for
>>>> convenience which accepts 0,1,2.
>>>
>>> What if I want to get the RNA sequence that will become the protein?
>>> then having a phase arg to translate() doesn't help. Should there be a
>>> phase arg to spliced_seq?
>>
>> You'll also note Jason mentioned there were possible errors in the
>> gene prediction programs which produced the output
>
> That's certainly possible. No gene prediction program will be perfect.
> In this case, though, it's clear that it found a large region without
> stop codons in it, and correctly identified the place to start
> translating. I guess I'm just surprised that, if it found just one exon
> in a gene (in the whole contig), it would say the exon starts at 263
> with a phase 1, instead of just saying it starts at 264.

Maybe the gene prediction didn't find the first exon, or didn't tie  
the predicted exons together.  Not unusual considering the number of  
predictions made.

>> spliced_seq() is supposed to return the DNA sequence of a split
>> location by splicing together the sublocation sequences in their
>> 'join' order.  So, if the first exon was out of phase, once spliced
>> they should all be out of phase to the same degree, assuming all
>> exons are joined together correctly.   Translating this using the
>> phase should produce the correct amino acid sequence.
>>
>> Note that Jason suggested passing the frame/phase of the first exon
>> to translate(), not spliced_seq().  I also suggested translate().
>
> You're right. This brings the number of translated polypeptide sequences
> that have lots of *s in them to 9 instead of 90.
>
> I guess I have two requests here. The first is, if a person wants to see
> exactly which bps are translated to aas -- a nucleotide sequence of
> exactly 3N bp starting (usually) with ATG -- then they might want an
> argument to spliced_seq that skips the first one or two bp when
> necessary. After all, they might want to study the DNA, not the
> peptides.
>
> The second request is for "intelligent objects". If my SeqFeatures know
> that they're in phase 1, then when I call spliced_seq I want the
> resulting objects to know that they're phase one, such that when I call
> translate, Bioperl automatically skips the first bp or two. Admittedly,
> there might be big ramifications to this.
>
> Both requests of course made in the knowledge that Bioperl is open
> source & developers have a lot to do with their time.
>
> -Amir Karger

You may want to post these as enhancement requests to Bugzilla just  
so we can keep track.  I think passing a phase parameter to  
spliced_seq() can be easily accomplished; it's just a matter of  
returning a subseq of the spliced sequence based on the phase if  
set.  In fact, I am testing it out now.
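What a phase option amounts to can be sketched without BioPerl. In GFF3, the phase of a CDS segment is the number of bases to remove from its start to reach the first base of a codon, so Amir's phase-1 CDS starting at 263 would begin translating at 264. The helper below is a hypothetical illustration on a plain DNA string, not the code Chris committed; it also drops any trailing partial codon, an extra choice motivated by Amir's wish for a sequence of exactly 3N bp.

```perl
use strict;
use warnings;

# Hypothetical helper: apply a GFF3 phase (0, 1 or 2) to a spliced CDS
# string by skipping $phase leading bases, then drop trailing bases that
# do not make up a complete codon so the result is a multiple of 3 long.
sub trim_by_phase {
    my ($dna, $phase) = @_;
    die "phase must be 0, 1 or 2\n"
        unless defined $phase && $phase =~ /^[012]\z/;
    return '' if length($dna) <= $phase;
    my $usable = length($dna) - $phase;
    return substr($dna, $phase, $usable - $usable % 3);
}
```

For instance, `trim_by_phase('AATGAAATAG', 1)` skips the leading A and returns `ATGAAATAG`.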

The second may be more problematic, since there may be a time when  
one would want those extra nucleotides, so I don't think we would  
want removal of said nucleotides to be the default behavior.

Chris


From dmessina at wustl.edu  Tue Dec 12 18:44:29 2006
From: dmessina at wustl.edu (David Messina)
Date: Tue, 12 Dec 2006 12:44:29 -0600
Subject: [Bioperl-l] BLAST reports
In-Reply-To: <457EE37F.4020000@sfu.ca>
References: <457EE37F.4020000@sfu.ca>
Message-ID: <083B4D17-CC7A-406C-9037-4DA5DC31AA05@wustl.edu>

Hi Keith,

Take a look at:
http://www.bioperl.org/wiki/HOWTO:SearchIO

You can read in a whole bunch of different blast formats (see Table  
1), and it is possible to write out in HTML. See:

http://www.bioperl.org/wiki/HOWTO:SearchIO#Writing_and_formatting_output


I'm not sure what you mean by the most informative blast output. If  
you mean which one gives the most information, I'm pretty sure the  
standard Blast report has everything.


Dave


From neetisomaiya at gmail.com  Tue Dec 12 12:09:39 2006
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Tue, 12 Dec 2006 17:39:39 +0530
Subject: [Bioperl-l] problem in running needle
Message-ID: <764978cf0612120409tc857053s7059e62a7f8aafc8@mail.gmail.com>

I am trying to run needle for the attached two sequence files, on a linux
machine. It says "Uncaught exception:  Assertion failed, raised at ajmem.c:187".
Can anyone tell me what could be causing this?

-- 
-Neeti
Even my blood says, B positive
-------------- next part --------------
A non-text attachment was scrubbed...
Name: SEQ_1.REF
Type: application/octet-stream
Size: 44208 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061212/2f733c0d/attachment-0008.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: seq_of_contig11
Type: application/octet-stream
Size: 44344 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061212/2f733c0d/attachment-0009.obj>

From cjfields at uiuc.edu  Tue Dec 12 20:55:07 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 12 Dec 2006 14:55:07 -0600
Subject: [Bioperl-l] problem in running needle
In-Reply-To: <764978cf0612120409tc857053s7059e62a7f8aafc8@mail.gmail.com>
References: <764978cf0612120409tc857053s7059e62a7f8aafc8@mail.gmail.com>
Message-ID: <E5BB270E-46D1-4A8C-A268-938FF8235B67@uiuc.edu>


On Dec 12, 2006, at 6:09 AM, neeti somaiya wrote:

> I am trying to run needle for the attached two sequence files, on a linux
> machine. It says "Uncaught exception:  Assertion failed, raised at ajmem.c:187".
> Can anyone tell me what could be causing this?
>
> -- 
> -Neeti
> Even my blood says, B positive
> <SEQ_1.REF>
> <seq_of_contig11>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

This would be an EMBOSS error, not a BioPerl error.  Maybe the emboss  
list is the best place for this question?

http://emboss.open-bio.org/mailman/listinfo/emboss

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Tue Dec 12 21:30:30 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 12 Dec 2006 15:30:30 -0600
Subject: [Bioperl-l] Using frame info from GFF in getting
	aSeq->spliced_seq
In-Reply-To: <B9182BFF5B004245BABC12956EA6322E0254DF2D@huls5.nucleus.harvard.edu>
References: <B9182BFF5B004245BABC12956EA6322E0254DF2D@huls5.nucleus.harvard.edu>
Message-ID: <093AE0FF-3C88-4F97-B33F-836B295E3DE3@uiuc.edu>


On Dec 12, 2006, at 10:05 AM, Amir Karger wrote:

>> Note that Jason suggested passing the frame/phase of the first exon
>> to translate(), not spliced_seq().  I also suggested translate().
>
> You're right. This brings the number of translated polypeptide sequences
> that have lots of *s in them to 9 instead of 90.
>
> I guess I have two requests here. The first is, if a person wants to see
> exactly which bps are translated to aas -- a nucleotide sequence of
> exactly 3N bp starting (usually) with ATG -- then they might want an
> argument to spliced_seq that skips the first one or two bp when
> necessary. After all, they might want to study the DNA, not the
> peptides.
>
> The second request is for "intelligent objects". If my SeqFeatures know
> that they're in phase 1, then when I call spliced_seq I want the
> resulting objects to know that they're phase one, such that when I call
> translate, Bioperl automatically skips the first bp or two. Admittedly,
> there might be big ramifications to this.
>
> Both requests of course made in the knowledge that Bioperl is open
> source & developers have a lot to do with their time.
>
> -Amir Karger
...

Amir,

I committed some code to CVS where I added a -phase parameter option  
to SeqFeatureI::spliced_seq().  I also added some tests to SeqFeature.t.

If you run the following after creating the SeqFeature object $sf  
(the seq object is $seq):

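$sf->attach_seq($seq);

# Phases 0, 1 and 2 are valid; -1 and 3 are included to show the warnings.
for my $phase (-1..3) {
     my $spliced = $sf->spliced_seq(-phase => $phase);
     print $spliced->seq,"\n";              # spliced DNA, trimmed by phase
     print $spliced->translate->seq,"\n";   # its translation
}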

You should get warnings for any value other than 0, 1, or 2.

I'll also note that the sequence you are having trouble with  
(sbay_c127) is 712 bp, so it doesn't contain the complete coding  
region.  I used it in the test case in SeqFeature.t.

Chris


From boris.steipe at utoronto.ca  Tue Dec 12 21:26:14 2006
From: boris.steipe at utoronto.ca (Boris Steipe)
Date: Tue, 12 Dec 2006 16:26:14 -0500
Subject: [Bioperl-l] problem in running needle
In-Reply-To: <764978cf0612120409tc857053s7059e62a7f8aafc8@mail.gmail.com>
References: <764978cf0612120409tc857053s7059e62a7f8aafc8@mail.gmail.com>
Message-ID: <F0B737D0-8555-4723-8B8D-50DAFF522AC8@utoronto.ca>

Looks like a memory allocation problem. Your whole sequence is on one
single line; inserting a few line breaks every 80th character
or so will probably do the trick.
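If that diagnosis is right, re-wrapping the sequence lines before running needle should be enough. A minimal plain-Perl sketch follows; `wrap_fasta` is a hypothetical helper that re-wraps every record in a FASTA string at a fixed width. 60 columns is a common choice, and anything around the 80 suggested above should presumably work as well.

```perl
use strict;
use warnings;

# Hypothetical helper: re-wrap every record of a FASTA-formatted string so
# that no sequence line is longer than $width characters.
sub wrap_fasta {
    my ($text, $width) = @_;
    $width ||= 60;
    my $out = '';
    # Split just before each '>' that starts a line, one chunk per record.
    for my $record (split /^(?=>)/m, $text) {
        my ($header, @rest) = split /\n/, $record;
        my $seq = join '', @rest;
        $seq =~ s/\s+//g;                       # drop any stray whitespace
        $out .= "$header\n";
        $out .= "$_\n" for $seq =~ /(.{1,$width})/g;
    }
    return $out;
}
```

Reading the sequences with Bio::SeqIO and writing them back out with `-format => 'fasta'` should also give wrapped sequence lines.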

HTH
Boris

On 12-Dec-06, at 7:09 AM, neeti somaiya wrote:

> I am trying to run needle for the attached two sequence files, on a linux
> machine. It says "Uncaught exception:  Assertion failed, raised at ajmem.c:187".
> Can anyone tell me what could be causing this?
> Can anyone tell me what this could be coz of?
>
> -- 
> -Neeti
> Even my blood says, B positive
> <SEQ_1.REF>
> <seq_of_contig11>


From Derek.Fairley at bll.n-i.nhs.uk  Wed Dec 13 10:00:16 2006
From: Derek.Fairley at bll.n-i.nhs.uk (Fairley, Derek)
Date: Wed, 13 Dec 2006 10:00:16 -0000
Subject: [Bioperl-l] BLAST reports
In-Reply-To: <457EE37F.4020000@sfu.ca>
Message-ID: <B4B8F9CCEDA9334F819017E5D711AD1C32C657@bllmail.bll.n-i.nhs.uk>

Hi Keith,

>I would like to manipulate my blast results with bioperl but would also
>like to have the html output of the blast.  What would be the best way
>of going about this, as I don't see any write functions in any of the
>blast modules I have looked at.  Would it be better to create my own
>html layout from the blast data than attempt to recover this from bioperl?

Take a look at some of the example scripts here:
http://www.bioperl.org/wiki/Bioperl_scripts
Depending on your Bioperl installation, you may already have these in your /scripts directory or similar. The /examples/searchio/htmlwriter.pl script may be a good starting point.

>p.s. - does anyone know what the most informative blast "alignment view"
>output is? xml i suppose?

Assuming you want to get the HSPs, parsing blastxml reports seems to be the most reliable approach. Again, there's a useful script for this: take a look at /scripts/utilities/search2alnblocks.pls.

Derek.




From cjfields at uiuc.edu  Wed Dec 13 18:02:14 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 13 Dec 2006 12:02:14 -0600
Subject: [Bioperl-l] Proposal for Meta data
Message-ID: <B3EF69DB-9C01-4F42-A4E4-898613D872F9@uiuc.edu>

I am working on a few things related to RNA structure and have a few
questions, specifically about Meta data.  This is sort of
a proposal, but I would like to get everybody's thoughts about this  
to gauge what everyone thinks.  Jason, sorry to bug you but I thought  
it might be something that would be of use phylohackathon-wise.

Heikki has several modules present which add meta data to sequences
(Bio::Seq::Meta).  In this case, the meta data is stored as a string  
(Bio::Seq::Meta) or an array (Bio::Seq::Meta::Array).  In both cases  
you can have multiple types of meta data for a sequence based on a  
particular tag.  However, this also assumes that the meta data is  
somehow attached strictly to sequence data of some type.  It also  
doesn't allow for having mixed meta data types for a single sequence,  
such as attaching array data and string data to the same sequence.

Hence, I was thinking of having a simple, generic meta data type
(Bio::Meta), one which could encompass simple strings  
(Bio::Meta::Simple), arrays (Bio::Meta::Array), or any other  
structured type of data.  This could be used to annotate any  
PrimarySeq, LocatableSeq, SimpleAlign, SeqFeature, or what-have-you,  
maybe in a collection (similar to AnnotationCollection).  I thought  
something like this may be of general use for any PrimarySeq  
(quality, structure), alignments like NEXUS and Stockholm,  
SeqFeatures where structure could be stored (tRNA or riboswitches), etc.
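To make the idea of mixed meta data types concrete, here is a minimal sketch of a tag-keyed container that holds a string and an array for the same sequence. The package name My::MetaCollection and its methods are hypothetical; they are not the proposed Bio::Meta classes, which do not exist yet, and not part of BioPerl.

```perl
use strict;
use warnings;

package My::MetaCollection;    # hypothetical illustration, not a BioPerl class

sub new { my $class = shift; return bless { meta => {} }, $class }

# Store any kind of meta data (plain string, array ref, hash ref...) by tag.
sub add_meta {
    my ($self, $tag, $value) = @_;
    $self->{meta}{$tag} = $value;
    return $value;
}

sub get_meta  { my ($self, $tag) = @_; return $self->{meta}{$tag} }
sub meta_tags { my $self = shift; return sort keys %{ $self->{meta} } }

package main;

my $meta = My::MetaCollection->new;
$meta->add_meta(structure => '((((....))))');     # string meta data
$meta->add_meta(quality   => [30, 30, 28, 35]);   # array meta data
```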

However, this also seems to fall into the category of sequence  
annotation.  So, would it be better to have a set of Bio::Annotation  
classes used for this purpose?

Flames and jibes welcome; I'm wearing my asbestos suit today....

chris


From stewarta at nmrc.navy.mil  Thu Dec 14 01:06:14 2006
From: stewarta at nmrc.navy.mil (Andrew Stewart)
Date: Wed, 13 Dec 2006 20:06:14 -0500
Subject: [Bioperl-l] StandAloneBlast->blastall array of Bio::Seq objects
Message-ID: <3A26D139-1963-4E47-8A70-910B3886AE18@nmrc.navy.mil>

I am trying to StandAloneBlast->blastall an array of Bio::Seq
objects.  The documentation claims that blastall can be passed a file  
name, a Bio::Seq object, or an array of Bio::Seq objects, while the  
usage suggests that a reference to an array of Bio::Seq objects is  
what must be passed to blastall.

(from http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Tools/Run/StandAloneBlast.html#POD5)
Usage:
	$seq_array_ref = \@seq_array;  # where @seq_array is an array of Bio::Seq objects
	$blast_report = $factory->blastall(\@seq_array);

Should this be...
$report = $factory->blastall(@seq_array);
or
$report = $factory->blastall(\@seq_array);
???

And if you are blastall'ing an array of Seq objects, then does  
blastall just return one big blast report or should I be expecting an  
array of blast reports?

I've tried $report = $factory->blastall(@seq_array); which seems to  
work ok, except that when I process the results, there are only  
results for the first Seq object in the array.


-Andrew

--
Andrew Stewart
Research Assistant, Genomics Team
Navy Medical Research Center (NMRC)
Biological Defense Research Directorate (BDRD)
BDRD Annex
12300 Washington Avenue, 2nd Floor
Rockville, MD 20852

email: stewarta at nmrc.navy.mil
phone: 301-231-6700 Ext 270


From arareko at campus.iztacala.unam.mx  Thu Dec 14 01:37:27 2006
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Wed, 13 Dec 2006 19:37:27 -0600
Subject: [Bioperl-l] BioPerl page in Wikipedia
Message-ID: <4580AAD7.3000900@campus.iztacala.unam.mx>

Folks,

I've updated a little bit of the BioPerl page in the Wikipedia. I think 
it would be nice if we expand the article a little bit more since it's 
tagged as a "stub". Here's the link:

http://en.wikipedia.org/wiki/BioPerl

Cheers,
Mauricio.

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From lubapardo at gmail.com  Thu Dec 14 10:54:07 2006
From: lubapardo at gmail.com (Luba Pardo)
Date: Thu, 14 Dec 2006 11:54:07 +0100
Subject: [Bioperl-l] (no subject)
Message-ID: <58ff33550612140254gc7c52afs279b65390d40cda1@mail.gmail.com>

Hello,
I am new to BioPerl and I have been trying to run the examples available in
bptutorial.pl and other basic literature. I have installed the latest
release of bioperl 1.5.2 in a usr/local/src directory. Any time I try to
retrieve the SwissProt and EMBL databases it gives me an error. With GenBank
it seems to be fine. I wonder if the installation was not successful, as I
would expect access to these databases to be included in the modules of
BioPerl Core. In addition, I would like to ask whether, to run ClustalW within
BioPerl, I need to download and install it in the same
directory in which I have installed bioperl, or whether it is included in the
Bio::Align module.
I am not sure whether this is the best place to ask these very basic
questions. If not, could anyone please refer me to the proper mailing
list?
Thank you very much in advance.

Luba Pardo MD, PhD


From bix at sendu.me.uk  Thu Dec 14 14:10:43 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 14 Dec 2006 09:10:43 -0500
Subject: [Bioperl-l] StandAloneBlast->blastall array of Bio::Seq objects
In-Reply-To: <3A26D139-1963-4E47-8A70-910B3886AE18@nmrc.navy.mil>
References: <3A26D139-1963-4E47-8A70-910B3886AE18@nmrc.navy.mil>
Message-ID: <45815B63.1020003@sendu.me.uk>

Andrew Stewart wrote:
> I am trying to StandAloneBlast->blastall an array of Bio::Seq
> objects.  The documentation claims that blastall can be passed a file  
> name,

You're referring to 'In addition, sequence input may be in the form of
either a Bio::Seq object or an array of Bio::Seq objects'? I agree
it's not clear, but supplying a reference to an array is still supplying
an array. Anyway, I'll clarify it.


In any case, the usage for the method is what you should pay attention to:

> Usage:
> 	$seq_array_ref = \@seq_array;  # where @seq_array is an array of  
> Bio::Seq objects
> 	$blast_report = $factory->blastall(\@seq_array);
> 
> Should this be...
> $report = $factory->blastall(@seq_array);
> or
> $report = $factory->blastall(\@seq_array);
> ???

It should be exactly what it says. A reference to the array.


> And if you are blastall'ing an array of Seq objects, then does  
> blastall just return one big blast report or should I be expecting an  
> array of blast reports?

Returns : Reference to a Blast object or BPlite object
            containing the blast report.

That means, just one big object, not an array.


From bix at sendu.me.uk  Thu Dec 14 14:42:18 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 14 Dec 2006 09:42:18 -0500
Subject: [Bioperl-l] (no subject)
In-Reply-To: <58ff33550612140254gc7c52afs279b65390d40cda1@mail.gmail.com>
References: <58ff33550612140254gc7c52afs279b65390d40cda1@mail.gmail.com>
Message-ID: <458162CA.5030803@sendu.me.uk>

Luba Pardo wrote:
> Hello, I am new to bioperl and I have been trying to run the examples
> available in bptutorial.pl and other basic literature. I have
> installed the latest release of bioperl 1.5.2 in a usr/local/src
> directory. Any time I try to retrieve the SwissProt and EMBL
> databases it gives me an error.

What exactly are you trying? Paste some relevant code along with the
exact error message you get when running that code.


> I wonder if the installation was not successful, as  I would expect
> that these databases accesses were included in the modules of BioPerl
> Core.

They should work with just core installed.


> In addition, I would like to ask whether to run ClustalW within
> the setting of BioPerl I need to download and install it in the same
> directory in which I have installed bioperl, or is it included in the
> module of Bio::Align.

The ClustalW module is in the bioperl-run package, so install that in
the same way you installed bioperl (core). The actual ClustalW program 
you need to download and install according to its own instructions. You 
let Bioperl know where you installed ClustalW by, e.g., setting an
environment variable.

See 
http://doc.bioperl.org/bioperl-run/Bio/Tools/Run/Alignment/Clustalw.html#DESCRIPTION
for details.
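Before handing work to bioperl-run, it can help to check that the ClustalW program can actually be found. The plain-Perl sketch below looks first in the CLUSTALDIR environment variable described in the Clustalw module documentation and then in each PATH directory. `find_clustalw` is a hypothetical helper, and this search order is an assumption for illustration, not necessarily the order the module itself uses.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return the full path of an executable ClustalW
# program, looking in $ENV{CLUSTALDIR} first and then in each PATH
# directory. Returns undef if no executable file is found.
sub find_clustalw {
    my $name = shift || 'clustalw';
    my @dirs = File::Spec->path;
    unshift @dirs, $ENV{CLUSTALDIR} if defined $ENV{CLUSTALDIR};
    for my $dir (@dirs) {
        my $exe = File::Spec->catfile($dir, $name);
        return $exe if -f $exe && -x _;
    }
    return;
}
```

One way to set the variable from a script is a BEGIN block such as `BEGIN { $ENV{CLUSTALDIR} = '/path/to/clustalw' }` placed before `use Bio::Tools::Run::Alignment::Clustalw;` (the path here is a placeholder).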


> I am not sure whether this is the best place to ask these very basic 
> questions. If not, could anyone please refer me to the proper e mail 
> account?

It's certainly the correct place; I hope we can resolve your problems.


From neetisomaiya at gmail.com  Thu Dec 14 08:02:37 2006
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Thu, 14 Dec 2006 13:32:37 +0530
Subject: [Bioperl-l] needle parser in bioperl?
In-Reply-To: <C60106D0-9A11-4B67-8B3D-87DF885F1D40@uiuc.edu>
References: <764978cf0612120414o1eb77e28l1132eb4fa4cd9e1d@mail.gmail.com>
	<C60106D0-9A11-4B67-8B3D-87DF885F1D40@uiuc.edu>
Message-ID: <764978cf0612140002m2a8c4268ma4b55f12412c5e9d@mail.gmail.com>

How do I run needle specifying that I want the MSF format, on a linux box?
The help doesn't show me any format option. Is there anything available to
parse MSF format?
Please find an example alignment file attached. Here the seq_of_contig
aligns with the reference sequence (i.e. SEQ_1.REF) starting at position
(coordinate) 8918 of SEQ_1.REF. I basically want this coordinate from the
output alignment; how can I parse the result to get this?

On 12/12/06, Chris Fields <cjfields at uiuc.edu> wrote:
>
>
> On Dec 12, 2006, at 6:14 AM, neeti somaiya wrote:
>
> > Hi,
> >
> > Does anyone know of a bioperl parser for needle output? Basically I
> > want to know where the target sequence aligns on the template (i.e. the
> > coordinate on the template where the target aligns).
> >
> > --
> > -Neeti
> > Even my blood says, B positive
>
> I answered this a number of months back:
>
> http://tinyurl.com/yzlbx5
>
> Basically, newer versions of EMBOSS have changed the output for the
> AlignIO::emboss parser (which parses needle).  I don't believe the
> parser has been fixed to deal with that, but Jason has pointed out
> you can use MSF output when running needle, then parse using AlignIO
> with the format set to 'msf'.
>
> chris
>


-- 
-Neeti
Even my blood says, B positive
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 3.out
Type: application/octet-stream
Size: 204960 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061214/1416cef5/attachment-0004.obj>

From stewarta at nmrc.navy.mil  Thu Dec 14 16:34:43 2006
From: stewarta at nmrc.navy.mil (Andrew Stewart)
Date: Thu, 14 Dec 2006 11:34:43 -0500
Subject: [Bioperl-l] StandAloneBlast->blastall array of Bio::Seq objects
In-Reply-To: <45815B63.1020003@sendu.me.uk>
References: <3A26D139-1963-4E47-8A70-910B3886AE18@nmrc.navy.mil>
	<45815B63.1020003@sendu.me.uk>
Message-ID: <2DAAB59E-A4F9-4E2F-B1E5-F34376B5D1E0@nmrc.navy.mil>

Thanks for the reply, Sendu.

So I've tried passing a reference to an array of Seq objects with the  
following code...
	
	push @blast_run, $factory->blastall(\@query);  # where @query is an array of Bio::Seq objects

(In case you're wondering, I'm pushing the report into an array of  
reports because I'm running several instances of blastall with  
different parameters each time.)

....and it throws me the following exception...

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: blastall call crashed: 11 /common/bin/blastall -p  blastp  -d  "/common/data/BACILLUS.pep"  -i  /tmp/Z69hzaqEbR  -o  /tmp/02Zja7AF3E

STACK: Error::throw
STACK: Bio::Root::Root::throw /sw/lib/perl5/5.8.6/Bio/Root/Root.pm:328
STACK: Bio::Tools::Run::StandAloneBlast::_runblast /sw/lib/perl5/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:759
STACK: Bio::Tools::Run::StandAloneBlast::_generic_local_blast /sw/lib/perl5/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:706
STACK: Bio::Tools::Run::StandAloneBlast::blastall /sw/lib/perl5/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:557
STACK: main::run_blastall ./new_blast_script.pl:215
STACK: ./new_blast_script.pl:115
-----------------------------------------------------------

And % more -Nl 759 /path/to/Bio/Tools/Run/StandAloneBlast.pm returns...
757         my $status = system($commandstring);
758
759         $self->throw("$executable call crashed: $? $commandstring\n")
760           unless ($status==0) ;

So it looks like the system call isn't returning a happy $status.  At  
this point I'm pretty much stuck, though.  Blastall works just fine  
if I only send it a single Seq object.  Looking at _setinput, it  
appears a reference to an array of Seq objects should end up creating  
a multi-fasta file.  The only possibilities I can think of to explain
this are...

- The -i file isn't being created for some reason when a (ref to an) array
of Seqs is passed
- There is something wrong with the -i file that is created and sent  
to blastall.
- Something else is wrong with the $commandstring being sent to the  
system call.

Does anyone see something here that I don't?
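The 11 printed after "crashed:" is the raw `$?` from `system()`. As documented in perlvar and perlfunc, the low 7 bits of `$?` hold the number of the signal that killed the child, bit 7 is set if a core was dumped, and the high byte holds the exit code. A value of 11 therefore indicates that blastall was killed by signal 11 (SIGSEGV, a segmentation fault, on Linux and Mac OS X) rather than exiting with an error code. This suggests the blastall program itself crashed. A small helper to decode such values (`describe_wait_status` is a hypothetical name):

```perl
use strict;
use warnings;

# Hypothetical helper: describe a wait status ($? after system()).
# Low 7 bits = signal number, bit 7 = core dump flag, high byte = exit code.
sub describe_wait_status {
    my $status = shift;
    return 'failed to start' if $status == -1;
    my $signal = $status & 127;
    if ($signal) {
        my $core = ($status & 128) ? ' (core dumped)' : '';
        return "killed by signal $signal$core";
    }
    return 'exited with code ' . ($status >> 8);
}
```

Here `describe_wait_status(11)` returns "killed by signal 11", while a plain non-zero exit such as 256 is reported as "exited with code 1".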


Thanks,
Andrew


On Dec 14, 2006, at 9:10 AM, Sendu Bala wrote:

> Andrew Stewart wrote:
>> I am trying to StandAloneBlast->blastall an array of Bio::Seq
>> objects.  The documentation claims that blastall can be passed a
>> file name,
>
> You're referring to 'In addition, sequence input may be in the form
> of either a Bio::Seq object or an array of Bio::Seq objects'? I
> agree it's not clear, but supplying a reference to an array is still
> supplying an array. Anyway, I'll clarify it.
>
>
> In any case, the usage for the method is what you should pay  
> attention to:
>
>> Usage:
>> 	$seq_array_ref = \@seq_array;  # where @seq_array is an array of Bio::Seq objects
>> 	$blast_report = $factory->blastall(\@seq_array);
>> Should this be...
>> $report = $factory->blastall(@seq_array);
>> or
>> $report = $factory->blastall(\@seq_array);
>> ???
>
> It should be exactly what it says. A reference to the array.
>
>
>> And if you are blastall'ing an array of Seq objects, then does   
>> blastall just return one big blast report or should I be expecting  
>> an  array of blast reports?
>
> Returns : Reference to a Blast object or BPlite object
>            containing the blast report.
>
> That means, just one big object, not an array.


--
Andrew Stewart
Research Assistant, Genomics Team
Navy Medical Research Center (NMRC)
Biological Defense Research Directorate (BDRD)
BDRD Annex
12300 Washington Avenue, 2nd Floor
Rockville, MD 20852

email: stewarta at nmrc.navy.mil
phone: 301-231-6700 Ext 270


From cjfields at uiuc.edu  Thu Dec 14 17:03:12 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 14 Dec 2006 11:03:12 -0600
Subject: [Bioperl-l] StandAloneBlast->blastall array of Bio::Seq objects
In-Reply-To: <2DAAB59E-A4F9-4E2F-B1E5-F34376B5D1E0@nmrc.navy.mil>
References: <3A26D139-1963-4E47-8A70-910B3886AE18@nmrc.navy.mil>
	<45815B63.1020003@sendu.me.uk>
	<2DAAB59E-A4F9-4E2F-B1E5-F34376B5D1E0@nmrc.navy.mil>
Message-ID: <88DDC5EA-C4BE-48FB-B259-B6584F5F86B1@uiuc.edu>


On Dec 14, 2006, at 10:34 AM, Andrew Stewart wrote:

> Thanks for the reply, Sendu.
>
> So I've tried passing a reference to an array of Seq objects with the
> following code...
> 	
> 	push @blast_run, $factory->blastall(\@query);  # where @query is an array of Bio::Seq objects
>
> (In case you're wondering, I'm pushing the report into an array of
> reports because I'm running several instances of blastall with
> different parameters each time.)
>
> ....and it throws me the following exception...
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: blastall call crashed: 11 /common/bin/blastall -p  blastp  -d  "/common/data/BACILLUS.pep"  -i  /tmp/Z69hzaqEbR  -o  /tmp/02Zja7AF3E
>
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /sw/lib/perl5/5.8.6/Bio/Root/Root.pm:328
> STACK: Bio::Tools::Run::StandAloneBlast::_runblast /sw/lib/perl5/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:759
> STACK: Bio::Tools::Run::StandAloneBlast::_generic_local_blast /sw/lib/perl5/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:706
> STACK: Bio::Tools::Run::StandAloneBlast::blastall /sw/lib/perl5/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:557
> STACK: main::run_blastall ./new_blast_script.pl:215
> STACK: ./new_blast_script.pl:115
> -----------------------------------------------------------
>
> And % more -Nl 759 /path/to/Bio/Tools/Run/StandAloneBlast.pm returns...
> 757         my $status = system($commandstring);
> 758
> 759         $self->throw("$executable call crashed: $? $commandstring\n")
> 760           unless ($status==0) ;
>
> So it looks like the system call isn't returning a happy $status.  At
> this point I'm pretty much stuck, though.  Blastall works just fine
> if I only send it a single Seq object.  Looking at _setinput, it
> appears a reference to an array of Seq objects should end up creating
> a multi-fasta file.  The only possibilities I can think of to explain
> this are...
>
> - The -i file isn't being created for some reason when a (reference to
> an) array of Seqs is passed
> - There is something wrong with the -i file that is created and sent
> to blastall.
> - Something else is wrong with the $commandstring being sent to the
> system call.
>
> Does anyone see something here that I don't?

The error pops up when the executable returns a bad status, so maybe  
it's choking on too many input sequences (i.e. Bioperl is doing  
everything correctly, but you are attempting to BLAST too many  
sequences in one go).  How many sequences are you attempting to use  
as input?  What happens when you use fewer input sequences?

chris


From stewarta at nmrc.navy.mil  Thu Dec 14 17:49:45 2006
From: stewarta at nmrc.navy.mil (Andrew Stewart)
Date: Thu, 14 Dec 2006 12:49:45 -0500
Subject: [Bioperl-l] StandAloneBlast->blastall array of Bio::Seq objects
In-Reply-To: <88DDC5EA-C4BE-48FB-B259-B6584F5F86B1@uiuc.edu>
References: <3A26D139-1963-4E47-8A70-910B3886AE18@nmrc.navy.mil>
	<45815B63.1020003@sendu.me.uk>
	<2DAAB59E-A4F9-4E2F-B1E5-F34376B5D1E0@nmrc.navy.mil>
	<88DDC5EA-C4BE-48FB-B259-B6584F5F86B1@uiuc.edu>
Message-ID: <704E0191-A0E3-4DD2-A8F4-A0B9BE8E3AEE@nmrc.navy.mil>

> So can you look at the tempfile that is created and see if it is sane?
>
> Set -save_tempfiles => 1 when you initialize the factory object or do
> $factory->save_tempfiles(1)
> before calling the blastall.
>
> -jason
>

Jason,
I was actually wondering how to do that.  Thanks.  Odd though, it  
still doesn't seem to be saving the tempfiles.  Might not matter  
though, because...

> The error pops up when the executable returns a bad status, so  
> maybe it's choking on too many input sequences (i.e. Bioperl is  
> doing everything correctly, but you are attempting to BLAST too  
> many sequences in one go).  How many sequences are you attempting  
> to use as input?  What happens when you use fewer input sequences?
>
> chris
>

I was processing 738 sequences for input.  I cut that down to 20  
sequences and I'm getting some other exception thrown further  
downstream, so it appears you may be correct.  You don't happen to  
know what the max number of sequences that blastall allows for input,  
would ya? ;)  I suppose I'll have to break @query down into smaller  
doses or something.
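Breaking @query into smaller batches, as Andrew suggests, takes only a few lines of core Perl. The sketch below is an illustration under stated assumptions: `chunk_list` is a hypothetical helper (not part of BioPerl), and the batch size is arbitrary. Each batch it returns is an array reference, which is the form Sendu says blastall expects in this thread, and each call then returns its own report object. Note that Andrew later reports the crash persisted regardless of input size, so batching is mainly useful for narrowing down which sequences trigger the failure.

```perl
use strict;
use warnings;

# Split a list of items (e.g. Bio::Seq objects) into batches of at most
# $size elements. Returns a list of array references, one per batch.
sub chunk_list {
    my ($size, @items) = @_;
    die "batch size must be a positive integer\n"
        unless defined $size && $size >= 1;
    my @batches;
    while (@items) {
        # splice removes up to $size elements from the front of @items
        push @batches, [ splice(@items, 0, $size) ];
    }
    return @batches;
}

# Hypothetical usage ($factory, @query and @blast_run as in the message
# above; 50 is an arbitrary batch size):
#
#   for my $batch (chunk_list(50, @query)) {
#       push @blast_run, $factory->blastall($batch);
#   }
```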

Thanks,
Andrew


On Dec 14, 2006, at 12:03 PM, Chris Fields wrote:

>
> On Dec 14, 2006, at 10:34 AM, Andrew Stewart wrote:
>
>> Thanks for the reply, Sendu.
>>
>> So I've tried passing a reference to an array of Seq objects with the
>> following code...
>> 	
>> 	push @blast_run, $factory->blastall(\@query);  # where @query is an array of Bio::Seq objects
>>
>> (In case you're wondering, I'm pushing the report into an array of
>> reports because I'm running several instances of blastall with
>> different parameters each time.)
>>
>> ....and it throws me the following exception...
>>
>> ------------- EXCEPTION: Bio::Root::Exception -------------
>> MSG: blastall call crashed: 11 /common/bin/blastall -p  blastp  -d  "/common/data/BACILLUS.pep"  -i  /tmp/Z69hzaqEbR  -o  /tmp/02Zja7AF3E
>>
>> STACK: Error::throw
>> STACK: Bio::Root::Root::throw /sw/lib/perl5/5.8.6/Bio/Root/Root.pm:328
>> STACK: Bio::Tools::Run::StandAloneBlast::_runblast /sw/lib/perl5/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:759
>> STACK: Bio::Tools::Run::StandAloneBlast::_generic_local_blast /sw/lib/perl5/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:706
>> STACK: Bio::Tools::Run::StandAloneBlast::blastall /sw/lib/perl5/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:557
>> STACK: main::run_blastall ./new_blast_script.pl:215
>> STACK: ./new_blast_script.pl:115
>> -----------------------------------------------------------
>>
>> And % more -Nl 759 /path/to/Bio/Tools/Run/StandAloneBlast.pm  
>> returns...
>> 757         my $status = system($commandstring);
>> 758
>> 759         $self->throw("$executable call crashed: $? $commandstring\n")
>> 760           unless ($status==0) ;
>>
>> So it looks like the system call isn't returning a happy $status.  At
>> this point I'm pretty much stuck, though.  Blastall works just fine
>> if I only send it a single Seq object.  Looking at _setinput, it
>> appears a reference to an array of Seq objects should end up creating
>> a multi-fasta file.  The only possibilities I can think of to explain
>> this are...
>>
>> - The -i file isn't being created for some reason when a (reference to
>> an) array of Seqs is passed
>> - There is something wrong with the -i file that is created and sent
>> to blastall.
>> - Something else is wrong with the $commandstring being sent to the
>> system call.
>>
>> Does anyone see something here that I don't?
>
> The error pops up when the executable returns a bad status, so  
> maybe it's choking on too many input sequences (i.e. Bioperl is  
> doing everything correctly, but you are attempting to BLAST too  
> many sequences in one go).  How many sequences are you attempting  
> to use as input?  What happens when you use fewer input sequences?
>
> chris
>


--
Andrew Stewart
Research Assistant, Genomics Team
Navy Medical Research Center (NMRC)
Biological Defense Research Directorate (BDRD)
BDRD Annex
12300 Washington Avenue, 2nd Floor
Rockville, MD 20852

email: stewarta at nmrc.navy.mil
phone: 301-231-6700 Ext 270


From Derek.Fairley at bll.n-i.nhs.uk  Thu Dec 14 17:58:10 2006
From: Derek.Fairley at bll.n-i.nhs.uk (Fairley, Derek)
Date: Thu, 14 Dec 2006 17:58:10 -0000
Subject: [Bioperl-l] needle parser in bioperl?
In-Reply-To: <764978cf0612140002m2a8c4268ma4b55f12412c5e9d@mail.gmail.com>
Message-ID: <B4B8F9CCEDA9334F819017E5D711AD1C32C669@bllmail.bll.n-i.nhs.uk>

Neeti,

From http://emboss.sourceforge.net/apps/cvs/needle.html:

"The results can be output in one of several styles by using the
command-line qualifier -aformat xxx, where 'xxx' is replaced by the name
of the required format. Some of the alignment formats can cope with an
unlimited number of sequences, while others are only for pairs of
sequences.

The available multiple alignment format names are: unknown, multiple,
simple, fasta, msf, trace, srs

The available pairwise alignment format names are: pair, markx0, markx1,
markx2, markx3, markx10, srspair, score

See: http://emboss.sf.net/docs/themes/AlignFormats.html for further
information on alignment formats."

I'm not sure from this whether you can get a pairwise alignment in MSF
format, but I can't think of a good reason why not. The BioPerl
Bio::AlignIO module will let you parse alignments in MSF format.
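Neeti's underlying question is how to get the reference coordinate (8918 in her example) at which the contig starts aligning. Once the alignment has been read, for instance with Bio::AlignIO and format 'msf' as suggested in this thread, the answer comes from walking the alignment columns and counting reference residues until the first non-gap query character. The plain-Perl sketch below shows only that logic. It assumes you have already extracted the two gapped rows as strings (e.g. from each aligned sequence's `seq` method) and know the reference coordinate of the first residue in the reference row (e.g. from its `start` method). Both helper names are hypothetical, not BioPerl API.

```perl
use strict;
use warnings;

# True if a single alignment character is a gap. MSF files usually use
# '.' (and sometimes '~' for end gaps); many other formats use '-'.
sub is_gap_char {
    my ($c) = @_;
    return $c =~ /^[-.~]\z/ ? 1 : 0;
}

# Given the gapped reference row and gapped query row of a pairwise
# alignment (equal length) and the coordinate of the first residue in the
# reference row, return the reference coordinate opposite the first query
# residue. Returns undef if the query row is all gaps or its first residue
# faces a gap in the reference.
sub ref_coord_of_query_start {
    my ($ref_row, $query_row, $ref_start) = @_;
    die "rows must have equal length\n"
        unless length($ref_row) == length($query_row);
    my $ref_pos = $ref_start - 1;    # coordinate of last reference residue seen
    for my $col (0 .. length($ref_row) - 1) {
        my $r = substr($ref_row,   $col, 1);
        my $q = substr($query_row, $col, 1);
        $ref_pos++ unless is_gap_char($r);
        next if is_gap_char($q);
        # first non-gap query column reached
        return is_gap_char($r) ? undef : $ref_pos;
    }
    return undef;
}
```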

HTH,

Derek.

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of neeti somaiya
Sent: 14 December 2006 08:03
To: Chris Fields; bioperl-l
Subject: Re: [Bioperl-l] needle parser in bioperl?


How do I run needle specifying that I want the MSF format, on a Linux
box? The help doesn't show me any format option. Is there anything
available to parse MSF format?

Please find an example alignment file attached. Here the seq_of_contig
aligns with the reference sequence (i.e. SEQ_1.REF) starting at position
(coordinate) 8918 of SEQ_1.REF. I basically want this coordinate from
the output alignment. How can I parse the result to get this?

On 12/12/06, Chris Fields <cjfields at uiuc.edu> wrote:

>
> On Dec 12, 2006, at 6:14 AM, neeti somaiya wrote:
>
> > Hi,
> >
> > Does anyone know of a bioperl parser for needle output? Basically I
> > want to know where the target sequence aligns on the template (i.e. the
> > coordinate on the template where the target aligns).
> >
> > --
> > -Neeti
> > Even my blood says, B positive
>
> I answered this a number of months back:
>
> http://tinyurl.com/yzlbx5
>
> Basically, newer versions of EMBOSS have changed the output for the
> AlignIO::emboss parser (which parses needle).  I don't believe the
> parser has been fixed to deal with that, but Jason has pointed out
> you can use MSF output when running needle, then parse using AlignIO
> with the format set to 'msf'.
>
> chris
>

-- 

-Neeti
Even my blood says, B positive


From cjfields at uiuc.edu  Thu Dec 14 18:36:09 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 14 Dec 2006 12:36:09 -0600
Subject: [Bioperl-l] StandAloneBlast->blastall array of Bio::Seq objects
In-Reply-To: <704E0191-A0E3-4DD2-A8F4-A0B9BE8E3AEE@nmrc.navy.mil>
References: <3A26D139-1963-4E47-8A70-910B3886AE18@nmrc.navy.mil>
	<45815B63.1020003@sendu.me.uk>
	<2DAAB59E-A4F9-4E2F-B1E5-F34376B5D1E0@nmrc.navy.mil>
	<88DDC5EA-C4BE-48FB-B259-B6584F5F86B1@uiuc.edu>
	<704E0191-A0E3-4DD2-A8F4-A0B9BE8E3AEE@nmrc.navy.mil>
Message-ID: <97FE8E3C-58F2-406F-909D-DD479E594530@uiuc.edu>


On Dec 14, 2006, at 11:49 AM, Andrew Stewart wrote:

>> So can you look at the tempfile that is created and see if it is  
>> sane?
>>
>> Set -save_tempfiles => 1 when you initialize the factory object  
>> or do
>> $factory->save_tempfiles(1)
>> before calling the blastall.
>>
>> -jason
>>
>
> Jason,
> I was actually wondering how to do that.  Thanks.  Odd though, it
> still doesn't seem to be saving the tempfiles.  Might not matter

That needs to be checked out.  Can anyone verify that?

>> The error pops up when the executable returns a bad status, so
>> maybe it's choking on too many input sequences (i.e. Bioperl is
>> doing everything correctly, but you are attempting to BLAST too
>> many sequences in one go).  How many sequences are you attempting
>> to use as input?  What happens when you use fewer input sequences?
>>
>> chris
>>
>
> I was processing 738 sequences for input.  I cut that down to 20
> sequences and I'm getting some other exception thrown further
> downstream, so it appears you may be correct.  You don't happen to
> know what the max number of sequences that blastall allows for input,
> would ya? ;)  I suppose I'll have to break @query down into smaller
> doses or something.
>
> Thanks,
> Andrew

It was a shot in the dark, really.  The fact that the return status  
was bad could be due to a number of problems (permissions issues, bad  
data, etc).  The fact that a single sequence worked indicated that  
permissions and output format likely weren't to blame.  The only  
other thing left was a problem with blastall itself.

BTW, the blast docs do not indicate whether there is a maximum number  
of sequences.  There may be a point where available memory becomes  
the limiting issue.
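The number in "blastall call crashed: 11" is Perl's raw `$?`, which StandAloneBlast interpolates into the exception (line 759 as quoted in this thread). `$?` packs three things: the low 7 bits hold the number of the signal that killed the child, bit 7 (value 128) is the core-dump flag, and the high byte holds the child's exit code. A raw value of 11 therefore means blastall was killed by signal 11, which is SIGSEGV on Linux and Mac OS X, rather than exiting with code 11. A minimal sketch of decoding it follows; `describe_wait_status` is a hypothetical helper, not part of BioPerl.

```perl
use strict;
use warnings;

# Decode the status value Perl stores in $? after system() returns.
#   -1            : the program could not be started at all
#   $status & 127 : number of the signal that killed the child (0 if none)
#   $status & 128 : set if the child dumped core
#   $status >> 8  : the child's own exit code (meaningful if no signal)
sub describe_wait_status {
    my ($status) = @_;
    return 'failed to start' if $status == -1;
    my $signal = $status & 127;
    my $core   = ($status & 128) ? ' (core dumped)' : '';
    return "killed by signal $signal$core" if $signal;
    return 'exited with code ' . ($status >> 8);
}

# Hypothetical usage after a system() call:
#   my $rc = system($commandstring);
#   warn "blastall: ", describe_wait_status($?), "\n" if $rc != 0;
```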

chris


From vaughn at cshl.edu  Thu Dec 14 19:09:34 2006
From: vaughn at cshl.edu (Matthew Vaughn)
Date: Thu, 14 Dec 2006 14:09:34 -0500
Subject: [Bioperl-l] Bio::SeqFeature::Annotated and mandatory type checking
Message-ID: <637A2459-4115-466F-BD8D-036D5E9114F8@cshl.edu>

Dear all,

I'm trying to bring some of my code into compliance with BioPerl  
1.5.2 and am running into some design decisions that I am unclear on.  
Can I ask why Bio::SeqFeature::Annotated enforces mandatory checking  
of the 'type' against SOFA? It seems to me that this should be  
optional behavior as is the case with the Bio::FeatureIO family. I'd  
be happy to write the patch if there is any agreement with me on this  
case.

Thanks,

Matt

--
Matthew W. Vaughn, Ph.D.
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724

phone: (516) 367-8469



From jason at bioperl.org  Thu Dec 14 16:59:20 2006
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 14 Dec 2006 11:59:20 -0500
Subject: [Bioperl-l] StandAloneBlast->blastall array of Bio::Seq objects
In-Reply-To: <2DAAB59E-A4F9-4E2F-B1E5-F34376B5D1E0@nmrc.navy.mil>
References: <3A26D139-1963-4E47-8A70-910B3886AE18@nmrc.navy.mil>
	<45815B63.1020003@sendu.me.uk>
	<2DAAB59E-A4F9-4E2F-B1E5-F34376B5D1E0@nmrc.navy.mil>
Message-ID: <640E2BB7-33F3-44C9-B903-9DDA54F02D12@bioperl.org>

So can you look at the tempfile that is created and see if it is sane?

Set -save_tempfiles => 1 when you initialize the factory object or do
$factory->save_tempfiles(1)
before calling the blastall.

-jason
On Dec 14, 2006, at 11:34 AM, Andrew Stewart wrote:

> Thanks for the reply, Sendu.
>
> So I've tried passing a reference to an array of Seq objects with the
> following code...
> 	
> 	push @blast_run, $factory->blastall(\@query);  # where @query is an array of Bio::Seq objects
>
> (In case you're wondering, I'm pushing the report into an array of
> reports because I'm running several instances of blastall with
> different parameters each time.)
>
> ....and it throws me the following exception...
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: blastall call crashed: 11 /common/bin/blastall -p  blastp  -d  "/common/data/BACILLUS.pep"  -i  /tmp/Z69hzaqEbR  -o  /tmp/02Zja7AF3E
>
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /sw/lib/perl5/5.8.6/Bio/Root/Root.pm:328
> STACK: Bio::Tools::Run::StandAloneBlast::_runblast /sw/lib/perl5/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:759
> STACK: Bio::Tools::Run::StandAloneBlast::_generic_local_blast /sw/lib/perl5/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:706
> STACK: Bio::Tools::Run::StandAloneBlast::blastall /sw/lib/perl5/5.8.6/Bio/Tools/Run/StandAloneBlast.pm:557
> STACK: main::run_blastall ./new_blast_script.pl:215
> STACK: ./new_blast_script.pl:115
> -----------------------------------------------------------
>
> And % more -Nl 759 /path/to/Bio/Tools/Run/StandAloneBlast.pm  
> returns...
> 757         my $status = system($commandstring);
> 758
> 759         $self->throw("$executable call crashed: $? $commandstring\n")
> 760           unless ($status==0) ;
>
> So it looks like the system call isn't returning a happy $status.  At
> this point I'm pretty much stuck, though.  Blastall works just fine
> if I only send it a single Seq object.  Looking at _setinput, it
> appears a reference to an array of Seq objects should end up creating
> a multi-fasta file.  The only possibilities I can think of to explain
> this are...
>
> - The -i file isn't being created for some reason when a (reference to
> an) array of Seqs is passed
> - There is something wrong with the -i file that is created and sent
> to blastall.
> - Something else is wrong with the $commandstring being sent to the
> system call.
>
> Does anyone see something here that I don't?
>
>
> Thanks,
> Andrew
>
>
>
> On Dec 14, 2006, at 9:10 AM, Sendu Bala wrote:
>
>> Andrew Stewart wrote:
>>> I am trying to StandAloneBlast->blastall an array of Bio::Seq
>>> objects.  The documentation claims that blastall can be passed a
>>> file  name,
>>
>> You're referring to 'In addition, sequence input may be in the form
>> of either a Bio::Seq object or an array of Bio::Seq objects'? I
>> agree it's not clear, but supplying a reference to an array is still
>> supplying an array. Anyway, I'll clarify it.
>>
>>
>> In any case, the usage for the method is what you should pay
>> attention to:
>>
>>> Usage:
>>> 	$seq_array_ref = \@seq_array;  # where @seq_array is an array of Bio::Seq objects
>>> 	$blast_report = $factory->blastall(\@seq_array);
>>> Should this be...
>>> $report = $factory->blastall(@seq_array);
>>> or
>>> $report = $factory->blastall(\@seq_array);
>>> ???
>>
>> It should be exactly what it says. A reference to the array.
>>
>>
>>> And if you are blastall'ing an array of Seq objects, then does
>>> blastall just return one big blast report or should I be expecting
>>> an  array of blast reports?
>>
>> Returns : Reference to a Blast object or BPlite object
>>            containing the blast report.
>>
>> That means, just one big object, not an array.
>
>
>
> --
> Andrew Stewart
> Research Assistant, Genomics Team
> Navy Medical Research Center (NMRC)
> Biological Defense Research Directorate (BDRD)
> BDRD Annex
> 12300 Washington Avenue, 2nd Floor
> Rockville, MD 20852
>
> email: stewarta at nmrc.navy.mil
> phone: 301-231-6700 Ext 270
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From stewarta at nmrc.navy.mil  Thu Dec 14 21:23:07 2006
From: stewarta at nmrc.navy.mil (Andrew Stewart)
Date: Thu, 14 Dec 2006 16:23:07 -0500
Subject: [Bioperl-l] StandAloneBlast->blastall array of Bio::Seq objects
In-Reply-To: <97FE8E3C-58F2-406F-909D-DD479E594530@uiuc.edu>
References: <3A26D139-1963-4E47-8A70-910B3886AE18@nmrc.navy.mil>
	<45815B63.1020003@sendu.me.uk>
	<2DAAB59E-A4F9-4E2F-B1E5-F34376B5D1E0@nmrc.navy.mil>
	<88DDC5EA-C4BE-48FB-B259-B6584F5F86B1@uiuc.edu>
	<704E0191-A0E3-4DD2-A8F4-A0B9BE8E3AEE@nmrc.navy.mil>
	<97FE8E3C-58F2-406F-909D-DD479E594530@uiuc.edu>
Message-ID: <E1CF879B-7A07-4CE7-A0D0-C7749ECFF8FC@nmrc.navy.mil>

> It was a shot in the dark, really.  The fact that the return status  
> was bad could be due to a number of problems (permissions issues,  
> bad data, etc).  The fact that a single sequence worked indicated  
> that permissions and output format likely weren't to blame.  The  
> only other thing left was a problem with blastall itself.
>
> BTW, the blast docs do not indicate whether there is a maximum  
> number of sequences.  There may be a point where available memory  
> becomes the limiting issue.
>
> chris

Interesting.  I ran the 738-sequence dataset through blastall  
manually and the report only returned 198 of the 738 expected  
results.  Not only that, it seems to have just cut off right in the  
middle of the 198th result and a Segmentation fault was reported.   I  
removed the 198th sequence, wondering if it might be some issue with  
the input, and the segmentation fault occurred again, with the results  
ending on the 210th result.  I put the 198th sequence back in, but  
at the start of the file, and sure enough the segmentation fault  
occurred earlier.  I think we can rule out the size of the input or  
number of sequences as the source of error here.  I'm more inclined  
to think it has something to do with the blast databases being  
queried against.

I found an old discussion on a problem that sounds fairly similar to  
this one, for anyone interested.
http://bioinformatics.org/pipermail/bioclusters/2004-June/001742.html

I think I'll try to work around the problem for now.

andrew


On Dec 14, 2006, at 1:36 PM, Chris Fields wrote:

>
> On Dec 14, 2006, at 11:49 AM, Andrew Stewart wrote:
>
>>> So can you look at the tempfile that is created and see if it is  
>>> sane?
>>>
>>> Set -save_tempfiles => 1 when you initialize the factory object  
>>> or do
>>> $factory->save_tempfiles(1)
>>> before calling the blastall.
>>>
>>> -jason
>>>
>>
>> Jason,
>> I was actually wondering how to do that.  Thanks.  Odd though, it
>> still doesn't seem to be saving the tempfiles.  Might not matter
>
> That needs to be checked out.  Can anyone verify that?
>
>>> The error pops up when the executable returns a bad status, so
>>> maybe it's choking on too many input sequences (i.e. Bioperl is
>>> doing everything correctly, but you are attempting to BLAST too
>>> many sequences in one go).  How many sequences are you attempting
>>> to use as input?  What happens when you use fewer input sequences?
>>>
>>> chris
>>>
>>
>> I was processing 738 sequences for input.  I cut that down to 20
>> sequences and I'm getting some other exception thrown further
>> downstream, so it appears you may be correct.  You don't happen to
>> know what the max number of sequences that blastall allows for input,
>> would ya? ;)  I suppose I'll have to break @query down into smaller
>> doses or something.
>>
>> Thanks,
>> Andrew
>
> It was a shot in the dark, really.  The fact that the return status  
> was bad could be due to a number of problems (permissions issues,  
> bad data, etc).  The fact that a single sequence worked indicated  
> that permissions and output format likely weren't to blame.  The  
> only other thing left was a problem with blastall itself.
>
> BTW, the blast docs do not indicate whether there is a maximum  
> number of sequences.  There may be a point where available memory  
> becomes the limiting issue.
>
> chris


--
Andrew Stewart
Research Assistant, Genomics Team
Navy Medical Research Center (NMRC)
Biological Defense Research Directorate (BDRD)
BDRD Annex
12300 Washington Avenue, 2nd Floor
Rockville, MD 20852

email: stewarta at nmrc.navy.mil
phone: 301-231-6700 Ext 270


From lincoln.stein at gmail.com  Thu Dec 14 20:24:56 2006
From: lincoln.stein at gmail.com (Lincoln Stein)
Date: Thu, 14 Dec 2006 15:24:56 -0500
Subject: [Bioperl-l] Bio::Graphics xyplot
In-Reply-To: <4578951B.5050206@sfu.ca>
References: <4578951B.5050206@sfu.ca>
Message-ID: <6dce9a0b0612141224r1ef7cce2s6e6123461c3827d8@mail.gmail.com>

Hi,

The way it works is that you create a single feature that spans the entire
range of the xyplot. It contains subfeatures, each of which has a score. The
graph points correspond to each of the subfeatures.

Lincoln

On 12/7/06, Keith Anthony Boroevich <kaboroev at sfu.ca> wrote:
>
> Hi everyone,
>
> I'm attempting to add an xyplot of the phred quality scores to an
> Bio::Graphics image, and cannot get it to work.
> I have the panel with a track for both the scale and the DNA displaying
> properly.  When I attempt to add the xyplot I just get a garbled track
> of what look like tiny xyplots for each datapoint.  I have the cvs
> (updated today) of bioperl-live running.  I think what I am missing is
> the creation of a "Sequence Feature Group" to hold the individual points
> of the plot.  However, I cannot seem to find such an object. This is
> what I attempted:
>
> -------BEGIN---CODE-----------
> # start panel
> my $panel = Bio::Graphics::Panel->new(-length    => $f_seqlen,
>                       -width     => $f_seqlen*10,
>                       -pad_left  => 10,
>                       -pad_right => 10,
>                       -grid      => 1
>                       );
> # add scale
> $panel->add_track(arrow =>
> Bio::SeqFeature::Generic->new(-start=>1,-end=>$f_seqlen),
>               -double  => 1,
>               -tick    => 2,
>               -fgcolor => 'black');
> # add DNA ($feature is of type Bio::SeqFeature::Annotated)
> $panel->add_track(dna => $feature);
> # get list of quality scores from database
> my ($pqs_value) = $dbh->selectrow_array($sql);
> my @pqs_value = split(/\s/,$pqs_value);
> # create track
> my $track =  $panel->add_track(-glyph        => 'xyplot',
>                    -graph_type   => 'points',
>                    -point_symbol => 'point',
>                    -max_score    => 100,
>                    -min_score    => 0,
>                    -scale        => 'none');
> # add "subfeatures" to the track
> for (my $i=0;$i<$f_seqlen;$i++) {
>     $track->add_feature(Bio::SeqFeature::Generic->new(-start=>$i,-end=>$i,-score=>$pqs_value[$i]));
> }
> print $panel->png();
> $panel->finished;
> ------END---CODE----------
>
> I also attempted to create an array of the point features and passed
> that by reference to the panel "add_track" as it describes in the xyplot
> documentation, but that resulted in the exact same image.
>
> keith
>
> --
> ><)))?> -cGRASP- <?(((><
> Keith Anthony Boroevich
> Davidson Lab
> Dept of Molecular Biology
> Simon Fraser University
> Tel: 604-268-7276
>
>


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu


From bix at sendu.me.uk  Thu Dec 14 22:15:07 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 14 Dec 2006 17:15:07 -0500
Subject: [Bioperl-l] Bio::SeqFeature::Annotated and mandatory type
	checking
In-Reply-To: <637A2459-4115-466F-BD8D-036D5E9114F8@cshl.edu>
References: <637A2459-4115-466F-BD8D-036D5E9114F8@cshl.edu>
Message-ID: <4581CCEB.20206@sendu.me.uk>

Matthew Vaughn wrote:
> Dear all,
> 
> I'm trying to bring some of my code into compliance with the BioPerl 
> 1.5.2 and am running into some design decisions that I am unclear on. 
> Can I ask why Bio::SeqFeature::Annotated enforces mandatory checking of 
> the 'type' against SOFA? It seems to me that this should be optional 
> behavior as is the case with the Bio::FeatureIO family. I'd be happy to 
> write the patch if there is any agreement with me on this case.

Lots of people seem to have worked on it over the years, but perhaps 
Scott Cain is the person to talk to?

revision 1.4
date: 2004/09/25 11:41:29;  author: scain;  state: Exp;  lines: +1 -1
two things:
   * adding SOFA as an available ontology to DocumentRegistry.pm
   * modifying FeatureIO::gff to use SOFA to validate, and to parse Ontology_term


From lincoln.stein at gmail.com  Thu Dec 14 21:56:41 2006
From: lincoln.stein at gmail.com (Lincoln Stein)
Date: Thu, 14 Dec 2006 16:56:41 -0500
Subject: [Bioperl-l] [Gmod-gbrowse] xyplot data alignment problem?
In-Reply-To: <EA0BFA4F-8182-4C40-92DA-218CE3F48D8B@genomics.princeton.edu>
References: <EA0BFA4F-8182-4C40-92DA-218CE3F48D8B@genomics.princeton.edu>
Message-ID: <6dce9a0b0612141356u63afe2dak7e1d8dad93408312@mail.gmail.com>

Hi All,

I'm afraid that the xyplot glyph that is in the recent bioperl release has
an error that causes the content to be printed to the right of the correct
position. Unfortunately this wasn't caught before the release because the
glyph was only tested on very large (whole genome) features.

You will need to do a CVS update to get a fixed version from bioperl-live. A
future bugfix release of gbrowse will patch this glyph for you
automatically.

Lincoln

On 12/12/06, Kara Dolinski <kara at genomics.princeton.edu> wrote:
>
> Hi,
> I'm having a problem getting features and an xyplot properly aligned in
> Gbrowse.  For example, see this page:
>
> http://tinyurl.com/ylbq3q
>
> The feature in the "CENPK SNPs" track should actually be around the peak
> of the graph in the "CENPK prediction signal" xyplot  ie. the SNP feature
> is at position 79, and the xyplot axes and data should span from 61 - 95.
> However, as you can see, the data in the xyplot are oddly separated from
> the axes (which seem to be in the correct place), with the data shifted over
> to about position 120-155.
> This occurs elsewhere, not just at the ends of the chromosomes.
>
> When I zoom to ~80 bp, all is well, see:
>
> http://tinyurl.com/yzav8k
>
> The relevant snippets from the GFF and the config files are below.
>
> Thanks!
> Kara
>
> GFF:
>
> chrI SNPScanner CENPK_GRAPH 61 95 41.9883 . . ID=CENPK_all_peaks;Name=CENPK_peak0;PEAK=peak0;Note=score is 41.9883
> chrI SNPScanner CENPK_CALL 79 79 41.9883 . . ID=CENPK_all_peaks;Name=CENPK_peak0;PEAK=peak0;Note=score is 41.9883
> chrI SNPScanner CENPK_SCORE 61 61 2.24506 . . ID=CENPK_all_peaks;Name=chrI61;PEAK=peak0;Note=score is 2.24506
> chrI SNPScanner CENPK_SCORE 62 62 3.26837 . . ID=CENPK_all_peaks;Name=chrI62;PEAK=peak0;Note=score is 3.26837
> chrI SNPScanner CENPK_SCORE 63 63 1.39938 . . ID=CENPK_all_peaks;Name=chrI63;PEAK=peak0;Note=score is 1.39938
> chrI SNPScanner CENPK_SCORE 64 64 1.4039 . . ID=CENPK_all_peaks;Name=chrI64;PEAK=peak0;Note=score is 1.4039
> chrI SNPScanner CENPK_SCORE 65 65 9.16134 . . ID=CENPK_all_peaks;Name=chrI65;PEAK=peak0;Note=score is 9.16134
> chrI SNPScanner CENPK_SCORE 66 66 10.1413 . . ID=CENPK_all_peaks;Name=chrI66;PEAK=peak0;Note=score is 10.1413
> chrI SNPScanner CENPK_SCORE 67 67 12.9256 . . ID=CENPK_all_peaks;Name=chrI67;PEAK=peak0;Note=score is 12.9256
> chrI SNPScanner CENPK_SCORE 68 68 13.195 . . ID=CENPK_all_peaks;Name=chrI68;PEAK=peak0;Note=score is 13.195
> chrI SNPScanner CENPK_SCORE 69 69 22.7127 . . ID=CENPK_all_peaks;Name=chrI69;PEAK=peak0;Note=score is 22.7127
> chrI SNPScanner CENPK_SCORE 70 70 23.8289 . . ID=CENPK_all_peaks;Name=chrI70;PEAK=peak0;Note=score is 23.8289
> chrI SNPScanner CENPK_SCORE 71 71 21.9123 . . ID=CENPK_all_peaks;Name=chrI71;PEAK=peak0;Note=score is 21.9123
> chrI SNPScanner CENPK_SCORE 72 72 28.3344 . . ID=CENPK_all_peaks;Name=chrI72;PEAK=peak0;Note=score is 28.3344
> chrI SNPScanner CENPK_SCORE 73 73 35.0436 . . ID=CENPK_all_peaks;Name=chrI73;PEAK=peak0;Note=score is 35.0436
> chrI SNPScanner CENPK_SCORE 74 74 37.361 . . ID=CENPK_all_peaks;Name=chrI74;PEAK=peak0;Note=score is 37.361
> chrI SNPScanner CENPK_SCORE 75 75 39.5408 . . ID=CENPK_all_peaks;Name=chrI75;PEAK=peak0;Note=score is 39.5408
> chrI SNPScanner CENPK_SCORE 76 76 28.2008 . . ID=CENPK_all_peaks;Name=chrI76;PEAK=peak0;Note=score is 28.2008
> chrI SNPScanner CENPK_SCORE 77 77 32.6254 . . ID=CENPK_all_peaks;Name=chrI77;PEAK=peak0;Note=score is 32.6254
> chrI SNPScanner CENPK_SCORE 78 78 36.0832 . . ID=CENPK_all_peaks;Name=chrI78;PEAK=peak0;Note=score is 36.0832
> chrI SNPScanner CENPK_SCORE 79 79 41.9883 . . ID=CENPK_all_peaks;Name=chrI79;PEAK=peak0;Note=score is 41.9883
> chrI SNPScanner CENPK_SCORE 80 80 32.1205 . . ID=CENPK_all_peaks;Name=chrI80;PEAK=peak0;Note=score is 32.1205
> chrI SNPScanner CENPK_SCORE 81 81 41.3048 . . ID=CENPK_all_peaks;Name=chrI81;PEAK=peak0;Note=score is 41.3048
> chrI SNPScanner CENPK_SCORE 82 82 30.7975 . . ID=CENPK_all_peaks;Name=chrI82;PEAK=peak0;Note=score is 30.7975
> chrI SNPScanner CENPK_SCORE 83 83 29.4282 . . ID=CENPK_all_peaks;Name=chrI83;PEAK=peak0;Note=score is 29.4282
> chrI SNPScanner CENPK_SCORE 84 84 35.3586 . . ID=CENPK_all_peaks;Name=chrI84;PEAK=peak0;Note=score is 35.3586
> chrI SNPScanner CENPK_SCORE 85 85 34.1426 . . ID=CENPK_all_peaks;Name=chrI85;PEAK=peak0;Note=score is 34.1426
> chrI SNPScanner CENPK_SCORE 86 86 30.2966 . . ID=CENPK_all_peaks;Name=chrI86;PEAK=peak0;Note=score is 30.2966
> chrI SNPScanner CENPK_SCORE 87 87 17.8402 . . ID=CENPK_all_peaks;Name=chrI87;PEAK=peak0;Note=score is 17.8402
> chrI SNPScanner CENPK_SCORE 88 88 15.2637 . . ID=CENPK_all_peaks;Name=chrI88;PEAK=peak0;Note=score is 15.2637
> chrI SNPScanner CENPK_SCORE 89 89 12.657 . . ID=CENPK_all_peaks;Name=chrI89;PEAK=peak0;Note=score is 12.657
> chrI SNPScanner CENPK_SCORE 90 90 10.2033 . . ID=CENPK_all_peaks;Name=chrI90;PEAK=peak0;Note=score is 10.2033
> chrI SNPScanner CENPK_SCORE 91 91 9.40143 . . ID=CENPK_all_peaks;Name=chrI91;PEAK=peak0;Note=score is 9.40143
> chrI SNPScanner CENPK_SCORE 92 92 6.56273 . . ID=CENPK_all_peaks;Name=chrI92;PEAK=peak0;Note=score is 6.56273
> chrI SNPScanner CENPK_SCORE 93 93 3.66211 . . ID=CENPK_all_peaks;Name=chrI93;PEAK=peak0;Note=score is 3.66211
> chrI SNPScanner CENPK_SCORE 94 94 0.394194 . . ID=CENPK_all_peaks;Name=chrI94;PEAK=peak0;Note=score is 0.394194
>
> CONFIG:
>
>
> GRAPH_CENPK{CENPK_SCORE/CENPK_GRAPH}
>
> [CENPK_all_scores_graph]
> feature = GRAPH_CENPK:SNPScanner
> glyph = xyplot
> graph_type = boxes
> fgcolor = purple
> bgcolor = purple
> height = 100
> min_score = 0
> max_score = 110
> label = 0
> key = CENPK prediction signal
> link =
> category = SNPs: signal graphs
>
>
> _______________________________________________
> Gmod-gbrowse mailing list
> Gmod-gbrowse at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse
>
>
>


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu


From dmessina at wustl.edu  Fri Dec 15 01:45:24 2006
From: dmessina at wustl.edu (David Messina)
Date: Thu, 14 Dec 2006 19:45:24 -0600
Subject: [Bioperl-l] Proposal for Meta data
In-Reply-To: <B3EF69DB-9C01-4F42-A4E4-898613D872F9@uiuc.edu>
References: <B3EF69DB-9C01-4F42-A4E4-898613D872F9@uiuc.edu>
Message-ID: <5DB6475C-109D-406D-B4BA-D2248AE3F987@wustl.edu>

Hey Chris,

My thoughts below.

> [Chris]
> This could be used to annotate any
> PrimarySeq, LocatableSeq, SimpleAlign, SeqFeature, or what-have-you,
> maybe in a collection (similar to AnnotationCollection).  I thought
> something like this may be of general use for any PrimarySeq
> (quality, structure), alignments like NEXUS and Stockholm,
> SeqFeatures where structure could be stored (tRNA or riboswitches),  
> etc.
>
> However, this also seems to fall into the category of sequence
> annotation.  So, would it be better to have a set of Bio::Annotation
> classes used for this purpose?


To me, all meta data is equal. That is, your classic Genbank feature  
annotation and a user's arbitrary meta-tag like "Bob thinks this is a  
kinase domain" aren't different in kind even if they are different in  
content.

As resequencing projects multiply, the ability to create arbitrary  
meta tags, attach them to different types of objects, and use those  
tags to link them together will become desirable, if not essential.

Keeping a common interface to all of these meta data types would be  
advantageous, plus new users won't have to determine whether they  
need to use Bio::Meta objects or Bio::Annotation objects.

So I would argue for all of the meta data types to live "under one  
roof". Which roof isn't as important. Bio::Annotation, since it  
already exists for today's meta data, seems like a reasonable choice.  
(assuming Annotation objects are flexible enough to be extended as  
you propose)

There, and no flames or jibes even. :)

Dave


From cjfields at uiuc.edu  Fri Dec 15 02:21:10 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 14 Dec 2006 20:21:10 -0600
Subject: [Bioperl-l] Proposal for Meta data
In-Reply-To: <5DB6475C-109D-406D-B4BA-D2248AE3F987@wustl.edu>
References: <B3EF69DB-9C01-4F42-A4E4-898613D872F9@uiuc.edu>
	<5DB6475C-109D-406D-B4BA-D2248AE3F987@wustl.edu>
Message-ID: <9F172B90-B065-4A42-A54F-140360132B3B@uiuc.edu>


On Dec 14, 2006, at 7:45 PM, David Messina wrote:

> Hey Chris,
>
> My thoughts below.
>
>> [Chris]
>> This could be used to annotate any
>> PrimarySeq, LocatableSeq, SimpleAlign, SeqFeature, or what-have-you,
>> maybe in a collection (similar to AnnotationCollection).  I thought
>> something like this may be of general use for any PrimarySeq
>> (quality, structure), alignments like NEXUS and Stockholm,
>> SeqFeatures where structure could be stored (tRNA or riboswitches),
>> etc.
>>
>> However, this also seems to fall into the category of sequence
>> annotation.  So, would it be better to have a set of Bio::Annotation
>> classes used for this purpose?
>
>
> To me, all meta data is equal. That is, your classic Genbank feature
> annotation and a user's arbitrary meta-tag like "Bob thinks this is a
> kinase domain" aren't different in kind even if they are different in
> content.
>
> As resequencing projects multiply, the ability to create arbitrary
> meta tags, attach them to different types of objects, and use those
> tags to link them together will become desirable, if not essential.
>
> Keeping a common interface to all of these meta data types would be
> advantageous, plus new users won't have to determine whether they
> need to use Bio::Meta objects or Bio::Annotation objects.
>
> So I would argue for all of the meta data types to live "under one
> roof". Which roof isn't as important. Bio::Annotation, since it
> already exists for today's meta data, seems like a reasonable choice.
> (assuming Annotation objects are flexible enough to be extended as
> you propose)
>
> There, and no flames or jibes even. :)

I guess what I want to know is whether there should be a
distinction between 'normal' sequence annotation (comments,
references, and so on) and annotation that could best be described as
position-specific (like RNA or protein structural annotation).  The
current meta implementation is for sequence data only; I felt it
would be nice to have a generic implementation that would be
applicable to any object data.

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From dmessina at wustl.edu  Fri Dec 15 02:46:27 2006
From: dmessina at wustl.edu (David Messina)
Date: Thu, 14 Dec 2006 20:46:27 -0600
Subject: [Bioperl-l] Proposal for Meta data
In-Reply-To: <9F172B90-B065-4A42-A54F-140360132B3B@uiuc.edu>
References: <B3EF69DB-9C01-4F42-A4E4-898613D872F9@uiuc.edu>
	<5DB6475C-109D-406D-B4BA-D2248AE3F987@wustl.edu>
	<9F172B90-B065-4A42-A54F-140360132B3B@uiuc.edu>
Message-ID: <9C72012A-EFD7-42DD-93F8-578251CFDE01@wustl.edu>

And it all seemed so clear to me when I wrote it. :)

> whether there should to be a distinction

I would argue no because it would contravene a s


> a generic implementation that would be applicable to any object data.

I wholeheartedly agree that this is the way to go. A generic  
implementation would allow arbitrary object data while maintaining a  
standard interface.


From dmessina at wustl.edu  Fri Dec 15 02:46:27 2006
From: dmessina at wustl.edu (David Messina)
Date: Thu, 14 Dec 2006 20:46:27 -0600
Subject: [Bioperl-l] Proposal for Meta data
Message-ID: <E4629E7B-E42C-4B93-869F-FE26035052A0@wustl.edu>

[oops, accidentally hit send midsentence]


And it all seemed so clear to me when I wrote it. :)


> whether there should to be a distinction

I would argue no because it would contravene a standard interface.


> a generic implementation that would be applicable to any object data.

I wholeheartedly agree that this is the way to go. A generic  
implementation would allow arbitrary object data while maintaining a  
standard interface.


Dave


From neetisomaiya at gmail.com  Fri Dec 15 05:21:42 2006
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Fri, 15 Dec 2006 10:51:42 +0530
Subject: [Bioperl-l] needle parser in bioperl?
In-Reply-To: <B4B8F9CCEDA9334F819017E5D711AD1C32C669@bllmail.bll.n-i.nhs.uk>
References: <764978cf0612140002m2a8c4268ma4b55f12412c5e9d@mail.gmail.com>
	<B4B8F9CCEDA9334F819017E5D711AD1C32C669@bllmail.bll.n-i.nhs.uk>
Message-ID: <764978cf0612142121s547a54dbu54b839f71d171f81@mail.gmail.com>

Hi,

Thanks a lot for your response.
I ran needle like this
 /usr/local/bin/./needle SEQ_1.REF seq_of_contig1 -aformat msf 1.out
It gave me the output in format msf.
But now my problem is, if I use Bio::AlignIO module of Bioperl, how can I
get the alignment start and stop coordinates on the sequence. I mean
something like hsp->query->start which gives us the alignment start position
on query sequence in a blast output when using Bio::SearchIO.
Please help.
Like I explained with an example in my previous mail, I want the coordinate
where the alignment starts on the sequence.

~Neeti.

On 12/14/06, Fairley, Derek <Derek.Fairley at bll.n-i.nhs.uk> wrote:
>
>  Neeti,
>
> From http://emboss.sourceforge.net/apps/cvs/needle.html:
>
> "The results can be output in one of several styles by using the
> command-line qualifier -aformat xxx, where 'xxx' is replaced by the name of
> the required format. Some of the alignment formats can cope with an
> unlimited number of sequences, while others are only for pairs of sequences.
>
> The available multiple alignment format names are: unknown, multiple,
> simple, fasta, msf, trace, srs
>
> The available pairwise alignment format names are: pair, markx0, markx1,
> markx2, markx3, markx10, srspair, score
>
> See: http://emboss.sf.net/docs/themes/AlignFormats.html for further
> information on alignment formats."
>
> Not sure based on this whether you can get pairwise alignment in .msf
> format; can't think of a good reason why not. The BioPerl Align::IO module
> will allow you to parse alignments in .msf format.
>
> HTH,
>
> Derek.
>
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:
> bioperl-l-bounces at lists.open-bio.org] On Behalf Of neeti somaiya
> Sent: 14 December 2006 08:03
> To: Chris Fields; bioperl-l
> Subject: Re: [Bioperl-l] needle parser in bioperl?
>
> How do I run needle specifying that I want the MSF format, on a linux box?
> The help doesnt show me any format option. Is there anything available to
> pasre MSF format?
> Please find an example alignment file attached. Here the seq_of_contig
> aligns with the reference sequence (i.e. SEQ_1.REF) starting at position
> (coordinate) 8918 of SEQ_1.REF. I basically want this coordinate from the
> output alignment, how can I parse the result to get this?
>
> On 12/12/06, Chris Fields <cjfields at uiuc.edu> wrote:
> >
> > On Dec 12, 2006, at 6:14 AM, neeti somaiya wrote:
> >
> > > Hi,
> > >
> > > Does anyone know of a bioperl parser for needle output, basically I
> > > won't
> > > where the target sequence aligns on the template (i.e. coordinate
> > > on the
> > > template where the taget aligns).
> > >
> > > --
> > > -Neeti
> > > Even my blood says, B positive
> >
> > I answered this a number of months back:
> >
> > http://tinyurl.com/yzlbx5
> >
> > Basically, newer versions of EMBOSS have changed the output for the
> > AlignIO::emboss parser (which parses needle).  I don't believe the
> > parser has been fixed to deal with that, but Jason has pointed out
> > you can use MSF output when running needle, then parse using AlignIO
> > with the format set to 'msf'.
> >
> > chris
> >
>
> --
> -Neeti
> Even my blood says, B positive
>


-- 
-Neeti
Even my blood says, B positive


From Derek.Fairley at bll.n-i.nhs.uk  Fri Dec 15 09:57:35 2006
From: Derek.Fairley at bll.n-i.nhs.uk (Fairley, Derek)
Date: Fri, 15 Dec 2006 09:57:35 -0000
Subject: [Bioperl-l] needle parser in bioperl?
In-Reply-To: <764978cf0612142121s547a54dbu54b839f71d171f81@mail.gmail.com>
Message-ID: <B4B8F9CCEDA9334F819017E5D711AD1C32C66A@bllmail.bll.n-i.nhs.uk>

Neeti,

In lieu of a response from a BioPerl guru... why not use Needle to generate your pairwise alignment in fasta format, rather than msf format? The sequence you want should correspond to a single HSP which you can get directly from the fasta alignment with Bio::SearchIO: http://www.bioperl.org/wiki/Module:Bio::SearchIO. You may not need to use Bio::AlignIO at all. 

Derek.


-----Original Message-----
From: neeti somaiya [mailto:neetisomaiya at gmail.com] 
Sent: 15 December 2006 05:22
To: Fairley, Derek; bioperl-l
Subject: Re: [Bioperl-l] needle parser in bioperl?

Hi,

Thanks a lot for your response.
I ran needle like this 
/usr/local/bin/./needle SEQ_1.REF seq_of_contig1 -aformat msf 1.out
It gave me the output in format msf.
But now my problem is, if I use Bio::AlignIO module of Bioperl, how can I get the alignment start and stop coordinates on the sequence. I mean something like hsp->query->start which gives us the alignment start position on query sequence in a blast output when using Bio::SearchIO.
Please help.
Like I explained with an example in my previous mail, I want the coordinate where the alignment starts on the sequence.

~Neeti.
On 12/14/06, Fairley, Derek <Derek.Fairley at bll.n-i.nhs.uk> wrote:
Neeti,

From http://emboss.sourceforge.net/apps/cvs/needle.html :

"The results can be output in one of several styles by using the command-line qualifier -aformat xxx, where 'xxx' is replaced by the name of the required format. Some of the alignment formats can cope with an unlimited number of sequences, while others are only for pairs of sequences. 

The available multiple alignment format names are: unknown, multiple, simple, fasta, msf, trace, srs 

The available pairwise alignment format names are: pair, markx0, markx1, markx2, markx3, markx10, srspair, score 

See: http://emboss.sf.net/docs/themes/AlignFormats.html for further information on alignment formats."

Not sure based on this whether you can get pairwise alignment in .msf format; can't think of a good reason why not. The BioPerl Align::IO module will allow you to parse alignments in .msf format.

HTH,

Derek.

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of neeti somaiya
Sent: 14 December 2006 08:03
To: Chris Fields; bioperl-l
Subject: Re: [Bioperl-l] needle parser in bioperl?

How do I run needle specifying that I want the MSF format, on a linux box?
The help doesnt show me any format option. Is there anything available to
pasre MSF format?
Please find an example alignment file attached. Here the seq_of_contig
aligns with the reference sequence (i.e. SEQ_1.REF) starting at position
(coordinate) 8918 of SEQ_1.REF. I basically want this coordinate from the
output alignment, how can I parse the result to get this?

On 12/12/06, Chris Fields <cjfields at uiuc.edu> wrote:
>
>
> On Dec 12, 2006, at 6:14 AM, neeti somaiya wrote:
>
> > Hi,
> >
> > Does anyone know of a bioperl parser for needle output, basically I
> > won't
> > where the target sequence aligns on the template (i.e. coordinate
> > on the
> > template where the taget aligns).
> >
> > --
> > -Neeti
> > Even my blood says, B positive
>
> I answered this a number of months back:
>
> http://tinyurl.com/yzlbx5 
>
> Basically, newer versions of EMBOSS have changed the output for the
> AlignIO::emboss parser (which parses needle).  I don't believe the
> parser has been fixed to deal with that, but Jason has pointed out
> you can use MSF output when running needle, then parse using AlignIO
> with the format set to 'msf'.
>
> chris
>
-- 
-Neeti
Even my blood says, B positive


-- 
-Neeti
Even my blood says, B positive 


From cain at cshl.edu  Fri Dec 15 05:01:36 2006
From: cain at cshl.edu (Scott Cain)
Date: Fri, 15 Dec 2006 00:01:36 -0500
Subject: [Bioperl-l] Bio::SeqFeature::Annotated and mandatory
	type checking
In-Reply-To: <4581CCEB.20206@sendu.me.uk>
References: <637A2459-4115-466F-BD8D-036D5E9114F8@cshl.edu>
	<4581CCEB.20206@sendu.me.uk>
Message-ID: <1166158897.2569.335.camel@localhost.localdomain>

As much as I would like to take credit for this :-)  Allen Day wrote the
original code, and then Chris Fields tried to fix it so that it actually
worked :-)  I think it would be a good idea to have a validate_terms
option like Bio::FeatureIO::gff.
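
Matthew's request amounts to making the SOFA check opt-in. The sketch below shows one way such a flag could be wired; it is a hypothetical pure-Perl class, not Bio::SeqFeature::Annotated. The -validate_terms name simply echoes the option mentioned for Bio::FeatureIO::gff, defaulting it to off is an assumption, and the short term list is a stand-in for a real SOFA lookup.

```perl
use strict;
use warnings;

package Hypothetical::Feature;    # illustration only, not a BioPerl class

# Stand-in for a real SOFA term lookup; a real implementation would
# consult the ontology instead of a fixed list.
my %KNOWN_TYPE = map { $_ => 1 } qw(gene mRNA exon CDS region);

sub new {
    my ($class, %args) = @_;
    my $self = bless {
        type           => $args{-type},
        validate_terms => $args{-validate_terms} ? 1 : 0,   # off unless requested
    }, $class;
    $self->_check_type($self->{type});
    return $self;
}

# Getter/setter for the feature type; the check runs only when enabled,
# and an invalid type leaves the stored value unchanged.
sub type {
    my ($self, $type) = @_;
    if (@_ > 1) {
        $self->_check_type($type);
        $self->{type} = $type;
    }
    return $self->{type};
}

sub _check_type {
    my ($self, $type) = @_;
    return unless $self->{validate_terms};
    die "type '", (defined $type ? $type : ''), "' is not a known SOFA term\n"
        unless defined $type && $KNOWN_TYPE{$type};
}

package main;

# Without the flag a local, non-SOFA type is accepted...
my $loose  = Hypothetical::Feature->new(-type => 'my_local_type');
# ...while with the flag only known terms get through.
my $strict = Hypothetical::Feature->new(-type => 'exon', -validate_terms => 1);
```

Keeping validation behind a constructor flag means existing callers who rely on strict checking can opt in explicitly, while ad hoc feature types stay usable by default.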

Scott

On Thu, 2006-12-14 at 17:15 -0500, Sendu Bala wrote:
> Matthew Vaughn wrote:
> > Dear all,
> > 
> > I'm trying to bring some of my code into compliance with the BioPerl 
> > 1.5.2 and am running into some design decisions that I am unclear on. 
> > Can I ask why Bio::SeqFeature::Annotated enforces mandatory checking of 
> > the 'type' against SOFA? It seems to me that this should be optional 
> > behavior as is the case with the Bio::FeatureIO family. I'd be happy to 
> > write the patch if there is any agreement with me on this case.
> 
> Lots of people seem to have worked on it over the years, but perhaps 
> Scott Cain is the person to talk to?
> 
> revision 1.4
> date: 2004/09/25 11:41:29;  author: scain;  state: Exp;  lines: +1 -1
> two things:
>    * adding SOFA as an available ontology to DocumentRegistry.pm
>    * modifying FeatureIO::gff to use SOFA to validate, and to parse 
> Ontology_term
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From neetisomaiya at gmail.com  Fri Dec 15 12:46:08 2006
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Fri, 15 Dec 2006 18:16:08 +0530
Subject: [Bioperl-l] needle parser in bioperl?
In-Reply-To: <B4B8F9CCEDA9334F819017E5D711AD1C32C66A@bllmail.bll.n-i.nhs.uk>
References: <764978cf0612142121s547a54dbu54b839f71d171f81@mail.gmail.com>
	<B4B8F9CCEDA9334F819017E5D711AD1C32C66A@bllmail.bll.n-i.nhs.uk>
Message-ID: <764978cf0612150446r46e5f64tc6bf0b198cf618c5@mail.gmail.com>

I ran needle like this

/usr/local/bin/./needle SEQ_1.REF seq_of_contig1 -aformat fasta 1.out

Please find the output attached.

When I run the following :-

use Bio::SearchIO;

my $io = Bio::SearchIO->new(-file   => "1.out",
                            -format => "fasta" );

while ( my $result = $io->next_result() )
{
    while ( my $hit = $result->next_hit )
    {
        print "yes\n";
    }
}


It says :-

-------------------- WARNING ---------------------
MSG: unrecognized FASTA Family report file!
---------------------------------------------------

What should I do?

~Neeti.

On 12/15/06, Fairley, Derek <Derek.Fairley at bll.n-i.nhs.uk> wrote:
>
> Neeti,
>
> In lieu of a response from a BioPerl guru... why not use Needle to
> generate your pairwise alignment in fasta format, rather than msf format?
> The sequence you want should correspond to a single HSP which you can get
> directly from the fasta alignment with Bio::SearchIO:
> http://www.bioperl.org/wiki/Module:Bio::SearchIO. You may not need to use
> Bio::AlignIO at all.
>
> Derek.
>
>
> -----Original Message-----
> From: neeti somaiya [mailto:neetisomaiya at gmail.com]
> Sent: 15 December 2006 05:22
> To: Fairley, Derek; bioperl-l
> Subject: Re: [Bioperl-l] needle parser in bioperl?
>
> Hi,
>
> Thanks a lot for your response.
> I ran needle like this
> /usr/local/bin/./needle SEQ_1.REF seq_of_contig1 -aformat msf 1.out
> It gave me the output in format msf.
> But now my problem is, if I use Bio::AlignIO module of Bioperl, how can I
> get the alignment start and stop coordinates on the sequence. I mean
> something like hsp->query->start which gives us the alignment start position
> on query sequence in a blast output when using Bio::SearchIO.
> Please help.
> Like I explained with an example in my previous mail, I want the
> coordinate where the alignment starts on the sequence.
>
> ~Neeti.
> On 12/14/06, Fairley, Derek <Derek.Fairley at bll.n-i.nhs.uk> wrote:
> Neeti,
>
> From http://emboss.sourceforge.net/apps/cvs/needle.html :
>
> "The results can be output in one of several styles by using the
> command-line qualifier -aformat xxx, where 'xxx' is replaced by the name of
> the required format. Some of the alignment formats can cope with an
> unlimited number of sequences, while others are only for pairs of sequences.
>
> The available multiple alignment format names are: unknown, multiple,
> simple, fasta, msf, trace, srs
>
> The available pairwise alignment format names are: pair, markx0, markx1,
> markx2, markx3, markx10, srspair, score
>
> See: http://emboss.sf.net/docs/themes/AlignFormats.html for further
> information on alignment formats."
>
> Not sure based on this whether you can get pairwise alignment in .msf
> format; can't think of a good reason why not. The BioPerl Align::IO module
> will allow you to parse alignments in .msf format.
>
> HTH,
>
> Derek.
>
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:
> bioperl-l-bounces at lists.open-bio.org] On Behalf Of neeti somaiya
> Sent: 14 December 2006 08:03
> To: Chris Fields; bioperl-l
> Subject: Re: [Bioperl-l] needle parser in bioperl?
>
> How do I run needle specifying that I want the MSF format, on a linux box?
> The help doesnt show me any format option. Is there anything available to
> pasre MSF format?
> Please find an example alignment file attached. Here the seq_of_contig
> aligns with the reference sequence (i.e. SEQ_1.REF) starting at position
> (coordinate) 8918 of SEQ_1.REF. I basically want this coordinate from the
> output alignment, how can I parse the result to get this?
>
> On 12/12/06, Chris Fields <cjfields at uiuc.edu > wrote:
> >
> >
> > On Dec 12, 2006, at 6:14 AM, neeti somaiya wrote:
> >
> > > Hi,
> > >
> > > Does anyone know of a bioperl parser for needle output, basically I
> > > won't
> > > where the target sequence aligns on the template (i.e. coordinate
> > > on the
> > > template where the taget aligns).
> > >
> > > --
> > > -Neeti
> > > Even my blood says, B positive
> >
> > I answered this a number of months back:
> >
> > http://tinyurl.com/yzlbx5
> >
> > Basically, newer versions of EMBOSS have changed the output for the
> > AlignIO::emboss parser (which parses needle). I don't believe the
> > parser has been fixed to deal with that, but Jason has pointed out
> > you can use MSF output when running needle, then parse using AlignIO
> > with the format set to 'msf'.
> >
> > chris
> >
>
>
>
> --
> -Neeti
> Even my blood says, B positive
>
>
>
> --
> -Neeti
> Even my blood says, B positive
>


-- 
-Neeti
Even my blood says, B positive
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 1.out
Type: application/octet-stream
Size: 90277 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061215/34b05d03/attachment-0004.obj>

From jason at bioperl.org  Fri Dec 15 14:28:13 2006
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 15 Dec 2006 09:28:13 -0500
Subject: [Bioperl-l] Proposal for Meta data
In-Reply-To: <9F172B90-B065-4A42-A54F-140360132B3B@uiuc.edu>
References: <B3EF69DB-9C01-4F42-A4E4-898613D872F9@uiuc.edu>
	<5DB6475C-109D-406D-B4BA-D2248AE3F987@wustl.edu>
	<9F172B90-B065-4A42-A54F-140360132B3B@uiuc.edu>
Message-ID: <32BE3FCF-C788-438F-8A4A-8A586DD6C569@bioperl.org>


On Dec 14, 2006, at 9:21 PM, Chris Fields wrote:

>
> On Dec 14, 2006, at 7:45 PM, David Messina wrote:
>
>> Hey Chris,
>>
>> My thoughts below.
>>
>>> [Chris]
>>> This could be used to annotate any
>>> PrimarySeq, LocatableSeq, SimpleAlign, SeqFeature, or what-have-you,
>>> maybe in a collection (similar to AnnotationCollection).  I thought
>>> something like this may be of general use for any PrimarySeq
>>> (quality, structure), alignments like NEXUS and Stockholm,
>>> SeqFeatures where structure could be stored (tRNA or riboswitches),
>>> etc.
>>>
>>> However, this also seems to fall into the category of sequence
>>> annotation.  So, would it be better to have a set of Bio::Annotation
>>> classes used for this purpose?
>>
>>
>> To me, all meta data is equal. That is, your classic Genbank feature
>> annotation and a user's arbitrary meta-tag like "Bob thinks this is a
>> kinase domain" aren't different in kind even if they are different in
>> content.
>>
>> As resequencing projects multiply, the ability to create arbitrary
>> meta tags, attach them to different types of objects, and use those
>> tags to link them together will become desirable, if not essential.
>>
>> Keeping a common interface to all of these meta data types would be
>> advantageous, plus new users won't have to determine whether they
>> need to use Bio::Meta objects or Bio::Annotation objects.
>>
>> So I would argue for all of the meta data types to live "under one
>> roof". Which roof isn't as important. Bio::Annotation, since it
>> already exists for today's meta data, seems like a reasonable choice.
>> (assuming Annotation objects are flexible enough to be extended as
>> you propose)
>>
>> There, and no flames or jibes even. :)
>
> I guess what I want to know is whether there should to be a
> distinction between 'normal' sequence annotation (comments,
> references, and so on) and annotation that could be best described as
> position-specific (like RNA or protein structural annotation).  The
> current meta implementation is for sequence data only; I felt it
> would be nice to have a generic implementation that would be
> applicable to any object data.

my stream-of-consciousness for right now:

I was thinking Bio::Annotation is where this should go - that system
doesn't have anything about it that makes it explicitly sequence
related. What we're trying to hammer out here on the Alignment side -
which fits with your RNA example - is to have features, basically
SeqFeatures, associated with alignments so columns can be annotated
to cover things like character sets and partitions for phylogenetic
analyses.  As for data which annotates non-contiguous things like
RNA stems, we may have to be more creative about that or model it with
a split location.
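
To make the column-annotation idea concrete, here is a minimal pure-Perl sketch, not BioPerl code: a hypothetical ColumnAnnotation record holding a tag, a value and one or more column ranges, so a contiguous character set uses one range while a non-contiguous RNA stem uses several (the role a split location would play). All names here are illustrative assumptions.

```perl
use strict;
use warnings;

package ColumnAnnotation;    # hypothetical class, not part of BioPerl

# -tag and -value describe the annotation; -ranges is a list of
# [start, end] alignment columns (1-based, inclusive). Several ranges
# model a non-contiguous feature such as the two arms of an RNA stem.
sub new {
    my ($class, %args) = @_;
    my @ranges = sort { $a->[0] <=> $b->[0] } @{ $args{-ranges} || [] };
    for my $r (@ranges) {
        die "invalid column range [$r->[0], $r->[1]]\n"
            if $r->[0] < 1 || $r->[1] < $r->[0];
    }
    return bless { tag => $args{-tag}, value => $args{-value},
                   ranges => \@ranges }, $class;
}

sub tag    { return $_[0]{tag} }
sub value  { return $_[0]{value} }
sub ranges { return @{ $_[0]{ranges} } }    # sorted by start column

# True if the given alignment column lies inside any of the ranges.
sub covers {
    my ($self, $col) = @_;
    for my $r ($self->ranges) {
        return 1 if $col >= $r->[0] && $col <= $r->[1];
    }
    return 0;
}

package main;

# The same class annotates a contiguous character set and a split stem.
my $charset = ColumnAnnotation->new(-tag => 'charset', -value => 'codon1',
                                    -ranges => [[1, 30]]);
my $stem    = ColumnAnnotation->new(-tag => 'stem', -value => 'P1',
                                    -ranges => [[50, 55], [10, 15]]);
print $stem->covers(12) ? "column 12 is in the stem\n"
                        : "column 12 is not in the stem\n";
# prints "column 12 is in the stem"
```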

So currently we've added code so that an Alignment is-a
Bio::AnnotatableI and is-a Bio::FeatureHolderI to move towards this
end, with the goal of being able to capture more of the data that can
be represented in a NEXUS file.

It feels more like a hack than an elegant meta-data solution, but I
am not totally sure what data you are thinking about at this
point; perhaps I need to spend more time thinking about it.
Or are you worried about whether the semantic mapping of
the data into features or annotations is confusing users?


From jason at bioperl.org  Fri Dec 15 14:48:32 2006
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 15 Dec 2006 09:48:32 -0500
Subject: [Bioperl-l] needle parser in bioperl?
In-Reply-To: <764978cf0612150446r46e5f64tc6bf0b198cf618c5@mail.gmail.com>
References: <764978cf0612142121s547a54dbu54b839f71d171f81@mail.gmail.com>
	<B4B8F9CCEDA9334F819017E5D711AD1C32C66A@bllmail.bll.n-i.nhs.uk>
	<764978cf0612150446r46e5f64tc6bf0b198cf618c5@mail.gmail.com>
Message-ID: <42CB9018-72CD-433E-A42F-152D63D2F584@bioperl.org>

I get the impression you are trying to use the wrong tool for the  
job.  Can you explain a little more generally what you want to do?

Semantically, FASTA in Bio::SearchIO is very different from FASTA in
Bio::AlignIO.  We explain this on the wiki; please have a look at the
FASTA page.

  * do not use Bio::SearchIO to parse multi-FASTA alignment output;
    Bio::SearchIO is for pairwise alignment reports
  * use Bio::AlignIO for multi-FASTA format or for MSF - you just
    provide a different value to '-format'.
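
For reference, the gapped multi-FASTA that needle writes with -aformat fasta is simple enough to read directly. The sketch below is a dependency-free stand-in, not Bio::AlignIO itself; with BioPerl installed you would normally call Bio::AlignIO->new(-file => '1.out', -format => 'fasta') and walk the sequences of each alignment. The sub name read_gapped_fasta is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: read a gapped multi-FASTA alignment from a
# filehandle. Returns a list of [id, aligned_string] pairs in file
# order; gap characters ('-') are kept as-is.
sub read_gapped_fasta {
    my ($fh) = @_;
    my @records;
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;                 # strip Unix or DOS line endings
        next if $line =~ /^\s*$/;             # skip blank lines
        if ($line =~ /^>\s*(\S*)/) {
            push @records, [ $1, '' ];        # new record; id is the first word
        }
        elsif (@records) {
            $line =~ s/\s+//g;                # drop any embedded whitespace
            $records[-1][1] .= $line;         # append to the current sequence
        }
        else {
            die "sequence data before the first '>' header\n";
        }
    }
    return @records;
}

# Usage sketch: print each id with its aligned length.
# open my $fh, '<', '1.out' or die "cannot open 1.out: $!";
# printf "%s\t%d\n", $_->[0], length $_->[1] for read_gapped_fasta($fh);
```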

But none of that is going to help you get start/end for your
alignment because that is not part of the output format - do the
experiment of looking at the file and figuring out what the
actual fields you want output are; if they don't exist then you either
have a format that won't work for your question, or you will have to
calculate the additional values yourself.  If you are trying to align
transcripts to a genome, please consider tools that are built for it
(and referenced on the wiki, like Sim4, est2genome, exonerate, BLAT).
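
As a sketch of "calculating it yourself": for a global (needle) alignment of a reference and a contig, the reference position where the contig begins can be derived from the two aligned strings by counting reference residues up to the first column in which the contig has a residue. This assumes both strings have equal length, that '-' and '.' are the only gap characters (MSF uses '.'), and that the reference passed to needle starts at its own position 1 (for a subsequence, add the offset). The sub name is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper. Given the aligned reference and contig strings,
# return the 1-based reference coordinates (start, end) covered by the
# contig: the reference positions of the first and last columns in which
# the contig has a residue. If such a column is a gap in the reference,
# the nearest reference residue inward is used. Returns an empty list if
# the contig covers no reference residue.
sub contig_span_on_reference {
    my ($ref_aln, $contig_aln) = @_;
    die "aligned strings differ in length\n"
        unless length($ref_aln) == length($contig_aln);

    my $is_gap = sub { $_[0] eq '-' || $_[0] eq '.' };
    my ($ref_pos, $start, $end) = (0, undef, undef);

    for my $col (0 .. length($ref_aln) - 1) {
        my $ref_gap = $is_gap->(substr($ref_aln, $col, 1));
        $ref_pos++ unless $ref_gap;               # reference residues seen so far
        next if $is_gap->(substr($contig_aln, $col, 1));
        if (!defined $start) {
            # first contig residue: next reference residue if this column is a ref gap
            $start = $ref_gap ? $ref_pos + 1 : $ref_pos;
        }
        $end = $ref_pos;                          # last reference residue at or before here
    }
    return () if !defined $start || $start > $end;
    return ($start, $end);
}
```

With BioPerl, the gapped strings would come from $seq->seq on each sequence returned by $aln->each_seq after reading the file with Bio::AlignIO.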

-jason
On Dec 15, 2006, at 7:46 AM, neeti somaiya wrote:

> I ran needle like this
>
> /usr/local/bin/./needle SEQ_1.REF seq_of_contig1 -aformat fasta 1.out
>
> Please find the output attached.
>
> When I run the following :-
>
> use Bio::SearchIO;
>
> my $io = Bio::SearchIO->new(-file   => "1.out",
>                           -format => "fasta" );
>
> while ( my $result = $io->next_result() )
> {
>       while( my $hit = $result->next_hit)
>      {
>
>               print "yes\n";
>       }
> }
>
>
> It says :-
>
> -------------------- WARNING ---------------------
> MSG: unrecognized FASTA Family report file!
> ---------------------------------------------------
>
> What should I do?
>
> ~Neeti.
>
> On 12/15/06, Fairley, Derek <Derek.Fairley at bll.n-i.nhs.uk> wrote:
>>
>> Neeti,
>>
>> In lieu of a response from a BioPerl guru... why not use Needle to
>> generate your pairwise alignment in fasta format, rather than msf  
>> format?
>> The sequence you want should correspond to a single HSP which you  
>> can get
>> directly from the fasta alignment with Bio::SearchIO:
>> http://www.bioperl.org/wiki/Module:Bio::SearchIO. You may not need  
>> to use
>> Bio::AlignIO at all.
>>
>> Derek.
>>
>>
>> -----Original Message-----
>> From: neeti somaiya [mailto:neetisomaiya at gmail.com]
>> Sent: 15 December 2006 05:22
>> To: Fairley, Derek; bioperl-l
>> Subject: Re: [Bioperl-l] needle parser in bioperl?
>>
>> Hi,
>>
>> Thanks a lot for your response.
>> I ran needle like this
>> /usr/local/bin/./needle SEQ_1.REF seq_of_contig1 -aformat msf 1.out
>> It gave me the output in format msf.
>> But now my problem is, if I use Bio::AlignIO module of Bioperl,  
>> how can I
>> get the alignment start and stop coordinates on the sequence. I mean
>> something like hsp->query->start which gives us the alignment  
>> start position
>> on query sequence in a blast output when using Bio::SearchIO.
>> Please help.
>> Like I explained with an example in my previous mail, I want the
>> coordinate where the alignment starts on the sequence.
>>
>> ~Neeti.
>> On 12/14/06, Fairley, Derek <Derek.Fairley at bll.n-i.nhs.uk> wrote:
>> Neeti,
>>
>> From http://emboss.sourceforge.net/apps/cvs/needle.html :
>>
>> "The results can be output in one of several styles by using the
>> command-line qualifier -aformat xxx, where 'xxx' is replaced by  
>> the name of
>> the required format. Some of the alignment formats can cope with an
>> unlimited number of sequences, while others are only for pairs of  
>> sequences.
>>
>> The available multiple alignment format names are: unknown, multiple,
>> simple, fasta, msf, trace, srs
>>
>> The available pairwise alignment format names are: pair, markx0,  
>> markx1,
>> markx2, markx3, markx10, srspair, score
>>
>> See: http://emboss.sf.net/docs/themes/AlignFormats.html for further
>> information on alignment formats."
>>
>> Not sure based on this whether you can get pairwise alignment in .msf
>> format; can't think of a good reason why not. The BioPerl  
>> Align::IO module
>> will allow you to parse alignments in .msf format.
>>
>> HTH,
>>
>> Derek.
>>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:
>> bioperl-l-bounces at lists.open-bio.org] On Behalf Of neeti somaiya
>> Sent: 14 December 2006 08:03
>> To: Chris Fields; bioperl-l
>> Subject: Re: [Bioperl-l] needle parser in bioperl?
>>
>> How do I run needle specifying that I want the MSF format, on a  
>> linux box?
>> The help doesnt show me any format option. Is there anything  
>> available to
>> pasre MSF format?
>> Please find an example alignment file attached. Here the  
>> seq_of_contig
>> aligns with the reference sequence (i.e. SEQ_1.REF) starting at  
>> position
>> (coordinate) 8918 of SEQ_1.REF. I basically want this coordinate  
>> from the
>> output alignment, how can I parse the result to get this?
>>
>> On 12/12/06, Chris Fields <cjfields at uiuc.edu > wrote:
>> >
>> >
>> > On Dec 12, 2006, at 6:14 AM, neeti somaiya wrote:
>> >
>> > > Hi,
>> > >
>> > > Does anyone know of a bioperl parser for needle output,  
>> basically I
>> > > won't
>> > > where the target sequence aligns on the template (i.e. coordinate
>> > > on the
>> > > template where the taget aligns).
>> > >
>> > > --
>> > > -Neeti
>> > > Even my blood says, B positive
>> >
>> > I answered this a number of months back:
>> >
>> > http://tinyurl.com/yzlbx5
>> >
>> > Basically, newer versions of EMBOSS have changed the output for the
>> > AlignIO::emboss parser (which parses needle). I don't believe the
>> > parser has been fixed to deal with that, but Jason has pointed out
>> > you can use MSF output when running needle, then parse using AlignIO
>> > with the format set to 'msf'.
>> >
>> > chris
>> >
>>
>>
>>
>> --
>> -Neeti
>> Even my blood says, B positive
>>
>>
>>
>> --
>> -Neeti
>> Even my blood says, B positive
>>
>
>
>
> -- 
> -Neeti
> Even my blood says, B positive
> <1.out>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html
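Chris's reply quoted above suggests having needle write MSF and reading it with Bio::AlignIO (format 'msf'). Once the two gapped rows are in hand, the coordinate Neeti asks for (8918 on SEQ_1.REF in her example) comes from a simple column walk. The sketch below is plain Perl and rests on some assumptions: the reference row is passed first, both rows have the same length, and you already know the coordinate of the first reference residue shown (typically 1 for a full-length global needle alignment). `query_start_on_reference` is a hypothetical helper, not a BioPerl method.

```perl
use strict;
use warnings;

# Hypothetical helper. Given two equal-length gapped rows from a pairwise
# alignment (reference first) and the coordinate of the first reference
# residue in the alignment, return the reference coordinate opposite the
# first non-gap query residue. '-', '.' and '~' are all treated as gaps
# (MSF files commonly use '.' and '~').
# If the query starts opposite a reference gap, the coordinate of the
# preceding reference residue is returned.
sub query_start_on_reference {
    my ($ref_aln, $query_aln, $ref_start) = @_;
    my $ref_pos = $ref_start - 1;    # coordinate of the last reference residue seen
    for my $i (0 .. length($ref_aln) - 1) {
        my $r = substr($ref_aln,   $i, 1);
        my $q = substr($query_aln, $i, 1);
        $ref_pos++ unless $r =~ /[-.~]/;
        return $ref_pos unless $q =~ /[-.~]/;
    }
    return;    # the query row contains no residues
}

# Made-up rows for illustration; real ones would come from the parsed alignment.
print query_start_on_reference('AAACGTTT', '---CGT--', 8915), "\n";    # prints 8918
```

If the query begins opposite a gap in the reference, the function reports the last reference residue before it; whether that is the "start" you want depends on how you define it.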


From lubapardo at gmail.com  Fri Dec 15 16:39:11 2006
From: lubapardo at gmail.com (Luba Pardo)
Date: Fri, 15 Dec 2006 17:39:11 +0100
Subject: [Bioperl-l] NO BLAST
Message-ID: <58ff33550612150839i40409b06pe427bcd77d3f208@mail.gmail.com>

Hello,
I am having trouble using the module Bio::Tools::Run::StandAloneBlast.

I got the following error message: cannot find path to blastall.
The code I used is (modified from HOWTObeginners):

#! /local/bin/perl -w

use strict;

use Bio::Seq;
use Bio::SeqIO;
use Bio::DB::GenBank;
use Bio::Tools::Run::StandAloneBlast;

my $db_object = Bio::DB::GenBank->new;

#my $seq_ob = $db_object->get_Seq_by_id('NM_004043');

#my $seq = Bio::SeqIO->new(-file => "> out.fasta", -format => 'fasta');

#$seq->write_seq($seq_ob);

#print $seq;

# blastn against a local database named db.fa
my @params = (program  => 'blastn',
              database => 'db.fa');

my $blast_obj = Bio::Tools::Run::StandAloneBlast->new(@params);

my $seq_obj = Bio::Seq->new(-id  => "testquery",
                            -seq => "TTTAAATATATTTTGAAGTATAGATTATATGTT");

# run blastall on the query; the returned object is read with next_result
my $report_obj = $blast_obj->blastall($seq_obj);

my $result_obj = $report_obj->next_result;

print $result_obj->num_hits, "\n";

Whether I create a sequence de novo or retrieve one from the internet, I get
the same message.
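The message points at the `blastall` executable itself rather than at the query, which is consistent with getting it for both sequences. As far as we know, the 1.5-era StandAloneBlast looks for `blastall` in the directory named by the `BLASTDIR` environment variable and otherwise on `PATH`; treat that as an assumption and check the module's documentation for your version. The plain-Perl sketch below checks those two places; `find_blast_program` is a hypothetical helper, not part of BioPerl.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: look for an executable named $prog first in the
# directory given by $ENV{BLASTDIR} (if set), then in each PATH directory.
# Returns the full path, or undef if nothing executable is found.
sub find_blast_program {
    my ($prog) = @_;
    my @dirs = grep { defined $_ && length $_ } ($ENV{BLASTDIR}, File::Spec->path);
    for my $dir (@dirs) {
        my $candidate = File::Spec->catfile($dir, $prog);
        return $candidate if -f $candidate && -x _;
    }
    return;
}

my $path = find_blast_program('blastall');
print defined $path
    ? "blastall found at $path\n"
    : "blastall not found: set BLASTDIR or add its directory to PATH\n";
```

If this reports that `blastall` is missing, pointing `BLASTDIR` at the directory that holds it (or adding that directory to `PATH`) is the first thing to try.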


From cjfields at uiuc.edu  Fri Dec 15 17:23:27 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 15 Dec 2006 11:23:27 -0600
Subject: [Bioperl-l] Proposal for Meta data
In-Reply-To: <32BE3FCF-C788-438F-8A4A-8A586DD6C569@bioperl.org>
References: <B3EF69DB-9C01-4F42-A4E4-898613D872F9@uiuc.edu>
	<5DB6475C-109D-406D-B4BA-D2248AE3F987@wustl.edu>
	<9F172B90-B065-4A42-A54F-140360132B3B@uiuc.edu>
	<32BE3FCF-C788-438F-8A4A-8A586DD6C569@bioperl.org>
Message-ID: <F302B7DD-C806-4A6F-ACDF-9F27A84E0BF0@uiuc.edu>


On Dec 15, 2006, at 8:28 AM, Jason Stajich wrote:

>
> On Dec 14, 2006, at 9:21 PM, Chris Fields wrote:
>
>>
>> On Dec 14, 2006, at 7:45 PM, David Messina wrote:
>>
>>> Hey Chris,
>>>
>>> My thoughts below.
>>>
>>>> [Chris]
>>>> This could be used to annotate any
>>>> PrimarySeq, LocatableSeq, SimpleAlign, SeqFeature, or what-have-you,
>>>> maybe in a collection (similar to AnnotationCollection).  I thought
>>>> something like this may be of general use for any PrimarySeq
>>>> (quality, structure), alignments like NEXUS and Stockholm,
>>>> SeqFeatures where structure could be stored (tRNA or riboswitches),
>>>> etc.
>>>>
>>>> However, this also seems to fall into the category of sequence
>>>> annotation.  So, would it be better to have a set of
>>>> Bio::Annotation classes used for this purpose?
>>>
>>>
>>> To me, all meta data is equal. That is, your classic Genbank feature
>>> annotation and a user's arbitrary meta-tag like "Bob thinks this  
>>> is a
>>> kinase domain" aren't different in kind even if they are  
>>> different in
>>> content.
>>>
>>> As resequencing projects multiply, the ability to create arbitrary
>>> meta tags, attach them to different types of objects, and use those
>>> tags to link them together will become desirable, if not essential.
>>>
>>> Keeping a common interface to all of these meta data types would be
>>> advantageous, plus new users won't have to determine whether they
>>> need to use Bio::Meta objects or Bio::Annotation objects.
>>>
>>> So I would argue for all of the meta data types to live "under one
>>> roof". Which roof isn't as important. Bio::Annotation, since it
>>> already exists for today's meta data, seems like a reasonable  
>>> choice.
>>> (assuming Annotation objects are flexible enough to be extended as
>>> you propose)
>>>
>>> There, and no flames or jibes even. :)
>>
>> I guess what I want to know is whether there should be a
>> distinction between 'normal' sequence annotation (comments,
>> references, and so on) and annotation that could be best described as
>> position-specific (like RNA or protein structural annotation).  The
>> current meta implementation is for sequence data only; I felt it
>> would be nice to have a generic implementation that would be
>> applicable to any object data.
>
> my stream-of-consciousness for right now:
>
> I was thinking Bio::Annotation is where this should go - that  
> system doesn't have anything about it that makes it explicitly  
> sequence related. What we're trying to hammer out here on the  
> Alignment side - which fits with your RNA example - is to have
> features, basically SeqFeatures, associated with alignments so
> columns can be annotated to cover things like character sets and
> partitions for phylogenetic analyses.  As for data which annotates
> non-contiguous things like RNA stems, we may have to be more
> creative about that or model it with a splitLocation.
>
> So currently we've added code so that an Alignment is-a  
> Bio::AnnotatableI and is-a Bio::FeatureHolderI to move towards this
> end, with the goal of being able to capture more of the data that  
> can be represented in a NEXUS file.
>
> It feels more like a hack than an elegant Meta-data solution, but I
> am not totally sure what data you are thinking about at this
> point; perhaps I need to spend more time thinking about it.
> Or are you worried about the idea of whether the semantic mapping  
> of the data into features or annotations is confusing users?

Sorry in advance for the longish response here...

My original thought was to have a generic abstract class capable of  
positionally describing data in any other class, similar to
Heikki's Bio::Seq::MetaI but not constrained to sequence data only.   
Implementing classes would be capable of having different data  
structures based on their use (simple string, array, AoA, AoH, AoO).   
One MetaCollection class to contain them all in a tag-like system, so  
you could have mixed data types describe the same object.  The latter  
Collection class is so similar to AnnotationCollection that I agree  
Bio::Annotation would be the best place for this.

The way I reconfigured Stockholm alignment parsing/writing is to use  
Bio::Seq::Meta objects (which are LocatableSeq).  Each Seq::Meta is  
capable of holding a sequence and several meta strings, stored as  
tags or 'names'.  However, there is no Meta object for alignments  
(for RNA/protein structure consensus and other Rfam/Pfam markup); I  
hacked around this by using a Bio::Seq::Meta w/o a seq, but I would  
rather have a generic Meta object independent of the sequence cruft.

So for this partial Pfam alignment,

Q92SV1_RHIME/122-299         LAMALNLARGI...VDADVDF..REG
#=GR Q92SV1_RHIME/122-299 pAS .........................
Q883D2_PSESM/110-290         LGLMLGLRRRL...FDGNGAV..KRS
Q8ZXP5_PYRAE/91-262          LALLLAPYKRI...IQYGEKM..KRG
#=GR Q8ZXP5_PYRAE/91-262 SS  HHHHHHHHTTH...HHHHHHX..HTT
#=GR Q8ZXP5_PYRAE/91-262 SA  00000000000...120030X..474
#=GC SS_cons                 HHHHHHHHTTH...HHHHHHH..HTT
#=GC SA_cons                 03002200312...1312414..676
#=GC seq_cons                luhhLuhsRpl...hthppth..+pG
//

'#=GC' lines would be in generic meta string objects in the  
alignment, while '#=GR' tags would be in similar meta objects in the  
relevant sequences.  As long as both aren't AnnotatableI this isn't  
an issue.
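To make the '#=GC' versus '#=GR' split concrete, here is a plain-Perl sketch that sorts the lines of the partial Pfam alignment above into alignment-level and sequence-level markup. The nested-hash layout and the `parse_stockholm_meta` name are illustrative assumptions, not how Bio::AlignIO::stockholm or Bio::Seq::Meta actually store the data.

```perl
use strict;
use warnings;

# Illustrative layout (an assumption, not BioPerl's):
#   $meta->{alignment}{TAG}   per-column markup from '#=GC TAG data'
#   $meta->{seq}{ID}{TAG}     per-residue markup from '#=GR ID TAG data'
#   $meta->{seq}{ID}{_seq}    the aligned sequence itself
# Strings are appended, so interleaved (multi-block) files also work.
sub parse_stockholm_meta {
    my @lines = @_;
    my %meta;
    for my $line (@lines) {
        next if $line =~ /^\s*$/ || $line eq '//';
        if ($line =~ /^#=GC\s+(\S+)\s+(\S+)/) {
            $meta{alignment}{$1} .= $2;
        }
        elsif ($line =~ /^#=GR\s+(\S+)\s+(\S+)\s+(\S+)/) {
            $meta{seq}{$1}{$2} .= $3;
        }
        elsif ($line =~ /^#/) {
            next;    # other markup (#=GF, #=GS, the header) is ignored here
        }
        elsif ($line =~ /^(\S+)\s+(\S+)\s*$/) {
            $meta{seq}{$1}{_seq} .= $2;
        }
    }
    return \%meta;
}
```

To use it on a file, read and chomp the lines first (`chomp(my @lines = <$fh>)`) and pass them in; the '#=GC' tags then live under the alignment and the '#=GR' tags under the sequence they describe.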

Similarly, NEXUS files which contained any position-based values  
could hold a meta string/array object in a similar tag.

The basic scheme is:

                     |--String
                     |
Annotation::Meta----|--Array
                     |
                     |--HorriblyComplexDataStruct
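One way to read this scheme in Perl is an abstract base that knows its tag name and offers a per-position accessor, with subclasses choosing the storage. All package and method names here (`My::Meta`, `value_at`, `size`) are hypothetical illustrations, not existing Bio::Annotation classes.

```perl
use strict;
use warnings;

# Hypothetical abstract base: a tagged, position-based description of an
# object (1-based positions, like alignment columns).
package My::Meta;
sub new {
    my ($class, %args) = @_;
    return bless { tagname => $args{-tagname}, data => $args{-data} }, $class;
}
sub tagname  { return $_[0]{tagname} }
sub value_at { die ref($_[0]) . " does not implement value_at\n" }
sub size     { die ref($_[0]) . " does not implement size\n" }

# String storage: one character per position (e.g. '#=GC SS_cons').
package My::Meta::String;
our @ISA = ('My::Meta');
sub value_at { my ($self, $pos) = @_; return substr($self->{data}, $pos - 1, 1) }
sub size     { return length $_[0]{data} }

# Array storage: one arbitrary scalar or reference per position, which also
# covers the "horribly complex" case by storing references.
package My::Meta::Array;
our @ISA = ('My::Meta');
sub value_at { my ($self, $pos) = @_; return $self->{data}[$pos - 1] }
sub size     { return scalar @{ $_[0]{data} } }

package main;
```

A collection class would then only need to map tag names to such objects, much as AnnotationCollection does for annotations.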

Then I started thinking about where this could be applied, and  
whether a true Meta object needs to be constrained only to describing  
position-based data.  This somewhat relates to this bug:

http://bugzilla.open-bio.org/show_bug.cgi?id=1825

which seems to need a simple but unconstrained hash-of-arrays-based  
meta object.

Then my head appropriately exploded...

Hope everything is going well at the hackathon!  Looks like some  
interesting stuff coming out of it.

chris


From cjfields at uiuc.edu  Fri Dec 15 17:49:45 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 15 Dec 2006 11:49:45 -0600
Subject: [Bioperl-l] Bio::SeqFeature::Annotated and mandatory
	type	checking
In-Reply-To: <1166158897.2569.335.camel@localhost.localdomain>
References: <637A2459-4115-466F-BD8D-036D5E9114F8@cshl.edu>
	<4581CCEB.20206@sendu.me.uk>
	<1166158897.2569.335.camel@localhost.localdomain>
Message-ID: <9B984087-C843-440A-B3E1-F7DEC65160E7@uiuc.edu>

On Dec 14, 2006, at 11:01 PM, Scott Cain wrote:

> As much as I would like to take credit for this :-)  Allen Day  
> wrote the
> original code, and then Chris Fields tried to fix it so that it  
> actually
> worked :-)  I think it would be a good idea to have a validate_terms
> option like Bio::FeatureIO::gff.
>
> Scott

I did ?!?  I committed a bug fix a while back:

Revision 1.34 / (view) - annotate - [select for diffs] ,
Sun Jul 23 18:00:50 2006 UTC (4 months, 3 weeks ago) by cjfields
Branch: MAIN
CVS Tags: branch-experimental
Branch point for: branch-1-5-2
Changes since 1.33: +155 -33 lines
Diff to previous 1.33

Bug 2026; Robert's enhancements

To tell the truth I don't know if this is where the mandatory checks  
were added in; I'm not too familiar with SeqFeature::Annotated yet.

I agree with Scott (and Matthew) that SOFA checks should be  
optional.  Matthew, can you write up a patch and maybe some tests?

chris


From stewarta at nmrc.navy.mil  Thu Dec 14 23:30:11 2006
From: stewarta at nmrc.navy.mil (Andrew Stewart)
Date: Thu, 14 Dec 2006 18:30:11 -0500
Subject: [Bioperl-l] Bio::SearchIO::blast::next_result exception thrown
Message-ID: <968A2A44-82C5-4505-8F50-ABC4D57171F3@nmrc.navy.mil>

I'm getting the following exception...

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: no data for midline     Posted date:  Dec 14, 2006  2:52 PM
STACK: Error::throw
STACK: Bio::Root::Root::throw /sw/lib/perl5/5.8.6/Bio/Root/Root.pm:328
STACK: Bio::SearchIO::blast::next_result /sw/lib/perl5/5.8.6/Bio/SearchIO/blast.pm:1172
STACK: main::process_reports ./new_blast_script.pl:254
STACK: ./new_blast_script.pl:132
-----------------------------------------------------------


next_result is a pretty dense chunk of code to decipher.  I was  
wondering if anyone more familiar with that code might know what the  
"no data for midline $_" exception is referring to?

For context:

    1161                if( /^((Query|Sbjct):?\s+(\-?\d+)\s*)(\S+)\s+(\-?\d+)/ ) {
    1162                    my ($full,$type,$start,$str,$end) = ($1,$2,$3,$4,$5);
    1163                    if( $str eq '-' ) {
    1164                        $i = 3 if $type eq 'Sbjct';
    1165                    } else {
    1166                        $data{$type} = $str;
    1167                    }
    1168                    $len = length($full);
    1169                    $self->{"\_$type"}->{'begin'} = $start unless $self->{"_$type"}->{'begin'};
    1170                    $self->{"\_$type"}->{'end'} = $end;
    1171                } else {
    1172                    $self->throw("no data for midline $_")
    1173                        unless (defined $_ && defined $len);
    1174                    $data{'Mid'} = substr($_,$len);
    1175                }


--
Andrew Stewart
Research Assistant, Genomics Team
Navy Medical Research Center (NMRC)
Biological Defense Research Directorate (BDRD)
BDRD Annex
12300 Washington Avenue, 2nd Floor
Rockville, MD 20852

email: stewarta at nmrc.navy.mil
phone: 301-231-6700 Ext 270


From jason at bioperl.org  Fri Dec 15 18:56:13 2006
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 15 Dec 2006 13:56:13 -0500
Subject: [Bioperl-l] Bio::SearchIO::blast::next_result exception thrown
In-Reply-To: <968A2A44-82C5-4505-8F50-ABC4D57171F3@nmrc.navy.mil>
References: <968A2A44-82C5-4505-8F50-ABC4D57171F3@nmrc.navy.mil>
Message-ID: <B07BB616-28A3-435A-9C43-38CEF0F01E53@bioperl.org>

It means it is expecting an alignment block of data and there is none
(or there is none in the context it is expecting it) - so something  
is wrong with the report as it gets tripped up.
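The branch around blast.pm line 1172 can be imitated in a few lines of standalone Perl. This is a stripped-down illustration, not the real parser, and `parse_hsp_block` is a hypothetical name. `$len` is only set once a `Query`/`Sbjct` line has been seen, so any other line arriving first leaves no column offset for the midline and triggers the exception. Because the message interpolates `$_`, the text after "no data for midline" in Andrew's trace ("Posted date: Dec 14, 2006 2:52 PM") is the line that tripped the parser. It looks like a database footer line rather than an alignment line, which suggests the parser did not recognise the end of the last alignment block in that report.

```perl
use strict;
use warnings;

# Simplified imitation of the loop in Bio::SearchIO::blast::next_result:
# each alignment block is a Query line, a midline and an Sbjct line.
sub parse_hsp_block {
    my @lines = @_;
    my (%data, $len);
    for (@lines) {
        if (/^((Query|Sbjct):?\s+(-?\d+)\s*)(\S+)\s+(-?\d+)/) {
            my ($full, $type, $str) = ($1, $2, $4);
            $data{$type} .= $str;
            $len = length $full;    # the midline text starts at this column
        }
        else {
            # A non-Query/Sbjct line seen before any Query line: there is no
            # offset to cut the midline at, so give up (the real parser throws).
            die "no data for midline $_\n" unless defined $len;
            $data{Mid} .= substr($_, $len);
        }
    }
    return \%data;
}
```

Feeding it a well-formed block returns the three rows, while feeding it the footer line on its own reproduces the same message seen in the trace.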

I'm not sure reading the code is going to help you - what someone  
will have to do is figure out what is different about this report  
than reports that do work for the parser.
You'll do better if you just provide an example report that is  
failing as a bug report.

Providing the version of BLAST you are using and version of bioperl  
will help.  I seem to remember NCBI changing the BLAST text format so  
that will break the parser if it is a significant change.

As has been mentioned in the past, this playing cat and mouse with
format changes means things will periodically break. If you need
something rock-solid that is always going to work, I guess XML is
the better route to go.

-jason
On Dec 14, 2006, at 6:30 PM, Andrew Stewart wrote:

> I'm getting the following exception...
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: no data for midline     Posted date:  Dec 14, 2006  2:52 PM
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /sw/lib/perl5/5.8.6/Bio/Root/Root.pm:328
> STACK: Bio::SearchIO::blast::next_result /sw/lib/perl5/5.8.6/Bio/SearchIO/blast.pm:1172
> STACK: main::process_reports ./new_blast_script.pl:254
> STACK: ./new_blast_script.pl:132
> -----------------------------------------------------------
>
>
> next_result is a pretty dense chunk of code to decipher.  I was
> wondering if anyone more familiar with that code might know what the
> "no data for midline $_" exception is referring to?
>
>
> --
> Andrew Stewart
> Research Assistant, Genomics Team
> Navy Medical Research Center (NMRC)
> Biological Defense Research Directorate (BDRD)
> BDRD Annex
> 12300 Washington Avenue, 2nd Floor
> Rockville, MD 20852
>
> email: stewarta at nmrc.navy.mil
> phone: 301-231-6700 Ext 270
>
>


From cjfields at uiuc.edu  Fri Dec 15 19:21:32 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 15 Dec 2006 13:21:32 -0600
Subject: [Bioperl-l] Bio::SearchIO::blast::next_result exception thrown
In-Reply-To: <B07BB616-28A3-435A-9C43-38CEF0F01E53@bioperl.org>
References: <968A2A44-82C5-4505-8F50-ABC4D57171F3@nmrc.navy.mil>
	<B07BB616-28A3-435A-9C43-38CEF0F01E53@bioperl.org>
Message-ID: <6A0D17FA-CB98-4937-998E-11B87FB9CBBD@uiuc.edu>


On Dec 15, 2006, at 12:56 PM, Jason Stajich wrote:

> It means it is expecting an alignment block of data and there is none
> (or there is none in the context it is expecting it) - so something
> is wrong with the report as it gets tripped up.
>
> I'm not sure reading the code is going to help you - what someone
> will have to do is figure out what is different about this report
> than reports that do work for the parser.
> You'll do better if you just provide an example report that is
> failing as a bug report.
>
> Providing the version of BLAST you are using and version of bioperl
> will help.  I seem to remember NCBI changing the BLAST text format so
> that will break the parser if it is a significant change.
>
> As has been mentioned in the past, this playing cat and mouse with
> format changes means things will periodically break. If you need
> something rock-solid that is always going to work, I guess XML is
> the better route to go.
>
> -jason

I agree that XML is the only reliable way to go, though I have been
reading on the BioPython group about some issues with newer (2.2.13
or greater) BLAST XML output for reports with multiple BLAST
queries.  Don't know if this affects Bioperl or not.

As for the 'midline' error, there was a similar error a while back  
(fixed for the 1.5.2 release) that had to do with extra lines in the  
alignment section in some BLAST reports.  Unless we have a demo BLAST  
report and sample code we can't do much about it (we need to  
reproduce the error in order to fix it), so the best thing to do is
file a bug report.

chris

> On Dec 14, 2006, at 6:30 PM, Andrew Stewart wrote:
>
>> I'm getting the following exception...
>>
>> ------------- EXCEPTION: Bio::Root::Exception -------------
>> MSG: no data for midline     Posted date:  Dec 14, 2006  2:52 PM
>> STACK: Error::throw
>> STACK: Bio::Root::Root::throw /sw/lib/perl5/5.8.6/Bio/Root/Root.pm:328
>> STACK: Bio::SearchIO::blast::next_result /sw/lib/perl5/5.8.6/Bio/SearchIO/blast.pm:1172
>> STACK: main::process_reports ./new_blast_script.pl:254
>> STACK: ./new_blast_script.pl:132
>> -----------------------------------------------------------
>>
>>
>> next_result is a pretty dense chunk of code to decipher.  I was
>> wondering if anyone more familiar with that code might know what the
>> "no data for midline $_" exception is referring to?
>>
>>
>> --
>> Andrew Stewart
>> Research Assistant, Genomics Team
>> Navy Medical Research Center (NMRC)
>> Biological Defense Research Directorate (BDRD)
>> BDRD Annex
>> 12300 Washington Avenue, 2nd Floor
>> Rockville, MD 20852
>>
>> email: stewarta at nmrc.navy.mil
>> phone: 301-231-6700 Ext 270
>>
>>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From vaughn at cshl.edu  Fri Dec 15 18:05:47 2006
From: vaughn at cshl.edu (Matthew Vaughn)
Date: Fri, 15 Dec 2006 13:05:47 -0500
Subject: [Bioperl-l] Bio::SeqFeature::Annotated and mandatory type
	checking
In-Reply-To: <9B984087-C843-440A-B3E1-F7DEC65160E7@uiuc.edu>
References: <637A2459-4115-466F-BD8D-036D5E9114F8@cshl.edu>
	<4581CCEB.20206@sendu.me.uk>
	<1166158897.2569.335.camel@localhost.localdomain>
	<9B984087-C843-440A-B3E1-F7DEC65160E7@uiuc.edu>
Message-ID: <ed625e0e0612151005o2641f019ndb5cf0ac6582e2d6@mail.gmail.com>

Yes, I will. I am working on it today. It's a little more complicated
to fix this than I expected because SeqFeature::Annotated->type()
returns a Bio::AnnotationI rather than a simple scalar like it used
to.
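Scott's suggestion (a validate_terms-style switch, as he describes for Bio::FeatureIO::gff) and Matthew's complication (type() now returning an object rather than a plain string) can be sketched together. Everything below is hypothetical: `My::Feature`, the `-validate_terms` flag, and the four-term `%SOFA` stand-in are illustrations rather than Bio::SeqFeature::Annotated's API, and the term object is assumed to have a `name()` method. A real check would consult the SOFA ontology.

```perl
use strict;
use warnings;

package My::Feature;    # hypothetical stand-in, not Bio::SeqFeature::Annotated
use Scalar::Util qw(blessed);

# Tiny stand-in for the SOFA vocabulary (a real check would use the ontology).
my %SOFA = map { $_ => 1 } qw(gene mRNA exon CDS);

sub new {
    my ($class, %args) = @_;
    my $self = bless { validate_terms => $args{-validate_terms} ? 1 : 0 }, $class;
    $self->type($args{-type}) if defined $args{-type};
    return $self;
}

# Accepts a plain string or a term object with a name() method; the term is
# only checked against %SOFA when validate_terms is switched on.
sub type {
    my ($self, $term) = @_;
    if (defined $term) {
        my $name = blessed($term) && $term->can('name') ? $term->name : "$term";
        die "'$name' is not a known SOFA term\n"
            if $self->{validate_terms} && !$SOFA{$name};
        $self->{type} = $term;
    }
    return $self->{type};
}

package main;
```

Keeping the check off by default means unvalidated types pass through unchanged, which is the optional behaviour Scott, Chris and Matthew agree on.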

On 12/15/06, Chris Fields <cjfields at uiuc.edu> wrote:
> On Dec 14, 2006, at 11:01 PM, Scott Cain wrote:
>
> > As much as I would like to take credit for this :-)  Allen Day
> > wrote the
> > original code, and then Chris Fields tried to fix it so that it
> > actually
> > worked :-)  I think it would be a good idea to have a validate_terms
> > option like Bio::FeatureIO::gff.
> >
> > Scott
>
> I did ?!?  I committed a bug fix a while back:
>
> Revision 1.34 / (view) - annotate - [select for diffs] ,
> Sun Jul 23 18:00:50 2006 UTC (4 months, 3 weeks ago) by cjfields
> Branch: MAIN
> CVS Tags: branch-experimental
> Branch point for: branch-1-5-2
> Changes since 1.33: +155 -33 lines
> Diff to previous 1.33
>
> Bug 2026; Robert's enhancements
>
> To tell the truth I don't know if this is where the mandatory checks
> were added in; I'm not too familiar with SeqFeature::Annotated yet.
>
> I agree with Scott (and Matthew) that SOFA checks should be
> optional.  Matthew, can you write up a patch and maybe some tests?
>
> chris
>
>
>
>


From valiente at lsi.upc.edu  Sat Dec 16 00:45:27 2006
From: valiente at lsi.upc.edu (Gabriel Valiente)
Date: Sat, 16 Dec 2006 01:45:27 +0100
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110
	species
In-Reply-To: <4577EFD3.7090904@sendu.me.uk>
References: <F5C5C9A8-D444-4994-9769-AC5DE68F4A39@lsi.upc.edu>
	<68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu>
	<4577E4A2.5090303@sendu.me.uk>
	<B290BEF7-81D6-4C0A-9EDA-348B8A75EEC8@lsi.upc.edu>
	<4577EAAF.7030509@sendu.me.uk>
	<0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu>
	<4577EFD3.7090904@sendu.me.uk>
Message-ID: <250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu>

> I don't think that can be true. Your error message contains 'Must supply
> a Bio::Taxon'. Bio::Taxon only exists in 1.5.2 (or cvs live).
>
> If you uninstall the fink installation and install 1.5.2 using cpan  
> (with root privileges by going sudo cpan) that should at least get  
> rid of the error messages...
>
>
>> The tree is not correct (I've parsed it from R to have a double
>> check) but don't know yet what the problem is with it.
>
> ... But if the tree is wrong anyway... Let me know what you find out.

I've uninstalled the fink installation and used the cvs instead, and  
the error message is gone. However, on a larger set of 190 species,  
which are all present in the NCBI taxonomy, the resulting tree has  
only 178 taxa. I suspect something must be wrong with the
merge_lineage method in the major rewrite of the taxonomy2tree
script. Can someone please check this? I'm attaching the script call
with the 190 species. Thanks,

Gabriel

-------------- next part --------------
A non-text attachment was scrubbed...
Name: fetch-bork.sh
Type: application/octet-stream
Size: 7378 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061216/5e392593/attachment-0004.obj>

From lincoln.stein at gmail.com  Fri Dec 15 16:02:27 2006
From: lincoln.stein at gmail.com (Lincoln Stein)
Date: Fri, 15 Dec 2006 11:02:27 -0500
Subject: [Bioperl-l] [Gmod-gbrowse] xyplot data alignment problem?
In-Reply-To: <6dce9a0b0612141356u63afe2dak7e1d8dad93408312@mail.gmail.com>
References: <EA0BFA4F-8182-4C40-92DA-218CE3F48D8B@genomics.princeton.edu>
	<6dce9a0b0612141356u63afe2dak7e1d8dad93408312@mail.gmail.com>
Message-ID: <6dce9a0b0612150802x354a02a8ib17fbd882379c63c@mail.gmail.com>

This is very embarrassing for me, particularly since I spent a lot of time
validating that Bio::Graphics was working properly before the 1.5.2 release
went out. How long before there is a 1.5.3 release? How about a 1.5.2.1 release?

Lincoln

On 12/14/06, Lincoln Stein <lincoln.stein at gmail.com> wrote:
>
> Hi All,
>
> I'm afraid that the xyplot glyph that is in the recent bioperl release has
> an error that causes the content to be printed to the right of the correct
> position. Unfortunately this wasn't caught before the release because the
> glyph was only tested on very large (whole genome) features.
>
> You will need to do a CVS update to get a fixed version from bioperl-live.
> A future bugfix release of gbrowse will patch this glyph for you
> automatically.
>
> Lincoln
>
> On 12/12/06, Kara Dolinski <kara at genomics.princeton.edu> wrote:
> >
> > Hi,
> > I'm having a problem getting features and an xyplot properly aligned in
> > Gbrowse.  For example, see this page:
> >
> > http://tinyurl.com/ylbq3q
> >
> > The feature in the "CENPK SNPs" track should actually be around the peak
> > of the graph in the "CENPK prediction signal" xyplot, i.e. the SNP
> > feature is at position 79, and the xyplot axes and data should span from
> > 61 - 95.  However, as you can see, the data in the xyplot are oddly
> > separated from the axes (which seem to be in the correct place), with the
> > data shifted over to about position 120-155.
> > This occurs elsewhere, not just at the ends of the chromosomes.
> >
> > When I zoom to ~80 bp, all is well, see:
> >
> > http://tinyurl.com/yzav8k
> >
> > The relevant snippets from the GFF and the config files are below.
> >
> > Thanks!
> > Kara
> >
> > GFF:
> >
> > chrI SNPScanner CENPK_GRAPH 61 95 41.9883 . . ID=CENPK_all_peaks;Name=CENPK_peak0;PEAK=peak0;Note=score is 41.9883
> > chrI SNPScanner CENPK_CALL 79 79 41.9883 . . ID=CENPK_all_peaks;Name=CENPK_peak0;PEAK=peak0;Note=score is 41.9883
> > chrI SNPScanner CENPK_SCORE 61 61 2.24506 . . ID=CENPK_all_peaks;Name=chrI61;PEAK=peak0;Note=score is 2.24506
> > chrI SNPScanner CENPK_SCORE 62 62 3.26837 . . ID=CENPK_all_peaks;Name=chrI62;PEAK=peak0;Note=score is 3.26837
> > chrI SNPScanner CENPK_SCORE 63 63 1.39938 . . ID=CENPK_all_peaks;Name=chrI63;PEAK=peak0;Note=score is 1.39938
> > chrI SNPScanner CENPK_SCORE 64 64 1.4039 . . ID=CENPK_all_peaks;Name=chrI64;PEAK=peak0;Note=score is 1.4039
> > chrI SNPScanner CENPK_SCORE 65 65 9.16134 . . ID=CENPK_all_peaks;Name=chrI65;PEAK=peak0;Note=score is 9.16134
> > chrI SNPScanner CENPK_SCORE 66 66 10.1413 . . ID=CENPK_all_peaks;Name=chrI66;PEAK=peak0;Note=score is 10.1413
> > chrI SNPScanner CENPK_SCORE 67 67 12.9256 . . ID=CENPK_all_peaks;Name=chrI67;PEAK=peak0;Note=score is 12.9256
> > chrI SNPScanner CENPK_SCORE 68 68 13.195 . . ID=CENPK_all_peaks;Name=chrI68;PEAK=peak0;Note=score is 13.195
> > chrI SNPScanner CENPK_SCORE 69 69 22.7127 . . ID=CENPK_all_peaks;Name=chrI69;PEAK=peak0;Note=score is 22.7127
> > chrI SNPScanner CENPK_SCORE 70 70 23.8289 . . ID=CENPK_all_peaks;Name=chrI70;PEAK=peak0;Note=score is 23.8289
> > chrI SNPScanner CENPK_SCORE 71 71 21.9123 . . ID=CENPK_all_peaks;Name=chrI71;PEAK=peak0;Note=score is 21.9123
> > chrI SNPScanner CENPK_SCORE 72 72 28.3344 . . ID=CENPK_all_peaks;Name=chrI72;PEAK=peak0;Note=score is 28.3344
> > chrI SNPScanner CENPK_SCORE 73 73 35.0436 . . ID=CENPK_all_peaks;Name=chrI73;PEAK=peak0;Note=score is 35.0436
> > chrI SNPScanner CENPK_SCORE 74 74 37.361 . . ID=CENPK_all_peaks;Name=chrI74;PEAK=peak0;Note=score is 37.361
> > chrI SNPScanner CENPK_SCORE 75 75 39.5408 . . ID=CENPK_all_peaks;Name=chrI75;PEAK=peak0;Note=score is 39.5408
> > chrI SNPScanner CENPK_SCORE 76 76 28.2008 . . ID=CENPK_all_peaks;Name=chrI76;PEAK=peak0;Note=score is 28.2008
> > chrI SNPScanner CENPK_SCORE 77 77 32.6254 . . ID=CENPK_all_peaks;Name=chrI77;PEAK=peak0;Note=score is 32.6254
> > chrI SNPScanner CENPK_SCORE 78 78 36.0832 . . ID=CENPK_all_peaks;Name=chrI78;PEAK=peak0;Note=score is 36.0832
> > chrI SNPScanner CENPK_SCORE 79 79 41.9883 . . ID=CENPK_all_peaks;Name=chrI79;PEAK=peak0;Note=score is 41.9883
> > chrI SNPScanner CENPK_SCORE 80 80 32.1205 . . ID=CENPK_all_peaks;Name=chrI80;PEAK=peak0;Note=score is 32.1205
> > chrI SNPScanner CENPK_SCORE 81 81 41.3048 . . ID=CENPK_all_peaks;Name=chrI81;PEAK=peak0;Note=score is 41.3048
> > chrI SNPScanner CENPK_SCORE 82 82 30.7975 . . ID=CENPK_all_peaks;Name=chrI82;PEAK=peak0;Note=score is 30.7975
> > chrI SNPScanner CENPK_SCORE 83 83 29.4282 . . ID=CENPK_all_peaks;Name=chrI83;PEAK=peak0;Note=score is 29.4282
> > chrI SNPScanner CENPK_SCORE 84 84 35.3586 . . ID=CENPK_all_peaks;Name=chrI84;PEAK=peak0;Note=score is 35.3586
> > chrI SNPScanner CENPK_SCORE 85 85 34.1426 . . ID=CENPK_all_peaks;Name=chrI85;PEAK=peak0;Note=score is 34.1426
> > chrI SNPScanner CENPK_SCORE 86 86 30.2966 . . ID=CENPK_all_peaks;Name=chrI86;PEAK=peak0;Note=score is 30.2966
> > chrI SNPScanner CENPK_SCORE 87 87 17.8402 . . ID=CENPK_all_peaks;Name=chrI87;PEAK=peak0;Note=score is 17.8402
> > chrI SNPScanner CENPK_SCORE 88 88 15.2637 . . ID=CENPK_all_peaks;Name=chrI88;PEAK=peak0;Note=score is 15.2637
> > chrI SNPScanner CENPK_SCORE 89 89 12.657 . . ID=CENPK_all_peaks;Name=chrI89;PEAK=peak0;Note=score is 12.657
> > chrI SNPScanner CENPK_SCORE 90 90 10.2033 . . ID=CENPK_all_peaks;Name=chrI90;PEAK=peak0;Note=score is 10.2033
> > chrI SNPScanner CENPK_SCORE 91 91 9.40143 . . ID=CENPK_all_peaks;Name=chrI91;PEAK=peak0;Note=score is 9.40143
> > chrI SNPScanner CENPK_SCORE 92 92 6.56273 . . ID=CENPK_all_peaks;Name=chrI92;PEAK=peak0;Note=score is 6.56273
> > chrI SNPScanner CENPK_SCORE 93 93 3.66211 . . ID=CENPK_all_peaks;Name=chrI93;PEAK=peak0;Note=score is 3.66211
> > chrI SNPScanner CENPK_SCORE 94 94 0.394194 . . ID=CENPK_all_peaks;Name=chrI94;PEAK=peak0;Note=score is 0.394194
> >
> > CONFIG:
> >
> >
> > GRAPH_CENPK{CENPK_SCORE/CENPK_GRAPH}
> >
> > [CENPK_all_scores_graph]
> > feature = GRAPH_CENPK:SNPScanner
> > glyph = xyplot
> > graph_type = boxes
> > fgcolor = purple
> > bgcolor = purple
> > height = 100
> > min_score = 0
> > max_score = 110
> > label = 0
> > key = CENPK prediction signal
> > link =
> > category = SNPs: signal graphs
> >
> >
> >
> >
> > _______________________________________________
> > Gmod-gbrowse mailing list
> > Gmod-gbrowse at lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse
> >
> >
> >
>
>
> --
> Lincoln D. Stein
> Cold Spring Harbor Laboratory
> 1 Bungtown Road
> Cold Spring Harbor, NY 11724
> (516) 367-8380 (voice)
> (516) 367-8389 (fax)
> FOR URGENT MESSAGES & SCHEDULING,
> PLEASE CONTACT MY ASSISTANT,
> SANDRA MICHELSEN, AT michelse at cshl.edu
>


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu
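Lincoln traced the offset to the xyplot glyph rather than to the data. A quick way to confirm that Kara's GFF is internally consistent is to check that the highest-scoring CENPK_SCORE position coincides with the CENPK_CALL position. The plain-Perl sketch below uses a hypothetical helper, `peak_position`. It splits on any whitespace with a limit of 9 so that the free-text Note value (which contains spaces) stays in the last column; for real tab-delimited GFF3, splitting on tabs would be more robust.

```perl
use strict;
use warnings;

# Return the start coordinate of the highest-scoring feature whose type
# (column 3) is $type, or undef if there is none.
sub peak_position {
    my ($type, @lines) = @_;
    my ($best_pos, $best_score);
    for my $line (@lines) {
        # seqid source type start end score strand phase attributes
        my @col = split /\s+/, $line, 9;
        next unless @col == 9 && $col[2] eq $type && $col[5] ne '.';
        my ($start, $score) = @col[3, 5];
        if (!defined $best_score || $score > $best_score) {
            ($best_pos, $best_score) = ($start, $score);
        }
    }
    return $best_pos;
}
```

Run over all the CENPK_SCORE lines in Kara's snippet, the peak is at 79 (score 41.9883), the same position as the CENPK_CALL feature. That is consistent with the data being fine and the glyph drawing them in the wrong place.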


From cjfields at uiuc.edu  Sat Dec 16 06:10:07 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 16 Dec 2006 00:10:07 -0600
Subject: [Bioperl-l] [Gmod-gbrowse] xyplot data alignment problem?
In-Reply-To: <6dce9a0b0612150802x354a02a8ib17fbd882379c63c@mail.gmail.com>
References: <EA0BFA4F-8182-4C40-92DA-218CE3F48D8B@genomics.princeton.edu>
	<6dce9a0b0612141356u63afe2dak7e1d8dad93408312@mail.gmail.com>
	<6dce9a0b0612150802x354a02a8ib17fbd882379c63c@mail.gmail.com>
Message-ID: <70A5E333-8CF5-49D3-84AC-7A6A02791B5C@uiuc.edu>

We could feasibly have regular point releases of the 1.5 dev. series  
for bug fixes; I guess it just depends on how often these should come  
out and what critical tests must pass for a release to go forward.   
Sendu's already done a ton of work towards getting BioPerl switched  
over to Module::Build and Test::More, and fixing bugs.  As Hilmar has  
pointed out in the past, this is a developer's series, so not every  
test needs to pass before a release goes out.

When would you like this to go out?

chris

On Dec 15, 2006, at 10:02 AM, Lincoln Stein wrote:

> This is very embarrassing for me, particularly since I spent a lot
> of time validating that Bio::Graphics was working properly before
> the 1.5.2 release went out. How long before there is a 1.5.3
> release? How about a 1.5.2.1 release?
>
> Lincoln
>
> On 12/14/06, Lincoln Stein <lincoln.stein at gmail.com> wrote:
>>
>> Hi All,
>>
>> I'm afraid that the xyplot glyph that is in the recent bioperl  
>> release has
>> an error that causes the content to be printed to the right of the  
>> correct
>> position. Unfortunately this wasn't caught before the release  
>> because the
>> glyph was only tested on very large (whole genome) features.
>>
>> You will need to do a CVS update to get a fixed version from  
>> bioperl-live.
>> A future bugfix release of gbrowse will patch this glyph for you
>> automatically.
>>
>> Lincoln
>>
>> On 12/12/06, Kara Dolinski <kara at genomics.princeton.edu> wrote:
>>>
>>> Hi,
>>> I'm having a problem getting features and an xyplot properly aligned in
>>> Gbrowse. For example, see this page:
>>>
>>> http://tinyurl.com/ylbq3q
>>>
>>> The feature in the "CENPK SNPs" track should actually be around the peak
>>> of the graph in the "CENPK prediction signal" xyplot, i.e. the SNP
>>> feature is at position 79, and the xyplot axes and data should span from
>>> 61 to 95. However, as you can see, the data in the xyplot are oddly
>>> separated from the axes (which seem to be in the correct place), with the
>>> data shifted over to about position 120-155.
>>> This occurs elsewhere, not just at the ends of the chromosomes.
>>>
>>> When I zoom to ~80 bp, all is well, see:
>>>
>>> http://tinyurl.com/yzav8k
>>>
>>> The relevant snippets from the GFF and the config files are below.
>>>
>>> Thanks!
>>> Kara
>>>
>>> GFF:
>>>
>>> chrI	SNPScanner	CENPK_GRAPH	61	95	41.9883	.	.	ID=CENPK_all_peaks;Name=CENPK_peak0;PEAK=peak0;Note=score is 41.9883
>>> chrI	SNPScanner	CENPK_CALL	79	79	41.9883	.	.	ID=CENPK_all_peaks;Name=CENPK_peak0;PEAK=peak0;Note=score is 41.9883
>>> chrI	SNPScanner	CENPK_SCORE	61	61	2.24506	.	.	ID=CENPK_all_peaks;Name=chrI61;PEAK=peak0;Note=score is 2.24506
>>> chrI	SNPScanner	CENPK_SCORE	62	62	3.26837	.	.	ID=CENPK_all_peaks;Name=chrI62;PEAK=peak0;Note=score is 3.26837
>>> chrI	SNPScanner	CENPK_SCORE	63	63	1.39938	.	.	ID=CENPK_all_peaks;Name=chrI63;PEAK=peak0;Note=score is 1.39938
>>> chrI	SNPScanner	CENPK_SCORE	64	64	1.4039	.	.	ID=CENPK_all_peaks;Name=chrI64;PEAK=peak0;Note=score is 1.4039
>>> chrI	SNPScanner	CENPK_SCORE	65	65	9.16134	.	.	ID=CENPK_all_peaks;Name=chrI65;PEAK=peak0;Note=score is 9.16134
>>> chrI	SNPScanner	CENPK_SCORE	66	66	10.1413	.	.	ID=CENPK_all_peaks;Name=chrI66;PEAK=peak0;Note=score is 10.1413
>>> chrI	SNPScanner	CENPK_SCORE	67	67	12.9256	.	.	ID=CENPK_all_peaks;Name=chrI67;PEAK=peak0;Note=score is 12.9256
>>> chrI	SNPScanner	CENPK_SCORE	68	68	13.195	.	.	ID=CENPK_all_peaks;Name=chrI68;PEAK=peak0;Note=score is 13.195
>>> chrI	SNPScanner	CENPK_SCORE	69	69	22.7127	.	.	ID=CENPK_all_peaks;Name=chrI69;PEAK=peak0;Note=score is 22.7127
>>> chrI	SNPScanner	CENPK_SCORE	70	70	23.8289	.	.	ID=CENPK_all_peaks;Name=chrI70;PEAK=peak0;Note=score is 23.8289
>>> chrI	SNPScanner	CENPK_SCORE	71	71	21.9123	.	.	ID=CENPK_all_peaks;Name=chrI71;PEAK=peak0;Note=score is 21.9123
>>> chrI	SNPScanner	CENPK_SCORE	72	72	28.3344	.	.	ID=CENPK_all_peaks;Name=chrI72;PEAK=peak0;Note=score is 28.3344
>>> chrI	SNPScanner	CENPK_SCORE	73	73	35.0436	.	.	ID=CENPK_all_peaks;Name=chrI73;PEAK=peak0;Note=score is 35.0436
>>> chrI	SNPScanner	CENPK_SCORE	74	74	37.361	.	.	ID=CENPK_all_peaks;Name=chrI74;PEAK=peak0;Note=score is 37.361
>>> chrI	SNPScanner	CENPK_SCORE	75	75	39.5408	.	.	ID=CENPK_all_peaks;Name=chrI75;PEAK=peak0;Note=score is 39.5408
>>> chrI	SNPScanner	CENPK_SCORE	76	76	28.2008	.	.	ID=CENPK_all_peaks;Name=chrI76;PEAK=peak0;Note=score is 28.2008
>>> chrI	SNPScanner	CENPK_SCORE	77	77	32.6254	.	.	ID=CENPK_all_peaks;Name=chrI77;PEAK=peak0;Note=score is 32.6254
>>> chrI	SNPScanner	CENPK_SCORE	78	78	36.0832	.	.	ID=CENPK_all_peaks;Name=chrI78;PEAK=peak0;Note=score is 36.0832
>>> chrI	SNPScanner	CENPK_SCORE	79	79	41.9883	.	.	ID=CENPK_all_peaks;Name=chrI79;PEAK=peak0;Note=score is 41.9883
>>> chrI	SNPScanner	CENPK_SCORE	80	80	32.1205	.	.	ID=CENPK_all_peaks;Name=chrI80;PEAK=peak0;Note=score is 32.1205
>>> chrI	SNPScanner	CENPK_SCORE	81	81	41.3048	.	.	ID=CENPK_all_peaks;Name=chrI81;PEAK=peak0;Note=score is 41.3048
>>> chrI	SNPScanner	CENPK_SCORE	82	82	30.7975	.	.	ID=CENPK_all_peaks;Name=chrI82;PEAK=peak0;Note=score is 30.7975
>>> chrI	SNPScanner	CENPK_SCORE	83	83	29.4282	.	.	ID=CENPK_all_peaks;Name=chrI83;PEAK=peak0;Note=score is 29.4282
>>> chrI	SNPScanner	CENPK_SCORE	84	84	35.3586	.	.	ID=CENPK_all_peaks;Name=chrI84;PEAK=peak0;Note=score is 35.3586
>>> chrI	SNPScanner	CENPK_SCORE	85	85	34.1426	.	.	ID=CENPK_all_peaks;Name=chrI85;PEAK=peak0;Note=score is 34.1426
>>> chrI	SNPScanner	CENPK_SCORE	86	86	30.2966	.	.	ID=CENPK_all_peaks;Name=chrI86;PEAK=peak0;Note=score is 30.2966
>>> chrI	SNPScanner	CENPK_SCORE	87	87	17.8402	.	.	ID=CENPK_all_peaks;Name=chrI87;PEAK=peak0;Note=score is 17.8402
>>> chrI	SNPScanner	CENPK_SCORE	88	88	15.2637	.	.	ID=CENPK_all_peaks;Name=chrI88;PEAK=peak0;Note=score is 15.2637
>>> chrI	SNPScanner	CENPK_SCORE	89	89	12.657	.	.	ID=CENPK_all_peaks;Name=chrI89;PEAK=peak0;Note=score is 12.657
>>> chrI	SNPScanner	CENPK_SCORE	90	90	10.2033	.	.	ID=CENPK_all_peaks;Name=chrI90;PEAK=peak0;Note=score is 10.2033
>>> chrI	SNPScanner	CENPK_SCORE	91	91	9.40143	.	.	ID=CENPK_all_peaks;Name=chrI91;PEAK=peak0;Note=score is 9.40143
>>> chrI	SNPScanner	CENPK_SCORE	92	92	6.56273	.	.	ID=CENPK_all_peaks;Name=chrI92;PEAK=peak0;Note=score is 6.56273
>>> chrI	SNPScanner	CENPK_SCORE	93	93	3.66211	.	.	ID=CENPK_all_peaks;Name=chrI93;PEAK=peak0;Note=score is 3.66211
>>> chrI	SNPScanner	CENPK_SCORE	94	94	0.394194	.	.	ID=CENPK_all_peaks;Name=chrI94;PEAK=peak0;Note=score is 0.394194
>>>
>>> CONFIG:
>>>
>>>
>>> GRAPH_CENPK{CENPK_SCORE/CENPK_GRAPH}
>>>
>>> [CENPK_all_scores_graph]
>>> feature = GRAPH_CENPK:SNPScanner
>>> glyph = xyplot
>>> graph_type = boxes
>>> fgcolor = purple
>>> bgcolor = purple
>>> height = 100
>>> min_score = 0
>>> max_score = 110
>>> label = 0
>>> key = CENPK prediction signal
>>> link =
>>> category = SNPs: signal graphs
>>>
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> Gmod-gbrowse mailing list
>>> Gmod-gbrowse at lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse
>>>
>>>
>>>
>>
>>
>> --
>> Lincoln D. Stein
>> Cold Spring Harbor Laboratory
>> 1 Bungtown Road
>> Cold Spring Harbor, NY 11724
>> (516) 367-8380 (voice)
>> (516) 367-8389 (fax)
>> FOR URGENT MESSAGES & SCHEDULING,
>> PLEASE CONTACT MY ASSISTANT,
>> SANDRA MICHELSEN, AT michelse at cshl.edu
>>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Sat Dec 16 06:28:47 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 16 Dec 2006 00:28:47 -0600
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on 110
	species
In-Reply-To: <250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu>
References: <F5C5C9A8-D444-4994-9769-AC5DE68F4A39@lsi.upc.edu>
	<68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu>
	<4577E4A2.5090303@sendu.me.uk>
	<B290BEF7-81D6-4C0A-9EDA-348B8A75EEC8@lsi.upc.edu>
	<4577EAAF.7030509@sendu.me.uk>
	<0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu>
	<4577EFD3.7090904@sendu.me.uk>
	<250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu>
Message-ID: <C91DCC7B-E368-475D-B83A-AC301A49624B@uiuc.edu>


On Dec 15, 2006, at 6:45 PM, Gabriel Valiente wrote:

>> I don't think that can be true. Your error message contains 'Must supply
>> a Bio::Taxon'. Bio::Taxon only exists in 1.5.2 (or cvs live).
>>
>> If you uninstall the fink installation and install 1.5.2 using cpan
>> (with root privileges by going sudo cpan) that should at least get rid
>> of the error messages...
>>
>>
>>> The tree is not correct (I've parsed it from R to have a double
>>> check) but don't know yet what the problem is with it.
>>
>> ... But if the tree is wrong anyway... Let me know what you find out.
>
> I've uninstalled the fink installation and used the cvs instead, and the
> error message is gone. However, on a larger set of 190 species, which
> are all present in the NCBI taxonomy, the resulting tree has only 178
> taxa. I suspect something must be wrong with the merge_lineage method
> in the major rewrite of the taxonomy2tree script. Can someone please
> check this? I'm attaching the 190 species call to the script. Thanks,
>
> Gabriel

I can confirm that.  It is definitely dropping them in merge_lineage();
if you add a call to get_leaf_nodes to check how many are present
after each merge_lineage() call, you can see it dropping nodes along
the trace.

In taxonomy2tree.pl:

my $ct = 0;
my ($treect, $mergect) = (0, 0);
for my $name (@species) {
   my $ncbi_id = $db->get_taxonid($name);
   if ($ncbi_id) {
     #print "Species: $name\n\tTaxID: $ncbi_id\n";
     #$ids{$ncbi_id}++;
     my $node = $db->get_taxon(-taxonid => $ncbi_id);

     if ($tree) {
       # merge this species' lineage into the existing tree
       $tree->merge_lineage($node);
     }
     else {
       # the first species found seeds the tree
       $tree = Bio::Tree::Tree->new(-node => $node);
     }
     # report the running leaf count after each species is added
     printf("%-3d: Nodes: %-4d\n", $ct, scalar($tree->get_leaf_nodes));
   }
   else {
     warn "no NCBI Taxonomy node for species ", $name, "\n";
   }
   $ct++;
}

chris


From bix at sendu.me.uk  Sat Dec 16 14:37:49 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Sat, 16 Dec 2006 14:37:49 +0000
Subject: [Bioperl-l] [Gmod-gbrowse] xyplot data alignment problem?
In-Reply-To: <6dce9a0b0612150802x354a02a8ib17fbd882379c63c@mail.gmail.com>
References: <EA0BFA4F-8182-4C40-92DA-218CE3F48D8B@genomics.princeton.edu>	<6dce9a0b0612141356u63afe2dak7e1d8dad93408312@mail.gmail.com>
	<6dce9a0b0612150802x354a02a8ib17fbd882379c63c@mail.gmail.com>
Message-ID: <458404BD.8030908@sendu.me.uk>

Lincoln Stein wrote:
> This is very embarrassing for me, particularly since I spent a lot of time
> validating that Bio::Graphics was working properly before the 1.5.2 release
> went out. How long before there is a 1.5.3 release? How about a 1.5.2.1 release?

I'm happy to try a point release for critical bug fixes. Why don't you 
commit the necessary fixes to branch-1-5-2 and let me know when you're 
happy, and I'll do 1.5.2.1.


Cheers,
Sendu.


From bix at sendu.me.uk  Sat Dec 16 14:47:57 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Sat, 16 Dec 2006 14:47:57 +0000
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on
	110	species
In-Reply-To: <250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu>
References: <F5C5C9A8-D444-4994-9769-AC5DE68F4A39@lsi.upc.edu>	<68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu>	<4577E4A2.5090303@sendu.me.uk>	<B290BEF7-81D6-4C0A-9EDA-348B8A75EEC8@lsi.upc.edu>	<4577EAAF.7030509@sendu.me.uk>	<0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu>	<4577EFD3.7090904@sendu.me.uk>
	<250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu>
Message-ID: <4584071D.3070005@sendu.me.uk>

Gabriel Valiente wrote:
>> I don't think that can be true. Your error message contains 'Must supply
>> a Bio::Taxon'. Bio::Taxon only exists in 1.5.2 (or cvs live).
>>
>> If you uninstall the fink installation and install 1.5.2 using cpan 
>> (with root privileges by going sudo cpan) that should at least get rid 
>> of the error messages...
>>
>>
>>> The tree is not correct (I've parsed it from R to have a double
>>> check) but don't know yet what the problem is with it.
>>
>> ... But if the tree is wrong anyway... Let me know what you find out.
> 
> I've uninstalled the fink installation and used the cvs instead, and the 
> error message is gone. However, on a larger set of 190 species, which 
> are all present in the NCBI taxonomy, the resulting tree has only 178 
> taxa. I suspect something must be wrong with the merge_lineage method 
> in the major rewrite of the taxonomy2tree script. Can someone please 
> check this? I'm attaching the 190 species call to the script. Thanks,

Ok, I'll look into it. You're also welcome to take the code from your 
original taxonomy2tree script and see if you can merge your algorithms 
into, or use them to replace, the appropriate Bio::Tree::TreeFunctionsI 
methods to get it working correctly. Indeed, does your original version 
of the script work on this data set?


Cheers,
Sendu.


From cjfields at uiuc.edu  Sat Dec 16 15:18:50 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 16 Dec 2006 09:18:50 -0600
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on
	110	species
In-Reply-To: <4584071D.3070005@sendu.me.uk>
References: <F5C5C9A8-D444-4994-9769-AC5DE68F4A39@lsi.upc.edu>	<68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu>	<4577E4A2.5090303@sendu.me.uk>	<B290BEF7-81D6-4C0A-9EDA-348B8A75EEC8@lsi.upc.edu>	<4577EAAF.7030509@sendu.me.uk>	<0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu>	<4577EFD3.7090904@sendu.me.uk>
	<250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu>
	<4584071D.3070005@sendu.me.uk>
Message-ID: <6AE33842-B2E7-4E9B-B80D-68A058045818@uiuc.edu>


On Dec 16, 2006, at 8:47 AM, Sendu Bala wrote:

> Gabriel Valiente wrote:
>>> I don't think that can be true. Your error message contains 'Must supply
>>> a Bio::Taxon'. Bio::Taxon only exists in 1.5.2 (or cvs live).
>>>
>>> If you uninstall the fink installation and install 1.5.2 using cpan
>>> (with root privileges by going sudo cpan) that should at least get rid
>>> of the error messages...
>>>
>>>
>>>> The tree is not correct (I've parsed it from R to have a double
>>>> check) but don't know yet what the problem is with it.
>>>
>>> ... But if the tree is wrong anyway... Let me know what you find out.
>>
>> I've uninstalled the fink installation and used the cvs instead, and the
>> error message is gone. However, on a larger set of 190 species, which
>> are all present in the NCBI taxonomy, the resulting tree has only 178
>> taxa. I suspect something must be wrong with the merge_lineage method
>> in the major rewrite of the taxonomy2tree script. Can someone please
>> check this? I'm attaching the 190 species call to the script. Thanks,
>
> Ok, I'll look into it. You're also welcome to see if you can take your
> own code from your original taxonomy2tree script and see if you can
> merge/replace the appropriate Bio::Tree::TreeFunctionsI methods with
> your algorithms to get it working correctly. Indeed, does your original
> version of the script work on this data set?
>
>
> Cheers,
> Sendu.

Sendu,

Don't know if it helps, but when I tried Gabriel's shell script last
night I ran a modification of taxonomy2tree to see what would pop
up.  Everything is fine up to about 100 iterations, then merge_lineage()
starts dropping leaf nodes.

chris 
  

From bix at sendu.me.uk  Sat Dec 16 15:33:35 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Sat, 16 Dec 2006 15:33:35 +0000
Subject: [Bioperl-l] NO BLAST
In-Reply-To: <58ff33550612150839i40409b06pe427bcd77d3f208@mail.gmail.com>
References: <58ff33550612150839i40409b06pe427bcd77d3f208@mail.gmail.com>
Message-ID: <458411CF.8000707@sendu.me.uk>

Luba Pardo wrote:
> Hello,
> I am having trouble using the module Bio::Tools::Run::StandAloneBlast.
> I got the following error message: cannot find path to blastall.
> The code I used is (modified from HOWTObeginners):

Bioperl doesn't know where you installed blast. If you've actually 
installed it, you can set the environment variable BLASTDIR to point to 
the directory that contains the blastall executable.
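
For anyone unsure what BLASTDIR is used for, the plain-Perl sketch below (no BioPerl required) mimics the kind of lookup that ends in "cannot find path to blastall": it checks $ENV{BLASTDIR} first and then each directory on PATH. The helper find_blastall is hypothetical and not part of BioPerl, and the directory in the comment is only an example. If you set the variable inside a script rather than in your shell, doing so in a BEGIN block before loading Bio::Tools::Run::StandAloneBlast is the cautious choice, on the assumption that the module reads it at load time.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper (not a BioPerl API): return the full path to an
# executable named 'blastall', looking in $ENV{BLASTDIR} first and then
# in every directory on PATH. Returns undef if nothing executable is found.
sub find_blastall {
    my @dirs;
    push @dirs, $ENV{BLASTDIR}
        if defined $ENV{BLASTDIR} && length $ENV{BLASTDIR};
    push @dirs, File::Spec->path;    # directories listed in PATH
    for my $dir (@dirs) {
        for my $name ('blastall', 'blastall.exe') {
            my $candidate = File::Spec->catfile($dir, $name);
            return $candidate if -f $candidate && -x _;
        }
    }
    return undef;
}

# In a real script you might write (example path only):
#   BEGIN { $ENV{BLASTDIR} = '/usr/local/blast/bin' }
#   use Bio::Tools::Run::StandAloneBlast;
my $path = find_blastall();
print defined $path ? "blastall found at $path\n" : "blastall not found\n";
```

If the sketch reports "blastall not found" for the directory you expected, BioPerl is unlikely to find it there either.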


From cain.cshl at gmail.com  Fri Dec 15 18:09:48 2006
From: cain.cshl at gmail.com (Scott Cain)
Date: Fri, 15 Dec 2006 13:09:48 -0500
Subject: [Bioperl-l] Bio::SeqFeature::Annotated and
	mandatory	type	checking
In-Reply-To: <9B984087-C843-440A-B3E1-F7DEC65160E7@uiuc.edu>
References: <637A2459-4115-466F-BD8D-036D5E9114F8@cshl.edu>
	<4581CCEB.20206@sendu.me.uk>
	<1166158897.2569.335.camel@localhost.localdomain>
	<9B984087-C843-440A-B3E1-F7DEC65160E7@uiuc.edu>
Message-ID: <1166206188.2569.380.camel@localhost.localdomain>

On Fri, 2006-12-15 at 11:49 -0600, Chris Fields wrote:
> 
> To tell the truth I don't know if this is where the mandatory checks  
> were added in; I'm not too familiar with SeqFeature::Annotated yet.
> 
> I agree with Scott (and Matthew) that SOFA checks should be  
> optional.  Matthew, can you write up a patch and maybe some tests?
> 
> chris
> 
That's not where they were added in; it's just that they hadn't been fully
implemented before then, so they didn't work. That probably meant they
weren't mandatory, though I don't remember (it could be that it just
croaked).

Scott

-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory


From hlapp at gmx.net  Sun Dec 17 06:02:04 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sun, 17 Dec 2006 01:02:04 -0500
Subject: [Bioperl-l] [Gmod-gbrowse] xyplot data alignment problem?
In-Reply-To: <458404BD.8030908@sendu.me.uk>
References: <EA0BFA4F-8182-4C40-92DA-218CE3F48D8B@genomics.princeton.edu>	<6dce9a0b0612141356u63afe2dak7e1d8dad93408312@mail.gmail.com>
	<6dce9a0b0612150802x354a02a8ib17fbd882379c63c@mail.gmail.com>
	<458404BD.8030908@sendu.me.uk>
Message-ID: <733825EE-0426-4D12-A02F-B8825CDEBBA9@gmx.net>


On Dec 16, 2006, at 9:37 AM, Sendu Bala wrote:

> Lincoln Stein wrote:
>> This is very embarrassing for me, particularly since I spent a lot of time
>> validating that Bio::Graphics was working properly before the 1.5.2 release
>> went out. How long before there is a 1.5.3 release? How about a 1.5.2.1
>> release?
>
> I'm happy to try a point release for critical bug fixes. Why don't you
> commit the necessary fixes to branch-1-5-2 and let me know when you're
> happy, and I'll do 1.5.2.1.

Feel free to do that, but why not make a 1.5.3 off the main trunk?
1.5.2.1 may be adding more to the version confusion
(developer/stable/point-release/etc) than it is worth, and there is no
shame in releasing new developer versions every few weeks.

My $0.02 ...

	-hilmar


>
> Cheers,
> Sendu.

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From fgarret at ub.edu  Mon Dec 18 12:07:02 2006
From: fgarret at ub.edu (Filipe Garrett)
Date: Mon, 18 Dec 2006 13:07:02 +0100
Subject: [Bioperl-l] codeml
Message-ID: <45868466.508@ub.edu>

Hi all,

I've been using BioPerl's PAML module (specifically the codeml part), but 
with just one tree.

Since the program accepts several trees as input (and runs the analysis 
for each tree, outputting the difference in likelihoods for each one), is 
there some way to do this through BioPerl?

thanks in adv,
FG


From heikki at sanbi.ac.za  Mon Dec 18 13:51:50 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Mon, 18 Dec 2006 15:51:50 +0200
Subject: [Bioperl-l] Proposal for Meta data
In-Reply-To: <F302B7DD-C806-4A6F-ACDF-9F27A84E0BF0@uiuc.edu>
References: <B3EF69DB-9C01-4F42-A4E4-898613D872F9@uiuc.edu>
	<32BE3FCF-C788-438F-8A4A-8A586DD6C569@bioperl.org>
	<F302B7DD-C806-4A6F-ACDF-9F27A84E0BF0@uiuc.edu>
Message-ID: <200612181551.51277.heikki@sanbi.ac.za>


Reading the discussion, I think it is time to draw up some guidelines.

1. Base the Meta implementation on real use cases.

   MSA is a good example.

2. Allow generalisations

   If you can see another implementation of the same idea that can be merged 
   with the first, do it, but do not hurt yourself if you cannot.


The most difficult question is how to separate case-specific attributes that 
are best implemented by subclassing with additional methods from truly widely 
variable meta data that is best held in a parallel-track meta information 
class.

The problem I see with undefined, totally open meta annotation is that if you 
can put anything in there, it is also totally confusing to a user. If you can 
put anything in, how do you know what to get out, and how do you know that it 
is there?

That leads to the third guideline:

3. Use separate meta classes only when there are several different ways of 
encoding data that is present in large numbers *and* when you expect to 
assess the data computationally rather than just check whether an 
attribute is there. 


	-Heikki


On Friday 15 December 2006 19:23, Chris Fields wrote:
> On Dec 15, 2006, at 8:28 AM, Jason Stajich wrote:
> > On Dec 14, 2006, at 9:21 PM, Chris Fields wrote:
> >> On Dec 14, 2006, at 7:45 PM, David Messina wrote:
> >>> Hey Chris,
> >>>
> >>> My thoughts below.
> >>>
> >>>> [Chris]
> >>>> This could be used to annotate any PrimarySeq, LocatableSeq,
> >>>> SimpleAlign, SeqFeature, or what-have-you, maybe in a collection
> >>>> (similar to AnnotationCollection).  I thought something like this may
> >>>> be of general use for any PrimarySeq (quality, structure), alignments
> >>>> like NEXUS and Stockholm, SeqFeatures where structure could be stored
> >>>> (tRNA or riboswitches), etc.
> >>>>
> >>>> However, this also seems to fall into the category of sequence
> >>>> annotation.  So, would it be better to have a set of Bio::Annotation
> >>>> classes used for this purpose?
> >>>
> >>> To me, all meta data is equal. That is, your classic Genbank feature
> >>> annotation and a user's arbitrary meta-tag like "Bob thinks this is a
> >>> kinase domain" aren't different in kind even if they are different in
> >>> content.
> >>>
> >>> As resequencing projects multiply, the ability to create arbitrary
> >>> meta tags, attach them to different types of objects, and use those
> >>> tags to link them together will become desirable, if not essential.
> >>>
> >>> Keeping a common interface to all of these meta data types would be
> >>> advantageous, plus new users won't have to determine whether they
> >>> need to use Bio::Meta objects or Bio::Annotation objects.
> >>>
> >>> So I would argue for all of the meta data types to live "under one
> >>> roof". Which roof isn't as important. Bio::Annotation, since it
> >>> already exists for today's meta data, seems like a reasonable choice
> >>> (assuming Annotation objects are flexible enough to be extended as
> >>> you propose).
> >>>
> >>> There, and no flames or jibes even. :)
> >>
> >> I guess what I want to know is whether there should be a
> >> distinction between 'normal' sequence annotation (comments,
> >> references, and so on) and annotation that could be best described as
> >> position-specific (like RNA or protein structural annotation).  The
> >> current meta implementation is for sequence data only; I felt it
> >> would be nice to have a generic implementation that would be
> >> applicable to any object data.
> >
> > my stream-of-consciousness for right now:
> >
> > I was thinking Bio::Annotation is where this should go - that
> > system doesn't have anything about it that makes it explicitly
> > sequence related. What we're trying to hammer out here on the
> > Alignment side - which fits with your RNA example - is have
> > features, basically SeqFeatures - associated with alignments so
> > columns can be annotated to cover things like character sets and
> > partitions for phylogenetic analyses.  As for data which annotates
> > non-contiguous things like RNA stems, we may have to be more
> > creative about that or model it with a splitLocation.
> >
> > So currently we've added code so that an Alignment is-a
> > Bio::AnnotableI and is-a Bio::FeatureHolderI to move towards this
> > end, with the goal of being able to capture more of the data that
> > can be represented in a NEXUS file.
> >
> > It feels more like a hack than an elegant Meta-data solution, but I
> > am not totally sure it covers the data you are thinking about at
> > this point; perhaps I need to spend more time thinking about it.
> > Or are you worried about the idea of whether the semantic mapping
> > of the data into features or annotations is confusing users?
>
> Sorry in advance for the longish response here...
>
> My original thought was to have a generic abstract class capable of
> positionally describing data in any other class, similar to
> Heikki's Bio::Seq::MetaI but not constrained to sequence data only.
> Implementing classes would be capable of having different data
> structures based on their use (simple string, array, AoA, AoH, AoO).
> One MetaCollection class to contain them all in a tag-like system, so
> you could have mixed data types describe the same object.  The latter
> Collection class is so similar to AnnotationCollection that I agree
> Bio::Annotation would be the best place for this.
>
> The way I reconfigured Stockholm alignment parsing/writing is to use
> Bio::Seq::Meta objects (which are LocatableSeq).  Each Seq::Meta is
> capable of holding a sequence and several meta strings, stored as
> tags or 'names'.  However, there is no Meta object for alignments
> (for RNA/protein structure consensus and other Rfam/Pfam markup); I
> hacked around this by using a Bio::Seq::Meta w/o a seq, but I would
> rather have a generic Meta object independent of the sequence cruft.
>
> So for this partial Pfam alignment,
>
> Q92SV1_RHIME/122-299         LAMALNLARGI...VDADVDF..REG
> #=GR Q92SV1_RHIME/122-299 pAS .........................
> Q883D2_PSESM/110-290         LGLMLGLRRRL...FDGNGAV..KRS
> Q8ZXP5_PYRAE/91-262          LALLLAPYKRI...IQYGEKM..KRG
> #=GR Q8ZXP5_PYRAE/91-262 SS  HHHHHHHHTTH...HHHHHHX..HTT
> #=GR Q8ZXP5_PYRAE/91-262 SA  00000000000...120030X..474
> #=GC SS_cons                 HHHHHHHHTTH...HHHHHHH..HTT
> #=GC SA_cons                 03002200312...1312414..676
> #=GC seq_cons                luhhLuhsRpl...hthppth..+pG
> //
>
> '#=GC' lines would be in generic meta string objects in the
> alignment, while '#=GR' tags would be in similar meta objects in the
> relevant sequences.  As long as both aren't AnnotatableI this isn't
> an issue.
>
> Similarly, NEXUS files which contained any position-based values
> could hold a meta string/array object in a similar tag.
>
> The basic scheme is:
>                     |--String
> Annotation::Meta----|--Array
>                     |--HorriblyComplexDataStruct
>
> Then I started thinking about where this could be applied, and
> whether a true Meta object needs to be constrained only to describing
> position-based data.  This somewhat relates to this bug:
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=1825
>
> which seems to need a simple but unconstrained hash-of-arrays-based
> meta object.
>
> Then my head appropriately exploded...
>
> Hope everything is going well at the hackathon!  Looks like some
> interesting stuff coming out of it.
>
> chris

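To make the '#=GC' versus '#=GR' split concrete, here is a minimal plain-Perl sketch that sorts the lines of the partial Pfam alignment quoted above into sequences, alignment-level (#=GC) annotation and per-sequence (#=GR) annotation. Nested hashes stand in for the proposed meta objects; this is not how Bio::AlignIO::stockholm works internally, and the function name parse_stockholm_markup is hypothetical. Other markup, such as #=GF and #=GS lines, is simply skipped.

```perl
use strict;
use warnings;

# Hypothetical helper: split Stockholm lines into aligned sequences,
# alignment-level (#=GC) strings and sequence-level (#=GR) strings.
# Strings are appended with .= so interleaved blocks are concatenated.
sub parse_stockholm_markup {
    my @lines = @_;
    my (%seqs, %gc, %gr);
    for my $line (@lines) {
        next if $line =~ /^\s*$/ || $line eq '//';
        if ($line =~ /^#=GC\s+(\S+)\s+(\S+)/) {
            $gc{$1} .= $2;            # feature => per-column annotation
        }
        elsif ($line =~ /^#=GR\s+(\S+)\s+(\S+)\s+(\S+)/) {
            $gr{$1}{$2} .= $3;        # seq id => feature => per-residue annotation
        }
        elsif ($line =~ /^#/) {
            next;                     # #=GF, #=GS and plain comments
        }
        elsif ($line =~ /^(\S+)\s+(\S+)/) {
            $seqs{$1} .= $2;          # seq id => aligned sequence
        }
    }
    return (\%seqs, \%gc, \%gr);
}

# The partial Pfam alignment from the message above.
my @stockholm = (
    'Q92SV1_RHIME/122-299         LAMALNLARGI...VDADVDF..REG',
    '#=GR Q92SV1_RHIME/122-299 pAS .........................',
    'Q883D2_PSESM/110-290         LGLMLGLRRRL...FDGNGAV..KRS',
    'Q8ZXP5_PYRAE/91-262          LALLLAPYKRI...IQYGEKM..KRG',
    '#=GR Q8ZXP5_PYRAE/91-262 SS  HHHHHHHHTTH...HHHHHHX..HTT',
    '#=GR Q8ZXP5_PYRAE/91-262 SA  00000000000...120030X..474',
    '#=GC SS_cons                 HHHHHHHHTTH...HHHHHHH..HTT',
    '#=GC SA_cons                 03002200312...1312414..676',
    '#=GC seq_cons                luhhLuhsRpl...hthppth..+pG',
    '//',
);

my ($seqs, $gc, $gr) = parse_stockholm_markup(@stockholm);
printf "%d sequences; #=GC features: %s\n",
    scalar(keys %$seqs), join(', ', sort keys %$gc);
for my $id (sort keys %$gr) {
    printf "#=GR on %s: %s\n", $id, join(', ', sort keys %{ $gr->{$id} });
}
```

Keeping the two kinds of markup in separate containers mirrors the proposal: column annotation hangs off the alignment, residue annotation off the individual sequence.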
-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From fgarret at ub.edu  Mon Dec 18 16:18:31 2006
From: fgarret at ub.edu (Filipe Garrett)
Date: Mon, 18 Dec 2006 17:18:31 +0100
Subject: [Bioperl-l] PAML files
Message-ID: <4586BF57.4090002@ub.edu>

Hi all,

Does anyone know how to get the name of the .ctl file created by the 
PAML module? Inside the tmp directory there are 2 files with random 
names (tree and ctl). Why do they have random names? Wouldn't it be 
easier to assign them fixed names, for instance "codeml.ctl" and 
"tree.nwk"?

thanks in adv,
FG


From bix at sendu.me.uk  Mon Dec 18 16:15:21 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 18 Dec 2006 16:15:21 +0000
Subject: [Bioperl-l] [Gmod-gbrowse] xyplot data alignment problem?
In-Reply-To: <733825EE-0426-4D12-A02F-B8825CDEBBA9@gmx.net>
References: <EA0BFA4F-8182-4C40-92DA-218CE3F48D8B@genomics.princeton.edu>	<6dce9a0b0612141356u63afe2dak7e1d8dad93408312@mail.gmail.com>
	<6dce9a0b0612150802x354a02a8ib17fbd882379c63c@mail.gmail.com>
	<458404BD.8030908@sendu.me.uk>
	<733825EE-0426-4D12-A02F-B8825CDEBBA9@gmx.net>
Message-ID: <4586BE99.7020308@sendu.me.uk>

Hilmar Lapp wrote:
> 
> On Dec 16, 2006, at 9:37 AM, Sendu Bala wrote:
> 
>> Lincoln Stein wrote:
>>> This is very embarrassing for me, particularly since I spent a lot
>>> of time validating that Bio::Graphics was working properly before
>>> the 1.5.2 release went out. How long before there is a 1.5.3
>>> release? How about a 1.5.2.1 release?
>> 
>> I'm happy to try a point release for critical bug fixes. Why don't
>> you commit the necessary fixes to branch-1-5-2 and let me know when
>> you're happy, and I'll do 1.5.2.1.
> 
> Feel free to do that, but why not make a 1.5.3 off the main trunk? 
> 1.5.2.1 may be adding more to the version confusion 
> (developer/stable/point-release/etc) than it is worth,

My feeling is that 1.5.3 should be reserved for some significant changes
and new features, and not just a few bug fixes. I'd say this causes less
confusion amongst users - they can associate '1.5.2' with a certain API
and feature-set, and the specific name of the file they download and
install (bioperl-1.5.2_100.tar.gz vs bioperl-1.5.2_101.tar.gz) won't
matter at all to them.

I also won't have to make some major announcement about it; it will
simply be the most recent developer version of bioperl available so new
users trying to get 1.5.2 will end up getting 1.5.2.1, whilst existing
1.5.2 users will only feel compelled to get it if they suffer from the
bugs fixed.


> and there is no shame in releasing new developer versions every few
> weeks.

I think doing frequent releases is inadvisable; such a quick release
won't have had much testing, so we shouldn't encourage people to install
it, and encouragement is implicit when a major new version like 1.5.3
comes out. People who want to live on the edge can and should be using a
CVS checkout.


From bix at sendu.me.uk  Mon Dec 18 19:15:16 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 18 Dec 2006 19:15:16 +0000
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on
	110	species
In-Reply-To: <C91DCC7B-E368-475D-B83A-AC301A49624B@uiuc.edu>
References: <F5C5C9A8-D444-4994-9769-AC5DE68F4A39@lsi.upc.edu>	<68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu>	<4577E4A2.5090303@sendu.me.uk>	<B290BEF7-81D6-4C0A-9EDA-348B8A75EEC8@lsi.upc.edu>	<4577EAAF.7030509@sendu.me.uk>	<0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu>	<4577EFD3.7090904@sendu.me.uk>	<250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu>
	<C91DCC7B-E368-475D-B83A-AC301A49624B@uiuc.edu>
Message-ID: <4586E8C4.6030306@sendu.me.uk>

Chris Fields wrote:
> On Dec 15, 2006, at 6:45 PM, Gabriel Valiente wrote:
> 
>> However, on a larger set of 190 species, which are all present in
>> the NCBI taxonomy, the resulting tree has only 178 taxa. I suspect
>> something must be wrong with the merge_lineage method in the major
>> rewrite of the taxonomy2tree script. Can someone please check this?
>> I'm attaching the 190 species call to the script. Thanks,
>> 
>> Gabriel
> 
> I can confirm that.  It is definitely dropping them in merge_lineage();
> if you add a call to get_leaf_nodes to check how many are
> present after each merge_lineage() call, you can see it dropping
> nodes along the trace.

I confirm the 'dropped' nodes, but also claim that this is no bug.

For example, the first 'drop' happens for the 101st species which is
'Leptospira interrogans serovar Copenhageni'. This is a variation
(descendant) of species 24: 'Leptospira interrogans'. So when the
variation is added it becomes a leaf and 'Leptospira interrogans' is no
longer a leaf, so the overall number of leaves does not increase.

The next drop is for species 103 'Prochlorococcus marinus subsp.
pastoris str. CCMP1986', a subspecies of 63 'Prochlorococcus marinus'.
Same deal. I didn't check any others, but suspect the same issue arises
in all cases.
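The leaf-count argument can be checked with a toy model. The sketch below is plain Perl using a hypothetical hash-of-hashes tree (not the Bio::Tree API that taxonomy2tree.pl uses), and the lineages are simplified, made-up paths rather than real NCBI lineages. It merges two species, then a serovar below one of them, and counts leaves before and after.

```perl
use strict;
use warnings;

# Hypothetical toy tree: node name => hashref of child names.
# This is NOT the Bio::Tree API, only a model of merging lineages.
my %children;

# Merge one root-to-taxon lineage (a list of node names) into the tree.
sub merge_lineage_toy {
    my @path = @_;
    for my $i (0 .. $#path - 1) {
        $children{ $path[$i] }{ $path[$i + 1] } = 1;
    }
    $children{ $path[-1] } ||= {};    # make sure the last node exists
}

# A leaf is any node without children.
sub leaf_count {
    return scalar grep { !%{ $children{$_} } } keys %children;
}

merge_lineage_toy(qw(Bacteria Spirochaetes Leptospira), 'Leptospira interrogans');
merge_lineage_toy(qw(Bacteria Cyanobacteria Prochlorococcus), 'Prochlorococcus marinus');
my $before = leaf_count();

# Add a serovar *below* an existing species: the species stops being a leaf.
merge_lineage_toy(qw(Bacteria Spirochaetes Leptospira), 'Leptospira interrogans',
                  'Leptospira interrogans serovar Copenhageni');
my $after = leaf_count();

print "leaves before: $before, after: $after\n";
```

Three taxa were merged, but the leaf count stays at two because the new serovar replaces its parent species as a leaf.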

Gabriel, please confirm this isn't a bug, or suggest how you propose to
see your taxa when they are not all leaves of the tree.


PS. I changed the merge_lineage() algorithm to be 18x faster (from the 
absurd 3mins for making the 190 species tree to a more reasonable 10s), 
without changing the tree produced.


From fgarret at ub.edu  Mon Dec 18 20:01:38 2006
From: fgarret at ub.edu (Filipe Garrett)
Date: Mon, 18 Dec 2006 21:01:38 +0100
Subject: [Bioperl-l] PAML files
In-Reply-To: <34C4970D-6F93-4CE4-878C-5FA4C916AAEC@bioperl.org>
References: <4586BF57.4090002@ub.edu>
	<34C4970D-6F93-4CE4-878C-5FA4C916AAEC@bioperl.org>
Message-ID: <4586F3A2.4010607@ub.edu>


Hi Jason,

This question is related to the one I asked earlier today.
I need to run codeml with 3 tree topologies. I looked at the codeml module
but it only accepts one tree as input, so I thought of using the codeml
module to prepare all the files and then I would just have to run
codeml with the new tree file in batch. But for that I need to know
which file is the ctl file.

any idea?
FG

Jason Stajich wrote:
> They are temporary names so they are deliberately random and there is no 
> intention of you needing them after a run since it is to be cleaned up on
> the fly. We use an internal method for generating tempfiles that takes 
> care of cleanup afterwards.  I suppose since we do all the work within a 
> temp directory that is cleaned up, one could have a fixed name for the 
> tree, alignment, and ctl files but honestly we never expect people to be 
> reading these filenames as they are intended to be transient.
> 
> What problem are you having that you need the filename?
> 
> -jason
> On Dec 18, 2006, at 11:18 AM, Filipe Garrett wrote:
> 
>> Hi all,
>>
>> does anyone know how to get the name of the .ctl file created by the
>> PAML module? Inside the tmp directory there are 2 files with random 
>> names (tree and ctl). Why do they have random names?? Wouldn't it be 
>> easier to assign them a fixed name?? For instance "codeml.ctl" and 
>> "tree.nwk"??
>>
>> thanks in adv,
>> FG
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> --
> Jason Stajich
> jason at bioperl.org
> http://jason.open-bio.org/
> 
> 


From fgarret at ub.edu  Mon Dec 18 20:07:46 2006
From: fgarret at ub.edu (Filipe Garrett)
Date: Mon, 18 Dec 2006 21:07:46 +0100
Subject: [Bioperl-l] codeml
In-Reply-To: <7150593C-C159-4418-8FB3-9D7906C37E15@bioperl.org>
References: <45868466.508@ub.edu>
	<7150593C-C159-4418-8FB3-9D7906C37E15@bioperl.org>
Message-ID: <4586F512.1030209@ub.edu>


Right now it's impossible for me to write it.
By February or March I should have more time but I'll let you know.

FG

Jason Stajich wrote:
> This is a shortcoming in the Run::Phylo::PAML::Codeml implementation - I
> guess we'll need to allow the -tree option to accept an arrayref of trees.
> Are you willing to try to write this patch?  It should be added as a
> bug/feature request to bugzilla so it can be corrected in short order.
> 
> -jason
> On Dec 18, 2006, at 7:07 AM, Filipe Garrett wrote:
> 
>> Hi all,
>>
>> I've been using bioperl's PAML module (specifically the codeml part) but 
>> with just one tree.
>>
>> Since the program accepts several trees as input (and runs the analysis 
>> for each tree outputting the difference in likelihoods for each one) I 
>> was wondering if there's some way to do it through bioperl?
>>
>> thanks in adv,
>> FG
> 
> --
> Jason Stajich 
> Miller Research Fellow
> University of California, Berkeley
> lab: 510.642.8441
> http://pmb.berkeley.edu/~taylor/people/js.html
> 
> 


From cjfields at uiuc.edu  Mon Dec 18 20:55:55 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 18 Dec 2006 14:55:55 -0600
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on
	110	species
In-Reply-To: <4586E8C4.6030306@sendu.me.uk>
References: <F5C5C9A8-D444-4994-9769-AC5DE68F4A39@lsi.upc.edu>	<68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu>	<4577E4A2.5090303@sendu.me.uk>	<B290BEF7-81D6-4C0A-9EDA-348B8A75EEC8@lsi.upc.edu>	<4577EAAF.7030509@sendu.me.uk>	<0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu>	<4577EFD3.7090904@sendu.me.uk>	<250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu>
	<C91DCC7B-E368-475D-B83A-AC301A49624B@uiuc.edu>
	<4586E8C4.6030306@sendu.me.uk>
Message-ID: <63C1DC7D-2830-436A-BE95-7ECE3748C84D@uiuc.edu>


On Dec 18, 2006, at 1:15 PM, Sendu Bala wrote:

> Chris Fields wrote:
>> On Dec 15, 2006, at 6:45 PM, Gabriel Valiente wrote:
>>
>>> However, on a larger set of 190 species, which are all present in
>>> the NCBI taxonomy, the resulting tree has only 178 taxa. I suspect
>>> something must be wrong with the merge_lineage method in the major
>>> rewrite of the taxonomy2tree script. Can someone please check this?
>>> I'm attaching the 190 species call to the script. Thanks,
>>>
>>> Gabriel
>>
>> I can confirm that.  It is definitely dropping them in merge_lineage();
>> if you add a call to get_leaf_nodes to check how many are
>> present after each merge_lineage() call, you can see it dropping
>> nodes along the trace.
>
> I confirm the 'dropped' nodes, but also claim that this is no bug.
>
> For example, the first 'drop' happens for the 101st species which is
> 'Leptospira interrogans serovar Copenhageni'. This is a variation
> (descendant) of species 24: 'Leptospira interrogans'. So when the
> variation is added it becomes a leaf and 'Leptospira interrogans' is no
> longer a leaf, so the overall number of leaves does not increase.
>
> The next drop is for species 103 'Prochlorococcus marinus subsp.
> pastoris str. CCMP1986', a subspecies of 63 'Prochlorococcus marinus'.
> Same deal. I didn't check any others, but suspect the same issue arises
> in all cases.

Makes sense now.  I personally would consider this a bug since the  
results are unexpected (so the docs need to be modified in order to  
clarify).  Some say tomato...

I suppose this is one of the issues one might run into when using  
NCBI taxonomy to build trees.

> Gabriel, please confirm this isn't a bug, or suggest how you propose to
> see your taxa when they are not all leaves of the tree.

Having the nodes appear internally seems semantically correct to me.   
Is there any other way?

> PS. I changed the merge_lineage() algorithm to be 18x faster (from the
> absurd 3mins for making the 190 species tree to a more reasonable 10s),
> without changing the tree produced.

Definitely an improvement!

chris


From jason at bioperl.org  Mon Dec 18 19:33:32 2006
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 18 Dec 2006 14:33:32 -0500
Subject: [Bioperl-l] PAML files
In-Reply-To: <4586BF57.4090002@ub.edu>
References: <4586BF57.4090002@ub.edu>
Message-ID: <34C4970D-6F93-4CE4-878C-5FA4C916AAEC@bioperl.org>

They are temporary names so they are deliberately random and there is  
no intention of you needing them after a run since it is to be cleaned
up on the fly. We use an internal method for generating tempfiles  
that takes care of cleanup afterwards.  I suppose since we do all the  
work within a temp directory that is cleaned up, one could have a  
fixed name for the tree, alignment, and ctl files but honestly we  
never expect people to be reading these filenames as they are  
intended to be transient.
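As a sketch of the alternative Jason mentions (fixed file names inside a private, self-cleaning temporary directory), here is plain Perl using the core File::Temp module. This is not the internal tempfile method the Codeml wrapper actually uses; the file names codeml.ctl, tree.nwk and aln.phy are illustrative assumptions.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# A private working directory that File::Temp removes when the program exits.
my $workdir = tempdir('codeml_XXXXXX', TMPDIR => 1, CLEANUP => 1);

# Inside a directory nobody else writes to, fixed names cannot collide.
my %files = map { $_ => File::Spec->catfile($workdir, $_) }
            qw(codeml.ctl tree.nwk aln.phy);

# Write an example tree file under its predictable name.
open my $fh, '>', $files{'tree.nwk'}
    or die "Cannot write $files{'tree.nwk'}: $!";
print {$fh} "((a,b),c);\n";
close $fh or die "Cannot close $files{'tree.nwk'}: $!";

print "control file would be: $files{'codeml.ctl'}\n";
```

The random directory name still avoids clashes between concurrent runs, while the names inside it stay predictable for anyone who wants to inspect them before cleanup.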

What problem are you having that you need the filename?

-jason
On Dec 18, 2006, at 11:18 AM, Filipe Garrett wrote:

> Hi all,
>
> does anyone know how to get the name of the .ctl file created by the
> PAML module? Inside the tmp directory there are 2 files with random
> names (tree and ctl). Why do they have random names?? Wouldn't it be
> easier to assign them a fixed name?? For instance "codeml.ctl" and
> "tree.nwk"??
>
> thanks in adv,
> FG

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From cjm at fruitfly.org  Mon Dec 18 21:50:00 2006
From: cjm at fruitfly.org (Chris Mungall)
Date: Mon, 18 Dec 2006 13:50:00 -0800
Subject: [Bioperl-l] Proposal for Meta data
In-Reply-To: <200612181551.51277.heikki@sanbi.ac.za>
References: <B3EF69DB-9C01-4F42-A4E4-898613D872F9@uiuc.edu>
	<32BE3FCF-C788-438F-8A4A-8A586DD6C569@bioperl.org>
	<F302B7DD-C806-4A6F-ACDF-9F27A84E0BF0@uiuc.edu>
	<200612181551.51277.heikki@sanbi.ac.za>
Message-ID: <6747C74C-8A49-4169-8A3B-8A26134C3B0D@fruitfly.org>


I agree with everything Heikki is saying, I just wanted to highlight  
one paragraph:

> The problem I see with undefined, totally open meta annotation, is
> that if you can put anything in there, it is also totally confusing
> to a user. If you can put anything in, how do you know what to get
> out and know that it is there?

One solution is to give your annotation/metadata-model formal  
computational semantics and use ontologies to give additional  
semantics to your metadata tags. This provides both user information  
in the form of documentation, and a means of specifying to the  
computer exactly what should be done with the tags.

This is probably overkill for bioperl; but if the use cases being  
proposed do lean in the direction of a new metadata system that is  
not necessarily backwards compatible with the existing one, then I'd  
recommend checking out what's already out there before re-inventing  
the wheel. Perl RDF libraries are getting a little better.

If anyone is interested in pursuing this sort of thing (probably on a  
branch), let me know.

On Dec 18, 2006, at 5:51 AM, Heikki Lehvaslaiho wrote:

>
> Reading the discussion, I think it is time to draw some guidelines.
>
> 1. Base the Meta implementation on real use cases.
>
>    MSA is a good example.
>
> 2. Allow generalisations
>
>    If you can see another implementation of the same idea that can be
>    merged with the first, do it, but do not hurt yourself if you cannot.
>
>
> The most difficult question is how to separate case-specific attributes
> that are best implemented by subclassing with additional methods from
> truly widely variable meta data that is best done as a parallel track
> meta information holding class.
>
> The problem I see with undefined, totally open meta annotation, is
> that if you can put anything in there, it is also totally confusing
> to a user. If you can put anything in, how do you know what to get
> out and know that it is there?
>
> That leads to the third guideline:
>
> 3. Use separate meta classes only when there are several different ways
> of encoding data that is present in large numbers *and* when you are
> expecting to assess the data computationally rather than just checking
> if an attribute is there.
>
>
> 	-Heikki
>
>
>
> On Friday 15 December 2006 19:23, Chris Fields wrote:
>> On Dec 15, 2006, at 8:28 AM, Jason Stajich wrote:
>>> On Dec 14, 2006, at 9:21 PM, Chris Fields wrote:
>>>> On Dec 14, 2006, at 7:45 PM, David Messina wrote:
>>>>> Hey Chris,
>>>>>
>>>>> My thoughts below.
>>>>>
>>>>>> [Chris]
>>>>>> This could be used to annotate any PrimarySeq, LocatableSeq,
>>>>>> SimpleAlign, SeqFeature, or what-have-you, maybe in a collection
>>>>>> (similar to AnnotationCollection).  I thought something like this
>>>>>> may be of general use for any PrimarySeq (quality, structure),
>>>>>> alignments like NEXUS and Stockholm, SeqFeatures where structure
>>>>>> could be stored (tRNA or riboswitches), etc.
>>>>>>
>>>>>> However, this also seems to fall into the category of sequence
>>>>>> annotation.  So, would it be better to have a set of Bio::Annotation
>>>>>> classes used for this purpose?
>>>>>
>>>>> To me, all meta data is equal. That is, your classic Genbank feature
>>>>> annotation and a user's arbitrary meta-tag like "Bob thinks this is a
>>>>> kinase domain" aren't different in kind even if they are different in
>>>>> content.
>>>>>
>>>>> As resequencing projects multiply, the ability to create arbitrary
>>>>> meta tags, attach them to different types of objects, and use those
>>>>> tags to link them together will become desirable, if not essential.
>>>>>
>>>>> Keeping a common interface to all of these meta data types would be
>>>>> advantageous, plus new users won't have to determine whether they
>>>>> need to use Bio::Meta objects or Bio::Annotation objects.
>>>>>
>>>>> So I would argue for all of the meta data types to live "under one
>>>>> roof". Which roof isn't as important. Bio::Annotation, since it
>>>>> already exists for today's meta data, seems like a reasonable choice
>>>>> (assuming Annotation objects are flexible enough to be extended as
>>>>> you propose).
>>>>>
>>>>> There, and no flames or jibes even. :)
>>>>
>>>> I guess what I want to know is whether there should be a
>>>> distinction between 'normal' sequence annotation (comments,
>>>> references, and so on) and annotation that could be best described as
>>>> position-specific (like RNA or protein structural annotation).  The
>>>> current meta implementation is for sequence data only; I felt it
>>>> would be nice to have a generic implementation that would be
>>>> applicable to any object data.
>>>
>>> my stream-of-consciousness for right now:
>>>
>>> I was thinking Bio::Annotation is where this should go - that
>>> system doesn't have anything about it that makes it explicitly
>>> sequence related. What we're trying to hammer out here on the
>>> Alignment side - which fits with your RNA example - is have
>>> features, basically SeqFeatures - associated with alignments so
>>> columns can be annotated to cover things like character sets and
>>> partitions for phylogenetic analyses.  As for data which annotates
>>> non-contiguous things like RNAstems we may have  to be more
>>> creative about that or model it with a splitLocation.
>>>
>>> So currently we've added code so that an Alignment is-a
>>> Bio::AnnotableI and is-a Bio::FeatureHolderI to move towards this
>>> end, with the goal of being able to capture more of the data that
>>> can be represented in a NEXUS file.
>>>
>>> It feels more like a hack than an elegant Meta-data solution, but I
>>> am not totally sure what data you are thinking about at this point;
>>> perhaps I need to spend more time thinking about it.
>>> Or are you worried about the idea of whether the semantic mapping
>>> of the data into features or annotations is confusing users?
>>
>> Sorry in advance for the longish response here...
>>
>> My original thought was to have a generic abstract class capable of
>> positionally describing data in any other class, similar to
>> Heikki's Bio::Seq::MetaI but not constrained to sequence data only.
>> Implementing classes would be capable of having different data
>> structures based on their use (simple string, array, AoA, AoH, AoO).
>> One MetaCollection class to contain them all in a tag-like system, so
>> you could have mixed data types describe the same object.  The latter
>> Collection class is so similar to AnnotationCollection that I agree
>> Bio::Annotation would be the best place for this.
>>
>> The way I reconfigured Stockholm alignment parsing/writing is to use
>> Bio::Seq::Meta objects (which are LocatableSeq).  Each Seq::Meta is
>> capable of holding a sequence and several meta strings, stored as
>> tags or 'names'.  However, there is no Meta object for alignments
>> (for RNA/protein structure consensus and other Rfam/Pfam markup); I
>> hacked around this by using a Bio::Seq::Meta w/o a seq, but I would
>> rather have a generic Meta object independent of the sequence cruft.
>>
>> So for this partial Pfam alignment,
>>
>> Q92SV1_RHIME/122-299         LAMALNLARGI...VDADVDF..REG
>> #=GR Q92SV1_RHIME/122-299 pAS .........................
>> Q883D2_PSESM/110-290         LGLMLGLRRRL...FDGNGAV..KRS
>> Q8ZXP5_PYRAE/91-262          LALLLAPYKRI...IQYGEKM..KRG
>> #=GR Q8ZXP5_PYRAE/91-262 SS  HHHHHHHHTTH...HHHHHHX..HTT
>> #=GR Q8ZXP5_PYRAE/91-262 SA  00000000000...120030X..474
>> #=GC SS_cons                 HHHHHHHHTTH...HHHHHHH..HTT
>> #=GC SA_cons                 03002200312...1312414..676
>> #=GC seq_cons                luhhLuhsRpl...hthppth..+pG
>> //
>>
>> '#=GC' lines would be in generic meta string objects in the
>> alignment, while '#=GR' tags would be in similar meta objects in the
>> relevant sequences.  As long as both aren't AnnotatableI this isn't
>> an issue.
>>
>> Similarly, NEXUS files which contained any position-based values
>> could hold a meta string/array object in a similar tag.
>>
>> The basic scheme is:
>>
>>                     |--String
>> Annotation::Meta----|--Array
>>                     |--HorriblyComplexDataStruct
>>
>> Then I started thinking about where this could be applied, and
>> whether a true Meta object needs to be constrained only to describing
>> position-based data.  This somewhat relates to this bug:
>>
>> http://bugzilla.open-bio.org/show_bug.cgi?id=1825
>>
>> which seems to need a simple but unconstrained hash-of-arrays-based
>> meta object.
>>
>> Then my head appropriately exploded...
>>
>> Hope everything is going well at the hackathon!  Looks like some
>> interesting stuff coming out of it.
>>
>> chris
>
> -- 
> ______ _/      _/_____________________________________________________
>       _/      _/
>      _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
>     _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
>    _/  _/  _/  SANBI, South African National Bioinformatics Institute
>   _/  _/  _/  University of Western Cape, South Africa
>      _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
> ___ _/_/_/_/_/________________________________________________________
>


From jason at bioperl.org  Mon Dec 18 19:35:50 2006
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 18 Dec 2006 14:35:50 -0500
Subject: [Bioperl-l] codeml
In-Reply-To: <45868466.508@ub.edu>
References: <45868466.508@ub.edu>
Message-ID: <7150593C-C159-4418-8FB3-9D7906C37E15@bioperl.org>

This is a shortcoming in the Run::Phylo::PAML::Codeml implementation -
I guess we'll need to allow the -tree option to accept an arrayref
of trees.
Are you willing to try to write this patch?  It should be added as a
bug/feature request to bugzilla so it can be corrected in short order.
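Until -tree accepts an arrayref, one workaround is to give codeml a single tree file holding all the topologies. The sketch below builds the text of such a file in plain Perl. It assumes the PAML tree-file layout of a header line with the number of species and the number of trees, followed by one Newick tree per line (check the PAML manual for your version); the helper names are hypothetical, and the taxon counter is a rough regex, not a full Newick parser (quoted labels and comments are not handled).

```perl
use strict;
use warnings;

# Rough leaf-label extraction: in Newick a leaf label directly follows
# '(' or ','; internal labels (after ')') and branch lengths are skipped.
sub newick_taxa {
    my ($newick) = @_;
    my @labels = $newick =~ /[(,]\s*([^(),:;\s]+)/g;
    return @labels;
}

# Hypothetical helper: text of a multi-tree file, assuming a header line
# "<number of species> <number of trees>" then one tree per line.
sub paml_treefile_text {
    my @trees = @_;
    die "need at least one tree\n" unless @trees;
    my @first = newick_taxa($trees[0]);
    my $ntaxa = @first;
    for my $t (@trees) {
        my @taxa = newick_taxa($t);
        die "tree has a different number of taxa: $t\n" unless @taxa == $ntaxa;
    }
    return join("\n", "$ntaxa " . scalar(@trees), @trees) . "\n";
}

my $text = paml_treefile_text('((a,b),c);', '((a,c),b);', '((b,c),a);');
print $text;
```

The resulting file could then be named by the treefile setting in the control file, so one codeml run evaluates every topology.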

-jason
On Dec 18, 2006, at 7:07 AM, Filipe Garrett wrote:

> Hi all,
>
> I've been using bioperl's PAML module (specifically the codeml part)
> but with just one tree.
>
> Since the program accepts several trees as input (and runs the
> analysis for each tree outputting the difference in likelihoods for
> each one) I was wondering if there's some way to do it through bioperl?
>
> thanks in adv,
> FG

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html


From gowthaman.ramasamy at sbri.org  Mon Dec 18 22:19:09 2006
From: gowthaman.ramasamy at sbri.org (Gowthaman Ramasamy)
Date: Mon, 18 Dec 2006 14:19:09 -0800
Subject: [Bioperl-l] module to find out primer binding sites in a genome
	sequence
Message-ID: <A4D285B054CE4641A93F1B2046B2B3CD0761E4@mail01.sbri.org>


Hi List,
Is there any module in bioperl that can find primer binding sites in a genomic sequence?
I am interested in finding locations that match the primer with a few mismatches, not just exact matches (which is a trivial task).
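For small genomes a brute-force scan illustrates the idea: slide a primer-length window along the sequence and count mismatches against the primer and its reverse complement. This is a plain-Perl sketch, not an existing BioPerl module; the function names are hypothetical, only A/C/G/T/N are handled (no IUPAC ambiguity codes), and the cost grows with genome length times primer length, so a whole large genome would call for an index-based search tool instead.

```perl
use strict;
use warnings;

# Reverse complement of a DNA string (A/C/G/T/N only).
sub revcomp {
    my $s = reverse uc shift;
    $s =~ tr/ACGTN/TGCAN/;
    return $s;
}

# Number of positions at which two equal-length strings differ.
sub mismatches {
    my ($x, $y) = @_;
    my $n = 0;
    for my $i (0 .. length($x) - 1) {
        $n++ if substr($x, $i, 1) ne substr($y, $i, 1);
    }
    return $n;
}

# Every window matching the primer (+) or its reverse complement (-)
# with at most $max_mm mismatches; start is 1-based on the forward strand.
sub find_primer_sites {
    my ($genome, $primer, $max_mm) = @_;
    $genome = uc $genome;
    my %probe = ('+' => uc($primer), '-' => revcomp($primer));
    my $len   = length $primer;
    my @hits;
    for my $start (0 .. length($genome) - $len) {
        my $window = substr($genome, $start, $len);
        for my $strand ('+', '-') {
            my $mm = mismatches($window, $probe{$strand});
            push @hits, { start => $start + 1, strand => $strand, mismatches => $mm }
                if $mm <= $max_mm;
        }
    }
    return @hits;
}

for my $hit (find_primer_sites('TTTACGTACGGGG', 'ACGTACG', 1)) {
    printf "%s strand, position %d, %d mismatch(es)\n",
        @{$hit}{qw(strand start mismatches)};
}
```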

Many thanks in advance,
gowtham


From cjfields at uiuc.edu  Mon Dec 18 22:33:34 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 18 Dec 2006 16:33:34 -0600
Subject: [Bioperl-l] Proposal for Meta data
In-Reply-To: <200612181551.51277.heikki@sanbi.ac.za>
References: <B3EF69DB-9C01-4F42-A4E4-898613D872F9@uiuc.edu>
	<32BE3FCF-C788-438F-8A4A-8A586DD6C569@bioperl.org>
	<F302B7DD-C806-4A6F-ACDF-9F27A84E0BF0@uiuc.edu>
	<200612181551.51277.heikki@sanbi.ac.za>
Message-ID: <FBD2CED3-EBE7-4CB9-8969-70C7A5931A04@uiuc.edu>


On Dec 18, 2006, at 7:51 AM, Heikki Lehvaslaiho wrote:

>
> Reading the discussion, I think it is time to draw some guidelines.
>
> 1. Base the Meta implementation on real use cases.
>
>    MSA is a good example.

AlignIO::stockholm is where I'll initially test it out.

> 2. Allow generalisations
>
>    If you can see another implementation of the same idea that can be
>    merged with the first, do it, but do not hurt yourself if you cannot.

I agree.

> The most difficult question is how to separate case-specific attributes
> that are best implemented by subclassing with additional methods from
> truly widely variable meta data that is best done as a parallel track
> meta information holding class.

I would probably start with a general Bio::Annotation::MetaI abstract  
class, which supplements AnnotationI with general meta-specific  
methods (meta, meta_text, named_meta, etc)?  Implement this in  
whatever way one wanted (RNA structure as strings, quality data as  
arrays, etc) under the constraints of the interface description.

Multiple meta objects, potentially of mixed data types, could be  
added in an AnnotationCollection along with other Bio::Annotation  
data, or stored in a nested meta-specific AnnotationCollection object  
(I favor the former as it's simpler).  So you could have an  
alignment, sequence, seqfeature (anything that is AnnotatableI) with  
a regular AnnotationCollection also containing possibly multiple meta  
objects, each meta object also containing possibly more than one set  
of meta data.

The key issue I have is whether or not to constrain these to  
describing positional data, similar to Bio::Seq::Meta, by ensuring  
that the data is_flush(), etc.  My current inclination is 'no', and  
to have a separate abstract class which describes these methods,  
implementing those separately.
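To make the is_flush() constraint concrete, here is a minimal sketch of a holder for named, position-specific meta strings. MetaSketch is a hypothetical class, not Bio::Seq::Meta or a proposed Bio::Annotation::MetaI; it borrows the method names mentioned in this thread (named_meta, is_flush) plus an assumed meta_names accessor, and uses the SS_cons line from the Pfam example quoted earlier in this thread.

```perl
use strict;
use warnings;

# Hypothetical container for named meta strings (e.g. Stockholm #=GC lines).
package MetaSketch;

sub new {
    my ($class, %args) = @_;
    return bless { length => $args{-length}, meta => {} }, $class;
}

# Store (when a value is given) and return one named meta string.
sub named_meta {
    my ($self, $name, $value) = @_;
    $self->{meta}{$name} = $value if defined $value;
    return $self->{meta}{$name};
}

sub meta_names { return sort keys %{ $_[0]{meta} } }

# True when every meta string is exactly as long as the annotated object,
# i.e. there is one meta character per alignment column.
sub is_flush {
    my ($self) = @_;
    for my $name ($self->meta_names) {
        return 0 if length($self->{meta}{$name}) != $self->{length};
    }
    return 1;
}

package main;

my $meta = MetaSketch->new(-length => 26);
$meta->named_meta('SS_cons', 'HHHHHHHHTTH...HHHHHHH..HTT');
my $state = $meta->is_flush ? 'flush' : 'not flush';
print "$state\n";
```

Keeping the length check in its own method, as suggested above, lets non-positional meta data reuse the same container without being forced to be flush.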

> The problem I see with undefined, totally open meta annotation, is
> that if you can put anything in there, it is also totally confusing
> to a user. If you can put anything in, how do you know what to get
> out and know that it is there?
>
> That leads to the third guideline:
>
> 3. Use separate meta classes only when there are several different ways
> of encoding data that is present in large numbers *and* when you are
> expecting to assess the data computationally rather than just checking
> if an attribute is there.
>
>
> 	-Heikki

The initial use case for this would be simple data strings for  
alignment data.  I already have a partial implementation in place for  
stockholm using Bio::Seq::Meta (which led me to this proposal!).  I  
like Chris M.'s idea of ensuring that meta implementations use some  
sort of formalized ontology, but I'll probably start out very simple  
and work up from there.

chris


From cjfields at uiuc.edu  Mon Dec 18 22:38:14 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 18 Dec 2006 16:38:14 -0600
Subject: [Bioperl-l] [Gmod-gbrowse] xyplot data alignment problem?
In-Reply-To: <4586BE99.7020308@sendu.me.uk>
References: <EA0BFA4F-8182-4C40-92DA-218CE3F48D8B@genomics.princeton.edu>	<6dce9a0b0612141356u63afe2dak7e1d8dad93408312@mail.gmail.com>
	<6dce9a0b0612150802x354a02a8ib17fbd882379c63c@mail.gmail.com>
	<458404BD.8030908@sendu.me.uk>
	<733825EE-0426-4D12-A02F-B8825CDEBBA9@gmx.net>
	<4586BE99.7020308@sendu.me.uk>
Message-ID: <6AD475AE-7F5E-4612-BC24-73B65AA47F30@uiuc.edu>


On Dec 18, 2006, at 10:15 AM, Sendu Bala wrote:

> Hilmar Lapp wrote:
>>
>> On Dec 16, 2006, at 9:37 AM, Sendu Bala wrote:
>>
>>> Lincoln Stein wrote:
>>>> This is very embarrassing for me, particularly since I spent a lot
>>>> of time validating that Bio::Graphics was working properly before
>>>> the 1.5.2 release went out. How long before there is a 1.5.3
>>>> release? How about a 1.5.2.1 release?
>>>
>>> I'm happy to try a point release for critical bug fixes. Why don't
>>> you commit the necessary fixes to branch-1-5-2 and let me know when
>>> you're happy, and I'll do 1.5.2.1.
>>
>> Feel free to do that, but why not make a 1.5.3 off the main trunk?
>> 1.5.2.1 may be adding more to the version confusion
>> (developer/stable/point-release/etc) than it is worth,
>
> My feeling is that 1.5.3 should be reserved for some significant changes
> and new features, and not just a few bug fixes. I'd say this causes less
> confusion amongst users - they can associate '1.5.2' with a certain API
> and feature-set, and the specific name of the file they download and
> install (bioperl-1.5.2_100.tar.gz vs bioperl-1.5.2_101.tar.gz) won't
> matter at all to them.
>
> I also won't have to make some major announcement about it; it will
> simply be the most recent developer version of bioperl available so new
> users trying to get 1.5.2 will end up getting 1.5.2.1, whilst existing
> 1.5.2 users will only feel compelled to get it if they suffer from the
> bugs fixed.
>
>
>> and there is no shame in releasing new developer versions every few
>> weeks.
>
> I think frequent releases are inadvisable; such a quick release
> won't have had much testing so we shouldn't encourage people to install
> it: encouragement is implicit when a major new version comes out like
> 1.5.3. People who want to live on the edge can and should be using a
> CVS checkout.

I thought that 1.5.2 was considered a point release for the 1.5 dev  
series, for bug fixes along with the potential for added/experimental  
features.  Similarly, 1.6.x releases would be point releases for bug  
fixes only with all tests passing (no added features since it is a  
stable release series).  I guess one could reason that 1.5.x releases  
have both bug fixes and new features, while 1.5.x.y releases are  
simply bug fixes for the 1.5.x branch (no new features).  We probably  
should add something to the FAQ and maybe make a few changes to the  
1.5.2 wiki page.

I think having a 1.5.2.1 release is feasible as a quick one-off to  
get Lincoln's fixes in, since you would make them off the 1.5.2  
branch anyway (so I guess it could be considered a bug release from  
that branch).  It's probably not something we should make a habit of,  
but then again I'm not the Pumpkin!

chris


From bix at sendu.me.uk  Mon Dec 18 22:50:11 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 18 Dec 2006 22:50:11 +0000
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on
	110	species
In-Reply-To: <63C1DC7D-2830-436A-BE95-7ECE3748C84D@uiuc.edu>
References: <F5C5C9A8-D444-4994-9769-AC5DE68F4A39@lsi.upc.edu>	<68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu>	<4577E4A2.5090303@sendu.me.uk>	<B290BEF7-81D6-4C0A-9EDA-348B8A75EEC8@lsi.upc.edu>	<4577EAAF.7030509@sendu.me.uk>	<0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu>	<4577EFD3.7090904@sendu.me.uk>	<250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu>
	<C91DCC7B-E368-475D-B83A-AC301A49624B@uiuc.edu>
	<4586E8C4.6030306@sendu.me.uk>
	<63C1DC7D-2830-436A-BE95-7ECE3748C84D@uiuc.edu>
Message-ID: <45871B23.8070103@sendu.me.uk>

Chris Fields wrote:
> 
> On Dec 18, 2006, at 1:15 PM, Sendu Bala wrote:
> 
>> For example, the first 'drop' happens for the 101st species which is
>> 'Leptospira interrogans serovar Copenhageni'. This is a variation
>> (descendant) of species 24: 'Leptospira interrogans'. So when the
>> variation is added it becomes a leaf and 'Leptospira interrogans' is no
>> longer a leaf, so the overall number of leaves does not increase.
>
> Makes sense now.  I personally would consider this a bug since the 
> results are unexpected (so the docs need to be modified in order to 
> clarify).  Some say tomato...
> 
> I suppose this is one of the issues one might run into when using NCBI 
> taxonomy to build trees.

No, the tree produced is perfectly fine. The taxonomy2tree.pl script 
deliberately then does:

# simple paths are contracted by removing degree one nodes
$tree->contract_linear_paths;

Because that is what Gabriel's script originally did.


>> Gabriel, please confirm this isn't a bug, or suggest how you propose to
>> see your taxa when they are not all leaves of the tree.
> 
> Having the nodes appear internally seems semantically correct to me.  Is 
> there any other way?

I suppose if we want to see all the input species output again we have 
to make contract_linear_paths() aware of nodes we want to keep, even 
when they are degree one nodes. Gabriel, is that what you want to see?
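A keep-aware contraction could look roughly like this. The sketch is plain Perl on a hypothetical name-to-children hash (not Bio::Tree::Node), with a simplified made-up taxonomy; it only models the idea of splicing out single-child nodes unless they are listed as taxa to keep, and does not reproduce the internals of BioPerl's contract_linear_paths().

```perl
use strict;
use warnings;

# Toy rooted tree: node name => arrayref of child names; leaves have [].
sub example_tree {
    return +{
        'root'                       => ['Leptospira'],
        'Leptospira'                 => ['L. interrogans', 'L. biflexa'],
        'L. interrogans'             => ['L. interrogans Copenhageni'],
        'L. interrogans Copenhageni' => [],
        'L. biflexa'                 => [],
    };
}

# Splice out every non-root node with exactly one child, reattaching that
# child to the node's parent, unless the node is listed in %$keep.
sub contract {
    my ($tree, $node, $keep) = @_;
    my @new_children;
    for my $child (@{ $tree->{$node} }) {
        my $c = $child;
        # Walk down through removable pass-through nodes.
        while (@{ $tree->{$c} } == 1 && !$keep->{$c}) {
            my $next = $tree->{$c}[0];
            delete $tree->{$c};
            $c = $next;
        }
        contract($tree, $c, $keep);
        push @new_children, $c;
    }
    $tree->{$node} = \@new_children;
    return $tree;
}

my $plain = contract(example_tree(), 'root', {});
my $kept  = contract(example_tree(), 'root', { 'L. interrogans' => 1 });

print join(', ', @{ $plain->{'Leptospira'} }), "\n";
print join(', ', @{ $kept->{'Leptospira'} }), "\n";
```

Without a keep list the input species 'L. interrogans' vanishes; listing the input species keeps it as an internal node above its serovar.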


From cjfields at uiuc.edu  Mon Dec 18 23:14:23 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 18 Dec 2006 17:14:23 -0600
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on
	110	species
In-Reply-To: <45871B23.8070103@sendu.me.uk>
References: <F5C5C9A8-D444-4994-9769-AC5DE68F4A39@lsi.upc.edu>	<68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu>	<4577E4A2.5090303@sendu.me.uk>	<B290BEF7-81D6-4C0A-9EDA-348B8A75EEC8@lsi.upc.edu>	<4577EAAF.7030509@sendu.me.uk>	<0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu>	<4577EFD3.7090904@sendu.me.uk>	<250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu>
	<C91DCC7B-E368-475D-B83A-AC301A49624B@uiuc.edu>
	<4586E8C4.6030306@sendu.me.uk>
	<63C1DC7D-2830-436A-BE95-7ECE3748C84D@uiuc.edu>
	<45871B23.8070103@sendu.me.uk>
Message-ID: <CE808784-8068-44C5-82A8-BE852890E4DF@uiuc.edu>


On Dec 18, 2006, at 4:50 PM, Sendu Bala wrote:

> Chris Fields wrote:
>> On Dec 18, 2006, at 1:15 PM, Sendu Bala wrote:
>>> For example, the first 'drop' happens for the 101st species which is
>>> 'Leptospira interrogans serovar Copenhageni'. This is a variation
>>> (descendant) of species 24: 'Leptospira interrogans'. So when the
>>> variation is added it becomes a leaf and 'Leptospira interrogans' is no
>>> longer a leaf, so the overall number of leaves does not increase.
>>
>> Makes sense now.  I personally would consider this a bug since the  
>> results are unexpected (so the docs need to be modified in order  
>> to clarify).  Some say tomato...
>> I suppose this is one of the issues one might run into when using  
>> NCBI taxonomy to build trees.
>
> No, the tree produced is perfectly fine. The taxonomy2tree.pl  
> script deliberately then does:
>
> # simple paths are contracted by removing degree one nodes
> $tree->contract_linear_paths;
>
> Because that is what Gabriel's script originally did.

I think you misunderstood me.  The tree is fine; the data used to  
make the tree (NCBI taxonomy) is the issue.  One of the clear caveats  
that NCBI attaches to their taxonomy data is that it should not be the
'primary source for taxonomic or phylogenetic information':

http://tinyurl.com/y3k624

I think it works as a good guide as long as one takes the above into  
consideration.  That and the fact that not all taxids attached to  
sequence data will represent leaf nodes.

chris


From cjfields at uiuc.edu  Mon Dec 18 23:15:56 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 18 Dec 2006 17:15:56 -0600
Subject: [Bioperl-l] Proposal for Meta data
In-Reply-To: <6747C74C-8A49-4169-8A3B-8A26134C3B0D@fruitfly.org>
References: <B3EF69DB-9C01-4F42-A4E4-898613D872F9@uiuc.edu>
	<32BE3FCF-C788-438F-8A4A-8A586DD6C569@bioperl.org>
	<F302B7DD-C806-4A6F-ACDF-9F27A84E0BF0@uiuc.edu>
	<200612181551.51277.heikki@sanbi.ac.za>
	<6747C74C-8A49-4169-8A3B-8A26134C3B0D@fruitfly.org>
Message-ID: <16D6DB51-C2CB-4E89-A597-4672FAA6681B@uiuc.edu>


On Dec 18, 2006, at 3:50 PM, Chris Mungall wrote:

>
> I agree with everything Heikki is saying, I just wanted to highlight
> one paragraph:
>
>> The problem I see with undefined, totally open meta annotation, is
>> that if you can put anything in there, it is also totally confusing
>> to a user. If you can put anything in, how do you know what to get
>> out and know that it is there?
>
> One solution is to give your annotation/metadata-model formal
> computational semantics and use ontologies to give additional
> semantics to your metadata tags. This provides both user information
> in the form of documentation, and a means of specifying to the
> computer exactly what should be done with the tags.
>
> This is probably overkill for bioperl; but if the use cases being
> proposed do lean in the direction of a new metadata system that is
> not necessarily backwards compatible with the existing one, then I'd
> recommend checking out what's already out there before re-inventing
> the wheel. Perl RDF libraries are getting a little better.
>
> If anyone is interested in pursuing this sort of thing (probably on a
> branch), let me know
...

I like the idea of using ontologies (although that's one of my
many weak points!).  I'll likely start off with simple examples using  
meta data initially, then progress from there.  It is a developer  
series, after all!

Thanks everybody!  I think I have an idea on how to at least get  
started.

chris


From bix at sendu.me.uk  Mon Dec 18 23:27:15 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 18 Dec 2006 23:27:15 +0000
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on
	110	species
In-Reply-To: <CE808784-8068-44C5-82A8-BE852890E4DF@uiuc.edu>
References: <F5C5C9A8-D444-4994-9769-AC5DE68F4A39@lsi.upc.edu>	<68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu>	<4577E4A2.5090303@sendu.me.uk>	<B290BEF7-81D6-4C0A-9EDA-348B8A75EEC8@lsi.upc.edu>	<4577EAAF.7030509@sendu.me.uk>	<0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu>	<4577EFD3.7090904@sendu.me.uk>	<250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu>
	<C91DCC7B-E368-475D-B83A-AC301A49624B@uiuc.edu>
	<4586E8C4.6030306@sendu.me.uk>
	<63C1DC7D-2830-436A-BE95-7ECE3748C84D@uiuc.edu>
	<45871B23.8070103@sendu.me.uk>
	<CE808784-8068-44C5-82A8-BE852890E4DF@uiuc.edu>
Message-ID: <458723D3.4010908@sendu.me.uk>

Chris Fields wrote:
> 
> On Dec 18, 2006, at 4:50 PM, Sendu Bala wrote:
> 
>> Chris Fields wrote:
>>> On Dec 18, 2006, at 1:15 PM, Sendu Bala wrote:
>>>> For example, the first 'drop' happens for the 101st species which is
>>>> 'Leptospira interrogans serovar Copenhageni'. This is a variation
>>>> (descendant) of species 24: 'Leptospira interrogans'. So when the
>>>> variation is added it becomes a leaf and 'Leptospira interrogans' is no
>>>> longer a leaf, so the overall number of leaves does not increase.
>>>
>>> Makes sense now.  I personally would consider this a bug since the 
>>> results are unexpected (so the docs need to be modified in order to 
>>> clarify).  Some say tomato...
>>> I suppose this is one of the issues one might run into when using 
>>> NCBI taxonomy to build trees.
>>
>> No, the tree produced is perfectly fine. The taxonomy2tree.pl script 
>> deliberately then does:
>>
>> # simple paths are contracted by removing degree one nodes
>> $tree->contract_linear_paths;
>>
>> Because that is what Gabriel's script originally did.
> 
> I think you misunderstood me.  The tree is fine; the data used to make 
> the tree (NCBI taxonomy) is the issue.

In what way is it the issue? The database is also fine as far as I can 
see, in so far as it is not causing any problems in this instance.

Gabriel asked for a tree featuring a species and its subspecies. The 
NCBI taxonomy database provided Bioperl the correct data to build such a 
tree. Then Gabriel asked to remove the degree one nodes of his tree. His 
problem was that doing that happened to (correctly) remove the species 
node. If he wants to see both his species and his subspecies he must 
either not remove degree one nodes, or alter the method of doing so to 
keep desired nodes. There is no possible way for NCBI to improve matters 
here.
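A tiny tree is enough to check Sendu's explanation of why the leaf count does not grow when a serovar is added beneath its species. The hash-of-children layout below is a hypothetical illustration, not a BioPerl structure, and leaves() is a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical layout: node name => [child names]; returns the leaf names.
sub leaves {
    my ($children, $node) = @_;
    my @kids = @{ $children->{$node} || [] };
    return ($node) unless @kids;
    return map { leaves($children, $_) } @kids;
}

my %tree = (
    root        => ['Leptospira', 'Escherichia'],
    Leptospira  => ['L_interrogans'],
    Escherichia => ['E_coli'],
);
my $before = () = leaves(\%tree, 'root');

# Add the serovar under its species: the serovar becomes a leaf and the
# species stops being one, so the count is unchanged.
push @{ $tree{L_interrogans} }, 'L_i_Copenhageni';
my $after = () = leaves(\%tree, 'root');

print "leaves before: $before, after: $after\n";
```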


From bix at sendu.me.uk  Mon Dec 18 23:45:59 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 18 Dec 2006 23:45:59 +0000
Subject: [Bioperl-l] module to find out primer binding sites in a genome
 sequence
In-Reply-To: <A4D285B054CE4641A93F1B2046B2B3CD0761E4@mail01.sbri.org>
References: <A4D285B054CE4641A93F1B2046B2B3CD0761E4@mail01.sbri.org>
Message-ID: <45872837.6050403@sendu.me.uk>

Gowthaman Ramasamy wrote:
> Hi List, Is there any module in bioperl which can find out the primer
> binding sites in a genomic sequence. I am interested in finding
> locations with few mismatches along the primer...not just the exact
> match (which is a very trivial task)

There's no module dedicated to that task, but Bioperl may help you to
answer the question.

Probably the easiest, most reliable and clearest approach is to run BLAST
with settings appropriate for short sequences with few mismatches. You can
then write a script to only consider hits for your forward primer that are
a 'primable' distance from a hit to your reverse primer (and check that
their orientations are correct as well).

Or use an e-PCR tool.
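Once the hits have been parsed (for example from Bio::SearchIO HSPs), the BLAST-then-filter approach could look roughly like this. The hit records use a hypothetical plain-hash format with start <= end and a separate strand of 1 or -1. pair_hits() is an illustrative name, and the maximum product length is an assumption you would choose for your PCR.

```perl
use strict;
use warnings;

# Hypothetical hit records, e.g. copied out of BLAST HSPs:
#   { start => 1-based start, end => 1-based end (start <= end), strand => 1 or -1 }
# A product needs one primer on the plus strand and the other primer on the
# minus strand downstream of it, no more than $max_len bases apart.
sub pair_hits {
    my ($fwd_hits, $rev_hits, $max_len) = @_;
    my @amplicons;
    # either primer may be the one annealing to the plus strand
    for my $pair ([$fwd_hits, $rev_hits], [$rev_hits, $fwd_hits]) {
        my ($left_hits, $right_hits) = @$pair;
        for my $l (grep { $_->{strand} == 1 } @$left_hits) {
            for my $r (grep { $_->{strand} == -1 } @$right_hits) {
                next unless $r->{start} > $l->{end};      # right primer must lie downstream
                my $len = $r->{end} - $l->{start} + 1;    # product length incl. primers
                push @amplicons, { start => $l->{start}, end => $r->{end}, length => $len }
                    if $len <= $max_len;
            }
        }
    }
    return @amplicons;
}
```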


From Kevin.M.Brown at asu.edu  Mon Dec 18 23:52:20 2006
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Mon, 18 Dec 2006 16:52:20 -0700
Subject: [Bioperl-l] module to find out primer binding sites in a genome
	sequence
Message-ID: <1A4207F8295607498283FE9E93B775B40270F3BB@EX02.asurite.ad.asu.edu>

A function I use to find the first landing site for a primer. It should be
modifiable to handle multiple occurrences:

=head1 C<match>

Match searches for a near alignment between two strings and returns the
(1-based) position at which the two strings align. A hit requires more
than 80% of the primer's bases to match; otherwise 0 is returned.

	match($this, $in_that)
	
=cut

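sub match
{
	my ($primer, $gene) = @_;
	my $start   = 0;     # 1-based start of the current match (0 = none)
	my $pattern = "";    # regex built up one primer base at a time
	for (my $i = 0 ; $i < length($primer) ; $i++)
	{
		# extend the pattern by the next primer base
		$pattern .= substr($primer, $i, 1);
		pos($gene) = 0;    # always search from the start of the sequence
		if ($gene =~ m/$pattern/gi)
		{
			# pos() is the end of the match; convert it to a 1-based start
			$start = pos($gene) - length($pattern) + 1;
		}
		else
		{
			# no match: turn the base just added into a wildcard (a mismatch)
			$start = 0;
			chop($pattern);
			$pattern .= '.';
		}
	}
	# if the last base was a mismatch, search again to recover the start
	if ($pattern =~ /\.$/)
	{
		pos($gene) = 0;
		if ($gene =~ m/$pattern/gi)
		{
			$start = pos($gene) - length($pattern) + 1;
		}
	}
	# count the matched (non-wildcard) bases
	$pattern =~ s/\.//g;

	# require more than 80% of the primer to match
	if ((length($pattern) / length($primer)) > .8)
	{
		return $start;
	}
	return 0;
}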
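The sub above returns only the first landing site. A different, brute-force technique reports every position within a given number of mismatches: a sliding Hamming-distance scan that allows substitutions only, with no gaps. all_sites() is a hypothetical name. It scans only the plus strand, so you would also run it with the reverse complement of the primer.

```perl
use strict;
use warnings;

# Slide the primer along the sequence (plus strand only) and return
# [1-based position, mismatch count] for every window with at most
# $max_mm substitutions. No gaps are allowed.
sub all_sites {
    my ($primer, $seq, $max_mm) = @_;
    $primer = uc $primer;
    $seq    = uc $seq;
    my $plen = length $primer;
    my @sites;
    for my $i (0 .. length($seq) - $plen) {
        my $window = substr($seq, $i, $plen);
        # string XOR leaves "\0" wherever the two bases agree
        my $diff = $window ^ $primer;
        my $mm   = ($diff =~ tr/\0//c);    # count the non-"\0" bytes
        push @sites, [$i + 1, $mm] if $mm <= $max_mm;
    }
    return @sites;
}

my @sites = all_sites('ACGT', 'ttACGTaaACCTgg', 1);
printf "position %d, %d mismatch(es)\n", @$_ for @sites;
```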



From torsten.seemann at infotech.monash.edu.au  Mon Dec 18 23:52:58 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Tue, 19 Dec 2006 10:52:58 +1100
Subject: [Bioperl-l] module to find out primer binding sites in a genome
 sequence
In-Reply-To: <A4D285B054CE4641A93F1B2046B2B3CD0761E4@mail01.sbri.org>
References: <A4D285B054CE4641A93F1B2046B2B3CD0761E4@mail01.sbri.org>
Message-ID: <458729DA.9030909@infotech.monash.edu.au>

Gowthaman Ramasamy wrote:
> Hi List,
> Is there any module in bioperl which can find out the primer binding sites in a genomic sequence.
> I am interested in finding locations with few mismatches along the primer...not just the exact match (which is a very trivial task)

This FAQ question may help:
http://www.bioperl.org/wiki/FAQ#How_do_I_do_motif_searches_with_BioPerl.3F_Can_I_do_.22find_all_sequences_that_are_75.25_identical.22_to_a_given_motif.3F

This software may help:
http://frodo.wi.mit.edu/cgi-bin/primer3/primer3_www.cgi

-- 
Dr Torsten Seemann               http://www.vicbioinformatics.com
Victorian Bioinformatics Consortium, Monash University, Australia


From sdavis2 at mail.nih.gov  Tue Dec 19 02:16:19 2006
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Mon, 18 Dec 2006 21:16:19 -0500
Subject: [Bioperl-l] module to find out primer binding sites in a genome
 sequence
In-Reply-To: <A4D285B054CE4641A93F1B2046B2B3CD0761E4@mail01.sbri.org>
References: <A4D285B054CE4641A93F1B2046B2B3CD0761E4@mail01.sbri.org>
Message-ID: <45874B73.7010600@mail.nih.gov>

Gowthaman Ramasamy wrote:
> Hi List,
> Is there any module in bioperl which can find out the primer binding sites in a genomic sequence.
> I am interested in finding locations with few mismatches along the primer...not just the exact match (which is a very trivial task)
>   

See here:

http://genome.ucsc.edu/cgi-bin/hgPcr?command=start

It is designed for exactly this task, is very fast, is available as an 
executable or web-based (though watch the usage requirements), and the 
output can be parsed rather easily.

Sean


From cjfields at uiuc.edu  Tue Dec 19 02:30:04 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 18 Dec 2006 20:30:04 -0600
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on
	110	species
In-Reply-To: <458723D3.4010908@sendu.me.uk>
References: <F5C5C9A8-D444-4994-9769-AC5DE68F4A39@lsi.upc.edu>	<68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu>	<4577E4A2.5090303@sendu.me.uk>	<B290BEF7-81D6-4C0A-9EDA-348B8A75EEC8@lsi.upc.edu>	<4577EAAF.7030509@sendu.me.uk>	<0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu>	<4577EFD3.7090904@sendu.me.uk>	<250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu>
	<C91DCC7B-E368-475D-B83A-AC301A49624B@uiuc.edu>
	<4586E8C4.6030306@sendu.me.uk>
	<63C1DC7D-2830-436A-BE95-7ECE3748C84D@uiuc.edu>
	<45871B23.8070103@sendu.me.uk>
	<CE808784-8068-44C5-82A8-BE852890E4DF@uiuc.edu>
	<458723D3.4010908@sendu.me.uk>
Message-ID: <2638D8ED-A3B3-4EF8-978E-216C5F875D88@uiuc.edu>

>> I think you misunderstood me.  The tree is fine; the data used to
>> make the tree (NCBI taxonomy) is the issue.
>
> In what way is it the issue? The database is also fine as far as I can
> see, in so far as it is not causing any problems in this instance.

I should maybe have clarified a bit more: what I said has nothing to  
do with the structure of the database itself.  I was just pointing  
out that NCBI Taxonomy isn't the best source of data for building a  
phylogenetic tree, something NCBI also points out (the link in my  
last post).  Not a big deal, really.

> Gabriel asked for a tree featuring a species and its subspecies. The
> NCBI taxonomy database provided Bioperl the correct data to build such a
> tree. Then Gabriel asked to remove the degree one nodes of his tree. His
> problem was that doing that happened to (correctly) remove the species
> node. If he wants to see both his species and his subspecies he must
> either not remove degree one nodes, or alter the method of doing so to
> keep desired nodes. There is no possible way for NCBI to improve matters
> here.

Actually, there isn't any way they could without digging through the
literature in many cases.  The problem is incomplete taxonomic
information for nodes derived from older sequence data, where a genus
and species were designated but nothing else (strain, etc.) is known.

Again, I was merely pointing out what I had mentioned above.  I
wasn't criticizing you, Gabriel, or the methodology here.  Honest!

chris


From avilella at gmail.com  Mon Dec 18 21:43:27 2006
From: avilella at gmail.com (Albert Vilella)
Date: Mon, 18 Dec 2006 21:43:27 +0000
Subject: [Bioperl-l] PAML files
In-Reply-To: <4586F3A2.4010607@ub.edu>
References: <4586BF57.4090002@ub.edu>
	<34C4970D-6F93-4CE4-878C-5FA4C916AAEC@bioperl.org>
	<4586F3A2.4010607@ub.edu>
Message-ID: <358f4d650612181343o5bd51169w7b46cceb34a5c92b@mail.gmail.com>

Filipe, if you need to create the ctl file but not run the job, you
can use the "prepare" method of the Codeml runner
(Bio::Tools::Run::Phylo::PAML::Codeml).

Also, there are tmpdir and save_tempfiles methods that will keep the
files where you want. As for the naming, you can add ".tree" and
".aln" extensions to the temporary file names if you want, by altering the
$temptreefile and $tempseqfile variables in
bioperl-run/Bio/Tools/Run/Phylo/PAML/Codeml.pm (cvs head version).

If you want, you can also add a couple of getters/setters there:

sub alnfilename{
    my $self = shift;

    return $self->{'alnfilename'} = shift if @_;
    return $self->{'alnfilename'};
}

and replace those $tempseqfile io calls with
$self->{'alnfilename'} io calls.

$codeml->alnfilename("/path/name");
$codeml->prepare;
...
$codeml->run;

What I usually do is keep the aln and tree files in a different
place. Codeml will create its temporary files for the run somewhere
else, and then delete them all when done.
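Filipe wants to run codeml in batch with three tree topologies. One option outside the Codeml wrapper is to write one control file per tree yourself and pass each file to the codeml executable. This is a sketch under stated assumptions. write_ctl_files() is hypothetical, and it writes only the seqfile, treefile and outfile keys. The remaining codeml.ctl options (model, NSsites and so on) would have to come from your own template.

```perl
use strict;
use warnings;

# Hypothetical batch helper: one codeml control file per tree topology, all
# pointing at the same alignment. Only seqfile/treefile/outfile are written;
# the other codeml.ctl settings are assumed to come from your own template.
sub write_ctl_files {
    my ($dir, $aln, @trees) = @_;
    my @ctl;
    for my $i (0 .. $#trees) {
        my $n    = $i + 1;
        my $path = "$dir/codeml_tree$n.ctl";
        open my $fh, '>', $path or die "Cannot write $path: $!";
        print $fh "seqfile = $aln\n",
                  "treefile = $trees[$i]\n",
                  "outfile = mlc_tree$n\n";
        close $fh or die "Cannot close $path: $!";
        push @ctl, $path;
    }
    return @ctl;
}
```

Each file could then be run with something like system('codeml', $ctl). It is probably safest to run each one in its own directory, because codeml also writes auxiliary files (for example rst) to the working directory.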

Cheers,

    Albert.

On 12/18/06, Filipe Garrett <fgarret at ub.edu> wrote:
>
> Hi Jason,
>
> This question is related with the one I made previously today.
> I need to run codeml with 3 tree topologies. I looked on codeml module
> but it only accepts one tree as input so I thought of using the codeml
> module to prepare all the files and then I would just have to run the
> codeml with the new tree file in batch. But for that I need to know
> which one is the ctl file.
>
> any idea?
> FG
>
> Jason Stajich wrote:
> > They are temporary names so they are deliberately random and there is no
> > intention of you needing them after a run since it is to be cleaned up on
> > the fly. We use an internal method for generating tempfiles that takes
> > care of cleanup afterwards.  I suppose since we do all the work within a
> > temp directory that is cleaned up, one could have a fixed name for the
> > tree, alignment, and ctl files but honestly we never expect people to be
> > reading these filenames as they are intended to be transient.
> >
> > What problem are you having that you need the filename?
> >
> > -jason
> > On Dec 18, 2006, at 11:18 AM, Filipe Garrett wrote:
> >
> >> Hi all,
> >>
> >> does anyone know how to get the name of the .ctl file created by the
> >> PAML module? Inside the tmp directory there are 2 files with random
> >> names (tree and ctl). Why do they have random names?? Wouldn't it be
> >> easier to assign them a fixed name?? For instance "codeml.ctl" and
> >> "tree.nwk"??
> >>
> >> thanks in adv,
> >> FG
> >
> > --
> > Jason Stajich
> > jason at bioperl.org <mailto:jason at bioperl.org>
> > http://jason.open-bio.org/
> >
> >


From valiente at lsi.upc.edu  Tue Dec 19 04:18:20 2006
From: valiente at lsi.upc.edu (Gabriel Valiente)
Date: Tue, 19 Dec 2006 13:18:20 +0900
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on
	110	species
In-Reply-To: <2638D8ED-A3B3-4EF8-978E-216C5F875D88@uiuc.edu>
References: <F5C5C9A8-D444-4994-9769-AC5DE68F4A39@lsi.upc.edu>	<68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu>	<4577E4A2.5090303@sendu.me.uk>	<B290BEF7-81D6-4C0A-9EDA-348B8A75EEC8@lsi.upc.edu>	<4577EAAF.7030509@sendu.me.uk>	<0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu>	<4577EFD3.7090904@sendu.me.uk>	<250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu>
	<C91DCC7B-E368-475D-B83A-AC301A49624B@uiuc.edu>
	<4586E8C4.6030306@sendu.me.uk>
	<63C1DC7D-2830-436A-BE95-7ECE3748C84D@uiuc.edu>
	<45871B23.8070103@sendu.me.uk>
	<CE808784-8068-44C5-82A8-BE852890E4DF@uiuc.edu>
	<458723D3.4010908@sendu.me.uk>
	<2638D8ED-A3B3-4EF8-978E-216C5F875D88@uiuc.edu>
Message-ID: <287263A7-A84A-413E-AA9D-9258261A90C1@lsi.upc.edu>

Thanks a lot for the prompt answer and follow-up discussion. I think
this turned out not to be a bug in the merge_lineage() code but a
direct consequence of building a phylogenetic tree instead of a
taxonomic tree (that is, one with internal node labels).

In order to reconstruct the NCBI taxonomy for the set of species  
present in a given phylogenetic tree, the only reasonable work-around  
seems to be a first step of merging lineages and contracting linear  
paths with the current implementation, followed by a second step of  
restricting the given phylogenetic tree to the set of species present  
in the obtained NCBI taxonomy. But this does not affect the code for  
merge_lineage().
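Gabriel's second step restricts the phylogenetic tree to the species present in the NCBI-derived tree. That amounts to pruning the unwanted leaves and then splicing out internal nodes left with one child. Below is a sketch on a hypothetical hash-of-children tree, not the Bio::Tree API. restrict_tree() is an illustrative name.

```perl
use strict;
use warnings;

# Hypothetical layout (not Bio::Tree::Tree): node name => [child names].
# Keep only the leaves listed in %$wanted, drop subtrees left empty, and
# splice out non-root internal nodes that end up with a single child.
sub restrict_tree {
    my ($children, $root, $wanted) = @_;
    my %out;
    my $visit;
    $visit = sub {
        my ($node) = @_;
        my @kids = @{ $children->{$node} || [] };
        unless (@kids) {                      # a leaf: keep it only if wanted
            return $wanted->{$node} ? ($node) : ();
        }
        my @kept = map { $visit->($_) } @kids;
        return () unless @kept;               # nothing wanted below this node
        return $kept[0] if @kept == 1 && $node ne $root;    # degree-one node
        $out{$node} = \@kept;
        return $node;
    };
    my ($new_root) = $visit->($root);
    undef $visit;                             # break the closure's self-reference
    return ($new_root, \%out);
}

my %phylo = (
    root => ['x', 'y'],
    x    => ['A', 'B'],
    y    => ['C', 'z'],
    z    => ['D', 'E'],
);
my ($r, $t) = restrict_tree(\%phylo, 'root', { A => 1, C => 1, D => 1 });
```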

Gabriel



From cjfields at uiuc.edu  Tue Dec 19 04:41:16 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 18 Dec 2006 22:41:16 -0600
Subject: [Bioperl-l] Script taxonomy2tree version 1.4 crashed on
	110	species
In-Reply-To: <287263A7-A84A-413E-AA9D-9258261A90C1@lsi.upc.edu>
References: <F5C5C9A8-D444-4994-9769-AC5DE68F4A39@lsi.upc.edu>	<68BD6738-F774-406B-B244-842E74DF4815@lsi.upc.edu>	<4577E4A2.5090303@sendu.me.uk>	<B290BEF7-81D6-4C0A-9EDA-348B8A75EEC8@lsi.upc.edu>	<4577EAAF.7030509@sendu.me.uk>	<0E425278-28C6-49EC-A80C-ACBB8F36E423@lsi.upc.edu>	<4577EFD3.7090904@sendu.me.uk>	<250E1BDB-87B1-4114-8A2C-B7122E727B2A@lsi.upc.edu>
	<C91DCC7B-E368-475D-B83A-AC301A49624B@uiuc.edu>
	<4586E8C4.6030306@sendu.me.uk>
	<63C1DC7D-2830-436A-BE95-7ECE3748C84D@uiuc.edu>
	<45871B23.8070103@sendu.me.uk>
	<CE808784-8068-44C5-82A8-BE852890E4DF@uiuc.edu>
	<458723D3.4010908@sendu.me.uk>
	<2638D8ED-A3B3-4EF8-978E-216C5F875D88@uiuc.edu>
	<287263A7-A84A-413E-AA9D-9258261A90C1@lsi.upc.edu>
Message-ID: <D72C19DB-B551-414E-96AF-113B32A34BCB@uiuc.edu>


On Dec 18, 2006, at 10:18 PM, Gabriel Valiente wrote:

> Thanks a lot for the prompt answer and follow-up discussion. I  
> think this turned out not to be a bug in the merge_lineage() code  
> but a direct consequence of building a phylogenetic tree instead of  
> a taxonomic tree, aka with internal node labels.
>
> In order to reconstruct the NCBI taxonomy for the set of species  
> present in a given phylogenetic tree, the only reasonable work- 
> around seems to be a first step of merging lineages and contracting  
> linear paths with the current implementation, followed by a second  
> step of restricting the given phylogenetic tree to the set of  
> species present in the obtained NCBI taxonomy. But this does not  
> affect the code for merge_lineage().
>
> Gabriel

I did notice one thing, though it's minor: if you use the option to  
retrieve the data from Entrez, a few species aren't found (even  
though they show up in a local taxonomy search).  I think both were  
E. coli strains.

chris


From DGroskreutz at twt.com  Tue Dec 19 07:00:40 2006
From: DGroskreutz at twt.com (DGroskreutz at twt.com)
Date: Tue, 19 Dec 2006 01:00:40 -0600
Subject: [Bioperl-l] CN=Deb Groskreutz/OU=MSN/O=TWT is out of the office.
Message-ID: <OFEB7AC000.56E72ED8-ON86257249.002683B4-86257249.002683B4@twt.com>


I will be out of the office starting  12/18/2006 and will not return until
01/02/2007.


NOTICE OF CONFIDENTIALITY:
The information contained in this communication, including attachments, is intended for the specific delivery to and use by the individual(s) to whom it is addressed. This email includes confidential information that may be attorney-client privileged. Any review, retransmission, dissemination, or unauthorized use of this communication is strictly prohibited and may be unlawful. If you have received this communication in error, please reply to the sender immediately and delete the original communication and any copy of it from your computer system, including all attachments.


From michael.watson at bbsrc.ac.uk  Tue Dec 19 12:20:56 2006
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Tue, 19 Dec 2006 12:20:56 -0000
Subject: [Bioperl-l] Problems with EMBL entries and fasta IDs?
Message-ID: <8975119BCD0AC5419D61A9CF1A923E9503E2E67F@iahce2ksrv1.iah.bbsrc.ac.uk>

Hi

I'm using bioperl-1.4.  I did do a Google search for this but couldn't
find anything.  If this is fixed in 1.5.2 then forgive me.

I'm getting a warning:

MSG: No whitespace allowed in FASTA ID [unknown id]

When trying to convert from EMBL format to fasta.  The offending
sequence is CK234114:

ID   CK234114; SV 1; linear; mRNA; EST; VRT; 244 BP.
XX
AC   CK234114;
XX
DT   03-MAR-2004 (Rel. 79, Created)
DT   03-MAR-2004 (Rel. 79, Last updated, Version 1)
XX
DE   SB010002000A01 JUWNL1 Normalized Zebra Finch Juvenile Telencephalon cDNA
DE   Library SB01 Taeniopygia guttata cDNA clone SB010002000A01 5', mRNA
DE   sequence.
Etc

Any advice?

Mick

The information contained in this message may be confidential or legally
privileged and is intended solely for the addressee. If you have
received this message in error please delete it & notify the originator
immediately.
Unauthorised use, disclosure, copying or alteration of this message is
forbidden & may be unlawful. 
The contents of this e-mail are the views of the sender and do not
necessarily represent the views of the Institute. 
This email and associated attachments has been checked locally for
viruses but we can accept no responsibility once it has left our
systems.
Communications on Institute computers are monitored to secure the
effective operation of the systems and for other lawful purposes. 


From michael.watson at bbsrc.ac.uk  Tue Dec 19 12:27:59 2006
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Tue, 19 Dec 2006 12:27:59 -0000
Subject: [Bioperl-l] Problems with EMBL entries and fasta IDs?
In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E9503E2E67F@iahce2ksrv1.iah.bbsrc.ac.uk>
Message-ID: <8975119BCD0AC5419D61A9CF1A923E9503E2E682@iahce2ksrv1.iah.bbsrc.ac.uk>

Sorry, problem solved.

Mick 



From roest216 at student.otago.ac.nz  Tue Dec 19 09:15:55 2006
From: roest216 at student.otago.ac.nz (Stephan Roessner)
Date: Tue, 19 Dec 2006 22:15:55 +1300
Subject: [Bioperl-l] problems installing bioperl
Message-ID: <1166519755.4587adcb141d3@www.studentmail.otago.ac.nz>

Dear support team,

I installed bioperl 1.5.2_100 on a Fedora machine to be able to use
gbrowse.
The installation seems to work (except for the test failures), but the
gbrowse installation tells me that Bio::Perl 1.0050021 is installed,
when of course it requires 1.5.2.

Is there a chance to find out what went wrong?

thanks a lot,
Stephan


From bix at sendu.me.uk  Tue Dec 19 15:12:39 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 19 Dec 2006 15:12:39 +0000
Subject: [Bioperl-l] problems installing bioperl
In-Reply-To: <1166519755.4587adcb141d3@www.studentmail.otago.ac.nz>
References: <1166519755.4587adcb141d3@www.studentmail.otago.ac.nz>
Message-ID: <45880167.9010605@sendu.me.uk>

Stephan Roessner wrote:
> Dear support team,
> 
> I installed bioperl 1.5.2_100 on a ferdora machine to be able to use
> gbrowse.
> The installation seems to work (except of the test failures) but the
> gbrowse installation tells me that BIO::pERL 1.0050021 is installed, but
> of course it requires 1.52.
> 
> Is there a chance to find out what went wrong?

Nothing went wrong with the Bioperl installation (well, except that there
shouldn't have been any test failures - can you post those please?).
gbrowse simply defined its Bioperl requirement incorrectly. If you tell
me exactly where you downloaded gbrowse from and how you went about
installing it, and provide the exact, complete error message you got
from it, I might be able to help the authors fix the problem.

Or I'm pretty sure they can figure it out for themselves :)
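The odd-looking "1.0050021" is plausibly just 1.5.2_100 in numeric form. This sketch assumes that BioPerl's numeric $VERSION packs the release as 1.MMMPPPDDD, with three digits each for minor, patch and developer number. It also assumes trailing zeros are dropped when the number is printed. decode_bioperl_version() is a hypothetical helper, and the scheme does not apply to older versions such as plain 1.4.

```perl
use strict;
use warnings;

# Assumed encoding: 1.MMMPPPDDD, e.g. 1.005002100 for 1.5.2_100.
sub decode_bioperl_version {
    my ($v) = @_;
    my ($major, $frac) = split /\./, $v, 2;
    $frac = '' unless defined $frac;
    # restore trailing zeros lost when the number was printed
    $frac .= '0' x ((3 - length($frac) % 3) % 3);
    my ($minor, $patch, $dev) = map { $_ + 0 } unpack '(A3)*', $frac;
    my $str = join '.', $major, $minor || 0, $patch || 0;
    $str .= "_$dev" if $dev;
    return $str;
}

print decode_bioperl_version('1.0050021'), "\n";
```

Numerically, 1.0050021 is greater than 1.005002. Under this assumption, the installed version would therefore satisfy a 1.5.2 requirement.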


From cjfields at uiuc.edu  Tue Dec 19 16:05:01 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 19 Dec 2006 10:05:01 -0600
Subject: [Bioperl-l] [Gmod-gbrowse]  problems installing bioperl
In-Reply-To: <1166542310.6981.119.camel@localhost.localdomain>
References: <1166519755.4587adcb141d3@www.studentmail.otago.ac.nz>
	<45880167.9010605@sendu.me.uk>
	<1166542310.6981.119.camel@localhost.localdomain>
Message-ID: <8D5C45A3-A90A-49D7-A7E7-888C977759AC@uiuc.edu>


On Dec 19, 2006, at 9:31 AM, Scott Cain wrote:

> I really don't think the BioPerl version detection is wrong.  I actually
> don't check Bio::Root::Version::VERSION in Makefile.PL, I check
> Bio::Graphics::Panel->api_version.  When it doesn't find the correct
> api_version, it gives a warning that BioPerl 1.5.2 is not installed.  I
> have seen this happen when more than one BioPerl instance is installed
> and `perl Makefile.PL` finds the wrong one first.  My suggestion is to
> try reinstalling BioPerl and providing the --uninst 1 argument to remove
> older versions of BioPerl:
>
>   sudo ./Build install --uninst 1
>
> Scott

Could having two Bioperl instances explain the test failures?  I'm  
not sure (maybe Sendu can answer this), but I would assume  
Module::Build uses the current working directory for test runs.

chris


From bix at sendu.me.uk  Tue Dec 19 17:02:34 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 19 Dec 2006 17:02:34 +0000
Subject: [Bioperl-l] [Gmod-gbrowse]  problems installing bioperl
In-Reply-To: <8D5C45A3-A90A-49D7-A7E7-888C977759AC@uiuc.edu>
References: <1166519755.4587adcb141d3@www.studentmail.otago.ac.nz>
	<45880167.9010605@sendu.me.uk>
	<1166542310.6981.119.camel@localhost.localdomain>
	<8D5C45A3-A90A-49D7-A7E7-888C977759AC@uiuc.edu>
Message-ID: <45881B2A.8060907@sendu.me.uk>

Chris Fields wrote:
> 
> On Dec 19, 2006, at 9:31 AM, Scott Cain wrote:
> 
>> I really don't think the BioPerl version detection is wrong.  I actually
>> don't check Bio::Root::Version::VERSION in Makefile.PL, I check
>> Bio::Graphics::Panel->api_version.  When it doesn't find the correct
>> api_version, it gives a warning that BioPerl 1.5.2 is not installed.  I
>> have seen this happen when more than one BioPerl instance is installed
>> and `perl Makefile.PL` finds the wrong one first.  My suggestion is to
>> try reinstalling BioPerl and providing the --uninst 1 argument to remove
>> older versions of BioPerl:
>>
>>   sudo ./Build install --uninst 1
>>
>> Scott
> 
> Could having two Bioperl instances explain the test failures?  I'm not 
> sure (maybe Sendu can answer this), but I would assume Module::Build 
> uses the current working directory for test runs.

It does, so that shouldn't be an issue for the test failures.
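Scott's theory is that more than one BioPerl is installed. One way to check it is to look for every copy of a BioPerl module on Perl's search path, since perl loads the first one it finds in @INC. find_module_copies() is a hypothetical helper. Bio/Perl.pm is used as the probe file on the assumption that both old and new BioPerl installs ship it.

```perl
use strict;
use warnings;
use File::Spec;

# Return every directory-based copy of a module file (given Unix-style, e.g.
# 'Bio/Perl.pm') found under the listed directories, in search order.
sub find_module_copies {
    my ($relpath, @dirs) = @_;
    my @parts = split m{/}, $relpath;
    return grep { -f $_ }
           map  { File::Spec->catfile($_, @parts) }
           grep { !ref $_ } @dirs;     # @INC may also hold code refs; skip them
}

# Perl loads the first of these; more than one line means an older copy may
# shadow the newer install.
print "$_\n" for find_module_copies('Bio/Perl.pm', @INC);
```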


From ferraria at gmail.com  Tue Dec 19 16:40:05 2006
From: ferraria at gmail.com (Anthony Ferrari)
Date: Tue, 19 Dec 2006 17:40:05 +0100
Subject: [Bioperl-l] Problem with : EUtilities - Proxy
Message-ID: <b2ec54b90612190840r24fe1aa5ncb9c9def040aed49@mail.gmail.com>

Hi all,

I've just installed BioPerl 1.5.2 (devel) on a Linux Mandrake machine with
the CPAN shell.
I want to use Bio::DB::EUtilities to retrieve data (IDs) from the NCBI
'gene' database (the first step of my pipeline).

But the installation of this package doesn't seem to be correct: the
simple example given in the documentation doesn't work (this one:
http://doc.bioperl.org/bioperl-live/Bio/DB/EUtilities.html#SYNOPSIS)

Here is the error message I got:
"Can't use an undefined value as an ARRAY reference at
/usr/lib/perl5/site_perl/5.8.7/LWP/UserAgent.pm line 779."

In the UserAgent package, line 779 is in the private "_need_proxy"
subroutine and corresponds to the code: ...if (@{ $self->{'no_proxy'} }) ...
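
The error itself can be reproduced without LWP: under `use strict`, using an undefined hash slot as an array reference dies with exactly this message. The sketch below uses a bare object of a hypothetical class (`Hypothetical::Agent`, not a real module) to show that, together with the common defensive `|| []` idiom. That the BioPerl object somehow skipped the initialisation of 'no_proxy' that LWP::UserAgent->new presumably performs is an assumption, not a confirmed diagnosis.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a user-agent object whose constructor never
# initialised the 'no_proxy' slot.
my $agent = bless {}, 'Hypothetical::Agent';

# Same dereference pattern as the quoted line 779: under `use strict` an
# undef slot cannot be used as an array reference, so this dies.
my $ok = eval {
    if (@{ $agent->{no_proxy} }) {
        print "has no_proxy entries\n";
    }
    1;
};
print "died: $@" unless $ok;

# Defensive variant: treat a missing slot as an empty list.
my @no_proxy = @{ $agent->{no_proxy} || [] };
print scalar(@no_proxy), " no_proxy entries\n";
```

With a genuine LWP::UserAgent created through new(), calling `$ua->env_proxy` reads the http_proxy and no_proxy environment variables, which is the usual way to tell LWP about a proxy.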

If I comment out this line in the UserAgent package and the corresponding "}",
the example works. But obviously I would prefer to solve the problem properly :)

Indeed, my computer accesses the internet via an HTTP proxy and I never told
BioPerl about it.
As I read on the BioPerl Wiki site, I tried to configure an $http_proxy
environment variable, but it still doesn't work.

One last, possibly important, piece of information: during the installation
I saw that the tests 't/EUtilities' were skipped for an unstated reason.


So finally I have two questions:
1. Can somebody figure out what my problem is?
2. At the moment, is the Bio::DB::EUtilities package really efficient, or
could using the NCBI eutilities directly with the LWP::Simple package be a
good alternative?
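
On question 2, talking to the NCBI E-utilities directly takes little code. A minimal sketch, assuming the current public ESearch endpoint (the URL below is not from the original message) and a hand-rolled URL escape so that only core Perl is needed; `esearch_url` and `parse_ids` are hypothetical helper names.

```perl
use strict;
use warnings;

# Assumed ESearch endpoint; adjust if NCBI changes it.
my $ESEARCH = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi';

# Percent-encode a query term without needing URI::Escape.
sub escape_term {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-_.~])/sprintf('%%%02X', ord $1)/ge;
    return $s;
}

# Build an ESearch URL for a database such as 'gene'.
sub esearch_url {
    my ($db, $term, $retmax) = @_;
    $retmax = 20 unless defined $retmax;
    return "$ESEARCH?db=$db&term=" . escape_term($term) . "&retmax=$retmax";
}

# Extract the <Id> values from an ESearch XML reply. A regex is enough
# for the flat <IdList>; use a real XML parser for anything richer.
sub parse_ids {
    my ($xml) = @_;
    return $xml =~ m{<Id>(\d+)</Id>}g;
}
```

To fetch live results, the URL could be handed to LWP::Simple's get, e.g. `my $xml = LWP::Simple::get(esearch_url('gene', $query))`; get returns undef on failure, so check it before calling parse_ids.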

Many thanks in advance,
Best Regards,
Anthony Ferrari


From bix at sendu.me.uk  Tue Dec 19 17:06:03 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 19 Dec 2006 17:06:03 +0000
Subject: [Bioperl-l] problems installing bioperl
In-Reply-To: <1166542310.6981.119.camel@localhost.localdomain>
References: <1166519755.4587adcb141d3@www.studentmail.otago.ac.nz>	
	<45880167.9010605@sendu.me.uk>
	<1166542310.6981.119.camel@localhost.localdomain>
Message-ID: <45881BFB.7020008@sendu.me.uk>

Scott Cain wrote:
> I really don't think the BioPerl version detection is wrong.  I actually
> don't check Bio::Root::Version::VERSION in Makefile.PL, I check
> Bio::Graphics::Panel->api_version.  When it doesn't find the correct
> api_version, it gives a warning that BioPerl 1.5.2 is not installed.  I
> have seen this happen when more than one BioPerl instance is installed
> and `perl Makefile.PL` finds the wrong one first.

Yes, I saw that, which is why I thought I must be looking at something 
different to what the OP had installed.


> My suggestion is to try reinstalling BioPerl and providing the --uninst 1 argument to remove
> older versions of BioPerl:
> 
>   sudo ./Build install --uninst 1

My confusion is that he has definitely installed 1.5.2 and this version 
is being polled for its version number (by something!) and returning the 
correct '1.0050021', whilst the something expects '1.52'. Anyway, this 
can only be resolved if Stephan provides the real error message and its 
context.


From cjfields at uiuc.edu  Tue Dec 19 17:27:24 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 19 Dec 2006 11:27:24 -0600
Subject: [Bioperl-l] Problem with : EUtilities - Proxy
In-Reply-To: <b2ec54b90612190840r24fe1aa5ncb9c9def040aed49@mail.gmail.com>
References: <b2ec54b90612190840r24fe1aa5ncb9c9def040aed49@mail.gmail.com>
Message-ID: <6365ACFD-7F5A-4EF1-97EA-BB53A58B9B4D@uiuc.edu>


On Dec 19, 2006, at 10:40 AM, Anthony Ferrari wrote:

> Hi all,
>
> I've just installed BioPerl 1.5.2 (devel) on a Linux Mandrake machine
> with the CPAN shell.
> I want to use Bio::DB::EUtilities to retrieve data (IDs) from the NCBI
> 'gene' database (the first step of my pipeline).
>
> But the installation of this package doesn't seem to be correct: the
> simple example given in the documentation doesn't work (this one:
> http://doc.bioperl.org/bioperl-live/Bio/DB/EUtilities.html#SYNOPSIS)
>
> Here is the error message I got:
> "Can't use an undefined value as an ARRAY reference at
> /usr/lib/perl5/site_perl/5.8.7/LWP/UserAgent.pm line 779."
>
> In the UserAgent package, line 779 is in the private "_need_proxy"
> subroutine and corresponds to the code: ...if (@{ $self->{'no_proxy'} }) ...
>
> If I comment out this line in the UserAgent package and the
> corresponding "}", the example works. But obviously I would prefer to
> solve the problem properly :)
>
> Indeed, my computer accesses the internet via an HTTP proxy and I
> never told BioPerl about it.
> As I read on the BioPerl Wiki site, I tried to configure an $http_proxy
> environment variable, but it still doesn't work.
>
> One last, possibly important, piece of information: during the
> installation I saw that the tests 't/EUtilities' were skipped for an
> unstated reason.
>
>
> So finally I have two questions:
> 1. Can somebody figure out what my problem is?
> 2. At the moment, is the Bio::DB::EUtilities package really efficient,
> or could using the NCBI eutilities directly with the LWP::Simple
> package be a good alternative?
>
> Many thanks in advance,
> Best Regards,
> Anthony Ferrari

First things first: at the moment the BioPerl EUtilities interface is
very experimental (as specifically outlined in the POD), so I can't
really recommend it for production use until the API is cleaned up.
However, I do appreciate any feedback or comments regarding EUtilities
(the reason it's out there in the 1.5.2 release).

You might check out this bug report, which relates directly to your  
issue:

http://bugzilla.open-bio.org/show_bug.cgi?id=2109

After I worked out the proxy issue Torsten got it working.  Let me  
know if this doesn't help or fix the problem.

chris


From cain at cshl.edu  Tue Dec 19 15:31:50 2006
From: cain at cshl.edu (Scott Cain)
Date: Tue, 19 Dec 2006 10:31:50 -0500
Subject: [Bioperl-l] problems installing bioperl
In-Reply-To: <45880167.9010605@sendu.me.uk>
References: <1166519755.4587adcb141d3@www.studentmail.otago.ac.nz>
	<45880167.9010605@sendu.me.uk>
Message-ID: <1166542310.6981.119.camel@localhost.localdomain>

I really don't think the BioPerl version detection is wrong.  I actually
don't check Bio::Root::Version::VERSION in Makefile.PL, I check
Bio::Graphics::Panel->api_version.  When it doesn't find the correct
api_version, it gives a warning that BioPerl 1.5.2 is not installed.  I
have seen this happen when more than one BioPerl instance is installed
and `perl Makefile.PL` finds the wrong one first.  My suggestion is to
try reinstalling BioPerl and providing the --uninst 1 argument to remove
older versions of BioPerl:

  sudo ./Build install --uninst 1
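
One way to check the multiple-instance theory before reinstalling is to list every directory on perl's module search path that holds a copy of Bio/Root/Version.pm. A minimal sketch; `find_bioperl_copies` is a hypothetical helper, and it should be run with the same perl binary (and the same PERL5LIB) that runs `perl Makefile.PL`.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper (not part of BioPerl): return the path of every
# Bio/Root/Version.pm found in the given directories (default: @INC),
# in search order.
sub find_bioperl_copies {
    my @dirs = @_ ? @_ : @INC;
    my @found;
    for my $dir (@dirs) {
        next if ref $dir;    # @INC can hold code-ref hooks; skip them
        my $file = File::Spec->catfile($dir, 'Bio', 'Root', 'Version.pm');
        push @found, $file if -f $file;
    }
    return @found;
}

# The first path printed is the copy that a plain `use` will load.
print "$_\n" for find_bioperl_copies();
```

If more than one path is printed, the first one is the BioPerl that Makefile.PL will see.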

Scott


On Tue, 2006-12-19 at 15:12 +0000, Sendu Bala wrote:
> Stephan Roessner wrote:
> > Dear support team,
> > 
> > I installed bioperl 1.5.2_100 on a Fedora machine to be able to use
> > gbrowse.
> > The installation seems to work (except for the test failures) but the
> > gbrowse installation tells me that Bio::Perl 1.0050021 is installed, but
> > of course it requires 1.52.
> > 
> > Is there a chance to find out what went wrong?
> 
> Nothing went wrong with the Bioperl installation (well, except that there 
> shouldn't have been any test failures - can you post those please?). 
> gbrowse simply defined its Bioperl requirement incorrectly. If you tell 
> me exactly where you downloaded gbrowse from and how you went about 
> installing it, and provide the exact, complete error message you got 
> from it, I might be able to help the authors fix the problem.
> 
> Or I'm pretty sure they can figure it out for themselves :)
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From stewarta at nmrc.navy.mil  Tue Dec 19 18:49:57 2006
From: stewarta at nmrc.navy.mil (Andrew Stewart)
Date: Tue, 19 Dec 2006 13:49:57 -0500
Subject: [Bioperl-l] Bio::Tools::Glimmer for glimmer2/3
Message-ID: <4FDC0EAE-0E93-42A6-AFCA-2B2DFB6F7E8D@nmrc.navy.mil>

I see that the Bio::Tools::Glimmer documentation clearly states that this
module is intended for use with GlimmerM (eukaryotic version) only.
I am wondering if anyone can recall any talk about adapting
Bio::Tools::Glimmer for Glimmer2 / Glimmer3 (prokaryotic version)?
I've searched the list history with little luck, other than finding someone
else asking a similar question.

If not, does anyone have any thoughts on how difficult it might be to  
implement support for glimmer2/3 result parsing?  Perhaps just a  
matter of editing the _parse_predictions method?


--
Andrew Stewart
Research Assistant, Genomics Team
Navy Medical Research Center (NMRC)
Biological Defense Research Directorate (BDRD)
BDRD Annex
12300 Washington Avenue, 2nd Floor
Rockville, MD 20852

email: stewarta at nmrc.navy.mil
phone: 301-231-6700 Ext 270


From rvosa at sfu.ca  Tue Dec 19 18:53:47 2006
From: rvosa at sfu.ca (Rutger Vos)
Date: Tue, 19 Dec 2006 10:53:47 -0800
Subject: [Bioperl-l] problems installing bioperl
Message-ID: <200612191853.kBJIrlW3026344@rm-rstar.sfu.ca>

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061219/276348b7/attachment.ksh>

From cjfields at uiuc.edu  Tue Dec 19 19:31:17 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 19 Dec 2006 13:31:17 -0600
Subject: [Bioperl-l] Bio::Tools::Glimmer for glimmer2/3
In-Reply-To: <4FDC0EAE-0E93-42A6-AFCA-2B2DFB6F7E8D@nmrc.navy.mil>
References: <4FDC0EAE-0E93-42A6-AFCA-2B2DFB6F7E8D@nmrc.navy.mil>
Message-ID: <71E04575-DFD2-4F5A-B268-493D3246CBFA@uiuc.edu>


On Dec 19, 2006, at 12:49 PM, Andrew Stewart wrote:

> I see that the Bio::Tools::Glimmer documentation clearly states that this
> module is intended for use with GlimmerM (eukaryotic version) only.
> I am wondering if anyone can recall any talk about adapting
> Bio::Tools::Glimmer for Glimmer2 / Glimmer3 (prokaryotic version)?
> I've searched the list history with little luck, other than finding someone
> else asking a similar question.

There is a thread here:

http://thread.gmane.org/gmane.comp.lang.perl.bio.general/12546/focus=12546

> If not, does anyone have any thoughts on how difficult it might be to
> implement support for glimmer2/3 result parsing?  Perhaps just a
> matter of editing the _parse_predictions method?

It depends on how different the various Glimmer formats are; I'll  
have to look at the ones Torsten added in CVS.  You could always try  
modifying Bio::Tools::Glimmer to parse Glimmer2/3 and GlimmerM  
reports, but based on the mail list thread above it may not be so  
straightforward.
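
A Glimmer3 parser could start from something like the following. It is a sketch under an assumed layout for Glimmer3 `.predict` output: a `>name` header followed by lines of ORF id, start, end, signed reading frame and score, with start > end on the reverse strand. `parse_glimmer3_predict` is a hypothetical name, not an existing Bio::Tools::Glimmer method, and wrap-around genes on circular genomes are not handled.

```perl
use strict;
use warnings;

# Parse Glimmer3-style .predict text (assumed layout, see above) into a
# list of hashrefs, one per predicted gene.
sub parse_glimmer3_predict {
    my ($text) = @_;
    my ($seq_id, @genes);
    for my $line (split /\n/, $text) {
        if ($line =~ /^>(\S+)/) {    # header naming the sequence
            $seq_id = $1;
            next;
        }
        next unless $line =~ /^(\S+)\s+(\d+)\s+(\d+)\s+([+-]\d)\s+(-?[\d.]+)/;
        my ($orf, $s, $e, $frame, $score) = ($1, $2, $3, $4, $5);
        my $strand = $frame =~ /^-/ ? -1 : 1;
        # Reverse-strand predictions list start > end; store low..high.
        ($s, $e) = ($e, $s) if $strand < 0;
        push @genes, { seq_id => $seq_id, id => $orf, start => $s, end => $e,
                       strand => $strand, frame => $frame, score => $score };
    }
    return @genes;
}
```

Each hashref could then be turned into a feature object, for example with Bio::SeqFeature::Generic->new(-seq_id, -start, -end, -strand, -score, -primary_tag => 'gene').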

chris


From MEC at stowers-institute.org  Tue Dec 19 19:57:48 2006
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Tue, 19 Dec 2006 13:57:48 -0600
Subject: [Bioperl-l] bp_seqfeature_load /
	Bio::DB::SeqFeature::Store::GFF3Loader problems augmenting
	Flybase annotation
Message-ID: <CED81D34E37D5043A1211565277A51E506E06492@exchkc02.stowers-institute.org>

Lincoln and fellow Bio::DB::SeqFeature travelers,

I find that using bp_seqfeature_load.PLS to load subfeatures of genes
already loaded using bp_seqfeature_load.PLS fails with 

------------- EXCEPTION  -------------
MSG: FBgn0017545 doesn't have a primary id
STACK
Bio::DB::SeqFeature::Store::GFF3Loader::build_object_tree_in_tables
/home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:682
STACK Bio::DB::SeqFeature::Store::GFF3Loader::build_object_tree
/home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:663
STACK Bio::DB::SeqFeature::Store::GFF3Loader::finish_load
/home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:372
STACK Bio::DB::SeqFeature::Store::GFF3Loader::load_fh
/home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:345
STACK Bio::DB::SeqFeature::Store::GFF3Loader::load
/home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:242
STACK toplevel
/home/mec/cvs/bioperl-live/scripts/Bio-SeqFeature-Store/bp_seqfeature_load.PLS:76

Where FBgn0017545 is the ID of a gene previously loaded.

I am unsure how to remedy my situation and welcome any advice on a correct
or improved approach to my problem.

Here's some detail if it helps.  I am developing a pipeline to design
microarray probes capable of distinguishing among splice variants in
Drosophila (using the latest Flybase dmel_r5.1 annotation).  So I

1) load a filtered selection of Flybase annotation using
bp_seqfeature_load.  (for testing purposes, I am using a single gene's
worth of annotation, FBgn0017545.gff, attached).  This is done as
follows:

	> bp_seqfeature_load.PLS  --create FBgn0017545.gff 

2) analyze all the genes in the database, and create GFF3 output each
feature of which has a 'Parent' that is a previously loaded gene (i.e.
FBgn0017545).  (These features represent the unique introns, splice
sites, and exonic design targets. Output of this analysis,
FBgn0017545_matd.gff, is also attached)

3) load these analysis results into the same database, as follows:

	> bp_seqfeature_load.PLS          FBgn0017545_matd.gff

It is at this point that I get the above error.

However, I don't get any error and the data loads fine if I load the two
files together, as follows:

	> bp_seqfeature_load.PLS --create <(cat FBgn0017545.gff FBgn0017545_matd.gff)

So, I suspect that either I am misunderstanding when/how to use
bp_seqfeature_load.PLS or else this use case has not yet arisen and must
be provided for somehow.

I am running against bioperl-live

Thanks for your thoughts and assistance,

Malcolm Cook
Database Applications Manager - Bioinformatics
Stowers Institute for Medical Research - Kansas City, Missouri
 

From Kevin.M.Brown at asu.edu  Tue Dec 19 21:46:19 2006
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Tue, 19 Dec 2006 14:46:19 -0700
Subject: [Bioperl-l] Bio::SimpleAlign
Message-ID: <1A4207F8295607498283FE9E93B775B40270F4E9@EX02.asurite.ad.asu.edu>

I'm working on a script that plays around with alignments of sequences
and one of the things I noticed is that the code for the match method
does not seem to actually use the start/end information when creating
the match between objects in the alignment.  Maybe I'm misunderstanding
what the alignment is supposed to hold in terms of sequence.  The
alignment objects I build up are based on the sequence of a gene and the
sequences of the primers that amplify that gene.

$alignments{$gene->id()}->add_seq(
    Bio::LocatableSeq->new(
        -seq    => $seq[0]->seq(),
        -id     => $seq[0]->id(),
        -start  => $start,
        -end    => $start + $seq[0]->length() - 1,
        -strand => 1,
    )
);
$alignments{$gene->id()}->add_seq(
    Bio::LocatableSeq->new(
        -seq    => $seq[1]->seq(),
        -id     => $seq[1]->id(),
        -start  => $stop,
        -end    => $stop + $seq[1]->length() - 1,
        -strand => -1,
    )
);

So, you can see I input a start and stop point for the primer, but when
I use the match function all it does is match the first character of the
gene sequence to the first character of the primer sequences, then the
second gene character to the second in each primer, etc.  This doesn't seem
to fit with the documentation, and it seems odd that there would be holders
for the start/stop points that are not used when doing things like matching
sequences in an alignment.


From bix at sendu.me.uk  Tue Dec 19 22:01:22 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 19 Dec 2006 22:01:22 +0000
Subject: [Bioperl-l] problems installing bioperl
In-Reply-To: <200612191853.kBJIrlW3026344@rm-rstar.sfu.ca>
References: <200612191853.kBJIrlW3026344@rm-rstar.sfu.ca>
Message-ID: <45886132.7050505@sendu.me.uk>

Rutger Vos wrote:
> Aren't 1.5.2_100 and 1.0050021 supposed to be equivalent in this weird
> version-string-translation way that makes 5.5 and 5.005 equivalent also?

Yes, 1.5.2_100 and 1.0050021 are equivalent. The equivalent of 5.5 is 
5.500 however.
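
The arithmetic behind this: in the decimal form each component after the first takes three digits, so 1.5.2_100 becomes 1.005002100, which is numerically 1.0050021; read the other way, 5.005 is 5.5(.0) while 5.5 pads out to 5.500. A pure-Perl sketch with hypothetical helper names; treating the `_100` suffix as a fourth component is an assumption that matches the numbers in this thread, and version.pm's own handling of underscores may differ.

```perl
use strict;
use warnings;

# Dotted version ("1.5.2" or "1.5.2_100") -> decimal string, giving every
# component after the first three digits.
sub dotted_to_decimal {
    my ($v) = @_;
    my ($major, @rest) = split /[._]/, $v;
    return $major . '.' . join('', map { sprintf '%03d', $_ } @rest);
}

# Decimal version -> dotted form: right-pad the fraction with zeros to a
# multiple of three digits, then read it three digits at a time.
sub decimal_to_dotted {
    my ($v) = @_;
    my ($major, $frac) = split /\./, $v;
    $frac = '' unless defined $frac;
    $frac .= '0' while length($frac) % 3;
    return join '.', $major, map { $_ + 0 } ($frac =~ /(\d{3})/g);
}

print dotted_to_decimal('1.5.2_100'), "\n";    # 1.005002100
print decimal_to_dotted('5.5'), "\n";          # 5.500
```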


From lstein at cshl.edu  Tue Dec 19 21:58:24 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Tue, 19 Dec 2006 16:58:24 -0500
Subject: [Bioperl-l] bp_seqfeature_load /
	Bio::DB::SeqFeature::Store::GFF3Loader problems augmenting
	Flybase annotation
In-Reply-To: <CED81D34E37D5043A1211565277A51E506E06492@exchkc02.stowers-institute.org>
References: <CED81D34E37D5043A1211565277A51E506E06492@exchkc02.stowers-institute.org>
Message-ID: <6dce9a0b0612191358t4764bfe0g601cd22d09132e55@mail.gmail.com>

Hi Malcolm,

Your second guess was right. The use case of augmenting an existing gene
with additional splice forms isn't provided for. You can get the
functionality by making direct calls to Bio::DB::SeqFeature::Store methods:

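use strict;
use warnings;
use Bio::DB::SeqFeature::Store;
use Bio::DB::SeqFeature;

# Open the existing store (adaptor and DSN are placeholders for your setup).
my $db = Bio::DB::SeqFeature::Store->new(-adaptor => 'DBI::mysql',
                                         -dsn     => 'dbi:mysql:dmel');

# Fetch the previously loaded gene that will receive the new splice form.
my @genes = $db->get_features_by_name('FBgn0017545');
@genes == 1 or die "Didn't get exactly one gene";
my $parent = $genes[0];

my $chr    = $parent->seq_id;
my $start  = $parent->start;
my $end    = $parent->end;
my $strand = $parent->strand;

# Create the new transcript on the gene's chromosome and attach it.
my $new_splice_form = $db->new_feature(-primary_tag => 'mRNA',
                                       -source      => 'added',
                                       -seq_id      => $chr,
                                       -strand      => $strand,
                                       -start       => $start + 10,
                                       -end         => $end,
                                      );
$parent->add_SeqFeature($new_splice_form);

# Add two example exons to the new transcript.
for my $pos ([$start + 10, $start + 100], [$start + 200, $end]) {
    my ($e_start, $e_end) = @$pos;
    my $exon = Bio::DB::SeqFeature->new(-primary_tag => 'exon',
                                        -store       => $db,
                                        -seq_id      => $chr,
                                        -strand      => $strand,
                                        -start       => $e_start,
                                        -end         => $e_end);
    $new_splice_form->add_SeqFeature($exon);
}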

I found a bug in updating the seqfeature database when I wrote this script,
so you'll have to get the latest bioperl-live. I think you can use this to
write a splice form updating script.

In order to support the idea of adding new splice forms to an existing gene
using the GFF3 format, I will have to either modify the loader, or write a
separate script (probably better to do the latter). It shouldn't be hard if
you'd like to give it a try.

Lincoln

On 12/19/06, Cook, Malcolm <MEC at stowers-institute.org> wrote:
>
> Lincoln and fellow Bio::DB::SeqFeature travelers,
>
> I find that using bp_seqfeature_load.PLS to load subfeatures of genes
> already loaded using bp_seqfeature_load.PLS fails with
>
> ------------- EXCEPTION  -------------
> MSG: FBgn0017545 doesn't have a primary id
> STACK
> Bio::DB::SeqFeature::Store::GFF3Loader::build_object_tree_in_tables
> /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:682
> STACK Bio::DB::SeqFeature::Store::GFF3Loader::build_object_tree
> /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:663
> STACK Bio::DB::SeqFeature::Store::GFF3Loader::finish_load
> /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:372
> STACK Bio::DB::SeqFeature::Store::GFF3Loader::load_fh
> /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:345
> STACK Bio::DB::SeqFeature::Store::GFF3Loader::load
> /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:242
> STACK toplevel
> /home/mec/cvs/bioperl-live/scripts/Bio-SeqFeature-Store/bp_seqfeature_load.PLS:76
>
> Where FBgn0017545 is the ID of a gene previously loaded.
>
> I am unsure how to remedy my situation and welcome any advice on a correct
> or improved approach to my problem.
>
> Here's some detail if it helps.  I am developing a pipeline to design
> microarray probes capable of distinguishing among splice variants in
> Drosophila (using the latest Flybase dmel_r5.1 annotation).  So I
>
> 1) load a filtered selection of Flybase annotation using
> bp_seqfeature_load.  (for testing purposes, I am using a single gene's
> worth of annotation, FBgn0017545.gff, attached).  This is done as
> follows:
>
>         > bp_seqfeature_load.PLS  --create FBgn0017545.gff
>
> 2) analyze all the genes in the database, and create GFF3 output each
> feature of which has a 'Parent' that is a previously loaded gene (i.e.
> FBgn0017545).  (These features represent the unique introns, splice
> sites, and exonic design targets. Output of this analysis,
> FBgn0017545_matd.gff, is also attached)
>
> 3) load these analysis results into the same database, as follows:
>
>         > bp_seqfeature_load.PLS          FBgn0017545_matd.gff
>
> It is at this point that I get the above error.
>
> However, I don't get any error and the data loads fine if I load the two
> files together, as follows:
>
>         > bp_seqfeature_load.PLS --create <(cat FBgn0017545.gff FBgn0017545_matd.gff)
>
> So, I suspect that either I am misunderstanding when/how to use
> bp_seqfeature_load.PLS or else this use case has not yet arisen and must
> be provided for somehow.
>
> I am running against bioperl-live
>
> Thanks for your thoughts and assistance,
>
> Malcolm Cook
> Database Applications Manager - Bioinformatics
> Stowers Institute for Medical Research - Kansas City, Missouri
>
>


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu


From rvosa at sfu.ca  Wed Dec 20 04:23:20 2006
From: rvosa at sfu.ca (Rutger Vos)
Date: Tue, 19 Dec 2006 20:23:20 -0800
Subject: [Bioperl-l] suggestions for suitable 'taxon' object
Message-ID: <200612200423.kBK4NKDt009254@rm-rstar.sfu.ca>

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061219/17ec7ff3/attachment.ksh>

From cjfields at uiuc.edu  Wed Dec 20 06:16:47 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 20 Dec 2006 00:16:47 -0600
Subject: [Bioperl-l] suggestions for suitable 'taxon' object
In-Reply-To: <200612200423.kBK4NKDt009254@rm-rstar.sfu.ca>
References: <200612200423.kBK4NKDt009254@rm-rstar.sfu.ca>
Message-ID: <4185E59B-C0DA-49B8-8D71-11183A091FBF@uiuc.edu>


On Dec 19, 2006, at 10:23 PM, Rutger Vos wrote:

> Hi all,
>
> I am looking for a bioperl object that can be abused to function as a
> suitable 'taxon' object, where I mean 'taxon' as understood by the NEXUS
> file format (i.e. not strictly an entity from a taxonomy, but more loosely
> an OTU).
>
> The object would primarily function as a way to relate nodes in trees to
> sequences in an alignment (a foreign key that both nodes and sequences
> refer to), and secondarily as the keeper of the canonical name of the OTU,
> such that a sequence named 'Homo_sapiens|EF177447.1/12-56' and a node named
> 'Homo sapiens (constrained monophyly)' can still be understood to refer to
> the same thing - the OTU 'Homo sapiens sapiens' (for example).

Alignment (SimpleAlign) objects contain Bio::LocatableSeq sequence  
objects; at the moment LocatableSeqs don't store their own annotation  
but they could easily be made or subclassed to be AnnotatableI (i.e.  
they can store annotation collections).  I recently made SimpleAlign  
Annotatable; Jason has also made SimpleAlign implement  
FeatureHolderI, so alignments can store SeqFeatures as well; he may  
have his own designs here.

There may be a wide variety of ways to go about this.  I would  
probably do the following (bear in mind I'm a microbiologist, not a  
computer scientist).  If one could add trees as annotation to the  
alignment (i.e. if trees could be Annotation objects and kept in the  
SimpleAlign's annotation collection), and each sequence in the  
alignment contained a reference to a node object of the tree (i.e. if  
Bio::Taxon/Bio::Species objects could also be Annotation objects, but  
kept in a LocatableSeq annotation collection), both could refer to  
the same node object.  This may not be exactly what you want, but  
maybe it's close?

SimpleAlign->AnnoColln->Tree->OTU(Nodes)
    \----->LocSeqs-->AnnoColln-----/

I suppose this could also be done with Seqfeatures...
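
The "foreign key" idea can also be prototyped without a new BioPerl class: normalise each label to a canonical name and hand out one shared record per name, so a sequence and a tree node end up holding the same reference. Everything here (`canonical_otu_name`, `otu_for`, the plain-hash record) is hypothetical, and the normalisation rules are guesses based only on the two example labels above; mapping to a display name such as 'Homo sapiens sapiens' would need an extra lookup table.

```perl
use strict;
use warnings;

# Hypothetical normaliser: drop anything after '|', drop parenthetical
# remarks, treat underscores as spaces, and trim whitespace.
sub canonical_otu_name {
    my ($label) = @_;
    (my $name = $label) =~ s/\|.*//;
    $name =~ s/\s*\(.*?\)//g;
    $name =~ tr/_/ /;
    $name =~ s/^\s+|\s+$//g;
    return $name;
}

my %otu;    # canonical name -> shared OTU record

# Return the single shared record for whatever OTU a label refers to.
sub otu_for {
    my ($label) = @_;
    my $name = canonical_otu_name($label);
    return $otu{$name} ||= { name => $name };
}

my $from_seq  = otu_for('Homo_sapiens|EF177447.1/12-56');
my $from_node = otu_for('Homo sapiens (constrained monophyly)');
print $from_seq == $from_node ? "same OTU: $from_seq->{name}\n" : "different OTUs\n";
```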

> I was thinking that a (possibly expanded) Bio::Species might work if there
> was some sensible way of appending references to node and sequence objects
> to it (or otherwise associate them with each other), but I am writing *to
> solicit any and all suggestions*. I am looking for something similar to
> Bio::Phylo::Taxa::Taxon.
>
> Any and all comments and suggestions greatly appreciated!
>
> Best wishes,
>
> Rutger Vos

Sendu would be the best one to speak about Bio::Taxon and  
Bio::Species and may have some ideas on the above.  The current plan  
was to deprecate Bio::Species, but who knows?

chris


From heikki at sanbi.ac.za  Wed Dec 20 10:25:08 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Wed, 20 Dec 2006 12:25:08 +0200
Subject: [Bioperl-l] Bio::SimpleAlign
In-Reply-To: <1A4207F8295607498283FE9E93B775B40270F4E9@EX02.asurite.ad.asu.edu>
References: <1A4207F8295607498283FE9E93B775B40270F4E9@EX02.asurite.ad.asu.edu>
Message-ID: <200612201225.08862.heikki@sanbi.ac.za>

Kevin,

Sequences that are added to the alignment are supposed to be *aligned*. 
SimpleAlign does not do it for you. It seems to me that you are adding 
sequences like this:

nnnnnnnnnnnnnnnnnnnn  1 - 20, "a short gene" 
nnnnnn               21 - 26 "a short primer after the gene"

when you should be doing this

nnnnnnnnnnnnnnnnnnnn        1 - 20, "a short gene" 
--------------------nnnnnn 21 - 26 "a short primer after the gene"

Note that the default way of displaying names in SimpleAlign 
is "name/start-end". The name is usually expected to refer to the sequence 
from which this subsequence is derived. The display name does not change 
if you add gaps.
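
The second layout above can be produced by padding each sequence with gaps so that every row shares one column coordinate system. A minimal sketch with a hypothetical `pad_to_alignment` helper (not a SimpleAlign method); it also pads the gene on the right, since SimpleAlign generally expects all rows to be the same length (compare its is_flush method).

```perl
use strict;
use warnings;

# Hypothetical helper: place $seq at 1-based column $start in a row that
# is $total columns wide, filling all other columns with '-'.
sub pad_to_alignment {
    my ($seq, $start, $total) = @_;
    my $left  = $start - 1;
    my $right = $total - $left - length $seq;
    die "sequence does not fit in $total columns\n" if $right < 0;
    return ('-' x $left) . $seq . ('-' x $right);
}

my $gene   = 'n' x 20;
my $primer = 'n' x 6;
print pad_to_alignment($gene,   1,  26), "\n";    # 20 residues, 6 gaps
print pad_to_alignment($primer, 21, 26), "\n";    # 20 gaps, 6 residues
```

The padded strings can then be passed to Bio::LocatableSeq->new, with -start/-end giving the ungapped residue coordinates (21 and 26 for the primer). A reverse-strand primer would first need reverse-complementing to line up against the gene's top strand.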


Yours,
	-Heikki


On Tuesday 19 December 2006 23:46, Kevin Brown wrote:
> I'm working on a script that plays around with alignments of sequences
> and one of the things I noticed is that the code for the match method
> does not seem to actually use the start/end information when creating
> the match between objects in the alignment.  Maybe I'm misunderstanding
> what the alignment is supposed to hold in terms of sequence.  The
> alignment objects I build up are based on the sequence of a gene and the
> sequences of the primers that amplify that gene.
>
> $alignments{$gene->id()}->add_seq(
> 				new Bio::LocatableSeq(
> 				-seq   => $seq[0]->seq(),
> 				-id    => $seq[0]->id(),
> 				-start => $start,
> 				-end => $start + $seq[0]->length() - 1,
> 				-strand => 1
> 			 )
> );

If your sequence does not contain gaps and the numbering starts from one, you 
can let the object handle start/stop:

# -start and -end are omitted: the object works them out itself
my $seq = Bio::LocatableSeq->new(
      -seq    => 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA',
      -id     => 'A00001',
      -strand => 1,
);


> $alignments{$gene->id()}->add_seq(
> 				new Bio::LocatableSeq(
> 				-seq   => $seq[1]->seq(),
> 				-id    => $seq[1]->id(),
> 				-start => $stop,
> 				-end => $stop + $seq[1]->length() - 1,
> 				-strand => -1
> 				)
> );
>
> So, you can see I input a start and stop point for the primer, but when
> I use the match function all it does is match the first character of the
> gene sequence to the first char of the primer sequences, then the second
> gene char to the second in each primer, etc...  This doesn't seem to fit
> with the documentation and seems odd that there would be holders for the
> start/stop points and not use them when doing things like matching of
> sequences in an alignment.

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From ferraria at gmail.com  Wed Dec 20 11:04:16 2006
From: ferraria at gmail.com (Anthony Ferrari)
Date: Wed, 20 Dec 2006 12:04:16 +0100
Subject: [Bioperl-l] Problem with : EUtilities - Proxy
In-Reply-To: <6365ACFD-7F5A-4EF1-97EA-BB53A58B9B4D@uiuc.edu>
References: <b2ec54b90612190840r24fe1aa5ncb9c9def040aed49@mail.gmail.com>
	<6365ACFD-7F5A-4EF1-97EA-BB53A58B9B4D@uiuc.edu>
Message-ID: <b2ec54b90612200304r56e1ba5o87963494875c1c43@mail.gmail.com>

On 19/12/06, Chris Fields <cjfields at uiuc.edu> wrote:
>
>
> On Dec 19, 2006, at 10:40 AM, Anthony Ferrari wrote:
>
> > Hi all,
> >
> > I've just installed BioPerl 1.5.2 (devel) on a linux mandrake
> > machine with
> > the cpan shell.
> > I want to use the Bio::DB::EUtilities to retrieve data (id's) from
> > NCBI
> > 'gene' database (first step of my pipeline).
> >
> > But the installation of this package doesn't seem to be correct :
> > The simple example given on the documentation doesn't work. (this
> > one :
> > http://doc.bioperl.org/bioperl-live/Bio/DB/EUtilities.html#SYNOPSIS)
> >
> > Here is the error message I got :
> > "Can't use an undefined value as an ARRAY reference at
> > /usr/lib/perl5/site_perl/5.8.7/LWP/UserAgent.pm line 779."
> >
> > In the UserAgent package, line 779 is in the private "_need_proxy"
> > subroutine and corresponds to the code :    ...if (@{ $self->
> > {'no_proxy'} })
> > ...
> >
> > If I comment out this line in the UserAgent package and the
> > corresponding "}", the example works. But obviously, I would prefer
> > to solve the problem properly :)
> >
> > Indeed, my computer accesses the internet via an HTTP proxy, and I
> > never told BioPerl about it at any point.
> > Following what I read on the BioPerl Wiki, I tried setting an
> > $http_proxy environment variable, but it still doesn't work.
> >
> > One last, possibly important, piece of information: during the
> > installation I saw that the tests 't/EUtilities' were skipped for an
> > unstated reason.
> >
> >
> > So finally I have two questions:
> > 1. Can anybody figure out what my problem is?
> > 2. At the moment, is the Bio::DB::EUtilities package really reliable,
> > or would calling the NCBI eutilities directly with the LWP::Simple
> > package be a good alternative?
> >
> > Many thanks in advance,
> > Best Regards,
> > Anthony Ferrari
>
> First things first: at the moment the BioPerl EUtilities interface is
> very experimental (as specifically outlined in the POD), so I can't
> really recommend it for production use until the API is cleaned up.
> However, I do appreciate any feedback or comments re:EUtilities (the
> reason it's out there in the 1.5.2 release).
>
> You might check out this bug report, which relates directly to your
> issue:
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2109
>
> After I worked out the proxy issue Torsten got it working.  Let me
> know if this doesn't help or fix the problem.
>
> chris
>


I read this bug report carefully, but it doesn't help: that fix has already
been applied to the GenericWebDBI.pm that now ships.
So my problem does not come from a deep recursion loop.

As Torsten did, I ran the command "BIOPERLDEBUG=1 perl -I. -w
t/EUtilities.t" to see what's really happening.
And in fact, all tests are skipped because of the same error message:
-> "Can't use an undefined value as an ARRAY reference at
/usr/lib/perl5/site_perl/5.8.7/LWP/UserAgent.pm line 779."

***
I ran the same command with the modified LWP::UserAgent package (i.e. with
line 779 and the corresponding '}' commented out) and all 453 tests passed.
But not always: I ran the tests several times and they often failed,
always on a test called "eXXX->cookie->cookie() query key" (ending with
"query key"). In those cases I got back an HTML message indicating that the
error was thrown by an internal server at NCBI. So I guess that sometimes it
is just an NCBI server fault (an internal problem), and BioPerl is not involved.
But then again, I commented out a line in a core package, so this is a bit
hazardous.
***

tony - a little bit lost.


From smane at vbi.vt.edu  Tue Dec 19 19:46:56 2006
From: smane at vbi.vt.edu (Shrinivasrao P. Mane)
Date: Tue, 19 Dec 2006 14:46:56 -0500
Subject: [Bioperl-l] Using Muscle parameter within bioperl
Message-ID: <A8ACF950-F40E-4E8C-927E-23D2391E5074@vbi.vt.edu>

Hi,
I need to run muscle using bioperl. This is how I do it in command line.

muscle -in inv.fasta -out inv.aln -log inv.log -verbose -quiet

I used the following in perl script

my $muscle = new Bio::Tools::Run::Alignment::Muscle(-format =>  
'clustalw',  -verbose=>'', -quiet=>'', -log='inv.log');

The program runs and produces the result file but it doesn't create a  
log file nor does it stop sending output to STDOUT (-quiet).
Could anybody help me with this?
Thanks
Mane


From cjfields at uiuc.edu  Wed Dec 20 14:09:56 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 20 Dec 2006 08:09:56 -0600
Subject: [Bioperl-l] Problem with : EUtilities - Proxy
In-Reply-To: <b2ec54b90612200304r56e1ba5o87963494875c1c43@mail.gmail.com>
References: <b2ec54b90612190840r24fe1aa5ncb9c9def040aed49@mail.gmail.com>
	<6365ACFD-7F5A-4EF1-97EA-BB53A58B9B4D@uiuc.edu>
	<b2ec54b90612200304r56e1ba5o87963494875c1c43@mail.gmail.com>
Message-ID: <13761416-E03F-46E7-BB43-E5FDA7F9C281@uiuc.edu>


On Dec 20, 2006, at 5:04 AM, Anthony Ferrari wrote:

> I carefully read this bug but that doesn't help because this has  
> already been modified in the now given GenericWebDBI.pm
> So my problem does not come from a deep recursion loop.
>
> As Torsten did, I tried the command  " BIOPERLDEBUG=1 perl -I. -w t/ 
> EUtilities.t " to see what's really happening.
> And actually, all tests are skipped because of the same message error
> -> "Can't use an undefined value as an ARRAY reference at /usr/lib/ 
> perl5/site_perl/5.8.7/LWP/UserAgent.pm line 779."
>
> ***
> I tried the same command with the modified LWP::UserAgent package  
> (which means I comment the line 779 and the corresponding '}') and  
> all 453 tests passed.
> But not always. I made the tests several times and  it often  
> failed. And always on a test called "eXXX->cookie->cookie() query  
> key" (ending with query key). In those cases, I got back a html  
> message indicating that the error was thrown by the internal server  
> of NCBI. So I guess that sometimes it is just NCBI server fault  
> (internal problem), and BioPerl is not implied..
> But once more, I comment a line from a basic package so it is a bit  
> hazardous.
> ***
>
> tony - a little bit lost.

I'm cc'ing Torsten as he has a bit more experience with proxies.

EUtilities is set up to check for an env. proxy and also take a set  
proxy with $agent->proxy() (see GenericWebDBI POD).  It would be easy  
to say this was a bug in LWP, but I think the problem is that  
something is undefined (i.e. an env. variable), or username/password.

From the bug report, Torsten set his proxy variables using the  
following:

--------------------------------------
"Note: I am behind an _authenticating_ proxy.
My $http_proxy and $HTTP_PROXY are both set to
http://USER:PASS at proxy.monash.edu.au:80/"
--------------------------------------

Note the lowercase for $http_proxy, which can make a difference.   
After the recursion fix, I'm assuming he made no changes to the env.  
settings, and according to the bug everything was fine (is that  
correct, Torsten?).

Also LWP::UserAgent has this:

--------------------------------------
"Load proxy settings from *_proxy environment variables. You might  
specify proxies like this (sh-syntax):

       gopher_proxy=http://proxy.my.place/
       wais_proxy=http://proxy.my.place/
       no_proxy="localhost,my.domain"
       export gopher_proxy wais_proxy no_proxy

     csh or tcsh users should use the setenv command to define these  
environment variables.

On systems with case insensitive environment variables there exists a  
name clash between the CGI environment variables and the HTTP_PROXY  
environment variable normally picked up by env_proxy(). Because of  
this HTTP_PROXY is not honored for CGI scripts. The CGI_HTTP_PROXY  
environment variable can be used instead."
--------------------------------------
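For sh/bash users, a minimal sketch of the settings discussed above. The proxy host and port are placeholders, not values taken from this thread. Both spellings of the variable are exported because, as noted above, the case can make a difference, and no_proxy holds a comma-separated list of hosts that should bypass the proxy.

```shell
# Placeholder proxy address: replace with the host and port your site uses.
PROXY_URL="http://proxy.example.org:3128/"

# Export both spellings; different tools read different ones.
export http_proxy="$PROXY_URL"
export HTTP_PROXY="$PROXY_URL"

# Comma-separated hosts/domains that should be contacted directly.
export no_proxy="localhost,127.0.0.1"
```

csh or tcsh users would use setenv instead, for example "setenv no_proxy localhost".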

chris


From bix at sendu.me.uk  Wed Dec 20 14:08:16 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 20 Dec 2006 14:08:16 +0000
Subject: [Bioperl-l] Using Muscle parameter within bioperl
In-Reply-To: <A8ACF950-F40E-4E8C-927E-23D2391E5074@vbi.vt.edu>
References: <A8ACF950-F40E-4E8C-927E-23D2391E5074@vbi.vt.edu>
Message-ID: <458943D0.10400@sendu.me.uk>

Shrinivasrao P. Mane wrote:
> Hi,
> I need to run muscle using bioperl. This is how I do it in command line.
> 
> muscle -in inv.fasta -out inv.aln -log inv.log -verbose -quiet
> 
> I used the following in perl script
> 
> my $muscle = new Bio::Tools::Run::Alignment::Muscle(-format =>  
> 'clustalw',  -verbose=>'', -quiet=>'', -log='inv.log');
> 
> The program runs and produces the result file but it doesn't create a  
> log file nor does it stop sending output to STDOUT (-quiet).
> Could anybody help me with this?

Muscle's own parameters are not given with leading dashes, and to make a
switch active you need to set it to some true value. So (-verbose => 1,
quiet => 1, log => 'inv.log'). Note that verbose may not do what you want,
since it is both a BioPerl option (-verbose) and a Muscle option; if you
want the latter, try using verbose => 1 (no dash).


From bix at sendu.me.uk  Wed Dec 20 14:51:33 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 20 Dec 2006 14:51:33 +0000
Subject: [Bioperl-l] suggestions for suitable 'taxon' object
In-Reply-To: <4185E59B-C0DA-49B8-8D71-11183A091FBF@uiuc.edu>
References: <200612200423.kBK4NKDt009254@rm-rstar.sfu.ca>
	<4185E59B-C0DA-49B8-8D71-11183A091FBF@uiuc.edu>
Message-ID: <45894DF5.1060503@sendu.me.uk>

Chris Fields wrote:
> On Dec 19, 2006, at 10:23 PM, Rutger Vos wrote:
> 
>> Hi all,
>> 
>> I am looking for a bioperl object that can be abused to function as
>> a suitable 'taxon' object, where I mean 'taxon' as understood by
>> the NEXUS file format (i.e. not strictly an entity from a taxonomy,
>> but more loosely an OTU).
>> 
>> The object would primarily function as a way to relate nodes in 
>> trees to sequences in an alignment (a foreign key that both nodes
>> and sequences refer to), and secondarily as the keeper of the
>> canonical name of the OTU, such that a sequence named
>> 'Homo_sapiens|EF177447.1/12-56' and a node named 'Homo sapiens
>> (constrained monophyly)' can still be understood to refer to the 
>> same thing - the OTU 'Homo sapiens sapiens' (for example).

I haven't had time to give your suggestions consideration, but I can say 
that I'm having to do the same thing for a bioperl-run module and my 
work-around is simply to set a custom name on my Bio::Taxon objects. To 
explain, I have the benefit that my tree is made up of Bio::Taxon 
objects, so I call $taxon->name('seq_id', $seq->id). Then when I want to 
know which of my sequences corresponds to a particular taxon, I work out 
which of them has the id given by shift @{$taxon->name('seq_id')}.

Hardly ideal, but it works for now.


>> I was thinking that a (possibly expanded) Bio::Species might work
>>  if there was some sensible way of appending references to node and
>> sequence objects to it (or otherwise associate them with each
>> other), but I am writing *to solicit any and all suggestions*. I am
>> looking for something similar to Bio::Phylo::Taxa::Taxon.
>
> Sendu would be the best one to speak about Bio::Taxon and 
> Bio::Species and may have some ideas on the above.  The current plan
> was to deprecate Bio::Species, but who knows?

Given that we do plan to deprecate Bio::Species, I'd resist the 
temptation to expand on it. Use Bio::Taxon as a base if it has stuff you 
need, or base straight from Bio::Tree::Node if not.


From ferraria at gmail.com  Wed Dec 20 15:40:34 2006
From: ferraria at gmail.com (Anthony Ferrari)
Date: Wed, 20 Dec 2006 16:40:34 +0100
Subject: [Bioperl-l] Problem with : EUtilities - Proxy
In-Reply-To: <13761416-E03F-46E7-BB43-E5FDA7F9C281@uiuc.edu>
References: <b2ec54b90612190840r24fe1aa5ncb9c9def040aed49@mail.gmail.com>
	<6365ACFD-7F5A-4EF1-97EA-BB53A58B9B4D@uiuc.edu>
	<b2ec54b90612200304r56e1ba5o87963494875c1c43@mail.gmail.com>
	<13761416-E03F-46E7-BB43-E5FDA7F9C281@uiuc.edu>
Message-ID: <b2ec54b90612200740x49b3d9d8qa8c01569b63cbdc4@mail.gmail.com>

Defining a "no_proxy" environment variable in my '.bashrc' file solved my
problem. I set it to "localhost".

It does indeed seem to correspond to the line
"if (@{ $self->{'no_proxy'} })" (I guess!)


I really don't know why we are compelled to do this, but let's say that's
the way it is.
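A sketch of making this workaround persistent without appending a duplicate line every time it is run. The function name add_no_proxy_to_rc is hypothetical, and it assumes a bash-style startup file such as ~/.bashrc. Later in this thread the need for no_proxy is traced to a bug, so this should be treated as a stopgap.

```shell
# Append 'export no_proxy="HOSTS"' to a shell startup file, but only once.
# Usage: add_no_proxy_to_rc FILE [HOSTS]   (HOSTS defaults to "localhost")
add_no_proxy_to_rc() {
  rc_file=$1
  hosts=${2:-localhost}
  touch "$rc_file" || return 1
  # Skip if the file already exports no_proxy.
  if ! grep -q '^export no_proxy=' "$rc_file"; then
    printf 'export no_proxy="%s"\n' "$hosts" >> "$rc_file"
  fi
}

# Example (left commented out so nothing is modified by accident):
# add_no_proxy_to_rc "$HOME/.bashrc" localhost
```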

It works now !

Thanks a lot.

Tony


From cjfields at uiuc.edu  Wed Dec 20 16:10:48 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 20 Dec 2006 10:10:48 -0600
Subject: [Bioperl-l] Problem with : EUtilities - Proxy
In-Reply-To: <b2ec54b90612200740x49b3d9d8qa8c01569b63cbdc4@mail.gmail.com>
Message-ID: <007901c72451$6ad540a0$15327e82@pyrimidine>

Just to clarify: does it work if you don't have any proxy env. settings?
 
chris


  _____  

From: Anthony Ferrari [mailto:ferraria at gmail.com] 
Sent: Wednesday, December 20, 2006 9:41 AM
To: Chris Fields
Cc: bioperl-l List; Torsten Seemann
Subject: Re: [Bioperl-l] Problem with : EUtilities - Proxy


Defining a "no_proxy" environment variable in my '.bashrc' file solved my
problem. I set it to "localhost".

It indeed corresponds to the line...       [    ...if (@{
$self->{'no_proxy'} }) ...    ]   (I guess!) 


I really don't know why we are compelled to do this, but let's say that's
the way it is.

It works now !

Thanks a lot.

Tony


From ferraria at gmail.com  Wed Dec 20 16:59:49 2006
From: ferraria at gmail.com (Anthony Ferrari)
Date: Wed, 20 Dec 2006 17:59:49 +0100
Subject: [Bioperl-l] Problem with : EUtilities - Proxy
In-Reply-To: <007901c72451$6ad540a0$15327e82@pyrimidine>
References: <b2ec54b90612200740x49b3d9d8qa8c01569b63cbdc4@mail.gmail.com>
	<007901c72451$6ad540a0$15327e82@pyrimidine>
Message-ID: <b2ec54b90612200859w225df7qc35f1060f04eb452@mail.gmail.com>

First, I have a $http_proxy env. variable that was automatically defined by
the BioPerl installation (I don't define and export it in my .bash_profile).
So when I log in,             $http_proxy=http://ip_adress:port/

Next step: two solutions:
1) Define a $no_proxy env. variable in my .bash_profile.
It can be set to 'whatever'.

2) If I do not define '$no_proxy', then to make it work I must call the
no_proxy() method on each Bio::DB::EUtilities object I create before I
call the get_response() method on it.

(The bug is in the 'get_response' call.)

And finally, without 1) or 2) it doesn't work.

Tony

On 20/12/06, Chris Fields <cjfields at uiuc.edu> wrote:
>
>  Just to clarify: does it work if you don't have any proxy env. settings?
>
> One thing I didn't point out previously is that Bio::DB::GenericWebDBI
> inherits LWP::UserAgent.  You should be able to use $eutil->no_proxy()
> instead of setting it in your env.
> chris
>


From cjfields at uiuc.edu  Wed Dec 20 18:28:09 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 20 Dec 2006 12:28:09 -0600
Subject: [Bioperl-l] Problem with : EUtilities - Proxy
In-Reply-To: <b2ec54b90612200859w225df7qc35f1060f04eb452@mail.gmail.com>
Message-ID: <000301c72464$9a12a070$15327e82@pyrimidine>


> First, I got a $http_proxy env. variable automatically 
> defined by the BioPerl installation (I don't define and 
> export it in my .bash_profile).
> So when I'm logging in,             $http_proxy=http://ip_adress:port/

BioPerl can't permanently set any env. variables out of the box since that
would mean modifying your local .bash_profile or the system profile.  If
you're a user on a system where you're not the sysadmin, then it's more
likely the sysadmin has set up user accounts with an already-defined
$http_proxy variable in the system .bash_profile (which is passed on to all
users).  

The problem I can see (going by what you have above) is there is no
username/password defined, only the address (IP:Port).  I am assuming LWP is
expecting some form of authentication when a proxy is env. defined w/o
username/password included.  If so, you'll have to supply those yourself,
either by redefining $http_proxy to include it in your local .bash_profile,

export http_proxy='http://USER:PASS@proxy.monash.edu.au:80/'

by using $agent->proxy() for including all proxy information, or by using
$agent->authentication() so that a proxy can authorize any outgoing/incoming
requests.  The first may be preferable if you are able to do so since you
wouldn't have to authenticate every agent.
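Following the first suggestion, a minimal sketch of building such a URL in sh/bash. The function name make_proxy_url and all values are hypothetical placeholders. Credentials containing characters such as @, : or / would need to be percent-encoded first, which this sketch does not do, and storing a password in a startup file that others can read is a security risk. As pointed out later in the thread, credentials are only needed if the proxy actually requires authentication.

```shell
# Build an http_proxy URL of the form http://USER:PASS@HOST:PORT/
# Usage: make_proxy_url USER PASS HOST PORT
# (No percent-encoding is done; plain alphanumeric credentials assumed.)
make_proxy_url() {
  printf 'http://%s:%s@%s:%s/\n' "$1" "$2" "$3" "$4"
}

# Hypothetical values, for illustration only.
export http_proxy="$(make_proxy_url alice s3cret proxy.example.org 80)"
```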

Note that this would also explain why you had an LWP problem with an
undefined array ref: the LWP agent is likely expecting some form of
authentication, probably in the form [username, password], if a proxy env.
variable is found.

> Next step :  two solutions :
> 1) defining an $no_proxy env.variable in my .bash_profile.
> It can be set to 'whatever'.
> 
> 2) If I do not define '$no_proxy'; to make it work, I must call the
> no_proxy() method on each Bio::DB::EUtilities object I create 
> before I can call the get_response() method on it.
> 
> (The bug is in the 'get_response' call)

If you mean when the request is calling proxy_authorization_basic(), that's
not a bug.  If we didn't authorize then it likely wouldn't work for properly
set up proxies (Torsten's worked).  Note that it's supposed to be passing a
username/password from $self->authentication().  

The fact that you can set $no_proxy to anything suggests there is no proxy
in place.  
 
> And finally without 1) or 2) it doesn't work.
> 
> Tony

We can't guarantee that defining no_proxy will always work on your system,
either.  It's possible a proxy was added systemwide but a firewall hasn't
been put in place yet; once it goes up and all requests need to be
authorized, then you'll run into problems again.  Conversely, maybe this was
defined at some point systemwide in the .bash_profile but wasn't removed.
The only one who would know is the sysadmin.

If you aren't the sysadmin, you can contact them to find out about how to
properly set up your proxy, or whether it is even necessary (maybe they
neglected to remove the proxy definition from the system .bash_profile).
Who knows?  
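Before contacting the sysadmin, one way to find out where a login-time $http_proxy comes from is to search the usual startup files. This is a sketch: the function name find_proxy_definitions is hypothetical, and the default file list is a guess that varies between distributions and shells.

```shell
# Print "file:line:text" for every line mentioning "proxy" (any case).
# Usage: find_proxy_definitions [FILE...]
# With no arguments, a guessed list of common startup files is searched.
find_proxy_definitions() {
  if [ "$#" -eq 0 ]; then
    set -- /etc/profile /etc/bash.bashrc /etc/environment \
           "$HOME/.bash_profile" "$HOME/.bashrc" "$HOME/.profile"
    for f in /etc/profile.d/*.sh; do
      if [ -e "$f" ]; then set -- "$@" "$f"; fi
    done
  fi
  for f in "$@"; do
    if [ -r "$f" ]; then
      # -i: ignore case, -n: line numbers, -H: always print the file name
      grep -inH 'proxy' "$f" || true
    fi
  done
}

find_proxy_definitions
```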

chris


From bix at sendu.me.uk  Wed Dec 20 21:03:03 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 20 Dec 2006 21:03:03 +0000
Subject: [Bioperl-l] Problem with : EUtilities - Proxy
In-Reply-To: <000301c72464$9a12a070$15327e82@pyrimidine>
References: <000301c72464$9a12a070$15327e82@pyrimidine>
Message-ID: <4589A507.60106@sendu.me.uk>

Chris Fields wrote:
>> First, I got a $http_proxy env. variable automatically 
>> defined by the BioPerl installation (I don't define and 
>> export it in my .bash_profile).
>> So when I'm logging in,             $http_proxy=http://ip_adress:port/
> 
> BioPerl can't permanently set any env. variables out of the box since

True, and it doesn't try to set one temporarily either.

To clarify some of the other points Chris made, the proxy variable 
certainly doesn't need username and password to be defined (from LWP's 
point of view), since not all proxies authenticate. Of course accesses 
won't work if authentication is actually required and these aren't set.

There's no reason that no_proxy should have to be set. It is used to say 
what domains shouldn't be proxied. Either this is a real LWP bug, or 
somehow EUtilities or one of its bases is doing something wrong. It 
should be investigated...

It would be very informative if Anthony could log in when he hasn't done 
anything to his environment variables (and so where the original problem 
manifests) and give us the results of:

perl -e 'while (($key, $val) = each %ENV) { print "$key => $val\n" }'
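If the full dump is long, or contains values you would rather not post to a list, a narrower check might look like the sketch below. The function name show_proxy_env is hypothetical. It prints only *_proxy variables, in either case, and masks any password embedded in a USER:PASS@ proxy URL.

```shell
# Print every *_proxy / *_PROXY variable, hiding passwords in USER:PASS@ URLs.
show_proxy_env() {
  env | grep -i '_proxy=' | sed -E 's#(://[^:/@]+:)[^@]*@#\1****@#'
}

show_proxy_env
```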


From avilella at gmail.com  Wed Dec 20 14:07:17 2006
From: avilella at gmail.com (Albert Vilella)
Date: Wed, 20 Dec 2006 14:07:17 +0000
Subject: [Bioperl-l] Using Muscle parameter within bioperl
In-Reply-To: <A8ACF950-F40E-4E8C-927E-23D2391E5074@vbi.vt.edu>
References: <A8ACF950-F40E-4E8C-927E-23D2391E5074@vbi.vt.edu>
Message-ID: <358f4d650612200607m4324b8f1r91d2d917cd4951bd@mail.gmail.com>

Try something like:

my @params =('verbose'=>0, 'quiet'=>1, 'log'=>'/tmp/inv.log');
my $factory = Bio::Tools::Run::Alignment::Muscle->new(@params);

it works for me with muscle 3.6. The log only gives me a start,
commandstring and end. I dunno if that is what muscle is supposed to
spit out.

    Albert.

On 12/19/06, Shrinivasrao P. Mane <smane at vbi.vt.edu> wrote:
> Hi,
> I need to run muscle using bioperl. This is how I do it in command line.
>
> muscle -in inv.fasta -out inv.aln -log inv.log -verbose -quiet
>
> I used the following in perl script
>
> my $muscle = new Bio::Tools::Run::Alignment::Muscle(-format =>
> 'clustalw',  -verbose=>'', -quiet=>'', -log='inv.log');
>
> The program runs and produces the result file but it doesn't create a
> log file nor does it stop sending output to STDOUT (-quiet).
> Could anybody help me with this?
> Thanks
> Mane


From cjfields at uiuc.edu  Wed Dec 20 22:46:35 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 20 Dec 2006 16:46:35 -0600
Subject: [Bioperl-l] Problem with : EUtilities - Proxy
In-Reply-To: <4589A507.60106@sendu.me.uk>
Message-ID: <000c01c72488$b6a690b0$15327e82@pyrimidine>


> Chris Fields wrote:
> >> First, I got a $http_proxy env. variable automatically defined by the
> >> BioPerl installation (I don't define and export it in my
> >> .bash_profile).
> >> So when I'm logging in,             $http_proxy=http://ip_adress:port/
> > 
> > BioPerl can't permanently set any env. variables out of the box since
> 
> True, and it doesn't try to set one temporarily either.
> 
> To clarify some of the other points Chris made, the proxy 
> variable certainly doesn't need username and password to be 
> defined (from LWP's point of view), since not all proxies 
> authenticate. Of course accesses won't work if authentication 
> is actually required and these aren't set.
>
> There's no reason that no_proxy should have to be set. It is 
> used to say what domains shouldn't be proxied. Either this is 
> a real LWP bug, or somehow EUtilities or one of its bases is 
> doing something wrong. It should be investigated...

Actually, after some investigation I reproduced the error and committed a fix.

If I set (on WinXP) HTTP_PROXY to a dummy value I get the same error:

Can't use an undefined value as an ARRAY reference at
C:/Perl/lib/LWP/UserAgent.pm line 787.

It's EUtilities-specific as other WebAgents that have proxy settings do not
have the same problem, though I haven't checked any WebAgent-based classes.
I think this may also partly be an LWP bug as setting env_proxy to
TRUE/FALSE doesn't seem to have an effect, but instantiating with it
(env_proxy => 1) in the constructor fixes the problem.  Anthony, I have
committed a fix to CVS to GenericWebDBI and EUtilities.  Could you try it
out?
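Sendu's point about no_proxy can be illustrated with a minimal core-Perl sketch of how a client might pick a proxy from the environment. `proxy_for` is a hypothetical helper, not LWP::UserAgent's code: it reads `http_proxy` (falling back to `HTTP_PROXY`), ignores a value that isn't an http:// URL (such as the dummy value above), and skips hosts covered by a comma-separated `no_proxy` list. In LWP itself, these settings are loaded by `env_proxy`, either as a method or as `env_proxy => 1` in the constructor.

```perl
use strict;
use warnings;

# Hypothetical helper, not LWP's implementation: choose the proxy URL
# for $host from a copy of the environment, honouring no_proxy.
sub proxy_for {
    my ($host, %env) = @_;
    # lower-case variable first, then the upper-case one (as set on WinXP)
    my $proxy = $env{http_proxy} // $env{HTTP_PROXY};
    return unless defined $proxy;
    # a value that is not an http(s)://host[:port]/ URL is ignored here
    return unless $proxy =~ m{^https?://[^/:\s]+(?::\d+)?/?$};
    my $no_proxy = $env{no_proxy} // $env{NO_PROXY} // '';
    for my $domain (grep { length } map { s/^\s+|\s+$//gr } split /,/, $no_proxy) {
        # "nih.gov" exempts nih.gov itself and every *.nih.gov host
        my $d = $domain =~ s/^\.//r;
        return if $host eq $d || $host =~ /\.\Q$d\E$/;
    }
    return $proxy;
}

my %env = (
    http_proxy => 'http://proxy.example.org:8080/',
    no_proxy   => 'localhost, nih.gov',
);
for my $host ('eutils.ncbi.nlm.nih.gov', 'www.ebi.ac.uk') {
    my $p = proxy_for($host, %env);
    printf "%-25s %s\n", $host, defined $p ? $p : 'direct';
}
```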

-chris


From cjfields at uiuc.edu  Wed Dec 20 23:19:59 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 20 Dec 2006 17:19:59 -0600
Subject: [Bioperl-l] Problem with : EUtilities - Proxy
In-Reply-To: <000301c72464$9a12a070$15327e82@pyrimidine>
Message-ID: <000001c7248d$5e7df450$15327e82@pyrimidine>

> > First, I got a $http_proxy env. variable automatically defined by the
> > BioPerl installation (I don't define and export it in my .bash_profile).
> > So when I'm logging in, $http_proxy=http://ip_adress:port/

Anthony,

Sorry about the prior long-winded response.  I managed to reproduce the
error about five minutes after I responded and traced the problem back to
GenericWebDBI.  The issue seems to be that the LWP::UserAgent env_proxy
method doesn't take effect when called after instantiation; setting it to
0 or 1 doesn't seem to do anything.  If I add it to the list of args for
chained instantiation in the constructor:

    my $self = $class->SUPER::new(@args, env_proxy => 1);

it suddenly works like a charm.  Hard to know why it's being fussy...
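The fix relies on ordinary Perl argument passing. A minimal sketch with two hypothetical classes standing in for LWP::UserAgent and GenericWebDBI (neither is the real code) shows the effect of appending `env_proxy => 1` after `@args`: assuming the base constructor turns its arguments into a hash, as the stand-in does, the later key wins, so the subclass always enables the option even if a caller passes `env_proxy => 0`.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a base class such as LWP::UserAgent.
package My::Agent;
sub new {
    my ($class, %args) = @_;
    # defaults first, caller's arguments override them
    return bless { env_proxy => 0, timeout => 180, %args }, $class;
}

# Hypothetical stand-in for a subclass such as GenericWebDBI.
package My::WebDBI;
our @ISA = ('My::Agent');
sub new {
    my ($class, @args) = @_;
    # The pair appended after @args overwrites any env_proxy value the
    # caller supplied, because later hash keys replace earlier ones.
    return $class->SUPER::new(@args, env_proxy => 1);
}

package main;

my $agent = My::WebDBI->new(timeout => 30, env_proxy => 0);
print "env_proxy=$agent->{env_proxy} timeout=$agent->{timeout}\n";
```

Putting the pair before `@args` instead would make it a default that callers could still override.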

I'm going to try reproducing this on a few platforms and check to see if it
has been reported as an LWP bug.  I have also committed a fix to CVS if you
want to test it out.

Chris


From jnewcomer at jhu.edu  Thu Dec 21 01:56:10 2006
From: jnewcomer at jhu.edu (Joe Newcomer)
Date: Wed, 20 Dec 2006 20:56:10 -0500
Subject: [Bioperl-l]  a stupid question
Message-ID: <002101c724a3$2ff80100$bd59dc80@aap.jhu.edu>

Hello Paul Leo,
I am with Johns Hopkins University Advanced Academic Programs.  I am trying
to contact a student named Paul Leo who has registered for Protein
Bioinformatics.  If this is you please email me.  I would like to send you
information about the spring course.

Respectfully, 
Joe Newcomer  (410) 516-5047
Online Education


From anhthu.tieu at gsf.de  Thu Dec 21 10:10:47 2006
From: anhthu.tieu at gsf.de (Anh-Thu Tieu)
Date: Thu, 21 Dec 2006 11:10:47 +0100
Subject: [Bioperl-l] imagemaps with heterogeneous_segments
Message-ID: <458A5DA7.1010802@gsf.de>

Hi,

I use bioperl 1.5.2 and have been wondering whether it is possible to
apply the image_and_map function with the glyph option
"heterogeneous_segments". So far I can successfully create an
underlying imagemap for the entire track. However, what I want is to
create an imagemap for each single segment on my track/glyph. Does
anyone know how to achieve this? Any help is appreciated.
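One way to think about per-segment imagemaps: with one bounding box per segment, each box becomes its own `<area>` element. The core-Perl sketch below assumes hard-coded, hypothetical `[name, x1, y1, x2, y2]` boxes; in Bio::Graphics they would have to come from the panel (for example its `boxes` method, assumed here to report one box per rendered feature). `image_map` and `html_escape` are hypothetical helpers, not part of Bio::Graphics.

```perl
use strict;
use warnings;

# Minimal HTML escaping for attribute values.
sub html_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/"/&quot;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    return $s;
}

# Build one <area> per box; $href_for maps a segment name to a link.
sub image_map {
    my ($map_name, $href_for, @boxes) = @_;
    my @areas = map {
        my ($name, $x1, $y1, $x2, $y2) = @$_;
        sprintf '<area shape="rect" coords="%d,%d,%d,%d" href="%s" title="%s" />',
            $x1, $y1, $x2, $y2,
            html_escape($href_for->($name)), html_escape($name);
    } @boxes;
    return join "\n", qq{<map name="$map_name">}, @areas, '</map>';
}

# Hypothetical segment boxes in pixel coordinates.
my @boxes = (
    [ 'segment_A', 10, 20, 60,  30 ],
    [ 'segment_B', 80, 20, 140, 30 ],
);
print image_map('track1', sub { "details.cgi?name=$_[0]" }, @boxes), "\n";
```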

Thanks a lot.

Anh Thu



From somil.sharma1 at gmail.com  Thu Dec 21 06:22:24 2006
From: somil.sharma1 at gmail.com (Somil Sharma)
Date: Thu, 21 Dec 2006 14:22:24 +0800
Subject: [Bioperl-l] problem
Message-ID: <4e6b524e0612202222t569cba11h3c10c9c11e64185f@mail.gmail.com>

Hello,

I run this program:

#!/use/bin/perl

use Bio::DB::GenBank;

$gb = new Bio::DB::GenBank;
$seq1 = $gb->get_Seq_by_id('MUSIGHBA1');
print $seq1;

and got this error on the command line:

---------- EXCEPTION  -------------
MSG: WebDBSeqI Request Error:
500 Can't connect to eutils.ncbi.nlm.nih.gov:80 (connect: Unknown error)
Content-Type: text/plain
Client-Date: Thu, 21 Dec 2006 06:28:33 GMT
Client-Warning: Internal response

500 Can't connect to eutils.ncbi.nlm.nih.gov:80 (connect: Unknown error)

STACK Bio::DB::WebDBSeqI::_request C:/Perl/lib/Bio/DB/WebDBSeqI.pm:685
STACK Bio::DB::WebDBSeqI::get_seq_stream C:/Perl/lib/Bio/DB/WebDBSeqI.pm:491
STACK Bio::DB::WebDBSeqI::get_Stream_by_id C:/Perl/lib/Bio/DB/WebDBSeqI.pm:27
STACK Bio::DB::WebDBSeqI::get_Seq_by_id C:/Perl/lib/Bio/DB/WebDBSeqI.pm:145
STACK toplevel C:\Perl\a2.pl:5

Please see if you can help me out.

My ppm is also not able to install BioPerl, so I installed that manually too.

Waiting for your reply.


From granjeau at tagc.univ-mrs.fr  Thu Dec 21 11:14:25 2006
From: granjeau at tagc.univ-mrs.fr (Samuel GRANJEAUD - IR/IFR137)
Date: Thu, 21 Dec 2006 12:14:25 +0100
Subject: [Bioperl-l] BioFetch: Adding databases
Message-ID: <458A6C91.7090000@tagc.univ-mrs.fr>

Hello!

I needed to query the Unisave database at EBI. To date, the only way
to access it is the dbfetch web service
(http://www.ebi.ac.uk/cgi-bin/dbfetch). This database is not yet defined
in the BioFetch package
(http://doc.bioperl.org/bioperl-live/Bio/DB/BioFetch.html). I wrote
these few lines to make it work, but I don't think it is good
programming practice. Maybe it makes sense to define a method to add
databases to FORMATMAP, in order to keep up with changes to the dbfetch
service.

Cheers,
--Samuel

use strict;
use warnings;
use Bio::DB::BioFetch;

# Register 'unisave' by hand in BioFetch's format table.
$Bio::DB::BioFetch::FORMATMAP{unisave} = {
    default   => 'swiss',
    swissprot => 'swiss',
    fasta     => 'fasta',
    namespace => 'unisave',
};
my $bf  = Bio::DB::BioFetch->new(-db => 'unisave');
my $seq = $bf->get_Seq_by_id('LAM1_MOUSE');

print $seq->display_id(), "\n";
print $seq->seq(), "\n";
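The suggested method for adding databases could look roughly like this. `add_database` is hypothetical (it is not a Bio::DB::BioFetch method) and works on a plain hash standing in for `%Bio::DB::BioFetch::FORMATMAP`; it requires a `default` format, refuses to overwrite an existing entry, and fills in `namespace` from the database name.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::DB::BioFetch): add a database
# entry to a FORMATMAP-style hash.
sub add_database {
    my ($map, $db, %formats) = @_;
    die "database '$db' needs a default format\n"
        unless defined $formats{default};
    die "database '$db' is already defined\n" if exists $map->{$db};
    $formats{namespace} //= $db;    # namespace defaults to the db name
    $map->{$db} = {%formats};
    return $map->{$db};
}

my %formatmap;    # stands in for %Bio::DB::BioFetch::FORMATMAP
add_database(\%formatmap, 'unisave',
    default   => 'swiss',
    swissprot => 'swiss',
    fasta     => 'fasta',
);
print "$_ => $formatmap{unisave}{$_}\n" for sort keys %{ $formatmap{unisave} };
```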


From cain at cshl.edu  Thu Dec 21 13:56:21 2006
From: cain at cshl.edu (Scott Cain)
Date: Thu, 21 Dec 2006 08:56:21 -0500
Subject: [Bioperl-l] problem
In-Reply-To: <4e6b524e0612202222t569cba11h3c10c9c11e64185f@mail.gmail.com>
References: <4e6b524e0612202222t569cba11h3c10c9c11e64185f@mail.gmail.com>
Message-ID: <1166709381.3739.47.camel@localhost.localdomain>

Hello,

It looks to me like you have a networking problem that doesn't have
anything to do with BioPerl.  When I run your script, I get:

Bio::Seq::RichSeq=HASH(0x97013e0)

Fairly quickly, too.  Can you get to http://eutils.ncbi.nlm.nih.gov/ in
a browser without proxy settings?

As an aside, you probably don't really want the HASH stuff above, so I
modified your script to look like this, with warnings and strict to make
future debugging easier:

#!/usr/bin/perl -w
use strict;

use Bio::DB::GenBank;

my $gb = Bio::DB::GenBank->new;
my $seq1 = $gb->get_Seq_by_id('MUSIGHBA1');
print $seq1->seq, "\n";
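Why `print $seq1` shows `Bio::Seq::RichSeq=HASH(0x97013e0)`: printing a blessed reference stringifies the reference itself, not the sequence it holds. A core-Perl sketch with a hypothetical `My::Seq` class standing in for Bio::Seq::RichSeq:

```perl
use strict;
use warnings;

# Hypothetical stand-in for a sequence class such as Bio::Seq::RichSeq.
package My::Seq;
sub new { my ($class, %args) = @_; return bless { seq => $args{-seq} }, $class }
sub seq { return $_[0]{seq} }

package main;

my $seq1 = My::Seq->new(-seq => 'ACGTACGT');

# Interpolating the object stringifies the blessed reference,
# e.g. "My::Seq=HASH(0x...)"; the address differs from run to run.
print "$seq1\n";

# Calling an accessor returns the data you actually want.
print $seq1->seq, "\n";
```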


Scott


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061221/f63031e2/attachment.sig>

From cjfields at uiuc.edu  Thu Dec 21 14:28:07 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 21 Dec 2006 08:28:07 -0600
Subject: [Bioperl-l] BioFetch: Adding databases
In-Reply-To: <458A6C91.7090000@tagc.univ-mrs.fr>
References: <458A6C91.7090000@tagc.univ-mrs.fr>
Message-ID: <193C6D1C-6374-4A86-9FBD-7FA994D5FDDF@uiuc.edu>

I've added this to the BioFetch FORMATMAP as 'unisave' and committed  
to CVS.  Thanks!

chris


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From anhthu.tieu at gsf.de  Thu Dec 21 14:31:45 2006
From: anhthu.tieu at gsf.de (Anh-Thu Tieu)
Date: Thu, 21 Dec 2006 15:31:45 +0100
Subject: [Bioperl-l] multiple glyph elements in one track
Message-ID: <458A9AD1.50907@gsf.de>

Hello,

I use bioperl 1.5.2. Does anyone know how I could create two separate
glyph elements on the same track with the Bio::Graphics::Panel module?
My aim is to have two (i.e. two different) clickable imagemap elements
on the same track. Until now I can merely create two glyph elements
(transcript2 or generic options) per track with only one imagemap
element (i.e. the same imagemap element is used for the entire track, as
the coordinates of the entire glyph (= both elements) are returned to the
image_and_map function as one set of coordinates).

Thank you for your help.

Best regards,

Anh Thu


From cain at cshl.edu  Thu Dec 21 14:47:32 2006
From: cain at cshl.edu (Scott Cain)
Date: Thu, 21 Dec 2006 09:47:32 -0500
Subject: [Bioperl-l] multiple glyph elements in one track
In-Reply-To: <458A9AD1.50907@gsf.de>
References: <458A9AD1.50907@gsf.de>
Message-ID: <1166712453.3739.53.camel@localhost.localdomain>

Hello Anh Thu,

You can provide a callback for the glyph argument that returns different
glyphs depending on what you want to do (ie, how you've coded your
callback).  This example is from the perldoc for Bio::Graphics::Panel:

        $panel->add_track(\@exons,
                          -glyph => sub { my $feature = shift;
                                          $feature->source_tag eq 'curated'
                                                    ? 'ellipse' : 'generic'; }
                         );
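The callback is just a code reference that receives each feature as its first argument and returns a glyph name. The sketch below calls such a callback directly on two features of a hypothetical `My::Feature` class (standing in for real Bio::SeqFeatureI objects) to show which glyph each would get; it does not involve Bio::Graphics itself.

```perl
use strict;
use warnings;

# Hypothetical minimal feature class; real code would pass
# Bio::SeqFeatureI objects such as Bio::SeqFeature::Generic.
package My::Feature;
sub new        { my ($class, %a) = @_; return bless {%a}, $class }
sub source_tag { return $_[0]{source_tag} }
sub name       { return $_[0]{name} }

package main;

# Same shape as the perldoc callback: feature in, glyph name out.
my $glyph_for = sub {
    my $feature = shift;
    return $feature->source_tag eq 'curated' ? 'ellipse' : 'generic';
};

my @exons = (
    My::Feature->new(name => 'exon1', source_tag => 'curated'),
    My::Feature->new(name => 'exon2', source_tag => 'predicted'),
);

# Choose one glyph per feature, as a per-feature callback allows.
printf "%s -> %s\n", $_->name, $glyph_for->($_) for @exons;
```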

Scott

 
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From cain.cshl at gmail.com  Thu Dec 21 20:03:48 2006
From: cain.cshl at gmail.com (Scott Cain)
Date: Thu, 21 Dec 2006 15:03:48 -0500
Subject: [Bioperl-l] problems installing bioperl
In-Reply-To: <1166729231.458ae00ff184b@www.studentmail.otago.ac.nz>
References: <1166519755.4587adcb141d3@www.studentmail.otago.ac.nz>
	<45880167.9010605@sendu.me.uk>
	<1166542310.6981.119.camel@localhost.localdomain>
	<1166604008.4588f6e87cccc@www.studentmail.otago.ac.nz>
	<1166621113.3739.11.camel@localhost.localdomain>
	<1166642653.45898dddbd8cf@www.studentmail.otago.ac.nz>
	<1166643051.3739.28.camel@localhost.localdomain>
	<1166729231.458ae00ff184b@www.studentmail.otago.ac.nz>
Message-ID: <1166731428.3739.71.camel@localhost.localdomain>

Hi Stephan,

About your bioperl mail: did you cancel it, or did it just disappear?
If the latter, I might have accidentally deleted it, sorry :-/

So 'GBrowse is running' means that you can see the sample yeast chr1
database, browse around, etc, right?  I still don't know what is up with
the warning but my guess is that everything is working there.

As for your question about writing a callback, the reason it's not
working is that the attributes method returns a list (typically but not
always with only one element), so your test is really comparing the
number of elements in the list with 1200 ("number of elements > 1200"),
which is almost always going to be false.  You should change it to this:

  my $feature = shift;
  my ($score) = $feature->attributes('score');
  if ($score > 1200) {
  ...etc...
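The list-versus-scalar context issue can be reproduced without GBrowse. In this sketch `attributes` is a hypothetical stand-in that, like the method described above, returns a list: in scalar context (the `>` comparison) it yields the element count, while `my ($score) = ...` captures the first value.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $feature->attributes('score'):
# it returns a list of values for the requested tag.
sub attributes {
    my ($tag) = @_;
    my %attr = (score => [1500]);
    return @{ $attr{$tag} || [] };
}

# Scalar context: the returned array evaluates to its element count,
# so this compares 1 > 1200, which is false.
my $count_test = attributes('score') > 1200 ? 'blue' : 'pink';

# List context: the parentheses capture the first value (1500).
my ($score) = attributes('score');
my $value_test = $score > 1200 ? 'blue' : 'pink';

print "count test: $count_test, value test: $value_test\n";
```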

Finally, if you really want to test that you are using the correct
bioperl, you can put this simple cgi in your cgi-bin directory as
test_biographics.pl, set it as world executable and go to
http://localhost/cgi-bin/test_biographics.pl (and, yes, I use strict and
warnings even when the script is only 10 lines long :-)  :

#!/usr/bin/perl
use strict;
use warnings;
use Bio::Graphics::Panel;
use CGI qw/:standard/;

print header(),
      start_html,
      p("Bio::Graphics::Panel api_version is ".Bio::Graphics::Panel->api_version),
      p("It should be 1.654 for BioPerl 1.5.2"),
      end_html;
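A command-line counterpart to the CGI check: this core-Perl sketch loads a module and reports the file perl actually picked from `@INC` and the version it declares, which helps when more than one BioPerl copy is installed. `where_is` is a hypothetical helper; the demo uses File::Spec only because it ships with perl, and on a BioPerl machine you would pass e.g. Bio::Graphics::Panel or Bio::Root::Version instead.

```perl
use strict;
use warnings;

# Hypothetical helper: load a module, then report the file perl used
# (from %INC) and the module's declared version.
sub where_is {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;
    require $file;
    my $version = $module->VERSION;
    return ($INC{$file}, defined $version ? $version : 'undef');
}

my ($path, $version) = where_is('File::Spec');
print "File::Spec $version loaded from $path\n";
```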

Scott


On Fri, 2006-12-22 at 08:27 +1300, Stephan Roessner wrote:
> Hi Scott,
> 
> I responded to the group but it didn't get through,
> so I'm replying to you directly.
> 
> I installed Class-Base-0.03 using CPAN.
> 
> Reinstalling GBrowse still gives me a warning like:
> Warning: prerequisite Bio::Perl 1.52 not found. We have 1.0050021.
> Writing Makefile for Bio::Graphics::Browser::CAlign
> Writing Makefile for Generic-Genome-Browser.
> 
> GBrowse is running but I cannot access attributes and/or the score column
> of .gff files. Is this related to the warning?
> 
> To get an attribute I use
> 
> my $feature = shift;
>                 if ($feature->attributes('score') > 1200) {
>                   return 'blue';
>                 } else {
>                   return 'pink';
>                 }
> But I don't retrieve any data using $feature->
> 
> Can I somehow verify what bioperl version GBrowse is using?
> 
> Stephan,
> 
> 
> 
> Quoting Scott Cain <cain.cshl at gmail.com>:
> 
> > Stephan,
> >
> > Yes, it is in cpan:
> >
> > http://search.cpan.org/~abw/Class-Base-0.03/lib/Class/Base.pm
> >
> > The cpan shell should be able to install it.
> >
> > Whether or not that works, please respond to the mailing list so that
> > the rest of the conversation can be archived.
> >
> > Scott
> >
> >
> > On Thu, 2006-12-21 at 08:24 +1300, Stephan Roessner wrote:
> > > Hi Scott,
> > >
> > > No I didn't.
> > > I had a look and couldn't find it.
> > > It is not part of CPAN?
> > >
> > > Stephan
> > >
> > >
> > > Quoting Scott Cain <cain.cshl at gmail.com>:
> > >
> > > > Stephan,
> > > >
> > > > Did you install Class::Base?  It was inadvertently left out of the
> > > > install document, but is required.
> > > >
> > > > Scott
> > > >
> > > >
> > > > On Wed, 2006-12-20 at 21:40 +1300, Stephan Roessner wrote:
> > > > > Hi all,
> > > > >
> > > > > I did sudo ./Build install --uninst 1 and got the error
> > > > > * ERROR: Configuration was initially created with Module::Build version
> > > > > '0.2805', but we are now using '0.2806'. ...
> > > > >
> > > > > So I ran perl Build.PL and got the message
> > > > > Creating new 'Build' script for 'bioperl' version '1.0050021'.
> > > > >
> > > > > I ran sudo ./Build install --uninst 1 again.
> > > > > It seems to be fine, with no error messages.
> > > > >
> > > > > When I run perl Makefile.PL for GBrowse 1.66-RC2 it results in
> > > > >
> > > > > Warning: prerequisite Bio::Perl 1.52 not found. We have 1.0050021.
> > > > > Warning: prerequisite Class::Base 0 not found.
> > > > > Writing Makefile for Bio::Graphics::Browser::CAlign
> > > > > Writing Makefile for Generic-Genome-Browser
> > > > >
> > > > > GBrowse is running but I am really having trouble with aggregators
> > > > > trying to use xyplot. It does not plot anything. So I thought
> > > > > BioPerl could be the problem.
> > > > >
> > > > > Stephan
> > > > >
> > > > >
> > > > >
> > > > > Quoting Scott Cain <cain at cshl.edu>:
> > > > >
> > > > > > I really don't think the BioPerl version detection is wrong. I
> > > > > > actually don't check Bio::Root::Version::VERSION in Makefile.PL;
> > > > > > I check Bio::Graphics::Panel->api_version. When it doesn't find
> > > > > > the correct api_version, it gives a warning that BioPerl 1.5.2 is
> > > > > > not installed. I have seen this happen when more than one BioPerl
> > > > > > instance is installed and `perl Makefile.PL` finds the wrong one
> > > > > > first. My suggestion is to try reinstalling BioPerl and providing
> > > > > > the --uninst 1 argument to remove older versions of BioPerl:
> > > > > >
> > > > > >   sudo ./Build install --uninst 1
> > > > > >
> > > > > > Scott
> > > > > >
> > > > > >
> > > > > > On Tue, 2006-12-19 at 15:12 +0000, Sendu Bala wrote:
> > > > > > > Stephan Roessner wrote:
> > > > > > > > Dear support team,
> > > > > > > >
> > > > > > > > I installed bioperl 1.5.2_100 on a Fedora machine to be able
> > > > > > > > to use gbrowse.
> > > > > > > > The installation seems to work (except for the test failures)
> > > > > > > > but the gbrowse installation tells me that Bio::Perl 1.0050021
> > > > > > > > is installed, but of course it requires 1.52.
> > > > > > > >
> > > > > > > > Is there a chance to find out what went wrong?
> > > > > > >
> > > > > > > Nothing went wrong with the Bioperl installation (well, except
> > > > > > > there shouldn't have been any test failures - can you post those
> > > > > > > please?). gbrowse simply defined its Bioperl requirement
> > > > > > > incorrectly. If you tell me exactly where you downloaded gbrowse
> > > > > > > from and how you went about installing it, and provide the
> > > > > > > exact, complete error message you got from it, I might be able
> > > > > > > to help the authors fix the problem.
> > > > > > >
> > > > > > > Or I'm pretty sure they can figure it out for themselves :)
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory


From rvosa at sfu.ca  Sat Dec 23 22:17:37 2006
From: rvosa at sfu.ca (Rutger Vos)
Date: Sat, 23 Dec 2006 14:17:37 -0800
Subject: [Bioperl-l] [Summary] Re: suggestions for suitable 'taxon' object
In-Reply-To: <200612200423.kBK4NKDt009254@rm-rstar.sfu.ca>
References: <200612200423.kBK4NKDt009254@rm-rstar.sfu.ca>
Message-ID: <458DAB01.6080200@sfu.ca>

The replies I've received so far indicate I should look into Bio::Taxon.
I will probably come back with further questions/discussions as to how
to link and cross-reference taxa, sequences and nodes, but for now I
should first look at the Bio::Taxon API (and unpack my other Christmas
gifts). Thank you for all comments and suggestions.
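The "foreign key" idea from the original question can be sketched in core Perl: one taxon object is shared by a tree node and a sequence, so differently labelled items resolve to the same canonical OTU name. `My::OTU` and `My::OTURegistry` are hypothetical classes for illustration only; they are not the Bio::Taxon API.

```perl
use strict;
use warnings;

# Hypothetical OTU object: holds only the canonical name.
package My::OTU;
sub new  { my ($class, $name) = @_; return bless { name => $name }, $class }
sub name { return $_[0]{name} }

# Hypothetical registry linking node labels and sequence IDs to OTUs.
package My::OTURegistry;
sub new { return bless { by_node => {}, by_seq => {} }, shift }
sub link_node { my ($self, $label, $otu) = @_; $self->{by_node}{$label} = $otu }
sub link_seq  { my ($self, $id,    $otu) = @_; $self->{by_seq}{$id}     = $otu }
sub otu_for_node { return $_[0]{by_node}{ $_[1] } }
sub otu_for_seq  { return $_[0]{by_seq}{ $_[1] } }

package main;

my $otu = My::OTU->new('Homo sapiens sapiens');
my $reg = My::OTURegistry->new;
$reg->link_seq('Homo_sapiens|EF177447.1/12-56', $otu);
$reg->link_node('Homo sapiens (constrained monophyly)', $otu);

# Both labels resolve to the same object, hence the same canonical name.
my $same = $reg->otu_for_seq('Homo_sapiens|EF177447.1/12-56')
        == $reg->otu_for_node('Homo sapiens (constrained monophyly)');
my $msg = $same ? 'same OTU: ' . $otu->name : 'different OTUs';
print "$msg\n";
```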

Happy holidays!

Rutger


Rutger Vos wrote:
> Hi all,
>
> I am looking for a bioperl object that can be abused to function as a
> suitable 'taxon' object, where I mean 'taxon' as understood by the NEXUS
> file format (i.e. not strictly an entity from a taxonomy, but more loosely
> an OTU). 
>
> The object would primarily function as a way to relate nodes in trees to
> sequences in an alignment (a foreign key that both nodes and sequences refer
> to), and secondarily as the keeper of the canonical name of the OTU, such
> that a sequence named 'Homo_sapiens|EF177447.1/12-56' and a node named 'Homo
> sapiens (constrained monophyly)' can still be understood to refer to the
> same thing - the OTU 'Homo sapiens sapiens' (for example).
>
> I was thinking that a (possibly expanded) Bio::Species might work if there
> was some sensible way of appending references to node and sequence objects
> to it (or otherwise associate them with each other), but I am writing *to
> solicit any and all suggestions*. I am looking for something similar to
> Bio::Phylo::Taxa::Taxon.
>
> Any and all comments and suggestions greatly appreciated!
>
> Best wishes,
>
> Rutger Vos

-- 
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 Rutger A. Vos
 Postdoctoral research fellow
 University of British Columbia
 Personal site: http://www.sfu.ca/~rvosa
        CIPRES: http://www.phylo.org
    Bio::Phylo: http://search.cpan.org/~rvosa/Bio-Phylo/
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++


From paul.boutros at utoronto.ca  Sun Dec 24 03:36:59 2006
From: paul.boutros at utoronto.ca (Paul Boutros)
Date: Sat, 23 Dec 2006 22:36:59 -0500
Subject: [Bioperl-l] Bio::Graphics::Glyph::dna
Message-ID: <20061223223659.7sgfofa44mw4okks@webmail.utoronto.ca>

Hi,

I've been trying to get the dna glyph working and have had some
problems.  This is ActiveState perl 5.8.8 (build 819) and BioPerl 1.5.2
on WinXP.  I'm starting with a FASTA file, so I've tried:
$panel->add_track(
	$seq,
	-glyph     =>   'dna',
	-do_gc     =>   'true',
	-gc_window =>   'auto'
	);

where $seq is a Bio::Seq object

and I've tried it using a GFF $segment:
my $db = Bio::DB::GFF->new(
          -adaptor=>    'berkeleydb',
          -create =>    1,
          -dsn    =>    'temp'
          );

$db->load_sequence_string(
           $seq->primary_id(),
           $seq->seq()
           );

my $segment = Bio::DB::GFF::Segment->new(
           $db,
           $seq->primary_id(),
           $seq->primary_id(),
           1,
           $seq->length()
           );

$panel->add_track(
	$segment,
	-glyph     =>   'dna',
	-do_gc     =>   'true',
	-gc_window =>   'auto'
	);


From paul.boutros at utoronto.ca  Sun Dec 24 03:46:27 2006
From: paul.boutros at utoronto.ca (Paul Boutros)
Date: Sat, 23 Dec 2006 22:46:27 -0500
Subject: [Bioperl-l] How to use Bio::Graphics::Glyph::dna?
Message-ID: <20061223224627.qezpabv9f74ocowk@webmail.utoronto.ca>

Hello,

I'm trying to get the dna glyph of Bio::Graphics to work and am having  
some problems.  I'm starting with a fasta file, and I am running perl  
5.8.8 (ActiveState build 819) on WinXP and BioPerl 1.5.2

If I try simply using a Bio::Seq object like this:
$panel->add_track(
	$seq,
	-glyph     =>   'dna',
	-do_gc     =>   'true',
	-gc_window =>   'auto'
	);

I get the error:
Can't locate object method "start" via package "Bio::Seq" at  
C:/Perl/site/lib/Bio/Graphics/FeatureBase.pm line 164.


And if I try creating a Bio::DB::GFF::Segment object like this:
my $db = Bio::DB::GFF->new(
	-adaptor  => 'berkeleydb',
	-create   => 1,
	-dsn      => '/usr/local/share/gff/dmel'
	);

$db->initialize(1);

$db->load_sequence_string(
	$seq->primary_id(),
	$seq->seq()
	);

my $segment = Bio::DB::GFF::Segment->new(
	$db,
	$seq->primary_id(),
	$seq->primary_id(),
	1,
	$seq->length()
	);

$panel->add_track(
	$segment,
	-glyph     =>   'dna',
	-do_gc     =>   'true',
	-gc_window =>   'auto'
	);

I get the error:
------------- EXCEPTION: Bio::Root::NotImplemented -------------
MSG: Abstract method "Bio::FeatureHolderI::get_SeqFeatures" is not implemented by package Bio::DB::GFF::Segment.
This is not your fault - author of Bio::DB::GFF::Segment should be blamed!

STACK: Error::throw
STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359
STACK: Bio::Root::RootI::throw_not_implemented C:/Perl/site/lib/Bio/Root/RootI.pm:522
STACK: Bio::FeatureHolderI::get_SeqFeatures C:/Perl/site/lib/Bio/FeatureHolderI.pm:101
STACK: Bio::Graphics::Glyph::_subfeat C:/Perl/site/lib/Bio/Graphics/Glyph.pm:1186
STACK: Bio::Graphics::Glyph::subfeat C:/Perl/site/lib/Bio/Graphics/Glyph.pm:1167
STACK: Bio::Graphics::Glyph::new C:/Perl/site/lib/Bio/Graphics/Glyph.pm:56
STACK: Bio::Graphics::Glyph::Factory::make_glyph C:/Perl/site/lib/Bio/Graphics/Glyph/Factory.pm:316
STACK: Bio::Graphics::Glyph::new C:/Perl/site/lib/Bio/Graphics/Glyph.pm:81
STACK: Bio::Graphics::Glyph::Factory::make_glyph C:/Perl/site/lib/Bio/Graphics/Glyph/Factory.pm:316
STACK: Bio::Graphics::Panel::_add_track C:/Perl/site/lib/Bio/Graphics/Panel.pm:388
STACK: Bio::Graphics::Panel::_do_add_track C:/Perl/site/lib/Bio/Graphics/Panel.pm:360
STACK: Bio::Graphics::Panel::add_track C:/Perl/site/lib/Bio/Graphics/Panel.pm:288
STACK: create_figure.pl:147
----------------------------------------------------------------

I'm really unsure what to try next, any suggestions much appreciated!
Paul


From lstein at cshl.edu  Sun Dec 24 17:23:18 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Sun, 24 Dec 2006 12:23:18 -0500
Subject: [Bioperl-l] How to use Bio::Graphics::Glyph::dna?
In-Reply-To: <20061223224627.qezpabv9f74ocowk@webmail.utoronto.ca>
References: <20061223224627.qezpabv9f74ocowk@webmail.utoronto.ca>
Message-ID: <6dce9a0b0612240923v24ebafffs5c280d9cb4c65263@mail.gmail.com>

Hi,

You need to use either a Bio::SeqFeature::Generic object (with an attached
Bio::PrimarySeq) or a Bio::Graphics::Feature object. You are not intended to
create Bio::DB::GFF::Segment objects directly.

e.g.
use Bio::PrimarySeq;
use Bio::SeqFeature::Generic;

my $dna     = Bio::PrimarySeq->new(-seq => 'a' x 1000);
my $feature = Bio::SeqFeature::Generic->new(-start => 1, -end => 800);
$feature->attach_seq($dna);
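The `-do_gc` option in the examples above asks the dna glyph for a GC-content plot. Conceptually that is a GC fraction computed over windows along the sequence; the sketch below shows the idea for consecutive, non-overlapping windows. `gc_windows` is hypothetical and is not how the glyph computes its plot or chooses a window size (e.g. for `-gc_window => 'auto'`).

```perl
use strict;
use warnings;

# Hypothetical helper: GC fraction in consecutive, non-overlapping
# windows. The last window may be shorter than $window.
sub gc_windows {
    my ($seq, $window) = @_;
    die "window must be a positive integer\n" unless $window && $window > 0;
    my @fractions;
    for (my $start = 0; $start < length $seq; $start += $window) {
        my $chunk = uc substr($seq, $start, $window);
        my $gc    = ($chunk =~ tr/GC//);    # count G and C
        push @fractions, $gc / length $chunk;
    }
    return @fractions;
}

my @gc = gc_windows('GGCCAATTGCAT', 4);
print join(' ', map { sprintf '%.2f', $_ } @gc), "\n";
```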

Best,

Lincoln



-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu


From tgenahmet at gmail.com  Wed Dec 27 21:38:43 2006
From: tgenahmet at gmail.com (Ahmet Kurdoglu)
Date: Wed, 27 Dec 2006 14:38:43 -0700
Subject: [Bioperl-l] get mRNA details for a gene
Message-ID: <9d8d0e2a0612271338t7cb15a63v5a08f624888b3f7b@mail.gmail.com>

Hi,

This is my first message to the list. I hope I get it right. Here is what
I'm trying to accomplish:

Get the mRNA details for a given gene (e.g. DNASE2B) from its GenBank file.

Using the web interface I can search with this query:
DNASE2B [sym] AND homo sapiens [ORGN]
(which returns only one result if you search the 'gene' database), then
get the GenBank file by clicking on NC_000001.9, where I can see the
details for its two mRNAs. (I eventually need to get the exon locations
for both of its transcripts.)

However, doing this in Perl has proved to be very difficult for me.
I've tried various methods, including get_Seq_by_id, get_Seq_by_gi, and
get_Stream_by_query. Before I explain in detail what I did, I'd like to
hear your ideas on how to accomplish this.

Thank you.


From sdavis2 at mail.nih.gov  Thu Dec 28 21:57:03 2006
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Thu, 28 Dec 2006 16:57:03 -0500
Subject: [Bioperl-l] [Bioperl-microarray] SOFT parsers
In-Reply-To: <FC173C9E9BE18F45A3241288B723D64C1F16D4@hsv-exmail03.operonads.local>
References: <FC173C9E9BE18F45A3241288B723D64C1F16D4@hsv-exmail03.operonads.local>
Message-ID: <45943DAF.70100@mail.nih.gov>

Michael Muratet US-Huntsville wrote:
> Sean
>
> Thanks. I did consider the bioconductor package and downloaded your
> write-up after it was recommended by the GEO folks. I've looked at R a
> few times, but I never got proficient at it. I'm a lot better with perl.
>
> I've been looking at MINiML, too. It looked like it might be easier to
> parse the SOFT file since the data is in-line with the attributes and
> I'd have to use a SAX parser (not enough memory for DOM) for MINiML.
>
> NCBI must have parsers to get the data into their databases. Do you know
> what they use?
>   
Michael,

You might want to look more specifically at the MINiML format specs.  
The data tables are stored as separate tab-delimited files with an 
external reference in the XML, so DOM parsing is possible with just a 
few kB of memory.  Of course, reading all of the data into memory at 
once will take a large amount of memory for some datasets.  If you are 
going to load the data into a database, I would suggest reading the 
MINiML using DOM and then stepping through the data files one at a 
time, loading as you go.
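A minimal core-Perl sketch of the "step through the data files one at a time" approach, assuming each external data table is a plain tab-delimited text file and that its column names have already been read from the MINiML XML (here they are simply passed in). The name stream_table and its callback interface are illustrative only; they are not part of BioPerl or of any GEO tool.

```perl
use strict;
use warnings;

# Stream one external data table (tab-delimited text) row by row,
# passing each row to a caller-supplied loader so that only one row is
# held in memory at a time. Column names are assumed to come from the XML.
sub stream_table {
    my ($path, $columns, $load_row) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my $count = 0;
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/\r$//;                       # tolerate CRLF line endings
        next if $line eq '';                    # skip blank lines
        my @fields = split /\t/, $line, -1;     # -1 keeps trailing empty fields
        my %row;
        @row{@$columns} = @fields;              # map column names to values
        $load_row->(\%row);                     # e.g. run a prepared INSERT here
        $count++;
    }
    close $fh;
    return $count;                              # number of rows handed to the loader
}
```

In a real loader the callback would typically execute a prepared DBI INSERT statement, so memory use stays flat regardless of table size.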

As for their parsers, I'm not sure what language they use, but writing a 
parser for either SOFT or MINiML isn't at all difficult.  GEO uses a 
very simplified MAGE schema. 
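To illustrate how small a SOFT parser can be, here is a streaming sketch in core Perl. It assumes the usual SOFT line conventions (^ for entity lines, ! for attributes, # for data-table column descriptions, and data tables bracketed by !..._table_begin / !..._table_end with a header row first); check these against GEO's own format description before relying on it. The function parse_soft and its event names are hypothetical.

```perl
use strict;
use warnings;

# Classify SOFT lines one at a time and hand each record to a callback,
# so data rows never need to be held in memory together.
sub parse_soft {
    my ($fh, $handler) = @_;
    my ($entity, $id, $in_table, @header);
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/\r$//;
        next if $line eq '';
        if ($line =~ /^!\w+_table_begin$/) {        # start of a data table
            $in_table = 1;
            @header   = ();
        }
        elsif ($line =~ /^!\w+_table_end$/) {       # end of a data table
            $in_table = 0;
        }
        elsif ($in_table) {
            my @f = split /\t/, $line, -1;
            if (!@header) { @header = @f; next }    # first table row = column names
            my %row;
            @row{@header} = @f;
            $handler->('row', $entity, $id, \%row);
        }
        elsif ($line =~ /^\^(\w+)\s*=\s*(.*)$/) {   # e.g. ^SAMPLE = GSM...
            ($entity, $id) = ($1, $2);
            $handler->('entity', $entity, $id);
        }
        elsif ($line =~ /^!(\w+)\s*=\s*(.*)$/) {    # e.g. !Sample_title = ...
            $handler->('attribute', $entity, $id, $1, $2);
        }
        elsif ($line =~ /^#(\S+)\s*=\s*(.*)$/) {    # column description
            $handler->('column', $entity, $id, $1, $2);
        }
    }
}
```

Because attributes and table rows arrive as separate events, the same callback can write sample metadata and data rows to different tables as it goes.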

As for R vs. perl, if you are planning on doing analyses of microarray 
data, I would highly suggest looking again at the R/bioconductor 
project.  It will save you reinventing many wheels, such as getting 
annotation like gene ontology and pathways, doing stats, plotting, 
maintaining MIAME-compliant data structures, converting from multiple 
microarray formats, etc. 

Sean


From allenday at ucla.edu  Thu Dec 28 23:21:07 2006
From: allenday at ucla.edu (Allen Day)
Date: Thu, 28 Dec 2006 15:21:07 -0800
Subject: [Bioperl-l] [Bioperl-microarray] SOFT parsers
In-Reply-To: <45943DAF.70100@mail.nih.gov>
References: <FC173C9E9BE18F45A3241288B723D64C1F16D4@hsv-exmail03.operonads.local>
	<45943DAF.70100@mail.nih.gov>
Message-ID: <5c24dcc30612281521o58b9f256sfa36c403f4c30bfa@mail.gmail.com>

> As for R vs. perl, if you are planning on doing analyses of microarray
> data, I would highly suggest looking again at the R/bioconductor
> project.  It will save you reinventing many wheels, such as getting
> annotation like gene ontology and pathways, doing stats, plotting,
> maintaining MIAME-compliant data structures, converting from multiple
> microarray formats, etc.

I'll second this statement WRT the data analysis.  I'm doing all my
analysis in R, Perl is just not good at dealing with large matrices or
plotting.  OTOH, I have also found that R is particularly weak when it
comes to pipelining data and system interfacing.  If your goal is to
do ETL to a local database you're better off using Perl.

I've found they're both about equally clunky for dealing with the
experimental metadata, with a slight preference for Perl.  That's more
a reflection of the baroque MAGE model than of the programming
languages themselves.

-Allen



From Paul.Boutros at utoronto.ca  Sat Dec 30 07:43:32 2006
From: Paul.Boutros at utoronto.ca (Paul Boutros)
Date: Sat, 30 Dec 2006 02:43:32 -0500
Subject: [Bioperl-l] How to use Bio::Graphics::Glyph::dna?
In-Reply-To: <6dce9a0b0612240923v24ebafffs5c280d9cb4c65263@mail.gmail.com>
Message-ID: <000c01c72be6$34d07e60$ec02a8c0@main>

Hi Lincoln,

Thanks, that worked like a charm!  Can I suggest adding the
example/explanation you gave me to the pod for Bio::Graphics::Glyph::dna?
Here's a patch against the 1.5.2 version of dna.pm to do that.

Paul

 
266c266,274
< in response to the dna() method.
---
> in response to the dna() method.  For example, you can use a
> Bio::SeqFeature::Generic object with an attached Bio::PrimarySeq
> like this:
>    my $dna = Bio::PrimarySeq->new( -seq => 'A' x 1000 );
>    my $feature = Bio::SeqFeature::Generic->new( -start => 1, -end => 800 );
>    $feature->attach_seq($dna);
>    $panel->add_track( $feature, -glyph => 'dna' );
>
> A Bio::Graphics::Feature object may also be used.
 
  _____  

From: lincoln.stein at gmail.com [mailto:lincoln.stein at gmail.com] On Behalf Of
Lincoln Stein
Sent: Sunday, December 24, 2006 12:23 PM
To: Paul.Boutros at utoronto.ca
Cc: BioPerl Mailing List
Subject: Re: [Bioperl-l] How to use Bio::Graphics::Glyph::dna?

 
Hi,

You need to use either a Bio::SeqFeature::Generic object (with an attached
Bio::PrimarySeq) or a Bio::Graphics::Feature object. You are not intended to
create Bio::DB::GFF::Segment objects directly.

e.g. 
use Bio::PrimarySeq;
use Bio::SeqFeature::Generic;
my $dna = Bio::PrimarySeq->new(-seq=>'a'x1000);
my $feature = Bio::SeqFeature::Generic->new(-start=>1,-end=>800);
$feature->attach_seq($dna);

Best,

Lincoln



From er at xs4all.nl  Sun Dec 31 00:05:16 2006
From: er at xs4all.nl (Erik)
Date: Sun, 31 Dec 2006 01:05:16 +0100 (CET)
Subject: [Bioperl-l] acquiring a local refseq + index
Message-ID: <4632.156.83.1.215.1167523516.squirrel@webmail.xs4all.nl>

Hi all,

I downloaded the RefSeq files (.gbff) and want to index all of them with
Bio::DB::Flat.

It turns out that in many records the SOURCE and ORGANISM lines are
messed up, sometimes so badly that indexing fails with a
Bio::SeqIO::genbank error.

I'd like to change Bio::SeqIO::genbank so that parsing gets at least far
enough for the RefSeq files to be indexed, and ideally to improve the
taxonomic output ($seq->species->binomial is often mangled at the
moment).

Is it still worthwhile to change parsing modules like Bio::SeqIO::genbank?
Is anyone already working on a rewrite? If so, I may be better off
writing my own indexing scheme.

Below is an outline of my indexing program, which uses Bio::DB::Flat::BDB.
If anyone knows of a better way to get a locally searchable RefSeq flat
file index, I would be very interested.

Thanks for your help,

Erikjan


-------------
use strict;
use warnings;
use Bio::DB::Flat;

my $refseq_dir = '/data/ftp.ncbi.nih.gov/refseq/release/complete';
my $db = Bio::DB::Flat->new(
   -directory  => $refseq_dir,
   -dbname     => 'refseq',
   -format     => 'genbank',
   -index      => 'bdb',
   -write_flag => 1,
);
my @files = getfiles($refseq_dir);   # getfiles() not shown: returns the .gbff paths
for my $f (@files) {
        $db->build_index($f);        # add each flat file to the index
}
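The outline calls getfiles() without defining it. A minimal stand-in in core Perl, assuming (as described above) that the RefSeq files are uncompressed and end in .gbff; the helper is hypothetical, not part of Bio::DB::Flat.

```perl
use strict;
use warnings;

# Hypothetical getfiles(): return the full paths of all *.gbff files
# in $dir, sorted by file name.
sub getfiles {
    my ($dir) = @_;
    opendir my $dh, $dir or die "Cannot open directory $dir: $!";
    my @names = grep { /\.gbff$/ && -f "$dir/$_" } readdir $dh;
    closedir $dh;
    return map { "$dir/$_" } sort @names;
}
```

Sorting the names keeps the indexing order reproducible between runs, which makes it easier to tell which file triggered a parse error.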


From hlapp at gmx.net  Sun Dec 31 01:48:33 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 30 Dec 2006 20:48:33 -0500
Subject: [Bioperl-l] acquiring a local refseq + index
In-Reply-To: <4632.156.83.1.215.1167523516.squirrel@webmail.xs4all.nl>
References: <4632.156.83.1.215.1167523516.squirrel@webmail.xs4all.nl>
Message-ID: <A4BD1950-AD1C-4EAA-A2F8-85E7FCEC7C31@gmx.net>

Can you send examples and the resulting error messages? Also, I'm  
assuming you're running the 1.5.2 release of BioPerl; if not, that's  
what I would try first.

	-hilmar


-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Sun Dec 31 02:33:23 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 30 Dec 2006 20:33:23 -0600
Subject: [Bioperl-l] acquiring a local refseq + index
In-Reply-To: <A4BD1950-AD1C-4EAA-A2F8-85E7FCEC7C31@gmx.net>
References: <4632.156.83.1.215.1167523516.squirrel@webmail.xs4all.nl>
	<A4BD1950-AD1C-4EAA-A2F8-85E7FCEC7C31@gmx.net>
Message-ID: <76AAAE98-779F-495C-A19A-A1A800B1D392@uiuc.edu>

Agree with Hilmar, in that we need examples.  If you are referring to  
your submitted bug:

http://bugzilla.open-bio.org/show_bug.cgi?id=2167

we could add this in as long as it passes (I'll try giving it a  
workout with my local bacterial seqs tonight or tomorrow).  However,  
in the not-too-distant future your patch would likely be rendered  
obsolete, as all Bio::Species-related parsing in the Bio::SeqIO modules  
will be deprecated in favor of simple parsing (more foolproof, less  
uncertainty) and Bio::Taxon (which has optional database lookups using  
NCBI Taxonomy).  Bio::Species and anything related to it should be  
considered marked for deprecation.  Fair warning...
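As a rough illustration of what "simple parsing" of the source data could look like, the sketch below keeps the SOURCE and ORGANISM text verbatim and splits only the lineage on semicolons, without guessing genus and species. It is not BioPerl's implementation; it assumes the organism name fits on the ORGANISM line and that lineage continuation lines are indented by 12 spaces, as in standard GenBank flat files. The name simple_source is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: extract SOURCE, ORGANISM and lineage from the text
# of one GenBank record, keeping the organism name exactly as written.
sub simple_source {
    my ($record) = @_;
    my %src;
    my @lines = split /\n/, $record;
    for my $i (0 .. $#lines) {
        if ($lines[$i] =~ /^SOURCE\s+(.*\S)/) {
            $src{source} = $1;                  # e.g. "Homo sapiens (human)"
        }
        elsif ($lines[$i] =~ /^\s+ORGANISM\s+(.*\S)/) {
            $src{organism} = $1;                # kept verbatim, not split
            my @lineage;
            # Continuation lines (12 leading spaces) hold the lineage.
            for my $j ($i + 1 .. $#lines) {
                last unless $lines[$j] =~ /^ {12}(\S.*)$/;
                push @lineage, $1;
            }
            my $joined = join ' ', @lineage;
            $joined =~ s/\.\s*$//;              # drop the terminal period
            $src{lineage} = [ split /;\s*/, $joined ];
            last;                               # stop after the ORGANISM block
        }
    }
    return \%src;
}
```

Keeping the name as a single string avoids the genus/species guessing that mangles binomials; any finer classification can then come from a taxonomy lookup instead of the flat-file text.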

chris


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Sun Dec 31 19:36:47 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 31 Dec 2006 13:36:47 -0600
Subject: [Bioperl-l] acquiring a local refseq + index
In-Reply-To: <76AAAE98-779F-495C-A19A-A1A800B1D392@uiuc.edu>
References: <4632.156.83.1.215.1167523516.squirrel@webmail.xs4all.nl>
	<A4BD1950-AD1C-4EAA-A2F8-85E7FCEC7C31@gmx.net>
	<76AAAE98-779F-495C-A19A-A1A800B1D392@uiuc.edu>
Message-ID: <37FB5BDF-25A9-44F0-9E82-964684A73A58@uiuc.edu>

As a follow-up, I have committed the fix Erik posted in Bugzilla.  I  
don't know whether it helps with the indexing issue Erik describes in  
this thread (they sound unrelated).

chris


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign