From alan.bridge at isb-sib.ch  Sun Dec  2 13:29:48 2007
From: alan.bridge at isb-sib.ch (Alan Bridge)
Date: Sun, 02 Dec 2007 19:29:48 +0100
Subject: [Bioperl-l] Bio::Tools::Run::RemoteBlast
Message-ID: <4752F99C.9050504@isb-sib.ch>

Hello,

I was just wondering if, when performing a RemoteBlast, it would be 
possible to specify the entire UniProt database (i.e. Swiss-Prot + 
TrEMBL), or even just TrEMBL.

It seems that currently, you can only specify Swiss-Prot (the annotated 
portion of UniProt, which is much smaller than its automatically 
annotated counterpart, TrEMBL). Any hints on how to expand the search 
space to include TrEMBL would be really appreciated.

Regards, Alan Bridge

            my $prog = 'blastp';
            my $db   = 'swissprot'; # use TrEMBL ?
            my $e_val= '1e-10';

            my @params = ( '-prog'       => $prog,
                           '-data'       => $db,
                           '-expect'     => $e_val,
                           '-readmethod' => 'SearchIO' );

-- 
Alan Bridge PhD
Swiss-Prot annotator
Swiss Institute of Bioinformatics (SIB)
1, rue Michel Servet
CH-1211 Geneva 4  
Switzerland   

Tel: (+41 22) 379 58 90
Fax: (+41 22) 379 58 58 

http://www.expasy.org/ 


From avilella at gmail.com  Mon Dec  3 06:39:59 2007
From: avilella at gmail.com (Albert Vilella)
Date: Mon, 3 Dec 2007 11:39:59 +0000
Subject: [Bioperl-l] Query about SLAC.pm module
In-Reply-To: <OF3E7AF746.CBFC96D8-ONC12573A6.00374CAE-C12573A6.00374CB1@sh.se>
References: <OF3E7AF746.CBFC96D8-ONC12573A6.00374CAE-C12573A6.00374CB1@sh.se>
Message-ID: <358f4d650712030339w2f3de057ge5614e60a3f6658c@mail.gmail.com>

[CCing to the bioperl ml]

Sorry, there were some bits left in the POD header referring to PAML
objects that aren't quite true.
I've now updated the PODs. The Hyphy executions return hashes.

If you run the SLAC test in t/Hyphy.t you will see that the $results
are something like:

DB<3> x 2 $results
0  HASH(0x8df3110)
   'E[NS Sites]' => ARRAY(0x8e6cff4)
   'E[S Sites]' => ARRAY(0x8e6ceb0)
   'Observed NS Changes' => ARRAY(0x8e7b380)
   'Observed S Changes' => ARRAY(0x8e7b344)
   'Observed S. Prop.' => ARRAY(0x8e6d018)
   'P{S geq. observed}' => ARRAY(0x8e6d360)
   'P{S leq. observed}' => ARRAY(0x8e6d33c)
   'P{S}' => ARRAY(0x8e6d03c)
   'Scaled dN-dS' => ARRAY(0x8e6d384)
   'dN' => ARRAY(0x8e6d084)
   'dN-dS' => ARRAY(0x8e6d0a8)
   'dS' => ARRAY(0x8e6d060)
  DB<4> x $rc

which correspond to the csv file that hyphy produces.
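
As a rough illustration of working with a result of this shape, the sketch below walks a hash of array references keyed by column name. The numbers are invented placeholders, not real HyPhy output, and the idea that each array holds one entry per site (one CSV row) is an assumption drawn from the remark that the hash corresponds to the CSV file. The helper name rows_from_columns is hypothetical.

```perl
use strict;
use warnings;

# Placeholder data shaped like the $results dump above: column name => array ref.
# The values are invented for illustration only.
my $results = {
    'dN'    => [ 0.10,  0.00,  0.35 ],
    'dS'    => [ 0.20,  0.05,  0.10 ],
    'dN-dS' => [ -0.10, -0.05, 0.25 ],
};

# Turn the column-oriented hash into one row (hash ref) per site.
# Assumes every column has the same number of entries.
sub rows_from_columns {
    my ($columns) = @_;
    my @names = sort keys %$columns;
    return () unless @names;
    my $n_rows = scalar @{ $columns->{ $names[0] } };
    my @rows;
    for my $i ( 0 .. $n_rows - 1 ) {
        push @rows, { map { $_ => $columns->{$_}[$i] } @names };
    }
    return @rows;
}

my $site = 0;
for my $row ( rows_from_columns($results) ) {
    $site++;
    printf "site %d: dN=%s dS=%s dN-dS=%s\n",
        $site, $row->{'dN'}, $row->{'dS'}, $row->{'dN-dS'};
}
```

Since the return value is a plain hash reference rather than a blessed object, it is read with ordinary hash and array syntax instead of method calls.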

Cheers,

    Albert.

On Dec 3, 2007 10:04 AM, Johan Nilsson <johan.nilsson at sh.se> wrote:
>
> Dear Dr. Vilella,
>
> Please allow me to introduce myself. My name is Johan Nilsson and I am a
> postdoctoral researcher in bioinformatics.
>
> I was  planning to perform a large-scale analysis for positively selected
> protein coding genes using any appropriate method from the Hyphy package,
> and I thought your bioperl wrappers 'SLAC.pm', 'FEL.pm' etc. should be very
> useful for this.
>
> IF I interpreted the documents of e.g. the SLAC module correctly, running
> $slac->run($aln,$tree) will return a
> Bio::Tools::Phylo::PAML object. However, when I try to retrieve any results
> from the obtained hashref (running my script on the test files provided
> with bioperl ...t/hyphy1.tree and ...t/hyphy1.fasta), the script complains
> that it is not blessed (e.g. 'Can't call method "get_seqs" on unblessed
> reference').
>
> I am fairly new to bioperl, so my apologies if this question was a
> stupid one :)
>
> Thanks in advance!
>
> Yours Sincerely
> /Johan
>
> --
> Johan Nilsson, Ph.D.
> School of Life Sciences
> Södertörns University College
> S-141 89 Huddinge, Sweden
> E-mail: johan.nilsson at sh.se
> Phone: +46 8 608 47 05, +46 70 456 10 51
>
>


From cjfields at uiuc.edu  Mon Dec  3 09:04:06 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 3 Dec 2007 08:04:06 -0600
Subject: [Bioperl-l] Bio::Tools::Run::RemoteBlast
In-Reply-To: <4752F99C.9050504@isb-sib.ch>
References: <4752F99C.9050504@isb-sib.ch>
Message-ID: <CF967851-5E6C-448A-87C6-CC3F63A5D9AD@uiuc.edu>

You are limited to the databases hosted on the NCBI server, so it's  
really up to them; RemoteBlast is an interface to NCBI's WebBlast  
using URLAPI.

A list of current databases can be found here:

http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/remote_blastdblist.html

chris

On Dec 2, 2007, at 12:29 PM, Alan Bridge wrote:

> Hello,
>
> I was just wondering if, when performing a RemoteBlast, it would be
> possible to specify the entire UniProt database (i.e. Swiss-Prot +
> TrEMBL), or even just TrEMBL.
>
> It seems that currently, you can only specify Swiss-Prot (the  
> annotated
> portion of UniProt, which is much smaller than its automatically
> annotated counterpart, TrEMBL). Any hints on how to expand the search
> space to include TrEMBL would be really appreciated.
>
> Regards, Alan Bridge
>
>            my $prog = 'blastp';
>            my $db   = 'swissprot'; # use TrEMBL ?
>            my $e_val= '1e-10';
>
>            my @params = ( '-prog'       => $prog,
>                           '-data'       => $db,
>                           '-expect'     => $e_val,
>                           '-readmethod' => 'SearchIO' );
>
> -- 
> Alan Bridge PhD
> Swiss-Prot annotator
> Swiss Institute of Bioinformatics (SIB)
> 1, rue Michel Servet
> CH-1211 Geneva 4
> Switzerland
>
> Tel: (+41 22) 379 58 90
> Fax: (+41 22) 379 58 58
>
> http://www.expasy.org/

From bioperl at boekhoff.info  Mon Dec  3 14:14:24 2007
From: bioperl at boekhoff.info (Sven Boekhoff)
Date: Mon, 03 Dec 2007 20:14:24 +0100
Subject: [Bioperl-l] [StandAloneBLAST] Use more than one CPU + avoid BLAST
	reload
Message-ID: <47545590.1000703@boekhoff.info>

Hi!
I just started working with Perl and BioPerl. I'm quite impressed by what 
can easily be done with this module. Today I found that my second CPU 
is not used, while the first one runs at 100%. I tried to include the 
"-a" parameter, but I was not successful:

my @params = (
	-database => 'my_db',
	-a => '2',
	-outfile => 'blast1.out'
);

How do I have to use it?

Second question: In my Perl script I start BLAST searches in a loop. 
Every time BLAST has finished its search, the memory is cleared and BLAST 
is started again. I think most of the time is spent reloading the 
database. Is it somehow possible to keep the database loaded (e.g. by 
starting a second search) or is BLAST reloaded anyway?

Thanks for your help!

Sven


www.boekhoff.info

From bix at sendu.me.uk  Mon Dec  3 19:05:23 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 04 Dec 2007 00:05:23 +0000
Subject: [Bioperl-l] [StandAloneBLAST] Use more than one CPU + avoid
 BLAST reload
In-Reply-To: <47545590.1000703@boekhoff.info>
References: <47545590.1000703@boekhoff.info>
Message-ID: <475499C3.20801@sendu.me.uk>

Sven Boekhoff wrote:
> HI!
> I just started working with Perl and BioPerl. I'm quite impressed what 
> can be easily done with this module. Today I found that my second CPU 
> ist not used, but the first one run's at 100%. I tried to include the 
> "-a"-parameter, but I was not successful:
> 
> my @params = (
> 	-database => 'my_db',
> 	-a => '2',
> 	-outfile => 'blast1.out'
> );
> 
> How do I have to use it?

This should work in the CVS version of StandAloneBlast. In other 
versions, perhaps try using $object->a(2);


> Second question: In my perlscript I start BLAST-searches in a loop. 
> Everytime BLAST has finished its search, the memory is cleared and BLAST 
> is started again. I think most of the time is used to reload the 
> database. Is it somehow possible to keep the database loaded (e.g. by 
> starting a second search) or is BLAST reloaded anyway?

I hope someone will correct me if I'm wrong, but I think you'd have 
to do that with a 2-way pipe. StandAloneBlast only uses output to a file 
and input from that file, with the executable finishing in between. I've 
thought about improving it with a 2-way pipe, but never got around to 
it, being apprehensive about stability on all platforms.

The more obvious solution, which may be possible depending on exactly 
what you're doing, is to avoid the loop and just supply Blast all your 
input in one go.
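
A minimal sketch of both suggestions, assuming BioPerl, bioperl-run and the legacy NCBI blastall binary are installed. The helper name blast_all_at_once, the file names in the usage comment, and the choice of blastp as the program are assumptions for illustration. Whether a(2) or the -a constructor parameter takes effect depends on the StandAloneBlast version, as noted above.

```perl
use strict;
use warnings;

# Run a single blastall job over every sequence in a FASTA file,
# instead of starting BLAST once per sequence inside a loop.
sub blast_all_at_once {
    my ( $fasta_file, $database, $out_file, $cpus ) = @_;

    # Loaded at run time so the sketch compiles even without BioPerl present.
    require Bio::SeqIO;
    require Bio::Tools::Run::StandAloneBlast;

    # Collect all query sequences first.
    my $in = Bio::SeqIO->new( -file => $fasta_file, -format => 'fasta' );
    my @seqs;
    while ( my $seq = $in->next_seq ) {
        push @seqs, $seq;
    }

    my $factory = Bio::Tools::Run::StandAloneBlast->new(
        -program  => 'blastp',      # assumption: protein queries
        -database => $database,
        -outfile  => $out_file,
    );
    $factory->a($cpus);             # blastall's -a option: number of processors

    # An array reference of sequence objects is searched in one run,
    # so the database is only loaded once.
    return $factory->blastall( \@seqs );
}

# Hypothetical usage:
#   my $report = blast_all_at_once( 'queries.fasta', 'my_db', 'blast1.out', 2 );
```

The type of the returned report object depends on the read method configured in the installed version, so check the module's POD before parsing it.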

From Russell.Smithies at agresearch.co.nz  Mon Dec  3 19:49:21 2007
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Tue, 4 Dec 2007 13:49:21 +1300
Subject: [Bioperl-l] Bio::Assembly::IO problems reading .ace files
In-Reply-To: <475499C3.20801@sendu.me.uk>
References: <47545590.1000703@boekhoff.info> <475499C3.20801@sendu.me.uk>
Message-ID: <D5DBA313349A4B458528BE63B387F36C062D5E2A@imail.agresearch.co.nz>

Hi all,

I'm trying to read .ace files but keep getting an error that I don't
know the cause of.
Really basic example code:

	#!/usr/local/bin/perl -w

	use lib "/data/home/smithiesr/bioperl-live";
	use Bio::Assembly::IO;
	use Data::Dumper;

	$ace = "CLP0001001240-cE15_20030319.ace";

	$io = new Bio::Assembly::IO(-file=>$ace,-format=>"ace");
	$assembly = $io->next_assembly;

	foreach $contig ($assembly->all_contigs) {
      		print Dumper $contig;
	}

Gives this error:
	[smithiesr at impala ace_phrap]$ perl bp_read_ace.pl
	Can't call method "get_consensus_sequence" on an undefined value at /data/home/smithiesr/bioperl-live/Bio/Assembly/IO/ace.pm line 170, <GEN0> line 42.

Which relates to this bit in ace.pm:
	# Loading contig qualities... (Base Quality field)
	/^BQ/ && do {
	    my $consensus = $contigOBJ->get_consensus_sequence()->seq();

Is this caused by a dud ace file or a problem with Bio::Assembly::IO::ace,
or is the Contig object not getting created?
Any ideas?

Thanx,

Russell Smithies

Bioinformatics Software Developer
T +64 3 489 9085
E  russell.smithies at agresearch.co.nz

Invermay  Research Centre
Puddle Alley, 
Mosgiel, 
New Zealand
T  +64 3 489 3809   
F  +64 3 489 9174  
www.agresearch.co.nz

=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================


From cjfields at uiuc.edu  Mon Dec  3 21:15:58 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 3 Dec 2007 20:15:58 -0600
Subject: [Bioperl-l] Bio::Assembly::IO problems reading .ace files
In-Reply-To: <D5DBA313349A4B458528BE63B387F36C062D5E2A@imail.agresearch.co.nz>
References: <47545590.1000703@boekhoff.info> <475499C3.20801@sendu.me.uk>
	<D5DBA313349A4B458528BE63B387F36C062D5E2A@imail.agresearch.co.nz>
Message-ID: <692A2BDA-048B-49C6-A101-C13A1DAB9B69@uiuc.edu>

This seems similar to the 'too many open filehandles issue' documented  
here:

http://bugzilla.open-bio.org/show_bug.cgi?id=2320

It unfortunately is due to having an open DB_File for every contig,  
and is a problem with the Bio::Assembly implementation that isn't  
easily fixed.  Changing the open filehandle limit using ulimit is the  
only known fix:

ulimit -n 10000

chris

On Dec 3, 2007, at 6:49 PM, Smithies, Russell wrote:

> Hi all,
>
> It' trying to read .ace files but keep getting an error that I don't
> know the cause of.
> Really basic example code:
>
> 	#!/usr/local/bin/perl -w
>
> 	use lib "/data/home/smithiesr/bioperl-live";
> 	use Bio::Assembly::IO;
> 	use Data::Dumper;
>
> 	$ace = "CLP0001001240-cE15_20030319.ace";
>
> 	$io = new Bio::Assembly::IO(-file=>$ace,-format=>"ace");
> 	$assembly = $io->next_assembly;
>
> 	foreach $contig ($assembly->all_contigs) {
>      		print Dumper $contig;
> 	}
>
> Gives this error;
> 	[smithiesr at impala ace_phrap]$ perl bp_read_ace.pl
> 	Can't call method "get_consensus_sequence" on an undefined value
> at /data/home/smithiesr/bioperl-live/Bio/Assembly/IO/ace.pm line 170,
> <GEN0> line 42.
>
> Which relates to this bit in ace.pm:
> 	# Loading contig qualities... (Base Quality field)
> 	/^BQ/ && do {
> 	    my $consensus = $contigOBJ->get_consensus_sequence()->seq();
>
> Is this caused by a dud ace file or a problem with  
> Bio::Assembly::IO:ace
> or is the Contig object not getting created?
> Any ideas?
>
> Thanx,
>
> Russell Smithies
>
> Bioinformatics Software Developer
> T +64 3 489 9085
> E  russell.smithies at agresearch.co.nz
>
> Invermay  Research Centre
> Puddle Alley,
> Mosgiel,
> New Zealand
> T  +64 3 489 3809
> F  +64 3 489 9174
> www.agresearch.co.nz
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From florent.angly at gmail.com  Mon Dec  3 21:25:24 2007
From: florent.angly at gmail.com (Florent Angly)
Date: Mon, 03 Dec 2007 18:25:24 -0800
Subject: [Bioperl-l] Bio::Assembly::IO problems reading .ace files
In-Reply-To: <692A2BDA-048B-49C6-A101-C13A1DAB9B69@uiuc.edu>
References: <47545590.1000703@boekhoff.info>
	<475499C3.20801@sendu.me.uk>	<D5DBA313349A4B458528BE63B387F36C062D5E2A@imail.agresearch.co.nz>
	<692A2BDA-048B-49C6-A101-C13A1DAB9B69@uiuc.edu>
Message-ID: <4754BA94.7090600@gmail.com>

Would this issue cause excessive memory usage? I was getting 
high memory usage when parsing some TIGR Assembler files and was 
wondering whether the tigr parser or the parent assembly IO module 
was responsible for that.
I'd definitely be interested in a fix of the Bio::Assembly 
implementation if it's the assembly IO module's fault.
Florent

Chris Fields wrote:
> This seems similar to the 'too many open filehandles issue' documented  
> here:
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2320
>
> It unfortunately is due to having an open DB_File for every contig,  
> and is a problem with the Bio::Assembly implementation that isn't  
> easily fixed.  Changing the open filehandle limit using ulimit is the  
> only known fix:
>
> ulimit -n 10000
>
> chris
>
> On Dec 3, 2007, at 6:49 PM, Smithies, Russell wrote:
>
>   
>> Hi all,
>>
>> It' trying to read .ace files but keep getting an error that I don't
>> know the cause of.
>> Really basic example code:
>>
>> 	#!/usr/local/bin/perl -w
>>
>> 	use lib "/data/home/smithiesr/bioperl-live";
>> 	use Bio::Assembly::IO;
>> 	use Data::Dumper;
>>
>> 	$ace = "CLP0001001240-cE15_20030319.ace";
>>
>> 	$io = new Bio::Assembly::IO(-file=>$ace,-format=>"ace");
>> 	$assembly = $io->next_assembly;
>>
>> 	foreach $contig ($assembly->all_contigs) {
>>      		print Dumper $contig;
>> 	}
>>
>> Gives this error;
>> 	[smithiesr at impala ace_phrap]$ perl bp_read_ace.pl
>> 	Can't call method "get_consensus_sequence" on an undefined value
>> at /data/home/smithiesr/bioperl-live/Bio/Assembly/IO/ace.pm line 170,
>> <GEN0> line 42.
>>
>> Which relates to this bit in ace.pm:
>> 	# Loading contig qualities... (Base Quality field)
>> 	/^BQ/ && do {
>> 	    my $consensus = $contigOBJ->get_consensus_sequence()->seq();
>>
>> Is this caused by a dud ace file or a problem with  
>> Bio::Assembly::IO:ace
>> or is the Contig object not getting created?
>> Any ideas?
>>
>> Thanx,
>>
>> Russell Smithies
>>
>> Bioinformatics Software Developer
>> T +64 3 489 9085
>> E  russell.smithies at agresearch.co.nz
>>
>> Invermay  Research Centre
>> Puddle Alley,
>> Mosgiel,
>> New Zealand
>> T  +64 3 489 3809
>> F  +64 3 489 9174
>> www.agresearch.co.nz
>>
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>


From Russell.Smithies at agresearch.co.nz  Mon Dec  3 21:32:43 2007
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Tue, 4 Dec 2007 15:32:43 +1300
Subject: [Bioperl-l] Bio::Assembly::IO problems reading .ace files
In-Reply-To: <692A2BDA-048B-49C6-A101-C13A1DAB9B69@uiuc.edu>
References: <47545590.1000703@boekhoff.info> <475499C3.20801@sendu.me.uk>
	<D5DBA313349A4B458528BE63B387F36C062D5E2A@imail.agresearch.co.nz>
	<692A2BDA-048B-49C6-A101-C13A1DAB9B69@uiuc.edu>
Message-ID: <D5DBA313349A4B458528BE63B387F36C062D5E91@imail.agresearch.co.nz>

Thanx Chris,
I'm only writing a simple .ace viewer to display assembled contigs in a
Bio::Graphics::Panel so I'll parse the coords from the .ace files
"manually".
Unless anyone else has a better idea ?
(and some example code ;-)

Russell
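
For the manual route, here is a rough sketch in plain Perl (no BioPerl, so no per-contig DB_File handles). It relies on my understanding of the ACE layout: "CO name padded_length ..." starts a contig, "AF read U|C padded_start" places a read, and "RD read padded_length ..." gives the read length. The sample text is an invented miniature, not a real assembly, and the sub name parse_ace_coords is hypothetical. Free-text tag blocks (CT, WA and similar) are not handled specially.

```perl
use strict;
use warnings;

# Collect contig and read coordinates from ACE text read via a filehandle.
# Returns a list of hash refs:
#   { name, length, reads => [ { name, strand, start, end }, ... ] }
# Coordinates are padded consensus positions.
sub parse_ace_coords {
    my ($fh) = @_;
    my ( @contigs, %read_index );
    while ( my $line = <$fh> ) {
        if ( $line =~ /^CO\s+(\S+)\s+(\d+)/ ) {
            push @contigs, { name => $1, length => $2, reads => [] };
            %read_index = ();    # read names are tracked per contig
        }
        elsif ( $line =~ /^AF\s+(\S+)\s+([UC])\s+(-?\d+)/ ) {
            next unless @contigs;
            my $read = { name => $1, strand => $2 eq 'U' ? 1 : -1, start => $3 };
            push @{ $contigs[-1]{reads} }, $read;
            $read_index{ $read->{name} } = $read;
        }
        elsif ( $line =~ /^RD\s+(\S+)\s+(\d+)/ ) {
            my $read = $read_index{$1} or next;
            $read->{end} = $read->{start} + $2 - 1;
        }
    }
    return @contigs;
}

# Invented miniature ACE text, for demonstration only.
my $ace_text = <<'ACE';
AS 1 2

CO Contig1 10 2 1 U
ACGTACGTAC

AF read1 U 1
AF read2 C 4
BS 1 10 read1

RD read1 8 0 0
ACGTACGT

RD read2 7 0 0
GTACGTA
ACE

open my $fh, '<', \$ace_text or die "Cannot open in-memory ACE text: $!";
for my $contig ( parse_ace_coords($fh) ) {
    print "$contig->{name} (length $contig->{length})\n";
    printf "  %-6s strand %2d  %d..%d\n", @{$_}{qw(name strand start end)}
        for @{ $contig->{reads} };
}
```

Each read's start, end and strand could then be used to build the features handed to the panel.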


> -----Original Message-----
> From: Chris Fields [mailto:cjfields at uiuc.edu]
> Sent: Tuesday, 4 December 2007 3:16 p.m.
> To: Smithies, Russell
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bio::Assembly::IO problems reading .ace files
> 
> This seems similar to the 'too many open filehandles issue' documented
> here:
> 
> http://bugzilla.open-bio.org/show_bug.cgi?id=2320
> 
> It unfortunately is due to having an open DB_File for every contig,
> and is a problem with the Bio::Assembly implementation that isn't
> easily fixed.  Changing the open filehandle limit using ulimit is the
> only known fix:
> 
> ulimit -n 10000
> 
> chris
> 
> On Dec 3, 2007, at 6:49 PM, Smithies, Russell wrote:
> 
> > Hi all,
> >
> > It' trying to read .ace files but keep getting an error that I don't
> > know the cause of.
> > Really basic example code:
> >
> > 	#!/usr/local/bin/perl -w
> >
> > 	use lib "/data/home/smithiesr/bioperl-live";
> > 	use Bio::Assembly::IO;
> > 	use Data::Dumper;
> >
> > 	$ace = "CLP0001001240-cE15_20030319.ace";
> >
> > 	$io = new Bio::Assembly::IO(-file=>$ace,-format=>"ace");
> > 	$assembly = $io->next_assembly;
> >
> > 	foreach $contig ($assembly->all_contigs) {
> >      		print Dumper $contig;
> > 	}
> >
> > Gives this error;
> > 	[smithiesr at impala ace_phrap]$ perl bp_read_ace.pl
> > 	Can't call method "get_consensus_sequence" on an undefined value
> > at /data/home/smithiesr/bioperl-live/Bio/Assembly/IO/ace.pm line 170,
> > <GEN0> line 42.
> >
> > Which relates to this bit in ace.pm:
> > 	# Loading contig qualities... (Base Quality field)
> > 	/^BQ/ && do {
> > 	    my $consensus = $contigOBJ->get_consensus_sequence()->seq();
> >
> > Is this caused by a dud ace file or a problem with
> > Bio::Assembly::IO:ace
> > or is the Contig object not getting created?
> > Any ideas?
> >
> > Thanx,
> >
> > Russell Smithies
> >
> > Bioinformatics Software Developer
> > T +64 3 489 9085
> > E  russell.smithies at agresearch.co.nz
> >
> > Invermay  Research Centre
> > Puddle Alley,
> > Mosgiel,
> > New Zealand
> > T  +64 3 489 3809
> > F  +64 3 489 9174
> > www.agresearch.co.nz
> >
> 
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
> 
> 



From cjfields at uiuc.edu  Tue Dec  4 00:10:57 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 3 Dec 2007 23:10:57 -0600
Subject: [Bioperl-l] Bio::Assembly::IO problems reading .ace files
In-Reply-To: <4754BA94.7090600@gmail.com>
References: <47545590.1000703@boekhoff.info>
	<475499C3.20801@sendu.me.uk>	<D5DBA313349A4B458528BE63B387F36C062D5E2A@imail.agresearch.co.nz>
	<692A2BDA-048B-49C6-A101-C13A1DAB9B69@uiuc.edu>
	<4754BA94.7090600@gmail.com>
Message-ID: <4F867A88-C0DC-4DF7-9F47-C38712920183@uiuc.edu>

Yes, it's possible this would cause memory issues, as each  
Bio::Assembly::Contig instance would have a  
Bio::SeqFeature::Collection attached (each Collection having a tied DB  
hash, which would be an open filehandle). So if you had over 1000  
contigs open at any one time (in a parsed scaffold, for instance) you  
would have over 1000 open filehandles.  Not very efficient.

My thought was to have each Bio::Assembly::Scaffold instance carry a  
single Bio::SeqFeature::CollectionI (it could be a  
Bio::SeqFeature::Collection, Bio::DB::SeqFeature::Store, or any other  
CollectionI, whatever's easiest).  Each Contig would be passed (and  
store) a reference to the Scaffold SF::Collection and pull features  
from there; just haven't had time to mess with it.  I don't think  
anyone's tackling it, so feel free to code away!
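
To make the proposal concrete, here is a toy sketch in plain Perl. The Sketch::* packages are hypothetical stand-ins, not BioPerl classes, and a plain in-memory hash takes the place of whatever CollectionI implementation would really be used. The point is only the ownership pattern: one store per scaffold, and each contig keeps a reference to it.

```perl
use strict;
use warnings;

# Stand-in for the single feature collection owned by a scaffold.
package Sketch::FeatureStore;

sub new {
    my ($class) = @_;
    return bless { by_contig => {} }, $class;
}

sub add {
    my ( $self, $contig_id, $feature ) = @_;
    push @{ $self->{by_contig}{$contig_id} }, $feature;
    return;
}

sub features_for {
    my ( $self, $contig_id ) = @_;
    return @{ $self->{by_contig}{$contig_id} || [] };
}

# A contig holds only its id and a reference to the shared store.
package Sketch::Contig;

sub new {
    my ( $class, %args ) = @_;
    return bless { id => $args{id}, store => $args{store} }, $class;
}

sub add_feature {
    my ( $self, $feature ) = @_;
    $self->{store}->add( $self->{id}, $feature );
    return;
}

sub get_features {
    my ($self) = @_;
    return $self->{store}->features_for( $self->{id} );
}

# The scaffold creates one store and hands it to every contig it creates.
package Sketch::Scaffold;

sub new {
    my ($class) = @_;
    return bless { store => Sketch::FeatureStore->new(), contigs => {} }, $class;
}

sub add_contig {
    my ( $self, $id ) = @_;
    return $self->{contigs}{$id}
        ||= Sketch::Contig->new( id => $id, store => $self->{store} );
}

package main;

my $scaffold = Sketch::Scaffold->new();
my $c1       = $scaffold->add_contig('contig1');
my $c2       = $scaffold->add_contig('contig2');
$c1->add_feature('read_a:1-8');
$c2->add_feature('read_b:4-10');
print "$_->{id}: ", join( ', ', $_->get_features ), "\n" for $c1, $c2;
```

With a file-backed store in place of the hash, the number of open filehandles would scale with the number of scaffolds rather than the number of contigs.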

chris

On Dec 3, 2007, at 8:25 PM, Florent Angly wrote:

> Would this issue cause an excessive memory usage? Because I was  
> getting a high memory usage when parsing some TIGR Assembler files  
> and was wondering if the tigr parser was responsible for that or the  
> parent assembly IO module.
> I'd definitely be interested in a fix of the Bio::Assembly  
> implementation if it's the assembly IO module's fault....
> Florent
>
> Chris Fields wrote:
>> This seems similar to the 'too many open filehandles issue'  
>> documented  here:
>>
>> http://bugzilla.open-bio.org/show_bug.cgi?id=2320
>>
>> It unfortunately is due to having an open DB_File for every  
>> contig,  and is a problem with the Bio::Assembly implementation  
>> that isn't  easily fixed.  Changing the open filehandle limit using  
>> ulimit is the  only known fix:
>>
>> ulimit -n 10000
>>
>> chris
>>
>> On Dec 3, 2007, at 6:49 PM, Smithies, Russell wrote:
>>
>>
>>> Hi all,
>>>
>>> It' trying to read .ace files but keep getting an error that I don't
>>> know the cause of.
>>> Really basic example code:
>>>
>>> 	#!/usr/local/bin/perl -w
>>>
>>> 	use lib "/data/home/smithiesr/bioperl-live";
>>> 	use Bio::Assembly::IO;
>>> 	use Data::Dumper;
>>>
>>> 	$ace = "CLP0001001240-cE15_20030319.ace";
>>>
>>> 	$io = new Bio::Assembly::IO(-file=>$ace,-format=>"ace");
>>> 	$assembly = $io->next_assembly;
>>>
>>> 	foreach $contig ($assembly->all_contigs) {
>>>   		print Dumper $contig;
>>> 	}
>>>
>>> Gives this error;
>>> 	[smithiesr at impala ace_phrap]$ perl bp_read_ace.pl
>>> 	Can't call method "get_consensus_sequence" on an undefined value
>>> at /data/home/smithiesr/bioperl-live/Bio/Assembly/IO/ace.pm line  
>>> 170,
>>> <GEN0> line 42.
>>>
>>> Which relates to this bit in ace.pm:
>>> 	# Loading contig qualities... (Base Quality field)
>>> 	/^BQ/ && do {
>>> 	    my $consensus = $contigOBJ->get_consensus_sequence()->seq();
>>>
>>> Is this caused by a dud ace file or a problem with   
>>> Bio::Assembly::IO:ace
>>> or is the Contig object not getting created?
>>> Any ideas?
>>>
>>> Thanx,
>>>
>>> Russell Smithies
>>>
>>> Bioinformatics Software Developer
>>> T +64 3 489 9085
>>> E  russell.smithies at agresearch.co.nz
>>>
>>> Invermay  Research Centre
>>> Puddle Alley,
>>> Mosgiel,
>>> New Zealand
>>> T  +64 3 489 3809
>>> F  +64 3 489 9174
>>> www.agresearch.co.nz
>>>
>>
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr. Robert Switzer
>> Dept of Biochemistry
>> University of Illinois Urbana-Champaign
>>
>>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Tue Dec  4 00:20:07 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 3 Dec 2007 23:20:07 -0600
Subject: [Bioperl-l] Bio::Assembly::IO problems reading .ace files
In-Reply-To: <D5DBA313349A4B458528BE63B387F36C062D5E91@imail.agresearch.co.nz>
References: <47545590.1000703@boekhoff.info> <475499C3.20801@sendu.me.uk>
	<D5DBA313349A4B458528BE63B387F36C062D5E2A@imail.agresearch.co.nz>
	<692A2BDA-048B-49C6-A101-C13A1DAB9B69@uiuc.edu>
	<D5DBA313349A4B458528BE63B387F36C062D5E91@imail.agresearch.co.nz>
Message-ID: <C48EC1AC-FEA6-4F60-9791-D4DE449768C2@uiuc.edu>

The ulimit fix usually works but if this is for Gbrowse it probably  
isn't prudent.  It would be nice to get Bio::Assembly working as a  
Bio::AlignI; it would be easier to manipulate for display.  Here's a  
script I wrote up as an example:

http://www.bioperl.org/wiki/HOWTO_Discussion:Graphics

chris

On Dec 3, 2007, at 8:32 PM, Smithies, Russell wrote:

> Thanx Chris,
> I'm only writing a simple .ace viewer to display assembled contigs  
> in a
> Bio::Graphics::Panel so I'll parse the coords from the .ace files
> "manually".
> Unless anyone else has a better idea ?
> (and some example code ;-)
>
> Russell


From avilella at gmail.com  Tue Dec  4 06:51:05 2007
From: avilella at gmail.com (Albert Vilella)
Date: Tue, 4 Dec 2007 11:51:05 +0000
Subject: [Bioperl-l] New Bio::Tools::Run::Phylo::SLR - Wrapper around the
	SLR program
Message-ID: <358f4d650712040351g4bef4417l4197d06454049140@mail.gmail.com>

Hi all,

There is a new wrapper in bioperl-run for SLR:

http://www.bioperl.org/wiki/SLR

Right now, output parsing is very simple, and I have only tested it on
my Linux machine.
Can someone with a Mac give it a try?

Update your bioperl-run to CVS head, then:

# try the installer, SLR is option 6
perl scripts/bioperl_application_installer.PLS
# then try to run the tests (should take about a minute)
perl t/SLR.t

Any comments on the code would be appreciated,

Thanks in advance,

Cheers,

    Albert.

From captainrave at hotmail.com  Tue Dec  4 06:04:57 2007
From: captainrave at hotmail.com (Captainrave)
Date: Tue, 4 Dec 2007 03:04:57 -0800 (PST)
Subject: [Bioperl-l]  extracting CDS location from Genbank
Message-ID: <14148723.post@talk.nabble.com>


Help.  I'm very new to Perl and BioPerl.  Basically I need to extract the
location of each CDS in a GenBank entry (e.g. 103..120) and export them to an
output file as a list.  How would I do this?

Your help would be much appreciated!
-- 
View this message in context: http://www.nabble.com/extracting-CDS-location-from-Genbank-tf4942483.html#a14148723
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From michael.watson at bbsrc.ac.uk  Tue Dec  4 09:48:27 2007
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Tue, 4 Dec 2007 14:48:27 -0000
Subject: [Bioperl-l] extracting CDS location from Genbank
In-Reply-To: <14148723.post@talk.nabble.com>
References: <14148723.post@talk.nabble.com>
Message-ID: <8975119BCD0AC5419D61A9CF1A923E9505A4F76E@iahce2ksrv1.iah.bbsrc.ac.uk>

From the SeqIO HOWTO:

#!/bin/perl

use strict;
use Bio::SeqIO;

my $file = shift; # get the file name, somehow
my $seqio_object = Bio::SeqIO->new(-file => $file);
my $seq_object = $seqio_object->next_seq;

From the Feature HOWTO:

for my $feat_object ($seq_object->get_SeqFeatures) {
   print "primary tag: ", $feat_object->primary_tag, "\n";
   for my $tag ($feat_object->get_all_tags) {
      print "  tag: ", $tag, "\n";
      for my $value ($feat_object->get_tag_values($tag)) {
         print "    value: ", $value, "\n";
      }
   }
}

Surely you could have found that yourself? ;0 
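
The two snippets above list every feature and tag. To get just the CDS locations into a file, something along these lines might do. It is a sketch that assumes BioPerl is installed and that the input is in GenBank format. The sub names and the file names in the usage line are hypothetical. The calls primary_tag, location and to_FTstring are standard feature and location methods, and to_FTstring gives feature-table style strings such as 103..120.

```perl
use strict;
use warnings;

# Return one location string per CDS feature of a sequence object.
sub cds_locations {
    my ($seq) = @_;
    return map  { $_->location->to_FTstring }
           grep { $_->primary_tag eq 'CDS' } $seq->get_SeqFeatures;
}

# Read every record of a GenBank file and write its CDS locations,
# one per line, to $out_file.
sub write_cds_locations {
    my ( $in_file, $out_file ) = @_;
    require Bio::SeqIO;    # loaded at run time; needs BioPerl installed
    my $in = Bio::SeqIO->new( -file => $in_file, -format => 'genbank' );
    open my $out, '>', $out_file or die "Cannot write $out_file: $!";
    while ( my $seq = $in->next_seq ) {
        print {$out} "$_\n" for cds_locations($seq);
    }
    close $out or die "Cannot close $out_file: $!";
    return;
}

# Usage: perl cds_locations.pl input.gb cds_list.txt
write_cds_locations(@ARGV) if @ARGV == 2;
```

Reverse-strand and split CDS features come out in the same notation GenBank uses, for example complement(...) or join(...). Use $feat->start and $feat->end instead if plain numbers are wanted.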

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Captainrave
Sent: 04 December 2007 11:05
To: Bioperl-l at lists.open-bio.org
Subject: [Bioperl-l] extracting CDS location from Genbank


Help.  I'm very new to perl and bioperl.  Basically I need to extract the
location of each CDS in a genbank entry e.g.103...120 and export them to an
output file as a list.  How would I do this?

Your help would be much appreciated!
-- 
View this message in context: http://www.nabble.com/extracting-CDS-location-from-Genbank-tf4942483.html#a14148723
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.



From captainrave at hotmail.com  Tue Dec  4 10:07:19 2007
From: captainrave at hotmail.com (Captainrave)
Date: Tue, 4 Dec 2007 07:07:19 -0800 (PST)
Subject: [Bioperl-l] extracting CDS location from Genbank
In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E9505A4F76E@iahce2ksrv1.iah.bbsrc.ac.uk>
References: <14148723.post@talk.nabble.com>
	<8975119BCD0AC5419D61A9CF1A923E9505A4F76E@iahce2ksrv1.iah.bbsrc.ac.uk>
Message-ID: <14152264.post@talk.nabble.com>


Yes but actually implementing it is another story.

I get an error:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: file argument provided, but with an undefined value
STACK: Error::throw
STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359
STACK: Bio::SeqIO::new C:/Perl/site/lib/Bio/SeqIO.pm:359
STACK: test3.pl:7
-----------------------------------------------------------

Basically because I don't understand the code well enough.  For example, how
do I tell it which input file to read? I know this might sound stupid, but I
don't understand the Biowiki very well!

-- 
View this message in context: http://www.nabble.com/extracting-CDS-location-from-Genbank-tf4942483.html#a14152264
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From michael.watson at bbsrc.ac.uk  Tue Dec  4 10:21:34 2007
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Tue, 4 Dec 2007 15:21:34 -0000
Subject: [Bioperl-l] extracting CDS location from Genbank
In-Reply-To: <14152264.post@talk.nabble.com>
References: <14148723.post@talk.nabble.com><8975119BCD0AC5419D61A9CF1A923E9505A4F76E@iahce2ksrv1.iah.bbsrc.ac.uk>
	<14152264.post@talk.nabble.com>
Message-ID: <8975119BCD0AC5419D61A9CF1A923E9505A4F771@iahce2ksrv1.iah.bbsrc.ac.uk>

Post the script that produces that error, and your file's location 



From bix at sendu.me.uk  Tue Dec  4 10:39:31 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 04 Dec 2007 15:39:31 +0000
Subject: [Bioperl-l] extracting CDS location from Genbank
In-Reply-To: <14152264.post@talk.nabble.com>
References: <14148723.post@talk.nabble.com>	<8975119BCD0AC5419D61A9CF1A923E9505A4F76E@iahce2ksrv1.iah.bbsrc.ac.uk>
	<14152264.post@talk.nabble.com>
Message-ID: <475574B3.8050700@sendu.me.uk>

Captainrave wrote:
> Yes but actually implementing it is another story.
> 
> I get an error:
> 
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: file argument provided, but with an undefined value
> STACK: Error::throw
> STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359
> STACK: Bio::SeqIO::new C:/Perl/site/lib/Bio/SeqIO.pm:359
> STACK: test3.pl:7
> -----------------------------------------------------------

The best way to get help is to give us your script and the error
message, and the command you used to run your script. The less you know,
the more you should give us (i.e. don't edit anything out).

From captainrave at hotmail.com  Tue Dec  4 10:41:37 2007
From: captainrave at hotmail.com (Captainrave)
Date: Tue, 4 Dec 2007 07:41:37 -0800 (PST)
Subject: [Bioperl-l] extracting CDS location from Genbank
In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E9505A4F771@iahce2ksrv1.iah.bbsrc.ac.uk>
References: <14148723.post@talk.nabble.com>
	<8975119BCD0AC5419D61A9CF1A923E9505A4F76E@iahce2ksrv1.iah.bbsrc.ac.uk>
	<14152264.post@talk.nabble.com>
	<8975119BCD0AC5419D61A9CF1A923E9505A4F771@iahce2ksrv1.iah.bbsrc.ac.uk>
Message-ID: <14152907.post@talk.nabble.com>


#!/bin/perl

use strict;
use Bio::SeqIO;
my $file = shift; # get the file name, somehow
my $seqio_object = Bio::SeqIO->new(-file => $file);
my $seq_object = $seqio_object->next_seq;

for my $feat_object ($seq_object->get_SeqFeatures) {          
   print "primary tag: ", $feat_object->primary_tag, "\n";          
   for my $tag ($feat_object->get_all_tags) {            
      print "  tag: ", $tag, "\n";            
      for my $value ($feat_object->get_tag_values($tag)) {

         print "    value: ", $value, "\n";            
      }          
   }      
}

exit;

The file is on the same folder.  But how do I tell it to use this file?


-- 
View this message in context: http://www.nabble.com/extracting-CDS-location-from-Genbank-tf4942483.html#a14152907
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From michael.watson at bbsrc.ac.uk  Tue Dec  4 10:53:22 2007
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Tue, 4 Dec 2007 15:53:22 -0000
Subject: [Bioperl-l] extracting CDS location from Genbank
In-Reply-To: <14152907.post@talk.nabble.com>
References: <14148723.post@talk.nabble.com><8975119BCD0AC5419D61A9CF1A923E9505A4F76E@iahce2ksrv1.iah.bbsrc.ac.uk><14152264.post@talk.nabble.com><8975119BCD0AC5419D61A9CF1A923E9505A4F771@iahce2ksrv1.iah.bbsrc.ac.uk>
	<14152907.post@talk.nabble.com>
Message-ID: <8975119BCD0AC5419D61A9CF1A923E9505A4F77A@iahce2ksrv1.iah.bbsrc.ac.uk>

Same script as below, but try:

my $file = 'C:\path\to\my\filename.gbk'; 

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Captainrave
Sent: 04 December 2007 15:42
To: Bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] extracting CDS location from Genbank


#!/bin/perl

use strict;
use Bio::SeqIO;
my $file = shift; # get the file name, somehow
my $seqio_object = Bio::SeqIO->new(-file => $file);
my $seq_object = $seqio_object->next_seq;

for my $feat_object ($seq_object->get_SeqFeatures) {          
   print "primary tag: ", $feat_object->primary_tag, "\n";          
   for my $tag ($feat_object->get_all_tags) {            
      print "  tag: ", $tag, "\n";            
      for my $value ($feat_object->get_tag_values($tag)) {

         print "    value: ", $value, "\n";            
      }          
   }      
}

exit;

The file is on the same folder.  But how do I tell it to use this file?




From cjfields at uiuc.edu  Tue Dec  4 11:20:34 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 4 Dec 2007 10:20:34 -0600
Subject: [Bioperl-l] extracting CDS location from Genbank
In-Reply-To: <14152907.post@talk.nabble.com>
References: <14148723.post@talk.nabble.com>
	<8975119BCD0AC5419D61A9CF1A923E9505A4F76E@iahce2ksrv1.iah.bbsrc.ac.uk>
	<14152264.post@talk.nabble.com>
	<8975119BCD0AC5419D61A9CF1A923E9505A4F771@iahce2ksrv1.iah.bbsrc.ac.uk>
	<14152907.post@talk.nabble.com>
Message-ID: <C2732712-D32B-449A-8BCA-DCB8BBDE9758@uiuc.edu>

The 'my $file = shift;' is a Perl idiom.  Used without an argument at the top
level of a script, the built-in 'shift' operates on @ARGV (the command-line
arguments); the file would then be passed as the first argument when running
the script:

get_features.pl myfile.gb

This should work for any OS.  Personally, I use something like the  
following to indicate how the script is used in case a file is never  
entered:

my $USAGE = <<END_USE;
USAGE: get_features.pl <file>
Perl script to grab features from a GenBank file and print to a table
END_USE

my $file = shift || die $USAGE;
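
To make the idiom concrete, here is a small sketch that uses only core
Perl.  The helper name first_file_arg is made up for illustration (it is
not part of Perl or BioPerl); it just spells out what 'shift || die' does
with an argument list.

```perl
use strict;
use warnings;

# Hypothetical helper (illustration only): take the first element of the
# argument list it is given, or die with a usage message if there is none.
sub first_file_arg {
    my @args = @_;
    my $file = shift @args;    # same effect as a bare 'shift' on @ARGV
    die "USAGE: get_features.pl <file>\n"
        unless defined $file && length $file;
    return $file;
}

# At the top level of a script a bare 'shift' means 'shift @ARGV', so
#     perl get_features.pl myfile.gb
# leaves 'myfile.gb' in $file.  Spelled out explicitly:
my $file = @ARGV ? first_file_arg(@ARGV) : undef;
print defined $file
    ? "Reading from $file\n"
    : "No file given on the command line\n";
```

Inside a subroutine a bare 'shift' works on @_ instead of @ARGV, which is
why the helper above names the array it shifts from.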

chris

On Dec 4, 2007, at 9:41 AM, Captainrave wrote:

>
> #!/bin/perl
>
> use strict;
> use Bio::SeqIO;
> my $file = shift; # get the file name, somehow
> my $seqio_object = Bio::SeqIO->new(-file => $file);
> my $seq_object = $seqio_object->next_seq;
>
> for my $feat_object ($seq_object->get_SeqFeatures) {
>   print "primary tag: ", $feat_object->primary_tag, "\n";
>   for my $tag ($feat_object->get_all_tags) {
>      print "  tag: ", $tag, "\n";
>      for my $value ($feat_object->get_tag_values($tag)) {
>
>         print "    value: ", $value, "\n";
>      }
>   }
> }
>
> exit;
>
> The file is on the same folder.  But how do I tell it to use this  
> file?
>
>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From bix at sendu.me.uk  Tue Dec  4 11:22:12 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 04 Dec 2007 16:22:12 +0000
Subject: [Bioperl-l] extracting CDS location from Genbank
In-Reply-To: <14152907.post@talk.nabble.com>
References: <14148723.post@talk.nabble.com>	<8975119BCD0AC5419D61A9CF1A923E9505A4F76E@iahce2ksrv1.iah.bbsrc.ac.uk>	<14152264.post@talk.nabble.com>	<8975119BCD0AC5419D61A9CF1A923E9505A4F771@iahce2ksrv1.iah.bbsrc.ac.uk>
	<14152907.post@talk.nabble.com>
Message-ID: <47557EB4.10003@sendu.me.uk>

Captainrave wrote:
> #!/bin/perl
> my $file = shift; # get the file name, somehow
>
> The file is on the same folder.  But how do I tell it to use this file?

http://stein.cshl.org/genome_informatics/perl_intro/command_line.html

Basically, when you run your script add the name of the file to your 
command line.

me% perl myscript.pl myfile

By saying 'my $file = shift' inside myscript.pl, the variable $file now 
contains the filename 'myfile'.

You could also have hardcoded the filename:
my $file = 'myfile';


Anyway, you're going to run into lots of these issues, and they're 
beyond the scope of this mailing list. For basic perl problems seek help 
via www.perl.org. When you have a BioPerl-specific question, don't 
hesitate to post here.
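
For the original question (the location of each CDS), the HOWTO loop only
needs a filter on the primary tag.  What follows is a minimal sketch, not a
tested recipe: it assumes a GenBank-format input file, cds_locations is a
made-up helper name, and the output layout (sequence id, tab, location) is
just one possible choice.

```perl
use strict;
use warnings;
use Bio::SeqIO;

# Hypothetical helper: return one "id<TAB>location" string per CDS feature
# found in the given GenBank file.
sub cds_locations {
    my ($file) = @_;
    my $seqio = Bio::SeqIO->new(-file => $file, -format => 'genbank');
    my @lines;
    while (my $seq = $seqio->next_seq) {
        for my $feat ($seq->get_SeqFeatures) {
            next unless $feat->primary_tag eq 'CDS';
            # to_FTstring renders the location in feature-table notation,
            # including join(...) and complement(...) where present.
            push @lines,
                join("\t", $seq->display_id, $feat->location->to_FTstring);
        }
    }
    return @lines;
}

# Only runs when a file name is given on the command line.
if (@ARGV) {
    print "$_\n" for cds_locations($ARGV[0]);
}
```

Saved as, say, cds_locations.pl, it would be run as

me% perl cds_locations.pl myfile.gb > cds_list.txt

If you want the numbers separately, $feat->start and $feat->end give the
outer coordinates of each feature.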

From jason at bioperl.org  Tue Dec  4 12:16:30 2007
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 4 Dec 2007 09:16:30 -0800
Subject: [Bioperl-l] New Bio::Tools::Run::Phylo::SLR - Wrapper around
	the SLR program
In-Reply-To: <358f4d650712040351g4bef4417l4197d06454049140@mail.gmail.com>
References: <358f4d650712040351g4bef4417l4197d06454049140@mail.gmail.com>
Message-ID: <18ABB052-2539-4932-A7AA-BB6D194BF8C3@bioperl.org>

Excellent - thanks for this!  I'm giving it a whirl on Linux and the
SLR.t test is currently taking more than 30 minutes to run -- is it
possible to cook up an example that is going to finish in a more
reasonable amount of time?

Also - I would prefer if the default exe could be 'Slr' rather than  
Slr_Linux_static - it seems like it is possible for users to install  
it this way.  Similarly whether or not the Slr_osx or Slr is the  
default name, is it too big of a deal to expect the user to rename it?

I'll give it a whirl on OSX later, but might be easier if the test  
runs shorter.

Thanks!
-jason
On Dec 4, 2007, at 3:51 AM, Albert Vilella wrote:

> Hi all,
>
> There is a new wrapper in bioperl-run for SLR:
>
> http://www.bioperl.org/wiki/SLR
>
> Right now, output parsing is very simple, and I have only tested it on
> my linux machine.
> Can someone with a Mac give it a try?
>
> update your bioperl-run to cvs head, then:
>
> # try the installer, SLR is option 6
> perl scripts/bioperl_application_installer.PLS
> # then try to run the tests (should take about a minute)
> perl t/SLR.t
>
> Any comments on the code would be appreciated,
>
> Thanks in advance,
>
> Cheers,
>
>     Albert.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From florent.angly at gmail.com  Tue Dec  4 13:17:08 2007
From: florent.angly at gmail.com (Florent Angly)
Date: Tue, 04 Dec 2007 10:17:08 -0800
Subject: [Bioperl-l] New Bio::Tools::Run::TigrAssembler
In-Reply-To: <18ABB052-2539-4932-A7AA-BB6D194BF8C3@bioperl.org>
References: <358f4d650712040351g4bef4417l4197d06454049140@mail.gmail.com>
	<18ABB052-2539-4932-A7AA-BB6D194BF8C3@bioperl.org>
Message-ID: <475599A4.1040500@gmail.com>

Hi all,
I pushed a new module into bioperl-run CVS a few days ago. It's called 
Bio::Tools::Run::TigrAssembler. It is a wrapper for TIGR Assembler, an 
open-source software that assembles DNA sequences.
Input is a list of sequence objects and the output is assembly objects...
easy enough...
Let me know if you experience problems with it.
Florent


From jason at bioperl.org  Tue Dec  4 13:51:34 2007
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 4 Dec 2007 10:51:34 -0800
Subject: [Bioperl-l] [StandAloneBLAST] Use more than one CPU + avoid
	BLAST reload
In-Reply-To: <475499C3.20801@sendu.me.uk>
References: <47545590.1000703@boekhoff.info> <475499C3.20801@sendu.me.uk>
Message-ID: <8273f6c20712041051k2bfe36efgb2ae40550df9341@mail.gmail.com>

You can pass in an array reference of sequences instead of a single sequence
object and the module will build a multi-FASTA database.  You can also pass
in a filename instead of a Sequence object and the file can be an already
built multi-FASTA database.  This is described in the documentation:

http://search.cpan.org/~birney/bioperl-1.4/Bio/Tools/Run/StandAloneBlast.pm#blastall

You can also just run BLAST without the StandAloneBlast part, as I do, and
just build your multi-FASTA file ahead of time with SeqIO and do

use Bio::SearchIO;

# wublast
my $cmd = "blastp -i MULTIFASTA -d DATABASE --cpus 2 |";
# or NCBI blast
# my $cmd = "blastall -a 2 -i MULTIFASTA -p blastp -d DATABASE |";

# the trailing '|' makes open() run the command and read its output
open(my $fh, $cmd) or die "Cannot run BLAST: $!";
my $searchio = Bio::SearchIO->new(-format => 'blast', -fh => $fh);

The advantage of StandAloneBlast in theory is that it takes care of the
temporary file creation (sequence files) and cleanup.  Personally I find I want
easier access to programs that have a simple command line like this.  You can do
similar things with SSEARCH or FASTA searching too.

-jason

On Dec 3, 2007 4:05 PM, Sendu Bala <bix at sendu.me.uk> wrote:

> Sven Boekhoff wrote:
> > HI!
> > I just started working with Perl and BioPerl. I'm quite impressed by what
> > can be easily done with this module. Today I found that my second CPU
> > is not used, but the first one runs at 100%. I tried to include the
> > "-a"-parameter, but I was not successful:
> >
> > my @params = (
> >       -database => 'my_db',
> >       -a => '2',
> >       -outfile => 'blast1.out'
> > );
> >
> > How do I have to use it?
>
> This should work in the CVS version of StandAloneBlast. In other
> versions, perhaps try using $object->a(2);
>
>
> > Second question: In my perlscript I start BLAST-searches in a loop.
> > Everytime BLAST has finished its search, the memory is cleared and BLAST
> > is started again. I think most of the time is used to reload the
> > database. Is it somehow possible to keep the database loaded (e.g. by
> > starting a second search) or is BLAST reloaded anyway?
>
> I hope someone will correct me if I'm wrong, but I think you'd have
> to do that with a 2-way pipe. StandAloneBlast only uses output to a file
> and input from that file, finishing with the executable in between. I've
> thought about improving it with a 2-way pipe, but never got around to
> it, being apprehensive about stability on all platforms.
>
> The more obvious solution, which may be possible depending on exactly
> what you're doing, is to avoid the loop and just supply Blast all your
> input in one go.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


-- 
Jason Stajich
jason at bioperl.org
http://bioperl.org/wiki/User:Jason

From stefan.kirov at bms.com  Tue Dec  4 14:25:21 2007
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Tue, 04 Dec 2007 14:25:21 -0500
Subject: [Bioperl-l] PAML/Codeml parsing
In-Reply-To: <FD74BB3A-C8F7-453E-915E-FD5541CE59CB@bioperl.org>
References: <FD74BB3A-C8F7-453E-915E-FD5541CE59CB@bioperl.org>
Message-ID: <4755A9A1.2040608@bms.com>

Jason Stajich wrote:
> PAML4 breaks our PAML parser right now because the order of things in
> the result file has changed.  Now sequences precede the information
> about the version or the program run.  This means that
> $result->get_seqs() fails because we don't parse the sequences.
>
> We'll see what we can do, but as usual with supporting 3rd party  
> programs it is brittle when file formats change.  Th
>
> -jason
>
> --
> Jason Stajich
> jason at bioperl.org
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>   
Jason,
I saw a commit after this post on codeml, but not on PAML.pm- I assume
this is not fixed, am I correct?
Thanks!
Stefan

From avilella at gmail.com  Tue Dec  4 15:34:38 2007
From: avilella at gmail.com (Albert Vilella)
Date: Tue, 4 Dec 2007 20:34:38 +0000
Subject: [Bioperl-l] New Bio::Tools::Run::Phylo::SLR - Wrapper around
	the SLR program
In-Reply-To: <18ABB052-2539-4932-A7AA-BB6D194BF8C3@bioperl.org>
References: <358f4d650712040351g4bef4417l4197d06454049140@mail.gmail.com>
	<18ABB052-2539-4932-A7AA-BB6D194BF8C3@bioperl.org>
Message-ID: <358f4d650712041234n70004aedqa3dc07fb3f6f2e08@mail.gmail.com>

hmmm, 30 minutes is quite a lot... it takes much less for me:

avilella at magneto:~/bioperl/vanilla/bioperl-run$ time perl t/SLR.t
1..7
ok 1 - use Bio::Root::IO;
ok 2 - use Bio::Tools::Run::Phylo::SLR;
ok 3 - use Bio::AlignIO;
ok 4 - use Bio::TreeIO;
ok 5
ok 6
ok 7

real    0m21.517s
user    0m20.717s
sys     0m0.100s


On Dec 4, 2007 5:16 PM, Jason Stajich <jason at bioperl.org> wrote:
> Excellent - thanks for this !  I'm giving it whirl on linux and the
> SLR.t test is currently taking more than 30 minutes to run -- is it
> possible to cook up an example that is going to finish in a more
> reasonable amount of time?
>
> Also - I would prefer if the default exe could be 'Slr' rather than
> Slr_Linux_static - it seems like it is possible for users to install
> it this way.  Similarly whether or not the Slr_osx or Slr is the
> default name, is it too big of a deal to expect the user to rename it?
>
> I'll give it a whirl on OSX later, but might be easier if the test
> runs shorter.
>
> Thanks!
> -jason
>

From avilella at gmail.com  Tue Dec  4 15:39:26 2007
From: avilella at gmail.com (Albert Vilella)
Date: Tue, 4 Dec 2007 20:39:26 +0000
Subject: [Bioperl-l] New Bio::Tools::Run::Phylo::SLR - Wrapper around
	the SLR program
In-Reply-To: <358f4d650712041234n70004aedqa3dc07fb3f6f2e08@mail.gmail.com>
References: <358f4d650712040351g4bef4417l4197d06454049140@mail.gmail.com>
	<18ABB052-2539-4932-A7AA-BB6D194BF8C3@bioperl.org>
	<358f4d650712041234n70004aedqa3dc07fb3f6f2e08@mail.gmail.com>
Message-ID: <358f4d650712041239w7e6dee29lbb13cc2e30a6bce1@mail.gmail.com>

oh, I forgot to mention: SLR uses the lapack and blas libraries if
installed, which makes it a lot faster (according to the author)...
maybe that's the reason...

On Dec 4, 2007 8:34 PM, Albert Vilella <avilella at gmail.com> wrote:
> hmmm, 30 minutes is quite a lot... it takes much less for me:
>
> avilella at magneto:~/bioperl/vanilla/bioperl-run$ time perl t/SLR.t
> 1..7
> ok 1 - use Bio::Root::IO;
> ok 2 - use Bio::Tools::Run::Phylo::SLR;
> ok 3 - use Bio::AlignIO;
> ok 4 - use Bio::TreeIO;
> ok 5
> ok 6
> ok 7
>
> real    0m21.517s
> user    0m20.717s
> sys     0m0.100s
>
>
>

From jason at bioperl.org  Tue Dec  4 16:43:03 2007
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 4 Dec 2007 13:43:03 -0800
Subject: [Bioperl-l] New Bio::Tools::Run::Phylo::SLR - Wrapper around
	the SLR program
In-Reply-To: <358f4d650712041239w7e6dee29lbb13cc2e30a6bce1@mail.gmail.com>
References: <358f4d650712040351g4bef4417l4197d06454049140@mail.gmail.com>
	<18ABB052-2539-4932-A7AA-BB6D194BF8C3@bioperl.org>
	<358f4d650712041234n70004aedqa3dc07fb3f6f2e08@mail.gmail.com>
	<358f4d650712041239w7e6dee29lbb13cc2e30a6bce1@mail.gmail.com>
Message-ID: <2CF76A38-5A9E-4A4E-8C9F-29EDD732BDDF@bioperl.org>

My own icc compiled version seemed to have caused the problem.  
whoops. fixed that.
-jason
On Dec 4, 2007, at 12:39 PM, Albert Vilella wrote:

> oh, I forgot to mention: SLR uses the lapack and blas libraries if
> installed, which makes it a lot faster (according to the author)...
> maybe that's the reason...
>


From stefan.kirov at bms.com  Tue Dec  4 16:51:51 2007
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Tue, 04 Dec 2007 16:51:51 -0500
Subject: [Bioperl-l] PAML/Codeml parsing
In-Reply-To: <A9B43240-2601-4C3E-9870-F32A6918A657@bioperl.org>
References: <FD74BB3A-C8F7-453E-915E-FD5541CE59CB@bioperl.org>
	<4755A9A1.2040608@bms.com>
	<A9B43240-2601-4C3E-9870-F32A6918A657@bioperl.org>
Message-ID: <4755CBF7.5010709@bms.com>

Jason Stajich wrote:
> should be fixed.
>
> $ cvs log -r HEAD Bio/Tools/Phylo/PAML.pm
> revision 1.56
> date: 2007/11/01 14:52:56;  author: jason;  state: Exp;  lines: +21 -14
> Parsing PAML4 and PAML3.15 should work now.  Dealing with variable
> order for the sequences and summary results in
> the top of the MLC files
>
Yes, this is the version I have, and in some cases the sequences do not
get parsed. I had missed this commit. I will try to assemble a test case
and send it. I cannot promise when, but I will try to do it tomorrow. My gut
feeling so far is that the parser works whenever there are gaps in the
alignment; otherwise it does not. PAML surely has a very peculiar format.
Thanks again!
Stefan
> On Dec 4, 2007, at 11:25 AM, Stefan Kirov wrote:
>
>> Jason Stajich wrote:
>>> PAML4 breaks our PAML parser right now because the order of things in
>>> the result file has changed.  Now sequences precede the information
>>> about the version or the program run.  This means that
>>> $result->get_seqs() fails because we don't parse the sequences.
>>>
>>> We'll see what we can do, but as usual with supporting 3rd party
>>> programs it is brittle when file formats change.  Th
>>>
>>> -jason
>>>
>>> -- 
>>> Jason Stajich
>>> jason at bioperl.org
>>>
>> Jason,
>> I saw a commit after this post on codeml, but not on PAML.pm- I assume
>> this is not fixed, am I correct?
>> Thanks!
>> Stefan
>
>


From jason at bioperl.org  Tue Dec  4 16:36:09 2007
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 4 Dec 2007 13:36:09 -0800
Subject: [Bioperl-l] PAML/Codeml parsing
In-Reply-To: <4755A9A1.2040608@bms.com>
References: <FD74BB3A-C8F7-453E-915E-FD5541CE59CB@bioperl.org>
	<4755A9A1.2040608@bms.com>
Message-ID: <A9B43240-2601-4C3E-9870-F32A6918A657@bioperl.org>

should be fixed.

$ cvs log -r HEAD Bio/Tools/Phylo/PAML.pm
revision 1.56
date: 2007/11/01 14:52:56;  author: jason;  state: Exp;  lines: +21 -14
Parsing PAML4 and PAML3.15 should work now.  Dealing with variable  
order for the sequences and summary results in
the top of the MLC files

On Dec 4, 2007, at 11:25 AM, Stefan Kirov wrote:

> Jason,
> I saw a commit after this post on codeml, but not on PAML.pm- I assume
> this is not fixed, am I correct?
> Thanks!
> Stefan


From johan.nilsson at sh.se  Wed Dec  5 06:35:58 2007
From: johan.nilsson at sh.se (Johan Nilsson)
Date: Wed, 5 Dec 2007 12:35:58 +0100
Subject: [Bioperl-l] Query about Hyphy wrapper module "SLAC.pm"
Message-ID: <OFBA70B0CA.66F02D44-ONC12573A8.003FB7B7-C12573A8.003FB7C0@sh.se>


Hello,

I have a bunch of multiple sequence alignments of protein coding genes,
which I would like to analyse with the SLAC method of the HyPhy package. I
tried using the SLAC.pm module in bioperl-run, but I could not get it to
work properly.

Basically, for each MSA file, I create the Bio::Tree::Tree and
Bio::SimpleAlign objects ($tree and $aln, respectively) required as
arguments to SLAC, and call the method with: "($rc,$result) =
$slac->run($aln,$tree)" in a loop procedure in my script.

When I choose not to save the tmp files (the default option in SLAC.pm),
the program complains that it cannot find the file
"$whatevertmpdir/wrapper.bf", and returns $rc=0 for all but the first MSA
(which works fine). Apparently, it looks for the wrapper.bf file in the
first tmp dir created, which is deleted at the end of the first SLAC call.

If instead I choose to save the tempfiles ($slac->save_tempfiles('TRUE')),
all calls to SLAC give return code 1, and no error message is received.
However, when I look at the resulting $result hashref, it turns out that
all results are for the FIRST alignment read. I've made sure there is
nothing strange with my loop procedure, and I checked that the tree and
alignment objects look OK for each MSA. Apparently, it does create a new
"results.tsv" file in the tmp directory after each run, but the file is
identical each time it's created. Also, it only creates ONE tmp directory,
no matter how many times SLAC is executed (I would imagine it was supposed
to save each result in separate tmp dirs?)
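
To make the expectation concrete, here is a minimal stand-alone sketch of
"one fresh tmp dir and wrapper file per run". It is not code from SLAC.pm:
the helper name run_in_fresh_tmpdir and the placeholder file contents are
made up for illustration, and only the core File::Temp and File::Spec
modules are used.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical helper (not part of bioperl-run): every call creates its
# own temp dir, so no run depends on a directory that an earlier run
# created or has already cleaned up.
sub run_in_fresh_tmpdir {
    my ($label) = @_;
    my $dir     = tempdir( CLEANUP => 1 );    # removed at program exit
    my $wrapper = File::Spec->catfile( $dir, 'wrapper.bf' );
    open my $fh, '>', $wrapper or die "Cannot write $wrapper: $!";
    print {$fh} "/* placeholder batch file for $label */\n";
    close $fh or die "Cannot close $wrapper: $!";
    return $wrapper;
}

my @wrappers = map { run_in_fresh_tmpdir($_) } qw(aln1 aln2 aln3);
print "$_\n" for @wrappers;
```

With this pattern each alignment gets a distinct wrapper.bf path, which is
the behaviour I would have expected from repeated SLAC runs.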

Thus, it seems to me like the errors occur because something goes wrong in
the creation of temporary files. Have I done something wrong here, or have
any of you experienced the same problem?

Best regards
/Johan


--
Johan Nilsson, Ph.D.
School of Life Sciences
Södertörns University College
S-141 89 Huddinge, Sweden
E-mail: johan.nilsson at sh.se
Phone: +46 8 608 47 05, +46 70 456 10 51


From bernd.web at gmail.com  Wed Dec  5 08:10:04 2007
From: bernd.web at gmail.com (Bernd Web)
Date: Wed, 5 Dec 2007 14:10:04 +0100
Subject: [Bioperl-l] SimpleAlign is_flush
Message-ID: <716af09c0712050510h62aa106cla7011a75c93091a5@mail.gmail.com>

Hi,

SimpleAlign has an is_flush:
 Function  : Tells you whether the alignment is flush, i.e. all of the
same length
 Returns   : 1 or 0

I noticed that a file with multiple fasta sequences of different
lengths has an is_flush value of 1. Printing the "alignment" shows
that sequences are padded with "-" so that they are all the same
length. Does this mean that is_flush for alignments read in via
AlignIO is always true, and thus not so useful?

(using bioperl version: 1.005002102)


Regards,
Bernd

From cjfields at uiuc.edu  Wed Dec  5 08:53:59 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 5 Dec 2007 07:53:59 -0600
Subject: [Bioperl-l] SimpleAlign is_flush
In-Reply-To: <716af09c0712050510h62aa106cla7011a75c93091a5@mail.gmail.com>
References: <716af09c0712050510h62aa106cla7011a75c93091a5@mail.gmail.com>
Message-ID: <9E4F2A25-ACDE-4BFD-9026-FDF7251B588B@uiuc.edu>

Yes; it's a convenient way to make sure all seqs have the same length  
(including gaps).  Nice for checking when adding new seqs to an  
alignment or building new parsers.

chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From captainrave at hotmail.com  Wed Dec  5 07:37:02 2007
From: captainrave at hotmail.com (Captainrave)
Date: Wed, 5 Dec 2007 04:37:02 -0800 (PST)
Subject: [Bioperl-l] extracting CDS location from Genbank
In-Reply-To: <475574B3.8050700@sendu.me.uk>
References: <14148723.post@talk.nabble.com>
	<8975119BCD0AC5419D61A9CF1A923E9505A4F76E@iahce2ksrv1.iah.bbsrc.ac.uk>
	<14152264.post@talk.nabble.com> <475574B3.8050700@sendu.me.uk>
Message-ID: <14170499.post@talk.nabble.com>


Thanks, it works great now.

Do any of you know if there is a tag to pull out the CDS location, i.e. the
values such as 132..145? Those are all I need. Also, is there any way
to stop it reporting both tag and value and literally output JUST the value?

Thanks!!!


From stefan.kirov at bms.com  Wed Dec  5 09:24:20 2007
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Wed, 05 Dec 2007 09:24:20 -0500
Subject: [Bioperl-l] PAML/Codeml parsing
In-Reply-To: <A9B43240-2601-4C3E-9870-F32A6918A657@bioperl.org>
References: <FD74BB3A-C8F7-453E-915E-FD5541CE59CB@bioperl.org>
	<4755A9A1.2040608@bms.com>
	<A9B43240-2601-4C3E-9870-F32A6918A657@bioperl.org>
Message-ID: <4756B494.7020100@bms.com>

Jason,
When there is a gapless alignment we have a differently formatted output
from codeml:
kirovs at horta:~/AESIG> head -n 10 feJRfxQl8D/mlc

seed used = 492211105
      3    141

ENSRNOE00000058637               GCG AGC AAG TGT GAC AGC CAT GGC ACC CAC
CTA GCA GGT GTG GTC AGC GGC CGG GAT GCT GGT GTG GCC AAG GGC ACC AGT CTG
CAC AGT CTG CGT GTG CTC AAC TGT CAA GGG AAG GGC ACA GTC AGC GGC ACC CTC ATA
ENSMUSE00000366347               GCG AGC AAG TGT GAC AGC CAC GGC ACC CAC
CTG GCA GGT GTG GTC AGC GGC CGG GAT GCT GGT GTG GCC AAG GGC ACC AGC CTG
CAC AGC CTG CGT GTG CTC AAC TGT CAA GGG AAG GGC ACA GTC AGC GGC ACC CTC ATA
ENSE00001279150                  GCC AGC AAG TGT GAC AGT CAT GGC ACC CAC
CTG GCA GGG GTG GTC AGC GGC CGG GAT GCC GGC GTG GCC AAG GGT GCC AGC ATG
CGC AGC CTG CGC GTG CTC AAC TGC CAA GGG AAG GGC ACG GTT AGC GGC ACC CTC ATA

And parsing this fails...
The next one has gaps and works fine:

kirovs at horta:~/AESIG> head -n 10 4z6ZX7s1B6/mlc

seed used = 492252697

Before deleting alignment gaps
      2    162

ENSMUSE00000460297               AAT ATC GAT ACA TTT TAC AAG GAG GCA GAA
AAG AAG CTT ATA CAC GTG CTT GAG GGA GAC AGT CCC AAG TGG TCC ACA CCG AAC
AAA GAC CCC ACC CGA GAG CCC CAT GCA GCC TCC ACT TGC TGT GCT TCA GAT CTC
CTT GGT TCA GGA GGT CAG TTC CTG
ENSE00000939192                  AAT ATT GAC ATA CTT TGC AAT GAA GCA GAA
AAC AAG CTT ATG CAT ATA CTG CAT GCA AAT GAT CCC AAG TGG TCC ACC CCA ACT
AAA GAC TGT ACT TCA GGG CCG TAC ACT GCT CAA ATC --- --- --- --- --- ATT
CCT GGT ACA GGA AAC AAG CTT CTG

I will send both complete files as an attachment in another mail (I do
not know whether they will get through the list).
My guess is that the whole _parse_summary method has to be re-worked as
there is no tag to look for before the sequences start. Ugly.
I am not sure what else could become broken if I try to fix it, so I
will leave it to you.
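
One marker that both samples above do share is the PHYLIP-style header line
holding two integers (number of sequences and alignment length). The sketch
below is stand-alone Perl, not what PAML.pm does; the helper name
find_seq_header is hypothetical, and it assumes that no earlier line in an
mlc file consists of exactly two integers.

```perl
use strict;
use warnings;

# Hypothetical helper: return (nseq, length, index) for the first line
# that consists of exactly two integers, e.g. "      3    141".
sub find_seq_header {
    my (@lines) = @_;
    for my $i ( 0 .. $#lines ) {
        if ( $lines[$i] =~ /^\s*(\d+)\s+(\d+)\s*$/ ) {
            return ( $1, $2, $i );
        }
    }
    return;    # no header found
}

# The first lines of the two mlc files shown above.
my @gapless = ( '', 'seed used = 492211105', '      3    141', '' );
my @gapped  = ( '', 'seed used = 492252697', '',
                'Before deleting alignment gaps', '      2    162', '' );

for my $mlc ( \@gapless, \@gapped ) {
    my ( $nseq, $len, $idx ) = find_seq_header(@$mlc);
    print "nseq=$nseq len=$len at line index $idx\n";
}
```

For the two samples this reports the header at index 2 (gapless) and index 4
(gapped), so the same test would locate the sequence block in both layouts.
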
Stefan


From stefan.kirov at bms.com  Wed Dec  5 09:35:23 2007
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Wed, 05 Dec 2007 09:35:23 -0500
Subject: [Bioperl-l] PAML/Codeml parsing
In-Reply-To: <4756B494.7020100@bms.com>
References: <FD74BB3A-C8F7-453E-915E-FD5541CE59CB@bioperl.org>
	<4755A9A1.2040608@bms.com>
	<A9B43240-2601-4C3E-9870-F32A6918A657@bioperl.org>
	<4756B494.7020100@bms.com>
Message-ID: <4756B72B.6000103@bms.com>

Here are the files.
Stefan

-------------- next part --------------
A non-text attachment was scrubbed...
Name: mlc.tar.gz
Type: application/x-gzip
Size: 3237 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20071205/bd77cde1/attachment.gz 

From aaron.j.mackey at gsk.com  Wed Dec  5 09:56:31 2007
From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com)
Date: Wed, 5 Dec 2007 09:56:31 -0500
Subject: [Bioperl-l] SimpleAlign is_flush
In-Reply-To: <9E4F2A25-ACDE-4BFD-9026-FDF7251B588B@uiuc.edu>
Message-ID: <OF9A252048.A821FA6A-ON852573A8.0051A1FE-852573A8.0052140C@gsk.com>

Well, if you use AlignIO::fasta to read in a multi-fasta file of 
*unaligned* sequences, AlignIO::fasta makes the assumption that all of 
your sequences are aligned, and pads the ends of shorter sequences with 
gap characters (essentially, enforcing a rather silly, yet valid 
alignment).  The fact that is_flush() then returns 1 is secondary.

If you just want to read in an array of unaligned sequences, use 
SeqIO::fasta instead.  It doesn't really make much sense to use AlignIO 
for sequences that are not aligned ... conversely, if you *do* have 
aligned sequences in a multi-fasta file, then it does make sense to use 
AlignIO, and it also makes sense for AlignIO::fasta to end-pad sequences 
with gaps as necessary to get a fully valid, flush multiple sequence 
alignment matrix.
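
To illustrate why the padded result is always flush, here is a stand-alone
sketch in plain Perl. It deliberately does not use BioPerl: the helper names
pad_to_flush and is_flush_len are made up for this example and are not the
AlignIO::fasta or SimpleAlign implementation.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical helper: 1 if all strings have the same length, else 0.
sub is_flush_len {
    my (@seqs) = @_;
    my %len;
    $len{ length $_ }++ for @seqs;
    return scalar( keys %len ) <= 1 ? 1 : 0;
}

# Hypothetical helper: end-pad shorter strings with '-' up to the longest.
sub pad_to_flush {
    my (@seqs) = @_;
    my $max = max( map { length $_ } @seqs );
    return map { $_ . ( '-' x ( $max - length($_) ) ) } @seqs;
}

my @unaligned = qw(ATGCCA ATG ATGC);
my @padded    = pad_to_flush(@unaligned);

print is_flush_len(@unaligned), "\n";    # 0: lengths 6, 3 and 4 differ
print is_flush_len(@padded),    "\n";    # 1: all padded to length 6
print "$_\n" for @padded;
```

Once the padding step has run, the flush check cannot fail, which is why the
result says nothing about whether the input was really aligned.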

-Aaron


From cjfields at uiuc.edu  Wed Dec  5 11:22:01 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 5 Dec 2007 10:22:01 -0600
Subject: [Bioperl-l] SimpleAlign is_flush
In-Reply-To: <OF9A252048.A821FA6A-ON852573A8.0051A1FE-852573A8.0052140C@gsk.com>
References: <OF9A252048.A821FA6A-ON852573A8.0051A1FE-852573A8.0052140C@gsk.com>
Message-ID: <EC064917-220F-4579-8FA9-934026D7D105@uiuc.edu>

That's true.  I assumed Bernd's seqs were aligned.

chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From stefan.kirov at bms.com  Wed Dec  5 14:56:47 2007
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Wed, 05 Dec 2007 14:56:47 -0500
Subject: [Bioperl-l] PAML/Codeml parsing
In-Reply-To: <4756B494.7020100@bms.com>
References: <FD74BB3A-C8F7-453E-915E-FD5541CE59CB@bioperl.org>
	<4755A9A1.2040608@bms.com>
	<A9B43240-2601-4C3E-9870-F32A6918A657@bioperl.org>
	<4756B494.7020100@bms.com>
Message-ID: <4757027F.407@bms.com>

Here is a patch that seems to be working and does not break the existing
tests:

--- /home/kirovs/bioperl-live/Bio/Tools/Phylo/PAML.pm   2007-12-05 10:16:53.120720000 -0500
+++ /home/kirovs/bioperl/bioperl-live/Bio/Tools/Phylo/PAML.pm   2007-12-05 14:46:31.436278000 -0500
@@ -419,7 +419,10 @@
     # CODONML (in paml 3.12 February 2002)  <<-- what we want to see!
 
     my $SEQTYPES = qr( (?: (?: CODON | AA | BASE | CODON2AA ) ML ) | YN00 )x;
+    my $line;
+    $self->{'_already_parsed_seqs'}=$self->{'_already_parsed_seqs'}?1:0;
     while ($_ = $self->_readline) {
+           $line++;
        if ( m/^($SEQTYPES) \s+                      # seqtype: CODONML, AAML, BASEML, CODON2AAML, YN00, etc
               (?: \(in \s+ ([^\)]+?) \s* \) \s* )?  # version: "paml 3.12 February 2002"; not present < 3.1 or YN00
               (\S+) \s*                             # tree filename
@@ -436,8 +439,11 @@
        } elsif (m/^Data set \d$/) {
            $self->{'_summary'} = {};
            $self->{'_summary'}->{'multidata'}++;
-       } elsif( m/^Before\s+deleting\s+alignment\s+gaps/ ) {
-           my ($phylip_header) = $self->_readline;
+       }
+       elsif( m/^Before\s+deleting\s+alignment\s+gaps/ ) {#Gap
+               my ($phylip_header) = $self->_readline;
+               $self->_parse_seqs;
+       } elsif (($line>3)&&($self->{'_already_parsed_seqs'}!=1)) {#No gap
            $self->_parse_seqs;
        }
     }
@@ -681,7 +687,6 @@
 }
 
 sub _parse_seqs {
-
     # this should in fact be packed into a Bio::SimpleAlign object instead of
     # an array but we'll stay with this for now
     my ($self) = @_;


What this does is trigger sequence parsing if the /Before.../ pattern has
not been seen by line 4. Since $phylip_header does not seem to be used for
anything, one could eliminate the first sequence-parsing elsif entirely (even
though counting lines is not a good thing).
 Since I am not aware of all consequences of changing the sequence
parsing and I have no idea how extensive the tests are, I am not
committing anything, but feel free to use that if you wish.
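
The control flow can be tried out in isolation with the toy function below.
It is only a stand-in for the patched loop, not the PAML.pm code: the name
seq_parse_trigger is hypothetical, it works on a plain list of lines, and it
returns which branch would have started sequence parsing.

```perl
use strict;
use warnings;

# Toy stand-in for the patched loop. Returns 'gap' if the "Before deleting
# alignment gaps" marker triggers parsing, 'nogap' if the line counter
# does, and undef if neither happens.
sub seq_parse_trigger {
    my (@lines) = @_;
    my $line_no        = 0;
    my $already_parsed = 0;
    while ( defined( my $l = shift @lines ) ) {
        $line_no++;
        if ( $l =~ /^Before\s+deleting\s+alignment\s+gaps/ ) {
            shift @lines;    # skip the PHYLIP-style header line
            return 'gap';
        }
        elsif ( $line_no > 3 && !$already_parsed ) {
            return 'nogap';
        }
    }
    return;
}

my @gapless = ( '', 'seed used = 492211105', '      3    141', '',
                'ENSRNOE00000058637   GCG AGC AAG' );
my @gapped  = ( '', 'seed used = 492252697', '',
                'Before deleting alignment gaps', '      2    162' );

print seq_parse_trigger(@gapless), "\n";    # nogap
print seq_parse_trigger(@gapped),  "\n";    # gap
```

In this toy version, a marker that first appeared after line 4 would lose
out to the line counter, which is the fragility of counting lines.
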
Stefan


From jason at bioperl.org  Wed Dec  5 15:01:29 2007
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 5 Dec 2007 12:01:29 -0800
Subject: [Bioperl-l] PAML/Codeml parsing
In-Reply-To: <4757027F.407@bms.com>
References: <FD74BB3A-C8F7-453E-915E-FD5541CE59CB@bioperl.org>
	<4755A9A1.2040608@bms.com>
	<A9B43240-2601-4C3E-9870-F32A6918A657@bioperl.org>
	<4756B494.7020100@bms.com> <4757027F.407@bms.com>
Message-ID: <8562ED51-7DEC-4EB2-AC3F-A14C6497E0A2@bioperl.org>

sounds good - can you
- file it as a bug with the patch and sample files in Bugzilla
- commit changes and I'll test as well

thanks,
-j

On Dec 5, 2007, at 11:56 AM, Stefan Kirov wrote:

> Here is a patch that seems to be working and does not break the  
> existing
> tests:
>
> --- /home/kirovs/bioperl-live/Bio/Tools/Phylo/PAML.pm   2007-12-05
> 10:16:53.120720000 -0500
> +++ /home/kirovs/bioperl/bioperl-live/Bio/Tools/Phylo/PAML.pm
> 2007-12-05 14:46:31.436278000 -0500
> @@ -419,7 +419,10 @@
>      # CODONML (in paml 3.12 February 2002)  <<-- what we want to see!
>
>      my $SEQTYPES = qr( (?: (?: CODON | AA | BASE | CODON2AA ) ML ) |
> YN00 )x;
> +    my $line;
> +    $self->{'_already_parsed_seqs'}=$self-> 
> {'_already_parsed_seqs'}?1:0;
>      while ($_ = $self->_readline) {
> +           $line++;
>         if ( m/^($SEQTYPES) \s+                      # seqtype:  
> CODONML,
> AAML, BASEML, CODON2AAML, YN00, etc
>                (?: \(in \s+ ([^\)]+?) \s* \) \s* )?  # version: "paml
> 3.12 February 2002"; not present < 3.1 or YN00
>                (\S+) \s*                             # tree filename
> @@ -436,8 +439,11 @@
>         } elsif (m/^Data set \d$/) {
>             $self->{'_summary'} = {};
>             $self->{'_summary'}->{'multidata'}++;
> -       } elsif( m/^Before\s+deleting\s+alignment\s+gaps/ ) {
> -           my ($phylip_header) = $self->_readline;
> +       }
> +       elsif( m/^Before\s+deleting\s+alignment\s+gaps/ ) {#Gap
> +               my ($phylip_header) = $self->_readline;
> +               $self->_parse_seqs;
> +       } elsif (($line>3)&&($self->{'_already_parsed_seqs'}!=1))  
> {#No gap
>             $self->_parse_seqs;
>         }
>      }
> @@ -681,7 +687,6 @@
>  }
>
>  sub _parse_seqs {
> -
>      # this should in fact be packed into a Bio::SimpleAlign object
> instead of
>      # an array but we'll stay with this for now
>      my ($self) = @_;
>
>
> What this does is trigger sequence parsing if the /Before.../  
> pattern is
> not seen until line 4. Since phylip_header seems to be doing  
> nothing one
> could completely eliminate the first seq parse elsif (even though
> counting lines is not a good thing).
>  Since I am not aware of all consequences of changing the sequence
> parsing and I have no idea how extensive the tests are, I am not
> committing anything, but feel free to use that if you wish.
> Stefan
>
> Stefan Kirov wrote:
>> Jason,
>> When there is a gapless alignment we have a differently formatted  
>> output
>> from codeml:
>> kirovs at horta:~/AESIG> head -n 10 feJRfxQl8D/mlc
>>
>> seed used = 492211105
>>       3    141
>>
>> ENSRNOE00000058637               GCG AGC AAG TGT GAC AGC CAT GGC  
>> ACC CAC
>> CTA GCA GGT GTG GTC AGC GGC CGG GAT GCT GGT GTG GCC AAG GGC ACC  
>> AGT CTG
>> CAC AGT CTG CGT GTG CTC AAC TGT CAA GGG AAG GGC ACA GTC AGC GGC  
>> ACC CTC ATA
>> ENSMUSE00000366347               GCG AGC AAG TGT GAC AGC CAC GGC  
>> ACC CAC
>> CTG GCA GGT GTG GTC AGC GGC CGG GAT GCT GGT GTG GCC AAG GGC ACC  
>> AGC CTG
>> CAC AGC CTG CGT GTG CTC AAC TGT CAA GGG AAG GGC ACA GTC AGC GGC  
>> ACC CTC ATA
>> ENSE00001279150                  GCC AGC AAG TGT GAC AGT CAT GGC  
>> ACC CAC
>> CTG GCA GGG GTG GTC AGC GGC CGG GAT GCC GGC GTG GCC AAG GGT GCC  
>> AGC ATG
>> CGC AGC CTG CGC GTG CTC AAC TGC CAA GGG AAG GGC ACG GTT AGC GGC  
>> ACC CTC ATA
>>
>> And parsing this fails...
>> The next one has gaps and works fine:
>>
>> kirovs at horta:~/AESIG> head -n 10 4z6ZX7s1B6/mlc
>>
>> seed used = 492252697
>>
>> Before deleting alignment gaps
>>       2    162
>>
>> ENSMUSE00000460297               AAT ATC GAT ACA TTT TAC AAG GAG  
>> GCA GAA
>> AAG AAG CTT ATA CAC GTG CTT GAG GGA GAC AGT CCC AAG TGG TCC ACA  
>> CCG AAC
>> AAA GAC CCC ACC CGA GAG CCC CAT GCA GCC TCC ACT TGC TGT GCT TCA  
>> GAT CTC
>> CTT GGT TCA GGA GGT CAG TTC CTG
>> ENSE00000939192                  AAT ATT GAC ATA CTT TGC AAT GAA  
>> GCA GAA
>> AAC AAG CTT ATG CAT ATA CTG CAT GCA AAT GAT CCC AAG TGG TCC ACC  
>> CCA ACT
>> AAA GAC TGT ACT TCA GGG CCG TAC ACT GCT CAA ATC --- --- --- ---  
>> --- ATT
>> CCT GGT ACA GGA AAC AAG CTT CTG
>>
>> I will send both whole files as an attachment with another mail (I do
>> not know if these are going to pass through).
>> My guess is that the whole _parse_summary method has to be re- 
>> worked as
>> there is no tag to look for before the sequences start. Ugly.
>> I am not sure what else could become broken if I try to fix it, so I
>> will leave it to you.
>> Stefan
>>
>>> should be fixed.
>>>
>>> $ cvs log -r HEAD Bio/Tools/Phylo/PAML.pm
>>> revision 1.56
>>> date: 2007/11/01 14:52:56;  author: jason;  state: Exp;  lines:  
>>> +21 -14
>>> Parsing PAML4 and PAML3.15 should work now.  Dealing with variable
>>> order for the sequences and summary results in
>>> the top of the MLC files
>>>
>>> On Dec 4, 2007, at 11:25 AM, Stefan Kirov wrote:
>>>
>>>
>>>> Jason Stajich wrote:
>>>>
>>>>> PAML4 breaks our PAML parser right now because the order of  
>>>>> things in
>>>>> the result file has changed.  Now sequences precede the  
>>>>> information
>>>>> about the version or the program run.  This means that $result-
>>>>>
>>>>>> get_seqs() fails because we don't parse the sequences.
>>>>>>
>>>>> We'll see what we can do, but as usual with supporting 3rd party
>>>>> programs it is brittle when file formats change.  Th
>>>>>
>>>>> -jason
>>>>>
>>>>> -- 
>>>>> Jason Stajich
>>>>> jason at bioperl.org
>>>>>
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>
>>>>>
>>>>>
>>>> Jason,
>>>> I saw a commit after this post on codeml, but not on PAML.pm- I  
>>>> assume
>>>> this is not fixed, am I correct?
>>>> Thanks!
>>>> Stefan
>>>>
>>>
>>
>


From stefan.kirov at bms.com  Wed Dec  5 15:33:47 2007
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Wed, 05 Dec 2007 15:33:47 -0500
Subject: [Bioperl-l] PAML/Codeml parsing
In-Reply-To: <8562ED51-7DEC-4EB2-AC3F-A14C6497E0A2@bioperl.org>
References: <FD74BB3A-C8F7-453E-915E-FD5541CE59CB@bioperl.org>
	<4755A9A1.2040608@bms.com>
	<A9B43240-2601-4C3E-9870-F32A6918A657@bioperl.org>
	<4756B494.7020100@bms.com> <4757027F.407@bms.com>
	<8562ED51-7DEC-4EB2-AC3F-A14C6497E0A2@bioperl.org>
Message-ID: <47570B2B.5090602@bms.com>

Done.

Jason Stajich wrote:
> sounds good - can you
> - make it as a bug with the patch and sample files in bugzilla
> - commit changes and I'll test as well
>
> thanks,
> -j
>
> On Dec 5, 2007, at 11:56 AM, Stefan Kirov wrote:
>
>   
>> Here is a patch that seems to be working and does not break the  
>> existing
>> tests:
>>
>> --- /home/kirovs/bioperl-live/Bio/Tools/Phylo/PAML.pm   2007-12-05 10:16:53.120720000 -0500
>> +++ /home/kirovs/bioperl/bioperl-live/Bio/Tools/Phylo/PAML.pm   2007-12-05 14:46:31.436278000 -0500
>> @@ -419,7 +419,10 @@
>>      # CODONML (in paml 3.12 February 2002)  <<-- what we want to see!
>>
>>      my $SEQTYPES = qr( (?: (?: CODON | AA | BASE | CODON2AA ) ML ) | YN00 )x;
>> +    my $line;
>> +    $self->{'_already_parsed_seqs'}=$self->{'_already_parsed_seqs'}?1:0;
>>      while ($_ = $self->_readline) {
>> +           $line++;
>>         if ( m/^($SEQTYPES) \s+                      # seqtype: CODONML, AAML, BASEML, CODON2AAML, YN00, etc
>>                (?: \(in \s+ ([^\)]+?) \s* \) \s* )?  # version: "paml 3.12 February 2002"; not present < 3.1 or YN00
>>                (\S+) \s*                             # tree filename
>> @@ -436,8 +439,11 @@
>>         } elsif (m/^Data set \d$/) {
>>             $self->{'_summary'} = {};
>>             $self->{'_summary'}->{'multidata'}++;
>> -       } elsif( m/^Before\s+deleting\s+alignment\s+gaps/ ) {
>> -           my ($phylip_header) = $self->_readline;
>> +       }
>> +       elsif( m/^Before\s+deleting\s+alignment\s+gaps/ ) {#Gap
>> +               my ($phylip_header) = $self->_readline;
>> +               $self->_parse_seqs;
>> +       } elsif (($line>3)&&($self->{'_already_parsed_seqs'}!=1)) {#No gap
>>             $self->_parse_seqs;
>>         }
>>      }
>> @@ -681,7 +687,6 @@
>>  }
>>
>>  sub _parse_seqs {
>> -
>>      # this should in fact be packed into a Bio::SimpleAlign object instead of
>>      # an array but we'll stay with this for now
>>      my ($self) = @_;
>>
>>
>> What this does is trigger sequence parsing if the /Before.../ pattern
>> has not been seen by line 4. Since $phylip_header does not seem to be
>> used for anything, one could eliminate the first sequence-parsing elsif
>> completely (even though counting lines is not a good thing).
>> Since I am not aware of all the consequences of changing the sequence
>> parsing and I have no idea how extensive the tests are, I am not
>> committing anything, but feel free to use this if you wish.
>> Stefan
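
A minimal sketch of the rule described in the patch explanation, not the actual PAML.pm code: it models only the two sequence-related branches of the patched loop and ignores the seqtype and "Data set" branches. The function name seq_trigger and its return values are made up for illustration; the sample lines are the heads of the two mlc files quoted in this thread.

```perl
use strict;
use warnings;

# Hypothetical stand-alone model of the patched while loop.
# Returns the branch that would start sequence parsing and the
# 1-based line number at which it fires.
sub seq_trigger {
    my @lines = @_;
    my $line  = 0;
    for my $text (@lines) {
        $line++;
        # Gapped alignment: codeml prints an explicit marker line.
        return ('gapped', $line)
            if $text =~ /^Before\s+deleting\s+alignment\s+gaps/;
        # Gapless alignment: no marker, so fall back to the line count.
        return ('gapless', $line) if $line > 3;
    }
    return ('none', $line);
}

my @gapless = ('', 'seed used = 492211105', '      3    141', '');
my @gapped  = ('', 'seed used = 492252697', '',
               'Before deleting alignment gaps', '      2    162');

printf "%s at line %d\n", seq_trigger(@gapless);   # gapless at line 4
printf "%s at line %d\n", seq_trigger(@gapped);    # gapped at line 4
```

Both samples fire at line 4, so the order of the two tests matters: the marker check has to come before the line-count fallback.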
>>
>> Stefan Kirov wrote:
>>     
>>> Jason,
>>> When there is a gapless alignment we have a differently formatted  
>>> output
>>> from codeml:
>>> kirovs at horta:~/AESIG> head -n 10 feJRfxQl8D/mlc
>>>
>>> seed used = 492211105
>>>       3    141
>>>
>>> ENSRNOE00000058637               GCG AGC AAG TGT GAC AGC CAT GGC  
>>> ACC CAC
>>> CTA GCA GGT GTG GTC AGC GGC CGG GAT GCT GGT GTG GCC AAG GGC ACC  
>>> AGT CTG
>>> CAC AGT CTG CGT GTG CTC AAC TGT CAA GGG AAG GGC ACA GTC AGC GGC  
>>> ACC CTC ATA
>>> ENSMUSE00000366347               GCG AGC AAG TGT GAC AGC CAC GGC  
>>> ACC CAC
>>> CTG GCA GGT GTG GTC AGC GGC CGG GAT GCT GGT GTG GCC AAG GGC ACC  
>>> AGC CTG
>>> CAC AGC CTG CGT GTG CTC AAC TGT CAA GGG AAG GGC ACA GTC AGC GGC  
>>> ACC CTC ATA
>>> ENSE00001279150                  GCC AGC AAG TGT GAC AGT CAT GGC  
>>> ACC CAC
>>> CTG GCA GGG GTG GTC AGC GGC CGG GAT GCC GGC GTG GCC AAG GGT GCC  
>>> AGC ATG
>>> CGC AGC CTG CGC GTG CTC AAC TGC CAA GGG AAG GGC ACG GTT AGC GGC  
>>> ACC CTC ATA
>>>
>>> And parsing this fails...
>>> The next one has gaps and works fine:
>>>
>>> kirovs at horta:~/AESIG> head -n 10 4z6ZX7s1B6/mlc
>>>
>>> seed used = 492252697
>>>
>>> Before deleting alignment gaps
>>>       2    162
>>>
>>> ENSMUSE00000460297               AAT ATC GAT ACA TTT TAC AAG GAG  
>>> GCA GAA
>>> AAG AAG CTT ATA CAC GTG CTT GAG GGA GAC AGT CCC AAG TGG TCC ACA  
>>> CCG AAC
>>> AAA GAC CCC ACC CGA GAG CCC CAT GCA GCC TCC ACT TGC TGT GCT TCA  
>>> GAT CTC
>>> CTT GGT TCA GGA GGT CAG TTC CTG
>>> ENSE00000939192                  AAT ATT GAC ATA CTT TGC AAT GAA  
>>> GCA GAA
>>> AAC AAG CTT ATG CAT ATA CTG CAT GCA AAT GAT CCC AAG TGG TCC ACC  
>>> CCA ACT
>>> AAA GAC TGT ACT TCA GGG CCG TAC ACT GCT CAA ATC --- --- --- ---  
>>> --- ATT
>>> CCT GGT ACA GGA AAC AAG CTT CTG
>>>
>>> I will send both whole files as an attachment with another mail (I do
>>> not know if these are going to pass through).
>>> My guess is that the whole _parse_summary method has to be re- 
>>> worked as
>>> there is no tag to look for before the sequences start. Ugly.
>>> I am not sure what else could become broken if I try to fix it, so I
>>> will leave it to you.
>>> Stefan
>>>
>>>       
>>>> should be fixed.
>>>>
>>>> $ cvs log -r HEAD Bio/Tools/Phylo/PAML.pm
>>>> revision 1.56
>>>> date: 2007/11/01 14:52:56;  author: jason;  state: Exp;  lines:  
>>>> +21 -14
>>>> Parsing PAML4 and PAML3.15 should work now.  Dealing with variable
>>>> order for the sequences and summary results in
>>>> the top of the MLC files
>>>>
>>>> On Dec 4, 2007, at 11:25 AM, Stefan Kirov wrote:
>>>>
>>>>
>>>>         
>>>>> Jason Stajich wrote:
>>>>>
>>>>>           
>>>>>> PAML4 breaks our PAML parser right now because the order of  
>>>>>> things in
>>>>>> the result file has changed.  Now sequences precede the  
>>>>>> information
>>>>>> about the version or the program run.  This means that $result-
>>>>>>
>>>>>>             
>>>>>>> get_seqs() fails because we don't parse the sequences.
>>>>>>>
>>>>>>>               
>>>>>> We'll see what we can do, but as usual with supporting 3rd party
>>>>>> programs it is brittle when file formats change.  Th
>>>>>>
>>>>>> -jason
>>>>>>
>>>>>> -- 
>>>>>> Jason Stajich
>>>>>> jason at bioperl.org
>>>>>>
>>>>>> _______________________________________________
>>>>>> Bioperl-l mailing list
>>>>>> Bioperl-l at lists.open-bio.org
>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>
>>>>>>
>>>>>>
>>>>>>             
>>>>> Jason,
>>>>> I saw a commit after this post on codeml, but not on PAML.pm- I  
>>>>> assume
>>>>> this is not fixed, am I correct?
>>>>> Thanks!
>>>>> Stefan
>>>>>
>>>>>           


From bernd.web at gmail.com  Thu Dec  6 09:58:31 2007
From: bernd.web at gmail.com (Bernd Web)
Date: Thu, 6 Dec 2007 15:58:31 +0100
Subject: [Bioperl-l] graphics - Panel
Message-ID: <716af09c0712060658t5504b377ob2d46adb85754284@mail.gmail.com>

Hi,

For map, $segstart is available. It holds the leftmost start of the
feature (the left end of $ref displayed in the detailed view).
However, is it also accessible in track coderefs?
I'd like to access it in add_track, like this:
  -bgcolor => sub {
      my $feature = shift;
      my $start   = $feature->segstart;
      ....
      # do something with the segstart
  },

I realize I can add a -tag that holds the leftmost start of my
segmented feature and then get it out of $feature, but I wonder
whether $segstart can also be accessed in the coderef somehow.

Does anyone know?

Best regards,
Bernd

From georose at gmail.com  Thu Dec  6 10:28:24 2007
From: georose at gmail.com (geo rose)
Date: Thu, 6 Dec 2007 08:28:24 -0700
Subject: [Bioperl-l] getting sequences from external databank
Message-ID: <54da06110712060728m2532c177s8da4fa22e2aee1e6@mail.gmail.com>

Hi Bioperl,

In the past, I have been able to retrieve sequences from an external
databank, but my scripts are not working anymore.
I am afraid that I may have broken my Bioperl installation while updating my
Fedora7 machine with yum update.

Below is an example of what happens.

The script is from
http://www.faculty.uaf.edu/ffnt/teaching/programming/bioperl/node2.html and
it used to work (I ran it on an older machine with Bioperl and MacOS Tiger).

__________________________________________________________________________________
#!/usr/bin/perl -w

use Bio::SeqIO;
use Bio::DB::GenBank;

$genBank = new Bio::DB::GenBank;  # This object knows how to talk to GenBank

my $seq = $genBank->get_Seq_by_acc('AF060485');  # get a record by accession


my $seqOut = new Bio::SeqIO(-format => 'genbank');

$seqOut->write_seq($seq);


_________________________________________________________________________________________
This is the error I get
_________________________________________________________________________________________

[home at home Desktop]# perl final-seq-db-test1.pl
Bio::SeqIO: genbank cannot be found
Exception
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Failed to load module Bio::SeqIO::genbank. Weak references are not
implemented in the version of perl at
/usr/lib/perl5/site_perl/5.8.8/Bio/Species.pm line 91
BEGIN failed--compilation aborted at
/usr/lib/perl5/site_perl/5.8.8/Bio/Species.pm line 91.
Compilation failed in require at
/usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO/genbank.pm line 172.
BEGIN failed--compilation aborted at
/usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO/genbank.pm line 172.
Compilation failed in require at
/usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm line 425.

STACK: Error::throw
STACK: Bio::Root::Root::throw
/usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:359
STACK: Bio::Root::Root::_load_module
/usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:427
STACK: Bio::SeqIO::_load_format_module
/usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO.pm:555
STACK: Bio::SeqIO::new /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO.pm:376
STACK: Bio::DB::WebDBSeqI::get_seq_stream
/usr/lib/perl5/site_perl/5.8.8/Bio/DB/WebDBSeqI.pm:458
STACK: Bio::DB::NCBIHelper::get_Stream_by_acc
/usr/lib/perl5/site_perl/5.8.8/Bio/DB/NCBIHelper.pm:361
STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc
/usr/lib/perl5/site_perl/5.8.8/Bio/DB/WebDBSeqI.pm:172
STACK: final-seq-db-test1.pl:8
-----------------------------------------------------------

For more information about the SeqIO system please see the SeqIO docs.
This includes ways of checking for formats at compile time, not run time

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: acc AF060485 does not exist
STACK: Error::throw
STACK: Bio::Root::Root::throw
/usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:359
STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc
/usr/lib/perl5/site_perl/5.8.8/Bio/DB/WebDBSeqI.pm:173
STACK: final-seq-db-test1.pl:8
-----------------------------------------------------------
[home at home Desktop]# Use of uninitialized value in concatenation (.) or
string at /usr/lib64/perl5/5.8.8/x86_64-linux-thread-multi/Scalar/Util.pm
line 30.

[home at home Desktop]#


________________________________________________________________________________________


Before I mess things up further I thought I'd ask:
Can I fix this problem by reinstalling some part of Bioperl or Perl?

Thanks,

George

From barry.moore at genetics.utah.edu  Thu Dec  6 12:56:50 2007
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Thu, 6 Dec 2007 10:56:50 -0700
Subject: [Bioperl-l] getting sequences from external databank
In-Reply-To: <54da06110712060728m2532c177s8da4fa22e2aee1e6@mail.gmail.com>
References: <54da06110712060728m2532c177s8da4fa22e2aee1e6@mail.gmail.com>
Message-ID: <B13872F3-4591-4FB6-B057-9215C5DA9059@genetics.utah.edu>

George,

This is a hideous little bug in Red Hat/Fedora installations of
perl.  It's happened to me a couple of times on upgrades, but it's always
fixed with

perl -MCPAN -e shell
force install Scalar::Util

http://www.perlmonks.org/?node_id=460411

Barry
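
Before reinstalling, it may be worth confirming that Scalar::Util really is the culprit. The following is only a sketch using core Perl; the helper name weaken_works is made up. As far as I know, asking Scalar::Util for weaken is what raises the "Weak references are not implemented" error when only the pure-Perl fallback is installed, so the eval below should catch exactly that case.

```perl
use strict;
use warnings;
use Scalar::Util ();

# Hypothetical helper: returns 1 if weak references are usable, 0 otherwise.
sub weaken_works {
    my $ok = eval {
        # Dies here if Scalar::Util was installed without its XS part.
        Scalar::Util->import('weaken');
        my $target = { name => 'test' };
        my $copy   = $target;
        Scalar::Util::weaken($copy);
        Scalar::Util::isweak($copy);
    };
    return $ok ? 1 : 0;
}

print weaken_works()
    ? "weak references OK\n"
    : "weak references broken: reinstall Scalar::Util\n";
```

If this reports a broken installation, the forced reinstall of Scalar::Util suggested in this message is the thing to try.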



From torsten.seemann at infotech.monash.edu.au  Thu Dec  6 18:58:02 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Fri, 7 Dec 2007 10:58:02 +1100
Subject: [Bioperl-l] [StandAloneBLAST] Use more than one CPU + avoid
	BLAST reload
In-Reply-To: <47545590.1000703@boekhoff.info>
References: <47545590.1000703@boekhoff.info>
Message-ID: <a79f6a4b0712061558m663fd1ces6bba9ae9d5602d67@mail.gmail.com>

Sven,

> I just started working with Perl and BioPerl. I'm quite impressed what
> can be easily done with this module. Today I found that my second CPU
> ist not used, but the first one run's at 100%. I tried to include the
> "-a"-parameter, but I was not successful:

My experience agrees with you, in that "-a" does not seem to work with
the pre-compiled BLAST binaries you get from NCBI on a multi-core
system.

I'm not sure why, as "ldd blastall" shows it links against
"/lib64/tls/libpthread.so.0".

Any others have any ideas?

-- 
--Torsten Seemann
--Victorian Bioinformatics Consortium, Monash University
--Tel +61 3 9905 9010

From lzhtom at hotmail.com  Thu Dec  6 23:25:42 2007
From: lzhtom at hotmail.com (zhihuali)
Date: Fri, 7 Dec 2007 04:25:42 +0000
Subject: [Bioperl-l] How to retrieve a persistent object by bioperl-db?
Message-ID: <BAY110-W786D73A90FA1B632776A9C7680@phx.gbl>


Hi netters,
 
I've installed BioSQL and bioperl-db, and successfully created and stored a persistent object:
 
use strict;
use warnings;
use Bio::Seq;
use Bio::DB::BioDB;

my $dbadp = Bio::DB::BioDB->new(-database => 'biosql',
                                -user     => 'annoymous',
                                -dbname   => 'bioseqdb');

my $seqobj = Bio::Seq->new(-accession_number => "test",
                           -id               => "test1",
                           -seq              => "AGCTAGCT",
                           -version          => 1);
my $dbobj = $dbadp->create_persistent($seqobj);
$dbobj->create;
$dbobj->commit;
 
It's successful because I found corresponding rows in the bioseqdb tables.
 
Now I want to retrieve the object from the database. There isn't much documentation available, and I've tried find_by_unique_key/primary_key, but every attempt failed. Maybe I didn't use them correctly. Could anyone give me an example of how to retrieve the stored Bio::Seq object?
 
Thanks a lot!
 
Zhihua Li

From Marc.Logghe at ablynx.com  Fri Dec  7 03:33:17 2007
From: Marc.Logghe at ablynx.com (Marc Logghe)
Date: Fri, 7 Dec 2007 09:33:17 +0100
Subject: [Bioperl-l] How to retrieve a persistent object by bioperl-db?
In-Reply-To: <BAY110-W786D73A90FA1B632776A9C7680@phx.gbl>
Message-ID: <03C512635899144083CADB0EE222018901216FA5@alpaca.lan.ablynx.com>

Hi,
Hilmar's BOSC presentation is a very good place to start.
Have a look at http://www.open-bio.org/bosc2003/slides/Persistent_Bioperl_BOSC03.pdf
(slide 18, for instance).
Regards,
Marc
 



From avilella at gmail.com  Fri Dec  7 05:32:43 2007
From: avilella at gmail.com (Albert Vilella)
Date: Fri, 7 Dec 2007 10:32:43 +0000
Subject: [Bioperl-l] Query about Hyphy wrapper module "SLAC.pm"
In-Reply-To: <OFBA70B0CA.66F02D44-ONC12573A8.003FB7B7-C12573A8.003FB7C0@sh.se>
References: <OFBA70B0CA.66F02D44-ONC12573A8.003FB7B7-C12573A8.003FB7C0@sh.se>
Message-ID: <358f4d650712070232s3d9ed27xf1c5f17e2985bd90@mail.gmail.com>

Hi Johan,

It would be great if you could upload an example reproducible case:

http://bugzilla.open-bio.org/enter_bug.cgi?product=Bioperl

Maybe simply doing a tar.gz of the directory with the sample files and
the script, and a simple
explanation on how to run it. If you have any special "env" vars
regarding tmp files, could you
specify those as well?

Thanks,

    Albert.

On Dec 5, 2007 11:35 AM, Johan Nilsson <johan.nilsson at sh.se> wrote:
>
> Hello,
>
> I have a bunch of multiple sequence alignments of protein coding genes,
> which I would like to analyse with the SLAC method of the HyPhy package. I
> tried using the SLAC.pm module in bioperl-run, but I could not get it to
> work properly.
>
> Basically, for each MSA file, I create the Bio::Tree::Tree and
> Bio::SimpleAlign objects ($tree and $aln, respectively) required as
> arguments to SLAC, and call the method with: "($rc,$result) =
> $slac->run($aln,$tree)" in a loop procedure in my script.
>
> When I choose not to save the tmp files (the default option in SLAC.pm),
> the program complains that it cannot find the file
> "$whatevertmpdir/wrapper.bf", and returns $rc=0 for all but the first MSA
> (which works fine). Apparently, it looks for the wrapper.bf file in the
> first tmp dir created, which is deleted in the end of the first SLAC call.
>
> If instead I choose to save the tempfiles ($slac->save_tempfiles('TRUE')),
> all calls to SLAC give returncode 1, and no error message is received.
> However, when I look at the resulting $result hashref, it turns out that
> all results are for the FIRST alignment read. I've made sure there is
> nothing strange with my loop procedure, and I checked that the tree and
> alignment objects look OK for each MSA. Apparently, it does create new
> "results.tsv" files in the tmp directory after each run, but it is
> identical each time it's created. Also, it only creates ONE tmp directory,
> no matter how many times SLAC is executed (I would imagine it was supposed
> to save each result in separate tmp dirs?)
>
> Thus, it seems to me like the errors occur because something goes wrong in
> the creation of temporary files. Have I done something wrong here, or have
> any other of you experienced the same problem?
>
> Best regards
> /Johan
>
>
> --
> Johan Nilsson, Ph.D.
> School of Life Sciences
> S?dert?rns University College
> S-141 89 Huddinge, Sweden
> E-mail: johan.nilsson at sh.se
> Phone: +46 8 608 47 05, +46 70 456 10 51
>


From J.Hane at murdoch.edu.au  Mon Dec 10 02:31:17 2007
From: J.Hane at murdoch.edu.au (James Hane)
Date: Mon, 10 Dec 2007 16:31:17 +0900
Subject: [Bioperl-l] Compiling bioperl with perl2exe for win32
In-Reply-To: <mailman.6533.1196225860.2694.bioperl-l@lists.open-bio.org>
References: <mailman.6533.1196225860.2694.bioperl-l@lists.open-bio.org>
Message-ID: <477A8450F426E34DBD5B2E7C6FA82D54B59489@PLUTO.ad.murdoch.edu.au>

I've been compiling some bioperl-based scripts for win32 using
perl2exe, which has worked out really well, except that I cannot
compile Align::IO, Bio::Location::Simple or Bio::Location::Atomic
even though I tell perl2exe to include them.  Does anyone have any
suggestions on how to get these to compile?


From Kevin.M.Brown at asu.edu  Mon Dec 10 10:34:35 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Mon, 10 Dec 2007 08:34:35 -0700
Subject: [Bioperl-l] Compiling bioperl with perl2exe for win32
In-Reply-To: <477A8450F426E34DBD5B2E7C6FA82D54B59489@PLUTO.ad.murdoch.edu.au>
References: <mailman.6533.1196225860.2694.bioperl-l@lists.open-bio.org>
	<477A8450F426E34DBD5B2E7C6FA82D54B59489@PLUTO.ad.murdoch.edu.au>
Message-ID: <1A4207F8295607498283FE9E93B775B4041D0B82@EX02.asurite.ad.asu.edu>

I use PAR to create exe's for windows users and it works fine with
bioperl. 



From Kevin.M.Brown at asu.edu  Mon Dec 10 13:23:01 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Mon, 10 Dec 2007 11:23:01 -0700
Subject: [Bioperl-l] [StandAloneBLAST] Use more than one CPU +
	avoidBLAST reload
In-Reply-To: <a79f6a4b0712061558m663fd1ces6bba9ae9d5602d67@mail.gmail.com>
References: <47545590.1000703@boekhoff.info>
	<a79f6a4b0712061558m663fd1ces6bba9ae9d5602d67@mail.gmail.com>
Message-ID: <1A4207F8295607498283FE9E93B775B4041D0CAD@EX02.asurite.ad.asu.edu>

I use the -a option with blast all the time and it works, even on
multicore systems. 



From nadav.denekamp at gmail.com  Wed Dec 12 08:29:18 2007
From: nadav.denekamp at gmail.com (Nadav Y. Denekamp)
Date: Wed, 12 Dec 2007 15:29:18 +0200
Subject: [Bioperl-l] Fetch sequences from a fasta file using a list of
	idenifiers
Message-ID: <001101c83cc3$00aa28e0$5b00000a@ESTHERLAB2>

Hello,

I am trying to retrieve a list of sequences from an indexed flat FASTA file. I tried the script bp_fetch.pl, but I could only retrieve one sequence for one identifier. I am looking for a way to give a script a list of accession numbers and retrieve the corresponding sequences. I don't have much experience with Perl, so I apologize if this question is very basic.
thanks - Nadav


------------------------------------------------------------------------------------------------------------
Nadav Y. Denekamp, Ph.D.,
Israel Oceanographic and Limnological Research,
National Institute for Oceanography 
Tel-Shikmona, Haifa, 31080.
Tel: 972-4-8565259
Fax: 972-4-8511911
mobile: 972-50-2167318
Skype: nadavden
Email: nadavd at ocean.org.il; nadav.denekamp at gmail.com;

Visit the "Sleeping Beauty" website:
http://www.gmm.gu.se/SB


From biojoiner at gmail.com  Wed Dec 12 08:06:42 2007
From: biojoiner at gmail.com (=?GB2312?B?s8y35Q==?=)
Date: Wed, 12 Dec 2007 21:06:42 +0800
Subject: [Bioperl-l] problem_About_Bioperl_Installation
Message-ID: <e1a861900712120506y2120c90bp648b56d876d1849f@mail.gmail.com>

Dear Admin:

    I have a computer with no network access, but I would like to
install bioperl on it.
    All the installation methods I found need a network connection to CPAN
to fetch the required packages. Is there a complete installation bundle
that I can install on a network-isolated computer, or some other way to
solve the problem?
    I look forward to your kind answer.
     Thanks very much!

-- 

============================================================
Feng Cheng

Division of HapMap Project
Beijing Institute of Genomics, Chinese Academy of Sciences (CAS)
Beijing Airport Industrial Zone B-6, Beijing, 101318, China
Tel: +86-10-80481102/1176
E-mail: chengf at genomics.org.cn
http://www.big.ac.cn/
============================================================


From avilella at gmail.com  Wed Dec 12 09:50:16 2007
From: avilella at gmail.com (Albert Vilella)
Date: Wed, 12 Dec 2007 14:50:16 +0000
Subject: [Bioperl-l] problem_About_Bioperl_Installation
In-Reply-To: <e1a861900712120506y2120c90bp648b56d876d1849f@mail.gmail.com>
References: <e1a861900712120506y2120c90bp648b56d876d1849f@mail.gmail.com>
Message-ID: <358f4d650712120650u2ef40089ofe27725ea8497dd7@mail.gmail.com>

You can also download the tar.gz packages from the bioperl.org
website, and copy them to the computer. Then unpack
the tar.gzs, and update your PERL5LIB env var.
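
Once the packages are unpacked, a quick way to check that Perl can see the modules is to try loading one. This is only a sketch: the directory /opt/bioperl-1.5.2 is a made-up unpack location and can_load is a hypothetical helper. The "use lib" line has the same effect as adding the directory to PERL5LIB, but only for this one script.

```perl
use strict;
use warnings;
use lib '/opt/bioperl-1.5.2';   # hypothetical: where the tar.gz was unpacked

# Hypothetical helper: return 1 if the named module can be loaded
# from @INC, 0 otherwise.
sub can_load {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # Bio::Seq -> Bio/Seq.pm
    return eval { require $file; 1 } ? 1 : 0;
}

for my $module (qw(Bio::Seq Bio::SeqIO)) {
    printf "%-12s %s\n", $module, can_load($module) ? 'found' : 'NOT found';
}
```

Keep in mind that some BioPerl modules depend on other CPAN modules, which would have to be copied over to the isolated machine the same way.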

On Dec 12, 2007 1:06 PM, ???? <biojoiner at gmail.com> wrote:
> Dear Admin:
>
>     I have a computer that has no network access, but I would like to
> install BioPerl on it.
>     All the installation methods I found need a network connection to CPAN
> to fetch the required packages. Is there a complete installation package
> that I can use on a network-isolated computer, or some other way to solve
> the problem?
>     I look forward to your answer.
>      Thanks very much!
>
> --
>
> ============================================================
> Feng Cheng
>
> Division of HapMap Project
> Beijing Institute of Genomics, Chinese Academy of Sciences (CAS)
> Beijing Airport Industrial Zone B-6, Beijing, 101318, China
> Tel: +86-10-80481102/1176
> E-mail: chengf at genomics.org.cn
> http://www.big.ac.cn/
> ============================================================
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From cjfields at uiuc.edu  Wed Dec 12 10:22:45 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 12 Dec 2007 09:22:45 -0600
Subject: [Bioperl-l] Fetch sequences from a fasta file using a list of
	idenifiers
In-Reply-To: <001101c83cc3$00aa28e0$5b00000a@ESTHERLAB2>
References: <001101c83cc3$00aa28e0$5b00000a@ESTHERLAB2>
Message-ID: <E95ADE14-FF71-4068-B958-60BD1EEEBF3C@uiuc.edu>

If you use Bio::Index::Fasta (which is what bp_index.pl uses for FASTA  
files) then you can write up your own script.  From 'perldoc  
Bio::Index::Fasta':

# Once the index is made it can be accessed, either in the
# same script or a different one
use Bio::Index::Fasta;
use Bio::SeqIO;
use strict;

my $Index_File_Name = shift;
my $inx = Bio::Index::Fasta->new(-filename => $Index_File_Name);
my $out = Bio::SeqIO->new(-format => 'Fasta',
                          -fh => \*STDOUT);

foreach my $id (@ARGV) {
     my $seq = $inx->fetch($id); # Returns Bio::Seq object
     $out->write_seq($seq);
}

# or, alternatively
my $id;
my $seq = $inx->get_Seq_by_id($id); # identical to fetch()


....
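
To cover the "list of accession numbers" part of the question, here is a
rough sketch that builds the index on first use and then fetches every ID
named in a text file (one ID per line). The sub name fetch_ids and all
file names are made up for illustration, and it assumes the IDs match the
first word of each FASTA header, which is the default key for
Bio::Index::Fasta:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;
use Bio::Index::Fasta;
use Bio::SeqIO;

# Write every sequence named in @$ids to $out_fh as FASTA and return the
# IDs that could not be found in the index.
sub fetch_ids {
    my ($index_file, $fasta_file, $ids, $out_fh) = @_;

    # Build the index the first time; later runs just reopen it.
    my $is_new = !-e $index_file;
    my $inx = Bio::Index::Fasta->new(-filename   => $index_file,
                                     -write_flag => $is_new ? 1 : 0);
    $inx->make_index(File::Spec->rel2abs($fasta_file)) if $is_new;

    my $out = Bio::SeqIO->new(-format => 'fasta', -fh => $out_fh);
    my @missing;
    foreach my $id (@$ids) {
        my $seq = eval { $inx->fetch($id) };   # tolerate unknown IDs
        if ($seq) { $out->write_seq($seq) } else { push @missing, $id }
    }
    return @missing;
}

# Usage (file names are placeholders):
#   perl fetch_list.pl seqs.idx seqs.fasta ids.txt > subset.fasta
if (@ARGV == 3) {
    my ($index_file, $fasta_file, $list_file) = @ARGV;
    open my $list, '<', $list_file or die "Cannot read $list_file: $!";
    my @ids = grep { length } map { s/^\s+|\s+$//g; $_ } <$list>;
    close $list;
    my @missing = fetch_ids($index_file, $fasta_file, \@ids, \*STDOUT);
    warn "Not found: @missing\n" if @missing;
}
```

Sequences come out in the order of the ID list, not in file order.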

chris

On Dec 12, 2007, at 7:29 AM, Nadav Y. Denekamp wrote:

> Hello,
>
> I am trying to retrieve a list of sequences from an indexed flat
> FASTA file. I tried to use the script bp_fetch.pl but I could only
> retrieve one sequence for one identifier. I am looking for a way to
> provide a list of accession numbers to a script and to retrieve the
> sequences. I don't have much experience with Perl, so I apologize if
> this question is very basic.
> thanks - Nadav
>
>
> ------------------------------------------------------------------------------------------------------------
> Nadav Y. Denekamp, Ph.D.,
> Israel Oceanographic and Limnological Research,
> National Institute for Oceanography
> Tel-Shikmona, Haifa, 31080.
> Tel: 972-4-8565259
> Fax: 972-4-8511911
> mobile: 972-50-2167318
> Skype: nadavden
> Email: nadavd at ocean.org.il; nadav.denekamp at gmail.com;
>
> Visit the "Sleeping Beauty" website:
> http://www.gmm.gu.se/SB
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From karchana at ibab.ac.in  Thu Dec 13 22:56:14 2007
From: karchana at ibab.ac.in (Information_details)
Date: Thu, 13 Dec 2007 19:56:14 -0800 (PST)
Subject: [Bioperl-l]  How to get the contents?
Message-ID: <14329679.post@talk.nabble.com>


Hi,

I am new to BioPerl.

I am using the Bio::SeqIO module.

I have a GenBank file: http://www.nabble.com/file/p14329679/seq.gb seq.gb

In this file I need to match the gene tag and get all of its contents.

Which function should I use?

The gene portion looks like this:

 gene            1..485
                     /gene="PRM1"
                     /note="Derived by automated computational analysis using
                     gene prediction method: BestRefseq. Supporting evidence
                     includes similarity to: 1 mRNA"
                     /db_xref="GeneID:5619"
                     /db_xref="HGNC:9447"

I need to match the gene tag and get its contents.

[CODE]
$seq = $seqobj->next_seq();

foreach $feat ($seq->get_all_SeqFeatures())
{
        if ($feat->primary_tag eq "mRNA")
        {
                foreach $tag ($feat->get_all_tags())
                {
                        if ($tag eq "gene")
                        {
                            # here I have to retrieve information like this:
                            #   1..485
                            #   /gene="PRM1"
                        }
                }
        }
}
[/CODE]
How do I do that?

with regards
Archana


-- 
View this message in context: http://www.nabble.com/How-to-get-the-contents--tp14329679p14329679.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From mike.thon at gmail.com  Fri Dec 14 12:41:44 2007
From: mike.thon at gmail.com (Michael Thon)
Date: Fri, 14 Dec 2007 18:41:44 +0100
Subject: [Bioperl-l] How to get the contents?
In-Reply-To: <14329679.post@talk.nabble.com>
References: <14329679.post@talk.nabble.com>
Message-ID: <9F93893E-182A-4A5F-B27C-089521CAA355@gmail.com>

Hi Information_details, a.k.a. Archana :)

"1", and "485" can be retrieved with something like:

$feat->start();
$feat->end();

if you want start and end of each exon then you need:
my $location = $feat->location();

which returns a Bio::LocationI object.

I think the 'gene' tag is a tag-value pair that  can be retrieved with:

my @values = $feat->get_tag_values("gene");
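
Put together, this could look roughly like the sketch below. It assumes
you want the features whose primary tag is 'gene' (your loop tests for
'mRNA', so adjust as needed); the sub name gene_summaries is made up for
illustration, the rest are standard Bio::SeqIO / Bio::SeqFeatureI calls:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

# Collect "start..end" plus the /gene value(s) for every 'gene' feature
# of one Bio::Seq object.
sub gene_summaries {
    my ($seq) = @_;
    my @lines;
    foreach my $feat ($seq->get_SeqFeatures) {
        next unless $feat->primary_tag eq 'gene';
        # get_tag_values() throws if the tag is absent, so test first
        my @names = $feat->has_tag('gene') ? $feat->get_tag_values('gene') : ();
        push @lines, sprintf('%d..%d /gene="%s"',
                             $feat->start, $feat->end, join(',', @names));
    }
    return @lines;
}

# Usage: perl genes.pl seq.gb
if (@ARGV) {
    my $in = Bio::SeqIO->new(-file => $ARGV[0], -format => 'genbank');
    while (my $seq = $in->next_seq) {
        print "$_\n" for gene_summaries($seq);
    }
}
```

The other qualifiers (/note, /db_xref) can be read the same way by
passing their names to get_tag_values().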

-Mike


On Dec 14, 2007, at 4:56 AM, Information_details wrote:

>
> Hi,
>
> I am new to bioperl.
>
> I am using module  Bio::SeqIO;
>
> I have genbank file. http://www.nabble.com/file/p14329679/seq.gb  
> seq.gb
>
> In this file i have to match gene tag and get all its contents.
>
> which function i have to use?
>
> The gene portion look like this
>
> gene            1..485
>                     /gene="PRM1"
>                     /note="Derived by automated computational analysis
> using
>                     gene prediction method: BestRefseq. Supporting  
> evidence
>                     includes similarity to: 1 mRNA"
>                     /db_xref="GeneID:5619"
>                     /db_xref="HGNC:9447"
>
> i have to match gene tag and get its contents?
>
> [CODE]
> $seq=$seqobj->next_seq();
>
> foreach $feat ($seq->get_all_SeqFeatures())
> {
>        if($feat->primary_tag eq "mRNA")
>        {
>                foreach $tag ($feat->get_all_tags())
>                {
>                        if($tag eq "gene")
>                        {
>                            #here i have to retrieve the information  
> like
> this.
>                           1..485
>                         /gene="PRM1"
>                        }
>                 }
>         }
> [/CODE]
> How do i do that?
>
> with regards
> Archana
>
>
>
>
> -- 
> View this message in context: http://www.nabble.com/How-to-get-the-contents--tp14329679p14329679.html
> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>


From cjfields at uiuc.edu  Sat Dec 15 10:15:00 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 15 Dec 2007 09:15:00 -0600
Subject: [Bioperl-l] [ANNOUNCEMENT] CVS freeze
Message-ID: <9FE0873D-E009-42E6-B37A-32584655ED06@uiuc.edu>

All,

We are in the midst of switching over BioPerl from CVS to SVN.  We are  
tentatively freezing the bioperl CVS repository Dec. 19 in order to  
prepare for the switch.  At that time we plan on building and setting  
up the SVN repository, running some remedial tests (commit messages,  
etc), then announcing the switch on the list.  Soon after we will try  
getting a sync'ed read-only CVS set up for legacy purposes.

If anyone has any commits to add to the repository we suggest making  
them as soon as possible.

chris

From margots at mail.nih.gov  Tue Dec 18 10:00:11 2007
From: margots at mail.nih.gov (Margot Sunshine)
Date: Tue, 18 Dec 2007 15:00:11 +0000 (UTC)
Subject: [Bioperl-l] bio-perl cvs freeze
Message-ID: <loom.20071218T145502-552@post.gmane.org>

Hi,

I have been trying to checkout bio-perl from cvs since yesterday afternoon 
(Dec 17). My request just hangs. I can login but I cannot checkout anything. 
My reading of your posting of the planned switch from CVS to SVN seemed to 
indicate that this was not to take place until tomorrow. Help!

Thanks,
Margot Sunshine


From ste.ghi at libero.it  Tue Dec 18 13:04:21 2007
From: ste.ghi at libero.it (Stefano Ghignone)
Date: Tue, 18 Dec 2007 19:04:21 +0100
Subject: [Bioperl-l] dealing with large files
Message-ID: <JT9BJ9$83113BA715E1F7CF3B9D29BFFCC4B0CF@libero.it>

Dear all,
  I'm facing a really annoying problem with handling large files.
I wrote a script (below) that should read sequences from an EMBL-formatted
file and write them out in a customized FASTA format. The script works, but
the input file is rather big (5.6 GB unzipped, 987 MB zipped), so after a
while all the physical and virtual memory of my workstation (4 GB RAM) is
used up and the script is killed...
I really don't know how to avoid this huge memory usage... and now I'm
wondering whether this is the right approach...
Please help me!
Best wishes,
Stefano 


#################
#!/usr/bin/perl -w

use strict;

use warnings;

use Fcntl;
use Cwd;

use Bio::SeqIO;

my $infile = $ARGV[0];
my $outfile = "$ARGV[0].fasta";
my $organism;
my $count;
my $path = cwd()."/$outfile";

print "Working dir is: ".cwd().".\nCreating file: $path\n";

my $in  = Bio::SeqIO->new(-file => "/bin/gunzip -c $infile |", -format => 'EMBL');

while ( my $seq = $in->next_seq() ) {
	sysopen(TO, $path, O_WRONLY | O_APPEND | O_CREAT);  
	my $id = $seq->accession_number();	
	my $desc = $seq->desc(); chop $desc;
	my $species = $seq->species->binomial();
	my $subspecies = $seq->species->sub_species();
	if ($seq->species->sub_species()) {chop $subspecies; $organism = $species." ".$subspecies;}
		else {$organism = $species;}
	my $sequence = $seq->seq();
	print TO ">$id $desc [$organism]\n$sequence\n";
    	$count++;
	warn $@ if $@;
	close TO;
}

print "Done!\n\t$count sequences have been treated. The file $ARGV[0].fasta is ready.\n";


From jason at bioperl.org  Tue Dec 18 13:22:07 2007
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 18 Dec 2007 10:22:07 -0800
Subject: [Bioperl-l] bio-perl cvs freeze
In-Reply-To: <loom.20071218T145502-552@post.gmane.org>
References: <loom.20071218T145502-552@post.gmane.org>
Message-ID: <681FB463-13A5-4B35-923B-29A91F07D72B@bioperl.org>

Margot -
The code freeze won't affect the anonymous cvs, and we'll likely  
keep anonymous CVS as is (and maybe even figure out how to keep it  
updated with the SVN) since external tools depend on it and have  
published CVS instructions.

I was able to do an anonymous checkout fine on my machine just now --  
if the problem persists please send a message to support at open-bio.org  
and the support volunteers will track it from there.

-jason
On Dec 18, 2007, at 7:00 AM, Margot Sunshine wrote:

> Hi,
>
> I have been trying to checkout bio-perl from cvs since yesterday  
> afternoon
> (Dec 17). My request just hangs. I can login but I cannot checkout  
> anything.
> My reading of your posting of the planned switch from CVS to SVN  
> seemed to
> indicate that this was not to take place until tomorrow. Help!
>
> Thanks,
> Margot Sunshine
>


From jason at bioperl.org  Tue Dec 18 13:31:39 2007
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 18 Dec 2007 10:31:39 -0800
Subject: [Bioperl-l] dealing with large files
In-Reply-To: <JT9BJ9$83113BA715E1F7CF3B9D29BFFCC4B0CF@libero.it>
References: <JT9BJ9$83113BA715E1F7CF3B9D29BFFCC4B0CF@libero.it>
Message-ID: <9CCCD509-EAFA-4528-B045-90910E19B41F@bioperl.org>

Not exactly clear why you aren't using Bio::SeqIO to write the  
sequence back out in FASTA format and why you are re-opening the file  
each time?

Did you look at the examples that show how to convert file formats?
http://bioperl.org/wiki/HOWTO:SeqIO

You can set the description with
$seq->description($newdescription);
and the ID with
$seq->display_id($newid);
before writing.
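
Something along these lines is what I have in mind. This is only a sketch:
the sub name convert and the "[organism]" description layout are
illustrative choices modelled on your script, and I use desc(), the
accessor your own code already calls. Both streams are opened once,
outside the loop. I can't promise it changes the memory behaviour:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

# Copy every record from $in to $out, rewriting the ID and description first.
# $in needs a next_seq() method, $out a write_seq() method.
sub convert {
    my ($in, $out) = @_;
    my $count = 0;
    while (my $seq = $in->next_seq) {
        my $species  = $seq->species;
        my $organism = $species ? $species->binomial : 'unknown organism';
        my $desc     = $seq->desc;
        $desc = '' unless defined $desc;
        $seq->display_id($seq->accession_number);
        $seq->desc($desc eq '' ? "[$organism]" : "$desc [$organism]");
        $out->write_seq($seq);
        $count++;
    }
    return $count;
}

# Usage: perl embl2fasta.pl input.dat.gz
if (@ARGV) {
    my $infile = shift @ARGV;
    my $in  = Bio::SeqIO->new(-file => "gunzip -c $infile |", -format => 'EMBL');
    my $out = Bio::SeqIO->new(-file => ">$infile.fasta",      -format => 'fasta');
    my $n   = convert($in, $out);
    print "Done: $n sequences written to $infile.fasta\n";
}
```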

It isn't clear to me from your code why it would be leaking memory  
and causing a problem - is it possible that you have a huge sequence  
in the EMBL file?

-jason
On Dec 18, 2007, at 10:04 AM, Stefano Ghignone wrote:

> Dear all,
>   I'm facing with a really annoying problem regarding large files  
> handling.
> I wrote a script (below) which should keep sequences from an embl  
> formatted file and write out the sequences in a customized fasta  
> format. The script works, but since the input file is rather big  
> 5.6 GB unzipped (987 MB zipped), after a while all the physical and  
> virtual memories of my workstation (4GB RAM) are filled and the  
> script is killed...
> I really don't know how to avoid this huge memory usage...and now  
> I'm wondering if this is the right approach....
> Please help me!
> Best wishes,
> Stefano
>
>
>
> #################
> #!/usr/bin/perl -w
>
> use strict;
>
> use warnings;
>
> use Fcntl;
> use Cwd;
>
> use Bio::SeqIO;
>
> my $infile = $ARGV[0];
> my $outfile = "$ARGV[0].fasta";
> my $organism;
> my $count;
> my $path = cwd()."/$outfile";
>
> print "Working dir is: ".cwd().".\nCreating file: $path\n";
>
> my $in  = Bio::SeqIO->new(-file => "/bin/gunzip -c $infile |", - 
> format => 'EMBL');
>
> while ( my $seq = $in->next_seq() ) {
> 	sysopen(TO, $path, O_WRONLY | O_APPEND | O_CREAT);
> 	my $id = $seq->accession_number();	
> 	my $desc = $seq->desc(); chop $desc;
> 	my $species = $seq->species->binomial();
> 	my $subspecies = $seq->species->sub_species();
> 	if ($seq->species->sub_species()) {chop $subspecies; $organism =  
> $species." ".$subspecies;}
> 		else {$organism = $species;}
> 	my $sequence = $seq->seq();
> 	print TO ">$id $desc [$organism]\n$sequence\n";
>     	$count++;
> 	warn $@ if $@;
> 	close TO;
> }
>
> print "Done!\n\t$count sequences have been treated. The file $ARGV 
> [0].fasta is ready.\n";
>
>


From cain.cshl at gmail.com  Tue Dec 18 14:04:11 2007
From: cain.cshl at gmail.com (Scott Cain)
Date: Tue, 18 Dec 2007 14:04:11 -0500
Subject: [Bioperl-l] bio-perl cvs freeze
In-Reply-To: <681FB463-13A5-4B35-923B-29A91F07D72B@bioperl.org>
References: <loom.20071218T145502-552@post.gmane.org>
	<681FB463-13A5-4B35-923B-29A91F07D72B@bioperl.org>
Message-ID: <1198004651.11000.19.camel@frissell>

Hi Jason and all,

Does the fact that cvs is sticking around (read only) mean that viewcvs
(the web interface) will stick around too?  I was thinking about
modifying the GBrowse net installer to use the 'automatic' tarball of
bioperl-live to download and install via nmake on Windows since it
doesn't have cvs support built in.  Also, with cvs sticking around, I
don't need to rewrite the installer to use svn (yeah!).

Thanks,
Scott

On Tue, 2007-12-18 at 10:22 -0800, Jason Stajich wrote:
> Margot -
> The code freeze won't affect the anonymous cvs, and we'll likely  
> keep anonymous CVS as is (and maybe even figure out how to keep it  
> updated with the SVN) since external tools depend on it and have  
> published CVS instructions.
> 
> I was able to do an anonymous checkout fine on my machine just now --  
> if the problem persists please send a message to support at open-bio.org  
> and the support volunteers will track it from there.
> 
> -jason
> On Dec 18, 2007, at 7:00 AM, Margot Sunshine wrote:
> 
> > Hi,
> >
> > I have been trying to checkout bio-perl from cvs since yesterday  
> > afternoon
> > (Dec 17). My request just hangs. I can login but I cannot checkout  
> > anything.
> > My reading of your posting of the planned switch from CVS to SVN  
> > seemed to
> > indicate that this was not to take place until tomorrow. Help!
> >
> > Thanks,
> > Margot Sunshine
> >
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory


From jason at bioperl.org  Tue Dec 18 14:20:11 2007
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 18 Dec 2007 11:20:11 -0800
Subject: [Bioperl-l] bio-perl cvs freeze
In-Reply-To: <1198004651.11000.19.camel@frissell>
References: <loom.20071218T145502-552@post.gmane.org>
	<681FB463-13A5-4B35-923B-29A91F07D72B@bioperl.org>
	<1198004651.11000.19.camel@frissell>
Message-ID: <4560525E-AE12-45BF-A174-3B8E3669C2B9@bioperl.org>


On Dec 18, 2007, at 11:04 AM, Scott Cain wrote:

> Hi Jason and all,
>
> Does the fact that cvs is sticking around (read only) mean that  
> viewcvs
> (the web interface) will stick around too?  I was thinking about
> modifying the GBrowse net installer to use the 'automatic' tarball of
> bioperl-live to download and install via nmake on Windows since it
> doesn't have cvs support built in.  Also, with cvs sticking around, I
> don't need to rewrite the installer to use svn (yeah!).
>
Hey Scott -

Perhaps. There may be better tools with SVN anyway. We could also
just set up a script that tarballs the already auto-updated code here
(I think it syncs every hour):
http://bioperl.org/SRC/

We're still playing around with this and I can't guarantee that we'll
get the SVN commits back to CVS to work.

-jason
> Thanks,
> Scott
>
> On Tue, 2007-12-18 at 10:22 -0800, Jason Stajich wrote:
>> Margot -
>> The code freeze won't affect the anonymous cvs, and we'll likely
>> keep anonymous CVS as is (and maybe even figure out how to keep it
>> updated with the SVN) since external tools depend on it and have
>> published CVS instructions.
>>
>> I was able to do an anonymous checkout fine on my machine just now --
>> if the problem persists please send a message to support at open-bio.org
>> and the support volunteers will track it from there.
>>
>> -jason
>> On Dec 18, 2007, at 7:00 AM, Margot Sunshine wrote:
>>
>>> Hi,
>>>
>>> I have been trying to checkout bio-perl from cvs since yesterday
>>> afternoon
>>> (Dec 17). My request just hangs. I can login but I cannot checkout
>>> anything.
>>> My reading of your posting of the planned switch from CVS to SVN
>>> seemed to
>>> indicate that this was not to take place until tomorrow. Help!
>>>
>>> Thanks,
>>> Margot Sunshine
>>>


From cain.cshl at gmail.com  Tue Dec 18 14:31:23 2007
From: cain.cshl at gmail.com (Scott Cain)
Date: Tue, 18 Dec 2007 14:31:23 -0500
Subject: [Bioperl-l] bio-perl cvs freeze
In-Reply-To: <4560525E-AE12-45BF-A174-3B8E3669C2B9@bioperl.org>
References: <loom.20071218T145502-552@post.gmane.org>
	<681FB463-13A5-4B35-923B-29A91F07D72B@bioperl.org>
	<1198004651.11000.19.camel@frissell>
	<4560525E-AE12-45BF-A174-3B8E3669C2B9@bioperl.org>
Message-ID: <1198006283.11000.20.camel@frissell>

Cool.  For the moment, I'll just wait and see what happens :-)

Thanks,
Scott

On Tue, 2007-12-18 at 11:20 -0800, Jason Stajich wrote:
> On Dec 18, 2007, at 11:04 AM, Scott Cain wrote:
> 
> > Hi Jason and all,
> >
> > Does the fact that cvs is sticking around (read only) mean that  
> > viewcvs
> > (the web interface) will stick around too?  I was thinking about
> > modifying the GBrowse net installer to use the 'automatic' tarball of
> > bioperl-live to download and install via nmake on Windows since it
> > doesn't have cvs support built in.  Also, with cvs sticking around, I
> > don't need to rewrite the installer to use svn (yeah!).
> >
> Hey Scott -
> 
> Perhaps. There may be better tools with SVN anyway. We could also
> just set up a script that tarballs the already auto-updated code here
> (I think it syncs every hour):
> http://bioperl.org/SRC/
> 
> We're still playing around with this and I can't guarantee that we'll
> get the SVN commits back to CVS to work.
> 
> -jason
> > Thanks,
> > Scott
> >
> > On Tue, 2007-12-18 at 10:22 -0800, Jason Stajich wrote:
> >> Margot -
> >> The code freeze won't affect the anonymous cvs, and we'll likely
> >> keep anonymous CVS as is (and maybe even figure out how to keep it
> >> updated with the SVN) since external tools depend on it and have
> >> published CVS instructions.
> >>
> >> I was able to do an anonymous checkout fine on my machine just now --
> >> if the problem persists please send a message to support at open-bio.org
> >> and the support volunteers will track it from there.
> >>
> >> -jason
> >> On Dec 18, 2007, at 7:00 AM, Margot Sunshine wrote:
> >>
> >>> Hi,
> >>>
> >>> I have been trying to checkout bio-perl from cvs since yesterday
> >>> afternoon
> >>> (Dec 17). My request just hangs. I can login but I cannot checkout
> >>> anything.
> >>> My reading of your posting of the planned switch from CVS to SVN
> >>> seemed to
> >>> indicate that this was not to take place until tomorrow. Help!
> >>>
> >>> Thanks,
> >>> Margot Sunshine
> >>>
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory


From avilella at gmail.com  Tue Dec 18 15:33:43 2007
From: avilella at gmail.com (Albert Vilella)
Date: Tue, 18 Dec 2007 20:33:43 +0000
Subject: [Bioperl-l] dealing with large files
In-Reply-To: <9CCCD509-EAFA-4528-B045-90910E19B41F@bioperl.org>
References: <JT9BJ9$83113BA715E1F7CF3B9D29BFFCC4B0CF@libero.it>
	<9CCCD509-EAFA-4528-B045-90910E19B41F@bioperl.org>
Message-ID: <358f4d650712181233q2a1627c3v6fb4e3e20b9f6c78@mail.gmail.com>

There is a Bio::SeqIO "largefasta" object that will use the hard-disk
for very large fasta files.

On Dec 18, 2007 6:31 PM, Jason Stajich <jason at bioperl.org> wrote:
> Not exactly clear why you aren't using Bio::SeqIO to write the
> sequence back out in FASTA format and why you are re-opening the file
> each time?
>
> Did you look at the examples that show how to convert file formats?
> http://bioperl.org/wiki/HOWTO:SeqIO
>
> You can set the description with
> $seq->description($newdescription);
> and the ID with
> $seq->display_id($newid);
> before writing.
>
> It isn't clear to me from your code why it would be leaking memory
> and causing a problem - is it possible that you have a huge sequence
> in the EMBL file?
>
> -jason
>
> On Dec 18, 2007, at 10:04 AM, Stefano Ghignone wrote:
>
> > Dear all,
> >   I'm facing with a really annoying problem regarding large files
> > handling.
> > I wrote a script (below) which should keep sequences from an embl
> > formatted file and write out the sequences in a customized fasta
> > format. The script works, but since the input file is rather big
> > 5.6 GB unzipped (987 MB zipped), after a while all the physical and
> > virtual memories of my workstation (4GB RAM) are filled and the
> > script is killed...
> > I really don't know how to avoid this huge memory usage...and now
> > I'm wondering if this is the right approach....
> > Please help me!
> > Best wishes,
> > Stefano
> >
> >
> >
> > #################
> > #!/usr/bin/perl -w
> >
> > use strict;
> >
> > use warnings;
> >
> > use Fcntl;
> > use Cwd;
> >
> > use Bio::SeqIO;
> >
> > my $infile = $ARGV[0];
> > my $outfile = "$ARGV[0].fasta";
> > my $organism;
> > my $count;
> > my $path = cwd()."/$outfile";
> >
> > print "Working dir is: ".cwd().".\nCreating file: $path\n";
> >
> > my $in  = Bio::SeqIO->new(-file => "/bin/gunzip -c $infile |", -
> > format => 'EMBL');
> >
> > while ( my $seq = $in->next_seq() ) {
> >       sysopen(TO, $path, O_WRONLY | O_APPEND | O_CREAT);
> >       my $id = $seq->accession_number();
> >       my $desc = $seq->desc(); chop $desc;
> >       my $species = $seq->species->binomial();
> >       my $subspecies = $seq->species->sub_species();
> >       if ($seq->species->sub_species()) {chop $subspecies; $organism =
> > $species." ".$subspecies;}
> >               else {$organism = $species;}
> >       my $sequence = $seq->seq();
> >       print TO ">$id $desc [$organism]\n$sequence\n";
> >       $count++;
> >       warn $@ if $@;
> >       close TO;
> > }
> >
> > print "Done!\n\t$count sequences have been treated. The file $ARGV
> > [0].fasta is ready.\n";
> >
> >

From cjfields at uiuc.edu  Tue Dec 18 21:29:19 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 18 Dec 2007 20:29:19 -0600
Subject: [Bioperl-l] perl 5.10 released
Message-ID: <AF986160-D1AB-4AEF-9588-90861F5B7B98@uiuc.edu>

The next major perl release, perl 5.10, has officially been released:

http://use.perl.org/article.pl?sid=07/12/18/195247

I'll try testing BioPerl with perl 5.10 and any relevant modules when
I can; this may have to wait until after the SVN migration.  If any
interested parties want to test BioPerl compatibility with perl 5.10,
feel free to post your results!

chris

From David.Messina at sbc.su.se  Wed Dec 19 11:44:06 2007
From: David.Messina at sbc.su.se (Dave Messina)
Date: Wed, 19 Dec 2007 10:44:06 -0600
Subject: [Bioperl-l] perl 5.10 released
In-Reply-To: <AF986160-D1AB-4AEF-9588-90861F5B7B98@uiuc.edu>
References: <AF986160-D1AB-4AEF-9588-90861F5B7B98@uiuc.edu>
Message-ID: <628aabb70712190844o17a40c2eva3ef863dc42afb6c@mail.gmail.com>

Hi everyone,

Perl 5.10 builds fine and passes all tests on my PB G4 running OS X 10.5.1.
Piece o' cake.

Here are results of testing BioPerl on this virgin install:

I downloaded the latest CVS tarball. I did 'perl Build.PL', which used CPAN
to install a bunch of dependencies. I then did 'Build test'. For the most
part everything was fine.

- Bio::Biblio::IO::medlinexml throws an exception because XML::Parser isn't
installed.

- RNA_SearchIO fails a few tests.

- Bio::Ontology::SimpleGOEngine::GraphAdaptor throws an exception because
Graph::Directed isn't installed.

- Spidey fails one test.

And of course without the optional dependencies installed, many tests were
skipped.

I'll now go back and install the optional dependencies and do the network
tests, but it looks like for the most part we play nice with the new Perl.

Dave

From ste.ghi at libero.it  Wed Dec 19 11:45:15 2007
From: ste.ghi at libero.it (Stefano Ghignone)
Date: Wed, 19 Dec 2007 17:45:15 +0100
Subject: [Bioperl-l] dealing with large files
Message-ID: <JTB2JF$8DA035C3FA7C0AA4E73865996D18C568@libero.it>

> Not exactly clear why you aren't using Bio::SeqIO to write the  
> sequence back out in FASTA format and why you are re-opening the file  
> each time?
It was to avoid keeping the output file open all the time...

> Did you look at the examples that show how to convert file formats?
> http://bioperl.org/wiki/HOWTO:SeqIO
Yes, I did... but I didn't realize how to set a customized description...

> You can set the description with
> $seq->description($newdescription);
> and the ID with
> $seq->display_id($newid);
> before writing.
Thanks for the hint. Anyway, even with the simple code given there to
convert EMBL to FASTA format, the result is the same... I remind you that
I'm using a huge input file: uniprot_trembl_bacteria.dat.gz... it contains
13101418 sequences!

> It isn't clear to me from your code why it would be leaking memory  
> and causing a problem - is it possible that you have a huge sequence  
> in the EMBL file?
> -jason

In the end, I succeeded in the format conversion using this command:

gunzip -c uniprot_trembl_bacteria.dat.gz | perl -ne 'print ">$1 " if
(/^AC\s+(\S+);/); print " $1" if (/^DE\s+(.*)/);print " [$1]\n" if
(/^OS\s+(.*)/); if (($a)=/^\s+(.*)/){$a=~s/ //g; print "$a\n"};'

(Thanks to Riccardo Percudani). It's not bioperl...but it works!
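
For readers who want to adapt the one-liner, here is a spelled-out sketch
of the same streaming idea in plain Perl. It is not the command above
verbatim: the sub name flat_to_fasta is made up, it keeps only the first
accession, joins multi-line DE and OS fields, and only treats indented
lines as sequence after an SQ line. Memory use stays bounded by the
largest single entry:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stream a UniProt/EMBL-style flat file from $in to FASTA on $out,
# holding only one record in memory at a time. Returns the record count.
sub flat_to_fasta {
    my ($in, $out) = @_;
    my ($acc, @de, @os, $seq, $in_seq) = (undef, (), (), '', 0);
    my $count = 0;
    while (my $line = <$in>) {
        chomp $line;
        if ($line =~ /^AC\s+([^;\s]+)/) {
            $acc = $1 unless defined $acc;       # first accession only
        }
        elsif ($line =~ /^DE\s+(.*)/) { push @de, $1 }
        elsif ($line =~ /^OS\s+(.*)/) { push @os, $1 }
        elsif ($line =~ /^SQ\s/)      { $in_seq = 1 }
        elsif ($in_seq && $line =~ /^\s+(\S.*)/) {
            (my $chunk = $1) =~ s/[\s\d]//g;     # drop spacing and position counts
            $seq .= $chunk;
        }
        elsif ($line =~ m{^//}) {                # end of record: emit and reset
            if (defined $acc) {
                print {$out} ">$acc @de [@os]\n$seq\n";
                $count++;
            }
            ($acc, $seq, $in_seq) = (undef, '', 0);
            @de = ();
            @os = ();
        }
    }
    return $count;
}

# Usage: gunzip -c uniprot_trembl_bacteria.dat.gz | perl flat2fasta.pl - > out.fasta
if (@ARGV) {
    my $n = flat_to_fasta(\*STDIN, \*STDOUT);
    warn "$n sequences converted\n";
}
```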

My best wishes,
Stefano


> On Dec 18, 2007, at 10:04 AM, Stefano Ghignone wrote:
> 
> > Dear all,
> >   I'm facing with a really annoying problem regarding large files  
> > handling.
> > I wrote a script (below) which should keep sequences from an embl  
> > formatted file and write out the sequences in a customized fasta  
> > format. The script works, but since the input file is rather big  
> > 5.6 GB unzipped (987 MB zipped), after a while all the physical and  
> > virtual memories of my workstation (4GB RAM) are filled and the  
> > script is killed...
> > I really don't know how to avoid this huge memory usage...and now  
> > I'm wondering if this is the right approach....
> > Please help me!
> > Best wishes,
> > Stefano
> >
> >
> >
> > #################
> > #!/usr/bin/perl -w
> >
> > use strict;
> >
> > use warnings;
> >
> > use Fcntl;
> > use Cwd;
> >
> > use Bio::SeqIO;
> >
> > my $infile = $ARGV[0];
> > my $outfile = "$ARGV[0].fasta";
> > my $organism;
> > my $count;
> > my $path = cwd()."/$outfile";
> >
> > print "Working dir is: ".cwd().".\nCreating file: $path\n";
> >
> > my $in  = Bio::SeqIO->new(-file => "/bin/gunzip -c $infile |", - 
> > format => 'EMBL');
> >
> > while ( my $seq = $in->next_seq() ) {
> > 	sysopen(TO, $path, O_WRONLY | O_APPEND | O_CREAT);
> > 	my $id = $seq->accession_number();	
> > 	my $desc = $seq->desc(); chop $desc;
> > 	my $species = $seq->species->binomial();
> > 	my $subspecies = $seq->species->sub_species();
> > 	if ($seq->species->sub_species()) {chop $subspecies; $organism =  
> > $species." ".$subspecies;}
> > 		else {$organism = $species;}
> > 	my $sequence = $seq->seq();
> > 	print TO ">$id $desc [$organism]\n$sequence\n";
> >     	$count++;
> > 	warn $@ if $@;
> > 	close TO;
> > }
> >
> > print "Done!\n\t$count sequences have been treated. The file $ARGV 
> > [0].fasta is ready.\n";
> >
> >


From cjfields at uiuc.edu  Wed Dec 19 12:17:28 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 19 Dec 2007 11:17:28 -0600
Subject: [Bioperl-l] dealing with large files
In-Reply-To: <JTB2JF$8DA035C3FA7C0AA4E73865996D18C568@libero.it>
References: <JTB2JF$8DA035C3FA7C0AA4E73865996D18C568@libero.it>
Message-ID: <B92AAA3F-C93A-41EC-B68D-3E6F4053BBD4@uiuc.edu>


On Dec 19, 2007, at 10:45 AM, Stefano Ghignone wrote:

>> Not exactly clear why you aren't using Bio::SeqIO to write the
>> sequence back out in FASTA format and why you are re-opening the file
>> each time?
> It was to avoid keeping the output file open all the time...
>
>> Did you look at the examples that show how to convert file formats?
>> http://bioperl.org/wiki/HOWTO:SeqIO
> Yes, I did... but I didn't realize how to set a customized
> description...
>
>> You can set the description with
>> $seq->description($newdescription);
>> and the ID with
>> $seq->display_id($newid);
>> before writing.
> Thanks for the hint. Anyway, just using the simple code reported to
> convert EMBL to FASTA format, the results are the same... I remind
> you that I'm using a huge input file:
> uniprot_trembl_bacteria.dat.gz... it contains 13101418 sequences!
>
>> It isn't clear to me from your code why it would be leaking memory
>> and causing a problem - is it possible that you have a huge sequence
>> in the EMBL file?
>> -jason
>
> At the end, I succeeded in the format conversion using this command:
>
> gunzip -c uniprot_trembl_bacteria.dat.gz | perl -ne 'print ">$1 " if
> (/^AC\s+(\S+);/); print " $1" if (/^DE\s+(.*)/);print " [$1]\n" if
> (/^OS\s+(.*)/); if (($a)=/^\s+(.*)/){$a=~s/ //g; print "$a\n"};'
>
> (Thanks to Riccardo Percudani). It's not bioperl...but it works!
>
> My best wishes,
> Stefano


As this shows, BioPerl isn't always the best answer (I know,
blasphemy...).  As Jason suggested, it's quite likely there are large
sequence records causing your problems when using BioPerl.  The
one-liner works b/c it doesn't retain data (sequence, annotation, etc.)
in memory as a Bio::Seq object; it's a direct conversion.
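
For anyone who wants the same direct conversion as a script rather than a
one-liner, here is a minimal sketch in plain Perl.  The sub name
embl_to_fasta and the file name embl2fasta.pl are made up for illustration.
Unlike the one-liner, it assumes records end with a "//" line, keeps only
the first accession, and joins multi-line DE/OS fields.  It holds one
record in memory at a time.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stream EMBL/UniProt-style flat-file records from $in and print FASTA to
# $out. Returns the number of records written.
sub embl_to_fasta {
    my ($in, $out) = @_;
    my $count = 0;
    my ($acc, $desc, $org, $seq) = ('', '', '', '');

    while (my $line = <$in>) {
        chomp $line;
        if ($line =~ /^AC\s+(\S+?);/) {
            $acc ||= $1;                            # keep the first accession only
        }
        elsif ($line =~ /^DE\s+(.*)/) {
            $desc .= ($desc ? ' ' : '') . $1;       # join multi-line descriptions
        }
        elsif ($line =~ /^OS\s+(.*)/) {
            $org .= ($org ? ' ' : '') . $1;         # join multi-line organism names
        }
        elsif ($line =~ /^\s+(\S.*)/) {
            (my $chunk = $1) =~ s/[\s\d]//g;        # drop spacing and position counts
            $seq .= $chunk;
        }
        elsif ($line =~ m{^//}) {                   # end of record: emit and reset
            if ($acc) {
                print {$out} ">$acc $desc [$org]\n$seq\n";
                $count++;
            }
            ($acc, $desc, $org, $seq) = ('', '', '', '');
        }
    }
    return $count;
}

# Run as a script only when a file name is given on the command line.
if (@ARGV) {
    my $path = shift @ARGV;
    open my $in, '<', $path or die "Cannot open $path: $!";
    my $n = embl_to_fasta($in, \*STDOUT);
    warn "$n sequences written\n";
}
```

Usage would be something like
"gunzip -c uniprot_trembl_bacteria.dat.gz | perl embl2fasta.pl /dev/stdin > out.fasta"
on a system that provides /dev/stdin.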

It would be nice to code up a lazy sequence object and related  
parsers; maybe for the next dev release.

chris

From cjfields at uiuc.edu  Wed Dec 19 12:08:31 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 19 Dec 2007 11:08:31 -0600
Subject: [Bioperl-l] perl 5.10 released
In-Reply-To: <628aabb70712190844o17a40c2eva3ef863dc42afb6c@mail.gmail.com>
References: <AF986160-D1AB-4AEF-9588-90861F5B7B98@uiuc.edu>
	<628aabb70712190844o17a40c2eva3ef863dc42afb6c@mail.gmail.com>
Message-ID: <C7FF12CB-278A-4B47-8D04-F2F038C35AB2@uiuc.edu>


On Dec 19, 2007, at 10:44 AM, Dave Messina wrote:

> Hi everyone,
>
>
> Perl 5.10 builds fine and passes all tests on my PB G4 running OS X  
> 10.5.1. Piece o' cake.
>
> Here are results of testing BioPerl on this virgin install:
>
> I downloaded the latest CVS tarball. I did 'perl Build.PL', which  
> used CPAN to install a bunch of dependencies. I then did 'Build  
> test'. For the most part everything was fine.
>
> - Bio::Biblio::IO::medlinexml throws an exception because  
> XML::Parser isn't installed.

XML::Parser used to be shipped with a number of perl distros even  
though it isn't core.  We should add a require to these.
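
As a rough sketch of what such a guard could look like in a test script
(plain Test::More, not the actual BioPerl test code; the helper name
have_module is made up for illustration):

```perl
use strict;
use warnings;
use Test::More tests => 2;

# Return true if the named module can be loaded, without dying if it cannot.
sub have_module {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;
    return eval { require $file; 1 } ? 1 : 0;
}

ok(have_module('File::Spec'), 'a core module is found');

SKIP: {
    # Skip, rather than die, when the optional dependency is missing.
    skip 'XML::Parser not installed', 1 unless have_module('XML::Parser');
    my $parser = XML::Parser->new;
    isa_ok($parser, 'XML::Parser');
}
```

The same pattern would apply to Graph::Directed in the Ontology tests.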

> - RNA_SearchIO fails a few tests.

These are very likely from recent commits I made re:GenericHSP and use  
of bits(), raw_score(), etc. (the fails look like missing/switched  
vals with these method tests).  I'll fix these post-svn migration, but  
I don't think these are related to 5.10.

> - Bio::Ontology::SimpleGOEngine::GraphAdaptor throws an exception  
> because Graph::Directed isn't installed.

Odd, that should be caught out before tests are run.  Needs to be  
fixed, but one would think this would fail as well under 5.8.

> - Spidey fails one test.

Passes for me.  Is it dependency-related?

> And of course without the optional dependencies installed, many  
> tests were skipped.
>
> I'll now go back and install the optional dependencies and do the  
> network tests, but it looks like for the most part we play nice with  
> the new Perl.
>
> Dave

Not sure, but it seems a bit faster.  Maybe it's just me but it would  
be nice to see some benchmarks comparing perl 5.8 vs 5.10.  I agree,  
it was a very fast and easy install.

I'll start a page on the wiki for test fails using perl 5.10.  I'm  
seeing a few fails;  I'm getting the following with everything  
installed (including DBD::mysql, DBI, etc) using perl 5.10, Mac OS X  
10.5.1 (note Test::Harness now gives TODO's, so some of these are  
actually passing).  Note the entrezgene.t and DB.t fails; I looked  
into these and I think they are related to the odd 'pseudohashes are  
deprecated' warnings we were getting in perl 5.8 tests, so there may  
be something legitimately buggy.

Test Summary Report
-------------------
t/Annotation.t                (Wstat: 0 Tests: 112 Failed: 0)
   TODO passed:   96
t/BioGraphics.t               (Wstat: 256 Tests: 35 Failed: 1)
   Failed test number(s):  4
   Non-zero exit status: 1
t/DB.t                        (Wstat: 65280 Tests: 106 Failed: 0)
   Non-zero exit status: 255
   Parse errors: Bad plan.  You planned 116 tests but ran 106.
t/DBCUTG.t                    (Wstat: 1024 Tests: 33 Failed: 4)
   Failed test number(s):  29-31, 33
   Non-zero exit status: 4
t/RNA_SearchIO.t              (Wstat: 2048 Tests: 496 Failed: 8)
   Failed test number(s):  291, 338, 372-374, 395, 455, 486
   Non-zero exit status: 8
t/entrezgene.t                (Wstat: 65280 Tests: 648 Failed: 0)
   Non-zero exit status: 255
   Parse errors: Bad plan.  You planned 1422 tests but ran 648.
Files=255, Tests=15066, 435 wallclock secs ( 3.15 usr  1.72 sys +  
124.87 cusr 13.29 csys = 143.03 CPU)
Result: FAIL
Failed 5/255 test programs. 13/15066 subtests failed.


chris

From David.Messina at sbc.su.se  Wed Dec 19 12:49:32 2007
From: David.Messina at sbc.su.se (Dave Messina)
Date: Wed, 19 Dec 2007 11:49:32 -0600
Subject: [Bioperl-l] perl 5.10 released
In-Reply-To: <C7FF12CB-278A-4B47-8D04-F2F038C35AB2@uiuc.edu>
References: <AF986160-D1AB-4AEF-9588-90861F5B7B98@uiuc.edu>
	<628aabb70712190844o17a40c2eva3ef863dc42afb6c@mail.gmail.com>
	<C7FF12CB-278A-4B47-8D04-F2F038C35AB2@uiuc.edu>
Message-ID: <628aabb70712190949j30756b8ap97666f4962c2b83d@mail.gmail.com>

>
> XML::Parser used to be shipped with a number of perl distros even
> though it isn't core.  We should add a require to these.


Agreed.


> - RNA_SearchIO fails a few tests.
>
> These are very likely from recent commits I made re:GenericHSP and use
> of bits(), raw_score(), etc. (the fails look like missing/switched
> vals with these method tests).  I'll fix these post-svn migration, but
> I don't think these are related to 5.10.


Agreed -- I doubt this is 5.10-specific.


> - Bio::Ontology::SimpleGOEngine::GraphAdaptor throws an exception
> > because Graph::Directed isn't installed.
>
> Odd, that should be caught out before tests are run.  Needs to be
> fixed, but one would think this would fail as well under 5.8.


Yep, and in a minute here I'll test it under 5.8.


> > - Spidey fails one test.
>
> Passes for me.  Is it dependency-related?


I don't think so, but I guess we'll see once I finish installing the
dependencies. Here's what I got:

t/Spidey........................ok 1/26 Can't call method "sub_SeqFeature"
on an undefined value at t/Spidey.t line 24, <GEN1> line 170.
# Looks like you planned 26 tests but only ran 3.
# Looks like your test died just after 3.
t/Spidey........................dubious

        Test returned status 255 (wstat 65280, 0xff00)
DIED. FAILED tests 4-26
        Failed 23/26 tests, 11.54% okay


Dave

From cjfields at uiuc.edu  Wed Dec 19 14:19:10 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 19 Dec 2007 13:19:10 -0600
Subject: [Bioperl-l] perl 5.10 released
In-Reply-To: <628aabb70712190949j30756b8ap97666f4962c2b83d@mail.gmail.com>
References: <AF986160-D1AB-4AEF-9588-90861F5B7B98@uiuc.edu>
	<628aabb70712190844o17a40c2eva3ef863dc42afb6c@mail.gmail.com>
	<C7FF12CB-278A-4B47-8D04-F2F038C35AB2@uiuc.edu>
	<628aabb70712190949j30756b8ap97666f4962c2b83d@mail.gmail.com>
Message-ID: <04AB8971-466D-4EEF-9A75-310ACDD224A6@uiuc.edu>

Just updated from CVS and reran tests; Spidey.t is failing now.  This
may be from a recent commit:

http://lists.open-bio.org/pipermail/bioperl-guts-l/2007-December/026854.html

I'm updating the following page on the wiki for tracking.  There are a  
few more we should look into at some point:

http://www.bioperl.org/w/index.php?title=Bioperl_and_Perl_5.10

chris

On Dec 19, 2007, at 11:49 AM, Dave Messina wrote:

>>
>> XML::Parser used to be shipped with a number of perl distros even
>> though it isn't core.  We should add a require to these.
>
>
> Agreed.
>
>
>> - RNA_SearchIO fails a few tests.
>>
>> These are very likely from recent commits I made re:GenericHSP and  
>> use
>> of bits(), raw_score(), etc. (the fails look like missing/switched
>> vals with these method tests).  I'll fix these post-svn migration,  
>> but
>> I don't think these are related to 5.10.
>
>
> Agreed -- I doubt this is 5.10-specific.
>
>
>> - Bio::Ontology::SimpleGOEngine::GraphAdaptor throws an exception
>>> because Graph::Directed isn't installed.
>>
>> Odd, that should be caught out before tests are run.  Needs to be
>> fixed, but one would think this would fail as well under 5.8.
>
>
> Yep, and in a minute here I'll test it under 5.8.
>
>
>
>
>>> - Spidey fails one test.
>>
>> Passes for me.  Is it dependency-related?
>
>
> I don't think so, but I guess we'll see once I finish installing the
> dependencies. Here's what I got:
>
> t/Spidey........................ok 1/26 Can't call method  
> "sub_SeqFeature"
> on an undefined value at t/Spidey.t line 24, <GEN1> line 170.
> # Looks like you planned 26 tests but only ran 3.
> # Looks like your test died just after 3.
> t/Spidey........................dubious
>
>        Test returned status 255 (wstat 65280, 0xff00)
> DIED. FAILED tests 4-26
>        Failed 23/26 tests, 11.54% okay
>
>
> Dave

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From David.Messina at sbc.su.se  Wed Dec 19 18:42:14 2007
From: David.Messina at sbc.su.se (Dave Messina)
Date: Wed, 19 Dec 2007 17:42:14 -0600
Subject: [Bioperl-l] perl 5.10 released
In-Reply-To: <04AB8971-466D-4EEF-9A75-310ACDD224A6@uiuc.edu>
References: <AF986160-D1AB-4AEF-9588-90861F5B7B98@uiuc.edu>
	<628aabb70712190844o17a40c2eva3ef863dc42afb6c@mail.gmail.com>
	<C7FF12CB-278A-4B47-8D04-F2F038C35AB2@uiuc.edu>
	<628aabb70712190949j30756b8ap97666f4962c2b83d@mail.gmail.com>
	<04AB8971-466D-4EEF-9A75-310ACDD224A6@uiuc.edu>
Message-ID: <FC679851-77A7-4603-B722-4A6089333EE9@sbc.su.se>

Hi Chris and everyone,

With most of the optional dependencies installed, I'm seeing  
essentially the same test failures, including the CODE ref thingy.  
I've noted this on the new Wiki page you created.

According to Data::Dumper's documentation,
Data::Dumper cheats with CODE references. If a code reference is  
encountered in the structure being processed (and if you haven't set  
the Deparse flag), an anonymous subroutine that contains the string  
'"DUMMY"' will be inserted in its place, and a warning will be printed  
if Purity is set. You can eval the result, but bear in mind that the  
anonymous sub that gets created is just a placeholder. Someday, perl  
will have a switch to cache-on-demand the string representation of a  
compiled piece of code, I hope. If you have prior knowledge of all the  
code refs that your data structures are likely to have, you can use  
the Seen method to pre-seed the internal reference table and make the  
dumped output point to them, instead. See EXAMPLES above.


So it's not BioPerl per se, but we can probably work around it.
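
To see the behaviour described in that passage in isolation, here is a
small sketch using only core modules.  The data structure is invented for
illustration and has nothing to do with what Module::Build actually
serializes.

```perl
use strict;
use warnings;
use Data::Dumper;

# A structure holding a code ref (hypothetical example data).
my $data = { name => 'demo', callback => sub { return 42 } };

# Default behaviour: the code ref is replaced by a placeholder sub
# whose body is the string "DUMMY".
my $plain = do {
    local $Data::Dumper::Deparse = 0;
    Dumper($data);
};

# With Deparse set, Data::Dumper uses B::Deparse to regenerate source text
# for the sub instead of the placeholder.
my $deparsed = do {
    local $Data::Dumper::Deparse = 1;
    Dumper($data);
};

print "Without Deparse:\n$plain\nWith Deparse:\n$deparsed";
```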


>>> - Bio::Ontology::SimpleGOEngine::GraphAdaptor throws an exception
>>>> because Graph::Directed isn't installed.
>>>
>>> Odd, that should be caught out before tests are run.  Needs to be
>>> fixed, but one would think this would fail as well under 5.8.
>>
>>
>> Yep, and in a minute here I'll test it under 5.8.


Strangely, the Ontology tests properly get skipped under 5.8.

Dave


From ki.baik at roche.com  Wed Dec 19 19:58:42 2007
From: ki.baik at roche.com (Baik, Ki)
Date: Wed, 19 Dec 2007 16:58:42 -0800
Subject: [Bioperl-l] Parsing CAP3 output to Fasta
Message-ID: <6D5431B47E46BD45AAA453432AD3B803027553B7@rpbmsem01.nala.roche.com>

Hello,

 
I'm interested in parsing the output of the CAP contig assembly program
into a format that is more manageable. The CAP output is shown below:

 
                .    :    .    :    .    :    .    :    .    :    .    :

Seq1+       CTGGATGGGTTAATTTACTCCCATAAGATTTTTGAAATCCTTAATTTACTGATATATCAC

            ____________________________________________________________

consensus   CTGGATGGGTTAATTTACTCCCATAAGAGAGCAGAAATCCTGGATCTCTGGATATATCAC

 
                .    :    .    :    .    :    .    :    .    :    .    :

Seq1+       ACTCTTAATTTACTCCCTGATTGG--CAGTGTTACACACCGGGACCAGGACCTAGATTCC

Seq2+       ACTCAGGGATTCTTCCCTGATTGGTTCAGTGTTACACTTTTGCGCCAGGACCTAGATTCC

            ____________________________________________________________

consensus   ACTCAGGGATTCTTCCCTGATTGGTTCAGTGTTACACACCGGGACCAGGACCTAGATTCC

 
                .    :    .    :    .    :    .    :    .    :    .    :

Seq1+       CACTGACATTTGGATGGTTAATTTACTCTTTTCCAGTGTCAGCAGAAGAGCGGGGGAGAC

Seq2+       CACTGACATTTGGATGGTTGTTTAAACTGGTACCAGTGTCCGCTCGCGGGGCAGAGAGAC

            ____________________________________________________________

consensus   CACTGACATTTGGATGGTTGTTTAAACTGGTACCAGTGTCAGCAGAAGAGGCAGAGAGAC

 
                .    :    .    :    .    :    .    :    .    :    .    :

Seq1+       TGGGTAATACAAACACTTTTCGGCGGCTTCTACATCCAGCTTGTTAATTTACTCTTTAGG

Seq2+       TGGGTAATACAAATGAAGATGTTTCCGGCCTACATCCAGCTTGTAATCATGC

            ____________________________________________________________

consensus   TGGGTAATACAAATGAAGATGCTAGTCTTCTACATCCAGCTTGTAATCATGGAGCTGAGG

 
I would like to maintain the alignment with their base positions for
each sequence. A fasta format retaining the alignment position is ideal
such as below:

 
>Seq1+

CTGGATGGGTTAATTTACTCCCATAAGATTTTTGAAATCCTTAATTTACTGATATATCAC

ACTCTTAATTTACTCCCTGATTGG--CAGTGTTACACACCGGGACCAGGACCTAGATTCC

CACTGACATTTGGATGGTTAATTTACTCTTTTCCAGTGTCAGCAGAAGAGCGGGGGAGAC

TGGGTAATACAAACACTTTTCGGCGGCTTCTACATCCAGCTTGTTAATTTACTCTTTAGG

>Seq2+

------------------------------------------------------------

ACTCAGGGATTCTTCCCTGATTGGTTCAGTGTTACACTTTTGCGCCAGGACCTAGATTCC

CACTGACATTTGGATGGTTGTTTAAACTGGTACCAGTGTCCGCTCGCGGGGCAGAGAGAC

TGGGTAATACAAATGAAGATGTTTCCGGCCTACATCCAGCTTGTAATCATGC--------

 
Does anyone have any experience doing this?

 
Regards,

 
KB


From cjfields at uiuc.edu  Wed Dec 19 20:41:51 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 19 Dec 2007 19:41:51 -0600
Subject: [Bioperl-l] perl 5.10 released
In-Reply-To: <FC679851-77A7-4603-B722-4A6089333EE9@sbc.su.se>
References: <AF986160-D1AB-4AEF-9588-90861F5B7B98@uiuc.edu>
	<628aabb70712190844o17a40c2eva3ef863dc42afb6c@mail.gmail.com>
	<C7FF12CB-278A-4B47-8D04-F2F038C35AB2@uiuc.edu>
	<628aabb70712190949j30756b8ap97666f4962c2b83d@mail.gmail.com>
	<04AB8971-466D-4EEF-9A75-310ACDD224A6@uiuc.edu>
	<FC679851-77A7-4603-B722-4A6089333EE9@sbc.su.se>
Message-ID: <980C0D1B-9E3F-4904-9CA1-8C672CED0B35@uiuc.edu>

On Dec 19, 2007, at 5:42 PM, Dave Messina wrote:

> Hi Chris and everyone,
>
> With most of the optional dependencies installed, I'm seeing  
> essentially the same test failures, including the CODE ref thingy.  
> I've noted this on the new Wiki page you created.
>
> According to Data::Dumper's documentation,
> Data::Dumper cheats with CODE references. If a code reference is  
> encountered in the structure being processed (and if you haven't set  
> the Deparse flag), an anonymous subroutine that contains the string  
> '"DUMMY"' will be inserted in its place, and a warning will be  
> printed if Purity is set. You can eval the result, but bear in mind  
> that the anonymous sub that gets created is just a placeholder.  
> Someday, perl will have a switch to cache-on-demand the string  
> representation of a compiled piece of code, I hope. If you have  
> prior knowledge of all the code refs that your data structures are  
> likely to have, you can use the Seen method to pre-seed the internal  
> reference table and make the dumped output point to them, instead.  
> See EXAMPLES above.
>
>
> So it's not BioPerl per se, but we can probably work around it.

May be something in Module::Build or Build.PL that needs tweaking.

It looks like EntrezGene parsing is broken for now using perl 5.10;  
the 'pseudohash' warnings with perl 5.8 were indicating something was  
amiss but we could never place it.  Any fixes will have to wait until  
after svn migration.  Not sure what's going on with the other failures
just yet.

>>>> - Bio::Ontology::SimpleGOEngine::GraphAdaptor throws an exception
>>>>> because Graph::Directed isn't installed.
>>>>
>>>> Odd, that should be caught out before tests are run.  Needs to be
>>>> fixed, but one would think this would fail as well under 5.8.
>>>
>>>
>>> Yep, and in a minute here I'll test it under 5.8.
>
>
> Strangely, the Ontology tests properly get skipped under 5.8.
>
> Dave

May be worth looking into.  Have you added it to the wiki?

chris

From David.Messina at sbc.su.se  Wed Dec 19 23:52:16 2007
From: David.Messina at sbc.su.se (Dave Messina)
Date: Wed, 19 Dec 2007 22:52:16 -0600
Subject: [Bioperl-l] perl 5.10 released
In-Reply-To: <980C0D1B-9E3F-4904-9CA1-8C672CED0B35@uiuc.edu>
References: <AF986160-D1AB-4AEF-9588-90861F5B7B98@uiuc.edu>
	<628aabb70712190844o17a40c2eva3ef863dc42afb6c@mail.gmail.com>
	<C7FF12CB-278A-4B47-8D04-F2F038C35AB2@uiuc.edu>
	<628aabb70712190949j30756b8ap97666f4962c2b83d@mail.gmail.com>
	<04AB8971-466D-4EEF-9A75-310ACDD224A6@uiuc.edu>
	<FC679851-77A7-4603-B722-4A6089333EE9@sbc.su.se>
	<980C0D1B-9E3F-4904-9CA1-8C672CED0B35@uiuc.edu>
Message-ID: <628aabb70712192052p5d9afe3bvf4fa1da872f56355@mail.gmail.com>

>
> May be something in Module::Build or Build.PL that needs tweaking.


I took a quick look-see and I'm pretty sure it's Module::Build.
Specifically, Module::Build::Base::write_config(), where there are three
calls with coderefs as parameters to _write_data() to match the three
coderef errors we are seeing at the end of 'perl Build.PL'.

_write_data() in turn calls Module::Build::Dumper::_data_dump() and uses
some ugly Data::Dumper voodoo to serialize.

I don't understand the voodoo well enough to explain why this appears only
with Perl 5.10, though; it sure looks like it should have with 5.8, too.


> Strangely, the Ontology tests properly get skipped under 5.8.
>
> May be worth looking into.  Have you added it to the wiki?


Uhhh, yeah...of course! (just now)

Should be a simple fix after the post-svn thaw.

Dave

From David.Messina at sbc.su.se  Thu Dec 20 00:39:41 2007
From: David.Messina at sbc.su.se (Dave Messina)
Date: Wed, 19 Dec 2007 23:39:41 -0600
Subject: [Bioperl-l] Parsing CAP3 output to Fasta
In-Reply-To: <6D5431B47E46BD45AAA453432AD3B803027553B7@rpbmsem01.nala.roche.com>
References: <6D5431B47E46BD45AAA453432AD3B803027553B7@rpbmsem01.nala.roche.com>
Message-ID: <628aabb70712192139q5e061428v56ed2ce8cf1f4851@mail.gmail.com>

Hi Ki,

Hopefully someone who (unlike me) uses these modules regularly will chime
in, but in the meantime, here are some ideas:

The Bio::Assembly::IO module can read and write ace files, which CAP3 can
produce as output. I don't think there is an explicit means to dump to a
multi-fasta file like you want.

But you could probably write a Bio::Assembly::IO::Fasta class which could
write the multi-Fasta format you want. Then you could use Bio::Assembly::IO
objects to read in ace files from CAP3 and write out to multi-fasta.

Look at

Bio::Assembly::IO::*
Bio::Assembly::ScaffoldI
Bio::Assembly::Contig
Bio::LocatableSeq
Bio::AlignIO

Assemblies are made of scaffolds, scaffolds are made of contigs, and contigs
are made of sequences which can be manipulated like any old seq in BioPerl.
Bio::AlignIO can read and write multiple sequence alignments and
multi-fastas, so that should help you to get from Assembly::IO to your desired
output format.
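
If going through the assembly objects turns out to be too much work, the
text alignment itself can be converted directly.  Below is a minimal
sketch in plain Perl (no BioPerl).  It assumes the layout in the posted
example: each block has one line per read, an underscore line, and a
closing "consensus" line, with all sequences starting in the same column.
The sub name cap3_blocks_to_fasta is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Convert CAP3 alignment blocks into gap-padded FASTA, keeping every read
# in register with the consensus columns.
sub cap3_blocks_to_fasta {
    my ($text) = @_;
    my (%aligned, @order, %block);
    my $columns = 0;                          # alignment columns emitted so far

    for my $line (split /\n/, $text) {
        next if $line =~ /^[\s.:]*$/;         # blank line or ruler line
        next if $line =~ /^\s*_+\s*$/;        # underscore separator

        if ($line =~ /^(consensus\s+)(\S+)/) {
            # The consensus line gives the start column and width of the block.
            my ($offset, $width) = (length $1, length $2);
            for my $name (@order) {
                my $chunk = '';
                if (exists $block{$name} && length $block{$name} > $offset) {
                    $chunk = substr $block{$name}, $offset;
                    $chunk =~ s/\s+$//;       # trailing whitespace
                    $chunk =~ tr/ /-/;        # leading blanks become gaps
                }
                $chunk .= '-' x ($width - length $chunk)
                    if length $chunk < $width;
                $aligned{$name} .= $chunk;
            }
            $columns += $width;
            %block = ();
            next;
        }

        my ($name) = $line =~ /^(\S+)/ or next;
        unless (exists $aligned{$name}) {     # first time this read appears:
            push @order, $name;               # pad it for all earlier blocks
            $aligned{$name} = '-' x $columns;
        }
        $block{$name} = $line;
    }

    my $fasta = '';
    for my $name (@order) {
        (my $wrapped = $aligned{$name}) =~ s/(.{1,60})/$1\n/g;
        $fasta .= ">$name\n$wrapped";
    }
    return $fasta;
}

# Run as a script only when a file name is given on the command line.
if (@ARGV) {
    local $/;                                 # slurp the whole report
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    print cap3_blocks_to_fasta(<$fh>);
}
```

Reads that are absent from a block are filled with gap characters for that
block, which gives the leading and trailing dashes in the desired output.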


Hope this helps,
Dave

From mike.thon at gmail.com  Thu Dec 20 00:59:06 2007
From: mike.thon at gmail.com (Michael Thon)
Date: Thu, 20 Dec 2007 06:59:06 +0100
Subject: [Bioperl-l] dealing with large files
In-Reply-To: <JT9BJ9$83113BA715E1F7CF3B9D29BFFCC4B0CF@libero.it>
References: <JT9BJ9$83113BA715E1F7CF3B9D29BFFCC4B0CF@libero.it>
Message-ID: <F23D8C0D-AE41-40B4-A30B-83AC59A7BDD8@gmail.com>


On Dec 18, 2007, at 7:04 PM, Stefano Ghignone wrote:

> my $in  = Bio::SeqIO->new(-file   => "/bin/gunzip -c $infile |",
>                           -format => 'EMBL');

This is just for the sake of curiosity, since you already found a  
solution to your problem, but I wonder how perl will handle a file  
opened this way.  Will it try to suck the whole thing into ram in one  
go?

Mike

From cjfields at uiuc.edu  Thu Dec 20 00:54:36 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 19 Dec 2007 23:54:36 -0600
Subject: [Bioperl-l] Parsing CAP3 output to Fasta
In-Reply-To: <628aabb70712192139q5e061428v56ed2ce8cf1f4851@mail.gmail.com>
References: <6D5431B47E46BD45AAA453432AD3B803027553B7@rpbmsem01.nala.roche.com>
	<628aabb70712192139q5e061428v56ed2ce8cf1f4851@mail.gmail.com>
Message-ID: <EB4F110F-9F12-4478-89C2-5DDF4FEF07C6@uiuc.edu>


On Dec 19, 2007, at 11:39 PM, Dave Messina wrote:

> Hi Ki,
>
> Hopefully someone who (unlike me) uses these modules regularly will  
> chime
> in, but in the meantime, here are some ideas:
>
> The Bio::AssemblyIO module can read and write ace files, which CAP3  
> can
> produce as output. I don't think there is an explicit means to dump  
> to a
> multi-fasta file like you want.
>
> But you could probably write a Bio::AssemblyIO::Fasta class which  
> could
> write the multi-Fasta format you want. Then you could use  
> Bio::AssemblyIO
> objects to read in ace files from CAP3 and write out to multi-fasta.
>
> Look at
>
> Bio::AssemblyIO::*
> Bio::Assembly::ScaffoldI
> Bio::Assembly::Contig
> Bio::LocatableSeq
> Bio::AlignIO
>
> Assemblies are made of scaffolds, scaffolds are made of contigs, and  
> contigs
> are made of sequences which can be manipulated like any old seq in  
> BioPerl.
> Bio::AlignIO can read and write multiple sequence alignments and
> multi-fastas, so that should help you to get from AssemblyIO to your  
> desired
> output format.
>
>
>
> Hope this helps,
> Dave

What would help is to make Bio::Assembly::Contig implement Bio::AlignI
correctly, or make it a subclass of Bio::SimpleAlign.  That way one
could read Scaffolds in via Bio::Assembly::IO and write out Contigs
through Bio::AlignIO directly.  In theory that should work but IIRC it
doesn't.

chris

From jason at bioperl.org  Thu Dec 20 02:13:55 2007
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 19 Dec 2007 23:13:55 -0800
Subject: [Bioperl-l] dealing with large files
In-Reply-To: <F23D8C0D-AE41-40B4-A30B-83AC59A7BDD8@gmail.com>
References: <JT9BJ9$83113BA715E1F7CF3B9D29BFFCC4B0CF@libero.it>
	<F23D8C0D-AE41-40B4-A30B-83AC59A7BDD8@gmail.com>
Message-ID: <02EC6D6D-F807-492F-B125-9FE0393B1FD9@bioperl.org>

It gets buffered via the OS -- Bio::Root::IO calls next_line  
iteratively, but eventually the whole sequence object will get put  
into RAM as it is built up.
zcat or bzcat can also be used for gzipped and bzipped files
respectively; I like to use this where I want to keep the disk space
footprint down.
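
To illustrate the streaming part outside of BioPerl, here is a minimal
sketch of reading a gzipped flat file through a pipe.  It assumes gunzip
is on the PATH and that records end with a "//" line; the sub name
count_records is made up.  Only one line is held in memory at a time.

```perl
use strict;
use warnings;

# Count records in a gzipped flat file by reading it through a pipe.
sub count_records {
    my ($gz_path) = @_;

    # List-form pipe open: no shell is involved, so unusual file names
    # do not need quoting.
    open(my $fh, '-|', 'gunzip', '-c', $gz_path)
        or die "Cannot start gunzip: $!";

    my $n = 0;
    while (my $line = <$fh>) {
        $n++ if $line =~ m{^//};              # record terminator
    }
    close $fh or die "gunzip failed on $gz_path";
    return $n;
}

print count_records($ARGV[0]), " records\n" if @ARGV;
```

The pipe itself never pulls the whole file into RAM; memory use comes from
whatever the reading code chooses to keep.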

Because we treat data input usually as from a stream ignoring whether  
it is in a file or not, we have to have a more flexible structure to  
really handle this, although I'd argue the data really belongs in a  
database when it is too big for memory.
More compact Feature/Location objects would probably also help here.   
I would not be surprised if the memory requirement has more to do  
with the number of features than length of the sequence - human chrom  
1 can fit into memory just fine on most machines with 2GB of RAM.

But it would require someone taking an interest in some re- 
architecting here.

-jason

On Dec 19, 2007, at 9:59 PM, Michael Thon wrote:

>
> On Dec 18, 2007, at 7:04 PM, Stefano Ghignone wrote:
>
>> my $in  = Bio::SeqIO->new(-file   => "/bin/gunzip -c $infile |",
>>                           -format => 'EMBL');
>
> This is just for the sake of curiosity, since you already found a  
> solution to your problem, but I wonder how perl will handle a file  
> opened this way.  Will it try to suck the whole thing into ram in  
> one go?
>
> Mike


From ste.ghi at libero.it  Thu Dec 20 08:57:54 2007
From: ste.ghi at libero.it (Stefano Ghignone)
Date: Thu, 20 Dec 2007 14:57:54 +0100
Subject: [Bioperl-l] dealing with large files
Message-ID: <JTCPGI$4B9622B6978AB21534CC4DC74CC6BC89@libero.it>

I was wondering whether, when working with such a big file, it would be better to first index the database and then query it, formatting the sequences as one wants...

> It gets buffered via the OS -- Bio::Root::IO calls next_line  
> iteratively, but eventually the whole sequence object will get put  
> into RAM as it is built up.
> zcat or bzcat can also be used for gzipped and bzipped files  
> respectively; I like to use this where I want to keep the disk space
> footprint down.
> 
> Because we treat data input usually as from a stream ignoring whether  
> it is in a file or not, we have to have a more flexible structure to  
> really handle this, although I'd argue the data really belongs in a  
> database when it is too big for memory.
> More compact Feature/Location objects would probably also help here.   
> I would not be surprised if the memory requirement has more to do  
> with the number of features than length of the sequence - human chrom  
> 1 can fit into memory just fine on most machines with 2GB of RAM.
> 
> But it would require someone taking an interest in some re- 
> architecting here.
> 
> -jason
> 
> On Dec 19, 2007, at 9:59 PM, Michael Thon wrote:
> 
> >
> > On Dec 18, 2007, at 7:04 PM, Stefano Ghignone wrote:
> >
> >> my $in  = Bio::SeqIO->new(-file   => "/bin/gunzip -c $infile |",
> >>                           -format => 'EMBL');
> >
> > This is just for the sake of curiosity, since you already found a  
> > solution to your problem, but I wonder how perl will handle a file  
> > opened this way.  Will it try to suck the whole thing into ram in  
> > one go?
> >
> > Mike
> 
> 


From amackey at pcbi.upenn.edu  Thu Dec 20 10:32:19 2007
From: amackey at pcbi.upenn.edu (Aaron Mackey)
Date: Thu, 20 Dec 2007 10:32:19 -0500
Subject: [Bioperl-l] BioPerl and NHX tree
In-Reply-To: <476A7736.109@toulouse.inra.fr>
References: <476A7736.109@toulouse.inra.fr>
Message-ID: <24c96eca0712200732q20523c1co1075c15d056ff634@mail.gmail.com>

The NHX writer will only add the [&&NHX] block when there are tags to
be written.  Your code reads in a Newick tree without tags, and then
writes it back out without adding any new tags.  So yes, you need to
1) read the Newick tree, 2) traverse the tree, calling
$node->nhx_tag({T => $taxon_id}) for each node with each corresponding
$taxon_id, and then 3) write out the NHX tree.
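
A sketch of those three steps follows.  Two things here are assumptions
rather than tested facts: that passing -nodetype => 'Bio::Tree::NodeNHX'
to the Newick reader makes it build NodeNHX nodes, and that taxon ids are
looked up by node label.  The sub name newick_to_nhx and the lookup hash
are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::TreeIO;

# Convert a Newick file to NHX, tagging nodes with taxon ids.
# $taxid_for is a hash ref: node label => taxon id (hypothetical data).
sub newick_to_nhx {
    my ($infile, $outfile, $taxid_for) = @_;

    # 1) Read the Newick tree. ASSUMPTION: -nodetype asks the tree builder
    #    for Bio::Tree::NodeNHX nodes, which provide nhx_tag().
    my $treeio = Bio::TreeIO->new(-format   => 'newick',
                                  -file     => $infile,
                                  -nodetype => 'Bio::Tree::NodeNHX');
    my $treeout = Bio::TreeIO->new(-format => 'nhx',
                                   -file   => ">$outfile");

    while (my $tree = $treeio->next_tree) {
        # 2) Traverse the tree and tag every node we have a taxon id for.
        for my $node ($tree->get_nodes) {
            my $id = $node->id;
            next unless defined $id && exists $taxid_for->{$id};
            next unless $node->can('nhx_tag');   # skip non-NHX node types
            $node->nhx_tag({ T => $taxid_for->{$id} });
        }
        # 3) Write the tagged tree out as NHX.
        $treeout->write_tree($tree);
    }
    $treeout->close;
    return;
}

# Example call with made-up labels and taxon ids.
newick_to_nhx($ARGV[0], $ARGV[1], { human => 9606, mouse => 10090 })
    if @ARGV == 2;
```

If the nodes come back as plain Bio::Tree::Node objects, the can() check
means no tags are added and the output will again lack [&&NHX] blocks.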

-Aaron

On Dec 20, 2007 9:07 AM, Laurence Amilhat
<Laurence.Amilhat at toulouse.inra.fr> wrote:
> Dear Mr MacKey,
>
>
> I am pretty new to tree parsing and writing with BioPerl.
> I am trying to convert a Newick tree file to an NHX tree file, adding
> the taxid for each node in the NHX tree file.
>
> I saw the module Bio::Tree::NodeNHX, but very few examples...
>
> I don't know where I need to start. I tried the easy way with
> Bio::TreeIO,
> but the resulting tree doesn't have the [&&NHX] in the internal nodes,
> and I don't know how to add the tag [&&NHX:T=xxxx] to a node.
> Do I need to use the nhx_tag method to do this?
>
> Maybe you have an example that uses NHX tags in tree nodes; that would be
> very helpful for me in understanding how it works...
>
>
> Have a nice holiday,
>
>
> Best regards,
>
>
> Laurence Amilhat.
>
>
>
>
> This is the simple code that I use to convert a tree from  newick to nhx:
>
> use Bio::TreeIO;
> use Getopt::Long;
> my $tree_file;
> my $outfile;
>
> GetOptions('f|file:s' =>\$tree_file, 'o|out:s' =>\$outfile);
>
> my $treeio = new Bio::TreeIO (-format => 'newick', -file => "$tree_file");
> my $treeout= new Bio::TreeIO (-format => 'nhx', -file =>">$outfile");
>
> while (my $tree= $treeio->next_tree)
> {
>    $treeout->write_tree($tree);
> }
>
> --
> ====================================================================
> = Laurence Amilhat    INRA Toulouse 31326 Castanet-Tolosan         =
> = Tel: 33 5 61 28 53 34   Email: laurence.amilhat at toulouse.inra.fr =
> ====================================================================
>
>
>
>

From cjfields at uiuc.edu  Thu Dec 20 11:14:55 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 20 Dec 2007 10:14:55 -0600
Subject: [Bioperl-l] dealing with large files
In-Reply-To: <JTCPGI$4B9622B6978AB21534CC4DC74CC6BC89@libero.it>
References: <JTCPGI$4B9622B6978AB21534CC4DC74CC6BC89@libero.it>
Message-ID: <FDB7609C-93B8-47C3-A97D-32D73B16651E@uiuc.edu>

As Jason mentioned, it may be the number of features in the record if  
the record itself is huge (i.e. human chromosome-sized, full  
metagenome, etc).  If (my) memory serves correctly the mem. footprint  
for a perl object is ~10x the actual data, give or take (it depends on  
the complexity of the object itself).  In cases like this indexing may  
not fix the problem, unless you have an object which retains the file  
position of the data instead of the data itself; I don't think we have  
this object type in BioPerl.
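
Just to make the file-position idea concrete, here is a minimal sketch in
plain Perl.  It is not an existing BioPerl class; the sub names
build_offset_index and fetch_record are made up.  It assumes an
uncompressed (seekable) flat file whose records end with a "//" line, and
keeps only one byte offset per accession in memory.

```perl
use strict;
use warnings;

# Map the first accession of each record to the byte offset where the
# record starts. Only the offsets are kept, not the record data.
sub build_offset_index {
    my ($fh) = @_;
    my %offset_of;
    my $record_start = tell $fh;
    my $seen_acc     = 0;

    while (my $line = <$fh>) {
        if (!$seen_acc && $line =~ /^AC\s+(\S+?);/) {
            $offset_of{$1} = $record_start;
            $seen_acc = 1;
        }
        elsif ($line =~ m{^//}) {
            $record_start = tell $fh;         # next record starts here
            $seen_acc     = 0;
        }
    }
    return \%offset_of;
}

# Read a single record back from disk, given its byte offset.
sub fetch_record {
    my ($fh, $offset) = @_;
    seek $fh, $offset, 0 or die "seek failed: $!";
    my $record = '';
    while (my $line = <$fh>) {
        $record .= $line;
        last if $line =~ m{^//};
    }
    return $record;
}

# Example: perl fetch.pl big_file.dat ACCESSION
if (@ARGV == 2) {
    my ($path, $acc) = @ARGV;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my $index = build_offset_index($fh);
    die "No record for $acc\n" unless exists $index->{$acc};
    print fetch_record($fh, $index->{$acc});
}
```

This only pays off for random access; it does not work on a gunzip pipe,
since a pipe cannot seek.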

The only way I can think of to fix this would be (as Jason also
suggested) lightweight objects, or something like the lazy sequence
object à la the SwissKnife suite (which only brings what you want into
memory).

Related to that, I have been testing something like that, which uses  
iterators to pass in chunks of data from a stream to handlers to build  
a sequence object.  Wouldn't be too hard to reconfigure that to return  
file positions as well.  Maybe for the 1.7 release...

chris

On Dec 20, 2007, at 7:57 AM, Stefano Ghignone wrote:

> I was wondering whether, when working with such a big file, it would
> be better to first index the database and then query it, formatting
> the sequences as one wants...
>
>> It gets buffered via the OS -- Bio::Root::IO calls next_line
>> iteratively, but eventually the whole sequence object will get put
>> into RAM as it is built up.
>> zcat or bzcat can also be used for gzipped and bzipped files
>> respectively; I like to use this where I want to keep the disk space
>> footprint down.
>>
>> Because we treat data input usually as from a stream ignoring whether
>> it is in a file or not, we have to have a more flexible structure to
>> really handle this, although I'd argue the data really belongs in a
>> database when it is too big for memory.
>> More compact Feature/Location objects would probably also help here.
>> I would not be surprised if the memory requirement has more to do
>> with the number of features than length of the sequence - human chrom
>> 1 can fit into memory just fine on most machines with 2GB of RAM.
>>
>> But it would require someone taking an interest in some re-
>> architecting here.
>>
>> -jason
>>
>> On Dec 19, 2007, at 9:59 PM, Michael Thon wrote:
>>
>>>
>>> On Dec 18, 2007, at 7:04 PM, Stefano Ghignone wrote:
>>>
>>>> my $in  = Bio::SeqIO->new(-file   => "/bin/gunzip -c $infile |",
>>>>                           -format => 'EMBL');
>>>
>>> This is just for the sake of curiosity, since you already found a
>>> solution to your problem, but I wonder how perl will handle a file
>>> opened this way.  Will it try to suck the whole thing into ram in
>>> one go?
>>>
>>> Mike
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From David.Messina at sbc.su.se  Thu Dec 20 11:26:17 2007
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 20 Dec 2007 10:26:17 -0600
Subject: [Bioperl-l] dealing with large files
In-Reply-To: <JTCPGI$4B9622B6978AB21534CC4DC74CC6BC89@libero.it>
References: <JTCPGI$4B9622B6978AB21534CC4DC74CC6BC89@libero.it>
Message-ID: <628aabb70712200826p36d3d451wdcd901f555bc210a@mail.gmail.com>

On 12/20/07, Stefano Ghignone <ste.ghi at libero.it> wrote:
>
> I was wondering whether, when working with such a big file, it would be
> better to first index the database and then query it, formatting the
> sequences as one wants...
>

Agreed, but only if you want to randomly access sequences within the file. I
believe the original poster intends to do something with every sequence in
the big file, in which case streaming the file is likely to be much faster.
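
A single streaming pass of this kind can be sketched in plain Perl. The
helper name flatfile_to_fasta is hypothetical; it follows the same AC/DE/OS
line logic as the one-liner posted earlier in the thread, and it assumes
records end with a "//" line as in UniProt/EMBL flat files.

```perl
use strict;
use warnings;

# Hypothetical helper: convert UniProt/EMBL-style flat-file records to FASTA,
# holding only one record's worth of text in memory at a time.
sub flatfile_to_fasta {
    my ($in, $out) = @_;
    my ($acc, $desc, $org, $seq) = ('', '', '', '');
    while (my $line = <$in>) {
        if ($line =~ /^AC\s+(\S+?);/) {
            $acc ||= $1;                           # keep the first accession only
        }
        elsif ($line =~ /^DE\s+(.*)/) {
            $desc .= ($desc ? ' ' : '') . $1;      # DE may span several lines
        }
        elsif ($line =~ /^OS\s+(.*)/) {
            $org .= ($org ? ' ' : '') . $1;
        }
        elsif ($line =~ /^\s+(\S.*)/) {
            (my $s = $1) =~ s/[\s\d]//g;           # drop spacing and position counts
            $seq .= "$s\n";
        }
        elsif ($line =~ m{^//}) {                  # end of record: emit and reset
            print {$out} ">$acc $desc [$org]\n$seq" if $acc;
            ($acc, $desc, $org, $seq) = ('', '', '', '');
        }
    }
}

# Possible use, reading from a pipe:
#   gunzip -c uniprot_trembl_bacteria.dat.gz | perl convert.pl > out.fasta
# with convert.pl ending in:  flatfile_to_fasta(\*STDIN, \*STDOUT);
```

Because nothing is kept after a record is printed, memory use stays flat no
matter how large the input file is.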


Dave

From akarger at CGR.Harvard.edu  Thu Dec 20 11:48:58 2007
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Thu, 20 Dec 2007 11:48:58 -0500
Subject: [Bioperl-l] dealing with large files
In-Reply-To: <B92AAA3F-C93A-41EC-B68D-3E6F4053BBD4@uiuc.edu>
References: <JTB2JF$8DA035C3FA7C0AA4E73865996D18C568@libero.it>
	<B92AAA3F-C93A-41EC-B68D-3E6F4053BBD4@uiuc.edu>
Message-ID: <B9182BFF5B004245BABC12956EA6322E07CF5C0F@huls5.nucleus.harvard.edu>

 
> -----Original Message-----
> From: Chris Fields [mailto:cjfields at uiuc.edu] 
> 
> 
> On Dec 19, 2007, at 10:45 AM, Stefano Ghignone wrote:
> 
> > At the end, I succeeded in the format conversion using this command:
> >
> > gunzip -c uniprot_trembl_bacteria.dat.gz | perl -ne 'print ">$1 " if
> > (/^AC\s+(\S+);/); print " $1" if (/^DE\s+(.*)/);print " [$1]\n" if
> > (/^OS\s+(.*)/); if (($a)=/^\s+(.*)/){$a=~s/ //g; print "$a\n"};'
> >
> > (Thanks to Riccardo Percudani). It's not bioperl...but it works!
> 
> 
> As this shows, sometimes BioPerl isn't always the best answer 
> (I know,  
> blasphemy...).  As Jason suggested it's quite likely there are large  
> sequence records causing your problems when using BioPerl.  The one- 
> liner works b/c it doesn't retain data (sequence, annotation, 
> etc) in  
> memory as Bio::Seq object; it's a direct conversion.
> 
> It would be nice to code up a lazy sequence object and related  
> parsers; maybe for the next dev release.

Yes!

Also, BLAST parsing. Blasting the proteome against the genome makes for
rather large result files. Right now, if you want to delete queries that
hit, say, more than 1000 times, you still need to wait for Bioperl to
create objects and sub-objects for every single hit. Sadly, this example
isn't hypothetical. I'm going to solve it with something like:

perl -wne 'BEGIN {$/="TBLASTN"} print if length($_) < $some_big_value' \
    big_blast > filtered_blast

(Not that I'm volunteering to help with the parser writing, so I should
stop complaining.)
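
The record-separator trick in that one-liner can be written out as a small
function. This is a sketch under assumptions: filter_reports is a
hypothetical name, and the marker and length cutoff are supplied by the
caller.

```perl
use strict;
use warnings;

# Hypothetical helper: copy a multi-query BLAST report, skipping any
# per-query chunk that is $max_len characters or longer.
sub filter_reports {
    my ($in, $out, $marker, $max_len) = @_;
    local $/ = $marker;          # read one chunk per marker, not per line
    my $kept = 0;
    while (my $chunk = <$in>) {
        # Each chunk ends with the marker that opens the NEXT report, so
        # dropping a chunk still leaves one marker in front of what follows.
        next if length($chunk) >= $max_len;
        print {$out} $chunk;
        $kept++;
    }
    return $kept;
}
```

Chunk length is only a rough stand-in for the number of hits, but it avoids
building any objects at all.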

-Amir


From bix at sendu.me.uk  Thu Dec 20 12:06:28 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 20 Dec 2007 17:06:28 +0000
Subject: [Bioperl-l] dealing with large files
In-Reply-To: <FDB7609C-93B8-47C3-A97D-32D73B16651E@uiuc.edu>
References: <JTCPGI$4B9622B6978AB21534CC4DC74CC6BC89@libero.it>
	<FDB7609C-93B8-47C3-A97D-32D73B16651E@uiuc.edu>
Message-ID: <476AA114.2060201@sendu.me.uk>

Chris Fields wrote:
> The only way I can think of to fix this would be (as Jason also 
> suggested) lightweight objects, or something like the lazy sequence 
> object ala the SwissKnife suite (which only bring what you want into 
> memory).
> 
> Related to that, I have been testing something like that, which uses 
> iterators to pass in chunks of data from a stream to handlers to build a 
> sequence object.  Wouldn't be too hard to reconfigure that to return 
> file positions as well.  Maybe for the 1.7 release...

Bio::PullParserI is your friend.

From bix at sendu.me.uk  Thu Dec 20 13:48:29 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 20 Dec 2007 18:48:29 +0000
Subject: [Bioperl-l] dealing with large files
In-Reply-To: <B9182BFF5B004245BABC12956EA6322E07CF5C0F@huls5.nucleus.harvard.edu>
References: <JTB2JF$8DA035C3FA7C0AA4E73865996D18C568@libero.it>	<B92AAA3F-C93A-41EC-B68D-3E6F4053BBD4@uiuc.edu>
	<B9182BFF5B004245BABC12956EA6322E07CF5C0F@huls5.nucleus.harvard.edu>
Message-ID: <476AB8FD.8090108@sendu.me.uk>

Amir Karger wrote:
>> It would be nice to code up a lazy sequence object and related  
>> parsers; maybe for the next dev release.
> 
> Yes!
> 
> Also, BLAST parsing. Blasting the proteome against the genome makes for
> rather large result files.

This has already been done. Use Bio::SearchIO::blast_pull. In a 
situation like yours I dropped run time from 20223s to
951s (~20x faster) and memory usage from over 8GB to less than 5GB (~40%
less).
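
A sketch of how the pull parser might be used for the "too many hits" case.
It is untested here and makes assumptions: queries_over_limit is a
hypothetical name, and it needs a BioPerl version that ships
Bio::SearchIO::blast_pull. Only the standard Bio::SearchIO calls
(new, next_result, num_hits, query_name) are used.

```perl
use strict;
use warnings;

# Sketch only: list queries with more than $max_hits hits.
# Bio::SearchIO is loaded at call time, so BioPerl must be installed
# before this function is actually called.
sub queries_over_limit {
    my ($report_file, $max_hits) = @_;
    require Bio::SearchIO;
    my $searchio = Bio::SearchIO->new(
        -format => 'blast_pull',     # lazy parser; 'blast' is the standard one
        -file   => $report_file,
    );
    my @over;
    while (my $result = $searchio->next_result) {
        push @over, $result->query_name if $result->num_hits > $max_hits;
    }
    return @over;
}

# Possible use:  my @noisy = queries_over_limit('big_blast', 1000);
```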

From akarger at CGR.Harvard.edu  Thu Dec 20 13:52:51 2007
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Thu, 20 Dec 2007 13:52:51 -0500
Subject: [Bioperl-l] dealing with large files
In-Reply-To: <476AB8FD.8090108@sendu.me.uk>
References: <JTB2JF$8DA035C3FA7C0AA4E73865996D18C568@libero.it>	<B92AAA3F-C93A-41EC-B68D-3E6F4053BBD4@uiuc.edu>
	<B9182BFF5B004245BABC12956EA6322E07CF5C0F@huls5.nucleus.harvard.edu>
	<476AB8FD.8090108@sendu.me.uk>
Message-ID: <B9182BFF5B004245BABC12956EA6322E07CF5C77@huls5.nucleus.harvard.edu>

> Amir Karger wrote:
> >> It would be nice to code up a lazy sequence object and related  
> >> parsers; maybe for the next dev release.
> > 
> > Also, BLAST parsing. Blasting the proteome against the 
> genome makes for
> > rather large result files.
> 
> This has already been done. Use Bio::SearchIO::blast_pull. In a 
> situation like yours I dropped run time from 20223s to
> 951s (~20x faster) and memory usage from over 8GB to less 
> than 5GB (~40%
> less).

Not in 1.5.1. Is it in 1.5.2 or just in cvs? Is there a single file I
can put in my own perl lib for this, or does it require large bunches of
new code? (I'm guessing the latter.) We're about to upgrade to 1.5.2
here, but I don't see our whole center using CVS Bioperl.

-Amir


From cjfields at uiuc.edu  Thu Dec 20 15:27:45 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 20 Dec 2007 14:27:45 -0600
Subject: [Bioperl-l] dealing with large files
In-Reply-To: <476AA114.2060201@sendu.me.uk>
References: <JTCPGI$4B9622B6978AB21534CC4DC74CC6BC89@libero.it>
	<FDB7609C-93B8-47C3-A97D-32D73B16651E@uiuc.edu>
	<476AA114.2060201@sendu.me.uk>
Message-ID: <29E190AB-8A6C-4F1C-BDD1-6034CFFEEFFF@uiuc.edu>

On Dec 20, 2007, at 11:06 AM, Sendu Bala wrote:

> Chris Fields wrote:
>> The only way I can think of to fix this would be (as Jason also  
>> suggested) lightweight objects, or something like the lazy sequence  
>> object ala the SwissKnife suite (which only bring what you want  
>> into memory).
>> Related to that, I have been testing something like that, which  
>> uses iterators to pass in chunks of data from a stream to handlers  
>> to build a sequence object.  Wouldn't be too hard to reconfigure  
>> that to return file positions as well.  Maybe for the 1.7 release...
>
> Bio::PullParserI is your friend.

I'm looking into that, yes.  I'm thinking of something like a generic  
lazy sequence class with an embedded Handler/PullParser object which  
processes stuff on the fly.

Oh, when I have a bit more time...

chris

From cjfields at uiuc.edu  Thu Dec 20 15:39:48 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 20 Dec 2007 14:39:48 -0600
Subject: [Bioperl-l] dealing with large files
In-Reply-To: <B9182BFF5B004245BABC12956EA6322E07CF5C77@huls5.nucleus.harvard.edu>
References: <JTB2JF$8DA035C3FA7C0AA4E73865996D18C568@libero.it>	<B92AAA3F-C93A-41EC-B68D-3E6F4053BBD4@uiuc.edu>
	<B9182BFF5B004245BABC12956EA6322E07CF5C0F@huls5.nucleus.harvard.edu>
	<476AB8FD.8090108@sendu.me.uk>
	<B9182BFF5B004245BABC12956EA6322E07CF5C77@huls5.nucleus.harvard.edu>
Message-ID: <2EC6A1C2-FBC9-45F6-AD1B-040E29FAFA28@uiuc.edu>


On Dec 20, 2007, at 12:52 PM, Amir Karger wrote:

>> Amir Karger wrote:
>>>> It would be nice to code up a lazy sequence object and related
>>>> parsers; maybe for the next dev release.
>>>
>>> Also, BLAST parsing. Blasting the proteome against the
>> genome makes for
>>> rather large result files.
>>
>> This has already been done. Use Bio::SearchIO::blast_pull. In a
>> situation like yours I dropped run time from 20223s to
>> 951s (~20x faster) and memory usage from over 8GB to less
>> than 5GB (~40%
>> less).
>
> Not in 1.5.1. Is it in 1.5.2 or just in cvs? Is there a single file I
> can put in my own perl lib for this, or does it require large  
> bunches of
> new code? (I'm guessing the latter.) We're about to upgrade to 1.5.2
> here, but I don't see our whole center using CVS Bioperl.
>
> -Amir

It's in CVS.

Just to note: there have been a lot of changes between 1.5.1 and  
1.5.2, and probably as many from 1.5.2 to now.  We are cleaning up  
some code introduced prior to the 1.5 release and working on other  
fixes and code docs, with the final aim to be a new 1.6; I'm hoping  
that release will have routine point releases for bug fixes.  Of  
course that'll have to wait until after SVN migration!

There are a few discussions on the list about speeding up parsing using  
lightweight/featherweight objects or even straight hashes (for  
instance, Jason has a lightweight seqfeature implementation committed  
on a branch which is quite fast, and Sendu's Bio::SearchIO PullParser  
implementations).  My feeling is that will be part of the next dev  
release, along with GFF3 integration and code cleanup.

chris

From bix at sendu.me.uk  Thu Dec 20 18:29:30 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 20 Dec 2007 23:29:30 +0000
Subject: [Bioperl-l] dealing with large files
In-Reply-To: <B9182BFF5B004245BABC12956EA6322E07CF5C77@huls5.nucleus.harvard.edu>
References: <JTB2JF$8DA035C3FA7C0AA4E73865996D18C568@libero.it>	<B92AAA3F-C93A-41EC-B68D-3E6F4053BBD4@uiuc.edu>	<B9182BFF5B004245BABC12956EA6322E07CF5C0F@huls5.nucleus.harvard.edu>	<476AB8FD.8090108@sendu.me.uk>
	<B9182BFF5B004245BABC12956EA6322E07CF5C77@huls5.nucleus.harvard.edu>
Message-ID: <476AFADA.20604@sendu.me.uk>

Amir Karger wrote:
>> Amir Karger wrote:
>>>> It would be nice to code up a lazy sequence object and related  
>>>> parsers; maybe for the next dev release.
>>> Also, BLAST parsing. Blasting the proteome against the 
>>> genome makes for rather large result files.
>> This has already been done. Use Bio::SearchIO::blast_pull. In a 
>> situation like yours I dropped run time from 20223s to
>> 951s (~20x faster) and memory usage from over 8GB to less 
>> than 5GB (~40% less).
> 
> Not in 1.5.1. Is it in 1.5.2 or just in cvs? Is there a single file I
> can put in my own perl lib for this, or does it require large bunches of
> new code? (I'm guessing the latter.) We're about to upgrade to 1.5.2
> here, but I don't see our whole center using CVS Bioperl.

blast_pull is only in CVS (and needs a whole bunch of associated modules 
to work), though 1.5.2 also contains significant improvements to 
SearchIO generally which should provide you with significant speed 
improvements during blast parsing with the normal Bio::SearchIO::blast.

From abdul.sattar4 at ntlworld.com  Thu Dec 20 19:32:06 2007
From: abdul.sattar4 at ntlworld.com (Abdul Sattar)
Date: Fri, 21 Dec 2007 00:32:06 -0000
Subject: [Bioperl-l]  bioperl-db & biperl version
Message-ID: <000001c84368$ee7872b0$c5836351@owner00d4289a7>

BFG-0DRTGO0EEGREWTYU


From DGroskreutz at twt.com  Fri Dec 21 02:01:27 2007
From: DGroskreutz at twt.com (DGroskreutz at twt.com)
Date: Fri, 21 Dec 2007 01:01:27 -0600
Subject: [Bioperl-l] Groskreutz, Deb is out of the office.
Message-ID: <OF1CBDB887.820A02D2-ON862573B8.002695BD-862573B8.002695BD@twt.com>


I will be out of the office starting  12/20/2007 and will not return until
01/01/2008.

I will respond to your message when I return on January 2nd, 2008


NOTICE OF CONFIDENTIALITY:
The information contained in this communication, including attachments, is intended for the specific delivery to and use by the individual(s) to whom it is addressed. This email includes confidential information that may be attorney-client privileged. Any review, retransmission, dissemination, or unauthorized use of this communication is strictly prohibited and may be unlawful. If you have received this communication in error, please reply to the sender immediately and delete the original communication and any copy of it from your computer system, including all attachments.


From bug-bioperl at rt.cpan.org  Fri Dec 21 07:07:39 2007
From: bug-bioperl at rt.cpan.org (Brandi Cantarel via RT)
Date: Fri, 21 Dec 2007 07:07:39 -0500
Subject: [Bioperl-l] [rt.cpan.org #31796] SeqIO
In-Reply-To: <5F694A96-AC4B-4279-8060-9E28A92837ED@afmb.univ-mrs.fr>
References: <RT-Ticket-31796@rt.cpan.org>
	<5F694A96-AC4B-4279-8060-9E28A92837ED@afmb.univ-mrs.fr>
Message-ID: <rt-3.6.HEAD-25638-1198238855-470.31796-4-0@rt.cpan.org>


Fri Dec 21 07:07:30 2007: Request 31796 was acted upon.
Transaction: Ticket created by brandi.cantarel at afmb.univ-mrs.fr
       Queue: bioperl
     Subject: SeqIO
   Broken in: (no value)
    Severity: (no value)
       Owner: Nobody
  Requestors: brandi.cantarel at afmb.univ-mrs.fr
      Status: new
 Ticket <URL: http://rt.cpan.org/Ticket/Display.html?id=31796 >


I might have found a bug in SeqIO in bioperl.  Well, it is actually a  
memory leak.  When I try to load a large file, I can step through the  
first 10K or so sequences (using next_seq) but then it just hangs.....

If this bug is fixed please let me know.

Brandi Cantarel

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From bug-bioperl at rt.cpan.org  Fri Dec 21 08:57:20 2007
From: bug-bioperl at rt.cpan.org (Sendu Bala via RT)
Date: Fri, 21 Dec 2007 08:57:20 -0500
Subject: [Bioperl-l] [rt.cpan.org #31796] SeqIO
In-Reply-To: <5F694A96-AC4B-4279-8060-9E28A92837ED@afmb.univ-mrs.fr>
References: <RT-Ticket-31796@rt.cpan.org>
	<5F694A96-AC4B-4279-8060-9E28A92837ED@afmb.univ-mrs.fr>
Message-ID: <rt-3.6.HEAD-25615-1198245436-879.31796-5-0@rt.cpan.org>


       Queue: bioperl
 Ticket <URL: http://rt.cpan.org/Ticket/Display.html?id=31796 >

On Fri Dec 21 07:07:30 2007, brandi.cantarel at afmb.univ-mrs.fr wrote:
> I might have found a bug in SeqIO in bioperl.  Well it is actually a  
> memory leak.  When I try to load large file, I can step through the  
> first 10K or so sequences (using next_seq) but then it just hangs.....
> 
> If this bug is fixed please let me know.

Please use http://bugzilla.bioperl.org/ to tell us about this bug. 
After creating a bug report you'll be able to attach the script in 
which you encounter the problem, which we need to diagnose this issue.

From susantoroy at gmail.com  Sat Dec 22 07:06:42 2007
From: susantoroy at gmail.com (Susanta Roy)
Date: Sat, 22 Dec 2007 17:36:42 +0530
Subject: [Bioperl-l] Enquiry about bioperl project
Message-ID: <236a58340712220406m3d3f9884h8f7b5e58bdfb356@mail.gmail.com>

Dear Sir,


Most humbly I have to state that I am Susanta Roy, 25 years and I have
done  my masters in bioinformatics. I have more than  nine months of work
experience as Associate Technical Content  Developer. I have also worked
in the journal "Bioinformatics  India" (The first bioinformatics journal
of India, now "Bioinformatics Trends"). My work with  previous employer
was highly appreciated.

This year I have founded Bioexplore, a bioinformatics KPO (Knowledge
Process Outsourcing) due to lack of bioinformatics jobs in India.

Our services include

1. Bioinformatics data mining / programming
2. HR solution
3. Technical writing solution
4. E-learning
5. Abstracting & indexing
6. Business promotion solution

I want to inquire if you can give me a project.

-- Looking forward to your reply.

Kind Regards
Mr. Susanta Roy, MS Bioinformatics
Founder Director
Bioexplore
C-5, Hazipark Market
Dimapur, Nagaland - 797112
India
+ 91 - 9811517324 (Mobile)
susanta.roy at bioexplore.co.in
susantoroy at gmail.com

From alan.bridge at isb-sib.ch  Sun Dec  2 13:29:48 2007
From: alan.bridge at isb-sib.ch (Alan Bridge)
Date: Sun, 02 Dec 2007 19:29:48 +0100
Subject: [Bioperl-l] Bio::Tools::Run::RemoteBlast
Message-ID: <4752F99C.9050504@isb-sib.ch>

Hello,

I was just wondering if, when performing a RemoteBlast, it would be 
possible to specify the entire UniProt database (i.e. Swiss-Prot + 
TrEMBL), or even just TrEMBL.

It seems that currently, you can only specify Swiss-Prot (the annotated 
portion of UniProt, which is much smaller than its automatically 
annotated counterpart, TrEMBL). Any hints on how to expand the search 
space to include TrEMBL would be really appreciated.

Regards, Alan Bridge

            my $prog = 'blastp';
            my $db   = 'swissprot'; # use TrEMBL ?
            my $e_val= '1e-10';

            my @params = ( '-prog' => $prog, '-data' => $db,
                           '-expect' => $e_val, '-readmethod' => 'SearchIO' );

-- 
Alan Bridge PhD
Swiss-Prot annotator
Swiss Institute of Bioinformatics (SIB)
1, rue Michel Servet
CH-1211 Geneva 4  
Switzerland   

Tel: (+41 22) 379 58 90
Fax: (+41 22) 379 58 58 

http://www.expasy.org/ 


From avilella at gmail.com  Mon Dec  3 06:39:59 2007
From: avilella at gmail.com (Albert Vilella)
Date: Mon, 3 Dec 2007 11:39:59 +0000
Subject: [Bioperl-l] Query about SLAC.pm module
In-Reply-To: <OF3E7AF746.CBFC96D8-ONC12573A6.00374CAE-C12573A6.00374CB1@sh.se>
References: <OF3E7AF746.CBFC96D8-ONC12573A6.00374CAE-C12573A6.00374CB1@sh.se>
Message-ID: <358f4d650712030339w2f3de057ge5614e60a3f6658c@mail.gmail.com>

[CCing to the bioperl ml]

Sorry, there were some bits left in the POD header referring to PAML
objects that aren't quite true.
I've now updated the PODs. The Hyphy executions return hashes.

If you run the SLAC test in t/Hyphy.t you will see that the $results
are something like:

DB<3> x 2 $results
0  HASH(0x8df3110)
   'E[NS Sites]' => ARRAY(0x8e6cff4)
   'E[S Sites]' => ARRAY(0x8e6ceb0)
   'Observed NS Changes' => ARRAY(0x8e7b380)
   'Observed S Changes' => ARRAY(0x8e7b344)
   'Observed S. Prop.' => ARRAY(0x8e6d018)
   'P{S geq. observed}' => ARRAY(0x8e6d360)
   'P{S leq. observed}' => ARRAY(0x8e6d33c)
   'P{S}' => ARRAY(0x8e6d03c)
   'Scaled dN-dS' => ARRAY(0x8e6d384)
   'dN' => ARRAY(0x8e6d084)
   'dN-dS' => ARRAY(0x8e6d0a8)
   'dS' => ARRAY(0x8e6d060)
  DB<4> x $rc

which correspond to the csv file that hyphy produces.
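
Since the result is a plain hash of array references, it can be walked
without any object methods. A small sketch with made-up numbers: the helper
rows_from_columns is hypothetical, and it assumes each array holds one value
per site in the same order, as the columns of the csv file would.

```perl
use strict;
use warnings;

# Hypothetical helper: turn the column-oriented hash back into one row per site.
# Assumes every array holds one value per site, in the same order.
sub rows_from_columns {
    my ($results, @columns) = @_;
    my $n = @{ $results->{ $columns[0] } };
    return map {
        my $i = $_;
        [ map { $results->{$_}[$i] } @columns ]
    } 0 .. $n - 1;
}

# Made-up numbers, only to show the shape of the data.
my $results = {
    'dN'    => [ 0.5, 1.2, 0.0 ],
    'dS'    => [ 1.0, 0.4, 0.3 ],
    'dN-dS' => [ -0.5, 0.8, -0.3 ],
};

# Print one tab-separated line per site: site number, dN, dS, dN-dS.
my $site = 1;
for my $row (rows_from_columns($results, 'dN', 'dS', 'dN-dS')) {
    print join("\t", $site++, @$row), "\n";
}
```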

Cheers,

    Albert.

On Dec 3, 2007 10:04 AM, Johan Nilsson <johan.nilsson at sh.se> wrote:
>
> Dear Dr. Vilella,
>
> Please allow me to introduce myself. My name is Johan Nilsson and I am a
> postdoctoral researcher in bioinformatics.
>
> I was  planning to perform a large-scale analysis for positively selected
> protein coding genes using any appropriate method from the Hyphy package,
> and I thought your bioperl wrappers 'SLAC.pm', 'FEL.pm' etc. should be very
> useful for this.
>
> If I interpreted the documentation of e.g. the SLAC module correctly, running
> $slac->run($aln,$tree) will return a
> Bio::Tools::Phylo::PAML object. However, when I try to retrieve any results
> from the obtained hashref (running my script on the test files provided
> with bioperl ...t/hyphy1.tree and ...t/hyphy1.fasta), the script complains
> that it is not blessed (e.g. 'Can't call method "get_seqs" on unblessed
> reference').
>
> I am fairly new to bioperl, so apologies if this question was a
> stupid one :)
>
> Thanks in advance!
>
> Yours Sincerely
> /Johan
>
> --
> Johan Nilsson, Ph.D.
> School of Life Sciences
> Södertörns University College
> S-141 89 Huddinge, Sweden
> E-mail: johan.nilsson at sh.se
> Phone: +46 8 608 47 05, +46 70 456 10 51
>
>


From cjfields at uiuc.edu  Mon Dec  3 09:04:06 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 3 Dec 2007 08:04:06 -0600
Subject: [Bioperl-l] Bio::Tools::Run::RemoteBlast
In-Reply-To: <4752F99C.9050504@isb-sib.ch>
References: <4752F99C.9050504@isb-sib.ch>
Message-ID: <CF967851-5E6C-448A-87C6-CC3F63A5D9AD@uiuc.edu>

You are limited to the databases hosted on the NCBI server, so it's  
really up to them; RemoteBlast is an interface to NCBI's WebBlast  
using URLAPI.

A list of current databases can be found here:

http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/remote_blastdblist.html

chris

On Dec 2, 2007, at 12:29 PM, Alan Bridge wrote:

> Hello,
>
> I was just wondering if, when performing a RemoteBlast, it would be
> possible to specify the entire UniProt database (i.e. Swiss-Prot +
> TrEMBL), or even just TrEMBL.
>
> It seems that currently, you can only specify Swiss-Prot (the  
> annotated
> portion of UniProt, which is much smaller than its automatically
> annotated counterpart, TrEMBL). Any hints on how to expand the search
> space to include TrEMBL would be really appreciated.
>
> Regards, Alan Bridge
>
>            my $prog = 'blastp';
>            my $db   = 'swissprot'; # use TrEMBL ?
>            my $e_val= '1e-10';
>
>            my @params = ( '-prog' => $prog, '-data' => $db,
>                           '-expect' => $e_val, '-readmethod' => 'SearchIO' );
>
> -- 
> Alan Bridge PhD
> Swiss-Prot annotator
> Swiss Institute of Bioinformatics (SIB)
> 1, rue Michel Servet
> CH-1211 Geneva 4
> Switzerland
>
> Tel: (+41 22) 379 58 90
> Fax: (+41 22) 379 58 58
>
> http://www.expasy.org/


From bioperl at boekhoff.info  Mon Dec  3 14:14:24 2007
From: bioperl at boekhoff.info (Sven Boekhoff)
Date: Mon, 03 Dec 2007 20:14:24 +0100
Subject: [Bioperl-l] [StandAloneBLAST] Use more than one CPU + avoid BLAST
	reload
Message-ID: <47545590.1000703@boekhoff.info>

Hi!
I just started working with Perl and BioPerl. I'm quite impressed by what 
can be easily done with this module. Today I found that my second CPU 
is not used, but the first one runs at 100%. I tried to include the 
"-a" parameter, but I was not successful:

my @params = (
	-database => 'my_db',
	-a => '2',
	-outfile => 'blast1.out'
);

How do I have to use it?

Second question: In my Perl script I start BLAST searches in a loop. 
Every time BLAST has finished its search, the memory is cleared and BLAST 
is started again. I think most of the time is spent reloading the 
database. Is it somehow possible to keep the database loaded (e.g. by 
starting a second search) or is BLAST reloaded anyway?

Thanks for your help!

Sven


www.boekhoff.info


From bix at sendu.me.uk  Mon Dec  3 19:05:23 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 04 Dec 2007 00:05:23 +0000
Subject: [Bioperl-l] [StandAloneBLAST] Use more than one CPU + avoid
 BLAST reload
In-Reply-To: <47545590.1000703@boekhoff.info>
References: <47545590.1000703@boekhoff.info>
Message-ID: <475499C3.20801@sendu.me.uk>

Sven Boekhoff wrote:
> HI!
> I just started working with Perl and BioPerl. I'm quite impressed what 
> can be easily done with this module. Today I found that my second CPU 
> ist not used, but the first one run's at 100%. I tried to include the 
> "-a"-parameter, but I was not successful:
> 
> my @params = (
> 	-database => 'my_db',
> 	-a => '2',
> 	-outfile => 'blast1.out'
> );
> 
> How do I have to use it?

This should work in the CVS version of StandAloneBlast. In other 
versions, perhaps try using $object->a(2);


> Second question: In my perlscript I start BLAST-searches in a loop. 
> Everytime BLAST has finished its search, the memory is cleared and BLAST 
> is started again. I think most of the time is used to reload the 
> database. Is it somehow possible to keep the database loaded (e.g. by 
> starting a second search) or is BLAST reloaded anyway?

I hope someone will correct me if I'm wrong, but I think you'd have 
to do that with a 2-way pipe. StandAloneBlast only uses output to a file 
and input from that file, with the executable finishing in between. I've 
thought about improving it with a 2-way pipe, but never got around to 
it, being apprehensive about stability on all platforms.

The more obvious solution, which may be possible depending on exactly 
what you're doing, is to avoid the loop and just supply Blast all your 
input in one go.
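
A sketch of the "all input in one go" approach, untested and under
assumptions: blast_all_at_once is a hypothetical name, the parameter list is
whatever your StandAloneBlast version accepts, and the a() call is the
processor-count setter suggested above. It relies on blastall accepting a
FASTA file name as input.

```perl
use strict;
use warnings;

# Sketch only: run one blastall job over every sequence in a FASTA file
# instead of starting blastall once per sequence.
# Needs BioPerl and a local blastall; the module is loaded at call time.
sub blast_all_at_once {
    my ($fasta_file, @params) = @_;
    require Bio::Tools::Run::StandAloneBlast;

    my $factory = Bio::Tools::Run::StandAloneBlast->new(@params);
    $factory->a(2);                         # ask blastall for two processors
    return $factory->blastall($fasta_file); # one call, one database load
}

# Possible use:
#   my $report = blast_all_at_once('queries.fasta',
#       -program => 'blastp', -database => 'my_db', -outfile => 'blast1.out');
#   while (my $result = $report->next_result) { ... }
```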


From Russell.Smithies at agresearch.co.nz  Mon Dec  3 19:49:21 2007
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Tue, 4 Dec 2007 13:49:21 +1300
Subject: [Bioperl-l] Bio::Assembly::IO problems reading .ace files
In-Reply-To: <475499C3.20801@sendu.me.uk>
References: <47545590.1000703@boekhoff.info> <475499C3.20801@sendu.me.uk>
Message-ID: <D5DBA313349A4B458528BE63B387F36C062D5E2A@imail.agresearch.co.nz>

Hi all,

I'm trying to read .ace files but keep getting an error that I don't
know the cause of.
Really basic example code:

	#!/usr/local/bin/perl
	use strict;
	use warnings;

	use lib "/data/home/smithiesr/bioperl-live";
	use Bio::Assembly::IO;
	use Data::Dumper;

	my $ace = "CLP0001001240-cE15_20030319.ace";

	# Read the first assembly from the ACE file
	my $io = Bio::Assembly::IO->new(-file => $ace, -format => "ace");
	my $assembly = $io->next_assembly;

	# Dump every contig in the assembly
	foreach my $contig ($assembly->all_contigs) {
		print Dumper($contig);
	}

Gives this error:
	[smithiesr at impala ace_phrap]$ perl bp_read_ace.pl
	Can't call method "get_consensus_sequence" on an undefined value
	at /data/home/smithiesr/bioperl-live/Bio/Assembly/IO/ace.pm line 170,
	<GEN0> line 42.

Which relates to this bit in ace.pm:
	# Loading contig qualities... (Base Quality field)
	/^BQ/ && do {
	    my $consensus = $contigOBJ->get_consensus_sequence()->seq();

Is this caused by a dud ace file or a problem with Bio::Assembly::IO::ace,
or is the Contig object not getting created?
Any ideas?

Thanx,

Russell Smithies

Bioinformatics Software Developer
T +64 3 489 9085
E  russell.smithies at agresearch.co.nz

Invermay  Research Centre
Puddle Alley, 
Mosgiel, 
New Zealand
T  +64 3 489 3809   
F  +64 3 489 9174  
www.agresearch.co.nz

=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================


From cjfields at uiuc.edu  Mon Dec  3 21:15:58 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 3 Dec 2007 20:15:58 -0600
Subject: [Bioperl-l] Bio::Assembly::IO problems reading .ace files
In-Reply-To: <D5DBA313349A4B458528BE63B387F36C062D5E2A@imail.agresearch.co.nz>
References: <47545590.1000703@boekhoff.info> <475499C3.20801@sendu.me.uk>
	<D5DBA313349A4B458528BE63B387F36C062D5E2A@imail.agresearch.co.nz>
Message-ID: <692A2BDA-048B-49C6-A101-C13A1DAB9B69@uiuc.edu>

This seems similar to the 'too many open filehandles' issue documented  
here:

http://bugzilla.open-bio.org/show_bug.cgi?id=2320

It unfortunately is due to having an open DB_File for every contig,  
and is a problem with the Bio::Assembly implementation that isn't  
easily fixed.  Changing the open filehandle limit using ulimit is the  
only known fix:

ulimit -n 10000

chris

On Dec 3, 2007, at 6:49 PM, Smithies, Russell wrote:

> Hi all,
>
> It' trying to read .ace files but keep getting an error that I don't
> know the cause of.
> Really basic example code:
>
> 	#!/usr/local/bin/perl -w
>
> 	use lib "/data/home/smithiesr/bioperl-live";
> 	use Bio::Assembly::IO;
> 	use Data::Dumper;
>
> 	$ace = "CLP0001001240-cE15_20030319.ace";
>
> 	$io = new Bio::Assembly::IO(-file=>$ace,-format=>"ace");
> 	$assembly = $io->next_assembly;
>
> 	foreach $contig ($assembly->all_contigs) {
>      		print Dumper $contig;
> 	}
>
> Gives this error;
> 	[smithiesr at impala ace_phrap]$ perl bp_read_ace.pl
> 	Can't call method "get_consensus_sequence" on an undefined value
> at /data/home/smithiesr/bioperl-live/Bio/Assembly/IO/ace.pm line 170,
> <GEN0> line 42.
>
> Which relates to this bit in ace.pm:
> 	# Loading contig qualities... (Base Quality field)
> 	/^BQ/ && do {
> 	    my $consensus = $contigOBJ->get_consensus_sequence()->seq();
>
> Is this caused by a dud ace file or a problem with  
> Bio::Assembly::IO:ace
> or is the Contig object not getting created?
> Any ideas?
>
> Thanx,
>
> Russell Smithies
>
> Bioinformatics Software Developer
> T +64 3 489 9085
> E  russell.smithies at agresearch.co.nz
>
> Invermay  Research Centre
> Puddle Alley,
> Mosgiel,
> New Zealand
> T  +64 3 489 3809
> F  +64 3 489 9174
> www.agresearch.co.nz
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From florent.angly at gmail.com  Mon Dec  3 21:25:24 2007
From: florent.angly at gmail.com (Florent Angly)
Date: Mon, 03 Dec 2007 18:25:24 -0800
Subject: [Bioperl-l] Bio::Assembly::IO problems reading .ace files
In-Reply-To: <692A2BDA-048B-49C6-A101-C13A1DAB9B69@uiuc.edu>
References: <47545590.1000703@boekhoff.info>
	<475499C3.20801@sendu.me.uk>	<D5DBA313349A4B458528BE63B387F36C062D5E2A@imail.agresearch.co.nz>
	<692A2BDA-048B-49C6-A101-C13A1DAB9B69@uiuc.edu>
Message-ID: <4754BA94.7090600@gmail.com>

Would this issue cause excessive memory usage? I was getting 
high memory usage when parsing some TIGR Assembler files and was 
wondering whether the tigr parser or the parent assembly IO module 
was responsible for that.
I'd definitely be interested in a fix of the Bio::Assembly 
implementation if it's the assembly IO module's fault....
Florent

Chris Fields wrote:
> This seems similar to the 'too many open filehandles issue' documented  
> here:
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2320
>
> It unfortunately is due to having an open DB_File for every contig,  
> and is a problem with the Bio::Assembly implementation that isn't  
> easily fixed.  Changing the open filehandle limit using ulimit is the  
> only known fix:
>
> ulimit -n 10000
>
> chris
>
> On Dec 3, 2007, at 6:49 PM, Smithies, Russell wrote:
>
>   
>> Hi all,
>>
>> It' trying to read .ace files but keep getting an error that I don't
>> know the cause of.
>> Really basic example code:
>>
>> 	#!/usr/local/bin/perl -w
>>
>> 	use lib "/data/home/smithiesr/bioperl-live";
>> 	use Bio::Assembly::IO;
>> 	use Data::Dumper;
>>
>> 	$ace = "CLP0001001240-cE15_20030319.ace";
>>
>> 	$io = new Bio::Assembly::IO(-file=>$ace,-format=>"ace");
>> 	$assembly = $io->next_assembly;
>>
>> 	foreach $contig ($assembly->all_contigs) {
>>      		print Dumper $contig;
>> 	}
>>
>> Gives this error;
>> 	[smithiesr at impala ace_phrap]$ perl bp_read_ace.pl
>> 	Can't call method "get_consensus_sequence" on an undefined value
>> at /data/home/smithiesr/bioperl-live/Bio/Assembly/IO/ace.pm line 170,
>> <GEN0> line 42.
>>
>> Which relates to this bit in ace.pm:
>> 	# Loading contig qualities... (Base Quality field)
>> 	/^BQ/ && do {
>> 	    my $consensus = $contigOBJ->get_consensus_sequence()->seq();
>>
>> Is this caused by a dud ace file or a problem with  
>> Bio::Assembly::IO::ace
>> or is the Contig object not getting created?
>> Any ideas?
>>
>> Thanx,
>>
>> Russell Smithies
>>
>> Bioinformatics Software Developer
>> T +64 3 489 9085
>> E  russell.smithies at agresearch.co.nz
>>
>> Invermay  Research Centre
>> Puddle Alley,
>> Mosgiel,
>> New Zealand
>> T  +64 3 489 3809
>> F  +64 3 489 9174
>> www.agresearch.co.nz
>>
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>


From Russell.Smithies at agresearch.co.nz  Mon Dec  3 21:32:43 2007
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Tue, 4 Dec 2007 15:32:43 +1300
Subject: [Bioperl-l] Bio::Assembly::IO problems reading .ace files
In-Reply-To: <692A2BDA-048B-49C6-A101-C13A1DAB9B69@uiuc.edu>
References: <47545590.1000703@boekhoff.info> <475499C3.20801@sendu.me.uk>
	<D5DBA313349A4B458528BE63B387F36C062D5E2A@imail.agresearch.co.nz>
	<692A2BDA-048B-49C6-A101-C13A1DAB9B69@uiuc.edu>
Message-ID: <D5DBA313349A4B458528BE63B387F36C062D5E91@imail.agresearch.co.nz>

Thanx Chris,
I'm only writing a simple .ace viewer to display assembled contigs in a
Bio::Graphics::Panel so I'll parse the coords from the .ace files
"manually".
Unless anyone else has a better idea ?
(and some example code ;-)
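
One way the "manual" parsing mentioned above could look, as a minimal sketch rather than a tested replacement for Bio::Assembly::IO. It assumes the usual phrap/consed ACE layout, where CO lines open a contig, AF lines give each read's orientation and padded start position in the consensus, and RD lines give each read's padded length. The sub names parse_ace_coords and print_ace_coords and the returned hash layout are made up for illustration; free-text blocks such as CT{...} are not handled specially.

```perl
use strict;
use warnings;

# Hypothetical helper: collect contig lengths and read placements from an
# open ACE filehandle. Returns a hash reference keyed by contig name.
sub parse_ace_coords {
    my ($fh) = @_;
    my (%contigs, %reads_in_contig, $contig);

    while (my $line = <$fh>) {
        if ($line =~ /^CO\s+(\S+)\s+(\d+)\s+(\d+)/) {
            # CO <contig name> <padded bases> <number of reads> ...
            $contig = $1;
            $contigs{$contig} = { length => $2, reads => [] };
            %reads_in_contig = ();
        }
        elsif (defined $contig && $line =~ /^AF\s+(\S+)\s+([UC])\s+(-?\d+)/) {
            # AF <read name> <U|C> <padded start position in the consensus>
            my $read = { name => $1, strand => ($2 eq 'C' ? -1 : 1), start => $3 };
            push @{ $contigs{$contig}{reads} }, $read;
            $reads_in_contig{$1} = $read;
        }
        elsif (defined $contig && $line =~ /^RD\s+(\S+)\s+(\d+)/) {
            # RD <read name> <padded bases> ...; the end follows from the AF start
            my $read = $reads_in_contig{$1} or next;
            $read->{end} = $read->{start} + $2 - 1;
        }
    }
    return \%contigs;
}

# Print one line per contig and per read, e.g. to feed start/end into a track.
sub print_ace_coords {
    my ($contigs) = @_;
    for my $name (sort keys %$contigs) {
        print "$name\t$contigs->{$name}{length}\n";
        for my $read (@{ $contigs->{$name}{reads} }) {
            my $end = defined $read->{end} ? $read->{end} : '?';
            print join("\t", '', $read->{name}, $read->{start}, $end, $read->{strand}), "\n";
        }
    }
}
```

To try it, open the .ace file with a lexical filehandle and call print_ace_coords(parse_ace_coords($fh)). Because nothing is tied to a DB_File, no extra filehandles are opened per contig.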

Russell


> -----Original Message-----
> From: Chris Fields [mailto:cjfields at uiuc.edu]
> Sent: Tuesday, 4 December 2007 3:16 p.m.
> To: Smithies, Russell
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bio::Assembly::IO problems reading .ace files
> 
> This seems similar to the 'too many open filehandles issue' documented
> here:
> 
> http://bugzilla.open-bio.org/show_bug.cgi?id=2320
> 
> It unfortunately is due to having an open DB_File for every contig,
> and is a problem with the Bio::Assembly implementation that isn't
> easily fixed.  Changing the open filehandle limit using ulimit is the
> only known fix:
> 
> ulimit -n 10000
> 
> chris
> 
> On Dec 3, 2007, at 6:49 PM, Smithies, Russell wrote:
> 
> > Hi all,
> >
> > I'm trying to read .ace files but keep getting an error that I don't
> > know the cause of.
> > Really basic example code:
> >
> > 	#!/usr/local/bin/perl -w
> >
> > 	use lib "/data/home/smithiesr/bioperl-live";
> > 	use Bio::Assembly::IO;
> > 	use Data::Dumper;
> >
> > 	$ace = "CLP0001001240-cE15_20030319.ace";
> >
> > 	$io = new Bio::Assembly::IO(-file=>$ace,-format=>"ace");
> > 	$assembly = $io->next_assembly;
> >
> > 	foreach $contig ($assembly->all_contigs) {
> >      		print Dumper $contig;
> > 	}
> >
> > Gives this error;
> > 	[smithiesr at impala ace_phrap]$ perl bp_read_ace.pl
> > 	Can't call method "get_consensus_sequence" on an undefined value
> > at /data/home/smithiesr/bioperl-live/Bio/Assembly/IO/ace.pm line 170,
> > <GEN0> line 42.
> >
> > Which relates to this bit in ace.pm:
> > 	# Loading contig qualities... (Base Quality field)
> > 	/^BQ/ && do {
> > 	    my $consensus = $contigOBJ->get_consensus_sequence()->seq();
> >
> > Is this caused by a dud ace file or a problem with
> > Bio::Assembly::IO::ace
> > or is the Contig object not getting created?
> > Any ideas?
> >
> > Thanx,
> >
> > Russell Smithies
> >
> > Bioinformatics Software Developer
> > T +64 3 489 9085
> > E  russell.smithies at agresearch.co.nz
> >
> > Invermay  Research Centre
> > Puddle Alley,
> > Mosgiel,
> > New Zealand
> > T  +64 3 489 3809
> > F  +64 3 489 9174
> > www.agresearch.co.nz
> >
> 
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
> 
> 

=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================


From cjfields at uiuc.edu  Tue Dec  4 00:10:57 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 3 Dec 2007 23:10:57 -0600
Subject: [Bioperl-l] Bio::Assembly::IO problems reading .ace files
In-Reply-To: <4754BA94.7090600@gmail.com>
References: <47545590.1000703@boekhoff.info>
	<475499C3.20801@sendu.me.uk>	<D5DBA313349A4B458528BE63B387F36C062D5E2A@imail.agresearch.co.nz>
	<692A2BDA-048B-49C6-A101-C13A1DAB9B69@uiuc.edu>
	<4754BA94.7090600@gmail.com>
Message-ID: <4F867A88-C0DC-4DF7-9F47-C38712920183@uiuc.edu>

Yes, it's possible this would cause memory issues, as each  
Bio::Assembly::Contig instance would have a  
Bio::SeqFeature::Collection attached (each Collection having a tied DB  
hash, which would be an open filehandle).  So if you had over 1000  
contigs open at any one time (in a parsed scaffold, for instance) you  
would have over 1000 open file handles.  Not very efficient.

My thought was to have each Bio::Assembly::Scaffold instance carry a  
single Bio::SeqFeature::CollectionI (it could be a  
Bio::SeqFeature::Collection, Bio::DB::SeqFeature::Store, or any other  
CollectionI, whatever's easiest).  Each Contig would be passed (and  
store) a reference to the Scaffold SF::Collection and pull features  
from there; just haven't had time to mess with it.  I don't think  
anyone's tackling it, so feel free to code away!
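
To make the proposal concrete, here is a minimal sketch of the ownership pattern only. The Sketch::* packages are hypothetical stand-ins, not BioPerl classes; in a real patch the store would be whichever Bio::SeqFeature::CollectionI implementation the Scaffold holds, and the plain hash used here is just an assumption to keep the example self-contained.

```perl
use strict;
use warnings;

# Hypothetical stand-in for one shared feature collection (a single
# tied hash or database handle in a real implementation).
package Sketch::FeatureStore;

sub new { my ($class) = @_; return bless { by_contig => {} }, $class }

sub add_feature {
    my ($self, $contig_id, $feature) = @_;
    push @{ $self->{by_contig}{$contig_id} }, $feature;
    return $feature;
}

sub features_for {
    my ($self, $contig_id) = @_;
    return @{ $self->{by_contig}{$contig_id} || [] };
}

# A contig keeps only a reference to the scaffold's store.
package Sketch::Contig;

sub new {
    my ($class, %args) = @_;
    return bless { id => $args{id}, store => $args{store} }, $class;
}

sub id          { return $_[0]{id} }
sub add_feature { my ($self, $f) = @_; return $self->{store}->add_feature($self->{id}, $f) }
sub features    { my ($self) = @_; return $self->{store}->features_for($self->{id}) }

# The scaffold owns the single store and hands it to every contig it creates.
package Sketch::Scaffold;

sub new {
    my ($class) = @_;
    return bless { store => Sketch::FeatureStore->new, contigs => {} }, $class;
}

sub add_contig {
    my ($self, $id) = @_;
    $self->{contigs}{$id} ||= Sketch::Contig->new(id => $id, store => $self->{store});
    return $self->{contigs}{$id};
}

package main;

my $scaffold = Sketch::Scaffold->new;
my $contig   = $scaffold->add_contig('Contig1');
$contig->add_feature({ type => 'read', start => 1, end => 8 });
print scalar($contig->features), " feature(s) on ", $contig->id, "\n";
```

With this layout the number of open handles stays at one per scaffold no matter how many contigs it holds, which is the point of the suggestion.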

chris

On Dec 3, 2007, at 8:25 PM, Florent Angly wrote:

> Would this issue cause an excessive memory usage? Because I was  
> getting a high memory usage when parsing some TIGR Assembler files  
> and was wondering if the tigr parser was responsible for that or the  
> parent assembly IO module.
> I'd definitely be interested in a fix of the Bio::Assembly  
> implementation if it's the assembly IO module's fault....
> Florent
>
> Chris Fields wrote:
>> This seems similar to the 'too many open filehandles issue'  
>> documented  here:
>>
>> http://bugzilla.open-bio.org/show_bug.cgi?id=2320
>>
>> It unfortunately is due to having an open DB_File for every  
>> contig,  and is a problem with the Bio::Assembly implementation  
>> that isn't  easily fixed.  Changing the open filehandle limit using  
>> ulimit is the  only known fix:
>>
>> ulimit -n 10000
>>
>> chris
>>
>> On Dec 3, 2007, at 6:49 PM, Smithies, Russell wrote:
>>
>>
>>> Hi all,
>>>
>>> I'm trying to read .ace files but keep getting an error that I don't
>>> know the cause of.
>>> Really basic example code:
>>>
>>> 	#!/usr/local/bin/perl -w
>>>
>>> 	use lib "/data/home/smithiesr/bioperl-live";
>>> 	use Bio::Assembly::IO;
>>> 	use Data::Dumper;
>>>
>>> 	$ace = "CLP0001001240-cE15_20030319.ace";
>>>
>>> 	$io = new Bio::Assembly::IO(-file=>$ace,-format=>"ace");
>>> 	$assembly = $io->next_assembly;
>>>
>>> 	foreach $contig ($assembly->all_contigs) {
>>>   		print Dumper $contig;
>>> 	}
>>>
>>> Gives this error;
>>> 	[smithiesr at impala ace_phrap]$ perl bp_read_ace.pl
>>> 	Can't call method "get_consensus_sequence" on an undefined value
>>> at /data/home/smithiesr/bioperl-live/Bio/Assembly/IO/ace.pm line  
>>> 170,
>>> <GEN0> line 42.
>>>
>>> Which relates to this bit in ace.pm:
>>> 	# Loading contig qualities... (Base Quality field)
>>> 	/^BQ/ && do {
>>> 	    my $consensus = $contigOBJ->get_consensus_sequence()->seq();
>>>
>>> Is this caused by a dud ace file or a problem with   
>>> Bio::Assembly::IO::ace
>>> or is the Contig object not getting created?
>>> Any ideas?
>>>
>>> Thanx,
>>>
>>> Russell Smithies
>>>
>>> Bioinformatics Software Developer
>>> T +64 3 489 9085
>>> E  russell.smithies at agresearch.co.nz
>>>
>>> Invermay  Research Centre
>>> Puddle Alley,
>>> Mosgiel,
>>> New Zealand
>>> T  +64 3 489 3809
>>> F  +64 3 489 9174
>>> www.agresearch.co.nz
>>>
>>
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr. Robert Switzer
>> Dept of Biochemistry
>> University of Illinois Urbana-Champaign
>>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Tue Dec  4 00:20:07 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 3 Dec 2007 23:20:07 -0600
Subject: [Bioperl-l] Bio::Assembly::IO problems reading .ace files
In-Reply-To: <D5DBA313349A4B458528BE63B387F36C062D5E91@imail.agresearch.co.nz>
References: <47545590.1000703@boekhoff.info> <475499C3.20801@sendu.me.uk>
	<D5DBA313349A4B458528BE63B387F36C062D5E2A@imail.agresearch.co.nz>
	<692A2BDA-048B-49C6-A101-C13A1DAB9B69@uiuc.edu>
	<D5DBA313349A4B458528BE63B387F36C062D5E91@imail.agresearch.co.nz>
Message-ID: <C48EC1AC-FEA6-4F60-9791-D4DE449768C2@uiuc.edu>

The ulimit fix usually works, but if this is for Gbrowse it probably  
isn't prudent.  It would be nice to get Bio::Assembly working as a  
Bio::AlignI; it would be easier to manipulate for display.  Here's a  
script I wrote up as an example:

http://www.bioperl.org/wiki/HOWTO_Discussion:Graphics

chris

On Dec 3, 2007, at 8:32 PM, Smithies, Russell wrote:

> Thanx Chris,
> I'm only writing a simple .ace viewer to display assembled contigs  
> in a
> Bio::Graphics::Panel so I'll parse the coords from the .ace files
> "manually".
> Unless anyone else has a better idea ?
> (and some example code ;-)
>
> Russell


From avilella at gmail.com  Tue Dec  4 06:51:05 2007
From: avilella at gmail.com (Albert Vilella)
Date: Tue, 4 Dec 2007 11:51:05 +0000
Subject: [Bioperl-l] New Bio::Tools::Run::Phylo::SLR - Wrapper around the
	SLR program
Message-ID: <358f4d650712040351g4bef4417l4197d06454049140@mail.gmail.com>

Hi all,

There is a new wrapper in bioperl-run for SLR:

http://www.bioperl.org/wiki/SLR

Right now, output parsing is very simple, and I have only tested it on
my linux machine.
Can someone with a Mac give it a try?

update your bioperl-run to cvs head, then:

# try the installer, SLR is option 6
perl scripts/bioperl_application_installer.PLS
# then try to run the tests (should take about a minute)
perl t/SLR.t

Any comments on the code would be appreciated,

Thanks in advance,

Cheers,

    Albert.


From captainrave at hotmail.com  Tue Dec  4 06:04:57 2007
From: captainrave at hotmail.com (Captainrave)
Date: Tue, 4 Dec 2007 03:04:57 -0800 (PST)
Subject: [Bioperl-l]  extracting CDS location from Genbank
Message-ID: <14148723.post@talk.nabble.com>


Help.  I'm very new to perl and bioperl.  Basically I need to extract the
location of each CDS in a genbank entry e.g.103...120 and export them to an
output file as a list.  How would I do this?

Your help would be much appreciated!
-- 
View this message in context: http://www.nabble.com/extracting-CDS-location-from-Genbank-tf4942483.html#a14148723
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From michael.watson at bbsrc.ac.uk  Tue Dec  4 09:48:27 2007
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Tue, 4 Dec 2007 14:48:27 -0000
Subject: [Bioperl-l] extracting CDS location from Genbank
In-Reply-To: <14148723.post@talk.nabble.com>
References: <14148723.post@talk.nabble.com>
Message-ID: <8975119BCD0AC5419D61A9CF1A923E9505A4F76E@iahce2ksrv1.iah.bbsrc.ac.uk>

From the SeqIO HOWTO:

#!/bin/perl

use strict;
use Bio::SeqIO;

my $file = shift; # get the file name, somehow
my $seqio_object = Bio::SeqIO->new(-file => $file);
my $seq_object = $seqio_object->next_seq;

From the Feature HOWTO:

for my $feat_object ($seq_object->get_SeqFeatures) {          
   print "primary tag: ", $feat_object->primary_tag, "\n";          
   for my $tag ($feat_object->get_all_tags) {             
      print "  tag: ", $tag, "\n";             
      for my $value ($feat_object->get_tag_values($tag)) {

         print "    value: ", $value, "\n";             
      }          
   }       
}
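
The loop above prints every tag of every feature; the original question only needs the CDS locations written to a file. A minimal sketch of that narrower task follows, assuming the input is a GenBank flat file and BioPerl is installed. format_location and cds_locations are helper names invented here, and the script only does work when it is given both an input and an output filename.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Format one location as "start..end", marking reverse-strand features.
sub format_location {
    my ($start, $end, $strand) = @_;
    my $span = "$start..$end";
    return (defined $strand && $strand < 0) ? "complement($span)" : $span;
}

# Collect the location of every CDS feature in a GenBank file.
sub cds_locations {
    my ($file) = @_;
    require Bio::SeqIO;    # loaded lazily so format_location has no dependencies
    my $in = Bio::SeqIO->new(-file => $file, -format => 'genbank');
    my @locations;
    while (my $seq = $in->next_seq) {
        for my $feat ($seq->get_SeqFeatures) {
            next unless $feat->primary_tag eq 'CDS';
            push @locations,
                format_location($feat->start, $feat->end, $feat->strand);
        }
    }
    return @locations;
}

# Usage: perl cds_locations.pl input.gbk output.txt
if (@ARGV == 2) {
    my ($infile, $outfile) = @ARGV;
    open my $out, '>', $outfile or die "Cannot write $outfile: $!";
    print {$out} "$_\n" for cds_locations($infile);
    close $out or die "Cannot close $outfile: $!";
}
```

For spliced CDS features, start and end give only the overall extent; calling to_FTstring on $feat->location should return the full feature-table string (a join(...) expression, for instance) if that is what the list needs.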

Surely you could have found that yourself? ;0 

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Captainrave
Sent: 04 December 2007 11:05
To: Bioperl-l at lists.open-bio.org
Subject: [Bioperl-l] extracting CDS location from Genbank


Help.  I'm very new to perl and bioperl.  Basically I need to extract the
location of each CDS in a genbank entry e.g.103...120 and export them to an
output file as a list.  How would I do this?

Your help would be much appreciated!


From captainrave at hotmail.com  Tue Dec  4 10:07:19 2007
From: captainrave at hotmail.com (Captainrave)
Date: Tue, 4 Dec 2007 07:07:19 -0800 (PST)
Subject: [Bioperl-l] extracting CDS location from Genbank
In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E9505A4F76E@iahce2ksrv1.iah.bbsrc.ac.uk>
References: <14148723.post@talk.nabble.com>
	<8975119BCD0AC5419D61A9CF1A923E9505A4F76E@iahce2ksrv1.iah.bbsrc.ac.uk>
Message-ID: <14152264.post@talk.nabble.com>


Yes but actually implementing it is another story.

I get an error:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: file argument provided, but with an undefined value
STACK: Error::throw
STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359
STACK: Bio::SeqIO::new C:/Perl/site/lib/Bio/SeqIO.pm:359
STACK: test3.pl:7
-----------------------------------------------------------

Basically because I don't understand the code well enough.  For example, how
do I tell it which input file to read? I know this might sound stupid, but I
don't understand the Biowiki very well!

-- 
View this message in context: http://www.nabble.com/extracting-CDS-location-from-Genbank-tf4942483.html#a14152264
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From michael.watson at bbsrc.ac.uk  Tue Dec  4 10:21:34 2007
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Tue, 4 Dec 2007 15:21:34 -0000
Subject: [Bioperl-l] extracting CDS location from Genbank
In-Reply-To: <14152264.post@talk.nabble.com>
References: <14148723.post@talk.nabble.com><8975119BCD0AC5419D61A9CF1A923E9505A4F76E@iahce2ksrv1.iah.bbsrc.ac.uk>
	<14152264.post@talk.nabble.com>
Message-ID: <8975119BCD0AC5419D61A9CF1A923E9505A4F771@iahce2ksrv1.iah.bbsrc.ac.uk>

Post the script that produces that error, and your file's location 

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Captainrave
Sent: 04 December 2007 15:07
To: Bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] extracting CDS location from Genbank


Yes but actually implementing it is another story.

I get an error:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: file argument provided, but with an undefined value
STACK: Error::throw
STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359
STACK: Bio::SeqIO::new C:/Perl/site/lib/Bio/SeqIO.pm:359
STACK: test3.pl:7
-----------------------------------------------------------

Basically because I don't understand the code well enough.  For example, how
do I tell it which input file to read? I know this might sound stupid, but I
don't understand the Biowiki very well!



From bix at sendu.me.uk  Tue Dec  4 10:39:31 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 04 Dec 2007 15:39:31 +0000
Subject: [Bioperl-l] extracting CDS location from Genbank
In-Reply-To: <14152264.post@talk.nabble.com>
References: <14148723.post@talk.nabble.com>	<8975119BCD0AC5419D61A9CF1A923E9505A4F76E@iahce2ksrv1.iah.bbsrc.ac.uk>
	<14152264.post@talk.nabble.com>
Message-ID: <475574B3.8050700@sendu.me.uk>

Captainrave wrote:
> Yes but actually implementing it is another story.
> 
> I get an error:
> 
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: file argument provided, but with an undefined value
> STACK: Error::throw
> STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359
> STACK: Bio::SeqIO::new C:/Perl/site/lib/Bio/SeqIO.pm:359
> STACK: test3.pl:7
> -----------------------------------------------------------

The best way to get help is to give us your script and the error 
message, and the command you used to run your script. The less you know, 
the more you should give us (i.e. don't edit anything out).


From captainrave at hotmail.com  Tue Dec  4 10:41:37 2007
From: captainrave at hotmail.com (Captainrave)
Date: Tue, 4 Dec 2007 07:41:37 -0800 (PST)
Subject: [Bioperl-l] extracting CDS location from Genbank
In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E9505A4F771@iahce2ksrv1.iah.bbsrc.ac.uk>
References: <14148723.post@talk.nabble.com>
	<8975119BCD0AC5419D61A9CF1A923E9505A4F76E@iahce2ksrv1.iah.bbsrc.ac.uk>
	<14152264.post@talk.nabble.com>
	<8975119BCD0AC5419D61A9CF1A923E9505A4F771@iahce2ksrv1.iah.bbsrc.ac.uk>
Message-ID: <14152907.post@talk.nabble.com>


#!/bin/perl

use strict;
use Bio::SeqIO;
my $file = shift; # get the file name, somehow
my $seqio_object = Bio::SeqIO->new(-file => $file);
my $seq_object = $seqio_object->next_seq;

for my $feat_object ($seq_object->get_SeqFeatures) {          
   print "primary tag: ", $feat_object->primary_tag, "\n";          
   for my $tag ($feat_object->get_all_tags) {            
      print "  tag: ", $tag, "\n";            
      for my $value ($feat_object->get_tag_values($tag)) {

         print "    value: ", $value, "\n";            
      }          
   }      
}

exit;

The file is in the same folder.  But how do I tell it to use this file?


michael watson (IAH-C) wrote:
> 
> Post the script that produces that error, and your file's location 
> 
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Captainrave
> Sent: 04 December 2007 15:07
> To: Bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] extracting CDS location from Genbank
> 
> 
> Yes but actually implementing it is another story.
> 
> I get an error:
> 
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: file argument provided, but with an undefined value
> STACK: Error::throw
> STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359
> STACK: Bio::SeqIO::new C:/Perl/site/lib/Bio/SeqIO.pm:359
> STACK: test3.pl:7
> -----------------------------------------------------------
> 
> Basically because I don't understand the code well enough.  For example, how
> do I tell it which input file to read? I know this might sound stupid, but I
> don't understand the Biowiki very well!
> 

-- 
View this message in context: http://www.nabble.com/extracting-CDS-location-from-Genbank-tf4942483.html#a14152907
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From michael.watson at bbsrc.ac.uk  Tue Dec  4 10:53:22 2007
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Tue, 4 Dec 2007 15:53:22 -0000
Subject: [Bioperl-l] extracting CDS location from Genbank
In-Reply-To: <14152907.post@talk.nabble.com>
References: <14148723.post@talk.nabble.com><8975119BCD0AC5419D61A9CF1A923E9505A4F76E@iahce2ksrv1.iah.bbsrc.ac.uk><14152264.post@talk.nabble.com><8975119BCD0AC5419D61A9CF1A923E9505A4F771@iahce2ksrv1.iah.bbsrc.ac.uk>
	<14152907.post@talk.nabble.com>
Message-ID: <8975119BCD0AC5419D61A9CF1A923E9505A4F77A@iahce2ksrv1.iah.bbsrc.ac.uk>

Same script as below, but try:

my $file = 'C:\path\to\my\filename.gbk'; 

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Captainrave
Sent: 04 December 2007 15:42
To: Bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] extracting CDS location from Genbank


#!/bin/perl

use strict;
use Bio::SeqIO;
my $file = shift; # get the file name, somehow
my $seqio_object = Bio::SeqIO->new(-file => $file);
my $seq_object = $seqio_object->next_seq;

for my $feat_object ($seq_object->get_SeqFeatures) {          
   print "primary tag: ", $feat_object->primary_tag, "\n";          
   for my $tag ($feat_object->get_all_tags) {            
      print "  tag: ", $tag, "\n";            
      for my $value ($feat_object->get_tag_values($tag)) {

         print "    value: ", $value, "\n";            
      }          
   }      
}

exit;

The file is on the same folder.  But how do I tell it to use this file?


michael watson (IAH-C) wrote:
> 
> Post the script that produces that error, and your file's location 
> 
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Captainrave
> Sent: 04 December 2007 15:07
> To: Bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] extracting CDS location from Genbank
> 
> 
> Yes but actually implementing it is another story.
> 
> I get an error:
> 
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: file argument provided, but with an undefined value
> STACK: Error::throw
> STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359
> STACK: Bio::SeqIO::new C:/Perl/site/lib/Bio/SeqIO.pm:359
> STACK: test3.pl:7
> -----------------------------------------------------------
> 
> Basically because I don't understand the code well enough.  For example, how
> do I tell it which input file to read? I know this might sound stupid, but I
> don't understand the Biowiki very well!
> 



From cjfields at uiuc.edu  Tue Dec  4 11:20:34 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 4 Dec 2007 10:20:34 -0600
Subject: [Bioperl-l] extracting CDS location from Genbank
In-Reply-To: <14152907.post@talk.nabble.com>
References: <14148723.post@talk.nabble.com>
	<8975119BCD0AC5419D61A9CF1A923E9505A4F76E@iahce2ksrv1.iah.bbsrc.ac.uk>
	<14152264.post@talk.nabble.com>
	<8975119BCD0AC5419D61A9CF1A923E9505A4F771@iahce2ksrv1.iah.bbsrc.ac.uk>
	<14152907.post@talk.nabble.com>
Message-ID: <C2732712-D32B-449A-8BCA-DCB8BBDE9758@uiuc.edu>

The 'my $file = shift;' is a perl idiom.  The built-in 'shift' used  
implicitly in this way (outside a subroutine) operates on @ARGV, the  
command-line arguments; the file would then be passed as the first arg  
when running the script:

get_features.pl myfile.gb

This should work for any OS.  Personally, I use something like the  
following to indicate how the script is used in case a file is never  
entered:

my $USAGE = <<END_USE;
USAGE: get_features.pl <file>
Perl script to grab features from a GenBank file and print to a table
END_USE

my $file = shift || die $USAGE;
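
A slightly more defensive variant of the same idiom, as a sketch. It names the argument list explicitly and also checks that the file can be read before anything is handed to Bio::SeqIO. input_file is a hypothetical helper, not part of BioPerl. One difference worth knowing: 'shift || die' also rejects a file literally named 0, because that value is false in Perl, whereas the defined/length test below accepts it.

```perl
use strict;
use warnings;

my $USAGE = <<'END_USE';
USAGE: get_features.pl <file>
Perl script to grab features from a GenBank file and print to a table
END_USE

# Hypothetical helper: take the first argument from a list such as @ARGV,
# or die with the usage text if it is missing or not a readable file.
sub input_file {
    my @args = @_;
    my $file = shift @args;    # a bare 'shift' at file scope would read @ARGV
    die $USAGE unless defined $file && length $file;
    die "Cannot read '$file'\n$USAGE" unless -r $file;
    return $file;
}
```

In a script this would be called as input_file(@ARGV), with the returned name passed to Bio::SeqIO->new(-file => ...).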

chris

On Dec 4, 2007, at 9:41 AM, Captainrave wrote:

>
> #!/bin/perl
>
> use strict;
> use Bio::SeqIO;
> my $file = shift; # get the file name, somehow
> my $seqio_object = Bio::SeqIO->new(-file => $file);
> my $seq_object = $seqio_object->next_seq;
>
> for my $feat_object ($seq_object->get_SeqFeatures) {
>   print "primary tag: ", $feat_object->primary_tag, "\n";
>   for my $tag ($feat_object->get_all_tags) {
>      print "  tag: ", $tag, "\n";
>      for my $value ($feat_object->get_tag_values($tag)) {
>
>         print "    value: ", $value, "\n";
>      }
>   }
> }
>
> exit;
>
> The file is on the same folder.  But how do I tell it to use this  
> file?
>
>
>
> michael watson (IAH-C) wrote:
>>
>> Post the script that produces that error, and your file's location
>>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org
>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of  
>> Captainrave
>> Sent: 04 December 2007 15:07
>> To: Bioperl-l at lists.open-bio.org
>> Subject: Re: [Bioperl-l] extracting CDS location from Genbank
>>
>>
>> Yes but actually implementing it is another story.
>>
>> I get an error:
>>
>> ------------- EXCEPTION: Bio::Root::Exception -------------
>> MSG: file argument provided, but with an undefined value
>> STACK: Error::throw
>> STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359
>> STACK: Bio::SeqIO::new C:/Perl/site/lib/Bio/SeqIO.pm:359
>> STACK: test3.pl:7
>> -----------------------------------------------------------
>>
>> Basically because I don't understand the code well enough.  For example, how
>> do I tell it which input file to read? I know this might sound stupid, but I
>> don't understand the Biowiki very well!
>>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From bix at sendu.me.uk  Tue Dec  4 11:22:12 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 04 Dec 2007 16:22:12 +0000
Subject: [Bioperl-l] extracting CDS location from Genbank
In-Reply-To: <14152907.post@talk.nabble.com>
References: <14148723.post@talk.nabble.com>	<8975119BCD0AC5419D61A9CF1A923E9505A4F76E@iahce2ksrv1.iah.bbsrc.ac.uk>	<14152264.post@talk.nabble.com>	<8975119BCD0AC5419D61A9CF1A923E9505A4F771@iahce2ksrv1.iah.bbsrc.ac.uk>
	<14152907.post@talk.nabble.com>
Message-ID: <47557EB4.10003@sendu.me.uk>

Captainrave wrote:
> #!/bin/perl
> my $file = shift; # get the file name, somehow
>
> The file is on the same folder.  But how do I tell it to use this file?

http://stein.cshl.org/genome_informatics/perl_intro/command_line.html

Basically, when you run your script add the name of the file to your 
command line.

me% perl myscript.pl myfile

By saying 'my $file = shift' inside myscript.pl, the variable $file now 
contains the filename 'myfile'.

You could also have hardcoded the filename:
my $file = 'myfile';
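
A slightly more defensive version of the same idea, as a minimal sketch only. The sub name get_input_file and the usage message are made up for illustration; the only things relied on are core Perl's shift and the -e file test.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: take the first element of an argument list as the
# input file name and check that it is usable before any parsing starts.
sub get_input_file {
    my @args = @_;
    my $file = shift @args;
    die "Usage: perl myscript.pl <file>\n" unless defined $file;
    die "Cannot find file '$file'\n"        unless -e $file;
    return $file;
}

# Only runs when the script is given arguments, e.g.
#   perl myscript.pl myfile
if (@ARGV) {
    my $file = get_input_file(@ARGV);
    print "Reading from $file\n";
}
```

Checking the name up front means a missing or mistyped file is reported straight away rather than as a confusing parser error later on.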


Anyway, you're going to run into lots of these issues, and they're 
beyond the scope of this mailing list. For basic perl problems seek help 
via www.perl.org. When you have a BioPerl-specific question, don't 
hesitate to post here.


From jason at bioperl.org  Tue Dec  4 12:16:30 2007
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 4 Dec 2007 09:16:30 -0800
Subject: [Bioperl-l] New Bio::Tools::Run::Phylo::SLR - Wrapper around
	the SLR program
In-Reply-To: <358f4d650712040351g4bef4417l4197d06454049140@mail.gmail.com>
References: <358f4d650712040351g4bef4417l4197d06454049140@mail.gmail.com>
Message-ID: <18ABB052-2539-4932-A7AA-BB6D194BF8C3@bioperl.org>

Excellent - thanks for this!  I'm giving it a whirl on Linux and the
SLR.t test is currently taking more than 30 minutes to run -- is it
possible to cook up an example that is going to finish in a more
reasonable amount of time?

Also - I would prefer if the default exe could be 'Slr' rather than
Slr_Linux_static - it seems like it is possible for users to install
it this way.  Similarly, whether Slr_osx or Slr ends up as the
default name, is it too big of a deal to expect the user to rename it?

I'll give it a whirl on OSX later, but it might be easier if the test
runs shorter.

Thanks!
-jason
On Dec 4, 2007, at 3:51 AM, Albert Vilella wrote:

> Hi all,
>
> There is a new wrapper in bioperl-run for SLR:
>
> http://www.bioperl.org/wiki/SLR
>
> Right now, output parsing is very simple, and I have only tested it on
> my linux machine.
> Can someone with a Mac give it a try?
>
> update your bioperl-run to cvs head, then:
>
> # try the installer, SLR is option 6
> perl scripts/bioperl_application_installer.PLS
> # then try to run the tests (should take about a minute)
> perl t/SLR.t
>
> Any comments on the code would be appreciated,
>
> Thanks in advance,
>
> Cheers,
>
>     Albert.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From florent.angly at gmail.com  Tue Dec  4 13:17:08 2007
From: florent.angly at gmail.com (Florent Angly)
Date: Tue, 04 Dec 2007 10:17:08 -0800
Subject: [Bioperl-l] New Bio::Tools::Run::TigrAssembler
In-Reply-To: <18ABB052-2539-4932-A7AA-BB6D194BF8C3@bioperl.org>
References: <358f4d650712040351g4bef4417l4197d06454049140@mail.gmail.com>
	<18ABB052-2539-4932-A7AA-BB6D194BF8C3@bioperl.org>
Message-ID: <475599A4.1040500@gmail.com>

Hi all,
I pushed a new module into bioperl-run CVS a few days ago. It's called
Bio::Tools::Run::TigrAssembler. It is a wrapper for TIGR Assembler, an
open-source program that assembles DNA sequences.
The input is a list of sequence objects and the output is assembly
objects... easy enough...
Let me know if you experience problems with it.
Florent


From jason at bioperl.org  Tue Dec  4 13:51:34 2007
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 4 Dec 2007 10:51:34 -0800
Subject: [Bioperl-l] [StandAloneBLAST] Use more than one CPU + avoid
	BLAST reload
In-Reply-To: <475499C3.20801@sendu.me.uk>
References: <47545590.1000703@boekhoff.info> <475499C3.20801@sendu.me.uk>
Message-ID: <8273f6c20712041051k2bfe36efgb2ae40550df9341@mail.gmail.com>

You can pass in an array reference of sequences instead of a single sequence
object and the module will build a multi-FASTA database.  You can also pass
in a filename instead of a Sequence object and the file can be an already
built multi-FASTA database.  This is described in the documentation:

http://search.cpan.org/~birney/bioperl-1.4/Bio/Tools/Run/StandAloneBlast.pm#blastall

You can also just run BLAST without the StandAloneBlast part, as I do, and
just build your multi-FASTA file ahead of time with SeqIO and do:

use Bio::SearchIO;

# wublast
my $cmd = "blastp -i MULTIFASTA -d DATABASE --cpus 2 |";
# or NCBI blast
# my $cmd = "blastall -a 2 -i MULTIFASTA -p blastp -d DATABASE |";

# the trailing '|' in $cmd makes open() read from the command's output
open(my $fh, $cmd) or die "Cannot run '$cmd': $!";
my $searchio = Bio::SearchIO->new(-format => 'blast', -fh => $fh);

The advantage of StandAloneBlast in theory is that it takes care of the
temporary file creation (sequence files) and cleanup.  Personally I find I
want easier access to my programs that are simple command lines like this.
You can do similar things with SSEARCH or FASTA searching too.
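
The same read-from-a-pipe idea can also be written with the list form of open, which skips the shell. This is only a sketch: open_command_pipe is a made-up helper name, and the command run here is perl itself printing two lines, standing in for a real blastall or blastp command line. A handle obtained this way could be passed to Bio::SearchIO via -fh as in the snippet above.

```perl
use strict;
use warnings;

# Run a command and return a read handle on its standard output.
# List-form open with '-|' bypasses the shell, so arguments need no quoting.
sub open_command_pipe {
    my @cmd = @_;
    open(my $fh, '-|', @cmd) or die "Cannot run '@cmd': $!";
    return $fh;
}

# Stand-in for the BLAST command line: perl printing two lines.
my $fh = open_command_pipe($^X, '-e', 'print "hit1\nhit2\n"');
my @lines = <$fh>;

# close() on a pipe handle returns false if the command exited non-zero.
close($fh) or die "Command failed with status $?";
print "Got ", scalar(@lines), " lines\n";
```

Checking the return value of close matters for pipes, since that is where a failed command shows up.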

-jason

On Dec 3, 2007 4:05 PM, Sendu Bala <bix at sendu.me.uk> wrote:

> Sven Boekhoff wrote:
> > HI!
> > I just started working with Perl and BioPerl. I'm quite impressed what
> > can be easily done with this module. Today I found that my second CPU
> > is not used, but the first one runs at 100%. I tried to include the
> > "-a"-parameter, but I was not successful:
> >
> > my @params = (
> >       -database => 'my_db',
> >       -a => '2',
> >       -outfile => 'blast1.out'
> > );
> >
> > How do I have to use it?
>
> This should work in the CVS version of StandAloneBlast. In other
> versions, perhaps try using $object->a(2);
>
>
> > Second question: In my perlscript I start BLAST-searches in a loop.
> > Everytime BLAST has finished its search, the memory is cleared and BLAST
> > is started again. I think most of the time is used to reload the
> > database. Is it somehow possible to keep the database loaded (e.g. by
> > starting a second search) or is BLAST reloaded anyway?
>
> I hope someone will correct me if I'm wrong, but I think you'd have
> to do that with a 2-way pipe. StandAloneBlast only uses output to a file
> and input from that file, finishing with the executable in between. I've
> thought about improving it with a 2-way pipe, but never got around to
> it, being apprehensive about stability on all platforms.
>
> The more obvious solution, which may be possible depending on exactly
> what you're doing, is to avoid the loop and just supply Blast all your
> input in one go.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


-- 
Jason Stajich
jason at bioperl.org
http://bioperl.org/wiki/User:Jason


From stefan.kirov at bms.com  Tue Dec  4 14:25:21 2007
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Tue, 04 Dec 2007 14:25:21 -0500
Subject: [Bioperl-l] PAML/Codeml parsing
In-Reply-To: <FD74BB3A-C8F7-453E-915E-FD5541CE59CB@bioperl.org>
References: <FD74BB3A-C8F7-453E-915E-FD5541CE59CB@bioperl.org>
Message-ID: <4755A9A1.2040608@bms.com>

Jason Stajich wrote:
> PAML4 breaks our PAML parser right now because the order of things in  
> the result file has changed.  Now sequences precede the information  
> about the version or the program run.  This means that
> $result->get_seqs() fails because we don't parse the sequences.
>
> We'll see what we can do, but as usual with supporting 3rd party  
> programs it is brittle when file formats change.  Th
>
> -jason
>
> --
> Jason Stajich
> jason at bioperl.org
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>   
Jason,
I saw a commit after this post on codeml, but not on PAML.pm- I assume
this is not fixed, am I correct?
Thanks!
Stefan


From avilella at gmail.com  Tue Dec  4 15:34:38 2007
From: avilella at gmail.com (Albert Vilella)
Date: Tue, 4 Dec 2007 20:34:38 +0000
Subject: [Bioperl-l] New Bio::Tools::Run::Phylo::SLR - Wrapper around
	the SLR program
In-Reply-To: <18ABB052-2539-4932-A7AA-BB6D194BF8C3@bioperl.org>
References: <358f4d650712040351g4bef4417l4197d06454049140@mail.gmail.com>
	<18ABB052-2539-4932-A7AA-BB6D194BF8C3@bioperl.org>
Message-ID: <358f4d650712041234n70004aedqa3dc07fb3f6f2e08@mail.gmail.com>

hmmm, 30 minutes is quite a lot... it takes much less for me:

avilella at magneto:~/bioperl/vanilla/bioperl-run$ time perl t/SLR.t
1..7
ok 1 - use Bio::Root::IO;
ok 2 - use Bio::Tools::Run::Phylo::SLR;
ok 3 - use Bio::AlignIO;
ok 4 - use Bio::TreeIO;
ok 5
ok 6
ok 7

real    0m21.517s
user    0m20.717s
sys     0m0.100s


On Dec 4, 2007 5:16 PM, Jason Stajich <jason at bioperl.org> wrote:
> Excellent - thanks for this !  I'm giving it whirl on linux and the
> SLR.t test is currently taking more than 30 minutes to run -- is it
> possible to cook up an example that is going to finish in a more
> reasonable amount of time?
>
> Also - I would prefer if the default exe could be 'Slr' rather than
> Slr_Linux_static - it seems like it is possible for users to install
> it this way.  Similarly whether or not the Slr_osx or Slr is the
> default name, is it too big of a deal to expect the user to rename it?
>
> I'll give it a whirl on OSX later, but might be easier if the test
> runs shorter.
>
> Thanks!
> -jason
>
> On Dec 4, 2007, at 3:51 AM, Albert Vilella wrote:
>
> > Hi all,
> >
> > There is a new wrapper in bioperl-run for SLR:
> >
> > http://www.bioperl.org/wiki/SLR
> >
> > Right now, output parsing is very simple, and I have only tested it on
> > my linux machine.
> > Can someone with a Mac give it a try?
> >
> > update your bioperl-run to cvs head, then:
> >
> > # try the installer, SLR is option 6
> > perl scripts/bioperl_application_installer.PLS
> > # then try to run the tests (should take about a minute)
> > perl t/SLR.t
> >
> > Any comments on the code would be appreciated,
> >
> > Thanks in advance,
> >
> > Cheers,
> >
> >     Albert.
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>


From avilella at gmail.com  Tue Dec  4 15:39:26 2007
From: avilella at gmail.com (Albert Vilella)
Date: Tue, 4 Dec 2007 20:39:26 +0000
Subject: [Bioperl-l] New Bio::Tools::Run::Phylo::SLR - Wrapper around
	the SLR program
In-Reply-To: <358f4d650712041234n70004aedqa3dc07fb3f6f2e08@mail.gmail.com>
References: <358f4d650712040351g4bef4417l4197d06454049140@mail.gmail.com>
	<18ABB052-2539-4932-A7AA-BB6D194BF8C3@bioperl.org>
	<358f4d650712041234n70004aedqa3dc07fb3f6f2e08@mail.gmail.com>
Message-ID: <358f4d650712041239w7e6dee29lbb13cc2e30a6bce1@mail.gmail.com>

oh, I forgot to mention: SLR uses the lapack and blas libraries if
installed, which makes it a lot faster (according to the author)...
maybe that's the reason...

On Dec 4, 2007 8:34 PM, Albert Vilella <avilella at gmail.com> wrote:
> hmmm, 30 minutes is quite a lot... it takes much less for me:
>
> avilella at magneto:~/bioperl/vanilla/bioperl-run$ time perl t/SLR.t
> 1..7
> ok 1 - use Bio::Root::IO;
> ok 2 - use Bio::Tools::Run::Phylo::SLR;
> ok 3 - use Bio::AlignIO;
> ok 4 - use Bio::TreeIO;
> ok 5
> ok 6
> ok 7
>
> real    0m21.517s
> user    0m20.717s
> sys     0m0.100s
>
>
>
> On Dec 4, 2007 5:16 PM, Jason Stajich <jason at bioperl.org> wrote:
> > Excellent - thanks for this !  I'm giving it whirl on linux and the
> > SLR.t test is currently taking more than 30 minutes to run -- is it
> > possible to cook up an example that is going to finish in a more
> > reasonable amount of time?
> >
> > Also - I would prefer if the default exe could be 'Slr' rather than
> > Slr_Linux_static - it seems like it is possible for users to install
> > it this way.  Similarly whether or not the Slr_osx or Slr is the
> > default name, is it too big of a deal to expect the user to rename it?
> >
> > I'll give it a whirl on OSX later, but might be easier if the test
> > runs shorter.
> >
> > Thanks!
> > -jason
> >
> > On Dec 4, 2007, at 3:51 AM, Albert Vilella wrote:
> >
> > > Hi all,
> > >
> > > There is a new wrapper in bioperl-run for SLR:
> > >
> > > http://www.bioperl.org/wiki/SLR
> > >
> > > Right now, output parsing is very simple, and I have only tested it on
> > > my linux machine.
> > > Can someone with a Mac give it a try?
> > >
> > > update your bioperl-run to cvs head, then:
> > >
> > > # try the installer, SLR is option 6
> > > perl scripts/bioperl_application_installer.PLS
> > > # then try to run the tests (should take about a minute)
> > > perl t/SLR.t
> > >
> > > Any comments on the code would be appreciated,
> > >
> > > Thanks in advance,
> > >
> > > Cheers,
> > >
> > >     Albert.
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> >
>


From jason at bioperl.org  Tue Dec  4 16:43:03 2007
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 4 Dec 2007 13:43:03 -0800
Subject: [Bioperl-l] New Bio::Tools::Run::Phylo::SLR - Wrapper around
	the SLR program
In-Reply-To: <358f4d650712041239w7e6dee29lbb13cc2e30a6bce1@mail.gmail.com>
References: <358f4d650712040351g4bef4417l4197d06454049140@mail.gmail.com>
	<18ABB052-2539-4932-A7AA-BB6D194BF8C3@bioperl.org>
	<358f4d650712041234n70004aedqa3dc07fb3f6f2e08@mail.gmail.com>
	<358f4d650712041239w7e6dee29lbb13cc2e30a6bce1@mail.gmail.com>
Message-ID: <2CF76A38-5A9E-4A4E-8C9F-29EDD732BDDF@bioperl.org>

My own icc compiled version seemed to have caused the problem.  
whoops. fixed that.
-jason
On Dec 4, 2007, at 12:39 PM, Albert Vilella wrote:

> oh, I forgot to mention: SLR uses the lapack and blas libraries if
> installed, which makes it a lot faster (according to the author)...
> maybe that's the reason...
>
> On Dec 4, 2007 8:34 PM, Albert Vilella <avilella at gmail.com> wrote:
>> hmmm, 30 minutes is quite a lot... it takes much less for me:
>>
>> avilella at magneto:~/bioperl/vanilla/bioperl-run$ time perl t/SLR.t
>> 1..7
>> ok 1 - use Bio::Root::IO;
>> ok 2 - use Bio::Tools::Run::Phylo::SLR;
>> ok 3 - use Bio::AlignIO;
>> ok 4 - use Bio::TreeIO;
>> ok 5
>> ok 6
>> ok 7
>>
>> real    0m21.517s
>> user    0m20.717s
>> sys     0m0.100s
>>
>>
>>
>> On Dec 4, 2007 5:16 PM, Jason Stajich <jason at bioperl.org> wrote:
>>> Excellent - thanks for this !  I'm giving it whirl on linux and the
>>> SLR.t test is currently taking more than 30 minutes to run -- is it
>>> possible to cook up an example that is going to finish in a more
>>> reasonable amount of time?
>>>
>>> Also - I would prefer if the default exe could be 'Slr' rather than
>>> Slr_Linux_static - it seems like it is possible for users to install
>>> it this way.  Similarly whether or not the Slr_osx or Slr is the
>>> default name, is it too big of a deal to expect the user to  
>>> rename it?
>>>
>>> I'll give it a whirl on OSX later, but might be easier if the test
>>> runs shorter.
>>>
>>> Thanks!
>>> -jason
>>>
>>> On Dec 4, 2007, at 3:51 AM, Albert Vilella wrote:
>>>
>>>> Hi all,
>>>>
>>>> There is a new wrapper in bioperl-run for SLR:
>>>>
>>>> http://www.bioperl.org/wiki/SLR
>>>>
>>>> Right now, output parsing is very simple, and I have only tested  
>>>> it on
>>>> my linux machine.
>>>> Can someone with a Mac give it a try?
>>>>
>>>> update your bioperl-run to cvs head, then:
>>>>
>>>> # try the installer, SLR is option 6
>>>> perl scripts/bioperl_application_installer.PLS
>>>> # then try to run the tests (should take about a minute)
>>>> perl t/SLR.t
>>>>
>>>> Any comments on the code would be appreciated,
>>>>
>>>> Thanks in advance,
>>>>
>>>> Cheers,
>>>>
>>>>     Albert.
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>>
>>


From stefan.kirov at bms.com  Tue Dec  4 16:51:51 2007
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Tue, 04 Dec 2007 16:51:51 -0500
Subject: [Bioperl-l] PAML/Codeml parsing
In-Reply-To: <A9B43240-2601-4C3E-9870-F32A6918A657@bioperl.org>
References: <FD74BB3A-C8F7-453E-915E-FD5541CE59CB@bioperl.org>
	<4755A9A1.2040608@bms.com>
	<A9B43240-2601-4C3E-9870-F32A6918A657@bioperl.org>
Message-ID: <4755CBF7.5010709@bms.com>

Jason Stajich wrote:
> should be fixed.
>
> $ cvs log -r HEAD Bio/Tools/Phylo/PAML.pm
> revision 1.56
> date: 2007/11/01 14:52:56;  author: jason;  state: Exp;  lines: +21 -14
> Parsing PAML4 and PAML3.15 should work now.  Dealing with variable
> order for the sequences and summary results in
> the top of the MLC files
>
Yes, this is the version I have, and in some cases the sequences still do
not get parsed. I had missed this commit. I will try to assemble a test case
and send it. I cannot promise when, but will try to do it tomorrow. My gut
feeling so far is that the parser works whenever there are gaps in the
alignment, and otherwise it does not. PAML surely has a very peculiar format.
Thanks again!
Stefan
> On Dec 4, 2007, at 11:25 AM, Stefan Kirov wrote:
>
>> Jason Stajich wrote:
>>> PAML4 breaks our PAML parser right now because the order of things in
>>> the result file has changed.  Now sequences precede the information
>>> about the version or the program run.  This means that
>>> $result->get_seqs() fails because we don't parse the sequences.
>>>
>>> We'll see what we can do, but as usual with supporting 3rd party
>>> programs it is brittle when file formats change.  Th
>>>
>>> -jason
>>>
>>> -- 
>>> Jason Stajich
>>> jason at bioperl.org
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>>
>> Jason,
>> I saw a commit after this post on codeml, but not on PAML.pm- I assume
>> this is not fixed, am I correct?
>> Thanks!
>> Stefan
>
>


From jason at bioperl.org  Tue Dec  4 16:36:09 2007
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 4 Dec 2007 13:36:09 -0800
Subject: [Bioperl-l] PAML/Codeml parsing
In-Reply-To: <4755A9A1.2040608@bms.com>
References: <FD74BB3A-C8F7-453E-915E-FD5541CE59CB@bioperl.org>
	<4755A9A1.2040608@bms.com>
Message-ID: <A9B43240-2601-4C3E-9870-F32A6918A657@bioperl.org>

should be fixed.

$ cvs log -r HEAD Bio/Tools/Phylo/PAML.pm
revision 1.56
date: 2007/11/01 14:52:56;  author: jason;  state: Exp;  lines: +21 -14
Parsing PAML4 and PAML3.15 should work now.  Dealing with variable  
order for the sequences and summary results in
the top of the MLC files

On Dec 4, 2007, at 11:25 AM, Stefan Kirov wrote:

> Jason Stajich wrote:
>> PAML4 breaks our PAML parser right now because the order of things in
>> the result file has changed.  Now sequences precede the information
>> about the version or the program run.  This means that
>> $result->get_seqs() fails because we don't parse the sequences.
>>
>> We'll see what we can do, but as usual with supporting 3rd party
>> programs it is brittle when file formats change.  Th
>>
>> -jason
>>
>> --
>> Jason Stajich
>> jason at bioperl.org
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
> Jason,
> I saw a commit after this post on codeml, but not on PAML.pm- I assume
> this is not fixed, am I correct?
> Thanks!
> Stefan


From johan.nilsson at sh.se  Wed Dec  5 06:35:58 2007
From: johan.nilsson at sh.se (Johan Nilsson)
Date: Wed, 5 Dec 2007 12:35:58 +0100
Subject: [Bioperl-l] Query about Hyphy wrapper module "SLAC.pm"
Message-ID: <OFBA70B0CA.66F02D44-ONC12573A8.003FB7B7-C12573A8.003FB7C0@sh.se>


Hello,

I have a bunch of multiple sequence alignments of protein coding genes,
which I would like to analyse with the SLAC method of the HyPhy package. I
tried using the SLAC.pm module in bioperl-run, but I could not get it to
work properly.

Basically, for each MSA file, I create the Bio::Tree::Tree and
Bio::SimpleAlign objects ($tree and $aln, respectively) required as
arguments to SLAC, and call the method with: "($rc,$result) =
$slac->run($aln,$tree)" in a loop procedure in my script.

When I choose not to save the tmp files (the default option in SLAC.pm),
the program complains that it cannot find the file
"$whatevertmpdir/wrapper.bf", and returns $rc=0 for all but the first MSA
(which works fine). Apparently, it looks for the wrapper.bf file in the
first tmp dir created, which is deleted at the end of the first SLAC call.

If instead I choose to save the tempfiles ($slac->save_tempfiles('TRUE')),
all calls to SLAC give returncode 1, and no error message is received.
However, when I look at the resulting $result hashref, it turns out that
all results are for the FIRST alignment read. I've made sure there is
nothing strange with my loop procedure, and I checked that the tree and
alignment objects look OK for each MSA. Apparently, it does create new
"results.tsv" files in the tmp directory after each run, but it is
identical each time it's created. Also, it only creates ONE tmp directory,
no matter how many times SLAC is executed (I would imagine it was supposed
to save each result in separate tmp dirs?)

Thus, it seems to me that the errors occur because something goes wrong in
the creation of temporary files. Have I done something wrong here, or have
any of you experienced the same problem?

Best regards
/Johan


--
Johan Nilsson, Ph.D.
School of Life Sciences
Södertörns University College
S-141 89 Huddinge, Sweden
E-mail: johan.nilsson at sh.se
Phone: +46 8 608 47 05, +46 70 456 10 51


From bernd.web at gmail.com  Wed Dec  5 08:10:04 2007
From: bernd.web at gmail.com (Bernd Web)
Date: Wed, 5 Dec 2007 14:10:04 +0100
Subject: [Bioperl-l] SimpleAlign is_flush
Message-ID: <716af09c0712050510h62aa106cla7011a75c93091a5@mail.gmail.com>

Hi,

SimpleAlign has an is_flush:
 Function  : Tells you whether the alignment is flush, i.e. all of the
same length
 Returns   : 1 or 0

I noticed that a file with multiple fasta sequences with different
lengths has an is_flush value of 1. Printing the "alignment" shows
that sequences are appended with "-" so that they are all the same
length. Does this mean that is_flush for alignments read in via
AlignIO is indeed always true and thus not so useful?

(using bioperl version: 1.005002102)


Regards,
Bernd


From cjfields at uiuc.edu  Wed Dec  5 08:53:59 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 5 Dec 2007 07:53:59 -0600
Subject: [Bioperl-l] SimpleAlign is_flush
In-Reply-To: <716af09c0712050510h62aa106cla7011a75c93091a5@mail.gmail.com>
References: <716af09c0712050510h62aa106cla7011a75c93091a5@mail.gmail.com>
Message-ID: <9E4F2A25-ACDE-4BFD-9026-FDF7251B588B@uiuc.edu>

Yes; it's a convenient way to make sure all seqs have the same length  
(including gaps).  Nice for checking when adding new seqs to an  
alignment or building new parsers.

chris

On Dec 5, 2007, at 7:10 AM, Bernd Web wrote:

> Hi,
>
> SimpleAlign has an is_flush:
> Function  : Tells you whether the alignment is flush, i.e. all of the
> same length
> Returns   : 1 or 0
>
> I noticed that a file with multiple fasta sequences with different
> lengths has an is_flush value of 1. Printing the "alignment" shows
> that sequences are appended with "-" so that they are all the same
> length. Does this mean that is_flush for alignments read in via
> AlignIO is indeed always true and thus not so useful?
>
> (using bioperl version: 1.005002102)
>
>
> Regards,
> Bernd
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From captainrave at hotmail.com  Wed Dec  5 07:37:02 2007
From: captainrave at hotmail.com (Captainrave)
Date: Wed, 5 Dec 2007 04:37:02 -0800 (PST)
Subject: [Bioperl-l] extracting CDS location from Genbank
In-Reply-To: <475574B3.8050700@sendu.me.uk>
References: <14148723.post@talk.nabble.com>
	<8975119BCD0AC5419D61A9CF1A923E9505A4F76E@iahce2ksrv1.iah.bbsrc.ac.uk>
	<14152264.post@talk.nabble.com> <475574B3.8050700@sendu.me.uk>
Message-ID: <14170499.post@talk.nabble.com>


Thanks, it works great now.

Do any of you know if there is a tag to pull out the CDS location, i.e. the
values such as 132...145 etc.?  Those are all I need.  Also, is there any way
to stop it reporting tag and value and literally JUST output the value?

Thanks!!!
-- 
View this message in context: http://www.nabble.com/extracting-CDS-location-from-Genbank-tf4942483.html#a14170499
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From stefan.kirov at bms.com  Wed Dec  5 09:24:20 2007
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Wed, 05 Dec 2007 09:24:20 -0500
Subject: [Bioperl-l] PAML/Codeml parsing
In-Reply-To: <A9B43240-2601-4C3E-9870-F32A6918A657@bioperl.org>
References: <FD74BB3A-C8F7-453E-915E-FD5541CE59CB@bioperl.org>
	<4755A9A1.2040608@bms.com>
	<A9B43240-2601-4C3E-9870-F32A6918A657@bioperl.org>
Message-ID: <4756B494.7020100@bms.com>

Jason,
When there is a gapless alignment we have a differently formatted output
from codeml:
kirovs at horta:~/AESIG> head -n 10 feJRfxQl8D/mlc

seed used = 492211105
      3    141

ENSRNOE00000058637               GCG AGC AAG TGT GAC AGC CAT GGC ACC CAC
CTA GCA GGT GTG GTC AGC GGC CGG GAT GCT GGT GTG GCC AAG GGC ACC AGT CTG
CAC AGT CTG CGT GTG CTC AAC TGT CAA GGG AAG GGC ACA GTC AGC GGC ACC CTC ATA
ENSMUSE00000366347               GCG AGC AAG TGT GAC AGC CAC GGC ACC CAC
CTG GCA GGT GTG GTC AGC GGC CGG GAT GCT GGT GTG GCC AAG GGC ACC AGC CTG
CAC AGC CTG CGT GTG CTC AAC TGT CAA GGG AAG GGC ACA GTC AGC GGC ACC CTC ATA
ENSE00001279150                  GCC AGC AAG TGT GAC AGT CAT GGC ACC CAC
CTG GCA GGG GTG GTC AGC GGC CGG GAT GCC GGC GTG GCC AAG GGT GCC AGC ATG
CGC AGC CTG CGC GTG CTC AAC TGC CAA GGG AAG GGC ACG GTT AGC GGC ACC CTC ATA

And parsing this fails...
The next one has gaps and works fine:

kirovs at horta:~/AESIG> head -n 10 4z6ZX7s1B6/mlc

seed used = 492252697

Before deleting alignment gaps
      2    162

ENSMUSE00000460297               AAT ATC GAT ACA TTT TAC AAG GAG GCA GAA
AAG AAG CTT ATA CAC GTG CTT GAG GGA GAC AGT CCC AAG TGG TCC ACA CCG AAC
AAA GAC CCC ACC CGA GAG CCC CAT GCA GCC TCC ACT TGC TGT GCT TCA GAT CTC
CTT GGT TCA GGA GGT CAG TTC CTG
ENSE00000939192                  AAT ATT GAC ATA CTT TGC AAT GAA GCA GAA
AAC AAG CTT ATG CAT ATA CTG CAT GCA AAT GAT CCC AAG TGG TCC ACC CCA ACT
AAA GAC TGT ACT TCA GGG CCG TAC ACT GCT CAA ATC --- --- --- --- --- ATT
CCT GGT ACA GGA AAC AAG CTT CTG

I will send both whole files as an attachment with another mail (I do
not know if these are going to pass through).
My guess is that the whole _parse_summary method has to be re-worked as
there is no tag to look for before the sequences start. Ugly.
I am not sure what else could become broken if I try to fix it, so I
will leave it to you.
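
One tag-free way to locate the start of the sequence block in both excerpts above, as a rough sketch only: look for the first line made up of exactly two integers (sequence count and alignment length). The helper name find_dimensions is hypothetical and this is not the code in Bio::Tools::Phylo::PAML; it also assumes no other line in the header has that shape, which has only been checked against the two excerpts shown here.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::Tools::Phylo::PAML: return the
# sequence count and alignment length from the first line that consists
# of exactly two integers, or an empty list if no such line is found.
sub find_dimensions {
    my @lines = @_;
    for my $line (@lines) {
        return ($1, $2) if $line =~ /^\s*(\d+)\s+(\d+)\s*$/;
    }
    return;
}

# Header lines copied from the two mlc excerpts above.
my @gapless = ("", "seed used = 492211105", "      3    141", "");
my @gapped  = ("", "seed used = 492252697", "",
               "Before deleting alignment gaps", "      2    162", "");

printf "gapless: %d sequences, length %d\n", find_dimensions(@gapless);
printf "gapped:  %d sequences, length %d\n", find_dimensions(@gapped);
```

Because the match does not depend on the "Before deleting alignment gaps" line, the gapless and gapped headers are handled the same way.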
Stefan
> should be fixed.
>
> $ cvs log -r HEAD Bio/Tools/Phylo/PAML.pm
> revision 1.56
> date: 2007/11/01 14:52:56;  author: jason;  state: Exp;  lines: +21 -14
> Parsing PAML4 and PAML3.15 should work now.  Dealing with variable
> order for the sequences and summary results in
> the top of the MLC files
>
> On Dec 4, 2007, at 11:25 AM, Stefan Kirov wrote:
>
>> Jason Stajich wrote:
>>> PAML4 breaks our PAML parser right now because the order of things in
>>> the result file has changed.  Now sequences precede the information
>>> about the version or the program run.  This means that
>>> $result->get_seqs() fails because we don't parse the sequences.
>>>
>>> We'll see what we can do, but as usual with supporting 3rd party
>>> programs it is brittle when file formats change.  Th
>>>
>>> -jason
>>>
>>> -- 
>>> Jason Stajich
>>> jason at bioperl.org
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>>
>> Jason,
>> I saw a commit after this post on codeml, but not on PAML.pm- I assume
>> this is not fixed, am I correct?
>> Thanks!
>> Stefan
>
>


From stefan.kirov at bms.com  Wed Dec  5 09:35:23 2007
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Wed, 05 Dec 2007 09:35:23 -0500
Subject: [Bioperl-l] PAML/Codeml parsing
In-Reply-To: <4756B494.7020100@bms.com>
References: <FD74BB3A-C8F7-453E-915E-FD5541CE59CB@bioperl.org>
	<4755A9A1.2040608@bms.com>
	<A9B43240-2601-4C3E-9870-F32A6918A657@bioperl.org>
	<4756B494.7020100@bms.com>
Message-ID: <4756B72B.6000103@bms.com>

Here are the files.
Stefan
Stefan Kirov wrote:
> Jason,
> When there is a gapless alignment we have a differently formatted output
> from codeml:
> kirovs at horta:~/AESIG> head -n 10 feJRfxQl8D/mlc
>
> seed used = 492211105
>       3    141
>
> ENSRNOE00000058637               GCG AGC AAG TGT GAC AGC CAT GGC ACC CAC
> CTA GCA GGT GTG GTC AGC GGC CGG GAT GCT GGT GTG GCC AAG GGC ACC AGT CTG
> CAC AGT CTG CGT GTG CTC AAC TGT CAA GGG AAG GGC ACA GTC AGC GGC ACC CTC ATA
> ENSMUSE00000366347               GCG AGC AAG TGT GAC AGC CAC GGC ACC CAC
> CTG GCA GGT GTG GTC AGC GGC CGG GAT GCT GGT GTG GCC AAG GGC ACC AGC CTG
> CAC AGC CTG CGT GTG CTC AAC TGT CAA GGG AAG GGC ACA GTC AGC GGC ACC CTC ATA
> ENSE00001279150                  GCC AGC AAG TGT GAC AGT CAT GGC ACC CAC
> CTG GCA GGG GTG GTC AGC GGC CGG GAT GCC GGC GTG GCC AAG GGT GCC AGC ATG
> CGC AGC CTG CGC GTG CTC AAC TGC CAA GGG AAG GGC ACG GTT AGC GGC ACC CTC ATA
>
> And parsing this fails...
> The next one has gaps and works fine:
>
> kirovs at horta:~/AESIG> head -n 10 4z6ZX7s1B6/mlc
>
> seed used = 492252697
>
> Before deleting alignment gaps
>       2    162
>
> ENSMUSE00000460297               AAT ATC GAT ACA TTT TAC AAG GAG GCA GAA
> AAG AAG CTT ATA CAC GTG CTT GAG GGA GAC AGT CCC AAG TGG TCC ACA CCG AAC
> AAA GAC CCC ACC CGA GAG CCC CAT GCA GCC TCC ACT TGC TGT GCT TCA GAT CTC
> CTT GGT TCA GGA GGT CAG TTC CTG
> ENSE00000939192                  AAT ATT GAC ATA CTT TGC AAT GAA GCA GAA
> AAC AAG CTT ATG CAT ATA CTG CAT GCA AAT GAT CCC AAG TGG TCC ACC CCA ACT
> AAA GAC TGT ACT TCA GGG CCG TAC ACT GCT CAA ATC --- --- --- --- --- ATT
> CCT GGT ACA GGA AAC AAG CTT CTG
>
> I will send both whole files as an attachment with another mail (I do
> not know if these are going to pass through).
> My guess is that the whole _parse_summary method has to be re-worked as
> there is no tag to look for before the sequences start. Ugly.
> I am not sure what else could become broken if I try to fix it, so I
> will leave it to you.
> Stefan
>   
>> should be fixed.
>>
>> $ cvs log -r HEAD Bio/Tools/Phylo/PAML.pm
>> revision 1.56
>> date: 2007/11/01 14:52:56;  author: jason;  state: Exp;  lines: +21 -14
>> Parsing PAML4 and PAML3.15 should work now.  Dealing with variable
>> order for the sequences and summary results in
>> the top of the MLC files
>>
>> On Dec 4, 2007, at 11:25 AM, Stefan Kirov wrote:
>>
>>> Jason Stajich wrote:
>>>> PAML4 breaks our PAML parser right now because the order of things in
>>>> the result file has changed.  Now sequences precede the information
>>>> about the version or the program run.  This means that
>>>> $result->get_seqs() fails because we don't parse the sequences.
>>>>
>>>> We'll see what we can do, but as usual with supporting 3rd party
>>>> programs it is brittle when file formats change.  Th
>>>>
>>>> -jason
>>>>
>>>> -- 
>>>> Jason Stajich
>>>> jason at bioperl.org
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>>
>>>>         
>>> Jason,
>>> I saw a commit after this post on codeml, but not on PAML.pm- I assume
>>> this is not fixed, am I correct?
>>> Thanks!
>>> Stefan
>>>       
>>     
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>   

-------------- next part --------------
A non-text attachment was scrubbed...
Name: mlc.tar.gz
Type: application/x-gzip
Size: 3237 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20071205/bd77cde1/attachment-0002.gz>

From aaron.j.mackey at gsk.com  Wed Dec  5 09:56:31 2007
From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com)
Date: Wed, 5 Dec 2007 09:56:31 -0500
Subject: [Bioperl-l] SimpleAlign is_flush
In-Reply-To: <9E4F2A25-ACDE-4BFD-9026-FDF7251B588B@uiuc.edu>
Message-ID: <OF9A252048.A821FA6A-ON852573A8.0051A1FE-852573A8.0052140C@gsk.com>

Well, if you use AlignIO::fasta to read in a multi-fasta file of 
*unaligned* sequences, AlignIO::fasta makes the assumption that all of 
your sequences are aligned, and pads the ends of shorter sequences with 
gap characters (essentially, enforcing a rather silly, yet valid 
alignment).  The fact that is_flush() then returns 1 is secondary.

If you just want to read in an array of unaligned sequences, use 
SeqIO::fasta instead.  It doesn't really make much sense to use AlignIO 
for sequences that are not aligned ... conversely, if you *do* have 
aligned sequences in a multi-fasta file, then it does make sense to use 
AlignIO, and it also makes sense for AlignIO::fasta to end-pad sequences 
with gaps as necessary to get a fully valid, flush multiple sequence 
alignment matrix.
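
To make the padding behaviour concrete, here is a minimal sketch in plain
Perl. It does not use BioPerl and is not the AlignIO::fasta implementation;
pad_to_flush and is_flush below are hypothetical stand-ins that only model
"end-pad with gaps" and "all the same length".

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical helper (not BioPerl code): right-pad every sequence with '-'
# up to the length of the longest one.
sub pad_to_flush {
    my @seqs = @_;
    return () unless @seqs;
    my $width = max map { length($_) } @seqs;
    return map { $_ . ('-' x ($width - length($_))) } @seqs;
}

# Hypothetical helper: 1 if all strings have the same length, else 0.
sub is_flush {
    my @seqs = @_;
    return 1 unless @seqs;
    my $len = length($seqs[0]);
    for my $s (@seqs) {
        return 0 if length($s) != $len;
    }
    return 1;
}

my @unaligned = ('ATGCCGTA', 'ATGC', 'ATGCCG');
my @padded    = pad_to_flush(@unaligned);

print "raw flush:    ", is_flush(@unaligned), "\n";   # 0
print "padded flush: ", is_flush(@padded),    "\n";   # 1
print "$_\n" for @padded;   # ATGCCGTA, ATGC----, ATGCCG--
```

Once every sequence has been padded, the flush check can no longer tell
aligned input from unaligned input, which is why the check is only
informative on data that was not padded on the way in.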

-Aaron

bioperl-l-bounces at lists.open-bio.org wrote on 12/05/2007 08:53:59 AM:

> Yes; it's a convenient way to make sure all seqs have the same length 
> (including gaps).  Nice for checking when adding new seqs to an 
> alignment or building new parsers.
> 
> chris
> 
> On Dec 5, 2007, at 7:10 AM, Bernd Web wrote:
> 
> > Hi,
> >
> > SimpleAlign has an is_flush:
> > Function  : Tells you whether the alignment is flush, i.e. all of the
> > same length
> > Returns   : 1 or 0
> >
> > I noticed that a file with multiple fasta sequences of different
> > lengths has an is_flush value of 1. Printing the "alignment" shows
> > that sequences are padded with "-" so that they are all the same
> > length. Does this mean that is_flush for alignments read in via
> > AlignIO is always true, and thus not so useful?
> >
> > (using bioperl version: 1.005002102)
> >
> >
> > Regards,
> > Bernd
> 
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
> 


From cjfields at uiuc.edu  Wed Dec  5 11:22:01 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 5 Dec 2007 10:22:01 -0600
Subject: [Bioperl-l] SimpleAlign is_flush
In-Reply-To: <OF9A252048.A821FA6A-ON852573A8.0051A1FE-852573A8.0052140C@gsk.com>
References: <OF9A252048.A821FA6A-ON852573A8.0051A1FE-852573A8.0052140C@gsk.com>
Message-ID: <EC064917-220F-4579-8FA9-934026D7D105@uiuc.edu>

That's true.  I assumed Bernd's seqs were aligned.

chris

On Dec 5, 2007, at 8:56 AM, aaron.j.mackey at gsk.com wrote:

> Well, if you use AlignIO::fasta to read in a multi-fasta file of
> *unaligned* sequences, AlignIO::fasta makes the assumption that all of
> your sequences are aligned, and pads the ends of shorter sequences  
> with
> gap characters (essentially, enforcing a rather silly, yet valid
> alignment).  The fact that is_flush() then returns 1 is secondary.
>
> If you just want to read in an array of unaligned sequences, use
> SeqIO::fasta instead.  It doesn't really make much sense to use  
> AlignIO
> for sequences that are not aligned ... conversely, if you *do* have
> aligned sequences in a multi-fasta file, then it does make sense to  
> use
> AlignIO, and it also makes sense for AlignIO::fasta to end-pad  
> sequences
> with gaps as necessary to get a fully valid, flush multiple sequence
> alignment matrix.
>
> -Aaron

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From stefan.kirov at bms.com  Wed Dec  5 14:56:47 2007
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Wed, 05 Dec 2007 14:56:47 -0500
Subject: [Bioperl-l] PAML/Codeml parsing
In-Reply-To: <4756B494.7020100@bms.com>
References: <FD74BB3A-C8F7-453E-915E-FD5541CE59CB@bioperl.org>
	<4755A9A1.2040608@bms.com>
	<A9B43240-2601-4C3E-9870-F32A6918A657@bioperl.org>
	<4756B494.7020100@bms.com>
Message-ID: <4757027F.407@bms.com>

Here is a patch that seems to be working and does not break the existing
tests:

--- /home/kirovs/bioperl-live/Bio/Tools/Phylo/PAML.pm   2007-12-05 10:16:53.120720000 -0500
+++ /home/kirovs/bioperl/bioperl-live/Bio/Tools/Phylo/PAML.pm   2007-12-05 14:46:31.436278000 -0500
@@ -419,7 +419,10 @@
     # CODONML (in paml 3.12 February 2002)  <<-- what we want to see!
 
     my $SEQTYPES = qr( (?: (?: CODON | AA | BASE | CODON2AA ) ML ) | YN00 )x;
+    my $line;
+    $self->{'_already_parsed_seqs'}=$self->{'_already_parsed_seqs'}?1:0;
     while ($_ = $self->_readline) {
+           $line++;
        if ( m/^($SEQTYPES) \s+                      # seqtype: CODONML, AAML, BASEML, CODON2AAML, YN00, etc
               (?: \(in \s+ ([^\)]+?) \s* \) \s* )?  # version: "paml 3.12 February 2002"; not present < 3.1 or YN00
               (\S+) \s*                             # tree filename
@@ -436,8 +439,11 @@
        } elsif (m/^Data set \d$/) {
            $self->{'_summary'} = {};
            $self->{'_summary'}->{'multidata'}++;
-       } elsif( m/^Before\s+deleting\s+alignment\s+gaps/ ) {
-           my ($phylip_header) = $self->_readline;
+       }
+       elsif( m/^Before\s+deleting\s+alignment\s+gaps/ ) {#Gap
+               my ($phylip_header) = $self->_readline;
+               $self->_parse_seqs;
+       } elsif (($line>3)&&($self->{'_already_parsed_seqs'}!=1)) {#No gap
            $self->_parse_seqs;
        }
     }
@@ -681,7 +687,6 @@
 }
 
 sub _parse_seqs {
-
     # this should in fact be packed into a Bio::SimpleAlign object instead of
     # an array but we'll stay with this for now
     my ($self) = @_;


What this does is trigger sequence parsing once line 4 is reached if the
/Before.../ pattern has not been seen by then. Since phylip_header does
not seem to be used for anything, one could eliminate the first
sequence-parsing elsif completely (even though counting lines is not a
good thing).
Since I am not aware of all the consequences of changing the sequence
parsing, and I have no idea how extensive the tests are, I am not
committing anything, but feel free to use this if you wish.
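
The control flow of the patched loop can be modelled outside PAML.pm
roughly as follows. This is only a sketch: seq_parse_trigger is a
hypothetical function, it leaves out the SEQTYPES and "Data set" branches
that come first in the real loop, and it ignores the _already_parsed_seqs
flag. It just reports which branch would start sequence parsing.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-alone model of the patched loop (not the PAML.pm code).
# Returns a string naming the branch that would trigger sequence parsing.
sub seq_parse_trigger {
    my @lines   = @_;
    my $line_no = 0;
    for my $text (@lines) {
        $line_no++;
        if ( $text =~ /^Before\s+deleting\s+alignment\s+gaps/ ) {
            return "marker at line $line_no";      # alignment with gaps
        }
        elsif ( $line_no > 3 ) {
            return "fallback at line $line_no";    # gapless alignment
        }
    }
    return 'none';
}

# First lines of the two mlc files quoted in this thread.
my @gapless = ( '', 'seed used = 492211105', '      3    141', '',
                'ENSRNOE00000058637               GCG AGC AAG' );
my @gapped  = ( '', 'seed used = 492252697', '',
                'Before deleting alignment gaps', '      2    162' );

print seq_parse_trigger(@gapless), "\n";   # fallback at line 4
print seq_parse_trigger(@gapped),  "\n";   # marker at line 4
```

The fallback only fires at the right place because both sample headers
happen to be three lines long, so the sketch shares the fragility of the
line counting.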
Stefan

Stefan Kirov wrote:
> Jason,
> When there is a gapless alignment we have a differently formatted output
> from codeml:
> kirovs at horta:~/AESIG> head -n 10 feJRfxQl8D/mlc
>
> seed used = 492211105
>       3    141
>
> ENSRNOE00000058637               GCG AGC AAG TGT GAC AGC CAT GGC ACC CAC
> CTA GCA GGT GTG GTC AGC GGC CGG GAT GCT GGT GTG GCC AAG GGC ACC AGT CTG
> CAC AGT CTG CGT GTG CTC AAC TGT CAA GGG AAG GGC ACA GTC AGC GGC ACC CTC ATA
> ENSMUSE00000366347               GCG AGC AAG TGT GAC AGC CAC GGC ACC CAC
> CTG GCA GGT GTG GTC AGC GGC CGG GAT GCT GGT GTG GCC AAG GGC ACC AGC CTG
> CAC AGC CTG CGT GTG CTC AAC TGT CAA GGG AAG GGC ACA GTC AGC GGC ACC CTC ATA
> ENSE00001279150                  GCC AGC AAG TGT GAC AGT CAT GGC ACC CAC
> CTG GCA GGG GTG GTC AGC GGC CGG GAT GCC GGC GTG GCC AAG GGT GCC AGC ATG
> CGC AGC CTG CGC GTG CTC AAC TGC CAA GGG AAG GGC ACG GTT AGC GGC ACC CTC ATA
>
> And parsing this fails...
> The next one has gaps and works fine:
>
> kirovs at horta:~/AESIG> head -n 10 4z6ZX7s1B6/mlc
>
> seed used = 492252697
>
> Before deleting alignment gaps
>       2    162
>
> ENSMUSE00000460297               AAT ATC GAT ACA TTT TAC AAG GAG GCA GAA
> AAG AAG CTT ATA CAC GTG CTT GAG GGA GAC AGT CCC AAG TGG TCC ACA CCG AAC
> AAA GAC CCC ACC CGA GAG CCC CAT GCA GCC TCC ACT TGC TGT GCT TCA GAT CTC
> CTT GGT TCA GGA GGT CAG TTC CTG
> ENSE00000939192                  AAT ATT GAC ATA CTT TGC AAT GAA GCA GAA
> AAC AAG CTT ATG CAT ATA CTG CAT GCA AAT GAT CCC AAG TGG TCC ACC CCA ACT
> AAA GAC TGT ACT TCA GGG CCG TAC ACT GCT CAA ATC --- --- --- --- --- ATT
> CCT GGT ACA GGA AAC AAG CTT CTG
>
> I will send both whole files as an attachment with another mail (I do
> not know if these are going to pass through).
> My guess is that the whole _parse_summary method has to be re-worked as
> there is no tag to look for before the sequences start. Ugly.
> I am not sure what else could become broken if I try to fix it, so I
> will leave it to you.
> Stefan


From jason at bioperl.org  Wed Dec  5 15:01:29 2007
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 5 Dec 2007 12:01:29 -0800
Subject: [Bioperl-l] PAML/Codeml parsing
In-Reply-To: <4757027F.407@bms.com>
References: <FD74BB3A-C8F7-453E-915E-FD5541CE59CB@bioperl.org>
	<4755A9A1.2040608@bms.com>
	<A9B43240-2601-4C3E-9870-F32A6918A657@bioperl.org>
	<4756B494.7020100@bms.com> <4757027F.407@bms.com>
Message-ID: <8562ED51-7DEC-4EB2-AC3F-A14C6497E0A2@bioperl.org>

Sounds good - can you
- file it as a bug in bugzilla, with the patch and sample files attached
- commit the changes, and I'll test as well

thanks,
-j

On Dec 5, 2007, at 11:56 AM, Stefan Kirov wrote:

> Here is a patch that seems to be working and does not break the  
> existing
> tests:
>
> --- /home/kirovs/bioperl-live/Bio/Tools/Phylo/PAML.pm   2007-12-05
> 10:16:53.120720000 -0500
> +++ /home/kirovs/bioperl/bioperl-live/Bio/Tools/Phylo/PAML.pm
> 2007-12-05 14:46:31.436278000 -0500
> @@ -419,7 +419,10 @@
>      # CODONML (in paml 3.12 February 2002)  <<-- what we want to see!
>
>      my $SEQTYPES = qr( (?: (?: CODON | AA | BASE | CODON2AA ) ML ) |
> YN00 )x;
> +    my $line;
> +    $self->{'_already_parsed_seqs'}=$self-> 
> {'_already_parsed_seqs'}?1:0;
>      while ($_ = $self->_readline) {
> +           $line++;
>         if ( m/^($SEQTYPES) \s+                      # seqtype:  
> CODONML,
> AAML, BASEML, CODON2AAML, YN00, etc
>                (?: \(in \s+ ([^\)]+?) \s* \) \s* )?  # version: "paml
> 3.12 February 2002"; not present < 3.1 or YN00
>                (\S+) \s*                             # tree filename
> @@ -436,8 +439,11 @@
>         } elsif (m/^Data set \d$/) {
>             $self->{'_summary'} = {};
>             $self->{'_summary'}->{'multidata'}++;
> -       } elsif( m/^Before\s+deleting\s+alignment\s+gaps/ ) {
> -           my ($phylip_header) = $self->_readline;
> +       }
> +       elsif( m/^Before\s+deleting\s+alignment\s+gaps/ ) {#Gap
> +               my ($phylip_header) = $self->_readline;
> +               $self->_parse_seqs;
> +       } elsif (($line>3)&&($self->{'_already_parsed_seqs'}!=1))  
> {#No gap
>             $self->_parse_seqs;
>         }
>      }
> @@ -681,7 +687,6 @@
>  }
>
>  sub _parse_seqs {
> -
>      # this should in fact be packed into a Bio::SimpleAlign object
> instead of
>      # an array but we'll stay with this for now
>      my ($self) = @_;
>
>
> What this does is trigger sequence parsing if the /Before.../  
> pattern is
> not seen until line 4. Since phylip_header seems to be doing  
> nothing one
> could completely eliminate the first seq parse elsif (even though
> counting lines is not a good thing).
>  Since I am not aware of all consequences of changing the sequence
> parsing and I have no idea how extensive the tests are, I am not
> committing anything, but feel free to use that if you wish.
> Stefan


From stefan.kirov at bms.com  Wed Dec  5 15:33:47 2007
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Wed, 05 Dec 2007 15:33:47 -0500
Subject: [Bioperl-l] PAML/Codeml parsing
In-Reply-To: <8562ED51-7DEC-4EB2-AC3F-A14C6497E0A2@bioperl.org>
References: <FD74BB3A-C8F7-453E-915E-FD5541CE59CB@bioperl.org>
	<4755A9A1.2040608@bms.com>
	<A9B43240-2601-4C3E-9870-F32A6918A657@bioperl.org>
	<4756B494.7020100@bms.com> <4757027F.407@bms.com>
	<8562ED51-7DEC-4EB2-AC3F-A14C6497E0A2@bioperl.org>
Message-ID: <47570B2B.5090602@bms.com>

Done.

Jason Stajich wrote:
> sounds good - can you
> - make it as a bug with the patch and sample files in bugzilla
> - commit changes and I'll test as well
>
> thanks,
> -j


From bernd.web at gmail.com  Thu Dec  6 09:58:31 2007
From: bernd.web at gmail.com (Bernd Web)
Date: Thu, 6 Dec 2007 15:58:31 +0100
Subject: [Bioperl-l] graphics - Panel
Message-ID: <716af09c0712060658t5504b377ob2d46adb85754284@mail.gmail.com>

Hi,

For map, $segstart is available. It holds the leftmost start of the
feature (the left end of $ref displayed in the detailed view).
Is it also accessible in track coderefs?
I'd like to access it in add_track, like
  -bgcolor => sub {
      my $feature = shift;
      my $start   = $feature->segstart;
      ....
      do something with the segstart
  },

I realize I can add a -tag that holds the leftmost start of my
segmented feature and then read it back from $feature, but I wonder
whether $segstart can also be accessed in the coderef somehow.

Does anyone know?

Best regards,
Bernd


From georose at gmail.com  Thu Dec  6 10:28:24 2007
From: georose at gmail.com (geo rose)
Date: Thu, 6 Dec 2007 08:28:24 -0700
Subject: [Bioperl-l] getting sequences from external databank
Message-ID: <54da06110712060728m2532c177s8da4fa22e2aee1e6@mail.gmail.com>

Hi Bioperl,

In the past, I have been able to retrieve sequences from an external
databank, but my scripts are not working anymore.
I am afraid that I may have broken my Bioperl installation while updating my
Fedora7 machine with yum update.

Below is an example of what happens.

The script is from
http://www.faculty.uaf.edu/ffnt/teaching/programming/bioperl/node2.html and
it works.
(I used it on an older machine with Bioperl and MacOS Tiger)

__________________________________________________________________________________
#!/usr/bin/perl -w

use Bio::SeqIO;
use Bio::DB::GenBank;

my $genBank = Bio::DB::GenBank->new;  # This object knows how to talk to GenBank

my $seq = $genBank->get_Seq_by_acc('AF060485');  # get a record by accession


my $seqOut = Bio::SeqIO->new(-format => 'genbank');

$seqOut->write_seq($seq);


_________________________________________________________________________________________
This is the error I get
_________________________________________________________________________________________

[home at home Desktop]# perl final-seq-db-test1.pl
Bio::SeqIO: genbank cannot be found
Exception
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Failed to load module Bio::SeqIO::genbank. Weak references are not
implemented in the version of perl at
/usr/lib/perl5/site_perl/5.8.8/Bio/Species.pm line 91
BEGIN failed--compilation aborted at
/usr/lib/perl5/site_perl/5.8.8/Bio/Species.pm line 91.
Compilation failed in require at
/usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO/genbank.pm line 172.
BEGIN failed--compilation aborted at
/usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO/genbank.pm line 172.
Compilation failed in require at
/usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm line 425.

STACK: Error::throw
STACK: Bio::Root::Root::throw
/usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:359
STACK: Bio::Root::Root::_load_module
/usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:427
STACK: Bio::SeqIO::_load_format_module
/usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO.pm:555
STACK: Bio::SeqIO::new /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO.pm:376
STACK: Bio::DB::WebDBSeqI::get_seq_stream
/usr/lib/perl5/site_perl/5.8.8/Bio/DB/WebDBSeqI.pm:458
STACK: Bio::DB::NCBIHelper::get_Stream_by_acc
/usr/lib/perl5/site_perl/5.8.8/Bio/DB/NCBIHelper.pm:361
STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc
/usr/lib/perl5/site_perl/5.8.8/Bio/DB/WebDBSeqI.pm:172
STACK: final-seq-db-test1.pl:8
-----------------------------------------------------------

For more information about the SeqIO system please see the SeqIO docs.
This includes ways of checking for formats at compile time, not run time

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: acc AF060485 does not exist
STACK: Error::throw
STACK: Bio::Root::Root::throw
/usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:359
STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc
/usr/lib/perl5/site_perl/5.8.8/Bio/DB/WebDBSeqI.pm:173
STACK: final-seq-db-test1.pl:8
-----------------------------------------------------------
[home at home Desktop]# Use of uninitialized value in concatenation (.) or
string at /usr/lib64/perl5/5.8.8/x86_64-linux-thread-multi/Scalar/Util.pm
line 30.

[home at home Desktop]#


________________________________________________________________________________________


Before I mess things up further I thought I'd ask:
Can I fix this problem by reinstalling some part of Bioperl or Perl?

Thanks,

George


From barry.moore at genetics.utah.edu  Thu Dec  6 12:56:50 2007
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Thu, 6 Dec 2007 10:56:50 -0700
Subject: [Bioperl-l] getting sequences from external databank
In-Reply-To: <54da06110712060728m2532c177s8da4fa22e2aee1e6@mail.gmail.com>
References: <54da06110712060728m2532c177s8da4fa22e2aee1e6@mail.gmail.com>
Message-ID: <B13872F3-4591-4FB6-B057-9215C5DA9059@genetics.utah.edu>

George,

This is a hideous little bug in Red Hat/Fedora installations of
perl.  It has happened to me a couple of times on upgrades, but it is
always fixed with

perl -MCPAN -e shell
force install Scalar::Util

http://www.perlmonks.org/?node_id=460411
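
To check whether the reinstall took effect, something like the following
can be run before and after. It is a minimal sketch using only
Scalar::Util's weaken and isweak; weaken_works is a hypothetical helper
name, and I am assuming that a broken install makes the weaken call die,
as the "Weak references are not implemented" message suggests.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if Scalar::Util can create weak references
# on this perl, 0 if loading the module or calling weaken() fails.
sub weaken_works {
    my $ok = eval {
        require Scalar::Util;
        my $target = {};
        my $ref    = $target;
        Scalar::Util::weaken($ref);     # dies on a broken install
        Scalar::Util::isweak($ref);
    };
    return $ok ? 1 : 0;
}

my $status = weaken_works();
print $status
    ? "Scalar::Util weaken: OK\n"
    : "Scalar::Util weaken: not working, try reinstalling Scalar::Util\n";
```

If this still reports a problem after the forced install, perl may be
loading an older Scalar/Util.pm from a different directory in @INC.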

Barry

On Dec 6, 2007, at 8:28 AM, geo rose wrote:

> Hi Bioperl,
>
> In the past, I have been able to retrieve sequences from an external
> databank, but my scripts are not working anymore.
> I am afraid that I may have broken my Bioperl installation while  
> updating my
> Fedora7 machine with yum update.
>
> Below is an example of what happens.
>
> The script is from
> http://www.faculty.uaf.edu/ffnt/teaching/programming/bioperl/ 
> node2.html and
> it works.
> (I used it on an older machine with Bioperl and MacOS Tiger)
>
> ______________________________________________________________________ 
> ____________
> #!/usr/bin/perl -w
>
> use Bio::SeqIO;
> use Bio::DB::GenBank;
>
> $genBank = new Bio::DB::GenBank;  # This object knows how to talk  
> to GenBank
>
> my $seq = $genBank->get_Seq_by_acc('AF060485');  # get a record by  
> accession
>
>
> my $seqOut = new Bio::SeqIO(-format => 'genbank');
>
> $seqOut->write_seq($seq);
>
>
> ______________________________________________________________________ 
> ___________________
> This is the error I get
> ______________________________________________________________________ 
> ___________________
>
> [home at home Desktop]# perl final-seq-db-test1.pl
> Bio::SeqIO: genbank cannot be found
> Exception
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Failed to load module Bio::SeqIO::genbank. Weak references are  
> not
> implemented in the version of perl at
> /usr/lib/perl5/site_perl/5.8.8/Bio/Species.pm line 91
> BEGIN failed--compilation aborted at
> /usr/lib/perl5/site_perl/5.8.8/Bio/Species.pm line 91.
> Compilation failed in require at
> /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO/genbank.pm line 172.
> BEGIN failed--compilation aborted at
> /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO/genbank.pm line 172.
> Compilation failed in require at
> /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm line 425.
>
> STACK: Error::throw
> STACK: Bio::Root::Root::throw
> /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:359
> STACK: Bio::Root::Root::_load_module
> /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:427
> STACK: Bio::SeqIO::_load_format_module
> /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO.pm:555
> STACK: Bio::SeqIO::new /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO.pm:376
> STACK: Bio::DB::WebDBSeqI::get_seq_stream
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/WebDBSeqI.pm:458
> STACK: Bio::DB::NCBIHelper::get_Stream_by_acc
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/NCBIHelper.pm:361
> STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/WebDBSeqI.pm:172
> STACK: final-seq-db-test1.pl:8
> -----------------------------------------------------------
>
> For more information about the SeqIO system please see the SeqIO docs.
> This includes ways of checking for formats at compile time, not run  
> time
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: acc AF060485 does not exist
> STACK: Error::throw
> STACK: Bio::Root::Root::throw
> /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:359
> STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/WebDBSeqI.pm:173
> STACK: final-seq-db-test1.pl:8
> -----------------------------------------------------------
> [home at home Desktop]# Use of uninitialized value in concatenation  
> (.) or
> string at /usr/lib64/perl5/5.8.8/x86_64-linux-thread-multi/Scalar/ 
> Util.pm
> line 30.
>
> [home at home Desktop]#
>
>
> ______________________________________________________________________ 
> __________________
>
>
> Before I mess things up further I thought I'd ask:
> Can I fix this problem by reinstalling some part of Bioperl or Perl?
>
> Thanks,
>
> George
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From torsten.seemann at infotech.monash.edu.au  Thu Dec  6 18:58:02 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Fri, 7 Dec 2007 10:58:02 +1100
Subject: [Bioperl-l] [StandAloneBLAST] Use more than one CPU + avoid
	BLAST reload
In-Reply-To: <47545590.1000703@boekhoff.info>
References: <47545590.1000703@boekhoff.info>
Message-ID: <a79f6a4b0712061558m663fd1ces6bba9ae9d5602d67@mail.gmail.com>

Sven,

> I just started working with Perl and BioPerl. I'm quite impressed by what
> can be easily done with this module. Today I found that my second CPU
> is not used, but the first one runs at 100%. I tried to include the
> "-a" parameter, but I was not successful:

My experience agrees with you, in that "-a" does not seem to work with
the pre-compiled BLAST binaries you get from NCBI on a multi-core
system.

I'm not sure why, as "ldd blastall" shows it links against
"/lib64/tls/libpthread.so.0".

Any others have any ideas?

-- 
--Torsten Seemann
--Victorian Bioinformatics Consortium, Monash University
--Tel +61 3 9905 9010


From lzhtom at hotmail.com  Thu Dec  6 23:25:42 2007
From: lzhtom at hotmail.com (zhihuali)
Date: Fri, 7 Dec 2007 04:25:42 +0000
Subject: [Bioperl-l] How to retrieve a persistent object by bioperl-db?
Message-ID: <BAY110-W786D73A90FA1B632776A9C7680@phx.gbl>


Hi netters,
 
I've installed BioSQL and bioperl-db, and successfully created and stored a persistent object:
 
use strict;
use warnings;
use Bio::Seq;
use Bio::DB::BioDB;

my $dbadp = Bio::DB::BioDB->new(-database => 'biosql',
                                -user     => 'annoymous',
                                -dbname   => 'bioseqdb');

my $seqobj = Bio::Seq->new(-accession_number => "test",
                           -id               => "test1",
                           -seq              => "AGCTAGCT",
                           -version          => 1);
my $dbobj = $dbadp->create_persistent($seqobj);
$dbobj->create;
$dbobj->commit;
 
It's successful because I found corresponding rows in the bioseqdb tables.
 
Now I want to retrieve the object back from the database. There's not much documentation available, and I've tried find_by_unique_key/primary_key, but all attempts failed. Maybe I didn't use them correctly. Could anyone give me an example of how to retrieve the stored Bio::Seq object?
 
Thanks a lot!
 
Zhihua Li


From Marc.Logghe at ablynx.com  Fri Dec  7 03:33:17 2007
From: Marc.Logghe at ablynx.com (Marc Logghe)
Date: Fri, 7 Dec 2007 09:33:17 +0100
Subject: [Bioperl-l] How to retrieve a persistent object by bioperl-db?
In-Reply-To: <BAY110-W786D73A90FA1B632776A9C7680@phx.gbl>
Message-ID: <03C512635899144083CADB0EE222018901216FA5@alpaca.lan.ablynx.com>

Hi,
Hilmar's BOSC presentation is a very good place to start.
Have a look at http://www.open-bio.org/bosc2003/slides/Persistent_Bioperl_BOSC03.pdf
Slide 18 for instance.
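
As far as I remember, the lookup comes down to building a template
object that carries only the unique-key attributes and handing it to the
object adaptor. The sketch below is written from memory and not tested
against your database. It assumes the connection settings from your
post and that the sequence was stored without an explicit namespace;
fetch_stored_seq() is a made-up helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch only: look up a stored sequence by accession and version.
# fetch_stored_seq is a hypothetical helper, not part of bioperl-db.
sub fetch_stored_seq {
    my ($accession, $version) = @_;
    require Bio::Seq;
    require Bio::DB::BioDB;

    # Connection settings copied from the original post (assumption:
    # they are still valid for the database holding the sequence).
    my $dbadp = Bio::DB::BioDB->new(-database => 'biosql',
                                    -user     => 'annoymous',
                                    -dbname   => 'bioseqdb');

    # Template object carrying only the unique-key attributes
    my $template = Bio::Seq->new(-accession_number => $accession,
                                 -version          => $version);

    my $adaptor = $dbadp->get_object_adaptor($template);

    # Should return a persistent Bio::Seq, or undef if nothing matches
    return $adaptor->find_by_unique_key($template);
}

# Usage: script.pl ACCESSION [VERSION]
if (@ARGV) {
    my $dbseq = fetch_stored_seq($ARGV[0], $ARGV[1] || 1)
        or die "No entry found for $ARGV[0]\n";
    print $dbseq->display_id, "\t", $dbseq->seq, "\n";
}
```

If the entry was stored with a namespace, I would expect the template
to need the same -namespace value for the lookup to match.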
Regards,
Marc
 

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of zhihuali
> Sent: vrijdag 7 december 2007 5:26
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] How to retrieve a persistent object by bioperl-db?
> 
> 
> Hi netters,
> 
> I've installed BioSQL and bioperl-db, and successfully created and stored
> a persistent object:
> 
> use strict;
> use warnings;
> use Bio::Seq;
> use Bio::DB::BioDB;
>
> my $dbadp = Bio::DB::BioDB->new(-database => 'biosql',
>                                 -user     => 'annoymous',
>                                 -dbname   => 'bioseqdb');
>
> my $seqobj = Bio::Seq->new(-accession_number => "test",
>                            -id               => "test1",
>                            -seq              => "AGCTAGCT",
>                            -version          => 1);
> my $dbobj = $dbadp->create_persistent($seqobj);
> $dbobj->create;
> $dbobj->commit;
> 
> It's successful because I found corresponding rows in the bioseqdb tables.
> 
> Now I want to retrieve the object back from the database. There's not much
> documentation available, and I've tried find_by_unique_key/primary_key, but
> all attempts failed. Maybe I didn't use them correctly. Could anyone give me
> an example of how to retrieve the stored Bio::Seq object?
> 
> Thanks a lot!
> 
> Zhihua Li


From avilella at gmail.com  Fri Dec  7 05:32:43 2007
From: avilella at gmail.com (Albert Vilella)
Date: Fri, 7 Dec 2007 10:32:43 +0000
Subject: [Bioperl-l] Query about Hyphy wrapper module "SLAC.pm"
In-Reply-To: <OFBA70B0CA.66F02D44-ONC12573A8.003FB7B7-C12573A8.003FB7C0@sh.se>
References: <OFBA70B0CA.66F02D44-ONC12573A8.003FB7B7-C12573A8.003FB7C0@sh.se>
Message-ID: <358f4d650712070232s3d9ed27xf1c5f17e2985bd90@mail.gmail.com>

Hi Johan,

It would be great if you could upload an example reproducible case:

http://bugzilla.open-bio.org/enter_bug.cgi?product=Bioperl

Maybe simply doing a tar.gz of the directory with the sample files and
the script, and a simple
explanation on how to run it. If you have any special "env" vars
regarding tmp files, could you
specify those as well?

Thanks,

    Albert.

On Dec 5, 2007 11:35 AM, Johan Nilsson <johan.nilsson at sh.se> wrote:
>
> Hello,
>
> I have a bunch of multiple sequence alignments of protein coding genes,
> which I would like to analyse with the SLAC method of the HyPhy package. I
> tried using the SLAC.pm module in bioperl-run, but I could not get it to
> work properly.
>
> Basically, for each MSA file, I create the Bio::Tree::Tree and
> Bio::SimpleAlign objects ($tree and $aln, respectively) required as
> arguments to SLAC, and call the method with: "($rc,$result) =
> $slac->run($aln,$tree)" in a loop procedure in my script.
>
> When I choose not to save the tmp files (the default option in SLAC.pm),
> the program complains that it cannot find the file
> "$whatevertmpdir/wrapper.bf", and returns $rc=0 for all but the first MSA
> (which works fine). Apparently, it looks for the wrapper.bf file in the
> first tmp dir created, which is deleted in the end of the first SLAC call.
>
> If instead I choose to save the tempfiles ($slac->save_tempfiles('TRUE')),
> all calls to SLAC give returncode 1, and no error message is received.
> However, when I look at the resulting $result hashref, it turns out that
> all results are for the FIRST alignment read. I've made sure there is
> nothing strange with my loop procedure, and I checked that the tree and
> alignment objects look OK for each MSA. Apparently, it does create new
> "results.tsv" files in the tmp directory after each run, but it is
> identical each time it's created. Also, it only creates ONE tmp directory,
> no matter how many times SLAC is executed (I would imagine it was supposed
> to save each result in separate tmp dirs?)
>
> Thus, it seems to me like the errors occur because something goes wrong in
> the creation of temporary files. Have I done something wrong here, or have
> any other of you experienced the same problem?
>
> Best regards
> /Johan
>
>
> --
> Johan Nilsson, Ph.D.
> School of Life Sciences
> Södertörns University College
> S-141 89 Huddinge, Sweden
> E-mail: johan.nilsson at sh.se
> Phone: +46 8 608 47 05, +46 70 456 10 51
>
>


From J.Hane at murdoch.edu.au  Mon Dec 10 02:31:17 2007
From: J.Hane at murdoch.edu.au (James Hane)
Date: Mon, 10 Dec 2007 16:31:17 +0900
Subject: [Bioperl-l] Compiling bioperl with perl2exe for win32
In-Reply-To: <mailman.6533.1196225860.2694.bioperl-l@lists.open-bio.org>
References: <mailman.6533.1196225860.2694.bioperl-l@lists.open-bio.org>
Message-ID: <477A8450F426E34DBD5B2E7C6FA82D54B59489@PLUTO.ad.murdoch.edu.au>

I've been trying to compile some BioPerl-based scripts for win32 using
perl2exe, which has worked out really well, except that I cannot
compile Align::IO, Bio::Location::Simple or Bio::Location::Atomic
despite telling perl2exe to include them.  Does anyone have any
suggestions on how to get these to compile?


From Kevin.M.Brown at asu.edu  Mon Dec 10 10:34:35 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Mon, 10 Dec 2007 08:34:35 -0700
Subject: [Bioperl-l] Compiling bioperl with perl2exe for win32
In-Reply-To: <477A8450F426E34DBD5B2E7C6FA82D54B59489@PLUTO.ad.murdoch.edu.au>
References: <mailman.6533.1196225860.2694.bioperl-l@lists.open-bio.org>
	<477A8450F426E34DBD5B2E7C6FA82D54B59489@PLUTO.ad.murdoch.edu.au>
Message-ID: <1A4207F8295607498283FE9E93B775B4041D0B82@EX02.asurite.ad.asu.edu>

I use PAR to create exe's for windows users and it works fine with
bioperl. 

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of James Hane
> Sent: Monday, December 10, 2007 12:31 AM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Compiling bioperl with perl2exe for win32
> 
> I've been trying to compile some bioperl based scripts for win32 using
> perl2exe which have worked out really well - except I've noticed I
> cannot compile Align::IO, Bio::Location::Simple or 
> Bio::Location::Atomic
> despite requiring perl2exe to include them.  Anyone have any 
> suggestions
> how to get these to compile?
> 


From Kevin.M.Brown at asu.edu  Mon Dec 10 13:23:01 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Mon, 10 Dec 2007 11:23:01 -0700
Subject: [Bioperl-l] [StandAloneBLAST] Use more than one CPU +
	avoidBLAST reload
In-Reply-To: <a79f6a4b0712061558m663fd1ces6bba9ae9d5602d67@mail.gmail.com>
References: <47545590.1000703@boekhoff.info>
	<a79f6a4b0712061558m663fd1ces6bba9ae9d5602d67@mail.gmail.com>
Message-ID: <1A4207F8295607498283FE9E93B775B4041D0CAD@EX02.asurite.ad.asu.edu>

I use the -a option with blast all the time and it works, even on
multicore systems. 

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Torsten Seemann
> Sent: Thursday, December 06, 2007 4:58 PM
> To: Sven Boekhoff
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] [StandAloneBLAST] Use more than one 
> CPU + avoidBLAST reload
> 
> Sven,
> 
> > I just started working with Perl and BioPerl. I'm quite impressed by what
> > can be easily done with this module. Today I found that my second CPU
> > is not used, but the first one runs at 100%. I tried to include the
> > "-a" parameter, but I was not successful:
> 
> My experience agrees with you, in that "-a" does not seem to work with
> the pre-compiled BLAST binaries you get from NCBI on a multi-core
> system.
> 
> I'm not sure why, as "ldd blastall" shows it links against
> "/lib64/tls/libpthread.so.0".
> 
> Any others have any ideas?
> 
> -- 
> --Torsten Seemann
> --Victorian Bioinformatics Consortium, Monash University
> --Tel +61 3 9905 9010


From nadav.denekamp at gmail.com  Wed Dec 12 08:29:18 2007
From: nadav.denekamp at gmail.com (Nadav Y. Denekamp)
Date: Wed, 12 Dec 2007 15:29:18 +0200
Subject: [Bioperl-l] Fetch sequences from a fasta file using a list of
	idenifiers
Message-ID: <001101c83cc3$00aa28e0$5b00000a@ESTHERLAB2>

Hello,

I am trying to retrieve a list of sequences from an indexed flat FASTA file. I tried to use the script bp_fetch.pl, but I could only retrieve one sequence for one identifier. I am looking for a way to provide a list of accession numbers to a script and retrieve the sequences. I don't have much experience with Perl, so I apologize if this question is very basic.
Thanks - Nadav


------------------------------------------------------------------------------------------------------------
Nadav Y. Denekamp, Ph.D.,
Israel Oceanographic and Limnological Research,
National Institute for Oceanography 
Tel-Shikmona, Haifa, 31080.
Tel: 972-4-8565259
Fax: 972-4-8511911
mobile: 972-50-2167318
Skype: nadavden
Email: nadavd at ocean.org.il; nadav.denekamp at gmail.com;

Visit the "Sleeping Beauty" website:
http://www.gmm.gu.se/SB


From biojoiner at gmail.com  Wed Dec 12 08:06:42 2007
From: biojoiner at gmail.com (=?GB2312?B?s8y35Q==?=)
Date: Wed, 12 Dec 2007 21:06:42 +0800
Subject: [Bioperl-l] problem_About_Bioperl_Installation
Message-ID: <e1a861900712120506y2120c90bp648b56d876d1849f@mail.gmail.com>

Dear Admin:

    I have a computer with no network access, but I want to install
BioPerl on it.
    All the installation methods I found need a network connection to CPAN
to get the required packages. Is there a complete installation package that
I can use on a network-isolated computer, or some other way to solve the
problem?
    I look forward to your kind answer.
     Thanks very much!

-- 

============================================================
Feng Cheng

Division of HapMap Project
Beijing Institute of Genomics, Chinese Academy of Sciences (CAS)
Beijing Airport Industrial Zone B-6, Beijing, 101318, China
Tel: +86-10-80481102/1176
E-mail: chengf at genomics.org.cn
http://www.big.ac.cn/
============================================================


From avilella at gmail.com  Wed Dec 12 09:50:16 2007
From: avilella at gmail.com (Albert Vilella)
Date: Wed, 12 Dec 2007 14:50:16 +0000
Subject: [Bioperl-l] problem_About_Bioperl_Installation
In-Reply-To: <e1a861900712120506y2120c90bp648b56d876d1849f@mail.gmail.com>
References: <e1a861900712120506y2120c90bp648b56d876d1849f@mail.gmail.com>
Message-ID: <358f4d650712120650u2ef40089ofe27725ea8497dd7@mail.gmail.com>

You can also download the tar.gz packages from the bioperl.org
website, and copy them to the computer. Then unpack
the tar.gzs, and update your PERL5LIB env var.
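
If you would rather not touch the environment, the same thing can be
done from inside a script with "use lib". A minimal sketch, assuming the
tarball was unpacked to /opt/bioperl/bioperl-live (a made-up path, so
adjust it to wherever you copied the files):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical location of the unpacked BioPerl tarball; adjust as needed.
my $bioperl_dir;
BEGIN { $bioperl_dir = '/opt/bioperl/bioperl-live' }

# "use lib" prepends the directory to @INC at compile time, which has
# the same effect as listing it in PERL5LIB.
use lib $bioperl_dir;

# Report whether Bio::Seq can now be located. It will only be found
# once the tarball really is unpacked at the path above.
my ($found) = grep { -e "$_/Bio/Seq.pm" } @INC;
print defined $found
    ? "Bio::Seq found under $found\n"
    : "Bio::Seq not found in \@INC\n";
```

Note that BioPerl's own prerequisites from CPAN would have to be copied
over and unpacked in the same way.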

On Dec 12, 2007 1:06 PM, Feng Cheng <biojoiner at gmail.com> wrote:
> Dear Admin:
>
>     I have a computer with no network access, but I want to install
> BioPerl on it.
>     All the installation methods I found need a network connection to CPAN
> to get the required packages. Is there a complete installation package that
> I can use on a network-isolated computer, or some other way to solve the
> problem?
>     I look forward to your kind answer.
>      Thanks very much!
>
> --
>
> ============================================================
> Feng Cheng
>
> Division of HapMap Project
> Beijing Institute of Genomics, Chinese Academy of Sciences (CAS)
> Beijing Airport Industrial Zone B-6, Beijing, 101318, China
> Tel: +86-10-80481102/1176
> E-mail: chengf at genomics.org.cn
> http://www.big.ac.cn/
> ============================================================
>
>


From cjfields at uiuc.edu  Wed Dec 12 10:22:45 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 12 Dec 2007 09:22:45 -0600
Subject: [Bioperl-l] Fetch sequences from a fasta file using a list of
	idenifiers
In-Reply-To: <001101c83cc3$00aa28e0$5b00000a@ESTHERLAB2>
References: <001101c83cc3$00aa28e0$5b00000a@ESTHERLAB2>
Message-ID: <E95ADE14-FF71-4068-B958-60BD1EEEBF3C@uiuc.edu>

If you use Bio::Index::Fasta (which is what bp_index.pl uses for FASTA  
files) then you can write up your own script.  From 'perldoc  
Bio::Index::Fasta':

# Once the index is made it can be accessed, either in the
# same script or a different one
use Bio::Index::Fasta;
use strict;

my $Index_File_Name = shift;
my $inx = Bio::Index::Fasta->new(-filename => $Index_File_Name);
my $out = Bio::SeqIO->new(-format => 'Fasta',
                           -fh => \*STDOUT);

foreach my $id (@ARGV) {
     my $seq = $inx->fetch($id); # Returns Bio::Seq object
     $out->write_seq($seq);
}

# or, alternatively
my $id;
my $seq = $inx->get_Seq_by_id($id); # identical to fetch()


....
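
Since you asked about a list of accessions, here is one way the snippet
above might be adapted to read the identifiers from a file instead of
the command line. Treat it as a sketch: the one-ID-per-line list format,
the read_ids() helper and the script name are my assumptions, not
something bp_fetch.pl or the perldoc provides.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read identifiers from a filehandle, one per line. Blank lines and
# lines starting with '#' are skipped. (Hypothetical helper.)
sub read_ids {
    my ($fh) = @_;
    my @ids;
    while (my $line = <$fh>) {
        $line =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
        next if $line eq '' or $line =~ /^#/;
        push @ids, $line;
    }
    return @ids;
}

# Usage (hypothetical script name):
#   fetch_list.pl index_file id_list.txt > out.fasta
if (@ARGV == 2) {
    require Bio::Index::Fasta;
    require Bio::SeqIO;

    my ($index_file, $list_file) = @ARGV;
    open my $fh, '<', $list_file or die "Cannot open $list_file: $!";
    my @ids = read_ids($fh);
    close $fh;

    my $inx = Bio::Index::Fasta->new(-filename => $index_file);
    my $out = Bio::SeqIO->new(-format => 'Fasta', -fh => \*STDOUT);

    for my $id (@ids) {
        my $seq = $inx->fetch($id);
        if ($seq) { $out->write_seq($seq) }
        else      { warn "No sequence found for '$id'\n" }
    }
}
```

The index file is the one you already built for bp_fetch.pl; IDs that
are not in the index are reported on STDERR and skipped.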

chris

On Dec 12, 2007, at 7:29 AM, Nadav Y. Denekamp wrote:

> Hello,
>
> I am trying to retrieve a list of sequences from an indexed flat
> FASTA file. I tried to use the script bp_fetch.pl but I could only
> retrieve one sequence for one identifier. I am looking for a way to
> provide a list of accession numbers to a script and to retrieve the
> sequences. I don't have much experience with perl so I apologize if
> this question is very basic
> thanks - Nadav
>
>
> ------------------------------------------------------------------------------------------------------------
> Nadav Y. Denekamp, Ph.D.,
> Israel Oceanographic and Limnological Research,
> National Institute for Oceanography
> Tel-Shikmona, Haifa, 31080.
> Tel: 972-4-8565259
> Fax: 972-4-8511911
> mobile: 972-50-2167318
> Skype: nadavden
> Email: nadavd at ocean.org.il; nadav.denekamp at gmail.com;
>
> Visit the "Sleeping Beauty" website:
> http://www.gmm.gu.se/SB
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From karchana at ibab.ac.in  Thu Dec 13 22:56:14 2007
From: karchana at ibab.ac.in (Information_details)
Date: Thu, 13 Dec 2007 19:56:14 -0800 (PST)
Subject: [Bioperl-l]  How to get the contents?
Message-ID: <14329679.post@talk.nabble.com>


Hi,

I am new to bioperl.

I am using module  Bio::SeqIO;

I have genbank file. http://www.nabble.com/file/p14329679/seq.gb seq.gb 

In this file I need to match the gene tag and get all its contents.

Which function should I use?

The gene portion look like this

     gene            1..485
                     /gene="PRM1"
                     /note="Derived by automated computational analysis using
                     gene prediction method: BestRefseq. Supporting evidence
                     includes similarity to: 1 mRNA"
                     /db_xref="GeneID:5619"
                     /db_xref="HGNC:9447"

i have to match gene tag and get its contents?

[CODE]
$seq=$seqobj->next_seq();

foreach $feat ($seq->get_all_SeqFeatures())
 {
        if($feat->primary_tag eq "mRNA")
        {
                foreach $tag ($feat->get_all_tags())
                {
                        if($tag eq "gene")
                        {
                            # here I have to retrieve information like this:
                            #   1..485
                            #   /gene="PRM1"
                        }
                 }
         }
 }
[/CODE]
How do i do that?  

with regards
Archana


-- 
View this message in context: http://www.nabble.com/How-to-get-the-contents--tp14329679p14329679.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From mike.thon at gmail.com  Fri Dec 14 12:41:44 2007
From: mike.thon at gmail.com (Michael Thon)
Date: Fri, 14 Dec 2007 18:41:44 +0100
Subject: [Bioperl-l] How to get the contents?
In-Reply-To: <14329679.post@talk.nabble.com>
References: <14329679.post@talk.nabble.com>
Message-ID: <9F93893E-182A-4A5F-B27C-089521CAA355@gmail.com>

Hi Information_details, a.k.a. Archana :)

"1", and "485" can be retrieved with something like:

$feat->start();
$feat->end();

if you want start and end of each exon then you need:
my $location = $feat->location();

which returns a Bio::LocationI object.

I think the 'gene' tag is a tag-value pair that  can be retrieved with:

my @values = $feat->get_tag_values("gene");
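
Putting those calls together, something like the following should print
the location and the gene name for each 'gene' feature. It is a rough
sketch: format_feature() is a helper I made up for the output, and note
that it tests primary_tag against "gene" rather than "mRNA", since the
block you pasted is a gene feature.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Format a feature roughly as it appears in the flat file:
# "start..end" followed by one /tag="value" line per value.
# (Hypothetical helper, for illustration only.)
sub format_feature {
    my ($start, $end, $tag, @values) = @_;
    return join("\n", "$start..$end", map { qq{/$tag="$_"} } @values) . "\n";
}

# Usage: script.pl seq.gb
if (@ARGV) {
    require Bio::SeqIO;
    my $in = Bio::SeqIO->new(-file => $ARGV[0], -format => 'genbank');

    while (my $seq = $in->next_seq) {
        for my $feat ($seq->get_all_SeqFeatures) {
            next unless $feat->primary_tag eq 'gene';
            next unless $feat->has_tag('gene');   # avoid an exception on features without the tag
            print format_feature($feat->start, $feat->end,
                                 'gene', $feat->get_tag_values('gene'));
        }
    }
}
```

The same loop works for the other tags (note, db_xref) by changing the
tag name passed to get_tag_values().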

-Mike


On Dec 14, 2007, at 4:56 AM, Information_details wrote:

>
> Hi,
>
> I am new to bioperl.
>
> I am using module  Bio::SeqIO;
>
> I have genbank file. http://www.nabble.com/file/p14329679/seq.gb  
> seq.gb
>
> In this file i have to match gene tag and get all its contents.
>
> which function i have to use?
>
> The gene portion look like this
>
> gene            1..485
>                     /gene="PRM1"
>                     /note="Derived by automated computational analysis
> using
>                     gene prediction method: BestRefseq. Supporting  
> evidence
>                     includes similarity to: 1 mRNA"
>                     /db_xref="GeneID:5619"
>                     /db_xref="HGNC:9447"
>
> i have to match gene tag and get its contents?
>
> [CODE]
> $seq=$seqobj->next_seq();
>
> foreach $feat ($seq->get_all_SeqFeatures())
> {
>        if($feat->primary_tag eq "mRNA")
>        {
>                foreach $tag ($feat->get_all_tags())
>                {
>                        if($tag eq "gene")
>                        {
>                            #here i have to retrieve the information  
> like
> this.
>                           1..485
>                         /gene="PRM1"
>                        }
>                 }
>         }
> [/CODE]
> How do i do that?
>
> with regards
> Archana
>
>
>
>
> -- 
> View this message in context: http://www.nabble.com/How-to-get-the-contents--tp14329679p14329679.html
> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>


From cjfields at uiuc.edu  Sat Dec 15 10:15:00 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 15 Dec 2007 09:15:00 -0600
Subject: [Bioperl-l] [ANNOUNCEMENT] CVS freeze
Message-ID: <9FE0873D-E009-42E6-B37A-32584655ED06@uiuc.edu>

All,

We are in the midst of switching over BioPerl from CVS to SVN.  We are  
tentatively freezing the bioperl CVS repository Dec. 19 in order to  
prepare for the switch.  At that time we plan on building and setting  
up the SVN repository, running some remedial tests (commit messages,  
etc), then announcing the switch on the list.  Soon after we will try  
getting a sync'ed read-only CVS set up for legacy purposes.

If anyone has any commits to add to the repository we suggest making  
them as soon as possible.

chris


From margots at mail.nih.gov  Tue Dec 18 10:00:11 2007
From: margots at mail.nih.gov (Margot Sunshine)
Date: Tue, 18 Dec 2007 15:00:11 +0000 (UTC)
Subject: [Bioperl-l] bio-perl cvs freeze
Message-ID: <loom.20071218T145502-552@post.gmane.org>

Hi,

I have been trying to checkout bio-perl from cvs since yesterday afternoon 
(Dec 17). My request just hangs. I can login but I cannot checkout anything. 
My reading of your posting of the planned switch from CVS to SVN seemed to 
indicate that this was not to take place until tomorrow. Help!

Thanks,
Margot Sunshine


From ste.ghi at libero.it  Tue Dec 18 13:04:21 2007
From: ste.ghi at libero.it (Stefano Ghignone)
Date: Tue, 18 Dec 2007 19:04:21 +0100
Subject: [Bioperl-l] dealing with large files
Message-ID: <JT9BJ9$83113BA715E1F7CF3B9D29BFFCC4B0CF@libero.it>

Dear all,
  I'm facing a really annoying problem with large-file handling.
I wrote a script (below) which should take sequences from an EMBL-formatted file and write them out in a customized FASTA format. The script works, but since the input file is rather big (5.6 GB unzipped, 987 MB zipped), after a while all the physical and virtual memory of my workstation (4 GB RAM) is filled and the script is killed...
I really don't know how to avoid this huge memory usage...and now I'm wondering if this is the right approach....
Please help me!
Best wishes,
Stefano 


#################
#!/usr/bin/perl -w

use strict;

use warnings;

use Fcntl;
use Cwd;

use Bio::SeqIO;

my $infile = $ARGV[0];
my $outfile = "$ARGV[0].fasta";
my $organism;
my $count;
my $path = cwd()."/$outfile";

print "Working dir is: ".cwd().".\nCreating file: $path\n";

my $in  = Bio::SeqIO->new(-file => "/bin/gunzip -c $infile |", -format => 'EMBL');

while ( my $seq = $in->next_seq() ) {
	sysopen(TO, $path, O_WRONLY | O_APPEND | O_CREAT);  
	my $id = $seq->accession_number();	
	my $desc = $seq->desc(); chop $desc;
	my $species = $seq->species->binomial();
	my $subspecies = $seq->species->sub_species();
	if ($seq->species->sub_species()) {chop $subspecies; $organism = $species." ".$subspecies;}
		else {$organism = $species;}
	my $sequence = $seq->seq();
	print TO ">$id $desc [$organism]\n$sequence\n";
    	$count++;
	warn $@ if $@;
	close TO;
}

print "Done!\n\t$count sequences have been treated. The file $ARGV[0].fasta is ready.\n";


From jason at bioperl.org  Tue Dec 18 13:22:07 2007
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 18 Dec 2007 10:22:07 -0800
Subject: [Bioperl-l] bio-perl cvs freeze
In-Reply-To: <loom.20071218T145502-552@post.gmane.org>
References: <loom.20071218T145502-552@post.gmane.org>
Message-ID: <681FB463-13A5-4B35-923B-29A91F07D72B@bioperl.org>

Margot -
The code freeze won't affect the anonymous cvs, and we'll likely
keep anonymous CVS as is (and maybe even figure out how to keep it
updated with the SVN) since external tools depend on it and have
published CVS instructions.

I was able to do an anonymous checkout fine on my machine just now --  
if the problem persists please send a message to support at open-bio.org  
and the support volunteers will track it from there.

-jason
On Dec 18, 2007, at 7:00 AM, Margot Sunshine wrote:

> Hi,
>
> I have been trying to checkout bio-perl from cvs since yesterday  
> afternoon
> (Dec 17). My request just hangs. I can login but I cannot checkout  
> anything.
> My reading of your posting of the planned switch from CVS to SVN  
> seemed to
> indicate that this was not to take place until tomorrow. Help!
>
> Thanks,
> Margot Sunshine
>


From jason at bioperl.org  Tue Dec 18 13:31:39 2007
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 18 Dec 2007 10:31:39 -0800
Subject: [Bioperl-l] dealing with large files
In-Reply-To: <JT9BJ9$83113BA715E1F7CF3B9D29BFFCC4B0CF@libero.it>
References: <JT9BJ9$83113BA715E1F7CF3B9D29BFFCC4B0CF@libero.it>
Message-ID: <9CCCD509-EAFA-4528-B045-90910E19B41F@bioperl.org>

Not exactly clear why you aren't using Bio::SeqIO to write the  
sequence back out in FASTA format and why you are re-opening the file  
each time?

Did you look at the examples that show how to convert file formats?
http://bioperl.org/wiki/HOWTO:SeqIO

You can set the description with
$seq->description($newdescription);
and the ID with
$seq->display_id($newid);
before writing.
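
Roughly, I would expect the conversion to look like the sketch below,
with the output stream opened once outside the loop. The
fasta_description() helper is made up for illustration, and I am
assuming the chop calls in your script were there to drop a trailing
period, so check that against your data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the customized description "<description> [<organism>]".
# A trailing period on the description is removed (assumption about
# what the chop in the original script was for).
sub fasta_description {
    my ($desc, $organism) = @_;
    $desc = '' unless defined $desc;
    $desc =~ s/\.\s*$//;
    return (defined $organism && length $organism)
        ? "$desc [$organism]"
        : $desc;
}

# Usage: script.pl input.embl.gz
if (@ARGV) {
    require Bio::SeqIO;
    my $infile = $ARGV[0];

    my $in  = Bio::SeqIO->new(-file   => "gunzip -c $infile |",
                              -format => 'EMBL');
    # Opened once; every record is written through the same stream
    my $out = Bio::SeqIO->new(-file   => ">$infile.fasta",
                              -format => 'fasta');

    my $count = 0;
    while (my $seq = $in->next_seq) {
        my $organism;
        if (my $species = $seq->species) {
            $organism = $species->binomial;
            my $sub = $species->sub_species;
            $organism .= " $sub" if $sub;
        }
        $seq->display_id($seq->accession_number);
        $seq->desc(fasta_description($seq->desc, $organism));
        $out->write_seq($seq);
        $count++;
    }
    print "Done: $count sequences written to $infile.fasta\n";
}
```

This does not explain the memory growth by itself, but it removes the
repeated sysopen/close and makes it easier to see where the memory goes.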

It isn't clear to me from your code why it would be leaking memory  
and causing a problem - is it possible that you have a huge sequence  
in the EMBL file?

-jason
On Dec 18, 2007, at 10:04 AM, Stefano Ghignone wrote:

> Dear all,
>   I'm facing with a really annoying problem regarding large files  
> handling.
> I wrote a script (below) which should keep sequences from an embl  
> formatted file and write out the sequences in a customized fasta  
> format. The script works, but since the input file is rather big  
> 5.6 GB unzipped (987 MB zipped), after a while all the physical and  
> virtual memories of my workstation (4GB RAM) are filled and the  
> script is killed...
> I really don't know how to avoid this huge memory usage...and now  
> I'm wondering if this is the right approach....
> Please help me!
> Best wishes,
> Stefano
>
>
>
> #################
> #!/usr/bin/perl -w
>
> use strict;
>
> use warnings;
>
> use Fcntl;
> use Cwd;
>
> use Bio::SeqIO;
>
> my $infile = $ARGV[0];
> my $outfile = "$ARGV[0].fasta";
> my $organism;
> my $count;
> my $path = cwd()."/$outfile";
>
> print "Working dir is: ".cwd().".\nCreating file: $path\n";
>
> my $in  = Bio::SeqIO->new(-file => "/bin/gunzip -c $infile |", - 
> format => 'EMBL');
>
> while ( my $seq = $in->next_seq() ) {
> 	sysopen(TO, $path, O_WRONLY | O_APPEND | O_CREAT);
> 	my $id = $seq->accession_number();	
> 	my $desc = $seq->desc(); chop $desc;
> 	my $species = $seq->species->binomial();
> 	my $subspecies = $seq->species->sub_species();
> 	if ($seq->species->sub_species()) {chop $subspecies; $organism =  
> $species." ".$subspecies;}
> 		else {$organism = $species;}
> 	my $sequence = $seq->seq();
> 	print TO ">$id $desc [$organism]\n$sequence\n";
>     	$count++;
> 	warn $@ if $@;
> 	close TO;
> }
>
> print "Done!\n\t$count sequences have been treated. The file $ARGV 
> [0].fasta is ready.\n";
>
>


From cain.cshl at gmail.com  Tue Dec 18 14:04:11 2007
From: cain.cshl at gmail.com (Scott Cain)
Date: Tue, 18 Dec 2007 14:04:11 -0500
Subject: [Bioperl-l] bio-perl cvs freeze
In-Reply-To: <681FB463-13A5-4B35-923B-29A91F07D72B@bioperl.org>
References: <loom.20071218T145502-552@post.gmane.org>
	<681FB463-13A5-4B35-923B-29A91F07D72B@bioperl.org>
Message-ID: <1198004651.11000.19.camel@frissell>

Hi Jason and all,

Does the fact that cvs is sticking around (read only) mean that viewcvs
(the web interface) will stick around too?  I was thinking about
modifying the GBrowse net installer to use the 'automatic' tarball of
bioperl-live to download and install via nmake on Windows since it
doesn't have cvs support built in.  Also, with cvs sticking around, I
don't need to rewrite the installer to use svn (yeah!).

Thanks,
Scott

On Tue, 2007-12-18 at 10:22 -0800, Jason Stajich wrote:
> Margot -
> The code freeze won't affect the anonymous cvs, and we'll likely  
> keep anonymous CVS as is (and maybe even figure out how to keep it  
> updated with the SVN) since external tools depend on it and have  
> published CVS instructions.
> 
> I was able to do an anonymous checkout fine on my machine just now --  
> if the problem persists please send a message to support at open-bio.org  
> and the support volunteers will track it from there.
> 
> -jason
> On Dec 18, 2007, at 7:00 AM, Margot Sunshine wrote:
> 
> > Hi,
> >
> > I have been trying to checkout bio-perl from cvs since yesterday  
> > afternoon
> > (Dec 17). My request just hangs. I can login but I cannot checkout  
> > anything.
> > My reading of your posting of the planned switch from CVS to SVN  
> > seemed to
> > indicate that this was not to take place until tomorrow. Help!
> >
> > Thanks,
> > Margot Sunshine
> >
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory


From jason at bioperl.org  Tue Dec 18 14:20:11 2007
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 18 Dec 2007 11:20:11 -0800
Subject: [Bioperl-l] bio-perl cvs freeze
In-Reply-To: <1198004651.11000.19.camel@frissell>
References: <loom.20071218T145502-552@post.gmane.org>
	<681FB463-13A5-4B35-923B-29A91F07D72B@bioperl.org>
	<1198004651.11000.19.camel@frissell>
Message-ID: <4560525E-AE12-45BF-A174-3B8E3669C2B9@bioperl.org>


On Dec 18, 2007, at 11:04 AM, Scott Cain wrote:

> Hi Jason and all,
>
> Does the fact that cvs is sticking around (read only) mean that  
> viewcvs
> (the web interface) will stick around too?  I was thinking about
> modifying the GBrowse net installer to use the 'automatic' tarball of
> bioperl-live to download and install via nmake on Windows since it
> doesn't have cvs support built in.  Also, with cvs sticking around, I
> don't need to rewrite the installer to use svn (yeah!).
>
Hey Scott -

Perhaps; there may be better tools with SVN anyway. We could also  
just set up a script that tarballs the already auto-updated  
code here (I think it syncs every hour):
http://bioperl.org/SRC/

We're still playing around with this and I can't guarantee that we'll  
get the SVN commits back to CVS to work.

-jason
> Thanks,
> Scott
>
> On Tue, 2007-12-18 at 10:22 -0800, Jason Stajich wrote:
>> Margot -
>> The code freeze won't affect the anonymous cvs, and we'll likely
>> keep anonymous CVS as is (and maybe even figure out how to keep it
>> updated with the SVN) since external tools depend on it and have
>> published CVS instructions.
>>
>> I was able to do an anonymous checkout fine on my machine just now --
>> if the problem persists please send a message to support at open-bio.org
>> and the support volunteers will track it from there.
>>
>> -jason
>> On Dec 18, 2007, at 7:00 AM, Margot Sunshine wrote:
>>
>>> Hi,
>>>
>>> I have been trying to checkout bio-perl from cvs since yesterday
>>> afternoon
>>> (Dec 17). My request just hangs. I can login but I cannot checkout
>>> anything.
>>> My reading of your posting of the planned switch from CVS to SVN
>>> seemed to
>>> indicate that this was not to take place until tomorrow. Help!
>>>
>>> Thanks,
>>> Margot Sunshine
>>>


From cain.cshl at gmail.com  Tue Dec 18 14:31:23 2007
From: cain.cshl at gmail.com (Scott Cain)
Date: Tue, 18 Dec 2007 14:31:23 -0500
Subject: [Bioperl-l] bio-perl cvs freeze
In-Reply-To: <4560525E-AE12-45BF-A174-3B8E3669C2B9@bioperl.org>
References: <loom.20071218T145502-552@post.gmane.org>
	<681FB463-13A5-4B35-923B-29A91F07D72B@bioperl.org>
	<1198004651.11000.19.camel@frissell>
	<4560525E-AE12-45BF-A174-3B8E3669C2B9@bioperl.org>
Message-ID: <1198006283.11000.20.camel@frissell>

Cool.  For the moment, I'll just wait and see what happens :-)

Thanks,
Scott

On Tue, 2007-12-18 at 11:20 -0800, Jason Stajich wrote:
> On Dec 18, 2007, at 11:04 AM, Scott Cain wrote:
> 
> > Hi Jason and all,
> >
> > Does the fact that cvs is sticking around (read only) mean that  
> > viewcvs
> > (the web interface) will stick around too?  I was thinking about
> > modifying the GBrowse net installer to use the 'automatic' tarball of
> > bioperl-live to download and install via nmake on Windows since it
> > doesn't have cvs support built in.  Also, with cvs sticking around, I
> > don't need to rewrite the installer to use svn (yeah!).
> >
> Hey Scott -
> 
> Perhaps, there may be better tools with SVN anyways, we could also  
> just instantiate a script that tarballed the already auto-updated  
> code here (i think it syncs every hour):
> http://bioperl.org/SRC/
> 
> We're still playing around with this and I can't guarantee that we'll  
> get the SVN commits back to CVS to work.
> 
> -jason
> > Thanks,
> > Scott
> >
> > On Tue, 2007-12-18 at 10:22 -0800, Jason Stajich wrote:
> >> Margot -
> >> The code freeze won't affect the anonymous cvs, and we'll likely
> >> keep anonymous CVS as is (and maybe even figure out how to keep it
> >> updated with the SVN) since external tools depend on it and have
> >> published CVS instructions.
> >>
> >> I was able to do an anonymous checkout fine on my machine just now --
> >> if the problem persists please send a message to support at open-bio.org
> >> and the support volunteers will track it from there.
> >>
> >> -jason
> >> On Dec 18, 2007, at 7:00 AM, Margot Sunshine wrote:
> >>
> >>> Hi,
> >>>
> >>> I have been trying to checkout bio-perl from cvs since yesterday
> >>> afternoon
> >>> (Dec 17). My request just hangs. I can login but I cannot checkout
> >>> anything.
> >>> My reading of your posting of the planned switch from CVS to SVN
> >>> seemed to
> >>> indicate that this was not to take place until tomorrow. Help!
> >>>
> >>> Thanks,
> >>> Margot Sunshine
> >>>
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory


From avilella at gmail.com  Tue Dec 18 15:33:43 2007
From: avilella at gmail.com (Albert Vilella)
Date: Tue, 18 Dec 2007 20:33:43 +0000
Subject: [Bioperl-l] dealing with large files
In-Reply-To: <9CCCD509-EAFA-4528-B045-90910E19B41F@bioperl.org>
References: <JT9BJ9$83113BA715E1F7CF3B9D29BFFCC4B0CF@libero.it>
	<9CCCD509-EAFA-4528-B045-90910E19B41F@bioperl.org>
Message-ID: <358f4d650712181233q2a1627c3v6fb4e3e20b9f6c78@mail.gmail.com>

There is a Bio::SeqIO "largefasta" object that will use the hard-disk
for very large fasta files.

On Dec 18, 2007 6:31 PM, Jason Stajich <jason at bioperl.org> wrote:
> Not exactly clear why you aren't using Bio::SeqIO to write the
> sequence back out in FASTA format and why you are re-opening the file
> each time?
>
> Did you look at the examples that show how to convert file formats?
> http://bioperl.org/wiki/HOWTO:SeqIO
>
> You can set the description with
> $seq->description($newdescription);
> and the ID with
> $seq->display_id($newid);
> before writing.
>
> It isn't clear to me from your code why it would be leaking memory
> and causing a problem - is it possible that you have a huge sequence
> in the EMBL file?
>
> -jason
>
> On Dec 18, 2007, at 10:04 AM, Stefano Ghignone wrote:
>
> > Dear all,
> >   I'm facing a really annoying problem with handling large
> > files.
> > I wrote a script (below) which should take sequences from an
> > EMBL-formatted file and write them out in a customized FASTA
> > format. The script works, but since the input file is rather big
> > (5.6 GB unzipped, 987 MB zipped), after a while all the physical and
> > virtual memory of my workstation (4 GB RAM) is filled and the
> > script is killed...
> > I really don't know how to avoid this huge memory usage...and now
> > I'm wondering if this is the right approach....
> > Please help me!
> > Best wishes,
> > Stefano
> >
> >
> >
> > #################
> > #!/usr/bin/perl -w
> >
> > use strict;
> >
> > use warnings;
> >
> > use Fcntl;
> > use Cwd;
> >
> > use Bio::SeqIO;
> >
> > my $infile = $ARGV[0];
> > my $outfile = "$ARGV[0].fasta";
> > my $organism;
> > my $count;
> > my $path = cwd()."/$outfile";
> >
> > print "Working dir is: ".cwd().".\nCreating file: $path\n";
> >
> > my $in  = Bio::SeqIO->new(-file   => "/bin/gunzip -c $infile |",
> >                           -format => 'EMBL');
> >
> > while ( my $seq = $in->next_seq() ) {
> >       sysopen(TO, $path, O_WRONLY | O_APPEND | O_CREAT);
> >       my $id = $seq->accession_number();
> >       my $desc = $seq->desc(); chop $desc;
> >       my $species = $seq->species->binomial();
> >       my $subspecies = $seq->species->sub_species();
> >       if ($seq->species->sub_species()) {
> >               chop $subspecies;
> >               $organism = $species." ".$subspecies;
> >       } else {
> >               $organism = $species;
> >       }
> >       my $sequence = $seq->seq();
> >       print TO ">$id $desc [$organism]\n$sequence\n";
> >       $count++;
> >       warn $@ if $@;
> >       close TO;
> > }
> >
> > print "Done!\n\t$count sequences have been treated. ",
> >       "The file $ARGV[0].fasta is ready.\n";
> >
> >


From cjfields at uiuc.edu  Tue Dec 18 21:29:19 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 18 Dec 2007 20:29:19 -0600
Subject: [Bioperl-l] perl 5.10 released
Message-ID: <AF986160-D1AB-4AEF-9588-90861F5B7B98@uiuc.edu>

The next major perl release, perl 5.10, has officially been released:

http://use.perl.org/article.pl?sid=07/12/18/195247

I'll try testing BioPerl with perl 5.10 and any relevant modules when  
I can; this may have to wait until after SVN migration.  If any  
interested parties want to test BioPerl compatibility with perl  
5.10, feel free to post your results!

chris


From David.Messina at sbc.su.se  Wed Dec 19 11:44:06 2007
From: David.Messina at sbc.su.se (Dave Messina)
Date: Wed, 19 Dec 2007 10:44:06 -0600
Subject: [Bioperl-l] perl 5.10 released
In-Reply-To: <AF986160-D1AB-4AEF-9588-90861F5B7B98@uiuc.edu>
References: <AF986160-D1AB-4AEF-9588-90861F5B7B98@uiuc.edu>
Message-ID: <628aabb70712190844o17a40c2eva3ef863dc42afb6c@mail.gmail.com>

Hi everyone,

Perl 5.10 builds fine and passes all tests on my PB G4 running OS X 10.5.1.
Piece o' cake.

Here are results of testing BioPerl on this virgin install:

I downloaded the latest CVS tarball. I did 'perl Build.PL', which used CPAN
to install a bunch of dependencies. I then did 'Build test'. For the most
part everything was fine.

- Bio::Biblio::IO::medlinexml throws an exception because XML::Parser isn't
installed.

- RNA_SearchIO fails a few tests.

- Bio::Ontology::SimpleGOEngine::GraphAdaptor throws an exception because
Graph::Directed isn't installed.

- Spidey fails one test.

And of course without the optional dependencies installed, many tests were
skipped.

I'll now go back and install the optional dependencies and do the network
tests, but it looks like for the most part we play nice with the new Perl.

Dave


From ste.ghi at libero.it  Wed Dec 19 11:45:15 2007
From: ste.ghi at libero.it (Stefano Ghignone)
Date: Wed, 19 Dec 2007 17:45:15 +0100
Subject: [Bioperl-l] dealing with large files
Message-ID: <JTB2JF$8DA035C3FA7C0AA4E73865996D18C568@libero.it>

> Not exactly clear why you aren't using Bio::SeqIO to write the  
> sequence back out in FASTA format and why you are re-opening the file  
> each time?
It was to avoid keeping the output file open the whole time...

> Did you look at the examples that show how to convert file formats?
> http://bioperl.org/wiki/HOWTO:SeqIO
Yes, I did...but I didn't realize how to set a customized description...

> You can set the description with
> $seq->description($newdescription);
> and the ID with
> $seq->display_id($newid);
> before writing.
Thanks for the hint. Anyway, even with the simple example code for converting EMBL to FASTA format, the result is the same... I remind you that I'm using a huge input file, uniprot_trembl_bacteria.dat.gz, which contains 13101418 sequences!

> It isn't clear to me from your code why it would be leaking memory  
> and causing a problem - is it possible that you have a huge sequence  
> in the EMBL file?
> -jason

In the end, I managed the format conversion using this command:

gunzip -c uniprot_trembl_bacteria.dat.gz | perl -ne 'print ">$1 " if
(/^AC\s+(\S+);/); print " $1" if (/^DE\s+(.*)/);print " [$1]\n" if
(/^OS\s+(.*)/); if (($a)=/^\s+(.*)/){$a=~s/ //g; print "$a\n"};'

(Thanks to Riccardo Percudani). It's not bioperl...but it works!

My best wishes,
Stefano


> On Dec 18, 2007, at 10:04 AM, Stefano Ghignone wrote:
> 
> > Dear all,
> >   I'm facing a really annoying problem with handling large
> > files.
> > I wrote a script (below) which should take sequences from an
> > EMBL-formatted file and write them out in a customized FASTA
> > format. The script works, but since the input file is rather big
> > (5.6 GB unzipped, 987 MB zipped), after a while all the physical and
> > virtual memory of my workstation (4 GB RAM) is filled and the
> > script is killed...
> > I really don't know how to avoid this huge memory usage...and now  
> > I'm wondering if this is the right approach....
> > Please help me!
> > Best wishes,
> > Stefano
> >
> >
> >
> > #################
> > #!/usr/bin/perl -w
> >
> > use strict;
> >
> > use warnings;
> >
> > use Fcntl;
> > use Cwd;
> >
> > use Bio::SeqIO;
> >
> > my $infile = $ARGV[0];
> > my $outfile = "$ARGV[0].fasta";
> > my $organism;
> > my $count;
> > my $path = cwd()."/$outfile";
> >
> > print "Working dir is: ".cwd().".\nCreating file: $path\n";
> >
> > my $in  = Bio::SeqIO->new(-file   => "/bin/gunzip -c $infile |",
> >                           -format => 'EMBL');
> >
> > while ( my $seq = $in->next_seq() ) {
> > 	sysopen(TO, $path, O_WRONLY | O_APPEND | O_CREAT);
> > 	my $id = $seq->accession_number();	
> > 	my $desc = $seq->desc(); chop $desc;
> > 	my $species = $seq->species->binomial();
> > 	my $subspecies = $seq->species->sub_species();
> > 	if ($seq->species->sub_species()) {
> > 		chop $subspecies;
> > 		$organism = $species." ".$subspecies;
> > 	} else {
> > 		$organism = $species;
> > 	}
> > 	my $sequence = $seq->seq();
> > 	print TO ">$id $desc [$organism]\n$sequence\n";
> >     	$count++;
> > 	warn $@ if $@;
> > 	close TO;
> > }
> >
> > print "Done!\n\t$count sequences have been treated. ",
> >       "The file $ARGV[0].fasta is ready.\n";
> >
> >


From cjfields at uiuc.edu  Wed Dec 19 12:17:28 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 19 Dec 2007 11:17:28 -0600
Subject: [Bioperl-l] dealing with large files
In-Reply-To: <JTB2JF$8DA035C3FA7C0AA4E73865996D18C568@libero.it>
References: <JTB2JF$8DA035C3FA7C0AA4E73865996D18C568@libero.it>
Message-ID: <B92AAA3F-C93A-41EC-B68D-3E6F4053BBD4@uiuc.edu>


On Dec 19, 2007, at 10:45 AM, Stefano Ghignone wrote:

>> Not exactly clear why you aren't using Bio::SeqIO to write the
>> sequence back out in FASTA format and why you are re-opening the file
>> each time?
> It was to avoid keeping the output file open the whole time...
>
>> Did you look at the examples that show how to convert file formats?
>> http://bioperl.org/wiki/HOWTO:SeqIO
> Yes, I did...but I didn't realize how to set a customized  
> description...
>
>> You can set the description with
>> $seq->description($newdescription);
>> and the ID with
>> $seq->display_id($newid);
>> before writing.
> Thanks for the hint. Anyway, even with the simple example code for  
> converting EMBL to FASTA format, the result is the same... I remind  
> you that I'm using a huge input file,  
> uniprot_trembl_bacteria.dat.gz, which contains 13101418 sequences!
>
>> It isn't clear to me from your code why it would be leaking memory
>> and causing a problem - is it possible that you have a huge sequence
>> in the EMBL file?
>> -jason
>
> In the end, I managed the format conversion using this command:
>
> gunzip -c uniprot_trembl_bacteria.dat.gz | perl -ne 'print ">$1 " if
> (/^AC\s+(\S+);/); print " $1" if (/^DE\s+(.*)/);print " [$1]\n" if
> (/^OS\s+(.*)/); if (($a)=/^\s+(.*)/){$a=~s/ //g; print "$a\n"};'
>
> (Thanks to Riccardo Percudani). It's not bioperl...but it works!
>
> My best wishes,
> Stefano


As this shows, BioPerl isn't always the best answer (I know,  
blasphemy...).  As Jason suggested, it's quite likely there are large  
sequence records causing your problems when using BioPerl.  The  
one-liner works b/c it doesn't retain data (sequence, annotation, etc.)  
in memory as a Bio::Seq object; it's a direct conversion.
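
For anyone who prefers a script over the one-liner, here is a rough
sketch of the same direct-conversion idea in plain Perl. It assumes
UniProt-style flat-file line codes (AC, DE, OS, indented sequence
lines, and // as the record terminator). The sub name embl_to_fasta and
the sample record are made up for illustration. Only one record is held
in memory at a time, which is the whole point.

```perl
use strict;
use warnings;

# Stream flat-file records from $in to $out as FASTA, keeping only the
# current record in memory. Returns the number of records written.
sub embl_to_fasta {
    my ($in, $out) = @_;
    my ($id, $desc, $org, $seq) = ('', '', '', '');
    my $count = 0;
    while (my $line = <$in>) {
        chomp $line;
        if ($line =~ /^AC\s+([^;\s]+);/) {
            $id = $1 if $id eq '';                  # first accession only
        }
        elsif ($line =~ /^DE\s+(.*)/) {
            $desc .= ($desc eq '' ? '' : ' ') . $1; # DE may span lines
        }
        elsif ($line =~ /^OS\s+(.*)/) {
            $org .= ($org eq '' ? '' : ' ') . $1;   # so may OS
        }
        elsif ($line =~ /^\s+(\S.*)/) {
            (my $chunk = $1) =~ s/\s+//g;           # indented = sequence
            $seq .= $chunk;
        }
        elsif ($line =~ m{^//}) {                   # end of record
            if ($id ne '') {
                s/\.$// for $desc, $org;            # drop trailing period
                print {$out} ">$id $desc [$org]\n$seq\n";
                $count++;
            }
            ($id, $desc, $org, $seq) = ('', '', '', '');
        }
    }
    return $count;
}

# Demo on an invented record, read through an in-memory filehandle.
my $sample = join "\n",
    'ID   DEMO_ECOLI   Unreviewed;   12 AA.',
    'AC   A0A000;',
    'DE   Hypothetical protein.',
    'OS   Escherichia coli.',
    'SQ   SEQUENCE   12 AA;',
    '     MKTAYIAKQR QI',
    '//', '';
open my $demo_in, '<', \$sample or die "Cannot open in-memory handle: $!";
embl_to_fasta($demo_in, \*STDOUT);
```

To run it on the real file, pass \*STDIN instead of the demo handle and
pipe in the output of gunzip -c, as in the one-liner.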

It would be nice to code up a lazy sequence object and related  
parsers; maybe for the next dev release.

chris


From cjfields at uiuc.edu  Wed Dec 19 12:08:31 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 19 Dec 2007 11:08:31 -0600
Subject: [Bioperl-l] perl 5.10 released
In-Reply-To: <628aabb70712190844o17a40c2eva3ef863dc42afb6c@mail.gmail.com>
References: <AF986160-D1AB-4AEF-9588-90861F5B7B98@uiuc.edu>
	<628aabb70712190844o17a40c2eva3ef863dc42afb6c@mail.gmail.com>
Message-ID: <C7FF12CB-278A-4B47-8D04-F2F038C35AB2@uiuc.edu>


On Dec 19, 2007, at 10:44 AM, Dave Messina wrote:

> Hi everyone,
>
>
> Perl 5.10 builds fine and passes all tests on my PB G4 running OS X  
> 10.5.1. Piece o' cake.
>
> Here are results of testing BioPerl on this virgin install:
>
> I downloaded the latest CVS tarball. I did 'perl Build.PL', which  
> used CPAN to install a bunch of dependencies. I then did 'Build  
> test'. For the most part everything was fine.
>
> - Bio::Biblio::IO::medlinexml throws an exception because  
> XML::Parser isn't installed.

XML::Parser used to be shipped with a number of perl distros even  
though it isn't core.  We should add a require to these.
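
A minimal sketch of what such a guard could look like in a test script,
using only Test::More's SKIP block. The helper name module_available is
made up here, and the single ok() is a stand-in for the real medlinexml
tests; this is not the actual BioPerl test code.

```perl
use strict;
use warnings;
use Test::More tests => 1;

# Return 1 if the named module can be loaded, 0 otherwise.
sub module_available {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # Foo::Bar -> Foo/Bar.pm
    return eval { require $file; 1 } ? 1 : 0;
}

SKIP: {
    # Skip (rather than die) when the optional dependency is missing.
    skip 'XML::Parser not installed', 1
        unless module_available('XML::Parser');
    # The real tests would go here; this one is only a placeholder.
    ok(XML::Parser->can('new'), 'XML::Parser is usable');
}
```

With this in place a missing XML::Parser shows up as a skipped test
instead of an exception.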

> - RNA_SearchIO fails a few tests.

These are very likely from recent commits I made re:GenericHSP and use  
of bits(), raw_score(), etc. (the fails look like missing/switched  
vals with these method tests).  I'll fix these post-svn migration, but  
I don't think these are related to 5.10.

> - Bio::Ontology::SimpleGOEngine::GraphAdaptor throws an exception  
> because Graph::Directed isn't installed.

Odd, that should be caught before tests are run.  Needs to be  
fixed, but one would think this would fail as well under 5.8.

> - Spidey fails one test.

Passes for me.  Is it dependency-related?

> And of course without the optional dependencies installed, many  
> tests were skipped.
>
> I'll now go back and install the optional dependencies and do the  
> network tests, but it looks like for the most part we play nice with  
> the new Perl.
>
> Dave

Not sure, but it seems a bit faster.  Maybe it's just me but it would  
be nice to see some benchmarks comparing perl 5.8 vs 5.10.  I agree,  
it was a very fast and easy install.

I'll start a page on the wiki for test fails using perl 5.10.  I'm  
seeing a few fails;  I'm getting the following with everything  
installed (including DBD::mysql, DBI, etc) using perl 5.10, Mac OS X  
10.5.1 (note Test::Harness now gives TODO's, so some of these are  
actually passing).  Note the entrezgene.t and DB.t fails; I looked  
into these and I think they are related to the odd 'pseudohashes are  
deprecated' warnings we were getting in perl 5.8 tests, so there may  
be something legitimately buggy.

Test Summary Report
-------------------
t/Annotation.t                (Wstat: 0 Tests: 112 Failed: 0)
   TODO passed:   96
t/BioGraphics.t               (Wstat: 256 Tests: 35 Failed: 1)
   Failed test number(s):  4
   Non-zero exit status: 1
t/DB.t                        (Wstat: 65280 Tests: 106 Failed: 0)
   Non-zero exit status: 255
   Parse errors: Bad plan.  You planned 116 tests but ran 106.
t/DBCUTG.t                    (Wstat: 1024 Tests: 33 Failed: 4)
   Failed test number(s):  29-31, 33
   Non-zero exit status: 4
t/RNA_SearchIO.t              (Wstat: 2048 Tests: 496 Failed: 8)
   Failed test number(s):  291, 338, 372-374, 395, 455, 486
   Non-zero exit status: 8
t/entrezgene.t                (Wstat: 65280 Tests: 648 Failed: 0)
   Non-zero exit status: 255
   Parse errors: Bad plan.  You planned 1422 tests but ran 648.
Files=255, Tests=15066, 435 wallclock secs ( 3.15 usr  1.72 sys +  
124.87 cusr 13.29 csys = 143.03 CPU)
Result: FAIL
Failed 5/255 test programs. 13/15066 subtests failed.


chris


From David.Messina at sbc.su.se  Wed Dec 19 12:49:32 2007
From: David.Messina at sbc.su.se (Dave Messina)
Date: Wed, 19 Dec 2007 11:49:32 -0600
Subject: [Bioperl-l] perl 5.10 released
In-Reply-To: <C7FF12CB-278A-4B47-8D04-F2F038C35AB2@uiuc.edu>
References: <AF986160-D1AB-4AEF-9588-90861F5B7B98@uiuc.edu>
	<628aabb70712190844o17a40c2eva3ef863dc42afb6c@mail.gmail.com>
	<C7FF12CB-278A-4B47-8D04-F2F038C35AB2@uiuc.edu>
Message-ID: <628aabb70712190949j30756b8ap97666f4962c2b83d@mail.gmail.com>

>
> XML::Parser used to be shipped with a number of perl distros even
> though it isn't core.  We should add a require to these.


Agreed.


> > - RNA_SearchIO fails a few tests.
>
> These are very likely from recent commits I made re:GenericHSP and use
> of bits(), raw_score(), etc. (the fails look like missing/switched
> vals with these method tests).  I'll fix these post-svn migration, but
> I don't think these are related to 5.10.


Agreed -- I doubt this is 5.10-specific.


> > - Bio::Ontology::SimpleGOEngine::GraphAdaptor throws an exception
> > because Graph::Directed isn't installed.
>
> Odd, that should be caught out before tests are run.  Needs to be
> fixed, but one would think this would fail as well under 5.8.


Yep, and in a minute here I'll test it under 5.8.


> > - Spidey fails one test.
>
> Passes for me.  Is it dependency-related?


I don't think so, but I guess we'll see once I finish installing the
dependencies. Here's what I got:

t/Spidey........................ok 1/26 Can't call method "sub_SeqFeature"
on an undefined value at t/Spidey.t line 24, <GEN1> line 170.
# Looks like you planned 26 tests but only ran 3.
# Looks like your test died just after 3.
t/Spidey........................dubious

        Test returned status 255 (wstat 65280, 0xff00)
DIED. FAILED tests 4-26
        Failed 23/26 tests, 11.54% okay


Dave


From cjfields at uiuc.edu  Wed Dec 19 14:19:10 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 19 Dec 2007 13:19:10 -0600
Subject: [Bioperl-l] perl 5.10 released
In-Reply-To: <628aabb70712190949j30756b8ap97666f4962c2b83d@mail.gmail.com>
References: <AF986160-D1AB-4AEF-9588-90861F5B7B98@uiuc.edu>
	<628aabb70712190844o17a40c2eva3ef863dc42afb6c@mail.gmail.com>
	<C7FF12CB-278A-4B47-8D04-F2F038C35AB2@uiuc.edu>
	<628aabb70712190949j30756b8ap97666f4962c2b83d@mail.gmail.com>
Message-ID: <04AB8971-466D-4EEF-9A75-310ACDD224A6@uiuc.edu>

Just updated from CVS and reran tests, Spidey.t is failing now.  This  
may be from a recent commit:

http://lists.open-bio.org/pipermail/bioperl-guts-l/2007-December/026854.html

I'm updating the following page on the wiki for tracking.  There are a  
few more we should look into at some point:

http://www.bioperl.org/w/index.php?title=Bioperl_and_Perl_5.10

chris

On Dec 19, 2007, at 11:49 AM, Dave Messina wrote:

>>
>> XML::Parser used to be shipped with a number of perl distros even
>> though it isn't core.  We should add a require to these.
>
>
> Agreed.
>
>
>> - RNA_SearchIO fails a few tests.
>>
>> These are very likely from recent commits I made re:GenericHSP and  
>> use
>> of bits(), raw_score(), etc. (the fails look like missing/switched
>> vals with these method tests).  I'll fix these post-svn migration,  
>> but
>> I don't think these are related to 5.10.
>
>
> Agreed -- I doubt this is 5.10-specific.
>
>
>> - Bio::Ontology::SimpleGOEngine::GraphAdaptor throws an exception
>>> because Graph::Directed isn't installed.
>>
>> Odd, that should be caught out before tests are run.  Needs to be
>> fixed, but one would think this would fail as well under 5.8.
>
>
> Yep, and in a minute here I'll test it under 5.8.
>
>
>
>
>>> - Spidey fails one test.
>>
>> Passes for me.  Is it dependency-related?
>
>
> I don't think so, but I guess we'll see once I finish installing the
> dependencies. Here's what I got:
>
> t/Spidey........................ok 1/26 Can't call method  
> "sub_SeqFeature"
> on an undefined value at t/Spidey.t line 24, <GEN1> line 170.
> # Looks like you planned 26 tests but only ran 3.
> # Looks like your test died just after 3.
> t/Spidey........................dubious
>
>        Test returned status 255 (wstat 65280, 0xff00)
> DIED. FAILED tests 4-26
>        Failed 23/26 tests, 11.54% okay
>
>
> Dave

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From David.Messina at sbc.su.se  Wed Dec 19 18:42:14 2007
From: David.Messina at sbc.su.se (Dave Messina)
Date: Wed, 19 Dec 2007 17:42:14 -0600
Subject: [Bioperl-l] perl 5.10 released
In-Reply-To: <04AB8971-466D-4EEF-9A75-310ACDD224A6@uiuc.edu>
References: <AF986160-D1AB-4AEF-9588-90861F5B7B98@uiuc.edu>
	<628aabb70712190844o17a40c2eva3ef863dc42afb6c@mail.gmail.com>
	<C7FF12CB-278A-4B47-8D04-F2F038C35AB2@uiuc.edu>
	<628aabb70712190949j30756b8ap97666f4962c2b83d@mail.gmail.com>
	<04AB8971-466D-4EEF-9A75-310ACDD224A6@uiuc.edu>
Message-ID: <FC679851-77A7-4603-B722-4A6089333EE9@sbc.su.se>

Hi Chris and everyone,

With most of the optional dependencies installed, I'm seeing  
essentially the same test failures, including the CODE ref thingy.  
I've noted this on the new Wiki page you created.

According to Data::Dumper's documentation,
Data::Dumper cheats with CODE references. If a code reference is  
encountered in the structure being processed (and if you haven't set  
the Deparse flag), an anonymous subroutine that contains the string  
'"DUMMY"' will be inserted in its place, and a warning will be printed  
if Purity is set. You can eval the result, but bear in mind that the  
anonymous sub that gets created is just a placeholder. Someday, perl  
will have a switch to cache-on-demand the string representation of a  
compiled piece of code, I hope. If you have prior knowledge of all the  
code refs that your data structures are likely to have, you can use  
the Seen method to pre-seed the internal reference table and make the  
dumped output point to them, instead. See EXAMPLES above.


So it's not BioPerl per se, but we can probably work around it.
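
To make the documented behaviour concrete, here is a small sketch that
dumps the same structure with and without the Deparse flag. The helper
name dump_struct and the sample hash are invented for illustration; it
only shows what Data::Dumper does, not where the failing tests hit it.

```perl
use strict;
use warnings;
use Data::Dumper;

# Dump $data, optionally asking Data::Dumper to deparse code refs.
sub dump_struct {
    my ($data, $deparse) = @_;
    local $Data::Dumper::Deparse  = $deparse ? 1 : 0;
    local $Data::Dumper::Sortkeys = 1;    # stable key order in the output
    return Dumper($data);
}

my $config = { name => 'demo', hook => sub { return 42 } };

print dump_struct($config, 0);   # code ref becomes a "DUMMY" placeholder
print dump_struct($config, 1);   # code ref is rendered via B::Deparse
```

The first dump can be eval'ed but loses the sub body; the second keeps
the source text of the sub.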


>>> - Bio::Ontology::SimpleGOEngine::GraphAdaptor throws an exception
>>>> because Graph::Directed isn't installed.
>>>
>>> Odd, that should be caught out before tests are run.  Needs to be
>>> fixed, but one would think this would fail as well under 5.8.
>>
>>
>> Yep, and in a minute here I'll test it under 5.8.


Strangely, the Ontology tests properly get skipped under 5.8.

Dave


From ki.baik at roche.com  Wed Dec 19 19:58:42 2007
From: ki.baik at roche.com (Baik, Ki)
Date: Wed, 19 Dec 2007 16:58:42 -0800
Subject: [Bioperl-l] Parsing CAP3 output to Fasta
Message-ID: <6D5431B47E46BD45AAA453432AD3B803027553B7@rpbmsem01.nala.roche.com>

Hello,

 
I'm interested in parsing the output of the CAP contig assembly program
into a format that is more manageable. The CAP output is shown below:

 
                .    :    .    :    .    :    .    :    .    :    .    :

Seq1+       CTGGATGGGTTAATTTACTCCCATAAGATTTTTGAAATCCTTAATTTACTGATATATCAC

            ____________________________________________________________

consensus   CTGGATGGGTTAATTTACTCCCATAAGAGAGCAGAAATCCTGGATCTCTGGATATATCAC

 
                .    :    .    :    .    :    .    :    .    :    .    :

Seq1+       ACTCTTAATTTACTCCCTGATTGG--CAGTGTTACACACCGGGACCAGGACCTAGATTCC

Seq2+       ACTCAGGGATTCTTCCCTGATTGGTTCAGTGTTACACTTTTGCGCCAGGACCTAGATTCC

            ____________________________________________________________

consensus   ACTCAGGGATTCTTCCCTGATTGGTTCAGTGTTACACACCGGGACCAGGACCTAGATTCC

 
                .    :    .    :    .    :    .    :    .    :    .    :

Seq1+       CACTGACATTTGGATGGTTAATTTACTCTTTTCCAGTGTCAGCAGAAGAGCGGGGGAGAC

Seq2+       CACTGACATTTGGATGGTTGTTTAAACTGGTACCAGTGTCCGCTCGCGGGGCAGAGAGAC

            ____________________________________________________________

consensus   CACTGACATTTGGATGGTTGTTTAAACTGGTACCAGTGTCAGCAGAAGAGGCAGAGAGAC

 
                .    :    .    :    .    :    .    :    .    :    .    :

Seq1+       TGGGTAATACAAACACTTTTCGGCGGCTTCTACATCCAGCTTGTTAATTTACTCTTTAGG

Seq2+       TGGGTAATACAAATGAAGATGTTTCCGGCCTACATCCAGCTTGTAATCATGC

            ____________________________________________________________

consensus   TGGGTAATACAAATGAAGATGCTAGTCTTCTACATCCAGCTTGTAATCATGGAGCTGAGG

 
I would like to maintain the alignment with their base positions for
each sequence. A fasta format retaining the alignment position is ideal
such as below:

 
>Seq1+

CTGGATGGGTTAATTTACTCCCATAAGATTTTTGAAATCCTTAATTTACTGATATATCAC

ACTCTTAATTTACTCCCTGATTGG--CAGTGTTACACACCGGGACCAGGACCTAGATTCC

CACTGACATTTGGATGGTTAATTTACTCTTTTCCAGTGTCAGCAGAAGAGCGGGGGAGAC

TGGGTAATACAAACACTTTTCGGCGGCTTCTACATCCAGCTTGTTAATTTACTCTTTAGG

>Seq2+

------------------------------------------------------------

ACTCAGGGATTCTTCCCTGATTGGTTCAGTGTTACACTTTTGCGCCAGGACCTAGATTCC

CACTGACATTTGGATGGTTGTTTAAACTGGTACCAGTGTCCGCTCGCGGGGCAGAGAGAC

TGGGTAATACAAATGAAGATGTTTCCGGCCTACATCCAGCTTGTAATCATGC--------

 
Does anyone have any experience doing this?

 
Regards,

 
KB


From cjfields at uiuc.edu  Wed Dec 19 20:41:51 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 19 Dec 2007 19:41:51 -0600
Subject: [Bioperl-l] perl 5.10 released
In-Reply-To: <FC679851-77A7-4603-B722-4A6089333EE9@sbc.su.se>
References: <AF986160-D1AB-4AEF-9588-90861F5B7B98@uiuc.edu>
	<628aabb70712190844o17a40c2eva3ef863dc42afb6c@mail.gmail.com>
	<C7FF12CB-278A-4B47-8D04-F2F038C35AB2@uiuc.edu>
	<628aabb70712190949j30756b8ap97666f4962c2b83d@mail.gmail.com>
	<04AB8971-466D-4EEF-9A75-310ACDD224A6@uiuc.edu>
	<FC679851-77A7-4603-B722-4A6089333EE9@sbc.su.se>
Message-ID: <980C0D1B-9E3F-4904-9CA1-8C672CED0B35@uiuc.edu>

On Dec 19, 2007, at 5:42 PM, Dave Messina wrote:

> Hi Chris and everyone,
>
> With most of the optional dependencies installed, I'm seeing  
> essentially the same test failures, including the CODE ref thingy.  
> I've noted this on the new Wiki page you created.
>
> According to Data::Dumper's documentation,
> Data::Dumper cheats with CODE references. If a code reference is  
> encountered in the structure being processed (and if you haven't set  
> the Deparse flag), an anonymous subroutine that contains the string  
> '"DUMMY"' will be inserted in its place, and a warning will be  
> printed if Purity is set. You can eval the result, but bear in mind  
> that the anonymous sub that gets created is just a placeholder.  
> Someday, perl will have a switch to cache-on-demand the string  
> representation of a compiled piece of code, I hope. If you have  
> prior knowledge of all the code refs that your data structures are  
> likely to have, you can use the Seen method to pre-seed the internal  
> reference table and make the dumped output point to them, instead.  
> See EXAMPLES above.
>
>
> So it's not BioPerl per se, but we can probably work around it.

May be something in Module::Build or Build.PL that needs tweaking.

It looks like EntrezGene parsing is broken for now using perl 5.10;  
the 'pseudohash' warnings with perl 5.8 were indicating something was  
amiss but we could never place it.  Any fixes will have to wait until  
after svn migration.  Not sure what's going on with the other failures  
just yet.

>>>> - Bio::Ontology::SimpleGOEngine::GraphAdaptor throws an exception
>>>>> because Graph::Directed isn't installed.
>>>>
>>>> Odd, that should be caught out before tests are run.  Needs to be
>>>> fixed, but one would think this would fail as well under 5.8.
>>>
>>>
>>> Yep, and in a minute here I'll test it under 5.8.
>
>
> Strangely, the Ontology tests properly get skipped under 5.8.
>
> Dave

May be worth looking into.  Have you added it to the wiki?

chris


From David.Messina at sbc.su.se  Wed Dec 19 23:52:16 2007
From: David.Messina at sbc.su.se (Dave Messina)
Date: Wed, 19 Dec 2007 22:52:16 -0600
Subject: [Bioperl-l] perl 5.10 released
In-Reply-To: <980C0D1B-9E3F-4904-9CA1-8C672CED0B35@uiuc.edu>
References: <AF986160-D1AB-4AEF-9588-90861F5B7B98@uiuc.edu>
	<628aabb70712190844o17a40c2eva3ef863dc42afb6c@mail.gmail.com>
	<C7FF12CB-278A-4B47-8D04-F2F038C35AB2@uiuc.edu>
	<628aabb70712190949j30756b8ap97666f4962c2b83d@mail.gmail.com>
	<04AB8971-466D-4EEF-9A75-310ACDD224A6@uiuc.edu>
	<FC679851-77A7-4603-B722-4A6089333EE9@sbc.su.se>
	<980C0D1B-9E3F-4904-9CA1-8C672CED0B35@uiuc.edu>
Message-ID: <628aabb70712192052p5d9afe3bvf4fa1da872f56355@mail.gmail.com>

>
> May be something in Module::Build or Build.PL that needs tweaking.


I took a quick look-see and I'm pretty sure it's Module::Build.
Specifically, Module::Build::Base::write_config(), where there are three
calls with coderefs as parameters to _write_data() to match the three
coderef errors we are seeing at the end of 'perl Build.PL'.

_write_data() in turn calls Module::Build::Dumper::_data_dump() and uses
some ugly Data::Dumper voodoo to serialize.

I don't understand the voodoo well enough to explain why this appears only
with Perl 5.10, though; it sure looks like it should have with 5.8, too.
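
For anyone following along, the Data::Dumper behaviour quoted earlier in the thread can be reproduced with core modules only. This is a minimal sketch, not the Module::Build code: the $config hash and its keys are made up for illustration.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical structure holding a code reference (illustration only).
my $config = { name => 'demo', hook => sub { return 42 } };

# Default behaviour: the coderef is replaced by a placeholder sub.
my $plain = Dumper($config);

# With Deparse set, Data::Dumper asks B::Deparse for the source text.
my $deparsed = do {
    local $Data::Dumper::Deparse = 1;
    Dumper($config);
};

print "--- default ---\n$plain";
print "--- Deparse ---\n$deparsed";
```

The first dump carries only the placeholder, so evaluating it would not give back the original hook; the second carries regenerated source, which is closer but still not guaranteed to be identical to what was written.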


> Strangely, the Ontology tests properly get skipped under 5.8.
>
> May be worth looking into.  Have you added it to the wiki?


Uhhh, yeah...of course! (just now)

Should be a simple fix after the post-svn thaw.

Dave


From David.Messina at sbc.su.se  Thu Dec 20 00:39:41 2007
From: David.Messina at sbc.su.se (Dave Messina)
Date: Wed, 19 Dec 2007 23:39:41 -0600
Subject: [Bioperl-l] Parsing CAP3 output to Fasta
In-Reply-To: <6D5431B47E46BD45AAA453432AD3B803027553B7@rpbmsem01.nala.roche.com>
References: <6D5431B47E46BD45AAA453432AD3B803027553B7@rpbmsem01.nala.roche.com>
Message-ID: <628aabb70712192139q5e061428v56ed2ce8cf1f4851@mail.gmail.com>

Hi Ki,

Hopefully someone who (unlike me) uses these modules regularly will chime
in, but in the meantime, here are some ideas:

The Bio::Assembly::IO module can read and write ace files, which CAP3 can
produce as output. I don't think there is an explicit means to dump to a
multi-fasta file like you want.

But you could probably write a Bio::Assembly::IO::Fasta class which could
write the multi-Fasta format you want. Then you could use Bio::Assembly::IO
objects to read in ace files from CAP3 and write out to multi-fasta.

Look at

Bio::Assembly::IO::*
Bio::Assembly::ScaffoldI
Bio::Assembly::Contig
Bio::LocatableSeq
Bio::AlignIO

Assemblies are made of scaffolds, scaffolds are made of contigs, and contigs
are made of sequences which can be manipulated like any old seq in BioPerl.
Bio::AlignIO can read and write multiple sequence alignments and
multi-fastas, so that should help you to get from AssemblyIO to your desired
output format.
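
If going through the assembly modules turns out to be more than is needed, the alignment blocks shown in the question can also be padded directly. The following is only a plain-Perl sketch, not a BioPerl method. It assumes the input holds the blocks of a single contig, that read names end in '+' or '-', and that reads start in the same column as the "consensus" line. The sub names are made up.

```perl
use strict;
use warnings;

# Turn CAP3-style alignment blocks into gapped sequences of equal length.
sub cap3_blocks_to_aligned {
    my ($fh) = @_;
    my (%aligned, @order, @block);
    my $done = 0;    # alignment columns processed so far

    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^consensus(\s+)(\S+)/) {
            # The consensus line closes a block and gives column and width.
            my $offset = length('consensus') + length($1);
            my $width  = length($2);
            for my $read (@block) {
                my ($name) = $read =~ /^(\S+)/;
                my $seq = substr($read, $offset);
                $seq =~ s/\s+$//;
                $seq =~ tr/ /-/;    # read begins part-way into the block
                $seq .= '-' x ($width - length $seq) if length($seq) < $width;
                if (!exists $aligned{$name}) {
                    push @order, $name;
                    $aligned{$name} = '-' x $done;    # absent from earlier blocks
                }
                $aligned{$name} .= $seq;
            }
            $done += $width;
            # Reads absent from this block get a full row of gaps.
            for my $name (@order) {
                my $missing = $done - length $aligned{$name};
                $aligned{$name} .= '-' x $missing if $missing > 0;
            }
            @block = ();
        }
        elsif ($line =~ /^\S+[+-]\s+[A-Za-z*-]+\s*$/) {
            push @block, $line;    # a read line inside the current block
        }
    }
    return (\@order, \%aligned);
}

# Print the result as FASTA, 60 columns per line.
sub print_aligned_fasta {
    my ($out, $order, $aligned) = @_;
    for my $name (@$order) {
        print {$out} ">$name\n";
        print {$out} "$_\n" for unpack('(A60)*', $aligned->{$name});
    }
    return;
}
```

Called as `my ($order, $aligned) = cap3_blocks_to_aligned($fh); print_aligned_fasta(\*STDOUT, $order, $aligned);`, this should give every read the same length as the consensus, with '-' wherever a read is missing.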


Hope this helps,
Dave


From mike.thon at gmail.com  Thu Dec 20 00:59:06 2007
From: mike.thon at gmail.com (Michael Thon)
Date: Thu, 20 Dec 2007 06:59:06 +0100
Subject: [Bioperl-l] dealing with large files
In-Reply-To: <JT9BJ9$83113BA715E1F7CF3B9D29BFFCC4B0CF@libero.it>
References: <JT9BJ9$83113BA715E1F7CF3B9D29BFFCC4B0CF@libero.it>
Message-ID: <F23D8C0D-AE41-40B4-A30B-83AC59A7BDD8@gmail.com>


On Dec 18, 2007, at 7:04 PM, Stefano Ghignone wrote:

> my $in  = Bio::SeqIO->new(-file => "/bin/gunzip -c $infile |",
>                           -format => 'EMBL');

This is just for the sake of curiosity, since you already found a  
solution to your problem, but I wonder how perl will handle a file  
opened this way.  Will it try to suck the whole thing into ram in one  
go?

Mike


From cjfields at uiuc.edu  Thu Dec 20 00:54:36 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 19 Dec 2007 23:54:36 -0600
Subject: [Bioperl-l] Parsing CAP3 output to Fasta
In-Reply-To: <628aabb70712192139q5e061428v56ed2ce8cf1f4851@mail.gmail.com>
References: <6D5431B47E46BD45AAA453432AD3B803027553B7@rpbmsem01.nala.roche.com>
	<628aabb70712192139q5e061428v56ed2ce8cf1f4851@mail.gmail.com>
Message-ID: <EB4F110F-9F12-4478-89C2-5DDF4FEF07C6@uiuc.edu>


On Dec 19, 2007, at 11:39 PM, Dave Messina wrote:

> Hi Ki,
>
> Hopefully someone who (unlike me) uses these modules regularly will  
> chime
> in, but in the meantime, here are some ideas:
>
> The Bio::AssemblyIO module can read and write ace files, which CAP3  
> can
> produce as output. I don't think there is an explicit means to dump  
> to a
> multi-fasta file like you want.
>
> But you could probably write a Bio::AssemblyIO::Fasta class which  
> could
> write the multi-Fasta format you want. Then you could use  
> Bio::AssemblyIO
> objects to read in ace files from CAP3 and write out to multi-fasta.
>
> Look at
>
> Bio::AssemblyIO::*
> Bio::Assembly::ScaffoldI
> Bio::Assembly::Contig
> Bio::LocatableSeq
> Bio::AlignIO
>
> Assemblies are made of scaffolds, scaffolds are made of contigs, and  
> contigs
> are made of sequences which can be manipulated like any old seq in  
> BioPerl.
> Bio::AlignIO can read and write multiple sequence alignments and
> multi-fastas, so that should help you to get from AssemblyIO to your  
> desired
> output format.
>
>
>
> Hope this helps,
> Dave

What would help is to make Bio::Assembly::Contig implement  
Bio::Align::AlignI correctly, or make it a subclass of Bio::SimpleAlign.  
That way one could read Scaffolds in via Bio::Assembly::IO and write  
Contigs out through Bio::AlignIO directly.  In theory that should work  
but IIRC it doesn't.

chris


From jason at bioperl.org  Thu Dec 20 02:13:55 2007
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 19 Dec 2007 23:13:55 -0800
Subject: [Bioperl-l] dealing with large files
In-Reply-To: <F23D8C0D-AE41-40B4-A30B-83AC59A7BDD8@gmail.com>
References: <JT9BJ9$83113BA715E1F7CF3B9D29BFFCC4B0CF@libero.it>
	<F23D8C0D-AE41-40B4-A30B-83AC59A7BDD8@gmail.com>
Message-ID: <02EC6D6D-F807-492F-B125-9FE0393B1FD9@bioperl.org>

It gets buffered via the OS -- Bio::Root::IO calls next_line  
iteratively, but eventually the whole sequence object will get put  
into RAM as it is built up.
zcat or bzcat can also be used for gzipped and bzipped files  
respectively; I like to use this where I want to keep the disk space  
footprint down.
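
To make the streaming point concrete, here is a small sketch in plain Perl (no BioPerl) that reads a gzipped flat file through a gunzip pipe one line at a time and counts the "//" record terminators. The sub name is made up, and it assumes a gunzip binary is on the PATH. As far as I know the same kind of handle can be passed to Bio::SeqIO with its -fh argument instead of -file.

```perl
use strict;
use warnings;

# Count EMBL-style records while streaming from gunzip; only one line
# is held in memory at any time.
sub count_embl_records {
    my ($gz_file) = @_;
    # List-form pipe open: no shell is involved, so odd file names are safe.
    open(my $fh, '-|', 'gunzip', '-c', $gz_file)
        or die "Cannot start gunzip on $gz_file: $!";
    my $n = 0;
    while (my $line = <$fh>) {
        $n++ if $line =~ m{^//};
    }
    close($fh) or die "gunzip failed on $gz_file (exit status $?)";
    return $n;
}
```

The memory cost in the BioPerl case comes from the objects built per record, not from the pipe itself.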

Because we treat data input usually as from a stream ignoring whether  
it is in a file or not, we have to have a more flexible structure to  
really handle this, although I'd argue the data really belongs in a  
database when it is too big for memory.
More compact Feature/Location objects would probably also help here.   
I would not be surprised if the memory requirement has more to do  
with the number of features than length of the sequence - human chrom  
1 can fit into memory just fine on most machines with 2GB of RAM.

But it would require someone taking an interest in some re- 
architecting here.

-jason

On Dec 19, 2007, at 9:59 PM, Michael Thon wrote:

>
> On Dec 18, 2007, at 7:04 PM, Stefano Ghignone wrote:
>
>> my $in  = Bio::SeqIO->new(-file => "/bin/gunzip -c $infile |",
>>                           -format => 'EMBL');
>
> This is just for the sake of curiosity, since you already found a  
> solution to your problem, but I wonder how perl will handle a file  
> opened this way.  Will it try to suck the whole thing into ram in  
> one go?
>
> Mike
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From ste.ghi at libero.it  Thu Dec 20 08:57:54 2007
From: ste.ghi at libero.it (Stefano Ghignone)
Date: Thu, 20 Dec 2007 14:57:54 +0100
Subject: [Bioperl-l] dealing with large files
Message-ID: <JTCPGI$4B9622B6978AB21534CC4DC74CC6BC89@libero.it>

I was wondering whether, when working with such a big file, it would be better to first index the database and then query it, formatting the sequences as one wants...

> It gets buffered via the OS -- Bio::Root::IO calls next_line  
> iteratively, but eventually the whole sequence object will get put  
> into RAM as it is built up.
> zcat or bzcat can also be used for gzipped and bzipped files  
> respectively; I like to use this where I want to keep the disk space  
> footprint down.
> 
> Because we treat data input usually as from a stream ignoring whether  
> it is in a file or not, we have to have a more flexible structure to  
> really handle this, although I'd argue the data really belongs in a  
> database when it is too big for memory.
> More compact Feature/Location objects would probably also help here.   
> I would not be surprised if the memory requirement has more to do  
> with the number of features than length of the sequence - human chrom  
> 1 can fit into memory just fine on most machines with 2GB of RAM.
> 
> But it would require someone taking an interest in some re- 
> architecting here.
> 
> -jason
> 
> On Dec 19, 2007, at 9:59 PM, Michael Thon wrote:
> 
> >
> > On Dec 18, 2007, at 7:04 PM, Stefano Ghignone wrote:
> >
> >> my $in  = Bio::SeqIO->new(-file => "/bin/gunzip -c $infile |",
> >>                           -format => 'EMBL');
> >
> > This is just for the sake of curiosity, since you already found a  
> > solution to your problem, but I wonder how perl will handle a file  
> > opened this way.  Will it try to suck the whole thing into ram in  
> > one go?
> >
> > Mike
> 
> 


From amackey at pcbi.upenn.edu  Thu Dec 20 10:32:19 2007
From: amackey at pcbi.upenn.edu (Aaron Mackey)
Date: Thu, 20 Dec 2007 10:32:19 -0500
Subject: [Bioperl-l] BioPerl and NHX tree
In-Reply-To: <476A7736.109@toulouse.inra.fr>
References: <476A7736.109@toulouse.inra.fr>
Message-ID: <24c96eca0712200732q20523c1co1075c15d056ff634@mail.gmail.com>

The NHX writer will only add the [&&NHX] block when there are tags to
be written.  Your code reads in a Newick tree without tags, and then
writes it back out without adding any new tags.  So yes, you need to
1) read the Newick tree, 2) traverse the tree, calling
$node->nhx_tag({T => $taxon_id}) for each node with each corresponding
$taxon_id, and then 3) write out the NHX tree.
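
To show what the tagged output should look like, here is a plain-string sketch that appends a [&&NHX:T=...] block to the leaves of a Newick string. It is deliberately not the Bio::TreeIO route described above: it uses a regular expression on the text, only tags leaves, and assumes plain Newick input without existing comments. The sub name and the taxon ids are illustrative.

```perl
use strict;
use warnings;

# Append an NHX taxonomy tag to each leaf that has an entry in %$taxon_of
# (leaf name => taxon id).
sub add_nhx_taxon_tags {
    my ($newick, $taxon_of) = @_;
    # A leaf label follows '(' or ','; internal node labels follow ')'.
    $newick =~ s{([(,])([^(),:;\[\]\s]+)(:[^(),;\[\]]+)?}{
        my ($pre, $name, $len) = ($1, $2, defined $3 ? $3 : '');
        exists $taxon_of->{$name}
            ? $pre . $name . $len . '[&&NHX:T=' . $taxon_of->{$name} . ']'
            : $pre . $name . $len;
    }ge;
    return $newick;
}

my %taxon_of = (human => 9606, mouse => 10090);
print add_nhx_taxon_tags('((human:0.1,mouse:0.2):0.05,fly:0.6);', \%taxon_of), "\n";
```

Leaves without an entry in the hash (fly here) are left untouched, which mirrors the point that the NHX block only appears where there is a tag to write.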

-Aaron

On Dec 20, 2007 9:07 AM, Laurence Amilhat
<Laurence.Amilhat at toulouse.inra.fr> wrote:
> Dear Mr MacKey,
>
>
> I am pretty new in Tree parsing and writing with BioPerl.
> I am trying to convert a Newick tree file to a NHX tree file with adding
> the Taxid for the node in the NHX tree file.
>
> I saw the module Bio::Tree::NodeNHX, but very few examples...
>
> I don't know where do i need to start, I tried the easy way with
> Bio::TreeIO,
> but the resulting tree doesn't have the [&&NHX] in the internal node,
> and I don't know how to add the tag [&&NHX:T=xxxx] on the node,
> Do I need to use the nhx_tag method to do this?
>
> Maybe you have an example that uses NHX tags in tree nodes; that might be
> very helpful for me to understand how it works...
>
>
> Have a nice holidays,
>
>
> Best regards,
>
>
> Laurence Amilhat.
>
>
>
>
> This is the simple code that I use to convert a tree from  newick to nhx:
>
> use Bio::TreeIO;
> use Getopt::Long;
> my $tree_file;
> my $outfile;
>
> GetOptions('f|file:s' =>\$tree_file, 'o|out:s' =>\$outfile);
>
> my $treeio = new Bio::TreeIO (-format => 'newick', -file => "$tree_file");
> my $treeout= new Bio::TreeIO (-format => 'nhx', -file =>">$outfile");
>
> while (my $tree= $treeio->next_tree)
> {
>    $treeout->write_tree($tree);
> }
>
> --
> ====================================================================
> = Laurence Amilhat    INRA Toulouse 31326 Castanet-Tolosan         =
> = Tel: 33 5 61 28 53 34   Email: laurence.amilhat at toulouse.inra.fr =
> ====================================================================
>
>
>
>


From cjfields at uiuc.edu  Thu Dec 20 11:14:55 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 20 Dec 2007 10:14:55 -0600
Subject: [Bioperl-l] dealing with large files
In-Reply-To: <JTCPGI$4B9622B6978AB21534CC4DC74CC6BC89@libero.it>
References: <JTCPGI$4B9622B6978AB21534CC4DC74CC6BC89@libero.it>
Message-ID: <FDB7609C-93B8-47C3-A97D-32D73B16651E@uiuc.edu>

As Jason mentioned, it may be the number of features in the record if  
the record itself is huge (i.e. human chromosome-sized, full  
metagenome, etc).  If (my) memory serves correctly the mem. footprint  
for a perl object is ~10x the actual data, give or take (it depends on  
the complexity of the object itself).  In cases like this indexing may  
not fix the problem, unless you have an object which retains the file  
position of the data instead of the data itself; I don't think we have  
this object type in BioPerl.

The only way I can think of to fix this would be (as Jason also  
suggested) lightweight objects, or something like the lazy sequence  
object ala the SwissKnife suite (which only bring what you want into  
memory).

Related to that, I have been testing something like that, which uses  
iterators to pass in chunks of data from a stream to handlers to build  
a sequence object.  Wouldn't be too hard to reconfigure that to return  
file positions as well.  Maybe for the 1.7 release...
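
As a rough illustration of the "retain the file position instead of the data" idea, a toy lazy record in plain Perl might look like the sketch below. LazyFastaRecord is a made-up name, not a BioPerl class, and it assumes FASTA input for simplicity.

```perl
use strict;
use warnings;

{
    # Remembers where an entry starts and reads the sequence only when
    # seq() is first called; the result is cached afterwards.
    package LazyFastaRecord;

    sub new {
        my ($class, %args) = @_;
        return bless { path => $args{path}, offset => $args{offset} }, $class;
    }

    sub seq {
        my ($self) = @_;
        return $self->{seq} if defined $self->{seq};
        open(my $fh, '<', $self->{path})
            or die "Cannot open $self->{path}: $!";
        seek($fh, $self->{offset}, 0) or die "Cannot seek: $!";
        my $header = <$fh>;    # skip the '>' line
        my $seq = '';
        while (my $line = <$fh>) {
            last if $line =~ /^>/;
            chomp $line;
            $seq .= $line;
        }
        close $fh;
        return $self->{seq} = $seq;
    }
}

# Usage: my $rec = LazyFastaRecord->new(path => 'big.fa', offset => 0);
#        print $rec->seq, "\n";
```

Until seq() is called the object holds two scalars, so very many of them can be kept around cheaply; the price is a file open and seek on first access.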

chris

On Dec 20, 2007, at 7:57 AM, Stefano Ghignone wrote:

> I was wondering whether, when working with such a big file, it would be
> better to first index the database and then query it, formatting the
> sequences as one wants...
>
>> It gets buffered via the OS -- Bio::Root::IO calls next_line
>> iteratively, but eventually the whole sequence object will get put
>> into RAM as it is built up.
>> zcat or bzcat can also be used for gzipped and bzipped files
>> respectively; I like to use this where I want to keep the disk space
>> footprint down.
>>
>> Because we treat data input usually as from a stream ignoring whether
>> it is in a file or not, we have to have a more flexible structure to
>> really handle this, although I'd argue the data really belongs in a
>> database when it is too big for memory.
>> More compact Feature/Location objects would probably also help here.
>> I would not be surprised if the memory requirement has more to do
>> with the number of features than length of the sequence - human chrom
>> 1 can fit into memory just fine on most machines with 2GB of RAM.
>>
>> But it would require someone taking an interest in some re-
>> architecting here.
>>
>> -jason
>>
>> On Dec 19, 2007, at 9:59 PM, Michael Thon wrote:
>>
>>>
>>> On Dec 18, 2007, at 7:04 PM, Stefano Ghignone wrote:
>>>
>>>> my $in  = Bio::SeqIO->new(-file => "/bin/gunzip -c $infile |",
>>>>                           -format => 'EMBL');
>>>
>>> This is just for the sake of curiosity, since you already found a
>>> solution to your problem, but I wonder how perl will handle a file
>>> opened this way.  Will it try to suck the whole thing into ram in
>>> one go?
>>>
>>> Mike
>>
>>
>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From David.Messina at sbc.su.se  Thu Dec 20 11:26:17 2007
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 20 Dec 2007 10:26:17 -0600
Subject: [Bioperl-l] dealing with large files
In-Reply-To: <JTCPGI$4B9622B6978AB21534CC4DC74CC6BC89@libero.it>
References: <JTCPGI$4B9622B6978AB21534CC4DC74CC6BC89@libero.it>
Message-ID: <628aabb70712200826p36d3d451wdcd901f555bc210a@mail.gmail.com>

On 12/20/07, Stefano Ghignone <ste.ghi at libero.it> wrote:
>
> I was wondering whether, when working with such a big file, it would be
> better to first index the database and then query it, formatting the
> sequences as one wants...
>

Agreed, but only if you want to randomly access sequences within the file. I
believe the original poster intends to do something with every sequence in
the big file, in which case streaming the file is likely to be much faster.
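
For the random-access case, the indexing idea boils down to remembering byte offsets and seeking to them later. The sketch below does this for an uncompressed FASTA file in plain Perl; the sub names are made up. BioPerl has modules such as Bio::Index::Fasta and Bio::DB::Fasta for real use, so treat this as an illustration of the principle only.

```perl
use strict;
use warnings;

# Record the byte offset of every FASTA header line.
sub build_offset_index {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    my %offset_of;
    my $pos = 0;    # offset of the line about to be read
    while (my $line = <$fh>) {
        $offset_of{$1} = $pos if $line =~ /^>(\S+)/;
        $pos = tell($fh);
    }
    close $fh;
    return \%offset_of;
}

# Fetch one entry (header plus sequence lines) starting at $offset.
sub fetch_entry {
    my ($path, $offset) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    seek($fh, $offset, 0) or die "Cannot seek in $path: $!";
    my $entry = <$fh>;
    while (my $line = <$fh>) {
        last if $line =~ /^>/;
        $entry .= $line;
    }
    close $fh;
    return $entry;
}
```

Building the index is itself one full pass over the file, which is why it only pays off when entries are fetched selectively or repeatedly.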


Dave


From akarger at CGR.Harvard.edu  Thu Dec 20 11:48:58 2007
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Thu, 20 Dec 2007 11:48:58 -0500
Subject: [Bioperl-l] dealing with large files
In-Reply-To: <B92AAA3F-C93A-41EC-B68D-3E6F4053BBD4@uiuc.edu>
References: <JTB2JF$8DA035C3FA7C0AA4E73865996D18C568@libero.it>
	<B92AAA3F-C93A-41EC-B68D-3E6F4053BBD4@uiuc.edu>
Message-ID: <B9182BFF5B004245BABC12956EA6322E07CF5C0F@huls5.nucleus.harvard.edu>

 
> -----Original Message-----
> From: Chris Fields [mailto:cjfields at uiuc.edu] 
> 
> 
> On Dec 19, 2007, at 10:45 AM, Stefano Ghignone wrote:
> 
> > At the end, I succeeded in the format conversion using this command:
> >
> > gunzip -c uniprot_trembl_bacteria.dat.gz | perl -ne 'print ">$1 " if
> > (/^AC\s+(\S+);/); print " $1" if (/^DE\s+(.*)/);print " [$1]\n" if
> > (/^OS\s+(.*)/); if (($a)=/^\s+(.*)/){$a=~s/ //g; print "$a\n"};'
> >
> > (Thanks to Riccardo Percudani). It's not bioperl...but it works!
> 
> 
> As this shows, sometimes BioPerl isn't always the best answer 
> (I know,  
> blasphemy...).  As Jason suggested it's quite likely there are large  
> sequence records causing your problems when using BioPerl.  The one- 
> liner works b/c it doesn't retain data (sequence, annotation, 
> etc) in  
> memory as Bio::Seq object; it's a direct conversion.
> 
> It would be nice to code up a lazy sequence object and related  
> parsers; maybe for the next dev release.

Yes!
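
For reference, the streaming one-liner quoted above can be spelled out as a small sub. This is a sketch and not a drop-in replacement: it assumes UniProt-style protein entries (AC, DE, OS, SQ lines and a "//" terminator), keeps only the first accession, and prints the header once per entry when the SQ line is reached. The sub name is made up.

```perl
use strict;
use warnings;

# Stream a UniProt-style flat file to FASTA one line at a time, so that
# memory use does not grow with file size.
sub flatfile_to_fasta {
    my ($in, $out) = @_;
    my ($acc, @de, @os);
    my $in_seq = 0;
    while (my $line = <$in>) {
        chomp $line;
        if ($line =~ /^AC\s+(\S+?);/) {
            $acc = $1 unless defined $acc;    # first accession only
        }
        elsif ($line =~ /^DE\s+(.*)/) { push @de, $1 }
        elsif ($line =~ /^OS\s+(.*)/) { push @os, $1 }
        elsif ($line =~ /^SQ\s/) {
            my $id = defined $acc ? $acc : 'unknown';
            print {$out} '>', $id, ' ', join(' ', @de),
                ' [', join(' ', @os), "]\n";
            $in_seq = 1;
        }
        elsif ($line =~ m{^//}) {
            # End of entry: reset state for the next one.
            ($acc, $in_seq) = (undef, 0);
            @de = ();
            @os = ();
        }
        elsif ($in_seq && $line =~ /^\s+(\S.*)/) {
            (my $residues = $1) =~ s/\s+//g;
            print {$out} "$residues\n";
        }
    }
    return;
}

# Usage: gunzip -c file.dat.gz | perl this_script.pl > out.fasta
# flatfile_to_fasta(\*STDIN, \*STDOUT);
```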

Also, BLAST parsing. Blasting the proteome against the genome makes for
rather large result files. Right now, if you want to delete queries that
hit, say, more than 1000 times, you still need to wait for Bioperl to
create objects and sub-objects for every single hit. Sadly, this example
isn't hypothetical. I'm going to solve it with something like:

perl -wne 'BEGIN {$/="TBLASTN"} print if length($_) < $some_big_value'
big_blast > filtered_blast

(Not that I'm volunteering to help with the parser writing, so I should
stop complaining.)
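
Written out as a sub, the filter idea looks something like the sketch below. As in the one-liner, chunk length is only a crude proxy for the number of hits; the sub name and the threshold are placeholders to be chosen by the caller.

```perl
use strict;
use warnings;

# Copy a concatenated BLAST report from $in to $out, dropping every
# chunk (text up to and including the next $separator) that is
# $max_length characters or longer.
sub filter_blast_chunks {
    my ($in, $out, $separator, $max_length) = @_;
    local $/ = $separator;    # read one chunk per <$in>
    my ($kept, $dropped) = (0, 0);
    while (my $chunk = <$in>) {
        if (length($chunk) < $max_length) {
            print {$out} $chunk;
            $kept++;
        }
        else {
            $dropped++;
        }
    }
    return ($kept, $dropped);
}

# Usage (threshold is a guess to be tuned):
# filter_blast_chunks(\*STDIN, \*STDOUT, 'TBLASTN', 500_000);
```

Because each chunk ends with the separator, a dropped chunk takes the next report's leading "TBLASTN" with it, while the separator at the end of the previous kept chunk takes its place, so the output still reads as a sequence of reports.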

-Amir


From bix at sendu.me.uk  Thu Dec 20 12:06:28 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 20 Dec 2007 17:06:28 +0000
Subject: [Bioperl-l] dealing with large files
In-Reply-To: <FDB7609C-93B8-47C3-A97D-32D73B16651E@uiuc.edu>
References: <JTCPGI$4B9622B6978AB21534CC4DC74CC6BC89@libero.it>
	<FDB7609C-93B8-47C3-A97D-32D73B16651E@uiuc.edu>
Message-ID: <476AA114.2060201@sendu.me.uk>

Chris Fields wrote:
> The only way I can think of to fix this would be (as Jason also 
> suggested) lightweight objects, or something like the lazy sequence 
> object ala the SwissKnife suite (which only bring what you want into 
> memory).
> 
> Related to that, I have been testing something like that, which uses 
> iterators to pass in chunks of data from a stream to handlers to build a 
> sequence object.  Wouldn't be too hard to reconfigure that to return 
> file positions as well.  Maybe for the 1.7 release...

Bio::PullParserI is your friend.


From bix at sendu.me.uk  Thu Dec 20 13:48:29 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 20 Dec 2007 18:48:29 +0000
Subject: [Bioperl-l] dealing with large files
In-Reply-To: <B9182BFF5B004245BABC12956EA6322E07CF5C0F@huls5.nucleus.harvard.edu>
References: <JTB2JF$8DA035C3FA7C0AA4E73865996D18C568@libero.it>	<B92AAA3F-C93A-41EC-B68D-3E6F4053BBD4@uiuc.edu>
	<B9182BFF5B004245BABC12956EA6322E07CF5C0F@huls5.nucleus.harvard.edu>
Message-ID: <476AB8FD.8090108@sendu.me.uk>

Amir Karger wrote:
>> It would be nice to code up a lazy sequence object and related  
>> parsers; maybe for the next dev release.
> 
> Yes!
> 
> Also, BLAST parsing. Blasting the proteome against the genome makes for
> rather large result files.

This has already been done. Use Bio::SearchIO::blast_pull. In a 
situation like yours I dropped run time from 20223s to
951s (~20x faster) and memory usage from over 8GB to less than 5GB (~40%
less).


From akarger at CGR.Harvard.edu  Thu Dec 20 13:52:51 2007
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Thu, 20 Dec 2007 13:52:51 -0500
Subject: [Bioperl-l] dealing with large files
In-Reply-To: <476AB8FD.8090108@sendu.me.uk>
References: <JTB2JF$8DA035C3FA7C0AA4E73865996D18C568@libero.it>	<B92AAA3F-C93A-41EC-B68D-3E6F4053BBD4@uiuc.edu>
	<B9182BFF5B004245BABC12956EA6322E07CF5C0F@huls5.nucleus.harvard.edu>
	<476AB8FD.8090108@sendu.me.uk>
Message-ID: <B9182BFF5B004245BABC12956EA6322E07CF5C77@huls5.nucleus.harvard.edu>

> Amir Karger wrote:
> >> It would be nice to code up a lazy sequence object and related  
> >> parsers; maybe for the next dev release.
> > 
> > Also, BLAST parsing. Blasting the proteome against the 
> genome makes for
> > rather large result files.
> 
> This has already been done. Use Bio::SearchIO::blast_pull. In a 
> situation like yours I dropped run time from 20223s to
> 951s (~20x faster) and memory usage from over 8GB to less 
> than 5GB (~40%
> less).

Not in 1.5.1. Is it in 1.5.2 or just in cvs? Is there a single file I
can put in my own perl lib for this, or does it require large bunches of
new code? (I'm guessing the latter.) We're about to upgrade to 1.5.2
here, but I don't see our whole center using CVS Bioperl.

-Amir


From cjfields at uiuc.edu  Thu Dec 20 15:27:45 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 20 Dec 2007 14:27:45 -0600
Subject: [Bioperl-l] dealing with large files
In-Reply-To: <476AA114.2060201@sendu.me.uk>
References: <JTCPGI$4B9622B6978AB21534CC4DC74CC6BC89@libero.it>
	<FDB7609C-93B8-47C3-A97D-32D73B16651E@uiuc.edu>
	<476AA114.2060201@sendu.me.uk>
Message-ID: <29E190AB-8A6C-4F1C-BDD1-6034CFFEEFFF@uiuc.edu>

On Dec 20, 2007, at 11:06 AM, Sendu Bala wrote:

> Chris Fields wrote:
>> The only way I can think of to fix this would be (as Jason also  
>> suggested) lightweight objects, or something like the lazy sequence  
>> object ala the SwissKnife suite (which only bring what you want  
>> into memory).
>> Related to that, I have been testing something like that, which  
>> uses iterators to pass in chunks of data from a stream to handlers  
>> to build a sequence object.  Wouldn't be too hard to reconfigure  
>> that to return file positions as well.  Maybe for the 1.7 release...
>
> Bio::PullParserI is your friend.

I'm looking into that, yes.  I'm thinking of something like a generic  
lazy sequence class with an embedded Handler/PullParser object which  
processes stuff on the fly.

Oh, when I have a bit more time...

chris


From cjfields at uiuc.edu  Thu Dec 20 15:39:48 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 20 Dec 2007 14:39:48 -0600
Subject: [Bioperl-l] dealing with large files
In-Reply-To: <B9182BFF5B004245BABC12956EA6322E07CF5C77@huls5.nucleus.harvard.edu>
References: <JTB2JF$8DA035C3FA7C0AA4E73865996D18C568@libero.it>	<B92AAA3F-C93A-41EC-B68D-3E6F4053BBD4@uiuc.edu>
	<B9182BFF5B004245BABC12956EA6322E07CF5C0F@huls5.nucleus.harvard.edu>
	<476AB8FD.8090108@sendu.me.uk>
	<B9182BFF5B004245BABC12956EA6322E07CF5C77@huls5.nucleus.harvard.edu>
Message-ID: <2EC6A1C2-FBC9-45F6-AD1B-040E29FAFA28@uiuc.edu>


On Dec 20, 2007, at 12:52 PM, Amir Karger wrote:

>> Amir Karger wrote:
>>>> It would be nice to code up a lazy sequence object and related
>>>> parsers; maybe for the next dev release.
>>>
>>> Also, BLAST parsing. Blasting the proteome against the
>> genome makes for
>>> rather large result files.
>>
>> This has already been done. Use Bio::SearchIO::blast_pull. In a
>> situation like yours I dropped run time from 20223s to
>> 951s (~20x faster) and memory usage from over 8GB to less
>> than 5GB (~40%
>> less).
>
> Not in 1.5.1. Is it in 1.5.2 or just in cvs? Is there a single file I
> can put in my own perl lib for this, or does it require large  
> bunches of
> new code? (I'm guessing the latter.) We're about to upgrade to 1.5.2
> here, but I don't see our whole center using CVS Bioperl.
>
> -Amir

It's in CVS.

Just to note: there have been a lot of changes between 1.5.1 and  
1.5.2, and probably as many from 1.5.2 to now.  We are cleaning up  
some code introduced prior to the 1.5 release and working on other  
fixes and code docs, with the final aim to be a new 1.6; I'm hoping  
that release will have routine point releases for bug fixes.  Of  
course that'll have to wait until after SVN migration!

There are a few discussions on the list about speeding up parsing using  
lightweight/featherweight objects or even straight hashes (for  
instance, Jason has a lightweight seqfeature implementation committed  
on a branch which is quite fast, and Sendu's Bio::SearchIO PullParser  
implementations).  My feeling is that will be part of the next dev  
release, along with GFF3 integration and code cleanup.

chris


From bix at sendu.me.uk  Thu Dec 20 18:29:30 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 20 Dec 2007 23:29:30 +0000
Subject: [Bioperl-l] dealing with large files
In-Reply-To: <B9182BFF5B004245BABC12956EA6322E07CF5C77@huls5.nucleus.harvard.edu>
References: <JTB2JF$8DA035C3FA7C0AA4E73865996D18C568@libero.it>	<B92AAA3F-C93A-41EC-B68D-3E6F4053BBD4@uiuc.edu>	<B9182BFF5B004245BABC12956EA6322E07CF5C0F@huls5.nucleus.harvard.edu>	<476AB8FD.8090108@sendu.me.uk>
	<B9182BFF5B004245BABC12956EA6322E07CF5C77@huls5.nucleus.harvard.edu>
Message-ID: <476AFADA.20604@sendu.me.uk>

Amir Karger wrote:
>> Amir Karger wrote:
>>>> It would be nice to code up a lazy sequence object and related  
>>>> parsers; maybe for the next dev release.
>>> Also, BLAST parsing. Blasting the proteome against the 
>>> genome makes for rather large result files.
>> This has already been done. Use Bio::SearchIO::blast_pull. In a 
>> situation like yours I dropped run time from 20223s to
>> 951s (~20x faster) and memory usage from over 8GB to less 
>> than 5GB (~40% less).
> 
> Not in 1.5.1. Is it in 1.5.2 or just in cvs? Is there a single file I
> can put in my own perl lib for this, or does it require large bunches of
> new code? (I'm guessing the latter.) We're about to upgrade to 1.5.2
> here, but I don't see our whole center using CVS Bioperl.

blast_pull is only in CVS (and needs a whole bunch of associated modules 
to work), though 1.5.2 also contains significant improvements to 
SearchIO generally which should provide you with significant speed 
improvements during blast parsing with the normal Bio::SearchIO::blast.


From abdul.sattar4 at ntlworld.com  Thu Dec 20 19:32:06 2007
From: abdul.sattar4 at ntlworld.com (Abdul Sattar)
Date: Fri, 21 Dec 2007 00:32:06 -0000
Subject: [Bioperl-l]  bioperl-db & biperl version
Message-ID: <000001c84368$ee7872b0$c5836351@owner00d4289a7>

BFG-0DRTGO0EEGREWTYU


From DGroskreutz at twt.com  Fri Dec 21 02:01:27 2007
From: DGroskreutz at twt.com (DGroskreutz at twt.com)
Date: Fri, 21 Dec 2007 01:01:27 -0600
Subject: [Bioperl-l] Groskreutz, Deb is out of the office.
Message-ID: <OF1CBDB887.820A02D2-ON862573B8.002695BD-862573B8.002695BD@twt.com>


I will be out of the office starting  12/20/2007 and will not return until
01/01/2008.

I will respond to your message when I return on January 2nd, 2008




From bug-bioperl at rt.cpan.org  Fri Dec 21 07:07:39 2007
From: bug-bioperl at rt.cpan.org (Brandi Cantarel via RT)
Date: Fri, 21 Dec 2007 07:07:39 -0500
Subject: [Bioperl-l] [rt.cpan.org #31796] SeqIO
In-Reply-To: <5F694A96-AC4B-4279-8060-9E28A92837ED@afmb.univ-mrs.fr>
References: <RT-Ticket-31796@rt.cpan.org>
	<5F694A96-AC4B-4279-8060-9E28A92837ED@afmb.univ-mrs.fr>
Message-ID: <rt-3.6.HEAD-25638-1198238855-470.31796-4-0@rt.cpan.org>


Fri Dec 21 07:07:30 2007: Request 31796 was acted upon.
Transaction: Ticket created by brandi.cantarel at afmb.univ-mrs.fr
       Queue: bioperl
     Subject: SeqIO
   Broken in: (no value)
    Severity: (no value)
       Owner: Nobody
  Requestors: brandi.cantarel at afmb.univ-mrs.fr
      Status: new
 Ticket <URL: http://rt.cpan.org/Ticket/Display.html?id=31796 >


I might have found a bug in SeqIO in bioperl.  Well, it is actually a  
memory leak.  When I try to load a large file, I can step through the  
first 10K or so sequences (using next_seq) but then it just hangs.....

If this bug is fixed please let me know.

Brandi Cantarel



From bug-bioperl at rt.cpan.org  Fri Dec 21 08:57:20 2007
From: bug-bioperl at rt.cpan.org (Sendu Bala via RT)
Date: Fri, 21 Dec 2007 08:57:20 -0500
Subject: [Bioperl-l] [rt.cpan.org #31796] SeqIO
In-Reply-To: <5F694A96-AC4B-4279-8060-9E28A92837ED@afmb.univ-mrs.fr>
References: <RT-Ticket-31796@rt.cpan.org>
	<5F694A96-AC4B-4279-8060-9E28A92837ED@afmb.univ-mrs.fr>
Message-ID: <rt-3.6.HEAD-25615-1198245436-879.31796-5-0@rt.cpan.org>


       Queue: bioperl
 Ticket <URL: http://rt.cpan.org/Ticket/Display.html?id=31796 >

On Fri Dec 21 07:07:30 2007, brandi.cantarel at afmb.univ-mrs.fr wrote:
> I might have found a bug in SeqIO in bioperl.  Well it is actually a  
> memory leak.  When I try to load a large file, I can step through the  
> first 10K or so sequences (using next_seq) but then it just hangs.....
> 
> If this bug is fixed please let me know.

Please use http://bugzilla.bioperl.org/ to tell us about this bug. 
After creating a bug report you'll be able to attach the script in 
which you encounter the problem, which we need to diagnose this issue.


From susantoroy at gmail.com  Sat Dec 22 07:06:42 2007
From: susantoroy at gmail.com (Susanta Roy)
Date: Sat, 22 Dec 2007 17:36:42 +0530
Subject: [Bioperl-l] Enquiry about bioperl project
Message-ID: <236a58340712220406m3d3f9884h8f7b5e58bdfb356@mail.gmail.com>

Dear Sir,


Most humbly I have to state that I am Susanta Roy, 25 years and I have
done  my masters in bioinformatics. I have more than  nine months of work
experience as Associate Technical Content  Developer. I have also worked
in the journal "Bioinformatics  India" (The first bioinformatics journal
of India, now "Bioinformatics Trends"). My work with  previous employer
was highly appreciated.

This year I have founded Bioexplore, a bioinformatics KPO (Knowledge
Process Outsourcing) due to lack of bioinformatics jobs in India.

Our services include

1. Bioinformatics data mining / programming
2. HR solution
3. Technical writing solution
4. E-learning
5. Abstracting & indexing
6. Business promotion solution

I want to inquire if you can give me a project.

-- Looking forward to your reply.

Kind Regards
Mr. Susanta Roy, MS Bioinformatics
Founder Director
Bioexplore
C-5, Hazipark Market
Dimapur, Nagaland - 797112
India
+ 91 - 9811517324 (Mobile)
susanta.roy at bioexplore.co.in
susantoroy at gmail.com


From alan.bridge at isb-sib.ch  Sun Dec  2 13:29:48 2007
From: alan.bridge at isb-sib.ch (Alan Bridge)
Date: Sun, 02 Dec 2007 19:29:48 +0100
Subject: [Bioperl-l] Bio::Tools::Run::RemoteBlast
Message-ID: <4752F99C.9050504@isb-sib.ch>

Hello,

I was just wondering if, when performing a RemoteBlast, it would be 
possible to specify the entire UniProt database (i.e. Swiss-Prot + 
TrEMBL), or even just TrEMBL.

It seems that currently, you can only specify Swiss-Prot (the annotated 
portion of UniProt, which is much smaller than its automatically 
annotated counterpart, TrEMBL). Any hints on how to expand the search 
space to include TrEMBL would be really appreciated.

Regards, Alan Bridge

            my $prog = 'blastp';
            my $db   = 'swissprot'; # use TrEMBL ?
            my $e_val= '1e-10';

            my @params = ( '-prog' => $prog, '-data' => $db, '-expect' 
=> $e_val, '-readmethod' => 'SearchIO' );

-- 
Alan Bridge PhD
Swiss-Prot annotator
Swiss Institute of Bioinformatics (SIB)
1, rue Michel Servet
CH-1211 Geneva 4  
Switzerland   

Tel: (+41 22) 379 58 90
Fax: (+41 22) 379 58 58 

http://www.expasy.org/ 


From avilella at gmail.com  Mon Dec  3 06:39:59 2007
From: avilella at gmail.com (Albert Vilella)
Date: Mon, 3 Dec 2007 11:39:59 +0000
Subject: [Bioperl-l] Query about SLAC.pm module
In-Reply-To: <OF3E7AF746.CBFC96D8-ONC12573A6.00374CAE-C12573A6.00374CB1@sh.se>
References: <OF3E7AF746.CBFC96D8-ONC12573A6.00374CAE-C12573A6.00374CB1@sh.se>
Message-ID: <358f4d650712030339w2f3de057ge5614e60a3f6658c@mail.gmail.com>

[CCing to the bioperl ml]

Sorry, there were some bits left in the pod header referring to PAML
objects that aren't quite true.
I've now updated the PODs. The Hyphy executions return hashes:

If you run the SLAC test in t/Hyphy.t you will see that the $results
are something like:

DB<3> x 2 $results
0  HASH(0x8df3110)
   'E[NS Sites]' => ARRAY(0x8e6cff4)
   'E[S Sites]' => ARRAY(0x8e6ceb0)
   'Observed NS Changes' => ARRAY(0x8e7b380)
   'Observed S Changes' => ARRAY(0x8e7b344)
   'Observed S. Prop.' => ARRAY(0x8e6d018)
   'P{S geq. observed}' => ARRAY(0x8e6d360)
   'P{S leq. observed}' => ARRAY(0x8e6d33c)
   'P{S}' => ARRAY(0x8e6d03c)
   'Scaled dN-dS' => ARRAY(0x8e6d384)
   'dN' => ARRAY(0x8e6d084)
   'dN-dS' => ARRAY(0x8e6d0a8)
   'dS' => ARRAY(0x8e6d060)
  DB<4> x $rc

which correspond to the csv file that hyphy produces.
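
For anyone hitting the same "unblessed reference" error: since the result is a plain hash of array references rather than an object, it can be walked with ordinary Perl. A minimal sketch follows, assuming each array holds one value per site (one csv row per site). Only the key names come from the debugger output above; the numbers and the helper name per_site_rows are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the hash reference returned by a SLAC run.
# Key names are taken from the debugger output; the values are made up.
my $results = {
    'dN'    => [ 0.10, 0.00, 0.25 ],
    'dS'    => [ 0.05, 0.20, 0.25 ],
    'dN-dS' => [ 0.05, -0.20, 0.00 ],
};

# Build one row per site: [ site number, dN, dS, dN-dS ].
# Assumes all three arrays have the same length.
sub per_site_rows {
    my ($res) = @_;
    my @rows;
    for my $i ( 0 .. $#{ $res->{'dN'} } ) {
        push @rows, [ $i + 1, map { $res->{$_}[$i] } ( 'dN', 'dS', 'dN-dS' ) ];
    }
    return @rows;
}

# Print a tab-separated table, one line per site.
print join( "\t", @$_ ), "\n" for per_site_rows($results);
```

The same pattern should apply to any of the other keys, e.g. $results->{'P{S}'}.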

Cheers,

    Albert.

On Dec 3, 2007 10:04 AM, Johan Nilsson <johan.nilsson at sh.se> wrote:
>
> Dear Dr. Vilella,
>
> Please allow me to introduce myself. My name is Johan Nilsson and I am a
> postdoctoral researcher in bioinformatics.
>
> I was  planning to perform a large-scale analysis for positively selected
> protein coding genes using any appropriate method from the Hyphy package,
> and I thought your bioperl wrappers 'SLAC.pm', 'FEL.pm' etc. should be very
> useful for this.
>
> If I interpreted the documentation of e.g. the SLAC module correctly, running
> $slac->run($aln,$tree) will return a
> Bio::Tools::Phylo::PAML object. However, when I try to retrieve any results
> from the obtained hashref (running my script on the test files provided
> with bioperl ...t/hyphy1.tree and ...t/hyphy1.fasta), the script complains
> that it is not blessed (e.g. 'Can't call method "get_seqs" on unblessed
> reference').
>
> I am fairly new to bioperl, so please excuse me if this question was a
> stupid one :)
>
> Thanks in advance!
>
> Yours Sincerely
> /Johan
>
> --
> Johan Nilsson, Ph.D.
> School of Life Sciences
> Södertörns University College
> S-141 89 Huddinge, Sweden
> E-mail: johan.nilsson at sh.se
> Phone: +46 8 608 47 05, +46 70 456 10 51
>
>


From cjfields at uiuc.edu  Mon Dec  3 09:04:06 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 3 Dec 2007 08:04:06 -0600
Subject: [Bioperl-l] Bio::Tools::Run::RemoteBlast
In-Reply-To: <4752F99C.9050504@isb-sib.ch>
References: <4752F99C.9050504@isb-sib.ch>
Message-ID: <CF967851-5E6C-448A-87C6-CC3F63A5D9AD@uiuc.edu>

You are limited to the databases hosted on the NCBI server, so it's  
really up to them; RemoteBlast is an interface to NCBI's WebBlast  
using URLAPI.

A list of current databases can be found here:

http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/remote_blastdblist.html

chris

On Dec 2, 2007, at 12:29 PM, Alan Bridge wrote:

> Hello,
>
> I was just wondering if, when performing a RemoteBlast, it would be
> possible to specify the entire UniProt database (i.e. Swiss-Prot +
> TrEMBL), or even just TrEMBL.
>
> It seems that currently, you can only specify Swiss-Prot (the  
> annotated
> portion of UniProt, which is much smaller than its automatically
> annotated counterpart, TrEMBL). Any hints on how to expand the search
> space to include TrEMBL would be really appreciated.
>
> Regards, Alan Bridge
>
>            my $prog = 'blastp';
>            my $db   = 'swissprot'; # use TrEMBL ?
>            my $e_val= '1e-10';
>
>            my @params = ( '-prog' => $prog, '-data' => $db, '-expect'
> => $e_val, '-readmethod' => 'SearchIO' );
>
> -- 
> Alan Bridge PhD
> Swiss-Prot annotator
> Swiss Institute of Bioinformatics (SIB)
> 1, rue Michel Servet
> CH-1211 Geneva 4
> Switzerland
>
> Tel: (+41 22) 379 58 90
> Fax: (+41 22) 379 58 58
>
> http://www.expasy.org/


From bioperl at boekhoff.info  Mon Dec  3 14:14:24 2007
From: bioperl at boekhoff.info (Sven Boekhoff)
Date: Mon, 03 Dec 2007 20:14:24 +0100
Subject: [Bioperl-l] [StandAloneBLAST] Use more than one CPU + avoid BLAST
	reload
Message-ID: <47545590.1000703@boekhoff.info>

HI!
I just started working with Perl and BioPerl. I'm quite impressed by what 
can be easily done with this module. Today I found that my second CPU 
is not used, while the first one runs at 100%. I tried to include the 
"-a" parameter, but I was not successful:

my @params = (
	-database => 'my_db',
	-a => '2',
	-outfile => 'blast1.out'
);

How do I have to use it?

Second question: In my Perl script I start BLAST searches in a loop. 
Every time BLAST has finished its search, the memory is cleared and BLAST 
is started again. I think most of the time is spent reloading the 
database. Is it somehow possible to keep the database loaded (e.g. by 
starting a second search) or is BLAST reloaded anyway?

Thanks for your help!

Sven


www.boekhoff.info


From bix at sendu.me.uk  Mon Dec  3 19:05:23 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 04 Dec 2007 00:05:23 +0000
Subject: [Bioperl-l] [StandAloneBLAST] Use more than one CPU + avoid
 BLAST reload
In-Reply-To: <47545590.1000703@boekhoff.info>
References: <47545590.1000703@boekhoff.info>
Message-ID: <475499C3.20801@sendu.me.uk>

Sven Boekhoff wrote:
> HI!
> I just started working with Perl and BioPerl. I'm quite impressed by what 
> can be easily done with this module. Today I found that my second CPU 
> is not used, while the first one runs at 100%. I tried to include the 
> "-a" parameter, but I was not successful:
> 
> my @params = (
> 	-database => 'my_db',
> 	-a => '2',
> 	-outfile => 'blast1.out'
> );
> 
> How do I have to use it?

This should work in the CVS version of StandAloneBlast. In other 
versions, perhaps try using $object->a(2);


> Second question: In my Perl script I start BLAST searches in a loop. 
> Every time BLAST has finished its search, the memory is cleared and BLAST 
> is started again. I think most of the time is spent reloading the 
> database. Is it somehow possible to keep the database loaded (e.g. by 
> starting a second search) or is BLAST reloaded anyway?

I hope someone will correct me if I'm wrong, but I think you'd have 
to do that with a 2-way pipe. StandAloneBlast only uses output to a file 
and input from that file, finishing with the executable in between. I've 
thought about improving it with a 2-way pipe, but never got around to 
it, being apprehensive about stability on all platforms.

The more obvious solution, which may be possible depending on exactly 
what you're doing, is to avoid the loop and just supply Blast all your 
input in one go.


From Russell.Smithies at agresearch.co.nz  Mon Dec  3 19:49:21 2007
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Tue, 4 Dec 2007 13:49:21 +1300
Subject: [Bioperl-l] Bio::Assembly::IO problems reading .ace files
In-Reply-To: <475499C3.20801@sendu.me.uk>
References: <47545590.1000703@boekhoff.info> <475499C3.20801@sendu.me.uk>
Message-ID: <D5DBA313349A4B458528BE63B387F36C062D5E2A@imail.agresearch.co.nz>

Hi all,

I'm trying to read .ace files but keep getting an error that I don't
know the cause of.
Really basic example code:

	#!/usr/local/bin/perl -w

	use lib "/data/home/smithiesr/bioperl-live";
	use Bio::Assembly::IO;
	use Data::Dumper;

	$ace = "CLP0001001240-cE15_20030319.ace";

	$io = new Bio::Assembly::IO(-file=>$ace,-format=>"ace");
	$assembly = $io->next_assembly;

	foreach $contig ($assembly->all_contigs) {
      		print Dumper $contig;
	}

Gives this error:
	[smithiesr at impala ace_phrap]$ perl bp_read_ace.pl
	Can't call method "get_consensus_sequence" on an undefined value
at /data/home/smithiesr/bioperl-live/Bio/Assembly/IO/ace.pm line 170,
<GEN0> line 42.

Which relates to this bit in ace.pm:
	# Loading contig qualities... (Base Quality field)
	/^BQ/ && do {
	    my $consensus = $contigOBJ->get_consensus_sequence()->seq();

Is this caused by a dud ace file or a problem with Bio::Assembly::IO::ace
or is the Contig object not getting created?
Any ideas?

Thanx,

Russell Smithies

Bioinformatics Software Developer
T +64 3 489 9085
E  russell.smithies at agresearch.co.nz

Invermay  Research Centre
Puddle Alley, 
Mosgiel, 
New Zealand
T  +64 3 489 3809   
F  +64 3 489 9174  
www.agresearch.co.nz

=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================


From cjfields at uiuc.edu  Mon Dec  3 21:15:58 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 3 Dec 2007 20:15:58 -0600
Subject: [Bioperl-l] Bio::Assembly::IO problems reading .ace files
In-Reply-To: <D5DBA313349A4B458528BE63B387F36C062D5E2A@imail.agresearch.co.nz>
References: <47545590.1000703@boekhoff.info> <475499C3.20801@sendu.me.uk>
	<D5DBA313349A4B458528BE63B387F36C062D5E2A@imail.agresearch.co.nz>
Message-ID: <692A2BDA-048B-49C6-A101-C13A1DAB9B69@uiuc.edu>

This seems similar to the 'too many open filehandles issue' documented  
here:

http://bugzilla.open-bio.org/show_bug.cgi?id=2320

It unfortunately is due to having an open DB_File for every contig,  
and is a problem with the Bio::Assembly implementation that isn't  
easily fixed.  Changing the open filehandle limit using ulimit is the  
only known fix:

ulimit -n 10000
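
To check which limit a Perl process actually inherits (before and after changing it in the shell), something like the following may help. It is only a sketch: it uses the core POSIX module and assumes a system where sysconf(_SC_OPEN_MAX) is meaningful; the helper name open_file_limit is made up here.

```perl
use strict;
use warnings;
use POSIX qw(sysconf _SC_OPEN_MAX);

# Return the per-process open-file limit as seen by this Perl process,
# or undef if the system cannot report it.
sub open_file_limit {
    return sysconf(_SC_OPEN_MAX);
}

my $limit = open_file_limit();
print defined $limit
    ? "This process may hold up to $limit open filehandles\n"
    : "Open-file limit could not be determined\n";
```

If the reported number is below the number of contigs in the assembly, the DB_File-per-contig behaviour described above would be expected to run out of handles.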

chris

On Dec 3, 2007, at 6:49 PM, Smithies, Russell wrote:

> Hi all,
>
> I'm trying to read .ace files but keep getting an error that I don't
> know the cause of.
> Really basic example code:
>
> 	#!/usr/local/bin/perl -w
>
> 	use lib "/data/home/smithiesr/bioperl-live";
> 	use Bio::Assembly::IO;
> 	use Data::Dumper;
>
> 	$ace = "CLP0001001240-cE15_20030319.ace";
>
> 	$io = new Bio::Assembly::IO(-file=>$ace,-format=>"ace");
> 	$assembly = $io->next_assembly;
>
> 	foreach $contig ($assembly->all_contigs) {
>      		print Dumper $contig;
> 	}
>
> Gives this error;
> 	[smithiesr at impala ace_phrap]$ perl bp_read_ace.pl
> 	Can't call method "get_consensus_sequence" on an undefined value
> at /data/home/smithiesr/bioperl-live/Bio/Assembly/IO/ace.pm line 170,
> <GEN0> line 42.
>
> Which relates to this bit in ace.pm:
> 	# Loading contig qualities... (Base Quality field)
> 	/^BQ/ && do {
> 	    my $consensus = $contigOBJ->get_consensus_sequence()->seq();
>
> Is this caused by a dud ace file or a problem with  
> Bio::Assembly::IO:ace
> or is the Contig object not getting created?
> Any ideas?
>
> Thanx,
>
> Russell Smithies
>
> Bioinformatics Software Developer
> T +64 3 489 9085
> E  russell.smithies at agresearch.co.nz
>
> Invermay  Research Centre
> Puddle Alley,
> Mosgiel,
> New Zealand
> T  +64 3 489 3809
> F  +64 3 489 9174
> www.agresearch.co.nz
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From florent.angly at gmail.com  Mon Dec  3 21:25:24 2007
From: florent.angly at gmail.com (Florent Angly)
Date: Mon, 03 Dec 2007 18:25:24 -0800
Subject: [Bioperl-l] Bio::Assembly::IO problems reading .ace files
In-Reply-To: <692A2BDA-048B-49C6-A101-C13A1DAB9B69@uiuc.edu>
References: <47545590.1000703@boekhoff.info>
	<475499C3.20801@sendu.me.uk>	<D5DBA313349A4B458528BE63B387F36C062D5E2A@imail.agresearch.co.nz>
	<692A2BDA-048B-49C6-A101-C13A1DAB9B69@uiuc.edu>
Message-ID: <4754BA94.7090600@gmail.com>

Would this issue cause excessive memory usage? I was getting 
high memory usage when parsing some TIGR Assembler files and was 
wondering whether the tigr parser or the parent assembly IO module 
was responsible for that.
I'd definitely be interested in a fix of the Bio::Assembly 
implementation if it's the assembly IO module's fault....
Florent

Chris Fields wrote:
> This seems similar to the 'too many open filehandles issue' documented  
> here:
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2320
>
> It unfortunately is due to having an open DB_File for every contig,  
> and is a problem with the Bio::Assembly implementation that isn't  
> easily fixed.  Changing the open filehandle limit using ulimit is the  
> only known fix:
>
> ulimit -n 10000
>
> chris
>
> On Dec 3, 2007, at 6:49 PM, Smithies, Russell wrote:
>
>   
>> Hi all,
>>
>> I'm trying to read .ace files but keep getting an error that I don't
>> know the cause of.
>> Really basic example code:
>>
>> 	#!/usr/local/bin/perl -w
>>
>> 	use lib "/data/home/smithiesr/bioperl-live";
>> 	use Bio::Assembly::IO;
>> 	use Data::Dumper;
>>
>> 	$ace = "CLP0001001240-cE15_20030319.ace";
>>
>> 	$io = new Bio::Assembly::IO(-file=>$ace,-format=>"ace");
>> 	$assembly = $io->next_assembly;
>>
>> 	foreach $contig ($assembly->all_contigs) {
>>      		print Dumper $contig;
>> 	}
>>
>> Gives this error;
>> 	[smithiesr at impala ace_phrap]$ perl bp_read_ace.pl
>> 	Can't call method "get_consensus_sequence" on an undefined value
>> at /data/home/smithiesr/bioperl-live/Bio/Assembly/IO/ace.pm line 170,
>> <GEN0> line 42.
>>
>> Which relates to this bit in ace.pm:
>> 	# Loading contig qualities... (Base Quality field)
>> 	/^BQ/ && do {
>> 	    my $consensus = $contigOBJ->get_consensus_sequence()->seq();
>>
>> Is this caused by a dud ace file or a problem with  
>> Bio::Assembly::IO:ace
>> or is the Contig object not getting created?
>> Any ideas?
>>
>> Thanx,
>>
>> Russell Smithies
>>
>> Bioinformatics Software Developer
>> T +64 3 489 9085
>> E  russell.smithies at agresearch.co.nz
>>
>> Invermay  Research Centre
>> Puddle Alley,
>> Mosgiel,
>> New Zealand
>> T  +64 3 489 3809
>> F  +64 3 489 9174
>> www.agresearch.co.nz
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign


From Russell.Smithies at agresearch.co.nz  Mon Dec  3 21:32:43 2007
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Tue, 4 Dec 2007 15:32:43 +1300
Subject: [Bioperl-l] Bio::Assembly::IO problems reading .ace files
In-Reply-To: <692A2BDA-048B-49C6-A101-C13A1DAB9B69@uiuc.edu>
References: <47545590.1000703@boekhoff.info> <475499C3.20801@sendu.me.uk>
	<D5DBA313349A4B458528BE63B387F36C062D5E2A@imail.agresearch.co.nz>
	<692A2BDA-048B-49C6-A101-C13A1DAB9B69@uiuc.edu>
Message-ID: <D5DBA313349A4B458528BE63B387F36C062D5E91@imail.agresearch.co.nz>

Thanx Chris,
I'm only writing a simple .ace viewer to display assembled contigs in a
Bio::Graphics::Panel so I'll parse the coords from the .ace files
"manually".
Unless anyone else has a better idea ?
(and some example code ;-)

Russell


> -----Original Message-----
> From: Chris Fields [mailto:cjfields at uiuc.edu]
> Sent: Tuesday, 4 December 2007 3:16 p.m.
> To: Smithies, Russell
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bio::Assembly::IO problems reading .ace files
> 
> This seems similar to the 'too many open filehandles issue' documented
> here:
> 
> http://bugzilla.open-bio.org/show_bug.cgi?id=2320
> 
> It unfortunately is due to having an open DB_File for every contig,
> and is a problem with the Bio::Assembly implementation that isn't
> easily fixed.  Changing the open filehandle limit using ulimit is the
> only known fix:
> 
> ulimit -n 10000
> 
> chris
> 
> On Dec 3, 2007, at 6:49 PM, Smithies, Russell wrote:
> 
> > Hi all,
> >
> > I'm trying to read .ace files but keep getting an error that I don't
> > know the cause of.
> > Really basic example code:
> >
> > 	#!/usr/local/bin/perl -w
> >
> > 	use lib "/data/home/smithiesr/bioperl-live";
> > 	use Bio::Assembly::IO;
> > 	use Data::Dumper;
> >
> > 	$ace = "CLP0001001240-cE15_20030319.ace";
> >
> > 	$io = new Bio::Assembly::IO(-file=>$ace,-format=>"ace");
> > 	$assembly = $io->next_assembly;
> >
> > 	foreach $contig ($assembly->all_contigs) {
> >      		print Dumper $contig;
> > 	}
> >
> > Gives this error;
> > 	[smithiesr at impala ace_phrap]$ perl bp_read_ace.pl
> > 	Can't call method "get_consensus_sequence" on an undefined value
> > at /data/home/smithiesr/bioperl-live/Bio/Assembly/IO/ace.pm line 170,
> > <GEN0> line 42.
> >
> > Which relates to this bit in ace.pm:
> > 	# Loading contig qualities... (Base Quality field)
> > 	/^BQ/ && do {
> > 	    my $consensus = $contigOBJ->get_consensus_sequence()->seq();
> >
> > Is this caused by a dud ace file or a problem with
> > Bio::Assembly::IO:ace
> > or is the Contig object not getting created?
> > Any ideas?
> >
> > Thanx,
> >
> > Russell Smithies
> >
> > Bioinformatics Software Developer
> > T +64 3 489 9085
> > E  russell.smithies at agresearch.co.nz
> >
> > Invermay  Research Centre
> > Puddle Alley,
> > Mosgiel,
> > New Zealand
> > T  +64 3 489 3809
> > F  +64 3 489 9174
> > www.agresearch.co.nz
> 
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
> 
> 



From cjfields at uiuc.edu  Tue Dec  4 00:10:57 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 3 Dec 2007 23:10:57 -0600
Subject: [Bioperl-l] Bio::Assembly::IO problems reading .ace files
In-Reply-To: <4754BA94.7090600@gmail.com>
References: <47545590.1000703@boekhoff.info>
	<475499C3.20801@sendu.me.uk>	<D5DBA313349A4B458528BE63B387F36C062D5E2A@imail.agresearch.co.nz>
	<692A2BDA-048B-49C6-A101-C13A1DAB9B69@uiuc.edu>
	<4754BA94.7090600@gmail.com>
Message-ID: <4F867A88-C0DC-4DF7-9F47-C38712920183@uiuc.edu>

Yes, it's possible this would cause memory issues as each  
Bio::Assembly::Contig instance would have a  
Bio::SeqFeature::Collection attached (each Collection having a tied DB  
hash, which would be an open filehandle).  So if you had over 1000  
contigs open at any one time (in a parsed scaffold, for instance) you  
would have over 1000 open filehandles.  Not very efficient.

My thought was to have each Bio::Assembly::Scaffold instance carry a  
single Bio::SeqFeature::CollectionI (it could be a  
Bio::SeqFeature::Collection, Bio::DB::SeqFeature::Store, or any other  
CollectionI, whatever's easiest).  Each Contig would be passed (and  
would store) a reference to the Scaffold SF::Collection and pull features  
from there; I just haven't had time to mess with it.  I don't think  
anyone's tackling it, so feel free to code away!

chris

On Dec 3, 2007, at 8:25 PM, Florent Angly wrote:

> Would this issue cause an excessive memory usage? Because I was  
> getting a high memory usage when parsing some TIGR Assembler files  
> and was wondering if the tigr parser was responsible for that or the  
> parent assembly IO module.
> I'd definitely be interested in a fix of the Bio::Assembly  
> implementation if it's the assembly IO module's fault....
> Florent
>
> Chris Fields wrote:
>> This seems similar to the 'too many open filehandles issue'  
>> documented  here:
>>
>> http://bugzilla.open-bio.org/show_bug.cgi?id=2320
>>
>> It unfortunately is due to having an open DB_File for every  
>> contig,  and is a problem with the Bio::Assembly implementation  
>> that isn't  easily fixed.  Changing the open filehandle limit using  
>> ulimit is the  only known fix:
>>
>> ulimit -n 10000
>>
>> chris
>>
>> On Dec 3, 2007, at 6:49 PM, Smithies, Russell wrote:
>>
>>
>>> Hi all,
>>>
>>> I'm trying to read .ace files but keep getting an error that I don't
>>> know the cause of.
>>> Really basic example code:
>>>
>>> 	#!/usr/local/bin/perl -w
>>>
>>> 	use lib "/data/home/smithiesr/bioperl-live";
>>> 	use Bio::Assembly::IO;
>>> 	use Data::Dumper;
>>>
>>> 	$ace = "CLP0001001240-cE15_20030319.ace";
>>>
>>> 	$io = new Bio::Assembly::IO(-file=>$ace,-format=>"ace");
>>> 	$assembly = $io->next_assembly;
>>>
>>> 	foreach $contig ($assembly->all_contigs) {
>>>   		print Dumper $contig;
>>> 	}
>>>
>>> Gives this error;
>>> 	[smithiesr at impala ace_phrap]$ perl bp_read_ace.pl
>>> 	Can't call method "get_consensus_sequence" on an undefined value
>>> at /data/home/smithiesr/bioperl-live/Bio/Assembly/IO/ace.pm line  
>>> 170,
>>> <GEN0> line 42.
>>>
>>> Which relates to this bit in ace.pm:
>>> 	# Loading contig qualities... (Base Quality field)
>>> 	/^BQ/ && do {
>>> 	    my $consensus = $contigOBJ->get_consensus_sequence()->seq();
>>>
>>> Is this caused by a dud ace file or a problem with   
>>> Bio::Assembly::IO:ace
>>> or is the Contig object not getting created?
>>> Any ideas?
>>>
>>> Thanx,
>>>
>>> Russell Smithies
>>>
>>> Bioinformatics Software Developer
>>> T +64 3 489 9085
>>> E  russell.smithies at agresearch.co.nz
>>>
>>> Invermay  Research Centre
>>> Puddle Alley,
>>> Mosgiel,
>>> New Zealand
>>> T  +64 3 489 3809
>>> F  +64 3 489 9174
>>> www.agresearch.co.nz
>>
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr. Robert Switzer
>> Dept of Biochemistry
>> University of Illinois Urbana-Champaign
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Tue Dec  4 00:20:07 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 3 Dec 2007 23:20:07 -0600
Subject: [Bioperl-l] Bio::Assembly::IO problems reading .ace files
In-Reply-To: <D5DBA313349A4B458528BE63B387F36C062D5E91@imail.agresearch.co.nz>
References: <47545590.1000703@boekhoff.info> <475499C3.20801@sendu.me.uk>
	<D5DBA313349A4B458528BE63B387F36C062D5E2A@imail.agresearch.co.nz>
	<692A2BDA-048B-49C6-A101-C13A1DAB9B69@uiuc.edu>
	<D5DBA313349A4B458528BE63B387F36C062D5E91@imail.agresearch.co.nz>
Message-ID: <C48EC1AC-FEA6-4F60-9791-D4DE449768C2@uiuc.edu>

The ulimit fix usually works but if this is for Gbrowse it probably  
isn't prudent.  It would be nice to get Bio::Assembly working as a  
Bio::AlignI; it would be easier to manipulate for display.  Here's a  
script I wrote up as an example:

http://www.bioperl.org/wiki/HOWTO_Discussion:Graphics

chris
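
For the "manual" route mentioned in the quoted message below, a rough sketch of pulling contig lengths and read start offsets straight out of an .ace file without Bio::Assembly. It assumes the usual phrap/consed layout ("CO name bases reads segments U|C" and "AF read U|C padded_start"); the sample data and the helper name parse_ace_coords are invented, and read end coordinates would additionally need the padded lengths from the RD lines.

```perl
use strict;
use warnings;

# Parse contig (CO) and read placement (AF) lines from ACE-formatted text.
# Returns a hash ref: contig name => { length, n_reads, reads => { ... } }.
sub parse_ace_coords {
    my ($fh) = @_;
    my ( %contigs, $current );
    while ( my $line = <$fh> ) {
        if ( $line =~ /^CO\s+(\S+)\s+(\d+)\s+(\d+)/ ) {
            # CO <name> <padded bases> <reads> <segments> <U|C>
            $current = $1;
            $contigs{$current} = { length => $2, n_reads => $3, reads => {} };
        }
        elsif ( defined $current && $line =~ /^AF\s+(\S+)\s+([UC])\s+(-?\d+)/ ) {
            # AF <read> <U|C> <padded start in consensus coordinates>
            $contigs{$current}{reads}{$1} =
              { strand => ( $2 eq 'U' ? 1 : -1 ), start => $3 };
        }
    }
    return \%contigs;
}

# Invented miniature example, not a complete ACE file.
my $sample = <<'ACE';
AS 1 2

CO Contig1 20 2 1 U
ACGTACGTACGTACGTACGT

AF read1.b1 U 1
AF read2.g1 C -3
ACE

open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my $contigs = parse_ace_coords($fh);
close $fh;

for my $name ( sort keys %$contigs ) {
    my $c = $contigs->{$name};
    print "$name length=$c->{length}\n";
    for my $read ( sort keys %{ $c->{reads} } ) {
        my $r = $c->{reads}{$read};
        print "  $read start=$r->{start} strand=$r->{strand}\n";
    }
}
```

Since nothing is tied to a DB_File here, this approach should sidestep the open filehandle limit entirely, at the cost of ignoring everything else in the file.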

On Dec 3, 2007, at 8:32 PM, Smithies, Russell wrote:

> Thanx Chris,
> I'm only writing a simple .ace viewer to display assembled contigs  
> in a
> Bio::Graphics::Panel so I'll parse the coords from the .ace files
> "manually".
> Unless anyone else has a better idea ?
> (and some example code ;-)
>
> Russell


From avilella at gmail.com  Tue Dec  4 06:51:05 2007
From: avilella at gmail.com (Albert Vilella)
Date: Tue, 4 Dec 2007 11:51:05 +0000
Subject: [Bioperl-l] New Bio::Tools::Run::Phylo::SLR - Wrapper around the
	SLR program
Message-ID: <358f4d650712040351g4bef4417l4197d06454049140@mail.gmail.com>

Hi all,

There is a new wrapper in bioperl-run for SLR:

http://www.bioperl.org/wiki/SLR

Right now, output parsing is very simple, and I have only tested it on
my linux machine.
Can someone with a Mac give it a try?

Update your bioperl-run to CVS head, then:

# try the installer, SLR is option 6
perl scripts/bioperl_application_installer.PLS
# then try to run the tests (should take about a minute)
perl t/SLR.t

Any comments on the code would be appreciated,

Thanks in advance,

Cheers,

    Albert.


From captainrave at hotmail.com  Tue Dec  4 06:04:57 2007
From: captainrave at hotmail.com (Captainrave)
Date: Tue, 4 Dec 2007 03:04:57 -0800 (PST)
Subject: [Bioperl-l]  extracting CDS location from Genbank
Message-ID: <14148723.post@talk.nabble.com>


Help.  I'm very new to perl and bioperl.  Basically I need to extract the
location of each CDS in a genbank entry e.g.103...120 and export them to an
output file as a list.  How would I do this?

Your help would be much appreciated!
-- 
View this message in context: http://www.nabble.com/extracting-CDS-location-from-Genbank-tf4942483.html#a14148723
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From michael.watson at bbsrc.ac.uk  Tue Dec  4 09:48:27 2007
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Tue, 4 Dec 2007 14:48:27 -0000
Subject: [Bioperl-l] extracting CDS location from Genbank
In-Reply-To: <14148723.post@talk.nabble.com>
References: <14148723.post@talk.nabble.com>
Message-ID: <8975119BCD0AC5419D61A9CF1A923E9505A4F76E@iahce2ksrv1.iah.bbsrc.ac.uk>

From the SeqIO HOWTO:

#!/bin/perl

use strict;
use Bio::SeqIO;

my $file = shift; # get the file name, somehow
my $seqio_object = Bio::SeqIO->new(-file => $file);
my $seq_object = $seqio_object->next_seq;

From the Feature HOWTO:

for my $feat_object ($seq_object->get_SeqFeatures) {          
   print "primary tag: ", $feat_object->primary_tag, "\n";          
   for my $tag ($feat_object->get_all_tags) {             
      print "  tag: ", $tag, "\n";             
      for my $value ($feat_object->get_tag_values($tag)) {

         print "    value: ", $value, "\n";             
      }          
   }       
}

Surely you could have found that yourself? ;0
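
For the original question (only the CDS locations, written out as a list), a minimal sketch along the same lines might look like the following. The sub name cds_locations and the tab-separated output are illustrative assumptions, not something taken from either HOWTO; to_FTstring is the Bio::LocationI method that renders a location in feature-table notation.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

# Hypothetical helper: return one "id<TAB>location" string per CDS feature.
sub cds_locations {
    my ($file) = @_;
    my $seqio = Bio::SeqIO->new(-file => $file, -format => 'genbank');
    my @rows;
    while (my $seq = $seqio->next_seq) {
        for my $feat ($seq->get_SeqFeatures) {
            next unless $feat->primary_tag eq 'CDS';
            # to_FTstring renders the location GenBank-style, including join(...)
            push @rows, join("\t", $seq->display_id, $feat->location->to_FTstring);
        }
    }
    return @rows;
}

# Print the CDS locations for every file named on the command line.
print "$_\n" for map { cds_locations($_) } @ARGV;
```

Saved as, say, cds_locations.pl (a made-up name), it could be run as "perl cds_locations.pl myfile.gbk > cds_list.txt" to get the list in an output file.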

_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l


From captainrave at hotmail.com  Tue Dec  4 10:07:19 2007
From: captainrave at hotmail.com (Captainrave)
Date: Tue, 4 Dec 2007 07:07:19 -0800 (PST)
Subject: [Bioperl-l] extracting CDS location from Genbank
In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E9505A4F76E@iahce2ksrv1.iah.bbsrc.ac.uk>
References: <14148723.post@talk.nabble.com>
	<8975119BCD0AC5419D61A9CF1A923E9505A4F76E@iahce2ksrv1.iah.bbsrc.ac.uk>
Message-ID: <14152264.post@talk.nabble.com>


Yes but actually implementing it is another story.

I get an error:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: file argument provided, but with an undefined value
STACK: Error::throw
STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359
STACK: Bio::SeqIO::new C:/Perl/site/lib/Bio/SeqIO.pm:359
STACK: test3.pl:7
-----------------------------------------------------------

Basically because I don't understand the code well enough.  For example, how
do I tell it which input file to read? I know this might sound stupid, but I
don't understand the Biowiki very well!

-- 
View this message in context: http://www.nabble.com/extracting-CDS-location-from-Genbank-tf4942483.html#a14152264
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From michael.watson at bbsrc.ac.uk  Tue Dec  4 10:21:34 2007
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Tue, 4 Dec 2007 15:21:34 -0000
Subject: [Bioperl-l] extracting CDS location from Genbank
In-Reply-To: <14152264.post@talk.nabble.com>
References: <14148723.post@talk.nabble.com><8975119BCD0AC5419D61A9CF1A923E9505A4F76E@iahce2ksrv1.iah.bbsrc.ac.uk>
	<14152264.post@talk.nabble.com>
Message-ID: <8975119BCD0AC5419D61A9CF1A923E9505A4F771@iahce2ksrv1.iah.bbsrc.ac.uk>

Post the script that produces that error, and your file's location 


From bix at sendu.me.uk  Tue Dec  4 10:39:31 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 04 Dec 2007 15:39:31 +0000
Subject: [Bioperl-l] extracting CDS location from Genbank
In-Reply-To: <14152264.post@talk.nabble.com>
References: <14148723.post@talk.nabble.com>	<8975119BCD0AC5419D61A9CF1A923E9505A4F76E@iahce2ksrv1.iah.bbsrc.ac.uk>
	<14152264.post@talk.nabble.com>
Message-ID: <475574B3.8050700@sendu.me.uk>

Captainrave wrote:
> Yes but actually implementing it is another story.
> 
> I get an error:
> 
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: file argument provided, but with an undefined value
> STACK: Error::throw
> STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359
> STACK: Bio::SeqIO::new C:/Perl/site/lib/Bio/SeqIO.pm:359
> STACK: test3.pl:7
> -----------------------------------------------------------

The best way to get help is to give us your script and the error 
message, and the command you used to run your script. The less you know, 
the more you should give us (ie. don't edit anything out).


From captainrave at hotmail.com  Tue Dec  4 10:41:37 2007
From: captainrave at hotmail.com (Captainrave)
Date: Tue, 4 Dec 2007 07:41:37 -0800 (PST)
Subject: [Bioperl-l] extracting CDS location from Genbank
In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E9505A4F771@iahce2ksrv1.iah.bbsrc.ac.uk>
References: <14148723.post@talk.nabble.com>
	<8975119BCD0AC5419D61A9CF1A923E9505A4F76E@iahce2ksrv1.iah.bbsrc.ac.uk>
	<14152264.post@talk.nabble.com>
	<8975119BCD0AC5419D61A9CF1A923E9505A4F771@iahce2ksrv1.iah.bbsrc.ac.uk>
Message-ID: <14152907.post@talk.nabble.com>


#!/bin/perl

use strict;
use Bio::SeqIO;
my $file = shift; # get the file name, somehow
my $seqio_object = Bio::SeqIO->new(-file => $file);
my $seq_object = $seqio_object->next_seq;

for my $feat_object ($seq_object->get_SeqFeatures) {          
   print "primary tag: ", $feat_object->primary_tag, "\n";          
   for my $tag ($feat_object->get_all_tags) {            
      print "  tag: ", $tag, "\n";            
      for my $value ($feat_object->get_tag_values($tag)) {

         print "    value: ", $value, "\n";            
      }          
   }      
}

exit;

The file is in the same folder.  But how do I tell it to use this file?


-- 
View this message in context: http://www.nabble.com/extracting-CDS-location-from-Genbank-tf4942483.html#a14152907
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From michael.watson at bbsrc.ac.uk  Tue Dec  4 10:53:22 2007
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Tue, 4 Dec 2007 15:53:22 -0000
Subject: [Bioperl-l] extracting CDS location from Genbank
In-Reply-To: <14152907.post@talk.nabble.com>
References: <14148723.post@talk.nabble.com><8975119BCD0AC5419D61A9CF1A923E9505A4F76E@iahce2ksrv1.iah.bbsrc.ac.uk><14152264.post@talk.nabble.com><8975119BCD0AC5419D61A9CF1A923E9505A4F771@iahce2ksrv1.iah.bbsrc.ac.uk>
	<14152907.post@talk.nabble.com>
Message-ID: <8975119BCD0AC5419D61A9CF1A923E9505A4F77A@iahce2ksrv1.iah.bbsrc.ac.uk>

Same script as below, but try:

my $file = 'C:\path\to\my\filename.gbk'; 

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Captainrave
Sent: 04 December 2007 15:42
To: Bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] extracting CDS location from Genbank


#!/bin/perl

use strict;
use Bio::SeqIO;
my $file = shift; # get the file name, somehow
my $seqio_object = Bio::SeqIO->new(-file => $file);
my $seq_object = $seqio_object->next_seq;

for my $feat_object ($seq_object->get_SeqFeatures) {          
   print "primary tag: ", $feat_object->primary_tag, "\n";          
   for my $tag ($feat_object->get_all_tags) {            
      print "  tag: ", $tag, "\n";            
      for my $value ($feat_object->get_tag_values($tag)) {

         print "    value: ", $value, "\n";            
      }          
   }      
}

exit;

The file is on the same folder.  But how do I tell it to use this file?


From cjfields at uiuc.edu  Tue Dec  4 11:20:34 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 4 Dec 2007 10:20:34 -0600
Subject: [Bioperl-l] extracting CDS location from Genbank
In-Reply-To: <14152907.post@talk.nabble.com>
References: <14148723.post@talk.nabble.com>
	<8975119BCD0AC5419D61A9CF1A923E9505A4F76E@iahce2ksrv1.iah.bbsrc.ac.uk>
	<14152264.post@talk.nabble.com>
	<8975119BCD0AC5419D61A9CF1A923E9505A4F771@iahce2ksrv1.iah.bbsrc.ac.uk>
	<14152907.post@talk.nabble.com>
Message-ID: <C2732712-D32B-449A-8BCA-DCB8BBDE9758@uiuc.edu>

The 'my $file = shift;' is a Perl idiom.  Called without arguments outside
of a subroutine, the built-in 'shift' takes the first element of @ARGV
(the command-line arguments); the file would then be passed as the first
arg when running the script:

get_features.pl myfile.gb

This should work for any OS.  Personally, I use something like the  
following to indicate how the script is used in case a file is never  
entered:

my $USAGE = <<END_USE;
USAGE: get_features.pl <file>
Perl script to grab features from a GenBank file and print to a table
END_USE

my $file = shift || die $USAGE;

chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From bix at sendu.me.uk  Tue Dec  4 11:22:12 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 04 Dec 2007 16:22:12 +0000
Subject: [Bioperl-l] extracting CDS location from Genbank
In-Reply-To: <14152907.post@talk.nabble.com>
References: <14148723.post@talk.nabble.com>	<8975119BCD0AC5419D61A9CF1A923E9505A4F76E@iahce2ksrv1.iah.bbsrc.ac.uk>	<14152264.post@talk.nabble.com>	<8975119BCD0AC5419D61A9CF1A923E9505A4F771@iahce2ksrv1.iah.bbsrc.ac.uk>
	<14152907.post@talk.nabble.com>
Message-ID: <47557EB4.10003@sendu.me.uk>

Captainrave wrote:
> #!/bin/perl
> my $file = shift; # get the file name, somehow
>
> The file is on the same folder.  But how do I tell it to use this file?

http://stein.cshl.org/genome_informatics/perl_intro/command_line.html

Basically, when you run your script add the name of the file to your 
command line.

me% perl myscript.pl myfile

By saying 'my $file = shift' inside myscript.pl, the variable $file now 
contains the filename 'myfile'.

You could also have hardcoded the filename:
my $file = 'myfile';
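
A sketch of the command-line variant with some basic checks added, so that a missing or mistyped filename gives a readable message instead of a BioPerl exception. The sub name input_file and the message wording are illustrative assumptions.

```perl
use strict;
use warnings;

# Hypothetical helper: pick the input file from a list of arguments.
sub input_file {
    my @args = @_;
    my $file = shift @args;
    die "USAGE: $0 <file>\n" unless defined $file && length $file;
    die "Cannot find file '$file'\n" unless -e $file;
    return $file;
}

# Report either the file that will be read or the reason it cannot be.
my $file = eval { input_file(@ARGV) };
print defined $file ? "Reading from $file\n" : "Error: $@";
```

In a real script the eval could be dropped so that the script simply stops with the usage message; it is only there so the sketch reports the problem and carries on.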


Anyway, you're going to run into lots of these issues, and they're 
beyond the scope of this mailing list. For basic perl problems seek help 
via www.perl.org. When you have a BioPerl-specific question, don't 
hesitate to post here.


From jason at bioperl.org  Tue Dec  4 12:16:30 2007
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 4 Dec 2007 09:16:30 -0800
Subject: [Bioperl-l] New Bio::Tools::Run::Phylo::SLR - Wrapper around
	the SLR program
In-Reply-To: <358f4d650712040351g4bef4417l4197d06454049140@mail.gmail.com>
References: <358f4d650712040351g4bef4417l4197d06454049140@mail.gmail.com>
Message-ID: <18ABB052-2539-4932-A7AA-BB6D194BF8C3@bioperl.org>

Excellent - thanks for this!  I'm giving it a whirl on Linux and the
SLR.t test is currently taking more than 30 minutes to run -- is it  
possible to cook up an example that is going to finish in a more  
reasonable amount of time?

Also - I would prefer if the default exe could be 'Slr' rather than  
Slr_Linux_static - it seems like it is possible for users to install  
it this way.  Similarly whether or not the Slr_osx or Slr is the  
default name, is it too big of a deal to expect the user to rename it?

I'll give it a whirl on OSX later, but might be easier if the test  
runs shorter.

Thanks!
-jason


From florent.angly at gmail.com  Tue Dec  4 13:17:08 2007
From: florent.angly at gmail.com (Florent Angly)
Date: Tue, 04 Dec 2007 10:17:08 -0800
Subject: [Bioperl-l] New Bio::Tools::Run::TigrAssembler
In-Reply-To: <18ABB052-2539-4932-A7AA-BB6D194BF8C3@bioperl.org>
References: <358f4d650712040351g4bef4417l4197d06454049140@mail.gmail.com>
	<18ABB052-2539-4932-A7AA-BB6D194BF8C3@bioperl.org>
Message-ID: <475599A4.1040500@gmail.com>

Hi all,
I pushed a new module into bioperl-run CVS a few days ago. It's called 
Bio::Tools::Run::TigrAssembler. It is a wrapper for TIGR Assembler, an
open-source program that assembles DNA sequences.
The input is a list of sequence objects and the output is assembly objects...
easy enough...
Let me know if you experience problems with it.
Florent


From jason at bioperl.org  Tue Dec  4 13:51:34 2007
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 4 Dec 2007 10:51:34 -0800
Subject: [Bioperl-l] [StandAloneBLAST] Use more than one CPU + avoid
	BLAST reload
In-Reply-To: <475499C3.20801@sendu.me.uk>
References: <47545590.1000703@boekhoff.info> <475499C3.20801@sendu.me.uk>
Message-ID: <8273f6c20712041051k2bfe36efgb2ae40550df9341@mail.gmail.com>

You can pass in an array reference of sequences instead of a single sequence
object and the module will build a multi-FASTA database.  You can also pass
in a filename instead of a Sequence object and the file can be an already
built multi-FASTA database.  This is described in the documentation:

http://search.cpan.org/~birney/bioperl-1.4/Bio/Tools/Run/StandAloneBlast.pm#blastall

You can also just run BLAST without the StandAloneBlast part, as I do, and just
build your multi-FASTA file ahead of time with SeqIO and do
# wublast
my $cmd = "blastp -i MULTIFASTA -d DATABASE --cpus 2 |";
# or NCBI blast
# my $cmd = "blastall -a 2 -i MULTIFASTA -p blastp -d DATABASE |";
my $fh;

open($fh, $cmd) or die "Cannot run BLAST: $!";
my $searchio = Bio::SearchIO->new(-format => 'blast', -fh => $fh);

The advantage of StandAloneBlast in theory is that it takes care of the temporary
file creation (sequence files) and cleanup.  Personally I find I want easier
access to my programs that are simple cmdline like this.  You can do similar
things with SSEARCH or FASTA searching too.
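
Spelled out a little further, the pipe approach could look like the sketch below (NCBI blastall flavour). The database name is a placeholder, blast_command and print_hits are hypothetical helpers, and the list form of open is an assumption chosen so that the command does not go through the shell.

```perl
use strict;
use warnings;
use Bio::SearchIO;

# Hypothetical helper: build the blastall argument list (-a sets the CPU count).
sub blast_command {
    my (%opt) = @_;
    return ('blastall', '-p', $opt{program}, '-d', $opt{database},
            '-i', $opt{query}, '-a', $opt{cpus} || 1);
}

# Run BLAST once over a multi-FASTA query file and print query, hit and E-value.
sub print_hits {
    my @cmd = blast_command(@_);
    open(my $fh, '-|', @cmd) or die "Cannot run $cmd[0]: $!";
    my $searchio = Bio::SearchIO->new(-format => 'blast', -fh => $fh);
    while (my $result = $searchio->next_result) {
        while (my $hit = $result->next_hit) {
            print join("\t", $result->query_name, $hit->name, $hit->significance), "\n";
        }
    }
    close($fh) or warn "$cmd[0] exited with status $?\n";
}

# Placeholder database name; only runs when a query file is given as an argument.
print_hits(program => 'blastp', database => 'DATABASE',
           query => $ARGV[0], cpus => 2) if @ARGV;
```

Because all queries go to BLAST in one run, the database is only loaded once. Note that list-form pipe opens need a fairly recent perl on Windows; the string form shown above works there too.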

-jason

On Dec 3, 2007 4:05 PM, Sendu Bala <bix at sendu.me.uk> wrote:

> Sven Boekhoff wrote:
> > HI!
> > I just started working with Perl and BioPerl. I'm quite impressed what
> > can be easily done with this module. Today I found that my second CPU
> > is not used, but the first one runs at 100%. I tried to include the
> > "-a"-parameter, but I was not successful:
> >
> > my @params = (
> >       -database => 'my_db',
> >       -a => '2',
> >       -outfile => 'blast1.out'
> > );
> >
> > How do I have to use it?
>
> This should work in the CVS version of StandAloneBlast. In other
> versions, perhaps try using $object->a(2);
>
>
> > Second question: In my perlscript I start BLAST-searches in a loop.
> > Everytime BLAST has finished its search, the memory is cleared and BLAST
> > is started again. I think most of the time is used to reload the
> > database. Is it somehow possible to keep the database loaded (e.g. by
> > starting a second search) or is BLAST reloaded anyway?
>
> I hope someone will correct me if I'm wrong, but I think you'd have
> to do that with a 2-way pipe. StandAloneBlast only uses output to a file
> and input from that file, finishing with the executable in between. I've
> thought about improving it with a 2-way pipe, but never got around to
> it, being apprehensive about stability on all platforms.
>
> The more obvious solution, which may be possible depending on exactly
> what you're doing, is to avoid the loop and just supply Blast all your
> input in one go.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


-- 
Jason Stajich
jason at bioperl.org
http://bioperl.org/wiki/User:Jason


From stefan.kirov at bms.com  Tue Dec  4 14:25:21 2007
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Tue, 04 Dec 2007 14:25:21 -0500
Subject: [Bioperl-l] PAML/Codeml parsing
In-Reply-To: <FD74BB3A-C8F7-453E-915E-FD5541CE59CB@bioperl.org>
References: <FD74BB3A-C8F7-453E-915E-FD5541CE59CB@bioperl.org>
Message-ID: <4755A9A1.2040608@bms.com>

Jason Stajich wrote:
> PAML4 breaks our PAML parser right now because the order of things in  
> the result file has changed.  Now sequences precede the information  
> about the version or the program run.  This means that
> $result->get_seqs() fails because we don't parse the sequences.
>
> We'll see what we can do, but as usual with supporting 3rd party  
> programs it is brittle when file formats change.  Th
>
> -jason
>
> --
> Jason Stajich
> jason at bioperl.org
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>   
Jason,
I saw a commit after this post on codeml, but not on PAML.pm- I assume
this is not fixed, am I correct?
Thanks!
Stefan


From avilella at gmail.com  Tue Dec  4 15:34:38 2007
From: avilella at gmail.com (Albert Vilella)
Date: Tue, 4 Dec 2007 20:34:38 +0000
Subject: [Bioperl-l] New Bio::Tools::Run::Phylo::SLR - Wrapper around
	the SLR program
In-Reply-To: <18ABB052-2539-4932-A7AA-BB6D194BF8C3@bioperl.org>
References: <358f4d650712040351g4bef4417l4197d06454049140@mail.gmail.com>
	<18ABB052-2539-4932-A7AA-BB6D194BF8C3@bioperl.org>
Message-ID: <358f4d650712041234n70004aedqa3dc07fb3f6f2e08@mail.gmail.com>

hmmm, 30 minutes is quite a lot... it takes much less for me:

avilella at magneto:~/bioperl/vanilla/bioperl-run$ time perl t/SLR.t
1..7
ok 1 - use Bio::Root::IO;
ok 2 - use Bio::Tools::Run::Phylo::SLR;
ok 3 - use Bio::AlignIO;
ok 4 - use Bio::TreeIO;
ok 5
ok 6
ok 7

real    0m21.517s
user    0m20.717s
sys     0m0.100s




From avilella at gmail.com  Tue Dec  4 15:39:26 2007
From: avilella at gmail.com (Albert Vilella)
Date: Tue, 4 Dec 2007 20:39:26 +0000
Subject: [Bioperl-l] New Bio::Tools::Run::Phylo::SLR - Wrapper around
	the SLR program
In-Reply-To: <358f4d650712041234n70004aedqa3dc07fb3f6f2e08@mail.gmail.com>
References: <358f4d650712040351g4bef4417l4197d06454049140@mail.gmail.com>
	<18ABB052-2539-4932-A7AA-BB6D194BF8C3@bioperl.org>
	<358f4d650712041234n70004aedqa3dc07fb3f6f2e08@mail.gmail.com>
Message-ID: <358f4d650712041239w7e6dee29lbb13cc2e30a6bce1@mail.gmail.com>

oh, I forgot to mention: SLR uses the lapack and blas libraries if
installed, which makes it a lot faster (according to the author)...
maybe that's the reason...



From jason at bioperl.org  Tue Dec  4 16:43:03 2007
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 4 Dec 2007 13:43:03 -0800
Subject: [Bioperl-l] New Bio::Tools::Run::Phylo::SLR - Wrapper around
	the SLR program
In-Reply-To: <358f4d650712041239w7e6dee29lbb13cc2e30a6bce1@mail.gmail.com>
References: <358f4d650712040351g4bef4417l4197d06454049140@mail.gmail.com>
	<18ABB052-2539-4932-A7AA-BB6D194BF8C3@bioperl.org>
	<358f4d650712041234n70004aedqa3dc07fb3f6f2e08@mail.gmail.com>
	<358f4d650712041239w7e6dee29lbb13cc2e30a6bce1@mail.gmail.com>
Message-ID: <2CF76A38-5A9E-4A4E-8C9F-29EDD732BDDF@bioperl.org>

My own icc-compiled version seems to have caused the problem.
Whoops. Fixed that.
-jason


From stefan.kirov at bms.com  Tue Dec  4 16:51:51 2007
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Tue, 04 Dec 2007 16:51:51 -0500
Subject: [Bioperl-l] PAML/Codeml parsing
In-Reply-To: <A9B43240-2601-4C3E-9870-F32A6918A657@bioperl.org>
References: <FD74BB3A-C8F7-453E-915E-FD5541CE59CB@bioperl.org>
	<4755A9A1.2040608@bms.com>
	<A9B43240-2601-4C3E-9870-F32A6918A657@bioperl.org>
Message-ID: <4755CBF7.5010709@bms.com>

Jason Stajich wrote:
> should be fixed.
>
> $ cvs log -r HEAD Bio/Tools/Phylo/PAML.pm
> revision 1.56
> date: 2007/11/01 14:52:56;  author: jason;  state: Exp;  lines: +21 -14
> Parsing PAML4 and PAML3.15 should work now.  Dealing with variable
> order for the sequences and summary results in
> the top of the MLC files
>
Yes, this is the version I have, and in some cases the sequences do not
get parsed. I had missed this commit. I will try to assemble a test case
and send it. I cannot promise when, but I will try to do it tomorrow. My gut
feeling so far is that the parser works whenever there are gaps in the
alignment, and fails otherwise. PAML surely has a very peculiar format.
Thanks again!
Stefan
> On Dec 4, 2007, at 11:25 AM, Stefan Kirov wrote:
>
>> Jason Stajich wrote:
>>> PAML4 breaks our PAML parser right now because the order of things in
>>> the result file has changed.  Now sequences precede the information
>>> about the version or the program run.  This means that
>>> $result->get_seqs() fails because we don't parse the sequences.
>>>
>>> We'll see what we can do, but as usual with supporting 3rd party
>>> programs it is brittle when file formats change.  Th
>>>
>>> -jason
>>>
>>> -- 
>>> Jason Stajich
>>> jason at bioperl.org
>>>
>> Jason,
>> I saw a commit after this post on codeml, but not on PAML.pm- I assume
>> this is not fixed, am I correct?
>> Thanks!
>> Stefan
>
>


From jason at bioperl.org  Tue Dec  4 16:36:09 2007
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 4 Dec 2007 13:36:09 -0800
Subject: [Bioperl-l] PAML/Codeml parsing
In-Reply-To: <4755A9A1.2040608@bms.com>
References: <FD74BB3A-C8F7-453E-915E-FD5541CE59CB@bioperl.org>
	<4755A9A1.2040608@bms.com>
Message-ID: <A9B43240-2601-4C3E-9870-F32A6918A657@bioperl.org>

should be fixed.

$ cvs log -r HEAD Bio/Tools/Phylo/PAML.pm
revision 1.56
date: 2007/11/01 14:52:56;  author: jason;  state: Exp;  lines: +21 -14
Parsing PAML4 and PAML3.15 should work now.  Dealing with variable  
order for the sequences and summary results in
the top of the MLC files

On Dec 4, 2007, at 11:25 AM, Stefan Kirov wrote:

> Jason Stajich wrote:
>> PAML4 breaks our PAML parser right now because the order of things in
>> the result file has changed.  Now sequences precede the information
>> about the version or the program run.  This means that
>> $result->get_seqs() fails because we don't parse the sequences.
>>
>> We'll see what we can do, but as usual with supporting 3rd party
>> programs it is brittle when file formats change.  Th
>>
>> -jason
>>
>> --
>> Jason Stajich
>> jason at bioperl.org
>>
> Jason,
> I saw a commit after this post on codeml, but not on PAML.pm- I assume
> this is not fixed, am I correct?
> Thanks!
> Stefan


From johan.nilsson at sh.se  Wed Dec  5 06:35:58 2007
From: johan.nilsson at sh.se (Johan Nilsson)
Date: Wed, 5 Dec 2007 12:35:58 +0100
Subject: [Bioperl-l] Query about Hyphy wrapper module "SLAC.pm"
Message-ID: <OFBA70B0CA.66F02D44-ONC12573A8.003FB7B7-C12573A8.003FB7C0@sh.se>


Hello,

I have a bunch of multiple sequence alignments of protein coding genes,
which I would like to analyse with the SLAC method of the HyPhy package. I
tried using the SLAC.pm module in bioperl-run, but I could not get it to
work properly.

Basically, for each MSA file, I create the Bio::Tree::Tree and
Bio::SimpleAlign objects ($tree and $aln, respectively) required as
arguments to SLAC, and call the method with: "($rc,$result) =
$slac->run($aln,$tree)" in a loop procedure in my script.

When I choose not to save the tmp files (the default option in SLAC.pm),
the program complains that it cannot find the file
"$whatevertmpdir/wrapper.bf", and returns $rc=0 for all but the first MSA
(which works fine). Apparently, it looks for the wrapper.bf file in the
first tmp dir created, which is deleted at the end of the first SLAC call.

If instead I choose to save the tempfiles ($slac->save_tempfiles('TRUE')),
all calls to SLAC give returncode 1, and no error message is received.
However, when I look at the resulting $result hashref, it turns out that
all results are for the FIRST alignment read. I've made sure there is
nothing strange with my loop procedure, and I checked that the tree and
alignment objects look OK for each MSA. Apparently, it does create new
"results.tsv" files in the tmp directory after each run, but it is
identical each time it's created. Also, it only creates ONE tmp directory,
no matter how many times SLAC is executed (I would imagine it was supposed
to save each result in separate tmp dirs?)

Thus, it seems to me like the errors occur because something goes wrong in
the creation of temporary files. Have I done something wrong here, or have
any of you experienced the same problem?
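
The suspected failure mode can be reproduced outside of bioperl-run. Below is
a minimal sketch in plain Perl, assuming the wrapper remembers the path to
wrapper.bf from the first run while each run gets its own temporary
directory. The sub name simulate_runs is hypothetical and nothing here is
taken from SLAC.pm:

```perl
use strict;
use warnings;
use File::Temp;

# Hypothetical simulation: each "run" gets a fresh temp dir that is removed
# when the run ends, but the path to wrapper.bf is cached from the first run.
sub simulate_runs {
    my ($n) = @_;
    my $cached_path;    # remembered across runs (the suspected problem)
    my @report;
    for my $run ( 1 .. $n ) {
        my $tmp   = File::Temp->newdir();    # deleted when $tmp goes out of scope
        my $fresh = $tmp->dirname . "/wrapper.bf";
        $cached_path = $fresh unless defined $cached_path;

        # Write this run's wrapper file into this run's temp dir.
        open my $fh, '>', $fresh or die "cannot write $fresh: $!";
        close $fh;

        push @report, {
            run           => $run,
            cached_exists => ( -e $cached_path ? 1 : 0 ),
            fresh_exists  => ( -e $fresh       ? 1 : 0 ),
        };
    }
    return @report;
}

for my $r ( simulate_runs(3) ) {
    printf "run %d: cached wrapper.bf %s, fresh wrapper.bf %s\n",
        $r->{run},
        $r->{cached_exists} ? "found" : "missing",
        $r->{fresh_exists}  ? "found" : "missing";
}
```

In this sketch only the first run should find the cached file, which would
match the symptom of $rc=0 for every alignment after the first.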

Best regards
/Johan


--
Johan Nilsson, Ph.D.
School of Life Sciences
Södertörns University College
S-141 89 Huddinge, Sweden
E-mail: johan.nilsson at sh.se
Phone: +46 8 608 47 05, +46 70 456 10 51


From bernd.web at gmail.com  Wed Dec  5 08:10:04 2007
From: bernd.web at gmail.com (Bernd Web)
Date: Wed, 5 Dec 2007 14:10:04 +0100
Subject: [Bioperl-l] SimpleAlign is_flush
Message-ID: <716af09c0712050510h62aa106cla7011a75c93091a5@mail.gmail.com>

Hi,

SimpleAlign has an is_flush method:
 Function  : Tells you whether the alignment is flush, i.e. all of the
             same length
 Returns   : 1 or 0

I noticed that a file with multiple fasta sequences with different
lengths has an is_flush value of 1. Printing the "alignment" shows
that sequences are padded with "-" so that they are all the same
length. Does this mean that is_flush for alignments read in via
AlignIO is always true, and thus not so useful as such?

(using bioperl version: 1.005002102)


Regards,
Bernd


From cjfields at uiuc.edu  Wed Dec  5 08:53:59 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 5 Dec 2007 07:53:59 -0600
Subject: [Bioperl-l] SimpleAlign is_flush
In-Reply-To: <716af09c0712050510h62aa106cla7011a75c93091a5@mail.gmail.com>
References: <716af09c0712050510h62aa106cla7011a75c93091a5@mail.gmail.com>
Message-ID: <9E4F2A25-ACDE-4BFD-9026-FDF7251B588B@uiuc.edu>

Yes; it's a convenient way to make sure all seqs have the same length  
(including gaps).  Nice for checking when adding new seqs to an  
alignment or building new parsers.

chris


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From captainrave at hotmail.com  Wed Dec  5 07:37:02 2007
From: captainrave at hotmail.com (Captainrave)
Date: Wed, 5 Dec 2007 04:37:02 -0800 (PST)
Subject: [Bioperl-l] extracting CDS location from Genbank
In-Reply-To: <475574B3.8050700@sendu.me.uk>
References: <14148723.post@talk.nabble.com>
	<8975119BCD0AC5419D61A9CF1A923E9505A4F76E@iahce2ksrv1.iah.bbsrc.ac.uk>
	<14152264.post@talk.nabble.com> <475574B3.8050700@sendu.me.uk>
Message-ID: <14170499.post@talk.nabble.com>


Thanks, it works great now.

Do any of you know if there is a tag to pull out the CDS location, i.e. the
values such as 132..145?  Those are all I need.  Also, is there any way
to stop it reporting tag and value and literally JUST output the value?

Thanks!!!


From stefan.kirov at bms.com  Wed Dec  5 09:24:20 2007
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Wed, 05 Dec 2007 09:24:20 -0500
Subject: [Bioperl-l] PAML/Codeml parsing
In-Reply-To: <A9B43240-2601-4C3E-9870-F32A6918A657@bioperl.org>
References: <FD74BB3A-C8F7-453E-915E-FD5541CE59CB@bioperl.org>
	<4755A9A1.2040608@bms.com>
	<A9B43240-2601-4C3E-9870-F32A6918A657@bioperl.org>
Message-ID: <4756B494.7020100@bms.com>

Jason,
When there is a gapless alignment we have a differently formatted output
from codeml:
kirovs at horta:~/AESIG> head -n 10 feJRfxQl8D/mlc

seed used = 492211105
      3    141

ENSRNOE00000058637               GCG AGC AAG TGT GAC AGC CAT GGC ACC CAC
CTA GCA GGT GTG GTC AGC GGC CGG GAT GCT GGT GTG GCC AAG GGC ACC AGT CTG
CAC AGT CTG CGT GTG CTC AAC TGT CAA GGG AAG GGC ACA GTC AGC GGC ACC CTC ATA
ENSMUSE00000366347               GCG AGC AAG TGT GAC AGC CAC GGC ACC CAC
CTG GCA GGT GTG GTC AGC GGC CGG GAT GCT GGT GTG GCC AAG GGC ACC AGC CTG
CAC AGC CTG CGT GTG CTC AAC TGT CAA GGG AAG GGC ACA GTC AGC GGC ACC CTC ATA
ENSE00001279150                  GCC AGC AAG TGT GAC AGT CAT GGC ACC CAC
CTG GCA GGG GTG GTC AGC GGC CGG GAT GCC GGC GTG GCC AAG GGT GCC AGC ATG
CGC AGC CTG CGC GTG CTC AAC TGC CAA GGG AAG GGC ACG GTT AGC GGC ACC CTC ATA

And parsing this fails...
The next one has gaps and works fine:

kirovs at horta:~/AESIG> head -n 10 4z6ZX7s1B6/mlc

seed used = 492252697

Before deleting alignment gaps
      2    162

ENSMUSE00000460297               AAT ATC GAT ACA TTT TAC AAG GAG GCA GAA
AAG AAG CTT ATA CAC GTG CTT GAG GGA GAC AGT CCC AAG TGG TCC ACA CCG AAC
AAA GAC CCC ACC CGA GAG CCC CAT GCA GCC TCC ACT TGC TGT GCT TCA GAT CTC
CTT GGT TCA GGA GGT CAG TTC CTG
ENSE00000939192                  AAT ATT GAC ATA CTT TGC AAT GAA GCA GAA
AAC AAG CTT ATG CAT ATA CTG CAT GCA AAT GAT CCC AAG TGG TCC ACC CCA ACT
AAA GAC TGT ACT TCA GGG CCG TAC ACT GCT CAA ATC --- --- --- --- --- ATT
CCT GGT ACA GGA AAC AAG CTT CTG

I will send both whole files as an attachment with another mail (I do
not know if these are going to pass through).
My guess is that the whole _parse_summary method has to be re-worked as
there is no tag to look for before the sequences start. Ugly.
I am not sure what else could become broken if I try to fix it, so I
will leave it to you.
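
The difference between the two outputs can be pinned down by looking for the
PHYLIP-style header line (sequence count and length) and noting whether a
"Before deleting alignment gaps" line came before it. Below is a minimal
sketch in plain Perl. It is not the _parse_summary code; the sub name
classify_mlc_header is hypothetical, and it assumes that the first line
consisting of two integers is the header:

```perl
use strict;
use warnings;

# Hypothetical helper: scan the top of an mlc file (passed as a list of lines)
# and report the PHYLIP-style header plus whether the
# "Before deleting alignment gaps" marker was seen before it.
sub classify_mlc_header {
    my @lines = @_;
    my $saw_marker = 0;
    for my $line (@lines) {
        if ( $line =~ /^Before\s+deleting\s+alignment\s+gaps/ ) {
            $saw_marker = 1;
        }
        elsif ( $line =~ /^\s*(\d+)\s+(\d+)\s*$/ ) {
            # Assumption: first line made of two integers is "<nseq> <length>".
            return { nseq => $1, length => $2, marker => $saw_marker };
        }
    }
    return;    # no header found
}

# The first few lines of the two files shown above.
my @gapless = ( "", "seed used = 492211105", "      3    141", "" );
my @gapped  = ( "", "seed used = 492252697", "",
                "Before deleting alignment gaps", "      2    162", "" );

for my $case ( \@gapless, \@gapped ) {
    my $h = classify_mlc_header(@$case);
    printf "%d sequences of length %d, marker %s\n",
        $h->{nseq}, $h->{length}, $h->{marker} ? "present" : "absent";
}
```

Keying on the header line rather than on the marker would let the same code
path handle both files, at the cost of trusting that no earlier line looks
like two bare integers.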
Stefan


From stefan.kirov at bms.com  Wed Dec  5 09:35:23 2007
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Wed, 05 Dec 2007 09:35:23 -0500
Subject: [Bioperl-l] PAML/Codeml parsing
In-Reply-To: <4756B494.7020100@bms.com>
References: <FD74BB3A-C8F7-453E-915E-FD5541CE59CB@bioperl.org>
	<4755A9A1.2040608@bms.com>
	<A9B43240-2601-4C3E-9870-F32A6918A657@bioperl.org>
	<4756B494.7020100@bms.com>
Message-ID: <4756B72B.6000103@bms.com>

Here are the files.
Stefan

-------------- next part --------------
A non-text attachment was scrubbed...
Name: mlc.tar.gz
Type: application/x-gzip
Size: 3237 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20071205/bd77cde1/attachment-0003.gz>

From aaron.j.mackey at gsk.com  Wed Dec  5 09:56:31 2007
From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com)
Date: Wed, 5 Dec 2007 09:56:31 -0500
Subject: [Bioperl-l] SimpleAlign is_flush
In-Reply-To: <9E4F2A25-ACDE-4BFD-9026-FDF7251B588B@uiuc.edu>
Message-ID: <OF9A252048.A821FA6A-ON852573A8.0051A1FE-852573A8.0052140C@gsk.com>

Well, if you use AlignIO::fasta to read in a multi-fasta file of 
*unaligned* sequences, AlignIO::fasta makes the assumption that all of 
your sequences are aligned, and pads the ends of shorter sequences with 
gap characters (essentially, enforcing a rather silly, yet valid 
alignment).  The fact that is_flush() then returns 1 is secondary.

If you just want to read in an array of unaligned sequences, use 
SeqIO::fasta instead.  It doesn't really make much sense to use AlignIO 
for sequences that are not aligned ... conversely, if you *do* have 
aligned sequences in a multi-fasta file, then it does make sense to use 
AlignIO, and it also makes sense for AlignIO::fasta to end-pad sequences 
with gaps as necessary to get a fully valid, flush multiple sequence 
alignment matrix.
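
To make the end-padding concrete, here is a minimal sketch in plain Perl of
the behaviour described above. It does not use BioPerl, and the sub names
end_pad and all_same_length are hypothetical stand-ins rather than the
AlignIO::fasta or SimpleAlign code:

```perl
use strict;
use warnings;
use List::Util qw(max);

# Pad shorter sequences at the end with '-' so that all strings have the
# length of the longest one.
sub end_pad {
    my @seqs = @_;
    return () unless @seqs;
    my $width = max map { length($_) } @seqs;
    return map { $_ . ( '-' x ( $width - length($_) ) ) } @seqs;
}

# Return 1 if all strings have the same length, 0 otherwise
# (the same idea as a "flush" check).
sub all_same_length {
    my @seqs = @_;
    return 1 unless @seqs;
    my $len = length $seqs[0];
    return ( grep { length($_) != $len } @seqs ) ? 0 : 1;
}

my @unaligned = ( 'ATGC', 'AT', 'ATGCGG' );
my @padded    = end_pad(@unaligned);

printf "before padding: flush=%d\n", all_same_length(@unaligned);
printf "after padding:  flush=%d\n", all_same_length(@padded);
print "$_\n" for @padded;
```

Once the padding has been applied, a length check can no longer tell whether
the input was really aligned, which is why the check is more useful when
building an alignment by hand than right after reading one in.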

-Aaron



From cjfields at uiuc.edu  Wed Dec  5 11:22:01 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 5 Dec 2007 10:22:01 -0600
Subject: [Bioperl-l] SimpleAlign is_flush
In-Reply-To: <OF9A252048.A821FA6A-ON852573A8.0051A1FE-852573A8.0052140C@gsk.com>
References: <OF9A252048.A821FA6A-ON852573A8.0051A1FE-852573A8.0052140C@gsk.com>
Message-ID: <EC064917-220F-4579-8FA9-934026D7D105@uiuc.edu>

That's true.  I assumed Bernd's seqs were aligned.

chris


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From stefan.kirov at bms.com  Wed Dec  5 14:56:47 2007
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Wed, 05 Dec 2007 14:56:47 -0500
Subject: [Bioperl-l] PAML/Codeml parsing
In-Reply-To: <4756B494.7020100@bms.com>
References: <FD74BB3A-C8F7-453E-915E-FD5541CE59CB@bioperl.org>
	<4755A9A1.2040608@bms.com>
	<A9B43240-2601-4C3E-9870-F32A6918A657@bioperl.org>
	<4756B494.7020100@bms.com>
Message-ID: <4757027F.407@bms.com>

Here is a patch that seems to be working and does not break the existing
tests:

--- /home/kirovs/bioperl-live/Bio/Tools/Phylo/PAML.pm   2007-12-05 10:16:53.120720000 -0500
+++ /home/kirovs/bioperl/bioperl-live/Bio/Tools/Phylo/PAML.pm   2007-12-05 14:46:31.436278000 -0500
@@ -419,7 +419,10 @@
     # CODONML (in paml 3.12 February 2002)  <<-- what we want to see!
 
     my $SEQTYPES = qr( (?: (?: CODON | AA | BASE | CODON2AA ) ML ) | YN00 )x;
+    my $line;
+    $self->{'_already_parsed_seqs'}=$self->{'_already_parsed_seqs'}?1:0;
     while ($_ = $self->_readline) {
+           $line++;
        if ( m/^($SEQTYPES) \s+                      # seqtype: CODONML, AAML, BASEML, CODON2AAML, YN00, etc
               (?: \(in \s+ ([^\)]+?) \s* \) \s* )?  # version: "paml 3.12 February 2002"; not present < 3.1 or YN00
               (\S+) \s*                             # tree filename
@@ -436,8 +439,11 @@
        } elsif (m/^Data set \d$/) {
            $self->{'_summary'} = {};
            $self->{'_summary'}->{'multidata'}++;
-       } elsif( m/^Before\s+deleting\s+alignment\s+gaps/ ) {
-           my ($phylip_header) = $self->_readline;
+       }
+       elsif( m/^Before\s+deleting\s+alignment\s+gaps/ ) {#Gap
+               my ($phylip_header) = $self->_readline;
+               $self->_parse_seqs;
+       } elsif (($line>3)&&($self->{'_already_parsed_seqs'}!=1)) {#No gap
            $self->_parse_seqs;
        }
     }
@@ -681,7 +687,6 @@
 }
 
 sub _parse_seqs {
-
     # this should in fact be packed into a Bio::SimpleAlign object instead of
     # an array but we'll stay with this for now
     my ($self) = @_;


What this does is trigger sequence parsing if the /Before.../ pattern has
not been seen by line 4. Since $phylip_header seems to be doing nothing, one
could completely eliminate the first seq parse elsif (even though
counting lines is not a good thing).
Since I am not aware of all the consequences of changing the sequence
parsing and I have no idea how extensive the tests are, I am not
committing anything, but feel free to use this if you wish.
Stefan



From jason at bioperl.org  Wed Dec  5 15:01:29 2007
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 5 Dec 2007 12:01:29 -0800
Subject: [Bioperl-l] PAML/Codeml parsing
In-Reply-To: <4757027F.407@bms.com>
References: <FD74BB3A-C8F7-453E-915E-FD5541CE59CB@bioperl.org>
	<4755A9A1.2040608@bms.com>
	<A9B43240-2601-4C3E-9870-F32A6918A657@bioperl.org>
	<4756B494.7020100@bms.com> <4757027F.407@bms.com>
Message-ID: <8562ED51-7DEC-4EB2-AC3F-A14C6497E0A2@bioperl.org>

sounds good - can you
- file it as a bug with the patch and sample files in bugzilla
- commit the changes, and I'll test as well

thanks,
-j

On Dec 5, 2007, at 11:56 AM, Stefan Kirov wrote:

> Here is a patch that seems to be working and does not break the  
> existing
> tests:
>
> --- /home/kirovs/bioperl-live/Bio/Tools/Phylo/PAML.pm   2007-12-05
> 10:16:53.120720000 -0500
> +++ /home/kirovs/bioperl/bioperl-live/Bio/Tools/Phylo/PAML.pm
> 2007-12-05 14:46:31.436278000 -0500
> @@ -419,7 +419,10 @@
>      # CODONML (in paml 3.12 February 2002)  <<-- what we want to see!
>
>      my $SEQTYPES = qr( (?: (?: CODON | AA | BASE | CODON2AA ) ML ) |
> YN00 )x;
> +    my $line;
> +    $self->{'_already_parsed_seqs'}=$self-> 
> {'_already_parsed_seqs'}?1:0;
>      while ($_ = $self->_readline) {
> +           $line++;
>         if ( m/^($SEQTYPES) \s+                      # seqtype:  
> CODONML,
> AAML, BASEML, CODON2AAML, YN00, etc
>                (?: \(in \s+ ([^\)]+?) \s* \) \s* )?  # version: "paml
> 3.12 February 2002"; not present < 3.1 or YN00
>                (\S+) \s*                             # tree filename
> @@ -436,8 +439,11 @@
>         } elsif (m/^Data set \d$/) {
>             $self->{'_summary'} = {};
>             $self->{'_summary'}->{'multidata'}++;
> -       } elsif( m/^Before\s+deleting\s+alignment\s+gaps/ ) {
> -           my ($phylip_header) = $self->_readline;
> +       }
> +       elsif( m/^Before\s+deleting\s+alignment\s+gaps/ ) {#Gap
> +               my ($phylip_header) = $self->_readline;
> +               $self->_parse_seqs;
> +       } elsif (($line>3)&&($self->{'_already_parsed_seqs'}!=1))  
> {#No gap
>             $self->_parse_seqs;
>         }
>      }
> @@ -681,7 +687,6 @@
>  }
>
>  sub _parse_seqs {
> -
>      # this should in fact be packed into a Bio::SimpleAlign object
> instead of
>      # an array but we'll stay with this for now
>      my ($self) = @_;
>
>
> What this does is trigger sequence parsing if the /Before.../  
> pattern is
> not seen until line 4. Since phylip_header seems to be doing  
> nothing one
> could completely eliminate the first seq parse elsif (even though
> counting lines is not a good thing).
>  Since I am not aware of all consequences of changing the sequence
> parsing and I have no idea how extensive the tests are, I am not
> committing anything, but feel free to use that if you wish.
> Stefan
>
> Stefan Kirov wrote:
>> Jason,
>> When there is a gapless alignment we have a differently formatted  
>> output
>> from codeml:
>> kirovs at horta:~/AESIG> head -n 10 feJRfxQl8D/mlc
>>
>> seed used = 492211105
>>       3    141
>>
>> ENSRNOE00000058637               GCG AGC AAG TGT GAC AGC CAT GGC  
>> ACC CAC
>> CTA GCA GGT GTG GTC AGC GGC CGG GAT GCT GGT GTG GCC AAG GGC ACC  
>> AGT CTG
>> CAC AGT CTG CGT GTG CTC AAC TGT CAA GGG AAG GGC ACA GTC AGC GGC  
>> ACC CTC ATA
>> ENSMUSE00000366347               GCG AGC AAG TGT GAC AGC CAC GGC  
>> ACC CAC
>> CTG GCA GGT GTG GTC AGC GGC CGG GAT GCT GGT GTG GCC AAG GGC ACC  
>> AGC CTG
>> CAC AGC CTG CGT GTG CTC AAC TGT CAA GGG AAG GGC ACA GTC AGC GGC  
>> ACC CTC ATA
>> ENSE00001279150                  GCC AGC AAG TGT GAC AGT CAT GGC  
>> ACC CAC
>> CTG GCA GGG GTG GTC AGC GGC CGG GAT GCC GGC GTG GCC AAG GGT GCC  
>> AGC ATG
>> CGC AGC CTG CGC GTG CTC AAC TGC CAA GGG AAG GGC ACG GTT AGC GGC  
>> ACC CTC ATA
>>
>> And parsing this fails...
>> The next one has gaps and works fine:
>>
>> kirovs at horta:~/AESIG> head -n 10 4z6ZX7s1B6/mlc
>>
>> seed used = 492252697
>>
>> Before deleting alignment gaps
>>       2    162
>>
>> ENSMUSE00000460297               AAT ATC GAT ACA TTT TAC AAG GAG  
>> GCA GAA
>> AAG AAG CTT ATA CAC GTG CTT GAG GGA GAC AGT CCC AAG TGG TCC ACA  
>> CCG AAC
>> AAA GAC CCC ACC CGA GAG CCC CAT GCA GCC TCC ACT TGC TGT GCT TCA  
>> GAT CTC
>> CTT GGT TCA GGA GGT CAG TTC CTG
>> ENSE00000939192                  AAT ATT GAC ATA CTT TGC AAT GAA  
>> GCA GAA
>> AAC AAG CTT ATG CAT ATA CTG CAT GCA AAT GAT CCC AAG TGG TCC ACC  
>> CCA ACT
>> AAA GAC TGT ACT TCA GGG CCG TAC ACT GCT CAA ATC --- --- --- ---  
>> --- ATT
>> CCT GGT ACA GGA AAC AAG CTT CTG
>>
>> I will send both whole files as an attachment with another mail (I do
>> not know if these are going to pass through).
>> My guess is that the whole _parse_summary method has to be reworked,
>> as there is no tag to look for before the sequences start. Ugly.
>> I am not sure what else could become broken if I try to fix it, so I
>> will leave it to you.
>> Stefan
>>
>>> should be fixed.
>>>
>>> $ cvs log -r HEAD Bio/Tools/Phylo/PAML.pm
>>> revision 1.56
>>> date: 2007/11/01 14:52:56;  author: jason;  state: Exp;  lines:  
>>> +21 -14
>>> Parsing PAML4 and PAML3.15 should work now.  Dealing with variable
>>> order for the sequences and summary results in
>>> the top of the MLC files
>>>
>>> On Dec 4, 2007, at 11:25 AM, Stefan Kirov wrote:
>>>
>>>
>>>> Jason Stajich wrote:
>>>>
>>>>> PAML4 breaks our PAML parser right now because the order of things in
>>>>> the result file has changed.  Now sequences precede the information
>>>>> about the version or the program run.  This means that
>>>>> $result->get_seqs() fails because we don't parse the sequences.
>>>>>
>>>>> We'll see what we can do, but as usual with supporting 3rd party
>>>>> programs it is brittle when file formats change.  Th
>>>>>
>>>>> -jason
>>>>>
>>>>> -- 
>>>>> Jason Stajich
>>>>> jason at bioperl.org
>>>>>
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>
>>>>>
>>>>>
>>>> Jason,
>>>> I saw a commit after this post on codeml, but not on PAML.pm- I  
>>>> assume
>>>> this is not fixed, am I correct?
>>>> Thanks!
>>>> Stefan
>>>>
>>>
>>
>


From stefan.kirov at bms.com  Wed Dec  5 15:33:47 2007
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Wed, 05 Dec 2007 15:33:47 -0500
Subject: [Bioperl-l] PAML/Codeml parsing
In-Reply-To: <8562ED51-7DEC-4EB2-AC3F-A14C6497E0A2@bioperl.org>
References: <FD74BB3A-C8F7-453E-915E-FD5541CE59CB@bioperl.org>
	<4755A9A1.2040608@bms.com>
	<A9B43240-2601-4C3E-9870-F32A6918A657@bioperl.org>
	<4756B494.7020100@bms.com> <4757027F.407@bms.com>
	<8562ED51-7DEC-4EB2-AC3F-A14C6497E0A2@bioperl.org>
Message-ID: <47570B2B.5090602@bms.com>

Done.

Jason Stajich wrote:
> sounds good - can you
> - make it as a bug with the patch and sample files in bugzilla
> - commit changes and I'll test as well
>
> thanks,
> -j
>
> On Dec 5, 2007, at 11:56 AM, Stefan Kirov wrote:
>
>   
>> Here is a patch that seems to be working and does not break the  
>> existing
>> tests:
>>
>> --- /home/kirovs/bioperl-live/Bio/Tools/Phylo/PAML.pm   2007-12-05 10:16:53.120720000 -0500
>> +++ /home/kirovs/bioperl/bioperl-live/Bio/Tools/Phylo/PAML.pm   2007-12-05 14:46:31.436278000 -0500
>> @@ -419,7 +419,10 @@
>>      # CODONML (in paml 3.12 February 2002)  <<-- what we want to see!
>>
>>      my $SEQTYPES = qr( (?: (?: CODON | AA | BASE | CODON2AA ) ML ) | YN00 )x;
>> +    my $line;
>> +    $self->{'_already_parsed_seqs'}=$self->{'_already_parsed_seqs'}?1:0;
>>      while ($_ = $self->_readline) {
>> +           $line++;
>>         if ( m/^($SEQTYPES) \s+                      # seqtype: CODONML, AAML, BASEML, CODON2AAML, YN00, etc
>>                (?: \(in \s+ ([^\)]+?) \s* \) \s* )?  # version: "paml 3.12 February 2002"; not present < 3.1 or YN00
>>                (\S+) \s*                             # tree filename
>> @@ -436,8 +439,11 @@
>>         } elsif (m/^Data set \d$/) {
>>             $self->{'_summary'} = {};
>>             $self->{'_summary'}->{'multidata'}++;
>> -       } elsif( m/^Before\s+deleting\s+alignment\s+gaps/ ) {
>> -           my ($phylip_header) = $self->_readline;
>> +       }
>> +       elsif( m/^Before\s+deleting\s+alignment\s+gaps/ ) {#Gap
>> +               my ($phylip_header) = $self->_readline;
>> +               $self->_parse_seqs;
>> +       } elsif (($line>3)&&($self->{'_already_parsed_seqs'}!=1)) {#No gap
>>             $self->_parse_seqs;
>>         }
>>      }
>> @@ -681,7 +687,6 @@
>>  }
>>
>>  sub _parse_seqs {
>> -
>>      # this should in fact be packed into a Bio::SimpleAlign object instead of
>>      # an array but we'll stay with this for now
>>      my ($self) = @_;
>>
>>
>> What this does is trigger sequence parsing if the /Before.../ pattern
>> has not been seen by line 4. Since $phylip_header seems to do nothing,
>> one could completely eliminate the first seq parse elsif (even though
>> counting lines is not a good thing).
>> Since I am not aware of all consequences of changing the sequence
>> parsing and I have no idea how extensive the tests are, I am not
>> committing anything, but feel free to use that if you wish.
>> Stefan


From bernd.web at gmail.com  Thu Dec  6 09:58:31 2007
From: bernd.web at gmail.com (Bernd Web)
Date: Thu, 6 Dec 2007 15:58:31 +0100
Subject: [Bioperl-l] graphics - Panel
Message-ID: <716af09c0712060658t5504b377ob2d46adb85754284@mail.gmail.com>

Hi,

For map $segstart is available. This holds the leftmost start of the
feature (the left end of $ref displayed in the detailed view).
However, is it also accessible from track coderefs?
I'd like to access it in add_track, like

  -bgcolor => sub {
      my $feature = shift;
      my $start   = $feature->segstart;
      ....
      # do something with the segstart
  },

I realize I can add a -tag which holds the leftmost start of my
segmented feature, and then get it back out of $feature, but I wonder
if $segstart can also be accessed in the coderef somehow.
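
One workaround that does not depend on whether $segstart is exposed to
track callbacks is to let the coderef close over a value computed before
add_track is called. The sketch below is plain Perl only: MockFeature and
bgcolor_for are made-up names for illustration, not part of Bio::Graphics,
and the colour rule is arbitrary. It only shows the closure mechanics.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for a feature object; a real script would use
# whatever feature objects it already hands to add_track.
package MockFeature;
sub new   { my ($class, %args) = @_; return bless {%args}, $class }
sub start { return $_[0]{start} }
sub end   { return $_[0]{end} }

package main;

# Build a -bgcolor style callback that remembers the segment start.
# $segstart is captured by the closure, so the callback can use it even
# though it is only handed the feature.
sub bgcolor_for {
    my ($segstart) = @_;
    return sub {
        my $feature = shift;
        my $offset  = $feature->start - $segstart;   # position relative to segment start
        return $offset < 100 ? 'red' : 'blue';        # arbitrary illustrative rule
    };
}

my $callback = bgcolor_for(1000);
my $near = MockFeature->new(start => 1050, end => 1200);
my $far  = MockFeature->new(start => 1500, end => 1600);
print $callback->($near), "\n";   # red
print $callback->($far),  "\n";   # blue
```

The coderef returned by bgcolor_for could then be given as the value of
-bgcolor, assuming the segment start is known at the point where the
track is added.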

Does someone know this?

Best regards,
Bernd


From georose at gmail.com  Thu Dec  6 10:28:24 2007
From: georose at gmail.com (geo rose)
Date: Thu, 6 Dec 2007 08:28:24 -0700
Subject: [Bioperl-l] getting sequences from external databank
Message-ID: <54da06110712060728m2532c177s8da4fa22e2aee1e6@mail.gmail.com>

Hi Bioperl,

In the past, I have been able to retrieve sequences from an external
databank, but my scripts are not working anymore.
I am afraid that I may have broken my Bioperl installation while updating my
Fedora7 machine with yum update.

Below is an example of what happens.

The script is from
http://www.faculty.uaf.edu/ffnt/teaching/programming/bioperl/node2.html and
it works.
(I used it on an older machine with Bioperl and MacOS Tiger)

__________________________________________________________________________________
#!/usr/bin/perl -w

use Bio::SeqIO;
use Bio::DB::GenBank;

$genBank = new Bio::DB::GenBank;  # This object knows how to talk to GenBank

my $seq = $genBank->get_Seq_by_acc('AF060485');  # get a record by accession


my $seqOut = new Bio::SeqIO(-format => 'genbank');

$seqOut->write_seq($seq);


_________________________________________________________________________________________
This is the error I get
_________________________________________________________________________________________

[home at home Desktop]# perl final-seq-db-test1.pl
Bio::SeqIO: genbank cannot be found
Exception
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Failed to load module Bio::SeqIO::genbank. Weak references are not
implemented in the version of perl at
/usr/lib/perl5/site_perl/5.8.8/Bio/Species.pm line 91
BEGIN failed--compilation aborted at
/usr/lib/perl5/site_perl/5.8.8/Bio/Species.pm line 91.
Compilation failed in require at
/usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO/genbank.pm line 172.
BEGIN failed--compilation aborted at
/usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO/genbank.pm line 172.
Compilation failed in require at
/usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm line 425.

STACK: Error::throw
STACK: Bio::Root::Root::throw
/usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:359
STACK: Bio::Root::Root::_load_module
/usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:427
STACK: Bio::SeqIO::_load_format_module
/usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO.pm:555
STACK: Bio::SeqIO::new /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO.pm:376
STACK: Bio::DB::WebDBSeqI::get_seq_stream
/usr/lib/perl5/site_perl/5.8.8/Bio/DB/WebDBSeqI.pm:458
STACK: Bio::DB::NCBIHelper::get_Stream_by_acc
/usr/lib/perl5/site_perl/5.8.8/Bio/DB/NCBIHelper.pm:361
STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc
/usr/lib/perl5/site_perl/5.8.8/Bio/DB/WebDBSeqI.pm:172
STACK: final-seq-db-test1.pl:8
-----------------------------------------------------------

For more information about the SeqIO system please see the SeqIO docs.
This includes ways of checking for formats at compile time, not run time

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: acc AF060485 does not exist
STACK: Error::throw
STACK: Bio::Root::Root::throw
/usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:359
STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc
/usr/lib/perl5/site_perl/5.8.8/Bio/DB/WebDBSeqI.pm:173
STACK: final-seq-db-test1.pl:8
-----------------------------------------------------------
[home at home Desktop]# Use of uninitialized value in concatenation (.) or
string at /usr/lib64/perl5/5.8.8/x86_64-linux-thread-multi/Scalar/Util.pm
line 30.

[home at home Desktop]#


________________________________________________________________________________________


Before I mess things up further I thought I'd ask:
Can I fix this problem by reinstalling some part of Bioperl or Perl?

Thanks,

George


From barry.moore at genetics.utah.edu  Thu Dec  6 12:56:50 2007
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Thu, 6 Dec 2007 10:56:50 -0700
Subject: [Bioperl-l] getting sequences from external databank
In-Reply-To: <54da06110712060728m2532c177s8da4fa22e2aee1e6@mail.gmail.com>
References: <54da06110712060728m2532c177s8da4fa22e2aee1e6@mail.gmail.com>
Message-ID: <B13872F3-4591-4FB6-B057-9215C5DA9059@genetics.utah.edu>

George,

This is a hideous little bug in Red Hat/Fedora installations of
perl.  It's happened to me a couple of times on upgrades, but it's always
fixed with

perl -MCPAN -e shell
force install Scalar::Util

http://www.perlmonks.org/?node_id=460411
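
To check whether a given perl is affected before or after the reinstall,
something like the following sketch may help. It uses only Scalar::Util
itself; weaken_works is a made-up helper name, and the assumption is that
a broken install dies either on import or on the call with the "Weak
references are not implemented" message shown in the trace.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return 1 if Scalar::Util provides a working weaken(), 0 otherwise.
# Everything runs inside eval so that a broken install is reported
# instead of aborting the script.
sub weaken_works {
    my $ok = eval {
        require Scalar::Util;
        Scalar::Util->import(qw(weaken isweak));
        my $target = {};
        my $ref    = $target;
        Scalar::Util::weaken($ref);
        Scalar::Util::isweak($ref) ? 1 : 0;
    };
    return $ok ? 1 : 0;
}

print weaken_works()
    ? "Scalar::Util weaken() works\n"
    : "Scalar::Util weaken() is broken; reinstalling Scalar::Util may help\n";
```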

Barry



From torsten.seemann at infotech.monash.edu.au  Thu Dec  6 18:58:02 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Fri, 7 Dec 2007 10:58:02 +1100
Subject: [Bioperl-l] [StandAloneBLAST] Use more than one CPU + avoid
	BLAST reload
In-Reply-To: <47545590.1000703@boekhoff.info>
References: <47545590.1000703@boekhoff.info>
Message-ID: <a79f6a4b0712061558m663fd1ces6bba9ae9d5602d67@mail.gmail.com>

Sven,

> I just started working with Perl and BioPerl. I'm quite impressed by what
> can be easily done with this module. Today I found that my second CPU
> is not used, but the first one runs at 100%. I tried to include the
> "-a" parameter, but I was not successful:

My experience agrees with yours, in that "-a" does not seem to work with
the pre-compiled BLAST binaries you get from NCBI on a multi-core
system.

I'm not sure why, as "ldd blastall" shows it links against
"/lib64/tls/libpthread.so.0".

Does anyone else have any ideas?

-- 
--Torsten Seemann
--Victorian Bioinformatics Consortium, Monash University
--Tel +61 3 9905 9010


From lzhtom at hotmail.com  Thu Dec  6 23:25:42 2007
From: lzhtom at hotmail.com (zhihuali)
Date: Fri, 7 Dec 2007 04:25:42 +0000
Subject: [Bioperl-l] How to retrieve a persistent object by bioperl-db?
Message-ID: <BAY110-W786D73A90FA1B632776A9C7680@phx.gbl>


Hi netters,
 
I've installed BioSQL and bioperl-db, and successfully created and stored a persistent object:
 
use strict;
use warnings;
use Bio::Seq;
use Bio::DB::BioDB;

my $dbadp = Bio::DB::BioDB->new(-database => 'biosql',
                                -user     => 'annoymous',
                                -dbname   => 'bioseqdb');
 
my $seqobj = Bio::Seq->new(-accession_number => "test",
                           -id               => "test1",
                           -seq              => "AGCTAGCT",
                           -version          => 1);
my $dbobj = $dbadp->create_persistent($seqobj);
$dbobj->create;
$dbobj->commit;
 
It was successful, because I found the corresponding rows in the bioseqdb tables.
 
Now I want to retrieve the object back from the database. There isn't much documentation available, and I've tried find_by_unique_key/primary_key but all attempts failed. Maybe I didn't use them correctly. Could anyone give me an example of how to retrieve the stored Bio::Seq object?
 
Thanks a lot!
 
Zhihua Li


From Marc.Logghe at ablynx.com  Fri Dec  7 03:33:17 2007
From: Marc.Logghe at ablynx.com (Marc Logghe)
Date: Fri, 7 Dec 2007 09:33:17 +0100
Subject: [Bioperl-l] How to retrieve a persistent object by bioperl-db?
In-Reply-To: <BAY110-W786D73A90FA1B632776A9C7680@phx.gbl>
Message-ID: <03C512635899144083CADB0EE222018901216FA5@alpaca.lan.ablynx.com>

Hi,
Hilmar's BOSC presentation is a very good place to start.
Have a look at http://www.open-bio.org/bosc2003/slides/Persistent_Bioperl_BOSC03.pdf
(slide 18, for instance).
Regards,
Marc
 



From avilella at gmail.com  Fri Dec  7 05:32:43 2007
From: avilella at gmail.com (Albert Vilella)
Date: Fri, 7 Dec 2007 10:32:43 +0000
Subject: [Bioperl-l] Query about Hyphy wrapper module "SLAC.pm"
In-Reply-To: <OFBA70B0CA.66F02D44-ONC12573A8.003FB7B7-C12573A8.003FB7C0@sh.se>
References: <OFBA70B0CA.66F02D44-ONC12573A8.003FB7B7-C12573A8.003FB7C0@sh.se>
Message-ID: <358f4d650712070232s3d9ed27xf1c5f17e2985bd90@mail.gmail.com>

Hi Johan,

It would be great if you could upload a reproducible example case:

http://bugzilla.open-bio.org/enter_bug.cgi?product=Bioperl

A tar.gz of the directory with the sample files and the script, plus a
short explanation of how to run it, would probably be enough. If you have
any special "env" vars regarding tmp files, could you specify those as
well?

Thanks,

    Albert.

On Dec 5, 2007 11:35 AM, Johan Nilsson <johan.nilsson at sh.se> wrote:
>
> Hello,
>
> I have a bunch of multiple sequence alignments of protein coding genes,
> which I would like to analyse with the SLAC method of the HyPhy package. I
> tried using the SLAC.pm module in bioperl-run, but I could not get it to
> work properly.
>
> Basically, for each MSA file, I create the Bio::Tree::Tree and
> Bio::SimpleAlign objects ($tree and $aln, respectively) required as
> arguments to SLAC, and call the method with: "($rc,$result) =
> $slac->run($aln,$tree)" in a loop procedure in my script.
>
> When I choose not to save the tmp files (the default option in SLAC.pm),
> the program complains that it cannot find the file
> "$whatevertmpdir/wrapper.bf", and returns $rc=0 for all but the first MSA
> (which works fine). Apparently, it looks for the wrapper.bf file in the
> first tmp dir created, which is deleted in the end of the first SLAC call.
>
> If instead I choose to save the tempfiles ($slac->save_tempfiles('TRUE')),
> all calls to SLAC give returncode 1, and no error message is received.
> However, when I look at the resulting $result hashref, it turns out that
> all results are for the FIRST alignment read. I've made sure there is
> nothing strange with my loop procedure, and I checked that the tree and
> alignment objects look OK for each MSA. Apparently, it does create new
> "results.tsv" files in the tmp directory after each run, but it is
> identical each time it's created. Also, it only creates ONE tmp directory,
> no matter how many times SLAC is executed (I would imagine it was supposed
> to save each result in separate tmp dirs?)
>
> Thus, it seems to me like the errors occur because something goes wrong in
> the creation of temporary files. Have I done something wrong here, or have
> any other of you experienced the same problem?
>
> Best regards
> /Johan
>
>
> --
> Johan Nilsson, Ph.D.
> School of Life Sciences
> S?dert?rns University College
> S-141 89 Huddinge, Sweden
> E-mail: johan.nilsson at sh.se
> Phone: +46 8 608 47 05, +46 70 456 10 51
>


From J.Hane at murdoch.edu.au  Mon Dec 10 02:31:17 2007
From: J.Hane at murdoch.edu.au (James Hane)
Date: Mon, 10 Dec 2007 16:31:17 +0900
Subject: [Bioperl-l] Compiling bioperl with perl2exe for win32
In-Reply-To: <mailman.6533.1196225860.2694.bioperl-l@lists.open-bio.org>
References: <mailman.6533.1196225860.2694.bioperl-l@lists.open-bio.org>
Message-ID: <477A8450F426E34DBD5B2E7C6FA82D54B59489@PLUTO.ad.murdoch.edu.au>

I've been trying to compile some BioPerl-based scripts for win32 using
perl2exe, which has worked out really well, except that I cannot
compile Bio::AlignIO, Bio::Location::Simple or Bio::Location::Atomic
despite explicitly telling perl2exe to include them.  Does anyone have
any suggestions on how to get these to compile?


From Kevin.M.Brown at asu.edu  Mon Dec 10 10:34:35 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Mon, 10 Dec 2007 08:34:35 -0700
Subject: [Bioperl-l] Compiling bioperl with perl2exe for win32
In-Reply-To: <477A8450F426E34DBD5B2E7C6FA82D54B59489@PLUTO.ad.murdoch.edu.au>
References: <mailman.6533.1196225860.2694.bioperl-l@lists.open-bio.org>
	<477A8450F426E34DBD5B2E7C6FA82D54B59489@PLUTO.ad.murdoch.edu.au>
Message-ID: <1A4207F8295607498283FE9E93B775B4041D0B82@EX02.asurite.ad.asu.edu>

I use PAR to create executables for Windows users and it works fine with
BioPerl.



From Kevin.M.Brown at asu.edu  Mon Dec 10 13:23:01 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Mon, 10 Dec 2007 11:23:01 -0700
Subject: [Bioperl-l] [StandAloneBLAST] Use more than one CPU +
	avoidBLAST reload
In-Reply-To: <a79f6a4b0712061558m663fd1ces6bba9ae9d5602d67@mail.gmail.com>
References: <47545590.1000703@boekhoff.info>
	<a79f6a4b0712061558m663fd1ces6bba9ae9d5602d67@mail.gmail.com>
Message-ID: <1A4207F8295607498283FE9E93B775B4041D0CAD@EX02.asurite.ad.asu.edu>

I use the -a option with blast all the time and it works, even on
multicore systems. 



From nadav.denekamp at gmail.com  Wed Dec 12 08:29:18 2007
From: nadav.denekamp at gmail.com (Nadav Y. Denekamp)
Date: Wed, 12 Dec 2007 15:29:18 +0200
Subject: [Bioperl-l] Fetch sequences from a fasta file using a list of
	idenifiers
Message-ID: <001101c83cc3$00aa28e0$5b00000a@ESTHERLAB2>

Hello,

I am trying to retrieve a list of sequences from an indexed flat FASTA
file. I tried to use the script bp_fetch.pl, but I could only retrieve
one sequence for one identifier. I am looking for a way to provide a
list of accession numbers to a script and retrieve the corresponding
sequences. I don't have much experience with Perl, so I apologize if
this question is very basic.
thanks - Nadav
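
As a rough illustration of the task described above, here is a minimal
sketch in core Perl. It does not use bp_fetch.pl or any index: it simply
scans the FASTA file once and keeps the records whose identifier is in
the list. The function name fetch_fasta_by_ids and the file names in the
usage comment are made up for the example, and for large files BioPerl's
own index modules would probably be the more natural route.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read a FASTA file and return a hash of id => sequence for the wanted ids.
# The id is taken as the first whitespace-delimited word after '>'.
sub fetch_fasta_by_ids {
    my ($fasta_file, @ids) = @_;
    my %wanted = map { $_ => 1 } @ids;
    my (%found, $current);

    open my $fh, '<', $fasta_file or die "Cannot open $fasta_file: $!";
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^>(\S+)/) {
            # New record: remember its id only if it was requested
            $current = $wanted{$1} ? $1 : undef;
            $found{$current} = '' if defined $current;
        }
        elsif (defined $current) {
            $line =~ s/\s+//g;            # drop whitespace inside sequence lines
            $found{$current} .= $line;
        }
    }
    close $fh;
    return %found;
}

# Usage (hypothetical file names): perl fetch_ids.pl sequences.fasta ids.txt
if (@ARGV == 2) {
    my ($fasta_file, $id_file) = @ARGV;
    open my $in, '<', $id_file or die "Cannot open $id_file: $!";
    my @ids = map { /(\S+)/ ? $1 : () } <$in>;   # one id per line, blanks skipped
    close $in;

    my %seqs = fetch_fasta_by_ids($fasta_file, @ids);
    for my $id (@ids) {
        next unless exists $seqs{$id};
        print ">$id\n$seqs{$id}\n";
    }
}
```

The id list is expected to hold one identifier per line; identifiers not
present in the FASTA file are silently skipped.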


------------------------------------------------------------------------------------------------------------
Nadav Y. Denekamp, Ph.D.,
Israel Oceanographic and Limnological Research,
National Institute for Oceanography 
Tel-Shikmona, Haifa, 31080.
Tel: 972-4-8565259
Fax: 972-4-8511911
mobile: 972-50-2167318
Skype: nadavden
Email: nadavd at ocean.org.il; nadav.denekamp at gmail.com;

Visit the "Sleeping Beauty" website:
http://www.gmm.gu.se/SB


From biojoiner at gmail.com  Wed Dec 12 08:06:42 2007
From: biojoiner at gmail.com (=?GB2312?B?s8y35Q==?=)
Date: Wed, 12 Dec 2007 21:06:42 +0800
Subject: [Bioperl-l] problem_About_Bioperl_Installation
Message-ID: <e1a861900712120506y2120c90bp648b56d876d1849f@mail.gmail.com>

Dear Admin:

    I have a computer that has no network connection, but I would like to
install BioPerl on it.
    All the installation methods I found need network access to CPAN to
fetch the required packages. Is there a complete installation program that
I can use on a network-isolated computer, or some other way to solve the
problem?
    I look forward to your kind answer.
     Thanks very much!

-- 

============================================================
Feng Cheng

Division of HapMap Project
Beijing Institute of Genomics, Chinese Academy of Sciences (CAS)
Beijing Airport Industrial Zone B-6, Beijing, 101318, China
Tel: +86-10-80481102/1176
E-mail: chengf at genomics.org.cn
http://www.big.ac.cn/
============================================================


From avilella at gmail.com  Wed Dec 12 09:50:16 2007
From: avilella at gmail.com (Albert Vilella)
Date: Wed, 12 Dec 2007 14:50:16 +0000
Subject: [Bioperl-l] problem_About_Bioperl_Installation
In-Reply-To: <e1a861900712120506y2120c90bp648b56d876d1849f@mail.gmail.com>
References: <e1a861900712120506y2120c90bp648b56d876d1849f@mail.gmail.com>
Message-ID: <358f4d650712120650u2ef40089ofe27725ea8497dd7@mail.gmail.com>

You can also download the tar.gz packages from the bioperl.org
website and copy them to the computer. Then unpack the tarballs
and add the unpacked directory to your PERL5LIB environment variable.
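
To confirm that the unpacked copy is actually visible to perl, something like the following sketch may help. It uses only core Perl; locate_module is a hypothetical helper name, and the directories to check are whatever you pass on the command line plus the current @INC (which already reflects PERL5LIB).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Return the full path of the first file in @dirs that provides $module,
# or undef if none of the directories contains it.
# (locate_module is a hypothetical helper, not part of BioPerl.)
sub locate_module {
    my ($module, @dirs) = @_;
    (my $rel = "$module.pm") =~ s{::}{/}g;      # Bio::SeqIO -> Bio/SeqIO.pm
    for my $dir (grep { !ref } @dirs) {          # skip code-ref hooks in @INC
        my $path = File::Spec->catfile($dir, split m{/}, $rel);
        return $path if -f $path;
    }
    return;
}

# Directories given on the command line are searched first, then @INC.
my $found = locate_module('Bio::SeqIO', @ARGV, @INC);
print defined $found ? "Bio::SeqIO found at $found\n"
                     : "Bio::SeqIO not found; check PERL5LIB\n";
```

Run it with the unpack directory as an argument before and after setting PERL5LIB; once PERL5LIB is correct, the module should be found without any argument.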

On Dec 12, 2007 1:06 PM, Feng Cheng <biojoiner at gmail.com> wrote:
> Dear Admin:
>
>     I have a computer which out of network service, but wanted to have
> bioperl installed in it.
>     I found the installation method all need net to link CPAN to get the
> pakage needed, so is there some complete installation program for me to
> install it in a net-isolated computer, or some other method to solve the
> problom?
>     Wait for your kindful answer.
>      Thanks very much!
>
> --
>
> ============================================================
> Feng Cheng
>
> Division of HapMap Project
> Beijing Institute of Genomics, Chinese Academy of Sciences (CAS)
> Beijing Airport Industrial Zone B-6, Beijing, 101318, China
> Tel: +86-10-80481102/1176
> E-mail: chengf at genomics.org.cn
> http://www.big.ac.cn/
> ============================================================
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From cjfields at uiuc.edu  Wed Dec 12 10:22:45 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 12 Dec 2007 09:22:45 -0600
Subject: [Bioperl-l] Fetch sequences from a fasta file using a list of
	idenifiers
In-Reply-To: <001101c83cc3$00aa28e0$5b00000a@ESTHERLAB2>
References: <001101c83cc3$00aa28e0$5b00000a@ESTHERLAB2>
Message-ID: <E95ADE14-FF71-4068-B958-60BD1EEEBF3C@uiuc.edu>

If you use Bio::Index::Fasta (which is what bp_index.pl uses for FASTA  
files) then you can write up your own script.  From 'perldoc  
Bio::Index::Fasta':

# Once the index is made it can be accessed, either in the
# same script or a different one
use Bio::Index::Fasta;
use Bio::SeqIO;
use strict;

my $Index_File_Name = shift;
my $inx = Bio::Index::Fasta->new(-filename => $Index_File_Name);
my $out = Bio::SeqIO->new(-format => 'Fasta',
                           -fh => \*STDOUT);

# Remaining command-line arguments are the sequence IDs to fetch
foreach my $id (@ARGV) {
     my $seq = $inx->fetch($id); # Returns Bio::Seq object
     $out->write_seq($seq);
}

# or, alternatively
my $id;
my $seq = $inx->get_Seq_by_id($id); # identical to fetch()


....
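
For a long list of accession numbers kept in a file rather than typed on the command line, the same calls can be wrapped as in the sketch below. The script name, the one-ID-per-line file layout and the read_ids helper are assumptions for illustration; only Bio::Index::Fasta->new, fetch, Bio::SeqIO->new and write_seq come from the documentation quoted above. The eval around fetch is there because I am not certain whether an unknown ID gives undef or an exception; it copes with both.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read identifiers from a filehandle: one per line, ignoring blank lines,
# surrounding whitespace and lines that start with '#'.
# (read_ids is a hypothetical helper, not part of BioPerl.)
sub read_ids {
    my ($fh) = @_;
    my @ids;
    while (my $line = <$fh>) {
        $line =~ s/^\s+|\s+$//g;
        next if $line eq '' or $line =~ /^#/;
        push @ids, $line;
    }
    return @ids;
}

# Usage (assumed): fetch_list.pl <index file> <file with one ID per line>
if (@ARGV == 2) {
    require Bio::Index::Fasta;
    require Bio::SeqIO;

    my ($index_file, $id_file) = @ARGV;
    open my $fh, '<', $id_file or die "Cannot open $id_file: $!";
    my @ids = read_ids($fh);
    close $fh;

    my $inx = Bio::Index::Fasta->new(-filename => $index_file);
    my $out = Bio::SeqIO->new(-format => 'Fasta', -fh => \*STDOUT);

    for my $id (@ids) {
        # Handle both an exception and a false return for unknown IDs
        my $seq = eval { $inx->fetch($id) };
        if ($seq) { $out->write_seq($seq) }
        else      { warn "No sequence found for '$id'\n" }
    }
}
```

Redirect STDOUT to a file to collect the sequences; IDs that are not in the index are reported on STDERR instead of stopping the run.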

chris

On Dec 12, 2007, at 7:29 AM, Nadav Y. Denekamp wrote:

> Hello,
>
> I am trying to retrieve a list of sequences from an indexed flast  
> FASTA file. I tried to use the script bp_fetch.pl but I could only  
> retrieve one sequence for one identifier. I am looking for a way to  
> provide a list of accession numbers to a script and to retrieve the  
> sequences. I don't have much experience with perl so I appologize if  
> this question is very basic
> thanks - Nadav
>
>
> ------------------------------------------------------------------------------------------------------------
> Nadav Y. Denekamp, Ph.D.,
> Israel Oceanographic and Limnological Research,
> National Institute for Oceanography
> Tel-Shikmona, Haifa, 31080.
> Tel: 972-4-8565259
> Fax: 972-4-8511911
> mobile: 972-50-2167318
> Skype: nadavden
> Email: nadavd at ocean.org.il; nadav.denekamp at gmail.com;
>
> Visit the "Sleeping Beauty" website:
> http://www.gmm.gu.se/SB
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From karchana at ibab.ac.in  Thu Dec 13 22:56:14 2007
From: karchana at ibab.ac.in (Information_details)
Date: Thu, 13 Dec 2007 19:56:14 -0800 (PST)
Subject: [Bioperl-l]  How to get the contents?
Message-ID: <14329679.post@talk.nabble.com>


Hi,

I am new to BioPerl.

I am using the module Bio::SeqIO.

I have a GenBank file: http://www.nabble.com/file/p14329679/seq.gb seq.gb

In this file I have to match the gene tag and get all its contents.

Which function do I have to use?

The gene portion looks like this:

 gene            1..485
                     /gene="PRM1"
                     /note="Derived by automated computational analysis using
                     gene prediction method: BestRefseq. Supporting evidence
                     includes similarity to: 1 mRNA"
                     /db_xref="GeneID:5619"
                     /db_xref="HGNC:9447"

I have to match the gene tag and get its contents.

[CODE]
$seq=$seqobj->next_seq();

foreach $feat ($seq->get_all_SeqFeatures())
 {
        if($feat->primary_tag eq "mRNA")
        {
                foreach $tag ($feat->get_all_tags())
                {
                        if($tag eq "gene")
                        {
                            # here I have to retrieve the information like this:
                            #   1..485
                            #   /gene="PRM1"
                        }
                 }
         }
 }
[/CODE]
How do I do that?

With regards
Archana


-- 
View this message in context: http://www.nabble.com/How-to-get-the-contents--tp14329679p14329679.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From mike.thon at gmail.com  Fri Dec 14 12:41:44 2007
From: mike.thon at gmail.com (Michael Thon)
Date: Fri, 14 Dec 2007 18:41:44 +0100
Subject: [Bioperl-l] How to get the contents?
In-Reply-To: <14329679.post@talk.nabble.com>
References: <14329679.post@talk.nabble.com>
Message-ID: <9F93893E-182A-4A5F-B27C-089521CAA355@gmail.com>

Hi Information_details, a.k.a. Archana :)

"1", and "485" can be retrieved with something like:

$feat->start();
$feat->end();

if you want start and end of each exon then you need:
my $location = $feat->location();

which returns a Bio::LocationI object.

I think the 'gene' tag is a tag-value pair that  can be retrieved with:

my @values = $feat->get_tag_values("gene");
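
Putting the calls above together, a minimal sketch might look like this. summarize_feature is a hypothetical helper name; has_tag is used as a guard because, as far as I know, get_tag_values raises an exception when the tag is absent. The loop filters on the 'gene' primary tag, since that is the feature shown in the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a one-line summary for a feature object: "start..end" followed by
# every value of the requested tag.
# (summarize_feature is a hypothetical helper, not part of BioPerl.)
sub summarize_feature {
    my ($feat, $tag) = @_;
    my $summary = $feat->start . '..' . $feat->end;
    if ($feat->has_tag($tag)) {
        $summary .= qq{ /$tag="$_"} for $feat->get_tag_values($tag);
    }
    return $summary;
}

# Usage (assumed): script.pl seq.gb
if (@ARGV) {
    require Bio::SeqIO;
    my $in = Bio::SeqIO->new(-file => $ARGV[0], -format => 'genbank');
    while (my $seq = $in->next_seq) {
        for my $feat ($seq->get_all_SeqFeatures) {
            next unless $feat->primary_tag eq 'gene';
            print summarize_feature($feat, 'gene'), "\n";
        }
    }
}
```

For the feature in the question this should give a line of the form 1..485 /gene="PRM1"; swap 'gene' for 'db_xref' or 'note' to pull out the other qualifiers.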

-Mike


On Dec 14, 2007, at 4:56 AM, Information_details wrote:

>
> Hi,
>
> I am new to bioperl.
>
> I am using module  Bio::SeqIO;
>
> I have genbank file. http://www.nabble.com/file/p14329679/seq.gb  
> seq.gb
>
> In this file i have to match gene tag and get all its contents.
>
> which function i have to use?
>
> The gene portion look like this
>
> gene            1..485
>                     /gene="PRM1"
>                     /note="Derived by automated computational analysis
> using
>                     gene prediction method: BestRefseq. Supporting  
> evidence
>                     includes similarity to: 1 mRNA"
>                     /db_xref="GeneID:5619"
>                     /db_xref="HGNC:9447"
>
> i have to match gene tag and get its contents?
>
> [CODE]
> $seq=$seqobj->next_seq();
>
> foreach $feat ($seq->get_all_SeqFeatures())
> {
>        if($feat->primary_tag eq "mRNA")
>        {
>                foreach $tag ($feat->get_all_tags())
>                {
>                        if($tag eq "gene")
>                        {
>                            #here i have to retrieve the information  
> like
> this.
>                           1..485
>                         /gene="PRM1"
>                        }
>                 }
>         }
> [/CODE]
> How do i do that?
>
> with regards
> Archana
>
>
>
>
> -- 
> View this message in context: http://www.nabble.com/How-to-get-the-contents--tp14329679p14329679.html
> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at uiuc.edu  Sat Dec 15 10:15:00 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 15 Dec 2007 09:15:00 -0600
Subject: [Bioperl-l] [ANNOUNCEMENT] CVS freeze
Message-ID: <9FE0873D-E009-42E6-B37A-32584655ED06@uiuc.edu>

All,

We are in the midst of switching over BioPerl from CVS to SVN.  We are  
tentatively freezing the bioperl CVS repository Dec. 19 in order to  
prepare for the switch.  At that time we plan on building and setting  
up the SVN repository, running some remedial tests (commit messages,  
etc), then announcing the switch on the list.  Soon after we will try  
getting a sync'ed read-only CVS set up for legacy purposes.

If anyone has any commits to add to the repository we suggest making  
them as soon as possible.

chris


From margots at mail.nih.gov  Tue Dec 18 10:00:11 2007
From: margots at mail.nih.gov (Margot Sunshine)
Date: Tue, 18 Dec 2007 15:00:11 +0000 (UTC)
Subject: [Bioperl-l] bio-perl cvs freeze
Message-ID: <loom.20071218T145502-552@post.gmane.org>

Hi,

I have been trying to checkout bio-perl from cvs since yesterday afternoon 
(Dec 17). My request just hangs. I can login but I cannot checkout anything. 
My reading of your posting of the planned switch from CVS to SVN seemed to 
indicate that this was not to take place until tomorrow. Help!

Thanks,
Margot Sunshine


From ste.ghi at libero.it  Tue Dec 18 13:04:21 2007
From: ste.ghi at libero.it (Stefano Ghignone)
Date: Tue, 18 Dec 2007 19:04:21 +0100
Subject: [Bioperl-l] dealing with large files
Message-ID: <JT9BJ9$83113BA715E1F7CF3B9D29BFFCC4B0CF@libero.it>

Dear all,
  I'm facing a really annoying problem with handling large files.
I wrote a script (below) which should read sequences from an EMBL-formatted file and write them out in a customized FASTA format. The script works, but since the input file is rather big (5.6 GB unzipped, 987 MB zipped), after a while all the physical and virtual memory of my workstation (4 GB RAM) is filled and the script is killed...
I really don't know how to avoid this huge memory usage... and now I'm wondering if this is the right approach....
Please help me!
Best wishes,
Stefano 


#################
#!/usr/bin/perl -w

use strict;

use warnings;

use Fcntl;
use Cwd;

use Bio::SeqIO;

my $infile = $ARGV[0];
my $outfile = "$ARGV[0].fasta";
my $organism;
my $count;
my $path = cwd()."/$outfile";

print "Working dir is: ".cwd().".\nCreating file: $path\n";

my $in  = Bio::SeqIO->new(-file => "/bin/gunzip -c $infile |", -format => 'EMBL');

while ( my $seq = $in->next_seq() ) {
	sysopen(TO, $path, O_WRONLY | O_APPEND | O_CREAT);  
	my $id = $seq->accession_number();	
	my $desc = $seq->desc(); chop $desc;
	my $species = $seq->species->binomial();
	my $subspecies = $seq->species->sub_species();
	if ($seq->species->sub_species()) {chop $subspecies; $organism = $species." ".$subspecies;}
		else {$organism = $species;}
	my $sequence = $seq->seq();
	print TO ">$id $desc [$organism]\n$sequence\n";
    	$count++;
	warn $@ if $@;
	close TO;
}

print "Done!\n\t$count sequences have been treated. The file $ARGV[0].fasta is ready.\n";


From jason at bioperl.org  Tue Dec 18 13:22:07 2007
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 18 Dec 2007 10:22:07 -0800
Subject: [Bioperl-l] bio-perl cvs freeze
In-Reply-To: <loom.20071218T145502-552@post.gmane.org>
References: <loom.20071218T145502-552@post.gmane.org>
Message-ID: <681FB463-13A5-4B35-923B-29A91F07D72B@bioperl.org>

Margot -
The code freeze won't affect the anonymous CVS, and we'll likely  
keep anonymous CVS as is (and maybe even figure out how to keep it  
updated with the SVN) since external tools depend on it and have  
published CVS instructions.

I was able to do an anonymous checkout fine on my machine just now --  
if the problem persists please send a message to support at open-bio.org  
and the support volunteers will track it from there.

-jason
On Dec 18, 2007, at 7:00 AM, Margot Sunshine wrote:

> Hi,
>
> I have been trying to checkout bio-perl from cvs since yesterday  
> afternoon
> (Dec 17). My request just hangs. I can login but I cannot checkout  
> anything.
> My reading of your posting of the planned switch from CVS to SVN  
> seemed to
> indicate that this was not to take place until tomorrow. Help!
>
> Thanks,
> Margot Sunshine
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From jason at bioperl.org  Tue Dec 18 13:31:39 2007
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 18 Dec 2007 10:31:39 -0800
Subject: [Bioperl-l] dealing with large files
In-Reply-To: <JT9BJ9$83113BA715E1F7CF3B9D29BFFCC4B0CF@libero.it>
References: <JT9BJ9$83113BA715E1F7CF3B9D29BFFCC4B0CF@libero.it>
Message-ID: <9CCCD509-EAFA-4528-B045-90910E19B41F@bioperl.org>

Not exactly clear why you aren't using Bio::SeqIO to write the  
sequence back out in FASTA format and why you are re-opening the file  
each time?

Did you look at the examples that show how to convert file formats?
http://bioperl.org/wiki/HOWTO:SeqIO

You can set the description with
$seq->description($newdescription);
and the ID with
$seq->display_id($newid);
before writing.
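
As a rough sketch of that suggestion (not a fix for the memory problem itself), the conversion could be written with one Bio::SeqIO writer that is opened once. relabel is a hypothetical helper; it uses desc(), the accessor already used in the original script, and the header layout ">accession description [organism]" is taken from that script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Rewrite ID and description in place so the FASTA header becomes
# ">accession description [organism]".
# (relabel is a hypothetical helper, not part of BioPerl.)
sub relabel {
    my ($seq, $organism) = @_;
    my $desc = $seq->desc;
    $desc = '' unless defined $desc;
    $desc =~ s/\.\s*$//;                       # drop a trailing full stop
    $seq->display_id($seq->accession_number);
    $seq->desc($organism ne '' ? "$desc [$organism]" : $desc);
    return $seq;
}

# Usage (assumed): convert.pl input.dat.gz output.fasta
if (@ARGV == 2) {
    require Bio::SeqIO;
    my ($infile, $outfile) = @ARGV;
    my $in  = Bio::SeqIO->new(-file => "gunzip -c $infile |", -format => 'EMBL');
    my $out = Bio::SeqIO->new(-file => ">$outfile",           -format => 'fasta');
    while (my $seq = $in->next_seq) {
        my $species  = $seq->species;
        my $organism = $species ? $species->binomial : '';
        $out->write_seq(relabel($seq, $organism));
    }
}
```

The output file is opened once by the writer, so there is no need to re-open it for every record.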

It isn't clear to me from your code why it would be leaking memory  
and causing a problem - is it possible that you have a huge sequence  
in the EMBL file?

-jason
On Dec 18, 2007, at 10:04 AM, Stefano Ghignone wrote:

> Dear all,
>   I'm facing with a really annoying problem regarding large files  
> handling.
> I wrote a script (below) which should keep sequences from an embl  
> formatted file and write out the sequences in a customized fasta  
> format. The script works, but since the input file is rather big  
> 5.6 GB unzipped (987 MB zipped), after a while all the physical and  
> virtual memories of my workstation (4GB RAM) are filled and the  
> script is killed...
> I really don't know how to avoid this huge memory usage...and now  
> I'm wondering if this is the right approach....
> Please help me!
> Best wishes,
> Stefano
>
>
>
> #################
> #!/usr/bin/perl -w
>
> use strict;
>
> use warnings;
>
> use Fcntl;
> use Cwd;
>
> use Bio::SeqIO;
>
> my $infile = $ARGV[0];
> my $outfile = "$ARGV[0].fasta";
> my $organism;
> my $count;
> my $path = cwd()."/$outfile";
>
> print "Working dir is: ".cwd().".\nCreating file: $path\n";
>
> my $in  = Bio::SeqIO->new(-file => "/bin/gunzip -c $infile |", - 
> format => 'EMBL');
>
> while ( my $seq = $in->next_seq() ) {
> 	sysopen(TO, $path, O_WRONLY | O_APPEND | O_CREAT);
> 	my $id = $seq->accession_number();	
> 	my $desc = $seq->desc(); chop $desc;
> 	my $species = $seq->species->binomial();
> 	my $subspecies = $seq->species->sub_species();
> 	if ($seq->species->sub_species()) {chop $subspecies; $organism =  
> $species." ".$subspecies;}
> 		else {$organism = $species;}
> 	my $sequence = $seq->seq();
> 	print TO ">$id $desc [$organism]\n$sequence\n";
>     	$count++;
> 	warn $@ if $@;
> 	close TO;
> }
>
> print "Done!\n\t$count sequences have been treated. The file $ARGV 
> [0].fasta is ready.\n";
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cain.cshl at gmail.com  Tue Dec 18 14:04:11 2007
From: cain.cshl at gmail.com (Scott Cain)
Date: Tue, 18 Dec 2007 14:04:11 -0500
Subject: [Bioperl-l] bio-perl cvs freeze
In-Reply-To: <681FB463-13A5-4B35-923B-29A91F07D72B@bioperl.org>
References: <loom.20071218T145502-552@post.gmane.org>
	<681FB463-13A5-4B35-923B-29A91F07D72B@bioperl.org>
Message-ID: <1198004651.11000.19.camel@frissell>

Hi Jason and all,

Does the fact that cvs is sticking around (read only) mean that viewcvs
(the web interface) will stick around too?  I was thinking about
modifying the GBrowse net installer to use the 'automatic' tarball of
bioperl-live to download and install via nmake on Windows since it
doesn't have cvs support built in.  Also, with cvs sticking around, I
don't need to rewrite the installer to use svn (yeah!).

Thanks,
Scott

On Tue, 2007-12-18 at 10:22 -0800, Jason Stajich wrote:
> Margot -
> The code freeze won't affect the the anonymous cvs, and we'll likely  
> keep anonymous CVS as is (and maybe even figure out how to keep it  
> updated with the SVN) since external tools depend on it and have  
> published CVS instructions.
> 
> I was able to do an anonymous checkout fine on my machine just now --  
> if the problem persists please send a message to support at open-bio.org  
> and the support volunteers will track it from there.
> 
> -jason
> On Dec 18, 2007, at 7:00 AM, Margot Sunshine wrote:
> 
> > Hi,
> >
> > I have been trying to checkout bio-perl from cvs since yesterday  
> > afternoon
> > (Dec 17). My request just hangs. I can login but I cannot checkout  
> > anything.
> > My reading of your posting of the planned switch from CVS to SVN  
> > seemed to
> > indicate that this was not to take place until tomorrow. Help!
> >
> > Thanks,
> > Margot Sunshine
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory


From jason at bioperl.org  Tue Dec 18 14:20:11 2007
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 18 Dec 2007 11:20:11 -0800
Subject: [Bioperl-l] bio-perl cvs freeze
In-Reply-To: <1198004651.11000.19.camel@frissell>
References: <loom.20071218T145502-552@post.gmane.org>
	<681FB463-13A5-4B35-923B-29A91F07D72B@bioperl.org>
	<1198004651.11000.19.camel@frissell>
Message-ID: <4560525E-AE12-45BF-A174-3B8E3669C2B9@bioperl.org>


On Dec 18, 2007, at 11:04 AM, Scott Cain wrote:

> Hi Jason and all,
>
> Does the fact that cvs is sticking around (read only) mean that  
> viewcvs
> (the web interface) will stick around too?  I was thinking about
> modifying the GBrowse net installer to use the 'automatic' tarball of
> bioperl-live to download and install via nmake on Windows since it
> doesn't have cvs support built in.  Also, with cvs sticking around, I
> don't need to rewrite the installer to use svn (yeah!).
>
Hey Scott -

Perhaps. There may be better tools with SVN anyway. We could also
just set up a script that tarballs the already auto-updated
code here (I think it syncs every hour):
http://bioperl.org/SRC/

We're still playing around with this and I can't guarantee that we'll
get the SVN commits back to CVS to work.

-jason
> Thanks,
> Scott
>
> On Tue, 2007-12-18 at 10:22 -0800, Jason Stajich wrote:
>> Margot -
>> The code freeze won't affect the the anonymous cvs, and we'll likely
>> keep anonymous CVS as is (and maybe even figure out how to keep it
>> updated with the SVN) since external tools depend on it and have
>> published CVS instructions.
>>
>> I was able to do an anonymous checkout fine on my machine just now --
>> if the problem persists please send a message to support at open-bio.org
>> and the support volunteers will track it from there.
>>
>> -jason
>> On Dec 18, 2007, at 7:00 AM, Margot Sunshine wrote:
>>
>>> Hi,
>>>
>>> I have been trying to checkout bio-perl from cvs since yesterday
>>> afternoon
>>> (Dec 17). My request just hangs. I can login but I cannot checkout
>>> anything.
>>> My reading of your posting of the planned switch from CVS to SVN
>>> seemed to
>>> indicate that this was not to take place until tomorrow. Help!
>>>
>>> Thanks,
>>> Margot Sunshine
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> -- 
> ---------------------------------------------------------------------- 
> --
> Scott Cain, Ph. D.                                          
> cain at cshl.edu
> GMOD


From cain.cshl at gmail.com  Tue Dec 18 14:31:23 2007
From: cain.cshl at gmail.com (Scott Cain)
Date: Tue, 18 Dec 2007 14:31:23 -0500
Subject: [Bioperl-l] bio-perl cvs freeze
In-Reply-To: <4560525E-AE12-45BF-A174-3B8E3669C2B9@bioperl.org>
References: <loom.20071218T145502-552@post.gmane.org>
	<681FB463-13A5-4B35-923B-29A91F07D72B@bioperl.org>
	<1198004651.11000.19.camel@frissell>
	<4560525E-AE12-45BF-A174-3B8E3669C2B9@bioperl.org>
Message-ID: <1198006283.11000.20.camel@frissell>

Cool.  For the moment, I'll just wait and see what happens :-)

Thanks,
Scott

On Tue, 2007-12-18 at 11:20 -0800, Jason Stajich wrote:
> On Dec 18, 2007, at 11:04 AM, Scott Cain wrote:
> 
> > Hi Jason and all,
> >
> > Does the fact that cvs is sticking around (read only) mean that  
> > viewcvs
> > (the web interface) will stick around too?  I was thinking about
> > modifying the GBrowse net installer to use the 'automatic' tarball of
> > bioperl-live to download and install via nmake on Windows since it
> > doesn't have cvs support built in.  Also, with cvs sticking around, I
> > don't need to rewrite the installer to use svn (yeah!).
> >
> Hey Scott -
> 
> Perhaps, there may be better tools with SVN anyways, we could also  
> just instantiate a script that tarballed the already auto-updated  
> code here (i think it syncs every hour):
> http://bioperl.org/SRC/
> 
> We'll still playing around with this and I can't guarantee that we'll  
> get the SVN commits back to CVS to work.
> 
> -jason
> > Thanks,
> > Scott
> >
> > On Tue, 2007-12-18 at 10:22 -0800, Jason Stajich wrote:
> >> Margot -
> >> The code freeze won't affect the the anonymous cvs, and we'll likely
> >> keep anonymous CVS as is (and maybe even figure out how to keep it
> >> updated with the SVN) since external tools depend on it and have
> >> published CVS instructions.
> >>
> >> I was able to do an anonymous checkout fine on my machine just now --
> >> if the problem persists please send a message to support at open-bio.org
> >> and the support volunteers will track it from there.
> >>
> >> -jason
> >> On Dec 18, 2007, at 7:00 AM, Margot Sunshine wrote:
> >>
> >>> Hi,
> >>>
> >>> I have been trying to checkout bio-perl from cvs since yesterday
> >>> afternoon
> >>> (Dec 17). My request just hangs. I can login but I cannot checkout
> >>> anything.
> >>> My reading of your posting of the planned switch from CVS to SVN
> >>> seemed to
> >>> indicate that this was not to take place until tomorrow. Help!
> >>>
> >>> Thanks,
> >>> Margot Sunshine
> >>>
> >>> _______________________________________________
> >>> Bioperl-l mailing list
> >>> Bioperl-l at lists.open-bio.org
> >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > -- 
> > ---------------------------------------------------------------------- 
> > --
> > Scott Cain, Ph. D.                                          
> > cain at cshl.edu
> > GMOD
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory


From avilella at gmail.com  Tue Dec 18 15:33:43 2007
From: avilella at gmail.com (Albert Vilella)
Date: Tue, 18 Dec 2007 20:33:43 +0000
Subject: [Bioperl-l] dealing with large files
In-Reply-To: <9CCCD509-EAFA-4528-B045-90910E19B41F@bioperl.org>
References: <JT9BJ9$83113BA715E1F7CF3B9D29BFFCC4B0CF@libero.it>
	<9CCCD509-EAFA-4528-B045-90910E19B41F@bioperl.org>
Message-ID: <358f4d650712181233q2a1627c3v6fb4e3e20b9f6c78@mail.gmail.com>

There is a Bio::SeqIO "largefasta" object that will use the hard-disk
for very large fasta files.

On Dec 18, 2007 6:31 PM, Jason Stajich <jason at bioperl.org> wrote:
> Not exactly clear why you aren't using Bio::SeqIO to write the
> sequence back out in FASTA format and why you are re-opening the file
> each time?
>
> Did you look at the examples that show how to convert file formats?
> http://bioperl.org/wiki/HOWTO:SeqIO
>
> You can set the description with
> $seq->description($newdescription);
> and the ID with
> $seq->display_id($newid);
> before writing.
>
> It isn't clear to me from your code why it would be leaking memory
> and causing a problem - is it possible that you have a huge sequence
> in the EMBL file?
>
> -jason
>
> On Dec 18, 2007, at 10:04 AM, Stefano Ghignone wrote:
>
> > Dear all,
> >   I'm facing with a really annoying problem regarding large files
> > handling.
> > I wrote a script (below) which should keep sequences from an embl
> > formatted file and write out the sequences in a customized fasta
> > format. The script works, but since the input file is rather big
> > 5.6 GB unzipped (987 MB zipped), after a while all the physical and
> > virtual memories of my workstation (4GB RAM) are filled and the
> > script is killed...
> > I really don't know how to avoid this huge memory usage...and now
> > I'm wondering if this is the right approach....
> > Please help me!
> > Best wishes,
> > Stefano
> >
> >
> >
> > #################
> > #!/usr/bin/perl -w
> >
> > use strict;
> >
> > use warnings;
> >
> > use Fcntl;
> > use Cwd;
> >
> > use Bio::SeqIO;
> >
> > my $infile = $ARGV[0];
> > my $outfile = "$ARGV[0].fasta";
> > my $organism;
> > my $count;
> > my $path = cwd()."/$outfile";
> >
> > print "Working dir is: ".cwd().".\nCreating file: $path\n";
> >
> > my $in  = Bio::SeqIO->new(-file => "/bin/gunzip -c $infile |", -
> > format => 'EMBL');
> >
> > while ( my $seq = $in->next_seq() ) {
> >       sysopen(TO, $path, O_WRONLY | O_APPEND | O_CREAT);
> >       my $id = $seq->accession_number();
> >       my $desc = $seq->desc(); chop $desc;
> >       my $species = $seq->species->binomial();
> >       my $subspecies = $seq->species->sub_species();
> >       if ($seq->species->sub_species()) {chop $subspecies; $organism =
> > $species." ".$subspecies;}
> >               else {$organism = $species;}
> >       my $sequence = $seq->seq();
> >       print TO ">$id $desc [$organism]\n$sequence\n";
> >       $count++;
> >       warn $@ if $@;
> >       close TO;
> > }
> >
> > print "Done!\n\t$count sequences have been treated. The file $ARGV
> > [0].fasta is ready.\n";
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From cjfields at uiuc.edu  Tue Dec 18 21:29:19 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 18 Dec 2007 20:29:19 -0600
Subject: [Bioperl-l] perl 5.10 released
Message-ID: <AF986160-D1AB-4AEF-9588-90861F5B7B98@uiuc.edu>

The next major perl release, perl 5.10, has officially been released:

http://use.perl.org/article.pl?sid=07/12/18/195247

I'll try testing BioPerl with perl 5.10 and any relevant modules when
I can; this may have to wait until after the SVN migration.  If there are
any interested parties who want to test BioPerl compatibility with perl
5.10, feel free to post your results!

chris


From David.Messina at sbc.su.se  Wed Dec 19 11:44:06 2007
From: David.Messina at sbc.su.se (Dave Messina)
Date: Wed, 19 Dec 2007 10:44:06 -0600
Subject: [Bioperl-l] perl 5.10 released
In-Reply-To: <AF986160-D1AB-4AEF-9588-90861F5B7B98@uiuc.edu>
References: <AF986160-D1AB-4AEF-9588-90861F5B7B98@uiuc.edu>
Message-ID: <628aabb70712190844o17a40c2eva3ef863dc42afb6c@mail.gmail.com>

Hi everyone,

Perl 5.10 builds fine and passes all tests on my PB G4 running OS X 10.5.1.
Piece o' cake.

Here are results of testing BioPerl on this virgin install:

I downloaded the latest CVS tarball. I did 'perl Build.PL', which used CPAN
to install a bunch of dependencies. I then did 'Build test'. For the most
part everything was fine.

- Bio::Biblio::IO::medlinexml throws an exception because XML::Parser isn't
installed.

- RNA_SearchIO fails a few tests.

- Bio::Ontology::SimpleGOEngine::GraphAdaptor throws an exception because
Graph::Directed isn't installed.

- Spidey fails one test.

And of course without the optional dependencies installed, many tests were
skipped.

I'll now go back and install the optional dependencies and do the network
tests, but it looks like for the most part we play nice with the new Perl.
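
For anyone repeating this on their own install, a quick way to see which of the optional modules named above are missing before running the tests is sketched here. It uses core Perl only; module_available is a hypothetical helper name, and the two modules listed are just the ones that raised exceptions in this run.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return 1 if the named module can be loaded by this perl, 0 otherwise.
# (module_available is a hypothetical helper name.)
sub module_available {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # XML::Parser -> XML/Parser.pm
    return eval { require $file; 1 } ? 1 : 0;
}

for my $module (qw(XML::Parser Graph::Directed)) {
    printf "%-16s %s\n", $module,
        module_available($module) ? 'installed' : 'missing';
}
```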

Dave


From ste.ghi at libero.it  Wed Dec 19 11:45:15 2007
From: ste.ghi at libero.it (Stefano Ghignone)
Date: Wed, 19 Dec 2007 17:45:15 +0100
Subject: [Bioperl-l] dealing with large files
Message-ID: <JTB2JF$8DA035C3FA7C0AA4E73865996D18C568@libero.it>

> Not exactly clear why you aren't using Bio::SeqIO to write the  
> sequence back out in FASTA format and why you are re-opening the file  
> each time?
It was to avoid keeping the output file open all the time...

> Did you look at the examples that show how to convert file formats?
> http://bioperl.org/wiki/HOWTO:SeqIO
Yes, I did... but I didn't realize how to set a customized description...

> You can set the description with
> $seq->description($newdescription);
> and the ID with
> $seq->display_id($newid);
> before writing.
Thanks for the hint. Anyway, even with the simple code given there to convert EMBL to FASTA format, the result is the same... Let me remind you that I'm using a huge input file, uniprot_trembl_bacteria.dat.gz, which contains 13101418 sequences!

> It isn't clear to me from your code why it would be leaking memory  
> and causing a problem - is it possible that you have a huge sequence  
> in the EMBL file?
> -jason

In the end, I managed the format conversion with this command:

gunzip -c uniprot_trembl_bacteria.dat.gz | perl -ne 'print ">$1 " if
(/^AC\s+(\S+);/); print " $1" if (/^DE\s+(.*)/);print " [$1]\n" if
(/^OS\s+(.*)/); if (($a)=/^\s+(.*)/){$a=~s/ //g; print "$a\n"};'

(Thanks to Riccardo Percudani). It's not bioperl...but it works!
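
For readability, the same idea can be spelled out as a small script. This is only a sketch of the one-liner's logic in core Perl, not a replacement for a real parser: it assumes the AC/DE/OS/SQ line codes and the "//" terminator of the flat-file format, keeps only the first accession of each entry, and joins multi-line DE and OS fields with spaces. flatfile_to_fasta is a hypothetical name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stream a UniProt/EMBL-style flat file and write FASTA, one line at a time,
# so memory use does not grow with the size of the input.
# (flatfile_to_fasta is a hypothetical helper name.)
sub flatfile_to_fasta {
    my ($in, $out) = @_;
    my ($acc, @desc, @org);
    my $count = 0;
    while (my $line = <$in>) {
        chomp $line;
        if ($line =~ /^AC\s+(\S+?);/) {
            $acc = $1 unless defined $acc;       # primary accession only
        }
        elsif ($line =~ /^DE\s+(.*)/) { push @desc, $1 }
        elsif ($line =~ /^OS\s+(.*)/) { push @org,  $1 }
        elsif ($line =~ /^SQ\s/) {
            # All header fields have been seen; emit the FASTA header
            my $id = defined $acc ? $acc : 'unknown';
            print {$out} '>', $id, ' ', join(' ', @desc),
                         ' [', join(' ', @org), "]\n";
            $count++;
        }
        elsif ($line =~ /^\s+(\S.*)/) {
            # Sequence line: strip blanks and any position counters
            (my $residues = $1) =~ s/[\s\d]//g;
            print {$out} "$residues\n";
        }
        elsif ($line =~ m{^//}) {
            ($acc, @desc, @org) = ();            # reset for the next entry
        }
    }
    return $count;
}

# Usage (assumed): flat2fasta.pl uniprot_trembl_bacteria.dat.gz > out.fasta
if (@ARGV) {
    # Decompress on the fly; the list form of open avoids the shell
    open my $in, '-|', 'gunzip', '-c', $ARGV[0]
        or die "Cannot run gunzip on $ARGV[0]: $!";
    my $n = flatfile_to_fasta($in, \*STDOUT);
    close $in;
    warn "$n entries written\n";
}
```

Because nothing is kept between entries apart from the current header fields, memory use should stay flat however many sequences the file holds.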

My best wishes,
Stefano


> On Dec 18, 2007, at 10:04 AM, Stefano Ghignone wrote:
> 
> > Dear all,
> >   I'm facing with a really annoying problem regarding large files  
> > handling.
> > I wrote a script (below) which should keep sequences from an embl  
> > formatted file and write out the sequences in a customized fasta  
> > format. The script works, but since the input file is rather big  
> > 5.6 GB unzipped (987 MB zipped), after a while all the physical and  
> > virtual memories of my workstation (4GB RAM) are filled and the  
> > script is killed...
> > I really don't know how to avoid this huge memory usage...and now  
> > I'm wondering if this is the right approach....
> > Please help me!
> > Best wishes,
> > Stefano
> >
> >
> >
> > #################
> > #!/usr/bin/perl -w
> >
> > use strict;
> >
> > use warnings;
> >
> > use Fcntl;
> > use Cwd;
> >
> > use Bio::SeqIO;
> >
> > my $infile = $ARGV[0];
> > my $outfile = "$ARGV[0].fasta";
> > my $organism;
> > my $count;
> > my $path = cwd()."/$outfile";
> >
> > print "Working dir is: ".cwd().".\nCreating file: $path\n";
> >
> > my $in  = Bio::SeqIO->new(-file => "/bin/gunzip -c $infile |",
> >                           -format => 'EMBL');
> >
> > while ( my $seq = $in->next_seq() ) {
> > 	sysopen(TO, $path, O_WRONLY | O_APPEND | O_CREAT);
> > 	my $id = $seq->accession_number();	
> > 	my $desc = $seq->desc(); chop $desc;
> > 	my $species = $seq->species->binomial();
> > 	my $subspecies = $seq->species->sub_species();
> > 	if ($seq->species->sub_species()) {chop $subspecies; $organism =  
> > $species." ".$subspecies;}
> > 		else {$organism = $species;}
> > 	my $sequence = $seq->seq();
> > 	print TO ">$id $desc [$organism]\n$sequence\n";
> >     	$count++;
> > 	warn $@ if $@;
> > 	close TO;
> > }
> >
> > print "Done!\n\t$count sequences have been treated. "
> >     . "The file $ARGV[0].fasta is ready.\n";
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> 


From cjfields at uiuc.edu  Wed Dec 19 12:17:28 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 19 Dec 2007 11:17:28 -0600
Subject: [Bioperl-l] dealing with large files
In-Reply-To: <JTB2JF$8DA035C3FA7C0AA4E73865996D18C568@libero.it>
References: <JTB2JF$8DA035C3FA7C0AA4E73865996D18C568@libero.it>
Message-ID: <B92AAA3F-C93A-41EC-B68D-3E6F4053BBD4@uiuc.edu>


On Dec 19, 2007, at 10:45 AM, Stefano Ghignone wrote:

>> Not exactly clear why you aren't using Bio::SeqIO to write the
>> sequence back out in FASTA format and why you are re-opening the file
>> each time?
> It was to avoid tho keep the out file always opened...
>
>> Did you look at the examples that show how to convert file formats?
>> http://bioperl.org/wiki/HOWTO:SeqIO
> yes I did...but I didn't realized how to set a customized  
> description...
>
>> You can set the description with
>> $seq->description($newdescription);
>> and the ID with
>> $seq->display_id($newid);
>> before writing.
> Thanks for the hint. Anyway, just using the simple code reported to  
> convert embl to fasta format, the results are the same...I remember  
> you that I'm using a huge input file: the  
> uniprot_trembl_bacteria.dat.gz...it contains 13101418 sequences!
>
>> It isn't clear to me from your code why it would be leaking memory
>> and causing a problem - is it possible that you have a huge sequence
>> in the EMBL file?
>> -jason
>
> At the end, I succeeded in the format conversion using this command:
>
> gunzip -c uniprot_trembl_bacteria.dat.gz | perl -ne 'print ">$1 " if
> (/^AC\s+(\S+);/); print " $1" if (/^DE\s+(.*)/);print " [$1]\n" if
> (/^OS\s+(.*)/); if (($a)=/^\s+(.*)/){$a=~s/ //g; print "$a\n"};'
>
> (Thanks to Riccardo Percudani). It's not bioperl...but it works!
>
> My best wishes,
> Stefano


As this shows, BioPerl isn't always the best answer (I know,
blasphemy...).  As Jason suggested, it's quite likely that large
sequence records are causing your problems when using BioPerl.  The
one-liner works b/c it doesn't retain data (sequence, annotation, etc.)
in memory as a Bio::Seq object; it's a direct conversion.
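
For anyone who wants the same direct conversion as a script rather than a
one-liner, here is a minimal sketch.  It assumes UniProt-style flat-file
line codes (AC, DE, OS, SQ and the // terminator); the subroutine name
embl_to_fasta is made up for illustration and is not part of BioPerl.
Only the text of the current record is held in memory at any time.

```perl
use strict;
use warnings;

# Stream a UniProt/EMBL-style flat file to FASTA, one record at a time.
# $in and $out are already-opened filehandles.
sub embl_to_fasta {
    my ($in, $out) = @_;
    my ($acc, $desc, $org, $seq, $in_seq) = ('', '', '', '', 0);

    while (my $line = <$in>) {
        chomp $line;
        if ($line =~ /^AC\s+(\w+);/ && !$acc) {
            $acc = $1;                                  # first accession only
        }
        elsif ($line =~ /^DE\s+(.*)/) {
            $desc .= ($desc ? ' ' : '') . $1;           # DE may span several lines
        }
        elsif ($line =~ /^OS\s+(.*)/) {
            $org .= ($org ? ' ' : '') . $1;             # so may OS
        }
        elsif ($line =~ /^SQ\s/) {
            $in_seq = 1;                                # sequence lines follow
        }
        elsif ($line =~ m{^//}) {
            # End of record: write it out and forget everything about it.
            print {$out} ">$acc $desc [$org]\n$seq\n";
            ($acc, $desc, $org, $seq, $in_seq) = ('', '', '', '', 0);
        }
        elsif ($in_seq) {
            $line =~ tr/A-Za-z//cd;                     # drop spaces and digits
            $seq .= $line;
        }
    }
    return;
}
```

Ending the script with embl_to_fasta(\*STDIN, \*STDOUT) lets it sit at the
end of the same "gunzip -c ... |" pipe as the one-liner.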

It would be nice to code up a lazy sequence object and related  
parsers; maybe for the next dev release.

chris


From cjfields at uiuc.edu  Wed Dec 19 12:08:31 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 19 Dec 2007 11:08:31 -0600
Subject: [Bioperl-l] perl 5.10 released
In-Reply-To: <628aabb70712190844o17a40c2eva3ef863dc42afb6c@mail.gmail.com>
References: <AF986160-D1AB-4AEF-9588-90861F5B7B98@uiuc.edu>
	<628aabb70712190844o17a40c2eva3ef863dc42afb6c@mail.gmail.com>
Message-ID: <C7FF12CB-278A-4B47-8D04-F2F038C35AB2@uiuc.edu>


On Dec 19, 2007, at 10:44 AM, Dave Messina wrote:

> Hi everyone,
>
>
> Perl 5.10 builds fine and passes all tests on my PB G4 running OS X  
> 10.5.1. Piece o' cake.
>
> Here are results of testing BioPerl on this virgin install:
>
> I downloaded the latest CVS tarball. I did 'perl Build.PL', which  
> used CPAN to install a bunch of dependencies. I then did 'Build  
> test'. For the most part everything was fine.
>
> - Bio::Biblio::IO::medlinexml throws an exception because  
> XML::Parser isn't installed.

XML::Parser used to be shipped with a number of perl distros even  
though it isn't core.  We should add a require to these.
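
A minimal sketch of the kind of check meant here, assuming the goal is to
skip rather than die when an optional module is missing.  The helper name
have_module is hypothetical; it is not an existing BioPerl test utility.

```perl
use strict;
use warnings;

# Return 1 if the named module can be loaded, 0 otherwise.
# The eval traps the exception that require throws for a missing module.
sub have_module {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # XML::Parser -> XML/Parser.pm
    return eval { require $file; 1 } ? 1 : 0;
}
```

In a test script this could be paired with Test::More, e.g.
plan skip_all => 'XML::Parser not installed' unless have_module('XML::Parser');
so the test is reported as skipped instead of throwing an exception.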

> - RNA_SearchIO fails a few tests.

These are very likely from recent commits I made re:GenericHSP and use  
of bits(), raw_score(), etc. (the fails look like missing/switched  
vals with these method tests).  I'll fix these post-svn migration, but  
I don't think these are related to 5.10.

> - Bio::Ontology::SimpleGOEngine::GraphAdaptor throws an exception  
> because Graph::Directed isn't installed.

Odd, that should be caught out before tests are run.  Needs to be  
fixed, but one would think this would fail as well under 5.8.

> - Spidey fails one test.

Passes for me.  Is it dependency-related?

> And of course without the optional dependencies installed, many  
> tests were skipped.
>
> I'll now go back and install the optional dependencies and do the  
> network tests, but it looks like for the most part we play nice with  
> the new Perl.
>
> Dave

Not sure, but it seems a bit faster.  Maybe it's just me but it would  
be nice to see some benchmarks comparing perl 5.8 vs 5.10.  I agree,  
it was a very fast and easy install.

I'll start a page on the wiki for test fails using perl 5.10.  I'm  
seeing a few fails;  I'm getting the following with everything  
installed (including DBD::mysql, DBI, etc) using perl 5.10, Mac OS X  
10.5.1 (note Test::Harness now gives TODO's, so some of these are  
actually passing).  Note the entrezgene.t and DB.t fails; I looked  
into these and I think they are related to the odd 'pseudohashes are  
deprecated' warnings we were getting in perl 5.8 tests, so there may  
be something legitimately buggy.

Test Summary Report
-------------------
t/Annotation.t                (Wstat: 0 Tests: 112 Failed: 0)
   TODO passed:   96
t/BioGraphics.t               (Wstat: 256 Tests: 35 Failed: 1)
   Failed test number(s):  4
   Non-zero exit status: 1
t/DB.t                        (Wstat: 65280 Tests: 106 Failed: 0)
   Non-zero exit status: 255
   Parse errors: Bad plan.  You planned 116 tests but ran 106.
t/DBCUTG.t                    (Wstat: 1024 Tests: 33 Failed: 4)
   Failed test number(s):  29-31, 33
   Non-zero exit status: 4
t/RNA_SearchIO.t              (Wstat: 2048 Tests: 496 Failed: 8)
   Failed test number(s):  291, 338, 372-374, 395, 455, 486
   Non-zero exit status: 8
t/entrezgene.t                (Wstat: 65280 Tests: 648 Failed: 0)
   Non-zero exit status: 255
   Parse errors: Bad plan.  You planned 1422 tests but ran 648.
Files=255, Tests=15066, 435 wallclock secs ( 3.15 usr  1.72 sys +  
124.87 cusr 13.29 csys = 143.03 CPU)
Result: FAIL
Failed 5/255 test programs. 13/15066 subtests failed.


chris


From David.Messina at sbc.su.se  Wed Dec 19 12:49:32 2007
From: David.Messina at sbc.su.se (Dave Messina)
Date: Wed, 19 Dec 2007 11:49:32 -0600
Subject: [Bioperl-l] perl 5.10 released
In-Reply-To: <C7FF12CB-278A-4B47-8D04-F2F038C35AB2@uiuc.edu>
References: <AF986160-D1AB-4AEF-9588-90861F5B7B98@uiuc.edu>
	<628aabb70712190844o17a40c2eva3ef863dc42afb6c@mail.gmail.com>
	<C7FF12CB-278A-4B47-8D04-F2F038C35AB2@uiuc.edu>
Message-ID: <628aabb70712190949j30756b8ap97666f4962c2b83d@mail.gmail.com>

>
> XML::Parser used to be shipped with a number of perl distros even
> though it isn't core.  We should add a require to these.


Agreed.


> - RNA_SearchIO fails a few tests.
>
> These are very likely from recent commits I made re:GenericHSP and use
> of bits(), raw_score(), etc. (the fails look like missing/switched
> vals with these method tests).  I'll fix these post-svn migration, but
> I don't think these are related to 5.10.


Agreed -- I doubt this is 5.10-specific.


> - Bio::Ontology::SimpleGOEngine::GraphAdaptor throws an exception
> > because Graph::Directed isn't installed.
>
> Odd, that should be caught out before tests are run.  Needs to be
> fixed, but one would think this would fail as well under 5.8.


Yep, and in a minute here I'll test it under 5.8.


> > - Spidey fails one test.
>
> Passes for me.  Is it dependency-related?


I don't think so, but I guess we'll see once I finish installing the
dependencies. Here's what I got:

t/Spidey........................ok 1/26 Can't call method "sub_SeqFeature"
on an undefined value at t/Spidey.t line 24, <GEN1> line 170.
# Looks like you planned 26 tests but only ran 3.
# Looks like your test died just after 3.
t/Spidey........................dubious

        Test returned status 255 (wstat 65280, 0xff00)
DIED. FAILED tests 4-26
        Failed 23/26 tests, 11.54% okay


Dave


From cjfields at uiuc.edu  Wed Dec 19 14:19:10 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 19 Dec 2007 13:19:10 -0600
Subject: [Bioperl-l] perl 5.10 released
In-Reply-To: <628aabb70712190949j30756b8ap97666f4962c2b83d@mail.gmail.com>
References: <AF986160-D1AB-4AEF-9588-90861F5B7B98@uiuc.edu>
	<628aabb70712190844o17a40c2eva3ef863dc42afb6c@mail.gmail.com>
	<C7FF12CB-278A-4B47-8D04-F2F038C35AB2@uiuc.edu>
	<628aabb70712190949j30756b8ap97666f4962c2b83d@mail.gmail.com>
Message-ID: <04AB8971-466D-4EEF-9A75-310ACDD224A6@uiuc.edu>

Just updated from CVS and reran tests, Spidey.t is failing now.  This  
may be from a recent commit:

http://lists.open-bio.org/pipermail/bioperl-guts-l/2007-December/026854.html

I'm updating the following page on the wiki for tracking.  There are a  
few more we should look into at some point:

http://www.bioperl.org/w/index.php?title=Bioperl_and_Perl_5.10

chris

On Dec 19, 2007, at 11:49 AM, Dave Messina wrote:

>>
>> XML::Parser used to be shipped with a number of perl distros even
>> though it isn't core.  We should add a require to these.
>
>
> Agreed.
>
>
>> - RNA_SearchIO fails a few tests.
>>
>> These are very likely from recent commits I made re:GenericHSP and  
>> use
>> of bits(), raw_score(), etc. (the fails look like missing/switched
>> vals with these method tests).  I'll fix these post-svn migration,  
>> but
>> I don't think these are related to 5.10.
>
>
> Agreed -- I doubt this is 5.10-specific.
>
>
>> - Bio::Ontology::SimpleGOEngine::GraphAdaptor throws an exception
>>> because Graph::Directed isn't installed.
>>
>> Odd, that should be caught out before tests are run.  Needs to be
>> fixed, but one would think this would fail as well under 5.8.
>
>
> Yep, and in a minute here I'll test it under 5.8.
>
>
>
>
>>> - Spidey fails one test.
>>
>> Passes for me.  Is it dependency-related?
>
>
> I don't think so, but I guess we'll see once I finish installing the
> dependencies. Here's what I got:
>
> t/Spidey........................ok 1/26 Can't call method  
> "sub_SeqFeature"
> on an undefined value at t/Spidey.t line 24, <GEN1> line 170.
> # Looks like you planned 26 tests but only ran 3.
> # Looks like your test died just after 3.
> t/Spidey........................dubious
>
>        Test returned status 255 (wstat 65280, 0xff00)
> DIED. FAILED tests 4-26
>        Failed 23/26 tests, 11.54% okay
>
>
> Dave

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From David.Messina at sbc.su.se  Wed Dec 19 18:42:14 2007
From: David.Messina at sbc.su.se (Dave Messina)
Date: Wed, 19 Dec 2007 17:42:14 -0600
Subject: [Bioperl-l] perl 5.10 released
In-Reply-To: <04AB8971-466D-4EEF-9A75-310ACDD224A6@uiuc.edu>
References: <AF986160-D1AB-4AEF-9588-90861F5B7B98@uiuc.edu>
	<628aabb70712190844o17a40c2eva3ef863dc42afb6c@mail.gmail.com>
	<C7FF12CB-278A-4B47-8D04-F2F038C35AB2@uiuc.edu>
	<628aabb70712190949j30756b8ap97666f4962c2b83d@mail.gmail.com>
	<04AB8971-466D-4EEF-9A75-310ACDD224A6@uiuc.edu>
Message-ID: <FC679851-77A7-4603-B722-4A6089333EE9@sbc.su.se>

Hi Chris and everyone,

With most of the optional dependencies installed, I'm seeing  
essentially the same test failures, including the CODE ref thingy.  
I've noted this on the new Wiki page you created.

According to Data::Dumper's documentation,
Data::Dumper cheats with CODE references. If a code reference is  
encountered in the structure being processed (and if you haven't set  
the Deparse flag), an anonymous subroutine that contains the string  
'"DUMMY"' will be inserted in its place, and a warning will be printed  
if Purity is set. You can eval the result, but bear in mind that the  
anonymous sub that gets created is just a placeholder. Someday, perl  
will have a switch to cache-on-demand the string representation of a  
compiled piece of code, I hope. If you have prior knowledge of all the  
code refs that your data structures are likely to have, you can use  
the Seen method to pre-seed the internal reference table and make the  
dumped output point to them, instead. See EXAMPLES above.


So it's not BioPerl per se, but we can probably work around it.
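
To make the behaviour described in that documentation concrete, here is a
small sketch using only core modules.  The two helper names (dump_plain,
dump_deparsed) are made up for illustration; $Data::Dumper::Deparse is the
real configuration variable behind the "Deparse flag".

```perl
use strict;
use warnings;
use Data::Dumper;

# Dump with the default settings: a CODE ref is replaced by a placeholder
# sub that just contains the string "DUMMY".
sub dump_plain {
    my ($data) = @_;
    local $Data::Dumper::Deparse = 0;
    return Dumper($data);
}

# Dump with Deparse switched on: the CODE ref is turned back into source
# text via B::Deparse, so the dumped structure can be eval'ed meaningfully.
sub dump_deparsed {
    my ($data) = @_;
    local $Data::Dumper::Deparse = 1;
    return Dumper($data);
}
```

Printing dump_plain({ cb => sub { return 42 } }) shows the placeholder,
while dump_deparsed() on the same structure shows the body of the sub.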


>>> - Bio::Ontology::SimpleGOEngine::GraphAdaptor throws an exception
>>>> because Graph::Directed isn't installed.
>>>
>>> Odd, that should be caught out before tests are run.  Needs to be
>>> fixed, but one would think this would fail as well under 5.8.
>>
>>
>> Yep, and in a minute here I'll test it under 5.8.


Strangely, the Ontology tests properly get skipped under 5.8.

Dave


From ki.baik at roche.com  Wed Dec 19 19:58:42 2007
From: ki.baik at roche.com (Baik, Ki)
Date: Wed, 19 Dec 2007 16:58:42 -0800
Subject: [Bioperl-l] Parsing CAP3 output to Fasta
Message-ID: <6D5431B47E46BD45AAA453432AD3B803027553B7@rpbmsem01.nala.roche.com>

Hello,

 
I'm interested in parsing the output of the CAP contig assembly program
into a format that is more manageable. The CAP output is shown below:

 
                .    :    .    :    .    :    .    :    .    :    .    :

Seq1+       CTGGATGGGTTAATTTACTCCCATAAGATTTTTGAAATCCTTAATTTACTGATATATCAC

            ____________________________________________________________

consensus   CTGGATGGGTTAATTTACTCCCATAAGAGAGCAGAAATCCTGGATCTCTGGATATATCAC

 
                .    :    .    :    .    :    .    :    .    :    .    :

Seq1+       ACTCTTAATTTACTCCCTGATTGG--CAGTGTTACACACCGGGACCAGGACCTAGATTCC

Seq2+       ACTCAGGGATTCTTCCCTGATTGGTTCAGTGTTACACTTTTGCGCCAGGACCTAGATTCC

            ____________________________________________________________

consensus   ACTCAGGGATTCTTCCCTGATTGGTTCAGTGTTACACACCGGGACCAGGACCTAGATTCC

 
                .    :    .    :    .    :    .    :    .    :    .    :

Seq1+       CACTGACATTTGGATGGTTAATTTACTCTTTTCCAGTGTCAGCAGAAGAGCGGGGGAGAC

Seq2+       CACTGACATTTGGATGGTTGTTTAAACTGGTACCAGTGTCCGCTCGCGGGGCAGAGAGAC

            ____________________________________________________________

consensus   CACTGACATTTGGATGGTTGTTTAAACTGGTACCAGTGTCAGCAGAAGAGGCAGAGAGAC

 
                .    :    .    :    .    :    .    :    .    :    .    :

Seq1+       TGGGTAATACAAACACTTTTCGGCGGCTTCTACATCCAGCTTGTTAATTTACTCTTTAGG

Seq2+       TGGGTAATACAAATGAAGATGTTTCCGGCCTACATCCAGCTTGTAATCATGC

            ____________________________________________________________

consensus   TGGGTAATACAAATGAAGATGCTAGTCTTCTACATCCAGCTTGTAATCATGGAGCTGAGG

 
I would like to maintain the alignment with their base positions for
each sequence. A fasta format retaining the alignment position is ideal
such as below:

 
>Seq1+

CTGGATGGGTTAATTTACTCCCATAAGATTTTTGAAATCCTTAATTTACTGATATATCAC

ACTCTTAATTTACTCCCTGATTGG--CAGTGTTACACACCGGGACCAGGACCTAGATTCC

CACTGACATTTGGATGGTTAATTTACTCTTTTCCAGTGTCAGCAGAAGAGCGGGGGAGAC

TGGGTAATACAAACACTTTTCGGCGGCTTCTACATCCAGCTTGTTAATTTACTCTTTAGG

>Seq2+

------------------------------------------------------------

ACTCAGGGATTCTTCCCTGATTGGTTCAGTGTTACACTTTTGCGCCAGGACCTAGATTCC

CACTGACATTTGGATGGTTGTTTAAACTGGTACCAGTGTCCGCTCGCGGGGCAGAGAGAC

TGGGTAATACAAATGAAGATGTTTCCGGCCTACATCCAGCTTGTAATCATGC--------

 
Does anyone have any experience doing this?

 
Regards,

 
KB


From cjfields at uiuc.edu  Wed Dec 19 20:41:51 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 19 Dec 2007 19:41:51 -0600
Subject: [Bioperl-l] perl 5.10 released
In-Reply-To: <FC679851-77A7-4603-B722-4A6089333EE9@sbc.su.se>
References: <AF986160-D1AB-4AEF-9588-90861F5B7B98@uiuc.edu>
	<628aabb70712190844o17a40c2eva3ef863dc42afb6c@mail.gmail.com>
	<C7FF12CB-278A-4B47-8D04-F2F038C35AB2@uiuc.edu>
	<628aabb70712190949j30756b8ap97666f4962c2b83d@mail.gmail.com>
	<04AB8971-466D-4EEF-9A75-310ACDD224A6@uiuc.edu>
	<FC679851-77A7-4603-B722-4A6089333EE9@sbc.su.se>
Message-ID: <980C0D1B-9E3F-4904-9CA1-8C672CED0B35@uiuc.edu>

On Dec 19, 2007, at 5:42 PM, Dave Messina wrote:

> Hi Chris and everyone,
>
> With most of the optional dependencies installed, I'm seeing  
> essentially the same test failures, including the CODE ref thingy.  
> I've noted this on the new Wiki page you created.
>
> According to Data::Dumper's documentation,
> Data::Dumper cheats with CODE references. If a code reference is  
> encountered in the structure being processed (and if you haven't set  
> the Deparse flag), an anonymous subroutine that contains the string  
> '"DUMMY"' will be inserted in its place, and a warning will be  
> printed if Purity is set. You can eval the result, but bear in mind  
> that the anonymous sub that gets created is just a placeholder.  
> Someday, perl will have a switch to cache-on-demand the string  
> representation of a compiled piece of code, I hope. If you have  
> prior knowledge of all the code refs that your data structures are  
> likely to have, you can use the Seen method to pre-seed the internal  
> reference table and make the dumped output point to them, instead.  
> See EXAMPLES above.
>
>
> So it's not BioPerl per se, but we can probably work around it.

May be something in Module::Build or Build.PL that needs tweaking.

It looks like EntrezGene parsing is broken for now using perl 5.10;  
the 'pseudohash' warnings with perl 5.8 were indicating something was  
amiss but we could never place it.  Any fixes will have to wait until  
after svn migration.  Not sure what's going on with the other fails  
just yet.

>>>> - Bio::Ontology::SimpleGOEngine::GraphAdaptor throws an exception
>>>>> because Graph::Directed isn't installed.
>>>>
>>>> Odd, that should be caught out before tests are run.  Needs to be
>>>> fixed, but one would think this would fail as well under 5.8.
>>>
>>>
>>> Yep, and in a minute here I'll test it under 5.8.
>
>
> Strangely, the Ontology tests properly get skipped under 5.8.
>
> Dave

May be worth looking into.  Have you added it to the wiki?

chris


From David.Messina at sbc.su.se  Wed Dec 19 23:52:16 2007
From: David.Messina at sbc.su.se (Dave Messina)
Date: Wed, 19 Dec 2007 22:52:16 -0600
Subject: [Bioperl-l] perl 5.10 released
In-Reply-To: <980C0D1B-9E3F-4904-9CA1-8C672CED0B35@uiuc.edu>
References: <AF986160-D1AB-4AEF-9588-90861F5B7B98@uiuc.edu>
	<628aabb70712190844o17a40c2eva3ef863dc42afb6c@mail.gmail.com>
	<C7FF12CB-278A-4B47-8D04-F2F038C35AB2@uiuc.edu>
	<628aabb70712190949j30756b8ap97666f4962c2b83d@mail.gmail.com>
	<04AB8971-466D-4EEF-9A75-310ACDD224A6@uiuc.edu>
	<FC679851-77A7-4603-B722-4A6089333EE9@sbc.su.se>
	<980C0D1B-9E3F-4904-9CA1-8C672CED0B35@uiuc.edu>
Message-ID: <628aabb70712192052p5d9afe3bvf4fa1da872f56355@mail.gmail.com>

>
> May be something in Module::Build or Build.PL that needs tweaking.


I took a quick look-see and I'm pretty sure it's Module::Build.
Specifically, Module::Build::Base::write_config(), where there are three
calls with coderefs as parameters to _write_data() to match the three
coderef errors we are seeing at the end of 'perl Build.PL'.

_write_data() in turn calls Module::Build::Dumper::_data_dump() and uses
some ugly Data::Dumper voodoo to serialize.

I don't understand the voodoo well enough to explain why this appears only
with Perl 5.10, though; it sure looks like it should have with 5.8, too.


> Strangely, the Ontology tests properly get skipped under 5.8.
>
> May be worth looking into.  Have you added it to the wiki?


Uhhh, yeah...of course! (just now)

Should be a simple fix after the post-svn thaw.

Dave


From David.Messina at sbc.su.se  Thu Dec 20 00:39:41 2007
From: David.Messina at sbc.su.se (Dave Messina)
Date: Wed, 19 Dec 2007 23:39:41 -0600
Subject: [Bioperl-l] Parsing CAP3 output to Fasta
In-Reply-To: <6D5431B47E46BD45AAA453432AD3B803027553B7@rpbmsem01.nala.roche.com>
References: <6D5431B47E46BD45AAA453432AD3B803027553B7@rpbmsem01.nala.roche.com>
Message-ID: <628aabb70712192139q5e061428v56ed2ce8cf1f4851@mail.gmail.com>

Hi Ki,

Hopefully someone who (unlike me) uses these modules regularly will chime
in, but in the meantime, here are some ideas:

The Bio::Assembly::IO module can read and write ace files, which CAP3 can
produce as output. I don't think there is an explicit means to dump to a
multi-fasta file like you want.

But you could probably write a Bio::Assembly::IO::Fasta class which could
write the multi-Fasta format you want. Then you could use Bio::Assembly::IO
objects to read in ace files from CAP3 and write out to multi-fasta.

Look at

Bio::Assembly::IO::*
Bio::Assembly::ScaffoldI
Bio::Assembly::Contig
Bio::LocatableSeq
Bio::AlignIO

Assemblies are made of scaffolds, scaffolds are made of contigs, and contigs
are made of sequences which can be manipulated like any old seq in BioPerl.
Bio::AlignIO can read and write multiple sequence alignments and
multi-fastas, so that should help you to get from Bio::Assembly::IO to your
desired output format.
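
If going through the assembly modules turns out to be too much, the
alignment text itself can be reshaped directly.  The following is only a
sketch in plain Perl, not a BioPerl parser, and it rests on assumptions
taken from the sample in the question: every block ends with a
"consensus" row, read names carry a + or - strand suffix, and all rows in
a block start their sequence at the same column.  The subroutine name
cap3_blocks_to_fasta is made up.

```perl
use strict;
use warnings;

# Turn CAP3-style alignment blocks into gapped FASTA, padding each read
# with '-' wherever it is absent so alignment columns are preserved.
sub cap3_blocks_to_fasta {
    my ($text) = @_;
    my (%aln, @order, @pending);
    my $columns = 0;                      # alignment columns emitted so far

    for my $line (split /\n/, $text) {
        if ($line =~ /^(consensus\s+)(\S+)/) {
            # The consensus row closes a block and tells us where the
            # sequence column starts and how wide the block is.
            my ($offset, $width) = (length($1), length($2));
            my %row;
            for my $read (@pending) {
                my ($name) = $read =~ /^(\S+)/;
                my $seq = length($read) > $offset ? substr($read, $offset) : '';
                $seq = substr($seq . (' ' x $width), 0, $width);
                $seq =~ tr/ /-/;          # blanks before/after a read become gaps
                if (!exists $aln{$name}) {
                    push @order, $name;
                    $aln{$name} = '-' x $columns;   # read starts in a later block
                }
                $row{$name} = $seq;
            }
            for my $name (@order) {
                $aln{$name} .= exists($row{$name}) ? $row{$name} : ('-' x $width);
            }
            $columns += $width;
            @pending = ();
        }
        elsif ($line =~ /^\S+[+-]\s+\S/) {
            push @pending, $line;         # a read row; keep until block ends
        }
    }

    my $fasta = '';
    for my $name (@order) {
        $fasta .= ">$name\n";
        $fasta .= "$_\n" for $aln{$name} =~ /(.{1,60})/g;
    }
    return $fasta;
}
```

Ruler and underscore lines are ignored because they start with
whitespace.  Real CAP3 output may differ from the sample in ways this
sketch does not handle.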


Hope this helps,
Dave


From mike.thon at gmail.com  Thu Dec 20 00:59:06 2007
From: mike.thon at gmail.com (Michael Thon)
Date: Thu, 20 Dec 2007 06:59:06 +0100
Subject: [Bioperl-l] dealing with large files
In-Reply-To: <JT9BJ9$83113BA715E1F7CF3B9D29BFFCC4B0CF@libero.it>
References: <JT9BJ9$83113BA715E1F7CF3B9D29BFFCC4B0CF@libero.it>
Message-ID: <F23D8C0D-AE41-40B4-A30B-83AC59A7BDD8@gmail.com>


On Dec 18, 2007, at 7:04 PM, Stefano Ghignone wrote:

> my $in  = Bio::SeqIO->new(-file => "/bin/gunzip -c $infile |",
>                           -format => 'EMBL');

This is just for the sake of curiosity, since you already found a  
solution to your problem, but I wonder how perl will handle a file  
opened this way.  Will it try to suck the whole thing into ram in one  
go?

Mike


From cjfields at uiuc.edu  Thu Dec 20 00:54:36 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 19 Dec 2007 23:54:36 -0600
Subject: [Bioperl-l] Parsing CAP3 output to Fasta
In-Reply-To: <628aabb70712192139q5e061428v56ed2ce8cf1f4851@mail.gmail.com>
References: <6D5431B47E46BD45AAA453432AD3B803027553B7@rpbmsem01.nala.roche.com>
	<628aabb70712192139q5e061428v56ed2ce8cf1f4851@mail.gmail.com>
Message-ID: <EB4F110F-9F12-4478-89C2-5DDF4FEF07C6@uiuc.edu>


On Dec 19, 2007, at 11:39 PM, Dave Messina wrote:

> Hi Ki,
>
> Hopefully someone who (unlike me) uses these modules regularly will  
> chime
> in, but in the meantime, here are some ideas:
>
> The Bio::AssemblyIO module can read and write ace files, which CAP3  
> can
> produce as output. I don't think there is an explicit means to dump  
> to a
> multi-fasta file like you want.
>
> But you could probably write a Bio::AssemblyIO::Fasta class which  
> could
> write the multi-Fasta format you want. Then you could use  
> Bio::AssemblyIO
> objects to read in ace files from CAP3 and write out to multi-fasta.
>
> Look at
>
> Bio::AssemblyIO::*
> Bio::Assembly::ScaffoldI
> Bio::Assembly::Contig
> Bio::LocatableSeq
> Bio::AlignIO
>
> Assemblies are made of scaffolds, scaffolds are made of contigs, and  
> contigs
> are made of sequences which can be manipulated like any old seq in  
> BioPerl.
> Bio::AlignIO can read and write multiple sequence alignments and
> multi-fastas, so that should help you to get from AssemblyIO to your  
> desired
> output format.
>
>
>
> Hope this helps,
> Dave

What would help is to make Bio::Assembly::Contig implement Bio::AlignI
correctly, or make it a subclass of Bio::SimpleAlign.  That way one
could read Scaffolds in via Bio::Assembly::IO and write Contigs out
through Bio::AlignIO directly.  In theory that should work but IIRC it
doesn't.

chris


From jason at bioperl.org  Thu Dec 20 02:13:55 2007
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 19 Dec 2007 23:13:55 -0800
Subject: [Bioperl-l] dealing with large files
In-Reply-To: <F23D8C0D-AE41-40B4-A30B-83AC59A7BDD8@gmail.com>
References: <JT9BJ9$83113BA715E1F7CF3B9D29BFFCC4B0CF@libero.it>
	<F23D8C0D-AE41-40B4-A30B-83AC59A7BDD8@gmail.com>
Message-ID: <02EC6D6D-F807-492F-B125-9FE0393B1FD9@bioperl.org>

It gets buffered via the OS -- Bio::Root::IO calls next_line  
iteratively, but eventually the whole sequence object will get put  
into RAM as it is built up.
zcat or bzcat can also be used for gzipped and bzipped files
respectively; I like to use this where I want to keep the disk space
footprint down.
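
As an illustration of that line-at-a-time behaviour, here is a sketch that
reads a gzipped flat file without ever holding more than one line.  Note
that it swaps the external gunzip pipe for the IO::Uncompress::Gunzip
module (bundled with Perl since 5.10); the subroutine name
count_entries_gz is made up for the example.

```perl
use strict;
use warnings;
use IO::Uncompress::Gunzip qw($GunzipError);

# Count flat-file entries (lines starting with //) in a gzipped file,
# decompressing and reading it one line at a time.
sub count_entries_gz {
    my ($path) = @_;
    my $z = IO::Uncompress::Gunzip->new($path)
        or die "Cannot open $path: $GunzipError\n";
    my $count = 0;
    while (defined(my $line = $z->getline)) {
        $count++ if $line =~ m{^//};
    }
    $z->close;
    return $count;
}
```

Memory use stays flat here because nothing is accumulated; it is building
a full object per record that makes the footprint grow.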

Because we treat data input usually as from a stream ignoring whether  
it is in a file or not, we have to have a more flexible structure to  
really handle this, although I'd argue the data really belongs in a  
database when it is too big for memory.
More compact Feature/Location objects would probably also help here.   
I would not be surprised if the memory requirement has more to do  
with the number of features than length of the sequence - human chrom  
1 can fit into memory just fine on most machines with 2GB of RAM.

But it would require someone taking an interest in some
re-architecting here.

-jason

On Dec 19, 2007, at 9:59 PM, Michael Thon wrote:

>
> On Dec 18, 2007, at 7:04 PM, Stefano Ghignone wrote:
>
>> my $in  = Bio::SeqIO->new(-file => "/bin/gunzip -c $infile |",
>>                           -format => 'EMBL');
>
> This is just for the sake of curiosity, since you already found a  
> solution to your problem, but I wonder how perl will handle a file  
> opened this way.  Will it try to suck the whole thing into ram in  
> one go?
>
> Mike


From ste.ghi at libero.it  Thu Dec 20 08:57:54 2007
From: ste.ghi at libero.it (Stefano Ghignone)
Date: Thu, 20 Dec 2007 14:57:54 +0100
Subject: [Bioperl-l] dealing with large files
Message-ID: <JTCPGI$4B9622B6978AB21534CC4DC74CC6BC89@libero.it>

I was wondering whether, when working with such a big file, it would be better to first index the database and then query it, formatting the sequences as needed...

> It gets buffered via the OS -- Bio::Root::IO calls next_line  
> iteratively, but eventually the whole sequence object will get put  
> into RAM as it is built up.
> zcat or bzcat can also be used for gzipped and bzipped files  
> respectively, I like to use this where I want to disk space footprint  
> down.
> 
> Because we treat data input usually as from a stream ignoring whether  
> it is in a file or not, we have to have a more flexible structure to  
> really handle this, although I'd argue the data really belongs in a  
> database when it is too big for memory.
> More compact Feature/Location objects would probably also help here.   
> I would not be surprised if the memory requirement has more to do  
> with the number of features than length of the sequence - human chrom  
> 1 can fit into memory just fine on most machines with 2GB of RAM.
> 
> But it would require someone taking an interest in some re- 
> architecting here.
> 
> -jason
> 
> On Dec 19, 2007, at 9:59 PM, Michael Thon wrote:
> 
> >
> > On Dec 18, 2007, at 7:04 PM, Stefano Ghignone wrote:
> >
> >> my $in  = Bio::SeqIO->new(-file => "/bin/gunzip -c $infile |",
> >>                           -format => 'EMBL');
> >
> > This is just for the sake of curiosity, since you already found a  
> > solution to your problem, but I wonder how perl will handle a file  
> > opened this way.  Will it try to suck the whole thing into ram in  
> > one go?
> >
> > Mike
> 
> 


From amackey at pcbi.upenn.edu  Thu Dec 20 10:32:19 2007
From: amackey at pcbi.upenn.edu (Aaron Mackey)
Date: Thu, 20 Dec 2007 10:32:19 -0500
Subject: [Bioperl-l] BioPerl and NHX tree
In-Reply-To: <476A7736.109@toulouse.inra.fr>
References: <476A7736.109@toulouse.inra.fr>
Message-ID: <24c96eca0712200732q20523c1co1075c15d056ff634@mail.gmail.com>

The NHX writer will only add the [&&NHX] block when there are tags to
be written.  Your code reads in a Newick tree without tags, and then
writes it back out without adding any new tags.  So yes, you need to
1) read the Newick tree, 2) traverse the tree, calling
$node->nhx_tag({T => $taxon_id}) for each node with each corresponding
$taxon_id, and then 3) write out the NHX tree.
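
To make the target format concrete, here is a sketch that is NOT the
BioPerl route described above: it is plain text substitution on a Newick
string, shown only to illustrate where the [&&NHX:T=...] block ends up
relative to the node name and branch length.  The subroutine name
add_nhx_taxids and the name-to-taxid hash are made up, and quoted Newick
labels are not handled.

```perl
use strict;
use warnings;

# Append [&&NHX:T=<taxid>] after each node (name plus optional branch
# length) whose name appears in the supplied name => taxid hash.
sub add_nhx_taxids {
    my ($newick, $taxid_for) = @_;
    $newick =~ s{
        (?<=[(,)])            # a label starts right after one of ( , )
        ([^(),:;\[\]\s]+)     # node name
        (:[^(),:;\[\]\s]+)?   # optional branch length
        (?=[,);])             # label ends before one of , ) ;
    }{
        my ($name, $length) = ($1, defined $2 ? $2 : '');
        exists $taxid_for->{$name}
            ? "$name$length\[&&NHX:T=$taxid_for->{$name}]"
            : "$name$length";
    }gex;
    return $newick;
}
```

Nodes without an entry in the hash are left untouched, which mirrors the
point above that the NHX block is only written when there are tags.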

-Aaron

On Dec 20, 2007 9:07 AM, Laurence Amilhat
<Laurence.Amilhat at toulouse.inra.fr> wrote:
> Dear Mr MacKey,
>
>
> I am pretty new in Tree parsing and writing with BioPerl.
> I am trying to convert a Newick tree file to a NHX tree file with adding
> the Taxid for the node in the NHX tree file.
>
> I saw the module Bio::Tree::NodeNHX, but very few examples...
>
> I don't know where do i need to start, I tried the easy way with
> Bio::TreeIO,
> but the resulting tree doesn't have the [&&NHX] in the internal node,
> and I don't know how to add the tag [&&NHX:T=xxxx] on the node,
> Do I need to use the nhx_tag method to do this?
>
> Maybe you have an example that use NHX tag in tree node, that might be
> very helpfull for me to get to understand how it works...
>
>
> Have a nice holidays,
>
>
> Best regards,
>
>
> Laurence Amilhat.
>
>
>
>
> This is the simple code that I use to convert a tree from  newick to nhx:
>
> use Bio::TreeIO;
> use Getopt::Long;
> my $tree_file;
> my $outfile;
>
> GetOptions('f|file:s' =>\$tree_file, 'o|out:s' =>\$outfile);
>
> my $treeio = new Bio::TreeIO (-format => 'newick', -file => "$tree_file");
> my $treeout= new Bio::TreeIO (-format => 'nhx', -file =>">$outfile");
>
> while (my $tree= $treeio->next_tree)
> {
>    $treeout->write_tree($tree);
> }
>
> --
> ====================================================================
> = Laurence Amilhat    INRA Toulouse 31326 Castanet-Tolosan         =
> = Tel: 33 5 61 28 53 34   Email: laurence.amilhat at toulouse.inra.fr =
> ====================================================================
>
>
>
>


From cjfields at uiuc.edu  Thu Dec 20 11:14:55 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 20 Dec 2007 10:14:55 -0600
Subject: [Bioperl-l] dealing with large files
In-Reply-To: <JTCPGI$4B9622B6978AB21534CC4DC74CC6BC89@libero.it>
References: <JTCPGI$4B9622B6978AB21534CC4DC74CC6BC89@libero.it>
Message-ID: <FDB7609C-93B8-47C3-A97D-32D73B16651E@uiuc.edu>

As Jason mentioned, it may be the number of features in the record if  
the record itself is huge (i.e. human chromosome-sized, full  
metagenome, etc).  If (my) memory serves correctly the mem. footprint  
for a perl object is ~10x the actual data, give or take (it depends on  
the complexity of the object itself).  In cases like this indexing may  
not fix the problem, unless you have an object which retains the file  
position of the data instead of the data itself; I don't think we have  
this object type in BioPerl.

The only way I can think of to fix this would be (as Jason also  
suggested) lightweight objects, or something like the lazy sequence  
object à la the SwissKnife suite (which only brings what you want into  
memory).

Related to that, I have been testing something like that, which uses  
iterators to pass in chunks of data from a stream to handlers to build  
a sequence object.  Wouldn't be too hard to reconfigure that to return  
file positions as well.  Maybe for the 1.7 release...
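
To illustrate the "file position instead of the data" idea, here is a
minimal sketch in plain Perl. It is not BioPerl code: the sub names
index_records and fetch_record are made up for this example, and it assumes
EMBL/UniProt-style records that start with an ID line and end with a line
beginning with "//".

```perl
use strict;
use warnings;

# Build an index of byte offsets, one entry per record. Only the ID and the
# file position are kept in memory, never the record text itself.
sub index_records {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my %offset_of;
    my $start = tell $fh;              # where the current record begins
    my $id;
    while (my $line = <$fh>) {
        $id = $1 if !defined $id && $line =~ /^ID\s+([^\s;]+)/;
        if ($line =~ m{^//}) {         # end-of-record marker
            $offset_of{$id} = $start if defined $id;
            $start = tell $fh;         # next record starts right here
            undef $id;
        }
    }
    close $fh;
    return \%offset_of;
}

# Fetch a single record by seeking to its stored offset.
sub fetch_record {
    my ($path, $offset) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    seek($fh, $offset, 0) or die "Cannot seek in $path: $!";
    local $/ = "//\n";                 # read exactly one record
    my $record = <$fh>;
    close $fh;
    return $record;
}
```

Note that this needs a seekable file, so it would not work on a
"gunzip -c ... |" pipe; the file would have to be uncompressed first.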

chris

On Dec 20, 2007, at 7:57 AM, Stefano Ghignone wrote:

> I was wondering whether, when working with such a big file, it would be  
> better to first index the database, then query it, formatting the  
> sequences as one wants...
>
>> It gets buffered via the OS -- Bio::Root::IO calls next_line
>> iteratively, but eventually the whole sequence object will get put
>> into RAM as it is built up.
>> zcat or bzcat can also be used for gzipped and bzipped files
>> respectively; I like to use this where I want to keep the disk space
>> footprint down.
>>
>> Because we treat data input usually as from a stream ignoring whether
>> it is in a file or not, we have to have a more flexible structure to
>> really handle this, although I'd argue the data really belongs in a
>> database when it is too big for memory.
>> More compact Feature/Location objects would probably also help here.
>> I would not be surprised if the memory requirement has more to do
>> with the number of features than length of the sequence - human chrom
>> 1 can fit into memory just fine on most machines with 2GB of RAM.
>>
>> But it would require someone taking an interest in some re-
>> architecting here.
>>
>> -jason
>>
>> On Dec 19, 2007, at 9:59 PM, Michael Thon wrote:
>>
>>>
>>> On Dec 18, 2007, at 7:04 PM, Stefano Ghignone wrote:
>>>
>>>> my $in  = Bio::SeqIO->new(-file => "/bin/gunzip -c $infile |", -
>>>> format => 'EMBL');
>>>
>>> This is just for the sake of curiosity, since you already found a
>>> solution to your problem, but I wonder how perl will handle a file
>>> opened this way.  Will it try to suck the whole thing into ram in
>>> one go?
>>>
>>> Mike
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From David.Messina at sbc.su.se  Thu Dec 20 11:26:17 2007
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 20 Dec 2007 10:26:17 -0600
Subject: [Bioperl-l] dealing with large files
In-Reply-To: <JTCPGI$4B9622B6978AB21534CC4DC74CC6BC89@libero.it>
References: <JTCPGI$4B9622B6978AB21534CC4DC74CC6BC89@libero.it>
Message-ID: <628aabb70712200826p36d3d451wdcd901f555bc210a@mail.gmail.com>

On 12/20/07, Stefano Ghignone <ste.ghi at libero.it> wrote:
>
> I was wondering whether, when working with such a big file, it would be
> better to first index the database, then query it, formatting the
> sequences as one wants...
>

Agreed, but only if you want to randomly access sequences within the file. I
believe the original poster intends to do something with every sequence in
the big file, in which case streaming the file is likely to be much faster.
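
A rough sketch of what such a streaming conversion could look like in plain
Perl, assuming UniProt-style protein flat files (AC, DE, OS and SQ line
codes, sequence lines after SQ, "//" as record terminator). The sub name
flat_to_fasta is made up for this example; it keeps only the first accession
and the first OS line of each record.

```perl
use strict;
use warnings;

# Convert flat-file records to FASTA one line at a time. $in and $out are
# filehandles; apart from the header fields of the current record, nothing
# is held in memory.
sub flat_to_fasta {
    my ($in, $out) = @_;
    my ($acc, $organism, @desc);
    my $in_seq = 0;
    while (my $line = <$in>) {
        chomp $line;
        if ($line =~ m{^//}) {                 # end of record: reset state
            ($acc, $organism, $in_seq) = (undef, undef, 0);
            @desc = ();
        }
        elsif ($in_seq) {                      # sequence block: strip spaces
            $line =~ s/\s+//g;
            print {$out} "$line\n" if length $line;
        }
        elsif ($line =~ /^AC\s+([^\s;]+)/) { $acc      //= $1 }
        elsif ($line =~ /^DE\s+(.*)/)      { push @desc, $1 }
        elsif ($line =~ /^OS\s+(.*)/)      { $organism //= $1 }
        elsif ($line =~ /^SQ\s/) {             # header is complete here
            printf {$out} ">%s %s [%s]\n",
                $acc // 'unknown', "@desc", $organism // 'unknown';
            $in_seq = 1;
        }
    }
    return;
}
```

Calling flat_to_fasta(\*STDIN, \*STDOUT) at the end of the script and piping
"gunzip -c file.dat.gz" into it would process the file without ever
uncompressing it to disk.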


Dave


From akarger at CGR.Harvard.edu  Thu Dec 20 11:48:58 2007
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Thu, 20 Dec 2007 11:48:58 -0500
Subject: [Bioperl-l] dealing with large files
In-Reply-To: <B92AAA3F-C93A-41EC-B68D-3E6F4053BBD4@uiuc.edu>
References: <JTB2JF$8DA035C3FA7C0AA4E73865996D18C568@libero.it>
	<B92AAA3F-C93A-41EC-B68D-3E6F4053BBD4@uiuc.edu>
Message-ID: <B9182BFF5B004245BABC12956EA6322E07CF5C0F@huls5.nucleus.harvard.edu>

 
> -----Original Message-----
> From: Chris Fields [mailto:cjfields at uiuc.edu] 
> 
> 
> On Dec 19, 2007, at 10:45 AM, Stefano Ghignone wrote:
> 
> > At the end, I succeeded in the format conversion using this command:
> >
> > gunzip -c uniprot_trembl_bacteria.dat.gz | perl -ne 'print ">$1 " if
> > (/^AC\s+(\S+);/); print " $1" if (/^DE\s+(.*)/);print " [$1]\n" if
> > (/^OS\s+(.*)/); if (($a)=/^\s+(.*)/){$a=~s/ //g; print "$a\n"};'
> >
> > (Thanks to Riccardo Percudani). It's not bioperl...but it works!
> 
> 
> As this shows, sometimes BioPerl isn't always the best answer 
> (I know,  
> blasphemy...).  As Jason suggested it's quite likely there are large  
> sequence records causing your problems when using BioPerl.  The one- 
> liner works b/c it doesn't retain data (sequence, annotation, 
> etc) in  
> memory as Bio::Seq object; it's a direct conversion.
> 
> It would be nice to code up a lazy sequence object and related  
> parsers; maybe for the next dev release.

Yes!

Also, BLAST parsing. Blasting the proteome against the genome makes for
rather large result files. Right now, if you want to delete queries that
hit, say, more than 1000 times, you still need to wait for Bioperl to
create objects and sub-objects for every single hit. Sadly, this example
isn't hypothetical. I'm going to solve it with something like:

perl -wne 'BEGIN {$/="TBLASTN"} print if length($_) < $some_big_value'
big_blast > filtered_blast

(Not that I'm volunteering to help with the parser writing, so I should
stop complaining.)
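
For what it's worth, a slightly more explicit sketch of the same trick, in
plain Perl, that counts hits instead of measuring the length of the text.
The sub name filter_reports is made up; it assumes each report starts with
the word "TBLASTN", that this word does not occur elsewhere in the report,
and that every hit's alignment section starts with a ">" line.

```perl
use strict;
use warnings;

# Copy a concatenated BLAST text report from $in to $out, dropping every
# report that has more than $max_hits alignment sections.
sub filter_reports {
    my ($in, $out, $max_hits) = @_;
    # Each read returns the text up to and including the next "TBLASTN",
    # i.e. the body of one report plus the first word of the next one.
    local $/ = "TBLASTN";
    while (my $chunk = <$in>) {
        my $hits = () = $chunk =~ /^>/mg;   # count lines starting with ">"
        print {$out} $chunk if $hits <= $max_hits;
    }
    return;
}
```

Because every chunk ends with the header word of the following report,
dropping a chunk still leaves one header per remaining report. The one
exception is the last report: if it is dropped, a dangling "TBLASTN" is left
at the end of the output.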

-Amir


From bix at sendu.me.uk  Thu Dec 20 12:06:28 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 20 Dec 2007 17:06:28 +0000
Subject: [Bioperl-l] dealing with large files
In-Reply-To: <FDB7609C-93B8-47C3-A97D-32D73B16651E@uiuc.edu>
References: <JTCPGI$4B9622B6978AB21534CC4DC74CC6BC89@libero.it>
	<FDB7609C-93B8-47C3-A97D-32D73B16651E@uiuc.edu>
Message-ID: <476AA114.2060201@sendu.me.uk>

Chris Fields wrote:
> The only way I can think of to fix this would be (as Jason also 
> suggested) lightweight objects, or something like the lazy sequence 
> object ala the SwissKnife suite (which only bring what you want into 
> memory).
> 
> Related to that, I have been testing something like that, which uses 
> iterators to pass in chunks of data from a stream to handlers to build a 
> sequence object.  Wouldn't be too hard to reconfigure that to return 
> file positions as well.  Maybe for the 1.7 release...

Bio::PullParserI is your friend.


From bix at sendu.me.uk  Thu Dec 20 13:48:29 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 20 Dec 2007 18:48:29 +0000
Subject: [Bioperl-l] dealing with large files
In-Reply-To: <B9182BFF5B004245BABC12956EA6322E07CF5C0F@huls5.nucleus.harvard.edu>
References: <JTB2JF$8DA035C3FA7C0AA4E73865996D18C568@libero.it>	<B92AAA3F-C93A-41EC-B68D-3E6F4053BBD4@uiuc.edu>
	<B9182BFF5B004245BABC12956EA6322E07CF5C0F@huls5.nucleus.harvard.edu>
Message-ID: <476AB8FD.8090108@sendu.me.uk>

Amir Karger wrote:
>> It would be nice to code up a lazy sequence object and related  
>> parsers; maybe for the next dev release.
> 
> Yes!
> 
> Also, BLAST parsing. Blasting the proteome against the genome makes for
> rather large result files.

This has already been done. Use Bio::SearchIO::blast_pull. In a 
situation like yours I dropped run time from 20223s to
951s (~20x faster) and memory usage from over 8GB to less than 5GB (~40%
less).


From akarger at CGR.Harvard.edu  Thu Dec 20 13:52:51 2007
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Thu, 20 Dec 2007 13:52:51 -0500
Subject: [Bioperl-l] dealing with large files
In-Reply-To: <476AB8FD.8090108@sendu.me.uk>
References: <JTB2JF$8DA035C3FA7C0AA4E73865996D18C568@libero.it>	<B92AAA3F-C93A-41EC-B68D-3E6F4053BBD4@uiuc.edu>
	<B9182BFF5B004245BABC12956EA6322E07CF5C0F@huls5.nucleus.harvard.edu>
	<476AB8FD.8090108@sendu.me.uk>
Message-ID: <B9182BFF5B004245BABC12956EA6322E07CF5C77@huls5.nucleus.harvard.edu>

> Amir Karger wrote:
> >> It would be nice to code up a lazy sequence object and related  
> >> parsers; maybe for the next dev release.
> > 
> > Also, BLAST parsing. Blasting the proteome against the 
> genome makes for
> > rather large result files.
> 
> This has already been done. Use Bio::SearchIO::blast_pull. In a 
> situation like yours I dropped run time from 20223s to
> 951s (~20x faster) and memory usage from over 8GB to less 
> than 5GB (~40%
> less).

Not in 1.5.1. Is it in 1.5.2 or just in cvs? Is there a single file I
can put in my own perl lib for this, or does it require large bunches of
new code? (I'm guessing the latter.) We're about to upgrade to 1.5.2
here, but I don't see our whole center using CVS Bioperl.

-Amir


From cjfields at uiuc.edu  Thu Dec 20 15:27:45 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 20 Dec 2007 14:27:45 -0600
Subject: [Bioperl-l] dealing with large files
In-Reply-To: <476AA114.2060201@sendu.me.uk>
References: <JTCPGI$4B9622B6978AB21534CC4DC74CC6BC89@libero.it>
	<FDB7609C-93B8-47C3-A97D-32D73B16651E@uiuc.edu>
	<476AA114.2060201@sendu.me.uk>
Message-ID: <29E190AB-8A6C-4F1C-BDD1-6034CFFEEFFF@uiuc.edu>

On Dec 20, 2007, at 11:06 AM, Sendu Bala wrote:

> Chris Fields wrote:
>> The only way I can think of to fix this would be (as Jason also  
>> suggested) lightweight objects, or something like the lazy sequence  
>> object ala the SwissKnife suite (which only bring what you want  
>> into memory).
>> Related to that, I have been testing something like that, which  
>> uses iterators to pass in chunks of data from a stream to handlers  
>> to build a sequence object.  Wouldn't be too hard to reconfigure  
>> that to return file positions as well.  Maybe for the 1.7 release...
>
> Bio::PullParserI is your friend.

I'm looking into that, yes.  I'm thinking of something like a generic  
lazy sequence class with an embedded Handler/PullParser object which  
processes stuff on the fly.

Oh, when I have a bit more time...

chris


From cjfields at uiuc.edu  Thu Dec 20 15:39:48 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 20 Dec 2007 14:39:48 -0600
Subject: [Bioperl-l] dealing with large files
In-Reply-To: <B9182BFF5B004245BABC12956EA6322E07CF5C77@huls5.nucleus.harvard.edu>
References: <JTB2JF$8DA035C3FA7C0AA4E73865996D18C568@libero.it>	<B92AAA3F-C93A-41EC-B68D-3E6F4053BBD4@uiuc.edu>
	<B9182BFF5B004245BABC12956EA6322E07CF5C0F@huls5.nucleus.harvard.edu>
	<476AB8FD.8090108@sendu.me.uk>
	<B9182BFF5B004245BABC12956EA6322E07CF5C77@huls5.nucleus.harvard.edu>
Message-ID: <2EC6A1C2-FBC9-45F6-AD1B-040E29FAFA28@uiuc.edu>


On Dec 20, 2007, at 12:52 PM, Amir Karger wrote:

>> Amir Karger wrote:
>>>> It would be nice to code up a lazy sequence object and related
>>>> parsers; maybe for the next dev release.
>>>
>>> Also, BLAST parsing. Blasting the proteome against the
>> genome makes for
>>> rather large result files.
>>
>> This has already been done. Use Bio::SearchIO::blast_pull. In a
>> situation like yours I dropped run time from 20223s to
>> 951s (~20x faster) and memory usage from over 8GB to less
>> than 5GB (~40%
>> less).
>
> Not in 1.5.1. Is it in 1.5.2 or just in cvs? Is there a single file I
> can put in my own perl lib for this, or does it require large  
> bunches of
> new code? (I'm guessing the latter.) We're about to upgrade to 1.5.2
> here, but I don't see our whole center using CVS Bioperl.
>
> -Amir

It's in CVS.

Just to note: there have been a lot of changes between 1.5.1 and  
1.5.2, and probably as many from 1.5.2 to now.  We are cleaning up  
some code introduced prior to the 1.5 release and working on other  
fixes and code docs, with the final aim to be a new 1.6; I'm hoping  
that release will have routine point releases for bug fixes.  Of  
course that'll have to wait until after SVN migration!

There are a few discussions on the list about speeding up parsing using  
lightweight/featherweight objects or even straight hashes (for  
instance, Jason has a lightweight seqfeature implementation committed  
on a branch which is quite fast, and Sendu's Bio::SearchIO PullParser  
implementations).  My feeling is that will be part of the next dev  
release, along with GFF3 integration and code cleanup.

chris


From bix at sendu.me.uk  Thu Dec 20 18:29:30 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 20 Dec 2007 23:29:30 +0000
Subject: [Bioperl-l] dealing with large files
In-Reply-To: <B9182BFF5B004245BABC12956EA6322E07CF5C77@huls5.nucleus.harvard.edu>
References: <JTB2JF$8DA035C3FA7C0AA4E73865996D18C568@libero.it>	<B92AAA3F-C93A-41EC-B68D-3E6F4053BBD4@uiuc.edu>	<B9182BFF5B004245BABC12956EA6322E07CF5C0F@huls5.nucleus.harvard.edu>	<476AB8FD.8090108@sendu.me.uk>
	<B9182BFF5B004245BABC12956EA6322E07CF5C77@huls5.nucleus.harvard.edu>
Message-ID: <476AFADA.20604@sendu.me.uk>

Amir Karger wrote:
>> Amir Karger wrote:
>>>> It would be nice to code up a lazy sequence object and related  
>>>> parsers; maybe for the next dev release.
>>> Also, BLAST parsing. Blasting the proteome against the 
>>> genome makes for rather large result files.
>> This has already been done. Use Bio::SearchIO::blast_pull. In a 
>> situation like yours I dropped run time from 20223s to
>> 951s (~20x faster) and memory usage from over 8GB to less 
>> than 5GB (~40% less).
> 
> Not in 1.5.1. Is it in 1.5.2 or just in cvs? Is there a single file I
> can put in my own perl lib for this, or does it require large bunches of
> new code? (I'm guessing the latter.) We're about to upgrade to 1.5.2
> here, but I don't see our whole center using CVS Bioperl.

blast_pull is only in CVS (and needs a whole bunch of associated modules 
to work), though 1.5.2 also contains significant improvements to 
SearchIO generally which should provide you with significant speed 
improvements during blast parsing with the normal Bio::SearchIO::blast.


From bug-bioperl at rt.cpan.org  Fri Dec 21 07:07:39 2007
From: bug-bioperl at rt.cpan.org (Brandi Cantarel via RT)
Date: Fri, 21 Dec 2007 07:07:39 -0500
Subject: [Bioperl-l] [rt.cpan.org #31796] SeqIO
In-Reply-To: <5F694A96-AC4B-4279-8060-9E28A92837ED@afmb.univ-mrs.fr>
References: <RT-Ticket-31796@rt.cpan.org>
	<5F694A96-AC4B-4279-8060-9E28A92837ED@afmb.univ-mrs.fr>
Message-ID: <rt-3.6.HEAD-25638-1198238855-470.31796-4-0@rt.cpan.org>


Fri Dec 21 07:07:30 2007: Request 31796 was acted upon.
Transaction: Ticket created by brandi.cantarel at afmb.univ-mrs.fr
       Queue: bioperl
     Subject: SeqIO
   Broken in: (no value)
    Severity: (no value)
       Owner: Nobody
  Requestors: brandi.cantarel at afmb.univ-mrs.fr
      Status: new
 Ticket <URL: http://rt.cpan.org/Ticket/Display.html?id=31796 >


I might have found a bug in SeqIO in bioperl.  Well, it is actually a  
memory leak.  When I try to load a large file, I can step through the  
first 10K or so sequences (using next_seq) but then it just hangs...

If this bug is fixed please let me know.

Brandi Cantarel



From bug-bioperl at rt.cpan.org  Fri Dec 21 08:57:20 2007
From: bug-bioperl at rt.cpan.org (Sendu Bala via RT)
Date: Fri, 21 Dec 2007 08:57:20 -0500
Subject: [Bioperl-l] [rt.cpan.org #31796] SeqIO
In-Reply-To: <5F694A96-AC4B-4279-8060-9E28A92837ED@afmb.univ-mrs.fr>
References: <RT-Ticket-31796@rt.cpan.org>
	<5F694A96-AC4B-4279-8060-9E28A92837ED@afmb.univ-mrs.fr>
Message-ID: <rt-3.6.HEAD-25615-1198245436-879.31796-5-0@rt.cpan.org>


       Queue: bioperl
 Ticket <URL: http://rt.cpan.org/Ticket/Display.html?id=31796 >

On Fri Dec 21 07:07:30 2007, brandi.cantarel at afmb.univ-mrs.fr wrote:
> I might have found a bug in SeqIO in bioperl.  Well, it is actually a  
> memory leak.  When I try to load a large file, I can step through the  
> first 10K or so sequences (using next_seq) but then it just hangs...
> 
> If this bug is fixed please let me know.

Please use http://bugzilla.bioperl.org/ to tell us about this bug. 
After creating a bug report you'll be able to attach the script in 
which you encounter the problem, which we need to diagnose this issue.


From cjfields at uiuc.edu  Sat Dec  1 04:37:50 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 30 Nov 2007 22:37:50 -0600
Subject: [Bioperl-l] Problems installing bioperl (bioperl-live tarball
	from	CVS)
In-Reply-To: <000901c833bf$33d53500$0a02a8c0@AWALL>
References: <000901c833bf$33d53500$0a02a8c0@AWALL>
Message-ID: <75FF7E93-1633-4D43-9BC0-8BE2A6A7711D@uiuc.edu>

Make sure to keep this on the list.

ncbi_gi() is only in bioperl-live (CVS); my guess is you either  
somehow got 1.5.2 instead or the bioperl-live version is not found in  
your path.  It's very likely the latter, as perl's looking for  
whatever else is present (which appears to be an older version of  
bioperl). That should give you a hint that the problem may be with  
your lib path.  Try changing the 'Use lib '/home/awaller/bioperl-live/ 
Bio'' to:

use lib '/home/awaller/bioperl-live';
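
To show why the directory matters, here is a small self-contained sketch. It
does not use BioPerl at all: it builds a throwaway module tree with a
made-up package Local::Demo and tries to load it with two different lib
paths, one pointing inside the package directory (like .../bioperl-live/Bio)
and one pointing at its parent (like .../bioperl-live).

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Path qw(make_path);

# Create <root>/Local/Demo.pm, mimicking <root>/Bio/SearchIO.pm.
my $root = tempdir(CLEANUP => 1);
make_path("$root/Local");
open my $fh, '>', "$root/Local/Demo.pm" or die "Cannot write module: $!";
print {$fh} "package Local::Demo;\nsub where { __FILE__ }\n1;\n";
close $fh;

# Try to load Local::Demo with the given directory prepended to @INC.
sub loads_with {
    my ($dir) = @_;
    local @INC = ($dir, @INC);
    delete $INC{'Local/Demo.pm'};       # forget any earlier attempt
    return eval { require Local::Demo; 1 } ? 1 : 0;
}

my $too_deep = loads_with("$root/Local");   # like .../bioperl-live/Bio
my $correct  = loads_with($root);           # like .../bioperl-live

print "lib path inside the package dir: ",
    ($too_deep ? "found" : "not found"), "\n";
print "lib path at the parent dir:      ",
    ($correct ? "found" : "not found"), "\n";
print "loaded from: $INC{'Local/Demo.pm'}\n" if $correct;
```

In the real case perl does not fail outright, because an older BioPerl
elsewhere in @INC gets picked up instead. Printing $INC{'Bio/SearchIO.pm'}
after "use Bio::SearchIO;" shows which copy was actually loaded.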

chris

On Nov 30, 2007, at 8:09 PM, alison waller wrote:

> Okay so Now I'm really confused.
> I edited > #!usr/bin/perl
>> Use lib '/home/awaller/bioperl-live/Bio.
> I ran the script below with the *special hit->ncbi from Chris.  It  
> worked,
> it was great, I got the gi! No errors, no bugs that I saw in  
> checking the
> output.  Then I went back in, edited the script to retrieve further  
> info
> (specifically the strand).  Saved it, now when I try to run it I get  
> the
> same error message that I was previously getting.
>
> barrett ~ $ perl blast_parse_awcf.pl OldMoBlastxGiTest.txt 1
> Can't locate object method "ncbi_gi" via package
> "Bio::Search::Hit::BlastHit" at blast_parse_awcf.pl line 50, <GEN1>  
> line
> 189.
>
> Thanks soo much,
>
>
> #!usr/bin/perl
>
> use strict;
> use warnings;
> use lib "/home/awaller/bioperl-live/Bio";
> use Bio::Perl;
> use Bio::SearchIO;
>
> my $usage = "to run type: blast_parse_aw.pl <blast report> <# of  
> hits per
> query> \n"; if (@ARGV != 2) { die $usage; }
>
> my $infile  = $ARGV[0];
> my $outfile = $infile . '.parsed';
> my $tophit  = $ARGV[1]; # to specify in the command line how many hits
>                      # to report for each query
>
> open( OUT, ">$outfile" ) || die "Can't open outputfile $outfile! $! 
> \n";
>
> my $report = Bio::SearchIO->new(
>  -file   => $infile,
>  -format => "blast"
> );
>
> print OUT join("\t",qw(
>              Query
>              HitDesc
>              HitAccess
>              HitGi
> 		HitBits
>              Evalue
>              %id
>              AlignLen
>              NumIdent
>              NumPos
>              gaps
>              Qframe
>              Qstrand
>              Hframe
> 		Hstrand))."\n";
>
> # Go through BLAST reports one by one
> while ( my $result = $report->next_result ) {
>  my $ct = 0;
>  my @tophits = grep {$ct++ < $tophit } $result->hits;
>  if (scalar(@tophits) == 0) {
>     print OUT "no hits\n";
>  }
>  for my $hit (@tophits) {
>     my $tophsp=$hit->hsp('best');
>     # Print some tab-delimited data about this hit
>     print OUT join("\t",
>                    $result->query_name,
>                    $hit->description,
>                    $hit->accession,
>                    $hit->ncbi_gi,
>                    $hit->bits,
>                    $tophsp->evalue,
>                    $tophsp->percent_identity,
>                    $tophsp->length('total'),
>                    $tophsp->num_identical,
>                    $tophsp->num_conserved,
>                    $tophsp->gaps,
>                    $tophsp->query->frame,
> 		      $tophsp->strand('query'),	
>                    $tophsp->hit->frame,
> 		      $tophsp->strand('hit'),	
>                   )."\n";
>  }
> }
>
>
>
>
> -----Original Message-----
> From: Sendu Bala [mailto:bix at sendu.me.uk]
> Sent: Friday, November 30, 2007 6:24 PM
> To: alison waller
> Subject: Re: [Bioperl-l] Problems installing bioperl (bioperl-live  
> tarball
> from CVS)
>
> alison waller wrote:
>> Thank you Sendu,
>>
>> So I'm trying the second option.  I have downloaded the bioperl-live
> tarball
>> from the CVS on my windows laptop, and then moved it to my home  
>> directory
> in
>> the linux cluster where I unzipped and tared it.  So I now have a
> directory
>> /home/awaller/bioperl-live.
>>
>> I edited my .bashrc file as below:
>> Export PERL5LIB='/home/awaller/bioperl-live'
>>
>> I also edited a sample script to include:
>> #!usr/bin/perl
>> Use lib '/home/awaller/bioperl-live'
>
> Does this directory contain a 'Bio' directory with all the BioPerl
> modules inside it?
>
>
>> But it still isn't working.
>> At the prompt I typed$ perl script.pl
>> It gave me the warning - can't locate object method ncbi_gi which  
>> is why
> I'm
>> trying to download the CVS version as Chris Fields added code to  
>> make the
>> ncbi-gi object.
>
> You'll have to give me the complete, unedited error message and  
> ideally
> the script itself before I can help you further.
>
>
>> Don't I have to do something similar to what the Build.PL file does?
>
> Probably not. It doesn't matter where your perl executable is, btw, as
> long as the system knows how to run perl, which it obviously does.
> <OldMoBlastxGiTest.txt.parsed><OldMoBlastxGiTest.txt>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From alan.bridge at isb-sib.ch  Sun Dec  2 18:29:48 2007
From: alan.bridge at isb-sib.ch (Alan Bridge)
Date: Sun, 02 Dec 2007 19:29:48 +0100
Subject: [Bioperl-l] Bio::Tools::Run::RemoteBlast
Message-ID: <4752F99C.9050504@isb-sib.ch>

Hello,

I was just wondering if, when performing a RemoteBlast, it would be 
possible to specify the entire UniProt database (i.e. Swiss-Prot + 
TrEMBL), or even just TrEMBL.

It seems that currently, you can only specify Swiss-Prot (the annotated 
portion of UniProt, which is much smaller than its automatically 
annotated counterpart, TrEMBL). Any hints on how to expand the search 
space to include TrEMBL would be really appreciated.

Regards, Alan Bridge

            my $prog = 'blastp';
            my $db   = 'swissprot'; # use TrEMBL ?
            my $e_val= '1e-10';

            my @params = ( '-prog' => $prog, '-data' => $db, '-expect' 
=> $e_val, '-readmethod' => 'SearchIO' );

-- 
Alan Bridge PhD
Swiss-Prot annotator
Swiss Institute of Bioinformatics (SIB)
1, rue Michel Servet
CH-1211 Geneva 4  
Switzerland   

Tel: (+41 22) 379 58 90
Fax: (+41 22) 379 58 58 

http://www.expasy.org/ 


From avilella at gmail.com  Mon Dec  3 11:39:59 2007
From: avilella at gmail.com (Albert Vilella)
Date: Mon, 3 Dec 2007 11:39:59 +0000
Subject: [Bioperl-l] Query about SLAC.pm module
In-Reply-To: <OF3E7AF746.CBFC96D8-ONC12573A6.00374CAE-C12573A6.00374CB1@sh.se>
References: <OF3E7AF746.CBFC96D8-ONC12573A6.00374CAE-C12573A6.00374CB1@sh.se>
Message-ID: <358f4d650712030339w2f3de057ge5614e60a3f6658c@mail.gmail.com>

[CCing to the bioperl ml]

Sorry, there were some bits left in the pod header referring to PAML
objects that aren't quite true.
I've now updated the PODs. The Hyphy executions return hashes:

If you run the SLAC test in t/Hyphy.t you will see that the $results
are something like:

DB<3> x 2 $results
0  HASH(0x8df3110)
   'E[NS Sites]' => ARRAY(0x8e6cff4)
   'E[S Sites]' => ARRAY(0x8e6ceb0)
   'Observed NS Changes' => ARRAY(0x8e7b380)
   'Observed S Changes' => ARRAY(0x8e7b344)
   'Observed S. Prop.' => ARRAY(0x8e6d018)
   'P{S geq. observed}' => ARRAY(0x8e6d360)
   'P{S leq. observed}' => ARRAY(0x8e6d33c)
   'P{S}' => ARRAY(0x8e6d03c)
   'Scaled dN-dS' => ARRAY(0x8e6d384)
   'dN' => ARRAY(0x8e6d084)
   'dN-dS' => ARRAY(0x8e6d0a8)
   'dS' => ARRAY(0x8e6d060)
  DB<4> x $rc

which correspond to the csv file that hyphy produces.
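
Since the result is a plain hashref of array refs, it can be walked with
ordinary Perl. A minimal sketch follows; the numbers are invented
placeholders, the helper sites_above is made up for this example, and it
assumes each array holds one value per site, in site order.

```perl
use strict;
use warnings;

# Placeholder standing in for the hashref shown above. Keys are column
# names from the dump; the values are invented for illustration only.
my $results = {
    'dN'    => [ 0.0, 1.2, 0.4 ],
    'dS'    => [ 0.5, 0.3, 0.4 ],
    'dN-dS' => [ -0.5, 0.9, 0.0 ],
};

# Return the 1-based site numbers where the given column exceeds a cutoff.
sub sites_above {
    my ($res, $column, $cutoff) = @_;
    my $values = $res->{$column} or die "No such column: $column";
    return grep { $values->[ $_ - 1 ] > $cutoff } 1 .. scalar @$values;
}

my @sites = sites_above($results, 'dN-dS', 0);
print "Sites with dN-dS > 0: @sites\n";
```

Calling a method such as get_seqs on this structure fails precisely because
it is an unblessed hashref, not an object.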

Cheers,

    Albert.

On Dec 3, 2007 10:04 AM, Johan Nilsson <johan.nilsson at sh.se> wrote:
>
> Dear Dr. Vilella,
>
> Please allow me to introduce myself. My name is Johan Nilsson and I am a
> postdoctoral researcher in bioinformatics.
>
> I was  planning to perform a large-scale analysis for positively selected
> protein coding genes using any appropriate method from the Hyphy package,
> and I thought your bioperl wrappers 'SLAC.pm', 'FEL.pm' etc. should be very
> useful for this.
>
> IF I interpreted the documents of e.g. the SLAC module correctly, running
> $slac->run($aln,$tree) will return a
> Bio::Tools::Phylo::PAML object. However, when I try to retrieve any results
> from the obtained hashref (running my script on the test files provided
> with bioperl ...t/hyphy1.tree and ...t/hyphy1.fasta), the script complains
> that it is not blessed (e.g. 'Can't call method "get_seqs" on unblessed
> reference').
>
> I am fairly new to bioperl, so please excuse me if this question was a
> stupid one :)
>
> Thanks in advance!
>
> Yours Sincerely
> /Johan
>
> --
> Johan Nilsson, Ph.D.
> School of Life Sciences
> S?dert?rns University College
> S-141 89 Huddinge, Sweden
> E-mail: johan.nilsson at sh.se
> Phone: +46 8 608 47 05, +46 70 456 10 51
>
>


From cjfields at uiuc.edu  Mon Dec  3 14:04:06 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 3 Dec 2007 08:04:06 -0600
Subject: [Bioperl-l] Bio::Tools::Run::RemoteBlast
In-Reply-To: <4752F99C.9050504@isb-sib.ch>
References: <4752F99C.9050504@isb-sib.ch>
Message-ID: <CF967851-5E6C-448A-87C6-CC3F63A5D9AD@uiuc.edu>

You are limited to the databases hosted on the NCBI server, so it's  
really up to them; RemoteBlast is an interface to NCBI's WebBlast  
using URLAPI.

A list of current databases can be found here:

http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/remote_blastdblist.html

chris

On Dec 2, 2007, at 12:29 PM, Alan Bridge wrote:

> Hello,
>
> I was just wondering if, when performing a RemoteBlast, it would be
> possible to specify the entire UniProt database (i.e. Swiss-Prot +
> TrEMBL), or even just TrEMBL.
>
> It seems that currently, you can only specify Swiss-Prot (the  
> annotated
> portion of UniProt, which is much smaller than its automatically
> annotated counterpart, TrEMBL). Any hints on how to expand the search
> space to include TrEMBL would be really appreciated.
>
> Regards, Alan Bridge
>
>            my $prog = 'blastp';
>            my $db   = 'swissprot'; # use TrEMBL ?
>            my $e_val= '1e-10';
>
>            my @params = ( '-prog' => $prog, '-data' => $db, '-expect'
> => $e_val, '-readmethod' => 'SearchIO' );
>
> -- 
> Alan Bridge PhD
> Swiss-Prot annotator
> Swiss Institute of Bioinformatics (SIB)
> 1, rue Michel Servet
> CH-1211 Geneva 4
> Switzerland
>
> Tel: (+41 22) 379 58 90
> Fax: (+41 22) 379 58 58
>
> http://www.expasy.org/


From bioperl at boekhoff.info  Mon Dec  3 19:14:24 2007
From: bioperl at boekhoff.info (Sven Boekhoff)
Date: Mon, 03 Dec 2007 20:14:24 +0100
Subject: [Bioperl-l] [StandAloneBLAST] Use more than one CPU + avoid BLAST
	reload
Message-ID: <47545590.1000703@boekhoff.info>

HI!
I just started working with Perl and BioPerl. I'm quite impressed by what 
can easily be done with this module. Today I found that my second CPU 
is not used, but the first one runs at 100%. I tried to include the 
"-a" parameter, but I was not successful:

my @params = (
	-database => 'my_db',
	-a => '2',
	-outfile => 'blast1.out'
);

How do I have to use it?

Second question: In my Perl script I start BLAST searches in a loop. 
Every time BLAST has finished its search, the memory is cleared and BLAST 
is started again. I think most of the time is spent reloading the 
database. Is it somehow possible to keep the database loaded (e.g. by 
starting a second search) or is BLAST reloaded anyway?

Thanks for your help!

Sven


www.boekhoff.info


From bix at sendu.me.uk  Tue Dec  4 00:05:23 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 04 Dec 2007 00:05:23 +0000
Subject: [Bioperl-l] [StandAloneBLAST] Use more than one CPU + avoid
 BLAST reload
In-Reply-To: <47545590.1000703@boekhoff.info>
References: <47545590.1000703@boekhoff.info>
Message-ID: <475499C3.20801@sendu.me.uk>

Sven Boekhoff wrote:
> HI!
> I just started working with Perl and BioPerl. I'm quite impressed by what 
> can easily be done with this module. Today I found that my second CPU 
> is not used, but the first one runs at 100%. I tried to include the 
> "-a" parameter, but I was not successful:
> 
> my @params = (
> 	-database => 'my_db',
> 	-a => '2',
> 	-outfile => 'blast1.out'
> );
> 
> How do I have to use it?

This should work in the CVS version of StandAloneBlast. In other 
versions, perhaps try using $object->a(2);


> Second question: In my perlscript I start BLAST-searches in a loop. 
> Everytime BLAST has finished its search, the memory is cleared and BLAST 
> is started again. I think most of the time is used to reload the 
> database. Is it somehow possible to keep the database loaded (e.g. by 
> starting a second search) or is BLAST reloaded anyway?

I hope someone will correct me if I'm wrong, but I think you'd have 
to do that with a 2-way pipe. StandAloneBlast only uses output to a file 
and input from that file, finishing with the executable in between. I've 
thought about improving it with a 2-way pipe, but never got around to 
it, being apprehensive about stability on all platforms.

The more obvious solution, which may be possible depending on exactly 
what you're doing, is to avoid the loop and just supply Blast all your 
input in one go.
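
As a rough illustration of supplying all input in one go: the sketch below 
joins several queries into one multi-FASTA string, which could be written to 
a single file and handed to BLAST once instead of once per loop iteration. 
The helper name to_multifasta and the query IDs are made up for this example, 
and it uses plain Perl only.

```perl
use strict;
use warnings;

# Build one multi-FASTA string from a list of [id, sequence] pairs.
# Sequence lines are wrapped at 60 characters.
sub to_multifasta {
    my (@queries) = @_;
    my $fasta = '';
    for my $query (@queries) {
        my ($id, $seq) = @$query;
        $fasta .= ">$id\n";
        $fasta .= "$1\n" while $seq =~ /(.{1,60})/g;
    }
    return $fasta;
}

# Example: collect the queries first, then write them out once.
# open my $out, '>', 'all_queries.fa' or die "Cannot write: $!\n";
# print {$out} to_multifasta(['q1', 'ACGT'], ['q2', 'GGCC']);
# close $out;
```

The resulting file could then be used as the input of a single blastall run. 
As far as I know, StandAloneBlast's blastall also takes a file name or an 
array reference of Bio::Seq objects, but check the POD of your version.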


From Russell.Smithies at agresearch.co.nz  Tue Dec  4 00:49:21 2007
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Tue, 4 Dec 2007 13:49:21 +1300
Subject: [Bioperl-l] Bio::Assembly::IO problems reading .ace files
In-Reply-To: <475499C3.20801@sendu.me.uk>
References: <47545590.1000703@boekhoff.info> <475499C3.20801@sendu.me.uk>
Message-ID: <D5DBA313349A4B458528BE63B387F36C062D5E2A@imail.agresearch.co.nz>

Hi all,

I'm trying to read .ace files but keep getting an error whose cause I
don't know.
A really basic example:

	#!/usr/local/bin/perl -w

	use lib "/data/home/smithiesr/bioperl-live";
	use Bio::Assembly::IO;
	use Data::Dumper;

	$ace = "CLP0001001240-cE15_20030319.ace";

	$io = new Bio::Assembly::IO(-file=>$ace,-format=>"ace");
	$assembly = $io->next_assembly;

	foreach $contig ($assembly->all_contigs) {
      		print Dumper $contig;
	}

Gives this error:
	[smithiesr at impala ace_phrap]$ perl bp_read_ace.pl
	Can't call method "get_consensus_sequence" on an undefined value at /data/home/smithiesr/bioperl-live/Bio/Assembly/IO/ace.pm line 170, <GEN0> line 42.

Which relates to this bit in ace.pm:
	# Loading contig qualities... (Base Quality field)
	/^BQ/ && do {
	    my $consensus = $contigOBJ->get_consensus_sequence()->seq();

Is this caused by a dud ace file, a problem with Bio::Assembly::IO::ace,
or the Contig object not getting created?
Any ideas?

Thanx,

Russell Smithies

Bioinformatics Software Developer
T +64 3 489 9085
E  russell.smithies at agresearch.co.nz

Invermay  Research Centre
Puddle Alley, 
Mosgiel, 
New Zealand
T  +64 3 489 3809   
F  +64 3 489 9174  
www.agresearch.co.nz

=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================


From cjfields at uiuc.edu  Tue Dec  4 02:15:58 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 3 Dec 2007 20:15:58 -0600
Subject: [Bioperl-l] Bio::Assembly::IO problems reading .ace files
In-Reply-To: <D5DBA313349A4B458528BE63B387F36C062D5E2A@imail.agresearch.co.nz>
References: <47545590.1000703@boekhoff.info> <475499C3.20801@sendu.me.uk>
	<D5DBA313349A4B458528BE63B387F36C062D5E2A@imail.agresearch.co.nz>
Message-ID: <692A2BDA-048B-49C6-A101-C13A1DAB9B69@uiuc.edu>

This seems similar to the 'too many open filehandles issue' documented  
here:

http://bugzilla.open-bio.org/show_bug.cgi?id=2320

It unfortunately is due to having an open DB_File for every contig,  
and is a problem with the Bio::Assembly implementation that isn't  
easily fixed.  Changing the open filehandle limit using ulimit is the  
only known fix:

ulimit -n 10000

chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From florent.angly at gmail.com  Tue Dec  4 02:25:24 2007
From: florent.angly at gmail.com (Florent Angly)
Date: Mon, 03 Dec 2007 18:25:24 -0800
Subject: [Bioperl-l] Bio::Assembly::IO problems reading .ace files
In-Reply-To: <692A2BDA-048B-49C6-A101-C13A1DAB9B69@uiuc.edu>
References: <47545590.1000703@boekhoff.info>
	<475499C3.20801@sendu.me.uk>	<D5DBA313349A4B458528BE63B387F36C062D5E2A@imail.agresearch.co.nz>
	<692A2BDA-048B-49C6-A101-C13A1DAB9B69@uiuc.edu>
Message-ID: <4754BA94.7090600@gmail.com>

Would this issue cause excessive memory usage? I was seeing 
high memory usage when parsing some TIGR Assembler files and was 
wondering whether the tigr parser or the parent assembly IO module 
was responsible.
I'd definitely be interested in a fix to the Bio::Assembly 
implementation if it's the assembly IO module's fault.
Florent

Chris Fields wrote:
> This seems similar to the 'too many open filehandles issue' documented  
> here:
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2320
>
> It unfortunately is due to having an open DB_File for every contig,  
> and is a problem with the Bio::Assembly implementation that isn't  
> easily fixed.  Changing the open filehandle limit using ulimit is the  
> only known fix:
>
> ulimit -n 10000
>
> chris


From Russell.Smithies at agresearch.co.nz  Tue Dec  4 02:32:43 2007
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Tue, 4 Dec 2007 15:32:43 +1300
Subject: [Bioperl-l] Bio::Assembly::IO problems reading .ace files
In-Reply-To: <692A2BDA-048B-49C6-A101-C13A1DAB9B69@uiuc.edu>
References: <47545590.1000703@boekhoff.info> <475499C3.20801@sendu.me.uk>
	<D5DBA313349A4B458528BE63B387F36C062D5E2A@imail.agresearch.co.nz>
	<692A2BDA-048B-49C6-A101-C13A1DAB9B69@uiuc.edu>
Message-ID: <D5DBA313349A4B458528BE63B387F36C062D5E91@imail.agresearch.co.nz>

Thanx Chris,
I'm only writing a simple .ace viewer to display assembled contigs in a
Bio::Graphics::Panel so I'll parse the coords from the .ace files
"manually".
Unless anyone else has a better idea ?
(and some example code ;-)

Russell
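
For what it's worth, parsing the coordinates "manually" might look roughly 
like the sketch below. It assumes the usual ACE layout, where a CO line opens 
a contig, AF lines give each read's orientation and padded start position in 
the consensus, and RD lines give the padded read length. The function name 
parse_ace_coords and the returned hash layout are made up for this example, 
and nothing here touches Bio::Assembly.

```perl
use strict;
use warnings;

# Collect contig lengths and read placements from an ACE filehandle.
# Only the CO, AF and RD header lines are inspected; all other lines are skipped.
sub parse_ace_coords {
    my ($fh) = @_;
    my (%contigs, $current);
    while (my $line = <$fh>) {
        if ($line =~ /^CO\s+(\S+)\s+(\d+)/) {
            # CO <contig name> <padded length> ...
            $current = $1;
            $contigs{$current} = { length => $2, reads => {} };
        }
        elsif (defined $current && $line =~ /^AF\s+(\S+)\s+([UC])\s+(-?\d+)/) {
            # AF <read name> <U|C> <padded start in consensus coordinates>
            $contigs{$current}{reads}{$1} = { strand => $2, start => $3 };
        }
        elsif (defined $current && $line =~ /^RD\s+(\S+)\s+(\d+)/) {
            # RD <read name> <padded read length> ...
            my $read = $contigs{$current}{reads}{$1} or next;
            $read->{end} = $read->{start} + $2 - 1;
        }
    }
    return \%contigs;
}

# Example:
# open my $fh, '<', 'CLP0001001240-cE15_20030319.ace' or die "Cannot open: $!\n";
# my $contigs = parse_ace_coords($fh);
```

The start and end values are padded consensus positions and can be negative 
when a read overhangs the start of the contig, so they may need shifting 
before being drawn in a Bio::Graphics::Panel.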


From cjfields at uiuc.edu  Tue Dec  4 05:10:57 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 3 Dec 2007 23:10:57 -0600
Subject: [Bioperl-l] Bio::Assembly::IO problems reading .ace files
In-Reply-To: <4754BA94.7090600@gmail.com>
References: <47545590.1000703@boekhoff.info>
	<475499C3.20801@sendu.me.uk>	<D5DBA313349A4B458528BE63B387F36C062D5E2A@imail.agresearch.co.nz>
	<692A2BDA-048B-49C6-A101-C13A1DAB9B69@uiuc.edu>
	<4754BA94.7090600@gmail.com>
Message-ID: <4F867A88-C0DC-4DF7-9F47-C38712920183@uiuc.edu>

Yes, it's possible this would cause memory issues, as each  
Bio::Assembly::Contig instance has a  
Bio::SeqFeature::Collection attached (each Collection having a tied DB  
hash, which means an open filehandle). So if you had over 1000  
contigs open at any one time (in a parsed scaffold, for instance) you  
would have over 1000 open filehandles. Not very efficient.

My thought was to have each Bio::Assembly::Scaffold instance carry a  
single Bio::SeqFeature::CollectionI (it could be a  
Bio::SeqFeature::Collection, Bio::DB::SeqFeature::Store, or any other  
CollectionI, whatever's easiest). Each Contig would be passed (and  
would store) a reference to the Scaffold's SF::Collection and pull its  
features from there; I just haven't had time to mess with it. I don't think  
anyone's tackling it, so feel free to code away!
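
To make the idea concrete, here is a toy sketch of the ownership pattern 
only. Sketch::Scaffold and Sketch::Contig are hypothetical stand-ins, not the 
real Bio::Assembly classes, and a plain hash plays the role of the single 
CollectionI; the point is just that every contig holds a reference to one 
shared store instead of opening its own.

```perl
use strict;
use warnings;

{
    package Sketch::Scaffold;

    sub new {
        my ($class) = @_;
        # One feature store for the whole scaffold (a plain hash here).
        return bless { store => {}, contigs => {} }, $class;
    }

    sub add_contig {
        my ($self, $id) = @_;
        # Every contig is handed the same store reference.
        my $contig = Sketch::Contig->new($id, $self->{store});
        $self->{contigs}{$id} = $contig;
        return $contig;
    }
}

{
    package Sketch::Contig;

    sub new {
        my ($class, $id, $store) = @_;
        # Keep only a reference to the scaffold's store, never a private one.
        return bless { id => $id, store => $store }, $class;
    }

    sub add_feature {
        my ($self, $feature) = @_;
        push @{ $self->{store}{ $self->{id} } }, $feature;
        return $self;
    }

    sub features {
        my ($self) = @_;
        return @{ $self->{store}{ $self->{id} } || [] };
    }
}
```

With a tied DB hash in place of the plain hash, this layout would need one 
open filehandle per scaffold rather than one per contig.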

chris

On Dec 3, 2007, at 8:25 PM, Florent Angly wrote:

> Would this issue cause an excessive memory usage? Because I was  
> getting a high memory usage when parsing some TIGR Assembler files  
> and was wondering if the tigr parser was responsible for that or the  
> parent assembly IO module.
> I'd definitely be interested in a fix of the Bio::Assembly  
> implementation if it's the assembly IO module's fault....
> Florent

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Tue Dec  4 05:20:07 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 3 Dec 2007 23:20:07 -0600
Subject: [Bioperl-l] Bio::Assembly::IO problems reading .ace files
In-Reply-To: <D5DBA313349A4B458528BE63B387F36C062D5E91@imail.agresearch.co.nz>
References: <47545590.1000703@boekhoff.info> <475499C3.20801@sendu.me.uk>
	<D5DBA313349A4B458528BE63B387F36C062D5E2A@imail.agresearch.co.nz>
	<692A2BDA-048B-49C6-A101-C13A1DAB9B69@uiuc.edu>
	<D5DBA313349A4B458528BE63B387F36C062D5E91@imail.agresearch.co.nz>
Message-ID: <C48EC1AC-FEA6-4F60-9791-D4DE449768C2@uiuc.edu>

The ulimit fix usually works, but if this is for GBrowse it probably  
isn't prudent. It would be nice to get Bio::Assembly working as a  
Bio::AlignI; it would be easier to manipulate for display. Here's a  
script I wrote up as an example:

http://www.bioperl.org/wiki/HOWTO_Discussion:Graphics

chris

On Dec 3, 2007, at 8:32 PM, Smithies, Russell wrote:

> Thanx Chris,
> I'm only writing a simple .ace viewer to display assembled contigs  
> in a
> Bio::Graphics::Panel so I'll parse the coords from the .ace files
> "manually".
> Unless anyone else has a better idea ?
> (and some example code ;-)
>
> Russell


From avilella at gmail.com  Tue Dec  4 11:51:05 2007
From: avilella at gmail.com (Albert Vilella)
Date: Tue, 4 Dec 2007 11:51:05 +0000
Subject: [Bioperl-l] New Bio::Tools::Run::Phylo::SLR - Wrapper around the
	SLR program
Message-ID: <358f4d650712040351g4bef4417l4197d06454049140@mail.gmail.com>

Hi all,

There is a new wrapper in bioperl-run for SLR:

http://www.bioperl.org/wiki/SLR

Right now, output parsing is very simple, and I have only tested it on
my linux machine.
Can someone with a Mac give it a try?

update your bioperl-run to cvs head, then:

# try the installer, SLR is option 6
perl scripts/bioperl_application_installer.PLS
# then try to run the tests (should take about a minute)
perl t/SLR.t

Any comments on the code would be appreciated,

Thanks in advance,

Cheers,

    Albert.


From captainrave at hotmail.com  Tue Dec  4 11:04:57 2007
From: captainrave at hotmail.com (Captainrave)
Date: Tue, 4 Dec 2007 03:04:57 -0800 (PST)
Subject: [Bioperl-l]  extracting CDS location from Genbank
Message-ID: <14148723.post@talk.nabble.com>


Help.  I'm very new to perl and bioperl.  Basically I need to extract the
location of each CDS in a genbank entry e.g.103...120 and export them to an
output file as a list.  How would I do this?

Your help would be much appreciated!
-- 
View this message in context: http://www.nabble.com/extracting-CDS-location-from-Genbank-tf4942483.html#a14148723
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From michael.watson at bbsrc.ac.uk  Tue Dec  4 14:48:27 2007
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Tue, 4 Dec 2007 14:48:27 -0000
Subject: [Bioperl-l] extracting CDS location from Genbank
In-Reply-To: <14148723.post@talk.nabble.com>
References: <14148723.post@talk.nabble.com>
Message-ID: <8975119BCD0AC5419D61A9CF1A923E9505A4F76E@iahce2ksrv1.iah.bbsrc.ac.uk>

From the SeqIO HOWTO:

#!/bin/perl

use strict;
use Bio::SeqIO;

my $file = shift; # get the file name, somehow
my $seqio_object = Bio::SeqIO->new(-file => $file);
my $seq_object = $seqio_object->next_seq;

From the Feature HOWTO:

for my $feat_object ($seq_object->get_SeqFeatures) {
   print "primary tag: ", $feat_object->primary_tag, "\n";
   for my $tag ($feat_object->get_all_tags) {
      print "  tag: ", $tag, "\n";
      for my $value ($feat_object->get_tag_values($tag)) {
         print "    value: ", $value, "\n";
      }
   }
}

Surely you could have found that yourself? ;0
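
The two snippets above print every feature and tag; narrowing them down to 
CDS locations could look like the sketch below. It assumes the input is a 
GenBank file and that the feature objects provide primary_tag, start and end, 
as BioPerl's sequence features do. The sub name cds_locations and the file 
names in the usage comment are made up for this example.

```perl
use strict;
use warnings;

# Return one "start..end" string per CDS feature of a sequence object.
sub cds_locations {
    my ($seq) = @_;
    my @locations;
    for my $feat ($seq->get_SeqFeatures) {
        next unless $feat->primary_tag eq 'CDS';
        push @locations, $feat->start . '..' . $feat->end;
    }
    return @locations;
}

# Usage: perl cds_list.pl input.gbk cds_list.txt
if (@ARGV == 2) {
    require Bio::SeqIO;    # BioPerl, loaded only when run on a real file
    my ($in_file, $out_file) = @ARGV;
    my $in = Bio::SeqIO->new(-file => $in_file, -format => 'genbank');
    open my $out, '>', $out_file or die "Cannot write $out_file: $!\n";
    while (my $seq = $in->next_seq) {
        print {$out} "$_\n" for cds_locations($seq);
    }
    close $out;
}
```

For a CDS made of several exons, start and end only give the outer bounds; 
$feat->location->to_FTstring should return the full GenBank-style string 
(e.g. a join(...) expression) if that is what you need.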


From captainrave at hotmail.com  Tue Dec  4 15:07:19 2007
From: captainrave at hotmail.com (Captainrave)
Date: Tue, 4 Dec 2007 07:07:19 -0800 (PST)
Subject: [Bioperl-l] extracting CDS location from Genbank
In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E9505A4F76E@iahce2ksrv1.iah.bbsrc.ac.uk>
References: <14148723.post@talk.nabble.com>
	<8975119BCD0AC5419D61A9CF1A923E9505A4F76E@iahce2ksrv1.iah.bbsrc.ac.uk>
Message-ID: <14152264.post@talk.nabble.com>


Yes but actually implementing it is another story.

I get an error:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: file argument provided, but with an undefined value
STACK: Error::throw
STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359
STACK: Bio::SeqIO::new C:/Perl/site/lib/Bio/SeqIO.pm:359
STACK: test3.pl:7
-----------------------------------------------------------

Basically because I don't understand the code well enough.  For example, how
do I tell it which input file to read? I know this might sound stupid, but I
don't understand the Biowiki very well!


From michael.watson at bbsrc.ac.uk  Tue Dec  4 15:21:34 2007
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Tue, 4 Dec 2007 15:21:34 -0000
Subject: [Bioperl-l] extracting CDS location from Genbank
In-Reply-To: <14152264.post@talk.nabble.com>
References: <14148723.post@talk.nabble.com><8975119BCD0AC5419D61A9CF1A923E9505A4F76E@iahce2ksrv1.iah.bbsrc.ac.uk>
	<14152264.post@talk.nabble.com>
Message-ID: <8975119BCD0AC5419D61A9CF1A923E9505A4F771@iahce2ksrv1.iah.bbsrc.ac.uk>

Post the script that produces that error, and your file's location 


From bix at sendu.me.uk  Tue Dec  4 15:39:31 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 04 Dec 2007 15:39:31 +0000
Subject: [Bioperl-l] extracting CDS location from Genbank
In-Reply-To: <14152264.post@talk.nabble.com>
References: <14148723.post@talk.nabble.com>	<8975119BCD0AC5419D61A9CF1A923E9505A4F76E@iahce2ksrv1.iah.bbsrc.ac.uk>
	<14152264.post@talk.nabble.com>
Message-ID: <475574B3.8050700@sendu.me.uk>

Captainrave wrote:
> Yes but actually implementing it is another story.
> 
> I get an error:
> 
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: file argument provided, but with an undefined value
> STACK: Error::throw
> STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359
> STACK: Bio::SeqIO::new C:/Perl/site/lib/Bio/SeqIO.pm:359
> STACK: test3.pl:7
> -----------------------------------------------------------

The best way to get help is to give us your script and the error 
message, and the command you used to run your script. The less you know, 
the more you should give us (ie. don't edit anything out).


From captainrave at hotmail.com  Tue Dec  4 15:41:37 2007
From: captainrave at hotmail.com (Captainrave)
Date: Tue, 4 Dec 2007 07:41:37 -0800 (PST)
Subject: [Bioperl-l] extracting CDS location from Genbank
In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E9505A4F771@iahce2ksrv1.iah.bbsrc.ac.uk>
References: <14148723.post@talk.nabble.com>
	<8975119BCD0AC5419D61A9CF1A923E9505A4F76E@iahce2ksrv1.iah.bbsrc.ac.uk>
	<14152264.post@talk.nabble.com>
	<8975119BCD0AC5419D61A9CF1A923E9505A4F771@iahce2ksrv1.iah.bbsrc.ac.uk>
Message-ID: <14152907.post@talk.nabble.com>


#!/bin/perl

use strict;
use Bio::SeqIO;
my $file = shift; # get the file name, somehow
my $seqio_object = Bio::SeqIO->new(-file => $file);
my $seq_object = $seqio_object->next_seq;

for my $feat_object ($seq_object->get_SeqFeatures) {          
   print "primary tag: ", $feat_object->primary_tag, "\n";          
   for my $tag ($feat_object->get_all_tags) {            
      print "  tag: ", $tag, "\n";            
      for my $value ($feat_object->get_tag_values($tag)) {

         print "    value: ", $value, "\n";            
      }          
   }      
}

exit;

The file is in the same folder.  But how do I tell it to use this file?


From michael.watson at bbsrc.ac.uk  Tue Dec  4 15:53:22 2007
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Tue, 4 Dec 2007 15:53:22 -0000
Subject: [Bioperl-l] extracting CDS location from Genbank
In-Reply-To: <14152907.post@talk.nabble.com>
References: <14148723.post@talk.nabble.com><8975119BCD0AC5419D61A9CF1A923E9505A4F76E@iahce2ksrv1.iah.bbsrc.ac.uk><14152264.post@talk.nabble.com><8975119BCD0AC5419D61A9CF1A923E9505A4F771@iahce2ksrv1.iah.bbsrc.ac.uk>
	<14152907.post@talk.nabble.com>
Message-ID: <8975119BCD0AC5419D61A9CF1A923E9505A4F77A@iahce2ksrv1.iah.bbsrc.ac.uk>

Same script as below, but try:

my $file = 'C:\path\to\my\filename.gbk'; 
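
Alternatively, keep "my $file = shift;" and give the file name on the command 
line. At the top level of a script, shift takes the first command-line 
argument, so "perl test3.pl" leaves $file undefined (which is what the 
exception reported earlier in this thread complains about), while 
"perl test3.pl filename.gbk" sets it. A small sketch that fails with a 
clearer message; the sub name input_file_from_args is made up for this 
example:

```perl
use strict;
use warnings;

# Pick the input file from a list of command-line arguments and fail early
# with a readable message instead of passing undef on to Bio::SeqIO.
sub input_file_from_args {
    my @args = @_;
    my $file = shift @args;
    die "Usage: perl test3.pl <genbank_file>\n" unless defined $file;
    die "Cannot find file '$file'\n" unless -e $file;
    return $file;
}

# In the script itself this would replace "my $file = shift;":
# my $file = input_file_from_args(@ARGV);
```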

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Captainrave
Sent: 04 December 2007 15:42
To: Bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] extracting CDS location from Genbank


#!/bin/perl

use strict;
use Bio::SeqIO;
my $file = shift; # get the file name, somehow
my $seqio_object = Bio::SeqIO->new(-file => $file);
my $seq_object = $seqio_object->next_seq;

for my $feat_object ($seq_object->get_SeqFeatures) {          
   print "primary tag: ", $feat_object->primary_tag, "\n";          
   for my $tag ($feat_object->get_all_tags) {            
      print "  tag: ", $tag, "\n";            
      for my $value ($feat_object->get_tag_values($tag)) {

         print "    value: ", $value, "\n";            
      }          
   }      
}

exit;

The file is on the same folder.  But how do I tell it to use this file?


michael watson (IAH-C) wrote:
> 
> Post the script that produces that error, and your file's location 
> 
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Captainrave
> Sent: 04 December 2007 15:07
> To: Bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] extracting CDS location from Genbank
> 
> 
> Yes but actually implementing it is another story.
> 
> I get an error:
> 
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: file argument provided, but with an undefined value
> STACK: Error::throw
> STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359
> STACK: Bio::SeqIO::new C:/Perl/site/lib/Bio/SeqIO.pm:359
> STACK: test3.pl:7
> -----------------------------------------------------------
> 
> Basically because I dont understand the code well enough.  For example, how
> do I tell it which input file to read? I know this might sound stupid, but I
> dont understand the Biowiki very well!
> 
> -- 
> View this message in context:
> http://www.nabble.com/extracting-CDS-location-from-Genbank-tf4942483.html#a14152264
> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> 

-- 
View this message in context:
http://www.nabble.com/extracting-CDS-location-from-Genbank-tf4942483.html#a14152907
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From cjfields at uiuc.edu  Tue Dec  4 16:20:34 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 4 Dec 2007 10:20:34 -0600
Subject: [Bioperl-l] extracting CDS location from Genbank
In-Reply-To: <14152907.post@talk.nabble.com>
References: <14148723.post@talk.nabble.com>
	<8975119BCD0AC5419D61A9CF1A923E9505A4F76E@iahce2ksrv1.iah.bbsrc.ac.uk>
	<14152264.post@talk.nabble.com>
	<8975119BCD0AC5419D61A9CF1A923E9505A4F771@iahce2ksrv1.iah.bbsrc.ac.uk>
	<14152907.post@talk.nabble.com>
Message-ID: <C2732712-D32B-449A-8BCA-DCB8BBDE9758@uiuc.edu>

The 'my $file = shift;' is a Perl idiom.  Called without arguments
outside a subroutine, the built-in 'shift' operates on @ARGV (the
command-line arguments), so the file would then be passed as the first
argument when running the script:

perl get_features.pl myfile.gb

This should work for any OS.  Personally, I use something like the  
following to indicate how the script is used in case a file is never  
entered:

my $USAGE = <<END_USE;
USAGE: get_features.pl <file>
Perl script to grab features from a GenBank file and print to a table
END_USE

my $file = shift || die $USAGE;
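
As a rough sketch of the same idea, the check can be wrapped in a small subroutine so it can be tried without touching the real @ARGV. The name get_input_file is made up for illustration and does not come from BioPerl; unlike 'shift || die', this version also accepts a file literally named "0".

```perl
use strict;
use warnings;

# Hypothetical helper (the name get_input_file is not from this thread):
# take a list of command-line arguments and return the first one,
# or die with a usage message when none was supplied.
sub get_input_file {
    my @args  = @_;
    my $usage = "USAGE: get_features.pl <file>\n";
    my $file  = shift @args;
    die $usage unless defined $file && length $file;
    return $file;
}

# In a script this would be called with the real argument list:
#   my $file = get_input_file(@ARGV);
```

Running the script with no arguments would then stop with the usage text instead of the "file argument provided, but with an undefined value" exception from Bio::SeqIO.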

chris

On Dec 4, 2007, at 9:41 AM, Captainrave wrote:

>
> #!/bin/perl
>
> use strict;
> use Bio::SeqIO;
> my $file = shift; # get the file name, somehow
> my $seqio_object = Bio::SeqIO->new(-file => $file);
> my $seq_object = $seqio_object->next_seq;
>
> for my $feat_object ($seq_object->get_SeqFeatures) {
>   print "primary tag: ", $feat_object->primary_tag, "\n";
>   for my $tag ($feat_object->get_all_tags) {
>      print "  tag: ", $tag, "\n";
>      for my $value ($feat_object->get_tag_values($tag)) {
>
>         print "    value: ", $value, "\n";
>      }
>   }
> }
>
> exit;
>
> The file is on the same folder.  But how do I tell it to use this  
> file?
>
>
>
> michael watson (IAH-C) wrote:
>>
>> Post the script that produces that error, and your file's location
>>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org
>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of  
>> Captainrave
>> Sent: 04 December 2007 15:07
>> To: Bioperl-l at lists.open-bio.org
>> Subject: Re: [Bioperl-l] extracting CDS location from Genbank
>>
>>
>> Yes but actually implementing it is another story.
>>
>> I get an error:
>>
>> ------------- EXCEPTION: Bio::Root::Exception -------------
>> MSG: file argument provided, but with an undefined value
>> STACK: Error::throw
>> STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359
>> STACK: Bio::SeqIO::new C:/Perl/site/lib/Bio/SeqIO.pm:359
>> STACK: test3.pl:7
>> -----------------------------------------------------------
>>
>> Basically because I dont understand the code well enough.  For example, how
>> do I tell it which input file to read? I know this might sound stupid, but I
>> dont understand the Biowiki very well!
>>
>> -- 
>> View this message in context:
>> http://www.nabble.com/extracting-CDS-location-from-Genbank-tf4942483.html#a14152264
>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>
> -- 
> View this message in context: http://www.nabble.com/extracting-CDS-location-from-Genbank-tf4942483.html#a14152907
> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From bix at sendu.me.uk  Tue Dec  4 16:22:12 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 04 Dec 2007 16:22:12 +0000
Subject: [Bioperl-l] extracting CDS location from Genbank
In-Reply-To: <14152907.post@talk.nabble.com>
References: <14148723.post@talk.nabble.com>	<8975119BCD0AC5419D61A9CF1A923E9505A4F76E@iahce2ksrv1.iah.bbsrc.ac.uk>	<14152264.post@talk.nabble.com>	<8975119BCD0AC5419D61A9CF1A923E9505A4F771@iahce2ksrv1.iah.bbsrc.ac.uk>
	<14152907.post@talk.nabble.com>
Message-ID: <47557EB4.10003@sendu.me.uk>

Captainrave wrote:
> #!/bin/perl
> my $file = shift; # get the file name, somehow
>
> The file is on the same folder.  But how do I tell it to use this file?

http://stein.cshl.org/genome_informatics/perl_intro/command_line.html

Basically, when you run your script add the name of the file to your 
command line.

me% perl myscript.pl myfile

By saying 'my $file = shift' inside myscript.pl, the variable $file now 
contains the filename 'myfile'.

You could also have hardcoded the filename:
my $file = 'myfile';


Anyway, you're going to run into lots of these issues, and they're 
beyond the scope of this mailing list. For basic perl problems seek help 
via www.perl.org. When you have a BioPerl-specific question, don't 
hesitate to post here.


From jason at bioperl.org  Tue Dec  4 17:16:30 2007
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 4 Dec 2007 09:16:30 -0800
Subject: [Bioperl-l] New Bio::Tools::Run::Phylo::SLR - Wrapper around
	the SLR program
In-Reply-To: <358f4d650712040351g4bef4417l4197d06454049140@mail.gmail.com>
References: <358f4d650712040351g4bef4417l4197d06454049140@mail.gmail.com>
Message-ID: <18ABB052-2539-4932-A7AA-BB6D194BF8C3@bioperl.org>

Excellent - thanks for this!  I'm giving it a whirl on linux and the  
SLR.t test is currently taking more than 30 minutes to run -- is it  
possible to cook up an example that is going to finish in a more  
reasonable amount of time?

Also - I would prefer if the default exe could be 'Slr' rather than  
Slr_Linux_static - it seems like it is possible for users to install  
it this way.  Similarly whether or not the Slr_osx or Slr is the  
default name, is it too big of a deal to expect the user to rename it?

I'll give it a whirl on OSX later, but might be easier if the test  
runs shorter.

Thanks!
-jason
On Dec 4, 2007, at 3:51 AM, Albert Vilella wrote:

> Hi all,
>
> There is a new wrapper in bioperl-run for SLR:
>
> http://www.bioperl.org/wiki/SLR
>
> Right now, output parsing is very simple, and I have only tested it on
> my linux machine.
> Can someone with a Mac give it a try?
>
> update your bioperl-run to cvs head, then:
>
> # try the installer, SLR is option 6
> perl scripts/bioperl_application_installer.PLS
> # then try to run the tests (should take about a minute)
> perl t/SLR.t
>
> Any comments on the code would be appreciated,
>
> Thanks in advance,
>
> Cheers,
>
>     Albert.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From florent.angly at gmail.com  Tue Dec  4 18:17:08 2007
From: florent.angly at gmail.com (Florent Angly)
Date: Tue, 04 Dec 2007 10:17:08 -0800
Subject: [Bioperl-l] New Bio::Tools::Run::TigrAssembler
In-Reply-To: <18ABB052-2539-4932-A7AA-BB6D194BF8C3@bioperl.org>
References: <358f4d650712040351g4bef4417l4197d06454049140@mail.gmail.com>
	<18ABB052-2539-4932-A7AA-BB6D194BF8C3@bioperl.org>
Message-ID: <475599A4.1040500@gmail.com>

Hi all,
I pushed a new module into bioperl-run CVS a few days ago. It's called 
Bio::Tools::Run::TigrAssembler. It is a wrapper for TIGR Assembler, an 
open-source program that assembles DNA sequences.
Input is a list of sequence objects and output is assembly objects... easy 
enough...
Let me know if you experience problems with it.
Florent


From jason at bioperl.org  Tue Dec  4 18:51:34 2007
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 4 Dec 2007 10:51:34 -0800
Subject: [Bioperl-l] [StandAloneBLAST] Use more than one CPU + avoid
	BLAST reload
In-Reply-To: <475499C3.20801@sendu.me.uk>
References: <47545590.1000703@boekhoff.info> <475499C3.20801@sendu.me.uk>
Message-ID: <8273f6c20712041051k2bfe36efgb2ae40550df9341@mail.gmail.com>

You can pass in an array reference of sequences instead of a single sequence
object and the module will build a multi-FASTA database.  You can also pass
in a filename instead of a Sequence object and the file can be an already
built multi-FASTA database.  This is described in the documentation:

http://search.cpan.org/~birney/bioperl-1.4/Bio/Tools/Run/StandAloneBlast.pm#blastall

You can also just run BLAST without the StandAloneBlast part, as I do: just
build your multi-FASTA file ahead of time with SeqIO and do
# wublast
# (the trailing '|' makes open() read from the command's output)
my $cmd = "blastp -i MULTIFASTA -d DATABASE --cpus 2 |";
# or NCBI blast
# my $cmd = "blastall -a 2 -i MULTIFASTA -p blastp -d DATABASE |";
my $fh;

open($fh, $cmd) or die "Cannot run BLAST: $!";
my $searchio = Bio::SearchIO->new(-format => 'blast', -fh => $fh);

The advantage of StandAloneBlast in theory is it takes care of the temporary
file creation (sequence files) and cleanup.  Personally I find I want easier
access to my programs that are simple cmdline like this.  You can do similar
things with SSEARCH or FASTA searching too.
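
For anyone new to reading a program's output through a pipe, here is a minimal, self-contained sketch of the technique. The helper name read_command_output is an assumption for illustration, and the perl interpreter itself stands in for BLAST so the example runs without any search program installed.

```perl
use strict;
use warnings;

# Hypothetical helper: run a command given as a list (program, then
# arguments), read its standard output through a pipe, and return the
# lines with trailing newlines removed.
sub read_command_output {
    my @cmd = @_;
    # '-|' opens a pipe for reading from the command; the list form
    # avoids passing the arguments through a shell.
    open(my $fh, '-|', @cmd) or die "Cannot run '@cmd': $!";
    my @lines = <$fh>;
    # close() on a pipe returns false if the command exited non-zero.
    close($fh) or die "Command '@cmd' failed (status $?)";
    chomp @lines;
    return @lines;
}

# Stand-in command so the sketch runs without BLAST installed:
# $^X is the path of the running perl interpreter.
my @out = read_command_output($^X, '-e', 'print "hit1\nhit2\n"');
print "$_\n" for @out;
```

With a real search, the filehandle would be handed to Bio::SearchIO->new(-format => 'blast', -fh => $fh) as in the snippet above rather than slurped into an array.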

-jason

On Dec 3, 2007 4:05 PM, Sendu Bala <bix at sendu.me.uk> wrote:

> Sven Boekhoff wrote:
> > HI!
> > I just started working with Perl and BioPerl. I'm quite impressed what
> > can be easily done with this module. Today I found that my second CPU
> > ist not used, but the first one run's at 100%. I tried to include the
> > "-a"-parameter, but I was not successful:
> >
> > my @params = (
> >       -database => 'my_db',
> >       -a => '2',
> >       -outfile => 'blast1.out'
> > );
> >
> > How do I have to use it?
>
> This should work in the CVS version of StandAloneBlast. In other
> versions, perhaps try using $object->a(2);
>
>
> > Second question: In my perlscript I start BLAST-searches in a loop.
> > Everytime BLAST has finished its search, the memory is cleared and BLAST
> > is started again. I think most of the time is used to reload the
> > database. Is it somehow possible to keep the database loaded (e.g. by
> > starting a second search) or is BLAST reloaded anyway?
>
> I hope someone will correct me for being wrong, but I think you'd have
> to that with a 2-way pipe. StandAloneBlast only uses output to a file
> and input from that file, finishing with the executable inbetween. I've
> thought about improving it with a 2-way pipe, but never got around to
> it, being apprehensive about stability on all platforms.
>
> The more obvious solution, which may be possible depending on exactly
> what you're doing, is to avoid the loop and just supply Blast all your
> input in one go.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


-- 
Jason Stajich
jason at bioperl.org
http://bioperl.org/wiki/User:Jason


From stefan.kirov at bms.com  Tue Dec  4 19:25:21 2007
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Tue, 04 Dec 2007 14:25:21 -0500
Subject: [Bioperl-l] PAML/Codeml parsing
In-Reply-To: <FD74BB3A-C8F7-453E-915E-FD5541CE59CB@bioperl.org>
References: <FD74BB3A-C8F7-453E-915E-FD5541CE59CB@bioperl.org>
Message-ID: <4755A9A1.2040608@bms.com>

Jason Stajich wrote:
> PAML4 breaks our PAML parser right now because the order of things in  
> the result file has changed.  Now sequences precede the information  
> about the version or the program run.  This means that
> $result->get_seqs() fails because we don't parse the sequences.
>
> We'll see what we can do, but as usual with supporting 3rd party  
> programs it is brittle when file formats change.  Th
>
> -jason
>
> --
> Jason Stajich
> jason at bioperl.org
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>   
Jason,
I saw a commit after this post on codeml, but not on PAML.pm. I assume
this is not fixed yet, am I correct?
Thanks!
Stefan


From avilella at gmail.com  Tue Dec  4 20:34:38 2007
From: avilella at gmail.com (Albert Vilella)
Date: Tue, 4 Dec 2007 20:34:38 +0000
Subject: [Bioperl-l] New Bio::Tools::Run::Phylo::SLR - Wrapper around
	the SLR program
In-Reply-To: <18ABB052-2539-4932-A7AA-BB6D194BF8C3@bioperl.org>
References: <358f4d650712040351g4bef4417l4197d06454049140@mail.gmail.com>
	<18ABB052-2539-4932-A7AA-BB6D194BF8C3@bioperl.org>
Message-ID: <358f4d650712041234n70004aedqa3dc07fb3f6f2e08@mail.gmail.com>

hmmm, 30 minutes is quite a lot... it takes much less for me:

avilella at magneto:~/bioperl/vanilla/bioperl-run$ time perl t/SLR.t
1..7
ok 1 - use Bio::Root::IO;
ok 2 - use Bio::Tools::Run::Phylo::SLR;
ok 3 - use Bio::AlignIO;
ok 4 - use Bio::TreeIO;
ok 5
ok 6
ok 7

real    0m21.517s
user    0m20.717s
sys     0m0.100s


On Dec 4, 2007 5:16 PM, Jason Stajich <jason at bioperl.org> wrote:
> Excellent - thanks for this !  I'm giving it whirl on linux and the
> SLR.t test is currently taking more than 30 minutes to run -- is it
> possible to cook up an example that is going to finish in a more
> reasonable amount of time?
>
> Also - I would prefer if the default exe could be 'Slr' rather than
> Slr_Linux_static - it seems like it is possible for users to install
> it this way.  Similarly whether or not the Slr_osx or Slr is the
> default name, is it too big of a deal to expect the user to rename it?
>
> I'll give it a whirl on OSX later, but might be easier if the test
> runs shorter.
>
> Thanks!
> -jason
>
> On Dec 4, 2007, at 3:51 AM, Albert Vilella wrote:
>
> > Hi all,
> >
> > There is a new wrapper in bioperl-run for SLR:
> >
> > http://www.bioperl.org/wiki/SLR
> >
> > Right now, output parsing is very simple, and I have only tested it on
> > my linux machine.
> > Can someone with a Mac give it a try?
> >
> > update your bioperl-run to cvs head, then:
> >
> > # try the installer, SLR is option 6
> > perl scripts/bioperl_application_installer.PLS
> > # then try to run the tests (should take about a minute)
> > perl t/SLR.t
> >
> > Any comments on the code would be appreciated,
> >
> > Thanks in advance,
> >
> > Cheers,
> >
> >     Albert.
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>


From avilella at gmail.com  Tue Dec  4 20:39:26 2007
From: avilella at gmail.com (Albert Vilella)
Date: Tue, 4 Dec 2007 20:39:26 +0000
Subject: [Bioperl-l] New Bio::Tools::Run::Phylo::SLR - Wrapper around
	the SLR program
In-Reply-To: <358f4d650712041234n70004aedqa3dc07fb3f6f2e08@mail.gmail.com>
References: <358f4d650712040351g4bef4417l4197d06454049140@mail.gmail.com>
	<18ABB052-2539-4932-A7AA-BB6D194BF8C3@bioperl.org>
	<358f4d650712041234n70004aedqa3dc07fb3f6f2e08@mail.gmail.com>
Message-ID: <358f4d650712041239w7e6dee29lbb13cc2e30a6bce1@mail.gmail.com>

oh, I forgot to mention: SLR uses the lapack and blas libraries if
installed, which makes it a lot faster (according to the author)...
maybe that's the reason...

On Dec 4, 2007 8:34 PM, Albert Vilella <avilella at gmail.com> wrote:
> hmmm, 30 minutes is quite a lot... it takes much less for me:
>
> avilella at magneto:~/bioperl/vanilla/bioperl-run$ time perl t/SLR.t
> 1..7
> ok 1 - use Bio::Root::IO;
> ok 2 - use Bio::Tools::Run::Phylo::SLR;
> ok 3 - use Bio::AlignIO;
> ok 4 - use Bio::TreeIO;
> ok 5
> ok 6
> ok 7
>
> real    0m21.517s
> user    0m20.717s
> sys     0m0.100s
>
>
>
> On Dec 4, 2007 5:16 PM, Jason Stajich <jason at bioperl.org> wrote:
> > Excellent - thanks for this !  I'm giving it whirl on linux and the
> > SLR.t test is currently taking more than 30 minutes to run -- is it
> > possible to cook up an example that is going to finish in a more
> > reasonable amount of time?
> >
> > Also - I would prefer if the default exe could be 'Slr' rather than
> > Slr_Linux_static - it seems like it is possible for users to install
> > it this way.  Similarly whether or not the Slr_osx or Slr is the
> > default name, is it too big of a deal to expect the user to rename it?
> >
> > I'll give it a whirl on OSX later, but might be easier if the test
> > runs shorter.
> >
> > Thanks!
> > -jason
> >
> > On Dec 4, 2007, at 3:51 AM, Albert Vilella wrote:
> >
> > > Hi all,
> > >
> > > There is a new wrapper in bioperl-run for SLR:
> > >
> > > http://www.bioperl.org/wiki/SLR
> > >
> > > Right now, output parsing is very simple, and I have only tested it on
> > > my linux machine.
> > > Can someone with a Mac give it a try?
> > >
> > > update your bioperl-run to cvs head, then:
> > >
> > > # try the installer, SLR is option 6
> > > perl scripts/bioperl_application_installer.PLS
> > > # then try to run the tests (should take about a minute)
> > > perl t/SLR.t
> > >
> > > Any comments on the code would be appreciated,
> > >
> > > Thanks in advance,
> > >
> > > Cheers,
> > >
> > >     Albert.
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> >
>


From jason at bioperl.org  Tue Dec  4 21:43:03 2007
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 4 Dec 2007 13:43:03 -0800
Subject: [Bioperl-l] New Bio::Tools::Run::Phylo::SLR - Wrapper around
	the SLR program
In-Reply-To: <358f4d650712041239w7e6dee29lbb13cc2e30a6bce1@mail.gmail.com>
References: <358f4d650712040351g4bef4417l4197d06454049140@mail.gmail.com>
	<18ABB052-2539-4932-A7AA-BB6D194BF8C3@bioperl.org>
	<358f4d650712041234n70004aedqa3dc07fb3f6f2e08@mail.gmail.com>
	<358f4d650712041239w7e6dee29lbb13cc2e30a6bce1@mail.gmail.com>
Message-ID: <2CF76A38-5A9E-4A4E-8C9F-29EDD732BDDF@bioperl.org>

My own icc-compiled version seems to have caused the problem.  
Whoops, fixed that.
-jason
On Dec 4, 2007, at 12:39 PM, Albert Vilella wrote:

> oh, I forgot to mention: SLR uses the lapack and blas libraries if
> installed, which makes it a lot faster (according to the author)...
> maybe that's the reason...
>
> On Dec 4, 2007 8:34 PM, Albert Vilella <avilella at gmail.com> wrote:
>> hmmm, 30 minutes is quite a lot... it takes much less for me:
>>
>> avilella at magneto:~/bioperl/vanilla/bioperl-run$ time perl t/SLR.t
>> 1..7
>> ok 1 - use Bio::Root::IO;
>> ok 2 - use Bio::Tools::Run::Phylo::SLR;
>> ok 3 - use Bio::AlignIO;
>> ok 4 - use Bio::TreeIO;
>> ok 5
>> ok 6
>> ok 7
>>
>> real    0m21.517s
>> user    0m20.717s
>> sys     0m0.100s
>>
>>
>>
>> On Dec 4, 2007 5:16 PM, Jason Stajich <jason at bioperl.org> wrote:
>>> Excellent - thanks for this !  I'm giving it whirl on linux and the
>>> SLR.t test is currently taking more than 30 minutes to run -- is it
>>> possible to cook up an example that is going to finish in a more
>>> reasonable amount of time?
>>>
>>> Also - I would prefer if the default exe could be 'Slr' rather than
>>> Slr_Linux_static - it seems like it is possible for users to install
>>> it this way.  Similarly whether or not the Slr_osx or Slr is the
>>> default name, is it too big of a deal to expect the user to  
>>> rename it?
>>>
>>> I'll give it a whirl on OSX later, but might be easier if the test
>>> runs shorter.
>>>
>>> Thanks!
>>> -jason
>>>
>>> On Dec 4, 2007, at 3:51 AM, Albert Vilella wrote:
>>>
>>>> Hi all,
>>>>
>>>> There is a new wrapper in bioperl-run for SLR:
>>>>
>>>> http://www.bioperl.org/wiki/SLR
>>>>
>>>> Right now, output parsing is very simple, and I have only tested  
>>>> it on
>>>> my linux machine.
>>>> Can someone with a Mac give it a try?
>>>>
>>>> update your bioperl-run to cvs head, then:
>>>>
>>>> # try the installer, SLR is option 6
>>>> perl scripts/bioperl_application_installer.PLS
>>>> # then try to run the tests (should take about a minute)
>>>> perl t/SLR.t
>>>>
>>>> Any comments on the code would be appreciated,
>>>>
>>>> Thanks in advance,
>>>>
>>>> Cheers,
>>>>
>>>>     Albert.
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>>
>>


From stefan.kirov at bms.com  Tue Dec  4 21:51:51 2007
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Tue, 04 Dec 2007 16:51:51 -0500
Subject: [Bioperl-l] PAML/Codeml parsing
In-Reply-To: <A9B43240-2601-4C3E-9870-F32A6918A657@bioperl.org>
References: <FD74BB3A-C8F7-453E-915E-FD5541CE59CB@bioperl.org>
	<4755A9A1.2040608@bms.com>
	<A9B43240-2601-4C3E-9870-F32A6918A657@bioperl.org>
Message-ID: <4755CBF7.5010709@bms.com>

Jason Stajich wrote:
> should be fixed.
>
> $ cvs log -r HEAD Bio/Tools/Phylo/PAML.pm
> revision 1.56
> date: 2007/11/01 14:52:56;  author: jason;  state: Exp;  lines: +21 -14
> Parsing PAML4 and PAML3.15 should work now.  Dealing with variable
> order for the sequences and summary results in
> the top of the MLC files
>
Yes, this is the version I have, and in some cases the sequences do not
get parsed. I had missed this commit. I will try to assemble a test case
and send it. I cannot promise when, but will try to do it tomorrow. My gut
feeling so far is that the parser works whenever there are gaps in the
alignment, otherwise it does not. PAML surely has a very peculiar format.
Thanks again!
Stefan
> On Dec 4, 2007, at 11:25 AM, Stefan Kirov wrote:
>
>> Jason Stajich wrote:
>>> PAML4 breaks our PAML parser right now because the order of things in
>>> the result file has changed.  Now sequences precede the information
>>> about the version or the program run.  This means that
>>> $result->get_seqs() fails because we don't parse the sequences.
>>>
>>> We'll see what we can do, but as usual with supporting 3rd party
>>> programs it is brittle when file formats change.  Th
>>>
>>> -jason
>>>
>>> -- 
>>> Jason Stajich
>>> jason at bioperl.org
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>>
>> Jason,
>> I saw a commit after this post on codeml, but not on PAML.pm- I assume
>> this is not fixed, am I correct?
>> Thanks!
>> Stefan
>
>


From jason at bioperl.org  Tue Dec  4 21:36:09 2007
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 4 Dec 2007 13:36:09 -0800
Subject: [Bioperl-l] PAML/Codeml parsing
In-Reply-To: <4755A9A1.2040608@bms.com>
References: <FD74BB3A-C8F7-453E-915E-FD5541CE59CB@bioperl.org>
	<4755A9A1.2040608@bms.com>
Message-ID: <A9B43240-2601-4C3E-9870-F32A6918A657@bioperl.org>

should be fixed.

$ cvs log -r HEAD Bio/Tools/Phylo/PAML.pm
revision 1.56
date: 2007/11/01 14:52:56;  author: jason;  state: Exp;  lines: +21 -14
Parsing PAML4 and PAML3.15 should work now.  Dealing with variable  
order for the sequences and summary results in
the top of the MLC files

On Dec 4, 2007, at 11:25 AM, Stefan Kirov wrote:

> Jason Stajich wrote:
>> PAML4 breaks our PAML parser right now because the order of things in
>> the result file has changed.  Now sequences precede the information
>> about the version or the program run.  This means that
>> $result->get_seqs() fails because we don't parse the sequences.
>>
>> We'll see what we can do, but as usual with supporting 3rd party
>> programs it is brittle when file formats change.  Th
>>
>> -jason
>>
>> --
>> Jason Stajich
>> jason at bioperl.org
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
> Jason,
> I saw a commit after this post on codeml, but not on PAML.pm- I assume
> this is not fixed, am I correct?
> Thanks!
> Stefan


From johan.nilsson at sh.se  Wed Dec  5 11:35:58 2007
From: johan.nilsson at sh.se (Johan Nilsson)
Date: Wed, 5 Dec 2007 12:35:58 +0100
Subject: [Bioperl-l] Query about Hyphy wrapper module "SLAC.pm"
Message-ID: <OFBA70B0CA.66F02D44-ONC12573A8.003FB7B7-C12573A8.003FB7C0@sh.se>


Hello,

I have a bunch of multiple sequence alignments of protein coding genes,
which I would like to analyse with the SLAC method of the HyPhy package. I
tried using the SLAC.pm module in bioperl-run, but I could not get it to
work properly.

Basically, for each MSA file, I create the Bio::Tree::Tree and
Bio::SimpleAlign objects ($tree and $aln, respectively) required as
arguments to SLAC, and call the method with: "($rc,$result) =
$slac->run($aln,$tree)" in a loop procedure in my script.

When I choose not to save the tmp files (the default option in SLAC.pm),
the program complains that it cannot find the file
"$whatevertmpdir/wrapper.bf", and returns $rc=0 for all but the first MSA
(which works fine). Apparently, it looks for the wrapper.bf file in the
first tmp dir created, which is deleted in the end of the first SLAC call.

If instead I choose to save the tempfiles ($slac->save_tempfiles('TRUE')),
all calls to SLAC give returncode 1, and no error message is received.
However, when I look at the resulting $result hashref, it turns out that
all results are for the FIRST alignment read. I've made sure there is
nothing strange with my loop procedure, and I checked that the tree and
alignment objects look OK for each MSA. Apparently, it does create new
"results.tsv" files in the tmp directory after each run, but it is
identical each time it's created. Also, it only creates ONE tmp directory,
no matter how many times SLAC is executed (I would imagine it was supposed
to save each result in separate tmp dirs?)

Thus, it seems to me like the errors occur because something goes wrong in
the creation of temporary files. Have I done something wrong here, or has
anyone else experienced the same problem?

Best regards
/Johan


--
Johan Nilsson, Ph.D.
School of Life Sciences
S?dert?rns University College
S-141 89 Huddinge, Sweden
E-mail: johan.nilsson at sh.se
Phone: +46 8 608 47 05, +46 70 456 10 51


From bernd.web at gmail.com  Wed Dec  5 13:10:04 2007
From: bernd.web at gmail.com (Bernd Web)
Date: Wed, 5 Dec 2007 14:10:04 +0100
Subject: [Bioperl-l] SimpleAlign is_flush
Message-ID: <716af09c0712050510h62aa106cla7011a75c93091a5@mail.gmail.com>

Hi,

SimpleAlign has an is_flush method:
 Function  : Tells you whether the alignment is flush, i.e. all of the
             same length
 Returns   : 1 or 0

I noticed that a file with multiple fasta sequences with different
lengths has an is_flush value of 1. Printing the "alignment" shows
that sequences are padded with "-" so that they are all the same
length. Does this mean that is_flush for alignments read in via
AlignIO is indeed always true, and thus not so useful as such?

(using bioperl version: 1.005002102)


Regards,
Bernd


From cjfields at uiuc.edu  Wed Dec  5 13:53:59 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 5 Dec 2007 07:53:59 -0600
Subject: [Bioperl-l] SimpleAlign is_flush
In-Reply-To: <716af09c0712050510h62aa106cla7011a75c93091a5@mail.gmail.com>
References: <716af09c0712050510h62aa106cla7011a75c93091a5@mail.gmail.com>
Message-ID: <9E4F2A25-ACDE-4BFD-9026-FDF7251B588B@uiuc.edu>

Yes; it's a convenient way to make sure all seqs have the same length  
(including gaps).  Nice for checking when adding new seqs to an  
alignment or building new parsers.
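
To make "flush" concrete, here is a minimal pure-Perl sketch of the underlying check. It is only an illustration under the assumption that sequences are plain strings with "-" for gaps; it is not how Bio::SimpleAlign implements is_flush, and the sub name all_same_length is made up.

```perl
use strict;
use warnings;

# Illustrative only: not Bio::SimpleAlign's implementation.
# Returns 1 if every string has the same length (gap characters
# count towards the length), 0 otherwise.
sub all_same_length {
    my @seqs = @_;
    return 1 unless @seqs;          # an empty list is trivially flush
    my $len = length $seqs[0];
    for my $s (@seqs) {
        return 0 if length($s) != $len;
    }
    return 1;
}

print all_same_length('ACG-', 'AC-T') ? "flush\n" : "not flush\n";
print all_same_length('ACGT', 'AC')   ? "flush\n" : "not flush\n";
```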

chris

On Dec 5, 2007, at 7:10 AM, Bernd Web wrote:

> Hi,
>
> SimpleAlign has an is_flush:
> Function  : Tells you whether the alignment is flush, i.e. all of the
> same length
> Returns   : 1 or 0
>
> I  noticed that a file with multiple fasta sequences with different
> lengths has an is_flush  value of 1. Printing the "alignment" shows
> that sequences are appended with "-" so that the all are the same
> length. Does this mean that is_flush for alignments read in via
> AlignIO is indeed always true and thus as such a so useful ?
>
> (using bioperl version: 1.005002102)
>
>
> Regards,
> Bernd
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From captainrave at hotmail.com  Wed Dec  5 12:37:02 2007
From: captainrave at hotmail.com (Captainrave)
Date: Wed, 5 Dec 2007 04:37:02 -0800 (PST)
Subject: [Bioperl-l] extracting CDS location from Genbank
In-Reply-To: <475574B3.8050700@sendu.me.uk>
References: <14148723.post@talk.nabble.com>
	<8975119BCD0AC5419D61A9CF1A923E9505A4F76E@iahce2ksrv1.iah.bbsrc.ac.uk>
	<14152264.post@talk.nabble.com> <475574B3.8050700@sendu.me.uk>
Message-ID: <14170499.post@talk.nabble.com>


Thanks, it works great now.

Do any of you know if there is a tag to pull out the CDS location, i.e. the
values such as 132..145?  Those are all I need.  Also, is there any way
to stop it reporting tag and value and literally JUST output the value?

Thanks!!!
-- 
View this message in context: http://www.nabble.com/extracting-CDS-location-from-Genbank-tf4942483.html#a14170499
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From stefan.kirov at bms.com  Wed Dec  5 14:24:20 2007
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Wed, 05 Dec 2007 09:24:20 -0500
Subject: [Bioperl-l] PAML/Codeml parsing
In-Reply-To: <A9B43240-2601-4C3E-9870-F32A6918A657@bioperl.org>
References: <FD74BB3A-C8F7-453E-915E-FD5541CE59CB@bioperl.org>
	<4755A9A1.2040608@bms.com>
	<A9B43240-2601-4C3E-9870-F32A6918A657@bioperl.org>
Message-ID: <4756B494.7020100@bms.com>

Jason,
When there is a gapless alignment we have a differently formatted output
from codeml:
kirovs at horta:~/AESIG> head -n 10 feJRfxQl8D/mlc

seed used = 492211105
      3    141

ENSRNOE00000058637               GCG AGC AAG TGT GAC AGC CAT GGC ACC CAC
CTA GCA GGT GTG GTC AGC GGC CGG GAT GCT GGT GTG GCC AAG GGC ACC AGT CTG
CAC AGT CTG CGT GTG CTC AAC TGT CAA GGG AAG GGC ACA GTC AGC GGC ACC CTC ATA
ENSMUSE00000366347               GCG AGC AAG TGT GAC AGC CAC GGC ACC CAC
CTG GCA GGT GTG GTC AGC GGC CGG GAT GCT GGT GTG GCC AAG GGC ACC AGC CTG
CAC AGC CTG CGT GTG CTC AAC TGT CAA GGG AAG GGC ACA GTC AGC GGC ACC CTC ATA
ENSE00001279150                  GCC AGC AAG TGT GAC AGT CAT GGC ACC CAC
CTG GCA GGG GTG GTC AGC GGC CGG GAT GCC GGC GTG GCC AAG GGT GCC AGC ATG
CGC AGC CTG CGC GTG CTC AAC TGC CAA GGG AAG GGC ACG GTT AGC GGC ACC CTC ATA

And parsing this fails...
The next one has gaps and works fine:

kirovs at horta:~/AESIG> head -n 10 4z6ZX7s1B6/mlc

seed used = 492252697

Before deleting alignment gaps
      2    162

ENSMUSE00000460297               AAT ATC GAT ACA TTT TAC AAG GAG GCA GAA
AAG AAG CTT ATA CAC GTG CTT GAG GGA GAC AGT CCC AAG TGG TCC ACA CCG AAC
AAA GAC CCC ACC CGA GAG CCC CAT GCA GCC TCC ACT TGC TGT GCT TCA GAT CTC
CTT GGT TCA GGA GGT CAG TTC CTG
ENSE00000939192                  AAT ATT GAC ATA CTT TGC AAT GAA GCA GAA
AAC AAG CTT ATG CAT ATA CTG CAT GCA AAT GAT CCC AAG TGG TCC ACC CCA ACT
AAA GAC TGT ACT TCA GGG CCG TAC ACT GCT CAA ATC --- --- --- --- --- ATT
CCT GGT ACA GGA AAC AAG CTT CTG

I will send both whole files as an attachment with another mail (I do
not know if these are going to pass through).
My guess is that the whole _parse_summary method has to be re-worked as
there is no tag to look for before the sequences start. Ugly.
I am not sure what else could become broken if I try to fix it, so I
will leave it to you.
Stefan
> should be fixed.
>
> $ cvs log -r HEAD Bio/Tools/Phylo/PAML.pm
> revision 1.56
> date: 2007/11/01 14:52:56;  author: jason;  state: Exp;  lines: +21 -14
> Parsing PAML4 and PAML3.15 should work now.  Dealing with variable
> order for the sequences and summary results in
> the top of the MLC files
>
> On Dec 4, 2007, at 11:25 AM, Stefan Kirov wrote:
>
>> Jason Stajich wrote:
>>> PAML4 breaks our PAML parser right now because the order of things in
>>> the result file has changed.  Now sequences precede the information
>>> about the version or the program run.  This means that
>>> $result->get_seqs() fails because we don't parse the sequences.
>>>
>>> We'll see what we can do, but as usual with supporting 3rd party
>>> programs it is brittle when file formats change.  Th
>>>
>>> -jason
>>>
>>> -- 
>>> Jason Stajich
>>> jason at bioperl.org
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>>
>> Jason,
>> I saw a commit after this post on codeml, but not on PAML.pm- I assume
>> this is not fixed, am I correct?
>> Thanks!
>> Stefan
>
>


From stefan.kirov at bms.com  Wed Dec  5 14:35:23 2007
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Wed, 05 Dec 2007 09:35:23 -0500
Subject: [Bioperl-l] PAML/Codeml parsing
In-Reply-To: <4756B494.7020100@bms.com>
References: <FD74BB3A-C8F7-453E-915E-FD5541CE59CB@bioperl.org>
	<4755A9A1.2040608@bms.com>
	<A9B43240-2601-4C3E-9870-F32A6918A657@bioperl.org>
	<4756B494.7020100@bms.com>
Message-ID: <4756B72B.6000103@bms.com>

Here are the files.
Stefan

-------------- next part --------------
A non-text attachment was scrubbed...
Name: mlc.tar.gz
Type: application/x-gzip
Size: 3237 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20071205/bd77cde1/attachment-0004.gz>

From aaron.j.mackey at gsk.com  Wed Dec  5 14:56:31 2007
From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com)
Date: Wed, 5 Dec 2007 09:56:31 -0500
Subject: [Bioperl-l] SimpleAlign is_flush
In-Reply-To: <9E4F2A25-ACDE-4BFD-9026-FDF7251B588B@uiuc.edu>
Message-ID: <OF9A252048.A821FA6A-ON852573A8.0051A1FE-852573A8.0052140C@gsk.com>

Well, if you use AlignIO::fasta to read in a multi-fasta file of 
*unaligned* sequences, AlignIO::fasta makes the assumption that all of 
your sequences are aligned, and pads the ends of shorter sequences with 
gap characters (essentially, enforcing a rather silly, yet valid 
alignment).  The fact that is_flush() then returns 1 is secondary.

If you just want to read in an array of unaligned sequences, use 
SeqIO::fasta instead.  It doesn't really make much sense to use AlignIO 
for sequences that are not aligned ... conversely, if you *do* have 
aligned sequences in a multi-fasta file, then it does make sense to use 
AlignIO, and it also makes sense for AlignIO::fasta to end-pad sequences 
with gaps as necessary to get a fully valid, flush multiple sequence 
alignment matrix.
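
A small sketch of the difference, assuming a BioPerl installation with
Bio::SeqIO and Bio::AlignIO. The temporary file, the two toy sequences
and the helper name lengths_via_seqio are made up for the example. Going
by the behaviour described above, the AlignIO lengths would be expected
to come out equal and is_flush to be true, but that is not asserted
here.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Bio::SeqIO;
use Bio::AlignIO;

# Write a small multi-FASTA file with sequences of different lengths.
my ($fh, $fasta) = tempfile(SUFFIX => '.fa', UNLINK => 1);
print {$fh} ">seq1\nACGTACGTAC\n>seq2\nACGT\n";
close $fh;

# Hypothetical helper: read the file as plain, unaligned sequences.
sub lengths_via_seqio {
    my ($file) = @_;
    my $in = Bio::SeqIO->new(-file => $file, -format => 'fasta');
    my @lengths;
    while (my $seq = $in->next_seq) {
        push @lengths, $seq->length;
    }
    return @lengths;
}

# Unaligned view: each sequence keeps its own length.
print "SeqIO lengths:   @{[ lengths_via_seqio($fasta) ]}\n";

# Alignment view: the same file read through AlignIO.
my $aln = Bio::AlignIO->new(-file => $fasta, -format => 'fasta')->next_aln;
print "AlignIO lengths: ",
      join(' ', map { length $_->seq } $aln->each_seq), "\n";
print "is_flush:        ", ($aln->is_flush ? 1 : 0), "\n";
```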

-Aaron



From cjfields at uiuc.edu  Wed Dec  5 16:22:01 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 5 Dec 2007 10:22:01 -0600
Subject: [Bioperl-l] SimpleAlign is_flush
In-Reply-To: <OF9A252048.A821FA6A-ON852573A8.0051A1FE-852573A8.0052140C@gsk.com>
References: <OF9A252048.A821FA6A-ON852573A8.0051A1FE-852573A8.0052140C@gsk.com>
Message-ID: <EC064917-220F-4579-8FA9-934026D7D105@uiuc.edu>

That's true.  I assumed Bernd's seqs were aligned.

chris


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From stefan.kirov at bms.com  Wed Dec  5 19:56:47 2007
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Wed, 05 Dec 2007 14:56:47 -0500
Subject: [Bioperl-l] PAML/Codeml parsing
In-Reply-To: <4756B494.7020100@bms.com>
References: <FD74BB3A-C8F7-453E-915E-FD5541CE59CB@bioperl.org>
	<4755A9A1.2040608@bms.com>
	<A9B43240-2601-4C3E-9870-F32A6918A657@bioperl.org>
	<4756B494.7020100@bms.com>
Message-ID: <4757027F.407@bms.com>

Here is a patch that seems to be working and does not break the existing
tests:

--- /home/kirovs/bioperl-live/Bio/Tools/Phylo/PAML.pm   2007-12-05 10:16:53.120720000 -0500
+++ /home/kirovs/bioperl/bioperl-live/Bio/Tools/Phylo/PAML.pm   2007-12-05 14:46:31.436278000 -0500
@@ -419,7 +419,10 @@
     # CODONML (in paml 3.12 February 2002)  <<-- what we want to see!
 
     my $SEQTYPES = qr( (?: (?: CODON | AA | BASE | CODON2AA ) ML ) | YN00 )x;
+    my $line;
+    $self->{'_already_parsed_seqs'}=$self->{'_already_parsed_seqs'}?1:0;
     while ($_ = $self->_readline) {
+           $line++;
        if ( m/^($SEQTYPES) \s+                      # seqtype: CODONML, AAML, BASEML, CODON2AAML, YN00, etc
               (?: \(in \s+ ([^\)]+?) \s* \) \s* )?  # version: "paml 3.12 February 2002"; not present < 3.1 or YN00
               (\S+) \s*                             # tree filename
@@ -436,8 +439,11 @@
        } elsif (m/^Data set \d$/) {
            $self->{'_summary'} = {};
            $self->{'_summary'}->{'multidata'}++;
-       } elsif( m/^Before\s+deleting\s+alignment\s+gaps/ ) {
-           my ($phylip_header) = $self->_readline;
+       }
+       elsif( m/^Before\s+deleting\s+alignment\s+gaps/ ) {#Gap
+               my ($phylip_header) = $self->_readline;
+               $self->_parse_seqs;
+       } elsif (($line>3)&&($self->{'_already_parsed_seqs'}!=1)) {#No gap
            $self->_parse_seqs;
        }
     }
@@ -681,7 +687,6 @@
 }
 
 sub _parse_seqs {
-
     # this should in fact be packed into a Bio::SimpleAlign object instead of
     # an array but we'll stay with this for now
     my ($self) = @_;


What this does is trigger sequence parsing if the /Before.../ pattern has
not been seen by line 4. Since $phylip_header does not seem to be used for
anything, one could drop the first sequence-parsing elsif completely (even
though counting lines is not a good thing).
Since I am not aware of all the consequences of changing the sequence
parsing and I have no idea how extensive the tests are, I am not
committing anything, but feel free to use the patch if you wish.
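
To make the dispatch in the patch easier to follow, here is a
stand-alone sketch in plain Perl. It is not code from
Bio::Tools::Phylo::PAML: the helper seq_parse_trigger is hypothetical,
it leaves out the SEQTYPES and "Data set" branches, and it only reports
which branch would start sequence parsing for the two headers shown
earlier in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: given the first lines of an mlc file, report which
# branch of the patched logic would trigger sequence parsing, and on which
# line.
sub seq_parse_trigger {
    my (@lines) = @_;
    my $line_no = 0;
    for my $text (@lines) {
        $line_no++;
        if ($text =~ /^Before\s+deleting\s+alignment\s+gaps/) {
            return ('gapped', $line_no);     # marker line found
        }
        elsif ($line_no > 3) {
            return ('gapless', $line_no);    # no marker in the first 3 lines
        }
    }
    return ('none', $line_no);               # input too short to decide
}

# First lines of the two mlc files quoted in this thread.
my @gapless = ('', 'seed used = 492211105', '      3    141', '',
               'ENSRNOE00000058637               GCG AGC AAG');
my @gapped  = ('', 'seed used = 492252697', '',
               'Before deleting alignment gaps', '      2    162');

printf "%-8s at line %d\n", seq_parse_trigger(@gapless);
printf "%-8s at line %d\n", seq_parse_trigger(@gapped);
```

The sketch also shows why the line count is fragile: any change in the
number of lines before the sequences would shift the trigger.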
Stefan



From jason at bioperl.org  Wed Dec  5 20:01:29 2007
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 5 Dec 2007 12:01:29 -0800
Subject: [Bioperl-l] PAML/Codeml parsing
In-Reply-To: <4757027F.407@bms.com>
References: <FD74BB3A-C8F7-453E-915E-FD5541CE59CB@bioperl.org>
	<4755A9A1.2040608@bms.com>
	<A9B43240-2601-4C3E-9870-F32A6918A657@bioperl.org>
	<4756B494.7020100@bms.com> <4757027F.407@bms.com>
Message-ID: <8562ED51-7DEC-4EB2-AC3F-A14C6497E0A2@bioperl.org>

sounds good - can you
- make it as a bug with the patch and sample files in bugzilla
- commit changes and I'll test as well

thanks,
-j



From stefan.kirov at bms.com  Wed Dec  5 20:33:47 2007
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Wed, 05 Dec 2007 15:33:47 -0500
Subject: [Bioperl-l] PAML/Codeml parsing
In-Reply-To: <8562ED51-7DEC-4EB2-AC3F-A14C6497E0A2@bioperl.org>
References: <FD74BB3A-C8F7-453E-915E-FD5541CE59CB@bioperl.org>
	<4755A9A1.2040608@bms.com>
	<A9B43240-2601-4C3E-9870-F32A6918A657@bioperl.org>
	<4756B494.7020100@bms.com> <4757027F.407@bms.com>
	<8562ED51-7DEC-4EB2-AC3F-A14C6497E0A2@bioperl.org>
Message-ID: <47570B2B.5090602@bms.com>

Done.

Jason Stajich wrote:
> sounds good - can you
> - make it as a bug with the patch and sample files in bugzilla
> - commit changes and I'll test as well
>
> thanks,
> -j


From bernd.web at gmail.com  Thu Dec  6 14:58:31 2007
From: bernd.web at gmail.com (Bernd Web)
Date: Thu, 6 Dec 2007 15:58:31 +0100
Subject: [Bioperl-l] graphics - Panel
Message-ID: <716af09c0712060658t5504b377ob2d46adb85754284@mail.gmail.com>

Hi,

For map, $segstart is available. It holds the leftmost start of the
feature (the left end of $ref displayed in the detailed view).
Is it also accessible in track coderefs?
I'd like to access it in add_track, like this:
  -bgcolor => sub {
      my $feature = shift;
      my $start   = $feature->segstart;
      ....
      # do something with the segstart
  },

I realize I can add a -tag that holds the leftmost start of my
segmented feature and then read it back from $feature, but I wonder
whether $segstart can also be accessed in the coderef somehow.

Does anyone know?

Best regards,
Bernd


From georose at gmail.com  Thu Dec  6 15:28:24 2007
From: georose at gmail.com (geo rose)
Date: Thu, 6 Dec 2007 08:28:24 -0700
Subject: [Bioperl-l] getting sequences from external databank
Message-ID: <54da06110712060728m2532c177s8da4fa22e2aee1e6@mail.gmail.com>

Hi Bioperl,

In the past, I have been able to retrieve sequences from an external
databank, but my scripts are not working anymore.
I am afraid that I may have broken my Bioperl installation while updating my
Fedora7 machine with yum update.

Below is an example of what happens.

The script is from
http://www.faculty.uaf.edu/ffnt/teaching/programming/bioperl/node2.html and
it works.
(I used it on an older machine with Bioperl and MacOS Tiger)

__________________________________________________________________________________
#!/usr/bin/perl -w

use Bio::SeqIO;
use Bio::DB::GenBank;

$genBank = new Bio::DB::GenBank;  # This object knows how to talk to GenBank

my $seq = $genBank->get_Seq_by_acc('AF060485');  # get a record by accession


my $seqOut = new Bio::SeqIO(-format => 'genbank');

$seqOut->write_seq($seq);


_________________________________________________________________________________________
This is the error I get
_________________________________________________________________________________________

[home at home Desktop]# perl final-seq-db-test1.pl
Bio::SeqIO: genbank cannot be found
Exception
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Failed to load module Bio::SeqIO::genbank. Weak references are not
implemented in the version of perl at
/usr/lib/perl5/site_perl/5.8.8/Bio/Species.pm line 91
BEGIN failed--compilation aborted at
/usr/lib/perl5/site_perl/5.8.8/Bio/Species.pm line 91.
Compilation failed in require at
/usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO/genbank.pm line 172.
BEGIN failed--compilation aborted at
/usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO/genbank.pm line 172.
Compilation failed in require at
/usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm line 425.

STACK: Error::throw
STACK: Bio::Root::Root::throw
/usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:359
STACK: Bio::Root::Root::_load_module
/usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:427
STACK: Bio::SeqIO::_load_format_module
/usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO.pm:555
STACK: Bio::SeqIO::new /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO.pm:376
STACK: Bio::DB::WebDBSeqI::get_seq_stream
/usr/lib/perl5/site_perl/5.8.8/Bio/DB/WebDBSeqI.pm:458
STACK: Bio::DB::NCBIHelper::get_Stream_by_acc
/usr/lib/perl5/site_perl/5.8.8/Bio/DB/NCBIHelper.pm:361
STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc
/usr/lib/perl5/site_perl/5.8.8/Bio/DB/WebDBSeqI.pm:172
STACK: final-seq-db-test1.pl:8
-----------------------------------------------------------

For more information about the SeqIO system please see the SeqIO docs.
This includes ways of checking for formats at compile time, not run time

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: acc AF060485 does not exist
STACK: Error::throw
STACK: Bio::Root::Root::throw
/usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:359
STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc
/usr/lib/perl5/site_perl/5.8.8/Bio/DB/WebDBSeqI.pm:173
STACK: final-seq-db-test1.pl:8
-----------------------------------------------------------
[home at home Desktop]# Use of uninitialized value in concatenation (.) or
string at /usr/lib64/perl5/5.8.8/x86_64-linux-thread-multi/Scalar/Util.pm
line 30.

[home at home Desktop]#


________________________________________________________________________________________


Before I mess things up further I thought I'd ask:
Can I fix this problem by reinstalling some part of Bioperl or Perl?

Thanks,

George


From barry.moore at genetics.utah.edu  Thu Dec  6 17:56:50 2007
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Thu, 6 Dec 2007 10:56:50 -0700
Subject: [Bioperl-l] getting sequences from external databank
In-Reply-To: <54da06110712060728m2532c177s8da4fa22e2aee1e6@mail.gmail.com>
References: <54da06110712060728m2532c177s8da4fa22e2aee1e6@mail.gmail.com>
Message-ID: <B13872F3-4591-4FB6-B057-9215C5DA9059@genetics.utah.edu>

George,

This is a hideous little bug in Red Hat/Fedora installations of
perl.  It's happened to me a couple of times on upgrades, but it's always
fixed with

perl -MCPAN -e shell
force install Scalar::Util

http://www.perlmonks.org/?node_id=460411
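
To check whether a given perl is affected (before and after the
reinstall), you can ask Scalar::Util for weaken() directly. This is only
a minimal sketch: it assumes the broken installs are the ones where
Scalar::Util cannot export weaken(), which is what the "Weak references
are not implemented" message suggests. The helper name has_weaken is made
up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Scalar::Util ();

# has_weaken() is a hypothetical helper for this sketch: it returns 1 if
# this perl's Scalar::Util provides a working weaken(), and 0 otherwise.
sub has_weaken {
    my $ok = eval {
        Scalar::Util->import('weaken');    # dies if weaken() cannot be exported
        my $target = {};
        my $ref    = $target;
        Scalar::Util::weaken($ref);        # must not die
        Scalar::Util::isweak($ref) ? 1 : 0;
    };
    return $ok ? 1 : 0;
}

print "Scalar::Util version: $Scalar::Util::VERSION\n";
print has_weaken()
    ? "weaken() is available\n"
    : "weaken() is missing; try reinstalling Scalar::Util\n";
```

If this reports that weaken() is missing, loading Bio::Species is
likely to fail in the same way as in the trace above.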

Barry

On Dec 6, 2007, at 8:28 AM, geo rose wrote:

> Hi Bioperl,
>
> In the past, I have been able to retrieve sequences from an external
> databank, but my scripts are not working anymore.
> I am afraid that I may have broken my Bioperl installation while  
> updating my
> Fedora7 machine with yum update.
>
> Below is an example of what happens.
>
> The script is from
> http://www.faculty.uaf.edu/ffnt/teaching/programming/bioperl/ 
> node2.html and
> it works.
> (I used it on an older machine with Bioperl and MacOS Tiger)
>
> ______________________________________________________________________ 
> ____________
> #!/usr/bin/perl -w
>
> use Bio::SeqIO;
> use Bio::DB::GenBank;
>
> $genBank = new Bio::DB::GenBank;  # This object knows how to talk  
> to GenBank
>
> my $seq = $genBank->get_Seq_by_acc('AF060485');  # get a record by  
> accession
>
>
> my $seqOut = new Bio::SeqIO(-format => 'genbank');
>
> $seqOut->write_seq($seq);
>
>
> ______________________________________________________________________ 
> ___________________
> This is the error I get
> ______________________________________________________________________ 
> ___________________
>
> [home at home Desktop]# perl final-seq-db-test1.pl
> Bio::SeqIO: genbank cannot be found
> Exception
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Failed to load module Bio::SeqIO::genbank. Weak references are  
> not
> implemented in the version of perl at
> /usr/lib/perl5/site_perl/5.8.8/Bio/Species.pm line 91
> BEGIN failed--compilation aborted at
> /usr/lib/perl5/site_perl/5.8.8/Bio/Species.pm line 91.
> Compilation failed in require at
> /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO/genbank.pm line 172.
> BEGIN failed--compilation aborted at
> /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO/genbank.pm line 172.
> Compilation failed in require at
> /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm line 425.
>
> STACK: Error::throw
> STACK: Bio::Root::Root::throw
> /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:359
> STACK: Bio::Root::Root::_load_module
> /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:427
> STACK: Bio::SeqIO::_load_format_module
> /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO.pm:555
> STACK: Bio::SeqIO::new /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO.pm:376
> STACK: Bio::DB::WebDBSeqI::get_seq_stream
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/WebDBSeqI.pm:458
> STACK: Bio::DB::NCBIHelper::get_Stream_by_acc
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/NCBIHelper.pm:361
> STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/WebDBSeqI.pm:172
> STACK: final-seq-db-test1.pl:8
> -----------------------------------------------------------
>
> For more information about the SeqIO system please see the SeqIO docs.
> This includes ways of checking for formats at compile time, not run  
> time
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: acc AF060485 does not exist
> STACK: Error::throw
> STACK: Bio::Root::Root::throw
> /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:359
> STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/WebDBSeqI.pm:173
> STACK: final-seq-db-test1.pl:8
> -----------------------------------------------------------
> [home at home Desktop]# Use of uninitialized value in concatenation  
> (.) or
> string at /usr/lib64/perl5/5.8.8/x86_64-linux-thread-multi/Scalar/ 
> Util.pm
> line 30.
>
> [home at home Desktop]#
>
>
> ______________________________________________________________________ 
> __________________
>
>
> Before I mess things up further I thought I'd ask:
> Can I fix this problem by reinstalling some part of Bioperl or Perl?
>
> Thanks,
>
> George
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From torsten.seemann at infotech.monash.edu.au  Thu Dec  6 23:58:02 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Fri, 7 Dec 2007 10:58:02 +1100
Subject: [Bioperl-l] [StandAloneBLAST] Use more than one CPU + avoid
	BLAST reload
In-Reply-To: <47545590.1000703@boekhoff.info>
References: <47545590.1000703@boekhoff.info>
Message-ID: <a79f6a4b0712061558m663fd1ces6bba9ae9d5602d67@mail.gmail.com>

Sven,

> I just started working with Perl and BioPerl. I'm quite impressed what
> can be easily done with this module. Today I found that my second CPU
> ist not used, but the first one run's at 100%. I tried to include the
> "-a"-parameter, but I was not successful:

My experience agrees with you, in that "-a" does not seem to work with
the pre-compiled BLAST binaries you get from NCBI on a multi-core
system.

I'm not sure why, as "ldd blastall" shows it links against
"/lib64/tls/libpthread.so.0".

Any others have any ideas?

-- 
--Torsten Seemann
--Victorian Bioinformatics Consortium, Monash University
--Tel +61 3 9905 9010


From lzhtom at hotmail.com  Fri Dec  7 04:25:42 2007
From: lzhtom at hotmail.com (zhihuali)
Date: Fri, 7 Dec 2007 04:25:42 +0000
Subject: [Bioperl-l] How to retrieve a persistent object by bioperl-db?
Message-ID: <BAY110-W786D73A90FA1B632776A9C7680@phx.gbl>


Hi netters,
 
I've installed BioSQL and bioperl-db, and successfully created and stored a persistent object:
 
use strict;
use warnings;
use Bio::Seq;
use Bio::DB::BioDB;

my $dbadp = Bio::DB::BioDB->new(-database => 'biosql',
                                -user     => 'annoymous',
                                -dbname   => 'bioseqdb');

my $seqobj = Bio::Seq->new(-accession_number => "test",
                           -id               => "test1",
                           -seq              => "AGCTAGCT",
                           -version          => 1);
my $dbobj = $dbadp->create_persistent($seqobj);
$dbobj->create;
$dbobj->commit;
 
It succeeded: I found the corresponding rows in the bioseqdb tables.
 
Now I want to retrieve the object from the database. There isn't much
documentation available, and I've tried find_by_unique_key/primary_key,
but every attempt failed. Maybe I didn't use them correctly. Could anyone
give me an example of how to retrieve the stored Bio::Seq object?
 
Thanks a lot!
 
Zhihua Li


From Marc.Logghe at ablynx.com  Fri Dec  7 08:33:17 2007
From: Marc.Logghe at ablynx.com (Marc Logghe)
Date: Fri, 7 Dec 2007 09:33:17 +0100
Subject: [Bioperl-l] How to retrieve a persistent object by bioperl-db?
In-Reply-To: <BAY110-W786D73A90FA1B632776A9C7680@phx.gbl>
Message-ID: <03C512635899144083CADB0EE222018901216FA5@alpaca.lan.ablynx.com>

Hi,
Hilmar's BOSC presentation is a very good place to start.
Have a look at http://www.open-bio.org/bosc2003/slides/Persistent_Bioperl_BOSC03.pdf
(slide 18, for instance).
Regards,
Marc
 

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of zhihuali
> Sent: vrijdag 7 december 2007 5:26
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] How to retrieve a persistent object by bioperl-db?
> 
> 
> Hi netters,
> 
> I've installed BioSQL and bioperl-db, and successfully created and stored
> a persistent object:
> 
> use strict;
> use warnings;
> use Bio::Seq;
> use Bio::DB::BioDB;
>
> my $dbadp = Bio::DB::BioDB->new(-database => 'biosql',
>                                 -user     => 'annoymous',
>                                 -dbname   => 'bioseqdb');
>
> my $seqobj = Bio::Seq->new(-accession_number => "test",
>                            -id               => "test1",
>                            -seq              => "AGCTAGCT",
>                            -version          => 1);
> my $dbobj = $dbadp->create_persistent($seqobj);
> $dbobj->create;
> $dbobj->commit;
> 
> It's successful because I found corresponding rows in the bioseqdb tables.
> 
> Now I want to retrieve the object back from the database. There's not much
> documents available and I've tried find_by_unique_key/primary_key but all
> failed. Maybe I didn't use them correctly. Could anyone give me an example
> as how to retrieve the stored Bio::Seq object?
> 
> Thanks a lot!
> 
> Zhihua Li


From avilella at gmail.com  Fri Dec  7 10:32:43 2007
From: avilella at gmail.com (Albert Vilella)
Date: Fri, 7 Dec 2007 10:32:43 +0000
Subject: [Bioperl-l] Query about Hyphy wrapper module "SLAC.pm"
In-Reply-To: <OFBA70B0CA.66F02D44-ONC12573A8.003FB7B7-C12573A8.003FB7C0@sh.se>
References: <OFBA70B0CA.66F02D44-ONC12573A8.003FB7B7-C12573A8.003FB7C0@sh.se>
Message-ID: <358f4d650712070232s3d9ed27xf1c5f17e2985bd90@mail.gmail.com>

Hi Johan,

It would be great if you could upload a reproducible example case:

http://bugzilla.open-bio.org/enter_bug.cgi?product=Bioperl

A tar.gz of the directory with the sample files and the script, plus a
short explanation of how to run it, would probably be enough. If you have
any special "env" vars regarding tmp files, could you specify those as
well?

Thanks,

    Albert.

On Dec 5, 2007 11:35 AM, Johan Nilsson <johan.nilsson at sh.se> wrote:
>
> Hello,
>
> I have a bunch of multiple sequence alignments of protein coding genes,
> which I would like to analyse with the SLAC method of the HyPhy package. I
> tried using the SLAC.pm module in bioperl-run, but I could not get it to
> work properly.
>
> Basically, for each MSA file, I create the Bio::Tree::Tree and
> Bio::SimpleAlign objects ($tree and $aln, respectively) required as
> arguments to SLAC, and call the method with: "($rc,$result) =
> $slac->run($aln,$tree)" in a loop procedure in my script.
>
> When I choose not to save the tmp files (the default option in SLAC.pm),
> the program complains that it cannot find the file
> "$whatevertmpdir/wrapper.bf", and returns $rc=0 for all but the first MSA
> (which works fine). Apparently, it looks for the wrapper.bf file in the
> first tmp dir created, which is deleted in the end of the first SLAC call.
>
> If instead I choose to save the tempfiles ($slac->save_tempfiles('TRUE')),
> all calls to SLAC give returncode 1, and no error message is received.
> However, when I look at the resulting $result hashref, it turns out that
> all results are for the FIRST alignment read. I've made sure there is
> nothing strange with my loop procedure, and I checked that the tree and
> alignment objects look OK for each MSA. Apparently, it does create new
> "results.tsv" files in the tmp directory after each run, but it is
> identical each time it's created. Also, it only creates ONE tmp directory,
> no matter how many times SLAC is executed (I would imagine it was supposed
> to save each result in separate tmp dirs?)
>
> Thus, it seems to me like the errors occur because something goes wrong in
> the creation of temporary files. Have I done something wrong here, or have
> any other of you experienced the same problem?
>
> Best regards
> /Johan
>
>
> --
> Johan Nilsson, Ph.D.
> School of Life Sciences
> S?dert?rns University College
> S-141 89 Huddinge, Sweden
> E-mail: johan.nilsson at sh.se
> Phone: +46 8 608 47 05, +46 70 456 10 51
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From J.Hane at murdoch.edu.au  Mon Dec 10 07:31:17 2007
From: J.Hane at murdoch.edu.au (James Hane)
Date: Mon, 10 Dec 2007 16:31:17 +0900
Subject: [Bioperl-l] Compiling bioperl with perl2exe for win32
In-Reply-To: <mailman.6533.1196225860.2694.bioperl-l@lists.open-bio.org>
References: <mailman.6533.1196225860.2694.bioperl-l@lists.open-bio.org>
Message-ID: <477A8450F426E34DBD5B2E7C6FA82D54B59489@PLUTO.ad.murdoch.edu.au>

I've been trying to compile some bioperl based scripts for win32 using
perl2exe which have worked out really well - except I've noticed I
cannot compile Align::IO, Bio::Location::Simple or Bio::Location::Atomic
despite requiring perl2exe to include them.  Anyone have any suggestions
how to get these to compile?


From Kevin.M.Brown at asu.edu  Mon Dec 10 15:34:35 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Mon, 10 Dec 2007 08:34:35 -0700
Subject: [Bioperl-l] Compiling bioperl with perl2exe for win32
In-Reply-To: <477A8450F426E34DBD5B2E7C6FA82D54B59489@PLUTO.ad.murdoch.edu.au>
References: <mailman.6533.1196225860.2694.bioperl-l@lists.open-bio.org>
	<477A8450F426E34DBD5B2E7C6FA82D54B59489@PLUTO.ad.murdoch.edu.au>
Message-ID: <1A4207F8295607498283FE9E93B775B4041D0B82@EX02.asurite.ad.asu.edu>

I use PAR to create exe's for windows users and it works fine with
bioperl. 

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of James Hane
> Sent: Monday, December 10, 2007 12:31 AM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Compiling bioperl with perl2exe for win32
> 
> I've been trying to compile some bioperl based scripts for win32 using
> perl2exe which have worked out really well - except I've noticed I
> cannot compile Align::IO, Bio::Location::Simple or 
> Bio::Location::Atomic
> despite requiring perl2exe to include them.  Anyone have any 
> suggestions
> how to get these to compile?
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 


From Kevin.M.Brown at asu.edu  Mon Dec 10 18:23:01 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Mon, 10 Dec 2007 11:23:01 -0700
Subject: [Bioperl-l] [StandAloneBLAST] Use more than one CPU +
	avoidBLAST reload
In-Reply-To: <a79f6a4b0712061558m663fd1ces6bba9ae9d5602d67@mail.gmail.com>
References: <47545590.1000703@boekhoff.info>
	<a79f6a4b0712061558m663fd1ces6bba9ae9d5602d67@mail.gmail.com>
Message-ID: <1A4207F8295607498283FE9E93B775B4041D0CAD@EX02.asurite.ad.asu.edu>

I use the -a option with blast all the time and it works, even on
multicore systems. 

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Torsten Seemann
> Sent: Thursday, December 06, 2007 4:58 PM
> To: Sven Boekhoff
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] [StandAloneBLAST] Use more than one 
> CPU + avoidBLAST reload
> 
> Sven,
> 
> > I just started working with Perl and BioPerl. I'm quite 
> impressed what
> > can be easily done with this module. Today I found that my 
> second CPU
> > ist not used, but the first one run's at 100%. I tried to 
> include the
> > "-a"-parameter, but I was not successful:
> 
> My experience agrees with you, in that "-a" does not seem to work with
> the pre-compiled BLAST binaries you get from NCBI on a multi-core
> system.
> 
> I'm not sure why, as "ldd blastall" shows it links against
> "/lib64/tls/libpthread.so.0".
> 
> Any others have any ideas?
> 
> -- 
> --Torsten Seemann
> --Victorian Bioinformatics Consortium, Monash University
> --Tel +61 3 9905 9010
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 


From nadav.denekamp at gmail.com  Wed Dec 12 13:29:18 2007
From: nadav.denekamp at gmail.com (Nadav Y. Denekamp)
Date: Wed, 12 Dec 2007 15:29:18 +0200
Subject: [Bioperl-l] Fetch sequences from a fasta file using a list of
	idenifiers
Message-ID: <001101c83cc3$00aa28e0$5b00000a@ESTHERLAB2>

Hello,

I am trying to retrieve a list of sequences from an indexed flat FASTA file. I tried the script bp_fetch.pl, but I could only retrieve one sequence for one identifier. I am looking for a way to give a script a list of accession numbers and retrieve the matching sequences. I don't have much experience with Perl, so I apologize if this question is very basic.
thanks - Nadav


------------------------------------------------------------------------------------------------------------
Nadav Y. Denekamp, Ph.D.,
Israel Oceanographic and Limnological Research,
National Institute for Oceanography 
Tel-Shikmona, Haifa, 31080.
Tel: 972-4-8565259
Fax: 972-4-8511911
mobile: 972-50-2167318
Skype: nadavden
Email: nadavd at ocean.org.il; nadav.denekamp at gmail.com;

Visit the "Sleeping Beauty" website: 
http://www.gmm.gu.se/SB


From biojoiner at gmail.com  Wed Dec 12 13:06:42 2007
From: biojoiner at gmail.com (=?GB2312?B?s8y35Q==?=)
Date: Wed, 12 Dec 2007 21:06:42 +0800
Subject: [Bioperl-l] problem_About_Bioperl_Installation
Message-ID: <e1a861900712120506y2120c90bp648b56d876d1849f@mail.gmail.com>

Dear Admin:

    I have a computer with no network access, but I would like to have
bioperl installed on it.
    All the installation methods I found need a network connection to CPAN
to fetch the required packages. Is there a complete installation package
that I can install on a network-isolated computer, or some other way to
solve the problem?
    I look forward to your kind answer.
     Thanks very much!

-- 

============================================================
Feng Cheng

Division of HapMap Project
Beijing Institute of Genomics, Chinese Academy of Sciences (CAS)
Beijing Airport Industrial Zone B-6, Beijing, 101318, China
Tel: +86-10-80481102/1176
E-mail: chengf at genomics.org.cn
http://www.big.ac.cn/
============================================================


From avilella at gmail.com  Wed Dec 12 14:50:16 2007
From: avilella at gmail.com (Albert Vilella)
Date: Wed, 12 Dec 2007 14:50:16 +0000
Subject: [Bioperl-l] problem_About_Bioperl_Installation
In-Reply-To: <e1a861900712120506y2120c90bp648b56d876d1849f@mail.gmail.com>
References: <e1a861900712120506y2120c90bp648b56d876d1849f@mail.gmail.com>
Message-ID: <358f4d650712120650u2ef40089ofe27725ea8497dd7@mail.gmail.com>

You can also download the tar.gz packages from the bioperl.org
website, and copy them to the computer. Then unpack
the tar.gzs, and update your PERL5LIB env var.
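
Once the tarball is unpacked, a small script can confirm that perl
actually sees the modules. This is only a sketch under a few assumptions:
the directory you pass in is the one that contains the Bio/ subdirectory,
the default path below is purely hypothetical, and module_available is a
helper invented for this example. Adding the directory to @INC at run
time has the same effect as listing it in PERL5LIB.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical default location of the unpacked tarball; pass your own
# path as the first argument.
my $bioperl_dir = shift(@ARGV) || (($ENV{HOME} || '.') . '/src/bioperl');

# Same effect as putting the directory in PERL5LIB.
unshift @INC, $bioperl_dir;

# module_available() is a helper for this sketch: it returns 1 if the
# named module can be loaded from @INC, and 0 otherwise.
sub module_available {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;
    return eval { require $file; 1 } ? 1 : 0;
}

for my $module (qw(Bio::Seq Bio::SeqIO)) {
    printf "%-12s %s\n", $module,
        module_available($module)
            ? 'found'
            : "NOT found (searched $bioperl_dir)";
}
```

Any prerequisites that BioPerl itself needs and that are not already on
the machine would have to be copied over in the same way.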

On Dec 12, 2007 1:06 PM, Feng Cheng <biojoiner at gmail.com> wrote:
> Dear Admin:
>
>     I have a computer which out of network service, but wanted to have
> bioperl installed in it.
>     I found the installation method all need net to link CPAN to get the
> pakage needed, so is there some complete installation program for me to
> install it in a net-isolated computer, or some other method to solve the
> problom?
>     Wait for your kindful answer.
>      Thanks very much!
>
> --
>
> ============================================================
> Feng Cheng
>
> Division of HapMap Project
> Beijing Institute of Genomics, Chinese Academy of Sciences (CAS)
> Beijing Airport Industrial Zone B-6, Beijing, 101318, China
> Tel: +86-10-80481102/1176
> E-mail: chengf at genomics.org.cn
> http://www.big.ac.cn/
> ============================================================
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From cjfields at uiuc.edu  Wed Dec 12 15:22:45 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 12 Dec 2007 09:22:45 -0600
Subject: [Bioperl-l] Fetch sequences from a fasta file using a list of
	idenifiers
In-Reply-To: <001101c83cc3$00aa28e0$5b00000a@ESTHERLAB2>
References: <001101c83cc3$00aa28e0$5b00000a@ESTHERLAB2>
Message-ID: <E95ADE14-FF71-4068-B958-60BD1EEEBF3C@uiuc.edu>

If you use Bio::Index::Fasta (which is what bp_index.pl uses for FASTA  
files) then you can write up your own script.  From 'perldoc  
Bio::Index::Fasta':

# Once the index is made it can be accessed, either in the
# same script or a different one
use Bio::Index::Fasta;
use strict;

my $Index_File_Name = shift;
my $inx = Bio::Index::Fasta->new(-filename => $Index_File_Name);
my $out = Bio::SeqIO->new(-format => 'Fasta',
                           -fh => \*STDOUT);

foreach my $id (@ARGV) {
     my $seq = $inx->fetch($id); # Returns Bio::Seq object
     $out->write_seq($seq);
}

# or, alternatively
my $id;
my $seq = $inx->get_Seq_by_id($id); # identical to fetch()


....
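
Since the question was about a list of accession numbers, here is a
rough sketch of how the synopsis could be adapted to read the ids from a
file instead of from @ARGV. The assumptions: the id file has one
identifier per line, the index was already built (e.g. with
bp_index.pl), and the script name and the read_ids helper are made up for
this example. Whether fetch() returns undef or throws for an unknown id
may depend on the BioPerl version, so the eval covers both cases.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# read_ids() is a helper for this sketch: one identifier per line;
# blank lines and lines starting with '#' are skipped.
sub read_ids {
    my ($fh) = @_;
    my @ids;
    while (my $line = <$fh>) {
        $line =~ s/^\s+|\s+$//g;    # trim leading/trailing whitespace
        next if $line eq '' || $line =~ /^#/;
        push @ids, $line;
    }
    return @ids;
}

# Usage: fetch_list.pl <index file> <file with ids>
if (@ARGV == 2) {
    my ($index_file, $id_file) = @ARGV;

    require Bio::Index::Fasta;
    require Bio::SeqIO;

    open my $fh, '<', $id_file or die "Cannot open $id_file: $!";
    my @ids = read_ids($fh);
    close $fh;

    my $inx = Bio::Index::Fasta->new(-filename => $index_file);
    my $out = Bio::SeqIO->new(-format => 'Fasta', -fh => \*STDOUT);

    for my $id (@ids) {
        my $seq = eval { $inx->fetch($id) };
        if ($seq) { $out->write_seq($seq) }
        else      { warn "No sequence found for '$id'\n" }
    }
}
```

Redirect STDOUT to a file to collect the sequences; ids that are not in
the index are reported on STDERR and skipped.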

chris

On Dec 12, 2007, at 7:29 AM, Nadav Y. Denekamp wrote:

> Hello,
>
> I am trying to retrieve a list of sequences from an indexed flast  
> FASTA file. I tried to use the script bp_fetch.pl but I could only  
> retrieve one sequence for one identifier. I am looking for a way to  
> provide a list of accession numbers to a script and to retrieve the  
> sequences. I don't have much experience with perl so I appologize if  
> this question is very basic
> thanks - Nadav
>
>
> ------------------------------------------------------------------------------------------------------------
> Nadav Y. Denekamp, Ph.D.,
> Israel Oceanographic and Limnological Research,
> National Institute for Oceanography
> Tel-Shikmona, Haifa, 31080.
> Tel: 972-4-8565259
> Fax: 972-4-8511911
> mobile: 972-50-2167318
> Skype: nadavden
> Email: nadavd at ocean.org.il; nadav.denekamp at gmail.com;
>
> Visit the "Sleeping Beauty" website:
> http://www.gmm.gu.se/SB
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From karchana at ibab.ac.in  Fri Dec 14 03:56:14 2007
From: karchana at ibab.ac.in (Information_details)
Date: Thu, 13 Dec 2007 19:56:14 -0800 (PST)
Subject: [Bioperl-l]  How to get the contents?
Message-ID: <14329679.post@talk.nabble.com>


Hi,

I am new to bioperl.

I am using module  Bio::SeqIO;

I have a GenBank file: http://www.nabble.com/file/p14329679/seq.gb seq.gb 

In this file I have to match the gene tag and get all its contents.

Which function do I have to use?

The gene portion looks like this:

 gene            1..485
                     /gene="PRM1"
                     /note="Derived by automated computational analysis using
                     gene prediction method: BestRefseq. Supporting evidence
                     includes similarity to: 1 mRNA"
                     /db_xref="GeneID:5619"
                     /db_xref="HGNC:9447"

I have to match the gene tag and get its contents.

[CODE]
$seq=$seqobj->next_seq();

foreach $feat ($seq->get_all_SeqFeatures())
 {
        if($feat->primary_tag eq "mRNA")
        {
                foreach $tag ($feat->get_all_tags())
                {
                        if($tag eq "gene")
                        {
                            # here I have to retrieve the information like this:
                            #   1..485
                            #   /gene="PRM1"
                        }
                 }
         }
 }
[/CODE]
How do I do that?

with regards
Archana


-- 
View this message in context: http://www.nabble.com/How-to-get-the-contents--tp14329679p14329679.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From mike.thon at gmail.com  Fri Dec 14 17:41:44 2007
From: mike.thon at gmail.com (Michael Thon)
Date: Fri, 14 Dec 2007 18:41:44 +0100
Subject: [Bioperl-l] How to get the contents?
In-Reply-To: <14329679.post@talk.nabble.com>
References: <14329679.post@talk.nabble.com>
Message-ID: <9F93893E-182A-4A5F-B27C-089521CAA355@gmail.com>

Hi Information_details, a.k.a. Archana :)

"1", and "485" can be retrieved with something like:

$feat->start();
$feat->end();

if you want start and end of each exon then you need:
my $location = $feat->location();

which returns a Bio::LocationI object.

I think the 'gene' tag is a tag-value pair that  can be retrieved with:

my @values = $feat->get_tag_values("gene");
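
Put together, that could look roughly like the sketch below. It is
untested against your file and makes a few assumptions: the block you
show has the primary tag "gene" (your loop tests for "mRNA", which would
skip it), the script name is made up, and describe_feature is just a
helper invented for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# describe_feature() is a helper for this sketch. It only relies on the
# start(), end(), has_tag() and get_tag_values() methods of a feature.
sub describe_feature {
    my ($feat) = @_;
    my @genes = $feat->has_tag('gene') ? $feat->get_tag_values('gene') : ();
    return sprintf('%d..%d /gene="%s"',
                   $feat->start, $feat->end, join(',', @genes));
}

# Usage: gene_tags.pl seq.gb
if (@ARGV) {
    require Bio::SeqIO;
    my $in = Bio::SeqIO->new(-file => $ARGV[0], -format => 'genbank');
    while (my $seq = $in->next_seq) {
        for my $feat ($seq->get_SeqFeatures) {
            # keep only the features whose primary tag is "gene"
            next unless $feat->primary_tag eq 'gene';
            print describe_feature($feat), "\n";
        }
    }
}
```

The has_tag() check is there because get_tag_values() may complain when
a feature does not carry the requested tag.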

-Mike


On Dec 14, 2007, at 4:56 AM, Information_details wrote:

>
> Hi,
>
> I am new to bioperl.
>
> I am using module  Bio::SeqIO;
>
> I have genbank file. http://www.nabble.com/file/p14329679/seq.gb  
> seq.gb
>
> In this file i have to match gene tag and get all its contents.
>
> which function i have to use?
>
> The gene portion look like this
>
> gene            1..485
>                     /gene="PRM1"
>                     /note="Derived by automated computational analysis
> using
>                     gene prediction method: BestRefseq. Supporting  
> evidence
>                     includes similarity to: 1 mRNA"
>                     /db_xref="GeneID:5619"
>                     /db_xref="HGNC:9447"
>
> i have to match gene tag and get its contents?
>
> [CODE]
> $seq=$seqobj->next_seq();
>
> foreach $feat ($seq->get_all_SeqFeatures())
> {
>        if($feat->primary_tag eq "mRNA")
>        {
>                foreach $tag ($feat->get_all_tags())
>                {
>                        if($tag eq "gene")
>                        {
>                            #here i have to retrieve the information  
> like
> this.
>                           1..485
>                         /gene="PRM1"
>                        }
>                 }
>         }
> [/CODE]
> How do i do that?
>
> with regards
> Archana
>
>
>
>
> -- 
> View this message in context: http://www.nabble.com/How-to-get-the-contents--tp14329679p14329679.html
> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at uiuc.edu  Sat Dec 15 15:15:00 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 15 Dec 2007 09:15:00 -0600
Subject: [Bioperl-l] [ANNOUNCEMENT] CVS freeze
Message-ID: <9FE0873D-E009-42E6-B37A-32584655ED06@uiuc.edu>

All,

We are in the midst of switching over BioPerl from CVS to SVN.  We are  
tentatively freezing the bioperl CVS repository Dec. 19 in order to  
prepare for the switch.  At that time we plan on building and setting  
up the SVN repository, running some remedial tests (commit messages,  
etc), then announcing the switch on the list.  Soon after we will try  
getting a sync'ed read-only CVS set up for legacy purposes.

If anyone has any commits to add to the repository we suggest making  
them as soon as possible.

chris


From margots at mail.nih.gov  Tue Dec 18 15:00:11 2007
From: margots at mail.nih.gov (Margot Sunshine)
Date: Tue, 18 Dec 2007 15:00:11 +0000 (UTC)
Subject: [Bioperl-l] bio-perl cvs freeze
Message-ID: <loom.20071218T145502-552@post.gmane.org>

Hi,

I have been trying to checkout bio-perl from cvs since yesterday afternoon 
(Dec 17). My request just hangs. I can login but I cannot checkout anything. 
My reading of your posting of the planned switch from CVS to SVN seemed to 
indicate that this was not to take place until tomorrow. Help!

Thanks,
Margot Sunshine


From ste.ghi at libero.it  Tue Dec 18 18:04:21 2007
From: ste.ghi at libero.it (Stefano Ghignone)
Date: Tue, 18 Dec 2007 19:04:21 +0100
Subject: [Bioperl-l] dealing with large files
Message-ID: <JT9BJ9$83113BA715E1F7CF3B9D29BFFCC4B0CF@libero.it>

Dear all,
  I'm facing a really annoying problem with handling large files.
I wrote a script (below) that should take sequences from an EMBL-formatted file and write them out in a customized FASTA format. The script works, but since the input file is rather big (5.6 GB unzipped, 987 MB zipped), after a while all the physical and virtual memory of my workstation (4 GB RAM) fills up and the script is killed...
I really don't know how to avoid this huge memory usage... and now I'm wondering if this is the right approach.
Please help me!
Best wishes,
Stefano 


#################
#!/usr/bin/perl -w

use strict;

use warnings;

use Fcntl;
use Cwd;

use Bio::SeqIO;

my $infile = $ARGV[0];
my $outfile = "$ARGV[0].fasta";
my $organism;
my $count;
my $path = cwd()."/$outfile";

print "Working dir is: ".cwd().".\nCreating file: $path\n";

my $in  = Bio::SeqIO->new(-file => "/bin/gunzip -c $infile |", -format => 'EMBL');

while ( my $seq = $in->next_seq() ) {
	sysopen(TO, $path, O_WRONLY | O_APPEND | O_CREAT);  
	my $id = $seq->accession_number();	
	my $desc = $seq->desc(); chop $desc;
	my $species = $seq->species->binomial();
	my $subspecies = $seq->species->sub_species();
	if ($seq->species->sub_species()) {chop $subspecies; $organism = $species." ".$subspecies;}
		else {$organism = $species;}
	my $sequence = $seq->seq();
	print TO ">$id $desc [$organism]\n$sequence\n";
    	$count++;
	warn $@ if $@;
	close TO;
}

print "Done!\n\t$count sequences have been treated. The file $ARGV[0].fasta is ready.\n";


From jason at bioperl.org  Tue Dec 18 18:22:07 2007
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 18 Dec 2007 10:22:07 -0800
Subject: [Bioperl-l] bio-perl cvs freeze
In-Reply-To: <loom.20071218T145502-552@post.gmane.org>
References: <loom.20071218T145502-552@post.gmane.org>
Message-ID: <681FB463-13A5-4B35-923B-29A91F07D72B@bioperl.org>

Margot -
The code freeze won't affect the anonymous cvs, and we'll likely  
keep anonymous CVS as is (and maybe even figure out how to keep it  
updated with the SVN) since external tools depend on it and have  
published CVS instructions.

I was able to do an anonymous checkout fine on my machine just now --  
if the problem persists please send a message to support at open-bio.org  
and the support volunteers will track it from there.

-jason


From jason at bioperl.org  Tue Dec 18 18:31:39 2007
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 18 Dec 2007 10:31:39 -0800
Subject: [Bioperl-l] dealing with large files
In-Reply-To: <JT9BJ9$83113BA715E1F7CF3B9D29BFFCC4B0CF@libero.it>
References: <JT9BJ9$83113BA715E1F7CF3B9D29BFFCC4B0CF@libero.it>
Message-ID: <9CCCD509-EAFA-4528-B045-90910E19B41F@bioperl.org>

Not exactly clear why you aren't using Bio::SeqIO to write the  
sequence back out in FASTA format and why you are re-opening the file  
each time?

Did you look at the examples that show how to convert file formats?
http://bioperl.org/wiki/HOWTO:SeqIO

You can set the description with
$seq->description($newdescription);
and the ID with
$seq->display_id($newid);
before writing.
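
A minimal sketch of that approach, assuming BioPerl is installed and the input is gzipped as in the original script. The helper names custom_desc and embl_to_fasta and the two-argument command line are illustrative only, not part of BioPerl. Bio::SeqIO's fasta writer wraps sequence lines, so the output will not be identical to the one-line-per-sequence format of the original script, and nothing here by itself explains or cures the memory growth.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the customized description "<desc> [<organism>]" from plain strings.
# (Hypothetical helper, named for this sketch only.)
sub custom_desc {
    my ($desc, $organism) = @_;
    $desc = '' unless defined $desc;
    $desc =~ s/\.\s*$//;    # drop a trailing period, if any
    return (defined $organism && length $organism)
        ? "$desc [$organism]"
        : $desc;
}

# Convert a gzipped EMBL-format file to FASTA, opening the output once.
sub embl_to_fasta {
    my ($infile, $outfile) = @_;

    # Loaded here rather than with "use" so that custom_desc() can be
    # exercised on a machine without BioPerl.
    require Bio::SeqIO;

    my $in  = Bio::SeqIO->new(-file => "gunzip -c $infile |", -format => 'EMBL');
    my $out = Bio::SeqIO->new(-file => ">$outfile",           -format => 'fasta');

    my $count = 0;
    while (my $seq = $in->next_seq) {
        my $organism = '';
        if (my $species = $seq->species) {
            $organism = $species->binomial;
            my $sub = $species->sub_species;
            $organism .= " $sub" if defined $sub && length $sub;
        }
        $seq->display_id($seq->accession_number);
        $seq->desc(custom_desc($seq->desc, $organism));
        $out->write_seq($seq);
        $count++;
    }
    return $count;
}

# Only run the conversion when both file names are given.
if (@ARGV == 2) {
    my $n = embl_to_fasta(@ARGV);
    print "$n sequences written to $ARGV[1]\n";
}
```

Usage would be along the lines of: perl embl2fasta.pl input.dat.gz output.fasta (the script name is arbitrary).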

It isn't clear to me from your code why it would be leaking memory  
and causing a problem - is it possible that you have a huge sequence  
in the EMBL file?

-jason


From cain.cshl at gmail.com  Tue Dec 18 19:04:11 2007
From: cain.cshl at gmail.com (Scott Cain)
Date: Tue, 18 Dec 2007 14:04:11 -0500
Subject: [Bioperl-l] bio-perl cvs freeze
In-Reply-To: <681FB463-13A5-4B35-923B-29A91F07D72B@bioperl.org>
References: <loom.20071218T145502-552@post.gmane.org>
	<681FB463-13A5-4B35-923B-29A91F07D72B@bioperl.org>
Message-ID: <1198004651.11000.19.camel@frissell>

Hi Jason and all,

Does the fact that cvs is sticking around (read only) mean that viewcvs
(the web interface) will stick around too?  I was thinking about
modifying the GBrowse net installer to use the 'automatic' tarball of
bioperl-live to download and install via nmake on Windows since it
doesn't have cvs support built in.  Also, with cvs sticking around, I
don't need to rewrite the installer to use svn (yeah!).

Thanks,
Scott

-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory


From jason at bioperl.org  Tue Dec 18 19:20:11 2007
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 18 Dec 2007 11:20:11 -0800
Subject: [Bioperl-l] bio-perl cvs freeze
In-Reply-To: <1198004651.11000.19.camel@frissell>
References: <loom.20071218T145502-552@post.gmane.org>
	<681FB463-13A5-4B35-923B-29A91F07D72B@bioperl.org>
	<1198004651.11000.19.camel@frissell>
Message-ID: <4560525E-AE12-45BF-A174-3B8E3669C2B9@bioperl.org>


On Dec 18, 2007, at 11:04 AM, Scott Cain wrote:

> Hi Jason and all,
>
> Does the fact that cvs is sticking around (read only) mean that  
> viewcvs
> (the web interface) will stick around too?  I was thinking about
> modifying the GBrowse net installer to use the 'automatic' tarball of
> bioperl-live to download and install via nmake on Windows since it
> doesn't have cvs support built in.  Also, with cvs sticking around, I
> don't need to rewrite the installer to use svn (yeah!).
>
Hey Scott -

Perhaps. There may be better tools with SVN anyway. We could also  
just set up a script that tarballs the already auto-updated  
code here (I think it syncs every hour):
http://bioperl.org/SRC/

We're still playing around with this, and I can't guarantee that we'll  
get syncing the SVN commits back to CVS to work.

-jason


From cain.cshl at gmail.com  Tue Dec 18 19:31:23 2007
From: cain.cshl at gmail.com (Scott Cain)
Date: Tue, 18 Dec 2007 14:31:23 -0500
Subject: [Bioperl-l] bio-perl cvs freeze
In-Reply-To: <4560525E-AE12-45BF-A174-3B8E3669C2B9@bioperl.org>
References: <loom.20071218T145502-552@post.gmane.org>
	<681FB463-13A5-4B35-923B-29A91F07D72B@bioperl.org>
	<1198004651.11000.19.camel@frissell>
	<4560525E-AE12-45BF-A174-3B8E3669C2B9@bioperl.org>
Message-ID: <1198006283.11000.20.camel@frissell>

Cool.  For the moment, I'll just wait and see what happens :-)

Thanks,
Scott

-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory


From avilella at gmail.com  Tue Dec 18 20:33:43 2007
From: avilella at gmail.com (Albert Vilella)
Date: Tue, 18 Dec 2007 20:33:43 +0000
Subject: [Bioperl-l] dealing with large files
In-Reply-To: <9CCCD509-EAFA-4528-B045-90910E19B41F@bioperl.org>
References: <JT9BJ9$83113BA715E1F7CF3B9D29BFFCC4B0CF@libero.it>
	<9CCCD509-EAFA-4528-B045-90910E19B41F@bioperl.org>
Message-ID: <358f4d650712181233q2a1627c3v6fb4e3e20b9f6c78@mail.gmail.com>

There is a Bio::SeqIO "largefasta" object that will use the hard-disk
for very large fasta files.



From cjfields at uiuc.edu  Wed Dec 19 02:29:19 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 18 Dec 2007 20:29:19 -0600
Subject: [Bioperl-l] perl 5.10 released
Message-ID: <AF986160-D1AB-4AEF-9588-90861F5B7B98@uiuc.edu>

The next major perl release, perl 5.10, has officially been released:

http://use.perl.org/article.pl?sid=07/12/18/195247

I'll try testing BioPerl with perl 5.10 and any relevant modules when  
I can; this may have to wait until after the SVN migration.  If there are  
any interested parties who want to test BioPerl compatibility with perl  
5.10, feel free to post your results!

chris


From David.Messina at sbc.su.se  Wed Dec 19 16:44:06 2007
From: David.Messina at sbc.su.se (Dave Messina)
Date: Wed, 19 Dec 2007 10:44:06 -0600
Subject: [Bioperl-l] perl 5.10 released
In-Reply-To: <AF986160-D1AB-4AEF-9588-90861F5B7B98@uiuc.edu>
References: <AF986160-D1AB-4AEF-9588-90861F5B7B98@uiuc.edu>
Message-ID: <628aabb70712190844o17a40c2eva3ef863dc42afb6c@mail.gmail.com>

Hi everyone,

Perl 5.10 builds fine and passes all tests on my PB G4 running OS X 10.5.1.
Piece o' cake.

Here are results of testing BioPerl on this virgin install:

I downloaded the latest CVS tarball. I did 'perl Build.PL', which used CPAN
to install a bunch of dependencies. I then did 'Build test'. For the most
part everything was fine.

- Bio::Biblio::IO::medlinexml throws an exception because XML::Parser isn't
installed.

- RNA_SearchIO fails a few tests.

- Bio::Ontology::SimpleGOEngine::GraphAdaptor throws an exception because
Graph::Directed isn't installed.

- Spidey fails one test.

And of course without the optional dependencies installed, many tests were
skipped.

I'll now go back and install the optional dependencies and do the network
tests, but it looks like for the most part we play nice with the new Perl.

Dave


From ste.ghi at libero.it  Wed Dec 19 16:45:15 2007
From: ste.ghi at libero.it (Stefano Ghignone)
Date: Wed, 19 Dec 2007 17:45:15 +0100
Subject: [Bioperl-l] dealing with large files
Message-ID: <JTB2JF$8DA035C3FA7C0AA4E73865996D18C568@libero.it>

> Not exactly clear why you aren't using Bio::SeqIO to write the  
> sequence back out in FASTA format and why you are re-opening the file  
> each time?
It was to avoid keeping the output file open all the time...

> Did you look at the examples that show how to convert file formats?
> http://bioperl.org/wiki/HOWTO:SeqIO
Yes, I did... but I didn't realize how to set a customized description...

> You can set the description with
> $seq->description($newdescription);
> and the ID with
> $seq->display_id($newid);
> before writing.
Thanks for the hint. Anyway, even using the simple code given there to convert EMBL to FASTA format, the result is the same... Let me remind you that I'm using a huge input file, uniprot_trembl_bacteria.dat.gz: it contains 13101418 sequences!

> It isn't clear to me from your code why it would be leaking memory  
> and causing a problem - is it possible that you have a huge sequence  
> in the EMBL file?
> -jason

In the end, I managed the format conversion using this command:

gunzip -c uniprot_trembl_bacteria.dat.gz | perl -ne 'print ">$1 " if
(/^AC\s+(\S+);/); print " $1" if (/^DE\s+(.*)/);print " [$1]\n" if
(/^OS\s+(.*)/); if (($a)=/^\s+(.*)/){$a=~s/ //g; print "$a\n"};'

(Thanks to Riccardo Percudani). It's not bioperl...but it works!
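
For readability, the same streaming idea can be sketched as a small script. This is not the one-liner itself but a variant under stated assumptions: it keeps only the first accession on the AC lines, joins multi-line DE and OS entries with a space, treats any line starting with whitespace as sequence, and emits one FASTA entry when it reaches the // terminator. The sub name flatfile_to_fasta and the "-" argument for standard input are made up for this sketch.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stream an EMBL/UniProt-style flat file from $in to FASTA on $out.
# Only the current record is held in memory.
sub flatfile_to_fasta {
    my ($in, $out) = @_;
    my %rec = (ac => '', de => '', os => '', seq => '');
    my $count = 0;

    while (my $line = <$in>) {
        chomp $line;
        if ($line =~ /^AC\s+(\S+?);/) {
            $rec{ac} ||= $1;                 # first (primary) accession only
        }
        elsif ($line =~ /^DE\s+(.*)/) {
            $rec{de} .= (length $rec{de} ? ' ' : '') . $1;
        }
        elsif ($line =~ /^OS\s+(.*)/) {
            $rec{os} .= (length $rec{os} ? ' ' : '') . $1;
        }
        elsif ($line =~ /^\s+(\S.*)/) {
            # Sequence line: remove spaces and any position counters.
            (my $chunk = $1) =~ s/[\s\d]//g;
            $rec{seq} .= $chunk;
        }
        elsif ($line =~ m{^//}) {
            # End of record: write it out and start afresh.
            if (length $rec{ac}) {
                my $header = ">$rec{ac}";
                $header .= " $rec{de}"   if length $rec{de};
                $header .= " [$rec{os}]" if length $rec{os};
                print {$out} "$header\n$rec{seq}\n";
                $count++;
            }
            %rec = (ac => '', de => '', os => '', seq => '');
        }
    }
    return $count;
}

# Run only when an input is named; "-" means standard input.
if (@ARGV) {
    my $in;
    if ($ARGV[0] eq '-') {
        $in = \*STDIN;
    }
    else {
        open $in, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    }
    my $n = flatfile_to_fasta($in, \*STDOUT);
    warn "$n records converted\n";
}
```

It could be run as: gunzip -c uniprot_trembl_bacteria.dat.gz | perl flat2fasta.pl - > out.fasta (the script name is arbitrary).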

My best wishes,
Stefano




From cjfields at uiuc.edu  Wed Dec 19 17:17:28 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 19 Dec 2007 11:17:28 -0600
Subject: [Bioperl-l] dealing with large files
In-Reply-To: <JTB2JF$8DA035C3FA7C0AA4E73865996D18C568@libero.it>
References: <JTB2JF$8DA035C3FA7C0AA4E73865996D18C568@libero.it>
Message-ID: <B92AAA3F-C93A-41EC-B68D-3E6F4053BBD4@uiuc.edu>


On Dec 19, 2007, at 10:45 AM, Stefano Ghignone wrote:

>> Not exactly clear why you aren't using Bio::SeqIO to write the
>> sequence back out in FASTA format and why you are re-opening the file
>> each time?
> It was to avoid keeping the output file open all the time...
>
>> Did you look at the examples that show how to convert file formats?
>> http://bioperl.org/wiki/HOWTO:SeqIO
> Yes, I did... but I didn't realize how to set a customized  
> description...
>
>> You can set the description with
>> $seq->description($newdescription);
>> and the ID with
>> $seq->display_id($newid);
>> before writing.
> Thanks for the hint. Anyway, even using the simple code given there to  
> convert EMBL to FASTA format, the result is the same... Let me remind  
> you that I'm using a huge input file,  
> uniprot_trembl_bacteria.dat.gz: it contains 13101418 sequences!
>
>> It isn't clear to me from your code why it would be leaking memory
>> and causing a problem - is it possible that you have a huge sequence
>> in the EMBL file?
>> -jason
>
> In the end, I managed the format conversion using this command:
>
> gunzip -c uniprot_trembl_bacteria.dat.gz | perl -ne 'print ">$1 " if
> (/^AC\s+(\S+);/); print " $1" if (/^DE\s+(.*)/);print " [$1]\n" if
> (/^OS\s+(.*)/); if (($a)=/^\s+(.*)/){$a=~s/ //g; print "$a\n"};'
>
> (Thanks to Riccardo Percudani). It's not bioperl...but it works!
>
> My best wishes,
> Stefano


As this shows, BioPerl isn't always the best answer (I know,  
blasphemy...).  As Jason suggested, it's quite likely that large  
sequence records are causing your problems when using BioPerl.  The  
one-liner works b/c it doesn't retain data (sequence, annotation, etc.) in  
memory as a Bio::Seq object; it's a direct conversion.
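
One way to check the large-record hypothesis without building any objects is to scan the flat file record by record and report the biggest one. The sketch below assumes records end with a line containing only //, as in EMBL/UniProt flat files; the sub name largest_record is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return (number of records, ID of the largest record, its size in bytes).
sub largest_record {
    my ($in) = @_;
    local $/ = "//\n";    # read one whole record per iteration
    my ($n, $max_id, $max_bytes) = (0, '', 0);

    while (my $rec = <$in>) {
        next unless $rec =~ /\S/;    # ignore trailing blank chunks
        $n++;
        if (length($rec) > $max_bytes) {
            $max_bytes = length $rec;
            my ($id) = $rec =~ /^ID\s+([^\s;]+)/m;
            $max_id = defined $id ? $id : '';
        }
    }
    return ($n, $max_id, $max_bytes);
}

# Run only when a file name is given on the command line.
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    my ($n, $id, $bytes) = largest_record($fh);
    print "$n records; largest is '$id' at $bytes bytes\n";
}
```

If the largest record turns out to be small, a single huge sequence is unlikely to be the cause and the memory growth would have to come from somewhere else.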

It would be nice to code up a lazy sequence object and related  
parsers; maybe for the next dev release.

chris


From cjfields at uiuc.edu  Wed Dec 19 17:08:31 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 19 Dec 2007 11:08:31 -0600
Subject: [Bioperl-l] perl 5.10 released
In-Reply-To: <628aabb70712190844o17a40c2eva3ef863dc42afb6c@mail.gmail.com>
References: <AF986160-D1AB-4AEF-9588-90861F5B7B98@uiuc.edu>
	<628aabb70712190844o17a40c2eva3ef863dc42afb6c@mail.gmail.com>
Message-ID: <C7FF12CB-278A-4B47-8D04-F2F038C35AB2@uiuc.edu>


On Dec 19, 2007, at 10:44 AM, Dave Messina wrote:

> Hi everyone,
>
>
> Perl 5.10 builds fine and passes all tests on my PB G4 running OS X  
> 10.5.1. Piece o' cake.
>
> Here are results of testing BioPerl on this virgin install:
>
> I downloaded the latest CVS tarball. I did 'perl Build.PL', which  
> used CPAN to install a bunch of dependencies. I then did 'Build  
> test'. For the most part everything was fine.
>
> - Bio::Biblio::IO::medlinexml throws an exception because  
> XML::Parser isn't installed.

XML::Parser used to be shipped with a number of perl distros even  
though it isn't core.  We should add a require to these.

> - RNA_SearchIO fails a few tests.

These are very likely from recent commits I made re:GenericHSP and use  
of bits(), raw_score(), etc. (the fails look like missing/switched  
vals with these method tests).  I'll fix these post-svn migration, but  
I don't think these are related to 5.10.

> - Bio::Ontology::SimpleGOEngine::GraphAdaptor throws an exception  
> because Graph::Directed isn't installed.

Odd, that should be caught out before tests are run.  Needs to be  
fixed, but one would think this would fail as well under 5.8.
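
One generic way to catch a missing optional module before it throws is to attempt the require inside eval and skip when it fails. A minimal sketch follows; have_module is a hypothetical helper written for this example, not the test suite's own machinery.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return 1 if the named module can be loaded, 0 otherwise.
sub have_module {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # XML::Parser -> XML/Parser.pm
    return eval { require $file; 1 } ? 1 : 0;
}

# Report on the two optional dependencies mentioned in this thread.
for my $module (qw(XML::Parser Graph::Directed)) {
    printf "%-16s %s\n", $module,
        have_module($module) ? 'available' : 'missing (dependent tests should be skipped)';
}
```

A test script could call such a check up front and skip the dependent tests instead of dying partway through.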

> - Spidey fails one test.

Passes for me.  Is it dependency-related?

> And of course without the optional dependencies installed, many  
> tests were skipped.
>
> I'll now go back and install the optional dependencies and do the  
> network tests, but it looks like for the most part we play nice with  
> the new Perl.
>
> Dave

Not sure, but it seems a bit faster.  Maybe it's just me but it would  
be nice to see some benchmarks comparing perl 5.8 vs 5.10.  I agree,  
it was a very fast and easy install.

I'll start a page on the wiki for test fails using perl 5.10.  I'm  
seeing a few fails;  I'm getting the following with everything  
installed (including DBD::mysql, DBI, etc) using perl 5.10, Mac OS X  
10.5.1 (note Test::Harness now gives TODO's, so some of these are  
actually passing).  Note the entrezgene.t and DB.t fails; I looked  
into these and I think they are related to the odd 'pseudohashes are  
deprecated' warnings we were getting in perl 5.8 tests, so there may  
be something legitimately buggy.

Test Summary Report
-------------------
t/Annotation.t                (Wstat: 0 Tests: 112 Failed: 0)
   TODO passed:   96
t/BioGraphics.t               (Wstat: 256 Tests: 35 Failed: 1)
   Failed test number(s):  4
   Non-zero exit status: 1
t/DB.t                        (Wstat: 65280 Tests: 106 Failed: 0)
   Non-zero exit status: 255
   Parse errors: Bad plan.  You planned 116 tests but ran 106.
t/DBCUTG.t                    (Wstat: 1024 Tests: 33 Failed: 4)
   Failed test number(s):  29-31, 33
   Non-zero exit status: 4
t/RNA_SearchIO.t              (Wstat: 2048 Tests: 496 Failed: 8)
   Failed test number(s):  291, 338, 372-374, 395, 455, 486
   Non-zero exit status: 8
t/entrezgene.t                (Wstat: 65280 Tests: 648 Failed: 0)
   Non-zero exit status: 255
   Parse errors: Bad plan.  You planned 1422 tests but ran 648.
Files=255, Tests=15066, 435 wallclock secs ( 3.15 usr  1.72 sys + 124.87 cusr 13.29 csys = 143.03 CPU)
Result: FAIL
Failed 5/255 test programs. 13/15066 subtests failed.


chris


From David.Messina at sbc.su.se  Wed Dec 19 17:49:32 2007
From: David.Messina at sbc.su.se (Dave Messina)
Date: Wed, 19 Dec 2007 11:49:32 -0600
Subject: [Bioperl-l] perl 5.10 released
In-Reply-To: <C7FF12CB-278A-4B47-8D04-F2F038C35AB2@uiuc.edu>
References: <AF986160-D1AB-4AEF-9588-90861F5B7B98@uiuc.edu>
	<628aabb70712190844o17a40c2eva3ef863dc42afb6c@mail.gmail.com>
	<C7FF12CB-278A-4B47-8D04-F2F038C35AB2@uiuc.edu>
Message-ID: <628aabb70712190949j30756b8ap97666f4962c2b83d@mail.gmail.com>

>
> XML::Parser used to be shipped with a number of perl distros even
> though it isn't core.  We should add a require to these.


Agreed.


> > - RNA_SearchIO fails a few tests.
>
> These are very likely from recent commits I made re:GenericHSP and use
> of bits(), raw_score(), etc. (the fails look like missing/switched
> vals with these method tests).  I'll fix these post-svn migration, but
> I don't think these are related to 5.10.


Agreed -- I doubt this is 5.10-specific.


> > - Bio::Ontology::SimpleGOEngine::GraphAdaptor throws an exception
> > because Graph::Directed isn't installed.
>
> Odd, that should be caught out before tests are run.  Needs to be
> fixed, but one would think this would fail as well under 5.8.


Yep, and in a minute here I'll test it under 5.8.


> > - Spidey fails one test.
>
> Passes for me.  Is it dependency-related?


I don't think so, but I guess we'll see once I finish installing the
dependencies. Here's what I got:

t/Spidey........................ok 1/26 Can't call method "sub_SeqFeature"
on an undefined value at t/Spidey.t line 24, <GEN1> line 170.
# Looks like you planned 26 tests but only ran 3.
# Looks like your test died just after 3.
t/Spidey........................dubious

        Test returned status 255 (wstat 65280, 0xff00)
DIED. FAILED tests 4-26
        Failed 23/26 tests, 11.54% okay


Dave


From cjfields at uiuc.edu  Wed Dec 19 19:19:10 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 19 Dec 2007 13:19:10 -0600
Subject: [Bioperl-l] perl 5.10 released
In-Reply-To: <628aabb70712190949j30756b8ap97666f4962c2b83d@mail.gmail.com>
References: <AF986160-D1AB-4AEF-9588-90861F5B7B98@uiuc.edu>
	<628aabb70712190844o17a40c2eva3ef863dc42afb6c@mail.gmail.com>
	<C7FF12CB-278A-4B47-8D04-F2F038C35AB2@uiuc.edu>
	<628aabb70712190949j30756b8ap97666f4962c2b83d@mail.gmail.com>
Message-ID: <04AB8971-466D-4EEF-9A75-310ACDD224A6@uiuc.edu>

Just updated from CVS and reran tests, Spidey.t is failing now.  This  
may be from a recent commit:

http://lists.open-bio.org/pipermail/bioperl-guts-l/2007-December/026854.html

I'm updating the following page on the wiki for tracking.  There are a  
few more we should look into at some point:

http://www.bioperl.org/w/index.php?title=Bioperl_and_Perl_5.10

chris

On Dec 19, 2007, at 11:49 AM, Dave Messina wrote:

>>
>> XML::Parser used to be shipped with a number of perl distros even
>> though it isn't core.  We should add a require to these.
>
>
> Agreed.
>
>
>> - RNA_SearchIO fails a few tests.
>>
>> These are very likely from recent commits I made re:GenericHSP and
>> use of bits(), raw_score(), etc. (the fails look like missing/switched
>> vals with these method tests).  I'll fix these post-svn migration,
>> but I don't think these are related to 5.10.
>
>
> Agreed -- I doubt this is 5.10-specific.
>
>
>> - Bio::Ontology::SimpleGOEngine::GraphAdaptor throws an exception
>>> because Graph::Directed isn't installed.
>>
>> Odd, that should be caught out before tests are run.  Needs to be
>> fixed, but one would think this would fail as well under 5.8.
>
>
> Yep, and in a minute here I'll test it under 5.8.
>
>
>
>
>>> - Spidey fails one test.
>>
>> Passes for me.  Is it dependency-related?
>
>
> I don't think so, but I guess we'll see once I finish installing the
> dependencies. Here's what I got:
>
> t/Spidey........................ok 1/26 Can't call method "sub_SeqFeature"
> on an undefined value at t/Spidey.t line 24, <GEN1> line 170.
> # Looks like you planned 26 tests but only ran 3.
> # Looks like your test died just after 3.
> t/Spidey........................dubious
>
>        Test returned status 255 (wstat 65280, 0xff00)
> DIED. FAILED tests 4-26
>        Failed 23/26 tests, 11.54% okay
>
>
> Dave
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From David.Messina at sbc.su.se  Wed Dec 19 23:42:14 2007
From: David.Messina at sbc.su.se (Dave Messina)
Date: Wed, 19 Dec 2007 17:42:14 -0600
Subject: [Bioperl-l] perl 5.10 released
In-Reply-To: <04AB8971-466D-4EEF-9A75-310ACDD224A6@uiuc.edu>
References: <AF986160-D1AB-4AEF-9588-90861F5B7B98@uiuc.edu>
	<628aabb70712190844o17a40c2eva3ef863dc42afb6c@mail.gmail.com>
	<C7FF12CB-278A-4B47-8D04-F2F038C35AB2@uiuc.edu>
	<628aabb70712190949j30756b8ap97666f4962c2b83d@mail.gmail.com>
	<04AB8971-466D-4EEF-9A75-310ACDD224A6@uiuc.edu>
Message-ID: <FC679851-77A7-4603-B722-4A6089333EE9@sbc.su.se>

Hi Chris and everyone,

With most of the optional dependencies installed, I'm seeing  
essentially the same test failures, including the CODE ref thingy.  
I've noted this on the new Wiki page you created.

According to Data::Dumper's documentation,
Data::Dumper cheats with CODE references. If a code reference is  
encountered in the structure being processed (and if you haven't set  
the Deparse flag), an anonymous subroutine that contains the string
'"DUMMY"' will be inserted in its place, and a warning will be printed  
if Purity is set. You can eval the result, but bear in mind that the  
anonymous sub that gets created is just a placeholder. Someday, perl  
will have a switch to cache-on-demand the string representation of a  
compiled piece of code, I hope. If you have prior knowledge of all the  
code refs that your data structures are likely to have, you can use  
the Seen method to pre-seed the internal reference table and make the  
dumped output point to them, instead. See EXAMPLES above.


So it's not BioPerl per se, but we can probably work around it.
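
The placeholder behaviour quoted above is easy to see in isolation. The
following is only a minimal sketch using core modules; the $config
structure is made up for illustration and is not what the build script
actually serializes.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical structure holding a code reference.
my $config = { name => 'demo', callback => sub { return 42 } };

# Default behaviour: the coderef is replaced by a placeholder sub.
my $plain = do {
    local $Data::Dumper::Indent = 1;
    Dumper($config);
};

# With Deparse set, Data::Dumper asks B::Deparse for the source instead.
my $deparsed = do {
    local $Data::Dumper::Indent  = 1;
    local $Data::Dumper::Deparse = 1;
    Dumper($config);
};

print "--- default ---\n$plain";
print "--- deparsed ---\n$deparsed";
```

Setting Deparse is one possible workaround; whether it is appropriate
for a build system's serialized config is a separate question.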


>>> - Bio::Ontology::SimpleGOEngine::GraphAdaptor throws an exception
>>>> because Graph::Directed isn't installed.
>>>
>>> Odd, that should be caught out before tests are run.  Needs to be
>>> fixed, but one would think this would fail as well under 5.8.
>>
>>
>> Yep, and in a minute here I'll test it under 5.8.


Strangely, the Ontology tests properly get skipped under 5.8.
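
For reference, a dependency guard of the kind being discussed can be
sketched in a few lines of plain Perl. This is an assumption about how
such a check could look, not the code the BioPerl test suite uses;
have_module() is a hypothetical helper name.

```perl
use strict;
use warnings;

# Return 1 if the named module can be loaded, 0 otherwise.
# The eval keeps a missing module from killing the test script.
sub have_module {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;
    return eval { require $file; 1 } ? 1 : 0;
}

# Hypothetical use at the top of a test file: decide on skipping
# before any test that needs the optional dependency is run.
for my $module (qw(Graph::Directed XML::Parser)) {
    printf "%-16s %s\n", $module,
        have_module($module) ? 'available' : 'missing, tests would be skipped';
}
```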

Dave


From ki.baik at roche.com  Thu Dec 20 00:58:42 2007
From: ki.baik at roche.com (Baik, Ki)
Date: Wed, 19 Dec 2007 16:58:42 -0800
Subject: [Bioperl-l] Parsing CAP3 output to Fasta
Message-ID: <6D5431B47E46BD45AAA453432AD3B803027553B7@rpbmsem01.nala.roche.com>

Hello,

 
I'm interested in parsing the output of the CAP contig assembly program
into a format that is more manageable. The CAP output is shown below:

 
                .    :    .    :    .    :    .    :    .    :    .    :

Seq1+       CTGGATGGGTTAATTTACTCCCATAAGATTTTTGAAATCCTTAATTTACTGATATATCAC

            ____________________________________________________________

consensus   CTGGATGGGTTAATTTACTCCCATAAGAGAGCAGAAATCCTGGATCTCTGGATATATCAC

 
                .    :    .    :    .    :    .    :    .    :    .    :

Seq1+       ACTCTTAATTTACTCCCTGATTGG--CAGTGTTACACACCGGGACCAGGACCTAGATTCC

Seq2+       ACTCAGGGATTCTTCCCTGATTGGTTCAGTGTTACACTTTTGCGCCAGGACCTAGATTCC

            ____________________________________________________________

consensus   ACTCAGGGATTCTTCCCTGATTGGTTCAGTGTTACACACCGGGACCAGGACCTAGATTCC

 
                .    :    .    :    .    :    .    :    .    :    .    :

Seq1+       CACTGACATTTGGATGGTTAATTTACTCTTTTCCAGTGTCAGCAGAAGAGCGGGGGAGAC

Seq2+       CACTGACATTTGGATGGTTGTTTAAACTGGTACCAGTGTCCGCTCGCGGGGCAGAGAGAC

            ____________________________________________________________

consensus   CACTGACATTTGGATGGTTGTTTAAACTGGTACCAGTGTCAGCAGAAGAGGCAGAGAGAC

 
                .    :    .    :    .    :    .    :    .    :    .    :

Seq1+       TGGGTAATACAAACACTTTTCGGCGGCTTCTACATCCAGCTTGTTAATTTACTCTTTAGG

Seq2+       TGGGTAATACAAATGAAGATGTTTCCGGCCTACATCCAGCTTGTAATCATGC

            ____________________________________________________________

consensus   TGGGTAATACAAATGAAGATGCTAGTCTTCTACATCCAGCTTGTAATCATGGAGCTGAGG

 
I would like to maintain the alignment with their base positions for
each sequence. A fasta format retaining the alignment position is ideal
such as below:

 
>Seq1+

CTGGATGGGTTAATTTACTCCCATAAGATTTTTGAAATCCTTAATTTACTGATATATCAC

ACTCTTAATTTACTCCCTGATTGG--CAGTGTTACACACCGGGACCAGGACCTAGATTCC

CACTGACATTTGGATGGTTAATTTACTCTTTTCCAGTGTCAGCAGAAGAGCGGGGGAGAC

TGGGTAATACAAACACTTTTCGGCGGCTTCTACATCCAGCTTGTTAATTTACTCTTTAGG

>Seq2+

------------------------------------------------------------

ACTCAGGGATTCTTCCCTGATTGGTTCAGTGTTACACTTTTGCGCCAGGACCTAGATTCC

CACTGACATTTGGATGGTTGTTTAAACTGGTACCAGTGTCCGCTCGCGGGGCAGAGAGAC

TGGGTAATACAAATGAAGATGTTTCCGGCCTACATCCAGCTTGTAATCATGC--------

 
Does anyone have any experience doing this?

 
Regards,

 
KB


From cjfields at uiuc.edu  Thu Dec 20 01:41:51 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 19 Dec 2007 19:41:51 -0600
Subject: [Bioperl-l] perl 5.10 released
In-Reply-To: <FC679851-77A7-4603-B722-4A6089333EE9@sbc.su.se>
References: <AF986160-D1AB-4AEF-9588-90861F5B7B98@uiuc.edu>
	<628aabb70712190844o17a40c2eva3ef863dc42afb6c@mail.gmail.com>
	<C7FF12CB-278A-4B47-8D04-F2F038C35AB2@uiuc.edu>
	<628aabb70712190949j30756b8ap97666f4962c2b83d@mail.gmail.com>
	<04AB8971-466D-4EEF-9A75-310ACDD224A6@uiuc.edu>
	<FC679851-77A7-4603-B722-4A6089333EE9@sbc.su.se>
Message-ID: <980C0D1B-9E3F-4904-9CA1-8C672CED0B35@uiuc.edu>

On Dec 19, 2007, at 5:42 PM, Dave Messina wrote:

> Hi Chris and everyone,
>
> With most of the optional dependencies installed, I'm seeing  
> essentially the same test failures, including the CODE ref thingy.  
> I've noted this on the new Wiki page you created.
>
> According to Data::Dumper's documentation,
> Data::Dumper cheats with CODE references. If a code reference is  
> encountered in the structure being processed (and if you haven't set  
> the Deparse flag), an anonymous subroutine that contains the string  
> '"DUMMY"' will be inserted in its place, and a warning will be  
> printed if Purity is set. You can eval the result, but bear in mind  
> that the anonymous sub that gets created is just a placeholder.  
> Someday, perl will have a switch to cache-on-demand the string  
> representation of a compiled piece of code, I hope. If you have  
> prior knowledge of all the code refs that your data structures are  
> likely to have, you can use the Seen method to pre-seed the internal  
> reference table and make the dumped output point to them, instead.  
> See EXAMPLES above.
>
>
> So it's not BioPerl per se, but we can probably work around it.

May be something in Module::Build or Build.PL that needs tweaking.

It looks like EntrezGene parsing is broken for now using perl 5.10;  
the 'pseudohash' warnings with perl 5.8 were indicating something was  
amiss but we could never place it.  Any fixes will have to wait until  
after svn migration.  Not sure what's going on with the other failures
just yet.

>>>> - Bio::Ontology::SimpleGOEngine::GraphAdaptor throws an exception
>>>>> because Graph::Directed isn't installed.
>>>>
>>>> Odd, that should be caught out before tests are run.  Needs to be
>>>> fixed, but one would think this would fail as well under 5.8.
>>>
>>>
>>> Yep, and in a minute here I'll test it under 5.8.
>
>
> Strangely, the Ontology tests properly get skipped under 5.8.
>
> Dave

May be worth looking into.  Have you added it to the wiki?

chris


From David.Messina at sbc.su.se  Thu Dec 20 04:52:16 2007
From: David.Messina at sbc.su.se (Dave Messina)
Date: Wed, 19 Dec 2007 22:52:16 -0600
Subject: [Bioperl-l] perl 5.10 released
In-Reply-To: <980C0D1B-9E3F-4904-9CA1-8C672CED0B35@uiuc.edu>
References: <AF986160-D1AB-4AEF-9588-90861F5B7B98@uiuc.edu>
	<628aabb70712190844o17a40c2eva3ef863dc42afb6c@mail.gmail.com>
	<C7FF12CB-278A-4B47-8D04-F2F038C35AB2@uiuc.edu>
	<628aabb70712190949j30756b8ap97666f4962c2b83d@mail.gmail.com>
	<04AB8971-466D-4EEF-9A75-310ACDD224A6@uiuc.edu>
	<FC679851-77A7-4603-B722-4A6089333EE9@sbc.su.se>
	<980C0D1B-9E3F-4904-9CA1-8C672CED0B35@uiuc.edu>
Message-ID: <628aabb70712192052p5d9afe3bvf4fa1da872f56355@mail.gmail.com>

>
> May be something in Module::Build or Build.PL that needs tweaking.


I took a quick look-see and I'm pretty sure it's Module::Build.
Specifically, Module::Build::Base::write_config(), where there are three
calls with coderefs as parameters to _write_data() to match the three
coderef errors we are seeing at the end of 'perl Build.PL'.

_write_data() in turn calls Module::Build::Dumper::_data_dump() and uses
some ugly Data::Dumper voodoo to serialize.

I don't understand the voodoo well enough to explain why this appears only
with Perl 5.10, though; it sure looks like it should have with 5.8, too.


> Strangely, the Ontology tests properly get skipped under 5.8.
>
> May be worth looking into.  Have you added it to the wiki?


Uhhh, yeah...of course! (just now)

Should be a simple fix after the post-svn thaw.

Dave


From David.Messina at sbc.su.se  Thu Dec 20 05:39:41 2007
From: David.Messina at sbc.su.se (Dave Messina)
Date: Wed, 19 Dec 2007 23:39:41 -0600
Subject: [Bioperl-l] Parsing CAP3 output to Fasta
In-Reply-To: <6D5431B47E46BD45AAA453432AD3B803027553B7@rpbmsem01.nala.roche.com>
References: <6D5431B47E46BD45AAA453432AD3B803027553B7@rpbmsem01.nala.roche.com>
Message-ID: <628aabb70712192139q5e061428v56ed2ce8cf1f4851@mail.gmail.com>

Hi Ki,

Hopefully someone who (unlike me) uses these modules regularly will chime
in, but in the meantime, here are some ideas:

The Bio::Assembly::IO module can read and write ace files, which CAP3 can
produce as output. I don't think there is an explicit means to dump to a
multi-fasta file like you want.

But you could probably write a Bio::Assembly::IO::Fasta class which could
write the multi-Fasta format you want. Then you could use Bio::Assembly::IO
objects to read in ace files from CAP3 and write out to multi-fasta.

Look at

Bio::Assembly::IO::*
Bio::Assembly::ScaffoldI
Bio::Assembly::Contig
Bio::LocatableSeq
Bio::AlignIO

Assemblies are made of scaffolds, scaffolds are made of contigs, and contigs
are made of sequences which can be manipulated like any old seq in BioPerl.
Bio::AlignIO can read and write multiple sequence alignments and
multi-fastas, so that should help you to get from AssemblyIO to your desired
output format.
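
If the object route is more than is needed, the text blocks from the
original post can also be stitched together with plain Perl. What
follows is a rough sketch under several assumptions: the input holds
the alignment blocks of a single contig, every block ends with a
"consensus" row that spans the block's full width, and read rows look
like "name  BASES". The sample data and both sub names are made up.

```perl
use strict;
use warnings;

# Parse CAP3-style alignment blocks from a filehandle. Returns
# (\@order, \%aligned): read names in order of first appearance and
# a map of read name => gap-padded sequence.
sub cap3_blocks_to_aligned {
    my ($fh) = @_;
    my (@order, %aligned, @block);
    my $total = 0;    # alignment columns collected so far

    while (my $line = <$fh>) {
        chomp $line;
        # Rulers, underscore rows and blank lines do not match.
        next unless $line =~ /^(\S+)\s+([A-Za-z*-]+)\s*$/;
        my ($name, $start, $bases) = ($1, $-[2], $2);

        if ($name ne 'consensus') {
            push @block, [$name, $start, $bases];
            next;
        }

        # The consensus row closes a block and fixes its left edge and width.
        my $width = length $bases;
        my %row;
        for my $read (@block) {
            my ($rname, $rstart, $rbases) = @$read;
            my $lead = $rstart - $start;
            $lead = 0 if $lead < 0;
            my $padded = ('-' x $lead) . $rbases;
            $padded .= '-' x ($width - length $padded)
                if length($padded) < $width;
            if (!exists $aligned{$rname}) {
                push @order, $rname;
                $aligned{$rname} = '-' x $total;   # gaps for earlier blocks
            }
            $row{$rname} = $padded;
        }
        # Reads absent from this block get gap columns.
        for my $rname (@order) {
            $aligned{$rname} .= exists $row{$rname} ? $row{$rname} : '-' x $width;
        }
        $total += $width;
        @block = ();
    }
    return (\@order, \%aligned);
}

# Write the padded sequences as FASTA, 60 columns per line.
sub print_fasta {
    my ($out, $order, $aligned) = @_;
    for my $name (@$order) {
        print {$out} ">$name\n";
        (my $seq = $aligned->{$name}) =~ s/(.{1,60})/$1\n/g;
        print {$out} $seq;
    }
}

# Made-up two-block sample in the same layout as the CAP3 output.
my $sample = <<'END';
                .    :
Seq1+       ACGTACGTAC
            __________
consensus   ACGTACGTAC

                .    :
Seq1+       GG--TTAACC
Seq2+       GGCCTTAA
            __________
consensus   GGCCTTAACC
END

open my $fh, '<', \$sample or die "cannot open sample: $!";
my ($order, $aligned) = cap3_blocks_to_aligned($fh);
print_fasta(\*STDOUT, $order, $aligned);
```

Reads that start or stop inside a block are padded relative to the
consensus row, so every output sequence has the same length.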


Hope this helps,
Dave


From mike.thon at gmail.com  Thu Dec 20 05:59:06 2007
From: mike.thon at gmail.com (Michael Thon)
Date: Thu, 20 Dec 2007 06:59:06 +0100
Subject: [Bioperl-l] dealing with large files
In-Reply-To: <JT9BJ9$83113BA715E1F7CF3B9D29BFFCC4B0CF@libero.it>
References: <JT9BJ9$83113BA715E1F7CF3B9D29BFFCC4B0CF@libero.it>
Message-ID: <F23D8C0D-AE41-40B4-A30B-83AC59A7BDD8@gmail.com>


On Dec 18, 2007, at 7:04 PM, Stefano Ghignone wrote:

> my $in  = Bio::SeqIO->new(-file   => "/bin/gunzip -c $infile |",
>                           -format => 'EMBL');

This is just for the sake of curiosity, since you already found a  
solution to your problem, but I wonder how perl will handle a file  
opened this way.  Will it try to suck the whole thing into ram in one  
go?

Mike


From cjfields at uiuc.edu  Thu Dec 20 05:54:36 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 19 Dec 2007 23:54:36 -0600
Subject: [Bioperl-l] Parsing CAP3 output to Fasta
In-Reply-To: <628aabb70712192139q5e061428v56ed2ce8cf1f4851@mail.gmail.com>
References: <6D5431B47E46BD45AAA453432AD3B803027553B7@rpbmsem01.nala.roche.com>
	<628aabb70712192139q5e061428v56ed2ce8cf1f4851@mail.gmail.com>
Message-ID: <EB4F110F-9F12-4478-89C2-5DDF4FEF07C6@uiuc.edu>


On Dec 19, 2007, at 11:39 PM, Dave Messina wrote:

> Hi Ki,
>
> Hopefully someone who (unlike me) uses these modules regularly will  
> chime
> in, but in the meantime, here are some ideas:
>
> The Bio::Assembly::IO module can read and write ace files, which CAP3
> can produce as output. I don't think there is an explicit means to dump
> to a multi-fasta file like you want.
>
> But you could probably write a Bio::Assembly::IO::Fasta class which
> could write the multi-Fasta format you want. Then you could use
> Bio::Assembly::IO objects to read in ace files from CAP3 and write out
> to multi-fasta.
>
> Look at
>
> Bio::Assembly::IO::*
> Bio::Assembly::ScaffoldI
> Bio::Assembly::Contig
> Bio::LocatableSeq
> Bio::AlignIO
>
> Assemblies are made of scaffolds, scaffolds are made of contigs, and  
> contigs
> are made of sequences which can be manipulated like any old seq in  
> BioPerl.
> Bio::AlignIO can read and write multiple sequence alignments and
> multi-fastas, so that should help you to get from AssemblyIO to your  
> desired
> output format.
>
>
>
> Hope this helps,
> Dave

What would help is to make Bio::Assembly::Contig implement Bio::AlignI  
correctly, or make it a subclass of Bio::SimpleAlign.  That way one  
could read Scaffolds in via Bio::Assembly::IO and write out Contigs  
through Bio::AlignIO directly.  In theory that should work but IIRC it  
doesn't.

chris


From jason at bioperl.org  Thu Dec 20 07:13:55 2007
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 19 Dec 2007 23:13:55 -0800
Subject: [Bioperl-l] dealing with large files
In-Reply-To: <F23D8C0D-AE41-40B4-A30B-83AC59A7BDD8@gmail.com>
References: <JT9BJ9$83113BA715E1F7CF3B9D29BFFCC4B0CF@libero.it>
	<F23D8C0D-AE41-40B4-A30B-83AC59A7BDD8@gmail.com>
Message-ID: <02EC6D6D-F807-492F-B125-9FE0393B1FD9@bioperl.org>

It gets buffered via the OS -- Bio::Root::IO calls next_line  
iteratively, but eventually the whole sequence object will get put  
into RAM as it is built up.
zcat or bzcat can also be used for gzipped and bzipped files
respectively; I like to use this where I want to keep the disk space
footprint down.

Because we treat data input usually as from a stream ignoring whether  
it is in a file or not, we have to have a more flexible structure to  
really handle this, although I'd argue the data really belongs in a  
database when it is too big for memory.
More compact Feature/Location objects would probably also help here.   
I would not be surprised if the memory requirement has more to do  
with the number of features than length of the sequence - human chrom  
1 can fit into memory just fine on most machines with 2GB of RAM.

But it would require someone taking an interest in some
re-architecting here.

-jason

On Dec 19, 2007, at 9:59 PM, Michael Thon wrote:

>
> On Dec 18, 2007, at 7:04 PM, Stefano Ghignone wrote:
>
>> my $in  = Bio::SeqIO->new(-file   => "/bin/gunzip -c $infile |",
>>                           -format => 'EMBL');
>
> This is just for the sake of curiosity, since you already found a  
> solution to your problem, but I wonder how perl will handle a file  
> opened this way.  Will it try to suck the whole thing into ram in  
> one go?
>
> Mike


From ste.ghi at libero.it  Thu Dec 20 13:57:54 2007
From: ste.ghi at libero.it (Stefano Ghignone)
Date: Thu, 20 Dec 2007 14:57:54 +0100
Subject: [Bioperl-l] dealing with large files
Message-ID: <JTCPGI$4B9622B6978AB21534CC4DC74CC6BC89@libero.it>

I was wondering whether, when working with such a big file, it would be better to first index the database and then query it, formatting the sequences as one wants...

> It gets buffered via the OS -- Bio::Root::IO calls next_line  
> iteratively, but eventually the whole sequence object will get put  
> into RAM as it is built up.
> zcat or bzcat can also be used for gzipped and bzipped files
> respectively; I like to use this where I want to keep the disk space
> footprint down.
> 
> Because we treat data input usually as from a stream ignoring whether  
> it is in a file or not, we have to have a more flexible structure to  
> really handle this, although I'd argue the data really belongs in a  
> database when it is too big for memory.
> More compact Feature/Location objects would probably also help here.   
> I would not be surprised if the memory requirement has more to do  
> with the number of features than length of the sequence - human chrom  
> 1 can fit into memory just fine on most machines with 2GB of RAM.
> 
> But it would require someone taking an interest in some
> re-architecting here.
> 
> -jason
> 
> On Dec 19, 2007, at 9:59 PM, Michael Thon wrote:
> 
> >
> > On Dec 18, 2007, at 7:04 PM, Stefano Ghignone wrote:
> >
> >> my $in  = Bio::SeqIO->new(-file   => "/bin/gunzip -c $infile |",
> >>                           -format => 'EMBL');
> >
> > This is just for the sake of curiosity, since you already found a  
> > solution to your problem, but I wonder how perl will handle a file  
> > opened this way.  Will it try to suck the whole thing into ram in  
> > one go?
> >
> > Mike
> 
> 


From amackey at pcbi.upenn.edu  Thu Dec 20 15:32:19 2007
From: amackey at pcbi.upenn.edu (Aaron Mackey)
Date: Thu, 20 Dec 2007 10:32:19 -0500
Subject: [Bioperl-l] BioPerl and NHX tree
In-Reply-To: <476A7736.109@toulouse.inra.fr>
References: <476A7736.109@toulouse.inra.fr>
Message-ID: <24c96eca0712200732q20523c1co1075c15d056ff634@mail.gmail.com>

The NHX writer will only add the [&&NHX] block when there are tags to
be written.  Your code reads in a Newick tree without tags, and then
writes it back out without adding any new tags.  So yes, you need to
1) read the Newick tree, 2) traverse the tree, calling
$node->nhx_tag({T => $taxon_id}) for each node with each corresponding
$taxon_id, and then 3) write out the NHX tree.
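
The three steps above can be sketched roughly as follows. This is an
illustration only: the %taxid_for table and the Newick string are made
up, and the fallback to add_tag_value() is an assumption for the case
where nodes read from a Newick file are plain Bio::Tree::Node objects
rather than Bio::Tree::NodeNHX.

```perl
use strict;
use warnings;
use Bio::TreeIO;

# Hypothetical lookup table: leaf label => NCBI taxon id.
my %taxid_for = (human => 9606, mouse => 10090, zebrafish => 7955);

# Made-up input tree, read through an in-memory filehandle.
my $newick = '((human:0.1,mouse:0.2):0.3,zebrafish:0.5);';
open my $in_fh, '<', \$newick or die "cannot open in-memory input: $!";

my $treeio  = Bio::TreeIO->new(-format => 'newick', -fh => $in_fh);
my $treeout = Bio::TreeIO->new(-format => 'nhx',    -fh => \*STDOUT);

my @trees;
# 1) read the Newick tree
while (my $tree = $treeio->next_tree) {
    # 2) traverse the tree and tag every node we have a taxon id for
    for my $node ($tree->get_nodes) {
        my $id = $node->id;
        next unless defined $id && exists $taxid_for{$id};
        if ($node->can('nhx_tag')) {
            $node->nhx_tag({ T => $taxid_for{$id} });
        }
        else {
            $node->add_tag_value('T', $taxid_for{$id});
        }
    }
    # 3) write the tree out as NHX
    $treeout->write_tree($tree);
    push @trees, $tree;
}
```

Internal nodes usually have no label in a Newick file, so tagging them
would need some other rule (for example the common taxon of their
descendants), which is left out here.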

-Aaron

On Dec 20, 2007 9:07 AM, Laurence Amilhat
<Laurence.Amilhat at toulouse.inra.fr> wrote:
> Dear Mr MacKey,
>
>
> I am pretty new to tree parsing and writing with BioPerl.
> I am trying to convert a Newick tree file to an NHX tree file, adding
> the Taxid for each node in the NHX tree file.
>
> I saw the module Bio::Tree::NodeNHX, but found very few examples...
>
> I don't know where I need to start. I tried the easy way with
> Bio::TreeIO,
> but the resulting tree doesn't have the [&&NHX] in the internal nodes,
> and I don't know how to add the tag [&&NHX:T=xxxx] on a node.
> Do I need to use the nhx_tag method to do this?
>
> Maybe you have an example that uses NHX tags in tree nodes; that would be
> very helpful for me to understand how it works...
>
>
> Have a nice holiday,
>
>
> Best regards,
>
>
> Laurence Amilhat.
>
>
>
>
> This is the simple code that I use to convert a tree from newick to nhx:
>
> use strict;
> use warnings;
> use Bio::TreeIO;
> use Getopt::Long;
>
> my ($tree_file, $outfile);
> GetOptions('f|file:s' => \$tree_file, 'o|out:s' => \$outfile);
>
> # Read Newick, write NHX; no tags are added in between.
> my $treeio  = Bio::TreeIO->new(-format => 'newick', -file => $tree_file);
> my $treeout = Bio::TreeIO->new(-format => 'nhx',    -file => ">$outfile");
>
> while (my $tree = $treeio->next_tree) {
>     $treeout->write_tree($tree);
> }
>
> --
> ====================================================================
> = Laurence Amilhat    INRA Toulouse 31326 Castanet-Tolosan         =
> = Tel: 33 5 61 28 53 34   Email: laurence.amilhat at toulouse.inra.fr =
> ====================================================================
>
>
>
>


From cjfields at uiuc.edu  Thu Dec 20 16:14:55 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 20 Dec 2007 10:14:55 -0600
Subject: [Bioperl-l] dealing with large files
In-Reply-To: <JTCPGI$4B9622B6978AB21534CC4DC74CC6BC89@libero.it>
References: <JTCPGI$4B9622B6978AB21534CC4DC74CC6BC89@libero.it>
Message-ID: <FDB7609C-93B8-47C3-A97D-32D73B16651E@uiuc.edu>

As Jason mentioned, it may be the number of features in the record if  
the record itself is huge (i.e. human chromosome-sized, full  
metagenome, etc).  If (my) memory serves correctly the mem. footprint  
for a perl object is ~10x the actual data, give or take (it depends on  
the complexity of the object itself).  In cases like this indexing may  
not fix the problem, unless you have an object which retains the file  
position of the data instead of the data itself; I don't think we have  
this object type in BioPerl.
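
To make the "file position instead of data" idea concrete, here is a
minimal plain-Perl sketch. It is not an existing BioPerl class; it
assumes EMBL/UniProt-style records that begin with an "ID" line and end
with a "//" line, and the sample records and sub names are made up.

```perl
use strict;
use warnings;

# Scan a flat file once and remember the byte offset where each
# record starts, keyed by the name on its ID line.
sub index_records {
    my ($fh) = @_;
    my %offset_of;
    my $pos = tell($fh);
    while (my $line = <$fh>) {
        $offset_of{$1} = $pos if $line =~ /^ID\s+([^\s;]+)/;
        $pos = tell($fh);
    }
    return \%offset_of;
}

# Seek to a remembered offset and read just that one record.
sub fetch_record {
    my ($fh, $offset) = @_;
    seek($fh, $offset, 0) or die "seek failed: $!";
    local $/ = "//\n";
    return scalar <$fh>;
}

# Made-up sample data, read through an in-memory filehandle.
my $flat = <<'END';
ID   SEQ_A   standard; DNA; 8 BP.
SQ   Sequence 8 BP;
     acgtacgt
//
ID   SEQ_B   standard; DNA; 4 BP.
SQ   Sequence 4 BP;
     ttga
//
END

open my $fh, '<', \$flat or die "cannot open sample: $!";
my $index = index_records($fh);
print fetch_record($fh, $index->{SEQ_B});
```

Only the small offset table stays in memory; each record's text is
read on demand and can be discarded afterwards.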

The only way I can think of to fix this would be (as Jason also  
suggested) lightweight objects, or something like the lazy sequence  
object ala the SwissKnife suite (which only bring what you want into  
memory).

Related to that, I have been testing something like that, which uses  
iterators to pass in chunks of data from a stream to handlers to build  
a sequence object.  Wouldn't be too hard to reconfigure that to return  
file positions as well.  Maybe for the 1.7 release...

chris

On Dec 20, 2007, at 7:57 AM, Stefano Ghignone wrote:

> I was wondering whether, when working with such a big file, it would be
> better to first index the database and then query it, formatting the
> sequences as one wants...
>
>> It gets buffered via the OS -- Bio::Root::IO calls next_line
>> iteratively, but eventually the whole sequence object will get put
>> into RAM as it is built up.
>> zcat or bzcat can also be used for gzipped and bzipped files
>> respectively; I like to use this where I want to keep the disk space
>> footprint down.
>>
>> Because we treat data input usually as from a stream ignoring whether
>> it is in a file or not, we have to have a more flexible structure to
>> really handle this, although I'd argue the data really belongs in a
>> database when it is too big for memory.
>> More compact Feature/Location objects would probably also help here.
>> I would not be surprised if the memory requirement has more to do
>> with the number of features than length of the sequence - human chrom
>> 1 can fit into memory just fine on most machines with 2GB of RAM.
>>
>> But it would require someone taking an interest in some
>> re-architecting here.
>>
>> -jason
>>
>> On Dec 19, 2007, at 9:59 PM, Michael Thon wrote:
>>
>>>
>>> On Dec 18, 2007, at 7:04 PM, Stefano Ghignone wrote:
>>>
>>>> my $in  = Bio::SeqIO->new(-file   => "/bin/gunzip -c $infile |",
>>>>                           -format => 'EMBL');
>>>
>>> This is just for the sake of curiosity, since you already found a
>>> solution to your problem, but I wonder how perl will handle a file
>>> opened this way.  Will it try to suck the whole thing into ram in
>>> one go?
>>>
>>> Mike
>>
>>
>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From David.Messina at sbc.su.se  Thu Dec 20 16:26:17 2007
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 20 Dec 2007 10:26:17 -0600
Subject: [Bioperl-l] dealing with large files
In-Reply-To: <JTCPGI$4B9622B6978AB21534CC4DC74CC6BC89@libero.it>
References: <JTCPGI$4B9622B6978AB21534CC4DC74CC6BC89@libero.it>
Message-ID: <628aabb70712200826p36d3d451wdcd901f555bc210a@mail.gmail.com>

On 12/20/07, Stefano Ghignone <ste.ghi at libero.it> wrote:
>
> I was wondering whether, when working with such a big file, it would be
> better to first index the database and then query it, formatting the
> sequences as one wants...
>

Agreed, but only if you want to randomly access sequences within the file. I
believe the original poster intends to do something with every sequence in
the big file, in which case streaming the file is likely to be much faster.
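
A streaming pass of that kind might look like the sketch below. It is
plain Perl rather than BioPerl and rests on assumptions about the
input: UniProt/EMBL-style line codes (AC, DE, OS, SQ), sequence lines
that are indented, and "//" as record terminator. The sample record
and the sub name are made up.

```perl
use strict;
use warnings;

# Convert flat-file records to FASTA one line at a time. Only the
# header fields of the current record are held in memory.
sub flatfile_to_fasta {
    my ($in, $out) = @_;
    my ($acc, @desc, @species);
    my $in_seq = 0;
    while (my $line = <$in>) {
        chomp $line;
        if ($line =~ /^AC\s+([^\s;]+)/) {
            $acc = $1 unless defined $acc;    # keep the first accession
        }
        elsif ($line =~ /^DE\s+(.*)/) { push @desc,    $1 }
        elsif ($line =~ /^OS\s+(.*)/) { push @species, $1 }
        elsif ($line =~ /^SQ\s/) {
            my $id = defined $acc ? $acc : 'unknown';
            print {$out} ">$id ", join(' ', @desc),
                ' [', join(' ', @species), "]\n";
            $in_seq = 1;
        }
        elsif ($line =~ m{^//}) {
            ($acc, $in_seq) = (undef, 0);     # reset for the next record
            @desc = @species = ();
        }
        elsif ($in_seq && $line =~ /^\s+\S/) {
            (my $residues = $line) =~ s/[\s\d]//g;   # drop spacing and counts
            print {$out} "$residues\n";
        }
    }
}

# Made-up sample record, read through an in-memory filehandle.
my $record = <<'END';
ID   TEST_ECOLI   Reviewed; 12 AA.
AC   P00001; Q00001;
DE   Hypothetical test
DE   protein.
OS   Escherichia coli.
SQ   SEQUENCE   12 AA;
     MKTAYIAKQR QI
//
END

my $fasta = '';
open my $in,  '<', \$record or die "cannot open sample: $!";
open my $out, '>', \$fasta  or die "cannot open output buffer: $!";
flatfile_to_fasta($in, $out);
close $out;
print $fasta;
```

For a compressed file the input handle could come from a pipe, for
example open(my $in, '-|', 'gunzip', '-c', $file), so the data is
decompressed and converted as it streams past.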


Dave


From akarger at CGR.Harvard.edu  Thu Dec 20 16:48:58 2007
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Thu, 20 Dec 2007 11:48:58 -0500
Subject: [Bioperl-l] dealing with large files
In-Reply-To: <B92AAA3F-C93A-41EC-B68D-3E6F4053BBD4@uiuc.edu>
References: <JTB2JF$8DA035C3FA7C0AA4E73865996D18C568@libero.it>
	<B92AAA3F-C93A-41EC-B68D-3E6F4053BBD4@uiuc.edu>
Message-ID: <B9182BFF5B004245BABC12956EA6322E07CF5C0F@huls5.nucleus.harvard.edu>

 
> -----Original Message-----
> From: Chris Fields [mailto:cjfields at uiuc.edu] 
> 
> 
> On Dec 19, 2007, at 10:45 AM, Stefano Ghignone wrote:
> 
> > At the end, I succeeded in the format conversion using this command:
> >
> > gunzip -c uniprot_trembl_bacteria.dat.gz | perl -ne 'print ">$1 " if
> > (/^AC\s+(\S+);/); print " $1" if (/^DE\s+(.*)/);print " [$1]\n" if
> > (/^OS\s+(.*)/); if (($a)=/^\s+(.*)/){$a=~s/ //g; print "$a\n"};'
> >
> > (Thanks to Riccardo Percudani). It's not bioperl...but it works!
> 
> 
> As this shows, BioPerl isn't always the best answer (I know,
> blasphemy...).  As Jason suggested, it's quite likely there are large
> sequence records causing your problems when using BioPerl.  The
> one-liner works b/c it doesn't retain data (sequence, annotation, etc.)
> in memory as a Bio::Seq object; it's a direct conversion.
> 
> It would be nice to code up a lazy sequence object and related
> parsers; maybe for the next dev release.

Yes!

Also, BLAST parsing. Blasting the proteome against the genome makes for
rather large result files. Right now, if you want to delete queries that
hit, say, more than 1000 times, you still need to wait for Bioperl to
create objects and sub-objects for every single hit. Sadly, this example
isn't hypothetical. I'm going to solve it with something like:

perl -wne 'BEGIN {$/="TBLASTN"} print if length($_) < $some_big_value' \
    big_blast > filtered_blast

(Not that I'm volunteering to help with the parser writing, so I should
stop complaining.)
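
Spelled out as a script, the same idea could look like this. It is a
sketch under the same assumptions as the one-liner: every per-query
report starts with the word TBLASTN, that word appears nowhere else in
the file, and report length is a good enough proxy for hit count. The
sub name and the sample text are made up.

```perl
use strict;
use warnings;

# Copy a concatenated BLAST report from $in to $out, dropping every
# per-query report whose text is $max_len characters or longer.
sub filter_blast_reports {
    my ($in, $out, $max_len, $marker) = @_;
    $marker = 'TBLASTN' unless defined $marker;
    local $/ = $marker;              # read one report per "line"
    my ($kept, $dropped, $seen_first) = (0, 0, 0);
    while (my $chunk = <$in>) {
        chomp $chunk;                # the trailing marker heads the NEXT report
        if (!$seen_first++) {        # text before the first marker, usually empty
            print {$out} $chunk;
            next;
        }
        if (length($chunk) < $max_len) {
            print {$out} $marker, $chunk;
            $kept++;
        }
        else {
            $dropped++;
        }
    }
    return ($kept, $dropped);
}

# Made-up input: three "reports", the second one much longer.
my $reports = "TBLASTN q1 short\n"
            . "TBLASTN q2 " . ("hit\n" x 50)
            . "TBLASTN q3 short\n";

my $filtered = '';
open my $in,  '<', \$reports  or die "cannot open sample: $!";
open my $out, '>', \$filtered or die "cannot open output buffer: $!";
my ($kept, $dropped) = filter_blast_reports($in, $out, 40);
close $out;
print $filtered;
```

Putting the marker back in front of each kept chunk keeps every
surviving report intact, including the last one in the file.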

-Amir


From bix at sendu.me.uk  Thu Dec 20 17:06:28 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 20 Dec 2007 17:06:28 +0000
Subject: [Bioperl-l] dealing with large files
In-Reply-To: <FDB7609C-93B8-47C3-A97D-32D73B16651E@uiuc.edu>
References: <JTCPGI$4B9622B6978AB21534CC4DC74CC6BC89@libero.it>
	<FDB7609C-93B8-47C3-A97D-32D73B16651E@uiuc.edu>
Message-ID: <476AA114.2060201@sendu.me.uk>

Chris Fields wrote:
> The only way I can think of to fix this would be (as Jason also 
> suggested) lightweight objects, or something like the lazy sequence 
> object ala the SwissKnife suite (which only bring what you want into 
> memory).
> 
> Related to that, I have been testing something like that, which uses 
> iterators to pass in chunks of data from a stream to handlers to build a 
> sequence object.  Wouldn't be too hard to reconfigure that to return 
> file positions as well.  Maybe for the 1.7 release...

Bio::PullParserI is your friend.


From bix at sendu.me.uk  Thu Dec 20 18:48:29 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 20 Dec 2007 18:48:29 +0000
Subject: [Bioperl-l] dealing with large files
In-Reply-To: <B9182BFF5B004245BABC12956EA6322E07CF5C0F@huls5.nucleus.harvard.edu>
References: <JTB2JF$8DA035C3FA7C0AA4E73865996D18C568@libero.it>	<B92AAA3F-C93A-41EC-B68D-3E6F4053BBD4@uiuc.edu>
	<B9182BFF5B004245BABC12956EA6322E07CF5C0F@huls5.nucleus.harvard.edu>
Message-ID: <476AB8FD.8090108@sendu.me.uk>

Amir Karger wrote:
>> It would be nice to code up a lazy sequence object and related  
>> parsers; maybe for the next dev release.
> 
> Yes!
> 
> Also, BLAST parsing. Blasting the proteome against the genome makes for
> rather large result files.

This has already been done. Use Bio::SearchIO::blast_pull. In a
situation like yours I dropped run time from 20223s to 951s (~20x
faster) and memory usage from over 8GB to less than 5GB (~40% less).


From akarger at CGR.Harvard.edu  Thu Dec 20 18:52:51 2007
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Thu, 20 Dec 2007 13:52:51 -0500
Subject: [Bioperl-l] dealing with large files
In-Reply-To: <476AB8FD.8090108@sendu.me.uk>
References: <JTB2JF$8DA035C3FA7C0AA4E73865996D18C568@libero.it>	<B92AAA3F-C93A-41EC-B68D-3E6F4053BBD4@uiuc.edu>
	<B9182BFF5B004245BABC12956EA6322E07CF5C0F@huls5.nucleus.harvard.edu>
	<476AB8FD.8090108@sendu.me.uk>
Message-ID: <B9182BFF5B004245BABC12956EA6322E07CF5C77@huls5.nucleus.harvard.edu>

> Amir Karger wrote:
> >> It would be nice to code up a lazy sequence object and related  
> >> parsers; maybe for the next dev release.
> > 
> > Also, BLAST parsing. Blasting the proteome against the genome makes for
> > rather large result files.
> 
> This has already been done. Use Bio::SearchIO::blast_pull. In a
> situation like yours I dropped run time from 20223s to 951s (~20x
> faster) and memory usage from over 8GB to less than 5GB (~40% less).

Not in 1.5.1. Is it in 1.5.2 or just in cvs? Is there a single file I
can put in my own perl lib for this, or does it require large bunches of
new code? (I'm guessing the latter.) We're about to upgrade to 1.5.2
here, but I don't see our whole center using CVS Bioperl.

-Amir


From cjfields at uiuc.edu  Thu Dec 20 20:27:45 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 20 Dec 2007 14:27:45 -0600
Subject: [Bioperl-l] dealing with large files
In-Reply-To: <476AA114.2060201@sendu.me.uk>
References: <JTCPGI$4B9622B6978AB21534CC4DC74CC6BC89@libero.it>
	<FDB7609C-93B8-47C3-A97D-32D73B16651E@uiuc.edu>
	<476AA114.2060201@sendu.me.uk>
Message-ID: <29E190AB-8A6C-4F1C-BDD1-6034CFFEEFFF@uiuc.edu>

On Dec 20, 2007, at 11:06 AM, Sendu Bala wrote:

> Chris Fields wrote:
>> The only way I can think of to fix this would be (as Jason also  
>> suggested) lightweight objects, or something like the lazy sequence  
>> object ala the SwissKnife suite (which only bring what you want  
>> into memory).
>> Related to that, I have been testing something like that, which  
>> uses iterators to pass in chunks of data from a stream to handlers  
>> to build a sequence object.  Wouldn't be too hard to reconfigure  
>> that to return file positions as well.  Maybe for the 1.7 release...
>
> Bio::PullParserI is your friend.

I'm looking into that, yes.  I'm thinking of something like a generic  
lazy sequence class with an embedded Handler/PullParser object which  
processes stuff on the fly.

Oh, when I have a bit more time...

chris


From cjfields at uiuc.edu  Thu Dec 20 20:39:48 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 20 Dec 2007 14:39:48 -0600
Subject: [Bioperl-l] dealing with large files
In-Reply-To: <B9182BFF5B004245BABC12956EA6322E07CF5C77@huls5.nucleus.harvard.edu>
References: <JTB2JF$8DA035C3FA7C0AA4E73865996D18C568@libero.it>	<B92AAA3F-C93A-41EC-B68D-3E6F4053BBD4@uiuc.edu>
	<B9182BFF5B004245BABC12956EA6322E07CF5C0F@huls5.nucleus.harvard.edu>
	<476AB8FD.8090108@sendu.me.uk>
	<B9182BFF5B004245BABC12956EA6322E07CF5C77@huls5.nucleus.harvard.edu>
Message-ID: <2EC6A1C2-FBC9-45F6-AD1B-040E29FAFA28@uiuc.edu>


On Dec 20, 2007, at 12:52 PM, Amir Karger wrote:

>> Amir Karger wrote:
>>>> It would be nice to code up a lazy sequence object and related
>>>> parsers; maybe for the next dev release.
>>>
>>> Also, BLAST parsing. Blasting the proteome against the genome makes for
>>> rather large result files.
>>
>> This has already been done. Use Bio::SearchIO::blast_pull. In a
>> situation like yours I dropped run time from 20223s to 951s (~20x
>> faster) and memory usage from over 8GB to less than 5GB (~40% less).
>
> Not in 1.5.1. Is it in 1.5.2 or just in cvs? Is there a single file I
> can put in my own perl lib for this, or does it require large bunches of
> new code? (I'm guessing the latter.) We're about to upgrade to 1.5.2
> here, but I don't see our whole center using CVS Bioperl.
>
> -Amir

It's in CVS.

Just to note: there have been a lot of changes between 1.5.1 and  
1.5.2, and probably as many from 1.5.2 to now.  We are cleaning up  
some code introduced prior to the 1.5 release and working on other  
fixes and code docs, with the final aim to be a new 1.6; I'm hoping  
that release will have routine point releases for bug fixes.  Of  
course that'll have to wait until after SVN migration!

There are a few discussions on the list about speeding up parsing using
lightweight/featherweight objects or even straight hashes (for
instance, Jason has a lightweight seqfeature implementation committed
on a branch which is quite fast, and Sendu has written the Bio::SearchIO
PullParser implementations).  My feeling is that this will be part of the
next dev release, along with GFF3 integration and code cleanup.

chris


From bix at sendu.me.uk  Thu Dec 20 23:29:30 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 20 Dec 2007 23:29:30 +0000
Subject: [Bioperl-l] dealing with large files
In-Reply-To: <B9182BFF5B004245BABC12956EA6322E07CF5C77@huls5.nucleus.harvard.edu>
References: <JTB2JF$8DA035C3FA7C0AA4E73865996D18C568@libero.it>	<B92AAA3F-C93A-41EC-B68D-3E6F4053BBD4@uiuc.edu>	<B9182BFF5B004245BABC12956EA6322E07CF5C0F@huls5.nucleus.harvard.edu>	<476AB8FD.8090108@sendu.me.uk>
	<B9182BFF5B004245BABC12956EA6322E07CF5C77@huls5.nucleus.harvard.edu>
Message-ID: <476AFADA.20604@sendu.me.uk>

Amir Karger wrote:
>> Amir Karger wrote:
>>>> It would be nice to code up a lazy sequence object and related  
>>>> parsers; maybe for the next dev release.
>>> Also, BLAST parsing. Blasting the proteome against the 
>>> genome makes for rather large result files.
>> This has already been done. Use Bio::SearchIO::blast_pull. In a 
>> situation like yours I dropped run time from 20223s to
>> 951s (~20x faster) and memory usage from over 8GB to less 
>> than 5GB (~40% less).
> 
> Not in 1.5.1. Is it in 1.5.2 or just in cvs? Is there a single file I
> can put in my own perl lib for this, or does it require large bunches of
> new code? (I'm guessing the latter.) We're about to upgrade to 1.5.2
> here, but I don't see our whole center using CVS Bioperl.

blast_pull is only in CVS (and needs a whole bunch of associated modules
to work), though 1.5.2 also contains substantial improvements to
SearchIO generally, which should give you a significant speed-up when
parsing BLAST reports with the normal Bio::SearchIO::blast.


From bug-bioperl at rt.cpan.org  Fri Dec 21 12:07:39 2007
From: bug-bioperl at rt.cpan.org (Brandi Cantarel via RT)
Date: Fri, 21 Dec 2007 07:07:39 -0500
Subject: [Bioperl-l] [rt.cpan.org #31796] SeqIO
In-Reply-To: <5F694A96-AC4B-4279-8060-9E28A92837ED@afmb.univ-mrs.fr>
References: <RT-Ticket-31796@rt.cpan.org>
	<5F694A96-AC4B-4279-8060-9E28A92837ED@afmb.univ-mrs.fr>
Message-ID: <rt-3.6.HEAD-25638-1198238855-470.31796-4-0@rt.cpan.org>


Fri Dec 21 07:07:30 2007: Request 31796 was acted upon.
Transaction: Ticket created by brandi.cantarel at afmb.univ-mrs.fr
       Queue: bioperl
     Subject: SeqIO
   Broken in: (no value)
    Severity: (no value)
       Owner: Nobody
  Requestors: brandi.cantarel at afmb.univ-mrs.fr
      Status: new
 Ticket <URL: http://rt.cpan.org/Ticket/Display.html?id=31796 >


I might have found a bug in SeqIO in bioperl.  Well, it is actually a
memory leak.  When I try to load a large file, I can step through the
first 10K or so sequences (using next_seq) but then it just hangs.

If this bug is fixed please let me know.

Brandi Cantarel


From bug-bioperl at rt.cpan.org  Fri Dec 21 13:57:20 2007
From: bug-bioperl at rt.cpan.org (Sendu Bala via RT)
Date: Fri, 21 Dec 2007 08:57:20 -0500
Subject: [Bioperl-l] [rt.cpan.org #31796] SeqIO
In-Reply-To: <5F694A96-AC4B-4279-8060-9E28A92837ED@afmb.univ-mrs.fr>
References: <RT-Ticket-31796@rt.cpan.org>
	<5F694A96-AC4B-4279-8060-9E28A92837ED@afmb.univ-mrs.fr>
Message-ID: <rt-3.6.HEAD-25615-1198245436-879.31796-5-0@rt.cpan.org>


       Queue: bioperl
 Ticket <URL: http://rt.cpan.org/Ticket/Display.html?id=31796 >

On Fri Dec 21 07:07:30 2007, brandi.cantarel at afmb.univ-mrs.fr wrote:
> I might have found a bug in SeqIO in bioperl.  Well, it is actually a
> memory leak.  When I try to load a large file, I can step through the
> first 10K or so sequences (using next_seq) but then it just hangs.
> 
> If this bug is fixed please let me know.

Please use http://bugzilla.bioperl.org/ to tell us about this bug. 
After creating a bug report you'll be able to attach the script in 
which you encounter the problem, which we need to diagnose this issue.


From susantoroy at gmail.com  Sat Dec 22 12:06:42 2007
From: susantoroy at gmail.com (Susanta Roy)
Date: Sat, 22 Dec 2007 17:36:42 +0530
Subject: [Bioperl-l] Enquiry about bioperl project
Message-ID: <236a58340712220406m3d3f9884h8f7b5e58bdfb356@mail.gmail.com>

Dear Sir,


Most humbly I have to state that I am Susanta Roy, 25 years old, and I have
done my master's in bioinformatics. I have more than nine months of work
experience as an Associate Technical Content Developer. I have also worked
at the journal "Bioinformatics India" (the first bioinformatics journal
of India, now "Bioinformatics Trends"). My work with my previous employer
was highly appreciated.

This year I have founded Bioexplore, a bioinformatics KPO (Knowledge
Process Outsourcing) due to lack of bioinformatics jobs in India.

Our services include

1. Bioinformatics data mining / programming
2. HR solution
3. Technical writing solution
4. E-learning
5. Abstracting & indexing
6. Business promotion solution

I want to inquire if you can give me a project.

-- Looking forward to your reply.

Kind Regards
Mr. Susanta Roy, MS Bioinformatics
Founder Director
Bioexplore
C-5, Hazipark Market
Dimapur, Nagaland - 797112
India
+ 91 - 9811517324 (Mobile)
susanta.roy at bioexplore.co.in
susantoroy at gmail.com