From dmessina at wustl.edu  Sun Jul  1 01:38:48 2007
From: dmessina at wustl.edu (David Messina)
Date: Sun, 1 Jul 2007 00:38:48 -0500
Subject: [Bioperl-l] svn auto-properties [was Re: First cut svn
	repository]
In-Reply-To: <46869226.70203@sheffield.ac.uk>
References: <3097065.1181941697249.JavaMail.myubc2@brahms.my.ubc.ca>	<bba689ec0706151440o56a7d6c6ncf72a37cd2b2cdc5@mail.gmail.com>	<185BDA34-1449-49CA-B146-ADF27D2928CD@gmx.net>	<8D3B697E-2072-46FE-A1C9-E546D9DEAA45@uiuc.edu>	<4673C7CB.1030709@mail.nih.gov>	<410EF5F9-A30E-4AB7-85F7-7E761E3890D5@uiuc.edu>	<18049.30026.61328.134490@almost.alerce.com>	<5764264E-5C40-4C9E-B1C9-A70628AC1DD0@uiuc.edu>	<BFBA575A-E653-40F6-9242-D72655B6AE9C@wustl.edu>	<E83D9D3C-96F2-4B5A-B503-09C3860586D0@gmx.net>	<D7111143-D173-42DE-AAEF-C2365AA453A0@wustl.edu>	<18051.44281.831316.749586@almost.alerce.com>	<F5B048F4-CBA5-493A-8A5C-2033709D8A63@wustl.edu>
	<18051.61992.627473.323346@almost.alerce.com>
	<4684AF3D.5090907@sheffield.ac.uk>
	<843758CD-9C5B-4DDA-9FF4-B90AA225BDB3@wustl.edu>
	<468628AC.9060200@sheffield.ac.uk>
	<461F64B9-87FD-458A-8945-8238E7076109@wustl.edu>
	<46869226.70203@sheffield.ac.uk>
Message-ID: <3164A6E3-77CF-4E61-9609-1408768862B1@wustl.edu>


> [Nath]
> I think the list of seq formats recognised by Bioperl in Bio::SeqIO and
> Bio::AlignIO would be a good start, as these are likely to be the ones
> that are sensitive to file format recognition and thus could break tests
> if renamed.

Sounds good to me. I will do a quick tour of the rest of the repo  
looking for other common or important file extensions, but I don't  
expect there to be many if any.


> [still Nath]
> I think a lot of people have used "." in file names as an alternative
> to a space. I think it would be beneficial to use an underscore "_" in
> these cases and leave the "." to represent the beginning of the file
> extension.

That's a great idea.
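
For what it's worth, the renaming rule could be sketched roughly as below. This is only an illustration: normalise_name() and the example file name are made up here, not anything in the repository, and it assumes a single-part extension (something like ".tar.gz" would need special handling).

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the last "." (the extension separator)
# and turn every earlier "." in the name into "_".
sub normalise_name {
    my ($name) = @_;

    # Split into stem and final extension; names without an extension
    # (or dot-files such as ".cvsignore") are returned unchanged.
    my ($stem, $ext) = $name =~ /^(.+)(\.[^.]+)$/
        or return $name;

    $stem =~ tr/./_/;
    return $stem . $ext;
}

# Made-up example name, purely for illustration.
print normalise_name('test.2.seqs.fasta'), "\n";   # prints test_2_seqs.fasta
```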


> [Chris]
> Do we need to define every filetype extension, or can there be a  
> fallback (eg if it isn't on the list or has no extension it's plain  
> text)?

For every file that's added, svn takes a peek to see if it's
human-readable. If not, it's tagged with the generic MIME type
application/octet-stream. (It does this so it knows not to try to do
diffs and merges on a binary file.)

So the default for a human-readable file is no MIME type, which I  
believe is essentially the same thing as text/plain.

And then regardless of the outcome of svn's peek, any matching
auto-props are then applied, overriding svn's choice.

So if we don't define every extension, I think we'll be fine. It'd be  
nice to have everything tagged with a MIME type, though. For one  
thing, Apache will use it to do the right thing when people browse  
the repo over the web. And two, because metadata is cool. :)

One more thing: in the course of reading up on this, I learned that  
my earlier expectation about multiple auto-prop matches was  
incorrect. It's true that multiple unrelated matches mean that  
multiple properties are set on the file. But when a file matches  
multiple *conflicting* auto-property patterns, there's no telling  
which value it'll get.


Dave

From hartzell at alerce.com  Sun Jul  1 12:29:29 2007
From: hartzell at alerce.com (George Hartzell)
Date: Sun, 1 Jul 2007 09:29:29 -0700
Subject: [Bioperl-l] First cut svn repository
In-Reply-To: <E250DB37-E2C1-4F71-A2FE-B64603EB69FD@gmx.net>
References: <3097065.1181941697249.JavaMail.myubc2@brahms.my.ubc.ca>
	<bba689ec0706151440o56a7d6c6ncf72a37cd2b2cdc5@mail.gmail.com>
	<185BDA34-1449-49CA-B146-ADF27D2928CD@gmx.net>
	<8D3B697E-2072-46FE-A1C9-E546D9DEAA45@uiuc.edu>
	<4673C7CB.1030709@mail.nih.gov>
	<410EF5F9-A30E-4AB7-85F7-7E761E3890D5@uiuc.edu>
	<18049.30026.61328.134490@almost.alerce.com>
	<4683A7D1.8070403@sendu.me.uk>
	<18051.48684.996884.134046@almost.alerce.com>
	<4683C385.3050904@sendu.me.uk>
	<18051.63674.685297.426813@almost.alerce.com>
	<D554E628-AB22-44C2-B253-3CDDB3F71253@uiuc.edu>
	<18052.3946.224905.415905@almost.alerce.com>
	<2159ED58-E6F4-4ED8-AC23-E8BAF69FE240@gmx.net>
	<A348C2D6-F00B-4E76-A78F-E192A912E785@uiuc.edu>
	<E250DB37-E2C1-4F71-A2FE-B64603EB69FD@gmx.net>
Message-ID: <18055.54889.677775.868974@almost.alerce.com>

Hilmar Lapp writes:
 > It turns out that both files are also present on the release-0-9-3,  
 > bioperl-1-0-0, bioperl-1-0-alpha, and bioperl-1-0-alpha2-rc tags, so add
 > 
 > $ svn rm -m "Removing offending duplicate" svn+ssh://dev.open-bio.org/home/hartzell/bioperl/bioperl-live/tags/release-0-9-3/t/data/HUMBETGLOA.fasta
 > $ svn rm -m "Removing offending duplicate" svn+ssh://dev.open-bio.org/home/hartzell/bioperl/bioperl-live/tags/bioperl-1-0-0/t/data/HUMBETGLOA.fasta
 > $ svn rm -m "Removing offending duplicate" svn+ssh://dev.open-bio.org/home/hartzell/bioperl/bioperl-live/tags/bioperl-1-0-alpha/t/data/HUMBETGLOA.fasta
 > $ svn rm -m "Removing offending duplicate" svn+ssh://dev.open-bio.org/home/hartzell/bioperl/bioperl-live/tags/bioperl-1-0-alpha2-rc/t/data/HUMBETGLOA.fasta
 > 
 > to the post-processing commands.
 > [...]

Will do.  Thanks for working out the incantations!

g.

From cjfields at uiuc.edu  Mon Jul  2 09:26:06 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 2 Jul 2007 08:26:06 -0500
Subject: [Bioperl-l] test data
Message-ID: <82E2A57B-CB7B-4437-94A1-7AEFCA6A4F5F@uiuc.edu>

I am planning on adding test data to cvs for eutils and have run
across some stuff in bugzilla that needs to be added as well.

Should we, as convention, start adding data sequestered to a folder
with the test name, within t/data?  This might make life easier in
the long run (keep track of files, get rid of old files, etc), and
may make it easier for wrapping up the correct data with tests if we
start submitting single module CPAN updates.

chris

From cjfields at uiuc.edu  Mon Jul  2 09:52:27 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 2 Jul 2007 08:52:27 -0500
Subject: [Bioperl-l] test data
In-Reply-To: <468901C1.8020505@sendu.me.uk>
References: <82E2A57B-CB7B-4437-94A1-7AEFCA6A4F5F@uiuc.edu>
	<468901C1.8020505@sendu.me.uk>
Message-ID: <61239EEF-D079-4D47-BDCD-A2B5CCC1C84D@uiuc.edu>

On Jul 2, 2007, at 8:46 AM, Sendu Bala wrote:

> Chris Fields wrote:
>> I am planning on adding test data to cvs for eutils and have run
>> across some stuff in bugzilla that needs to be added as well.
>> Should we, as convention, start adding data sequestered to a folder
>> with the test name, within t/data?
>
> I'd actually argue that this shouldn't be done: data is sometimes
> reused amongst multiple different test scripts, and when looking
> for data to reuse it's easier to spot it in a single directory
> compared to searching through multiple directories.
>
>
>> This might make life easier in the long run (keep track of files,  
>> get rid of old files, etc), and may make it easier for wrapping up  
>> the correct data with tests if we start submitting single module  
>> CPAN updates.
>
> I don't think that will be an issue. The automated process would  
> read the test script and see what input files it uses, copying  
> those into the archive. So, just be sure to standardise on using  
> test_input_file() to make that possible.
>
>
> That said, I wouldn't mind especially either way. Just don't do it  
> now, since test script names (and therefore the name of the  
> directory you'd want to store the input files in) might all change.
>
>
> In fact we can imagine that we have a test script
> t/BioZombieKitten.t which stores its test data in
> t/data/BioZombieKitten/input.file but the script gets the path to
> this file by:
> my $input_file = test_input_file('input.file');
>
> test_input_file() is then implemented to look for the file in the  
> subdir of data corresponding to the script name if we're dealing  
> with the 900-modules-in-a-package checkout-type situation, but just  
> in t/data if we're in the one-module-in-a-package situation.
>
> In any case, things will be most flexible if you drop files  
> directly into t/data for now and reference them without any subdirs  
> in the call to test_input_file().

Fine by me, I just find it very cluttered.

BioZombieKitten?!?

chris

From bix at sendu.me.uk  Mon Jul  2 10:00:37 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 02 Jul 2007 15:00:37 +0100
Subject: [Bioperl-l] test data
In-Reply-To: <61239EEF-D079-4D47-BDCD-A2B5CCC1C84D@uiuc.edu>
References: <82E2A57B-CB7B-4437-94A1-7AEFCA6A4F5F@uiuc.edu>
	<468901C1.8020505@sendu.me.uk>
	<61239EEF-D079-4D47-BDCD-A2B5CCC1C84D@uiuc.edu>
Message-ID: <46890505.1070707@sendu.me.uk>

Chris Fields wrote:
> On Jul 2, 2007, at 8:46 AM, Sendu Bala wrote:
> Fine by me, I just find it very cluttered.

Yes, I agree. I also wish we had a decent naming convention for files. 
(Ie. it would be nice to have a good idea what a file was for without 
having to study the test script that uses it.)


> BioZombieKitten?!?

I get Bio/perl/ and Bio/ware/ confused in my head ;)
http://forums.bioware.com/viewtopic.html?topic=562916&forum=84

From bix at sendu.me.uk  Mon Jul  2 09:46:41 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 02 Jul 2007 14:46:41 +0100
Subject: [Bioperl-l] test data
In-Reply-To: <82E2A57B-CB7B-4437-94A1-7AEFCA6A4F5F@uiuc.edu>
References: <82E2A57B-CB7B-4437-94A1-7AEFCA6A4F5F@uiuc.edu>
Message-ID: <468901C1.8020505@sendu.me.uk>

Chris Fields wrote:
> I am planning on adding test data to cvs for eutils and have run across 
> some stuff in bugzilla that needs to be added as well.
> 
> Should we, as convention, start adding data sequestered to a folder with 
> the test name, within t/data?

I'd actually argue that this shouldn't be done: data is sometimes reused 
amongst multiple different test scripts, and when looking for data to 
reuse it's easier to spot it in a single directory compared to searching 
through multiple directories.


> This might make life easier in the long 
> run (keep track of files, get rid of old files, etc), and may make it 
> easier for wrapping up the correct data with tests if we start 
> submitting single module CPAN updates.

I don't think that will be an issue. The automated process would read 
the test script and see what input files it uses, copying those into the 
archive. So, just be sure to standardise on using test_input_file() to 
make that possible.


That said, I wouldn't mind especially either way. Just don't do it now, 
since test script names (and therefore the name of the directory you'd 
want to store the input files in) might all change.


In fact we can imagine that we have a test script t/BioZombieKitten.t 
which stores its test data in t/data/BioZombieKitten/input.file but the 
script gets the path to this file by:
my $input_file = test_input_file('input.file');

test_input_file() is then implemented to look for the file in the subdir 
of data corresponding to the script name if we're dealing with the 
900-modules-in-a-package checkout-type situation, but just in t/data if 
we're in the one-module-in-a-package situation.

In any case, things will be most flexible if you drop files directly 
into t/data for now and reference them without any subdirs in the call 
to test_input_file().
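
To make the lookup described above concrete, here is a minimal sketch of how such a test_input_file() might behave. It is not the actual implementation: the optional second and third arguments, and the choice to detect the layout by checking whether the file exists in the per-script subdir, are assumptions made purely for illustration.

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Spec;

# Hypothetical sketch of test_input_file().
#   $name     - bare file name, e.g. 'input.file'
#   $script   - test script path (assumed default: $0)
#   $data_dir - shared data directory (assumed default: t/data)
sub test_input_file {
    my ($name, $script, $data_dir) = @_;
    $script   = $0 unless defined $script;
    $data_dir = File::Spec->catdir('t', 'data') unless defined $data_dir;

    # 't/BioZombieKitten.t' -> 'BioZombieKitten'
    my $subdir = basename($script, qr/\.t/);

    # Prefer the per-script subdirectory if the file is found there ...
    my $per_script = File::Spec->catfile($data_dir, $subdir, $name);
    return $per_script if -e $per_script;

    # ... otherwise fall back to the flat t/data layout.
    return File::Spec->catfile($data_dir, $name);
}

# In a test script the call stays the same in either layout:
my $input_file = test_input_file('input.file');
```

With something along these lines, test scripts never mention a subdir themselves, so files can be moved between t/data and t/data/<script name> later without touching the scripts.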

From hlapp at gmx.net  Mon Jul  2 16:02:37 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 2 Jul 2007 16:02:37 -0400
Subject: [Bioperl-l] Take 2 of the new subversion repository.
In-Reply-To: <18054.63942.316904.413911@almost.alerce.com>
References: <18054.63942.316904.413911@almost.alerce.com>
Message-ID: <F5F3DC8B-47B6-4B9E-8B94-82A514BDE7FD@gmx.net>

Just FYI, after applying the changes I've been sending, I was able to  
check out the repository in its entirety.

	-hilmar

On Jun 30, 2007, at 8:48 PM, George Hartzell wrote:

>
> There's a second cut at the subversion repository.  I've done a better
> job of setting svn:keywords and svn:eol-style on various files.  The
> defaults were more cautious and I used an auto-props file based on
> the wiki version.
>
>   svn+ssh://dev.open-bio.org/home/hartzell/bioperl_take2
>
> The old repository's still around as
>
>   svn+ssh://dev.open-bio.org/home/hartzell/bioperl_take1
>
> I renamed it so that people wouldn't work with it by mistake.  If, for
> some hard-to-imagine reason, you have a working copy that you want to
> run against it, you should be able to do an svn switch --relocate on
> your working copy and be back in shape.  In fact, it might be a good
> time to give it a try....
>
> g.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From wrp at virginia.edu  Mon Jul  2 16:08:04 2007
From: wrp at virginia.edu (William R. Pearson)
Date: Mon, 2 Jul 2007 16:08:04 -0400
Subject: [Bioperl-l] Course: Computational and Comparative Genomics
Message-ID: <4B3F66D7-CF05-4CD1-A148-272B4B56FBD4@virginia.edu>


Course announcement - Application deadline, July 15, 2007

================================================================

Cold Spring Harbor
COMPUTATIONAL & COMPARATIVE GENOMICS
November 7 - 13, 200
Application Deadline: July 15, 2007

INSTRUCTORS:

Pearson, William, Ph.D., University of Virginia, Charlottesville, VA
Smith, Randall, Ph.D., SmithKline Beecham Pharmaceuticals, King of
Prussia, PA

Beyond BLAST and FASTA - Alignment: from proteins to genomes - This
course presents a comprehensive overview of the theory and practice of
computational methods for extracting the maximum amount of information
from protein and DNA sequence similarity through sequence database
searches, statistical analysis, multiple sequence alignment, and
genome scale alignment. Additional topics include gene finding,
identifying signals in unaligned sequences, and integration of genetic
and sequence information in biological databases.

The course combines lectures with hands-on exercises; students are
encouraged to pose challenging sequence analysis problems using their
own data. The course makes extensive use of local WWW pages to present
problem sets and the computing tools to solve them. Students use
Windows and Mac workstations attached to a UNIX server.

The course is designed for biologists seeking advanced training in
biological sequence analysis, computational biology core resource
directors and staff, and for scientists in other disciplines, such as
computer science, who wish to survey current research problems in
biological sequence analysis and comparative genomics.

The primary focus of the Computational and Comparative Genomics Course
is the theory and practice of algorithms used in computational
biology, with the goal of using current methods more effectively and
developing new algorithms. Cold Spring Harbor also offers a
"Programming for Biology" course, which focuses more on software
development.

For additional information and the lecture schedule and problem sets
for the 2006 course, see:

         http://fasta.bioch.virginia.edu/cshl06

================================================================

To apply to the course, fill out and send in the form at:

         http://meetings.cshl.edu/courses/courseapplication.asp

================================================================

Bill Pearson


From niels at genomics.dk  Mon Jul  2 16:45:07 2007
From: niels at genomics.dk (Niels Larsen)
Date: Mon, 02 Jul 2007 22:45:07 +0200
Subject: [Bioperl-l] simple PrimarySeq question
In-Reply-To: <F5F3DC8B-47B6-4B9E-8B94-82A514BDE7FD@gmx.net>
References: <18054.63942.316904.413911@almost.alerce.com>
	<F5F3DC8B-47B6-4B9E-8B94-82A514BDE7FD@gmx.net>
Message-ID: <468963D3.3000007@genomics.dk>

I write hoping someone could show me how to create a PrimarySeq
object without parsing features and all first. The lines below
return

"Can't locate object method "next_seq" via package "Bio::PrimarySeq" at ./tst2 line 16."

whereas calling Bio::SeqIO-> gives no error, but a too big object.
The GenBank record after the __END__ is the "1.gb" file. I could not
find out how from the tutorial or the Bio::PrimarySeq description.

Niels L


#!/usr/bin/env perl

use strict;
use warnings FATAL => qw ( all );

use Data::Dumper;

use Bio::Seq;
use Bio::SeqIO;

my ( $seq_h, $seq );

$seq_h = Bio::PrimarySeq->new( -file => "1.gb", -format => 'genbank' );
# $seq_h = Bio::SeqIO->new( -file => "1.gb", -format => 'genbank' );

$seq = $seq_h->next_seq();

# print Dumper( $seq );

__END__

LOCUS       X60065                     9 bp    mRNA    linear   MAM 14-NOV-2006
DEFINITION  B.bovis beta-2-gpI mRNA for beta-2-glycoprotein I.
ACCESSION   X60065 REGION: 1..9
VERSION     X60065.1  GI:5
KEYWORDS    beta-2 glycoprotein I.
SOURCE      Bos taurus (cattle)
   ORGANISM  Bos taurus
             Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
             Mammalia; Eutheria; Laurasiatheria; Cetartiodactyla; Ruminantia;
             Pecora; Bovidae; Bovinae; Bos.
REFERENCE   1
   AUTHORS   Bendixen,E., Halkier,T., Magnusson,S., Sottrup-Jensen,L. and
             Kristensen,T.
   TITLE     Complete primary structure of bovine beta 2-glycoprotein I:
             localization of the disulfide bridges
   JOURNAL   Biochemistry 31 (14), 3611-3617 (1992)
    PUBMED   1567819
REFERENCE   2  (bases 1 to 9)
   AUTHORS   Kristensen,T.
   TITLE     Direct Submission
   JOURNAL   Submitted (11-JUN-1991) T. Kristensen, Dept of Mol Biology,
             University of Aarhus, C F Mollers Alle 130, DK-8000 Aarhus C,
             DENMARK
FEATURES             Location/Qualifiers
      source          1..9
                      /organism="Bos taurus"
                      /mol_type="mRNA"
                      /db_xref="taxon:9913"
                      /clone="pBB2I"
                      /tissue_type="liver"
      gene            <1..>9
                      /gene="beta-2-gpI"
      CDS             <1..>9
                      /gene="beta-2-gpI"
                      /codon_start=1
                      /product="beta-2-glycoprotein I"
                      /protein_id="CAA42669.1"
                      /db_xref="GI:6"
                      /db_xref="GOA:P17690"
                      /db_xref="UniProtKB/Swiss-Prot:P17690"
                      /translation="PALVLLLGFLCHVAIAGRTCPKPDELPFSTVVPLKRTYEPGEQI
                      VFSCQPGYVSRGGIRRFTCPLTGLWPINTLKCMPRVCPFAGILENGTVRYTTFEYPNT
                      ISFSCHTGFYLKGASSAKCTEEGKWSPDLPVCAPITCPPPPIPKFASLSVYKPLAGNN
                      SFYGSKAVFKCLPHHAMFGNDTVTCTEHGNWTQLPECREVRCPFPSRPDNGFVNHPAN
                      PVLYYKDTATFGCHETYSLDGPEEVECSKFGNWSAQPSCKASCKLSIKRATVIYEGER
                      VAIQNKFKNGMLHGQKVSFFCKHKEKKCSYTEDAQCIDGTIEIPKCFKEHSSLAFWKT
                      DASDVKPC"
      sig_peptide     <1..>9
                      /gene="beta-2-gpI"
ORIGIN
         1 ccagcgctc
//

From Kevin.M.Brown at asu.edu  Mon Jul  2 17:35:12 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Mon, 2 Jul 2007 14:35:12 -0700
Subject: [Bioperl-l] simple PrimarySeq question
In-Reply-To: <468963D3.3000007@genomics.dk>
References: <18054.63942.316904.413911@almost.alerce.com><F5F3DC8B-47B6-4B9E-8B94-82A514BDE7FD@gmx.net>
	<468963D3.3000007@genomics.dk>
Message-ID: <1A4207F8295607498283FE9E93B775B403576504@EX02.asurite.ad.asu.edu>

Start by having a look at the following link:
http://bioperl.org/cgi-bin/deob_interface.cgi

SeqIO is how one reads or writes sequences to/from files.
Bio::PrimarySeq is just an object that holds information about a
sequence obtained from a file.

As for how to parse a Genbank file into a list of features:

use Bio::SeqIO;

my $file = Bio::SeqIO->new(-format => 'genbank', -file => "id.gb");
my @sorted_features;
while (my $seq = $file->next_seq())
{
	my @features = $seq->all_SeqFeatures;
	# keep only the features whose primary tag is 'CDS'
	for my $f (@features)
	{
		my $tag = $f->primary_tag;
		if ($tag eq 'CDS')
		{
			# @sorted_features holds all the CDS feature objects
			# (Bio::SeqFeatureI) obtained from the genbank file
			push @sorted_features, $f;
		}
	}
}
 



From niels at genomics.dk  Mon Jul  2 20:41:24 2007
From: niels at genomics.dk (niels at genomics.dk)
Date: Tue, 3 Jul 2007 02:41:24 +0200 (CEST)
Subject: [Bioperl-l] simple PrimarySeq question
In-Reply-To: <1A4207F8295607498283FE9E93B775B403576504@EX02.asurite.ad.asu.edu>
References: <18054.63942.316904.413911@almost.alerce.com><F5F3DC8B-47B6-4B9E-8B94-82A514BDE7FD@gmx.net>
	<468963D3.3000007@genomics.dk>
	<1A4207F8295607498283FE9E93B775B403576504@EX02.asurite.ad.asu.edu>
Message-ID: <23897.85.82.195.151.1183423284.squirrel@mail.genomics.dk>

Kevin,

Thanks, but I didn't put the question very clearly, sorry. Yes, SeqIO
gets entries from file, and from those large parsed entries I can get a
simplified primary_seq object. But the SeqIO object includes feature
and annotation objects etc that take time to make, and I wish to know
if there is a way to get a primary_seq object without this overhead. I
apologize if I overlooked it in the docs.

Niels




From hlapp at gmx.net  Mon Jul  2 22:36:19 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 2 Jul 2007 22:36:19 -0400
Subject: [Bioperl-l] simple PrimarySeq question
In-Reply-To: <23897.85.82.195.151.1183423284.squirrel@mail.genomics.dk>
References: <18054.63942.316904.413911@almost.alerce.com><F5F3DC8B-47B6-4B9E-8B94-82A514BDE7FD@gmx.net>
	<468963D3.3000007@genomics.dk>
	<1A4207F8295607498283FE9E93B775B403576504@EX02.asurite.ad.asu.edu>
	<23897.85.82.195.151.1183423284.squirrel@mail.genomics.dk>
Message-ID: <84F5C120-FE0B-472D-8F1B-026AD238E959@gmx.net>

Check out the POD of Bio::Seq::SeqBuilder, the synopsis should have  
examples for what you want to do:

      use Bio::SeqIO;
      # usually you won't instantiate this yourself - a SeqIO object -
      # you will have one already
      my $seqin = Bio::SeqIO->new(-fh => \*STDIN, -format => "genbank");
      my $builder = $seqin->sequence_builder();

      # if you need only sequence, id, and description (e.g. for
      # conversion to FASTA format):
      $builder->want_none();
      $builder->add_wanted_slot('display_id','desc','seq');

      # if you want everything except the sequence and features
      $builder->want_all(1); # this is the default if it's untouched
      $builder->add_unwanted_slot('seq','features');

Let us know if that doesn't answer your question.

Note that this is currently only implemented for Genbank format.

	-hilmar

On Jul 2, 2007, at 8:41 PM, niels at genomics.dk wrote:

> Kevin,
>
> Thanks, but I didn't put the question very clearly, sorry. Yes, SeqIO
> gets entries from file, and from those large parsed entries I can
> get a simplified primary_seq object. But the SeqIO object includes
> feature and annotation objects etc that take time to make, and I wish
> to know if there is a way to get a primary_seq object without this
> overhead. I apologize if I overlooked it in the docs.
>
> Niels
>
>
>
>
>> Start by having a look at the following link:
>> http://bioperl.org/cgi-bin/deob_interface.cgi
>>
>> SeqIO is how one reads or writes sequences to/from files.
>> Bio::PrimarySeq is just an object that holds information about a
>> sequence obtained from a file.
>>
>> As for how to parse a Genbank file into a list of features:
>>
>> my $file = Bio::SeqIO->new(-format => 'genbank', -file => "id.gb");
>> my @sorted_features;
>> while (my $seq = $file->next_seq())
>> {
>> 	my @features = $seq->all_SeqFeatures;
>> 	# keep only the features whose primary tag is 'CDS'
>> 	for my $f (@features)
>> 	{
>> 		my $tag = $f->primary_tag;
>> 		if ($tag eq 'CDS')
>> 		{
>> 			# @sorted_features holds all the CDS feature
>> 			# objects obtained from the genbank file
>> 			push @sorted_features, $f;
>> 		}
>> 	}
>> }
>>
>>
>>> -----Original Message-----
>>> From: bioperl-l-bounces at lists.open-bio.org
>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
>>> Niels Larsen
>>> Sent: Monday, July 02, 2007 1:45 PM
>>> Cc: bioperl-l List
>>> Subject: [Bioperl-l] simple PrimarySeq question
>>>
>>> I write hoping someone could show me how to create a
>>> PrimarySeq object without parsing features and all first. The
>>> lines below return
>>>
>>> "Can't locate object method "next_seq" via package
>>> "Bio::PrimarySeq" at ./tst2 line 16."
>>>
>>> whereas calling Bio::SeqIO-> gives no error, but a too big object.
>>> The GenBank record after the __END__ is the "1.gb" file. I
>>> could not find out how from the tutorial or the
>>> Bio::PrimarySeq description.
>>>
>>> Niels L
>>>
>>>
>>> #!/usr/bin/env perl
>>>
>>> use strict;
>>> use warnings FATAL => qw ( all );
>>>
>>> use Data::Dumper;
>>>
>>> use Bio::Seq;
>>> use Bio::SeqIO;
>>>
>>> my ( $seq_h, $seq );
>>>
>>> $seq_h = Bio::PrimarySeq->new( -file => "1.gb", -format => 'genbank' );
>>> # $seq_h = Bio::SeqIO->new( -file => "1.gb", -format => 'genbank' );
>>>
>>> $seq = $seq_h->next_seq();
>>>
>>> # print Dumper( $seq );
>>>
>>> __END__
>>>
>>> LOCUS       X60065                     9 bp    mRNA    linear   MAM 14-NOV-2006
>>> DEFINITION  B.bovis beta-2-gpI mRNA for beta-2-glycoprotein I.
>>> ACCESSION   X60065 REGION: 1..9
>>> VERSION     X60065.1  GI:5
>>> KEYWORDS    beta-2 glycoprotein I.
>>> SOURCE      Bos taurus (cattle)
>>>    ORGANISM  Bos taurus
>>>              Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
>>>              Mammalia; Eutheria; Laurasiatheria; Cetartiodactyla; Ruminantia;
>>>              Pecora; Bovidae; Bovinae; Bos.
>>> REFERENCE   1
>>>    AUTHORS   Bendixen,E., Halkier,T., Magnusson,S., Sottrup-Jensen,L. and
>>>              Kristensen,T.
>>>    TITLE     Complete primary structure of bovine beta 2-glycoprotein I:
>>>              localization of the disulfide bridges
>>>    JOURNAL   Biochemistry 31 (14), 3611-3617 (1992)
>>>     PUBMED   1567819
>>> REFERENCE   2  (bases 1 to 9)
>>>    AUTHORS   Kristensen,T.
>>>    TITLE     Direct Submission
>>>    JOURNAL   Submitted (11-JUN-1991) T. Kristensen, Dept of Mol Biology,
>>>              University of Aarhus, C F Mollers Alle 130, DK-8000 Aarhus C,
>>>              DENMARK
>>> FEATURES             Location/Qualifiers
>>>       source          1..9
>>>                       /organism="Bos taurus"
>>>                       /mol_type="mRNA"
>>>                       /db_xref="taxon:9913"
>>>                       /clone="pBB2I"
>>>                       /tissue_type="liver"
>>>       gene            <1..>9
>>>                       /gene="beta-2-gpI"
>>>       CDS             <1..>9
>>>                       /gene="beta-2-gpI"
>>>                       /codon_start=1
>>>                       /product="beta-2-glycoprotein I"
>>>                       /protein_id="CAA42669.1"
>>>                       /db_xref="GI:6"
>>>                       /db_xref="GOA:P17690"
>>>                       /db_xref="UniProtKB/Swiss-Prot:P17690"
>>>                       /translation="PALVLLLGFLCHVAIAGRTCPKPDELPFSTVVPLKRTYEPGEQI
>>>                       VFSCQPGYVSRGGIRRFTCPLTGLWPINTLKCMPRVCPFAGILENGTVRYTTFEYPNT
>>>                       ISFSCHTGFYLKGASSAKCTEEGKWSPDLPVCAPITCPPPPIPKFASLSVYKPLAGNN
>>>                       SFYGSKAVFKCLPHHAMFGNDTVTCTEHGNWTQLPECREVRCPFPSRPDNGFVNHPAN
>>>                       PVLYYKDTATFGCHETYSLDGPEEVECSKFGNWSAQPSCKASCKLSIKRATVIYEGER
>>>                       VAIQNKFKNGMLHGQKVSFFCKHKEKKCSYTEDAQCIDGTIEIPKCFKEHSSLAFWKT
>>>                       DASDVKPC"
>>>       sig_peptide     <1..>9
>>>                       /gene="beta-2-gpI"
>>> ORIGIN
>>>          1 ccagcgctc
>>> //
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From ewijaya at gmail.com  Tue Jul  3 02:56:30 2007
From: ewijaya at gmail.com (Edward Wijaya)
Date: Tue, 3 Jul 2007 14:56:30 +0800
Subject: [Bioperl-l] Problem with GD.pm version 2.35
Message-ID: <3521d3670707022356tbc38694mfcb5185b1dfc005d@mail.gmail.com>

Dear all,
I was trying to perform a check with this command:

$ perl -MGD -e 'print $GD::VERSION';

And it gave:

GD object version 2.32 does not match $GD::VERSION 2.35 at
/usr/lib/perl5/5.8.8/i386-linux-thread-multi/DynaLoader.pm line 253.
Compilation failed in require.
BEGIN failed--compilation aborted.

Similarly my script that uses GD.pm doesn't execute.


I have installed the latest version of libgd version 2.0.35 downloaded from
http://www.libgd.org/Downloads#Download_the_latest_.282.0.35.29

Can anybody suggest how I can resolve my problem?

This is my Perl version:
This is perl, v5.8.8 built for i386-linux-thread-multi

--
Edward

From ewijaya at i2r.a-star.edu.sg  Tue Jul  3 02:35:12 2007
From: ewijaya at i2r.a-star.edu.sg (Wijaya Edward)
Date: Tue, 3 Jul 2007 14:35:12 +0800
Subject: [Bioperl-l] Problem with GD.pm version 2.35
References: <3ACF03E372996C4EACD542EA8A05E66A06168A@mailbe01.teak.local.net>
Message-ID: <3ACF03E372996C4EACD542EA8A05E66A26EB85@mailbe01.teak.local.net>

 
Dear all, 
I was trying to perform check with this command:
 
$ perl -MGD -e 'print $GD::VERSION';

And it gave: 
 
GD object version 2.32 does not match $GD::VERSION 2.35 at /usr/lib/perl5/5.8.8/i386-linux-thread-multi/DynaLoader.pm line 253.
Compilation failed in require.
BEGIN failed--compilation aborted.

 
I have installed the latest version of libgd version 2.0.35 downloaded from
http://www.libgd.org/Downloads#Download_the_latest_.282.0.35.29
 
Can anybody suggest how can I resolve my problem?
 
This is my Perl version:
This is perl, v5.8.8 built for i386-linux-thread-multi
 
--
Edward


From lstein at cshl.edu  Tue Jul  3 10:41:26 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Tue, 3 Jul 2007 10:40:26 -0401
Subject: [Bioperl-l] Problem with GD.pm version 2.35
In-Reply-To: <3521d3670707022356tbc38694mfcb5185b1dfc005d@mail.gmail.com>
References: <3521d3670707022356tbc38694mfcb5185b1dfc005d@mail.gmail.com>
Message-ID: <6dce9a0b0707030741r52b8d0beq757a8faf982e1f2f@mail.gmail.com>

This happens when there is a mismatch between the compiled (.so) portion of
GD and the perl (.pm) version. Typically it occurs when you have installed
GD incorrectly by, e.g., copying the .pm file into position rather than
using the make file.

Solution: Uninstall old versions of GD by manually finding all occurrences
of GD.so and GD.pm and removing them. Then reinstall the correct way.
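
A rough sketch of the "find all occurrences" step, using only core Perl.
The helper name find_gd_files is made up for illustration, and the
GD.pm/GD.so file names are the Linux ones mentioned above; it only lists
candidates and deletes nothing.

```perl
use strict;
use warnings;
use File::Find;

# Return every file named GD.pm or GD.so found under the given directories.
# (find_gd_files is a hypothetical helper, not part of GD or BioPerl.)
sub find_gd_files {
    my @dirs = grep { defined && !ref && -d } @_;
    my @hits;
    return @hits unless @dirs;
    find(
        sub { push @hits, $File::Find::name if /^GD\.(?:pm|so)\z/ },
        @dirs,
    );
    # Nested search directories can report the same file twice.
    my %seen;
    return sort grep { !$seen{$_}++ } @hits;
}

# When run as a script, search Perl's module path and list what is installed.
unless (caller) {
    print "$_\n" for find_gd_files(@INC);
}
```

More than one GD.pm in the output, or a GD.so that was not built from the
same source tree as the GD.pm that gets loaded, would be consistent with
the version mismatch in the error message.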

Lincoln

On 7/3/07, Edward Wijaya <ewijaya at gmail.com> wrote:
>
> Dear all,
> I was trying to perform check with this command:
>
> $ perl -MGD -e 'print $GD::VERSION';
>
> And it gave:
>
> GD object version 2.32 does not match $GD::VERSION 2.35 at
> /usr/lib/perl5/5.8.8/i386-linux-thread-multi/DynaLoader.pm line 253.
> Compilation failed in require.
> BEGIN failed--compilation aborted.
>
> Similarly my script that uses GD.pm doesn't execute.
>
>
> I have installed the latest version of libgd version 2.0.35 downloaded
> from
> http://www.libgd.org/Downloads#Download_the_latest_.282.0.35.29
>
> Can anybody suggest how can I resolve my problem?
>
> This is my Perl version:
> This is perl, v5.8.8 built for i386-linux-thread-multi
>
> --
> Edward


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu

From cjfields at uiuc.edu  Wed Jul  4 01:45:16 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 4 Jul 2007 00:45:16 -0500
Subject: [Bioperl-l] genbank2gff3 - Name attribute?
Message-ID: <C790FCC2-81E5-4BB4-A9CB-E2E59E5ABE27@uiuc.edu>

I noticed that genbank2gff3.pl doesn't have an explicitly defined way  
of converting the gene/locus/etc name to a Name tag (for, say,  
GBrowse).  Any particular reason?

Should I stick with GFF2 for now?

chris

From bix at sendu.me.uk  Wed Jul  4 06:00:31 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 04 Jul 2007 11:00:31 +0100
Subject: [Bioperl-l] Splitting Bioperl
Message-ID: <468B6FBF.1070708@sendu.me.uk>

To summarise some previous threads:
http://thread.gmane.org/gmane.comp.lang.perl.bio.general/15315
http://thread.gmane.org/gmane.comp.lang.perl.bio.general/15338/focus=15409

# Bioperl is currently one monolithic distribution of ~900 modules
# There is some desire to split it up into smaller functional groups
# There are some problems with that proposal
# An extreme variant of that proposal is to make the groups individual 
modules


Following this discussion:
http://www.nntp.perl.org/group/perl.modules/2007/07/msg55160.html
(especially Adam Kennedy's postings of 4/07, soon to appear in that 
archive), the extreme variant doesn't seem like a good idea.


I'm now suggesting that Steve's original split idea, as 
modified/expanded by Adam's driver and other ideas, is the best choice. 
The problems I previously identified can be solved in the same way they 
were solved in my extreme variant: the splits are done by Build.PL 
automation working on a single repository/code-base, not by splitting 
things up at the repository level.


As I see it, the way forward now is for someone interested enough to 
decide on the specifics of how things will be split and offer them up to 
the group for discussion. I don't mean vague possibilities of what might 
work as a split, but rather some real thought should go into it to make 
sure the split makes sense and will actually work in practice.

Following that, the splits can be implemented by some automated dist 
action of Build.PL.


If there isn't sufficient interest to make this happen, I don't see that 
as a terrible thing. There are benefits to keeping Bioperl monolithic, 
and some of the problems (eg. lack of updates) can be solved without 
changing its nature.

From cjfields at uiuc.edu  Wed Jul  4 10:53:45 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 4 Jul 2007 09:53:45 -0500
Subject: [Bioperl-l] Splitting Bioperl
In-Reply-To: <468B6FBF.1070708@sendu.me.uk>
References: <468B6FBF.1070708@sendu.me.uk>
Message-ID: <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>


On Jul 4, 2007, at 5:00 AM, Sendu Bala wrote:

> To summarise some previous threads:
> http://thread.gmane.org/gmane.comp.lang.perl.bio.general/15315
> http://thread.gmane.org/gmane.comp.lang.perl.bio.general/15338/focus=15409
>
> # Bioperl is currently one monolithic distribution of ~900 modules
> # There is some desire to split it up into smaller functional groups
> # There are some problems with that proposal
> # An extreme variant of that proposal is to make the groups individual
> modules
>
>
> Following this discussion:
> http://www.nntp.perl.org/group/perl.modules/2007/07/msg55160.html
> (especially Adam Kennedy's postings of 4/07, soon to appear in that
> archive), the extreme variant doesn't seem like a good idea.

brian d foy made some sound arguments against it as well.

> I'm now suggesting that Steve's original split idea, as
> modified/expanded by Adam's driver and other ideas, is the best  
> choice.
> The problems I previously identified can be solved in the same way  
> they
> were solved in my extreme variant: the splits are done by Build.PL
> automation working on a single repository/code-base, not by splitting
> things up at the repository level.
>
> As I see it, the way forward now is for someone interested enough to
> decide on the specifics of how things will be split and offer them  
> up to
> the group for discussion. I don't mean vague possibilities of what  
> might
> work as a split, but rather some real thought should go into it to  
> make
> sure the split makes sense and will actually work in practice.

We've already identified a few (SearchIO, Tools, GBrowse-related, etc).
...
> If there isn't sufficient interest to make this happen, I don't see  
> that
> as a terrible thing. There are benefits to keeping Bioperl monolithic,
> and some of the problems (eg. lack of updates) can be solved without
> changing its nature.

If so, proposals that solve this problem need to be made as well.

If we stay monolithic, then here's mine: we start having fixed,  
regularly timed dev releases like Parrot, monthly or bimonthly (quite  
common on CPAN), with brief release reports on which bugs have been  
fixed, what code has been added, and so on.  Not every bug has to be  
fixed per dev release; if that were true there would never be releases  
for some of the XML parser packages.  No RCs for dev releases (it's a  
dev release!).  These would be 1.x.y.  We can then, every once in a  
while, have a bug-squashing session, hackathon, etc, and have a regular  
non-dev release (1.x) that all core devs accept and that passes a  
particular milestone.

As for the advantage of a split approach, as mentioned previously it  
is to focus modules/tests/scripts into groups with related  
functions.  Even just splitting off ones with external reqs (XML  
parsers, GD, etc) into an 'aux' release would be an advantage, as it  
doesn't confront a new user with the burden of installing a large  
list of dependencies, some of which may be complicated for a perl  
newbie to either install from scratch (DBD::mysql, GD) or to get the  
latest bug-fixed prereq release for their OS (the recent debacle with  
XML::SAX::Expat comes to mind, where the fix wasn't immediately  
available for win32 as a PPM).

I'm fairly open to any approach as long as it's reasonably thought  
out, though I am admittedly a bit biased towards the split approach.   
I do think some change is in order; I worry about there ever being a  
1.6 release at this point.

chris

From davila at ioc.fiocruz.br  Wed Jul  4 13:11:20 2007
From: davila at ioc.fiocruz.br (Alberto Davila)
Date: Wed, 04 Jul 2007 14:11:20 -0300
Subject: [Bioperl-l] ESTs in EST format
Message-ID: <468BD4B8.5050105@ioc.fiocruz.br>

Dear All,

I am trying to get all ESTs from a given species (eg: Trypanosoma 
brucei) from Genbank in EST format (eg: 
http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucest&id=10280980)... 
while using Entrez I can "display" individual EST entries in EST format, 
this "EST format" is not an option in the main "display" menu for batch 
download ...

I don't see the EST format listed 
(http://www.bioperl.org/wiki/Sequence_formats) among the ones that SeqIO 
deals with, so I wonder whether there is another BioPerl module to do 
this? Any tips would be greatly appreciated ;-)

Kindest regards, Alberto

From jason at bioperl.org  Wed Jul  4 13:52:59 2007
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 4 Jul 2007 10:52:59 -0700
Subject: [Bioperl-l] ESTs in EST format
In-Reply-To: <468BD4B8.5050105@ioc.fiocruz.br>
References: <468BD4B8.5050105@ioc.fiocruz.br>
Message-ID: <D0D013CC-1D28-46D6-A94F-EA53C7EC5219@bioperl.org>

Currently we don't support this format. As far as I know it isn't a  
published standard, nor is it a format in which NCBI distributes this  
data as flat files (i.e. genbank dumps).

Is there any reason why you can't get what you need from the GenBank  
format?
http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucest&qty=1&c_start=1&list_uids=10280980&uids=&dopt=gb

-jason
On Jul 4, 2007, at 10:11 AM, Alberto Davila wrote:

> Dear All,
>
> I am trying to get all ESTs from a given species (eg: Trypanosoma
> brucei) from Genbank in EST format (eg:
> http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucest&id=10280980)...
> while using Entrez I can "display" individual EST entries in EST  
> format,
> this "EST format" is not an option in the main "display" menu for  
> batch
> download ...
>
> I dont see the EST format listed
> (http://www.bioperl.org/wiki/Sequence_formats) among the ones that  
> SeqIO
> deal with, so wonder there would another BioPerl module to do  
> this ? any
> tips, would be greatly appreciated ;-)
>
> Kindest regards, Alberto

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From dmessina at wustl.edu  Wed Jul  4 14:37:22 2007
From: dmessina at wustl.edu (David Messina)
Date: Wed, 4 Jul 2007 13:37:22 -0500
Subject: [Bioperl-l] Splitting Bioperl
In-Reply-To: <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>
References: <468B6FBF.1070708@sendu.me.uk>
	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>
Message-ID: <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>


On Jul 4, 2007, at 9:53 AM, Chris Fields wrote:

>  we start having fixed,
> regularly timed dev releases like Parrot, monthly or bimonthly (quite
> common on CPAN), with brief release reports on which bugs have been
> fixed, code has been added, so on.  Not every bug has to be fixed per
> dev release; if that were true there would never be releases for some
> of the XML parser packages.  No RCs for dev releases (it's a dev
> release!).  These would be 1.x.y.  We can then, every once in a
> while, have a bug-squashing session, hackathon, etc, and have regular
> non-dev release (1.x) that all core devs accept and that passes a
> particular milestone.


Regardless of whether we split or don't, I think these ideas of  
adding a little more structure to BioPerl's development cycles --  
especially having bug-squashing and hacking sessions, where we all  
band together and commit some time to cranking through a bunch of to- 
dos -- would be beneficial, particularly as a means to keeping a  
certain basal level of momentum in BioPerl.

Dave


From jason at bioperl.org  Wed Jul  4 15:45:29 2007
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 4 Jul 2007 12:45:29 -0700
Subject: [Bioperl-l] Splitting Bioperl
In-Reply-To: <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>
References: <468B6FBF.1070708@sendu.me.uk>
	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>
	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>
Message-ID: <B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>

I definitely agree - we can live up to the unstable "living on the  
edge" nature of dev releases a bit more perhaps?


On Jul 4, 2007, at 11:37 AM, David Messina wrote:

>
> On Jul 4, 2007, at 9:53 AM, Chris Fields wrote:
>
>>  we start having fixed,
>> regularly timed dev releases like Parrot, monthly or bimonthly (quite
>> common on CPAN), with brief release reports on which bugs have been
>> fixed, code has been added, so on.  Not every bug has to be fixed per
>> dev release; if that were true there would never be releases for some
>> of the XML parser packages.  No RCs for dev releases (it's a dev
>> release!).  These would be 1.x.y.  We can then, every once in a
>> while, have a bug-squashing session, hackathon, etc, and have regular
>> non-dev release (1.x) that all core devs accept and that passes a
>> particular milestone.
>
>
> Regardless of whether we split or don't, I think these ideas of
> adding a little more structure to BioPerl's development cycles --
> especially having bug-squashing and hacking sessions, where we all
> band together and commit some time to cranking through a bunch of to-
> dos -- would be beneficial, particularly as a means to keeping a
> certain basal level of momentum in BioPerl.
>
> Dave

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From cjfields at uiuc.edu  Wed Jul  4 16:54:14 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 4 Jul 2007 15:54:14 -0500
Subject: [Bioperl-l] Splitting Bioperl
In-Reply-To: <B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>
References: <468B6FBF.1070708@sendu.me.uk>
	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>
	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>
	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>
Message-ID: <F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>

I think what's partially responsible for slowing down releases is the  
expectation that each dev release is supposed to have all bugs fixed,  
work for every OS, etc.  In other words, act like a stable release.

A developer release by nature is living on the edge, so why not have  
regular dev releases?  We keep telling users to update to using  
bioperl-live whenever something breaks, anyway.  We could decide to  
split stuff off along the way into more 'stable' sections if there  
were more demand for it, and have the more API-volatile code  
(DB::SeqFeature, EUtilities, GFF3, Chado-related, etc.) retain the  
'dev' tag until we feel it's ready for prime time.

chris

On Jul 4, 2007, at 2:45 PM, Jason Stajich wrote:

> I definitely agree - we can live up to the unstable "living on the
> edge" nature of dev releases a bit more perhaps?
>
>
> On Jul 4, 2007, at 11:37 AM, David Messina wrote:
>
>>
>> On Jul 4, 2007, at 9:53 AM, Chris Fields wrote:
>>
>>>  we start having fixed,
>>> regularly timed dev releases like Parrot, monthly or bimonthly  
>>> (quite
>>> common on CPAN), with brief release reports on which bugs have been
>>> fixed, code has been added, so on.  Not every bug has to be fixed  
>>> per
>>> dev release; if that were true there would never be releases for  
>>> some
>>> of the XML parser packages.  No RCs for dev releases (it's a dev
>>> release!).  These would be 1.x.y.  We can then, every once in a
>>> while, have a bug-squashing session, hackathon, etc, and have  
>>> regular
>>> non-dev release (1.x) that all core devs accept and that passes a
>>> particular milestone.
>>
>>
>> Regardless of whether we split or don't, I think these ideas of
>> adding a little more structure to BioPerl's development cycles --
>> especially having bug-squashing and hacking sessions, where we all
>> band together and commit some time to cranking through a bunch of to-
>> dos -- would be beneficial, particularly as a means to keeping a
>> certain basal level of momentum in BioPerl.
>>
>> Dave
>
> --
> Jason Stajich
> jason at bioperl.org
> http://jason.open-bio.org/
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From n.haigh at sheffield.ac.uk  Thu Jul  5 04:09:05 2007
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Thu, 05 Jul 2007 09:09:05 +0100
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To: <F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>
References: <468B6FBF.1070708@sendu.me.uk>	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>
	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>
Message-ID: <468CA721.4020804@sheffield.ac.uk>

Chris Fields wrote:
> I think what's partially responsible for slowing down releases is the  
> expectation that each dev release is supposed to have all bugs fixed,  
> work for every OS, etc.  In other words, act like a stable release.
>
> A developer release by nature is living on the edge, so why not have  
> regular dev releases?  We keep telling users to update to using  
> bioperl-live whenever something breaks, anyway.  We could decide to  
> split stuff off along the way into more 'stable' sections if there  
> were more demand for it, and have the more API-volatile code  
> (DB::SeqFeature, EUtilities, GFF3, Chado-related, etc.) retain the  
> 'dev' tag until we feel it's ready for prime time.
>
> chris
>
> On Jul 4, 2007, at 2:45 PM, Jason Stajich wrote:
>
>   
-- snip --

I agree, although would the dev releases still need to pass all the 
tests? I'm thinking of people installing via CPAN.

I also agree with what was said in a previous post about bringing 
bioperl-run (and some others) back into the same repository as 
bioperl-core (after a successful move over to svn) and having Build.PL 
deal with creating the packages etc for CPAN. This would hopefully help 
keep the run package (and others) up to speed with the core package.

I also agree with previous posts about organising and/or having some 
naming convention for test data files. I suggest organising data files 
into directory trees (1 - 3 deep) with names that allude to the type of 
data in that subtree/file rather than to the tests that use it. For 
example:

t/data
    |__ formats
    |           |__ seq
    |           |        |__ legal_fasta
    |           |        |              |__ extension.fas
    |           |        |              |__ extension.fasta
    |           |        |              |__ extension.foo
    |           |        |              |__ extension.bar
    |           |        |              |__ no_extension
    |           |        |              |__ interleaved.fas
    |           |        |              |__ non_interleaved.fas
    |           |        |              |__ single_seq.fas
    |           |        |              |__ multiple_seq.fas
    |           |        |              |__ desc_line1.fas
    |           |        |              |__ desc_line2.fas
    |           |        |
    |           |        |__ illegal_fasta
    |           |        |              |__ illegal_chars.fas
    |           |        |              |__ some_other_illegal_alternative.fas
    |           |        |
    |           |        |__ legal_genbank
    |           |        |              |__ etc etc
    |           |        |
    |           |        |__ illegal_genbank
    |           |                      |__ etc etc
    |           |
    |           |__ aln
    |           |__ blast
    |           |        |__ legal_blastx
    |           |        |
    |           |        |__ legal_blastp
    |           |        |
    |           |        |__ legal_tblastx
    |           |        |
    |           |        |__ legal_plastpsi
    |           |        |
    |           |        |__ legal_wublast
    |           |__ foo
    |           |__ bar
    |           |__ misc
    |
    |__ etc

This type of setup might lend itself to having a test script simply try 
to parse all the files in a directory, to ensure that parsing succeeds 
for legal file formats and fails for illegal ones. Naming of the file 
paths would help test authors to identify a suitable data file for their 
own tests before adding their own to the t/data dir. It might also help 
to identify areas where example test data is currently lacking.
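
To make the directory-driven idea concrete, here is a minimal sketch in
core Perl. Everything in it is an assumption for illustration: parse_fasta
is a toy stand-in for a real Bio::SeqIO read loop, unexpected_results is a
made-up helper name, and the legal_/illegal_ prefixes follow the tree
sketched above.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# Toy stand-in parser (an assumption, not BioPerl code): dies on input
# that is not simple FASTA, returns the number of records otherwise.
sub parse_fasta {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!\n";
    my $records = 0;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*$/;
        if ($line =~ /^>\S/) { $records++; next; }
        die "$path: sequence data before first header\n" unless $records;
        die "$path: illegal character on line $.\n" if $line =~ /[^A-Za-z*-]/;
    }
    close $fh;
    die "$path: no FASTA records found\n" unless $records;
    return $records;
}

# Run $parser over every plain file in $dir and return the names of files
# that did not behave as the directory name predicts: files under
# illegal_* must make the parser die, all others must parse cleanly.
sub unexpected_results {
    my ($dir, $parser) = @_;
    my $expect_ok = basename($dir) !~ /^illegal_/ ? 1 : 0;
    opendir my $dh, $dir or die "Cannot open directory $dir: $!\n";
    my @files = sort grep { -f "$dir/$_" } readdir $dh;
    closedir $dh;
    my @unexpected;
    for my $file (@files) {
        my $parsed_ok = eval { $parser->("$dir/$file"); 1 } ? 1 : 0;
        push @unexpected, $file if $parsed_ok != $expect_ok;
    }
    return @unexpected;
}
```

In a test script this could be wrapped in one check per directory, for
example comparing the returned list against an empty list with
Test::More's is_deeply, so that a failure names the offending files.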

Thinking about this a little more, I think it would be a good idea to 
include Test::Exception in t/lib. We should also be testing that 
warnings and exceptions are generated when expected - e.g. illegal 
characters in seq files etc etc. Without these sorts of tests we are 
only getting half the story. This testing might account for a large 
chunk of the poor test coverage, particularly when it comes to branches 
in the code.

Anyway, this type of reorganisation couldn't take place until the svn 
repo is up and working.

I'd appreciate any comments on the above!
Nath


From bix at sendu.me.uk  Thu Jul  5 04:55:25 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 05 Jul 2007 09:55:25 +0100
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To: <468CA721.4020804@sheffield.ac.uk>
References: <468B6FBF.1070708@sendu.me.uk>	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>
	<468CA721.4020804@sheffield.ac.uk>
Message-ID: <468CB1FD.7060301@sendu.me.uk>

Nathan S. Haigh wrote:
> I agree, although would the dev releases still need to pass all the 
> tests? I'm thinking of people installing via CPAN.

Yes, they'd all have to pass. 'Developer release' should never have the 
connotation of 'broken release'. However, getting all tests to pass is a 
lot easier than fixing all bugs in bugzilla.

(... which actually goes to show how poor our tests are)

Worst case, if we were forced to stick to a schedule but couldn't fix a 
failing test, we could always make it a 'todo' test.
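
For anyone who hasn't used them, a 'todo' test in Test::More looks roughly
like this. The revcom function and its bug are invented for the example;
only the TODO block itself is the point.

```perl
use strict;
use warnings;
use Test::More tests => 2;

# Hypothetical function under test with a known, not-yet-fixed bug:
# lower-case input is reversed but not complemented.
sub revcom {
    my ($seq) = @_;
    (my $rc = reverse $seq) =~ tr/ACGT/TGCA/;
    return $rc;
}

is(revcom('AACG'), 'CGTT', 'upper-case reverse complement');

TODO: {
    # Tests inside this block are expected to fail; the harness reports
    # them but does not count them as failures of the test script.
    local $TODO = 'lower-case input not handled yet';
    is(revcom('aacg'), 'cgtt', 'lower-case reverse complement');
}
```

The script still exits successfully, and once the bug is fixed the harness
flags the test as unexpectedly passing, which is the cue to remove the
TODO marker.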


> I also agree with what was said in a previous post about bringing back 
> bioperl-run (and some others) back into the same repository as 
> bioperl-core (after a successful move over to svn)

Agree (with myself essentially).


> I also agree with previous posts about organising and/or having some 
> naming convention for test data files. I think an approach whereby data 
> files were organised into directory trees (1 - 3 deep) with names that 
> allude to the type of data in that subtree/file rather than the tests 
> that use it etc. For example:
> 
> t/data
>     |__ formats
>     |           |__ seq
>     |           |        |__ legal_fasta
>     |           |        |              |__ extension.fas
[snip]

At that level, files don't need extensions and can have fully 
informative names that explain what's interesting or special about them.


> This type of setup, might lend itself to having a test script simply try 
> to parse all the files in a directory to ensure nothing fails (for legal 
> file formats) and fails for illegal formats.

Great idea.


> Thinking about this a little more, I think it would be a good idea to 
> include Test::Exception in t/lib.

Agree. I'll see if I can have it auto-loaded by BioperlTest.pm.


> Anyway, this type of reorganisation couldn't take place until the svn 
> repo is up and working.

Agree.

From bix at sendu.me.uk  Thu Jul  5 05:39:10 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 05 Jul 2007 10:39:10 +0100
Subject: [Bioperl-l] Test related Suggestions
In-Reply-To: <468CB1FD.7060301@sendu.me.uk>
References: <468B6FBF.1070708@sendu.me.uk>	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>	<468CA721.4020804@sheffield.ac.uk>
	<468CB1FD.7060301@sendu.me.uk>
Message-ID: <468CBC3E.1020408@sendu.me.uk>

Sendu Bala wrote:
> Nathan S. Haigh wrote:
>> Thinking about this a little more, I think it would be a good idea to 
>> include Test::Exception in t/lib.
> 
> Agree. I'll see if I can have it auto-loaded by BioperlTest.pm.

I've now done that: BioperlTest loads Test::Exception, from the copy in 
t/lib if necessary.

So, in BioperlTest-using scripts you now have access to the methods 
dies_ok, lives_ok, throws_ok and lives_and.
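
A short sketch of those four functions in use. It loads Test::Exception directly rather than through BioperlTest so that it is self-contained, and validate_dna() is a hypothetical function invented for the example.

```perl
use strict;
use warnings;
use Test::More;

BEGIN {
    # Test::Exception is not a core module, so skip cleanly if it is absent.
    eval { require Test::Exception; Test::Exception->import; 1 }
        or plan skip_all => 'Test::Exception is not installed';
}

# Hypothetical function under test: rejects non-nucleotide characters.
sub validate_dna {
    my ($seq) = @_;
    die "illegal character in sequence\n" if $seq =~ /[^ACGTNacgtn]/;
    return length $seq;
}

dies_ok   { validate_dna('ACXT') } 'illegal character is fatal';
lives_ok  { validate_dna('ACGT') } 'legal sequence is accepted';
throws_ok { validate_dna('AC!T') } qr/illegal character/, 'message is informative';
lives_and { is(validate_dna('ACGT'), 4) } 'returns the length without dying';

done_testing();
```

throws_ok is the one to prefer when the wording or class of the exception matters; dies_ok only checks that something died.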

From N.Haigh at sheffield.ac.uk  Thu Jul  5 06:01:04 2007
From: N.Haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Thu,  5 Jul 2007 11:01:04 +0100
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To: <468CB1FD.7060301@sendu.me.uk>
References: <468B6FBF.1070708@sendu.me.uk>
	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>
	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>
	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>
	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>
	<468CA721.4020804@sheffield.ac.uk> <468CB1FD.7060301@sendu.me.uk>
Message-ID: <1183629664.468cc1609891a@webmail.shef.ac.uk>

Quoting Sendu Bala <bix at sendu.me.uk>:

-- snip --
> 
> 
> > I also agree with previous posts about organising and/or having some 
> > naming convention for test data files. I think an approach whereby data 
> > files were organised into directory trees (1 - 3 deep) with names that 
> > allude to the type of data in that subtree/file rather than the tests 
> > that use it etc. For example:
> > 
> > t/data
> >     |__ formats
> >     |           |__ seq
> >     |           |        |__ legal_fasta
> >     |           |        |              |__ extension.fas
> [snip]
> 
> At that level, files don't need extensions and can have fully 
> informative names that explain what's interesting or special about them.
> 

You may be correct in most cases, however, isn't there a method for
detecting the file format from the file extension and failing that it
peeks inside the file? Therefore there should be a file extension for
each of these to get good code coverage as well as each format not
having an extension to check that the peek inside the file correctly
determines the format.
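
To make the two code paths concrete, here is a hypothetical sketch of "extension first, then peek". It is not Bioperl's actual implementation; the extension table, the guess_format() name and the content patterns are all assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical extension table; not Bioperl's actual list.
my %FORMAT_FOR_EXT = (
    fas => 'fasta',   fasta => 'fasta',   fa => 'fasta',
    gb  => 'genbank', gbk   => 'genbank',
);

# Guess a format name: trust a known extension first, otherwise look at
# the first line of the content. Returns undef if both fail.
sub guess_format {
    my ($filename, $first_line) = @_;
    if ($filename =~ /\.([^.\/\\]+)$/) {
        my $by_ext = $FORMAT_FOR_EXT{ lc $1 };
        return $by_ext if defined $by_ext;
    }
    return 'fasta'   if defined $first_line && $first_line =~ /^>/;
    return 'genbank' if defined $first_line && $first_line =~ /^LOCUS\s/;
    return undef;
}

# Both paths need fixtures: files with known extensions, plus files with
# no (or an unknown) extension that force the content peek.
for my $case (
    [ 'extension.fas', '>seq1'                ],
    [ 'no_extension',  '>seq1'                ],
    [ 'extension.foo', 'LOCUS       AB000001' ],
    [ 'mystery',       'not a known format'   ],
) {
    my ($name, $line) = @$case;
    my $fmt = guess_format($name, $line);
    printf "%-15s => %s\n", $name, defined $fmt ? $fmt : 'unknown';
}
```

A data set containing only files with extensions would never exercise the second half of guess_format(), which is the coverage gap described above.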

-- snip --


From bix at sendu.me.uk  Thu Jul  5 06:04:16 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 05 Jul 2007 11:04:16 +0100
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To: <1183629664.468cc1609891a@webmail.shef.ac.uk>
References: <468B6FBF.1070708@sendu.me.uk>
	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>
	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>
	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>
	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>
	<468CA721.4020804@sheffield.ac.uk> <468CB1FD.7060301@sendu.me.uk>
	<1183629664.468cc1609891a@webmail.shef.ac.uk>
Message-ID: <468CC220.804@sendu.me.uk>

Nathan S. Haigh wrote:
> Quoting Sendu Bala <bix at sendu.me.uk>:
> 
> -- snip --
>> 
>>> I also agree with previous posts about organising and/or having
>>> some naming convention for test data files. I think an approach
>>> whereby data files were organised into directory trees (1 - 3
>>> deep) with names that allude to the type of data in that
>>> subtree/file rather than the tests that use it etc. For example:
>>> 
>>> t/data
>>>     |__ formats
>>>     |           |__ seq
>>>     |           |        |__ legal_fasta
>>>     |           |        |              |__ extension.fas
>>> 
>> [snip]
>> 
>> At that level, files don't need extensions and can have fully 
>> informative names that explain what's interesting or special about
>> them.
>> 
> 
> You may be correct in most cases, however, isn't there a method for
> detecting the file format from the file extension and failing that it
> peeks inside the file? Therefore there should be a file extension for
> each of these to get good code coverage as well as each format not
> having an extension to check that the peek inside the file correctly
> determines the format.

Yes, you're quite correct.

From bix at sendu.me.uk  Thu Jul  5 06:47:12 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 05 Jul 2007 11:47:12 +0100
Subject: [Bioperl-l] Warnings
Message-ID: <468CCC30.90406@sendu.me.uk>

I'm trying to get Test::Warn to work with Bioperl warnings as produced 
by Bio::Root::RootI::warn(). However, afaict the warnings must be 
generated with CORE::warn(), not print STDERR.

Is there any particular reason RootI::warn is done with print and not 
CORE::warn ? Can I change it to a warn?

From bix at sendu.me.uk  Thu Jul  5 09:04:50 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 05 Jul 2007 14:04:50 +0100
Subject: [Bioperl-l] Warnings
In-Reply-To: <200707051458.59921.heikki@sanbi.ac.za>
References: <468CCC30.90406@sendu.me.uk>
	<200707051458.59921.heikki@sanbi.ac.za>
Message-ID: <468CEC72.4090909@sendu.me.uk>

Heikki Lehvaslaiho wrote:
> My guess is that using 'print STDERR' avoids showing sometimes annoying 
>    errordescription  at programname line  NN
> syntax being used.

Afaik,

CORE::warn "anything\n";

never includes the line number: messages that end with a newline always 
disable that feature. Bio::Root::RootI::warn /always/ puts newlines into 
the message, so they /never/ have the line number.
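
A small core-Perl sketch of that behaviour. A local $SIG{__WARN__} handler collects whatever passes through warn(); text written with print STDERR never reaches such a handler.

```perl
use strict;
use warnings;

my @caught;
{
    # Collect anything that goes through warn() instead of printing it.
    local $SIG{__WARN__} = sub { push @caught, $_[0] };

    warn "with newline\n";     # message is passed through unchanged
    warn "without newline";    # perl appends " at FILE line N.\n"
}

# Show each captured message between brackets, without its final newline.
for my $msg (@caught) {
    (my $shown = $msg) =~ s/\n\z//;
    print "[$shown]\n";
}
```

Only the second message gains the file and line suffix, so a message that already ends in a newline looks the same whether it is printed or warned.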


> On the other hand, the main reason we need to set verbosity to 1 in BioPerl 
> objects is to find where warnings are coming from. Maybe extra text in 
> warnings leads to easier debugging.
> 
> I favour changing it.

So it's my understanding there will be absolutely no difference in 
behaviour following this change (except that warnings can be caught by 
Test::Warn). I just wanted to confirm my understanding.

From hlapp at gmx.net  Thu Jul  5 09:07:27 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 5 Jul 2007 09:07:27 -0400
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To: <468CA721.4020804@sheffield.ac.uk>
References: <468B6FBF.1070708@sendu.me.uk>	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>
	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>
	<468CA721.4020804@sheffield.ac.uk>
Message-ID: <21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net>


On Jul 5, 2007, at 4:09 AM, Nathan S. Haigh wrote:

> Chris Fields wrote:
>> I think what's partially responsible for slowing down releases is the
>> expectation that each dev release is supposed to have all bugs fixed,
>> work for every OS, etc.  In other words, act like a stable release.
>>

It doesn't. A stable release has a stable API that will be supported  
until the next stable release through point releases.

>> A developer release by nature is living on the edge, so why not have
>> regular dev releases?

There's no problem with regular dev releases, but tests will need to  
pass. There was never a stipulation that all bugs need to have been  
fixed. But all tests need to pass, so in an ideal world (in which  
everything is being tested) all tests passing would imply all (known)  
bugs fixed. Obviously, we don't live in an ideal world ...

If not everything passes then what is the big difference to a code  
snapshot? If using cvs (or svn) is too difficult for most people, we  
can consider creating a mechanism that puts up nightly snapshots for  
download.

> -- snip --
>
> I agree, although would the dev releases still need to pass all the
> tests? I'm thinking of people installing via CPAN.

For example, that's another point.

	-hilmar

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From heikki at sanbi.ac.za  Thu Jul  5 09:12:37 2007
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Thu, 5 Jul 2007 15:12:37 +0200
Subject: [Bioperl-l] Test related Suggestions
In-Reply-To: <468CBC3E.1020408@sendu.me.uk>
References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk>
	<468CBC3E.1020408@sendu.me.uk>
Message-ID: <200707051512.38185.heikki@sanbi.ac.za>


One more suggestion:

It would be extremely useful if we had a standard way of testing that when a 
file is read into a bioperl object and then written out again in the same 
format, the input and output files are identical. If not, the test should 
show where the differences start (showing all the differences would just 
clutter the screen).

This standard method/subroutine should be used to test all sequence and other 
text file IO.

Any takers? 

	-Heikki

On Thursday 05 July 2007 11:39:10 Sendu Bala wrote:
> Sendu Bala wrote:
> > Nathan S. Haigh wrote:
> >> Thinking about this a little more, I think it would be a good idea to
> >> include Test::Exception in t/lib.
> >
> > Agree. I'll see if I can have it auto-loaded by BioperlTest.pm.
>
> I've now done that: BioperlTest loads Test::Exception, from the copy in
> t/lib if necessary.
>
> So, in BioperlTest-using scripts you now have access to the methods
> dies_ok, lives_ok, throws_ok and lives_and.


-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________

From heikki at sanbi.ac.za  Thu Jul  5 08:58:59 2007
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Thu, 5 Jul 2007 14:58:59 +0200
Subject: [Bioperl-l] Warnings
In-Reply-To: <468CCC30.90406@sendu.me.uk>
References: <468CCC30.90406@sendu.me.uk>
Message-ID: <200707051458.59921.heikki@sanbi.ac.za>

My guess is that using 'print STDERR' avoids showing sometimes annoying 
   errordescription  at programname line  NN
syntax being used.

On the other hand, the main reason we need to set verbosity to 1 in BioPerl 
objects is to find where warnings are coming from. Maybe extra text in 
warnings leads to easier debugging.

I favour changing it.

	-Heikki


On Thursday 05 July 2007 12:47:12 Sendu Bala wrote:
> I'm trying to get Test::Warn to work with Bioperl warnings as produced
> by Bio::Root::RootI::warn(). However, afaict the warnings must be
> generated with CORE::warn(), not print STDERR.
>
> Is there any particular reason RootI::warn is done with print and not
> CORE::warn ? Can I change it to a warn?



From bix at sendu.me.uk  Thu Jul  5 09:44:08 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 05 Jul 2007 14:44:08 +0100
Subject: [Bioperl-l] Test related Suggestions
In-Reply-To: <200707051512.38185.heikki@sanbi.ac.za>
References: <468B6FBF.1070708@sendu.me.uk>
	<468CB1FD.7060301@sendu.me.uk>	<468CBC3E.1020408@sendu.me.uk>
	<200707051512.38185.heikki@sanbi.ac.za>
Message-ID: <468CF5A8.7040402@sendu.me.uk>

Heikki Lehvaslaiho wrote:
> One more suggestion:
> 
> It would be extremely useful if we had a standard way of testing that
> when a file is read into a bioperl object and then written out
> again in the same format, the input and output files are identical.

As Hilmar has pointed out in the past, Bioperl doesn't aim for the files 
to be identical, only for none of the information to be lost and to be 
output in the correct format.

So a round-trip test should read in the original, store all the parsed 
data, write it out, then read in the written version and see if the new 
parsed data matches the original.


For simpler or ultra-strict file formats, though...

> If not, the test should show where the differences start (showing
> all the differences would just clutter the screen).
> 
> This standard method/subroutine should be used to test all sequence
> and other text file IO.
> 
> Any takers?

There's already something along these lines in t/SeqIO.t (the section
that uses Algorithm::Diff).

I copied that over from the old testformats.pl script but haven't really
taken the time to see if it's a good way of doing the test.

Is it? Can someone come up with something better? Can someone generalise
it if necessary?

I imagine you could just read the files into arrays and use 
Test::More::is_deeply(). If that would be satisfactory I could easily 
add a little method to BioperlTest that did that.
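
A minimal sketch of that idea. The helper name files_identical_ok() is hypothetical, not an existing BioperlTest method. Because is_deeply() stops at the first mismatch and reports its position in the structure, a failure points at where the files start to differ without listing every difference.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Test::More tests => 1;

# Read a whole file into an array of lines.
sub slurp_lines {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot read $path: $!";
    my @lines = <$fh>;
    close $fh;
    return \@lines;
}

# Hypothetical helper: compare two files line by line with is_deeply().
sub files_identical_ok {
    my ($got_file, $expected_file, $name) = @_;
    # Report failures at the caller's line, not inside this helper.
    local $Test::Builder::Level = $Test::Builder::Level + 1;
    return is_deeply(slurp_lines($got_file), slurp_lines($expected_file), $name);
}

# Two temporary files with the same content stand in for input and output.
my $content = ">seq1\nACGT\n";
my @paths;
for (1 .. 2) {
    my ($fh, $path) = tempfile(UNLINK => 1);
    print {$fh} $content;
    close $fh;
    push @paths, $path;
}

files_identical_ok($paths[1], $paths[0], 'output file matches input file');
```

This is a strict byte-level comparison, so it only suits formats where the writer is expected to reproduce the input layout exactly.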


From n.haigh at sheffield.ac.uk  Thu Jul  5 09:47:24 2007
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Thu, 05 Jul 2007 14:47:24 +0100
Subject: [Bioperl-l] Test related Suggestions
In-Reply-To: <200707051512.38185.heikki@sanbi.ac.za>
References: <468B6FBF.1070708@sendu.me.uk>
	<468CB1FD.7060301@sendu.me.uk>	<468CBC3E.1020408@sendu.me.uk>
	<200707051512.38185.heikki@sanbi.ac.za>
Message-ID: <468CF66C.2070907@sheffield.ac.uk>

Heikki Lehvaslaiho wrote:
> One more suggestion:
> 
> It would be extremely useful if we had a standard way of testing that when a 
> file is read into a bioperl object and then written out again in the same 
> format, the input and output files are identical. If not, the test should 
> show where the differences start (showing all the differences would just 
> clutter the screen).
> 
> This standard method/subroutine should be used to test all sequence and other 
> text file IO.
> 
> Any takers? 
> 
> 	-Heikki
> 

Wouldn't this require info about the formatting of the file to be stored 
in the object as well, such that the same formatting could be used when 
writing the file?

Wouldn't a better approach be to read the contents of file1 into obj1, 
write obj1 to file2 in the same format, and then read file2 into obj2 
and compare obj1 to obj2 to ensure we have all the same data?
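
The file1/obj1/file2/obj2 sequence can be sketched as follows. read_fasta() and write_fasta() are hypothetical minimal stand-ins for a real parser and writer; only the round-trip pattern matters here.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Test::More tests => 1;

# Hypothetical minimal FASTA reader: returns an array ref of records.
sub read_fasta {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot read $path: $!";
    my @records;
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^>(\S+)\s*(.*)$/) {
            push @records, { id => $1, desc => $2, seq => '' };
        }
        elsif (@records) {
            $line =~ s/\s+//g;
            $records[-1]{seq} .= $line;
        }
    }
    close $fh;
    return \@records;
}

# Hypothetical minimal FASTA writer with fixed 60-column wrapping, so the
# layout it produces may differ from the layout of the input file.
sub write_fasta {
    my ($path, $records) = @_;
    open my $fh, '>', $path or die "Cannot write $path: $!";
    for my $r (@$records) {
        my $header = length($r->{desc}) ? "$r->{id} $r->{desc}" : $r->{id};
        print {$fh} ">$header\n";
        print {$fh} "$1\n" while $r->{seq} =~ /(.{1,60})/g;
    }
    close $fh or die "Cannot close $path: $!";
}

# file1 wraps the sequence at 4 columns, so file2 will not be
# byte-identical to it, yet no information should be lost.
my ($fh1, $file1) = tempfile(UNLINK => 1);
print {$fh1} ">seq1 example\nACGT\nACGT\n>seq2\nTTGA\n";
close $fh1;
my (undef, $file2) = tempfile(UNLINK => 1);

my $obj1 = read_fasta($file1);
write_fasta($file2, $obj1);
my $obj2 = read_fasta($file2);

is_deeply($obj2, $obj1, 'parsed data survives a write/read round trip');
```

Comparing parsed data rather than raw text means the test tolerates formatting differences while still catching lost or altered information.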

Nath

From cjfields at uiuc.edu  Thu Jul  5 09:52:12 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Jul 2007 08:52:12 -0500
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To: <468CA721.4020804@sheffield.ac.uk>
References: <468B6FBF.1070708@sendu.me.uk>	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>
	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>
	<468CA721.4020804@sheffield.ac.uk>
Message-ID: <BECE91CB-980B-4063-8E85-291CC85DCDC1@uiuc.edu>


On Jul 5, 2007, at 3:09 AM, Nathan S. Haigh wrote:

> ...
> I agree, although would the dev releases still need to pass all the  
> tests? I'm thinking of people installing via CPAN.

Remains to be decided.  All current tests (net and non-net) should  
pass.  Any bug fixes should try to have added tests if possible, with  
in-process stuff as TODO's.  Network tests are left up to user  
discretion, so if they fail for any particular reason there is a way  
around them.

> I also agree with what was said in a previous post about bringing  
> back bioperl-run (and some others) back into the same repository as  
> bioperl-core (after a successful move over to svn) and have  
> Build.PL deal with creating the packages etc for CPAN. This would  
> hopefully help keep the run package (and others) up to speed with  
> the core package.

It depends on how we want to have everything split.  I don't think it's  
immediately pressing (there are more important priorities, i.e.  
bugs, svn) but I would say folding everything back into live and  
'splitting' them out using an automated Build process is a viable  
option.

> I also agree with previous posts about organising and/or having  
> some naming convention for test data files. I think an approach  
> whereby data files were organised into directory trees (1 - 3 deep)  
> with names that allude to the type of data in that subtree/file  
> rather than the tests that use it etc. For example:
>
> t/data
>    |__ formats
>    |           |__ seq
>    |           |        |__ legal_fasta
>    |           |        |              |__ extension.fas
>    |           |        |              |__ extension.fasta
>    |           |        |              |__ extension.foo
>    |           |        |              |__ extension.bar
>    |           |        |              |__ no_extension
>    |           |        |              |__ interleaved.fas
>    |           |        |              |__ non_interleaved.fas
>    |           |        |              |__ single_seq.fas
>    |           |        |              |__ multiple_seq.fas
>    |           |        |              |__ desc_line1.fas
>    |           |        |              |__ desc_line2.fas
>    |           |        |
>    |           |        |__ illegal_fasta
>    |           |        |              |__ illegal_chars.fas
>    |           |        |              |__ some_other_illegal_alternative.fas
>    |           |        |
>    |           |        |__ legal_genbank
>    |           |        |              |__ etc etc
>    |           |        |
>    |           |        |__ illegal_genbank
>    |           |                      |__ etc etc
>    |           |
>    |           |__ aln
>    |           |__ blast
>    |           |        |__ legal_blastx
>    |           |        |
>    |           |        |__ legal_blastp
>    |           |        |
>    |           |        |__ legal_tblastx
>    |           |        |
>    |           |        |__ legal_blastpsi
>    |           |        |
>    |           |        |__ legal_wublast
>    |           |__ foo
>    |           |__ bar
>    |           |__ misc
>    |
>    |__ etc
>
> This type of setup, might lend itself to having a test script  
> simply try to parse all the files in a directory to ensure nothing  
> fails (for legal file formats) and fails for illegal formats.  
> Naming of the file paths would help test authors to identify a  
> suitable data file for their own tests before adding their own to  
> the t/data dir. It might also help to identify areas where example  
> test data is currently lacking.

...
This seems like more of a 'guess sequence' and format validation  
issue, something we've talked about before:

http://bugzilla.open-bio.org/show_bug.cgi?id=1508

The way I feel about it is sequence format validation and sequence  
parsing should be separate issues and therefore in separate classes  
(with parsing optionally preceded by validation), but that's  
something for another discussion.

> Thinking about this a little more, I think it would be a good idea  
> to include Test::Exception in t/lib. We should also be testing that  
> warnings and exceptions are generated when expected - e.g. illegal  
> characters in seq files etc etc. Without these sorts of tests we  
> are only getting half the story. This testing might account for a  
> large chunk of the poor test coverage, particularly when it comes  
> to branches in the code.
>
> Anyway, this type of reorganisation couldn't take place until the  
> svn repo is up and working.
>
> I'd appreciate any comments on the above!
> Nath

chris


From n.haigh at sheffield.ac.uk  Thu Jul  5 10:08:29 2007
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Thu, 05 Jul 2007 15:08:29 +0100
Subject: [Bioperl-l] Installing all modules required for testing
In-Reply-To: <468CF5A8.7040402@sendu.me.uk>
References: <468B6FBF.1070708@sendu.me.uk>	<468CB1FD.7060301@sendu.me.uk>	<468CBC3E.1020408@sendu.me.uk>	<200707051512.38185.heikki@sanbi.ac.za>
	<468CF5A8.7040402@sendu.me.uk>
Message-ID: <468CFB5D.6080406@sheffield.ac.uk>

Is there a way to install all the modules that are used in the tests? I 
mean there are cases where tests are skipped, and so count as passing, 
if the module required for testing is not installed, which means missing 
out a chunk of the tests. It would be desirable to be able to install all 
these modules in order to complete the whole test suite - any ideas 
if/how this can be done?

Cheers
Nath

From bix at sendu.me.uk  Thu Jul  5 10:15:34 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 05 Jul 2007 15:15:34 +0100
Subject: [Bioperl-l] Installing all modules required for testing
In-Reply-To: <468CFB5D.6080406@sheffield.ac.uk>
References: <468B6FBF.1070708@sendu.me.uk>	<468CB1FD.7060301@sendu.me.uk>	<468CBC3E.1020408@sendu.me.uk>	<200707051512.38185.heikki@sanbi.ac.za>
	<468CF5A8.7040402@sendu.me.uk> <468CFB5D.6080406@sheffield.ac.uk>
Message-ID: <468CFD06.3080604@sendu.me.uk>

Nathan S. Haigh wrote:
> Is there a way to install all the modules that are used in the tests? I 
> mean there are cases where tests are skipped and pass if the required 
> module for testing is not installed. Therefore, missing out a chunk of 
> the tests. It would be desirable to be able to install all these modules 
> in order to complete the whole test suite - any ideas if/how this can 
> be done?

Yes, add them as recommended (or perhaps 'build_requires') modules in 
Build.PL, then run Build.PL and install the modules when it asks you.

Everything should be in Build.PL already. If I missed something, please 
add it.
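
For reference, a generic Build.PL fragment showing where such modules would be declared with Module::Build. The distribution name, version and module lists are placeholders, not the contents of Bioperl's real Build.PL.

```perl
use strict;
use warnings;
use Module::Build;

# Placeholder metadata, for illustration only.
my $build = Module::Build->new(
    dist_name    => 'Example-Dist',
    dist_version => '0.01',
    license      => 'perl',

    # Needed to build and run the test suite at all.
    build_requires => {
        'Test::More' => 0,
    },

    # Optional: tests that use these are skipped when they are missing.
    recommends => {
        'Algorithm::Diff' => 0,
        'IO::String'      => 0,
    },
);

$build->create_build_script;
```

Listing test-only modules under 'recommends' keeps them optional for users while still surfacing them when Build.PL is run.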


From cjfields at uiuc.edu  Thu Jul  5 10:18:08 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Jul 2007 09:18:08 -0500
Subject: [Bioperl-l] Installing all modules required for testing
In-Reply-To: <468CFB5D.6080406@sheffield.ac.uk>
References: <468B6FBF.1070708@sendu.me.uk>	<468CB1FD.7060301@sendu.me.uk>	<468CBC3E.1020408@sendu.me.uk>	<200707051512.38185.heikki@sanbi.ac.za>
	<468CF5A8.7040402@sendu.me.uk> <468CFB5D.6080406@sheffield.ac.uk>
Message-ID: <C3B6AF09-B395-4303-9B50-953C0FAAE8A7@uiuc.edu>


On Jul 5, 2007, at 9:08 AM, Nathan S. Haigh wrote:

> Is there a way to install all the modules that are used in the  
> tests? I
> mean there are cases where tests are skipped and pass if the required
> module for testing is not installed. Therefore, missing out a chunk of
> the tests. It would be desirable to be able to install all these  
> modules
> in order to complete the whole test suite - any ideas if/how this can
> be done?
>
> Cheers
> Nath

That's optionally done upon 'perl Build.PL', correct?  So if you  
choose not to install a particular prereq (i.e. XML::SAX), you  
shouldn't be forced to install it later just for tests.  Or am I  
misunderstanding you?

chris


From cjfields at uiuc.edu  Thu Jul  5 10:18:23 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Jul 2007 09:18:23 -0500
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To: <468CC220.804@sendu.me.uk>
References: <468B6FBF.1070708@sendu.me.uk>
	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>
	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>
	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>
	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>
	<468CA721.4020804@sheffield.ac.uk> <468CB1FD.7060301@sendu.me.uk>
	<1183629664.468cc1609891a@webmail.shef.ac.uk>
	<468CC220.804@sendu.me.uk>
Message-ID: <D71E1DE2-C737-41C1-90FF-573A66925790@uiuc.edu>


On Jul 5, 2007, at 5:04 AM, Sendu Bala wrote:

> Nathan S. Haigh wrote:
>> Quoting Sendu Bala <bix at sendu.me.uk>:
>>> ...<snip snips>
>>> At that level, files don't need extensions and can have fully
>>> informative names that explain what's interesting or special about
>>> them.
>>>
>>
>> You may be correct in most cases, however, isn't there a method for
>> detecting the file format from the file extension and failing that it
>> peeks inside the file? Therefore there should be a file extension for
>> each of these to get good code coverage as well as each format not
>> having an extension to check that the peek inside the file correctly
>> determines the format.
>
> Yes, you're quite correct.

I actually like Sendu's idea more, or the idea of each test suite  
having its own directory.

Tests which need to guess/validate the format are probably best left  
sequestered to a specific suite focused on format guessing/ 
validation, at least in my opinion.

chris

From n.haigh at sheffield.ac.uk  Thu Jul  5 10:22:40 2007
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Thu, 05 Jul 2007 15:22:40 +0100
Subject: [Bioperl-l] Installing all modules required for testing
In-Reply-To: <468CFD06.3080604@sendu.me.uk>
References: <468B6FBF.1070708@sendu.me.uk>	<468CB1FD.7060301@sendu.me.uk>	<468CBC3E.1020408@sendu.me.uk>	<200707051512.38185.heikki@sanbi.ac.za>
	<468CF5A8.7040402@sendu.me.uk> <468CFB5D.6080406@sheffield.ac.uk>
	<468CFD06.3080604@sendu.me.uk>
Message-ID: <468CFEB0.80201@sheffield.ac.uk>

Sendu Bala wrote:
> Nathan S. Haigh wrote:
>> Is there a way to install all the modules that are used in the tests? 
>> I mean there are cases where tests are skipped and pass if the 
>> required module for testing is not installed. Therefore, missing out a 
>> chunk of the tests. It would be desirable to be able to install all 
>> these modules in order to complete the whole test suite - any ideas 
>> if/how this can be done?
> 
> Yes, add them as recommended (or perhaps 'build_requires') modules in 
> Build.PL, then run Build.PL and install the modules when it asks you.
> 
> Everything should be in Build.PL already. If I missed something, please 
> add it.
> 

OK, to clarify using the test file Sendu mentioned in a previous post: 
t/SeqIO.t

This test skips tests if Algorithm::Diff, IO::ScalarArray or IO::String 
are not installed (the first two are not mentioned in Build.PL). 
However, if there are a lot of such skips in the whole test suite then 
there may be few systems with all these modules installed in order to 
conduct a complete test. These are the modules I'm referring to.
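
The pattern in question looks roughly like this (a sketch, not the actual code in t/SeqIO.t). When the optional module cannot be loaded, the tests inside the SKIP block are reported as skipped and the script still passes.

```perl
use strict;
use warnings;
use Test::More tests => 3;

ok(1, 'a test with no optional dependencies always runs');

SKIP: {
    # Load the optional module at run time; if that fails, the two tests
    # below are reported as skipped, which still counts as a pass.
    eval { require Algorithm::Diff; 1 }
        or skip('Algorithm::Diff is not installed', 2);

    my @hunks = Algorithm::Diff::diff([qw(a b c)], [qw(a b c)]);
    is(scalar @hunks, 0, 'identical lists produce no hunks');
    ok(defined &Algorithm::Diff::LCS, 'LCS is available');
}
```

A system without Algorithm::Diff therefore reports success here while exercising only one of the three tests.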

Nath

From n.haigh at sheffield.ac.uk  Thu Jul  5 10:30:05 2007
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Thu, 05 Jul 2007 15:30:05 +0100
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To: <D71E1DE2-C737-41C1-90FF-573A66925790@uiuc.edu>
References: <468B6FBF.1070708@sendu.me.uk>
	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>
	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>
	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>
	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>
	<468CA721.4020804@sheffield.ac.uk> <468CB1FD.7060301@sendu.me.uk>
	<1183629664.468cc1609891a@webmail.shef.ac.uk>
	<468CC220.804@sendu.me.uk>
	<D71E1DE2-C737-41C1-90FF-573A66925790@uiuc.edu>
Message-ID: <468D006D.6050806@sheffield.ac.uk>

Chris Fields wrote:
> 
> On Jul 5, 2007, at 5:04 AM, Sendu Bala wrote:
> 
>> Nathan S. Haigh wrote:
>>> Quoting Sendu Bala <bix at sendu.me.uk>:
>>>> ...<snip snips>
>>>> At that level, files don't need extensions and can have fully
>>>> informative names that explain what's interesting or special about
>>>> them.
>>>>
>>>
>>> You may be correct in most cases, however, isn't there a method for
>>> detecting the file format from the file extension and failing that it
>>> peeks inside the file? Therefore there should be a file extension for
>>> each of these to get good code coverage as well as each format not
>>> having an extension to check that the peek inside the file correctly
>>> determines the format.
>>
>> Yes, you're quite correct.
> 
> I actually like Sendu's idea more, or the idea of each test suite having 
> its own directory.
> 
> Tests which need to guess/validate the format are probably best left 
> sequestered to a specific suite focused on format guessing/validation, 
> at least in my opinion.
> 
> chris


How easily would this lend itself to using the same data for multiple 
tests, or is it likely to lead to/exacerbate a culture of adding 
duplicate data files in each "test suite" rather than reusing?

Nath

From cjfields at uiuc.edu  Thu Jul  5 10:33:46 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Jul 2007 09:33:46 -0500
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To: <21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net>
References: <468B6FBF.1070708@sendu.me.uk>	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>
	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>
	<468CA721.4020804@sheffield.ac.uk>
	<21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net>
Message-ID: <B5011E60-D686-4E71-A1AB-46990548ABD1@uiuc.edu>


On Jul 5, 2007, at 8:07 AM, Hilmar Lapp wrote:

> On Jul 5, 2007, at 4:09 AM, Nathan S. Haigh wrote:
>
>> Chris Fields wrote:
>>> I think what's partially responsible for slowing down releases is  
>>> the
>>> expectation that each dev release is supposed to have all bugs  
>>> fixed,
>>> work for every OS, etc.  In other words, act like a stable release.
>
> It doesn't. A stable release has a stable API that will be  
> supported until the next stable release through point releases.

I agree, but I think there is still an expectation that 1.5.2 and  
beyond are more like true 'stable' releases even though we still  
designate them as 'developer.'   We unfortunately reinforce that when  
we tell users they need to update to v. 1.5.2 or bioperl-live to fix  
a particular bug in the 1.4 release.

There's nothing we can do about that now (hindsight is always 20/20,  
and 1.4 is just too old).  We (pumpkin, core devs) can try correcting  
that by ensuring any bug fixes be committed to any new stable branch  
as well as to live, at least until it becomes too problematic to  
maintain that particular stable branch (at which point we would go  
about getting ready for the next 'stable' and repeat the cycle over  
again).

>>> A developer release by nature is living on the edge, so why not have
>>> regular dev releases?
>
> There's no problem with regular dev releases, but tests will need  
> to pass. There was never a stipulation that all bugs need to have  
> been fixed. But all tests need to pass, so in an ideal world (in  
> which everything is being tested) all tests passing would imply all  
> (known) bugs fixed. Obviously, we don't live in an ideal world ...

...particularly when it comes to network-related tests and remote  
server problems (but those are by default not run, so there is a way  
around test fails there).  I agree here as well (all tests must  
pass).  As for the bug fixes, we can just stipulate which ones were  
fixed with the release (in a RELEASE_NOTES or similar), and maybe  
have TODO's in the test suite designating they are being worked on.

Basically, at regular intervals, maybe with a few weeks of lead time,  
the pumpkin would announce an impending dev. release.  Go through  
rounds of tests, bug fixes, etc.  When all tests pass post it on CPAN  
as a dev. release.  If we have a stable release branch with relevant  
bug fixes we can post that as well, again to the point where it  
becomes too problematic.

Would we just take a snapshot of MAIN and any relevant stable branch  
at that particular point for the CPAN release, just increasing the  
version number (1.x.y)?  Would it make sense to have a 1.x.y branch  
for each release (I don't think so, but maybe others disagree)?

> If not everything passes then what is the big difference to a code  
> snapshot? If using cvs (or svn) is too difficult for most people,  
> we can consider creating a mechanism that puts up nightly snapshots  
> for download.

If we feel a nightly snapshot is warranted we could do that though.   
I personally don't think there is a need, particularly since we have  
several means to obtain the latest code at any point in time  
(including the browsable CVS 'Download tarball').  We could state the  
next dev/stable CPAN release (pending on date dd/mm/yy) will have the  
bug fix, and if they want it immediately then pick it up from CVS.

>> -- snip --
>>
>> I agree, although would the dev releases still need to pass all the
>> tests? I'm thinking of people installing via CPAN.
>
> For example, that's another point.
>
>  	-hilmar

Yes, I agree.

As an aside, I don't think dev. releases pop up when you run a simple  
'install Foo::Bar' from the CPAN shell but I'm not sure; Sendu may  
know the answer to that.

chris 

From cjfields at uiuc.edu  Thu Jul  5 10:34:22 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Jul 2007 09:34:22 -0500
Subject: [Bioperl-l] Test related Suggestions
In-Reply-To: <200707051512.38185.heikki@sanbi.ac.za>
References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk>
	<468CBC3E.1020408@sendu.me.uk>
	<200707051512.38185.heikki@sanbi.ac.za>
Message-ID: <2C7BA3C8-0693-458F-A49D-879ABAB8012E@uiuc.edu>


On Jul 5, 2007, at 8:12 AM, Heikki Lehvaslaiho wrote:

>
> One more suggestion:
>
> It would be extremely useful if we had a standard way of testing
> that when a
> file is read into a bioperl object and then written out again into
> the same
> format, the input and output files are identical. If not, the test
> should
> show where the differences start (showing all the differences
> would just
> clutter the screen).
>
> This standard method/subroutine should be used to test all sequence  
> and other
> text file IO.
>
> Any takers?
>
> 	-Heikki
...

I agree.  There are some 'round-trip' tests with genbank.t or SeqIO.t  
that do some checking, I think, but something like this would be of  
use.  However, what if the test file is old (as many in t/data are)  
and the format has changed?  GenBank and EMBL, for instance, have  
gone through several changes to format.
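
A minimal sketch of the "show where the differences start" part,
assuming the input file and the re-written output are both available
as strings. first_difference() is a hypothetical helper, not an
existing BioPerl function:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): return the 1-based number
# of the first line at which two texts differ, or 0 if they are
# identical.
sub first_difference {
    my ($expected, $got) = @_;

    # Limit -1 keeps trailing empty fields, so a missing final newline
    # is also reported as a difference.
    my @exp = split /\n/, $expected, -1;
    my @got = split /\n/, $got,      -1;
    my $max = @exp > @got ? scalar @exp : scalar @got;

    for my $i (0 .. $max - 1) {
        return $i + 1
            if !defined $exp[$i]
            || !defined $got[$i]
            || $exp[$i] ne $got[$i];
    }
    return 0;
}

# Stand-in data: the second line changed during the round trip.
my $line = first_difference("LOCUS x\nacgt\n//\n", "LOCUS x\nACGT\n//\n");
print $line ? "first difference at line $line\n" : "identical\n";
```

In a real test the two strings would be the original file and whatever
the writer produced for the same format. Reporting only the first
differing line keeps the output short, as suggested.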

chris


From n.haigh at sheffield.ac.uk  Thu Jul  5 10:43:51 2007
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Thu, 05 Jul 2007 15:43:51 +0100
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To: <B5011E60-D686-4E71-A1AB-46990548ABD1@uiuc.edu>
References: <468B6FBF.1070708@sendu.me.uk>	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>
	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>
	<468CA721.4020804@sheffield.ac.uk>
	<21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net>
	<B5011E60-D686-4E71-A1AB-46990548ABD1@uiuc.edu>
Message-ID: <468D03A7.3090408@sheffield.ac.uk>

Chris Fields wrote:
-- snip --

>>>
>>> I agree, although would the dev releases still need to pass all the
>>> tests? I'm thinking of people installing via CPAN.
>>
>> For example, that's another point.
>>
>>      -hilmar
> 
> Yes, I agree.
> 
> As an aside, I don't think dev. releases pop up when you run a simple 
> 'install Foo::Bar' from the CPAN shell but I'm not sure; Sendu may know 
> the answer to that.
> 
> chris


That's right, it'll only install the non-developer releases (1.4 
currently). If you want to install the developer release from CPAN you 
need to know the path to the archive and then do:

cpan> install S/SE/SENDU/bioperl-1.5.2_102.tar.gz

as detailed on the wiki:
http://www.bioperl.org/wiki/Release_1.5.2
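
The reason a plain install skips it is the underscore in the version:
CPAN treats such a distribution as a developer release and leaves it
out of the normal index. A rough sketch of that check follows;
is_dev_release() is a hypothetical helper, and the 1.4 file name is
only illustrative:

```perl
use strict;
use warnings;

# Hypothetical helper: true if a distribution file name carries a
# developer-release version, i.e. an underscore in the version part.
sub is_dev_release {
    my ($dist) = @_;

    # Take the version as the text after the last hyphen, minus the
    # archive suffix: "bioperl-1.5.2_102.tar.gz" -> "1.5.2_102".
    my ($version) = $dist =~ /-([^-\/]+?)(?:\.tar\.gz|\.tgz|\.zip)?$/
        or return 0;

    return $version =~ /_/ ? 1 : 0;
}

for my $dist ('S/SE/SENDU/bioperl-1.5.2_102.tar.gz', 'bioperl-1.4.tar.gz') {
    printf "%-40s %s\n", $dist, is_dev_release($dist) ? 'developer' : 'regular';
}
```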

Nath

From cjfields at uiuc.edu  Thu Jul  5 10:49:33 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Jul 2007 09:49:33 -0500
Subject: [Bioperl-l] Installing all modules required for testing
In-Reply-To: <468CFEB0.80201@sheffield.ac.uk>
References: <468B6FBF.1070708@sendu.me.uk>	<468CB1FD.7060301@sendu.me.uk>	<468CBC3E.1020408@sendu.me.uk>	<200707051512.38185.heikki@sanbi.ac.za>
	<468CF5A8.7040402@sendu.me.uk> <468CFB5D.6080406@sheffield.ac.uk>
	<468CFD06.3080604@sendu.me.uk> <468CFEB0.80201@sheffield.ac.uk>
Message-ID: <16397C9B-07DA-416E-A2AF-5FA403BA0388@uiuc.edu>


On Jul 5, 2007, at 9:22 AM, Nathan S. Haigh wrote:

> Sendu Bala wrote:
>> ...
>> Yes, add them as recommended (or perhaps 'build_requires') modules in
>> Build.PL, then run Build.PL and install the modules when it asks you.
>>
>> Everything should be in Build.PL already. If I missed something,  
>> please
>> add it.
>>
>
> OK, to clarify using the test file Sendu mentioned in a previous post:
> t/SeqIO.t
>
> This test skips tests if Algorithm::Diff, IO::ScalarArray or  
> IO::String
> are not installed (the first two are not mentioned in Build.PL).
> However, if there are a lot of such skips in the whole test suite then
> there may be few systems with all these modules installed in order to
> conduct a complete test. These are the modules I'm referring to.
>
> Nath

If they are only necessary for tests, work for all OSs, and are pure  
Perl they should be added to t/lib, like Test::More and the rest.  If  
they only work for some OSs they could be added to t/lib and skip  
based on OS, but they still must be pure Perl.  I would avoid  
anything that requires any compiling for XS or Inline altogether (I  
don't want to go down the nightmare road of OS-dependent compiler  
issues for a few tests).

Finally, if they are needed for core modules (not just tests) then  
they should be added to the core prereqs in Build.

chris

From cjfields at uiuc.edu  Thu Jul  5 10:52:58 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Jul 2007 09:52:58 -0500
Subject: [Bioperl-l] Warnings
In-Reply-To: <468CEC72.4090909@sendu.me.uk>
References: <468CCC30.90406@sendu.me.uk>
	<200707051458.59921.heikki@sanbi.ac.za>
	<468CEC72.4090909@sendu.me.uk>
Message-ID: <2E0C7F35-9AA1-479A-A430-7D4037D98A3A@uiuc.edu>


On Jul 5, 2007, at 8:04 AM, Sendu Bala wrote:

> ...
>
> So it's my understanding there will be absolutely no difference in
> behaviour following this change (except that warnings can be caught by
> Test::Warn). I just wanted to confirm my understanding.

You can always just try it out and run tests.  Might be interesting  
to see if anything breaks.

chris

From N.Haigh at sheffield.ac.uk  Thu Jul  5 10:58:30 2007
From: N.Haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Thu,  5 Jul 2007 15:58:30 +0100
Subject: [Bioperl-l] Test related Suggestions
In-Reply-To: <2C7BA3C8-0693-458F-A49D-879ABAB8012E@uiuc.edu>
References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk>
	<468CBC3E.1020408@sendu.me.uk>
	<200707051512.38185.heikki@sanbi.ac.za>
	<2C7BA3C8-0693-458F-A49D-879ABAB8012E@uiuc.edu>
Message-ID: <1183647510.468d07168963c@webmail.shef.ac.uk>

Quoting Chris Fields <cjfields at uiuc.edu>:

> 
> On Jul 5, 2007, at 8:12 AM, Heikki Lehvaslaiho wrote:
> 
> >
> > One more suggestion:
> >
> > It would be extremely useful if we had a standard way of testing
> > that when a
> > file is read into a bioperl object and then written out again into
> > the same
> > format, the input and output files are identical. If not, the test
> > should
> > show where the differences start (showing all the differences
> > would just
> > clutter the screen).
> >
> > This standard method/subroutine should be used to test all sequence  
> > and other
> > text file IO.
> >
> > Any takers?
> >
> > 	-Heikki
> ...
> 
> I agree.  There are some 'round-trip' tests with genbank.t or SeqIO.t  
> that do some checking, I think, but something like this would be of  
> use.  However, what if the test file is old (as many in t/data are)  
> and the format has changed?  GenBank and EMBL, for instance, have  
> gone through several changes to format.
> 
> chris
> 
> 

Is there any way to tell variants apart other than just layout? e.g. a version number or the like?

Nath

From N.Haigh at sheffield.ac.uk  Thu Jul  5 11:04:30 2007
From: N.Haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Thu,  5 Jul 2007 16:04:30 +0100
Subject: [Bioperl-l] Installing all modules required for testing
In-Reply-To: <16397C9B-07DA-416E-A2AF-5FA403BA0388@uiuc.edu>
References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk>
	<468CBC3E.1020408@sendu.me.uk>
	<200707051512.38185.heikki@sanbi.ac.za>
	<468CF5A8.7040402@sendu.me.uk> <468CFB5D.6080406@sheffield.ac.uk>
	<468CFD06.3080604@sendu.me.uk> <468CFEB0.80201@sheffield.ac.uk>
	<16397C9B-07DA-416E-A2AF-5FA403BA0388@uiuc.edu>
Message-ID: <1183647870.468d087ed4c80@webmail.shef.ac.uk>

Quoting Chris Fields <cjfields at uiuc.edu>:

> 
> On Jul 5, 2007, at 9:22 AM, Nathan S. Haigh wrote:
> 
> > Sendu Bala wrote:
> >> ...
> >> Yes, add them as recommended (or perhaps 'build_requires') modules in
> >> Build.PL, then run Build.PL and install the modules when it asks you.
> >>
> >> Everything should be in Build.PL already. If I missed something,  
> >> please
> >> add it.
> >>
> >
> > OK, to clarify using the test file Sendu mentioned in a previous post:
> > t/SeqIO.t
> >
> > This test skips tests if Algorithm::Diff, IO::ScalarArray or  
> > IO::String
> > are not installed (the first two are not mentioned in Build.PL).
> > However, if there are a lot of such skips in the whole test suite then
> > there may be few systems with all these modules installed in order to
> > conduct a complete test. These are the modules I'm referring to.
> >
> > Nath
> 
> If they are only necessary for tests, work for all OSs, and are pure  
> Perl they should be added to t/lib, like Test::More and the rest.  If  
> they only work for some OSs they could be added to t/lib and skip  
> based on OS, but they still must be pure Perl.  I would avoid  
> anything that requires any compiling for XS or Inline altogether (I  
> don't want to go down the nightmare road of OS-dependent compiler  
> issues for a few tests).

If this is the case, there surely is no need to skip the tests if they should be provided in the t/lib dir. Am I missing something!?

> 
> Finally, if they are needed for core modules (not just tests) then  
> they should be added to the core prereqs in Build.
> 
> chris
> 


From bix at sendu.me.uk  Thu Jul  5 11:13:35 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 05 Jul 2007 16:13:35 +0100
Subject: [Bioperl-l] Installing all modules required for testing
In-Reply-To: <1183647870.468d087ed4c80@webmail.shef.ac.uk>
References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk>
	<468CBC3E.1020408@sendu.me.uk>
	<200707051512.38185.heikki@sanbi.ac.za>
	<468CF5A8.7040402@sendu.me.uk> <468CFB5D.6080406@sheffield.ac.uk>
	<468CFD06.3080604@sendu.me.uk> <468CFEB0.80201@sheffield.ac.uk>
	<16397C9B-07DA-416E-A2AF-5FA403BA0388@uiuc.edu>
	<1183647870.468d087ed4c80@webmail.shef.ac.uk>
Message-ID: <468D0A9F.4010709@sendu.me.uk>

Nathan S. Haigh wrote:
> Quoting Chris Fields <cjfields at uiuc.edu>:
>>> OK, to clarify using the test file Sendu mentioned in a previous
>>> post: t/SeqIO.t
>>> 
>>> This test skips tests if Algorithm::Diff, IO::ScalarArray or 
>>> IO::String are not installed
>> 
>> If they are only necessary for tests, work for all OSs, and are
>> pure Perl they should be added to t/lib, like Test::More and the
>> rest.  If they only work for some OSs they could be added to t/lib
>> and skip based on OS, but they still must be pure Perl.  I would
>> avoid anything that requires any compiling for XS or Inline
>> altogether (I don't want to go down the nightmare road of
>> OS-dependent compiler issues for a few tests).
> 
> If this is the case, there surely is no need to skip the tests if
> they should be provided in the t/lib dir. Am I missing something!?

That skip in SeqIO.t is new and I simply didn't think of them as 
important enough to make anyone install them or include them in t/lib.

I'd go ahead and add those modules, but like I say, it may make more 
sense just to use is_deeply(), removing the dependency on 
Algorithm::Diff and IO::ScalarArray completely.
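
A minimal sketch of the is_deeply() approach, using only Test::More.
lines_of() is a hypothetical helper and the two strings are stand-ins
for the input file and the re-written output:

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical helper: split text into a list of lines so that
# is_deeply() can compare two texts element by element.
sub lines_of {
    my ($text) = @_;
    return [ split /\n/, $text ];
}

# Stand-ins for the original file content and the re-written output.
my $input  = "ID   test\nSQ   acgt\n//\n";
my $output = "ID   test\nSQ   acgt\n//\n";

is_deeply(lines_of($output), lines_of($input),
    'round trip preserves every line');

done_testing();
```

On failure is_deeply() reports the first element where the two
structures differ, so the diagnostics stay short without needing
either extra module.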

From cjfields at uiuc.edu  Thu Jul  5 11:35:41 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Jul 2007 10:35:41 -0500
Subject: [Bioperl-l] Installing all modules required for testing
In-Reply-To: <1183647870.468d087ed4c80@webmail.shef.ac.uk>
References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk>
	<468CBC3E.1020408@sendu.me.uk>
	<200707051512.38185.heikki@sanbi.ac.za>
	<468CF5A8.7040402@sendu.me.uk> <468CFB5D.6080406@sheffield.ac.uk>
	<468CFD06.3080604@sendu.me.uk> <468CFEB0.80201@sheffield.ac.uk>
	<16397C9B-07DA-416E-A2AF-5FA403BA0388@uiuc.edu>
	<1183647870.468d087ed4c80@webmail.shef.ac.uk>
Message-ID: <F97172F8-F59A-4CCD-9BBD-B763675EB92F@uiuc.edu>


On Jul 5, 2007, at 10:04 AM, Nathan S. Haigh wrote:

> ...
>> If they are only necessary for tests, work for all OSs, and are pure
>> Perl they should be added to t/lib, like Test::More and the rest.  If
>> they only work for some OSs they could be added to t/lib and skip
>> based on OS, but they still must be pure Perl.  I would avoid
>> anything that requires any compiling for XS or Inline altogether (I
>> don't want to go down the nightmare road of OS-dependent compiler
>> issues for a few tests).
>
> If this is the case, there surely is no need to skip the tests if  
> they should be provided in the t/lib dir. Am I missing something!?

No, you are correct, but these are currently not in t/lib (unless  
someone snuck them in....)

Of the modules you listed above, only one (IO::String) is required by  
the core modules.  The others are not.  Users shouldn't be forced to  
install Algorithm::Diff or IO::ScalarArray just to run tests, so  
anything not required should go into t/lib if at all possible.

If there are any reasons (OS issues, list of prereqs) which preclude  
adding these to t/lib we need to ask ourselves (1) why we are using  
that module in the first place?  And, if there is a good reason, (2)  
can we skip them if they aren't present?  Both of those options are  
already available.
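
For option (2), a minimal sketch of skipping when an optional module
is missing. have_module() is a hypothetical helper, and the check
inside the SKIP block is only a placeholder for a real test:

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical helper: true if the named module can be loaded.
sub have_module {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;
    return eval { require $file; 1 } ? 1 : 0;
}

SKIP: {
    skip('Algorithm::Diff not installed', 1)
        unless have_module('Algorithm::Diff');

    # Only reached when the optional module is available.
    my @lcs = Algorithm::Diff::LCS([qw(a b c)], [qw(a x c)]);
    is_deeply(\@lcs, [qw(a c)], 'longest common subsequence found');
}

ok(have_module('Test::More'), 'test module itself is loadable');

done_testing();
```

With a copy of the module in t/lib the skip never triggers, which is
the point made above: the skip is only a fallback for modules that
cannot be bundled.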

chris

From cjfields at uiuc.edu  Thu Jul  5 11:50:55 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Jul 2007 10:50:55 -0500
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To: <468D006D.6050806@sheffield.ac.uk>
References: <468B6FBF.1070708@sendu.me.uk>
	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>
	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>
	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>
	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>
	<468CA721.4020804@sheffield.ac.uk> <468CB1FD.7060301@sendu.me.uk>
	<1183629664.468cc1609891a@webmail.shef.ac.uk>
	<468CC220.804@sendu.me.uk>
	<D71E1DE2-C737-41C1-90FF-573A66925790@uiuc.edu>
	<468D006D.6050806@sheffield.ac.uk>
Message-ID: <404EEDE8-53AC-411E-B4F0-CF4B4AABE9E0@uiuc.edu>


On Jul 5, 2007, at 9:30 AM, Nathan S. Haigh wrote:

> ...
>> I actually like Sendu's idea more, or the idea of each test suite  
>> having its own directory.
>> Tests which need to guess/validate the format are probably best  
>> left sequestered to a specific suite focused on format guessing/ 
>> validation, at least in my opinion.
>> chris
>
>
> How easily would this lend itself to using the same data for  
> multiple tests, or is it likely to lead to/exacerbate a culture of  
> adding duplicate data files in each "test suite" rather than reusing?
>
> Nath

If there is a group of test data used for more than one test suite we  
can group those together into a common use folder, or we can go by  
format.  I'm pretty open to anything, really, as long as it is more  
organized.

My point is really concerned more with validation/guessing.  I think  
we should limit those tests to their respective specific test suites,  
or even to sections within a particular test suite (for instance,  
genbank.t), but not to force sequence guessing or validation in other  
cases.  To me validation, guessing, and parsing are three distinct  
issues (much like XML parsers handle things), so they require three  
distinct tests.

As for true sequence validation, there is no official format  
validation scheme yet in BioPerl.  It's sort of unofficially  
integrated into the sequence parsers themselves (something which I  
find to be problematic for several reasons too long to outline here).

chris


From cjfields at uiuc.edu  Thu Jul  5 11:54:42 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Jul 2007 10:54:42 -0500
Subject: [Bioperl-l] Test related Suggestions
In-Reply-To: <1183647510.468d07168963c@webmail.shef.ac.uk>
References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk>
	<468CBC3E.1020408@sendu.me.uk>
	<200707051512.38185.heikki@sanbi.ac.za>
	<2C7BA3C8-0693-458F-A49D-879ABAB8012E@uiuc.edu>
	<1183647510.468d07168963c@webmail.shef.ac.uk>
Message-ID: <48474A2C-2A58-4D51-8E7F-7CE083948D0F@uiuc.edu>


On Jul 5, 2007, at 9:58 AM, Nathan S. Haigh wrote:

> Quoting Chris Fields <cjfields at uiuc.edu>:
>
>>
>> On Jul 5, 2007, at 8:12 AM, Heikki Lehvaslaiho wrote:
>>
>>>
>>> One more suggestion:
>>>
>>> It would be extremely useful if we had a standard way of testing
>>> that when a
>>> file is read into a bioperl object and then written out again into
>>> the same
>>> format, the input and output files are identical. If not, the test
>>> should
>>> show where the differences start (showing all the differences
>>> would just
>>> clutter the screen).
>>>
>>> This standard method/subroutine should be used to test all sequence
>>> and other
>>> text file IO.
>>>
>>> Any takers?
>>>
>>> 	-Heikki
>> ...
>>
>> I agree.  There are some 'round-trip' tests with genbank.t or SeqIO.t
>> that do some checking, I think, but something like this would be of
>> use.  However, what if the test file is old (as many in t/data are)
>> and the format has changed?  GenBank and EMBL, for instance, have
>> gone through several changes to format.
>>
>> chris
>>
>>
>
> Is there any way to tell variants apart other than just
> layout? e.g. a version number or the like?
>
> Nath

I don't think so; this veers back into the whole validation issue  
(i.e. does the record fit certain specifications).  There are  
examples of seq records from different sources which bioperl is  
expected to parse, for example Ensembl GenBank records.  Some of  
those have feature tags or annotation fields which may not appear in  
output when using write_seq().

I don't think it's as important to replicate the output data exactly  
like the input as much as it's important to have the data represented  
in a Bio::Seq object (or any other Bio* instance) in a consistent  
manner and have the ability to incorporate new fields (such as the  
recent addition of genome projects) transparently.  The latter is  
hard to do with the current genbank parser (you have to specifically  
code for it), but it is a bit easier to do with the driver-handler  
model I'm working on.

chris

From bix at sendu.me.uk  Thu Jul  5 11:56:29 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 05 Jul 2007 16:56:29 +0100
Subject: [Bioperl-l] Test related Suggestions
In-Reply-To: <468CBC3E.1020408@sendu.me.uk>
References: <468B6FBF.1070708@sendu.me.uk>	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>	<468CA721.4020804@sheffield.ac.uk>	<468CB1FD.7060301@sendu.me.uk>
	<468CBC3E.1020408@sendu.me.uk>
Message-ID: <468D14AD.8050007@sendu.me.uk>

Sendu Bala wrote:
> Sendu Bala wrote:
>> Nathan S. Haigh wrote:
>>> Thinking about this a little more, I think it would be a good idea to 
>>> include Test::Exception in t/lib.
>> Agree. I'll see if I can have it auto-loaded by BioperlTest.pm.
> 
> I've now done that: BioperlTest loads Test::Exception, from the copy in 
> t/lib if necessary.
> 
> So, in BioperlTest-using scripts you now have access to the methods 
> dies_ok, lives_ok, throws_ok and lives_and.

And I've also now added in support for Test::Warn, giving you 
warning_is, warnings_are, warning_like and warnings_like.

I've updated the HOWTO as well:
http://www.bioperl.org/wiki/HOWTO:Writing_BioPerl_Tests

You can see these things in action in t/seq_quality.t

From bix at sendu.me.uk  Thu Jul  5 11:57:23 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 05 Jul 2007 16:57:23 +0100
Subject: [Bioperl-l] Warnings
In-Reply-To: <2E0C7F35-9AA1-479A-A430-7D4037D98A3A@uiuc.edu>
References: <468CCC30.90406@sendu.me.uk>
	<200707051458.59921.heikki@sanbi.ac.za>
	<468CEC72.4090909@sendu.me.uk>
	<2E0C7F35-9AA1-479A-A430-7D4037D98A3A@uiuc.edu>
Message-ID: <468D14E3.6030104@sendu.me.uk>

Chris Fields wrote:
> 
> On Jul 5, 2007, at 8:04 AM, Sendu Bala wrote:
> 
>> ...
>>
>> So it's my understanding there will be absolutely no difference in
>> behaviour following this change (except that warnings can be caught by
>> Test::Warn). I just wanted to confirm my understanding.
> 
> You can always just try it out and run tests.  Might be interesting to 
> see if anything breaks.

I've made the change. Everything seems ok as far as I can tell.

From dmessina at wustl.edu  Thu Jul  5 12:02:26 2007
From: dmessina at wustl.edu (David Messina)
Date: Thu, 5 Jul 2007 11:02:26 -0500
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To: <B5011E60-D686-4E71-A1AB-46990548ABD1@uiuc.edu>
References: <468B6FBF.1070708@sendu.me.uk>	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>
	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>
	<468CA721.4020804@sheffield.ac.uk>
	<21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net>
	<B5011E60-D686-4E71-A1AB-46990548ABD1@uiuc.edu>
Message-ID: <FBB24EB4-7728-476E-98EA-77A53C870A89@wustl.edu>


On Jul 5, 2007, at 9:33 AM, Chris Fields wrote:
> I agree, but I think there is still an expectation that 1.5.2 and
> beyond are more like true 'stable' releases even though we still
> designate them as 'developer.'   We unfortunately reinforce that when
> we tell users they need to update to v. 1.5.2 or bioperl-live to fix
> a particular bug in the 1.4 release.

I know this has been discussed before, but while we're talking about  
future release plans, it might be worth revisiting the BioPerl policy  
of designating only even-numbered releases as 'stable'. It's taking  
so long to get from 1.4 to 1.6. While the principle of keeping a  
stable API between 'stable' releases is valid in the ideal case, I  
think that continuing to label 1.5.2 (or whatever the latest 'dev'  
release is) as a developer release (which implies potentially  
unstable or bleeding-edge code) is highly misleading since we would  
never ever tell anyone to get 1.4 instead.

Alternatively, if we adopt a more aggressive release schedule as  
Chris proposed a couple days ago, then perhaps we could agree to push  
out an even-numbered release once a year or so, so that there is a  
'stable' release we could recommend.


> If we feel a nightly snapshot is warranted we could do that though.
> I personally don't think there is a need, particularly since we have
> several means to obtain the latest code at any point in time
> (including the browsable CVS 'Download tarball').  We could state the
> next dev/stable CPAN release (pending on date dd/mm/yy) will have the
> bug fix, and if they want it immediately then pick it up from CVS.

To make it easier for people to obtain the latest tarball, we could  
put the 'download tarball' link directly on the 'Getting_BioPerl'  
wiki page instead of only a link to the viewcvs interface. That way  
they wouldn't have to navigate the source tree to figure out which  
tarball they want (which is almost always going to be the
bioperl-live tarball).

I think the actual URL underlying the 'Download tarball' link on  
viewcvs is stable:

	http://code.open-bio.org/cgi/viewcvs.cgi/bioperl-live/bioperl-live.tar.gz?tarball=1


Dave

From cjfields at uiuc.edu  Thu Jul  5 12:13:30 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Jul 2007 11:13:30 -0500
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To: <FBB24EB4-7728-476E-98EA-77A53C870A89@wustl.edu>
References: <468B6FBF.1070708@sendu.me.uk>	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>
	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>
	<468CA721.4020804@sheffield.ac.uk>
	<21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net>
	<B5011E60-D686-4E71-A1AB-46990548ABD1@uiuc.edu>
	<FBB24EB4-7728-476E-98EA-77A53C870A89@wustl.edu>
Message-ID: <BF212044-F565-434B-882F-507974566B66@uiuc.edu>


On Jul 5, 2007, at 11:02 AM, David Messina wrote:

> ...
> I know this has been discussed before, but while we're talking  
> about future release plans, it might be worth revisiting the  
> BioPerl policy of designating only even-numbered releases as  
> 'stable'. It's taking so long to get from 1.4 to 1.6. While the  
> principle of keeping a stable API between 'stable' releases is  
> valid in the ideal case, I think that continuing to label 1.5.2 (or  
> whatever the latest 'dev' release is) as a developer release (which  
> implies potentially unstable or bleeding-edge code) is highly  
> misleading since we would never ever tell anyone to get 1.4 instead.
>
> Alternatively, if we adopt a more aggressive release schedule as  
> Chris proposed a couple days ago, then perhaps we could agree to  
> push out an even-numbered release once a year or so, so that there  
> is a 'stable' release we could recommend.

I think the idea of 'stable' is best summarized back in Hilmar's post  
(i.e. we support a particular API for that release).  The 1.5  
releases I believe break some aspects of 1.4 API (some of the Feature/ 
Annotation stuff introduced before the official 1.5 release).  We  
still need to address some of those issues before a 1.6 which seems  
to be the only real stumbling block, but they are unfortunately not  
well-documented and are somewhat interwoven with GMOD code.

> ...
> To make it easier for people to obtain the latest tarball, we could  
> put the 'download tarball' link directly on the 'Getting_BioPerl'  
> wiki page instead of only a link to the viewcvs interface. That way  
> they wouldn't have to navigate the source tree to figure out which  
> tarball they want (which is almost always going to be the
> bioperl-live tarball).
>
> I think the actual URL underlying the 'Download tarball' link on  
> viewcvs is stable:
>
> 	http://code.open-bio.org/cgi/viewcvs.cgi/bioperl-live/bioperl-live.tar.gz?tarball=1
>
> Dave

Sounds reasonable enough.  Do you want to do the honors?

chris


From dmessina at wustl.edu  Thu Jul  5 12:44:28 2007
From: dmessina at wustl.edu (David Messina)
Date: Thu, 5 Jul 2007 11:44:28 -0500
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To: <BF212044-F565-434B-882F-507974566B66@uiuc.edu>
References: <468B6FBF.1070708@sendu.me.uk>	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>
	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>
	<468CA721.4020804@sheffield.ac.uk>
	<21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net>
	<B5011E60-D686-4E71-A1AB-46990548ABD1@uiuc.edu>
	<FBB24EB4-7728-476E-98EA-77A53C870A89@wustl.edu>
	<BF212044-F565-434B-882F-507974566B66@uiuc.edu>
Message-ID: <8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu>


> [Chris]
> The 1.5 releases I believe break some aspects of 1.4 API

Yes, this is true.

I question, though, whether it's relevant given that virtually no one  
uses 1.4 anymore. In any case, I would venture that the number of  
people who would be bitten by the 1.4->1.5 API change is much smaller  
than the number of people who download 1.4 and then ask us why it  
doesn't work.

I think that, rather than continuing to call 1.5.x the developer  
release in order to adhere to the API guarantee, it would be much  
clearer to users if we state clearly that everyone should download  
1.5.x, and that if you're upgrading from 1.4 to 1.5 there are API  
changes.


>> [me]
>> we could put the 'download tarball' link directly on the  
>> 'Getting_BioPerl' wiki page
>
> [Chris]
> Sounds reasonable enough.  Do you want to do the honors?

Done.


Dave


From cjfields at uiuc.edu  Thu Jul  5 12:57:28 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Jul 2007 11:57:28 -0500
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To: <8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu>
References: <468B6FBF.1070708@sendu.me.uk>	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>
	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>
	<468CA721.4020804@sheffield.ac.uk>
	<21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net>
	<B5011E60-D686-4E71-A1AB-46990548ABD1@uiuc.edu>
	<FBB24EB4-7728-476E-98EA-77A53C870A89@wustl.edu>
	<BF212044-F565-434B-882F-507974566B66@uiuc.edu>
	<8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu>
Message-ID: <AE1AF127-4D2A-458C-8EF0-F3D8D3B239E5@uiuc.edu>

On Jul 5, 2007, at 11:44 AM, David Messina wrote:

>
>> [Chris]
>> The 1.5 releases I believe break some aspects of 1.4 API
>
> Yes, this is true.
>
> I question, though, whether it's relevant given that virtually no  
> one uses 1.4 anymore. In any case, I would venture that the number  
> of people who would be bitten by the 1.4->1.5 API change is much  
> smaller than the number of people who download 1.4 and then ask us  
> why it doesn't work.
>
> I think that, rather than continuing to call 1.5.x the developer  
> release in order to adhere to the API guarantee, it would be much  
> clearer to users if we state clearly that everyone should download  
> 1.5.x, and that if you're upgrading from 1.4 to 1.5 there are API  
> changes.

You'd be surprised how many are still using bioperl 1.2.3 (Ensembl)  
and 1.4 (any admin too scared to go with a 'dev' release).  The real  
answer is to get out a stable 1.6 ASAP.  The problem we currently  
have is (horrible Texas pun) 'too many pokers in the fire.'  We have  
svn migration, major changes in the test suite, talk about splitting  
bioperl, a lot of bugs to sort through, new code to add or work on,  
etc.  Not to mention our $jobs!

I think we should just bite the bullet and proceed with pulling out  
the controversial operator overloading in Bio::Annotation*, deprecate  
the tag methods in AnnotatableI, and go about fixing everything up.   
If that occurs (which seems to be the major impediment) and we get  
GMOD/GBrowse playing well with BioPerl then we can aim for a new  
stable release, and then institute a regular release cycle.

chris

From bpederse at gmail.com  Thu Jul  5 13:58:24 2007
From: bpederse at gmail.com (Brent Pedersen)
Date: Thu, 5 Jul 2007 10:58:24 -0700
Subject: [Bioperl-l] slippy map for genomic features.
Message-ID: <e183a99d0707051058y4d5a3f35v8a0aff35dc90c394@mail.gmail.com>

hi,
here's a side project i've been tinkering on in googlecode svn that
may be useful to some.
http://code.google.com/p/genome-browser/
it's a simple hack on top of OpenLayers (openlayers.org) to provide a
javascript slippy map interface and API to view and browse genomic
features. It can be used with any image generation program that can
accept &xmin= and &xmax= parameters through the url -- though i
haven't had it working with bioperl, as bioperl generates images of
different height depending on the number of tracks.

there's a live example of the code in SVN here:
http://toxic.berkeley.edu/bpederse/genome-browser/
with images generated by a colleague's modules on first request. those
images are then cached by a simple perl script included in the SVN
repo. all subsequent requests are returned from the cache.
an image request (automatically generated by the javascript) looks like:
http://toxic.berkeley.edu/bpederse/genome-browser/tiler.pl?chr=1&version=6&layers=mRNA&organism=arabidopsis&xmin=491520&xmax=499712&width=512
but any implementation need only implement xmin and xmax. all other
parameters will be used for caching but are not required.
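
As a rough illustration of the caching rule just described, here is a minimal Perl sketch. The helper name tile_cache_key, the MD5-based key and the .png suffix are assumptions made for this example; they are not taken from tiler.pl.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper: turn the parameters of a tile request into a cache
# file name. Only xmin and xmax are required; every other parameter is
# optional but still contributes to the key.
sub tile_cache_key {
    my (%param) = @_;
    for my $required (qw(xmin xmax)) {
        die "missing required parameter '$required'\n"
            unless defined $param{$required};
    }
    # Sort the names so the key does not depend on parameter order.
    my $canonical = join '&', map { "$_=$param{$_}" } sort keys %param;
    return md5_hex($canonical) . '.png';
}

my $key = tile_cache_key(
    chr  => 1,      layers => 'mRNA', organism => 'arabidopsis',
    xmin => 491520, xmax   => 499712, width    => 512,
);
print "$key\n";
```

Because the optional parameters are folded into the key, a request for a different layer or organism over the same coordinates gets its own cached tile.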

if anyone is interested in getting this going with bioperl image
generation--or improving the project in any way, let me know and i'll
add you as a committer and provide any javascript support that i can.

-brent

tar ball download:
http://genome-browser.googlecode.com/files/genome-browser-0.02.tar

From dmessina at wustl.edu  Thu Jul  5 14:39:16 2007
From: dmessina at wustl.edu (David Messina)
Date: Thu, 5 Jul 2007 13:39:16 -0500
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To: <AE1AF127-4D2A-458C-8EF0-F3D8D3B239E5@uiuc.edu>
References: <468B6FBF.1070708@sendu.me.uk>	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>
	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>
	<468CA721.4020804@sheffield.ac.uk>
	<21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net>
	<B5011E60-D686-4E71-A1AB-46990548ABD1@uiuc.edu>
	<FBB24EB4-7728-476E-98EA-77A53C870A89@wustl.edu>
	<BF212044-F565-434B-882F-507974566B66@uiuc.edu>
	<8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu>
	<AE1AF127-4D2A-458C-8EF0-F3D8D3B239E5@uiuc.edu>
Message-ID: <DD6F2CE5-FE79-48D2-9410-FACA35AFEF9C@wustl.edu>

> The real answer is to get out a stable 1.6 ASAP.  The problem we  
> currently have is (horrible Texas pun) 'too many pokers in the  
> fire.'  We have svn migration, major changes in the test suite,  
> talk about splitting bioperl, a lot of bugs to sort through, new  
> code to add or work on, etc.  Not to mention our $jobs!

Yep, I hear ya.


> I think we should just bite the bullet and proceed with pulling out  
> the controversial operator overloading in Bio::Annotation*,  
> deprecate the tag methods in AnnotatableI, and go about fixing  
> everything up.  If that occurs (which seems to be the major  
> impediment) and we get GMOD/GBrowse playing well with BioPerl then  
> we can aim for a new stable release, and then institute a regular  
> release cycle.

That's a great plan. You're right -- better to devote energy to 1.6  
than to interim solutions.

Alright, I give, I give! :)
Dave

From glauberwagner at yahoo.com.br  Thu Jul  5 15:56:43 2007
From: glauberwagner at yahoo.com.br (Glauber Wagner)
Date: Thu, 5 Jul 2007 16:56:43 -0300 (ART)
Subject: [Bioperl-l] Bio::DB::Query::GenBank failures
In-Reply-To: <6D5431B47E46BD45AAA453432AD3B803027551D4@rpbmsem01.nala.roche.com>
Message-ID: <839755.95349.qm@web36514.mail.mud.yahoo.com>

Dear All,

I have a problem with the Bio::DB::Query::GenBank module. I
am trying to count the number of protein sequences, but
the module did not return the expected number via the
count method.

use strict;
use warnings;
use Bio::DB::GenBank;
use Bio::DB::Query::GenBank;

my $query_string = "Trypanosoma cruzi[Organism]";

# Search the protein database and ask how many records match
my $query = Bio::DB::Query::GenBank->new(-db    => 'protein',
                                         -query => $query_string);
my $count = $query->count;
my @ids   = $query->ids;

print "$count\n";

Thanks.
Glauber



From cjfields at uiuc.edu  Thu Jul  5 16:21:49 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Jul 2007 15:21:49 -0500
Subject: [Bioperl-l] Bio::DB::Query::GenBank failures
In-Reply-To: <839755.95349.qm@web36514.mail.mud.yahoo.com>
References: <839755.95349.qm@web36514.mail.mud.yahoo.com>
Message-ID: <190D7522-9681-410A-9B04-CFB70E328EC4@uiuc.edu>

NCBI esearch doesn't seem to be working at the moment.  I'm getting  
'Internal Server Error' at this time.  Try back again at a later point.
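
For scripts that have to ride out short outages like this one, a small retry wrapper can help. This is only a sketch: with_retries is a made-up helper, and it assumes the failing call throws an exception (dies) when the server request fails.

```perl
use strict;
use warnings;

# Hypothetical helper: run $code, retrying up to $tries times and sleeping
# $delay seconds between attempts. Dies with the last error if every
# attempt fails.
sub with_retries {
    my ($code, $tries, $delay) = @_;
    my $error;
    for my $attempt (1 .. $tries) {
        my $result;
        return $result if eval { $result = $code->(); 1 };
        $error = $@;
        warn "attempt $attempt failed: $error";
        sleep $delay if $attempt < $tries;
    }
    die "giving up after $tries attempts: $error";
}

# Usage with the query from this thread (needs BioPerl and network access):
#   my $count = with_retries(sub { $query->count }, 3, 60);
```

Keep the delay generous so the retries do not add load to a server that is already struggling.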

chris

On Jul 5, 2007, at 2:56 PM, Glauber Wagner wrote:

> Dear All,
>
> I have a problem with the Bio::DB::Query::GenBank module. I
> am trying to count the number of protein sequences, but
> the module did not return the expected number via the
> count method.
>
> use Bio::DB::GenBank;
> use Bio::DB::Query::GenBank;
>
> $query_string = "Trypanosoma cruzi[Organism]";
>
> my $query = Bio::DB::Query::GenBank->new(-db    => 'protein',
>                                          -query => $query_string);
>    my $count = $query->count;
>    my @ids   = $query->ids;
>
> print "$count\n";
>
> Thanks.
> Glauber
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From mitch_skinner at berkeley.edu  Thu Jul  5 17:22:38 2007
From: mitch_skinner at berkeley.edu (Mitch Skinner)
Date: Thu, 05 Jul 2007 14:22:38 -0700
Subject: [Bioperl-l] slippy map for genomic features.
In-Reply-To: <e183a99d0707051058y4d5a3f35v8a0aff35dc90c394@mail.gmail.com>
References: <e183a99d0707051058y4d5a3f35v8a0aff35dc90c394@mail.gmail.com>
Message-ID: <468D611E.7020904@berkeley.edu>

Hi,

FWIW, we've been working on something similar:
http://genome.biowiki.org/dmel/static/browser/prototype_gbrowse.html
based on GBrowse/Bio::Graphics and javascript that Andrew wrote from 
scratch (with the prototype library).  When our project was starting up 
(fall 05) Andrew looked but didn't find openlayers; I'm not sure if it 
was public back then but their current svn only goes back to 2006.

I think that things like layout (bumping) ought to be done in advance on 
a chromosome-wide basis; otherwise it's difficult to keep features from 
ending up at different heights on neighboring tiles.  And it would be 
difficult for the server to know what was being clicked on.  So we've 
been doing some up-front work to either do layout or to just render all 
the tiles in advance:
http://gmod.cvs.sourceforge.net/gmod/Generic-Genome-Browser/ajax/server/TileGenerator.pm?revision=1.1&view=markup
which is driven by this script:
http://gmod.cvs.sourceforge.net/gmod/Generic-Genome-Browser/ajax/server/generate-tiles.pl?revision=1.14&view=markup

Or you could just not bump at all, I guess.  I think of that as 
important functionality but I'd be interested in hearing about use cases 
where it's not necessary.  It's not just bumping, though; things like 
text labels also make it difficult to predict exactly what pixels a 
feature will span if you only have its genomic coordinates.
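
To make the bumping idea concrete, here is a minimal greedy layout in Perl. It is a sketch under simple assumptions (features are plain hash refs with start and end; labels and glyph widths are ignored) and is not the code in TileGenerator.pm.

```perl
use strict;
use warnings;

# Assign each feature a row so that features on the same row never overlap.
# Features are hash refs with 'start' and 'end' (inclusive coordinates).
# Each feature gets a 0-based 'row' key; returns the number of rows used.
sub bump_features {
    my @features = sort { $a->{start} <=> $b->{start} } @_;
    my @row_end;    # $row_end[$r] = end of the last feature placed in row $r
    for my $f (@features) {
        my $row = 0;
        $row++ while $row < @row_end && $row_end[$row] >= $f->{start};
        $row_end[$row] = $f->{end};
        $f->{row} = $row;
    }
    return scalar @row_end;
}

my @features = (
    { start => 100, end => 200 },
    { start => 150, end => 300 },
    { start => 250, end => 400 },
);
my $rows = bump_features(@features);
print "rows used: $rows\n";
print join(',', map { $_->{row} } @features), "\n";
```

Running a layout like this once over the whole chromosome, and storing the row with each feature, is what keeps a feature at the same height on neighbouring tiles.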

To make features clickable we've been using imagemaps; it simplifies the 
server code but it bogs down the client quite a bit.

I'd certainly be interested in seeing if there are ways we could work 
together; if you're at Berkeley maybe we could meet.

Regards,
Mitch

Brent Pedersen wrote:
> hi,
> here's a side project i've been tinkering on in googlecode svn that
> may be useful to some.
> http://code.google.com/p/genome-browser/
> it's a simple hack on top of OpenLayers (openlayers.org) to provide a
> javascript slippy map interface and API to view and browse genomic
> features. It can be used with any image generation program that can
> accept &xmin= and &xmax= parameters through the url -- though i
> haven't had it working with bioperl, as bioperl generates images of
> different height depending on the number of tracks.
>
> there's a live example of the code in SVN here:
> http://toxic.berkeley.edu/bpederse/genome-browser/
> with images generated by a colleague's modules on first request. those
> images are then cached by a simple perl script included in the SVN
> repo. all subsequent requests are returned from the cache.
> an image request (automatically generated by the javascript) looks like:
> http://toxic.berkeley.edu/bpederse/genome-browser/tiler.pl?chr=1&version=6&layers=mRNA&organism=arabidopsis&xmin=491520&xmax=499712&width=512
> but any implementation need only implement xmin and xmax. all other
> parameters will be used for caching but are not required.
>
> if anyone is interested in getting this going with bioperl image
> generation--or improving the project in any way, let me know and i'll
> add you as a committer and provide any javascript support that i can.
>
> -brent
>
> tar ball download:
> http://genome-browser.googlecode.com/files/genome-browser-0.02.tar


From cjfields at uiuc.edu  Thu Jul  5 17:42:40 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Jul 2007 16:42:40 -0500
Subject: [Bioperl-l] Bio::DB::Query::GenBank failures
In-Reply-To: <190D7522-9681-410A-9B04-CFB70E328EC4@uiuc.edu>
References: <839755.95349.qm@web36514.mail.mud.yahoo.com>
	<190D7522-9681-410A-9B04-CFB70E328EC4@uiuc.edu>
Message-ID: <3219E785-D475-4C21-ABCC-89FABD502E05@uiuc.edu>

Update: seems to be back up.  Give it a try now.

chris

On Jul 5, 2007, at 3:21 PM, Chris Fields wrote:

> NCBI esearch doesn't seem to be working at the moment.  I'm getting
> 'Internal Server Error' at this time.  Try back again at a later  
> point.
>
> chris
>
> On Jul 5, 2007, at 2:56 PM, Glauber Wagner wrote:
>
>> Dear All,
>>
>> I have a problem with the Bio::DB::Query::GenBank module. I
>> am trying to count the number of protein sequences, but
>> the module did not return the expected number via the
>> count method.
>>
>> use Bio::DB::GenBank;
>> use Bio::DB::Query::GenBank;
>>
>> $query_string = "Trypanosoma cruzi[Organism]";
>>
>> my $query = Bio::DB::Query::GenBank->new(-db    => 'protein',
>>                                          -query => $query_string);
>>    my $count = $query->count;
>>    my @ids   = $query->ids;
>>
>> print "$count\n";
>>
>> Thanks.
>> Glauber
>>
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From n.haigh at sheffield.ac.uk  Fri Jul  6 03:09:17 2007
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Fri, 06 Jul 2007 08:09:17 +0100
Subject: [Bioperl-l] API Changes
In-Reply-To: <8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu>
References: <468B6FBF.1070708@sendu.me.uk>	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>	<468CA721.4020804@sheffield.ac.uk>	<21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net>	<B5011E60-D686-4E71-A1AB-46990548ABD1@uiuc.edu>	<FBB24EB4-7728-476E-98EA-77A53C870A89@wustl.edu>	<BF212044-F565-434B-882F-507974566B66@uiuc.edu>
	<8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu>
Message-ID: <468DEA9D.6010809@sheffield.ac.uk>

David Messina wrote:
>> [Chris]
>> The 1.5 releases I believe break some aspects of 1.4 API
>>     
>
> Yes, this is true.
>
> I question, though, whether it's relevant given that virtually no one  
> uses 1.4 anymore. In any case, I would venture that the number of  
> people who would be bitten by the 1.4->1.5 API change is much smaller  
> than the number of people who download 1.4 and then ask us why it  
> doesn't work.
>   

I'm not really up to speed on how the API should remain stable etc. Is 
the idea that the API should be stable from 1.4 through the 1.5 dev series, and 
that the next stable release can change that API? So any stable-to-stable 
upgrade could involve an API change, while a stable-to-dev upgrade should 
have the same API? Does a stable API mean that the same method calls are 
available in a newer release? And what about adding new methods to a newer 
release?

How are these API changes currently tracked? It seems to me that 
Test::More might be able to help in testing the API:

can_ok($module, @methods);
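
A slightly fuller sketch of that idea follows. The package My::Toy::Feature and the method list are stand-ins invented for the example; a real test would load a BioPerl class instead.

```perl
use strict;
use warnings;
use Test::More tests => 1;

{
    # Stand-in for a real module (hypothetical).
    package My::Toy::Feature;
    sub new   { return bless {}, shift }
    sub start { return $_[0]{start} }
    sub end   { return $_[0]{end} }
}

# The method names promised by the stable release (hypothetical list).
my @stable_api = qw(new start end);

# Fails, naming the missing methods, if any promised method has disappeared.
can_ok('My::Toy::Feature', @stable_api);
```

Note that can_ok() only checks that the methods exist; it says nothing about whether they still accept the same arguments or return the same data.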


Nath


From n.haigh at sheffield.ac.uk  Fri Jul  6 07:10:14 2007
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Fri, 06 Jul 2007 12:10:14 +0100
Subject: [Bioperl-l] Bio::Variation::RNAChange
Message-ID: <468E2316.1030804@sheffield.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I'm taking a look at the tests for Bio::Variation::RNAChange.

If you create a new object without arguments:
my $obj = Bio::Variation::RNAChange->new();

What do you expect the following to return:
$obj->label();

I thought it would probably be:
'inframe'

However you get:
'inframe, deletion'

Can anyone in the know explain what behaviour would be expected?

Cheers
Nath
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGjiMVczuW2jkwy2gRAv+0AJ9tA/1WgEbTRCen+FCi/DU/P2RnAwCfbGit
B8DxDViDOcx2gTFjSwQ2kNg=
=SroY
-----END PGP SIGNATURE-----

From n.haigh at sheffield.ac.uk  Fri Jul  6 08:54:33 2007
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Fri, 06 Jul 2007 13:54:33 +0100
Subject: [Bioperl-l] Bio::Variation::RNAChange
In-Reply-To: <468E2316.1030804@sheffield.ac.uk>
References: <468E2316.1030804@sheffield.ac.uk>
Message-ID: <468E3B89.3090202@sheffield.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Nathan S. Haigh wrote:
> I'm taking a look at the tests for Bio::Variation::RNAChange.
> 
> If you create a new object without arguments:
> my $obj = Bio::Variation::RNAChange->new();
> 
> What do you expect the following to return:
> $obj->label();
> 
> I thought it would probably be:
> 'inframe'
> 
> However you get:
> 'inframe, deletion'
> 
> Can anyone in the know explain what behaviour would be expected?
> 
> Cheers
> Nath

Following on from this, AAChange has the following two methods:
add_Allele() and allele_mut()

It appears that allele_mut() is only capable of remembering one allele at a
time, whereas add_Allele() is provided to add support for multiple
alleles - is that correct?

However, add_Allele() also calls allele_mut(), such that multiple calls
to add_Allele() will overwrite the allele being
remembered by allele_mut(). Things are further complicated by the fact
that label() uses allele_mut() to decide on the label to return.
Shouldn't label() know about multiple alleles set by multiple calls to
add_Allele()?
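
To show the behaviour I mean, here is a toy class that mimics it. It is NOT the real Bio::Variation code: Toy::Change and its internals are made up, and each_Allele() is included only so the stored list can be printed.

```perl
use strict;
use warnings;

{
    # Toy class, not the real Bio::Variation code: it only mimics the
    # behaviour described above.
    package Toy::Change;
    sub new { return bless { alleles => [] }, shift }

    # Remembers a single allele; a second call overwrites the first.
    sub allele_mut {
        my ($self, $allele) = @_;
        $self->{allele_mut} = $allele if defined $allele;
        return $self->{allele_mut};
    }

    # Stores every allele, but also forwards each one to allele_mut().
    sub add_Allele {
        my ($self, $allele) = @_;
        push @{ $self->{alleles} }, $allele;
        $self->allele_mut($allele);
        return;
    }

    sub each_Allele { return @{ $_[0]{alleles} } }
}

my $change = Toy::Change->new;
$change->add_Allele($_) for qw(a g);
printf "stored: %s; allele_mut: %s\n",
    join(',', $change->each_Allele), $change->allele_mut;
# prints "stored: a,g; allele_mut: g"
```

Anything that consults only allele_mut(), as label() reportedly does, sees just the last allele added.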

It may be my lack of understanding of alleles and of what these classes are
intended to do, but trying to rewrite the test scripts to improve code
coverage has left me a little confused!

Thanks
Nath
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGjjuJczuW2jkwy2gRAgogAKDXAn8h5iFIBCjtQgxYsrUGofYpOwCguC6I
b8ZOENvDDDIxphAoxeKg8/E=
=f/sa
-----END PGP SIGNATURE-----

From tanzeem.mb at gmail.com  Thu Jul  5 02:39:34 2007
From: tanzeem.mb at gmail.com (tanzeem)
Date: Wed, 4 Jul 2007 23:39:34 -0700 (PDT)
Subject: [Bioperl-l] Problem working with remoteblast submit method in
 webbrowser.
In-Reply-To: <11114623.post@talk.nabble.com>
References: <11114623.post@talk.nabble.com>
Message-ID: <11441586.post@talk.nabble.com>


I found it myself. Run Apache as root and disable SELinux, and the problem will not
recur.

tanzeem wrote:
> 
>  I have a program which uses the BioPerl RemoteBlast module to
> compare an amino acid FASTA file with the Swiss-Prot database. The
> submit_blast() method works successfully when run from the command line, but
> when the program is run from a web browser it returns -1. I was trying to
> adapt the code from the RemoteBlast synopsis for my needs.
> 

-- 
View this message in context: http://www.nabble.com/Problem-working-with-remoteblast-submit-method-in-webbrowser.-tf3919886.html#a11441586
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From cain.cshl at gmail.com  Fri Jul  6 09:00:32 2007
From: cain.cshl at gmail.com (Scott Cain)
Date: Fri, 06 Jul 2007 09:00:32 -0400
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To: <AE1AF127-4D2A-458C-8EF0-F3D8D3B239E5@uiuc.edu>
References: <468B6FBF.1070708@sendu.me.uk>
	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>
	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>
	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>
	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>
	<468CA721.4020804@sheffield.ac.uk>
	<21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net>
	<B5011E60-D686-4E71-A1AB-46990548ABD1@uiuc.edu>
	<FBB24EB4-7728-476E-98EA-77A53C870A89@wustl.edu>
	<BF212044-F565-434B-882F-507974566B66@uiuc.edu>
	<8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu>
	<AE1AF127-4D2A-458C-8EF0-F3D8D3B239E5@uiuc.edu>
Message-ID: <1183726832.2566.34.camel@localhost.localdomain>

On Thu, 2007-07-05 at 11:57 -0500, Chris Fields wrote:
> 
> I think we should just bite the bullet and proceed with pulling out  
> the controversial operator overloading in Bio::Annotation*, deprecate  
> the tag methods in AnnotatableI, and go about fixing everything up.   
> If that occurs (which seems to be the major impediment) and we get  
> GMOD/GBrowse playing well with BioPerl then we can aim for a new  
> stable release, and then institute a regular release cycle.
> 
I think this sounds like a good idea to me too.  I'm planning on having
a GMOD hackathon at the end of the summer; if I had a new API by then,
we could focus on fixing anything that gets broken by the changes.

Scott

-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From cjfields at uiuc.edu  Fri Jul  6 09:10:41 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 6 Jul 2007 08:10:41 -0500
Subject: [Bioperl-l] API Changes
In-Reply-To: <468DEA9D.6010809@sheffield.ac.uk>
References: <468B6FBF.1070708@sendu.me.uk>	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>	<468CA721.4020804@sheffield.ac.uk>	<21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net>	<B5011E60-D686-4E71-A1AB-46990548ABD1@uiuc.edu>	<FBB24EB4-7728-476E-98EA-77A53C870A89@wustl.edu>	<BF212044-F565-434B-882F-507974566B66@uiuc.edu>
	<8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu>
	<468DEA9D.6010809@sheffield.ac.uk>
Message-ID: <E0AE6402-AC6A-4C2A-BF5B-EB02670FEA62@uiuc.edu>


On Jul 6, 2007, at 2:09 AM, Nathan S. Haigh wrote:

> David Messina wrote:
>>> [Chris]
>>> The 1.5 releases I believe break some aspects of 1.4 API
>>>
>>
>> Yes, this is true.
>>
>> I question, though, whether it's relevant given that virtually no one
>> uses 1.4 anymore. In any case, I would venture that the number of
>> people who would be bitten by the 1.4->1.5 API change is much smaller
>> than the number of people who download 1.4 and then ask us why it
>> doesn't work.
>>
>
> I'm not really up-to-speed with how the API should remain stable  
> etc. Is
> the idea that the API should be stable from 1.4 through the 1.5 dev and
> then the next stable release can change that API? So any stable to  
> stable
> upgrade could involve an API change while a stable to dev upgrade  
> should
> have the same API? Does a stable API mean that the same method  
> calls are
> available in a newer release....what about adding new methods to a  
> newer
> release?
>
> How are these API changes currently tracked? It seems to me that
> Test::More might be able to help in testing the API:
>
> can_ok($module, @methods);
>
>
> Nath	

It's basically a 'contract' of sorts between the devs (us) and users  
(us/them) that the API won't change for the extent of that release  
series, thus ensuring any scripts out there generating tons of data  
won't break down if they attempt to call a renamed method.  We try to  
maintain the API state anyway for those reasons, but in a dev release  
series we might decide to change some method names for consistency  
and deprecate older ambiguously-named methods (see below).  For a  
stable release it's critical the API remain intact.

There are a few methods which are considered deprecated or will be  
deprecated.  For instance, we recently talked about changes to method  
names which use case to specify whether you're receiving an object  
(get_Foo) vs. data (get_foo), a list (get_Foos), a flattened vs.  
nested list, or whether to use each_* vs next_* for iterators.   
Consistency is nice!
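
One common way to rename a method without breaking the contract is to keep the old name as a thin wrapper that warns. A minimal sketch, using a made-up class (Toy::Collection) and made-up method names rather than anything in BioPerl:

```perl
use strict;
use warnings;
use Carp ();

{
    package Toy::Collection;    # hypothetical class, for illustration only
    sub new { return bless { foos => [qw(x y z)] }, shift }

    # New, consistently named method.
    sub get_Foos { return @{ $_[0]{foos} } }

    # Old name kept as a wrapper so existing scripts keep working,
    # while callers are told to switch.
    sub each_Foo {
        my $self = shift;
        Carp::carp('each_Foo() is deprecated; use get_Foos() instead');
        return $self->get_Foos(@_);
    }
}

my @foos = Toy::Collection->new->each_Foo;    # warns, still returns the list
print scalar(@foos), " items\n";
```

The wrapper can then be removed at the next release series in which API changes are allowed.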

chris 

From heikki at sanbi.ac.za  Fri Jul  6 09:20:26 2007
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Fri, 6 Jul 2007 15:20:26 +0200
Subject: [Bioperl-l] Bio::Variation::RNAChange
In-Reply-To: <468E3B89.3090202@sheffield.ac.uk>
References: <468E2316.1030804@sheffield.ac.uk>
	<468E3B89.3090202@sheffield.ac.uk>
Message-ID: <200707061520.27000.heikki@sanbi.ac.za>

Hi Nat,

These modules have not been touched for a while and were developed for a 
specific task. A review is definitely in order.

The way RNAChange->label was written, it should return 'inframe' when given no 
alleles, but 'no change' would actually be better.

The multiple alleles were originally thought to be a good idea, but the 
vocabulary for labels was developed for a single allele only. The use of the 
module ended up being limited to a single allele, so the add_Allele() behaviour was  
conveniently ignored but not removed. :(

	-Heikki


On Friday 06 July 2007 14:54:33 Nathan S. Haigh wrote:
> Nathan S. Haigh wrote:
> > I'm taking a look at the tests for Bio::Variation::RNAChange.
> >
> > If you create a new object without arguments:
> > my $obj = Bio::Variation::RNAChange->new();
> >
> > What do you expect the following to return:
> > $obj->label();
> >
> > I thought it would probably be:
> > 'inframe'
> >
> > However you get:
> > 'inframe, deletion'
> >
> > Can anyone in the know explain what behaviour would be expected?
> >
> > Cheers
> > Nath
>
> Following on from this, AAChange has the following two methods:
> add_Allele() and allele_mut()
>
> It appears that allele_mut() is only capable of remembering one allele at a
> time, whereas add_Allele() is provided to add support for multiple
> alleles - is that correct?
>
> However, add_Allele() also calls allele_mut(), such that multiple calls
> to add_Allele() will overwrite the allele being
> remembered by allele_mut(). Things are further complicated by the fact
> that label() uses allele_mut() to decide on the label to return.
> Shouldn't label() know about multiple alleles set by multiple calls to
> add_Allele()?
>
> It may be my lack of understanding of alleles and of what these classes are
> intended to do, but trying to rewrite the test scripts to improve code
> coverage has left me a little confused!
>
> Thanks
> Nath


-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________

From schlesi at ebi.ac.uk  Fri Jul  6 10:24:05 2007
From: schlesi at ebi.ac.uk (Felix Schlesinger)
Date: Fri, 6 Jul 2007 15:24:05 +0100
Subject: [Bioperl-l] Unrooting a tree
Message-ID: <7317d50c0707060724g614ca3a2h8beceada040d3584@mail.gmail.com>

Hi,

I am reading a rooted tree in newick format from a string (i.e. a
bifurcation at the root) and would like to unroot it (i.e. a
trifurcation at the root). I tried getting a grandchild of the root
and adding it as a direct child, but that does not seem to work (the
root still only has two descendents and the tree structure gets messed
up). Is there a nice way to do this directly in bioperl? Doing it on
the newick string is possible of course, but not nice.
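
For reference, the operation I am after can be sketched on a plain nested data structure. This is not the BioPerl tree API: the hash layout (name, length, children) and the function unroot are invented for the example.

```perl
use strict;
use warnings;

# A node is a hash ref: { name => ..., length => ..., children => [ ... ] }.
# Collapse one internal child of a bifurcating root so that the root ends up
# with three children. The collapsed child's branch length is added to its
# sibling, so path lengths between leaves are preserved.
sub unroot {
    my ($root) = @_;
    my @kids = @{ $root->{children} || [] };
    return $root unless @kids == 2;
    # Pick the first child that is itself internal.
    my ($inner) = grep { scalar @{ $_->{children} || [] } } @kids;
    return $root unless $inner;    # two leaves: nothing to collapse
    my ($sibling) = grep { $_ != $inner } @kids;
    $sibling->{length} = ($sibling->{length} || 0) + ($inner->{length} || 0);
    $root->{children} = [ $sibling, @{ $inner->{children} } ];
    return $root;
}

# ((C:1,D:2):3,A:4); becomes (A:7,C:1,D:2);
my $tree = { children => [
    { length => 3, children => [
        { name => 'C', length => 1 }, { name => 'D', length => 2 } ] },
    { name => 'A', length => 4 },
] };
unroot($tree);
print join(',', map { "$_->{name}:$_->{length}" } @{ $tree->{children} }), "\n";
```

The same steps (detach the internal child, re-attach its children to the root, add its branch length to the sibling) are what I would like to do on the tree object itself.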

Thanks
  Felix

From n.haigh at sheffield.ac.uk  Fri Jul  6 11:37:19 2007
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Fri, 06 Jul 2007 16:37:19 +0100
Subject: [Bioperl-l] API Changes
In-Reply-To: <E0AE6402-AC6A-4C2A-BF5B-EB02670FEA62@uiuc.edu>
References: <468B6FBF.1070708@sendu.me.uk>	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>	<468CA721.4020804@sheffield.ac.uk>	<21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net>	<B5011E60-D686-4E71-A1AB-46990548ABD1@uiuc.edu>	<FBB24EB4-7728-476E-98EA-77A53C870A89@wustl.edu>	<BF212044-F565-434B-882F-507974566B66@uiuc.edu>
	<8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu>
	<468DEA9D.6010809@sheffield.ac.uk>
	<E0AE6402-AC6A-4C2A-BF5B-EB02670FEA62@uiuc.edu>
Message-ID: <468E61AF.9040106@sheffield.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Chris Fields wrote:
> 
> On Jul 6, 2007, at 2:09 AM, Nathan S. Haigh wrote:
> 
>> David Messina wrote:
>>>> [Chris]
>>>> The 1.5 releases I believe break some aspects of 1.4 API
>>>>
>>>
>>> Yes, this is true.
>>>
>>> I question, though, whether it's relevant given that virtually no one
>>> uses 1.4 anymore. In any case, I would venture that the number of
>>> people who would be bitten by the 1.4->1.5 API change is much smaller
>>> than the number of people who download 1.4 and then ask us why it
>>> doesn't work.
>>>
>>
>> I'm not really up-to-speed with how the API should remain stable etc. Is
>> the idea that the API should be stable from 1.4 through the 1.5 dev and
>> then the next stable release can change that API? So any stable to stable
>> upgrade could involve an API change while a stable to dev upgrade should
>> have the same API? Does a stable API mean that the same method calls are
>> available in a newer release....what about adding new methods to a newer
>> release?
>>
>> How are these API changes currently tracked? It seems to me that
>> Test::More might be able to help in testing the API:
>>
>> can_ok($module, @methods);
>>
>>
>> Nath   
> 
> It's basically a 'contract' of sorts between the devs (us) and users
> (us/them) that the API won't change for the extent of that release
> series, thus ensuring any scripts out there generating tons of data
> won't break down if they attempt to call a renamed method.  We try to
> maintain the API state anyway for those reasons, but in a dev release
> series we might decide to change some method names for consistency and
> deprecate older ambiguously-named methods (see below).  For a stable
> release it's critical the API remain intact.

Hmm, still not 100% clear - it is Friday!

So, someone running a script that was designed when 1.4 was released
should still be able to run their script for all future releases. So all
changes need to be backward compatible?

So you have several situations regarding method names:
1) Adding new methods should be fine since past scripts don't know about
them and won't have used them
2) Removing methods would break past scripts that used them
3) Renamed methods would break past scripts that used the old name

A stable API to me, means the same method calls should still be able to
accept the same arguments (inc the constructor) and return the same
object/data etc.
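
The three cases above could be checked mechanically by comparing the method lists of two releases. A minimal sketch, with made-up method names and a hypothetical helper api_diff:

```perl
use strict;
use warnings;

# Compare the method names of two releases (plain lists; hypothetical data).
# Removed names break old scripts; added names do not. A rename shows up
# as one removal plus one addition.
sub api_diff {
    my ($old, $new) = @_;
    my %in_old = map { $_ => 1 } @$old;
    my %in_new = map { $_ => 1 } @$new;
    return {
        removed => [ grep { !$in_new{$_} } sort keys %in_old ],
        added   => [ grep { !$in_old{$_} } sort keys %in_new ],
    };
}

my $diff = api_diff([qw(new label each_Allele)], [qw(new label get_Alleles)]);
print "breaking: @{ $diff->{removed} }\n";
print "safe:     @{ $diff->{added} }\n";
```

This only covers method names; changes to arguments or return values would need real behavioural tests.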

What if a module is pretty outdated and would benefit from a rewrite -
should all the old method names be included, what if this makes coding
difficult?

> 
> There are a few methods which are considered deprecated or will be
> deprecated.  For instance, we recently talked about changes to method
> names which use case to specify whether you're receiving an object
> (get_Foo) vs. data (get_foo), a list (get_Foos), a flattened vs. nested
> list, or whether to use each_* vs next_* for iterators.  Consistency is
> nice!
> 

Do you mean the use of case to signify objects vs data being returned is
to be deprecated, or encouraged? What was the outcome of the each_* vs
next_* discussion?

Nath


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGjmGvczuW2jkwy2gRAkGeAKDBXVSBvN0b39xbK1+2RLed35knSQCgz3pk
kAWH1zVa1ycopijl761cvkQ=
=fppH
-----END PGP SIGNATURE-----

From n.haigh at sheffield.ac.uk  Fri Jul  6 11:43:41 2007
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Fri, 06 Jul 2007 16:43:41 +0100
Subject: [Bioperl-l] Bio::Variation::RNAChange
In-Reply-To: <200707061520.27000.heikki@sanbi.ac.za>
References: <468E2316.1030804@sheffield.ac.uk>
	<468E3B89.3090202@sheffield.ac.uk>
	<200707061520.27000.heikki@sanbi.ac.za>
Message-ID: <468E632D.4090801@sheffield.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Heikki Lehvaslaiho wrote:
> Hi Nat,
> 
> These modules have not been touched for a while and were developed for a 
> specific task. A review is definitely in order.
> 
> The way RNAChange->label was written, it should return 'inframe' when given no 
> alleles, but 'no change' would actually be better.

Wouldn't this effectively be changing the API since past scripts "could"
expect "inframe" to be returned.

> 
> The multiple alleles were originally thought to be a good idea, but the 
> vocabulary for labels was developed for a single allele only. The use of the 
> module ended up being limited to a single allele, so the add_Allele() behaviour was  
> conveniently ignored but not removed. :(

So add_Allele() and each_Allele() should be deprecated in favour of
allele_mut()?

From my post about APIs... how should the capitalisation of
add_Allele() and each_Allele() be changed?

Cheers
Nath


> 
> 	-Heikki
> 
> 
> 
> On Friday 06 July 2007 14:54:33 Nathan S. Haigh wrote:
>> Nathan S. Haigh wrote:
>>> I'm taking a look at the tests for Bio::Variation::RNAChange.
>>>
>>> If you create a new object without arguments:
>>> my $obj = Bio::Variation::RNAChange->new();
>>>
>>> What do you expect the following to return:
>>> $obj->label();
>>>
>>> I thought it would probably be:
>>> 'inframe'
>>>
>>> However you get:
>>> 'inframe, deletion'
>>>
>>> Can anyone in the know explain what behaviour would be expected?
>>>
>>> Cheers
>>> Nath
>> Following on from this, AAChange has the following two methods:
>> add_Allele() and allele_mut()
>>
>> It appears that allele_mut() is only capable of remembering 1 allele at a
>> time, whereas add_Allele() is provided to add support for multiple
>> alleles - is that correct?
>>
>> However, add_Allele() also calls allele_mut(), such that multiple calls
>> to add_Allele() will result in the overwriting of the allele being
>> remembered by allele_mut(). Things are further complicated by the fact
>> that label() uses allele_mut() to decide on the label to return.
>> Shouldn't label() know about multiple alleles set by multiple calls to
>> add_Allele()?
>>
>> It may be my lack of understanding of alleles and what these classes are
>> intended to do, but trying to rewrite the test scripts to improve code
>> coverage has left me a little confused!
>>
>> Thanks
>> Nath
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> 
> 


From cjfields at uiuc.edu  Sat Jul  7 16:57:37 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 7 Jul 2007 15:57:37 -0500
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To: <1183726832.2566.34.camel@localhost.localdomain>
References: <468B6FBF.1070708@sendu.me.uk>
	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>
	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>
	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>
	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>
	<468CA721.4020804@sheffield.ac.uk>
	<21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net>
	<B5011E60-D686-4E71-A1AB-46990548ABD1@uiuc.edu>
	<FBB24EB4-7728-476E-98EA-77A53C870A89@wustl.edu>
	<BF212044-F565-434B-882F-507974566B66@uiuc.edu>
	<8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu>
	<AE1AF127-4D2A-458C-8EF0-F3D8D3B239E5@uiuc.edu>
	<1183726832.2566.34.camel@localhost.localdomain>
Message-ID: <198D3F24-8510-453D-9201-21F2CCEC3519@uiuc.edu>

We'll prob. get a start soon, then.  I'll let you know when we start.

chris

On Jul 6, 2007, at 8:00 AM, Scott Cain wrote:

> On Thu, 2007-07-05 at 11:57 -0500, Chris Fields wrote:
>>
>> I think we should just bite the bullet and proceed with pulling out
>> the controversial operator overloading in Bio::Annotation*, deprecate
>> the tag methods in AnnotatableI, and go about fixing everything up.
>> If that occurs (which seems to be the major impediment) and we get
>> GMOD/GBrowse playing well with BioPerl then we can aim for a new
>> stable release, and then institute a regular release cycle.
>>
> I think this sounds like a good idea to me too.  I'm planning on having
> a GMOD hackathon at the end of the summer; if I had a new API by then,
> we could focus on fixing anything that gets broken by the changes.
>
> Scott
>
> -- 
> Scott Cain, Ph. D.                          cain at cshl.edu
> GMOD Coordinator (http://www.gmod.org/)     216-392-3087
> Cold Spring Harbor Laboratory

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Sat Jul  7 17:17:14 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 7 Jul 2007 16:17:14 -0500
Subject: [Bioperl-l] API Changes
In-Reply-To: <468E61AF.9040106@sheffield.ac.uk>
References: <468B6FBF.1070708@sendu.me.uk>	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>	<468CA721.4020804@sheffield.ac.uk>	<21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net>	<B5011E60-D686-4E71-A1AB-46990548ABD1@uiuc.edu>	<FBB24EB4-7728-476E-98EA-77A53C870A89@wustl.edu>	<BF212044-F565-434B-882F-507974566B66@uiuc.edu>
	<8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu>
	<468DEA9D.6010809@sheffield.ac.uk>
	<E0AE6402-AC6A-4C2A-BF5B-EB02670FEA62@uiuc.edu>
	<468E61AF.9040106@sheffield.ac.uk>
Message-ID: <369F72D5-E5A3-4A33-BDEC-D462A339474F@uiuc.edu>


On Jul 6, 2007, at 10:37 AM, Nathan S. Haigh wrote:

> ...
> Hmm, still not 100% clear - it is Friday!
>
> So, someone running a script that was designed when 1.4 was released
> should still be able to run their script for all future releases.
> So all changes need to be backward compatible?

It helps.  For instance, if we change method names (rename each_Foo  
as next_Foo), we should have each_Foo delegate to next_Foo for the  
time being.  If we plan on deprecating the old method altogether we  
would add a warning message when it's called, then delegate.

It's a better solution than just changing the method outright, which  
means the user has to search through docs to find the renamed method.
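
A minimal sketch of the warn-then-delegate pattern described above, in
plain Perl. The class My::Feature::Collection is hypothetical and not part
of BioPerl, and both methods are treated as simple iterators purely for
illustration; only the names each_Foo and next_Foo come from this thread.

```perl
package My::Feature::Collection;   # hypothetical class, not part of BioPerl
use strict;
use warnings;
use Carp qw(carp);

sub new {
    my ($class, @foos) = @_;
    return bless { foos => [@foos], pos => 0 }, $class;
}

# New, preferred name: returns the next item, or nothing when exhausted.
sub next_Foo {
    my ($self) = @_;
    return if $self->{pos} >= @{ $self->{foos} };
    return $self->{foos}[ $self->{pos}++ ];
}

# Old name kept for backward compatibility: warn, then delegate.
sub each_Foo {
    my ($self, @args) = @_;
    carp "each_Foo() is deprecated; use next_Foo() instead";
    return $self->next_Foo(@args);
}

package main;

my $coll   = My::Feature::Collection->new(qw(a b c));
my $first  = $coll->next_Foo;
my $second = do {
    local $SIG{__WARN__} = sub { };   # silence the warning for this demo call
    $coll->each_Foo;
};
print "$first $second\n";             # prints "a b"
```

Old scripts keep working and get a pointer to the new name at the call
site, which is the point of delegating rather than renaming outright.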

> So you have several situations regarding method names:
> 1) Adding new methods should be fine since past scripts don't know about
> them and won't have used them
> 2) Removing methods would break past scripts that used them
> 3) Renamed methods would break past scripts that used the old name
>
> A stable API, to me, means the same method calls should still be able to
> accept the same arguments (inc the constructor) and return the same
> object/data etc.

Yes.

> What if a module is pretty outdated and would benefit from a rewrite -
> should all the old method names be included, what if this makes coding
> difficult?

It depends on the module.  If a complete rewrite is needed then maybe  
starting with a new module/interface is best, and we could deprecate  
the older module completely.  That has been done already with  
Bio::Tools::BPLite (in favor of SearchIO) and a few other modules.

>> There are a few methods which are considered deprecated or will be
>> deprecated.  For instance, we recently talked about changes to method
>> names which use case to specify whether you're receiving an object
>> (get_Foo) vs. data (get_foo), a list (get_Foos), a flattened vs. nested
>> list, or whether to use each_* vs next_* for iterators.  Consistency is
>> nice!
>>
>
> You mean the use of case to signify objects vs data being returned are
> to be deprecated or encouraged? What was the outcome of the each_* vs
> next_*?
>
> Nath

Here's the section I added to the wiki (it started in a thread a few  
weeks or so ago, so it's a summary really):

http://www.bioperl.org/wiki/Advanced_BioPerl#Method_names

Feel free to add to it or make suggestions.

BTW, Hilmar mentioned there was a movement to rename methods in old
code to follow these recommendations, but it was never completed.  It
should be taken up again at some point, but the recommendations are
mainly here for newer code.

chris


From heikki at sanbi.ac.za  Sun Jul  8 03:32:21 2007
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Sun, 8 Jul 2007 09:32:21 +0200
Subject: [Bioperl-l] Bio::Variation::RNAChange
In-Reply-To: <468E632D.4090801@sheffield.ac.uk>
References: <468E2316.1030804@sheffield.ac.uk>
	<200707061520.27000.heikki@sanbi.ac.za>
	<468E632D.4090801@sheffield.ac.uk>
Message-ID: <200707080932.21818.heikki@sanbi.ac.za>

On Friday 06 July 2007 17:43:41 Nathan S. Haigh wrote:
> Heikki Lehvaslaiho wrote:
> > Hi Nat,
> >
> > These modules have not been touched for a while and were developed for a
> > specific task. A review is definitely in order.
> >
> > The way RNAChange->label was written, it should return 'inframe' when
> > given no alleles, but 'no change' would actually be better.
>
> Wouldn't this effectively be changing the API since past scripts "could"
> expect "inframe" to be returned.

Checking the actual usage: when you change a nucleotide to itself, you
get the label 'silent'. I guess that would be a valid label value even
when the alleles are not initialised.

> > The multiple alleles were originally thought to be a good idea, but the
> > vocabulary for labels was developed for a single allele only. The use of
> > the module ended up being limited to a single allele, so add_Allele()
> > behaviour was conveniently ignored but not removed. :(
>
> So add_Allele() and each_Allele() should be deprecated in favour of
> allele_mut()?

Yes.

> From my post about API's.....how should the capitalisation of
> add_Allele() and each_Allele() be changed?

Definitely, keep the current ones as deprecated alternatives.


    -Heikki


-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________

From xing.y.hu at gmail.com  Mon Jul  9 02:26:40 2007
From: xing.y.hu at gmail.com (Xing Hu)
Date: Mon, 09 Jul 2007 14:26:40 +0800
Subject: [Bioperl-l] How to download EST files via bioperl script?
Message-ID: <4691D520.60700@gmail.com>

Hi friends,

    I wrote a script to fetch genomic sequence files from GenBank. For
that I used the Bio::DB::GenBank module to get each sequence via
get_Seq_by_acc, and it works well. But this time, facing an enormous number
of ESTs, I have no idea how to download them swiftly and elegantly.

    PROBLEM DESCRIPTION:
    goal: download all ESTs of a specific species from GenBank, say
Arabidopsis thaliana or Oryza sativa (rice).
    other: whether all the ESTs are in a single file or in separate
files does not matter.

    Can I use a BioPerl script to achieve that? If so, how? I would really
appreciate any help.

Xing.


From akozik at atgc.org  Mon Jul  9 08:25:14 2007
From: akozik at atgc.org (Alexander Kozik)
Date: Mon, 09 Jul 2007 05:25:14 -0700
Subject: [Bioperl-l] How to download EST files via bioperl script?
In-Reply-To: <4691D520.60700@gmail.com>
References: <4691D520.60700@gmail.com>
Message-ID: <4692292A.1080900@atgc.org>

To download genomic sequences or ESTs for any organism (in various 
formats) you can use NCBI Taxonomy Browser:
http://www.ncbi.nlm.nih.gov/Taxonomy/

you can use taxonomy id to access different organisms, Arabidopsis for 
example (3702):
http://www.ncbi.nlm.nih.gov/sites/entrez?db=Nucleotide&cmd=Search&dopt=DocSum&term=txid3702

or by direct web link:
http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Arabidopsis+thaliana&lvl=0&srchmode=1

assembled genomes can be accessed via ftp:
ftp://ftp.ncbi.nih.gov/genomes/

To download a large number of selected sequences (ESTs, for example) you
can use Batch Entrez:
http://www.ncbi.nlm.nih.gov/entrez/query/static/advancedentrez.html
http://www.ncbi.nlm.nih.gov/entrez/batchentrez.cgi?db=Nucleotide
(select EST for EST, it's critical)

It seems, to solve the problem you describe, you don't need to use 
bioperl. NCBI GenBank Entrez provides all necessary tools to work on 
these simple and frequent tasks.

-Alex

-- 
Alexander Kozik
Bioinformatics Specialist
Genome and Biomedical Sciences Facility
451 East Health Sciences Drive
University of California
Davis, CA 95616-8816
Phone: (530) 754-9127
email#1: akozik at atgc.org
email#2: akozik at gmail.com
web: http://www.atgc.org/



From cjfields at uiuc.edu  Mon Jul  9 10:17:23 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 9 Jul 2007 09:17:23 -0500
Subject: [Bioperl-l] How to download EST files via bioperl script?
In-Reply-To: <4692292A.1080900@atgc.org>
References: <4691D520.60700@gmail.com> <4692292A.1080900@atgc.org>
Message-ID: <7CC3390F-A5F5-4832-9E3E-E2B96018D31D@uiuc.edu>

Caveat: if you have millions of ESTs, please consider NOT using my
eutil script below or NCBI Batch Entrez, either of which would hit
the NCBI server thousands of times.  At least try looking for other
ways to retrieve the data you want (ftp, organism-specific resources
like Ensembl, and so on), or run any scripts or data retrieval in off
hours so you don't overtax the NCBI server.

There is a way to use BioPerl if you don't mind living on the
bleeding edge with bioperl-live (core code from CVS).  For the last
year I have been working on a set of modules (Bio::DB::EUtilities)
that interact with the various NCBI eutils through their CGI interface
and can be used to build data pipelines.  You could possibly retrieve
all relevant ESTs using a variation of the example script here:

http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook#esearch-.3Eefetch

Note that the code examples do NOT work with rel. 1.5.2 code as the  
API has changed quite a bit; I'm working to rectify some of that.

The script I would use is below.  It retrieves batches of 500  
sequences (in fasta format) at a time, for a total of 10000 max seq  
records, saving the raw record data directly to a file (appending as  
you go along).  I added an eval block to check the server status and  
redo the call up to 4 times before giving up completely.  Using eval  
this way hasn't been extensively tested but should work.

---------------------------------------

use strict;
use warnings;
use Bio::DB::EUtilities;

# esearch: find all ESTs for taxonomy id 3702 (Arabidopsis) and keep the
# result set on the NCBI side via the history mechanism
my $factory = Bio::DB::EUtilities->new(-eutil => 'esearch',
                                        -db => 'nucest',
                                        -term => 'txid3702',
                                        -usehistory => 'y',
                                        -keep_histories => 1);

my $count = $factory->get_count;

print "Count: $count\n";

if (my $hist = $factory->next_History) {
     print "History returned\n";
     # note db carries over from above
     $factory->set_parameters(-eutil => 'efetch',
                              -rettype => 'fasta',
                              -history => $hist);
     my ($retmax, $retstart) = (500,0);
     my $retry = 1;
     # set max # seq records to return
     my $maxcount = $count < 10000 ? $count : 10000;
     RETRIEVE_SEQS:
     while ($retstart < $maxcount) {
         print "Returning from ",$retstart+1," to ",$retstart+$retmax,"\n";
         $factory->set_parameters(-retmax => $retmax,
                                 -retstart => $retstart);
         # check in case of server error
         eval{
             # append the raw fasta records for this batch to ESTs.fas
             $factory->get_Response(-file => ">>ESTs.fas");
         };
         if ($@) {
             # give up on the 5th failure, otherwise repeat the same batch
             die "Server error: $@.  Try again later" if $retry == 5;
             print STDERR "Server error, redo #$retry\n";
             $retry++ && redo RETRIEVE_SEQS;
         }
         $retstart += $retmax;
     }
}


---------------------------------------
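
The batching and retry logic of the script above can be tried out without
touching NCBI at all. The following is only a sketch under stated
assumptions: fetch_batch() is a hypothetical stand-in for the efetch call
and simulates two server errors, and the counts are made up. Unlike the
script above, it keeps a separate retry counter for each batch and trims
the final batch to the number of records left.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the real efetch call: fails on the first two
# attempts for the batch starting at 500, otherwise returns a description.
my %failures_left = (500 => 2);

sub fetch_batch {
    my ($retstart, $retmax) = @_;
    if ($failures_left{$retstart}) {
        $failures_left{$retstart}--;
        die "simulated server error\n";
    }
    return "records " . ($retstart + 1) . "-" . ($retstart + $retmax);
}

my ($count, $retmax, $max_tries) = (1200, 500, 5);
my @fetched;

for (my $retstart = 0; $retstart < $count; $retstart += $retmax) {
    # never ask for more than what is left in the final batch
    my $left = $count - $retstart;
    my $size = $left < $retmax ? $left : $retmax;
    my $tries = 0;
    while (1) {
        $tries++;
        my $result = eval { fetch_batch($retstart, $size) };
        if (defined $result) {
            push @fetched, $result;
            last;
        }
        die "Giving up on batch at $retstart: $@" if $tries >= $max_tries;
        warn "Server error, retry #$tries for batch at $retstart\n";
        # a real script would probably sleep here before retrying
    }
}

print "$_\n" for @fetched;
```

With the simulated failures this reports three batches (1-500, 501-1000,
1001-1200) after two retries on the second one.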


chris


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From jason at bioperl.org  Mon Jul  9 14:08:07 2007
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 9 Jul 2007 11:08:07 -0700
Subject: [Bioperl-l] Unrooting a tree
In-Reply-To: <7317d50c0707060724g614ca3a2h8beceada040d3584@mail.gmail.com>
References: <7317d50c0707060724g614ca3a2h8beceada040d3584@mail.gmail.com>
Message-ID: <22B52D0B-D3C3-4EF8-98EC-8C2A9F267362@bioperl.org>

I don't think there is a function for this yet but it would be a good  
one to have.
I assume you don't really want to take a shot at writing it though?

To make this work I think you have to create a new node which  
contains the trifurcation and this node is what the root is set to.

-jason

On Jul 6, 2007, at 7:24 AM, Felix Schlesinger wrote:

> Hi,
>
> I am reading a rooted tree in newick format from a string (i.e. a
> bifurcation at the root) and would like to unroot it (i.e. a
> trifurcation at the root). I tried getting a grandchild of the root
> and adding it as a direct child, but that does not seem to work (the
> root still only has two descendents and the tree structure gets messed
> up). Is there a nice way to do this directly in bioperl? Doing it on
> the newick string is possible of course, but not nice.
>
> Thanks
>   Felix

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From lstein at cshl.edu  Mon Jul  9 17:35:49 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Mon, 9 Jul 2007 17:35:49 -0400
Subject: [Bioperl-l] JOB NOTICE: Looking for CSHL bioinformatics core manager
Message-ID: <6dce9a0b0707091435h3d134b05oa6f7da24839c24bb@mail.gmail.com>

Hi Folks,

Sorry for the job spam. We're looking for a manager of the Cold Spring
Harbor Laboratory bioinformatics core facility. This is a semi-independent
staff position supporting  CSHL scientific researchers by providing
consultation, data mining and software development activities. You will have
a software staff of two, a  nice salary, good health benefits, and an
exciting and dynamic environment to work in. I'm looking for someone with a
strong bioinformatics background, at least five years experience programming
Perl, Java or Python in an academic or commercial environment, and management
experience. If you are interested, please send your CV and cover letter to
me.

Thanks,

Lincoln

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu

From stewarta at nmrc.navy.mil  Mon Jul  9 18:16:12 2007
From: stewarta at nmrc.navy.mil (Andrew Stewart)
Date: Mon, 9 Jul 2007 18:16:12 -0400
Subject: [Bioperl-l] rpsblast
Message-ID: <9DF71DFB-F54E-4392-89E3-33345EC2DB36@nmrc.navy.mil>

When I run...   $result = $factory->rpsblast($seq);   ... where $seq
is a Bio::Seq object, it seems to simply copy the $seq object to
$result.  When I run something similar...
$result = $factory->rpsblast('/path/to/myFile');    ... the value of
$result then becomes '/path/to/myFile'.

Anyone else encounter this?


--
Andrew Stewart
Research Assistant, Genomics Team
Navy Medical Research Center (NMRC)
Biological Defense Research Directorate (BDRD)
BDRD Annex
12300 Washington Avenue, 2nd Floor
Rockville, MD 20852

email: stewarta at nmrc.navy.mil
phone: 301-231-6700 Ext 270


From jason_stajich at berkeley.edu  Mon Jul  9 21:36:10 2007
From: jason_stajich at berkeley.edu (Jason Stajich)
Date: Mon, 9 Jul 2007 18:36:10 -0700
Subject: [Bioperl-l] BOSC2007
Message-ID: <E6F5077E-50A3-489E-94B0-109FCAE6200F@berkeley.edu>

I posted a quick note about meeting up at BOSC/ISMB this year. If you
are attending, please sign your name on the page, or at least say
whether you are interested in a BoF.  We'll try to discuss some of the
current topics in BioPerl development, and to use the face-to-face
time to coordinate any development that benefits from it.

http://bioperl.org/wiki/BOSC2007_Meetup
http://bioperl.org/news/2007/07/09/are-you-going-to-ismbbosc-2007/

-jason
--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html
http://fungalgenomes.org/


From schlesi at ebi.ac.uk  Tue Jul 10 08:58:00 2007
From: schlesi at ebi.ac.uk (Felix Schlesinger)
Date: Tue, 10 Jul 2007 13:58:00 +0100
Subject: [Bioperl-l] Unrooting a tree
In-Reply-To: <22B52D0B-D3C3-4EF8-98EC-8C2A9F267362@bioperl.org>
References: <7317d50c0707060724g614ca3a2h8beceada040d3584@mail.gmail.com>
	<22B52D0B-D3C3-4EF8-98EC-8C2A9F267362@bioperl.org>
Message-ID: <7317d50c0707100558m76853bf8s37ee1e8852835306@mail.gmail.com>

Hi,

>  I don't think there is a function for this yet but it would be a good one
> to have.
> I assume you don't really want to take a shot at writing it though?
> To make this work I think you have to create a new node which contains the
> trifurcation and this node is what the root is set to.

Creating a new root is fine, but what would the (3) children of that
node be? I took a different approach now, where I iterate over all
(indirect) descendents of the root, find the first one which does not
have the root as its direct ancestor and move it up the tree, i.e.

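foreach my $d ($root->get_all_Descendents) {
  if ($d->ancestor != $root) {
    $d->ancestor->remove_Descendent($d);
    # stop once the root has three children
    # (add_Descendent returns the current number of descendents)
    last if $root->add_Descendent($d, 1) == 3;
  }
}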

This will make the old root a trifurcation. It does the right thing
for what I am trying to do, but I believe it is not general (for
example, it does not currently deal with branch lengths). Also, instead
of taking the first such node, moving the most distant possible subtree
of a clade up to the root might be better.
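
For comparison, here is a rough sketch of unrooting that does carry the
branch lengths along. It deliberately does not use the Bio::Tree API: the
tree is a plain nested hash, and unroot(), to_newick() and leaf() are
names made up for this example. It also takes a different route from the
loop above: instead of moving a grandchild up, it collapses one internal
child of the root and adds that child's branch length to its sibling, so
leaf-to-leaf distances stay the same.

```perl
use strict;
use warnings;

# Collapse one internal child of a bifurcating root so that the root ends
# up with three children. The removed branch length is added to the
# sibling, which keeps all leaf-to-leaf path lengths unchanged.
sub unroot {
    my ($root) = @_;
    my $kids = $root->{children};
    return $root unless @$kids == 2;     # not a bifurcating root

    # pick the first root child that is itself an internal node
    my ($idx) = grep { @{ $kids->[$_]{children} || [] } } 0 .. 1;
    return $root unless defined $idx;    # two-leaf tree: nothing to do

    my $collapsed = $kids->[$idx];
    my $sibling   = $kids->[ 1 - $idx ];
    $sibling->{length} += $collapsed->{length};
    $root->{children} = [ @{ $collapsed->{children} }, $sibling ];
    return $root;
}

# Minimal Newick writer for checking the result.
sub to_newick {
    my ($node) = @_;
    my $kids = $node->{children} || [];
    my $str = @$kids
        ? '(' . join(',', map { to_newick($_) } @$kids) . ')'
        : '';
    $str .= $node->{name} if defined $node->{name};
    $str .= ':' . $node->{length} if defined $node->{length};
    return $str;
}

sub leaf { return { name => $_[0], length => $_[1] } }

# Rooted input: ((A:1,B:2):0.5,(C:1,D:1):0.25);
my $tree = {
    children => [
        { length => 0.5,  children => [ leaf('A', 1), leaf('B', 2) ] },
        { length => 0.25, children => [ leaf('C', 1), leaf('D', 1) ] },
    ],
};

print to_newick(unroot($tree)), ";\n";   # prints (A:1,B:2,(C:1,D:1):0.75);
```

Which root child gets collapsed is arbitrary here (the first internal
one); choosing by subtree depth, as suggested above, would be a small
change to the grep.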

Felix



From xing.y.hu at gmail.com  Tue Jul 10 09:29:36 2007
From: xing.y.hu at gmail.com (Xing Hu)
Date: Tue, 10 Jul 2007 21:29:36 +0800
Subject: [Bioperl-l] How to download EST files via bioperl script?
In-Reply-To: <7CC3390F-A5F5-4832-9E3E-E2B96018D31D@uiuc.edu>
References: <4691D520.60700@gmail.com> <4692292A.1080900@atgc.org>
	<7CC3390F-A5F5-4832-9E3E-E2B96018D31D@uiuc.edu>
Message-ID: <469389C0.5060303@gmail.com>

Thank you, guys.

I have to confess how stupid I was. The easiest way seems to be the
NCBI Taxonomy Browser, as Alex suggested. As a matter of fact, I knew
about it, but I thought all items had to be selected before pressing
save to launch the download, so I was desperately looking for a button
that would do that without hundreds of thousands of clicks. "What about
selecting none of the items at all?" This idea finally came to me after
days of struggling, and the problem was solved.

Xing




From davila at ioc.fiocruz.br  Tue Jul 10 09:58:29 2007
From: davila at ioc.fiocruz.br (Alberto Davila)
Date: Tue, 10 Jul 2007 10:58:29 -0300
Subject: [Bioperl-l] How to download EST files via bioperl script?
In-Reply-To: <469389C0.5060303@gmail.com>
References: <4691D520.60700@gmail.com>
	<4692292A.1080900@atgc.org>	<7CC3390F-A5F5-4832-9E3E-E2B96018D31D@uiuc.edu>
	<469389C0.5060303@gmail.com>
Message-ID: <46939085.40906@ioc.fiocruz.br>

Hi Xing,

Unfortunately that did not work for me... there are 5133 T. brucei ESTs 
(http://www.ncbi.nlm.nih.gov/sites/entrez?term=txid5691[Organism:exp]&cmd=Search&db=nucest&QueryKey=8) 
and 13971 from T. cruzi 
(http://www.ncbi.nlm.nih.gov/sites/entrez?term=txid5693[Organism:exp]&cmd=Search&db=nucest&QueryKey=11) 
  that I cannot download at once in GenBank format... even when I select 
"GenBank" format in the Display menu I can only see and get/download 500 
ESTs each time...
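
For what it's worth, the 500-record limit maps naturally onto batched
retrieval. A minimal sketch in plain Perl (no BioPerl calls; the sub name
batch_windows is made up for illustration) of how a result set such as the
5133 T. brucei ESTs splits into start/size windows of at most 500 records:

```perl
use strict;
use warnings;

# Hypothetical helper: split $total records into windows of at most
# $batch records, returned as [zero-based start, size] pairs.
sub batch_windows {
    my ($total, $batch) = @_;
    die "batch size must be positive\n" unless $batch > 0;
    my @windows;
    for (my $start = 0; $start < $total; $start += $batch) {
        my $left = $total - $start;
        my $size = $left < $batch ? $left : $batch;
        push @windows, [$start, $size];
    }
    return @windows;
}

my @windows = batch_windows(5133, 500);
printf "%d requests needed\n", scalar @windows;
printf "records %d to %d\n", $_->[0] + 1, $_->[0] + $_->[1] for @windows;
```

Each pair could then serve as the start offset and batch size of one
request; the last window is simply shorter than the others.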

I also downloaded all ESTs from GenBank (a pity there are no subsets of 
them!), but merging them all generates a file bigger than 120 GB to be 
processed...

Just asked Diogo (my student) to give the script sent by Chris Fields a 
try... so fingers crossed ;-)

Cheers, Alberto


Xing Hu wrote:
> Thank you, guys.
> 
> I have to confess how stupid I was. The easiest way seems to be the 
> NCBI Taxonomy Browser, as suggested by Alex. As a matter of fact, I 
> knew about it, but I thought all items had to be selected before 
> pressing save to launch the download. So I was desperate to find a 
> button that could do that without hundreds of thousands of clicks by 
> me. "What about selecting none of the items at all?" -- this idea 
> finally came to me after days of struggling, and the problem was solved.
> 
> Xing
> 
> 
> 
> Chris Fields wrote:
>> Caveat: if you have millions of ESTs please consider NOT using my 
>> eutil script below or NCBI Batch Entrez, which would repeatedly hit 
>> the NCBI server thousands of times.  At least try looking for other 
>> ways to retrieve the data you want (ftp, organism-specific resources 
>> like Ensembl, so on), or run any scripts or data retrieval in off 
>> hours so you don't overtax the NCBI server.
>>
>> There is a way you can use BioPerl if you don't mind living on the 
>> bleeding edge by using bioperl-live (core code from CVS).  I have been 
>> working on a set of modules for the last year (Bio::DB::EUtilities) 
>> which interact with all the various eutils for building data pipelines 
>> which uses the NCBI CGI interface.  You could possibly retrieve all 
>> relevant ESTs using a variation of the example script here:
>>
>> http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook#esearch-.3Eefetch
>>
>> Note that the code examples do NOT work with rel. 1.5.2 code as the 
>> API has changed quite a bit; I'm working to rectify some of that.
>>
>> The script I would use is below.  It retrieves batches of 500 
>> sequences (in fasta format) at a time, for a total of 10000 max seq 
>> records, saving the raw record data directly to a file (appending as 
>> you go along).  I added an eval block to check the server status and 
>> redo the call up to 4 times before giving up completely.  Using eval 
>> this way hasn't been extensively tested but should work.
>>
>> ---------------------------------------
>>
>> use Bio::DB::EUtilities;
>>
>> my $factory = Bio::DB::EUtilities->new(-eutil => 'esearch',
>>                                        -db => 'nucest',
>>                                        -term => 'txid3702',
>>                                        -usehistory => 'y',
>>                                        -keep_histories => 1);
>>
>> my $count = $factory->get_count;
>>
>> print "Count: $count\n";
>>
>> if (my $hist = $factory->next_History) {
>>     print "History returned\n";
>>     # note db carries over from above
>>     $factory->set_parameters(-eutil => 'efetch',
>>                              -rettype => 'fasta',
>>                              -history => $hist);
>>     my ($retmax, $retstart) = (500,0);
>>     my $retry = 1;
>>     my $maxcount = $count < 10000 ? $count : 10000; # max # of seq records to return
>>     RETRIEVE_SEQS:
>>     while ($retstart < $maxcount) {
>>         print "Returning from ",$retstart+1," to ",$retstart+$retmax,"\n";
>>         $factory->set_parameters(-retmax => $retmax,
>>                                 -retstart => $retstart);
>>         # check in case of server error
>>         eval{
>>             $factory->get_Response(-file => ">>ESTs.fas");
>>         };
>>         if ($@) {
>>             die "Server error: $@.  Try again later" if $retry == 5;
>>             print STDERR "Server error, redo #$retry\n";
>>             $retry++ && redo RETRIEVE_SEQS;
>>         }
>>         $retstart += $retmax;
>>     }
>> }
>>
>>
>> ---------------------------------------
>>
>>
>> chris
>>
>> On Jul 9, 2007, at 7:25 AM, Alexander Kozik wrote:
>>
>>> To download genomic sequences or ESTs for any organism (in various
>>> formats) you can use NCBI Taxonomy Browser:
>>> http://www.ncbi.nlm.nih.gov/Taxonomy/
>>>
>>> you can use taxonomy id to access different organisms, Arabidopsis for
>>> example (3702):
>>> http://www.ncbi.nlm.nih.gov/sites/entrez?db=Nucleotide&cmd=Search&dopt=DocSum&term=txid3702 
>>>
>>>
>>> or by direct web link:
>>> http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Arabidopsis+thaliana&lvl=0&srchmode=1 
>>>
>>>
>>> assembled genomes can be accessed via ftp:
>>> ftp://ftp.ncbi.nih.gov/genomes/
>>>
>>> To download large amount of selected sequences (ESTs for example) you
>>> can use batch Entrez:
>>> http://www.ncbi.nlm.nih.gov/entrez/query/static/advancedentrez.html
>>> http://www.ncbi.nlm.nih.gov/entrez/batchentrez.cgi?db=Nucleotide
>>> (select EST for EST, it's critical)
>>>
>>> It seems, to solve the problem you describe, you don't need to use
>>> bioperl. NCBI GenBank Entrez provides all necessary tools to work on
>>> these simple and frequent tasks.
>>>
>>> -Alex
>>>
>>> --Alexander Kozik
>>> Bioinformatics Specialist
>>> Genome and Biomedical Sciences Facility
>>> 451 East Health Sciences Drive
>>> University of California
>>> Davis, CA 95616-8816
>>> Phone: (530) 754-9127
>>> email#1: akozik at atgc.org
>>> email#2: akozik at gmail.com
>>> web: http://www.atgc.org/
>>>
>>>
>>>
>>> Xing Hu wrote:
>>>> Hi friends,
>>>>
>>>>     I wrote a script for getting genomic sequence file from GenBank. To
>>>> fulfill that target, I used DB::GenBank module to get the sequence via
>>>> get_Seq_by_acc, and it works well. But this time, facing enormous 
>>>> amount
>>>> of ESTs, I have no idea how to download them swiftly and elegantly.
>>>>
>>>>     PROBLEM DESCRIPTION:
>>>>     goal: download all EST files of a specific species from GenBank, 
>>>> say
>>>> Arabidopsis Thaliana or Oryza sativa(rice).
>>>>     other: whether all of ESTs are in a single file or separatedly
>>>> placed does not matter.
>>>>
>>>>     Can I use a bioperl script to achieve that? And How? I really
>>>> appreciate.
>>>>
>>>> Xing.
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr. Robert Switzer
>> Dept of Biochemistry
>> University of Illinois Urbana-Champaign
>>
>>

From cjfields at uiuc.edu  Tue Jul 10 10:05:43 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 10 Jul 2007 09:05:43 -0500
Subject: [Bioperl-l] How to download EST files via bioperl script?
In-Reply-To: <46939085.40906@ioc.fiocruz.br>
References: <4691D520.60700@gmail.com>
	<4692292A.1080900@atgc.org>	<7CC3390F-A5F5-4832-9E3E-E2B96018D31D@uiuc.edu>
	<469389C0.5060303@gmail.com> <46939085.40906@ioc.fiocruz.br>
Message-ID: <2D7B714D-EE7C-467A-A0D9-0F9EB12CBA99@uiuc.edu>

Just make sure you're using the latest from CVS.  Let me know if it  
doesn't work and I'll look into it.

chris

On Jul 10, 2007, at 8:58 AM, Alberto Davila wrote:

> Hi Xing,
>
> Unfortunately that did not work for me... there are 5133 T. brucei  
> ESTs
> (http://www.ncbi.nlm.nih.gov/sites/entrez?term=txid5691 
> [Organism:exp]&cmd=Search&db=nucest&QueryKey=8)
> and 13971 from T. cruzi
> (http://www.ncbi.nlm.nih.gov/sites/entrez?term=txid5693 
> [Organism:exp]&cmd=Search&db=nucest&QueryKey=11)
>   that I cannot download at once in GenBank format... even when I  
> select
> "GenBank" format in the Display menu I can only see and get/ 
> download 500
> ESTs each time...
>
> I also downloaded all ESTs from GenBank (a pity there are not  
> subsets of
> them !) but merging all them generate a file bigger than 120GB to be
> processed...
>
> Just asked Diogo (my student) to give a try to the script sent by  
> Chris
> Fields.. so finger crossed ;-)
>
> Cheers, Alberto
>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From diogoat at gmail.com  Tue Jul 10 10:15:20 2007
From: diogoat at gmail.com (Diogo Tschoeke)
Date: Tue, 10 Jul 2007 11:15:20 -0300
Subject: [Bioperl-l] How to download EST files via bioperl script?
In-Reply-To: <2D7B714D-EE7C-467A-A0D9-0F9EB12CBA99@uiuc.edu>
References: <4691D520.60700@gmail.com> <4692292A.1080900@atgc.org>
	<7CC3390F-A5F5-4832-9E3E-E2B96018D31D@uiuc.edu>
	<469389C0.5060303@gmail.com> <46939085.40906@ioc.fiocruz.br>
	<2D7B714D-EE7C-467A-A0D9-0F9EB12CBA99@uiuc.edu>
Message-ID: <638512560707100715y4a692566n24e438322c8d919d@mail.gmail.com>

Dear all,
I used the script below, and it works very well!
I only changed the query, and the script gave me the 5133 ESTs from T.
brucei.

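#################################################################################
use strict;
use warnings;
use Bio::Seq;
use Bio::SeqIO;
use Bio::DB::GenBank;
use Bio::DB::Query::GenBank;   # loaded explicitly; provides the query object

# Entrez query: EST-division records for Trypanosoma brucei
my $query = Bio::DB::Query::GenBank->new(
    -query => 'gbdiv est[prop] AND Trypanosoma brucei [organism]',
    -db    => 'nucleotide',    # conventional leading dash added
);
my $gb    = Bio::DB::GenBank->new;
my $seqio = $gb->get_Stream_by_query($query);

# Note: records are written in GenBank format despite the .fasta file
# name, and '>>' appends to the file if it already exists
my $out = Bio::SeqIO->new(-format => 'Genbank',
                          -file   => '>>Tbrucei.EST.fasta');
while (my $seq = $seqio->next_seq) {
    $out->write_seq($seq);
}
####################################################################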

Diogo Tschoeke/Fiocruz (Alberto`s Student)

From cjfields at uiuc.edu  Tue Jul 10 10:35:03 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 10 Jul 2007 09:35:03 -0500
Subject: [Bioperl-l] How to download EST files via bioperl script?
In-Reply-To: <638512560707100715y4a692566n24e438322c8d919d@mail.gmail.com>
References: <4691D520.60700@gmail.com> <4692292A.1080900@atgc.org>
	<7CC3390F-A5F5-4832-9E3E-E2B96018D31D@uiuc.edu>
	<469389C0.5060303@gmail.com> <46939085.40906@ioc.fiocruz.br>
	<2D7B714D-EE7C-467A-A0D9-0F9EB12CBA99@uiuc.edu>
	<638512560707100715y4a692566n24e438322c8d919d@mail.gmail.com>
Message-ID: <4D704A90-A88A-44A3-B514-E5031CBF288C@uiuc.edu>

That will work as well; the key difference between my example and  
this one is that the seq stream retrieved using Bio::DB::GenBank  
passes through Bio::SeqIO while Bio::DB::EUtilities saves the raw seq  
record directly to a file (or callback or HTTP::Response) for  
optional parsing later.

If you have problems with Bio::SeqIO you can always use  
Bio::DB::EUtilities to get around the issue until we resolve it.

chris

On Jul 10, 2007, at 9:15 AM, Diogo Tschoeke wrote:

> Deal All,
> I use this script bellow, and it`s work very fine!
> I only changed the query! And the script gave me the 5133 EST from T.
> brucei.
>
> ###################################################################### 
> ###########
> use strict;
> use warnings;
> use Bio::Seq;
> use Bio::SeqIO;
> use Bio::DB::GenBank;
>
> my $query = Bio::DB::Query::GenBank->new
>                                 (-query   =>'gbdiv est[prop] AND  
> Trypanosoma
> brucei [organism]',
>                                 db      => 'nucleotide');
> my $gb = new Bio::DB::GenBank;
> my $seqio = $gb->get_Stream_by_query($query);
>
> my $out = Bio::SeqIO->new(-format => 'Genbank',
>                           -file => '>>Tbrucei.EST.fasta');
> while (my $seq = $seqio->next_seq){
>          $out->write_seq($seq);
>                         }
> ####################################################################
>
> Diogo Tschoeke/Fiocruz (Alberto`s Student)
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From hartzell at alerce.com  Tue Jul 10 12:50:31 2007
From: hartzell at alerce.com (George Hartzell)
Date: Tue, 10 Jul 2007 12:50:31 -0400
Subject: [Bioperl-l] Take 2 of the new subversion repository.
In-Reply-To: <D8C71EF7-6E2E-498E-8638-373512ADE3EE@bioperl.org>
References: <18054.63942.316904.413911@almost.alerce.com>
	<D8C71EF7-6E2E-498E-8638-373512ADE3EE@bioperl.org>
Message-ID: <18067.47319.254632.538811@almost.alerce.com>

Jason Stajich writes:
 > [...]
 > Do you know how to have svn commit messages generate summary emails  
 > as well?

I've made a local installation of the SVN::Notify bits in my home
directory and set up its notification script.  If folks are happy with
it then I'll work on getting The Powers That Be to do a real install
and we'll use it for the real repository.

It's currently configured to include diffs inline in the message.  I
prefer them as an attachment, but the current configuration of the
bioperl-guts-l list stalls messages w/ attachments and requires admin
intervention.  I have a support@ request going on it and will change
it if/when we get the issue resolved.

So, to review:

   svn+ssh://dev.open-bio.org/home/hartzell/bioperl_take2/

is the top of the repository and

   svn co svn+ssh://dev.open-bio.org/home/hartzell/bioperl_take2/bioperl-live/trunk 

will get you the main branch of bioperl-live.

Remember that the repository is transient; don't put anything
important in there....

Have at it, but remember that the entire world will see your commit
messages.

g.

From xing.y.hu at gmail.com  Tue Jul 10 13:08:35 2007
From: xing.y.hu at gmail.com (Xing Hu)
Date: Wed, 11 Jul 2007 01:08:35 +0800
Subject: [Bioperl-l] How to download EST files via bioperl script?
In-Reply-To: <46939085.40906@ioc.fiocruz.br>
References: <4691D520.60700@gmail.com>	<4692292A.1080900@atgc.org>	<7CC3390F-A5F5-4832-9E3E-E2B96018D31D@uiuc.edu>	<469389C0.5060303@gmail.com>
	<46939085.40906@ioc.fiocruz.br>
Message-ID: <4693BD13.2070509@gmail.com>

Hi Alberto,

Yes, I know the NCBI website offers no option to show more than 500 
entries per page. However, I simply ignored that limit (which doesn't 
mean I had not seen it), pulled down the "send to" menu and chose 
"file". A small window popped up, and after I said yes to it, the 
download started. You might ask how I know that it was not a batch of 
only 5 (the default selection) or 500 ESTs. To be honest, I did not 
know at first. But the download has grown to millions of bytes since 
then (due to my bad network connection, I have no idea when it will 
finish), and that does not look like a small batch of fewer than one 
thousand ESTs. Actually, I wrote a script to count the sequences in the 
temporary file and got a number much bigger than ten thousand. So I 
guess it works.
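
The counting script is nothing special. A minimal sketch, assuming the
download is in FASTA format (for a GenBank flat file one would count lines
starting with "LOCUS" instead); the sub name and the file name in the usage
comment are placeholders:

```perl
use strict;
use warnings;

# Count FASTA records read from a filehandle: every record starts
# with a header line beginning with '>'.
sub count_fasta_records {
    my ($fh) = @_;
    my $count = 0;
    while (my $line = <$fh>) {
        $count++ if $line =~ /^>/;
    }
    return $count;
}

# Usage: perl count_fasta.pl partial_download.fasta
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!\n";
    print count_fasta_records($fh), " records so far\n";
    close $fh;
}
```

It only reads the file, so it should be safe to run while the download is
still in progress.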

BTW, I never thought Bio::DB::GenBank could do that! Again, thank you, guys!

Xing


Alberto Davila wrote:
> Hi Xing,
>
> Unfortunately that did not work for me... there are 5133 T. brucei ESTs 
> (http://www.ncbi.nlm.nih.gov/sites/entrez?term=txid5691[Organism:exp]&cmd=Search&db=nucest&QueryKey=8) 
> and 13971 from T. cruzi 
> (http://www.ncbi.nlm.nih.gov/sites/entrez?term=txid5693[Organism:exp]&cmd=Search&db=nucest&QueryKey=11) 
>   that I cannot download at once in GenBank format... even when I select 
> "GenBank" format in the Display menu I can only see and get/download 500 
> ESTs each time...
>
> I also downloaded all ESTs from GenBank (a pity there are not subsets of 
> them !) but merging all them generate a file bigger than 120GB to be 
> processed...
>
> Just asked Diogo (my student) to give a try to the script sent by Chris 
> Fields.. so finger crossed ;-)
>
> Cheers, Alberto
>
>


From bix at sendu.me.uk  Tue Jul 10 13:14:29 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 10 Jul 2007 18:14:29 +0100
Subject: [Bioperl-l] Take 2 of the new subversion repository.
In-Reply-To: <18067.47319.254632.538811@almost.alerce.com>
References: <18054.63942.316904.413911@almost.alerce.com>	<D8C71EF7-6E2E-498E-8638-373512ADE3EE@bioperl.org>
	<18067.47319.254632.538811@almost.alerce.com>
Message-ID: <4693BE75.4090005@sendu.me.uk>

George Hartzell wrote:
> Jason Stajich writes:
>  > [...]
>  > Do you know how to have svn commit messages generate summary emails  
>  > as well?
> 
> I've made a local installation of the SVN::Notify bits in my home
> directory and set up its notification script.  If folks are happy with
> it then I'll work on getting The Powers That Be to do a real install
> and we'll use it for the real repository.
> 
> It's currently configured to include diffs inline in the message.  I
> prefer them as an attachment, but the current configuration of the
> bioperl-guts-l list stalls messages w/ attachments and requires admin
> intervention.  I have a support@ request going on it and will change
> it if/when we get the issue resolved.

Can I put a vote in that you don't? I search through email body text in 
my archive of guts to find certain diffs, so really like the diffs inline.

Also, is there any way to get rid of the 'bioperl' in [bioperl revision] 
in the subject? Seems redundant and makes it harder to see what was 
changed in a small email client window.

From aaron.j.mackey at gsk.com  Tue Jul 10 13:20:15 2007
From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com)
Date: Tue, 10 Jul 2007 13:20:15 -0400
Subject: [Bioperl-l] Take 2 of the new subversion repository.
In-Reply-To: <18067.47319.254632.538811@almost.alerce.com>
Message-ID: <OF37443F52.13AE1143-ON85257314.005D5FF0-85257314.005F432E@gsk.com>

George, this is all very nice to finally have, thank you for your efforts!

Any chance that the diff-as-attachment vs. diffs-inline question can be 
different for each subscriber?  The utility of the "guts" mailing list (to 
me) is that it's an encyclopedia of browsable, skimmable, and searchable 
diffs, not just a date-stamped record of diffs (if so, why provide an 
attachment at all, just provide a URL to the diff in the repository).

Thanks again,

-Aaron


bioperl-l-bounces at lists.open-bio.org wrote on 07/10/2007 12:50:31 PM:

> Jason Stajich writes:
>  > [...]
>  > Do you know how to have svn commit messages generate summary emails 
>  > as well?
> 
> I've made a local installation of the SVN::Notify bits in my home
> directory and set up its notification script.  If folks are happy with
> it then I'll work on getting The Powers That Be to do a real install
> and we'll use it for the real repository.
> 
> It's currently configured to include diffs inline in the message.  I
> prefer them as an attachment, but the current configuration of the
> bioperl-guts-l list stalls messages w/ attachments and requires admin
> intervention.  I have a support@ request going on it and will change
> it if/when we get the issue resolved.
> 
> So, to review:
> 
>    svn+ssh://dev.open-bio.org/home/hartzell/bioperl_take2/
> 
> is the top of the repository and
> 
>    svn co svn+ssh://dev.open-bio.org/home/hartzell/bioperl_take2/bioperl-live/trunk
> 
> will get you the main branch of bioperl-live.
> 
> Remember that the repository is transient, don't put anything
> important in there....
> 
> Have at it, but remember that the entire world will see your commit
> messages.
> 
> g.
> 


From cjfields at uiuc.edu  Tue Jul 10 14:18:07 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 10 Jul 2007 13:18:07 -0500
Subject: [Bioperl-l] Take 2 of the new subversion repository.
In-Reply-To: <4693BE75.4090005@sendu.me.uk>
References: <18054.63942.316904.413911@almost.alerce.com>	<D8C71EF7-6E2E-498E-8638-373512ADE3EE@bioperl.org>
	<18067.47319.254632.538811@almost.alerce.com>
	<4693BE75.4090005@sendu.me.uk>
Message-ID: <C022268F-4632-4027-ADBE-F842B3308E21@uiuc.edu>


On Jul 10, 2007, at 12:14 PM, Sendu Bala wrote:

> George Hartzell wrote:
>> Jason Stajich writes:
>>> [...]
>>> Do you know how to have svn commit messages generate summary emails
>>> as well?
>>
>> I've made a local installation of the SVN::Notify bits in my home
>> directory and set up its notification script.  If folks are happy  
>> with
>> it then I'll work on getting The Powers That Be to do a real install
>> and we'll use it for the real repository.
>>
>> It's currently configured to include diffs inline in the message.  I
>> prefer them as an attachment, but the current configuration of the
>> bioperl-guts-l list stalls messages w/ attachments and requires admin
>> intervention.  I have a support@ request going on it and will change
>> it if/when we get the issue resolved.
>
> Can I put a vote in that you don't? I search through email body  
> text in
> my archive of guts to find certain diffs, so really like the diffs  
> inline.
>
> Also, is there any way to get rid of the 'bioperl' in [bioperl  
> revision]
> in the subject? Seems redundant and makes it harder to see what was
> changed in a small email client window.

Agree on both counts; the devs have gotten used to seeing the diffs  
inline.

We prob. need to schedule a specific day/time when the switchover  
would take place so we can announce (so everyone knows and no one can  
gripe).  Did we ever resolve the svn->cvs issue?  Jason pointed out  
some tools a while ago...

chris

From hartzell at alerce.com  Tue Jul 10 16:09:09 2007
From: hartzell at alerce.com (George Hartzell)
Date: Tue, 10 Jul 2007 16:09:09 -0400
Subject: [Bioperl-l] Take 2 of the new subversion repository.
In-Reply-To: <4693BE75.4090005@sendu.me.uk>
References: <18054.63942.316904.413911@almost.alerce.com>
	<D8C71EF7-6E2E-498E-8638-373512ADE3EE@bioperl.org>
	<18067.47319.254632.538811@almost.alerce.com>
	<4693BE75.4090005@sendu.me.uk>
Message-ID: <18067.59237.519166.454578@almost.alerce.com>

Sendu Bala writes:
 > George Hartzell wrote:
 > > Jason Stajich writes:
 > >  > [...]
 > >  > Do you know how to have svn commit messages generate summary emails  
 > >  > as well?
 > > 
 > > I've made a local installation of the SVN::Notify bits in my home
 > > directory and set up its notification script.  If folks are happy with
 > > it then I'll work on getting The Powers That Be to do a real install
 > > and we'll use it for the real repository.
 > > 
 > > It's currently configured to include diffs inline in the message.  I
 > > prefer them as an attachment, but the current configuration of the
 > > bioperl-guts-l list stalls messages w/ attachments and requires admin
 > > intervention.  I have a support@ request going on it and will change
 > > it if/when we get the issue resolved.
 > 
 > Can I put a vote in that you don't? I search through email body text in 
 > my archive of guts to find certain diffs, so really like the diffs inline.

Ok, three votes against attachments.  Anyone want to vote in support?
Otherwise I'll just leave 'em inline.

 > Also, is there any way to get rid of the 'bioperl' in [bioperl revision] 
 > in the subject? Seems redundant and makes it harder to see what was 
 > changed in a small email client window.

Sure.  The default's just [RevisionNumber].  Does that work for folk?

g.

From hartzell at alerce.com  Tue Jul 10 16:11:36 2007
From: hartzell at alerce.com (George Hartzell)
Date: Tue, 10 Jul 2007 16:11:36 -0400
Subject: [Bioperl-l] Take 2 of the new subversion repository.
In-Reply-To: <C022268F-4632-4027-ADBE-F842B3308E21@uiuc.edu>
References: <18054.63942.316904.413911@almost.alerce.com>
	<D8C71EF7-6E2E-498E-8638-373512ADE3EE@bioperl.org>
	<18067.47319.254632.538811@almost.alerce.com>
	<4693BE75.4090005@sendu.me.uk>
	<C022268F-4632-4027-ADBE-F842B3308E21@uiuc.edu>
Message-ID: <18067.59384.247108.463648@almost.alerce.com>

Chris Fields writes:
 > [...]
 > We prob. need to schedule a specific day/time when the switchover  
 > would take place so we can announce (so everyone knows and no one can  
 > gripe).  Did we ever resolve the svn->cvs issue?  Jason pointed out  
 > some tools a while ago...

I haven't done anything about it.

I think that we also need to have some input from the admin/support
folk about access methods (https, etc...).

Are we going to want to mirror the repository anywhere?

g.

From hartzell at alerce.com  Wed Jul 11 09:17:08 2007
From: hartzell at alerce.com (George Hartzell)
Date: Wed, 11 Jul 2007 09:17:08 -0400
Subject: [Bioperl-l] extra hook functionality for svn repos?
Message-ID: <18068.55380.626778.486775@almost.alerce.com>


There are a bunch of "contributed" hook scripts at

  http://subversion.tigris.org/tools_contrib.html#hook_scripts

Given that many bioperl users depend on case-preserving but
case-insensitive file systems, I'm wondering if hooking up the
case-insensitive.py script might be worthwhile.

Likewise, the check-mime-type.pl script might help us keep
svn:mime-type and svn:eol-style properties up to date.

There are others there, but none that I found interesting.

How big-brother do we want the repository to be?

g.
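
For illustration, a minimal sketch of the kind of check a case-collision
hook performs. This is not the contributed case-insensitive.py script; the
case_collisions helper and the sample paths are hypothetical and only show
the idea of folding names to lower case and looking for clashes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: group paths that differ only by case, which would
# collide on a case-preserving but case-insensitive file system.
sub case_collisions {
    my @paths = @_;
    my %by_folded;
    push @{ $by_folded{ lc $_ } }, $_ for @paths;

    # Keep only the groups that hold more than one entry.
    return grep { @$_ > 1 } map { $by_folded{$_} } sort keys %by_folded;
}

# Illustrative paths only; not taken from the real repository.
my @paths = qw(Bio/SeqIO/fasta.pm Bio/SeqIO/Fasta.pm Bio/Seq.pm);
for my $group ( case_collisions(@paths) ) {
    print "case collision: @$group\n";
}
```

A real pre-commit hook would have to get the path list from the pending
transaction; that part is left out here.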

From cjfields at uiuc.edu  Wed Jul 11 09:40:54 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 11 Jul 2007 08:40:54 -0500
Subject: [Bioperl-l] extra hook functionality for svn repos?
In-Reply-To: <18068.55380.626778.486775@almost.alerce.com>
References: <18068.55380.626778.486775@almost.alerce.com>
Message-ID: <A13F608F-16FA-4432-AA2F-83674E3A73F4@uiuc.edu>


On Jul 11, 2007, at 8:17 AM, George Hartzell wrote:

>
> There are a bunch of "contributed" hook scripts at
>
>   http://subversion.tigris.org/tools_contrib.html#hook_scripts
>
> Given that many bioperl users depend on case-preserving but
> case-insensitive file systems, I'm wondering if hooking up the
> case-insensitive.py script might be worthwhile.

I'm not sure how often we run into this, though.  Anyone know?

> Likewise, the check-mime-type.pl script might help us keep
> svn:mime-type and svn:eol-style properties up to date.

The latter two might be nice.  I thought we planned on defaulting to  
a simple 'plain text' mime type on commits if it isn't specifically  
predefined, but maybe this way is better?

> There are others there, but none that I found interesting.
>
> How big-brother do we want the repository to be?
>
> g.

'Friendly' big-brother, not 'dystopian' big-brother.

chris

From marian.thieme at lycos.de  Wed Jul 11 05:05:18 2007
From: marian.thieme at lycos.de (Marian Thieme)
Date: Wed, 11 Jul 2007 09:05:18 +0000
Subject: [Bioperl-l] submitting code
Message-ID: <188661178019848@lycos-europe.com>

An HTML attachment was scrubbed...
URL: http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070711/eec1aa42/attachment.html 

From dmessina at wustl.edu  Wed Jul 11 16:14:17 2007
From: dmessina at wustl.edu (David Messina)
Date: Wed, 11 Jul 2007 15:14:17 -0500
Subject: [Bioperl-l] submitting code
In-Reply-To: <188661178019848@lycos-europe.com>
References: <188661178019848@lycos-europe.com>
Message-ID: <4DF90B9A-7FFA-4867-B5D3-E6F05EC84BBC@wustl.edu>

Hi Marian,

Thanks so much for contributing! The best way would be to create a  
Bugzilla ticket and then attach the code to that ticket. One of the  
developers will check it in and give you feedback if there are any  
little tweaks that would be helpful*.

Would you be able to include documentation and test cases with your  
module?

Dave


* For more info:
http://www.bioperl.org/wiki/FAQ#I.27ve_got_an_idea_for_a_module_how_do_I_contribute_it.3F
http://www.bioperl.org/wiki/Developer_Information
http://www.bioperl.org/wiki/Becoming_a_developer
http://bioperl.org/pipermail/bioperl-l/2003-February/011226.html


--
Dave Messina
Senior Analyst, Assembly Group
Genome Sequencing Center
Washington University
St. Louis, MO


From marian.thieme at lycos.de  Wed Jul 11 11:12:20 2007
From: marian.thieme at lycos.de (Marian Thieme)
Date: Wed, 11 Jul 2007 15:12:20 +0000
Subject: [Bioperl-l] submitting code
Message-ID: <188661178030343@lycos-europe.com>

An HTML attachment was scrubbed...
URL: http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070711/c95991b8/attachment.html 

From e-just at northwestern.edu  Thu Jul 12 10:37:03 2007
From: e-just at northwestern.edu (Eric Just)
Date: Thu, 12 Jul 2007 09:37:03 -0500
Subject: [Bioperl-l] Job opening in Chicago
Message-ID: <fa1fe35c0707120737i71c6c26fq7635e350da9bf23f@mail.gmail.com>

Hello everyone,

We have an opening at dictyBase (Northwestern University in Chicago)
for a Bioinformatics Software Engineer.  This job involves writing and
maintaining software for a genome database using Chado/OO-Perl/Bioperl
and many other state of the art technologies.

For more information please see:
http://dictybase.org/dictybase_jobs.htm

Thanks,
Eric

From cjfields at uiuc.edu  Thu Jul 12 12:09:02 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 12 Jul 2007 11:09:02 -0500
Subject: [Bioperl-l] DB::SeqFeature::Store::GFF3Loader question
Message-ID: <A8310D54-F800-43BE-B6C3-3879206CE697@uiuc.edu>

I have been running into some GFF formatting issues where the  
attributes column is left undef (no '.'), which causes  
GFF3Loader::parse_attributes() to complain with a 'use of undefined  
string with split' warning.  Would it be okay with the powers that be  
(Scott, Lincoln) to add a warning or exception there?  I'm guessing a  
warning is better in this case, as just returning works fine.

chris
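
A minimal sketch of the sort of guard being proposed, for illustration
only. parse_attributes_sketch is a hypothetical stand-in, not the actual
GFF3Loader::parse_attributes() code; it assumes the usual GFF3 convention
of ';'-separated tag=value pairs, with '.' marking an empty column, and it
skips URL-unescaping entirely.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an attribute parser: returns a hash ref of
# tag => [values]. An undefined attributes column produces a warning and
# an empty hash instead of reaching split() with an undefined string.
sub parse_attributes_sketch {
    my ($att) = @_;
    unless ( defined $att ) {
        warn "attributes column is undefined; treating it as empty\n";
        return {};
    }
    return {} if $att eq '.' || $att eq '';

    my %attributes;
    for my $pair ( split /;/, $att ) {
        my ( $tag, $value ) = split /=/, $pair, 2;
        next unless defined $tag && length $tag && defined $value;
        push @{ $attributes{$tag} }, split /,/, $value;
    }
    return \%attributes;
}

my $parsed = parse_attributes_sketch('ID=gene1;Alias=a,b');
print join( ',', @{ $parsed->{Alias} } ), "\n";
```

Warning and returning early keeps the loader running on sloppy input,
which matches the "just returning works fine" observation above.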

From jason at bioperl.org  Fri Jul 13 13:30:05 2007
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 13 Jul 2007 13:30:05 -0400
Subject: [Bioperl-l] Take 2 of the new subversion repository.
In-Reply-To: <18067.59384.247108.463648@almost.alerce.com>
References: <18054.63942.316904.413911@almost.alerce.com>
	<D8C71EF7-6E2E-498E-8638-373512ADE3EE@bioperl.org>
	<18067.47319.254632.538811@almost.alerce.com>
	<4693BE75.4090005@sendu.me.uk>
	<C022268F-4632-4027-ADBE-F842B3308E21@uiuc.edu>
	<18067.59384.247108.463648@almost.alerce.com>
Message-ID: <58D2D775-F7AB-4E57-AC4E-861AA2DC4AEA@bioperl.org>

I'll try and look into this and other stuff with the migration in the
next week or so - maybe we'll make some time to talk it through
during BOSC.  I don't know yet when I'll actually have time to think
about it properly.

I am still worried about doing https because of the current system we
have supporting user logins.  We also didn't want to run a web server
on the main repository machine, and we'd have to install DAV there.
If svn+ssh is going to be a sufficient hurdle for people (note it was
already a hurdle for them with CVS), we'll have to think a bit more
on it.

We might be able to do some sort of NFS (or other exported FS) export
to the webserver machine, but that may be a recipe for disaster.

-jason
On Jul 10, 2007, at 4:11 PM, George Hartzell wrote:

> Chris Fields writes:
>> [...]
>> We prob. need to schedule a specific day/time when the switchover
>> would take place so we can announce (so everyone knows and no one can
>> gripe).  Did we ever resolve the svn->cvs issue?  Jason pointed out
>> some tools a while ago...
>
> I haven't done anything about it.
>
> I think that we also need to have some input from the admin/support
> folk about access methods (https, etc...).
>
> Are we going to want to mirror the repository anywhere?
>
> g.

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From cjfields at uiuc.edu  Fri Jul 13 14:29:22 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 13 Jul 2007 13:29:22 -0500
Subject: [Bioperl-l] Take 2 of the new subversion repository.
In-Reply-To: <58D2D775-F7AB-4E57-AC4E-861AA2DC4AEA@bioperl.org>
References: <18054.63942.316904.413911@almost.alerce.com>
	<D8C71EF7-6E2E-498E-8638-373512ADE3EE@bioperl.org>
	<18067.47319.254632.538811@almost.alerce.com>
	<4693BE75.4090005@sendu.me.uk>
	<C022268F-4632-4027-ADBE-F842B3308E21@uiuc.edu>
	<18067.59384.247108.463648@almost.alerce.com>
	<58D2D775-F7AB-4E57-AC4E-861AA2DC4AEA@bioperl.org>
Message-ID: <5F5EB9B6-11AF-4D20-95B1-EBBD40A98962@uiuc.edu>

I don't think there's a huge rush on this since BOSC is imminent. If  
devs really want https then we can try adding it after migration, but  
if it becomes too much of a headache (particularly for the web  
admins) I wouldn't worry about it.

chris

On Jul 13, 2007, at 12:30 PM, Jason Stajich wrote:

> I'll try and look into this and other stuff with the migration in
> next week or so - maybe we'll make some time to talk it through
> during BOSC.  I don't know yet when I'll actually have time to think
> about it properly.
>
> I am still worried about doing https because of the current system we
> have supporting user logins and that we didn't want to run a web
> server on the main repository machine and we'll have to install DAV
> on the main repository machine.  if ssh+svn is going to be sufficient
> hurdle for people, note it was already a hurdle for them with CVS,
> but we'll have to think a bit more on it.
>
> We might be able to do some sort of NFS (or other exported FS) but
> exported to the webserver machine but that is may be a recipe for
> disaster.
>
> -jason
> On Jul 10, 2007, at 4:11 PM, George Hartzell wrote:
>
>> Chris Fields writes:
>>> [...]
>>> We prob. need to schedule a specific day/time when the switchover
>>> would take place so we can announce (so everyone knows and no one  
>>> can
>>> gripe).  Did we ever resolve the svn->cvs issue?  Jason pointed out
>>> some tools a while ago...
>>
>> I haven't done anything about it.
>>
>> I think that we also need to have some input from the admin/support
>> folk about access methods (https, etc...).
>>
>> Are we going to want to mirror the repository anywhere?
>>
>> g.
>
> --
> Jason Stajich
> jason at bioperl.org
> http://jason.open-bio.org/
>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From sheris at eps.berkeley.edu  Fri Jul 13 14:42:32 2007
From: sheris at eps.berkeley.edu (Sheri Simmons)
Date: Fri, 13 Jul 2007 11:42:32 -0700
Subject: [Bioperl-l] Problem with Bio::PopGen::Individual
Message-ID: <200707131142.32366.sheris@eps.berkeley.edu>

Hi,
I have a collection of sequencing reads aligned with a consensus sequence that 
I input into a Bio::PopGen::Population object in order to calculate allele 
frequencies. The consensus sequence is included to force clustalw to give a 
better alignment. However,  I need to remove the consensus sequence before 
calculating allele frequencies in the individual reads. I'm having trouble 
with this part of it. I get the following error message:

"Can't locate object method "person_id" via package "Bio::PopGen::Individual" 		
at /usr/share/perl5/Bio/PopGen/Population.pm line 260, <GEN0> line 49."

Here is the code snippet producing the error. $pop is a 
Bio::PopGen::Population object.

	my @consensus = "gene_consensus";
	$pop->remove_Individuals(@consensus);

I also tried:
	my @consensus = $pop->get_Individuals(-unique_id => "gene_consensus"); 
	$pop->remove_Individuals(@consensus);

which produced the same error. Can anyone send me in the right direction? I 
suspect this is a simple problem.

Sheri

-- 
Sheri Simmons
Department of Earth and Planetary Sciences
University of California, Berkeley
Berkeley, CA 94720-4767

From jason at bioperl.org  Fri Jul 13 16:17:31 2007
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 13 Jul 2007 16:17:31 -0400
Subject: [Bioperl-l] Problem with Bio::PopGen::Individual
In-Reply-To: <200707131142.32366.sheris@eps.berkeley.edu>
References: <200707131142.32366.sheris@eps.berkeley.edu>
Message-ID: <99A3513A-7DBE-4C89-B38B-8C2B76B0E14F@bioperl.org>

Hi Sheri -

Shoot - that was my fault - a bug in the code, where I was only using
"Person" objects rather than Individuals when I was testing.

I've committed a bugfix to CVS - do you need me to send you the
updated file, or are you comfortable grabbing the code from CVS or
http://code.open-bio.org?

This is the change - you may have a different version of BioPerl than
what is in CVS, so you may have to make the change on line 260 rather
than 282 -- or you can upgrade to the latest code via CVS (although this
is probably harder for you since you've got stuff installed in
/usr/share):

RCS file: /home/repository/bioperl/bioperl-live/Bio/PopGen/Population.pm,v
retrieving revision 1.22
diff -r1.22 Population.pm
282c282
<       unshift @tosplice, $i if( $namehash{$ind->person_id} );
---
 >       unshift @tosplice, $i if( $namehash{$ind->unique_id} );

-jason
On Jul 13, 2007, at 2:42 PM, Sheri Simmons wrote:

> Hi,
> I have a collection of sequencing reads aligned with a consensus  
> sequence that
> I input into a Bio::PopGen::Population object in order to calculate  
> allele
> frequencies. The consensus sequence is included to force clustalw  
> to give a
> better alignment. However,  I need to remove the consensus sequence  
> before
> calculating allele frequencies in the individual reads. I'm having  
> trouble
> with this part of it. I get the following error message:
>
> "Can't locate object method "person_id" via package  
> "Bio::PopGen::Individual" 		
> at /usr/share/perl5/Bio/PopGen/Population.pm line 260, <GEN0> line  
> 49."
>
> Here is the code snippet producing the error. $pop is a
> Bio::PopGen::Population object.
>
> 	my @consensus = "gene_consensus";
> 	$pop->remove_Individuals(@consensus);
>
> I also tried:
> 	my @consensus = $pop->get_Individuals(-unique_id =>  
> "gene_consensus");
> 	$pop->remove_Individuals(@consensus);
>
> which produced the same error. Can anyone send me in the right  
> direction? I
> suspect this is a simple problem.
>
> Sheri
>
> -- 
> Sheri Simmons
> Department of Earth and Planetary Sciences
> University of California, Berkeley
> Berkeley, CA 94720-4767

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/
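
To make the one-line fix in the diff easier to follow, here is a
self-contained sketch of the remove-by-name pattern it belongs to. Only
the unshift line mirrors the diff; Sketch::Individual, remove_by_name and
the surrounding loop are assumptions made for illustration, not the actual
Bio::PopGen::Population code.

```perl
use strict;
use warnings;

# Minimal stand-in for an individual; only unique_id() is modelled here.
package Sketch::Individual;
sub new       { my ( $class, %args ) = @_; return bless {%args}, $class }
sub unique_id { return $_[0]->{unique_id} }

package main;

# Remove individuals by name. Matching indexes are collected in reverse
# order so that each splice leaves the remaining indexes valid.
sub remove_by_name {
    my ( $individuals, @names ) = @_;
    my %namehash = map { $_ => 1 } @names;
    my @tosplice;
    my $i = 0;
    for my $ind (@$individuals) {
        unshift @tosplice, $i if $namehash{ $ind->unique_id };
        $i++;
    }
    splice @$individuals, $_, 1 for @tosplice;
    return scalar @tosplice;
}

my @inds = map { Sketch::Individual->new( unique_id => $_ ) }
    qw(read1 gene_consensus read2);
my $removed = remove_by_name( \@inds, 'gene_consensus' );
print "removed $removed, kept: ",
    join( ' ', map { $_->unique_id } @inds ), "\n";
```

The lookup key has to come from a method the object actually provides,
which is why calling person_id on an Individual failed.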


From hartzell at alerce.com  Fri Jul 13 16:34:14 2007
From: hartzell at alerce.com (George Hartzell)
Date: Fri, 13 Jul 2007 16:34:14 -0400
Subject: [Bioperl-l] Take 2 of the new subversion repository.
In-Reply-To: <58D2D775-F7AB-4E57-AC4E-861AA2DC4AEA@bioperl.org>
References: <18054.63942.316904.413911@almost.alerce.com>
	<D8C71EF7-6E2E-498E-8638-373512ADE3EE@bioperl.org>
	<18067.47319.254632.538811@almost.alerce.com>
	<4693BE75.4090005@sendu.me.uk>
	<C022268F-4632-4027-ADBE-F842B3308E21@uiuc.edu>
	<18067.59384.247108.463648@almost.alerce.com>
	<58D2D775-F7AB-4E57-AC4E-861AA2DC4AEA@bioperl.org>
Message-ID: <18071.57798.130368.703488@almost.alerce.com>

Jason Stajich writes:
 > I'll try and look into this and other stuff with the migration in  
 > next week or so - maybe we'll make some time to talk it through  
 > during BOSC.  I don't know yet when I'll actually have time to think  
 > about it properly.
 > 
 > I am still worried about doing https because of the current system we  
 > have supporting user logins and that we didn't want to run a web  
 > server on the main repository machine and we'll have to install DAV  
 > on the main repository machine.  if ssh+svn is going to be sufficient  
 > hurdle for people, note it was already a hurdle for them with CVS,  
 > but we'll have to think a bit more on it.
 > [...]

How are you thinking about providing anonymous readonly non-dev access
to the repository?  svn+ssh using an anonymous/guest account (can it
be screwed down tightly enough?)  svn-mirror the repo onto the public
machine and do DAV there w/out having to worry about authenticating
the devs?

g.


From jason at bioperl.org  Fri Jul 13 17:33:29 2007
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 13 Jul 2007 17:33:29 -0400
Subject: [Bioperl-l] Take 2 of the new subversion repository.
In-Reply-To: <18071.57798.130368.703488@almost.alerce.com>
References: <18054.63942.316904.413911@almost.alerce.com>
	<D8C71EF7-6E2E-498E-8638-373512ADE3EE@bioperl.org>
	<18067.47319.254632.538811@almost.alerce.com>
	<4693BE75.4090005@sendu.me.uk>
	<C022268F-4632-4027-ADBE-F842B3308E21@uiuc.edu>
	<18067.59384.247108.463648@almost.alerce.com>
	<58D2D775-F7AB-4E57-AC4E-861AA2DC4AEA@bioperl.org>
	<18071.57798.130368.703488@almost.alerce.com>
Message-ID: <5C42D957-BCCA-46B6-8121-3313CE4B0F2A@bioperl.org>


On Jul 13, 2007, at 4:34 PM, George Hartzell wrote:

> Jason Stajich writes:
>> I'll try and look into this and other stuff with the migration in
>> next week or so - maybe we'll make some time to talk it through
>> during BOSC.  I don't know yet when I'll actually have time to think
>> about it properly.
>>
>> I am still worried about doing https because of the current system we
>> have supporting user logins and that we didn't want to run a web
>> server on the main repository machine and we'll have to install DAV
>> on the main repository machine.  if ssh+svn is going to be sufficient
>> hurdle for people, note it was already a hurdle for them with CVS,
>> but we'll have to think a bit more on it.
>> [...]
>
> How are you thinking about providing anonymous readonly non-dev access
> to the repository?  svn+ssh using an anonymous/guest account (can it
> be screwed down tightly enough?)  svn-mirror the repo onto the public
> machine and do DAV there w/out having to worry about authenticating
> the devs?
>
We'll do svn on the public anonymous machine like we already do with  
CVS and with SVN

See:
http://code.open-bio.org
  AND
http://code.open-bio.org/svnweb/
See blipkit.

-jason
> g.
>
>

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From scrosson at uchicago.edu  Fri Jul 13 18:15:30 2007
From: scrosson at uchicago.edu (Sean Crosson)
Date: Fri, 13 Jul 2007 22:15:30 +0000 (UTC)
Subject: [Bioperl-l] ace to fasta conversion
Message-ID: <loom.20070714T000856-94@post.gmane.org>

I've been using this Bio::SeqIO script to convert EMBL and swissprot to fasta
and it works great.  We're now trying to convert a big (250 MB) .ace file to
fasta.  The documentation suggests I can do this, but every time I run the script
below, it outputs an empty .fas file.  Does anyone have any suggestions on how
to make this script work?  Does SeqIO really convert between these file types? 
Thanks for your help.

#!/usr/bin/perl -w

use strict;
use Bio::SeqIO;

# Read the .ace file with the SeqIO 'ace' parser and write each record as FASTA.
my $in  = Bio::SeqIO->new(-file => "454Contigs.ace",
                          -format => 'ace');
my $out = Bio::SeqIO->new(-file => ">454Contigs.fas",
                          -format => 'fasta');
while ( my $seq = $in->next_seq() ) { $out->write_seq($seq); }


From cvillamar at gmail.com  Fri Jul 13 19:24:04 2007
From: cvillamar at gmail.com (Carlos Villacorta)
Date: Fri, 13 Jul 2007 16:24:04 -0700
Subject: [Bioperl-l] beginner problem with fasta headers
Message-ID: <60199ee40707131624y6ea2c896r8a32f9bef04121e9@mail.gmail.com>

hi all,
I have an EMBL sequence file; when formatting it to fasta with SeqIO it
gives a long header string for each sequence that my downstream
phylogenetic software cannot handle...
Does anyone know how to format those embl or genbank files to fasta
while keeping just two or three fields in the headers (e.g. id | gene
| sp_name)?
Any advice on this problem would be very much appreciated, thanks!

From j_martin at lbl.gov  Fri Jul 13 20:05:45 2007
From: j_martin at lbl.gov (Joel Martin)
Date: Fri, 13 Jul 2007 17:05:45 -0700
Subject: [Bioperl-l] ace to fasta conversion
In-Reply-To: <loom.20070714T000856-94@post.gmane.org>
References: <loom.20070714T000856-94@post.gmane.org>
Message-ID: <20070714000544.GB29841@eniac.jgi-psf.org>

Hello,
	the SeqIO 'ace' is an AceDB file, the 454*ace you are trying to use
is a phrap/consed ace file.  They aren't related at all. You might try poking
around in Bio::Assembly::IO, which should read assembly ace files.

Joel

On Fri, Jul 13, 2007 at 10:15:30PM +0000, Sean Crosson wrote:
> I've been using this Bio::SeqIO script to convert EMBL and swissprot to fasta
> and it works great.  We're now trying to convert a big (250 MB) .ace file to
> fasta.  The documentation suggests I can do this, but everytime I run the script
> below, it outputs an empty .fas file.  Does anyone have any suggestions on how
> to make this script work?  Does SeqIO really convert between these file types? 
> Thanks for your help.
> 
> #!/usr/bin/perl -w
> 
> use Bio::SeqIO;
> 
> 
> $in  = Bio::SeqIO->new(-file => "454Contigs.ace",
>                        -format => 'ace');
> $out = Bio::SeqIO->new(-file => ">454Contigs.fas",
>                        -format => 'fasta');
> while ( $seq = $in->next_seq() ) {$out->write_seq($seq); }
> 
> 
> 

From cjfields at uiuc.edu  Sat Jul 14 00:06:27 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 13 Jul 2007 23:06:27 -0500
Subject: [Bioperl-l] beginner problem with fasta headers
In-Reply-To: <60199ee40707131624y6ea2c896r8a32f9bef04121e9@mail.gmail.com>
References: <60199ee40707131624y6ea2c896r8a32f9bef04121e9@mail.gmail.com>
Message-ID: <0089195A-4935-49F2-A8E7-C1F9B8A34D4E@uiuc.edu>

Some reading material...

http://www.bioperl.org/wiki/FAQ#Accession_numbers_are_not_present_for_FASTA_sequence_files
http://www.bioperl.org/wiki/FAQ#I_would_like_to_make_my_own_custom_fasta_header_-_how_do_I_do_this.3F
http://www.bioperl.org/wiki/FASTA_sequence_format#Note

Quiz on Monday!

chris

On Jul 13, 2007, at 6:24 PM, Carlos Villacorta wrote:

> hi all,
> I have a embl sequence file, when formatting to fasta with Seqio it
> gives a long string header for each sequence that my following
> phylogenetic software cannot handle...
> Does anyone knows how to format those embl or genbank files to fasta
> but retrieving in the headers just two or three fields (e.g. id | gene
> | sp_name)?
> Any advice with this problem would be very appreciated, thanks!
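
As a rough illustration of the custom-header idea, the sketch below
builds a short "id|gene|sp_name" style header in plain Perl. The
short_fasta_header helper and the sample values are hypothetical; how the
gene and species fields are pulled out of an EMBL or GenBank record is
not shown and depends on the annotation in the file.

```perl
use strict;
use warnings;

# Hypothetical helper: build a compact FASTA header from a few fields.
# Empty fields are skipped, and whitespace inside a field becomes '_' so
# the header stays a single token.
sub short_fasta_header {
    my @fields = grep { defined($_) && length($_) } @_;
    s/\s+/_/g for @fields;
    return join '|', @fields;
}

# Illustrative values only. With Bio::SeqIO, the same string could be set
# with $seq->display_id(...) and $seq->desc('') before write_seq() is called.
my $header = short_fasta_header( 'AB000001', 'rbcL', 'Oryza sativa' );
print ">$header\n";
```

Keeping the header free of spaces matters because many phylogenetics
programs treat everything after the first space as a description.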


From scrosson at uchicago.edu  Fri Jul 13 23:43:59 2007
From: scrosson at uchicago.edu (scrosson)
Date: Fri, 13 Jul 2007 20:43:59 -0700 (PDT)
Subject: [Bioperl-l] ace to fasta conversion
In-Reply-To: <20070714000544.GB29841@eniac.jgi-psf.org>
References: <loom.20070714T000856-94@post.gmane.org>
	<20070714000544.GB29841@eniac.jgi-psf.org>
Message-ID: <11590811.post@talk.nabble.com>


This problem now makes sense.  I've been playing with Bio::Assembly::IO,
which does indeed read phrap .ace files.  Does anyone have an idea how to
pull the assembled contigs out of a Bio::Assembly object and write them out
as multi-fasta (or strings for that matter)?  None of our workstations are
running phrap/consed and I'd love to see these contigs.

Sean 
       

Hello,
	the SeqIO 'ace' is an AceDB file, the 454*ace you are trying to use
is a phrap/consed ace file.  They aren't related at all. You might try
poking
around in Bio::AssemblyIO which should read assembly ace files.

Joel
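
For anyone who only needs the consensus sequences, here is a plain-Perl
sketch that does not go through Bio::Assembly::IO at all. It rests on
assumptions about the phrap/consed ACE layout: each contig starts with a
"CO <name> ..." line, the consensus follows until the next blank line,
and '*' marks a pad. ace_contigs_to_fasta is a hypothetical helper, and
the tiny ACE fragment is made up for the demonstration.

```perl
use strict;
use warnings;

# Hypothetical helper: copy contig consensus sequences from ACE text on
# $in_fh to FASTA on $out_fh. Returns the number of contigs written.
sub ace_contigs_to_fasta {
    my ( $in_fh, $out_fh ) = @_;
    my ( $name, $seq, $count ) = ( undef, '', 0 );

    my $flush = sub {
        return unless defined $name;
        $seq =~ tr/*//d;    # drop alignment pads
        print {$out_fh} ">$name\n";
        print {$out_fh} "$1\n" while $seq =~ /(.{1,60})/g;    # 60 per line
        $count++;
        ( $name, $seq ) = ( undef, '' );
    };

    while ( my $line = <$in_fh> ) {
        $line =~ s/\s+\z//;
        if ( $line =~ /^CO\s+(\S+)/ ) {
            $flush->();     # previous contig, if any
            $name = $1;
        }
        elsif ( defined $name ) {
            if   ( $line eq '' ) { $flush->() }    # blank line ends consensus
            else                 { $seq .= $line }
        }
    }
    $flush->();
    return $count;
}

# Made-up ACE fragment, read through an in-memory filehandle.
my $ace = "AS 1 2\n\nCO Contig1 8 2 1 U\nACGT*ACG\n\nBQ\n20 20 20\n";
open my $in, '<', \$ace or die "cannot open in-memory ACE: $!";
ace_contigs_to_fasta( $in, \*STDOUT );
```

For a real file, the in-memory handle would be replaced by handles opened
on 454Contigs.ace and 454Contigs.fas.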



From bioperlanand at yahoo.com  Sat Jul 14 13:55:53 2007
From: bioperlanand at yahoo.com (Anand Venkatraman)
Date: Sat, 14 Jul 2007 10:55:53 -0700 (PDT)
Subject: [Bioperl-l] a question on obtain PDB records using bioperl
Message-ID: <798126.17426.qm@web36804.mail.mud.yahoo.com>

Hi everybody,

Is there a method in Bioperl to obtain PDB record(s) on the fly, i.e. something similar to the Bio::Perl methods to retrieve EMBL or GenBank records?

Thanks in advance,

Anand

       

From johnsonm at gmail.com  Tue Jul 17 14:23:58 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Tue, 17 Jul 2007 13:23:58 -0500
Subject: [Bioperl-l] Should Bio::SeqIO::interpro 'use XML::DOM::XPath'?
Message-ID: <ebf5eb170707171123y33239dfey2dc39e4316c62cee@mail.gmail.com>

I'm tinkering with parsing iprscan reports with BioPerl.  I noticed that this:

  my $seqio = Bio::SeqIO->new(-file => $iprscan_file, -format => 'interpro');

  while (my $seq = $seqio->next_seq()) {
      ...
  }

Does not work unless I first 'use XML::DOM::XPath'.  I get this error:

  Can't locate object method "findnodes" via package
"XML::DOM::Document" at
bioperl-cvs/bioperl-live//Bio/SeqIO/interpro.pm line 136, <GEN0> line
30.

I see that Bio::SeqIO has 'use XML::DOM', but that doesn't seem to
suck in XML::DOM::XPath.  I see that t/interpro.t requires
XML::DOM::XPath:

test_begin(-tests => 17,
                -requires_module => 'XML::DOM::XPath');

I suppose the reason the test specifies XML::DOM::XPath as required is so
that tests can be skipped if XML::DOM::XPath is not available.
Shouldn't Bio::SeqIO::interpro 'use XML::DOM::XPath', though?

From sac at bioperl.org  Tue Jul 17 15:49:32 2007
From: sac at bioperl.org (Steve Chervitz)
Date: Tue, 17 Jul 2007 12:49:32 -0700
Subject: [Bioperl-l] Ohloh account for bioperl
Message-ID: <8f200b4c0707171249n193c5d3fi2aa1cb8b6c102ef7@mail.gmail.com>

I came across a web app that tracks various metrics for open source
projects, noticed that bioperl wasn't listed, and added it:

http://www.ohloh.net/projects/6685

Seems like an interesting resource that could help add some
visibility. It creates metrics by directly processing the source code
repository. I hooked it up to the CVS repos for bioperl-live, -db,
-run, and -pipeline. It has yet to do its analysis at this point.

Feel free to create Ohloh accounts for yourselves. When you add
yourself as a contributor to Bioperl, you can indicate the username
associated with your commits, but this requires that it first process
the commit logs to figure out what the usernames are. You can still
create an account, just update it later with your username.

Steve

From cjfields at uiuc.edu  Tue Jul 17 17:04:44 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 17 Jul 2007 16:04:44 -0500
Subject: [Bioperl-l] Should Bio::SeqIO::interpro 'use XML::DOM::XPath'?
In-Reply-To: <ebf5eb170707171123y33239dfey2dc39e4316c62cee@mail.gmail.com>
References: <ebf5eb170707171123y33239dfey2dc39e4316c62cee@mail.gmail.com>
Message-ID: <5BA80A44-4A88-46D8-B945-7BB4C82E308E@uiuc.edu>


On Jul 17, 2007, at 1:23 PM, Mark Johnson wrote:

> I'm tinkering with parsing iprscan reports with BioPerl.  I noticed  
> that this:
>
>   my $seqio = Bio::SeqIO->new(-file => $iprscan_file, -format =>  
> 'interpro');
>
>   while (my $seq = $seqio->next_seq()) {
>       ...
>   }
>
> Does not work unless I first 'use XML::DOM::XPath'.  I get this error:
>
>   Can't locate object method "findnodes" via package
> "XML::DOM::Document" at
> bioperl-cvs/bioperl-live//Bio/SeqIO/interpro.pm line 136, <GEN0> line
> 30.
>
> I see that Bio::SeqIO has 'use XML::DOM', but that doesn't seem to
> suck in XML::DOM::Xpath.  I see that t/interpro.t requires
> XML::DOM::XPath:
>
> test_begin(-tests => 17,
>                 -requires_module => 'XML::DOM::XPath');
>
> Is suppose the reason the test specs a require XML::DOM::XPath is so
> that tests can be skipped if XML::DOM::XPath is not available.
> Shouldn't Bio::SeqIO::interpro 'use XML::DOM::XPath', though?

You're right; I think the tests passed because XML::DOM::XPath, if present,
was eval'd as a required module.  When I comment out the spot where
it is eval'd in the test suite, I can replicate this error.  I have
added 'use XML::DOM::XPath' to SeqIO::interpro now in CVS and it
passes fine.

Thanks for the heads up!

chris

From xianranli78 at yahoo.com.cn  Wed Jul 18 01:55:19 2007
From: xianranli78 at yahoo.com.cn (Xianran Li)
Date: Wed, 18 Jul 2007 13:55:19 +0800
Subject: [Bioperl-l] extract information with Bio::DB::GFF3 from gff3 file
Message-ID: <005001c7c900$3a403d60$ed07a8c0@BGI.LOCAL>

Hi,

I want to extract some information from a gff3 file like:

12001 . gene 854759 857385 . - . ID=12001.t00153;Name=receptor kinase ORK10, putative
   
The gene position can be retrieved as $gene->start, but how can I get the annotation information (receptor kinase ORK10)?

Thanks for your help.


Xianran Li


From georg.otto at tuebingen.mpg.de  Wed Jul 18 05:32:26 2007
From: georg.otto at tuebingen.mpg.de (Georg Otto)
Date: Wed, 18 Jul 2007 11:32:26 +0200
Subject: [Bioperl-l] run megablast
Message-ID: <m1r6n66or9.fsf@tuebingen.mpg.de>


Hi,

is there a module to run megablast in a script (equivalent to ncbi
blast in StandAloneBlast.pm)?

Cheers,

Georg


From jeevitesh at ibab.ac.in  Wed Jul 18 06:03:24 2007
From: jeevitesh at ibab.ac.in (jeevitesh at ibab.ac.in)
Date: Wed, 18 Jul 2007 15:33:24 +0530 (IST)
Subject: [Bioperl-l] Please Help us - regarding TreeIO for Shared Distance
Message-ID: <47819.192.168.1.125.1184753004.squirrel@webmail.ibab.ac.in>

Hi Friends,

We need your valuable help in finding the SHARED PATH BETWEEN TWO PAIRS OF NODES
OF A TREE.

The Distance method of TreeIO in the Bioperl module gives the total distance.

But we NEED TO FIND THE SHARED DISTANCE BETWEEN TWO PAIRS OF NODES, as illustrated in the figure.

Suppose we have a tree
    A                C
     \              /
      \2          2/
       \__________/
       /    6     \
      /2          2\
     /              \
    B                D

The shared path between AB and AC is 2,
and for AC and BD the shared path is 6.

Any comment on this will be greatly appreciated.

With Thanks & regards
jeevitesh


From jeevitesh at ibab.ac.in  Wed Jul 18 03:15:33 2007
From: jeevitesh at ibab.ac.in (jeevitesh at ibab.ac.in)
Date: Wed, 18 Jul 2007 12:45:33 +0530 (IST)
Subject: [Bioperl-l] Please Help us - regarding TreeIO for Shared Distance
Message-ID: <55933.192.168.1.125.1184742933.squirrel@webmail.ibab.ac.in>

Hi Friends,

We need your valuable help in finding the SHARED PATH BETWEEN TWO PAIRS OF NODES OF A TREE.

The Distance method of TreeIO in the Bioperl module gives the total distance.

But we NEED TO FIND THE SHARED DISTANCE BETWEEN TWO PAIRS OF NODES.

Any comment on this will be greatly appreciated.

With Thanks & regards
jeevitesh


From jeevitesh at ibab.ac.in  Wed Jul 18 04:45:50 2007
From: jeevitesh at ibab.ac.in (jeevitesh at ibab.ac.in)
Date: Wed, 18 Jul 2007 14:15:50 +0530 (IST)
Subject: [Bioperl-l] Please Help us - regarding TreeIO for Shared Distance
Message-ID: <43613.192.168.1.125.1184748350.squirrel@webmail.ibab.ac.in>

Hi Friends,

We need your valuable help in finding the SHARED PATH BETWEEN TWO PAIRS OF NODES
OF A TREE.

The Distance method of TreeIO in the Bioperl module gives the total distance.

But we NEED TO FIND THE SHARED DISTANCE BETWEEN TWO PAIRS OF NODES, as illustrated in the figure.

Suppose we have a tree
    A                C
     \              /
      \2          2/
       \__________/
       /    6     \
      /2          2\
     /              \
    B                D

The shared path between AB and AC is 2,
and for AC and BD the shared path is 6.

Any comment on this will be greatly appreciated.

With Thanks & regards
jeevitesh


From cain.cshl at gmail.com  Wed Jul 18 09:10:40 2007
From: cain.cshl at gmail.com (Scott Cain)
Date: Wed, 18 Jul 2007 09:10:40 -0400
Subject: [Bioperl-l] extract information with Bio::DB::GFF3 from
	gff3	file
In-Reply-To: <005001c7c900$3a403d60$ed07a8c0@BGI.LOCAL>
References: <005001c7c900$3a403d60$ed07a8c0@BGI.LOCAL>
Message-ID: <1184764240.2570.31.camel@localhost.localdomain>

Hi Xianran Li,

Assuming $gene is a Bio::DB::GFF::Feature object (there is no such thing
as Bio::DB::GFF3), then you can use the attributes method to get
anything in the ninth column:

  my ($name) = $gene->attributes('Name');

The parentheses are needed around $name because the attributes method
returns a list, and the parens capture the first item of the list into
$name.
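
A minimal sketch of the list-assignment behaviour described above, in plain
Perl.  The sub attribute_values() is a hypothetical stand-in, not part of
Bio::DB::GFF; it only imitates a method that returns a list.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a method that returns a list of values.
sub attribute_values {
    my @values = ( 'first value', 'second value' );
    return @values;
}

my ($first) = attribute_values();    # list assignment: takes the first item
my @all     = attribute_values();    # list assignment: takes every item
my $count   = attribute_values();    # scalar context: the array's element count

print "$first\n";                    # first value
print scalar(@all), "\n";            # 2
print "$count\n";                    # 2
```

The scalar-context result shown here belongs to this stand-in only; what a
real BioPerl method returns in scalar context depends on how it is written.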

Scott


On Wed, 2007-07-18 at 13:55 +0800, Xianran Li wrote:
> Hi,
> 
> I want to extract some infomation  from the gff3 file like:
> 
> 12001 . gene 854759 857385 . - . ID=12001.t00153;Name=receptor kinase ORK10, putative
>    
> The gene position can be reterived as $gene->start, but how can I get the annotation infomatin (receptor kinase ORK10) ?
> 
> Thanks for your help.
> 
> 
> Xianran Li
> 
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From johnsonm at gmail.com  Wed Jul 18 16:53:00 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Wed, 18 Jul 2007 15:53:00 -0500
Subject: [Bioperl-l] Should Bio::SeqIO::interpro 'use XML::DOM::XPath'?
In-Reply-To: <469DB6C6.9010702@pasteur.fr>
References: <ebf5eb170707171123y33239dfey2dc39e4316c62cee@mail.gmail.com>
	<5BA80A44-4A88-46D8-B945-7BB4C82E308E@uiuc.edu>
	<469DB6C6.9010702@pasteur.fr>
Message-ID: <ebf5eb170707181352v4d59ec81kfb6f706ca4643cc7@mail.gmail.com>

The output from InterProScan, invoked thusly:

iprscan -cli -seqtype p -i input_file -o output_file -format xml

On 7/18/07, Emmanuel Quevillon <tuco at pasteur.fr> wrote:
> Hi guys,
>
> I read your email and I wondered which iprscan file you've
> been talking about? Is it the file produced by InterProScan
> or the file called match.xml representing the whole uniprot
> database against InterPro? Reading the xml parser
> implemented into Bio::SeqIO::interpro, I guess it is the
> second one?
> In such case, I just want to let you know that the xml
> schema changed and the file name also. It is now called
> match_complete.xml.
> I attached the DTD to be able to see the new structure.
> Here is an example of the new data representation.
>
>
> <protein id="A0A000" name="A0A000_9ACTO" length="394"
> crc64="F1DD0C1042811B48">
>      <match id="G3DSA:3.40.640.10"
> name="PyrdxlP-dep_Trfase_major_sub1" dbname="GENE3D"
> status="T" evd="HMMPfam">
>        <ipr id="IPR015421" name="Pyridoxal
> phosphate-dependent transferase, major region, subdomain 1"
> type="Domain" />
>        <lcn start="52" end="288" score="4.30000170645879E-75" />
>      </match>
>      <match id="PTHR13693:SF7" name="PTHR13693:SF7"
> dbname="PANTHER" status="T" evd="not_rel">
>        <lcn start="33" end="389" score="0.0" />
>      </match>
> </protein>
>
> As you can see some time there is no interpro info (no ipr
> element).
>
> I think it would be good to change also the interpro parser ?
>
> Regards
>
> Emmanuel
>
> Chris Fields wrote:
> > On Jul 17, 2007, at 1:23 PM, Mark Johnson wrote:
> >
> >> I'm tinkering with parsing iprscan reports with BioPerl.  I noticed
> >> that this:
> >>
> >>   my $seqio = Bio::SeqIO->new(-file => $iprscan_file, -format =>
> >> 'interpro');
> >>
> >>   while (my $seq = $seqio->next_seq()) {
> >>       ...
> >>   }
> >>
> >> Does not work unless I first 'use XML::DOM::XPath'.  I get this error:
> >>
> >>   Can't locate object method "findnodes" via package
> >> "XML::DOM::Document" at
> >> bioperl-cvs/bioperl-live//Bio/SeqIO/interpro.pm line 136, <GEN0> line
> >> 30.
> >>
> >> I see that Bio::SeqIO has 'use XML::DOM', but that doesn't seem to
> >> suck in XML::DOM::Xpath.  I see that t/interpro.t requires
> >> XML::DOM::XPath:
> >>
> >> test_begin(-tests => 17,
> >>                 -requires_module => 'XML::DOM::XPath');
> >>
> >> Is suppose the reason the test specs a require XML::DOM::XPath is so
> >> that tests can be skipped if XML::DOM::XPath is not available.
> >> Shouldn't Bio::SeqIO::interpro 'use XML::DOM::XPath', though?
> >
> > You're right; I think tests passed b/c XML::DOM::XPath (if present),
> > was eval'd as a required module.  When I commented out the spot where
> > it is eval'd in the test suite I can replicate this error.  I have
> > added 'use XML::DOM::XPath' to SeqIO::interpro now in CVS and it
> > passes fine.
> >
> > Thanks for the heads up!
> >
> > chris
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From cain.cshl at gmail.com  Wed Jul 18 22:47:53 2007
From: cain.cshl at gmail.com (Scott Cain)
Date: Wed, 18 Jul 2007 22:47:53 -0400
Subject: [Bioperl-l] extract information with Bio::DB::GFF3 from	gff3
	file
In-Reply-To: <008801c7c9ad$1fa5c030$ed07a8c0@BGI.LOCAL>
References: <005001c7c900$3a403d60$ed07a8c0@BGI.LOCAL>
	<1184764240.2570.31.camel@localhost.localdomain>
	<008801c7c9ad$1fa5c030$ed07a8c0@BGI.LOCAL>
Message-ID: <1184813273.2570.96.camel@localhost.localdomain>

[Please always reply to the mailing list so that answers can be archived]


Yes, because commas are not allowed in GFF3 in an unescaped form.
Essentially, you are doing this with your GFF3:

  Name=receptor kinase ORK10;Name= putative

and when you do this:

  my ($name) = $gene->attributes('Name');

you are getting the first item in the list of names, and I suspect which
one you get is random.

To fix it, you need to replace the comma with %2C (the URL escape code
for a comma).  If you generated this GFF3, you will need to add a step
to URI encode your attribute strings.  If you got it from someone else,
you should point out to them that their GFF is flawed.
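
A minimal sketch of the escaping step, assuming the GFF3 is generated from
Perl.  The helper escape_gff3_value() is a hypothetical name, and the set of
escaped characters (percent, semicolon, equals, ampersand, comma, tab,
newline, carriage return) is my reading of what is reserved in the ninth
column, so check it against the GFF3 specification before relying on it.

```perl
use strict;
use warnings;

# Percent-encode the characters assumed to be reserved in GFF3 column 9
# (and the percent sign itself), leaving everything else untouched.
sub escape_gff3_value {
    my ($value) = @_;
    $value =~ s/([%;=&,\t\n\r])/sprintf('%%%02X', ord $1)/ge;
    return $value;
}

my $name       = 'receptor kinase ORK10, putative';
my $attributes = join ';',
    'ID=' . escape_gff3_value('12001.t00153'),
    'Name=' . escape_gff3_value($name);

print "$attributes\n";
# ID=12001.t00153;Name=receptor kinase ORK10%2C putative
```

With the comma written as %2C, the Name attribute is a single value again
instead of a two-item list.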

Scott


On Thu, 2007-07-19 at 10:32 +0800, Xianran Li wrote:
> However, the $name return the string "putative" rather than "receptor kinase ORK10". Is any particular reason? 
> 
> 
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> Assuming $gene is a Bio::DB::GFF::Feature object (there is no such thing
> as Bio::DB::GFF3), then you can use the attributes method to get
> anything in the ninth column:
> 
>   my ($name) = $gene->attributes('Name');
> 
> The parenthesis are needed around $name because the attributes method
> returns a list and the parens capture the first item of the list into
> $name.
> 
> Scott
> 
> 
> On Wed, 2007-07-18 at 13:55 +0800, Xianran Li wrote:
> > Hi,
> > 
> > I want to extract some infomation  from the gff3 file like:
> > 
> > 12001 . gene 854759 857385 . - . ID=12001.t00153;Name=receptor kinase ORK10, putative
> >    
> > The gene position can be reterived as $gene->start, but how can I get the annotation infomatin (receptor kinase ORK10) ?
> > 
> > Thanks for your help.
> > 
> > 
> > Xianran Li
> ----- Original Message ----- 
> From: "Scott Cain" <cain.cshl at gmail.com>
> To: "Xianran Li" <xianranli78 at yahoo.com.cn>
> Cc: <bioperl-l at lists.open-bio.org>
> Sent: Wednesday, July 18, 2007 9:10 PM
> Subject: Re: [Bioperl-l] extract information with Bio::DB::GFF3 fromgff3 file
> 
> 
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory


From acutter at eeb.utoronto.ca  Thu Jul 19 22:25:08 2007
From: acutter at eeb.utoronto.ca (Asher Cutter)
Date: Thu, 19 Jul 2007 22:25:08 -0400
Subject: [Bioperl-l] tree comparisons with bioperl
Message-ID: <46A01D04.5040209@eeb.utoronto.ca>

I was reading over the functions for working with trees in bioperl. I am 
looking for something that will compare two topologies and report back 
if they are equivalent. i.e. something like:

does (a,(b,c)) == ((A,B),C) ? (in this case, no)

But of course in reality they would be more complicated topologies. This 
would be useful for simulating random trees to compare with some given 
topology of interest.

I saw the methods for testing for monophyly and paraphyly, but not much 
beyond that...perhaps I have missed something?

Any suggestions?

Thanks,
Asher

-- 

___________________________________
Asher D. Cutter
Assistant Professor
Department of Ecology & Evolutionary Biology
University of Toronto
25 Harbord St.
Toronto, ON, M5S 3G5

tel: 416-978-4602
email: acutter at eeb.utoronto.ca
http://www.eeb.utoronto.ca/faculty/faculty_profile.cfm?prof_id=130
___________________________________

From jeevitesh at ibab.ac.in  Fri Jul 20 00:25:22 2007
From: jeevitesh at ibab.ac.in (jeevitesh at ibab.ac.in)
Date: Fri, 20 Jul 2007 09:55:22 +0530 (IST)
Subject: [Bioperl-l] Please Help us - regarding TreeIO for Shared Distance
Message-ID: <53244.192.168.1.125.1184905522.squirrel@webmail.ibab.ac.in>

Hi Friends,

We need your valuable help in finding the SHARED PATH BETWEEN TWO PAIRS OF NODES
OF A TREE.

The Distance method of TreeIO in the Bioperl module gives the total distance.

But we NEED TO FIND THE SHARED (COMMON) DISTANCE BETWEEN TWO PAIRS OF NODES, as
illustrated in the figure.

Suppose we have a tree
    A                C
     \              /
      \2          2/
       \__________/
       /    6     \
      /2          2\
     /              \
    B                D

The shared path between AB and AC is 2,
and for AC and BD the shared path is 6.

Any comment on this will be greatly appreciated.

With Thanks & regards
jeevitesh


From n.haigh at sheffield.ac.uk  Sun Jul 22 07:34:58 2007
From: n.haigh at sheffield.ac.uk (Nathan S Haigh)
Date: Sun, 22 Jul 2007 12:34:58 +0100
Subject: [Bioperl-l] Ohloh account for bioperl
In-Reply-To: <8f200b4c0707171249n193c5d3fi2aa1cb8b6c102ef7@mail.gmail.com>
References: <8f200b4c0707171249n193c5d3fi2aa1cb8b6c102ef7@mail.gmail.com>
Message-ID: <46A340E2.4040505@sheffield.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Steve Chervitz wrote:
> I came across a web app that tracks various metrics for open source
> projects, noticed that bioperl wasn't listed, and added it:
> 
> http://www.ohloh.net/projects/6685
> 
> Seems like an interesting resource that could help add some
> visibility. It creates metrics by directly processing the source code
> repository. I hooked it up to the CVS repos for bioperl-live, -db,
> -run, and -pipeline. It has yet to do its analysis at this point.
> 
> Feel free to create Ohloh accounts for yourselves. When you add
> yourself as a contributor to Bioperl, you can indicate the username
> associated with your commits, but this requires that it first process
> the commit logs to figure out what the usernames are. You can still
> create an account, just update it later with your username.
> 
> Steve

Nice to see the graphs of the number of commits each developer has made over
the last 5 years, and how new developers have arisen while the more
"seasoned" developers can relax a little more - proof of an excellent
open source project!

Nath
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGo0Dih5z4PPfwHQoRAua4AJ9nxDJeqAZIbyv0M3g+6Y2xWzkEEgCgnHBO
4JWvG5Gy+H/UqpeXYAcSCX0=
=LrFt
-----END PGP SIGNATURE-----

From cjfields at uiuc.edu  Sun Jul 22 23:53:48 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 22 Jul 2007 22:53:48 -0500
Subject: [Bioperl-l] run megablast
In-Reply-To: <m1r6n66or9.fsf@tuebingen.mpg.de>
References: <m1r6n66or9.fsf@tuebingen.mpg.de>
Message-ID: <1D20EBF8-3E8E-49DD-A66F-4690B4094F7C@uiuc.edu>

StandAloneBlast doesn't run the megablast executable directly, though I
think you can specify a MegaBlast search using blastall with the '-n'
flag.

We could probably add this functionality in fairly easily since  
SearchIO can parse megablast output; no one's had the need to code it  
yet.

chris

On Jul 18, 2007, at 4:32 AM, Georg Otto wrote:

>
> Hi,
>
> is there a module to run megablast in a script (equivalent to ncbi
> blast in StandAloneBlast.pm)?
>
> Cheers,
>
> Georg
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From jeevitesh at ibab.ac.in  Mon Jul 23 06:34:36 2007
From: jeevitesh at ibab.ac.in (jeevitesh at ibab.ac.in)
Date: Mon, 23 Jul 2007 16:04:36 +0530 (IST)
Subject: [Bioperl-l] Please Help us - regarding TreeIO for Shared Distance
Message-ID: <34830.192.168.1.125.1185186876.squirrel@webmail.ibab.ac.in>

Hi Friends,

We need your valuable help in finding the SHARED PATH BETWEEN TWO PAIRS OF NODES
OF A TREE.

The Distance method of TreeIO in the Bioperl module gives the total distance.

But we NEED TO FIND THE SHARED (COMMON) DISTANCE BETWEEN TWO PAIRS OF NODES, as
illustrated in the figure.

Suppose we have a tree
    A                C
     \              /
      \2          2/
       \__________/
       /    6     \
      /2          2\
     /              \
    B                D

The shared path between AB and AC is 2,
and for AC and BD the shared path is 6.

We need to find the shared distance as described above.

Kindly help us; it will help our research a lot.

With Thanks & regards
jeevitesh


From bix at sendu.me.uk  Mon Jul 23 07:08:23 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 23 Jul 2007 12:08:23 +0100
Subject: [Bioperl-l] Please Help us - regarding TreeIO for Shared
	Distance
In-Reply-To: <34830.192.168.1.125.1185186876.squirrel@webmail.ibab.ac.in>
References: <34830.192.168.1.125.1185186876.squirrel@webmail.ibab.ac.in>
Message-ID: <46A48C27.6060905@sendu.me.uk>

jeevitesh at ibab.ac.in wrote:
> Hi Friends,
> 
> We need your valuable help in finding the SHARED PATH BETWEEN TWO NODES PAIR OF
> A TREE.

Please stop sending this message. We heard you the first time. If no one 
answered, either no one knows the answer or no one understood you.


> The Distance method of TreeIO in Bioperl module gives the total distance.
> 
> But we NEED TO FIND THE SHARED(COMMON) DISTANCE BETWEEN TWO PAIR OF NODES as
> illustrated
> in figure.
> 
> Suppose we have a tree
>     A                C
>      \              /
>       \2          2/
>        \__________/
>        /    6     \
>       /2          2\
>      /              \
>     B                D
> 
> The shared path between AB and AC is 2.
> and for AC and BD the shared path is 6.

I don't follow. But if you already know how to work the answer out, 
describe the algorithm in words and maybe someone can code it up for you.
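
One possible reading of the figure, as a minimal sketch in plain Perl (no
BioPerl): the "shared path" of two node pairs is taken to be the summed
length of the branches that lie on both paths.  That definition is an
assumption inferred from the two worked answers (2 and 6), and X and Y are
hypothetical names for the two unlabeled internal nodes.

```perl
use strict;
use warnings;

# Toy tree from the figure; X and Y are hypothetical names for the two
# unlabeled internal nodes.  Each entry is [node, node, branch length].
my @edges = (
    [ 'A', 'X', 2 ], [ 'B', 'X', 2 ], [ 'X', 'Y', 6 ],
    [ 'C', 'Y', 2 ], [ 'D', 'Y', 2 ],
);

# Build an undirected adjacency map: $adj{$u}{$v} = branch length.
my %adj;
for my $edge (@edges) {
    my ( $u, $v, $len ) = @$edge;
    $adj{$u}{$v} = $len;
    $adj{$v}{$u} = $len;
}

# Return the branches on the unique path between two nodes as a hash ref
# keyed by "u|v" (endpoints sorted), with branch lengths as values.
sub path_edges {
    my ( $from, $to ) = @_;
    my @stack = ( [ $from, undef, {} ] );
    while ( my $item = pop @stack ) {
        my ( $node, $parent, $seen ) = @$item;
        return $seen if $node eq $to;
        for my $next ( keys %{ $adj{$node} } ) {
            next if defined $parent && $next eq $parent;
            my $key = join '|', sort { $a cmp $b } ( $node, $next );
            push @stack,
                [ $next, $node, { %$seen, $key => $adj{$node}{$next} } ];
        }
    }
    return {};    # no path found
}

# Sum the lengths of the branches that the two paths have in common.
sub shared_length {
    my ( $pair1, $pair2 ) = @_;
    my $path1 = path_edges(@$pair1);
    my $path2 = path_edges(@$pair2);
    my $total = 0;
    $total += $path1->{$_} for grep { exists $path2->{$_} } keys %$path1;
    return $total;
}

print shared_length( [ 'A', 'B' ], [ 'A', 'C' ] ), "\n";    # 2
print shared_length( [ 'A', 'C' ], [ 'B', 'D' ] ), "\n";    # 6
```

If this matches the intended definition, the same idea could be applied to a
parsed tree by walking from each node up to the common ancestor and
collecting the branches passed on the way.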


From georg.otto at tuebingen.mpg.de  Mon Jul 23 09:56:46 2007
From: georg.otto at tuebingen.mpg.de (Georg Otto)
Date: Mon, 23 Jul 2007 15:56:46 +0200
Subject: [Bioperl-l] run megablast
References: <m1r6n66or9.fsf@tuebingen.mpg.de>
	<1D20EBF8-3E8E-49DD-A66F-4690B4094F7C@uiuc.edu>
Message-ID: <m11weznrz5.fsf@tuebingen.mpg.de>

Thanks a lot! I guess I should have read the blast documentation more
carefully....

Best,

Georg

Chris Fields <cjfields at uiuc.edu> writes:
> StandAloneBlast doesn't run the megablast executable directly, though I
> think you can specify a MegaBlast search using blastall with the '-n'
> flag.
>
> We could probably add this functionality in fairly easily since  
> SearchIO can parse megablast output; no one's had the need to code it  
> yet.
>
> chris
>
> On Jul 18, 2007, at 4:32 AM, Georg Otto wrote:
>
>>
>> Hi,
>>
>> is there a module to run megablast in a script (equivalent to ncbi
>> blast in StandAloneBlast.pm)?
>>
>> Cheers,
>>
>> Georg
>>


From cjfields at uiuc.edu  Mon Jul 23 11:41:35 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 23 Jul 2007 10:41:35 -0500
Subject: [Bioperl-l] Bio::Assembly bug/feature?
Message-ID: <52744D70-CED6-49DB-8A17-0998F125D9AD@uiuc.edu>

To all:

I think I have found a major problem with Bio::Assembly; this was  
first noticed on Mac OS X in relation to bug 2320 and  
Bio::Assembly::IO.  I am uncertain whether this is meant to be a  
feature or a bug but it certainly needs to be documented or fixed as  
it leads to subtle errors.  I also can't see the advantage of this  
approach, but maybe I can be enlightened?  Either way, I think it's  
worth a discussion for those willing to follow.  I'll add as a bug  
later if needed.

A bit of background: each instance of a Bio::Assembly::Contig has a  
Bio::SeqFeature::Collection instance attached to it; each  
Bio::SeqFeature::Collection itself has a tied DB_File handle attached  
which remains open during the lifetime of the Bio::SF::Collection  
object.  When using Bio::Assembly one adds the various Contig objects  
to a Bio::Assembly::Scaffold.  So, for instance, if one had ~1000  
Contigs in a Scaffold, one would also have ~1000 open tied db  
handles, one per Contig instance.  So far, so good.

Unfortunately, when adding a ton of Contig objects to a  
Bio::Assembly::Scaffold one can run into a host of system-dependent  
issues based on resource usage limits (as one might expect).  This  
script:

------------------------------
use Bio::Assembly::Scaffold;
use Bio::Assembly::Contig;
use Bio::SeqFeature::Generic;

my $scaffold = Bio::Assembly::Scaffold->new();

for my $id (1..15000) {
     print "Contig #$id\n";
     my $contig = Bio::Assembly::Contig->new(-id => $id);
     my $feat = Bio::SeqFeature::Generic->new(-start=>1,
                                            -end=>10,
                                            -strand=>1);
     $contig->add_features([$feat]);
     $scaffold->add_contig($contig);
}
------------------------------

may fail on Mac OS X when one reaches the maximum number of open file
descriptors (on UNIX'y systems, this is 'ulimit -n'); the call to tie
the DB_File handle in SF::Collection fails silently, so when the handle
is used later on you get the following:

...
Contig #251
Contig #252
Contig #253
Contig #254
Can't call method "put" on an undefined value at /Users/cjfields/src/bioperl-live/Bio/SeqFeature/Collection.pm line 225.

I have added an exception to catch this.  On Mac OS X you can  
increase the file descriptor limit using ulimit, at least to a  
certain point.  However, when testing this out on dev.open-bio.org  
(Linux) the 'tie' sometimes fails (and the exception pops up), but it  
isn't dependent on 'ulimit -n'.  This is what happens more often:

...
Contig #10567
Contig #10568
Contig #10569
Contig #10570
Out of memory!

Sometimes followed by a seg fault.  Ick!
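
A minimal sketch of checking the result of tie instead of letting it fail
silently.  This is not the code committed to SF::Collection: it swaps in the
core SDBM_File module for DB_File so that it runs without Berkeley DB, and
open_store() is a hypothetical helper name.

```perl
use strict;
use warnings;
use Fcntl;                      # O_RDWR, O_CREAT
use File::Temp qw(tempdir);
use SDBM_File;

# Tie a hash to an on-disk file and fail loudly if the tie does not succeed.
sub open_store {
    my ($path) = @_;
    my %store;
    tie( %store, 'SDBM_File', $path, O_RDWR | O_CREAT, 0666 )
        or die "Could not tie '$path': $!\n";
    return \%store;
}

my $dir   = tempdir( CLEANUP => 1 );
my $store = open_store("$dir/features");
$store->{contig1} = '1..10';
print "contig1 => $store->{contig1}\n";
```

Dying at the point of the tie reports the real cause (for example, too many
open files) rather than a later "undefined value" error.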

Any ideas? For instance, should we set this up so that one  
SF::Collection is used for all the Contigs (since each one has a  
unique ID anyway)?  Leave as is and document/track the issue as a  
bug?  Both?

chris

From ba6450 at wayne.edu  Mon Jul 23 16:06:14 2007
From: ba6450 at wayne.edu (Munirul Islam)
Date: Mon, 23 Jul 2007 16:06:14 -0400 (EDT)
Subject: [Bioperl-l] error running codeml
Message-ID: <20070723160614.EEU90041@mirapointms6.wayne.edu>

Hello everyone:

I am new to bioperl.  I am running perl in Eclipse in Windows.  Here is the code:

[code]
use Bio::Tools::Run::Phylo::PAML::Codeml;
use Bio::AlignIO;
use Bio::TreeIO;

my $alignio = Bio::AlignIO->new(-format => 'phylip',
			         -file   => 'NM_000034.CDSalign.paml');

my $aln = $alignio->next_aln;

my $treeio = Bio::TreeIO->new(-format => 'newick', -file => 'tree.txt');
my $tree   = $treeio->next_tree;

my $codeml = Bio::Tools::Run::Phylo::PAML::Codeml->new();

$codeml->alignment($aln);
$codeml->tree($tree);

my ($rc,$parser) = $codeml->run();
my $result = $parser->next_result;
my $MLmatrix = $result->get_MLmatrix();
print "Ka = ", $MLmatrix->[0]->[1]->{'dN'},"\n";
print "Ks = ", $MLmatrix->[0]->[1]->{'dS'},"\n";
print "Ka/Ks = ", $MLmatrix->[0]->[1]->{'omega'},"\n";
[/code]

It gives the following error when I try to compile:

[error]
------------ EXCEPTION: Bio::Root::Exception -------------
MSG: unable to find or run executable for 'codeml'
STACK: Error::throw
STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359
STACK: C:/Perl/site/lib/Bio/Tools/Run/Phylo/PAML/Codeml.pm:572
-----------------------------------------------------------
Can't remove directory C:\DOCUME~1\MUNIRU~1\LOCALS~1\Temp\SqSqwJKDLI (Permission denied) at C:/Perl/lib/File/Temp.pm line 898
[/error]

Any idea, guys?

Munirul Islam
Phd Student
Computer Science
Wayne State University

From arareko at campus.iztacala.unam.mx  Mon Jul 23 17:19:24 2007
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Mon, 23 Jul 2007 16:19:24 -0500
Subject: [Bioperl-l] error running codeml
In-Reply-To: <20070723160614.EEU90041@mirapointms6.wayne.edu>
References: <20070723160614.EEU90041@mirapointms6.wayne.edu>
Message-ID: <46A51B5C.9080808@campus.iztacala.unam.mx>

Apparently, your script isn't able to locate the codeml executable in 
your Windows environment. Do you have the PAML package installed? 
Instructions on how to install it are located here:

http://abacus.gene.ucl.ac.uk/software/paml.html

Regards,
Mauricio.

Munirul Islam wrote:
> Hello everyone:
> 
> I am new to bioperl.  I am running perl in Eclipse in Windows.  Here is the code:
> 
> [code]
> use Bio::Tools::Run::Phylo::PAML::Codeml;
> use Bio::AlignIO;
> use Bio::TreeIO;
> 
> my $alignio = Bio::AlignIO->new(-format => 'phylip',
> 			         -file   => 'NM_000034.CDSalign.paml');
> 
> my $aln = $alignio->next_aln;
> 
> my $treeio = Bio::TreeIO->new(-format => 'newick', -file => 'tree.txt');
> my $tree   = $treeio->next_tree;
> 
> my $codeml = Bio::Tools::Run::Phylo::PAML::Codeml->new();
> 
> $codeml->alignment($aln);
> $codeml->tree($tree);
> 
> my ($rc,$parser) = $codeml->run();
> my $result = $parser->next_result;
> my $MLmatrix = $result->get_MLmatrix();
> print "Ka = ", $MLmatrix->[0]->[1]->{'dN'},"\n";
> print "Ks = ", $MLmatrix->[0]->[1]->{'dS'},"\n";
> print "Ka/Ks = ", $MLmatrix->[0]->[1]->{'omega'},"\n";
> [/code]
> 
> It gives the following error when I try to compile:
> 
> [error]
> ------------ EXCEPTION: Bio::Root::Exception -------------
> MSG: unable to find or run executable for 'codeml'
> STACK: Error::throw
> STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359
> STACK: C:/Perl/site/lib/Bio/Tools/Run/Phylo/PAML/Codeml.pm:572
> -----------------------------------------------------------
> Can't remove directory C:\DOCUME~1\MUNIRU~1\LOCALS~1\Temp\SqSqwJKDLI (Permission denied) at C:/Perl/lib/File/Temp.pm line 898
> [/error]
> 
> Any idea, guys?
> 
> Munirul Islam
> Phd Student
> Computer Science
> Wayne State University
> 

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From ba6450 at wayne.edu  Mon Jul 23 19:53:22 2007
From: ba6450 at wayne.edu (Munirul Islam)
Date: Mon, 23 Jul 2007 19:53:22 -0400 (EDT)
Subject: [Bioperl-l] error running codeml
Message-ID: <20070723195322.EEV22403@mirapointms6.wayne.edu>

Thanks Mauricio. 

I needed to add an environment variable for the PAML directory. 

$ENV{'PAMLDIR'} = 'c:\paml3.15\bin'; 

One question ... I would like to save the temp files.  So, what modification do I need to make such that 
$obj->save_tempfiles returns 1 within codeml.pm? 

Regards 

Munir
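
For readers hitting the same exception, here is a minimal sketch of the fix described above. It assumes the PAML binaries live in the directory quoted in the message; the helper name codeml_candidate is made up for illustration. Setting the variable inside a BEGIN block is a cautious choice, on the assumption that the wrapper may read PAMLDIR when it is loaded.

```perl
use strict;
use warnings;
use File::Spec;

# Set PAMLDIR before the Codeml wrapper is loaded. The path is the one
# from the message above; adjust it to your own PAML install.
BEGIN { $ENV{PAMLDIR} = 'c:\paml3.15\bin' unless defined $ENV{PAMLDIR}; }

# Hypothetical helper: build the path where the codeml binary is expected.
sub codeml_candidate {
    my ($dir) = @_;
    my $exe = $^O =~ /mswin/i ? 'codeml.exe' : 'codeml';
    return File::Spec->catfile($dir, $exe);
}

my $candidate = codeml_candidate($ENV{PAMLDIR});
if (-e $candidate) {
    print "codeml found at $candidate\n";
} else {
    warn "no codeml at $candidate; check PAMLDIR\n";
}

# After this check, load the wrapper as in the original script:
# require Bio::Tools::Run::Phylo::PAML::Codeml;
```

Checking for the file first gives a clearer message than the Bio::Root::Exception when the directory is wrong.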

---- Original message ----
>Date: Mon, 23 Jul 2007 16:19:24 -0500
>From: Mauricio Herrera Cuadra <arareko at campus.iztacala.unam.mx>  
>Subject: Re: [Bioperl-l] error running codeml  
>To: Munirul Islam <ba6450 at wayne.edu>
>Cc: bioperl-l at lists.open-bio.org
>
>Apparently, your script isn't able to locate the codeml executable in 
>your Windows environment. Do you have the PAML package installed? 
>Instructions on how to install it are located here:
>
>http://abacus.gene.ucl.ac.uk/software/paml.html
>
>Regards,
>Mauricio.


From jason at bioperl.org  Tue Jul 24 03:19:18 2007
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 24 Jul 2007 09:19:18 +0200
Subject: [Bioperl-l] error running codeml
In-Reply-To: <46A51B5C.9080808@campus.iztacala.unam.mx>
References: <20070723160614.EEU90041@mirapointms6.wayne.edu>
	<46A51B5C.9080808@campus.iztacala.unam.mx>
Message-ID: <8273f6c20707240019q1f5e55c9i79a3142a92e2be6e@mail.gmail.com>

When you initialize the Codeml object, just pass in
my $codeml = Bio::Tools::Run::Phylo::PAML::Codeml->new(-save_tempfiles => 1);

OR do
$codeml->save_tempfiles(1);

You may want to set your TEMPDIR as well, and you can print out where the
tempdir is located with
print $codeml->tempdir;
and I think you can get the name of the temp outfile:
my $name = $codeml->outfile_name;
print "name is $name\n";

-jason
On 7/23/07, Mauricio Herrera Cuadra <arareko at campus.iztacala.unam.mx> wrote:
>
> Apparently, your script isn't able to locate the codeml executable in
> your Windows environment. Do you have the PAML package installed?
> Instructions on how to install it are located here:
>
> http://abacus.gene.ucl.ac.uk/software/paml.html
>
> Regards,
> Mauricio.


-- 
Jason Stajich
jason at bioperl.org
http://bioperl.org/wiki/User:Jason


From ba6450 at wayne.edu  Tue Jul 24 17:16:54 2007
From: ba6450 at wayne.edu (Munirul Islam)
Date: Tue, 24 Jul 2007 17:16:54 -0400 (EDT)
Subject: [Bioperl-l] error loading sequence
Message-ID: <20070724171654.EEX04380@mirapointms6.wayne.edu>

Hello everyone:

I am having a problem loading a sequence file from within a directory.  

#############################################################
$dirname = "rundir";
opendir (DIR, $dirname) || die("can't open $dirname");
      
while (defined($file = readdir(DIR))) {
    next if $file =~ /^\.\.?$/;		# skip . and ..
    $abs_path = File::Spec->rel2abs( $file ) ;
    
    # gives a file not found exception for the following code
    my $alignio = Bio::AlignIO->new(-format => 'nexus',
				-file   => $abs_path);
    my $aln = $alignio->next_aln;
    @sequencenames -> $aln->_read_taxlabels;
	  		
    foreach $taxa (@sequencenames) {
	print $taxa . "\n";
    } 		
}        
#############################################################

Your suggestions please.

Regards,

Munirul Islam
PhD Student
Computer Science
Wayne State University
Detroit, Michigan, USA

From bix at sendu.me.uk  Tue Jul 24 18:39:33 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 24 Jul 2007 23:39:33 +0100
Subject: [Bioperl-l] error loading sequence
In-Reply-To: <20070724171654.EEX04380@mirapointms6.wayne.edu>
References: <20070724171654.EEX04380@mirapointms6.wayne.edu>
Message-ID: <46A67FA5.3070505@sendu.me.uk>

Munirul Islam wrote:
> Hello everyone:
> 
> I am having a problem loading a sequence file from within a directory.  
> 
> #############################################################
> $dirname = "rundir";
> opendir (DIR, $dirname) || die("can't open $dirname");
>       
> while (defined($file = readdir(DIR))) {
>     next if $file =~ /^\.\.?$/;		# skip . and ..
>     $abs_path = File::Spec->rel2abs( $file ) ;
>     
>     # gives a file not found exception for the following code

This isn't a Bioperl problem. You're using the wrong File::Spec method. 
You want File::Spec->catfile($dirname, $file).
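
The reason is that readdir returns bare file names, and rel2abs resolves a bare name against the current working directory rather than against $dirname. A minimal sketch of the corrected loop follows; the temporary directory and the file name example.nex are only there so the sketch runs anywhere, and are not from the original script.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Build a throwaway directory with one file so the sketch is self-contained.
my $dirname = tempdir(CLEANUP => 1);
my $sample  = File::Spec->catfile($dirname, 'example.nex');
open(my $out, '>', $sample) or die "can't write $sample: $!";
print {$out} "#NEXUS\n";
close($out);

opendir(my $dh, $dirname) or die "can't open $dirname: $!";
my @paths;
while (defined(my $file = readdir($dh))) {
    next if $file =~ /^\.\.?$/;    # skip . and ..

    # Join the bare name to the directory it came from.
    my $path = File::Spec->catfile($dirname, $file);
    push @paths, $path if -f $path;
}
closedir($dh);

print "$_\n" for @paths;
# Each path can now be handed to Bio::AlignIO->new(-file => $path, ...).
```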

From ba6450 at wayne.edu  Tue Jul 24 20:10:04 2007
From: ba6450 at wayne.edu (Munirul Islam)
Date: Tue, 24 Jul 2007 20:10:04 -0400 (EDT)
Subject: [Bioperl-l] error loading sequence
Message-ID: <20070724201004.EEX30791@mirapointms6.wayne.edu>

Thanks.  That worked nicely.  I need your suggestion to load codeml control data from a file.  Consider the following code:

-------------------------------------------------------------
my $codeml_null = Bio::Tools::Run::Phylo::PAML::Codeml->new(-save_tempfiles => 1,
-params =>	{'noisy' => 9,
		 'verbose' => 2,
		 'runmode' => 0,
		 'seqtype' => 1,
		 'CodonFreq' => 2,
		 'aaDist' => 0,
		 'model' => 2,
		 'NSsites' => 2,
		 'icode' => 0	});
-------------------------------------------------------------

I tried to modify it by passing a hash reference after loading the data from a file:

-------------------------------------------------------------
my $codeml_null = Bio::Tools::Run::Phylo::PAML::Codeml->new(-save_tempfiles => 1,
-params => \%hashlist );
-------------------------------------------------------------

That still didn't work.  Your suggestions, please.

Munir
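
The message does not show how %hashlist was filled, so the following is only a sketch of one way to do it. It assumes the file holds "name = value" lines in the style of a codeml control file, with "*" starting a comment; the helper name read_ctl_params is hypothetical. Stripping whitespace and comments matters here, since a key such as "noisy " or a value with a trailing newline would not equal the literal keys and values in the hand-written hash.

```perl
use strict;
use warnings;

# Hypothetical helper: parse "name = value" lines into a hash reference.
# Assumes codeml.ctl-style input where "*" starts a comment.
sub read_ctl_params {
    my ($fh) = @_;
    my %params;
    while (my $line = <$fh>) {
        $line =~ s/\*.*//;    # drop trailing comments
        next unless $line =~ /^\s*(\S+)\s*=\s*(.*?)\s*$/;
        $params{$1} = $2;
    }
    return \%params;
}

# Demo on an in-memory file holding the same settings as the message above.
my $ctl = <<'END';
      noisy = 9   * amount of screen output
    verbose = 2
    runmode = 0
    seqtype = 1   * 1:codons
  CodonFreq = 2
     aaDist = 0
      model = 2
    NSsites = 2
      icode = 0
END
open(my $fh, '<', \$ctl) or die "can't open in-memory file: $!";
my $params = read_ctl_params($fh);
close($fh);

printf "%-10s => %s\n", $_, $params->{$_} for sort keys %$params;

# $params is already a reference, so it is passed as-is:
# Bio::Tools::Run::Phylo::PAML::Codeml->new(-save_tempfiles => 1,
#                                           -params         => $params);
```

Printing the parsed hash before building the Codeml object makes it easy to compare against the hand-written version.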


From ba6450 at wayne.edu  Thu Jul 26 15:21:20 2007
From: ba6450 at wayne.edu (Munirul Islam)
Date: Thu, 26 Jul 2007 15:21:20 -0400 (EDT)
Subject: [Bioperl-l] Alignment works with codeml (not loading in bioperl)
Message-ID: <20070726152120.EFA94600@mirapointms6.wayne.edu>

Hello Everyone:

I have an alignment ('seq.txt').  It runs fine when I directly run codeml.  But when I try to load the alignment in bioperl, I get an error saying that the sequence is not interleaved.

my $alignio = Bio::AlignIO->new(-format => 'phylip',
				-file   => 'seq.txt');

I guess it's not in valid phylip format.

I tried to change 'seq.txt' to sequential format.  Still that didn't work.

Any suggestions on how to load 'seq.txt' in bioperl?  

Thanks,

Munir
PhD Student
Computer Science
Wayne State University
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: seq.txt
Url: http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070726/7c180f0b/attachment-0001.txt 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: seq.out
Type: application/octet-stream
Size: 24318 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070726/7c180f0b/attachment-0001.obj 

From jason at bioperl.org  Thu Jul 26 20:12:03 2007
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 26 Jul 2007 17:12:03 -0700
Subject: [Bioperl-l] Alignment works with codeml (not loading in bioperl)
In-Reply-To: <20070726152120.EFA94600@mirapointms6.wayne.edu>
References: <20070726152120.EFA94600@mirapointms6.wayne.edu>
Message-ID: <8273f6c20707261712o149fb884v2044421146e8bc24@mail.gmail.com>

You can try and pass in -interleaved => 0 as another option when you
init your AlignIO object.
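
Spelled out, that would look roughly like the sketch below. It assumes BioPerl is installed and that 'seq.txt' (the file name from the original message) is in the current directory; when either is missing it only prints a warning. Whether the parser accepts this particular file with the option set is not something the sketch can promise.

```perl
use strict;
use warnings;

# Options for a sequential (non-interleaved) PHYLIP file.
my $file = 'seq.txt';
my %args = (
    -format      => 'phylip',
    -file        => $file,
    -interleaved => 0,
);

# Only attempt the parse when the input file and BioPerl are both present.
if (-e $file && eval { require Bio::AlignIO; 1 }) {
    my $alignio = Bio::AlignIO->new(%args);
    my $aln     = $alignio->next_aln;
    if ($aln) {
        my @seqs = $aln->each_seq;
        printf "%d sequences, %d columns\n", scalar(@seqs), $aln->length;
    } else {
        warn "no alignment could be read from $file\n";
    }
} else {
    warn "skipping: need Bio::AlignIO and $file\n";
}
```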

On 7/26/07, Munirul Islam <ba6450 at wayne.edu> wrote:
> Hello Everyone:
>
> I have an alignment ('seq.txt').  It runs fine when I directly run codeml.  But when I try to load the alignment in bioperl, I get an error saying that the sequence is not interleaved.
>
> my $alignio = Bio::AlignIO->new(-format => 'phylip',
>                                 -file   => 'seq.txt');
>
> I guess it's not in valid phylip format.
>
> I tried to change 'seq.txt' to sequential format.  Still that didn't work.
>
> Any suggestions on how to load 'seq.txt' in bioperl?
>
> Thanks,
>
> Munir
> PhD Student
> Computer Science
> Wayne State University
>
>      11     2202
>
> human
> GCT GCC CCC AGA CAG CGC --- CAG TCC --- ACC CTT --- GTG CTC TTC CCT GGA GAC CTG
> AGA ACC NNN AAT CTC ACC GAC AGG CAG CTG GCA GAG GAA TAC CTG TAC CGC TAT GGT TAC
> ACT CGG GTG GCA GAG ATG CGT --- GGA GAG TCG AAA --- TCT CTG GGG --- CCT GCG CTG
> CTG CTT CTC CAG AAG CAA CTG TCC CTG CCC GAG ACC GGT GAG CTG GAT AGC GCC ACG CTG
> AAG GCC ATG CGA ACC CCA CGG TGC GGG GTC CCA GAC CTG GGC AGA TTC CAA ACC TTT GAG
> GGC GAC CTC AAG TGG CAC CAC CAC AAC ATC ACC TAT TGG ATC CAA AAC TAC TCG GAA GAC
> TTG CCG CGG GCG GTG ATT GAC GAC GCC TTT GCC CGC GCC TTC GCA CTG TGG AGC GCG GTG
> ACG CCG CTC ACC TTC ACT CGC GTG TAC AGC CGG GAC GCA GAC ATC GTC ATC CAG TTT GGT
> GTC GCG GAG CAC GGA GAC GGG TAT CCC TTC GAC GGG AAG GAC GGG CTC CTG GCA CAC GCC
> TTT CCT CCT GGC CCC GGC ATT CAG GGA GAC GCC CAT TTC GAC GAT GAC GAG TTG TGG TCC
> CTG GGC AAG GGC GTC GTG GTT CCA ACT CGG TTT GGA AAC GCA GAT GGC GCG GCC TGC CAC
> TTC CCC TTC ATC TTC GAG GGC CGC TCC TAC TCT GCC TGC ACC ACC GAC GGT CGC TCC GAC
> GGC TTG CCC TGG TGC AGT ACC ACG GCC AAC TAC GAC ACC GAC GAC CGG TTT GGC TTC TGC
> CCC AGC GAG AGA CTC TAC ACC CAG GAC GGC AAT GCT GAT GGG AAA CCC TGC CAG TTT CCA
> TTC ATC TTC CAA GGC CAA TCC TAC TCC GCC TGC ACC ACG GAC GGT CGC TCC GAC GGC TAC
> CGC TGG TGC GCC ACC ACC GCC AAC TAC GAC CGG GAC AAG CTC TTC GGC TTC TGC CCG ACC
> CGA GCT GAC TCG ACG GTG ATG GGG GGC AAC TCG GCG GGG GAG CTG TGC GTC TTC CCC TTC
> ACT TTC CTG GGT AAG GAG TAC TCG ACC TGT ACC AGC GAG GGC CGC GGA GAT GGG CGC CTC
> TGG TGC GCT ACC ACC TCG AAC TTT GAC AGC GAC AAG AAG TGG GGC TTC TGC CCG GAC CAA
> GGA TAC AGT TTG TTC CTC GTG GCG GCG CAT GAG TTC GGC CAC GCG CTG GGC TTA GAT CAT
> TCC TCA GTG CCG GAG GCG CTC ATG TAC CCT ATG TAC CGC TTC ACT GAG GGG CCC CCC TTG
> CAT AAG GAC GAC GTG AAT GGC ATC CGG CAC CTC TAT GGT CCT CGC CCT GAA CCT GAG CCA
> CGG CCT CCA ACC ACC ACC ACN NNN NNN NNN NNN NNA --- CCG CAG CCC ACG GCT CCC CCG
> ACG GTC TGC CCC ACC GGA CCC CCC ACT GTC CAC CCC TCA GAG CGC CCC ACA GCT GGC CCC
> ACA GGT CCC CCC TCA GCT GGC CCC ACA GGT NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN
> NNN NNN NNN NNN NNN NNN CCC --- CCC ACT GCT GGC CCT TCT ACG GCC ACT ACT --- GTG
> CCT TTG AGT CCG GTG GAC GAT GCC TGC AAC GTG AAC ATC TTC GAC GCC ATC GCG GAG ATT
> GGG AAC CAG CTG TAT TTG TTC AAG GAT GGG AAG TAC TGG --- --- CGA TTC TCT GAG ---
> --- GGC AGG GGG AGC CGG CCG CAG GGC CCC TTC CTT ATC GCC GAC AAG TGG CCC GCG CTG
> CCC CGC AAG CTG GAC TCG GTC TTT GAG GAG CGG CTC TCC AAG AAG CTT TTC TTC TTC TCT
> GGG CGC CAG GTG TGG GTG TAC ACA GGC GCG TCG GTG CTG GGC CCG AGG CGT CTG GAC AAG
> CTG GGC CTG GGA GCC GAC GTG GCC CAG GTG ACC GGG GCC CTC --- CGG AGT GGC AGG GGG
> AAG ATG CTG CTG TTC AGC GGG CGG CGC CTC TGG AGG TTC GAC GTG AAG GCG CAG ATG GTG
> GAT CCC CGG AGC GCC AGC GAG --- --- GTG GAC CGG ATG TTC CCC GGG GTG CCT TTG GAC
> ACG CAC GAC GTC TTC CAG TAC CGA --- GAG AAA GCC TAT --- TTC TGC CAG GAC CGC TTC
> TAC TGG CGC GTG AGT TCC CGG AGT NNN NNN NNN NNN NNN NNN NNN GAG TTG AAC CAG GTG
> GAC CAA GTG GGC TAC GTG ACC TAT GAC ATC CTG CAG TGC CCT
> chimp
> GCT GCC CCC AGA CAG CGC --- CAG TCC --- ACC CTT --- GTG CTC TTC CCT GGA GAC CTG
> AGA ACC NNN AAT CTC ACC GAC AGG CAG CTG GCA GAG GAA TAC CTG TAC CGC TAT GGT TAC
> ACT CGG GTG GCA GAG ATG CGT --- GGA GAG TCG AAA --- TCT CTG GGG --- CCT GCG CTG
> CTG CTT CTC CAG AAG CAA CTG TCC CTG CCC GAG ACC GGT GAG CTG GAC AGC GCC ACG CTG
> AAG GCC ATG CGA ACC CCA CGG TGC GGG GTC CCA GAC CTG GGC AAA TTC CAA ACC TTT GAG
> GGC GAC CTC AAG TGG CAC CAC CAC AAC ATC ACC TAT TGG ATC CAA AAC TAC TCG GAA GAC
> TTG CCG CGG GCG GTG ATT GAC GAC GCC TTT GCC CGC GCC TTC GCA CTG TGG AGC GCG GTG
> ACG CCG CTC ACC TTC ACT CGC GTG TAC AGC CGG GAC GCA GAC ATC GTC ATC CAG TTT GGT
> GTC GCG GAG CAC GGA GAC GGG TAT CCC TTC GAC GGG AAG GAC GGG CTC CTG GCA CAC GCC
> TTT CCT CCT GGC CCC GGC ATT CAG GGA GAC GCC CAT TTC GAC GAT GAC GAG TTG TGG TCC
> CTG GGC AAG GGC GTC GTG GTT CCA ACT CGG TTT GGA AAC GCA GAT GGC GCG GCC TGC CAC
> TTC CCC TTC ATC TTC GAG GGC CGC TCC TAC TCT GCC TGC ACC ACC GAC GGT CGC TCC GAC
> GGC TTG CCC TGG TGC AGT ACC ACG GCC AAC TAC GAC ACC GAC GAC CGG TTT GGC TTC TGC
> CCC AGC GAG AGA CTT TAC ACC CAG GAT GGC AAT GCT GAT GGG AAA CCC TGC CAG TTT CCA
> TTC ATC TTC CAA GGC CAA TCC TAC TCC GCC TGC ACC ACG GAC GGT CGC TCC GAC GGC TAC
> CGC TGG TGC GCC ACC ACC GCC AAC TAC GAC CGG GAC AAG CTC TTC GGC TTC TGC CCG ACC
> CGA GCT GAC TCG ACG GTG ATG GGG GGC AAC TCG GCG GGG GAG CTG TGC GTC TTC CCC TTC
> ACT TTC CTG GGT AAG GAG TAC TCG ACC TGT ACC AGC GAG GGC CGC GGA GAT GGG CGC CTC
> TGG TGC GCT ACC ACC TCG AAC TTT GAC AGC GAC AAG AAG TGG GGC TTC TGC CCG GAC CAA
> GGA TAC AGT TTG TTC CTC GTG GCG GCG CAT GAG TTC GGC CAC GCG CTG GGC TTA GAT CAT
> TCC TCA GTG CCG GAG GCG CTC ATG TAC CCT ATG TAC CGC TTC ACT GAG GGG CCC CCC TTG
> CAT AAG GAC GAC GTG AAT GGC ATC CGG CAC CTC TAT GGT CCT CGC CCT GAA CCT GAG CCA
> CGG CCT CCA ACC ACC ACC ACN NNN NNN NNN NNN NNA --- CCG CAG CCC ACG GCT CCC CCG
> ACG GTC TGC CCC ACC GGA CCC CCC ACT GTC CGC CCC TCA GAG CGC CCC ACA GCT GGC CCC
> ACA GGT CCC CCC NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN
> NNN NNN NNN NNN NNN NNN NNN --- NNN ACT GCT GGC CCT TCT ACG GCC ACT ACT --- GTG
> CCT TTG AGT CCG GTG GAC GAT GCC TGC AAC GTG AAC ATC TTC GAC GCC ATC GCG GAG ATT
> GGG AAC CAG CTG TAT TTG TTC AAG GAT GGG AAG TAC TGG --- --- CGA TTC TCT GAG ---
> --- GGC AGG GGG AGC CGG CCG CAG GGC CCC TTC CTT ATC GCC GAC AAG TGG CCC GCG CTG
> CCC CGC AAG CTG GAC TCG GTC TTT GAG GAG CCG CTC TCC AAG AAG CTT TTC TTC TTC TCT
> GGG CGC CAG GTG TGG GTG TAC ACA GGC GCG TCG GTG CTG GGC CCG AGG CGT TTG GAC AAG
> CTG GGC CTG GGC GCC GAC GTG GCC CAG GTG ACC GGG GCC CTC --- CGG AGT GGC AGG GGG
> AAG ATG CTG CTG TTC AGC GGG CGG CGC CTC TGG AGG TTC GAC GTG AAG GCG CAG ATG GTG
> GAT CCC CGG AGC GCC AGC GAG --- --- GTG GAC CGG ATG TTC CCC GGG GTG CCT TTG GAC
> ACG CAC GAC GTC TTC CAG TAC CGA --- GAG AAA GCC TAT --- TTC TGC CAG GAC CGC TTC
> TAC TGG CGC GTG AGT TCC CGG AAT NNN NNN NNN NNN NNN NNN NNN GAG TTG AAC CAG GTG
> GAC CAA GTG GGC TAC GTG ACC TAT GAC ATC CTG CAG TGC CCT
> macaca
> GCT GCC CCC AGA CAG CGC --- CAG TCC --- ACC CTT --- GTG CTC TTC CCT GGA GAC CTG
> AAA ACC NNN AAT CTC ACT GAC AGG CAG CTG GCA GAG GAC TAC CTG TAC CGC TAT GGT TAC
> ACT CGG GTG GCA GAG ATG CAT --- GGA GAC TCG AAA --- TCT CTG GGG --- CCT GCG CTG
> CTG CTT CTC CAG AAG CAA CTG TCC CTG CCC CAG ACC GGT GAG CTA GAC AGC GCC ACG CTG
> AAG GCC ATG CGA ACC CCA CGG TGC GGG GTC CCA GAC CTG GGC AGA TTC CAA ACC TTT GAG
> GGC GAC CTC AAG TGG CAC CAC CAC AAC ATC ACC TAT TGG ATC CAA AAC TAC TCG GAA GAC
> TTG CCG CGG GCG GTG ATT GAA GAC GCC TTT GCC CGC GCC TTT GCG CTG TGG AGC GCG GTG
> ACG CCG CTC ACC TTC ACT CGC GTG TAC AGC CGG GAC GCA GAC ATT GTC ATC CAG TTT GGT
> GTC GCG GAG CAC GGA GAC GGG TAT CCC TTC GAC GGG AAG GAC GGG CTC CTG GCA CAC GCC
> TTT CCT CCT GGG CCC GGC ATT CAG GGA GAC GCC CAT TTC GAC GAT GAC GAG TTG TGG TCG
> CTG GGC AAG GGC GTC GTG GTT CCA ACT AAG TTT GGA AAC GCA GAT GGC GCG GCC TGC CAC
> TTC CCC TTC ACC TTC GAG GGC CGC TCC TAC TCT GCC TGC ACC ACA GAC GGT CGC TCC GAC
> GGC GTG CCC TGG TGC AGT ACC ACA GCC AAC TAC GAC ACT GAC CGC CGG TTT GGC TTC TGT
> CCC AGC GAG AGA CTC TAC ACC CAG GAC GGC AAT GCT GAC GGG AAA CCC TGC CAG TTT CCA
> TTC ATC TTC CAA GGC CAA TCC TAC TCC GCC TGC ACC ACG GAC GGT CGC TCC GAC GGC TAC
> CGC TGG TGC GCC ACC ACC GCC AAC TAC GAC CAG GAC AAG CTC TAC GGC TTC TGC CCG ACC
> CGA GCC GAC TCG ACC GTG ATC GGG GGC AAC TCG GCG GGG GAG CTG TGC GTT TTC CCC TTC
> ACC TTC CTG GGT AAG GAG TAC TCG ACC TGT ACC AGC GAG GGC CGC GGA GAT GGG CGC CTC
> TGG TGC GCT ACC ACC TCG AAC TTT GAC AGA GAC AAG AAG TGG GGC TTC TGC CCG GAC CAA
> GGA TAC AGT CTG TTC CTC GTG GCA GCT CAC GAA TTC GGC CAC GCG CTG GGC TTA GAT CAT
> ACC TCA GTG CCG GAG GCG CTC ATG TAC CCT ATG TAC CGA TTC ACT GAG GAG CCC CCC TTG
> CAT AAG GAC GAC GTG AAT GGC ATC CAG TAT CTC TAT GGT TCT CGC CCT GAA CCT GAG CCA
> CGG CCT CCA ACC ACC ACC ACN NNN NNN NNN NNN NNA --- CCG CAG CCC ACG GCT CCC CCG
> ACG GTC TGC CCC ACT GGA CCC CCC ACT GTC CGC CCC TCA GAC CGC CCC ACA GCC GGC CCC
> ACA GGT CCC CCC TCA GCT GGC CCC ACA GGT NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN
> NNN NNN NNN NNN NNN NNN CCC --- CCC ACT GCT GGC CCT TCT ACG ACC ACT ACT --- GTG
> CCT TTG AAT CCG GTG GAC GAT GCC TGC AAC GTG AAC ATC TTC GAC GCC ATC ACG GAG ATC
> GGG AAC CAG CTG TAT CTG TTC AAG GAT GGG AGG TAC TGG --- --- CGA TTC TCC GAG ---
> --- CGC AGG GGG AGC CGG CTG CAG GGC CCC TTC CTT ATC GCC GAC ACG TGG CCC GCG TTG
> CCC CGC AAG CTG GAC TCG GCC TTT GAG GAG CCG CTC TCC AAG AAG CTT TTC TTC TTC TCT
> GGG CGC CAG GTG TGG GTG TAC ACA GGC GCG TCG GTG CTG GGC CCG AGG CGT CTA GAC AAG
> CTG GGC CTG GGC GCC GAC GTG GCC CAG GTG ACC GGG GCC CTC --- CGG CGT GGC GCG GGG
> AAG ATG CTG CTA TTC AGC GGG CGG CGC TTC TGG AGG TTC GAC GTG AAG GCG CAG ATG GTG
> GAT CCC CGG AGC GCC AGC GAG --- --- GTA GAC CGG ATG TTC CCC GGG GTG CCT TTG GAC
> ACG CAC GAC GTC TTC CAG TAC CAA --- GAG AAA GCC TAT --- TTC TGC CAG GAC CGC TTC
> TAC TGG CGC GTG AGT TTC CAG AGT NNN NNN NNN NNN NNN NNN NNN GGG GTG AAC CAG GTG
> GAC CAA GTG GGC TAC GTG ACC TAT GAC ATC CTG CAG TGC CCT
> mouse
> GCT GCC CCT TAC CAG CGC --- CAG CCG --- ACT TTT --- GTG GTC TTC CCC AAA GAC CTG
> AAA ACC TCC AAC CTC ACG GAC ACC CAG CTG GCA GAG GCA TAC TTG TAC CGC TAT GGT TAC
> ACC CGG GCC GCC CAG ATG ATG --- GGA GAG AAG CAG --- TCT CTA CGG --- CCG GCT TTG
> CTG ATG CTT CAG AAG CAG CTC TCC CTG CCC CAG ACT GGT GAG CTG GAC AGC CAG ACA CTA
> AAG GCC ATT CGA ACA CCA CGC TGT GGT GTC CCA GAC GTG GGT CGA TTC CAA ACC TTC AAA
> GGC NNN CTC AAG TGG GAC CAT CAT AAC ATC ACA TAC TGG ATC CAA AAC TAC TCT GAA GAC
> TTG CCG CGA GAC ATG ATC GAT GAC GCC TTC GCG CGC GCC TTC GCG GTG TGG GGC GAG GTG
> GCA CCC CTC ACC TTC ACC CGC GTG TAC GGA CCC GAA GCG GAC ATT GTC ATC CAG TTT GGT
> GTC GCG GAG CAC GGA GAC GGG TAT CCC TTC GAC GGC AAG GAC GGC CTT CTG GCA CAC GCC
> TTT CCC CCT GGC GCC GGC GTT CAG GGA GAT GCC CAT TTC GAC GAC GAC GAG TTG TGG TCG
> CTG GGC AAA GGC GTC GTG ATC CCC ACT TAC TAT GGA AAC TCA AAT GGT GCC CCA TGT CAC
> TTT CCC TTC ACC TTC GAG GGA CGC TCC TAT TCG GCC TGC ACC ACA GAC GGC CGC AAC GAC
> GGC ACG CCT TGG TGT AGC ACA ACA GCT GAC TAC GAT AAG GAC GGC AAA TTT GGT TTC TGC
> CCT AGT GAG AGA CTC TAC ACG GAG CAC GGC AAC GGA GAA GGC AAA CCC TGT GTG TTC CCG
> TTC ATC TTT GAG GGC CGC TCC TAC TCT GCC TGC ACC ACT AAA GGC CGC TCG GAT GGT TAC
> CGC TGG TGC GCC ACC ACA GCC AAC TAT GAC CAG GAT AAA CTG TAT GGC TTC TGC CCT ACC
> CGA GTG GAC GCG ACC GTA GTT GGG GGC AAC TCG GCA GGA GAG CTG TGC GTC TTC CCC TTC
> GTC TTC CTG GGC AAG CAG TAC TCT TCC TGT ACC AGC GAC GGC CGC AGG GAT GGG CGC CTC
> TGG TGT GCG ACC ACA TCG AAC TTC GAC ACT GAC AAG AAG TGG GGT TTC TGT CCA GAC CAA
> GGG TAC AGC CTG TTC CTG GTG GCA GCG CAC GAG TTC GGC CAT GCA CTG GGC TTA GAT CAT
> TCC AGC GTG CCG GAA GCG CTC ATG TAC CCG CTG TAT AGC TAC CTC GAG GGC TTC CCT CTG
> AAT AAA GAC GAC ATA GAC GGC ATC CAG TAT CTG TAT GGT CGT GGC TCT AAG CCT GAC CCA
> AGG CCT CCA GCC ACC ACC ACA ACT NNN NNN NNN GAA --- CCA CAG CCG ACA GCA CCT CCC
> ACT ATG TGT CCC ACT ATA CCT CCC ACG GCC TAT CCC ACA GTG GGC CCC ACG GTT GGC CCT
> ACA GGC GCC CCC TCA CCT GGC CCC ACA AGC AGC CCG TCA CCT GGC CCT ACA GGC GCC CCC
> TCA CCT GGC CCT ACA GCG CCC --- CCT ACT GCG GGC TCT TCT GAG GCC TCT ACA --- GAG
> TCT TTG AGT CCG GCA GAC AAT CCT TGC AAT GTG GAT GTT TTT GAT GCT ATT GCT GAG ATC
> CAG GGC GCT CTG CAT TTC TTC AAG GAC GGT TGG TAC TGG --- --- AAG TTC CTG AAT ---
> --- CAT AGA GGA AGC CCA TTA CAG GGC CCC TTC CTT ACT GCC CGC ACG TGG CCA GCC CTG
> CCT GCA ACG CTG GAC TCC GCC TTT GAG GAT CCG CAG ACC AAG AGG GTT TTC TTC TTC TCT
> GGA CGT CAA ATG TGG GTG TAC ACA GGC AAG ACC GTG CTG GGC CCC AGG AGT CTG GAT AAG
> TTG GGT CTA GGC CCA GAG GTA ACC CAC GTC AGC GGG CTT CTC CCG CGT CGT CTC --- GGG
> AAG GCT CTG CTG TTC AGC AAG GGG CGT GTC TGG AGA TTC GAC TTG AAG TCT CAG AAG GTG
> GAT CCC CAG AGC GTC ATT CGC --- --- GTG GAT AAG GAG TTC TCT GGT GTG CCC TGG AAC
> TCA CAC GAC ATC TTC CAG TAC CAA --- GAC AAA GCC TAT --- TTC TGC CAT GGC AAA TTC
> TTC TGG CGT GTG AGT TTC CAA AAT GAG GTG AAC AAG GTG GAC CAT GAG GTG AAC CAG GTG
> GAC GAC GTG GGC TAC GTG ACC TAC GAC CTC CTG CAG TGC CCT
> rat
> GCT GCC CCT CAC CAG CGC --- CAG CCG --- ACT TAT --- GTG GTC TTC CCC CGA GAC CTG
> AAA ACC TCC AAC CTC ACG GAC ACA CAG CTG GCA GAG GAT TAC CTG TAC CGC TAT GGT TAC
> ACT CGG GCA GCC CAG ATG ATG --- GGA GAG AAG CAG --- TCC CTG CGG --- CCC GCT TTG
> CTG ATG CTT CAG AAG CAG CTG TCC CTG CCC CAG ACT GGT GAG CTG GAC AGC GAG ACA CTA
> AAG GCC ATT CGT TCA CCG CGC TGT GGT GTC CCA GAC GTG GGC AAA TTC CAA ACC TTC GAA
> GGC GAC CTC AAG TGG CAC CAT CAT AAC ATC ACC TAT TGG ATC CAA AGC TAC ACC GAA GAC
> TTG CCG CGA GAC GTG ATC GAT GAC TCC TTC GCG CGC GCC TTC GCG GTG TGG AGC GCG GTG
> ACA CCG CTC ACC TTC ACC CGC GTG TAC GGG CTC GAA GCA GAC ATT GTC ATC CAG TTT GGT
> GTC GCG GAG CAC GGG GAC GGG TAT CCC TTC GAC GGC AAG GAT GGT CTA CTG GCA CAC GCC
> TTT CCC CCT GGC CCC GGC ATT CAG GGA GAT GCC CAC TTC GAC GAC GAC GAG TTG TGG TCG
> CTG GGC AAA GGC GCC GTG GTC CCC ACT TAC TTT GGA AAC GCA AAT GGT GCC CCA TGT CAC
> TTT CCC TTC ACC TTC GAG GGA CGC TCC TAT TTG TCC TGC ACC ACG GAT GGC CGC AAC GAC
> GGC AAG CCT TGG TGT GGC ACG ACA GCT GAC TAC GAC ACA GAC AGA AAA TAT GGT TTC TGC
> CCC AGT GAG AAT CTC TAC ACG GAG CAT GGC AAC GGA GAC GGC AAA CCC TGC GTA TTT CCA
> TTC ATC TTC GAG GGC CAC TCC TAC TCT GCC TGC ACC ACT AAA GGT CGC TCG GAT GGT TAT
> CGC TGG TGC GCC ACC ACC GCC AAC TAT GAC CAG GAT AAG CTG TAT GGC TTC TGT CCT ACT
> CGA GCC GAC GTC ACT GTA ACT GGG GGC AAC TCG GCA GGA GAG ATG TGC GTC TTC CCC TTC
> GTC TTC CTG GGC AAG CAG TAC TCT ACC TGT ACC GGC GAG GGC CGC AGT GAT GGG CGC CTC
> TGG TGC GCG ACG ACG TCG AAC TTC GAC GCT GAC AAG AAG TGG GGT TTC TGT CCA GAC CAA
> GGG TAC AGC CTG TTT CTG GTG GCA GCG CAC GAG TTC GGC CAT GCG CTG GGC TTA GAT CAT
> TCT TCA GTG CCG GAA GCG CTC ATG TAC CCC ATG TAT CAC TAC CAC GAG GAC TCC CCT CTG
> CAT GAA GAC GAC ATA AAA GGC ATC CAG CAT CTG TAT GGT CGT GGC TCT AAA CCT GAC CCA
> AGG CCT CCA GCC ACC ACC GCA GCT NNN NNN NNN GAA --- CCA CAG CCG ACA GCT CCT CCC
> ACT ATG TGT CCC ACT GCA CCT CCC ATG GCC TAT CCC ACA GGG GGC CCC ACA GTC GCC CCT
> ACA GGC GCC CCC TCA CCT GGC CCC ACA GGT NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN
> NNN NNN NNN NNN NNN NNN CCC --- CCT ACT GCT GGT CCT TCT GAG GCC CCT ACA --- GAG
> TCT TCG ACT CCA GTA GAC AAT CCT TGC AAT GTG GAT GTT TTT GAT GCC ATT GCT GAT ATC
> CAG GGC GCT CTG CAT TTC TTC AAG GAC GGT CGG TAT TGG --- --- AAG TTC TCG AAT ---
> --- CAC GGA GGA AGC CAA TTG CAG GGC CCC TTT CTT ATT GCC CGC ACG TGG CCA GCT TTG
> CCT GCA AAG TTG AAC TCA GCC TTT GAG GAT CCG CAG TCC AAG AAG ATT TTC TTC TTC TCT
> GGG CGC AAA ATG TGG GTG TAC ACA GGC CAG ACG GTG CTG GGC CCC AGG AGT CTG GAT AAG
> TTG GGG CTA GGC TCA GAG GTA ACC CTG GTC ACC GGA CTT CTC CCG CGT CGT GGA --- GGG
> AAG GCT CTG CTG ATC AGC CGG GAA CGT ATC TGG AAA TTC GAC TTG AAG TCT CAG AAG GTG
> GAT CCC CAG AGC GTT ACT CGC --- --- TTG GAT AAC GAG TTC TCT GGC GTG CCC TGG AAC
> TCA CAC AAC GTC TTT CAC TAC CAA --- GAC AAG GCC TAT --- TTC TGC CAT GAC AAA TAC
> TTC TGG CGT GTG AGT TTC CAC AAC NNN NNN NNN NNN NNN NNN NNN CGG GTG AAC CAG GTG
> GAC CAC GTG GCC TAC GTG ACC TAT GAC CTC CTG CAG TGC CCT
> rabbit
> GCC GCC CCT CGC CGC CGC --- CAG CCC --- ACC TTG --- GTG GTC TTC CCA GGA GAG CTG
> AGA ACC NNN AGG CTC ACC GAC AGG CAG CTG GCA GAG GAG TAC CTG TTC CGC TAT GGT TAC
> ACC CGC GTA GCC AGC ATG CAC --- GGA GAC AGC CAG --- TCC CTG CGG CTG CCG --- CTG
> CTA CTT CTG CAG AAG CAT CTG TCC CTG CCG GAG ACG GGG GAG CTG GAT AAT GCC ACC CTG
> GAG GCC ATG CGG GCC CCG CGC TGC GGC GTC CCG GAC GTG GGC AAA TTC CAG ACC TTC GAG
> GGT GAC CTC AAG TGG CAC CAC CAC AAC ATC ACG TAC TGG ATC CAA AAC TAC TCC GAA GAC
> CTG CCG CGC GAC GTC ATC GAC GAC GCC TTC GCC CGC GCC TTT GCG CTG TGG AGC GCG GTG
> ACG CCA CTC ACC TTC ACC CGC GTG TAC AGC CGG GAC GCA GAC ATT GTC ATC CAG TTT GGG
> GTC GCG GAG CAC GGA GAC GGG TAT CCT TTC GAC GGC AAG GAC GGG CTC CTG GCG CAC GCC
> TTC CCT CCT GGC CCC GGC ATT CAG GGA GAT GCC CAC TTC GAC GAC GAA GAG CTG TGG TCC
> CTG GGC AAG GGC GTC GTG GTT CCC ACG TAC TTT GGA AAC GCC GAC GGC GCC CCC TGC CAC
> TTC CCC TTC ACC TTC GAG GGC CGC TCC TAC ACC GCC TGC ACC ACG GAC GGC CGC TCT GAC
> GGC ATG GCC TGG TGC AGC ACC ACG GCC GAC TAT GAC ACC GAC CGT CGG TTT GGC TTC TGC
> CCC AGC GAA AGA CTC TAC ACC CAG GAC GGC AAC GCA GAC GGC AAG CCC TGC GAG TTT CCG
> TTC ATC TTC CAG GGC CGT ACC TAC TCC GCC TGC ACC ACG GAC GGC CGC TCC GAC GGC CAC
> CGC TGG TGC GCC ACC ACC GCC AGC TAC GAC AAG GAC AAG CTC TAC GGC TTC TGC CCG ACC
> CGA GCT GAC TCC ACG GTG GTC GGG GGC AAC TCG GCG GGA GAG CTG TGT GTC TTC CCC TTC
> GTC TTC CTG GGC AAA GAG TAC TCG TCC TGT ACC AGC GAG GGT CGC AGG GAT GGG CGC CTC
> TGG TGT GCC ACC ACT TCC AAC TTT GAC AGC GAC AAG AAG TGG GGC TTC TGC CCT GAT AAA
> GGA TAC AGC CTG TTC CTC GTG GCA GCC CAC GAG TTC GGC CAT GCA CTG GGC TTG GAT CAC
> TCC TCT GTG CCG GAG CGC CTC ATG TAC CCC ATG TAC CGC TAC CTA GAG GGG TCC CCC CTG
> CAC GAG GAC GAC GTC AGG GGC ATC CAG CAT CTA TAT GGT CCT AAC CCC AAC CCC CAG CCT
> --- CCA GCC ACC ACC ACA CCT GAN NNN NNN NNN NNN NNG CCG CAG CCC ACG GCT CCC CCG
> ACG GCC TGC CCC ACC TGG CCG GCC ACT GTG CGC CCC TCC GAG CAC CCC ACT ACC AGC CCT
> ACC GGC GCC CCC TCA GCT GGC CCT ACC GGT NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN
> NNN NNN NNN NNN NNN NNN CCC --- CCC ACG GCC AGC CCC TCT GCG GCC CCC ACT --- GCG
> TCC TTG GAC CCA GCT GAA GAC GTC TGC AAC GTG AAT GTC TTC GAC GCC ATC GCC GAG ATA
> GGG AAC AAG CTG CAT GTC TTC AAG GAT GGG AGG TAC TGG --- --- CGG TTC TCC GAG ---
> --- GGC AGT GGG CGC CGG CCG CAG GGC CCC TTC CTC ATC GCC GAC ACC TGG CCC GCG CTG
> CCG GCC AAG CTG GAC TCC GCC TTT GAG GAG CCG CTC ACC AAG AAG CTG TTC TTC TTC TCG
> GGG CGC CAA GTG TGG GTG TAC ACA GGC GCG TCG GTG CTG GGC CCG AGG CGT CTG GAC AAG
> CTG GGC CTG GGT CCC GAG GTG CCG CAC GTC ACC GGA GCC CTC CCG CGC GCC GGG --- GGC
> AAG GTG CTG CTG TTC GGC GCG CAG CGC TTC TGG AGG TTC GAC GTG AAG ACG CAG ACG GTG
> GAT TCC CGG AGC GGC GCT CCG --- --- GTG GAT CAG ATG TTC CCC GGG GTG CCT TTG AAC
> ACA CAC GAC GTC TTC CAG TAC CGA --- GAG AAA GCC TAC --- TTC TGC CAG GAC CGC TTC
> TTC TGG CGT GTG AGT ACC CGG AAC NNN NNN NNN NNN NNN NNN NNN GAG GTG AAC CTA GTG
> GAC CAG GTG GGC TAC GTG AGC TTT GAC ATC CTG CAC TGC CCT
> dog
> GCA GCT CCC AGA CCA CAC --- AAG CCC --- ACC GTT --- GTG GTC TTT CCA GGA GAC CTG
> AGA ACT NNN AAT CTC ACT GAC AAG CAG CTG GCA GAG GAA TAT CTG TTT CGC TAT GGC TAC
> ACT CAA GTG GCC GAG CTG AGC --- GAC GAC AAG CAG --- TCC CTG AGT CGC GGG --- CTG
> CGG CTT CTC CAG AGG CGC CTG GCT CTG CCT GAG ACT GGA GAG CTG GAC AAA ACC ACC CTG
> GAG GCC ATG CGG GCC CCG CGC TGC GGC GTC CCG GAC CTG GGC AAA TTC CAG ACC TTT GAG
> GGC GAC CTC AAG TGG CAC CAC AAC GAC ATC ACT TAC TGG ATA CAA AAC TAC TCG GAA GAC
> TTG CCC CGC GAC GTG ATC GAC GAC GCC TTT GCC CGA GCC TTC GCG GTC TGG AGC GCG GTG
> ACA CCG CTC ACC TTC ACT CGC GTG TAC GGC CCC GAA GCC GAC ATC ATC ATT CAG TTT GGT
> GTT AGG GAG CAC GGA GAT GGG TAT CCC TTC GAT GGG AAG AAC GGG CTT CTG GCT CAC GCC
> TTT CCT CCC GGC CCG GGC ATT CAG GGA GAC GCC CAC TTC GAC GAC GAG GAG TTA TGG ACT
> CTG GGC AAG GGC GTC GTG GTT CCG ACC CAC TTC GGA AAC GCA GAT GGC GCC CCC TGC CAC
> TTC CCC TTC ACC TTC GAG GGC CGC TCC TAC TCG GCC TGC ACC ACG GAC GGC CGC TCC GAT
> GAC ACG CCC TGG TGC AGC ACC ACG GCC GAC TAT GAC ACC GAC CGT CGG TTC GGC TTC TGC
> CCC AGC GAG AAA CTC TAC ACC CAG GAC GGC AAT GGG GAC GGC AAG CCC TGC GTG TTT CCG
> TTC ACC TTC GAG GGC CGC TCC TAC TCC ACG TGC ACC ACC GAC GGC CGC TCG GAC GGC TAC
> CGC TGG TGC TCC ACC ACC GGC GAC TAC GAC CAG GAC AAA CTC TAC GGC TTC TGC CCA ACC
> CGA GTC GAT TCC GCG GTG ACC GGG GGC AAC TCC GCC GGG GAG CCG TGT GTC TTC CCC TTC
> ATC TTC CTG GGC AAG CAG TAC TCG ACG TGC ACC AGG GAG GGC CGC GGA GAT GGG CAC CTC
> TGG TGC GCC ACC ACT TCG AAC TTT GAC AGA GAC AAG AAG TGG GGC TTC TGC CCG GAC CAA
> GGA TAC AGC CTG TTC CTT GTG GCC GCC CAT GAG TTC GGC CAC GCG CTG GGT TTA GAT CAT
> TCA TCG GTG CCA GAA GCG CTC ATG TAC CCC ATG TAC AGC TTC ACC GAG GGC CCC CCC CTG
> CAT GAA GAC GAC GTG AGG GGC ATC CAG CAT CTG TAC GGT CCT CGC CCT GAA CCT GAG CCA
> CAG CCT CCA ACC GCN NNN NNN NNN NNN NNN NNN NNN --- NNC CCG CCC ACC GCC CCG CCC
> ACC GTC TGC GCT ACT GGT CCT CCC ACC ACC CGC CCC TCA GAG CGC CCC ACT GCT GGC CCC
> ACA GGC CCC CCT GCA GCT GGC CCC ACG GGT NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN
> NNN NNN NNN NNN NNN NNN CCC --- CCC ACT GCT GGC CCC TCT GAG GCC CCT ACA --- GTG
> CCT GTG GAT CCG GCA GAG GAT ATA TGC AAA GTG AAC ATC TTC GAC GCC ATC GCG GAG ATC
> AGG AAC TAC TTG CAT TTC TTC AAG GAA GGG AAG TAC TGG --- --- CGA TTC TCC AAG ---
> --- GGC AAG GGA CGC CGG GTG CAG GGC CCC TTC CTT ATC ACC GAC ACG TGG CCT GCG CTG
> CCC CGC AAG CTG GAC TCC GCC TTT GAG GAC GGG CTC ACC AAG AAG ACT TTC TTC TTC TCT
> GGG CGC CAA GTG TGG GTG TAC ACA GGC ACG TCG GTG GTA GGC CCG AGG CGT CTG GAC AAG
> CTG GGC CTG GGC CCG GAG GTT ACC CAA GTC ACC GGC GCC CTC CCG CAA GGC GGG --- GGT
> AAG GTG CTG CTG TTC AGC AGG CAG CGC TTC TGG AGT TTC GAC GTG AAG ACG CAG ACC GTG
> GAT CCC AGG AGC GCC GGC TCG --- --- GTG GAA CAG ATG TAC CCC GGG GTG CCC TTG AAC
> ACG CAT GAC ATC TTC CAG TAC CAA --- GAG AAA GCC TAC --- TTC TGC CAG GAC CGC TTC
> TAC TGG CGT GTG AAT TCT CGG AAT NNN NNN NNN NNN NNN NNN NNN GAG GTG AAC CAG GTG
> GAC GAA GTG GGC TAC GTG ACC TTT GAC ATT TTG CAG TGC CCT
> cow
> GCT GTC CCC AGA CGA CGC --- CAG CCC --- ACC GTT --- GTG GTC TTT CCA GGA GAA CCA
> CGA ACC NNN AAC CTC ACC AAC AGG CAG CTG GCA GAG GAA TAC CTG TAC CGC TAT GGC TAC
> ACT CCT GGG GCA GAG CTG AGC --- GAG GAC GGT CAG --- TCC CTG CAG CGA GCT CTG CTG
> CGC --- TTC CAG CGG CGC CTG TCC CTG CCC GAG ACT GGC GAG CTG GAC AGC ACC ACC CTG
> AAC GCC ATG CGA GCC CCG CGC TGC GGC GTC CCA GAC GTG GGC AGA TTC CAG ACC TTT GAG
> GGC GAA CTC AAG TGG CAC CAC CAC AAC ATC ACC TAC TGG ATC CAA AAT TAC TCG GAA GAC
> CTG CCG CGC GCC GTG ATC GAC GAC GCC TTT GCC CGC GCT TTC GCG CTC TGG AGC GCT GTG
> ACG CCG CTC ACC TTC ACT CGA GTG TAC GGC CCC GAA GCT GAC ATT GTC ATC CAG TTT GGT
> GTT AGA GAG CAC GGA GAT GGG TAT CCC TTC GAT GGG AAG AAC GGG CTC CTG GCA CAC GCC
> TTT CCG CCT GGC AAA GGC ATT CAG GGA GAT GCC CAC TTC GAC GAT GAA GAG TTG TGG TCT
> CTG GGC AAA GGC GTT GTG ATC CCG ACC TAC TTC GGA AAC GCG AAG GGC GCC GCC TGC CAC
> TTC CCC TTC ACC TTT GAG GGT CGC TCC TAC TCC GCC TGC ACC ACG GAC GGC CGT TCC GAC
> GAC ATG CTC TGG TGC AGC ACC ACC GCC GAC TAC GAC GCC GAC CGC CAG TTC GGC TTC TGC
> CCC AGC GAG AGA CTC TAC ACC CAG GAC GGC AAT GCG GAC GGC AAG CCC TGC GTC TTC CCG
> TTC ACC TTC CAG GGC CGC ACC TAC TCC GCC TGT ACC TCC GAT GGT CGC TCC GAC GGC TAC
> CGC TGG TGC GCC ACC ACC GCC AAC TAC GAC CAG GAC AAG CTC TAC GGC TTC TGC CCG ACC
> CGA GTC GAT GCA ACG GTG ACC GGG GGC AAC GCG GCG GGG GAG CTG TGC GTC TTC CCC TTC
> ACC TTC CTG GGC AAG GAA TAC TCG GCC TGC ACC AGA GAG GGT CGC AAT GAT GGG CAC CTC
> TGG TGC GCC ACC ACC TCC AAC TTC GAC AAA GAC AAG AAG TGG GGC TTC TGC CCG GAT CAA
> GGA TAC AGC CTG TTC CTT GTG GCC GCA CAC GAG TTT GGC CAC GCG CTG GGC TTA GAT CAC
> ACC TCC GTG CCA GAG GCG CTC ATG TAC CCC ATG TAC AGA TTC ACA GAG GAG CAC CCC CTG
> CAT AGG GAC GAT GTT CAG GGC ATC CAG CAT CTG TAT GGT CCT CGC CCT GAG CCT GAA CCA
> CGG CCT CCG ACC ACT ACC ACC ACT ACC ACC ACC GAA --- CCC CAG CCC ACC GCT CCC CCC
> ACG GTC TGC GTC ACG GGG CCT CCC ACC GCC CGC CCC TCA GAG GGT CCC ACT ACT GGC CCC
> ACA GGG CCC CCG GCA GCT GGC CCT ACG GGT NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN
> NNN NNN NNN NNN NNN NNN CCT --- CCC ACG GCT GGC CCT TCT GCG GCC CCG ACG GAG TCC
> CCG --- GAT CCA GCG GAG GAC GTC TGC AAC GTG GAC ATC TTC GAC GCC ATC GCG GAG ATT
> AGG AAC CGC TTG CAT TTC TTC AAG GCT GGG AAG TAC TGG --- --- AGA CTT TCT GAG ---
> --- GGA GGG GGC CGC CGG GTG CAG GGT CCC TTC CTT GTC AAG AGC AAG TGG CCT GCG CTG
> CCC CGC AAG CTG GAC TCC GCC TTC GAG GAT CCG CTC ACC AAG AAG ATT TTC TTC TTC TCT
> GGG CGC CAA GTA TGG GTG TAC ACC GGC GCG TCG TTG CTA GGC CCG AGG CGT CTG GAC AAG
> TTG GGC CTG GGC CCG GAA GTG GCC CAG GTC ACC GGG GCC CTC CCG CGC CCT GAG --- GGT
> AAG GTG CTG CTG TTC AGC GGG CAG AGC TTC TGG AGG TTC GAC GTG AAG ACA CAG AAG GTG
> GAT CCC CAG AGC GTC ACC CCC --- --- GTG GAC CAG ATG TTC CCC GGG GTG CCC ATT AGC
> ACG CAC GAC ATC TTT CAG TAC CAA --- GAG AAA GCT TAC --- TTC TGC CAG GAT CAC TTC
> TAC TGG CGC GTG AGT TCC CAG AAT NNN NNN NNN NNN NNN NNN NNN GAG GTG AAT CAG GTG
> GAC TAT GTG GGC TAC GTG ACC TTC GAC CTC CTG AAG TGC CCT
> elephant
> --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
> --- --- --- --- --- --- --- --- --- --- --- GAG --- TAT CTG TAC CGC TAT GGC TAC
> ACT CGT GTG GCG GAG ATG AAC --- --- AGT AAG GTG --- TCC CTG GGT --- CGA GCG CTA
> AGG CTT CTC CAG CAA AAC CTG GCC CTG CCC GAG ACC GGC GAG CTG GAC AGC ACC ACC CTG
> GAC GCC ATG CGA GCC CCG CGC TGC GGC GTC CCA GAC ATG GGT GGC TTC CAG ACC TTC GAG
> GGT GAC CTC AAG TGG AAC CAC CAC AAC ATC ACA TAC TGG ATC CAA AAC TAC TCG GAA GAC
> TTG CCC AAA CAA GTG ATC GAA GAC GCT TTT GCC CGC GCC TTC GCG GCG TGG AGC GAG GTG
> ACA CCA CTC ACC TTC ACC CGC CTG CGC AGC AGG GAC GTG GAC ATC GTC ATC CGG TTT GGG
> GTC AAG GAG CAC GGA GAC GGG TAT CCT TTC GAC GGG AAG GAC GGG CTG CTG GCA CAC GCC
> TTT CCT CCC GGC CCC GGC ATT CAG GGA GAC GCG CAC TTC GAC GAT GAC GAA TTG TGG TCG
> TTG GGC AAG GGC GTC GTG GTT CCC ACC CGC TTT GGA AAC GCA GAT GGC GCC GCC TGC CAC
> TTT CCC TTC ACC TTC CAG GGC CGC TCG TAC ACT GCC TGC ACC GCC GAC GGC CGC TCC GAC
> GGC CAG CTC TGG TGC AGC ACC ACG GCC GAC TAT GAC ACC GAC CGC CAG TTT GGC TTC TGC
> CCC AGT GAG AGG CTC TAC ACC CAG CAC GGC AAT GAC AAC GGC AAG CCC TGC GTG TTT CCG
> TTC ACG TTC GAG GGC CGC TCC TAC TCG GCC TGC ACC ACC GAC GGC CGC TCG GAT GGC TAC
> CGC TGG TGC GCC ACC ACC GCC AAC TAC GAC CAG GAC AAG CTC TAT GGC TTC TGT CCC ACC
> CGA --- GNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN
> NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN
> NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN
> NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN
> NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN
> NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN
> --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- NNN NNN NNN ---
> --- --- --- --- --- --- --- --- NNN NNN --- NNN NNN NNN --- --- --- --- --- ---
> --- --- --- --- NNN NNN NNN NNN NNN --- --- --- --- --- --- --- --- NNN NNN NNN
> NNN NNN --- --- --- --- NNN --- NNN NNN NNN NNN --- --- --- --- NNN NNN --- ---
> --- --- NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN
> NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN --- --- NNN NNN NNN NNN ---
> --- --- --- --- --- --- NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN
> NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN
> NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN
> NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN --- NNN NNN NNN --- NNN
> NNN ATA GTG CTG TTT AGT AGA CAG CGC TTC TGG AGG TTC GAC GTG AAG ACG CAG ACT GTG
> GAG CCC CGG AGC GTC CGC TCG --- --- GTG GAC CAG GTG TTC TCC GGG GTG CCC TTG GAC
> ACG CAC GAC ATC TTC CAG TAC CGA --- GAG AAA GCC TAC --- TTC TGC CAG GAC CGC TTC
> TAC TGG CGC GTG TGT TTC CGG AAT GAT --- AAT GAA --- --- --- --- GTG AAC CAG GTG
> GAC CAA GTG GGC TAC GTG AAC TTT GAC ATC CTG CAG TGC CCT
> opossum
> GCT GCA CCC CGA GGG GGC CCC TCT CCC GGG TCT ATC TTG ATC ACC TTT CCT GAA GAG AGA
> --- ACA CGC ACT CTC ACT GAC CAG CAA TTT GCT GAG GAA TAT CTG CTT CGG TAC GGC TAC
> ATC CCG --- GCA GGG CTT CTG --- GGC CAA AAC CAC ACT TCT CTG AAG --- CAT GCC TTA
> AAG AAA CTC CAA CGT CAG CTG GCC CTG ACA CAG ACG GGA GAG CTG GAC AGC GCC ACC ATC
> GAG GCA ATG CGG GCC CCG CGC TGC GGA GTA CCC GAC GTC GCC CCA TTC CAA ACC TTC GAG
> GGT GAA CTG AAG TGG AAA CAT CAG AAC ATC ACC TAT CGG ATC CAG AAT TAC TCC CCC GAC
> CTG CCT CCT GAG GTG ACG GAT GAT GCT TTC CAA CGA GCC TTT GCT CTG TGG AGT AAA GTG
> ACC CCA CTC ACC TTC ACA CGT GTC AGC AGC GGG GAG GCA GAC ATC CTG ATC CAG TTT GGG
> ACC AGA GAG CAC GGC GAT GGA TAC CCT TTT GAC GGG AAA GAT GGA CTC TTG GCT CAC GCT
> TTC CCC CCG GGC CCA GGA ATC CAG GGA GAT GCC CAC TTT GAT GAC GAG GAG TTC TGG ACT
> CTA GGC AAA GGC GTC GTG GTC AAA ACG CGG TTC GGG AAC GCA GAC GGA GCC CCC TGC CAC
> TTT CCT TTC ACC TTC GAG GGC AGC TCC TAC TCC GCC TGC ACC ACG GAC GGC CGC TCT GAC
> GGG CTG CAC TGG TGC AGC ACT ACG GCT GAC TAT GAC AAG GAC CGC CTT TAC GGC TTT TGC
> CCT AGC GAG CTG CTC TAC ACC CTG GAT GGT AAC GCC AAT GGC GAT CCC TGC GTG TTC CCC
> TTC ACC TTC GAT GGT CGT TCC TAC ACA GCC TGC ACC ACT GAA GGA CGC TCT GAC GGC TAC
> CGC TGG TGT GCC ACT ACT GCC AGT TAC GAT CAG GAC AAG CTT TAT GGC TTC TGT CCC AAC
> CGA --- GAT ACT GCG GTG AGC GGA GGC AAC TCC CAA GGG GAA CCC TGC GTC TTT CCC TTC
> ACT TTC CTA AAT CGA GAA TAC TCA GCC TGC ACC AGT GAG GGC CGC AGT GAC GGT CGT CTC
> TGG TGT GCG ACC ACC GAT GAC TTC GAT CGG GAT CAC AAG TGG GGC TTC TGT CAG GAT CGA
> GGG TAC AGC TTA TTC CTT GTG GCC GCG CAC GAG TTT GGG CAC GCG CTG GGC TTG GAC CAC
> TCA TCT GTG CCG GAA GCA TTG ATG TTC CCA ATG TAC CGT TTT ACC GAG GGA CCC CCG TTG
> CAT GAG GAT GAC GTG AAG GGA ATC CAA CAT CTG TAT GGT TCT AGG ACT GAG CCG GAT CCG
> GAA CCT CCG ACC TCT --- --- --- TCT CCC TTA GAG --- CCA GAT TCC ACC ACT CAG TTC
> AAT GCT TGT --- --- --- CCC --- TCT GTA --- CCC CCC CCT --- --- --- GCC AGA CCC
> ACC GGC CCT CCT ACT GCT CGC CCC TCA --- --- --- --- --- --- --- --- GCA CCT CCC
> ACT GCT GGA CCC ACT GGT CCT --- CCC ACA GCC AAC CCT CCT GTG CCC CCC ACT --- GGG
> CCC TTG GAC CCA GCT GAC GAC GCT TGT GGC GTC CTG GTA TTT GAT GCC ATC GCT GAG ATT
> CGA GGC CAG CTT CAC TTC TTC AAA GAC GGA CGG TAC TGG CGA GTC CCC AGG GAC TCC ---
> --- AAG --- GGG CCA --- ACT CAA GGA CCC TTC CTC ATT GCT AAC ACT TGG TCT GCT TTG
> CCC CCA AAA CTG GAC TCG GCT TTC GAA GAT CCC CTG ACT AAG AAA CTC TTC TTC TTT TCA
> GGT AAA GGT ATG TGG GTA TAC ACA GGC CAG TCA GTT GTA GGT CCC CGG CGC CTG GAG AAG
> CTG GGT CTG CAT AGC AGA GTT CAA AGG ATA ACA GGT GCC ATT CAG CAT AAT GGA --- GGC
> AAG GTG CTA TTA TTC AGC CAG AAT CAA TAT TGG AGG TTG GAT GTG AAG AAG CAG AAG GTA
> GAC TCA AGA GAA CCT TAC CCT --- --- GTG GAG AAC ATG TTC CCT GGA GTA CCT GAA AAC
> ACT CAT GAT GTT TTC CTG TAT AAG GGA GAT ACA --- TAC --- TTC TGC CAG GGC ATC TTC
> TTC TGG CGC GTG AAC --- --- --- --- --- AAG GAG --- --- --- --- --- AAC AAG GTG
> GAC TTA GTA GGC TAC GTG ACC TAC GAC CTC CTG --- --- ---
> chicken
> GCC GCC CCA CTG CAC AGC --- AAG CCG CAG GCG GTC --- ATC ACC TTC CCA GGG GAG CTG
> --- CTC AGC GCC CCA TCA GAC GTG GAG CTG GCG GAG AAC TAC CTG CTG CGC TTC GGC TAC
> ATC CAG GAG GCA GAG GTG AGG AGG AGC AGC AAG CAC GTG TCC CTG GCC --- AAA GCG CTG
> CGC AGG ATG CAG AAG CAG CTG GGG CTG GAG GAG ACG GGG GAG CTG GAC GCC AGC ACC CTG
> GAG GCC ATG CGA GCC CCC CGC TGT GGG GTG CCT GAC GTG GGG GGT TTC CTC ACC TTC GAG
> GGG GAG CTC AAA TGG GAC CAC ATG GAC CTC ACG TAC CGG GTG ATG AAC TAC TCC CCC GAC
> CTG GAC CGT GCC GTG ATA GAT GAT GCC TTC CGG CGG GCA TTC AAG GTG TGG AGT GAT GTC
> ACT CCC CTC ACC TTC ACC CAG ATT TAC AGC GGC GAG GCA GAC ATC ATG ATC ATG TTC GGC
> AGC CAA GAG CAT GGT GAT GGG TAC CCC TTC GAC GGC AAG GAT GGG CTC CTG GCC CAC GCC
> TTT CCC CCC GGC AGT GGG ATT CAG GGC GAT GCC CAC TTC GAT GAT GAT GAG TTC TGG ACT
> CTG GGA ACC GGC TTA GAG GTG AAG ACC CGC TAT GGG AAT GCC AAC GGG GCC AGC TGC CAC
> TTC CCC TTC ATC TTT GAG GGC CGC TCC TAC TCC CGG TGC ATC ACG GAG GGC CGC ACG GAT
> GGG ATG CTG TGG TGT GCC ACC ACC GCC AGC TAC GAC GCC GAC AAG ACC TAC GGC TTC TGC
> CCC AGC GAG CTG CTC TAC ACC AAT GGT GGC AAC AGC GAT GGG TCT CCC TGC GTC TTC CCC
> TTC ATC TTC GAT GGC GCC TCC TAT GAC ACC TGC ACC ACA GAT GGG CGC TCT GAC GGC TAT
> CGC TGG TGT GCC ACC ACG GCC AAC TTC GAC CAG GAC AAG AAA TAC GGC TTC TGC CCC AAC
> CGA --- GAC ACG GCG GCG ATC GGT GGC AAC TCC CAG GGG GAC CCG TGT GTC TTC CCC TTC
> ACC TTC CTG GGG CAG TCC TAC AGC GCG CGC ACC AGC CAG GGC CGG CAG GAC GGG AAG CTC
> TGG TGT GCC ACC ACC AGC AAC TAT GAC ACC GAC AAG AAG TGG GGC TTC TGC CCA GAC AGA
> GGT TAC AGC ATC TTC TTG GTG GCT GCC CAC GAG TTT GGG CAC TCA CTG GGG CTG GAC CAC
> TCC AGC GTG CGC GAG GCA TTG ATG TAC CCT ATG TAC AGC TAC GTC CAG GAC TTC CAG CTG
> CAT GAG GAT GAT GTC CAG GGC ATC CAG TAC CTC TAT GGT CGT GGC TCT GGC CCT GAG CCC
> ACC CCC CCG --- --- --- --- --- GCA CCT TTG --- --- CCC --- --- ACC GAG GAG ---
> --- --- --- --- --- --- CCC CAG TCC ATA --- CCC ACC GAA --- --- --- GCT --- ---
> --- GGC --- --- AGT GCT TCC ACC ACA --- --- --- --- --- --- --- --- GAG GAG GAG
> GAG GAG --- GAG ACA --- CCT GAG CCC ACA GCT GAG --- --- --- --- CCC AGC --- ---
> CCC GTG GAC CCC AGC CGG GAT GCC TGC ATG GAG AAG AAC TTC GAC GCC ATC ACT GAG ATC
> AAT GGA GAG CTG CAC TTC TTC AAG AAT GGG AAA TAC TGG --- --- ACC CAC TCG TCC TTC
> TGG AAA TCA GGC --- --- ACT CAG GGC GCC TTC TCT ATC GCT GAC ACC TGG CCC GGC CTC
> CCG GCT GTC ATC GAC GCG GCG TTC CAA GAT GTG CTC ACC AAG AGG GTC TTC TTC TTC GCG
> GGA CGG CAG TTC TGG GTG TTC TCC GGC AAG AAC GCA GTG GGC CCC CGT AGG ATT GAG AAG
> TTG GGC ATT GGG AAG GAG GCC GGG CGC ATC ACG GGG GCC CTG CAG CGG GGA CGT --- GGC
> AAA GTG CTG CTC TTC AGT GGG GAG CAC TAC TGG AGG CTG GAC GTG AAG GTC CAG ACA GTG
> GAC --- AAG GGC --- TAC CCC CGT GAC ACT GAT GAT GTC TTT ACT GGT GTC CCC CTT GAC
> GCA CGT AAC GTC TTC CTG TAC CAA --- GAC AAG --- TAC CAC TTC TGC CGG GAC AGC TTC
> TAC TGG AGG ATG ACC --- --- --- --- --- CCA CGT --- --- --- --- --- TAC CAG GTG
> GAC CGC GTG GGA TAC ATC AGA TAC GAC CTC CTG CAG TGC CCC
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>


-- 
Jason Stajich
jason at bioperl.org
http://bioperl.org/wiki/User:Jason

From ba6450 at wayne.edu  Thu Jul 26 21:20:11 2007
From: ba6450 at wayne.edu (Munirul Islam)
Date: Thu, 26 Jul 2007 21:20:11 -0400 (EDT)
Subject: [Bioperl-l] Finding the Sequence List in an Alignment
Message-ID: <20070726212011.EFB49252@mirapointms6.wayne.edu>

Thanks.  The error is removed now.

I have a question.  Is there any function that I can use to get the sequence list (human, chimp, etc.) after loading an alignment from file?

Munir

---- Original message ----
>Date: Thu, 26 Jul 2007 17:12:03 -0700
>From: "Jason Stajich" <jason at bioperl.org>  
>Subject: Re: [Bioperl-l] Alignment works with codeml (not loading in bioperl)  
>To: "Munirul Islam" <ba6450 at wayne.edu>
>Cc: bioperl-l at lists.open-bio.org
>
>You can try and pass in -interleaved => 0 as another option when you
>init your AlignIO object.
>

From jason at bioperl.org  Fri Jul 27 00:28:36 2007
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 26 Jul 2007 21:28:36 -0700
Subject: [Bioperl-l] Finding the Sequence List in an Alignment
In-Reply-To: <20070726212011.EFB49252@mirapointms6.wayne.edu>
References: <20070726212011.EFB49252@mirapointms6.wayne.edu>
Message-ID: <8273f6c20707262128s23e7e3ebgeb1cb74b3c0baf37@mail.gmail.com>

Have you tried reading the documentation for the Bio::SimpleAlign object?

for my $seq ( $aln->each_seq ) {
 print $seq->display_id, "\n";
}
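
For anyone finding this thread later, a slightly fuller sketch of the
same loop follows. The 'phylip' format name and the file name taken from
the command line are assumptions (neither is stated in this thread);
-interleaved => 0 is the option suggested earlier. The helper name
alignment_ids is made up for illustration.

```perl
use strict;
use warnings;

# Return the display IDs of every sequence in an alignment object.
# Works with anything that offers each_seq(), e.g. a Bio::SimpleAlign.
sub alignment_ids {
    my ($aln) = @_;
    return map { $_->display_id } $aln->each_seq;
}

# Usage (assumed file name): perl list_ids.pl alignment.phy
if (@ARGV) {
    require Bio::AlignIO;    # only needed when a file is actually given
    my $in = Bio::AlignIO->new(
        -file        => $ARGV[0],
        -format      => 'phylip',    # assumption: adjust to your file
        -interleaved => 0,
    );
    while ( my $aln = $in->next_aln ) {
        print "$_\n" for alignment_ids($aln);
    }
}
```

Keeping the ID extraction in its own sub makes it easy to reuse the list
(for sorting, counting, or filtering) instead of only printing it.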

I'd appreciate it if you added some of your questions, with the answers,
to the FAQ or to other places on the wiki so that other people can
benefit from what you learn here.


On 7/26/07, Munirul Islam <ba6450 at wayne.edu> wrote:
>
> Thanks.  The error is removed now.
>
> I have a question.  Is there any function that I can use to get the
> sequence list (human, chimp, etc.) after loading an alignment from file?
>
> Munir
>
> ---- Original message ----
> >Date: Thu, 26 Jul 2007 17:12:03 -0700
> >From: "Jason Stajich" <jason at bioperl.org>
> >Subject: Re: [Bioperl-l] Alignment works with codeml (not loading in
> bioperl)
> >To: "Munirul Islam" <ba6450 at wayne.edu>
> >Cc: bioperl-l at lists.open-bio.org
> >
> >You can try and pass in -interleaved => 0 as another option when you
> >init your AlignIO object.
> >
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


-- 
Jason Stajich
jason at bioperl.org
http://bioperl.org/wiki/User:Jason

From arareko at campus.iztacala.unam.mx  Fri Jul 27 11:18:55 2007
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Fri, 27 Jul 2007 10:18:55 -0500
Subject: [Bioperl-l] Perl Survey 2007
Message-ID: <46AA0CDF.1030503@campus.iztacala.unam.mx>

It really takes about 5 minutes:

http://perlsurvey.org/

Cheers,
Mauricio.

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From dhoworth at mrc-lmb.cam.ac.uk  Fri Jul 27 12:07:17 2007
From: dhoworth at mrc-lmb.cam.ac.uk (Dave Howorth)
Date: Fri, 27 Jul 2007 17:07:17 +0100
Subject: [Bioperl-l] Perl Survey 2007
In-Reply-To: <46AA0CDF.1030503@campus.iztacala.unam.mx>
References: <46AA0CDF.1030503@campus.iztacala.unam.mx>
Message-ID: <46AA1835.2020004@mrc-lmb.cam.ac.uk>

Mauricio Herrera Cuadra wrote:
> It really takes about 5 minutes:
> http://perlsurvey.org/

and gives all your personal information including email address to
anybody who cares to snoop the HTTP POST message! So there's definitely
no anonymity.

Cheers, Dave

From spiros at lokku.com  Fri Jul 27 12:38:57 2007
From: spiros at lokku.com (Spiros Denaxas)
Date: Fri, 27 Jul 2007 17:38:57 +0100
Subject: [Bioperl-l] Perl Survey 2007
In-Reply-To: <46AA1835.2020004@mrc-lmb.cam.ac.uk>
References: <46AA0CDF.1030503@campus.iztacala.unam.mx>
	<46AA1835.2020004@mrc-lmb.cam.ac.uk>
Message-ID: <bba689ec0707270938i612840e6r6b4d71ea943b4cc3@mail.gmail.com>

On 7/27/07, Dave Howorth <dhoworth at mrc-lmb.cam.ac.uk> wrote:
> Mauricio Herrera Cuadra wrote:
> > It really takes about 5 minutes:
> > http://perlsurvey.org/
>
> and gives all your personal information including email address to
> anybody who cares to snoop the HTTP POST message! So there's definitely
> no anonymity.

Not to mention that it requires registration (?). Who is behind the
survey ? I am on a number of Perl and Perl related lists and haven't
seen it being mentioned.

Spiros

From arareko at campus.iztacala.unam.mx  Fri Jul 27 13:37:31 2007
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Fri, 27 Jul 2007 12:37:31 -0500
Subject: [Bioperl-l] Perl Survey 2007
In-Reply-To: <bba689ec0707270938i612840e6r6b4d71ea943b4cc3@mail.gmail.com>
References: <46AA0CDF.1030503@campus.iztacala.unam.mx>	<46AA1835.2020004@mrc-lmb.cam.ac.uk>
	<bba689ec0707270938i612840e6r6b4d71ea943b4cc3@mail.gmail.com>
Message-ID: <46AA2D5B.9080304@campus.iztacala.unam.mx>

Spiros Denaxas wrote:
> On 7/27/07, Dave Howorth <dhoworth at mrc-lmb.cam.ac.uk> wrote:
>> Mauricio Herrera Cuadra wrote:
>>> It really takes about 5 minutes:
>>> http://perlsurvey.org/
>> and gives all your personal information including email address to
>> anybody who cares to snoop the HTTP POST message! So there's definitely
>> no anonymity.

I didn't provide any personal information other than my country and 
birth year. As for my email, I always use the one I have for all the SPAM 
I'd like to subscribe to :)

> Not to mention that it requires registration (?). Who is behind the
> survey ? I am on a number of Perl and Perl related lists and haven't
> seen it being mentioned.

Registration is rather different from confirming your email (which 
prevents spambots, or yourself, from filling the DB multiple times and 
thus screwing up the survey). As for who's behind it, its purpose, 
privacy, etc., please read the FAQ:

http://perlsurvey.org/faq/

Cheers,
Mauricio.

> Spiros
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From Alicia.Amadoz at uv.es  Mon Jul 30 11:46:57 2007
From: Alicia.Amadoz at uv.es (Alicia Amadoz)
Date: Mon, 30 Jul 2007 17:46:57 +0200 (CEST)
Subject: [Bioperl-l] error using standaloneblast through webserver
Message-ID: <1245168492amadoz@uv.es>

Hi, I'm trying to run a BioPerl script on Linux with StandAloneBlast
from a web server, but I get the following error:

-------------------- WARNING ---------------------
MSG: cannot find path to blastall
---------------------------------------------------

I have tried several things to fix it, such as setting some environment
variables, both directly in the shell and by adding some code to my
script with,

BEGIN {
$ENV{PATH} .= ':/usr/local/blast-2.2.16';
$ENV{BLASTDIR} = '/usr/local/blast-2.2.16/'; 
$ENV{BLASTDATADIR} = '/usr/local/data/';
}

and with,

$local->executable('/usr/local/bin');
my $blast_report = $local->blastall($inputfilename); 

I have also checked that the web server has read and execute permission
on all the BLAST executables and directories. But after trying all of
these things, it keeps showing the same error as above.

Any other ideas for solving this problem? My script works well when I
run it as a plain script, and I've rebooted the system several times
after making changes.

Thanks to anyone who will be able to help me!
Regards,
Alicia


From gyang at plantbio.uga.edu  Mon Jul 30 16:58:51 2007
From: gyang at plantbio.uga.edu (Guojun Yang)
Date: Mon, 30 Jul 2007 16:58:51 -0400
Subject: [Bioperl-l] nonstop repeated output from Remote_blast with xml
Message-ID: <20070730205851.e921d31c@dogwood.plantbio.uga.edu>

I am running RemoteBlast with readmethod "xml", and I noticed that it prints the output repeatedly, nonstop, as if it were stuck in a loop. Has anybody noticed this before? Can anybody help me get out of this?  
Thanks a lot,  
   

Guojun Yang
University of Georgia
  
   
From grafman at graphcomp.com  Sun Jul 29 17:08:04 2007
From: grafman at graphcomp.com (Grafman Productions)
Date: Sun, 29 Jul 2007 14:08:04 -0700
Subject: [Bioperl-l] Perl 3D OpenGL
Message-ID: <152401c7d224$8e2455b0$6e4e7c0a@HPONE>

If this posting is inappropriate, please let me know - my apologies.

I recently came across an article on BioPerl, and it occurred to me that 
there might be some need for 3D rendering within your BioPerl project.

I released a number of new/updated Perl OpenGL (POGL) modules this year, 
along with benchmarks that demonstrate that it performs comparably to C.

If there's a need for 3D features within BioPerl, and if I can be of any 
assistance in helping to add such features, I would enjoy the opportunity. 


From torsten.seemann at infotech.monash.edu.au  Mon Jul 30 19:27:46 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Tue, 31 Jul 2007 09:27:46 +1000
Subject: [Bioperl-l] error using standaloneblast through webserver
In-Reply-To: <1245168492amadoz@uv.es>
References: <1245168492amadoz@uv.es>
Message-ID: <a79f6a4b0707301627u488fd1e8m9022160278b9b652@mail.gmail.com>

Alicia,

> Hi, i'm trying to run a bioperl script in linux with standaloneblast
> from a webserver but I have the following error:
> -------------------- WARNING ---------------------
> MSG: cannot find path to blastall
> ---------------------------------------------------
> $ENV{BLASTDATADIR} = '/usr/local/data/';
> $ENV{PATH} .= ':/usr/local/blast-2.2.16';
> $ENV{BLASTDIR} = '/usr/local/blast-2.2.16/';

I think the last one (or two) paths should be
'/usr/local/blast-2.2.16/bin' as the bin/ subdirectory of a standard
BLAST installation is where the 'blastall' binary actually lives.
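
A small sketch of how one might check this from inside the script itself,
since the environment under a web server is usually not the same as in a
login shell. The install prefix is the one from this thread; the helper
name find_program_dir is hypothetical, and whether the BLAST wrapper reads
these variables at load time (which would call for a BEGIN block before
the "use" line) is an assumption to verify against your BioPerl version.

```perl
use strict;
use warnings;
use File::Spec;

# Return the first directory in @dirs that holds an executable file
# named $program, or undef if none of them does.
sub find_program_dir {
    my ($program, @dirs) = @_;
    for my $dir (@dirs) {
        my $candidate = File::Spec->catfile($dir, $program);
        return $dir if -f $candidate && -x _;
    }
    return;
}

# Candidate locations: the bin/ subdirectory first, then the prefix itself.
my $prefix     = '/usr/local/blast-2.2.16';
my @candidates = ( File::Spec->catdir($prefix, 'bin'), $prefix );

my $dir = find_program_dir('blastall', @candidates);
if ( defined $dir ) {
    # Point both variables at the directory that really holds the binary.
    $ENV{BLASTDIR} = $dir;
    $ENV{PATH}     = defined $ENV{PATH} ? "$dir:$ENV{PATH}" : $dir;
    print "blastall found in $dir\n";
}
else {
    print "blastall not found in: @candidates\n";
}
```

Printing the result (or writing it to the server's error log) shows what
the web server process can actually see, which is often the quickest way
to tell a wrong path from a permissions problem.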

-- 
--Torsten Seemann
--Victorian Bioinformatics Consortium, Monash University

From cjfields at uiuc.edu  Mon Jul 30 20:53:45 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 30 Jul 2007 19:53:45 -0500
Subject: [Bioperl-l] nonstop repeated output from Remote_blast with xml
In-Reply-To: <20070730205851.e921d31c@dogwood.plantbio.uga.edu>
References: <20070730205851.e921d31c@dogwood.plantbio.uga.edu>
Message-ID: <FAFBA7E9-F8C5-439C-B3D8-39AA876F03FA@uiuc.edu>


On Jul 30, 2007, at 3:58 PM, Guojun Yang wrote:

> I am running remoteblast and using readmethod "xml", I noticed that  
> it is printing the output repeatedly nonstop. It's like in a loop.  
> Did anybody notice this before? Can anybody help me getting out of  
> this?
> Thanks a lot,
>
>
> Guojun Yang
> University of Georgia

Not seeing that using bioperl-live; you may need to update  
RemoteBlast.pm as this sounds similar to an issue that popped up  
earlier in the spring.
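
Independently of any module bug, it may be worth ruling out the calling
script: the usual RemoteBlast polling loop only ends once every RID has
been passed to remove_rid, so a report that is printed but never removed
gets fetched and printed again on the next pass. A minimal sketch of that
loop, using the each_rid / retrieve_blast / remove_rid methods; the
DemoFactory class is a stand-in invented here so the loop can run without
BioPerl or a network, and drain_rids is a hypothetical helper name.

```perl
use strict;
use warnings;

# Poll a RemoteBlast-style factory until every request ID (RID) is handled.
# $factory must provide each_rid, retrieve_blast and remove_rid;
# $on_report is called once with ($rid, $report) for each finished report.
sub drain_rids {
    my ($factory, $on_report, $wait_seconds) = @_;
    $wait_seconds = 5 unless defined $wait_seconds;

    while ( my @rids = $factory->each_rid ) {
        for my $rid (@rids) {
            my $rc = $factory->retrieve_blast($rid);
            if ( ref $rc ) {
                $on_report->($rid, $rc);       # finished: process it once ...
                $factory->remove_rid($rid);    # ... then forget the RID
            }
            elsif ( $rc < 0 ) {
                $factory->remove_rid($rid);    # error reported: give up on it
            }
            else {
                sleep $wait_seconds;           # not ready yet: poll again later
            }
        }
    }
    return;
}

# Stand-in factory, NOT part of BioPerl: each RID is "ready" on its 2nd poll.
package DemoFactory;
sub new {
    my ($class, @rids) = @_;
    return bless { polls => { map { $_ => 0 } @rids } }, $class;
}
sub each_rid { my @rids = sort keys %{ $_[0]{polls} }; return @rids }
sub retrieve_blast {
    my ($self, $rid) = @_;
    return $self->{polls}{$rid}++ ? +{ rid => $rid } : 0;
}
sub remove_rid { my ($self, $rid) = @_; delete $self->{polls}{$rid}; return }

package main;

my $factory = DemoFactory->new(qw(RID1 RID2));
drain_rids( $factory, sub { print "got report for $_[0]\n" }, 0 );
```

With a real factory, the callback is where the report's results would be
read; the key point is that remove_rid is reached on every finished or
failed RID so the outer while loop can terminate.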

chris

From torsten.seemann at infotech.monash.edu.au  Tue Jul 31 02:24:34 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Tue, 31 Jul 2007 16:24:34 +1000
Subject: [Bioperl-l] nonstop repeated output from Remote_blast with xml
In-Reply-To: <FAFBA7E9-F8C5-439C-B3D8-39AA876F03FA@uiuc.edu>
References: <20070730205851.e921d31c@dogwood.plantbio.uga.edu>
	<FAFBA7E9-F8C5-439C-B3D8-39AA876F03FA@uiuc.edu>
Message-ID: <a79f6a4b0707302324t261687e7g1012e1f536500c09@mail.gmail.com>

> as this sounds similar to an issue that popped up
> earlier in the spring.

I could have sworn it was autumn! ;-)

-- 
--Torsten Seemann
--Victorian Bioinformatics Consortium, Monash University

From Alicia.Amadoz at uv.es  Tue Jul 31 06:11:54 2007
From: Alicia.Amadoz at uv.es (Alicia Amadoz)
Date: Tue, 31 Jul 2007 12:11:54 +0200 (CEST)
Subject: [Bioperl-l] error using standaloneblast through webserver
In-Reply-To: <a79f6a4b0707301627u488fd1e8m9022160278b9b652@mail.gmail.com>
References: <a79f6a4b0707301627u488fd1e8m9022160278b9b652@mail.gmail.com>
Message-ID: <2361686267amadoz@uv.es>

Hi, I tried what you suggested and that was it, it works perfectly.
Thank you very much. 

Regards,
Alicia

> Alicia,
> 
> > Hi, i'm trying to run a bioperl script in linux with standaloneblast
> > from a webserver but I have the following error:
> > -------------------- WARNING ---------------------
> > MSG: cannot find path to blastall
> > ---------------------------------------------------
> > $ENV{BLASTDATADIR} = '/usr/local/data/';
> > $ENV{PATH} .= ':/usr/local/blast-2.2.16';
> > $ENV{BLASTDIR} = '/usr/local/blast-2.2.16/';
> 
> I think the last one (or two) paths should be
> '/usr/local/blast-2.2.16/bin' as the bin/ subdirectory of a standard
> BLAST installation is where the 'blastall' binary actually lives.
> 
> -- 
> --Torsten Seemann
> --Victorian Bioinformatics Consortium, Monash University
> 
> 


From jay at jays.net  Tue Jul 31 08:00:56 2007
From: jay at jays.net (Jay Hannah)
Date: Tue, 31 Jul 2007 07:00:56 -0500
Subject: [Bioperl-l] Perl 3D OpenGL
In-Reply-To: <152401c7d224$8e2455b0$6e4e7c0a@HPONE>
References: <152401c7d224$8e2455b0$6e4e7c0a@HPONE>
Message-ID: <25A5F0A3-1CC3-46B5-8976-A24C451204E7@jays.net>

On Jul 29, 2007, at 4:08 PM, Grafman Productions wrote:
> If this posting is inappropriate, please let me know - my apologies.

Not at all. AFAIK this is the perfect place to discuss any  
contributions you're motivated to make to the BioPerl project.

> I recently came across an article on BioPerl, and it occurred to me  
> that
> there might be some need for 3D rendering within your BioPerl project.
>
> I released a number of new/updated Perl OpenGL (POGL) modules this  
> year,
> along with benchmarks that demonstrate that it performs comparably  
> to C.
>
> If there's a need for 3D features within BioPerl, and if I can be  
> of any
> assistance in helping to add such features, I would enjoy the  
> opportunity.

I know nothing about 3D modeling in biology, nor do I hang out with  
any protein structure folks, but 3D always sounds sexy. -grin-

If you're new to bioinformatics (I certainly am) you might want to  
read this:

   http://en.wikipedia.org/wiki/Protein_structure

Because that's probably where your 3D work would be used. Especially  
note the "Software" section, where you'll find some of the  
"competition".  :)

There's some cool stuff out there. I don't know what all would or  
wouldn't be time well spent in Perl / BioPerl.

HTH,

Jay Hannah
http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah


From cjfields at uiuc.edu  Tue Jul 31 12:51:42 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 31 Jul 2007 11:51:42 -0500
Subject: [Bioperl-l] nonstop repeated output from Remote_blast with xml
In-Reply-To: <20070731104052.b4b93021@dogwood.plantbio.uga.edu>
References: <20070731104052.b4b93021@dogwood.plantbio.uga.edu>
Message-ID: <7A2D7E4A-4024-48DB-88C8-063388A98419@uiuc.edu>

Make sure to keep responses on the mailing list.

You might want to run a full install, just in case.  If I remember  
correctly Sendu made some changes a while back in the BLAST-related  
modules which may be related to this.  At the very least install/ 
upgrade all modules in Bio::Tools::Run.

chris

On Jul 31, 2007, at 9:40 AM, Guojun Yang wrote:

> Thanks, Chris,
> But when I replaced the old RemoteBlast.pm with the new one, I got  
> "can't locate the object method "retrieve_parameter"". Does this  
> mean I need to install something else?
> Guojun
>
> ----- Original Message -----
> From: Chris Fields [mailto:cjfields at uiuc.edu]
> To: gyang at plantbio.uga.edu
> Cc: bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] nonstop repeated output from Remote_blast  
> with xml
>
>
>>> On Jul 30, 2007, at 3:58 PM, Guojun Yang wrote:
>>>> I am running remoteblast and using readmethod "xml", I noticed that
>>> it is printing the output repeatedly nonstop. It's like in a loop.
>>> Did anybody notice this before? Can anybody help me getting out of
>>> this?
>>> Thanks a lot,
>>>
>>>
>>> Guojun Yang
>>> University of Georgia
>>> Not seeing that using bioperl-live; you may need to update
>> RemoteBlast.pm as this sounds similar to an issue that popped up
>> earlier in the spring.
>>> chris
>>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Tue Jul 31 22:15:45 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 31 Jul 2007 21:15:45 -0500
Subject: [Bioperl-l] Perl 3D OpenGL
In-Reply-To: <25A5F0A3-1CC3-46B5-8976-A24C451204E7@jays.net>
References: <152401c7d224$8e2455b0$6e4e7c0a@HPONE>
	<25A5F0A3-1CC3-46B5-8976-A24C451204E7@jays.net>
Message-ID: <04BCAD9E-CC25-4F0A-85B1-FBA91C64CE7D@uiuc.edu>


On Jul 31, 2007, at 7:00 AM, Jay Hannah wrote:

> On Jul 29, 2007, at 4:08 PM, Grafman Productions wrote:
>> If this posting is inappropriate, please let me know - my apologies.
>
> Not at all. AFAIK this is the perfect place to discuss any
> contributions you're motivated to make to the BioPerl project.
>
>> I recently came across an article on BioPerl, and it occurred to me
>> that
>> there might be some need for 3D rendering within your BioPerl  
>> project.
>>
>> I released a number of new/updated Perl OpenGL (POGL) modules this
>> year,
>> along with benchmarks that demonstrate that it performs comparably
>> to C.
>>
>> If there's a need for 3D features within BioPerl, and if I can be
>> of any
>> assistance in helping to add such features, I would enjoy the
>> opportunity.
>
> I know nothing about 3D modeling in biology, nor do I hang out with
> any protein structure folks, but 3D always sounds sexy. -grin-
>
> If you're new to bioinformatics (I certainly am) you might want to
> read this:
>
>    http://en.wikipedia.org/wiki/Protein_structure
>
> Because that's probably where your 3D work would be used. Especially
> note the "Software" section, where you'll find some of the
> "competition".  :)
>
> There's some cool stuff out there. I don't know what all would or
> wouldn't be time well spent in Perl / BioPerl.
>
> HTH,
>
> Jay Hannah
> http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah

I agree that protein structure is the best place for something like  
this.

It's a wide open area as far as I'm concerned; in fact I would say  
that Bio::Structure is getting pretty dated, so if anyone wants to  
take it over, refactor the code, and so on I don't have a problem.

chris


From dmessina at wustl.edu  Sun Jul  1 01:38:48 2007
From: dmessina at wustl.edu (David Messina)
Date: Sun, 1 Jul 2007 00:38:48 -0500
Subject: [Bioperl-l] svn auto-properties [was Re: First cut svn
	repository]
In-Reply-To: <46869226.70203@sheffield.ac.uk>
References: <3097065.1181941697249.JavaMail.myubc2@brahms.my.ubc.ca>	<bba689ec0706151440o56a7d6c6ncf72a37cd2b2cdc5@mail.gmail.com>	<185BDA34-1449-49CA-B146-ADF27D2928CD@gmx.net>	<8D3B697E-2072-46FE-A1C9-E546D9DEAA45@uiuc.edu>	<4673C7CB.1030709@mail.nih.gov>	<410EF5F9-A30E-4AB7-85F7-7E761E3890D5@uiuc.edu>	<18049.30026.61328.134490@almost.alerce.com>	<5764264E-5C40-4C9E-B1C9-A70628AC1DD0@uiuc.edu>	<BFBA575A-E653-40F6-9242-D72655B6AE9C@wustl.edu>	<E83D9D3C-96F2-4B5A-B503-09C3860586D0@gmx.net>	<D7111143-D173-42DE-AAEF-C2365AA453A0@wustl.edu>	<18051.44281.831316.749586@almost.alerce.com>	<F5B048F4-CBA5-493A-8A5C-2033709D8A63@wustl.edu>
	<18051.61992.627473.323346@almost.alerce.com>
	<4684AF3D.5090907@sheffield.ac.uk>
	<843758CD-9C5B-4DDA-9FF4-B90AA225BDB3@wustl.edu>
	<468628AC.9060200@sheffield.ac.uk>
	<461F64B9-87FD-458A-8945-8238E7076109@wustl.edu>
	<46869226.70203@sheffield.ac.uk>
Message-ID: <3164A6E3-77CF-4E61-9609-1408768862B1@wustl.edu>


> [Nath]
> I think the list of seq formats recognised by Bioperl in Bio::SeqIO and
> Bio::AlignIO would be a good start, as these are likely to be the ones
> that are sensitive to file format recognition and thus could break tests
> if renamed.

Sounds good to me. I will do a quick tour of the rest of the repo  
looking for other common or important file extensions, but I don't  
expect there to be many if any.


> [still Nath]
> I think a lot of people have used "." in file names as an alternative to
> a space. I think it would be beneficial to use an underscore "_" in
> these cases and leave the "." to represent the beginning of the file
> extension.

That's a great idea.


> [Chris]
> Do we need to define every filetype extension, or can there be a
> fallback (e.g. if it isn't on the list or has no extension it's plain
> text)?

For every file that's added, svn takes a peek to see if it's
human-readable. If not, it's tagged with the generic MIME type
application/octet-stream. (It does this so it knows not to try to do
diffs and merges on a binary file.)

So the default for a human-readable file is no MIME type, which I
believe is essentially the same thing as text/plain.

And then regardless of the outcome of svn's peek, any matching
auto-props are then applied, overriding svn's choice.

So if we don't define every extension, I think we'll be fine. It'd be  
nice to have everything tagged with a MIME type, though. For one  
thing, Apache will use it to do the right thing when people browse  
the repo over the web. And two, because metadata is cool. :)

One more thing: in the course of reading up on this, I learned that  
my earlier expectation about multiple auto-prop matches was  
incorrect. It's true that multiple unrelated matches means that  
multiple properties are set on the file. But when a file matches  
multiple *conflicting* auto-property patterns, there's no telling  
which value it'll get.
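
To make the conflicting-match case concrete, here is a minimal Perl
sketch that reports which properties would receive conflicting values
for a given file name. It is not how svn itself is implemented. The
rule list, the helper names glob_to_regex and find_conflicts, and the
simplified glob matching (only * and ? are handled) are all assumptions
made for illustration.

```perl
use strict;
use warnings;

# Turn a simple glob into an anchored regex.
# Assumption: only '*' and '?' are treated as wildcards here.
sub glob_to_regex {
    my ($glob) = @_;
    my $re = '';
    for my $char (split //, $glob) {
        $re .= $char eq '*' ? '.*'
             : $char eq '?' ? '.'
             :                quotemeta($char);
    }
    return qr/\A$re\z/;
}

# Return the properties that would get more than one distinct value
# for $filename. $rules is an array ref of [pattern, property, value].
sub find_conflicts {
    my ($filename, $rules) = @_;
    my %values_for;    # property => { value => 1, ... }
    for my $rule (@$rules) {
        my ($pattern, $property, $value) = @$rule;
        next unless $filename =~ glob_to_regex($pattern);
        $values_for{$property}{$value} = 1;
    }
    return grep { scalar( keys %{ $values_for{$_} } ) > 1 }
           sort keys %values_for;
}

# Hypothetical rules, loosely modelled on an [auto-props] section.
my @rules = (
    [ '*.txt',   'svn:eol-style', 'native'     ],
    [ 'README*', 'svn:eol-style', 'LF'         ],
    [ '*.txt',   'svn:mime-type', 'text/plain' ],
);

my @conflicts = find_conflicts('README.txt', \@rules);
print "README.txt: conflicting values for @conflicts\n";
```

Here README.txt matches both *.txt and README*, which disagree on
svn:eol-style, while the svn:mime-type match is unrelated and harmless.
Running a check like this over the auto-props list before committing it
to the wiki would flag patterns that need to be made more specific.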


Dave


From hartzell at alerce.com  Sun Jul  1 12:29:29 2007
From: hartzell at alerce.com (George Hartzell)
Date: Sun, 1 Jul 2007 09:29:29 -0700
Subject: [Bioperl-l] First cut svn repository
In-Reply-To: <E250DB37-E2C1-4F71-A2FE-B64603EB69FD@gmx.net>
References: <3097065.1181941697249.JavaMail.myubc2@brahms.my.ubc.ca>
	<bba689ec0706151440o56a7d6c6ncf72a37cd2b2cdc5@mail.gmail.com>
	<185BDA34-1449-49CA-B146-ADF27D2928CD@gmx.net>
	<8D3B697E-2072-46FE-A1C9-E546D9DEAA45@uiuc.edu>
	<4673C7CB.1030709@mail.nih.gov>
	<410EF5F9-A30E-4AB7-85F7-7E761E3890D5@uiuc.edu>
	<18049.30026.61328.134490@almost.alerce.com>
	<4683A7D1.8070403@sendu.me.uk>
	<18051.48684.996884.134046@almost.alerce.com>
	<4683C385.3050904@sendu.me.uk>
	<18051.63674.685297.426813@almost.alerce.com>
	<D554E628-AB22-44C2-B253-3CDDB3F71253@uiuc.edu>
	<18052.3946.224905.415905@almost.alerce.com>
	<2159ED58-E6F4-4ED8-AC23-E8BAF69FE240@gmx.net>
	<A348C2D6-F00B-4E76-A78F-E192A912E785@uiuc.edu>
	<E250DB37-E2C1-4F71-A2FE-B64603EB69FD@gmx.net>
Message-ID: <18055.54889.677775.868974@almost.alerce.com>

Hilmar Lapp writes:
 > It turns out that both files are also present on the release-0-9-3,  
 > bioperl-1-0-0, bioperl-1-0-alpha, and bioperl-1-0-alpha2-rc tags, so add
 > 
 > $ svn rm -m "Removing offending duplicate" \
 >     svn+ssh://dev.open-bio.org/home/hartzell/bioperl/bioperl-live/tags/release-0-9-3/t/data/HUMBETGLOA.fasta
 > $ svn rm -m "Removing offending duplicate" \
 >     svn+ssh://dev.open-bio.org/home/hartzell/bioperl/bioperl-live/tags/bioperl-1-0-0/t/data/HUMBETGLOA.fasta
 > $ svn rm -m "Removing offending duplicate" \
 >     svn+ssh://dev.open-bio.org/home/hartzell/bioperl/bioperl-live/tags/bioperl-1-0-alpha/t/data/HUMBETGLOA.fasta
 > $ svn rm -m "Removing offending duplicate" \
 >     svn+ssh://dev.open-bio.org/home/hartzell/bioperl/bioperl-live/tags/bioperl-1-0-alpha2-rc/t/data/HUMBETGLOA.fasta
 > 
 > to the post-processing commands.
 > [...]

Will do.  Thanks for working out the incantations!

g.


From cjfields at uiuc.edu  Mon Jul  2 09:26:06 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 2 Jul 2007 08:26:06 -0500
Subject: [Bioperl-l] test data
Message-ID: <82E2A57B-CB7B-4437-94A1-7AEFCA6A4F5F@uiuc.edu>

I am planning on adding test data to cvs for eutils and have run
across some stuff in bugzilla that needs to be added as well.

Should we, as a convention, start putting test data in a folder named
after the test, within t/data?  This might make life easier in
the long run (keep track of files, get rid of old files, etc), and
may make it easier to wrap up the correct data with tests if we
start submitting single module CPAN updates.

chris


From cjfields at uiuc.edu  Mon Jul  2 09:52:27 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 2 Jul 2007 08:52:27 -0500
Subject: [Bioperl-l] test data
In-Reply-To: <468901C1.8020505@sendu.me.uk>
References: <82E2A57B-CB7B-4437-94A1-7AEFCA6A4F5F@uiuc.edu>
	<468901C1.8020505@sendu.me.uk>
Message-ID: <61239EEF-D079-4D47-BDCD-A2B5CCC1C84D@uiuc.edu>

On Jul 2, 2007, at 8:46 AM, Sendu Bala wrote:

> Chris Fields wrote:
>> I am planning on adding test data to cvs for eutils and have run
>> across some stuff in bugzilla that needs to be added as well.
>> Should we, as a convention, start putting test data in a folder named
>> after the test, within t/data?
>
> I'd actually argue that this shouldn't be done: data is sometimes
> reused amongst multiple different test scripts, and when looking
> for data to reuse it's easier to spot it in a single directory
> compared to searching through multiple directories.
>
>
>> This might make life easier in the long run (keep track of files,  
>> get rid of old files, etc), and may make it easier for wrapping up  
>> the correct data with tests if we start submitting single module  
>> CPAN updates.
>
> I don't think that will be an issue. The automated process would  
> read the test script and see what input files it uses, copying  
> those into the archive. So, just be sure to standardise on using  
> test_input_file() to make that possible.
>
>
> That said, I wouldn't mind especially either way. Just don't do it  
> now, since test script names (and therefore the name of the  
> directory you'd want to store the input files in) might all change.
>
>
> In fact we can imagine that we have a test script
> t/BioZombieKitten.t which stores its test data in
> t/data/BioZombieKitten/input.file but the script gets the path to
> this file by:
> my $input_file = test_input_file('input.file');
>
> test_input_file() is then implemented to look for the file in the  
> subdir of data corresponding to the script name if we're dealing  
> with the 900-modules-in-a-package checkout-type situation, but just  
> in t/data if we're in the one-module-in-a-package situation.
>
> In any case, things will be most flexible if you drop files  
> directly into t/data for now and reference them without any subdirs  
> in the call to test_input_file().

Fine by me, I just find it very cluttered.

BioZombieKitten?!?

chris


From bix at sendu.me.uk  Mon Jul  2 10:00:37 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 02 Jul 2007 15:00:37 +0100
Subject: [Bioperl-l] test data
In-Reply-To: <61239EEF-D079-4D47-BDCD-A2B5CCC1C84D@uiuc.edu>
References: <82E2A57B-CB7B-4437-94A1-7AEFCA6A4F5F@uiuc.edu>
	<468901C1.8020505@sendu.me.uk>
	<61239EEF-D079-4D47-BDCD-A2B5CCC1C84D@uiuc.edu>
Message-ID: <46890505.1070707@sendu.me.uk>

Chris Fields wrote:
> On Jul 2, 2007, at 8:46 AM, Sendu Bala wrote:
> Fine by me, I just find it very cluttered.

Yes, I agree. I also wish we had a decent naming convention for files.
(I.e., it would be nice to have a good idea what a file was for without
having to study the test script that uses it.)


> BioZombieKitten?!?

I get Bio/perl/ and Bio/ware/ confused in my head ;)
http://forums.bioware.com/viewtopic.html?topic=562916&forum=84


From bix at sendu.me.uk  Mon Jul  2 09:46:41 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 02 Jul 2007 14:46:41 +0100
Subject: [Bioperl-l] test data
In-Reply-To: <82E2A57B-CB7B-4437-94A1-7AEFCA6A4F5F@uiuc.edu>
References: <82E2A57B-CB7B-4437-94A1-7AEFCA6A4F5F@uiuc.edu>
Message-ID: <468901C1.8020505@sendu.me.uk>

Chris Fields wrote:
> I am planning on adding test data to cvs for eutils and have run across
> some stuff in bugzilla that needs to be added as well.
>
> Should we, as a convention, start putting test data in a folder named
> after the test, within t/data?

I'd actually argue that this shouldn't be done: data is sometimes reused
amongst multiple different test scripts, and when looking for data to
reuse it's easier to spot it in a single directory compared to searching
through multiple directories.


> This might make life easier in the long 
> run (keep track of files, get rid of old files, etc), and may make it 
> easier for wrapping up the correct data with tests if we start 
> submitting single module CPAN updates.

I don't think that will be an issue. The automated process would read 
the test script and see what input files it uses, copying those into the 
archive. So, just be sure to standardise on using test_input_file() to 
make that possible.


That said, I wouldn't mind especially either way. Just don't do it now, 
since test script names (and therefore the name of the directory you'd 
want to store the input files in) might all change.


In fact we can imagine that we have a test script t/BioZombieKitten.t 
which stores its test data in t/data/BioZombieKitten/input.file but the 
script gets the path to this file by:
my $input_file = test_input_file('input.file');

test_input_file() is then implemented to look for the file in the subdir 
of data corresponding to the script name if we're dealing with the 
900-modules-in-a-package checkout-type situation, but just in t/data if 
we're in the one-module-in-a-package situation.

In any case, things will be most flexible if you drop files directly 
into t/data for now and reference them without any subdirs in the call 
to test_input_file().
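
A minimal sketch of the lookup described above, in plain Perl. This is
not BioPerl's actual test_input_file(); the data_dir and script options
and the "use the subdirectory if the file exists there" rule for telling
the two layouts apart are assumptions made for illustration.

```perl
use strict;
use warnings;
use File::Spec;
use File::Basename qw(basename);

# Hypothetical helper: resolve a test data file name to a path.
# Options (both assumptions, not part of any existing API):
#   data_dir => directory holding test data (default: t/data)
#   script   => path of the test script     (default: $0)
sub test_input_file {
    my ($name, %opt) = @_;
    my $data_dir = $opt{data_dir} || File::Spec->catdir('t', 'data');
    my $script   = $opt{script}   || $0;

    # t/BioZombieKitten.t -> BioZombieKitten
    my $subdir = basename($script, '.t');

    # Prefer t/data/<script name>/<file> when that file exists ...
    my $nested = File::Spec->catfile($data_dir, $subdir, $name);
    return $nested if -e $nested;

    # ... otherwise fall back to the flat t/data/<file> layout.
    return File::Spec->catfile($data_dir, $name);
}

my $input_file = test_input_file('input.file');
print "$input_file\n";
```

Because the test script only ever passes the bare file name, the data
can later be moved between the flat and per-script layouts without
touching the scripts themselves.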


From hlapp at gmx.net  Mon Jul  2 16:02:37 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 2 Jul 2007 16:02:37 -0400
Subject: [Bioperl-l] Take 2 of the new subversion repository.
In-Reply-To: <18054.63942.316904.413911@almost.alerce.com>
References: <18054.63942.316904.413911@almost.alerce.com>
Message-ID: <F5F3DC8B-47B6-4B9E-8B94-82A514BDE7FD@gmx.net>

Just FYI, after applying the changes I've been sending, I was able to  
check out the repository in its entirety.

	-hilmar

On Jun 30, 2007, at 8:48 PM, George Hartzell wrote:

>
> There's a second cut at the subversion repository.  I've done a better
> job of setting svn:keywords and svn:eol-style on various files.  The
> defaults were more cautious and I used an auto-props file based on
> the wiki version.
>
>   svn+ssh://dev.open-bio.org/home/hartzell/bioperl_take2
>
> The old repository's still around as
>
>   svn+ssh://dev.open-bio.org/home/hartzell/bioperl_take1
>
> I renamed it so that people wouldn't work with it by mistake.  If, for
> some hard-to-imagine reason, you have a working copy that you want to
> run against it, you should be able to do an svn switch --relocate on
> your working copy and be back in shape.  In fact, it might be a good
> time to give it a try....
>
> g.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From wrp at virginia.edu  Mon Jul  2 16:08:04 2007
From: wrp at virginia.edu (William R. Pearson)
Date: Mon, 2 Jul 2007 16:08:04 -0400
Subject: [Bioperl-l] Course: Computational and Comparative Genomics
Message-ID: <4B3F66D7-CF05-4CD1-A148-272B4B56FBD4@virginia.edu>


Course announcement - Application deadline, July 15, 2007

================================================================

Cold Spring Harbor
COMPUTATIONAL & COMPARATIVE GENOMICS
November 7 - 13, 2007
Application Deadline: July 15, 2007

INSTRUCTORS:

Pearson, William, Ph.D., University of Virginia, Charlottesville, VA
Smith, Randall, Ph.D., SmithKline Beecham Pharmaceuticals, King of
Prussia, PA

Beyond BLAST and FASTA - Alignment: from proteins to genomes - This
course presents a comprehensive overview of the theory and practice of
computational methods for extracting the maximum amount of information
from protein and DNA sequence similarity through sequence database
searches, statistical analysis, multiple sequence alignment, and
genome scale alignment. Additional topics include gene finding,
identifying signals in unaligned sequences, and integration of genetic
and sequence information in biological databases.

The course combines lectures with hands-on exercises; students are
encouraged to pose challenging sequence analysis problems using their
own data. The course makes extensive use of local WWW pages to present
problem sets and the computing tools to solve them. Students use
Windows and Mac workstations attached to a UNIX server.

The course is designed for biologists seeking advanced training in
biological sequence analysis, computational biology core resource
directors and staff, and for scientists in other disciplines, such as
computer science, who wish to survey current research problems in
biological sequence analysis and comparative genomics.

The primary focus of the Computational and Comparative Genomics Course
is the theory and practice of algorithms used in computational
biology, with the goal of using current methods more effectively and
developing new algorithms. Cold Spring Harbor also offers a
"Programming for Biology" course, which focuses more on software
development.

For additional information and the lecture schedule and problem sets
for the 2006 course, see:

         http://fasta.bioch.virginia.edu/cshl06

================================================================

To apply to the course, fill out and send in the form at:

         http://meetings.cshl.edu/courses/courseapplication.asp

================================================================

Bill Pearson


From niels at genomics.dk  Mon Jul  2 16:45:07 2007
From: niels at genomics.dk (Niels Larsen)
Date: Mon, 02 Jul 2007 22:45:07 +0200
Subject: [Bioperl-l] simple PrimarySeq question
In-Reply-To: <F5F3DC8B-47B6-4B9E-8B94-82A514BDE7FD@gmx.net>
References: <18054.63942.316904.413911@almost.alerce.com>
	<F5F3DC8B-47B6-4B9E-8B94-82A514BDE7FD@gmx.net>
Message-ID: <468963D3.3000007@genomics.dk>

I write hoping someone could show me how to create a PrimarySeq
object without parsing features and all first. The lines below
return

"Can't locate object method "next_seq" via package "Bio::PrimarySeq" at ./tst2 line 16."

whereas calling Bio::SeqIO->new instead gives no error, but returns a much
bigger object than I need. The GenBank record after the __END__ is the
"1.gb" file. I could not find out how to do this from the tutorial or the
Bio::PrimarySeq description.

Niels L


#!/usr/bin/env perl

use strict;
use warnings FATAL => qw ( all );

use Data::Dumper;

use Bio::Seq;
use Bio::SeqIO;

my ( $seq_h, $seq );

$seq_h = Bio::PrimarySeq->new( -file => "1.gb", -format => 'genbank' );
# $seq_h = Bio::SeqIO->new( -file => "1.gb", -format => 'genbank' );

$seq = $seq_h->next_seq();

# print Dumper( $seq );

__END__

LOCUS       X60065                     9 bp    mRNA    linear   MAM 14-NOV-2006
DEFINITION  B.bovis beta-2-gpI mRNA for beta-2-glycoprotein I.
ACCESSION   X60065 REGION: 1..9
VERSION     X60065.1  GI:5
KEYWORDS    beta-2 glycoprotein I.
SOURCE      Bos taurus (cattle)
   ORGANISM  Bos taurus
             Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
             Mammalia; Eutheria; Laurasiatheria; Cetartiodactyla; Ruminantia;
             Pecora; Bovidae; Bovinae; Bos.
REFERENCE   1
   AUTHORS   Bendixen,E., Halkier,T., Magnusson,S., Sottrup-Jensen,L. and
             Kristensen,T.
   TITLE     Complete primary structure of bovine beta 2-glycoprotein I:
             localization of the disulfide bridges
   JOURNAL   Biochemistry 31 (14), 3611-3617 (1992)
    PUBMED   1567819
REFERENCE   2  (bases 1 to 9)
   AUTHORS   Kristensen,T.
   TITLE     Direct Submission
   JOURNAL   Submitted (11-JUN-1991) T. Kristensen, Dept of Mol Biology,
             University of Aarhus, C F Mollers Alle 130, DK-8000 Aarhus C,
             DENMARK
FEATURES             Location/Qualifiers
      source          1..9
                      /organism="Bos taurus"
                      /mol_type="mRNA"
                      /db_xref="taxon:9913"
                      /clone="pBB2I"
                      /tissue_type="liver"
      gene            <1..>9
                      /gene="beta-2-gpI"
      CDS             <1..>9
                      /gene="beta-2-gpI"
                      /codon_start=1
                      /product="beta-2-glycoprotein I"
                      /protein_id="CAA42669.1"
                      /db_xref="GI:6"
                      /db_xref="GOA:P17690"
                      /db_xref="UniProtKB/Swiss-Prot:P17690"
                      /translation="PALVLLLGFLCHVAIAGRTCPKPDELPFSTVVPLKRTYEPGEQI
                      VFSCQPGYVSRGGIRRFTCPLTGLWPINTLKCMPRVCPFAGILENGTVRYTTFEYPNT
                      ISFSCHTGFYLKGASSAKCTEEGKWSPDLPVCAPITCPPPPIPKFASLSVYKPLAGNN
                      SFYGSKAVFKCLPHHAMFGNDTVTCTEHGNWTQLPECREVRCPFPSRPDNGFVNHPAN
                      PVLYYKDTATFGCHETYSLDGPEEVECSKFGNWSAQPSCKASCKLSIKRATVIYEGER
                      VAIQNKFKNGMLHGQKVSFFCKHKEKKCSYTEDAQCIDGTIEIPKCFKEHSSLAFWKT
                      DASDVKPC"
      sig_peptide     <1..>9
                      /gene="beta-2-gpI"
ORIGIN
         1 ccagcgctc
//


From Kevin.M.Brown at asu.edu  Mon Jul  2 17:35:12 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Mon, 2 Jul 2007 14:35:12 -0700
Subject: [Bioperl-l] simple PrimarySeq question
In-Reply-To: <468963D3.3000007@genomics.dk>
References: <18054.63942.316904.413911@almost.alerce.com><F5F3DC8B-47B6-4B9E-8B94-82A514BDE7FD@gmx.net>
	<468963D3.3000007@genomics.dk>
Message-ID: <1A4207F8295607498283FE9E93B775B403576504@EX02.asurite.ad.asu.edu>

Start by having a look at the following link:
http://bioperl.org/cgi-bin/deob_interface.cgi

SeqIO is how one reads or writes sequences to/from files.
Bio::PrimarySeq is just an object that holds information about a
sequence obtained from a file.

As for how to parse a Genbank file into a list of features:

use Bio::SeqIO;

my @sorted_features;
my $file = Bio::SeqIO->new(-format => 'genbank', -file => "id.gb");
while (my $seq = $file->next_seq())
{
	my @features = $seq->all_SeqFeatures;
	# keep only the features whose primary tag is 'CDS'
	for my $f (@features)
	{
		my $tag = $f->primary_tag;
		if ($tag eq 'CDS')
		{
			# @sorted_features collects the CDS feature objects
			# (sequence features, not Bio::PrimarySeq objects)
			push @sorted_features, $f;
		}
	}
}
 



From niels at genomics.dk  Mon Jul  2 20:41:24 2007
From: niels at genomics.dk (niels at genomics.dk)
Date: Tue, 3 Jul 2007 02:41:24 +0200 (CEST)
Subject: [Bioperl-l] simple PrimarySeq question
In-Reply-To: <1A4207F8295607498283FE9E93B775B403576504@EX02.asurite.ad.asu.edu>
References: <18054.63942.316904.413911@almost.alerce.com><F5F3DC8B-47B6-4B9E-8B94-82A514BDE7FD@gmx.net>
	<468963D3.3000007@genomics.dk>
	<1A4207F8295607498283FE9E93B775B403576504@EX02.asurite.ad.asu.edu>
Message-ID: <23897.85.82.195.151.1183423284.squirrel@mail.genomics.dk>

Kevin,

Thanks, but I didn't put the question very clearly, sorry. Yes, SeqIO
gets entries from file, and from those large parsed entries I can get a
simplified primary_seq object. But the object SeqIO returns includes feature
and annotation objects etc. that take time to make, and I wish to know
if there is a way to get a primary_seq object without this overhead. I
apologize if I overlooked it in the docs.

Niels


> Start by having a look at the following link:
> http://bioperl.org/cgi-bin/deob_interface.cgi
>
> SeqIO is how one reads or writes sequences to/from files.
> Bio::PrimarySeq is just an object that holds information about a
> sequence obtained from a file.
>
> As for how to parse a Genbank file into a list of features:
>
> $file = Bio::SeqIO->new(-format => $format, -file => "id.gb");
> while (my $seq = $file->next_seq())
> {
> 	@features = $seq->all_SeqFeatures;
> 	# sort features by their primary tags
> 	for my $f (@features)
> 	{
> 		my $tag = $f->primary_tag;
> 		if ($tag eq 'CDS')
> 		{
> 			# @sorted_features holds all the Bio::PrimarySeq
> features obtained from the genbank file
> 			push @sorted_features, $f;
> 		}
> 	}
> }


From hlapp at gmx.net  Mon Jul  2 22:36:19 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 2 Jul 2007 22:36:19 -0400
Subject: [Bioperl-l] simple PrimarySeq question
In-Reply-To: <23897.85.82.195.151.1183423284.squirrel@mail.genomics.dk>
References: <18054.63942.316904.413911@almost.alerce.com><F5F3DC8B-47B6-4B9E-8B94-82A514BDE7FD@gmx.net>
	<468963D3.3000007@genomics.dk>
	<1A4207F8295607498283FE9E93B775B403576504@EX02.asurite.ad.asu.edu>
	<23897.85.82.195.151.1183423284.squirrel@mail.genomics.dk>
Message-ID: <84F5C120-FE0B-472D-8F1B-026AD238E959@gmx.net>

Check out the POD of Bio::Seq::SeqBuilder, the synopsis should have  
examples for what you want to do:

      use Bio::SeqIO;
      # usually you won't instantiate this yourself - a SeqIO object -
      # you will have one already
      my $seqin = Bio::SeqIO->new(-fh => \*STDIN, -format => "genbank");
      my $builder = $seqin->sequence_builder();

      # if you need only sequence, id, and description (e.g. for
      # conversion to FASTA format):
      $builder->want_none();
      $builder->add_wanted_slot('display_id','desc','seq');

      # if you want everything except the sequence and features
      $builder->want_all(1); # this is the default if it's untouched
      $builder->add_unwanted_slot('seq','features');

Let us know if that doesn't answer your question.

Note that this is currently only implemented for Genbank format.

	-hilmar

On Jul 2, 2007, at 8:41 PM, niels at genomics.dk wrote:

> Kevin,
>
> Thanks, but I didn't put the question very clearly, sorry. Yes, SeqIO
> gets entries from file, and from those large parsed entries I can
> get a simplified primary_seq object. But the object SeqIO returns
> includes feature and annotation objects etc. that take time to make,
> and I wish to know if there is a way to get a primary_seq object
> without this overhead. I apologize if I overlooked it in the docs.
>
> Niels
>
>
>
>
>> Start by having a look at the following link:
>> http://bioperl.org/cgi-bin/deob_interface.cgi
>>
>> SeqIO is how one reads or writes sequences to/from files.
>> Bio::PrimarySeq is just an object that holds information about a
>> sequence obtained from a file.
>>
>> As for how to parse a Genbank file into a list of features:
>>
>> my $file = Bio::SeqIO->new(-format => 'genbank', -file => "id.gb");
>> my @sorted_features;
>> while (my $seq = $file->next_seq())
>> {
>> 	my @features = $seq->all_SeqFeatures;
>> 	# keep only the features whose primary tag is 'CDS'
>> 	for my $f (@features)
>> 	{
>> 		my $tag = $f->primary_tag;
>> 		if ($tag eq 'CDS')
>> 		{
>> 			# @sorted_features holds all the CDS features
>> 			# (Bio::SeqFeatureI objects) obtained from the genbank file
>> 			push @sorted_features, $f;
>> 		}
>> 	}
>> }
>>
>>
>>> -----Original Message-----
>>> From: bioperl-l-bounces at lists.open-bio.org
>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
>>> Niels Larsen
>>> Sent: Monday, July 02, 2007 1:45 PM
>>> Cc: bioperl-l List
>>> Subject: [Bioperl-l] simple PrimarySeq question
>>>
>>> I write hoping someone could show me how to create a
>>> PrimarySeq object without parsing features and all first. The
>>> lines below return
>>>
>>> "Can't locate object method "next_seq" via package
>>> "Bio::PrimarySeq" at ./tst2 line 16."
>>>
>>> whereas calling Bio::SeqIO-> gives no error, but a too big object.
>>> The GenBank record after the __END__ is the "1.gb" file. I
>>> could not find out how from the tutorial or the
>>> Bio::PrimarySeq description.
>>>
>>> Niels L
>>>
>>>
>>> #!/usr/bin/env perl
>>>
>>> use strict;
>>> use warnings FATAL => qw ( all );
>>>
>>> use Data::Dumper;
>>>
>>> use Bio::Seq;
>>> use Bio::SeqIO;
>>>
>>> my ( $seq_h, $seq );
>>>
>>> $seq_h = Bio::PrimarySeq->new( -file => "1.gb", -format => 'genbank' );
>>> # $seq_h = Bio::SeqIO->new( -file => "1.gb", -format => 'genbank' );
>>>
>>> $seq = $seq_h->next_seq();
>>>
>>> # print Dumper( $seq );
>>>
>>> __END__
>>>
>>> LOCUS       X60065                     9 bp    mRNA    linear
>>>   MAM 14-NOV-2006
>>> DEFINITION  B.bovis beta-2-gpI mRNA for beta-2-glycoprotein I.
>>> ACCESSION   X60065 REGION: 1..9
>>> VERSION     X60065.1  GI:5
>>> KEYWORDS    beta-2 glycoprotein I.
>>> SOURCE      Bos taurus (cattle)
>>>    ORGANISM  Bos taurus
>>>              Eukaryota; Metazoa; Chordata; Craniata;
>>> Vertebrata; Euteleostomi;
>>>              Mammalia; Eutheria; Laurasiatheria;
>>> Cetartiodactyla; Ruminantia;
>>>              Pecora; Bovidae; Bovinae; Bos.
>>> REFERENCE   1
>>>    AUTHORS   Bendixen,E., Halkier,T., Magnusson,S.,
>>> Sottrup-Jensen,L. and
>>>              Kristensen,T.
>>>    TITLE     Complete primary structure of bovine beta
>>> 2-glycoprotein I:
>>>              localization of the disulfide bridges
>>>    JOURNAL   Biochemistry 31 (14), 3611-3617 (1992)
>>>     PUBMED   1567819
>>> REFERENCE   2  (bases 1 to 9)
>>>    AUTHORS   Kristensen,T.
>>>    TITLE     Direct Submission
>>>    JOURNAL   Submitted (11-JUN-1991) T. Kristensen, Dept of
>>> Mol Biology,
>>>              University of Aarhus, C F Mollers Alle 130,
>>> DK-8000 Aarhus C,
>>>              DENMARK
>>> FEATURES             Location/Qualifiers
>>>       source          1..9
>>>                       /organism="Bos taurus"
>>>                       /mol_type="mRNA"
>>>                       /db_xref="taxon:9913"
>>>                       /clone="pBB2I"
>>>                       /tissue_type="liver"
>>>       gene            <1..>9
>>>                       /gene="beta-2-gpI"
>>>       CDS             <1..>9
>>>                       /gene="beta-2-gpI"
>>>                       /codon_start=1
>>>                       /product="beta-2-glycoprotein I"
>>>                       /protein_id="CAA42669.1"
>>>                       /db_xref="GI:6"
>>>                       /db_xref="GOA:P17690"
>>>                       /db_xref="UniProtKB/Swiss-Prot:P17690"
>>>
>>> /translation="PALVLLLGFLCHVAIAGRTCPKPDELPFSTVVPLKRTYEPGEQI
>>>
>>> VFSCQPGYVSRGGIRRFTCPLTGLWPINTLKCMPRVCPFAGILENGTVRYTTFEYPNT
>>>
>>> ISFSCHTGFYLKGASSAKCTEEGKWSPDLPVCAPITCPPPPIPKFASLSVYKPLAGNN
>>>
>>> SFYGSKAVFKCLPHHAMFGNDTVTCTEHGNWTQLPECREVRCPFPSRPDNGFVNHPAN
>>>
>>> PVLYYKDTATFGCHETYSLDGPEEVECSKFGNWSAQPSCKASCKLSIKRATVIYEGER
>>>
>>> VAIQNKFKNGMLHGQKVSFFCKHKEKKCSYTEDAQCIDGTIEIPKCFKEHSSLAFWKT
>>>                       DASDVKPC"
>>>       sig_peptide     <1..>9
>>>                       /gene="beta-2-gpI"
>>> ORIGIN
>>>          1 ccagcgctc
>>> //

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From ewijaya at gmail.com  Tue Jul  3 02:56:30 2007
From: ewijaya at gmail.com (Edward Wijaya)
Date: Tue, 3 Jul 2007 14:56:30 +0800
Subject: [Bioperl-l] Problem with GD.pm version 2.35
Message-ID: <3521d3670707022356tbc38694mfcb5185b1dfc005d@mail.gmail.com>

Dear all,
I was trying to perform check with this command:

$ perl -MGD -e 'print $GD::VERSION';

And it gave:

GD object version 2.32 does not match $GD::VERSION 2.35 at
/usr/lib/perl5/5.8.8/i386-linux-thread-multi/DynaLoader.pm line 253.
Compilation failed in require.
BEGIN failed--compilation aborted.

Similarly my script that uses GD.pm doesn't execute.


I have installed the latest version of libgd version 2.0.35 downloaded from
http://www.libgd.org/Downloads#Download_the_latest_.282.0.35.29

Can anybody suggest how can I resolve my problem?

This is my Perl version:
This is perl, v5.8.8 built for i386-linux-thread-multi

--
Edward


From lstein at cshl.edu  Tue Jul  3 10:41:26 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Tue, 3 Jul 2007 10:40:26 -0401
Subject: [Bioperl-l] Problem with GD.pm version 2.35
In-Reply-To: <3521d3670707022356tbc38694mfcb5185b1dfc005d@mail.gmail.com>
References: <3521d3670707022356tbc38694mfcb5185b1dfc005d@mail.gmail.com>
Message-ID: <6dce9a0b0707030741r52b8d0beq757a8faf982e1f2f@mail.gmail.com>

This happens when there is a mismatch between the compiled (.so) portion of
GD and the perl (.pm) version. Typically it occurs when you have installed
GD incorrectly by, e.g., copying the .pm file into position rather than
using the make file.

Solution: Uninstall old versions of GD by manually finding all occurrences
of GD.so and GD.pm and removing them. Then reinstall the correct way.
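
As a rough aid for the "finding all occurrences" step, the sketch below
lists candidate files in each library directory. It assumes the
conventional layout (GD.pm at the top of an @INC directory and the
compiled part under auto/GD/), and the sub name installed_copies is
made up here; copies installed elsewhere still need a manual search.

```perl
use strict;
use warnings;
use Config;    # for $Config{dlext}, the shared-object extension ('so' on Linux)

# Hypothetical helper: list the .pm file and the compiled object of a
# module in each of the given library directories.
sub installed_copies {
    my ($module, @dirs) = @_;
    (my $path = $module) =~ s{::}{/}g;        # Foo::Bar -> Foo/Bar
    my ($leaf) = $path =~ m{([^/]+)\z};       # last path component
    my @hits;
    for my $dir (grep { !ref $_ } @dirs) {    # @INC may also hold code refs
        for my $file ("$dir/$path.pm", "$dir/auto/$path/$leaf.$Config{dlext}") {
            push @hits, $file if -e $file;
        }
    }
    return @hits;
}

# Print every copy visible to this perl; more than one GD.pm or GD.so
# suggests a stale install is shadowing the new one.
print "$_\n" for installed_copies('GD', @INC);
```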

Lincoln

On 7/3/07, Edward Wijaya <ewijaya at gmail.com> wrote:
>
> Dear all,
> I was trying to perform check with this command:
>
> $ perl -MGD -e 'print $GD::VERSION';
>
> And it gave:
>
> GD object version 2.32 does not match $GD::VERSION 2.35 at
> /usr/lib/perl5/5.8.8/i386-linux-thread-multi/DynaLoader.pm line 253.
> Compilation failed in require.
> BEGIN failed--compilation aborted.
>
> Similarly my script that uses GD.pm doesn't execute.
>
>
> I have installed the latest version of libgd version 2.0.35 downloaded
> from
> http://www.libgd.org/Downloads#Download_the_latest_.282.0.35.29
>
> Can anybody suggest how can I resolve my problem?
>
> This is my Perl version:
> This is perl, v5.8.8 built for i386-linux-thread-multi
>
> --
> Edward


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu


From cjfields at uiuc.edu  Wed Jul  4 01:45:16 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 4 Jul 2007 00:45:16 -0500
Subject: [Bioperl-l] genbank2gff3 - Name attribute?
Message-ID: <C790FCC2-81E5-4BB4-A9CB-E2E59E5ABE27@uiuc.edu>

I noticed that genbank2gff3.pl doesn't have an explicitly defined way  
of converting the gene/locus/etc name to a Name tag (for, say,  
GBrowse).  Any particular reason?

Should I stick with GFF2 for now?

chris


From bix at sendu.me.uk  Wed Jul  4 06:00:31 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 04 Jul 2007 11:00:31 +0100
Subject: [Bioperl-l] Splitting Bioperl
Message-ID: <468B6FBF.1070708@sendu.me.uk>

To summarise some previous threads:
http://thread.gmane.org/gmane.comp.lang.perl.bio.general/15315
http://thread.gmane.org/gmane.comp.lang.perl.bio.general/15338/focus=15409

# Bioperl is currently one monolithic distribution of ~900 modules
# There is some desire to split it up into smaller functional groups
# There are some problems with that proposal
# An extreme variant of that proposal is to make the groups individual 
modules


Following this discussion:
http://www.nntp.perl.org/group/perl.modules/2007/07/msg55160.html
(especially Adam Kennedy's postings of 4/07, soon to appear in that 
archive), the extreme variant doesn't seem like a good idea.


I'm now suggesting that Steve's original split idea, as 
modified/expanded by Adam's driver and other ideas, is the best choice. 
The problems I previously identified can be solved in the same way they 
were solved in my extreme variant: the splits are done by Build.PL 
automation working on a single repository/code-base, not by splitting 
things up at the repository level.


As I see it, the way forward now is for someone interested enough to 
decide on the specifics of how things will be split and offer them up to 
the group for discussion. I don't mean vague possibilities of what might 
work as a split, but rather some real thought should go into it to make 
sure the split makes sense and will actually work in practice.

Following that, the splits can be implemented by some automated dist 
action of Build.PL.
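
To make the idea of an automated split on a single code-base a little
more concrete, here is a minimal sketch of the planning step only. The
distribution names and prefix rules are hypothetical placeholders, not
an agreed split, and nothing here is an existing Build.PL action.

```perl
use strict;
use warnings;

# Hypothetical split rules: [ path pattern, distribution name ].
# The first matching rule wins; anything unmatched stays in the core.
my @rules = (
    [ qr{^Bio/SearchIO(?:/|\.pm\z)}, 'bioperl-searchio' ],
    [ qr{^Bio/Tools/},               'bioperl-tools'    ],
);

sub assign_distribution {
    my ($module_path) = @_;
    for my $rule (@rules) {
        my ($pattern, $dist) = @$rule;
        return $dist if $module_path =~ $pattern;
    }
    return 'bioperl-core';
}

# Group module paths from the single repository by target distribution.
sub plan_split {
    my (@module_paths) = @_;
    my %plan;
    push @{ $plan{ assign_distribution($_) } }, $_ for @module_paths;
    return \%plan;
}

my $plan = plan_split(qw(
    Bio/Seq.pm Bio/SeqIO.pm Bio/SearchIO.pm Bio/SearchIO/blast.pm
    Bio/Tools/Run/StandAloneBlast.pm
));
for my $dist (sort keys %$plan) {
    print "$dist: @{ $plan->{$dist} }\n";
}
```

A dist action could then package each group separately while the
repository itself stays in one piece.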


If there isn't sufficient interest to make this happen, I don't see that 
as a terrible thing. There are benefits to keeping Bioperl monolithic, 
and some of the problems (eg. lack of updates) can be solved without 
changing its nature.


From cjfields at uiuc.edu  Wed Jul  4 10:53:45 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 4 Jul 2007 09:53:45 -0500
Subject: [Bioperl-l] Splitting Bioperl
In-Reply-To: <468B6FBF.1070708@sendu.me.uk>
References: <468B6FBF.1070708@sendu.me.uk>
Message-ID: <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>


On Jul 4, 2007, at 5:00 AM, Sendu Bala wrote:

> To summarise some previous threads:
> http://thread.gmane.org/gmane.comp.lang.perl.bio.general/15315
> http://thread.gmane.org/gmane.comp.lang.perl.bio.general/15338/focus=15409
>
> # Bioperl is currently one monolithic distribution of ~900 modules
> # There is some desire to split it up into smaller functional groups
> # There are some problems with that proposal
> # An extreme variant of that proposal is to make the groups individual
> modules
>
>
> Following this discussion:
> http://www.nntp.perl.org/group/perl.modules/2007/07/msg55160.html
> (especially Adam Kennedy's postings of 4/07, soon to appear in that
> archive), the extreme variant doesn't seem like a good idea.

brian d foy made some sound arguments against it as well.

> I'm now suggesting that Steve's original split idea, as
> modified/expanded by Adam's driver and other ideas, is the best  
> choice.
> The problems I previously identified can be solved in the same way  
> they
> were solved in my extreme variant: the splits are done by Build.PL
> automation working on a single repository/code-base, not by splitting
> things up at the repository level.
>
> As I see it, the way forward now is for someone interested enough to
> decide on the specifics of how things will be split and offer them  
> up to
> the group for discussion. I don't mean vague possibilities of what  
> might
> work as a split, but rather some real thought should go into it to  
> make
> sure the split makes sense and will actually work in practice.

We've already identified a few (SearchIO, Tools, GBrowse-related, etc).
...
> If there isn't sufficient interest to make this happen, I don't see  
> that
> as a terrible thing. There are benefits to keeping Bioperl monolithic,
> and some of the problems (eg. lack of updates) can be solved without
> changing its nature.

If so, proposals that solve this problem need to be made as well.

If we stay monolithic, then here's mine: we start having fixed,  
regularly timed dev releases like Parrot, monthly or bimonthly (quite  
common on CPAN), with brief release reports on which bugs have been  
fixed, code has been added, so on.  Not every bug has to be fixed per  
dev release; if that were true there would never be releases for some  
of the XML parser packages.  No RCs for dev releases (it's a dev  
release!).  These would be 1.x.y.  We can then, every once in a  
while, have a bug-squashing session, hackathon, etc, and have regular  
non-dev release (1.x) that all core devs accept and that passes a  
particular milestone.

As for the advantage of a split approach, as mentioned previously it  
is to focus modules/tests/scripts into groups with related  
functions.  Even just splitting off ones with external reqs (XML  
parsers, GD, etc) into an 'aux' release would be an advantage, as it  
doesn't confront a new user with the burden of installing a large  
list of dependencies, some of which may be complicated for a perl  
newbie to either install from scratch (DBD::mysql, GD) or to get the  
latest bug-fixed prereq release for their OS (the recent debacle with
XML::SAX::Expat comes to mind; the fixed release wasn't immediately
available for win32 as a PPM).

I'm fairly open to any approach as long as it's reasonably thought
out, though I am admittedly a bit biased towards the split approach.
I do think some change is in order; I worry about there ever being a  
1.6 release at this point.

chris


From davila at ioc.fiocruz.br  Wed Jul  4 13:11:20 2007
From: davila at ioc.fiocruz.br (Alberto Davila)
Date: Wed, 04 Jul 2007 14:11:20 -0300
Subject: [Bioperl-l] ESTs in EST format
Message-ID: <468BD4B8.5050105@ioc.fiocruz.br>

Dear All,

I am trying to get all ESTs from a given species (eg: Trypanosoma 
brucei) from Genbank in EST format (eg: 
http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucest&id=10280980)... 
while using Entrez I can "display" individual EST entries in EST format, 
this "EST format" is not an option in the main "display" menu for batch 
download ...

I don't see the EST format listed
(http://www.bioperl.org/wiki/Sequence_formats) among the ones that SeqIO
deals with, so I wonder whether there is another BioPerl module to do this?
Any tips would be greatly appreciated ;-)

Kindest regards, Alberto


From jason at bioperl.org  Wed Jul  4 13:52:59 2007
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 4 Jul 2007 10:52:59 -0700
Subject: [Bioperl-l] ESTs in EST format
In-Reply-To: <468BD4B8.5050105@ioc.fiocruz.br>
References: <468BD4B8.5050105@ioc.fiocruz.br>
Message-ID: <D0D013CC-1D28-46D6-A94F-EA53C7EC5219@bioperl.org>

Currently we don't support this format. As far as I know it isn't a
published standard, nor is it a format in which NCBI distributes this
data as flat files (i.e. genbank dumps).

Is there any reason why you can't get what you need from the GenBank  
format?
http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucest&qty=1&c_start=1&list_uids=10280980&uids=&dopt=gb

-jason
On Jul 4, 2007, at 10:11 AM, Alberto Davila wrote:

> Dear All,
>
> I am trying to get all ESTs from a given species (eg: Trypanosoma
> brucei) from Genbank in EST format (eg:
> http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucest&id=10280980)...
> while using Entrez I can "display" individual EST entries in EST
> format, this "EST format" is not an option in the main "display" menu
> for batch download ...
>
> I don't see the EST format listed
> (http://www.bioperl.org/wiki/Sequence_formats) among the ones that
> SeqIO deals with, so I wonder whether there is another BioPerl module
> to do this? Any tips would be greatly appreciated ;-)
>
> Kindest regards, Alberto

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From dmessina at wustl.edu  Wed Jul  4 14:37:22 2007
From: dmessina at wustl.edu (David Messina)
Date: Wed, 4 Jul 2007 13:37:22 -0500
Subject: [Bioperl-l] Splitting Bioperl
In-Reply-To: <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>
References: <468B6FBF.1070708@sendu.me.uk>
	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>
Message-ID: <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>


On Jul 4, 2007, at 9:53 AM, Chris Fields wrote:

>  we start having fixed,
> regularly timed dev releases like Parrot, monthly or bimonthly (quite
> common on CPAN), with brief release reports on which bugs have been
> fixed, code has been added, so on.  Not every bug has to be fixed per
> dev release; if that were true there would never be releases for some
> of the XML parser packages.  No RCs for dev releases (it's a dev
> release!).  These would be 1.x.y.  We can then, every once in a
> while, have a bug-squashing session, hackathon, etc, and have regular
> non-dev release (1.x) that all core devs accept and that passes a
> particular milestone.


Regardless of whether we split or don't, I think these ideas of  
adding a little more structure to BioPerl's development cycles --  
especially having bug-squashing and hacking sessions, where we all  
band together and commit some time to cranking through a bunch of to- 
dos -- would be beneficial, particularly as a means to keeping a  
certain basal level of momentum in BioPerl.

Dave


From jason at bioperl.org  Wed Jul  4 15:45:29 2007
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 4 Jul 2007 12:45:29 -0700
Subject: [Bioperl-l] Splitting Bioperl
In-Reply-To: <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>
References: <468B6FBF.1070708@sendu.me.uk>
	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>
	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>
Message-ID: <B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>

I definitely agree - we can live up to the unstable "living on the  
edge" nature of dev releases a bit more perhaps?


On Jul 4, 2007, at 11:37 AM, David Messina wrote:

>
> On Jul 4, 2007, at 9:53 AM, Chris Fields wrote:
>
>>  we start having fixed,
>> regularly timed dev releases like Parrot, monthly or bimonthly (quite
>> common on CPAN), with brief release reports on which bugs have been
>> fixed, code has been added, so on.  Not every bug has to be fixed per
>> dev release; if that were true there would never be releases for some
>> of the XML parser packages.  No RCs for dev releases (it's a dev
>> release!).  These would be 1.x.y.  We can then, every once in a
>> while, have a bug-squashing session, hackathon, etc, and have regular
>> non-dev release (1.x) that all core devs accept and that passes a
>> particular milestone.
>
>
> Regardless of whether we split or don't, I think these ideas of
> adding a little more structure to BioPerl's development cycles --
> especially having bug-squashing and hacking sessions, where we all
> band together and commit some time to cranking through a bunch of to-
> dos -- would be beneficial, particularly as a means to keeping a
> certain basal level of momentum in BioPerl.
>
> Dave
>

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From cjfields at uiuc.edu  Wed Jul  4 16:54:14 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 4 Jul 2007 15:54:14 -0500
Subject: [Bioperl-l] Splitting Bioperl
In-Reply-To: <B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>
References: <468B6FBF.1070708@sendu.me.uk>
	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>
	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>
	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>
Message-ID: <F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>

I think what's partially responsible for slowing down releases is the  
expectation that each dev release is supposed to have all bugs fixed,  
work for every OS, etc.  In other words, act like a stable release.

A developer release by nature is living on the edge, so why not have  
regular dev releases?  We keep telling users to update to using  
bioperl-live whenever something breaks, anyway.  We could decide to  
split stuff off along the way into more 'stable' sections if there  
were more demand for it, and have the more API-volatile code  
(DB::SeqFeature, EUtilities, GFF3, Chado-related, etc.) retain the  
'dev' tag until we feel it's ready for prime time.

chris

On Jul 4, 2007, at 2:45 PM, Jason Stajich wrote:

> I definitely agree - we can live up to the unstable "living on the
> edge" nature of dev releases a bit more perhaps?
>
>
> On Jul 4, 2007, at 11:37 AM, David Messina wrote:
>
>>
>> On Jul 4, 2007, at 9:53 AM, Chris Fields wrote:
>>
>>>  we start having fixed,
>>> regularly timed dev releases like Parrot, monthly or bimonthly  
>>> (quite
>>> common on CPAN), with brief release reports on which bugs have been
>>> fixed, code has been added, so on.  Not every bug has to be fixed  
>>> per
>>> dev release; if that were true there would never be releases for  
>>> some
>>> of the XML parser packages.  No RCs for dev releases (it's a dev
>>> release!).  These would be 1.x.y.  We can then, every once in a
>>> while, have a bug-squashing session, hackathon, etc, and have  
>>> regular
>>> non-dev release (1.x) that all core devs accept and that passes a
>>> particular milestone.
>>
>>
>> Regardless of whether we split or don't, I think these ideas of
>> adding a little more structure to BioPerl's development cycles --
>> especially having bug-squashing and hacking sessions, where we all
>> band together and commit some time to cranking through a bunch of to-
>> dos -- would be beneficial, particularly as a means to keeping a
>> certain basal level of momentum in BioPerl.
>>
>> Dave
>>
>
> --
> Jason Stajich
> jason at bioperl.org
> http://jason.open-bio.org/
>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From n.haigh at sheffield.ac.uk  Thu Jul  5 04:09:05 2007
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Thu, 05 Jul 2007 09:09:05 +0100
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To: <F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>
References: <468B6FBF.1070708@sendu.me.uk>	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>
	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>
Message-ID: <468CA721.4020804@sheffield.ac.uk>

Chris Fields wrote:
> I think what's partially responsible for slowing down releases is the  
> expectation that each dev release is supposed to have all bugs fixed,  
> work for every OS, etc.  In other words, act like a stable release.
>
> A developer release by nature is living on the edge, so why not have  
> regular dev releases?  We keep telling users to update to using  
> bioperl-live whenever something breaks, anyway.  We could decide to  
> split stuff off along the way into more 'stable' sections if there  
> were more demand for it, and have the more API-volatile code  
> (DB::SeqFeature, EUtilities, GFF3, Chado-related, etc.) retain the  
> 'dev' tag until we feel it's ready for prime time.
>
> chris
>
> On Jul 4, 2007, at 2:45 PM, Jason Stajich wrote:
>
>   
-- snip --

I agree, although would the dev releases still need to pass all the 
tests? I'm thinking of people installing via CPAN.

I also agree with what was said in a previous post about bringing
bioperl-run (and some others) back into the same repository as
bioperl-core (after a successful move over to svn) and having Build.PL
deal with creating the packages etc for CPAN. This would hopefully help
keep the run package (and others) up to speed with the core package.

I also agree with previous posts about organising and/or having some
naming convention for test data files. I think an approach whereby data
files are organised into directory trees (1 - 3 deep), with names that
allude to the type of data in that subtree/file rather than to the tests
that use it, would work well. For example:

t/data
    |__ formats
    |           |__ seq
    |           |        |__ legal_fasta
    |           |        |              |__ extension.fas
    |           |        |              |__ extension.fasta
    |           |        |              |__ extension.foo
    |           |        |              |__ extension.bar
    |           |        |              |__ no_extension
    |           |        |              |__ interleaved.fas
    |           |        |              |__ non_interleaved.fas
    |           |        |              |__ single_seq.fas
    |           |        |              |__ multiple_seq.fas
    |           |        |              |__ desc_line1.fas
    |           |        |              |__ desc_line2.fas
    |           |        |
    |           |        |__ illegal_fasta
    |           |        |              |__ illegal_chars.fas
    |           |        |              |__ some_other_illegal_alternative.fas
    |           |        |
    |           |        |__ legal_genbank
    |           |        |              |__ etc etc
    |           |        |
    |           |        |__ illegal_genbank
    |           |                      |__ etc etc
    |           |
    |           |__ aln
    |           |__ blast
    |           |        |__ legal_blastx
    |           |        |
    |           |        |__ legal_blastp
    |           |        |
    |           |        |__ legal_tblastx
    |           |        |
    |           |        |__ legal_blastpsi
    |           |        |
    |           |        |__ legal_wublast
    |           |__ foo
    |           |__ bar
    |           |__ misc
    |
    |__ etc

This type of setup might lend itself to having a test script simply try
to parse all the files in a directory, to ensure that parsing succeeds
for legal file formats and fails for illegal ones. Naming of the file paths
would help test authors to identify a suitable data file for their own
tests before adding their own to the t/data dir. It might also help to
identify areas where example test data is currently lacking.
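
A minimal sketch of such a directory-walking test follows. To keep it
self-contained, parse_fasta is a made-up stand-in for a real parser (in
BioPerl this would be a Bio::SeqIO loop), and the script builds its own
tiny throwaway tree instead of reading t/data; the legal_/illegal_
directory prefixes are the only part taken from the layout above.

```perl
use strict;
use warnings;
use File::Find;
use File::Path qw(make_path);
use File::Temp qw(tempdir);
use Test::More;

# Hypothetical stand-in for a real parser. Returns the number of
# records and dies on malformed input.
sub parse_fasta {
    my ($file) = @_;
    open my $fh, '<', $file or die "cannot open $file: $!\n";
    my $records = 0;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line eq '';
        if ($line =~ /^>/) { $records++; next }
        die "$file: sequence data before first header\n" unless $records;
        die "$file: illegal character in sequence\n" if $line =~ /[^A-Za-z*-]/;
    }
    close $fh;
    die "$file: no records found\n" unless $records;
    return $records;
}

# Files under legal_* directories must parse; files under illegal_* must not.
sub check_data_tree {
    my ($root) = @_;
    my @files;
    find({ no_chdir => 1, wanted => sub { push @files, $_ if -f $_ } }, $root);
    for my $file (sort @files) {
        my $parsed = eval { parse_fasta($file); 1 };
        if    ($file =~ m{/legal_[^/]+/})   { ok( $parsed, "parses: $file") }
        elsif ($file =~ m{/illegal_[^/]+/}) { ok(!$parsed, "rejected: $file") }
    }
    return scalar @files;
}

# Build a tiny throwaway tree so the sketch runs on its own.
my $root = tempdir(CLEANUP => 1);
my %sample = (
    'formats/seq/legal_fasta/single_seq'      => ">seq1 demo\nccagcgctc\n",
    'formats/seq/illegal_fasta/illegal_chars' => ">seq1 demo\ncca!cgctc\n",
);
for my $rel (sort keys %sample) {
    my ($dir) = "$root/$rel" =~ m{^(.*)/};
    make_path($dir);
    open my $out, '>', "$root/$rel" or die "cannot write $rel: $!\n";
    print {$out} $sample{$rel};
    close $out;
}

check_data_tree($root);
done_testing();
```

Because the expectation comes from the directory name, adding a new data
file would need no change to the test script itself.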

Thinking about this a little more, I think it would be a good idea to 
include Test::Exception in t/lib. We should also be testing that 
warnings and exceptions are generated when expected - e.g. illegal 
characters in seq files etc etc. Without these sorts of tests we are 
only getting half the story. This testing might account for a large 
chunk of the poor test coverage, particularly when it comes to branches 
in the code.

Anyway, this type of reorganisation couldn't take place until the svn 
repo is up and working.

I'd appreciate any comments on the above!
Nath


From bix at sendu.me.uk  Thu Jul  5 04:55:25 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 05 Jul 2007 09:55:25 +0100
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To: <468CA721.4020804@sheffield.ac.uk>
References: <468B6FBF.1070708@sendu.me.uk>	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>
	<468CA721.4020804@sheffield.ac.uk>
Message-ID: <468CB1FD.7060301@sendu.me.uk>

Nathan S. Haigh wrote:
> I agree, although would the dev releases still need to pass all the 
> tests? I'm thinking of people installing via CPAN.

Yes, they'd all have to pass. 'Developer release' should never have the 
connotation of 'broken release'. However, getting all tests to pass is a 
lot easier than fixing all bugs in bugzilla.

(... which actually goes to show how poor our tests are)

Worst case, if we were forced to stick to a schedule but couldn't fix a 
failing test, we could always make it a 'todo' test.


> I also agree with what was said in a previous post about bringing back 
> bioperl-run (and some others) back into the same repository as 
> bioperl-core (after a successful move over to svn)

Agree (with myself essentially).


> I also agree with previous posts about organising and/or having some
> naming convention for test data files. I think an approach whereby data
> files are organised into directory trees (1 - 3 deep), with names that
> allude to the type of data in that subtree/file rather than to the tests
> that use it, would work well. For example:
> 
> t/data
>     |__ formats
>     |           |__ seq
>     |           |        |__ legal_fasta
>     |           |        |              |__ extension.fas
[snip]

At that level, files don't need extensions and can have fully 
informative names that explain what's interesting or special about them.


> This type of setup, might lend itself to having a test script simply try 
> to parse all the files in a directory to ensure nothing fails (for legal 
> file formats) and fails for illegal formats.

Great idea.
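
A rough sketch of such a directory-driven test, under stated assumptions: parse_fasta_file() is a hypothetical stand-in for a real parser, and the legal_fasta/illegal_fasta tree is built in a temporary directory rather than taken from t/data.

```perl
use strict;
use warnings;
use Test::More;
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical stand-in for a real parser: dies on anything that is not
# minimal FASTA (a '>' header line followed by sequence lines).
sub parse_fasta_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my $seen_header = 0;
    while ( my $line = <$fh> ) {
        chomp $line;
        next if $line eq '';
        if ( $line =~ /^>/ ) { $seen_header = 1; next }
        die "sequence before header in $path\n" unless $seen_header;
        die "illegal character in $path\n" if $line =~ /[^A-Za-z*-]/;
    }
    close $fh;
    die "no records in $path\n" unless $seen_header;
    return 1;
}

# Build a tiny example tree (file names are illustrative only).
my $root  = tempdir( CLEANUP => 1 );
my %files = (
    'legal_fasta/single_seq'      => ">s1\nACGT\n",
    'legal_fasta/multiple_seq'    => ">s1\nACGT\n>s2\nTTGA\n",
    'illegal_fasta/illegal_chars' => ">s1\nAC!T\n",
);
for my $rel ( sort keys %files ) {
    my ( $dir, $name ) = split m{/}, $rel;
    my $dir_path = File::Spec->catdir( $root, $dir );
    mkdir $dir_path unless -d $dir_path;
    open my $fh, '>', File::Spec->catfile( $dir_path, $name ) or die $!;
    print {$fh} $files{$rel};
    close $fh;
}

# Every file under legal_* must parse; every file under illegal_* must die.
for my $dir (qw(legal_fasta illegal_fasta)) {
    my $dir_path = File::Spec->catdir( $root, $dir );
    opendir my $dh, $dir_path or die "Cannot read $dir_path: $!";
    my @names = sort grep { !/^\./ } readdir $dh;
    closedir $dh;
    for my $name (@names) {
        my $ok = eval { parse_fasta_file( File::Spec->catfile( $dir_path, $name ) ) };
        if ( $dir =~ /^legal_/ ) { ok( $ok,  "$dir/$name parses" ) }
        else                     { ok( !$ok, "$dir/$name is rejected" ) }
    }
}
done_testing;
```

With this layout, adding a new example file to either directory adds a test without touching the script.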


> Thinking about this a little more, I think it would be a good idea to 
> include Test::Exception in t/lib.

Agree. I'll see if I can have it auto-loaded by BioperlTest.pm.


> Anyway, this type of reorganisation couldn't take place until the svn 
> repo is up and working.

Agree.


From bix at sendu.me.uk  Thu Jul  5 05:39:10 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 05 Jul 2007 10:39:10 +0100
Subject: [Bioperl-l] Test related Suggestions
In-Reply-To: <468CB1FD.7060301@sendu.me.uk>
References: <468B6FBF.1070708@sendu.me.uk>	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>	<468CA721.4020804@sheffield.ac.uk>
	<468CB1FD.7060301@sendu.me.uk>
Message-ID: <468CBC3E.1020408@sendu.me.uk>

Sendu Bala wrote:
> Nathan S. Haigh wrote:
>> Thinking about this a little more, I think it would be a good idea to 
>> include Test::Exception in t/lib.
> 
> Agree. I'll see if I can have it auto-loaded by BioperlTest.pm.

I've now done that: BioperlTest loads Test::Exception, from the copy in 
t/lib if necessary.

So, in BioperlTest-using scripts you now have access to the methods 
dies_ok, lives_ok, throws_ok and lives_and.
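
For illustration, a small sketch of the four Test::Exception functions. It loads Test::Exception directly rather than through BioperlTest, and parse_dna() is a hypothetical function invented for the example:

```perl
use strict;
use warnings;
use Test::More;

BEGIN {
    # Test::Exception is not a core module; skip everything if it is absent.
    eval { require Test::Exception; Test::Exception->import; 1 }
        or plan( skip_all => 'Test::Exception not installed' );
}

# Hypothetical parser that rejects non-nucleotide characters.
sub parse_dna {
    my ($seq) = @_;
    die "illegal character in sequence\n" if $seq =~ /[^ACGTNacgtn]/;
    return uc $seq;
}

lives_ok  { parse_dna('acgt') } 'legal sequence parses';
dies_ok   { parse_dna('ac!t') } 'illegal sequence dies';
throws_ok { parse_dna('ac!t') } qr/illegal character/, 'with the expected message';
lives_and { is parse_dna('acgt'), 'ACGT' } 'returns upper-cased sequence';

done_testing;
```

The import happens in a BEGIN block so that the block-style syntax (dies_ok { ... }) is known to Perl at compile time.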


From N.Haigh at sheffield.ac.uk  Thu Jul  5 06:01:04 2007
From: N.Haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Thu,  5 Jul 2007 11:01:04 +0100
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To: <468CB1FD.7060301@sendu.me.uk>
References: <468B6FBF.1070708@sendu.me.uk>
	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>
	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>
	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>
	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>
	<468CA721.4020804@sheffield.ac.uk> <468CB1FD.7060301@sendu.me.uk>
Message-ID: <1183629664.468cc1609891a@webmail.shef.ac.uk>

Quoting Sendu Bala <bix at sendu.me.uk>:

-- snip --
> 
> 
> > I also agree with previous posts about organising and/or having some 
> > naming convention for test data files. I think an approach whereby data 
> > files were organised into directory trees (1 - 3 deep) with names that 
> > allude to the type of data in that subtree/file rather than the tests 
> > that use it etc. For example:
> > 
> > t/data
> >     |__ formats
> >     |           |__ seq
> >     |           |        |__ legal_fasta
> >     |           |        |              |__ extension.fas
> [snip]
> 
> At that level, files don't need extensions and can have fully 
> informative names that explain what's interesting or special about them.
> 

You may be correct in most cases. However, isn't there a method that
detects the file format from the file extension and, failing that, peeks
inside the file? To get good code coverage there should therefore be a
file with an extension for each format, as well as a file without an
extension for each format, to check that the peek inside the file
correctly determines the format.

-- snip --


From bix at sendu.me.uk  Thu Jul  5 06:04:16 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 05 Jul 2007 11:04:16 +0100
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To: <1183629664.468cc1609891a@webmail.shef.ac.uk>
References: <468B6FBF.1070708@sendu.me.uk>
	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>
	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>
	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>
	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>
	<468CA721.4020804@sheffield.ac.uk> <468CB1FD.7060301@sendu.me.uk>
	<1183629664.468cc1609891a@webmail.shef.ac.uk>
Message-ID: <468CC220.804@sendu.me.uk>

Nathan S. Haigh wrote:
> Quoting Sendu Bala <bix at sendu.me.uk>:
> 
> -- snip --
>> 
>>> I also agree with previous posts about organising and/or having
>>> some naming convention for test data files. I think an approach
>>> whereby data files were organised into directory trees (1 - 3
>>> deep) with names that allude to the type of data in that
>>> subtree/file rather than the tests that use it etc. For example:
>>> 
>>> t/data
>>>     |__ formats
>>>     |           |__ seq
>>>     |           |        |__ legal_fasta
>>>     |           |        |              |__ extension.fas
>>> 
>> [snip]
>> 
>> At that level, files don't need extensions and can have fully 
>> informative names that explain what's interesting or special about
>> them.
>> 
> 
> You may be correct in most cases, however, isn't there a method for
> detecting the file format from the file extension and failing that it
> peeks inside the file? Therefore there should be a file extension for
> each of these to get good code coverage as well as each format not
> having an extension to check that the peek inside the file correctly
> determines the format.

Yes, you're quite correct.
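
The two code paths being discussed (extension first, then a peek at the content) can be sketched as follows. This is not Bioperl's actual guessing code: guess_format() and the extension table are hypothetical, and only two formats are recognised.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical extension table; not Bioperl's real list.
my %FORMAT_FOR_EXT = (
    fas => 'fasta',   fasta => 'fasta', fa => 'fasta',
    gb  => 'genbank', gbk   => 'genbank',
);

# Guess a format from the extension first, then by peeking at the first line.
sub guess_format {
    my ($path) = @_;
    if ( $path =~ /\.([^.\/\\]+)$/ && exists $FORMAT_FOR_EXT{ lc $1 } ) {
        return $FORMAT_FOR_EXT{ lc $1 };
    }
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my $first = <$fh>;
    close $fh;
    return unless defined $first;
    return 'fasta'   if $first =~ /^>/;
    return 'genbank' if $first =~ /^LOCUS\s/;
    return;
}

# One file with an extension, one without, both holding the same FASTA data.
my ( $fh1, $with_ext ) = tempfile( SUFFIX => '.fas', UNLINK => 1 );
print {$fh1} ">seq1\nACGT\n";
close $fh1;

my ( $fh2, $no_ext ) = tempfile( 'legalfastaXXXXXX', TMPDIR => 1, UNLINK => 1 );
print {$fh2} ">seq1\nACGT\n";
close $fh2;

print guess_format($with_ext), "\n";    # exercises the extension branch
print guess_format($no_ext),   "\n";    # exercises the peek branch
```

Both branches only get covered if the test data contains files of each kind, which is the point Nath makes above.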


From bix at sendu.me.uk  Thu Jul  5 06:47:12 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 05 Jul 2007 11:47:12 +0100
Subject: [Bioperl-l] Warnings
Message-ID: <468CCC30.90406@sendu.me.uk>

I'm trying to get Test::Warn to work with Bioperl warnings as produced 
by Bio::Root::RootI::warn(). However, afaict the warnings must be 
generated with CORE::warn(), not print STDERR.

Is there any particular reason RootI::warn is done with print and not 
CORE::warn ? Can I change it to a warn?


From bix at sendu.me.uk  Thu Jul  5 09:04:50 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 05 Jul 2007 14:04:50 +0100
Subject: [Bioperl-l] Warnings
In-Reply-To: <200707051458.59921.heikki@sanbi.ac.za>
References: <468CCC30.90406@sendu.me.uk>
	<200707051458.59921.heikki@sanbi.ac.za>
Message-ID: <468CEC72.4090909@sendu.me.uk>

Heikki Lehvaslaiho wrote:
> My guess is that using 'print STDERR' avoids showing sometimes annoying 
>    errordescription  at programname line  NN
> syntax being used.

Afaik,

CORE::warn "anything\n";

never includes the line number: a message that ends with a newline always 
disables that feature. Bio::Root::RootI::warn /always/ puts newlines into 
the message, so its warnings /never/ have the line number.


> On the other hand, the main reason we need to set verbosity to 1 in BioPerl 
> objects is to find where warnings are coming from. Maybe extra text in 
> warnings leads to easier debugging.
> 
> I favour changing it.

So it's my understanding there will be absolutely no difference in 
behaviour following this change (except that warnings can be caught by 
Test::Warn). I just wanted to confirm my understanding.
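
A short core-Perl sketch of the behaviour under discussion. It assumes, as far as I know correctly, that warning-catching test modules hook into $SIG{__WARN__}; the handler below is only a stand-in for that mechanism.

```perl
use strict;
use warnings;

my @caught;
local $SIG{__WARN__} = sub { push @caught, $_[0] };

warn "with newline\n";                 # message passed through unchanged
warn "without newline";                # Perl appends " at FILE line N.\n"
print STDERR "printed directly\n";     # bypasses $SIG{__WARN__} entirely

printf "caught %d warnings\n", scalar @caught;
print "first has location: ",
    ( $caught[0] =~ / at .+ line \d+/ ? 'yes' : 'no' ), "\n";
print "second has location: ",
    ( $caught[1] =~ / at .+ line \d+/ ? 'yes' : 'no' ), "\n";
```

Only the two warn calls reach the handler, and only the one without a trailing newline gets the file and line number appended.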


From hlapp at gmx.net  Thu Jul  5 09:07:27 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 5 Jul 2007 09:07:27 -0400
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To: <468CA721.4020804@sheffield.ac.uk>
References: <468B6FBF.1070708@sendu.me.uk>	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>
	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>
	<468CA721.4020804@sheffield.ac.uk>
Message-ID: <21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net>


On Jul 5, 2007, at 4:09 AM, Nathan S. Haigh wrote:

> Chris Fields wrote:
>> I think what's partially responsible for slowing down releases is the
>> expectation that each dev release is supposed to have all bugs fixed,
>> work for every OS, etc.  In other words, act like a stable release.
>>

It doesn't. A stable release has a stable API that will be supported  
until the next stable release through point releases.

>> A developer release by nature is living on the edge, so why not have
>> regular dev releases?

There's no problem with regular dev releases, but tests will need to  
pass. There was never a stipulation that all bugs need to have been  
fixed. But all tests need to pass, so in an ideal world (in which  
everything is being tested) all tests passing would imply all (known)  
bugs fixed. Obviously, we don't live in an ideal world ...

If not everything passes then what is the big difference to a code  
snapshot? If using cvs (or svn) is too difficult for most people, we  
can consider creating a mechanism that puts up nightly snapshots for  
download.

> -- snip --
>
> I agree, although would the dev releases still need to pass all the
> tests? I'm thinking of people installing via CPAN.

For example, that's another point.

	-hilmar

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From heikki at sanbi.ac.za  Thu Jul  5 09:12:37 2007
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Thu, 5 Jul 2007 15:12:37 +0200
Subject: [Bioperl-l] Test related Suggestions
In-Reply-To: <468CBC3E.1020408@sendu.me.uk>
References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk>
	<468CBC3E.1020408@sendu.me.uk>
Message-ID: <200707051512.38185.heikki@sanbi.ac.za>


One more suggestion:

It would be extremely useful if we had a standard way of testing that when a 
file is read into a bioperl object and then written out again in the same 
format, the input and output files are identical. If not, the test should 
show where the differences start (showing all the differences would just 
clutter the screen).

This standard method/subroutine should be used to test all sequence and other 
text file IO.

Any takers? 

	-Heikki

On Thursday 05 July 2007 11:39:10 Sendu Bala wrote:
> Sendu Bala wrote:
> > Nathan S. Haigh wrote:
> >> Thinking about this a little more, I think it would be a good idea to
> >> include Test::Exception in t/lib.
> >
> > Agree. I'll see if I can have it auto-loaded by BioperlTest.pm.
>
> I've now done that: BioperlTest loads Test::Exception, from the copy in
> t/lib if necessary.
>
> So, in BioperlTest-using scripts you now have access to the methods
> dies_ok, lives_ok, throws_ok and lives_and.


-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From heikki at sanbi.ac.za  Thu Jul  5 08:58:59 2007
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Thu, 5 Jul 2007 14:58:59 +0200
Subject: [Bioperl-l] Warnings
In-Reply-To: <468CCC30.90406@sendu.me.uk>
References: <468CCC30.90406@sendu.me.uk>
Message-ID: <200707051458.59921.heikki@sanbi.ac.za>

My guess is that using 'print STDERR' avoids showing sometimes annoying 
   errordescription  at programname line  NN
syntax being used.

On the other hand, the main reason we need to set verbosity to 1 in BioPerl 
objects is to find where warnings are coming from. Maybe extra text in 
warnings leads to easier debugging.

I favour changing it.

	-Heikki


On Thursday 05 July 2007 12:47:12 Sendu Bala wrote:
> I'm trying to get Test::Warn to work with Bioperl warnings as produced
> by Bio::Root::RootI::warn(). However, afaict the warnings must be
> generated with CORE::warn(), not print STDERR.
>
> Is there any particular reason RootI::warn is done with print and not
> CORE::warn ? Can I change it to a warn?




From bix at sendu.me.uk  Thu Jul  5 09:44:08 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 05 Jul 2007 14:44:08 +0100
Subject: [Bioperl-l] Test related Suggestions
In-Reply-To: <200707051512.38185.heikki@sanbi.ac.za>
References: <468B6FBF.1070708@sendu.me.uk>
	<468CB1FD.7060301@sendu.me.uk>	<468CBC3E.1020408@sendu.me.uk>
	<200707051512.38185.heikki@sanbi.ac.za>
Message-ID: <468CF5A8.7040402@sendu.me.uk>

Heikki Lehvaslaiho wrote:
> One more suggestion:
> 
> It would be extremely useful if we had a standard way of testing that
> when a file is read into a bioperl object and then written out
> again in the same format, the input and output files are identical.

As Hilmar has pointed out in the past, Bioperl doesn't aim for the files 
to be identical, only for none of the information to be lost and for it 
to be output in the correct format.

So a round-trip test should read in the original, store all the parsed 
data, write it out, then read in the written version and see if the new 
parsed data matches the original.
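
A sketch of that round-trip idea, using a toy FASTA reader and writer working on strings. read_fasta() and write_fasta() are hypothetical stand-ins for the real IO classes; the writer deliberately lays the sequence out differently from the input.

```perl
use strict;
use warnings;
use Test::More tests => 2;

# Toy FASTA reader: returns a list of { id, desc, seq } hashes.
sub read_fasta {
    my ($text) = @_;
    my @records;
    for my $line ( split /\n/, $text ) {
        if ( $line =~ /^>(\S+)\s*(.*)$/ ) {
            push @records, { id => $1, desc => $2, seq => '' };
        }
        elsif (@records) {
            $line =~ s/\s+//g;
            $records[-1]{seq} .= $line;
        }
    }
    return \@records;
}

# Toy FASTA writer: always wraps sequence at 60 columns.
sub write_fasta {
    my ($records) = @_;
    my $out = '';
    for my $r (@$records) {
        $out .= ">$r->{id}" . ( length $r->{desc} ? " $r->{desc}" : '' ) . "\n";
        $out .= "$1\n" while $r->{seq} =~ /(.{1,60})/g;
    }
    return $out;
}

my $original = ">seq1 test sequence\nACGT\nACGT\n>seq2\nTTTT\n";

my $first  = read_fasta($original);                 # parse the original
my $second = read_fasta( write_fasta($first) );     # write it, parse again

is_deeply( $second, $first, 'parsed data survives a write/read round trip' );
isnt( write_fasta($first), $original, 'but the text itself is laid out differently' );
```

The comparison is made on the parsed data, so a change in line wrapping does not count as a failure.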


For simpler or ultra-strict file formats, though...

> If not, the test should show where the differences start (showing
> all the differences would just clutter the screen).
> 
> This standard method/subroutine should be used to test all sequence
> and other text file IO.
> 
> Any takers?

There's already something along these lines in t/SeqIO.t (the section
that uses Algorithm::Diff).

I copied that over from the old testformats.pl script but haven't really
taken the time to see if it's a good way of doing the test.

Is it? Can someone come up with something better? Can someone generalise
it if necessary?

I imagine you could just read the files into arrays and use 
Test::More::is_deeply(). If that would be satisfactory I could easily 
add a little method to BioperlTest that did that.
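
Such a helper might look roughly like this. The name files_identical_ok() is made up for the sketch and is not an existing BioperlTest method.

```perl
use strict;
use warnings;
use Test::More tests => 1;
use File::Temp qw(tempfile);

# Read a file into an array of lines.
sub slurp_lines {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my @lines = <$fh>;
    close $fh;
    return \@lines;
}

# Hypothetical helper: passes if both files contain identical lines.
sub files_identical_ok {
    my ( $got_path, $expected_path, $name ) = @_;
    # Report failures at the caller's line, not inside this helper.
    local $Test::Builder::Level = $Test::Builder::Level + 1;
    return is_deeply( slurp_lines($got_path), slurp_lines($expected_path), $name );
}

# Write some content to a temporary file and return its path.
sub temp_with {
    my ($content) = @_;
    my ( $fh, $path ) = tempfile( UNLINK => 1 );
    print {$fh} $content;
    close $fh;
    return $path;
}

my $a_path = temp_with(">seq1\nACGT\n");
my $b_path = temp_with(">seq1\nACGT\n");
files_identical_ok( $a_path, $b_path, 'identical files compare equal' );
```

On failure, is_deeply() reports the place where the two structures start to differ rather than every difference, which is close to what Heikki asked for.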


From n.haigh at sheffield.ac.uk  Thu Jul  5 09:47:24 2007
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Thu, 05 Jul 2007 14:47:24 +0100
Subject: [Bioperl-l] Test related Suggestions
In-Reply-To: <200707051512.38185.heikki@sanbi.ac.za>
References: <468B6FBF.1070708@sendu.me.uk>
	<468CB1FD.7060301@sendu.me.uk>	<468CBC3E.1020408@sendu.me.uk>
	<200707051512.38185.heikki@sanbi.ac.za>
Message-ID: <468CF66C.2070907@sheffield.ac.uk>

Heikki Lehvaslaiho wrote:
> One more suggestion:
> 
> It would be extremely useful if we had a standard way of testing that when a 
> file is read into a bioperl object and then written out again in the same 
> format, the input and output files are identical. If not, the test should 
> show where the differences start (showing all the differences would just 
> clutter the screen).
> 
> This standard method/subroutine should be used to test all sequence and other 
> text file IO.
> 
> Any takers? 
> 
> 	-Heikki
> 

Wouldn't this require info about the formatting of the file to be stored 
in the object as well, such that the same formatting could be used when 
writing the file?

Wouldn't a better approach be to read the contents of file1 into obj1, 
write obj1 to file2 in the same format, and then read file2 into obj2 
and compare obj1 to obj2 to ensure we have all the same data?

Nath


From cjfields at uiuc.edu  Thu Jul  5 09:52:12 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Jul 2007 08:52:12 -0500
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To: <468CA721.4020804@sheffield.ac.uk>
References: <468B6FBF.1070708@sendu.me.uk>	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>
	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>
	<468CA721.4020804@sheffield.ac.uk>
Message-ID: <BECE91CB-980B-4063-8E85-291CC85DCDC1@uiuc.edu>


On Jul 5, 2007, at 3:09 AM, Nathan S. Haigh wrote:

> ...
> I agree, although would the dev releases still need to pass all the  
> tests? I'm thinking of people installing via CPAN.

Remains to be decided.  All current tests (net and non-net) should  
pass.  Any bug fixes should try to have added tests if possible, with  
in-process stuff as TODO's.  Network tests are left up to user  
discretion, so if they fail for any particular reason there is a way  
around them.

> I also agree with what was said in a previous post about bringing  
> back bioperl-run (and some others) back into the same repository as  
> bioperl-core (after a successful move over to svn) and have  
> Build.PL deal with creating the packages etc for CPAN. This would  
> hopefully help keep the run package (and others) up to speed with  
> the core package.

It's up to how we want to have everything split.  I don't think it's  
immediately pressing (there are more important priorities, i.e.  
bugs, svn) but I would say folding everything back into live and  
'splitting' them out using an automated Build process is a viable  
option.

> I also agree with previous posts about organising and/or having  
> some naming convention for test data files. I think an approach  
> whereby data files were organised into directory trees (1 - 3 deep)  
> with names that allude to the type of data in that subtree/file  
> rather than the tests that use it etc. For example:
>
> t/data
>    |__ formats
>    |           |__ seq
>    |           |        |__ legal_fasta
>    |           |        |              |__ extension.fas
>    |           |        |              |__ extension.fasta
>    |           |        |              |__ extension.foo
>    |           |        |              |__ extension.bar
>    |           |        |              |__ no_extension
>    |           |        |              |__ interleaved.fas
>    |           |        |              |__ non_interleaved.fas
>    |           |        |              |__ single_seq.fas
>    |           |        |              |__ multiple_seq.fas
>    |           |        |              |__ desc_line1.fas
>    |           |        |              |__ desc_line2.fas
>    |           |        |
>    |           |        |__ illegal_fasta
>    |           |        |              |__ illegal_chars.fas
>    |           |        |              |__ some_other_illegal_alternative.fas
>    |           |        |
>    |           |        |__ legal_genbank
>    |           |        |              |__ etc etc
>    |           |        |
>    |           |        |__ illegal_genbank
>    |           |                      |__ etc etc
>    |           |
>    |           |__ aln
>    |           |__ blast
>    |           |        |__ legal_blastx
>    |           |        |
>    |           |        |__ legal_blastp
>    |           |        |
>    |           |        |__ legal_tblastx
>    |           |        |
>    |           |        |__ legal_plastpsi
>    |           |        |
>    |           |        |__ legal_wublast
>    |           |__ foo
>    |           |__ bar
>    |           |__ misc
>    |
>    |__ etc
>
> This type of setup, might lend itself to having a test script  
> simply try to parse all the files in a directory to ensure nothing  
> fails (for legal file formats) and fails for illegal formats.  
> Naming of the file paths would help test authors to identify a  
> suitable data file for their own tests before adding their own to  
> the t/data dir. It might also help to identify areas where example  
> test data is currently lacking.

...
This seems like more of a 'guess sequence' and format validation  
issue, something we've talked about before:

http://bugzilla.open-bio.org/show_bug.cgi?id=1508

The way I feel about it is sequence format validation and sequence  
parsing should be separate issues and therefore in separate classes  
(with parsing optionally preceded by validation), but that's  
something for another discussion.

> Thinking about this a little more, I think it would be a good idea  
> to include Test::Exception in t/lib. We should also be testing that  
> warnings and exceptions are generated when expected - e.g. illegal  
> characters in seq files etc etc. Without these sorts of tests we  
> are only getting half the story. This testing might account for a  
> large chunk of the poor test coverage, particularly when it comes  
> to branches in the code.
>
> Anyway, this type of reorganisation couldn't take place until the  
> svn repo is up and working.
>
> I'd appreciate any comments on the above!
> Nath

chris


From n.haigh at sheffield.ac.uk  Thu Jul  5 10:08:29 2007
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Thu, 05 Jul 2007 15:08:29 +0100
Subject: [Bioperl-l] Installing all modules required for testing
In-Reply-To: <468CF5A8.7040402@sendu.me.uk>
References: <468B6FBF.1070708@sendu.me.uk>	<468CB1FD.7060301@sendu.me.uk>	<468CBC3E.1020408@sendu.me.uk>	<200707051512.38185.heikki@sanbi.ac.za>
	<468CF5A8.7040402@sendu.me.uk>
Message-ID: <468CFB5D.6080406@sheffield.ac.uk>

Is there a way to install all the modules that are used in the tests? I 
mean there are cases where tests are skipped and pass if the required 
module for testing is not installed, therefore missing out a chunk of 
the tests. It would be desirable to be able to install all these modules 
in order to complete the whole test suite - any ideas if/how this can 
be done?

Cheers
Nath


From bix at sendu.me.uk  Thu Jul  5 10:15:34 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 05 Jul 2007 15:15:34 +0100
Subject: [Bioperl-l] Installing all modules required for testing
In-Reply-To: <468CFB5D.6080406@sheffield.ac.uk>
References: <468B6FBF.1070708@sendu.me.uk>	<468CB1FD.7060301@sendu.me.uk>	<468CBC3E.1020408@sendu.me.uk>	<200707051512.38185.heikki@sanbi.ac.za>
	<468CF5A8.7040402@sendu.me.uk> <468CFB5D.6080406@sheffield.ac.uk>
Message-ID: <468CFD06.3080604@sendu.me.uk>

Nathan S. Haigh wrote:
> Is there a way to install all the modules that are used in the tests? I 
> mean there are cases where tests are skipped and pass if the required 
> module for testing is not installed. Therefore, missing out a chunk of 
> the tests. It would be desirable to be able to install all these modules 
> in order to complete the whole test suite - any ideas if/how this can 
> be done?

Yes, add them as recommended (or perhaps 'build_requires') modules in 
Build.PL, then run Build.PL and install the modules when it asks you.

Everything should be in Build.PL already. If I missed something, please 
add it.
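
As a sketch of how optional test prerequisites are declared, here is a stripped-down Build.PL using plain Module::Build. The distribution name, author, and version numbers are placeholders, and this is not Bioperl's actual Build.PL.

```perl
use strict;
use warnings;
use Module::Build;

my $build = Module::Build->new(
    dist_name      => 'Example-Dist',    # hypothetical distribution
    dist_version   => '0.01',
    dist_abstract  => 'Sketch only',
    dist_author    => 'Example Author <author@example.org>',
    license        => 'perl',

    # Hard requirements: the build refuses to proceed cleanly without these.
    requires       => { 'perl' => '5.6.1' },
    build_requires => { 'Test::More' => 0 },

    # Optional modules: tests that need them are skipped when they are absent.
    recommends     => {
        'Algorithm::Diff' => 0,
        'IO::String'      => 0,
    },
);
$build->create_build_script;
```

Stock Module::Build, as far as I know, only reports unmet 'recommends' entries; the install prompt described above presumably comes from Bioperl's own build code.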


From cjfields at uiuc.edu  Thu Jul  5 10:18:08 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Jul 2007 09:18:08 -0500
Subject: [Bioperl-l] Installing all modules required for testing
In-Reply-To: <468CFB5D.6080406@sheffield.ac.uk>
References: <468B6FBF.1070708@sendu.me.uk>	<468CB1FD.7060301@sendu.me.uk>	<468CBC3E.1020408@sendu.me.uk>	<200707051512.38185.heikki@sanbi.ac.za>
	<468CF5A8.7040402@sendu.me.uk> <468CFB5D.6080406@sheffield.ac.uk>
Message-ID: <C3B6AF09-B395-4303-9B50-953C0FAAE8A7@uiuc.edu>


On Jul 5, 2007, at 9:08 AM, Nathan S. Haigh wrote:

> Is there a way to install all the modules that are used in the  
> tests? I
> mean there are cases where tests are skipped and pass if the required
> module for testing is not installed. Therefore, missing out a chunk of
> the tests. It would be desirable to be able to install all these  
> modules
> in order to complete the whole test suite - any ideas if/how this can
> be done?
>
> Cheers
> Nath

That's optionally done upon 'perl Build.PL', correct?  So if you  
choose not to install a particular prereq (i.e. XML::SAX), you  
shouldn't be forced to install it later just for tests.  Or am I  
misunderstanding you?

chris


From cjfields at uiuc.edu  Thu Jul  5 10:18:23 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Jul 2007 09:18:23 -0500
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To: <468CC220.804@sendu.me.uk>
References: <468B6FBF.1070708@sendu.me.uk>
	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>
	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>
	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>
	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>
	<468CA721.4020804@sheffield.ac.uk> <468CB1FD.7060301@sendu.me.uk>
	<1183629664.468cc1609891a@webmail.shef.ac.uk>
	<468CC220.804@sendu.me.uk>
Message-ID: <D71E1DE2-C737-41C1-90FF-573A66925790@uiuc.edu>


On Jul 5, 2007, at 5:04 AM, Sendu Bala wrote:

> Nathan S. Haigh wrote:
>> Quoting Sendu Bala <bix at sendu.me.uk>:
>>> ...<snip snips>
>>> At that level, files don't need extensions and can have fully
>>> informative names that explain what's interesting or special about
>>> them.
>>>
>>
>> You may be correct in most cases, however, isn't there a method for
>> detecting the file format from the file extension and failing that it
>> peeks inside the file? Therefore there should be a file extension for
>> each of these to get good code coverage as well as each format not
>> having an extension to check that the peek inside the file correctly
>> determines the format.
>
> Yes, you're quite correct.

I actually like Sendu's idea more, or the idea of each test suite  
having its own directory.

Tests which need to guess/validate the format are probably best left  
sequestered to a specific suite focused on format guessing/ 
validation, at least in my opinion.

chris


From n.haigh at sheffield.ac.uk  Thu Jul  5 10:22:40 2007
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Thu, 05 Jul 2007 15:22:40 +0100
Subject: [Bioperl-l] Installing all modules required for testing
In-Reply-To: <468CFD06.3080604@sendu.me.uk>
References: <468B6FBF.1070708@sendu.me.uk>	<468CB1FD.7060301@sendu.me.uk>	<468CBC3E.1020408@sendu.me.uk>	<200707051512.38185.heikki@sanbi.ac.za>
	<468CF5A8.7040402@sendu.me.uk> <468CFB5D.6080406@sheffield.ac.uk>
	<468CFD06.3080604@sendu.me.uk>
Message-ID: <468CFEB0.80201@sheffield.ac.uk>

Sendu Bala wrote:
> Nathan S. Haigh wrote:
>> Is there a way to install all the modules that are used in the tests? 
>> I mean there are cases where tests are skipped and pass if the 
>> required module for testing is not installed. Therefore, missing out a 
>> chunk of the tests. It would be desirable to be able to install all 
>> these modules in order to complete the whole test suite - any ideas 
>> if/how this can be done?
> 
> Yes, add them as recommended (or perhaps 'build_requires') modules in 
> Build.PL, then run Build.PL and install the modules when it asks you.
> 
> Everything should be in Build.PL already. If I missed something, please 
> add it.
> 

OK, to clarify using the test file Sendu mentioned in a previous post: 
t/SeqIO.t

This test skips tests if Algorithm::Diff, IO::ScalarArray or IO::String 
are not installed (the first two are not mentioned in Build.PL). 
However, if there are a lot of such skips in the whole test suite then 
there may be few systems with all these modules installed in order to 
conduct a complete test. These are the modules I'm referring to.
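
The skipping pattern in question looks roughly like this. It is a sketch, not the actual code in t/SeqIO.t: the tests inside the SKIP block are invented for illustration.

```perl
use strict;
use warnings;
use Test::More tests => 3;

ok( 1, 'a test with no optional prerequisites' );

SKIP: {
    # Algorithm::Diff is optional; skip the two dependent tests without it.
    eval { require Algorithm::Diff; 1 }
        or skip( 'Algorithm::Diff not installed', 2 );

    my @old = qw(a b c);
    my @new = qw(a x c);
    my @lcs = Algorithm::Diff::LCS( \@old, \@new );
    is( scalar @lcs, 2, 'longest common subsequence has two elements' );
    is_deeply( \@lcs, [qw(a c)], 'and they are the unchanged lines' );
}
```

Skipped tests are counted as passes by the harness, so a system missing the module reports the same overall success as one that ran all three tests.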

Nath


From n.haigh at sheffield.ac.uk  Thu Jul  5 10:30:05 2007
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Thu, 05 Jul 2007 15:30:05 +0100
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To: <D71E1DE2-C737-41C1-90FF-573A66925790@uiuc.edu>
References: <468B6FBF.1070708@sendu.me.uk>
	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>
	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>
	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>
	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>
	<468CA721.4020804@sheffield.ac.uk> <468CB1FD.7060301@sendu.me.uk>
	<1183629664.468cc1609891a@webmail.shef.ac.uk>
	<468CC220.804@sendu.me.uk>
	<D71E1DE2-C737-41C1-90FF-573A66925790@uiuc.edu>
Message-ID: <468D006D.6050806@sheffield.ac.uk>

Chris Fields wrote:
> 
> On Jul 5, 2007, at 5:04 AM, Sendu Bala wrote:
> 
>> Nathan S. Haigh wrote:
>>> Quoting Sendu Bala <bix at sendu.me.uk>:
>>>> ...<snip snips>
>>>> At that level, files don't need extensions and can have fully
>>>> informative names that explain what's interesting or special about
>>>> them.
>>>>
>>>
>>> You may be correct in most cases, however, isn't there a method for
>>> detecting the file format from the file extension and failing that it
>>> peeks inside the file? Therefore there should be a file extension for
>>> each of these to get good code coverage as well as each format not
>>> having an extension to check that the peek inside the file correctly
>>> determines the format.
>>
>> Yes, you're quite correct.
> 
> I actually like Sendu's idea more, or the idea of each test suite having 
> its own directory.
> 
> Tests which need to guess/validate the format are probably best left 
> sequestered to a specific suite focused on format guessing/validation, 
> at least in my opinion.
> 
> chris


How easily would this lend itself to using the same data for multiple 
tests, or is it likely to lead to/exacerbate a culture of adding 
duplicate data files in each "test suite" rather than reusing?

Nath


From cjfields at uiuc.edu  Thu Jul  5 10:33:46 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Jul 2007 09:33:46 -0500
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To: <21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net>
References: <468B6FBF.1070708@sendu.me.uk>	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>
	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>
	<468CA721.4020804@sheffield.ac.uk>
	<21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net>
Message-ID: <B5011E60-D686-4E71-A1AB-46990548ABD1@uiuc.edu>


On Jul 5, 2007, at 8:07 AM, Hilmar Lapp wrote:

> On Jul 5, 2007, at 4:09 AM, Nathan S. Haigh wrote:
>
>> Chris Fields wrote:
>>> I think what's partially responsible for slowing down releases is  
>>> the
>>> expectation that each dev release is supposed to have all bugs  
>>> fixed,
>>> work for every OS, etc.  In other words, act like a stable release.
>
> It doesn't. A stable release has a stable API that will be  
> supported until the next stable release through point releases.

I agree, but I think there is still an expectation that 1.5.2 and  
beyond are more like true 'stable' releases even though we still  
designate them as 'developer.'   We unfortunately reinforce that when  
we tell users they need to update to v. 1.5.2 or bioperl-live to fix  
a particular bug in the 1.4 release.

There's nothing we can do about that now (hindsight is always 20/20,  
and 1.4 is just too old).  We (pumpkin, core devs) can try correcting  
that by ensuring any bug fixes be committed to any new stable branch  
as well as to live, at least until it becomes too problematic to  
maintain that particular stable branch (at which point we would go  
about getting ready for the next 'stable' and repeat the cycle over  
again).

>>> A developer release by nature is living on the edge, so why not have
>>> regular dev releases?
>
> There's no problem with regular dev releases, but tests will need  
> to pass. There was never a stipulation that all bugs need to have  
> been fixed. But all tests need to pass, so in an ideal world (in  
> which everything is being tested) all tests passing would imply all  
> (known) bugs fixed. Obviously, we don't live in an ideal world ...

...particularly when it comes to network-related tests and remote
server problems (but those are by default not run, so there is a way
around test failures there).  I agree here as well (all tests must
pass).  As for the bug fixes, we can just stipulate which ones were
fixed with the release (in a RELEASE_NOTES or similar), and maybe
have TODOs in the test suite designating they are being worked on.

Basically, at regular intervals, maybe with a few weeks of lead time,
the pumpkin would announce an impending dev. release.  We would go
through rounds of tests, bug fixes, etc.  When all tests pass, we post
it on CPAN as a dev. release.  If we have a stable release branch with
relevant bug fixes we can post that as well, again up to the point
where maintaining it becomes too problematic.

Would we just take a snapshot of MAIN and any relevant stable branch  
at that particular point for the CPAN release, just increasing the  
version number (1.x.y)?  Would it make sense to have a 1.x.y branch  
for each release (I don't think so, but maybe others disagree)?

> If not everything passes then what is the big difference to a code  
> snapshot? If using cvs (or svn) is too difficult for most people,  
> we can consider creating a mechanism that puts up nightly snapshots  
> for download.

If we feel a nightly snapshot is warranted we could do that though.   
I personally don't think there is a need, particularly since we have  
several means to obtain the latest code at any point in time  
(including the browsable CVS 'Download tarball').  We could state the  
next dev/stable CPAN release (pending on date dd/mm/yy) will have the  
bug fix, and if they want it immediately then pick it up from CVS.

>> -- snip --
>>
>> I agree, although would the dev releases still need to pass all the
>> tests? I'm thinking of people installing via CPAN.
>
> For example, that's another point.
>
>  	-hilmar

Yes, I agree.

As an aside, I don't think dev. releases pop up when you run a simple  
'install Foo::Bar' from the CPAN shell but I'm not sure; Sendu may  
know the answer to that.

chris 


From cjfields at uiuc.edu  Thu Jul  5 10:34:22 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Jul 2007 09:34:22 -0500
Subject: [Bioperl-l] Test related Suggestions
In-Reply-To: <200707051512.38185.heikki@sanbi.ac.za>
References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk>
	<468CBC3E.1020408@sendu.me.uk>
	<200707051512.38185.heikki@sanbi.ac.za>
Message-ID: <2C7BA3C8-0693-458F-A49D-879ABAB8012E@uiuc.edu>


On Jul 5, 2007, at 8:12 AM, Heikki Lehvaslaiho wrote:

>
> One more suggestion:
>
> It would be extremely useful if we had a standard way of testing that
> when a file is read into a bioperl object and then written out again
> in the same format, the input and output files are identical. If not,
> the test should show where the differences start (showing all the
> differences would just clutter the screen).
>
> This standard method/subroutine should be used to test all sequence
> and other text file IO.
>
> Any takers?
>
> 	-Heikki
...

I agree.  There are some 'round-trip' tests with genbank.t or SeqIO.t  
that do some checking, I think, but something like this would be of  
use.  However, what if the test file is old (as many in t/data are)  
and the format has changed?  GenBank and EMBL, for instance, have  
gone through several changes to format.

chris
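
As a sketch of the check Heikki describes: compare the text before and
after the round trip and report only the first point of divergence.
This is written in Python purely to keep it self-contained; the
function names are hypothetical, and the read/write step through
Bio::SeqIO is left out (the two inputs are assumed to be the file
contents before and after).

```python
from itertools import zip_longest


def first_difference(original_lines, rewritten_lines):
    """Return (line_number, original, rewritten) for the first mismatch.

    Line numbers are 1-based. Returns None when the inputs are identical.
    A line missing from the shorter input is reported as None.
    """
    pairs = zip_longest(original_lines, rewritten_lines)
    for number, (original, rewritten) in enumerate(pairs, start=1):
        if original != rewritten:
            return number, original, rewritten
    return None


def round_trip_report(original_text, rewritten_text):
    """Summarise a round trip as one line suitable for test output."""
    diff = first_difference(original_text.splitlines(),
                            rewritten_text.splitlines())
    if diff is None:
        return "ok: round trip is identical"
    number, original, rewritten = diff
    return (f"not ok: first difference at line {number}: "
            f"{original!r} != {rewritten!r}")
```

Stopping at the first mismatch keeps the output short, which matches
the concern about cluttering the screen.  It does not address the
caveat about old test files whose format has since changed.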


From n.haigh at sheffield.ac.uk  Thu Jul  5 10:43:51 2007
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Thu, 05 Jul 2007 15:43:51 +0100
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To: <B5011E60-D686-4E71-A1AB-46990548ABD1@uiuc.edu>
References: <468B6FBF.1070708@sendu.me.uk>	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>
	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>
	<468CA721.4020804@sheffield.ac.uk>
	<21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net>
	<B5011E60-D686-4E71-A1AB-46990548ABD1@uiuc.edu>
Message-ID: <468D03A7.3090408@sheffield.ac.uk>

Chris Fields wrote:
-- snip --

>>>
>>> I agree, although would the dev releases still need to pass all the
>>> tests? I'm thinking of people installing via CPAN.
>>
>> For example, that's another point.
>>
>>      -hilmar
> 
> Yes, I agree.
> 
> As an aside, I don't think dev. releases pop up when you run a simple 
> 'install Foo::Bar' from the CPAN shell but I'm not sure; Sendu may know 
> the answer to that.
> 
> chris


That's right, it'll only install the non-developer releases (1.4 
currently). If you want to install the developer release from CPAN you 
need to know the path to the archive and then do:

cpan> install S/SE/SENDU/bioperl-1.5.2_102.tar.gz

as detailed on the wiki:
http://www.bioperl.org/wiki/Release_1.5.2

Nath


From cjfields at uiuc.edu  Thu Jul  5 10:49:33 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Jul 2007 09:49:33 -0500
Subject: [Bioperl-l] Installing all modules required for testing
In-Reply-To: <468CFEB0.80201@sheffield.ac.uk>
References: <468B6FBF.1070708@sendu.me.uk>	<468CB1FD.7060301@sendu.me.uk>	<468CBC3E.1020408@sendu.me.uk>	<200707051512.38185.heikki@sanbi.ac.za>
	<468CF5A8.7040402@sendu.me.uk> <468CFB5D.6080406@sheffield.ac.uk>
	<468CFD06.3080604@sendu.me.uk> <468CFEB0.80201@sheffield.ac.uk>
Message-ID: <16397C9B-07DA-416E-A2AF-5FA403BA0388@uiuc.edu>


On Jul 5, 2007, at 9:22 AM, Nathan S. Haigh wrote:

> Sendu Bala wrote:
>> ...
>> Yes, add them as recommended (or perhaps 'build_requires') modules in
>> Build.PL, then run Build.PL and install the modules when it asks you.
>>
>> Everything should be in Build.PL already. If I missed something,
>> please add it.
>>
>
> OK, to clarify using the test file Sendu mentioned in a previous post:
> t/SeqIO.t
>
> This test skips tests if Algorithm::Diff, IO::ScalarArray or IO::String
> are not installed (the first two are not mentioned in Build.PL).
> However, if there are a lot of such skips in the whole test suite then
> there may be few systems with all these modules installed in order to
> conduct a complete test. These are the modules I'm referring to.
>
> Nath

If they are only necessary for tests, work for all OSs, and are pure  
Perl they should be added to t/lib, like Test::More and the rest.  If  
they only work for some OSs they could be added to t/lib and skip  
based on OS, but they still must be pure Perl.  I would avoid  
anything that requires any compiling for XS or Inline altogether (I  
don't want to go down the nightmare road of OS-dependent compiler  
issues for a few tests).

Finally, if they are needed for core modules (not just tests) then  
they should be added to the core prereqs in Build.

chris


From cjfields at uiuc.edu  Thu Jul  5 10:52:58 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Jul 2007 09:52:58 -0500
Subject: [Bioperl-l] Warnings
In-Reply-To: <468CEC72.4090909@sendu.me.uk>
References: <468CCC30.90406@sendu.me.uk>
	<200707051458.59921.heikki@sanbi.ac.za>
	<468CEC72.4090909@sendu.me.uk>
Message-ID: <2E0C7F35-9AA1-479A-A430-7D4037D98A3A@uiuc.edu>


On Jul 5, 2007, at 8:04 AM, Sendu Bala wrote:

> ...
>
> So it's my understanding there will be absolutely no difference in
> behaviour following this change (except that warnings can be caught by
> Test::Warn). I just wanted to confirm my understanding.

You can always just try it out and run tests.  Might be interesting  
to see if anything breaks.

chris


From N.Haigh at sheffield.ac.uk  Thu Jul  5 10:58:30 2007
From: N.Haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Thu,  5 Jul 2007 15:58:30 +0100
Subject: [Bioperl-l] Test related Suggestions
In-Reply-To: <2C7BA3C8-0693-458F-A49D-879ABAB8012E@uiuc.edu>
References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk>
	<468CBC3E.1020408@sendu.me.uk>
	<200707051512.38185.heikki@sanbi.ac.za>
	<2C7BA3C8-0693-458F-A49D-879ABAB8012E@uiuc.edu>
Message-ID: <1183647510.468d07168963c@webmail.shef.ac.uk>

Quoting Chris Fields <cjfields at uiuc.edu>:

> 
> On Jul 5, 2007, at 8:12 AM, Heikki Lehvaslaiho wrote:
> 
> >
> > One more suggestion:
> >
> > It would be extremely useful if we had a standard way of testing that
> > when a file is read into a bioperl object and then written out again
> > in the same format, the input and output files are identical. If not,
> > the test should show where the differences start (showing all the
> > differences would just clutter the screen).
> >
> > This standard method/subroutine should be used to test all sequence
> > and other text file IO.
> >
> > Any takers?
> >
> > 	-Heikki
> ...
> 
> I agree.  There are some 'round-trip' tests with genbank.t or SeqIO.t  
> that do some checking, I think, but something like this would be of  
> use.  However, what if the test file is old (as many in t/data are)  
> and the format has changed?  GenBank and EMBL, for instance, have  
> gone through several changes to format.
> 
> chris
> 
> 

Is there any way to tell variants apart other than just by layout, e.g. a version number or the like?

Nath


From N.Haigh at sheffield.ac.uk  Thu Jul  5 11:04:30 2007
From: N.Haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Thu,  5 Jul 2007 16:04:30 +0100
Subject: [Bioperl-l] Installing all modules required for testing
In-Reply-To: <16397C9B-07DA-416E-A2AF-5FA403BA0388@uiuc.edu>
References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk>
	<468CBC3E.1020408@sendu.me.uk>
	<200707051512.38185.heikki@sanbi.ac.za>
	<468CF5A8.7040402@sendu.me.uk> <468CFB5D.6080406@sheffield.ac.uk>
	<468CFD06.3080604@sendu.me.uk> <468CFEB0.80201@sheffield.ac.uk>
	<16397C9B-07DA-416E-A2AF-5FA403BA0388@uiuc.edu>
Message-ID: <1183647870.468d087ed4c80@webmail.shef.ac.uk>

Quoting Chris Fields <cjfields at uiuc.edu>:

> 
> On Jul 5, 2007, at 9:22 AM, Nathan S. Haigh wrote:
> 
> > Sendu Bala wrote:
> >> ...
> >> Yes, add them as recommended (or perhaps 'build_requires') modules in
> >> Build.PL, then run Build.PL and install the modules when it asks you.
> >>
> >> Everything should be in Build.PL already. If I missed something,
> >> please add it.
> >>
> >
> > OK, to clarify using the test file Sendu mentioned in a previous post:
> > t/SeqIO.t
> >
> > This test skips tests if Algorithm::Diff, IO::ScalarArray or IO::String
> > are not installed (the first two are not mentioned in Build.PL).
> > However, if there are a lot of such skips in the whole test suite then
> > there may be few systems with all these modules installed in order to
> > conduct a complete test. These are the modules I'm referring to.
> >
> > Nath
> 
> If they are only necessary for tests, work for all OSs, and are pure  
> Perl they should be added to t/lib, like Test::More and the rest.  If  
> they only work for some OSs they could be added to t/lib and skip  
> based on OS, but they still must be pure Perl.  I would avoid  
> anything that requires any compiling for XS or Inline altogether (I  
> don't want to go down the nightmare road of OS-dependent compiler  
> issues for a few tests).

If this is the case, there surely is no need to skip the tests if they should be provided in the t/lib dir. Am I missing something!?

> 
> Finally, if they are needed for core modules (not just tests) then  
> they should be added to the core prereqs in Build.
> 
> chris
> 


From bix at sendu.me.uk  Thu Jul  5 11:13:35 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 05 Jul 2007 16:13:35 +0100
Subject: [Bioperl-l] Installing all modules required for testing
In-Reply-To: <1183647870.468d087ed4c80@webmail.shef.ac.uk>
References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk>
	<468CBC3E.1020408@sendu.me.uk>
	<200707051512.38185.heikki@sanbi.ac.za>
	<468CF5A8.7040402@sendu.me.uk> <468CFB5D.6080406@sheffield.ac.uk>
	<468CFD06.3080604@sendu.me.uk> <468CFEB0.80201@sheffield.ac.uk>
	<16397C9B-07DA-416E-A2AF-5FA403BA0388@uiuc.edu>
	<1183647870.468d087ed4c80@webmail.shef.ac.uk>
Message-ID: <468D0A9F.4010709@sendu.me.uk>

Nathan S. Haigh wrote:
> Quoting Chris Fields <cjfields at uiuc.edu>:
>>> OK, to clarify using the test file Sendu mentioned in a previous
>>> post: t/SeqIO.t
>>> 
>>> This test skips tests if Algorithm::Diff, IO::ScalarArray or 
>>> IO::String are not installed
>> 
>> If they are only necessary for tests, work for all OSs, and are
>> pure Perl they should be added to t/lib, like Test::More and the
>> rest.  If they only work for some OSs they could be added to t/lib
>> and skip based on OS, but they still must be pure Perl.  I would
>> avoid anything that requires any compiling for XS or Inline
>> altogether (I don't want to go down the nightmare road of
>> OS-dependent compiler issues for a few tests).
> 
> If this is the case, there surely is no need to skip the tests if
> they should be provided in the t/lib dir. Am I missing something!?

That skip in SeqIO.t is new and I simply didn't think of them as 
important enough to make anyone install them or include them in t/lib.

I'd go ahead and add those modules, but like I say, it may make more 
sense just to use is_deeply(), removing the dependency on 
Algorithm::Diff and IO::ScalarArray completely.


From cjfields at uiuc.edu  Thu Jul  5 11:35:41 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Jul 2007 10:35:41 -0500
Subject: [Bioperl-l] Installing all modules required for testing
In-Reply-To: <1183647870.468d087ed4c80@webmail.shef.ac.uk>
References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk>
	<468CBC3E.1020408@sendu.me.uk>
	<200707051512.38185.heikki@sanbi.ac.za>
	<468CF5A8.7040402@sendu.me.uk> <468CFB5D.6080406@sheffield.ac.uk>
	<468CFD06.3080604@sendu.me.uk> <468CFEB0.80201@sheffield.ac.uk>
	<16397C9B-07DA-416E-A2AF-5FA403BA0388@uiuc.edu>
	<1183647870.468d087ed4c80@webmail.shef.ac.uk>
Message-ID: <F97172F8-F59A-4CCD-9BBD-B763675EB92F@uiuc.edu>


On Jul 5, 2007, at 10:04 AM, Nathan S. Haigh wrote:

> ...
>> If they are only necessary for tests, work for all OSs, and are pure
>> Perl they should be added to t/lib, like Test::More and the rest.  If
>> they only work for some OSs they could be added to t/lib and skip
>> based on OS, but they still must be pure Perl.  I would avoid
>> anything that requires any compiling for XS or Inline altogether (I
>> don't want to go down the nightmare road of OS-dependent compiler
>> issues for a few tests).
>
> If this is the case, there surely is no need to skip the tests if  
> they should be provided in the t/lib dir. Am I missing something!?

No, you are correct, but these are currently not in t/lib (unless  
someone snuck them in....)

Of the modules you listed above, only one (IO::String) is required by  
the core modules.  The others are not.  Users shouldn't be forced to  
install Algorithm::Diff or IO::ScalarArray just to run tests, so  
anything not required should go into t/lib if at all possible.

If there are any reasons (OS issues, list of prereqs) which preclude
adding these to t/lib, we need to ask ourselves (1) why are we using
that module in the first place?  And, if there is a good reason, (2)
can we skip the tests if the module isn't present?  Both of those
options are already available.

chris
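
The "skip if not present" option can be sketched as follows.  This is
an illustration in Python, not the code in t/SeqIO.t: it probes for
optional modules and produces a skip message instead of a failure.
The helper names are hypothetical.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names
            if importlib.util.find_spec(name) is None]


def skip_reason(names):
    """Return None when everything is available, else a skip message."""
    missing = missing_modules(names)
    if not missing:
        return None
    return ("skipping: optional module(s) not installed: "
            + ", ".join(missing))
```

A test would consult skip_reason() before running the dependent
checks.  Bundling the module in t/lib removes the need for the probe
altogether.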


From cjfields at uiuc.edu  Thu Jul  5 11:50:55 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Jul 2007 10:50:55 -0500
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To: <468D006D.6050806@sheffield.ac.uk>
References: <468B6FBF.1070708@sendu.me.uk>
	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>
	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>
	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>
	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>
	<468CA721.4020804@sheffield.ac.uk> <468CB1FD.7060301@sendu.me.uk>
	<1183629664.468cc1609891a@webmail.shef.ac.uk>
	<468CC220.804@sendu.me.uk>
	<D71E1DE2-C737-41C1-90FF-573A66925790@uiuc.edu>
	<468D006D.6050806@sheffield.ac.uk>
Message-ID: <404EEDE8-53AC-411E-B4F0-CF4B4AABE9E0@uiuc.edu>


On Jul 5, 2007, at 9:30 AM, Nathan S. Haigh wrote:

> ...
>> I actually like Sendu's idea more, or the idea of each test suite
>> having its own directory.
>> Tests which need to guess/validate the format are probably best left
>> sequestered to a specific suite focused on format guessing/validation,
>> at least in my opinion.
>> chris
>
>
> How easily would this lend itself to using the same data for  
> multiple tests, or is it likely to lead to/exacerbate a culture of  
> adding duplicate data files in each "test suite" rather than reusing?
>
> Nath

If there is a group of test data used for more than one test suite we  
can group those together into a common use folder, or we can go by  
format.  I'm pretty open to anything, really, as long as it is more  
organized.

My point is really concerned more with validation/guessing.  I think  
we should limit those tests to their respective specific test suites,  
or even to sections within a particular test suite (for instance,  
genbank.t), but not to force sequence guessing or validation in other  
cases.  To me validation, guessing, and parsing are three distinct  
issues (much like XML parsers handle things), so they require three  
distinct tests.

As for true sequence validation, there is no official format
validation scheme yet in BioPerl.  It's sort of unofficially
integrated into the sequence parsers themselves (something which I
find to be problematic for several reasons too long to outline here).

chris


From cjfields at uiuc.edu  Thu Jul  5 11:54:42 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Jul 2007 10:54:42 -0500
Subject: [Bioperl-l] Test related Suggestions
In-Reply-To: <1183647510.468d07168963c@webmail.shef.ac.uk>
References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk>
	<468CBC3E.1020408@sendu.me.uk>
	<200707051512.38185.heikki@sanbi.ac.za>
	<2C7BA3C8-0693-458F-A49D-879ABAB8012E@uiuc.edu>
	<1183647510.468d07168963c@webmail.shef.ac.uk>
Message-ID: <48474A2C-2A58-4D51-8E7F-7CE083948D0F@uiuc.edu>


On Jul 5, 2007, at 9:58 AM, Nathan S. Haigh wrote:

> Quoting Chris Fields <cjfields at uiuc.edu>:
>
>>
>> On Jul 5, 2007, at 8:12 AM, Heikki Lehvaslaiho wrote:
>>
>>>
>>> One more suggestion:
>>>
>>> It would be extremely useful if we had a standard way of testing that
>>> when a file is read into a bioperl object and then written out again
>>> in the same format, the input and output files are identical. If not,
>>> the test should show where the differences start (showing all the
>>> differences would just clutter the screen).
>>>
>>> This standard method/subroutine should be used to test all sequence
>>> and other text file IO.
>>>
>>> Any takers?
>>>
>>> 	-Heikki
>> ...
>>
>> I agree.  There are some 'round-trip' tests with genbank.t or SeqIO.t
>> that do some checking, I think, but something like this would be of
>> use.  However, what if the test file is old (as many in t/data are)
>> and the format has changed?  GenBank and EMBL, for instance, have
>> gone through several changes to format.
>>
>> chris
>>
>>
>
> Is there any way to tell variants apart other than just by layout,
> e.g. a version number or the like?
>
> Nath

I don't think so; this veers back into the whole validation issue  
(i.e. does the record fit certain specifications).  There are  
examples of seq records from different sources which bioperl is  
expected to parse, for example Ensembl GenBank records.  Some of  
those have feature tags or annotation fields which may not appear in  
output when using write_seq().

I don't think it's as important to replicate the output data exactly  
like the input as much as it's important to have the data represented  
in a Bio::Seq object (or any other Bio* instance) in a consistent  
manner and have the ability to incorporate new fields (such as the  
recent addition of genome projects) transparently.  The latter is  
hard to do with the current genbank parser (you have to specifically  
code for it), but it is a bit easier to do with the driver-handler  
model I'm working on.

chris


From bix at sendu.me.uk  Thu Jul  5 11:56:29 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 05 Jul 2007 16:56:29 +0100
Subject: [Bioperl-l] Test related Suggestions
In-Reply-To: <468CBC3E.1020408@sendu.me.uk>
References: <468B6FBF.1070708@sendu.me.uk>	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>	<468CA721.4020804@sheffield.ac.uk>	<468CB1FD.7060301@sendu.me.uk>
	<468CBC3E.1020408@sendu.me.uk>
Message-ID: <468D14AD.8050007@sendu.me.uk>

Sendu Bala wrote:
> Sendu Bala wrote:
>> Nathan S. Haigh wrote:
>>> Thinking about this a little more, I think it would be a good idea to 
>>> include Test::Exception in t/lib.
>> Agree. I'll see if I can have it auto-loaded by BioperlTest.pm.
> 
> I've now done that: BioperlTest loads Test::Exception, from the copy in 
> t/lib if necessary.
> 
> So, in BioperlTest-using scripts you now have access to the methods 
> dies_ok, lives_ok, throws_ok and lives_and.

And I've also now added in support for Test::Warn, giving you 
warning_is, warnings_are, warning_like and warnings_like.

I've updated the HOWTO as well:
http://www.bioperl.org/wiki/HOWTO:Writing_BioPerl_Tests

You can see these things in action in t/seq_quality.t


From bix at sendu.me.uk  Thu Jul  5 11:57:23 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 05 Jul 2007 16:57:23 +0100
Subject: [Bioperl-l] Warnings
In-Reply-To: <2E0C7F35-9AA1-479A-A430-7D4037D98A3A@uiuc.edu>
References: <468CCC30.90406@sendu.me.uk>
	<200707051458.59921.heikki@sanbi.ac.za>
	<468CEC72.4090909@sendu.me.uk>
	<2E0C7F35-9AA1-479A-A430-7D4037D98A3A@uiuc.edu>
Message-ID: <468D14E3.6030104@sendu.me.uk>

Chris Fields wrote:
> 
> On Jul 5, 2007, at 8:04 AM, Sendu Bala wrote:
> 
>> ...
>>
>> So it's my understanding there will be absolutely no difference in
>> behaviour following this change (except that warnings can be caught by
>> Test::Warn). I just wanted to confirm my understanding.
> 
> You can always just try it out and run tests.  Might be interesting to 
> see if anything breaks.

I've made the change. Everything seems ok as far as I can tell.


From dmessina at wustl.edu  Thu Jul  5 12:02:26 2007
From: dmessina at wustl.edu (David Messina)
Date: Thu, 5 Jul 2007 11:02:26 -0500
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To: <B5011E60-D686-4E71-A1AB-46990548ABD1@uiuc.edu>
References: <468B6FBF.1070708@sendu.me.uk>	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>
	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>
	<468CA721.4020804@sheffield.ac.uk>
	<21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net>
	<B5011E60-D686-4E71-A1AB-46990548ABD1@uiuc.edu>
Message-ID: <FBB24EB4-7728-476E-98EA-77A53C870A89@wustl.edu>


On Jul 5, 2007, at 9:33 AM, Chris Fields wrote:
> I agree, but I think there is still an expectation that 1.5.2 and
> beyond are more like true 'stable' releases even though we still
> designate them as 'developer.'   We unfortunately reinforce that when
> we tell users they need to update to v. 1.5.2 or bioperl-live to fix
> a particular bug in the 1.4 release.

I know this has been discussed before, but while we're talking about  
future release plans, it might be worth revisiting the BioPerl policy  
of designating only even-numbered releases as 'stable'. It's taking  
so long to get from 1.4 to 1.6. While the principle of keeping a  
stable API between 'stable' releases is valid in the ideal case, I  
think that continuing to label 1.5.2 (or whatever the latest 'dev'  
release is) as a developer release (which implies potentially  
unstable or bleeding-edge code) is highly misleading since we would  
never ever tell anyone to get 1.4 instead.

Alternatively, if we adopt a more aggressive release schedule as  
Chris proposed a couple days ago, then perhaps we could agree to push  
out an even-numbered release once a year or so, so that there is a  
'stable' release we could recommend.


> If we feel a nightly snapshot is warranted we could do that though.
> I personally don't think there is a need, particularly since we have
> several means to obtain the latest code at any point in time
> (including the browsable CVS 'Download tarball').  We could state the
> next dev/stable CPAN release (pending on date dd/mm/yy) will have the
> bug fix, and if they want it immediately then pick it up from CVS.

To make it easier for people to obtain the latest tarball, we could
put the 'download tarball' link directly on the 'Getting_BioPerl'
wiki page instead of only a link to the viewcvs interface. That way
they wouldn't have to navigate the source tree to figure out which
tarball they want (which is almost always going to be the
bioperl-live tarball).

I think the actual URL underlying the 'Download tarball' link on
viewcvs is stable:

	http://code.open-bio.org/cgi/viewcvs.cgi/bioperl-live/bioperl-live.tar.gz?tarball=1


Dave
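
The numbering policy under discussion can be written down as a small
sketch.  The even/odd rule is the one described above.  The underscore
rule reflects the CPAN convention for developer releases (as in
bioperl-1.5.2_102) and is an assumption insofar as this thread does
not spell it out.  The function names are hypothetical and the code is
Python for illustration only.

```python
def is_stable_series(version):
    """True when the minor number is even, e.g. 1.4 or 1.6."""
    parts = version.replace("_", ".").split(".")
    minor = int(parts[1])
    return minor % 2 == 0


def is_cpan_developer_release(version):
    """True when the version carries an underscore, e.g. 1.5.2_102."""
    return "_" in version
```

Under this rule 1.5.2 is a developer release however stable it is in
practice, which is the mismatch being pointed out.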


From cjfields at uiuc.edu  Thu Jul  5 12:13:30 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Jul 2007 11:13:30 -0500
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To: <FBB24EB4-7728-476E-98EA-77A53C870A89@wustl.edu>
References: <468B6FBF.1070708@sendu.me.uk>	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>
	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>
	<468CA721.4020804@sheffield.ac.uk>
	<21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net>
	<B5011E60-D686-4E71-A1AB-46990548ABD1@uiuc.edu>
	<FBB24EB4-7728-476E-98EA-77A53C870A89@wustl.edu>
Message-ID: <BF212044-F565-434B-882F-507974566B66@uiuc.edu>


On Jul 5, 2007, at 11:02 AM, David Messina wrote:

> ...
> I know this has been discussed before, but while we're talking  
> about future release plans, it might be worth revisiting the  
> BioPerl policy of designating only even-numbered releases as  
> 'stable'. It's taking so long to get from 1.4 to 1.6. While the  
> principle of keeping a stable API between 'stable' releases is  
> valid in the ideal case, I think that continuing to label 1.5.2 (or  
> whatever the latest 'dev' release is) as a developer release (which  
> implies potentially unstable or bleeding-edge code) is highly  
> misleading since we would never ever tell anyone to get 1.4 instead.
>
> Alternatively, if we adopt a more aggressive release schedule as  
> Chris proposed a couple days ago, then perhaps we could agree to  
> push out an even-numbered release once a year or so, so that there  
> is a 'stable' release we could recommend.

I think the idea of 'stable' is best summarized back in Hilmar's post
(i.e. we support a particular API for that release).  The 1.5
releases I believe break some aspects of the 1.4 API (some of the
Feature/Annotation stuff introduced before the official 1.5 release).
We still need to address some of those issues before a 1.6, which
seems to be the only real stumbling block, but they are unfortunately
not well-documented and are somewhat interwoven with GMOD code.

> ...
> To make it easier for people to obtain the latest tarball, we could
> put the 'download tarball' link directly on the 'Getting_BioPerl'
> wiki page instead of only a link to the viewcvs interface. That way
> they wouldn't have to navigate the source tree to figure out which
> tarball they want (which is almost always going to be the
> bioperl-live tarball).
>
> I think the actual URL underlying the 'Download tarball' link on
> viewcvs is stable:
>
> 	http://code.open-bio.org/cgi/viewcvs.cgi/bioperl-live/bioperl-live.tar.gz?tarball=1
>
> Dave

Sounds reasonable enough.  Do you want to do the honors?

chris


From dmessina at wustl.edu  Thu Jul  5 12:44:28 2007
From: dmessina at wustl.edu (David Messina)
Date: Thu, 5 Jul 2007 11:44:28 -0500
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To: <BF212044-F565-434B-882F-507974566B66@uiuc.edu>
References: <468B6FBF.1070708@sendu.me.uk>	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>
	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>
	<468CA721.4020804@sheffield.ac.uk>
	<21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net>
	<B5011E60-D686-4E71-A1AB-46990548ABD1@uiuc.edu>
	<FBB24EB4-7728-476E-98EA-77A53C870A89@wustl.edu>
	<BF212044-F565-434B-882F-507974566B66@uiuc.edu>
Message-ID: <8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu>


> [Chris]
> The 1.5 releases I believe break some aspects of 1.4 API

Yes, this is true.

I question, though, whether it's relevant given that virtually no one  
uses 1.4 anymore. In any case, I would venture that the number of  
people who would be bitten by the 1.4->1.5 API change is much smaller  
than the number of people who download 1.4 and then ask us why it  
doesn't work.

I think that, rather than continuing to call 1.5.x the developer  
release in order to adhere to the API guarantee, it would be much  
clearer to users if we state clearly that everyone should download  
1.5.x, and that if you're upgrading from 1.4 to 1.5 there are API  
changes.


>> [me]
>> we could put the 'download tarball' link directly on the  
>> 'Getting_BioPerl' wiki page
>
> [Chris]
> Sounds reasonable enough.  Do you want to do the honors?

Done.


Dave


From cjfields at uiuc.edu  Thu Jul  5 12:57:28 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Jul 2007 11:57:28 -0500
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To: <8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu>
References: <468B6FBF.1070708@sendu.me.uk>	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>
	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>
	<468CA721.4020804@sheffield.ac.uk>
	<21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net>
	<B5011E60-D686-4E71-A1AB-46990548ABD1@uiuc.edu>
	<FBB24EB4-7728-476E-98EA-77A53C870A89@wustl.edu>
	<BF212044-F565-434B-882F-507974566B66@uiuc.edu>
	<8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu>
Message-ID: <AE1AF127-4D2A-458C-8EF0-F3D8D3B239E5@uiuc.edu>

On Jul 5, 2007, at 11:44 AM, David Messina wrote:

>
>> [Chris]
>> The 1.5 releases I believe break some aspects of 1.4 API
>
> Yes, this is true.
>
> I question, though, whether it's relevant given that virtually no  
> one uses 1.4 anymore. In any case, I would venture that the number  
> of people who would be bitten by the 1.4->1.5 API change is much  
> smaller than the number of people who download 1.4 and then ask us  
> why it doesn't work.
>
> I think that, rather than continuing to call 1.5.x the developer  
> release in order to adhere to the API guarantee, it would be much  
> clearer to users if we state clearly that everyone should download  
> 1.5.x, and that if you're upgrading from 1.4 to 1.5 there are API  
> changes.

You'd be surprised how many are still using bioperl 1.2.3 (Ensembl)  
and 1.4 (any admin too scared to go with a 'dev' release).  The real  
answer is to get out a stable 1.6 ASAP.  The problem we currently  
have is (horrible Texas pun) 'too many pokers in the fire.'  We have  
svn migration, major changes in the test suite, talk about splitting  
bioperl, a lot of bugs to sort through, new code to add or work on,  
etc.  Not to mention our $jobs!

I think we should just bite the bullet and proceed with pulling out  
the controversial operator overloading in Bio::Annotation*, deprecate  
the tag methods in AnnotatableI, and go about fixing everything up.   
If that occurs (which seems to be the major impediment) and we get  
GMOD/GBrowse playing well with BioPerl then we can aim for a new  
stable release, and then institute a regular release cycle.

chris


From bpederse at gmail.com  Thu Jul  5 13:58:24 2007
From: bpederse at gmail.com (Brent Pedersen)
Date: Thu, 5 Jul 2007 10:58:24 -0700
Subject: [Bioperl-l] slippy map for genomic features.
Message-ID: <e183a99d0707051058y4d5a3f35v8a0aff35dc90c394@mail.gmail.com>

hi,
here's a side project i've been tinkering on in googlecode svn that
may be useful to some.
http://code.google.com/p/genome-browser/
it's a simple hack on top of OpenLayers (openlayers.org) to provide a
javascript slippy map interface and API to view and browse genomic
features. It can be used with any image generation program that can
accept &xmin= and &xmax= parameters through the url -- though i
haven't had it working with bioperl, as bioperl generates images of
different heights depending on the number of tracks.

there's a live example of the code in SVN here:
http://toxic.berkeley.edu/bpederse/genome-browser/
with images generated by a colleague's modules on first request. those
images are then cached by a simple perl script included in the SVN
repo. all subsequent requests are returned from the cache.
an image request (automatically generated by the javascript) looks like:
http://toxic.berkeley.edu/bpederse/genome-browser/tiler.pl?chr=1&version=6&layers=mRNA&organism=arabidopsis&xmin=491520&xmax=499712&width=512
but any implementation need only implement xmin and xmax. all other
parameters will be used for caching but are not required.
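
The caching described above could be keyed on the full set of request parameters. What follows is only a minimal sketch, not the tiler.pl from the repository: the helper name tile_cache_key, the use of an MD5 digest, and the ".png" file suffix are all assumptions made for illustration.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper: build a stable cache key from the request parameters.
# Sorting the names makes the key independent of parameter order in the URL.
sub tile_cache_key {
    my (%param) = @_;
    for my $required (qw(xmin xmax)) {
        die "missing required parameter '$required'\n"
            unless defined $param{$required};
    }
    my $canonical = join '&', map { "$_=$param{$_}" } sort keys %param;
    return md5_hex($canonical);
}

my $key = tile_cache_key(
    'chr'    => 1,
    version  => 6,
    layers   => 'mRNA',
    organism => 'arabidopsis',
    xmin     => 491520,
    xmax     => 499712,
    width    => 512,
);
print "$key.png\n";    # assumed file name for the cached tile
```

Only xmin and xmax are enforced, matching the statement that the other parameters are optional but still take part in caching.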

if anyone is interested in getting this going with bioperl image
generation--or improving the project in any way, let me know and i'll
add you as a committer and provide any javascript support that i can.

-brent

tar ball download:
http://genome-browser.googlecode.com/files/genome-browser-0.02.tar


From dmessina at wustl.edu  Thu Jul  5 14:39:16 2007
From: dmessina at wustl.edu (David Messina)
Date: Thu, 5 Jul 2007 13:39:16 -0500
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To: <AE1AF127-4D2A-458C-8EF0-F3D8D3B239E5@uiuc.edu>
References: <468B6FBF.1070708@sendu.me.uk>	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>
	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>
	<468CA721.4020804@sheffield.ac.uk>
	<21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net>
	<B5011E60-D686-4E71-A1AB-46990548ABD1@uiuc.edu>
	<FBB24EB4-7728-476E-98EA-77A53C870A89@wustl.edu>
	<BF212044-F565-434B-882F-507974566B66@uiuc.edu>
	<8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu>
	<AE1AF127-4D2A-458C-8EF0-F3D8D3B239E5@uiuc.edu>
Message-ID: <DD6F2CE5-FE79-48D2-9410-FACA35AFEF9C@wustl.edu>

> The real answer is to get out a stable 1.6 ASAP.  The problem we  
> currently have is (horrible Texas pun) 'too many pokers in the  
> fire.'  We have svn migration, major changes in the test suite,  
> talk about splitting bioperl, a lot of bugs to sort through, new  
> code to add or work on, etc.  Not to mention our $jobs!

Yep, I hear ya.


> I think we should just bite the bullet and proceed with pulling out  
> the controversial operator overloading in Bio::Annotation*,  
> deprecate the tag methods in AnnotatableI, and go about fixing  
> everything up.  If that occurs (which seems to be the major  
> impediment) and we get GMOD/GBrowse playing well with BioPerl then  
> we can aim for a new stable release, and then institute a regular  
> release cycle.

That's a great plan. You're right -- better to devote energy to 1.6  
than to interim solutions.

Alright, I give, I give! :)
Dave


From glauberwagner at yahoo.com.br  Thu Jul  5 15:56:43 2007
From: glauberwagner at yahoo.com.br (Glauber Wagner)
Date: Thu, 5 Jul 2007 16:56:43 -0300 (ART)
Subject: [Bioperl-l] Bio::DB::Query::GenBank failures
In-Reply-To: <6D5431B47E46BD45AAA453432AD3B803027551D4@rpbmsem01.nala.roche.com>
Message-ID: <839755.95349.qm@web36514.mail.mud.yahoo.com>

Dear All,

I have a problem with the Bio::DB::Query::GenBank module. I
am trying to count the number of protein sequences, and
the module does not return the expected number from the count
method.

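use strict;
use warnings;
use Bio::DB::GenBank;
use Bio::DB::Query::GenBank;

my $query_string = "Trypanosoma cruzi[Organism]";

# Query the NCBI protein database for all matching entries
my $query = Bio::DB::Query::GenBank->new(-db    => 'protein',
                                         -query => $query_string);
my $count = $query->count;   # number of hits reported for the query
my @ids   = $query->ids;     # the matching identifiers

print "$count\n";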

Thanks.
Glauber




From cjfields at uiuc.edu  Thu Jul  5 16:21:49 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Jul 2007 15:21:49 -0500
Subject: [Bioperl-l] Bio::DB::Query::GenBank failures
In-Reply-To: <839755.95349.qm@web36514.mail.mud.yahoo.com>
References: <839755.95349.qm@web36514.mail.mud.yahoo.com>
Message-ID: <190D7522-9681-410A-9B04-CFB70E328EC4@uiuc.edu>

NCBI esearch doesn't seem to be working at the moment; I'm getting  
'Internal Server Error'.  Try again a bit later.

chris

On Jul 5, 2007, at 2:56 PM, Glauber Wagner wrote:

> Dear All,
>
> I have a problem with the Bio::DB::Query::GenBank module. I
> am trying to count the number of protein sequences, and
> the module does not return the expected number from the count
> method.
>
> use strict;
> use warnings;
> use Bio::DB::GenBank;
> use Bio::DB::Query::GenBank;
>
> my $query_string = "Trypanosoma cruzi[Organism]";
>
> my $query = Bio::DB::Query::GenBank->new(-db    => 'protein',
>                                          -query => $query_string);
> my $count = $query->count;
> my @ids   = $query->ids;
>
> print "$count\n";
>
> Thanks.
> Glauber
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From mitch_skinner at berkeley.edu  Thu Jul  5 17:22:38 2007
From: mitch_skinner at berkeley.edu (Mitch Skinner)
Date: Thu, 05 Jul 2007 14:22:38 -0700
Subject: [Bioperl-l] slippy map for genomic features.
In-Reply-To: <e183a99d0707051058y4d5a3f35v8a0aff35dc90c394@mail.gmail.com>
References: <e183a99d0707051058y4d5a3f35v8a0aff35dc90c394@mail.gmail.com>
Message-ID: <468D611E.7020904@berkeley.edu>

Hi,

FWIW, we've been working on something similar:
http://genome.biowiki.org/dmel/static/browser/prototype_gbrowse.html
based on GBrowse/Bio::Graphics and javascript that Andrew wrote from 
scratch (with the prototype library).  When our project was starting up 
(fall 05) Andrew looked but didn't find openlayers; I'm not sure if it 
was public back then but their current svn only goes back to 2006.

I think that things like layout (bumping) ought to be done in advance on 
a chromosome-wide basis; otherwise it's difficult to keep features from 
ending up at different heights on neighboring tiles.  And it would be 
difficult for the server to know what was being clicked on.  So we've 
been doing some up-front work to either do layout or to just render all 
the tiles in advance:
http://gmod.cvs.sourceforge.net/gmod/Generic-Genome-Browser/ajax/server/TileGenerator.pm?revision=1.1&view=markup
which is driven by this script:
http://gmod.cvs.sourceforge.net/gmod/Generic-Genome-Browser/ajax/server/generate-tiles.pl?revision=1.14&view=markup
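
To illustrate why doing layout once per chromosome helps: if every feature is assigned a row before any tile is drawn, each tile can draw the feature at the same height. The sketch below is a plain greedy bumping pass under stated assumptions; it is not taken from TileGenerator.pm, and the function name assign_rows and the hash keys name/start/end are made up for the example.

```perl
use strict;
use warnings;

# Greedy "bumping": place each feature in the first row whose last feature
# ends before this one starts. Coordinates are inclusive genomic positions.
sub assign_rows {
    my (@features) = @_;
    my @row_end;    # rightmost end coordinate used so far in each row
    my %row_of;     # feature name => row index
    for my $f (sort { $a->{start} <=> $b->{start} } @features) {
        my $row = 0;
        $row++ while defined $row_end[$row] && $row_end[$row] >= $f->{start};
        $row_end[$row] = $f->{end};
        $row_of{ $f->{name} } = $row;
    }
    return \%row_of;
}

my $rows = assign_rows(
    { name => 'geneA', start => 100, end => 500 },
    { name => 'geneB', start => 300, end => 800 },
    { name => 'geneC', start => 600, end => 900 },
);
printf "%s: row %d\n", $_, $rows->{$_} for sort keys %$rows;
```

Because the row table is computed from genomic coordinates alone, it ignores the pixel width of text labels, which is exactly the complication raised in the next paragraph.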

Or you could just not bump at all, I guess.  I think of that as 
important functionality but I'd be interested in hearing about use cases 
where it's not necessary.  It's not just bumping, though; things like 
text labels also make it difficult to predict exactly what pixels a 
feature will span if you only have its genomic coordinates.

To make features clickable we've been using imagemaps; it simplifies the 
server code but it bogs down the client quite a bit.

I'd certainly be interested in seeing if there are ways we could work 
together; if you're at Berkeley maybe we could meet.

Regards,
Mitch

Brent Pedersen wrote:
> hi,
> here's a side project i've been tinkering on in googlecode svn that
> may be useful to some.
> http://code.google.com/p/genome-browser/
> it's a simple hack on top of OpenLayers (openlayers.org) to provide a
> javascript slippy map interface and API to view and browse genomic
> features. It can be used with any image generation program that can
> accept &xmin= and &xmax= parameters through the url -- though i
> haven't had it working with bioperl, as bioperl generates images of
> different heights depending on the number of tracks.
>
> there's a live example of the code in SVN here:
> http://toxic.berkeley.edu/bpederse/genome-browser/
> with images generated by a colleague's modules on first request. those
> images are then cached by a simple perl script included in the SVN
> repo. all subsequent requests are returned from the cache.
> an image request (automatically generated by the javascript) looks like:
> http://toxic.berkeley.edu/bpederse/genome-browser/tiler.pl?chr=1&version=6&layers=mRNA&organism=arabidopsis&xmin=491520&xmax=499712&width=512
> but any implementation need only implement xmin and xmax. all other
> parameters will be used for caching but are not required.
>
> if anyone is interested in getting this going with bioperl image
> generation--or improving the project in any way, let me know and i'll
> add you as a committer and provide any javascript support that i can.
>
> -brent
>
> tar ball download:
> http://genome-browser.googlecode.com/files/genome-browser-0.02.tar


From cjfields at uiuc.edu  Thu Jul  5 17:42:40 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Jul 2007 16:42:40 -0500
Subject: [Bioperl-l] Bio::DB::Query::GenBank failures
In-Reply-To: <190D7522-9681-410A-9B04-CFB70E328EC4@uiuc.edu>
References: <839755.95349.qm@web36514.mail.mud.yahoo.com>
	<190D7522-9681-410A-9B04-CFB70E328EC4@uiuc.edu>
Message-ID: <3219E785-D475-4C21-ABCC-89FABD502E05@uiuc.edu>

Update: seems to be back up.  Give it a try now.

chris

On Jul 5, 2007, at 3:21 PM, Chris Fields wrote:

> NCBI esearch doesn't seem to be working at the moment.  I'm getting
> 'Internal Server Error' at this time.  Try back again at a later  
> point.
>
> chris
>
> On Jul 5, 2007, at 2:56 PM, Glauber Wagner wrote:
>
>> Dear All,
>>
>> I have a problem with the Bio::DB::Query::GenBank module. I
>> am trying to count the number of protein sequences, and
>> the module does not return the expected number from the count
>> method.
>>
>> use strict;
>> use warnings;
>> use Bio::DB::GenBank;
>> use Bio::DB::Query::GenBank;
>>
>> my $query_string = "Trypanosoma cruzi[Organism]";
>>
>> my $query = Bio::DB::Query::GenBank->new(-db    => 'protein',
>>                                          -query => $query_string);
>> my $count = $query->count;
>> my @ids   = $query->ids;
>>
>> print "$count\n";
>>
>> Thanks.
>> Glauber
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From n.haigh at sheffield.ac.uk  Fri Jul  6 03:09:17 2007
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Fri, 06 Jul 2007 08:09:17 +0100
Subject: [Bioperl-l] API Changes
In-Reply-To: <8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu>
References: <468B6FBF.1070708@sendu.me.uk>	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>	<468CA721.4020804@sheffield.ac.uk>	<21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net>	<B5011E60-D686-4E71-A1AB-46990548ABD1@uiuc.edu>	<FBB24EB4-7728-476E-98EA-77A53C870A89@wustl.edu>	<BF212044-F565-434B-882F-507974566B66@uiuc.edu>
	<8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu>
Message-ID: <468DEA9D.6010809@sheffield.ac.uk>

David Messina wrote:
>> [Chris]
>> The 1.5 releases I believe break some aspects of 1.4 API
>>     
>
> Yes, this is true.
>
> I question, though, whether it's relevant given that virtually no one  
> uses 1.4 anymore. In any case, I would venture that the number of  
> people who would be bitten by the 1.4->1.5 API change is much smaller  
> than the number of people who download 1.4 and then ask us why it  
> doesn't work.
>   

I'm not really up to speed with how the API should remain stable. Is
the idea that the API should be stable from 1.4 through the 1.5 dev
series, and that the next stable release can then change that API? So
any stable-to-stable upgrade could involve an API change, while a
stable-to-dev upgrade should keep the same API? Does a stable API mean
that the same method calls are available in a newer release? What about
adding new methods to a newer release?

How are these API changes currently tracked? It seems to me that 
Test::More might be able to help in testing the API:

can_ok($module, @methods);
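
A slightly fuller sketch of that idea, assuming the promised methods for each module are kept in a hash alongside the tests. My::Seq and its method names are hypothetical stand-ins so that the example runs without BioPerl installed; only can_ok() and done_testing() come from Test::More.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical stand-in for a real module, so the sketch runs on its own.
{
    package My::Seq;
    sub new        { my ($class, %args) = @_; return bless {%args}, $class }
    sub seq        { return $_[0]{seq} }
    sub display_id { return $_[0]{display_id} }
}

# The public methods promised for this release series. In practice the list
# would be recorded once per stable release and kept with the test suite.
my %api = ( 'My::Seq' => [qw(new seq display_id)] );

for my $module (sort keys %api) {
    can_ok($module, @{ $api{$module} });
}

done_testing();
```

This only checks that the method names still exist. It says nothing about whether the arguments or return values have changed, so it would catch removed or renamed methods but not changes in behaviour.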


Nath


From n.haigh at sheffield.ac.uk  Fri Jul  6 07:10:14 2007
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Fri, 06 Jul 2007 12:10:14 +0100
Subject: [Bioperl-l] Bio::Variation::RNAChange
Message-ID: <468E2316.1030804@sheffield.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I'm taking a look at the tests for Bio::Variation::RNAChange.

If you create a new object without arguments:
my $obj = Bio::Variation::RNAChange->new();

What do you expect the following to return:
$obj->label();

I thought it would probably be:
'inframe'

However you get:
'inframe, deletion'

Can anyone in the know explain what behaviour would be expected?

Cheers
Nath
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGjiMVczuW2jkwy2gRAv+0AJ9tA/1WgEbTRCen+FCi/DU/P2RnAwCfbGit
B8DxDViDOcx2gTFjSwQ2kNg=
=SroY
-----END PGP SIGNATURE-----


From n.haigh at sheffield.ac.uk  Fri Jul  6 08:54:33 2007
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Fri, 06 Jul 2007 13:54:33 +0100
Subject: [Bioperl-l] Bio::Variation::RNAChange
In-Reply-To: <468E2316.1030804@sheffield.ac.uk>
References: <468E2316.1030804@sheffield.ac.uk>
Message-ID: <468E3B89.3090202@sheffield.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Nathan S. Haigh wrote:
> I'm taking a look at the tests for Bio::Variation::RNAChange.
> 
> If you create a new object without arguments:
> my $obj = Bio::Variation::RNAChange->new();
> 
> What do you expect the following to return:
> $obj->label();
> 
> I thought it would probably be:
> 'inframe'
> 
> However you get:
> 'inframe, deletion'
> 
> Can anyone in the know explain what behaviour would be expected?
> 
> Cheers
> Nath

Following on from this, AAChange has the following two methods:
add_Allele() and allele_mut()

It appears that allele_mut() is only capable of remembering one allele at
a time, whereas add_Allele() is provided to add support for multiple
alleles - is that correct?

However, add_Allele() also calls allele_mut(), such that multiple calls
to add_Allele() will result in the overwriting of the allele being
remembered by allele_mut(). Things are further complicated by the fact
that label() uses allele_mut() to decide on the label to return.
Shouldn't label() know about multiple alleles set by multiple calls to
add_Allele()?
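
To make the overwriting concrete, here is a toy model of the behaviour as I understand it. It is NOT the Bio::Variation source: Toy::Change is a made-up class that borrows the three method names only to show a single slot being overwritten while a list keeps every allele.

```perl
use strict;
use warnings;

# Toy model only, not the Bio::Variation code. It mimics the behaviour
# described above, where add_Allele() also sets the single allele_mut slot.
{
    package Toy::Change;
    sub new { my ($class) = @_; return bless { alleles => [] }, $class }
    sub allele_mut {
        my ($self, $value) = @_;
        $self->{allele_mut} = $value if defined $value;
        return $self->{allele_mut};
    }
    sub add_Allele {
        my ($self, $allele) = @_;
        $self->allele_mut($allele);             # overwrites the previous one
        push @{ $self->{alleles} }, $allele;    # the list still keeps them all
        return;
    }
    sub each_Allele { return @{ $_[0]{alleles} } }
}

my $change = Toy::Change->new;
$change->add_Allele($_) for qw(a g);
print "allele_mut:  ", $change->allele_mut, "\n";    # only the last allele
print "each_Allele: ", join(',', $change->each_Allele), "\n";
```

Anything that reads only allele_mut(), as label() appears to, would see just the most recently added allele.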

It may be my lack of understanding of alleles and what these classes are
intended to do, but trying to rewrite the test scripts to improve code
coverage has left me a little confused!

Thanks
Nath
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGjjuJczuW2jkwy2gRAgogAKDXAn8h5iFIBCjtQgxYsrUGofYpOwCguC6I
b8ZOENvDDDIxphAoxeKg8/E=
=f/sa
-----END PGP SIGNATURE-----


From tanzeem.mb at gmail.com  Thu Jul  5 02:39:34 2007
From: tanzeem.mb at gmail.com (tanzeem)
Date: Wed, 4 Jul 2007 23:39:34 -0700 (PDT)
Subject: [Bioperl-l] Problem working with remoteblast submit method in
 webbrowser.
In-Reply-To: <11114623.post@talk.nabble.com>
References: <11114623.post@talk.nabble.com>
Message-ID: <11441586.post@talk.nabble.com>


I found it myself. Run Apache as root and disable SELinux, and the problem
will not recur.

tanzeem wrote:
> 
>  I have a program which uses the BioPerl RemoteBlast module to compare
> an amino acid FASTA file with the Swiss-Prot database. The
> submit_blast() method works successfully when run from the command line,
> but when the program is run from a web browser it returns -1. I was
> trying to adapt the code from the RemoteBlast synopsis for my needs.
> 



From cain.cshl at gmail.com  Fri Jul  6 09:00:32 2007
From: cain.cshl at gmail.com (Scott Cain)
Date: Fri, 06 Jul 2007 09:00:32 -0400
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To: <AE1AF127-4D2A-458C-8EF0-F3D8D3B239E5@uiuc.edu>
References: <468B6FBF.1070708@sendu.me.uk>
	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>
	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>
	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>
	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>
	<468CA721.4020804@sheffield.ac.uk>
	<21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net>
	<B5011E60-D686-4E71-A1AB-46990548ABD1@uiuc.edu>
	<FBB24EB4-7728-476E-98EA-77A53C870A89@wustl.edu>
	<BF212044-F565-434B-882F-507974566B66@uiuc.edu>
	<8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu>
	<AE1AF127-4D2A-458C-8EF0-F3D8D3B239E5@uiuc.edu>
Message-ID: <1183726832.2566.34.camel@localhost.localdomain>

On Thu, 2007-07-05 at 11:57 -0500, Chris Fields wrote:
> 
> I think we should just bite the bullet and proceed with pulling out  
> the controversial operator overloading in Bio::Annotation*, deprecate  
> the tag methods in AnnotatableI, and go about fixing everything up.   
> If that occurs (which seems to be the major impediment) and we get  
> GMOD/GBrowse playing well with BioPerl then we can aim for a new  
> stable release, and then institute a regular release cycle.
> 
This sounds like a good idea to me too.  I'm planning on having
a GMOD hackathon at the end of the summer; if I had a new API by then,
we could focus on fixing anything that gets broken by the changes.

Scott

-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From cjfields at uiuc.edu  Fri Jul  6 09:10:41 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 6 Jul 2007 08:10:41 -0500
Subject: [Bioperl-l] API Changes
In-Reply-To: <468DEA9D.6010809@sheffield.ac.uk>
References: <468B6FBF.1070708@sendu.me.uk>	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>	<468CA721.4020804@sheffield.ac.uk>	<21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net>	<B5011E60-D686-4E71-A1AB-46990548ABD1@uiuc.edu>	<FBB24EB4-7728-476E-98EA-77A53C870A89@wustl.edu>	<BF212044-F565-434B-882F-507974566B66@uiuc.edu>
	<8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu>
	<468DEA9D.6010809@sheffield.ac.uk>
Message-ID: <E0AE6402-AC6A-4C2A-BF5B-EB02670FEA62@uiuc.edu>


On Jul 6, 2007, at 2:09 AM, Nathan S. Haigh wrote:

> David Messina wrote:
>>> [Chris]
>>> The 1.5 releases I believe break some aspects of 1.4 API
>>>
>>
>> Yes, this is true.
>>
>> I question, though, whether it's relevant given that virtually no one
>> uses 1.4 anymore. In any case, I would venture that the number of
>> people who would be bitten by the 1.4->1.5 API change is much smaller
>> than the number of people who download 1.4 and then ask us why it
>> doesn't work.
>>
>
> I'm not really up to speed with how the API should remain stable. Is
> the idea that the API should be stable from 1.4 through the 1.5 dev
> series, and that the next stable release can then change that API? So
> any stable-to-stable upgrade could involve an API change, while a
> stable-to-dev upgrade should keep the same API? Does a stable API mean
> that the same method calls are available in a newer release? What about
> adding new methods to a newer release?
>
> How are these API changes currently tracked? It seems to me that
> Test::More might be able to help in testing the API:
>
> can_ok($module, @methods);
>
>
> Nath	

It's basically a 'contract' of sorts between the devs (us) and users  
(us/them) that the API won't change for the extent of that release  
series, thus ensuring any scripts out there generating tons of data  
won't break down if they attempt to call a renamed method.  We try to  
maintain the API state anyway for those reasons, but in a dev release  
series we might decide to change some method names for consistency  
and deprecate older ambiguously-named methods (see below).  For a  
stable release it's critical the API remain intact.

There are a few methods which are considered deprecated or will be  
deprecated.  For instance, we recently talked about changes to method  
names which use case to specify whether you're receiving an object  
(get_Foo) vs. data (get_foo), a list (get_Foos), a flattened vs.  
nested list, or whether to use each_* vs next_* for iterators.   
Consistency is nice!
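
As a rough illustration of how a rename can be made without breaking old scripts, the old name can stay as a thin wrapper that warns and delegates to the new one. This is a sketch with a made-up class (My::Collection) and made-up method names modelled on the get_Foos/each_Foo pattern above; it uses Carp from core Perl rather than any BioPerl-specific deprecation helper.

```perl
use strict;
use warnings;
use Carp ();

# Hypothetical class illustrating a rename: get_Foos() is the new name and
# each_Foo() stays as a thin wrapper that warns and then delegates.
{
    package My::Collection;
    sub new {
        my ($class, @foos) = @_;
        return bless { foos => [@foos] }, $class;
    }
    sub get_Foos { return @{ $_[0]{foos} } }
    sub each_Foo {
        my ($self, @args) = @_;
        Carp::carp('each_Foo() is deprecated; use get_Foos() instead');
        return $self->get_Foos(@args);
    }
}

my $collection = My::Collection->new(qw(x y z));
my @old = $collection->each_Foo;    # warns on STDERR, still returns the data
my @new = $collection->get_Foos;
print "old and new names agree\n" if "@old" eq "@new";
```

Scripts written against the old name keep working through the dev series, and the warning tells their authors what to change before the wrapper is eventually removed.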

chris 


From heikki at sanbi.ac.za  Fri Jul  6 09:20:26 2007
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Fri, 6 Jul 2007 15:20:26 +0200
Subject: [Bioperl-l] Bio::Variation::RNAChange
In-Reply-To: <468E3B89.3090202@sheffield.ac.uk>
References: <468E2316.1030804@sheffield.ac.uk>
	<468E3B89.3090202@sheffield.ac.uk>
Message-ID: <200707061520.27000.heikki@sanbi.ac.za>

Hi Nat,

These modules have not been touched for a while and were developed for a 
specific task. A review is definitely in order.

The way RNAChange->label was written, it should return 'inframe' when given no 
alleles, but 'no change' would actually be better.

The multiple alleles were originally thought to be a good idea, but the 
vocabulary for labels was developed for a single allele only. The use of the 
module ended up being limited to a single allele, so the add_Allele() behaviour 
was conveniently ignored but not removed. :(

	-Heikki


On Friday 06 July 2007 14:54:33 Nathan S. Haigh wrote:
> Nathan S. Haigh wrote:
> > I'm taking a look at the tests for Bio::Variation::RNAChange.
> >
> > If you create a new object without arguments:
> > my $obj = Bio::Variation::RNAChange->new();
> >
> > What do you expect the following to return:
> > $obj->label();
> >
> > I thought it would probably be:
> > 'inframe'
> >
> > However you get:
> > 'inframe, deletion'
> >
> > Can anyone in the know explain what behaviour would be expected?
> >
> > Cheers
> > Nath
>
> Following on from this, AAChange has the following two methods:
> add_Allele() and allele_mut()
>
> It appears that allele_mut() is only capable of remembering one allele at
> a time, whereas add_Allele() is provided to add support for multiple
> alleles - is that correct?
>
> However, add_Allele() also calls allele_mut(), such that multiple calls
> to add_Allele() will result in the overwriting of the allele being
> remembered by allele_mut(). Things are further complicated by the fact
> that label() uses allele_mut() to decide on the label to return.
> Shouldn't label() know about multiple alleles set by multiple calls to
> add_Allele()?
>
> It may be my lack of understanding of alleles and what these classes are
> intended to do, but trying to rewrite the test scripts to improve code
> coverage has left me a little confused!
>
> Thanks
> Nath


-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From schlesi at ebi.ac.uk  Fri Jul  6 10:24:05 2007
From: schlesi at ebi.ac.uk (Felix Schlesinger)
Date: Fri, 6 Jul 2007 15:24:05 +0100
Subject: [Bioperl-l] Unrooting a tree
Message-ID: <7317d50c0707060724g614ca3a2h8beceada040d3584@mail.gmail.com>

Hi,

I am reading a rooted tree in newick format from a string (i.e. a
bifurcation at the root) and would like to unroot it (i.e. a
trifurcation at the root). I tried getting a grandchild of the root
and adding it as a direct child, but that does not seem to work (the
root still only has two descendents and the tree structure gets messed
up). Is there a nice way to do this directly in bioperl? Doing it on
the newick string is possible of course, but not nice.
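
For reference, the operation itself can be sketched on plain nested hashes. This is deliberately not written against the Bio::Tree classes: the node layout (name, length, children) and the function names unroot and to_newick are assumptions for illustration, and the steps would still need mapping onto the real node objects.

```perl
use strict;
use warnings;

# Plain nested hashes, not Bio::Tree objects. Each node looks like
# { name => ..., length => ..., children => [ ... ] }.

# Collapse one internal child of a bifurcating root so that the root becomes
# a trifurcation. The removed branch length is added to the sibling branch,
# which keeps all leaf-to-leaf distances unchanged.
sub unroot {
    my ($root) = @_;
    my @kids = @{ $root->{children} || [] };
    return $root unless @kids == 2;
    my ($inner) = grep { @{ $_->{children} || [] } } @kids;
    return $root unless $inner;    # two leaves: nothing to collapse
    my ($sibling) = grep { $_ != $inner } @kids;
    $sibling->{length} = ($sibling->{length} || 0) + ($inner->{length} || 0);
    $root->{children} = [ @{ $inner->{children} }, $sibling ];
    return $root;
}

# Serialise a node and its descendants as a Newick fragment (no trailing ';').
sub to_newick {
    my ($node) = @_;
    my @kids = @{ $node->{children} || [] };
    my $s = @kids ? '(' . join(',', map { to_newick($_) } @kids) . ')' : '';
    $s .= $node->{name}         if defined $node->{name};
    $s .= ':' . $node->{length} if defined $node->{length};
    return $s;
}

my $tree = {
    children => [
        {
            length   => 0.5,
            children => [
                { name => 'A', length => 1 },
                { name => 'B', length => 1 },
            ],
        },
        { name => 'C', length => 2 },
    ],
};
print to_newick(unroot($tree)), ";\n";
```

The point of the sketch is that the collapsed node's children replace it under the root in a single step. Adding one grandchild while leaving the old internal node in place would give a different, non-equivalent shape.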

Thanks
  Felix


From n.haigh at sheffield.ac.uk  Fri Jul  6 11:37:19 2007
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Fri, 06 Jul 2007 16:37:19 +0100
Subject: [Bioperl-l] API Changes
In-Reply-To: <E0AE6402-AC6A-4C2A-BF5B-EB02670FEA62@uiuc.edu>
References: <468B6FBF.1070708@sendu.me.uk>	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>	<468CA721.4020804@sheffield.ac.uk>	<21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net>	<B5011E60-D686-4E71-A1AB-46990548ABD1@uiuc.edu>	<FBB24EB4-7728-476E-98EA-77A53C870A89@wustl.edu>	<BF212044-F565-434B-882F-507974566B66@uiuc.edu>
	<8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu>
	<468DEA9D.6010809@sheffield.ac.uk>
	<E0AE6402-AC6A-4C2A-BF5B-EB02670FEA62@uiuc.edu>
Message-ID: <468E61AF.9040106@sheffield.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Chris Fields wrote:
> 
> On Jul 6, 2007, at 2:09 AM, Nathan S. Haigh wrote:
> 
>> David Messina wrote:
>>>> [Chris]
>>>> The 1.5 releases I believe break some aspects of 1.4 API
>>>>
>>>
>>> Yes, this is true.
>>>
>>> I question, though, whether it's relevant given that virtually no one
>>> uses 1.4 anymore. In any case, I would venture that the number of
>>> people who would be bitten by the 1.4->1.5 API change is much smaller
>>> than the number of people who download 1.4 and then ask us why it
>>> doesn't work.
>>>
>>
>> I'm not really up to speed with how the API should remain stable. Is
>> the idea that the API should be stable from 1.4 through the 1.5 dev
>> series, and that the next stable release can then change that API? So
>> any stable-to-stable upgrade could involve an API change, while a
>> stable-to-dev upgrade should keep the same API? Does a stable API mean
>> that the same method calls are available in a newer release? What about
>> adding new methods to a newer release?
>>
>> How are these API changes currently tracked? It seems to me that
>> Test::More might be able to help in testing the API:
>>
>> can_ok($module, @methods);
>>
>>
>> Nath   
> 
> It's basically a 'contract' of sorts between the devs (us) and users
> (us/them) that the API won't change for the extent of that release
> series, thus ensuring any scripts out there generating tons of data
> won't break down if they attempt to call a renamed method.  We try to
> maintain the API state anyway for those reasons, but in a dev release
> series we might decide to change some method names for consistency and
> deprecate older ambiguously-named methods (see below).  For a stable
> release it's critical the API remain intact.

Hmm, still not 100% clear - it is Friday!

So, someone running a script that was designed when 1.4 was released
should still be able to run their script for all future releases. So all
changes need to be backward compatible?

So you have several situations regarding method names:
1) Adding new methods should be fine since past scripts don't know about
them and won't have used them
2) Removing methods would break past scripts that used them
3) Renamed methods would break past scripts that used the old name

A stable API to me, means the same method calls should still be able to
accept the same arguments (inc the constructor) and return the same
object/data etc.

What if a module is pretty outdated and would benefit from a rewrite -
should all the old method names be included, what if this makes coding
difficult?

> 
> There are a few methods which are considered deprecated or will be
> deprecated.  For instance, we recently talked about changes to method
> names which use case to specify whether you're receiving an object
> (get_Foo) vs. data (get_foo), a list (get_Foos), a flattened vs. nested
> list, or whether to use each_* vs next_* for iterators.  Consistency is
> nice!
> 

You mean the use of case to signify objects vs data being returned is
to be deprecated or encouraged? What was the outcome of the each_* vs
next_* discussion?

Nath


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGjmGvczuW2jkwy2gRAkGeAKDBXVSBvN0b39xbK1+2RLed35knSQCgz3pk
kAWH1zVa1ycopijl761cvkQ=
=fppH
-----END PGP SIGNATURE-----


From n.haigh at sheffield.ac.uk  Fri Jul  6 11:43:41 2007
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Fri, 06 Jul 2007 16:43:41 +0100
Subject: [Bioperl-l] Bio::Variation::RNAChange
In-Reply-To: <200707061520.27000.heikki@sanbi.ac.za>
References: <468E2316.1030804@sheffield.ac.uk>
	<468E3B89.3090202@sheffield.ac.uk>
	<200707061520.27000.heikki@sanbi.ac.za>
Message-ID: <468E632D.4090801@sheffield.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Heikki Lehvaslaiho wrote:
> Hi Nat,
> 
> These modules have not been touched for a while and were developed for a 
> specific task. A review is definitely in order.
> 
> The way RNAChange->label was written, it should return 'inframe' when given no 
> alleles, but 'no change' would actually be better.

Wouldn't this effectively be changing the API, since past scripts could
expect 'inframe' to be returned?

> 
> The multiple alleles were originally thought to be a good idea, but the 
> vocabulary for labels was developed for a single allele only. The use of the 
> module ended up being limited to a single allele, so the add_Allele() behaviour 
> was conveniently ignored but not removed. :(

So add_Allele() and each_Allele() should be deprecated in favour of
allele_mut()?

Following on from my post about APIs: how should the capitalisation of
add_Allele() and each_Allele() be changed?

Cheers
Nath


> 
> 	-Heikki
> 
> 
> 
> On Friday 06 July 2007 14:54:33 Nathan S. Haigh wrote:
>> Nathan S. Haigh wrote:
>>> I'm taking a look at the tests for Bio::Variation::RNAChange.
>>>
>>> If you create a new object without arguments:
>>> my $obj = Bio::Variation::RNAChange->new();
>>>
>>> What do you expect the following to return:
>>> $obj->label();
>>>
>>> I thought it would probably be:
>>> 'inframe'
>>>
>>> However you get:
>>> 'inframe, deletion'
>>>
>>> Can anyone in the know explain what behaviour would be expected?
>>>
>>> Cheers
>>> Nath
>> Following on from this, AAChange has the following two methods:
>> add_Allele() and allele_mut()
>>
>> It appears that allele_mut is only capable of remembering 1 allele at a
>> time, whereas add_Allele() is provided to add support for multiple
>> alleles - is that correct?
>>
>> However, add_Allele() also calls allele_mut(), such that multiple calls
>> to add_Allele will result in the overwriting of the allele being
>> remembered by allele_mut(). Things are further complicated by the fact
>> that label() uses allele_mut() to decide on the label to return.
>> Shouldn't label know about multiple alleles set by multiple calls to
>> add_Allele?
>>
>> It may be my lack of understanding alleles and what these classes are
>> intending to do, but trying to rewrite the test scripts to improve code
>> coverage has left me a little confused!
>>
>> Thanks
>> Nath
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> 
> 

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGjmMtczuW2jkwy2gRAgQHAKC+S5mVh4lqR95NmgR6z+aU9br5lQCfc6ue
GBHuSHfsesX1ko55s+ME2Zc=
=tkG8
-----END PGP SIGNATURE-----


From cjfields at uiuc.edu  Sat Jul  7 16:57:37 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 7 Jul 2007 15:57:37 -0500
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To: <1183726832.2566.34.camel@localhost.localdomain>
References: <468B6FBF.1070708@sendu.me.uk>
	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>
	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>
	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>
	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>
	<468CA721.4020804@sheffield.ac.uk>
	<21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net>
	<B5011E60-D686-4E71-A1AB-46990548ABD1@uiuc.edu>
	<FBB24EB4-7728-476E-98EA-77A53C870A89@wustl.edu>
	<BF212044-F565-434B-882F-507974566B66@uiuc.edu>
	<8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu>
	<AE1AF127-4D2A-458C-8EF0-F3D8D3B239E5@uiuc.edu>
	<1183726832.2566.34.camel@localhost.localdomain>
Message-ID: <198D3F24-8510-453D-9201-21F2CCEC3519@uiuc.edu>

We'll prob. get a start soon, then.  I'll let you know when we start.

chris

On Jul 6, 2007, at 8:00 AM, Scott Cain wrote:

> On Thu, 2007-07-05 at 11:57 -0500, Chris Fields wrote:
>>
>> I think we should just bite the bullet and proceed with pulling out
>> the controversial operator overloading in Bio::Annotation*, deprecate
>> the tag methods in AnnotatableI, and go about fixing everything up.
>> If that occurs (which seems to be the major impediment) and we get
>> GMOD/GBrowse playing well with BioPerl then we can aim for a new
>> stable release, and then institute a regular release cycle.
>>
> I think this sounds like a good idea to me too.  I'm planning on  
> having
> a GMOD hackathon at the end of the summer; if I had a new API by then,
> we could focus on fixing anything that gets broken by the changes.
>
> Scott
>
> -- 
> ---------------------------------------------------------------------- 
> --
> Scott Cain, Ph. D.                                          
> cain at cshl.edu
> GMOD Coordinator (http://www.gmod.org/)                      
> 216-392-3087
> Cold Spring Harbor Laboratory
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Sat Jul  7 17:17:14 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 7 Jul 2007 16:17:14 -0500
Subject: [Bioperl-l] API Changes
In-Reply-To: <468E61AF.9040106@sheffield.ac.uk>
References: <468B6FBF.1070708@sendu.me.uk>	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>	<468CA721.4020804@sheffield.ac.uk>	<21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net>	<B5011E60-D686-4E71-A1AB-46990548ABD1@uiuc.edu>	<FBB24EB4-7728-476E-98EA-77A53C870A89@wustl.edu>	<BF212044-F565-434B-882F-507974566B66@uiuc.edu>
	<8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu>
	<468DEA9D.6010809@sheffield.ac.uk>
	<E0AE6402-AC6A-4C2A-BF5B-EB02670FEA62@uiuc.edu>
	<468E61AF.9040106@sheffield.ac.uk>
Message-ID: <369F72D5-E5A3-4A33-BDEC-D462A339474F@uiuc.edu>


On Jul 6, 2007, at 10:37 AM, Nathan S. Haigh wrote:

> ...
> Hmm, still not 100% clear - it is Friday!
>
> So, someone running a script that was designed when 1.4 was released
> should still be able to run their script for all future releases.  
> So all
> changes need to be backward compatible?

It helps.  For instance, if we change method names (rename each_Foo  
as next_Foo), we should have each_Foo delegate to next_Foo for the  
time being.  If we plan on deprecating the old method altogether we  
would add a warning message when it's called, then delegate.

It's a better solution than just changing the method outright, which  
means the user has to search through docs to find the renamed method.
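
A minimal sketch of that warn-then-delegate pattern, assuming a made-up
class (My::Container is hypothetical, not a BioPerl module) and plain
warn() standing in for whatever warning mechanism the real module uses:

```perl
use strict;
use warnings;

# Hypothetical class for illustration only; not a real BioPerl module.
package My::Container;

sub new {
    my ($class, @items) = @_;
    return bless { items => [@items], pos => 0 }, $class;
}

# New-style iterator: returns the next item, or nothing when exhausted.
sub next_Foo {
    my ($self) = @_;
    return if $self->{pos} >= @{ $self->{items} };
    return $self->{items}[ $self->{pos}++ ];
}

# Old name kept as a deprecated alias: warn first, then delegate.
sub each_Foo {
    my ($self, @args) = @_;
    warn "each_Foo() is deprecated, use next_Foo() instead\n";
    return $self->next_Foo(@args);
}

package main;

my $c = My::Container->new(qw(a b));
print $c->each_Foo, "\n";   # warns on STDERR, then prints "a"
print $c->next_Foo, "\n";   # prints "b"
```

Old scripts calling each_Foo keep working and are told what to switch to;
once the old name is finally removed, only the alias sub has to go.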

> So you have several situations regarding method names:
> 1) Adding new methods should e fine since past scripts don't know  
> about
> them and won't have used them
> 2) Removing methods would break past scripts that used them
> 3) Renamed methods would break past scripts that used the old name
>
> A stable API to me, means the same method calls should still be  
> able to
> accept the same arguments (inc the constructor) and return the same
> object/data etc.

Yes.

> What if a module is pretty outdated and would benefit from a rewrite -
> should all the old method names be included, what if this makes coding
> difficult?

It depends on the module.  If a complete rewrite is needed then maybe  
starting with a new module/interface is best, and we could deprecate  
the older module completely.  That has been done already with  
Bio::Tools::BPLite (in favor of SearchIO) and a few other modules.

>> There are a few methods which are considered deprecated or will be
>> deprecated.  For instance, we recently talked about changes to method
>> names which use case to specify whether you're receiving an object
>> (get_Foo) vs. data (get_foo), a list (get_Foos), a flattened vs.  
>> nested
>> list, or whether to use each_* vs next_* for iterators.   
>> Consistency is
>> nice!
>>
>
> You mean the use of case to signify objects vs data being returned are
> to be deprecated or encouraged? What was the outcome of the each_* vs
> next_*?
>
> Nath

Here's the section I added to the wiki (it started in a thread a few  
weeks or so ago, so it's a summary really):

http://www.bioperl.org/wiki/Advanced_BioPerl#Method_names

Feel free to add to it or make suggestions.

BTW, Hilmar mentioned there was a movement to rename methods in old
code to follow these recs, but it was never completed.  It should be
taken up again at some point, but the recommendations are mainly here
for newer code.

chris


From heikki at sanbi.ac.za  Sun Jul  8 03:32:21 2007
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Sun, 8 Jul 2007 09:32:21 +0200
Subject: [Bioperl-l] Bio::Variation::RNAChange
In-Reply-To: <468E632D.4090801@sheffield.ac.uk>
References: <468E2316.1030804@sheffield.ac.uk>
	<200707061520.27000.heikki@sanbi.ac.za>
	<468E632D.4090801@sheffield.ac.uk>
Message-ID: <200707080932.21818.heikki@sanbi.ac.za>

On Friday 06 July 2007 17:43:41 Nathan S. Haigh wrote:
> Heikki Lehvaslaiho wrote:
> > Hi Nat,
> >
> > These modules have not been touched for a while and were developed for a
> > specific task. A review is definitely in order.
> >
> > The way RNAChange->label was written, it should return 'inframe' when
> > given no alleles, but 'no change' would actually be better.
>
> Wouldn't this effectively be changing the API since past scripts "could"
> expect "inframe" to be returned.

Checking the actual usage and what happens when you change a nucleotide
to itself, you get the label 'silent'. I guess that would be a valid label
value even when the alleles are not initialised.

> > The multiple alleles were originally thought to be a good idea, but the
> > vocabulary for labels was developed for a single allele only. The use of
> > the module ended up being limited to a single allele, so add_allele()
> > behaviour was conveniently ignored but not removed. :(
>
> So add_Allele() and each_Allele() should be deprecated in favour of
> allele_mut()?

Yes.

> From my post about API's.....how should the capitalisation of
> add_Allele() and each_Allele() be changed?

Definitely keep the current ones as deprecated alternatives.


    -Heikki

> Cheers
> Nath
>
> > 	-Heikki
> >
> > On Friday 06 July 2007 14:54:33 Nathan S. Haigh wrote:
> >> Nathan S. Haigh wrote:
> >>> I'm taking a look at the tests for Bio::Variation::RNAChange.
> >>>
> >>> If you create a new object without arguments:
> >>> my $obj = Bio::Variation::RNAChange->new();
> >>>
> >>> What do you expect the following to return:
> >>> $obj->label();
> >>>
> >>> I thought it would probably be:
> >>> 'inframe'
> >>>
> >>> However you get:
> >>> 'inframe, deletion'
> >>>
> >>> Can anyone in the know explain what behaviour would be expected?
> >>>
> >>> Cheers
> >>> Nath
> >>
> >> Following on from this, AAChange has the following two methods:
> >> add_Allele() and allele_mut()
> >>
> >> It appears that allele_mut is only capable of remembering 1 allele at a
> >> time, whereas add_Allele() is provided to add support for multiple
> >> alleles - is that correct?
> >>
> >> However, add_Allele() also calls allele_mut(), such that multiple calls
> >> to add_Allele will result in the overwriting of the allele being
> >> remembered by allele_mut(). Things are further complicated by the fact
> >> that label() uses allele_mut() to decide on the label to return.
> >> Shouldn't label know about multiple alleles set by multiple calls to
> >> add_Allele?
> >>
> >> It may be my lack of understanding alleles and what these classes are
> >> intending to do, but trying to rewrite the test scripts to improve code
> >> coverage has left me a little confused!
> >>
> >> Thanks
> >> Nath
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l


-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From xing.y.hu at gmail.com  Mon Jul  9 02:26:40 2007
From: xing.y.hu at gmail.com (Xing Hu)
Date: Mon, 09 Jul 2007 14:26:40 +0800
Subject: [Bioperl-l] How to download EST files via bioperl script?
Message-ID: <4691D520.60700@gmail.com>

Hi friends,

    I wrote a script to fetch genomic sequence files from GenBank. For
that I used the Bio::DB::GenBank module to get the sequences via
get_Seq_by_acc, and it works well. But this time, facing an enormous number
of ESTs, I have no idea how to download them swiftly and elegantly.

    PROBLEM DESCRIPTION:
    goal: download all EST files of a specific species from GenBank, say
Arabidopsis thaliana or Oryza sativa (rice).
    other: whether the ESTs are all in a single file or stored separately
does not matter.

    Can I use a bioperl script to achieve that? And how? I would really
appreciate any help.

Xing.


From akozik at atgc.org  Mon Jul  9 08:25:14 2007
From: akozik at atgc.org (Alexander Kozik)
Date: Mon, 09 Jul 2007 05:25:14 -0700
Subject: [Bioperl-l] How to download EST files via bioperl script?
In-Reply-To: <4691D520.60700@gmail.com>
References: <4691D520.60700@gmail.com>
Message-ID: <4692292A.1080900@atgc.org>

To download genomic sequences or ESTs for any organism (in various 
formats) you can use NCBI Taxonomy Browser:
http://www.ncbi.nlm.nih.gov/Taxonomy/

you can use taxonomy id to access different organisms, Arabidopsis for 
example (3702):
http://www.ncbi.nlm.nih.gov/sites/entrez?db=Nucleotide&cmd=Search&dopt=DocSum&term=txid3702

or by direct web link:
http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Arabidopsis+thaliana&lvl=0&srchmode=1

assembled genomes can be accessed via ftp:
ftp://ftp.ncbi.nih.gov/genomes/

To download a large number of selected sequences (ESTs for example) you 
can use batch Entrez:
http://www.ncbi.nlm.nih.gov/entrez/query/static/advancedentrez.html
http://www.ncbi.nlm.nih.gov/entrez/batchentrez.cgi?db=Nucleotide
(select EST for EST, it's critical)

It seems, to solve the problem you describe, you don't need to use 
bioperl. NCBI GenBank Entrez provides all necessary tools to work on 
these simple and frequent tasks.

-Alex

-- 
Alexander Kozik
Bioinformatics Specialist
Genome and Biomedical Sciences Facility
451 East Health Sciences Drive
University of California
Davis, CA 95616-8816
Phone: (530) 754-9127
email#1: akozik at atgc.org
email#2: akozik at gmail.com
web: http://www.atgc.org/


Xing Hu wrote:
> Hi friends,
> 
>     I wrote a script for getting genomic sequence file from GenBank. To 
> fulfill that target, I used DB::GenBank module to get the sequence via 
> get_Seq_by_acc, and it works well. But this time, facing enormous amount 
> of ESTs, I have no idea how to download them swiftly and elegantly.
> 
>     PROBLEM DESCRIPTION:
>     goal: download all EST files of a specific species from GenBank, say 
> Arabidopsis Thaliana or Oryza sativa(rice).
>     other: whether all of ESTs are in a single file or separatedly 
> placed does not matter.
> 
>     Can I use a bioperl script to achieve that? And How? I really 
> appreciate.
> 
> Xing.
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at uiuc.edu  Mon Jul  9 10:17:23 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 9 Jul 2007 09:17:23 -0500
Subject: [Bioperl-l] How to download EST files via bioperl script?
In-Reply-To: <4692292A.1080900@atgc.org>
References: <4691D520.60700@gmail.com> <4692292A.1080900@atgc.org>
Message-ID: <7CC3390F-A5F5-4832-9E3E-E2B96018D31D@uiuc.edu>

Caveat: if you have millions of ESTs please consider NOT using my  
eutil script below or NCBI Batch Entrez, which would repeatedly hit  
the NCBI server thousands of times.  At least try looking for other  
ways to retrieve the data you want (ftp, organism-specific resources
like Ensembl, and so on), or run any scripts or data retrieval in off
hours so you don't overtax the NCBI server.

There is a way you can use BioPerl if you don't mind living on the  
bleeding edge by using bioperl-live (core code from CVS).  I have  
been working on a set of modules for the last year
(Bio::DB::EUtilities) which interact with the various eutils through
the NCBI CGI interface and can be used to build data pipelines.  You
could possibly retrieve all relevant ESTs using a variation of the
example script here:

http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook#esearch-.3Eefetch

Note that the code examples do NOT work with rel. 1.5.2 code as the  
API has changed quite a bit; I'm working to rectify some of that.

The script I would use is below.  It retrieves batches of 500  
sequences (in fasta format) at a time, for a total of 10000 max seq  
records, saving the raw record data directly to a file (appending as  
you go along).  I added an eval block to check the server status and  
redo the call up to 4 times before giving up completely.  Using eval  
this way hasn't been extensively tested but should work.

---------------------------------------

use Bio::DB::EUtilities;

my $factory = Bio::DB::EUtilities->new(-eutil => 'esearch',
                                        -db => 'nucest',
                                        -term => 'txid3702',
                                        -usehistory => 'y',
                                        -keep_histories => 1);

my $count = $factory->get_count;

print "Count: $count\n";

if (my $hist = $factory->next_History) {
     print "History returned\n";
     # note db carries over from above
     $factory->set_parameters(-eutil => 'efetch',
                              -rettype => 'fasta',
                              -history => $hist);
     my ($retmax, $retstart) = (500,0);
     my $retry = 1;
     # set max # seq records to return
     my $maxcount = $count < 10000 ? $count : 10000;
     RETRIEVE_SEQS:
     while ($retstart < $maxcount) {
         print "Returning from ",$retstart+1," to ",
               $retstart+$retmax,"\n";
         $factory->set_parameters(-retmax => $retmax,
                                 -retstart => $retstart);
         # check in case of server error
         eval{
             $factory->get_Response(-file => ">>ESTs.fas");
         };
         if ($@) {
             die "Server error: $@.  Try again later" if $retry == 5;
             print STDERR "Server error, redo #$retry\n";
             $retry++ && redo RETRIEVE_SEQS;
         }
         $retstart += $retmax;
     }
}


---------------------------------------
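
The batching and retry logic of the script above can also be looked at
in isolation. The following is only a sketch under stated assumptions:
fetch_in_batches and its named arguments are made up for illustration,
and the network call is replaced by a caller-supplied code ref, so
nothing here touches Bio::DB::EUtilities or NCBI. Unlike the script
above, the retry counter is reset for each window.

```perl
use strict;
use warnings;

# Request records in windows of $retmax, retrying a failed window.
# $fetch is a caller-supplied code ref (a hypothetical stand-in for the
# network call); it receives ($retstart, $retmax) and dies on failure.
sub fetch_in_batches {
    my (%arg) = @_;
    my ($count, $retmax, $limit, $fetch) = @arg{qw(count retmax limit fetch)};
    my $max_tries = $arg{max_tries} || 5;

    # Never ask for more than $limit records in total.
    my $maxcount = $count < $limit ? $count : $limit;
    my $retstart = 0;
    while ($retstart < $maxcount) {
        my $tries = 0;
        while (1) {
            $tries++;
            # eval returns 1 only if the fetch did not die.
            last if eval { $fetch->($retstart, $retmax); 1 };
            die "Giving up after $tries tries: $@" if $tries >= $max_tries;
            warn "Server error, retry #$tries\n";
        }
        $retstart += $retmax;
    }
    return $retstart;   # first offset past the last window requested
}

# Demo with a fake fetcher that fails once on the second window.
my $failed_once = 0;
my @windows;
fetch_in_batches(
    count  => 1200,
    retmax => 500,
    limit  => 10000,
    fetch  => sub {
        my ($start, $max) = @_;
        die "timeout\n" if $start == 500 && !$failed_once++;
        push @windows, [$start, $max];
    },
);
print join(', ', map { "$_->[0]+$_->[1]" } @windows), "\n";
```

As in the script above, the last window may ask for more records than
remain; that is left to the server to handle.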


chris

On Jul 9, 2007, at 7:25 AM, Alexander Kozik wrote:

> To download genomic sequences or ESTs for any organism (in various
> formats) you can use NCBI Taxonomy Browser:
> http://www.ncbi.nlm.nih.gov/Taxonomy/
>
> you can use taxonomy id to access different organisms, Arabidopsis for
> example (3702):
> http://www.ncbi.nlm.nih.gov/sites/entrez? 
> db=Nucleotide&cmd=Search&dopt=DocSum&term=txid3702
>
> or by direct web link:
> http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi? 
> mode=Undef&name=Arabidopsis+thaliana&lvl=0&srchmode=1
>
> assembled genomes can be accessed via ftp:
> ftp://ftp.ncbi.nih.gov/genomes/
>
> To download large amount of selected sequences (ESTs for example) you
> can use batch Entrez:
> http://www.ncbi.nlm.nih.gov/entrez/query/static/advancedentrez.html
> http://www.ncbi.nlm.nih.gov/entrez/batchentrez.cgi?db=Nucleotide
> (select EST for EST, it's critical)
>
> It seems, to solve the problem you describe, you don't need to use
> bioperl. NCBI GenBank Entrez provides all necessary tools to work on
> these simple and frequent tasks.
>
> -Alex
>
> -- 
> Alexander Kozik
> Bioinformatics Specialist
> Genome and Biomedical Sciences Facility
> 451 East Health Sciences Drive
> University of California
> Davis, CA 95616-8816
> Phone: (530) 754-9127
> email#1: akozik at atgc.org
> email#2: akozik at gmail.com
> web: http://www.atgc.org/
>
>
>
> Xing Hu wrote:
>> Hi friends,
>>
>>     I wrote a script for getting genomic sequence file from  
>> GenBank. To
>> fulfill that target, I used DB::GenBank module to get the sequence  
>> via
>> get_Seq_by_acc, and it works well. But this time, facing enormous  
>> amount
>> of ESTs, I have no idea how to download them swiftly and elegantly.
>>
>>     PROBLEM DESCRIPTION:
>>     goal: download all EST files of a specific species from  
>> GenBank, say
>> Arabidopsis Thaliana or Oryza sativa(rice).
>>     other: whether all of ESTs are in a single file or separatedly
>> placed does not matter.
>>
>>     Can I use a bioperl script to achieve that? And How? I really
>> appreciate.
>>
>> Xing.
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From jason at bioperl.org  Mon Jul  9 14:08:07 2007
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 9 Jul 2007 11:08:07 -0700
Subject: [Bioperl-l] Unrooting a tree
In-Reply-To: <7317d50c0707060724g614ca3a2h8beceada040d3584@mail.gmail.com>
References: <7317d50c0707060724g614ca3a2h8beceada040d3584@mail.gmail.com>
Message-ID: <22B52D0B-D3C3-4EF8-98EC-8C2A9F267362@bioperl.org>

I don't think there is a function for this yet but it would be a good  
one to have.
I assume you don't really want to take a shot at writing it though?

To make this work I think you have to create a new node which  
contains the trifurcation and this node is what the root is set to.

-jason

On Jul 6, 2007, at 7:24 AM, Felix Schlesinger wrote:

> Hi,
>
> I am reading a rooted tree in newick format from a string (i.e. a
> bifurcation at the root) and would like to unroot it (i.e. a
> trifurcation at the root). I tried getting a grandchild of the root
> and adding it as a direct child, but that does not seem to work (the
> root still only has two descendents and the tree structure gets messed
> up). Is there a nice way to do this directly in bioperl? Doing it on
> the newick string is possible of course, but not nice.
>
> Thanks
>   Felix
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From lstein at cshl.edu  Mon Jul  9 17:35:49 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Mon, 9 Jul 2007 17:35:49 -0400
Subject: [Bioperl-l] JOB NOTICE: Looking for CSHL bioinformatics core manager
Message-ID: <6dce9a0b0707091435h3d134b05oa6f7da24839c24bb@mail.gmail.com>

Hi Folks,

Sorry for the job spam. We're looking for a manager of the Cold Spring
Harbor Laboratory bioinformatics core facility. This is a semi-independent
staff position supporting CSHL scientific researchers by providing
consultation, data mining and software development activities. You will have
a software staff of two, a nice salary, good health benefits, and an
exciting and dynamic environment to work in. I'm looking for someone with a
strong bioinformatics background, at least five years experience programming
Perl, Java or Python in an academic or commercial environment, and management
experience. If you are interested, please send your CV and cover letter to
me.

Thanks,

Lincoln

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu


From stewarta at nmrc.navy.mil  Mon Jul  9 18:16:12 2007
From: stewarta at nmrc.navy.mil (Andrew Stewart)
Date: Mon, 9 Jul 2007 18:16:12 -0400
Subject: [Bioperl-l] rpsblast
Message-ID: <9DF71DFB-F54E-4392-89E3-33345EC2DB36@nmrc.navy.mil>

When I run...   $result = $factory->rpsblast($seq);   ... where $seq
is a Bio::Seq object, it seems to simply copy the $seq object to
$result.  When I run something similar...   $result =
$factory->rpsblast('/path/to/myFile');   ... the value of $result then
becomes '/path/to/myFile'.

Anyone else encounter this?


--
Andrew Stewart
Research Assistant, Genomics Team
Navy Medical Research Center (NMRC)
Biological Defense Research Directorate (BDRD)
BDRD Annex
12300 Washington Avenue, 2nd Floor
Rockville, MD 20852

email: stewarta at nmrc.navy.mil
phone: 301-231-6700 Ext 270


From jason_stajich at berkeley.edu  Mon Jul  9 21:36:10 2007
From: jason_stajich at berkeley.edu (Jason Stajich)
Date: Mon, 9 Jul 2007 18:36:10 -0700
Subject: [Bioperl-l] BOSC2007
Message-ID: <E6F5077E-50A3-489E-94B0-109FCAE6200F@berkeley.edu>

I posted a quick note about meeting up at BOSC/ISMB this year. If you
are attending, please sign your name on the page or at least say
whether you are interested in a BoF.  We'll try to discuss some of the
current topics in BioPerl development, and use the time to coordinate
any development that benefits from face-to-face time.

http://bioperl.org/wiki/BOSC2007_Meetup
http://bioperl.org/news/2007/07/09/are-you-going-to-ismbbosc-2007/

-jason
--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html
http://fungalgenomes.org/


From schlesi at ebi.ac.uk  Tue Jul 10 08:58:00 2007
From: schlesi at ebi.ac.uk (Felix Schlesinger)
Date: Tue, 10 Jul 2007 13:58:00 +0100
Subject: [Bioperl-l] Unrooting a tree
In-Reply-To: <22B52D0B-D3C3-4EF8-98EC-8C2A9F267362@bioperl.org>
References: <7317d50c0707060724g614ca3a2h8beceada040d3584@mail.gmail.com>
	<22B52D0B-D3C3-4EF8-98EC-8C2A9F267362@bioperl.org>
Message-ID: <7317d50c0707100558m76853bf8s37ee1e8852835306@mail.gmail.com>

Hi,

>  I don't think there is a function for this yet but it would be a good one
> to have.
> I assume you don't really want to take a shot at writing it though?
> To make this work I think you have to create a new node which contains the
> trifurcation and this node is what the root is set to.

Creating a new root is fine, but what would the (3) children of that
node be? I took a different approach now, where I iterate over all
(indirect) descendents of the root, find the first one which does not
have the root as its direct ancestor and move it up the tree, i.e.

foreach my $d ($root->get_all_Descendents){
  if ($d->ancestor != $root){
    $d->ancestor->remove_Descendent($d);
    if ($root->add_Descendent($d, 1) == 3){
      last;
    }
  }
}

This will make the old root a trifurcation. It does the right thing
for what I am trying to do, but I believe it is not general (for
example, at the moment it does not worry about branch lengths). Also,
instead of taking the first, moving the most distant possible subtree
of a clade up to the root might be better.
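
For comparison, here is a sketch of a different strategy that does keep
track of branch lengths: collapse one internal child of the root, move
its children up, and add its branch length to its sibling's branch. It
uses plain nested hashes rather than Bio::Tree objects, so the node
layout and the names unroot and to_newick are assumptions made for
illustration only.

```perl
use strict;
use warnings;

# Plain-hash tree nodes (hypothetical layout, not Bio::Tree::Node):
#   { name => ..., length => branch length to parent, children => [...] }
sub unroot {
    my ($root) = @_;
    my @kids = @{ $root->{children} || [] };
    return $root unless @kids == 2;          # root is not a bifurcation

    # Pick the first child that is itself an internal node.
    my ($inner) = grep { @{ $_->{children} || [] } > 0 } @kids;
    return $root unless $inner;              # two-leaf tree: nothing to do
    my ($other) = grep { $_ != $inner } @kids;

    # Merge the collapsed node's branch into its sibling's branch, so the
    # distance between the two former subtrees is preserved.
    $other->{length} = ($other->{length} || 0) + ($inner->{length} || 0);

    # The grandchildren move up and become direct children of the root.
    $root->{children} = [ @{ $inner->{children} }, $other ];
    return $root;
}

# Serialise a node and its subtree as newick (without the trailing ';').
sub to_newick {
    my ($node) = @_;
    my $s = '';
    if (my @kids = @{ $node->{children} || [] }) {
        $s = '(' . join(',', map { to_newick($_) } @kids) . ')';
    }
    $s .= $node->{name} if defined $node->{name};
    $s .= ':' . $node->{length} if defined $node->{length};
    return $s;
}

# ((A:1,B:2):3,C:4);  becomes  (A:1,B:2,C:7);
my $tree = {
    children => [
        { length => 3, children => [
            { name => 'A', length => 1 },
            { name => 'B', length => 2 },
        ] },
        { name => 'C', length => 4 },
    ],
};
print to_newick(unroot($tree)), ";\n";
```

The same idea should carry over to real tree objects, but which node to
collapse when both children of the root are internal is a choice this
sketch makes arbitrarily (it takes the first one).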

Felix


> On Jul 6, 2007, at 7:24 AM, Felix Schlesinger wrote:
>
> Hi,
>
> I am reading a rooted tree in newick format from a string (i.e. a
> bifurcation at the root) and would like to unroot it (i.e. a
> trifurcation at the root). I tried getting a grandchild of the root
> and adding it as a direct child, but that does not seem to work (the
> root still only has two descendents and the tree structure gets messed
> up). Is there a nice way to do this directly in bioperl? Doing it on
> the newick string is possible of course, but not nice.
>
> Thanks
>   Felix
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> Jason Stajich
> jason at bioperl.org
> http://jason.open-bio.org/
>
>


From xing.y.hu at gmail.com  Tue Jul 10 09:29:36 2007
From: xing.y.hu at gmail.com (Xing Hu)
Date: Tue, 10 Jul 2007 21:29:36 +0800
Subject: [Bioperl-l] How to download EST files via bioperl script?
In-Reply-To: <7CC3390F-A5F5-4832-9E3E-E2B96018D31D@uiuc.edu>
References: <4691D520.60700@gmail.com> <4692292A.1080900@atgc.org>
	<7CC3390F-A5F5-4832-9E3E-E2B96018D31D@uiuc.edu>
Message-ID: <469389C0.5060303@gmail.com>

Thank you, guys.

I have to confess how stupid I was. The easiest way seems to be to use
the NCBI Taxonomy Browser, as suggested by Alex. As a matter of fact, I
knew about it, but I thought it was necessary to have all items selected
before pressing save to launch the download. So I was desperate to find
a button that could do that without hundreds of thousands of clicks by
me. "What about selecting none of the items at all?" -- This idea
finally came to me after days of struggling, and the problem was solved.

Xing


Chris Fields wrote:
> Caveat: if you have millions of ESTs please consider NOT using my 
> eutil script below or NCBI Batch Entrez, which would repeatedly hit 
> the NCBI server thousands of times.  At least try looking for other 
> ways to retrieve the data you want (ftp, organism-specific resources 
> like Ensembl, so on), or run any scripts or data retrieval in off 
> hours so you don't overtax the NCBI server.
>
> There is a way you can use BioPerl if you don't mind living on the 
> bleeding edge by using bioperl-live (core code from CVS).  I have been 
> working on a set of modules for the last year (Bio::DB::EUtilities) 
> which interact with all the various eutils for building data pipelines 
> which uses the NCBI CGI interface.  You could possibly retrieve all 
> relevant ESTs using a variation of the example script here:
>
> http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook#esearch-.3Eefetch
>
> Note that the code examples do NOT work with rel. 1.5.2 code as the 
> API has changed quite a bit; I'm working to rectify some of that.
>
> The script I would use is below.  It retrieves batches of 500 
> sequences (in fasta format) at a time, for a total of 10000 max seq 
> records, saving the raw record data directly to a file (appending as 
> you go along).  I added an eval block to check the server status and 
> redo the call up to 4 times before giving up completely.  Using eval 
> this way hasn't been extensively tested but should work.
>
> ---------------------------------------
>
> use Bio::DB::EUtilities;
>
> my $factory = Bio::DB::EUtilities->new(-eutil => 'esearch',
>                                        -db => 'nucest',
>                                        -term => 'txid3702',
>                                        -usehistory => 'y',
>                                        -keep_histories => 1);
>
> my $count = $factory->get_count;
>
> print "Count: $count\n";
>
> if (my $hist = $factory->next_History) {
>     print "History returned\n";
>     # note db carries over from above
>     $factory->set_parameters(-eutil => 'efetch',
>                              -rettype => 'fasta',
>                              -history => $hist);
>     my ($retmax, $retstart) = (500,0);
>     my $retry = 1;
>     # set max # seq records to return
>     my $maxcount = $count < 10000 ? $count : 10000;
>     RETRIEVE_SEQS:
>     while ($retstart < $maxcount) {
>         print "Returning from ",$retstart+1," to ",
>               $retstart+$retmax,"\n";
>         $factory->set_parameters(-retmax => $retmax,
>                                 -retstart => $retstart);
>         # check in case of server error
>         eval{
>             $factory->get_Response(-file => ">>ESTs.fas");
>         };
>         if ($@) {
>             die "Server error: $@.  Try again later" if $retry == 5;
>             print STDERR "Server error, redo #$retry\n";
>             $retry++ && redo RETRIEVE_SEQS;
>         }
>         $retstart += $retmax;
>     }
> }
>
>
> ---------------------------------------
>
>
> chris
>
> On Jul 9, 2007, at 7:25 AM, Alexander Kozik wrote:
>
>> To download genomic sequences or ESTs for any organism (in various
>> formats) you can use NCBI Taxonomy Browser:
>> http://www.ncbi.nlm.nih.gov/Taxonomy/
>>
>> you can use taxonomy id to access different organisms, Arabidopsis for
>> example (3702):
>> http://www.ncbi.nlm.nih.gov/sites/entrez?db=Nucleotide&cmd=Search&dopt=DocSum&term=txid3702 
>>
>>
>> or by direct web link:
>> http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Arabidopsis+thaliana&lvl=0&srchmode=1 
>>
>>
>> assembled genomes can be accessed via ftp:
>> ftp://ftp.ncbi.nih.gov/genomes/
>>
>> To download large amount of selected sequences (ESTs for example) you
>> can use batch Entrez:
>> http://www.ncbi.nlm.nih.gov/entrez/query/static/advancedentrez.html
>> http://www.ncbi.nlm.nih.gov/entrez/batchentrez.cgi?db=Nucleotide
>> (select EST for EST, it's critical)
>>
>> It seems, to solve the problem you describe, you don't need to use
>> bioperl. NCBI GenBank Entrez provides all necessary tools to work on
>> these simple and frequent tasks.
>>
>> -Alex
>>
>> --Alexander Kozik
>> Bioinformatics Specialist
>> Genome and Biomedical Sciences Facility
>> 451 East Health Sciences Drive
>> University of California
>> Davis, CA 95616-8816
>> Phone: (530) 754-9127
>> email#1: akozik at atgc.org
>> email#2: akozik at gmail.com
>> web: http://www.atgc.org/
>>
>>
>>
>> Xing Hu wrote:
>>> Hi friends,
>>>
>>>     I wrote a script for getting genomic sequence files from GenBank. To
>>> do that, I used the Bio::DB::GenBank module to fetch each sequence via
>>> get_Seq_by_acc, and it works well. But this time, facing an enormous
>>> number of ESTs, I have no idea how to download them swiftly and elegantly.
>>>
>>>     PROBLEM DESCRIPTION:
>>>     goal: download all EST files of a specific species from GenBank,
>>> say Arabidopsis thaliana or Oryza sativa (rice).
>>>     other: whether all of the ESTs are in a single file or in separate
>>> files does not matter.
>>>
>>>     Can I use a bioperl script to achieve that? And how? I would really
>>> appreciate any help.
>>>
>>> Xing.
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
>


From davila at ioc.fiocruz.br  Tue Jul 10 09:58:29 2007
From: davila at ioc.fiocruz.br (Alberto Davila)
Date: Tue, 10 Jul 2007 10:58:29 -0300
Subject: [Bioperl-l] How to download EST files via bioperl script?
In-Reply-To: <469389C0.5060303@gmail.com>
References: <4691D520.60700@gmail.com>
	<4692292A.1080900@atgc.org>	<7CC3390F-A5F5-4832-9E3E-E2B96018D31D@uiuc.edu>
	<469389C0.5060303@gmail.com>
Message-ID: <46939085.40906@ioc.fiocruz.br>

Hi Xing,

Unfortunately that did not work for me... there are 5133 T. brucei ESTs 
(http://www.ncbi.nlm.nih.gov/sites/entrez?term=txid5691[Organism:exp]&cmd=Search&db=nucest&QueryKey=8) 
and 13971 from T. cruzi 
(http://www.ncbi.nlm.nih.gov/sites/entrez?term=txid5693[Organism:exp]&cmd=Search&db=nucest&QueryKey=11) 
that I cannot download at once in GenBank format... even when I select 
"GenBank" format in the Display menu I can only view and download 500 
ESTs at a time...

I also downloaded all ESTs from GenBank (a pity there are no subsets of 
them!) but merging them all generates a file bigger than 120GB to be 
processed...

I just asked Diogo (my student) to try the script sent by Chris 
Fields... so fingers crossed ;-)

Cheers, Alberto


Xing Hu wrote:
> Thank you, guys.
>
> I have to confess how stupid I was. The easiest way seems to be the 
> NCBI Taxonomy Browser, as suggested by Alex. As a matter of 
> fact, I knew about it, but I thought it was necessary to have all items 
> selected before pressing save to launch the download. So I was desperate to 
> find a button that could do that without hundreds of thousands of 
> clicks from me. "What about selecting none of the items at all?" -- This 
> idea finally came to me after days of struggling, and the problem was solved.
> 
> Xing
> 
> 
> 
> Chris Fields wrote:
>> Caveat: if you have millions of ESTs please consider NOT using my 
>> eutil script below or NCBI Batch Entrez, which would hit 
>> the NCBI server thousands of times.  At least try looking for other 
>> ways to retrieve the data you want (ftp, organism-specific resources 
>> like Ensembl, and so on), or run any scripts or data retrieval in off 
>> hours so you don't overtax the NCBI server.
>>
>> There is a way you can use BioPerl if you don't mind living on the 
>> bleeding edge by using bioperl-live (core code from CVS).  For the last 
>> year I have been working on a set of modules (Bio::DB::EUtilities) 
>> that interact with the various eutils through the NCBI CGI interface 
>> and can be used to build data pipelines.  You could possibly retrieve all 
>> relevant ESTs using a variation of the example script here:
>>
>> http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook#esearch-.3Eefetch
>>
>> Note that the code examples do NOT work with rel. 1.5.2 code as the 
>> API has changed quite a bit; I'm working to rectify some of that.
>>
>> The script I would use is below.  It retrieves batches of 500 
>> sequences (in fasta format) at a time, for a total of 10000 max seq 
>> records, saving the raw record data directly to a file (appending as 
>> you go along).  I added an eval block to check the server status and 
>> redo the call up to 4 times before giving up completely.  Using eval 
>> this way hasn't been extensively tested but should work.
>>
>> ---------------------------------------
>>
>> use Bio::DB::EUtilities;
>>
>> my $factory = Bio::DB::EUtilities->new(-eutil => 'esearch',
>>                                        -db => 'nucest',
>>                                        -term => 'txid3702',
>>                                        -usehistory => 'y',
>>                                        -keep_histories => 1);
>>
>> my $count = $factory->get_count;
>>
>> print "Count: $count\n";
>>
>> if (my $hist = $factory->next_History) {
>>     print "History returned\n";
>>     # note db carries over from above
>>     $factory->set_parameters(-eutil => 'efetch',
>>                              -rettype => 'fasta',
>>                              -history => $hist);
>>     my ($retmax, $retstart) = (500,0);
>>     my $retry = 1;
>>     my $maxcount = $count < 10000 ? $count : 10000; # set max # seq records to return
>>     RETRIEVE_SEQS:
>>     while ($retstart < $maxcount) {
>>         print "Returning from ",$retstart+1," to ",$retstart+$retmax,"\n";
>>         $factory->set_parameters(-retmax => $retmax,
>>                                 -retstart => $retstart);
>>         # check in case of server error
>>         eval{
>>             $factory->get_Response(-file => ">>ESTs.fas");
>>         };
>>         if ($@) {
>>             die "Server error: $@.  Try again later" if $retry == 5;
>>             print STDERR "Server error, redo #$retry\n";
>>             $retry++ && redo RETRIEVE_SEQS;
>>         }
>>         $retstart += $retmax;
>>     }
>> }
>>
>>
>> ---------------------------------------
>>
>>
>> chris
>>
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr. Robert Switzer
>> Dept of Biochemistry
>> University of Illinois Urbana-Champaign
>>
>>


From cjfields at uiuc.edu  Tue Jul 10 10:05:43 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 10 Jul 2007 09:05:43 -0500
Subject: [Bioperl-l] How to download EST files via bioperl script?
In-Reply-To: <46939085.40906@ioc.fiocruz.br>
References: <4691D520.60700@gmail.com>
	<4692292A.1080900@atgc.org>	<7CC3390F-A5F5-4832-9E3E-E2B96018D31D@uiuc.edu>
	<469389C0.5060303@gmail.com> <46939085.40906@ioc.fiocruz.br>
Message-ID: <2D7B714D-EE7C-467A-A0D9-0F9EB12CBA99@uiuc.edu>

Just make sure you're using the latest from CVS.  Let me know if it  
doesn't work and I'll look into it.

chris


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From diogoat at gmail.com  Tue Jul 10 10:15:20 2007
From: diogoat at gmail.com (Diogo Tschoeke)
Date: Tue, 10 Jul 2007 11:15:20 -0300
Subject: [Bioperl-l] How to download EST files via bioperl script?
In-Reply-To: <2D7B714D-EE7C-467A-A0D9-0F9EB12CBA99@uiuc.edu>
References: <4691D520.60700@gmail.com> <4692292A.1080900@atgc.org>
	<7CC3390F-A5F5-4832-9E3E-E2B96018D31D@uiuc.edu>
	<469389C0.5060303@gmail.com> <46939085.40906@ioc.fiocruz.br>
	<2D7B714D-EE7C-467A-A0D9-0F9EB12CBA99@uiuc.edu>
Message-ID: <638512560707100715y4a692566n24e438322c8d919d@mail.gmail.com>

Dear All,
I used the script below, and it works very well!
I only changed the query, and the script gave me the 5133 ESTs from T.
brucei.

#################################################################################
use strict;
use warnings;
use Bio::Seq;
use Bio::SeqIO;
use Bio::DB::GenBank;
use Bio::DB::Query::GenBank;

# Select T. brucei records from the EST division of GenBank.
# The query string must stay on one line.
my $query = Bio::DB::Query::GenBank->new(
    -query => 'gbdiv est[prop] AND Trypanosoma brucei [organism]',
    -db    => 'nucleotide',
);
my $gb    = Bio::DB::GenBank->new;
my $seqio = $gb->get_Stream_by_query($query);

# Records are written in GenBank format, so the file is named .gb;
# '>>' appends, so remove the file before re-running the script.
my $out = Bio::SeqIO->new(-format => 'genbank',
                          -file   => '>>Tbrucei.EST.gb');
while (my $seq = $seqio->next_seq) {
    $out->write_seq($seq);
}
####################################################################

Diogo Tschoeke/Fiocruz (Alberto`s Student)


From cjfields at uiuc.edu  Tue Jul 10 10:35:03 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 10 Jul 2007 09:35:03 -0500
Subject: [Bioperl-l] How to download EST files via bioperl script?
In-Reply-To: <638512560707100715y4a692566n24e438322c8d919d@mail.gmail.com>
References: <4691D520.60700@gmail.com> <4692292A.1080900@atgc.org>
	<7CC3390F-A5F5-4832-9E3E-E2B96018D31D@uiuc.edu>
	<469389C0.5060303@gmail.com> <46939085.40906@ioc.fiocruz.br>
	<2D7B714D-EE7C-467A-A0D9-0F9EB12CBA99@uiuc.edu>
	<638512560707100715y4a692566n24e438322c8d919d@mail.gmail.com>
Message-ID: <4D704A90-A88A-44A3-B514-E5031CBF288C@uiuc.edu>

That will work as well; the key difference between my example and  
this one is that the seq stream retrieved using Bio::DB::GenBank  
passes through Bio::SeqIO, while Bio::DB::EUtilities saves the raw seq  
record directly to a file (or a callback or HTTP::Response) so it can  
optionally be parsed later.

If you have problems with Bio::SeqIO you can always use  
Bio::DB::EUtilities to get around the issue until we resolve it.
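
To illustrate the "parse later" step, here is a minimal sketch, assuming
the raw FASTA records were already saved to a file such as ESTs.fas by
the EUtilities script from earlier in this thread. The helper name
summarize_fasta is made up for illustration and is not part of BioPerl;
the only BioPerl calls used are Bio::SeqIO->new, next_seq and length.

```perl
use strict;
use warnings;
use Bio::SeqIO;

# Read a previously saved raw FASTA file in a second, offline pass.
# summarize_fasta is a hypothetical helper name, not a BioPerl method.
sub summarize_fasta {
    my ($path) = @_;
    my $in = Bio::SeqIO->new(-file => $path, -format => 'fasta');
    my ($count, $total_len) = (0, 0);
    while (my $seq = $in->next_seq) {
        $count++;
        $total_len += $seq->length;
    }
    return ($count, $total_len);
}

# Run as: perl summarize_fasta.pl ESTs.fas
if (@ARGV && !caller()) {
    my ($count, $total_len) = summarize_fasta($ARGV[0]);
    print "$count records, $total_len bases in total\n";
}
```

Because the download and the parsing are separate passes, a parser
problem does not force you to hit the NCBI server again.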

chris


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From hartzell at alerce.com  Tue Jul 10 12:50:31 2007
From: hartzell at alerce.com (George Hartzell)
Date: Tue, 10 Jul 2007 12:50:31 -0400
Subject: [Bioperl-l] Take 2 of the new subversion repository.
In-Reply-To: <D8C71EF7-6E2E-498E-8638-373512ADE3EE@bioperl.org>
References: <18054.63942.316904.413911@almost.alerce.com>
	<D8C71EF7-6E2E-498E-8638-373512ADE3EE@bioperl.org>
Message-ID: <18067.47319.254632.538811@almost.alerce.com>

Jason Stajich writes:
 > [...]
 > Do you know how to have svn commit messages generate summary emails  
 > as well?

I've made a local installation of the SVN::Notify bits in my home
directory and set up its notification script.  If folks are happy with
it then I'll work on getting The Powers That Be to do a real install
and we'll use it for the real repository.

It's currently configured to include diffs inline in the message.  I
prefer them as an attachment, but the current configuration of the
bioperl-guts-l list stalls messages w/ attachments and requires admin
intervention.  I have a support@ request going on it and will change
it if/when we get the issue resolved.

So, to review:

   svn+ssh://dev.open-bio.org/home/hartzell/bioperl_take2/

is the top of the repository and

   svn co svn+ssh://dev.open-bio.org/home/hartzell/bioperl_take2/bioperl-live/trunk 

will get you the main branch of bioperl-live.

Remember that the repository is transient, don't put anything
important in there....

Have at it, but remember that the entire world will see your commit
messages.

g.


From xing.y.hu at gmail.com  Tue Jul 10 13:08:35 2007
From: xing.y.hu at gmail.com (Xing Hu)
Date: Wed, 11 Jul 2007 01:08:35 +0800
Subject: [Bioperl-l] How to download EST files via bioperl script?
In-Reply-To: <46939085.40906@ioc.fiocruz.br>
References: <4691D520.60700@gmail.com>	<4692292A.1080900@atgc.org>	<7CC3390F-A5F5-4832-9E3E-E2B96018D31D@uiuc.edu>	<469389C0.5060303@gmail.com>
	<46939085.40906@ioc.fiocruz.br>
Message-ID: <4693BD13.2070509@gmail.com>

Hi Alberto,

Yes, I know that the NCBI website only offers to show up to 500 
entries per page. However, I simply ignored that setting, pulled down 
the "send to" menu and chose "file". A small window popped up, and after 
I said yes to it, the download started. You might ask how I know that it 
was not a batch of only 5 (the default selection) or 500 ESTs. To be 
honest, I didn't know at first. But the download has grown to millions of 
bytes since then (due to my bad network connection, I have no idea when 
it will finish), and that doesn't look like a small batch of fewer than 
one thousand ESTs. Actually, I wrote a script to count the sequences in 
the temporary file and got a number much bigger than ten thousand. So I 
guess it works.
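
A minimal sketch of such a counting script (not the one actually used;
the sub name count_records and the command-line handling are
assumptions). It counts FASTA header lines and GenBank "//" record
terminators, so it can be pointed at either kind of download, even
while the file is still growing:

```perl
use strict;
use warnings;

# Count sequence records in a (possibly incomplete) download.
# FASTA records start with '>'; GenBank flat-file records end with a
# line containing only '//'. Returns (fasta_count, genbank_count).
sub count_records {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my ($fasta, $genbank) = (0, 0);
    while (my $line = <$fh>) {
        $fasta++   if $line =~ /^>/;
        $genbank++ if $line =~ m{^//\s*$};
    }
    close $fh;
    return ($fasta, $genbank);
}

# Run as: perl count_records.pl downloaded_file
if (@ARGV && !caller()) {
    my ($fasta, $genbank) = count_records($ARGV[0]);
    print "FASTA headers: $fasta\n";
    print "GenBank record terminators: $genbank\n";
}
```

Reading line by line keeps memory use flat, which matters for files of
this size.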

BTW, I never thought Bio::DB::GenBank could do that! Again, thank you guys!

Xing




From bix at sendu.me.uk  Tue Jul 10 13:14:29 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 10 Jul 2007 18:14:29 +0100
Subject: [Bioperl-l] Take 2 of the new subversion repository.
In-Reply-To: <18067.47319.254632.538811@almost.alerce.com>
References: <18054.63942.316904.413911@almost.alerce.com>	<D8C71EF7-6E2E-498E-8638-373512ADE3EE@bioperl.org>
	<18067.47319.254632.538811@almost.alerce.com>
Message-ID: <4693BE75.4090005@sendu.me.uk>

George Hartzell wrote:
> Jason Stajich writes:
>  > [...]
>  > Do you know how to have svn commit messages generate summary emails  
>  > as well?
> 
> I've made a local installation of the SVN::Notify bits in my home
> directory and set up its notification script.  If folks are happy with
> it then I'll work on getting The Powers That Be to do a real install
> and we'll use it for the real repository.
> 
> It's currently configured to include diffs inline in the message.  I
> prefer them as an attachment, but the current configuration of the
> bioperl-guts-l list stalls messages w/ attachments and requires admin
> intervention.  I have a support@ request going on it and will change
> it if/when we get the issue resolved.

Can I put a vote in that you don't? I search through email body text in 
my archive of guts to find certain diffs, so really like the diffs inline.

Also, is there any way to get rid of the 'bioperl' in [bioperl revision] 
in the subject? Seems redundant and makes it harder to see what was 
changed in a small email client window.


From aaron.j.mackey at gsk.com  Tue Jul 10 13:20:15 2007
From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com)
Date: Tue, 10 Jul 2007 13:20:15 -0400
Subject: [Bioperl-l] Take 2 of the new subversion repository.
In-Reply-To: <18067.47319.254632.538811@almost.alerce.com>
Message-ID: <OF37443F52.13AE1143-ON85257314.005D5FF0-85257314.005F432E@gsk.com>

George, this is all very nice to finally have, thank you for your efforts!

Any chance that the diff-as-attachment vs. diffs-inline question can be 
different for each subscriber?  The utility of the "guts" mailing list (to 
me) is that it's an encyclopedia of browsable, skimmable, and searchable 
diffs, not just a date-stamped record of diffs (if so, why provide an 
attachment at all, just provide a URL to the diff in the repository).

Thanks again,

-Aaron


bioperl-l-bounces at lists.open-bio.org wrote on 07/10/2007 12:50:31 PM:

> Jason Stajich writes:
>  > [...]
>  > Do you know how to have svn commit messages generate summary emails 
>  > as well?
> 
> I've made a local installation of the SVN::Notify bits in my home
> directory and set up its notification script.  If folks are happy with
> it then I'll work on getting The Powers That Be to do a real install
> and we'll use it for the real repository.
> 
> It's currently configured to include diffs inline in the message.  I
> prefer them as an attachment, but the current configuration of the
> bioperl-guts-l list stalls messages w/ attachments and requires admin
> intervention.  I have a support@ request going on it and will change
> it if/when we get the issue resolved.
> 
> So, to review:
> 
>    svn+ssh://dev.open-bio.org/home/hartzell/bioperl_take2/
> 
> is the top of the repository and
> 
>    svn co svn+ssh://dev.open-bio.org/home/hartzell/bioperl_take2/bioperl-live/trunk
> 
> will get you the main branch of bioperl-live.
> 
> Remember that the repository is transient, don't put anything
> important in there....
> 
> Have at it, but remember that the entire world will see your commit
> messages.
> 
> g.


From cjfields at uiuc.edu  Tue Jul 10 14:18:07 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 10 Jul 2007 13:18:07 -0500
Subject: [Bioperl-l] Take 2 of the new subversion repository.
In-Reply-To: <4693BE75.4090005@sendu.me.uk>
References: <18054.63942.316904.413911@almost.alerce.com>	<D8C71EF7-6E2E-498E-8638-373512ADE3EE@bioperl.org>
	<18067.47319.254632.538811@almost.alerce.com>
	<4693BE75.4090005@sendu.me.uk>
Message-ID: <C022268F-4632-4027-ADBE-F842B3308E21@uiuc.edu>


On Jul 10, 2007, at 12:14 PM, Sendu Bala wrote:

> George Hartzell wrote:
>> Jason Stajich writes:
>>> [...]
>>> Do you know how to have svn commit messages generate summary emails
>>> as well?
>>
>> I've made a local installation of the SVN::Notify bits in my home
>> directory and set up its notification script.  If folks are happy  
>> with
>> it then I'll work on getting The Powers That Be to do a real install
>> and we'll use it for the real repository.
>>
>> It's currently configured to include diffs inline in the message.  I
>> prefer them as an attachment, but the current configuration of the
>> bioperl-guts-l list stalls messages w/ attachments and requires admin
>> intervention.  I have a support@ request going on it and will change
>> it if/when we get the issue resolved.
>
> Can I put a vote in that you don't? I search through email body  
> text in
> my archive of guts to find certain diffs, so really like the diffs  
> inline.
>
> Also, is there any way to get rid of the 'bioperl' in [bioperl  
> revision]
> in the subject? Seems redundant and makes it harder to see what was
> changed in a small email client window.

Agree on both counts; the devs have gotten used to seeing the diffs  
inline.

We prob. need to schedule a specific day/time when the switchover  
would take place so we can announce (so everyone knows and no one can  
gripe).  Did we ever resolve the svn->cvs issue?  Jason pointed out  
some tools a while ago...

chris


From hartzell at alerce.com  Tue Jul 10 16:09:09 2007
From: hartzell at alerce.com (George Hartzell)
Date: Tue, 10 Jul 2007 16:09:09 -0400
Subject: [Bioperl-l] Take 2 of the new subversion repository.
In-Reply-To: <4693BE75.4090005@sendu.me.uk>
References: <18054.63942.316904.413911@almost.alerce.com>
	<D8C71EF7-6E2E-498E-8638-373512ADE3EE@bioperl.org>
	<18067.47319.254632.538811@almost.alerce.com>
	<4693BE75.4090005@sendu.me.uk>
Message-ID: <18067.59237.519166.454578@almost.alerce.com>

Sendu Bala writes:
 > George Hartzell wrote:
 > > Jason Stajich writes:
 > >  > [...]
 > >  > Do you know how to have svn commit messages generate summary emails  
 > >  > as well?
 > > 
 > > I've made a local installation of the SVN::Notify bits in my home
 > > directory and set up its notification script.  If folks are happy with
 > > it then I'll work on getting The Powers That Be to do a real install
 > > and we'll use it for the real repository.
 > > 
 > > It's currently configured to include diffs inline in the message.  I
 > > prefer them as an attachment, but the current configuration of the
 > > bioperl-guts-l list stalls messages w/ attachments and requires admin
 > > intervention.  I have a support@ request going on it and will change
 > > it if/when we get the issue resolved.
 > 
 > Can I put a vote in that you don't? I search through email body text in 
 > my archive of guts to find certain diffs, so really like the diffs inline.

Ok, three votes against attachments.  Anyone want to vote in support?
Otherwise I'll just leave 'em inline.

 > Also, is there any way to get rid of the 'bioperl' in [bioperl revision] 
 > in the subject? Seems redundant and makes it harder to see what was 
 > changed in a small email client window.

Sure.  The default's just [RevisionNumber].  Does that work for folk?

g.


From hartzell at alerce.com  Tue Jul 10 16:11:36 2007
From: hartzell at alerce.com (George Hartzell)
Date: Tue, 10 Jul 2007 16:11:36 -0400
Subject: [Bioperl-l] Take 2 of the new subversion repository.
In-Reply-To: <C022268F-4632-4027-ADBE-F842B3308E21@uiuc.edu>
References: <18054.63942.316904.413911@almost.alerce.com>
	<D8C71EF7-6E2E-498E-8638-373512ADE3EE@bioperl.org>
	<18067.47319.254632.538811@almost.alerce.com>
	<4693BE75.4090005@sendu.me.uk>
	<C022268F-4632-4027-ADBE-F842B3308E21@uiuc.edu>
Message-ID: <18067.59384.247108.463648@almost.alerce.com>

Chris Fields writes:
 > [...]
 > We prob. need to schedule a specific day/time when the switchover  
 > would take place so we can announce (so everyone knows and no one can  
 > gripe).  Did we ever resolve the svn->cvs issue?  Jason pointed out  
 > some tools a while ago...

I haven't done anything about it.

I think that we also need to have some input from the admin/support
folk about access methods (https, etc...).

Are we going to want to mirror the repository anywhere?

g.


From hartzell at alerce.com  Wed Jul 11 09:17:08 2007
From: hartzell at alerce.com (George Hartzell)
Date: Wed, 11 Jul 2007 09:17:08 -0400
Subject: [Bioperl-l] extra hook functionality for svn repos?
Message-ID: <18068.55380.626778.486775@almost.alerce.com>


There are a bunch of "contributed" hook scripts at

  http://subversion.tigris.org/tools_contrib.html#hook_scripts

Given that many bioperl users depend on case-preserving but
case-insensitive file systems, I'm wondering if hooking up the
case-insensitive.py script might be worthwhile.
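
To make the concern concrete, here is a minimal sketch of the kind of
check such a hook performs. It is not the contributed
case-insensitive.py script; `case_collisions` is a hypothetical helper,
and the path list is made up. A real hook would presumably also have to
compare incoming paths against what is already in the repository.

```perl
use strict;
use warnings;

# Hypothetical helper: return groups of paths that differ only by
# letter case, which would clash on a case-insensitive file system.
sub case_collisions {
    my @paths = @_;
    my %by_folded;
    push @{ $by_folded{ lc $_ } }, $_ for @paths;
    return map  { $by_folded{$_} }
           grep { @{ $by_folded{$_} } > 1 }
           sort keys %by_folded;
}

# Made-up example paths; two of them differ only in case.
my @clashes = case_collisions(
    qw(Bio/SeqIO.pm Bio/Seqio.pm Bio/Seq.pm t/data/test.fa)
);
print join(' <-> ', @$_), "\n" for @clashes;
```

A pre-commit hook would reject the commit when the returned list is
non-empty.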

Likewise, the check-mime-type.pl script might help us keep
svn:mime-type and svn:eol-style properties up to date.

There are others there, but none that I found interesting.

How big-brother do we want the repository to be?

g.


From cjfields at uiuc.edu  Wed Jul 11 09:40:54 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 11 Jul 2007 08:40:54 -0500
Subject: [Bioperl-l] extra hook functionality for svn repos?
In-Reply-To: <18068.55380.626778.486775@almost.alerce.com>
References: <18068.55380.626778.486775@almost.alerce.com>
Message-ID: <A13F608F-16FA-4432-AA2F-83674E3A73F4@uiuc.edu>


On Jul 11, 2007, at 8:17 AM, George Hartzell wrote:

>
> There are a bunch of "contributed" hook scripts at
>
>   http://subversion.tigris.org/tools_contrib.html#hook_scripts
>
> Given that many bioperl users depend on case-preserving but
> case-insensitive file systems, I'm wondering if hooking up the
> case-insensitive.py script might be worthwhile.

I'm not sure how often we run into this, though.  Anyone know?

> Likewise, the check-mime-type.pl script might help us keep
> svn:mime-type and svn:eol-style properties up to date.

The latter two might be nice.  I thought we planned on defaulting to  
a simple 'plain text' mime type on commits if it isn't specifically  
predefined, but maybe this way is better?

> There are others there, but none that I found interesting.
>
> How big-brother do we want the repository to be?
>
> g.

'Friendly' big-brother, not 'dystopian' big-brother.

chris


From marian.thieme at lycos.de  Wed Jul 11 05:05:18 2007
From: marian.thieme at lycos.de (Marian Thieme)
Date: Wed, 11 Jul 2007 09:05:18 +0000
Subject: [Bioperl-l] submitting code
Message-ID: <188661178019848@lycos-europe.com>

An HTML attachment was scrubbed...
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070711/eec1aa42/attachment-0002.html>

From dmessina at wustl.edu  Wed Jul 11 16:14:17 2007
From: dmessina at wustl.edu (David Messina)
Date: Wed, 11 Jul 2007 15:14:17 -0500
Subject: [Bioperl-l] submitting code
In-Reply-To: <188661178019848@lycos-europe.com>
References: <188661178019848@lycos-europe.com>
Message-ID: <4DF90B9A-7FFA-4867-B5D3-E6F05EC84BBC@wustl.edu>

Hi Marian,

Thanks so much for contributing! The best way would be to create a  
Bugzilla ticket and then attach the code to that ticket. One of the  
developers will check it in and give you feedback if there are any  
little tweaks that would be helpful*.

Would you be able to include documentation and test cases with your  
module?

Dave


* For more info:
http://www.bioperl.org/wiki/FAQ#I.27ve_got_an_idea_for_a_module_how_do_I_contribute_it.3F
http://www.bioperl.org/wiki/Developer_Information
http://www.bioperl.org/wiki/Becoming_a_developer
http://bioperl.org/pipermail/bioperl-l/2003-February/011226.html


--
Dave Messina
Senior Analyst, Assembly Group
Genome Sequencing Center
Washington University
St. Louis, MO


From marian.thieme at lycos.de  Wed Jul 11 11:12:20 2007
From: marian.thieme at lycos.de (Marian Thieme)
Date: Wed, 11 Jul 2007 15:12:20 +0000
Subject: [Bioperl-l] submitting code
Message-ID: <188661178030343@lycos-europe.com>

An HTML attachment was scrubbed...
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070711/c95991b8/attachment-0002.html>

From e-just at northwestern.edu  Thu Jul 12 10:37:03 2007
From: e-just at northwestern.edu (Eric Just)
Date: Thu, 12 Jul 2007 09:37:03 -0500
Subject: [Bioperl-l] Job opening in Chicago
Message-ID: <fa1fe35c0707120737i71c6c26fq7635e350da9bf23f@mail.gmail.com>

Hello everyone,

We have an opening at dictyBase (Northwestern University in Chicago)
for a Bioinformatics Software Engineer.  This job involves writing and
maintaining software for a genome database using Chado/OO-Perl/Bioperl
and many other state of the art technologies.

For more information please see:
http://dictybase.org/dictybase_jobs.htm

Thanks,
Eric


From cjfields at uiuc.edu  Thu Jul 12 12:09:02 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 12 Jul 2007 11:09:02 -0500
Subject: [Bioperl-l] DB::SeqFeature::Store::GFF3Loader question
Message-ID: <A8310D54-F800-43BE-B6C3-3879206CE697@uiuc.edu>

I have been running into some GFF formatting issues where the  
attributes column is left undef (no '.'), which causes  
GFF3Loader::parse_attributes() to complain with a 'use of undefined  
string with split' warning.  Would it be okay with the powers that be  
(Scott, Lincoln) to add a warning or exception there?  I'm guessing a  
warning is better in this case, as just returning works fine.
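
For illustration, the guard I have in mind might look roughly like the
sketch below. This is not the actual GFF3Loader code:
`parse_attributes_sketch` is a hypothetical stand-alone function, and
treating an undefined column, an empty column, and a lone '.' all as
"no attributes" is my assumption.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an attribute parser; NOT the real
# GFF3Loader::parse_attributes().
sub parse_attributes_sketch {
    my ($att) = @_;

    # Guard: column 9 is missing entirely or empty. Warn and return
    # nothing instead of letting split() see an undefined value.
    unless (defined $att && length $att) {
        warn "GFF line has no attributes column; treating it as empty\n";
        return;
    }
    return if $att eq '.';    # explicit "no attributes" placeholder

    my @pairs;
    for my $field (split /;/, $att) {
        next unless length $field;
        my ($tag, $value) = split /=/, $field, 2;
        $value = '' unless defined $value;
        for my $raw (split /,/, $value) {
            # Undo URL-style escaping such as %3B.
            (my $v = $raw) =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;
            push @pairs, [ $tag, $v ];
        }
    }
    return @pairs;
}
```

Callers get an empty list for a missing column, which matches the
"just returning works fine" behaviour described above.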

chris


From jason at bioperl.org  Fri Jul 13 13:30:05 2007
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 13 Jul 2007 13:30:05 -0400
Subject: [Bioperl-l] Take 2 of the new subversion repository.
In-Reply-To: <18067.59384.247108.463648@almost.alerce.com>
References: <18054.63942.316904.413911@almost.alerce.com>
	<D8C71EF7-6E2E-498E-8638-373512ADE3EE@bioperl.org>
	<18067.47319.254632.538811@almost.alerce.com>
	<4693BE75.4090005@sendu.me.uk>
	<C022268F-4632-4027-ADBE-F842B3308E21@uiuc.edu>
	<18067.59384.247108.463648@almost.alerce.com>
Message-ID: <58D2D775-F7AB-4E57-AC4E-861AA2DC4AEA@bioperl.org>

I'll try to look into this and other migration issues in the next  
week or so - maybe we'll make some time to talk it through during  
BOSC.  I don't know yet when I'll actually have time to think about  
it properly.

I am still worried about doing https: because of the current system we  
have supporting user logins, because we didn't want to run a web  
server on the main repository machine, and because we'd have to  
install DAV on that machine.  If svn+ssh turns out to be enough of a  
hurdle for people (note it was already a hurdle for them with CVS),  
we'll have to think a bit more on it.

We might be able to do some sort of NFS (or other exported FS) export  
to the webserver machine, but that may be a recipe for disaster.

-jason
On Jul 10, 2007, at 4:11 PM, George Hartzell wrote:

> Chris Fields writes:
>> [...]
>> We prob. need to schedule a specific day/time when the switchover
>> would take place so we can announce (so everyone knows and no one can
>> gripe).  Did we ever resolve the svn->cvs issue?  Jason pointed out
>> some tools a while ago...
>
> I haven't done anything about it.
>
> I think that we also need to have some input from the admin/support
> folk about access methods (https, etc...).
>
> Are we going to want to mirror the repository anywhere?
>
> g.

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From cjfields at uiuc.edu  Fri Jul 13 14:29:22 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 13 Jul 2007 13:29:22 -0500
Subject: [Bioperl-l] Take 2 of the new subversion repository.
In-Reply-To: <58D2D775-F7AB-4E57-AC4E-861AA2DC4AEA@bioperl.org>
References: <18054.63942.316904.413911@almost.alerce.com>
	<D8C71EF7-6E2E-498E-8638-373512ADE3EE@bioperl.org>
	<18067.47319.254632.538811@almost.alerce.com>
	<4693BE75.4090005@sendu.me.uk>
	<C022268F-4632-4027-ADBE-F842B3308E21@uiuc.edu>
	<18067.59384.247108.463648@almost.alerce.com>
	<58D2D775-F7AB-4E57-AC4E-861AA2DC4AEA@bioperl.org>
Message-ID: <5F5EB9B6-11AF-4D20-95B1-EBBD40A98962@uiuc.edu>

I don't think there's a huge rush on this since BOSC is imminent. If  
devs really want https then we can try adding it after migration, but  
if it becomes too much of a headache (particularly for the web  
admins) I wouldn't worry about it.

chris

On Jul 13, 2007, at 12:30 PM, Jason Stajich wrote:

> I'll try and look into this and other stuff with the migration in
> next week or so - maybe we'll make some time to talk it through
> during BOSC.  I don't know yet when I'll actually have time to think
> about it properly.
>
> I am still worried about doing https because of the current system we
> have supporting user logins and that we didn't want to run a web
> server on the main repository machine and we'll have to install DAV
> on the main repository machine.  if ssh+svn is going to be sufficient
> hurdle for people, note it was already a hurdle for them with CVS,
> but we'll have to think a bit more on it.
>
> We might be able to do some sort of NFS (or other exported FS) but
> exported to the webserver machine but that is may be a recipe for
> disaster.
>
> -jason
> On Jul 10, 2007, at 4:11 PM, George Hartzell wrote:
>
>> Chris Fields writes:
>>> [...]
>>> We prob. need to schedule a specific day/time when the switchover
>>> would take place so we can announce (so everyone knows and no one  
>>> can
>>> gripe).  Did we ever resolve the svn->cvs issue?  Jason pointed out
>>> some tools a while ago...
>>
>> I haven't done anything about it.
>>
>> I think that we also need to have some input from the admin/support
>> folk about access methods (https, etc...).
>>
>> Are we going to want to mirror the repository anywhere?
>>
>> g.
>
> --
> Jason Stajich
> jason at bioperl.org
> http://jason.open-bio.org/
>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From sheris at eps.berkeley.edu  Fri Jul 13 14:42:32 2007
From: sheris at eps.berkeley.edu (Sheri Simmons)
Date: Fri, 13 Jul 2007 11:42:32 -0700
Subject: [Bioperl-l] Problem with Bio::PopGen::Individual
Message-ID: <200707131142.32366.sheris@eps.berkeley.edu>

Hi,
I have a collection of sequencing reads aligned with a consensus sequence that 
I input into a Bio::PopGen::Population object in order to calculate allele 
frequencies. The consensus sequence is included to force clustalw to give a 
better alignment. However,  I need to remove the consensus sequence before 
calculating allele frequencies in the individual reads. I'm having trouble 
with this part of it. I get the following error message:

"Can't locate object method "person_id" via package "Bio::PopGen::Individual" 		
at /usr/share/perl5/Bio/PopGen/Population.pm line 260, <GEN0> line 49."

Here is the code snippet producing the error. $pop is a 
Bio::PopGen::Population object.

	my @consensus = "gene_consensus";
	$pop->remove_Individuals(@consensus);

I also tried:
	my @consensus = $pop->get_Individuals(-unique_id => "gene_consensus"); 
	$pop->remove_Individuals(@consensus);

which produced the same error. Can anyone send me in the right direction? I 
suspect this is a simple problem.

Sheri

-- 
Sheri Simmons
Department of Earth and Planetary Sciences
University of California, Berkeley
Berkeley, CA 94720-4767


From jason at bioperl.org  Fri Jul 13 16:17:31 2007
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 13 Jul 2007 16:17:31 -0400
Subject: [Bioperl-l] Problem with Bio::PopGen::Individual
In-Reply-To: <200707131142.32366.sheris@eps.berkeley.edu>
References: <200707131142.32366.sheris@eps.berkeley.edu>
Message-ID: <99A3513A-7DBE-4C89-B38B-8C2B76B0E14F@bioperl.org>

Hi Sheri -

Shoot - that was my fault - bug in the code where I was only using  
"Person" not Individuals for the code when I was testing.

I've committed a bugfix to CVS - do you need me to send you the  
updated file, or are you comfortable grabbing the code from CVS or  
http://code.open-bio.org?

This is the change - you may have a different version of BioPerl than  
what is in CVS, so you may have to make the change on line 260 rather  
than 282 -- or you can upgrade to the latest code via CVS (although this  
is probably harder for you since you've got stuff installed in  
/usr/share):

RCS file: /home/repository/bioperl/bioperl-live/Bio/PopGen/Population.pm,v
retrieving revision 1.22
diff -r1.22 Population.pm
282c282
<       unshift @tosplice, $i if( $namehash{$ind->person_id} );
---
 >       unshift @tosplice, $i if( $namehash{$ind->unique_id} );
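
In case it helps to see the logic outside the module, here is a
stand-alone sketch of what the patched loop does. `Sketch::Individual`
and `remove_by_unique_id` are hypothetical stand-ins, not BioPerl code;
only the `unique_id` accessor and the `@tosplice` / `%namehash` names
come from the diff above.

```perl
use strict;
use warnings;

# Hypothetical stand-in for Bio::PopGen::Individual; only the
# unique_id accessor is modeled.
package Sketch::Individual;
sub new {
    my ($class, %args) = @_;
    return bless { id => $args{-unique_id} }, $class;
}
sub unique_id { return $_[0]{id} }

package main;

# Remove every individual whose unique_id is in @names.
# Returns the number of individuals removed.
sub remove_by_unique_id {
    my ($individuals, @names) = @_;
    my %namehash = map { $_ => 1 } @names;
    my @tosplice;
    for my $i (0 .. $#{$individuals}) {
        # unshift keeps the indices in descending order, so splicing
        # one element does not shift the positions still to be removed.
        unshift @tosplice, $i
            if $namehash{ $individuals->[$i]->unique_id };
    }
    splice @{$individuals}, $_, 1 for @tosplice;
    return scalar @tosplice;
}
```

Note that the lookup is by name, so the caller passes ids such as
"gene_consensus" rather than Individual objects.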

-jason
On Jul 13, 2007, at 2:42 PM, Sheri Simmons wrote:

> Hi,
> I have a collection of sequencing reads aligned with a consensus  
> sequence that
> I input into a Bio::PopGen::Population object in order to calculate  
> allele
> frequencies. The consensus sequence is included to force clustalw  
> to give a
> better alignment. However,  I need to remove the consensus sequence  
> before
> calculating allele frequencies in the individual reads. I'm having  
> trouble
> with this part of it. I get the following error message:
>
> "Can't locate object method "person_id" via package  
> "Bio::PopGen::Individual" 		
> at /usr/share/perl5/Bio/PopGen/Population.pm line 260, <GEN0> line  
> 49."
>
> Here is the code snippet producing the error. $pop is a
> Bio::PopGen::Population object.
>
> 	my @consensus = "gene_consensus";
> 	$pop->remove_Individuals(@consensus);
>
> I also tried:
> 	my @consensus = $pop->get_Individuals(-unique_id =>  
> "gene_consensus");
> 	$pop->remove_Individuals(@consensus);
>
> which produced the same error. Can anyone send me in the right  
> direction? I
> suspect this is a simple problem.
>
> Sheri
>
> -- 
> Sheri Simmons
> Department of Earth and Planetary Sciences
> University of California, Berkeley
> Berkeley, CA 94720-4767

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From hartzell at alerce.com  Fri Jul 13 16:34:14 2007
From: hartzell at alerce.com (George Hartzell)
Date: Fri, 13 Jul 2007 16:34:14 -0400
Subject: [Bioperl-l] Take 2 of the new subversion repository.
In-Reply-To: <58D2D775-F7AB-4E57-AC4E-861AA2DC4AEA@bioperl.org>
References: <18054.63942.316904.413911@almost.alerce.com>
	<D8C71EF7-6E2E-498E-8638-373512ADE3EE@bioperl.org>
	<18067.47319.254632.538811@almost.alerce.com>
	<4693BE75.4090005@sendu.me.uk>
	<C022268F-4632-4027-ADBE-F842B3308E21@uiuc.edu>
	<18067.59384.247108.463648@almost.alerce.com>
	<58D2D775-F7AB-4E57-AC4E-861AA2DC4AEA@bioperl.org>
Message-ID: <18071.57798.130368.703488@almost.alerce.com>

Jason Stajich writes:
 > I'll try and look into this and other stuff with the migration in  
 > next week or so - maybe we'll make some time to talk it through  
 > during BOSC.  I don't know yet when I'll actually have time to think  
 > about it properly.
 > 
 > I am still worried about doing https because of the current system we  
 > have supporting user logins and that we didn't want to run a web  
 > server on the main repository machine and we'll have to install DAV  
 > on the main repository machine.  if ssh+svn is going to be sufficient  
 > hurdle for people, note it was already a hurdle for them with CVS,  
 > but we'll have to think a bit more on it.
 > [...]

How are you thinking about providing anonymous readonly non-dev access
to the repository?  svn+ssh using an anonymous/guest account (can it
be screwed down tightly enough?)  svn-mirror the repo onto the public
machine and do DAV there w/out having to worry about authenticating
the devs?

g.


From jason at bioperl.org  Fri Jul 13 17:33:29 2007
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 13 Jul 2007 17:33:29 -0400
Subject: [Bioperl-l] Take 2 of the new subversion repository.
In-Reply-To: <18071.57798.130368.703488@almost.alerce.com>
References: <18054.63942.316904.413911@almost.alerce.com>
	<D8C71EF7-6E2E-498E-8638-373512ADE3EE@bioperl.org>
	<18067.47319.254632.538811@almost.alerce.com>
	<4693BE75.4090005@sendu.me.uk>
	<C022268F-4632-4027-ADBE-F842B3308E21@uiuc.edu>
	<18067.59384.247108.463648@almost.alerce.com>
	<58D2D775-F7AB-4E57-AC4E-861AA2DC4AEA@bioperl.org>
	<18071.57798.130368.703488@almost.alerce.com>
Message-ID: <5C42D957-BCCA-46B6-8121-3313CE4B0F2A@bioperl.org>


On Jul 13, 2007, at 4:34 PM, George Hartzell wrote:

> Jason Stajich writes:
>> I'll try and look into this and other stuff with the migration in
>> next week or so - maybe we'll make some time to talk it through
>> during BOSC.  I don't know yet when I'll actually have time to think
>> about it properly.
>>
>> I am still worried about doing https because of the current system we
>> have supporting user logins and that we didn't want to run a web
>> server on the main repository machine and we'll have to install DAV
>> on the main repository machine.  if ssh+svn is going to be sufficient
>> hurdle for people, note it was already a hurdle for them with CVS,
>> but we'll have to think a bit more on it.
>> [...]
>
> How are you thinking about providing anonymous readonly non-dev access
> to the repository?  svn+ssh using an anonymous/guest account (can it
> be screwed down tightly enough?)  svn-mirror the repo onto the public
> machine and do DAV there w/out having to worry about authenticating
> the devs?
>
We'll do svn on the public anonymous machine like we already do with  
CVS and with SVN

See:
http://code.open-bio.org
  AND
http://code.open-bio.org/svnweb/
See blipkit.

-jason
> g.
>
>

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From scrosson at uchicago.edu  Fri Jul 13 18:15:30 2007
From: scrosson at uchicago.edu (Sean Crosson)
Date: Fri, 13 Jul 2007 22:15:30 +0000 (UTC)
Subject: [Bioperl-l] ace to fasta conversion
Message-ID: <loom.20070714T000856-94@post.gmane.org>

I've been using this Bio::SeqIO script to convert EMBL and swissprot to fasta
and it works great.  We're now trying to convert a big (250 MB) .ace file to
fasta.  The documentation suggests I can do this, but everytime I run the script
below, it outputs an empty .fas file.  Does anyone have any suggestions on how
to make this script work?  Does SeqIO really convert between these file types? 
Thanks for your help.

#!/usr/bin/perl -w

use Bio::SeqIO;


$in  = Bio::SeqIO->new(-file => "454Contigs.ace",
                       -format => 'ace');
$out = Bio::SeqIO->new(-file => ">454Contigs.fas",
                       -format => 'fasta');
while ( $seq = $in->next_seq() ) {$out->write_seq($seq); }


From cvillamar at gmail.com  Fri Jul 13 19:24:04 2007
From: cvillamar at gmail.com (Carlos Villacorta)
Date: Fri, 13 Jul 2007 16:24:04 -0700
Subject: [Bioperl-l] beginner problem with fasta headers
Message-ID: <60199ee40707131624y6ea2c896r8a32f9bef04121e9@mail.gmail.com>

hi all,
I have an EMBL sequence file. When I format it to fasta with SeqIO, it
gives a long header string for each sequence, which the phylogenetic
software I run next cannot handle...
Does anyone know how to format those EMBL or GenBank files to fasta
while keeping just two or three fields in the headers (e.g. id | gene
| sp_name)?
Any advice with this problem would be very appreciated, thanks!


From j_martin at lbl.gov  Fri Jul 13 20:05:45 2007
From: j_martin at lbl.gov (Joel Martin)
Date: Fri, 13 Jul 2007 17:05:45 -0700
Subject: [Bioperl-l] ace to fasta conversion
In-Reply-To: <loom.20070714T000856-94@post.gmane.org>
References: <loom.20070714T000856-94@post.gmane.org>
Message-ID: <20070714000544.GB29841@eniac.jgi-psf.org>

Hello,
the SeqIO 'ace' format is an AceDB file; the 454*ace file you are trying
to use is a phrap/consed ace file.  They aren't related at all.  You might
try poking around in Bio::Assembly::IO, which should read assembly ace files.

Joel

On Fri, Jul 13, 2007 at 10:15:30PM +0000, Sean Crosson wrote:
> I've been using this Bio::SeqIO script to convert EMBL and swissprot to fasta
> and it works great.  We're now trying to convert a big (250 MB) .ace file to
> fasta.  The documentation suggests I can do this, but everytime I run the script
> below, it outputs an empty .fas file.  Does anyone have any suggestions on how
> to make this script work?  Does SeqIO really convert between these file types? 
> Thanks for your help.
> 
> #!/usr/bin/perl -w
> 
> use Bio::SeqIO;
> 
> 
> $in  = Bio::SeqIO->new(-file => "454Contigs.ace",
>                        -format => 'ace');
> $out = Bio::SeqIO->new(-file => ">454Contigs.fas",
>                        -format => 'fasta');
> while ( $seq = $in->next_seq() ) {$out->write_seq($seq); }
> 
> 
> 


From cjfields at uiuc.edu  Sat Jul 14 00:06:27 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 13 Jul 2007 23:06:27 -0500
Subject: [Bioperl-l] beginner problem with fasta headers
In-Reply-To: <60199ee40707131624y6ea2c896r8a32f9bef04121e9@mail.gmail.com>
References: <60199ee40707131624y6ea2c896r8a32f9bef04121e9@mail.gmail.com>
Message-ID: <0089195A-4935-49F2-A8E7-C1F9B8A34D4E@uiuc.edu>

Some reading material...

http://www.bioperl.org/wiki/FAQ#Accession_numbers_are_not_present_for_FASTA_sequence_files
http://www.bioperl.org/wiki/FAQ#I_would_like_to_make_my_own_custom_fasta_header_-_how_do_I_do_this.3F
http://www.bioperl.org/wiki/FASTA_sequence_format#Note
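
As a rough idea of what those pages are getting at, here is a plain
Perl sketch of building a short "id|gene|sp_name" header. It does not
use BioPerl at all: `short_fasta_record` is a hypothetical helper, and
the accession, gene, and species values are made up. In a real script
the values would presumably come from the sequence object that SeqIO
hands back.

```perl
use strict;
use warnings;

# Hypothetical helper: build one FASTA record whose header holds only
# the fields we want, joined with '|'.
sub short_fasta_record {
    my (%arg) = @_;
    my @fields = grep { defined $_ && length $_ } @arg{qw(id gene species)};
    s/\s+/_/g for @fields;    # assume downstream tools dislike spaces
    my $record = '>' . join('|', @fields) . "\n";
    my $seq = defined $arg{seq} ? $arg{seq} : '';
    $record .= "$1\n" while $seq =~ /(.{1,60})/g;    # wrap at 60 columns
    return $record;
}

# Made-up example values.
print short_fasta_record(
    id      => 'AB000001',
    gene    => 'rbcL',
    species => 'Oryza sativa',
    seq     => 'ACGTACGTACGT',
);
```

Fields that are missing are simply left out of the header rather than
producing empty slots between the '|' separators.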

Quiz on Monday!

chris

On Jul 13, 2007, at 6:24 PM, Carlos Villacorta wrote:

> hi all,
> I have a embl sequence file, when formatting to fasta with Seqio it
> gives a long string header for each sequence that my following
> phylogenetic software cannot handle...
> Does anyone knows how to format those embl or genbank files to fasta
> but retrieving in the headers just two or three fields (e.g. id | gene
> | sp_name)?
> Any advice with this problem would be very appreciated, thanks!


From scrosson at uchicago.edu  Fri Jul 13 23:43:59 2007
From: scrosson at uchicago.edu (scrosson)
Date: Fri, 13 Jul 2007 20:43:59 -0700 (PDT)
Subject: [Bioperl-l] ace to fasta conversion
In-Reply-To: <20070714000544.GB29841@eniac.jgi-psf.org>
References: <loom.20070714T000856-94@post.gmane.org>
	<20070714000544.GB29841@eniac.jgi-psf.org>
Message-ID: <11590811.post@talk.nabble.com>


This problem now makes sense.  I've been playing with Bio::Assembly::IO,
which does indeed read phrap .ace files.  Does anyone have an idea how to
pull the assembled contigs out of a Bio::Assembly object and write them out
as multi-fasta (or strings for that matter)?  None of our workstations are
running phrap/consed and I'd love to see these contigs.
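
One fallback that avoids the Bio::Assembly objects altogether is to
read the consensus straight out of the file. The sketch below assumes
the usual phrap/consed ACE layout: a "CO <name> ..." line, then the
padded consensus ending at a blank line, with '*' as the pad character.
`ace_contigs_to_fasta` is a hypothetical helper, not part of BioPerl,
and I have not checked it against 454 output.

```perl
use strict;
use warnings;

# Hypothetical helper: read contig consensus sequences from an ACE
# file handle and print them as FASTA. Returns the contig count.
sub ace_contigs_to_fasta {
    my ($in, $out) = @_;
    my ($name, $seq, $count) = (undef, '', 0);

    # Write out the contig collected so far, if any.
    my $flush = sub {
        return unless defined $name;
        $seq =~ tr/*//d;                        # drop gap pads
        print {$out} ">$name\n";
        print {$out} "$1\n" while $seq =~ /(.{1,60})/g;
        $count++;
        ($name, $seq) = (undef, '');
    };

    while (my $line = <$in>) {
        $line =~ s/\s+\z//;
        if ($line =~ /^CO\s+(\S+)/) {           # start of a contig record
            $flush->();
            $name = $1;
        }
        elsif (defined $name) {
            if ($line eq '') { $flush->() }     # blank line ends consensus
            else             { $seq .= $line }
        }
    }
    $flush->();
    return $count;
}
```

It would be called with two open file handles, one reading
454Contigs.ace and one writing 454Contigs.fas.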

Sean 
       

Joel Martin wrote:
> Hello,
> the SeqIO 'ace' is an AceDB file, the 454*ace you are trying to use
> is a phrap/consed ace file.  They aren't related at all. You might try
> poking around in Bio::AssemblyIO which should read assembly ace files.
>
> Joel

-- 
View this message in context: http://www.nabble.com/ace-to-fasta-conversion-tf4077370.html#a11590811
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From bioperlanand at yahoo.com  Sat Jul 14 13:55:53 2007
From: bioperlanand at yahoo.com (Anand Venkatraman)
Date: Sat, 14 Jul 2007 10:55:53 -0700 (PDT)
Subject: [Bioperl-l] a question on obtain PDB records using bioperl
Message-ID: <798126.17426.qm@web36804.mail.mud.yahoo.com>

Hi everybody,

Is there a method in Bioperl to obtain PDB record(s) on the fly, i.e. something similar to the Bio::Perl methods for retrieving EMBL or GenBank records?

Thanks in advance,

Anand

       


From johnsonm at gmail.com  Tue Jul 17 14:23:58 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Tue, 17 Jul 2007 13:23:58 -0500
Subject: [Bioperl-l] Should Bio::SeqIO::interpro 'use XML::DOM::XPath'?
Message-ID: <ebf5eb170707171123y33239dfey2dc39e4316c62cee@mail.gmail.com>

I'm tinkering with parsing iprscan reports with BioPerl.  I noticed that this:

  my $seqio = Bio::SeqIO->new(-file => $iprscan_file, -format => 'interpro');

  while (my $seq = $seqio->next_seq()) {
      ...
  }

Does not work unless I first 'use XML::DOM::XPath'.  I get this error:

  Can't locate object method "findnodes" via package
"XML::DOM::Document" at
bioperl-cvs/bioperl-live//Bio/SeqIO/interpro.pm line 136, <GEN0> line
30.

I see that Bio::SeqIO has 'use XML::DOM', but that doesn't seem to
suck in XML::DOM::XPath.  I see that t/interpro.t requires
XML::DOM::XPath:

test_begin(-tests => 17,
                -requires_module => 'XML::DOM::XPath');

I suppose the reason the test specifies XML::DOM::XPath as required is so
that tests can be skipped if XML::DOM::XPath is not available.
Shouldn't Bio::SeqIO::interpro 'use XML::DOM::XPath', though?


From sac at bioperl.org  Tue Jul 17 15:49:32 2007
From: sac at bioperl.org (Steve Chervitz)
Date: Tue, 17 Jul 2007 12:49:32 -0700
Subject: [Bioperl-l] Ohloh account for bioperl
Message-ID: <8f200b4c0707171249n193c5d3fi2aa1cb8b6c102ef7@mail.gmail.com>

I came across a web app that tracks various metrics for open source
projects, noticed that bioperl wasn't listed, and added it:

http://www.ohloh.net/projects/6685

Seems like an interesting resource that could help add some
visibility. It creates metrics by directly processing the source code
repository. I hooked it up to the CVS repos for bioperl-live, -db,
-run, and -pipeline. It has yet to do its analysis at this point.

Feel free to create Ohloh accounts for yourselves. When you add
yourself as a contributor to Bioperl, you can indicate the username
associated with your commits, but this requires that it first process
the commit logs to figure out what the usernames are. You can still
create an account, just update it later with your username.

Steve


From cjfields at uiuc.edu  Tue Jul 17 17:04:44 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 17 Jul 2007 16:04:44 -0500
Subject: [Bioperl-l] Should Bio::SeqIO::interpro 'use XML::DOM::XPath'?
In-Reply-To: <ebf5eb170707171123y33239dfey2dc39e4316c62cee@mail.gmail.com>
References: <ebf5eb170707171123y33239dfey2dc39e4316c62cee@mail.gmail.com>
Message-ID: <5BA80A44-4A88-46D8-B945-7BB4C82E308E@uiuc.edu>


On Jul 17, 2007, at 1:23 PM, Mark Johnson wrote:

> I'm tinkering with parsing iprscan reports with BioPerl.  I noticed  
> that this:
>
>   my $seqio = Bio::SeqIO->new(-file => $iprscan_file, -format =>  
> 'interpro');
>
>   while (my $seq = $seqio->next_seq()) {
>       ...
>   }
>
> Does not work unless I first 'use XML::DOM::XPath'.  I get this error:
>
>   Can't locate object method "findnodes" via package
> "XML::DOM::Document" at
> bioperl-cvs/bioperl-live//Bio/SeqIO/interpro.pm line 136, <GEN0> line
> 30.
>
> I see that Bio::SeqIO has 'use XML::DOM', but that doesn't seem to
> suck in XML::DOM::XPath.  I see that t/interpro.t requires
> XML::DOM::XPath:
>
> test_begin(-tests => 17,
>                 -requires_module => 'XML::DOM::XPath');
>
> I suppose the reason the test specifies XML::DOM::XPath as required is so
> that tests can be skipped if XML::DOM::XPath is not available.
> Shouldn't Bio::SeqIO::interpro 'use XML::DOM::XPath', though?

You're right; I think tests passed b/c XML::DOM::XPath (if present),  
was eval'd as a required module.  When I commented out the spot where  
it is eval'd in the test suite I can replicate this error.  I have  
added 'use XML::DOM::XPath' to SeqIO::interpro now in CVS and it  
passes fine.

Thanks for the heads up!

chris
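
The failure mode discussed above (a method that only exists once an add-on
module has been loaded) can be reproduced in plain Perl.  This is a minimal
sketch with made-up package names: My::Document stands in for
XML::DOM::Document and My::Document::Finder for an add-on such as
XML::DOM::XPath.  It only illustrates the mechanism and is not how either
real module is implemented.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a document class such as XML::DOM::Document.
package My::Document;

sub new {
    my ($class, %args) = @_;
    return bless {%args}, $class;
}

# Hypothetical stand-in for an add-on module: calling install() defines an
# extra method directly inside My::Document's namespace.
package My::Document::Finder;

sub install {
    no strict 'refs';    # needed to assign to a glob by name
    *{'My::Document::findnodes'} = sub {
        my ($self, $query) = @_;
        return grep { $_ eq $query } @{ $self->{nodes} };
    };
    return 1;
}

package main;

my $doc = My::Document->new(nodes => [qw(protein match protein)]);

# Before the add-on is installed the method does not exist, so calling it
# would die with "Can't locate object method ...".
my $before = My::Document->can('findnodes') ? 1 : 0;

My::Document::Finder::install();

my $after = My::Document->can('findnodes') ? 1 : 0;
my @hits  = $doc->findnodes('protein');

print "before=$before after=$after hits=", scalar(@hits), "\n";
```

Run as a script, this prints "before=0 after=1 hits=2".  A parser module that
calls such a method is safest loading the add-on itself rather than relying
on the caller or the test harness to have loaded it.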


From xianranli78 at yahoo.com.cn  Wed Jul 18 01:55:19 2007
From: xianranli78 at yahoo.com.cn (Xianran Li)
Date: Wed, 18 Jul 2007 13:55:19 +0800
Subject: [Bioperl-l] extract information with Bio::DB::GFF3 from gff3 file
Message-ID: <005001c7c900$3a403d60$ed07a8c0@BGI.LOCAL>

Hi,

I want to extract some information from a gff3 file like:

12001 . gene 854759 857385 . - . ID=12001.t00153;Name=receptor kinase ORK10, putative
   
The gene position can be retrieved as $gene->start, but how can I get the annotation information (receptor kinase ORK10)?

Thanks for your help.


Xianran Li


From georg.otto at tuebingen.mpg.de  Wed Jul 18 05:32:26 2007
From: georg.otto at tuebingen.mpg.de (Georg Otto)
Date: Wed, 18 Jul 2007 11:32:26 +0200
Subject: [Bioperl-l] run megablast
Message-ID: <m1r6n66or9.fsf@tuebingen.mpg.de>


Hi,

is there a module to run megablast in a script (equivalent to ncbi
blast in StandAloneBlast.pm)?

Cheers,

Georg


From jeevitesh at ibab.ac.in  Wed Jul 18 06:03:24 2007
From: jeevitesh at ibab.ac.in (jeevitesh at ibab.ac.in)
Date: Wed, 18 Jul 2007 15:33:24 +0530 (IST)
Subject: [Bioperl-l] Please Help us - regarding TreeIO for Shared Distance
Message-ID: <47819.192.168.1.125.1184753004.squirrel@webmail.ibab.ac.in>

Hi Friends,

We need your valuable help in finding the SHARED PATH BETWEEN TWO PAIRS OF NODES
OF A TREE.

The Distance method of TreeIO in the Bioperl module gives the total distance.

But we NEED TO FIND THE SHARED DISTANCE BETWEEN TWO PAIRS OF NODES, as illustrated in the figure.

Suppose we have a tree
    A                C
     \              /
      \2          2/
       \__________/
       /    6     \
      /2          2\
     /              \
    B                D

For example, the shared path between AB and AC is 2,
and for AC and BD the shared path is 6.

Any comment on this will be greatly appreciated.

With Thanks & regards
jeevitesh


From jeevitesh at ibab.ac.in  Wed Jul 18 03:15:33 2007
From: jeevitesh at ibab.ac.in (jeevitesh at ibab.ac.in)
Date: Wed, 18 Jul 2007 12:45:33 +0530 (IST)
Subject: [Bioperl-l] Please Help us - regarding TreeIO for Shared Distance
Message-ID: <55933.192.168.1.125.1184742933.squirrel@webmail.ibab.ac.in>

Hi Friends,

we need your valuable help in finding the SHARED PATH BETWEEN TWO NODES OF A TREE.

The Distance method of TreeIO in Bioperl module gives the total distance.

But we NEED TO FIND THE SHARED DISTANCE BETWEEN TWO NODES.

Any comment on this will be greatly appreciated.

With Thanks & regards
jeevitesh


From cain.cshl at gmail.com  Wed Jul 18 09:10:40 2007
From: cain.cshl at gmail.com (Scott Cain)
Date: Wed, 18 Jul 2007 09:10:40 -0400
Subject: [Bioperl-l] extract information with Bio::DB::GFF3 from
	gff3	file
In-Reply-To: <005001c7c900$3a403d60$ed07a8c0@BGI.LOCAL>
References: <005001c7c900$3a403d60$ed07a8c0@BGI.LOCAL>
Message-ID: <1184764240.2570.31.camel@localhost.localdomain>

Hi Xianran Li,

Assuming $gene is a Bio::DB::GFF::Feature object (there is no such thing
as Bio::DB::GFF3), then you can use the attributes method to get
anything in the ninth column:

  my ($name) = $gene->attributes('Name');

The parentheses are needed around $name because the attributes method
returns a list and the parens capture the first item of the list into
$name.

Scott
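
The point about parentheses is a general Perl context rule and can be tried
without BioPerl.  In this minimal sketch, attribute_values() is a made-up
stand-in for a method that returns a list; it is not the BioPerl method
itself.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a method that returns a list of values.
sub attribute_values {
    my @values = ('receptor kinase ORK10', 'putative');
    return @values;
}

# List assignment: the parentheses put the call in list context and the
# first element of the returned list lands in $first.
my ($first) = attribute_values();

# Scalar assignment: the call runs in scalar context.  Because this sub
# returns an array, the result here is the number of elements.
my $count = attribute_values();

print "first=$first count=$count\n";
```

This prints "first=receptor kinase ORK10 count=2".  What a real method
returns in scalar context depends on how it is written, so the list form is
the safe one when the first value is wanted.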


On Wed, 2007-07-18 at 13:55 +0800, Xianran Li wrote:
> Hi,
> 
> I want to extract some information from a gff3 file like:
> 
> 12001 . gene 854759 857385 . - . ID=12001.t00153;Name=receptor kinase ORK10, putative
>    
> The gene position can be retrieved as $gene->start, but how can I get the annotation information (receptor kinase ORK10)?
> 
> Thanks for your help.
> 
> 
> Xianran Li
> 
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From johnsonm at gmail.com  Wed Jul 18 16:53:00 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Wed, 18 Jul 2007 15:53:00 -0500
Subject: [Bioperl-l] Should Bio::SeqIO::interpro 'use XML::DOM::XPath'?
In-Reply-To: <469DB6C6.9010702@pasteur.fr>
References: <ebf5eb170707171123y33239dfey2dc39e4316c62cee@mail.gmail.com>
	<5BA80A44-4A88-46D8-B945-7BB4C82E308E@uiuc.edu>
	<469DB6C6.9010702@pasteur.fr>
Message-ID: <ebf5eb170707181352v4d59ec81kfb6f706ca4643cc7@mail.gmail.com>

The output from InterProScan, invoked thusly:

iprscan -cli -seqtype p -i input_file -o output_file -format xml

On 7/18/07, Emmanuel Quevillon <tuco at pasteur.fr> wrote:
> Hi guys,
>
> I read your email and I wondered which iprscan file you've
> been talking about? Is it the file produced by InterProScan
> or the file called match.xml representing the whole uniprot
> database against InterPro? Reading the xml parser
> implemented into Bio::SeqIO::interpro, I guess it is the
> second one?
> In such case, I just want to let you know that the xml
> schema changed and the file name also. It is now called
> match_complete.xml.
> I attached the DTD to be able to see the new structure.
> Here is an example of the new data representation.
>
>
> <protein id="A0A000" name="A0A000_9ACTO" length="394"
> crc64="F1DD0C1042811B48">
>      <match id="G3DSA:3.40.640.10"
> name="PyrdxlP-dep_Trfase_major_sub1" dbname="GENE3D"
> status="T" evd="HMMPfam">
>        <ipr id="IPR015421" name="Pyridoxal
> phosphate-dependent transferase, major region, subdomain 1"
> type="Domain" />
>        <lcn start="52" end="288" score="4.30000170645879E-75" />
>      </match>
>      <match id="PTHR13693:SF7" name="PTHR13693:SF7"
> dbname="PANTHER" status="T" evd="not_rel">
>        <lcn start="33" end="389" score="0.0" />
>      </match>
> </protein>
>
> As you can see, sometimes there is no InterPro info (no ipr
> element).
>
> I think it would be good to update the interpro parser as well?
>
> Regards
>
> Emmanuel
>
> Chris Fields wrote:
> > On Jul 17, 2007, at 1:23 PM, Mark Johnson wrote:
> >
> >> I'm tinkering with parsing iprscan reports with BioPerl.  I noticed
> >> that this:
> >>
> >>   my $seqio = Bio::SeqIO->new(-file => $iprscan_file, -format =>
> >> 'interpro');
> >>
> >>   while (my $seq = $seqio->next_seq()) {
> >>       ...
> >>   }
> >>
> >> Does not work unless I first 'use XML::DOM::XPath'.  I get this error:
> >>
> >>   Can't locate object method "findnodes" via package
> >> "XML::DOM::Document" at
> >> bioperl-cvs/bioperl-live//Bio/SeqIO/interpro.pm line 136, <GEN0> line
> >> 30.
> >>
> >> I see that Bio::SeqIO has 'use XML::DOM', but that doesn't seem to
> >> suck in XML::DOM::XPath.  I see that t/interpro.t requires
> >> XML::DOM::XPath:
> >>
> >> test_begin(-tests => 17,
> >>                 -requires_module => 'XML::DOM::XPath');
> >>
> >> I suppose the reason the test specifies XML::DOM::XPath as required is so
> >> that tests can be skipped if XML::DOM::XPath is not available.
> >> Shouldn't Bio::SeqIO::interpro 'use XML::DOM::XPath', though?
> >
> > You're right; I think tests passed b/c XML::DOM::XPath (if present),
> > was eval'd as a required module.  When I commented out the spot where
> > it is eval'd in the test suite I can replicate this error.  I have
> > added 'use XML::DOM::XPath' to SeqIO::interpro now in CVS and it
> > passes fine.
> >
> > Thanks for the heads up!
> >
> > chris
>
>


From cain.cshl at gmail.com  Wed Jul 18 22:47:53 2007
From: cain.cshl at gmail.com (Scott Cain)
Date: Wed, 18 Jul 2007 22:47:53 -0400
Subject: [Bioperl-l] extract information with Bio::DB::GFF3 from	gff3
	file
In-Reply-To: <008801c7c9ad$1fa5c030$ed07a8c0@BGI.LOCAL>
References: <005001c7c900$3a403d60$ed07a8c0@BGI.LOCAL>
	<1184764240.2570.31.camel@localhost.localdomain>
	<008801c7c9ad$1fa5c030$ed07a8c0@BGI.LOCAL>
Message-ID: <1184813273.2570.96.camel@localhost.localdomain>

[Please always reply to the mailing list so that answers can be archived]


Yes, because commas are not allowed in GFF3 in an unescaped form.
Essentially, you are doing this with your GFF3:

  Name=receptor kinase ORK10;Name= putative

and when you do this:

  my ($name) = $gene->attributes('Name');

you are getting the first item in the list of names, and I suspect which
one you get is random.

To fix it, you need to replace the comma with %2C (the URL escape code
for a comma).  If you generated this GFF3, you will need to add a step
to URI encode your attribute strings.  If you got it from someone else,
you should point out to them that their GFF is flawed.

Scott
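
The escaping step described above can be sketched in plain Perl.  The helper
names escape_gff3_value and split_gff3_values are made up for this example,
and the set of escaped characters (percent, semicolon, equals, ampersand,
comma and control characters) is an assumption based on the usual GFF3
reserved characters, so check it against the spec before relying on it.

```perl
use strict;
use warnings;

# Percent-encode characters that have a special meaning inside a GFF3
# attribute value (plus '%' itself and control characters).
sub escape_gff3_value {
    my ($value) = @_;
    $value =~ s/([%;=&,\x00-\x1f\x7f])/sprintf('%%%02X', ord($1))/ge;
    return $value;
}

# Split a raw attribute value on unescaped commas, then decode each piece.
# This mimics how a multi-valued attribute is read.
sub split_gff3_values {
    my ($raw) = @_;
    my @values;
    for my $piece (split /,/, $raw) {
        $piece =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
        push @values, $piece;
    }
    return @values;
}

my $name      = 'receptor kinase ORK10, putative';
my @unescaped = split_gff3_values($name);      # comma splits the value
my $escaped   = escape_gff3_value($name);
my @roundtrip = split_gff3_values($escaped);   # comma survives as data

print "unescaped: ", scalar(@unescaped), " value(s)\n";
print "escaped:   $escaped\n";
print "roundtrip: ", scalar(@roundtrip), " value(s): $roundtrip[0]\n";
```

With the unescaped name the reader sees two values; after escaping, the
attribute becomes Name=receptor kinase ORK10%2C putative and reads back as a
single value with the comma intact.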


On Thu, 2007-07-19 at 10:32 +0800, Xianran Li wrote:
> However, $name returns the string "putative" rather than "receptor kinase ORK10". Is there any particular reason?
> 
> 
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> Assuming $gene is a Bio::DB::GFF::Feature object (there is no such thing
> as Bio::DB::GFF3), then you can use the attributes method to get
> anything in the ninth column:
> 
>   my ($name) = $gene->attributes('Name');
> 
> The parentheses are needed around $name because the attributes method
> returns a list and the parens capture the first item of the list into
> $name.
> 
> Scott
> 
> 
> On Wed, 2007-07-18 at 13:55 +0800, Xianran Li wrote:
> > Hi,
> > 
> > I want to extract some information from a gff3 file like:
> > 
> > 12001 . gene 854759 857385 . - . ID=12001.t00153;Name=receptor kinase ORK10, putative
> >    
> > The gene position can be retrieved as $gene->start, but how can I get the annotation information (receptor kinase ORK10)?
> > 
> > Thanks for your help.
> > 
> > 
> > Xianran Li
> ----- Original Message ----- 
> From: "Scott Cain" <cain.cshl at gmail.com>
> To: "Xianran Li" <xianranli78 at yahoo.com.cn>
> Cc: <bioperl-l at lists.open-bio.org>
> Sent: Wednesday, July 18, 2007 9:10 PM
> Subject: Re: [Bioperl-l] extract information with Bio::DB::GFF3 fromgff3 file
> 
> 
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory


From acutter at eeb.utoronto.ca  Thu Jul 19 22:25:08 2007
From: acutter at eeb.utoronto.ca (Asher Cutter)
Date: Thu, 19 Jul 2007 22:25:08 -0400
Subject: [Bioperl-l] tree comparisons with bioperl
Message-ID: <46A01D04.5040209@eeb.utoronto.ca>

I was reading over the functions for working with trees in bioperl. I am 
looking for something that will compare two topologies and report back 
if they are equivalent. i.e. something like:

does (a,(b,c)) == ((A,B),C) ? (in this case, no)

But of course in reality they would be more complicated topologies. This 
would be useful for simulating random trees to compare with some given 
topology of interest.

I saw the methods for testing for monophyly and paraphyly, but not much 
beyond that...perhaps I have missed something?

Any suggestions?

Thanks,
Asher

-- 

___________________________________
Asher D. Cutter
Assistant Professor
Department of Ecology & Evolutionary Biology
University of Toronto
25 Harbord St.
Toronto, ON, M5S 3G5

tel: 416-978-4602
email: acutter at eeb.utoronto.ca
http://www.eeb.utoronto.ca/faculty/faculty_profile.cfm?prof_id=130
___________________________________
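
One way to approach the comparison asked about above, sketched in plain Perl
rather than with BioPerl tree objects: rewrite each Newick string into a
canonical form by sorting the children of every node, then compare the
strings.  The function names are made up for this example, and the sketch
assumes rooted trees, case-sensitive leaf labels, and Newick strings without
branch lengths or internal node labels.

```perl
use strict;
use warnings;

# Recursively parse one node starting at $$pos_ref and return its
# canonical string (children sorted alphabetically).
sub _canonical_node {
    my ($s, $pos_ref) = @_;
    if (substr($s, $$pos_ref, 1) eq '(') {
        $$pos_ref++;    # skip '('
        my @children;
        while (1) {
            push @children, _canonical_node($s, $pos_ref);
            my $c = substr($s, $$pos_ref++, 1);
            last if $c eq ')';
            die "malformed tree near position $$pos_ref\n" unless $c eq ',';
        }
        return '(' . join(',', sort { $a cmp $b } @children) . ')';
    }
    # Otherwise this must be a leaf label.
    substr($s, $$pos_ref) =~ /^([^(),]+)/
        or die "expected a leaf label at position $$pos_ref\n";
    $$pos_ref += length $1;
    return $1;
}

# Canonical form of a whole Newick string (whitespace and ';' ignored).
sub canonical_topology {
    my ($newick) = @_;
    (my $s = $newick) =~ s/\s+//g;
    $s =~ s/;\z//;
    my $pos   = 0;
    my $canon = _canonical_node($s, \$pos);
    die "trailing characters after tree\n" if $pos != length $s;
    return $canon;
}

# Two rooted topologies are equal if their canonical forms are equal.
sub same_topology {
    my ($t1, $t2) = @_;
    return canonical_topology($t1) eq canonical_topology($t2) ? 1 : 0;
}

for my $other ('((c,b),a);', '((a,b),c);') {
    my $verdict = same_topology('(a,(b,c));', $other) ? 'same' : 'different';
    print "(a,(b,c)); vs $other $verdict\n";
}
```

The first comparison reports "same" (only the child order differs) and the
second "different".  Comparing unrooted trees would need a different
approach, for example comparing the sets of bipartitions.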


From jeevitesh at ibab.ac.in  Fri Jul 20 00:25:22 2007
From: jeevitesh at ibab.ac.in (jeevitesh at ibab.ac.in)
Date: Fri, 20 Jul 2007 09:55:22 +0530 (IST)
Subject: [Bioperl-l] Please Help us - regarding TreeIO for Shared Distance
Message-ID: <53244.192.168.1.125.1184905522.squirrel@webmail.ibab.ac.in>

Hi Friends,

We need your valuable help in finding the SHARED PATH BETWEEN TWO PAIRS OF NODES
OF A TREE.

The Distance method of TreeIO in the Bioperl module gives the total distance.

But we NEED TO FIND THE SHARED (COMMON) DISTANCE BETWEEN TWO PAIRS OF NODES, as
illustrated in the figure.

Suppose we have a tree
    A                C
     \              /
      \2          2/
       \__________/
       /    6     \
      /2          2\
     /              \
    B                D

The shared path between AB and AC is 2.
and for AC and BD the shared path is 6

Any comment on this will be greatly appreciated.

With Thanks & regards
jeevitesh


From n.haigh at sheffield.ac.uk  Sun Jul 22 07:34:58 2007
From: n.haigh at sheffield.ac.uk (Nathan S Haigh)
Date: Sun, 22 Jul 2007 12:34:58 +0100
Subject: [Bioperl-l] Ohloh account for bioperl
In-Reply-To: <8f200b4c0707171249n193c5d3fi2aa1cb8b6c102ef7@mail.gmail.com>
References: <8f200b4c0707171249n193c5d3fi2aa1cb8b6c102ef7@mail.gmail.com>
Message-ID: <46A340E2.4040505@sheffield.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Steve Chervitz wrote:
> I came across a web app that tracks various metrics for open source
> projects, noticed that bioperl wasn't listed, and added it:
> 
> http://www.ohloh.net/projects/6685
> 
> Seems like an interesting resource that could help add some
> visibility. It creates metrics by directly processing the source code
> repository. I hooked it up to the CVS repos for bioperl-live, -db,
> -run, and -pipeline. It has yet to do its analysis at this point.
> 
> Feel free to create Ohloh accounts for yourselves. When you add
> yourself as a contributor to Bioperl, you can indicate the username
> associated with your commits, but this requires that it first process
> the commit logs to figure out what the usernames are. You can still
> create an account, just update it later with your username.
> 
> Steve

Nice to see the graphs of number of commits each developer has made over
the last 5 years and how new developers have arisen while those more
"seasoned" developers can relax a little more -proof of an excellent
open source project!

Nath
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGo0Dih5z4PPfwHQoRAua4AJ9nxDJeqAZIbyv0M3g+6Y2xWzkEEgCgnHBO
4JWvG5Gy+H/UqpeXYAcSCX0=
=LrFt
-----END PGP SIGNATURE-----


From cjfields at uiuc.edu  Sun Jul 22 23:53:48 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 22 Jul 2007 22:53:48 -0500
Subject: [Bioperl-l] run megablast
In-Reply-To: <m1r6n66or9.fsf@tuebingen.mpg.de>
References: <m1r6n66or9.fsf@tuebingen.mpg.de>
Message-ID: <1D20EBF8-3E8E-49DD-A66F-4690B4094F7C@uiuc.edu>

StandAloneBlast doesn't run the megablast executable directly, though I  
think you can specify a MegaBlast search using blastall with the '-n'  
flag.

We could probably add this functionality in fairly easily since  
SearchIO can parse megablast output; no one's had the need to code it  
yet.

chris

On Jul 18, 2007, at 4:32 AM, Georg Otto wrote:

>
> Hi,
>
> is there a module to run megablast in a script (equivalent to ncbi
> blast in StandAloneBlast.pm)?
>
> Cheers,
>
> Georg
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From jeevitesh at ibab.ac.in  Mon Jul 23 06:34:36 2007
From: jeevitesh at ibab.ac.in (jeevitesh at ibab.ac.in)
Date: Mon, 23 Jul 2007 16:04:36 +0530 (IST)
Subject: [Bioperl-l] Please Help us - regarding TreeIO for Shared Distance
Message-ID: <34830.192.168.1.125.1185186876.squirrel@webmail.ibab.ac.in>

Hi Friends,

We need your valuable help in finding the SHARED PATH BETWEEN TWO PAIRS OF NODES
OF A TREE.

The Distance method of TreeIO in the Bioperl module gives the total distance.

But we NEED TO FIND THE SHARED (COMMON) DISTANCE BETWEEN TWO PAIRS OF NODES, as
illustrated in the figure.

Suppose we have a tree
    A                C
     \              /
      \2          2/
       \__________/
       /    6     \
      /2          2\
     /              \
    B                D

The shared path between AB and AC is 2.
and for AC and BD the shared path is 6.

We need to find the shared distance as said above.

Kindly help us; it will help our research a lot.

With Thanks & regards
jeevitesh


From bix at sendu.me.uk  Mon Jul 23 07:08:23 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 23 Jul 2007 12:08:23 +0100
Subject: [Bioperl-l] Please Help us - regarding TreeIO for Shared
	Distance
In-Reply-To: <34830.192.168.1.125.1185186876.squirrel@webmail.ibab.ac.in>
References: <34830.192.168.1.125.1185186876.squirrel@webmail.ibab.ac.in>
Message-ID: <46A48C27.6060905@sendu.me.uk>

jeevitesh at ibab.ac.in wrote:
> Hi Friends,
> 
> We need your valuable help in finding the SHARED PATH BETWEEN TWO NODES PAIR OF
> A TREE.

Please stop sending this message. We heard you the first time. If no one 
answered, either no one knows the answer or no one understood you.


> The Distance method of TreeIO in Bioperl module gives the total distance.
> 
> But we NEED TO FIND THE SHARED(COMMON) DISTANCE BETWEEN TWO PAIR OF NODES as
> illustrated
> in figure.
> 
> Suppose we have a tree
>     A                C
>      \              /
>       \2          2/
>        \__________/
>        /    6     \
>       /2          2\
>      /              \
>     B                D
> 
> The shared path between AB and AC is 2.
> and for AC and BD the shared path is 6.

I don't follow. But if you already know how to work the answer out, 
describe the algorithm in words and maybe someone can code it up for you.
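
One reading of the question that reproduces both numbers in the figure: take
the path between the first pair of leaves and the path between the second
pair, and sum the lengths of the branches that lie on both.  The sketch below
does this in plain Perl on a hand-built copy of the example tree.  The
internal node names X and Y and the function names are made up, and this
interpretation is an assumption, not something confirmed by the poster.

```perl
use strict;
use warnings;

# The tree from the figure as an undirected edge list.  X joins A and B,
# Y joins C and D (both names are invented for this example).
my @edges = (
    [ 'A', 'X', 2 ], [ 'B', 'X', 2 ],
    [ 'C', 'Y', 2 ], [ 'D', 'Y', 2 ],
    [ 'X', 'Y', 6 ],
);

my %adj;
for my $e (@edges) {
    my ($u, $v, $len) = @$e;
    $adj{$u}{$v} = $len;
    $adj{$v}{$u} = $len;
}

# Return the branches on the unique path between two nodes as a hash:
# "u|v" (endpoints sorted) => branch length.
sub path_edges {
    my ($from, $to) = @_;
    my %prev  = ($from => undef);
    my @stack = ($from);
    while (@stack) {
        my $node = pop @stack;
        last if $node eq $to;
        for my $next (keys %{ $adj{$node} }) {
            next if exists $prev{$next};
            $prev{$next} = $node;
            push @stack, $next;
        }
    }
    die "no path from $from to $to\n" unless exists $prev{$to};

    # Walk back from $to to $from, recording each branch.
    my %on_path;
    for (my $n = $to; defined $prev{$n}; $n = $prev{$n}) {
        my @ends = sort { $a cmp $b } ($n, $prev{$n});
        $on_path{ join '|', @ends } = $adj{$n}{ $prev{$n} };
    }
    return \%on_path;
}

# Sum the lengths of branches that occur on both paths.
sub shared_distance {
    my ($pair1, $pair2) = @_;
    my $p1 = path_edges(@$pair1);
    my $p2 = path_edges(@$pair2);
    my $total = 0;
    for my $edge (keys %$p1) {
        $total += $p1->{$edge} if exists $p2->{$edge};
    }
    return $total;
}

print "AB vs AC: ", shared_distance([ 'A', 'B' ], [ 'A', 'C' ]), "\n";
print "AC vs BD: ", shared_distance([ 'A', 'C' ], [ 'B', 'D' ]), "\n";
```

This prints 2 for AB vs AC (only the branch above A is common) and 6 for AC
vs BD (only the central branch is common), matching the figure.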


From georg.otto at tuebingen.mpg.de  Mon Jul 23 09:56:46 2007
From: georg.otto at tuebingen.mpg.de (Georg Otto)
Date: Mon, 23 Jul 2007 15:56:46 +0200
Subject: [Bioperl-l] run megablast
References: <m1r6n66or9.fsf@tuebingen.mpg.de>
	<1D20EBF8-3E8E-49DD-A66F-4690B4094F7C@uiuc.edu>
Message-ID: <m11weznrz5.fsf@tuebingen.mpg.de>

Thanks a lot! I guess I should have read the blast documentation more
carefully....

Best,

Georg

Chris Fields <cjfields at uiuc.edu> writes:
> StandAloneBlast doesn't run the megablast executable directly, though I  
> think you can specify a MegaBlast search using blastall with the '-n'  
> flag.
>
> We could probably add this functionality in fairly easily since  
> SearchIO can parse megablast output; no one's had the need to code it  
> yet.
>
> chris
>
> On Jul 18, 2007, at 4:32 AM, Georg Otto wrote:
>
>>
>> Hi,
>>
>> is there a module to run megablast in a script (equivalent to ncbi
>> blast in StandAloneBlast.pm)?
>>
>> Cheers,
>>
>> Georg
>>


From cjfields at uiuc.edu  Mon Jul 23 11:41:35 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 23 Jul 2007 10:41:35 -0500
Subject: [Bioperl-l] Bio::Assembly bug/feature?
Message-ID: <52744D70-CED6-49DB-8A17-0998F125D9AD@uiuc.edu>

To all:

I think I have found a major problem with Bio::Assembly; this was  
first noticed on Mac OS X in relation to bug 2320 and  
Bio::Assembly::IO.  I am uncertain whether this is meant to be a  
feature or a bug but it certainly needs to be documented or fixed as  
it leads to subtle errors.  I also can't see the advantage of this  
approach, but maybe I can be enlightened?  Either way, I think it's  
worth a discussion for those willing to follow.  I'll add as a bug  
later if needed.

A bit of background: each instance of a Bio::Assembly::Contig has a  
Bio::SeqFeature::Collection instance attached to it; each  
Bio::SeqFeature::Collection itself has a tied DB_File handle attached  
which remains open during the lifetime of the Bio::SF::Collection  
object.  When using Bio::Assembly one adds the various Contig objects  
to a Bio::Assembly::Scaffold.  So, for instance, if one had ~1000  
Contigs in a Scaffold, one would also have ~1000 open tied db  
handles, one per Contig instance.  So far, so good.

Unfortunately, when adding a ton of Contig objects to a  
Bio::Assembly::Scaffold one can run into a host of system-dependent  
issues based on resource usage limits (as one might expect).  This  
script:

------------------------------
use Bio::Assembly::Scaffold;
use Bio::Assembly::Contig;
use Bio::SeqFeature::Generic;

my $scaffold = Bio::Assembly::Scaffold->new();

for my $id (1..15000) {
     print "Contig #$id\n";
     my $contig = Bio::Assembly::Contig->new(-id => $id);
     my $feat = Bio::SeqFeature::Generic->new(-start=>1,
                                            -end=>10,
                                            -strand=>1);
     $contig->add_features([$feat]);
     $scaffold->add_contig($contig);
}
------------------------------

may fail on Mac OS X once the process reaches its maximum number of open file
descriptors (on UNIX'y systems, this is 'ulimit -n'); the call to tie the
DB_File handle in SF::Collection fails silently, so when the handle is used
later on you get the following:

...
Contig #251
Contig #252
Contig #253
Contig #254
Can't call method "put" on an undefined value at /Users/cjfields/src/bioperl-live/Bio/SeqFeature/Collection.pm line 225.

I have added an exception to catch this.  On Mac OS X you can  
increase the file descriptor limit using ulimit, at least to a  
certain point.  However, when testing this out on dev.open-bio.org  
(Linux) the 'tie' sometimes fails (and the exception pops up), but it  
isn't dependent on 'ulimit -n'.  This is what happens more often:

...
Contig #10567
Contig #10568
Contig #10569
Contig #10570
Out of memory!

Sometimes followed by a seg fault.  Ick!

Any ideas? For instance, should we set this up so that one  
SF::Collection is used for all the Contigs (since each one has a  
unique ID anyway)?  Leave as is and document/track the issue as a  
bug?  Both?

chris
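
The silent failure described above comes from tie returning a false value
without dying, so the problem only shows up later when a method is called on
the missing object.  The sketch below imitates that in plain Perl.
Limited::Store is a made-up tie class that refuses to create more than three
stores, standing in for DB_File hitting a file-descriptor limit; it is not
the Bio::SeqFeature::Collection code.

```perl
use strict;
use warnings;

# Hypothetical tie class that simulates a per-process resource limit.
package Limited::Store;

my $open = 0;
our $MAX_OPEN = 3;

sub TIEHASH {
    my ($class) = @_;
    return undef if $open >= $MAX_OPEN;    # simulate "too many open files"
    $open++;
    return bless { data => {} }, $class;
}
sub STORE { $_[0]{data}{ $_[1] } = $_[2] }
sub FETCH { $_[0]{data}{ $_[1] } }

package main;

# Create one store and fail loudly if the tie did not succeed, instead of
# returning an undefined object to the caller.
sub new_store {
    my %h;
    my $obj = tie %h, 'Limited::Store';
    die "could not tie store (resource limit reached?)\n" unless $obj;
    return $obj;
}

my @stores;
my $error = '';
for my $id (1 .. 5) {
    my $store = eval { new_store() };
    if (!$store) {
        $error = $@;
        last;
    }
    push @stores, $store;    # keep every store alive, like contigs in a scaffold
}

print "created ", scalar(@stores), " stores; then: $error";
```

This prints "created 3 stores; then: could not tie store (resource limit
reached?)".  Checking the result of tie at creation time reports the failure
where it happens rather than at the first later use.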


From ba6450 at wayne.edu  Mon Jul 23 16:06:14 2007
From: ba6450 at wayne.edu (Munirul Islam)
Date: Mon, 23 Jul 2007 16:06:14 -0400 (EDT)
Subject: [Bioperl-l] error running codeml
Message-ID: <20070723160614.EEU90041@mirapointms6.wayne.edu>

Hello everyone:

I am new to bioperl.  I am running perl in Eclipse in Windows.  Here is the code:

[code]
use Bio::Tools::Run::Phylo::PAML::Codeml;
use Bio::AlignIO;
use Bio::TreeIO;

my $alignio = Bio::AlignIO->new(-format => 'phylip',
			         -file   => 'NM_000034.CDSalign.paml');

my $aln = $alignio->next_aln;

my $treeio = Bio::TreeIO->new(-format => 'newick', -file => 'tree.txt');
my $tree   = $treeio->next_tree;

my $codeml = Bio::Tools::Run::Phylo::PAML::Codeml->new();

$codeml->alignment($aln);
$codeml->tree($tree);

my ($rc,$parser) = $codeml->run();
my $result = $parser->next_result;
my $MLmatrix = $result->get_MLmatrix();
print "Ka = ", $MLmatrix->[0]->[1]->{'dN'},"\n";
print "Ks = ", $MLmatrix->[0]->[1]->{'dS'},"\n";
print "Ka/Ks = ", $MLmatrix->[0]->[1]->{'omega'},"\n";
[/code]

It gives the following error when I try to compile:

[error]
------------ EXCEPTION: Bio::Root::Exception -------------
MSG: unable to find or run executable for 'codeml'
STACK: Error::throw
STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359
STACK: C:/Perl/site/lib/Bio/Tools/Run/Phylo/PAML/Codeml.pm:572
-----------------------------------------------------------
Can't remove directory C:\DOCUME~1\MUNIRU~1\LOCALS~1\Temp\SqSqwJKDLI (Permission denied) at C:/Perl/lib/File/Temp.pm line 898
[/error]

Any idea, guys?

Munirul Islam
Phd Student
Computer Science
Wayne State University


From arareko at campus.iztacala.unam.mx  Mon Jul 23 17:19:24 2007
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Mon, 23 Jul 2007 16:19:24 -0500
Subject: [Bioperl-l] error running codeml
In-Reply-To: <20070723160614.EEU90041@mirapointms6.wayne.edu>
References: <20070723160614.EEU90041@mirapointms6.wayne.edu>
Message-ID: <46A51B5C.9080808@campus.iztacala.unam.mx>

Apparently, your script isn't able to locate the codeml executable in 
your Windows environment. Do you have the PAML package installed? 
Instructions on how to install it are located here:

http://abacus.gene.ucl.ac.uk/software/paml.html

Regards,
Mauricio.

Munirul Islam wrote:
> Hello everyone:
> 
> I am new to bioperl.  I am running perl in Eclipse in Windows.  Here is the code:
> 
> [code]
> use Bio::Tools::Run::Phylo::PAML::Codeml;
> use Bio::AlignIO;
> use Bio::TreeIO;
> 
> my $alignio = Bio::AlignIO->new(-format => 'phylip',
> 			         -file   => 'NM_000034.CDSalign.paml');
> 
> my $aln = $alignio->next_aln;
> 
> my $treeio = Bio::TreeIO->new(-format => 'newick', -file => 'tree.txt');
> my $tree   = $treeio->next_tree;
> 
> my $codeml = Bio::Tools::Run::Phylo::PAML::Codeml->new();
> 
> $codeml->alignment($aln);
> $codeml->tree($tree);
> 
> my ($rc,$parser) = $codeml->run();
> my $result = $parser->next_result;
> my $MLmatrix = $result->get_MLmatrix();
> print "Ka = ", $MLmatrix->[0]->[1]->{'dN'},"\n";
> print "Ks = ", $MLmatrix->[0]->[1]->{'dS'},"\n";
> print "Ka/Ks = ", $MLmatrix->[0]->[1]->{'omega'},"\n";
> [/code]
> 
> It gives the following error when I try to compile:
> 
> [error]
> ------------ EXCEPTION: Bio::Root::Exception -------------
> MSG: unable to find or run executable for 'codeml'
> STACK: Error::throw
> STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359
> STACK: C:/Perl/site/lib/Bio/Tools/Run/Phylo/PAML/Codeml.pm:572
> -----------------------------------------------------------
> Can't remove directory C:\DOCUME~1\MUNIRU~1\LOCALS~1\Temp\SqSqwJKDLI (Permission denied) at C:/Perl/lib/File/Temp.pm line 898
> [/error]
> 
> Any idea, guys?
> 
> Munirul Islam
> Phd Student
> Computer Science
> Wayne State University
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From ba6450 at wayne.edu  Mon Jul 23 19:53:22 2007
From: ba6450 at wayne.edu (Munirul Islam)
Date: Mon, 23 Jul 2007 19:53:22 -0400 (EDT)
Subject: [Bioperl-l] error running codeml
Message-ID: <20070723195322.EEV22403@mirapointms6.wayne.edu>

Thanks Mauricio. 

I needed to add an environment variable for the PAML directory.

$ENV{'PAMLDIR'} = 'c:\paml3.15\bin'; 

One question: I would like to save the temp files.  What modification do I need to make so that
$obj->save_tempfiles returns 1 within Codeml.pm?

Regards 

Munir



From jason at bioperl.org  Tue Jul 24 03:19:18 2007
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 24 Jul 2007 09:19:18 +0200
Subject: [Bioperl-l] error running codeml
In-Reply-To: <46A51B5C.9080808@campus.iztacala.unam.mx>
References: <20070723160614.EEU90041@mirapointms6.wayne.edu>
	<46A51B5C.9080808@campus.iztacala.unam.mx>
Message-ID: <8273f6c20707240019q1f5e55c9i79a3142a92e2be6e@mail.gmail.com>

When you initialize the Codeml object, just pass in -save_tempfiles:
my $codeml = Bio::Tools::Run::Phylo::PAML::Codeml->new(-save_tempfiles => 1);

OR do
$codeml->save_tempfiles(1);

You may want to set your TEMPDIR as well. You can print out where the tempdir
is located with
print $codeml->tempdir;
and I think you can get the temp outfile:
my $name = $codeml->outfile_name;
print "name is $name\n";

-jason


-- 
Jason Stajich
jason at bioperl.org
http://bioperl.org/wiki/User:Jason


From ba6450 at wayne.edu  Tue Jul 24 17:16:54 2007
From: ba6450 at wayne.edu (Munirul Islam)
Date: Tue, 24 Jul 2007 17:16:54 -0400 (EDT)
Subject: [Bioperl-l] error loading sequence
Message-ID: <20070724171654.EEX04380@mirapointms6.wayne.edu>

Hello everyone:

I am having a problem loading a sequence file from within a directory.

#############################################################
$dirname = "rundir";
opendir (DIR, $dirname) || die("can't open $dirname");
      
while (defined($file = readdir(DIR))) {
    next if $file =~ /^\.\.?$/;		# skip . and ..
    $abs_path = File::Spec->rel2abs( $file ) ;
    
    # gives a file not found exception for the following code
    my $alignio = Bio::AlignIO->new(-format => 'nexus',
				-file   => $abs_path);
    my $aln = $alignio->next_aln;
    @sequencenames -> $aln->_read_taxlabels;
	  		
    foreach $taxa (@sequencenames) {
	print $taxa . "\n";
    } 		
}        
#############################################################

Your suggestions please.

Regards,

Munirul Islam
PhD Student
Computer Science
Wayne State University
Detroit, Michigan, USA


From bix at sendu.me.uk  Tue Jul 24 18:39:33 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 24 Jul 2007 23:39:33 +0100
Subject: [Bioperl-l] error loading sequence
In-Reply-To: <20070724171654.EEX04380@mirapointms6.wayne.edu>
References: <20070724171654.EEX04380@mirapointms6.wayne.edu>
Message-ID: <46A67FA5.3070505@sendu.me.uk>

Munirul Islam wrote:
> Hello everyone:
> 
> I am having problem loading a sequence file from within a directory.  
> 
> #############################################################
> $dirname = "rundir";
> opendir (DIR, $dirname) || die("can't open $dirname");
>       
> while (defined($file = readdir(DIR))) {
>     next if $file =~ /^\.\.?$/;		# skip . and ..
>     $abs_path = File::Spec->rel2abs( $file ) ;
>     
>     # gives a file not found exception for the following code

This isn't a Bioperl problem. You're using the wrong File::Spec method. 
You want File::Spec->catfile($dirname, $file).
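
To illustrate the difference, here is a minimal sketch using only core Perl
modules. The helper name files_in_dir is made up for this example and is not
part of File::Spec or Bioperl. rel2abs($file) resolves the bare entry name
against the current working directory, so the directory you opened is lost;
catfile($dirname, $file) keeps it.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return the paths of all plain files directly
# inside $dirname, each one prefixed with the directory name.
sub files_in_dir {
    my ($dirname) = @_;
    opendir(my $dh, $dirname) or die "can't open $dirname: $!";
    my @paths;
    while (defined(my $file = readdir($dh))) {
        next if $file =~ /^\.\.?$/;    # skip . and ..
        # Join directory and entry name. rel2abs($file) would instead
        # resolve $file against the current working directory.
        my $path = File::Spec->catfile($dirname, $file);
        next unless -f $path;          # ignore subdirectories
        push @paths, $path;
    }
    closedir($dh);
    my @sorted = sort @paths;
    return @sorted;
}

# Usage: perl list_files.pl rundir
if (@ARGV) {
    print "$_\n" for files_in_dir($ARGV[0]);
}
```

Each returned path can then be handed to Bio::AlignIO->new(-file => $path, ...)
as in the original loop.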


From ba6450 at wayne.edu  Tue Jul 24 20:10:04 2007
From: ba6450 at wayne.edu (Munirul Islam)
Date: Tue, 24 Jul 2007 20:10:04 -0400 (EDT)
Subject: [Bioperl-l] error loading sequence
Message-ID: <20070724201004.EEX30791@mirapointms6.wayne.edu>

Thanks.  That worked nicely.  I need your suggestion to load codeml control data from a file.  Consider the following code:

-------------------------------------------------------------
my $codeml_null = Bio::Tools::Run::Phylo::PAML::Codeml->new(-save_tempfiles => 1,
-params =>	{'noisy' => 9,
		 'verbose' => 2,
		 'runmode' => 0,
		 'seqtype' => 1,
		 'CodonFreq' => 2,
		 'aaDist' => 0,
		 'model' => 2,
		 'NSsites' => 2,
		 'icode' => 0	});
-------------------------------------------------------------

I tried to modify it by passing a hash reference after loading the data from a file:

-------------------------------------------------------------
my $codeml_null = Bio::Tools::Run::Phylo::PAML::Codeml->new(-save_tempfiles => 1,
-params => \%hashlist );
-------------------------------------------------------------

That still didn't work.  Your suggestions, please.

Munir



From ba6450 at wayne.edu  Thu Jul 26 15:21:20 2007
From: ba6450 at wayne.edu (Munirul Islam)
Date: Thu, 26 Jul 2007 15:21:20 -0400 (EDT)
Subject: [Bioperl-l] Alignment works with codeml (not loading in bioperl)
Message-ID: <20070726152120.EFA94600@mirapointms6.wayne.edu>

Hello Everyone:

I have an alignment ('seq.txt').  It runs fine when I directly run codeml.  But when I try to load the alignment in bioperl, I get an error saying that the sequence is not interleaved.

my $alignio = Bio::AlignIO->new(-format => 'phylip',
				-file   => 'seq.txt');

I guess it's not in valid phylip format.

I tried to change 'seq.txt' to sequential format.  Still that didn't work.

Any suggestions on how to load 'seq.txt' in bioperl?  

Thanks,

Munir
PhD Student
Computer Science
Wayne State University
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: seq.txt
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070726/7c180f0b/attachment-0002.txt>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: seq.out
Type: application/octet-stream
Size: 24318 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070726/7c180f0b/attachment-0002.obj>

From jason at bioperl.org  Thu Jul 26 20:12:03 2007
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 26 Jul 2007 17:12:03 -0700
Subject: [Bioperl-l] Alignment works with codeml (not loading in bioperl)
In-Reply-To: <20070726152120.EFA94600@mirapointms6.wayne.edu>
References: <20070726152120.EFA94600@mirapointms6.wayne.edu>
Message-ID: <8273f6c20707261712o149fb884v2044421146e8bc24@mail.gmail.com>

You can try passing in -interleaved => 0 as another option when you
initialize your AlignIO object.

On 7/26/07, Munirul Islam <ba6450 at wayne.edu> wrote:
> Hello Everyone:
>
> I have an alignment ('seq.txt').  It runs fine when I directly run codeml.  But when I try to load the alignment in bioperl, I get an error saying that the sequence is not interleaved.
>
> my $alignio = Bio::AlignIO->new(-format => 'phylip',
>                                 -file   => 'seq.txt');
>
> I guess its not in valid phylip format.
>
> I tried to change 'seq.txt' to sequential format.  Still that didn't work.
>
> Any suggestions on how to load 'seq.txt' in bioperl?
>
> Thanks,
>
> Munir
> PhD Student
> Computer Science
> Wayne State University
>
>      11     2202
>
> human
> GCT GCC CCC AGA CAG CGC --- CAG TCC --- ACC CTT --- GTG CTC TTC CCT GGA GAC CTG
> AGA ACC NNN AAT CTC ACC GAC AGG CAG CTG GCA GAG GAA TAC CTG TAC CGC TAT GGT TAC
> ACT CGG GTG GCA GAG ATG CGT --- GGA GAG TCG AAA --- TCT CTG GGG --- CCT GCG CTG
> CTG CTT CTC CAG AAG CAA CTG TCC CTG CCC GAG ACC GGT GAG CTG GAT AGC GCC ACG CTG
> AAG GCC ATG CGA ACC CCA CGG TGC GGG GTC CCA GAC CTG GGC AGA TTC CAA ACC TTT GAG
> GGC GAC CTC AAG TGG CAC CAC CAC AAC ATC ACC TAT TGG ATC CAA AAC TAC TCG GAA GAC
> TTG CCG CGG GCG GTG ATT GAC GAC GCC TTT GCC CGC GCC TTC GCA CTG TGG AGC GCG GTG
> ACG CCG CTC ACC TTC ACT CGC GTG TAC AGC CGG GAC GCA GAC ATC GTC ATC CAG TTT GGT
> GTC GCG GAG CAC GGA GAC GGG TAT CCC TTC GAC GGG AAG GAC GGG CTC CTG GCA CAC GCC
> TTT CCT CCT GGC CCC GGC ATT CAG GGA GAC GCC CAT TTC GAC GAT GAC GAG TTG TGG TCC
> CTG GGC AAG GGC GTC GTG GTT CCA ACT CGG TTT GGA AAC GCA GAT GGC GCG GCC TGC CAC
> TTC CCC TTC ATC TTC GAG GGC CGC TCC TAC TCT GCC TGC ACC ACC GAC GGT CGC TCC GAC
> GGC TTG CCC TGG TGC AGT ACC ACG GCC AAC TAC GAC ACC GAC GAC CGG TTT GGC TTC TGC
> CCC AGC GAG AGA CTC TAC ACC CAG GAC GGC AAT GCT GAT GGG AAA CCC TGC CAG TTT CCA
> TTC ATC TTC CAA GGC CAA TCC TAC TCC GCC TGC ACC ACG GAC GGT CGC TCC GAC GGC TAC
> CGC TGG TGC GCC ACC ACC GCC AAC TAC GAC CGG GAC AAG CTC TTC GGC TTC TGC CCG ACC
> CGA GCT GAC TCG ACG GTG ATG GGG GGC AAC TCG GCG GGG GAG CTG TGC GTC TTC CCC TTC
> ACT TTC CTG GGT AAG GAG TAC TCG ACC TGT ACC AGC GAG GGC CGC GGA GAT GGG CGC CTC
> TGG TGC GCT ACC ACC TCG AAC TTT GAC AGC GAC AAG AAG TGG GGC TTC TGC CCG GAC CAA
> GGA TAC AGT TTG TTC CTC GTG GCG GCG CAT GAG TTC GGC CAC GCG CTG GGC TTA GAT CAT
> TCC TCA GTG CCG GAG GCG CTC ATG TAC CCT ATG TAC CGC TTC ACT GAG GGG CCC CCC TTG
> CAT AAG GAC GAC GTG AAT GGC ATC CGG CAC CTC TAT GGT CCT CGC CCT GAA CCT GAG CCA
> CGG CCT CCA ACC ACC ACC ACN NNN NNN NNN NNN NNA --- CCG CAG CCC ACG GCT CCC CCG
> ACG GTC TGC CCC ACC GGA CCC CCC ACT GTC CAC CCC TCA GAG CGC CCC ACA GCT GGC CCC
> ACA GGT CCC CCC TCA GCT GGC CCC ACA GGT NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN
> NNN NNN NNN NNN NNN NNN CCC --- CCC ACT GCT GGC CCT TCT ACG GCC ACT ACT --- GTG
> CCT TTG AGT CCG GTG GAC GAT GCC TGC AAC GTG AAC ATC TTC GAC GCC ATC GCG GAG ATT
> GGG AAC CAG CTG TAT TTG TTC AAG GAT GGG AAG TAC TGG --- --- CGA TTC TCT GAG ---
> --- GGC AGG GGG AGC CGG CCG CAG GGC CCC TTC CTT ATC GCC GAC AAG TGG CCC GCG CTG
> CCC CGC AAG CTG GAC TCG GTC TTT GAG GAG CGG CTC TCC AAG AAG CTT TTC TTC TTC TCT
> GGG CGC CAG GTG TGG GTG TAC ACA GGC GCG TCG GTG CTG GGC CCG AGG CGT CTG GAC AAG
> CTG GGC CTG GGA GCC GAC GTG GCC CAG GTG ACC GGG GCC CTC --- CGG AGT GGC AGG GGG
> AAG ATG CTG CTG TTC AGC GGG CGG CGC CTC TGG AGG TTC GAC GTG AAG GCG CAG ATG GTG
> GAT CCC CGG AGC GCC AGC GAG --- --- GTG GAC CGG ATG TTC CCC GGG GTG CCT TTG GAC
> ACG CAC GAC GTC TTC CAG TAC CGA --- GAG AAA GCC TAT --- TTC TGC CAG GAC CGC TTC
> TAC TGG CGC GTG AGT TCC CGG AGT NNN NNN NNN NNN NNN NNN NNN GAG TTG AAC CAG GTG
> GAC CAA GTG GGC TAC GTG ACC TAT GAC ATC CTG CAG TGC CCT
> chimp
> GCT GCC CCC AGA CAG CGC --- CAG TCC --- ACC CTT --- GTG CTC TTC CCT GGA GAC CTG
> AGA ACC NNN AAT CTC ACC GAC AGG CAG CTG GCA GAG GAA TAC CTG TAC CGC TAT GGT TAC
> ACT CGG GTG GCA GAG ATG CGT --- GGA GAG TCG AAA --- TCT CTG GGG --- CCT GCG CTG
> CTG CTT CTC CAG AAG CAA CTG TCC CTG CCC GAG ACC GGT GAG CTG GAC AGC GCC ACG CTG
> AAG GCC ATG CGA ACC CCA CGG TGC GGG GTC CCA GAC CTG GGC AAA TTC CAA ACC TTT GAG
> GGC GAC CTC AAG TGG CAC CAC CAC AAC ATC ACC TAT TGG ATC CAA AAC TAC TCG GAA GAC
> TTG CCG CGG GCG GTG ATT GAC GAC GCC TTT GCC CGC GCC TTC GCA CTG TGG AGC GCG GTG
> ACG CCG CTC ACC TTC ACT CGC GTG TAC AGC CGG GAC GCA GAC ATC GTC ATC CAG TTT GGT
> GTC GCG GAG CAC GGA GAC GGG TAT CCC TTC GAC GGG AAG GAC GGG CTC CTG GCA CAC GCC
> TTT CCT CCT GGC CCC GGC ATT CAG GGA GAC GCC CAT TTC GAC GAT GAC GAG TTG TGG TCC
> CTG GGC AAG GGC GTC GTG GTT CCA ACT CGG TTT GGA AAC GCA GAT GGC GCG GCC TGC CAC
> TTC CCC TTC ATC TTC GAG GGC CGC TCC TAC TCT GCC TGC ACC ACC GAC GGT CGC TCC GAC
> GGC TTG CCC TGG TGC AGT ACC ACG GCC AAC TAC GAC ACC GAC GAC CGG TTT GGC TTC TGC
> CCC AGC GAG AGA CTT TAC ACC CAG GAT GGC AAT GCT GAT GGG AAA CCC TGC CAG TTT CCA
> TTC ATC TTC CAA GGC CAA TCC TAC TCC GCC TGC ACC ACG GAC GGT CGC TCC GAC GGC TAC
> CGC TGG TGC GCC ACC ACC GCC AAC TAC GAC CGG GAC AAG CTC TTC GGC TTC TGC CCG ACC
> CGA GCT GAC TCG ACG GTG ATG GGG GGC AAC TCG GCG GGG GAG CTG TGC GTC TTC CCC TTC
> ACT TTC CTG GGT AAG GAG TAC TCG ACC TGT ACC AGC GAG GGC CGC GGA GAT GGG CGC CTC
> TGG TGC GCT ACC ACC TCG AAC TTT GAC AGC GAC AAG AAG TGG GGC TTC TGC CCG GAC CAA
> GGA TAC AGT TTG TTC CTC GTG GCG GCG CAT GAG TTC GGC CAC GCG CTG GGC TTA GAT CAT
> TCC TCA GTG CCG GAG GCG CTC ATG TAC CCT ATG TAC CGC TTC ACT GAG GGG CCC CCC TTG
> CAT AAG GAC GAC GTG AAT GGC ATC CGG CAC CTC TAT GGT CCT CGC CCT GAA CCT GAG CCA
> CGG CCT CCA ACC ACC ACC ACN NNN NNN NNN NNN NNA --- CCG CAG CCC ACG GCT CCC CCG
> ACG GTC TGC CCC ACC GGA CCC CCC ACT GTC CGC CCC TCA GAG CGC CCC ACA GCT GGC CCC
> ACA GGT CCC CCC NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN
> NNN NNN NNN NNN NNN NNN NNN --- NNN ACT GCT GGC CCT TCT ACG GCC ACT ACT --- GTG
> CCT TTG AGT CCG GTG GAC GAT GCC TGC AAC GTG AAC ATC TTC GAC GCC ATC GCG GAG ATT
> GGG AAC CAG CTG TAT TTG TTC AAG GAT GGG AAG TAC TGG --- --- CGA TTC TCT GAG ---
> --- GGC AGG GGG AGC CGG CCG CAG GGC CCC TTC CTT ATC GCC GAC AAG TGG CCC GCG CTG
> CCC CGC AAG CTG GAC TCG GTC TTT GAG GAG CCG CTC TCC AAG AAG CTT TTC TTC TTC TCT
> GGG CGC CAG GTG TGG GTG TAC ACA GGC GCG TCG GTG CTG GGC CCG AGG CGT TTG GAC AAG
> CTG GGC CTG GGC GCC GAC GTG GCC CAG GTG ACC GGG GCC CTC --- CGG AGT GGC AGG GGG
> AAG ATG CTG CTG TTC AGC GGG CGG CGC CTC TGG AGG TTC GAC GTG AAG GCG CAG ATG GTG
> GAT CCC CGG AGC GCC AGC GAG --- --- GTG GAC CGG ATG TTC CCC GGG GTG CCT TTG GAC
> ACG CAC GAC GTC TTC CAG TAC CGA --- GAG AAA GCC TAT --- TTC TGC CAG GAC CGC TTC
> TAC TGG CGC GTG AGT TCC CGG AAT NNN NNN NNN NNN NNN NNN NNN GAG TTG AAC CAG GTG
> GAC CAA GTG GGC TAC GTG ACC TAT GAC ATC CTG CAG TGC CCT
> macaca
> GCT GCC CCC AGA CAG CGC --- CAG TCC --- ACC CTT --- GTG CTC TTC CCT GGA GAC CTG
> AAA ACC NNN AAT CTC ACT GAC AGG CAG CTG GCA GAG GAC TAC CTG TAC CGC TAT GGT TAC
> ACT CGG GTG GCA GAG ATG CAT --- GGA GAC TCG AAA --- TCT CTG GGG --- CCT GCG CTG
> CTG CTT CTC CAG AAG CAA CTG TCC CTG CCC CAG ACC GGT GAG CTA GAC AGC GCC ACG CTG
> AAG GCC ATG CGA ACC CCA CGG TGC GGG GTC CCA GAC CTG GGC AGA TTC CAA ACC TTT GAG
> GGC GAC CTC AAG TGG CAC CAC CAC AAC ATC ACC TAT TGG ATC CAA AAC TAC TCG GAA GAC
> TTG CCG CGG GCG GTG ATT GAA GAC GCC TTT GCC CGC GCC TTT GCG CTG TGG AGC GCG GTG
> ACG CCG CTC ACC TTC ACT CGC GTG TAC AGC CGG GAC GCA GAC ATT GTC ATC CAG TTT GGT
> GTC GCG GAG CAC GGA GAC GGG TAT CCC TTC GAC GGG AAG GAC GGG CTC CTG GCA CAC GCC
> TTT CCT CCT GGG CCC GGC ATT CAG GGA GAC GCC CAT TTC GAC GAT GAC GAG TTG TGG TCG
> CTG GGC AAG GGC GTC GTG GTT CCA ACT AAG TTT GGA AAC GCA GAT GGC GCG GCC TGC CAC
> TTC CCC TTC ACC TTC GAG GGC CGC TCC TAC TCT GCC TGC ACC ACA GAC GGT CGC TCC GAC
> GGC GTG CCC TGG TGC AGT ACC ACA GCC AAC TAC GAC ACT GAC CGC CGG TTT GGC TTC TGT
> CCC AGC GAG AGA CTC TAC ACC CAG GAC GGC AAT GCT GAC GGG AAA CCC TGC CAG TTT CCA
> TTC ATC TTC CAA GGC CAA TCC TAC TCC GCC TGC ACC ACG GAC GGT CGC TCC GAC GGC TAC
> CGC TGG TGC GCC ACC ACC GCC AAC TAC GAC CAG GAC AAG CTC TAC GGC TTC TGC CCG ACC
> CGA GCC GAC TCG ACC GTG ATC GGG GGC AAC TCG GCG GGG GAG CTG TGC GTT TTC CCC TTC
> ACC TTC CTG GGT AAG GAG TAC TCG ACC TGT ACC AGC GAG GGC CGC GGA GAT GGG CGC CTC
> TGG TGC GCT ACC ACC TCG AAC TTT GAC AGA GAC AAG AAG TGG GGC TTC TGC CCG GAC CAA
> GGA TAC AGT CTG TTC CTC GTG GCA GCT CAC GAA TTC GGC CAC GCG CTG GGC TTA GAT CAT
> ACC TCA GTG CCG GAG GCG CTC ATG TAC CCT ATG TAC CGA TTC ACT GAG GAG CCC CCC TTG
> CAT AAG GAC GAC GTG AAT GGC ATC CAG TAT CTC TAT GGT TCT CGC CCT GAA CCT GAG CCA
> CGG CCT CCA ACC ACC ACC ACN NNN NNN NNN NNN NNA --- CCG CAG CCC ACG GCT CCC CCG
> ACG GTC TGC CCC ACT GGA CCC CCC ACT GTC CGC CCC TCA GAC CGC CCC ACA GCC GGC CCC
> ACA GGT CCC CCC TCA GCT GGC CCC ACA GGT NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN
> NNN NNN NNN NNN NNN NNN CCC --- CCC ACT GCT GGC CCT TCT ACG ACC ACT ACT --- GTG
> CCT TTG AAT CCG GTG GAC GAT GCC TGC AAC GTG AAC ATC TTC GAC GCC ATC ACG GAG ATC
> GGG AAC CAG CTG TAT CTG TTC AAG GAT GGG AGG TAC TGG --- --- CGA TTC TCC GAG ---
> --- CGC AGG GGG AGC CGG CTG CAG GGC CCC TTC CTT ATC GCC GAC ACG TGG CCC GCG TTG
> CCC CGC AAG CTG GAC TCG GCC TTT GAG GAG CCG CTC TCC AAG AAG CTT TTC TTC TTC TCT
> GGG CGC CAG GTG TGG GTG TAC ACA GGC GCG TCG GTG CTG GGC CCG AGG CGT CTA GAC AAG
> CTG GGC CTG GGC GCC GAC GTG GCC CAG GTG ACC GGG GCC CTC --- CGG CGT GGC GCG GGG
> AAG ATG CTG CTA TTC AGC GGG CGG CGC TTC TGG AGG TTC GAC GTG AAG GCG CAG ATG GTG
> GAT CCC CGG AGC GCC AGC GAG --- --- GTA GAC CGG ATG TTC CCC GGG GTG CCT TTG GAC
> ACG CAC GAC GTC TTC CAG TAC CAA --- GAG AAA GCC TAT --- TTC TGC CAG GAC CGC TTC
> TAC TGG CGC GTG AGT TTC CAG AGT NNN NNN NNN NNN NNN NNN NNN GGG GTG AAC CAG GTG
> GAC CAA GTG GGC TAC GTG ACC TAT GAC ATC CTG CAG TGC CCT
> mouse
> GCT GCC CCT TAC CAG CGC --- CAG CCG --- ACT TTT --- GTG GTC TTC CCC AAA GAC CTG
> AAA ACC TCC AAC CTC ACG GAC ACC CAG CTG GCA GAG GCA TAC TTG TAC CGC TAT GGT TAC
> ACC CGG GCC GCC CAG ATG ATG --- GGA GAG AAG CAG --- TCT CTA CGG --- CCG GCT TTG
> CTG ATG CTT CAG AAG CAG CTC TCC CTG CCC CAG ACT GGT GAG CTG GAC AGC CAG ACA CTA
> AAG GCC ATT CGA ACA CCA CGC TGT GGT GTC CCA GAC GTG GGT CGA TTC CAA ACC TTC AAA
> GGC NNN CTC AAG TGG GAC CAT CAT AAC ATC ACA TAC TGG ATC CAA AAC TAC TCT GAA GAC
> TTG CCG CGA GAC ATG ATC GAT GAC GCC TTC GCG CGC GCC TTC GCG GTG TGG GGC GAG GTG
> GCA CCC CTC ACC TTC ACC CGC GTG TAC GGA CCC GAA GCG GAC ATT GTC ATC CAG TTT GGT
> GTC GCG GAG CAC GGA GAC GGG TAT CCC TTC GAC GGC AAG GAC GGC CTT CTG GCA CAC GCC
> TTT CCC CCT GGC GCC GGC GTT CAG GGA GAT GCC CAT TTC GAC GAC GAC GAG TTG TGG TCG
> CTG GGC AAA GGC GTC GTG ATC CCC ACT TAC TAT GGA AAC TCA AAT GGT GCC CCA TGT CAC
> TTT CCC TTC ACC TTC GAG GGA CGC TCC TAT TCG GCC TGC ACC ACA GAC GGC CGC AAC GAC
> GGC ACG CCT TGG TGT AGC ACA ACA GCT GAC TAC GAT AAG GAC GGC AAA TTT GGT TTC TGC
> CCT AGT GAG AGA CTC TAC ACG GAG CAC GGC AAC GGA GAA GGC AAA CCC TGT GTG TTC CCG
> TTC ATC TTT GAG GGC CGC TCC TAC TCT GCC TGC ACC ACT AAA GGC CGC TCG GAT GGT TAC
> CGC TGG TGC GCC ACC ACA GCC AAC TAT GAC CAG GAT AAA CTG TAT GGC TTC TGC CCT ACC
> CGA GTG GAC GCG ACC GTA GTT GGG GGC AAC TCG GCA GGA GAG CTG TGC GTC TTC CCC TTC
> GTC TTC CTG GGC AAG CAG TAC TCT TCC TGT ACC AGC GAC GGC CGC AGG GAT GGG CGC CTC
> TGG TGT GCG ACC ACA TCG AAC TTC GAC ACT GAC AAG AAG TGG GGT TTC TGT CCA GAC CAA
> GGG TAC AGC CTG TTC CTG GTG GCA GCG CAC GAG TTC GGC CAT GCA CTG GGC TTA GAT CAT
> TCC AGC GTG CCG GAA GCG CTC ATG TAC CCG CTG TAT AGC TAC CTC GAG GGC TTC CCT CTG
> AAT AAA GAC GAC ATA GAC GGC ATC CAG TAT CTG TAT GGT CGT GGC TCT AAG CCT GAC CCA
> AGG CCT CCA GCC ACC ACC ACA ACT NNN NNN NNN GAA --- CCA CAG CCG ACA GCA CCT CCC
> ACT ATG TGT CCC ACT ATA CCT CCC ACG GCC TAT CCC ACA GTG GGC CCC ACG GTT GGC CCT
> ACA GGC GCC CCC TCA CCT GGC CCC ACA AGC AGC CCG TCA CCT GGC CCT ACA GGC GCC CCC
> TCA CCT GGC CCT ACA GCG CCC --- CCT ACT GCG GGC TCT TCT GAG GCC TCT ACA --- GAG
> TCT TTG AGT CCG GCA GAC AAT CCT TGC AAT GTG GAT GTT TTT GAT GCT ATT GCT GAG ATC
> CAG GGC GCT CTG CAT TTC TTC AAG GAC GGT TGG TAC TGG --- --- AAG TTC CTG AAT ---
> --- CAT AGA GGA AGC CCA TTA CAG GGC CCC TTC CTT ACT GCC CGC ACG TGG CCA GCC CTG
> CCT GCA ACG CTG GAC TCC GCC TTT GAG GAT CCG CAG ACC AAG AGG GTT TTC TTC TTC TCT
> GGA CGT CAA ATG TGG GTG TAC ACA GGC AAG ACC GTG CTG GGC CCC AGG AGT CTG GAT AAG
> TTG GGT CTA GGC CCA GAG GTA ACC CAC GTC AGC GGG CTT CTC CCG CGT CGT CTC --- GGG
> AAG GCT CTG CTG TTC AGC AAG GGG CGT GTC TGG AGA TTC GAC TTG AAG TCT CAG AAG GTG
> GAT CCC CAG AGC GTC ATT CGC --- --- GTG GAT AAG GAG TTC TCT GGT GTG CCC TGG AAC
> TCA CAC GAC ATC TTC CAG TAC CAA --- GAC AAA GCC TAT --- TTC TGC CAT GGC AAA TTC
> TTC TGG CGT GTG AGT TTC CAA AAT GAG GTG AAC AAG GTG GAC CAT GAG GTG AAC CAG GTG
> GAC GAC GTG GGC TAC GTG ACC TAC GAC CTC CTG CAG TGC CCT
> rat
> GCT GCC CCT CAC CAG CGC --- CAG CCG --- ACT TAT --- GTG GTC TTC CCC CGA GAC CTG
> AAA ACC TCC AAC CTC ACG GAC ACA CAG CTG GCA GAG GAT TAC CTG TAC CGC TAT GGT TAC
> ACT CGG GCA GCC CAG ATG ATG --- GGA GAG AAG CAG --- TCC CTG CGG --- CCC GCT TTG
> CTG ATG CTT CAG AAG CAG CTG TCC CTG CCC CAG ACT GGT GAG CTG GAC AGC GAG ACA CTA
> AAG GCC ATT CGT TCA CCG CGC TGT GGT GTC CCA GAC GTG GGC AAA TTC CAA ACC TTC GAA
> GGC GAC CTC AAG TGG CAC CAT CAT AAC ATC ACC TAT TGG ATC CAA AGC TAC ACC GAA GAC
> TTG CCG CGA GAC GTG ATC GAT GAC TCC TTC GCG CGC GCC TTC GCG GTG TGG AGC GCG GTG
> ACA CCG CTC ACC TTC ACC CGC GTG TAC GGG CTC GAA GCA GAC ATT GTC ATC CAG TTT GGT
> GTC GCG GAG CAC GGG GAC GGG TAT CCC TTC GAC GGC AAG GAT GGT CTA CTG GCA CAC GCC
> TTT CCC CCT GGC CCC GGC ATT CAG GGA GAT GCC CAC TTC GAC GAC GAC GAG TTG TGG TCG
> CTG GGC AAA GGC GCC GTG GTC CCC ACT TAC TTT GGA AAC GCA AAT GGT GCC CCA TGT CAC
> TTT CCC TTC ACC TTC GAG GGA CGC TCC TAT TTG TCC TGC ACC ACG GAT GGC CGC AAC GAC
> GGC AAG CCT TGG TGT GGC ACG ACA GCT GAC TAC GAC ACA GAC AGA AAA TAT GGT TTC TGC
> CCC AGT GAG AAT CTC TAC ACG GAG CAT GGC AAC GGA GAC GGC AAA CCC TGC GTA TTT CCA
> TTC ATC TTC GAG GGC CAC TCC TAC TCT GCC TGC ACC ACT AAA GGT CGC TCG GAT GGT TAT
> CGC TGG TGC GCC ACC ACC GCC AAC TAT GAC CAG GAT AAG CTG TAT GGC TTC TGT CCT ACT
> CGA GCC GAC GTC ACT GTA ACT GGG GGC AAC TCG GCA GGA GAG ATG TGC GTC TTC CCC TTC
> GTC TTC CTG GGC AAG CAG TAC TCT ACC TGT ACC GGC GAG GGC CGC AGT GAT GGG CGC CTC
> TGG TGC GCG ACG ACG TCG AAC TTC GAC GCT GAC AAG AAG TGG GGT TTC TGT CCA GAC CAA
> GGG TAC AGC CTG TTT CTG GTG GCA GCG CAC GAG TTC GGC CAT GCG CTG GGC TTA GAT CAT
> TCT TCA GTG CCG GAA GCG CTC ATG TAC CCC ATG TAT CAC TAC CAC GAG GAC TCC CCT CTG
> CAT GAA GAC GAC ATA AAA GGC ATC CAG CAT CTG TAT GGT CGT GGC TCT AAA CCT GAC CCA
> AGG CCT CCA GCC ACC ACC GCA GCT NNN NNN NNN GAA --- CCA CAG CCG ACA GCT CCT CCC
> ACT ATG TGT CCC ACT GCA CCT CCC ATG GCC TAT CCC ACA GGG GGC CCC ACA GTC GCC CCT
> ACA GGC GCC CCC TCA CCT GGC CCC ACA GGT NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN
> NNN NNN NNN NNN NNN NNN CCC --- CCT ACT GCT GGT CCT TCT GAG GCC CCT ACA --- GAG
> TCT TCG ACT CCA GTA GAC AAT CCT TGC AAT GTG GAT GTT TTT GAT GCC ATT GCT GAT ATC
> CAG GGC GCT CTG CAT TTC TTC AAG GAC GGT CGG TAT TGG --- --- AAG TTC TCG AAT ---
> --- CAC GGA GGA AGC CAA TTG CAG GGC CCC TTT CTT ATT GCC CGC ACG TGG CCA GCT TTG
> CCT GCA AAG TTG AAC TCA GCC TTT GAG GAT CCG CAG TCC AAG AAG ATT TTC TTC TTC TCT
> GGG CGC AAA ATG TGG GTG TAC ACA GGC CAG ACG GTG CTG GGC CCC AGG AGT CTG GAT AAG
> TTG GGG CTA GGC TCA GAG GTA ACC CTG GTC ACC GGA CTT CTC CCG CGT CGT GGA --- GGG
> AAG GCT CTG CTG ATC AGC CGG GAA CGT ATC TGG AAA TTC GAC TTG AAG TCT CAG AAG GTG
> GAT CCC CAG AGC GTT ACT CGC --- --- TTG GAT AAC GAG TTC TCT GGC GTG CCC TGG AAC
> TCA CAC AAC GTC TTT CAC TAC CAA --- GAC AAG GCC TAT --- TTC TGC CAT GAC AAA TAC
> TTC TGG CGT GTG AGT TTC CAC AAC NNN NNN NNN NNN NNN NNN NNN CGG GTG AAC CAG GTG
> GAC CAC GTG GCC TAC GTG ACC TAT GAC CTC CTG CAG TGC CCT
> rabbit
> GCC GCC CCT CGC CGC CGC --- CAG CCC --- ACC TTG --- GTG GTC TTC CCA GGA GAG CTG
> AGA ACC NNN AGG CTC ACC GAC AGG CAG CTG GCA GAG GAG TAC CTG TTC CGC TAT GGT TAC
> ACC CGC GTA GCC AGC ATG CAC --- GGA GAC AGC CAG --- TCC CTG CGG CTG CCG --- CTG
> CTA CTT CTG CAG AAG CAT CTG TCC CTG CCG GAG ACG GGG GAG CTG GAT AAT GCC ACC CTG
> GAG GCC ATG CGG GCC CCG CGC TGC GGC GTC CCG GAC GTG GGC AAA TTC CAG ACC TTC GAG
> GGT GAC CTC AAG TGG CAC CAC CAC AAC ATC ACG TAC TGG ATC CAA AAC TAC TCC GAA GAC
> CTG CCG CGC GAC GTC ATC GAC GAC GCC TTC GCC CGC GCC TTT GCG CTG TGG AGC GCG GTG
> ACG CCA CTC ACC TTC ACC CGC GTG TAC AGC CGG GAC GCA GAC ATT GTC ATC CAG TTT GGG
> GTC GCG GAG CAC GGA GAC GGG TAT CCT TTC GAC GGC AAG GAC GGG CTC CTG GCG CAC GCC
> TTC CCT CCT GGC CCC GGC ATT CAG GGA GAT GCC CAC TTC GAC GAC GAA GAG CTG TGG TCC
> CTG GGC AAG GGC GTC GTG GTT CCC ACG TAC TTT GGA AAC GCC GAC GGC GCC CCC TGC CAC
> TTC CCC TTC ACC TTC GAG GGC CGC TCC TAC ACC GCC TGC ACC ACG GAC GGC CGC TCT GAC
> GGC ATG GCC TGG TGC AGC ACC ACG GCC GAC TAT GAC ACC GAC CGT CGG TTT GGC TTC TGC
> CCC AGC GAA AGA CTC TAC ACC CAG GAC GGC AAC GCA GAC GGC AAG CCC TGC GAG TTT CCG
> TTC ATC TTC CAG GGC CGT ACC TAC TCC GCC TGC ACC ACG GAC GGC CGC TCC GAC GGC CAC
> CGC TGG TGC GCC ACC ACC GCC AGC TAC GAC AAG GAC AAG CTC TAC GGC TTC TGC CCG ACC
> CGA GCT GAC TCC ACG GTG GTC GGG GGC AAC TCG GCG GGA GAG CTG TGT GTC TTC CCC TTC
> GTC TTC CTG GGC AAA GAG TAC TCG TCC TGT ACC AGC GAG GGT CGC AGG GAT GGG CGC CTC
> TGG TGT GCC ACC ACT TCC AAC TTT GAC AGC GAC AAG AAG TGG GGC TTC TGC CCT GAT AAA
> GGA TAC AGC CTG TTC CTC GTG GCA GCC CAC GAG TTC GGC CAT GCA CTG GGC TTG GAT CAC
> TCC TCT GTG CCG GAG CGC CTC ATG TAC CCC ATG TAC CGC TAC CTA GAG GGG TCC CCC CTG
> CAC GAG GAC GAC GTC AGG GGC ATC CAG CAT CTA TAT GGT CCT AAC CCC AAC CCC CAG CCT
> --- CCA GCC ACC ACC ACA CCT GAN NNN NNN NNN NNN NNG CCG CAG CCC ACG GCT CCC CCG
> ACG GCC TGC CCC ACC TGG CCG GCC ACT GTG CGC CCC TCC GAG CAC CCC ACT ACC AGC CCT
> ACC GGC GCC CCC TCA GCT GGC CCT ACC GGT NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN
> NNN NNN NNN NNN NNN NNN CCC --- CCC ACG GCC AGC CCC TCT GCG GCC CCC ACT --- GCG
> TCC TTG GAC CCA GCT GAA GAC GTC TGC AAC GTG AAT GTC TTC GAC GCC ATC GCC GAG ATA
> GGG AAC AAG CTG CAT GTC TTC AAG GAT GGG AGG TAC TGG --- --- CGG TTC TCC GAG ---
> --- GGC AGT GGG CGC CGG CCG CAG GGC CCC TTC CTC ATC GCC GAC ACC TGG CCC GCG CTG
> CCG GCC AAG CTG GAC TCC GCC TTT GAG GAG CCG CTC ACC AAG AAG CTG TTC TTC TTC TCG
> GGG CGC CAA GTG TGG GTG TAC ACA GGC GCG TCG GTG CTG GGC CCG AGG CGT CTG GAC AAG
> CTG GGC CTG GGT CCC GAG GTG CCG CAC GTC ACC GGA GCC CTC CCG CGC GCC GGG --- GGC
> AAG GTG CTG CTG TTC GGC GCG CAG CGC TTC TGG AGG TTC GAC GTG AAG ACG CAG ACG GTG
> GAT TCC CGG AGC GGC GCT CCG --- --- GTG GAT CAG ATG TTC CCC GGG GTG CCT TTG AAC
> ACA CAC GAC GTC TTC CAG TAC CGA --- GAG AAA GCC TAC --- TTC TGC CAG GAC CGC TTC
> TTC TGG CGT GTG AGT ACC CGG AAC NNN NNN NNN NNN NNN NNN NNN GAG GTG AAC CTA GTG
> GAC CAG GTG GGC TAC GTG AGC TTT GAC ATC CTG CAC TGC CCT
> dog
> GCA GCT CCC AGA CCA CAC --- AAG CCC --- ACC GTT --- GTG GTC TTT CCA GGA GAC CTG
> AGA ACT NNN AAT CTC ACT GAC AAG CAG CTG GCA GAG GAA TAT CTG TTT CGC TAT GGC TAC
> ACT CAA GTG GCC GAG CTG AGC --- GAC GAC AAG CAG --- TCC CTG AGT CGC GGG --- CTG
> CGG CTT CTC CAG AGG CGC CTG GCT CTG CCT GAG ACT GGA GAG CTG GAC AAA ACC ACC CTG
> GAG GCC ATG CGG GCC CCG CGC TGC GGC GTC CCG GAC CTG GGC AAA TTC CAG ACC TTT GAG
> GGC GAC CTC AAG TGG CAC CAC AAC GAC ATC ACT TAC TGG ATA CAA AAC TAC TCG GAA GAC
> TTG CCC CGC GAC GTG ATC GAC GAC GCC TTT GCC CGA GCC TTC GCG GTC TGG AGC GCG GTG
> ACA CCG CTC ACC TTC ACT CGC GTG TAC GGC CCC GAA GCC GAC ATC ATC ATT CAG TTT GGT
> GTT AGG GAG CAC GGA GAT GGG TAT CCC TTC GAT GGG AAG AAC GGG CTT CTG GCT CAC GCC
> TTT CCT CCC GGC CCG GGC ATT CAG GGA GAC GCC CAC TTC GAC GAC GAG GAG TTA TGG ACT
> CTG GGC AAG GGC GTC GTG GTT CCG ACC CAC TTC GGA AAC GCA GAT GGC GCC CCC TGC CAC
> TTC CCC TTC ACC TTC GAG GGC CGC TCC TAC TCG GCC TGC ACC ACG GAC GGC CGC TCC GAT
> GAC ACG CCC TGG TGC AGC ACC ACG GCC GAC TAT GAC ACC GAC CGT CGG TTC GGC TTC TGC
> CCC AGC GAG AAA CTC TAC ACC CAG GAC GGC AAT GGG GAC GGC AAG CCC TGC GTG TTT CCG
> TTC ACC TTC GAG GGC CGC TCC TAC TCC ACG TGC ACC ACC GAC GGC CGC TCG GAC GGC TAC
> CGC TGG TGC TCC ACC ACC GGC GAC TAC GAC CAG GAC AAA CTC TAC GGC TTC TGC CCA ACC
> CGA GTC GAT TCC GCG GTG ACC GGG GGC AAC TCC GCC GGG GAG CCG TGT GTC TTC CCC TTC
> ATC TTC CTG GGC AAG CAG TAC TCG ACG TGC ACC AGG GAG GGC CGC GGA GAT GGG CAC CTC
> TGG TGC GCC ACC ACT TCG AAC TTT GAC AGA GAC AAG AAG TGG GGC TTC TGC CCG GAC CAA
> GGA TAC AGC CTG TTC CTT GTG GCC GCC CAT GAG TTC GGC CAC GCG CTG GGT TTA GAT CAT
> TCA TCG GTG CCA GAA GCG CTC ATG TAC CCC ATG TAC AGC TTC ACC GAG GGC CCC CCC CTG
> CAT GAA GAC GAC GTG AGG GGC ATC CAG CAT CTG TAC GGT CCT CGC CCT GAA CCT GAG CCA
> CAG CCT CCA ACC GCN NNN NNN NNN NNN NNN NNN NNN --- NNC CCG CCC ACC GCC CCG CCC
> ACC GTC TGC GCT ACT GGT CCT CCC ACC ACC CGC CCC TCA GAG CGC CCC ACT GCT GGC CCC
> ACA GGC CCC CCT GCA GCT GGC CCC ACG GGT NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN
> NNN NNN NNN NNN NNN NNN CCC --- CCC ACT GCT GGC CCC TCT GAG GCC CCT ACA --- GTG
> CCT GTG GAT CCG GCA GAG GAT ATA TGC AAA GTG AAC ATC TTC GAC GCC ATC GCG GAG ATC
> AGG AAC TAC TTG CAT TTC TTC AAG GAA GGG AAG TAC TGG --- --- CGA TTC TCC AAG ---
> --- GGC AAG GGA CGC CGG GTG CAG GGC CCC TTC CTT ATC ACC GAC ACG TGG CCT GCG CTG
> CCC CGC AAG CTG GAC TCC GCC TTT GAG GAC GGG CTC ACC AAG AAG ACT TTC TTC TTC TCT
> GGG CGC CAA GTG TGG GTG TAC ACA GGC ACG TCG GTG GTA GGC CCG AGG CGT CTG GAC AAG
> CTG GGC CTG GGC CCG GAG GTT ACC CAA GTC ACC GGC GCC CTC CCG CAA GGC GGG --- GGT
> AAG GTG CTG CTG TTC AGC AGG CAG CGC TTC TGG AGT TTC GAC GTG AAG ACG CAG ACC GTG
> GAT CCC AGG AGC GCC GGC TCG --- --- GTG GAA CAG ATG TAC CCC GGG GTG CCC TTG AAC
> ACG CAT GAC ATC TTC CAG TAC CAA --- GAG AAA GCC TAC --- TTC TGC CAG GAC CGC TTC
> TAC TGG CGT GTG AAT TCT CGG AAT NNN NNN NNN NNN NNN NNN NNN GAG GTG AAC CAG GTG
> GAC GAA GTG GGC TAC GTG ACC TTT GAC ATT TTG CAG TGC CCT
> cow
> GCT GTC CCC AGA CGA CGC --- CAG CCC --- ACC GTT --- GTG GTC TTT CCA GGA GAA CCA
> CGA ACC NNN AAC CTC ACC AAC AGG CAG CTG GCA GAG GAA TAC CTG TAC CGC TAT GGC TAC
> ACT CCT GGG GCA GAG CTG AGC --- GAG GAC GGT CAG --- TCC CTG CAG CGA GCT CTG CTG
> CGC --- TTC CAG CGG CGC CTG TCC CTG CCC GAG ACT GGC GAG CTG GAC AGC ACC ACC CTG
> AAC GCC ATG CGA GCC CCG CGC TGC GGC GTC CCA GAC GTG GGC AGA TTC CAG ACC TTT GAG
> GGC GAA CTC AAG TGG CAC CAC CAC AAC ATC ACC TAC TGG ATC CAA AAT TAC TCG GAA GAC
> CTG CCG CGC GCC GTG ATC GAC GAC GCC TTT GCC CGC GCT TTC GCG CTC TGG AGC GCT GTG
> ACG CCG CTC ACC TTC ACT CGA GTG TAC GGC CCC GAA GCT GAC ATT GTC ATC CAG TTT GGT
> GTT AGA GAG CAC GGA GAT GGG TAT CCC TTC GAT GGG AAG AAC GGG CTC CTG GCA CAC GCC
> TTT CCG CCT GGC AAA GGC ATT CAG GGA GAT GCC CAC TTC GAC GAT GAA GAG TTG TGG TCT
> CTG GGC AAA GGC GTT GTG ATC CCG ACC TAC TTC GGA AAC GCG AAG GGC GCC GCC TGC CAC
> TTC CCC TTC ACC TTT GAG GGT CGC TCC TAC TCC GCC TGC ACC ACG GAC GGC CGT TCC GAC
> GAC ATG CTC TGG TGC AGC ACC ACC GCC GAC TAC GAC GCC GAC CGC CAG TTC GGC TTC TGC
> CCC AGC GAG AGA CTC TAC ACC CAG GAC GGC AAT GCG GAC GGC AAG CCC TGC GTC TTC CCG
> TTC ACC TTC CAG GGC CGC ACC TAC TCC GCC TGT ACC TCC GAT GGT CGC TCC GAC GGC TAC
> CGC TGG TGC GCC ACC ACC GCC AAC TAC GAC CAG GAC AAG CTC TAC GGC TTC TGC CCG ACC
> CGA GTC GAT GCA ACG GTG ACC GGG GGC AAC GCG GCG GGG GAG CTG TGC GTC TTC CCC TTC
> ACC TTC CTG GGC AAG GAA TAC TCG GCC TGC ACC AGA GAG GGT CGC AAT GAT GGG CAC CTC
> TGG TGC GCC ACC ACC TCC AAC TTC GAC AAA GAC AAG AAG TGG GGC TTC TGC CCG GAT CAA
> GGA TAC AGC CTG TTC CTT GTG GCC GCA CAC GAG TTT GGC CAC GCG CTG GGC TTA GAT CAC
> ACC TCC GTG CCA GAG GCG CTC ATG TAC CCC ATG TAC AGA TTC ACA GAG GAG CAC CCC CTG
> CAT AGG GAC GAT GTT CAG GGC ATC CAG CAT CTG TAT GGT CCT CGC CCT GAG CCT GAA CCA
> CGG CCT CCG ACC ACT ACC ACC ACT ACC ACC ACC GAA --- CCC CAG CCC ACC GCT CCC CCC
> ACG GTC TGC GTC ACG GGG CCT CCC ACC GCC CGC CCC TCA GAG GGT CCC ACT ACT GGC CCC
> ACA GGG CCC CCG GCA GCT GGC CCT ACG GGT NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN
> NNN NNN NNN NNN NNN NNN CCT --- CCC ACG GCT GGC CCT TCT GCG GCC CCG ACG GAG TCC
> CCG --- GAT CCA GCG GAG GAC GTC TGC AAC GTG GAC ATC TTC GAC GCC ATC GCG GAG ATT
> AGG AAC CGC TTG CAT TTC TTC AAG GCT GGG AAG TAC TGG --- --- AGA CTT TCT GAG ---
> --- GGA GGG GGC CGC CGG GTG CAG GGT CCC TTC CTT GTC AAG AGC AAG TGG CCT GCG CTG
> CCC CGC AAG CTG GAC TCC GCC TTC GAG GAT CCG CTC ACC AAG AAG ATT TTC TTC TTC TCT
> GGG CGC CAA GTA TGG GTG TAC ACC GGC GCG TCG TTG CTA GGC CCG AGG CGT CTG GAC AAG
> TTG GGC CTG GGC CCG GAA GTG GCC CAG GTC ACC GGG GCC CTC CCG CGC CCT GAG --- GGT
> AAG GTG CTG CTG TTC AGC GGG CAG AGC TTC TGG AGG TTC GAC GTG AAG ACA CAG AAG GTG
> GAT CCC CAG AGC GTC ACC CCC --- --- GTG GAC CAG ATG TTC CCC GGG GTG CCC ATT AGC
> ACG CAC GAC ATC TTT CAG TAC CAA --- GAG AAA GCT TAC --- TTC TGC CAG GAT CAC TTC
> TAC TGG CGC GTG AGT TCC CAG AAT NNN NNN NNN NNN NNN NNN NNN GAG GTG AAT CAG GTG
> GAC TAT GTG GGC TAC GTG ACC TTC GAC CTC CTG AAG TGC CCT
> elephant
> --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
> --- --- --- --- --- --- --- --- --- --- --- GAG --- TAT CTG TAC CGC TAT GGC TAC
> ACT CGT GTG GCG GAG ATG AAC --- --- AGT AAG GTG --- TCC CTG GGT --- CGA GCG CTA
> AGG CTT CTC CAG CAA AAC CTG GCC CTG CCC GAG ACC GGC GAG CTG GAC AGC ACC ACC CTG
> GAC GCC ATG CGA GCC CCG CGC TGC GGC GTC CCA GAC ATG GGT GGC TTC CAG ACC TTC GAG
> GGT GAC CTC AAG TGG AAC CAC CAC AAC ATC ACA TAC TGG ATC CAA AAC TAC TCG GAA GAC
> TTG CCC AAA CAA GTG ATC GAA GAC GCT TTT GCC CGC GCC TTC GCG GCG TGG AGC GAG GTG
> ACA CCA CTC ACC TTC ACC CGC CTG CGC AGC AGG GAC GTG GAC ATC GTC ATC CGG TTT GGG
> GTC AAG GAG CAC GGA GAC GGG TAT CCT TTC GAC GGG AAG GAC GGG CTG CTG GCA CAC GCC
> TTT CCT CCC GGC CCC GGC ATT CAG GGA GAC GCG CAC TTC GAC GAT GAC GAA TTG TGG TCG
> TTG GGC AAG GGC GTC GTG GTT CCC ACC CGC TTT GGA AAC GCA GAT GGC GCC GCC TGC CAC
> TTT CCC TTC ACC TTC CAG GGC CGC TCG TAC ACT GCC TGC ACC GCC GAC GGC CGC TCC GAC
> GGC CAG CTC TGG TGC AGC ACC ACG GCC GAC TAT GAC ACC GAC CGC CAG TTT GGC TTC TGC
> CCC AGT GAG AGG CTC TAC ACC CAG CAC GGC AAT GAC AAC GGC AAG CCC TGC GTG TTT CCG
> TTC ACG TTC GAG GGC CGC TCC TAC TCG GCC TGC ACC ACC GAC GGC CGC TCG GAT GGC TAC
> CGC TGG TGC GCC ACC ACC GCC AAC TAC GAC CAG GAC AAG CTC TAT GGC TTC TGT CCC ACC
> CGA --- GNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN
> NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN
> NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN
> NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN
> NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN
> NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN
> --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- NNN NNN NNN ---
> --- --- --- --- --- --- --- --- NNN NNN --- NNN NNN NNN --- --- --- --- --- ---
> --- --- --- --- NNN NNN NNN NNN NNN --- --- --- --- --- --- --- --- NNN NNN NNN
> NNN NNN --- --- --- --- NNN --- NNN NNN NNN NNN --- --- --- --- NNN NNN --- ---
> --- --- NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN
> NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN --- --- NNN NNN NNN NNN ---
> --- --- --- --- --- --- NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN
> NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN
> NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN
> NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN --- NNN NNN NNN --- NNN
> NNN ATA GTG CTG TTT AGT AGA CAG CGC TTC TGG AGG TTC GAC GTG AAG ACG CAG ACT GTG
> GAG CCC CGG AGC GTC CGC TCG --- --- GTG GAC CAG GTG TTC TCC GGG GTG CCC TTG GAC
> ACG CAC GAC ATC TTC CAG TAC CGA --- GAG AAA GCC TAC --- TTC TGC CAG GAC CGC TTC
> TAC TGG CGC GTG TGT TTC CGG AAT GAT --- AAT GAA --- --- --- --- GTG AAC CAG GTG
> GAC CAA GTG GGC TAC GTG AAC TTT GAC ATC CTG CAG TGC CCT
> opossum
> GCT GCA CCC CGA GGG GGC CCC TCT CCC GGG TCT ATC TTG ATC ACC TTT CCT GAA GAG AGA
> --- ACA CGC ACT CTC ACT GAC CAG CAA TTT GCT GAG GAA TAT CTG CTT CGG TAC GGC TAC
> ATC CCG --- GCA GGG CTT CTG --- GGC CAA AAC CAC ACT TCT CTG AAG --- CAT GCC TTA
> AAG AAA CTC CAA CGT CAG CTG GCC CTG ACA CAG ACG GGA GAG CTG GAC AGC GCC ACC ATC
> GAG GCA ATG CGG GCC CCG CGC TGC GGA GTA CCC GAC GTC GCC CCA TTC CAA ACC TTC GAG
> GGT GAA CTG AAG TGG AAA CAT CAG AAC ATC ACC TAT CGG ATC CAG AAT TAC TCC CCC GAC
> CTG CCT CCT GAG GTG ACG GAT GAT GCT TTC CAA CGA GCC TTT GCT CTG TGG AGT AAA GTG
> ACC CCA CTC ACC TTC ACA CGT GTC AGC AGC GGG GAG GCA GAC ATC CTG ATC CAG TTT GGG
> ACC AGA GAG CAC GGC GAT GGA TAC CCT TTT GAC GGG AAA GAT GGA CTC TTG GCT CAC GCT
> TTC CCC CCG GGC CCA GGA ATC CAG GGA GAT GCC CAC TTT GAT GAC GAG GAG TTC TGG ACT
> CTA GGC AAA GGC GTC GTG GTC AAA ACG CGG TTC GGG AAC GCA GAC GGA GCC CCC TGC CAC
> TTT CCT TTC ACC TTC GAG GGC AGC TCC TAC TCC GCC TGC ACC ACG GAC GGC CGC TCT GAC
> GGG CTG CAC TGG TGC AGC ACT ACG GCT GAC TAT GAC AAG GAC CGC CTT TAC GGC TTT TGC
> CCT AGC GAG CTG CTC TAC ACC CTG GAT GGT AAC GCC AAT GGC GAT CCC TGC GTG TTC CCC
> TTC ACC TTC GAT GGT CGT TCC TAC ACA GCC TGC ACC ACT GAA GGA CGC TCT GAC GGC TAC
> CGC TGG TGT GCC ACT ACT GCC AGT TAC GAT CAG GAC AAG CTT TAT GGC TTC TGT CCC AAC
> CGA --- GAT ACT GCG GTG AGC GGA GGC AAC TCC CAA GGG GAA CCC TGC GTC TTT CCC TTC
> ACT TTC CTA AAT CGA GAA TAC TCA GCC TGC ACC AGT GAG GGC CGC AGT GAC GGT CGT CTC
> TGG TGT GCG ACC ACC GAT GAC TTC GAT CGG GAT CAC AAG TGG GGC TTC TGT CAG GAT CGA
> GGG TAC AGC TTA TTC CTT GTG GCC GCG CAC GAG TTT GGG CAC GCG CTG GGC TTG GAC CAC
> TCA TCT GTG CCG GAA GCA TTG ATG TTC CCA ATG TAC CGT TTT ACC GAG GGA CCC CCG TTG
> CAT GAG GAT GAC GTG AAG GGA ATC CAA CAT CTG TAT GGT TCT AGG ACT GAG CCG GAT CCG
> GAA CCT CCG ACC TCT --- --- --- TCT CCC TTA GAG --- CCA GAT TCC ACC ACT CAG TTC
> AAT GCT TGT --- --- --- CCC --- TCT GTA --- CCC CCC CCT --- --- --- GCC AGA CCC
> ACC GGC CCT CCT ACT GCT CGC CCC TCA --- --- --- --- --- --- --- --- GCA CCT CCC
> ACT GCT GGA CCC ACT GGT CCT --- CCC ACA GCC AAC CCT CCT GTG CCC CCC ACT --- GGG
> CCC TTG GAC CCA GCT GAC GAC GCT TGT GGC GTC CTG GTA TTT GAT GCC ATC GCT GAG ATT
> CGA GGC CAG CTT CAC TTC TTC AAA GAC GGA CGG TAC TGG CGA GTC CCC AGG GAC TCC ---
> --- AAG --- GGG CCA --- ACT CAA GGA CCC TTC CTC ATT GCT AAC ACT TGG TCT GCT TTG
> CCC CCA AAA CTG GAC TCG GCT TTC GAA GAT CCC CTG ACT AAG AAA CTC TTC TTC TTT TCA
> GGT AAA GGT ATG TGG GTA TAC ACA GGC CAG TCA GTT GTA GGT CCC CGG CGC CTG GAG AAG
> CTG GGT CTG CAT AGC AGA GTT CAA AGG ATA ACA GGT GCC ATT CAG CAT AAT GGA --- GGC
> AAG GTG CTA TTA TTC AGC CAG AAT CAA TAT TGG AGG TTG GAT GTG AAG AAG CAG AAG GTA
> GAC TCA AGA GAA CCT TAC CCT --- --- GTG GAG AAC ATG TTC CCT GGA GTA CCT GAA AAC
> ACT CAT GAT GTT TTC CTG TAT AAG GGA GAT ACA --- TAC --- TTC TGC CAG GGC ATC TTC
> TTC TGG CGC GTG AAC --- --- --- --- --- AAG GAG --- --- --- --- --- AAC AAG GTG
> GAC TTA GTA GGC TAC GTG ACC TAC GAC CTC CTG --- --- ---
> chicken
> GCC GCC CCA CTG CAC AGC --- AAG CCG CAG GCG GTC --- ATC ACC TTC CCA GGG GAG CTG
> --- CTC AGC GCC CCA TCA GAC GTG GAG CTG GCG GAG AAC TAC CTG CTG CGC TTC GGC TAC
> ATC CAG GAG GCA GAG GTG AGG AGG AGC AGC AAG CAC GTG TCC CTG GCC --- AAA GCG CTG
> CGC AGG ATG CAG AAG CAG CTG GGG CTG GAG GAG ACG GGG GAG CTG GAC GCC AGC ACC CTG
> GAG GCC ATG CGA GCC CCC CGC TGT GGG GTG CCT GAC GTG GGG GGT TTC CTC ACC TTC GAG
> GGG GAG CTC AAA TGG GAC CAC ATG GAC CTC ACG TAC CGG GTG ATG AAC TAC TCC CCC GAC
> CTG GAC CGT GCC GTG ATA GAT GAT GCC TTC CGG CGG GCA TTC AAG GTG TGG AGT GAT GTC
> ACT CCC CTC ACC TTC ACC CAG ATT TAC AGC GGC GAG GCA GAC ATC ATG ATC ATG TTC GGC
> AGC CAA GAG CAT GGT GAT GGG TAC CCC TTC GAC GGC AAG GAT GGG CTC CTG GCC CAC GCC
> TTT CCC CCC GGC AGT GGG ATT CAG GGC GAT GCC CAC TTC GAT GAT GAT GAG TTC TGG ACT
> CTG GGA ACC GGC TTA GAG GTG AAG ACC CGC TAT GGG AAT GCC AAC GGG GCC AGC TGC CAC
> TTC CCC TTC ATC TTT GAG GGC CGC TCC TAC TCC CGG TGC ATC ACG GAG GGC CGC ACG GAT
> GGG ATG CTG TGG TGT GCC ACC ACC GCC AGC TAC GAC GCC GAC AAG ACC TAC GGC TTC TGC
> CCC AGC GAG CTG CTC TAC ACC AAT GGT GGC AAC AGC GAT GGG TCT CCC TGC GTC TTC CCC
> TTC ATC TTC GAT GGC GCC TCC TAT GAC ACC TGC ACC ACA GAT GGG CGC TCT GAC GGC TAT
> CGC TGG TGT GCC ACC ACG GCC AAC TTC GAC CAG GAC AAG AAA TAC GGC TTC TGC CCC AAC
> CGA --- GAC ACG GCG GCG ATC GGT GGC AAC TCC CAG GGG GAC CCG TGT GTC TTC CCC TTC
> ACC TTC CTG GGG CAG TCC TAC AGC GCG CGC ACC AGC CAG GGC CGG CAG GAC GGG AAG CTC
> TGG TGT GCC ACC ACC AGC AAC TAT GAC ACC GAC AAG AAG TGG GGC TTC TGC CCA GAC AGA
> GGT TAC AGC ATC TTC TTG GTG GCT GCC CAC GAG TTT GGG CAC TCA CTG GGG CTG GAC CAC
> TCC AGC GTG CGC GAG GCA TTG ATG TAC CCT ATG TAC AGC TAC GTC CAG GAC TTC CAG CTG
> CAT GAG GAT GAT GTC CAG GGC ATC CAG TAC CTC TAT GGT CGT GGC TCT GGC CCT GAG CCC
> ACC CCC CCG --- --- --- --- --- GCA CCT TTG --- --- CCC --- --- ACC GAG GAG ---
> --- --- --- --- --- --- CCC CAG TCC ATA --- CCC ACC GAA --- --- --- GCT --- ---
> --- GGC --- --- AGT GCT TCC ACC ACA --- --- --- --- --- --- --- --- GAG GAG GAG
> GAG GAG --- GAG ACA --- CCT GAG CCC ACA GCT GAG --- --- --- --- CCC AGC --- ---
> CCC GTG GAC CCC AGC CGG GAT GCC TGC ATG GAG AAG AAC TTC GAC GCC ATC ACT GAG ATC
> AAT GGA GAG CTG CAC TTC TTC AAG AAT GGG AAA TAC TGG --- --- ACC CAC TCG TCC TTC
> TGG AAA TCA GGC --- --- ACT CAG GGC GCC TTC TCT ATC GCT GAC ACC TGG CCC GGC CTC
> CCG GCT GTC ATC GAC GCG GCG TTC CAA GAT GTG CTC ACC AAG AGG GTC TTC TTC TTC GCG
> GGA CGG CAG TTC TGG GTG TTC TCC GGC AAG AAC GCA GTG GGC CCC CGT AGG ATT GAG AAG
> TTG GGC ATT GGG AAG GAG GCC GGG CGC ATC ACG GGG GCC CTG CAG CGG GGA CGT --- GGC
> AAA GTG CTG CTC TTC AGT GGG GAG CAC TAC TGG AGG CTG GAC GTG AAG GTC CAG ACA GTG
> GAC --- AAG GGC --- TAC CCC CGT GAC ACT GAT GAT GTC TTT ACT GGT GTC CCC CTT GAC
> GCA CGT AAC GTC TTC CTG TAC CAA --- GAC AAG --- TAC CAC TTC TGC CGG GAC AGC TTC
> TAC TGG AGG ATG ACC --- --- --- --- --- CCA CGT --- --- --- --- --- TAC CAG GTG
> GAC CGC GTG GGA TAC ATC AGA TAC GAC CTC CTG CAG TGC CCC
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>


-- 
Jason Stajich
jason at bioperl.org
http://bioperl.org/wiki/User:Jason


From ba6450 at wayne.edu  Thu Jul 26 21:20:11 2007
From: ba6450 at wayne.edu (Munirul Islam)
Date: Thu, 26 Jul 2007 21:20:11 -0400 (EDT)
Subject: [Bioperl-l] Finding the Sequence List in an Alignment
Message-ID: <20070726212011.EFB49252@mirapointms6.wayne.edu>

Thanks.  The error is removed now.

I have a question.  Is there any function that I can use to get the sequence list (human, chimp, etc.) after loading an alignment from file?

Munir

---- Original message ----
>Date: Thu, 26 Jul 2007 17:12:03 -0700
>From: "Jason Stajich" <jason at bioperl.org>  
>Subject: Re: [Bioperl-l] Alignment works with codeml (not loading in bioperl)  
>To: "Munirul Islam" <ba6450 at wayne.edu>
>Cc: bioperl-l at lists.open-bio.org
>
>You can try and pass in -interleaved => 0 as another option when you
>init your AlignIO object.
>


From jason at bioperl.org  Fri Jul 27 00:28:36 2007
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 26 Jul 2007 21:28:36 -0700
Subject: [Bioperl-l] Finding the Sequence List in an Alignment
In-Reply-To: <20070726212011.EFB49252@mirapointms6.wayne.edu>
References: <20070726212011.EFB49252@mirapointms6.wayne.edu>
Message-ID: <8273f6c20707262128s23e7e3ebgeb1cb74b3c0baf37@mail.gmail.com>

Have you tried reading the documentation for the Bio::SimpleAlign object?

for my $seq ( $aln->each_seq ) {
 print $seq->display_id, "\n";
}

I'd appreciate it if you added some of your questions with the answers to the
FAQ or to other places on the wiki so that other people can benefit from
your learning here.
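
For a fuller picture, here is a minimal sketch of the whole round trip,
from loading the file to printing the IDs. The 'phylip' format and the
file name given on the command line are assumptions (adjust them to your
data), and seq_ids is just a helper name made up for this example.
each_seq and display_id are the methods used above, and
-interleaved => 0 is the option suggested earlier in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the display IDs of every sequence in an
# alignment object (anything that provides each_seq, such as
# Bio::SimpleAlign).
sub seq_ids {
    my ($aln) = @_;
    return map { $_->display_id } $aln->each_seq;
}

# Only load BioPerl and read a file when a file name is given.
if (@ARGV) {
    require Bio::AlignIO;
    my $in = Bio::AlignIO->new(
        -file        => $ARGV[0],
        -format      => 'phylip',   # assumption: sequential PHYLIP input
        -interleaved => 0,
    );
    # A file may hold more than one alignment; list the IDs of each.
    while ( my $aln = $in->next_aln ) {
        print "$_\n" for seq_ids($aln);
    }
}
```

Run it as "perl list_ids.pl alignment.phy" (both names are placeholders).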


On 7/26/07, Munirul Islam <ba6450 at wayne.edu> wrote:
>
> Thanks.  The error is removed now.
>
> I have a question.  Is there any function that I can use to get the
> sequence list (human, chimp, etc.) after loading an alignment from file?
>
> Munir
>
> ---- Original message ----
> >Date: Thu, 26 Jul 2007 17:12:03 -0700
> >From: "Jason Stajich" <jason at bioperl.org>
> >Subject: Re: [Bioperl-l] Alignment works with codeml (not loading in
> bioperl)
> >To: "Munirul Islam" <ba6450 at wayne.edu>
> >Cc: bioperl-l at lists.open-bio.org
> >
> >You can try and pass in -interleaved => 0 as another option when you
> >init your AlignIO object.
> >


-- 
Jason Stajich
jason at bioperl.org
http://bioperl.org/wiki/User:Jason


From arareko at campus.iztacala.unam.mx  Fri Jul 27 11:18:55 2007
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Fri, 27 Jul 2007 10:18:55 -0500
Subject: [Bioperl-l] Perl Survey 2007
Message-ID: <46AA0CDF.1030503@campus.iztacala.unam.mx>

It really takes about 5 minutes:

http://perlsurvey.org/

Cheers,
Mauricio.

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From dhoworth at mrc-lmb.cam.ac.uk  Fri Jul 27 12:07:17 2007
From: dhoworth at mrc-lmb.cam.ac.uk (Dave Howorth)
Date: Fri, 27 Jul 2007 17:07:17 +0100
Subject: [Bioperl-l] Perl Survey 2007
In-Reply-To: <46AA0CDF.1030503@campus.iztacala.unam.mx>
References: <46AA0CDF.1030503@campus.iztacala.unam.mx>
Message-ID: <46AA1835.2020004@mrc-lmb.cam.ac.uk>

Mauricio Herrera Cuadra wrote:
> It really takes about 5 minutes:
> http://perlsurvey.org/

and gives all your personal information including email address to
anybody who cares to snoop the HTTP POST message! So there's definitely
no anonymity.

Cheers, Dave


From spiros at lokku.com  Fri Jul 27 12:38:57 2007
From: spiros at lokku.com (Spiros Denaxas)
Date: Fri, 27 Jul 2007 17:38:57 +0100
Subject: [Bioperl-l] Perl Survey 2007
In-Reply-To: <46AA1835.2020004@mrc-lmb.cam.ac.uk>
References: <46AA0CDF.1030503@campus.iztacala.unam.mx>
	<46AA1835.2020004@mrc-lmb.cam.ac.uk>
Message-ID: <bba689ec0707270938i612840e6r6b4d71ea943b4cc3@mail.gmail.com>

On 7/27/07, Dave Howorth <dhoworth at mrc-lmb.cam.ac.uk> wrote:
> Mauricio Herrera Cuadra wrote:
> > It really takes about 5 minutes:
> > http://perlsurvey.org/
>
> and gives all your personal information including email address to
> anybody who cares to snoop the HTTP POST message! So there's definitely
> no anonymity.

Not to mention that it requires registration (?). Who is behind the
survey? I am on a number of Perl and Perl-related lists and haven't
seen it mentioned.

Spiros


From arareko at campus.iztacala.unam.mx  Fri Jul 27 13:37:31 2007
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Fri, 27 Jul 2007 12:37:31 -0500
Subject: [Bioperl-l] Perl Survey 2007
In-Reply-To: <bba689ec0707270938i612840e6r6b4d71ea943b4cc3@mail.gmail.com>
References: <46AA0CDF.1030503@campus.iztacala.unam.mx>	<46AA1835.2020004@mrc-lmb.cam.ac.uk>
	<bba689ec0707270938i612840e6r6b4d71ea943b4cc3@mail.gmail.com>
Message-ID: <46AA2D5B.9080304@campus.iztacala.unam.mx>

Spiros Denaxas wrote:
> On 7/27/07, Dave Howorth <dhoworth at mrc-lmb.cam.ac.uk> wrote:
>> Mauricio Herrera Cuadra wrote:
>>> It really takes about 5 minutes:
>>> http://perlsurvey.org/
>> and gives all your personal information including email address to
>> anybody who cares to snoop the HTTP POST message! So there's definitely
>> no anonymity.

I didn't provide any personal information other than my country and 
birthyear. As for my email, I always use the one I have for all the SPAM 
I'd like to subscribe to :)

> Not to mention that it requires registration (?). Who is behind the
> survey ? I am on a number of Perl and Perl related lists and haven't
> seen it being mentioned.

Registration is rather different from confirming your email (which 
prevents spambots or yourself from filling the DB multiple times and thus 
screwing up the survey). As for who's behind it, its purpose, privacy, etc., 
please read the FAQ:

http://perlsurvey.org/faq/

Cheers,
Mauricio.

> Spiros

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From Alicia.Amadoz at uv.es  Mon Jul 30 11:46:57 2007
From: Alicia.Amadoz at uv.es (Alicia Amadoz)
Date: Mon, 30 Jul 2007 17:46:57 +0200 (CEST)
Subject: [Bioperl-l] error using standaloneblast through webserver
Message-ID: <1245168492amadoz@uv.es>

Hi, I'm trying to run a BioPerl script on Linux with StandAloneBlast
from a webserver, but I get the following error:

-------------------- WARNING ---------------------
MSG: cannot find path to blastall
---------------------------------------------------

I have tried several things to fix it, such as setting some environment
variables, both directly through the shell and by adding some code to my
script with,

BEGIN {
$ENV{PATH} .= ':/usr/local/blast-2.2.16';
$ENV{BLASTDIR} = '/usr/local/blast-2.2.16/'; 
$ENV{BLASTDATADIR} = '/usr/local/data/';
}

and with,

$local->executable('/usr/local/bin');
my $blast_report = $local->blastall($inputfilename); 

I have also checked that the webserver has read and execute permission
on all BLAST executables and directories. Even after trying all of
these things, it keeps showing the same error as above.

Any more ideas for solving this problem? My script works well when I run it
as a simple script, and I have rebooted the system several times after making
changes.

Thanks to anyone who will be able to help me!
Regards,
Alicia


From gyang at plantbio.uga.edu  Mon Jul 30 16:58:51 2007
From: gyang at plantbio.uga.edu (Guojun Yang)
Date: Mon, 30 Jul 2007 16:58:51 -0400
Subject: [Bioperl-l] nonstop repeated output from Remote_blast with xml
Message-ID: <20070730205851.e921d31c@dogwood.plantbio.uga.edu>

I am running RemoteBlast with readmethod "xml", and I noticed that it prints the output repeatedly, nonstop, as if it were in a loop. Has anybody noticed this before? Can anybody help me get out of this?
Thanks a lot,

Guojun Yang
University of Georgia


From grafman at graphcomp.com  Sun Jul 29 17:08:04 2007
From: grafman at graphcomp.com (Grafman Productions)
Date: Sun, 29 Jul 2007 14:08:04 -0700
Subject: [Bioperl-l] Perl 3D OpenGL
Message-ID: <152401c7d224$8e2455b0$6e4e7c0a@HPONE>

If this posting is inappropriate, please let me know - my apologies.

I recently came across an article on BioPerl, and it occurred to me that 
there might be some need for 3D rendering within your BioPerl project.

I released a number of new/updated Perl OpenGL (POGL) modules this year, 
along with benchmarks that demonstrate that it performs comparably to C.

If there's a need for 3D features within BioPerl, and if I can be of any 
assistance in helping to add such features, I would enjoy the opportunity. 


From torsten.seemann at infotech.monash.edu.au  Mon Jul 30 19:27:46 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Tue, 31 Jul 2007 09:27:46 +1000
Subject: [Bioperl-l] error using standaloneblast through webserver
In-Reply-To: <1245168492amadoz@uv.es>
References: <1245168492amadoz@uv.es>
Message-ID: <a79f6a4b0707301627u488fd1e8m9022160278b9b652@mail.gmail.com>

Alicia,

> Hi, i'm trying to run a bioperl script in linux with standaloneblast
> from a webserver but I have the following error:
> -------------------- WARNING ---------------------
> MSG: cannot find path to blastall
> ---------------------------------------------------
> $ENV{BLASTDATADIR} = '/usr/local/data/';
> $ENV{PATH} .= ':/usr/local/blast-2.2.16';
> $ENV{BLASTDIR} = '/usr/local/blast-2.2.16/';

I think the last one (or two) paths should be
'/usr/local/blast-2.2.16/bin' as the bin/ subdirectory of a standard
BLAST installation is where the 'blastall' binary actually lives.
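
A quick way to check this from inside the script is to look for the
binary in the directories the process actually sees. A webserver usually
runs scripts with a different, often much shorter, PATH than your login
shell, which would explain why the same script works from the command
line. Below is a minimal sketch using only core Perl; find_in_dirs is a
hypothetical helper name for this example, not part of BioPerl.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return the first directory in @dirs that holds
# an executable file named $program, or nothing if none does.
sub find_in_dirs {
    my ($program, @dirs) = @_;
    for my $dir (@dirs) {
        my $candidate = File::Spec->catfile($dir, $program);
        return $dir if -f $candidate && -x _;
    }
    return;
}

# File::Spec->path returns the directories listed in $ENV{PATH}.
my @path_dirs = File::Spec->path;
my $hit = find_in_dirs('blastall', @path_dirs);
print defined $hit
    ? "blastall found in $hit\n"
    : "blastall not found in: " . join(':', @path_dirs) . "\n";
```

If this reports "not found" when run through the webserver, the
directory holding blastall still needs to be appended to $ENV{PATH}
(or BLASTDIR pointed at it) before the BLAST modules are loaded.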

-- 
--Torsten Seemann
--Victorian Bioinformatics Consortium, Monash University


From cjfields at uiuc.edu  Mon Jul 30 20:53:45 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 30 Jul 2007 19:53:45 -0500
Subject: [Bioperl-l] nonstop repeated output from Remote_blast with xml
In-Reply-To: <20070730205851.e921d31c@dogwood.plantbio.uga.edu>
References: <20070730205851.e921d31c@dogwood.plantbio.uga.edu>
Message-ID: <FAFBA7E9-F8C5-439C-B3D8-39AA876F03FA@uiuc.edu>


On Jul 30, 2007, at 3:58 PM, Guojun Yang wrote:

> I am running remoteblast and using readmethod "xml", I noticed that  
> it is printing the output repeatedly nonstop. It's like in a loop.  
> Did anybody notice this before? Can anybody help me getting out of  
> this?
> Thanks a lot,
>
>
> Guojun Yang
> University of Georgia

Not seeing that using bioperl-live; you may need to update  
RemoteBlast.pm as this sounds similar to an issue that popped up  
earlier in the spring.

chris


From torsten.seemann at infotech.monash.edu.au  Tue Jul 31 02:24:34 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Tue, 31 Jul 2007 16:24:34 +1000
Subject: [Bioperl-l] nonstop repeated output from Remote_blast with xml
In-Reply-To: <FAFBA7E9-F8C5-439C-B3D8-39AA876F03FA@uiuc.edu>
References: <20070730205851.e921d31c@dogwood.plantbio.uga.edu>
	<FAFBA7E9-F8C5-439C-B3D8-39AA876F03FA@uiuc.edu>
Message-ID: <a79f6a4b0707302324t261687e7g1012e1f536500c09@mail.gmail.com>

> as this sounds similar to an issue that popped up
> earlier in the spring.

I could have sworn it was autumn! ;-)

-- 
--Torsten Seemann
--Victorian Bioinformatics Consortium, Monash University


From Alicia.Amadoz at uv.es  Tue Jul 31 06:11:54 2007
From: Alicia.Amadoz at uv.es (Alicia Amadoz)
Date: Tue, 31 Jul 2007 12:11:54 +0200 (CEST)
Subject: [Bioperl-l] error using standaloneblast through webserver
In-Reply-To: <a79f6a4b0707301627u488fd1e8m9022160278b9b652@mail.gmail.com>
References: <a79f6a4b0707301627u488fd1e8m9022160278b9b652@mail.gmail.com>
Message-ID: <2361686267amadoz@uv.es>

Hi, I tried what you suggested and that was it, it works perfectly.
Thank you very much. 

Regards,
Alicia

> Alicia,
> 
> > Hi, i'm trying to run a bioperl script in linux with standaloneblast
> > from a webserver but I have the following error:
> > -------------------- WARNING ---------------------
> > MSG: cannot find path to blastall
> > ---------------------------------------------------
> > $ENV{BLASTDATADIR} = '/usr/local/data/';
> > $ENV{PATH} .= ':/usr/local/blast-2.2.16';
> > $ENV{BLASTDIR} = '/usr/local/blast-2.2.16/';
> 
> I think the last one (or two) paths should be
> '/usr/local/blast-2.2.16/bin' as the bin/ subdirectory of a standard
> BLAST installation is where the 'blastall' binary actually lives.
> 
> -- 
> --Torsten Seemann
> --Victorian Bioinformatics Consortium, Monash University
> 
> 


From jay at jays.net  Tue Jul 31 08:00:56 2007
From: jay at jays.net (Jay Hannah)
Date: Tue, 31 Jul 2007 07:00:56 -0500
Subject: [Bioperl-l] Perl 3D OpenGL
In-Reply-To: <152401c7d224$8e2455b0$6e4e7c0a@HPONE>
References: <152401c7d224$8e2455b0$6e4e7c0a@HPONE>
Message-ID: <25A5F0A3-1CC3-46B5-8976-A24C451204E7@jays.net>

On Jul 29, 2007, at 4:08 PM, Grafman Productions wrote:
> If this posting is inappropriate, please let me know - my apologies.

Not at all. AFAIK this is the perfect place to discuss any  
contributions you're motivated to make to the BioPerl project.

> I recently came across an article on BioPerl, and it occurred to me  
> that
> there might be some need for 3D rendering within your BioPerl project.
>
> I released a number of new/updated Perl OpenGL (POGL) modules this  
> year,
> along with benchmarks that demonstrate that it performs comparably  
> to C.
>
> If there's a need for 3D features within BioPerl, and if I can be  
> of any
> assistance in helping to add such features, I would enjoy the  
> opportunity.

I know nothing about 3D modeling in biology, nor do I hang out with  
any protein structure folks, but 3D always sounds sexy. -grin-

If you're new to bioinformatics (I certainly am) you might want to  
read this:

   http://en.wikipedia.org/wiki/Protein_structure

Because that's probably where your 3D work would be used. Especially  
note the "Software" section, where you'll find some of the  
"competition".  :)

There's some cool stuff out there. I don't know what all would or  
wouldn't be time well spent in Perl / BioPerl.

HTH,

Jay Hannah
http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah


From cjfields at uiuc.edu  Tue Jul 31 12:51:42 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 31 Jul 2007 11:51:42 -0500
Subject: [Bioperl-l] nonstop repeated output from Remote_blast with xml
In-Reply-To: <20070731104052.b4b93021@dogwood.plantbio.uga.edu>
References: <20070731104052.b4b93021@dogwood.plantbio.uga.edu>
Message-ID: <7A2D7E4A-4024-48DB-88C8-063388A98419@uiuc.edu>

Make sure to keep responses on the mailing list.

You might want to run a full install, just in case.  If I remember  
correctly Sendu made some changes a while back in the BLAST-related  
modules which may be related to this.  At the very least install/ 
upgrade all modules in Bio::Tools::Run.

chris

On Jul 31, 2007, at 9:40 AM, Guojun Yang wrote:

> Thanks, Chris,
> But when I replaced the old RemoteBlast.pm with the new one, I got  
> "can't locate the object method "retrieve_parameter"". Does this  
> mean I need to install something else?
> Guojun
>
> ----- Original Message -----
> From: Chris Fields [mailto:cjfields at uiuc.edu]
> To: gyang at plantbio.uga.edu
> Cc: bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] nonstop repeated output from Remote_blast  
> with xml
>
>
>>> On Jul 30, 2007, at 3:58 PM, Guojun Yang wrote:
>>>> I am running remoteblast and using readmethod "xml", I noticed that
>>> it is printing the output repeatedly nonstop. It's like in a loop.
>>> Did anybody notice this before? Can anybody help me getting out of
>>> this?
>>> Thanks a lot,
>>>
>>>
>>> Guojun Yang
>>> University of Georgia
>>> Not seeing that using bioperl-live; you may need to update
>> RemoteBlast.pm as this sounds similar to an issue that popped up
>> earlier in the spring.
>>> chris
>>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Tue Jul 31 22:15:45 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 31 Jul 2007 21:15:45 -0500
Subject: [Bioperl-l] Perl 3D OpenGL
In-Reply-To: <25A5F0A3-1CC3-46B5-8976-A24C451204E7@jays.net>
References: <152401c7d224$8e2455b0$6e4e7c0a@HPONE>
	<25A5F0A3-1CC3-46B5-8976-A24C451204E7@jays.net>
Message-ID: <04BCAD9E-CC25-4F0A-85B1-FBA91C64CE7D@uiuc.edu>


On Jul 31, 2007, at 7:00 AM, Jay Hannah wrote:

> On Jul 29, 2007, at 4:08 PM, Grafman Productions wrote:
>> If this posting is inappropriate, please let me know - my apologies.
>
> Not at all. AFAIK this is the perfect place to discuss any
> contributions you're motivated to make to the BioPerl project.
>
>> I recently came across an article on BioPerl, and it occurred to me
>> that
>> there might be some need for 3D rendering within your BioPerl  
>> project.
>>
>> I released a number of new/updated Perl OpenGL (POGL) modules this
>> year,
>> along with benchmarks that demonstrate that it performs comparably
>> to C.
>>
>> If there's a need for 3D features within BioPerl, and if I can be
>> of any
>> assistance in helping to add such features, I would enjoy the
>> opportunity.
>
> I know nothing about 3D modeling in biology, nor do I hang out with
> any protein structure folks, but 3D always sounds sexy. -grin-
>
> If you're new to bioinformatics (I certainly am) you might want to
> read this:
>
>    http://en.wikipedia.org/wiki/Protein_structure
>
> Because that's probably where your 3D work would be used. Especially
> note the "Software" section, where you'll find some of the
> "competition".  :)
>
> There's some cool stuff out there. I don't know what all would or
> wouldn't be time well spent in Perl / BioPerl.
>
> HTH,
>
> Jay Hannah
> http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah

I agree that protein structure is the best place for something like  
this.

It's a wide open area as far as I'm concerned; in fact I would say  
that Bio::Structure is getting pretty dated, so if anyone wants to  
take it over, refactor the code, and so on I don't have a problem.

chris


From dmessina at wustl.edu  Sun Jul  1 01:38:48 2007
From: dmessina at wustl.edu (David Messina)
Date: Sun, 1 Jul 2007 00:38:48 -0500
Subject: [Bioperl-l] svn auto-properties [was Re: First cut svn
	repository]
In-Reply-To: <46869226.70203@sheffield.ac.uk>
References: <3097065.1181941697249.JavaMail.myubc2@brahms.my.ubc.ca>	<bba689ec0706151440o56a7d6c6ncf72a37cd2b2cdc5@mail.gmail.com>	<185BDA34-1449-49CA-B146-ADF27D2928CD@gmx.net>	<8D3B697E-2072-46FE-A1C9-E546D9DEAA45@uiuc.edu>	<4673C7CB.1030709@mail.nih.gov>	<410EF5F9-A30E-4AB7-85F7-7E761E3890D5@uiuc.edu>	<18049.30026.61328.134490@almost.alerce.com>	<5764264E-5C40-4C9E-B1C9-A70628AC1DD0@uiuc.edu>	<BFBA575A-E653-40F6-9242-D72655B6AE9C@wustl.edu>	<E83D9D3C-96F2-4B5A-B503-09C3860586D0@gmx.net>	<D7111143-D173-42DE-AAEF-C2365AA453A0@wustl.edu>	<18051.44281.831316.749586@almost.alerce.com>	<F5B048F4-CBA5-493A-8A5C-2033709D8A63@wustl.edu>
	<18051.61992.627473.323346@almost.alerce.com>
	<4684AF3D.5090907@sheffield.ac.uk>
	<843758CD-9C5B-4DDA-9FF4-B90AA225BDB3@wustl.edu>
	<468628AC.9060200@sheffield.ac.uk>
	<461F64B9-87FD-458A-8945-8238E7076109@wustl.edu>
	<46869226.70203@sheffield.ac.uk>
Message-ID: <3164A6E3-77CF-4E61-9609-1408768862B1@wustl.edu>


> [Nath]
> I think the list of seq formats recognised by Bioperl in Bio::SeqIO  
> and
> Bio::AlignIO would be a good start. As these are likely to be the ones
> that are sensitive to file format recognition and thus could break  
> tests
> if renamed.

Sounds good to me. I will do a quick tour of the rest of the repo  
looking for other common or important file extensions, but I don't  
expect there to be many if any.


> [still Nath]
> I think a lot of people have used "." in file names as an  
> alternative to
> a space. I think it would be beneficial to use an underscore "_" in
> these cases and leave the "." to represent the beginning of the file
> extension.

That's a great idea.


> [Chris]
> Do we need to define every filetype extension, or can there be a  
> fallback (eg if it isn't on the list or has no extension it's plain  
> text)?

For every file that's added, svn takes a peek to see if it's human- 
readable. If not, it's tagged with the generic MIME type application/ 
octet-stream. (It does this so it knows not to try to do diffs and  
merges on a binary file.)

So the default for a human-readable file is no MIME type, which I  
believe is essentially the same thing as text/plain.

And then regardless of the outcome of svn's peek, any matching auto- 
props are then applied, overriding svn's choice.

So if we don't define every extension, I think we'll be fine. It'd be  
nice to have everything tagged with a MIME type, though. For one  
thing, Apache will use it to do the right thing when people browse  
the repo over the web. And two, because metadata is cool. :)

One more thing: in the course of reading up on this, I learned that  
my earlier expectation about multiple auto-prop matches was  
incorrect. It's true that multiple unrelated matches means that  
multiple properties are set on the file. But when a file matches  
multiple *conflicting* auto-property patterns, there's no telling  
which value it'll get.
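
For reference, auto-props live in the client-side Subversion config file (typically ~/.subversion/config). The fragment below is only a sketch: the section and property names are standard Subversion, but the patterns and values are illustrative assumptions, not the list agreed on for the BioPerl repository.

```ini
[miscellany]
# auto-props are ignored unless this is switched on
enable-auto-props = yes

[auto-props]
# pattern = property=value[;property=value...]
*.pm    = svn:eol-style=native;svn:keywords=Id
*.pl    = svn:eol-style=native;svn:keywords=Id
*.t     = svn:eol-style=native
*.fasta = svn:mime-type=text/plain;svn:eol-style=native
*.png   = svn:mime-type=image/png
```

Keeping the patterns non-overlapping sidesteps the conflicting-match problem, since no file can then pick up two different values for the same property.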


Dave


From hartzell at alerce.com  Sun Jul  1 12:29:29 2007
From: hartzell at alerce.com (George Hartzell)
Date: Sun, 1 Jul 2007 09:29:29 -0700
Subject: [Bioperl-l] First cut svn repository
In-Reply-To: <E250DB37-E2C1-4F71-A2FE-B64603EB69FD@gmx.net>
References: <3097065.1181941697249.JavaMail.myubc2@brahms.my.ubc.ca>
	<bba689ec0706151440o56a7d6c6ncf72a37cd2b2cdc5@mail.gmail.com>
	<185BDA34-1449-49CA-B146-ADF27D2928CD@gmx.net>
	<8D3B697E-2072-46FE-A1C9-E546D9DEAA45@uiuc.edu>
	<4673C7CB.1030709@mail.nih.gov>
	<410EF5F9-A30E-4AB7-85F7-7E761E3890D5@uiuc.edu>
	<18049.30026.61328.134490@almost.alerce.com>
	<4683A7D1.8070403@sendu.me.uk>
	<18051.48684.996884.134046@almost.alerce.com>
	<4683C385.3050904@sendu.me.uk>
	<18051.63674.685297.426813@almost.alerce.com>
	<D554E628-AB22-44C2-B253-3CDDB3F71253@uiuc.edu>
	<18052.3946.224905.415905@almost.alerce.com>
	<2159ED58-E6F4-4ED8-AC23-E8BAF69FE240@gmx.net>
	<A348C2D6-F00B-4E76-A78F-E192A912E785@uiuc.edu>
	<E250DB37-E2C1-4F71-A2FE-B64603EB69FD@gmx.net>
Message-ID: <18055.54889.677775.868974@almost.alerce.com>

Hilmar Lapp writes:
 > It turns out that both files are also present on the release-0-9-3,  
 > bioperl-1-0-0, bioperl-1-0-alpha, and bioperl-1-0-alpha2-rc tags, so add
 > 
 > $ svn rm -m "Removing offending duplicate" svn+ssh://dev.open-bio.org/home/hartzell/bioperl/bioperl-live/tags/release-0-9-3/t/data/HUMBETGLOA.fasta
 > $ svn rm -m "Removing offending duplicate" svn+ssh://dev.open-bio.org/home/hartzell/bioperl/bioperl-live/tags/bioperl-1-0-0/t/data/HUMBETGLOA.fasta
 > $ svn rm -m "Removing offending duplicate" svn+ssh://dev.open-bio.org/home/hartzell/bioperl/bioperl-live/tags/bioperl-1-0-alpha/t/data/HUMBETGLOA.fasta
 > $ svn rm -m "Removing offending duplicate" svn+ssh://dev.open-bio.org/home/hartzell/bioperl/bioperl-live/tags/bioperl-1-0-alpha2-rc/t/data/HUMBETGLOA.fasta
 > 
 > to the post-processing commands.
 > [...]

Will do.  Thanks for working out the incantations!

g.


From cjfields at uiuc.edu  Mon Jul  2 09:26:06 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 2 Jul 2007 08:26:06 -0500
Subject: [Bioperl-l] test data
Message-ID: <82E2A57B-CB7B-4437-94A1-7AEFCA6A4F5F@uiuc.edu>

I am planning on adding test data to cvs for eutils and have run  
across some stuff in bugzilla that needs to be added as well.

Should we, as a convention, start adding data sequestered to a folder  
with the test name, within t/data?  This might make life easier in  
the long run (keep track of files, get rid of old files, etc), and  
may make it easier for wrapping up the correct data with tests if we  
start submitting single module CPAN updates.

chris


From cjfields at uiuc.edu  Mon Jul  2 09:52:27 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 2 Jul 2007 08:52:27 -0500
Subject: [Bioperl-l] test data
In-Reply-To: <468901C1.8020505@sendu.me.uk>
References: <82E2A57B-CB7B-4437-94A1-7AEFCA6A4F5F@uiuc.edu>
	<468901C1.8020505@sendu.me.uk>
Message-ID: <61239EEF-D079-4D47-BDCD-A2B5CCC1C84D@uiuc.edu>

On Jul 2, 2007, at 8:46 AM, Sendu Bala wrote:

> Chris Fields wrote:
>> I am planning on adding test data to cvs for eutils and have run  
>> across some stuff in bugzilla that needs to be added as well.
>> Should we, as a convention, start adding data sequestered to a folder  
>> with the test name, within t/data?
>
> I'd actually argue that this shouldn't be done: data is sometimes  
> reused amongst multiple different test scripts, and when looking  
> for data to reuse it's easier to spot it in a single directory  
> compared to searching through multiple directories.
>
>
>> This might make life easier in the long run (keep track of files,  
>> get rid of old files, etc), and may make it easier for wrapping up  
>> the correct data with tests if we start submitting single module  
>> CPAN updates.
>
> I don't think that will be an issue. The automated process would  
> read the test script and see what input files it uses, copying  
> those into the archive. So, just be sure to standardise on using  
> test_input_file() to make that possible.
>
>
> That said, I wouldn't mind especially either way. Just don't do it  
> now, since test script names (and therefore the name of the  
> directory you'd want to store the input files in) might all change.
>
>
> In fact we can imagine that we have a test script t/ 
> BioZombieKitten.t which stores its test data in t/data/ 
> BioZombieKitten/input.file but the script gets the path to this  
> file by:
> my $input_file = test_input_file('input.file');
>
> test_input_file() is then implemented to look for the file in the  
> subdir of data corresponding to the script name if we're dealing  
> with the 900-modules-in-a-package checkout-type situation, but just  
> in t/data if we're in the one-module-in-a-package situation.
>
> In any case, things will be most flexible if you drop files  
> directly into t/data for now and reference them without any subdirs  
> in the call to test_input_file().

Fine by me, I just find it very cluttered.

BioZombieKitten?!?

chris


From bix at sendu.me.uk  Mon Jul  2 10:00:37 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 02 Jul 2007 15:00:37 +0100
Subject: [Bioperl-l] test data
In-Reply-To: <61239EEF-D079-4D47-BDCD-A2B5CCC1C84D@uiuc.edu>
References: <82E2A57B-CB7B-4437-94A1-7AEFCA6A4F5F@uiuc.edu>
	<468901C1.8020505@sendu.me.uk>
	<61239EEF-D079-4D47-BDCD-A2B5CCC1C84D@uiuc.edu>
Message-ID: <46890505.1070707@sendu.me.uk>

Chris Fields wrote:
> On Jul 2, 2007, at 8:46 AM, Sendu Bala wrote:
> Fine by me, I just find it very cluttered.

Yes, I agree. I also wish we had a decent naming convention for files. 
(I.e., it would be nice to have a good idea what a file was for without 
having to study the test script that uses it.)


> BioZombieKitten?!?

I get Bio/perl/ and Bio/ware/ confused in my head ;)
http://forums.bioware.com/viewtopic.html?topic=562916&forum=84


From bix at sendu.me.uk  Mon Jul  2 09:46:41 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 02 Jul 2007 14:46:41 +0100
Subject: [Bioperl-l] test data
In-Reply-To: <82E2A57B-CB7B-4437-94A1-7AEFCA6A4F5F@uiuc.edu>
References: <82E2A57B-CB7B-4437-94A1-7AEFCA6A4F5F@uiuc.edu>
Message-ID: <468901C1.8020505@sendu.me.uk>

Chris Fields wrote:
> I am planning on adding test data to cvs for eutils and have run across 
> some stuff in bugzilla that needs to be added as well.
> 
> Should we, as a convention, start adding data sequestered to a folder with 
> the test name, within t/data?

I'd actually argue that this shouldn't be done: data is sometimes reused 
amongst multiple different test scripts, and when looking for data to 
reuse it's easier to spot it in a single directory compared to searching 
through multiple directories.


> This might make life easier in the long 
> run (keep track of files, get rid of old files, etc), and may make it 
> easier for wrapping up the correct data with tests if we start 
> submitting single module CPAN updates.

I don't think that will be an issue. The automated process would read 
the test script and see what input files it uses, copying those into the 
archive. So, just be sure to standardise on using test_input_file() to 
make that possible.


That said, I wouldn't mind especially either way. Just don't do it now, 
since test script names (and therefore the name of the directory you'd 
want to store the input files in) might all change.


In fact we can imagine that we have a test script t/BioZombieKitten.t 
which stores its test data in t/data/BioZombieKitten/input.file but the 
script gets the path to this file by:
my $input_file = test_input_file('input.file');

test_input_file() is then implemented to look for the file in the subdir 
of data corresponding to the script name if we're dealing with the 
900-modules-in-a-package checkout-type situation, but just in t/data if 
we're in the one-module-in-a-package situation.

In any case, things will be most flexible if you drop files directly 
into t/data for now and reference them without any subdirs in the call 
to test_input_file().
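
The lookup described above could be sketched roughly as follows. This is a hypothetical illustration using only core modules, not the implementation that ships with BioPerl; the `data_dir` and `script` options are assumptions added here so the behaviour can be tried outside a real checkout.

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Spec;

# Hypothetical helper: resolve a test data file name to a path.
# Looks in t/data/<ScriptName>/ first, then falls back to t/data/.
sub test_input_file {
    my ($name, %opt) = @_;
    my $data_dir = $opt{data_dir} || File::Spec->catdir('t', 'data');

    # derive "BioZombieKitten" from "t/BioZombieKitten.t"
    my $script = basename($opt{script} || $0, '.t');

    # many-modules checkout: data lives in a per-script subdirectory
    my $per_script = File::Spec->catfile($data_dir, $script, $name);
    return $per_script if -e $per_script;

    # single-module package (or file not yet moved): plain t/data
    return File::Spec->catfile($data_dir, $name);
}

my $input_file = test_input_file('input.file');
print "$input_file\n";
```

Because callers only ever pass the bare file name, files can later be moved into per-script subdirectories without touching any test script.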


From hlapp at gmx.net  Mon Jul  2 16:02:37 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 2 Jul 2007 16:02:37 -0400
Subject: [Bioperl-l] Take 2 of the new subversion repository.
In-Reply-To: <18054.63942.316904.413911@almost.alerce.com>
References: <18054.63942.316904.413911@almost.alerce.com>
Message-ID: <F5F3DC8B-47B6-4B9E-8B94-82A514BDE7FD@gmx.net>

Just FYI, after applying the changes I've been sending, I was able to  
check out the repository in its entirety.

	-hilmar

On Jun 30, 2007, at 8:48 PM, George Hartzell wrote:

>
> There's a second cut at the subversion repository.  I've done a better
> job of setting svn:keywords and svn:eol-style on various files.  The
> defaults were more cautious and I used an auto-props file based on
> the wiki version.
>
>   svn+ssh://dev.open-bio.org/home/hartzell/bioperl_take2
>
> The old repository's still around as
>
>   svn+ssh://dev.open-bio.org/home/hartzell/bioperl_take1
>
> I renamed it so that people wouldn't work with it by mistake.  If, for
> some hard-to-imagine reason, you have a working copy that you want to
> run against it, you should be able to do an svn switch --relocate on
> your working copy and be back in shape.  In fact, it might be a good
> time to give it a try....
>
> g.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From wrp at virginia.edu  Mon Jul  2 16:08:04 2007
From: wrp at virginia.edu (William R. Pearson)
Date: Mon, 2 Jul 2007 16:08:04 -0400
Subject: [Bioperl-l] Course: Computational and Comparative Genomics
Message-ID: <4B3F66D7-CF05-4CD1-A148-272B4B56FBD4@virginia.edu>


Course announcement - Application deadline, July 15, 2007

================================================================

Cold Spring Harbor
COMPUTATIONAL & COMPARATIVE GENOMICS
November 7 - 13, 2007
Application Deadline: July 15, 2007

INSTRUCTORS:

Pearson, William, Ph.D., University of Virginia, Charlottesville, VA
Smith, Randall, Ph.D., SmithKline Beecham Pharmaceuticals, King of
Prussia, PA

Beyond BLAST and FASTA - Alignment: from proteins to genomes - This
course presents a comprehensive overview of the theory and practice of
computational methods for extracting the maximum amount of information
from protein and DNA sequence similarity through sequence database
searches, statistical analysis, multiple sequence alignment, and
genome-scale alignment. Additional topics include gene finding,
identifying signals in unaligned sequences, and integration of genetic
and sequence information in biological databases.

The course combines lectures with hands-on exercises; students are
encouraged to pose challenging sequence analysis problems using their
own data. The course makes extensive use of local WWW pages to present
problem sets and the computing tools to solve them. Students use
Windows and Mac workstations attached to a UNIX server.

The course is designed for biologists seeking advanced training in
biological sequence analysis, computational biology core resource
directors and staff, and for scientists in other disciplines, such as
computer science, who wish to survey current research problems in
biological sequence analysis and comparative genomics.

The primary focus of the Computational and Comparative Genomics Course
is the theory and practice of algorithms used in computational
biology, with the goal of using current methods more effectively and
developing new algorithms. Cold Spring Harbor also offers a
"Programming for Biology" course, which focuses more on software
development.

For additional information and the lecture schedule and problem sets
for the 2006 course, see:

         http://fasta.bioch.virginia.edu/cshl06

================================================================

To apply to the course, fill out and send in the form at:

         http://meetings.cshl.edu/courses/courseapplication.asp

================================================================

Bill Pearson


From niels at genomics.dk  Mon Jul  2 16:45:07 2007
From: niels at genomics.dk (Niels Larsen)
Date: Mon, 02 Jul 2007 22:45:07 +0200
Subject: [Bioperl-l] simple PrimarySeq question
In-Reply-To: <F5F3DC8B-47B6-4B9E-8B94-82A514BDE7FD@gmx.net>
References: <18054.63942.316904.413911@almost.alerce.com>
	<F5F3DC8B-47B6-4B9E-8B94-82A514BDE7FD@gmx.net>
Message-ID: <468963D3.3000007@genomics.dk>

I write hoping someone could show me how to create a PrimarySeq
object without parsing features and all first. The lines below
return

"Can't locate object method "next_seq" via package "Bio::PrimarySeq" at ./tst2 line 16."

whereas calling Bio::SeqIO-> gives no error, but a too big object.
The GenBank record after the __END__ is the "1.gb" file. I could not
find out how from the tutorial or the Bio::PrimarySeq description.

Niels L


#!/usr/bin/env perl

use strict;
use warnings FATAL => qw ( all );

use Data::Dumper;

use Bio::Seq;
use Bio::SeqIO;

my ( $seq_h, $seq );

$seq_h = Bio::PrimarySeq->new( -file => "1.gb", -format => 'genbank' );
# $seq_h = Bio::SeqIO->new( -file => "1.gb", -format => 'genbank' );

$seq = $seq_h->next_seq();

# print Dumper( $seq );

__END__

LOCUS       X60065                     9 bp    mRNA    linear   MAM 14-NOV-2006
DEFINITION  B.bovis beta-2-gpI mRNA for beta-2-glycoprotein I.
ACCESSION   X60065 REGION: 1..9
VERSION     X60065.1  GI:5
KEYWORDS    beta-2 glycoprotein I.
SOURCE      Bos taurus (cattle)
   ORGANISM  Bos taurus
             Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
             Mammalia; Eutheria; Laurasiatheria; Cetartiodactyla; Ruminantia;
             Pecora; Bovidae; Bovinae; Bos.
REFERENCE   1
   AUTHORS   Bendixen,E., Halkier,T., Magnusson,S., Sottrup-Jensen,L. and
             Kristensen,T.
   TITLE     Complete primary structure of bovine beta 2-glycoprotein I:
             localization of the disulfide bridges
   JOURNAL   Biochemistry 31 (14), 3611-3617 (1992)
    PUBMED   1567819
REFERENCE   2  (bases 1 to 9)
   AUTHORS   Kristensen,T.
   TITLE     Direct Submission
   JOURNAL   Submitted (11-JUN-1991) T. Kristensen, Dept of Mol Biology,
             University of Aarhus, C F Mollers Alle 130, DK-8000 Aarhus C,
             DENMARK
FEATURES             Location/Qualifiers
      source          1..9
                      /organism="Bos taurus"
                      /mol_type="mRNA"
                      /db_xref="taxon:9913"
                      /clone="pBB2I"
                      /tissue_type="liver"
      gene            <1..>9
                      /gene="beta-2-gpI"
      CDS             <1..>9
                      /gene="beta-2-gpI"
                      /codon_start=1
                      /product="beta-2-glycoprotein I"
                      /protein_id="CAA42669.1"
                      /db_xref="GI:6"
                      /db_xref="GOA:P17690"
                      /db_xref="UniProtKB/Swiss-Prot:P17690"
                      /translation="PALVLLLGFLCHVAIAGRTCPKPDELPFSTVVPLKRTYEPGEQI
                      VFSCQPGYVSRGGIRRFTCPLTGLWPINTLKCMPRVCPFAGILENGTVRYTTFEYPNT
                      ISFSCHTGFYLKGASSAKCTEEGKWSPDLPVCAPITCPPPPIPKFASLSVYKPLAGNN
                      SFYGSKAVFKCLPHHAMFGNDTVTCTEHGNWTQLPECREVRCPFPSRPDNGFVNHPAN
                      PVLYYKDTATFGCHETYSLDGPEEVECSKFGNWSAQPSCKASCKLSIKRATVIYEGER
                      VAIQNKFKNGMLHGQKVSFFCKHKEKKCSYTEDAQCIDGTIEIPKCFKEHSSLAFWKT
                      DASDVKPC"
      sig_peptide     <1..>9
                      /gene="beta-2-gpI"
ORIGIN
         1 ccagcgctc
//


From Kevin.M.Brown at asu.edu  Mon Jul  2 17:35:12 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Mon, 2 Jul 2007 14:35:12 -0700
Subject: [Bioperl-l] simple PrimarySeq question
In-Reply-To: <468963D3.3000007@genomics.dk>
References: <18054.63942.316904.413911@almost.alerce.com><F5F3DC8B-47B6-4B9E-8B94-82A514BDE7FD@gmx.net>
	<468963D3.3000007@genomics.dk>
Message-ID: <1A4207F8295607498283FE9E93B775B403576504@EX02.asurite.ad.asu.edu>

Start by having a look at the following link:
http://bioperl.org/cgi-bin/deob_interface.cgi

SeqIO is how one reads or writes sequences to/from files.
Bio::PrimarySeq is just an object that holds information about a
sequence obtained from a file.
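
To make that distinction concrete, here is a minimal sketch (assuming BioPerl is installed) of building a Bio::PrimarySeq by hand. Its constructor takes the sequence data itself rather than a file name, which is why the -file/-format call in the original script never produced something with a next_seq() method. The values are copied from the X60065 record in the question.

```perl
use strict;
use warnings;
use Bio::PrimarySeq;

# Bio::PrimarySeq->new() takes the sequence data itself, not a file name
my $pseq = Bio::PrimarySeq->new(
    -seq        => 'ccagcgctc',
    -display_id => 'X60065',
    -desc       => 'B.bovis beta-2-gpI mRNA for beta-2-glycoprotein I.',
    -alphabet   => 'dna',
);

printf "%s (%d bp): %s\n", $pseq->display_id, $pseq->length, $pseq->seq;
```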

As for how to parse a Genbank file into a list of features:

use strict;
use warnings;
use Bio::SeqIO;

my $format = 'genbank';
my $file = Bio::SeqIO->new(-format => $format, -file => "id.gb");
my @sorted_features;
while (my $seq = $file->next_seq())
{
	# all_SeqFeatures returns every feature, including sub-features
	my @features = $seq->all_SeqFeatures;
	# keep only the features whose primary tag is 'CDS'
	for my $f (@features)
	{
		my $tag = $f->primary_tag;
		if ($tag eq 'CDS')
		{
			# @sorted_features holds all the CDS feature objects
			# obtained from the genbank file
			push @sorted_features, $f; 
		}
	}
}
 

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Niels Larsen
> Sent: Monday, July 02, 2007 1:45 PM
> Cc: bioperl-l List
> Subject: [Bioperl-l] simple PrimarySeq question
> 
> I write hoping someone could show me how to create a 
> PrimarySeq object without parsing features and all first. The 
> lines below return
> 
> "Can't locate object method "next_seq" via package 
> "Bio::PrimarySeq" at ./tst2 line 16."
> 
> whereas calling Bio::SeqIO-> gives no error, but a too big object.
> The GenBank record after the __END__ is the "1.gb" file. I 
> could not find out how from the tutorial or the 
> Bio::PrimarySeq description.
> 
> Niels L
> 
> 
> #!/usr/bin/env perl
> 
> use strict;
> use warnings FATAL => qw ( all );
> 
> use Data::Dumper;
> 
> use Bio::Seq;
> use Bio::SeqIO;
> 
> my ( $seq_h, $seq );
> 
> $seq_h = Bio::PrimarySeq->new( -file => "1.gb", -format => 
> 'genbank' ); # $seq_h = Bio::SeqIO->new( -file => "1.gb", 
> -format => 'genbank' );
> 
> $seq = $seq_h->next_seq();
> 
> # print Dumper( $seq );
> 
> __END__
> 
> LOCUS       X60065                     9 bp    mRNA    linear 
>   MAM 14-NOV-2006
> DEFINITION  B.bovis beta-2-gpI mRNA for beta-2-glycoprotein I.
> ACCESSION   X60065 REGION: 1..9
> VERSION     X60065.1  GI:5
> KEYWORDS    beta-2 glycoprotein I.
> SOURCE      Bos taurus (cattle)
>    ORGANISM  Bos taurus
>              Eukaryota; Metazoa; Chordata; Craniata; 
> Vertebrata; Euteleostomi;
>              Mammalia; Eutheria; Laurasiatheria; 
> Cetartiodactyla; Ruminantia;
>              Pecora; Bovidae; Bovinae; Bos.
> REFERENCE   1
>    AUTHORS   Bendixen,E., Halkier,T., Magnusson,S., 
> Sottrup-Jensen,L. and
>              Kristensen,T.
>    TITLE     Complete primary structure of bovine beta 
> 2-glycoprotein I:
>              localization of the disulfide bridges
>    JOURNAL   Biochemistry 31 (14), 3611-3617 (1992)
>     PUBMED   1567819
> REFERENCE   2  (bases 1 to 9)
>    AUTHORS   Kristensen,T.
>    TITLE     Direct Submission
>    JOURNAL   Submitted (11-JUN-1991) T. Kristensen, Dept of 
> Mol Biology,
>              University of Aarhus, C F Mollers Alle 130, 
> DK-8000 Aarhus C,
>              DENMARK
> FEATURES             Location/Qualifiers
>       source          1..9
>                       /organism="Bos taurus"
>                       /mol_type="mRNA"
>                       /db_xref="taxon:9913"
>                       /clone="pBB2I"
>                       /tissue_type="liver"
>       gene            <1..>9
>                       /gene="beta-2-gpI"
>       CDS             <1..>9
>                       /gene="beta-2-gpI"
>                       /codon_start=1
>                       /product="beta-2-glycoprotein I"
>                       /protein_id="CAA42669.1"
>                       /db_xref="GI:6"
>                       /db_xref="GOA:P17690"
>                       /db_xref="UniProtKB/Swiss-Prot:P17690"
>                       
> /translation="PALVLLLGFLCHVAIAGRTCPKPDELPFSTVVPLKRTYEPGEQI
>                       
> VFSCQPGYVSRGGIRRFTCPLTGLWPINTLKCMPRVCPFAGILENGTVRYTTFEYPNT
>                       
> ISFSCHTGFYLKGASSAKCTEEGKWSPDLPVCAPITCPPPPIPKFASLSVYKPLAGNN
>                       
> SFYGSKAVFKCLPHHAMFGNDTVTCTEHGNWTQLPECREVRCPFPSRPDNGFVNHPAN
>                       
> PVLYYKDTATFGCHETYSLDGPEEVECSKFGNWSAQPSCKASCKLSIKRATVIYEGER
>                       
> VAIQNKFKNGMLHGQKVSFFCKHKEKKCSYTEDAQCIDGTIEIPKCFKEHSSLAFWKT
>                       DASDVKPC"
>       sig_peptide     <1..>9
>                       /gene="beta-2-gpI"
> ORIGIN
>          1 ccagcgctc
> //
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 


From niels at genomics.dk  Mon Jul  2 20:41:24 2007
From: niels at genomics.dk (niels at genomics.dk)
Date: Tue, 3 Jul 2007 02:41:24 +0200 (CEST)
Subject: [Bioperl-l] simple PrimarySeq question
In-Reply-To: <1A4207F8295607498283FE9E93B775B403576504@EX02.asurite.ad.asu.edu>
References: <18054.63942.316904.413911@almost.alerce.com><F5F3DC8B-47B6-4B9E-8B94-82A514BDE7FD@gmx.net>
	<468963D3.3000007@genomics.dk>
	<1A4207F8295607498283FE9E93B775B403576504@EX02.asurite.ad.asu.edu>
Message-ID: <23897.85.82.195.151.1183423284.squirrel@mail.genomics.dk>

Kevin,

Thanks, but I didn't put the question very clearly, sorry .. yes, SeqIO
gets entries from file, and from those large parsed entries I can get a
simplified primary_seq object. But the SeqIO object includes feature
and annotation objects etc. that take time to make, and I wish to know
if there is a way to get a primary_seq object without this overhead. I
apologize if I overlooked it in the docs.

Niels


> Start by having a look at the following link:
> http://bioperl.org/cgi-bin/deob_interface.cgi
>
> SeqIO is how one reads or writes sequences to/from files.
> Bio::PrimarySeq is just an object that holds information about a
> sequence obtained from a file.
>
> As for how to parse a Genbank file into a list of features:
>
> $file = Bio::SeqIO->new(-format => $format, -file => "id.gb");
> while (my $seq = $file->next_seq())
> {
> 	@features = $seq->all_SeqFeatures;
> 	# sort features by their primary tags
> 	for my $f (@features)
> 	{
> 		my $tag = $f->primary_tag;
> 		if ($tag eq 'CDS')
> 		{
> 			# @sorted_features holds all the Bio::PrimarySeq
> features obtained from the genbank file
> 			push @sorted_features, $f;
> 		}
> 	}
> }
>
>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org
>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
>> Niels Larsen
>> Sent: Monday, July 02, 2007 1:45 PM
>> Cc: bioperl-l List
>> Subject: [Bioperl-l] simple PrimarySeq question
>>
>> I write hoping someone could show me how to create a
>> PrimarySeq object without parsing features and all first. The
>> lines below return
>>
>> "Can't locate object method "next_seq" via package
>> "Bio::PrimarySeq" at ./tst2 line 16."
>>
>> whereas calling Bio::SeqIO-> gives no error, but a too big object.
>> The GenBank record after the __END__ is the "1.gb" file. I
>> could not find out how from the tutorial or the
>> Bio::PrimarySeq description.
>>
>> Niels L
>>
>>
>> #!/usr/bin/env perl
>>
>> use strict;
>> use warnings FATAL => qw ( all );
>>
>> use Data::Dumper;
>>
>> use Bio::Seq;
>> use Bio::SeqIO;
>>
>> my ( $seq_h, $seq );
>>
>> $seq_h = Bio::PrimarySeq->new( -file => "1.gb", -format =>
>> 'genbank' ); # $seq_h = Bio::SeqIO->new( -file => "1.gb",
>> -format => 'genbank' );
>>
>> $seq = $seq_h->next_seq();
>>
>> # print Dumper( $seq );
>>
>> __END__
>>
>> LOCUS       X60065                     9 bp    mRNA    linear
>>   MAM 14-NOV-2006
>> DEFINITION  B.bovis beta-2-gpI mRNA for beta-2-glycoprotein I.
>> ACCESSION   X60065 REGION: 1..9
>> VERSION     X60065.1  GI:5
>> KEYWORDS    beta-2 glycoprotein I.
>> SOURCE      Bos taurus (cattle)
>>    ORGANISM  Bos taurus
>>              Eukaryota; Metazoa; Chordata; Craniata;
>> Vertebrata; Euteleostomi;
>>              Mammalia; Eutheria; Laurasiatheria;
>> Cetartiodactyla; Ruminantia;
>>              Pecora; Bovidae; Bovinae; Bos.
>> REFERENCE   1
>>    AUTHORS   Bendixen,E., Halkier,T., Magnusson,S.,
>> Sottrup-Jensen,L. and
>>              Kristensen,T.
>>    TITLE     Complete primary structure of bovine beta
>> 2-glycoprotein I:
>>              localization of the disulfide bridges
>>    JOURNAL   Biochemistry 31 (14), 3611-3617 (1992)
>>     PUBMED   1567819
>> REFERENCE   2  (bases 1 to 9)
>>    AUTHORS   Kristensen,T.
>>    TITLE     Direct Submission
>>    JOURNAL   Submitted (11-JUN-1991) T. Kristensen, Dept of
>> Mol Biology,
>>              University of Aarhus, C F Mollers Alle 130,
>> DK-8000 Aarhus C,
>>              DENMARK
>> FEATURES             Location/Qualifiers
>>       source          1..9
>>                       /organism="Bos taurus"
>>                       /mol_type="mRNA"
>>                       /db_xref="taxon:9913"
>>                       /clone="pBB2I"
>>                       /tissue_type="liver"
>>       gene            <1..>9
>>                       /gene="beta-2-gpI"
>>       CDS             <1..>9
>>                       /gene="beta-2-gpI"
>>                       /codon_start=1
>>                       /product="beta-2-glycoprotein I"
>>                       /protein_id="CAA42669.1"
>>                       /db_xref="GI:6"
>>                       /db_xref="GOA:P17690"
>>                       /db_xref="UniProtKB/Swiss-Prot:P17690"
>>
>> /translation="PALVLLLGFLCHVAIAGRTCPKPDELPFSTVVPLKRTYEPGEQI
>>
>> VFSCQPGYVSRGGIRRFTCPLTGLWPINTLKCMPRVCPFAGILENGTVRYTTFEYPNT
>>
>> ISFSCHTGFYLKGASSAKCTEEGKWSPDLPVCAPITCPPPPIPKFASLSVYKPLAGNN
>>
>> SFYGSKAVFKCLPHHAMFGNDTVTCTEHGNWTQLPECREVRCPFPSRPDNGFVNHPAN
>>
>> PVLYYKDTATFGCHETYSLDGPEEVECSKFGNWSAQPSCKASCKLSIKRATVIYEGER
>>
>> VAIQNKFKNGMLHGQKVSFFCKHKEKKCSYTEDAQCIDGTIEIPKCFKEHSSLAFWKT
>>                       DASDVKPC"
>>       sig_peptide     <1..>9
>>                       /gene="beta-2-gpI"
>> ORIGIN
>>          1 ccagcgctc
>> //
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>


From hlapp at gmx.net  Mon Jul  2 22:36:19 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 2 Jul 2007 22:36:19 -0400
Subject: [Bioperl-l] simple PrimarySeq question
In-Reply-To: <23897.85.82.195.151.1183423284.squirrel@mail.genomics.dk>
References: <18054.63942.316904.413911@almost.alerce.com><F5F3DC8B-47B6-4B9E-8B94-82A514BDE7FD@gmx.net>
	<468963D3.3000007@genomics.dk>
	<1A4207F8295607498283FE9E93B775B403576504@EX02.asurite.ad.asu.edu>
	<23897.85.82.195.151.1183423284.squirrel@mail.genomics.dk>
Message-ID: <84F5C120-FE0B-472D-8F1B-026AD238E959@gmx.net>

Check out the POD of Bio::Seq::SeqBuilder, the synopsis should have  
examples for what you want to do:

      use Bio::SeqIO;
      # usually you won't instantiate this yourself - a SeqIO object -
      # you will have one already
      my $seqin = Bio::SeqIO->new(-fh => \*STDIN, -format => "genbank");
      my $builder = $seqin->sequence_builder();

      # if you need only sequence, id, and description (e.g. for
      # conversion to FASTA format):
      $builder->want_none();
      $builder->add_wanted_slot('display_id','desc','seq');

      # if you want everything except the sequence and features
      $builder->want_all(1); # this is the default if it's untouched
      $builder->add_unwanted_slot('seq','features');

Let us know if that doesn't answer your question.

Note that this is currently only implemented for Genbank format.

	-hilmar

On Jul 2, 2007, at 8:41 PM, niels at genomics.dk wrote:

> Kevin,
>
> Thanks, but I didn't put the question very clearly, sorry. Yes, SeqIO
> gets entries from file, and from those large parsed entries I can
> get a simplified primary_seq object. But the SeqIO object includes
> feature and annotation objects etc. that take time to make, and I wish
> to know if there is a way to get a primary_seq object without this
> overhead. I apologize if I overlooked it in the docs.
>
> Niels
>
>
>
>
>> Start by having a look at the following link:
>> http://bioperl.org/cgi-bin/deob_interface.cgi
>>
>> SeqIO is how one reads or writes sequences to/from files.
>> Bio::PrimarySeq is just an object that holds information about a
>> sequence obtained from a file.
>>
>> As for how to parse a Genbank file into a list of features:
>>
>> use Bio::SeqIO;
>>
>> my $file = Bio::SeqIO->new(-format => 'genbank', -file => "id.gb");
>> my @sorted_features;
>> while (my $seq = $file->next_seq())
>> {
>> 	my @features = $seq->all_SeqFeatures;
>> 	# keep only the features whose primary tag is CDS
>> 	for my $f (@features)
>> 	{
>> 		my $tag = $f->primary_tag;
>> 		if ($tag eq 'CDS')
>> 		{
>> 			# @sorted_features holds all the CDS feature
>> 			# objects obtained from the genbank file
>> 			push @sorted_features, $f;
>> 		}
>> 	}
>> }
>>
>>
>>> -----Original Message-----
>>> From: bioperl-l-bounces at lists.open-bio.org
>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
>>> Niels Larsen
>>> Sent: Monday, July 02, 2007 1:45 PM
>>> Cc: bioperl-l List
>>> Subject: [Bioperl-l] simple PrimarySeq question
>>>
>>> I write hoping someone could show me how to create a
>>> PrimarySeq object without parsing features and all first. The
>>> lines below return
>>>
>>> "Can't locate object method "next_seq" via package
>>> "Bio::PrimarySeq" at ./tst2 line 16."
>>>
>>> whereas calling Bio::SeqIO-> gives no error, but a too big object.
>>> The GenBank record after the __END__ is the "1.gb" file. I
>>> could not find out how from the tutorial or the
>>> Bio::PrimarySeq description.
>>>
>>> Niels L
>>>
>>>
>>> #!/usr/bin/env perl
>>>
>>> use strict;
>>> use warnings FATAL => qw ( all );
>>>
>>> use Data::Dumper;
>>>
>>> use Bio::Seq;
>>> use Bio::SeqIO;
>>>
>>> my ( $seq_h, $seq );
>>>
>>> $seq_h = Bio::PrimarySeq->new( -file => "1.gb", -format =>
>>> 'genbank' ); # $seq_h = Bio::SeqIO->new( -file => "1.gb",
>>> -format => 'genbank' );
>>>
>>> $seq = $seq_h->next_seq();
>>>
>>> # print Dumper( $seq );
>>>
>>> __END__
>>>
>>> LOCUS       X60065                     9 bp    mRNA    linear
>>>   MAM 14-NOV-2006
>>> DEFINITION  B.bovis beta-2-gpI mRNA for beta-2-glycoprotein I.
>>> ACCESSION   X60065 REGION: 1..9
>>> VERSION     X60065.1  GI:5
>>> KEYWORDS    beta-2 glycoprotein I.
>>> SOURCE      Bos taurus (cattle)
>>>    ORGANISM  Bos taurus
>>>              Eukaryota; Metazoa; Chordata; Craniata;
>>> Vertebrata; Euteleostomi;
>>>              Mammalia; Eutheria; Laurasiatheria;
>>> Cetartiodactyla; Ruminantia;
>>>              Pecora; Bovidae; Bovinae; Bos.
>>> REFERENCE   1
>>>    AUTHORS   Bendixen,E., Halkier,T., Magnusson,S.,
>>> Sottrup-Jensen,L. and
>>>              Kristensen,T.
>>>    TITLE     Complete primary structure of bovine beta
>>> 2-glycoprotein I:
>>>              localization of the disulfide bridges
>>>    JOURNAL   Biochemistry 31 (14), 3611-3617 (1992)
>>>     PUBMED   1567819
>>> REFERENCE   2  (bases 1 to 9)
>>>    AUTHORS   Kristensen,T.
>>>    TITLE     Direct Submission
>>>    JOURNAL   Submitted (11-JUN-1991) T. Kristensen, Dept of
>>> Mol Biology,
>>>              University of Aarhus, C F Mollers Alle 130,
>>> DK-8000 Aarhus C,
>>>              DENMARK
>>> FEATURES             Location/Qualifiers
>>>       source          1..9
>>>                       /organism="Bos taurus"
>>>                       /mol_type="mRNA"
>>>                       /db_xref="taxon:9913"
>>>                       /clone="pBB2I"
>>>                       /tissue_type="liver"
>>>       gene            <1..>9
>>>                       /gene="beta-2-gpI"
>>>       CDS             <1..>9
>>>                       /gene="beta-2-gpI"
>>>                       /codon_start=1
>>>                       /product="beta-2-glycoprotein I"
>>>                       /protein_id="CAA42669.1"
>>>                       /db_xref="GI:6"
>>>                       /db_xref="GOA:P17690"
>>>                       /db_xref="UniProtKB/Swiss-Prot:P17690"
>>>
>>> /translation="PALVLLLGFLCHVAIAGRTCPKPDELPFSTVVPLKRTYEPGEQI
>>>
>>> VFSCQPGYVSRGGIRRFTCPLTGLWPINTLKCMPRVCPFAGILENGTVRYTTFEYPNT
>>>
>>> ISFSCHTGFYLKGASSAKCTEEGKWSPDLPVCAPITCPPPPIPKFASLSVYKPLAGNN
>>>
>>> SFYGSKAVFKCLPHHAMFGNDTVTCTEHGNWTQLPECREVRCPFPSRPDNGFVNHPAN
>>>
>>> PVLYYKDTATFGCHETYSLDGPEEVECSKFGNWSAQPSCKASCKLSIKRATVIYEGER
>>>
>>> VAIQNKFKNGMLHGQKVSFFCKHKEKKCSYTEDAQCIDGTIEIPKCFKEHSSLAFWKT
>>>                       DASDVKPC"
>>>       sig_peptide     <1..>9
>>>                       /gene="beta-2-gpI"
>>> ORIGIN
>>>          1 ccagcgctc
>>> //

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From ewijaya at gmail.com  Tue Jul  3 02:56:30 2007
From: ewijaya at gmail.com (Edward Wijaya)
Date: Tue, 3 Jul 2007 14:56:30 +0800
Subject: [Bioperl-l] Problem with GD.pm version 2.35
Message-ID: <3521d3670707022356tbc38694mfcb5185b1dfc005d@mail.gmail.com>

Dear all,
I was trying to perform check with this command:

$ perl -MGD -e 'print $GD::VERSION';

And it gave:

GD object version 2.32 does not match $GD::VERSION 2.35 at
/usr/lib/perl5/5.8.8/i386-linux-thread-multi/DynaLoader.pm line 253.
Compilation failed in require.
BEGIN failed--compilation aborted.

Similarly my script that uses GD.pm doesn't execute.


I have installed the latest version of libgd version 2.0.35 downloaded from
http://www.libgd.org/Downloads#Download_the_latest_.282.0.35.29

Can anybody suggest how can I resolve my problem?

This is my Perl version:
This is perl, v5.8.8 built for i386-linux-thread-multi

--
Edward


From ewijaya at i2r.a-star.edu.sg  Tue Jul  3 02:35:12 2007
From: ewijaya at i2r.a-star.edu.sg (Wijaya Edward)
Date: Tue, 3 Jul 2007 14:35:12 +0800
Subject: [Bioperl-l] Problem with GD.pm version 2.35
References: <3ACF03E372996C4EACD542EA8A05E66A06168A@mailbe01.teak.local.net>
Message-ID: <3ACF03E372996C4EACD542EA8A05E66A26EB85@mailbe01.teak.local.net>

 
Dear all, 
I was trying to perform check with this command:
 
$ perl -MGD -e 'print $GD::VERSION';

And it gave: 
 
GD object version 2.32 does not match $GD::VERSION 2.35 at /usr/lib/perl5/5.8.8/i386-linux-thread-multi/DynaLoader.pm line 253.
Compilation failed in require.
BEGIN failed--compilation aborted.

 
I have installed the latest version of libgd version 2.0.35 downloaded from
http://www.libgd.org/Downloads#Download_the_latest_.282.0.35.29
 
Can anybody suggest how can I resolve my problem?
 
This is my Perl version:
This is perl, v5.8.8 built for i386-linux-thread-multi
 
--
Edward


From lstein at cshl.edu  Tue Jul  3 10:41:26 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Tue, 3 Jul 2007 10:41:26 -0400
Subject: [Bioperl-l] Problem with GD.pm version 2.35
In-Reply-To: <3521d3670707022356tbc38694mfcb5185b1dfc005d@mail.gmail.com>
References: <3521d3670707022356tbc38694mfcb5185b1dfc005d@mail.gmail.com>
Message-ID: <6dce9a0b0707030741r52b8d0beq757a8faf982e1f2f@mail.gmail.com>

This happens when there is a mismatch between the compiled (.so) portion of
GD and the perl (.pm) version. Typically it occurs when you have installed
GD incorrectly by, e.g., copying the .pm file into position rather than
using the make file.

Solution: Uninstall old versions of GD by manually finding all occurrences
of GD.so and GD.pm and removing them. Then reinstall the correct way.
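
As a rough sketch of the "find all occurrences" step (an illustration
only, not something shipped with GD), the script below walks every
directory in @INC and lists files named GD.pm or GD.so. The sub name
find_gd_files is made up for this example, and on platforms where the
compiled part uses another suffix the pattern would need adjusting.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Find;

# find_gd_files is a hypothetical helper, not part of GD or BioPerl.
# It returns the sorted, de-duplicated paths of every file named
# GD.pm or GD.so found below the given directories.
sub find_gd_files {
    # @INC may hold code references (hooks) or missing directories; skip those
    my @dirs = grep { defined($_) && !ref($_) && -d $_ } @_;
    return () unless @dirs;

    my %seen;
    find(
        sub {
            # inside the callback $_ is the bare file name
            $seen{$File::Find::name} = 1 if /\AGD\.(?:pm|so)\z/ && -f $_;
        },
        @dirs
    );
    my @found = sort keys %seen;
    return @found;
}

# List every copy visible to this perl
print "$_\n" for find_gd_files(@INC);
```

Run it with the same perl that reports the error, since @INC differs
between perl installations. More than one GD.pm or GD.so in the output
is a sign of the mismatch described above.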

Lincoln

On 7/3/07, Edward Wijaya <ewijaya at gmail.com> wrote:
>
> Dear all,
> I was trying to perform check with this command:
>
> $ perl -MGD -e 'print $GD::VERSION';
>
> And it gave:
>
> GD object version 2.32 does not match $GD::VERSION 2.35 at
> /usr/lib/perl5/5.8.8/i386-linux-thread-multi/DynaLoader.pm line 253.
> Compilation failed in require.
> BEGIN failed--compilation aborted.
>
> Similarly my script that uses GD.pm doesn't execute.
>
>
> I have installed the latest version of libgd version 2.0.35 downloaded
> from
> http://www.libgd.org/Downloads#Download_the_latest_.282.0.35.29
>
> Can anybody suggest how can I resolve my problem?
>
> This is my Perl version:
> This is perl, v5.8.8 built for i386-linux-thread-multi
>
> --
> Edward


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu


From cjfields at uiuc.edu  Wed Jul  4 01:45:16 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 4 Jul 2007 00:45:16 -0500
Subject: [Bioperl-l] genbank2gff3 - Name attribute?
Message-ID: <C790FCC2-81E5-4BB4-A9CB-E2E59E5ABE27@uiuc.edu>

I noticed that genbank2gff3.pl doesn't have an explicitly defined way  
of converting the gene/locus/etc name to a Name tag (for, say,  
GBrowse).  Any particular reason?

Should I stick with GFF2 for now?

chris


From bix at sendu.me.uk  Wed Jul  4 06:00:31 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 04 Jul 2007 11:00:31 +0100
Subject: [Bioperl-l] Splitting Bioperl
Message-ID: <468B6FBF.1070708@sendu.me.uk>

To summarise some previous threads:
http://thread.gmane.org/gmane.comp.lang.perl.bio.general/15315
http://thread.gmane.org/gmane.comp.lang.perl.bio.general/15338/focus=15409

# Bioperl is currently one monolithic distribution of ~900 modules
# There is some desire to split it up into smaller functional groups
# There are some problems with that proposal
# An extreme variant of that proposal is to make the groups individual 
modules


Following this discussion:
http://www.nntp.perl.org/group/perl.modules/2007/07/msg55160.html
(especially Adam Kennedy's postings of 4/07, soon to appear in that 
archive), the extreme variant doesn't seem like a good idea.


I'm now suggesting that Steve's original split idea, as 
modified/expanded by Adam's driver and other ideas, is the best choice. 
The problems I previously identified can be solved in the same way they 
were solved in my extreme variant: the splits are done by Build.PL 
automation working on a single repository/code-base, not by splitting 
things up at the repository level.


As I see it, the way forward now is for someone interested enough to 
decide on the specifics of how things will be split and offer them up to 
the group for discussion. I don't mean vague possibilities of what might 
work as a split, but rather some real thought should go into it to make 
sure the split makes sense and will actually work in practice.

Following that, the splits can be implemented by some automated dist 
action of Build.PL.


If there isn't sufficient interest to make this happen, I don't see that 
as a terrible thing. There are benefits to keeping Bioperl monolithic, 
and some of the problems (eg. lack of updates) can be solved without 
changing its nature.


From cjfields at uiuc.edu  Wed Jul  4 10:53:45 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 4 Jul 2007 09:53:45 -0500
Subject: [Bioperl-l] Splitting Bioperl
In-Reply-To: <468B6FBF.1070708@sendu.me.uk>
References: <468B6FBF.1070708@sendu.me.uk>
Message-ID: <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>


On Jul 4, 2007, at 5:00 AM, Sendu Bala wrote:

> To summarise some previous threads:
> http://thread.gmane.org/gmane.comp.lang.perl.bio.general/15315
> http://thread.gmane.org/gmane.comp.lang.perl.bio.general/15338/focus=15409
>
> # Bioperl is currently one monolithic distribution of ~900 modules
> # There is some desire to split it up into smaller functional groups
> # There are some problems with that proposal
> # An extreme variant of that proposal is to make the groups individual
> modules
>
>
> Following this discussion:
> http://www.nntp.perl.org/group/perl.modules/2007/07/msg55160.html
> (especially Adam Kennedy's postings of 4/07, soon to appear in that
> archive), the extreme variant doesn't seem like a good idea.

brian d foy made some sound arguments against it as well.

> I'm now suggesting that Steve's original split idea, as
> modified/expanded by Adam's driver and other ideas, is the best  
> choice.
> The problems I previously identified can be solved in the same way  
> they
> were solved in my extreme variant: the splits are done by Build.PL
> automation working on a single repository/code-base, not by splitting
> things up at the repository level.
>
> As I see it, the way forward now is for someone interested enough to
> decide on the specifics of how things will be split and offer them  
> up to
> the group for discussion. I don't mean vague possibilities of what  
> might
> work as a split, but rather some real thought should go into it to  
> make
> sure the split makes sense and will actually work in practice.

We've already identified a few (SearchIO, Tools, GBrowse-related, etc).
...
> If there isn't sufficient interest to make this happen, I don't see  
> that
> as a terrible thing. There are benefits to keeping Bioperl monolithic,
> and some of the problems (eg. lack of updates) can be solved without
> changing its nature.

If so, proposals that solve this problem need to be made as well.

If we stay monolithic, then here's mine: we start having fixed,
regularly timed dev releases like Parrot, monthly or bimonthly (quite
common on CPAN), with brief release reports on which bugs have been
fixed, which code has been added, and so on.  Not every bug has to be
fixed per dev release; if that were true there would never be releases
for some of the XML parser packages.  No RCs for dev releases (it's a
dev release!).  These would be 1.x.y.  We can then, every once in a
while, have a bug-squashing session, hackathon, etc, and have a regular
non-dev release (1.x) that all core devs accept and that passes a
particular milestone.

As for the advantage of a split approach, as mentioned previously it  
is to focus modules/tests/scripts into groups with related  
functions.  Even just splitting off ones with external reqs (XML  
parsers, GD, etc) into an 'aux' release would be an advantage, as it  
doesn't confront a new user with the burden of installing a large  
list of dependencies, some of which may be complicated for a perl  
newbie to either install from scratch (DBD::mysql, GD) or to get the  
latest bug-fixed prereq release for their OS (the recent debacle with
XML::SAX::Expat comes to mind, which wasn't immediately
available for win32 as a PPM).

I'm fairly open to any approach as long as it's reasonably thought
out, though I am admittedly a bit biased towards the split approach.
I do think some change is in order; I worry about there ever being a  
1.6 release at this point.

chris


From davila at ioc.fiocruz.br  Wed Jul  4 13:11:20 2007
From: davila at ioc.fiocruz.br (Alberto Davila)
Date: Wed, 04 Jul 2007 14:11:20 -0300
Subject: [Bioperl-l] ESTs in EST format
Message-ID: <468BD4B8.5050105@ioc.fiocruz.br>

Dear All,

I am trying to get all ESTs from a given species (eg: Trypanosoma 
brucei) from Genbank in EST format (eg: 
http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucest&id=10280980)... 
while using Entrez I can "display" individual EST entries in EST format, 
this "EST format" is not an option in the main "display" menu for batch 
download ...

I don't see the EST format listed
(http://www.bioperl.org/wiki/Sequence_formats) among the ones that SeqIO
deals with, so I wonder whether there is another BioPerl module to do
this? Any tips would be greatly appreciated ;-)

Kindest regards, Alberto


From jason at bioperl.org  Wed Jul  4 13:52:59 2007
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 4 Jul 2007 10:52:59 -0700
Subject: [Bioperl-l] ESTs in EST format
In-Reply-To: <468BD4B8.5050105@ioc.fiocruz.br>
References: <468BD4B8.5050105@ioc.fiocruz.br>
Message-ID: <D0D013CC-1D28-46D6-A94F-EA53C7EC5219@bioperl.org>

Currently we don't support this format. As far as I know it isn't a
published standard, nor is it a format in which NCBI distributes this
data as flat files (i.e. genbank dumps).

Is there any reason why you can't get what you need from the GenBank  
format?
http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucest&qty=1&c_start=1&list_uids=10280980&uids=&dopt=gb

-jason
On Jul 4, 2007, at 10:11 AM, Alberto Davila wrote:

> Dear All,
>
> I am trying to get all ESTs from a given species (eg: Trypanosoma
> brucei) from Genbank in EST format (eg:
> http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucest&id=10280980)...
> while using Entrez I can "display" individual EST entries in EST  
> format,
> this "EST format" is not an option in the main "display" menu for  
> batch
> download ...
>
> I dont see the EST format listed
> (http://www.bioperl.org/wiki/Sequence_formats) among the ones that  
> SeqIO
> deal with, so wonder there would another BioPerl module to do  
> this ? any
> tips, would be greatly appreciated ;-)
>
> Kindest regards, Alberto

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From dmessina at wustl.edu  Wed Jul  4 14:37:22 2007
From: dmessina at wustl.edu (David Messina)
Date: Wed, 4 Jul 2007 13:37:22 -0500
Subject: [Bioperl-l] Splitting Bioperl
In-Reply-To: <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>
References: <468B6FBF.1070708@sendu.me.uk>
	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>
Message-ID: <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>


On Jul 4, 2007, at 9:53 AM, Chris Fields wrote:

>  we start having fixed,
> regularly timed dev releases like Parrot, monthly or bimonthly (quite
> common on CPAN), with brief release reports on which bugs have been
> fixed, code has been added, so on.  Not every bug has to be fixed per
> dev release; if that were true there would never be releases for some
> of the XML parser packages.  No RCs for dev releases (it's a dev
> release!).  These would be 1.x.y.  We can then, every once in a
> while, have a bug-squashing session, hackathon, etc, and have regular
> non-dev release (1.x) that all core devs accept and that passes a
> particular milestone.


Regardless of whether we split or don't, I think these ideas of  
adding a little more structure to BioPerl's development cycles --  
especially having bug-squashing and hacking sessions, where we all  
band together and commit some time to cranking through a bunch of to- 
dos -- would be beneficial, particularly as a means to keeping a  
certain basal level of momentum in BioPerl.

Dave


From jason at bioperl.org  Wed Jul  4 15:45:29 2007
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 4 Jul 2007 12:45:29 -0700
Subject: [Bioperl-l] Splitting Bioperl
In-Reply-To: <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>
References: <468B6FBF.1070708@sendu.me.uk>
	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>
	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>
Message-ID: <B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>

I definitely agree - we can live up to the unstable "living on the  
edge" nature of dev releases a bit more perhaps?


On Jul 4, 2007, at 11:37 AM, David Messina wrote:

>
> On Jul 4, 2007, at 9:53 AM, Chris Fields wrote:
>
>>  we start having fixed,
>> regularly timed dev releases like Parrot, monthly or bimonthly (quite
>> common on CPAN), with brief release reports on which bugs have been
>> fixed, code has been added, so on.  Not every bug has to be fixed per
>> dev release; if that were true there would never be releases for some
>> of the XML parser packages.  No RCs for dev releases (it's a dev
>> release!).  These would be 1.x.y.  We can then, every once in a
>> while, have a bug-squashing session, hackathon, etc, and have regular
>> non-dev release (1.x) that all core devs accept and that passes a
>> particular milestone.
>
>
> Regardless of whether we split or don't, I think these ideas of
> adding a little more structure to BioPerl's development cycles --
> especially having bug-squashing and hacking sessions, where we all
> band together and commit some time to cranking through a bunch of to-
> dos -- would be beneficial, particularly as a means to keeping a
> certain basal level of momentum in BioPerl.
>
> Dave

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From cjfields at uiuc.edu  Wed Jul  4 16:54:14 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 4 Jul 2007 15:54:14 -0500
Subject: [Bioperl-l] Splitting Bioperl
In-Reply-To: <B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>
References: <468B6FBF.1070708@sendu.me.uk>
	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>
	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>
	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>
Message-ID: <F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>

I think what's partially responsible for slowing down releases is the  
expectation that each dev release is supposed to have all bugs fixed,  
work for every OS, etc.  In other words, act like a stable release.

A developer release by nature is living on the edge, so why not have  
regular dev releases?  We keep telling users to update to using  
bioperl-live whenever something breaks, anyway.  We could decide to  
split stuff off along the way into more 'stable' sections if there  
were more demand for it, and have the more API-volatile code  
(DB::SeqFeature, EUtilities, GFF3, Chado-related, etc.) retain the  
'dev' tag until we feel it's ready for prime time.

chris

On Jul 4, 2007, at 2:45 PM, Jason Stajich wrote:

> I definitely agree - we can live up to the unstable "living on the
> edge" nature of dev releases a bit more perhaps?
>
>
> On Jul 4, 2007, at 11:37 AM, David Messina wrote:
>
>>
>> On Jul 4, 2007, at 9:53 AM, Chris Fields wrote:
>>
>>>  we start having fixed,
>>> regularly timed dev releases like Parrot, monthly or bimonthly  
>>> (quite
>>> common on CPAN), with brief release reports on which bugs have been
>>> fixed, code has been added, so on.  Not every bug has to be fixed  
>>> per
>>> dev release; if that were true there would never be releases for  
>>> some
>>> of the XML parser packages.  No RCs for dev releases (it's a dev
>>> release!).  These would be 1.x.y.  We can then, every once in a
>>> while, have a bug-squashing session, hackathon, etc, and have  
>>> regular
>>> non-dev release (1.x) that all core devs accept and that passes a
>>> particular milestone.
>>
>>
>> Regardless of whether we split or don't, I think these ideas of
>> adding a little more structure to BioPerl's development cycles --
>> especially having bug-squashing and hacking sessions, where we all
>> band together and commit some time to cranking through a bunch of to-
>> dos -- would be beneficial, particularly as a means to keeping a
>> certain basal level of momentum in BioPerl.
>>
>> Dave
>
> --
> Jason Stajich
> jason at bioperl.org
> http://jason.open-bio.org/

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From n.haigh at sheffield.ac.uk  Thu Jul  5 04:09:05 2007
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Thu, 05 Jul 2007 09:09:05 +0100
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To: <F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>
References: <468B6FBF.1070708@sendu.me.uk>	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>
	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>
Message-ID: <468CA721.4020804@sheffield.ac.uk>

Chris Fields wrote:
> I think what's partially responsible for slowing down releases is the  
> expectation that each dev release is supposed to have all bugs fixed,  
> work for every OS, etc.  In other words, act like a stable release.
>
> A developer release by nature is living on the edge, so why not have  
> regular dev releases?  We keep telling users to update to using  
> bioperl-live whenever something breaks, anyway.  We could decide to  
> split stuff off along the way into more 'stable' sections if there  
> were more demand for it, and have the more API-volatile code  
> (DB::SeqFeature, EUtilities, GFF3, Chado-related, etc.) retain the  
> 'dev' tag until we feel it's ready for prime time.
>
> chris
>
> On Jul 4, 2007, at 2:45 PM, Jason Stajich wrote:
>
>   
-- snip --

I agree, although would the dev releases still need to pass all the 
tests? I'm thinking of people installing via CPAN.

I also agree with what was said in a previous post about bringing
bioperl-run (and some others) back into the same repository as
bioperl-core (after a successful move over to svn) and having Build.PL
deal with creating the packages etc for CPAN. This would hopefully help
keep the run package (and others) up to speed with the core package.

I also agree with previous posts about organising and/or having some
naming convention for test data files. I think an approach whereby data
files are organised into directory trees (1 - 3 deep), with names that
allude to the type of data in that subtree/file rather than the tests
that use it, would work well. For example:

t/data
    |__ formats
    |           |__ seq
    |           |        |__ legal_fasta
    |           |        |              |__ extension.fas
    |           |        |              |__ extension.fasta
    |           |        |              |__ extension.foo
    |           |        |              |__ extension.bar
    |           |        |              |__ no_extension
    |           |        |              |__ interleaved.fas
    |           |        |              |__ non_interleaved.fas
    |           |        |              |__ single_seq.fas
    |           |        |              |__ multiple_seq.fas
    |           |        |              |__ desc_line1.fas
    |           |        |              |__ desc_line2.fas
    |           |        |
    |           |        |__ illegal_fasta
    |           |        |              |__ illegal_chars.fas
    |           |        |              |__ some_other_illegal_alternative.fas
    |           |        |
    |           |        |__ legal_genbank
    |           |        |              |__ etc etc
    |           |        |
    |           |        |__ illegal_genbank
    |           |                      |__ etc etc
    |           |
    |           |__ aln
    |           |__ blast
    |           |        |__ legal_blastx
    |           |        |
    |           |        |__ legal_blastp
    |           |        |
    |           |        |__ legal_tblastx
    |           |        |
    |           |        |__ legal_blastpsi
    |           |        |
    |           |        |__ legal_wublast
    |           |__ foo
    |           |__ bar
    |           |__ misc
    |
    |__ etc

This type of setup might lend itself to having a test script simply try
to parse all the files in a directory, to ensure that nothing fails for
legal file formats and that parsing does fail for illegal formats.
Naming of the file paths would help test authors to identify a suitable
data file for their own tests before adding their own to the t/data dir.
It might also help to identify areas where example test data is
currently lacking.
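
To make the idea concrete, here is a minimal sketch of such a script.
It assumes the legal_*/illegal_* directory naming shown in the tree
above. The names check_format_tree and parse_fasta_strict are invented
for the example, and the toy FASTA checker only stands in for whichever
real parser the test would exercise (one parser per format directory).

```perl
use strict;
use warnings;
use File::Find;

# Stand-in parser (an assumption for this sketch, not a BioPerl call).
# Dies on anything that is not very simple FASTA.
sub parse_fasta_strict {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "cannot open $path: $!\n";
    my $records = 0;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /\A\s*\z/;                 # skip blank lines
        if ($line =~ /\A>\S/) { $records++; next; } # header line
        die "$path: sequence data before first header\n" unless $records;
        die "$path: illegal character at line $.\n"
            unless $line =~ /\A[A-Za-z*-]+\z/;
    }
    close $fh;
    die "$path: no records\n" unless $records;
    return $records;
}

# Walk $root. Files directly inside a directory named legal_* must parse;
# files directly inside illegal_* must make the parser die.
# Returns a sorted list of problem descriptions (empty if all is well).
sub check_format_tree {
    my ($root, $parser) = @_;
    my @problems;
    find(
        {
            no_chdir => 1,
            wanted   => sub {
                my $path = $File::Find::name;
                return unless -f $path;
                my ($kind) =
                    $File::Find::dir =~ m{(?:\A|/)((?:il)?legal)_[^/]+\z};
                return unless $kind;
                my $ok = eval { $parser->($path); 1 };
                push @problems, "$path: expected to parse"
                    if $kind eq 'legal' && !$ok;
                push @problems, "$path: expected to fail"
                    if $kind eq 'illegal' && $ok;
            },
        },
        $root
    );
    @problems = sort @problems;
    return @problems;
}

my $root = 't/data/formats/seq';    # layout proposed above
if (-d $root) {
    print "$_\n" for check_format_tree($root, \&parse_fasta_strict);
}
```

Passing the parser in as a code ref keeps the directory walk independent
of any one format, so the same loop could be reused for the aln and
blast subtrees.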

Thinking about this a little more, I think it would be a good idea to 
include Test::Exception in t/lib. We should also be testing that 
warnings and exceptions are generated when expected - e.g. illegal 
characters in seq files etc etc. Without these sorts of tests we are 
only getting half the story. This testing might account for a large 
chunk of the poor test coverage, particularly when it comes to branches 
in the code.
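
For illustration, this is roughly what such tests check, written with
only core modules (eval plus a __WARN__ handler) so that it runs without
Test::Exception installed. With Test::Exception the exception checks
would shrink to its dies_ok/throws_ok calls. validate_dna and
run_captured are made-up names for this sketch, not BioPerl code.

```perl
use strict;
use warnings;
use Test::More tests => 4;

# Hypothetical validator standing in for a real parser: warns about
# lower-case input and dies on characters outside the DNA alphabet.
sub validate_dna {
    my ($seq) = @_;
    warn "lower-case sequence converted to upper case\n" if $seq =~ /[a-z]/;
    $seq = uc $seq;
    die "illegal character '$1' in sequence\n" if $seq =~ /([^ACGTN])/;
    return $seq;
}

# Run a code ref and return (result, exception text, array ref of warnings)
sub run_captured {
    my ($code) = @_;
    my @warnings;
    local $SIG{__WARN__} = sub { push @warnings, @_ };
    my $result = eval { $code->() };
    my $error  = $@;
    return ($result, $error, \@warnings);
}

my ($seq, $error, $warnings) = run_captured(sub { validate_dna('ACGT') });
ok($seq eq 'ACGT' && !$error && !@$warnings,
    'legal input: no exception, no warning');

($seq, $error, $warnings) = run_captured(sub { validate_dna('ACGX') });
like($error, qr/illegal character 'X'/, 'illegal character throws an exception');

($seq, $error, $warnings) = run_captured(sub { validate_dna('acgt') });
is(scalar @$warnings, 1, 'lower-case input gives exactly one warning');
like($warnings->[0], qr/lower-case/, 'warning mentions the cause');
```

The second and third cases are the ones that exercise the error and
warning branches, which tests on legal input alone never reach.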

Anyway, this type of reorganisation couldn't take place until the svn 
repo is up and working.

I'd appreciate any comments on the above!
Nath


From bix at sendu.me.uk  Thu Jul  5 04:55:25 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 05 Jul 2007 09:55:25 +0100
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To: <468CA721.4020804@sheffield.ac.uk>
References: <468B6FBF.1070708@sendu.me.uk>	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>
	<468CA721.4020804@sheffield.ac.uk>
Message-ID: <468CB1FD.7060301@sendu.me.uk>

Nathan S. Haigh wrote:
> I agree, although would the dev releases still need to pass all the 
> tests? I'm thinking of people installing via CPAN.

Yes, they'd all have to pass. 'Developer release' should never have the 
connotation of 'broken release'. However, getting all tests to pass is a 
lot easier than fixing all bugs in bugzilla.

(... which actually goes to show how poor our tests are)

Worst case, if we were forced to stick to a schedule but couldn't fix a 
failing test, we could always make it a 'todo' test.
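
For anyone who hasn't used them, a 'todo' test in Test::More looks
roughly like this. gc_fraction is a made-up function with a deliberate
bug, used here only to have something that fails.

```perl
use strict;
use warnings;
use Test::More tests => 2;

# gc_fraction is a made-up function with a deliberate bug: it only
# counts upper-case G and C, so lower-case input gives the wrong answer.
sub gc_fraction {
    my ($seq) = @_;
    my $gc = ($seq =~ tr/GC//);    # tr with an empty replacement only counts
    return $gc / length($seq);
}

# An ordinary test: a failure here fails the whole test script.
is(gc_fraction('GGCC'), 1, 'upper-case sequence');

TODO: {
    # A failure inside this block is reported as "not ok ... # TODO"
    # and does not count against the test script.
    local $TODO = 'lower-case input not handled yet';
    is(gc_fraction('ggcc'), 1, 'lower-case sequence');
}
```

The failure stays visible in the test output, and if the test
unexpectedly starts passing the harness flags that as well, so the todo
marker is less likely to be forgotten.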


> I also agree with what was said in a previous post about bringing back 
> bioperl-run (and some others) back into the same repository as 
> bioperl-core (after a successful move over to svn)

Agree (with myself essentially).


> I also agree with previous posts about organising and/or having some 
> naming convention for test data files. I think an approach whereby data 
> files were organised into directory trees (1 - 3 deep) with names that 
> allude to the type of data in that subtree/file rather than the tests 
> that use it etc. For example:
> 
> t/data
>     |__ formats
>     |           |__ seq
>     |           |        |__ legal_fasta
>     |           |        |              |__ extension.fas
[snip]

At that level, files don't need extensions and can have fully 
informative names that explain what's interesting or special about them.


> This type of setup, might lend itself to having a test script simply try 
> to parse all the files in a directory to ensure nothing fails (for legal 
> file formats) and fails for illegal formats.

Great idea.
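
A minimal sketch of such a directory-driven test, under several assumptions: parse_fasta_file() is a toy parser, not a Bioperl one, and the directory tree is created in a temporary location purely so the example is self-contained. Only the legal_*/illegal_* naming comes from the proposal above.

```perl
use strict;
use warnings;
use Test::More;
use File::Temp qw(tempdir);
use File::Spec;

# Toy parser standing in for a real one: dies on illegal characters.
sub parse_fasta_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my $count = 0;
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^>/) { $count++; next }
        die "Illegal character in $path\n" if $line =~ /[^A-Za-z*-]/;
    }
    close $fh;
    return $count;
}

# Build a throwaway tree that mimics .../seq/{legal,illegal}_fasta.
my $root  = tempdir(CLEANUP => 1);
my %files = (
    legal_fasta   => { 'single_seq.fas'    => ">s1\nACGT\n",
                       'no_extension'      => ">s1\nAC\n>s2\nGT\n" },
    illegal_fasta => { 'illegal_chars.fas' => ">s1\nAC#GT\n" },
);
for my $dir (sort keys %files) {
    mkdir File::Spec->catdir($root, $dir) or die "mkdir failed: $!";
    for my $name (sort keys %{ $files{$dir} }) {
        my $path = File::Spec->catfile($root, $dir, $name);
        open my $out, '>', $path or die "Cannot write $path: $!";
        print {$out} $files{$dir}{$name};
        close $out;
    }
}

# Every file under legal_* must parse; every file under illegal_* must die.
for my $dir (sort keys %files) {
    my $dir_path = File::Spec->catdir($root, $dir);
    opendir my $dh, $dir_path or die "Cannot read $dir_path: $!";
    my @names = sort grep { !/^\./ } readdir $dh;
    closedir $dh;
    for my $name (@names) {
        my $path = File::Spec->catfile($dir_path, $name);
        my $ok   = eval { parse_fasta_file($path); 1 };
        if ($dir =~ /^legal_/) { ok($ok,  "$dir/$name parses") }
        else                   { ok(!$ok, "$dir/$name is rejected") }
    }
}
done_testing();
```

The test script never names individual files, so adding a new data file to the right directory is enough to get it tested.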


> Thinking about this a little more, I think it would be a good idea to 
> include Test::Exception in t/lib.

Agree. I'll see if I can have it auto-loaded by BioperlTest.pm.


> Anyway, this type of reorganisation couldn't take place until the svn 
> repo is up and working.

Agree.


From bix at sendu.me.uk  Thu Jul  5 05:39:10 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 05 Jul 2007 10:39:10 +0100
Subject: [Bioperl-l] Test related Suggestions
In-Reply-To: <468CB1FD.7060301@sendu.me.uk>
References: <468B6FBF.1070708@sendu.me.uk>	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>	<468CA721.4020804@sheffield.ac.uk>
	<468CB1FD.7060301@sendu.me.uk>
Message-ID: <468CBC3E.1020408@sendu.me.uk>

Sendu Bala wrote:
> Nathan S. Haigh wrote:
>> Thinking about this a little more, I think it would be a good idea to 
>> include Test::Exception in t/lib.
> 
> Agree. I'll see if I can have it auto-loaded by BioperlTest.pm.

I've now done that: BioperlTest loads Test::Exception, from the copy in 
t/lib if necessary.

So, in BioperlTest-using scripts you now have access to the methods 
dies_ok, lives_ok, throws_ok and lives_and.
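
For anyone who has not used these before, a short sketch of the four functions. It loads Test::Exception directly rather than through BioperlTest so that it stands alone, and check_seq() is a hypothetical validator made up for the example.

```perl
use strict;
use warnings;
use Test::More tests => 4;
use Test::Exception;

# Hypothetical validator used only for illustration.
sub check_seq {
    my ($seq) = @_;
    die "Illegal character in sequence\n" if $seq =~ /[^ACGTN]/i;
    return length $seq;
}

# The code must not die.
lives_ok  { check_seq('ACGT') } 'legal sequence is accepted';

# The code must die, with any message.
dies_ok   { check_seq('AC#T') } 'illegal sequence is rejected';

# The code must die with a message matching the pattern.
throws_ok { check_seq('AC#T') } qr/Illegal character/,
    'rejection carries the expected message';

# The code must not die, and the enclosed test must pass.
lives_and { is check_seq('ACGT'), 4 } 'return value is the length';
```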


From N.Haigh at sheffield.ac.uk  Thu Jul  5 06:01:04 2007
From: N.Haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Thu,  5 Jul 2007 11:01:04 +0100
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To: <468CB1FD.7060301@sendu.me.uk>
References: <468B6FBF.1070708@sendu.me.uk>
	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>
	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>
	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>
	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>
	<468CA721.4020804@sheffield.ac.uk> <468CB1FD.7060301@sendu.me.uk>
Message-ID: <1183629664.468cc1609891a@webmail.shef.ac.uk>

Quoting Sendu Bala <bix at sendu.me.uk>:

-- snip --
> 
> 
> > I also agree with previous posts about organising and/or having some 
> > naming convention for test data files. I think an approach whereby data 
> > files were organised into directory trees (1 - 3 deep) with names that 
> > allude to the type of data in that subtree/file rather than the tests 
> > that use it etc. For example:
> > 
> > t/data
> >     |__ formats
> >     |           |__ seq
> >     |           |        |__ legal_fasta
> >     |           |        |              |__ extension.fas
> [snip]
> 
> At that level, files don't need extensions and can have fully 
> informative names that explain what's interesting or special about them.
> 

You may be correct in most cases, however, isn't there a method for detecting the file format from the file extension and failing that it peeks inside
the file? Therefore there should be a file extension for each of these to get good code coverage as well as each format not having an extension to
check that the peek inside the file correctly determines the format.
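
To make the two code paths concrete, here is a toy sketch of "extension first, then peek". It is not the Bioperl implementation: guess_format(), the extension table and the content patterns are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical extension table; not Bioperl's real list.
my %format_for_ext = (
    fas   => 'fasta',
    fasta => 'fasta',
    gb    => 'genbank',
    gbk   => 'genbank',
);

sub guess_format {
    my ($filename, $first_line) = @_;

    # 1. Try the file extension.
    my ($ext) = $filename =~ /\.([^.\/]+)\z/;
    if (defined $ext && exists $format_for_ext{lc $ext}) {
        return $format_for_ext{lc $ext};
    }

    # 2. Fall back to peeking at the first line of the content.
    return 'fasta'   if $first_line =~ /^>/;
    return 'genbank' if $first_line =~ /^LOCUS\s/;
    return;    # unknown
}

# One call per branch, mirroring the proposed test files.
print guess_format('extension.fas', '>seq1'), "\n";          # extension hit
print guess_format('no_extension',  '>seq1'), "\n";          # peek, fasta
print guess_format('extension.foo', 'LOCUS   AB000001'), "\n"; # peek, genbank
```

Covering both branches needs at least one file with a recognised extension, one with an unrecognised extension and one with none, which is the point made above.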

-- snip --


From bix at sendu.me.uk  Thu Jul  5 06:04:16 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 05 Jul 2007 11:04:16 +0100
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To: <1183629664.468cc1609891a@webmail.shef.ac.uk>
References: <468B6FBF.1070708@sendu.me.uk>
	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>
	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>
	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>
	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>
	<468CA721.4020804@sheffield.ac.uk> <468CB1FD.7060301@sendu.me.uk>
	<1183629664.468cc1609891a@webmail.shef.ac.uk>
Message-ID: <468CC220.804@sendu.me.uk>

Nathan S. Haigh wrote:
> Quoting Sendu Bala <bix at sendu.me.uk>:
> 
> -- snip --
>> 
>>> I also agree with previous posts about organising and/or having
>>> some naming convention for test data files. I think an approach
>>> whereby data files were organised into directory trees (1 - 3
>>> deep) with names that allude to the type of data in that
>>> subtree/file rather than the tests that use it etc. For example:
>>> 
>>> t/data
>>>     |__ formats
>>>     |           |__ seq
>>>     |           |        |__ legal_fasta
>>>     |           |        |              |__ extension.fas
>>> 
>> [snip]
>> 
>> At that level, files don't need extensions and can have fully 
>> informative names that explain what's interesting or special about
>> them.
>> 
> 
> You may be correct in most cases, however, isn't there a method for
> detecting the file format from the file extension and failing that it
> peeks inside the file? Therefore there should be a file extension for
> each of these to get good code coverage as well as each format not
> having an extension to check that the peek inside the file correctly
> determines the format.

Yes, you're quite correct.


From bix at sendu.me.uk  Thu Jul  5 06:47:12 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 05 Jul 2007 11:47:12 +0100
Subject: [Bioperl-l] Warnings
Message-ID: <468CCC30.90406@sendu.me.uk>

I'm trying to get Test::Warn to work with Bioperl warnings as produced 
by Bio::Root::RootI::warn(). However, afaict the warnings must be 
generated with CORE::warn(), not print STDERR.

Is there any particular reason RootI::warn is done with print and not 
CORE::warn ? Can I change it to a warn?


From bix at sendu.me.uk  Thu Jul  5 09:04:50 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 05 Jul 2007 14:04:50 +0100
Subject: [Bioperl-l] Warnings
In-Reply-To: <200707051458.59921.heikki@sanbi.ac.za>
References: <468CCC30.90406@sendu.me.uk>
	<200707051458.59921.heikki@sanbi.ac.za>
Message-ID: <468CEC72.4090909@sendu.me.uk>

Heikki Lehvaslaiho wrote:
> My guess is that using 'print STDERR' avoids showing sometimes annoying 
>    errordescription  at programname line  NN
> syntax being used.

Afaik,

CORE::warn "anything\n";

never includes the line number: messages with a new line always disable 
that feature. Bio::Root::RootI::warn /always/ puts new lines into the 
message, so they /never/ have the line number.
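
A small core-Perl sketch of both points. It uses a plain __WARN__ handler as the capture mechanism, on the assumption that warning-capturing test modules hook in at the same place; nothing here is taken from Bio::Root::RootI.

```perl
use strict;
use warnings;

my @caught;

# Only messages that go through CORE::warn reach this handler.
local $SIG{__WARN__} = sub { push @caught, $_[0] };

warn "with newline\n";       # passed through unchanged
warn "without newline";      # perl appends " at FILE line N.\n"
print STDERR "printed\n";    # bypasses the handler entirely

print "caught: $_" for @caught;
```

Two messages are caught, not three: the printed one goes straight to STDERR. Of the two that are caught, only the one without a trailing newline has the file and line number appended.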


> On the other hand, the main reason we need to set verbosity to 1 in BioPerl 
> objects is to find where warnings are coming from. Maybe extra text in 
> warnings leads to easier debugging.
> 
> I favour changing it.

So it's my understanding that there will be absolutely no difference in 
behaviour following this change (except that warnings can be caught by 
Test::Warn). I just wanted to confirm my understanding.


From hlapp at gmx.net  Thu Jul  5 09:07:27 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 5 Jul 2007 09:07:27 -0400
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To: <468CA721.4020804@sheffield.ac.uk>
References: <468B6FBF.1070708@sendu.me.uk>	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>
	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>
	<468CA721.4020804@sheffield.ac.uk>
Message-ID: <21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net>


On Jul 5, 2007, at 4:09 AM, Nathan S. Haigh wrote:

> Chris Fields wrote:
>> I think what's partially responsible for slowing down releases is the
>> expectation that each dev release is supposed to have all bugs fixed,
>> work for every OS, etc.  In other words, act like a stable release.
>>

It doesn't. A stable release has a stable API that will be supported  
until the next stable release through point releases.

>> A developer release by nature is living on the edge, so why not have
>> regular dev releases?

There's no problem with regular dev releases, but tests will need to  
pass. There was never a stipulation that all bugs need to have been  
fixed. But all tests need to pass, so in an ideal world (in which  
everything is being tested) all tests passing would imply all (known)  
bugs fixed. Obviously, we don't live in an ideal world ...

If not everything passes then what is the big difference to a code  
snapshot? If using cvs (or svn) is too difficult for most people, we  
can consider creating a mechanism that puts up nightly snapshots for  
download.

> -- snip --
>
> I agree, although would the dev releases still need to pass all the
> tests? I'm thinking of people installing via CPAN.

For example, that's another point.

	-hilmar

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From heikki at sanbi.ac.za  Thu Jul  5 09:12:37 2007
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Thu, 5 Jul 2007 15:12:37 +0200
Subject: [Bioperl-l] Test related Suggestions
In-Reply-To: <468CBC3E.1020408@sendu.me.uk>
References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk>
	<468CBC3E.1020408@sendu.me.uk>
Message-ID: <200707051512.38185.heikki@sanbi.ac.za>


One more suggestion:

It would be extremely useful if we had a standard way of testing that when a 
file is read into a bioperl object and then written out again in the same 
format, the input and output files are identical. If not, the test should 
show where the differences start (showing all the differences would just 
clutter the screen).

This standard method/subroutine should be used to test all sequence and other 
text file IO.

Any takers? 

	-Heikki

On Thursday 05 July 2007 11:39:10 Sendu Bala wrote:
> Sendu Bala wrote:
> > Nathan S. Haigh wrote:
> >> Thinking about this a little more, I think it would be a good idea to
> >> include Test::Exception in t/lib.
> >
> > Agree. I'll see if I can have it auto-loaded by BioperlTest.pm.
>
> I've now done that: BioperlTest loads Test::Exception, from the copy in
> t/lib if necessary.
>
> So, in BioperlTest-using scripts you now have access to the methods
> dies_ok, lives_ok, throws_ok and lives_and.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From heikki at sanbi.ac.za  Thu Jul  5 08:58:59 2007
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Thu, 5 Jul 2007 14:58:59 +0200
Subject: [Bioperl-l] Warnings
In-Reply-To: <468CCC30.90406@sendu.me.uk>
References: <468CCC30.90406@sendu.me.uk>
Message-ID: <200707051458.59921.heikki@sanbi.ac.za>

My guess is that using 'print STDERR' avoids showing sometimes annoying 
   errordescription  at programname line  NN
syntax being used.

On the other hand, the main reason we need to set verbosity to 1 in BioPerl 
objects is to find where warnings are coming from. Maybe extra text in 
warnings leads to easier debugging.

I favour changing it.

	-Heikki


On Thursday 05 July 2007 12:47:12 Sendu Bala wrote:
> I'm trying to get Test::Warn to work with Bioperl warnings as produced
> by Bio::Root::RootI::warn(). However, afaict the warnings must be
> generated with CORE::warn(), not print STDERR.
>
> Is there any particular reason RootI::warn is done with print and not
> CORE::warn ? Can I change it to a warn?


-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From bix at sendu.me.uk  Thu Jul  5 09:44:08 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 05 Jul 2007 14:44:08 +0100
Subject: [Bioperl-l] Test related Suggestions
In-Reply-To: <200707051512.38185.heikki@sanbi.ac.za>
References: <468B6FBF.1070708@sendu.me.uk>
	<468CB1FD.7060301@sendu.me.uk>	<468CBC3E.1020408@sendu.me.uk>
	<200707051512.38185.heikki@sanbi.ac.za>
Message-ID: <468CF5A8.7040402@sendu.me.uk>

Heikki Lehvaslaiho wrote:
> One more suggestion:
> 
> It would be extremely useful if we had a standard way of testing that
> when a file is read into a bioperl object and then written out
> again in the same format, the input and output files are identical.

As Hilmar has pointed out in the past, Bioperl doesn't aim for the files 
to be identical, only for none of the information to be lost and for it 
to be output in the correct format.

So a round-trip test should read in the original, store all the parsed 
data, write it out, then read in the written version and see if the new 
parsed data matches the original.
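
A sketch of that kind of round-trip test, assuming a toy FASTA reader and writer (parse_fasta() and write_fasta() are made up here and are not Bio::SeqIO). The writer deliberately re-wraps the sequence, so the text changes while the parsed data does not.

```perl
use strict;
use warnings;
use Test::More tests => 2;

# Toy FASTA reader: returns [ { id => ..., seq => ... }, ... ].
sub parse_fasta {
    my ($text) = @_;
    my @records;
    for my $line (split /\n/, $text) {
        next unless length $line;
        if ($line =~ /^>(\S+)/) {
            push @records, { id => $1, seq => '' };
        }
        elsif (@records) {
            $line =~ s/\s+//g;
            $records[-1]{seq} .= $line;
        }
    }
    return \@records;
}

# Toy FASTA writer: always wraps the sequence at 60 columns.
sub write_fasta {
    my ($records) = @_;
    my $out = '';
    for my $rec (@$records) {
        $out .= ">$rec->{id}\n";
        (my $seq = $rec->{seq}) =~ s/(.{1,60})/$1\n/g;
        $out .= $seq;
    }
    return $out;
}

# The input wraps seq1 over two lines; the writer will not.
my $original = ">seq1\nACGT\nACGT\n>seq2\nTTTT\n";

my $first   = parse_fasta($original);   # read the original
my $written = write_fasta($first);      # write it out
my $second  = parse_fasta($written);    # read the written version

isnt($written, $original, 'written text differs from the input');
is_deeply($second, $first, 'parsed data survives the round trip');
```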


For simpler or ultra-strict file formats, though...

> If not, the test should show where the differences start (showing
> all the differences would just clutter the screen).
> 
> This standard method/subroutine should be used to test all sequence
> and other text file IO.
> 
> Any takers?

There's already something along these lines in t/SeqIO.t (the section
that uses Algorithm::Diff).

I copied that over from the old testformats.pl script but haven't really
taken the time to see if it's a good way of doing the test.

Is it? Can someone come up with something better? Can someone generalise
it if necessary?

I imagine you could just read the files into arrays and use 
Test::More::is_deeply(). If that would be satisfactory I could easily 
add a little method to BioperlTest that did that.
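
Roughly what such a helper might look like, as a sketch only. The names lines_identical_ok() and first_diff_line() are hypothetical, and the helpers take text rather than file names so the example is self-contained.

```perl
use strict;
use warnings;
use Test::More tests => 2;

# Hypothetical helper: compare two texts line by line.
# On failure, is_deeply reports the first element that differs.
sub lines_identical_ok {
    my ($got_text, $expected_text, $name) = @_;
    my @got      = split /\n/, $got_text;
    my @expected = split /\n/, $expected_text;
    return is_deeply(\@got, \@expected, $name);
}

# Returns the 1-based number of the first differing line, or 0 if none.
sub first_diff_line {
    my ($got_text, $expected_text) = @_;
    my @got      = split /\n/, $got_text;
    my @expected = split /\n/, $expected_text;
    my $max = @got > @expected ? @got : @expected;
    for my $i (0 .. $max - 1) {
        return $i + 1
            if !defined $got[$i]
            || !defined $expected[$i]
            || $got[$i] ne $expected[$i];
    }
    return 0;
}

lines_identical_ok("a\nb\n", "a\nb\n", 'identical texts compare equal');
is(first_diff_line("a\nb\nc\n", "a\nX\nc\n"), 2, 'difference starts at line 2');
```

Because is_deeply stops at the first mismatch, it already avoids cluttering the screen with every difference.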


From n.haigh at sheffield.ac.uk  Thu Jul  5 09:47:24 2007
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Thu, 05 Jul 2007 14:47:24 +0100
Subject: [Bioperl-l] Test related Suggestions
In-Reply-To: <200707051512.38185.heikki@sanbi.ac.za>
References: <468B6FBF.1070708@sendu.me.uk>
	<468CB1FD.7060301@sendu.me.uk>	<468CBC3E.1020408@sendu.me.uk>
	<200707051512.38185.heikki@sanbi.ac.za>
Message-ID: <468CF66C.2070907@sheffield.ac.uk>

Heikki Lehvaslaiho wrote:
> One more suggestion:
> 
> It would be extremely useful if we had a standard way of testing that when a 
> file is read into a bioperl object and then written out again in the same 
> format, the input and output files are identical. If not, the test should 
> show where the differences start (showing all the differences would just 
> clutter the screen).
> 
> This standard method/subroutine should be used to test all sequence and other 
> text file IO.
> 
> Any takers? 
> 
> 	-Heikki
> 

Wouldn't this require info about the formatting of the file to be stored 
in the object as well, such that the same formatting could be used when 
writing the file?

Wouldn't a better approach be to read the contents of file1 into obj1, 
write obj1 to file2 in the same format, and then read file2 into obj2 
and compare obj1 to obj2 to ensure we have all the same data?

Nath


From cjfields at uiuc.edu  Thu Jul  5 09:52:12 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Jul 2007 08:52:12 -0500
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To: <468CA721.4020804@sheffield.ac.uk>
References: <468B6FBF.1070708@sendu.me.uk>	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>
	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>
	<468CA721.4020804@sheffield.ac.uk>
Message-ID: <BECE91CB-980B-4063-8E85-291CC85DCDC1@uiuc.edu>


On Jul 5, 2007, at 3:09 AM, Nathan S. Haigh wrote:

> ...
> I agree, although would the dev releases still need to pass all the  
> tests? I'm thinking of people installing via CPAN.

Remains to be decided.  All current tests (net and non-net) should  
pass.  Any bug fixes should try to have added tests if possible, with  
in-process stuff as TODO's.  Network tests are left up to user  
discretion, so if they fail for any particular reason there is a way  
around them.

> I also agree with what was said in a previous post about bringing  
> back bioperl-run (and some others) back into the same repository as  
> bioperl-core (after a successful move over to svn) and have  
> Build.PL deal with creating the packages etc for CPAN. This would  
> hopefully help keep the run package (and others) up to speed with  
> the core package.

It's up to how we want to have everything split.  I don't think it's  
immediately pressing (there are more important priorities, i.e.  
bugs, svn) but I would say folding everything back into live and  
'splitting' them out using an automated Build process is a viable  
option.

> I also agree with previous posts about organising and/or having  
> some naming convention for test data files. I think an approach  
> whereby data files were organised into directory trees (1 - 3 deep)  
> with names that allude to the type of data in that subtree/file  
> rather than the tests that use it etc. For example:
>
> t/data
>    |__ formats
>    |           |__ seq
>    |           |        |__ legal_fasta
>    |           |        |              |__ extension.fas
>    |           |        |              |__ extension.fasta
>    |           |        |              |__ extension.foo
>    |           |        |              |__ extension.bar
>    |           |        |              |__ no_extension
>    |           |        |              |__ interleaved.fas
>    |           |        |              |__ non_interleaved.fas
>    |           |        |              |__ single_seq.fas
>    |           |        |              |__ multiple_seq.fas
>    |           |        |              |__ desc_line1.fas
>    |           |        |              |__ desc_line2.fas
>    |           |        |
>    |           |        |__ illegal_fasta
>    |           |        |              |__ illegal_chars.fas
>    |           |        |              |__ some_other_illegal_alternative.fas
>    |           |        |
>    |           |        |__ legal_genbank
>    |           |        |              |__ etc etc
>    |           |        |
>    |           |        |__ illegal_genbank
>    |           |                      |__ etc etc
>    |           |
>    |           |__ aln
>    |           |__ blast
>    |           |        |__ legal_blastx
>    |           |        |
>    |           |        |__ legal_blastp
>    |           |        |
>    |           |        |__ legal_tblastx
>    |           |        |
>    |           |        |__ legal_plastpsi
>    |           |        |
>    |           |        |__ legal_wublast
>    |           |__ foo
>    |           |__ bar
>    |           |__ misc
>    |
>    |__ etc
>
> This type of setup, might lend itself to having a test script  
> simply try to parse all the files in a directory to ensure nothing  
> fails (for legal file formats) and fails for illegal formats.  
> Naming of the file paths would help test authors to identify a  
> suitable data file for their own tests before adding their own to  
> the t/data dir. It might also help to identify areas where example  
> test data is currently lacking.

...
This seems like more of a 'guess sequence' and format validation  
issue, something we've talked about before:

http://bugzilla.open-bio.org/show_bug.cgi?id=1508

The way I feel about it is sequence format validation and sequence  
parsing should be separate issues and therefore in separate classes  
(with parsing optionally preceded by validation), but that's  
something for another discussion.

> Thinking about this a little more, I think it would be a good idea  
> to include Test::Exception in t/lib. We should also be testing that  
> warnings and exceptions are generated when expected - e.g. illegal  
> characters in seq files etc etc. Without these sorts of tests we  
> are only getting half the story. This testing might account for a  
> large chunk of the poor test coverage, particularly when it comes  
> to branches in the code.
>
> Anyway, this type of reorganisation couldn't take place until the  
> svn repo is up and working.
>
> I'd appreciate any comments on the above!
> Nath

chris


From n.haigh at sheffield.ac.uk  Thu Jul  5 10:08:29 2007
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Thu, 05 Jul 2007 15:08:29 +0100
Subject: [Bioperl-l] Installing all modules required for testing
In-Reply-To: <468CF5A8.7040402@sendu.me.uk>
References: <468B6FBF.1070708@sendu.me.uk>	<468CB1FD.7060301@sendu.me.uk>	<468CBC3E.1020408@sendu.me.uk>	<200707051512.38185.heikki@sanbi.ac.za>
	<468CF5A8.7040402@sendu.me.uk>
Message-ID: <468CFB5D.6080406@sheffield.ac.uk>

Is there a way to install all the modules that are used in the tests? I 
mean there are cases where tests are skipped and pass if the required 
module for testing is not installed, so a chunk of the tests is missed 
out. It would be desirable to be able to install all these modules in 
order to complete the whole test suite - any ideas if/how this can be 
done?

Cheers
Nath


From bix at sendu.me.uk  Thu Jul  5 10:15:34 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 05 Jul 2007 15:15:34 +0100
Subject: [Bioperl-l] Installing all modules required for testing
In-Reply-To: <468CFB5D.6080406@sheffield.ac.uk>
References: <468B6FBF.1070708@sendu.me.uk>	<468CB1FD.7060301@sendu.me.uk>	<468CBC3E.1020408@sendu.me.uk>	<200707051512.38185.heikki@sanbi.ac.za>
	<468CF5A8.7040402@sendu.me.uk> <468CFB5D.6080406@sheffield.ac.uk>
Message-ID: <468CFD06.3080604@sendu.me.uk>

Nathan S. Haigh wrote:
> Is there a way to install all the modules that are used in the tests? I 
> mean there are cases where tests are skipped and pass if the required 
> module for testing is not installed. Therefore, missing out a chunk of 
> the tests. It would be desirable to be able to install all these modules 
> in order to complete the whole test suite - any ideas if/how this can 
> be done?

Yes, add them as recommended (or perhaps 'build_requires') modules in 
Build.PL, then run Build.PL and install the modules when it asks you.

Everything should be in Build.PL already. If I missed something, please 
add it.
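
For reference, a generic Module::Build sketch of where such modules would be declared. This is not Bioperl's actual Build.PL: the distribution name, version and module lists are placeholders. Plain Module::Build only reports which prerequisites are missing, so the install prompt mentioned above is assumed to come from Bioperl's own build code.

```perl
use strict;
use warnings;
use Module::Build;

my $build = Module::Build->new(
    dist_name    => 'Bio-Example',    # placeholder distribution name
    dist_version => '0.01',           # placeholder version
    license      => 'perl',

    # Needed at run time.
    requires       => { 'perl' => '5.6.1' },

    # Needed to build and test, but not afterwards.
    build_requires => { 'Test::More' => 0 },

    # Optional: tests that depend on these are skipped when absent.
    recommends     => {
        'Algorithm::Diff' => 0,
        'IO::String'      => 0,
    },
);

$build->create_build_script;
```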


From cjfields at uiuc.edu  Thu Jul  5 10:18:08 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Jul 2007 09:18:08 -0500
Subject: [Bioperl-l] Installing all modules required for testing
In-Reply-To: <468CFB5D.6080406@sheffield.ac.uk>
References: <468B6FBF.1070708@sendu.me.uk>	<468CB1FD.7060301@sendu.me.uk>	<468CBC3E.1020408@sendu.me.uk>	<200707051512.38185.heikki@sanbi.ac.za>
	<468CF5A8.7040402@sendu.me.uk> <468CFB5D.6080406@sheffield.ac.uk>
Message-ID: <C3B6AF09-B395-4303-9B50-953C0FAAE8A7@uiuc.edu>


On Jul 5, 2007, at 9:08 AM, Nathan S. Haigh wrote:

> Is there a way to install all the modules that are used in the  
> tests? I
> mean there are cases where tests are skipped and pass if the required
> module for testing is not installed. Therefore, missing out a chunk of
> the tests. It would be desirable to be able to install all these  
> modules
> in order to complete the whole test suite - any ideas if/how this can
> be done?
>
> Cheers
> Nath

That's optionally done upon 'perl Build.PL', correct?  So if you  
choose not to install a particular prereq (i.e. XML::SAX), you  
shouldn't be forced to install it later just for tests.  Or am I  
misunderstanding you?

chris


From cjfields at uiuc.edu  Thu Jul  5 10:18:23 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Jul 2007 09:18:23 -0500
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To: <468CC220.804@sendu.me.uk>
References: <468B6FBF.1070708@sendu.me.uk>
	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>
	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>
	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>
	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>
	<468CA721.4020804@sheffield.ac.uk> <468CB1FD.7060301@sendu.me.uk>
	<1183629664.468cc1609891a@webmail.shef.ac.uk>
	<468CC220.804@sendu.me.uk>
Message-ID: <D71E1DE2-C737-41C1-90FF-573A66925790@uiuc.edu>


On Jul 5, 2007, at 5:04 AM, Sendu Bala wrote:

> Nathan S. Haigh wrote:
>> Quoting Sendu Bala <bix at sendu.me.uk>:
>>> ...<snip snips>
>>> At that level, files don't need extensions and can have fully
>>> informative names that explain what's interesting or special about
>>> them.
>>>
>>
>> You may be correct in most cases, however, isn't there a method for
>> detecting the file format from the file extension and failing that it
>> peeks inside the file? Therefore there should be a file extension for
>> each of these to get good code coverage as well as each format not
>> having an extension to check that the peek inside the file correctly
>> determines the format.
>
> Yes, you're quite correct.

I actually like Sendu's idea more, or the idea of each test suite  
having its own directory.

Tests which need to guess/validate the format are probably best left  
sequestered to a specific suite focused on format guessing/ 
validation, at least in my opinion.

chris


From n.haigh at sheffield.ac.uk  Thu Jul  5 10:22:40 2007
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Thu, 05 Jul 2007 15:22:40 +0100
Subject: [Bioperl-l] Installing all modules required for testing
In-Reply-To: <468CFD06.3080604@sendu.me.uk>
References: <468B6FBF.1070708@sendu.me.uk>	<468CB1FD.7060301@sendu.me.uk>	<468CBC3E.1020408@sendu.me.uk>	<200707051512.38185.heikki@sanbi.ac.za>
	<468CF5A8.7040402@sendu.me.uk> <468CFB5D.6080406@sheffield.ac.uk>
	<468CFD06.3080604@sendu.me.uk>
Message-ID: <468CFEB0.80201@sheffield.ac.uk>

Sendu Bala wrote:
> Nathan S. Haigh wrote:
>> Is there a way to install all the modules that are used in the tests? 
>> I mean there are cases where tests are skipped and pass if the 
>> required module for testing is not installed. Therefore, missing out a 
>> chunk of the tests. It would be desirable to be able to install all 
>> these modules in order to complete the whole test suite - any ideas 
>> if/how this can be done?
> 
> Yes, add them as recommended (or perhaps 'build_requires') modules in 
> Build.PL, then run Build.PL and install the modules when it asks you.
> 
> Everything should be in Build.PL already. If I missed something, please 
> add it.
> 

OK, to clarify using the test file Sendu mentioned in a previous post: 
t/SeqIO.t

This test skips tests if Algorithm::Diff, IO::ScalarArray or IO::String 
are not installed (the first two are not mentioned in Build.PL). 
However, if there are a lot of such skips in the whole test suite then 
there may be few systems with all these modules installed in order to 
conduct a complete test. These are the modules I'm referring to.
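
The pattern in question looks roughly like the sketch below. It is not copied from t/SeqIO.t; the test contents are made up, and only the skip-if-missing structure is the point.

```perl
use strict;
use warnings;
use Test::More tests => 3;

# Core functionality is always tested.
ok(1, 'core test always runs');

SKIP: {
    # Algorithm::Diff is optional: if it cannot be loaded, the two
    # tests below are skipped and still count as passing.
    eval { require Algorithm::Diff; 1 }
        or skip('Algorithm::Diff not installed', 2);

    my @a   = qw(a b c);
    my @b   = qw(a x c);
    my @lcs = Algorithm::Diff::LCS(\@a, \@b);

    is(scalar @lcs, 2,     'longest common subsequence has two elements');
    is("@lcs",      'a c', 'and they are the expected ones');
}
```

On a machine without the optional module the script reports all three tests as ok, two of them skipped, which is why a clean run does not by itself show that everything was exercised.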

Nath


From n.haigh at sheffield.ac.uk  Thu Jul  5 10:30:05 2007
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Thu, 05 Jul 2007 15:30:05 +0100
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To: <D71E1DE2-C737-41C1-90FF-573A66925790@uiuc.edu>
References: <468B6FBF.1070708@sendu.me.uk>
	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>
	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>
	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>
	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>
	<468CA721.4020804@sheffield.ac.uk> <468CB1FD.7060301@sendu.me.uk>
	<1183629664.468cc1609891a@webmail.shef.ac.uk>
	<468CC220.804@sendu.me.uk>
	<D71E1DE2-C737-41C1-90FF-573A66925790@uiuc.edu>
Message-ID: <468D006D.6050806@sheffield.ac.uk>

Chris Fields wrote:
> 
> On Jul 5, 2007, at 5:04 AM, Sendu Bala wrote:
> 
>> Nathan S. Haigh wrote:
>>> Quoting Sendu Bala <bix at sendu.me.uk>:
>>>> ...<snip snips>
>>>> At that level, files don't need extensions and can have fully
>>>> informative names that explain what's interesting or special about
>>>> them.
>>>>
>>>
>>> You may be correct in most cases, however, isn't there a method for
>>> detecting the file format from the file extension and failing that it
>>> peeks inside the file? Therefore there should be a file extension for
>>> each of these to get good code coverage as well as each format not
>>> having an extension to check that the peek inside the file correctly
>>> determines the format.
>>
>> Yes, you're quite correct.
> 
> I actually like Sendu's idea more, or the idea of each test suite having 
> its own directory.
> 
> Tests which need to guess/validate the format are probably best left 
> sequestered to a specific suite focused on format guessing/validation, 
> at least in my opinion.
> 
> chris


How easily would this lend itself to using the same data for multiple 
tests, or is it likely to lead to/exacerbate a culture of adding 
duplicate data files in each "test suite" rather than reusing?

Nath


From cjfields at uiuc.edu  Thu Jul  5 10:33:46 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Jul 2007 09:33:46 -0500
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To: <21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net>
References: <468B6FBF.1070708@sendu.me.uk>	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>
	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>
	<468CA721.4020804@sheffield.ac.uk>
	<21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net>
Message-ID: <B5011E60-D686-4E71-A1AB-46990548ABD1@uiuc.edu>


On Jul 5, 2007, at 8:07 AM, Hilmar Lapp wrote:

> On Jul 5, 2007, at 4:09 AM, Nathan S. Haigh wrote:
>
>> Chris Fields wrote:
>>> I think what's partially responsible for slowing down releases is  
>>> the
>>> expectation that each dev release is supposed to have all bugs  
>>> fixed,
>>> work for every OS, etc.  In other words, act like a stable release.
>
> It doesn't. A stable release has a stable API that will be  
> supported until the next stable release through point releases.

I agree, but I think there is still an expectation that 1.5.2 and  
beyond are more like true 'stable' releases even though we still  
designate them as 'developer.'   We unfortunately reinforce that when  
we tell users they need to update to v. 1.5.2 or bioperl-live to fix  
a particular bug in the 1.4 release.

There's nothing we can do about that now (hindsight is always 20/20,  
and 1.4 is just too old).  We (pumpkin, core devs) can try correcting  
that by ensuring any bug fixes be committed to any new stable branch  
as well as to live, at least until it becomes too problematic to  
maintain that particular stable branch (at which point we would go  
about getting ready for the next 'stable' and repeat the cycle over  
again).

>>> A developer release by nature is living on the edge, so why not have
>>> regular dev releases?
>
> There's no problem with regular dev releases, but tests will need  
> to pass. There was never a stipulation that all bugs need to have  
> been fixed. But all tests need to pass, so in an ideal world (in  
> which everything is being tested) all tests passing would imply all  
> (known) bugs fixed. Obviously, we don't live in an ideal world ...

...particularly when it comes to network-related tests and remote  
server problems (but those are by default not run, so there is a way  
around test fails there).  I agree here as well (all tests must  
pass).  As for the bug fixes, we can just stipulate which ones were  
fixed with the release (in a RELEASE_NOTES or similar), and maybe  
have TODO's in the test suite designating they are being worked on.

Basically, at regular intervals, maybe with a few weeks of lead time,  
the pumpkin would announce an impending dev. release.  Go through  
rounds of tests, bug fixes, etc.  When all tests pass post it on CPAN  
as a dev. release.  If we have a stable release branch with relevant  
bug fixes we can post that as well, again to the point where it  
becomes too problematic.

Would we just take a snapshot of MAIN and any relevant stable branch  
at that particular point for the CPAN release, just increasing the  
version number (1.x.y)?  Would it make sense to have a 1.x.y branch  
for each release (I don't think so, but maybe others disagree)?

> If not everything passes then what is the big difference to a code  
> snapshot? If using cvs (or svn) is too difficult for most people,  
> we can consider creating a mechanism that puts up nightly snapshots  
> for download.

If we feel a nightly snapshot is warranted we could do that though.   
I personally don't think there is a need, particularly since we have  
several means to obtain the latest code at any point in time  
(including the browsable CVS 'Download tarball').  We could state the  
next dev/stable CPAN release (pending on date dd/mm/yy) will have the  
bug fix, and if they want it immediately then pick it up from CVS.

>> -- snip --
>>
>> I agree, although would the dev releases still need to pass all the
>> tests? I'm thinking of people installing via CPAN.
>
> For example, that's another point.
>
>  	-hilmar

Yes, I agree.

As an aside, I don't think dev. releases pop up when you run a simple  
'install Foo::Bar' from the CPAN shell but I'm not sure; Sendu may  
know the answer to that.

chris 


From cjfields at uiuc.edu  Thu Jul  5 10:34:22 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Jul 2007 09:34:22 -0500
Subject: [Bioperl-l] Test related Suggestions
In-Reply-To: <200707051512.38185.heikki@sanbi.ac.za>
References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk>
	<468CBC3E.1020408@sendu.me.uk>
	<200707051512.38185.heikki@sanbi.ac.za>
Message-ID: <2C7BA3C8-0693-458F-A49D-879ABAB8012E@uiuc.edu>


On Jul 5, 2007, at 8:12 AM, Heikki Lehvaslaiho wrote:

>
> One more suggestion:
>
> It would be extremely useful if we had a standard way of testing  
> that when a
> file is read into a bioperl object and then written out again into  
> the same
> format, the input and output files are identical. If not, the test  
> should
> show where the differences start (showing all the differences  
> would just
> clutter the screen).
>
> This standard method/subroutine should be used to test all sequence  
> and other
> text file IO.
>
> Any takers?
>
> 	-Heikki
...

I agree.  There are some 'round-trip' tests with genbank.t or SeqIO.t  
that do some checking, I think, but something like this would be of  
use.  However, what if the test file is old (as many in t/data are)  
and the format has changed?  GenBank and EMBL, for instance, have  
gone through several changes to format.
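
A minimal sketch of the kind of helper Heikki describes, reporting only where the first difference starts rather than every difference. It is written in Python purely as a language-neutral illustration; it is not BioPerl code, and the names first_difference and round_trip_report are hypothetical.

```python
from itertools import zip_longest


def first_difference(expected_lines, actual_lines):
    """Return (line_number, expected, actual) for the first mismatch, or None.

    A missing line on either side is reported as None.
    """
    pairs = zip_longest(expected_lines, actual_lines)
    for number, (expected, actual) in enumerate(pairs, start=1):
        if expected != actual:
            return number, expected, actual
    return None


def round_trip_report(original_text, rewritten_text):
    """Summarise a read-then-write round trip in one short message."""
    diff = first_difference(original_text.splitlines(),
                            rewritten_text.splitlines())
    if diff is None:
        return "ok: output identical to input"
    number, expected, actual = diff
    return (f"not ok: first difference at line {number}\n"
            f"  input : {expected!r}\n"
            f"  output: {actual!r}")
```

A test would read a file, write it back out in the same format, and pass both texts to round_trip_report. An old test file whose format has since changed would fail here by design, which is the concern raised above.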

chris


From n.haigh at sheffield.ac.uk  Thu Jul  5 10:43:51 2007
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Thu, 05 Jul 2007 15:43:51 +0100
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To: <B5011E60-D686-4E71-A1AB-46990548ABD1@uiuc.edu>
References: <468B6FBF.1070708@sendu.me.uk>	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>
	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>
	<468CA721.4020804@sheffield.ac.uk>
	<21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net>
	<B5011E60-D686-4E71-A1AB-46990548ABD1@uiuc.edu>
Message-ID: <468D03A7.3090408@sheffield.ac.uk>

Chris Fields wrote:
-- snip --

>>>
>>> I agree, although would the dev releases still need to pass all the
>>> tests? I'm thinking of people installing via CPAN.
>>
>> For example, that's another point.
>>
>>      -hilmar
> 
> Yes, I agree.
> 
> As an aside, I don't think dev. releases pop up when you run a simple 
> 'install Foo::Bar' from the CPAN shell but I'm not sure; Sendu may know 
> the answer to that.
> 
> chris


That's right, it'll only install the non-developer releases (1.4 
currently). If you want to install the developer release from CPAN you 
need to know the path to the archive and then do:

cpan> install S/SE/SENDU/bioperl-1.5.2_102.tar.gz

as detailed on the wiki:
http://www.bioperl.org/wiki/Release_1.5.2
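
The underscore in a version such as 1.5.2_102 is the CPAN convention for marking a developer release, which is why a plain 'install' passes over it. A small sketch of that check, in Python only for illustration (is_developer_release is a hypothetical name, and the regular expression assumes a name-version.tar.gz archive layout):

```python
import re


def is_developer_release(archive_name):
    """True if the version part of a CPAN archive name contains an underscore."""
    match = re.search(r"-v?(\d[0-9._]*)\.tar\.gz$", archive_name)
    if match is None:
        raise ValueError(f"cannot find a version in {archive_name!r}")
    return "_" in match.group(1)
```

With this, the archive path given above counts as a developer release, while an archive whose version has no underscore does not.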

Nath


From cjfields at uiuc.edu  Thu Jul  5 10:49:33 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Jul 2007 09:49:33 -0500
Subject: [Bioperl-l] Installing all modules required for testing
In-Reply-To: <468CFEB0.80201@sheffield.ac.uk>
References: <468B6FBF.1070708@sendu.me.uk>	<468CB1FD.7060301@sendu.me.uk>	<468CBC3E.1020408@sendu.me.uk>	<200707051512.38185.heikki@sanbi.ac.za>
	<468CF5A8.7040402@sendu.me.uk> <468CFB5D.6080406@sheffield.ac.uk>
	<468CFD06.3080604@sendu.me.uk> <468CFEB0.80201@sheffield.ac.uk>
Message-ID: <16397C9B-07DA-416E-A2AF-5FA403BA0388@uiuc.edu>


On Jul 5, 2007, at 9:22 AM, Nathan S. Haigh wrote:

> Sendu Bala wrote:
>> ...
>> Yes, add them as recommended (or perhaps 'build_requires') modules in
>> Build.PL, then run Build.PL and install the modules when it asks you.
>>
>> Everything should be in Build.PL already. If I missed something,  
>> please
>> add it.
>>
>
> OK, to clarify using the test file Sendu mentioned in a previous post:
> t/SeqIO.t
>
> This test skips tests if Algorithm::Diff, IO::ScalarArray or  
> IO::String
> are not installed (the first two are not mentioned in Build.PL).
> However, if there are a lot of such skips in the whole test suite then
> there may be few systems with all these modules installed in order to
> conduct a complete test. These are the modules I'm referring to.
>
> Nath

If they are only necessary for tests, work for all OSs, and are pure  
Perl they should be added to t/lib, like Test::More and the rest.  If  
they only work for some OSs they could be added to t/lib and skip  
based on OS, but they still must be pure Perl.  I would avoid  
anything that requires any compiling for XS or Inline altogether (I  
don't want to go down the nightmare road of OS-dependent compiler  
issues for a few tests).

Finally, if they are needed for core modules (not just tests) then  
they should be added to the core prereqs in Build.

chris


From cjfields at uiuc.edu  Thu Jul  5 10:52:58 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Jul 2007 09:52:58 -0500
Subject: [Bioperl-l] Warnings
In-Reply-To: <468CEC72.4090909@sendu.me.uk>
References: <468CCC30.90406@sendu.me.uk>
	<200707051458.59921.heikki@sanbi.ac.za>
	<468CEC72.4090909@sendu.me.uk>
Message-ID: <2E0C7F35-9AA1-479A-A430-7D4037D98A3A@uiuc.edu>


On Jul 5, 2007, at 8:04 AM, Sendu Bala wrote:

> ...
>
> So it's my understanding there will be absolutely no difference in
> behaviour following this change (except that warnings can be caught by
> Test::Warn). I just wanted to confirm my understanding.

You can always just try it out and run tests.  Might be interesting  
to see if anything breaks.

chris


From N.Haigh at sheffield.ac.uk  Thu Jul  5 10:58:30 2007
From: N.Haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Thu,  5 Jul 2007 15:58:30 +0100
Subject: [Bioperl-l] Test related Suggestions
In-Reply-To: <2C7BA3C8-0693-458F-A49D-879ABAB8012E@uiuc.edu>
References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk>
	<468CBC3E.1020408@sendu.me.uk>
	<200707051512.38185.heikki@sanbi.ac.za>
	<2C7BA3C8-0693-458F-A49D-879ABAB8012E@uiuc.edu>
Message-ID: <1183647510.468d07168963c@webmail.shef.ac.uk>

Quoting Chris Fields <cjfields at uiuc.edu>:

> 
> On Jul 5, 2007, at 8:12 AM, Heikki Lehvaslaiho wrote:
> 
> >
> > One more suggestion:
> >
> > It would be extremely useful if we had a standard way of testing  
> > that when a
> > file is read into a bioperl object and then written out again into  
> > the same
> > format, the input and output files are identical. If not, the test  
> > should
> > show where the differences start (showing all the differences  
> > would just
> > clutter the screen).
> >
> > This standard method/subroutine should be used to test all sequence  
> > and other
> > text file IO.
> >
> > Any takers?
> >
> > 	-Heikki
> ...
> 
> I agree.  There are some 'round-trip' tests with genbank.t or SeqIO.t  
> that do some checking, I think, but something like this would be of  
> use.  However, what if the test file is old (as many in t/data are)  
> and the format has changed?  GenBank and EMBL, for instance, have  
> gone through several changes to format.
> 
> chris
> 
> 

Is there any way to tell variants apart other than just by layout, e.g. a version number or the like?

Nath


From N.Haigh at sheffield.ac.uk  Thu Jul  5 11:04:30 2007
From: N.Haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Thu,  5 Jul 2007 16:04:30 +0100
Subject: [Bioperl-l] Installing all modules required for testing
In-Reply-To: <16397C9B-07DA-416E-A2AF-5FA403BA0388@uiuc.edu>
References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk>
	<468CBC3E.1020408@sendu.me.uk>
	<200707051512.38185.heikki@sanbi.ac.za>
	<468CF5A8.7040402@sendu.me.uk> <468CFB5D.6080406@sheffield.ac.uk>
	<468CFD06.3080604@sendu.me.uk> <468CFEB0.80201@sheffield.ac.uk>
	<16397C9B-07DA-416E-A2AF-5FA403BA0388@uiuc.edu>
Message-ID: <1183647870.468d087ed4c80@webmail.shef.ac.uk>

Quoting Chris Fields <cjfields at uiuc.edu>:

> 
> On Jul 5, 2007, at 9:22 AM, Nathan S. Haigh wrote:
> 
> > Sendu Bala wrote:
> >> ...
> >> Yes, add them as recommended (or perhaps 'build_requires') modules in
> >> Build.PL, then run Build.PL and install the modules when it asks you.
> >>
> >> Everything should be in Build.PL already. If I missed something,  
> >> please
> >> add it.
> >>
> >
> > OK, to clarify using the test file Sendu mentioned in a previous post:
> > t/SeqIO.t
> >
> > This test skips tests if Algorithm::Diff, IO::ScalarArray or  
> > IO::String
> > are not installed (the first two are not mentioned in Build.PL).
> > However, if there are a lot of such skips in the whole test suite then
> > there may be few systems with all these modules installed in order to
> > conduct a complete test. These are the modules I'm referring to.
> >
> > Nath
> 
> If they are only necessary for tests, work for all OSs, and are pure  
> Perl they should be added to t/lib, like Test::More and the rest.  If  
> they only work for some OSs they could be added to t/lib and skip  
> based on OS, but they still must be pure Perl.  I would avoid  
> anything that requires any compiling for XS or Inline altogether (I  
> don't want to go down the nightmare road of OS-dependent compiler  
> issues for a few tests).

If this is the case, there surely is no need to skip the tests if they should be provided in the t/lib dir. Am I missing something!?

> 
> Finally, if they are needed for core modules (not just tests) then  
> they should be added to the core prereqs in Build.
> 
> chris
> 


From bix at sendu.me.uk  Thu Jul  5 11:13:35 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 05 Jul 2007 16:13:35 +0100
Subject: [Bioperl-l] Installing all modules required for testing
In-Reply-To: <1183647870.468d087ed4c80@webmail.shef.ac.uk>
References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk>
	<468CBC3E.1020408@sendu.me.uk>
	<200707051512.38185.heikki@sanbi.ac.za>
	<468CF5A8.7040402@sendu.me.uk> <468CFB5D.6080406@sheffield.ac.uk>
	<468CFD06.3080604@sendu.me.uk> <468CFEB0.80201@sheffield.ac.uk>
	<16397C9B-07DA-416E-A2AF-5FA403BA0388@uiuc.edu>
	<1183647870.468d087ed4c80@webmail.shef.ac.uk>
Message-ID: <468D0A9F.4010709@sendu.me.uk>

Nathan S. Haigh wrote:
> Quoting Chris Fields <cjfields at uiuc.edu>:
>>> OK, to clarify using the test file Sendu mentioned in a previous
>>> post: t/SeqIO.t
>>> 
>>> This test skips tests if Algorithm::Diff, IO::ScalarArray or 
>>> IO::String are not installed
>> 
>> If they are only necessary for tests, work for all OSs, and are
>> pure Perl they should be added to t/lib, like Test::More and the
>> rest.  If they only work for some OSs they could be added to t/lib
>> and skip based on OS, but they still must be pure Perl.  I would
>> avoid anything that requires any compiling for XS or Inline
>> altogether (I don't want to go down the nightmare road of
>> OS-dependent compiler issues for a few tests).
> 
> If this is the case, there surely is no need to skip the tests if
> they should be provided in the t/lib dir. Am I missing something!?

That skip in SeqIO.t is new and I simply didn't think of them as 
important enough to make anyone install them or include them in t/lib.

I'd go ahead and add those modules, but like I say, it may make more 
sense just to use is_deeply(), removing the dependency on 
Algorithm::Diff and IO::ScalarArray completely.


From cjfields at uiuc.edu  Thu Jul  5 11:35:41 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Jul 2007 10:35:41 -0500
Subject: [Bioperl-l] Installing all modules required for testing
In-Reply-To: <1183647870.468d087ed4c80@webmail.shef.ac.uk>
References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk>
	<468CBC3E.1020408@sendu.me.uk>
	<200707051512.38185.heikki@sanbi.ac.za>
	<468CF5A8.7040402@sendu.me.uk> <468CFB5D.6080406@sheffield.ac.uk>
	<468CFD06.3080604@sendu.me.uk> <468CFEB0.80201@sheffield.ac.uk>
	<16397C9B-07DA-416E-A2AF-5FA403BA0388@uiuc.edu>
	<1183647870.468d087ed4c80@webmail.shef.ac.uk>
Message-ID: <F97172F8-F59A-4CCD-9BBD-B763675EB92F@uiuc.edu>


On Jul 5, 2007, at 10:04 AM, Nathan S. Haigh wrote:

> ...
>> If they are only necessary for tests, work for all OSs, and are pure
>> Perl they should be added to t/lib, like Test::More and the rest.  If
>> they only work for some OSs they could be added to t/lib and skip
>> based on OS, but they still must be pure Perl.  I would avoid
>> anything that requires any compiling for XS or Inline altogether (I
>> don't want to go down the nightmare road of OS-dependent compiler
>> issues for a few tests).
>
> If this is the case, there surely is no need to skip the tests if  
> they should be provided in the t/lib dir. Am I missing something!?

No, you are correct, but these are currently not in t/lib (unless  
someone snuck them in....)

Of the modules you listed above, only one (IO::String) is required by  
the core modules.  The others are not.  Users shouldn't be forced to  
install Algorithm::Diff or IO::ScalarArray just to run tests, so  
anything not required should go into t/lib if at all possible.

If there are any reasons (OS issues, list of prereqs) which preclude  
adding these to t/lib we need to ask ourselves (1) why we are using  
that module in the first place?  And, if there is a good reason, (2)  
can we skip them if they aren't present?  Both of those options are  
already available.
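
A rough sketch of the second option (skip when an optional, test-only module is absent), written in Python as an analogue rather than as BioPerl test code. The module name hypothetical_diff_helper and the helper names are made up for illustration.

```python
import importlib.util
import unittest


def module_available(name):
    """True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(name) is not None


class OptionalModuleTests(unittest.TestCase):
    def test_core_behaviour(self):
        # Always runs: relies only on what the core already requires.
        self.assertEqual("ACGT".lower(), "acgt")

    @unittest.skipUnless(module_available("hypothetical_diff_helper"),
                         "hypothetical_diff_helper not installed")
    def test_needs_optional_module(self):
        # Only reached when the optional module is present.
        import hypothetical_diff_helper
        self.assertTrue(hypothetical_diff_helper)


def run_suite():
    """Run both tests and return the unittest result object."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(OptionalModuleTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

The skipped test is reported as skipped rather than failed, so the suite still passes on a machine without the optional module. Bundling the module alongside the tests removes the need for the skip altogether.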

chris


From cjfields at uiuc.edu  Thu Jul  5 11:50:55 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Jul 2007 10:50:55 -0500
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To: <468D006D.6050806@sheffield.ac.uk>
References: <468B6FBF.1070708@sendu.me.uk>
	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>
	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>
	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>
	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>
	<468CA721.4020804@sheffield.ac.uk> <468CB1FD.7060301@sendu.me.uk>
	<1183629664.468cc1609891a@webmail.shef.ac.uk>
	<468CC220.804@sendu.me.uk>
	<D71E1DE2-C737-41C1-90FF-573A66925790@uiuc.edu>
	<468D006D.6050806@sheffield.ac.uk>
Message-ID: <404EEDE8-53AC-411E-B4F0-CF4B4AABE9E0@uiuc.edu>


On Jul 5, 2007, at 9:30 AM, Nathan S. Haigh wrote:

> ...
>> I actually like Sendu's idea more, or the idea of each test suite  
>> having its own directory.
>> Tests which need to guess/validate the format are probably best  
>> left sequestered to a specific suite focused on format guessing/ 
>> validation, at least in my opinion.
>> chris
>
>
> How easily would this lend itself to using the same data for  
> multiple tests, or is it likely to lead to/exacerbate a culture of  
> adding duplicate data files in each "test suite" rather than reusing?
>
> Nath

If there is a group of test data used for more than one test suite we  
can group those together into a common use folder, or we can go by  
format.  I'm pretty open to anything, really, as long as it is more  
organized.

My point is really concerned more with validation/guessing.  I think  
we should limit those tests to their respective specific test suites,  
or even to sections within a particular test suite (for instance,  
genbank.t), but not to force sequence guessing or validation in other  
cases.  To me validation, guessing, and parsing are three distinct  
issues (much like XML parsers handle things), so they require three  
distinct tests.

As for true sequence validation, there is no official format  
validation scheme yet in BioPerl.  It's sort of unofficially  
integrated into the sequence parsers themselves (something which I  
find to be problematic for several reasons too long to outline here).

chris


From cjfields at uiuc.edu  Thu Jul  5 11:54:42 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Jul 2007 10:54:42 -0500
Subject: [Bioperl-l] Test related Suggestions
In-Reply-To: <1183647510.468d07168963c@webmail.shef.ac.uk>
References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk>
	<468CBC3E.1020408@sendu.me.uk>
	<200707051512.38185.heikki@sanbi.ac.za>
	<2C7BA3C8-0693-458F-A49D-879ABAB8012E@uiuc.edu>
	<1183647510.468d07168963c@webmail.shef.ac.uk>
Message-ID: <48474A2C-2A58-4D51-8E7F-7CE083948D0F@uiuc.edu>


On Jul 5, 2007, at 9:58 AM, Nathan S. Haigh wrote:

> Quoting Chris Fields <cjfields at uiuc.edu>:
>
>>
>> On Jul 5, 2007, at 8:12 AM, Heikki Lehvaslaiho wrote:
>>
>>>
>>> One more suggestion:
>>>
>>> It would be extremely useful if we had a standard way of testing
>>> that when a
>>> file is read into a bioperl object and then written out again into
>>> the same
>>> format, the input and output files are identical. If not, the test
>>> should
>>> show where the differences start (showing all the differences
>>> would just
>>> clutter the screen).
>>>
>>> This standard method/subroutine should be used to test all sequence
>>> and other
>>> text file IO.
>>>
>>> Any takers?
>>>
>>> 	-Heikki
>> ...
>>
>> I agree.  There are some 'round-trip' tests with genbank.t or SeqIO.t
>> that do some checking, I think, but something like this would be of
>> use.  However, what if the test file is old (as many in t/data are)
>> and the format has changed?  GenBank and EMBL, for instance, have
>> gone through several changes to format.
>>
>> chris
>>
>>
>
> Is there any way to tell variants apart other than just by  
> layout, e.g. a version number or the like?
>
> Nath

I don't think so; this veers back into the whole validation issue  
(i.e. does the record fit certain specifications).  There are  
examples of seq records from different sources which bioperl is  
expected to parse, for example Ensembl GenBank records.  Some of  
those have feature tags or annotation fields which may not appear in  
output when using write_seq().

I don't think it's as important to replicate the output data exactly  
like the input as much as it's important to have the data represented  
in a Bio::Seq object (or any other Bio* instance) in a consistent  
manner and have the ability to incorporate new fields (such as the  
recent addition of genome projects) transparently.  The latter is  
hard to do with the current genbank parser (you have to specifically  
code for it), but it is a bit easier to do with the driver-handler  
model I'm working on.

chris


From bix at sendu.me.uk  Thu Jul  5 11:56:29 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 05 Jul 2007 16:56:29 +0100
Subject: [Bioperl-l] Test related Suggestions
In-Reply-To: <468CBC3E.1020408@sendu.me.uk>
References: <468B6FBF.1070708@sendu.me.uk>	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>	<468CA721.4020804@sheffield.ac.uk>	<468CB1FD.7060301@sendu.me.uk>
	<468CBC3E.1020408@sendu.me.uk>
Message-ID: <468D14AD.8050007@sendu.me.uk>

Sendu Bala wrote:
> Sendu Bala wrote:
>> Nathan S. Haigh wrote:
>>> Thinking about this a little more, I think it would be a good idea to 
>>> include Test::Exception in t/lib.
>> Agree. I'll see if I can have it auto-loaded by BioperlTest.pm.
> 
> I've now done that: BioperlTest loads Test::Exception, from the copy in 
> t/lib if necessary.
> 
> So, in BioperlTest-using scripts you now have access to the methods 
> dies_ok, lives_ok, throws_ok and lives_and.

And I've also now added in support for Test::Warn, giving you 
warning_is, warnings_are, warning_like and warnings_like.

I've updated the HOWTO as well:
http://www.bioperl.org/wiki/HOWTO:Writing_BioPerl_Tests

You can see these things in action in t/seq_quality.t


From bix at sendu.me.uk  Thu Jul  5 11:57:23 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 05 Jul 2007 16:57:23 +0100
Subject: [Bioperl-l] Warnings
In-Reply-To: <2E0C7F35-9AA1-479A-A430-7D4037D98A3A@uiuc.edu>
References: <468CCC30.90406@sendu.me.uk>
	<200707051458.59921.heikki@sanbi.ac.za>
	<468CEC72.4090909@sendu.me.uk>
	<2E0C7F35-9AA1-479A-A430-7D4037D98A3A@uiuc.edu>
Message-ID: <468D14E3.6030104@sendu.me.uk>

Chris Fields wrote:
> 
> On Jul 5, 2007, at 8:04 AM, Sendu Bala wrote:
> 
>> ...
>>
>> So it's my understanding there will be absolutely no difference in
>> behaviour following this change (except that warnings can be caught by
>> Test::Warn). I just wanted to confirm my understanding.
> 
> You can always just try it out and run tests.  Might be interesting to 
> see if anything breaks.

I've made the change. Everything seems ok as far as I can tell.


From dmessina at wustl.edu  Thu Jul  5 12:02:26 2007
From: dmessina at wustl.edu (David Messina)
Date: Thu, 5 Jul 2007 11:02:26 -0500
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To: <B5011E60-D686-4E71-A1AB-46990548ABD1@uiuc.edu>
References: <468B6FBF.1070708@sendu.me.uk>	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>
	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>
	<468CA721.4020804@sheffield.ac.uk>
	<21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net>
	<B5011E60-D686-4E71-A1AB-46990548ABD1@uiuc.edu>
Message-ID: <FBB24EB4-7728-476E-98EA-77A53C870A89@wustl.edu>


On Jul 5, 2007, at 9:33 AM, Chris Fields wrote:
> I agree, but I think there is still an expectation that 1.5.2 and
> beyond are more like true 'stable' releases even though we still
> designate them as 'developer.'   We unfortunately reinforce that when
> we tell users they need to update to v. 1.5.2 or bioperl-live to fix
> a particular bug in the 1.4 release.

I know this has been discussed before, but while we're talking about  
future release plans, it might be worth revisiting the BioPerl policy  
of designating only even-numbered releases as 'stable'. It's taking  
so long to get from 1.4 to 1.6. While the principle of keeping a  
stable API between 'stable' releases is valid in the ideal case, I  
think that continuing to label 1.5.2 (or whatever the latest 'dev'  
release is) as a developer release (which implies potentially  
unstable or bleeding-edge code) is highly misleading since we would  
never ever tell anyone to get 1.4 instead.
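
For reference, the even/odd policy under discussion comes down to a parity check on the second version component. A tiny sketch in Python, for illustration only (is_stable_series is a hypothetical name):

```python
def is_stable_series(version):
    """True if the second component of a dotted version is even (1.4, 1.6)."""
    minor = int(version.split(".")[1])
    return minor % 2 == 0
```

Under this rule 1.4 and 1.6 are 'stable' while every 1.5.x release is 'developer', however widely it is recommended in practice.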

Alternatively, if we adopt a more aggressive release schedule as  
Chris proposed a couple days ago, then perhaps we could agree to push  
out an even-numbered release once a year or so, so that there is a  
'stable' release we could recommend.


> If we feel a nightly snapshot is warranted we could do that though.
> I personally don't think there is a need, particularly since we have
> several means to obtain the latest code at any point in time
> (including the browsable CVS 'Download tarball').  We could state the
> next dev/stable CPAN release (pending on date dd/mm/yy) will have the
> bug fix, and if they want it immediately then pick it up from CVS.

To make it easier for people to obtain the latest tarball, we could  
put the 'download tarball' link directly on the 'Getting_BioPerl'  
wiki page instead of only a link to the viewcvs interface. That way  
they wouldn't have to navigate the source tree to figure out which  
tarball they want (which is almost always going to be the  
bioperl-live tarball).

I think the actual URL underlying the 'Download tarball' link on  
viewcvs is stable:

	http://code.open-bio.org/cgi/viewcvs.cgi/bioperl-live/bioperl-live.tar.gz?tarball=1


Dave


From cjfields at uiuc.edu  Thu Jul  5 12:13:30 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Jul 2007 11:13:30 -0500
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To: <FBB24EB4-7728-476E-98EA-77A53C870A89@wustl.edu>
References: <468B6FBF.1070708@sendu.me.uk>	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>
	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>
	<468CA721.4020804@sheffield.ac.uk>
	<21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net>
	<B5011E60-D686-4E71-A1AB-46990548ABD1@uiuc.edu>
	<FBB24EB4-7728-476E-98EA-77A53C870A89@wustl.edu>
Message-ID: <BF212044-F565-434B-882F-507974566B66@uiuc.edu>


On Jul 5, 2007, at 11:02 AM, David Messina wrote:

> ...
> I know this has been discussed before, but while we're talking  
> about future release plans, it might be worth revisiting the  
> BioPerl policy of designating only even-numbered releases as  
> 'stable'. It's taking so long to get from 1.4 to 1.6. While the  
> principle of keeping a stable API between 'stable' releases is  
> valid in the ideal case, I think that continuing to label 1.5.2 (or  
> whatever the latest 'dev' release is) as a developer release (which  
> implies potentially unstable or bleeding-edge code) is highly  
> misleading since we would never ever tell anyone to get 1.4 instead.
>
> Alternatively, if we adopt a more aggressive release schedule as  
> Chris proposed a couple days ago, then perhaps we could agree to  
> push out an even-numbered release once a year or so, so that there  
> is a 'stable' release we could recommend.

I think the idea of 'stable' is best summarized back in Hilmar's post  
(i.e. we support a particular API for that release).  The 1.5  
releases I believe break some aspects of 1.4 API (some of the Feature/ 
Annotation stuff introduced before the official 1.5 release).  We  
still need to address some of those issues before a 1.6 which seems  
to be the only real stumbling block, but they are unfortunately not  
well-documented and are somewhat interwoven with GMOD code.

> ...
> To make it easier for people to obtain the latest tarball, we could  
> put the 'download tarball' link directly on the 'Getting_BioPerl'  
> wiki page instead of only a link to the viewcvs interface. That way  
> they wouldn't have to navigate the source tree to figure out which  
> tarball they want (which is almost always going to be the  
> bioperl-live tarball).
>
> I think the actual URL underlying the 'Download tarball' link on  
> viewcvs is stable:
>
> 	http://code.open-bio.org/cgi/viewcvs.cgi/bioperl-live/bioperl-live.tar.gz?tarball=1
>
> Dave

Sounds reasonable enough.  Do you want to do the honors?

chris


From dmessina at wustl.edu  Thu Jul  5 12:44:28 2007
From: dmessina at wustl.edu (David Messina)
Date: Thu, 5 Jul 2007 11:44:28 -0500
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To: <BF212044-F565-434B-882F-507974566B66@uiuc.edu>
References: <468B6FBF.1070708@sendu.me.uk>	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>
	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>
	<468CA721.4020804@sheffield.ac.uk>
	<21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net>
	<B5011E60-D686-4E71-A1AB-46990548ABD1@uiuc.edu>
	<FBB24EB4-7728-476E-98EA-77A53C870A89@wustl.edu>
	<BF212044-F565-434B-882F-507974566B66@uiuc.edu>
Message-ID: <8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu>


> [Chris]
> The 1.5 releases I believe break some aspects of 1.4 API

Yes, this is true.

I question, though, whether it's relevant given that virtually no one  
uses 1.4 anymore. In any case, I would venture that the number of  
people who would be bitten by the 1.4->1.5 API change is much smaller  
than the number of people who download 1.4 and then ask us why it  
doesn't work.

I think that, rather than continuing to call 1.5.x the developer  
release in order to adhere to the API guarantee, it would be much  
clearer to users if we state clearly that everyone should download  
1.5.x, and that if you're upgrading from 1.4 to 1.5 there are API  
changes.


>> [me]
>> we could put the 'download tarball' link directly on the  
>> 'Getting_BioPerl' wiki page
>
> [Chris]
> Sounds reasonable enough.  Do you want to do the honors?

Done.


Dave


From cjfields at uiuc.edu  Thu Jul  5 12:57:28 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Jul 2007 11:57:28 -0500
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To: <8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu>
References: <468B6FBF.1070708@sendu.me.uk>	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>
	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>
	<468CA721.4020804@sheffield.ac.uk>
	<21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net>
	<B5011E60-D686-4E71-A1AB-46990548ABD1@uiuc.edu>
	<FBB24EB4-7728-476E-98EA-77A53C870A89@wustl.edu>
	<BF212044-F565-434B-882F-507974566B66@uiuc.edu>
	<8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu>
Message-ID: <AE1AF127-4D2A-458C-8EF0-F3D8D3B239E5@uiuc.edu>

On Jul 5, 2007, at 11:44 AM, David Messina wrote:

>
>> [Chris]
>> The 1.5 releases I believe break some aspects of 1.4 API
>
> Yes, this is true.
>
> I question, though, whether it's relevant given that virtually no  
> one uses 1.4 anymore. In any case, I would venture that the number  
> of people who would be bitten by the 1.4->1.5 API change is much  
> smaller than the number of people who download 1.4 and then ask us  
> why it doesn't work.
>
> I think that, rather than continuing to call 1.5.x the developer  
> release in order to adhere to the API guarantee, it would be much  
> clearer to users if we state clearly that everyone should download  
> 1.5.x, and that if you're upgrading from 1.4 to 1.5 there are API  
> changes.

You'd be surprised how many are still using bioperl 1.2.3 (Ensembl)  
and 1.4 (any admin too scared to go with a 'dev' release).  The real  
answer is to get out a stable 1.6 ASAP.  The problem we currently  
have is (horrible Texas pun) 'too many pokers in the fire.'  We have  
svn migration, major changes in the test suite, talk about splitting  
bioperl, a lot of bugs to sort through, new code to add or work on,  
etc.  Not to mention our $jobs!

I think we should just bite the bullet and proceed with pulling out  
the controversial operator overloading in Bio::Annotation*, deprecate  
the tag methods in AnnotatableI, and go about fixing everything up.   
If that occurs (which seems to be the major impediment) and we get  
GMOD/GBrowse playing well with BioPerl then we can aim for a new  
stable release, and then institute a regular release cycle.

chris


From bpederse at gmail.com  Thu Jul  5 13:58:24 2007
From: bpederse at gmail.com (Brent Pedersen)
Date: Thu, 5 Jul 2007 10:58:24 -0700
Subject: [Bioperl-l] slippy map for genomic features.
Message-ID: <e183a99d0707051058y4d5a3f35v8a0aff35dc90c394@mail.gmail.com>

hi,
here's a side project i've been tinkering on in googlecode svn that
may be useful to some.
http://code.google.com/p/genome-browser/
it's a simple hack on top of OpenLayers (openlayers.org) to provide a
javascript slippy map interface and API to view and browse genomic
features. It can be used with any image generation program that can
accept &xmin= and &xmax= parameters through the url -- though i
haven't had it working with bioperl, as bioperl generates images of
different heights depending on the number of tracks.

there's a live example of the code in SVN here:
http://toxic.berkeley.edu/bpederse/genome-browser/
with images generated by a colleague's modules on first request. those
images are then cached by a simple perl script included in the SVN
repo. all subsequent requests are returned from the cache.
an image request (automatically generated by the javascript) looks like:
http://toxic.berkeley.edu/bpederse/genome-browser/tiler.pl?chr=1&version=6&layers=mRNA&organism=arabidopsis&xmin=491520&xmax=499712&width=512
but any implementation need only implement xmin and xmax. all other
parameters will be used for caching but are not required.
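
As a rough illustration of the tile requests described above, here is a minimal sketch in plain Perl. The function names (tile_bounds, tile_query, cache_key) and the MD5-based cache file name are assumptions made for this example and are not taken from the genome-browser code. The tile size of 8192 bp is simply the xmax - xmin span of the example URL.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Genomic bounds of tile number $index when every tile spans $bp_per_tile bases.
sub tile_bounds {
    my ($index, $bp_per_tile) = @_;
    my $xmin = $index * $bp_per_tile;
    return ($xmin, $xmin + $bp_per_tile);
}

# Build a query string; keys are sorted so that the same request always
# produces the same string (values are not URI-escaped in this sketch).
sub tile_query {
    my (%param) = @_;
    return join '&', map { "$_=$param{$_}" } sort keys %param;
}

# Hypothetical cache file name derived from the full query string, so that
# every parameter takes part in caching even if the renderer ignores it.
sub cache_key {
    my ($query) = @_;
    return md5_hex($query) . '.png';
}

my ($xmin, $xmax) = tile_bounds(60, 8192);   # 491520 .. 499712
my $query = tile_query(
    chr => 1, layers => 'mRNA', width => 512,
    xmin => $xmin, xmax => $xmax,
);
print "$query\n";
print cache_key($query), "\n";
```

Because the tile index alone fixes xmin and xmax, the client never has to ask the server where tile boundaries fall.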

if anyone is interested in getting this going with bioperl image
generation--or improving the project in any way, let me know and i'll
add you as a committer and provide any javascript support that i can.

-brent

tar ball download:
http://genome-browser.googlecode.com/files/genome-browser-0.02.tar


From dmessina at wustl.edu  Thu Jul  5 14:39:16 2007
From: dmessina at wustl.edu (David Messina)
Date: Thu, 5 Jul 2007 13:39:16 -0500
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To: <AE1AF127-4D2A-458C-8EF0-F3D8D3B239E5@uiuc.edu>
References: <468B6FBF.1070708@sendu.me.uk>	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>
	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>
	<468CA721.4020804@sheffield.ac.uk>
	<21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net>
	<B5011E60-D686-4E71-A1AB-46990548ABD1@uiuc.edu>
	<FBB24EB4-7728-476E-98EA-77A53C870A89@wustl.edu>
	<BF212044-F565-434B-882F-507974566B66@uiuc.edu>
	<8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu>
	<AE1AF127-4D2A-458C-8EF0-F3D8D3B239E5@uiuc.edu>
Message-ID: <DD6F2CE5-FE79-48D2-9410-FACA35AFEF9C@wustl.edu>

> The real answer is to get out a stable 1.6 ASAP.  The problem we  
> currently have is (horrible Texas pun) 'too many pokers in the  
> fire.'  We have svn migration, major changes in the test suite,  
> talk about splitting bioperl, a lot of bugs to sort through, new  
> code to add or work on, etc.  Not to mention our $jobs!

Yep, I hear ya.


> I think we should just bite the bullet and proceed with pulling out  
> the controversial operator overloading in Bio::Annotation*,  
> deprecate the tag methods in AnnotatableI, and go about fixing  
> everything up.  If that occurs (which seems to be the major  
> impediment) and we get GMOD/GBrowse playing well with BioPerl then  
> we can aim for a new stable release, and then institute a regular  
> release cycle.

That's a great plan. You're right -- better to devote energy to 1.6  
than to interim solutions.

Alright, I give, I give! :)
Dave


From glauberwagner at yahoo.com.br  Thu Jul  5 15:56:43 2007
From: glauberwagner at yahoo.com.br (Glauber Wagner)
Date: Thu, 5 Jul 2007 16:56:43 -0300 (ART)
Subject: [Bioperl-l] Bio::DB::Query::GenBank failures
In-Reply-To: <6D5431B47E46BD45AAA453432AD3B803027551D4@rpbmsem01.nala.roche.com>
Message-ID: <839755.95349.qm@web36514.mail.mud.yahoo.com>

Dear All,

I have a problem with the Bio::DB::Query::GenBank module. I
am trying to count the number of protein sequences, and
the count method did not return the expected number.

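use strict;
use warnings;
use Bio::DB::GenBank;
use Bio::DB::Query::GenBank;

my $query_string = "Trypanosoma cruzi[Organism]";

# Build an Entrez query against the protein database.
my $query = Bio::DB::Query::GenBank->new(-db    => 'protein',
                                         -query => $query_string);
my $count = $query->count;   # number of matching records
my @ids   = $query->ids;     # IDs of the matching records

print "$count\n";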

Thanks.
Glauber




From cjfields at uiuc.edu  Thu Jul  5 16:21:49 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Jul 2007 15:21:49 -0500
Subject: [Bioperl-l] Bio::DB::Query::GenBank failures
In-Reply-To: <839755.95349.qm@web36514.mail.mud.yahoo.com>
References: <839755.95349.qm@web36514.mail.mud.yahoo.com>
Message-ID: <190D7522-9681-410A-9B04-CFB70E328EC4@uiuc.edu>

NCBI esearch doesn't seem to be working at the moment.  I'm getting  
'Internal Server Error' at this time.  Try back again at a later point.

chris

On Jul 5, 2007, at 2:56 PM, Glauber Wagner wrote:

> Dear All,
>
> I have a problem with the Bio::DB::Query::GenBank module. I
> am trying to count the number of protein sequences, and
> the count method did not return the expected number.
>
> use Bio::DB::GenBank;
> use Bio::DB::Query::GenBank;
>
> $query_string = "Trypanosoma cruzi[Organism]";
>
>   my $query =
> Bio::DB::Query::GenBank->new(-db=>'protein',
>
> -query=>$query_string);
>    my $count = $query->count;
>    my @ids   = $query->ids;
>
> print "$count\n";
>
> Thanks.
> Glauber

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From mitch_skinner at berkeley.edu  Thu Jul  5 17:22:38 2007
From: mitch_skinner at berkeley.edu (Mitch Skinner)
Date: Thu, 05 Jul 2007 14:22:38 -0700
Subject: [Bioperl-l] slippy map for genomic features.
In-Reply-To: <e183a99d0707051058y4d5a3f35v8a0aff35dc90c394@mail.gmail.com>
References: <e183a99d0707051058y4d5a3f35v8a0aff35dc90c394@mail.gmail.com>
Message-ID: <468D611E.7020904@berkeley.edu>

Hi,

FWIW, we've been working on something similar:
http://genome.biowiki.org/dmel/static/browser/prototype_gbrowse.html
based on GBrowse/Bio::Graphics and javascript that Andrew wrote from 
scratch (with the prototype library).  When our project was starting up 
(fall 05) Andrew looked but didn't find openlayers; I'm not sure if it 
was public back then but their current svn only goes back to 2006.

I think that things like layout (bumping) ought to be done in advance on 
a chromosome-wide basis; otherwise it's difficult to keep features from 
ending up at different heights on neighboring tiles.  And it would be 
difficult for the server to know what was being clicked on.  So we've 
been doing some up-front work to either do layout or to just render all 
the tiles in advance:
http://gmod.cvs.sourceforge.net/gmod/Generic-Genome-Browser/ajax/server/TileGenerator.pm?revision=1.1&view=markup
which is driven by this script:
http://gmod.cvs.sourceforge.net/gmod/Generic-Genome-Browser/ajax/server/generate-tiles.pl?revision=1.14&view=markup

Or you could just not bump at all, I guess.  I think of that as 
important functionality but I'd be interested in hearing about use cases 
where it's not necessary.  It's not just bumping, though; things like 
text labels also make it difficult to predict exactly what pixels a 
feature will span if you only have its genomic coordinates.
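
To make the layout ("bumping") step concrete, here is a minimal sketch of a greedy row assignment computed once over a whole set of features. It is not the TileGenerator.pm code linked above; the function name bump_layout and the [name, start, end] feature representation are assumptions for illustration, and it ignores label widths entirely.

```perl
use strict;
use warnings;

# Assign each feature a row so that no two features in a row overlap.
# Features are [name, start, end] with inclusive coordinates.
sub bump_layout {
    my (@features) = @_;
    my @row_end;   # last occupied coordinate in each row
    my %row_of;    # feature name => row number
    for my $f (sort { $a->[1] <=> $b->[1] || $a->[2] <=> $b->[2] } @features) {
        my ($name, $start, $end) = @$f;
        my $row = 0;
        # Skip rows whose last feature still overlaps this one.
        $row++ while $row < @row_end && $row_end[$row] >= $start;
        $row_end[$row] = $end;
        $row_of{$name} = $row;
    }
    return \%row_of;
}

my $rows = bump_layout(
    [ geneA => 100, 500 ],
    [ geneB => 300, 800 ],
    [ geneC => 600, 900 ],
);
printf "%s: row %d\n", $_, $rows->{$_} for sort keys %$rows;
# geneA: row 0
# geneB: row 1
# geneC: row 0
```

Since the rows are fixed before any tile is rendered, a feature that crosses a tile boundary is drawn at the same height on both tiles.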

To make features clickable we've been using imagemaps; it simplifies the 
server code but it bogs down the client quite a bit.

I'd certainly be interested in seeing if there are ways we could work 
together; if you're at Berkeley maybe we could meet.

Regards,
Mitch

Brent Pedersen wrote:
> hi,
> here's a side project i've been tinkering on in googlecode svn that
> may be useful to some.
> http://code.google.com/p/genome-browser/
> it's a simple hack on top of OpenLayers (openlayers.org) to provide a
> javascript slippy map interface and API to view and browse genomic
> features. It can be used with any image generation program that can
> accept &xmin= and &xmax= parameters through the url -- though i
> haven't had it working with bioperl, as bioperl generates images of
> different heights depending on the number of tracks.
>
> there's a live example of the code in SVN here:
> http://toxic.berkeley.edu/bpederse/genome-browser/
> with images generated by a colleague's modules on first request. those
> images are then cached by a simple perl script included in the SVN
> repo. all subsequent requests are returned from the cache.
> an image request (automatically generated by the javascript) looks like:
> http://toxic.berkeley.edu/bpederse/genome-browser/tiler.pl?chr=1&version=6&layers=mRNA&organism=arabidopsis&xmin=491520&xmax=499712&width=512
> but any implementation need only implement xmin and xmax. all other
> parameters will be used for caching but are not required.
>
> if anyone is interested in getting this going with bioperl image
> generation--or improving the project in any way, let me know and i'll
> add you as a committer and provide any javascript support that i can.
>
> -brent
>
> tar ball download:
> http://genome-browser.googlecode.com/files/genome-browser-0.02.tar


From cjfields at uiuc.edu  Thu Jul  5 17:42:40 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Jul 2007 16:42:40 -0500
Subject: [Bioperl-l] Bio::DB::Query::GenBank failures
In-Reply-To: <190D7522-9681-410A-9B04-CFB70E328EC4@uiuc.edu>
References: <839755.95349.qm@web36514.mail.mud.yahoo.com>
	<190D7522-9681-410A-9B04-CFB70E328EC4@uiuc.edu>
Message-ID: <3219E785-D475-4C21-ABCC-89FABD502E05@uiuc.edu>

Update: seems to be back up.  Give it a try now.

chris

On Jul 5, 2007, at 3:21 PM, Chris Fields wrote:

> NCBI esearch doesn't seem to be working at the moment.  I'm getting
> 'Internal Server Error' at this time.  Try back again at a later  
> point.
>
> chris
>
> On Jul 5, 2007, at 2:56 PM, Glauber Wagner wrote:
>
>> Dear All,
>>
>> I have a problem with the Bio::DB::Query::GenBank module. I
>> am trying to count the number of protein sequences, and
>> the count method did not return the expected number.
>>
>> use Bio::DB::GenBank;
>> use Bio::DB::Query::GenBank;
>>
>> $query_string = "Trypanosoma cruzi[Organism]";
>>
>>   my $query =
>> Bio::DB::Query::GenBank->new(-db=>'protein',
>>
>> -query=>$query_string);
>>    my $count = $query->count;
>>    my @ids   = $query->ids;
>>
>> print "$count\n";
>>
>> Thanks.
>> Glauber
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From n.haigh at sheffield.ac.uk  Fri Jul  6 03:09:17 2007
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Fri, 06 Jul 2007 08:09:17 +0100
Subject: [Bioperl-l] API Changes
In-Reply-To: <8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu>
References: <468B6FBF.1070708@sendu.me.uk>	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>	<468CA721.4020804@sheffield.ac.uk>	<21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net>	<B5011E60-D686-4E71-A1AB-46990548ABD1@uiuc.edu>	<FBB24EB4-7728-476E-98EA-77A53C870A89@wustl.edu>	<BF212044-F565-434B-882F-507974566B66@uiuc.edu>
	<8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu>
Message-ID: <468DEA9D.6010809@sheffield.ac.uk>

David Messina wrote:
>> [Chris]
>> The 1.5 releases I believe break some aspects of 1.4 API
>>     
>
> Yes, this is true.
>
> I question, though, whether it's relevant given that virtually no one  
> uses 1.4 anymore. In any case, I would venture that the number of  
> people who would be bitten by the 1.4->1.5 API change is much smaller  
> than the number of people who download 1.4 and then ask us why it  
> doesn't work.
>   

I'm not really up to speed with how the API should remain stable. Is 
the idea that the API should be stable from 1.4 through the 1.5 dev 
releases, and that the next stable release can then change that API? So 
any stable-to-stable upgrade could involve an API change, while a 
stable-to-dev upgrade should keep the same API? Does a stable API mean 
that the same method calls are available in a newer release? What about 
adding new methods to a newer release?

How are these API changes currently tracked? It seems to me that 
Test::More might be able to help in testing the API:

can_ok($module, @methods);
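
A minimal sketch of what such an API test could look like, using only Test::More from the Perl core. The class My::Legacy::Seq and its method list are hypothetical stand-ins defined inline so the example runs on its own; a real test would load the BioPerl module under test instead.

```perl
use strict;
use warnings;
use Test::More tests => 2;

# Hypothetical stand-in for a BioPerl class.
{
    package My::Legacy::Seq;
    sub new        { return bless {}, shift }
    sub seq        { return $_[0]{seq} }
    sub display_id { return $_[0]{id} }
}

# Methods promised by the stable release (illustrative list).
my @stable_api = qw(new seq display_id);

# Fails if any promised method has been removed or renamed.
can_ok('My::Legacy::Seq', @stable_api);

# can_ok() only checks that the names resolve; arguments and return
# values still need their own tests.
my $obj = My::Legacy::Seq->new;
isa_ok($obj, 'My::Legacy::Seq');
```

Keeping one such list per release series would make a removed or renamed method show up as a test failure rather than as a user report.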


Nath


From n.haigh at sheffield.ac.uk  Fri Jul  6 07:10:14 2007
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Fri, 06 Jul 2007 12:10:14 +0100
Subject: [Bioperl-l] Bio::Variation::RNAChange
Message-ID: <468E2316.1030804@sheffield.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I'm taking a look at the tests for Bio::Variation::RNAChange.

If you create a new object without arguments:
my $obj = Bio::Variation::RNAChange->new();

What do you expect the following to return:
$obj->label();

I thought it would probably be:
'inframe'

However you get:
'inframe, deletion'

Can anyone in the know explain what behaviour would be expected?

Cheers
Nath
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGjiMVczuW2jkwy2gRAv+0AJ9tA/1WgEbTRCen+FCi/DU/P2RnAwCfbGit
B8DxDViDOcx2gTFjSwQ2kNg=
=SroY
-----END PGP SIGNATURE-----


From n.haigh at sheffield.ac.uk  Fri Jul  6 08:54:33 2007
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Fri, 06 Jul 2007 13:54:33 +0100
Subject: [Bioperl-l] Bio::Variation::RNAChange
In-Reply-To: <468E2316.1030804@sheffield.ac.uk>
References: <468E2316.1030804@sheffield.ac.uk>
Message-ID: <468E3B89.3090202@sheffield.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Nathan S. Haigh wrote:
> I'm taking a look at the tests for Bio::Variation::RNAChange.
> 
> If you create a new object without arguments:
> my $obj = Bio::Variation::RNAChange->new();
> 
> What do you expect the following to return:
> $obj->label();
> 
> I thought it would probably be:
> 'inframe'
> 
> However you get:
> 'inframe, deletion'
> 
> Can anyone in the know explain what behaviour would be expected?
> 
> Cheers
> Nath

Following on from this, AAChange has the following two methods:
add_Allele() and allele_mut()

It appears that allele_mut() is only capable of remembering 1 allele at a
time, whereas add_Allele() is provided to add support for multiple
alleles - is that correct?

However, add_Allele() also calls allele_mut(), such that multiple calls
to add_Allele() will result in the overwriting of the allele being
remembered by allele_mut(). Things are further complicated by the fact
that label() uses allele_mut() to decide on the label to return.
Shouldn't label() know about multiple alleles set by multiple calls to
add_Allele()?

It may be my lack of understanding of alleles and what these classes are
intended to do, but trying to rewrite the test scripts to improve code
coverage has left me a little confused!

Thanks
Nath
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGjjuJczuW2jkwy2gRAgogAKDXAn8h5iFIBCjtQgxYsrUGofYpOwCguC6I
b8ZOENvDDDIxphAoxeKg8/E=
=f/sa
-----END PGP SIGNATURE-----


From tanzeem.mb at gmail.com  Thu Jul  5 02:39:34 2007
From: tanzeem.mb at gmail.com (tanzeem)
Date: Wed, 4 Jul 2007 23:39:34 -0700 (PDT)
Subject: [Bioperl-l] Problem working with remoteblast submit method in
 webbrowser.
In-Reply-To: <11114623.post@talk.nabble.com>
References: <11114623.post@talk.nabble.com>
Message-ID: <11441586.post@talk.nabble.com>


I found it myself. Run apache as root and disable selinux, and the problem
will not recur.

tanzeem wrote:
> 
> I have a program which uses the BioPerl RemoteBlast module to
> compare an amino acid FASTA file with the Swiss-Prot database. The
> submit_blast() method works successfully when run from the command line, but
> when the program is run from a web browser it returns -1. I was trying to
> adapt the code from the RemoteBlast synopsis for my needs.
> 

-- 
View this message in context: http://www.nabble.com/Problem-working-with-remoteblast-submit-method-in-webbrowser.-tf3919886.html#a11441586
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From cain.cshl at gmail.com  Fri Jul  6 09:00:32 2007
From: cain.cshl at gmail.com (Scott Cain)
Date: Fri, 06 Jul 2007 09:00:32 -0400
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To: <AE1AF127-4D2A-458C-8EF0-F3D8D3B239E5@uiuc.edu>
References: <468B6FBF.1070708@sendu.me.uk>
	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>
	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>
	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>
	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>
	<468CA721.4020804@sheffield.ac.uk>
	<21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net>
	<B5011E60-D686-4E71-A1AB-46990548ABD1@uiuc.edu>
	<FBB24EB4-7728-476E-98EA-77A53C870A89@wustl.edu>
	<BF212044-F565-434B-882F-507974566B66@uiuc.edu>
	<8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu>
	<AE1AF127-4D2A-458C-8EF0-F3D8D3B239E5@uiuc.edu>
Message-ID: <1183726832.2566.34.camel@localhost.localdomain>

On Thu, 2007-07-05 at 11:57 -0500, Chris Fields wrote:
> 
> I think we should just bite the bullet and proceed with pulling out  
> the controversial operator overloading in Bio::Annotation*, deprecate  
> the tag methods in AnnotatableI, and go about fixing everything up.   
> If that occurs (which seems to be the major impediment) and we get  
> GMOD/GBrowse playing well with BioPerl then we can aim for a new  
> stable release, and then institute a regular release cycle.
> 
I think this sounds like a good idea to me too.  I'm planning on having
a GMOD hackathon at the end of the summer; if I had a new API by then,
we could focus on fixing anything that gets broken by the changes.

Scott

-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From cjfields at uiuc.edu  Fri Jul  6 09:10:41 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 6 Jul 2007 08:10:41 -0500
Subject: [Bioperl-l] API Changes
In-Reply-To: <468DEA9D.6010809@sheffield.ac.uk>
References: <468B6FBF.1070708@sendu.me.uk>	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>	<468CA721.4020804@sheffield.ac.uk>	<21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net>	<B5011E60-D686-4E71-A1AB-46990548ABD1@uiuc.edu>	<FBB24EB4-7728-476E-98EA-77A53C870A89@wustl.edu>	<BF212044-F565-434B-882F-507974566B66@uiuc.edu>
	<8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu>
	<468DEA9D.6010809@sheffield.ac.uk>
Message-ID: <E0AE6402-AC6A-4C2A-BF5B-EB02670FEA62@uiuc.edu>


On Jul 6, 2007, at 2:09 AM, Nathan S. Haigh wrote:

> David Messina wrote:
>>> [Chris]
>>> The 1.5 releases I believe break some aspects of 1.4 API
>>>
>>
>> Yes, this is true.
>>
>> I question, though, whether it's relevant given that virtually no one
>> uses 1.4 anymore. In any case, I would venture that the number of
>> people who would be bitten by the 1.4->1.5 API change is much smaller
>> than the number of people who download 1.4 and then ask us why it
>> doesn't work.
>>
>
> I'm not really up to speed with how the API should remain stable. Is
> the idea that the API should be stable from 1.4 through the 1.5 dev
> releases, and that the next stable release can then change that API? So
> any stable-to-stable upgrade could involve an API change, while a
> stable-to-dev upgrade should keep the same API? Does a stable API mean
> that the same method calls are available in a newer release? What about
> adding new methods to a newer release?
>
> How are these API changes currently tracked? It seems to me that
> Test::More might be able to help in testing the API:
>
> can_ok($module, @methods);
>
>
> Nath	

It's basically a 'contract' of sorts between the devs (us) and users  
(us/them) that the API won't change for the extent of that release  
series, thus ensuring any scripts out there generating tons of data  
won't break down if they attempt to call a renamed method.  We try to  
maintain the API state anyway for those reasons, but in a dev release  
series we might decide to change some method names for consistency  
and deprecate older ambiguously-named methods (see below).  For a  
stable release it's critical the API remain intact.

There are a few methods which are considered deprecated or will be  
deprecated.  For instance, we recently talked about changes to method  
names which use case to specify whether you're receiving an object  
(get_Foo) vs. data (get_foo), a list (get_Foos), a flattened vs.  
nested list, or whether to use each_* vs next_* for iterators.   
Consistency is nice!
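
One way a rename can honour that contract is to keep the old name as a thin wrapper that warns. The sketch below uses only core Perl (Carp); the class My::FeatureHolder and the method names each_Feature/get_Features are hypothetical examples and are not taken from any BioPerl module.

```perl
use strict;
use warnings;
use Carp ();

{
    package My::FeatureHolder;   # hypothetical class, not part of BioPerl

    sub new { return bless { features => [] }, shift }

    sub add_Feature {
        my ($self, @feats) = @_;
        push @{ $self->{features} }, @feats;
        return scalar @{ $self->{features} };
    }

    # New, consistently named accessor returning a list.
    sub get_Features {
        my ($self) = @_;
        return @{ $self->{features} };
    }

    # Old name kept as a wrapper so existing scripts keep working.
    sub each_Feature {
        my ($self) = @_;
        Carp::carp('each_Feature() is deprecated; use get_Features()');
        return $self->get_Features;
    }
}

my $holder = My::FeatureHolder->new;
$holder->add_Feature('exon1', 'exon2');
my @old = $holder->each_Feature;   # still works, but warns on STDERR
my @new = $holder->get_Features;
print scalar(@old), ' ', scalar(@new), "\n";   # prints "2 2"
```

The wrapper can then be dropped at the next release series that is allowed to break the API.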

chris 


From heikki at sanbi.ac.za  Fri Jul  6 09:20:26 2007
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Fri, 6 Jul 2007 15:20:26 +0200
Subject: [Bioperl-l] Bio::Variation::RNAChange
In-Reply-To: <468E3B89.3090202@sheffield.ac.uk>
References: <468E2316.1030804@sheffield.ac.uk>
	<468E3B89.3090202@sheffield.ac.uk>
Message-ID: <200707061520.27000.heikki@sanbi.ac.za>

Hi Nat,

These modules have not been touched for a while and were developed for a 
specific task. A review is definitely in order.

The way RNAChange->label was written, it should return 'inframe' when given no 
alleles, but 'no change' would actually be better.

Multiple alleles were originally thought to be a good idea, but the 
vocabulary for labels was developed for a single allele only. The use of the 
module ended up being limited to a single allele, so the add_Allele() 
behaviour was conveniently ignored but not removed. :(

	-Heikki


On Friday 06 July 2007 14:54:33 Nathan S. Haigh wrote:
> Nathan S. Haigh wrote:
> > I'm taking a look at the tests for Bio::Variation::RNAChange.
> >
> > If you create a new object without arguments:
> > my $obj = Bio::Variation::RNAChange->new();
> >
> > What do you expect the following to return:
> > $obj->label();
> >
> > I thought it would probably be:
> > 'inframe'
> >
> > However you get:
> > 'inframe, deletion'
> >
> > Can anyone in the know explain what behaviour would be expected?
> >
> > Cheers
> > Nath
>
> Following on from this, AAChange has the following two methods:
> add_Allele() and allele_mut()
>
> It appears that allele_mut() is only capable of remembering 1 allele at a
> time, whereas add_Allele() is provided to add support for multiple
> alleles - is that correct?
>
> However, add_Allele() also calls allele_mut(), such that multiple calls
> to add_Allele() will result in the overwriting of the allele being
> remembered by allele_mut(). Things are further complicated by the fact
> that label() uses allele_mut() to decide on the label to return.
> Shouldn't label() know about multiple alleles set by multiple calls to
> add_Allele()?
>
> It may be my lack of understanding of alleles and what these classes are
> intended to do, but trying to rewrite the test scripts to improve code
> coverage has left me a little confused!
>
> Thanks
> Nath


-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From schlesi at ebi.ac.uk  Fri Jul  6 10:24:05 2007
From: schlesi at ebi.ac.uk (Felix Schlesinger)
Date: Fri, 6 Jul 2007 15:24:05 +0100
Subject: [Bioperl-l] Unrooting a tree
Message-ID: <7317d50c0707060724g614ca3a2h8beceada040d3584@mail.gmail.com>

Hi,

I am reading a rooted tree in newick format from a string (i.e. a
bifurcation at the root) and would like to unroot it (i.e. a
trifurcation at the root). I tried getting a grandchild of the root
and adding it as a direct child, but that does not seem to work (the
root still only has two descendents and the tree structure gets messed
up). Is there a nice way to do this directly in bioperl? Doing it on
the newick string is possible of course, but not nice.
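
For reference, the operation itself can be sketched on a plain nested-hash tree. This deliberately does not use the Bio::Tree classes, so it shows the idea only and not the BioPerl way of doing it. The node layout (name, length, children) and the function names unroot and to_newick are assumptions for this example.

```perl
use strict;
use warnings;

# Turn a bifurcating root into a trifurcation by dissolving one internal
# child of the root and handing its children to the root.
sub unroot {
    my ($root) = @_;
    my $kids = $root->{children};
    return $root unless @$kids == 2;

    # Pick the first child of the root that is itself internal.
    my $i;
    for my $j (0, 1) {
        if (@{ $kids->[$j]{children} || [] }) { $i = $j; last }
    }
    return $root unless defined $i;   # two leaves: nothing to collapse

    my $dissolved = splice @$kids, $i, 1;
    my $sibling   = $kids->[0];

    # The two root branches merge into one edge, so their lengths are summed.
    $sibling->{length} = ($sibling->{length} || 0) + ($dissolved->{length} || 0);
    push @$kids, @{ $dissolved->{children} };
    return $root;
}

# Serialise a node (without the trailing semicolon).
sub to_newick {
    my ($node) = @_;
    my $kids = $node->{children} || [];
    my $s = @$kids ? '(' . join(',', map { to_newick($_) } @$kids) . ')' : '';
    $s .= $node->{name} if defined $node->{name};
    $s .= ':' . $node->{length} if defined $node->{length};
    return $s;
}

# ((A:1,B:1):0.5,C:2); as a nested structure
my $tree = { children => [
    { length => 0.5, children => [
        { name => 'A', length => 1 },
        { name => 'B', length => 1 },
    ] },
    { name => 'C', length => 2 },
] };

print to_newick(unroot($tree)), ";\n";   # prints (C:2.5,A:1,B:1);
```

Path lengths between leaves are unchanged: A to C is 1 + 0.5 + 2 = 3.5 before and 1 + 2.5 = 3.5 after.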

Thanks
  Felix


From n.haigh at sheffield.ac.uk  Fri Jul  6 11:37:19 2007
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Fri, 06 Jul 2007 16:37:19 +0100
Subject: [Bioperl-l] API Changes
In-Reply-To: <E0AE6402-AC6A-4C2A-BF5B-EB02670FEA62@uiuc.edu>
References: <468B6FBF.1070708@sendu.me.uk>	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>	<468CA721.4020804@sheffield.ac.uk>	<21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net>	<B5011E60-D686-4E71-A1AB-46990548ABD1@uiuc.edu>	<FBB24EB4-7728-476E-98EA-77A53C870A89@wustl.edu>	<BF212044-F565-434B-882F-507974566B66@uiuc.edu>
	<8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu>
	<468DEA9D.6010809@sheffield.ac.uk>
	<E0AE6402-AC6A-4C2A-BF5B-EB02670FEA62@uiuc.edu>
Message-ID: <468E61AF.9040106@sheffield.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Chris Fields wrote:
> 
> On Jul 6, 2007, at 2:09 AM, Nathan S. Haigh wrote:
> 
>> David Messina wrote:
>>>> [Chris]
>>>> The 1.5 releases I believe break some aspects of 1.4 API
>>>>
>>>
>>> Yes, this is true.
>>>
>>> I question, though, whether it's relevant given that virtually no one
>>> uses 1.4 anymore. In any case, I would venture that the number of
>>> people who would be bitten by the 1.4->1.5 API change is much smaller
>>> than the number of people who download 1.4 and then ask us why it
>>> doesn't work.
>>>
>>
>> I'm not really up to speed with how the API should remain stable. Is
>> the idea that the API should be stable from 1.4 through the 1.5 dev
>> releases, and that the next stable release can then change that API? So
>> any stable-to-stable upgrade could involve an API change, while a
>> stable-to-dev upgrade should keep the same API? Does a stable API mean
>> that the same method calls are available in a newer release? What about
>> adding new methods to a newer release?
>>
>> How are these API changes currently tracked? It seems to me that
>> Test::More might be able to help in testing the API:
>>
>> can_ok($module, @methods);
>>
>>
>> Nath   
> 
> It's basically a 'contract' of sorts between the devs (us) and users
> (us/them) that the API won't change for the extent of that release
> series, thus ensuring any scripts out there generating tons of data
> won't break down if they attempt to call a renamed method.  We try to
> maintain the API state anyway for those reasons, but in a dev release
> series we might decide to change some method names for consistency and
> deprecate older ambiguously-named methods (see below).  For a stable
> release it's critical the API remain intact.

Hmm, still not 100% clear - it is Friday!

So, someone running a script that was designed when 1.4 was released
should still be able to run their script for all future releases. So all
changes need to be backward compatible?

So you have several situations regarding method names:
1) Adding new methods should be fine since past scripts don't know about
them and won't have used them
2) Removing methods would break past scripts that used them
3) Renamed methods would break past scripts that used the old name

A stable API to me, means the same method calls should still be able to
accept the same arguments (inc the constructor) and return the same
object/data etc.

What if a module is pretty outdated and would benefit from a rewrite -
should all the old method names be included, what if this makes coding
difficult?

> 
> There are a few methods which are considered deprecated or will be
> deprecated.  For instance, we recently talked about changes to method
> names which use case to specify whether you're receiving an object
> (get_Foo) vs. data (get_foo), a list (get_Foos), a flattened vs. nested
> list, or whether to use each_* vs next_* for iterators.  Consistency is
> nice!
> 

You mean the use of case to signify objects vs data being returned are
to be deprecated or encouraged? What was the outcome of the each_* vs
next_*?

Nath


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGjmGvczuW2jkwy2gRAkGeAKDBXVSBvN0b39xbK1+2RLed35knSQCgz3pk
kAWH1zVa1ycopijl761cvkQ=
=fppH
-----END PGP SIGNATURE-----


From n.haigh at sheffield.ac.uk  Fri Jul  6 11:43:41 2007
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Fri, 06 Jul 2007 16:43:41 +0100
Subject: [Bioperl-l] Bio::Variation::RNAChange
In-Reply-To: <200707061520.27000.heikki@sanbi.ac.za>
References: <468E2316.1030804@sheffield.ac.uk>
	<468E3B89.3090202@sheffield.ac.uk>
	<200707061520.27000.heikki@sanbi.ac.za>
Message-ID: <468E632D.4090801@sheffield.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Heikki Lehvaslaiho wrote:
> Hi Nat,
> 
> These modules have not been touched for a while and were developed for a 
> specific task. A review is definitely in order.
> 
> The way RNAChange->label was written, it should return 'inframe' when given no 
> alleles, but 'no change' would actually be better.

Wouldn't this effectively be changing the API since past scripts "could"
expect "inframe" to be returned.

> 
> Multiple alleles were originally thought to be a good idea, but the 
> vocabulary for labels was developed for a single allele only. The use of the 
> module ended up being limited to a single allele, so the add_Allele() 
> behaviour was conveniently ignored but not removed. :(

So add_Allele() and each_Allele() should be deprecated in favour of
allele_mut()?

- From my post about API's.....how should the capitalisation of
add_Allele() and each_Allele() be changed?

Cheers
Nath


> 
> 	-Heikki
> 
> 
> 
> On Friday 06 July 2007 14:54:33 Nathan S. Haigh wrote:
>> Nathan S. Haigh wrote:
>>> I'm taking a look at the tests for Bio::Variation::RNAChange.
>>>
>>> If you create a new object without arguments:
>>> my $obj = Bio::Variation::RNAChange->new();
>>>
>>> What do you expect the following to return:
>>> $obj->label();
>>>
>>> I thought it would probably be:
>>> 'inframe'
>>>
>>> However you get:
>>> 'inframe, deletion'
>>>
>>> Can anyone in the know explain what behaviour would be expected?
>>>
>>> Cheers
>>> Nath
>> Following on from this, AAChange has the following two methods:
>> add_Allele() and allele_mut()
>>
>> It appears that allele_mut is only capable of remembering 1 allele at a
>> time, whereas add_Allele() is provided to add support for multiple
>> alleles - is that correct?
>>
>> However, add_Allele() also calls allele_mut(), such that multiple calls
>> to add_Allele() will result in the overwriting of the allele being
>> remembered by allele_mut(). Things are further complicated by the fact
>> that label() uses allele_mut() to decide on the label to return.
>> Shouldn't label() know about multiple alleles set by multiple calls to
>> add_Allele()?
>>
>> It may be my lack of understanding alleles and what these classes are
>> intending to do, but trying to rewrite the test scripts to improve code
>> coverage has left me a little confused!
>>
>> Thanks
>> Nath
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> 
> 

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGjmMtczuW2jkwy2gRAgQHAKC+S5mVh4lqR95NmgR6z+aU9br5lQCfc6ue
GBHuSHfsesX1ko55s+ME2Zc=
=tkG8
-----END PGP SIGNATURE-----


From cjfields at uiuc.edu  Sat Jul  7 16:57:37 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 7 Jul 2007 15:57:37 -0500
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To: <1183726832.2566.34.camel@localhost.localdomain>
References: <468B6FBF.1070708@sendu.me.uk>
	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>
	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>
	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>
	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>
	<468CA721.4020804@sheffield.ac.uk>
	<21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net>
	<B5011E60-D686-4E71-A1AB-46990548ABD1@uiuc.edu>
	<FBB24EB4-7728-476E-98EA-77A53C870A89@wustl.edu>
	<BF212044-F565-434B-882F-507974566B66@uiuc.edu>
	<8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu>
	<AE1AF127-4D2A-458C-8EF0-F3D8D3B239E5@uiuc.edu>
	<1183726832.2566.34.camel@localhost.localdomain>
Message-ID: <198D3F24-8510-453D-9201-21F2CCEC3519@uiuc.edu>

We'll prob. get a start soon, then.  I'll let you know when we start.

chris

On Jul 6, 2007, at 8:00 AM, Scott Cain wrote:

> On Thu, 2007-07-05 at 11:57 -0500, Chris Fields wrote:
>>
>> I think we should just bite the bullet and proceed with pulling out
>> the controversial operator overloading in Bio::Annotation*, deprecate
>> the tag methods in AnnotatableI, and go about fixing everything up.
>> If that occurs (which seems to be the major impediment) and we get
>> GMOD/GBrowse playing well with BioPerl then we can aim for a new
>> stable release, and then institute a regular release cycle.
>>
> I think this sounds like a good idea to me too.  I'm planning on  
> having
> a GMOD hackathon at the end of the summer; if I had a new API by then,
> we could focus on fixing anything that gets broken by the changes.
>
> Scott
>
> -- 
> ---------------------------------------------------------------------- 
> --
> Scott Cain, Ph. D.                                          
> cain at cshl.edu
> GMOD Coordinator (http://www.gmod.org/)                      
> 216-392-3087
> Cold Spring Harbor Laboratory

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Sat Jul  7 17:17:14 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 7 Jul 2007 16:17:14 -0500
Subject: [Bioperl-l] API Changes
In-Reply-To: <468E61AF.9040106@sheffield.ac.uk>
References: <468B6FBF.1070708@sendu.me.uk>	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>	<468CA721.4020804@sheffield.ac.uk>	<21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net>	<B5011E60-D686-4E71-A1AB-46990548ABD1@uiuc.edu>	<FBB24EB4-7728-476E-98EA-77A53C870A89@wustl.edu>	<BF212044-F565-434B-882F-507974566B66@uiuc.edu>
	<8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu>
	<468DEA9D.6010809@sheffield.ac.uk>
	<E0AE6402-AC6A-4C2A-BF5B-EB02670FEA62@uiuc.edu>
	<468E61AF.9040106@sheffield.ac.uk>
Message-ID: <369F72D5-E5A3-4A33-BDEC-D462A339474F@uiuc.edu>


On Jul 6, 2007, at 10:37 AM, Nathan S. Haigh wrote:

> ...
> Hmm, still not 100% clear - it is Friday!
>
> So, someone running a script that was designed when 1.4 was released
> should still be able to run their script for all future releases. So all
> changes need to be backward compatible?

It helps.  For instance, if we change method names (rename each_Foo  
as next_Foo), we should have each_Foo delegate to next_Foo for the  
time being.  If we plan on deprecating the old method altogether we  
would add a warning message when it's called, then delegate.

It's a better solution than just changing the method outright, which  
means the user has to search through docs to find the renamed method.
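
A minimal sketch of that warn-then-delegate pattern, in plain Perl. The
class My::FooContainer and its internals are hypothetical and only serve
the illustration; they are not part of BioPerl, and Carp is used here
simply as one common way to emit the warning.

```perl
package My::FooContainer;    # hypothetical class, not part of BioPerl
use strict;
use warnings;
use Carp qw(carp);

sub new {
    my ($class, @foos) = @_;
    return bless { foos => [@foos], pos => 0 }, $class;
}

# New, preferred name: returns the next item, or nothing when exhausted.
sub next_Foo {
    my ($self) = @_;
    return if $self->{pos} >= @{ $self->{foos} };
    return $self->{foos}[ $self->{pos}++ ];
}

# Old name kept for backward compatibility: warn, then delegate.
sub each_Foo {
    my ($self, @args) = @_;
    carp "each_Foo() is deprecated, use next_Foo() instead";
    return $self->next_Foo(@args);
}

package main;

my $c = My::FooContainer->new(qw(a b));
print $c->next_Foo, "\n";    # new-style call
print $c->each_Foo, "\n";    # old-style call still works, with a warning
```

Old scripts keep running unchanged, and the warning tells the user which
name to switch to without a trip through the docs.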

> So you have several situations regarding method names:
> 1) Adding new methods should be fine since past scripts don't know about
> them and won't have used them
> 2) Removing methods would break past scripts that used them
> 3) Renamed methods would break past scripts that used the old name
>
> A stable API to me, means the same method calls should still be able to
> accept the same arguments (inc the constructor) and return the same
> object/data etc.

Yes.

> What if a module is pretty outdated and would benefit from a rewrite -
> should all the old method names be included, what if this makes coding
> difficult?

It depends on the module.  If a complete rewrite is needed then maybe  
starting with a new module/interface is best, and we could deprecate  
the older module completely.  That has been done already with  
Bio::Tools::BPLite (in favor of SearchIO) and a few other modules.

>> There are a few methods which are considered deprecated or will be
>> deprecated.  For instance, we recently talked about changes to method
>> names which use case to specify whether you're receiving an object
>> (get_Foo) vs. data (get_foo), a list (get_Foos), a flattened vs. nested
>> list, or whether to use each_* vs next_* for iterators.  Consistency is
>> nice!
>>
>
> You mean the use of case to signify objects vs data being returned is
> to be deprecated or encouraged? What was the outcome of the each_* vs
> next_* discussion?
>
> Nath

Here's the section I added to the wiki (it started in a thread a few  
weeks or so ago, so it's a summary really):

http://www.bioperl.org/wiki/Advanced_BioPerl#Method_names

Feel free to add to it or make suggestions.

BTW, Hilmar mentioned there was a movement to rename methods in old  
code to follow these recs but it was never completed.  It should be  
taken up again at some point but the recommendations are mainly here  
for newer code.

chris


From heikki at sanbi.ac.za  Sun Jul  8 03:32:21 2007
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Sun, 8 Jul 2007 09:32:21 +0200
Subject: [Bioperl-l] Bio::Variation::RNAChange
In-Reply-To: <468E632D.4090801@sheffield.ac.uk>
References: <468E2316.1030804@sheffield.ac.uk>
	<200707061520.27000.heikki@sanbi.ac.za>
	<468E632D.4090801@sheffield.ac.uk>
Message-ID: <200707080932.21818.heikki@sanbi.ac.za>

On Friday 06 July 2007 17:43:41 Nathan S. Haigh wrote:
> Heikki Lehvaslaiho wrote:
> > Hi Nat,
> >
> > These modules have not been touched for a while and were developed for a
> > specific task. A review is definitely in order.
> >
> > The way RNAChange->label was written, it should return 'inframe' when
> > given no alleles, but 'no change' would actually be better.
>
> Wouldn't this effectively be changing the API, since past scripts "could"
> expect "inframe" to be returned?

Checking the actual usage and what happens when you change a nucleotide
to itself, you get the label 'silent'. I guess that would be a valid label
value even when the alleles are not initialised, too.

> > The multiple alleles were originally thought to be a good idea, but the
> > vocabulary for labels was developed for a single allele only. The use of
> > the module ended up being limited to a single allele, so the add_Allele()
> > behaviour was conveniently ignored but not removed. :(
>
> So add_Allele() and each_Allele() should be deprecated in favour of
> allele_mut()?

Yes.

> From my post about API's.....how should the capitalisation of
> add_Allele() and each_Allele() be changed?

Definitely keep the current ones as deprecated alternatives.


    -Heikki


-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From xing.y.hu at gmail.com  Mon Jul  9 02:26:40 2007
From: xing.y.hu at gmail.com (Xing Hu)
Date: Mon, 09 Jul 2007 14:26:40 +0800
Subject: [Bioperl-l] How to download EST files via bioperl script?
Message-ID: <4691D520.60700@gmail.com>

Hi friends,

    I wrote a script to fetch genomic sequence files from GenBank. For
that I used the Bio::DB::GenBank module to get each sequence via
get_Seq_by_acc, and it works well. But this time, facing an enormous
number of ESTs, I have no idea how to download them swiftly and elegantly.

    PROBLEM DESCRIPTION:
    goal: download all EST files of a specific species from GenBank, say
Arabidopsis thaliana or Oryza sativa (rice).
    other: whether all of the ESTs are in a single file or stored
separately does not matter.

    Can I use a bioperl script to achieve that? And how? I would really
appreciate any help.

Xing.


From akozik at atgc.org  Mon Jul  9 08:25:14 2007
From: akozik at atgc.org (Alexander Kozik)
Date: Mon, 09 Jul 2007 05:25:14 -0700
Subject: [Bioperl-l] How to download EST files via bioperl script?
In-Reply-To: <4691D520.60700@gmail.com>
References: <4691D520.60700@gmail.com>
Message-ID: <4692292A.1080900@atgc.org>

To download genomic sequences or ESTs for any organism (in various 
formats) you can use NCBI Taxonomy Browser:
http://www.ncbi.nlm.nih.gov/Taxonomy/

you can use taxonomy id to access different organisms, Arabidopsis for 
example (3702):
http://www.ncbi.nlm.nih.gov/sites/entrez?db=Nucleotide&cmd=Search&dopt=DocSum&term=txid3702

or by direct web link:
http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Arabidopsis+thaliana&lvl=0&srchmode=1

assembled genomes can be accessed via ftp:
ftp://ftp.ncbi.nih.gov/genomes/

To download a large number of selected sequences (ESTs for example) you 
can use Batch Entrez:
http://www.ncbi.nlm.nih.gov/entrez/query/static/advancedentrez.html
http://www.ncbi.nlm.nih.gov/entrez/batchentrez.cgi?db=Nucleotide
(select EST for EST, it's critical)

It seems, to solve the problem you describe, you don't need to use 
bioperl. NCBI GenBank Entrez provides all necessary tools to work on 
these simple and frequent tasks.

-Alex

-- 
Alexander Kozik
Bioinformatics Specialist
Genome and Biomedical Sciences Facility
451 East Health Sciences Drive
University of California
Davis, CA 95616-8816
Phone: (530) 754-9127
email#1: akozik at atgc.org
email#2: akozik at gmail.com
web: http://www.atgc.org/



From cjfields at uiuc.edu  Mon Jul  9 10:17:23 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 9 Jul 2007 09:17:23 -0500
Subject: [Bioperl-l] How to download EST files via bioperl script?
In-Reply-To: <4692292A.1080900@atgc.org>
References: <4691D520.60700@gmail.com> <4692292A.1080900@atgc.org>
Message-ID: <7CC3390F-A5F5-4832-9E3E-E2B96018D31D@uiuc.edu>

Caveat: if you have millions of ESTs please consider NOT using my  
eutil script below or NCBI Batch Entrez, which would repeatedly hit  
the NCBI server thousands of times.  At least try looking for other  
ways to retrieve the data you want (ftp, organism-specific resources  
like Ensembl, and so on), or run any scripts or data retrieval in off  
hours so you don't overtax the NCBI server.

There is a way you can use BioPerl if you don't mind living on the  
bleeding edge by using bioperl-live (core code from CVS).  I have  
been working for the last year on a set of modules  
(Bio::DB::EUtilities) that interact with the various eutils through  
the NCBI CGI interface, for building data pipelines.  You could  
possibly retrieve all relevant ESTs using a variation of the example  
script here:

http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook#esearch-.3Eefetch

Note that the code examples do NOT work with rel. 1.5.2 code as the  
API has changed quite a bit; I'm working to rectify some of that.

The script I would use is below.  It retrieves batches of 500  
sequences (in fasta format) at a time, for a total of 10000 max seq  
records, saving the raw record data directly to a file (appending as  
you go along).  I added an eval block to check the server status and  
redo the call up to 4 times before giving up completely.  Using eval  
this way hasn't been extensively tested but should work.

---------------------------------------

use Bio::DB::EUtilities;

my $factory = Bio::DB::EUtilities->new(-eutil => 'esearch',
                                        -db => 'nucest',
                                        -term => 'txid3702',
                                        -usehistory => 'y',
                                        -keep_histories => 1);

my $count = $factory->get_count;

print "Count: $count\n";

if (my $hist = $factory->next_History) {
     print "History returned\n";
     # note db carries over from above
     $factory->set_parameters(-eutil => 'efetch',
                              -rettype => 'fasta',
                              -history => $hist);
     my ($retmax, $retstart) = (500,0);
     my $retry = 1;
     my $maxcount = $count < 10000 ? $count : 10000; # set max # seq records to return
     RETRIEVE_SEQS:
     while ($retstart < $maxcount) {
         print "Returning from ",$retstart+1," to ",$retstart+$retmax,"\n";
         $factory->set_parameters(-retmax => $retmax,
                                 -retstart => $retstart);
         # check in case of server error
         eval{
             $factory->get_Response(-file => ">>ESTs.fas");
         };
         if ($@) {
             die "Server error: $@.  Try again later" if $retry == 5;
             print STDERR "Server error, redo #$retry\n";
             $retry++ && redo RETRIEVE_SEQS;
         }
         $retstart += $retmax;
     }
}


---------------------------------------
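
The paging-and-retry logic of the script above can be tried out without
touching the NCBI server. The following is only a sketch under stated
assumptions: fetch_all() and its named arguments are hypothetical, and
the code reference passed as "fetch" stands in for the real efetch call.
Unlike the script above, the redo counter here is reset for every
window rather than shared across the whole run.

```perl
use strict;
use warnings;

# fetch_all() walks a result set in windows of $retmax records and
# retries a failed window a limited number of times.
# $fetch is a caller-supplied code ref (hypothetical; it stands in for
# the efetch call) that dies on failure.
sub fetch_all {
    my (%arg) = @_;
    my ($total, $retmax, $max_redo, $fetch)
        = @arg{qw(total retmax max_redo fetch)};

    my $retstart = 0;
    while ($retstart < $total) {
        my $redo = 0;
        while (1) {
            my $ok = eval { $fetch->($retstart, $retmax); 1 };
            last if $ok;
            die "Giving up at record $retstart: $@" if ++$redo > $max_redo;
            warn "Fetch failed, redo #$redo\n";
        }
        $retstart += $retmax;
    }
    return;
}

# Demonstration with a fake fetcher that fails once on the second window.
my (@windows, %failed);
fetch_all(
    total    => 1200,
    retmax   => 500,
    max_redo => 4,
    fetch    => sub {
        my ($start, $max) = @_;
        die "simulated server error\n" if $start == 500 && !$failed{$start}++;
        push @windows, $start;
    },
);
print "Fetched windows starting at: @windows\n";
```

Against the real server it would also be polite to sleep for a few
seconds between redos.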


chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From jason at bioperl.org  Mon Jul  9 14:08:07 2007
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 9 Jul 2007 11:08:07 -0700
Subject: [Bioperl-l] Unrooting a tree
In-Reply-To: <7317d50c0707060724g614ca3a2h8beceada040d3584@mail.gmail.com>
References: <7317d50c0707060724g614ca3a2h8beceada040d3584@mail.gmail.com>
Message-ID: <22B52D0B-D3C3-4EF8-98EC-8C2A9F267362@bioperl.org>

I don't think there is a function for this yet but it would be a good  
one to have.
I assume you don't really want to take a shot at writing it though?

To make this work I think you have to create a new node which  
contains the trifurcation and this node is what the root is set to.

-jason

On Jul 6, 2007, at 7:24 AM, Felix Schlesinger wrote:

> Hi,
>
> I am reading a rooted tree in newick format from a string (i.e. a
> bifurcation at the root) and would like to unroot it (i.e. a
> trifurcation at the root). I tried getting a grandchild of the root
> and adding it as a direct child, but that does not seem to work (the
> root still only has two descendents and the tree structure gets messed
> up). Is there a nice way to do this directly in bioperl? Doing it on
> the newick string is possible of course, but not nice.
>
> Thanks
>   Felix

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From lstein at cshl.edu  Mon Jul  9 17:35:49 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Mon, 9 Jul 2007 17:35:49 -0400
Subject: [Bioperl-l] JOB NOTICE: Looking for CSHL bioinformatics core manager
Message-ID: <6dce9a0b0707091435h3d134b05oa6f7da24839c24bb@mail.gmail.com>

Hi Folks,

Sorry for the job spam. We're looking for a manager of the Cold Spring
Harbor Laboratory bioinformatics core facility. This is a semi-independent
staff position supporting  CSHL scientific researchers by providing
consultation, data mining and software development activities. You will have
a software staff of two, a  nice salary, good health benefits, and an
exciting and dynamic environment to work in. I'm looking for someone with a
strong bioinformatics background, at least five years experience programming
Perl, Java or Python in a academic or commercial environment, and management
experience. If you are interested, please send your CV and cover letter to
me.

Thanks,

Lincoln

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu


From stewarta at nmrc.navy.mil  Mon Jul  9 18:16:12 2007
From: stewarta at nmrc.navy.mil (Andrew Stewart)
Date: Mon, 9 Jul 2007 18:16:12 -0400
Subject: [Bioperl-l] rpsblast
Message-ID: <9DF71DFB-F54E-4392-89E3-33345EC2DB36@nmrc.navy.mil>

When I run...   $result = $factory->rpsblast($seq);   ... where $seq  
is a Bio::Seq object, it seems to simply copy the $seq object to  
$result.  When I run something similar...  
$result = $factory->rpsblast('/path/to/myFile');  
... the value of $result then becomes '/path/to/myFile'.

Anyone else encounter this?


--
Andrew Stewart
Research Assistant, Genomics Team
Navy Medical Research Center (NMRC)
Biological Defense Research Directorate (BDRD)
BDRD Annex
12300 Washington Avenue, 2nd Floor
Rockville, MD 20852

email: stewarta at nmrc.navy.mil
phone: 301-231-6700 Ext 270


From jason_stajich at berkeley.edu  Mon Jul  9 21:36:10 2007
From: jason_stajich at berkeley.edu (Jason Stajich)
Date: Mon, 9 Jul 2007 18:36:10 -0700
Subject: [Bioperl-l] BOSC2007
Message-ID: <E6F5077E-50A3-489E-94B0-109FCAE6200F@berkeley.edu>

I posted a quick note about meeting up at BOSC/ISMB this year. If you  
are attending, please sign your name on the page or at least say  
whether you are interested in a BoF.  We'll try to discuss some of the  
current topics in BioPerl development, and use the time to coordinate  
any development that benefits from face-to-face time.

http://bioperl.org/wiki/BOSC2007_Meetup
http://bioperl.org/news/2007/07/09/are-you-going-to-ismbbosc-2007/

-jason
--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html
http://fungalgenomes.org/


From schlesi at ebi.ac.uk  Tue Jul 10 08:58:00 2007
From: schlesi at ebi.ac.uk (Felix Schlesinger)
Date: Tue, 10 Jul 2007 13:58:00 +0100
Subject: [Bioperl-l] Unrooting a tree
In-Reply-To: <22B52D0B-D3C3-4EF8-98EC-8C2A9F267362@bioperl.org>
References: <7317d50c0707060724g614ca3a2h8beceada040d3584@mail.gmail.com>
	<22B52D0B-D3C3-4EF8-98EC-8C2A9F267362@bioperl.org>
Message-ID: <7317d50c0707100558m76853bf8s37ee1e8852835306@mail.gmail.com>

Hi,

>  I don't think there is a function for this yet but it would be a good one
> to have.
> I assume you don't really want to take a shot at writing it though?
> To make this work I think you have to create a new node which contains the
> trifurcation and this node is what the root is set to.

Creating a new root is fine, but what would the (3) children of that
node be? I took a different approach now, where I iterate over all
(indirect) descendents of the root, find the first one which does not
have the root as its direct ancestor and move it up the tree, i.e.

foreach my $d ($root->get_all_Descendents) {
  if ($d->ancestor != $root) {
    $d->ancestor->remove_Descendent($d);
    if ($root->add_Descendent($d, 1) == 3) {
      last;
    }
  }
}

This will make the old root a trifurcation. It does the right thing
for what I am trying to do, but is not general, I believe (for example,
at the moment it does not worry about branch lengths). Also, instead of
taking the first, taking the most distant possible subtree of a clade
up to the root might be better.
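
For comparison, here is a sketch of one way to take the branch lengths
into account, on a toy tree of plain hashes rather than Bio::Tree
objects. The node layout and the unroot() and to_newick() helpers are
hypothetical, written only for this illustration. One internal child of
the root is dissolved and its branch length is added to its sibling, so
that leaf-to-leaf path lengths stay the same.

```perl
use strict;
use warnings;

# Nodes are plain hashes (hypothetical layout, not Bio::Tree::Node):
#   { name => 'A', length => 0.1, children => [ ... ] }

# Turn a bifurcating root into a trifurcation by dissolving one internal
# child of the root. That child's branch length is added to its sibling's
# branch so that path lengths between leaves are preserved.
sub unroot {
    my ($root) = @_;
    my $kids = $root->{children};
    return $root unless @$kids == 2;

    # Pick the first child that is itself an internal node.
    my ($i) = grep { @{ $kids->[$_]{children} || [] } } 0 .. 1;
    return $root unless defined $i;      # two leaves: nothing to collapse

    my $gone    = $kids->[$i];
    my $sibling = $kids->[ 1 - $i ];
    $sibling->{length} = ($sibling->{length} || 0) + ($gone->{length} || 0);
    $root->{children} = [ @{ $gone->{children} }, $sibling ];
    return $root;
}

# Serialise a node and its subtree as a Newick fragment (no trailing ';').
sub to_newick {
    my ($n) = @_;
    my @kids = @{ $n->{children} || [] };
    my $s = @kids ? '(' . join(',', map { to_newick($_) } @kids) . ')' : '';
    $s .= $n->{name} if defined $n->{name};
    $s .= ':' . $n->{length} if defined $n->{length};
    return $s;
}

# ((A:1,B:2):3,C:4); becomes (A:1,B:2,C:7);
my $tree = {
    children => [
        { length => 3, children => [
            { name => 'A', length => 1 },
            { name => 'B', length => 2 },
        ] },
        { name => 'C', length => 4 },
    ],
};
print to_newick(unroot($tree)), ";\n";
```

The root only ends up with exactly three children if the dissolved
child was itself a bifurcation.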

Felix



From xing.y.hu at gmail.com  Tue Jul 10 09:29:36 2007
From: xing.y.hu at gmail.com (Xing Hu)
Date: Tue, 10 Jul 2007 21:29:36 +0800
Subject: [Bioperl-l] How to download EST files via bioperl script?
In-Reply-To: <7CC3390F-A5F5-4832-9E3E-E2B96018D31D@uiuc.edu>
References: <4691D520.60700@gmail.com> <4692292A.1080900@atgc.org>
	<7CC3390F-A5F5-4832-9E3E-E2B96018D31D@uiuc.edu>
Message-ID: <469389C0.5060303@gmail.com>

Thank you, guys.

I have to confess how stupid I was. The easiest way seems to be the
NCBI Taxonomy Browser, as suggested by Alex. As a matter of fact, I
knew about it, but I thought it was necessary to have all items
selected before pressing save to launch the download. So I was desperate
to find a button that could do that without hundreds of thousands of
clicks from me. "What about selecting none of the items at all?" -- This
idea finally came to me after days of struggling, and the problem was solved.

Xing


Chris Fields wrote:
> Caveat: if you have millions of ESTs please consider NOT using my 
> eutil script below or NCBI Batch Entrez, which would repeatedly hit 
> the NCBI server thousands of times.  At least try looking for other 
> ways to retrieve the data you want (ftp, organism-specific resources 
> like Ensembl, so on), or run any scripts or data retrieval in off 
> hours so you don't overtax the NCBI server.
>
> There is a way you can use BioPerl if you don't mind living on the 
> bleeding edge by using bioperl-live (core code from CVS).  I have been 
> working on a set of modules for the last year (Bio::DB::EUtilities) 
> which interact with all the various eutils for building data pipelines 
> which uses the NCBI CGI interface.  You could possibly retrieve all 
> relevant ESTs using a variation of the example script here:
>
> http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook#esearch-.3Eefetch
>
> Note that the code examples do NOT work with rel. 1.5.2 code as the 
> API has changed quite a bit; I'm working to rectify some of that.
>
> The script I would use is below.  It retrieves batches of 500 
> sequences (in fasta format) at a time, for a total of 10000 max seq 
> records, saving the raw record data directly to a file (appending as 
> you go along).  I added an eval block to check the server status and 
> redo the call up to 4 times before giving up completely.  Using eval 
> this way hasn't been extensively tested but should work.
>
> ---------------------------------------
>
> use Bio::DB::EUtilities;
>
> my $factory = Bio::DB::EUtilities->new(-eutil => 'esearch',
>                                        -db => 'nucest',
>                                        -term => 'txid3702',
>                                        -usehistory => 'y',
>                                        -keep_histories => 1);
>
> my $count = $factory->get_count;
>
> print "Count: $count\n";
>
> if (my $hist = $factory->next_History) {
>     print "History returned\n";
>     # note db carries over from above
>     $factory->set_parameters(-eutil => 'efetch',
>                              -rettype => 'fasta',
>                              -history => $hist);
>     my ($retmax, $retstart) = (500,0);
>     my $retry = 1;
>     my $maxcount = $count < 10000 ? $count : 10000; # set max # seq 
> records to return
>     RETRIEVE_SEQS:
>     while ($retstart < $maxcount) {
>         print "Returning from ",$retstart+1," to 
> ",$retstart+$retmax,"\n";
>         $factory->set_parameters(-retmax => $retmax,
>                                 -retstart => $retstart);
>         # check in case of server error
>         eval{
>             $factory->get_Response(-file => ">>ESTs.fas");
>         };
>         if ($@) {
>             die "Server error: $@.  Try again later" if $retry == 5;
>             print STDERR "Server error, redo #$retry\n";
>             $retry++ && redo RETRIEVE_SEQS;
>         }
>         $retstart += $retmax;
>     }
> }
>
>
> ---------------------------------------
>
>
> chris
>
> On Jul 9, 2007, at 7:25 AM, Alexander Kozik wrote:
>
>> To download genomic sequences or ESTs for any organism (in various
>> formats) you can use NCBI Taxonomy Browser:
>> http://www.ncbi.nlm.nih.gov/Taxonomy/
>>
>> you can use taxonomy id to access different organisms, Arabidopsis for
>> example (3702):
>> http://www.ncbi.nlm.nih.gov/sites/entrez?db=Nucleotide&cmd=Search&dopt=DocSum&term=txid3702 
>>
>>
>> or by direct web link:
>> http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Arabidopsis+thaliana&lvl=0&srchmode=1 
>>
>>
>> assembled genomes can be accessed via ftp:
>> ftp://ftp.ncbi.nih.gov/genomes/
>>
>> To download large amount of selected sequences (ESTs for example) you
>> can use batch Entrez:
>> http://www.ncbi.nlm.nih.gov/entrez/query/static/advancedentrez.html
>> http://www.ncbi.nlm.nih.gov/entrez/batchentrez.cgi?db=Nucleotide
>> (select EST for EST, it's critical)
>>
>> It seems, to solve the problem you describe, you don't need to use
>> bioperl. NCBI GenBank Entrez provides all necessary tools to work on
>> these simple and frequent tasks.
>>
>> -Alex
>>
>> --Alexander Kozik
>> Bioinformatics Specialist
>> Genome and Biomedical Sciences Facility
>> 451 East Health Sciences Drive
>> University of California
>> Davis, CA 95616-8816
>> Phone: (530) 754-9127
>> email#1: akozik at atgc.org
>> email#2: akozik at gmail.com
>> web: http://www.atgc.org/
>>
>>
>>
>> Xing Hu wrote:
>>> Hi friends,
>>>
>>>     I wrote a script for getting genomic sequence file from GenBank. To
>>> fulfill that target, I used DB::GenBank module to get the sequence via
>>> get_Seq_by_acc, and it works well. But this time, facing enormous amount
>>> of ESTs, I have no idea how to download them swiftly and elegantly.
>>>
>>>     PROBLEM DESCRIPTION:
>>>     goal: download all EST files of a specific species from GenBank, say
>>> Arabidopsis Thaliana or Oryza sativa(rice).
>>>     other: whether all of ESTs are in a single file or separatedly
>>> placed does not matter.
>>>
>>>     Can I use a bioperl script to achieve that? And How? I really
>>> appreciate.
>>>
>>> Xing.
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
>


From davila at ioc.fiocruz.br  Tue Jul 10 09:58:29 2007
From: davila at ioc.fiocruz.br (Alberto Davila)
Date: Tue, 10 Jul 2007 10:58:29 -0300
Subject: [Bioperl-l] How to download EST files via bioperl script?
In-Reply-To: <469389C0.5060303@gmail.com>
References: <4691D520.60700@gmail.com>
	<4692292A.1080900@atgc.org>	<7CC3390F-A5F5-4832-9E3E-E2B96018D31D@uiuc.edu>
	<469389C0.5060303@gmail.com>
Message-ID: <46939085.40906@ioc.fiocruz.br>

Hi Xing,

Unfortunately that did not work for me. There are 5133 T. brucei ESTs
(http://www.ncbi.nlm.nih.gov/sites/entrez?term=txid5691[Organism:exp]&cmd=Search&db=nucest&QueryKey=8)
and 13971 T. cruzi ESTs
(http://www.ncbi.nlm.nih.gov/sites/entrez?term=txid5693[Organism:exp]&cmd=Search&db=nucest&QueryKey=11)
that I cannot download all at once in GenBank format. Even when I select
"GenBank" format in the Display menu, I can only view and download 500
ESTs at a time.

I also downloaded all ESTs from GenBank (a pity there are no subsets of
them!), but merging them all generates a file bigger than 120 GB to be
processed.

I just asked Diogo (my student) to try the script sent by Chris
Fields, so fingers crossed ;-)

Cheers, Alberto


Xing Hu wrote:
> Thanks you guys.
> 
> I had to confess that how stupid I was. The easiest way seems to be the 
> way using NCBI Taxonomy Browser which suggested by alex. As a matter of 
> fact, I knew that but I thought it was necessary to have all items 
> selected before pressing save to launch download. So I was desperate to 
> find a button that could achieve that without hundreds of thousands of 
> clicking by me. "What about select none of those items at all?" -- This 
> idea finally came to me after days of struggling and the problem was solved.
> 
> Xing
> 
> 
> 


From cjfields at uiuc.edu  Tue Jul 10 10:05:43 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 10 Jul 2007 09:05:43 -0500
Subject: [Bioperl-l] How to download EST files via bioperl script?
In-Reply-To: <46939085.40906@ioc.fiocruz.br>
References: <4691D520.60700@gmail.com>
	<4692292A.1080900@atgc.org>	<7CC3390F-A5F5-4832-9E3E-E2B96018D31D@uiuc.edu>
	<469389C0.5060303@gmail.com> <46939085.40906@ioc.fiocruz.br>
Message-ID: <2D7B714D-EE7C-467A-A0D9-0F9EB12CBA99@uiuc.edu>

Just make sure you're using the latest from CVS.  Let me know if it  
doesn't work and I'll look into it.

chris

On Jul 10, 2007, at 8:58 AM, Alberto Davila wrote:

> Hi Xing,
>
> Unfortunately that did not work for me... there are 5133 T. brucei ESTs
> (http://www.ncbi.nlm.nih.gov/sites/entrez?term=txid5691[Organism:exp]&cmd=Search&db=nucest&QueryKey=8)
> and 13971 from T. cruzi
> (http://www.ncbi.nlm.nih.gov/sites/entrez?term=txid5693[Organism:exp]&cmd=Search&db=nucest&QueryKey=11)
>   that I cannot download at once in GenBank format... even when I select
> "GenBank" format in the Display menu I can only see and get/download 500
> ESTs each time...
>
> I also downloaded all ESTs from GenBank (a pity there are not subsets of
> them !) but merging all them generate a file bigger than 120GB to be
> processed...
>
> Just asked Diogo (my student) to give a try to the script sent by Chris
> Fields.. so finger crossed ;-)
>
> Cheers, Alberto
>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From diogoat at gmail.com  Tue Jul 10 10:15:20 2007
From: diogoat at gmail.com (Diogo Tschoeke)
Date: Tue, 10 Jul 2007 11:15:20 -0300
Subject: [Bioperl-l] How to download EST files via bioperl script?
In-Reply-To: <2D7B714D-EE7C-467A-A0D9-0F9EB12CBA99@uiuc.edu>
References: <4691D520.60700@gmail.com> <4692292A.1080900@atgc.org>
	<7CC3390F-A5F5-4832-9E3E-E2B96018D31D@uiuc.edu>
	<469389C0.5060303@gmail.com> <46939085.40906@ioc.fiocruz.br>
	<2D7B714D-EE7C-467A-A0D9-0F9EB12CBA99@uiuc.edu>
Message-ID: <638512560707100715y4a692566n24e438322c8d919d@mail.gmail.com>

Dear All,
I used the script below, and it works very well!
I only changed the query, and the script gave me the 5133 ESTs from T.
brucei.

#################################################################################
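use strict;
use warnings;
use Bio::Seq;
use Bio::SeqIO;
use Bio::DB::GenBank;
use Bio::DB::Query::GenBank;   # loaded explicitly for the query object

# Entrez query: EST-division records for Trypanosoma brucei.
# The query must stay on a single line inside the quotes.
my $query = Bio::DB::Query::GenBank->new(
    -query => 'gbdiv est[prop] AND Trypanosoma brucei [organism]',
    -db    => 'nucleotide',
);
my $gb    = Bio::DB::GenBank->new;
my $seqio = $gb->get_Stream_by_query($query);

# Records are written in GenBank format (despite the .fasta extension)
# and appended to the file if it already exists.
my $out = Bio::SeqIO->new(-format => 'Genbank',
                          -file   => '>>Tbrucei.EST.fasta');
while (my $seq = $seqio->next_seq) {
    $out->write_seq($seq);
}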
####################################################################

Diogo Tschoeke/Fiocruz (Alberto's student)


From cjfields at uiuc.edu  Tue Jul 10 10:35:03 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 10 Jul 2007 09:35:03 -0500
Subject: [Bioperl-l] How to download EST files via bioperl script?
In-Reply-To: <638512560707100715y4a692566n24e438322c8d919d@mail.gmail.com>
References: <4691D520.60700@gmail.com> <4692292A.1080900@atgc.org>
	<7CC3390F-A5F5-4832-9E3E-E2B96018D31D@uiuc.edu>
	<469389C0.5060303@gmail.com> <46939085.40906@ioc.fiocruz.br>
	<2D7B714D-EE7C-467A-A0D9-0F9EB12CBA99@uiuc.edu>
	<638512560707100715y4a692566n24e438322c8d919d@mail.gmail.com>
Message-ID: <4D704A90-A88A-44A3-B514-E5031CBF288C@uiuc.edu>

That will work as well. The key difference between my example and
this one is that the seq stream retrieved using Bio::DB::GenBank
passes through Bio::SeqIO, while Bio::DB::EUtilities saves the raw seq
record directly to a file (or to a callback or an HTTP::Response) for
optional parsing later.

If you have problems with Bio::SeqIO you can always use  
Bio::DB::EUtilities to get around the issue until we resolve it.

chris

On Jul 10, 2007, at 9:15 AM, Diogo Tschoeke wrote:

> Deal All,
> I use this script bellow, and it`s work very fine!
> I only changed the query! And the script gave me the 5133 EST from T.
> brucei.
>
> #################################################################################
> use strict;
> use warnings;
> use Bio::Seq;
> use Bio::SeqIO;
> use Bio::DB::GenBank;
>
> my $query = Bio::DB::Query::GenBank->new
>                                 (-query   =>'gbdiv est[prop] AND Trypanosoma brucei [organism]',
>                                 db      => 'nucleotide');
> my $gb = new Bio::DB::GenBank;
> my $seqio = $gb->get_Stream_by_query($query);
>
> my $out = Bio::SeqIO->new(-format => 'Genbank',
>                           -file => '>>Tbrucei.EST.fasta');
> while (my $seq = $seqio->next_seq){
>          $out->write_seq($seq);
>                         }
> ####################################################################
>
> Diogo Tschoeke/Fiocruz (Alberto`s Student)

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From hartzell at alerce.com  Tue Jul 10 12:50:31 2007
From: hartzell at alerce.com (George Hartzell)
Date: Tue, 10 Jul 2007 12:50:31 -0400
Subject: [Bioperl-l] Take 2 of the new subversion repository.
In-Reply-To: <D8C71EF7-6E2E-498E-8638-373512ADE3EE@bioperl.org>
References: <18054.63942.316904.413911@almost.alerce.com>
	<D8C71EF7-6E2E-498E-8638-373512ADE3EE@bioperl.org>
Message-ID: <18067.47319.254632.538811@almost.alerce.com>

Jason Stajich writes:
 > [...]
 > Do you know how to have svn commit messages generate summary emails  
 > as well?

I've made a local installation of the SVN::Notify bits in my home
directory and set up its notification script.  If folks are happy with
it then I'll work on getting The Powers That Be to do a real install
and we'll use it for the real repository.

It's currently configured to include diffs inline in the message.  I
prefer them as an attachment, but the current configuration of the
bioperl-guts-l list stalls messages w/ attachments and requires admin
intervention.  I have a support@ request going on it and will change
it if/when we get the issue resolved.
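
For anyone curious what the two setups look like, here is a minimal
sketch of a post-commit hook written against SVN::Notify. It is not the
script installed on dev.open-bio.org: the recipient address and the
helper name notify_params are made up for illustration, and it assumes
SVN::Notify's documented with_diff (inline) and attach_diff (attachment)
options.

```perl
use strict;
use warnings;

# Build SVN::Notify constructor parameters for one commit.
# $attach = 0 puts the diff inline in the message body;
# $attach = 1 sends it as an attachment instead.
sub notify_params {
    my ($repos_path, $revision, $attach) = @_;
    my %params = (
        repos_path => $repos_path,
        revision   => $revision,
        to         => 'commits@example.org',   # placeholder address
    );
    if ($attach) { $params{attach_diff} = 1 }
    else         { $params{with_diff}   = 1 }
    return %params;
}

# Subversion runs a post-commit hook with two arguments:
# the repository path and the revision number.
if (@ARGV == 2) {
    require SVN::Notify;    # loaded only when run as a hook
    my $notifier = SVN::Notify->new(notify_params(@ARGV, 0));
    $notifier->prepare;
    $notifier->execute;
}
```

Switching from inline diffs to attachments would then only mean passing
1 instead of 0 as the last argument.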

So, to review:

   svn+ssh://dev.open-bio.org/home/hartzell/bioperl_take2/

is the top of the repository and

   svn co svn+ssh://dev.open-bio.org/home/hartzell/bioperl_take2/bioperl-live/trunk 

will get you the main branch of bioperl-live.

Remember that the repository is transient, don't put anything
important in there....

Have at it, but remember that the entire world will see your commit
messages.

g.


From xing.y.hu at gmail.com  Tue Jul 10 13:08:35 2007
From: xing.y.hu at gmail.com (Xing Hu)
Date: Wed, 11 Jul 2007 01:08:35 +0800
Subject: [Bioperl-l] How to download EST files via bioperl script?
In-Reply-To: <46939085.40906@ioc.fiocruz.br>
References: <4691D520.60700@gmail.com>	<4692292A.1080900@atgc.org>	<7CC3390F-A5F5-4832-9E3E-E2B96018D31D@uiuc.edu>	<469389C0.5060303@gmail.com>
	<46939085.40906@ioc.fiocruz.br>
Message-ID: <4693BD13.2070509@gmail.com>

Hi Alberto,

Yes, I know that the NCBI website offers to show no more than 500
entries per page. However, I simply ignored that limit (I did see it),
pulled down the "Send to" menu and chose "File". A small window popped
up, and after I said yes to it the download started. You might ask how
I know that it was not a batch of only 5 (the default selection) or 500
ESTs. To be honest, I did not know at first. But the download has grown
to millions of bytes since then (because of my poor network connection
I have no idea when it will finish), and that does not look like a
small batch of fewer than one thousand ESTs. In fact, I wrote a script
to count the sequences in the temporary file and got a number much
bigger than ten thousand. So I guess it works.
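
A counting script of this kind can be very small. The sketch below is
an illustration rather than the script actually used; the sub name
count_records and the example file name are hypothetical. It counts
FASTA header lines or GenBank record terminators, so for a partial
GenBank download it reports only the records that are already complete.

```perl
use strict;
use warnings;

# Count sequence records read from an open filehandle.
# FASTA records start with a ">" header line; GenBank flat-file
# records end with a line containing only "//".
sub count_records {
    my ($fh, $format) = @_;
    my $pattern = $format eq 'fasta' ? qr/^>/ : qr{^//\s*$};
    my $count = 0;
    while (my $line = <$fh>) {
        $count++ if $line =~ $pattern;
    }
    return $count;
}

# Usage with a hypothetical file name:
#   perl count_records.pl partial_download.gb genbank
if (@ARGV) {
    my ($file, $format) = @ARGV;
    $format = 'genbank' unless defined $format;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    print count_records($fh, $format), " records in $file\n";
    close $fh;
}
```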

BTW, I never thought Bio::DB::GenBank could do that! Again, thank you guys!

Xing


Alberto Davila wrote:
> Hi Xing,
>
> Unfortunately that did not work for me... there are 5133 T. brucei ESTs 
> (http://www.ncbi.nlm.nih.gov/sites/entrez?term=txid5691[Organism:exp]&cmd=Search&db=nucest&QueryKey=8) 
> and 13971 from T. cruzi 
> (http://www.ncbi.nlm.nih.gov/sites/entrez?term=txid5693[Organism:exp]&cmd=Search&db=nucest&QueryKey=11) 
>   that I cannot download at once in GenBank format... even when I select 
> "GenBank" format in the Display menu I can only see and get/download 500 
> ESTs each time...
>
> I also downloaded all ESTs from GenBank (a pity there are not subsets of 
> them !) but merging all them generate a file bigger than 120GB to be 
> processed...
>
> Just asked Diogo (my student) to give a try to the script sent by Chris 
> Fields.. so finger crossed ;-)
>
> Cheers, Alberto
>
>
> Xing Hu wrote:
>   
>> Thanks you guys.
>>
>> I had to confess that how stupid I was. The easiest way seems to be the 
>> way using NCBI Taxonomy Browser which suggested by alex. As a matter of 
>> fact, I knew that but I thought it was necessary to have all items 
>> selected before pressing save to launch download. So I was desperate to 
>> find a button that could achieve that without hundreds of thousands of 
>> clicking by me. "What about select none of those items at all?" -- This 
>> idea finally came to me after days of struggling and the problem was solved.
>>
>> Xing
>>
>>
>>
>> Chris Fields wrote:
>>     
>>> Caveat: if you have millions of ESTs please consider NOT using my 
>>> eutil script below or NCBI Batch Entrez, which would repeatedly hit 
>>> the NCBI server thousands of times.  At least try looking for other 
>>> ways to retrieve the data you want (ftp, organism-specific resources 
>>> like Ensembl, so on), or run any scripts or data retrieval in off 
>>> hours so you don't overtax the NCBI server.
>>>
>>> There is a way you can use BioPerl if you don't mind living on the 
>>> bleeding edge by using bioperl-live (core code from CVS).  I have been 
>>> working on a set of modules for the last year (Bio::DB::EUtilities) 
>>> which interact with all the various eutils for building data pipelines 
>>> which uses the NCBI CGI interface.  You could possibly retrieve all 
>>> relevant ESTs using a variation of the example script here:
>>>
>>> http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook#esearch-.3Eefetch
>>>
>>> Note that the code examples do NOT work with rel. 1.5.2 code as the 
>>> API has changed quite a bit; I'm working to rectify some of that.
>>>
>>> The script I would use is below.  It retrieves batches of 500 
>>> sequences (in fasta format) at a time, for a total of 10000 max seq 
>>> records, saving the raw record data directly to a file (appending as 
>>> you go along).  I added an eval block to check the server status and 
>>> redo the call up to 4 times before giving up completely.  Using eval 
>>> this way hasn't been extensively tested but should work.
>>>
>>> ---------------------------------------
>>>
>>> use Bio::DB::EUtilities;
>>>
>>> my $factory = Bio::DB::EUtilities->new(-eutil => 'esearch',
>>>                                        -db => 'nucest',
>>>                                        -term => 'txid3702',
>>>                                        -usehistory => 'y',
>>>                                        -keep_histories => 1);
>>>
>>> my $count = $factory->get_count;
>>>
>>> print "Count: $count\n";
>>>
>>> if (my $hist = $factory->next_History) {
>>>     print "History returned\n";
>>>     # note db carries over from above
>>>     $factory->set_parameters(-eutil => 'efetch',
>>>                              -rettype => 'fasta',
>>>                              -history => $hist);
>>>     my ($retmax, $retstart) = (500,0);
>>>     my $retry = 1;
>>>     my $maxcount = $count < 10000 ? $count : 10000; # set max # seq 
>>> records to return
>>>     RETRIEVE_SEQS:
>>>     while ($retstart < $maxcount) {
>>>         print "Returning from ",$retstart+1," to 
>>> ",$retstart+$retmax,"\n";
>>>         $factory->set_parameters(-retmax => $retmax,
>>>                                 -retstart => $retstart);
>>>         # check in case of server error
>>>         eval{
>>>             $factory->get_Response(-file => ">>ESTs.fas");
>>>         };
>>>         if ($@) {
>>>             die "Server error: $@.  Try again later" if $retry == 5;
>>>             print STDERR "Server error, redo #$retry\n";
>>>             $retry++ && redo RETRIEVE_SEQS;
>>>         }
>>>         $retstart += $retmax;
>>>     }
>>> }
>>>
>>>
>>> ---------------------------------------
>>>
>>>
>>> chris
>>>
>>> On Jul 9, 2007, at 7:25 AM, Alexander Kozik wrote:
>>>
>>>       
>>>> To download genomic sequences or ESTs for any organism (in various
>>>> formats) you can use NCBI Taxonomy Browser:
>>>> http://www.ncbi.nlm.nih.gov/Taxonomy/
>>>>
>>>> you can use taxonomy id to access different organisms, Arabidopsis for
>>>> example (3702):
>>>> http://www.ncbi.nlm.nih.gov/sites/entrez?db=Nucleotide&cmd=Search&dopt=DocSum&term=txid3702 
>>>>
>>>>
>>>> or by direct web link:
>>>> http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Arabidopsis+thaliana&lvl=0&srchmode=1 
>>>>
>>>>
>>>> assembled genomes can be accessed via ftp:
>>>> ftp://ftp.ncbi.nih.gov/genomes/
>>>>
>>>> To download large amount of selected sequences (ESTs for example) you
>>>> can use batch Entrez:
>>>> http://www.ncbi.nlm.nih.gov/entrez/query/static/advancedentrez.html
>>>> http://www.ncbi.nlm.nih.gov/entrez/batchentrez.cgi?db=Nucleotide
>>>> (select EST for EST, it's critical)
>>>>
>>>> It seems, to solve the problem you describe, you don't need to use
>>>> bioperl. NCBI GenBank Entrez provides all necessary tools to work on
>>>> these simple and frequent tasks.
>>>>
>>>> -Alex
>>>>
>>>> --Alexander Kozik
>>>> Bioinformatics Specialist
>>>> Genome and Biomedical Sciences Facility
>>>> 451 East Health Sciences Drive
>>>> University of California
>>>> Davis, CA 95616-8816
>>>> Phone: (530) 754-9127
>>>> email#1: akozik at atgc.org
>>>> email#2: akozik at gmail.com
>>>> web: http://www.atgc.org/
>>>>
>>>>
>>>>
>>>> Xing Hu wrote:
>>>>         
>>>>> Hi friends,
>>>>>
>>>>>     I wrote a script for getting genomic sequence file from GenBank. To
>>>>> fulfill that target, I used DB::GenBank module to get the sequence via
>>>>> get_Seq_by_acc, and it works well. But this time, facing enormous 
>>>>> amount
>>>>> of ESTs, I have no idea how to download them swiftly and elegantly.
>>>>>
>>>>>     PROBLEM DESCRIPTION:
>>>>>     goal: download all EST files of a specific species from GenBank, 
>>>>> say
>>>>> Arabidopsis Thaliana or Oryza sativa(rice).
>>>>>     other: whether all of ESTs are in a single file or separatedly
>>>>> placed does not matter.
>>>>>
>>>>>     Can I use a bioperl script to achieve that? And How? I really
>>>>> appreciate.
>>>>>
>>>>> Xing.
>>>>>
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>           
>>>>         
>>> Christopher Fields
>>> Postdoctoral Researcher
>>> Lab of Dr. Robert Switzer
>>> Dept of Biochemistry
>>> University of Illinois Urbana-Champaign
>>>
>>>
>>>       
>
>   


From bix at sendu.me.uk  Tue Jul 10 13:14:29 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 10 Jul 2007 18:14:29 +0100
Subject: [Bioperl-l] Take 2 of the new subversion repository.
In-Reply-To: <18067.47319.254632.538811@almost.alerce.com>
References: <18054.63942.316904.413911@almost.alerce.com>	<D8C71EF7-6E2E-498E-8638-373512ADE3EE@bioperl.org>
	<18067.47319.254632.538811@almost.alerce.com>
Message-ID: <4693BE75.4090005@sendu.me.uk>

George Hartzell wrote:
> Jason Stajich writes:
>  > [...]
>  > Do you know how to have svn commit messages generate summary emails  
>  > as well?
> 
> I've made a local installation of the SVN::Notify bits in my home
> directory and set up its notification script.  If folks are happy with
> it then I'll work on getting The Powers That Be to do a real install
> and we'll use it for the real repository.
> 
> It's currently configured to include diffs inline in the message.  I
> prefer them as an attachment, but the current configuration of the
> bioperl-guts-l list stalls messages w/ attachments and requires admin
> intervention.  I have a support@ request going on it and will change
> it if/when we get the issue resolved.

Can I put a vote in that you don't? I search through email body text in 
my archive of guts to find certain diffs, so really like the diffs inline.

Also, is there any way to get rid of the 'bioperl' in [bioperl revision] 
in the subject? Seems redundant and makes it harder to see what was 
changed in a small email client window.


From aaron.j.mackey at gsk.com  Tue Jul 10 13:20:15 2007
From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com)
Date: Tue, 10 Jul 2007 13:20:15 -0400
Subject: [Bioperl-l] Take 2 of the new subversion repository.
In-Reply-To: <18067.47319.254632.538811@almost.alerce.com>
Message-ID: <OF37443F52.13AE1143-ON85257314.005D5FF0-85257314.005F432E@gsk.com>

George, this is all very nice to finally have, thank you for your efforts!

Any chance that the diff-as-attachment vs. diffs-inline question can be 
different for each subscriber?  The utility of the "guts" mailing list (to 
me) is that it's an encyclopedia of browsable, skimmable, and searchable 
diffs, not just a date-stamped record of diffs (if so, why provide an 
attachment at all, just provide a URL to the diff in the repository).

Thanks again,

-Aaron


bioperl-l-bounces at lists.open-bio.org wrote on 07/10/2007 12:50:31 PM:

> Jason Stajich writes:
>  > [...]
>  > Do you know how to have svn commit messages generate summary emails 
>  > as well?
> 
> I've made a local installation of the SVN::Notify bits in my home
> directory and set up its notification script.  If folks are happy with
> it then I'll work on getting The Powers That Be to do a real install
> and we'll use it for the real repository.
> 
> It's currently configured to include diffs inline in the message.  I
> prefer them as an attachment, but the current configuration of the
> bioperl-guts-l list stalls messages w/ attachments and requires admin
> intervention.  I have a support@ request going on it and will change
> it if/when we get the issue resolved.
> 
> So, to review:
> 
>    svn+ssh://dev.open-bio.org/home/hartzell/bioperl_take2/
> 
> is the top of the repository and
> 
>    svn co svn+ssh://dev.open-bio.org/home/hartzell/bioperl_take2/bioperl-live/trunk
> 
> will get you the main branch of bioperl-live.
> 
> Remember that the repository is transient, don't put anything
> important in there....
> 
> Have at it, but remember that the entire world will see your commit
> messages.
> 
> g.
> 


From cjfields at uiuc.edu  Tue Jul 10 14:18:07 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 10 Jul 2007 13:18:07 -0500
Subject: [Bioperl-l] Take 2 of the new subversion repository.
In-Reply-To: <4693BE75.4090005@sendu.me.uk>
References: <18054.63942.316904.413911@almost.alerce.com>	<D8C71EF7-6E2E-498E-8638-373512ADE3EE@bioperl.org>
	<18067.47319.254632.538811@almost.alerce.com>
	<4693BE75.4090005@sendu.me.uk>
Message-ID: <C022268F-4632-4027-ADBE-F842B3308E21@uiuc.edu>


On Jul 10, 2007, at 12:14 PM, Sendu Bala wrote:

> George Hartzell wrote:
>> Jason Stajich writes:
>>> [...]
>>> Do you know how to have svn commit messages generate summary emails
>>> as well?
>>
>> I've made a local installation of the SVN::Notify bits in my home
>> directory and set up its notification script.  If folks are happy  
>> with
>> it then I'll work on getting The Powers That Be to do a real install
>> and we'll use it for the real repository.
>>
>> It's currently configured to include diffs inline in the message.  I
>> prefer them as an attachment, but the current configuration of the
>> bioperl-guts-l list stalls messages w/ attachments and requires admin
>> intervention.  I have a support@ request going on it and will change
>> it if/when we get the issue resolved.
>
> Can I put a vote in that you don't? I search through email body  
> text in
> my archive of guts to find certain diffs, so really like the diffs  
> inline.
>
> Also, is there any way to get rid of the 'bioperl' in [bioperl  
> revision]
> in the subject? Seems redundant and makes it harder to see what was
> changed in a small email client window.

Agree on both counts; the devs have gotten used to seeing the diffs  
inline.

We prob. need to schedule a specific day/time when the switchover  
would take place so we can announce (so everyone knows and no one can  
gripe).  Did we ever resolve the svn->cvs issue?  Jason pointed out  
some tools a while ago...

chris


From hartzell at alerce.com  Tue Jul 10 16:09:09 2007
From: hartzell at alerce.com (George Hartzell)
Date: Tue, 10 Jul 2007 16:09:09 -0400
Subject: [Bioperl-l] Take 2 of the new subversion repository.
In-Reply-To: <4693BE75.4090005@sendu.me.uk>
References: <18054.63942.316904.413911@almost.alerce.com>
	<D8C71EF7-6E2E-498E-8638-373512ADE3EE@bioperl.org>
	<18067.47319.254632.538811@almost.alerce.com>
	<4693BE75.4090005@sendu.me.uk>
Message-ID: <18067.59237.519166.454578@almost.alerce.com>

Sendu Bala writes:
 > George Hartzell wrote:
 > > Jason Stajich writes:
 > >  > [...]
 > >  > Do you know how to have svn commit messages generate summary emails  
 > >  > as well?
 > > 
 > > I've made a local installation of the SVN::Notify bits in my home
 > > directory and set up its notification script.  If folks are happy with
 > > it then I'll work on getting The Powers That Be to do a real install
 > > and we'll use it for the real repository.
 > > 
 > > It's currently configured to include diffs inline in the message.  I
 > > prefer them as an attachment, but the current configuration of the
 > > bioperl-guts-l list stalls messages w/ attachments and requires admin
 > > intervention.  I have a support@ request going on it and will change
 > > it if/when we get the issue resolved.
 > 
 > Can I put a vote in that you don't? I search through email body text in 
 > my archive of guts to find certain diffs, so really like the diffs inline.

Ok, three votes against attachments.  Does anyone want to vote in
support?  Otherwise I'll just leave 'em inline.

 > Also, is there any way to get rid of the 'bioperl' in [bioperl revision] 
 > in the subject? Seems redundant and makes it harder to see what was 
 > changed in a small email client window.

Sure.  The default's just [RevisionNumber].  Does that work for folk?
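
For reference, a minimal sketch of what a post-commit notifier with
inline diffs and the default subject could look like.  It assumes the
SVN::Notify Perl API (new/prepare/execute with the repos_path,
revision, to and with_diff options); the recipient address and the
notify_options helper are placeholders invented for illustration, not
the actual setup on dev.open-bio.org.

```perl
#!/usr/bin/perl
# Sketch of a post-commit notifier.  Subversion runs post-commit hooks
# with the repository path and the new revision number as arguments.
use strict;
use warnings;

# Build the SVN::Notify options: diffs inline (with_diff) rather than
# attached (attach_diff), and no subject_prefix, so the subject keeps
# its default form.
sub notify_options {
    my ($repos_path, $revision) = @_;
    return (
        repos_path => $repos_path,
        revision   => $revision,
        to         => 'commits@example.org',   # hypothetical list address
        with_diff  => 1,                       # diff goes in the message body
    );
}

# Only send mail when invoked the way a hook would be: REPOS-PATH REVISION.
if (@ARGV == 2) {
    require SVN::Notify;    # loaded lazily; needs SVN::Notify installed
    my $notifier = SVN::Notify->new( notify_options(@ARGV) );
    $notifier->prepare;
    $notifier->execute;
}
```

Leaving out attach_diff keeps the diff in the message body, which is
what makes it searchable in a plain mail archive.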

g.


From hartzell at alerce.com  Tue Jul 10 16:11:36 2007
From: hartzell at alerce.com (George Hartzell)
Date: Tue, 10 Jul 2007 16:11:36 -0400
Subject: [Bioperl-l] Take 2 of the new subversion repository.
In-Reply-To: <C022268F-4632-4027-ADBE-F842B3308E21@uiuc.edu>
References: <18054.63942.316904.413911@almost.alerce.com>
	<D8C71EF7-6E2E-498E-8638-373512ADE3EE@bioperl.org>
	<18067.47319.254632.538811@almost.alerce.com>
	<4693BE75.4090005@sendu.me.uk>
	<C022268F-4632-4027-ADBE-F842B3308E21@uiuc.edu>
Message-ID: <18067.59384.247108.463648@almost.alerce.com>

Chris Fields writes:
 > [...]
 > We prob. need to schedule a specific day/time when the switchover  
 > would take place so we can announce (so everyone knows and no one can  
 > gripe).  Did we ever resolve the svn->cvs issue?  Jason pointed out  
 > some tools a while ago...

I haven't done anything about it.

I think that we also need to have some input from the admin/support
folk about access methods (https, etc...).

Are we going to want to mirror the repository anywhere?

g.


From hartzell at alerce.com  Wed Jul 11 09:17:08 2007
From: hartzell at alerce.com (George Hartzell)
Date: Wed, 11 Jul 2007 09:17:08 -0400
Subject: [Bioperl-l] extra hook functionality for svn repos?
Message-ID: <18068.55380.626778.486775@almost.alerce.com>


There are a bunch of "contributed" hook scripts at

  http://subversion.tigris.org/tools_contrib.html#hook_scripts

Given that many bioperl users depend on case-preserving but
case-insensitive file systems, I'm wondering if hooking up the
case-insensitive.py script might be worthwhile.
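
To make the idea concrete, here is a rough sketch of the kind of check
such a hook performs: flag paths that differ only by case.  This is
not the contributed case-insensitive.py script, just an illustration
in Perl; the case_collisions name and the example paths are made up.

```perl
use strict;
use warnings;

# Group paths that differ only by case.  Returns a list of array refs,
# one per colliding group.
sub case_collisions {
    my @paths = @_;
    my %by_folded;
    push @{ $by_folded{ lc $_ } }, $_ for @paths;
    return map  { [ sort @{ $by_folded{$_} } ] }
           grep { @{ $by_folded{$_} } > 1 }
           sort keys %by_folded;
}

# Hypothetical paths: the first two would clash on a case-insensitive
# file system.
my @clashes = case_collisions(
    'Bio/SeqIO/fasta.pm',
    'Bio/SeqIO/Fasta.pm',
    'Bio/Seq.pm',
);
print "case clash: @$_\n" for @clashes;
```

A real pre-commit hook would have to compare the paths added in the
transaction against what is already in the repository and exit
non-zero on a clash.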

Likewise, the check-mime-type.pl script might help us keep
svn:mime-type and svn:eol-style properties up to date.
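
A sketch of the sort of rule a property-checking hook could enforce,
assuming the policy is "every added file needs svn:mime-type, and
text files also need svn:eol-style".  It is not the check-mime-type.pl
script itself; property_problems is a hypothetical helper that works
on a plain hash of properties rather than talking to svnlook.

```perl
use strict;
use warnings;

# Return a list of complaints for one added file, given its properties
# as a hash ref (property name => value).
sub property_problems {
    my ($path, $props) = @_;
    my @problems;
    my $mime = $props->{'svn:mime-type'};
    if (!defined $mime) {
        push @problems, "$path: svn:mime-type is not set";
    }
    elsif ($mime =~ m{^text/} && !defined $props->{'svn:eol-style'}) {
        push @problems, "$path: text file without svn:eol-style";
    }
    return @problems;
}

# Example paths and property values are illustrative only.
print "$_\n" for property_problems('t/data/test.fasta', {});
print "$_\n" for property_problems('Bio/Seq.pm',
                                   { 'svn:mime-type' => 'text/x-perl' });
```

In an actual hook the properties would come from the pending
transaction, and any complaint would reject the commit.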

There are others there, but none that I found interesting.

How big-brother do we want the repository to be?

g.


From cjfields at uiuc.edu  Wed Jul 11 09:40:54 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 11 Jul 2007 08:40:54 -0500
Subject: [Bioperl-l] extra hook functionality for svn repos?
In-Reply-To: <18068.55380.626778.486775@almost.alerce.com>
References: <18068.55380.626778.486775@almost.alerce.com>
Message-ID: <A13F608F-16FA-4432-AA2F-83674E3A73F4@uiuc.edu>


On Jul 11, 2007, at 8:17 AM, George Hartzell wrote:

>
> There are a bunch of "contributed" hook scripts at
>
>   http://subversion.tigris.org/tools_contrib.html#hook_scripts
>
> Given that many bioperl users depend on case-preserving but
> case-insensitive file systems, I'm wondering if hooking up the
> case-insensitive.py script might be worthwhile.

I'm not sure how often we run into this, though.  Anyone know?

> Likewise, the check-mime-type.pl script might help us keep
> svn:mime-type and svn:eol-style properties up to date.

The latter two might be nice.  I thought we planned on defaulting to  
a simple 'plain text' mime type on commits if it isn't specifically  
predefined, but maybe this way is better?

> There are others there, but none that I found interesting.
>
> How big-brother do we want the repository to be?
>
> g.

'Friendly' big-brother, not 'dystopian' big-brother.

chris


From marian.thieme at lycos.de  Wed Jul 11 05:05:18 2007
From: marian.thieme at lycos.de (Marian Thieme)
Date: Wed, 11 Jul 2007 09:05:18 +0000
Subject: [Bioperl-l] submitting code
Message-ID: <188661178019848@lycos-europe.com>

An HTML attachment was scrubbed...
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070711/eec1aa42/attachment-0003.html>

From dmessina at wustl.edu  Wed Jul 11 16:14:17 2007
From: dmessina at wustl.edu (David Messina)
Date: Wed, 11 Jul 2007 15:14:17 -0500
Subject: [Bioperl-l] submitting code
In-Reply-To: <188661178019848@lycos-europe.com>
References: <188661178019848@lycos-europe.com>
Message-ID: <4DF90B9A-7FFA-4867-B5D3-E6F05EC84BBC@wustl.edu>

Hi Marian,

Thanks so much for contributing! The best way would be to create a  
Bugzilla ticket and then attach the code to that ticket. One of the  
developers will check it in and give you feedback if there are any  
little tweaks that would be helpful*.

Would you be able to include documentation and test cases with your  
module?

Dave


* For more info:
http://www.bioperl.org/wiki/FAQ#I.27ve_got_an_idea_for_a_module_how_do_I_contribute_it.3F
http://www.bioperl.org/wiki/Developer_Information
http://www.bioperl.org/wiki/Becoming_a_developer
http://bioperl.org/pipermail/bioperl-l/2003-February/011226.html


--
Dave Messina
Senior Analyst, Assembly Group
Genome Sequencing Center
Washington University
St. Louis, MO


From marian.thieme at lycos.de  Wed Jul 11 11:12:20 2007
From: marian.thieme at lycos.de (Marian Thieme)
Date: Wed, 11 Jul 2007 15:12:20 +0000
Subject: [Bioperl-l] submitting code
Message-ID: <188661178030343@lycos-europe.com>

An HTML attachment was scrubbed...
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070711/c95991b8/attachment-0003.html>

From e-just at northwestern.edu  Thu Jul 12 10:37:03 2007
From: e-just at northwestern.edu (Eric Just)
Date: Thu, 12 Jul 2007 09:37:03 -0500
Subject: [Bioperl-l] Job opening in Chicago
Message-ID: <fa1fe35c0707120737i71c6c26fq7635e350da9bf23f@mail.gmail.com>

Hello everyone,

We have an opening at dictyBase (Northwestern University in Chicago)
for a Bioinformatics Software Engineer.  This job involves writing and
maintaining software for a genome database using Chado/OO-Perl/Bioperl
and many other state of the art technologies.

For more information please see:
http://dictybase.org/dictybase_jobs.htm

Thanks,
Eric


From cjfields at uiuc.edu  Thu Jul 12 12:09:02 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 12 Jul 2007 11:09:02 -0500
Subject: [Bioperl-l] DB::SeqFeature::Store::GFF3Loader question
Message-ID: <A8310D54-F800-43BE-B6C3-3879206CE697@uiuc.edu>

I have been running into some GFF formatting issues where the  
attributes column is left undef (no '.'), which causes  
GFF3Loader::parse_attributes() to complain with a 'use of undefined
string with split' warning.  Would it be okay with the powers that be
(Scott, Lincoln) to add a warning or exception there?  I'm guessing a  
warning is better in this case, as just returning works fine.
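
Roughly what I have in mind, as a standalone sketch.  This is not the
actual GFF3Loader code: parse_attributes here is a simplified stand-in
written for illustration, and the warning text is made up.  It only
shows the guard going in before the split.

```perl
use strict;
use warnings;

# Parse a GFF3 column-9 string into (tag => [values]) pairs.
# Returns an empty list, with a warning, when the column is missing.
sub parse_attributes {
    my ($att) = @_;
    unless (defined $att) {
        warn "attributes column is undefined; treating it as empty\n";
        return;
    }
    return if $att eq '.' || $att eq '';

    my @pairs;
    for my $pair (split /;/, $att) {
        my ($tag, $value) = split /=/, $pair, 2;
        next unless defined $tag && length $tag && defined $value;
        # Values are comma-separated and may contain %XX escapes.
        my @values = map { s/%([0-9A-Fa-f]{2})/chr hex $1/ge; $_ }
                     split /,/, $value;
        push @pairs, $tag => \@values;
    }
    return @pairs;
}

my %att = parse_attributes('ID=gene1;Name=thrA,thrA1');
print "Name: @{ $att{Name} }\n";
my @none = parse_attributes(undef);    # warns, returns an empty list
```

Returning early keeps the current behaviour (nothing gets loaded for
that column) while telling the user the line is malformed.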

chris


From jason at bioperl.org  Fri Jul 13 13:30:05 2007
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 13 Jul 2007 13:30:05 -0400
Subject: [Bioperl-l] Take 2 of the new subversion repository.
In-Reply-To: <18067.59384.247108.463648@almost.alerce.com>
References: <18054.63942.316904.413911@almost.alerce.com>
	<D8C71EF7-6E2E-498E-8638-373512ADE3EE@bioperl.org>
	<18067.47319.254632.538811@almost.alerce.com>
	<4693BE75.4090005@sendu.me.uk>
	<C022268F-4632-4027-ADBE-F842B3308E21@uiuc.edu>
	<18067.59384.247108.463648@almost.alerce.com>
Message-ID: <58D2D775-F7AB-4E57-AC4E-861AA2DC4AEA@bioperl.org>

I'll try to look into this and other stuff with the migration in the
next week or so - maybe we'll make some time to talk it through
during BOSC.  I don't know yet when I'll actually have time to think
about it properly.

I am still worried about doing https because of the current system we
have supporting user logins, because we didn't want to run a web
server on the main repository machine, and because we'd have to
install DAV there.  If svn+ssh is going to be enough of a hurdle for
people, note that it was already a hurdle for them with CVS, but
we'll have to think a bit more on it.

We might be able to do some sort of NFS (or other exported FS) export
to the webserver machine, but that may be a recipe for disaster.

-jason
On Jul 10, 2007, at 4:11 PM, George Hartzell wrote:

> Chris Fields writes:
>> [...]
>> We prob. need to schedule a specific day/time when the switchover
>> would take place so we can announce (so everyone knows and no one can
>> gripe).  Did we ever resolve the svn->cvs issue?  Jason pointed out
>> some tools a while ago...
>
> I haven't done anything about it.
>
> I think that we also need to have some input from the admin/support
> folk about access methods (https, etc...).
>
> Are we going to want to mirror the repository anywhere?
>
> g.

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From cjfields at uiuc.edu  Fri Jul 13 14:29:22 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 13 Jul 2007 13:29:22 -0500
Subject: [Bioperl-l] Take 2 of the new subversion repository.
In-Reply-To: <58D2D775-F7AB-4E57-AC4E-861AA2DC4AEA@bioperl.org>
References: <18054.63942.316904.413911@almost.alerce.com>
	<D8C71EF7-6E2E-498E-8638-373512ADE3EE@bioperl.org>
	<18067.47319.254632.538811@almost.alerce.com>
	<4693BE75.4090005@sendu.me.uk>
	<C022268F-4632-4027-ADBE-F842B3308E21@uiuc.edu>
	<18067.59384.247108.463648@almost.alerce.com>
	<58D2D775-F7AB-4E57-AC4E-861AA2DC4AEA@bioperl.org>
Message-ID: <5F5EB9B6-11AF-4D20-95B1-EBBD40A98962@uiuc.edu>

I don't think there's a huge rush on this since BOSC is imminent. If  
devs really want https then we can try adding it after migration, but  
if it becomes too much of a headache (particularly for the web  
admins) I wouldn't worry about it.

chris

On Jul 13, 2007, at 12:30 PM, Jason Stajich wrote:

> I'll try and look into this and other stuff with the migration in
> next week or so - maybe we'll make some time to talk it through
> during BOSC.  I don't know yet when I'll actually have time to think
> about it properly.
>
> I am still worried about doing https because of the current system we
> have supporting user logins and that we didn't want to run a web
> server on the main repository machine and we'll have to install DAV
> on the main repository machine.  if ssh+svn is going to be sufficient
> hurdle for people, note it was already a hurdle for them with CVS,
> but we'll have to think a bit more on it.
>
> We might be able to do some sort of NFS (or other exported FS) but
> exported to the webserver machine but that is may be a recipe for
> disaster.
>
> -jason
> On Jul 10, 2007, at 4:11 PM, George Hartzell wrote:
>
>> Chris Fields writes:
>>> [...]
>>> We prob. need to schedule a specific day/time when the switchover
>>> would take place so we can announce (so everyone knows and no one  
>>> can
>>> gripe).  Did we ever resolve the svn->cvs issue?  Jason pointed out
>>> some tools a while ago...
>>
>> I haven't done anything about it.
>>
>> I think that we also need to have some input from the admin/support
>> folk about access methods (https, etc...).
>>
>> Are we going to want to mirror the repository anywhere?
>>
>> g.
>
> --
> Jason Stajich
> jason at bioperl.org
> http://jason.open-bio.org/
>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From sheris at eps.berkeley.edu  Fri Jul 13 14:42:32 2007
From: sheris at eps.berkeley.edu (Sheri Simmons)
Date: Fri, 13 Jul 2007 11:42:32 -0700
Subject: [Bioperl-l] Problem with Bio::PopGen::Individual
Message-ID: <200707131142.32366.sheris@eps.berkeley.edu>

Hi,
I have a collection of sequencing reads aligned with a consensus sequence that 
I input into a Bio::PopGen::Population object in order to calculate allele 
frequencies. The consensus sequence is included to force clustalw to give a 
better alignment. However,  I need to remove the consensus sequence before 
calculating allele frequencies in the individual reads. I'm having trouble 
with this part of it. I get the following error message:

"Can't locate object method "person_id" via package "Bio::PopGen::Individual" 		
at /usr/share/perl5/Bio/PopGen/Population.pm line 260, <GEN0> line 49."

Here is the code snippet producing the error. $pop is a 
Bio::PopGen::Population object.

	my @consensus = "gene_consensus";
	$pop->remove_Individuals(@consensus);

I also tried:
	my @consensus = $pop->get_Individuals(-unique_id => "gene_consensus"); 
	$pop->remove_Individuals(@consensus);

which produced the same error. Can anyone send me in the right direction? I 
suspect this is a simple problem.

Sheri

-- 
Sheri Simmons
Department of Earth and Planetary Sciences
University of California, Berkeley
Berkeley, CA 94720-4767


From jason at bioperl.org  Fri Jul 13 16:17:31 2007
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 13 Jul 2007 16:17:31 -0400
Subject: [Bioperl-l] Problem with Bio::PopGen::Individual
In-Reply-To: <200707131142.32366.sheris@eps.berkeley.edu>
References: <200707131142.32366.sheris@eps.berkeley.edu>
Message-ID: <99A3513A-7DBE-4C89-B38B-8C2B76B0E14F@bioperl.org>

Hi Sheri -

Shoot - that was my fault - bug in the code where I was only using  
"Person" not Individuals for the code when I was testing.

I've committed a bugfix to CVS - do you need me to send you the
updated file or are you comfortable grabbing the code from CVS or
http://code.open-bio.org

This is the change - you may have a different version of BioPerl than
what is in CVS so you may have to make the changes on line 260 rather
than 282 -- or you can upgrade to latest code via CVS (although this
is probably harder for you since you've got stuff installed in
/usr/share):

RCS file: /home/repository/bioperl/bioperl-live/Bio/PopGen/Population.pm,v
retrieving revision 1.22
diff -r1.22 Population.pm
282c282
<       unshift @tosplice, $i if( $namehash{$ind->person_id} );
---
>       unshift @tosplice, $i if( $namehash{$ind->unique_id} );
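
For anyone curious why the indices are collected with unshift, here is
a standalone sketch of the remove-by-name pattern.  Plain hashes stand
in for Bio::PopGen::Individual objects, remove_by_unique_id is a
made-up name, and the splice loop is my illustration of how the
collected indices can be used, not a copy of Population.pm.

```perl
use strict;
use warnings;

# Remove every record whose unique_id is in @names.
# Returns the number of records removed.
sub remove_by_unique_id {
    my ($individuals, @names) = @_;
    my %namehash = map { $_ => 1 } @names;

    my @tosplice;
    for my $i (0 .. $#{$individuals}) {
        # unshift puts the highest matching index first ...
        unshift @tosplice, $i if $namehash{ $individuals->[$i]{unique_id} };
    }
    # ... so each splice leaves the remaining, lower indices valid.
    splice @$individuals, $_, 1 for @tosplice;
    return scalar @tosplice;
}

my @inds = map { +{ unique_id => $_ } } qw(read1 gene_consensus read2);
remove_by_unique_id(\@inds, 'gene_consensus');
print join(', ', map { $_->{unique_id} } @inds), "\n";
```

Note that the lookup is by name, so passing the unique_id string is
the relevant form of the call in this sketch.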

-jason
On Jul 13, 2007, at 2:42 PM, Sheri Simmons wrote:

> Hi,
> I have a collection of sequencing reads aligned with a consensus  
> sequence that
> I input into a Bio::PopGen::Population object in order to calculate  
> allele
> frequencies. The consensus sequence is included to force clustalw  
> to give a
> better alignment. However,  I need to remove the consensus sequence  
> before
> calculating allele frequencies in the individual reads. I'm having  
> trouble
> with this part of it. I get the following error message:
>
> "Can't locate object method "person_id" via package  
> "Bio::PopGen::Individual" 		
> at /usr/share/perl5/Bio/PopGen/Population.pm line 260, <GEN0> line  
> 49."
>
> Here is the code snippet producing the error. $pop is a
> Bio::PopGen::Population object.
>
> 	my @consensus = "gene_consensus";
> 	$pop->remove_Individuals(@consensus);
>
> I also tried:
> 	my @consensus = $pop->get_Individuals(-unique_id =>  
> "gene_consensus");
> 	$pop->remove_Individuals(@consensus);
>
> which produced the same error. Can anyone send me in the right  
> direction? I
> suspect this is a simple problem.
>
> Sheri
>
> -- 
> Sheri Simmons
> Department of Earth and Planetary Sciences
> University of California, Berkeley
> Berkeley, CA 94720-4767

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From hartzell at alerce.com  Fri Jul 13 16:34:14 2007
From: hartzell at alerce.com (George Hartzell)
Date: Fri, 13 Jul 2007 16:34:14 -0400
Subject: [Bioperl-l] Take 2 of the new subversion repository.
In-Reply-To: <58D2D775-F7AB-4E57-AC4E-861AA2DC4AEA@bioperl.org>
References: <18054.63942.316904.413911@almost.alerce.com>
	<D8C71EF7-6E2E-498E-8638-373512ADE3EE@bioperl.org>
	<18067.47319.254632.538811@almost.alerce.com>
	<4693BE75.4090005@sendu.me.uk>
	<C022268F-4632-4027-ADBE-F842B3308E21@uiuc.edu>
	<18067.59384.247108.463648@almost.alerce.com>
	<58D2D775-F7AB-4E57-AC4E-861AA2DC4AEA@bioperl.org>
Message-ID: <18071.57798.130368.703488@almost.alerce.com>

Jason Stajich writes:
 > I'll try and look into this and other stuff with the migration in  
 > next week or so - maybe we'll make some time to talk it through  
 > during BOSC.  I don't know yet when I'll actually have time to think  
 > about it properly.
 > 
 > I am still worried about doing https because of the current system we  
 > have supporting user logins and that we didn't want to run a web  
 > server on the main repository machine and we'll have to install DAV  
 > on the main repository machine.  if ssh+svn is going to be sufficient  
 > hurdle for people, note it was already a hurdle for them with CVS,  
 > but we'll have to think a bit more on it.
 > [...]

How are you thinking about providing anonymous readonly non-dev access
to the repository?  svn+ssh using an anonymous/guest account (can it
be screwed down tightly enough?)  svn-mirror the repo onto the public
machine and do DAV there w/out having to worry about authenticating
the devs?

g.


From jason at bioperl.org  Fri Jul 13 17:33:29 2007
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 13 Jul 2007 17:33:29 -0400
Subject: [Bioperl-l] Take 2 of the new subversion repository.
In-Reply-To: <18071.57798.130368.703488@almost.alerce.com>
References: <18054.63942.316904.413911@almost.alerce.com>
	<D8C71EF7-6E2E-498E-8638-373512ADE3EE@bioperl.org>
	<18067.47319.254632.538811@almost.alerce.com>
	<4693BE75.4090005@sendu.me.uk>
	<C022268F-4632-4027-ADBE-F842B3308E21@uiuc.edu>
	<18067.59384.247108.463648@almost.alerce.com>
	<58D2D775-F7AB-4E57-AC4E-861AA2DC4AEA@bioperl.org>
	<18071.57798.130368.703488@almost.alerce.com>
Message-ID: <5C42D957-BCCA-46B6-8121-3313CE4B0F2A@bioperl.org>


On Jul 13, 2007, at 4:34 PM, George Hartzell wrote:

> Jason Stajich writes:
>> I'll try and look into this and other stuff with the migration in
>> next week or so - maybe we'll make some time to talk it through
>> during BOSC.  I don't know yet when I'll actually have time to think
>> about it properly.
>>
>> I am still worried about doing https because of the current system we
>> have supporting user logins and that we didn't want to run a web
>> server on the main repository machine and we'll have to install DAV
>> on the main repository machine.  if ssh+svn is going to be sufficient
>> hurdle for people, note it was already a hurdle for them with CVS,
>> but we'll have to think a bit more on it.
>> [...]
>
> How are you thinking about providing anonymous readonly non-dev access
> to the repository?  svn+ssh using an anonymous/guest account (can it
> be screwed down tightly enough?)  svn-mirror the repo onto the public
> machine and do DAV there w/out having to worry about authenticating
> the devs?
>
We'll do svn on the public anonymous machine like we already do with  
CVS and with SVN

See:
http://code.open-bio.org
  AND
http://code.open-bio.org/svnweb/
See blipkit.

-jason
> g.
>
>

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From scrosson at uchicago.edu  Fri Jul 13 18:15:30 2007
From: scrosson at uchicago.edu (Sean Crosson)
Date: Fri, 13 Jul 2007 22:15:30 +0000 (UTC)
Subject: [Bioperl-l] ace to fasta conversion
Message-ID: <loom.20070714T000856-94@post.gmane.org>

I've been using this Bio::SeqIO script to convert EMBL and swissprot to fasta
and it works great.  We're now trying to convert a big (250 MB) .ace file to
fasta.  The documentation suggests I can do this, but every time I run the script
below, it outputs an empty .fas file.  Does anyone have any suggestions on how
to make this script work?  Does SeqIO really convert between these file types? 
Thanks for your help.

#!/usr/bin/perl
use strict;
use warnings;

use Bio::SeqIO;

# Read sequences in one format and write each of them back out as FASTA.
my $in  = Bio::SeqIO->new(-file => "454Contigs.ace",
                          -format => 'ace');
my $out = Bio::SeqIO->new(-file => ">454Contigs.fas",
                          -format => 'fasta');
while ( my $seq = $in->next_seq() ) { $out->write_seq($seq); }


From cvillamar at gmail.com  Fri Jul 13 19:24:04 2007
From: cvillamar at gmail.com (Carlos Villacorta)
Date: Fri, 13 Jul 2007 16:24:04 -0700
Subject: [Bioperl-l] beginner problem with fasta headers
Message-ID: <60199ee40707131624y6ea2c896r8a32f9bef04121e9@mail.gmail.com>

hi all,
I have an EMBL sequence file; when formatting it to fasta with SeqIO it
gives a long string header for each sequence that my downstream
phylogenetic software cannot handle...
Does anyone know how to format those EMBL or GenBank files to fasta
while keeping just two or three fields in the headers (e.g. id | gene
| sp_name)?
Any advice with this problem would be very appreciated, thanks!


From j_martin at lbl.gov  Fri Jul 13 20:05:45 2007
From: j_martin at lbl.gov (Joel Martin)
Date: Fri, 13 Jul 2007 17:05:45 -0700
Subject: [Bioperl-l] ace to fasta conversion
In-Reply-To: <loom.20070714T000856-94@post.gmane.org>
References: <loom.20070714T000856-94@post.gmane.org>
Message-ID: <20070714000544.GB29841@eniac.jgi-psf.org>

Hello,
	the SeqIO 'ace' format is for AceDB files; the 454*ace you are trying to use
is a phrap/consed ace file.  They aren't related at all. You might try poking
around in Bio::Assembly::IO, which should read assembly ace files.
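
Something along these lines might do it.  This is an untested sketch:
it assumes Bio::Assembly::IO's next_assembly, the scaffold's
all_contigs, and the contig's get_consensus_sequence behave as
documented, and strip_pads is a helper made up here to drop the
pad/gap characters from the consensus.

```perl
use strict;
use warnings;

# Remove pad/gap characters ('*' and '-') from a consensus string.
sub strip_pads {
    my ($consensus) = @_;
    (my $clean = $consensus) =~ tr/*-//d;
    return $clean;
}

# Usage: perl script.pl 454Contigs.ace > contigs.fas
if (@ARGV) {
    require Bio::Assembly::IO;
    require Bio::Seq;
    require Bio::SeqIO;
    my $io  = Bio::Assembly::IO->new(-file => $ARGV[0], -format => 'ace');
    my $out = Bio::SeqIO->new(-fh => \*STDOUT, -format => 'fasta');
    my $assembly = $io->next_assembly;
    for my $contig ($assembly->all_contigs) {
        my $consensus = $contig->get_consensus_sequence;
        $out->write_seq( Bio::Seq->new(
            -id  => $contig->id,
            -seq => strip_pads($consensus->seq),
        ) );
    }
}
```

With a 250 MB ace file this may need a fair amount of memory, since
the whole assembly is likely to be read in before any contig is
written.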

Joel

On Fri, Jul 13, 2007 at 10:15:30PM +0000, Sean Crosson wrote:
> I've been using this Bio::SeqIO script to convert EMBL and swissprot to fasta
> and it works great.  We're now trying to convert a big (250 MB) .ace file to
> fasta.  The documentation suggests I can do this, but everytime I run the script
> below, it outputs an empty .fas file.  Does anyone have any suggestions on how
> to make this script work?  Does SeqIO really convert between these file types? 
> Thanks for your help.
> 
> #!/usr/bin/perl -w
> 
> use Bio::SeqIO;
> 
> 
> $in  = Bio::SeqIO->new(-file => "454Contigs.ace",
>                        -format => 'ace');
> $out = Bio::SeqIO->new(-file => ">454Contigs.fas",
>                        -format => 'fasta');
> while ( $seq = $in->next_seq() ) {$out->write_seq($seq); }
> 
> 
> 


From cjfields at uiuc.edu  Sat Jul 14 00:06:27 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 13 Jul 2007 23:06:27 -0500
Subject: [Bioperl-l] beginner problem with fasta headers
In-Reply-To: <60199ee40707131624y6ea2c896r8a32f9bef04121e9@mail.gmail.com>
References: <60199ee40707131624y6ea2c896r8a32f9bef04121e9@mail.gmail.com>
Message-ID: <0089195A-4935-49F2-A8E7-C1F9B8A34D4E@uiuc.edu>

Some reading material...

http://www.bioperl.org/wiki/FAQ#Accession_numbers_are_not_present_for_FASTA_sequence_files
http://www.bioperl.org/wiki/FAQ#I_would_like_to_make_my_own_custom_fasta_header_-_how_do_I_do_this.3F
http://www.bioperl.org/wiki/FASTA_sequence_format#Note

Quiz on Monday!
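
In case it helps, a rough sketch of the idea behind those FAQ entries:
build a short ID yourself, set it with display_id, and blank out the
description before writing.  The short_header helper and the choice
of fields (ID, first /gene qualifier, species binomial) are
assumptions for illustration; adjust them to whatever your records
actually contain.

```perl
use strict;
use warnings;

# Join the wanted fields into a short header, skipping missing ones
# and replacing whitespace, which many programs treat as the end of
# the ID.
sub short_header {
    my @fields = grep { defined $_ && length $_ } @_;
    s/\s+/_/g for @fields;
    return join '|', @fields;
}

# Usage: perl script.pl input.embl > output.fasta
if (@ARGV) {
    require Bio::SeqIO;
    my $in  = Bio::SeqIO->new(-file => $ARGV[0], -format => 'embl');
    my $out = Bio::SeqIO->new(-fh => \*STDOUT, -format => 'fasta');
    while (my $seq = $in->next_seq) {
        # First /gene qualifier found on any feature, if there is one.
        my ($gene) = map  { $_->get_tag_values('gene') }
                     grep { $_->has_tag('gene') } $seq->get_SeqFeatures;
        my $species = $seq->species ? $seq->species->binomial : undef;
        $seq->display_id( short_header($seq->display_id, $gene, $species) );
        $seq->desc('');    # drop the long description from the header line
        $out->write_seq($seq);
    }
}
```

If the phylogenetics program has a length limit on names, the joined
string may still need truncating.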

chris

On Jul 13, 2007, at 6:24 PM, Carlos Villacorta wrote:

> hi all,
> I have a embl sequence file, when formatting to fasta with Seqio it
> gives a long string header for each sequence that my following
> phylogenetic software cannot handle...
> Does anyone knows how to format those embl or genbank files to fasta
> but retrieving in the headers just two or three fields (e.g. id | gene
> | sp_name)?
> Any advice with this problem would be very appreciated, thanks!


From scrosson at uchicago.edu  Fri Jul 13 23:43:59 2007
From: scrosson at uchicago.edu (scrosson)
Date: Fri, 13 Jul 2007 20:43:59 -0700 (PDT)
Subject: [Bioperl-l] ace to fasta conversion
In-Reply-To: <20070714000544.GB29841@eniac.jgi-psf.org>
References: <loom.20070714T000856-94@post.gmane.org>
	<20070714000544.GB29841@eniac.jgi-psf.org>
Message-ID: <11590811.post@talk.nabble.com>


This problem now makes sense.  I've been playing with Bio::Assembly::IO,
which does indeed read phrap .ace files.  Does anyone have an idea how to
pull the assembled contigs out of a Bio::Assembly object and write them out
as multi-fasta (or strings for that matter)?  None of our workstations are
running phrap/consed and I'd love to see these contigs.

Sean 
       

> Hello,
> the SeqIO 'ace' is an AceDB file; the 454*ace you are trying to use
> is a phrap/consed ace file.  They aren't related at all.  You might try
> poking around in Bio::Assembly::IO, which should read assembly ace files.
>
> Joel

-- 
View this message in context: http://www.nabble.com/ace-to-fasta-conversion-tf4077370.html#a11590811
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From bioperlanand at yahoo.com  Sat Jul 14 13:55:53 2007
From: bioperlanand at yahoo.com (Anand Venkatraman)
Date: Sat, 14 Jul 2007 10:55:53 -0700 (PDT)
Subject: [Bioperl-l] a question on obtain PDB records using bioperl
Message-ID: <798126.17426.qm@web36804.mail.mud.yahoo.com>

Hi everybody,

Is there a method in Bioperl to obtain PDB record(s) on the fly, i.e. something similar to the Bio::Perl methods that retrieve EMBL or GenBank records?

Thanks in advance,

Anand

       


From johnsonm at gmail.com  Tue Jul 17 14:23:58 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Tue, 17 Jul 2007 13:23:58 -0500
Subject: [Bioperl-l] Should Bio::SeqIO::interpro 'use XML::DOM::XPath'?
Message-ID: <ebf5eb170707171123y33239dfey2dc39e4316c62cee@mail.gmail.com>

I'm tinkering with parsing iprscan reports with BioPerl.  I noticed that this:

  my $seqio = Bio::SeqIO->new(-file => $iprscan_file, -format => 'interpro');

  while (my $seq = $seqio->next_seq()) {
      ...
  }

Does not work unless I first 'use XML::DOM::XPath'.  I get this error:

  Can't locate object method "findnodes" via package
"XML::DOM::Document" at
bioperl-cvs/bioperl-live//Bio/SeqIO/interpro.pm line 136, <GEN0> line
30.

I see that Bio::SeqIO has 'use XML::DOM', but that doesn't seem to
suck in XML::DOM::XPath.  I see that t/interpro.t requires
XML::DOM::XPath:

test_begin(-tests => 17,
                -requires_module => 'XML::DOM::XPath');

I suppose the reason the test requires XML::DOM::XPath is so
that tests can be skipped if XML::DOM::XPath is not available.
Shouldn't Bio::SeqIO::interpro 'use XML::DOM::XPath', though?


From sac at bioperl.org  Tue Jul 17 15:49:32 2007
From: sac at bioperl.org (Steve Chervitz)
Date: Tue, 17 Jul 2007 12:49:32 -0700
Subject: [Bioperl-l] Ohloh account for bioperl
Message-ID: <8f200b4c0707171249n193c5d3fi2aa1cb8b6c102ef7@mail.gmail.com>

I came across a web app that tracks various metrics for open source
projects, noticed that bioperl wasn't listed, and added it:

http://www.ohloh.net/projects/6685

Seems like an interesting resource that could help add some
visibility. It creates metrics by directly processing the source code
repository. I hooked it up to the CVS repos for bioperl-live, -db,
-run, and -pipeline. It has yet to do its analysis at this point.

Feel free to create Ohloh accounts for yourselves. When you add
yourself as a contributor to Bioperl, you can indicate the username
associated with your commits, but this requires that it first process
the commit logs to figure out what the usernames are. You can still
create an account, just update it later with your username.

Steve


From cjfields at uiuc.edu  Tue Jul 17 17:04:44 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 17 Jul 2007 16:04:44 -0500
Subject: [Bioperl-l] Should Bio::SeqIO::interpro 'use XML::DOM::XPath'?
In-Reply-To: <ebf5eb170707171123y33239dfey2dc39e4316c62cee@mail.gmail.com>
References: <ebf5eb170707171123y33239dfey2dc39e4316c62cee@mail.gmail.com>
Message-ID: <5BA80A44-4A88-46D8-B945-7BB4C82E308E@uiuc.edu>


On Jul 17, 2007, at 1:23 PM, Mark Johnson wrote:

> I'm tinkering with parsing iprscan reports with BioPerl.  I noticed  
> that this:
>
>   my $seqio = Bio::SeqIO->new(-file => $iprscan_file, -format =>  
> 'interpro');
>
>   while (my $seq = $seqio->next_seq()) {
>       ...
>   }
>
> Does not work unless I first 'use XML::DOM::XPath'.  I get this error:
>
>   Can't locate object method "findnodes" via package
> "XML::DOM::Document" at
> bioperl-cvs/bioperl-live//Bio/SeqIO/interpro.pm line 136, <GEN0> line
> 30.
>
> I see that Bio::SeqIO has 'use XML::DOM', but that doesn't seem to
> suck in XML::DOM::XPath.  I see that t/interpro.t requires
> XML::DOM::XPath:
>
> test_begin(-tests => 17,
>                 -requires_module => 'XML::DOM::XPath');
>
> I suppose the reason the test requires XML::DOM::XPath is so
> that tests can be skipped if XML::DOM::XPath is not available.
> Shouldn't Bio::SeqIO::interpro 'use XML::DOM::XPath', though?

You're right; I think the tests passed b/c XML::DOM::XPath (if present)
was eval'd as a required module.  When I commented out the spot where
it is eval'd in the test suite I could replicate this error.  I have now
added 'use XML::DOM::XPath' to SeqIO::interpro in CVS and it
passes fine.

Thanks for the heads up!

chris


From xianranli78 at yahoo.com.cn  Wed Jul 18 01:55:19 2007
From: xianranli78 at yahoo.com.cn (Xianran Li)
Date: Wed, 18 Jul 2007 13:55:19 +0800
Subject: [Bioperl-l] extract information with Bio::DB::GFF3 from gff3 file
Message-ID: <005001c7c900$3a403d60$ed07a8c0@BGI.LOCAL>

Hi,

I want to extract some information from a gff3 file like:

12001 . gene 854759 857385 . - . ID=12001.t00153;Name=receptor kinase ORK10, putative

The gene position can be retrieved as $gene->start, but how can I get the annotation information (receptor kinase ORK10)?

Thanks for your help.


Xianran Li


From georg.otto at tuebingen.mpg.de  Wed Jul 18 05:32:26 2007
From: georg.otto at tuebingen.mpg.de (Georg Otto)
Date: Wed, 18 Jul 2007 11:32:26 +0200
Subject: [Bioperl-l] run megablast
Message-ID: <m1r6n66or9.fsf@tuebingen.mpg.de>


Hi,

is there a module to run megablast in a script (equivalent to ncbi
blast in StandAloneBlast.pm)?

Cheers,

Georg


From jeevitesh at ibab.ac.in  Wed Jul 18 06:03:24 2007
From: jeevitesh at ibab.ac.in (jeevitesh at ibab.ac.in)
Date: Wed, 18 Jul 2007 15:33:24 +0530 (IST)
Subject: [Bioperl-l] Please Help us - regarding TreeIO for Shared Distance
Message-ID: <47819.192.168.1.125.1184753004.squirrel@webmail.ibab.ac.in>

Hi Friends,

We need your valuable help in finding the SHARED PATH BETWEEN TWO PAIRS OF
NODES OF A TREE.

The Distance method of TreeIO in Bioperl module gives the total distance.

But we NEED TO FIND THE SHARED DISTANCE BETWEEN TWO PAIRS OF NODES, as illustrated in the figure.

Suppose we have a tree
    A                C
     \              /
      \2          2/
       \__________/
       /    6     \
      /2          2\
     /              \
    B                D

For example, the shared path between AB and AC is 2,
and for AC and BD the shared path is 6.

Any comment on this will be greatly appreciated.

With Thanks & regards
jeevitesh


From jeevitesh at ibab.ac.in  Wed Jul 18 03:15:33 2007
From: jeevitesh at ibab.ac.in (jeevitesh at ibab.ac.in)
Date: Wed, 18 Jul 2007 12:45:33 +0530 (IST)
Subject: [Bioperl-l] Please Help us - regarding TreeIO for Shared Distance
Message-ID: <55933.192.168.1.125.1184742933.squirrel@webmail.ibab.ac.in>

Hi Friends,

we need your valuable help in finding the SHARED PATH BETWEEN TWO NODES OF A TREE.

The Distance method of TreeIO in Bioperl module gives the total distance.

But we NEED TO FIND THE SHARED DISTANCE BETWEEN TWO NODES.

Any comment on this will be greatly appreciated.

With Thanks & regards
jeevitesh


From jeevitesh at ibab.ac.in  Wed Jul 18 04:45:50 2007
From: jeevitesh at ibab.ac.in (jeevitesh at ibab.ac.in)
Date: Wed, 18 Jul 2007 14:15:50 +0530 (IST)
Subject: [Bioperl-l] Please Help us - regarding TreeIO for Shared Distance
Message-ID: <43613.192.168.1.125.1184748350.squirrel@webmail.ibab.ac.in>

Hi Friends,

We need your valuable help in finding the SHARED PATH BETWEEN TWO PAIRS OF
NODES OF A TREE.

The Distance method of TreeIO in Bioperl module gives the total distance.

But we NEED TO FIND THE SHARED DISTANCE BETWEEN TWO PAIRS OF NODES, as illustrated in the figure.

Suppose we have a tree
    A                C
     \              /
      \2          2/
       \__________/
       /    6     \
      /2          2\
     /              \
    B                D

For example, the shared path between AB and AC is 2,
and for AC and BD the shared path is 6.

Any comment on this will be greatly appreciated.

With Thanks & regards
jeevitesh


From cain.cshl at gmail.com  Wed Jul 18 09:10:40 2007
From: cain.cshl at gmail.com (Scott Cain)
Date: Wed, 18 Jul 2007 09:10:40 -0400
Subject: [Bioperl-l] extract information with Bio::DB::GFF3 from
	gff3	file
In-Reply-To: <005001c7c900$3a403d60$ed07a8c0@BGI.LOCAL>
References: <005001c7c900$3a403d60$ed07a8c0@BGI.LOCAL>
Message-ID: <1184764240.2570.31.camel@localhost.localdomain>

Hi Xianran Li,

Assuming $gene is a Bio::DB::GFF::Feature object (there is no such thing
as Bio::DB::GFF3), then you can use the attributes method to get
anything in the ninth column:

  my ($name) = $gene->attributes('Name');

The parentheses are needed around $name because the attributes method
returns a list and the parens capture the first item of the list into
$name.
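
To illustrate the list-versus-scalar point, here is a minimal sketch in
plain Perl.  The attribute_values sub is a made-up stand-in that simply
returns an array; it is not the Bio::DB::GFF::Feature code, and the sketch
makes no claim about what the real attributes method returns in scalar
context.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a method that returns a list of values;
# this is NOT the Bio::DB::GFF::Feature implementation.
sub attribute_values {
    my @values = ('receptor kinase ORK10', 'putative');
    return @values;
}

my ($first) = attribute_values();   # list context: captures the first element
my $count   = attribute_values();   # scalar context: a returned array gives its size
my @all     = attribute_values();   # list context: captures every element

print "first: $first\n";
print "count: $count\n";
print "all:   @all\n";
```

Without the parens, the assignment is evaluated in scalar context, which
is why the two forms can give different results.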

Scott


On Wed, 2007-07-18 at 13:55 +0800, Xianran Li wrote:
> Hi,
> 
> I want to extract some information from a gff3 file like:
> 
> 12001 . gene 854759 857385 . - . ID=12001.t00153;Name=receptor kinase ORK10, putative
>    
> The gene position can be retrieved as $gene->start, but how can I get the annotation information (receptor kinase ORK10)?
> 
> Thanks for your help.
> 
> 
> Xianran Li
> 
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From johnsonm at gmail.com  Wed Jul 18 16:53:00 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Wed, 18 Jul 2007 15:53:00 -0500
Subject: [Bioperl-l] Should Bio::SeqIO::interpro 'use XML::DOM::XPath'?
In-Reply-To: <469DB6C6.9010702@pasteur.fr>
References: <ebf5eb170707171123y33239dfey2dc39e4316c62cee@mail.gmail.com>
	<5BA80A44-4A88-46D8-B945-7BB4C82E308E@uiuc.edu>
	<469DB6C6.9010702@pasteur.fr>
Message-ID: <ebf5eb170707181352v4d59ec81kfb6f706ca4643cc7@mail.gmail.com>

The output from InterProScan, invoked thusly:

iprscan -cli -seqtype p -i input_file -o output_file -format xml

On 7/18/07, Emmanuel Quevillon <tuco at pasteur.fr> wrote:
> Hi guys,
>
> I read your email and I wondered which iprscan file you've
> been talking about? Is it the file produced by InterProScan
> or the file called match.xml representing the whole uniprot
> database against InterPro? Reading the xml parser
> implemented into Bio::SeqIO::interpro, I guess it is the
> second one?
> In such case, I just want to let you know that the xml
> schema changed and the file name also. It is now called
> match_complete.xml.
> I attached the DTD to be able to see the new structure.
> Here is an example of the new data representation.
>
>
> <protein id="A0A000" name="A0A000_9ACTO" length="394"
> crc64="F1DD0C1042811B48">
>      <match id="G3DSA:3.40.640.10"
> name="PyrdxlP-dep_Trfase_major_sub1" dbname="GENE3D"
> status="T" evd="HMMPfam">
>        <ipr id="IPR015421" name="Pyridoxal
> phosphate-dependent transferase, major region, subdomain 1"
> type="Domain" />
>        <lcn start="52" end="288" score="4.30000170645879E-75" />
>      </match>
>      <match id="PTHR13693:SF7" name="PTHR13693:SF7"
> dbname="PANTHER" status="T" evd="not_rel">
>        <lcn start="33" end="389" score="0.0" />
>      </match>
> </protein>
>
> As you can see some time there is no interpro info (no ipr
> element).
>
> I think it would be good to change also the interpro parser ?
>
> Regards
>
> Emmanuel
>
> Chris Fields wrote:
> > On Jul 17, 2007, at 1:23 PM, Mark Johnson wrote:
> >
> >> I'm tinkering with parsing iprscan reports with BioPerl.  I noticed
> >> that this:
> >>
> >>   my $seqio = Bio::SeqIO->new(-file => $iprscan_file, -format =>
> >> 'interpro');
> >>
> >>   while (my $seq = $seqio->next_seq()) {
> >>       ...
> >>   }
> >>
> >> Does not work unless I first 'use XML::DOM::XPath'.  I get this error:
> >>
> >>   Can't locate object method "findnodes" via package
> >> "XML::DOM::Document" at
> >> bioperl-cvs/bioperl-live//Bio/SeqIO/interpro.pm line 136, <GEN0> line
> >> 30.
> >>
> >> I see that Bio::SeqIO has 'use XML::DOM', but that doesn't seem to
> >> suck in XML::DOM::XPath.  I see that t/interpro.t requires
> >> XML::DOM::XPath:
> >>
> >> test_begin(-tests => 17,
> >>                 -requires_module => 'XML::DOM::XPath');
> >>
> >> I suppose the reason the test requires XML::DOM::XPath is so
> >> that tests can be skipped if XML::DOM::XPath is not available.
> >> Shouldn't Bio::SeqIO::interpro 'use XML::DOM::XPath', though?
> >
> > You're right; I think tests passed b/c XML::DOM::XPath (if present),
> > was eval'd as a required module.  When I commented out the spot where
> > it is eval'd in the test suite I can replicate this error.  I have
> > added 'use XML::DOM::XPath' to SeqIO::interpro now in CVS and it
> > passes fine.
> >
> > Thanks for the heads up!
> >
> > chris
>
>


From cain.cshl at gmail.com  Wed Jul 18 22:47:53 2007
From: cain.cshl at gmail.com (Scott Cain)
Date: Wed, 18 Jul 2007 22:47:53 -0400
Subject: [Bioperl-l] extract information with Bio::DB::GFF3 from	gff3
	file
In-Reply-To: <008801c7c9ad$1fa5c030$ed07a8c0@BGI.LOCAL>
References: <005001c7c900$3a403d60$ed07a8c0@BGI.LOCAL>
	<1184764240.2570.31.camel@localhost.localdomain>
	<008801c7c9ad$1fa5c030$ed07a8c0@BGI.LOCAL>
Message-ID: <1184813273.2570.96.camel@localhost.localdomain>

[Please always reply to the mailing list so that answers can be archived]


Yes, because commas are not allowed in GFF3 in an unescaped form.
Essentially, you are doing this with your GFF3:

  Name=receptor kinase ORK10;Name= putative

and when you do this:

  my ($name) = $gene->attributes('Name');

you are getting the first item in the list of names, and I suspect which
one you get is random.

To fix it, you need to replace the comma with %2C (the URL escape code
for a comma).  If you generated this GFF3, you will need to add a step
to URI encode your attribute strings.  If you got it from someone else,
you should point out to them that their GFF is flawed.
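
A minimal sketch of such an escaping step, in plain Perl with no BioPerl
involved.  The gff3_escape name is made up for illustration, and the set of
characters escaped here (tab, newline, carriage return, other control
characters, '%', ';', '=', '&' and ',') is an assumption based on the GFF3
rules for column 9, so check it against the spec before relying on it:

```perl
use strict;
use warnings;

# Percent-encode characters that have a reserved meaning in a GFF3
# attribute value.  The character list is an assumption taken from the
# GFF3 description of column 9; spaces are left as they are.
sub gff3_escape {
    my ($value) = @_;
    $value =~ s/([\t\n\r%;=&,\x00-\x1f\x7f])/sprintf("%%%02X", ord($1))/ge;
    return $value;
}

my $name = gff3_escape('receptor kinase ORK10, putative');
print "Name=$name\n";   # prints: Name=receptor kinase ORK10%2C putative
```

With the comma encoded, the whole string stays a single value of the Name
attribute rather than being split into two.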

Scott


On Thu, 2007-07-19 at 10:32 +0800, Xianran Li wrote:
> However, $name returns the string "putative" rather than "receptor kinase ORK10". Is there any particular reason? 
> 
> 
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> Assuming $gene is a Bio::DB::GFF::Feature object (there is no such thing
> as Bio::DB::GFF3), then you can use the attributes method to get
> anything in the ninth column:
> 
>   my ($name) = $gene->attributes('Name');
> 
> The parentheses are needed around $name because the attributes method
> returns a list and the parens capture the first item of the list into
> $name.
> 
> Scott
> 
> 
> On Wed, 2007-07-18 at 13:55 +0800, Xianran Li wrote:
> > Hi,
> > 
> > I want to extract some information from a gff3 file like:
> > 
> > 12001 . gene 854759 857385 . - . ID=12001.t00153;Name=receptor kinase ORK10, putative
> >    
> > The gene position can be retrieved as $gene->start, but how can I get the annotation information (receptor kinase ORK10)?
> > 
> > Thanks for your help.
> > 
> > 
> > Xianran Li
> ----- Original Message ----- 
> From: "Scott Cain" <cain.cshl at gmail.com>
> To: "Xianran Li" <xianranli78 at yahoo.com.cn>
> Cc: <bioperl-l at lists.open-bio.org>
> Sent: Wednesday, July 18, 2007 9:10 PM
> Subject: Re: [Bioperl-l] extract information with Bio::DB::GFF3 fromgff3 file
> 
> 
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory


From acutter at eeb.utoronto.ca  Thu Jul 19 22:25:08 2007
From: acutter at eeb.utoronto.ca (Asher Cutter)
Date: Thu, 19 Jul 2007 22:25:08 -0400
Subject: [Bioperl-l] tree comparisons with bioperl
Message-ID: <46A01D04.5040209@eeb.utoronto.ca>

I was reading over the functions for working with trees in bioperl. I am 
looking for something that will compare two topologies and report back 
if they are equivalent. i.e. something like:

does (a,(b,c)) == ((A,B),C)? (in this case, no)

But of course in reality they would be more complicated topologies. This 
would be useful for simulating random trees to compare with some given 
topology of interest.

I saw the methods for testing for monophyly and paraphyly, but not much 
beyond that...perhaps I have missed something?

Any suggestions?

Thanks,
Asher

-- 

___________________________________
Asher D. Cutter
Assistant Professor
Department of Ecology & Evolutionary Biology
University of Toronto
25 Harbord St.
Toronto, ON, M5S 3G5

tel: 416-978-4602
email: acutter at eeb.utoronto.ca
http://www.eeb.utoronto.ca/faculty/faculty_profile.cfm?prof_id=130
___________________________________


From jeevitesh at ibab.ac.in  Fri Jul 20 00:25:22 2007
From: jeevitesh at ibab.ac.in (jeevitesh at ibab.ac.in)
Date: Fri, 20 Jul 2007 09:55:22 +0530 (IST)
Subject: [Bioperl-l] Please Help us - regarding TreeIO for Shared Distance
Message-ID: <53244.192.168.1.125.1184905522.squirrel@webmail.ibab.ac.in>

Hi Friends,

We need your valuable help in finding the SHARED PATH BETWEEN TWO PAIRS OF
NODES OF A TREE.

The Distance method of TreeIO in Bioperl module gives the total distance.

But we NEED TO FIND THE SHARED (COMMON) DISTANCE BETWEEN TWO PAIRS OF NODES, as
illustrated in the figure.

Suppose we have a tree
    A                C
     \              /
      \2          2/
       \__________/
       /    6     \
      /2          2\
     /              \
    B                D

The shared path between AB and AC is 2.
and for AC and BD the shared path is 6

Any comment on this will be greatly appreciated.

With Thanks & regards
jeevitesh


From n.haigh at sheffield.ac.uk  Sun Jul 22 07:34:58 2007
From: n.haigh at sheffield.ac.uk (Nathan S Haigh)
Date: Sun, 22 Jul 2007 12:34:58 +0100
Subject: [Bioperl-l] Ohloh account for bioperl
In-Reply-To: <8f200b4c0707171249n193c5d3fi2aa1cb8b6c102ef7@mail.gmail.com>
References: <8f200b4c0707171249n193c5d3fi2aa1cb8b6c102ef7@mail.gmail.com>
Message-ID: <46A340E2.4040505@sheffield.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Steve Chervitz wrote:
> I came across a web app that tracks various metrics for open source
> projects, noticed that bioperl wasn't listed, and added it:
> 
> http://www.ohloh.net/projects/6685
> 
> Seems like an interesting resource that could help add some
> visibility. It creates metrics by directly processing the source code
> repository. I hooked it up to the CVS repos for bioperl-live, -db,
> -run, and -pipeline. It has yet to do its analysis at this point.
> 
> Feel free to create Ohloh accounts for yourselves. When you add
> yourself as a contributor to Bioperl, you can indicate the username
> associated with your commits, but this requires that it first process
> the commit logs to figure out what the usernames are. You can still
> create an account, just update it later with your username.
> 
> Steve

Nice to see the graphs of the number of commits each developer has made over
the last 5 years, and how new developers have arisen while the more
"seasoned" developers can relax a little more. Proof of an excellent
open source project!

Nath
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGo0Dih5z4PPfwHQoRAua4AJ9nxDJeqAZIbyv0M3g+6Y2xWzkEEgCgnHBO
4JWvG5Gy+H/UqpeXYAcSCX0=
=LrFt
-----END PGP SIGNATURE-----


From cjfields at uiuc.edu  Sun Jul 22 23:53:48 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 22 Jul 2007 22:53:48 -0500
Subject: [Bioperl-l] run megablast
In-Reply-To: <m1r6n66or9.fsf@tuebingen.mpg.de>
References: <m1r6n66or9.fsf@tuebingen.mpg.de>
Message-ID: <1D20EBF8-3E8E-49DD-A66F-4690B4094F7C@uiuc.edu>

StandAloneBlast doesn't run the megablast executable directly, though I  
think you can specify a MegaBlast search using blastall with the '-n'  
flag.

We could probably add this functionality in fairly easily since  
SearchIO can parse megablast output; no one's had the need to code it  
yet.

chris

On Jul 18, 2007, at 4:32 AM, Georg Otto wrote:

>
> Hi,
>
> is there a module to run megablast in a script (equivalent to ncbi
> blast in StandAloneBlast.pm)?
>
> Cheers,
>
> Georg
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From jeevitesh at ibab.ac.in  Mon Jul 23 06:34:36 2007
From: jeevitesh at ibab.ac.in (jeevitesh at ibab.ac.in)
Date: Mon, 23 Jul 2007 16:04:36 +0530 (IST)
Subject: [Bioperl-l] Please Help us - regarding TreeIO for Shared Distance
Message-ID: <34830.192.168.1.125.1185186876.squirrel@webmail.ibab.ac.in>

Hi Friends,

We need your valuable help in finding the SHARED PATH BETWEEN TWO PAIRS OF
NODES OF A TREE.

The Distance method of TreeIO in Bioperl module gives the total distance.

But we NEED TO FIND THE SHARED (COMMON) DISTANCE BETWEEN TWO PAIRS OF NODES, as
illustrated in the figure.

Suppose we have a tree
    A                C
     \              /
      \2          2/
       \__________/
       /    6     \
      /2          2\
     /              \
    B                D

The shared path between AB and AC is 2.
and for AC and BD the shared path is 6.

We need to find the shared distance as said above.

Kindly help us; it will help our research a lot.

With Thanks & regards
jeevitesh


From bix at sendu.me.uk  Mon Jul 23 07:08:23 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 23 Jul 2007 12:08:23 +0100
Subject: [Bioperl-l] Please Help us - regarding TreeIO for Shared
	Distance
In-Reply-To: <34830.192.168.1.125.1185186876.squirrel@webmail.ibab.ac.in>
References: <34830.192.168.1.125.1185186876.squirrel@webmail.ibab.ac.in>
Message-ID: <46A48C27.6060905@sendu.me.uk>

jeevitesh at ibab.ac.in wrote:
> Hi Friends,
> 
> We need your valuable help in finding the SHARED PATH BETWEEN TWO PAIRS OF
> NODES OF A TREE.

Please stop sending this message. We heard you the first time. If no one 
answered, either no one knows the answer or no one understood you.


> The Distance method of TreeIO in Bioperl module gives the total distance.
> 
> But we NEED TO FIND THE SHARED(COMMON) DISTANCE BETWEEN TWO PAIR OF NODES as
> illustrated in figure.
> 
> Suppose we have a tree
>     A                C
>      \              /
>       \2          2/
>        \__________/
>        /    6     \
>       /2          2\
>      /              \
>     B                D
> 
> The shared path between AB and AC is 2.
> and for AC and BD the shared path is 6.

I don't follow. But if you already know how to work the answer out, 
describe the algorithm in words and maybe someone can code it up for you.
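
One possible reading of the figure is that each pair of leaves defines a
path through the tree, and the "shared distance" is the summed length of
the branches that two such paths have in common.  A minimal sketch under
that assumption, in plain Perl without Bio::Tree: the %parent table, the
internal node names X and Y, and all three sub names are invented for
illustration, and the tree is rooted at X only for convenience.

```perl
use strict;
use warnings;

# child => [parent, branch length]; X and Y are made-up names for the
# two internal nodes of the four-leaf tree in the figure.
my %parent = (
    A => ['X', 2], B => ['X', 2],
    Y => ['X', 6],
    C => ['Y', 2], D => ['Y', 2],
);

# Branches (named by their child node) between a node and the root.
sub edges_to_root {
    my ($node) = @_;
    my %edges;
    while (my $up = $parent{$node}) {
        $edges{$node} = $up->[1];
        $node = $up->[0];
    }
    return %edges;
}

# Branches on the path between two nodes: those found on exactly one of
# the two root-ward walks (the part above the common ancestor cancels).
sub path_edges {
    my ($from, $to) = @_;
    my %a = edges_to_root($from);
    my %b = edges_to_root($to);
    my %path;
    $path{$_} = $a{$_} for grep { !exists $b{$_} } keys %a;
    $path{$_} = $b{$_} for grep { !exists $a{$_} } keys %b;
    return %path;
}

# Total length of the branches that the two paths have in common.
sub shared_distance {
    my ($pair1, $pair2) = @_;
    my %p1 = path_edges(@$pair1);
    my %p2 = path_edges(@$pair2);
    my $sum = 0;
    $sum += $p1{$_} for grep { exists $p2{$_} } keys %p1;
    return $sum;
}

print shared_distance(['A', 'B'], ['A', 'C']), "\n";   # prints 2
print shared_distance(['A', 'C'], ['B', 'D']), "\n";   # prints 6
```

If that reading is right, the same idea should carry over to a real tree
object by collecting the branches along each path instead of using a
hand-written table.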


From georg.otto at tuebingen.mpg.de  Mon Jul 23 09:56:46 2007
From: georg.otto at tuebingen.mpg.de (Georg Otto)
Date: Mon, 23 Jul 2007 15:56:46 +0200
Subject: [Bioperl-l] run megablast
References: <m1r6n66or9.fsf@tuebingen.mpg.de>
	<1D20EBF8-3E8E-49DD-A66F-4690B4094F7C@uiuc.edu>
Message-ID: <m11weznrz5.fsf@tuebingen.mpg.de>

Thanks a lot! I guess I should have read the blast documentation more
carefully....

Best,

Georg

Chris Fields <cjfields at uiuc.edu> writes:
> StandAloneBlast doesn't run the megablast executable directly, though I  
> think you can specify a MegaBlast search using blastall with the '-n'  
> flag.
>
> We could probably add this functionality in fairly easily since  
> SearchIO can parse megablast output; no one's had the need to code it  
> yet.
>
> chris
>
> On Jul 18, 2007, at 4:32 AM, Georg Otto wrote:
>
>>
>> Hi,
>>
>> is there a module to run megablast in a script (equivalent to ncbi
>> blast in StandAloneBlast.pm)?
>>
>> Cheers,
>>
>> Georg
>>


From cjfields at uiuc.edu  Mon Jul 23 11:41:35 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 23 Jul 2007 10:41:35 -0500
Subject: [Bioperl-l] Bio::Assembly bug/feature?
Message-ID: <52744D70-CED6-49DB-8A17-0998F125D9AD@uiuc.edu>

To all:

I think I have found a major problem with Bio::Assembly; this was  
first noticed on Mac OS X in relation to bug 2320 and  
Bio::Assembly::IO.  I am uncertain whether this is meant to be a  
feature or a bug but it certainly needs to be documented or fixed as  
it leads to subtle errors.  I also can't see the advantage of this  
approach, but maybe I can be enlightened?  Either way, I think it's  
worth a discussion for those willing to follow.  I'll add as a bug  
later if needed.

A bit of background: each instance of a Bio::Assembly::Contig has a  
Bio::SeqFeature::Collection instance attached to it; each  
Bio::SeqFeature::Collection itself has a tied DB_File handle attached  
which remains open during the lifetime of the Bio::SF::Collection  
object.  When using Bio::Assembly one adds the various Contig objects  
to a Bio::Assembly::Scaffold.  So, for instance, if one had ~1000  
Contigs in a Scaffold, one would also have ~1000 open tied db  
handles, one per Contig instance.  So far, so good.

Unfortunately, when adding a ton of Contig objects to a  
Bio::Assembly::Scaffold one can run into a host of system-dependent  
issues based on resource usage limits (as one might expect).  This  
script:

------------------------------
use Bio::Assembly::Scaffold;
use Bio::Assembly::Contig;
use Bio::SeqFeature::Generic;

my $scaffold = Bio::Assembly::Scaffold->new();

for my $id (1..15000) {
     print "Contig #$id\n";
     my $contig = Bio::Assembly::Contig->new(-id => $id);
     my $feat = Bio::SeqFeature::Generic->new(-start=>1,
                                            -end=>10,
                                            -strand=>1);
     $contig->add_features([$feat]);
     $scaffold->add_contig($contig);
}
------------------------------

may fail on Mac OS X when one reaches the maximum number of open file  
descriptors (on UNIX'y systems, this is 'ulimit -n'); the call to tie  
the DB_File handle in SF::Collection fails silently, so when the  
handle is used later on you get the following:

...
Contig #251
Contig #252
Contig #253
Contig #254
Can't call method "put" on an undefined value at /Users/cjfields/src/bioperl-live/Bio/SeqFeature/Collection.pm line 225.

I have added an exception to catch this.  On Mac OS X you can  
increase the file descriptor limit using ulimit, at least to a  
certain point.  However, when testing this out on dev.open-bio.org  
(Linux) the 'tie' sometimes fails (and the exception pops up), but it  
isn't dependent on 'ulimit -n'.  This is what happens more often:

...
Contig #10567
Contig #10568
Contig #10569
Contig #10570
Out of memory!

Sometimes followed by a seg fault.  Ick!
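
For the file descriptor case only, a rough up-front check might look like
the sketch below.  It uses the core POSIX module; the sub names and the
guess of three handles already in use are assumptions made for
illustration, and it says nothing about the out-of-memory failure seen on
Linux.

```perl
use strict;
use warnings;
use POSIX qw(sysconf _SC_OPEN_MAX);

# Per-process limit on open file descriptors (what 'ulimit -n' reports).
sub open_file_limit {
    my $limit = sysconf(_SC_OPEN_MAX);
    return defined $limit ? $limit : -1;   # -1: limit could not be determined
}

# Rough check: would one open handle per contig fit under the limit?
# $reserved is a guess at handles already in use (STDIN, STDOUT, STDERR).
sub handles_fit {
    my ($wanted, $limit, $reserved) = @_;
    $reserved = 3 unless defined $reserved;
    return 1 if $limit < 0;                # unknown limit: cannot rule it out
    return $wanted + $reserved <= $limit ? 1 : 0;
}

my $limit = open_file_limit();
printf "open file limit: %d\n", $limit;
printf "15000 contigs fit: %s\n", handles_fit(15000, $limit) ? 'yes' : 'no';
```

With a limit of 256, for example, the check starts to fail at 254
contigs, which is in line with the output shown for Mac OS X.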

Any ideas? For instance, should we set this up so that one  
SF::Collection is used for all the Contigs (since each one has a  
unique ID anyway)?  Leave as is and document/track the issue as a  
bug?  Both?

chris


From ba6450 at wayne.edu  Mon Jul 23 16:06:14 2007
From: ba6450 at wayne.edu (Munirul Islam)
Date: Mon, 23 Jul 2007 16:06:14 -0400 (EDT)
Subject: [Bioperl-l] error running codeml
Message-ID: <20070723160614.EEU90041@mirapointms6.wayne.edu>

Hello everyone:

I am new to bioperl.  I am running perl in Eclipse in Windows.  Here is the code:

[code]
use Bio::Tools::Run::Phylo::PAML::Codeml;
use Bio::AlignIO;
use Bio::TreeIO;

my $alignio = Bio::AlignIO->new(-format => 'phylip',
			         -file   => 'NM_000034.CDSalign.paml');

my $aln = $alignio->next_aln;

my $treeio = Bio::TreeIO->new(-format => 'newick', -file => 'tree.txt');
my $tree   = $treeio->next_tree;

my $codeml = Bio::Tools::Run::Phylo::PAML::Codeml->new();

$codeml->alignment($aln);
$codeml->tree($tree);

my ($rc,$parser) = $codeml->run();
my $result = $parser->next_result;
my $MLmatrix = $result->get_MLmatrix();
print "Ka = ", $MLmatrix->[0]->[1]->{'dN'},"\n";
print "Ks = ", $MLmatrix->[0]->[1]->{'dS'},"\n";
print "Ka/Ks = ", $MLmatrix->[0]->[1]->{'omega'},"\n";
[/code]

It gives the following error when I try to compile:

[error]
------------ EXCEPTION: Bio::Root::Exception -------------
MSG: unable to find or run executable for 'codeml'
STACK: Error::throw
STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359
STACK: C:/Perl/site/lib/Bio/Tools/Run/Phylo/PAML/Codeml.pm:572
-----------------------------------------------------------
Can't remove directory C:\DOCUME~1\MUNIRU~1\LOCALS~1\Temp\SqSqwJKDLI (Permission denied) at C:/Perl/lib/File/Temp.pm line 898
[/error]

Any idea, guys?

Munirul Islam
PhD Student
Computer Science
Wayne State University


From arareko at campus.iztacala.unam.mx  Mon Jul 23 17:19:24 2007
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Mon, 23 Jul 2007 16:19:24 -0500
Subject: [Bioperl-l] error running codeml
In-Reply-To: <20070723160614.EEU90041@mirapointms6.wayne.edu>
References: <20070723160614.EEU90041@mirapointms6.wayne.edu>
Message-ID: <46A51B5C.9080808@campus.iztacala.unam.mx>

Apparently, your script isn't able to locate the codeml executable in 
your Windows environment. Do you have the PAML package installed? 
Instructions on how to install it are located here:

http://abacus.gene.ucl.ac.uk/software/paml.html

Regards,
Mauricio.

Munirul Islam wrote:
> Hello everyone:
> 
> I am new to bioperl.  I am running perl in Eclipse in Windows.  Here is the code:
> 
> [code]
> use Bio::Tools::Run::Phylo::PAML::Codeml;
> use Bio::AlignIO;
> use Bio::TreeIO;
> 
> my $alignio = Bio::AlignIO->new(-format => 'phylip',
> 			         -file   => 'NM_000034.CDSalign.paml');
> 
> my $aln = $alignio->next_aln;
> 
> my $treeio = Bio::TreeIO->new(-format => 'newick', -file => 'tree.txt');
> my $tree   = $treeio->next_tree;
> 
> my $codeml = Bio::Tools::Run::Phylo::PAML::Codeml->new();
> 
> $codeml->alignment($aln);
> $codeml->tree($tree);
> 
> my ($rc,$parser) = $codeml->run();
> my $result = $parser->next_result;
> my $MLmatrix = $result->get_MLmatrix();
> print "Ka = ", $MLmatrix->[0]->[1]->{'dN'},"\n";
> print "Ks = ", $MLmatrix->[0]->[1]->{'dS'},"\n";
> print "Ka/Ks = ", $MLmatrix->[0]->[1]->{'omega'},"\n";
> [/code]
> 
> It gives the following error when I try to compile:
> 
> [error]
> ------------ EXCEPTION: Bio::Root::Exception -------------
> MSG: unable to find or run executable for 'codeml'
> STACK: Error::throw
> STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359
> STACK: C:/Perl/site/lib/Bio/Tools/Run/Phylo/PAML/Codeml.pm:572
> -----------------------------------------------------------
> Can't remove directory C:\DOCUME~1\MUNIRU~1\LOCALS~1\Temp\SqSqwJKDLI (Permission denied) at C:/Perl/lib/File/Temp.pm line 898
> [/error]
> 
> Any idea, guys?
> 
> Munirul Islam
> Phd Student
> Computer Science
> Wayne State University

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From ba6450 at wayne.edu  Mon Jul 23 19:53:22 2007
From: ba6450 at wayne.edu (Munirul Islam)
Date: Mon, 23 Jul 2007 19:53:22 -0400 (EDT)
Subject: [Bioperl-l] error running codeml
Message-ID: <20070723195322.EEV22403@mirapointms6.wayne.edu>

Thanks Mauricio. 

I needed to add an environment variable for the PAML directory. 

$ENV{'PAMLDIR'} = 'c:\paml3.15\bin'; 
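
A quick way to confirm that the setting points at the right place is to look for the executable in that directory. This is only a sketch using core Perl: find_program is a hypothetical helper, not part of BioPerl, and the .exe fallback is an assumption about how the Windows build names the binary.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper (not a BioPerl function): return the full path of
# $name inside $dir, or nothing if no such file exists.
sub find_program {
    my ($dir, $name) = @_;
    return unless defined $dir && length $dir;
    # Assumption: on Windows the binary may carry an .exe suffix.
    for my $candidate ($name, "$name.exe") {
        my $path = File::Spec->catfile($dir, $candidate);
        return $path if -f $path;
    }
    return;
}

# Directory taken from the message above; adjust to the local install.
$ENV{'PAMLDIR'} = 'c:\paml3.15\bin';

my $exe = find_program($ENV{'PAMLDIR'}, 'codeml');
print defined $exe
    ? "found codeml at $exe\n"
    : "codeml not found in $ENV{'PAMLDIR'}\n";
```

If the second message is printed, the directory or the program name is probably wrong, which would match the "unable to find or run executable" exception.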

One question ... I would like to save the temp files.  So, what modification do I need to make such that 
$obj->save_tempfiles returns 1 within codeml.pm? 

Regards 

Munir



From jason at bioperl.org  Tue Jul 24 03:19:18 2007
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 24 Jul 2007 09:19:18 +0200
Subject: [Bioperl-l] error running codeml
In-Reply-To: <46A51B5C.9080808@campus.iztacala.unam.mx>
References: <20070723160614.EEU90041@mirapointms6.wayne.edu>
	<46A51B5C.9080808@campus.iztacala.unam.mx>
Message-ID: <8273f6c20707240019q1f5e55c9i79a3142a92e2be6e@mail.gmail.com>

When you initialize the Codeml object, just pass in -save_tempfiles:
my $codeml = Bio::Tools::Run::Phylo::PAML::Codeml->new(-save_tempfiles => 1);

OR do
$codeml->save_tempfiles(1);

You may want to set your TEMPDIR as well, and you can print out where the
tempdir is located with
print $codeml->tempdir;
and I think you can get the name of the temp outfile with
my $name = $codeml->outfile_name;
print "name is $name\n";

-jason


-- 
Jason Stajich
jason at bioperl.org
http://bioperl.org/wiki/User:Jason


From ba6450 at wayne.edu  Tue Jul 24 17:16:54 2007
From: ba6450 at wayne.edu (Munirul Islam)
Date: Tue, 24 Jul 2007 17:16:54 -0400 (EDT)
Subject: [Bioperl-l] error loading sequence
Message-ID: <20070724171654.EEX04380@mirapointms6.wayne.edu>

Hello everyone:

I am having a problem loading a sequence file from within a directory.

#############################################################
$dirname = "rundir";
opendir (DIR, $dirname) || die("can't open $dirname");
      
while (defined($file = readdir(DIR))) {
    next if $file =~ /^\.\.?$/;		# skip . and ..
    $abs_path = File::Spec->rel2abs( $file ) ;
    
    # gives a file not found exception for the following code
    my $alignio = Bio::AlignIO->new(-format => 'nexus',
				-file   => $abs_path);
    my $aln = $alignio->next_aln;
    @sequencenames -> $aln->_read_taxlabels;
	  		
    foreach $taxa (@sequencenames) {
	print $taxa . "\n";
    } 		
}        
#############################################################

Your suggestions please.

Regards,

Munirul Islam
PhD Student
Computer Science
Wayne State University
Detroit, Michigan, USA


From bix at sendu.me.uk  Tue Jul 24 18:39:33 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 24 Jul 2007 23:39:33 +0100
Subject: [Bioperl-l] error loading sequence
In-Reply-To: <20070724171654.EEX04380@mirapointms6.wayne.edu>
References: <20070724171654.EEX04380@mirapointms6.wayne.edu>
Message-ID: <46A67FA5.3070505@sendu.me.uk>

Munirul Islam wrote:
> Hello everyone:
> 
> I am having problem loading a sequence file from within a directory.  
> 
> #############################################################
> $dirname = "rundir";
> opendir (DIR, $dirname) || die("can't open $dirname");
>       
> while (defined($file = readdir(DIR))) {
>     next if $file =~ /^\.\.?$/;		# skip . and ..
>     $abs_path = File::Spec->rel2abs( $file ) ;
>     
>     # gives a file not found exception for the following code

This isn't a Bioperl problem. You're using the wrong File::Spec method. 
You want File::Spec->catfile($dirname, $file).
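
To spell that out: readdir returns bare file names, and File::Spec->rel2abs($file) resolves a name against the current working directory, not against rundir, so the resulting path does not exist. A minimal sketch of the corrected loop follows, using only core modules. files_in is a hypothetical helper name, and the AlignIO call is left as a comment because it needs BioPerl and real NEXUS files.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: list the plain files in $dirname, each joined to
# the directory name so the path is valid from the current directory.
sub files_in {
    my ($dirname) = @_;
    opendir(my $dh, $dirname) or die "can't open $dirname: $!";
    my @paths;
    while (defined(my $file = readdir $dh)) {
        my $path = File::Spec->catfile($dirname, $file);
        next unless -f $path;    # also skips . and .. and subdirectories
        push @paths, $path;
    }
    closedir $dh;
    my @sorted = sort @paths;
    return @sorted;
}

my $dirname = 'rundir';
if (-d $dirname) {
    for my $path (files_in($dirname)) {
        print "$path\n";
        # my $alignio = Bio::AlignIO->new(-format => 'nexus', -file => $path);
    }
}
```

Testing each joined path with -f replaces the regular expression that skipped . and .., and it also keeps subdirectories out of the list.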


From ba6450 at wayne.edu  Tue Jul 24 20:10:04 2007
From: ba6450 at wayne.edu (Munirul Islam)
Date: Tue, 24 Jul 2007 20:10:04 -0400 (EDT)
Subject: [Bioperl-l] error loading sequence
Message-ID: <20070724201004.EEX30791@mirapointms6.wayne.edu>

Thanks.  That worked nicely.  I need your suggestion to load codeml control data from a file.  Consider the following code:

-------------------------------------------------------------
my $codeml_null = Bio::Tools::Run::Phylo::PAML::Codeml->new(-save_tempfiles => 1,
-params =>	{'noisy' => 9,
		 'verbose' => 2,
		 'runmode' => 0,
		 'seqtype' => 1,
		 'CodonFreq' => 2,
		 'aaDist' => 0,
		 'model' => 2,
		 'NSsites' => 2,
		 'icode' => 0	});
-------------------------------------------------------------

I tried to modify it by passing a hash reference after loading the data from a file:

-------------------------------------------------------------
my $codeml_null = Bio::Tools::Run::Phylo::PAML::Codeml->new(-save_tempfiles => 1,
-params => \%hashlist );
-------------------------------------------------------------

That still didn't work.  Your suggestions, please.
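
The thread does not say why the hash-reference version failed, so the following is only a sketch of one way to fill such a hash from a file, not a diagnosis. It assumes the file uses the codeml.ctl layout ("name = value", with '*' starting a comment); read_params is a hypothetical helper, and the sample text is made up from the parameter values in the message.

```perl
use strict;
use warnings;

# Hypothetical helper: parse "name = value" lines into a hash.
# Assumption: '*' starts a comment that runs to the end of the line.
sub read_params {
    my ($fh) = @_;
    my %params;
    while (my $line = <$fh>) {
        $line =~ s/\*.*//;                                  # drop comments
        next unless $line =~ /^\s*(\w+)\s*=\s*(.*?)\s*$/;   # name = value
        next unless length $2;                              # skip empty values
        $params{$1} = $2;
    }
    return %params;
}

# Made-up sample, read through an in-memory filehandle for illustration.
my $text = <<'END';
noisy = 9      * illustrative comment
verbose = 2
runmode = 0
seqtype = 1
END

open(my $fh, '<', \$text) or die "can't open in-memory file: $!";
my %hashlist = read_params($fh);
close $fh;

print "$_ => $hashlist{$_}\n" for sort keys %hashlist;

# With BioPerl installed, the reference is then passed as in the message:
# my $codeml_null = Bio::Tools::Run::Phylo::PAML::Codeml->new(
#     -save_tempfiles => 1, -params => \%hashlist);
```

Printing the hash before constructing the object makes it easy to see whether the file was parsed as intended.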

Munir



From ba6450 at wayne.edu  Thu Jul 26 15:21:20 2007
From: ba6450 at wayne.edu (Munirul Islam)
Date: Thu, 26 Jul 2007 15:21:20 -0400 (EDT)
Subject: [Bioperl-l] Alignment works with codeml (not loading in bioperl)
Message-ID: <20070726152120.EFA94600@mirapointms6.wayne.edu>

Hello Everyone:

I have an alignment ('seq.txt').  It runs fine when I directly run codeml.  But when I try to load the alignment in bioperl, I get an error saying that the sequence is not interleaved.

my $alignio = Bio::AlignIO->new(-format => 'phylip',
				-file   => 'seq.txt');

I guess it's not in valid phylip format.

I tried to change 'seq.txt' to sequential format.  Still that didn't work.

Any suggestions on how to load 'seq.txt' in bioperl?  

Thanks,

Munir
PhD Student
Computer Science
Wayne State University
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: seq.txt
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070726/7c180f0b/attachment-0003.txt>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: seq.out
Type: application/octet-stream
Size: 24318 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070726/7c180f0b/attachment-0003.obj>

From jason at bioperl.org  Thu Jul 26 20:12:03 2007
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 26 Jul 2007 17:12:03 -0700
Subject: [Bioperl-l] Alignment works with codeml (not loading in bioperl)
In-Reply-To: <20070726152120.EFA94600@mirapointms6.wayne.edu>
References: <20070726152120.EFA94600@mirapointms6.wayne.edu>
Message-ID: <8273f6c20707261712o149fb884v2044421146e8bc24@mail.gmail.com>

You can try and pass in -interleaved => 0 as another option when you
init your AlignIO object.
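
If the option does not help, it may be worth checking which layout the file really has. The attachment quoted below starts with a count line and then puts each name on its own line ahead of its sequence, which differs from the classic PHYLIP layout where the name begins the first sequence line. A rough core-Perl sketch of reading that layout is given here. read_sequential is a hypothetical helper, not BioPerl's parser, and it assumes every sequence has exactly the number of sites given in the header.

```perl
use strict;
use warnings;

# Hypothetical helper: read "ntaxa nsites", then records made of a name
# line followed by sequence lines until nsites characters are collected.
sub read_sequential {
    my ($fh) = @_;
    my ($ntaxa, $nsites, $current);
    my (@names, %seq);
    while (my $line = <$fh>) {
        $line =~ s/^\s+//;
        $line =~ s/\s+$//;
        next unless length $line;
        if (!defined $ntaxa) {
            $line =~ /^(\d+)\s+(\d+)$/ or die "bad header line: $line\n";
            ($ntaxa, $nsites) = ($1, $2);
            next;
        }
        # A new record starts when there is none yet or the last is full.
        if (!defined $current || length($seq{$current}) >= $nsites) {
            $current = $line;
            push @names, $current;
            $seq{$current} = '';
            next;
        }
        $line =~ s/\s+//g;           # drop the spaces between codons
        $seq{$current} .= $line;
    }
    die "no header line found\n" unless defined $ntaxa;
    die "expected $ntaxa sequences, found " . scalar(@names) . "\n"
        unless @names == $ntaxa;
    return (\@names, \%seq);
}

# Tiny made-up input in the same layout as the attachment.
my $text = " 2 6\n\nhuman\nGCT GCC\nchimp\nGCT GCA\n";
open(my $fh, '<', \$text) or die "can't open in-memory file: $!";
my ($names, $seq) = read_sequential($fh);
close $fh;
print "$_: $seq->{$_}\n" for @$names;
```
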

On 7/26/07, Munirul Islam <ba6450 at wayne.edu> wrote:
> Hello Everyone:
>
> I have an alignment ('seq.txt').  It runs fine when I directly run codeml.  But when I try to load the alignment in bioperl, I get an error saying that the sequence is not interleaved.
>
> my $alignio = Bio::AlignIO->new(-format => 'phylip',
>                                 -file   => 'seq.txt');
>
> I guess its not in valid phylip format.
>
> I tried to change 'seq.txt' to sequential format.  Still that didn't work.
>
> Any suggestions on how to load 'seq.txt' in bioperl?
>
> Thanks,
>
> Munir
> PhD Student
> Computer Science
> Wayne State University
>
>      11     2202
>
> human
> GCT GCC CCC AGA CAG CGC --- CAG TCC --- ACC CTT --- GTG CTC TTC CCT GGA GAC CTG
> AGA ACC NNN AAT CTC ACC GAC AGG CAG CTG GCA GAG GAA TAC CTG TAC CGC TAT GGT TAC
> ACT CGG GTG GCA GAG ATG CGT --- GGA GAG TCG AAA --- TCT CTG GGG --- CCT GCG CTG
> CTG CTT CTC CAG AAG CAA CTG TCC CTG CCC GAG ACC GGT GAG CTG GAT AGC GCC ACG CTG
> AAG GCC ATG CGA ACC CCA CGG TGC GGG GTC CCA GAC CTG GGC AGA TTC CAA ACC TTT GAG
> GGC GAC CTC AAG TGG CAC CAC CAC AAC ATC ACC TAT TGG ATC CAA AAC TAC TCG GAA GAC
> TTG CCG CGG GCG GTG ATT GAC GAC GCC TTT GCC CGC GCC TTC GCA CTG TGG AGC GCG GTG
> ACG CCG CTC ACC TTC ACT CGC GTG TAC AGC CGG GAC GCA GAC ATC GTC ATC CAG TTT GGT
> GTC GCG GAG CAC GGA GAC GGG TAT CCC TTC GAC GGG AAG GAC GGG CTC CTG GCA CAC GCC
> TTT CCT CCT GGC CCC GGC ATT CAG GGA GAC GCC CAT TTC GAC GAT GAC GAG TTG TGG TCC
> CTG GGC AAG GGC GTC GTG GTT CCA ACT CGG TTT GGA AAC GCA GAT GGC GCG GCC TGC CAC
> TTC CCC TTC ATC TTC GAG GGC CGC TCC TAC TCT GCC TGC ACC ACC GAC GGT CGC TCC GAC
> GGC TTG CCC TGG TGC AGT ACC ACG GCC AAC TAC GAC ACC GAC GAC CGG TTT GGC TTC TGC
> CCC AGC GAG AGA CTC TAC ACC CAG GAC GGC AAT GCT GAT GGG AAA CCC TGC CAG TTT CCA
> TTC ATC TTC CAA GGC CAA TCC TAC TCC GCC TGC ACC ACG GAC GGT CGC TCC GAC GGC TAC
> CGC TGG TGC GCC ACC ACC GCC AAC TAC GAC CGG GAC AAG CTC TTC GGC TTC TGC CCG ACC
> CGA GCT GAC TCG ACG GTG ATG GGG GGC AAC TCG GCG GGG GAG CTG TGC GTC TTC CCC TTC
> ACT TTC CTG GGT AAG GAG TAC TCG ACC TGT ACC AGC GAG GGC CGC GGA GAT GGG CGC CTC
> TGG TGC GCT ACC ACC TCG AAC TTT GAC AGC GAC AAG AAG TGG GGC TTC TGC CCG GAC CAA
> GGA TAC AGT TTG TTC CTC GTG GCG GCG CAT GAG TTC GGC CAC GCG CTG GGC TTA GAT CAT
> TCC TCA GTG CCG GAG GCG CTC ATG TAC CCT ATG TAC CGC TTC ACT GAG GGG CCC CCC TTG
> CAT AAG GAC GAC GTG AAT GGC ATC CGG CAC CTC TAT GGT CCT CGC CCT GAA CCT GAG CCA
> CGG CCT CCA ACC ACC ACC ACN NNN NNN NNN NNN NNA --- CCG CAG CCC ACG GCT CCC CCG
> ACG GTC TGC CCC ACC GGA CCC CCC ACT GTC CAC CCC TCA GAG CGC CCC ACA GCT GGC CCC
> ACA GGT CCC CCC TCA GCT GGC CCC ACA GGT NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN
> NNN NNN NNN NNN NNN NNN CCC --- CCC ACT GCT GGC CCT TCT ACG GCC ACT ACT --- GTG
> CCT TTG AGT CCG GTG GAC GAT GCC TGC AAC GTG AAC ATC TTC GAC GCC ATC GCG GAG ATT
> GGG AAC CAG CTG TAT TTG TTC AAG GAT GGG AAG TAC TGG --- --- CGA TTC TCT GAG ---
> --- GGC AGG GGG AGC CGG CCG CAG GGC CCC TTC CTT ATC GCC GAC AAG TGG CCC GCG CTG
> CCC CGC AAG CTG GAC TCG GTC TTT GAG GAG CGG CTC TCC AAG AAG CTT TTC TTC TTC TCT
> GGG CGC CAG GTG TGG GTG TAC ACA GGC GCG TCG GTG CTG GGC CCG AGG CGT CTG GAC AAG
> CTG GGC CTG GGA GCC GAC GTG GCC CAG GTG ACC GGG GCC CTC --- CGG AGT GGC AGG GGG
> AAG ATG CTG CTG TTC AGC GGG CGG CGC CTC TGG AGG TTC GAC GTG AAG GCG CAG ATG GTG
> GAT CCC CGG AGC GCC AGC GAG --- --- GTG GAC CGG ATG TTC CCC GGG GTG CCT TTG GAC
> ACG CAC GAC GTC TTC CAG TAC CGA --- GAG AAA GCC TAT --- TTC TGC CAG GAC CGC TTC
> TAC TGG CGC GTG AGT TCC CGG AGT NNN NNN NNN NNN NNN NNN NNN GAG TTG AAC CAG GTG
> GAC CAA GTG GGC TAC GTG ACC TAT GAC ATC CTG CAG TGC CCT
> chimp
> GCT GCC CCC AGA CAG CGC --- CAG TCC --- ACC CTT --- GTG CTC TTC CCT GGA GAC CTG
> AGA ACC NNN AAT CTC ACC GAC AGG CAG CTG GCA GAG GAA TAC CTG TAC CGC TAT GGT TAC
> ACT CGG GTG GCA GAG ATG CGT --- GGA GAG TCG AAA --- TCT CTG GGG --- CCT GCG CTG
> CTG CTT CTC CAG AAG CAA CTG TCC CTG CCC GAG ACC GGT GAG CTG GAC AGC GCC ACG CTG
> AAG GCC ATG CGA ACC CCA CGG TGC GGG GTC CCA GAC CTG GGC AAA TTC CAA ACC TTT GAG
> GGC GAC CTC AAG TGG CAC CAC CAC AAC ATC ACC TAT TGG ATC CAA AAC TAC TCG GAA GAC
> TTG CCG CGG GCG GTG ATT GAC GAC GCC TTT GCC CGC GCC TTC GCA CTG TGG AGC GCG GTG
> ACG CCG CTC ACC TTC ACT CGC GTG TAC AGC CGG GAC GCA GAC ATC GTC ATC CAG TTT GGT
> GTC GCG GAG CAC GGA GAC GGG TAT CCC TTC GAC GGG AAG GAC GGG CTC CTG GCA CAC GCC
> TTT CCT CCT GGC CCC GGC ATT CAG GGA GAC GCC CAT TTC GAC GAT GAC GAG TTG TGG TCC
> CTG GGC AAG GGC GTC GTG GTT CCA ACT CGG TTT GGA AAC GCA GAT GGC GCG GCC TGC CAC
> TTC CCC TTC ATC TTC GAG GGC CGC TCC TAC TCT GCC TGC ACC ACC GAC GGT CGC TCC GAC
> GGC TTG CCC TGG TGC AGT ACC ACG GCC AAC TAC GAC ACC GAC GAC CGG TTT GGC TTC TGC
> CCC AGC GAG AGA CTT TAC ACC CAG GAT GGC AAT GCT GAT GGG AAA CCC TGC CAG TTT CCA
> TTC ATC TTC CAA GGC CAA TCC TAC TCC GCC TGC ACC ACG GAC GGT CGC TCC GAC GGC TAC
> CGC TGG TGC GCC ACC ACC GCC AAC TAC GAC CGG GAC AAG CTC TTC GGC TTC TGC CCG ACC
> CGA GCT GAC TCG ACG GTG ATG GGG GGC AAC TCG GCG GGG GAG CTG TGC GTC TTC CCC TTC
> ACT TTC CTG GGT AAG GAG TAC TCG ACC TGT ACC AGC GAG GGC CGC GGA GAT GGG CGC CTC
> TGG TGC GCT ACC ACC TCG AAC TTT GAC AGC GAC AAG AAG TGG GGC TTC TGC CCG GAC CAA
> GGA TAC AGT TTG TTC CTC GTG GCG GCG CAT GAG TTC GGC CAC GCG CTG GGC TTA GAT CAT
> TCC TCA GTG CCG GAG GCG CTC ATG TAC CCT ATG TAC CGC TTC ACT GAG GGG CCC CCC TTG
> CAT AAG GAC GAC GTG AAT GGC ATC CGG CAC CTC TAT GGT CCT CGC CCT GAA CCT GAG CCA
> CGG CCT CCA ACC ACC ACC ACN NNN NNN NNN NNN NNA --- CCG CAG CCC ACG GCT CCC CCG
> ACG GTC TGC CCC ACC GGA CCC CCC ACT GTC CGC CCC TCA GAG CGC CCC ACA GCT GGC CCC
> ACA GGT CCC CCC NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN
> NNN NNN NNN NNN NNN NNN NNN --- NNN ACT GCT GGC CCT TCT ACG GCC ACT ACT --- GTG
> CCT TTG AGT CCG GTG GAC GAT GCC TGC AAC GTG AAC ATC TTC GAC GCC ATC GCG GAG ATT
> GGG AAC CAG CTG TAT TTG TTC AAG GAT GGG AAG TAC TGG --- --- CGA TTC TCT GAG ---
> --- GGC AGG GGG AGC CGG CCG CAG GGC CCC TTC CTT ATC GCC GAC AAG TGG CCC GCG CTG
> CCC CGC AAG CTG GAC TCG GTC TTT GAG GAG CCG CTC TCC AAG AAG CTT TTC TTC TTC TCT
> GGG CGC CAG GTG TGG GTG TAC ACA GGC GCG TCG GTG CTG GGC CCG AGG CGT TTG GAC AAG
> CTG GGC CTG GGC GCC GAC GTG GCC CAG GTG ACC GGG GCC CTC --- CGG AGT GGC AGG GGG
> AAG ATG CTG CTG TTC AGC GGG CGG CGC CTC TGG AGG TTC GAC GTG AAG GCG CAG ATG GTG
> GAT CCC CGG AGC GCC AGC GAG --- --- GTG GAC CGG ATG TTC CCC GGG GTG CCT TTG GAC
> ACG CAC GAC GTC TTC CAG TAC CGA --- GAG AAA GCC TAT --- TTC TGC CAG GAC CGC TTC
> TAC TGG CGC GTG AGT TCC CGG AAT NNN NNN NNN NNN NNN NNN NNN GAG TTG AAC CAG GTG
> GAC CAA GTG GGC TAC GTG ACC TAT GAC ATC CTG CAG TGC CCT
> macaca
> GCT GCC CCC AGA CAG CGC --- CAG TCC --- ACC CTT --- GTG CTC TTC CCT GGA GAC CTG
> AAA ACC NNN AAT CTC ACT GAC AGG CAG CTG GCA GAG GAC TAC CTG TAC CGC TAT GGT TAC
> ACT CGG GTG GCA GAG ATG CAT --- GGA GAC TCG AAA --- TCT CTG GGG --- CCT GCG CTG
> CTG CTT CTC CAG AAG CAA CTG TCC CTG CCC CAG ACC GGT GAG CTA GAC AGC GCC ACG CTG
> AAG GCC ATG CGA ACC CCA CGG TGC GGG GTC CCA GAC CTG GGC AGA TTC CAA ACC TTT GAG
> GGC GAC CTC AAG TGG CAC CAC CAC AAC ATC ACC TAT TGG ATC CAA AAC TAC TCG GAA GAC
> TTG CCG CGG GCG GTG ATT GAA GAC GCC TTT GCC CGC GCC TTT GCG CTG TGG AGC GCG GTG
> ACG CCG CTC ACC TTC ACT CGC GTG TAC AGC CGG GAC GCA GAC ATT GTC ATC CAG TTT GGT
> GTC GCG GAG CAC GGA GAC GGG TAT CCC TTC GAC GGG AAG GAC GGG CTC CTG GCA CAC GCC
> TTT CCT CCT GGG CCC GGC ATT CAG GGA GAC GCC CAT TTC GAC GAT GAC GAG TTG TGG TCG
> CTG GGC AAG GGC GTC GTG GTT CCA ACT AAG TTT GGA AAC GCA GAT GGC GCG GCC TGC CAC
> TTC CCC TTC ACC TTC GAG GGC CGC TCC TAC TCT GCC TGC ACC ACA GAC GGT CGC TCC GAC
> GGC GTG CCC TGG TGC AGT ACC ACA GCC AAC TAC GAC ACT GAC CGC CGG TTT GGC TTC TGT
> CCC AGC GAG AGA CTC TAC ACC CAG GAC GGC AAT GCT GAC GGG AAA CCC TGC CAG TTT CCA
> TTC ATC TTC CAA GGC CAA TCC TAC TCC GCC TGC ACC ACG GAC GGT CGC TCC GAC GGC TAC
> CGC TGG TGC GCC ACC ACC GCC AAC TAC GAC CAG GAC AAG CTC TAC GGC TTC TGC CCG ACC
> CGA GCC GAC TCG ACC GTG ATC GGG GGC AAC TCG GCG GGG GAG CTG TGC GTT TTC CCC TTC
> ACC TTC CTG GGT AAG GAG TAC TCG ACC TGT ACC AGC GAG GGC CGC GGA GAT GGG CGC CTC
> TGG TGC GCT ACC ACC TCG AAC TTT GAC AGA GAC AAG AAG TGG GGC TTC TGC CCG GAC CAA
> GGA TAC AGT CTG TTC CTC GTG GCA GCT CAC GAA TTC GGC CAC GCG CTG GGC TTA GAT CAT
> ACC TCA GTG CCG GAG GCG CTC ATG TAC CCT ATG TAC CGA TTC ACT GAG GAG CCC CCC TTG
> CAT AAG GAC GAC GTG AAT GGC ATC CAG TAT CTC TAT GGT TCT CGC CCT GAA CCT GAG CCA
> CGG CCT CCA ACC ACC ACC ACN NNN NNN NNN NNN NNA --- CCG CAG CCC ACG GCT CCC CCG
> ACG GTC TGC CCC ACT GGA CCC CCC ACT GTC CGC CCC TCA GAC CGC CCC ACA GCC GGC CCC
> ACA GGT CCC CCC TCA GCT GGC CCC ACA GGT NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN
> NNN NNN NNN NNN NNN NNN CCC --- CCC ACT GCT GGC CCT TCT ACG ACC ACT ACT --- GTG
> CCT TTG AAT CCG GTG GAC GAT GCC TGC AAC GTG AAC ATC TTC GAC GCC ATC ACG GAG ATC
> GGG AAC CAG CTG TAT CTG TTC AAG GAT GGG AGG TAC TGG --- --- CGA TTC TCC GAG ---
> --- CGC AGG GGG AGC CGG CTG CAG GGC CCC TTC CTT ATC GCC GAC ACG TGG CCC GCG TTG
> CCC CGC AAG CTG GAC TCG GCC TTT GAG GAG CCG CTC TCC AAG AAG CTT TTC TTC TTC TCT
> GGG CGC CAG GTG TGG GTG TAC ACA GGC GCG TCG GTG CTG GGC CCG AGG CGT CTA GAC AAG
> CTG GGC CTG GGC GCC GAC GTG GCC CAG GTG ACC GGG GCC CTC --- CGG CGT GGC GCG GGG
> AAG ATG CTG CTA TTC AGC GGG CGG CGC TTC TGG AGG TTC GAC GTG AAG GCG CAG ATG GTG
> GAT CCC CGG AGC GCC AGC GAG --- --- GTA GAC CGG ATG TTC CCC GGG GTG CCT TTG GAC
> ACG CAC GAC GTC TTC CAG TAC CAA --- GAG AAA GCC TAT --- TTC TGC CAG GAC CGC TTC
> TAC TGG CGC GTG AGT TTC CAG AGT NNN NNN NNN NNN NNN NNN NNN GGG GTG AAC CAG GTG
> GAC CAA GTG GGC TAC GTG ACC TAT GAC ATC CTG CAG TGC CCT
> mouse
> GCT GCC CCT TAC CAG CGC --- CAG CCG --- ACT TTT --- GTG GTC TTC CCC AAA GAC CTG
> AAA ACC TCC AAC CTC ACG GAC ACC CAG CTG GCA GAG GCA TAC TTG TAC CGC TAT GGT TAC
> ACC CGG GCC GCC CAG ATG ATG --- GGA GAG AAG CAG --- TCT CTA CGG --- CCG GCT TTG
> CTG ATG CTT CAG AAG CAG CTC TCC CTG CCC CAG ACT GGT GAG CTG GAC AGC CAG ACA CTA
> AAG GCC ATT CGA ACA CCA CGC TGT GGT GTC CCA GAC GTG GGT CGA TTC CAA ACC TTC AAA
> GGC NNN CTC AAG TGG GAC CAT CAT AAC ATC ACA TAC TGG ATC CAA AAC TAC TCT GAA GAC
> TTG CCG CGA GAC ATG ATC GAT GAC GCC TTC GCG CGC GCC TTC GCG GTG TGG GGC GAG GTG
> GCA CCC CTC ACC TTC ACC CGC GTG TAC GGA CCC GAA GCG GAC ATT GTC ATC CAG TTT GGT
> GTC GCG GAG CAC GGA GAC GGG TAT CCC TTC GAC GGC AAG GAC GGC CTT CTG GCA CAC GCC
> TTT CCC CCT GGC GCC GGC GTT CAG GGA GAT GCC CAT TTC GAC GAC GAC GAG TTG TGG TCG
> CTG GGC AAA GGC GTC GTG ATC CCC ACT TAC TAT GGA AAC TCA AAT GGT GCC CCA TGT CAC
> TTT CCC TTC ACC TTC GAG GGA CGC TCC TAT TCG GCC TGC ACC ACA GAC GGC CGC AAC GAC
> GGC ACG CCT TGG TGT AGC ACA ACA GCT GAC TAC GAT AAG GAC GGC AAA TTT GGT TTC TGC
> CCT AGT GAG AGA CTC TAC ACG GAG CAC GGC AAC GGA GAA GGC AAA CCC TGT GTG TTC CCG
> TTC ATC TTT GAG GGC CGC TCC TAC TCT GCC TGC ACC ACT AAA GGC CGC TCG GAT GGT TAC
> CGC TGG TGC GCC ACC ACA GCC AAC TAT GAC CAG GAT AAA CTG TAT GGC TTC TGC CCT ACC
> CGA GTG GAC GCG ACC GTA GTT GGG GGC AAC TCG GCA GGA GAG CTG TGC GTC TTC CCC TTC
> GTC TTC CTG GGC AAG CAG TAC TCT TCC TGT ACC AGC GAC GGC CGC AGG GAT GGG CGC CTC
> TGG TGT GCG ACC ACA TCG AAC TTC GAC ACT GAC AAG AAG TGG GGT TTC TGT CCA GAC CAA
> GGG TAC AGC CTG TTC CTG GTG GCA GCG CAC GAG TTC GGC CAT GCA CTG GGC TTA GAT CAT
> TCC AGC GTG CCG GAA GCG CTC ATG TAC CCG CTG TAT AGC TAC CTC GAG GGC TTC CCT CTG
> AAT AAA GAC GAC ATA GAC GGC ATC CAG TAT CTG TAT GGT CGT GGC TCT AAG CCT GAC CCA
> AGG CCT CCA GCC ACC ACC ACA ACT NNN NNN NNN GAA --- CCA CAG CCG ACA GCA CCT CCC
> ACT ATG TGT CCC ACT ATA CCT CCC ACG GCC TAT CCC ACA GTG GGC CCC ACG GTT GGC CCT
> ACA GGC GCC CCC TCA CCT GGC CCC ACA AGC AGC CCG TCA CCT GGC CCT ACA GGC GCC CCC
> TCA CCT GGC CCT ACA GCG CCC --- CCT ACT GCG GGC TCT TCT GAG GCC TCT ACA --- GAG
> TCT TTG AGT CCG GCA GAC AAT CCT TGC AAT GTG GAT GTT TTT GAT GCT ATT GCT GAG ATC
> CAG GGC GCT CTG CAT TTC TTC AAG GAC GGT TGG TAC TGG --- --- AAG TTC CTG AAT ---
> --- CAT AGA GGA AGC CCA TTA CAG GGC CCC TTC CTT ACT GCC CGC ACG TGG CCA GCC CTG
> CCT GCA ACG CTG GAC TCC GCC TTT GAG GAT CCG CAG ACC AAG AGG GTT TTC TTC TTC TCT
> GGA CGT CAA ATG TGG GTG TAC ACA GGC AAG ACC GTG CTG GGC CCC AGG AGT CTG GAT AAG
> TTG GGT CTA GGC CCA GAG GTA ACC CAC GTC AGC GGG CTT CTC CCG CGT CGT CTC --- GGG
> AAG GCT CTG CTG TTC AGC AAG GGG CGT GTC TGG AGA TTC GAC TTG AAG TCT CAG AAG GTG
> GAT CCC CAG AGC GTC ATT CGC --- --- GTG GAT AAG GAG TTC TCT GGT GTG CCC TGG AAC
> TCA CAC GAC ATC TTC CAG TAC CAA --- GAC AAA GCC TAT --- TTC TGC CAT GGC AAA TTC
> TTC TGG CGT GTG AGT TTC CAA AAT GAG GTG AAC AAG GTG GAC CAT GAG GTG AAC CAG GTG
> GAC GAC GTG GGC TAC GTG ACC TAC GAC CTC CTG CAG TGC CCT
> rat
> GCT GCC CCT CAC CAG CGC --- CAG CCG --- ACT TAT --- GTG GTC TTC CCC CGA GAC CTG
> AAA ACC TCC AAC CTC ACG GAC ACA CAG CTG GCA GAG GAT TAC CTG TAC CGC TAT GGT TAC
> ACT CGG GCA GCC CAG ATG ATG --- GGA GAG AAG CAG --- TCC CTG CGG --- CCC GCT TTG
> CTG ATG CTT CAG AAG CAG CTG TCC CTG CCC CAG ACT GGT GAG CTG GAC AGC GAG ACA CTA
> AAG GCC ATT CGT TCA CCG CGC TGT GGT GTC CCA GAC GTG GGC AAA TTC CAA ACC TTC GAA
> GGC GAC CTC AAG TGG CAC CAT CAT AAC ATC ACC TAT TGG ATC CAA AGC TAC ACC GAA GAC
> TTG CCG CGA GAC GTG ATC GAT GAC TCC TTC GCG CGC GCC TTC GCG GTG TGG AGC GCG GTG
> ACA CCG CTC ACC TTC ACC CGC GTG TAC GGG CTC GAA GCA GAC ATT GTC ATC CAG TTT GGT
> GTC GCG GAG CAC GGG GAC GGG TAT CCC TTC GAC GGC AAG GAT GGT CTA CTG GCA CAC GCC
> TTT CCC CCT GGC CCC GGC ATT CAG GGA GAT GCC CAC TTC GAC GAC GAC GAG TTG TGG TCG
> CTG GGC AAA GGC GCC GTG GTC CCC ACT TAC TTT GGA AAC GCA AAT GGT GCC CCA TGT CAC
> TTT CCC TTC ACC TTC GAG GGA CGC TCC TAT TTG TCC TGC ACC ACG GAT GGC CGC AAC GAC
> GGC AAG CCT TGG TGT GGC ACG ACA GCT GAC TAC GAC ACA GAC AGA AAA TAT GGT TTC TGC
> CCC AGT GAG AAT CTC TAC ACG GAG CAT GGC AAC GGA GAC GGC AAA CCC TGC GTA TTT CCA
> TTC ATC TTC GAG GGC CAC TCC TAC TCT GCC TGC ACC ACT AAA GGT CGC TCG GAT GGT TAT
> CGC TGG TGC GCC ACC ACC GCC AAC TAT GAC CAG GAT AAG CTG TAT GGC TTC TGT CCT ACT
> CGA GCC GAC GTC ACT GTA ACT GGG GGC AAC TCG GCA GGA GAG ATG TGC GTC TTC CCC TTC
> GTC TTC CTG GGC AAG CAG TAC TCT ACC TGT ACC GGC GAG GGC CGC AGT GAT GGG CGC CTC
> TGG TGC GCG ACG ACG TCG AAC TTC GAC GCT GAC AAG AAG TGG GGT TTC TGT CCA GAC CAA
> GGG TAC AGC CTG TTT CTG GTG GCA GCG CAC GAG TTC GGC CAT GCG CTG GGC TTA GAT CAT
> TCT TCA GTG CCG GAA GCG CTC ATG TAC CCC ATG TAT CAC TAC CAC GAG GAC TCC CCT CTG
> CAT GAA GAC GAC ATA AAA GGC ATC CAG CAT CTG TAT GGT CGT GGC TCT AAA CCT GAC CCA
> AGG CCT CCA GCC ACC ACC GCA GCT NNN NNN NNN GAA --- CCA CAG CCG ACA GCT CCT CCC
> ACT ATG TGT CCC ACT GCA CCT CCC ATG GCC TAT CCC ACA GGG GGC CCC ACA GTC GCC CCT
> ACA GGC GCC CCC TCA CCT GGC CCC ACA GGT NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN
> NNN NNN NNN NNN NNN NNN CCC --- CCT ACT GCT GGT CCT TCT GAG GCC CCT ACA --- GAG
> TCT TCG ACT CCA GTA GAC AAT CCT TGC AAT GTG GAT GTT TTT GAT GCC ATT GCT GAT ATC
> CAG GGC GCT CTG CAT TTC TTC AAG GAC GGT CGG TAT TGG --- --- AAG TTC TCG AAT ---
> --- CAC GGA GGA AGC CAA TTG CAG GGC CCC TTT CTT ATT GCC CGC ACG TGG CCA GCT TTG
> CCT GCA AAG TTG AAC TCA GCC TTT GAG GAT CCG CAG TCC AAG AAG ATT TTC TTC TTC TCT
> GGG CGC AAA ATG TGG GTG TAC ACA GGC CAG ACG GTG CTG GGC CCC AGG AGT CTG GAT AAG
> TTG GGG CTA GGC TCA GAG GTA ACC CTG GTC ACC GGA CTT CTC CCG CGT CGT GGA --- GGG
> AAG GCT CTG CTG ATC AGC CGG GAA CGT ATC TGG AAA TTC GAC TTG AAG TCT CAG AAG GTG
> GAT CCC CAG AGC GTT ACT CGC --- --- TTG GAT AAC GAG TTC TCT GGC GTG CCC TGG AAC
> TCA CAC AAC GTC TTT CAC TAC CAA --- GAC AAG GCC TAT --- TTC TGC CAT GAC AAA TAC
> TTC TGG CGT GTG AGT TTC CAC AAC NNN NNN NNN NNN NNN NNN NNN CGG GTG AAC CAG GTG
> GAC CAC GTG GCC TAC GTG ACC TAT GAC CTC CTG CAG TGC CCT
> rabbit
> GCC GCC CCT CGC CGC CGC --- CAG CCC --- ACC TTG --- GTG GTC TTC CCA GGA GAG CTG
> AGA ACC NNN AGG CTC ACC GAC AGG CAG CTG GCA GAG GAG TAC CTG TTC CGC TAT GGT TAC
> ACC CGC GTA GCC AGC ATG CAC --- GGA GAC AGC CAG --- TCC CTG CGG CTG CCG --- CTG
> CTA CTT CTG CAG AAG CAT CTG TCC CTG CCG GAG ACG GGG GAG CTG GAT AAT GCC ACC CTG
> GAG GCC ATG CGG GCC CCG CGC TGC GGC GTC CCG GAC GTG GGC AAA TTC CAG ACC TTC GAG
> GGT GAC CTC AAG TGG CAC CAC CAC AAC ATC ACG TAC TGG ATC CAA AAC TAC TCC GAA GAC
> CTG CCG CGC GAC GTC ATC GAC GAC GCC TTC GCC CGC GCC TTT GCG CTG TGG AGC GCG GTG
> ACG CCA CTC ACC TTC ACC CGC GTG TAC AGC CGG GAC GCA GAC ATT GTC ATC CAG TTT GGG
> GTC GCG GAG CAC GGA GAC GGG TAT CCT TTC GAC GGC AAG GAC GGG CTC CTG GCG CAC GCC
> TTC CCT CCT GGC CCC GGC ATT CAG GGA GAT GCC CAC TTC GAC GAC GAA GAG CTG TGG TCC
> CTG GGC AAG GGC GTC GTG GTT CCC ACG TAC TTT GGA AAC GCC GAC GGC GCC CCC TGC CAC
> TTC CCC TTC ACC TTC GAG GGC CGC TCC TAC ACC GCC TGC ACC ACG GAC GGC CGC TCT GAC
> GGC ATG GCC TGG TGC AGC ACC ACG GCC GAC TAT GAC ACC GAC CGT CGG TTT GGC TTC TGC
> CCC AGC GAA AGA CTC TAC ACC CAG GAC GGC AAC GCA GAC GGC AAG CCC TGC GAG TTT CCG
> TTC ATC TTC CAG GGC CGT ACC TAC TCC GCC TGC ACC ACG GAC GGC CGC TCC GAC GGC CAC
> CGC TGG TGC GCC ACC ACC GCC AGC TAC GAC AAG GAC AAG CTC TAC GGC TTC TGC CCG ACC
> CGA GCT GAC TCC ACG GTG GTC GGG GGC AAC TCG GCG GGA GAG CTG TGT GTC TTC CCC TTC
> GTC TTC CTG GGC AAA GAG TAC TCG TCC TGT ACC AGC GAG GGT CGC AGG GAT GGG CGC CTC
> TGG TGT GCC ACC ACT TCC AAC TTT GAC AGC GAC AAG AAG TGG GGC TTC TGC CCT GAT AAA
> GGA TAC AGC CTG TTC CTC GTG GCA GCC CAC GAG TTC GGC CAT GCA CTG GGC TTG GAT CAC
> TCC TCT GTG CCG GAG CGC CTC ATG TAC CCC ATG TAC CGC TAC CTA GAG GGG TCC CCC CTG
> CAC GAG GAC GAC GTC AGG GGC ATC CAG CAT CTA TAT GGT CCT AAC CCC AAC CCC CAG CCT
> --- CCA GCC ACC ACC ACA CCT GAN NNN NNN NNN NNN NNG CCG CAG CCC ACG GCT CCC CCG
> ACG GCC TGC CCC ACC TGG CCG GCC ACT GTG CGC CCC TCC GAG CAC CCC ACT ACC AGC CCT
> ACC GGC GCC CCC TCA GCT GGC CCT ACC GGT NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN
> NNN NNN NNN NNN NNN NNN CCC --- CCC ACG GCC AGC CCC TCT GCG GCC CCC ACT --- GCG
> TCC TTG GAC CCA GCT GAA GAC GTC TGC AAC GTG AAT GTC TTC GAC GCC ATC GCC GAG ATA
> GGG AAC AAG CTG CAT GTC TTC AAG GAT GGG AGG TAC TGG --- --- CGG TTC TCC GAG ---
> --- GGC AGT GGG CGC CGG CCG CAG GGC CCC TTC CTC ATC GCC GAC ACC TGG CCC GCG CTG
> CCG GCC AAG CTG GAC TCC GCC TTT GAG GAG CCG CTC ACC AAG AAG CTG TTC TTC TTC TCG
> GGG CGC CAA GTG TGG GTG TAC ACA GGC GCG TCG GTG CTG GGC CCG AGG CGT CTG GAC AAG
> CTG GGC CTG GGT CCC GAG GTG CCG CAC GTC ACC GGA GCC CTC CCG CGC GCC GGG --- GGC
> AAG GTG CTG CTG TTC GGC GCG CAG CGC TTC TGG AGG TTC GAC GTG AAG ACG CAG ACG GTG
> GAT TCC CGG AGC GGC GCT CCG --- --- GTG GAT CAG ATG TTC CCC GGG GTG CCT TTG AAC
> ACA CAC GAC GTC TTC CAG TAC CGA --- GAG AAA GCC TAC --- TTC TGC CAG GAC CGC TTC
> TTC TGG CGT GTG AGT ACC CGG AAC NNN NNN NNN NNN NNN NNN NNN GAG GTG AAC CTA GTG
> GAC CAG GTG GGC TAC GTG AGC TTT GAC ATC CTG CAC TGC CCT
> dog
> GCA GCT CCC AGA CCA CAC --- AAG CCC --- ACC GTT --- GTG GTC TTT CCA GGA GAC CTG
> AGA ACT NNN AAT CTC ACT GAC AAG CAG CTG GCA GAG GAA TAT CTG TTT CGC TAT GGC TAC
> ACT CAA GTG GCC GAG CTG AGC --- GAC GAC AAG CAG --- TCC CTG AGT CGC GGG --- CTG
> CGG CTT CTC CAG AGG CGC CTG GCT CTG CCT GAG ACT GGA GAG CTG GAC AAA ACC ACC CTG
> GAG GCC ATG CGG GCC CCG CGC TGC GGC GTC CCG GAC CTG GGC AAA TTC CAG ACC TTT GAG
> GGC GAC CTC AAG TGG CAC CAC AAC GAC ATC ACT TAC TGG ATA CAA AAC TAC TCG GAA GAC
> TTG CCC CGC GAC GTG ATC GAC GAC GCC TTT GCC CGA GCC TTC GCG GTC TGG AGC GCG GTG
> ACA CCG CTC ACC TTC ACT CGC GTG TAC GGC CCC GAA GCC GAC ATC ATC ATT CAG TTT GGT
> GTT AGG GAG CAC GGA GAT GGG TAT CCC TTC GAT GGG AAG AAC GGG CTT CTG GCT CAC GCC
> TTT CCT CCC GGC CCG GGC ATT CAG GGA GAC GCC CAC TTC GAC GAC GAG GAG TTA TGG ACT
> CTG GGC AAG GGC GTC GTG GTT CCG ACC CAC TTC GGA AAC GCA GAT GGC GCC CCC TGC CAC
> TTC CCC TTC ACC TTC GAG GGC CGC TCC TAC TCG GCC TGC ACC ACG GAC GGC CGC TCC GAT
> GAC ACG CCC TGG TGC AGC ACC ACG GCC GAC TAT GAC ACC GAC CGT CGG TTC GGC TTC TGC
> CCC AGC GAG AAA CTC TAC ACC CAG GAC GGC AAT GGG GAC GGC AAG CCC TGC GTG TTT CCG
> TTC ACC TTC GAG GGC CGC TCC TAC TCC ACG TGC ACC ACC GAC GGC CGC TCG GAC GGC TAC
> CGC TGG TGC TCC ACC ACC GGC GAC TAC GAC CAG GAC AAA CTC TAC GGC TTC TGC CCA ACC
> CGA GTC GAT TCC GCG GTG ACC GGG GGC AAC TCC GCC GGG GAG CCG TGT GTC TTC CCC TTC
> ATC TTC CTG GGC AAG CAG TAC TCG ACG TGC ACC AGG GAG GGC CGC GGA GAT GGG CAC CTC
> TGG TGC GCC ACC ACT TCG AAC TTT GAC AGA GAC AAG AAG TGG GGC TTC TGC CCG GAC CAA
> GGA TAC AGC CTG TTC CTT GTG GCC GCC CAT GAG TTC GGC CAC GCG CTG GGT TTA GAT CAT
> TCA TCG GTG CCA GAA GCG CTC ATG TAC CCC ATG TAC AGC TTC ACC GAG GGC CCC CCC CTG
> CAT GAA GAC GAC GTG AGG GGC ATC CAG CAT CTG TAC GGT CCT CGC CCT GAA CCT GAG CCA
> CAG CCT CCA ACC GCN NNN NNN NNN NNN NNN NNN NNN --- NNC CCG CCC ACC GCC CCG CCC
> ACC GTC TGC GCT ACT GGT CCT CCC ACC ACC CGC CCC TCA GAG CGC CCC ACT GCT GGC CCC
> ACA GGC CCC CCT GCA GCT GGC CCC ACG GGT NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN
> NNN NNN NNN NNN NNN NNN CCC --- CCC ACT GCT GGC CCC TCT GAG GCC CCT ACA --- GTG
> CCT GTG GAT CCG GCA GAG GAT ATA TGC AAA GTG AAC ATC TTC GAC GCC ATC GCG GAG ATC
> AGG AAC TAC TTG CAT TTC TTC AAG GAA GGG AAG TAC TGG --- --- CGA TTC TCC AAG ---
> --- GGC AAG GGA CGC CGG GTG CAG GGC CCC TTC CTT ATC ACC GAC ACG TGG CCT GCG CTG
> CCC CGC AAG CTG GAC TCC GCC TTT GAG GAC GGG CTC ACC AAG AAG ACT TTC TTC TTC TCT
> GGG CGC CAA GTG TGG GTG TAC ACA GGC ACG TCG GTG GTA GGC CCG AGG CGT CTG GAC AAG
> CTG GGC CTG GGC CCG GAG GTT ACC CAA GTC ACC GGC GCC CTC CCG CAA GGC GGG --- GGT
> AAG GTG CTG CTG TTC AGC AGG CAG CGC TTC TGG AGT TTC GAC GTG AAG ACG CAG ACC GTG
> GAT CCC AGG AGC GCC GGC TCG --- --- GTG GAA CAG ATG TAC CCC GGG GTG CCC TTG AAC
> ACG CAT GAC ATC TTC CAG TAC CAA --- GAG AAA GCC TAC --- TTC TGC CAG GAC CGC TTC
> TAC TGG CGT GTG AAT TCT CGG AAT NNN NNN NNN NNN NNN NNN NNN GAG GTG AAC CAG GTG
> GAC GAA GTG GGC TAC GTG ACC TTT GAC ATT TTG CAG TGC CCT
> cow
> GCT GTC CCC AGA CGA CGC --- CAG CCC --- ACC GTT --- GTG GTC TTT CCA GGA GAA CCA
> CGA ACC NNN AAC CTC ACC AAC AGG CAG CTG GCA GAG GAA TAC CTG TAC CGC TAT GGC TAC
> ACT CCT GGG GCA GAG CTG AGC --- GAG GAC GGT CAG --- TCC CTG CAG CGA GCT CTG CTG
> CGC --- TTC CAG CGG CGC CTG TCC CTG CCC GAG ACT GGC GAG CTG GAC AGC ACC ACC CTG
> AAC GCC ATG CGA GCC CCG CGC TGC GGC GTC CCA GAC GTG GGC AGA TTC CAG ACC TTT GAG
> GGC GAA CTC AAG TGG CAC CAC CAC AAC ATC ACC TAC TGG ATC CAA AAT TAC TCG GAA GAC
> CTG CCG CGC GCC GTG ATC GAC GAC GCC TTT GCC CGC GCT TTC GCG CTC TGG AGC GCT GTG
> ACG CCG CTC ACC TTC ACT CGA GTG TAC GGC CCC GAA GCT GAC ATT GTC ATC CAG TTT GGT
> GTT AGA GAG CAC GGA GAT GGG TAT CCC TTC GAT GGG AAG AAC GGG CTC CTG GCA CAC GCC
> TTT CCG CCT GGC AAA GGC ATT CAG GGA GAT GCC CAC TTC GAC GAT GAA GAG TTG TGG TCT
> CTG GGC AAA GGC GTT GTG ATC CCG ACC TAC TTC GGA AAC GCG AAG GGC GCC GCC TGC CAC
> TTC CCC TTC ACC TTT GAG GGT CGC TCC TAC TCC GCC TGC ACC ACG GAC GGC CGT TCC GAC
> GAC ATG CTC TGG TGC AGC ACC ACC GCC GAC TAC GAC GCC GAC CGC CAG TTC GGC TTC TGC
> CCC AGC GAG AGA CTC TAC ACC CAG GAC GGC AAT GCG GAC GGC AAG CCC TGC GTC TTC CCG
> TTC ACC TTC CAG GGC CGC ACC TAC TCC GCC TGT ACC TCC GAT GGT CGC TCC GAC GGC TAC
> CGC TGG TGC GCC ACC ACC GCC AAC TAC GAC CAG GAC AAG CTC TAC GGC TTC TGC CCG ACC
> CGA GTC GAT GCA ACG GTG ACC GGG GGC AAC GCG GCG GGG GAG CTG TGC GTC TTC CCC TTC
> ACC TTC CTG GGC AAG GAA TAC TCG GCC TGC ACC AGA GAG GGT CGC AAT GAT GGG CAC CTC
> TGG TGC GCC ACC ACC TCC AAC TTC GAC AAA GAC AAG AAG TGG GGC TTC TGC CCG GAT CAA
> GGA TAC AGC CTG TTC CTT GTG GCC GCA CAC GAG TTT GGC CAC GCG CTG GGC TTA GAT CAC
> ACC TCC GTG CCA GAG GCG CTC ATG TAC CCC ATG TAC AGA TTC ACA GAG GAG CAC CCC CTG
> CAT AGG GAC GAT GTT CAG GGC ATC CAG CAT CTG TAT GGT CCT CGC CCT GAG CCT GAA CCA
> CGG CCT CCG ACC ACT ACC ACC ACT ACC ACC ACC GAA --- CCC CAG CCC ACC GCT CCC CCC
> ACG GTC TGC GTC ACG GGG CCT CCC ACC GCC CGC CCC TCA GAG GGT CCC ACT ACT GGC CCC
> ACA GGG CCC CCG GCA GCT GGC CCT ACG GGT NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN
> NNN NNN NNN NNN NNN NNN CCT --- CCC ACG GCT GGC CCT TCT GCG GCC CCG ACG GAG TCC
> CCG --- GAT CCA GCG GAG GAC GTC TGC AAC GTG GAC ATC TTC GAC GCC ATC GCG GAG ATT
> AGG AAC CGC TTG CAT TTC TTC AAG GCT GGG AAG TAC TGG --- --- AGA CTT TCT GAG ---
> --- GGA GGG GGC CGC CGG GTG CAG GGT CCC TTC CTT GTC AAG AGC AAG TGG CCT GCG CTG
> CCC CGC AAG CTG GAC TCC GCC TTC GAG GAT CCG CTC ACC AAG AAG ATT TTC TTC TTC TCT
> GGG CGC CAA GTA TGG GTG TAC ACC GGC GCG TCG TTG CTA GGC CCG AGG CGT CTG GAC AAG
> TTG GGC CTG GGC CCG GAA GTG GCC CAG GTC ACC GGG GCC CTC CCG CGC CCT GAG --- GGT
> AAG GTG CTG CTG TTC AGC GGG CAG AGC TTC TGG AGG TTC GAC GTG AAG ACA CAG AAG GTG
> GAT CCC CAG AGC GTC ACC CCC --- --- GTG GAC CAG ATG TTC CCC GGG GTG CCC ATT AGC
> ACG CAC GAC ATC TTT CAG TAC CAA --- GAG AAA GCT TAC --- TTC TGC CAG GAT CAC TTC
> TAC TGG CGC GTG AGT TCC CAG AAT NNN NNN NNN NNN NNN NNN NNN GAG GTG AAT CAG GTG
> GAC TAT GTG GGC TAC GTG ACC TTC GAC CTC CTG AAG TGC CCT
> elephant
> --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
> --- --- --- --- --- --- --- --- --- --- --- GAG --- TAT CTG TAC CGC TAT GGC TAC
> ACT CGT GTG GCG GAG ATG AAC --- --- AGT AAG GTG --- TCC CTG GGT --- CGA GCG CTA
> AGG CTT CTC CAG CAA AAC CTG GCC CTG CCC GAG ACC GGC GAG CTG GAC AGC ACC ACC CTG
> GAC GCC ATG CGA GCC CCG CGC TGC GGC GTC CCA GAC ATG GGT GGC TTC CAG ACC TTC GAG
> GGT GAC CTC AAG TGG AAC CAC CAC AAC ATC ACA TAC TGG ATC CAA AAC TAC TCG GAA GAC
> TTG CCC AAA CAA GTG ATC GAA GAC GCT TTT GCC CGC GCC TTC GCG GCG TGG AGC GAG GTG
> ACA CCA CTC ACC TTC ACC CGC CTG CGC AGC AGG GAC GTG GAC ATC GTC ATC CGG TTT GGG
> GTC AAG GAG CAC GGA GAC GGG TAT CCT TTC GAC GGG AAG GAC GGG CTG CTG GCA CAC GCC
> TTT CCT CCC GGC CCC GGC ATT CAG GGA GAC GCG CAC TTC GAC GAT GAC GAA TTG TGG TCG
> TTG GGC AAG GGC GTC GTG GTT CCC ACC CGC TTT GGA AAC GCA GAT GGC GCC GCC TGC CAC
> TTT CCC TTC ACC TTC CAG GGC CGC TCG TAC ACT GCC TGC ACC GCC GAC GGC CGC TCC GAC
> GGC CAG CTC TGG TGC AGC ACC ACG GCC GAC TAT GAC ACC GAC CGC CAG TTT GGC TTC TGC
> CCC AGT GAG AGG CTC TAC ACC CAG CAC GGC AAT GAC AAC GGC AAG CCC TGC GTG TTT CCG
> TTC ACG TTC GAG GGC CGC TCC TAC TCG GCC TGC ACC ACC GAC GGC CGC TCG GAT GGC TAC
> CGC TGG TGC GCC ACC ACC GCC AAC TAC GAC CAG GAC AAG CTC TAT GGC TTC TGT CCC ACC
> CGA --- GNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN
> NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN
> NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN
> NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN
> NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN
> NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN
> --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- NNN NNN NNN ---
> --- --- --- --- --- --- --- --- NNN NNN --- NNN NNN NNN --- --- --- --- --- ---
> --- --- --- --- NNN NNN NNN NNN NNN --- --- --- --- --- --- --- --- NNN NNN NNN
> NNN NNN --- --- --- --- NNN --- NNN NNN NNN NNN --- --- --- --- NNN NNN --- ---
> --- --- NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN
> NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN --- --- NNN NNN NNN NNN ---
> --- --- --- --- --- --- NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN
> NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN
> NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN
> NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN --- NNN NNN NNN --- NNN
> NNN ATA GTG CTG TTT AGT AGA CAG CGC TTC TGG AGG TTC GAC GTG AAG ACG CAG ACT GTG
> GAG CCC CGG AGC GTC CGC TCG --- --- GTG GAC CAG GTG TTC TCC GGG GTG CCC TTG GAC
> ACG CAC GAC ATC TTC CAG TAC CGA --- GAG AAA GCC TAC --- TTC TGC CAG GAC CGC TTC
> TAC TGG CGC GTG TGT TTC CGG AAT GAT --- AAT GAA --- --- --- --- GTG AAC CAG GTG
> GAC CAA GTG GGC TAC GTG AAC TTT GAC ATC CTG CAG TGC CCT
> opossum
> GCT GCA CCC CGA GGG GGC CCC TCT CCC GGG TCT ATC TTG ATC ACC TTT CCT GAA GAG AGA
> --- ACA CGC ACT CTC ACT GAC CAG CAA TTT GCT GAG GAA TAT CTG CTT CGG TAC GGC TAC
> ATC CCG --- GCA GGG CTT CTG --- GGC CAA AAC CAC ACT TCT CTG AAG --- CAT GCC TTA
> AAG AAA CTC CAA CGT CAG CTG GCC CTG ACA CAG ACG GGA GAG CTG GAC AGC GCC ACC ATC
> GAG GCA ATG CGG GCC CCG CGC TGC GGA GTA CCC GAC GTC GCC CCA TTC CAA ACC TTC GAG
> GGT GAA CTG AAG TGG AAA CAT CAG AAC ATC ACC TAT CGG ATC CAG AAT TAC TCC CCC GAC
> CTG CCT CCT GAG GTG ACG GAT GAT GCT TTC CAA CGA GCC TTT GCT CTG TGG AGT AAA GTG
> ACC CCA CTC ACC TTC ACA CGT GTC AGC AGC GGG GAG GCA GAC ATC CTG ATC CAG TTT GGG
> ACC AGA GAG CAC GGC GAT GGA TAC CCT TTT GAC GGG AAA GAT GGA CTC TTG GCT CAC GCT
> TTC CCC CCG GGC CCA GGA ATC CAG GGA GAT GCC CAC TTT GAT GAC GAG GAG TTC TGG ACT
> CTA GGC AAA GGC GTC GTG GTC AAA ACG CGG TTC GGG AAC GCA GAC GGA GCC CCC TGC CAC
> TTT CCT TTC ACC TTC GAG GGC AGC TCC TAC TCC GCC TGC ACC ACG GAC GGC CGC TCT GAC
> GGG CTG CAC TGG TGC AGC ACT ACG GCT GAC TAT GAC AAG GAC CGC CTT TAC GGC TTT TGC
> CCT AGC GAG CTG CTC TAC ACC CTG GAT GGT AAC GCC AAT GGC GAT CCC TGC GTG TTC CCC
> TTC ACC TTC GAT GGT CGT TCC TAC ACA GCC TGC ACC ACT GAA GGA CGC TCT GAC GGC TAC
> CGC TGG TGT GCC ACT ACT GCC AGT TAC GAT CAG GAC AAG CTT TAT GGC TTC TGT CCC AAC
> CGA --- GAT ACT GCG GTG AGC GGA GGC AAC TCC CAA GGG GAA CCC TGC GTC TTT CCC TTC
> ACT TTC CTA AAT CGA GAA TAC TCA GCC TGC ACC AGT GAG GGC CGC AGT GAC GGT CGT CTC
> TGG TGT GCG ACC ACC GAT GAC TTC GAT CGG GAT CAC AAG TGG GGC TTC TGT CAG GAT CGA
> GGG TAC AGC TTA TTC CTT GTG GCC GCG CAC GAG TTT GGG CAC GCG CTG GGC TTG GAC CAC
> TCA TCT GTG CCG GAA GCA TTG ATG TTC CCA ATG TAC CGT TTT ACC GAG GGA CCC CCG TTG
> CAT GAG GAT GAC GTG AAG GGA ATC CAA CAT CTG TAT GGT TCT AGG ACT GAG CCG GAT CCG
> GAA CCT CCG ACC TCT --- --- --- TCT CCC TTA GAG --- CCA GAT TCC ACC ACT CAG TTC
> AAT GCT TGT --- --- --- CCC --- TCT GTA --- CCC CCC CCT --- --- --- GCC AGA CCC
> ACC GGC CCT CCT ACT GCT CGC CCC TCA --- --- --- --- --- --- --- --- GCA CCT CCC
> ACT GCT GGA CCC ACT GGT CCT --- CCC ACA GCC AAC CCT CCT GTG CCC CCC ACT --- GGG
> CCC TTG GAC CCA GCT GAC GAC GCT TGT GGC GTC CTG GTA TTT GAT GCC ATC GCT GAG ATT
> CGA GGC CAG CTT CAC TTC TTC AAA GAC GGA CGG TAC TGG CGA GTC CCC AGG GAC TCC ---
> --- AAG --- GGG CCA --- ACT CAA GGA CCC TTC CTC ATT GCT AAC ACT TGG TCT GCT TTG
> CCC CCA AAA CTG GAC TCG GCT TTC GAA GAT CCC CTG ACT AAG AAA CTC TTC TTC TTT TCA
> GGT AAA GGT ATG TGG GTA TAC ACA GGC CAG TCA GTT GTA GGT CCC CGG CGC CTG GAG AAG
> CTG GGT CTG CAT AGC AGA GTT CAA AGG ATA ACA GGT GCC ATT CAG CAT AAT GGA --- GGC
> AAG GTG CTA TTA TTC AGC CAG AAT CAA TAT TGG AGG TTG GAT GTG AAG AAG CAG AAG GTA
> GAC TCA AGA GAA CCT TAC CCT --- --- GTG GAG AAC ATG TTC CCT GGA GTA CCT GAA AAC
> ACT CAT GAT GTT TTC CTG TAT AAG GGA GAT ACA --- TAC --- TTC TGC CAG GGC ATC TTC
> TTC TGG CGC GTG AAC --- --- --- --- --- AAG GAG --- --- --- --- --- AAC AAG GTG
> GAC TTA GTA GGC TAC GTG ACC TAC GAC CTC CTG --- --- ---
> chicken
> GCC GCC CCA CTG CAC AGC --- AAG CCG CAG GCG GTC --- ATC ACC TTC CCA GGG GAG CTG
> --- CTC AGC GCC CCA TCA GAC GTG GAG CTG GCG GAG AAC TAC CTG CTG CGC TTC GGC TAC
> ATC CAG GAG GCA GAG GTG AGG AGG AGC AGC AAG CAC GTG TCC CTG GCC --- AAA GCG CTG
> CGC AGG ATG CAG AAG CAG CTG GGG CTG GAG GAG ACG GGG GAG CTG GAC GCC AGC ACC CTG
> GAG GCC ATG CGA GCC CCC CGC TGT GGG GTG CCT GAC GTG GGG GGT TTC CTC ACC TTC GAG
> GGG GAG CTC AAA TGG GAC CAC ATG GAC CTC ACG TAC CGG GTG ATG AAC TAC TCC CCC GAC
> CTG GAC CGT GCC GTG ATA GAT GAT GCC TTC CGG CGG GCA TTC AAG GTG TGG AGT GAT GTC
> ACT CCC CTC ACC TTC ACC CAG ATT TAC AGC GGC GAG GCA GAC ATC ATG ATC ATG TTC GGC
> AGC CAA GAG CAT GGT GAT GGG TAC CCC TTC GAC GGC AAG GAT GGG CTC CTG GCC CAC GCC
> TTT CCC CCC GGC AGT GGG ATT CAG GGC GAT GCC CAC TTC GAT GAT GAT GAG TTC TGG ACT
> CTG GGA ACC GGC TTA GAG GTG AAG ACC CGC TAT GGG AAT GCC AAC GGG GCC AGC TGC CAC
> TTC CCC TTC ATC TTT GAG GGC CGC TCC TAC TCC CGG TGC ATC ACG GAG GGC CGC ACG GAT
> GGG ATG CTG TGG TGT GCC ACC ACC GCC AGC TAC GAC GCC GAC AAG ACC TAC GGC TTC TGC
> CCC AGC GAG CTG CTC TAC ACC AAT GGT GGC AAC AGC GAT GGG TCT CCC TGC GTC TTC CCC
> TTC ATC TTC GAT GGC GCC TCC TAT GAC ACC TGC ACC ACA GAT GGG CGC TCT GAC GGC TAT
> CGC TGG TGT GCC ACC ACG GCC AAC TTC GAC CAG GAC AAG AAA TAC GGC TTC TGC CCC AAC
> CGA --- GAC ACG GCG GCG ATC GGT GGC AAC TCC CAG GGG GAC CCG TGT GTC TTC CCC TTC
> ACC TTC CTG GGG CAG TCC TAC AGC GCG CGC ACC AGC CAG GGC CGG CAG GAC GGG AAG CTC
> TGG TGT GCC ACC ACC AGC AAC TAT GAC ACC GAC AAG AAG TGG GGC TTC TGC CCA GAC AGA
> GGT TAC AGC ATC TTC TTG GTG GCT GCC CAC GAG TTT GGG CAC TCA CTG GGG CTG GAC CAC
> TCC AGC GTG CGC GAG GCA TTG ATG TAC CCT ATG TAC AGC TAC GTC CAG GAC TTC CAG CTG
> CAT GAG GAT GAT GTC CAG GGC ATC CAG TAC CTC TAT GGT CGT GGC TCT GGC CCT GAG CCC
> ACC CCC CCG --- --- --- --- --- GCA CCT TTG --- --- CCC --- --- ACC GAG GAG ---
> --- --- --- --- --- --- CCC CAG TCC ATA --- CCC ACC GAA --- --- --- GCT --- ---
> --- GGC --- --- AGT GCT TCC ACC ACA --- --- --- --- --- --- --- --- GAG GAG GAG
> GAG GAG --- GAG ACA --- CCT GAG CCC ACA GCT GAG --- --- --- --- CCC AGC --- ---
> CCC GTG GAC CCC AGC CGG GAT GCC TGC ATG GAG AAG AAC TTC GAC GCC ATC ACT GAG ATC
> AAT GGA GAG CTG CAC TTC TTC AAG AAT GGG AAA TAC TGG --- --- ACC CAC TCG TCC TTC
> TGG AAA TCA GGC --- --- ACT CAG GGC GCC TTC TCT ATC GCT GAC ACC TGG CCC GGC CTC
> CCG GCT GTC ATC GAC GCG GCG TTC CAA GAT GTG CTC ACC AAG AGG GTC TTC TTC TTC GCG
> GGA CGG CAG TTC TGG GTG TTC TCC GGC AAG AAC GCA GTG GGC CCC CGT AGG ATT GAG AAG
> TTG GGC ATT GGG AAG GAG GCC GGG CGC ATC ACG GGG GCC CTG CAG CGG GGA CGT --- GGC
> AAA GTG CTG CTC TTC AGT GGG GAG CAC TAC TGG AGG CTG GAC GTG AAG GTC CAG ACA GTG
> GAC --- AAG GGC --- TAC CCC CGT GAC ACT GAT GAT GTC TTT ACT GGT GTC CCC CTT GAC
> GCA CGT AAC GTC TTC CTG TAC CAA --- GAC AAG --- TAC CAC TTC TGC CGG GAC AGC TTC
> TAC TGG AGG ATG ACC --- --- --- --- --- CCA CGT --- --- --- --- --- TAC CAG GTG
> GAC CGC GTG GGA TAC ATC AGA TAC GAC CTC CTG CAG TGC CCC
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>


-- 
Jason Stajich
jason at bioperl.org
http://bioperl.org/wiki/User:Jason


From ba6450 at wayne.edu  Thu Jul 26 21:20:11 2007
From: ba6450 at wayne.edu (Munirul Islam)
Date: Thu, 26 Jul 2007 21:20:11 -0400 (EDT)
Subject: [Bioperl-l] Finding the Sequence List in an Alignment
Message-ID: <20070726212011.EFB49252@mirapointms6.wayne.edu>

Thanks.  The error is removed now.

I have a question.  Is there any function that I can use to get the sequence list (human, chimp, etc.) after loading an alignment from file?

Munir

---- Original message ----
>Date: Thu, 26 Jul 2007 17:12:03 -0700
>From: "Jason Stajich" <jason at bioperl.org>  
>Subject: Re: [Bioperl-l] Alignment works with codeml (not loading in bioperl)  
>To: "Munirul Islam" <ba6450 at wayne.edu>
>Cc: bioperl-l at lists.open-bio.org
>
>You can try and pass in -interleaved => 0 as another option when you
>init your AlignIO object.
>


From jason at bioperl.org  Fri Jul 27 00:28:36 2007
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 26 Jul 2007 21:28:36 -0700
Subject: [Bioperl-l] Finding the Sequence List in an Alignment
In-Reply-To: <20070726212011.EFB49252@mirapointms6.wayne.edu>
References: <20070726212011.EFB49252@mirapointms6.wayne.edu>
Message-ID: <8273f6c20707262128s23e7e3ebgeb1cb74b3c0baf37@mail.gmail.com>

Have you tried reading the documentation for the Bio::SimpleAlign object?

for my $seq ( $aln->each_seq ) {
 print $seq->display_id, "\n";
}

I'd appreciate it if you added some of your questions, with the answers, to
the FAQ or to other places on the wiki so that other people can benefit from
what you learn here.
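
For anyone finding this thread in the archive, here is a slightly fuller
sketch of the same idea. It is only an illustration under a few assumptions:
the alignment file is taken to be sequential PHYLIP (hence the
-interleaved => 0 suggested earlier in the thread), the file name comes from
the command line, and the helper names seq_ids and load_alignment are made up
for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the display IDs (human, chimp, ...) of every sequence in an
# alignment object, in the order each_seq returns them.
sub seq_ids {
    my ($aln) = @_;
    return map { $_->display_id } $aln->each_seq;
}

# Read the first alignment from a file. The 'phylip' format is an
# assumption; change it to whatever your file really is.
sub load_alignment {
    my ($file) = @_;
    require Bio::AlignIO;    # loaded only when a file is actually read
    my $in = Bio::AlignIO->new(
        -file        => $file,
        -format      => 'phylip',
        -interleaved => 0,
    );
    return $in->next_aln;
}

# Only touch BioPerl when a file name is given on the command line.
if (@ARGV) {
    my $aln = load_alignment( $ARGV[0] )
        or die "No alignment found in $ARGV[0]\n";
    print "$_\n" for seq_ids($aln);
}
```

Run it as "perl list_ids.pl myfile.phy" (both names are placeholders) and it
should print one sequence name per line.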


On 7/26/07, Munirul Islam <ba6450 at wayne.edu> wrote:
>
> Thanks.  The error is removed now.
>
> I have a question.  Is there any function that I can use to get the
> sequence list (human, chimp, etc.) after loading an alignment from file?
>
> Munir
>
> ---- Original message ----
> >Date: Thu, 26 Jul 2007 17:12:03 -0700
> >From: "Jason Stajich" <jason at bioperl.org>
> >Subject: Re: [Bioperl-l] Alignment works with codeml (not loading in
> bioperl)
> >To: "Munirul Islam" <ba6450 at wayne.edu>
> >Cc: bioperl-l at lists.open-bio.org
> >
> >You can try and pass in -interleaved => 0 as another option when you
> >init your AlignIO object.
> >
>


-- 
Jason Stajich
jason at bioperl.org
http://bioperl.org/wiki/User:Jason


From arareko at campus.iztacala.unam.mx  Fri Jul 27 11:18:55 2007
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Fri, 27 Jul 2007 10:18:55 -0500
Subject: [Bioperl-l] Perl Survey 2007
Message-ID: <46AA0CDF.1030503@campus.iztacala.unam.mx>

It really takes about 5 minutes:

http://perlsurvey.org/

Cheers,
Mauricio.

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From dhoworth at mrc-lmb.cam.ac.uk  Fri Jul 27 12:07:17 2007
From: dhoworth at mrc-lmb.cam.ac.uk (Dave Howorth)
Date: Fri, 27 Jul 2007 17:07:17 +0100
Subject: [Bioperl-l] Perl Survey 2007
In-Reply-To: <46AA0CDF.1030503@campus.iztacala.unam.mx>
References: <46AA0CDF.1030503@campus.iztacala.unam.mx>
Message-ID: <46AA1835.2020004@mrc-lmb.cam.ac.uk>

Mauricio Herrera Cuadra wrote:
> It really takes about 5 minutes:
> http://perlsurvey.org/

and gives all your personal information including email address to
anybody who cares to snoop the HTTP POST message! So there's definitely
no anonymity.

Cheers, Dave


From spiros at lokku.com  Fri Jul 27 12:38:57 2007
From: spiros at lokku.com (Spiros Denaxas)
Date: Fri, 27 Jul 2007 17:38:57 +0100
Subject: [Bioperl-l] Perl Survey 2007
In-Reply-To: <46AA1835.2020004@mrc-lmb.cam.ac.uk>
References: <46AA0CDF.1030503@campus.iztacala.unam.mx>
	<46AA1835.2020004@mrc-lmb.cam.ac.uk>
Message-ID: <bba689ec0707270938i612840e6r6b4d71ea943b4cc3@mail.gmail.com>

On 7/27/07, Dave Howorth <dhoworth at mrc-lmb.cam.ac.uk> wrote:
> Mauricio Herrera Cuadra wrote:
> > It really takes about 5 minutes:
> > http://perlsurvey.org/
>
> and gives all your personal information including email address to
> anybody who cares to snoop the HTTP POST message! So there's definitely
> no anonymity.

Not to mention that it requires registration (?). Who is behind the
survey? I am on a number of Perl and Perl-related lists and haven't
seen it mentioned.

Spiros


From arareko at campus.iztacala.unam.mx  Fri Jul 27 13:37:31 2007
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Fri, 27 Jul 2007 12:37:31 -0500
Subject: [Bioperl-l] Perl Survey 2007
In-Reply-To: <bba689ec0707270938i612840e6r6b4d71ea943b4cc3@mail.gmail.com>
References: <46AA0CDF.1030503@campus.iztacala.unam.mx>	<46AA1835.2020004@mrc-lmb.cam.ac.uk>
	<bba689ec0707270938i612840e6r6b4d71ea943b4cc3@mail.gmail.com>
Message-ID: <46AA2D5B.9080304@campus.iztacala.unam.mx>

Spiros Denaxas wrote:
> On 7/27/07, Dave Howorth <dhoworth at mrc-lmb.cam.ac.uk> wrote:
>> Mauricio Herrera Cuadra wrote:
>>> It really takes about 5 minutes:
>>> http://perlsurvey.org/
>> and gives all your personal information including email address to
>> anybody who cares to snoop the HTTP POST message! So there's definitely
>> no anonymity.

I didn't provide any personal information other than my country and 
birthyear. As for my email, I always use the one I have for all the SPAM 
I'd like to subscribe to :)

> Not to mention that it requires registration (?). Who is behind the
> survey ? I am on a number of Perl and Perl related lists and haven't
> seen it being mentioned.

Registration is rather different from confirming your email (which 
prevents spambots, or yourself, from filling the DB multiple times and 
thus screwing up the survey). As for who's behind it, its purpose, 
privacy, etc., please read the FAQ:

http://perlsurvey.org/faq/

Cheers,
Mauricio.

> Spiros

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From Alicia.Amadoz at uv.es  Mon Jul 30 11:46:57 2007
From: Alicia.Amadoz at uv.es (Alicia Amadoz)
Date: Mon, 30 Jul 2007 17:46:57 +0200 (CEST)
Subject: [Bioperl-l] error using standaloneblast through webserver
Message-ID: <1245168492amadoz@uv.es>

Hi, I'm trying to run a BioPerl script on Linux with StandAloneBlast
from a webserver, but I get the following error:

-------------------- WARNING ---------------------
MSG: cannot find path to blastall
---------------------------------------------------

I have tried several things to fix it, such as setting some environment
variables, both directly through the shell and by adding some code to my
script:

BEGIN {
$ENV{PATH} .= ':/usr/local/blast-2.2.16';
$ENV{BLASTDIR} = '/usr/local/blast-2.2.16/'; 
$ENV{BLASTDATADIR} = '/usr/local/data/';
}

and with,

$local->executable('/usr/local/bin');
my $blast_report = $local->blastall($inputfilename); 

I have also checked that the webserver has read and execute permission
on all BLAST executables and directories. But after trying all of
these things, it keeps showing the same error as above.

Any more ideas to solve this problem? My script works well when I run it
as a simple script, and I've rebooted the system several times after
making changes.

Thanks to anyone who will be able to help me!
Regards,
Alicia


From gyang at plantbio.uga.edu  Mon Jul 30 16:58:51 2007
From: gyang at plantbio.uga.edu (Guojun Yang)
Date: Mon, 30 Jul 2007 16:58:51 -0400
Subject: [Bioperl-l] nonstop repeated output from Remote_blast with xml
Message-ID: <20070730205851.e921d31c@dogwood.plantbio.uga.edu>

I am running RemoteBlast with readmethod "xml", and I noticed that it prints the output repeatedly, nonstop, as if it were stuck in a loop. Has anybody noticed this before? Can anybody help me get out of this?  
Thanks a lot,  
   

Guojun Yang
University of Georgia
  
   
From grafman at graphcomp.com  Sun Jul 29 17:08:04 2007
From: grafman at graphcomp.com (Grafman Productions)
Date: Sun, 29 Jul 2007 14:08:04 -0700
Subject: [Bioperl-l] Perl 3D OpenGL
Message-ID: <152401c7d224$8e2455b0$6e4e7c0a@HPONE>

If this posting is inappropriate, please let me know - my apologies.

I recently came across an article on BioPerl, and it occurred to me that 
there might be some need for 3D rendering within your BioPerl project.

I released a number of new/updated Perl OpenGL (POGL) modules this year, 
along with benchmarks that demonstrate that it performs comparably to C.

If there's a need for 3D features within BioPerl, and if I can be of any 
assistance in helping to add such features, I would enjoy the opportunity. 


From torsten.seemann at infotech.monash.edu.au  Mon Jul 30 19:27:46 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Tue, 31 Jul 2007 09:27:46 +1000
Subject: [Bioperl-l] error using standaloneblast through webserver
In-Reply-To: <1245168492amadoz@uv.es>
References: <1245168492amadoz@uv.es>
Message-ID: <a79f6a4b0707301627u488fd1e8m9022160278b9b652@mail.gmail.com>

Alicia,

> Hi, I'm trying to run a bioperl script in linux with standaloneblast
> from a webserver but I have the following error:
> -------------------- WARNING ---------------------
> MSG: cannot find path to blastall
> ---------------------------------------------------
> $ENV{BLASTDATADIR} = '/usr/local/data/';
> $ENV{PATH} .= ':/usr/local/blast-2.2.16';
> $ENV{BLASTDIR} = '/usr/local/blast-2.2.16/';

I think the last one (or two) paths should be
'/usr/local/blast-2.2.16/bin' as the bin/ subdirectory of a standard
BLAST installation is where the 'blastall' binary actually lives.
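
If it helps, a quick way to confirm where 'blastall' really lives is a
small check like the one below, using only core Perl. It is a rough
diagnostic sketch, not part of BioPerl: the helper name find_executable_dir
is made up, and the two candidate directories are just the ones mentioned in
this thread. Run it as the same user the webserver runs as, since that is the
account whose permissions matter.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Return the first directory in @dirs that contains an executable file
# named $name, or undef if none of them does.
sub find_executable_dir {
    my ($name, @dirs) = @_;
    for my $dir (@dirs) {
        my $path = File::Spec->catfile($dir, $name);
        return $dir if -f $path && -x $path;
    }
    return;
}

# Candidate locations taken from this thread: the install root and its
# bin/ subdirectory. Adjust these for your own installation.
my @candidates = (
    '/usr/local/blast-2.2.16',
    '/usr/local/blast-2.2.16/bin',
);

my $dir = find_executable_dir('blastall', @candidates);
if (defined $dir) {
    print "blastall found in $dir\n";
}
else {
    print "blastall not found in: @candidates\n";
}
```

Whichever directory it reports is the one to use for PATH and BLASTDIR in
your BEGIN block.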

-- 
--Torsten Seemann
--Victorian Bioinformatics Consortium, Monash University


From cjfields at uiuc.edu  Mon Jul 30 20:53:45 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 30 Jul 2007 19:53:45 -0500
Subject: [Bioperl-l] nonstop repeated output from Remote_blast with xml
In-Reply-To: <20070730205851.e921d31c@dogwood.plantbio.uga.edu>
References: <20070730205851.e921d31c@dogwood.plantbio.uga.edu>
Message-ID: <FAFBA7E9-F8C5-439C-B3D8-39AA876F03FA@uiuc.edu>


On Jul 30, 2007, at 3:58 PM, Guojun Yang wrote:

> I am running remoteblast and using readmethod "xml", I noticed that  
> it is printing the output repeatedly nonstop. It's like in a loop.  
> Did anybody notice this before? Can anybody help me getting out of  
> this?
> Thanks a lot,
>
>
> Guojun Yang
> University of Georgia

Not seeing that using bioperl-live; you may need to update  
RemoteBlast.pm as this sounds similar to an issue that popped up  
earlier in the spring.

chris


From torsten.seemann at infotech.monash.edu.au  Tue Jul 31 02:24:34 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Tue, 31 Jul 2007 16:24:34 +1000
Subject: [Bioperl-l] nonstop repeated output from Remote_blast with xml
In-Reply-To: <FAFBA7E9-F8C5-439C-B3D8-39AA876F03FA@uiuc.edu>
References: <20070730205851.e921d31c@dogwood.plantbio.uga.edu>
	<FAFBA7E9-F8C5-439C-B3D8-39AA876F03FA@uiuc.edu>
Message-ID: <a79f6a4b0707302324t261687e7g1012e1f536500c09@mail.gmail.com>

> as this sounds similar to an issue that popped up
> earlier in the spring.

I could have sworn it was autumn! ;-)

-- 
--Torsten Seemann
--Victorian Bioinformatics Consortium, Monash University


From Alicia.Amadoz at uv.es  Tue Jul 31 06:11:54 2007
From: Alicia.Amadoz at uv.es (Alicia Amadoz)
Date: Tue, 31 Jul 2007 12:11:54 +0200 (CEST)
Subject: [Bioperl-l] error using standaloneblast through webserver
In-Reply-To: <a79f6a4b0707301627u488fd1e8m9022160278b9b652@mail.gmail.com>
References: <a79f6a4b0707301627u488fd1e8m9022160278b9b652@mail.gmail.com>
Message-ID: <2361686267amadoz@uv.es>

Hi, I tried what you suggested and that was it, it works perfectly.
Thank you very much. 

Regards,
Alicia

> Alicia,
> 
> > Hi, I'm trying to run a bioperl script in linux with standaloneblast
> > from a webserver but I have the following error:
> > -------------------- WARNING ---------------------
> > MSG: cannot find path to blastall
> > ---------------------------------------------------
> > $ENV{BLASTDATADIR} = '/usr/local/data/';
> > $ENV{PATH} .= ':/usr/local/blast-2.2.16';
> > $ENV{BLASTDIR} = '/usr/local/blast-2.2.16/';
> 
> I think the last one (or two) paths should be
> '/usr/local/blast-2.2.16/bin' as the bin/ subdirectory of a standard
> BLAST installation is where the 'blastall' binary actually lives.
> 
> -- 
> --Torsten Seemann
> --Victorian Bioinformatics Consortium, Monash University
> 
> 


From jay at jays.net  Tue Jul 31 08:00:56 2007
From: jay at jays.net (Jay Hannah)
Date: Tue, 31 Jul 2007 07:00:56 -0500
Subject: [Bioperl-l] Perl 3D OpenGL
In-Reply-To: <152401c7d224$8e2455b0$6e4e7c0a@HPONE>
References: <152401c7d224$8e2455b0$6e4e7c0a@HPONE>
Message-ID: <25A5F0A3-1CC3-46B5-8976-A24C451204E7@jays.net>

On Jul 29, 2007, at 4:08 PM, Grafman Productions wrote:
> If this posting is inappropriate, please let me know - my apologies.

Not at all. AFAIK this is the perfect place to discuss any  
contributions you're motivated to make to the BioPerl project.

> I recently came across an article on BioPerl, and it occurred to me  
> that
> there might be some need for 3D rendering within your BioPerl project.
>
> I released a number of new/updated Perl OpenGL (POGL) modules this  
> year,
> along with benchmarks that demonstrate that it performs comparably  
> to C.
>
> If there's a need for 3D features within BioPerl, and if I can be  
> of any
> assistance in helping to add such features, I would enjoy the  
> opportunity.

I know nothing about 3D modeling in biology, nor do I hang out with  
any protein structure folks, but 3D always sounds sexy. -grin-

If you're new to bioinformatics (I certainly am) you might want to  
read this:

   http://en.wikipedia.org/wiki/Protein_structure

Because that's probably where your 3D work would be used. Especially  
note the "Software" section, where you'll find some of the  
"competition".  :)

There's some cool stuff out there. I don't know what all would or  
wouldn't be time well spent in Perl / BioPerl.

HTH,

Jay Hannah
http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah


From cjfields at uiuc.edu  Tue Jul 31 12:51:42 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 31 Jul 2007 11:51:42 -0500
Subject: [Bioperl-l] nonstop repeated output from Remote_blast with xml
In-Reply-To: <20070731104052.b4b93021@dogwood.plantbio.uga.edu>
References: <20070731104052.b4b93021@dogwood.plantbio.uga.edu>
Message-ID: <7A2D7E4A-4024-48DB-88C8-063388A98419@uiuc.edu>

Make sure to keep responses on the mailing list.

You might want to run a full install, just in case.  If I remember  
correctly Sendu made some changes a while back in the BLAST-related  
modules which may be related to this.  At the very least install/ 
upgrade all modules in Bio::Tools::Run.
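
To see whether you have a mix of old and new modules, a small check like
this can show which file Perl actually loads for a module and whether it
provides a given method. It is a rough sketch using only core Perl; the name
method_report is made up, and since the error message does not say which
class is missing retrieve_parameter, the module name to pass in is for you
to try.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Try to load $module and report where it was loaded from, its version,
# and whether it (or a parent class) provides $method.
# Returns a hash ref, or undef if the module could not be loaded.
sub method_report {
    my ($module, $method) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;
    return unless eval { require $file; 1 };
    return {
        file    => $INC{$file},
        version => scalar $module->VERSION,
        has     => $module->can($method) ? 1 : 0,
    };
}

# Usage: perl check_method.pl Some::Module some_method
if (@ARGV == 2) {
    my ($module, $method) = @ARGV;
    my $report = method_report($module, $method)
        or die "Could not load $module\n";
    printf "%s loaded from %s (version %s): %s %s\n",
        $module, $report->{file}, $report->{version} // 'unknown',
        $method, $report->{has} ? 'is available' : 'is MISSING';
}
```

For example, "perl check_method.pl Bio::Tools::Run::RemoteBlast
retrieve_parameter" (the script name is a placeholder). If the file it
reports is not the one you just replaced, an older copy is being picked up
first in @INC.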

chris

On Jul 31, 2007, at 9:40 AM, Guojun Yang wrote:

> Thanks, Chris,
> But when I replaced the old RemoteBlast.pm with the new one, I got  
> "can't locate the object method "retrieve_parameter"". Does this  
> mean I need to install something else?
> Guojun
>
> ----- Original Message -----
> From: Chris Fields [mailto:cjfields at uiuc.edu]
> To: gyang at plantbio.uga.edu
> Cc: bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] nonstop repeated output from Remote_blast  
> with xml
>
>
>> On Jul 30, 2007, at 3:58 PM, Guojun Yang wrote:
>>
>>> I am running remoteblast and using readmethod "xml", I noticed that
>>> it is printing the output repeatedly nonstop. It's like in a loop.
>>> Did anybody notice this before? Can anybody help me getting out of
>>> this?
>>> Thanks a lot,
>>>
>>>
>>> Guojun Yang
>>> University of Georgia
>>
>> Not seeing that using bioperl-live; you may need to update
>> RemoteBlast.pm as this sounds similar to an issue that popped up
>> earlier in the spring.
>>
>> chris
>>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Tue Jul 31 22:15:45 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 31 Jul 2007 21:15:45 -0500
Subject: [Bioperl-l] Perl 3D OpenGL
In-Reply-To: <25A5F0A3-1CC3-46B5-8976-A24C451204E7@jays.net>
References: <152401c7d224$8e2455b0$6e4e7c0a@HPONE>
	<25A5F0A3-1CC3-46B5-8976-A24C451204E7@jays.net>
Message-ID: <04BCAD9E-CC25-4F0A-85B1-FBA91C64CE7D@uiuc.edu>


On Jul 31, 2007, at 7:00 AM, Jay Hannah wrote:

> On Jul 29, 2007, at 4:08 PM, Grafman Productions wrote:
>> If this posting is inappropriate, please let me know - my apologies.
>
> Not at all. AFAIK this is the perfect place to discuss any
> contributions you're motivated to make to the BioPerl project.
>
>> I recently came across an article on BioPerl, and it occurred to me
>> that
>> there might be some need for 3D rendering within your BioPerl  
>> project.
>>
>> I released a number of new/updated Perl OpenGL (POGL) modules this
>> year,
>> along with benchmarks that demonstrate that it performs comparably
>> to C.
>>
>> If there's a need for 3D features within BioPerl, and if I can be
>> of any
>> assistance in helping to add such features, I would enjoy the
>> opportunity.
>
> I know nothing about 3D modeling in biology, nor do I hang out with
> any protein structure folks, but 3D always sounds sexy. -grin-
>
> If you're new to bioinformatics (I certainly am) you might want to
> read this:
>
>    http://en.wikipedia.org/wiki/Protein_structure
>
> Because that's probably where your 3D work would be used. Especially
> note the "Software" section, where you'll find some of the
> "competition".  :)
>
> There's some cool stuff out there. I don't know what all would or
> wouldn't be time well spent in Perl / BioPerl.
>
> HTH,
>
> Jay Hannah
> http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah

I agree that protein structure is the best place for something like  
this.

It's a wide open area as far as I'm concerned; in fact I would say  
that Bio::Structure is getting pretty dated, so if anyone wants to  
take it over, refactor the code, and so on I don't have a problem.

chris


From cjfields at uiuc.edu  Sun Jul  1 00:40:53 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 30 Jun 2007 19:40:53 -0500
Subject: [Bioperl-l] First cut svn repository
In-Reply-To: <2159ED58-E6F4-4ED8-AC23-E8BAF69FE240@gmx.net>
References: <3097065.1181941697249.JavaMail.myubc2@brahms.my.ubc.ca>
	<bba689ec0706151440o56a7d6c6ncf72a37cd2b2cdc5@mail.gmail.com>
	<185BDA34-1449-49CA-B146-ADF27D2928CD@gmx.net>
	<8D3B697E-2072-46FE-A1C9-E546D9DEAA45@uiuc.edu>
	<4673C7CB.1030709@mail.nih.gov>
	<410EF5F9-A30E-4AB7-85F7-7E761E3890D5@uiuc.edu>
	<18049.30026.61328.134490@almost.alerce.com>
	<4683A7D1.8070403@sendu.me.uk>
	<18051.48684.996884.134046@almost.alerce.com>
	<4683C385.3050904@sendu.me.uk>
	<18051.63674.685297.426813@almost.alerce.com>
	<D554E628-AB22-44C2-B253-3CDDB3F71253@uiuc.edu>
	<18052.3946.224905.415905@almost.alerce.com>
	<2159ED58-E6F4-4ED8-AC23-E8BAF69FE240@gmx.net>
Message-ID: <A348C2D6-F00B-4E76-A78F-E192A912E785@uiuc.edu>

Checkout worked for me (Mac OS X) using both:

svn co svn+ssh://dev.open-bio.org/home/hartzell/bioperl/bioperl-live/tags/release-0-9-2/t/data
svn co svn+ssh://dev.open-bio.org/home/hartzell/bioperl/bioperl-live/tags/release-0-9-2/

so removing the offending file worked (good catch!).  Haven't run a  
full co but probably isn't necessary.

chris

On Jun 30, 2007, at 6:36 PM, Hilmar Lapp wrote:

>
> On Jun 28, 2007, at 3:43 PM, George Hartzell wrote:
>
>> I just did the experiment, and filename-insensitivity seems to be
>> breaking something.
>>
>> I'm using an svn I picked up from http://www.codingmonkeys.de/mbo/.
>>
>> I reformatted a memory stick to be case sensitive and co of
>>
>>   bioperl/bioperl-live/tags/release-0-9-2/t
>>
>> worked, then I made a directory in my home dir (normal mac thing) and
>> got the same error as above.
>
> You picked up a rename of a file from lower case extension to upper  
> case extension. Unfortunately, there are several months between  
> adding the upper-case and removing the lower-case version.
>
> We can reconstruct what happened with this using svn log on the  
> directory (this does not require a checkout):
>
> $ svn log --verbose svn+ssh://dev.open-bio.org/home/hartzell/ 
> bioperl/bioperl-live/trunk/t/data
>
> Searching for HUMBETGLOA yields the following two commits that  
> added one and removed the other:
>
> ---------------------------------------------------------------------- 
> --
> r2245 | jason | 2001-12-08 11:59:05 -0500 (Sat, 08 Dec 2001) | 2 lines
> Changed paths:
>    M /bioperl-live/trunk/t/SearchIO.t
>    A /bioperl-live/trunk/t/data/HUMBETGLOA.FASTA
>    A /bioperl-live/trunk/t/data/cysprot1.FASTA
>
> added tests for FASTA
>
> ---------------------------------------------------------------------- 
> --
> r2877 | jason | 2002-03-11 22:39:40 -0500 (Mon, 11 Mar 2002) | 2 lines
> Changed paths:
>    A /bioperl-live/trunk/t/data/HUMBETGLOA.fa
>    D /bioperl-live/trunk/t/data/HUMBETGLOA.fasta
>
> renaming file to avoid clobbering on windows
>
> Unfortunately, both files are in the tag (again, no checkout  
> required):
>
> $ svn list svn+ssh://dev.open-bio.org/home/hartzell/bioperl/bioperl- 
> live/tags/release-0-9-2/t/data | grep HUMBETGLOA | grep -i fasta
> HUMBETGLOA.FASTA
> HUMBETGLOA.fasta
>
> We can remove the offending version from the repository (again,  
> without needing a checkout):
>
> $ svn rm svn+ssh://dev.open-bio.org/home/hartzell/bioperl/bioperl- 
> live/tags/release-0-9-2/t/data/HUMBETGLOA.fasta
>
> I did this, and now the tag checks out fine on OSX. Can anyone  
> confirm?
>
> (BTW the ability to operate on the repository w/o needing a  
> checkout is another advantage of svn)
>
> 	-hilmar
> -- 
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
>
>
>
>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From hartzell at alerce.com  Sun Jul  1 00:48:06 2007
From: hartzell at alerce.com (George Hartzell)
Date: Sat, 30 Jun 2007 17:48:06 -0700
Subject: [Bioperl-l] Take 2 of the new subversion repository.
Message-ID: <18054.63942.316904.413911@almost.alerce.com>


There's a second cut at the subversion repository.  I've done a better
job of setting svn:keywords and svn:eol-style on various files.  The
defaults were more cautious and I used an auto-props file based on
the wiki version.

  svn+ssh://dev.open-bio.org/home/hartzell/bioperl_take2

The old repository's still around as

  svn+ssh://dev.open-bio.org/home/hartzell/bioperl_take1

I renamed it so that people wouldn't work with it by mistake.  If, for
some hard-to-imagine reason, you have a working copy that you want to
run against it, you should be able to do an svn switch --relocate on
your working copy and be back in shape.  In fact, it might be a good
time to give it a try....

g.


From hartzell at alerce.com  Sun Jul  1 01:17:18 2007
From: hartzell at alerce.com (George Hartzell)
Date: Sat, 30 Jun 2007 18:17:18 -0700
Subject: [Bioperl-l] First cut svn repository
In-Reply-To: <A348C2D6-F00B-4E76-A78F-E192A912E785@uiuc.edu>
References: <3097065.1181941697249.JavaMail.myubc2@brahms.my.ubc.ca>
	<bba689ec0706151440o56a7d6c6ncf72a37cd2b2cdc5@mail.gmail.com>
	<185BDA34-1449-49CA-B146-ADF27D2928CD@gmx.net>
	<8D3B697E-2072-46FE-A1C9-E546D9DEAA45@uiuc.edu>
	<4673C7CB.1030709@mail.nih.gov>
	<410EF5F9-A30E-4AB7-85F7-7E761E3890D5@uiuc.edu>
	<18049.30026.61328.134490@almost.alerce.com>
	<4683A7D1.8070403@sendu.me.uk>
	<18051.48684.996884.134046@almost.alerce.com>
	<4683C385.3050904@sendu.me.uk>
	<18051.63674.685297.426813@almost.alerce.com>
	<D554E628-AB22-44C2-B253-3CDDB3F71253@uiuc.edu>
	<18052.3946.224905.415905@almost.alerce.com>
	<2159ED58-E6F4-4ED8-AC23-E8BAF69FE240@gmx.net>
	<A348C2D6-F00B-4E76-A78F-E192A912E785@uiuc.edu>
Message-ID: <18055.158.30409.808612@almost.alerce.com>

Chris Fields writes:
 > Checkout worked for me (Mac OS X) using both:
 > 
 > svn co svn+ssh://dev.open-bio.org/home/hartzell/bioperl/bioperl-live/ 
 > tags/release-0-9-2/t/data
 > svn co svn+ssh://dev.open-bio.org/home/hartzell/bioperl/bioperl-live/ 
 > tags/release-0-9-2/
 > 
 > so removing the offending file worked (good catch!).  Haven't run a  
 > full co but probably isn't necessary.
 > [...]

I'll keep a note of that as something to do when I prepare the final
cut of the repository.

g.


From jason at bioperl.org  Sun Jul  1 01:25:30 2007
From: jason at bioperl.org (Jason Stajich)
Date: Sat, 30 Jun 2007 18:25:30 -0700
Subject: [Bioperl-l] Take 2 of the new subversion repository.
In-Reply-To: <18054.63942.316904.413911@almost.alerce.com>
References: <18054.63942.316904.413911@almost.alerce.com>
Message-ID: <D8C71EF7-6E2E-498E-8638-373512ADE3EE@bioperl.org>

Thanks George -
I also did
chgrp -R bioperl /home/hartzell/bioperl_take?
to make sure the group permission was set right.

We may also want to do a chmod g+s on all the dirs in there as well  
so that permissions are preserved when this gets deployed for real.
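
For anyone repeating this later, here is a minimal sketch of the two
permission steps as one shell function. The function name and its
arguments are made up for illustration; the group and path would be
whatever the real deployment uses. The setgid bit on a directory makes
files created inside it inherit the directory's group, which is
presumably the effect wanted here.

```shell
# share_repo_with_group: hypothetical wrapper, not an existing tool.
# usage: share_repo_with_group GROUP REPO_DIR
share_repo_with_group() {
    group=$1
    repo=$2
    # Hand the whole tree to the shared group...
    chgrp -R "$group" "$repo" &&
    # ...then set the setgid bit on directories only, so that files
    # created later inherit the directory's group.
    find "$repo" -type d -exec chmod g+s {} +
}

# Example call (the path is the one mentioned above):
#   share_repo_with_group bioperl /home/hartzell/bioperl_take2
```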

If anyone wants to make some changes to files and commit them, or  
make some branches/tags to play around a little bit, go ahead, since  
we'll likely throw this away and do it again from a locked-down  
version of CVS at some appointed time.

Do you know how to have svn commit messages generate summary emails  
as well?

-j
On Jun 30, 2007, at 5:48 PM, George Hartzell wrote:

>
> There's a second cut at the subversion repository.  I've done a better
> job of setting svn:keywords and svn:eol-style on various files.  The
> defaults were more cautious and I used an auto-props files based on
> the wiki version.
>
>   svn+ssh://dev.open-bio.org/home/hartzell/bioperl_take2
>
> The old repository's still around as
>
>   svn+ssh://dev.open-bio.org/home/hartzell/bioperl_take1
>
> I renamed it so that people would work with it by mistake.  If, for
> some hard-to-imagine reason, you have a working copy that you want to
> run against it, you should be able to do an svn switch --relocate on
> your working copy and be back in shape.  In fact, it might be a good
> time to give it a try....
>
> g.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From hlapp at gmx.net  Sun Jul  1 02:21:25 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 30 Jun 2007 22:21:25 -0400
Subject: [Bioperl-l] Take 2 of the new subversion repository.
In-Reply-To: <18054.63942.316904.413911@almost.alerce.com>
References: <18054.63942.316904.413911@almost.alerce.com>
Message-ID: <5F53A433-BAA9-431D-A0C5-5955690D0B73@gmx.net>


On Jun 30, 2007, at 8:48 PM, George Hartzell wrote:

> I renamed it so that people would work with it by mistake.  If, for
> some hard-to-imagine reason, you have a working copy that you want to
> run against it,

It's not so hard to imagine - checking out the entire repository  
takes a long time.

> you should be able to do an svn switch --relocate on
> your working copy and be back in shape.  In fact, it might be a good
> time to give it a try....

It doesn't work:

svn: The repository at 'svn+ssh://dev.open-bio.org/home/hartzell/ 
bioperl_take2' has uuid '31277767-6726-dc11-ab4c-0019e3f901d6', but  
the WC has '27e854f1-f323-dc11-8c1b-0019e3f901d6'

You can't relocate to a totally new repository (relocating to  
bioperl_take1 does work though).
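
A minimal sketch of checking this up front, assuming the "Repository
UUID:" line that svn info prints. The helper names repo_uuid and
same_uuid are hypothetical, and OLD_URL/NEW_URL stand for whatever
repository roots are involved.

```shell
# repo_uuid: hypothetical helper; reads `svn info` output on stdin and
# prints only the value of its "Repository UUID:" line.
repo_uuid() {
    sed -n 's/^Repository UUID: //p'
}

# same_uuid: succeeds only if both UUIDs are non-empty and identical.
same_uuid() {
    [ -n "$1" ] && [ "$1" = "$2" ]
}

# Intended use from inside a working copy:
#   wc_uuid=$(svn info . | repo_uuid)
#   new_uuid=$(svn info "$NEW_URL" | repo_uuid)
#   same_uuid "$wc_uuid" "$new_uuid" &&
#       svn switch --relocate "$OLD_URL" "$NEW_URL"
```

If the UUIDs differ, as they do between take1 and take2, a fresh
checkout is the only option.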

	-hilmar

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Sun Jul  1 02:39:27 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 30 Jun 2007 21:39:27 -0500
Subject: [Bioperl-l] Take 2 of the new subversion repository.
In-Reply-To: <D8C71EF7-6E2E-498E-8638-373512ADE3EE@bioperl.org>
References: <18054.63942.316904.413911@almost.alerce.com>
	<D8C71EF7-6E2E-498E-8638-373512ADE3EE@bioperl.org>
Message-ID: <7C6FD6C9-CBED-40D3-BA90-4B34F79E6DE0@uiuc.edu>

There are a few CPAN modules available; here's one:

http://search.cpan.org/~dwheeler/SVN-Notify-2.66/lib/SVN/Notify.pm
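
A rough sketch of how that might be wired into a post-commit hook,
going by the svnnotify script that ships with SVN::Notify. The function
name, the SVNNOTIFY override, and the address are placeholders for
illustration, not anything that exists on the server.

```shell
# notify_commit: hypothetical body for a repository's hooks/post-commit.
# Subversion passes the repository path and the new revision number to
# the post-commit hook. SVNNOTIFY can be overridden for a dry run; by
# default the svnnotify script from SVN::Notify is used.
notify_commit() {
    repos=$1
    rev=$2
    to=$3
    "${SVNNOTIFY:-svnnotify}" --repos-path "$repos" --revision "$rev" --to "$to"
}

# In hooks/post-commit one would then call, with a real list address:
#   notify_commit "$1" "$2" commits@example.org
```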

chris

On Jun 30, 2007, at 8:25 PM, Jason Stajich wrote:

> Thanks George -
> I also did
> chgrp -R bioperl /home/hartzell/bioperl_take?
> to make sure the group permission was set right.
>
> We may also want to do a chmod g+s on all the dirs in there as well
> so that permissions are preserved when this gets deployed for real.
>
> If anyone wants to make some changes to files and commit them, as
> well as make some branches/tags to play around a little bit since
> we'll likely throw this away and do it again from locked down version
> from CVS at some appointed time.
>
> Do you know how to have svn commit messages generate summary emails
> as well?
>
> -j
> On Jun 30, 2007, at 5:48 PM, George Hartzell wrote:
>
>>
>> There's a second cut at the subversion repository.  I've done a  
>> better
>> job of setting svn:keywords and svn:eol-style on various files.  The
>> defaults were more cautious and I used an auto-props files based on
>> the wiki version.
>>
>>   svn+ssh://dev.open-bio.org/home/hartzell/bioperl_take2
>>
>> The old repository's still around as
>>
>>   svn+ssh://dev.open-bio.org/home/hartzell/bioperl_take1
>>
>> I renamed it so that people would work with it by mistake.  If, for
>> some hard-to-imagine reason, you have a working copy that you want to
>> run against it, you should be able to do an svn switch --relocate on
>> your working copy and be back in shape.  In fact, it might be a good
>> time to give it a try....
>>
>> g.
>
> --
> Jason Stajich
> jason at bioperl.org
> http://jason.open-bio.org/

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Sun Jul  1 02:46:05 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 30 Jun 2007 21:46:05 -0500
Subject: [Bioperl-l] Splits again
In-Reply-To: <4686CC04.6000403@sendu.me.uk>
References: <467949EC.9040100@sendu.me.uk>
	<467FBDD3.8050009@sendu.me.uk>	<46823ABE.2080300@sendu.me.uk>
	<4682B000.2050707@sheffield.ac.uk>	<A17327A5-A174-4110-B793-A80775D80623@uiuc.edu>	<4682B798.1010409@sheffield.ac.uk>
	<4682C6F5.4020406@sendu.me.uk> <4682D12E.3000803@sendu.me.uk>
	<2517AA40-9CDF-44F0-9665-107549DFD30C@uiuc.edu>
	<4682E824.1050507@sendu.me.uk>
	<FBAC5A51-B894-4508-996F-B0248CCF5022@uiuc.edu>
	<4683624F.6020402@sendu.me.uk>
	<CFF085C7-89F1-4DB7-BDA2-935E96AEEE5B@uiuc.edu>
	<4683DBEA.90005@sendu.me.uk>
	<904D660A-3A2F-46F5-A198-0C00CBBF14C1@uiuc.edu>
	<468409C7.7020102@sendu.me.uk>
	<A910978B-C0E9-40DE-B674-7B693520807E@gmx.net>
	<4686CC04.6000403@sendu.me.uk>
Message-ID: <D10BF6DE-D8A6-448A-8850-A7B13AE54266@uiuc.edu>


On Jun 30, 2007, at 4:32 PM, Sendu Bala wrote:

> Hilmar Lapp wrote:
>> On Jun 28, 2007, at 3:19 PM, Sendu Bala wrote:
>>> [...]
>>> Very definitely the latter. The key benefit of my approach is  
>>> that  the organisation stays as is and that a snapshot of the  
>>> repository  remains a single directory of modules in Bio so that  
>>> people don't  have to 'install' Bioperl, they can still just  
>>> uncompress the  archive (or check out the package from svn) and  
>>> point their  PERL5LIB to the root dir of the package.
> [snip]
>> In this sense, I understand a release pumpkin will generate ~900   
>> packages to upload to CPAN? How much hassle is that compared to  
>> what  uploading a bioperl release means right now?
>
> I'd have to investigate. I did my uploads using the PAUSE website,  
> which for 900 packages would be unfeasible. Will have to see if the  
> process can be automated.

Not that they would care one way or another but maybe we should  
contact the CPAN maintainers to get their thoughts.  They might have  
some ideas...

>> How brittle is all the Build.PL code that will be needed to  
>> automate  all of this, and how difficult will it be to maintain?  
>> For example,  if someone adds in 10 new modules, what Build.PL- 
>> related work will  need to be done?
>
> Well, my plan will be that once the work is done, you won't need to  
> touch the Build.PL code again. My intent is that the pumpkin can  
> just type one command and not think about anything.
>
> As for the reality, I won't know until I think about it properly  
> and experiment.

A good experiment for a branch.  I still think this could be  
accomplished step-wise; for instance run a quick test using something  
with a simple dependency tree like Bio::Root::Root (only needs  
RootI), finish up with Bio::Root*, then work down into PrimarySeq,  
Seq, etc.  Submit them to CPAN piecemeal or in batches (all  
Bio::Seq*, so on).

If the Build.PL, etc are to be generated on the fly then maybe there  
should be a simple way of registering or matching tests to modules  
(or vice versa) to ease the pain, particularly for new code.

chris


From hlapp at gmx.net  Sun Jul  1 02:56:04 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 30 Jun 2007 22:56:04 -0400
Subject: [Bioperl-l] First cut svn repository
In-Reply-To: <A348C2D6-F00B-4E76-A78F-E192A912E785@uiuc.edu>
References: <3097065.1181941697249.JavaMail.myubc2@brahms.my.ubc.ca>
	<bba689ec0706151440o56a7d6c6ncf72a37cd2b2cdc5@mail.gmail.com>
	<185BDA34-1449-49CA-B146-ADF27D2928CD@gmx.net>
	<8D3B697E-2072-46FE-A1C9-E546D9DEAA45@uiuc.edu>
	<4673C7CB.1030709@mail.nih.gov>
	<410EF5F9-A30E-4AB7-85F7-7E761E3890D5@uiuc.edu>
	<18049.30026.61328.134490@almost.alerce.com>
	<4683A7D1.8070403@sendu.me.uk>
	<18051.48684.996884.134046@almost.alerce.com>
	<4683C385.3050904@sendu.me.uk>
	<18051.63674.685297.426813@almost.alerce.com>
	<D554E628-AB22-44C2-B253-3CDDB3F71253@uiuc.edu>
	<18052.3946.224905.415905@almost.alerce.com>
	<2159ED58-E6F4-4ED8-AC23-E8BAF69FE240@gmx.net>
	<A348C2D6-F00B-4E76-A78F-E192A912E785@uiuc.edu>
Message-ID: <E250DB37-E2C1-4F71-A2FE-B64603EB69FD@gmx.net>

It turns out that both files are also present on the release-0-9-3,  
bioperl-1-0-0, bioperl-1-0-alpha, and bioperl-1-0-alpha2-rc tags, so add

$ svn rm -m "Removing offending duplicate" svn+ssh://dev.open-bio.org/home/hartzell/bioperl/bioperl-live/tags/release-0-9-3/t/data/HUMBETGLOA.fasta
$ svn rm -m "Removing offending duplicate" svn+ssh://dev.open-bio.org/home/hartzell/bioperl/bioperl-live/tags/bioperl-1-0-0/t/data/HUMBETGLOA.fasta
$ svn rm -m "Removing offending duplicate" svn+ssh://dev.open-bio.org/home/hartzell/bioperl/bioperl-live/tags/bioperl-1-0-alpha/t/data/HUMBETGLOA.fasta
$ svn rm -m "Removing offending duplicate" svn+ssh://dev.open-bio.org/home/hartzell/bioperl/bioperl-live/tags/bioperl-1-0-alpha2-rc/t/data/HUMBETGLOA.fasta

to the post-processing commands.
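
To look for further cases like this, a minimal sketch of a filter that
flags names differing only in case. The function name
find_case_collisions is made up; it reads one name per line, so the
output of svn list on a tag directory could be piped into it.

```shell
# find_case_collisions: hypothetical helper, not an svn command.
# Reads one file name per line on stdin and prints every name that
# clashes with another name once case is ignored.
find_case_collisions() {
    awk '
        {
            key = tolower($0)
            # Collect all spellings that share a lower-cased key.
            if (count[key]++ > 0)
                names[key] = names[key] "\n" $0
            else
                names[key] = $0
        }
        END {
            for (key in count)
                if (count[key] > 1)
                    print names[key]
        }
    ' | LC_ALL=C sort
}

# Intended use (TAG_URL is a placeholder for a tag's t/data URL):
#   svn list "$TAG_URL" | find_case_collisions
```

An empty result means the directory should check out cleanly on a
case-insensitive filesystem.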

	-hilmar

On Jun 30, 2007, at 8:40 PM, Chris Fields wrote:

> Checkout worked for me (Mac OS X) using both:
>
> svn co svn+ssh://dev.open-bio.org/home/hartzell/bioperl/bioperl- 
> live/tags/release-0-9-2/t/data
> svn co svn+ssh://dev.open-bio.org/home/hartzell/bioperl/bioperl- 
> live/tags/release-0-9-2/
>
> so removing the offending file worked (good catch!).  Haven't run a  
> full co but probably isn't necessary.
>
> chris
>
> On Jun 30, 2007, at 6:36 PM, Hilmar Lapp wrote:
>
>>
>> On Jun 28, 2007, at 3:43 PM, George Hartzell wrote:
>>
>>> I just did the experiment, and filename-insensitivity seems to be
>>> breaking something.
>>>
>>> I'm using an svn I picked up from http://www.codingmonkeys.de/mbo/.
>>>
>>> I reformatted a memory stick to be case sensitive and co of
>>>
>>>   bioperl/bioperl-live/tags/release-0-9-2/t
>>>
>>> worked, then I made a directory in my home dir (normal mac thing)  
>>> and
>>> got the same error as above.
>>
>> You picked up a rename of a file from lower case extension to  
>> upper case extension. Unfortunately, there are several months  
>> between adding the upper-case and removing the lower-case version.
>>
>> We can reconstruct what happened with this using svn log on the  
>> directory (this does not require a checkout):
>>
>> $ svn log --verbose svn+ssh://dev.open-bio.org/home/hartzell/ 
>> bioperl/bioperl-live/trunk/t/data
>>
>> Searching for HUMBETGLOA yields the following two commits that  
>> added one and removed the other:
>>
>> --------------------------------------------------------------------- 
>> ---
>> r2245 | jason | 2001-12-08 11:59:05 -0500 (Sat, 08 Dec 2001) | 2  
>> lines
>> Changed paths:
>>    M /bioperl-live/trunk/t/SearchIO.t
>>    A /bioperl-live/trunk/t/data/HUMBETGLOA.FASTA
>>    A /bioperl-live/trunk/t/data/cysprot1.FASTA
>>
>> added tests for FASTA
>>
>> --------------------------------------------------------------------- 
>> ---
>> r2877 | jason | 2002-03-11 22:39:40 -0500 (Mon, 11 Mar 2002) | 2  
>> lines
>> Changed paths:
>>    A /bioperl-live/trunk/t/data/HUMBETGLOA.fa
>>    D /bioperl-live/trunk/t/data/HUMBETGLOA.fasta
>>
>> renaming file to avoid clobbering on windows
>>
>> Unfortunately, both files are in the tag (again, no checkout  
>> required):
>>
>> $ svn list svn+ssh://dev.open-bio.org/home/hartzell/bioperl/ 
>> bioperl-live/tags/release-0-9-2/t/data | grep HUMBETGLOA | grep -i  
>> fasta
>> HUMBETGLOA.FASTA
>> HUMBETGLOA.fasta
>>
>> We can remove the offending version from the repository (again,  
>> without needing a checkout):
>>
>> $ svn rm svn+ssh://dev.open-bio.org/home/hartzell/bioperl/bioperl- 
>> live/tags/release-0-9-2/t/data/HUMBETGLOA.fasta
>>
>> I did this, and now the tag checks out fine on OSX. Can anyone  
>> confirm?
>>
>> (BTW the ability to operate on the repository w/o needing a  
>> checkout is another advantage of svn)
>>
>> 	-hilmar
>> -- 
>> ===========================================================
>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
>> ===========================================================
>>
>>
>>
>>
>>
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From dmessina at wustl.edu  Sun Jul  1 05:38:48 2007
From: dmessina at wustl.edu (David Messina)
Date: Sun, 1 Jul 2007 00:38:48 -0500
Subject: [Bioperl-l] svn auto-properties [was Re: First cut svn
	repository]
In-Reply-To: <46869226.70203@sheffield.ac.uk>
References: <3097065.1181941697249.JavaMail.myubc2@brahms.my.ubc.ca>	<bba689ec0706151440o56a7d6c6ncf72a37cd2b2cdc5@mail.gmail.com>	<185BDA34-1449-49CA-B146-ADF27D2928CD@gmx.net>	<8D3B697E-2072-46FE-A1C9-E546D9DEAA45@uiuc.edu>	<4673C7CB.1030709@mail.nih.gov>	<410EF5F9-A30E-4AB7-85F7-7E761E3890D5@uiuc.edu>	<18049.30026.61328.134490@almost.alerce.com>	<5764264E-5C40-4C9E-B1C9-A70628AC1DD0@uiuc.edu>	<BFBA575A-E653-40F6-9242-D72655B6AE9C@wustl.edu>	<E83D9D3C-96F2-4B5A-B503-09C3860586D0@gmx.net>	<D7111143-D173-42DE-AAEF-C2365AA453A0@wustl.edu>	<18051.44281.831316.749586@almost.alerce.com>	<F5B048F4-CBA5-493A-8A5C-2033709D8A63@wustl.edu>
	<18051.61992.627473.323346@almost.alerce.com>
	<4684AF3D.5090907@sheffield.ac.uk>
	<843758CD-9C5B-4DDA-9FF4-B90AA225BDB3@wustl.edu>
	<468628AC.9060200@sheffield.ac.uk>
	<461F64B9-87FD-458A-8945-8238E7076109@wustl.edu>
	<46869226.70203@sheffield.ac.uk>
Message-ID: <3164A6E3-77CF-4E61-9609-1408768862B1@wustl.edu>


> [Nath]
> I think the list of seq formats recognised by Bioperl in Bio::SeqIO  
> and
> Bio::AlignIO would be a good start. As these are likely to be the ones
> that are sensitive to file format recognition and thus could break  
> tests
> if renamed.

Sounds good to me. I will do a quick tour of the rest of the repo  
looking for other common or important file extensions, but I don't  
expect there to be many if any.


> [still Nath]
> I think a lot of people have used "." in file names as an  
> alternative to
> a space. I think it would be beneficial to use an underscore "_" in
> these cases and leave the "." to represent the beginning of the file
> extension.

That's a great idea.


> [Chris]
> Do we need to define every filetype extension, or can there be a  
> fallback (eg if it isn't on the list or has no extension it's plain  
> text)?

For every file that's added, svn takes a peek to see if it's
human-readable. If not, it's tagged with the generic MIME type
application/octet-stream. (It does this so it knows not to try to do
diffs and merges on a binary file.)

So the default for a human-readable file is no MIME type, which I  
believe is essentially the same thing as text/plain.

And then regardless of the outcome of svn's peek, any matching
auto-props are then applied, overriding svn's choice.

So if we don't define every extension, I think we'll be fine. It'd be  
nice to have everything tagged with a MIME type, though. For one  
thing, Apache will use it to do the right thing when people browse  
the repo over the web. And two, because metadata is cool. :)

One more thing: in the course of reading up on this, I learned that  
my earlier expectation about multiple auto-prop matches was  
incorrect. It's true that multiple unrelated matches means that  
multiple properties are set on the file. But when a file matches  
multiple *conflicting* auto-property patterns, there's no telling  
which value it'll get.
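
To catch such conflicts before they bite, here is a minimal sketch that
lists every svn:mime-type value a file name would pick up from a set of
auto-props rules. The function name and the sample rules are made up
for illustration. It uses the shell's own case-sensitive glob matching,
which may not be exactly how svn matches patterns.

```shell
# matching_mime_types: hypothetical helper, not part of svn.
# usage: matching_mime_types FILENAME < rules
# Each rule line looks like an [auto-props] entry:
#   PATTERN = prop=value;prop=value
# Prints each distinct svn:mime-type value set by a matching pattern.
matching_mime_types() {
    name=$1
    while IFS= read -r line; do
        # Skip lines that are not "pattern = properties".
        case $line in
            *" = "*) ;;
            *) continue ;;
        esac
        pattern=${line%% = *}
        props=${line#* = }
        # $pattern is left unquoted so it is treated as a glob.
        case $name in
            $pattern) ;;
            *) continue ;;
        esac
        printf '%s\n' "$props" | tr ';' '\n' | sed -n 's/^svn:mime-type=//p'
    done | LC_ALL=C sort -u
}

# Sample (invented) rules that conflict for HUMBETGLOA.fasta:
#   *.fasta = svn:eol-style=native;svn:mime-type=text/plain
#   HUMBETGLOA.* = svn:mime-type=application/octet-stream
```

More than one line of output for a given file name means the patterns
conflict and should be tightened.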


Dave


From hartzell at alerce.com  Sun Jul  1 16:29:29 2007
From: hartzell at alerce.com (George Hartzell)
Date: Sun, 1 Jul 2007 09:29:29 -0700
Subject: [Bioperl-l] First cut svn repository
In-Reply-To: <E250DB37-E2C1-4F71-A2FE-B64603EB69FD@gmx.net>
References: <3097065.1181941697249.JavaMail.myubc2@brahms.my.ubc.ca>
	<bba689ec0706151440o56a7d6c6ncf72a37cd2b2cdc5@mail.gmail.com>
	<185BDA34-1449-49CA-B146-ADF27D2928CD@gmx.net>
	<8D3B697E-2072-46FE-A1C9-E546D9DEAA45@uiuc.edu>
	<4673C7CB.1030709@mail.nih.gov>
	<410EF5F9-A30E-4AB7-85F7-7E761E3890D5@uiuc.edu>
	<18049.30026.61328.134490@almost.alerce.com>
	<4683A7D1.8070403@sendu.me.uk>
	<18051.48684.996884.134046@almost.alerce.com>
	<4683C385.3050904@sendu.me.uk>
	<18051.63674.685297.426813@almost.alerce.com>
	<D554E628-AB22-44C2-B253-3CDDB3F71253@uiuc.edu>
	<18052.3946.224905.415905@almost.alerce.com>
	<2159ED58-E6F4-4ED8-AC23-E8BAF69FE240@gmx.net>
	<A348C2D6-F00B-4E76-A78F-E192A912E785@uiuc.edu>
	<E250DB37-E2C1-4F71-A2FE-B64603EB69FD@gmx.net>
Message-ID: <18055.54889.677775.868974@almost.alerce.com>

Hilmar Lapp writes:
 > It turns out that both files are also present on the release-0-9-3,  
 > bioperl-1-0-0, bioperl-1-0-alpha, and bioperl-1-0-alpha2-rc tags, so add
 > 
 > $ svn rm -m "Removing offending duplicate" svn+ssh://dev.open-bio.org/ 
 > home/hartzell/bioperl/bioperl-live/tags/release-0-9-3/t/data/ 
 > HUMBETGLOA.fasta
 > $ svn rm -m "Removing offending duplicate" svn+ssh://dev.open-bio.org/ 
 > home/hartzell/bioperl/bioperl-live/tags/bioperl-1-0-0/t/data/ 
 > HUMBETGLOA.fasta
 > $ svn rm -m "Removing offending duplicate" svn+ssh://dev.open-bio.org/ 
 > home/hartzell/bioperl/bioperl-live/tags/bioperl-1-0-alpha/t/data/ 
 > HUMBETGLOA.fasta
 > $ svn rm -m "Removing offending duplicate" svn+ssh://dev.open-bio.org/ 
 > home/hartzell/bioperl/bioperl-live/tags/bioperl-1-0-alpha2-rc/t/data/ 
 > HUMBETGLOA.fasta
 > 
 > to the post-processing commands.
 > [...]

Will do.  Thanks for working out the incantations!

g.


From cjfields at uiuc.edu  Mon Jul  2 13:26:06 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 2 Jul 2007 08:26:06 -0500
Subject: [Bioperl-l] test data
Message-ID: <82E2A57B-CB7B-4437-94A1-7AEFCA6A4F5F@uiuc.edu>

I am planning on adding test data to cvs for eutils and have run  
across some stuff in bugzilla that needs to be added as well.

Should we, as a convention, start adding data sequestered to a folder  
with the test name, within t/data?  This might make life easier in  
the long run (keep track of files, get rid of old files, etc), and  
may make it easier for wrapping up the correct data with tests if we  
start submitting single module CPAN updates.

chris


From cjfields at uiuc.edu  Mon Jul  2 13:52:27 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 2 Jul 2007 08:52:27 -0500
Subject: [Bioperl-l] test data
In-Reply-To: <468901C1.8020505@sendu.me.uk>
References: <82E2A57B-CB7B-4437-94A1-7AEFCA6A4F5F@uiuc.edu>
	<468901C1.8020505@sendu.me.uk>
Message-ID: <61239EEF-D079-4D47-BDCD-A2B5CCC1C84D@uiuc.edu>

On Jul 2, 2007, at 8:46 AM, Sendu Bala wrote:

> Chris Fields wrote:
>> I am planing on adding test data to cvs for eutils and have run  
>> across some stuff in bugzilla that needs to be added as well.
>> Should we, as convention, start adding data sequestered to a fold  
>> with the test name, within t/data?
>
> I'd actually argue that this shouldn't be done: data is sometimes  
> reused amongst multiple different test scripts, and when looking  
> for data to reuse its easier to spot it in a single directory  
> compared to searching through multiple directories.
>
>
>> This might make life easier in the long run (keep track of files,  
>> get rid of old files, etc), and may make it easier for wrapping up  
>> the correct data with tests if we start submitting single module  
>> CPAN updates.
>
> I don't think that will be an issue. The automated process would  
> read the test script and see what input files it uses, copying  
> those into the archive. So, just be sure to standardise on using  
> test_input_file() to make that possible.
>
>
> That said, I wouldn't mind especially either way. Just don't do it  
> now, since test script names (and therefore the name of the  
> directory you'd want to store the input files in) might all change.
>
>
> In fact we can imagine that we have a test script t/ 
> BioZombieKitten.t which stores its test data in t/data/ 
> BioZombieKitten/input.file but the script gets the path to this  
> file by:
> my $input_file = test_input_file('input.file');
>
> test_input_file() is then implemented to look for the file in the  
> subdir of data corresponding to the script name if we're dealing  
> with the 900-modules-in-a-package checkout-type situation, but just  
> in t/data if we're in the one-module-in-a-package situation.
>
> In any case, things will be most flexible if you drop files  
> directly into t/data for now and reference them without any subdirs  
> in the call to test_input_file().

Fine by me, I just find it very cluttered.

BioZombieKitten?!?

chris


From bix at sendu.me.uk  Mon Jul  2 14:00:37 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 02 Jul 2007 15:00:37 +0100
Subject: [Bioperl-l] test data
In-Reply-To: <61239EEF-D079-4D47-BDCD-A2B5CCC1C84D@uiuc.edu>
References: <82E2A57B-CB7B-4437-94A1-7AEFCA6A4F5F@uiuc.edu>
	<468901C1.8020505@sendu.me.uk>
	<61239EEF-D079-4D47-BDCD-A2B5CCC1C84D@uiuc.edu>
Message-ID: <46890505.1070707@sendu.me.uk>

Chris Fields wrote:
> On Jul 2, 2007, at 8:46 AM, Sendu Bala wrote:
> Fine by me, I just find it very cluttered.

Yes, I agree. I also wish we had a decent naming convention for files. 
(I.e., it would be nice to have a good idea what a file was for without 
having to study the test script that uses it.)


> BioZombieKitten?!?

I get Bio/perl/ and Bio/ware/ confused in my head ;)
http://forums.bioware.com/viewtopic.html?topic=562916&forum=84


From bix at sendu.me.uk  Mon Jul  2 13:46:41 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 02 Jul 2007 14:46:41 +0100
Subject: [Bioperl-l] test data
In-Reply-To: <82E2A57B-CB7B-4437-94A1-7AEFCA6A4F5F@uiuc.edu>
References: <82E2A57B-CB7B-4437-94A1-7AEFCA6A4F5F@uiuc.edu>
Message-ID: <468901C1.8020505@sendu.me.uk>

Chris Fields wrote:
> I am planing on adding test data to cvs for eutils and have run across 
> some stuff in bugzilla that needs to be added as well.
> 
> Should we, as convention, start adding data sequestered to a fold with 
> the test name, within t/data?

I'd actually argue that this shouldn't be done: data is sometimes reused
amongst multiple different test scripts, and when looking for data to
reuse it's easier to spot it in a single directory than to search
through multiple directories.


> This might make life easier in the long 
> run (keep track of files, get rid of old files, etc), and may make it 
> easier for wrapping up the correct data with tests if we start 
> submitting single module CPAN updates.

I don't think that will be an issue. The automated process would read 
the test script and see what input files it uses, copying those into the 
archive. So, just be sure to standardise on using test_input_file() to 
make that possible.


That said, I wouldn't mind especially either way. Just don't do it now, 
since test script names (and therefore the name of the directory you'd 
want to store the input files in) might all change.


In fact we can imagine that we have a test script t/BioZombieKitten.t 
which stores its test data in t/data/BioZombieKitten/input.file but the 
script gets the path to this file by:
my $input_file = test_input_file('input.file');

test_input_file() is then implemented to look for the file in the subdir 
of data corresponding to the script name if we're dealing with the 
900-modules-in-a-package checkout-type situation, but just in t/data if 
we're in the one-module-in-a-package situation.

In any case, things will be most flexible if you drop files directly 
into t/data for now and reference them without any subdirs in the call 
to test_input_file().
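
The lookup described above could be sketched roughly as follows. This is only
an illustration, not the actual Bioperl implementation: the optional
$data_dir and $script arguments are assumptions added here so the behaviour
can be tried outside a real checkout, and the per-script subdirectory is
simply the script name without its ".t" suffix.

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Spec;

# Hypothetical sketch of test_input_file(); not the real Bioperl code.
# $data_dir and $script are optional and exist only for illustration.
sub test_input_file {
    my ($name, $data_dir, $script) = @_;
    $data_dir = File::Spec->catdir('t', 'data') unless defined $data_dir;
    $script   = $0 unless defined $script;

    # t/BioZombieKitten.t -> BioZombieKitten
    my $subdir = basename($script, '.t');

    # Prefer the per-script subdirectory if the file exists there ...
    my $nested = File::Spec->catfile($data_dir, $subdir, $name);
    return $nested if -e $nested;

    # ... otherwise fall back to the flat t/data layout.
    return File::Spec->catfile($data_dir, $name);
}

my $input_file = test_input_file('input.file');
print "$input_file\n";
```

Because the test script only ever names the file, the data can later be moved
into (or out of) per-script subdirectories without touching the tests.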


From hlapp at gmx.net  Mon Jul  2 20:02:37 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 2 Jul 2007 16:02:37 -0400
Subject: [Bioperl-l] Take 2 of the new subversion repository.
In-Reply-To: <18054.63942.316904.413911@almost.alerce.com>
References: <18054.63942.316904.413911@almost.alerce.com>
Message-ID: <F5F3DC8B-47B6-4B9E-8B94-82A514BDE7FD@gmx.net>

Just FYI, after applying the changes I've been sending, I was able to  
check out the repository in its entirety.

	-hilmar

On Jun 30, 2007, at 8:48 PM, George Hartzell wrote:

>
> There's a second cut at the subversion repository.  I've done a better
> job of setting svn:keywords and svn:eol-style on various files.  The
> defaults were more cautious and I used an auto-props file based on
> the wiki version.
>
>   svn+ssh://dev.open-bio.org/home/hartzell/bioperl_take2
>
> The old repository's still around as
>
>   svn+ssh://dev.open-bio.org/home/hartzell/bioperl_take1
>
> I renamed it so that people wouldn't work with it by mistake.  If, for
> some hard-to-imagine reason, you have a working copy that you want to
> run against it, you should be able to do an svn switch --relocate on
> your working copy and be back in shape.  In fact, it might be a good
> time to give it a try....
>
> g.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From wrp at virginia.edu  Mon Jul  2 20:08:04 2007
From: wrp at virginia.edu (William R. Pearson)
Date: Mon, 2 Jul 2007 16:08:04 -0400
Subject: [Bioperl-l] Course: Computational and Comparative Genomics
Message-ID: <4B3F66D7-CF05-4CD1-A148-272B4B56FBD4@virginia.edu>


Course announcement - Application deadline, July 15, 2007

================================================================

Cold Spring Harbor
COMPUTATIONAL & COMPARATIVE GENOMICS
November 7 - 13, 2007
Application Deadline: July 15, 2007

INSTRUCTORS:

Pearson, William, Ph.D., University of Virginia, Charlottesville, VA
Smith, Randall, Ph.D., SmithKline Beecham Pharmaceuticals, King of
Prussia, PA

Beyond BLAST and FASTA - Alignment: from proteins to genomes - This
course presents a comprehensive overview of the theory and practice of
computational methods for extracting the maximum amount of information
from protein and DNA sequence similarity through sequence database
searches, statistical analysis, multiple sequence alignment, and
genome scale alignment. Additional topics include gene finding,
identifying signals in unaligned sequences, and integration of genetic
and sequence information in biological databases.

The course combines lectures with hands-on exercises; students are
encouraged to pose challenging sequence analysis problems using their
own data. The course makes extensive use of local WWW pages to present
problem sets and the computing tools to solve them. Students use
Windows and Mac workstations attached to a UNIX server.

The course is designed for biologists seeking advanced training in
biological sequence analysis, computational biology core resource
directors and staff, and for scientists in other disciplines, such as
computer science, who wish to survey current research problems in
biological sequence analysis and comparative genomics.

The primary focus of the Computational and Comparative Genomics Course
is the theory and practice of algorithms used in computational
biology, with the goal of using current methods more effectively and
developing new algorithms. Cold Spring Harbor also offers a
"Programming for Biology" course, which focuses more on software
development.

For additional information and the lecture schedule and problem sets
for the 2006 course, see:

         http://fasta.bioch.virginia.edu/cshl06

================================================================

To apply to the course, fill out and send in the form at:

         http://meetings.cshl.edu/courses/courseapplication.asp

================================================================

Bill Pearson


From niels at genomics.dk  Mon Jul  2 20:45:07 2007
From: niels at genomics.dk (Niels Larsen)
Date: Mon, 02 Jul 2007 22:45:07 +0200
Subject: [Bioperl-l] simple PrimarySeq question
In-Reply-To: <F5F3DC8B-47B6-4B9E-8B94-82A514BDE7FD@gmx.net>
References: <18054.63942.316904.413911@almost.alerce.com>
	<F5F3DC8B-47B6-4B9E-8B94-82A514BDE7FD@gmx.net>
Message-ID: <468963D3.3000007@genomics.dk>

I write hoping someone could show me how to create a PrimarySeq
object without parsing features and all first. The lines below
return

"Can't locate object method "next_seq" via package "Bio::PrimarySeq" at ./tst2 line 16."

whereas calling Bio::SeqIO-> gives no error, but a too big object.
The GenBank record after the __END__ is the "1.gb" file. I could not
find out how from the tutorial or the Bio::PrimarySeq description.

Niels L


#!/usr/bin/env perl

use strict;
use warnings FATAL => qw ( all );

use Data::Dumper;

use Bio::Seq;
use Bio::SeqIO;

my ( $seq_h, $seq );

$seq_h = Bio::PrimarySeq->new( -file => "1.gb", -format => 'genbank' );
# $seq_h = Bio::SeqIO->new( -file => "1.gb", -format => 'genbank' );

$seq = $seq_h->next_seq();

# print Dumper( $seq );

__END__

LOCUS       X60065                     9 bp    mRNA    linear   MAM 14-NOV-2006
DEFINITION  B.bovis beta-2-gpI mRNA for beta-2-glycoprotein I.
ACCESSION   X60065 REGION: 1..9
VERSION     X60065.1  GI:5
KEYWORDS    beta-2 glycoprotein I.
SOURCE      Bos taurus (cattle)
   ORGANISM  Bos taurus
             Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
             Mammalia; Eutheria; Laurasiatheria; Cetartiodactyla; Ruminantia;
             Pecora; Bovidae; Bovinae; Bos.
REFERENCE   1
   AUTHORS   Bendixen,E., Halkier,T., Magnusson,S., Sottrup-Jensen,L. and
             Kristensen,T.
   TITLE     Complete primary structure of bovine beta 2-glycoprotein I:
             localization of the disulfide bridges
   JOURNAL   Biochemistry 31 (14), 3611-3617 (1992)
    PUBMED   1567819
REFERENCE   2  (bases 1 to 9)
   AUTHORS   Kristensen,T.
   TITLE     Direct Submission
   JOURNAL   Submitted (11-JUN-1991) T. Kristensen, Dept of Mol Biology,
             University of Aarhus, C F Mollers Alle 130, DK-8000 Aarhus C,
             DENMARK
FEATURES             Location/Qualifiers
      source          1..9
                      /organism="Bos taurus"
                      /mol_type="mRNA"
                      /db_xref="taxon:9913"
                      /clone="pBB2I"
                      /tissue_type="liver"
      gene            <1..>9
                      /gene="beta-2-gpI"
      CDS             <1..>9
                      /gene="beta-2-gpI"
                      /codon_start=1
                      /product="beta-2-glycoprotein I"
                      /protein_id="CAA42669.1"
                      /db_xref="GI:6"
                      /db_xref="GOA:P17690"
                      /db_xref="UniProtKB/Swiss-Prot:P17690"
                      /translation="PALVLLLGFLCHVAIAGRTCPKPDELPFSTVVPLKRTYEPGEQI
                      VFSCQPGYVSRGGIRRFTCPLTGLWPINTLKCMPRVCPFAGILENGTVRYTTFEYPNT
                      ISFSCHTGFYLKGASSAKCTEEGKWSPDLPVCAPITCPPPPIPKFASLSVYKPLAGNN
                      SFYGSKAVFKCLPHHAMFGNDTVTCTEHGNWTQLPECREVRCPFPSRPDNGFVNHPAN
                      PVLYYKDTATFGCHETYSLDGPEEVECSKFGNWSAQPSCKASCKLSIKRATVIYEGER
                      VAIQNKFKNGMLHGQKVSFFCKHKEKKCSYTEDAQCIDGTIEIPKCFKEHSSLAFWKT
                      DASDVKPC"
      sig_peptide     <1..>9
                      /gene="beta-2-gpI"
ORIGIN
         1 ccagcgctc
//


From Kevin.M.Brown at asu.edu  Mon Jul  2 21:35:12 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Mon, 2 Jul 2007 14:35:12 -0700
Subject: [Bioperl-l] simple PrimarySeq question
In-Reply-To: <468963D3.3000007@genomics.dk>
References: <18054.63942.316904.413911@almost.alerce.com><F5F3DC8B-47B6-4B9E-8B94-82A514BDE7FD@gmx.net>
	<468963D3.3000007@genomics.dk>
Message-ID: <1A4207F8295607498283FE9E93B775B403576504@EX02.asurite.ad.asu.edu>

Start by having a look at the following link:
http://bioperl.org/cgi-bin/deob_interface.cgi

SeqIO is how one reads or writes sequences to/from files.
Bio::PrimarySeq is just an object that holds information about a
sequence obtained from a file.

As for how to parse a Genbank file into a list of features:

use strict;
use warnings;
use Bio::SeqIO;

# 'genbank' is assumed here; the original left $format undefined
my $format = 'genbank';
my $file = Bio::SeqIO->new(-format => $format, -file => "id.gb");
my @sorted_features;
while (my $seq = $file->next_seq())
{
	my @features = $seq->all_SeqFeatures;
	# keep only the features whose primary tag is 'CDS'
	for my $f (@features)
	{
		my $tag = $f->primary_tag;
		if ($tag eq 'CDS')
		{
			# @sorted_features holds all the CDS feature objects
			# obtained from the genbank file
			push @sorted_features, $f;
		}
	}
}
 



From niels at genomics.dk  Tue Jul  3 00:41:24 2007
From: niels at genomics.dk (niels at genomics.dk)
Date: Tue, 3 Jul 2007 02:41:24 +0200 (CEST)
Subject: [Bioperl-l] simple PrimarySeq question
In-Reply-To: <1A4207F8295607498283FE9E93B775B403576504@EX02.asurite.ad.asu.edu>
References: <18054.63942.316904.413911@almost.alerce.com><F5F3DC8B-47B6-4B9E-8B94-82A514BDE7FD@gmx.net>
	<468963D3.3000007@genomics.dk>
	<1A4207F8295607498283FE9E93B775B403576504@EX02.asurite.ad.asu.edu>
Message-ID: <23897.85.82.195.151.1183423284.squirrel@mail.genomics.dk>

Kevin,

Thanks, but I didn't put the question very clearly, sorry. Yes, SeqIO
gets entries from file, and from those large parsed entries I can get a
simplified primary_seq object. But the sequence object that SeqIO returns
includes feature and annotation objects etc. that take time to make, and
I wish to know if there is a way to get a primary_seq object without this
overhead. I apologize if I overlooked it in the docs.

Niels


> Start by having a look at the following link:
> http://bioperl.org/cgi-bin/deob_interface.cgi
>
> SeqIO is how one reads or writes sequences to/from files.
> Bio::PrimarySeq is just an object that holds information about a
> sequence obtained from a file.
>
> As for how to parse a Genbank file into a list of features:
>
> $file = Bio::SeqIO->new(-format => $format, -file => "id.gb");
> while (my $seq = $file->next_seq())
> {
> 	@features = $seq->all_SeqFeatures;
> 	# sort features by their primary tags
> 	for my $f (@features)
> 	{
> 		my $tag = $f->primary_tag;
> 		if ($tag eq 'CDS')
> 		{
> 			# @sorted_features holds all the Bio::PrimarySeq
> features obtained from the genbank file
> 			push @sorted_features, $f;
> 		}
> 	}
> }


From hlapp at gmx.net  Tue Jul  3 02:36:19 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 2 Jul 2007 22:36:19 -0400
Subject: [Bioperl-l] simple PrimarySeq question
In-Reply-To: <23897.85.82.195.151.1183423284.squirrel@mail.genomics.dk>
References: <18054.63942.316904.413911@almost.alerce.com><F5F3DC8B-47B6-4B9E-8B94-82A514BDE7FD@gmx.net>
	<468963D3.3000007@genomics.dk>
	<1A4207F8295607498283FE9E93B775B403576504@EX02.asurite.ad.asu.edu>
	<23897.85.82.195.151.1183423284.squirrel@mail.genomics.dk>
Message-ID: <84F5C120-FE0B-472D-8F1B-026AD238E959@gmx.net>

Check out the POD of Bio::Seq::SeqBuilder; the synopsis should have
examples for what you want to do:

      use Bio::SeqIO;
      # usually you won't instantiate this yourself - a SeqIO object -
      # you will have one already
      my $seqin = Bio::SeqIO->new(-fh => \*STDIN, -format => "genbank");
      my $builder = $seqin->sequence_builder();

      # if you need only sequence, id, and description (e.g. for
      # conversion to FASTA format):
      $builder->want_none();
      $builder->add_wanted_slot('display_id','desc','seq');

      # if you want everything except the sequence and features
      $builder->want_all(1); # this is the default if it's untouched
      $builder->add_unwanted_slot('seq','features');

Let us know if that doesn't answer your question.

Note that this is currently only implemented for Genbank format.

	-hilmar

On Jul 2, 2007, at 8:41 PM, niels at genomics.dk wrote:

> Kevin,
>
> Thanks, but I didn't put the question very clearly, sorry. Yes, SeqIO
> gets entries from file, and from those large parsed entries I can get a
> simplified primary_seq object. But the sequence object that SeqIO returns
> includes feature and annotation objects etc. that take time to make, and
> I wish to know if there is a way to get a primary_seq object without this
> overhead. I apologize if I overlooked it in the docs.
>
> Niels

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From ewijaya at gmail.com  Tue Jul  3 06:56:30 2007
From: ewijaya at gmail.com (Edward Wijaya)
Date: Tue, 3 Jul 2007 14:56:30 +0800
Subject: [Bioperl-l] Problem with GD.pm version 2.35
Message-ID: <3521d3670707022356tbc38694mfcb5185b1dfc005d@mail.gmail.com>

Dear all,
I was trying to perform a check with this command:

$ perl -MGD -e 'print $GD::VERSION';

And it gave:

GD object version 2.32 does not match $GD::VERSION 2.35 at
/usr/lib/perl5/5.8.8/i386-linux-thread-multi/DynaLoader.pm line 253.
Compilation failed in require.
BEGIN failed--compilation aborted.

Similarly my script that uses GD.pm doesn't execute.


I have installed the latest version of libgd version 2.0.35 downloaded from
http://www.libgd.org/Downloads#Download_the_latest_.282.0.35.29

Can anybody suggest how I can resolve my problem?

This is my Perl version:
This is perl, v5.8.8 built for i386-linux-thread-multi

--
Edward


From ewijaya at i2r.a-star.edu.sg  Tue Jul  3 06:35:12 2007
From: ewijaya at i2r.a-star.edu.sg (Wijaya Edward)
Date: Tue, 3 Jul 2007 14:35:12 +0800
Subject: [Bioperl-l] Problem with GD.pm version 2.35
References: <3ACF03E372996C4EACD542EA8A05E66A06168A@mailbe01.teak.local.net>
Message-ID: <3ACF03E372996C4EACD542EA8A05E66A26EB85@mailbe01.teak.local.net>

 
Dear all, 
I was trying to perform a check with this command:
 
$ perl -MGD -e 'print $GD::VERSION';

And it gave: 
 
GD object version 2.32 does not match $GD::VERSION 2.35 at /usr/lib/perl5/5.8.8/i386-linux-thread-multi/DynaLoader.pm line 253.
Compilation failed in require.
BEGIN failed--compilation aborted.

 
I have installed the latest version of libgd version 2.0.35 downloaded from
http://www.libgd.org/Downloads#Download_the_latest_.282.0.35.29
 
Can anybody suggest how I can resolve my problem?
 
This is my Perl version:
This is perl, v5.8.8 built for i386-linux-thread-multi
 
--
Edward



From lstein at cshl.edu  Tue Jul  3 14:41:26 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Tue, 3 Jul 2007 10:40:26 -0401
Subject: [Bioperl-l] Problem with GD.pm version 2.35
In-Reply-To: <3521d3670707022356tbc38694mfcb5185b1dfc005d@mail.gmail.com>
References: <3521d3670707022356tbc38694mfcb5185b1dfc005d@mail.gmail.com>
Message-ID: <6dce9a0b0707030741r52b8d0beq757a8faf982e1f2f@mail.gmail.com>

This happens when there is a mismatch between the compiled (.so) portion of
GD and the perl (.pm) version. Typically it occurs when you have installed
GD incorrectly by, e.g., copying the .pm file into position rather than
using the make file.

Solution: Uninstall old versions of GD by manually finding all occurrences
of GD.so and GD.pm and removing them. Then reinstall the correct way.
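
As a rough aid for the "find all occurrences" step, something like the
following could list the candidates. It is only a sketch under assumptions:
find_gd_files() is a made-up helper name, it only looks in the directories it
is given (by default @INC), it assumes the compiled part lives in the usual
auto/GD/ location, and it deletes nothing.

```perl
use strict;
use warnings;
use Config;
use File::Spec;

# Return every GD.pm and compiled GD library found under the given
# directories (defaults to @INC). Nothing is deleted.
sub find_gd_files {
    my @dirs  = @_ ? @_ : @INC;
    my $dlext = $Config{dlext};    # 'so' on most Linux systems
    my (@found, %seen);
    for my $dir (grep { !ref $_ } @dirs) {    # skip code refs in @INC
        for my $rel ('GD.pm', File::Spec->catfile('auto', 'GD', "GD.$dlext")) {
            my $path = File::Spec->catfile($dir, $rel);
            push @found, $path if -e $path && !$seen{$path}++;
        }
    }
    return @found;
}

print "$_\n" for find_gd_files();
```

If this prints more than one GD.pm, or a GD.pm in a different tree than the
compiled library, that would be consistent with the version mismatch
described above.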

Lincoln

On 7/3/07, Edward Wijaya <ewijaya at gmail.com> wrote:
>
> Dear all,
> I was trying to perform a check with this command:
>
> $ perl -MGD -e 'print $GD::VERSION';
>
> And it gave:
>
> GD object version 2.32 does not match $GD::VERSION 2.35 at
> /usr/lib/perl5/5.8.8/i386-linux-thread-multi/DynaLoader.pm line 253.
> Compilation failed in require.
> BEGIN failed--compilation aborted.
>
> Similarly my script that uses GD.pm doesn't execute.
>
>
> I have installed the latest version of libgd version 2.0.35 downloaded
> from
> http://www.libgd.org/Downloads#Download_the_latest_.282.0.35.29
>
> Can anybody suggest how I can resolve my problem?
>
> This is my Perl version:
> This is perl, v5.8.8 built for i386-linux-thread-multi
>
> --
> Edward


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu


From cjfields at uiuc.edu  Wed Jul  4 05:45:16 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 4 Jul 2007 00:45:16 -0500
Subject: [Bioperl-l] genbank2gff3 - Name attribute?
Message-ID: <C790FCC2-81E5-4BB4-A9CB-E2E59E5ABE27@uiuc.edu>

I noticed that genbank2gff3.pl doesn't have an explicitly defined way  
of converting the gene/locus/etc name to a Name tag (for, say,  
GBrowse).  Any particular reason?

Should I stick with GFF2 for now?

chris


From bix at sendu.me.uk  Wed Jul  4 10:00:31 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 04 Jul 2007 11:00:31 +0100
Subject: [Bioperl-l] Splitting Bioperl
Message-ID: <468B6FBF.1070708@sendu.me.uk>

To summarise some previous threads:
http://thread.gmane.org/gmane.comp.lang.perl.bio.general/15315
http://thread.gmane.org/gmane.comp.lang.perl.bio.general/15338/focus=15409

# Bioperl is currently one monolithic distribution of ~900 modules
# There is some desire to split it up into smaller functional groups
# There are some problems with that proposal
# An extreme variant of that proposal is to make the groups individual 
modules


Following this discussion:
http://www.nntp.perl.org/group/perl.modules/2007/07/msg55160.html
(especially Adam Kennedy's postings of 4/07, soon to appear in that 
archive), the extreme variant doesn't seem like a good idea.


I'm now suggesting that Steve's original split idea, as 
modified/expanded by Adam's driver and other ideas, is the best choice. 
The problems I previously identified can be solved in the same way they 
were solved in my extreme variant: the splits are done by Build.PL 
automation working on a single repository/code-base, not by splitting 
things up at the repository level.


As I see it, the way forward now is for someone interested enough to 
decide on the specifics of how things will be split and offer them up to 
the group for discussion. I don't mean vague possibilities of what might 
work as a split, but rather some real thought should go into it to make 
sure the split makes sense and will actually work in practice.

Following that, the splits can be implemented by some automated dist 
action of Build.PL.


If there isn't sufficient interest to make this happen, I don't see that 
as a terrible thing. There are benefits to keeping Bioperl monolithic, 
and some of the problems (eg. lack of updates) can be solved without 
changing its nature.


From cjfields at uiuc.edu  Wed Jul  4 14:53:45 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 4 Jul 2007 09:53:45 -0500
Subject: [Bioperl-l] Splitting Bioperl
In-Reply-To: <468B6FBF.1070708@sendu.me.uk>
References: <468B6FBF.1070708@sendu.me.uk>
Message-ID: <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>


On Jul 4, 2007, at 5:00 AM, Sendu Bala wrote:

> To summarise some previous threads:
> http://thread.gmane.org/gmane.comp.lang.perl.bio.general/15315
> http://thread.gmane.org/gmane.comp.lang.perl.bio.general/15338/ 
> focus=15409
>
> # Bioperl is currently one monolithic distribution of ~900 modules
> # There is some desire to split it up into smaller functional groups
> # There are some problems with that proposal
> # An extreme variant of that proposal is to make the groups individual
> modules
>
>
> Following this discussion:
> http://www.nntp.perl.org/group/perl.modules/2007/07/msg55160.html
> (especially Adam Kennedy's postings of 4/07, soon to appear in that
> archive), the extreme variant doesn't seem like a good idea.

brian d foy made some sound arguments against it as well.

> I'm now suggesting that Steve's original split idea, as
> modified/expanded by Adam's driver and other ideas, is the best  
> choice.
> The problems I previously identified can be solved in the same way  
> they
> were solved in my extreme variant: the splits are done by Build.PL
> automation working on a single repository/code-base, not by splitting
> things up at the repository level.
>
> As I see it, the way forward now is for someone interested enough to
> decide on the specifics of how things will be split and offer them  
> up to
> the group for discussion. I don't mean vague possibilities of what  
> might
> work as a split, but rather some real thought should go into it to  
> make
> sure the split makes sense and will actually work in practice.

We've already identified a few (SearchIO, Tools, GBrowse-related, etc).
...
> If there isn't sufficient interest to make this happen, I don't see  
> that
> as a terrible thing. There are benefits to keeping Bioperl monolithic,
> and some of the problems (eg. lack of updates) can be solved without
> changing its nature.

If so, proposals that solve this problem need to be made as well.

If we stay monolithic, then here's mine: we start having fixed,
regularly timed dev releases like Parrot, monthly or bimonthly (quite
common on CPAN), with brief release reports on which bugs have been
fixed, what code has been added, and so on.  Not every bug has to be
fixed per dev release; if that were true there would never be releases
for some of the XML parser packages.  No RCs for dev releases (it's a
dev release!).  These would be 1.x.y.  We can then, every once in a
while, have a bug-squashing session, hackathon, etc, and have a regular
non-dev release (1.x) that all core devs accept and that passes a
particular milestone.

As for the advantage of a split approach, as mentioned previously it  
is to focus modules/tests/scripts into groups with related  
functions.  Even just splitting off ones with external reqs (XML  
parsers, GD, etc) into an 'aux' release would be an advantage, as it  
doesn't confront a new user with the burden of installing a large  
list of dependencies, some of which may be complicated for a perl  
newbie to either install from scratch (DBD::mysql, GD) or to get the  
latest bug-fixed prereq release for their OS (the recent
XML::SAX::Expat debacle comes to mind: the fixed release wasn't
immediately available for win32 as a PPM).

I'm fairly open to any approach as long as it's reasonably thought
out, though I am admittedly a bit biased towards the split approach.
I do think some change is in order; I worry about there ever being a  
1.6 release at this point.

chris


From davila at ioc.fiocruz.br  Wed Jul  4 17:11:20 2007
From: davila at ioc.fiocruz.br (Alberto Davila)
Date: Wed, 04 Jul 2007 14:11:20 -0300
Subject: [Bioperl-l] ESTs in EST format
Message-ID: <468BD4B8.5050105@ioc.fiocruz.br>

Dear All,

I am trying to get all ESTs from a given species (e.g. Trypanosoma
brucei) from GenBank in EST format (e.g.
http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucest&id=10280980).
Using Entrez I can "display" individual EST entries in EST format, but
this "EST format" is not an option in the main "display" menu for batch
download.

I don't see the EST format listed among the ones that SeqIO deals with
(http://www.bioperl.org/wiki/Sequence_formats), so I wonder whether
there is another BioPerl module that can do this. Any tips would be
greatly appreciated ;-)

Kindest regards, Alberto


From jason at bioperl.org  Wed Jul  4 17:52:59 2007
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 4 Jul 2007 10:52:59 -0700
Subject: [Bioperl-l] ESTs in EST format
In-Reply-To: <468BD4B8.5050105@ioc.fiocruz.br>
References: <468BD4B8.5050105@ioc.fiocruz.br>
Message-ID: <D0D013CC-1D28-46D6-A94F-EA53C7EC5219@bioperl.org>

Currently we don't support this format. As far as I know it isn't a
published standard, nor is it a format in which NCBI distributes this
data as flat files (i.e. GenBank dumps).

Is there any reason why you can't get what you need from the GenBank
format?
http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucest&qty=1&c_start=1&list_uids=10280980&uids=&dopt=gb

-jason
On Jul 4, 2007, at 10:11 AM, Alberto Davila wrote:

> Dear All,
>
> I am trying to get all ESTs from a given species (eg: Trypanosoma
> brucei) from Genbank in EST format (eg:
> http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi? 
> db=nucest&id=10280980)...
> while using Entrez I can "display" individual EST entries in EST  
> format,
> this "EST format" is not an option in the main "display" menu for  
> batch
> download ...
>
> I dont see the EST format listed
> (http://www.bioperl.org/wiki/Sequence_formats) among the ones that  
> SeqIO
> deal with, so wonder there would another BioPerl module to do  
> this ? any
> tips, would be greatly appreciated ;-)
>
> Kindest regards, Alberto

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From dmessina at wustl.edu  Wed Jul  4 18:37:22 2007
From: dmessina at wustl.edu (David Messina)
Date: Wed, 4 Jul 2007 13:37:22 -0500
Subject: [Bioperl-l] Splitting Bioperl
In-Reply-To: <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>
References: <468B6FBF.1070708@sendu.me.uk>
	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>
Message-ID: <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>


On Jul 4, 2007, at 9:53 AM, Chris Fields wrote:

>  we start having fixed,
> regularly timed dev releases like Parrot, monthly or bimonthly (quite
> common on CPAN), with brief release reports on which bugs have been
> fixed, code has been added, so on.  Not every bug has to be fixed per
> dev release; if that were true there would never be releases for some
> of the XML parser packages.  No RCs for dev releases (it's a dev
> release!).  These would be 1.x.y.  We can then, every once in a
> while, have a bug-squashing session, hackathon, etc, and have regular
> non-dev release (1.x) that all core devs accept and that passes a
> particular milestone.


Regardless of whether we split or don't, I think these ideas of  
adding a little more structure to BioPerl's development cycles --  
especially having bug-squashing and hacking sessions, where we all  
band together and commit some time to cranking through a bunch of to- 
dos -- would be beneficial, particularly as a means to keeping a  
certain basal level of momentum in BioPerl.

Dave


From jason at bioperl.org  Wed Jul  4 19:45:29 2007
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 4 Jul 2007 12:45:29 -0700
Subject: [Bioperl-l] Splitting Bioperl
In-Reply-To: <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>
References: <468B6FBF.1070708@sendu.me.uk>
	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>
	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>
Message-ID: <B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>

I definitely agree - we can live up to the unstable "living on the  
edge" nature of dev releases a bit more perhaps?


On Jul 4, 2007, at 11:37 AM, David Messina wrote:

>
> On Jul 4, 2007, at 9:53 AM, Chris Fields wrote:
>
>>  we start having fixed,
>> regularly timed dev releases like Parrot, monthly or bimonthly (quite
>> common on CPAN), with brief release reports on which bugs have been
>> fixed, code has been added, so on.  Not every bug has to be fixed per
>> dev release; if that were true there would never be releases for some
>> of the XML parser packages.  No RCs for dev releases (it's a dev
>> release!).  These would be 1.x.y.  We can then, every once in a
>> while, have a bug-squashing session, hackathon, etc, and have regular
>> non-dev release (1.x) that all core devs accept and that passes a
>> particular milestone.
>
>
> Regardless of whether we split or don't, I think these ideas of
> adding a little more structure to BioPerl's development cycles --
> especially having bug-squashing and hacking sessions, where we all
> band together and commit some time to cranking through a bunch of to-
> dos -- would be beneficial, particularly as a means to keeping a
> certain basal level of momentum in BioPerl.
>
> Dave
>

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From cjfields at uiuc.edu  Wed Jul  4 20:54:14 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 4 Jul 2007 15:54:14 -0500
Subject: [Bioperl-l] Splitting Bioperl
In-Reply-To: <B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>
References: <468B6FBF.1070708@sendu.me.uk>
	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>
	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>
	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>
Message-ID: <F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>

I think what's partially responsible for slowing down releases is the  
expectation that each dev release is supposed to have all bugs fixed,  
work for every OS, etc.  In other words, act like a stable release.

A developer release by nature is living on the edge, so why not have  
regular dev releases?  We keep telling users to update to using  
bioperl-live whenever something breaks, anyway.  We could decide to  
split stuff off along the way into more 'stable' sections if there  
were more demand for it, and have the more API-volatile code  
(DB::SeqFeature, EUtilities, GFF3, Chado-related, etc.) retain the  
'dev' tag until we feel it's ready for prime time.

chris

On Jul 4, 2007, at 2:45 PM, Jason Stajich wrote:

> I definitely agree - we can live up to the unstable "living on the
> edge" nature of dev releases a bit more perhaps?
>
>
> On Jul 4, 2007, at 11:37 AM, David Messina wrote:
>
>>
>> On Jul 4, 2007, at 9:53 AM, Chris Fields wrote:
>>
>>>  we start having fixed,
>>> regularly timed dev releases like Parrot, monthly or bimonthly  
>>> (quite
>>> common on CPAN), with brief release reports on which bugs have been
>>> fixed, code has been added, so on.  Not every bug has to be fixed  
>>> per
>>> dev release; if that were true there would never be releases for  
>>> some
>>> of the XML parser packages.  No RCs for dev releases (it's a dev
>>> release!).  These would be 1.x.y.  We can then, every once in a
>>> while, have a bug-squashing session, hackathon, etc, and have  
>>> regular
>>> non-dev release (1.x) that all core devs accept and that passes a
>>> particular milestone.
>>
>>
>> Regardless of whether we split or don't, I think these ideas of
>> adding a little more structure to BioPerl's development cycles --
>> especially having bug-squashing and hacking sessions, where we all
>> band together and commit some time to cranking through a bunch of to-
>> dos -- would be beneficial, particularly as a means to keeping a
>> certain basal level of momentum in BioPerl.
>>
>> Dave
>>
>
> --
> Jason Stajich
> jason at bioperl.org
> http://jason.open-bio.org/
>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From n.haigh at sheffield.ac.uk  Thu Jul  5 08:09:05 2007
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Thu, 05 Jul 2007 09:09:05 +0100
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To: <F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>
References: <468B6FBF.1070708@sendu.me.uk>	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>
	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>
Message-ID: <468CA721.4020804@sheffield.ac.uk>

Chris Fields wrote:
> I think what's partially responsible for slowing down releases is the  
> expectation that each dev release is supposed to have all bugs fixed,  
> work for every OS, etc.  In other words, act like a stable release.
>
> A developer release by nature is living on the edge, so why not have  
> regular dev releases?  We keep telling users to update to using  
> bioperl-live whenever something breaks, anyway.  We could decide to  
> split stuff off along the way into more 'stable' sections if there  
> were more demand for it, and have the more API-volatile code  
> (DB::SeqFeature, EUtilities, GFF3, Chado-related, etc.) retain the  
> 'dev' tag until we feel it's ready for prime time.
>
> chris
>
> On Jul 4, 2007, at 2:45 PM, Jason Stajich wrote:
>
>   
-- snip --

I agree, although would the dev releases still need to pass all the 
tests? I'm thinking of people installing via CPAN.

I also agree with what was said in a previous post about bringing
bioperl-run (and some others) back into the same repository as
bioperl-core (after a successful move over to svn) and having Build.PL
deal with creating the packages etc for CPAN. This would hopefully help
keep the run package (and others) up to speed with the core package.

I also agree with previous posts about organising and/or having some
naming convention for test data files. I'd suggest an approach whereby
data files are organised into directory trees (1 - 3 deep) with names
that allude to the type of data in that subtree/file rather than to the
tests that use it. For example:

t/data
    |__ formats
    |           |__ seq
    |           |        |__ legal_fasta
    |           |        |              |__ extension.fas
    |           |        |              |__ extension.fasta
    |           |        |              |__ extension.foo
    |           |        |              |__ extension.bar
    |           |        |              |__ no_extension
    |           |        |              |__ interleaved.fas
    |           |        |              |__ non_interleaved.fas
    |           |        |              |__ single_seq.fas
    |           |        |              |__ multiple_seq.fas
    |           |        |              |__ desc_line1.fas
    |           |        |              |__ desc_line2.fas
    |           |        |
    |           |        |__ illegal_fasta
    |           |        |              |__ illegal_chars.fas
    |           |        |              |__ some_other_illegal_alternative.fas
    |           |        |
    |           |        |__ legal_genbank
    |           |        |              |__ etc etc
    |           |        |
    |           |        |__ illegal_genbank
    |           |                      |__ etc etc
    |           |
    |           |__ aln
    |           |__ blast
    |           |        |__ legal_blastx
    |           |        |
    |           |        |__ legal_blastp
    |           |        |
    |           |        |__ legal_tblastx
    |           |        |
    |           |        |__ legal_blastpsi
    |           |        |
    |           |        |__ legal_wublast
    |           |__ foo
    |           |__ bar
    |           |__ misc
    |
    |__ etc

This type of setup might lend itself to having a test script simply try
to parse all the files in a directory, to ensure that parsing succeeds
for legal file formats and fails for illegal ones. Naming of the file paths
would help test authors to identify a suitable data file for their own 
tests before adding their own to the t/data dir. It might also help to 
identify areas where example test data is currently lacking.
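
To make the directory-driven idea concrete, here is a minimal sketch. It assumes the legal_*/illegal_* directory naming from the tree above. parses_ok is only a stand-in FASTA check written for this example (a real test would try the relevant BioPerl parser instead), and check_data_tree is likewise a made-up helper name.

```perl
use strict;
use warnings;
use File::Find;

# Stand-in parser (an assumption for this sketch): true if the file looks
# like FASTA, i.e. has at least one header and only letters, '*' or '-'
# on the remaining non-blank lines.
sub parses_ok {
    my ($path) = @_;
    open my $fh, '<', $path or return 0;
    my ( $headers, $bad ) = ( 0, 0 );
    while ( my $line = <$fh> ) {
        chomp $line;
        next if $line =~ /^\s*$/;
        if    ( $line =~ /^>/ )            { $headers++ }
        elsif ( $line !~ /^[A-Za-z*-]+$/ ) { $bad++ }
    }
    close $fh;
    return $headers > 0 && $bad == 0;
}

# Walk $root: files directly inside a legal_* directory must parse, files
# inside an illegal_* directory must not. Returns the paths that broke
# that rule, so an empty list means everything behaved as expected.
sub check_data_tree {
    my ( $root, $parser ) = @_;
    my @unexpected;
    find(
        {
            no_chdir => 1,
            wanted   => sub {
                return unless -f $_;
                my ($expect) = $File::Find::dir =~ m{/(legal|illegal)_[^/]*$}
                    or return;
                my $ok = $parser->($_) ? 1 : 0;
                push @unexpected, $_ if $ok != ( $expect eq 'legal' ? 1 : 0 );
            },
        },
        $root,
    );
    return sort @unexpected;
}

# Example run, only if the proposed layout exists where this is executed.
if ( -d 't/data/formats' ) {
    print "unexpected result: $_\n"
        for check_data_tree( 't/data/formats', \&parses_ok );
}
```

Passing the parser in as a code ref means the same walker could be reused for seq, aln and blast subtrees with different parsers.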

Thinking about this a little more, I think it would be a good idea to 
include Test::Exception in t/lib. We should also be testing that 
warnings and exceptions are generated when expected - e.g. illegal 
characters in seq files etc etc. Without these sorts of tests we are 
only getting half the story. This testing might account for a large 
chunk of the poor test coverage, particularly when it comes to branches 
in the code.

Anyway, this type of reorganisation couldn't take place until the svn 
repo is up and working.

I'd appreciate any comments on the above!
Nath


From bix at sendu.me.uk  Thu Jul  5 08:55:25 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 05 Jul 2007 09:55:25 +0100
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To: <468CA721.4020804@sheffield.ac.uk>
References: <468B6FBF.1070708@sendu.me.uk>	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>
	<468CA721.4020804@sheffield.ac.uk>
Message-ID: <468CB1FD.7060301@sendu.me.uk>

Nathan S. Haigh wrote:
> I agree, although would the dev releases still need to pass all the 
> tests? I'm thinking of people installing via CPAN.

Yes, they'd all have to pass. 'Developer release' should never have the 
connotation of 'broken release'. However, getting all tests to pass is a 
lot easier than fixing all bugs in bugzilla.

(... which actually goes to show how poor our tests are)

Worst case, if we were forced to stick to a schedule but couldn't fix a 
failing test, we could always make it a 'todo' test.
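
For anyone who hasn't used them, a 'todo' test in Test::More looks roughly like this. The failing comparison is a made-up placeholder, not a real Bioperl bug.

```perl
use strict;
use warnings;
use Test::More tests => 2;

ok( 1 + 1 == 2, 'a test that passes normally' );

TODO: {
    # Hypothetical known failure: it is still run and reported, but while
    # $TODO is set it does not count against the test suite.
    local $TODO = 'known bug, not yet fixed';
    is( lc 'ACGT', 'ACGT', 'comparison that currently fails' );
}
```

Once the underlying bug is fixed the harness flags the test as unexpectedly passing, which is the cue to remove the TODO block.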


> I also agree with what was said in a previous post about bringing back 
> bioperl-run (and some others) back into the same repository as 
> bioperl-core (after a successful move over to svn)

Agree (with myself essentially).


> I also agree with previous posts about organising and/or having some 
> naming convention for test data files. I think an approach whereby data 
> files were organised into directory trees (1 - 3 deep) with names that 
> elude to the type of data in that subtree/file rather than the tests 
> that use it etc. For example:
> 
> t/data
>     |__ formats
>     |           |__ seq
>     |           |        |__ legal_fasta
>     |           |        |              |__ extension.fas
[snip]

At that level, files don't need extensions and can have fully 
informative names that explain what's interesting or special about them.


> This type of setup, might lend itself to having a test script simply try 
> to parse all the files in a directory to ensure nothing fails (for legal 
> file formats) and fails for illegal formats.

Great idea.


> Thinking about this a little more, I think it would be a good idea to 
> include Test::Exception in t/lib.

Agree. I'll see if I can have it auto-loaded by BioperlTest.pm.


> Anyway, this type of reorganisation couldn't take place until the svn 
> repo is up and working.

Agree.


From bix at sendu.me.uk  Thu Jul  5 09:39:10 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 05 Jul 2007 10:39:10 +0100
Subject: [Bioperl-l] Test related Suggestions
In-Reply-To: <468CB1FD.7060301@sendu.me.uk>
References: <468B6FBF.1070708@sendu.me.uk>	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>	<468CA721.4020804@sheffield.ac.uk>
	<468CB1FD.7060301@sendu.me.uk>
Message-ID: <468CBC3E.1020408@sendu.me.uk>

Sendu Bala wrote:
> Nathan S. Haigh wrote:
>> Thinking about this a little more, I think it would be a good idea to 
>> include Test::Exception in t/lib.
> 
> Agree. I'll see if I can have it auto-loaded by BioperlTest.pm.

I've now done that: BioperlTest loads Test::Exception, from the copy in 
t/lib if necessary.

So, in BioperlTest-using scripts you now have access to the methods 
dies_ok, lives_ok, throws_ok and lives_and.


From N.Haigh at sheffield.ac.uk  Thu Jul  5 10:01:04 2007
From: N.Haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Thu,  5 Jul 2007 11:01:04 +0100
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To: <468CB1FD.7060301@sendu.me.uk>
References: <468B6FBF.1070708@sendu.me.uk>
	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>
	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>
	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>
	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>
	<468CA721.4020804@sheffield.ac.uk> <468CB1FD.7060301@sendu.me.uk>
Message-ID: <1183629664.468cc1609891a@webmail.shef.ac.uk>

Quoting Sendu Bala <bix at sendu.me.uk>:

-- snip --
> 
> 
> > I also agree with previous posts about organising and/or having some 
> > naming convention for test data files. I think an approach whereby data 
> > files were organised into directory trees (1 - 3 deep) with names that 
> > elude to the type of data in that subtree/file rather than the tests 
> > that use it etc. For example:
> > 
> > t/data
> >     |__ formats
> >     |           |__ seq
> >     |           |        |__ legal_fasta
> >     |           |        |              |__ extension.fas
> [snip]
> 
> At that level, files don't need extensions and can have fully 
> informative names that explain what's interesting or special about them.
> 

You may be correct in most cases, however, isn't there a method for detecting the file format from the file extension and failing that it peeks inside
the file? Therefore there should be a file extension for each of these to get good code coverage as well as each format not having an extension to
check that the peek inside the file correctly determines the format.
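
To illustrate the two code paths I mean, here is a stand-alone sketch of "extension first, then peek inside the file". It is not Bioperl's actual implementation: the extension table, the content patterns and the name guess_format are all assumptions made for this example.

```perl
use strict;
use warnings;

# Illustrative mapping only; a real table would cover many more formats.
my %FORMAT_FOR_EXT = (
    fas  => 'fasta',
    fa   => 'fasta',
    fasta => 'fasta',
    gb   => 'genbank',
    gbk  => 'genbank',
    embl => 'embl',
);

# Return a format name, or undef if neither the extension nor the first
# non-blank line of the file gives a recognisable hint.
sub guess_format {
    my ($path) = @_;

    # 1. Try the extension (text after the last dot in the file name).
    if ( $path =~ /\.([^.\/\\]+)$/ ) {
        my $format = $FORMAT_FOR_EXT{ lc $1 };
        return $format if defined $format;
    }

    # 2. Fall back to peeking at the first non-blank line.
    open my $fh, '<', $path or return undef;
    while ( my $line = <$fh> ) {
        next if $line =~ /^\s*$/;
        return 'fasta'   if $line =~ /^>/;
        return 'genbank' if $line =~ /^LOCUS\s/;
        return 'embl'    if $line =~ /^ID\s/;
        last;
    }
    return undef;
}
```

Files like extension.fas exercise step 1, while no_extension and extension.foo only succeed if step 2 works, which is why I'd keep both kinds in t/data.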

-- snip --


From bix at sendu.me.uk  Thu Jul  5 10:04:16 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 05 Jul 2007 11:04:16 +0100
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To: <1183629664.468cc1609891a@webmail.shef.ac.uk>
References: <468B6FBF.1070708@sendu.me.uk>
	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>
	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>
	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>
	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>
	<468CA721.4020804@sheffield.ac.uk> <468CB1FD.7060301@sendu.me.uk>
	<1183629664.468cc1609891a@webmail.shef.ac.uk>
Message-ID: <468CC220.804@sendu.me.uk>

Nathan S. Haigh wrote:
> Quoting Sendu Bala <bix at sendu.me.uk>:
> 
> -- snip --
>> 
>>> I also agree with previous posts about organising and/or having
>>> some naming convention for test data files. I think an approach
>>> whereby data files were organised into directory trees (1 - 3
>>> deep) with names that elude to the type of data in that
>>> subtree/file rather than the tests that use it etc. For example:
>>> 
>>> t/data
>>>     |__ formats
>>>     |           |__ seq
>>>     |           |        |__ legal_fasta
>>>     |           |        |              |__ extension.fas
>>> 
>> [snip]
>> 
>> At that level, files don't need extensions and can have fully 
>> informative names that explain what's interesting or special about
>> them.
>> 
> 
> You may be correct in most cases, however, isn't there a method for
> detecting the file format from the file extension and failing that it
> peeks inside the file? Therefore there should be a file extension for
> each of these to get good code coverage as well as each format not
> having an extension to check that the peek inside the file correctly
> determines the format.

Yes, you're quite correct.


From bix at sendu.me.uk  Thu Jul  5 10:47:12 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 05 Jul 2007 11:47:12 +0100
Subject: [Bioperl-l] Warnings
Message-ID: <468CCC30.90406@sendu.me.uk>

I'm trying to get Test::Warn to work with Bioperl warnings as produced 
by Bio::Root::RootI::warn(). However, afaict the warnings must be 
generated with CORE::warn(), not print STDERR.

Is there any particular reason RootI::warn is done with print and not 
CORE::warn ? Can I change it to a warn?


From bix at sendu.me.uk  Thu Jul  5 13:04:50 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 05 Jul 2007 14:04:50 +0100
Subject: [Bioperl-l] Warnings
In-Reply-To: <200707051458.59921.heikki@sanbi.ac.za>
References: <468CCC30.90406@sendu.me.uk>
	<200707051458.59921.heikki@sanbi.ac.za>
Message-ID: <468CEC72.4090909@sendu.me.uk>

Heikki Lehvaslaiho wrote:
> My guess is that using 'print STDERR' avoids showing sometimes annoying 
>    errordescription  at programname line  NN
> syntax being used.

Afaik,

CORE::warn "anything\n";

never includes the line number: messages with a new line always disable 
that feature. Bio::Root::RootI::warn /always/ puts new lines into the 
message, so they /never/ have the line number.
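
A small core-Perl demonstration of both points, in case it helps. It uses a $SIG{__WARN__} handler as a stand-in for a warning-catching test module (my assumption is that such modules hook warnings in a similar way); nothing here touches Bioperl itself.

```perl
use strict;
use warnings;

my @caught;
{
    # Collect anything emitted through warn(); print STDERR bypasses this.
    local $SIG{__WARN__} = sub { push @caught, $_[0] };

    CORE::warn "with newline\n";          # message is passed through as-is
    CORE::warn "without newline";         # perl appends " at FILE line N.\n"
    print STDERR "printed directly\n";    # never reaches the handler
}

printf "caught %d warnings\n", scalar @caught;
print "  $_" for @caught;
```

Only the two warn calls end up in @caught, and only the one without a trailing newline carries the file and line number.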


> On the other hand, the main reason we need to set verbosity to 1 in BioPerl 
> objects is to find where warnings are coming from. Maybe extra text in 
> warnings leads to easier debugging.
> 
> I favour changing it.

So it's my understanding that there will be absolutely no difference in
behaviour following this change (except that warnings can be caught by
Test::Warn). I just wanted to confirm my understanding.


From hlapp at gmx.net  Thu Jul  5 13:07:27 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 5 Jul 2007 09:07:27 -0400
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To: <468CA721.4020804@sheffield.ac.uk>
References: <468B6FBF.1070708@sendu.me.uk>	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>
	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>
	<468CA721.4020804@sheffield.ac.uk>
Message-ID: <21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net>


On Jul 5, 2007, at 4:09 AM, Nathan S. Haigh wrote:

> Chris Fields wrote:
>> I think what's partially responsible for slowing down releases is the
>> expectation that each dev release is supposed to have all bugs fixed,
>> work for every OS, etc.  In other words, act like a stable release.
>>

It doesn't. A stable release has a stable API that will be supported  
until the next stable release through point releases.

>> A developer release by nature is living on the edge, so why not have
>> regular dev releases?

There's no problem with regular dev releases, but tests will need to  
pass. There was never a stipulation that all bugs need to have been  
fixed. But all tests need to pass, so in an ideal world (in which  
everything is being tested) all tests passing would imply all (known)  
bugs fixed. Obviously, we don't live in an ideal world ...

If not everything passes then what is the big difference to a code  
snapshot? If using cvs (or svn) is too difficult for most people, we  
can consider creating a mechanism that puts up nightly snapshots for  
download.

> -- snip --
>
> I agree, although would the dev releases still need to pass all the
> tests? I'm thinking of people installing via CPAN.

For example, that's another point.

	-hilmar

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From heikki at sanbi.ac.za  Thu Jul  5 13:12:37 2007
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Thu, 5 Jul 2007 15:12:37 +0200
Subject: [Bioperl-l] Test related Suggestions
In-Reply-To: <468CBC3E.1020408@sendu.me.uk>
References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk>
	<468CBC3E.1020408@sendu.me.uk>
Message-ID: <200707051512.38185.heikki@sanbi.ac.za>


One more suggestion:

It would be extremely useful if we had a standard way of testing that when a 
file is read into a bioperl object and then written out again in the same 
format, the input and output files are identical. If not, the test should 
show where the differences start (showing all the differences would just 
clutter the screen).

This standard method/subroutine should be used to test all sequence and other 
text file IO.
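
As a starting point, something along these lines might do. It is only a sketch: first_difference and roundtrip_ok are names invented here, and the reader and writer are passed in as plain code refs standing in for whatever bioperl IO class is being tested.

```perl
use strict;
use warnings;

# Return the 1-based number of the first line at which two files differ
# (including one file ending before the other), or 0 if they are identical.
sub first_difference {
    my ( $file_a, $file_b ) = @_;
    open my $fh_a, '<', $file_a or die "Cannot open $file_a: $!";
    open my $fh_b, '<', $file_b or die "Cannot open $file_b: $!";
    my $line_no = 0;
    while (1) {
        my $line_a = <$fh_a>;
        my $line_b = <$fh_b>;
        $line_no++;
        return 0 if !defined $line_a && !defined $line_b;    # both ended
        return $line_no
            if !defined $line_a || !defined $line_b || $line_a ne $line_b;
    }
}

# Read $in with $reader, write the result to $out with $writer, then
# compare the two files. Reports only where the differences start.
sub roundtrip_ok {
    my ( $in, $out, $reader, $writer ) = @_;
    $writer->( $out, $reader->($in) );
    my $diff = first_difference( $in, $out );
    print "# files first differ at line $diff\n" if $diff;
    return !$diff;
}
```

In a real test the $reader would return the parsed objects and the $writer would serialise them again, so any information lost in the object model shows up as a difference.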

Any takers? 

	-Heikki

On Thursday 05 July 2007 11:39:10 Sendu Bala wrote:
> Sendu Bala wrote:
> > Nathan S. Haigh wrote:
> >> Thinking about this a little more, I think it would be a good idea to
> >> include Test::Exception in t/lib.
> >
> > Agree. I'll see if I can have it auto-loaded by BioperlTest.pm.
>
> I've now done that: BioperlTest loads Test::Exception, from the copy in
> t/lib if necessary.
>
> So, in BioperlTest-using scripts you now have access to the methods
> dies_ok, lives_ok, throws_ok and lives_and.


-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From heikki at sanbi.ac.za  Thu Jul  5 12:58:59 2007
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Thu, 5 Jul 2007 14:58:59 +0200
Subject: [Bioperl-l] Warnings
In-Reply-To: <468CCC30.90406@sendu.me.uk>
References: <468CCC30.90406@sendu.me.uk>
Message-ID: <200707051458.59921.heikki@sanbi.ac.za>

My guess is that 'print STDERR' is being used to avoid the sometimes annoying 
   errordescription  at programname line  NN
syntax that warn produces.

On the other hand, the main reason we need to set verbosity to 1 in BioPerl 
objects is to find where warnings are coming from. Maybe extra text in 
warnings leads to easier debugging.

I favour changing it.
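
To illustrate the difference being discussed, here is a minimal sketch using only core Perl. The two subs are hypothetical stand-ins for the two ways RootI::warn could emit its message; they are not the actual Bio::Root::RootI code. As far as I know, a $SIG{__WARN__} handler is also the mechanism Test::Warn relies on, which would explain why it cannot see text sent with print STDERR.

```perl
use strict;
use warnings;

# Collect everything that goes through perl's warn machinery.
my @caught;
$SIG{__WARN__} = sub { push @caught, $_[0] };

# Hypothetical stand-ins for the two possible implementations.
sub warn_with_print { print STDERR $_[0] }   # bypasses the handler
sub warn_with_warn  { warn $_[0] }           # the built-in (CORE::warn)

warn_with_print("printed warning\n");   # written to STDERR, never caught
warn_with_warn("real warning\n");       # trailing newline: text left as-is
warn_with_warn("real warning");         # no newline: " at FILE line N." added

printf "%d of 3 messages were catchable\n", scalar @caught;
```

Note that a message ending in a newline is passed through unchanged, so switching to warn need not add the "at programname line NN" suffix unless it is wanted.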

	-Heikki


On Thursday 05 July 2007 12:47:12 Sendu Bala wrote:
> I'm trying to get Test::Warn to work with Bioperl warnings as produced
> by Bio::Root::RootI::warn(). However, afaict the warnings must be
> generated with CORE::warn(), not print STDERR.
>
> Is there any particular reason RootI::warn is done with print and not
> CORE::warn ? Can I change it to a warn?




From bix at sendu.me.uk  Thu Jul  5 13:44:08 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 05 Jul 2007 14:44:08 +0100
Subject: [Bioperl-l] Test related Suggestions
In-Reply-To: <200707051512.38185.heikki@sanbi.ac.za>
References: <468B6FBF.1070708@sendu.me.uk>
	<468CB1FD.7060301@sendu.me.uk>	<468CBC3E.1020408@sendu.me.uk>
	<200707051512.38185.heikki@sanbi.ac.za>
Message-ID: <468CF5A8.7040402@sendu.me.uk>

Heikki Lehvaslaiho wrote:
> One more suggestion:
> 
> It would be extremely useful if we had a standard way of testing that
> when a file is read into a bioperl object and then written out
> again in the same format, the input and output files are identical.

As Hilmar has pointed out in the past, Bioperl doesn't aim for the files 
to be identical, only for none of the information to be lost and for it 
to be output in the correct format.

So a round-trip test should read in the original, store all the parsed 
data, write it out, then read in the written version and see if the new 
parsed data matches the original.


For simpler or ultra-strict file formats, though...

> If not, the test should show where the differences start (showing
> all the differences would just clutter the screen).
> 
> This standard method/subroutine should be used to test all sequence
> and other text file IO.
> 
> Any takers?

There's already something along these lines in t/SeqIO.t (the section
that uses Algorithm::Diff).

I copied that over from the old testformats.pl script but haven't really
taken the time to see if it's a good way of doing the test.

Is it? Can someone come up with something better? Can someone generalise
it if necessary?

I imagine you could just read the files into arrays and use 
Test::More::is_deeply(). If that would be satisfactory I could easily 
add a little method to BioperlTest that did that.
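
A minimal sketch of what such a little method might look like, using only Test::More and File::Temp. The names slurp_lines and files_identical_ok are hypothetical and nothing like them exists in BioperlTest yet; the two temporary files stand in for an input file and its rewritten copy. On failure, is_deeply reports only the point where the two arrays begin to differ, which would also cover the "show where the differences start" request.

```perl
use strict;
use warnings;
use Test::More 'no_plan';
use File::Temp qw(tempfile);

# Hypothetical helper: read a file into an array of chomped lines.
sub slurp_lines {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    chomp(my @lines = <$fh>);
    close $fh;
    return \@lines;
}

# Hypothetical helper: compare two files line by line with is_deeply.
sub files_identical_ok {
    my ($file_a, $file_b, $name) = @_;
    return is_deeply(slurp_lines($file_a), slurp_lines($file_b), $name);
}

# Demo with two temporary files holding the same toy FASTA record.
my ($fh1, $file1) = tempfile(UNLINK => 1);
my ($fh2, $file2) = tempfile(UNLINK => 1);
print {$fh1} ">seq1\nACGT\n";
close $fh1;
print {$fh2} ">seq1\nACGT\n";
close $fh2;

files_identical_ok($file1, $file2, 'rewritten file is identical');
```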


From n.haigh at sheffield.ac.uk  Thu Jul  5 13:47:24 2007
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Thu, 05 Jul 2007 14:47:24 +0100
Subject: [Bioperl-l] Test related Suggestions
In-Reply-To: <200707051512.38185.heikki@sanbi.ac.za>
References: <468B6FBF.1070708@sendu.me.uk>
	<468CB1FD.7060301@sendu.me.uk>	<468CBC3E.1020408@sendu.me.uk>
	<200707051512.38185.heikki@sanbi.ac.za>
Message-ID: <468CF66C.2070907@sheffield.ac.uk>

Heikki Lehvaslaiho wrote:
> One more suggestion:
> 
> It would be extremely useful if we had a standard way of testing that when a 
> file is read into a bioperl object and then written out again in the same 
> format, the input and output files are identical. If not, the test should 
> show where the differences start (showing all the differences would just 
> clutter the screen).
> 
> This standard method/subroutine should be used to test all sequence and other 
> text file IO.
> 
> Any takers? 
> 
> 	-Heikki
> 

Wouldn't this require info about the formatting of the file to be stored 
in the object as well, such that the same formatting could be used when 
writing the file?

Wouldn't a better approach be to read the contents of file1 into obj1, 
write obj1 to file2 in the same format, and then read file2 into obj2 
and compare obj1 to obj2 to ensure we have all the same data?
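
The file1 -> obj1 -> file2 -> obj2 comparison could be sketched roughly as follows. To keep the example runnable on its own, parse_fasta and write_fasta are toy stand-ins (hypothetical, FASTA only) for a real Bio::SeqIO reader and writer, and the "objects" are plain hashes. The writer deliberately wraps lines at a different width, so the text changes while the parsed data should not.

```perl
use strict;
use warnings;
use Test::More 'no_plan';

# Toy stand-in for a reader: returns [ { id, desc, seq }, ... ].
sub parse_fasta {
    my ($text) = @_;
    my @records;
    for my $line (split /\n/, $text) {
        next if $line =~ /^\s*$/;
        if ($line =~ /^>(\S+)\s*(.*)$/) {
            push @records, { id => $1, desc => $2, seq => '' };
        }
        elsif (@records) {
            $line =~ s/\s+//g;
            $records[-1]{seq} .= $line;
        }
    }
    return \@records;
}

# Toy stand-in for a writer: wraps sequence lines at $width characters.
sub write_fasta {
    my ($records, $width) = @_;
    $width ||= 60;
    my $out = '';
    for my $rec (@$records) {
        my $desc = length($rec->{desc}) ? " $rec->{desc}" : '';
        $out .= ">$rec->{id}$desc\n";
        (my $seq = $rec->{seq}) =~ s/(.{1,$width})/$1\n/g;
        $out .= $seq;
    }
    return $out;
}

my $original = ">seq1 first test sequence\nACGTAC\nGTACGT\n>seq2\nTTGGCCAA\n";
my $obj1     = parse_fasta($original);      # file1 -> obj1
my $written  = write_fasta($obj1, 4);       # obj1  -> file2
my $obj2     = parse_fasta($written);       # file2 -> obj2

isnt($written, $original, 'layout differs after the round trip');
is_deeply($obj2, $obj1, 'parsed data survives the round trip');
```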

Nath


From cjfields at uiuc.edu  Thu Jul  5 13:52:12 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Jul 2007 08:52:12 -0500
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To: <468CA721.4020804@sheffield.ac.uk>
References: <468B6FBF.1070708@sendu.me.uk>	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>
	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>
	<468CA721.4020804@sheffield.ac.uk>
Message-ID: <BECE91CB-980B-4063-8E85-291CC85DCDC1@uiuc.edu>


On Jul 5, 2007, at 3:09 AM, Nathan S. Haigh wrote:

> ...
> I agree, although would the dev releases still need to pass all the  
> tests? I'm thinking of people installing via CPAN.

Remains to be decided.  All current tests (net and non-net) should  
pass.  Any bug fixes should try to have added tests if possible, with  
in-process stuff as TODO's.  Network tests are left up to user  
discretion, so if they fail for any particular reason there is a way  
around them.

> I also agree with what was said in a previous post about bringing  
> back bioperl-run (and some others) back into the same repository as  
> bioperl-core (after a successful move over to svn) and have  
> Build.PL deal with creating the packages etc for CPAN. This would  
> hopefully help keep the run package (and others) up to speed with  
> the core package.

It's up to how we want to have everything split.  I don't think it's  
immediately pressing (there are more important priorities, i.e.  
bugs, svn) but I would say folding everything back into live and  
'splitting' them out using an automated Build process is a viable  
option.

> I also agree with previous posts about organising and/or having  
> some naming convention for test data files. I think an approach  
> whereby data files were organised into directory trees (1 - 3 deep)  
> with names that allude to the type of data in that subtree/file  
> rather than the tests that use it etc. For example:
>
> t/data
>    |__ formats
>    |           |__ seq
>    |           |        |__ legal_fasta
>    |           |        |              |__ extension.fas
>    |           |        |              |__ extension.fasta
>    |           |        |              |__ extension.foo
>    |           |        |              |__ extension.bar
>    |           |        |              |__ no_extension
>    |           |        |              |__ interleaved.fas
>    |           |        |              |__ non_interleaved.fas
>    |           |        |              |__ single_seq.fas
>    |           |        |              |__ multiple_seq.fas
>    |           |        |              |__ desc_line1.fas
>    |           |        |              |__ desc_line2.fas
>    |           |        |
>    |           |        |__ illegal_fasta
>    |           |        |              |__ illegal_chars.fas
>    |           |        |              |__ some_other_illegal_alternative.fas
>    |           |        |
>    |           |        |__ legal_genbank
>    |           |        |              |__ etc etc
>    |           |        |
>    |           |        |__ illegal_genbank
>    |           |                      |__ etc etc
>    |           |
>    |           |__ aln
>    |           |__ blast
>    |           |        |__ legal_blastx
>    |           |        |
>    |           |        |__ legal_blastp
>    |           |        |
>    |           |        |__ legal_tblastx
>    |           |        |
>    |           |        |__ legal_plastpsi
>    |           |        |
>    |           |        |__ legal_wublast
>    |           |__ foo
>    |           |__ bar
>    |           |__ misc
>    |
>    |__ etc
>
> This type of setup, might lend itself to having a test script  
> simply try to parse all the files in a directory to ensure nothing  
> fails (for legal file formats) and fails for illegal formats.  
> Naming of the file paths would help test authors to identify a  
> suitable data file for their own tests before adding their own to  
> the t/data dir. It might also help to identify areas where example  
> test data is currently lacking.

...
This seems like more of a 'guess sequence' and format validation  
issue, something we've talked about before:

http://bugzilla.open-bio.org/show_bug.cgi?id=1508

The way I feel about it is sequence format validation and sequence  
parsing should be separate issues and therefore in separate classes  
(with parsing optionally preceded by validation), but that's  
something for another discussion.

> Thinking about this a little more, I think it would be a good idea  
> to include Test::Exception in t/lib. We should also be testing that  
> warnings and exceptions are generated when expected - e.g. illegal  
> characters in seq files etc etc. Without these sorts of tests we  
> are only getting half the story. This testing might account for a  
> large chunk of the poor test coverage, particularly when it comes  
> to branches in the code.
>
> Anyway, this type of reorganisation couldn't take place until the  
> svn repo is up and working.
>
> I'd appreciate any comments on the above!
> Nath

chris


From n.haigh at sheffield.ac.uk  Thu Jul  5 14:08:29 2007
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Thu, 05 Jul 2007 15:08:29 +0100
Subject: [Bioperl-l] Installing all modules required for testing
In-Reply-To: <468CF5A8.7040402@sendu.me.uk>
References: <468B6FBF.1070708@sendu.me.uk>	<468CB1FD.7060301@sendu.me.uk>	<468CBC3E.1020408@sendu.me.uk>	<200707051512.38185.heikki@sanbi.ac.za>
	<468CF5A8.7040402@sendu.me.uk>
Message-ID: <468CFB5D.6080406@sheffield.ac.uk>

Is there a way to install all the modules that are used in the tests? I 
mean there are cases where tests are skipped, and so count as passing, if 
the required module for testing is not installed, which means a chunk of 
the tests is missed out. It would be desirable to be able to install all 
these modules in order to complete the whole test suite - any ideas 
if/how this can be done?

Cheers
Nath


From bix at sendu.me.uk  Thu Jul  5 14:15:34 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 05 Jul 2007 15:15:34 +0100
Subject: [Bioperl-l] Installing all modules required for testing
In-Reply-To: <468CFB5D.6080406@sheffield.ac.uk>
References: <468B6FBF.1070708@sendu.me.uk>	<468CB1FD.7060301@sendu.me.uk>	<468CBC3E.1020408@sendu.me.uk>	<200707051512.38185.heikki@sanbi.ac.za>
	<468CF5A8.7040402@sendu.me.uk> <468CFB5D.6080406@sheffield.ac.uk>
Message-ID: <468CFD06.3080604@sendu.me.uk>

Nathan S. Haigh wrote:
> Is there a way to install all the modules that are used in the tests? I 
> mean there are cases where tests are skipped and pass if the required 
> module for testing is not installed. Therefore, missing out a chunk of 
> the tests. It would be desirable to be able to install all these modules 
> in order to complete the whole test suite - any ideas if/how this can 
> be done?

Yes, add them as recommended (or perhaps 'build_requires') modules in 
Build.PL, then run Build.PL and install the modules when it asks you.

Everything should be in Build.PL already. If I missed something, please 
add it.
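
For anyone unfamiliar with the mechanism, a minimal sketch of a stock Module::Build script that declares such modules. The distribution name, version and the grouping of modules are placeholders for illustration only, not the contents of the real Bioperl Build.PL (which may well be organised differently); a version of 0 means "any version".

```perl
use strict;
use warnings;
use Module::Build;

# Hypothetical distribution details, for illustration only.
my $build = Module::Build->new(
    dist_name    => 'Example-Dist',
    dist_version => '0.01',
    license      => 'perl',

    # Needed at run time by the modules themselves.
    requires => {
        'perl' => '5.6.1',
    },

    # Needed only to build and test the distribution.
    build_requires => {
        'Test::More' => 0,
    },

    # Optional: without these, the tests that need them are skipped.
    recommends => {
        'Algorithm::Diff' => 0,
        'IO::ScalarArray' => 0,
        'IO::String'      => 0,
    },
);

$build->create_build_script;
```

Running "perl Build.PL" on a script like this reports which of the listed modules are missing.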


From cjfields at uiuc.edu  Thu Jul  5 14:18:08 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Jul 2007 09:18:08 -0500
Subject: [Bioperl-l] Installing all modules required for testing
In-Reply-To: <468CFB5D.6080406@sheffield.ac.uk>
References: <468B6FBF.1070708@sendu.me.uk>	<468CB1FD.7060301@sendu.me.uk>	<468CBC3E.1020408@sendu.me.uk>	<200707051512.38185.heikki@sanbi.ac.za>
	<468CF5A8.7040402@sendu.me.uk> <468CFB5D.6080406@sheffield.ac.uk>
Message-ID: <C3B6AF09-B395-4303-9B50-953C0FAAE8A7@uiuc.edu>


On Jul 5, 2007, at 9:08 AM, Nathan S. Haigh wrote:

> Is there a way to install all the modules that are used in the  
> tests? I
> mean there are cases where tests are skipped and pass if the required
> module for testing is not installed. Therefore, missing out a chunk of
> the tests. It would be desirable to be able to install all these  
> modules
> in order to complete the whole test suite - any ideas if/how this can
> be done?
>
> Cheers
> Nath

That's optionally done upon 'perl Build.PL', correct?  So if you  
choose not to install a particular prereq (i.e. XML::SAX), you  
shouldn't be forced to install it later just for tests.  Or am I  
misunderstanding you?

chris


From cjfields at uiuc.edu  Thu Jul  5 14:18:23 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Jul 2007 09:18:23 -0500
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To: <468CC220.804@sendu.me.uk>
References: <468B6FBF.1070708@sendu.me.uk>
	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>
	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>
	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>
	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>
	<468CA721.4020804@sheffield.ac.uk> <468CB1FD.7060301@sendu.me.uk>
	<1183629664.468cc1609891a@webmail.shef.ac.uk>
	<468CC220.804@sendu.me.uk>
Message-ID: <D71E1DE2-C737-41C1-90FF-573A66925790@uiuc.edu>


On Jul 5, 2007, at 5:04 AM, Sendu Bala wrote:

> Nathan S. Haigh wrote:
>> Quoting Sendu Bala <bix at sendu.me.uk>:
>>> ...<snip snips>
>>> At that level, files don't need extensions and can have fully
>>> informative names that explain what's interesting or special about
>>> them.
>>>
>>
>> You may be correct in most cases, however, isn't there a method for
>> detecting the file format from the file extension and failing that it
>> peeks inside the file? Therefore there should be a file extension for
>> each of these to get good code coverage as well as each format not
>> having an extension to check that the peek inside the file correctly
>> determines the format.
>
> Yes, you're quite correct.

I actually like Sendu's idea more, or the idea of each test suite  
having its own directory.

Tests which need to guess/validate the format are probably best left  
sequestered to a specific suite focused on format guessing/ 
validation, at least in my opinion.
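
As a rough illustration of the "extension first, then peek inside" behaviour such a suite would have to cover, here is a toy sketch. guess_format and its extension table are hypothetical and much cruder than whatever Bioperl actually does; the point is only that both branches (known extension, and no or unknown extension) need test files.

```perl
use strict;
use warnings;

# Illustrative extension table only, not Bioperl's real mapping.
my %FORMAT_FOR_EXT = (
    fas   => 'fasta',
    fasta => 'fasta',
    gb    => 'genbank',
    embl  => 'embl',
);

# Hypothetical guesser: trust a known extension, otherwise peek at the
# first line of the file content. Returns undef if nothing matches.
sub guess_format {
    my ($filename, $first_line) = @_;

    if ($filename =~ /\.([^.\/]+)\z/) {
        my $ext = lc $1;
        return $FORMAT_FOR_EXT{$ext} if exists $FORMAT_FOR_EXT{$ext};
    }

    return 'fasta'   if $first_line =~ /^>/;        # FASTA header line
    return 'genbank' if $first_line =~ /^LOCUS\s/;  # GenBank LOCUS line
    return 'embl'    if $first_line =~ /^ID\s/;     # EMBL ID line
    return;
}

for my $case (
    [ 'extension.fas', '>seq1 some description' ],
    [ 'no_extension',  '>seq1 some description' ],
    [ 'extension.foo', 'LOCUS       AB000001' ],
) {
    my $format = guess_format(@$case);
    print "$case->[0]: ", (defined $format ? $format : 'unknown'), "\n";
}
```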

chris


From n.haigh at sheffield.ac.uk  Thu Jul  5 14:22:40 2007
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Thu, 05 Jul 2007 15:22:40 +0100
Subject: [Bioperl-l] Installing all modules required for testing
In-Reply-To: <468CFD06.3080604@sendu.me.uk>
References: <468B6FBF.1070708@sendu.me.uk>	<468CB1FD.7060301@sendu.me.uk>	<468CBC3E.1020408@sendu.me.uk>	<200707051512.38185.heikki@sanbi.ac.za>
	<468CF5A8.7040402@sendu.me.uk> <468CFB5D.6080406@sheffield.ac.uk>
	<468CFD06.3080604@sendu.me.uk>
Message-ID: <468CFEB0.80201@sheffield.ac.uk>

Sendu Bala wrote:
> Nathan S. Haigh wrote:
>> Is there a way to install all the modules that are used in the tests? 
>> I mean there are cases where tests are skipped and pass if the 
>> required module for testing is not installed. Therefore, missing out a 
>> chunk of the tests. It would be desirable to be able to install all 
>> these modules in order to complete the whole test suite - any ideas 
>> if/how this can be done?
> 
> Yes, add them as recommended (or perhaps 'build_requires') modules in 
> Build.PL, then run Build.PL and install the modules when it asks you.
> 
> Everything should be in Build.PL already. If I missed something, please 
> add it.
> 

OK, to clarify using the test file Sendu mentioned in a previous post: 
t/SeqIO.t

This test skips tests if Algorithm::Diff, IO::ScalarArray or IO::String 
are not installed (the first two are not mentioned in Build.PL). 
However, if there are a lot of such skips in the whole test suite then 
there may be few systems with all these modules installed in order to 
conduct a complete test. These are the modules I'm referring to.
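
For reference, the kind of skip I mean looks roughly like the sketch below. It is a generic Test::More SKIP block, not the actual code in t/SeqIO.t, and missing_modules is a hypothetical helper. The whole block still counts as passing when the modules are absent, which is why a run can look complete when it is not.

```perl
use strict;
use warnings;
use Test::More 'no_plan';

# Hypothetical helper: return the names of modules that cannot be loaded.
sub missing_modules {
    my @missing;
    for my $module (@_) {
        (my $file = "$module.pm") =~ s{::}{/}g;
        push @missing, $module unless eval { require $file; 1 };
    }
    return @missing;
}

SKIP: {
    my @missing = missing_modules('Algorithm::Diff', 'IO::String');
    skip('missing optional modules: ' . join(', ', @missing), 2) if @missing;

    # Tests that depend on the optional modules would go here.
    ok(Algorithm::Diff->can('diff'), 'Algorithm::Diff is usable');
    ok(IO::String->can('new'),       'IO::String is usable');
}
```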

Nath


From n.haigh at sheffield.ac.uk  Thu Jul  5 14:30:05 2007
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Thu, 05 Jul 2007 15:30:05 +0100
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To: <D71E1DE2-C737-41C1-90FF-573A66925790@uiuc.edu>
References: <468B6FBF.1070708@sendu.me.uk>
	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>
	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>
	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>
	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>
	<468CA721.4020804@sheffield.ac.uk> <468CB1FD.7060301@sendu.me.uk>
	<1183629664.468cc1609891a@webmail.shef.ac.uk>
	<468CC220.804@sendu.me.uk>
	<D71E1DE2-C737-41C1-90FF-573A66925790@uiuc.edu>
Message-ID: <468D006D.6050806@sheffield.ac.uk>

Chris Fields wrote:
> 
> On Jul 5, 2007, at 5:04 AM, Sendu Bala wrote:
> 
>> Nathan S. Haigh wrote:
>>> Quoting Sendu Bala <bix at sendu.me.uk>:
>>>> ...<snip snips>
>>>> At that level, files don't need extensions and can have fully
>>>> informative names that explain what's interesting or special about
>>>> them.
>>>>
>>>
>>> You may be correct in most cases, however, isn't there a method for
>>> detecting the file format from the file extension and failing that it
>>> peeks inside the file? Therefore there should be a file extension for
>>> each of these to get good code coverage as well as each format not
>>> having an extension to check that the peek inside the file correctly
>>> determines the format.
>>
>> Yes, you're quite correct.
> 
> I actually like Sendu's idea more, or the idea of each test suite having 
> its own directory.
> 
> Tests which need to guess/validate the format are probably best left 
> sequestered to a specific suite focused on format guessing/validation, 
> at least in my opinion.
> 
> chris


How easily would this lend itself to using the same data for multiple 
tests, or is it likely to lead to/exacerbate a culture of adding 
duplicate data files in each "test suite" rather than reusing?

Nath


From cjfields at uiuc.edu  Thu Jul  5 14:33:46 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Jul 2007 09:33:46 -0500
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To: <21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net>
References: <468B6FBF.1070708@sendu.me.uk>	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>
	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>
	<468CA721.4020804@sheffield.ac.uk>
	<21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net>
Message-ID: <B5011E60-D686-4E71-A1AB-46990548ABD1@uiuc.edu>


On Jul 5, 2007, at 8:07 AM, Hilmar Lapp wrote:

> On Jul 5, 2007, at 4:09 AM, Nathan S. Haigh wrote:
>
>> Chris Fields wrote:
>>> I think what's partially responsible for slowing down releases is  
>>> the
>>> expectation that each dev release is supposed to have all bugs  
>>> fixed,
>>> work for every OS, etc.  In other words, act like a stable release.
>
> It doesn't. A stable release has a stable API that will be  
> supported until the next stable release through point releases.

I agree, but I think there is still an expectation that 1.5.2 and  
beyond are more like true 'stable' releases even though we still  
designate them as 'developer.'   We unfortunately reinforce that when  
we tell users they need to update to v. 1.5.2 or bioperl-live to fix  
a particular bug in the 1.4 release.

There's nothing we can do about that now (hindsight is always 20/20,  
and 1.4 is just too old).  We (pumpkin, core devs) can try correcting  
that by ensuring any bug fixes be committed to any new stable branch  
as well as to live, at least until it becomes too problematic to  
maintain that particular stable branch (at which point we would go  
about getting ready for the next 'stable' and repeat the cycle over  
again).

>>> A developer release by nature is living on the edge, so why not have
>>> regular dev releases?
>
> There's no problem with regular dev releases, but tests will need  
> to pass. There was never a stipulation that all bugs need to have  
> been fixed. But all tests need to pass, so in an ideal world (in  
> which everything is being tested) all tests passing would imply all  
> (known) bugs fixed. Obviously, we don't live in an ideal world ...

...particularly when it comes to network-related tests and remote  
server problems (but those are by default not run, so there is a way  
around test fails there).  I agree here as well (all tests must  
pass).  As for the bug fixes, we can just stipulate which ones were  
fixed with the release (in a RELEASE_NOTES or similar), and maybe  
have TODO's in the test suite designating they are being worked on.

Basically, at regular intervals, maybe with a few weeks of lead time,  
the pumpkin would announce an impending dev. release.  Go through  
rounds of tests, bug fixes, etc.  When all tests pass post it on CPAN  
as a dev. release.  If we have a stable release branch with relevant  
bug fixes we can post that as well, again to the point where it  
becomes too problematic.

Would we just take a snapshot of MAIN and any relevant stable branch  
at that particular point for the CPAN release, just increasing the  
version number (1.x.y)?  Would it make sense to have a 1.x.y branch  
for each release (I don't think so, but maybe others disagree)?

> If not everything passes then what is the big difference to a code  
> snapshot? If using cvs (or svn) is too difficult for most people,  
> we can consider creating a mechanism that puts up nightly snapshots  
> for download.

If we feel a nightly snapshot is warranted we could do that though.   
I personally don't think there is a need, particularly since we have  
several means to obtain the latest code at any point in time  
(including the browsable CVS 'Download tarball').  We could state the  
next dev/stable CPAN release (pending on date dd/mm/yy) will have the  
bug fix, and if they want it immediately then pick it up from CVS.

>> -- snip --
>>
>> I agree, although would the dev releases still need to pass all the
>> tests? I'm thinking of people installing via CPAN.
>
> For example, that's another point.
>
>  	-hilmar

Yes, I agree.

As an aside, I don't think dev. releases pop up when you run a simple  
'install Foo::Bar' from the CPAN shell but I'm not sure; Sendu may  
know the answer to that.

chris 


From cjfields at uiuc.edu  Thu Jul  5 14:34:22 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Jul 2007 09:34:22 -0500
Subject: [Bioperl-l] Test related Suggestions
In-Reply-To: <200707051512.38185.heikki@sanbi.ac.za>
References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk>
	<468CBC3E.1020408@sendu.me.uk>
	<200707051512.38185.heikki@sanbi.ac.za>
Message-ID: <2C7BA3C8-0693-458F-A49D-879ABAB8012E@uiuc.edu>


On Jul 5, 2007, at 8:12 AM, Heikki Lehvaslaiho wrote:

>
> One more suggestion:
>
> It would be extremely useful if we had a standard way of testing that
> when a file is read into a bioperl object and then written out again in
> the same format, the input and output files are identical. If not, the
> test should show where the differences start (showing all the
> differences would just clutter the screen).
>
> This standard method/subroutine should be used to test all sequence
> and other text file IO.
>
> Any takers?
>
> 	-Heikki
...

I agree.  There are some 'round-trip' tests with genbank.t or SeqIO.t  
that do some checking, I think, but something like this would be of  
use.  However, what if the test file is old (as many in t/data are)  
and the format has changed?  GenBank and EMBL, for instance, have  
gone through several changes to format.

chris


From n.haigh at sheffield.ac.uk  Thu Jul  5 14:43:51 2007
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Thu, 05 Jul 2007 15:43:51 +0100
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To: <B5011E60-D686-4E71-A1AB-46990548ABD1@uiuc.edu>
References: <468B6FBF.1070708@sendu.me.uk>	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>
	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>
	<468CA721.4020804@sheffield.ac.uk>
	<21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net>
	<B5011E60-D686-4E71-A1AB-46990548ABD1@uiuc.edu>
Message-ID: <468D03A7.3090408@sheffield.ac.uk>

Chris Fields wrote:
-- snip --

>>>
>>> I agree, although would the dev releases still need to pass all the
>>> tests? I'm thinking of people installing via CPAN.
>>
>> For example, that's another point.
>>
>>      -hilmar
> 
> Yes, I agree.
> 
> As an aside, I don't think dev. releases pop up when you run a simple 
> 'install Foo::Bar' from the CPAN shell but I'm not sure; Sendu may know 
> the answer to that.
> 
> chris


That's right, it'll only install the non-developer releases (1.4 
currently). If you want to install the developer release from CPAN you 
need to know the path to the archive and then do:

cpan> install S/SE/SENDU/bioperl-1.5.2_102.tar.gz

as detailed on the wiki:
http://www.bioperl.org/wiki/Release_1.5.2

Nath


From cjfields at uiuc.edu  Thu Jul  5 14:49:33 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Jul 2007 09:49:33 -0500
Subject: [Bioperl-l] Installing all modules required for testing
In-Reply-To: <468CFEB0.80201@sheffield.ac.uk>
References: <468B6FBF.1070708@sendu.me.uk>	<468CB1FD.7060301@sendu.me.uk>	<468CBC3E.1020408@sendu.me.uk>	<200707051512.38185.heikki@sanbi.ac.za>
	<468CF5A8.7040402@sendu.me.uk> <468CFB5D.6080406@sheffield.ac.uk>
	<468CFD06.3080604@sendu.me.uk> <468CFEB0.80201@sheffield.ac.uk>
Message-ID: <16397C9B-07DA-416E-A2AF-5FA403BA0388@uiuc.edu>


On Jul 5, 2007, at 9:22 AM, Nathan S. Haigh wrote:

> Sendu Bala wrote:
>> ...
>> Yes, add them as recommended (or perhaps 'build_requires') modules in
>> Build.PL, then run Build.PL and install the modules when it asks you.
>>
>> Everything should be in Build.PL already. If I missed something,  
>> please
>> add it.
>>
>
> OK, to clarify using the test file Sendu mentioned in a previous post:
> t/SeqIO.t
>
> This test skips tests if Algorithm::Diff, IO::ScalarArray or  
> IO::String
> are not installed (the first two are not mentioned in Build.PL).
> However, if there are a lot of such skips in the whole test suite then
> there may be few systems with all these modules installed in order to
> conduct a complete test. These are the modules I'm referring to.
>
> Nath

If they are only necessary for tests, work for all OSs, and are pure  
Perl they should be added to t/lib, like Test::More and the rest.  If  
they only work for some OSs they could be added to t/lib and skip  
based on OS, but they still must be pure Perl.  I would avoid  
anything that requires any compiling for XS or Inline altogether (I  
don't want to go down the nightmare road of OS-dependent compiler  
issues for a few tests).

Finally, if they are needed for core modules (not just tests) then  
they should be added to the core prereqs in Build.

chris


From cjfields at uiuc.edu  Thu Jul  5 14:52:58 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Jul 2007 09:52:58 -0500
Subject: [Bioperl-l] Warnings
In-Reply-To: <468CEC72.4090909@sendu.me.uk>
References: <468CCC30.90406@sendu.me.uk>
	<200707051458.59921.heikki@sanbi.ac.za>
	<468CEC72.4090909@sendu.me.uk>
Message-ID: <2E0C7F35-9AA1-479A-A430-7D4037D98A3A@uiuc.edu>


On Jul 5, 2007, at 8:04 AM, Sendu Bala wrote:

> ...
>
> So its my understanding there will be absolutely no difference in
> behaviour following this change (except that warning can be caught by
> Test::Warn). I just wanted to confirm my understanding.

You can always just try it out and run tests.  Might be interesting  
to see if anything breaks.

chris


From N.Haigh at sheffield.ac.uk  Thu Jul  5 14:58:30 2007
From: N.Haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Thu,  5 Jul 2007 15:58:30 +0100
Subject: [Bioperl-l] Test related Suggestions
In-Reply-To: <2C7BA3C8-0693-458F-A49D-879ABAB8012E@uiuc.edu>
References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk>
	<468CBC3E.1020408@sendu.me.uk>
	<200707051512.38185.heikki@sanbi.ac.za>
	<2C7BA3C8-0693-458F-A49D-879ABAB8012E@uiuc.edu>
Message-ID: <1183647510.468d07168963c@webmail.shef.ac.uk>

Quoting Chris Fields <cjfields at uiuc.edu>:

> 
> On Jul 5, 2007, at 8:12 AM, Heikki Lehvaslaiho wrote:
> 
> >
> > One more suggestion:
> >
> > It would be extremely useful if we had a standard way of testing that
> > when a file is read into a bioperl object and then written out again in
> > the same format, the input and output files are identical. If not, the
> > test should show where the differences start (showing all the
> > differences would just clutter the screen).
> >
> > This standard method/subroutine should be used to test all sequence
> > and other text file IO.
> >
> > Any takers?
> >
> > 	-Heikki
> ...
> 
> I agree.  There are some 'round-trip' tests with genbank.t or SeqIO.t  
> that do some checking, I think, but something like this would be of  
> use.  However, what if the test file is old (as many in t/data are)  
> and the format has changed?  GenBank and EMBL, for instance, have  
> gone through several changes to format.
> 
> chris
> 
> 

Is there any way to tell the variants apart other than by layout, e.g. a
version number or the like?

Nath


From N.Haigh at sheffield.ac.uk  Thu Jul  5 15:04:30 2007
From: N.Haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Thu,  5 Jul 2007 16:04:30 +0100
Subject: [Bioperl-l] Installing all modules required for testing
In-Reply-To: <16397C9B-07DA-416E-A2AF-5FA403BA0388@uiuc.edu>
References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk>
	<468CBC3E.1020408@sendu.me.uk>
	<200707051512.38185.heikki@sanbi.ac.za>
	<468CF5A8.7040402@sendu.me.uk> <468CFB5D.6080406@sheffield.ac.uk>
	<468CFD06.3080604@sendu.me.uk> <468CFEB0.80201@sheffield.ac.uk>
	<16397C9B-07DA-416E-A2AF-5FA403BA0388@uiuc.edu>
Message-ID: <1183647870.468d087ed4c80@webmail.shef.ac.uk>

Quoting Chris Fields <cjfields at uiuc.edu>:

> 
> On Jul 5, 2007, at 9:22 AM, Nathan S. Haigh wrote:
> 
> > Sendu Bala wrote:
> >> ...
> >> Yes, add them as recommended (or perhaps 'build_requires') modules in
> >> Build.PL, then run Build.PL and install the modules when it asks you.
> >>
> >> Everything should be in Build.PL already. If I missed something,  
> >> please
> >> add it.
> >>
> >
> > OK, to clarify using the test file Sendu mentioned in a previous post:
> > t/SeqIO.t
> >
> > This test skips tests if Algorithm::Diff, IO::ScalarArray or  
> > IO::String
> > are not installed (the first two are not mentioned in Build.PL).
> > However, if there are a lot of such skips in the whole test suite then
> > there maybe few system with all these modules installed in order to
> > conduct a complete test. These are the modules I'm referring to.
> >
> > Nath
> 
> If they are only necessary for tests, work for all OSs, and are pure  
> Perl they should be added to t/lib, like Test::More and the rest.  If  
> they only work for some OSs they could be added to t/lib and skip  
> based on OS, but they still must be pure Perl.  I would avoid  
> anything that requires any compiling for XS or Inline altogether (I  
> don't want to go down the nightmare road of OS-dependent compiler  
> issues for a few tests).

If this is the case, there surely is no need to skip the tests if they should be provided in the t/lib dir. Am I missing something!?

> 
> Finally, if they are needed for core modules (not just tests) then  
> they should be added to the core prereqs in Build.
> 
> chris
> 


From bix at sendu.me.uk  Thu Jul  5 15:13:35 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 05 Jul 2007 16:13:35 +0100
Subject: [Bioperl-l] Installing all modules required for testing
In-Reply-To: <1183647870.468d087ed4c80@webmail.shef.ac.uk>
References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk>
	<468CBC3E.1020408@sendu.me.uk>
	<200707051512.38185.heikki@sanbi.ac.za>
	<468CF5A8.7040402@sendu.me.uk> <468CFB5D.6080406@sheffield.ac.uk>
	<468CFD06.3080604@sendu.me.uk> <468CFEB0.80201@sheffield.ac.uk>
	<16397C9B-07DA-416E-A2AF-5FA403BA0388@uiuc.edu>
	<1183647870.468d087ed4c80@webmail.shef.ac.uk>
Message-ID: <468D0A9F.4010709@sendu.me.uk>

Nathan S. Haigh wrote:
> Quoting Chris Fields <cjfields at uiuc.edu>:
>>> OK, to clarify using the test file Sendu mentioned in a previous
>>> post: t/SeqIO.t
>>> 
>>> This test skips tests if Algorithm::Diff, IO::ScalarArray or 
>>> IO::String are not installed
>> 
>> If they are only necessary for tests, work for all OSs, and are
>> pure Perl they should be added to t/lib, like Test::More and the
>> rest.  If they only work for some OSs they could be added to t/lib
>> and skip based on OS, but they still must be pure Perl.  I would
>> avoid anything that requires any compiling for XS or Inline
>> altogether (I don't want to go down the nightmare road of
>> OS-dependent compiler issues for a few tests).
> 
> If this is the case, there surely is no need to skip the tests if
> they should be provided in the t/lib dir. Am I missing something!?

That skip in SeqIO.t is new, and I simply didn't think those modules were 
important enough to make anyone install them or to include them in t/lib.

I'd go ahead and add those modules, but like I say, it may make more 
sense just to use is_deeply(), removing the dependency on 
Algorithm::Diff and IO::ScalarArray completely.
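
A minimal sketch of the is_deeply() approach, assuming the input and the re-written output are both available as text. The sample strings are hypothetical stand-ins, not files from t/data, and only core Test::More is used:

```perl
use strict;
use warnings;
use Test::More tests => 1;

# Stand-ins for the original file and the re-written output; a real test
# would read both from disk (hypothetical data, not taken from t/data).
my $input  = ">seq1\nACGT\nTTGA\n";
my $output = ">seq1\nACGT\nTTGA\n";

# Compare line by line so that a failure points at an array index
# instead of dumping two long strings.
my @expected = split /\n/, $input;
my @got      = split /\n/, $output;

is_deeply(\@got, \@expected, 'round-tripped output matches input line by line');
```

On a mismatch, is_deeply() reports the first element at which the two structures differ, which may be enough without Algorithm::Diff.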


From cjfields at uiuc.edu  Thu Jul  5 15:35:41 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Jul 2007 10:35:41 -0500
Subject: [Bioperl-l] Installing all modules required for testing
In-Reply-To: <1183647870.468d087ed4c80@webmail.shef.ac.uk>
References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk>
	<468CBC3E.1020408@sendu.me.uk>
	<200707051512.38185.heikki@sanbi.ac.za>
	<468CF5A8.7040402@sendu.me.uk> <468CFB5D.6080406@sheffield.ac.uk>
	<468CFD06.3080604@sendu.me.uk> <468CFEB0.80201@sheffield.ac.uk>
	<16397C9B-07DA-416E-A2AF-5FA403BA0388@uiuc.edu>
	<1183647870.468d087ed4c80@webmail.shef.ac.uk>
Message-ID: <F97172F8-F59A-4CCD-9BBD-B763675EB92F@uiuc.edu>


On Jul 5, 2007, at 10:04 AM, Nathan S. Haigh wrote:

> ...
>> If they are only necessary for tests, work for all OSs, and are pure
>> Perl they should be added to t/lib, like Test::More and the rest.  If
>> they only work for some OSs they could be added to t/lib and skip
>> based on OS, but they still must be pure Perl.  I would avoid
>> anything that requires any compiling for XS or Inline altogether (I
>> don't want to go down the nightmare road of OS-dependent compiler
>> issues for a few tests).
>
> If this is the case, there surely is no need to skip the tests if  
> they should be provided in the t/lib dir. Am I missing something!?

No, you are correct, but these are currently not in t/lib (unless  
someone snuck them in....)

Of the modules you listed above, only one (IO::String) is required by  
the core modules.  The others are not.  Users shouldn't be forced to  
install Algorithm::Diff or IO::ScalarArray just to run tests, so  
anything not required should go into t/lib if at all possible.

If there are any reasons (OS issues, list of prereqs) which preclude  
adding these to t/lib, we need to ask ourselves (1) why are we using  
that module in the first place?  And, if there is a good reason, (2)  
can we skip the tests if it isn't present?  Both of those options are  
already available.
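
For option (2), a minimal sketch of skipping when an optional module is absent, using a Test::More SKIP block. The data and test names are made up for illustration; this is not the actual code in t/SeqIO.t:

```perl
use strict;
use warnings;
use Test::More tests => 2;

# Always runs: needs nothing beyond core Perl.
ok(1, 'core-only check');

SKIP: {
    # Skip (rather than fail) when the optional module is not installed.
    eval { require Algorithm::Diff; 1 }
        or skip('Algorithm::Diff not installed', 1);

    my @old = qw(a b c);
    my @new = qw(a x c);

    # LCS() returns the longest common subsequence of the two lists.
    my @lcs = Algorithm::Diff::LCS(\@old, \@new);
    is_deeply(\@lcs, [qw(a c)], 'LCS of the two line lists');
}
```

The number passed to skip() has to match the number of tests inside the block so that the plan still adds up.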

chris


From cjfields at uiuc.edu  Thu Jul  5 15:50:55 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Jul 2007 10:50:55 -0500
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To: <468D006D.6050806@sheffield.ac.uk>
References: <468B6FBF.1070708@sendu.me.uk>
	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>
	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>
	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>
	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>
	<468CA721.4020804@sheffield.ac.uk> <468CB1FD.7060301@sendu.me.uk>
	<1183629664.468cc1609891a@webmail.shef.ac.uk>
	<468CC220.804@sendu.me.uk>
	<D71E1DE2-C737-41C1-90FF-573A66925790@uiuc.edu>
	<468D006D.6050806@sheffield.ac.uk>
Message-ID: <404EEDE8-53AC-411E-B4F0-CF4B4AABE9E0@uiuc.edu>


On Jul 5, 2007, at 9:30 AM, Nathan S. Haigh wrote:

> ...
>> I actually like Sendu's idea more, or the idea of each test suite  
>> having it's own directory.
>> Tests which need to guess/validate the format are probably best  
>> left sequestered to a specific suite focused on format guessing/ 
>> validation, at least in my opinion.
>> chris
>
>
> How easily would this lend itself to using the same data for  
> multiple tests, or is it likely to lead to/exacerbate a culture of  
> adding duplicate data files in each "test suite" rather than reusing?
>
> Nath

If there is a group of test data used for more than one test suite we  
can group those together into a common use folder, or we can go by  
format.  I'm pretty open to anything, really, as long as it is more  
organized.

My point is really concerned more with validation/guessing.  I think  
we should limit those tests to their respective specific test suites,  
or even to sections within a particular test suite (for instance,  
genbank.t), but not to force sequence guessing or validation in other  
cases.  To me validation, guessing, and parsing are three distinct  
issues (much like XML parsers handle things), so they require three  
distinct tests.

As for true sequence validation, there is no official format  
validation scheme yet in BioPerl.  It's sort of unofficially  
integrated into the sequence parsers themselves (something which I  
find to be problematic for several reasons too long to outline here).

chris


From cjfields at uiuc.edu  Thu Jul  5 15:54:42 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Jul 2007 10:54:42 -0500
Subject: [Bioperl-l] Test related Suggestions
In-Reply-To: <1183647510.468d07168963c@webmail.shef.ac.uk>
References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk>
	<468CBC3E.1020408@sendu.me.uk>
	<200707051512.38185.heikki@sanbi.ac.za>
	<2C7BA3C8-0693-458F-A49D-879ABAB8012E@uiuc.edu>
	<1183647510.468d07168963c@webmail.shef.ac.uk>
Message-ID: <48474A2C-2A58-4D51-8E7F-7CE083948D0F@uiuc.edu>


On Jul 5, 2007, at 9:58 AM, Nathan S. Haigh wrote:

> Quoting Chris Fields <cjfields at uiuc.edu>:
>
>>
>> On Jul 5, 2007, at 8:12 AM, Heikki Lehvaslaiho wrote:
>>
>>>
>>> One more suggestion:
>>>
>>> It would be extemaly useful if we had a standard way of testing
>>> that a when a
>>> file is read into a bioperl object and then written out again into
>>> a same
>>> format, the input and output files are identical. If not, the test
>>> should
>>> show where the the differences start (showing all the differences
>>> would just
>>> clutter the screen).
>>>
>>> This standard method/subroutine should be used to test all sequence
>>> and other
>>> text file IO.
>>>
>>> Any takers?
>>>
>>> 	-Heikki
>> ...
>>
>> I agree.  There are some 'round-trip' tests with genbank.t or SeqIO.t
>> that do some checking, I think, but something like this would be of
>> use.  However, what if the test file is old (as many in t/data are)
>> and the format has changed?  GenBank and EMBL, for instance, have
>> gone through several changes to format.
>>
>> chris
>>
>>
>
> Is there any way to distinguish variants apart other than just  
> layout? e.g. a version number of the likes?
>
> Nath

I don't think so; this veers back into the whole validation issue  
(i.e. does the record fit certain specifications).  There are  
examples of seq records from different sources which bioperl is  
expected to parse, for example Ensembl GenBank records.  Some of  
those have feature tags or annotation fields which may not appear in  
output when using write_seq().

I think it's less important to reproduce the input exactly in the  
output than to have the data represented in a Bio::Seq object (or any  
other Bio* instance) in a consistent manner and to be able to  
incorporate new fields (such as the recent addition of genome  
projects) transparently.  The latter is hard to do with the current  
genbank parser (you have to specifically code for it), but it is a  
bit easier to do with the driver-handler model I'm working on.
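
For the round-trip check proposed earlier in this thread (report only where the differences start), here is a rough sketch in plain Perl. The helper name first_difference and the sample records are hypothetical; nothing here is an existing BioPerl routine:

```perl
use strict;
use warnings;

# Return the 0-based index of the first line at which two texts differ,
# or undef if they are identical. Hypothetical helper, not part of BioPerl.
sub first_difference {
    my ($in, $out) = @_;
    my @a = split /\n/, $in,  -1;   # -1 keeps trailing empty fields
    my @b = split /\n/, $out, -1;
    my $max = @a > @b ? scalar @a : scalar @b;
    for my $i (0 .. $max - 1) {
        my $x = $i < @a ? $a[$i] : undef;
        my $y = $i < @b ? $b[$i] : undef;
        # A missing line on either side also counts as a difference.
        return $i if !defined $x || !defined $y || $x ne $y;
    }
    return;
}

my $in    = "LOCUS  X\nDEFINITION  demo\nORIGIN\n";
my $out   = "LOCUS  X\nDEFINITION  changed\nORIGIN\n";
my $where = first_difference($in, $out);
print defined $where
    ? "first difference at line " . ($where + 1) . "\n"
    : "identical\n";
```

A test could wrap this in ok(!defined $where, ...) and print only the offending line pair on failure, which keeps the screen uncluttered.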

chris


From bix at sendu.me.uk  Thu Jul  5 15:56:29 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 05 Jul 2007 16:56:29 +0100
Subject: [Bioperl-l] Test related Suggestions
In-Reply-To: <468CBC3E.1020408@sendu.me.uk>
References: <468B6FBF.1070708@sendu.me.uk>	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>	<468CA721.4020804@sheffield.ac.uk>	<468CB1FD.7060301@sendu.me.uk>
	<468CBC3E.1020408@sendu.me.uk>
Message-ID: <468D14AD.8050007@sendu.me.uk>

Sendu Bala wrote:
> Sendu Bala wrote:
>> Nathan S. Haigh wrote:
>>> Thinking about this a little more, I think it would be a good idea to 
>>> include Test::Exception in t/lib.
>> Agree. I'll see if I can have it auto-loaded by BioperlTest.pm.
> 
> I've now done that: BioperlTest loads Test::Exception, from the copy in 
> t/lib if necessary.
> 
> So, in BioperlTest-using scripts you now have access to the methods 
> dies_ok, lives_ok, throws_ok and lives_and.

And I've also now added in support for Test::Warn, giving you 
warning_is, warnings_are, warning_like and warnings_like.

I've updated the HOWTO as well:
http://www.bioperl.org/wiki/HOWTO:Writing_BioPerl_Tests

You can see these things in action in t/seq_quality.t
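
A small sketch of what those functions look like in use. It loads Test::Exception and Test::Warn directly instead of going through BioperlTest, and check_dna is a made-up function for illustration only:

```perl
use strict;
use warnings;
use Test::More tests => 4;
use Test::Exception;
use Test::Warn;

# Hypothetical function under test: warns on empty input and dies on
# characters outside A, C, G, T and N.
sub check_dna {
    my ($seq) = @_;
    if ($seq eq '') {
        warn "empty sequence\n";
        return 0;
    }
    die "invalid character in '$seq'\n" if $seq =~ /[^ACGTN]/i;
    return 1;
}

lives_ok  { check_dna('ACGT') } 'valid DNA does not die';
dies_ok   { check_dna('AC!T') } 'invalid DNA dies';
throws_ok { check_dna('AC!T') } qr/invalid character/, 'error names the problem';
warning_like { check_dna('') } qr/empty sequence/, 'empty input warns';
```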


From bix at sendu.me.uk  Thu Jul  5 15:57:23 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 05 Jul 2007 16:57:23 +0100
Subject: [Bioperl-l] Warnings
In-Reply-To: <2E0C7F35-9AA1-479A-A430-7D4037D98A3A@uiuc.edu>
References: <468CCC30.90406@sendu.me.uk>
	<200707051458.59921.heikki@sanbi.ac.za>
	<468CEC72.4090909@sendu.me.uk>
	<2E0C7F35-9AA1-479A-A430-7D4037D98A3A@uiuc.edu>
Message-ID: <468D14E3.6030104@sendu.me.uk>

Chris Fields wrote:
> 
> On Jul 5, 2007, at 8:04 AM, Sendu Bala wrote:
> 
>> ...
>>
>> So its my understanding there will be absolutely no difference in
>> behaviour following this change (except that warning can be caught by
>> Test::Warn). I just wanted to confirm my understanding.
> 
> You can always just try it out and run tests.  Might be interesting to 
> see if anything breaks.

I've made the change. Everything seems ok as far as I can tell.


From dmessina at wustl.edu  Thu Jul  5 16:02:26 2007
From: dmessina at wustl.edu (David Messina)
Date: Thu, 5 Jul 2007 11:02:26 -0500
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To: <B5011E60-D686-4E71-A1AB-46990548ABD1@uiuc.edu>
References: <468B6FBF.1070708@sendu.me.uk>	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>
	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>
	<468CA721.4020804@sheffield.ac.uk>
	<21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net>
	<B5011E60-D686-4E71-A1AB-46990548ABD1@uiuc.edu>
Message-ID: <FBB24EB4-7728-476E-98EA-77A53C870A89@wustl.edu>


On Jul 5, 2007, at 9:33 AM, Chris Fields wrote:
> I agree, but I think there is still an expectation that 1.5.2 and
> beyond are more like true 'stable' releases even though we still
> designate them as 'developer.'   We unfortunately reinforce that when
> we tell users they need to update to v. 1.5.2 or bioperl-live to fix
> a particular bug in the 1.4 release.

I know this has been discussed before, but while we're talking about  
future release plans, it might be worth revisiting the BioPerl policy  
of designating only even-numbered releases as 'stable'. It's taking  
so long to get from 1.4 to 1.6. While the principle of keeping a  
stable API between 'stable' releases is valid in the ideal case, I  
think that continuing to label 1.5.2 (or whatever the latest 'dev'  
release is) as a developer release (which implies potentially  
unstable or bleeding-edge code) is highly misleading since we would  
never ever tell anyone to get 1.4 instead.

Alternatively, if we adopt a more aggressive release schedule as  
Chris proposed a couple days ago, then perhaps we could agree to push  
out an even-numbered release once a year or so, so that there is a  
'stable' release we could recommend.


> If we feel a nightly snapshot is warranted we could do that though.
> I personally don't think there is a need, particularly since we have
> several means to obtain the latest code at any point in time
> (including the browsable CVS 'Download tarball').  We could state the
> next dev/stable CPAN release (pending on date dd/mm/yy) will have the
> bug fix, and if they want it immediately then pick it up from CVS.

To make it easier for people to obtain the latest tarball, we could  
put the 'download tarball' link directly on the 'Getting_BioPerl'  
wiki page instead of only a link to the viewcvs interface. That way  
they wouldn't have to navigate the source tree to figure out which  
tarball they want (which is almost always going to be the  
bioperl-live tarball).

I think the actual URL underlying the 'Download tarball' link on  
viewcvs is stable:

	http://code.open-bio.org/cgi/viewcvs.cgi/bioperl-live/bioperl-live.tar.gz?tarball=1


Dave


From cjfields at uiuc.edu  Thu Jul  5 16:13:30 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Jul 2007 11:13:30 -0500
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To: <FBB24EB4-7728-476E-98EA-77A53C870A89@wustl.edu>
References: <468B6FBF.1070708@sendu.me.uk>	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>
	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>
	<468CA721.4020804@sheffield.ac.uk>
	<21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net>
	<B5011E60-D686-4E71-A1AB-46990548ABD1@uiuc.edu>
	<FBB24EB4-7728-476E-98EA-77A53C870A89@wustl.edu>
Message-ID: <BF212044-F565-434B-882F-507974566B66@uiuc.edu>


On Jul 5, 2007, at 11:02 AM, David Messina wrote:

> ...
> I know this has been discussed before, but while we're talking  
> about future release plans, it might be worth revisiting the  
> BioPerl policy of designating only even-numbered releases as  
> 'stable'. It's taking so long to get from 1.4 to 1.6. While the  
> principle of keeping a stable API between 'stable' releases is  
> valid in the ideal case, I think that continuing to label 1.5.2 (or  
> whatever the latest 'dev' release is) as a developer release (which  
> implies potentially unstable or bleeding-edge code) is highly  
> misleading since we would never ever tell anyone to get 1.4 instead.
>
> Alternatively, if we adopt a more aggressive release schedule as  
> Chris proposed a couple days ago, then perhaps we could agree to  
> push out an even-numbered release once a year or so, so that there  
> is a 'stable' release we could recommend.

I think the idea of 'stable' is best summarized back in Hilmar's post  
(i.e. we support a particular API for that release).  The 1.5  
releases I believe break some aspects of 1.4 API (some of the Feature/ 
Annotation stuff introduced before the official 1.5 release).  We  
still need to address some of those issues before a 1.6 which seems  
to be the only real stumbling block, but they are unfortunately not  
well-documented and are somewhat interwoven with GMOD code.

> ...
> To make it easier for people to obtain the latest tarball, we could  
> put the 'download tarball' link directly on the 'Getting_BioPerl'  
> wiki page instead of only a link to the viewcvs interface. That way  
> they wouldn't have to navigate the source tree to figure out which  
> tarball they want (which is almost always going to be the  
> bioperl-live tarball).
>
> I think the actual URL underlying the 'Download tarball' link on  
> viewcvs is stable:
>
> 	http://code.open-bio.org/cgi/viewcvs.cgi/bioperl-live/bioperl-live.tar.gz?tarball=1
>
> Dave

Sounds reasonable enough.  Do you want to do the honors?

chris


From dmessina at wustl.edu  Thu Jul  5 16:44:28 2007
From: dmessina at wustl.edu (David Messina)
Date: Thu, 5 Jul 2007 11:44:28 -0500
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To: <BF212044-F565-434B-882F-507974566B66@uiuc.edu>
References: <468B6FBF.1070708@sendu.me.uk>	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>
	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>
	<468CA721.4020804@sheffield.ac.uk>
	<21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net>
	<B5011E60-D686-4E71-A1AB-46990548ABD1@uiuc.edu>
	<FBB24EB4-7728-476E-98EA-77A53C870A89@wustl.edu>
	<BF212044-F565-434B-882F-507974566B66@uiuc.edu>
Message-ID: <8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu>


> [Chris]
> The 1.5 releases I believe break some aspects of 1.4 API

Yes, this is true.

I question, though, whether it's relevant given that virtually no one  
uses 1.4 anymore. In any case, I would venture that the number of  
people who would be bitten by the 1.4->1.5 API change is much smaller  
than the number of people who download 1.4 and then ask us why it  
doesn't work.

I think that, rather than continuing to call 1.5.x the developer  
release in order to adhere to the API guarantee, it would be much  
clearer to users if we state clearly that everyone should download  
1.5.x, and that if you're upgrading from 1.4 to 1.5 there are API  
changes.


>> [me]
>> we could put the 'download tarball' link directly on the  
>> 'Getting_BioPerl' wiki page
>
> [Chris]
> Sounds reasonable enough.  Do you want to do the honors?

Done.


Dave


From cjfields at uiuc.edu  Thu Jul  5 16:57:28 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Jul 2007 11:57:28 -0500
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To: <8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu>
References: <468B6FBF.1070708@sendu.me.uk>	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>
	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>
	<468CA721.4020804@sheffield.ac.uk>
	<21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net>
	<B5011E60-D686-4E71-A1AB-46990548ABD1@uiuc.edu>
	<FBB24EB4-7728-476E-98EA-77A53C870A89@wustl.edu>
	<BF212044-F565-434B-882F-507974566B66@uiuc.edu>
	<8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu>
Message-ID: <AE1AF127-4D2A-458C-8EF0-F3D8D3B239E5@uiuc.edu>

On Jul 5, 2007, at 11:44 AM, David Messina wrote:

>
>> [Chris]
>> The 1.5 releases I believe break some aspects of 1.4 API
>
> Yes, this is true.
>
> I question, though, whether it's relevant given that virtually no  
> one uses 1.4 anymore. In any case, I would venture that the number  
> of people who would be bitten by the 1.4->1.5 API change is much  
> smaller than the number of people who download 1.4 and then ask us  
> why it doesn't work.
>
> I think that, rather than continuing to call 1.5.x the developer  
> release in order to adhere to the API guarantee, it would be much  
> clearer to users if we state clearly that everyone should download  
> 1.5.x, and that if you're upgrading from 1.4 to 1.5 there are API  
> changes.

You'd be surprised how many are still using bioperl 1.2.3 (Ensembl)  
and 1.4 (any admin too scared to go with a 'dev' release).  The real  
answer is to get out a stable 1.6 ASAP.  The problem we currently  
have is (horrible Texas pun) 'too many pokers in the fire.'  We have  
svn migration, major changes in the test suite, talk about splitting  
bioperl, a lot of bugs to sort through, new code to add or work on,  
etc.  Not to mention our $jobs!

I think we should just bite the bullet and proceed with pulling out  
the controversial operator overloading in Bio::Annotation*, deprecate  
the tag methods in AnnotatableI, and go about fixing everything up.   
If that occurs (which seems to be the major impediment) and we get  
GMOD/GBrowse playing well with BioPerl then we can aim for a new  
stable release, and then institute a regular release cycle.

chris


From bpederse at gmail.com  Thu Jul  5 17:58:24 2007
From: bpederse at gmail.com (Brent Pedersen)
Date: Thu, 5 Jul 2007 10:58:24 -0700
Subject: [Bioperl-l] slippy map for genomic features.
Message-ID: <e183a99d0707051058y4d5a3f35v8a0aff35dc90c394@mail.gmail.com>

hi,
here's a side project i've been tinkering on in googlecode svn that
may be useful to some.
http://code.google.com/p/genome-browser/
it's a simple hack on top of OpenLayers (openlayers.org) to provide a
javascript slippy map interface and API to view and browse genomic
features. It can be used with any image generation program that can
accept &xmin= and &xmax= parameters through the url -- though i
haven't had it working with bioperl, as bioperl generates images of
different height depending on the number of tracks.

there's a live example of the code in SVN here:
http://toxic.berkeley.edu/bpederse/genome-browser/
with images generated by a colleague's modules on first request. those
images are then cached by a simple perl script included in the SVN
repo. all subsequent requests are returned from the cache.
an image request (automatically generated by the javascript) looks like:
http://toxic.berkeley.edu/bpederse/genome-browser/tiler.pl?chr=1&version=6&layers=mRNA&organism=arabidopsis&xmin=491520&xmax=499712&width=512
but any implementation need only implement xmin and xmax. all other
parameters will be used for caching but are not required.
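
to illustrate the caching idea, here is a rough sketch of how a cache file name could be derived from the request parameters. this is not the script in the svn repo; the function name cache_name and the .png extension are assumptions:

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Build a cache file name from the request parameters. Sorting the keys
# makes the name independent of the order of parameters in the URL.
# Hypothetical helper; the .png extension is an assumption.
sub cache_name {
    my (%param) = @_;
    for my $required (qw(xmin xmax)) {
        die "missing required parameter '$required'\n"
            unless defined $param{$required};
    }
    my $canonical = join '&', map { "$_=$param{$_}" } sort keys %param;
    return md5_hex($canonical) . '.png';
}

my $one = cache_name(xmin => 491520, xmax => 499712, layers => 'mRNA');
my $two = cache_name(layers => 'mRNA', xmax => 499712, xmin => 491520);
print "same tile, same cache entry\n" if $one eq $two;
```

in the example request above, xmax - xmin is 8192 bp rendered at width=512, i.e. 16 bp per pixel.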

if anyone is interested in getting this going with bioperl image
generation--or improving the project in any way, let me know and i'll
add you as a committer and provide any javascript support that i can.

-brent

tar ball download:
http://genome-browser.googlecode.com/files/genome-browser-0.02.tar


From dmessina at wustl.edu  Thu Jul  5 18:39:16 2007
From: dmessina at wustl.edu (David Messina)
Date: Thu, 5 Jul 2007 13:39:16 -0500
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To: <AE1AF127-4D2A-458C-8EF0-F3D8D3B239E5@uiuc.edu>
References: <468B6FBF.1070708@sendu.me.uk>	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>
	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>
	<468CA721.4020804@sheffield.ac.uk>
	<21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net>
	<B5011E60-D686-4E71-A1AB-46990548ABD1@uiuc.edu>
	<FBB24EB4-7728-476E-98EA-77A53C870A89@wustl.edu>
	<BF212044-F565-434B-882F-507974566B66@uiuc.edu>
	<8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu>
	<AE1AF127-4D2A-458C-8EF0-F3D8D3B239E5@uiuc.edu>
Message-ID: <DD6F2CE5-FE79-48D2-9410-FACA35AFEF9C@wustl.edu>

> The real answer is to get out a stable 1.6 ASAP.  The problem we  
> currently have is (horrible Texas pun) 'too many pokers in the  
> fire.'  We have svn migration, major changes in the test suite,  
> talk about splitting bioperl, a lot of bugs to sort through, new  
> code to add or work on, etc.  Not to mention our $jobs!

Yep, I hear ya.


> I think we should just bite the bullet and proceed with pulling out  
> the controversial operator overloading in Bio::Annotation*,  
> deprecate the tag methods in AnnotatableI, and go about fixing  
> everything up.  If that occurs (which seems to be the major  
> impediment) and we get GMOD/GBrowse playing well with BioPerl then  
> we can aim for a new stable release, and then institute a regular  
> release cycle.

That's a great plan. You're right -- better to devote energy to 1.6  
than to interim solutions.

Alright, I give, I give! :)
Dave


From glauberwagner at yahoo.com.br  Thu Jul  5 19:56:43 2007
From: glauberwagner at yahoo.com.br (Glauber Wagner)
Date: Thu, 5 Jul 2007 16:56:43 -0300 (ART)
Subject: [Bioperl-l] Bio::DB::Query::GenBank failures
In-Reply-To: <6D5431B47E46BD45AAA453432AD3B803027551D4@rpbmsem01.nala.roche.com>
Message-ID: <839755.95349.qm@web36514.mail.mud.yahoo.com>

Dear All,

I have a problem with the Bio::DB::Query::GenBank module. I
am trying to count the number of protein sequences, but
the count method does not return the number I expect.

use strict;
use warnings;
use Bio::DB::GenBank;
use Bio::DB::Query::GenBank;

my $query_string = 'Trypanosoma cruzi[Organism]';

# Build an Entrez query against the protein database.
my $query = Bio::DB::Query::GenBank->new(-db    => 'protein',
                                         -query => $query_string);
my $count = $query->count;   # number of matching records
my @ids   = $query->ids;     # their identifiers

print "$count\n";

Thanks.
Glauber




From cjfields at uiuc.edu  Thu Jul  5 20:21:49 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Jul 2007 15:21:49 -0500
Subject: [Bioperl-l] Bio::DB::Query::GenBank failures
In-Reply-To: <839755.95349.qm@web36514.mail.mud.yahoo.com>
References: <839755.95349.qm@web36514.mail.mud.yahoo.com>
Message-ID: <190D7522-9681-410A-9B04-CFB70E328EC4@uiuc.edu>

NCBI esearch doesn't seem to be working at the moment.  I'm getting  
'Internal Server Error' at this time.  Try back again at a later point.

chris

On Jul 5, 2007, at 2:56 PM, Glauber Wagner wrote:

> Dear All,
>
> I have a problem if Bio::DB::Query::GenBank module. I
> am trying to count the number of protein sequences and
> the module did not return the expected number by count
> object.
>
> use Bio::DB::GenBank;
> use Bio::DB::Query::GenBank;
>
> $query_string = "Trypanosoma cruzi[Organism]";
>
>   my $query =
> Bio::DB::Query::GenBank->new(-db=>'protein',
>
> -query=>$query_string);
>    my $count = $query->count;
>    my @ids   = $query->ids;
>
> print "$count\n";
>
> Thanks.
> Glauber

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From mitch_skinner at berkeley.edu  Thu Jul  5 21:22:38 2007
From: mitch_skinner at berkeley.edu (Mitch Skinner)
Date: Thu, 05 Jul 2007 14:22:38 -0700
Subject: [Bioperl-l] slippy map for genomic features.
In-Reply-To: <e183a99d0707051058y4d5a3f35v8a0aff35dc90c394@mail.gmail.com>
References: <e183a99d0707051058y4d5a3f35v8a0aff35dc90c394@mail.gmail.com>
Message-ID: <468D611E.7020904@berkeley.edu>

Hi,

FWIW, we've been working on something similar:
http://genome.biowiki.org/dmel/static/browser/prototype_gbrowse.html
based on GBrowse/Bio::Graphics and javascript that Andrew wrote from 
scratch (with the prototype library).  When our project was starting up 
(fall 05) Andrew looked but didn't find openlayers; I'm not sure if it 
was public back then but their current svn only goes back to 2006.

I think that things like layout (bumping) ought to be done in advance on 
a chromosome-wide basis; otherwise it's difficult to keep features from 
ending up at different heights on neighboring tiles.  And it would be 
difficult for the server to know what was being clicked on.  So we've 
been doing some up-front work to either do layout or to just render all 
the tiles in advance:
http://gmod.cvs.sourceforge.net/gmod/Generic-Genome-Browser/ajax/server/TileGenerator.pm?revision=1.1&view=markup
which is driven by this script:
http://gmod.cvs.sourceforge.net/gmod/Generic-Genome-Browser/ajax/server/generate-tiles.pl?revision=1.14&view=markup

Or you could just not bump at all, I guess.  I think of that as 
important functionality but I'd be interested in hearing about use cases 
where it's not necessary.  It's not just bumping, though; things like 
text labels also make it difficult to predict exactly what pixels a 
feature will span if you only have its genomic coordinates.
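
To make the bumping point concrete, here is a minimal sketch of greedy row assignment over a whole set of features at once. It is not taken from Bio::Graphics or TileGenerator.pm; assign_rows and the coordinates are made up, and labels are ignored:

```perl
use strict;
use warnings;

# Greedy row assignment ("bumping"): each feature goes on the first row
# whose last feature ends before this one starts. Features are
# [start, end] pairs. Hypothetical helper for illustration only.
sub assign_rows {
    my @features = sort { $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] } @_;
    my @row_end;    # end coordinate of the last feature placed on each row
    my @placed;
    for my $f (@features) {
        my ($start, $end) = @$f;
        my $row = 0;
        # Move down until a row is free at this start coordinate.
        $row++ while $row < @row_end && $row_end[$row] >= $start;
        $row_end[$row] = $end;
        push @placed, [$start, $end, $row];
    }
    return @placed;
}

my @layout = assign_rows([100, 500], [300, 800], [600, 900], [850, 1000]);
printf "%d-%d on row %d\n", @$_ for @layout;
```

Because the rows are computed once for the whole chromosome, a feature that straddles a tile boundary gets the same row on both tiles.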

To make features clickable we've been using imagemaps; it simplifies the 
server code but it bogs down the client quite a bit.

I'd certainly be interested in seeing if there are ways we could work 
together; if you're at Berkeley maybe we could meet.

Regards,
Mitch

Brent Pedersen wrote:
> hi,
> here's a side project i've been tinkering on in googlecode svn that
> may be useful to some.
> http://code.google.com/p/genome-browser/
> it's a simple hack on top of OpenLayers (openlayers.org) to provide a
> javascript slippy map interface and API to view and browse genomic
> features. It can be used with any image generation program that can
> accept &xmin= and &xmax= parameters through the url. -- though i
> havent had it working it bioperl as bioperl generates images of
> different height depending on the number of tracks.
>
> there's a live example of the code in SVN here:
> http://toxic.berkeley.edu/bpederse/genome-browser/
> with images generated by a colleague's modules on first request. those
> images are then cached by a simple perl script included in the SVN
> repo. all subsequent requests are returned from the cache.
> an image request (automatically generated by the javascript) looks like:
> http://toxic.berkeley.edu/bpederse/genome-browser/tiler.pl?chr=1&version=6&layers=mRNA&organism=arabidopsis&xmin=491520&xmax=499712&width=512
> but any implementation need only implement xmin and xmax. all other
> parameters will be used for caching but are not required.
>
> if anyone is interested in getting this going with bioperl image
> generation--or improving the project in any way, let me know and i'll
> add you as a committer and provide any javascript support that i can.
>
> -brent
>
> tar ball download:
> http://genome-browser.googlecode.com/files/genome-browser-0.02.tar


From cjfields at uiuc.edu  Thu Jul  5 21:42:40 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Jul 2007 16:42:40 -0500
Subject: [Bioperl-l] Bio::DB::Query::GenBank failures
In-Reply-To: <190D7522-9681-410A-9B04-CFB70E328EC4@uiuc.edu>
References: <839755.95349.qm@web36514.mail.mud.yahoo.com>
	<190D7522-9681-410A-9B04-CFB70E328EC4@uiuc.edu>
Message-ID: <3219E785-D475-4C21-ABCC-89FABD502E05@uiuc.edu>

Update: seems to be back up.  Give it a try now.

chris

On Jul 5, 2007, at 3:21 PM, Chris Fields wrote:

> NCBI esearch doesn't seem to be working at the moment.  I'm getting
> 'Internal Server Error' at this time.  Try back again at a later  
> point.
>
> chris
>
> On Jul 5, 2007, at 2:56 PM, Glauber Wagner wrote:
>
>> Dear All,
>>
>> I have a problem with the Bio::DB::Query::GenBank module. I
>> am trying to count the number of protein sequences, but
>> the count returned by the query object is not the number
>> I expect.
>>
>> use Bio::DB::GenBank;
>> use Bio::DB::Query::GenBank;
>>
>> $query_string = "Trypanosoma cruzi[Organism]";
>>
>>    my $query = Bio::DB::Query::GenBank->new(-db    => 'protein',
>>                                             -query => $query_string);
>>    my $count = $query->count;
>>    my @ids   = $query->ids;
>>
>> print "$count\n";
>>
>> Thanks.
>> Glauber
>>
>>
>>
>>
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From n.haigh at sheffield.ac.uk  Fri Jul  6 07:09:17 2007
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Fri, 06 Jul 2007 08:09:17 +0100
Subject: [Bioperl-l] API Changes
In-Reply-To: <8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu>
References: <468B6FBF.1070708@sendu.me.uk>	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>	<468CA721.4020804@sheffield.ac.uk>	<21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net>	<B5011E60-D686-4E71-A1AB-46990548ABD1@uiuc.edu>	<FBB24EB4-7728-476E-98EA-77A53C870A89@wustl.edu>	<BF212044-F565-434B-882F-507974566B66@uiuc.edu>
	<8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu>
Message-ID: <468DEA9D.6010809@sheffield.ac.uk>

David Messina wrote:
>> [Chris]
>> The 1.5 releases I believe break some aspects of 1.4 API
>>     
>
> Yes, this is true.
>
> I question, though, whether it's relevant given that virtually no one  
> uses 1.4 anymore. In any case, I would venture that the number of  
> people who would be bitten by the 1.4->1.5 API change is much smaller  
> than the number of people who download 1.4 and then ask us why it  
> doesn't work.
>   

I'm not really up to speed with how the API should remain stable, etc. Is
the idea that the API should be stable from 1.4 through the 1.5 dev series,
and then the next stable release can change that API? So any stable-to-stable
upgrade could involve an API change, while a stable-to-dev upgrade should
have the same API? Does a stable API mean that the same method calls are
available in a newer release? What about adding new methods to a newer
release?

How are these API changes currently tracked? It seems to me that 
Test::More might be able to help in testing the API:

can_ok($module, @methods);
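
A minimal sketch of what such a check might look like. My::Feature and its
method names are hypothetical stand-ins, not real BioPerl code; only
can_ok() itself is taken from Test::More.

```perl
use strict;
use warnings;
use Test::More tests => 1;

# Hypothetical stand-in for a real module; it is not part of BioPerl.
{
    package My::Feature;
    sub new      { my $class = shift; return bless {}, $class }
    sub get_Foo  { return $_[0]->{foo} }
    sub next_Foo { return }
}

# The method names the release series promises to keep.
my @public_api = qw(new get_Foo next_Foo);

# can_ok() fails, and names the culprit, if any listed method is missing.
can_ok('My::Feature', @public_api);
```

A check like this only catches removed or renamed methods. It says nothing
about whether a method still accepts the same arguments or returns the same
kind of data.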


Nath


From n.haigh at sheffield.ac.uk  Fri Jul  6 11:10:14 2007
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Fri, 06 Jul 2007 12:10:14 +0100
Subject: [Bioperl-l] Bio::Variation::RNAChange
Message-ID: <468E2316.1030804@sheffield.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I'm taking a look at the tests for Bio::Variation::RNAChange.

If you create a new object without arguments:
my $obj = Bio::Variation::RNAChange->new();

What do you expect the following to return:
$obj->label();

I thought it would probably be:
'inframe'

However you get:
'inframe, deletion'

Can anyone in the know explain what behaviour would be expected?

Cheers
Nath
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGjiMVczuW2jkwy2gRAv+0AJ9tA/1WgEbTRCen+FCi/DU/P2RnAwCfbGit
B8DxDViDOcx2gTFjSwQ2kNg=
=SroY
-----END PGP SIGNATURE-----


From n.haigh at sheffield.ac.uk  Fri Jul  6 12:54:33 2007
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Fri, 06 Jul 2007 13:54:33 +0100
Subject: [Bioperl-l] Bio::Variation::RNAChange
In-Reply-To: <468E2316.1030804@sheffield.ac.uk>
References: <468E2316.1030804@sheffield.ac.uk>
Message-ID: <468E3B89.3090202@sheffield.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Nathan S. Haigh wrote:
> I'm taking a look at the tests for Bio::Variation::RNAChange.
> 
> If you create a new object without arguments:
> my $obj = Bio::Variation::RNAChange->new();
> 
> What do you expect the following to return:
> $obj->label();
> 
> I thought it would probably be:
> 'inframe'
> 
> However you get:
> 'inframe, deletion'
> 
> Can anyone in the know explain what behaviour would be expected?
> 
> Cheers
> Nath

Following on from this, AAChange has the following two methods:
add_Allele() and allele_mut()

It appears that allele_mut() is only capable of remembering one allele at a
time, whereas add_Allele() is provided to add support for multiple
alleles - is that correct?

However, add_Allele() also calls allele_mut(), such that multiple calls
to add_Allele() will result in the overwriting of the allele being
remembered by allele_mut(). Things are further complicated by the fact
that label() uses allele_mut() to decide on the label to return.
Shouldn't label() know about multiple alleles set by multiple calls to
add_Allele()?
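
To make the behaviour described above concrete, here is a toy model of it.
Toy::Change is a hypothetical class written only for this illustration; it
borrows the method names but is not the actual Bio::Variation code.

```perl
use strict;
use warnings;

# Toy model only: the real Bio::Variation classes are more involved.
package Toy::Change;

sub new { my $class = shift; return bless { alleles => [] }, $class }

# Single-slot accessor: holds at most one allele.
sub allele_mut {
    my ($self, $allele) = @_;
    $self->{allele_mut} = $allele if defined $allele;
    return $self->{allele_mut};
}

# Appends to the list, but also overwrites the single slot.
sub add_Allele {
    my ($self, $allele) = @_;
    $self->allele_mut($allele);
    push @{ $self->{alleles} }, $allele;
    return scalar @{ $self->{alleles} };
}

sub each_Allele { return @{ $_[0]->{alleles} } }

package main;

my $change = Toy::Change->new;
$change->add_Allele('a');
$change->add_Allele('t');

print join(',', $change->each_Allele), "\n";  # prints: a,t
print $change->allele_mut, "\n";              # prints: t
```

Anything that consults only allele_mut(), as label() is said to do, would
see just the last allele added.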

It may be my lack of understanding of alleles and of what these classes are
intended to do, but trying to rewrite the test scripts to improve code
coverage has left me a little confused!

Thanks
Nath
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGjjuJczuW2jkwy2gRAgogAKDXAn8h5iFIBCjtQgxYsrUGofYpOwCguC6I
b8ZOENvDDDIxphAoxeKg8/E=
=f/sa
-----END PGP SIGNATURE-----


From tanzeem.mb at gmail.com  Thu Jul  5 06:39:34 2007
From: tanzeem.mb at gmail.com (tanzeem)
Date: Wed, 4 Jul 2007 23:39:34 -0700 (PDT)
Subject: [Bioperl-l] Problem working with remoteblast submit method in
 webbrowser.
In-Reply-To: <11114623.post@talk.nabble.com>
References: <11114623.post@talk.nabble.com>
Message-ID: <11441586.post@talk.nabble.com>


I found it myself. Run Apache as root and disable SELinux, and the problem
will not recur.

tanzeem wrote:
> 
>  I have a program that uses the BioPerl RemoteBlast module to compare
> an amino acid FASTA file against the Swiss-Prot database. The
> submit_blast() method works when run from the command line, but
> when the program is run from a web browser it returns -1. I was trying to
> adapt the code from the RemoteBlast synopsis for my needs.
> 



From cain.cshl at gmail.com  Fri Jul  6 13:00:32 2007
From: cain.cshl at gmail.com (Scott Cain)
Date: Fri, 06 Jul 2007 09:00:32 -0400
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To: <AE1AF127-4D2A-458C-8EF0-F3D8D3B239E5@uiuc.edu>
References: <468B6FBF.1070708@sendu.me.uk>
	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>
	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>
	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>
	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>
	<468CA721.4020804@sheffield.ac.uk>
	<21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net>
	<B5011E60-D686-4E71-A1AB-46990548ABD1@uiuc.edu>
	<FBB24EB4-7728-476E-98EA-77A53C870A89@wustl.edu>
	<BF212044-F565-434B-882F-507974566B66@uiuc.edu>
	<8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu>
	<AE1AF127-4D2A-458C-8EF0-F3D8D3B239E5@uiuc.edu>
Message-ID: <1183726832.2566.34.camel@localhost.localdomain>

On Thu, 2007-07-05 at 11:57 -0500, Chris Fields wrote:
> 
> I think we should just bite the bullet and proceed with pulling out  
> the controversial operator overloading in Bio::Annotation*, deprecate  
> the tag methods in AnnotatableI, and go about fixing everything up.   
> If that occurs (which seems to be the major impediment) and we get  
> GMOD/GBrowse playing well with BioPerl then we can aim for a new  
> stable release, and then institute a regular release cycle.
> 
This sounds like a good idea to me too.  I'm planning on having
a GMOD hackathon at the end of the summer; if I had a new API by then,
we could focus on fixing anything that gets broken by the changes.

Scott

-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From cjfields at uiuc.edu  Fri Jul  6 13:10:41 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 6 Jul 2007 08:10:41 -0500
Subject: [Bioperl-l] API Changes
In-Reply-To: <468DEA9D.6010809@sheffield.ac.uk>
References: <468B6FBF.1070708@sendu.me.uk>	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>	<468CA721.4020804@sheffield.ac.uk>	<21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net>	<B5011E60-D686-4E71-A1AB-46990548ABD1@uiuc.edu>	<FBB24EB4-7728-476E-98EA-77A53C870A89@wustl.edu>	<BF212044-F565-434B-882F-507974566B66@uiuc.edu>
	<8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu>
	<468DEA9D.6010809@sheffield.ac.uk>
Message-ID: <E0AE6402-AC6A-4C2A-BF5B-EB02670FEA62@uiuc.edu>


On Jul 6, 2007, at 2:09 AM, Nathan S. Haigh wrote:

> David Messina wrote:
>>> [Chris]
>>> The 1.5 releases I believe break some aspects of 1.4 API
>>>
>>
>> Yes, this is true.
>>
>> I question, though, whether it's relevant given that virtually no one
>> uses 1.4 anymore. In any case, I would venture that the number of
>> people who would be bitten by the 1.4->1.5 API change is much smaller
>> than the number of people who download 1.4 and then ask us why it
>> doesn't work.
>>
>
> I'm not really up-to-speed with how the API should remain stable  
> etc. Is
> the idea that the API should be stable from 1.4 through the 1.5 dev and
> then the next stable release can change that API? So any stable to  
> stable
> upgrade could involve an API change while a stable to dev upgrade  
> should
> have the same API? Does a stable API mean that the same method  
> calls are
> available in a newer release....what about adding new methods to a  
> newer
> release?
>
> How are these API changes currently tracked? It seems to me that
> Test::More might be able to help in testing the API:
>
> can_ok($module, @methods);
>
>
> Nath	

It's basically a 'contract' of sorts between the devs (us) and users  
(us/them) that the API won't change for the extent of that release  
series, thus ensuring any scripts out there generating tons of data  
won't break down if they attempt to call a renamed method.  We try to  
maintain the API state anyway for those reasons, but in a dev release  
series we might decide to change some method names for consistency  
and deprecate older ambiguously-named methods (see below).  For a  
stable release it's critical the API remain intact.

There are a few methods which are considered deprecated or will be  
deprecated.  For instance, we recently talked about changes to method  
names which use case to specify whether you're receiving an object  
(get_Foo) vs. data (get_foo), a list (get_Foos), a flattened vs.  
nested list, or whether to use each_* vs next_* for iterators.   
Consistency is nice!

chris 


From heikki at sanbi.ac.za  Fri Jul  6 13:20:26 2007
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Fri, 6 Jul 2007 15:20:26 +0200
Subject: [Bioperl-l] Bio::Variation::RNAChange
In-Reply-To: <468E3B89.3090202@sheffield.ac.uk>
References: <468E2316.1030804@sheffield.ac.uk>
	<468E3B89.3090202@sheffield.ac.uk>
Message-ID: <200707061520.27000.heikki@sanbi.ac.za>

Hi Nat,

These modules have not been touched for a while and were developed for a
specific task. A review is definitely in order.

The way RNAChange->label was written, it should return 'inframe' when given no
alleles, but 'no change' would actually be better.

Multiple alleles were originally thought to be a good idea, but the
vocabulary for labels was developed for a single allele only. The use of the
module ended up being limited to a single allele, so the add_Allele() behaviour
was conveniently ignored but not removed. :(

	-Heikki


On Friday 06 July 2007 14:54:33 Nathan S. Haigh wrote:
> Nathan S. Haigh wrote:
> > I'm taking a look at the tests for Bio::Variation::RNAChange.
> >
> > If you create a new object without arguments:
> > my $obj = Bio::Variation::RNAChange->new();
> >
> > What do you expect the following to return:
> > $obj->label();
> >
> > I thought it would probably be:
> > 'inframe'
> >
> > However you get:
> > 'inframe, deletion'
> >
> > Can anyone in the know explain what behaviour would be expected?
> >
> > Cheers
> > Nath
>
> Following on from this, AAChange has the following two methods:
> add_Allele() and allele_mut()
>
> It appears that allele_mut is only capable of remembering 1 allele at a
> time, whereas add_Allele() is provided to add support for multiple
> alleles - is that correct?
>
> However, add_Allele() also calls allele_mut(), such that multiple calls
> to add_Allele will result in the overwriting of the allele being
> remembered by allele_mut(). Things are further complicated by the fact
> that label() uses allele_mut() to decide on the label to return.
> Shouldn't label know about multiple alleles set by multiple calls to
> add_Allele?
>
> It may be my lack of understanding alleles and what these classes are
> intending to do, but trying to rewrite the test scripts to improve code
> coverage has left me a little confused!
>
> Thanks
> Nath


-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From schlesi at ebi.ac.uk  Fri Jul  6 14:24:05 2007
From: schlesi at ebi.ac.uk (Felix Schlesinger)
Date: Fri, 6 Jul 2007 15:24:05 +0100
Subject: [Bioperl-l] Unrooting a tree
Message-ID: <7317d50c0707060724g614ca3a2h8beceada040d3584@mail.gmail.com>

Hi,

I am reading a rooted tree in newick format from a string (i.e. a
bifurcation at the root) and would like to unroot it (i.e. a
trifurcation at the root). I tried getting a grandchild of the root
and adding it as a direct child, but that does not seem to work (the
root still only has two descendents and the tree structure gets messed
up). Is there a nice way to do this directly in bioperl? Doing it on
the newick string is possible of course, but not nice.

Thanks
  Felix


From n.haigh at sheffield.ac.uk  Fri Jul  6 15:37:19 2007
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Fri, 06 Jul 2007 16:37:19 +0100
Subject: [Bioperl-l] API Changes
In-Reply-To: <E0AE6402-AC6A-4C2A-BF5B-EB02670FEA62@uiuc.edu>
References: <468B6FBF.1070708@sendu.me.uk>	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>	<468CA721.4020804@sheffield.ac.uk>	<21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net>	<B5011E60-D686-4E71-A1AB-46990548ABD1@uiuc.edu>	<FBB24EB4-7728-476E-98EA-77A53C870A89@wustl.edu>	<BF212044-F565-434B-882F-507974566B66@uiuc.edu>
	<8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu>
	<468DEA9D.6010809@sheffield.ac.uk>
	<E0AE6402-AC6A-4C2A-BF5B-EB02670FEA62@uiuc.edu>
Message-ID: <468E61AF.9040106@sheffield.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Chris Fields wrote:
> 
> On Jul 6, 2007, at 2:09 AM, Nathan S. Haigh wrote:
> 
>> David Messina wrote:
>>>> [Chris]
>>>> The 1.5 releases I believe break some aspects of 1.4 API
>>>>
>>>
>>> Yes, this is true.
>>>
>>> I question, though, whether it's relevant given that virtually no one
>>> uses 1.4 anymore. In any case, I would venture that the number of
>>> people who would be bitten by the 1.4->1.5 API change is much smaller
>>> than the number of people who download 1.4 and then ask us why it
>>> doesn't work.
>>>
>>
>> I'm not really up-to-speed with how the API should remain stable etc. Is
>> the idea that the API should be stable from 1.4 through the 1.5 dev and
>> then the next stable release can change that API? So any stable to stable
>> upgrade could involve an API change while a stable to dev upgrade should
>> have the same API? Does a stable API mean that the same method calls are
>> available in a newer release....what about adding new methods to a newer
>> release?
>>
>> How are these API changes currently tracked? It seems to me that
>> Test::More might be able to help in testing the API:
>>
>> can_ok($module, @methods);
>>
>>
>> Nath   
> 
> It's basically a 'contract' of sorts between the devs (us) and users
> (us/them) that the API won't change for the extent of that release
> series, thus ensuring any scripts out there generating tons of data
> won't break down if they attempt to call a renamed method.  We try to
> maintain the API state anyway for those reasons, but in a dev release
> series we might decide to change some method names for consistency and
> deprecate older ambiguously-named methods (see below).  For a stable
> release it's critical the API remain intact.

Hmm, still not 100% clear - it is Friday!

So, someone running a script that was designed when 1.4 was released
should still be able to run their script for all future releases. So all
changes need to be backward compatible?

So you have several situations regarding method names:
1) Adding new methods should be fine since past scripts don't know about
them and won't have used them
2) Removing methods would break past scripts that used them
3) Renamed methods would break past scripts that used the old name

A stable API to me, means the same method calls should still be able to
accept the same arguments (inc the constructor) and return the same
object/data etc.

What if a module is pretty outdated and would benefit from a rewrite -
should all the old method names be included, what if this makes coding
difficult?

> 
> There are a few methods which are considered deprecated or will be
> deprecated.  For instance, we recently talked about changes to method
> names which use case to specify whether you're receiving an object
> (get_Foo) vs. data (get_foo), a list (get_Foos), a flattened vs. nested
> list, or whether to use each_* vs next_* for iterators.  Consistency is
> nice!
> 

You mean the use of case to signify whether objects or data are returned is
to be deprecated, or encouraged? What was the outcome of the each_* vs
next_* discussion?

Nath


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGjmGvczuW2jkwy2gRAkGeAKDBXVSBvN0b39xbK1+2RLed35knSQCgz3pk
kAWH1zVa1ycopijl761cvkQ=
=fppH
-----END PGP SIGNATURE-----


From n.haigh at sheffield.ac.uk  Fri Jul  6 15:43:41 2007
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Fri, 06 Jul 2007 16:43:41 +0100
Subject: [Bioperl-l] Bio::Variation::RNAChange
In-Reply-To: <200707061520.27000.heikki@sanbi.ac.za>
References: <468E2316.1030804@sheffield.ac.uk>
	<468E3B89.3090202@sheffield.ac.uk>
	<200707061520.27000.heikki@sanbi.ac.za>
Message-ID: <468E632D.4090801@sheffield.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Heikki Lehvaslaiho wrote:
> Hi Nat,
> 
> These modules have not been touched for a while and were developed for a 
> specific task. A review is definitely in order.
> 
> The way RNAChange->label was written, it should return 'inframe' when given no 
> alleles, but 'no change' would actually be better.

Wouldn't this effectively be changing the API since past scripts "could"
expect "inframe" to be returned.

> 
> The multiple alleles were originally thought to be a good idea, but the
> vocabulary for labels was developed for a single allele only. The use of the
> module ended up being limited to a single allele, so add_Allele() behaviour was
> conveniently ignored but not removed. :(

So add_Allele() and each_Allele() should be deprecated in favour of
allele_mut()?

From my post about APIs: how should the capitalisation of
add_Allele() and each_Allele() be changed?

Cheers
Nath


> 
> 	-Heikki
> 
> 
> 
> On Friday 06 July 2007 14:54:33 Nathan S. Haigh wrote:
>> Nathan S. Haigh wrote:
>>> I'm taking a look at the tests for Bio::Variation::RNAChange.
>>>
>>> If you create a new object without arguments:
>>> my $obj = Bio::Variation::RNAChange->new();
>>>
>>> What do you expect the following to return:
>>> $obj->label();
>>>
>>> I thought it would probably be:
>>> 'inframe'
>>>
>>> However you get:
>>> 'inframe, deletion'
>>>
>>> Can anyone in the know explain what behaviour would be expected?
>>>
>>> Cheers
>>> Nath
>> Following on from this, AAChange has the following two methods:
>> add_Allele() and allele_mut()
>>
>> It appears that allele_mut is only capable of remembering 1 allele at a
>> time, whereas add_Allele() is provided to add support for multiple
>> alleles - is that correct?
>>
>> However, add_Allele() also calls allele_mut(), such that multiple calls
>> to add_Allele will result in the overwriting of the allele being
>> remembered by allele_mut(). Things are further complicated by the fact
>> that label() uses allele_mut() to decide on the label to return.
>> Shouldn't label know about multiple alleles set by multiple calls to
>> add_Allele?
>>
>> It may be my lack of understanding alleles and what these classes are
>> intending to do, but trying to rewrite the test scripts to improve code
>> coverage has left me a little confused!
>>
>> Thanks
>> Nath
> 
> 
> 

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGjmMtczuW2jkwy2gRAgQHAKC+S5mVh4lqR95NmgR6z+aU9br5lQCfc6ue
GBHuSHfsesX1ko55s+ME2Zc=
=tkG8
-----END PGP SIGNATURE-----


From cjfields at uiuc.edu  Sat Jul  7 20:57:37 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 7 Jul 2007 15:57:37 -0500
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To: <1183726832.2566.34.camel@localhost.localdomain>
References: <468B6FBF.1070708@sendu.me.uk>
	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>
	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>
	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>
	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>
	<468CA721.4020804@sheffield.ac.uk>
	<21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net>
	<B5011E60-D686-4E71-A1AB-46990548ABD1@uiuc.edu>
	<FBB24EB4-7728-476E-98EA-77A53C870A89@wustl.edu>
	<BF212044-F565-434B-882F-507974566B66@uiuc.edu>
	<8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu>
	<AE1AF127-4D2A-458C-8EF0-F3D8D3B239E5@uiuc.edu>
	<1183726832.2566.34.camel@localhost.localdomain>
Message-ID: <198D3F24-8510-453D-9201-21F2CCEC3519@uiuc.edu>

We'll probably get started soon, then.  I'll let you know when we do.

chris

On Jul 6, 2007, at 8:00 AM, Scott Cain wrote:

> On Thu, 2007-07-05 at 11:57 -0500, Chris Fields wrote:
>>
>> I think we should just bite the bullet and proceed with pulling out
>> the controversial operator overloading in Bio::Annotation*, deprecate
>> the tag methods in AnnotatableI, and go about fixing everything up.
>> If that occurs (which seems to be the major impediment) and we get
>> GMOD/GBrowse playing well with BioPerl then we can aim for a new
>> stable release, and then institute a regular release cycle.
>>
> I think this sounds like a good idea to me too.  I'm planning on  
> having
> a GMOD hackathon at the end of the summer; if I had a new API by then,
> we could focus on fixing anything that gets broken by the changes.
>
> Scott
>
> -- 
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                                         cain at cshl.edu
> GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
> Cold Spring Harbor Laboratory

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Sat Jul  7 21:17:14 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 7 Jul 2007 16:17:14 -0500
Subject: [Bioperl-l] API Changes
In-Reply-To: <468E61AF.9040106@sheffield.ac.uk>
References: <468B6FBF.1070708@sendu.me.uk>	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>	<B05B75A3-FB79-4125-89B4-4B9DC4443CF6@bioperl.org>	<F1A8255F-B58E-41E3-8EE1-DBAC11593428@uiuc.edu>	<468CA721.4020804@sheffield.ac.uk>	<21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net>	<B5011E60-D686-4E71-A1AB-46990548ABD1@uiuc.edu>	<FBB24EB4-7728-476E-98EA-77A53C870A89@wustl.edu>	<BF212044-F565-434B-882F-507974566B66@uiuc.edu>
	<8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu>
	<468DEA9D.6010809@sheffield.ac.uk>
	<E0AE6402-AC6A-4C2A-BF5B-EB02670FEA62@uiuc.edu>
	<468E61AF.9040106@sheffield.ac.uk>
Message-ID: <369F72D5-E5A3-4A33-BDEC-D462A339474F@uiuc.edu>


On Jul 6, 2007, at 10:37 AM, Nathan S. Haigh wrote:

> ...
> Hmm, still not 100% clear - it is Friday!
>
> So, someone running a script that was designed when 1.4 was released
> should still be able to run their script for all future releases.  
> So all
> changes need to be backward compatible?

It helps.  For instance, if we change method names (rename each_Foo  
as next_Foo), we should have each_Foo delegate to next_Foo for the  
time being.  If we plan on deprecating the old method altogether we  
would add a warning message when it's called, then delegate.

It's a better solution than just changing the method outright, which  
means the user has to search through docs to find the renamed method.
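
A minimal sketch of that rename-and-delegate pattern. My::Container is a
hypothetical class invented for the example, and plain warn() is used only
to keep it self-contained; a BioPerl class would presumably use its own
warning machinery instead.

```perl
use strict;
use warnings;

# Hypothetical container class, used only to illustrate the pattern.
package My::Container;

sub new {
    my ($class, @items) = @_;
    return bless { items => [@items], pos => 0 }, $class;
}

# New, preferred name: returns the next item, or nothing when exhausted.
sub next_Foo {
    my $self = shift;
    return if $self->{pos} >= @{ $self->{items} };
    return $self->{items}[ $self->{pos}++ ];
}

# Old name kept for existing scripts: warn, then delegate.
sub each_Foo {
    my $self = shift;
    warn "each_Foo() is deprecated; use next_Foo() instead\n";
    return $self->next_Foo(@_);
}

package main;

my $c = My::Container->new(qw(a b));
print $c->each_Foo, "\n";   # warns on STDERR, then prints: a
print $c->next_Foo, "\n";   # prints: b
```

Old scripts keep working and get a pointer to the new name each time they
call the old one.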

> So you have several situations regarding method names:
> 1) Adding new methods should be fine since past scripts don't know  
> about
> them and won't have used them
> 2) Removing methods would break past scripts that used them
> 3) Renamed methods would break past scripts that used the old name
>
> A stable API to me, means the same method calls should still be  
> able to
> accept the same arguments (inc the constructor) and return the same
> object/data etc.

Yes.

> What if a module is pretty outdated and would benefit from a rewrite -
> should all the old method names be included, what if this makes coding
> difficult?

It depends on the module.  If a complete rewrite is needed then maybe  
starting with a new module/interface is best, and we could deprecate  
the older module completely.  That has been done already with  
Bio::Tools::BPLite (in favor of SearchIO) and a few other modules.

>> There are a few methods which are considered deprecated or will be
>> deprecated.  For instance, we recently talked about changes to method
>> names which use case to specify whether you're receiving an object
>> (get_Foo) vs. data (get_foo), a list (get_Foos), a flattened vs.  
>> nested
>> list, or whether to use each_* vs next_* for iterators.   
>> Consistency is
>> nice!
>>
>
> You mean the use of case to signify whether objects or data are returned is
> to be deprecated, or encouraged? What was the outcome of the each_* vs
> next_* discussion?
>
> Nath

Here's the section I added to the wiki (it started in a thread a few  
weeks or so ago, so it's a summary really):

http://www.bioperl.org/wiki/Advanced_BioPerl#Method_names

Feel free to add to it or make suggestions.

BTW, Hilmar mentioned there was a movement to rename methods in old  
code to follow these recs but it was never completed.  It should be  
taken up again at some point but the recommendations are mainly here  
for newer code.

chris


From heikki at sanbi.ac.za  Sun Jul  8 07:32:21 2007
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Sun, 8 Jul 2007 09:32:21 +0200
Subject: [Bioperl-l] Bio::Variation::RNAChange
In-Reply-To: <468E632D.4090801@sheffield.ac.uk>
References: <468E2316.1030804@sheffield.ac.uk>
	<200707061520.27000.heikki@sanbi.ac.za>
	<468E632D.4090801@sheffield.ac.uk>
Message-ID: <200707080932.21818.heikki@sanbi.ac.za>

On Friday 06 July 2007 17:43:41 Nathan S. Haigh wrote:
> Heikki Lehvaslaiho wrote:
> > Hi Nat,
> >
> > These modules have not been touched for a while and were developed for a
> > specific task. A review is definitely in order.
> >
> > The way RNAChange->label was written, it should return 'inframe' when
> > given no alleles, but 'no change' would actually be better.
>
> Wouldn't this effectively be changing the API since past scripts "could"
> expect "inframe" to be returned.

Checking the actual usage and what happens when you change a nucleotide
to itself, you get the label 'silent'. I guess that would also be a valid label
value when the alleles are not initialised.

> > The multiple alleles were originally thought to be a good idea, but the
> > vocabulary for labels was developed for a single allele only. The use of
> > the module ended up being limited to a single allele, so add_Allele()
> > behaviour was conveniently ignored but not removed. :(
>
> So add_Allele() and each_Allele() should be deprecated in favour of
> allele_mut()?

Yes.

> From my post about APIs: how should the capitalisation of
> add_Allele() and each_Allele() be changed?

Definitely, keep the current ones as deprecated alternatives.


    -Heikki

> Cheers
> Nath
>
> > 	-Heikki
> >
> > On Friday 06 July 2007 14:54:33 Nathan S. Haigh wrote:
> >> Nathan S. Haigh wrote:
> >>> I'm taking a look at the tests for Bio::Variation::RNAChange.
> >>>
> >>> If you create a new object without arguments:
> >>> my $obj = Bio::Variation::RNAChange->new();
> >>>
> >>> What do you expect the following to return:
> >>> $obj->label();
> >>>
> >>> I thought it would probably be:
> >>> 'inframe'
> >>>
> >>> However you get:
> >>> 'inframe, deletion'
> >>>
> >>> Can anyone in the know explain what behaviour would be expected?
> >>>
> >>> Cheers
> >>> Nath
> >>
> >> Following on from this, AAChange has the following two methods:
> >> add_Allele() and allele_mut()
> >>
> >> It appears that allele_mut is only capable of remembering 1 allele at a
> >> time, whereas add_Allele() is provided to add support for multiple
> >> alleles - is that correct?
> >>
> >> However, add_Allele() also calls allele_mut(), such that multiple calls
> >> to add_Allele will result in the overwriting of the allele being
> >> remembered by allele_mut(). Things are further complicated by the fact
> >> that label() uses allele_mut() to decide on the label to return.
> >> Shouldn't label know about multiple alleles set by multiple calls to
> >> add_Allele?
> >>
> >> It may be my lack of understanding alleles and what these classes are
> >> intending to do, but trying to rewrite the test scripts to improve code
> >> coverage has left me a little confused!
> >>
> >> Thanks
> >> Nath
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l


-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From xing.y.hu at gmail.com  Mon Jul  9 06:26:40 2007
From: xing.y.hu at gmail.com (Xing Hu)
Date: Mon, 09 Jul 2007 14:26:40 +0800
Subject: [Bioperl-l] How to download EST files via bioperl script?
Message-ID: <4691D520.60700@gmail.com>

Hi friends,

    I wrote a script to get genomic sequence files from GenBank. For 
that, I used the DB::GenBank module to fetch the sequence via 
get_Seq_by_acc, and it works well. But this time, facing an enormous number 
of ESTs, I have no idea how to download them swiftly and elegantly.

    PROBLEM DESCRIPTION:
    goal: download all EST files of a specific species from GenBank, say 
Arabidopsis thaliana or Oryza sativa (rice).
    other: whether all the ESTs are in a single file or in separate 
files does not matter.

    Can I use a bioperl script to achieve that? And how? I would really 
appreciate any help.

Xing.


From akozik at atgc.org  Mon Jul  9 12:25:14 2007
From: akozik at atgc.org (Alexander Kozik)
Date: Mon, 09 Jul 2007 05:25:14 -0700
Subject: [Bioperl-l] How to download EST files via bioperl script?
In-Reply-To: <4691D520.60700@gmail.com>
References: <4691D520.60700@gmail.com>
Message-ID: <4692292A.1080900@atgc.org>

To download genomic sequences or ESTs for any organism (in various 
formats) you can use NCBI Taxonomy Browser:
http://www.ncbi.nlm.nih.gov/Taxonomy/

you can use taxonomy id to access different organisms, Arabidopsis for 
example (3702):
http://www.ncbi.nlm.nih.gov/sites/entrez?db=Nucleotide&cmd=Search&dopt=DocSum&term=txid3702

or by direct web link:
http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Arabidopsis+thaliana&lvl=0&srchmode=1

assembled genomes can be accessed via ftp:
ftp://ftp.ncbi.nih.gov/genomes/

To download a large number of selected sequences (ESTs for example) you 
can use batch Entrez:
http://www.ncbi.nlm.nih.gov/entrez/query/static/advancedentrez.html
http://www.ncbi.nlm.nih.gov/entrez/batchentrez.cgi?db=Nucleotide
(select EST for EST, it's critical)

It seems, to solve the problem you describe, you don't need to use 
bioperl. NCBI GenBank Entrez provides all necessary tools to work on 
these simple and frequent tasks.
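
If you want to generate such search links for several organisms, here is a small sketch. The helper name entrez_taxid_url is made up for illustration; the URL pattern is simply the one quoted above, so treat it as an assumption that the link format stays the same.

```perl
use strict;
use warnings;

# Build an Entrez search link for one taxonomy id, following the URL
# pattern quoted in this message (hypothetical helper, not an NCBI API).
sub entrez_taxid_url {
    my ($db, $taxid) = @_;
    die "taxonomy id must be a positive integer\n"
        unless $taxid =~ /^[1-9][0-9]*$/;
    return "http://www.ncbi.nlm.nih.gov/sites/entrez"
         . "?db=$db&cmd=Search&dopt=DocSum&term=txid$taxid";
}

# 3702 is the Arabidopsis taxonomy id mentioned above.
print entrez_taxid_url('Nucleotide', 3702), "\n";
```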

-Alex

-- 
Alexander Kozik
Bioinformatics Specialist
Genome and Biomedical Sciences Facility
451 East Health Sciences Drive
University of California
Davis, CA 95616-8816
Phone: (530) 754-9127
email#1: akozik at atgc.org
email#2: akozik at gmail.com
web: http://www.atgc.org/




From cjfields at uiuc.edu  Mon Jul  9 14:17:23 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 9 Jul 2007 09:17:23 -0500
Subject: [Bioperl-l] How to download EST files via bioperl script?
In-Reply-To: <4692292A.1080900@atgc.org>
References: <4691D520.60700@gmail.com> <4692292A.1080900@atgc.org>
Message-ID: <7CC3390F-A5F5-4832-9E3E-E2B96018D31D@uiuc.edu>

Caveat: if you have millions of ESTs please consider NOT using my  
eutil script below or NCBI Batch Entrez, which would repeatedly hit  
the NCBI server thousands of times.  At least try looking for other  
ways to retrieve the data you want (ftp, organism-specific resources  
like Ensembl, and so on), or run any scripts or data retrieval in off  
hours so you don't overtax the NCBI server.

There is a way you can use BioPerl if you don't mind living on the  
bleeding edge by using bioperl-live (core code from CVS).  I have  
been working for the last year on a set of modules  
(Bio::DB::EUtilities) that interact with the various eutils through  
the NCBI CGI interface, for building data pipelines.  You could  
possibly retrieve all relevant ESTs using a variation of the example  
script here:

http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook#esearch-.3Eefetch

Note that the code examples do NOT work with rel. 1.5.2 code as the  
API has changed quite a bit; I'm working to rectify some of that.

The script I would use is below.  It retrieves batches of 500  
sequences (in fasta format) at a time, for a total of 10000 max seq  
records, saving the raw record data directly to a file (appending as  
you go along).  I added an eval block to check the server status and  
redo the call up to 4 times before giving up completely.  Using eval  
this way hasn't been extensively tested but should work.

---------------------------------------

use strict;
use warnings;
use Bio::DB::EUtilities;

# esearch: find all EST records for taxon 3702 and keep the result set
# on the NCBI history server
my $factory = Bio::DB::EUtilities->new(-eutil => 'esearch',
                                        -db => 'nucest',
                                        -term => 'txid3702',
                                        -usehistory => 'y',
                                        -keep_histories => 1);

my $count = $factory->get_count;

print "Count: $count\n";

if (my $hist = $factory->next_History) {
     print "History returned\n";
     # note db carries over from above
     $factory->set_parameters(-eutil => 'efetch',
                              -rettype => 'fasta',
                              -history => $hist);
     my ($retmax, $retstart) = (500,0);
     my $retry = 1;
     # set max # seq records to return
     my $maxcount = $count < 10000 ? $count : 10000;
     RETRIEVE_SEQS:
     while ($retstart < $maxcount) {
         print "Returning from ",$retstart+1," to ",$retstart+$retmax,"\n";
         $factory->set_parameters(-retmax => $retmax,
                                 -retstart => $retstart);
         # check in case of server error
         eval{
             $factory->get_Response(-file => ">>ESTs.fas");
         };
         if ($@) {
             die "Server error: $@.  Try again later" if $retry == 5;
             print STDERR "Server error, redo #$retry\n";
             # redo repeats the current batch without re-testing the loop
             $retry++ && redo RETRIEVE_SEQS;
         }
         $retstart += $retmax;
     }
}


---------------------------------------
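
For anyone who wants to check the paging and retry logic without hitting the NCBI server, here is a standalone sketch under stated assumptions: batch_windows, fetch_with_retry and the deliberately flaky fetcher are all hypothetical names, and nothing here calls Bio::DB::EUtilities. Unlike the script above, it trims the last window to the records that remain.

```perl
use strict;
use warnings;

# Compute [retstart, retmax] windows that cover $total records.
sub batch_windows {
    my ($total, $batch_size) = @_;
    my @windows;
    for (my $start = 0; $start < $total; $start += $batch_size) {
        my $left = $total - $start;
        push @windows, [ $start, $left < $batch_size ? $left : $batch_size ];
    }
    return @windows;
}

# Call $fetch->($start, $size); if it dies, retry up to $max_retries times.
# Returns the number of retries that were needed.
sub fetch_with_retry {
    my ($fetch, $start, $size, $max_retries) = @_;
    for my $attempt (0 .. $max_retries) {
        my $ok = eval { $fetch->($start, $size); 1 };
        return $attempt if $ok;
        warn "Fetch failed (attempt ", $attempt + 1, "): $@";
    }
    die "Giving up on batch starting at $start after $max_retries retries\n";
}

# Simulated fetcher: fails once for the batch starting at 500, so the
# retry path is exercised.
my (%failed_once, @fetched);
my $flaky_fetch = sub {
    my ($start, $size) = @_;
    die "simulated server error\n"
        if $start == 500 && !$failed_once{$start}++;
    push @fetched, [ $start, $size ];
};

my $retries_used = 0;
for my $window (batch_windows(1234, 500)) {
    $retries_used += fetch_with_retry($flaky_fetch, @$window, 4);
}
print scalar(@fetched), " batches fetched, $retries_used retries used\n";
```

Keeping the retry count per batch, rather than one counter for the whole run as in the script above, is a design choice: one bad batch early on does not use up the retries for later ones.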


chris


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From jason at bioperl.org  Mon Jul  9 18:08:07 2007
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 9 Jul 2007 11:08:07 -0700
Subject: [Bioperl-l] Unrooting a tree
In-Reply-To: <7317d50c0707060724g614ca3a2h8beceada040d3584@mail.gmail.com>
References: <7317d50c0707060724g614ca3a2h8beceada040d3584@mail.gmail.com>
Message-ID: <22B52D0B-D3C3-4EF8-98EC-8C2A9F267362@bioperl.org>

I don't think there is a function for this yet but it would be a good  
one to have.
I assume you don't really want to take a shot at writing it though?

To make this work I think you have to create a new node which  
contains the trifurcation and this node is what the root is set to.

-jason

On Jul 6, 2007, at 7:24 AM, Felix Schlesinger wrote:

> Hi,
>
> I am reading a rooted tree in newick format from a string (i.e. a
> bifurcation at the root) and would like to unroot it (i.e. a
> trifurcation at the root). I tried getting a grandchild of the root
> and adding it as a direct child, but that does not seem to work (the
> root still only has two descendents and the tree structure gets messed
> up). Is there a nice way to do this directly in bioperl? Doing it on
> the newick string is possible of course, but not nice.
>
> Thanks
>   Felix
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From lstein at cshl.edu  Mon Jul  9 21:35:49 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Mon, 9 Jul 2007 17:35:49 -0400
Subject: [Bioperl-l] JOB NOTICE: Looking for CSHL bioinformatics core manager
Message-ID: <6dce9a0b0707091435h3d134b05oa6f7da24839c24bb@mail.gmail.com>

Hi Folks,

Sorry for the job spam. We're looking for a manager of the Cold Spring
Harbor Laboratory bioinformatics core facility. This is a semi-independent
staff position supporting CSHL scientific researchers by providing
consultation, data mining and software development activities. You will have
a software staff of two, a nice salary, good health benefits, and an
exciting and dynamic environment to work in. I'm looking for someone with a
strong bioinformatics background, at least five years' experience programming
Perl, Java or Python in an academic or commercial environment, and management
experience. If you are interested, please send your CV and cover letter to
me.

Thanks,

Lincoln

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu


From stewarta at nmrc.navy.mil  Mon Jul  9 22:16:12 2007
From: stewarta at nmrc.navy.mil (Andrew Stewart)
Date: Mon, 9 Jul 2007 18:16:12 -0400
Subject: [Bioperl-l] rpsblast
Message-ID: <9DF71DFB-F54E-4392-89E3-33345EC2DB36@nmrc.navy.mil>

When I run...   $result = $factory->rpsblast($seq);   ... where $seq  
is a Bio::Seq object, it seems to simply copy the $seq object to  
$result.  When I run something similar...  
$result = $factory->rpsblast('/path/to/myFile');   ... the value of  
$result then becomes '/path/to/myFile'.

Anyone else encounter this?


--
Andrew Stewart
Research Assistant, Genomics Team
Navy Medical Research Center (NMRC)
Biological Defense Research Directorate (BDRD)
BDRD Annex
12300 Washington Avenue, 2nd Floor
Rockville, MD 20852

email: stewarta at nmrc.navy.mil
phone: 301-231-6700 Ext 270


From jason_stajich at berkeley.edu  Tue Jul 10 01:36:10 2007
From: jason_stajich at berkeley.edu (Jason Stajich)
Date: Mon, 9 Jul 2007 18:36:10 -0700
Subject: [Bioperl-l] BOSC2007
Message-ID: <E6F5077E-50A3-489E-94B0-109FCAE6200F@berkeley.edu>

I posted a quick note about meeting up at BOSC/ISMB this year. If you  
are attending, please sign your name on the page or at least say  
whether you are interested in a BoF.  We'll try to discuss some of  
the current topics in BioPerl development, as well as use the time to  
coordinate any development that benefits from face-to-face time.

http://bioperl.org/wiki/BOSC2007_Meetup
http://bioperl.org/news/2007/07/09/are-you-going-to-ismbbosc-2007/

-jason
--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html
http://fungalgenomes.org/


From schlesi at ebi.ac.uk  Tue Jul 10 12:58:00 2007
From: schlesi at ebi.ac.uk (Felix Schlesinger)
Date: Tue, 10 Jul 2007 13:58:00 +0100
Subject: [Bioperl-l] Unrooting a tree
In-Reply-To: <22B52D0B-D3C3-4EF8-98EC-8C2A9F267362@bioperl.org>
References: <7317d50c0707060724g614ca3a2h8beceada040d3584@mail.gmail.com>
	<22B52D0B-D3C3-4EF8-98EC-8C2A9F267362@bioperl.org>
Message-ID: <7317d50c0707100558m76853bf8s37ee1e8852835306@mail.gmail.com>

Hi,

>  I don't think there is a function for this yet but it would be a good one
> to have.
> I assume you don't really want to take a shot at writing it though?
> To make this work I think you have to create a new node which contains the
> trifurcation and this node is what the root is set to.

Creating a new root is fine, but what would the (3) children of that
node be? I took a different approach now, where I iterate over all
(indirect) descendents of the root, find the first one which does not
have the root as its direct ancestor and move it up the tree, i.e.

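foreach my $d ($root->get_all_Descendents) {
    # skip nodes that already hang directly off the root
    next if $d->ancestor == $root;
    # detach the node from its parent and re-attach it to the root;
    # add_Descendent returns the root's new number of children
    $d->ancestor->remove_Descendent($d);
    last if $root->add_Descendent($d, 1) == 3;
}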

This will make the old root a trifurcation. It does the right thing
for what I am trying to do, but I believe it is not general (for
example, at the moment it does not take branch lengths into account).
Also, instead of taking the first such node, moving the most distant
possible subtree of a clade up to the root might be better.
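
A rough sketch of a variant that does keep track of branch lengths, written against plain hashes rather than Bio::Tree objects (node, unroot and to_newick are made-up helper names, and the example tree is invented). The assumption is the usual one for unrooting: one internal child of the root is collapsed, its children move up, and its branch length is added to the sibling so that leaf-to-leaf distances stay the same.

```perl
use strict;
use warnings;

# Build a node as a plain hash: name, branch length, list of children.
sub node {
    my ($name, $length, @children) = @_;
    return { name => $name, length => $length, children => [@children] };
}

# Turn a bifurcating root into a trifurcation (or more).
sub unroot {
    my ($root) = @_;
    my @kids = @{ $root->{children} };
    return $root unless @kids == 2;    # only a bifurcating root needs work

    # Pick a root child that is itself internal; its children move up.
    my ($collapse) = grep { scalar @{ $_->{children} } } @kids;
    return $root unless $collapse;     # two leaves: nothing to unroot
    my ($keep) = grep { $_ != $collapse } @kids;

    # The two root branches become one edge, so their lengths are summed.
    $keep->{length} += $collapse->{length};
    $root->{children} = [ $keep, @{ $collapse->{children} } ];
    return $root;
}

# Serialise a node and its subtree as a newick string (without the ';').
sub to_newick {
    my ($n) = @_;
    my $label = defined $n->{length} ? "$n->{name}:$n->{length}" : $n->{name};
    return $label unless @{ $n->{children} };
    return '(' . join(',', map { to_newick($_) } @{ $n->{children} }) . ')' . $label;
}

# Rooted input: ((A:1,B:2)x:0.5,(C:3,D:4)y:1.5)root;
my $tree = node('root', undef,
    node('x', 0.5, node('A', 1), node('B', 2)),
    node('y', 1.5, node('C', 3), node('D', 4)),
);
unroot($tree);
print to_newick($tree), ";\n";    # prints ((C:3,D:4)y:2,A:1,B:2)root;
```

In the example the distance from A to C is 1 + 0.5 + 1.5 + 3 = 6 before and 1 + 2 + 3 = 6 after, which is the property the simple loop above does not guarantee.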

Felix




From xing.y.hu at gmail.com  Tue Jul 10 13:29:36 2007
From: xing.y.hu at gmail.com (Xing Hu)
Date: Tue, 10 Jul 2007 21:29:36 +0800
Subject: [Bioperl-l] How to download EST files via bioperl script?
In-Reply-To: <7CC3390F-A5F5-4832-9E3E-E2B96018D31D@uiuc.edu>
References: <4691D520.60700@gmail.com> <4692292A.1080900@atgc.org>
	<7CC3390F-A5F5-4832-9E3E-E2B96018D31D@uiuc.edu>
Message-ID: <469389C0.5060303@gmail.com>

Thank you, guys.

I have to confess how stupid I was. The easiest way seems to be the 
NCBI Taxonomy Browser route that Alex suggested. As a matter of 
fact, I knew about it, but I thought it was necessary to have all items 
selected before pressing save to launch the download. So I was desperate to 
find a button that could do that without hundreds of thousands of 
clicks from me. "What about selecting none of those items at all?" -- This 
idea finally came to me after days of struggling, and the problem was solved.

Xing




From davila at ioc.fiocruz.br  Tue Jul 10 13:58:29 2007
From: davila at ioc.fiocruz.br (Alberto Davila)
Date: Tue, 10 Jul 2007 10:58:29 -0300
Subject: [Bioperl-l] How to download EST files via bioperl script?
In-Reply-To: <469389C0.5060303@gmail.com>
References: <4691D520.60700@gmail.com>
	<4692292A.1080900@atgc.org>	<7CC3390F-A5F5-4832-9E3E-E2B96018D31D@uiuc.edu>
	<469389C0.5060303@gmail.com>
Message-ID: <46939085.40906@ioc.fiocruz.br>

Hi Xing,

Unfortunately that did not work for me... there are 5133 T. brucei ESTs 
(http://www.ncbi.nlm.nih.gov/sites/entrez?term=txid5691[Organism:exp]&cmd=Search&db=nucest&QueryKey=8) 
and 13971 from T. cruzi 
(http://www.ncbi.nlm.nih.gov/sites/entrez?term=txid5693[Organism:exp]&cmd=Search&db=nucest&QueryKey=11) 
that I cannot download at once in GenBank format... even when I select 
"GenBank" format in the Display menu I can only see and download 500 
ESTs at a time...

I also downloaded all ESTs from GenBank (a pity there are no subsets of 
them!) but merging them all generates a file bigger than 120GB to be 
processed...

I just asked Diogo (my student) to give the script sent by Chris 
Fields a try... so fingers crossed ;-)

Cheers, Alberto




From cjfields at uiuc.edu  Tue Jul 10 14:05:43 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 10 Jul 2007 09:05:43 -0500
Subject: [Bioperl-l] How to download EST files via bioperl script?
In-Reply-To: <46939085.40906@ioc.fiocruz.br>
References: <4691D520.60700@gmail.com>
	<4692292A.1080900@atgc.org>	<7CC3390F-A5F5-4832-9E3E-E2B96018D31D@uiuc.edu>
	<469389C0.5060303@gmail.com> <46939085.40906@ioc.fiocruz.br>
Message-ID: <2D7B714D-EE7C-467A-A0D9-0F9EB12CBA99@uiuc.edu>

Just make sure you're using the latest from CVS.  Let me know if it  
doesn't work and I'll look into it.

chris

On Jul 10, 2007, at 8:58 AM, Alberto Davila wrote:

> Hi Xing,
>
> Unfortunately that did not work for me... there are 5133 T. brucei ESTs
> (http://www.ncbi.nlm.nih.gov/sites/entrez?term=txid5691[Organism:exp]&cmd=Search&db=nucest&QueryKey=8)
> and 13971 from T. cruzi
> (http://www.ncbi.nlm.nih.gov/sites/entrez?term=txid5693[Organism:exp]&cmd=Search&db=nucest&QueryKey=11)
> that I cannot download at once in GenBank format... even when I select
> "GenBank" format in the Display menu I can only see and get/download 500
> ESTs each time...
>
> I also downloaded all ESTs from GenBank (a pity there are no subsets of
> them!) but merging them all generates a file bigger than 120GB to be
> processed...
>
> Just asked Diogo (my student) to give a try to the script sent by Chris
> Fields.. so fingers crossed ;-)
>
> Cheers, Alberto
>
>
> Xing Hu wrote:
>> Thank you, guys.
>>
>> I have to confess how stupid I was. The easiest way seems to be the
>> NCBI Taxonomy Browser, as suggested by Alex. As a matter of fact, I
>> knew about it, but I thought it was necessary to have all items
>> selected before pressing save to launch the download. So I was desperate
>> to find a button that could do that without hundreds of thousands of
>> clicks from me. "What about selecting none of the items at all?" -- this
>> idea finally came to me after days of struggling, and the problem was
>> solved.
>>
>> Xing
>>
>>
>>
>> Chris Fields wrote:
>>> Caveat: if you have millions of ESTs please consider NOT using my
>>> eutil script below or NCBI Batch Entrez, which would repeatedly hit
>>> the NCBI server thousands of times.  At least try looking for other
>>> ways to retrieve the data you want (ftp, organism-specific resources
>>> like Ensembl, so on), or run any scripts or data retrieval in off
>>> hours so you don't overtax the NCBI server.
>>>
>>> There is a way you can use BioPerl if you don't mind living on the
>>> bleeding edge by using bioperl-live (core code from CVS).  I have been
>>> working on a set of modules for the last year (Bio::DB::EUtilities)
>>> which interact with all the various eutils for building data pipelines
>>> which uses the NCBI CGI interface.  You could possibly retrieve all
>>> relevant ESTs using a variation of the example script here:
>>>
>>> http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook#esearch-.3Eefetch
>>>
>>> Note that the code examples do NOT work with rel. 1.5.2 code as the
>>> API has changed quite a bit; I'm working to rectify some of that.
>>>
>>> The script I would use is below.  It retrieves batches of 500
>>> sequences (in fasta format) at a time, for a total of 10000 max seq
>>> records, saving the raw record data directly to a file (appending as
>>> you go along).  I added an eval block to check the server status and
>>> redo the call up to 4 times before giving up completely.  Using eval
>>> this way hasn't been extensively tested but should work.
>>>
>>> ---------------------------------------
>>>
>>> use Bio::DB::EUtilities;
>>>
>>> my $factory = Bio::DB::EUtilities->new(-eutil => 'esearch',
>>>                                        -db => 'nucest',
>>>                                        -term => 'txid3702',
>>>                                        -usehistory => 'y',
>>>                                        -keep_histories => 1);
>>>
>>> my $count = $factory->get_count;
>>>
>>> print "Count: $count\n";
>>>
>>> if (my $hist = $factory->next_History) {
>>>     print "History returned\n";
>>>     # note db carries over from above
>>>     $factory->set_parameters(-eutil => 'efetch',
>>>                              -rettype => 'fasta',
>>>                              -history => $hist);
>>>     my ($retmax, $retstart) = (500,0);
>>>     my $retry = 1;
>>>     # set max # of seq records to return
>>>     my $maxcount = $count < 10000 ? $count : 10000;
>>>     RETRIEVE_SEQS:
>>>     while ($retstart < $maxcount) {
>>>         print "Returning from ",$retstart+1," to ",$retstart+$retmax,"\n";
>>>         $factory->set_parameters(-retmax => $retmax,
>>>                                 -retstart => $retstart);
>>>         # check in case of server error
>>>         eval{
>>>             $factory->get_Response(-file => ">>ESTs.fas");
>>>         };
>>>         if ($@) {
>>>             die "Server error: $@.  Try again later" if $retry == 5;
>>>             print STDERR "Server error, redo #$retry\n";
>>>             $retry++ && redo RETRIEVE_SEQS;
>>>         }
>>>         $retstart += $retmax;
>>>     }
>>> }
>>>
>>>
>>> ---------------------------------------
>>>
>>>
>>> chris
>>>
>>> On Jul 9, 2007, at 7:25 AM, Alexander Kozik wrote:
>>>
>>>> To download genomic sequences or ESTs for any organism (in various
>>>> formats) you can use NCBI Taxonomy Browser:
>>>> http://www.ncbi.nlm.nih.gov/Taxonomy/
>>>>
>>>> you can use taxonomy id to access different organisms, Arabidopsis for
>>>> example (3702):
>>>> http://www.ncbi.nlm.nih.gov/sites/entrez?db=Nucleotide&cmd=Search&dopt=DocSum&term=txid3702
>>>>
>>>>
>>>> or by direct web link:
>>>> http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Arabidopsis+thaliana&lvl=0&srchmode=1
>>>>
>>>>
>>>> assembled genomes can be accessed via ftp:
>>>> ftp://ftp.ncbi.nih.gov/genomes/
>>>>
>>>> To download large amount of selected sequences (ESTs for example) you
>>>> can use batch Entrez:
>>>> http://www.ncbi.nlm.nih.gov/entrez/query/static/advancedentrez.html
>>>> http://www.ncbi.nlm.nih.gov/entrez/batchentrez.cgi?db=Nucleotide
>>>> (select EST for EST, it's critical)
>>>>
>>>> It seems, to solve the problem you describe, you don't need to use
>>>> bioperl. NCBI GenBank Entrez provides all necessary tools to work on
>>>> these simple and frequent tasks.
>>>>
>>>> -Alex
>>>>
>>>> --Alexander Kozik
>>>> Bioinformatics Specialist
>>>> Genome and Biomedical Sciences Facility
>>>> 451 East Health Sciences Drive
>>>> University of California
>>>> Davis, CA 95616-8816
>>>> Phone: (530) 754-9127
>>>> email#1: akozik at atgc.org
>>>> email#2: akozik at gmail.com
>>>> web: http://www.atgc.org/
>>>>
>>>>
>>>>
>>>> Xing Hu wrote:
>>>>> Hi friends,
>>>>>
>>>>>     I wrote a script to fetch genomic sequence files from GenBank. For
>>>>> that I used the Bio::DB::GenBank module to get the sequence via
>>>>> get_Seq_by_acc, and it works well. But this time, facing an enormous
>>>>> number of ESTs, I have no idea how to download them swiftly and elegantly.
>>>>>
>>>>>     PROBLEM DESCRIPTION:
>>>>>     goal: download all EST files of a specific species from GenBank,
>>>>> say Arabidopsis thaliana or Oryza sativa (rice).
>>>>>     other: whether all of the ESTs are in a single file or stored
>>>>> separately does not matter.
>>>>>
>>>>>     Can I use a bioperl script to achieve that? And how? I would really
>>>>> appreciate any help.
>>>>>
>>>>> Xing.
>>>>>
>>> Christopher Fields
>>> Postdoctoral Researcher
>>> Lab of Dr. Robert Switzer
>>> Dept of Biochemistry
>>> University of Illinois Urbana-Champaign
>>>
>>>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From diogoat at gmail.com  Tue Jul 10 14:15:20 2007
From: diogoat at gmail.com (Diogo Tschoeke)
Date: Tue, 10 Jul 2007 11:15:20 -0300
Subject: [Bioperl-l] How to download EST files via bioperl script?
In-Reply-To: <2D7B714D-EE7C-467A-A0D9-0F9EB12CBA99@uiuc.edu>
References: <4691D520.60700@gmail.com> <4692292A.1080900@atgc.org>
	<7CC3390F-A5F5-4832-9E3E-E2B96018D31D@uiuc.edu>
	<469389C0.5060303@gmail.com> <46939085.40906@ioc.fiocruz.br>
	<2D7B714D-EE7C-467A-A0D9-0F9EB12CBA99@uiuc.edu>
Message-ID: <638512560707100715y4a692566n24e438322c8d919d@mail.gmail.com>

Dear All,
I used the script below, and it works very well!
I only changed the query, and the script gave me the 5133 ESTs from
T. brucei.

#################################################################################
use strict;
use warnings;
use Bio::Seq;
use Bio::SeqIO;
use Bio::DB::GenBank;
use Bio::DB::Query::GenBank;   # load the query class explicitly

# Query GenBank for all T. brucei records in the EST division
my $query = Bio::DB::Query::GenBank->new(
    -query => 'gbdiv est[prop] AND Trypanosoma brucei [organism]',
    -db    => 'nucleotide',
);
my $gb    = Bio::DB::GenBank->new;
my $seqio = $gb->get_Stream_by_query($query);

# Records are appended in GenBank format (despite the .fasta file name)
my $out = Bio::SeqIO->new(-format => 'Genbank',
                          -file   => '>>Tbrucei.EST.fasta');
while (my $seq = $seqio->next_seq) {
    $out->write_seq($seq);
}
####################################################################

Diogo Tschoeke/Fiocruz (Alberto`s Student)


From cjfields at uiuc.edu  Tue Jul 10 14:35:03 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 10 Jul 2007 09:35:03 -0500
Subject: [Bioperl-l] How to download EST files via bioperl script?
In-Reply-To: <638512560707100715y4a692566n24e438322c8d919d@mail.gmail.com>
References: <4691D520.60700@gmail.com> <4692292A.1080900@atgc.org>
	<7CC3390F-A5F5-4832-9E3E-E2B96018D31D@uiuc.edu>
	<469389C0.5060303@gmail.com> <46939085.40906@ioc.fiocruz.br>
	<2D7B714D-EE7C-467A-A0D9-0F9EB12CBA99@uiuc.edu>
	<638512560707100715y4a692566n24e438322c8d919d@mail.gmail.com>
Message-ID: <4D704A90-A88A-44A3-B514-E5031CBF288C@uiuc.edu>

That will work as well; the key difference between my example and
this one is that the seq stream retrieved using Bio::DB::GenBank
passes through Bio::SeqIO, while Bio::DB::EUtilities saves the raw seq
record directly to a file (or callback or HTTP::Response) for
optional parsing later.

If you have problems with Bio::SeqIO you can always use  
Bio::DB::EUtilities to get around the issue until we resolve it.
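
The batch-and-retry pattern used by the EUtilities script quoted earlier in this thread can be sketched independently of BioPerl. This is only an illustration under stated assumptions: fetch_in_batches and its fetch callback are hypothetical names, not part of Bio::DB::EUtilities, and the callback is assumed to die on a server error.

```perl
use strict;
use warnings;

# Fetch records in fixed-size batches, retrying a failed batch a bounded
# number of times. $arg{fetch} is a hypothetical callback that takes
# ($retstart, $retmax) and dies on a server error.
sub fetch_in_batches {
    my (%arg) = @_;
    my $fetch     = $arg{fetch};
    my $total     = $arg{total};
    my $retmax    = $arg{retmax}    || 500;
    my $max_tries = $arg{max_tries} || 5;

    my @done;    # start offsets of the batches that succeeded
    for (my $retstart = 0; $retstart < $total; $retstart += $retmax) {
        my $tries = 0;    # reset for every batch
        while (1) {
            $tries++;
            last if eval { $fetch->($retstart, $retmax); 1 };
            die "Batch at offset $retstart failed after $tries tries: $@"
                if $tries >= $max_tries;
            warn "Server error, retry #$tries for offset $retstart\n";
        }
        push @done, $retstart;
    }
    return @done;
}

# Usage with a stand-in fetcher that fails once on the second batch.
my %failed_once;
my @offsets = fetch_in_batches(
    total  => 1200,
    retmax => 500,
    fetch  => sub {
        my ($start, $size) = @_;
        die "simulated timeout\n" if $start == 500 && !$failed_once{$start}++;
        return 1;
    },
);
print "Fetched batches starting at: @offsets\n";
```

Unlike the quoted script, where one retry counter is shared by the whole download, this sketch resets the counter for each batch, so an early hiccup does not use up the retries available to later batches.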

chris

On Jul 10, 2007, at 9:15 AM, Diogo Tschoeke wrote:

> Deal All,
> I use this script bellow, and it`s work very fine!
> I only changed the query! And the script gave me the 5133 EST from T.
> brucei.
>
> #################################################################################
> use strict;
> use warnings;
> use Bio::Seq;
> use Bio::SeqIO;
> use Bio::DB::GenBank;
>
> my $query = Bio::DB::Query::GenBank->new
>                                 (-query   =>'gbdiv est[prop] AND Trypanosoma brucei [organism]',
>                                 db      => 'nucleotide');
> my $gb = new Bio::DB::GenBank;
> my $seqio = $gb->get_Stream_by_query($query);
>
> my $out = Bio::SeqIO->new(-format => 'Genbank',
>                           -file => '>>Tbrucei.EST.fasta');
> while (my $seq = $seqio->next_seq){
>          $out->write_seq($seq);
>                         }
> ####################################################################
>
> Diogo Tschoeke/Fiocruz (Alberto`s Student)

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From hartzell at alerce.com  Tue Jul 10 16:50:31 2007
From: hartzell at alerce.com (George Hartzell)
Date: Tue, 10 Jul 2007 12:50:31 -0400
Subject: [Bioperl-l] Take 2 of the new subversion repository.
In-Reply-To: <D8C71EF7-6E2E-498E-8638-373512ADE3EE@bioperl.org>
References: <18054.63942.316904.413911@almost.alerce.com>
	<D8C71EF7-6E2E-498E-8638-373512ADE3EE@bioperl.org>
Message-ID: <18067.47319.254632.538811@almost.alerce.com>

Jason Stajich writes:
 > [...]
 > Do you know how to have svn commit messages generate summary emails  
 > as well?

I've made a local installation of the SVN::Notify bits in my home
directory and set up its notification script.  If folks are happy with
it then I'll work on getting The Powers That Be to do a real install
and we'll use it for the real repository.

It's currently configured to include diffs inline in the message.  I
prefer them as an attachment, but the current configuration of the
bioperl-guts-l list stalls messages w/ attachments and requires admin
intervention.  I have a support@ request going on it and will change
it if/when we get the issue resolved.

So, to review:

   svn+ssh://dev.open-bio.org/home/hartzell/bioperl_take2/

is the top of the repository and

   svn co svn+ssh://dev.open-bio.org/home/hartzell/bioperl_take2/bioperl-live/trunk 

will get you the main branch of bioperl-live.

Remember that the repository is transient, don't put anything
important in there....

Have at it, but remember that the entire world will see your commit
messages.

g.


From xing.y.hu at gmail.com  Tue Jul 10 17:08:35 2007
From: xing.y.hu at gmail.com (Xing Hu)
Date: Wed, 11 Jul 2007 01:08:35 +0800
Subject: [Bioperl-l] How to download EST files via bioperl script?
In-Reply-To: <46939085.40906@ioc.fiocruz.br>
References: <4691D520.60700@gmail.com>	<4692292A.1080900@atgc.org>	<7CC3390F-A5F5-4832-9E3E-E2B96018D31D@uiuc.edu>	<469389C0.5060303@gmail.com>
	<46939085.40906@ioc.fiocruz.br>
Message-ID: <4693BD13.2070509@gmail.com>

Hi Alberto,

Yes, I know that the NCBI website offers no option to show more than 500
entries at a time. However, I simply ignored that (which doesn't mean I
had not seen it), pulled down the "Send to" menu and chose "File". A
small window popped up and, after I said yes to it, the download
started. You might ask how I know that it was not a batch of only 5
(the default selection) or 500 ESTs? To be honest, I didn't know at
first. But the download has grown to millions of bytes since then (due
to my bad network connection, I have no idea when it will finish), and
that doesn't look like a small batch of fewer than one thousand ESTs.
Actually, I wrote a script to count the sequences within the temporary
file and got a number much bigger than ten thousand. So I guess it
works.
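
A record count like the one described above can be sketched in plain Perl. This is an illustration, not the script actually used: count_records is a hypothetical helper, and it assumes the download is either FASTA (records start with '>') or GenBank flat-file format (records start with a LOCUS line).

```perl
use strict;
use warnings;

# Count sequence records on a filehandle. FASTA records start with '>',
# GenBank flat-file records start with a LOCUS line.
sub count_records {
    my ($fh, $format) = @_;
    my $marker = $format eq 'fasta'   ? qr/^>/
               : $format eq 'genbank' ? qr/^LOCUS\s/
               : die "Unsupported format: $format\n";
    my $n = 0;
    while (my $line = <$fh>) {
        $n++ if $line =~ $marker;
    }
    return $n;
}

# Usage on an in-memory sample; for a real download, open the file instead.
my $sample = ">est1\nACGT\n>est2\nGGCC\nTTAA\n";
open my $fh, '<', \$sample or die "Cannot open sample: $!";
print count_records($fh, 'fasta'), "\n";    # prints 2
close $fh;
```

Reading line by line keeps memory use flat, which matters for a partially downloaded file that may already hold many thousands of records.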

BTW, I never thought Bio::DB::GenBank could do that! Again, thank you guys!

Xing


Alberto Davila wrote:
> Hi Xing,
>
> Unfortunately that did not work for me... there are 5133 T. brucei ESTs
> (http://www.ncbi.nlm.nih.gov/sites/entrez?term=txid5691[Organism:exp]&cmd=Search&db=nucest&QueryKey=8)
> and 13971 from T. cruzi
> (http://www.ncbi.nlm.nih.gov/sites/entrez?term=txid5693[Organism:exp]&cmd=Search&db=nucest&QueryKey=11)
> that I cannot download at once in GenBank format... even when I select
> "GenBank" format in the Display menu I can only see and get/download 500
> ESTs each time...
>
> I also downloaded all ESTs from GenBank (a pity there are no subsets of
> them!) but merging them all generates a file bigger than 120GB to be
> processed...
>
> Just asked Diogo (my student) to give a try to the script sent by Chris
> Fields.. so fingers crossed ;-)
>
> Cheers, Alberto
>
>
> Xing Hu wrote:
>   
>> Thank you, guys.
>>
>> I have to confess how stupid I was. The easiest way seems to be the
>> NCBI Taxonomy Browser, as suggested by Alex. As a matter of fact, I
>> knew about it, but I thought it was necessary to have all items
>> selected before pressing save to launch the download. So I was desperate
>> to find a button that could do that without hundreds of thousands of
>> clicks from me. "What about selecting none of the items at all?" -- this
>> idea finally came to me after days of struggling, and the problem was
>> solved.
>>
>> Xing
>>
>>
>>
>> Chris Fields wrote:
>>     
>>> Caveat: if you have millions of ESTs please consider NOT using my 
>>> eutil script below or NCBI Batch Entrez, which would repeatedly hit 
>>> the NCBI server thousands of times.  At least try looking for other 
>>> ways to retrieve the data you want (ftp, organism-specific resources 
>>> like Ensembl, so on), or run any scripts or data retrieval in off 
>>> hours so you don't overtax the NCBI server.
>>>
>>> There is a way you can use BioPerl if you don't mind living on the 
>>> bleeding edge by using bioperl-live (core code from CVS).  I have been 
>>> working on a set of modules for the last year (Bio::DB::EUtilities) 
>>> which interact with all the various eutils for building data pipelines 
>>> which uses the NCBI CGI interface.  You could possibly retrieve all 
>>> relevant ESTs using a variation of the example script here:
>>>
>>> http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook#esearch-.3Eefetch
>>>
>>> Note that the code examples do NOT work with rel. 1.5.2 code as the 
>>> API has changed quite a bit; I'm working to rectify some of that.
>>>
>>> The script I would use is below.  It retrieves batches of 500 
>>> sequences (in fasta format) at a time, for a total of 10000 max seq 
>>> records, saving the raw record data directly to a file (appending as 
>>> you go along).  I added an eval block to check the server status and 
>>> redo the call up to 4 times before giving up completely.  Using eval 
>>> this way hasn't been extensively tested but should work.
>>>
>>> ---------------------------------------
>>>
>>> use Bio::DB::EUtilities;
>>>
>>> my $factory = Bio::DB::EUtilities->new(-eutil => 'esearch',
>>>                                        -db => 'nucest',
>>>                                        -term => 'txid3702',
>>>                                        -usehistory => 'y',
>>>                                        -keep_histories => 1);
>>>
>>> my $count = $factory->get_count;
>>>
>>> print "Count: $count\n";
>>>
>>> if (my $hist = $factory->next_History) {
>>>     print "History returned\n";
>>>     # note db carries over from above
>>>     $factory->set_parameters(-eutil => 'efetch',
>>>                              -rettype => 'fasta',
>>>                              -history => $hist);
>>>     my ($retmax, $retstart) = (500,0);
>>>     my $retry = 1;
>>>     # set max # of seq records to return
>>>     my $maxcount = $count < 10000 ? $count : 10000;
>>>     RETRIEVE_SEQS:
>>>     while ($retstart < $maxcount) {
>>>         print "Returning from ",$retstart+1," to ",$retstart+$retmax,"\n";
>>>         $factory->set_parameters(-retmax => $retmax,
>>>                                 -retstart => $retstart);
>>>         # check in case of server error
>>>         eval{
>>>             $factory->get_Response(-file => ">>ESTs.fas");
>>>         };
>>>         if ($@) {
>>>             die "Server error: $@.  Try again later" if $retry == 5;
>>>             print STDERR "Server error, redo #$retry\n";
>>>             $retry++ && redo RETRIEVE_SEQS;
>>>         }
>>>         $retstart += $retmax;
>>>     }
>>> }
>>>
>>>
>>> ---------------------------------------
>>>
>>>
>>> chris
>>>
>>> On Jul 9, 2007, at 7:25 AM, Alexander Kozik wrote:
>>>
>>>       
>>>> To download genomic sequences or ESTs for any organism (in various
>>>> formats) you can use NCBI Taxonomy Browser:
>>>> http://www.ncbi.nlm.nih.gov/Taxonomy/
>>>>
>>>> you can use taxonomy id to access different organisms, Arabidopsis for
>>>> example (3702):
>>>> http://www.ncbi.nlm.nih.gov/sites/entrez?db=Nucleotide&cmd=Search&dopt=DocSum&term=txid3702 
>>>>
>>>>
>>>> or by direct web link:
>>>> http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&name=Arabidopsis+thaliana&lvl=0&srchmode=1 
>>>>
>>>>
>>>> assembled genomes can be accessed via ftp:
>>>> ftp://ftp.ncbi.nih.gov/genomes/
>>>>
>>>> To download large amount of selected sequences (ESTs for example) you
>>>> can use batch Entrez:
>>>> http://www.ncbi.nlm.nih.gov/entrez/query/static/advancedentrez.html
>>>> http://www.ncbi.nlm.nih.gov/entrez/batchentrez.cgi?db=Nucleotide
>>>> (select EST for EST, it's critical)
>>>>
>>>> It seems, to solve the problem you describe, you don't need to use
>>>> bioperl. NCBI GenBank Entrez provides all necessary tools to work on
>>>> these simple and frequent tasks.
>>>>
>>>> -Alex
>>>>
>>>> --Alexander Kozik
>>>> Bioinformatics Specialist
>>>> Genome and Biomedical Sciences Facility
>>>> 451 East Health Sciences Drive
>>>> University of California
>>>> Davis, CA 95616-8816
>>>> Phone: (530) 754-9127
>>>> email#1: akozik at atgc.org
>>>> email#2: akozik at gmail.com
>>>> web: http://www.atgc.org/
>>>>
>>>>
>>>>
>>>> Xing Hu wrote:
>>>>         
>>>>> Hi friends,
>>>>>
>>>>>     I wrote a script to fetch genomic sequence files from GenBank. For
>>>>> that I used the Bio::DB::GenBank module to get the sequence via
>>>>> get_Seq_by_acc, and it works well. But this time, facing an enormous
>>>>> number of ESTs, I have no idea how to download them swiftly and elegantly.
>>>>>
>>>>>     PROBLEM DESCRIPTION:
>>>>>     goal: download all EST files of a specific species from GenBank,
>>>>> say Arabidopsis thaliana or Oryza sativa (rice).
>>>>>     other: whether all of the ESTs are in a single file or stored
>>>>> separately does not matter.
>>>>>
>>>>>     Can I use a bioperl script to achieve that? And how? I would really
>>>>> appreciate any help.
>>>>>
>>>>> Xing.
>>>>>
>>> Christopher Fields
>>> Postdoctoral Researcher
>>> Lab of Dr. Robert Switzer
>>> Dept of Biochemistry
>>> University of Illinois Urbana-Champaign
>>>
>>>
>>>       
>
>   


From bix at sendu.me.uk  Tue Jul 10 17:14:29 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 10 Jul 2007 18:14:29 +0100
Subject: [Bioperl-l] Take 2 of the new subversion repository.
In-Reply-To: <18067.47319.254632.538811@almost.alerce.com>
References: <18054.63942.316904.413911@almost.alerce.com>	<D8C71EF7-6E2E-498E-8638-373512ADE3EE@bioperl.org>
	<18067.47319.254632.538811@almost.alerce.com>
Message-ID: <4693BE75.4090005@sendu.me.uk>

George Hartzell wrote:
> Jason Stajich writes:
>  > [...]
>  > Do you know how to have svn commit messages generate summary emails  
>  > as well?
> 
> I've made a local installation of the SVN::Notify bits in my home
> directory and set up its notification script.  If folks are happy with
> it then I'll work on getting The Powers That Be to do a real install
> and we'll use it for the real repository.
> 
> It's currently configured to include diffs inline in the message.  I
> prefer them as an attachment, but the current configuration of the
> bioperl-guts-l list stalls messages w/ attachments and requires admin
> intervention.  I have a support@ request going on it and will change
> it if/when we get the issue resolved.

Can I put a vote in that you don't? I search through email body text in 
my archive of guts to find certain diffs, so really like the diffs inline.

Also, is there any way to get rid of the 'bioperl' in [bioperl revision] 
in the subject? Seems redundant and makes it harder to see what was 
changed in a small email client window.


From aaron.j.mackey at gsk.com  Tue Jul 10 17:20:15 2007
From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com)
Date: Tue, 10 Jul 2007 13:20:15 -0400
Subject: [Bioperl-l] Take 2 of the new subversion repository.
In-Reply-To: <18067.47319.254632.538811@almost.alerce.com>
Message-ID: <OF37443F52.13AE1143-ON85257314.005D5FF0-85257314.005F432E@gsk.com>

George, this is all very nice to finally have, thank you for your efforts!

Any chance that the diff-as-attachment vs. diffs-inline question can be 
different for each subscriber?  The utility of the "guts" mailing list (to 
me) is that it's an encyclopedia of browsable, skimmable, and searchable 
diffs, not just a date-stamped record of diffs (if it were only that, why
provide an attachment at all? Just provide a URL to the diff in the
repository).

Thanks again,

-Aaron


bioperl-l-bounces at lists.open-bio.org wrote on 07/10/2007 12:50:31 PM:

> Jason Stajich writes:
>  > [...]
>  > Do you know how to have svn commit messages generate summary emails 
>  > as well?
> 
> I've made a local installation of the SVN::Notify bits in my home
> directory and set up its notification script.  If folks are happy with
> it then I'll work on getting The Powers That Be to do a real install
> and we'll use it for the real repository.
> 
> It's currently configured to include diffs inline in the message.  I
> prefer them as an attachment, but the current configuration of the
> bioperl-guts-l list stalls messages w/ attachments and requires admin
> intervention.  I have a support@ request going on it and will change
> it if/when we get the issue resolved.
> 
> So, to review:
> 
>    svn+ssh://dev.open-bio.org/home/hartzell/bioperl_take2/
> 
> is the top of the repository and
> 
>    svn co svn+ssh://dev.open-bio.org/home/hartzell/bioperl_take2/bioperl-live/trunk
> 
> will get you the main branch of bioperl-live.
> 
> Remember that the repository is transient, don't put anything
> important in there....
> 
> Have at it, but remember that the entire world will see your commit
> messages.
> 
> g.
> 


From cjfields at uiuc.edu  Tue Jul 10 18:18:07 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 10 Jul 2007 13:18:07 -0500
Subject: [Bioperl-l] Take 2 of the new subversion repository.
In-Reply-To: <4693BE75.4090005@sendu.me.uk>
References: <18054.63942.316904.413911@almost.alerce.com>	<D8C71EF7-6E2E-498E-8638-373512ADE3EE@bioperl.org>
	<18067.47319.254632.538811@almost.alerce.com>
	<4693BE75.4090005@sendu.me.uk>
Message-ID: <C022268F-4632-4027-ADBE-F842B3308E21@uiuc.edu>


On Jul 10, 2007, at 12:14 PM, Sendu Bala wrote:

> George Hartzell wrote:
>> Jason Stajich writes:
>>> [...]
>>> Do you know how to have svn commit messages generate summary emails
>>> as well?
>>
>> I've made a local installation of the SVN::Notify bits in my home
>> directory and set up its notification script.  If folks are happy with
>> it then I'll work on getting The Powers That Be to do a real install
>> and we'll use it for the real repository.
>>
>> It's currently configured to include diffs inline in the message.  I
>> prefer them as an attachment, but the current configuration of the
>> bioperl-guts-l list stalls messages w/ attachments and requires admin
>> intervention.  I have a support@ request going on it and will change
>> it if/when we get the issue resolved.
>
> Can I put a vote in that you don't? I search through email body text in
> my archive of guts to find certain diffs, so really like the diffs inline.
>
> Also, is there any way to get rid of the 'bioperl' in [bioperl revision]
> in the subject? Seems redundant and makes it harder to see what was
> changed in a small email client window.

Agree on both counts; the devs have gotten used to seeing the diffs  
inline.

We prob. need to schedule a specific day/time when the switchover  
would take place so we can announce (so everyone knows and no one can  
gripe).  Did we ever resolve the svn->cvs issue?  Jason pointed out  
some tools a while ago...

chris


From hartzell at alerce.com  Tue Jul 10 20:09:09 2007
From: hartzell at alerce.com (George Hartzell)
Date: Tue, 10 Jul 2007 16:09:09 -0400
Subject: [Bioperl-l] Take 2 of the new subversion repository.
In-Reply-To: <4693BE75.4090005@sendu.me.uk>
References: <18054.63942.316904.413911@almost.alerce.com>
	<D8C71EF7-6E2E-498E-8638-373512ADE3EE@bioperl.org>
	<18067.47319.254632.538811@almost.alerce.com>
	<4693BE75.4090005@sendu.me.uk>
Message-ID: <18067.59237.519166.454578@almost.alerce.com>

Sendu Bala writes:
 > George Hartzell wrote:
 > > Jason Stajich writes:
 > >  > [...]
 > >  > Do you know how to have svn commit messages generate summary emails  
 > >  > as well?
 > > 
 > > I've made a local installation of the SVN::Notify bits in my home
 > > directory and set up its notification script.  If folks are happy with
 > > it then I'll work on getting The Powers That Be to do a real install
 > > and we'll use it for the real repository.
 > > 
 > > It's currently configured to include diffs inline in the message.  I
 > > prefer them as an attachment, but the current configuration of the
 > > bioperl-guts-l list stalls messages w/ attachments and requires admin
 > > intervention.  I have a support@ request going on it and will change
 > > it if/when we get the issue resolved.
 > 
 > Can I put a vote in that you don't? I search through email body text in 
 > my archive of guts to find certain diffs, so really like the diffs inline.

Ok, three votes against attachments.  Anyone want to vote in support?
Otherwise I'll just leave 'em inline.

 > Also, is there any way to get rid of the 'bioperl' in [bioperl revision] 
 > in the subject? Seems redundant and makes it harder to see what was 
 > changed in a small email client window.

Sure.  The default's just [RevisionNumber].  Does that work for folk?

g.


From hartzell at alerce.com  Tue Jul 10 20:11:36 2007
From: hartzell at alerce.com (George Hartzell)
Date: Tue, 10 Jul 2007 16:11:36 -0400
Subject: [Bioperl-l] Take 2 of the new subversion repository.
In-Reply-To: <C022268F-4632-4027-ADBE-F842B3308E21@uiuc.edu>
References: <18054.63942.316904.413911@almost.alerce.com>
	<D8C71EF7-6E2E-498E-8638-373512ADE3EE@bioperl.org>
	<18067.47319.254632.538811@almost.alerce.com>
	<4693BE75.4090005@sendu.me.uk>
	<C022268F-4632-4027-ADBE-F842B3308E21@uiuc.edu>
Message-ID: <18067.59384.247108.463648@almost.alerce.com>

Chris Fields writes:
 > [...]
 > We prob. need to schedule a specific day/time when the switchover  
 > would take place so we can announce (so everyone knows and no one can  
 > gripe).  Did we ever resolve the svn->cvs issue?  Jason pointed out  
 > some tools a while ago...

I haven't done anything about it.

I think that we also need to have some input from the admin/support
folk about access methods (https, etc...).

Are we going to want to mirror the repository anywhere?

g.


From hartzell at alerce.com  Wed Jul 11 13:17:08 2007
From: hartzell at alerce.com (George Hartzell)
Date: Wed, 11 Jul 2007 09:17:08 -0400
Subject: [Bioperl-l] extra hook functionality for svn repos?
Message-ID: <18068.55380.626778.486775@almost.alerce.com>


There are a bunch of "contributed" hook scripts at

  http://subversion.tigris.org/tools_contrib.html#hook_scripts

Given that many bioperl users depend on case-preserving but
case-insensitive file systems, I'm wondering if hooking up the
case-insensitive.py script might be worthwhile.
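
To make the concern concrete, here is a rough sketch of the kind of check such a hook performs. It is not the contributed case-insensitive.py script; case_collisions is a hypothetical helper, and the FASTA.pm path is made up for the example.

```perl
use strict;
use warnings;

# Return groups of distinct paths that differ only by case, i.e. paths
# that would collide on a case-insensitive file system.
sub case_collisions {
    my %seen;
    my @paths = grep { !$seen{$_}++ } @_;    # ignore exact duplicates
    my %by_folded;
    push @{ $by_folded{ lc $_ } }, $_ for @paths;
    return map  { $by_folded{$_} }
           grep { @{ $by_folded{$_} } > 1 }
           sort keys %by_folded;
}

# Usage: the second path is hypothetical and clashes with the first.
my @clashes = case_collisions(qw(
    Bio/SeqIO/fasta.pm
    Bio/SeqIO/FASTA.pm
    Bio/AlignIO/clustalw.pm
));
print join(' <-> ', @$_), "\n" for @clashes;
```

A real pre-commit hook would compare the paths added in a transaction against what is already in the repository; this sketch only shows the comparison step.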

Likewise, the check-mime-type.pl script might help us keep
svn:mime-type and svn:eol-style properties up to date.

There are others there, but none that I found interesting.

How big-brother do we want the repository to be?

g.


From cjfields at uiuc.edu  Wed Jul 11 13:40:54 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 11 Jul 2007 08:40:54 -0500
Subject: [Bioperl-l] extra hook functionality for svn repos?
In-Reply-To: <18068.55380.626778.486775@almost.alerce.com>
References: <18068.55380.626778.486775@almost.alerce.com>
Message-ID: <A13F608F-16FA-4432-AA2F-83674E3A73F4@uiuc.edu>


On Jul 11, 2007, at 8:17 AM, George Hartzell wrote:

>
> There are a bunch of "contributed" hook scripts at
>
>   http://subversion.tigris.org/tools_contrib.html#hook_scripts
>
> Given that many bioperl users depend on case-preserving but
> case-insensitive file systems, I'm wondering if hooking up the
> case-insensitive.py script might be worthwhile.

I'm not sure how often we run into this, though.  Anyone know?

> Likewise, the check-mime-type.pl script might help us keep
> svn:mime-type and svn:eol-style properties up to date.

The latter two might be nice.  I thought we planned on defaulting to  
a simple 'plain text' mime type on commits if it isn't specifically  
predefined, but maybe this way is better?

> There are others there, but none that I found interesting.
>
> How big-brother do we want the repository to be?
>
> g.

'Friendly' big-brother, not 'dystopian' big-brother.

chris


From marian.thieme at lycos.de  Wed Jul 11 09:05:18 2007
From: marian.thieme at lycos.de (Marian Thieme)
Date: Wed, 11 Jul 2007 09:05:18 +0000
Subject: [Bioperl-l] submitting code
Message-ID: <188661178019848@lycos-europe.com>

An HTML attachment was scrubbed...
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070711/eec1aa42/attachment-0004.html>

From dmessina at wustl.edu  Wed Jul 11 20:14:17 2007
From: dmessina at wustl.edu (David Messina)
Date: Wed, 11 Jul 2007 15:14:17 -0500
Subject: [Bioperl-l] submitting code
In-Reply-To: <188661178019848@lycos-europe.com>
References: <188661178019848@lycos-europe.com>
Message-ID: <4DF90B9A-7FFA-4867-B5D3-E6F05EC84BBC@wustl.edu>

Hi Marian,

Thanks so much for contributing! The best way would be to create a  
Bugzilla ticket and then attach the code to that ticket. One of the  
developers will check it in and give you feedback if there are any  
little tweaks that would be helpful*.

Would you be able to include documentation and test cases with your  
module?

Dave


* For more info:
http://www.bioperl.org/wiki/FAQ#I.27ve_got_an_idea_for_a_module_how_do_I_contribute_it.3F
http://www.bioperl.org/wiki/Developer_Information
http://www.bioperl.org/wiki/Becoming_a_developer
http://bioperl.org/pipermail/bioperl-l/2003-February/011226.html


--
Dave Messina
Senior Analyst, Assembly Group
Genome Sequencing Center
Washington University
St. Louis, MO


From marian.thieme at lycos.de  Wed Jul 11 15:12:20 2007
From: marian.thieme at lycos.de (Marian Thieme)
Date: Wed, 11 Jul 2007 15:12:20 +0000
Subject: [Bioperl-l] submitting code
Message-ID: <188661178030343@lycos-europe.com>

An HTML attachment was scrubbed...
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070711/c95991b8/attachment-0004.html>

From e-just at northwestern.edu  Thu Jul 12 14:37:03 2007
From: e-just at northwestern.edu (Eric Just)
Date: Thu, 12 Jul 2007 09:37:03 -0500
Subject: [Bioperl-l] Job opening in Chicago
Message-ID: <fa1fe35c0707120737i71c6c26fq7635e350da9bf23f@mail.gmail.com>

Hello everyone,

We have an opening at dictyBase (Northwestern University in Chicago)
for a Bioinformatics Software Engineer.  This job involves writing and
maintaining software for a genome database using Chado/OO-Perl/Bioperl
and many other state of the art technologies.

For more information please see:
http://dictybase.org/dictybase_jobs.htm

Thanks,
Eric


From cjfields at uiuc.edu  Thu Jul 12 16:09:02 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 12 Jul 2007 11:09:02 -0500
Subject: [Bioperl-l] DB::SeqFeature::Store::GFF3Loader question
Message-ID: <A8310D54-F800-43BE-B6C3-3879206CE697@uiuc.edu>

I have been running into some GFF formatting issues where the
attributes column is missing entirely (undef, not even a '.'), which
causes GFF3Loader::parse_attributes() to complain with a 'use of
undefined string with split' warning.  Would it be okay with the powers
that be (Scott, Lincoln) to add a warning or exception there?  I'm
guessing a warning is better in this case, as just returning works fine.
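
A minimal sketch of the kind of guard meant here, in plain Perl. The
function name parse_attributes_sketch and its return shape (a list of
[tag, value] pairs) are assumptions for illustration, not the actual
GFF3Loader code:

```perl
use strict;
use warnings;

# Hypothetical stand-in for a GFF3 column-9 parser; not the GFF3Loader code.
sub parse_attributes_sketch {
    my ($att) = @_;

    # A missing column gets a warning and an empty result instead of
    # letting split() trip over an undefined value.
    unless (defined $att && length $att) {
        warn "GFF3 line has no attributes column; treating it as empty\n";
        return ();
    }

    # '.' is the GFF3 placeholder for "no attributes".
    return () if $att eq '.';

    my @pairs;
    for my $pair (split /;/, $att) {
        my ($tag, $value) = split /=/, $pair, 2;
        next unless defined $tag && length $tag;
        $value = '' unless defined $value;

        # In GFF3 an unescaped comma separates multiple values.
        for my $v (split /,/, $value) {
            $v =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;    # undo URL escaping
            push @pairs, [ $tag, $v ];
        }
    }
    return @pairs;
}
```

With this shape the caller sees an empty attribute list for a malformed
line and one warning on STDERR, which matches the "warn and just return"
option rather than the exception.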

chris


From jason at bioperl.org  Fri Jul 13 17:30:05 2007
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 13 Jul 2007 13:30:05 -0400
Subject: [Bioperl-l] Take 2 of the new subversion repository.
In-Reply-To: <18067.59384.247108.463648@almost.alerce.com>
References: <18054.63942.316904.413911@almost.alerce.com>
	<D8C71EF7-6E2E-498E-8638-373512ADE3EE@bioperl.org>
	<18067.47319.254632.538811@almost.alerce.com>
	<4693BE75.4090005@sendu.me.uk>
	<C022268F-4632-4027-ADBE-F842B3308E21@uiuc.edu>
	<18067.59384.247108.463648@almost.alerce.com>
Message-ID: <58D2D775-F7AB-4E57-AC4E-861AA2DC4AEA@bioperl.org>

I'll try and look into this and other stuff with the migration in the
next week or so - maybe we'll make some time to talk it through
during BOSC.  I don't know yet when I'll actually have time to think
about it properly.

I am still worried about doing https because of the current system we
have supporting user logins, because we didn't want to run a web
server on the main repository machine, and because we'd have to
install DAV on the main repository machine.  If svn+ssh is going to be
enough of a hurdle for people (note it was already a hurdle for them
with CVS), we'll have to think a bit more on it.

We might be able to do some sort of NFS (or other exported FS) export
to the webserver machine, but that may be a recipe for disaster.

-jason
On Jul 10, 2007, at 4:11 PM, George Hartzell wrote:

> Chris Fields writes:
>> [...]
>> We prob. need to schedule a specific day/time when the switchover
>> would take place so we can announce (so everyone knows and no one can
>> gripe).  Did we ever resolve the svn->cvs issue?  Jason pointed out
>> some tools a while ago...
>
> I haven't done anything about it.
>
> I think that we also need to have some input from the admin/support
> folk about access methods (https, etc...).
>
> Are we going to want to mirror the repository anywhere?
>
> g.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From cjfields at uiuc.edu  Fri Jul 13 18:29:22 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 13 Jul 2007 13:29:22 -0500
Subject: [Bioperl-l] Take 2 of the new subversion repository.
In-Reply-To: <58D2D775-F7AB-4E57-AC4E-861AA2DC4AEA@bioperl.org>
References: <18054.63942.316904.413911@almost.alerce.com>
	<D8C71EF7-6E2E-498E-8638-373512ADE3EE@bioperl.org>
	<18067.47319.254632.538811@almost.alerce.com>
	<4693BE75.4090005@sendu.me.uk>
	<C022268F-4632-4027-ADBE-F842B3308E21@uiuc.edu>
	<18067.59384.247108.463648@almost.alerce.com>
	<58D2D775-F7AB-4E57-AC4E-861AA2DC4AEA@bioperl.org>
Message-ID: <5F5EB9B6-11AF-4D20-95B1-EBBD40A98962@uiuc.edu>

I don't think there's a huge rush on this since BOSC is imminent. If  
devs really want https then we can try adding it after migration, but  
if it becomes too much of a headache (particularly for the web  
admins) I wouldn't worry about it.

chris

On Jul 13, 2007, at 12:30 PM, Jason Stajich wrote:

> I'll try and look into this and other stuff with the migration in
> next week or so - maybe we'll make some time to talk it through
> during BOSC.  I don't know yet when I'll actually have time to think
> about it properly.
>
> I am still worried about doing https because of the current system we
> have supporting user logins and that we didn't want to run a web
> server on the main repository machine and we'll have to install DAV
> on the main repository machine.  if ssh+svn is going to be sufficient
> hurdle for people, note it was already a hurdle for them with CVS,
> but we'll have to think a bit more on it.
>
> We might be able to do some sort of NFS (or other exported FS) but
> exported to the webserver machine but that is may be a recipe for
> disaster.
>
> -jason
> On Jul 10, 2007, at 4:11 PM, George Hartzell wrote:
>
>> Chris Fields writes:
>>> [...]
>>> We prob. need to schedule a specific day/time when the switchover
>>> would take place so we can announce (so everyone knows and no one  
>>> can
>>> gripe).  Did we ever resolve the svn->cvs issue?  Jason pointed out
>>> some tools a while ago...
>>
>> I haven't done anything about it.
>>
>> I think that we also need to have some input from the admin/support
>> folk about access methods (https, etc...).
>>
>> Are we going to want to mirror the repository anywhere?
>>
>> g.
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> Jason Stajich
> jason at bioperl.org
> http://jason.open-bio.org/
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From sheris at eps.berkeley.edu  Fri Jul 13 18:42:32 2007
From: sheris at eps.berkeley.edu (Sheri Simmons)
Date: Fri, 13 Jul 2007 11:42:32 -0700
Subject: [Bioperl-l] Problem with Bio::PopGen::Individual
Message-ID: <200707131142.32366.sheris@eps.berkeley.edu>

Hi,
I have a collection of sequencing reads aligned with a consensus sequence that 
I input into a Bio::PopGen::Population object in order to calculate allele 
frequencies. The consensus sequence is included to force clustalw to give a 
better alignment. However,  I need to remove the consensus sequence before 
calculating allele frequencies in the individual reads. I'm having trouble 
with this part of it. I get the following error message:

"Can't locate object method "person_id" via package "Bio::PopGen::Individual" 		
at /usr/share/perl5/Bio/PopGen/Population.pm line 260, <GEN0> line 49."

Here is the code snippet producing the error. $pop is a 
Bio::PopGen::Population object.

	my @consensus = "gene_consensus";
	$pop->remove_Individuals(@consensus);

I also tried:
	my @consensus = $pop->get_Individuals(-unique_id => "gene_consensus"); 
	$pop->remove_Individuals(@consensus);

which produced the same error. Can anyone send me in the right direction? I 
suspect this is a simple problem.

Sheri

-- 
Sheri Simmons
Department of Earth and Planetary Sciences
University of California, Berkeley
Berkeley, CA 94720-4767


From jason at bioperl.org  Fri Jul 13 20:17:31 2007
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 13 Jul 2007 16:17:31 -0400
Subject: [Bioperl-l] Problem with Bio::PopGen::Individual
In-Reply-To: <200707131142.32366.sheris@eps.berkeley.edu>
References: <200707131142.32366.sheris@eps.berkeley.edu>
Message-ID: <99A3513A-7DBE-4C89-B38B-8C2B76B0E14F@bioperl.org>

Hi Sheri -

Shoot - that was my fault - a bug in the code: I was only using
"Person" objects, not Individuals, when I was testing.

I've committed a bugfix to CVS - do you need me to send you the
updated file, or are you comfortable grabbing the code from CVS or
http://code.open-bio.org?

This is the change - you may have a different version of BioPerl than
what is in CVS, so you may have to make the change on line 260 rather
than 282 -- or you can upgrade to the latest code via CVS (although this
is probably harder for you since you've got stuff installed in
/usr/share):

RCS file: /home/repository/bioperl/bioperl-live/Bio/PopGen/Population.pm,v
retrieving revision 1.22
diff -r1.22 Population.pm
282c282
<       unshift @tosplice, $i if( $namehash{$ind->person_id} );
---
>       unshift @tosplice, $i if( $namehash{$ind->unique_id} );
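
To show what the corrected line does, here is a small plain-Perl sketch
of the same lookup-and-splice loop. Sketch::Individual and
remove_by_unique_id are hypothetical stand-ins so the example runs
without BioPerl; only the unique_id comparison is taken from the diff
above:

```perl
use strict;
use warnings;

# Minimal stand-in for Bio::PopGen::Individual: only unique_id() is modelled.
package Sketch::Individual;
sub new       { my ($class, %args) = @_; return bless { id => $args{-unique_id} }, $class }
sub unique_id { return $_[0]->{id} }

package main;

# Remove every individual whose unique_id is in @names; returns how many went.
sub remove_by_unique_id {
    my ($individuals, @names) = @_;
    my %namehash = map { $_ => 1 } @names;

    # Collect matching indices in descending order so that each splice
    # leaves the indices still to be removed valid.
    my @tosplice;
    my $i = 0;
    for my $ind (@$individuals) {
        unshift @tosplice, $i if $namehash{ $ind->unique_id };
        $i++;
    }
    splice @$individuals, $_, 1 for @tosplice;
    return scalar @tosplice;
}

my @pop = map { Sketch::Individual->new(-unique_id => $_) }
          qw(read_1 gene_consensus read_2);
my $removed = remove_by_unique_id(\@pop, 'gene_consensus');
print "removed $removed, kept: ", join(', ', map { $_->unique_id } @pop), "\n";
```

Since the loop compares each individual's unique_id against the names
passed in, passing the id string (the first form in the original
message) looks like the intended usage, rather than passing Individual
objects.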

-jason
On Jul 13, 2007, at 2:42 PM, Sheri Simmons wrote:

> Hi,
> I have a collection of sequencing reads aligned with a consensus  
> sequence that
> I input into a Bio::PopGen::Population object in order to calculate  
> allele
> frequencies. The consensus sequence is included to force clustalw  
> to give a
> better alignment. However,  I need to remove the consensus sequence  
> before
> calculating allele frequencies in the individual reads. I'm having  
> trouble
> with this part of it. I get the following error message:
>
> "Can't locate object method "person_id" via package  
> "Bio::PopGen::Individual" 		
> at /usr/share/perl5/Bio/PopGen/Population.pm line 260, <GEN0> line  
> 49."
>
> Here is the code snippet producing the error. $pop is a
> Bio::PopGen::Population object.
>
> 	my @consensus = "gene_consensus";
> 	$pop->remove_Individuals(@consensus);
>
> I also tried:
> 	my @consensus = $pop->get_Individuals(-unique_id =>  
> "gene_consensus");
> 	$pop->remove_Individuals(@consensus);
>
> which produced the same error. Can anyone send me in the right  
> direction? I
> suspect this is a simple problem.
>
> Sheri
>
> -- 
> Sheri Simmons
> Department of Earth and Planetary Sciences
> University of California, Berkeley
> Berkeley, CA 94720-4767
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From hartzell at alerce.com  Fri Jul 13 20:34:14 2007
From: hartzell at alerce.com (George Hartzell)
Date: Fri, 13 Jul 2007 16:34:14 -0400
Subject: [Bioperl-l] Take 2 of the new subversion repository.
In-Reply-To: <58D2D775-F7AB-4E57-AC4E-861AA2DC4AEA@bioperl.org>
References: <18054.63942.316904.413911@almost.alerce.com>
	<D8C71EF7-6E2E-498E-8638-373512ADE3EE@bioperl.org>
	<18067.47319.254632.538811@almost.alerce.com>
	<4693BE75.4090005@sendu.me.uk>
	<C022268F-4632-4027-ADBE-F842B3308E21@uiuc.edu>
	<18067.59384.247108.463648@almost.alerce.com>
	<58D2D775-F7AB-4E57-AC4E-861AA2DC4AEA@bioperl.org>
Message-ID: <18071.57798.130368.703488@almost.alerce.com>

Jason Stajich writes:
 > I'll try and look into this and other stuff with the migration in  
 > next week or so - maybe we'll make some time to talk it through  
 > during BOSC.  I don't know yet when I'll actually have time to think  
 > about it properly.
 > 
 > I am still worried about doing https because of the current system we  
 > have supporting user logins and that we didn't want to run a web  
 > server on the main repository machine and we'll have to install DAV  
 > on the main repository machine.  if ssh+svn is going to be sufficient  
 > hurdle for people, note it was already a hurdle for them with CVS,  
 > but we'll have to think a bit more on it.
 > [...]

How are you thinking about providing anonymous readonly non-dev access
to the repository?  svn+ssh using an anonymous/guest account (can it
be screwed down tightly enough?)  svn-mirror the repo onto the public
machine and do DAV there w/out having to worry about authenticating
the devs?

g.


From jason at bioperl.org  Fri Jul 13 21:33:29 2007
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 13 Jul 2007 17:33:29 -0400
Subject: [Bioperl-l] Take 2 of the new subversion repository.
In-Reply-To: <18071.57798.130368.703488@almost.alerce.com>
References: <18054.63942.316904.413911@almost.alerce.com>
	<D8C71EF7-6E2E-498E-8638-373512ADE3EE@bioperl.org>
	<18067.47319.254632.538811@almost.alerce.com>
	<4693BE75.4090005@sendu.me.uk>
	<C022268F-4632-4027-ADBE-F842B3308E21@uiuc.edu>
	<18067.59384.247108.463648@almost.alerce.com>
	<58D2D775-F7AB-4E57-AC4E-861AA2DC4AEA@bioperl.org>
	<18071.57798.130368.703488@almost.alerce.com>
Message-ID: <5C42D957-BCCA-46B6-8121-3313CE4B0F2A@bioperl.org>


On Jul 13, 2007, at 4:34 PM, George Hartzell wrote:

> Jason Stajich writes:
>> I'll try and look into this and other stuff with the migration in
>> next week or so - maybe we'll make some time to talk it through
>> during BOSC.  I don't know yet when I'll actually have time to think
>> about it properly.
>>
>> I am still worried about doing https because of the current system we
>> have supporting user logins and that we didn't want to run a web
>> server on the main repository machine and we'll have to install DAV
>> on the main repository machine.  if ssh+svn is going to be sufficient
>> hurdle for people, note it was already a hurdle for them with CVS,
>> but we'll have to think a bit more on it.
>> [...]
>
> How are you thinking about providing anonymous readonly non-dev access
> to the repository?  svn+ssh using an anonymous/guest account (can it
> be screwed down tightly enough?)  svn-mirror the repo onto the public
> machine and do DAV there w/out having to worry about authenticating
> the devs?
>
We'll do svn on the public anonymous machine like we already do with  
CVS and with SVN

See:
http://code.open-bio.org
  AND
http://code.open-bio.org/svnweb/
See blipkit.

-jason
> g.
>
>

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From scrosson at uchicago.edu  Fri Jul 13 22:15:30 2007
From: scrosson at uchicago.edu (Sean Crosson)
Date: Fri, 13 Jul 2007 22:15:30 +0000 (UTC)
Subject: [Bioperl-l] ace to fasta conversion
Message-ID: <loom.20070714T000856-94@post.gmane.org>

I've been using this Bio::SeqIO script to convert EMBL and swissprot to fasta
and it works great.  We're now trying to convert a big (250 MB) .ace file to
fasta.  The documentation suggests I can do this, but everytime I run the script
below, it outputs an empty .fas file.  Does anyone have any suggestions on how
to make this script work?  Does SeqIO really convert between these file types? 
Thanks for your help.

#!/usr/bin/perl -w

use Bio::SeqIO;


$in  = Bio::SeqIO->new(-file => "454Contigs.ace",
                       -format => 'ace');
$out = Bio::SeqIO->new(-file => ">454Contigs.fas",
                       -format => 'fasta');
while ( $seq = $in->next_seq() ) {$out->write_seq($seq); }


From cvillamar at gmail.com  Fri Jul 13 23:24:04 2007
From: cvillamar at gmail.com (Carlos Villacorta)
Date: Fri, 13 Jul 2007 16:24:04 -0700
Subject: [Bioperl-l] beginner problem with fasta headers
Message-ID: <60199ee40707131624y6ea2c896r8a32f9bef04121e9@mail.gmail.com>

hi all,
I have a embl sequence file, when formatting to fasta with Seqio it
gives a long string header for each sequence that my following
phylogenetic software cannot handle...
Does anyone knows how to format those embl or genbank files to fasta
but retrieving in the headers just two or three fields (e.g. id | gene
| sp_name)?
Any advice with this problem would be very appreciated, thanks!


From j_martin at lbl.gov  Sat Jul 14 00:05:45 2007
From: j_martin at lbl.gov (Joel Martin)
Date: Fri, 13 Jul 2007 17:05:45 -0700
Subject: [Bioperl-l] ace to fasta conversion
In-Reply-To: <loom.20070714T000856-94@post.gmane.org>
References: <loom.20070714T000856-94@post.gmane.org>
Message-ID: <20070714000544.GB29841@eniac.jgi-psf.org>

Hello,
	the SeqIO 'ace' is an AceDB file, the 454*ace you are trying to use
is a phrap/consed ace file.  They aren't related at all. You might try poking
around in Bio::Assembly::IO which should read assembly ace files.

Joel

On Fri, Jul 13, 2007 at 10:15:30PM +0000, Sean Crosson wrote:
> I've been using this Bio::SeqIO script to convert EMBL and swissprot to fasta
> and it works great.  We're now trying to convert a big (250 MB) .ace file to
> fasta.  The documentation suggests I can do this, but everytime I run the script
> below, it outputs an empty .fas file.  Does anyone have any suggestions on how
> to make this script work?  Does SeqIO really convert between these file types? 
> Thanks for your help.
> 
> #!/usr/bin/perl -w
> 
> use Bio::SeqIO;
> 
> 
> $in  = Bio::SeqIO->new(-file => "454Contigs.ace",
>                        -format => 'ace');
> $out = Bio::SeqIO->new(-file => ">454Contigs.fas",
>                        -format => 'fasta');
> while ( $seq = $in->next_seq() ) {$out->write_seq($seq); }
> 
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at uiuc.edu  Sat Jul 14 04:06:27 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 13 Jul 2007 23:06:27 -0500
Subject: [Bioperl-l] beginner problem with fasta headers
In-Reply-To: <60199ee40707131624y6ea2c896r8a32f9bef04121e9@mail.gmail.com>
References: <60199ee40707131624y6ea2c896r8a32f9bef04121e9@mail.gmail.com>
Message-ID: <0089195A-4935-49F2-A8E7-C1F9B8A34D4E@uiuc.edu>

Some reading material...

http://www.bioperl.org/wiki/FAQ#Accession_numbers_are_not_present_for_FASTA_sequence_files
http://www.bioperl.org/wiki/FAQ#I_would_like_to_make_my_own_custom_fasta_header_-_how_do_I_do_this.3F
http://www.bioperl.org/wiki/FASTA_sequence_format#Note
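
As a rough illustration of the short-header idea (not taken from the
FAQ entries), the helper below builds an 'id|gene|species' string in
plain Perl. short_header is a hypothetical name; in a SeqIO script the
fields would come from the sequence object, and the result would be set
with $seq->display_id(...) and $seq->desc('') before write_seq():

```perl
use strict;
use warnings;

# Build a short 'id|gene|species' header; missing fields become 'NA'.
sub short_header {
    my @fields = map { (defined($_) && length($_)) ? $_ : 'NA' } @_;

    # Many phylogeny programs cut the sequence name at the first space,
    # so whitespace inside a field is replaced with underscores.
    s/\s+/_/g for @fields;

    return join '|', @fields;
}

print '>', short_header('AB012345', 'rbcL', 'Zea mays'), "\n";
```

The accession, gene and species used in the call are made-up sample
values.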

Quiz on Monday!

chris

On Jul 13, 2007, at 6:24 PM, Carlos Villacorta wrote:

> hi all,
> I have a embl sequence file, when formatting to fasta with Seqio it
> gives a long string header for each sequence that my following
> phylogenetic software cannot handle...
> Does anyone knows how to format those embl or genbank files to fasta
> but retrieving in the headers just two or three fields (e.g. id | gene
> | sp_name)?
> Any advice with this problem would be very appreciated, thanks!
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From scrosson at uchicago.edu  Sat Jul 14 03:43:59 2007
From: scrosson at uchicago.edu (scrosson)
Date: Fri, 13 Jul 2007 20:43:59 -0700 (PDT)
Subject: [Bioperl-l] ace to fasta conversion
In-Reply-To: <20070714000544.GB29841@eniac.jgi-psf.org>
References: <loom.20070714T000856-94@post.gmane.org>
	<20070714000544.GB29841@eniac.jgi-psf.org>
Message-ID: <11590811.post@talk.nabble.com>


This problem now makes sense.  I've been playing with Bio::Assembly::IO,
which does indeed read phrap .ace files.  Does anyone have an idea how to
pull the assembled contigs out of a Bio::Assembly object and write them out
as multi-fasta (or strings for that matter)?  None of our workstations are
running phrap/consed and I'd love to see these contigs.
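
If all that is needed is the consensus of each contig, one option that
sidesteps Bio::Assembly altogether is to read the CO records straight
out of the ACE file. The sketch below is plain Perl, not BioPerl; it
assumes the usual phrap/consed layout (a "CO <name> ..." line, consensus
lines up to the next blank line, '*' as the pad character), and
ace_contigs / write_fasta are hypothetical helper names:

```perl
use strict;
use warnings;

# Read a phrap/consed-style ACE stream and return [name, consensus] pairs.
sub ace_contigs {
    my ($fh) = @_;
    my (@contigs, $current);
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;
        if ($line =~ /^CO\s+(\S+)/) {       # contig header starts a record
            $current = [ $1, '' ];
            push @contigs, $current;
        }
        elsif (defined $current) {
            if ($line =~ /^\s*$/) {         # blank line ends the consensus
                undef $current;
            }
            else {
                $current->[1] .= $line;
            }
        }
    }
    return @contigs;
}

# Write one FASTA record, pads removed, 60 residues per line.
sub write_fasta {
    my ($out, $name, $seq) = @_;
    $seq =~ tr/*//d;
    print {$out} ">$name\n";
    print {$out} "$1\n" while $seq =~ /(.{1,60})/g;
}

# Tiny in-memory example; a real run would open 454Contigs.ace instead.
my $ace = "AS 1 2\n\nCO Contig1 9 2 1 U\nACGT*ACGT\nA\n\nBQ\n 20 20\n";
open my $in, '<', \$ace or die $!;
my $fasta = '';
open my $out, '>', \$fasta or die $!;
write_fasta($out, @$_) for ace_contigs($in);
close $out;
print $fasta;
```

Only the consensus strings are held in memory, so a 250 MB file should
not be a problem, but read placements and qualities are ignored
entirely.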

Sean 
       

> Hello,
> 	the SeqIO 'ace' is an AceDB file, the 454*ace you are trying to use
> is a phrap/consed ace file.  They aren't related at all. You might try poking
> around in Bio::Assembly::IO which should read assembly ace files.
>
> Joel



From bioperlanand at yahoo.com  Sat Jul 14 17:55:53 2007
From: bioperlanand at yahoo.com (Anand Venkatraman)
Date: Sat, 14 Jul 2007 10:55:53 -0700 (PDT)
Subject: [Bioperl-l] a question on obtain PDB records using bioperl
Message-ID: <798126.17426.qm@web36804.mail.mud.yahoo.com>

Hi everybody,

Is there a method in Bioperl to obtain PDB record(s) on the fly, i.e. something similar to the Bio::Perl methods to retrieve EMBL or GenBank records?

Thanks in advance,

Anand

       


From johnsonm at gmail.com  Tue Jul 17 18:23:58 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Tue, 17 Jul 2007 13:23:58 -0500
Subject: [Bioperl-l] Should Bio::SeqIO::interpro 'use XML::DOM::XPath'?
Message-ID: <ebf5eb170707171123y33239dfey2dc39e4316c62cee@mail.gmail.com>

I'm tinkering with parsing iprscan reports with BioPerl.  I noticed that this:

  my $seqio = Bio::SeqIO->new(-file => $iprscan_file, -format => 'interpro');

  while (my $seq = $seqio->next_seq()) {
      ...
  }

Does not work unless I first 'use XML::DOM::XPath'.  I get this error:

  Can't locate object method "findnodes" via package
"XML::DOM::Document" at
bioperl-cvs/bioperl-live//Bio/SeqIO/interpro.pm line 136, <GEN0> line
30.

I see that Bio::SeqIO has 'use XML::DOM', but that doesn't seem to
suck in XML::DOM::XPath.  I see that t/interpro.t requires
XML::DOM::XPath:

test_begin(-tests => 17,
                -requires_module => 'XML::DOM::XPath');

I suppose the reason the test requires XML::DOM::XPath is so
that tests can be skipped if XML::DOM::XPath is not available.
Shouldn't Bio::SeqIO::interpro 'use XML::DOM::XPath', though?


From sac at bioperl.org  Tue Jul 17 19:49:32 2007
From: sac at bioperl.org (Steve Chervitz)
Date: Tue, 17 Jul 2007 12:49:32 -0700
Subject: [Bioperl-l] Ohloh account for bioperl
Message-ID: <8f200b4c0707171249n193c5d3fi2aa1cb8b6c102ef7@mail.gmail.com>

I came across a web app that tracks various metrics for open source
projects, noticed that bioperl wasn't listed, and added it:

http://www.ohloh.net/projects/6685

Seems like an interesting resource that could help add some
visibility. It creates metrics by directly processing the source code
repository. I hooked it up to the CVS repos for bioperl-live, -db,
-run, and -pipeline. It has yet to do its analysis at this point.

Feel free to create Ohloh accounts for yourselves. When you add
yourself as a contributor to Bioperl, you can indicate the username
associated with your commits, but this requires that it first process
the commit logs to figure out what the usernames are. You can still
create an account, just update it later with your username.

Steve


From cjfields at uiuc.edu  Tue Jul 17 21:04:44 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 17 Jul 2007 16:04:44 -0500
Subject: [Bioperl-l] Should Bio::SeqIO::interpro 'use XML::DOM::XPath'?
In-Reply-To: <ebf5eb170707171123y33239dfey2dc39e4316c62cee@mail.gmail.com>
References: <ebf5eb170707171123y33239dfey2dc39e4316c62cee@mail.gmail.com>
Message-ID: <5BA80A44-4A88-46D8-B945-7BB4C82E308E@uiuc.edu>


On Jul 17, 2007, at 1:23 PM, Mark Johnson wrote:

> I'm tinkering with parsing iprscan reports with BioPerl.  I noticed  
> that this:
>
>   my $seqio = Bio::SeqIO->new(-file => $iprscan_file, -format =>  
> 'interpro');
>
>   while (my $seq = $seqio->next_seq()) {
>       ...
>   }
>
> Does not work unless I first 'use XML::DOM::XPath'.  I get this error:
>
>   Can't locate object method "findnodes" via package
> "XML::DOM::Document" at
> bioperl-cvs/bioperl-live//Bio/SeqIO/interpro.pm line 136, <GEN0> line
> 30.
>
> I see that Bio::SeqIO has 'use XML::DOM', but that doesn't seem to
> suck in XML::DOM::Xpath.  I see that t/interpro.t requires
> XML::DOM::XPath:
>
> test_begin(-tests => 17,
>                 -requires_module => 'XML::DOM::XPath');
>
> Is suppose the reason the test specs a require XML::DOM::XPath is so
> that tests can be skipped if XML::DOM::XPath is not available.
> Shouldn't Bio::SeqIO::interpro 'use XML::DOM::XPath', though?

You're right; I think tests passed b/c XML::DOM::XPath (if present)
was eval'd as a required module.  When I commented out the spot where
it is eval'd in the test suite I could replicate this error.  I have
now added 'use XML::DOM::XPath' to SeqIO::interpro in CVS and it
passes fine.

Thanks for the heads up!

chris


From xianranli78 at yahoo.com.cn  Wed Jul 18 05:55:19 2007
From: xianranli78 at yahoo.com.cn (Xianran Li)
Date: Wed, 18 Jul 2007 13:55:19 +0800
Subject: [Bioperl-l] extract information with Bio::DB::GFF3 from gff3 file
Message-ID: <005001c7c900$3a403d60$ed07a8c0@BGI.LOCAL>

Hi,

I want to extract some information from a gff3 file like:

12001 . gene 854759 857385 . - . ID=12001.t00153;Name=receptor kinase ORK10, putative

The gene position can be retrieved as $gene->start, but how can I get the annotation information (receptor kinase ORK10)?

Thanks for your help.


Xianran Li


From georg.otto at tuebingen.mpg.de  Wed Jul 18 09:32:26 2007
From: georg.otto at tuebingen.mpg.de (Georg Otto)
Date: Wed, 18 Jul 2007 11:32:26 +0200
Subject: [Bioperl-l] run megablast
Message-ID: <m1r6n66or9.fsf@tuebingen.mpg.de>


Hi,

is there a module to run megablast in a script (equivalent to ncbi
blast in StandAloneBlast.pm)?

Cheers,

Georg


From jeevitesh at ibab.ac.in  Wed Jul 18 10:03:24 2007
From: jeevitesh at ibab.ac.in (jeevitesh at ibab.ac.in)
Date: Wed, 18 Jul 2007 15:33:24 +0530 (IST)
Subject: [Bioperl-l] Please Help us - regarding TreeIO for Shared Distance
Message-ID: <47819.192.168.1.125.1184753004.squirrel@webmail.ibab.ac.in>

Hi Friends,

We need your valuable help in finding the SHARED PATH BETWEEN TWO NODE PAIRS OF
A TREE.

The distance method of the Bioperl tree modules (read in via TreeIO) gives the total distance.

But we NEED TO FIND THE SHARED DISTANCE BETWEEN TWO NODE PAIRS, as illustrated in the figure.

Suppose we have a tree
    A                C
     \              /
      \2          2/
       \__________/
       /    6     \
      /2          2\
     /              \
    B                D

For example, the shared path between AB and AC is 2,
and for AC and BD the shared path is 6.
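
One way to compute this, sketched in plain Perl rather than with the
Bioperl tree classes: collect the edges on each of the two paths and sum
the lengths of the edges they have in common. X and Y are hypothetical
names for the two unlabelled internal nodes in the figure, and
path_edges / shared_length are illustrative helpers (node names are
assumed not to contain '-'):

```perl
use strict;
use warnings;

# Undirected weighted tree as an adjacency hash, matching the figure.
my %tree;
for my $edge ([A => 'X', 2], [B => 'X', 2], [X => 'Y', 6],
              [C => 'Y', 2], [D => 'Y', 2]) {
    my ($u, $v, $len) = @$edge;
    $tree{$u}{$v} = $tree{$v}{$u} = $len;
}

# Depth-first search returning an array ref of the edges on the unique
# path from $from to $to, each as a "node1-node2" key with the names
# sorted; returns undef when $to cannot be reached.
sub path_edges {
    my ($from, $to, $seen) = @_;
    $seen ||= {};
    return [] if $from eq $to;
    $seen->{$from} = 1;
    for my $next (sort keys %{ $tree{$from} }) {
        next if $seen->{$next};
        my $rest = path_edges($next, $to, $seen) or next;
        return [ join('-', sort { $a cmp $b } $from, $next), @$rest ];
    }
    return;
}

# Total length of the edges that lie on both paths.
sub shared_length {
    my ($pair1, $pair2) = @_;
    my %on_first = map { $_ => 1 } @{ path_edges(@$pair1) || [] };
    my $total = 0;
    for my $edge (@{ path_edges(@$pair2) || [] }) {
        next unless $on_first{$edge};
        my ($u, $v) = split /-/, $edge;
        $total += $tree{$u}{$v};
    }
    return $total;
}

print shared_length([qw(A B)], [qw(A C)]), "\n";    # shared edge A-X
print shared_length([qw(A C)], [qw(B D)]), "\n";    # shared edge X-Y
```

For two paths that start at the same leaf, the shared length can also be
had from total distances alone: (d(A,B) + d(A,C) - d(B,C)) / 2, which
for the figure is (4 + 10 - 10) / 2 = 2.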

Any comment on this will be greatly appreciated.

With Thanks & regards
jeevitesh


From jeevitesh at ibab.ac.in  Wed Jul 18 07:15:33 2007
From: jeevitesh at ibab.ac.in (jeevitesh at ibab.ac.in)
Date: Wed, 18 Jul 2007 12:45:33 +0530 (IST)
Subject: [Bioperl-l] Please Help us - regarding TreeIO for Shared Distance
Message-ID: <55933.192.168.1.125.1184742933.squirrel@webmail.ibab.ac.in>

Hi Friends,

We need your valuable help in finding the SHARED PATH BETWEEN TWO NODES OF A TREE.

The distance method of the Bioperl tree modules (read in via TreeIO) gives the total distance.

But we NEED TO FIND THE SHARED DISTANCE BETWEEN TWO NODES.

Any comment on this will be greatly appreciated.

With Thanks & regards
jeevitesh


From cain.cshl at gmail.com  Wed Jul 18 13:10:40 2007
From: cain.cshl at gmail.com (Scott Cain)
Date: Wed, 18 Jul 2007 09:10:40 -0400
Subject: [Bioperl-l] extract information with Bio::DB::GFF3 from
	gff3	file
In-Reply-To: <005001c7c900$3a403d60$ed07a8c0@BGI.LOCAL>
References: <005001c7c900$3a403d60$ed07a8c0@BGI.LOCAL>
Message-ID: <1184764240.2570.31.camel@localhost.localdomain>

Hi Xianran Li,

Assuming $gene is a Bio::DB::GFF::Feature object (there is no such thing
as Bio::DB::GFF3), then you can use the attributes method to get
anything in the ninth column:

  my ($name) = $gene->attributes('Name');

The parentheses are needed around $name because the attributes method
returns a list and the parens capture the first item of the list into
$name.

Scott


On Wed, 2007-07-18 at 13:55 +0800, Xianran Li wrote:
> Hi,
> 
> I want to extract some infomation  from the gff3 file like:
> 
> 12001 . gene 854759 857385 . - . ID=12001.t00153;Name=receptor kinase ORK10, putative
>    
> The gene position can be reterived as $gene->start, but how can I get the annotation infomatin (receptor kinase ORK10) ?
> 
> Thanks for your help.
> 
> 
> Xianran Li
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From johnsonm at gmail.com  Wed Jul 18 20:53:00 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Wed, 18 Jul 2007 15:53:00 -0500
Subject: [Bioperl-l] Should Bio::SeqIO::interpro 'use XML::DOM::XPath'?
In-Reply-To: <469DB6C6.9010702@pasteur.fr>
References: <ebf5eb170707171123y33239dfey2dc39e4316c62cee@mail.gmail.com>
	<5BA80A44-4A88-46D8-B945-7BB4C82E308E@uiuc.edu>
	<469DB6C6.9010702@pasteur.fr>
Message-ID: <ebf5eb170707181352v4d59ec81kfb6f706ca4643cc7@mail.gmail.com>

The output from InterProScan, invoked thusly:

iprscan -cli -seqtype p -i input_file -o output_file -format xml

On 7/18/07, Emmanuel Quevillon <tuco at pasteur.fr> wrote:
> Hi guys,
>
> I read your email and I wondered which iprscan file you've
> been talking about? Is it the file produced by InterProScan
> or the file called match.xml representing the whole uniprot
> database against InterPro? Reading the xml parser
> implemented into Bio::SeqIO::interpro, I guess it is the
> second one?
> In such case, I just want to let you know that the xml
> schema changed and the file name also. It is now called
> match_complete.xml.
> I attached the DTD to be able to see the new structure.
> Here is an example of the new data representation.
>
>
> <protein id="A0A000" name="A0A000_9ACTO" length="394"
> crc64="F1DD0C1042811B48">
>      <match id="G3DSA:3.40.640.10"
> name="PyrdxlP-dep_Trfase_major_sub1" dbname="GENE3D"
> status="T" evd="HMMPfam">
>        <ipr id="IPR015421" name="Pyridoxal
> phosphate-dependent transferase, major region, subdomain 1"
> type="Domain" />
>        <lcn start="52" end="288" score="4.30000170645879E-75" />
>      </match>
>      <match id="PTHR13693:SF7" name="PTHR13693:SF7"
> dbname="PANTHER" status="T" evd="not_rel">
>        <lcn start="33" end="389" score="0.0" />
>      </match>
> </protein>
>
> As you can see, sometimes there is no interpro info (no ipr
> element).
>
> I think it would be good to change the interpro parser as well?
>
> Regards
>
> Emmanuel
>
> Chris Fields wrote:
> > On Jul 17, 2007, at 1:23 PM, Mark Johnson wrote:
> >
> >> I'm tinkering with parsing iprscan reports with BioPerl.  I noticed
> >> that this:
> >>
> >>   my $seqio = Bio::SeqIO->new(-file => $iprscan_file, -format =>
> >> 'interpro');
> >>
> >>   while (my $seq = $seqio->next_seq()) {
> >>       ...
> >>   }
> >>
> >> Does not work unless I first 'use XML::DOM::XPath'.  I get this error:
> >>
> >>   Can't locate object method "findnodes" via package
> >> "XML::DOM::Document" at
> >> bioperl-cvs/bioperl-live//Bio/SeqIO/interpro.pm line 136, <GEN0> line
> >> 30.
> >>
> >> I see that Bio::SeqIO has 'use XML::DOM', but that doesn't seem to
> >> suck in XML::DOM::XPath.  I see that t/interpro.t requires
> >> XML::DOM::XPath:
> >>
> >> test_begin(-tests => 17,
> >>                 -requires_module => 'XML::DOM::XPath');
> >>
> >> I suppose the reason the test specs a require XML::DOM::XPath is so
> >> that tests can be skipped if XML::DOM::XPath is not available.
> >> Shouldn't Bio::SeqIO::interpro 'use XML::DOM::XPath', though?
> >
> > You're right; I think tests passed b/c XML::DOM::XPath (if present),
> > was eval'd as a required module.  When I commented out the spot where
> > it is eval'd in the test suite I can replicate this error.  I have
> > added 'use XML::DOM::XPath' to SeqIO::interpro now in CVS and it
> > passes fine.
> >
> > Thanks for the heads up!
> >
> > chris
>
>


From cain.cshl at gmail.com  Thu Jul 19 02:47:53 2007
From: cain.cshl at gmail.com (Scott Cain)
Date: Wed, 18 Jul 2007 22:47:53 -0400
Subject: [Bioperl-l] extract information with Bio::DB::GFF3 from	gff3
	file
In-Reply-To: <008801c7c9ad$1fa5c030$ed07a8c0@BGI.LOCAL>
References: <005001c7c900$3a403d60$ed07a8c0@BGI.LOCAL>
	<1184764240.2570.31.camel@localhost.localdomain>
	<008801c7c9ad$1fa5c030$ed07a8c0@BGI.LOCAL>
Message-ID: <1184813273.2570.96.camel@localhost.localdomain>

[Please always reply to the mailing list so that answers can be archived]


Yes, because commas are not allowed in GFF3 in an unescaped form.
Essentially, you are doing this with your GFF3:

  Name=receptor kinase ORK10;Name= putative

and when you do this:

  my ($name) = $gene->attributes('Name');

you are getting the first item in the list of names, and I suspect which
one you get is random.

To fix it, you need to replace the comma with %2C (the URL escape code
for a comma).  If you generated this GFF3, you will need to add a step
to URI encode your attribute strings.  If you got it from someone else,
you should point out to them that their GFF is flawed.
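
A minimal sketch of such an encoding step, in plain Perl. The sub name
escape_gff3_value is made up for illustration, and the character set is an
assumption based on the characters that are special in the ninth column
(";", "=", "&", ","), plus "%" and control characters.

```perl
use strict;
use warnings;

# Percent-encode characters that have a special meaning in GFF3 column 9
# (";", "=", "&", ","), plus "%" itself and control characters such as tabs.
sub escape_gff3_value {
    my ($value) = @_;
    $value =~ s/([;=&,%\x00-\x1f\x7f])/sprintf("%%%02X", ord($1))/ge;
    return $value;
}

# prints: Name=receptor kinase ORK10%2C putative
print 'Name=', escape_gff3_value('receptor kinase ORK10, putative'), "\n";
```

Spaces are left alone here; only the value is escaped, never the "Name="
tag or the ";" separators between attributes.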

Scott


On Thu, 2007-07-19 at 10:32 +0800, Xianran Li wrote:
> However, the $name return the string "putative" rather than "receptor kinase ORK10". Is any particular reason? 
> 
> 
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> Assuming $gene is a Bio::DB::GFF::Feature object (there is no such thing
> as Bio::DB::GFF3), then you can use the attributes method to get
> anything in the ninth column:
> 
>   my ($name) = $gene->attributes('Name');
> 
> The parenthesis are needed around $name because the attributes method
> returns a list and the parens capture the first item of the list into
> $name.
> 
> Scott
> 
> 
> On Wed, 2007-07-18 at 13:55 +0800, Xianran Li wrote:
> > Hi,
> > 
> > I want to extract some information from the gff3 file like:
> > 
> > 12001 . gene 854759 857385 . - . ID=12001.t00153;Name=receptor kinase ORK10, putative
> >    
> > The gene position can be retrieved as $gene->start, but how can I get the annotation information (receptor kinase ORK10)?
> > 
> > Thanks for your help.
> > 
> > 
> > Xianran Li
> ----- Original Message ----- 
> From: "Scott Cain" <cain.cshl at gmail.com>
> To: "Xianran Li" <xianranli78 at yahoo.com.cn>
> Cc: <bioperl-l at lists.open-bio.org>
> Sent: Wednesday, July 18, 2007 9:10 PM
> Subject: Re: [Bioperl-l] extract information with Bio::DB::GFF3 fromgff3 file
> 
> 
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory


From acutter at eeb.utoronto.ca  Fri Jul 20 02:25:08 2007
From: acutter at eeb.utoronto.ca (Asher Cutter)
Date: Thu, 19 Jul 2007 22:25:08 -0400
Subject: [Bioperl-l] tree comparisons with bioperl
Message-ID: <46A01D04.5040209@eeb.utoronto.ca>

I was reading over the functions for working with trees in bioperl. I am 
looking for something that will compare two topologies and report back 
if they are equivalent. i.e. something like:

does (a,(b,c)) == ((A,B),C) ? (in this case, no)

But of course in reality they would be more complicated topologies. This 
would be useful for simulating random trees to compare with some given 
topology of interest.
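
To make the kind of comparison concrete, here is a minimal sketch in plain
Perl. It does not use any Bioperl tree classes: a tree is assumed to be a
nested array reference with plain strings as leaves, the trees are treated
as rooted, and leaf labels are compared case-insensitively. The names
canonical and same_topology are made up for illustration.

```perl
use strict;
use warnings;

# Build a canonical string for a rooted tree given as nested array refs.
# Children are sorted at every level, so the order in which siblings were
# written does not matter.
sub canonical {
    my ($node) = @_;
    return lc $node unless ref $node;   # leaf: case-insensitive label
    my @children = map { canonical($_) } @$node;
    @children = sort @children;
    return '(' . join(',', @children) . ')';
}

# Two rooted trees have the same topology if their canonical strings match.
sub same_topology {
    my ($t1, $t2) = @_;
    return canonical($t1) eq canonical($t2) ? 1 : 0;
}

print same_topology(['a', ['b', 'c']], [['A', 'B'], 'C']) ? "same\n" : "different\n";
```

Unrooted trees would need an extra step (for example, rerooting both trees
on the same leaf before comparing), which this sketch does not attempt.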

I saw the methods for testing for monophyly and paraphyly, but not much 
beyond that...perhaps I have missed something?

Any suggestions?

Thanks,
Asher

-- 

___________________________________
Asher D. Cutter
Assistant Professor
Department of Ecology & Evolutionary Biology
University of Toronto
25 Harbord St.
Toronto, ON, M5S 3G5

tel: 416-978-4602
email: acutter at eeb.utoronto.ca
http://www.eeb.utoronto.ca/faculty/faculty_profile.cfm?prof_id=130
___________________________________


From jeevitesh at ibab.ac.in  Fri Jul 20 04:25:22 2007
From: jeevitesh at ibab.ac.in (jeevitesh at ibab.ac.in)
Date: Fri, 20 Jul 2007 09:55:22 +0530 (IST)
Subject: [Bioperl-l] Please Help us - regarding TreeIO for Shared Distance
Message-ID: <53244.192.168.1.125.1184905522.squirrel@webmail.ibab.ac.in>

Hi Friends,

We need your valuable help in finding the SHARED PATH BETWEEN TWO PAIRS OF
NODES OF A TREE.

The Distance method of TreeIO in Bioperl module gives the total distance.

But we NEED TO FIND THE SHARED(COMMON) DISTANCE BETWEEN TWO NODES as illustrated
in figure.

Suppose we have a tree
    A                C
     \              /
      \2          2/
       \__________/
       /    6     \
      /2          2\
     /              \
    B                D

The shared path between AB and AC is 2.
and for AC and BD the shared path is 6

Any comment on this will be greatly appreciated.

With Thanks & regards
jeevitesh


From n.haigh at sheffield.ac.uk  Sun Jul 22 11:34:58 2007
From: n.haigh at sheffield.ac.uk (Nathan S Haigh)
Date: Sun, 22 Jul 2007 12:34:58 +0100
Subject: [Bioperl-l] Ohloh account for bioperl
In-Reply-To: <8f200b4c0707171249n193c5d3fi2aa1cb8b6c102ef7@mail.gmail.com>
References: <8f200b4c0707171249n193c5d3fi2aa1cb8b6c102ef7@mail.gmail.com>
Message-ID: <46A340E2.4040505@sheffield.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Steve Chervitz wrote:
> I came across a web app that tracks various metrics for open source
> projects, noticed that bioperl wasn't listed, and added it:
> 
> http://www.ohloh.net/projects/6685
> 
> Seems like an interesting resource that could help add some
> visibility. It creates metrics by directly processing the source code
> repository. I hooked it up to the CVS repos for bioperl-live, -db,
> -run, and -pipeline. It has yet to do its analysis at this point.
> 
> Feel free to create Ohloh accounts for yourselves. When you add
> yourself as a contributor to Bioperl, you can indicate the username
> associated with your commits, but this requires that it first process
> the commit logs to figure out what the usernames are. You can still
> create an account, just update it later with your username.
> 
> Steve

Nice to see the graphs of number of commits each developer has made over
the last 5 years and how new developers have arisen while those more
"seasoned" developers can relax a little more -proof of an excellent
open source project!

Nath
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGo0Dih5z4PPfwHQoRAua4AJ9nxDJeqAZIbyv0M3g+6Y2xWzkEEgCgnHBO
4JWvG5Gy+H/UqpeXYAcSCX0=
=LrFt
-----END PGP SIGNATURE-----


From cjfields at uiuc.edu  Mon Jul 23 03:53:48 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 22 Jul 2007 22:53:48 -0500
Subject: [Bioperl-l] run megablast
In-Reply-To: <m1r6n66or9.fsf@tuebingen.mpg.de>
References: <m1r6n66or9.fsf@tuebingen.mpg.de>
Message-ID: <1D20EBF8-3E8E-49DD-A66F-4690B4094F7C@uiuc.edu>

StandAloneBlast runs the megablast executable directly, though I  
think you can specify a MegaBlast search using blastall with the '-n'  
flag.

We could probably add this functionality in fairly easily since  
SearchIO can parse megablast output; no one's had the need to code it  
yet.

chris

On Jul 18, 2007, at 4:32 AM, Georg Otto wrote:

>
> Hi,
>
> is there a module to run megablast in a script (equivalent to ncbi
> blast in StandAloneBlast.pm)?
>
> Cheers,
>
> Georg
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From jeevitesh at ibab.ac.in  Mon Jul 23 10:34:36 2007
From: jeevitesh at ibab.ac.in (jeevitesh at ibab.ac.in)
Date: Mon, 23 Jul 2007 16:04:36 +0530 (IST)
Subject: [Bioperl-l] Please Help us - regarding TreeIO for Shared Distance
Message-ID: <34830.192.168.1.125.1185186876.squirrel@webmail.ibab.ac.in>

Hi Friends,

We need your valuable help in finding the SHARED PATH BETWEEN TWO NODES PAIR OF
A TREE.

The Distance method of TreeIO in Bioperl module gives the total distance.

But we NEED TO FIND THE SHARED(COMMON) DISTANCE BETWEEN TWO PAIR OF NODES as
illustrated
in figure.

Suppose we have a tree
    A                C
     \              /
      \2          2/
       \__________/
       /    6     \
      /2          2\
     /              \
    B                D

The shared path between AB and AC is 2.
and for AC and BD the shared path is 6.

We need to find the shared distance as said above.

Kindly help us; it will help our research a lot.

With Thanks & regards
jeevitesh


From bix at sendu.me.uk  Mon Jul 23 11:08:23 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 23 Jul 2007 12:08:23 +0100
Subject: [Bioperl-l] Please Help us - regarding TreeIO for Shared
	Distance
In-Reply-To: <34830.192.168.1.125.1185186876.squirrel@webmail.ibab.ac.in>
References: <34830.192.168.1.125.1185186876.squirrel@webmail.ibab.ac.in>
Message-ID: <46A48C27.6060905@sendu.me.uk>

jeevitesh at ibab.ac.in wrote:
> Hi Friends,
> 
> We need your valuable help in finding the SHARED PATH BETWEEN TWO NODES PAIR OF
> A TREE.

Please stop sending this message. We heard you the first time. If no one 
answered, either no one knows the answer or no one understood you.


> The Distance method of TreeIO in Bioperl module gives the total distance.
> 
> But we NEED TO FIND THE SHARED(COMMON) DISTANCE BETWEEN TWO PAIR OF NODES as
> illustrated
> in figure.
> 
> Suppose we have a tree
>     A                C
>      \              /
>       \2          2/
>        \__________/
>        /    6     \
>       /2          2\
>      /              \
>     B                D
> 
> The shared path between AB and AC is 2.
> and for AC and BD the shared path is 6.

I don't follow. But if you already know how to work the answer out, 
describe the algorithm in words and maybe someone can code it up for you.
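
One possible reading of the question, sketched in plain Perl without any
Bioperl classes: take the edges on the path between the first pair of
leaves, take the edges on the path between the second pair, and sum the
lengths of the edges that appear on both. The internal node names X and Y,
the %adj hash and the sub names are all made up for illustration, and node
names are assumed not to contain "|".

```perl
use strict;
use warnings;

# Undirected tree stored as an adjacency hash: $adj{u}{v} = branch length.
my %adj;
sub add_edge {
    my ($u, $v, $len) = @_;
    $adj{$u}{$v} = $len;
    $adj{$v}{$u} = $len;
}

# The example tree: A and B hang off X, C and D hang off Y, X-Y has length 6.
add_edge('A', 'X', 2);
add_edge('B', 'X', 2);
add_edge('X', 'Y', 6);
add_edge('C', 'Y', 2);
add_edge('D', 'Y', 2);

# Depth-first search for the path from $from to $to. Returns an array ref
# of edge keys ("u|v", endpoints in sorted order), or undef at a dead end.
sub path_edges {
    my ($from, $to, $prev) = @_;
    return [] if $from eq $to;
    for my $next (sort keys %{ $adj{$from} }) {
        next if defined $prev && $next eq $prev;   # do not walk back
        my $rest = path_edges($next, $to, $from);
        next unless $rest;
        my ($lo, $hi) = $from lt $next ? ($from, $next) : ($next, $from);
        return [ "$lo|$hi", @$rest ];
    }
    return undef;
}

# Sum the lengths of the edges that lie on both paths.
sub shared_length {
    my ($a1, $a2, $b1, $b2) = @_;
    my %on_first = map { $_ => 1 } @{ path_edges($a1, $a2) };
    my $total = 0;
    for my $edge (@{ path_edges($b1, $b2) }) {
        next unless $on_first{$edge};
        my ($u, $v) = split /\|/, $edge;
        $total += $adj{$u}{$v};
    }
    return $total;
}

print "AB vs AC: ", shared_length('A', 'B', 'A', 'C'), "\n";
print "AC vs BD: ", shared_length('A', 'C', 'B', 'D'), "\n";
```

With the tree from the figure this gives 2 for AB/AC and 6 for AC/BD, which
matches the numbers in the question.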


From georg.otto at tuebingen.mpg.de  Mon Jul 23 13:56:46 2007
From: georg.otto at tuebingen.mpg.de (Georg Otto)
Date: Mon, 23 Jul 2007 15:56:46 +0200
Subject: [Bioperl-l] run megablast
References: <m1r6n66or9.fsf@tuebingen.mpg.de>
	<1D20EBF8-3E8E-49DD-A66F-4690B4094F7C@uiuc.edu>
Message-ID: <m11weznrz5.fsf@tuebingen.mpg.de>

Thanks a lot! I guess I should have read the blast documentation more
carefully....

Best,

Georg

Chris Fields <cjfields at uiuc.edu> writes:
> StandAloneBlast runs the megablast executable directly, though I  
> think you can specify a MegaBlast search using blastall with the '-n'  
> flag.
>
> We could probably add this functionality in fairly easily since  
> SearchIO can parse megablast output; no one's had the need to code it  
> yet.
>
> chris
>
> On Jul 18, 2007, at 4:32 AM, Georg Otto wrote:
>
>>
>> Hi,
>>
>> is there a module to run megablast in a script (equivalent to ncbi
>> blast in StandAloneBlast.pm)?
>>
>> Cheers,
>>
>> Georg
>>


From cjfields at uiuc.edu  Mon Jul 23 15:41:35 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 23 Jul 2007 10:41:35 -0500
Subject: [Bioperl-l] Bio::Assembly bug/feature?
Message-ID: <52744D70-CED6-49DB-8A17-0998F125D9AD@uiuc.edu>

To all:

I think I have found a major problem with Bio::Assembly; this was  
first noticed on Mac OS X in relation to bug 2320 and  
Bio::Assembly::IO.  I am uncertain whether this is meant to be a  
feature or a bug but it certainly needs to be documented or fixed as  
it leads to subtle errors.  I also can't see the advantage of this  
approach, but maybe I can be enlightened?  Either way, I think it's  
worth a discussion for those willing to follow.  I'll add as a bug  
later if needed.

A bit of background: each instance of a Bio::Assembly::Contig has a  
Bio::SeqFeature::Collection instance attached to it; each  
Bio::SeqFeature::Collection itself has a tied DB_File handle attached  
which remains open during the lifetime of the Bio::SF::Collection  
object.  When using Bio::Assembly one adds the various Contig objects  
to a Bio::Assembly::Scaffold.  So, for instance, if one had ~1000  
Contigs in a Scaffold, one would also have ~1000 open tied db  
handles, one per Contig instance.  So far, so good.

Unfortunately, when adding a ton of Contig objects to a  
Bio::Assembly::Scaffold one can run into a host of system-dependent  
issues based on resource usage limits (as one might expect).  This  
script:

------------------------------
use Bio::Assembly::Scaffold;
use Bio::Assembly::Contig;
use Bio::SeqFeature::Generic;

my $scaffold = Bio::Assembly::Scaffold->new();

for my $id (1..15000) {
     print "Contig #$id\n";
     my $contig = Bio::Assembly::Contig->new(-id => $id);
     my $feat = Bio::SeqFeature::Generic->new(-start=>1,
                                            -end=>10,
                                            -strand=>1);
     $contig->add_features([$feat]);
     $scaffold->add_contig($contig);
}
------------------------------

may fail on Mac OS X when one reaches the maximum number of open file  
descriptors (on UNIX'y systems, this is 'ulimit -n'); the call to tie  
the DB_File handle in SF::Collection fails silently, so later on when  
it is used you get the following:

...
Contig #251
Contig #252
Contig #253
Contig #254
Can't call method "put" on an undefined value at /Users/cjfields/src/bioperl-live/Bio/SeqFeature/Collection.pm line 225.

I have added an exception to catch this.  On Mac OS X you can  
increase the file descriptor limit using ulimit, at least to a  
certain point.  However, when testing this out on dev.open-bio.org  
(Linux) the 'tie' sometimes fails (and the exception pops up), but it  
isn't dependent on 'ulimit -n'.  This is what happens more often:

...
Contig #10567
Contig #10568
Contig #10569
Contig #10570
Out of memory!

Sometimes followed by a seg fault.  Ick!
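
The general idea of checking the tie and throwing right away, rather than
failing later on an undefined value, can be sketched without DB_File. The
LimitedStore class below is a made-up stand-in whose constructor returns
undef once a fixed number of "handles" is open; it only mimics a
descriptor limit and is not the code in Bio::SeqFeature::Collection.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a file-backed tie class: TIEHASH returns undef
# once $limit "handles" are open, mimicking a file descriptor limit.
package LimitedStore;
my $open = 0;
sub TIEHASH {
    my ($class, $limit) = @_;
    return undef if $open >= $limit;
    $open++;
    return bless {}, $class;
}
sub STORE { $_[0]{ $_[1] } = $_[2] }
sub FETCH { $_[0]{ $_[1] } }

package main;

# tie returns whatever the constructor returned, so a false value means
# the hash was never tied. Fail loudly here instead of much later.
sub open_store {
    my ($limit) = @_;
    my %store;
    tie(%store, 'LimitedStore', $limit)
        or die "could not tie store (limit of $limit open handles reached?)\n";
    return \%store;
}

my $first = open_store(1);
$first->{contig1} = 'ok';
print "first store works: $first->{contig1}\n";

my $second = eval { open_store(1) };
print "second store failed: $@" unless $second;
```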

Any ideas? For instance, should we set this up so that one  
SF::Collection is used for all the Contigs (since each one has a  
unique ID anyway)?  Leave as is and document/track the issue as a  
bug?  Both?

chris


From ba6450 at wayne.edu  Mon Jul 23 20:06:14 2007
From: ba6450 at wayne.edu (Munirul Islam)
Date: Mon, 23 Jul 2007 16:06:14 -0400 (EDT)
Subject: [Bioperl-l] error running codeml
Message-ID: <20070723160614.EEU90041@mirapointms6.wayne.edu>

Hello everyone:

I am new to bioperl.  I am running perl in Eclipse in Windows.  Here is the code:

[code]
use Bio::Tools::Run::Phylo::PAML::Codeml;
use Bio::AlignIO;
use Bio::TreeIO;

my $alignio = Bio::AlignIO->new(-format => 'phylip',
			         -file   => 'NM_000034.CDSalign.paml');

my $aln = $alignio->next_aln;

my $treeio = Bio::TreeIO->new(-format => 'newick', -file => 'tree.txt');
my $tree   = $treeio->next_tree;

my $codeml = Bio::Tools::Run::Phylo::PAML::Codeml->new();

$codeml->alignment($aln);
$codeml->tree($tree);

my ($rc,$parser) = $codeml->run();
my $result = $parser->next_result;
my $MLmatrix = $result->get_MLmatrix();
print "Ka = ", $MLmatrix->[0]->[1]->{'dN'},"\n";
print "Ks = ", $MLmatrix->[0]->[1]->{'dS'},"\n";
print "Ka/Ks = ", $MLmatrix->[0]->[1]->{'omega'},"\n";
[/code]

It gives the following error when I try to compile:

[error]
------------ EXCEPTION: Bio::Root::Exception -------------
MSG: unable to find or run executable for 'codeml'
STACK: Error::throw
STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359
STACK: C:/Perl/site/lib/Bio/Tools/Run/Phylo/PAML/Codeml.pm:572
-----------------------------------------------------------
Can't remove directory C:\DOCUME~1\MUNIRU~1\LOCALS~1\Temp\SqSqwJKDLI (Permission denied) at C:/Perl/lib/File/Temp.pm line 898
[/error]

Any idea, guys?

Munirul Islam
Phd Student
Computer Science
Wayne State University


From arareko at campus.iztacala.unam.mx  Mon Jul 23 21:19:24 2007
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Mon, 23 Jul 2007 16:19:24 -0500
Subject: [Bioperl-l] error running codeml
In-Reply-To: <20070723160614.EEU90041@mirapointms6.wayne.edu>
References: <20070723160614.EEU90041@mirapointms6.wayne.edu>
Message-ID: <46A51B5C.9080808@campus.iztacala.unam.mx>

Apparently, your script isn't able to locate the codeml executable in 
your Windows environment. Do you have the PAML package installed? 
Instructions on how to install it are located here:

http://abacus.gene.ucl.ac.uk/software/paml.html

Regards,
Mauricio.

Munirul Islam wrote:
> Hello everyone:
> 
> I am new to bioperl.  I am running perl in Eclipse in Windows.  Here is the code:
> 
> [code]
> use Bio::Tools::Run::Phylo::PAML::Codeml;
> use Bio::AlignIO;
> use Bio::TreeIO;
> 
> my $alignio = Bio::AlignIO->new(-format => 'phylip',
> 			         -file   => 'NM_000034.CDSalign.paml');
> 
> my $aln = $alignio->next_aln;
> 
> my $treeio = Bio::TreeIO->new(-format => 'newick', -file => 'tree.txt');
> my $tree   = $treeio->next_tree;
> 
> my $codeml = Bio::Tools::Run::Phylo::PAML::Codeml->new();
> 
> $codeml->alignment($aln);
> $codeml->tree($tree);
> 
> my ($rc,$parser) = $codeml->run();
> my $result = $parser->next_result;
> my $MLmatrix = $result->get_MLmatrix();
> print "Ka = ", $MLmatrix->[0]->[1]->{'dN'},"\n";
> print "Ks = ", $MLmatrix->[0]->[1]->{'dS'},"\n";
> print "Ka/Ks = ", $MLmatrix->[0]->[1]->{'omega'},"\n";
> [/code]
> 
> It gives the following error when I try to compile:
> 
> [error]
> ------------ EXCEPTION: Bio::Root::Exception -------------
> MSG: unable to find or run executable for 'codeml'
> STACK: Error::throw
> STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359
> STACK: C:/Perl/site/lib/Bio/Tools/Run/Phylo/PAML/Codeml.pm:572
> -----------------------------------------------------------
> Can't remove directory C:\DOCUME~1\MUNIRU~1\LOCALS~1\Temp\SqSqwJKDLI (Permission denied) at C:/Perl/lib/File/Temp.pm line 898
> [/error]
> 
> Any idea, guys?
> 
> Munirul Islam
> Phd Student
> Computer Science
> Wayne State University
> 

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From ba6450 at wayne.edu  Mon Jul 23 23:53:22 2007
From: ba6450 at wayne.edu (Munirul Islam)
Date: Mon, 23 Jul 2007 19:53:22 -0400 (EDT)
Subject: [Bioperl-l] error running codeml
Message-ID: <20070723195322.EEV22403@mirapointms6.wayne.edu>

Thanks Mauricio. 

I needed to add an environment variable for the paml directory. 

$ENV{'PAMLDIR'} = 'c:\paml3.15\bin'; 
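
A quick way to confirm that the directory really contains the program
before running anything, as a minimal sketch in core Perl. The sub name
find_in_dir is made up, and trying a ".exe" suffix is an assumption about
how the Windows binaries are named.

```perl
use strict;
use warnings;
use File::Spec;

# Return the full path of $program inside $dir if it exists as a file there,
# trying the bare name first and then a Windows-style ".exe" name.
sub find_in_dir {
    my ($dir, $program) = @_;
    for my $name ($program, "$program.exe") {
        my $path = File::Spec->catfile($dir, $name);
        return $path if -f $path;
    }
    return undef;
}

my $paml_dir = $ENV{PAMLDIR};
if (defined $paml_dir) {
    my $codeml = find_in_dir($paml_dir, 'codeml');
    print defined $codeml ? "found $codeml\n" : "no codeml in $paml_dir\n";
}
else {
    print "PAMLDIR is not set\n";
}
```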

One question ... I would like to save the temp files.  So, what modification do I need to make such that 
$obj->save_tempfiles returns 1 within codeml.pm? 

Regards 

Munir

---- Original message ----
>Date: Mon, 23 Jul 2007 16:19:24 -0500
>From: Mauricio Herrera Cuadra <arareko at campus.iztacala.unam.mx>  
>Subject: Re: [Bioperl-l] error running codeml  
>To: Munirul Islam <ba6450 at wayne.edu>
>Cc: bioperl-l at lists.open-bio.org
>
>Apparently, your script isn't able to locate the codeml executable in 
>your Windows environment. Do you have the PAML package installed? 
>Instructions on how to install it are located here:
>
>http://abacus.gene.ucl.ac.uk/software/paml.html
>
>Regards,
>Mauricio.
>
>Munirul Islam wrote:
>> Hello everyone:
>> 
>> I am new to bioperl.  I am running perl in Eclipse in Windows.  Here is the code:
>> 
>> [code]
>> use Bio::Tools::Run::Phylo::PAML::Codeml;
>> use Bio::AlignIO;
>> use Bio::TreeIO;
>> 
>> my $alignio = Bio::AlignIO->new(-format => 'phylip',
>> 			         -file   => 'NM_000034.CDSalign.paml');
>> 
>> my $aln = $alignio->next_aln;
>> 
>> my $treeio = Bio::TreeIO->new(-format => 'newick', -file => 'tree.txt');
>> my $tree   = $treeio->next_tree;
>> 
>> my $codeml = Bio::Tools::Run::Phylo::PAML::Codeml->new();
>> 
>> $codeml->alignment($aln);
>> $codeml->tree($tree);
>> 
>> my ($rc,$parser) = $codeml->run();
>> my $result = $parser->next_result;
>> my $MLmatrix = $result->get_MLmatrix();
>> print "Ka = ", $MLmatrix->[0]->[1]->{'dN'},"\n";
>> print "Ks = ", $MLmatrix->[0]->[1]->{'dS'},"\n";
>> print "Ka/Ks = ", $MLmatrix->[0]->[1]->{'omega'},"\n";
>> [/code]
>> 
>> It gives the following error when I try to compile:
>> 
>> [error]
>> ------------ EXCEPTION: Bio::Root::Exception -------------
>> MSG: unable to find or run executable for 'codeml'
>> STACK: Error::throw
>> STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359
>> STACK: C:/Perl/site/lib/Bio/Tools/Run/Phylo/PAML/Codeml.pm:572
>> -----------------------------------------------------------
>> Can't remove directory C:\DOCUME~1\MUNIRU~1\LOCALS~1\Temp\SqSqwJKDLI (Permission denied) at C:/Perl/lib/File/Temp.pm line 898
>> [/error]
>> 
>> Any idea, guys?
>> 
>> Munirul Islam
>> Phd Student
>> Computer Science
>> Wayne State University
>> 
>
>-- 
>MAURICIO HERRERA CUADRA
>arareko at campus.iztacala.unam.mx
>Laboratorio de Genética
>Unidad de Morfofisiología y Función
>Facultad de Estudios Superiores Iztacala, UNAM
>


From jason at bioperl.org  Tue Jul 24 07:19:18 2007
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 24 Jul 2007 09:19:18 +0200
Subject: [Bioperl-l] error running codeml
In-Reply-To: <46A51B5C.9080808@campus.iztacala.unam.mx>
References: <20070723160614.EEU90041@mirapointms6.wayne.edu>
	<46A51B5C.9080808@campus.iztacala.unam.mx>
Message-ID: <8273f6c20707240019q1f5e55c9i79a3142a92e2be6e@mail.gmail.com>

when you initialize the Codeml object just pass in my $codeml =
Bio::Tools::Run::Phylo::PAML::Codeml->new(-save_tempfiles => 1);

OR do
$codeml->save_tempfiles(1);

You may want to set your TEMPDIR as well, and you can print out where the
tempdir is located with
print $codeml->tempdir;
and I think you can get the temp outfile.
my $name = $codeml->outfile_name;
print "name is $name\n";

-jason
On 7/23/07, Mauricio Herrera Cuadra <arareko at campus.iztacala.unam.mx> wrote:
>
> Apparently, your script isn't able to locate the codeml executable in
> your Windows environment. Do you have the PAML package installed?
> Instructions on how to install it are located here:
>
> http://abacus.gene.ucl.ac.uk/software/paml.html
>
> Regards,
> Mauricio.
>
>
> Munirul Islam wrote:
> > Hello everyone:
> >
> > I am new to bioperl.  I am running perl in Eclipse in Windows.  Here is
> the code:
> >
> > [code]
> > use Bio::Tools::Run::Phylo::PAML::Codeml;
> > use Bio::AlignIO;
> > use Bio::TreeIO;
> >
> > my $alignio = Bio::AlignIO->new(-format => 'phylip',
> >                                -file   => 'NM_000034.CDSalign.paml');
> >
> > my $aln = $alignio->next_aln;
> >
> > my $treeio = Bio::TreeIO->new(-format => 'newick', -file => 'tree.txt');
> > my $tree   = $treeio->next_tree;
> >
> > my $codeml = Bio::Tools::Run::Phylo::PAML::Codeml->new();
> >
> > $codeml->alignment($aln);
> > $codeml->tree($tree);
> >
> > my ($rc,$parser) = $codeml->run();
> > my $result = $parser->next_result;
> > my $MLmatrix = $result->get_MLmatrix();
> > print "Ka = ", $MLmatrix->[0]->[1]->{'dN'},"\n";
> > print "Ks = ", $MLmatrix->[0]->[1]->{'dS'},"\n";
> > print "Ka/Ks = ", $MLmatrix->[0]->[1]->{'omega'},"\n";
> > [/code]
> >
> > It gives the following error when I try to compile:
> >
> > [error]
> > ------------ EXCEPTION: Bio::Root::Exception -------------
> > MSG: unable to find or run executable for 'codeml'
> > STACK: Error::throw
> > STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359
> > STACK: C:/Perl/site/lib/Bio/Tools/Run/Phylo/PAML/Codeml.pm:572
> > -----------------------------------------------------------
> > Can't remove directory C:\DOCUME~1\MUNIRU~1\LOCALS~1\Temp\SqSqwJKDLI
> (Permission denied) at C:/Perl/lib/File/Temp.pm line 898
> > [/error]
> >
> > Any idea, guys?
> >
> > Munirul Islam
> > Phd Student
> > Computer Science
> > Wayne State University
> >
>
>
> --
> MAURICIO HERRERA CUADRA
> arareko at campus.iztacala.unam.mx
> Laboratorio de Genética
> Unidad de Morfofisiología y Función
> Facultad de Estudios Superiores Iztacala, UNAM
>
>
>


-- 
Jason Stajich
jason at bioperl.org
http://bioperl.org/wiki/User:Jason


From ba6450 at wayne.edu  Tue Jul 24 21:16:54 2007
From: ba6450 at wayne.edu (Munirul Islam)
Date: Tue, 24 Jul 2007 17:16:54 -0400 (EDT)
Subject: [Bioperl-l] error loading sequence
Message-ID: <20070724171654.EEX04380@mirapointms6.wayne.edu>

Hello everyone:

I am having problem loading a sequence file from within a directory.  

#############################################################
$dirname = "rundir";
opendir (DIR, $dirname) || die("can't open $dirname");
      
while (defined($file = readdir(DIR))) {
    next if $file =~ /^\.\.?$/;		# skip . and ..
    $abs_path = File::Spec->rel2abs( $file ) ;
    
    # gives a file not found exception for the following code
    my $alignio = Bio::AlignIO->new(-format => 'nexus',
				-file   => $abs_path);
    my $aln = $alignio->next_aln;
    @sequencenames -> $aln->_read_taxlabels;
	  		
    foreach $taxa (@sequencenames) {
	print $taxa . "\n";
    } 		
}        
#############################################################

Your suggestions please.

Regards,

Munirul Islam
PhD Student
Computer Science
Wayne State University
Detroit, Michigan, USA


From bix at sendu.me.uk  Tue Jul 24 22:39:33 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 24 Jul 2007 23:39:33 +0100
Subject: [Bioperl-l] error loading sequence
In-Reply-To: <20070724171654.EEX04380@mirapointms6.wayne.edu>
References: <20070724171654.EEX04380@mirapointms6.wayne.edu>
Message-ID: <46A67FA5.3070505@sendu.me.uk>

Munirul Islam wrote:
> Hello everyone:
> 
> I am having problem loading a sequence file from within a directory.  
> 
> #############################################################
> $dirname = "rundir";
> opendir (DIR, $dirname) || die("can't open $dirname");
>       
> while (defined($file = readdir(DIR))) {
>     next if $file =~ /^\.\.?$/;		# skip . and ..
>     $abs_path = File::Spec->rel2abs( $file ) ;
>     
>     # gives a file not found exception for the following code

This isn't a Bioperl problem. You're using the wrong File::Spec method. 
You want File::Spec->catfile($dirname, $file).
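
The difference is that rel2abs resolves the bare file name against the
current working directory, while readdir returns names relative to the
directory that was opened. A minimal sketch of the corrected loop in core
Perl follows; the sub name files_in is made up for illustration.

```perl
use strict;
use warnings;
use File::Spec;

# Return the paths of all plain files in $dirname, each one joined to the
# directory name so it can be opened from the current working directory.
sub files_in {
    my ($dirname) = @_;
    opendir(my $dh, $dirname) or die "can't open $dirname: $!";
    my @paths;
    while (defined(my $file = readdir $dh)) {
        my $path = File::Spec->catfile($dirname, $file);
        next unless -f $path;   # skips . and .. as well as subdirectories
        push @paths, $path;
    }
    closedir $dh;
    my @sorted = sort @paths;
    return @sorted;
}
```

Each returned path can then be passed as the -file argument, for example
to Bio::AlignIO->new as in the original script.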


From ba6450 at wayne.edu  Wed Jul 25 00:10:04 2007
From: ba6450 at wayne.edu (Munirul Islam)
Date: Tue, 24 Jul 2007 20:10:04 -0400 (EDT)
Subject: [Bioperl-l] error loading sequence
Message-ID: <20070724201004.EEX30791@mirapointms6.wayne.edu>

Thanks.  That worked nicely.  I need your suggestion to load codeml control data from a file.  Consider the following code:

-------------------------------------------------------------
my $codeml_null = Bio::Tools::Run::Phylo::PAML::Codeml->new(-save_tempfiles => 1,
-params =>	{'noisy' => 9,
		 'verbose' => 2,
		 'runmode' => 0,
		 'seqtype' => 1,
		 'CodonFreq' => 2,
		 'aaDist' => 0,
		 'model' => 2,
		 'NSsites' => 2,
		 'icode' => 0	});
-------------------------------------------------------------

I tried to modify it by passing a hash reference after loading the data from a file:

-------------------------------------------------------------
my $codeml_null = Bio::Tools::Run::Phylo::PAML::Codeml->new(-save_tempfiles => 1,
-params => \%hashlist );
-------------------------------------------------------------

Still that didn't work.  Your suggestions pls.
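
For reference, loading the data into the hash could look like the minimal
sketch below. It assumes a control file with one "key = value" pair per
line and "*" starting a comment, as in a codeml.ctl file; the sub name
read_ctl is made up, and whether the resulting hash is accepted unchanged
by -params is not verified here.

```perl
use strict;
use warnings;

# Read "key = value" lines from a filehandle into a hash reference.
# Text after "*" is treated as a comment; other lines are skipped.
sub read_ctl {
    my ($fh) = @_;
    my %params;
    while (my $line = <$fh>) {
        $line =~ s/\*.*//;   # drop a trailing "* comment"
        next unless $line =~ /^\s*(\w+)\s*=\s*(.*?)\s*$/;
        $params{$1} = $2;
    }
    return \%params;
}

# Demonstration on an in-memory "file" so the sketch is self-contained.
my $text = "noisy = 9   * how much output\nNSsites = 2\n\nmodel = 2\n";
open(my $fh, '<', \$text) or die "cannot open in-memory file: $!";
my $params = read_ctl($fh);
close $fh;

print "$_ => $params->{$_}\n" for sort keys %$params;
```

Printing the hash like this before handing it to the constructor makes it
easy to see whether the keys and values were read as intended.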

Munir

---- Original message ----
>Date: Tue, 24 Jul 2007 23:39:33 +0100
>From: Sendu Bala <bix at sendu.me.uk>  
>Subject: Re: [Bioperl-l] error loading sequence  
>To: Munirul Islam <ba6450 at wayne.edu>
>Cc: bioperl-l at lists.open-bio.org
>
>Munirul Islam wrote:
>> Hello everyone:
>> 
>> I am having a problem loading a sequence file from within a directory.
>> 
>> #############################################################
>> $dirname = "rundir";
>> opendir (DIR, $dirname) || die("can't open $dirname");
>>       
>> while (defined($file = readdir(DIR))) {
>>     next if $file =~ /^\.\.?$/;		# skip . and ..
>>     $abs_path = File::Spec->rel2abs( $file ) ;
>>     
>>     # gives a file not found exception for the following code
>
>This isn't a Bioperl problem. You're using the wrong File::Spec method. 
>You want File::Spec->catfile($dirname, $file).


From ba6450 at wayne.edu  Thu Jul 26 19:21:20 2007
From: ba6450 at wayne.edu (Munirul Islam)
Date: Thu, 26 Jul 2007 15:21:20 -0400 (EDT)
Subject: [Bioperl-l] Alignment works with codeml (not loading in bioperl)
Message-ID: <20070726152120.EFA94600@mirapointms6.wayne.edu>

Hello Everyone:

I have an alignment ('seq.txt').  It runs fine when I directly run codeml.  But when I try to load the alignment in bioperl, I get an error saying that the sequence is not interleaved.

my $alignio = Bio::AlignIO->new(-format => 'phylip',
				-file   => 'seq.txt');

I guess it's not in valid phylip format.

I tried to change 'seq.txt' to sequential format.  That still didn't work.

Any suggestions on how to load 'seq.txt' in bioperl?  

Thanks,

Munir
PhD Student
Computer Science
Wayne State University
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: seq.txt
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070726/7c180f0b/attachment-0004.txt>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: seq.out
Type: application/octet-stream
Size: 24318 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070726/7c180f0b/attachment-0004.obj>

From jason at bioperl.org  Fri Jul 27 00:12:03 2007
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 26 Jul 2007 17:12:03 -0700
Subject: [Bioperl-l] Alignment works with codeml (not loading in bioperl)
In-Reply-To: <20070726152120.EFA94600@mirapointms6.wayne.edu>
References: <20070726152120.EFA94600@mirapointms6.wayne.edu>
Message-ID: <8273f6c20707261712o149fb884v2044421146e8bc24@mail.gmail.com>

You can try passing -interleaved => 0 as an additional option when you
initialize your AlignIO object.
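
A minimal sketch of that call, reusing the file name from the question. It assumes BioPerl is installed and 'seq.txt' is in the current directory; whether this option alone is enough for this particular file is not confirmed here.

```perl
use strict;
use warnings;
use Bio::AlignIO;

# Tell the phylip parser to expect sequential (non-interleaved) input.
my $alignio = Bio::AlignIO->new(-format      => 'phylip',
                                -file        => 'seq.txt',
                                -interleaved => 0);

# Read each alignment in the file and report its length in columns.
while (my $aln = $alignio->next_aln) {
    printf "alignment length: %d\n", $aln->length;
}
```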

On 7/26/07, Munirul Islam <ba6450 at wayne.edu> wrote:
> Hello Everyone:
>
> I have an alignment ('seq.txt').  It runs fine when I directly run codeml.  But when I try to load the alignment in bioperl, I get an error saying that the sequence is not interleaved.
>
> my $alignio = Bio::AlignIO->new(-format => 'phylip',
>                                 -file   => 'seq.txt');
>
> I guess it's not in valid phylip format.
>
> I tried to change 'seq.txt' to sequential format.  That still didn't work.
>
> Any suggestions on how to load 'seq.txt' in bioperl?
>
> Thanks,
>
> Munir
> PhD Student
> Computer Science
> Wayne State University
>
>      11     2202
>
> human
> GCT GCC CCC AGA CAG CGC --- CAG TCC --- ACC CTT --- GTG CTC TTC CCT GGA GAC CTG
> AGA ACC NNN AAT CTC ACC GAC AGG CAG CTG GCA GAG GAA TAC CTG TAC CGC TAT GGT TAC
> ACT CGG GTG GCA GAG ATG CGT --- GGA GAG TCG AAA --- TCT CTG GGG --- CCT GCG CTG
> CTG CTT CTC CAG AAG CAA CTG TCC CTG CCC GAG ACC GGT GAG CTG GAT AGC GCC ACG CTG
> AAG GCC ATG CGA ACC CCA CGG TGC GGG GTC CCA GAC CTG GGC AGA TTC CAA ACC TTT GAG
> GGC GAC CTC AAG TGG CAC CAC CAC AAC ATC ACC TAT TGG ATC CAA AAC TAC TCG GAA GAC
> TTG CCG CGG GCG GTG ATT GAC GAC GCC TTT GCC CGC GCC TTC GCA CTG TGG AGC GCG GTG
> ACG CCG CTC ACC TTC ACT CGC GTG TAC AGC CGG GAC GCA GAC ATC GTC ATC CAG TTT GGT
> GTC GCG GAG CAC GGA GAC GGG TAT CCC TTC GAC GGG AAG GAC GGG CTC CTG GCA CAC GCC
> TTT CCT CCT GGC CCC GGC ATT CAG GGA GAC GCC CAT TTC GAC GAT GAC GAG TTG TGG TCC
> CTG GGC AAG GGC GTC GTG GTT CCA ACT CGG TTT GGA AAC GCA GAT GGC GCG GCC TGC CAC
> TTC CCC TTC ATC TTC GAG GGC CGC TCC TAC TCT GCC TGC ACC ACC GAC GGT CGC TCC GAC
> GGC TTG CCC TGG TGC AGT ACC ACG GCC AAC TAC GAC ACC GAC GAC CGG TTT GGC TTC TGC
> CCC AGC GAG AGA CTC TAC ACC CAG GAC GGC AAT GCT GAT GGG AAA CCC TGC CAG TTT CCA
> TTC ATC TTC CAA GGC CAA TCC TAC TCC GCC TGC ACC ACG GAC GGT CGC TCC GAC GGC TAC
> CGC TGG TGC GCC ACC ACC GCC AAC TAC GAC CGG GAC AAG CTC TTC GGC TTC TGC CCG ACC
> CGA GCT GAC TCG ACG GTG ATG GGG GGC AAC TCG GCG GGG GAG CTG TGC GTC TTC CCC TTC
> ACT TTC CTG GGT AAG GAG TAC TCG ACC TGT ACC AGC GAG GGC CGC GGA GAT GGG CGC CTC
> TGG TGC GCT ACC ACC TCG AAC TTT GAC AGC GAC AAG AAG TGG GGC TTC TGC CCG GAC CAA
> GGA TAC AGT TTG TTC CTC GTG GCG GCG CAT GAG TTC GGC CAC GCG CTG GGC TTA GAT CAT
> TCC TCA GTG CCG GAG GCG CTC ATG TAC CCT ATG TAC CGC TTC ACT GAG GGG CCC CCC TTG
> CAT AAG GAC GAC GTG AAT GGC ATC CGG CAC CTC TAT GGT CCT CGC CCT GAA CCT GAG CCA
> CGG CCT CCA ACC ACC ACC ACN NNN NNN NNN NNN NNA --- CCG CAG CCC ACG GCT CCC CCG
> ACG GTC TGC CCC ACC GGA CCC CCC ACT GTC CAC CCC TCA GAG CGC CCC ACA GCT GGC CCC
> ACA GGT CCC CCC TCA GCT GGC CCC ACA GGT NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN
> NNN NNN NNN NNN NNN NNN CCC --- CCC ACT GCT GGC CCT TCT ACG GCC ACT ACT --- GTG
> CCT TTG AGT CCG GTG GAC GAT GCC TGC AAC GTG AAC ATC TTC GAC GCC ATC GCG GAG ATT
> GGG AAC CAG CTG TAT TTG TTC AAG GAT GGG AAG TAC TGG --- --- CGA TTC TCT GAG ---
> --- GGC AGG GGG AGC CGG CCG CAG GGC CCC TTC CTT ATC GCC GAC AAG TGG CCC GCG CTG
> CCC CGC AAG CTG GAC TCG GTC TTT GAG GAG CGG CTC TCC AAG AAG CTT TTC TTC TTC TCT
> GGG CGC CAG GTG TGG GTG TAC ACA GGC GCG TCG GTG CTG GGC CCG AGG CGT CTG GAC AAG
> CTG GGC CTG GGA GCC GAC GTG GCC CAG GTG ACC GGG GCC CTC --- CGG AGT GGC AGG GGG
> AAG ATG CTG CTG TTC AGC GGG CGG CGC CTC TGG AGG TTC GAC GTG AAG GCG CAG ATG GTG
> GAT CCC CGG AGC GCC AGC GAG --- --- GTG GAC CGG ATG TTC CCC GGG GTG CCT TTG GAC
> ACG CAC GAC GTC TTC CAG TAC CGA --- GAG AAA GCC TAT --- TTC TGC CAG GAC CGC TTC
> TAC TGG CGC GTG AGT TCC CGG AGT NNN NNN NNN NNN NNN NNN NNN GAG TTG AAC CAG GTG
> GAC CAA GTG GGC TAC GTG ACC TAT GAC ATC CTG CAG TGC CCT
> chimp
> GCT GCC CCC AGA CAG CGC --- CAG TCC --- ACC CTT --- GTG CTC TTC CCT GGA GAC CTG
> AGA ACC NNN AAT CTC ACC GAC AGG CAG CTG GCA GAG GAA TAC CTG TAC CGC TAT GGT TAC
> ACT CGG GTG GCA GAG ATG CGT --- GGA GAG TCG AAA --- TCT CTG GGG --- CCT GCG CTG
> CTG CTT CTC CAG AAG CAA CTG TCC CTG CCC GAG ACC GGT GAG CTG GAC AGC GCC ACG CTG
> AAG GCC ATG CGA ACC CCA CGG TGC GGG GTC CCA GAC CTG GGC AAA TTC CAA ACC TTT GAG
> GGC GAC CTC AAG TGG CAC CAC CAC AAC ATC ACC TAT TGG ATC CAA AAC TAC TCG GAA GAC
> TTG CCG CGG GCG GTG ATT GAC GAC GCC TTT GCC CGC GCC TTC GCA CTG TGG AGC GCG GTG
> ACG CCG CTC ACC TTC ACT CGC GTG TAC AGC CGG GAC GCA GAC ATC GTC ATC CAG TTT GGT
> GTC GCG GAG CAC GGA GAC GGG TAT CCC TTC GAC GGG AAG GAC GGG CTC CTG GCA CAC GCC
> TTT CCT CCT GGC CCC GGC ATT CAG GGA GAC GCC CAT TTC GAC GAT GAC GAG TTG TGG TCC
> CTG GGC AAG GGC GTC GTG GTT CCA ACT CGG TTT GGA AAC GCA GAT GGC GCG GCC TGC CAC
> TTC CCC TTC ATC TTC GAG GGC CGC TCC TAC TCT GCC TGC ACC ACC GAC GGT CGC TCC GAC
> GGC TTG CCC TGG TGC AGT ACC ACG GCC AAC TAC GAC ACC GAC GAC CGG TTT GGC TTC TGC
> CCC AGC GAG AGA CTT TAC ACC CAG GAT GGC AAT GCT GAT GGG AAA CCC TGC CAG TTT CCA
> TTC ATC TTC CAA GGC CAA TCC TAC TCC GCC TGC ACC ACG GAC GGT CGC TCC GAC GGC TAC
> CGC TGG TGC GCC ACC ACC GCC AAC TAC GAC CGG GAC AAG CTC TTC GGC TTC TGC CCG ACC
> CGA GCT GAC TCG ACG GTG ATG GGG GGC AAC TCG GCG GGG GAG CTG TGC GTC TTC CCC TTC
> ACT TTC CTG GGT AAG GAG TAC TCG ACC TGT ACC AGC GAG GGC CGC GGA GAT GGG CGC CTC
> TGG TGC GCT ACC ACC TCG AAC TTT GAC AGC GAC AAG AAG TGG GGC TTC TGC CCG GAC CAA
> GGA TAC AGT TTG TTC CTC GTG GCG GCG CAT GAG TTC GGC CAC GCG CTG GGC TTA GAT CAT
> TCC TCA GTG CCG GAG GCG CTC ATG TAC CCT ATG TAC CGC TTC ACT GAG GGG CCC CCC TTG
> CAT AAG GAC GAC GTG AAT GGC ATC CGG CAC CTC TAT GGT CCT CGC CCT GAA CCT GAG CCA
> CGG CCT CCA ACC ACC ACC ACN NNN NNN NNN NNN NNA --- CCG CAG CCC ACG GCT CCC CCG
> ACG GTC TGC CCC ACC GGA CCC CCC ACT GTC CGC CCC TCA GAG CGC CCC ACA GCT GGC CCC
> ACA GGT CCC CCC NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN
> NNN NNN NNN NNN NNN NNN NNN --- NNN ACT GCT GGC CCT TCT ACG GCC ACT ACT --- GTG
> CCT TTG AGT CCG GTG GAC GAT GCC TGC AAC GTG AAC ATC TTC GAC GCC ATC GCG GAG ATT
> GGG AAC CAG CTG TAT TTG TTC AAG GAT GGG AAG TAC TGG --- --- CGA TTC TCT GAG ---
> --- GGC AGG GGG AGC CGG CCG CAG GGC CCC TTC CTT ATC GCC GAC AAG TGG CCC GCG CTG
> CCC CGC AAG CTG GAC TCG GTC TTT GAG GAG CCG CTC TCC AAG AAG CTT TTC TTC TTC TCT
> GGG CGC CAG GTG TGG GTG TAC ACA GGC GCG TCG GTG CTG GGC CCG AGG CGT TTG GAC AAG
> CTG GGC CTG GGC GCC GAC GTG GCC CAG GTG ACC GGG GCC CTC --- CGG AGT GGC AGG GGG
> AAG ATG CTG CTG TTC AGC GGG CGG CGC CTC TGG AGG TTC GAC GTG AAG GCG CAG ATG GTG
> GAT CCC CGG AGC GCC AGC GAG --- --- GTG GAC CGG ATG TTC CCC GGG GTG CCT TTG GAC
> ACG CAC GAC GTC TTC CAG TAC CGA --- GAG AAA GCC TAT --- TTC TGC CAG GAC CGC TTC
> TAC TGG CGC GTG AGT TCC CGG AAT NNN NNN NNN NNN NNN NNN NNN GAG TTG AAC CAG GTG
> GAC CAA GTG GGC TAC GTG ACC TAT GAC ATC CTG CAG TGC CCT
> macaca
> GCT GCC CCC AGA CAG CGC --- CAG TCC --- ACC CTT --- GTG CTC TTC CCT GGA GAC CTG
> AAA ACC NNN AAT CTC ACT GAC AGG CAG CTG GCA GAG GAC TAC CTG TAC CGC TAT GGT TAC
> ACT CGG GTG GCA GAG ATG CAT --- GGA GAC TCG AAA --- TCT CTG GGG --- CCT GCG CTG
> CTG CTT CTC CAG AAG CAA CTG TCC CTG CCC CAG ACC GGT GAG CTA GAC AGC GCC ACG CTG
> AAG GCC ATG CGA ACC CCA CGG TGC GGG GTC CCA GAC CTG GGC AGA TTC CAA ACC TTT GAG
> GGC GAC CTC AAG TGG CAC CAC CAC AAC ATC ACC TAT TGG ATC CAA AAC TAC TCG GAA GAC
> TTG CCG CGG GCG GTG ATT GAA GAC GCC TTT GCC CGC GCC TTT GCG CTG TGG AGC GCG GTG
> ACG CCG CTC ACC TTC ACT CGC GTG TAC AGC CGG GAC GCA GAC ATT GTC ATC CAG TTT GGT
> GTC GCG GAG CAC GGA GAC GGG TAT CCC TTC GAC GGG AAG GAC GGG CTC CTG GCA CAC GCC
> TTT CCT CCT GGG CCC GGC ATT CAG GGA GAC GCC CAT TTC GAC GAT GAC GAG TTG TGG TCG
> CTG GGC AAG GGC GTC GTG GTT CCA ACT AAG TTT GGA AAC GCA GAT GGC GCG GCC TGC CAC
> TTC CCC TTC ACC TTC GAG GGC CGC TCC TAC TCT GCC TGC ACC ACA GAC GGT CGC TCC GAC
> GGC GTG CCC TGG TGC AGT ACC ACA GCC AAC TAC GAC ACT GAC CGC CGG TTT GGC TTC TGT
> CCC AGC GAG AGA CTC TAC ACC CAG GAC GGC AAT GCT GAC GGG AAA CCC TGC CAG TTT CCA
> TTC ATC TTC CAA GGC CAA TCC TAC TCC GCC TGC ACC ACG GAC GGT CGC TCC GAC GGC TAC
> CGC TGG TGC GCC ACC ACC GCC AAC TAC GAC CAG GAC AAG CTC TAC GGC TTC TGC CCG ACC
> CGA GCC GAC TCG ACC GTG ATC GGG GGC AAC TCG GCG GGG GAG CTG TGC GTT TTC CCC TTC
> ACC TTC CTG GGT AAG GAG TAC TCG ACC TGT ACC AGC GAG GGC CGC GGA GAT GGG CGC CTC
> TGG TGC GCT ACC ACC TCG AAC TTT GAC AGA GAC AAG AAG TGG GGC TTC TGC CCG GAC CAA
> GGA TAC AGT CTG TTC CTC GTG GCA GCT CAC GAA TTC GGC CAC GCG CTG GGC TTA GAT CAT
> ACC TCA GTG CCG GAG GCG CTC ATG TAC CCT ATG TAC CGA TTC ACT GAG GAG CCC CCC TTG
> CAT AAG GAC GAC GTG AAT GGC ATC CAG TAT CTC TAT GGT TCT CGC CCT GAA CCT GAG CCA
> CGG CCT CCA ACC ACC ACC ACN NNN NNN NNN NNN NNA --- CCG CAG CCC ACG GCT CCC CCG
> ACG GTC TGC CCC ACT GGA CCC CCC ACT GTC CGC CCC TCA GAC CGC CCC ACA GCC GGC CCC
> ACA GGT CCC CCC TCA GCT GGC CCC ACA GGT NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN
> NNN NNN NNN NNN NNN NNN CCC --- CCC ACT GCT GGC CCT TCT ACG ACC ACT ACT --- GTG
> CCT TTG AAT CCG GTG GAC GAT GCC TGC AAC GTG AAC ATC TTC GAC GCC ATC ACG GAG ATC
> GGG AAC CAG CTG TAT CTG TTC AAG GAT GGG AGG TAC TGG --- --- CGA TTC TCC GAG ---
> --- CGC AGG GGG AGC CGG CTG CAG GGC CCC TTC CTT ATC GCC GAC ACG TGG CCC GCG TTG
> CCC CGC AAG CTG GAC TCG GCC TTT GAG GAG CCG CTC TCC AAG AAG CTT TTC TTC TTC TCT
> GGG CGC CAG GTG TGG GTG TAC ACA GGC GCG TCG GTG CTG GGC CCG AGG CGT CTA GAC AAG
> CTG GGC CTG GGC GCC GAC GTG GCC CAG GTG ACC GGG GCC CTC --- CGG CGT GGC GCG GGG
> AAG ATG CTG CTA TTC AGC GGG CGG CGC TTC TGG AGG TTC GAC GTG AAG GCG CAG ATG GTG
> GAT CCC CGG AGC GCC AGC GAG --- --- GTA GAC CGG ATG TTC CCC GGG GTG CCT TTG GAC
> ACG CAC GAC GTC TTC CAG TAC CAA --- GAG AAA GCC TAT --- TTC TGC CAG GAC CGC TTC
> TAC TGG CGC GTG AGT TTC CAG AGT NNN NNN NNN NNN NNN NNN NNN GGG GTG AAC CAG GTG
> GAC CAA GTG GGC TAC GTG ACC TAT GAC ATC CTG CAG TGC CCT
> mouse
> GCT GCC CCT TAC CAG CGC --- CAG CCG --- ACT TTT --- GTG GTC TTC CCC AAA GAC CTG
> AAA ACC TCC AAC CTC ACG GAC ACC CAG CTG GCA GAG GCA TAC TTG TAC CGC TAT GGT TAC
> ACC CGG GCC GCC CAG ATG ATG --- GGA GAG AAG CAG --- TCT CTA CGG --- CCG GCT TTG
> CTG ATG CTT CAG AAG CAG CTC TCC CTG CCC CAG ACT GGT GAG CTG GAC AGC CAG ACA CTA
> AAG GCC ATT CGA ACA CCA CGC TGT GGT GTC CCA GAC GTG GGT CGA TTC CAA ACC TTC AAA
> GGC NNN CTC AAG TGG GAC CAT CAT AAC ATC ACA TAC TGG ATC CAA AAC TAC TCT GAA GAC
> TTG CCG CGA GAC ATG ATC GAT GAC GCC TTC GCG CGC GCC TTC GCG GTG TGG GGC GAG GTG
> GCA CCC CTC ACC TTC ACC CGC GTG TAC GGA CCC GAA GCG GAC ATT GTC ATC CAG TTT GGT
> GTC GCG GAG CAC GGA GAC GGG TAT CCC TTC GAC GGC AAG GAC GGC CTT CTG GCA CAC GCC
> TTT CCC CCT GGC GCC GGC GTT CAG GGA GAT GCC CAT TTC GAC GAC GAC GAG TTG TGG TCG
> CTG GGC AAA GGC GTC GTG ATC CCC ACT TAC TAT GGA AAC TCA AAT GGT GCC CCA TGT CAC
> TTT CCC TTC ACC TTC GAG GGA CGC TCC TAT TCG GCC TGC ACC ACA GAC GGC CGC AAC GAC
> GGC ACG CCT TGG TGT AGC ACA ACA GCT GAC TAC GAT AAG GAC GGC AAA TTT GGT TTC TGC
> CCT AGT GAG AGA CTC TAC ACG GAG CAC GGC AAC GGA GAA GGC AAA CCC TGT GTG TTC CCG
> TTC ATC TTT GAG GGC CGC TCC TAC TCT GCC TGC ACC ACT AAA GGC CGC TCG GAT GGT TAC
> CGC TGG TGC GCC ACC ACA GCC AAC TAT GAC CAG GAT AAA CTG TAT GGC TTC TGC CCT ACC
> CGA GTG GAC GCG ACC GTA GTT GGG GGC AAC TCG GCA GGA GAG CTG TGC GTC TTC CCC TTC
> GTC TTC CTG GGC AAG CAG TAC TCT TCC TGT ACC AGC GAC GGC CGC AGG GAT GGG CGC CTC
> TGG TGT GCG ACC ACA TCG AAC TTC GAC ACT GAC AAG AAG TGG GGT TTC TGT CCA GAC CAA
> GGG TAC AGC CTG TTC CTG GTG GCA GCG CAC GAG TTC GGC CAT GCA CTG GGC TTA GAT CAT
> TCC AGC GTG CCG GAA GCG CTC ATG TAC CCG CTG TAT AGC TAC CTC GAG GGC TTC CCT CTG
> AAT AAA GAC GAC ATA GAC GGC ATC CAG TAT CTG TAT GGT CGT GGC TCT AAG CCT GAC CCA
> AGG CCT CCA GCC ACC ACC ACA ACT NNN NNN NNN GAA --- CCA CAG CCG ACA GCA CCT CCC
> ACT ATG TGT CCC ACT ATA CCT CCC ACG GCC TAT CCC ACA GTG GGC CCC ACG GTT GGC CCT
> ACA GGC GCC CCC TCA CCT GGC CCC ACA AGC AGC CCG TCA CCT GGC CCT ACA GGC GCC CCC
> TCA CCT GGC CCT ACA GCG CCC --- CCT ACT GCG GGC TCT TCT GAG GCC TCT ACA --- GAG
> TCT TTG AGT CCG GCA GAC AAT CCT TGC AAT GTG GAT GTT TTT GAT GCT ATT GCT GAG ATC
> CAG GGC GCT CTG CAT TTC TTC AAG GAC GGT TGG TAC TGG --- --- AAG TTC CTG AAT ---
> --- CAT AGA GGA AGC CCA TTA CAG GGC CCC TTC CTT ACT GCC CGC ACG TGG CCA GCC CTG
> CCT GCA ACG CTG GAC TCC GCC TTT GAG GAT CCG CAG ACC AAG AGG GTT TTC TTC TTC TCT
> GGA CGT CAA ATG TGG GTG TAC ACA GGC AAG ACC GTG CTG GGC CCC AGG AGT CTG GAT AAG
> TTG GGT CTA GGC CCA GAG GTA ACC CAC GTC AGC GGG CTT CTC CCG CGT CGT CTC --- GGG
> AAG GCT CTG CTG TTC AGC AAG GGG CGT GTC TGG AGA TTC GAC TTG AAG TCT CAG AAG GTG
> GAT CCC CAG AGC GTC ATT CGC --- --- GTG GAT AAG GAG TTC TCT GGT GTG CCC TGG AAC
> TCA CAC GAC ATC TTC CAG TAC CAA --- GAC AAA GCC TAT --- TTC TGC CAT GGC AAA TTC
> TTC TGG CGT GTG AGT TTC CAA AAT GAG GTG AAC AAG GTG GAC CAT GAG GTG AAC CAG GTG
> GAC GAC GTG GGC TAC GTG ACC TAC GAC CTC CTG CAG TGC CCT
> rat
> GCT GCC CCT CAC CAG CGC --- CAG CCG --- ACT TAT --- GTG GTC TTC CCC CGA GAC CTG
> AAA ACC TCC AAC CTC ACG GAC ACA CAG CTG GCA GAG GAT TAC CTG TAC CGC TAT GGT TAC
> ACT CGG GCA GCC CAG ATG ATG --- GGA GAG AAG CAG --- TCC CTG CGG --- CCC GCT TTG
> CTG ATG CTT CAG AAG CAG CTG TCC CTG CCC CAG ACT GGT GAG CTG GAC AGC GAG ACA CTA
> AAG GCC ATT CGT TCA CCG CGC TGT GGT GTC CCA GAC GTG GGC AAA TTC CAA ACC TTC GAA
> GGC GAC CTC AAG TGG CAC CAT CAT AAC ATC ACC TAT TGG ATC CAA AGC TAC ACC GAA GAC
> TTG CCG CGA GAC GTG ATC GAT GAC TCC TTC GCG CGC GCC TTC GCG GTG TGG AGC GCG GTG
> ACA CCG CTC ACC TTC ACC CGC GTG TAC GGG CTC GAA GCA GAC ATT GTC ATC CAG TTT GGT
> GTC GCG GAG CAC GGG GAC GGG TAT CCC TTC GAC GGC AAG GAT GGT CTA CTG GCA CAC GCC
> TTT CCC CCT GGC CCC GGC ATT CAG GGA GAT GCC CAC TTC GAC GAC GAC GAG TTG TGG TCG
> CTG GGC AAA GGC GCC GTG GTC CCC ACT TAC TTT GGA AAC GCA AAT GGT GCC CCA TGT CAC
> TTT CCC TTC ACC TTC GAG GGA CGC TCC TAT TTG TCC TGC ACC ACG GAT GGC CGC AAC GAC
> GGC AAG CCT TGG TGT GGC ACG ACA GCT GAC TAC GAC ACA GAC AGA AAA TAT GGT TTC TGC
> CCC AGT GAG AAT CTC TAC ACG GAG CAT GGC AAC GGA GAC GGC AAA CCC TGC GTA TTT CCA
> TTC ATC TTC GAG GGC CAC TCC TAC TCT GCC TGC ACC ACT AAA GGT CGC TCG GAT GGT TAT
> CGC TGG TGC GCC ACC ACC GCC AAC TAT GAC CAG GAT AAG CTG TAT GGC TTC TGT CCT ACT
> CGA GCC GAC GTC ACT GTA ACT GGG GGC AAC TCG GCA GGA GAG ATG TGC GTC TTC CCC TTC
> GTC TTC CTG GGC AAG CAG TAC TCT ACC TGT ACC GGC GAG GGC CGC AGT GAT GGG CGC CTC
> TGG TGC GCG ACG ACG TCG AAC TTC GAC GCT GAC AAG AAG TGG GGT TTC TGT CCA GAC CAA
> GGG TAC AGC CTG TTT CTG GTG GCA GCG CAC GAG TTC GGC CAT GCG CTG GGC TTA GAT CAT
> TCT TCA GTG CCG GAA GCG CTC ATG TAC CCC ATG TAT CAC TAC CAC GAG GAC TCC CCT CTG
> CAT GAA GAC GAC ATA AAA GGC ATC CAG CAT CTG TAT GGT CGT GGC TCT AAA CCT GAC CCA
> AGG CCT CCA GCC ACC ACC GCA GCT NNN NNN NNN GAA --- CCA CAG CCG ACA GCT CCT CCC
> ACT ATG TGT CCC ACT GCA CCT CCC ATG GCC TAT CCC ACA GGG GGC CCC ACA GTC GCC CCT
> ACA GGC GCC CCC TCA CCT GGC CCC ACA GGT NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN
> NNN NNN NNN NNN NNN NNN CCC --- CCT ACT GCT GGT CCT TCT GAG GCC CCT ACA --- GAG
> TCT TCG ACT CCA GTA GAC AAT CCT TGC AAT GTG GAT GTT TTT GAT GCC ATT GCT GAT ATC
> CAG GGC GCT CTG CAT TTC TTC AAG GAC GGT CGG TAT TGG --- --- AAG TTC TCG AAT ---
> --- CAC GGA GGA AGC CAA TTG CAG GGC CCC TTT CTT ATT GCC CGC ACG TGG CCA GCT TTG
> CCT GCA AAG TTG AAC TCA GCC TTT GAG GAT CCG CAG TCC AAG AAG ATT TTC TTC TTC TCT
> GGG CGC AAA ATG TGG GTG TAC ACA GGC CAG ACG GTG CTG GGC CCC AGG AGT CTG GAT AAG
> TTG GGG CTA GGC TCA GAG GTA ACC CTG GTC ACC GGA CTT CTC CCG CGT CGT GGA --- GGG
> AAG GCT CTG CTG ATC AGC CGG GAA CGT ATC TGG AAA TTC GAC TTG AAG TCT CAG AAG GTG
> GAT CCC CAG AGC GTT ACT CGC --- --- TTG GAT AAC GAG TTC TCT GGC GTG CCC TGG AAC
> TCA CAC AAC GTC TTT CAC TAC CAA --- GAC AAG GCC TAT --- TTC TGC CAT GAC AAA TAC
> TTC TGG CGT GTG AGT TTC CAC AAC NNN NNN NNN NNN NNN NNN NNN CGG GTG AAC CAG GTG
> GAC CAC GTG GCC TAC GTG ACC TAT GAC CTC CTG CAG TGC CCT
> rabbit
> GCC GCC CCT CGC CGC CGC --- CAG CCC --- ACC TTG --- GTG GTC TTC CCA GGA GAG CTG
> AGA ACC NNN AGG CTC ACC GAC AGG CAG CTG GCA GAG GAG TAC CTG TTC CGC TAT GGT TAC
> ACC CGC GTA GCC AGC ATG CAC --- GGA GAC AGC CAG --- TCC CTG CGG CTG CCG --- CTG
> CTA CTT CTG CAG AAG CAT CTG TCC CTG CCG GAG ACG GGG GAG CTG GAT AAT GCC ACC CTG
> GAG GCC ATG CGG GCC CCG CGC TGC GGC GTC CCG GAC GTG GGC AAA TTC CAG ACC TTC GAG
> GGT GAC CTC AAG TGG CAC CAC CAC AAC ATC ACG TAC TGG ATC CAA AAC TAC TCC GAA GAC
> CTG CCG CGC GAC GTC ATC GAC GAC GCC TTC GCC CGC GCC TTT GCG CTG TGG AGC GCG GTG
> ACG CCA CTC ACC TTC ACC CGC GTG TAC AGC CGG GAC GCA GAC ATT GTC ATC CAG TTT GGG
> GTC GCG GAG CAC GGA GAC GGG TAT CCT TTC GAC GGC AAG GAC GGG CTC CTG GCG CAC GCC
> TTC CCT CCT GGC CCC GGC ATT CAG GGA GAT GCC CAC TTC GAC GAC GAA GAG CTG TGG TCC
> CTG GGC AAG GGC GTC GTG GTT CCC ACG TAC TTT GGA AAC GCC GAC GGC GCC CCC TGC CAC
> TTC CCC TTC ACC TTC GAG GGC CGC TCC TAC ACC GCC TGC ACC ACG GAC GGC CGC TCT GAC
> GGC ATG GCC TGG TGC AGC ACC ACG GCC GAC TAT GAC ACC GAC CGT CGG TTT GGC TTC TGC
> CCC AGC GAA AGA CTC TAC ACC CAG GAC GGC AAC GCA GAC GGC AAG CCC TGC GAG TTT CCG
> TTC ATC TTC CAG GGC CGT ACC TAC TCC GCC TGC ACC ACG GAC GGC CGC TCC GAC GGC CAC
> CGC TGG TGC GCC ACC ACC GCC AGC TAC GAC AAG GAC AAG CTC TAC GGC TTC TGC CCG ACC
> CGA GCT GAC TCC ACG GTG GTC GGG GGC AAC TCG GCG GGA GAG CTG TGT GTC TTC CCC TTC
> GTC TTC CTG GGC AAA GAG TAC TCG TCC TGT ACC AGC GAG GGT CGC AGG GAT GGG CGC CTC
> TGG TGT GCC ACC ACT TCC AAC TTT GAC AGC GAC AAG AAG TGG GGC TTC TGC CCT GAT AAA
> GGA TAC AGC CTG TTC CTC GTG GCA GCC CAC GAG TTC GGC CAT GCA CTG GGC TTG GAT CAC
> TCC TCT GTG CCG GAG CGC CTC ATG TAC CCC ATG TAC CGC TAC CTA GAG GGG TCC CCC CTG
> CAC GAG GAC GAC GTC AGG GGC ATC CAG CAT CTA TAT GGT CCT AAC CCC AAC CCC CAG CCT
> --- CCA GCC ACC ACC ACA CCT GAN NNN NNN NNN NNN NNG CCG CAG CCC ACG GCT CCC CCG
> ACG GCC TGC CCC ACC TGG CCG GCC ACT GTG CGC CCC TCC GAG CAC CCC ACT ACC AGC CCT
> ACC GGC GCC CCC TCA GCT GGC CCT ACC GGT NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN
> NNN NNN NNN NNN NNN NNN CCC --- CCC ACG GCC AGC CCC TCT GCG GCC CCC ACT --- GCG
> TCC TTG GAC CCA GCT GAA GAC GTC TGC AAC GTG AAT GTC TTC GAC GCC ATC GCC GAG ATA
> GGG AAC AAG CTG CAT GTC TTC AAG GAT GGG AGG TAC TGG --- --- CGG TTC TCC GAG ---
> --- GGC AGT GGG CGC CGG CCG CAG GGC CCC TTC CTC ATC GCC GAC ACC TGG CCC GCG CTG
> CCG GCC AAG CTG GAC TCC GCC TTT GAG GAG CCG CTC ACC AAG AAG CTG TTC TTC TTC TCG
> GGG CGC CAA GTG TGG GTG TAC ACA GGC GCG TCG GTG CTG GGC CCG AGG CGT CTG GAC AAG
> CTG GGC CTG GGT CCC GAG GTG CCG CAC GTC ACC GGA GCC CTC CCG CGC GCC GGG --- GGC
> AAG GTG CTG CTG TTC GGC GCG CAG CGC TTC TGG AGG TTC GAC GTG AAG ACG CAG ACG GTG
> GAT TCC CGG AGC GGC GCT CCG --- --- GTG GAT CAG ATG TTC CCC GGG GTG CCT TTG AAC
> ACA CAC GAC GTC TTC CAG TAC CGA --- GAG AAA GCC TAC --- TTC TGC CAG GAC CGC TTC
> TTC TGG CGT GTG AGT ACC CGG AAC NNN NNN NNN NNN NNN NNN NNN GAG GTG AAC CTA GTG
> GAC CAG GTG GGC TAC GTG AGC TTT GAC ATC CTG CAC TGC CCT
> dog
> GCA GCT CCC AGA CCA CAC --- AAG CCC --- ACC GTT --- GTG GTC TTT CCA GGA GAC CTG
> AGA ACT NNN AAT CTC ACT GAC AAG CAG CTG GCA GAG GAA TAT CTG TTT CGC TAT GGC TAC
> ACT CAA GTG GCC GAG CTG AGC --- GAC GAC AAG CAG --- TCC CTG AGT CGC GGG --- CTG
> CGG CTT CTC CAG AGG CGC CTG GCT CTG CCT GAG ACT GGA GAG CTG GAC AAA ACC ACC CTG
> GAG GCC ATG CGG GCC CCG CGC TGC GGC GTC CCG GAC CTG GGC AAA TTC CAG ACC TTT GAG
> GGC GAC CTC AAG TGG CAC CAC AAC GAC ATC ACT TAC TGG ATA CAA AAC TAC TCG GAA GAC
> TTG CCC CGC GAC GTG ATC GAC GAC GCC TTT GCC CGA GCC TTC GCG GTC TGG AGC GCG GTG
> ACA CCG CTC ACC TTC ACT CGC GTG TAC GGC CCC GAA GCC GAC ATC ATC ATT CAG TTT GGT
> GTT AGG GAG CAC GGA GAT GGG TAT CCC TTC GAT GGG AAG AAC GGG CTT CTG GCT CAC GCC
> TTT CCT CCC GGC CCG GGC ATT CAG GGA GAC GCC CAC TTC GAC GAC GAG GAG TTA TGG ACT
> CTG GGC AAG GGC GTC GTG GTT CCG ACC CAC TTC GGA AAC GCA GAT GGC GCC CCC TGC CAC
> TTC CCC TTC ACC TTC GAG GGC CGC TCC TAC TCG GCC TGC ACC ACG GAC GGC CGC TCC GAT
> GAC ACG CCC TGG TGC AGC ACC ACG GCC GAC TAT GAC ACC GAC CGT CGG TTC GGC TTC TGC
> CCC AGC GAG AAA CTC TAC ACC CAG GAC GGC AAT GGG GAC GGC AAG CCC TGC GTG TTT CCG
> TTC ACC TTC GAG GGC CGC TCC TAC TCC ACG TGC ACC ACC GAC GGC CGC TCG GAC GGC TAC
> CGC TGG TGC TCC ACC ACC GGC GAC TAC GAC CAG GAC AAA CTC TAC GGC TTC TGC CCA ACC
> CGA GTC GAT TCC GCG GTG ACC GGG GGC AAC TCC GCC GGG GAG CCG TGT GTC TTC CCC TTC
> ATC TTC CTG GGC AAG CAG TAC TCG ACG TGC ACC AGG GAG GGC CGC GGA GAT GGG CAC CTC
> TGG TGC GCC ACC ACT TCG AAC TTT GAC AGA GAC AAG AAG TGG GGC TTC TGC CCG GAC CAA
> GGA TAC AGC CTG TTC CTT GTG GCC GCC CAT GAG TTC GGC CAC GCG CTG GGT TTA GAT CAT
> TCA TCG GTG CCA GAA GCG CTC ATG TAC CCC ATG TAC AGC TTC ACC GAG GGC CCC CCC CTG
> CAT GAA GAC GAC GTG AGG GGC ATC CAG CAT CTG TAC GGT CCT CGC CCT GAA CCT GAG CCA
> CAG CCT CCA ACC GCN NNN NNN NNN NNN NNN NNN NNN --- NNC CCG CCC ACC GCC CCG CCC
> ACC GTC TGC GCT ACT GGT CCT CCC ACC ACC CGC CCC TCA GAG CGC CCC ACT GCT GGC CCC
> ACA GGC CCC CCT GCA GCT GGC CCC ACG GGT NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN
> NNN NNN NNN NNN NNN NNN CCC --- CCC ACT GCT GGC CCC TCT GAG GCC CCT ACA --- GTG
> CCT GTG GAT CCG GCA GAG GAT ATA TGC AAA GTG AAC ATC TTC GAC GCC ATC GCG GAG ATC
> AGG AAC TAC TTG CAT TTC TTC AAG GAA GGG AAG TAC TGG --- --- CGA TTC TCC AAG ---
> --- GGC AAG GGA CGC CGG GTG CAG GGC CCC TTC CTT ATC ACC GAC ACG TGG CCT GCG CTG
> CCC CGC AAG CTG GAC TCC GCC TTT GAG GAC GGG CTC ACC AAG AAG ACT TTC TTC TTC TCT
> GGG CGC CAA GTG TGG GTG TAC ACA GGC ACG TCG GTG GTA GGC CCG AGG CGT CTG GAC AAG
> CTG GGC CTG GGC CCG GAG GTT ACC CAA GTC ACC GGC GCC CTC CCG CAA GGC GGG --- GGT
> AAG GTG CTG CTG TTC AGC AGG CAG CGC TTC TGG AGT TTC GAC GTG AAG ACG CAG ACC GTG
> GAT CCC AGG AGC GCC GGC TCG --- --- GTG GAA CAG ATG TAC CCC GGG GTG CCC TTG AAC
> ACG CAT GAC ATC TTC CAG TAC CAA --- GAG AAA GCC TAC --- TTC TGC CAG GAC CGC TTC
> TAC TGG CGT GTG AAT TCT CGG AAT NNN NNN NNN NNN NNN NNN NNN GAG GTG AAC CAG GTG
> GAC GAA GTG GGC TAC GTG ACC TTT GAC ATT TTG CAG TGC CCT
> cow
> GCT GTC CCC AGA CGA CGC --- CAG CCC --- ACC GTT --- GTG GTC TTT CCA GGA GAA CCA
> CGA ACC NNN AAC CTC ACC AAC AGG CAG CTG GCA GAG GAA TAC CTG TAC CGC TAT GGC TAC
> ACT CCT GGG GCA GAG CTG AGC --- GAG GAC GGT CAG --- TCC CTG CAG CGA GCT CTG CTG
> CGC --- TTC CAG CGG CGC CTG TCC CTG CCC GAG ACT GGC GAG CTG GAC AGC ACC ACC CTG
> AAC GCC ATG CGA GCC CCG CGC TGC GGC GTC CCA GAC GTG GGC AGA TTC CAG ACC TTT GAG
> GGC GAA CTC AAG TGG CAC CAC CAC AAC ATC ACC TAC TGG ATC CAA AAT TAC TCG GAA GAC
> CTG CCG CGC GCC GTG ATC GAC GAC GCC TTT GCC CGC GCT TTC GCG CTC TGG AGC GCT GTG
> ACG CCG CTC ACC TTC ACT CGA GTG TAC GGC CCC GAA GCT GAC ATT GTC ATC CAG TTT GGT
> GTT AGA GAG CAC GGA GAT GGG TAT CCC TTC GAT GGG AAG AAC GGG CTC CTG GCA CAC GCC
> TTT CCG CCT GGC AAA GGC ATT CAG GGA GAT GCC CAC TTC GAC GAT GAA GAG TTG TGG TCT
> CTG GGC AAA GGC GTT GTG ATC CCG ACC TAC TTC GGA AAC GCG AAG GGC GCC GCC TGC CAC
> TTC CCC TTC ACC TTT GAG GGT CGC TCC TAC TCC GCC TGC ACC ACG GAC GGC CGT TCC GAC
> GAC ATG CTC TGG TGC AGC ACC ACC GCC GAC TAC GAC GCC GAC CGC CAG TTC GGC TTC TGC
> CCC AGC GAG AGA CTC TAC ACC CAG GAC GGC AAT GCG GAC GGC AAG CCC TGC GTC TTC CCG
> TTC ACC TTC CAG GGC CGC ACC TAC TCC GCC TGT ACC TCC GAT GGT CGC TCC GAC GGC TAC
> CGC TGG TGC GCC ACC ACC GCC AAC TAC GAC CAG GAC AAG CTC TAC GGC TTC TGC CCG ACC
> CGA GTC GAT GCA ACG GTG ACC GGG GGC AAC GCG GCG GGG GAG CTG TGC GTC TTC CCC TTC
> ACC TTC CTG GGC AAG GAA TAC TCG GCC TGC ACC AGA GAG GGT CGC AAT GAT GGG CAC CTC
> TGG TGC GCC ACC ACC TCC AAC TTC GAC AAA GAC AAG AAG TGG GGC TTC TGC CCG GAT CAA
> GGA TAC AGC CTG TTC CTT GTG GCC GCA CAC GAG TTT GGC CAC GCG CTG GGC TTA GAT CAC
> ACC TCC GTG CCA GAG GCG CTC ATG TAC CCC ATG TAC AGA TTC ACA GAG GAG CAC CCC CTG
> CAT AGG GAC GAT GTT CAG GGC ATC CAG CAT CTG TAT GGT CCT CGC CCT GAG CCT GAA CCA
> CGG CCT CCG ACC ACT ACC ACC ACT ACC ACC ACC GAA --- CCC CAG CCC ACC GCT CCC CCC
> ACG GTC TGC GTC ACG GGG CCT CCC ACC GCC CGC CCC TCA GAG GGT CCC ACT ACT GGC CCC
> ACA GGG CCC CCG GCA GCT GGC CCT ACG GGT NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN
> NNN NNN NNN NNN NNN NNN CCT --- CCC ACG GCT GGC CCT TCT GCG GCC CCG ACG GAG TCC
> CCG --- GAT CCA GCG GAG GAC GTC TGC AAC GTG GAC ATC TTC GAC GCC ATC GCG GAG ATT
> AGG AAC CGC TTG CAT TTC TTC AAG GCT GGG AAG TAC TGG --- --- AGA CTT TCT GAG ---
> --- GGA GGG GGC CGC CGG GTG CAG GGT CCC TTC CTT GTC AAG AGC AAG TGG CCT GCG CTG
> CCC CGC AAG CTG GAC TCC GCC TTC GAG GAT CCG CTC ACC AAG AAG ATT TTC TTC TTC TCT
> GGG CGC CAA GTA TGG GTG TAC ACC GGC GCG TCG TTG CTA GGC CCG AGG CGT CTG GAC AAG
> TTG GGC CTG GGC CCG GAA GTG GCC CAG GTC ACC GGG GCC CTC CCG CGC CCT GAG --- GGT
> AAG GTG CTG CTG TTC AGC GGG CAG AGC TTC TGG AGG TTC GAC GTG AAG ACA CAG AAG GTG
> GAT CCC CAG AGC GTC ACC CCC --- --- GTG GAC CAG ATG TTC CCC GGG GTG CCC ATT AGC
> ACG CAC GAC ATC TTT CAG TAC CAA --- GAG AAA GCT TAC --- TTC TGC CAG GAT CAC TTC
> TAC TGG CGC GTG AGT TCC CAG AAT NNN NNN NNN NNN NNN NNN NNN GAG GTG AAT CAG GTG
> GAC TAT GTG GGC TAC GTG ACC TTC GAC CTC CTG AAG TGC CCT
> elephant
> --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
> --- --- --- --- --- --- --- --- --- --- --- GAG --- TAT CTG TAC CGC TAT GGC TAC
> ACT CGT GTG GCG GAG ATG AAC --- --- AGT AAG GTG --- TCC CTG GGT --- CGA GCG CTA
> AGG CTT CTC CAG CAA AAC CTG GCC CTG CCC GAG ACC GGC GAG CTG GAC AGC ACC ACC CTG
> GAC GCC ATG CGA GCC CCG CGC TGC GGC GTC CCA GAC ATG GGT GGC TTC CAG ACC TTC GAG
> GGT GAC CTC AAG TGG AAC CAC CAC AAC ATC ACA TAC TGG ATC CAA AAC TAC TCG GAA GAC
> TTG CCC AAA CAA GTG ATC GAA GAC GCT TTT GCC CGC GCC TTC GCG GCG TGG AGC GAG GTG
> ACA CCA CTC ACC TTC ACC CGC CTG CGC AGC AGG GAC GTG GAC ATC GTC ATC CGG TTT GGG
> GTC AAG GAG CAC GGA GAC GGG TAT CCT TTC GAC GGG AAG GAC GGG CTG CTG GCA CAC GCC
> TTT CCT CCC GGC CCC GGC ATT CAG GGA GAC GCG CAC TTC GAC GAT GAC GAA TTG TGG TCG
> TTG GGC AAG GGC GTC GTG GTT CCC ACC CGC TTT GGA AAC GCA GAT GGC GCC GCC TGC CAC
> TTT CCC TTC ACC TTC CAG GGC CGC TCG TAC ACT GCC TGC ACC GCC GAC GGC CGC TCC GAC
> GGC CAG CTC TGG TGC AGC ACC ACG GCC GAC TAT GAC ACC GAC CGC CAG TTT GGC TTC TGC
> CCC AGT GAG AGG CTC TAC ACC CAG CAC GGC AAT GAC AAC GGC AAG CCC TGC GTG TTT CCG
> TTC ACG TTC GAG GGC CGC TCC TAC TCG GCC TGC ACC ACC GAC GGC CGC TCG GAT GGC TAC
> CGC TGG TGC GCC ACC ACC GCC AAC TAC GAC CAG GAC AAG CTC TAT GGC TTC TGT CCC ACC
> CGA --- GNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN
> NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN
> NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN
> NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN
> NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN
> NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN
> --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- NNN NNN NNN ---
> --- --- --- --- --- --- --- --- NNN NNN --- NNN NNN NNN --- --- --- --- --- ---
> --- --- --- --- NNN NNN NNN NNN NNN --- --- --- --- --- --- --- --- NNN NNN NNN
> NNN NNN --- --- --- --- NNN --- NNN NNN NNN NNN --- --- --- --- NNN NNN --- ---
> --- --- NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN
> NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN --- --- NNN NNN NNN NNN ---
> --- --- --- --- --- --- NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN
> NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN
> NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN
> NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN NNN --- NNN NNN NNN --- NNN
> NNN ATA GTG CTG TTT AGT AGA CAG CGC TTC TGG AGG TTC GAC GTG AAG ACG CAG ACT GTG
> GAG CCC CGG AGC GTC CGC TCG --- --- GTG GAC CAG GTG TTC TCC GGG GTG CCC TTG GAC
> ACG CAC GAC ATC TTC CAG TAC CGA --- GAG AAA GCC TAC --- TTC TGC CAG GAC CGC TTC
> TAC TGG CGC GTG TGT TTC CGG AAT GAT --- AAT GAA --- --- --- --- GTG AAC CAG GTG
> GAC CAA GTG GGC TAC GTG AAC TTT GAC ATC CTG CAG TGC CCT
> opossum
> GCT GCA CCC CGA GGG GGC CCC TCT CCC GGG TCT ATC TTG ATC ACC TTT CCT GAA GAG AGA
> --- ACA CGC ACT CTC ACT GAC CAG CAA TTT GCT GAG GAA TAT CTG CTT CGG TAC GGC TAC
> ATC CCG --- GCA GGG CTT CTG --- GGC CAA AAC CAC ACT TCT CTG AAG --- CAT GCC TTA
> AAG AAA CTC CAA CGT CAG CTG GCC CTG ACA CAG ACG GGA GAG CTG GAC AGC GCC ACC ATC
> GAG GCA ATG CGG GCC CCG CGC TGC GGA GTA CCC GAC GTC GCC CCA TTC CAA ACC TTC GAG
> GGT GAA CTG AAG TGG AAA CAT CAG AAC ATC ACC TAT CGG ATC CAG AAT TAC TCC CCC GAC
> CTG CCT CCT GAG GTG ACG GAT GAT GCT TTC CAA CGA GCC TTT GCT CTG TGG AGT AAA GTG
> ACC CCA CTC ACC TTC ACA CGT GTC AGC AGC GGG GAG GCA GAC ATC CTG ATC CAG TTT GGG
> ACC AGA GAG CAC GGC GAT GGA TAC CCT TTT GAC GGG AAA GAT GGA CTC TTG GCT CAC GCT
> TTC CCC CCG GGC CCA GGA ATC CAG GGA GAT GCC CAC TTT GAT GAC GAG GAG TTC TGG ACT
> CTA GGC AAA GGC GTC GTG GTC AAA ACG CGG TTC GGG AAC GCA GAC GGA GCC CCC TGC CAC
> TTT CCT TTC ACC TTC GAG GGC AGC TCC TAC TCC GCC TGC ACC ACG GAC GGC CGC TCT GAC
> GGG CTG CAC TGG TGC AGC ACT ACG GCT GAC TAT GAC AAG GAC CGC CTT TAC GGC TTT TGC
> CCT AGC GAG CTG CTC TAC ACC CTG GAT GGT AAC GCC AAT GGC GAT CCC TGC GTG TTC CCC
> TTC ACC TTC GAT GGT CGT TCC TAC ACA GCC TGC ACC ACT GAA GGA CGC TCT GAC GGC TAC
> CGC TGG TGT GCC ACT ACT GCC AGT TAC GAT CAG GAC AAG CTT TAT GGC TTC TGT CCC AAC
> CGA --- GAT ACT GCG GTG AGC GGA GGC AAC TCC CAA GGG GAA CCC TGC GTC TTT CCC TTC
> ACT TTC CTA AAT CGA GAA TAC TCA GCC TGC ACC AGT GAG GGC CGC AGT GAC GGT CGT CTC
> TGG TGT GCG ACC ACC GAT GAC TTC GAT CGG GAT CAC AAG TGG GGC TTC TGT CAG GAT CGA
> GGG TAC AGC TTA TTC CTT GTG GCC GCG CAC GAG TTT GGG CAC GCG CTG GGC TTG GAC CAC
> TCA TCT GTG CCG GAA GCA TTG ATG TTC CCA ATG TAC CGT TTT ACC GAG GGA CCC CCG TTG
> CAT GAG GAT GAC GTG AAG GGA ATC CAA CAT CTG TAT GGT TCT AGG ACT GAG CCG GAT CCG
> GAA CCT CCG ACC TCT --- --- --- TCT CCC TTA GAG --- CCA GAT TCC ACC ACT CAG TTC
> AAT GCT TGT --- --- --- CCC --- TCT GTA --- CCC CCC CCT --- --- --- GCC AGA CCC
> ACC GGC CCT CCT ACT GCT CGC CCC TCA --- --- --- --- --- --- --- --- GCA CCT CCC
> ACT GCT GGA CCC ACT GGT CCT --- CCC ACA GCC AAC CCT CCT GTG CCC CCC ACT --- GGG
> CCC TTG GAC CCA GCT GAC GAC GCT TGT GGC GTC CTG GTA TTT GAT GCC ATC GCT GAG ATT
> CGA GGC CAG CTT CAC TTC TTC AAA GAC GGA CGG TAC TGG CGA GTC CCC AGG GAC TCC ---
> --- AAG --- GGG CCA --- ACT CAA GGA CCC TTC CTC ATT GCT AAC ACT TGG TCT GCT TTG
> CCC CCA AAA CTG GAC TCG GCT TTC GAA GAT CCC CTG ACT AAG AAA CTC TTC TTC TTT TCA
> GGT AAA GGT ATG TGG GTA TAC ACA GGC CAG TCA GTT GTA GGT CCC CGG CGC CTG GAG AAG
> CTG GGT CTG CAT AGC AGA GTT CAA AGG ATA ACA GGT GCC ATT CAG CAT AAT GGA --- GGC
> AAG GTG CTA TTA TTC AGC CAG AAT CAA TAT TGG AGG TTG GAT GTG AAG AAG CAG AAG GTA
> GAC TCA AGA GAA CCT TAC CCT --- --- GTG GAG AAC ATG TTC CCT GGA GTA CCT GAA AAC
> ACT CAT GAT GTT TTC CTG TAT AAG GGA GAT ACA --- TAC --- TTC TGC CAG GGC ATC TTC
> TTC TGG CGC GTG AAC --- --- --- --- --- AAG GAG --- --- --- --- --- AAC AAG GTG
> GAC TTA GTA GGC TAC GTG ACC TAC GAC CTC CTG --- --- ---
> chicken
> GCC GCC CCA CTG CAC AGC --- AAG CCG CAG GCG GTC --- ATC ACC TTC CCA GGG GAG CTG
> --- CTC AGC GCC CCA TCA GAC GTG GAG CTG GCG GAG AAC TAC CTG CTG CGC TTC GGC TAC
> ATC CAG GAG GCA GAG GTG AGG AGG AGC AGC AAG CAC GTG TCC CTG GCC --- AAA GCG CTG
> CGC AGG ATG CAG AAG CAG CTG GGG CTG GAG GAG ACG GGG GAG CTG GAC GCC AGC ACC CTG
> GAG GCC ATG CGA GCC CCC CGC TGT GGG GTG CCT GAC GTG GGG GGT TTC CTC ACC TTC GAG
> GGG GAG CTC AAA TGG GAC CAC ATG GAC CTC ACG TAC CGG GTG ATG AAC TAC TCC CCC GAC
> CTG GAC CGT GCC GTG ATA GAT GAT GCC TTC CGG CGG GCA TTC AAG GTG TGG AGT GAT GTC
> ACT CCC CTC ACC TTC ACC CAG ATT TAC AGC GGC GAG GCA GAC ATC ATG ATC ATG TTC GGC
> AGC CAA GAG CAT GGT GAT GGG TAC CCC TTC GAC GGC AAG GAT GGG CTC CTG GCC CAC GCC
> TTT CCC CCC GGC AGT GGG ATT CAG GGC GAT GCC CAC TTC GAT GAT GAT GAG TTC TGG ACT
> CTG GGA ACC GGC TTA GAG GTG AAG ACC CGC TAT GGG AAT GCC AAC GGG GCC AGC TGC CAC
> TTC CCC TTC ATC TTT GAG GGC CGC TCC TAC TCC CGG TGC ATC ACG GAG GGC CGC ACG GAT
> GGG ATG CTG TGG TGT GCC ACC ACC GCC AGC TAC GAC GCC GAC AAG ACC TAC GGC TTC TGC
> CCC AGC GAG CTG CTC TAC ACC AAT GGT GGC AAC AGC GAT GGG TCT CCC TGC GTC TTC CCC
> TTC ATC TTC GAT GGC GCC TCC TAT GAC ACC TGC ACC ACA GAT GGG CGC TCT GAC GGC TAT
> CGC TGG TGT GCC ACC ACG GCC AAC TTC GAC CAG GAC AAG AAA TAC GGC TTC TGC CCC AAC
> CGA --- GAC ACG GCG GCG ATC GGT GGC AAC TCC CAG GGG GAC CCG TGT GTC TTC CCC TTC
> ACC TTC CTG GGG CAG TCC TAC AGC GCG CGC ACC AGC CAG GGC CGG CAG GAC GGG AAG CTC
> TGG TGT GCC ACC ACC AGC AAC TAT GAC ACC GAC AAG AAG TGG GGC TTC TGC CCA GAC AGA
> GGT TAC AGC ATC TTC TTG GTG GCT GCC CAC GAG TTT GGG CAC TCA CTG GGG CTG GAC CAC
> TCC AGC GTG CGC GAG GCA TTG ATG TAC CCT ATG TAC AGC TAC GTC CAG GAC TTC CAG CTG
> CAT GAG GAT GAT GTC CAG GGC ATC CAG TAC CTC TAT GGT CGT GGC TCT GGC CCT GAG CCC
> ACC CCC CCG --- --- --- --- --- GCA CCT TTG --- --- CCC --- --- ACC GAG GAG ---
> --- --- --- --- --- --- CCC CAG TCC ATA --- CCC ACC GAA --- --- --- GCT --- ---
> --- GGC --- --- AGT GCT TCC ACC ACA --- --- --- --- --- --- --- --- GAG GAG GAG
> GAG GAG --- GAG ACA --- CCT GAG CCC ACA GCT GAG --- --- --- --- CCC AGC --- ---
> CCC GTG GAC CCC AGC CGG GAT GCC TGC ATG GAG AAG AAC TTC GAC GCC ATC ACT GAG ATC
> AAT GGA GAG CTG CAC TTC TTC AAG AAT GGG AAA TAC TGG --- --- ACC CAC TCG TCC TTC
> TGG AAA TCA GGC --- --- ACT CAG GGC GCC TTC TCT ATC GCT GAC ACC TGG CCC GGC CTC
> CCG GCT GTC ATC GAC GCG GCG TTC CAA GAT GTG CTC ACC AAG AGG GTC TTC TTC TTC GCG
> GGA CGG CAG TTC TGG GTG TTC TCC GGC AAG AAC GCA GTG GGC CCC CGT AGG ATT GAG AAG
> TTG GGC ATT GGG AAG GAG GCC GGG CGC ATC ACG GGG GCC CTG CAG CGG GGA CGT --- GGC
> AAA GTG CTG CTC TTC AGT GGG GAG CAC TAC TGG AGG CTG GAC GTG AAG GTC CAG ACA GTG
> GAC --- AAG GGC --- TAC CCC CGT GAC ACT GAT GAT GTC TTT ACT GGT GTC CCC CTT GAC
> GCA CGT AAC GTC TTC CTG TAC CAA --- GAC AAG --- TAC CAC TTC TGC CGG GAC AGC TTC
> TAC TGG AGG ATG ACC --- --- --- --- --- CCA CGT --- --- --- --- --- TAC CAG GTG
> GAC CGC GTG GGA TAC ATC AGA TAC GAC CTC CTG CAG TGC CCC
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>


-- 
Jason Stajich
jason at bioperl.org
http://bioperl.org/wiki/User:Jason


From ba6450 at wayne.edu  Fri Jul 27 01:20:11 2007
From: ba6450 at wayne.edu (Munirul Islam)
Date: Thu, 26 Jul 2007 21:20:11 -0400 (EDT)
Subject: [Bioperl-l] Finding the Sequence List in an Alignment
Message-ID: <20070726212011.EFB49252@mirapointms6.wayne.edu>

Thanks.  The error is removed now.

I have a question.  Is there any function that I can use to get the sequence list (human, chimp, etc.) after loading an alignment from file?

Munir

---- Original message ----
>Date: Thu, 26 Jul 2007 17:12:03 -0700
>From: "Jason Stajich" <jason at bioperl.org>  
>Subject: Re: [Bioperl-l] Alignment works with codeml (not loading in bioperl)  
>To: "Munirul Islam" <ba6450 at wayne.edu>
>Cc: bioperl-l at lists.open-bio.org
>
>You can try and pass in -interleaved => 0 as another option when you
>init your AlignIO object.
>


From jason at bioperl.org  Fri Jul 27 04:28:36 2007
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 26 Jul 2007 21:28:36 -0700
Subject: [Bioperl-l] Finding the Sequence List in an Alignment
In-Reply-To: <20070726212011.EFB49252@mirapointms6.wayne.edu>
References: <20070726212011.EFB49252@mirapointms6.wayne.edu>
Message-ID: <8273f6c20707262128s23e7e3ebgeb1cb74b3c0baf37@mail.gmail.com>

Have you tried reading the documentation for the Bio::SimpleAlign object?

for my $seq ( $aln->each_seq ) {
 print $seq->display_id, "\n";
}

I'd appreciate it if you added some of your questions, with the answers, to
the FAQ or to other places on the wiki so that other people can benefit from
what you learn here.


On 7/26/07, Munirul Islam <ba6450 at wayne.edu> wrote:
>
> Thanks.  The error is removed now.
>
> I have a question.  Is there any function that I can use to get the
> sequence list (human, chimp, etc.) after loading an alignment from file?
>
> Munir
>
> ---- Original message ----
> >Date: Thu, 26 Jul 2007 17:12:03 -0700
> >From: "Jason Stajich" <jason at bioperl.org>
> >Subject: Re: [Bioperl-l] Alignment works with codeml (not loading in
> bioperl)
> >To: "Munirul Islam" <ba6450 at wayne.edu>
> >Cc: bioperl-l at lists.open-bio.org
> >
> >You can try and pass in -interleaved => 0 as another option when you
> >init your AlignIO object.
> >


-- 
Jason Stajich
jason at bioperl.org
http://bioperl.org/wiki/User:Jason


From arareko at campus.iztacala.unam.mx  Fri Jul 27 15:18:55 2007
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Fri, 27 Jul 2007 10:18:55 -0500
Subject: [Bioperl-l] Perl Survey 2007
Message-ID: <46AA0CDF.1030503@campus.iztacala.unam.mx>

It really takes about 5 minutes:

http://perlsurvey.org/

Cheers,
Mauricio.

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From dhoworth at mrc-lmb.cam.ac.uk  Fri Jul 27 16:07:17 2007
From: dhoworth at mrc-lmb.cam.ac.uk (Dave Howorth)
Date: Fri, 27 Jul 2007 17:07:17 +0100
Subject: [Bioperl-l] Perl Survey 2007
In-Reply-To: <46AA0CDF.1030503@campus.iztacala.unam.mx>
References: <46AA0CDF.1030503@campus.iztacala.unam.mx>
Message-ID: <46AA1835.2020004@mrc-lmb.cam.ac.uk>

Mauricio Herrera Cuadra wrote:
> It really takes about 5 minutes:
> http://perlsurvey.org/

and gives all your personal information including email address to
anybody who cares to snoop the HTTP POST message! So there's definitely
no anonymity.

Cheers, Dave


From spiros at lokku.com  Fri Jul 27 16:38:57 2007
From: spiros at lokku.com (Spiros Denaxas)
Date: Fri, 27 Jul 2007 17:38:57 +0100
Subject: [Bioperl-l] Perl Survey 2007
In-Reply-To: <46AA1835.2020004@mrc-lmb.cam.ac.uk>
References: <46AA0CDF.1030503@campus.iztacala.unam.mx>
	<46AA1835.2020004@mrc-lmb.cam.ac.uk>
Message-ID: <bba689ec0707270938i612840e6r6b4d71ea943b4cc3@mail.gmail.com>

On 7/27/07, Dave Howorth <dhoworth at mrc-lmb.cam.ac.uk> wrote:
> Mauricio Herrera Cuadra wrote:
> > It really takes about 5 minutes:
> > http://perlsurvey.org/
>
> and gives all your personal information including email address to
> anybody who cares to snoop the HTTP POST message! So there's definitely
> no anonymity.

Not to mention that it requires registration (?). Who is behind the
survey? I am on a number of Perl and Perl-related lists and haven't
seen it mentioned.

Spiros


From arareko at campus.iztacala.unam.mx  Fri Jul 27 17:37:31 2007
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Fri, 27 Jul 2007 12:37:31 -0500
Subject: [Bioperl-l] Perl Survey 2007
In-Reply-To: <bba689ec0707270938i612840e6r6b4d71ea943b4cc3@mail.gmail.com>
References: <46AA0CDF.1030503@campus.iztacala.unam.mx>	<46AA1835.2020004@mrc-lmb.cam.ac.uk>
	<bba689ec0707270938i612840e6r6b4d71ea943b4cc3@mail.gmail.com>
Message-ID: <46AA2D5B.9080304@campus.iztacala.unam.mx>

Spiros Denaxas wrote:
> On 7/27/07, Dave Howorth <dhoworth at mrc-lmb.cam.ac.uk> wrote:
>> Mauricio Herrera Cuadra wrote:
>>> It really takes about 5 minutes:
>>> http://perlsurvey.org/
>> and gives all your personal information including email address to
>> anybody who cares to snoop the HTTP POST message! So there's definitely
>> no anonymity.

I didn't provide any personal information other than my country and 
birth year. As for my email, I always use the one I have for all the SPAM 
I'd like to subscribe to :)

> Not to mention that it requires registration (?). Who is behind the
> survey ? I am on a number of Perl and Perl related lists and haven't
> seen it being mentioned.

Registration is rather different from confirming your email (which 
prevents spambots, or you yourself, from filling the DB multiple times 
and thus screwing up the survey). As for who's behind it, its purpose, 
privacy, etc., please read the FAQ:

http://perlsurvey.org/faq/

Cheers,
Mauricio.

> Spiros

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From Alicia.Amadoz at uv.es  Mon Jul 30 15:46:57 2007
From: Alicia.Amadoz at uv.es (Alicia Amadoz)
Date: Mon, 30 Jul 2007 17:46:57 +0200 (CEST)
Subject: [Bioperl-l] error using standaloneblast through webserver
Message-ID: <1245168492amadoz@uv.es>

Hi, I'm trying to run a BioPerl script on Linux with StandAloneBlast
from a web server, but I get the following error:

-------------------- WARNING ---------------------
MSG: cannot find path to blastall
---------------------------------------------------

I have tried several things to fix it, such as setting some environment
variables, both directly through the shell and by adding some code to my
script:

BEGIN {
$ENV{PATH} .= ':/usr/local/blast-2.2.16';
$ENV{BLASTDIR} = '/usr/local/blast-2.2.16/'; 
$ENV{BLASTDATADIR} = '/usr/local/data/';
}

and with,

$local->executable('/usr/local/bin');
my $blast_report = $local->blastall($inputfilename); 

I have also checked that the web server has read and execute permission
on all BLAST executables and directories. But after trying all of these
things, it keeps showing the same error above.

Any more ideas for solving this problem? My script works well when I run it
as a simple standalone script, and I've rebooted the system several times
after changes were made.

Thanks to anyone who will be able to help me!
Regards,
Alicia


From gyang at plantbio.uga.edu  Mon Jul 30 20:58:51 2007
From: gyang at plantbio.uga.edu (Guojun Yang)
Date: Mon, 30 Jul 2007 16:58:51 -0400
Subject: [Bioperl-l] nonstop repeated output from Remote_blast with xml
Message-ID: <20070730205851.e921d31c@dogwood.plantbio.uga.edu>

I am running RemoteBlast with readmethod "xml", and I noticed that it prints the output repeatedly, nonstop, as if it were stuck in a loop. Has anybody noticed this before? Can anybody help me get out of this?  
Thanks a lot,  
   

Guojun Yang
University of Georgia
  
   
From grafman at graphcomp.com  Sun Jul 29 21:08:04 2007
From: grafman at graphcomp.com (Grafman Productions)
Date: Sun, 29 Jul 2007 14:08:04 -0700
Subject: [Bioperl-l] Perl 3D OpenGL
Message-ID: <152401c7d224$8e2455b0$6e4e7c0a@HPONE>

If this posting is inappropriate, please let me know - my apologies.

I recently came across an article on BioPerl, and it occurred to me that 
there might be some need for 3D rendering within your BioPerl project.

I released a number of new/updated Perl OpenGL (POGL) modules this year, 
along with benchmarks that demonstrate that it performs comparably to C.

If there's a need for 3D features within BioPerl, and if I can be of any 
assistance in helping to add such features, I would enjoy the opportunity. 


From torsten.seemann at infotech.monash.edu.au  Mon Jul 30 23:27:46 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Tue, 31 Jul 2007 09:27:46 +1000
Subject: [Bioperl-l] error using standaloneblast through webserver
In-Reply-To: <1245168492amadoz@uv.es>
References: <1245168492amadoz@uv.es>
Message-ID: <a79f6a4b0707301627u488fd1e8m9022160278b9b652@mail.gmail.com>

Alicia,

> Hi, i'm trying to run a bioperl script in linux with standaloneblast
> from a webserver but I have the following error:
> -------------------- WARNING ---------------------
> MSG: cannot find path to blastall
> ---------------------------------------------------
> $ENV{BLASTDATADIR} = '/usr/local/data/';
> $ENV{PATH} .= ':/usr/local/blast-2.2.16';
> $ENV{BLASTDIR} = '/usr/local/blast-2.2.16/';

I think the last one (or two) paths should be
'/usr/local/blast-2.2.16/bin' as the bin/ subdirectory of a standard
BLAST installation is where the 'blastall' binary actually lives.
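
A minimal sketch of one way to check this from inside the script before
BLAST is called, using only core Perl modules. The helper name
find_program_dir and the list of candidate directories are made up for
illustration (they are not part of BioPerl), and the paths are simply the
ones mentioned in this thread, so adjust them to your own installation:

```perl
use strict;
use warnings;
use File::Spec;

# Return the first directory in @dirs that holds an executable file
# named $program, or undef (in scalar context) if none does.
# Hypothetical helper for illustration; not a BioPerl function.
sub find_program_dir {
    my ($program, @dirs) = @_;
    for my $dir (@dirs) {
        my $path = File::Spec->catfile($dir, $program);
        return $dir if -f $path && -x _;
    }
    return;
}

# Candidate locations: the bin/ subdirectory suggested above and the
# install root from the original post (example paths only).
my @candidates = (
    '/usr/local/blast-2.2.16/bin',
    '/usr/local/blast-2.2.16',
);

my $blast_dir = find_program_dir('blastall', @candidates);
if (defined $blast_dir) {
    # Set these before the StandAloneBlast object is created; a web
    # server's environment usually does not inherit your shell's PATH.
    $ENV{BLASTDIR} = $blast_dir;
    $ENV{PATH} = defined $ENV{PATH} ? "$blast_dir:$ENV{PATH}" : $blast_dir;
    print "blastall found in $blast_dir\n";
}
else {
    print "blastall not found in: @candidates\n";
}
```

If the "not found" branch is printed when the script runs under the web
server, the directory or permissions are the problem rather than BioPerl
itself.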

-- 
--Torsten Seemann
--Victorian Bioinformatics Consortium, Monash University


From cjfields at uiuc.edu  Tue Jul 31 00:53:45 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 30 Jul 2007 19:53:45 -0500
Subject: [Bioperl-l] nonstop repeated output from Remote_blast with xml
In-Reply-To: <20070730205851.e921d31c@dogwood.plantbio.uga.edu>
References: <20070730205851.e921d31c@dogwood.plantbio.uga.edu>
Message-ID: <FAFBA7E9-F8C5-439C-B3D8-39AA876F03FA@uiuc.edu>


On Jul 30, 2007, at 3:58 PM, Guojun Yang wrote:

> I am running remoteblast and using readmethod "xml", I noticed that  
> it is printing the output repeatedly nonstop. It's like in a loop.  
> Did anybody notice this before? Can anybody help me getting out of  
> this?
> Thanks a lot,
>
>
> Guojun Yang
> University of Georgia

Not seeing that using bioperl-live; you may need to update  
RemoteBlast.pm as this sounds similar to an issue that popped up  
earlier in the spring.

chris


From torsten.seemann at infotech.monash.edu.au  Tue Jul 31 06:24:34 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Tue, 31 Jul 2007 16:24:34 +1000
Subject: [Bioperl-l] nonstop repeated output from Remote_blast with xml
In-Reply-To: <FAFBA7E9-F8C5-439C-B3D8-39AA876F03FA@uiuc.edu>
References: <20070730205851.e921d31c@dogwood.plantbio.uga.edu>
	<FAFBA7E9-F8C5-439C-B3D8-39AA876F03FA@uiuc.edu>
Message-ID: <a79f6a4b0707302324t261687e7g1012e1f536500c09@mail.gmail.com>

> as this sounds similar to an issue that popped up
> earlier in the spring.

I could have sworn it was autumn! ;-)

-- 
--Torsten Seemann
--Victorian Bioinformatics Consortium, Monash University


From Alicia.Amadoz at uv.es  Tue Jul 31 10:11:54 2007
From: Alicia.Amadoz at uv.es (Alicia Amadoz)
Date: Tue, 31 Jul 2007 12:11:54 +0200 (CEST)
Subject: [Bioperl-l] error using standaloneblast through webserver
In-Reply-To: <a79f6a4b0707301627u488fd1e8m9022160278b9b652@mail.gmail.com>
References: <a79f6a4b0707301627u488fd1e8m9022160278b9b652@mail.gmail.com>
Message-ID: <2361686267amadoz@uv.es>

Hi, I tried what you suggested and that was it; it works perfectly.
Thank you very much. 

Regards,
Alicia

> Alicia,
> 
> > Hi, i'm trying to run a bioperl script in linux with standaloneblast
> > from a webserver but I have the following error:
> > -------------------- WARNING ---------------------
> > MSG: cannot find path to blastall
> > ---------------------------------------------------
> > $ENV{BLASTDATADIR} = '/usr/local/data/';
> > $ENV{PATH} .= ':/usr/local/blast-2.2.16';
> > $ENV{BLASTDIR} = '/usr/local/blast-2.2.16/';
> 
> I think the last one (or two) paths should be
> '/usr/local/blast-2.2.16/bin' as the bin/ subdirectory of a standard
> BLAST installation is where the 'blastall' binary actually lives.
> 
> -- 
> --Torsten Seemann
> --Victorian Bioinformatics Consortium, Monash University
> 
> 


From jay at jays.net  Tue Jul 31 12:00:56 2007
From: jay at jays.net (Jay Hannah)
Date: Tue, 31 Jul 2007 07:00:56 -0500
Subject: [Bioperl-l] Perl 3D OpenGL
In-Reply-To: <152401c7d224$8e2455b0$6e4e7c0a@HPONE>
References: <152401c7d224$8e2455b0$6e4e7c0a@HPONE>
Message-ID: <25A5F0A3-1CC3-46B5-8976-A24C451204E7@jays.net>

On Jul 29, 2007, at 4:08 PM, Grafman Productions wrote:
> If this posting is inappropriate, please let me know - my apologies.

Not at all. AFAIK this is the perfect place to discuss any  
contributions you're motivated to make to the BioPerl project.

> I recently came across an article on BioPerl, and it occurred to me  
> that
> there might be some need for 3D rendering within your BioPerl project.
>
> I released a number of new/updated Perl OpenGL (POGL) modules this  
> year,
> along with benchmarks that demonstrate that it performs comparably  
> to C.
>
> If there's a need for 3D features within BioPerl, and if I can be  
> of any
> assistance in helping to add such features, I would enjoy the  
> opportunity.

I know nothing about 3D modeling in biology, nor do I hang out with  
any protein structure folks, but 3D always sounds sexy. -grin-

If you're new to bioinformatics (I certainly am) you might want to  
read this:

   http://en.wikipedia.org/wiki/Protein_structure

Because that's probably where your 3D work would be used. Especially  
note the "Software" section, where you'll find some of the  
"competition".  :)

There's some cool stuff out there. I don't know what all would or  
wouldn't be time well spent in Perl / BioPerl.

HTH,

Jay Hannah
http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah


From cjfields at uiuc.edu  Tue Jul 31 16:51:42 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 31 Jul 2007 11:51:42 -0500
Subject: [Bioperl-l] nonstop repeated output from Remote_blast with xml
In-Reply-To: <20070731104052.b4b93021@dogwood.plantbio.uga.edu>
References: <20070731104052.b4b93021@dogwood.plantbio.uga.edu>
Message-ID: <7A2D7E4A-4024-48DB-88C8-063388A98419@uiuc.edu>

Make sure to keep responses on the mailing list.

You might want to run a full install, just in case.  If I remember  
correctly Sendu made some changes a while back in the BLAST-related  
modules which may be related to this.  At the very least install/ 
upgrade all modules in Bio::Tools::Run.

chris

On Jul 31, 2007, at 9:40 AM, Guojun Yang wrote:

> Thanks, Chris,
> But when I replaced the old RemoteBlast.pm with the new one, I got  
> "can't locate the object method "retrieve_parameter"". Does this  
> mean I need to install something else?
> Guojun
>
> ----- Original Message -----
> From: Chris Fields [mailto:cjfields at uiuc.edu]
> To: gyang at plantbio.uga.edu
> Cc: bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] nonstop repeated output from Remote_blast  
> with xml
>
>
>>> On Jul 30, 2007, at 3:58 PM, Guojun Yang wrote:
>>>> I am running remoteblast and using readmethod "xml", I noticed that
>>> it is printing the output repeatedly nonstop. It's like in a loop.
>>> Did anybody notice this before? Can anybody help me getting out of
>>> this?
>>> Thanks a lot,
>>>
>>>
>>> Guojun Yang
>>> University of Georgia
>>> Not seeing that using bioperl-live; you may need to update
>> RemoteBlast.pm as this sounds similar to an issue that popped up
>> earlier in the spring.
>>> chris
>>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign