From dmessina at wustl.edu Sun Jul 1 01:38:48 2007
From: dmessina at wustl.edu (David Messina)
Date: Sun, 1 Jul 2007 00:38:48 -0500
Subject: [Bioperl-l] svn auto-properties [was Re: First cut svn repository]
In-Reply-To: <46869226.70203@sheffield.ac.uk>
References: <3097065.1181941697249.JavaMail.myubc2@brahms.my.ubc.ca>
 <185BDA34-1449-49CA-B146-ADF27D2928CD@gmx.net>
 <8D3B697E-2072-46FE-A1C9-E546D9DEAA45@uiuc.edu>
 <4673C7CB.1030709@mail.nih.gov>
 <410EF5F9-A30E-4AB7-85F7-7E761E3890D5@uiuc.edu>
 <18049.30026.61328.134490@almost.alerce.com>
 <5764264E-5C40-4C9E-B1C9-A70628AC1DD0@uiuc.edu>
 <18051.44281.831316.749586@almost.alerce.com>
 <18051.61992.627473.323346@almost.alerce.com>
 <4684AF3D.5090907@sheffield.ac.uk>
 <843758CD-9C5B-4DDA-9FF4-B90AA225BDB3@wustl.edu>
 <468628AC.9060200@sheffield.ac.uk>
 <461F64B9-87FD-458A-8945-8238E7076109@wustl.edu>
 <46869226.70203@sheffield.ac.uk>
Message-ID: <3164A6E3-77CF-4E61-9609-1408768862B1@wustl.edu>

> [Nath]
> I think the list of seq formats recognised by Bioperl in Bio::SeqIO and
> Bio::AlignIO would be a good start. As these are likely to be the ones
> that are sensitive to file format recognition and thus could break tests
> if renamed.

Sounds good to me. I will do a quick tour of the rest of the repo
looking for other common or important file extensions, but I don't
expect there to be many if any.

> [still Nath]
> I think a lot of people have used "." in file names as an alternative to
> a space. I think it would be beneficial to use an underscore "_" in
> these cases and leave the "." to represent the beginning of the file
> extension.

That's a great idea.

> [Chris]
> Do we need to define every filetype extension, or can there be a
> fallback (eg if it isn't on the list or has no extension it's plain
> text)?

For every file that's added, svn takes a peek to see if it's
human-readable. If not, it's tagged with the generic MIME type
application/octet-stream.
(It does this so it knows not to try to do diffs and merges on a
binary file.) So the default for a human-readable file is no MIME
type, which I believe is essentially the same thing as text/plain.

And then regardless of the outcome of svn's peek, any matching
auto-props are then applied, overriding svn's choice.

So if we don't define every extension, I think we'll be fine. It'd be
nice to have everything tagged with a MIME type, though. For one
thing, Apache will use it to do the right thing when people browse
the repo over the web. And two, because metadata is cool. :)

One more thing: in the course of reading up on this, I learned that
my earlier expectation about multiple auto-prop matches was
incorrect. It's true that multiple unrelated matches means that
multiple properties are set on the file. But when a file matches
multiple *conflicting* auto-property patterns, there's no telling
which value it'll get.

Dave

From hartzell at alerce.com Sun Jul 1 12:29:29 2007
From: hartzell at alerce.com (George Hartzell)
Date: Sun, 1 Jul 2007 09:29:29 -0700
Subject: [Bioperl-l] First cut svn repository
In-Reply-To:
References: <3097065.1181941697249.JavaMail.myubc2@brahms.my.ubc.ca>
 <185BDA34-1449-49CA-B146-ADF27D2928CD@gmx.net>
 <8D3B697E-2072-46FE-A1C9-E546D9DEAA45@uiuc.edu>
 <4673C7CB.1030709@mail.nih.gov>
 <410EF5F9-A30E-4AB7-85F7-7E761E3890D5@uiuc.edu>
 <18049.30026.61328.134490@almost.alerce.com>
 <4683A7D1.8070403@sendu.me.uk>
 <18051.48684.996884.134046@almost.alerce.com>
 <4683C385.3050904@sendu.me.uk>
 <18051.63674.685297.426813@almost.alerce.com>
 <18052.3946.224905.415905@almost.alerce.com>
 <2159ED58-E6F4-4ED8-AC23-E8BAF69FE240@gmx.net>
Message-ID: <18055.54889.677775.868974@almost.alerce.com>

Hilmar Lapp writes:
> It turns out that both files are also present on the release-0-9-3,
> bioperl-1-0-0, bioperl-1-0-alpha, and bioperl-1-0-alpha2-rc tags, so add
>
> $ svn rm -m "Removing offending duplicate" svn+ssh://dev.open-bio.org/home/hartzell/bioperl/bioperl-live/tags/release-0-9-3/t/data/HUMBETGLOA.fasta
> $ svn rm -m "Removing offending duplicate" svn+ssh://dev.open-bio.org/home/hartzell/bioperl/bioperl-live/tags/bioperl-1-0-0/t/data/HUMBETGLOA.fasta
> $ svn rm -m "Removing offending duplicate" svn+ssh://dev.open-bio.org/home/hartzell/bioperl/bioperl-live/tags/bioperl-1-0-alpha/t/data/HUMBETGLOA.fasta
> $ svn rm -m "Removing offending duplicate" svn+ssh://dev.open-bio.org/home/hartzell/bioperl/bioperl-live/tags/bioperl-1-0-alpha2-rc/t/data/HUMBETGLOA.fasta
>
> to the post-processing commands.
> [...]

Will do.

Thanks for working out the incantations!

g.

From cjfields at uiuc.edu Mon Jul 2 09:26:06 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 2 Jul 2007 08:26:06 -0500
Subject: [Bioperl-l] test data
Message-ID: <82E2A57B-CB7B-4437-94A1-7AEFCA6A4F5F@uiuc.edu>

I am planning on adding test data to cvs for eutils and have run
across some stuff in bugzilla that needs to be added as well.

Should we, as convention, start adding data sequestered to a folder
with the test name, within t/data? This might make life easier in the
long run (keep track of files, get rid of old files, etc), and may
make it easier for wrapping up the correct data with tests if we
start submitting single module CPAN updates.

chris

From cjfields at uiuc.edu Mon Jul 2 09:52:27 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 2 Jul 2007 08:52:27 -0500
Subject: [Bioperl-l] test data
In-Reply-To: <468901C1.8020505@sendu.me.uk>
References: <82E2A57B-CB7B-4437-94A1-7AEFCA6A4F5F@uiuc.edu>
 <468901C1.8020505@sendu.me.uk>
Message-ID: <61239EEF-D079-4D47-BDCD-A2B5CCC1C84D@uiuc.edu>

On Jul 2, 2007, at 8:46 AM, Sendu Bala wrote:

> Chris Fields wrote:
>> I am planning on adding test data to cvs for eutils and have run
>> across some stuff in bugzilla that needs to be added as well.
>> Should we, as convention, start adding data sequestered to a folder
>> with the test name, within t/data?
>
> I'd actually argue that this shouldn't be done: data is sometimes
> reused amongst multiple different test scripts, and when looking
> for data to reuse it's easier to spot it in a single directory
> compared to searching through multiple directories.
>
>
>> This might make life easier in the long run (keep track of files,
>> get rid of old files, etc), and may make it easier for wrapping up
>> the correct data with tests if we start submitting single module
>> CPAN updates.
>
> I don't think that will be an issue. The automated process would
> read the test script and see what input files it uses, copying
> those into the archive. So, just be sure to standardise on using
> test_input_file() to make that possible.
>
>
> That said, I wouldn't mind especially either way. Just don't do it
> now, since test script names (and therefore the name of the
> directory you'd want to store the input files in) might all change.
>
>
> In fact we can imagine that we have a test script
> t/BioZombieKitten.t which stores its test data in
> t/data/BioZombieKitten/input.file but the script gets the path to
> this file by:
> my $input_file = test_input_file('input.file');
>
> test_input_file() is then implemented to look for the file in the
> subdir of data corresponding to the script name if we're dealing
> with the 900-modules-in-a-package checkout-type situation, but just
> in t/data if we're in the one-module-in-a-package situation.
>
> In any case, things will be most flexible if you drop files
> directly into t/data for now and reference them without any subdirs
> in the call to test_input_file().

Fine by me, I just find it very cluttered.

BioZombieKitten?!?
chris

From bix at sendu.me.uk Mon Jul 2 10:00:37 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 02 Jul 2007 15:00:37 +0100
Subject: [Bioperl-l] test data
In-Reply-To: <61239EEF-D079-4D47-BDCD-A2B5CCC1C84D@uiuc.edu>
References: <82E2A57B-CB7B-4437-94A1-7AEFCA6A4F5F@uiuc.edu>
 <468901C1.8020505@sendu.me.uk>
 <61239EEF-D079-4D47-BDCD-A2B5CCC1C84D@uiuc.edu>
Message-ID: <46890505.1070707@sendu.me.uk>

Chris Fields wrote:
> On Jul 2, 2007, at 8:46 AM, Sendu Bala wrote:
> Fine by me, I just find it very cluttered.

Yes, I agree. I also wish we had a decent naming convention for files.
(I.e. it would be nice to have a good idea what a file was for without
having to study the test script that uses it.)

> BioZombieKitten?!?

I get Bio/perl/ and Bio/ware/ confused in my head ;)
http://forums.bioware.com/viewtopic.html?topic=562916&forum=84

From bix at sendu.me.uk Mon Jul 2 09:46:41 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 02 Jul 2007 14:46:41 +0100
Subject: [Bioperl-l] test data
In-Reply-To: <82E2A57B-CB7B-4437-94A1-7AEFCA6A4F5F@uiuc.edu>
References: <82E2A57B-CB7B-4437-94A1-7AEFCA6A4F5F@uiuc.edu>
Message-ID: <468901C1.8020505@sendu.me.uk>

Chris Fields wrote:
> I am planning on adding test data to cvs for eutils and have run across
> some stuff in bugzilla that needs to be added as well.
>
> Should we, as convention, start adding data sequestered to a folder with
> the test name, within t/data?

I'd actually argue that this shouldn't be done: data is sometimes reused
amongst multiple different test scripts, and when looking for data to
reuse it's easier to spot it in a single directory compared to searching
through multiple directories.

> This might make life easier in the long
> run (keep track of files, get rid of old files, etc), and may make it
> easier for wrapping up the correct data with tests if we start
> submitting single module CPAN updates.

I don't think that will be an issue.
The automated process would read the test script and see what input
files it uses, copying those into the archive. So, just be sure to
standardise on using test_input_file() to make that possible.

That said, I wouldn't mind especially either way. Just don't do it now,
since test script names (and therefore the name of the directory you'd
want to store the input files in) might all change.

In fact we can imagine that we have a test script t/BioZombieKitten.t
which stores its test data in t/data/BioZombieKitten/input.file but the
script gets the path to this file by:

my $input_file = test_input_file('input.file');

test_input_file() is then implemented to look for the file in the subdir
of data corresponding to the script name if we're dealing with the
900-modules-in-a-package checkout-type situation, but just in t/data if
we're in the one-module-in-a-package situation.

In any case, things will be most flexible if you drop files directly
into t/data for now and reference them without any subdirs in the call
to test_input_file().

From hlapp at gmx.net Mon Jul 2 16:02:37 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 2 Jul 2007 16:02:37 -0400
Subject: [Bioperl-l] Take 2 of the new subversion repository.
In-Reply-To: <18054.63942.316904.413911@almost.alerce.com>
References: <18054.63942.316904.413911@almost.alerce.com>
Message-ID:

Just FYI, after applying the changes I've been sending, I was able to
check out the repository in its entirety.

-hilmar

On Jun 30, 2007, at 8:48 PM, George Hartzell wrote:

>
> There's a second cut at the subversion repository. I've done a better
> job of setting svn:keywords and svn:eol-style on various files. The
> defaults were more cautious and I used an auto-props file based on
> the wiki version.
>
> svn+ssh://dev.open-bio.org/home/hartzell/bioperl_take2
>
> The old repository's still around as
>
> svn+ssh://dev.open-bio.org/home/hartzell/bioperl_take1
>
> I renamed it so that people wouldn't work with it by mistake.
> If, for some hard-to-imagine reason, you have a working copy that you
> want to run against it, you should be able to do an
> svn switch --relocate on your working copy and be back in shape. In
> fact, it might be a good time to give it a try....
>
> g.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From wrp at virginia.edu Mon Jul 2 16:08:04 2007
From: wrp at virginia.edu (William R. Pearson)
Date: Mon, 2 Jul 2007 16:08:04 -0400
Subject: [Bioperl-l] Course: Computational and Comparative Genomics
Message-ID: <4B3F66D7-CF05-4CD1-A148-272B4B56FBD4@virginia.edu>

Course announcement - Application deadline, July 15, 2007
================================================================
Cold Spring Harbor COMPUTATIONAL & COMPARATIVE GENOMICS
November 7 - 13, 200
Application Deadline: July 15, 2007

INSTRUCTORS:
Pearson, William, Ph.D., University of Virginia, Charlottesville, VA
Smith, Randall, Ph.D., SmithKline Beecham Pharmaceuticals, King of Prussia, PA

Beyond BLAST and FASTA - Alignment: from proteins to genomes -

This course presents a comprehensive overview of the theory and
practice of computational methods for extracting the maximum amount of
information from protein and DNA sequence similarity through sequence
database searches, statistical analysis, multiple sequence alignment,
and genome scale alignment. Additional topics include gene finding,
identifying signals in unaligned sequences, and integration of genetic
and sequence information in biological databases. The course combines
lectures with hands-on exercises; students are encouraged to pose
challenging sequence analysis problems using their own data.
The course makes extensive use of local WWW pages to present problem
sets and the computing tools to solve them. Students use Windows and
Mac workstations attached to a UNIX server. The course is designed for
biologists seeking advanced training in biological sequence analysis,
computational biology core resource directors and staff, and for
scientists in other disciplines, such as computer science, who wish to
survey current research problems in biological sequence analysis and
comparative genomics.

The primary focus of the Computational and Comparative Genomics Course
is the theory and practice of algorithms used in computational biology,
with the goal of using current methods more effectively and developing
new algorithms. Cold Spring Harbor also offers a "Programming for
Biology" course, which focuses more on software development.

For additional information and the lecture schedule and problem sets
for the 2006 course, see:

http://fasta.bioch.virginia.edu/cshl06
================================================================
To apply to the course, fill out and send in the form at:

http://meetings.cshl.edu/courses/courseapplication.asp
================================================================

Bill Pearson

From niels at genomics.dk Mon Jul 2 16:45:07 2007
From: niels at genomics.dk (Niels Larsen)
Date: Mon, 02 Jul 2007 22:45:07 +0200
Subject: [Bioperl-l] simple PrimarySeq question
In-Reply-To:
References: <18054.63942.316904.413911@almost.alerce.com>
Message-ID: <468963D3.3000007@genomics.dk>

I write hoping someone could show me how to create a PrimarySeq object
without parsing features and all first. The lines below return

"Can't locate object method "next_seq" via package "Bio::PrimarySeq"
at ./tst2 line 16."

whereas calling Bio::SeqIO-> gives no error, but a too big object.
The GenBank record after the __END__ is the "1.gb" file. I could not
find out how from the tutorial or the Bio::PrimarySeq description.

Niels L


#!/usr/bin/env perl

use strict;
use warnings FATAL => qw ( all );

use Data::Dumper;

use Bio::Seq;
use Bio::SeqIO;

my ( $seq_h, $seq );

$seq_h = Bio::PrimarySeq->new( -file => "1.gb", -format => 'genbank' );
# $seq_h = Bio::SeqIO->new( -file => "1.gb", -format => 'genbank' );

$seq = $seq_h->next_seq();

# print Dumper( $seq );

__END__

LOCUS       X60065                     9 bp    mRNA    linear   MAM 14-NOV-2006
DEFINITION  B.bovis beta-2-gpI mRNA for beta-2-glycoprotein I.
ACCESSION   X60065 REGION: 1..9
VERSION     X60065.1  GI:5
KEYWORDS    beta-2 glycoprotein I.
SOURCE      Bos taurus (cattle)
  ORGANISM  Bos taurus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Laurasiatheria; Cetartiodactyla; Ruminantia;
            Pecora; Bovidae; Bovinae; Bos.
REFERENCE   1
  AUTHORS   Bendixen,E., Halkier,T., Magnusson,S., Sottrup-Jensen,L. and
            Kristensen,T.
  TITLE     Complete primary structure of bovine beta 2-glycoprotein I:
            localization of the disulfide bridges
  JOURNAL   Biochemistry 31 (14), 3611-3617 (1992)
   PUBMED   1567819
REFERENCE   2  (bases 1 to 9)
  AUTHORS   Kristensen,T.
  TITLE     Direct Submission
  JOURNAL   Submitted (11-JUN-1991) T.
            Kristensen, Dept of Mol Biology, University of Aarhus,
            C F Mollers Alle 130, DK-8000 Aarhus C, DENMARK
FEATURES             Location/Qualifiers
     source          1..9
                     /organism="Bos taurus"
                     /mol_type="mRNA"
                     /db_xref="taxon:9913"
                     /clone="pBB2I"
                     /tissue_type="liver"
     gene            <1..>9
                     /gene="beta-2-gpI"
     CDS             <1..>9
                     /gene="beta-2-gpI"
                     /codon_start=1
                     /product="beta-2-glycoprotein I"
                     /protein_id="CAA42669.1"
                     /db_xref="GI:6"
                     /db_xref="GOA:P17690"
                     /db_xref="UniProtKB/Swiss-Prot:P17690"
                     /translation="PALVLLLGFLCHVAIAGRTCPKPDELPFSTVVPLKRTYEPGEQI
                     VFSCQPGYVSRGGIRRFTCPLTGLWPINTLKCMPRVCPFAGILENGTVRYTTFEYPNT
                     ISFSCHTGFYLKGASSAKCTEEGKWSPDLPVCAPITCPPPPIPKFASLSVYKPLAGNN
                     SFYGSKAVFKCLPHHAMFGNDTVTCTEHGNWTQLPECREVRCPFPSRPDNGFVNHPAN
                     PVLYYKDTATFGCHETYSLDGPEEVECSKFGNWSAQPSCKASCKLSIKRATVIYEGER
                     VAIQNKFKNGMLHGQKVSFFCKHKEKKCSYTEDAQCIDGTIEIPKCFKEHSSLAFWKT
                     DASDVKPC"
     sig_peptide     <1..>9
                     /gene="beta-2-gpI"
ORIGIN
        1 ccagcgctc
//

From Kevin.M.Brown at asu.edu Mon Jul 2 17:35:12 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Mon, 2 Jul 2007 14:35:12 -0700
Subject: [Bioperl-l] simple PrimarySeq question
In-Reply-To: <468963D3.3000007@genomics.dk>
References: <18054.63942.316904.413911@almost.alerce.com>
 <468963D3.3000007@genomics.dk>
Message-ID: <1A4207F8295607498283FE9E93B775B403576504@EX02.asurite.ad.asu.edu>

Start by having a look at the following link:
http://bioperl.org/cgi-bin/deob_interface.cgi

SeqIO is how one reads or writes sequences to/from files.
Bio::PrimarySeq is just an object that holds information about a
sequence obtained from a file.
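
To make that distinction concrete, here is a minimal sketch (not part
of the original message, and assuming BioPerl is installed). It builds
a Bio::PrimarySeq directly from values rather than from a file; the id,
description and 9 bp sequence are copied from the 1.gb record earlier
in this thread, and the variable name $pseq is only illustrative.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

use Bio::PrimarySeq;

# Bio::PrimarySeq is a plain container: it is constructed from values,
# not from a file, so there are no -file/-format arguments here.
my $pseq = Bio::PrimarySeq->new(
    -seq      => 'ccagcgctc',    # the 9 bp sequence from the X60065 record
    -id       => 'X60065',
    -desc     => 'B.bovis beta-2-gpI mRNA for beta-2-glycoprotein I.',
    -alphabet => 'dna',
);

printf "%s\t%d bp\t%s\n", $pseq->display_id, $pseq->length, $pseq->seq;

# It is not a stream, so it has no next_seq() method to call.
print "has next_seq: ", ( $pseq->can('next_seq') ? "yes" : "no" ), "\n";
```

This is why calling next_seq() on the object fails: reading records
from a file is the job of Bio::SeqIO, and the sequence objects come
out of that stream.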

As for how to parse a Genbank file into a list of features:

use strict;
use warnings;
use Bio::SeqIO;

my $format = 'genbank';
my $file   = Bio::SeqIO->new(-format => $format, -file => "id.gb");
my @sorted_features;
while (my $seq = $file->next_seq())
{
    my @features = $seq->all_SeqFeatures;
    # pick out features by their primary tags
    for my $f (@features)
    {
        my $tag = $f->primary_tag;
        if ($tag eq 'CDS')
        {
            # @sorted_features holds all the CDS feature objects
            # obtained from the genbank file
            push @sorted_features, $f;
        }
    }
}

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
> Niels Larsen
> Sent: Monday, July 02, 2007 1:45 PM
> Cc: bioperl-l List
> Subject: [Bioperl-l] simple PrimarySeq question
>
> I write hoping someone could show me how to create a
> PrimarySeq object without parsing features and all first. The
> lines below return
>
> "Can't locate object method "next_seq" via package
> "Bio::PrimarySeq" at ./tst2 line 16."
>
> whereas calling Bio::SeqIO-> gives no error, but a too big object.
> The GenBank record after the __END__ is the "1.gb" file. I
> could not find out how from the tutorial or the
> Bio::PrimarySeq description.
>
> Niels L
>
>
> #!/usr/bin/env perl
>
> use strict;
> use warnings FATAL => qw ( all );
>
> use Data::Dumper;
>
> use Bio::Seq;
> use Bio::SeqIO;
>
> my ( $seq_h, $seq );
>
> $seq_h = Bio::PrimarySeq->new( -file => "1.gb", -format =>
> 'genbank' ); # $seq_h = Bio::SeqIO->new( -file => "1.gb",
> -format => 'genbank' );
>
> $seq = $seq_h->next_seq();
>
> # print Dumper( $seq );
>
> __END__
>
> LOCUS X60065 9 bp mRNA linear
> MAM 14-NOV-2006
> DEFINITION B.bovis beta-2-gpI mRNA for beta-2-glycoprotein I.
> ACCESSION X60065 REGION: 1..9
> VERSION X60065.1 GI:5
> KEYWORDS beta-2 glycoprotein I.
> SOURCE Bos taurus (cattle)
> ORGANISM Bos taurus
> Eukaryota; Metazoa; Chordata; Craniata;
> Vertebrata; Euteleostomi;
> Mammalia; Eutheria; Laurasiatheria;
> Cetartiodactyla; Ruminantia;
> Pecora; Bovidae; Bovinae; Bos.
> REFERENCE 1
> AUTHORS Bendixen,E., Halkier,T., Magnusson,S.,
> Sottrup-Jensen,L. and
> Kristensen,T.
> TITLE Complete primary structure of bovine beta
> 2-glycoprotein I:
> localization of the disulfide bridges
> JOURNAL Biochemistry 31 (14), 3611-3617 (1992)
> PUBMED 1567819
> REFERENCE 2 (bases 1 to 9)
> AUTHORS Kristensen,T.
> TITLE Direct Submission
> JOURNAL Submitted (11-JUN-1991) T. Kristensen, Dept of
> Mol Biology,
> University of Aarhus, C F Mollers Alle 130,
> DK-8000 Aarhus C,
> DENMARK
> FEATURES Location/Qualifiers
> source 1..9
> /organism="Bos taurus"
> /mol_type="mRNA"
> /db_xref="taxon:9913"
> /clone="pBB2I"
> /tissue_type="liver"
> gene <1..>9
> /gene="beta-2-gpI"
> CDS <1..>9
> /gene="beta-2-gpI"
> /codon_start=1
> /product="beta-2-glycoprotein I"
> /protein_id="CAA42669.1"
> /db_xref="GI:6"
> /db_xref="GOA:P17690"
> /db_xref="UniProtKB/Swiss-Prot:P17690"
>
> /translation="PALVLLLGFLCHVAIAGRTCPKPDELPFSTVVPLKRTYEPGEQI
>
> VFSCQPGYVSRGGIRRFTCPLTGLWPINTLKCMPRVCPFAGILENGTVRYTTFEYPNT
>
> ISFSCHTGFYLKGASSAKCTEEGKWSPDLPVCAPITCPPPPIPKFASLSVYKPLAGNN
>
> SFYGSKAVFKCLPHHAMFGNDTVTCTEHGNWTQLPECREVRCPFPSRPDNGFVNHPAN
>
> PVLYYKDTATFGCHETYSLDGPEEVECSKFGNWSAQPSCKASCKLSIKRATVIYEGER
>
> VAIQNKFKNGMLHGQKVSFFCKHKEKKCSYTEDAQCIDGTIEIPKCFKEHSSLAFWKT
> DASDVKPC"
> sig_peptide <1..>9
> /gene="beta-2-gpI"
> ORIGIN
> 1 ccagcgctc
> //
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From niels at genomics.dk Mon Jul 2 20:41:24 2007
From: niels at genomics.dk (niels at genomics.dk)
Date: Tue, 3 Jul 2007 02:41:24 +0200 (CEST)
Subject: [Bioperl-l] simple PrimarySeq question
In-Reply-To: <1A4207F8295607498283FE9E93B775B403576504@EX02.asurite.ad.asu.edu>
References: <18054.63942.316904.413911@almost.alerce.com>
 <468963D3.3000007@genomics.dk>
 <1A4207F8295607498283FE9E93B775B403576504@EX02.asurite.ad.asu.edu>
Message-ID: <23897.85.82.195.151.1183423284.squirrel@mail.genomics.dk>

Kevin,

Thanks, but I didn't put the question very clearly, sorry. Yes, SeqIO
gets entries from file, and from those large parsed entries I can get a
simplified primary_seq object. But the SeqIO object includes feature
and annotation objects etc. that take time to make, and I wish to know
if there is a way to get a primary_seq object without this overhead. I
apologize if I overlooked it in the docs.

Niels

> Start by having a look at the following link:
> http://bioperl.org/cgi-bin/deob_interface.cgi
>
> SeqIO is how one reads or writes sequences to/from files.
> Bio::PrimarySeq is just an object that holds information about a
> sequence obtained from a file.
>
> As for how to parse a Genbank file into a list of features:
>
> $file = Bio::SeqIO->new(-format => $format, -file => "id.gb");
> while (my $seq = $file->next_seq())
> {
>     @features = $seq->all_SeqFeatures;
>     # sort features by their primary tags
>     for my $f (@features)
>     {
>         my $tag = $f->primary_tag;
>         if ($tag eq 'CDS')
>         {
>             # @sorted_features holds all the Bio::PrimarySeq
>             # features obtained from the genbank file
>             push @sorted_features, $f;
>         }
>     }
> }
>
>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org
>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
>> Niels Larsen
>> Sent: Monday, July 02, 2007 1:45 PM
>> Cc: bioperl-l List
>> Subject: [Bioperl-l] simple PrimarySeq question
>>
>> I write hoping someone could show me how to create a
>> PrimarySeq object without parsing features and all first. The
>> lines below return
>>
>> "Can't locate object method "next_seq" via package
>> "Bio::PrimarySeq" at ./tst2 line 16."
>>
>> whereas calling Bio::SeqIO-> gives no error, but a too big object.
>> The GenBank record after the __END__ is the "1.gb" file. I
>> could not find out how from the tutorial or the
>> Bio::PrimarySeq description.
>>
>> Niels L
>>
>>
>> #!/usr/bin/env perl
>>
>> use strict;
>> use warnings FATAL => qw ( all );
>>
>> use Data::Dumper;
>>
>> use Bio::Seq;
>> use Bio::SeqIO;
>>
>> my ( $seq_h, $seq );
>>
>> $seq_h = Bio::PrimarySeq->new( -file => "1.gb", -format =>
>> 'genbank' ); # $seq_h = Bio::SeqIO->new( -file => "1.gb",
>> -format => 'genbank' );
>>
>> $seq = $seq_h->next_seq();
>>
>> # print Dumper( $seq );
>>
>> __END__
>>
>> LOCUS X60065 9 bp mRNA linear
>> MAM 14-NOV-2006
>> DEFINITION B.bovis beta-2-gpI mRNA for beta-2-glycoprotein I.
>> ACCESSION X60065 REGION: 1..9
>> VERSION X60065.1 GI:5
>> KEYWORDS beta-2 glycoprotein I.
>> SOURCE Bos taurus (cattle)
>> ORGANISM Bos taurus
>> Eukaryota; Metazoa; Chordata; Craniata;
>> Vertebrata; Euteleostomi;
>> Mammalia; Eutheria; Laurasiatheria;
>> Cetartiodactyla; Ruminantia;
>> Pecora; Bovidae; Bovinae; Bos.
>> REFERENCE 1
>> AUTHORS Bendixen,E., Halkier,T., Magnusson,S.,
>> Sottrup-Jensen,L. and
>> Kristensen,T.
>> TITLE Complete primary structure of bovine beta
>> 2-glycoprotein I:
>> localization of the disulfide bridges
>> JOURNAL Biochemistry 31 (14), 3611-3617 (1992)
>> PUBMED 1567819
>> REFERENCE 2 (bases 1 to 9)
>> AUTHORS Kristensen,T.
>> TITLE Direct Submission
>> JOURNAL Submitted (11-JUN-1991) T.
>> Kristensen, Dept of
>> Mol Biology,
>> University of Aarhus, C F Mollers Alle 130,
>> DK-8000 Aarhus C,
>> DENMARK
>> FEATURES Location/Qualifiers
>> source 1..9
>> /organism="Bos taurus"
>> /mol_type="mRNA"
>> /db_xref="taxon:9913"
>> /clone="pBB2I"
>> /tissue_type="liver"
>> gene <1..>9
>> /gene="beta-2-gpI"
>> CDS <1..>9
>> /gene="beta-2-gpI"
>> /codon_start=1
>> /product="beta-2-glycoprotein I"
>> /protein_id="CAA42669.1"
>> /db_xref="GI:6"
>> /db_xref="GOA:P17690"
>> /db_xref="UniProtKB/Swiss-Prot:P17690"
>>
>> /translation="PALVLLLGFLCHVAIAGRTCPKPDELPFSTVVPLKRTYEPGEQI
>>
>> VFSCQPGYVSRGGIRRFTCPLTGLWPINTLKCMPRVCPFAGILENGTVRYTTFEYPNT
>>
>> ISFSCHTGFYLKGASSAKCTEEGKWSPDLPVCAPITCPPPPIPKFASLSVYKPLAGNN
>>
>> SFYGSKAVFKCLPHHAMFGNDTVTCTEHGNWTQLPECREVRCPFPSRPDNGFVNHPAN
>>
>> PVLYYKDTATFGCHETYSLDGPEEVECSKFGNWSAQPSCKASCKLSIKRATVIYEGER
>>
>> VAIQNKFKNGMLHGQKVSFFCKHKEKKCSYTEDAQCIDGTIEIPKCFKEHSSLAFWKT
>> DASDVKPC"
>> sig_peptide <1..>9
>> /gene="beta-2-gpI"
>> ORIGIN
>> 1 ccagcgctc
>> //
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From hlapp at gmx.net Mon Jul 2 22:36:19 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 2 Jul 2007 22:36:19 -0400
Subject: [Bioperl-l] simple PrimarySeq question
In-Reply-To: <23897.85.82.195.151.1183423284.squirrel@mail.genomics.dk>
References: <18054.63942.316904.413911@almost.alerce.com>
 <468963D3.3000007@genomics.dk>
 <1A4207F8295607498283FE9E93B775B403576504@EX02.asurite.ad.asu.edu>
 <23897.85.82.195.151.1183423284.squirrel@mail.genomics.dk>
Message-ID: <84F5C120-FE0B-472D-8F1B-026AD238E959@gmx.net>

Check out the POD of Bio::Seq::SeqBuilder, the synopsis should have
examples for what you want to do:

    use Bio::SeqIO;

    # usually you won't instantiate
    # this yourself - a SeqIO object -
    # you will have one already
    my $seqin = Bio::SeqIO->new(-fh => \*STDIN, -format => "genbank");
    my $builder = $seqin->sequence_builder();

    # if you need only sequence, id, and description (e.g. for
    # conversion to FASTA format):
    $builder->want_none();
    $builder->add_wanted_slot('display_id','desc','seq');

    # if you want everything except the sequence and features
    $builder->want_all(1); # this is the default if it's untouched
    $builder->add_unwanted_slot('seq','features');

Let us know if that doesn't answer your question. Note that this is
currently only implemented for Genbank format.

-hilmar

On Jul 2, 2007, at 8:41 PM, niels at genomics.dk wrote:

> Kevin,
>
> Thanks, but I didn't put the question very clearly, sorry. Yes, SeqIO
> gets entries from file, and from those large parsed entries I can
> get a simplified primary_seq object. But the SeqIO object includes
> feature and annotation objects etc. that take time to make, and I
> wish to know if there is a way to get a primary_seq object without
> this overhead. I apologize if I overlooked it in the docs.
>
> Niels
>
>
>
>
>> Start by having a look at the following link:
>> http://bioperl.org/cgi-bin/deob_interface.cgi
>>
>> SeqIO is how one reads or writes sequences to/from files.
>> Bio::PrimarySeq is just an object that holds information about a
>> sequence obtained from a file.
>>
>> As for how to parse a Genbank file into a list of features:
>>
>> $file = Bio::SeqIO->new(-format => $format, -file => "id.gb");
>> while (my $seq = $file->next_seq())
>> {
>>     @features = $seq->all_SeqFeatures;
>>     # sort features by their primary tags
>>     for my $f (@features)
>>     {
>>         my $tag = $f->primary_tag;
>>         if ($tag eq 'CDS')
>>         {
>>             # @sorted_features holds all the Bio::PrimarySeq
>>             # features obtained from the genbank file
>>             push @sorted_features, $f;
>>         }
>>     }
>> }
>>
>>
>>> -----Original Message-----
>>> From: bioperl-l-bounces at lists.open-bio.org
>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
>>> Niels Larsen
>>> Sent: Monday, July 02, 2007 1:45 PM
>>> Cc: bioperl-l List
>>> Subject: [Bioperl-l] simple PrimarySeq question
>>>
>>> I write hoping someone could show me how to create a
>>> PrimarySeq object without parsing features and all first. The
>>> lines below return
>>>
>>> "Can't locate object method "next_seq" via package
>>> "Bio::PrimarySeq" at ./tst2 line 16."
>>>
>>> whereas calling Bio::SeqIO-> gives no error, but a too big object.
>>> The GenBank record after the __END__ is the "1.gb" file. I
>>> could not find out how from the tutorial or the
>>> Bio::PrimarySeq description.
>>>
>>> Niels L
>>>
>>>
>>> #!/usr/bin/env perl
>>>
>>> use strict;
>>> use warnings FATAL => qw ( all );
>>>
>>> use Data::Dumper;
>>>
>>> use Bio::Seq;
>>> use Bio::SeqIO;
>>>
>>> my ( $seq_h, $seq );
>>>
>>> $seq_h = Bio::PrimarySeq->new( -file => "1.gb", -format =>
>>> 'genbank' ); # $seq_h = Bio::SeqIO->new( -file => "1.gb",
>>> -format => 'genbank' );
>>>
>>> $seq = $seq_h->next_seq();
>>>
>>> # print Dumper( $seq );
>>>
>>> __END__
>>>
>>> LOCUS X60065 9 bp mRNA linear
>>> MAM 14-NOV-2006
>>> DEFINITION B.bovis beta-2-gpI mRNA for beta-2-glycoprotein I.
>>> ACCESSION X60065 REGION: 1..9
>>> VERSION X60065.1 GI:5
>>> KEYWORDS beta-2 glycoprotein I.
>>> SOURCE Bos taurus (cattle)
>>> ORGANISM Bos taurus
>>> Eukaryota; Metazoa; Chordata; Craniata;
>>> Vertebrata; Euteleostomi;
>>> Mammalia; Eutheria; Laurasiatheria;
>>> Cetartiodactyla; Ruminantia;
>>> Pecora; Bovidae; Bovinae; Bos.
>>> REFERENCE 1
>>> AUTHORS Bendixen,E., Halkier,T., Magnusson,S.,
>>> Sottrup-Jensen,L. and
>>> Kristensen,T.
>>> TITLE Complete primary structure of bovine beta
>>> 2-glycoprotein I:
>>> localization of the disulfide bridges
>>> JOURNAL Biochemistry 31 (14), 3611-3617 (1992)
>>> PUBMED 1567819
>>> REFERENCE 2 (bases 1 to 9)
>>> AUTHORS Kristensen,T.
>>> TITLE Direct Submission
>>> JOURNAL Submitted (11-JUN-1991) T. Kristensen, Dept of
>>> Mol Biology,
>>> University of Aarhus, C F Mollers Alle 130,
>>> DK-8000 Aarhus C,
>>> DENMARK
>>> FEATURES Location/Qualifiers
>>> source 1..9
>>> /organism="Bos taurus"
>>> /mol_type="mRNA"
>>> /db_xref="taxon:9913"
>>> /clone="pBB2I"
>>> /tissue_type="liver"
>>> gene <1..>9
>>> /gene="beta-2-gpI"
>>> CDS <1..>9
>>> /gene="beta-2-gpI"
>>> /codon_start=1
>>> /product="beta-2-glycoprotein I"
>>> /protein_id="CAA42669.1"
>>> /db_xref="GI:6"
>>> /db_xref="GOA:P17690"
>>> /db_xref="UniProtKB/Swiss-Prot:P17690"
>>>
>>> /translation="PALVLLLGFLCHVAIAGRTCPKPDELPFSTVVPLKRTYEPGEQI
>>>
>>> VFSCQPGYVSRGGIRRFTCPLTGLWPINTLKCMPRVCPFAGILENGTVRYTTFEYPNT
>>>
>>> ISFSCHTGFYLKGASSAKCTEEGKWSPDLPVCAPITCPPPPIPKFASLSVYKPLAGNN
>>>
>>> SFYGSKAVFKCLPHHAMFGNDTVTCTEHGNWTQLPECREVRCPFPSRPDNGFVNHPAN
>>>
>>> PVLYYKDTATFGCHETYSLDGPEEVECSKFGNWSAQPSCKASCKLSIKRATVIYEGER
>>>
>>> VAIQNKFKNGMLHGQKVSFFCKHKEKKCSYTEDAQCIDGTIEIPKCFKEHSSLAFWKT
>>> DASDVKPC"
>>> sig_peptide <1..>9
>>> /gene="beta-2-gpI"
>>> ORIGIN
>>> 1 ccagcgctc
>>> //
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : ===========================================================
From ewijaya at gmail.com Tue Jul 3 02:56:30 2007 From: ewijaya at gmail.com (Edward Wijaya) Date: Tue, 3 Jul 2007 14:56:30 +0800 Subject: [Bioperl-l] Problem with GD.pm version 2.35 Message-ID: <3521d3670707022356tbc38694mfcb5185b1dfc005d@mail.gmail.com> Dear all, I was trying to perform check with this command: $ perl -MGD -e 'print $GD::VERSION'; And it gave: GD object version 2.32 does not match $GD::VERSION 2.35 at /usr/lib/perl5/5.8.8/i386-linux-thread-multi/DynaLoader.pm line 253. Compilation failed in require. BEGIN failed--compilation aborted. Similarly my script that uses GD.pm doesn't execute. I have installed the latest version of libgd version 2.0.35 downloaded from http://www.libgd.org/Downloads#Download_the_latest_.282.0.35.29 Can anybody suggest how can I resolve my problem? This is my Perl version: This is perl, v5.8.8 built for i386-linux-thread-multi -- Edward
From lstein at cshl.edu Tue Jul 3 10:41:26 2007 From: lstein at cshl.edu (Lincoln Stein) Date: Tue, 3 Jul 2007 10:40:26 -0401 Subject: [Bioperl-l] Problem with GD.pm version 2.35 In-Reply-To: <3521d3670707022356tbc38694mfcb5185b1dfc005d@mail.gmail.com> References: <3521d3670707022356tbc38694mfcb5185b1dfc005d@mail.gmail.com> Message-ID: <6dce9a0b0707030741r52b8d0beq757a8faf982e1f2f@mail.gmail.com> This happens when there is a mismatch between the compiled (.so) portion of GD and the perl (.pm) version. Typically it occurs when you have installed GD incorrectly by, e.g., copying the .pm file into position rather than using the make file. Solution: Uninstall old versions of GD by manually finding all occurrences of GD.so and GD.pm and removing them. Then reinstall the correct way. Lincoln On 7/3/07, Edward Wijaya wrote: > > Dear all, > I was trying to perform check with this command: > > $ perl -MGD -e 'print $GD::VERSION'; > > And it gave: > > GD object version 2.32 does not match $GD::VERSION 2.35 at > /usr/lib/perl5/5.8.8/i386-linux-thread-multi/DynaLoader.pm line 253. > Compilation failed in require. > BEGIN failed--compilation aborted. > > Similarly my script that uses GD.pm doesn't execute. > > > I have installed the latest version of libgd version 2.0.35 downloaded > from > http://www.libgd.org/Downloads#Download_the_latest_.282.0.35.29 > > Can anybody suggest how can I resolve my problem? > > This is my Perl version: > This is perl, v5.8.8 built for i386-linux-thread-multi > > -- > Edward > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Lincoln D.
Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From cjfields at uiuc.edu Wed Jul 4 01:45:16 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 4 Jul 2007 00:45:16 -0500 Subject: [Bioperl-l] genbank2gff3 - Name attribute? Message-ID: I noticed that genbank2gff3.pl doesn't have an explicitly defined way of converting the gene/locus/etc name to a Name tag (for, say, GBrowse). Any particular reason? Should I stick with GFF2 for now? chris From bix at sendu.me.uk Wed Jul 4 06:00:31 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Wed, 04 Jul 2007 11:00:31 +0100 Subject: [Bioperl-l] Splitting Bioperl Message-ID: <468B6FBF.1070708@sendu.me.uk> To summarise some previous threads: http://thread.gmane.org/gmane.comp.lang.perl.bio.general/15315 http://thread.gmane.org/gmane.comp.lang.perl.bio.general/15338/focus=15409 # Bioperl is currently one monolithic distribution of ~900 modules # There is some desire to split it up into smaller functional groups # There are some problems with that proposal # An extreme variant of that proposal is to make the groups individual modules Following this discussion: http://www.nntp.perl.org/group/perl.modules/2007/07/msg55160.html (especially Adam Kennedy's postings of 4/07, soon to appear in that archive), the extreme variant doesn't seem like a good idea. I'm now suggesting that Steve's original split idea, as modified/expanded by Adam's driver and other ideas, is the best choice. The problems I previously identified can be solved in the same way they were solved in my extreme variant: the splits are done by Build.PL automation working on a single repository/code-base, not by splitting things up at the repository level. 
As I see it, the way forward now is for someone interested enough to decide on the specifics of how things will be split and offer them up to the group for discussion. I don't mean vague possibilities of what might work as a split, but rather some real thought should go into it to make sure the split makes sense and will actually work in practice. Following that, the splits can be implemented by some automated dist action of Build.PL. If there isn't sufficient interest to make this happen, I don't see that as a terrible thing. There are benefits to keeping Bioperl monolithic, and some of the problems (eg. lack of updates) can be solved without changing its nature. From cjfields at uiuc.edu Wed Jul 4 10:53:45 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 4 Jul 2007 09:53:45 -0500 Subject: [Bioperl-l] Splitting Bioperl In-Reply-To: <468B6FBF.1070708@sendu.me.uk> References: <468B6FBF.1070708@sendu.me.uk> Message-ID: <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> On Jul 4, 2007, at 5:00 AM, Sendu Bala wrote: > To summarise some previous threads: > http://thread.gmane.org/gmane.comp.lang.perl.bio.general/15315 > http://thread.gmane.org/gmane.comp.lang.perl.bio.general/15338/ > focus=15409 > > # Bioperl is currently one monolithic distribution of ~900 modules > # There is some desire to split it up into smaller functional groups > # There are some problems with that proposal > # An extreme variant of that proposal is to make the groups individual > modules > > > Following this discussion: > http://www.nntp.perl.org/group/perl.modules/2007/07/msg55160.html > (especially Adam Kennedy's postings of 4/07, soon to appear in that > archive), the extreme variant doesn't seem like a good idea. brian d foy made some sound arguments against it as well. > I'm now suggesting that Steve's original split idea, as > modified/expanded by Adam's driver and other ideas, is the best > choice. 
> The problems I previously identified can be solved in the same way > they > were solved in my extreme variant: the splits are done by Build.PL > automation working on a single repository/code-base, not by splitting > things up at the repository level. > > As I see it, the way forward now is for someone interested enough to > decide on the specifics of how things will be split and offer them > up to > the group for discussion. I don't mean vague possibilities of what > might > work as a split, but rather some real thought should go into it to > make > sure the split makes sense and will actually work in practice. We've already identified a few (SearchIO, Tools, GBrowse-related, etc). ... > If there isn't sufficient interest to make this happen, I don't see > that > as a terrible thing. There are benefits to keeping Bioperl monolithic, > and some of the problems (eg. lack of updates) can be solved without > changing its nature. If so, proposals that solve this problem need to be made as well. If we stay monolithic, then here's mine: we start having fixed, regularly timed dev releases like Parrot, monthly or bimonthly (quite common on CPAN), with brief release reports on which bugs have been fixed, code has been added, so on. Not every bug has to be fixed per dev release; if that were true there would never be releases for some of the XML parser packages. No RCs for dev releases (it's a dev release!). These would be 1.x.y. We can then, every once in a while, have a bug-squashing session, hackathon, etc, and have regular non-dev release (1.x) that all core devs accept and that passes a particular milestone. As for the advantage of a split approach, as mentioned previously it is to focus modules/tests/scripts into groups with related functions. 
Even just splitting off ones with external reqs (XML parsers, GD, etc) into an 'aux' release would be an advantage, as it doesn't confront a new user with the burden of installing a large list of dependencies, some of which may be complicated for a perl newbie to either install from scratch (DBD::mysql, GD) or to get the latest bug-fixed prereq release for their OS (the recent debacle with XML::SAX::Expat issues comes to mind, which wasn't immediately available for win32 as a PPM). I'm fairly open to any approach as long as it's reasonably thought out, though I am admittedly a bit biased towards the split approach. I do think some change is in order; I worry about there ever being a 1.6 release at this point. chris
From davila at ioc.fiocruz.br Wed Jul 4 13:11:20 2007 From: davila at ioc.fiocruz.br (Alberto Davila) Date: Wed, 04 Jul 2007 14:11:20 -0300 Subject: [Bioperl-l] ESTs in EST format Message-ID: <468BD4B8.5050105@ioc.fiocruz.br> Dear All, I am trying to get all ESTs from a given species (eg: Trypanosoma brucei) from Genbank in EST format (eg: http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucest&id=10280980)... while using Entrez I can "display" individual EST entries in EST format, this "EST format" is not an option in the main "display" menu for batch download ... I don't see the EST format listed (http://www.bioperl.org/wiki/Sequence_formats) among the ones that SeqIO deals with, so I wonder whether there is another BioPerl module to do this? Any tips would be greatly appreciated ;-) Kindest regards, Alberto
From jason at bioperl.org Wed Jul 4 13:52:59 2007 From: jason at bioperl.org (Jason Stajich) Date: Wed, 4 Jul 2007 10:52:59 -0700 Subject: [Bioperl-l] ESTs in EST format In-Reply-To: <468BD4B8.5050105@ioc.fiocruz.br> References: <468BD4B8.5050105@ioc.fiocruz.br> Message-ID: Currently we don't support this format. As far as I know it isn't a published standard, nor is it a format that NCBI distributes this data in as flat files (i.e. genbank dumps).
Is there any reason why you can't get what you need from the GenBank format? http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi? db=nucest&qty=1&c_start=1&list_uids=10280980&uids=&dopt=gb -jason On Jul 4, 2007, at 10:11 AM, Alberto Davila wrote: > Dear All, > > I am trying to get all ESTs from a given species (eg: Trypanosoma > brucei) from Genbank in EST format (eg: > http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi? > db=nucest&id=10280980)... > while using Entrez I can "display" individual EST entries in EST > format, > this "EST format" is not an option in the main "display" menu for > batch > download ... > > I dont see the EST format listed > (http://www.bioperl.org/wiki/Sequence_formats) among the ones that > SeqIO > deal with, so wonder there would another BioPerl module to do > this ? any > tips, would be greatly appreciated ;-) > > Kindest regards, Alberto > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason at bioperl.org http://jason.open-bio.org/ From dmessina at wustl.edu Wed Jul 4 14:37:22 2007 From: dmessina at wustl.edu (David Messina) Date: Wed, 4 Jul 2007 13:37:22 -0500 Subject: [Bioperl-l] Splitting Bioperl In-Reply-To: <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> Message-ID: <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> On Jul 4, 2007, at 9:53 AM, Chris Fields wrote: > we start having fixed, > regularly timed dev releases like Parrot, monthly or bimonthly (quite > common on CPAN), with brief release reports on which bugs have been > fixed, code has been added, so on. Not every bug has to be fixed per > dev release; if that were true there would never be releases for some > of the XML parser packages. No RCs for dev releases (it's a dev > release!). These would be 1.x.y. 
We can then, every once in a > while, have a bug-squashing session, hackathon, etc, and have regular > non-dev release (1.x) that all core devs accept and that passes a > particular milestone. Regardless of whether we split or don't, I think these ideas of adding a little more structure to BioPerl's development cycles -- especially having bug-squashing and hacking sessions, where we all band together and commit some time to cranking through a bunch of to- dos -- would be beneficial, particularly as a means to keeping a certain basal level of momentum in BioPerl. Dave From jason at bioperl.org Wed Jul 4 15:45:29 2007 From: jason at bioperl.org (Jason Stajich) Date: Wed, 4 Jul 2007 12:45:29 -0700 Subject: [Bioperl-l] Splitting Bioperl In-Reply-To: <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> Message-ID: I definitely agree - we can live up to the unstable "living on the edge" nature of dev releases a bit more perhaps? On Jul 4, 2007, at 11:37 AM, David Messina wrote: > > On Jul 4, 2007, at 9:53 AM, Chris Fields wrote: > >> we start having fixed, >> regularly timed dev releases like Parrot, monthly or bimonthly (quite >> common on CPAN), with brief release reports on which bugs have been >> fixed, code has been added, so on. Not every bug has to be fixed per >> dev release; if that were true there would never be releases for some >> of the XML parser packages. No RCs for dev releases (it's a dev >> release!). These would be 1.x.y. We can then, every once in a >> while, have a bug-squashing session, hackathon, etc, and have regular >> non-dev release (1.x) that all core devs accept and that passes a >> particular milestone. 
> > > Regardless of whether we split or don't, I think these ideas of > adding a little more structure to BioPerl's development cycles -- > especially having bug-squashing and hacking sessions, where we all > band together and commit some time to cranking through a bunch of to- > dos -- would be beneficial, particularly as a means to keeping a > certain basal level of momentum in BioPerl. > > Dave > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason at bioperl.org http://jason.open-bio.org/ From cjfields at uiuc.edu Wed Jul 4 16:54:14 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 4 Jul 2007 15:54:14 -0500 Subject: [Bioperl-l] Splitting Bioperl In-Reply-To: References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> Message-ID: I think what's partially responsible for slowing down releases is the expectation that each dev release is supposed to have all bugs fixed, work for every OS, etc. In other words, act like a stable release. A developer release by nature is living on the edge, so why not have regular dev releases? We keep telling users to update to using bioperl-live whenever something breaks, anyway. We could decide to split stuff off along the way into more 'stable' sections if there were more demand for it, and have the more API-volatile code (DB::SeqFeature, EUtilities, GFF3, Chado-related, etc.) retain the 'dev' tag until we feel it's ready for prime time. chris On Jul 4, 2007, at 2:45 PM, Jason Stajich wrote: > I definitely agree - we can live up to the unstable "living on the > edge" nature of dev releases a bit more perhaps? 
> > > On Jul 4, 2007, at 11:37 AM, David Messina wrote: > >> >> On Jul 4, 2007, at 9:53 AM, Chris Fields wrote: >> >>> we start having fixed, >>> regularly timed dev releases like Parrot, monthly or bimonthly >>> (quite >>> common on CPAN), with brief release reports on which bugs have been >>> fixed, code has been added, so on. Not every bug has to be fixed >>> per >>> dev release; if that were true there would never be releases for >>> some >>> of the XML parser packages. No RCs for dev releases (it's a dev >>> release!). These would be 1.x.y. We can then, every once in a >>> while, have a bug-squashing session, hackathon, etc, and have >>> regular >>> non-dev release (1.x) that all core devs accept and that passes a >>> particular milestone. >> >> >> Regardless of whether we split or don't, I think these ideas of >> adding a little more structure to BioPerl's development cycles -- >> especially having bug-squashing and hacking sessions, where we all >> band together and commit some time to cranking through a bunch of to- >> dos -- would be beneficial, particularly as a means to keeping a >> certain basal level of momentum in BioPerl. >> >> Dave >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > jason at bioperl.org > http://jason.open-bio.org/ > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From n.haigh at sheffield.ac.uk Thu Jul 5 04:09:05 2007 From: n.haigh at sheffield.ac.uk (Nathan S. 
Haigh) Date: Thu, 05 Jul 2007 09:09:05 +0100 Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions In-Reply-To: References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> Message-ID: <468CA721.4020804@sheffield.ac.uk> Chris Fields wrote: > I think what's partially responsible for slowing down releases is the > expectation that each dev release is supposed to have all bugs fixed, > work for every OS, etc. In other words, act like a stable release. > > A developer release by nature is living on the edge, so why not have > regular dev releases? We keep telling users to update to using > bioperl-live whenever something breaks, anyway. We could decide to > split stuff off along the way into more 'stable' sections if there > were more demand for it, and have the more API-volatile code > (DB::SeqFeature, EUtilities, GFF3, Chado-related, etc.) retain the > 'dev' tag until we feel it's ready for prime time. > > chris > > On Jul 4, 2007, at 2:45 PM, Jason Stajich wrote: > > -- snip -- I agree, although would the dev releases still need to pass all the tests? I'm thinking of people installing via CPAN. I also agree with what was said in a previous post about bringing bioperl-run (and some others) back into the same repository as bioperl-core (after a successful move over to svn) and having Build.PL deal with creating the packages etc for CPAN. This would hopefully help keep the run package (and others) up to speed with the core package. I also agree with previous posts about organising and/or having some naming convention for test data files. I think a good approach would be to organise data files into directory trees (1 - 3 deep) with names that allude to the type of data in that subtree/file rather than the tests that use it.
For example:

t/data
|__ formats
|   |__ seq
|   |   |__ legal_fasta
|   |   |   |__ extension.fas
|   |   |   |__ extension.fasta
|   |   |   |__ extension.foo
|   |   |   |__ extension.bar
|   |   |   |__ no_extension
|   |   |   |__ interleaved.fas
|   |   |   |__ non_interleaved.fas
|   |   |   |__ single_seq.fas
|   |   |   |__ multiple_seq.fas
|   |   |   |__ desc_line1.fas
|   |   |   |__ desc_line2.fas
|   |   |
|   |   |__ illegal_fasta
|   |   |   |__ illegal_chars.fas
|   |   |   |__ some_other_illegal_alternative.fas
|   |   |
|   |   |__ legal_genbank
|   |   |   |__ etc etc
|   |   |
|   |   |__ illegal_genbank
|   |       |__ etc etc
|   |
|   |__ aln
|   |__ blast
|   |   |__ legal_blastx
|   |   |
|   |   |__ legal_blastp
|   |   |
|   |   |__ legal_tblastx
|   |   |
|   |   |__ legal_plastpsi
|   |   |
|   |   |__ legal_wublast
|   |__ foo
|   |__ bar
|
|__ misc
|
|__ etc

This type of setup might lend itself to having a test script simply try to parse all the files in a directory, to ensure that nothing fails for legal file formats and that parsing does fail for illegal formats. Naming of the file paths would help test authors to identify a suitable data file for their own tests before adding their own to the t/data dir. It might also help to identify areas where example test data is currently lacking.

Thinking about this a little more, I think it would be a good idea to include Test::Exception in t/lib. We should also be testing that warnings and exceptions are generated when expected - e.g. illegal characters in seq files etc etc. Without these sorts of tests we are only getting half the story. The lack of this testing might account for a large chunk of the poor test coverage, particularly when it comes to branches in the code.

Anyway, this type of reorganisation couldn't take place until the svn repo is up and working. I'd appreciate any comments on the above!
Nath From bix at sendu.me.uk Thu Jul 5 04:55:25 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 05 Jul 2007 09:55:25 +0100 Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions In-Reply-To: <468CA721.4020804@sheffield.ac.uk> References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> Message-ID: <468CB1FD.7060301@sendu.me.uk> Nathan S. Haigh wrote: > I agree, although would the dev releases still need to pass all the > tests? I'm thinking of people installing via CPAN. Yes, they'd all have to pass. 'Developer release' should never have the connotation of 'broken release'. However, getting all tests to pass is a lot easier than fixing all bugs in bugzilla. (... which actually goes to show how poor our tests are) Worst case, if we were forced to stick to a schedule but couldn't fix a failing test, we could always make it a 'todo' test. > I also agree with what was said in a previous post about bringing back > bioperl-run (and some others) back into the same repository as > bioperl-core (after a successful move over to svn) Agree (with myself essentially). > I also agree with previous posts about organising and/or having some > naming convention for test data files. I think an approach whereby data > files were organised into directory trees (1 - 3 deep) with names that > elude to the type of data in that subtree/file rather than the tests > that use it etc. For example: > > t/data > |__ formats > | |__ seq > | | |__ legal_fasta > | | | |__ extension.fas [snip] At that level, files don't need extensions and can have fully informative names that explain what's interesting or special about them. > This type of setup, might lend itself to having a test script simply try > to parse all the files in a directory to ensure nothing fails (for legal > file formats) and fails for illegal formats. Great idea. 
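[Editorial sketch, not part of the original thread.] The "parse every file in a directory" idea could look roughly like the script below. It is only a sketch under stated assumptions: parse_fasta() and files_in() are hypothetical stand-ins (a real test would presumably call Bio::SeqIO), the directory names legal_fasta and illegal_fasta follow Nath's proposed layout, and the sample files are created in a temporary directory so that the script runs with core Perl modules alone.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use Test::More;

# Hypothetical stand-in for a real parser such as Bio::SeqIO: it dies on
# anything that is not a header line or a line of sequence letters.
sub parse_fasta {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my $count = 0;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line eq '';
        if ($line =~ /^>/) { $count++; next }
        die "sequence data before first header in $path\n" unless $count;
        die "illegal characters in $path\n" unless $line =~ /^[A-Za-z*-]+$/;
    }
    close $fh;
    die "no sequences found in $path\n" unless $count;
    return $count;
}

# Return the plain files in a directory, sorted for a stable test order.
sub files_in {
    my ($dir) = @_;
    opendir my $dh, $dir or die "Cannot read $dir: $!";
    my @paths = map { File::Spec->catfile($dir, $_) } readdir $dh;
    closedir $dh;
    my @files = grep { -f $_ } @paths;
    @files = sort @files;
    return @files;
}

# Build a throwaway copy of the proposed layout so the sketch is runnable.
my $root    = tempdir(CLEANUP => 1);
my %samples = (
    legal_fasta => {
        'single_seq.fas' => ">s1\nACGT\n",
        'no_extension'   => ">s1 some description\nACGT\nTTGA\n>s2\nGGCC\n",
    },
    illegal_fasta => {
        'illegal_chars.fas' => ">s1\nAC#GT\n",
    },
);
for my $subdir (sort keys %samples) {
    my $dir = File::Spec->catdir($root, $subdir);
    mkdir $dir or die "Cannot create $dir: $!";
    for my $name (sort keys %{ $samples{$subdir} }) {
        my $path = File::Spec->catfile($dir, $name);
        open my $out, '>', $path or die "Cannot write $path: $!";
        print {$out} $samples{$subdir}{$name};
        close $out or die "Cannot close $path: $!";
    }
}

# Every file under legal_* must parse; every file under illegal_* must die.
for my $file (files_in(File::Spec->catdir($root, 'legal_fasta'))) {
    ok( eval { parse_fasta($file); 1 }, "parses $file" );
}
for my $file (files_in(File::Spec->catdir($root, 'illegal_fasta'))) {
    ok( !eval { parse_fasta($file); 1 }, "rejects $file" );
}
done_testing();
```

With a layout like this, adding a new data file to the right directory would extend the test without touching the script; whether that outweighs hand-written, file-specific assertions is a judgment call for the test authors.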
> Thinking about this a little more, I think it would be a good idea to > include Test::Exception in t/lib. Agree. I'll see if I can have it auto-loaded by BioperlTest.pm. > Anyway, this type of reorganisation couldn't take place until the svn > repo is up and working. Agree. From bix at sendu.me.uk Thu Jul 5 05:39:10 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 05 Jul 2007 10:39:10 +0100 Subject: [Bioperl-l] Test related Suggestions In-Reply-To: <468CB1FD.7060301@sendu.me.uk> References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <468CB1FD.7060301@sendu.me.uk> Message-ID: <468CBC3E.1020408@sendu.me.uk> Sendu Bala wrote: > Nathan S. Haigh wrote: >> Thinking about this a little more, I think it would be a good idea to >> include Test::Exception in t/lib. > > Agree. I'll see if I can have it auto-loaded by BioperlTest.pm. I've now done that: BioperlTest loads Test::Exception, from the copy in t/lib if necessary. So, in BioperlTest-using scripts you now have access to the methods dies_ok, lives_ok, throws_ok and lives_and. From N.Haigh at sheffield.ac.uk Thu Jul 5 06:01:04 2007 From: N.Haigh at sheffield.ac.uk (Nathan S. Haigh) Date: Thu, 5 Jul 2007 11:01:04 +0100 Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions In-Reply-To: <468CB1FD.7060301@sendu.me.uk> References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <468CB1FD.7060301@sendu.me.uk> Message-ID: <1183629664.468cc1609891a@webmail.shef.ac.uk> Quoting Sendu Bala : -- snip -- > > > > I also agree with previous posts about organising and/or having some > > naming convention for test data files. 
I think an approach whereby data > > files were organised into directory trees (1 - 3 deep) with names that > > elude to the type of data in that subtree/file rather than the tests > > that use it etc. For example: > > > > t/data > > |__ formats > > | |__ seq > > | | |__ legal_fasta > > | | | |__ extension.fas > [snip] > > At that level, files don't need extensions and can have fully > informative names that explain what's interesting or special about them. > You may be correct in most cases, however, isn't there a method for detecting the file format from the file extension and failing that it peeks inside the file? Therefore there should be a file extension for each of these to get good code coverage as well as each format not having an extension to check that the peek inside the file correctly determines the format. -- snip -- From bix at sendu.me.uk Thu Jul 5 06:04:16 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 05 Jul 2007 11:04:16 +0100 Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions In-Reply-To: <1183629664.468cc1609891a@webmail.shef.ac.uk> References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <468CB1FD.7060301@sendu.me.uk> <1183629664.468cc1609891a@webmail.shef.ac.uk> Message-ID: <468CC220.804@sendu.me.uk> Nathan S. Haigh wrote: > Quoting Sendu Bala : > > -- snip -- >> >>> I also agree with previous posts about organising and/or having >>> some naming convention for test data files. I think an approach >>> whereby data files were organised into directory trees (1 - 3 >>> deep) with names that elude to the type of data in that >>> subtree/file rather than the tests that use it etc. 
For example: >>> >>> t/data |__ formats | |__ seq | | |__ >>> legal_fasta | | | |__ extension.fas >>> >> [snip] >> >> At that level, files don't need extensions and can have fully >> informative names that explain what's interesting or special about >> them. >> > > You may be correct in most cases, however, isn't there a method for > detecting the file format from the file extension and failing that it > peeks inside the file? Therefore there should be a file extension for > each of these to get good code coverage as well as each format not > having an extension to check that the peek inside the file correctly > determines the format. Yes, you're quite correct. From bix at sendu.me.uk Thu Jul 5 06:47:12 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 05 Jul 2007 11:47:12 +0100 Subject: [Bioperl-l] Warnings Message-ID: <468CCC30.90406@sendu.me.uk> I'm trying to get Test::Warn to work with Bioperl warnings as produced by Bio::Root::RootI::warn(). However, afaict the warnings must be generated with CORE::warn(), not print STDERR. Is there any particular reason RootI::warn is done with print and not CORE::warn ? Can I change it to a warn? From bix at sendu.me.uk Thu Jul 5 09:04:50 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 05 Jul 2007 14:04:50 +0100 Subject: [Bioperl-l] Warnings In-Reply-To: <200707051458.59921.heikki@sanbi.ac.za> References: <468CCC30.90406@sendu.me.uk> <200707051458.59921.heikki@sanbi.ac.za> Message-ID: <468CEC72.4090909@sendu.me.uk> Heikki Lehvaslaiho wrote: > My guess is that using 'print STDERR' avoids showing sometimes annoying > errordescription at programname line NN > syntax being used. Afaik, CORE::warn "anything\n"; never includes the line number: messages with a new line always disable that feature. Bio::Root::RootI::warn /always/ puts new lines into the message, so they /never/ have the line number. 
> On the other hand, the main reason we need to set verbosity to 1 in BioPerl > objects is to find where warnings are coming from. Maybe extra text in > warnings leads to easier debugging. > > I favour changing it. So it's my understanding there will be absolutely no difference in behaviour following this change (except that warnings can be caught by Test::Warn). I just wanted to confirm my understanding.
From hlapp at gmx.net Thu Jul 5 09:07:27 2007 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu, 5 Jul 2007 09:07:27 -0400 Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions In-Reply-To: <468CA721.4020804@sheffield.ac.uk> References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> Message-ID: <21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net> On Jul 5, 2007, at 4:09 AM, Nathan S. Haigh wrote: > Chris Fields wrote: >> I think what's partially responsible for slowing down releases is the >> expectation that each dev release is supposed to have all bugs fixed, >> work for every OS, etc. In other words, act like a stable release. >> It doesn't. A stable release has a stable API that will be supported until the next stable release through point releases. >> A developer release by nature is living on the edge, so why not have >> regular dev releases? There's no problem with regular dev releases, but tests will need to pass. There was never a stipulation that all bugs need to have been fixed. But all tests need to pass, so in an ideal world (in which everything is being tested) all tests passing would imply all (known) bugs fixed. Obviously, we don't live in an ideal world ... If not everything passes then what is the big difference from a code snapshot? If using cvs (or svn) is too difficult for most people, we can consider creating a mechanism that puts up nightly snapshots for download.
> -- snip -- > > I agree, although would the dev releases still need to pass all the > tests? I'm thinking of people installing via CPAN. For example, that's another point. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From heikki at sanbi.ac.za Thu Jul 5 09:12:37 2007 From: heikki at sanbi.ac.za (Heikki Lehvaslaiho) Date: Thu, 5 Jul 2007 15:12:37 +0200 Subject: [Bioperl-l] Test related Suggestions In-Reply-To: <468CBC3E.1020408@sendu.me.uk> References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk> <468CBC3E.1020408@sendu.me.uk> Message-ID: <200707051512.38185.heikki@sanbi.ac.za> One more suggestion: It would be extemaly useful if we had a standard way of testing that a when a file is read into a bioperl object and then written out again into a same format, the input and output files are identical. If not, the test should show where the the differences start (showing all the differences would just clutter the screen). This standard method/subroutine should be used to test all sequence and other text file IO. Any takers? -Heikki On Thursday 05 July 2007 11:39:10 Sendu Bala wrote: > Sendu Bala wrote: > > Nathan S. Haigh wrote: > >> Thinking about this a little more, I think it would be a good idea to > >> include Test::Exception in t/lib. > > > > Agree. I'll see if I can have it auto-loaded by BioperlTest.pm. > > I've now done that: BioperlTest loads Test::Exception, from the copy in > t/lib if necessary. > > So, in BioperlTest-using scripts you now have access to the methods > dies_ok, lives_ok, throws_ok and lives_and. 
> _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________ From heikki at sanbi.ac.za Thu Jul 5 08:58:59 2007 From: heikki at sanbi.ac.za (Heikki Lehvaslaiho) Date: Thu, 5 Jul 2007 14:58:59 +0200 Subject: [Bioperl-l] Warnings In-Reply-To: <468CCC30.90406@sendu.me.uk> References: <468CCC30.90406@sendu.me.uk> Message-ID: <200707051458.59921.heikki@sanbi.ac.za> My guess is that using 'print STDERR' avoids showing sometimes annoying errordescription at programname line NN syntax being used. On the other hand, the main reason we need to set verbosity to 1 in BioPerl objects is to find where warnings are coming from. Maybe extra text in warnings leads to easier debugging. I favour changing it. -Heikki On Thursday 05 July 2007 12:47:12 Sendu Bala wrote: > I'm trying to get Test::Warn to work with Bioperl warnings as produced > by Bio::Root::RootI::warn(). However, afaict the warnings must be > generated with CORE::warn(), not print STDERR. > > Is there any particular reason RootI::warn is done with print and not > CORE::warn ? Can I change it to a warn? 
From bix at sendu.me.uk Thu Jul 5 09:44:08 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 05 Jul 2007 14:44:08 +0100
Subject: [Bioperl-l] Test related Suggestions
In-Reply-To: <200707051512.38185.heikki@sanbi.ac.za>
References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk>
	<468CBC3E.1020408@sendu.me.uk> <200707051512.38185.heikki@sanbi.ac.za>
Message-ID: <468CF5A8.7040402@sendu.me.uk>

Heikki Lehvaslaiho wrote:
> One more suggestion:
>
> It would be extremely useful if we had a standard way of testing that
> when a file is read into a bioperl object and then written out
> again in the same format, the input and output files are identical.

As Hilmar has pointed out in the past, Bioperl doesn't aim for the files
to be identical, only for none of the information to be lost and for it
to be output in the correct format. So a round-trip test should read in
the original, store all the parsed data, write it out, then read in the
written version and see if the new parsed data matches the original.

For simpler or ultra-strict file formats, though...

> If not, the test should show where the differences start (showing
> all the differences would just clutter the screen).
>
> This standard method/subroutine should be used to test all sequence
> and other text file IO.
>
> Any takers?

There's already something along these lines in t/SeqIO.t (the section
that uses Algorithm::Diff).
I copied that over from the old testformats.pl script but haven't really
taken the time to see if it's a good way of doing the test. Is it? Can
someone come up with something better? Can someone generalise it if
necessary?

I imagine you could just read the files into arrays and use
Test::More::is_deeply(). If that would be satisfactory I could easily
add a little method to BioperlTest that did that.

From n.haigh at sheffield.ac.uk Thu Jul 5 09:47:24 2007
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Thu, 05 Jul 2007 14:47:24 +0100
Subject: [Bioperl-l] Test related Suggestions
In-Reply-To: <200707051512.38185.heikki@sanbi.ac.za>
References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk>
	<468CBC3E.1020408@sendu.me.uk> <200707051512.38185.heikki@sanbi.ac.za>
Message-ID: <468CF66C.2070907@sheffield.ac.uk>

Heikki Lehvaslaiho wrote:
> One more suggestion:
>
> It would be extremely useful if we had a standard way of testing that when a
> file is read into a bioperl object and then written out again in the same
> format, the input and output files are identical. If not, the test should
> show where the differences start (showing all the differences would just
> clutter the screen).
>
> This standard method/subroutine should be used to test all sequence and other
> text file IO.
>
> Any takers?
>
> -Heikki
>

Wouldn't this require info about the formatting of the file to be stored
in the object as well, such that the same formatting could be used when
writing the file? Wouldn't a better approach be to read the contents of
file1 into obj1, write obj1 to file2 in the same format, and then read
file2 into obj2 and compare obj1 to obj2 to ensure we have all the same
data?
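
The round trip described above (parse, write, parse again, compare the parsed data rather than the file text) can be sketched as follows. This is only an illustration under stated assumptions: `parse_fasta` and `write_fasta` are hypothetical toy helpers written for this example, not Bio::SeqIO or any existing BioperlTest method, and the data lives in strings rather than files to keep the sketch self-contained.

```perl
use strict;
use warnings;
use Test::More tests => 2;

# Hypothetical toy parser: FASTA text -> list of { id, desc, seq } records.
sub parse_fasta {
    my ($text) = @_;
    my @records;
    for my $line (split /\n/, $text) {
        next if $line =~ /^\s*$/;
        if ($line =~ /^>(\S+)\s*(.*)$/) {
            push @records, { id => $1, desc => $2, seq => '' };
        }
        elsif (@records) {
            $line =~ s/\s+//g;
            $records[-1]{seq} .= $line;
        }
    }
    return \@records;
}

# Hypothetical toy writer: always wraps the sequence at $width columns,
# so its layout need not match the layout of the original input.
sub write_fasta {
    my ($records, $width) = @_;
    $width ||= 60;
    my $out = '';
    for my $rec (@$records) {
        my $header = ">$rec->{id}";
        $header .= " $rec->{desc}" if length $rec->{desc};
        $out .= "$header\n";
        my $seq = $rec->{seq};
        $out .= substr($seq, 0, $width, '') . "\n" while length $seq;
    }
    return $out;
}

# The input wraps its sequences at 10 columns; the writer will not.
my $original = ">seq1 first test sequence\nACGTACGTAC\nGTACGT\n"
             . ">seq2\nTTTT\nGGGG\n";

my $obj1    = parse_fasta($original);   # file1 -> obj1
my $written = write_fasta($obj1);       # obj1  -> file2
my $obj2    = parse_fasta($written);    # file2 -> obj2

isnt($written, $original, 'written text differs from the input layout');
is_deeply($obj2, $obj1, 'parsed data survives the round trip');
```

The first assertion makes the point raised in the thread: the byte-for-byte output is allowed to differ, so only the parsed data is compared.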
Nath

From cjfields at uiuc.edu Thu Jul 5 09:52:12 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Jul 2007 08:52:12 -0500
Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions
In-Reply-To: <468CA721.4020804@sheffield.ac.uk>
References: <468B6FBF.1070708@sendu.me.uk>
	<9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu>
	<6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu>
	<468CA721.4020804@sheffield.ac.uk>
Message-ID:

On Jul 5, 2007, at 3:09 AM, Nathan S. Haigh wrote:

> ...
> I agree, although would the dev releases still need to pass all the
> tests? I'm thinking of people installing via CPAN.

Remains to be decided. All current tests (net and non-net) should pass.
Any bug fixes should try to have added tests if possible, with
in-process stuff as TODOs. Network tests are left up to user discretion,
so if they fail for any particular reason there is a way around them.

> I also agree with what was said in a previous post about bringing
> bioperl-run (and some others) back into the same repository as
> bioperl-core (after a successful move over to svn) and have
> Build.PL deal with creating the packages etc for CPAN. This would
> hopefully help keep the run package (and others) up to speed with
> the core package.

It depends on how we want to have everything split. I don't think it's
immediately pressing (there are more important priorities, i.e. bugs,
svn) but I would say folding everything back into live and 'splitting'
them out using an automated Build process is a viable option.

> I also agree with previous posts about organising and/or having
> some naming convention for test data files. I think an approach
> whereby data files were organised into directory trees (1 - 3 deep)
> with names that allude to the type of data in that subtree/file
> rather than the tests that use it etc.
> For example:
>
> t/data
> |__ formats
> |   |__ seq
> |   |   |__ legal_fasta
> |   |   |   |__ extension.fas
> |   |   |   |__ extension.fasta
> |   |   |   |__ extension.foo
> |   |   |   |__ extension.bar
> |   |   |   |__ no_extension
> |   |   |   |__ interleaved.fas
> |   |   |   |__ non_interleaved.fas
> |   |   |   |__ single_seq.fas
> |   |   |   |__ multiple_seq.fas
> |   |   |   |__ desc_line1.fas
> |   |   |   |__ desc_line2.fas
> |   |   |
> |   |   |__ illegal_fasta
> |   |   |   |__ illegal_chars.fas
> |   |   |   |__ some_other_illegal_alternative.fas
> |   |   |
> |   |   |__ legal_genbank
> |   |   |   |__ etc etc
> |   |   |
> |   |   |__ illegal_genbank
> |   |       |__ etc etc
> |   |
> |   |__ aln
> |   |__ blast
> |   |   |__ legal_blastx
> |   |   |
> |   |   |__ legal_blastp
> |   |   |
> |   |   |__ legal_tblastx
> |   |   |
> |   |   |__ legal_plastpsi
> |   |   |
> |   |   |__ legal_wublast
> |   |__ foo
> |   |__ bar
> |   |__ misc
> |
> |__ etc
>
> This type of setup might lend itself to having a test script
> simply try to parse all the files in a directory to ensure nothing
> fails (for legal file formats) and fails for illegal formats.
> Naming of the file paths would help test authors to identify a
> suitable data file for their own tests before adding their own to
> the t/data dir. It might also help to identify areas where example
> test data is currently lacking.

...

This seems like more of a 'guess sequence' and format validation issue,
something we've talked about before:

http://bugzilla.open-bio.org/show_bug.cgi?id=1508

The way I feel about it is sequence format validation and sequence
parsing should be separate issues and therefore in separate classes
(with parsing optionally preceded by validation), but that's something
for another discussion.

> Thinking about this a little more, I think it would be a good idea
> to include Test::Exception in t/lib. We should also be testing that
> warnings and exceptions are generated when expected - e.g. illegal
> characters in seq files etc etc. Without these sorts of tests we
> are only getting half the story.
> This testing might account for a
> large chunk of the poor test coverage, particularly when it comes
> to branches in the code.
>
> Anyway, this type of reorganisation couldn't take place until the
> svn repo is up and working.
>
> I'd appreciate any comments on the above!
> Nath

chris

From n.haigh at sheffield.ac.uk Thu Jul 5 10:08:29 2007
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Thu, 05 Jul 2007 15:08:29 +0100
Subject: [Bioperl-l] Installing all modules required for testing
In-Reply-To: <468CF5A8.7040402@sendu.me.uk>
References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk>
	<468CBC3E.1020408@sendu.me.uk> <200707051512.38185.heikki@sanbi.ac.za>
	<468CF5A8.7040402@sendu.me.uk>
Message-ID: <468CFB5D.6080406@sheffield.ac.uk>

Is there a way to install all the modules that are used in the tests? I
mean there are cases where tests are skipped and pass if the required
module for testing is not installed. Therefore, missing out a chunk of
the tests. It would be desirable to be able to install all these modules
in order to complete the whole test suite - any ideas if/how this can be
done?

Cheers
Nath

From bix at sendu.me.uk Thu Jul 5 10:15:34 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 05 Jul 2007 15:15:34 +0100
Subject: [Bioperl-l] Installing all modules required for testing
In-Reply-To: <468CFB5D.6080406@sheffield.ac.uk>
References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk>
	<468CBC3E.1020408@sendu.me.uk> <200707051512.38185.heikki@sanbi.ac.za>
	<468CF5A8.7040402@sendu.me.uk> <468CFB5D.6080406@sheffield.ac.uk>
Message-ID: <468CFD06.3080604@sendu.me.uk>

Nathan S. Haigh wrote:
> Is there a way to install all the modules that are used in the tests? I
> mean there are cases where tests are skipped and pass if the required
> module for testing is not installed. Therefore, missing out a chunk of
> the tests.
> It would be desirable to be able to install all these modules
> in order to complete the whole test suite - any ideas if/how this can
> be done?

Yes, add them as recommended (or perhaps 'build_requires') modules in
Build.PL, then run Build.PL and install the modules when it asks you.

Everything should be in Build.PL already. If I missed something, please
add it.

From cjfields at uiuc.edu Thu Jul 5 10:18:08 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Jul 2007 09:18:08 -0500
Subject: [Bioperl-l] Installing all modules required for testing
In-Reply-To: <468CFB5D.6080406@sheffield.ac.uk>
References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk>
	<468CBC3E.1020408@sendu.me.uk> <200707051512.38185.heikki@sanbi.ac.za>
	<468CF5A8.7040402@sendu.me.uk> <468CFB5D.6080406@sheffield.ac.uk>
Message-ID:

On Jul 5, 2007, at 9:08 AM, Nathan S. Haigh wrote:

> Is there a way to install all the modules that are used in the
> tests? I mean there are cases where tests are skipped and pass if the
> required module for testing is not installed. Therefore, missing out a
> chunk of the tests. It would be desirable to be able to install all
> these modules in order to complete the whole test suite - any ideas
> if/how this can be done?
>
> Cheers
> Nath

That's optionally done upon 'perl Build.PL', correct? So if you choose
not to install a particular prereq (e.g. XML::SAX), you shouldn't be
forced to install it later just for tests. Or am I misunderstanding you?
chris From cjfields at uiuc.edu Thu Jul 5 10:18:23 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 5 Jul 2007 09:18:23 -0500 Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions In-Reply-To: <468CC220.804@sendu.me.uk> References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <468CB1FD.7060301@sendu.me.uk> <1183629664.468cc1609891a@webmail.shef.ac.uk> <468CC220.804@sendu.me.uk> Message-ID: On Jul 5, 2007, at 5:04 AM, Sendu Bala wrote: > Nathan S. Haigh wrote: >> Quoting Sendu Bala : >>> ... >>> At that level, files don't need extensions and can have fully >>> informative names that explain what's interesting or special about >>> them. >>> >> >> You may be correct in most cases, however, isn't there a method for >> detecting the file format from the file extension and failing that it >> peeks inside the file? Therefore there should be a file extension for >> each of these to get good code coverage as well as each format not >> having an extension to check that the peek inside the file correctly >> determines the format. > > Yes, you're quite correct. I actually like Sendu's idea more, or the idea of each test suite having it's own directory. Tests which need to guess/validate the format are probably best left sequestered to a specific suite focused on format guessing/ validation, at least in my opinion. chris From n.haigh at sheffield.ac.uk Thu Jul 5 10:22:40 2007 From: n.haigh at sheffield.ac.uk (Nathan S. 
Haigh) Date: Thu, 05 Jul 2007 15:22:40 +0100 Subject: [Bioperl-l] Installing all modules required for testing In-Reply-To: <468CFD06.3080604@sendu.me.uk> References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk> <468CBC3E.1020408@sendu.me.uk> <200707051512.38185.heikki@sanbi.ac.za> <468CF5A8.7040402@sendu.me.uk> <468CFB5D.6080406@sheffield.ac.uk> <468CFD06.3080604@sendu.me.uk> Message-ID: <468CFEB0.80201@sheffield.ac.uk> Sendu Bala wrote: > Nathan S. Haigh wrote: >> Is there a way to install all the modules that are used in the tests? >> I mean there are cases where tests are skipped and pass if the >> required module for testing is not installed. Therefore, missing out a >> chunk of the tests. It would be desirable to be able to install all >> these modules in order to complete they whole test suite - any ideas >> if/how this can be done? > > Yes, add them as recommended (or perhaps 'build_requires') modules in > Build.PL, then run Build.PL and install the modules when it asks you. > > Everything should be in Build.PL already. If I missed something, please > add it. > OK, to clarify using the test file Sendu mentioned in a previous post: t/SeqIO.t This test skips tests if Algorithm::Diff, IO::ScalarArray or IO::String are not installed (the first two are not mentioned in Build.PL). However, if there are a lot of such skips in the whole test suite then there maybe few system with all these modules installed in order to conduct a complete test. These are the modules I'm referring to. Nath From n.haigh at sheffield.ac.uk Thu Jul 5 10:30:05 2007 From: n.haigh at sheffield.ac.uk (Nathan S. 
Haigh) Date: Thu, 05 Jul 2007 15:30:05 +0100 Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions In-Reply-To: References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <468CB1FD.7060301@sendu.me.uk> <1183629664.468cc1609891a@webmail.shef.ac.uk> <468CC220.804@sendu.me.uk> Message-ID: <468D006D.6050806@sheffield.ac.uk> Chris Fields wrote: > > On Jul 5, 2007, at 5:04 AM, Sendu Bala wrote: > >> Nathan S. Haigh wrote: >>> Quoting Sendu Bala : >>>> ... >>>> At that level, files don't need extensions and can have fully >>>> informative names that explain what's interesting or special about >>>> them. >>>> >>> >>> You may be correct in most cases, however, isn't there a method for >>> detecting the file format from the file extension and failing that it >>> peeks inside the file? Therefore there should be a file extension for >>> each of these to get good code coverage as well as each format not >>> having an extension to check that the peek inside the file correctly >>> determines the format. >> >> Yes, you're quite correct. > > I actually like Sendu's idea more, or the idea of each test suite having > it's own directory. > > Tests which need to guess/validate the format are probably best left > sequestered to a specific suite focused on format guessing/validation, > at least in my opinion. > > chris How easily would this lend itself to using the same data for multiple tests, or is it likely to lead to/exacerbate a culture of adding duplicate data files in each "test suite" rather than reusing? 
Nath From cjfields at uiuc.edu Thu Jul 5 10:33:46 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 5 Jul 2007 09:33:46 -0500 Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions In-Reply-To: <21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net> References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net> Message-ID: On Jul 5, 2007, at 8:07 AM, Hilmar Lapp wrote: > On Jul 5, 2007, at 4:09 AM, Nathan S. Haigh wrote: > >> Chris Fields wrote: >>> I think what's partially responsible for slowing down releases is >>> the >>> expectation that each dev release is supposed to have all bugs >>> fixed, >>> work for every OS, etc. In other words, act like a stable release. > > It doesn't. A stable release has a stable API that will be > supported until the next stable release through point releases. I agree, but I think there is still an expectation that 1.5.2 and beyond are more like true 'stable' releases even though we still designate them as 'developer.' We unfortunately reinforce that when we tell users they need to update to v. 1.5.2 or bioperl-live to fix a particular bug in the 1.4 release. There's nothing we can do about that now (hindsight is always 20/20, and 1.4 is just too old). We (pumpkin, core devs) can try correcting that by ensuring any bug fixes be committed to any new stable branch as well as to live, at least until it becomes too problematic to maintain that particular stable branch (at which point we would go about getting ready for the next 'stable' and repeat the cycle over again). >>> A developer release by nature is living on the edge, so why not have >>> regular dev releases? > > There's no problem with regular dev releases, but tests will need > to pass. There was never a stipulation that all bugs need to have > been fixed. 
But all tests need to pass, so in an ideal world (in > which everything is being tested) all tests passing would imply all > (known) bugs fixed. Obviously, we don't live in an ideal world ... ...particularly when it comes to network-related tests and remote server problems (but those are by default not run, so there is a way around test fails there). I agree here as well (all tests must pass). As for the bug fixes, we can just stipulate which ones were fixed with the release (in a RELEASE_NOTES or similar), and maybe have TODO's in the test suite designating they are being worked on. Basically, at regular intervals, maybe with a few weeks of lead time, the pumpkin would announce an impending dev. release. Go through rounds of tests, bug fixes, etc. When all tests pass post it on CPAN as a dev. release. If we have a stable release branch with relevant bug fixes we can post that as well, again to the point where it becomes too problematic. Would we just take a snapshot of MAIN and any relevant stable branch at that particular point for the CPAN release, just increasing the version number (1.x.y)? Would it make sense to have a 1.x.y branch for each release (I don't think so, but maybe others disagree)? > If not everything passes then what is the big difference to a code > snapshot? If using cvs (or svn) is too difficult for most people, > we can consider creating a mechanism that puts up nightly snapshots > for download. If we feel a nightly snapshot is warranted we could do that though. I personally don't think there is a need, particularly since we have several means to obtain the latest code at any point in time (including the browsable CVS 'Download tarball'). We could state the next dev/stable CPAN release (pending on date dd/mm/yy) will have the bug fix, and if they want it immediately then pick it up from CVS. >> -- snip -- >> >> I agree, although would the dev releases still need to pass all the >> tests? I'm thinking of people installing via CPAN. 
> > For example, that's another point. > > -hilmar Yes, I agree. As an aside, I don't think dev. releases pop up when you run a simple 'install Foo::Bar' from the CPAN shell but I'm not sure; Sendu may know the answer to that. chris From cjfields at uiuc.edu Thu Jul 5 10:34:22 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 5 Jul 2007 09:34:22 -0500 Subject: [Bioperl-l] Test related Suggestions In-Reply-To: <200707051512.38185.heikki@sanbi.ac.za> References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk> <468CBC3E.1020408@sendu.me.uk> <200707051512.38185.heikki@sanbi.ac.za> Message-ID: <2C7BA3C8-0693-458F-A49D-879ABAB8012E@uiuc.edu> On Jul 5, 2007, at 8:12 AM, Heikki Lehvaslaiho wrote: > > One more suggestion: > > It would be extemaly useful if we had a standard way of testing > that a when a > file is read into a bioperl object and then written out again into > a same > format, the input and output files are identical. If not, the test > should > show where the the differences start (showing all the differences > would just > clutter the screen). > > This standard method/subroutine should be used to test all sequence > and other > text file IO. > > Any takers? > > -Heikki ... I agree. There are some 'round-trip' tests with genbank.t or SeqIO.t that do some checking, I think, but something like this would be of use. However, what if the test file is old (as many in t/data are) and the format has changed? GenBank and EMBL, for instance, have gone through several changes to format. chris From n.haigh at sheffield.ac.uk Thu Jul 5 10:43:51 2007 From: n.haigh at sheffield.ac.uk (Nathan S. 
Haigh) Date: Thu, 05 Jul 2007 15:43:51 +0100 Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions In-Reply-To: References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net> Message-ID: <468D03A7.3090408@sheffield.ac.uk> Chris Fields wrote: -- snip -- >>> >>> I agree, although would the dev releases still need to pass all the >>> tests? I'm thinking of people installing via CPAN. >> >> For example, that's another point. >> >> -hilmar > > Yes, I agree. > > As an aside, I don't think dev. releases pop up when you run a simple > 'install Foo::Bar' from the CPAN shell but I'm not sure; Sendu may know > the answer to that. > > chris Thats right, it'll only install the non-developer releases (1.4 currently). If you want to install the developer release from CPAN you need to know the path the archive and then do: cpan> install S/SE/SENDU/bioperl-1.5.2_102.tar.gz as detailed on the wiki: http://www.bioperl.org/wiki/Release_1.5.2 Nath From cjfields at uiuc.edu Thu Jul 5 10:49:33 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 5 Jul 2007 09:49:33 -0500 Subject: [Bioperl-l] Installing all modules required for testing In-Reply-To: <468CFEB0.80201@sheffield.ac.uk> References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk> <468CBC3E.1020408@sendu.me.uk> <200707051512.38185.heikki@sanbi.ac.za> <468CF5A8.7040402@sendu.me.uk> <468CFB5D.6080406@sheffield.ac.uk> <468CFD06.3080604@sendu.me.uk> <468CFEB0.80201@sheffield.ac.uk> Message-ID: <16397C9B-07DA-416E-A2AF-5FA403BA0388@uiuc.edu> On Jul 5, 2007, at 9:22 AM, Nathan S. Haigh wrote: > Sendu Bala wrote: >> ... >> Yes, add them as recommended (or perhaps 'build_requires') modules in >> Build.PL, then run Build.PL and install the modules when it asks you. >> >> Everything should be in Build.PL already. 
If I missed something, >> please >> add it. >> > > OK, to clarify using the test file Sendu mentioned in a previous post: > t/SeqIO.t > > This test skips tests if Algorithm::Diff, IO::ScalarArray or > IO::String > are not installed (the first two are not mentioned in Build.PL). > However, if there are a lot of such skips in the whole test suite then > there maybe few system with all these modules installed in order to > conduct a complete test. These are the modules I'm referring to. > > Nath If they are only necessary for tests, work for all OSs, and are pure Perl they should be added to t/lib, like Test::More and the rest. If they only work for some OSs they could be added to t/lib and skip based on OS, but they still must be pure Perl. I would avoid anything that requires any compiling for XS or Inline altogether (I don't want to go down the nightmare road of OS-dependent compiler issues for a few tests). Finally, if they are needed for core modules (not just tests) then they should be added to the core prereqs in Build. chris From cjfields at uiuc.edu Thu Jul 5 10:52:58 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 5 Jul 2007 09:52:58 -0500 Subject: [Bioperl-l] Warnings In-Reply-To: <468CEC72.4090909@sendu.me.uk> References: <468CCC30.90406@sendu.me.uk> <200707051458.59921.heikki@sanbi.ac.za> <468CEC72.4090909@sendu.me.uk> Message-ID: <2E0C7F35-9AA1-479A-A430-7D4037D98A3A@uiuc.edu> On Jul 5, 2007, at 8:04 AM, Sendu Bala wrote: > ... > > So its my understanding there will be absolutely no difference in > behaviour following this change (except that warning can be caught by > Test::Warn). I just wanted to confirm my understanding. You can always just try it out and run tests. Might be interesting to see if anything breaks. chris From N.Haigh at sheffield.ac.uk Thu Jul 5 10:58:30 2007 From: N.Haigh at sheffield.ac.uk (Nathan S. 
Haigh) Date: Thu, 5 Jul 2007 15:58:30 +0100 Subject: [Bioperl-l] Test related Suggestions In-Reply-To: <2C7BA3C8-0693-458F-A49D-879ABAB8012E@uiuc.edu> References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk> <468CBC3E.1020408@sendu.me.uk> <200707051512.38185.heikki@sanbi.ac.za> <2C7BA3C8-0693-458F-A49D-879ABAB8012E@uiuc.edu> Message-ID: <1183647510.468d07168963c@webmail.shef.ac.uk> Quoting Chris Fields : > > On Jul 5, 2007, at 8:12 AM, Heikki Lehvaslaiho wrote: > > > > > One more suggestion: > > > > It would be extemaly useful if we had a standard way of testing > > that a when a > > file is read into a bioperl object and then written out again into > > a same > > format, the input and output files are identical. If not, the test > > should > > show where the the differences start (showing all the differences > > would just > > clutter the screen). > > > > This standard method/subroutine should be used to test all sequence > > and other > > text file IO. > > > > Any takers? > > > > -Heikki > ... > > I agree. There are some 'round-trip' tests with genbank.t or SeqIO.t > that do some checking, I think, but something like this would be of > use. However, what if the test file is old (as many in t/data are) > and the format has changed? GenBank and EMBL, for instance, have > gone through several changes to format. > > chris > > Is there any way to distinguish variants apart other than just layout? e.g. a version number of the likes? Nath From N.Haigh at sheffield.ac.uk Thu Jul 5 11:04:30 2007 From: N.Haigh at sheffield.ac.uk (Nathan S. 
Haigh) Date: Thu, 5 Jul 2007 16:04:30 +0100 Subject: [Bioperl-l] Installing all modules required for testing In-Reply-To: <16397C9B-07DA-416E-A2AF-5FA403BA0388@uiuc.edu> References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk> <468CBC3E.1020408@sendu.me.uk> <200707051512.38185.heikki@sanbi.ac.za> <468CF5A8.7040402@sendu.me.uk> <468CFB5D.6080406@sheffield.ac.uk> <468CFD06.3080604@sendu.me.uk> <468CFEB0.80201@sheffield.ac.uk> <16397C9B-07DA-416E-A2AF-5FA403BA0388@uiuc.edu> Message-ID: <1183647870.468d087ed4c80@webmail.shef.ac.uk> Quoting Chris Fields : > > On Jul 5, 2007, at 9:22 AM, Nathan S. Haigh wrote: > > > Sendu Bala wrote: > >> ... > >> Yes, add them as recommended (or perhaps 'build_requires') modules in > >> Build.PL, then run Build.PL and install the modules when it asks you. > >> > >> Everything should be in Build.PL already. If I missed something, > >> please > >> add it. > >> > > > > OK, to clarify using the test file Sendu mentioned in a previous post: > > t/SeqIO.t > > > > This test skips tests if Algorithm::Diff, IO::ScalarArray or > > IO::String > > are not installed (the first two are not mentioned in Build.PL). > > However, if there are a lot of such skips in the whole test suite then > > there maybe few system with all these modules installed in order to > > conduct a complete test. These are the modules I'm referring to. > > > > Nath > > If they are only necessary for tests, work for all OSs, and are pure > Perl they should be added to t/lib, like Test::More and the rest. If > they only work for some OSs they could be added to t/lib and skip > based on OS, but they still must be pure Perl. I would avoid > anything that requires any compiling for XS or Inline altogether (I > don't want to go down the nightmare road of OS-dependent compiler > issues for a few tests). If this is the case, there surely is no need to skip the tests if they should be provided in the t/lib dir. Am I missing something!? 
> > Finally, if they are needed for core modules (not just tests) then > they should be added to the core prereqs in Build. > > chris > From bix at sendu.me.uk Thu Jul 5 11:13:35 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 05 Jul 2007 16:13:35 +0100 Subject: [Bioperl-l] Installing all modules required for testing In-Reply-To: <1183647870.468d087ed4c80@webmail.shef.ac.uk> References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk> <468CBC3E.1020408@sendu.me.uk> <200707051512.38185.heikki@sanbi.ac.za> <468CF5A8.7040402@sendu.me.uk> <468CFB5D.6080406@sheffield.ac.uk> <468CFD06.3080604@sendu.me.uk> <468CFEB0.80201@sheffield.ac.uk> <16397C9B-07DA-416E-A2AF-5FA403BA0388@uiuc.edu> <1183647870.468d087ed4c80@webmail.shef.ac.uk> Message-ID: <468D0A9F.4010709@sendu.me.uk> Nathan S. Haigh wrote: > Quoting Chris Fields : >>> OK, to clarify using the test file Sendu mentioned in a previous >>> post: t/SeqIO.t >>> >>> This test skips tests if Algorithm::Diff, IO::ScalarArray or >>> IO::String are not installed >> >> If they are only necessary for tests, work for all OSs, and are >> pure Perl they should be added to t/lib, like Test::More and the >> rest. If they only work for some OSs they could be added to t/lib >> and skip based on OS, but they still must be pure Perl. I would >> avoid anything that requires any compiling for XS or Inline >> altogether (I don't want to go down the nightmare road of >> OS-dependent compiler issues for a few tests). > > If this is the case, there surely is no need to skip the tests if > they should be provided in the t/lib dir. Am I missing something!? That skip in SeqIO.t is new and I simply didn't think of them as important enough to make anyone install them or include them in t/lib. I'd go ahead and add those modules, but like I say, it may make more sense just to use is_deeply(), removing the dependency on Algorithm::Diff and IO::ScalarArray completely. 
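
As a rough illustration of the is_deeply() idea, combined with the earlier request to show only where the differences start, here is a sketch that uses only core modules. The helper names `lines_of` and `first_difference` are hypothetical, invented for this example; nothing like them is claimed to exist in BioperlTest.

```perl
use strict;
use warnings;
use Test::More tests => 2;

# Hypothetical helper: split text into lines (newlines kept), much as
# reading a file into an array would.
sub lines_of {
    my ($text) = @_;
    return [ split /^/m, $text ];
}

# Hypothetical helper: 1-based number of the first differing line,
# or 0 when both line lists are identical.
sub first_difference {
    my ($got, $expected) = @_;
    my $last = (@$got > @$expected ? @$got : @$expected) - 1;
    for my $i (0 .. $last) {
        my ($g, $e) = ($got->[$i], $expected->[$i]);
        return $i + 1
            if !defined $g || !defined $e || $g ne $e;
    }
    return 0;
}

my $expected = lines_of("LOCUS       TEST\nDEFINITION  example\nORIGIN\n//\n");
my $same     = lines_of("LOCUS       TEST\nDEFINITION  example\nORIGIN\n//\n");
my $changed  = lines_of("LOCUS       TEST\nDEFINITION  changed\nORIGIN\n//\n");

is_deeply($same, $expected, 'identical content compares equal line by line');
is(first_difference($changed, $expected), 2, 'first difference is on line 2');
```

When is_deeply() itself fails, its diagnostics already point at the first element where the two structures diverge, so the separate helper is only needed if a line number is wanted outside a test assertion.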
From cjfields at uiuc.edu Thu Jul 5 11:35:41 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 5 Jul 2007 10:35:41 -0500 Subject: [Bioperl-l] Installing all modules required for testing In-Reply-To: <1183647870.468d087ed4c80@webmail.shef.ac.uk> References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk> <468CBC3E.1020408@sendu.me.uk> <200707051512.38185.heikki@sanbi.ac.za> <468CF5A8.7040402@sendu.me.uk> <468CFB5D.6080406@sheffield.ac.uk> <468CFD06.3080604@sendu.me.uk> <468CFEB0.80201@sheffield.ac.uk> <16397C9B-07DA-416E-A2AF-5FA403BA0388@uiuc.edu> <1183647870.468d087ed4c80@webmail.shef.ac.uk> Message-ID: On Jul 5, 2007, at 10:04 AM, Nathan S. Haigh wrote: > ... >> If they are only necessary for tests, work for all OSs, and are pure >> Perl they should be added to t/lib, like Test::More and the rest. If >> they only work for some OSs they could be added to t/lib and skip >> based on OS, but they still must be pure Perl. I would avoid >> anything that requires any compiling for XS or Inline altogether (I >> don't want to go down the nightmare road of OS-dependent compiler >> issues for a few tests). > > If this is the case, there surely is no need to skip the tests if > they should be provided in the t/lib dir. Am I missing something!? No, you are correct, but these are currently not in t/lib (unless someone snuck them in....) Of the modules you listed above, only one (IO::String) is required by the core modules. The others are not. Users shouldn't be forced to install Algorithm::Diff or IO::ScalarArray just to run tests, so anything not required should go into t/lib if at all possible. If there are any reasons (OS issues, list of prereqs) which preclude adding these to t/lib, we need to ask ourselves (1) why are we using that module in the first place? And, if there is a good reason, (2) can we skip the tests if the module isn't present? Both of those options are already available.
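The second option, skipping when an optional module is absent, can be sketched with a Test::More SKIP block. This is only an illustration and not the code currently in t/SeqIO.t; the helper have_module is a hypothetical name, and the 't/lib' path assumes the script is run from the distribution root.

```perl
use strict;
use warnings;
use lib 't/lib';    # bundled pure-Perl test modules, if the directory exists
use Test::More tests => 2;

# Hypothetical helper: returns 1 if the named module loads, 0 otherwise.
sub have_module {
    my ($module) = @_;
    return eval("require $module; 1") ? 1 : 0;
}

ok(have_module('Test::More'), 'core test module is always available');

SKIP: {
    # Skip (rather than fail) the dependent test when the module is missing.
    skip 'Algorithm::Diff not installed', 1
        unless have_module('Algorithm::Diff');
    ok(Algorithm::Diff->can('diff'), 'optional diff-based check can run');
}
```

Either way the plan of two tests is satisfied: the second test is counted as skipped when the module cannot be loaded.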
chris From cjfields at uiuc.edu Thu Jul 5 11:50:55 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 5 Jul 2007 10:50:55 -0500 Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions In-Reply-To: <468D006D.6050806@sheffield.ac.uk> References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <468CB1FD.7060301@sendu.me.uk> <1183629664.468cc1609891a@webmail.shef.ac.uk> <468CC220.804@sendu.me.uk> <468D006D.6050806@sheffield.ac.uk> Message-ID: <404EEDE8-53AC-411E-B4F0-CF4B4AABE9E0@uiuc.edu> On Jul 5, 2007, at 9:30 AM, Nathan S. Haigh wrote: > ... >> I actually like Sendu's idea more, or the idea of each test suite >> having its own directory. >> Tests which need to guess/validate the format are probably best >> left sequestered to a specific suite focused on format guessing/ >> validation, at least in my opinion. >> chris > > > How easily would this lend itself to using the same data for > multiple tests, or is it likely to lead to/exacerbate a culture of > adding duplicate data files in each "test suite" rather than reusing? > > Nath If there is a group of test data used for more than one test suite we can group those together into a common-use folder, or we can go by format. I'm pretty open to anything, really, as long as it is more organized. My point is really concerned more with validation/guessing. I think we should limit those tests to their respective specific test suites, or even to sections within a particular test suite (for instance, genbank.t), and not force sequence guessing or validation in other cases. To me validation, guessing, and parsing are three distinct issues (much like XML parsers handle things), so they require three distinct tests. As for true sequence validation, there is no official format validation scheme yet in BioPerl.
It's sort of unofficially integrated into the sequence parsers themselves (something which I find to be problematic for several reasons too long to outline here). chris From cjfields at uiuc.edu Thu Jul 5 11:54:42 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 5 Jul 2007 10:54:42 -0500 Subject: [Bioperl-l] Test related Suggestions In-Reply-To: <1183647510.468d07168963c@webmail.shef.ac.uk> References: <468B6FBF.1070708@sendu.me.uk> <468CB1FD.7060301@sendu.me.uk> <468CBC3E.1020408@sendu.me.uk> <200707051512.38185.heikki@sanbi.ac.za> <2C7BA3C8-0693-458F-A49D-879ABAB8012E@uiuc.edu> <1183647510.468d07168963c@webmail.shef.ac.uk> Message-ID: <48474A2C-2A58-4D51-8E7F-7CE083948D0F@uiuc.edu> On Jul 5, 2007, at 9:58 AM, Nathan S. Haigh wrote: > Quoting Chris Fields : > >> >> On Jul 5, 2007, at 8:12 AM, Heikki Lehvaslaiho wrote: >> >>> >>> One more suggestion: >>> >>> It would be extremely useful if we had a standard way of testing >>> that when a >>> file is read into a bioperl object and then written out again into >>> the same >>> format, the input and output files are identical. If not, the test >>> should >>> show where the differences start (showing all the differences >>> would just >>> clutter the screen). >>> >>> This standard method/subroutine should be used to test all sequence >>> and other >>> text file IO. >>> >>> Any takers? >>> >>> -Heikki >> ... >> >> I agree. There are some 'round-trip' tests with genbank.t or SeqIO.t >> that do some checking, I think, but something like this would be of >> use. However, what if the test file is old (as many in t/data are) >> and the format has changed? GenBank and EMBL, for instance, have >> gone through several changes to format. >> >> chris >> >> > > Is there any way to tell variants apart other than just > layout? e.g. a version number or the like? > > Nath I don't think so; this veers back into the whole validation issue (i.e. does the record fit certain specifications).
There are examples of seq records from different sources which bioperl is expected to parse, for example Ensembl GenBank records. Some of those have feature tags or annotation fields which may not appear in output when using write_seq(). I don't think it's as important to replicate the output data exactly like the input as much as it's important to have the data represented in a Bio::Seq object (or any other Bio* instance) in a consistent manner and have the ability to incorporate new fields (such as the recent addition of genome projects) transparently. The latter is hard to do with the current genbank parser (you have to specifically code for it), but it is a bit easier to do with the driver-handler model I'm working on. chris From bix at sendu.me.uk Thu Jul 5 11:56:29 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 05 Jul 2007 16:56:29 +0100 Subject: [Bioperl-l] Test related Suggestions In-Reply-To: <468CBC3E.1020408@sendu.me.uk> References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <468CB1FD.7060301@sendu.me.uk> <468CBC3E.1020408@sendu.me.uk> Message-ID: <468D14AD.8050007@sendu.me.uk> Sendu Bala wrote: > Sendu Bala wrote: >> Nathan S. Haigh wrote: >>> Thinking about this a little more, I think it would be a good idea to >>> include Test::Exception in t/lib. >> Agree. I'll see if I can have it auto-loaded by BioperlTest.pm. > > I've now done that: BioperlTest loads Test::Exception, from the copy in > t/lib if necessary. > > So, in BioperlTest-using scripts you now have access to the methods > dies_ok, lives_ok, throws_ok and lives_and. And I've also now added in support for Test::Warn, giving you warning_is, warnings_are, warning_like and warnings_like. 
I've updated the HOWTO as well: http://www.bioperl.org/wiki/HOWTO:Writing_BioPerl_Tests You can see these things in action in t/seq_quality.t From bix at sendu.me.uk Thu Jul 5 11:57:23 2007 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 05 Jul 2007 16:57:23 +0100 Subject: [Bioperl-l] Warnings In-Reply-To: <2E0C7F35-9AA1-479A-A430-7D4037D98A3A@uiuc.edu> References: <468CCC30.90406@sendu.me.uk> <200707051458.59921.heikki@sanbi.ac.za> <468CEC72.4090909@sendu.me.uk> <2E0C7F35-9AA1-479A-A430-7D4037D98A3A@uiuc.edu> Message-ID: <468D14E3.6030104@sendu.me.uk> Chris Fields wrote: > > On Jul 5, 2007, at 8:04 AM, Sendu Bala wrote: > >> ... >> >> So it's my understanding there will be absolutely no difference in >> behaviour following this change (except that warnings can be caught by >> Test::Warn). I just wanted to confirm my understanding. > > You can always just try it out and run tests. Might be interesting to > see if anything breaks. I've made the change. Everything seems ok as far as I can tell. From dmessina at wustl.edu Thu Jul 5 12:02:26 2007 From: dmessina at wustl.edu (David Messina) Date: Thu, 5 Jul 2007 11:02:26 -0500 Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions In-Reply-To: References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net> Message-ID: On Jul 5, 2007, at 9:33 AM, Chris Fields wrote: > I agree, but I think there is still an expectation that 1.5.2 and > beyond are more like true 'stable' releases even though we still > designate them as 'developer.' We unfortunately reinforce that when > we tell users they need to update to v. 1.5.2 or bioperl-live to fix > a particular bug in the 1.4 release.
I know this has been discussed before, but while we're talking about future release plans, it might be worth revisiting the BioPerl policy of designating only even-numbered releases as 'stable'. It's taking so long to get from 1.4 to 1.6. While the principle of keeping a stable API between 'stable' releases is valid in the ideal case, I think that continuing to label 1.5.2 (or whatever the latest 'dev' release is) as a developer release (which implies potentially unstable or bleeding-edge code) is highly misleading since we would never ever tell anyone to get 1.4 instead. Alternatively, if we adopt a more aggressive release schedule as Chris proposed a couple of days ago, then perhaps we could agree to push out an even-numbered release once a year or so, so that there is a 'stable' release we could recommend. > If we feel a nightly snapshot is warranted we could do that though. > I personally don't think there is a need, particularly since we have > several means to obtain the latest code at any point in time > (including the browsable CVS 'Download tarball'). We could state the > next dev/stable CPAN release (pending on date dd/mm/yy) will have the > bug fix, and if they want it immediately then pick it up from CVS. To make it easier for people to obtain the latest tarball, we could put the 'download tarball' link directly on the 'Getting_BioPerl' wiki page instead of only a link to the viewcvs interface. That way they wouldn't have to navigate the source tree to figure out which tarball they want (which is almost always going to be the bioperl-live tarball).
I think the actual URL underlying the 'Download tarball' link on viewcvs is stable: http://code.open-bio.org/cgi/viewcvs.cgi/bioperl-live/bioperl-live.tar.gz?tarball=1 Dave From cjfields at uiuc.edu Thu Jul 5 12:13:30 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 5 Jul 2007 11:13:30 -0500 Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions In-Reply-To: References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net> Message-ID: On Jul 5, 2007, at 11:02 AM, David Messina wrote: > ... > I know this has been discussed before, but while we're talking > about future release plans, it might be worth revisiting the > BioPerl policy of designating only even-numbered releases as > 'stable'. It's taking so long to get from 1.4 to 1.6. While the > principle of keeping a stable API between 'stable' releases is > valid in the ideal case, I think that continuing to label 1.5.2 (or > whatever the latest 'dev' release is) as a developer release (which > implies potentially unstable or bleeding-edge code) is highly > misleading since we would never ever tell anyone to get 1.4 instead. > > Alternatively, if we adopt a more aggressive release schedule as > Chris proposed a couple days ago, then perhaps we could agree to > push out an even-numbered release once a year or so, so that there > is a 'stable' release we could recommend. I think the idea of 'stable' is best summarized back in Hilmar's post (i.e. we support a particular API for that release). The 1.5 releases, I believe, break some aspects of the 1.4 API (some of the Feature/Annotation stuff introduced before the official 1.5 release). We still need to address some of those issues before a 1.6, which seems to be the only real stumbling block, but they are unfortunately not well-documented and are somewhat interwoven with GMOD code. > ...
> To make it easier for people to obtain the latest tarball, we could > put the 'download tarball' link directly on the 'Getting_BioPerl' > wiki page instead of only a link to the viewcvs interface. That way > they wouldn't have to navigate the source tree to figure out which > tarball they want (which is almost always going to be the bioperl- > live tarball). > > I think the actual URL underlying the 'Download tarball' link on > viewcvs is stable: > > http://code.open-bio.org/cgi/viewcvs.cgi/bioperl-live/bioperl- > live.tar.gz?tarball=1 > > Dave Sounds reasonable enough. Do you want to do the honors? chris From dmessina at wustl.edu Thu Jul 5 12:44:28 2007 From: dmessina at wustl.edu (David Messina) Date: Thu, 5 Jul 2007 11:44:28 -0500 Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions In-Reply-To: References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net> Message-ID: <8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu> > [Chris] > The 1.5 releases I believe break some aspects of 1.4 API Yes, this is true. I question, though, whether it's relevant given that virtually no one uses 1.4 anymore. In any case, I would venture that the number of people who would be bitten by the 1.4->1.5 API change is much smaller than the number of people who download 1.4 and then ask us why it doesn't work. I think that, rather than continuing to call 1.5.x the developer release in order to adhere to the API guarantee, it would be much clearer to users if we state clearly that everyone should download 1.5.x, and that if you're upgrading from 1.4 to 1.5 there are API changes. >> [me] >> we could put the 'download tarball' link directly on the >> 'Getting_BioPerl' wiki page > > [Chris] > Sounds reasonable enough. Do you want to do the honors? Done. 
Dave From cjfields at uiuc.edu Thu Jul 5 12:57:28 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 5 Jul 2007 11:57:28 -0500 Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions In-Reply-To: <8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu> References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net> <8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu> Message-ID: On Jul 5, 2007, at 11:44 AM, David Messina wrote: > >> [Chris] >> The 1.5 releases I believe break some aspects of 1.4 API > > Yes, this is true. > > I question, though, whether it's relevant given that virtually no > one uses 1.4 anymore. In any case, I would venture that the number > of people who would be bitten by the 1.4->1.5 API change is much > smaller than the number of people who download 1.4 and then ask us > why it doesn't work. > > I think that, rather than continuing to call 1.5.x the developer > release in order to adhere to the API guarantee, it would be much > clearer to users if we state clearly that everyone should download > 1.5.x, and that if you're upgrading from 1.4 to 1.5 there are API > changes. You'd be surprised how many are still using bioperl 1.2.3 (Ensembl) and 1.4 (any admin too scared to go with a 'dev' release). The real answer is to get out a stable 1.6 ASAP. The problem we currently have is (horrible Texas pun) 'too many pokers in the fire.' We have svn migration, major changes in the test suite, talk about splitting bioperl, a lot of bugs to sort through, new code to add or work on, etc. Not to mention our $jobs! I think we should just bite the bullet and proceed with pulling out the controversial operator overloading in Bio::Annotation*, deprecate the tag methods in AnnotatableI, and go about fixing everything up. 
If that occurs (which seems to be the major impediment) and we get GMOD/GBrowse playing well with BioPerl then we can aim for a new stable release, and then institute a regular release cycle. chris From bpederse at gmail.com Thu Jul 5 13:58:24 2007 From: bpederse at gmail.com (Brent Pedersen) Date: Thu, 5 Jul 2007 10:58:24 -0700 Subject: [Bioperl-l] slippy map for genomic features. Message-ID: hi, here's a side project i've been tinkering with in googlecode svn that may be useful to some. http://code.google.com/p/genome-browser/ it's a simple hack on top of OpenLayers (openlayers.org) to provide a javascript slippy map interface and API to view and browse genomic features. it can be used with any image generation program that can accept &xmin= and &xmax= parameters through the url, though i haven't had it working with bioperl, as bioperl generates images of different heights depending on the number of tracks. there's a live example of the code in SVN here: http://toxic.berkeley.edu/bpederse/genome-browser/ with images generated by a colleague's modules on first request. those images are then cached by a simple perl script included in the SVN repo. all subsequent requests are returned from the cache. an image request (automatically generated by the javascript) looks like: http://toxic.berkeley.edu/bpederse/genome-browser/tiler.pl?chr=1&version=6&layers=mRNA&organism=arabidopsis&xmin=491520&xmax=499712&width=512 but any implementation need only implement xmin and xmax. all other parameters will be used for caching but are not required. if anyone is interested in getting this going with bioperl image generation--or improving the project in any way, let me know and i'll add you as a committer and provide any javascript support that i can.
-brent tar ball download: http://genome-browser.googlecode.com/files/genome-browser-0.02.tar From dmessina at wustl.edu Thu Jul 5 14:39:16 2007 From: dmessina at wustl.edu (David Messina) Date: Thu, 5 Jul 2007 13:39:16 -0500 Subject: [Bioperl-l] Splitting Bioperl and Test related Suggestions In-Reply-To: References: <468B6FBF.1070708@sendu.me.uk> <9C10E360-BBD6-40FE-B15F-64660104DFD7@uiuc.edu> <6815832D-F43D-4C95-AD68-5D26826C1ECE@wustl.edu> <468CA721.4020804@sheffield.ac.uk> <21EF9B14-E88F-49BF-B046-AFE7E0090A10@gmx.net> <8C32D2FF-CDFA-4276-9350-8991CE4496DB@wustl.edu> Message-ID: > The real answer is to get out a stable 1.6 ASAP. The problem we > currently have is (horrible Texas pun) 'too many pokers in the > fire.' We have svn migration, major changes in the test suite, > talk about splitting bioperl, a lot of bugs to sort through, new > code to add or work on, etc. Not to mention our $jobs! Yep, I hear ya. > I think we should just bite the bullet and proceed with pulling out > the controversial operator overloading in Bio::Annotation*, > deprecate the tag methods in AnnotatableI, and go about fixing > everything up. If that occurs (which seems to be the major > impediment) and we get GMOD/GBrowse playing